Let's Write A PATH Traverser
When you open up your terminal, you can type in commands like python3 or
pip. Your shell uses the PATH variable to find a file named python3 which
it can execute. But how?
This article is meant to be an interactive tutorial on how to write an
important component of a shell: the PATH traverser.
You can read it in two ways:
- Glance over the code examples
- Follow along like a tutorial
Either is fine!
Getting Started
Skip to Part 1 if you don't want to follow along locally.
I have setup a small playground which you can run locally using podman or docker:
podman pull ghcr.io/possiblyashrub/path-traversal:latest
podman run --name path-traversal-tutorial -it ghcr.io/possiblyashrub/path-traversal:latest
# OR (if you have docker instead)
docker pull ghcr.io/possiblyashrub/path-traversal:latest
docker run --name path-traversal-tutorial -it ghcr.io/possiblyashrub/path-traversal:latest
You can find the source and CI pipeline which builds these images on this GiHub repository.
In this article, we will be writing the program in python. You are free to try
to follow along in any language. C, Go, Rust or Zig will work too. If you use a
different language, make sure that it has libc bindings. You will need to be
able to make syscalls. We won't use any fancy language features, so you might
be fine following along in python even if it's not your main language.
Part 1: What's PATH
We will start by parsing the path variable. If you've never seen your PATH
before, now is a great time to learn. The PATH is a list of directories,
separated by a colon ":", which instruct the path traversal algorithm where to
search for a program and in what order.
For example, in the container if we run echo $PATH you should get
something like: "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin".
This corresponds to the list, ['/usr/local/sbin', '/usr/local/bin', '/usr/sbin', '/usr/bin', '/sbin', '/bin']. So when I run a command like
whoami, we:
- Check
/usr/local/sbin/whoami-> No such file, continue - Check
/usr/local/bin/whoami-> No such file, continue - Check
/usr/sbin/whoami-> No such file, continue - Check
/usr/bin/whoami-> File exists, execute it
Don't believe me? We can use a tool called strace to see all the system calls
the shell makes:
osh-0.35# strace -e access osh -c 'whoami'
access("/etc/terminfo/x/xterm", R_OK) = 0
access("/usr/local/sbin/whoami", X_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/bin/whoami", X_OK) = -1 ENOENT (No such file or directory)
access("/usr/sbin/whoami", X_OK) = -1 ENOENT (No such file or directory)
access("/usr/bin/whoami", X_OK) = 0
root
+++ exited with 0 +++
strace -e access some commandrunssome commandbut logs all theaccesssystem calls made bysome command. This is the syscall a shell uses to check if a file is executable.
Part 2: The PATH Traversal Algorithm
From the description in part 1, let's try to implement a PATH traversal algorithm.
At it's core, the algorithm has the following shape:
def lookup_executable(name: str, path: str) -> str | None:
# PATH is a list of directories separated by colons ':'. For example "/usr/bin:/usr/local/bin"
directories = path.split(":")
# We check each directory in order from start to end
for directory in directories:
candidate = f"{directory}/{name}"
if is_exe(candidate):
return candidate
return None
But you should notice that we have to call out to a magic is_exe function.
How can we see if a file is executable? In UNIX, any file, from a shell script
to a ELF-binary (Linux version of a .exe), could be executable.
If you were keen, in part 1 you saw that there is a syscall called access
(often written access(2)) which can be used to check if a file exists AND is
executable.
But, let's dig into the details of what access(2) does. We can check its
"manpage" with man 2 access. Run this command locally. What should we set
as the mode in our PATH traversal?
access(2) System Calls Manual access(2)
...
DESCRIPTION
access() checks whether the calling process can access the file
pathname. If pathname is a symbolic link, it is dereferenced.
The "(2)" in
access(2)refers to the fact thataccessis in "section 2" and thus is a system call.The "(2)" is confusing, and I often forget it. You can try to run
man accessinstead ofman 2 accessand it should work. But beware that dropping the 2 send you to a man page in a section you did not intend. For example, on my desktop PCman stattakes me to thestatentry in section 1 with details on the stat command-line tool, not thestatsyscall. For that I'd need to runman 2 stat.
Let's edit our path traversal algorithm to use access(2). First, we need to
import os to get access to os.access(), the python binding to access(2).
# traverse.py
import os
You can call os.access as follows:
# Can the current user read and execute the file at $path?
os.access(path, os.X_OK)
The second argument to access(2) is mode. In our case, we want to pass
os.X_OK because we only need to check that we are able to execute the file at
the candidate path. You could also check for read permissions with os.R_OK.
However, you don't actually need to have permission to read a file in order to
execute it. (Try it!)
Now, we can implement the lookup_executable routine we sketched above, but
we will replace is_exe with os.access(candidate, os.X_OK):
# traverse.py
def lookup_executable(name: str, path: str) -> str | None:
directories = path.split(":")
for directory in directories:
candidate = f"{directory}/{name}"
if os.access(candidate, os.X_OK):
return candidate
return None
If we run our tests...
osh-0.35# ./test.ysh
Running tests
==== 1 path, 1 executable ====
$ python3 driver.py hello /test
/test/hello
PASS
...
==== Directories aren't executable ====
...
FAIL: Directories aren't executable: Expected /test/path2/hello but got /test/path1/hello
============
Pass: 4
Fail: 1
Almost there! There is one more gotcha.
Part 3: Directories are Executable
UNIX filesystem APIs are tricky. A lot of this has to do with historical
cruft. So here's a trivia question, what does access(2) return when called on
a directory (not a file) with mode = os.X_OK?
It returns true! And here's the crazy part, many directories are executable!
Why? The executable bit on a file has been repurposed for the ability to read files inside a directory (that also have the read bit set). The read bit is instead used to mark if a user can list all the files in that directory. This means it's possible to have a directory where you can read the files in it, but you cannot see what files are present.
Why would this be helpful? Take the following example: on my University's
undergrad lab machines, there was a file /home/cmput415/env.sh that setup
some environment variables when you source-ed it. The folder
/home/cmput415/ did not let students read (list its contents). Maybe it
contained all the answers to our assignments? I do not and cannot know. But,
that folder gave all students permissions which allowed us to access the
/home/cmput415/env.sh file anyways.
But you can't actually execute directories! So, how can we tell access(2) to
only return true if the file is a real executable? This is a trick
question; we can't!
Instead, we need to make another syscall, stat. Then, we need to check the
stat result to see if the path is a directory.
First, import stat, which contains a needed python helper.
# traverse.py
import stat
Then we need to add an additional check after os.access guarding against the
case that the file is a directory. stat(2) (see man 2 stat) returns many
properties. All we care about is if the directory is a path. For that, we use
stat.S_ISREG(stat_result.st_mode) which is true if the file is a "regular
file", ie. not a directory.
# traverse.py
if os.access(candidate, os.X_OK):
stat_result = os.stat(candidate)
if stat.S_ISREG(stat_result.st_mode):
return candidate
Now, when we run test:
osh-0.35# ./test.ysh
Running tests
...
============
Pass: 5
Fail: 0
Whoo hoo! You've now written a core component of a shell :)
Part 4: You cannot simply stat
One of the drawbacks of our PATH traversal routine is that it requires 2
syscalls. Since a shell tends to make lots of syscalls, any chance to reduce
the number of syscalls needed is an easy way to optimize. Our solution
requiring 2 syscalls per PATH entry is not detrimental, but I want to
argue why this is necessary.
stat returns the permission bits and the ids of the file's owning user and
group. Combined with our knowledge of the current user's id and group ids we
should be able to determine if the current user can execute the file. But there
are some exceptions!
Here are 2 ways I know of to prevent execution even though the permission bits might imply otherwise:
- Place the executable in a
noexecfilesystem - Use access control lists to restrict the current user from executing the file
stat(2) is not powerful enough to check all these cases, but access(2) is.
Meanwhile, access(2) is not powerful enough to check if the given path is a
directory. So we are at an impasse; both must be used.
Conclusion
In this article we:
- Learned what
PATHtraversal is - Built a
PATHtraversal algorithm - Learned about the
access(2)andstat(2)syscalls
You could continue down the rabbit hole in a few ways. First, try testing the
path traversal algorithm using noexec filesystems or access control lists.
Can you convince yourself why we need to stat(2) and access(2)?
Another direction is on the subject of performance. Shells like OSH and Bash
tend to cache PATH traversal using a hash-table. You could try adding a cache
to your PATH traversal algorithm.
Finally, if this type of work interests you. Try contributing to oils for
unix!
We are working on two new shells, OSH and YSH. OSH is the
most bash compatible shell (other than bash itself) and YSH is an upgrade path
from bash to a more predictable language. (Curious about the language? The
test.ysh script in this tutorial was written in YSH!)