CS202: Lab 5: File system
Introduction
In this lab, you will implement (pieces of) a simple disk-based file system. There is not a lot of code to write; instead, a lot of the work of the lab is understanding the system that youʼve been given.
By the end of the lab, you will be able to run your lab 2 ls against this labʼs file system.
Getting Started
Youʼll be working in the Docker container as usual. We assume that you have set up the upstream as described in the lab setup. Then run the following on your local machine from outside of Docker:
$ cd ~/cs202
$ git fetch upstream
$ git merge upstream/main
This labʼs files are located in the lab5 subdirectory.
If you have any “conflicts” from lab 4, resolve them before continuing. Run git push to save your work back to your personal repository.
In this lab, we have to rebuild the Docker image. As part of the steps above, you should have incorporated a modification to the Dockerfile that specifies the inclusion of libraries related to FUSE.
To rebuild the image, do the following (we assume you are in your local cs202 directory). This may take 5-10 minutes:
$ cd docker
$ ./cs202-build-docker
If your host is Windows, make sure to rebuild in a bash shell (either by typing bash in cmd or starting bash.exe from C:\Program Files\Git\bin).
Check to make sure FUSE is installed. Before continuing, make sure that you rebuilt the Docker image appropriately:

$ cd ~/cs202            # on your host machine
$ ./cs202-run-docker
cs202-user@172b6e333e91:~/cs202-labs$ dpkg -s fuse

You should see output like this:

Package: fuse
Status: install ok installed
...

If you see output like this:

dpkg-query: package 'fuse' is not installed and no information is available
Then make sure you have run the ./cs202-build-docker command against the latest Dockerfile (from outside the container: $ grep fuse docker/Dockerfile* should produce output lines). If you have the latest Dockerfile and you have rebuilt the image, but you are not seeing fuse installed from within the Docker image, then please ask the course staff for help.
The rest of these instructions presume that you are in the Docker environment. We omit the cs202-user@172b6e333e91:~/cs202-labs part of the prompt.
FUSE
The file system that we will build is implemented as a user-level process. This file system's storage will be a file (in the example given below, we call it testfs.img) that lives in the normal file system of your Docker container. Much of your code will treat this file as if it were a disk.
This entire arrangement (file system implemented in user space with arbitrary choice of storage) is due to software called FUSE (Filesystem in Userspace). In order to really understand what FUSE is doing, we need to take a brief detour to describe VFS. Linux (like Unix) has a layer of kernel software called VFS; conceptually, every file system sits below this layer, and exports a uniform interface to VFS. (You can think of any potential file system as being a "driver" for VFS; VFS asks the software below it to do things like "read", "write", etc.; that software fulfills these requests by interacting with a disk driver, and interpreting the contents of disk blocks.) The purpose of this architecture is to make it relatively easy to plug a new file system into the kernel: the file system writer simply implements the interface that VFS is expecting from it, and the rest of the OS uses the interface that VFS exports to it. In this way, we obtain the usual benefits of pluggability and modularity.
FUSE is just another "VFS driver", but it comes with a twist. Instead of FUSE implementing a disk-based file system (the usual picture), it responds to VFS's requests by asking a user-level process (which is called a "FUSE driver") to respond to it. So the FUSE kernel module is an adapter that speaks "fuse" to a user-level process (and you will be writing your code in this user-level process) and "VFS" to the rest of the kernel.
Meanwhile, a FUSE driver can use whatever implementation it wants. It could store its data in memory, across the network, on Jupiter, whatever. In the setup in this lab, the FUSE driver will interact with a traditional Linux file (as noted above), and pretend that this file is a sector-addressable disk.
The FUSE driver registers a set of callbacks with the FUSE system (via libfuse and ultimately the FUSE kernel module); these callbacks are things like read, write, etc. A FUSE driver is associated with a particular directory, or mount point. The concept of mounting was explained in OSTEP 39 (see 39.17). Any I/O operations requested on files and directories under this mount point are dispatched by the kernel (via VFS, the FUSE kernel module, and libfuse) to the callbacks registered by the FUSE driver.
To recap all of the above: the file system user interacts with the file system roughly in this fashion:
1. When the file system user, Process A, makes a request to the system, such as listing all files in a directory via ls, the ls process issues one or more system calls (stat(), read(), etc.).
2. The kernel hands the system call to VFS.
3. VFS finds that the system call is referencing a file or directory that is managed by FUSE.
4. VFS then dispatches the request to FUSE, which dispatches it to the corresponding FUSE driver (which is where you will write your code).
5. The FUSE driver handles the request by interacting with the "disk", which is implemented as an ordinary file. The FUSE driver then responds, and the responses go back through the chain.
Here's an example from the staff solution to show what this looks like, where testfs.img is a disk image with only the root directory and the file hello on its file system:
# Create a directory to serve as a mount point.
# Note: the / is important, because the directory
# should live only in docker's filesystem
$ mkdir /lab5mnt

# create symlink to local directory mnt
$ ln -s /lab5mnt mnt

# see what file system mnt is associated with
$ df mnt
Filesystem 1K-blocks    Used Available Use% Mounted on
overlay     61202244 8831452  49229468  16% /

# notice, 'mnt' is empty
$ ls mnt

# mount testfs.img at mnt:
$ build/fsdriver testfs.img mnt

# below, note that mnt's file system is now different
$ df mnt
Filesystem         1K-blocks Used Available Use% Mounted on
CS202fs#testfs.img      8192   24      8168   1% /lab5mnt

# and there's the hello file...
$ ls mnt
hello

# ...which we can read with any program
$ cat mnt/hello
Hello, world!

# now unmount mnt
$ fusermount -u mnt

# and its associated file system is back to normal
$ df mnt
Filesystem 1K-blocks    Used Available Use% Mounted on
/dev/sda1    7092728 4536616   2172780  68% /

# and hello is gone, but still lives in testfs.img
$ ls mnt
Note that in the above example, after we run fsdriver, the kernel is actually dispatching all the open(), read(), readdir(), etc. calls that ls and cat make to our FUSE driver. The FUSE driver takes care of searching for a file when open() is called, reading file data when read() is called, and so on. When fusermount is run, our file system is unmounted from mnt, and then all I/O operations under mnt return to being serviced normally by the kernel.
Our File System
Below, we give an overview of the features that our file system will support; along the way, we review some of the file system concepts that we have studied in class and the reading.
On-Disk File System Structure
Most UNIX file systems divide available disk space into two main types of regions: inode regions and data regions. UNIX file systems assign one inode to each file in the file system; a file's inode holds a file's metadata (pointers to data blocks, etc.). The data regions are divided into much larger (ty
2023-04-13