CS110: Principles of Computer Systems
Winter 2021-2022
Stanford University
Instructors: Nick Troccoli and Jerry Cain
Unix v6 Filesystem design, part 1 (files)
Unix v6 Filesystem design, part 2 (large files + directories)
Interacting with the filesystem from our programs
assign2: implement portions of a filesystem!
void readSector(size_t sectorNumber, void *data);
void writeSector(size_t sectorNumber, const void *data);
The inode table at the start of the disk stores one inode per file.
An inode is a structure containing information about a file such as its size and which blocks elsewhere on disk store its contents.
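To make the inode table concrete, here is a minimal sketch of how an inumber could be mapped to a sector and a position within that sector. The layout constants are assumptions not stated above: 32-byte inodes (so 16 fit in a 512-byte sector), an inode table that begins at sector 2, and inumbers that start at 1. The function names inodeSector and inodeSlot are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout constants (not given in the text above). */
enum {
    SECTOR_SIZE = 512,
    INODE_SIZE = 32,                               /* assumed bytes per inode */
    INODES_PER_SECTOR = SECTOR_SIZE / INODE_SIZE,  /* 16 */
    INODE_START_SECTOR = 2                         /* assumed first sector of the inode table */
};

/* Sector that holds the inode with the given inumber (inumbers start at 1). */
size_t inodeSector(size_t inumber) {
    assert(inumber >= 1);
    return INODE_START_SECTOR + (inumber - 1) / INODES_PER_SECTOR;
}

/* Position of that inode within its sector, counted in inodes. */
size_t inodeSlot(size_t inumber) {
    assert(inumber >= 1);
    return (inumber - 1) % INODES_PER_SECTOR;
}
```

Under these assumptions, inumbers 1 through 16 all live in sector 2, and inumber 17 is the first inode in sector 3.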
Problem: an inode stores only 8 block numbers. Therefore, the largest a file can be is 512 * 8 = 4096 bytes (~4KB). That definitely isn't realistic!
Solution: let's store the block numbers in a block of their own, and then store that block's number in the inode! This approach is called indirect addressing.
inode
filesize: 5000
blocknums: 450, ...
block 450 (singly-indirect block)
451, 42, 15, 67, 125, 665, 467, 231, 162, 136
block 451 (payload block)
The quick brown fox jumped over the...
The Unix V6 filesystem uses singly-indirect addressing (blocks that store payload block numbers) just for large files.
Let's assume for now that an inode for a large file uses all 8 block numbers for singly-indirect addressing. What is the largest file size this supports? Each block number is 2 bytes big.
8 block numbers in an inode x
256 block numbers per singly-indirect block x
512 bytes per block
= ~1MB
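The same arithmetic tells us where a given byte of the file lives. The sketch below assumes, as in the question above, that all 8 block numbers in the inode refer to singly-indirect blocks. The type IndirectLocation and the function locateByte are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum {
    BLOCK_SIZE = 512,
    NUMS_PER_BLOCK = BLOCK_SIZE / 2,  /* 256 two-byte block numbers per block */
    INODE_BLOCKNUMS = 8
};

/* Describes how to reach one byte of a file through singly-indirect blocks. */
typedef struct {
    size_t inodeEntry;   /* which of the inode's 8 block numbers to follow */
    size_t slot;         /* which entry inside that singly-indirect block */
    size_t byteInBlock;  /* offset of the byte within the payload block */
} IndirectLocation;

IndirectLocation locateByte(size_t fileOffset) {
    size_t fileBlockIndex = fileOffset / BLOCK_SIZE;
    /* 8 * 256 payload blocks is the most this scheme can address. */
    assert(fileBlockIndex < (size_t)INODE_BLOCKNUMS * NUMS_PER_BLOCK);

    IndirectLocation loc;
    loc.inodeEntry = fileBlockIndex / NUMS_PER_BLOCK;
    loc.slot = fileBlockIndex % NUMS_PER_BLOCK;
    loc.byteInBlock = fileOffset % BLOCK_SIZE;
    return loc;
}
```

For the 5000-byte file in the diagram, the last byte (offset 4999) falls in file block 9, so we follow the inode's first block number (450) and read slot 9 of that block, which holds 136.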
After our discussion so far, you may still have key questions like:
Problem: even with singly-indirect addressing, the largest a file can be is 8 * 256 * 512 = 1,048,576 bytes (~1MB). That still isn't realistic!
Solution: let's use doubly-indirect addressing; store a block number for a block that contains singly-indirect block numbers.
inode
filesize: 5000
blocknums: 450, ...
block 450 (doubly-indirect block)
451, 42, 15, 67, 125, 665, 467, 231, 162, 136
block 451 (singly-indirect block)
55, 34, 12, 44, ...
block 55 (payload block)
The quick brown fox jumped over the...
Allows even larger files, but data takes even more steps to access. How do we employ this idea?
The Unix V6 filesystem uses indirect addressing (blocks that store payload block numbers) just for large files.
In other words, a large file can be represented using at most 7 + 256 = 263 singly-indirect blocks. The block numbers of the first seven are stored directly in the inode. The block numbers of the remaining 256 are stored in a doubly-indirect block whose block number is stored in the inode.
An inode for a large file stores 7 singly-indirect block numbers and 1 doubly-indirect block number. What is the largest file size this supports? Each block number is 2 bytes big.
263 singly-indirect block numbers total x
256 block numbers per singly-indirect block x
512 bytes per block
= ~34MB
OR:
(7 * 256 * 512) [singly indirect] + (256 * 256 * 512) [doubly indirect] = 34,471,936 bytes (~34MB)
Better! Still not sufficient by today's standards, but perhaps it was in 1975. Moreover, since block numbers are 2 bytes, we can number at most 2^16 - 1 = 65,535 blocks, meaning the entire filesystem can be at most 65,535 * 512 bytes ~ 32MB.
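The size limits discussed so far can be checked with a few lines of arithmetic. This is only a sketch that restates the numbers from the text (512-byte blocks, 2-byte block numbers, 8 block numbers per inode); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { BLOCK_SIZE = 512, BLOCKNUM_SIZE = 2, INODE_BLOCKNUMS = 8 };

/* Block numbers that fit in one block: 512 / 2 = 256. */
uint32_t numsPerBlock(void) {
    return BLOCK_SIZE / BLOCKNUM_SIZE;
}

/* Small file: the inode's 8 block numbers point straight at payload blocks. */
uint32_t maxSmallFile(void) {
    return INODE_BLOCKNUMS * BLOCK_SIZE;
}

/* All 8 block numbers used for singly-indirect blocks. */
uint32_t maxSinglyIndirect(void) {
    return INODE_BLOCKNUMS * numsPerBlock() * BLOCK_SIZE;
}

/* 7 singly-indirect block numbers plus 1 doubly-indirect block number. */
uint32_t maxLargeFile(void) {
    uint32_t viaSingly = 7 * numsPerBlock() * BLOCK_SIZE;
    uint32_t viaDoubly = numsPerBlock() * numsPerBlock() * BLOCK_SIZE;
    return viaSingly + viaDoubly;
}

/* Whole filesystem: at most 2^16 - 1 numbered blocks. */
uint32_t maxFilesystem(void) {
    return 65535u * BLOCK_SIZE;
}
```

Note that the largest file the inode can describe (34,471,936 bytes) is slightly larger than the largest filesystem the 2-byte block numbers can address (33,553,920 bytes).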
Assume we have a large file with inumber 16. How do we find the block containing the start of its payload data? How about the remainder of its payload data?
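One way to answer this question in code is sketched below. It assumes the inode has already been read, so we start from its 8 block numbers. The in-memory disk array, the bodies of readSector and writeSector, and the function largeFileBlock are all stand-ins invented for this example; the toy disk also stores block numbers in the host's byte order, which may differ from a real disk image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SECTOR_SIZE = 512, NUMS_PER_BLOCK = 256, NUM_SECTORS = 1024, SINGLY_ENTRIES = 7 };

/* Toy in-memory "disk" standing in for real hardware (an assumption). */
static uint8_t disk[NUM_SECTORS][SECTOR_SIZE];

void readSector(size_t sectorNumber, void *data) {
    assert(sectorNumber < NUM_SECTORS);
    memcpy(data, disk[sectorNumber], SECTOR_SIZE);
}

void writeSector(size_t sectorNumber, const void *data) {
    assert(sectorNumber < NUM_SECTORS);
    memcpy(disk[sectorNumber], data, SECTOR_SIZE);
}

/* Returns the payload block number that stores file block `fileBlockIndex`
 * of a large file, given the 8 block numbers from its inode. */
uint16_t largeFileBlock(const uint16_t blocknums[8], size_t fileBlockIndex) {
    uint16_t nums[NUMS_PER_BLOCK];
    size_t singlyLimit = (size_t)SINGLY_ENTRIES * NUMS_PER_BLOCK;  /* 7 * 256 */

    if (fileBlockIndex < singlyLimit) {
        /* Covered by one of the 7 singly-indirect blocks: one extra read. */
        readSector(blocknums[fileBlockIndex / NUMS_PER_BLOCK], nums);
        return nums[fileBlockIndex % NUMS_PER_BLOCK];
    }

    /* Otherwise go through the doubly-indirect block: two extra reads. */
    size_t rest = fileBlockIndex - singlyLimit;
    assert(rest < (size_t)NUMS_PER_BLOCK * NUMS_PER_BLOCK);
    readSector(blocknums[7], nums);                 /* doubly-indirect block */
    readSector(nums[rest / NUMS_PER_BLOCK], nums);  /* singly-indirect block */
    return nums[rest % NUMS_PER_BLOCK];
}
```

The start of the payload is file block 0, reached through the first singly-indirect block. Anything past file block 7 * 256 - 1 = 1791 goes through the doubly-indirect block instead.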
/classes/cs110/index.html
A directory is a file container. It needs to store what files/folders are contained within it. It also has associated metadata.
We can layer support for directories right on top of our implementation for files!
Design decision: the Unix V6 filesystem makes directory payloads contain a 16-byte entry for each file/folder that is in that directory.
Given the inode for a directory, how could we find the inumber for a file it contains called "b.txt"?
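A minimal sketch of that search follows, assuming the directory's payload has already been read into an array of entries. The split of each 16-byte entry into a 2-byte inumber followed by a 14-byte name is an assumption here, as are the names DirEntry and findInDirectory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { NAME_MAX_LEN = 14 };

/* One 16-byte directory entry. The 2 + 14 split and field order are assumed. */
typedef struct {
    uint16_t inumber;
    char name[NAME_MAX_LEN];  /* not null-terminated when exactly 14 chars long */
} DirEntry;

/* Scans a directory's entries for `name`.
 * Returns the matching inumber, or 0 ("no inode") if the name is absent. */
uint16_t findInDirectory(const DirEntry *entries, size_t numEntries, const char *name) {
    assert(strlen(name) <= NAME_MAX_LEN);
    for (size_t i = 0; i < numEntries; i++) {
        /* strncmp stops at 14 bytes, so unterminated names are handled. */
        if (strncmp(entries[i].name, name, NAME_MAX_LEN) == 0) {
            return entries[i].inumber;
        }
    }
    return 0;
}
```

The search is a simple linear scan: a directory with n entries takes up to n comparisons, and reading the entries in the first place relies on the same file layer used for ordinary files.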
/classes/cs110/index.html
Start at the root directory.
In the root directory, find the entry named "classes".
In the "classes" directory, find the entry named "cs110".
In the "cs110" directory, find the entry named "index.html". Then read its contents.
The root directory ("/") is set to have inumber 1. That way we always know where to go to start traversing. (0 is reserved to mean "NULL" or "no inode").
/classes/cs110/index.html
Go to inode with inumber 1 (root directory).
In its payload data, look for the entry "classes" and get its inumber. Go to that inode.
In its payload data, look for the entry "cs110" and get its inumber. Go to that inode.
In its payload data, look for the entry "index.html" and get its inumber. Go to that inode and read in its payload data.
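The walk above can be sketched as a loop over path components. To keep the example short, directory payloads are replaced by a small hard-coded table; the inumbers 5, 9, and 12 are made up, and ToyEntry, lookupName, and resolvePath are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { ROOT_INUMBER = 1 };

/* Toy stand-in for directory payloads: each row says that directory `dir`
 * contains an entry `name` whose inumber is `inumber` (values are made up). */
typedef struct {
    uint16_t dir;
    const char *name;
    uint16_t inumber;
} ToyEntry;

static const ToyEntry toyEntries[] = {
    {1, "classes", 5},
    {5, "cs110", 9},
    {9, "index.html", 12},
};

/* Looks up one name (given by pointer and length) inside one directory.
 * Returns 0 if the name is not there. */
static uint16_t lookupName(uint16_t dir, const char *name, size_t len) {
    for (size_t i = 0; i < sizeof(toyEntries) / sizeof(toyEntries[0]); i++) {
        const ToyEntry *e = &toyEntries[i];
        if (e->dir == dir && strlen(e->name) == len && strncmp(e->name, name, len) == 0) {
            return e->inumber;
        }
    }
    return 0;
}

/* Resolves an absolute path to an inumber, or 0 if a component is missing. */
uint16_t resolvePath(const char *path) {
    assert(path[0] == '/');
    uint16_t current = ROOT_INUMBER;
    const char *p = path;
    while (*p != '\0') {
        while (*p == '/') p++;         /* skip separators */
        size_t len = strcspn(p, "/");  /* length of the next component */
        if (len == 0) break;           /* nothing left after a trailing slash */
        current = lookupName(current, p, len);
        if (current == 0) return 0;    /* component not found */
        p += len;
    }
    return current;
}
```

With this table, resolving "/classes/cs110/index.html" visits inumbers 1, 5, 9, and finally 12, one directory lookup per path component.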
We want to find a file called "/local/files/fairytale.txt", which is a small file.
We want to find a file called "/medfile", which is a large file.
We want to find a file called "/largefile", which is a very large file.
Use the -s flag with the ln command to create a symbolic link:
cgregg@myth66:/tmp$ echo "This is some text in a file" > file1
cgregg@myth66:/tmp$ ls -l file*
-rw------- 1 cgregg operator 28 Sep 27 09:57 file1
cgregg@myth66:/tmp$ ln -s file1 file2
cgregg@myth66:/tmp$ ls -l file*
-rw------- 1 cgregg operator 28 Sep 27 09:57 file1
lrwxrwxrwx 1 cgregg operator 5 Sep 27 09:58 file2 -> file1
cgregg@myth66:/tmp$ echo "Here is some more text." >> file2
cgregg@myth66:/tmp$ cat file1
This is some text in a file
Here is some more text.
cgregg@myth66:/tmp$ rm file1
rm: remove regular file 'file1'? y
cgregg@myth66:/tmp$ ls -l file*
lrwxrwxrwx 1 cgregg operator 5 Sep 27 09:58 file2 -> file1
cgregg@myth66:/tmp$ cat file2
cat: file2: No such file or directory
cgregg@myth66:/tmp$
ls gives us the path to the original file.
We built layers on top of the low-level readSector and writeSector to implement a higher-level filesystem.
Modularity: subdivision of a larger system into a collection of smaller subsystems, which themselves may be further subdivided into even smaller sub-subsystems.
Layering: the organization of several modules that interact in some hierarchical manner, where each layer typically only opens its interface to the module above it.
These ideas aren't specific to filesystems! E.g., networking systems also rely on layering.
Our filesystem resolves human-friendly names (like "/usr/bin/program") to machine-friendly names (inumbers). This is called name resolution. Names let us refer to system resources.
This idea isn't specific to filesystems! E.g., when we visit a website URL like google.com, DNS (the "Domain Name System") converts google.com (human-friendly) to an IP address like 74.125.239.51 (machine-friendly).
Next time: how do we interface with the filesystem in our programs?