Filesystem Fundamentals

Live presentation link: https://slides.com/chipbuster/deck-e91d0b/live

Online Lecture Logistics

Instapoll

If you intend to claim synchronous lecture credit, please have Instapoll open in a browser tab so that we don't have to wait for the service to start up.

These slides are written using the reveal.js editor at slid.es. You can follow along with a live presentation at https://slides.com/chipbuster/deck-e91d0b/live

(At this point, Kevin should make sure to post the link in the Zoom chat).

This link may be easier to use/more fluid if you have a low-bandwidth connection, since it needs to transmit less data. The downside is that it might be very slightly out of sync with the audio.

You will be able to follow along with my mouse movements at this link.

I will also be sharing a standard video feed in Zoom.

Online Lecture Logistics

I will be attempting to monitor chat while giving this lecture (I have no idea how successfully). If you have a question, you can ask it there and I'll try to respond.

You can also use the raise-hand and other reactions to communicate with me, or just interrupt me if you can't use those.

If you raise your hand and I don't respond within 15 seconds, I'm probably forgetting to check my other windows. Please unmute yourself and interrupt me at that point.

The Story So Far

(Almost) everything we've discussed so far occurs in main memory (RAM):

The PCB and TCB are stored in main memory.
Virtual address spaces are mapped onto physical address spaces.
The heap/stack/SDS are part of the process's virtual address space

RAM is nice! It's relatively speedy, and you can store a lot of stuff in there.

....but it's got a major drawback.

RAM is not persistent! If the power gets cut off, all data in main memory is lost.

The Story So Far

To solve this problem, we introduce stable storage (disks). These devices retain data even after the power to them has been shut off (i.e. they are persistent).

Now we can turn off the computer without losing important data!

Yay!

But now we have to deal with communication between the CPU and disk, which is very different from communication with main memory!

Disk versus Memory

	Memory	Disk
How to Access	LOAD/STORE instructions	Special I/O access instructions
Time for Request	A long time	A VERY LONG TIME
While Waiting	Program stalls (still on CPU)	Program blocks (can be scheduled)

So where are we now?

We need to come up with some scheme to organize and interact with this storage.

(Almost) everything we've discussed so far occurs in main memory (RAM):

The PCB and TCB are stored in main memory.
Virtual Address spaces are mapped onto physical address spaces.
The heap is part of the process's virtual address space

Now we've added persistent storage which is large, very slow, and order-dependent.

This is the filesystem.

Today's Questions

What makes a good filesystem?
How should applications interact with the filesystem?
How do we organize a file on-disk?
How can we present files in a way that makes sense to users?

Goals in Filesystem Design

What makes a filesystem good?

Speed

For a relatively fast hard drive, the average data access has a latency (request made to data available) of 12.0 ms.

Action	Latency	1ns = 1s	Conversation Action
L1 Cache Hit	1ns	1s	Talking Normally
L2 Cache Hit	4ns	4s	Long Pause
Cache Miss*	100ns	1m 40s	Go upstairs to ask Dr. Norman
Disk Seek	12ms	138d 21hr+	Walk to Washington D.C. and back

* a.k.a. Main Memory Reference

Values taken from "Numbers Every Programmer Should Know", 2020 edition

DISK ACCESS

Other Destinations:

Canada
Guatemala
San Francisco

Speed

Unfortunately, we cannot always avoid paying the disk access cost (if we could, we wouldn't need the disk!)

DISK ACCESS

But we can use lots of common systems design tricks to make sure this doesn't affect us too badly!

Avoid unneeded access
Caches
Interleave I/O with computation

Resilience

Read account A from disk
Read account B from disk

Add $100 to account A
Subtract $100 from account B

Write new account B to disk
Write new account A to disk

Stable storage: we can be interrupted at any point in time, and we have to be able to recover afterwards!

Result: We just destroyed $100!
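The lost $100 can be reproduced with a small simulation. This is a toy sketch, not real disk code: the two-element array is a hypothetical stand-in for stable storage, and crash_after_writes is a made-up knob that cuts the power after that many writes have reached the disk.

```c
#include <assert.h>

/* Toy "stable storage": slot 0 is account A, slot 1 is account B.
   (Hypothetical stand-in for disk blocks.) */
enum { ACCOUNT_A = 0, ACCOUNT_B = 1 };

/* Moves `amount` from B to A using the write order from the slide
   (B first, then A). `crash_after_writes` simulates power loss: only
   that many writes reach the disk. Returns the number of completed writes. */
int transfer_b_to_a(long disk[2], long amount, int crash_after_writes)
{
    assert(amount >= 0 && crash_after_writes >= 0);

    long a = disk[ACCOUNT_A];        /* read account A from disk */
    long b = disk[ACCOUNT_B];        /* read account B from disk */
    a += amount;                     /* these updates live only in memory */
    b -= amount;

    int writes = 0;
    if (writes == crash_after_writes) return writes;
    disk[ACCOUNT_B] = b;             /* write new account B to disk */
    writes++;
    if (writes == crash_after_writes) return writes;
    disk[ACCOUNT_A] = a;             /* write new account A to disk */
    writes++;
    return writes;
}

/* Total money that survives on disk. */
long total_money(const long disk[2])
{
    return disk[ACCOUNT_A] + disk[ACCOUNT_B];
}
```

With both accounts starting at $500, a crash after one write leaves A at $500 and B at $400 on disk, so the total drops from $1000 to $900.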

Resilience

Stable storage: we can be interrupted at any point in time, and we have to be able to recover afterwards!

Our filesystem must provide tools to recover from a crash!

Otherwise we could be irrecoverably stuck in an invalid state.

Read account A from disk
Read account B from disk

Add $100 to account A
Subtract $100 from account B

Write new account B to disk
Write new account A to disk

Usability

If the filesystem isn't easy to use, users will try to implement their own filesystem (usually poorly).

For example, the following filesystem is easy to implement, and is as fast and as consistent as the user chooses to make it:

The filesystem consists of raw block numbers. User programs are responsible for keeping track of which blocks they use, and for making sure they don't overwrite other programs' blocks.

Is this fast? Yes.

Is this resilient? Yes.

But who would use it?

Usability

A file is named by a hash of its contents. All files are in the root directory. Filenames cannot be changed.

Who in their right mind is going to use this??

Filesystem: Is it good?

When analyzing a potential filesystem design, we'll use these three criteria to determine how much we like it:

Speed: How fast is this design? How many disk accesses do we need in order to perform common operations?

Resilience: Can this design become corrupted if the computer fails (e.g. through sudden power loss), or if small pieces of data are damaged? Can it be recovered? How fast is the recovery procedure?

Usability: Will this design cause application programmers to tear their hair out? Does it require some deep knowledge of the system or is that abstracted away?

What's the Big Picture of the filesystem?

Big Picture

Data is stored on the disk as a bunch of blocks. A block is the smallest unit the filesystem can read/write. Blocks are identified by their order on the disk (e.g. #3124 is the block after #3123)

Big Picture

We need some way to describe how the data is organized. These are the metadata blocks.

Once you have the file metadata, you know everything you need to access the file.

Big Picture

There are some in-memory structures that the OS uses to track what's happening with the filesystem (e.g. which files are open, synchronization tools).

Big Picture

There are also some per-process pieces of information that need to be tracked in-memory.

Where does this information live? What's an example that you've worked with before?

Big Picture

Finally, we need to worry about how a user program accesses all of this!

Big Picture

So...how does a user program access system services?

Big Picture

Syscalls! create(), read(), write(), etc.

Since the application thinks in terms of filenames and open, the filesystem is responsible for translating user requests into something the lower levels can use, e.g. "the 700th byte of ~/.bashrc" might translate to "block #52730"

Common Filesystem Operations

YOU ARE HERE

create()

Creates a new file with some metadata and a name.

On create(), the OS will:

Allocate disk space (check quotas, permissions, etc).
Create metadata for the file in the file header, such as name, location, and file attributes
Add an entry to the directory containing the file

create(const char* filename);

link()

Creates a hard link--a user-friendly name for some underlying file.

On link(), the OS will:

Add the entry to the directory with a new name
Increment counter in file metadata that tracks how many links the file has

link(const char* name, struct inode* inode);

This new name points to the same underlying file!

unlink()

Removes an existing hard link.

To delete a file (once its last link is removed), the OS needs to:

Find directory containing the file
Remove the file entry from the directory
Clear file header
Free disk blocks used by file and file headers

unlink(const char* name);

The OS decrements the number of links in the file metadata. If the link count is zero after unlink, the OS can delete the file and all its resources.

So far, all the system calls we've seen only edit system-wide data (files, names, etc.)

The system calls we'll see next also need to manage some per-process data.

open()

Creates in-memory data structures used to manage open files. Returns integer to the caller.

open(const char* name, enum mode);

On open(), the OS needs to:

Check if the file is already opened by another process. If it is not:
- Find the file
- Copy information into the system-wide open file table
Check protection of file against requested mode. If not allowed, abort syscall.
Increment the open count (number of processes that have this file open).
Create an entry in the process's file table pointing to the entry in the system-wide file table.
Initialize the current file pointer to the start of the file.
Return the index into the process's file table.
- Index used for read, write, seek, close operations. You know this as a _______?

struct open_file {
   struct file_header* metadata;
   file_offset pos;
   int file_mode; //e.g. "r" or "rw"
};

close()

Close the file.

close(file_id);

On close(), the OS needs to:

Remove the entry for the file in the process's file table
Decrement the open count in the system-wide file table
If the open count is zero, remove the entry from the system-wide file table
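The interplay between the system-wide table and the per-process table can be sketched as follows. This is a toy model under stated assumptions: the table sizes, struct names, and toy_open/toy_close are hypothetical, the file name stands in for the cached file header, and the protection check is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYS_OPEN  8   /* hypothetical table sizes */
#define MAX_PROC_OPEN 4

/* One entry in the system-wide open file table (simplified). */
struct sys_entry {
    char name[32];    /* stands in for the cached file header */
    int  open_count;  /* how many process-table entries point here; 0 = free */
};

/* One entry in a process's file table. */
struct proc_entry {
    struct sys_entry *sys;  /* NULL means this slot is free */
    long pos;               /* current file pointer */
};

struct sys_entry sys_table[MAX_SYS_OPEN];

/* Returns the index into the process's table, or -1 on failure. */
int toy_open(struct proc_entry proc_table[MAX_PROC_OPEN], const char *name)
{
    assert(name != NULL && strlen(name) < sizeof sys_table[0].name);

    /* Is the file already open somewhere? Also remember a free slot. */
    struct sys_entry *entry = NULL, *free_slot = NULL;
    for (int i = 0; i < MAX_SYS_OPEN; i++) {
        if (sys_table[i].open_count > 0 && strcmp(sys_table[i].name, name) == 0)
            entry = &sys_table[i];
        else if (sys_table[i].open_count == 0 && free_slot == NULL)
            free_slot = &sys_table[i];
    }

    /* Find a free slot in the process's own table. */
    int fd = -1;
    for (int i = 0; i < MAX_PROC_OPEN; i++) {
        if (proc_table[i].sys == NULL) { fd = i; break; }
    }
    if (fd < 0) return -1;

    if (entry == NULL) {                /* not open anywhere yet */
        if (free_slot == NULL) return -1;
        strcpy(free_slot->name, name);  /* "find the file, copy its info" */
        entry = free_slot;
    }
    entry->open_count++;
    proc_table[fd].sys = entry;
    proc_table[fd].pos = 0;             /* file pointer starts at the beginning */
    return fd;
}

/* Returns 0 on success, -1 if fd does not name an open file. */
int toy_close(struct proc_entry proc_table[MAX_PROC_OPEN], int fd)
{
    if (fd < 0 || fd >= MAX_PROC_OPEN || proc_table[fd].sys == NULL) return -1;
    proc_table[fd].sys->open_count--;   /* reaching 0 frees the system-wide slot */
    proc_table[fd].sys = NULL;
    return 0;
}
```

Two processes that open the same file each get their own index and file pointer, but both point at one shared system-wide entry whose open count is 2.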

read()

Read designated bytes from the file

read(file_id, file_pos, num_bytes, bufAddress)

On read(), the OS needs to:

Determine which blocks correspond to the requested reads (starting at file_pos and ending at file_pos + num_bytes)
Dispatch disk reads to the appropriate sectors
Place the read results into the buffer pointed at by bufAddress

Filesystem can also provide a read call that uses the file_pos in the open file structure.
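The first step of read(), working out which blocks a request touches, is plain integer arithmetic. A minimal sketch, assuming 4KB blocks and file-relative block numbers starting at 0 (block_range and blocks_for_read are hypothetical names):

```c
#include <assert.h>

#define BLOCK_SIZE 4096L  /* assumed block size */

/* File-relative block numbers (0 = first block of the file). */
struct block_range {
    long first;
    long last;
};

/* Which blocks does a read of num_bytes starting at file_pos touch? */
struct block_range blocks_for_read(long file_pos, long num_bytes)
{
    assert(file_pos >= 0 && num_bytes > 0);
    struct block_range r;
    r.first = file_pos / BLOCK_SIZE;
    r.last  = (file_pos + num_bytes - 1) / BLOCK_SIZE;  /* block of the last byte read */
    return r;
}
```

For example, reading 200 bytes starting at byte 4000 spans bytes 4000 through 4199, so it touches blocks 0 and 1 and needs two block reads.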

Other Common Syscalls

write() is like read(), but copies the buffer to the appropriate disk sectors
seek() updates the current file pointer
fsync() does not return (blocks) until all data is written to persistent storage. This will be important for consistency.

Does write() require us to access the disk?

How about seek()?

A. Yes, Yes

B. Yes, No

C. No, Yes

D. No, No

Common Filesystem Operations

YOU ARE HERE

File Design and Layout

YOU ARE HERE

File Design And Layout

Files Are Stored as Data and Metadata

Metadata: the file header contains information that the operating system cares about: where the file is on the disk, and attributes of the file.

Examples: file owner, file size, file permissions, creation time, last modified time, location of data blocks.

Metadata for all files is stored at a fixed location (something known by the OS) so that they can be accessed easily.

Data is the stuff the user actually cares about. It consists of sectors of data placed on disk.

Files Are Stored as Data and Metadata

For now, we'll focus on how the file data is laid out on disk.

Assume we already know where the metadata is.

Evaluating File Layouts

Speed and Usability

Most files on a computer are small!

Figures to the right are from my desktop
25% of files are smaller than 800 bytes
50% of files are smaller than 2.5 kilobytes

So we should have good support for lots of small files!

The user probably cares about accessing large files (they might be saved videos, or databases), so large file access shouldn't be too slow!

Most disk space is used by large files.

Evaluating File Layouts

Speed and Usability

Fast access to small files
Reasonably efficient access to large files
Limit fragmentation (wasted space!)
- We care about both internal and external fragmentation
Allow files to grow past their initial size
Allow random and sequential access (at decent speeds!)

We have to allocate some of these data blocks to hold a file.

How do we choose?

Contiguous Allocation

Files are allocated as a contiguous (adjacent) set of blocks.
The only location information needed in the file header is the first block and the size.
How do we keep track of free blocks and allocate them to files?

Contiguous Allocation: Access

How many disk reads do we need to access a particular block?

CPU

We start knowing the block # of the appropriate file header

We have enough space in memory to store two blocks worth of data

Everything else has to be requested from disk.

The request must be in the form of a block#. E.g. we can request "read block 27", but we cannot request "read next block" or "read next file"

Contiguous Allocation: Access

How many disk reads do we need to access a particular block?

CPU

H

How many disk reads to access the first data block?

(We always start with the block# of the file header)

Read header block
Find address of first data block
Read first data block

Contiguous Allocation: Access

How many disk reads do we need to access a particular block?

Read header block
Find address of first data block
Add two to this address to get the address of the third data block
Read third data block

What if I want to do random access?

Let's say I want to read only the third block.
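Random access under contiguous allocation is a single addition. A minimal sketch, assuming a hypothetical header that holds only the first block number and the size in blocks, and counting the first data block as n = 1:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical file header for contiguous allocation. */
struct contig_header {
    long first_block;  /* disk block # of the file's first data block */
    long num_blocks;   /* size of the file in blocks */
};

/* Disk block number of the file's n-th data block (n = 1 is the first).
   Returns -1 if the file has no such block. */
long contig_block(const struct contig_header *h, long n)
{
    assert(h != NULL);
    if (n < 1 || n > h->num_blocks) return -1;
    return h->first_block + (n - 1);  /* no disk access needed for this step */
}
```

With a file starting at block 27, the third block is block 29. Reaching it costs two disk reads (header, then data) no matter which block is requested.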

Contiguous Allocation: Assessment

This method is very simple (this is good!)

How fast is sequential access?

How fast is random access?

What if we want to grow a file?

How bad is fragmentation?

How can we get around these problems?

Linked Allocation

The file is stored as a linked list of blocks.
In the file header, keep a pointer to the first and last block allocated.
In each block, keep a pointer to the next block.

Linked Allocation: Access

CPU

How many disk reads to access the first block?

(We always start with the block# of the file header)

Read header block
Find address of first block
Read first block

H

1

What if I want to read the second block from here? (sequential access)

Linked Allocation: Access

CPU

How many disk reads to access the first block?

(We always start with the block# of the file header)

Read header block
Find address of first block
Read first block
Find address of second block using data in first block
Read second block

2

1

Linked Allocation: Access

CPU

How many disk reads to access the third block?

(We always start with the block# of the file header)

Random Access

Read header block
Find address of first block (oh no)
Read first block
Find address of second block in first block (oh no oh no)
Read second block
Find address of third block in second block (auuugh)
Read third block

H

3
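The cost of that walk can be counted with a toy model. This is a sketch under stated assumptions: toy_disk is a hypothetical in-memory stand-in for the disk, next_block[b] plays the role of the pointer stored inside block b, and every look inside a block counts as one disk read.

```c
#include <assert.h>
#include <stddef.h>

#define DISK_BLOCKS 16
#define NO_BLOCK    (-1)

/* Toy disk for linked allocation. */
struct toy_disk {
    int next_block[DISK_BLOCKS];  /* pointer stored inside each block */
    int reads;                    /* disk reads performed by the last lookup */
};

void toy_disk_init(struct toy_disk *d)
{
    assert(d != NULL);
    for (int i = 0; i < DISK_BLOCKS; i++) d->next_block[i] = NO_BLOCK;
    d->reads = 0;
}

/* Looking at a block's next pointer means reading that block from disk. */
static int read_next_pointer(struct toy_disk *d, int block)
{
    assert(block >= 0 && block < DISK_BLOCKS);
    d->reads++;
    return d->next_block[block];
}

/* Finds and reads the n-th block of a file (n = 1 is the first), given the
   first block number stored in the header. Returns its block number, or
   NO_BLOCK if the file is shorter than n blocks. */
int linked_find(struct toy_disk *d, int first_block, int n)
{
    assert(d != NULL && n >= 1);
    d->reads = 1;                             /* read the file header */
    int block = first_block;
    for (int i = 1; i < n; i++) {
        block = read_next_pointer(d, block);  /* must read block i to find block i+1 */
        if (block == NO_BLOCK) return NO_BLOCK;
    }
    d->reads++;                               /* read the n-th block itself */
    return block;
}
```

In this model, reaching the n-th block always costs n + 1 reads: the header plus every block up to and including the target. For the third block that is 4 reads.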

Linked Allocation: Assessment

How fast is sequential access? Is it always good?

How bad is fragmentation?

What if we want to grow the file?

How fast is random access?

What happens if a disk block becomes corrupted? (what sort of problem is this?)

Example: FAT File System

File Allocation Table (FAT)

Started with MS-DOS (Microsoft, late 70s)

Descendants include FATX and exFAT

FAT File System

Advantages

Simple!

Disadvantages

Poor random access
- Requires sequential traversal of linked blocks
Limited Access Control
- No file owner or group ID
- Any user can read/write any file
No support for hard links
Volume and file size are limited
- Example: FAT-32 is limited to 32 bits
- Assuming 4KB blocks (fairly typical), filesystem cannot be larger than 2TB
- Individual files cannot be larger than 4GB
No support for transactional updates (more on this next time)

Direct Allocation

File header points to each data block directly (that's it!)

Direct Allocation: Assessment

How fast is sequential access? How about random access?

How bad is fragmentation?

What if we want to grow the file?

Does this support small files? How about large files?

What if other file metadata takes up most of the space in the header?

What do we do about large files?

Indexed Allocation

OS keeps a special block of disk pointers called the index block.
The array is initially empty.
When a block needs to be allocated to the file, the OS allocates that block, then fills in the appropriate parts of the index block.

Indexed Allocation: Access

CPU

How many disk reads to access the first block?

(We always start with the block# of the file header)

Read header
Find address of index block
Read index block
Find address of first block from index block
Read first block

H

1

What if I want to read the second block from here? (sequential access)

Indexed Allocation: Access

CPU

How many disk reads to access the first block?

(We always start with the block# of the file header)

Read header
Find address of index block
Read index block
Find address of first block from index block
Read first block, replacing the header
Find address of second block in index block
Read second block

2

Indexed Allocation: Access

CPU

How many disk reads to access the third block?

(We always start with the block# of the file header)

Random Access

Read header
Find address of index block
Read index block
Find address of third block in IB
Read third block

3
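An index block is essentially an array of block numbers, which is why random access needs no chain walking. A minimal sketch, assuming a deliberately tiny index block of 8 pointers (index_block, indexed_append, and indexed_block are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>

#define PTRS_PER_INDEX_BLOCK 8  /* tiny on purpose; real index blocks hold far more */
#define UNALLOCATED (-1)

/* The index block: an array of disk block numbers, initially empty. */
struct index_block {
    int data_block[PTRS_PER_INDEX_BLOCK];
};

void index_block_init(struct index_block *ib)
{
    assert(ib != NULL);
    for (int i = 0; i < PTRS_PER_INDEX_BLOCK; i++) ib->data_block[i] = UNALLOCATED;
}

/* Growing the file: record a newly allocated disk block in the first empty
   slot. Returns the block's position in the file (1 = first), or -1 if the
   index block is full. */
int indexed_append(struct index_block *ib, int new_block)
{
    assert(ib != NULL && new_block >= 0);
    for (int i = 0; i < PTRS_PER_INDEX_BLOCK; i++) {
        if (ib->data_block[i] == UNALLOCATED) {
            ib->data_block[i] = new_block;
            return i + 1;
        }
    }
    return -1;
}

/* Disk block number of the n-th data block (n = 1 is the first),
   or UNALLOCATED if the file has no such block. */
int indexed_block(const struct index_block *ib, int n)
{
    assert(ib != NULL);
    if (n < 1 || n > PTRS_PER_INDEX_BLOCK) return UNALLOCATED;
    return ib->data_block[n - 1];  /* one array lookup, no chain to follow */
}
```

Any single block costs three disk reads from a cold start (header, index block, data block), independent of n. The fixed array size is also what limits the maximum file size.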

Indexed Allocation: Assessment

How fast is sequential access? How about random access?

How bad is fragmentation?

What if we want to grow the file?

Does this support small files? How about large files?

Indexed Allocation: Extensions

Linked Index Blocks

Multilevel Index Blocks

For very large files, do multilevel index blocks or linked index blocks provide better random access?

Multilevel Indexed Files

Remember: small files are common, large files need to be supported (speed).

Direct allocation works well with small files
Indexed allocation allows for large files
What if we mix & match?
- Header contains space for some direct pointers and pointers to index blocks
- Small files referenced directly, large files use multilevel index blocks

Example: Fast File System

Developed for the Berkeley Software Distribution UNIX (BSD) in the early 80s
Was possible to use with OSX (MacOS) systems until 2012 (!!)
Each inode (file header in UNIX terminology) contains 13 pointers
First 10 pointers point directly to data blocks
11th pointer points to an index block of 1024 pointers (one indirection)
12th pointer points to a block of pointers to index blocks (two indirections)
13th pointer points to a block of pointers to blocks of pointers to index blocks (three indirections)
Advantages: simple to implement, supports incremental file growth, supports small files
Disadvantages: random access to large files is inefficient, many seeks.

Example: Fast File System

Multileveled Indexed Files: Key Ideas

Pointer Type	# Ref Blocks	Total Size At Level
Direct	10 * 1 = 10	40 KB
Single Indirect	1 * 512 = 512	2 MB
Double Indirect	512 * 512 = 2^18	1 GB
Triple Indirect	512 * 512 * 512 = 2^27	512 GB
	Total	~513 GB

FFS: assuming 4KB blocks, 8-byte pointers = 512 pointers/block
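The numbers in the table can be checked with a few lines of arithmetic. A minimal sketch using the same assumptions (4KB blocks, 8-byte pointers, 10 direct pointers plus one single, one double, and one triple indirect pointer); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE     4096ULL                  /* 4KB blocks */
#define PTR_SIZE       8ULL                     /* 8-byte pointers */
#define PTRS_PER_BLOCK (BLOCK_SIZE / PTR_SIZE)  /* 512 */
#define NUM_DIRECT     10ULL

/* Data blocks reachable through ONE pointer with `levels` levels of
   indirection (0 = direct pointer). */
uint64_t blocks_at_level(int levels)
{
    assert(levels >= 0 && levels <= 3);
    uint64_t n = 1;
    for (int i = 0; i < levels; i++) n *= PTRS_PER_BLOCK;
    return n;
}

/* Largest file this layout can describe, in bytes. */
uint64_t max_file_bytes(void)
{
    uint64_t blocks = NUM_DIRECT * blocks_at_level(0);
    for (int levels = 1; levels <= 3; levels++)
        blocks += blocks_at_level(levels);
    return blocks * BLOCK_SIZE;
}
```

The triple indirect pointer reaches 512 * 512 * 512 = 2^27 blocks, which at 4KB each is 512 GB. Adding the other three levels gives a maximum file size of roughly 513 GB.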

Tree-like structure
- Efficient for finding blocks
Efficient in sequential reads
- Once indirect block is read, can read 100s of data blocks
Fixed Structure
- Relatively straightforward to implement
Asymmetric
- Efficiently support both large and small files
- Most files can be stored using only direct blocks, but allows very large files using indirect blocks.

YOU ARE HERE

User-level Organization

Or: All About Directories

The Story So Far

We know how to get the data associated with a file if we know where its metadata (file header) is. We also know how to identify file headers (by their index in the file header array).

Great! Are we done?

To edit your shell configuration, open file 229601, unless you have Microsoft Word installed, in which case you need to edit file 92135113

(╯°□°)╯︵ ┻━┻

Which of our three goals are we not meeting here?

We don't have the ability to name files right now. Which of our three main goals is this a failure in?

The Story So Far

To edit your shell configuration, open file 229601, unless you have Microsoft Word installed, in which case you need to edit file 92135113

We still don't have human-readable names or organization!

This is a big usability issue.

The Simpleton's Guide to File Organization

Use one name space for the entire disk.

Use a special area of the disk to hold the directory info.
Directory info consists of <filename, inode #> pairs.
If one user uses a name, nobody else can.

File Name	inode number
.user1_bashrc	27
.user2_bashrc	30
firefox	3392
.bob_bashrc	7

Is this a good scheme?

The Simpleton's Guide to File Organization: Part 2

Simple improvement: make a separate "special area" for each user.
Now files only have to be uniquely named per-user.

(Yeah, it's not that great of an improvement)

File Name	inode number
.bashrc	30
Documents	173

File Name	inode number
.bashrc	391
failed_projects	8930
zsh	3392

user1's Directory

user2's Directory

Introducing: Directories

Directories are files that contain mappings from file names to inode numbers.

The "special reserved area" we saw previously was an example of a directory, albeit a very primitive one.
The inode number is called the inumber.
Only the OS can modify directories
- Ensures that mappings cannot be broken by a malicious user (referee role)
- User-level code can read directories
Directories create a name space for files (file names within the same directory have to be unique, but can have same filename in different directories)

Introducing: Directories

Note: the i# in a directory entry may refer to another directory!

The OS keeps a special bit in the inode to determine if the file is a directory or a normal file.

There is a special root directory (usually inumber 0, 1, or 2).

i#	Filename
3226	.bashrc
251	Documents
7193	pintos
2086	todo.txt
1793	Pictures

Example directory with 16B entries

14B
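One way to get 16-byte entries is a 2-byte inumber followed by a 14-byte name. The sketch below assumes that split (it matches the 14B label but is not confirmed by the slide); toy_dirent and dir_lookup are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NAME_LEN 14

/* Assumed layout: 2-byte inumber + 14-byte name = 16 bytes per entry. */
struct toy_dirent {
    uint16_t inumber;
    char     name[NAME_LEN];  /* NUL-padded; no terminator if all 14 bytes are used */
};

/* Linear scan of a directory's entries. Returns the inumber for `name`,
   or -1 if the directory has no such entry. */
int dir_lookup(const struct toy_dirent *entries, int count, const char *name)
{
    assert(entries != NULL && name != NULL);
    if (strlen(name) > NAME_LEN) return -1;  /* name cannot fit in an entry */
    for (int i = 0; i < count; i++) {
        /* strncmp stops at NAME_LEN, so an unterminated 14-byte name is safe */
        if (strncmp(entries[i].name, name, NAME_LEN) == 0)
            return entries[i].inumber;
    }
    return -1;
}
```

Fixed-size entries keep lookup simple, since a directory is just an array to scan. The cost is a hard cap on filename length and on the number of distinct inumbers (65536 with a 2-byte field).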

So...how do you find the data of a file?

To find the data blocks of a file, we need to know where its inode (file header) is.

To find an inode (file header), we need to know its inumber.

To find a file's inumber, read the directory that contains the file.

The directory is just a file, so we need to find its data blocks.

There's an infinite loop in here...or is there?

To find a file's inumber, read the directory that contains the file.

We can break the loop here by agreeing on a fixed inumber for a special directory.

It should be possible to reach every other file in the filesystem from this directory.

This is the root directory. On UNIX, it is called "/"

On most UNIX systems, the root directory is inumber 2

Putting it all together

int config_fd = open("/home/user1/.bashrc", O_RDONLY);

How does the filesystem service this syscall?

struct open_file {
   struct file_header* metadata;
   file_offset pos;
   int file_mode; //e.g. "r" or "rw"
};

int config_fd = open("/home/user1/.bashrc", O_RDONLY);

CPU

In previous examples, we started knowing the inumber of the file to read. This time, we don't have that.

But we do have....what?

CPU

Read inode 2 (the root inode)
Use the inode to locate the data in the root directory
Read the data in the root directory
See that our next path target is "home", which has i# = 11

i2

2713	tmp
2011	bin
3301	usr
99	etc
11	home
426	var

int config_fd = open("/home/user1/.bashrc", O_RDONLY);

B 1214

Contents of Block #1214

CPU

Read inode 2 (the root inode)
Use the inode to locate the data in the root directory
Read the data in the root directory
See that our next path target is "home", which has i# = 11
Read inode 11 (the inode for "/home")
Use the inode to locate the data in the "/home" directory
Read the data in the "/home" directory
See that our next path target is "user1", which has i# = 6

i11

6	user1
394	user2
2201	admin

int config_fd = open("/home/user1/.bashrc", O_RDONLY);

B 2772

Contents of Block #2772

CPU

Read inode 2 (the root inode)
Use the inode to locate the data in the root directory
Read the data in the root directory
See that our next path target is "home", which has i# = 11
Read inode 11 (the inode for "/home")
Use the inode to locate the data in the "/home" directory
Read the data in the "/home" directory
See that our next path target is "user1", which has i# = 6
Read inode 6 (the inode for "/home/user1")

i6

6	user1
394	user2
2201	admin

int config_fd = open("/home/user1/.bashrc", O_RDONLY);

B 2772

Contents of Block #2772

CPU

Read inode 2 (the root inode)
Use the inode to locate the data in the root directory
Read the data in the root directory
See that our next path target is "home", which has i# = 11
Read inode 11 (the inode for "/home")
Use the inode to locate the data in the "/home" directory
Read the data in the "/home" directory
See that our next path target is "user1", which has i# = 6
Read inode 6 (the inode for "/home/user1")
Use the inode to locate the data for "/home/user1"

i6

6	user1
394	user2
2201	admin

int config_fd = open("/home/user1/.bashrc", O_RDONLY);

[Diagram: the CPU reads Block #537 from disk]

Read inode 2 (the root inode)
Use the inode to locate the data in the root directory
Read the data in the root directory
See that our next path target is "home", which has i# = 11
Read inode 11 (the inode for "/home")
Use the inode to locate the data in the "/home" directory
Read the data in the "/home" directory
See that our next path target is "user1", which has i# = 6
Read inode 6 (the inode for "/home/user1")
Use the inode to locate the data for "/home/user1"
Read the data for "/home/user1"
We know what inode the file is at now!

Directory data for "/home/user1" (inode 6):

i#	Name
273	Documents
94	.ssh
2201	.bash_profile
23	.bashrc
61	.vimrc

int config_fd = open("/home/user1/.bashrc", O_RDONLY);

B 537

Record that the requested file has i# = 23 and end the search

Contents of Block #537

How many disk reads was that?

Read inode 2 (the root inode)
Use the inode to locate the data in the root directory
Read the data in the root directory
See that our next path target is "home", which has i# = 11
Read inode 11 (the inode for "/home")
Use the inode to locate the data in the "/home" directory
Read the data in the "/home" directory
See that our next path target is "user1", which has i# = 6
Read inode 6 (the inode for "/home/user1")
Use the inode to locate the data for "/home/user1"
Read the data for "/home/user1"
We know what inode the file is at now!

6 disk reads just to open the file

We didn't even try to read anything out of the file--that was just an open() call!
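
The walk above can be sketched as a toy model in C. This is only a sketch, not how any real kernel does it: `struct dir`, `DIRS`, and `resolve_path` are hypothetical names, the inode numbers are the ones from the slides, and it assumes each directory's data fits in a single block, so every directory costs exactly two disk reads (its inode, then its data).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A directory is just a table of (inode number, name) pairs. */
struct entry { int ino; const char *name; };
struct dir   { int ino; struct entry entries[6]; };

/* Toy "disk": inode numbers taken from the slides' example.
 * Unused entry slots are zero-initialized, so name == NULL ends a table. */
static const struct dir DIRS[] = {
    { 2,  { {11, "home"} } },
    { 11, { {6, "user1"}, {394, "user2"}, {2201, "admin"} } },
    { 6,  { {273, "Documents"}, {94, ".ssh"}, {2201, ".bash_profile"},
            {23, ".bashrc"}, {61, ".vimrc"} } },
};

/* Resolve an absolute path to an inode number, counting simulated disk
 * reads in *reads. Returns -1 if a component cannot be found. */
int resolve_path(const char *path, int *reads) {
    assert(path != NULL && path[0] == '/' && reads != NULL);
    int ino = 2;                       /* always start at the root inode */
    const char *p = path;
    *reads = 0;
    while (*p == '/') p++;
    while (*p != '\0') {
        size_t len = strcspn(p, "/");  /* length of the next path component */
        const struct dir *d = NULL;
        for (size_t i = 0; i < sizeof DIRS / sizeof DIRS[0]; i++)
            if (DIRS[i].ino == ino) d = &DIRS[i];
        if (d == NULL) return -1;      /* current inode is not a known directory */

        printf("read inode %d, then read its directory data\n", ino);
        *reads += 2;                   /* one read for the inode, one for the data */

        int next = -1;
        for (size_t i = 0; i < 6 && d->entries[i].name != NULL; i++)
            if (strlen(d->entries[i].name) == len &&
                strncmp(d->entries[i].name, p, len) == 0)
                next = d->entries[i].ino;
        if (next < 0) return -1;       /* name not present in this directory */

        ino = next;
        p += len;
        while (*p == '/') p++;
    }
    return ino;
}
```

Under those assumptions, `resolve_path("/home/user1/.bashrc", &reads)` returns 23 and leaves `reads` at 6, matching the count above.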

Simple optimization: cwd

Maintain the notion of a per-process current working directory.

Users can specify files relative to the CWD

We can't avoid this disk access...

The OS caches the data blocks of the CWD in the disk cache.

Store the CWD's data block here so that we don't have to go to disk 6x to get it.
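
To see why this helps, here is a toy model in C of a lookup that can start at a cached CWD. It is a sketch under stated assumptions: `resolve_from`, `struct dir`, and `DIRS` are hypothetical names, the inode numbers come from the earlier example, each uncached directory is assumed to cost two disk reads (inode plus one data block), and a cached directory is assumed to cost none.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct entry { int ino; const char *name; };
struct dir   { int ino; struct entry entries[6]; };

/* Toy "disk": inode numbers taken from the slides' example. */
static const struct dir DIRS[] = {
    { 2,  { {11, "home"} } },
    { 11, { {6, "user1"}, {394, "user2"}, {2201, "admin"} } },
    { 6,  { {273, "Documents"}, {94, ".ssh"}, {2201, ".bash_profile"},
            {23, ".bashrc"}, {61, ".vimrc"} } },
};

/* Walk `path` starting at directory inode `start`. The directory whose
 * inode number equals `cached` (the CWD) costs no disk reads; every other
 * directory costs two. Returns the final inode number, or -1 if not found. */
int resolve_from(int start, int cached, const char *path, int *reads) {
    assert(path != NULL && reads != NULL);
    int ino = start;
    *reads = 0;
    while (*path == '/') path++;
    while (*path != '\0') {
        size_t len = strcspn(path, "/");
        const struct dir *d = NULL;
        for (size_t i = 0; i < sizeof DIRS / sizeof DIRS[0]; i++)
            if (DIRS[i].ino == ino) d = &DIRS[i];
        if (d == NULL) return -1;

        if (ino != cached) *reads += 2;   /* cache miss: inode + data block */

        int next = -1;
        for (size_t i = 0; i < 6 && d->entries[i].name != NULL; i++)
            if (strlen(d->entries[i].name) == len &&
                strncmp(d->entries[i].name, path, len) == 0)
                next = d->entries[i].ino;
        if (next < 0) return -1;

        ino = next;
        path += len;
        while (*path == '/') path++;
    }
    return ino;
}
```

In this model, `resolve_from(2, -1, "/home/user1/.bashrc", &reads)` costs 6 reads, while `resolve_from(6, 6, ".bashrc", &reads)` finds the same inode number with 0 reads. The sketch only counts the reads needed to find the file's inode number; reading the file's own inode and data afterwards may still go to disk.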

Summary

Without persistent storage, computers are very annoying to use.

Persistent storage requires a different approach to organizing and storing data, due to differences in its behavior (speed, resilience, request ordering). This leads naturally to the idea of a file system.

When designing filesystems, we care about three properties:

Speed
Resilience
Usability

We should use these three properties to guide our design choices.

Use of the filesystem involves the filesystem API, in-memory bookkeeping structures, and the structure of data on disk. All three need to be considered when designing a filesystem.

Summary

File headers describe how file data can be found. Part of this is the choice of file layout. Different layouts give different tradeoffs in speed, extensibility, etc.

Finally, the filesystem gives users an option for organizing files using directories. Directories are just files containing mappings from names to other files. Traversing directories can be expensive.

The filesystem exposes various syscalls for applications to work with files, e.g. create(), open(), read(). These syscalls manipulate both the state of bits on the disk and some in-memory data structures (open file table, file records).
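
As a concrete illustration, here is a minimal sketch of these calls on a POSIX system. The slides use generic names like create(); this sketch assumes POSIX instead, where `mkstemp` both creates and opens a scratch file. The function name `file_roundtrip` and the `/tmp/fs_demo_XXXXXX` template are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write a short message to a scratch file, read it back into `out`,
 * then remove the file. Returns 0 on success, -1 on failure. */
int file_roundtrip(char *out, size_t out_len) {
    const char msg[] = "hello, disk";
    char path[] = "/tmp/fs_demo_XXXXXX";   /* scratch name; an assumption */
    assert(out != NULL && out_len >= sizeof msg);

    /* Create + open: adds a new name to the directory and hands back a
     * per-process file descriptor. */
    int fd = mkstemp(path);
    if (fd < 0) return -1;

    /* write() may only reach the kernel's cache at first. */
    int ok = write(fd, msg, sizeof msg) == (ssize_t)sizeof msg;

    /* lseek() only changes the in-memory file offset for this open file. */
    ok = ok && lseek(fd, 0, SEEK_SET) == 0;

    /* read() starts at the current offset and advances it. */
    ok = ok && read(fd, out, sizeof msg) == (ssize_t)sizeof msg;

    close(fd);      /* release the file descriptor */
    unlink(path);   /* remove the directory entry for the scratch file */
    return ok ? 0 : -1;
}
```

Note how each call touches a different mix of state: the create and unlink steps change names on disk, while the offset moved by `lseek` lives purely in memory.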

Lecture 17: Filesystem Fundamentals

By Kevin Song


I'm a student at UT (that's the one in Austin) who studies things.
