CS110 Lecture 4: Filesystem System Calls
CS110: Principles of Computer Systems
Winter 2021-2022
Stanford University
Instructors: Nick Troccoli and Jerry Cain
Asking Questions
- Feel free to raise your hand at any time with a question
- If you are more comfortable, you can post a question in the Ed forum thread for each day’s lecture (optionally anonymously)
- We will monitor the thread throughout the lecture for questions
CS110 Topic 1: How can we design filesystems to store and manipulate files on disk, and how can we interact with the filesystem in our programs?
Learning About Filesystems
- Lecture 2: Unix v6 Filesystem design, part 1 (files)
- Lecture 3: Unix v6 Filesystem design, part 2 (large files + directories)
- This Lecture: Interacting with the filesystem from our programs
assign2: implement portions of a filesystem!
Learning Goals
- Learn about the open, close, read and write functions that let us interact with files
- Get familiar writing programs that read, write and create files
- Learn what the operating system manages for us so that we can interact with files
Lecture Code
- Lecture code for today is in /usr/class/cs110/lecture-examples/filesystems
- Make a copy of all lecture code: git clone /usr/class/cs110/lecture-examples
- get updates by running git pull within your lecture examples folder
Lecture Plan
- Interacting with the filesystem as users
- Interacting with the filesystem as programmers
- System calls
- open() and close()
- read() and write()
- Practice: copying files
- Operating system data structures
Filesystem: User Perspective
- Studying how we interact with the filesystem as users will inform how we interact with it as programmers.
- As users, we can run ls to get details about particular files.
- Using -a shows all files (even hidden ones), -l shows more info about each file
troccoli@myth54:~/assign1$ ls -al
total 42
drwx------ 5 troccoli operator 2048 Jan 10 08:41 .
drwxr-xr-x 51 troccoli operator 6144 Jan 9 15:12 ..
drwx------ 8 troccoli operator 2048 Jan 9 15:12 .git
-rw------- 1 troccoli operator 259 Jan 5 15:31 .gitignore
-rw------- 1 troccoli operator 1750 Jan 9 15:12 imdb.cc
-rw------- 1 troccoli operator 3501 Jan 5 15:31 imdb.h
-rw------- 1 troccoli operator 6439 Jan 5 15:31 imdbtest.cc
-rw------- 1 troccoli operator 1720 Jan 5 15:31 imdb-utils.h
-rw------- 1 troccoli operator 964 Jan 5 15:31 Makefile
drwx------ 2 troccoli operator 2048 Jan 5 15:31 .metadata
-rw------- 1 troccoli operator 2146 Jan 5 15:31 path.cc
-rw------- 1 troccoli operator 4122 Jan 5 15:31 path.h
-rw------- 1 troccoli operator 1829 Jan 5 15:31 search.cc
drwx------ 2 troccoli operator 2048 Jan 5 15:31 tools
Filesystem Information
Each line of the ls -al output above has the following columns, from left to right:
- Type and permissions (e.g. drwx------ for a directory, -rw------- for a regular file)
- # Hard Links
- Owner name
- Group name
- Size (bytes)
- Last modified time
- Filename
The entry . is the current directory, and the entry .. is the parent directory.
File Permissions
rwx r-x r-x
(owner, group, other)
Here, the owner has read, write, and execute permissions, the group has only read and execute permissions, and all other users also have only read and execute permissions.
The filesystem represents permissions in binary (1 or 0 for each permission option):
- e.g. for the permissions above: 111 101 101
- We can further convert each group of 3 bits into one base-8 digit: 7 5 5
- So, the permissions for the file would be 755
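The binary-to-octal conversion above can be sketched in C. This is an illustrative helper only: the name perm_string_to_mode is hypothetical and not part of the lecture code. It assumes a 9-character string in the rwx r-x r-x layout (without the spaces or the leading type character).

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

// Convert a 9-character permission string such as "rwxr-xr-x" into its
// numeric mode. Every character other than '-' contributes a 1 bit.
// (perm_string_to_mode is a hypothetical helper for illustration.)
mode_t perm_string_to_mode(const char *perms) {
    assert(strlen(perms) == 9);
    mode_t mode = 0;
    for (int i = 0; i < 9; i++) {
        mode <<= 1;                      // make room for the next bit
        if (perms[i] != '-') mode |= 1;  // permission granted: set the bit
    }
    return mode;
}
```

For example, perm_string_to_mode("rwxr-xr-x") yields the bits 111 101 101, which C writes as the octal literal 0755.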
System Calls
- Functions to interact with the operating system are part of a group of functions called system calls.
- A system call is a public function provided by the operating system.
- The operating system handles these tasks because they require special privileges that we do not have in our programs.
- The operating system kernel actually runs the code for a system call, completely isolating the system-level interaction from your (potentially harmful) program.
- We are going to examine the system calls for interacting with files. When writing production code, you will often use higher-level methods that build on these (like C++ streams or FILE *), but let's see how they work!
open()
A function that a program can call to open a file:
int open(const char *pathname, int flags);
- pathname: the path to the file you wish to open
- flags: a bitwise OR of options specifying the behavior for opening the file
- the return value is a file descriptor representing the opened file, or -1 on error
Many possible flags (see man page for full list). You must include exactly one of the following flags:
- O_RDONLY: read only
- O_WRONLY: write only
- O_RDWR: read and write
Another useful flag is O_TRUNC: if the file exists already, clear it ("truncate it").
open()
A function that a program can call to open (and potentially create) a file:
int open(const char *pathname, int flags, mode_t mode);
You can also use open to create a new file if the specified file doesn't exist. To do this, include O_CREAT as one of the flags. You must also specify a third mode parameter.
- mode: the permissions to attempt to set for a created file
Another useful flag here is O_EXCL: the file must be created from scratch, and open fails if the file already exists.
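Aside: how are there multiple signatures for open in C? C has no function overloading; open is declared with a variable argument list, int open(const char *pathname, int flags, ...), and it reads mode only when the flags call for it (e.g. O_CREAT).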
File Descriptors
- A file descriptor is like a "ticket number" representing your currently-open file.
- It is a unique number assigned by the operating system to refer to that file
- Each program has its own file descriptors
- When you wish to refer to the file (e.g. read from it, write to it) you must provide the file descriptor.
close()
A function that a program can call to close a file when done with it.
int close(int fd);
- fd: the file descriptor you'd like to close.
It's important to close files when you are done with them to preserve system resources.
You can use valgrind to check if you forgot to close any files.
Example: Creating A File
// Create the file
int fd = open("myfile.txt", O_WRONLY | O_CREAT | O_EXCL, 0644);
// Close the file now that we are done with it
close(fd);
touch.c
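The snippet above omits error handling. A minimal sketch of the same idea as a function that checks the return value of open might look like the following. The helper name create_new_file is an assumption for illustration, not the contents of the lecture's touch.c.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

// Create an empty file at path with permissions 0644.
// Returns 0 on success and -1 on failure. Because of O_EXCL, an
// already-existing file counts as a failure.
// (create_new_file is a hypothetical helper for illustration.)
int create_new_file(const char *path) {
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1) {
        perror("open");  // report why the file could not be created
        return -1;
    }
    // Close the file now that we are done with it
    return close(fd);
}
```

Calling create_new_file("myfile.txt") twice in a row should succeed the first time and fail the second, since the file then exists.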
read()
A function that a program can call to read bytes from an open file.
ssize_t read(int fd, void *buf, size_t count);
- fd: the file descriptor for the file you'd like to read from
- buf: the memory location where the read-in bytes should be put
- count: the number of bytes you wish to read
- The function returns -1 on error, 0 if at end of file, or otherwise the number of bytes that were read (greater than 0)
Key idea: read may not read all the bytes you ask it to! The return value tells you how many were actually read.
Key idea #2: the operating system keeps track of where in a file a file descriptor is reading from. So the next time you read, it will resume where you left off.
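Both key ideas show up in a typical read loop. The sketch below is an illustration with a hypothetical helper name, count_bytes. It keeps calling read until it returns 0, relying on the operating system to resume where the previous call left off, and it never assumes a call filled the whole buffer.

```c
#include <unistd.h>

// Read from fd until end of file and return the total number of bytes
// seen, or -1 if read reports an error.
// (count_bytes is a hypothetical helper for illustration.)
ssize_t count_bytes(int fd) {
    char buf[1024];
    ssize_t total = 0;
    while (1) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n == -1) return -1;  // error
        if (n == 0) break;       // end of file
        total += n;              // n may be smaller than sizeof(buf)
    }
    return total;
}
```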
write()
A function that a program can call to write bytes to an open file.
ssize_t write(int fd, const void *buf, size_t count);
- fd: the file descriptor for the file you'd like to write to
- buf: the memory location storing the bytes that should be written
- count: the number of bytes you wish to write from buf
- The function returns -1 on error, or otherwise the number of bytes that were written
Key idea: write may not write all the bytes you ask it to! The return value tells you how many were actually written.
Key idea #2: the operating system keeps track of where in a file a file descriptor is writing to. So the next time you write, it will write to where you left off.
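Because write may accept fewer bytes than requested, code that must write an entire buffer usually loops. A minimal sketch, assuming a hypothetical helper named write_all:

```c
#include <unistd.h>

// Keep calling write until all count bytes of buf have been written.
// Returns 0 on success, or -1 if write reports an error.
// (write_all is a hypothetical helper for illustration.)
int write_all(int fd, const char *buf, size_t count) {
    size_t written = 0;
    while (written < count) {
        // Ask for everything that is still outstanding
        ssize_t n = write(fd, buf + written, count - written);
        if (n == -1) return -1;
        written += n;  // advance past the bytes that were accepted
    }
    return 0;
}
```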
Example: Copy
Let's write an example program copy that emulates the built-in cp command. It takes in two command line arguments (file names) and copies the contents of the first file to the second.
copy.c
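One possible shape for the core of such a program is sketched below. It is not the lecture's copy.c: the function names are hypothetical, and the choice to create the destination with mode 0644 and truncate it if it already exists is an assumption made for this example.

```c
#include <fcntl.h>
#include <unistd.h>

// Copy everything readable from src_fd to dst_fd.
// Returns 0 on success, -1 on error. (Hypothetical helper.)
static int copy_contents(int src_fd, int dst_fd) {
    char buf[1024];
    while (1) {
        ssize_t n_read = read(src_fd, buf, sizeof(buf));
        if (n_read == -1) return -1;
        if (n_read == 0) return 0;  // end of file: done
        ssize_t done = 0;
        while (done < n_read) {     // write may accept fewer bytes than asked
            ssize_t n = write(dst_fd, buf + done, n_read - done);
            if (n == -1) return -1;
            done += n;
        }
    }
}

// Copy the file at src to dst. Assumption for this sketch: dst is created
// with mode 0644 if missing and truncated if it already exists.
int copy_file(const char *src, const char *dst) {
    int src_fd = open(src, O_RDONLY);
    if (src_fd == -1) return -1;
    int dst_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (dst_fd == -1) {
        close(src_fd);
        return -1;
    }
    int result = copy_contents(src_fd, dst_fd);
    close(src_fd);
    if (close(dst_fd) == -1) result = -1;
    return result;
}
```

A main function would check that two file names were given and then call copy_file(argv[1], argv[2]).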
Copying Files
File descriptors are just integers - for that reason, we can store and access them just like integers.
- If you're interacting with many files, it may be helpful to have an array of file descriptors
There are 3 special file descriptors provided by default to each program:
- 0: standard input (user input from the terminal) - STDIN_FILENO
- 1: standard output (output to the terminal) - STDOUT_FILENO
- 2: standard error (error output to the terminal) - STDERR_FILENO
Example: Copy Extended
Let's build on our copy.c program to add support for copying to multiple files, and have it also output the contents of the file to the terminal.
copy-extended.c
Operating System Data Structures
- Linux maintains a data structure for each active process. These data structures are called process control blocks, and they are stored in the process table
- We'll explain exactly what a process is later in lecture
- Process control blocks store many things (the user who launched the process, what time it was launched, CPU state, etc.). Among the many items each one stores is the file descriptor table
- A file descriptor (used by your program) is a small integer that's an index into this table
- Descriptors 0, 1, and 2 are standard input, standard output, and standard error, but there are no predefined meanings for descriptors 3 and up. When you run a program from the terminal, descriptors 0, 1, and 2 are most often bound to the terminal
File Descriptor Table and File Descriptors
- A file descriptor is the identifier needed to interact with a resource (most often a file) via system calls (e.g., read, write, and close)
- A name has semantic meaning, an address denotes a location; an identifier has no meaning
- /etc/passwd vs. 34.196.104.129 vs. file descriptor 5
- Many system calls allocate file descriptors
- open: open a file
- pipe: create two unidirectional byte streams (one read, one write) between processes
- accept: accept a TCP connection request, returns descriptor to new socket
- When allocating a new file descriptor, the kernel chooses the smallest available number
- These semantics are important! If you close stdout (1) and then open a file, it will be assigned file descriptor 1 and so act as stdout (this is how $ cat in.txt > out.txt works)
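The smallest-available-number rule can be observed directly. The sketch below uses a hypothetical helper, reopen_in_slot, that closes one descriptor and immediately opens a file. In a single-threaded program where every lower-numbered descriptor is still in use, the new file should receive the number that was just freed.

```c
#include <fcntl.h>
#include <unistd.h>

// Close fd, then open path for writing (creating or truncating it).
// Since the kernel hands out the smallest unused descriptor number, the
// new file lands in the freed slot when all lower numbers are in use.
// Returns the new descriptor, or -1 on error.
// (reopen_in_slot is a hypothetical helper for illustration.)
int reopen_in_slot(int fd, const char *path) {
    if (close(fd) == -1) return -1;
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
}
```

Calling reopen_in_slot(STDOUT_FILENO, "out.txt") would therefore make later writes to descriptor 1 go to out.txt, assuming descriptor 0 is still open.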
Creating and Using File Descriptors
- E.g., a file table entry (for a regular file) keeps track of a current position in the file
- If you read 1000 bytes, the next read will be from 1000 bytes after the preceding one
- If you write 380 bytes, the next write will start 380 bytes after the preceding one
- If you want multiple processes to write to the same log file and have the results be intelligible, then you have all of them share a single file table entry: their calls to write will be serialized and occur in some linear order
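The position stored in a file table entry can be queried with lseek, which is not covered in the slides above but is a standard system call. The following sketch, with a hypothetical helper name, performs two reads and records the position after each one.

```c
#include <fcntl.h>
#include <unistd.h>

// Open path, perform two reads of `first` and `second` bytes, and store
// the position the kernel records after each read in pos[0] and pos[1].
// Returns 0 on success, -1 on error.
// (positions_after_two_reads is a hypothetical helper for illustration.)
int positions_after_two_reads(const char *path, size_t first, size_t second,
                              off_t pos[2]) {
    char buf[4096];
    if (first > sizeof(buf) || second > sizeof(buf)) return -1;
    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;
    int ok = read(fd, buf, first) != -1;
    pos[0] = lseek(fd, 0, SEEK_CUR);  // offset 0 from "current": query only
    ok = ok && read(fd, buf, second) != -1;
    pos[1] = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return ok ? 0 : -1;
}
```

On a 10-byte file, reads of 4 and then 3 bytes should leave the positions at 4 and 7. If the second read asks for more than remains, the position stops at the end of the file.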
File Descriptor vs. File Table Entries
- An entry in the file descriptor table is just a pointer to a file table entry
- Multiple entries in a table can point to the same file table entry
- Entries in different file descriptor tables (different processes!) can point to the same file table entry
- Each process maintains its own descriptor table, but there is one, system-wide open file table. This allows for file resources to be shared between processes, as we've seen
- As drawn above, descriptors 0, 1, and 2 in each of the three PCBs alias the same three open files. That's why each of the referenced table entries has a refcount of 3 instead of 1.
- This shouldn't surprise you. If your bash shell calls make, which itself calls g++, each of them inserts text into the same terminal window: those three files could be stdin, stdout, and stderr for a terminal
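One way to see whether two descriptors point to the same file table entry is to check whether they share a position. The sketch below is illustrative only. It relies on dup, a standard system call not covered above that makes a second descriptor refer to the same file table entry, while a second call to open creates a separate entry. The helper name is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

// Returns 1 if reading one byte through fd_a also advances the position
// seen through fd_b (both share one file table entry), 0 if it does not,
// and -1 on error. Requires at least one unread byte behind fd_a.
// (shares_file_table_entry is a hypothetical helper for illustration.)
int shares_file_table_entry(int fd_a, int fd_b) {
    off_t before = lseek(fd_b, 0, SEEK_CUR);
    char c;
    if (before == -1 || read(fd_a, &c, 1) != 1) return -1;
    off_t after = lseek(fd_b, 0, SEEK_CUR);
    return after == before + 1;
}
```

With fd2 = dup(fd1) the function should report 1. With a second, independent open of the same file it should report 0, because each call to open gets its own position.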
File Table Details
vnodes
- The vnode is the kernel's abstraction of an actual file: it includes information on what kind of file it is, how many file table entries reference it, and function pointers for performing operations.
- A vnode's interface is file-system independent, but its implementation is file-system specific; any file system (or file abstraction) can put state it needs to in the vnode (e.g., inode number)
- The term vnode comes from BSD UNIX; in Linux source it's called a generic inode (CONFUSING!)
- Each open file entry has a pointer to a vnode, which is a structure housing static information about a file or file-like resource.
File Descriptors -> File Table -> vnode Table
- There is one system-wide vnode table for the same reason there is one system-wide open file table. Independent file sessions reading from the same file don't need independent copies of the vnode. They can all alias the same one.
Filesystem Data Structures
None of these kernel-resident data structures are visible to users. Note the filesystem itself is a completely different component, and that filesystem inodes of open files are loaded into vnode table entries. The yellow inode in the vnode is an in-memory replica of the yellow sliver of memory in the filesystem.
Recap
- Interacting with the filesystem as users
- Interacting with the filesystem as programmers
- System calls
- open() and close()
- read() and write()
- Practice: copying files
- Operating system data structures
Next time: more details on system calls, and an introduction to multiprocessing