CS110 Lecture 4: Filesystem System Calls

CS110: Principles of Computer Systems

Winter 2021-2022

Stanford University

Instructors: Nick Troccoli and Jerry Cain


Asking Questions

  • Feel free to raise your hand at any time with a question
  • If you are more comfortable, you can post a question in the Ed forum thread for each day’s lecture (optionally anonymously)
  • We will monitor the thread throughout the lecture for questions

CS110 Topic 1: How can we design filesystems to store and manipulate files on disk, and how can we interact with the filesystem in our programs?

Learning About Filesystems

  • Lecture 2: Unix v6 Filesystem design, part 1 (files)
  • Lecture 3: Unix v6 Filesystem design, part 2 (large files + directories)
  • This Lecture: Interacting with the filesystem from our programs

assign2: implement portions of a filesystem!

Learning Goals

  • Learn about the open, close, read and write functions that let us interact with files
  • Get familiar with writing programs that read, write and create files
  • Learn what the operating system manages for us so that we can interact with files

Lecture Code

  • Lecture code for today is in /usr/class/cs110/lecture-examples/filesystems
  • Make a copy of all lecture code: git clone /usr/class/cs110/lecture-examples
    • get updates by running git pull within your lecture examples folder

Lecture Plan

  • Interacting with the filesystem as users
  • Interacting with the filesystem as programmers
    • System calls
    • open() and close()
    • read() and write()
    • Practice: copying files
  • Operating system data structures

Filesystem: User Perspective

  • Studying how we interact with the filesystem as users will inform how we interact with it as programmers.
  • As users, we can run ls to get details about particular files.  
    • Using -a shows all files (even hidden ones), -l shows more info about each file
troccoli@myth54:~/assign1$ ls -al
total 42
drwx------  5 troccoli operator 2048 Jan 10 08:41 .
drwxr-xr-x 51 troccoli operator 6144 Jan  9 15:12 ..
drwx------  8 troccoli operator 2048 Jan  9 15:12 .git
-rw-------  1 troccoli operator  259 Jan  5 15:31 .gitignore
-rw-------  1 troccoli operator 1750 Jan  9 15:12 imdb.cc
-rw-------  1 troccoli operator 3501 Jan  5 15:31 imdb.h
-rw-------  1 troccoli operator 6439 Jan  5 15:31 imdbtest.cc
-rw-------  1 troccoli operator 1720 Jan  5 15:31 imdb-utils.h
-rw-------  1 troccoli operator  964 Jan  5 15:31 Makefile
drwx------  2 troccoli operator 2048 Jan  5 15:31 .metadata
-rw-------  1 troccoli operator 2146 Jan  5 15:31 path.cc
-rw-------  1 troccoli operator 4122 Jan  5 15:31 path.h
-rw-------  1 troccoli operator 1829 Jan  5 15:31 search.cc
drwx------  2 troccoli operator 2048 Jan  5 15:31 tools

Filesystem Information

Reading each line of the ls -al output above from left to right, the columns are:

  • Type and permissions (e.g. drwx------)
  • Number of hard links (e.g. 5)
  • Owner name (e.g. troccoli)
  • Group name (e.g. operator)
  • Size in bytes (e.g. 2048)
  • Last modified time (e.g. Jan 10 08:41)
  • Filename (e.g. imdb.cc)

The entries . and .. refer to the current directory and the parent directory, respectively.

File Permissions

rwx (owner)   r-x (group)   r-x (other)

Here, the owner has read, write, and execute permissions, the group has only read and execute permissions, and all other users also have only read and execute permissions.

The filesystem represents permissions in binary (1 or 0 for each permission option):

  • e.g. for the permissions above: 111 101 101
  • we can further convert each group of 3 bits into one base-8 digit
    • base 8: 7 5 5
  • So, the permissions for the file would be 755
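
The binary-to-octal conversion above can be sketched in C. This is a minimal illustration, not something from the lecture code: permissions_to_mode is a hypothetical helper name, and it assumes the input is the 9-character string that ls prints after the file type character.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper (not part of any library): converts a 9-character
// permission string such as "rwxr-xr-x" into its numeric mode.
// Returns -1 if the string is not exactly 9 characters long.
int permissions_to_mode(const char *perms) {
    if (perms == NULL || strlen(perms) != 9) return -1;
    int mode = 0;
    for (size_t i = 0; i < 9; i++) {
        // Shift in one bit per position: 1 if the permission is granted
        // (any character other than '-'), 0 otherwise.
        mode = (mode << 1) | (perms[i] != '-');
    }
    return mode;
}
```

Calling permissions_to_mode("rwxr-xr-x") yields the bits 111 101 101, which is the value written as 0755 in C (a leading 0 marks an octal literal).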

System Calls

  • Functions to interact with the operating system are part of a group of functions called system calls.
  • A system call is a public function provided by the operating system.
  • The operating system handles these tasks because they require special privileges that we do not have in our programs.
  • The operating system kernel actually runs the code for a system call, completely isolating the system-level interaction from your (potentially harmful) program.
  • We are going to examine the system calls for interacting with files.  When writing production code, you will often use higher-level methods that build on these (like C++ streams or FILE *), but let's see how they work!

open()

A function that a program can call to open a file:

int open(const char *pathname, int flags);
  • pathname: the path to the file you wish to open
  • flags: a bitwise OR of options specifying the behavior for opening the file
  • the return value is a file descriptor representing the opened file, or -1 on error

 

There are many possible flags (see the man page for the full list). You must include exactly one of the following flags:

  • O_RDONLY: read only

  • O_WRONLY: write only

  • O_RDWR: read and write

Another useful flag is O_TRUNC: if the file exists already, clear it ("truncate it").
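
As a small sketch of combining flags with bitwise OR, the function below empties an existing file by opening it with one access mode plus O_TRUNC. The name truncate_existing is a hypothetical helper chosen for illustration; it is not part of the lecture code.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

// Hypothetical helper: empties an existing file by opening it with O_TRUNC.
// Returns 0 on success, -1 if the file could not be opened.
int truncate_existing(const char *path) {
    // Exactly one access mode (O_WRONLY) OR'ed with an extra option (O_TRUNC).
    int fd = open(path, O_WRONLY | O_TRUNC);
    if (fd == -1) {
        perror("open");  // prints the reason stored in errno
        return -1;
    }
    close(fd);
    return 0;
}
```

Because O_CREAT is not included, open fails with -1 when the file does not exist, so this sketch never creates a new file.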

open()

A function that a program can call to open (and potentially create) a file:

int open(const char *pathname, int flags, mode_t mode);

You can also use open to create a new file if the specified file doesn't exist.  To do this, include O_CREAT as one of the flags.  You must also specify a third mode parameter.

  • mode: the permissions to attempt to set for a created file

 

Another useful flag here is O_EXCL: the file must be created from scratch, and open fails if the file already exists.

 

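Aside: how are there multiple signatures for open in C?  open is declared as a variadic function, int open(const char *pathname, int flags, ...), so the mode argument is only read when it is needed (e.g. when O_CREAT is given).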

File Descriptors

  • A file descriptor is like a "ticket number" representing your currently-open file.
  • It is a unique number assigned by the operating system to refer to that file
  • Each program has its own file descriptors
  • When you wish to refer to the file (e.g. read from it, write to it) you must provide the file descriptor.

close()

A function that a program can call to close a file when done with it.

int close(int fd);

It's important to close files when you are done with them to preserve system resources.

  • fd: the file descriptor you'd like to close.

 

You can use valgrind to check if you forgot to close any files.

Example: Creating A File

// Create the file
int fd = open("myfile.txt", O_WRONLY | O_CREAT | O_EXCL, 0644);

// Close the file now that we are done with it
close(fd);
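
The snippet above omits error checking. A fuller sketch, under the assumption that we want to report why creation failed, might look like the following; create_new_file is a hypothetical helper name, not a function from the lecture code.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

// Hypothetical helper: creates a brand-new, empty file with permissions 0644.
// Returns 0 on success, -1 on failure (including when the file already exists).
int create_new_file(const char *path) {
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1) {
        if (errno == EEXIST) {
            // O_EXCL makes open fail rather than reuse an existing file
            fprintf(stderr, "%s already exists\n", path);
        } else {
            perror("open");
        }
        return -1;
    }
    // Close the file now that we are done with it (close returns 0 or -1)
    return close(fd);
}
```

Calling it twice in a row on the same path should succeed the first time and fail the second time, since the file exists by then.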

read()

A function that a program can call to read bytes from an open file.

ssize_t read(int fd, void *buf, size_t count);
  • fd: the file descriptor for the file you'd like to read from
  • buf: the memory location where the read-in bytes should be put
  • count: the number of bytes you wish to read
  • The function returns -1 on error, 0 if at end of file, or otherwise the number of bytes that were read (a positive value)

 

Key idea: read may not read all the bytes you ask it to!  The return value tells you how many were actually read.

 

Key idea #2: the operating system keeps track of where in a file a file descriptor is reading from.  So the next time you read, it will resume where you left off.
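
Both key ideas can be seen in a read loop. The sketch below counts the bytes in a file by calling read repeatedly until it returns 0; count_bytes is a hypothetical helper name and the 1024-byte buffer size is an arbitrary choice.

```c
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: counts the bytes in a file by reading until end of file.
// Returns the byte count, or -1 on error.
long count_bytes(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;

    char buf[1024];
    long total = 0;
    while (1) {
        // Each call resumes where the previous one left off.
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n == -1) { close(fd); return -1; }  // error
        if (n == 0) break;                      // end of file
        total += n;                             // n bytes were actually read
    }
    close(fd);
    return total;
}
```

The loop never assumes that a call filled the whole buffer; it only trusts the return value.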

write()

A function that a program can call to write bytes to an open file.

ssize_t write(int fd, const void *buf, size_t count);
  • fd: the file descriptor for the file you'd like to write to
  • buf: the memory location storing the bytes that should be written
  • count: the number of bytes you wish to write from buf
  • The function returns -1 on error, or otherwise the number of bytes that were written

 

Key idea: write may not write all the bytes you ask it to!  The return value tells you how many were actually written.

 

Key idea #2: the operating system keeps track of where in a file a file descriptor is writing to.  So the next time you write, it will write to where you left off.
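
Since write may write fewer bytes than requested, a common pattern is to loop until everything has been written. This is a minimal sketch of that pattern; write_all is a hypothetical helper name, not a standard function.

```c
#include <stddef.h>
#include <unistd.h>

// Hypothetical helper: keeps calling write until all count bytes are written.
// Returns 0 on success, -1 on error.
int write_all(int fd, const void *buf, size_t count) {
    const char *bytes = buf;
    size_t written = 0;
    while (written < count) {
        // Ask for the remaining bytes, starting where the last write stopped.
        ssize_t n = write(fd, bytes + written, count - written);
        if (n == -1) return -1;
        written += (size_t)n;
    }
    return 0;
}
```

For example, write_all(STDOUT_FILENO, "hello\n", 6) prints hello to the terminal.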

Example: Copy

Let's write an example program copy that emulates the built-in cp command.  It takes in two command line arguments (file names) and copies the contents of the first file to the second.

copy-soln.c and copy-soln-full.c (with error checking)
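
For readers without access to the lecture code, here is one possible sketch of the core copy logic. It is not the course's copy-soln.c: copy_file is a hypothetical name, and this version assumes the destination should be overwritten if it already exists (O_TRUNC), which is a design choice rather than something the lecture specifies.

```c
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: copies the contents of src into dst, creating dst
// (or emptying it if it already exists). Returns 0 on success, -1 on error.
int copy_file(const char *src, const char *dst) {
    int in = open(src, O_RDONLY);
    if (in == -1) return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out == -1) { close(in); return -1; }

    char buf[1024];
    int status = 0;
    while (status == 0) {
        ssize_t nread = read(in, buf, sizeof(buf));
        if (nread == 0) break;                    // end of file
        if (nread == -1) { status = -1; break; }  // read error
        // write may accept fewer bytes than requested, so loop until done
        ssize_t nwritten = 0;
        while (nwritten < nread) {
            ssize_t n = write(out, buf + nwritten, (size_t)(nread - nwritten));
            if (n == -1) { status = -1; break; }
            nwritten += n;
        }
    }
    close(in);
    close(out);
    return status;
}
```

A main function for the copy program would check that it received two file names and then call copy_file(argv[1], argv[2]).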

Copying Files

File descriptors are just integers - for that reason, we can store and access them just like integers.

  • If you're interacting with many files, it may be helpful to have an array of file descriptors

 

There are 3 special file descriptors provided by default to each program:

  • 0: standard input (user input from the terminal) - STDIN_FILENO
  • 1: standard output (output to the terminal) - STDOUT_FILENO
  • 2: standard error (error output to the terminal) - STDERR_FILENO

Example: Copy Extended

Let's build on our copy.c program to add support for copying to multiple files, and have it also output the contents of the file to the terminal.
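
One way to approach this extension is to keep the output file descriptors in an array and write every chunk to each of them. The sketch below is an illustration under that assumption, not the lecture's solution; copy_to_all is a hypothetical name.

```c
#include <stddef.h>
#include <unistd.h>

// Hypothetical helper: reads everything from descriptor in and writes each
// chunk to every descriptor in outs. Returns 0 on success, -1 on error.
int copy_to_all(int in, const int outs[], size_t nouts) {
    char buf[1024];
    while (1) {
        ssize_t nread = read(in, buf, sizeof(buf));
        if (nread == 0) return 0;   // end of input
        if (nread == -1) return -1; // read error
        for (size_t i = 0; i < nouts; i++) {
            ssize_t nwritten = 0;
            while (nwritten < nread) {  // handle partial writes
                ssize_t n = write(outs[i], buf + nwritten,
                                  (size_t)(nread - nwritten));
                if (n == -1) return -1;
                nwritten += n;
            }
        }
    }
}
```

Because file descriptors are just integers, STDOUT_FILENO can be placed in outs alongside descriptors returned by open, which also sends the contents to the terminal.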

Operating System Data Structures

  • Linux maintains a data structure for each active process. These data structures are called process control blocks, and they are stored in the process table.
    • We'll explain exactly what a process is later in lecture
  • Process control blocks store many things (the user who launched the process, what time it was launched, CPU state, etc.).  Among the many items each one stores is the file descriptor table.
  • A file descriptor (used by your program) is a small integer that's an index into this table
    • Descriptors 0, 1, and 2 are standard input, standard output, and standard error, but there are no predefined meanings for descriptors 3 and up. When you run a program from the terminal, descriptors 0, 1, and 2 are most often bound to the terminal.

File Descriptor Table and File Descriptors

This diagram shows the "Process Control Blocks", handled by the OS. Specifically, it shows the "descriptor table for process ID 1000", "descriptor table for process ID 1001", and "descriptor table for process ID 1002". Each sub-diagram has an array from 0 to 9.
  • A file descriptor is the identifier needed to interact with a resource (most often a file) via system calls (e.g., read, write, and close)
  • A name has semantic meaning and an address denotes a location, but an identifier has no meaning
    • /etc/passwd vs. 34.196.104.129 vs. file descriptor 5
  • Many system calls allocate file descriptors
    • open: open a file
    • pipe: create a unidirectional byte stream with two descriptors (one read end, one write end) between processes
    • accept: accept a TCP connection request, returns descriptor to new socket
  • When allocating a new file descriptor, the kernel chooses the smallest available number
    • These semantics are important! If you close stdout (1) and then open a file, it will be assigned file descriptor 1 and so act as stdout (this is how $ cat in.txt > out.txt works)
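
The smallest-available-number rule can be observed without touching stdout. In this sketch (reuses_lowest_descriptor is a hypothetical name, and the program is assumed to be single-threaded), a descriptor is closed and the next open is checked to see whether it receives the same number.

```c
#include <fcntl.h>
#include <unistd.h>

// Hypothetical demo: returns 1 if a descriptor number freed by close is
// handed out again by the very next open, 0 if not, -1 on error.
int reuses_lowest_descriptor(const char *path) {
    int first = open(path, O_RDONLY);
    if (first == -1) return -1;
    int second = open(path, O_RDONLY);
    if (second == -1) { close(first); return -1; }

    close(first);                      // first is now the smallest free number
    int third = open(path, O_RDONLY);  // expected to receive that number
    if (third == -1) { close(second); return -1; }

    int reused = (third == first);
    close(second);
    close(third);
    return reused;
}
```

Shell redirection relies on the same behavior: once descriptor 1 is free, the next file opened takes its place.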

Creating and Using File Descriptors

  • E.g., a file table entry (for a regular file) keeps track of a current position in the file
    • If you read 1000 bytes, the next read will be from 1000 bytes after the preceding one
    • If you write 380 bytes, the next write will start 380 bytes after the preceding one
  • If you want multiple processes to write to the same log file and have the results be intelligible, then you have all of them share a single file table entry: their calls to write will be serialized and occur in some linear order

File Descriptor vs. File Table Entries

  • An entry in the file descriptor table is just a pointer to a file table entry
  • Multiple entries in a table can point to the same file table entry
  • Entries in different file descriptor tables (different processes!) can point to the same file table entry
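
A program can observe whether two descriptors share a file table entry by watching the cursor. The sketch below uses the dup system call, which is not covered in this lecture, to make a second descriptor for the same entry, and compares that with calling open twice. All function names here are hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

// Hypothetical demo helper: reads one byte through descriptor a, then one byte
// through descriptor b, and returns the byte seen through b (-1 on error).
static int byte_seen_by_second(int a, int b) {
    char c;
    if (read(a, &c, 1) != 1) return -1;
    if (read(b, &c, 1) != 1) return -1;
    return (unsigned char)c;
}

// dup makes a second descriptor for the same file table entry,
// so both descriptors share a single cursor.
int second_read_after_dup(const char *path) {
    int a = open(path, O_RDONLY);
    if (a == -1) return -1;
    int b = dup(a);
    int result = (b == -1) ? -1 : byte_seen_by_second(a, b);
    close(a);
    if (b != -1) close(b);
    return result;
}

// Two separate calls to open create two file table entries,
// so each descriptor has its own cursor.
int second_read_after_reopen(const char *path) {
    int a = open(path, O_RDONLY);
    if (a == -1) return -1;
    int b = open(path, O_RDONLY);
    int result = (b == -1) ? -1 : byte_seen_by_second(a, b);
    close(a);
    if (b != -1) close(b);
    return result;
}
```

For a file containing xy, the dup version sees y on the second read (shared cursor), while the reopen version sees x again (independent cursors).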
Almost the same as previous, with the addition of an expansion of one of the elements in process 1001. The expansion has an entry that reads: "mode: r, cursor: 0, refcount: 1, vnode: (blank)"
  • Each process maintains its own descriptor table, but there is one system-wide open file table. This allows file resources to be shared between processes, as we've seen.
  • As drawn above, descriptors 0, 1, and 2 in each of the three PCBs alias the same three open files. That's why each of the referenced table entries has a refcount of 3 instead of 1.
  • This shouldn't surprise you. If your bash shell calls make, which itself calls g++, each of them inserts text into the same terminal window: those three files could be stdin, stdout, and stderr for a terminal.

File Table Details

Almost the same as the previous diagram, except that there are many descriptor table entries. The idea is that file descriptor 0 from each descriptor table points to the same open file table entry, corresponding to STDIN. Likewise for descriptor 1 (all point to the same open file table entry, for STDOUT, which has a reference count of 3, because there are three processes that have STDOUT open), and for descriptor 2 (STDERR). Each descriptor table also has some entries (above 2) that point to open files that are not shared with other processes.

vnodes

  • The vnode is the kernel's abstraction of an actual file: it includes information on what kind of file it is, how many file table entries reference it, and function pointers for performing operations.
  • A vnode's interface is file-system independent, but its implementation is file-system specific; any file system (or file abstraction) can put state it needs to in the vnode (e.g., inode number)
  • The term vnode comes from BSD UNIX; in Linux source it's called a generic inode (CONFUSING!)
  • Each open file entry has a pointer to a vnode, which is a structure housing static information about a file or file-like resource.
Almost the same as the last image, except that one of the vnode entries is further expanded to show the details in the "vnode table". In particular: "type: regfile, refcount: 1, fnptrs ***, inode: 0644; 8:23pm; cgregg"

File Descriptors -> File Table -> vnode Table

  • There is one system-wide vnode table for the same reason there is one system-wide open file table. Independent file sessions reading from the same file don't need independent copies of the vnode. They can all alias the same one.
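
The vnode table itself is not visible to user programs, but its effect can be observed indirectly. As a rough sketch, the fstat system call (not covered in this lecture) reports the device and inode number behind a descriptor, so two independent opens of one file should report the same pair. The name opens_share_inode is hypothetical.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical demo: returns 1 if two independent opens of path refer to the
// same underlying file (same device and inode number), 0 if not, -1 on error.
int opens_share_inode(const char *path) {
    int a = open(path, O_RDONLY);
    if (a == -1) return -1;
    int b = open(path, O_RDONLY);
    if (b == -1) { close(a); return -1; }

    struct stat sa, sb;
    int result = -1;
    if (fstat(a, &sa) == 0 && fstat(b, &sb) == 0) {
        // Different descriptors and file table entries, same underlying file.
        result = (sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino);
    }
    close(a);
    close(b);
    return result;
}
```

This only shows that both sessions resolve to one file; how the kernel stores that shared information internally is an implementation detail.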
Almost the same as the last diagram, except now many of the open file table vnode entries are expanded for the vnode table.

Filesystem Data Structures

None of these kernel-resident data structures are visible to users. Note the filesystem itself is a completely different component, and that filesystem inodes of open files are loaded into vnode table entries. The yellow inode in the vnode is an in-memory replica of the yellow sliver of memory in the filesystem.

Recap

  • Interacting with the filesystem as users
  • Interacting with the filesystem as programmers
    • System calls
    • open() and close()
    • read() and write()
    • Practice: copying files
  • Operating system data structures

 

Next time: more details on system calls, and an introduction to multiprocessing