CS110: Principles of Computer Systems

Autumn 2021
Jerry Cain

Principles of System Design

  • Let's take a step back and look at the big picture.
    • This class is about implementation-driven lectures. The code teaches the material.
    • However, you also need to walk away from this course with an understanding of the basic design principles that guide the implementation of large systems, be they file systems, RSS feed aggregators, shells, or proxies.
    • An understanding of and appreciation for these principles will help you make better design and implementation decisions should you take our more advanced systems courses.
      • CS140: Operating Systems, which has you design and implement processes, threads, virtual memory, and a much more robust filesystem than what you were charged with implementing for Assignment 2.
        • CS110 emphasizes the client use of processes, threads, concurrency directives, and so forth. CS140 is all about implementing them.
      • CS143: Compiler Construction, which has you implement a pipeline of components that ultimately translates Java-like programs into an equivalent stream of assembly code instructions.
      • CS144: Computer Networking, where you study how computer networks (the Internet in particular) are designed and implemented.
        • CS110 emphasizes the client use of sockets and the socket API functions as a vehicle for building networked applications. CS144 is all about understanding, and in some cases implementing, the various network layers to allow those functions to work and to work well.


These slides were constructed by Jerry Cain. They are the product of many conversations with Professors Mendel Rosenblum (CS110, CS140, CS142 instructor) and Phil Levis (CS107E, CS110, CS144 instructor).

Principles of System Design

  • Principles of System Design: CS110 touches on seven such principles
    • Abstraction
    • Modularity and Layering
    • Naming and Name Resolution
    • Caching
    • Virtualization
    • Concurrency
    • Client-server request-and-response

Principles of System Design

  • Principles of System Design
    • Abstraction
      • Separating behavior from implementation (e.g. sort has one interface, but many different implementations).
      • Defining a clean interface that makes a library much easier to use.
      • Examples of abstractions we've taken for granted (or will soon take for granted) this quarter in CS110:
        • filesystems (you've dealt with C FILE *s and C++ iostreams for a while now, and knew little of how they might work until we studied them this quarter). We did learn about file descriptors this quarter, and we leverage that abstraction to make other data sources (e.g. networked servers) look and behave like files.
        • processes (you know how to fork off new processes now, even though you have no idea how fork and execvp work).
        • signals (you know they're used by the kernel to message a process that something significant occurred, but you don't know how they're implemented).
        • threads (you know how to create C++ threads, but you don't really know how they're implemented).
        • HTTP (you're just now learning the protocol used to exchange text documents, images, audio files, etc.).
    • Modularity and Layering
    • Naming and Name Resolution
    • Caching
    • Virtualization
    • Concurrency
    • Client-server request-and-response
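
To make the file descriptor abstraction concrete, here is a minimal sketch, assuming a POSIX system. The helper names writeAll and readAll are hypothetical and not part of any course library. Neither function knows whether the descriptor it is handed refers to a regular file, a pipe, or a socket, which is the point of the abstraction.

```cpp
#include <string>
#include <unistd.h>

// Hypothetical helper: writes all of `data` to `fd`, looping over partial writes.
// Returns false if write() fails or makes no progress.
// Simplification: an interrupted call (EINTR) is treated as a failure.
bool writeAll(int fd, const std::string& data) {
  size_t written = 0;
  while (written < data.size()) {
    ssize_t n = write(fd, data.data() + written, data.size() - written);
    if (n <= 0) return false;
    written += static_cast<size_t>(n);
  }
  return true;
}

// Hypothetical helper: reads from `fd` until end-of-file (or an error)
// and returns everything that was read.
std::string readAll(int fd) {
  std::string result;
  char buffer[256];
  ssize_t n;
  while ((n = read(fd, buffer, sizeof(buffer))) > 0) {
    result.append(buffer, static_cast<size_t>(n));
  }
  return result;
}
```

The same two functions could be handed the ends of a pipe, a descriptor returned by open, or a connected socket, and the calling code would not change.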

Principles of System Design

  • Principles of System Design
    • Abstraction
    • Modularity and Layering
    • Naming and Name Resolution
      • We've covered these two principles at length already, because they were very relevant to filesystems and easily discussed during Week 1.
    • Caching
    • Virtualization
    • Concurrency
    • Client-server request-and-response

Principles of System Design

  • Principles of System Design
    • Abstraction
    • Modularity and Layering
    • Naming and Name Resolution
    • Caching
      • Simply stated, a cache is a hardware or software component that remembers recently generated results so that future requests for the same data can be handled more quickly.
      • Examples of basic address-based caches (as taught in CS107 and CS107E)
        • L1-level instruction and data caches that serve as a staging area for CPU registers.
        • L2-level caches that serve as a staging area for L1 caches.
        • A portion of main memory—the portion not backing the virtual address spaces of active processes—used as a disk cache to store pages of file payload.
      • Examples of caches to store results of repeated (often expensive) calculations:
        • Web browsers that cache recently fetched documents when the server says the documents can be cached.
        • Web proxies that cache static resources so other clients requesting that data can be served more quickly.
        • DNS caches, which hold a mapping of recently resolved domain names to their IP addresses.
        • memcached, which maintains a dictionary of objects frequently used to generate web content.
    • Virtualization
    • Concurrency
    • Client-server request-and-response
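
The idea of remembering recent results can be sketched as a small software cache, in the spirit of the DNS cache mentioned above. The class name LRUCache and the least-recently-used eviction policy are assumptions made for illustration. Real caches differ in their eviction and expiration rules.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cache mapping string keys to string values,
// holding at most `capacity` entries.
class LRUCache {
 public:
  explicit LRUCache(size_t capacity) : capacity_(capacity) {}

  // On a hit, copies the cached value into `value`, marks the entry as
  // most recently used, and returns true. Returns false on a miss.
  bool get(const std::string& key, std::string& value) {
    auto found = index_.find(key);
    if (found == index_.end()) return false;
    entries_.splice(entries_.begin(), entries_, found->second);
    value = found->second->second;
    return true;
  }

  // Stores `value` under `key`. If the cache is full, the least
  // recently used entry is evicted first.
  void put(const std::string& key, const std::string& value) {
    auto found = index_.find(key);
    if (found != index_.end()) {
      found->second->second = value;
      entries_.splice(entries_.begin(), entries_, found->second);
      return;
    }
    if (capacity_ == 0) return;
    if (entries_.size() == capacity_) {
      index_.erase(entries_.back().first);
      entries_.pop_back();
    }
    entries_.emplace_front(key, value);
    index_[key] = entries_.begin();
  }

 private:
  using Entry = std::pair<std::string, std::string>;
  size_t capacity_;
  std::list<Entry> entries_;  // front is the most recently used entry
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

A caller would try get first and fall back to the expensive operation (a DNS lookup, a fetch from the origin server) only on a miss, then put the result so the next request is served from memory.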

Principles of System Design

  • Principles of System Design
    • Abstraction
    • Modularity and Layering
    • Naming and Name Resolution
    • Caching
    • Virtualization
      • Virtualization is an abstraction mechanism used to make many resources look like one. Examples include:
        • RAID, which aggregates many typically inexpensive storage devices to behave as a single hard drive.
        • the Andrew File System, which grafts many independent, networked file systems into one rooted at /afs.
        • a web server load balancer, where hundreds, thousands, or even tens of thousands of servers are fronted by a smaller set of machines in place to intercept all requests and forward them to the least loaded server.
      • Virtualization is also an abstraction mechanism used to make one resource look like many. Examples include:
        • virtual-to-physical memory mappings, which allow each process to believe it owns all of memory.
        • threads, where a process's stack segment is subdivided into multiple stacks, one per thread, so that multiple threads of execution can be rotated through in much the same way a scheduler rotates through multiple processes.
        • virtual machines, which are software implementations designed to execute programs as a physical machine would. VMs can do something as small as provide a runtime for a Java executable, or they can do as much as run several different operating systems on an architecture that otherwise couldn't support them.
    • Concurrency
    • Client-server request-and-response
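
The "many resources look like one" direction can be illustrated with block striping, the address arithmetic behind a RAID 0 style array. This is a sketch under stated assumptions: the names PhysicalLocation and locate are hypothetical, the stripe unit is one block, and redundancy (mirroring, parity) is ignored.

```cpp
#include <cassert>
#include <cstddef>

// Where a logical block actually lives in the array (hypothetical type).
struct PhysicalLocation {
  size_t disk;   // index of the underlying disk
  size_t block;  // block index within that disk
};

// Maps a block number on the single virtual drive to a location on one of
// `numDisks` physical disks by spreading consecutive blocks round-robin.
PhysicalLocation locate(size_t logicalBlock, size_t numDisks) {
  assert(numDisks > 0);
  return PhysicalLocation{logicalBlock % numDisks, logicalBlock / numDisks};
}
```

Client code only ever names logical blocks, so it sees one large drive. With four disks, for example, logical block 5 lands on disk 1 at block 1.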

Principles of System Design

  • Principles of System Design
    • Abstraction
    • Modularity and Layering
    • Naming and Name Resolution
    • Caching
    • Virtualization
    • Concurrency
      • We have a good amount of experience with concurrency already:
        • Multiple processes running on a single processor, seemingly at the same time.
        • Multiple threads running within a single process, seemingly at the same time.
      • When multiple processors and/or multiple cores are available, processes can truly run in parallel, and threads within a single process can run in parallel.
      • Signal and interrupt handlers are also technically concurrent. Program execution occasionally needs to be halted to receive information from an external source (the OS, the file system, the keyboard, or the Internet).
      • Some programming languages—Erlang comes to mind—are so inherently concurrent that they adopt a programming model making race conditions much less likely. Other languages—JavaScript comes to mind—take the stance that concurrency, or at least multithreading to the extent we've studied it, was too complicated and error-prone to support until very, very recently.
    • Client-server request-and-response
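
As a reminder of what multiple threads within a single process look like in code, here is a minimal sketch using C++ threads and a mutex. The function name countConcurrently is hypothetical. The lock is what prevents the race condition on the shared counter.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Spawns `numThreads` threads that each add 1 to a shared counter
// `incrementsPerThread` times, then returns the final count.
int countConcurrently(int numThreads, int incrementsPerThread) {
  int counter = 0;
  std::mutex counterLock;
  std::vector<std::thread> workers;
  for (int i = 0; i < numThreads; i++) {
    workers.emplace_back([&counter, &counterLock, incrementsPerThread] {
      for (int j = 0; j < incrementsPerThread; j++) {
        // Only one thread at a time may touch the counter.
        std::lock_guard<std::mutex> guard(counterLock);
        counter++;
      }
    });
  }
  for (std::thread& worker : workers) worker.join();
  return counter;
}
```

Without the lock_guard, the unsynchronized counter++ would be a data race, and the returned total could fall short of numThreads times incrementsPerThread.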

Principles of System Design

  • Principles of System Design
    • Abstraction
    • Modularity and Layering
    • Naming and Name Resolution
    • Caching
    • Virtualization
    • Concurrency
    • Client-server request-and-response
      • Request/response is a way to organize functionality into modules that have a clear set of responsibilities.
      • We've already had some experience with the request-and-response aspect of this.
        • system calls: open, write, fork, sleep, bind, etc. are userland wrappers around a special type of function call into the kernel. User space and kernel space are two separate modules with a hard boundary separating them.
        • HTTP, IMAP, DNS
        • NFS, AFS
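
The request-and-response pattern can be sketched with the first line of an HTTP request. This is an illustration only: the names RequestLine, parseRequestLine, and buildResponse are hypothetical, headers and trailing "\r" handling are omitted, and a real server would read the request from a socket rather than from a string.

```cpp
#include <sstream>
#include <string>

// The three whitespace-separated tokens of an HTTP request line (hypothetical type).
struct RequestLine {
  std::string method;
  std::string path;
  std::string protocol;
};

// Splits a line such as "GET /index.html HTTP/1.1" into its three tokens.
// Returns false if any of the three is missing.
bool parseRequestLine(const std::string& line, RequestLine& request) {
  std::istringstream stream(line);
  return static_cast<bool>(stream >> request.method >> request.path >> request.protocol);
}

// Builds a minimal response: status line, one header, a blank line, then the payload.
std::string buildResponse(const RequestLine& request) {
  std::string body = "You asked for " + request.path + "\n";
  std::ostringstream out;
  out << "HTTP/1.0 200 OK\r\n"
      << "Content-Length: " << body.size() << "\r\n"
      << "\r\n"
      << body;
  return out.str();
}
```

The client's responsibility ends at composing a well-formed request, and the server's ends at producing a well-formed response. Neither side needs to know how the other is implemented.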