Reading Large Codebases

Your Programming Experience So Far

We give you

A program specification (i.e. what should this program do?)
A test suite/test cases
Some skeleton code

You give us

A few hundred lines of code that pass the tests. You write almost all of this code.

In Practice

Code is written once but read many times.

A lot of programming time is spent understanding existing code and how to interface new code with it.

Example: how do I get the standard library HashMap to do (_____)?

Is it possible to get this programming framework to do (________) or do I have to do it myself?

Reading (existing) code is hard

Why?

Existing code has context.

Some goal it's trying to accomplish
Things that it assumes are true
An implicit choice of an appropriate algorithm (e.g. linear vs. binary search)
A decomposition of the problem into smaller pieces
Not all of this is evident in the code!!!
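To make the linear-vs-binary-search point concrete, here is a minimal sketch. The function names are our own. Both functions have the same signature, yet `binary_search` is only correct if the array is sorted in ascending order. Nothing in the code's interface states that assumption; it is context the reader has to supply.

```c
#include <stddef.h>

/* Linear search: makes no assumption about element order; O(n).
   Returns the index of key, or -1 if it is absent. */
ptrdiff_t linear_search(const int *a, size_t n, int key) {
  for (size_t i = 0; i < n; ++i) {
    if (a[i] == key) return (ptrdiff_t)i;
  }
  return -1;
}

/* Binary search: O(log n), but silently ASSUMES a[0..n) is sorted in
   ascending order. Returns an index of key, or -1 if not found. */
ptrdiff_t binary_search(const int *a, size_t n, int key) {
  size_t lo = 0, hi = n;            /* search the half-open range [lo, hi) */
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2; /* avoids overflow of lo + hi */
    if (a[mid] < key) lo = mid + 1;
    else if (a[mid] > key) hi = mid;
    else return (ptrdiff_t)mid;
  }
  return -1;
}
```

Suppose the sortedness assumption is violated, as in the unsorted array {9, 1, 5}. Then `binary_search` reports that 9 is missing, while `linear_search` finds it at index 0.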

Reading code is hard

void traverse(Tree* root, Function* fptr){
  for(int i = 0; i < root->numChildren; ++i){
    traverse(root->children[i], fptr);
  }
  (*fptr)(root);  /* call the function that fptr points to */
}
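The snippet only makes sense once you know what `Tree` and `Function` are, and that is exactly the missing context. Below is one hypothetical context under which it compiles. It assumes `Tree` has the `numChildren` and `children` fields the snippet uses, and that `Function` is a function type taking a `Tree *`. The `value` field is an addition for illustration only.

```c
#include <stddef.h>

/* Hypothetical context: a tree node holding an array of child pointers. */
typedef struct Tree {
  int numChildren;        /* number of entries in children */
  struct Tree **children; /* array of numChildren child pointers */
  int value;              /* payload, for illustration only */
} Tree;

/* Function is a function type; Function* is a pointer to such a function. */
typedef void Function(Tree *node);

/* Post-order traversal: visit every child subtree, then the node itself. */
void traverse(Tree *root, Function *fptr) {
  for (int i = 0; i < root->numChildren; ++i) {
    traverse(root->children[i], fptr);
  }
  (*fptr)(root); /* equivalently: fptr(root); */
}
```

With this context in hand, it becomes clear that the function visits children before their parent. A tree with root 3 and leaves 1 and 2 is therefore visited in the order 1, 2, 3.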

Software is

BIG

UTCSH is a big project

Most submissions came in between 400 and 1000 lines of non-comment, non-blank C.

[Figure: relative code size, UTCSH]

Pintos is a Big project

Pintos has 11.5k lines of C (13.5k if you count headers)

Almost 10x as much code!

[Figure: relative code size, UTCSH vs. Pintos]

GCC is a BIG project

About 6 million lines of code (roughly 500x Pintos)

[Figure (not to scale): relative code size, UTCSH vs. Pintos vs. GCC]

Linux is....

28 million lines of code and counting

[Figure: relative code size, UTCSH vs. Pintos vs. GCC vs. Linux]

The techniques you can use to understand and work with a 1k LOC project will not scale to 1M LOC!

In other words, they won't scale to real software projects!

Big Ideas

You cannot hold every piece of the codebase in your head at once.


Corollary: Knowing how to find something can be almost as valuable as knowing it

Understand your system

/* Post-order traversal: visit each child subtree, then apply fptr to root. */
void traverse(Tree* root, Function* fptr){
  for(int i = 0; i < root->numChildren; ++i){
    traverse(root->children[i], fptr);
  }
  (*fptr)(root);
}

Always have a goal (a question you want to answer)

A good question usually has a few key properties:

The answer can be written down in just a few sentences
Gives you some insight into the context of the system

What's in this code?

Do I need to rewrite this function?

How does this function relate to these data structures?

What does this function assume about its inputs? Are these assumptions valid?

I want to know more about struct Block

I want to know how each of struct Block's members is used in various functions
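One way to record the answer to "What does this function assume about its inputs?" is to write each assumption down as an assertion. Everything in the sketch below is hypothetical and invented only for illustration: the `struct Block` layout, its fields (`size`, `used`, `next`), and the helper `block_free_bytes`. None of it comes from a real codebase.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct, for illustration only. */
struct Block {
  size_t size;         /* total bytes in the block */
  size_t used;         /* bytes currently in use; expected to be <= size */
  struct Block *next;  /* next block in some list; unused here */
};

/* Hypothetical helper. Its implicit assumptions, written down as asserts:
   b is non-NULL, and b->used never exceeds b->size (otherwise the
   unsigned subtraction below would wrap around). */
size_t block_free_bytes(const struct Block *b) {
  assert(b != NULL);
  assert(b->used <= b->size);
  return b->size - b->used;
}
```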

Jump In

(Once you're ready)

Reading code is hard

This means it's easy to fool yourself into thinking it's much trickier than it is

Example: Linked Lists
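The linked-list example itself is not reproduced here, so what follows is one hypothetical illustration. Intrusive linked lists, the style Pintos uses in its kernel list library, often look intimidating on a first read because of the pointer arithmetic, but the underlying idea is small. The names below are illustrative and only loosely modeled on that style; this is not Pintos's actual API.

```c
#include <stddef.h>

/* Hypothetical intrusive list link: it holds only the link, no data. */
struct list_elem {
  struct list_elem *next;
};

/* Given a pointer to a list_elem embedded in some struct, recover a
   pointer to the enclosing struct by subtracting the member's byte
   offset from the member's address. */
#define LIST_ENTRY(ELEM, STRUCT, MEMBER) \
  ((STRUCT *)((char *)(ELEM) - offsetof(STRUCT, MEMBER)))

/* An example payload type with an embedded link. */
struct item {
  int value;
  struct list_elem elem;
};

/* Walk the list starting at head and add up every item's value. */
int sum_items(struct list_elem *head) {
  int sum = 0;
  for (struct list_elem *e = head; e != NULL; e = e->next) {
    struct item *it = LIST_ENTRY(e, struct item, elem);
    sum += it->value;
  }
  return sum;
}
```

Once you see that `LIST_ENTRY` is just "address of the member minus the member's offset", `sum_items` reads as an ordinary walk down a list.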

By Kevin Song
