Jalex Chang

2021.07.31

Memory Management in High-Performance Go Applications:

A Case Study of Pebble

Jalex Chang

  • Staff Software Engineer @ Umbo Computer Vision
  • Gopher
  • Love software engineering, database systems, and distributed systems

 

Contact:

  • jalex.cpc @ gmail.com
  • jalex.chang @ Facebook
  • JalexChang @ GitHub

Agenda

  • Introduction

  • Pebble & CockroachDB

  • Memory Management Approaches in Pebble

  • Discussions

  • Summary

Introduction

  • In this talk, we introduce the memory management approaches in Pebble, a key-value data store written in Go.

  • Topics covered in this talk:

    • What is Pebble?

    • How does Pebble manage its memory usage efficiently?

    • What are the pros and cons of the approach?

    • What are the use cases of the approach?

  • Topics not covered in this talk:

    • Concurrency control

    • The details of data formats

      • WAL (write-ahead logging) and SSTable (Sorted String Table)

Go's memory allocation & management

  • Go's memory allocation and management mechanisms are complicated.

    • Examples include garbage collection (GC), a TCMalloc-inspired multi-layered memory allocator, and escape analysis.

  • Some shared topics in recent years

Go's memory management mechanisms are powerful enough in most cases, especially for web-based applications.

 

However, do they also meet the needs of high-performance applications, such as database systems?

 

To find the answer, I dug into a database system written in Go, CockroachDB, and its underlying storage engine, Pebble.

CockroachDB & Pebble

CockroachDB

  • CockroachDB is a distributed SQL database
    • Written in Go
    • NewSQL: designed for high scalability and strong consistency
    • Compatible with PostgreSQL wire protocol
    • Built on a key-value storage engine with LSM trees (log-structured merge-tree)
    • Developed by Cockroach Labs
    • Released under the Business Source License (BSL)
      • Free to use
      • Forbids offering CockroachDB as a service without purchasing a license

 

Original architecture layers (before 2020)

RocksDB is awesome, but......

  • RocksDB is written in C++.
  • Communication between CockroachDB and RocksDB goes through cgo.
  • Pain points
    • cgo adds around 70 ns of overhead per call.
    • Copying memory from C to Go has performance penalties.

Pebble

  • Pebble is a RocksDB-inspired key-value store
    • Written in Go
    • Inherits RocksDB's API and data format (uses an LSM tree as well)
    • Focuses on performance and internal usage by CockroachDB
    • Under BSD license
  • History
    • May 2020: introduced as an alternative storage engine to RocksDB in CockroachDB v20.1
    • Nov 2020: made the default storage engine in CockroachDB v20.2

 

Pebble vs RocksDB

Memory Management in Pebble

Significant memory usage sources

  • MemTable
    • The in-memory component of the LSM tree
    • Used for buffering data changes that
      • have been written to the WAL
      • have not been flushed to an SSTable
    • Mutable but append-only
  • Block Cache
    • An in-memory cache for uncompressed SSTable blocks
    • Adopts Clock-Pro for page replacement
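
To make the replacement idea concrete, here is a minimal sketch of a plain CLOCK (second-chance) cache. It is deliberately simpler than Clock-Pro and is not Pebble's implementation; the names clockCache, get, and set are hypothetical. In a cache backed by manually managed memory, the eviction point is where the evicted block's buffer would have to be released.

```go
package main

import "fmt"

// entry is one cached block (hypothetical type for this sketch).
type entry struct {
	key        string
	val        []byte
	referenced bool // set on access, cleared when the clock hand passes
}

// clockCache is a fixed-size cache with CLOCK (second-chance) eviction.
type clockCache struct {
	slots []*entry       // fixed-size ring of cache slots
	index map[string]int // key -> slot position
	hand  int            // next slot the clock hand examines
}

// newClockCache creates a cache; capacity must be greater than zero.
func newClockCache(capacity int) *clockCache {
	return &clockCache{slots: make([]*entry, capacity), index: make(map[string]int)}
}

// get returns the cached value and marks the entry as recently used.
func (c *clockCache) get(key string) ([]byte, bool) {
	i, ok := c.index[key]
	if !ok {
		return nil, false
	}
	c.slots[i].referenced = true
	return c.slots[i].val, true
}

// set stores a value, evicting the first unreferenced entry the hand finds.
func (c *clockCache) set(key string, val []byte) {
	if i, ok := c.index[key]; ok {
		c.slots[i].val, c.slots[i].referenced = val, true
		return
	}
	for {
		e := c.slots[c.hand]
		if e == nil || !e.referenced {
			if e != nil {
				// Eviction point: with manually managed memory, the
				// evicted block's buffer would be released here.
				delete(c.index, e.key)
			}
			c.slots[c.hand] = &entry{key: key, val: val}
			c.index[key] = c.hand
			c.hand = (c.hand + 1) % len(c.slots)
			return
		}
		e.referenced = false // give the entry a second chance
		c.hand = (c.hand + 1) % len(c.slots)
	}
}

func main() {
	c := newClockCache(2)
	c.set("a", []byte("block a"))
	c.set("b", []byte("block b"))
	c.get("a")                    // mark "a" as recently used
	c.set("c", []byte("block c")) // the cache is full, so a victim is needed
	_, ok := c.get("b")
	fmt.Println("b still cached:", ok)
}
```

Every eviction turns a block into garbage, which is why a busy cache on the Go heap keeps the collector occupied.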

Memory management problems in Pebble

  • MemTables and the Block Cache generate a large amount of heap garbage within seconds
    • MemTables are discarded after their data is flushed to disk.
    • Cached blocks are evicted to make room for other blocks.
  • The garbage puts heavy pressure on the GC
    • The pressure makes Pebble perform badly in extreme cases.
    • Unpredictable performance is unacceptable for a database system.
  • The developers tried to improve memory usage and GC behavior
    • Such as reusing the memory of cached blocks and tuning GOGC
    • It turned out that tuning the Go GC may solve some problems but create others......
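
The effect of short-lived buffers and of the GOGC setting can be observed with the standard runtime package. The following is a minimal sketch, not a benchmark from Pebble; the churn helper and the buffer sizes are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// sink keeps only the most recent buffer reachable, so every earlier
// buffer becomes garbage, much like discarded MemTables or evicted blocks.
var sink []byte

// churn allocates count buffers of size bytes and returns how many GC
// cycles completed while it ran.
func churn(count, size int) uint32 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for i := 0; i < count; i++ {
		sink = make([]byte, size)
	}
	runtime.ReadMemStats(&after)
	return after.NumGC - before.NumGC
}

func main() {
	// GOGC=100 (the default) starts a new cycle once the heap has grown
	// by 100% over the live heap left by the previous cycle.
	old := debug.SetGCPercent(100)
	defer debug.SetGCPercent(old)
	fmt.Println("GC cycles at GOGC=100:", churn(512, 1<<20))

	// A larger GOGC trades memory for fewer cycles.
	debug.SetGCPercent(400)
	fmt.Println("GC cycles at GOGC=400:", churn(512, 1<<20))
}
```

Raising GOGC reduces the number of cycles but lets the heap grow larger between them, which is one way tuning can fix one problem while creating another.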

Manual memory management in Pebble

  • Move the memory used by MemTables and the Block Cache out of the Go heap and allocate it with the C memory allocator instead.
  • The Go GC ignores C-allocated memory
    • Extra lifetime tracking is needed, or memory leaks may happen.

 

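// Source: pebble/internal/manual/manual.go
package manual

// #include <stdlib.h>
import "C"

import "unsafe"

// throw is runtime.throw, reached through go:linkname. Unlike panic, it
// cannot be recovered and always terminates the process.
//
//go:linkname throw runtime.throw
func throw(s string)

// New allocates n zeroed bytes with the C allocator. The Go GC does not
// track this memory, so the caller must release it with Free.
// MaxArrayLen is not shown on this slide; it is assumed to be a constant
// defined elsewhere in the same package.
func New(n int) []byte {
	if n == 0 {
		return make([]byte, 0)
	}
	ptr := C.calloc(C.size_t(n), 1)
	if ptr == nil {
		throw("out of memory")
	}
	// Interpret the C pointer as a pointer
	// to a Go array, then slice.
	return (*[MaxArrayLen]byte)(unsafe.Pointer(ptr))[:n:n]
}

// Free releases a slice previously returned by New.
func Free(b []byte) {
	if cap(b) != 0 {
		if len(b) == 0 {
			// Re-extend an empty slice so that &b[0] is addressable.
			b = b[:cap(b)]
		}
		ptr := unsafe.Pointer(&b[0])
		C.free(ptr)
	}
}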
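
Because the Go GC does not track C-allocated memory, something else has to decide when a buffer can be freed. One common way to do that is reference counting. The sketch below is a hypothetical illustration, not Pebble's code: the buffer type and its methods are made up, and a Go slice with a free callback stands in for manual.New and manual.Free so the example runs without cgo.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// buffer is a hypothetical handle to manually managed memory.
type buffer struct {
	data []byte       // stand-in for memory returned by manual.New
	refs int32        // number of current holders
	free func([]byte) // stand-in for manual.Free
}

// newBuffer allocates a buffer; the creator holds the first reference.
func newBuffer(n int, free func([]byte)) *buffer {
	return &buffer{data: make([]byte, n), refs: 1, free: free}
}

// acquire adds a reference for a new holder.
func (b *buffer) acquire() { atomic.AddInt32(&b.refs, 1) }

// release drops one reference and frees the memory when none remain.
func (b *buffer) release() {
	n := atomic.AddInt32(&b.refs, -1)
	if n < 0 {
		panic("buffer released more often than acquired")
	}
	if n == 0 {
		b.free(b.data)
		b.data = nil
	}
}

func main() {
	freed := 0
	b := newBuffer(64, func([]byte) { freed++ })

	b.acquire() // a reader takes its own reference
	b.release() // the creator is done, but the reader still holds one
	fmt.Println("free calls after first release:", freed)

	b.release() // the last holder is done, so the memory is returned
	fmt.Println("free calls after last release:", freed)
}
```

A holder that forgets to call release keeps the count above zero forever, which is exactly the leak scenario that manual lifetime tracking has to guard against.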

The journey of C-allocated memory

Write data

Flush MemTables

Read data

Trivial cases

  • Free related blocks in Block Cache when flushing MemTables

    • To avoid cache collision

  • Free related blocks in Block Cache when compacting SSTables

Observations

  • MemTable

    • Each MemTable owns a single, large block of C memory.

    • The block has a fixed size and is allocated up front, when the MemTable is created.

    • C memory is encapsulated in the MemTable.

  • Block Cache

    • Block Cache contains a huge amount of small C memory spaces.

    • Each space has a fixed size but is allocated on demand.

    • C memory can be accessed outside the Block Cache.

      • Spaces may leak if callers do not return them.
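
The MemTable pattern observed above (one fixed-size region, allocated once, filled append-only, released as a whole) can be sketched as a bump allocator. This is a simplified illustration under assumptions, not Pebble's arena: the arena type and its methods are hypothetical, and a Go slice stands in for C memory.

```go
package main

import (
	"errors"
	"fmt"
)

var errArenaFull = errors.New("arena full")

// arena is a hypothetical fixed-size, append-only memory region.
type arena struct {
	buf []byte // allocated once, up front; never grows
	off int    // offset of the next free byte
}

// newArena reserves the whole region at creation time.
func newArena(size int) *arena {
	return &arena{buf: make([]byte, size)}
}

// alloc reserves n bytes by advancing the offset. The returned slice has
// its capacity capped so that appends cannot spill into later allocations.
func (a *arena) alloc(n int) ([]byte, error) {
	if n < 0 || n > len(a.buf)-a.off {
		return nil, errArenaFull
	}
	p := a.buf[a.off : a.off+n : a.off+n]
	a.off += n
	return p, nil
}

// release drops the whole region at once; individual allocations are
// never freed on their own.
func (a *arena) release() { a.buf, a.off = nil, 0 }

func main() {
	a := newArena(16)
	for _, kv := range []string{"k1=v1", "k2=v2", "k3=value-too-long"} {
		p, err := a.alloc(len(kv))
		if err != nil {
			fmt.Println("arena full; a MemTable would be flushed at this point")
			break
		}
		copy(p, kv)
	}
	fmt.Println("bytes used:", a.off)
	a.release()
}
```

Since the region is owned by a single object and released in one step, its lifetime is much easier to track than that of the many small spaces handed out by the Block Cache.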

Memory leak detection

  • To detect memory leaks in the Block Cache, runtime.SetFinalizer is used in Pebble's tests and development builds.

    • A finalizer is a function associated with an object.

    • The finalizer runs once the object is no longer reachable, that is, when the GC is about to collect it.

// source: pebble/internal/cache/value_invariants.go
func newValue(n int) *Value {
	b := manual.New(n)
	v := &Value{buf: b}
	// Note: this is a no-op if invariants and tracing are disabled or race is
	// enabled.
	invariants.SetFinalizer(v, func(obj interface{}) {
		v := obj.(*Value)
		if v.buf != nil {
			fmt.Fprintf(os.Stderr, "%p: cache value was not freed: refs=%d\n%s",
				v, v.refs(), v.ref.traces())
			os.Exit(1)
		}
	})
	return v
}
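
The same idea can be tried without Pebble's internal packages. The following standalone sketch uses only the standard library; the tracked type, the leaks counter, and the waitForLeaks helper are hypothetical, and the sketch counts leaks instead of exiting the process. Finalizers run at some point after a collection, so the helper polls rather than assuming they run immediately.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// leaks counts values that became unreachable before being released.
var leaks int32

// tracked is a hypothetical handle to manually managed memory.
type tracked struct {
	buf []byte // stand-in for memory returned by manual.New
}

// newTracked allocates a value and attaches a leak-detecting finalizer.
func newTracked(n int) *tracked {
	v := &tracked{buf: make([]byte, n)}
	runtime.SetFinalizer(v, func(v *tracked) {
		if v.buf != nil {
			// The value is being collected but was never released.
			atomic.AddInt32(&leaks, 1)
		}
	})
	return v
}

// release marks the memory as returned (manual.Free would be called here).
func (v *tracked) release() { v.buf = nil }

// waitForLeaks forces collections until at least want leaks have been
// reported or the timeout expires, and returns the number seen.
func waitForLeaks(want int32, timeout time.Duration) int32 {
	deadline := time.Now().Add(timeout)
	for atomic.LoadInt32(&leaks) < want && time.Now().Before(deadline) {
		runtime.GC()
		time.Sleep(10 * time.Millisecond)
	}
	return atomic.LoadInt32(&leaks)
}

func main() {
	newTracked(16).release() // released correctly, so nothing to report
	_ = newTracked(16)       // dropped without release
	fmt.Println("leaks detected:", waitForLeaks(1, 2*time.Second))
}
```

Finalizers are not guaranteed to run before the program exits, which is one reason to treat this as a development and testing aid rather than a production safety net.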

Discussion

Q1: What are the pros and cons of the manual memory management approach?

 

  • Pros
    • No need to tune the Go GC anymore
    • Better and more predictable application performance
  • Cons
    • The lifetime of C memory must be tracked manually and carefully
    • Memory leaks may happen

Discussion

Q2: What are the use cases of the manual memory management approach? 

 

  • Applications having a high GC pressure.
  • Applications with heavy disk I/O buffering.
  • Applications whose data is naturally represented as bytes or pointers.

Takeaways

In this talk, we have introduced Pebble and its memory management approach.

  • To avoid high pressure on the Go GC, Pebble uses the C memory allocator for its most significant memory consumers: MemTable and Block Cache.

  • To avoid memory leaks:

    • Manual lifetime tracking is needed.

    • A finalizer is used during testing and development.

Through the case study of Pebble, we have learned:

  • Although Go's memory management mechanisms are powerful, they may still fall short of the requirements of the most demanding high-performance applications.

  • Manual memory management is a possible, though risky, alternative.

Thanks for listening.