GC Essentials

Diego Parra


@diegolparra
@dpsoft
Lucas Amoroso


@_lucasamoroso
@lucasamoroso
Outline


Automatic Memory Management

GC Building Blocks (Algorithms)

Generational Hypothesis

Generational Collectors

JVM Collectors

Intro to Concurrent Collectors
Intro
What's memory management?
Memory management is the process of controlling and coordinating how a software application accesses computer memory
Memory management
Safety First: objects are freed automatically once no reachable pointers to them remain (Python, JavaScript, Ruby, Java, C#, Haskell and Go)
Control First: your program's memory consumption is entirely in your hands (C, C++)
Ownership: memory is managed through a system of ownership with a set of rules that the compiler checks at compile time (Rust)
Automatic Memory Management

GC - Building blocks
Reachability




GC Roots
- Local variables
- Active threads
- Static fields
Reachable Objects
- Objects that can be reached by following references from a GC root
Unreachable Objects
- Objects that can no longer be reached from any GC root (garbage)
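
As a rough illustration of reachability, the sketch below walks a toy object graph starting from the GC roots. The heap layout (a dict of object names) and the function name `reachable` are assumptions made for this example only, not how any real collector stores objects.

```python
from collections import deque

def reachable(roots, references):
    """Return the set of objects reachable from the GC roots."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        obj = queue.popleft()
        for child in references.get(obj, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

# Hypothetical heap: object name -> names of the objects it references.
heap = {
    "a": ["b"],
    "b": ["c"],
    "c": [],
    "d": ["e"],   # d and e reference each other, but no root reaches them
    "e": ["d"],
}
roots = ["a"]     # e.g. a local variable pointing at "a"

live = reachable(roots, heap)
garbage = set(heap) - live
print(sorted(live))     # ['a', 'b', 'c']
print(sorted(garbage))  # ['d', 'e']
```

Note that "d" and "e" are still referenced (by each other) yet count as garbage, because what matters is reachability from a root.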
Direct Collectors
Reference Counting

The simplest form of Garbage Collection

Count the number of references from live objects

Each object has a Reference Counter (RC)

An object is presumed live iff its RC > 0
An object can be reclaimed when its RC == 0

The RC is incremented/decremented when a reference is copied/deleted
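
The rules above can be sketched with a toy heap. The class `RCHeap` and its method names are hypothetical, chosen for this example; the sketch only assumes that copying a reference increments the target's RC and deleting one decrements it.

```python
class RCHeap:
    """Toy heap where every object carries a reference counter (RC)."""

    def __init__(self):
        self.rc = {}        # object name -> reference count
        self.fields = {}    # object name -> names of objects it references
        self.freed = []     # objects reclaimed so far, in order

    def new(self, name):
        self.rc[name] = 0
        self.fields[name] = []

    def add_ref(self, target):
        # A reference to `target` was copied: increment its RC.
        self.rc[target] += 1

    def del_ref(self, target):
        # A reference to `target` was deleted: decrement its RC.
        self.rc[target] -= 1
        if self.rc[target] == 0:
            self._reclaim(target)

    def link(self, source, target):
        # Store a reference to `target` in a field of `source`.
        self.fields[source].append(target)
        self.add_ref(target)

    def _reclaim(self, name):
        # Freeing an object also deletes the references it holds.
        children = self.fields.pop(name)
        del self.rc[name]
        self.freed.append(name)
        for child in children:
            self.del_ref(child)


heap = RCHeap()
for name in ("a", "b", "x", "y"):
    heap.new(name)

heap.add_ref("a")       # a root (e.g. a local variable) points at a
heap.link("a", "b")     # a -> b
heap.add_ref("x")       # a root points at x
heap.link("x", "y")     # x -> y
heap.link("y", "x")     # y -> x, forming a cycle

heap.del_ref("a")       # root dropped: a, then b, are reclaimed
heap.del_ref("x")       # root dropped: x keeps RC 1 because of y

print(heap.freed)       # ['a', 'b']
print(heap.rc)          # {'x': 1, 'y': 1}
```

The last two lines show a well-known limitation of plain reference counting: "x" and "y" are unreachable, but their cycle keeps both counters above zero.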


Tracing Collectors
Mark-Sweep

Indirect Collection algorithm
Non-moving collector
Collection operates in two phases: mark, then sweep

Stop-the-World mode

Heap tends to become fragmented
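
A minimal sketch of the two phases, assuming a toy heap modelled as a list of slots where each object lists the slot indexes it references. The layout and the names `mark` and `sweep` are illustrative only.

```python
def mark(heap, roots):
    """Phase 1: return the set of slot indexes reachable from the roots."""
    marked = set()
    stack = list(roots)
    while stack:
        slot = stack.pop()
        if slot in marked:
            continue
        marked.add(slot)
        stack.extend(heap[slot]["refs"])
    return marked

def sweep(heap, marked):
    """Phase 2: free every occupied slot that was not marked; nothing moves."""
    for slot, obj in enumerate(heap):
        if obj is not None and slot not in marked:
            heap[slot] = None

# Hypothetical heap: each slot holds an object and the slots it references.
heap = [
    {"name": "a", "refs": [2]},
    {"name": "b", "refs": []},     # unreachable
    {"name": "c", "refs": [4]},
    {"name": "d", "refs": [1]},    # unreachable
    {"name": "e", "refs": []},
]
roots = [0]

sweep(heap, mark(heap, roots))
layout = [obj["name"] if obj else "-" for obj in heap]
print(layout)   # ['a', '-', 'c', '-', 'e']
```

Because live objects stay where they are, the free slots end up scattered between them, which is the fragmentation mentioned above.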
Mark-Compact

Indirect Collection algorithm
Moving collector
Collection needs multiple passes over live objects

Avoids fragmentation

May rearrange objects in the heap

Slower than Mark-Sweep
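
One way to picture the multiple passes is a sliding compaction over a toy heap, sketched below. The three-pass structure (compute new addresses, update references, move objects) is one common variant and is an assumption here, as are the slot-based heap layout and the function names.

```python
def mark(heap, roots):
    """Return the set of slot indexes reachable from the roots."""
    marked, stack = set(), list(roots)
    while stack:
        slot = stack.pop()
        if slot not in marked:
            marked.add(slot)
            stack.extend(heap[slot]["refs"])
    return marked

def compact(heap, roots):
    """Slide live objects to the start of the heap; return the updated roots."""
    marked = mark(heap, roots)

    # Pass 1: compute a forwarding address for every live object.
    forward, free = {}, 0
    for slot in range(len(heap)):
        if slot in marked:
            forward[slot] = free
            free += 1

    # Pass 2: rewrite references (and roots) to use the new addresses.
    for slot in marked:
        heap[slot]["refs"] = [forward[r] for r in heap[slot]["refs"]]
    new_roots = [forward[r] for r in roots]

    # Pass 3: move objects in address order, then clear the tail of the heap.
    for slot in sorted(marked):
        heap[forward[slot]] = heap[slot]
    for slot in range(free, len(heap)):
        heap[slot] = None
    return new_roots

# Hypothetical heap: each slot holds an object and the slots it references.
heap = [
    {"name": "a", "refs": [2]},
    {"name": "b", "refs": []},     # unreachable
    {"name": "c", "refs": [4]},
    {"name": "d", "refs": [1]},    # unreachable
    {"name": "e", "refs": []},
]

roots = compact(heap, [0])
layout = [obj["name"] if obj else "-" for obj in heap]
print(layout)            # ['a', 'c', 'e', '-', '-']
print(heap[0]["refs"])   # [1]
```

All free space ends up in one contiguous block, at the cost of extra passes over the live objects.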

Copying Garbage Collection

Memory is divided into two equal-size regions
Moving collector
Requires only a single pass over the live objects


Compacted Heap

Possible locality benefits on large heaps

Space overhead: only one of the two regions holds objects at a time
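
The single pass can be sketched with a Cheney-style scan: objects are copied into the empty region and then scanned in the order they were copied. The list-based "from-space" and "to-space" and the helper `evacuate` are assumptions made for this toy example.

```python
def copy_collect(from_space, roots):
    """Copy live objects into a fresh to-space and return it with new roots."""
    to_space = []
    forward = {}     # from-space index -> to-space index

    def evacuate(slot):
        # Copy the object on first visit; later visits reuse its new address.
        if slot not in forward:
            forward[slot] = len(to_space)
            obj = from_space[slot]
            to_space.append({"name": obj["name"], "refs": list(obj["refs"])})
        return forward[slot]

    new_roots = [evacuate(r) for r in roots]

    # Single pass: scan copied objects, evacuating whatever they reference.
    scan = 0
    while scan < len(to_space):
        obj = to_space[scan]
        obj["refs"] = [evacuate(r) for r in obj["refs"]]
        scan += 1
    return to_space, new_roots

# Hypothetical from-space: each slot holds an object and the slots it references.
from_space = [
    {"name": "a", "refs": [2]},
    {"name": "b", "refs": []},     # unreachable, never copied
    {"name": "c", "refs": [4]},
    {"name": "d", "refs": [1]},    # unreachable, never copied
    {"name": "e", "refs": []},
]

to_space, roots = copy_collect(from_space, [0])
print([obj["name"] for obj in to_space])   # ['a', 'c', 'e']
print(to_space[0]["refs"])                 # [1]
```

Dead objects are never touched, and the survivors come out compacted. The price is that a second region of equal size must stay empty between collections.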


Generational Collectors
Segregation by age
Generational Hypothesis

Most objects die young
Those that do not die young usually survive for a long time
This is called the Weak Generational Hypothesis








Generational GC
Concentrate on the young generation to reduce pause time
Collect different generations at different frequencies


Segregate objects by age into generations



Copying Collector
Mark/Sweep
Mark/Compact
Garbage in an old generation cannot be reclaimed by a collection of a younger generation

Summary - Generational GC

It's based on the generational hypothesis

A minor GC, on the Young Generation, is performed when Eden fills up

A major GC, on the Old Generation, is performed when the Tenured space fills up

A full GC, on the entire heap, is performed when there is no space left to allocate new objects
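
The minor GC trigger can be sketched with a toy model. Everything here is a simplification for illustration: liveness is passed in as a set instead of being traced, and the class `GenerationalHeap`, the survivor table and the `tenure_age` promotion threshold are hypothetical names, not JVM APIs.

```python
class GenerationalHeap:
    """Toy young/old heap: a minor GC runs whenever Eden is full."""

    def __init__(self, eden_capacity=4, tenure_age=2):
        self.eden_capacity = eden_capacity
        self.tenure_age = tenure_age   # minor GCs to survive before promotion
        self.eden = []                 # newly allocated objects
        self.survivors = {}            # young survivors: name -> age
        self.old = []                  # promoted (tenured) objects
        self.minor_gcs = 0

    def allocate(self, name, live):
        if len(self.eden) == self.eden_capacity:
            self.minor_gc(live)
        self.eden.append(name)

    def minor_gc(self, live):
        self.minor_gcs += 1
        aged = {}
        for name in list(self.survivors) + self.eden:
            if name not in live:
                continue                     # dead young object: reclaimed
            age = self.survivors.get(name, 0) + 1
            if age >= self.tenure_age:
                self.old.append(name)        # promoted to the Old Generation
            else:
                aged[name] = age
        self.survivors = aged
        self.eden = []                       # Eden is empty after a minor GC


heap = GenerationalHeap(eden_capacity=4, tenure_age=2)
live = {"cache"}                 # the only object that stays referenced
heap.allocate("cache", live)
for i in range(8):
    heap.allocate(f"tmp{i}", live)   # short-lived temporaries

print(heap.minor_gcs)   # 2
print(heap.old)         # ['cache']
print(heap.eden)        # ['tmp7']
```

The temporaries die in the Young Generation without ever being copied, while the long-lived object is promoted after surviving two minor collections.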
JVM - Collectors
Serial

The simplest implementation of a GC algorithm

There is only one thread performing GC

When it runs, it freezes all of the application threads (Stop-the-World)

It uses Mark-copy in the Young Generation

It uses Mark-sweep-compact in the Old Generation

Parallel

Similar to Serial GC

There are N threads performing GC

When it runs, it freezes all of the application threads (Stop-the-World)

It uses Mark-copy in the Young Generation

It uses Mark-sweep-compact in the Old Generation

Pause times are shorter than with Serial

CMS


It scans heap memory using multiple threads

There are two Stop-the-World phases

It uses Mark-copy in the Young Generation

It uses Mark-sweep in the Old Generation

Initial mark: mark all live objects, in the Old Gen, that are reachable from GC roots or referenced from an object in the YG

Remark: find objects that were missed by the concurrent tracing phase

Pause times are shorter than with the previous collectors, but the Old Generation becomes fragmented

G1


A generational, incremental, parallel, mostly concurrent, stop-the-world, and evacuating garbage collector

The heap is divided into regions

Performs space-reclamation incrementally in steps and in parallel
Reclaims space in the most efficient areas first and mostly by using evacuation
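
The idea of reclaiming the most profitable regions first can be sketched as a simple ranking. This is a heavy simplification and not G1's actual selection logic: the region records, the `garbage_kb` field and the `max_regions` limit are all assumptions made for this example.

```python
def choose_collection_set(regions, max_regions):
    """Pick up to max_regions regions, most reclaimable space first."""
    candidates = [r for r in regions if r["garbage_kb"] > 0]
    candidates.sort(key=lambda r: r["garbage_kb"], reverse=True)
    return [r["id"] for r in candidates[:max_regions]]

# Hypothetical per-region statistics gathered during marking.
regions = [
    {"id": 0, "garbage_kb": 120},
    {"id": 1, "garbage_kb": 900},
    {"id": 2, "garbage_kb": 0},
    {"id": 3, "garbage_kb": 640},
]

print(choose_collection_set(regions, max_regions=2))   # [1, 3]
```

Limiting how many regions are evacuated per step is what makes the reclamation incremental in this sketch.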




Collectors Summary





Concurrent Garbage Collectors

Sneak Peek
Shenandoah

Regionalized GC (derived from G1)

Concurrent Compaction

Single Generation

Sub-millisecond max pause times
-XX:+UseShenandoahGC



ZGC
-XX:+UseZGC




Divides memory into regions (ZPages)

Concurrent Compaction

Colored pointers

Single Generation

Sub-millisecond max pause times
Conclusions

Like everything, memory management is about trade-offs
Safety

Throughput

Pause Time

Space overhead

...








Resources

Extras

The Tricolour Abstraction


Describes the state of objects during collection
Black: nodes that have been marked and whose children have been marked as well
White: nodes that have not yet been marked and that, at the end of the mark phase, are garbage
Gray: nodes that have been marked but whose children have not all been visited; they must be revisited before being painted black

The Algorithm


Invariant*: after the marking loop, there can be no references from a black node to a white one
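
A minimal sketch of tricolour marking with a gray worklist, followed by a check of the invariant. The dict-based object graph and the function name `tricolour_mark` are assumptions made for this example.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"

def tricolour_mark(references, roots):
    """Colour every object; whatever is still white at the end is garbage."""
    colour = {obj: WHITE for obj in references}
    gray = []
    for root in roots:
        if colour[root] == WHITE:
            colour[root] = GRAY
            gray.append(root)

    while gray:
        obj = gray.pop()
        for child in references[obj]:
            if colour[child] == WHITE:
                colour[child] = GRAY      # marked, children not yet visited
                gray.append(child)
        colour[obj] = BLACK               # marked, and so are all its children
    return colour

# Hypothetical object graph: object name -> names it references.
references = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": [],
    "d": ["a"],    # unreachable: nothing points at d
}

colour = tricolour_mark(references, roots=["a"])
print(colour)   # {'a': 'black', 'b': 'black', 'c': 'black', 'd': 'white'}

# Invariant after the marking loop: no black object references a white one.
violations = [
    (src, dst)
    for src, targets in references.items()
    for dst in targets
    if colour[src] == BLACK and colour[dst] == WHITE
]
print(violations)   # []
```

The white object "d" pointing at the black object "a" does not break the invariant, which only forbids references going from black to white.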

Tricolour marking

Garbage Collection Essentials
By Diego Parra
These slides cover GC essentials, starting with the building-block algorithms, from direct collectors (reference counting) to indirect ones (Mark-Sweep, Mark-Compact and Copying GC). We then focus on the indirect collectors, introduce Generational Collectors, and see how they are used in current GC implementations on the JVM. Along the way we answer questions such as “How is the heap divided?”, “Which GC implementations are available in the JVM?” and “What are the differences between them?”. Finally, we take a sneak peek at the non-generational collectors available on the JVM (Shenandoah, ZGC).