GC Essentials

Diego Parra

@diegolparra

@dpsoft

Lucas Amoroso

@_lucasamoroso

@lucasamoroso

Outline

Automatic Memory Management

GC Building Blocks (Algorithms)

Generational Hypothesis

Generational Collectors

JVM Collectors

Intro to Concurrent Collectors 

Intro

What's memory management?

 

Memory management is the process of controlling and coordinating the way a software application accesses computer memory

 

 

Memory management

Safety First: objects are freed automatically once no reachable pointers to them remain (Python, JavaScript, Ruby, Java, C#, Haskell and Go)

Control First: your program's memory consumption is entirely in your hands (C, C++)

Ownership: memory is managed through a system of ownership with a set of rules that the compiler checks at compile time (Rust)

Automatic Memory Management

GC - Building blocks

Reachability

GC Roots

  • Local variables
  • Active threads
  • Static fields

Reachable Objects

  • Objects that are referenced, directly or transitively, from a GC root

Unreachable Objects

  • Objects that are no longer reachable from any GC root (garbage)
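As a minimal sketch of the reachability idea, the following Java code walks an object graph from a set of roots and returns everything it can reach. The `GraphNode` and `Reachability` names are hypothetical, invented for this illustration; a real collector works on raw heap memory, not on Java collections.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical heap object: a name plus its outgoing references.
final class GraphNode {
    final String name;
    final List<GraphNode> refs = new ArrayList<>();

    GraphNode(String name) {
        this.name = name;
    }
}

final class Reachability {
    // Returns every object reachable from the given roots (breadth-first walk).
    // Anything allocated but absent from the result is garbage.
    static Set<GraphNode> reachableFrom(List<GraphNode> roots) {
        Set<GraphNode> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<GraphNode> work = new ArrayDeque<>();
        for (GraphNode root : roots) {
            if (seen.add(root)) {
                work.add(root);
            }
        }
        while (!work.isEmpty()) {
            GraphNode current = work.poll();
            for (GraphNode child : current.refs) {
                if (seen.add(child)) { // first visit only, so cycles terminate
                    work.add(child);
                }
            }
        }
        return seen;
    }
}
```

Note that an object that still holds references to others (but is itself unreachable from every root) is still garbage: only incoming paths from the roots matter.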

Direct Collectors

Reference Counting

The simplest form of Garbage Collection

Count the number of references from live objects

Each object has a Reference Counter (RC)

An object is presumed live iff its RC > 0

An object can be reclaimed when its RC == 0

The RC is incremented/decremented when a reference is copied/deleted
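The rules above can be sketched as a toy in Java. `RcObject`, `RefCounting` and their methods are hypothetical names for this illustration only; the JVM itself does not manage memory this way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical object carrying an explicit reference counter (RC).
final class RcObject {
    final String name;
    int rc = 0;                 // number of references currently pointing here
    boolean reclaimed = false;  // set when the RC drops to zero
    final List<RcObject> fields = new ArrayList<>();

    RcObject(String name) {
        this.name = name;
    }
}

final class RefCounting {
    // Copying a reference increments the target's RC.
    static void addRef(RcObject target) {
        target.rc++;
    }

    // Deleting a reference decrements the RC. At zero the object is
    // reclaimed and the references it held are released in turn.
    static void release(RcObject target) {
        target.rc--;
        if (target.rc == 0) {
            target.reclaimed = true;
            for (RcObject child : target.fields) {
                release(child);
            }
            target.fields.clear();
        }
    }

    // Stores a reference to 'child' inside 'parent', keeping counts in sync.
    static void link(RcObject parent, RcObject child) {
        parent.fields.add(child);
        addRef(child);
    }
}
```

One design consequence worth noticing: two objects that reference each other keep each other's RC above zero even after the last outside reference is released, so plain reference counting cannot reclaim cycles.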

Tracing Collectors

Mark-Sweep

Indirect Collection algorithm

Non-moving collector

Collection operates in two phases: mark and sweep

Stop the World mode

Heap tends to become Fragmented
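A minimal sketch of the two phases, assuming a toy heap modelled as a Java list. `MsObject` and `MarkSweepHeap` are hypothetical names; because the toy heap has no addresses, it shows the mark and sweep logic but not the fragmentation a real non-moving collector leaves behind.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical heap object with a mark bit.
final class MsObject {
    final String name;
    boolean marked = false;
    final List<MsObject> refs = new ArrayList<>();

    MsObject(String name) {
        this.name = name;
    }
}

final class MarkSweepHeap {
    final List<MsObject> objects = new ArrayList<>(); // every allocated object
    final List<MsObject> roots = new ArrayList<>();   // the GC roots

    MsObject allocate(String name) {
        MsObject o = new MsObject(name);
        objects.add(o);
        return o;
    }

    // Phase 1: mark everything reachable from the roots.
    void mark() {
        Deque<MsObject> work = new ArrayDeque<>();
        for (MsObject root : roots) {
            if (!root.marked) {
                root.marked = true;
                work.push(root);
            }
        }
        while (!work.isEmpty()) {
            MsObject current = work.pop();
            for (MsObject child : current.refs) {
                if (!child.marked) {
                    child.marked = true;
                    work.push(child);
                }
            }
        }
    }

    // Phase 2: sweep the whole heap, dropping unmarked objects and clearing
    // the mark bit on survivors. Returns the number of reclaimed objects.
    int sweep() {
        int reclaimed = 0;
        Iterator<MsObject> it = objects.iterator();
        while (it.hasNext()) {
            MsObject o = it.next();
            if (o.marked) {
                o.marked = false;
            } else {
                it.remove();
                reclaimed++;
            }
        }
        return reclaimed;
    }

    // A full collection; in stop-the-world mode nothing else runs meanwhile.
    int collect() {
        mark();
        return sweep();
    }
}
```

Unlike reference counting, the self-referencing object in a cycle is reclaimed here, because liveness is decided by reachability from the roots rather than by a counter.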

Mark-Compact

Indirect Collection algorithm

Moving collector

Collection needs multiple passes over live objects

Avoids fragmentation

May rearrange objects in the heap

Slower than mark-sweep
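To illustrate why compaction needs several passes, here is a sketch of a sliding compactor over a toy heap made of array slots. It assumes the mark phase has already run; `McObject`, `MarkCompactHeap` and the use of slot indexes as "addresses" are assumptions made for this example, not how any particular JVM collector is implemented.

```java
// Hypothetical heap cell: references are stored as slot indexes ("addresses").
final class McObject {
    final String name;
    boolean marked = false; // set by a previous mark phase
    int forward = -1;       // address the object will move to
    final int[] refs;       // slot indexes of referenced objects

    McObject(String name, int... refs) {
        this.name = name;
        this.refs = refs;
    }
}

final class MarkCompactHeap {
    final McObject[] slots;

    MarkCompactHeap(int size) {
        slots = new McObject[size];
    }

    // Slides live objects to the start of the heap.
    // Returns the index of the first free slot after compaction.
    int compact() {
        // Pass 1: compute a forwarding address for every live object.
        int next = 0;
        for (McObject o : slots) {
            if (o != null && o.marked) {
                o.forward = next++;
            }
        }
        // Pass 2: rewrite references so they point at the new addresses.
        for (McObject o : slots) {
            if (o != null && o.marked) {
                for (int i = 0; i < o.refs.length; i++) {
                    o.refs[i] = slots[o.refs[i]].forward;
                }
            }
        }
        // Pass 3: move the objects; dead ones are simply overwritten or cleared.
        for (int i = 0; i < slots.length; i++) {
            McObject o = slots[i];
            slots[i] = null;
            if (o != null && o.marked) {
                int target = o.forward; // always <= i, so no live object is clobbered
                o.marked = false;
                o.forward = -1;
                slots[target] = o;
            }
        }
        return next;
    }
}
```

After `compact()` all free space is one contiguous block starting at the returned index, which is what removes fragmentation, at the cost of the extra passes.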

Copying Garbage Collection

Memory is divided into two equal-size regions

Moving collector

Requires only a single pass over the live objects

Compacted Heap

Locality benefits on large heaps (possible)

Space Overhead
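A sketch of a semispace copying collection in Java, using a Cheney-style scan pointer so that live objects are visited exactly once. `CpObject`, `SemispaceHeap` and the slot-index "addresses" are hypothetical, chosen only to keep the example short.

```java
import java.util.Arrays;

// Hypothetical heap cell: references are slot indexes into the current space.
final class CpObject {
    final String name;
    final int[] refs;
    int forward = -1; // new address, set once the object has been evacuated

    CpObject(String name, int... refs) {
        this.name = name;
        this.refs = refs;
    }
}

final class SemispaceHeap {
    CpObject[] from; // the space the program allocates in
    CpObject[] to;   // the empty reserve: this is the space overhead
    int free = 0;    // bump pointer into the active space

    SemispaceHeap(int semispaceSize) {
        from = new CpObject[semispaceSize];
        to = new CpObject[semispaceSize];
    }

    int allocate(CpObject o) {
        if (free == from.length) {
            throw new IllegalStateException("semispace full");
        }
        from[free] = o;
        return free++;
    }

    // Evacuates one object to to-space (at most once) and returns its new index.
    private int copy(int index) {
        CpObject o = from[index];
        if (o.forward < 0) {
            o.forward = free;
            to[free++] = o;
        }
        return o.forward;
    }

    // Copies everything reachable from 'roots', then flips the two spaces.
    // The roots array is updated in place with the new addresses.
    void collect(int[] roots) {
        free = 0;
        for (int i = 0; i < roots.length; i++) {
            roots[i] = copy(roots[i]);
        }
        // Single pass: objects between 'scan' and 'free' are copied but unscanned.
        for (int scan = 0; scan < free; scan++) {
            CpObject o = to[scan];
            for (int i = 0; i < o.refs.length; i++) {
                o.refs[i] = copy(o.refs[i]);
            }
        }
        for (int i = 0; i < free; i++) {
            to[i].forward = -1;
        }
        CpObject[] old = from;
        from = to;
        to = old;
        Arrays.fill(to, null); // garbage is discarded wholesale, never visited
    }
}
```

Survivors end up packed at the start of the new space, so allocation stays a simple pointer bump, while half of the memory is always held in reserve.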

Generational Collectors

Segregation by age

Generational Hypothesis

Most objects die young

The ones that do not usually survive for a long time

This is called the Weak Generational Hypothesis

Generational GC

Concentrate on the young generation to reduce pause time

Collect different generations at different frequencies

Segregate objects by ages into generations

Typical algorithm for the young generation: Copying Collector

Typical algorithms for the old generation: Mark/Sweep or Mark/Compact

Garbage in an old generation cannot be reclaimed by collection of younger generation

Summary - Generational GC

It's based on the generational hypothesis

A minor GC, on the Young Generation, is performed when the Eden space fills up

A major GC, on the Old Generation, is performed when the Tenured space fills up

A full GC, on the entire heap, is performed when there is no more space to allocate new objects
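The minor GC trigger and the promotion of survivors can be sketched with a toy model. Everything here is an assumption made for illustration: the class names, the `isLive` predicate standing in for reachability analysis, the unbounded survivor list, and the `tenuringThreshold` parameter (loosely inspired by the idea of promoting objects after they survive a number of minor collections).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class GenObject {
    final String name;
    int age = 0; // number of minor collections survived

    GenObject(String name) {
        this.name = name;
    }
}

final class ToyGenerationalHeap {
    final int edenCapacity;
    final int tenuringThreshold;
    final List<GenObject> eden = new ArrayList<>();
    final List<GenObject> survivor = new ArrayList<>();
    final List<GenObject> tenured = new ArrayList<>();
    int minorCollections = 0;

    ToyGenerationalHeap(int edenCapacity, int tenuringThreshold) {
        this.edenCapacity = edenCapacity;
        this.tenuringThreshold = tenuringThreshold;
    }

    // New objects go to Eden; a full Eden triggers a minor GC first.
    void allocate(GenObject o, Predicate<GenObject> isLive) {
        if (eden.size() >= edenCapacity) {
            minorGc(isLive);
        }
        eden.add(o);
    }

    // Collects only the young generation: dead objects vanish, survivors age,
    // and objects that are old enough are promoted to the tenured space.
    void minorGc(Predicate<GenObject> isLive) {
        minorCollections++;
        List<GenObject> young = new ArrayList<>(eden);
        young.addAll(survivor);
        eden.clear();
        survivor.clear();
        for (GenObject o : young) {
            if (!isLive.test(o)) {
                continue; // died young
            }
            o.age++;
            if (o.age >= tenuringThreshold) {
                tenured.add(o);
            } else {
                survivor.add(o);
            }
        }
    }
}
```

The tenured list is never touched by `minorGc`, which mirrors the point that garbage in an old generation is not reclaimed by collecting a younger one.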

JVM - Collectors

Serial

The simplest implementation of a GC algorithm

There is only one thread performing GC

When it runs it freezes all of the application threads (Stop the World)

It uses Mark-copy in the Young Generation

It uses Mark-sweep-compact in the Old Generation

Parallel

Similar to Serial GC

There are N threads performing GC

When it runs it freezes all of the application threads (Stop the World)

It uses Mark-copy in the Young Generation

It uses Mark-sweep-compact in the Old Generation

Pause times are lower than with Serial

CMS (Concurrent Mark Sweep)

It scans heap memory using multiple threads

It uses Mark-copy in the Young Generation

It uses Mark-sweep in the Old Generation

There are two stop-the-world phases:

Initial mark: marks the Old Generation objects that are directly reachable from GC roots or referenced from an object in the Young Generation

Remark: finds objects that were missed by the concurrent tracing phase

Pause times are lower than with Serial and Parallel, but the Old Generation becomes fragmented

G1

A generational, incremental, parallel, mostly concurrent, stop-the-world, and evacuating garbage collector

The heap is divided into regions

Performs space-reclamation incrementally in steps and in parallel

Reclaims space in the most efficient areas first and mostly by using evacuation

Collectors Summary
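To check which collectors a running JVM actually uses, the standard `java.lang.management` API exposes one `GarbageCollectorMXBean` per collector. The sketch below only reads those beans; `GcInspector` is a hypothetical class name, and the bean names it prints vary with the JVM version, vendor and the selected collector (for example via `-XX:+UseSerialGC`, `-XX:+UseParallelGC` or `-XX:+UseG1GC` on HotSpot), so no specific output is assumed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class GcInspector {
    // Names of the collectors registered by the running JVM. Generational
    // collectors usually register separate beans for young and old collections.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and accumulated time since JVM start; -1 means "undefined".
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Running the same program with different collector flags is a quick way to compare how often each generation is collected for a given workload.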

Concurrent Garbage Collectors

Sneak Peek

Shenandoah

Regionalized GC (derived from G1)

Concurrent Compaction

Single Generation

Sub-millisecond max pause times

-XX:+UseShenandoahGC

ZGC

-XX:+UseZGC

Divides memory into regions (ZPages)

Concurrent Compaction

Colored pointers

Single Generation

Sub-millisecond max pause times

Conclusions

Like everything, memory management is about trade-offs

Safety

Throughput

Pause Time

Space overhead

...

Resources

Extras

The Tricolour Abstraction

Describes the state of objects during collection

Black: nodes that have been marked and whose children have been marked as well

White: nodes that have not yet been marked; at the end of the mark phase they are garbage

Gray: nodes that have been marked but whose children have not all been visited; they must be visited again to be painted black

The Algorithm

Invariant*: after the marking loop, there can be no references from a black node to a white one
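The marking loop and its invariant can be sketched in Java as follows. `Colour`, `TriNode` and `TricolourMarker` are hypothetical names, and the sketch is a stop-the-world version: a concurrent collector would additionally need barriers to keep the invariant while the application mutates the graph.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.List;

enum Colour { WHITE, GRAY, BLACK }

final class TriNode {
    final String name;
    Colour colour = Colour.WHITE; // every object starts white
    final List<TriNode> children = new ArrayList<>();

    TriNode(String name) {
        this.name = name;
    }
}

final class TricolourMarker {
    // Paints the roots gray, then repeatedly takes a gray node, shades its
    // white children gray and paints the node itself black.
    static void mark(List<TriNode> roots) {
        Deque<TriNode> gray = new ArrayDeque<>();
        for (TriNode root : roots) {
            shade(root, gray);
        }
        while (!gray.isEmpty()) {
            TriNode node = gray.pop();
            for (TriNode child : node.children) {
                shade(child, gray);
            }
            node.colour = Colour.BLACK;
        }
    }

    private static void shade(TriNode node, Deque<TriNode> gray) {
        if (node.colour == Colour.WHITE) {
            node.colour = Colour.GRAY;
            gray.push(node);
        }
    }

    // Checks the invariant: no black node references a white one.
    static boolean invariantHolds(Collection<TriNode> all) {
        for (TriNode node : all) {
            if (node.colour != Colour.BLACK) {
                continue;
            }
            for (TriNode child : node.children) {
                if (child.colour == Colour.WHITE) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

When the gray set is empty, every node is either black (live) or white (garbage), which is exactly the state the sweep or copy phase relies on.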

Tricolour marking

Garbage Collection Essentials

By Diego Parra

These slides cover GC essentials, starting with the building-block algorithms, from direct collectors (reference counting) to the indirect ones (Mark-Sweep, Mark-Compact and Copying GC). We then focus on the indirect ones, look at Generational Collectors and see how they are used in current GC implementations on the JVM. We answer questions such as “How is the heap divided?”, “What GC implementations are available in the JVM?” and “What are the differences between them?”. Finally, we take a sneak peek at the non-generational collectors available on the JVM (Shenandoah, ZGC).