GC Essentials
Diego Parra
@diegolparra
@dpsoft
Lucas Amoroso
@_lucasamoroso
@lucasamoroso
Outline
Automatic Memory Management
GC Building Blocks (Algorithms)
Generational Hypothesis
Generational Collectors
JVM Collectors
Intro to Concurrent Collectors
Intro
What's memory management?
Memory management is the process of controlling and coordinating the way a software application accesses computer memory
Memory management
Safety First (Python, JavaScript, Ruby, Java, C#, Haskell and Go): objects are freed automatically once no reachable pointers to them remain
Control First (C, C++): your program's memory consumption is entirely in your hands
Ownership (Rust): memory is managed through a system of ownership with a set of rules that the compiler checks at compile time
Automatic Memory Management
GC - Building blocks
Reachability
GC Roots
- Local variables
- Active threads
- Static fields
Reachable Objects
- Objects that are referenced
Unreachable Objects
- Objects that are no longer referenced (garbage)
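A minimal sketch of reachability, assuming the heap is modelled as a plain dictionary from object names to the names they reference. The names `reachable`, `heap`, `live` and `garbage` are hypothetical and only serve the illustration.

```python
from collections import deque


def reachable(roots, references):
    """Return the set of objects reachable from the GC roots.

    `references` maps each object name to the names it points to.
    """
    seen = set(roots)
    queue = deque(roots)
    while queue:
        obj = queue.popleft()
        for child in references.get(obj, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical heap: A and B are referenced from roots,
# D and E only reference each other.
heap = {"A": ["C"], "B": [], "C": [], "D": ["E"], "E": ["D"]}
live = reachable(["A", "B"], heap)
garbage = set(heap) - live
```

Here D and E still point at each other, yet no path from a root reaches them, so both count as unreachable.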
Direct Collectors
Reference Counting
The simplest form of Garbage Collection
Count the number of references from live objects
Each object has a Reference Counter (RC)
An object is presumed live iff its RC > 0
An object can be reclaimed when its RC == 0
The RC is incremented/decremented when a reference is copied/deleted
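The rules above can be sketched roughly as follows. This is a toy model rather than how any particular runtime implements it; `RcObject`, `add_ref`, `del_ref` and `link` are hypothetical names, and "reclaiming" is simulated with a `freed` flag.

```python
class RcObject:
    def __init__(self, name):
        self.name = name
        self.rc = 0        # reference counter
        self.fields = []   # outgoing references held by this object
        self.freed = False


def add_ref(obj):
    """A reference to obj was copied."""
    obj.rc += 1


def del_ref(obj):
    """A reference to obj was deleted; reclaim obj when its RC reaches 0."""
    obj.rc -= 1
    if obj.rc == 0:
        obj.freed = True
        # Reclaiming an object deletes the references it was holding.
        for child in obj.fields:
            del_ref(child)
        obj.fields.clear()


def link(parent, child):
    """Store a reference to child inside parent."""
    parent.fields.append(child)
    add_ref(child)


a, b, c = RcObject("a"), RcObject("b"), RcObject("c")
add_ref(a)    # a root references a
add_ref(c)    # a root references c
link(a, b)
link(a, c)
del_ref(a)    # the root reference to a goes away
```

After the last call `a` and `b` are reclaimed, while `c` survives because a root still references it.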
Tracing Collectors
Mark-Sweep
Indirect Collection algorithm
Non-moving collector
Collection operates in two phases (mark, then sweep)
Runs in stop-the-world mode
Heap tends to become fragmented
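A minimal sketch of the two phases, assuming the heap is a list of slots where `None` means a free slot. `Obj`, `mark` and `sweep` are hypothetical names for this illustration, and the whole thing runs with the "application" stopped.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []       # outgoing references
        self.marked = False  # mark bit


def mark(roots):
    """Phase 1: set the mark bit on everything reachable from the roots."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.refs)


def sweep(heap):
    """Phase 2: free unmarked slots in place and reset the surviving mark bits."""
    reclaimed = []
    for i, obj in enumerate(heap):
        if obj is None:
            continue
        if obj.marked:
            obj.marked = False   # ready for the next cycle
        else:
            reclaimed.append(obj.name)
            heap[i] = None       # objects are never moved
    return reclaimed


a, b, c, d = (Obj(n) for n in "abcd")
a.refs.append(c)
heap = [a, b, c, d]
mark([a])
reclaimed = sweep(heap)
```

The resulting heap is `[a, None, c, None]`: live objects stay where they were and the free slots are scattered between them, which is the fragmentation mentioned above.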
Mark-Compact
Indirect Collection algorithm
Moving collector
Collection needs multiple passes over live objects
Avoids fragmentation
May rearrange objects in the heap
Slower than Mark-Sweep
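A rough sketch of a sliding compactor in this spirit, assuming objects are dictionaries and references are heap indices. `mark_compact` is a hypothetical name, and the three passes after marking (compute new addresses, update references, move objects) are one common way to do it, not the only one.

```python
def mark_compact(heap, roots):
    """Compact live objects to the start of heap; return the updated roots.

    heap: list of {"name": str, "refs": [heap indices]} (None = free slot).
    roots: list of heap indices.
    """
    # Mark: find the live indices.
    marked = set()
    stack = list(roots)
    while stack:
        i = stack.pop()
        if i not in marked:
            marked.add(i)
            stack.extend(heap[i]["refs"])

    # Pass 1: compute a forwarding address for every live object.
    forward = {}
    free = 0
    for i in range(len(heap)):
        if i in marked:
            forward[i] = free
            free += 1

    # Pass 2: rewrite references (and roots) to the new addresses.
    for i in marked:
        heap[i]["refs"] = [forward[r] for r in heap[i]["refs"]]
    new_roots = [forward[r] for r in roots]

    # Pass 3: slide objects down in address order, then clear the tail.
    for old, new in forward.items():
        heap[new] = heap[old]
    for i in range(free, len(heap)):
        heap[i] = None
    return new_roots


heap = [
    {"name": "a", "refs": [2]},
    {"name": "b", "refs": []},
    {"name": "c", "refs": [4]},
    {"name": "d", "refs": []},
    {"name": "e", "refs": []},
]
new_roots = mark_compact(heap, [0])
names = [o["name"] if o else None for o in heap]
```

All free space ends up in one contiguous block at the end of the heap, at the price of several passes over the live objects.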
Copying Garbage Collection
Memory is divided into two equal-size regions
Moving collector
Requires only a single pass over the live objects
Compacted Heap
Possible locality benefits on large heaps
Space Overhead
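A minimal sketch of a copying collection in the style of Cheney's algorithm, assuming objects are dictionaries and references are indices into the space they live in. `copy_collect` is a hypothetical name, and the to-space is modelled as a growing list rather than a fixed half of the heap.

```python
def copy_collect(from_space, roots):
    """Copy everything reachable from roots into a fresh to-space.

    from_space: list of {"name": str, "refs": [from-space indices]}.
    Returns (to_space, new_roots), with references rewritten to to-space indices.
    """
    to_space = []
    forward = {}  # from-space index -> to-space index

    def copy(i):
        # Copy each object at most once and remember where it went.
        if i not in forward:
            forward[i] = len(to_space)
            obj = from_space[i]
            to_space.append({"name": obj["name"], "refs": list(obj["refs"])})
        return forward[i]

    new_roots = [copy(r) for r in roots]

    # Single pass: the scan pointer chases the allocation pointer.
    scan = 0
    while scan < len(to_space):
        obj = to_space[scan]
        obj["refs"] = [copy(r) for r in obj["refs"]]
        scan += 1
    return to_space, new_roots


from_space = [
    {"name": "a", "refs": [2]},
    {"name": "b", "refs": []},
    {"name": "c", "refs": [0]},   # c points back to a
    {"name": "d", "refs": []},
]
to_space, new_roots = copy_collect(from_space, [0])
```

Only live objects are ever touched, and they end up packed together in the to-space. The space overhead comes from keeping that second region reserved.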
Generational Collectors
Segregation by age
Generational Hypothesis
This is called the Weak Generational Hypothesis
Most objects die young
Those that do not die young usually survive for a long time
Generational GC
Concentrate on the young generation to reduce pause time
Collect different generations at different frequencies
Segregate objects by age into generations
Copying Collector
Mark/Sweep
Mark/Compact
Garbage in an old generation cannot be reclaimed by a collection of a younger generation
Summary - Generational GC
It's based on the generational hypothesis
A minor GC, on the Young Generation, is performed when Eden fills up
A major GC, on the Old Generation, is performed when the Tenured space fills up
A full GC, on the entire heap, is performed when there is no more space to allocate new objects
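A toy sketch of a minor GC with promotion, under loud assumptions: `Obj`, `minor_gc` and `tenure_age` are hypothetical, Eden and survivor spaces are folded into a single `young` list, and every old object is conservatively treated as a root (real collectors typically track old-to-young references with a remembered set or card table instead).

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.age = 0   # number of minor GCs survived


def minor_gc(young, old, roots, tenure_age=2):
    """Collect only the young generation; return the objects that stay young.

    Survivors that reach tenure_age are appended to old (promotion).
    The old generation is never traced or reclaimed here.
    """
    young_ids = {id(o) for o in young}
    live = set()
    # References held by old objects count as roots, live or not.
    stack = list(roots) + [r for o in old for r in o.refs]
    while stack:
        obj = stack.pop()
        if id(obj) in young_ids and id(obj) not in live:
            live.add(id(obj))
            stack.extend(obj.refs)

    survivors = []
    for obj in young:
        if id(obj) in live:
            obj.age += 1
            if obj.age >= tenure_age:
                old.append(obj)
            else:
                survivors.append(obj)
    return survivors


dead_old = Obj("dead_old")            # unreachable, but lives in the old generation
y1, y2, y3 = Obj("y1"), Obj("y2"), Obj("y3")
dead_old.refs.append(y1)
old = [dead_old]
roots = [y2]

young = minor_gc([y1, y2, y3], old, roots)
after_first = [o.name for o in young]
young = minor_gc(young, old, roots)
after_second = [o.name for o in young]
old_names = [o.name for o in old]
```

`y3` is reclaimed by the first minor GC. `dead_old` stays in the old generation and even keeps `y1` alive, since a minor GC cannot tell that an old object is garbage; only a major or full GC would reclaim it.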
JVM - Collectors
Serial
The simplest implementation of a GC algorithm
There is only one thread performing GC
When it runs it freezes all of the application threads (stop-the-world)
It uses Mark-copy in the Young Generation
It uses Mark-sweep-compact in the Old Generation
Parallel
Similar to Serial GC
There are N threads performing GC
When it runs it freezes all of the application threads (stop-the-world)
It uses Mark-copy in the Young Generation
It uses Mark-sweep-compact in the Old Generation
Pause times are lower than with Serial
CMS
It scans heap memory using multiple threads
It uses Mark-copy in the Young Generation
It uses Mark-sweep in the Old Generation
There are two stop-the-world phases
- Initial mark: marks all objects in the Old Generation that are directly reachable from GC roots or referenced from a live object in the Young Generation
- Remark: finds objects that were missed by the concurrent tracing phase
Pause times are lower than with Serial and Parallel, but the Old Generation becomes fragmented
G1
A generational, incremental, parallel, mostly concurrent, stop-the-world, and evacuating garbage collector
The heap is divided into regions
Performs space-reclamation incrementally in steps and in parallel
Reclaims space in the most efficient areas first and mostly by using evacuation
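The "most efficient areas first" idea can be illustrated with a toy greedy selection. This is not G1's actual heuristic: `choose_collection_set`, the region tuples and the per-region cost estimates are all assumptions made for the example.

```python
def choose_collection_set(regions, pause_budget_ms):
    """Pick regions to evacuate, most garbage reclaimed per millisecond first.

    regions: list of (name, garbage_bytes, estimated_evacuation_ms).
    Regions are added while their estimated cost still fits the pause budget.
    """
    by_efficiency = sorted(regions, key=lambda r: r[1] / r[2], reverse=True)
    chosen, spent = [], 0.0
    for name, _garbage, cost in by_efficiency:
        if spent + cost <= pause_budget_ms:
            chosen.append(name)
            spent += cost
    return chosen


# Hypothetical regions: (name, garbage bytes, estimated evacuation time in ms)
regions = [("r1", 900, 3.0), ("r2", 100, 4.0), ("r3", 500, 2.0), ("r4", 500, 5.0)]
collection_set = choose_collection_set(regions, pause_budget_ms=6.0)
```

With a 6 ms budget only `r1` and `r3` are selected; the mostly-live regions are left for a later cycle.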
Collectors Summary
Concurrent Garbage Collectors
Sneak Peek
Shenandoah
Region-based GC (derived from G1)
Concurrent Compaction
Single Generation
Sub-millisecond max pause times
-XX:+UseShenandoahGC
ZGC
-XX:+UseZGC
Divides memory into regions (ZPages)
Concurrent Compaction
Colored pointers
Single Generation
Sub-millisecond max pause times
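The general idea behind colored pointers is to keep GC metadata in otherwise unused bits of the pointer itself. The sketch below is illustrative only: the bit positions, the color names and the `load_barrier` helper with its `forwarding` table are assumptions for the example, not ZGC's actual layout or code.

```python
ADDRESS_BITS = 42                        # assumed width of the address part
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1

# Hypothetical color bits stored above the address.
MARKED0 = 1 << 42
MARKED1 = 1 << 43
REMAPPED = 1 << 44
COLOR_MASK = MARKED0 | MARKED1 | REMAPPED


def color(ptr, c):
    """Return ptr's address tagged with color c."""
    return (ptr & ADDRESS_MASK) | c


def address(ptr):
    """Strip the color bits and return the plain address."""
    return ptr & ADDRESS_MASK


def has_color(ptr, c):
    return (ptr & COLOR_MASK) == c


def load_barrier(ptr, good_color, forwarding):
    """Return a pointer with the good color, fixing it up if needed.

    forwarding maps old addresses of relocated objects to their new addresses.
    """
    if has_color(ptr, good_color):
        return ptr                                   # fast path
    new_addr = forwarding.get(address(ptr), address(ptr))
    return color(new_addr, good_color)               # slow path: remap and recolor


p = color(0x1000, MARKED0)
healed = load_barrier(p, REMAPPED, {0x1000: 0x2000})
```

Because every loaded pointer is checked this way, the application can keep running while objects are moved: a stale pointer is repaired the first time it is loaded.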
Conclusions
Like everything, memory management is about trade-offs
Safety
Throughput
Pause Time
Space overhead
...
Resources
Extras
The Tricolour Abstraction
Describes the state of objects during collection
Black: nodes that have been marked and whose children have been marked as well
White: nodes that have not yet been marked and, at the end of the mark phase, are garbage
Gray: nodes that have been marked but whose children have not all been visited; they must be visited again to be painted black
The Algorithm
Invariant*: after the marking loop, there can be no references from a black node to a white one
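A minimal sketch of the marking loop in tricolour terms, assuming the object graph is a dictionary from names to referenced names. `tricolour_mark` and the sample graph are hypothetical.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"


def tricolour_mark(roots, references):
    """Colour every object; whatever is still white at the end is garbage."""
    colour = {obj: WHITE for obj in references}
    gray = []                      # worklist of gray nodes
    for r in roots:
        if colour[r] == WHITE:
            colour[r] = GRAY
            gray.append(r)
    while gray:
        obj = gray.pop()
        for child in references[obj]:
            if colour[child] == WHITE:
                colour[child] = GRAY
                gray.append(child)
        colour[obj] = BLACK        # all children are now gray or black
    return colour


graph = {"A": ["B", "C"], "B": ["C"], "C": [], "D": ["A"]}
colours = tricolour_mark(["A"], graph)
black_to_white = [
    (src, dst)
    for src, dsts in graph.items()
    for dst in dsts
    if colours[src] == BLACK and colours[dst] == WHITE
]
```

When the loop ends there are no gray nodes left and `black_to_white` is empty, which matches the invariant; `D` stays white because nothing reachable points to it.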
Tricolour marking
Garbage Collection Essentials
By Diego Parra
These slides cover GC essentials, starting with the building-block algorithms, from direct collectors (reference counting) to indirect ones (Mark-Sweep, Mark-Compact and Copying GC). We then focus on the indirect ones, introduce generational collectors and show how they are used in current GC implementations on the JVM. We answer questions such as "How is the heap divided?", "What GC implementations are available on the JVM?" and "What are the differences between them?". Finally, we take a sneak peek at the non-generational collectors available on the JVM (Shenandoah, ZGC).