Memory management and garbage collection

Varik Matevosyan

Full Stack JS Engineer at Steadfast.tech

github.com/var77

varikmatevosyan@gmail.com

16 years old

Started JS at 14

Agenda

  • Heap (young space, old pointer space, large object space, etc...) 
  • V8 Engine
  • Chakra Core
  • Optimization tips

Heap

  • Young space: newly allocated objects; 1-8 MB
  • Old pointer space: objects that survive 2 GCs are moved here
  • Old data space: strings, boxed numbers, and arrays of unboxed doubles are moved here after surviving in young space for a while
  • Large object space: each object is allocated with mmap; these objects are never moved by GC

max-old-space-size (def: 1.4GB)

V8 Design elements

  • Fast Property Access
  • Dynamic Machine Code Generation
  • Efficient Garbage Collection

Fast Property Access

 

JavaScript is a dynamic programming language: properties can be added to, and deleted from, objects on the fly.

Property access in JS is much slower than in many other languages, because objects don't have a fixed layout on the heap, so the engine can't read a property from a fixed offset. To reduce the time required to access JavaScript properties, V8 does not use dynamic lookup to access properties. Instead, V8 dynamically creates hidden classes behind the scenes.

class Person {
  
  constructor ( name, age ) {
    
    this.name = name;
    this.age = age;

  }

}

V8 Hidden classes

P0

  If you add name prop, use P1

P1

  For name see offset 0
  If you add age prop, use P2

P2

  For name see offset 0
  For age see offset 1

JS code

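// The constructor always assigns name and then age, so every instance
// goes through P0 -> P1 -> P2 and ends up with hidden class P2.
let a = new Person(); // P2 (name and age are both undefined)
let b = new Person('John'); // P2 (age is undefined)
let c = new Person('John', 23); // P2
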
<Person object is allocated>

  Map M0
    "name": TRANSITION to M1 at offset 12

this.name = name;

  Map M1
    "name": FIELD at offset 12
    "age": TRANSITION to M2 at offset 16

this.age = age;

  Map M2
    "name": FIELD at offset 12
    "age": FIELD at offset 16

V8 Hidden classes

JS code

let a = {
 x: 1,
 y: 2
};

let b = {
 x: 4,
 y: 5
};

C0 (a and b)

  For x see offset 0
  For y see offset 1

let c = {
 y: 4,
 x: 5
};

D0 (c)

  For y see offset 0
  For x see offset 1

function Vector(x, y) {
  this.x = x;
  this.y = y;
}

let e = new Vector(10, 20);

E0

  If you have x prop go to E1

E1

  For x see offset 0
  If you have y prop go to E2

E2 (e)

  For x see offset 0
  For y see offset 1

Purpose

Access object properties without a dictionary lookup, by knowing their fixed offsets
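
The transition idea above can be sketched in plain JS. This is only a toy model under stated assumptions: HiddenClass, ToyObject and ROOT are hypothetical names made up for this example, and the real V8 structures are more involved.

```javascript
// Toy model of hidden classes; not V8's real data structures.
class HiddenClass {
  constructor(offsets = new Map()) {
    this.offsets = offsets;        // property name -> slot index
    this.transitions = new Map();  // property name -> next HiddenClass
  }

  // Return the class reached by adding `name`, creating it on first use.
  addProperty(name) {
    if (!this.transitions.has(name)) {
      const nextOffsets = new Map(this.offsets).set(name, this.offsets.size);
      this.transitions.set(name, new HiddenClass(nextOffsets));
    }
    return this.transitions.get(name);
  }
}

const ROOT = new HiddenClass(); // plays the role of P0

class ToyObject {
  constructor() {
    this.hiddenClass = ROOT;
    this.slots = [];
  }

  set(name, value) {
    if (!this.hiddenClass.offsets.has(name)) {
      // Adding a new property follows (or creates) a transition.
      this.hiddenClass = this.hiddenClass.addProperty(name);
    }
    this.slots[this.hiddenClass.offsets.get(name)] = value;
  }

  get(name) {
    return this.slots[this.hiddenClass.offsets.get(name)];
  }
}

const p1 = new ToyObject();
p1.set('name', 'John');
p1.set('age', 23);

const p2 = new ToyObject();
p2.set('name', 'Ann');
p2.set('age', 30);

const p3 = new ToyObject();
p3.set('age', 30); // different order, different transition chain
p3.set('name', 'Ann');

console.log(p1.hiddenClass === p2.hiddenClass); // true
console.log(p1.hiddenClass === p3.hiddenClass); // false
```

Objects that add the same properties in the same order end up sharing one hidden class, which is what makes a fixed offset per property possible.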

Dynamic Machine Code Generation

 

let person = new Person ('Name', 20); 

/**
 * Now the Person class will have 3 hidden classes as shown above
 * P0 <0xfb780123>, P1 <0xfb78012b>, P2 <0xfb780133>
 * and the person object will refer to the hidden class P2
 * as it has name and age properties.
*/

person.name; 

/**
  * The property lookup now goes this way:
  * <person offset> + (P2 -> name offset <0x8>)
*/

person.name;

/**
  * <person offset> + (P2 -> name offset <0x8>)
  * This time the engine caches the resolved location of the name property:
  * person.name = <person offset> + (P2 -> name offset <0x8>)
*/

person.name; // return the pointer it has.

person.surname = "Surname"; // a new hidden class P3 <0xfb780173> will be created

person.name; //  <person offset> + (P3 -> name offset <0x8>)
/*
 * The inline cache now misses, because person refers to a new hidden class P3,
 * so the engine has to cache the offset again
*/
person.name;

Generated machine code to obtain the hidden class

#ebx = the person object
#ecx = person's hidden class offset e.g <0x4>
cmp [ebx, ecx], <cached hidden class>
#if hidden class was changed
jne <V8 runtime system> # Jump to V8 runtime system and patch inline cache
#else just return the property value from cached offset
mov eax, [ebx, <cached name offset>]

person = {
    map: 0x03415512,
    name: 0x07312731,
    cachedMap: 0x03415512,
    pointer: 0x7312729
}

/*
0x03415512: {
    name: 8
}
*/

if (person.map === person.cachedMap) {
   return person.name;
} else {
   fixPersonMapCache();
   let offset = person.pointer + person.map.name;
   return offset;
}

Something similar in JS will look like this
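
A runnable variant of the same idea, as a rough sketch: the object layout ({ map, slots }), the maps P2 and P3, and the helper makeCachedLoad are all assumptions made up for this example, not V8 internals.

```javascript
// Hypothetical model: an object is { map, slots }, a map is { offsets }.
const P2 = { offsets: { name: 0, age: 1 } };
const P3 = { offsets: { name: 0, age: 1, surname: 2 } };

// Builds a property load with a one-entry inline cache.
function makeCachedLoad(propertyName) {
  let cachedMap = null;
  let cachedOffset = -1;
  const stats = { hits: 0, misses: 0 };

  function load(obj) {
    if (obj.map === cachedMap) {
      // Fast path: same hidden class as last time, reuse the offset.
      stats.hits++;
      return obj.slots[cachedOffset];
    }
    // Slow path: look the offset up and patch the cache.
    stats.misses++;
    cachedMap = obj.map;
    cachedOffset = obj.map.offsets[propertyName];
    return obj.slots[cachedOffset];
  }

  return { load, stats };
}

const person = { map: P2, slots: ['Name', 20] };
const nameLoad = makeCachedLoad('name');

nameLoad.load(person); // miss: the cache is empty
nameLoad.load(person); // hit

// Adding surname moves the object to a new hidden class.
person.map = P3;
person.slots[2] = 'Surname';

nameLoad.load(person); // miss: the hidden class changed
nameLoad.load(person); // hit again

console.log(nameLoad.stats); // { hits: 2, misses: 2 }
```

The extra miss after the hidden class changes is the cost that the optimization tips later in the deck try to avoid.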

Garbage collection

V8 has a stop-the-world, generational, accurate garbage collector

stop-the-world - pauses all JS execution

generational - uses new and old space

accurate - Accurate garbage collection requires the ability to identify all pointers in the program at run-time (which is tricky in V8)

Why do we need GC?

Newly created objects are kept in the heap, so we need some way to clear out the garbage and free space for new objects.

V8 doesn't expose an API for manually allocating and freeing heap space, like malloc and free in C, so we need a GC to manage it instead.

The way GC works

var obj2 = {a: 2};
//................
obj2 = null;
var obj1 = {a: 1};
//...............
obj1.a++;
function test() {
    var obj3 = {a: 3};
}

test();

Heap

  • Young space: Obj1, Obj2 and Obj3 are allocated here
  • Old space: after two GC cycles only Obj1 is still alive, and it is moved here

"Young space" is divided into two parts, "to space" and "from space". Newly created objects are allocated in "to space", and the GC mostly goes through "from space" to clean out the junk.

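The two-space idea can be sketched as a toy copying collector. This is a simplified illustration of how semi-space collectors commonly work, not V8's actual scavenger; the function scavenge and the sample objects are made up for this example.

```javascript
// Toy scavenge: copy everything reachable from the roots into a fresh space.
// Whatever is not copied is garbage and is simply left behind.
function scavenge(roots) {
  const forwarding = new Map(); // original object -> its copy
  const survivors = [];         // the fresh space, in copy order

  function copy(value) {
    if (value === null || typeof value !== 'object') return value; // plain data
    if (!forwarding.has(value)) {
      const clone = { ...value };
      forwarding.set(value, clone);
      survivors.push(clone);
    }
    return forwarding.get(value);
  }

  const newRoots = roots.map(copy);

  // Scan the copies in order and fix their fields to point at copies too.
  for (let i = 0; i < survivors.length; i++) {
    const obj = survivors[i];
    for (const key of Object.keys(obj)) {
      obj[key] = copy(obj[key]);
    }
  }

  return { newRoots, survivors };
}

const child = { a: 10 };
const obj1 = { a: 1, child };
const obj2 = { a: 2 }; // nothing refers to it
const obj3 = { a: 3 }; // nothing refers to it

const result = scavenge([obj1]);
console.log(result.survivors.length); // 2 (the copies of obj1 and child)
```

The work done is proportional to the number of live objects, which is why this approach suits a small space where most objects die young.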
This approach is acceptable as long as we keep new space small, but it's impractical for more than a few megabytes. Old space may contain several hundred megabytes of data, and collecting it this way, in 1MB pages, would cost a lot, so for old space we use two closely related algorithms, Mark-sweep and Mark-compact.

Both of these algorithms work in two phases: a mark phase (which discovers live objects) and a sweeping or compacting phase.

The GC divides objects in the heap into three "colors": white (not yet discovered), grey (discovered, but its neighbors have not all been processed) and black (discovered, with all of its neighbors processed).

Heap

obj1

obj2 ->

obj2

obj3 ->

obj3

obj4

obj5

Marking starts by coloring the objects reachable from the roots grey and pushing them onto a marking deque. At each step, the GC pops an object from the deque, marks it black, marks neighboring white objects as grey, and pushes them onto the deque. The algorithm terminates when the deque is empty and all discovered objects have been marked black. Very large objects, such as long arrays, may be processed in pieces to reduce the chance of the deque overflowing.
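
The marking loop described above can be sketched like this. The object graph is made up for illustration, and each object is assumed to expose its outgoing pointers in a hypothetical refs array.

```javascript
const WHITE = 'white';
const GREY = 'grey';
const BLACK = 'black';

// Returns a Map of object -> color; objects missing from the map are white.
function mark(roots) {
  const color = new Map();
  const deque = [];

  // Roots are discovered first: grey and waiting on the deque.
  for (const root of roots) {
    if (!color.has(root)) {
      color.set(root, GREY);
      deque.push(root);
    }
  }

  while (deque.length > 0) {
    const obj = deque.pop();
    color.set(obj, BLACK);
    for (const neighbor of obj.refs) {
      if (!color.has(neighbor)) { // still white
        color.set(neighbor, GREY);
        deque.push(neighbor);
      }
    }
  }
  return color;
}

const colorOf = (colors, obj) => colors.get(obj) || WHITE;

// Made-up graph: obj1 -> obj2 -> obj3, obj2 -> obj1; obj4 and obj5 are unreachable.
const obj3 = { name: 'obj3', refs: [] };
const obj2 = { name: 'obj2', refs: [obj3] };
const obj1 = { name: 'obj1', refs: [obj2] };
obj2.refs.push(obj1); // a cycle is fine, already colored objects are skipped
const obj4 = { name: 'obj4', refs: [] };
const obj5 = { name: 'obj5', refs: [obj4] };

const colors = mark([obj1]);
for (const obj of [obj1, obj2, obj3, obj4, obj5]) {
  console.log(obj.name, colorOf(colors, obj));
}
```

When the loop finishes there are no grey objects left: everything black is live, and everything still white can be reclaimed.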

Sweeping

Makes free lists, which will be used for new allocations.

Compacting

Moves objects between pages to fill free memory "holes", and then releases freed heap pages to the OS.
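
The difference between the two can be sketched on a single toy "page". Here a page is assumed to be an array of slots in which null stands for a dead (unmarked) object; sweep and compact are hypothetical helpers for this example only.

```javascript
// Sweeping: leave live objects where they are and remember the free slots.
function sweep(page) {
  const freeList = [];
  page.forEach((slot, index) => {
    if (slot === null) freeList.push(index);
  });
  return freeList;
}

// Compacting: slide live objects together so the free space is contiguous.
function compact(page) {
  const live = page.filter((slot) => slot !== null);
  const free = new Array(page.length - live.length).fill(null);
  return live.concat(free);
}

const page = ['obj1', 'obj2', null, 'obj4', null];

console.log(sweep(page));   // [ 2, 4 ]
console.log(compact(page)); // [ 'obj1', 'obj2', 'obj4', null, null ]
```

Sweeping is cheaper because nothing moves, while compacting has to update every pointer to a moved object but removes the fragmentation.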

Problems GC faces

  • Distinguish pointers from integers to avoid memory leaks
  • Manage old-to-new references without scanning whole old space objects

Discovering pointers

Distinguishing pointers from data on the heap is the first problem any garbage collector needs to solve. The GC needs to follow pointers in order to discover live objects. Most garbage collection algorithms can migrate objects from one part of memory to another (to reduce fragmentation and increase locality), so we also need to be able to rewrite pointers without disturbing plain old data.

There are three popular approaches to identifying pointers

  • Conservative - treat every word that could be a pointer as a pointer. This may lead to memory leaks, as some integers can look like pointers.
  • Compiler hints - widely used in statically typed languages: if we can identify which class an object comes from, we can find all of its pointers.
  • Tagged pointers - reserve a bit at the end of each word to indicate whether it is a pointer or data. V8 takes this approach.

V8 represents small integers as 31-bit signed integers stored in a word whose last bit is always 0, while pointers have the last bit set to 1. This is how it tells integers from pointers.
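
The tagging scheme can be illustrated with plain bit operations on 32-bit values. The helper names below are made up for this sketch, and addresses are assumed to be word-aligned so that their last bit is free.

```javascript
// Small integers: shift left by one, so the last bit is always 0.
const tagSmallInteger = (n) => n << 1;      // n must fit in 31 bits
const untagSmallInteger = (word) => word >> 1;

// Pointers: word-aligned address with the last bit set to 1.
const tagPointer = (address) => address | 1;
const untagPointer = (word) => word & ~1;

// The GC only needs to look at the last bit of a word.
const isSmallInteger = (word) => (word & 1) === 0;
const isPointer = (word) => (word & 1) === 1;

const taggedNumber = tagSmallInteger(-5);
const taggedPointer = tagPointer(0x1000);

console.log(isSmallInteger(taggedNumber), untagSmallInteger(taggedNumber)); // true -5
console.log(isPointer(taggedPointer), untagPointer(taggedPointer) === 0x1000); // true true
```

Because one bit is spent on the tag, only 31 bits remain for the integer value, which is where the 31-bit limit comes from.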

Old to new references

  • Old space: pointer to someObj
  • Young space: someObj

A common approach to this problem would be to just scan the whole "old space" for pointers into "young space", but that costs a lot of time while the program is stopped during GC.

V8 approach

V8 keeps a store buffer for pointers from "old space" to "young space" objects.

How it works

When a new object is created in "young space", we know that no other object has a pointer to it yet. When a pointer to a "young space" object is written into an object in "old space", its location is recorded in the store buffer, so in the next GC cycle the garbage collector just goes over those pointers before declaring the "young space" object "dead".

  • Old space: obj2 {x: refToSomeObj}
  • Young space: someObj

var obj2 = {
    x: null
};

//===============================================
//obj2 survived GC cycle, and now is in old space

var someObj = new Object();

obj2.x = someObj;

Store buffer

pointer to obj2.x
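
A toy version of this bookkeeping, as a sketch only: the spaces are modeled as Sets, and allocate, promote, writeField and collectYoung are hypothetical helpers, not V8 APIs. In a real engine the recording step happens inside a write barrier emitted by the compiler.

```javascript
const youngSpace = new Set();
const oldSpace = new Set();
const storeBuffer = []; // locations of old-to-young pointers

function allocate(obj) {
  youngSpace.add(obj);
  return obj;
}

function promote(obj) {
  youngSpace.delete(obj);
  oldSpace.add(obj);
}

// Every pointer write goes through here (the "write barrier").
function writeField(holder, field, value) {
  holder[field] = value;
  if (oldSpace.has(holder) && youngSpace.has(value)) {
    storeBuffer.push({ holder, field });
  }
}

// Young collection: start from the roots plus the store buffer,
// without scanning old space.
function collectYoung(roots) {
  const live = new Set();
  const pending = roots.filter((obj) => youngSpace.has(obj));
  for (const { holder, field } of storeBuffer) {
    if (youngSpace.has(holder[field])) pending.push(holder[field]);
  }
  while (pending.length > 0) {
    const obj = pending.pop();
    if (live.has(obj)) continue;
    live.add(obj);
    for (const value of Object.values(obj)) {
      if (youngSpace.has(value)) pending.push(value);
    }
  }
  for (const obj of [...youngSpace]) {
    if (!live.has(obj)) youngSpace.delete(obj);
  }
  return live;
}

const obj2 = allocate({ x: null });
promote(obj2); // obj2 survived and now lives in old space

const someObj = allocate({});
const garbage = allocate({});
writeField(obj2, 'x', someObj); // recorded in the store buffer

collectYoung([]);
console.log(youngSpace.has(someObj)); // true, kept alive through obj2.x
console.log(youngSpace.has(garbage)); // false
```

Only the single recorded location is visited, no matter how large old space is.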

Chakra Core

  • Parallel threads
  • JIT compiler
  • Background Garbage Collection

ChakraCore is the core of the Chakra engine used by Microsoft Edge.

Parallel threads

ChakraCore can spawn multiple concurrent background threads for JIT compilation and garbage collection.

JIT compiler

ChakraCore uses two types of JIT compilers, "simple JIT" and "full JIT".

  • Simple JIT: produces less optimized code, but compiles a lot faster; it helps applications start up quickly and is used by the full JIT.
  • Full JIT: produces highly optimized code, but is slower.

JIT compilation runs in parallel threads.

ChakraCore uses Simple JIT-ed code until a "bailout" happens. A "bailout" is simply what happens when the "predicted profiling data" doesn't match the code anymore, comparable to an inline cache miss in V8.

ChakraCore can spawn additional threads in spite of the system preferences

Garbage Collector

ChakraCore has a generational mark-and-sweep garbage collector that supports concurrent and partial collections.

Optimization tips

How to avoid "bailouts" or "fallbacks"

Avoid adding properties to objects on the fly

Bad Code

function Vector(x, y) {
    this.x = x;
    this.y = y;
}

let a = new Vector(1, 2);

a.z = 3; // z is added after construction, so a gets a new hidden class

let b = {
  x: 1,
  y: 2
}

b.z = 3;

Good Code

function Vector(x, y) {
    this.x = x;
    this.y = y;
    this.z = undefined; // create z up front so it is part of the initial layout
}

let a = new Vector(1, 2);

a.z = 3;

let b = {
  x: 1,
  y: 2,
  z: undefined
}

b.z = 3;

Avoid changing the type of object properties

Bad Code

var c = {
    x: 8,
    y: 9,
    z: 30
}

c.z = "string"

var d = {
    x: 9,
    y: 80,
    z: 30
}

c and d no longer share a hidden class, until the z property on d is accessed.

Prefer using arrays over objects when possible

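let sparse = [1];

sparse[5] = 5;
// the array now has holes, and a sparse enough array acts as a 'dictionary', which is much slower than a packed array

let grown = [1, 2];
grown.push(3);
// if the backing store is full, the engine allocates a bigger one and copies the elements over

Optimized

let preallocated = new Array(3);
preallocated[0] = 1;
preallocated[1] = 2;
preallocated[2] = 3;
// the array is allocated for 3 elements up front, so filling it by index won't reallocate it
// (push would append after the 3 empty slots instead of filling them)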

Thank You!

Any questions?

JS: Memory management and garbage collection

By Varik Matevosyan
