How the Go runtime implements maps efficiently

David Chou

We are Umbo Computer Vision

We build autonomous video security systems

Golang Taipei

Streaming Meetup

 

david74.chou @ facebook

david74.chou @ medium

david7482 @ github

How the Go runtime implements maps efficiently (without generics)

Dave Cheney, GoCon Spring 2018

C++

template<
    class Key,
    class T,
    class Hash = std::hash<Key>,
    class KeyEqual = std::equal_to<Key>,
    class Allocator = std::allocator< std::pair<const Key, T> >
> class unordered_map;

JAVA

Class HashMap<K,V>

java.lang.Object
    java.util.AbstractMap<K,V>
        java.util.HashMap<K,V>

Type Parameters:
K - the type of keys maintained by this map
V - the type of mapped values

Go

var m map[string]int

map(key) → value

The map function

Go uses HashMap

hash(key) → integer

The hash function

[Diagram: a hashmap drawn as eight buckets, numbered 0 to 7. Bucket 3 is expanded to show its key and value slots.]

Hashmap Data Structure

insert(star, "golang/go", 40260)

"golang/go" → HashFunction → 78356113 → Mask → Bucket: 3

Hashmap (buckets 0 to 7), contents of bucket 3:

key          value
pkg/errors   2903
spf13/cobra  7136
golang/go    40260
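The hash-then-mask step in the diagram can be sketched in Go as follows. This is only an illustration: FNV-1a from the standard library stands in for the unnamed HashFunction, and the helper name bucketIndex and the bucket count of 8 are assumptions taken from the picture, not from the runtime.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bucketIndex hashes key and masks off the low bits to pick a bucket.
// numBuckets must be a power of two so that numBuckets-1 is an all-ones mask.
func bucketIndex(key string, numBuckets uint64) uint64 {
	h := fnv.New64a() // stand-in hash, not the one the Go runtime uses
	h.Write([]byte(key))
	return h.Sum64() & (numBuckets - 1)
}

func main() {
	for _, key := range []string{"pkg/errors", "spf13/cobra", "golang/go"} {
		fmt.Printf("%s -> bucket %d\n", key, bucketIndex(key, 8))
	}
}
```

Because a different hash function is used here, the bucket numbers printed will not necessarily match the slide; the point is only that the same key always lands in the same bucket.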

Four properties of a hash map

  1. A hash function for the key
  2. An equality function to compare keys
  3. Need to know the size of the key type
  4. Need to know the size of the value type

C++

template<
    class Key, 
    class T,
    class Hash = std::hash<Key>,
    class KeyEqual = std::equal_to<Key>,
    class Allocator = std::allocator< std::pair<const Key, T> >
> class unordered_map;
  • class Key
  • class T
  • std::hash<Key>
  • std::equal_to<Key>

insert(star, "golang/go", 40260)

"golang/go" → std::hash<key> → 78356113 → Mask → Bucket: 3

Hashmap (buckets 0 to 7), contents of bucket 3, keys compared with std::equal_to<key>:

key          value
pkg/errors   2903
spf13/cobra  7136
golang/go    40260

JAVA

Class HashMap<K,V>

java.lang.Object
    java.util.AbstractMap<K,V>
        java.util.HashMap<K,V>

Type Parameters:
K - the type of keys maintained by this map
V - the type of mapped values
  • K and V are Object
    • Object.equals()
    • Object.hashCode()
  • Need boxing for primitive types

insert(star, "golang/go", 40260)

"golang/go" → key.hashCode() → 78356113 → Mask → Bucket: 3

Hashmap (buckets 0 to 7), bucket 3 holds a linked list, keys compared with key.equals():

key          value  next
pkg/errors   2903   → spf13/cobra
spf13/cobra  7136   → golang/go
golang/go    40260  null

C++

  • Pros
    • The sizes of the key and value are always known
    • Array implementation
    • No need for boxing or pointer chasing
  • Cons
    • Larger binary size. Different types mean different maps.
    • Slower compile time.
    • Larger memory footprint, because each array element has a predetermined size.

JAVA

  • Pros
    • Single implementation for any subclass of Object
    • Faster compile time and smaller binary size
    • Linked list implementation. No predetermined size for each array element.
  • Cons
    • Boxing increases GC pressure
    • Slower due to boxing and linked-list pointer chasing

Go's hashmap implementation

Use interface{} ?

Code generation ?

No

No

Compiler + Runtime 

v := m["key"]     → runtime.mapaccess1(m, "key", &v)
v, ok := m["key"] → runtime.mapaccess2(m, "key", &v, &ok)
m["key"] = 9001   → runtime.mapinsert(m, "key", 9001)
delete(m, "key")  → runtime.mapdelete(m, "key")
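For reference, here are the four source-level forms from the table in one small runnable program. The helper name lookups is made up for this example; the comments only label which form each line is, and the runtime calls themselves are inserted by the compiler and are not visible in user code.

```go
package main

import "fmt"

// lookups exercises the four map operations that the compiler rewrites
// into runtime calls.
func lookups() (v int, ok bool, missing int, n int) {
	m := map[string]int{}
	m["key"] = 9001    // assignment
	v = m["key"]       // one-result lookup
	_, ok = m["key"]   // two-result ("comma ok") lookup
	delete(m, "key")   // deletion
	missing = m["key"] // a missing key yields the zero value
	return v, ok, missing, len(m)
}

func main() {
	fmt.Println(lookups())
}
```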

Compile time rewriting

func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer

mapaccess1

Different maptype values for each unique map declaration

map[string]int         → var mt1 maptype{...}
map[string]http.Header → var mt2 maptype{...}
map[structA]structB    → var mt3 maptype{...}
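The runtime's maptype values are not directly accessible from user code, but the reflect package exposes comparable per-type information. The sketch below is a rough analogy rather than a view of the actual maptype: structA and structB are placeholder definitions invented here, and describe is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/http"
	"reflect"
)

// structA and structB are placeholder types for the third map declaration.
type structA struct{ ID int64 }

type structB struct {
	Name  string
	Score float64
}

// describe returns the key and element types recorded for a map type.
func describe(m interface{}) (key, elem reflect.Type) {
	t := reflect.TypeOf(m)
	return t.Key(), t.Elem()
}

func main() {
	for _, m := range []interface{}{
		map[string]int{},
		map[string]http.Header{},
		map[structA]structB{},
	} {
		k, e := describe(m)
		// Size reports the number of bytes a value of the type occupies.
		fmt.Printf("%T: key=%v (%d bytes), elem=%v (%d bytes)\n",
			m, k, k.Size(), e, e.Size())
	}
}
```

Each map declaration yields its own key type, element type and sizes, which is the kind of information a per-declaration descriptor has to carry.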

type maptype struct {
        typ           _type
        key           *_type
        elem          *_type
        bucket        *_type // internal type representing a hash bucket
        hmap          *_type // internal type representing a hmap
        keysize       uint8  // size of key slot
        indirectkey   bool   // store ptr to key instead of key itself
        valuesize     uint8  // size of value slot
        indirectvalue bool   // store ptr to value instead of value itself
        bucketsize    uint16 // size of bucket
        reflexivekey  bool   // true if k==k for all keys
        needkeyupdate bool   // true if we need to update key on overwrite
}

type _type struct {
        size uintptr
        alg  *typeAlg
        ...
}

type typeAlg struct {
        // function for hashing objects of this type
        // (ptr to object, seed) -> hash
        hash func(unsafe.Pointer, uintptr) uintptr
        // function for comparing objects of this type
        // (ptr to object A, ptr to object B) -> ==?
        equal func(unsafe.Pointer, unsafe.Pointer) bool
}
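To make the idea concrete, here is a toy map in which a single implementation serves two key types, with all type-specific behaviour supplied by a descriptor. It is a sketch under several assumptions, not the runtime's code: typeDesc, tinyMap, stringDesc, int64Desc, starsFor and nameFor are invented names, FNV-1a stands in for the real hash, the bucket count is fixed at 8, and entries hold pointers instead of copying size bytes into bucket slots as the descriptor's size field would allow.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"unsafe"
)

// typeDesc is a hypothetical, cut-down stand-in for _type plus typeAlg.
type typeDesc struct {
	size  uintptr // kept for illustration; this sketch stores pointers, not raw slots
	hash  func(p unsafe.Pointer, seed uintptr) uintptr
	equal func(a, b unsafe.Pointer) bool
}

// Descriptor for string keys. FNV-1a is an assumption, not the runtime's hash.
var stringDesc = &typeDesc{
	size: unsafe.Sizeof(""),
	hash: func(p unsafe.Pointer, seed uintptr) uintptr {
		h := fnv.New64a()
		h.Write([]byte(*(*string)(p)))
		return uintptr(h.Sum64()) ^ seed
	},
	equal: func(a, b unsafe.Pointer) bool { return *(*string)(a) == *(*string)(b) },
}

// Descriptor for int64 keys.
var int64Desc = &typeDesc{
	size:  unsafe.Sizeof(int64(0)),
	hash:  func(p unsafe.Pointer, seed uintptr) uintptr { return uintptr(*(*int64)(p)) ^ seed },
	equal: func(a, b unsafe.Pointer) bool { return *(*int64)(a) == *(*int64)(b) },
}

type entry struct{ key, val unsafe.Pointer }

// tinyMap is a single implementation; only the descriptor differs per key type.
type tinyMap struct {
	keyType *typeDesc
	buckets [8][]entry
}

func (m *tinyMap) bucket(key unsafe.Pointer) *[]entry {
	return &m.buckets[m.keyType.hash(key, 0)&7] // mask selects one of 8 buckets
}

func (m *tinyMap) insert(key, val unsafe.Pointer) {
	b := m.bucket(key)
	for i := range *b {
		if m.keyType.equal((*b)[i].key, key) {
			(*b)[i].val = val // overwrite the value of an existing key
			return
		}
	}
	*b = append(*b, entry{key, val})
}

func (m *tinyMap) lookup(key unsafe.Pointer) unsafe.Pointer {
	for _, e := range *m.bucket(key) {
		if m.keyType.equal(e.key, key) {
			return e.val
		}
	}
	return nil
}

// starsFor builds a string-keyed map and looks up repo in it.
func starsFor(repo string) (int, bool) {
	m := &tinyMap{keyType: stringDesc}
	keys := []string{"pkg/errors", "spf13/cobra", "golang/go"}
	vals := []int{2903, 7136, 40260}
	for i := range keys {
		m.insert(unsafe.Pointer(&keys[i]), unsafe.Pointer(&vals[i]))
	}
	if p := m.lookup(unsafe.Pointer(&repo)); p != nil {
		return *(*int)(p), true
	}
	return 0, false
}

// nameFor reuses the same tinyMap code with int64 keys.
func nameFor(id int64) (string, bool) {
	m := &tinyMap{keyType: int64Desc}
	k, v := int64(7), "seven"
	m.insert(unsafe.Pointer(&k), unsafe.Pointer(&v))
	if p := m.lookup(unsafe.Pointer(&id)); p != nil {
		return *(*string)(p), true
	}
	return "", false
}

func main() {
	fmt.Println(starsFor("golang/go"))
	fmt.Println(nameFor(7))
}
```

Note that insert and lookup never mention string or int64; they only call through the descriptor, which is the role the hash and equal fields of typeAlg play above.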

[Diagram: C++ (Compile Time): a separate map<K0,V0> is produced for each of four type combinations.]

[Diagram: JAVA (Run Time): a single map<K,V> is shared by four Object0 boxes.]

[Diagram: Go (Compile Time): a single map<K,V> is shared by four maptype0 descriptors.]

Conclusion

  • A good compromise between C++ and JAVA
  • Single hashmap implementation to reduce binary size
  • The sizes of the key and value are already known.
    Array implementation for better performance.
  • Can use primitive types without boxing.
    No extra GC pressure.

Any Questions?

By Ting-Li Chou
