Borg Backup

 

 

"The holy grail of backup software?"

 

 

Thomas Waldmann

Borg - a fork of Attic

  • Attic: about 5y old, good design, proven code

  • but:

    • development going slowly

    • some bugs and annoyances

    • not very open to new developers
       

  • Borg Backup: fork of Attic, now ~2y old

  • and:

    • a community project, bus_factor++

    • lots of fixes and good PRs merged

    • open and inviting to new contributors

    • faster paced, lots of activity

Feature Set (1)

  • easy and fast
  • content-defined chunking (*)
  • chunk deduplication (*)
  • lz4, zlib, lzma compression
  • encryption with AES256-CTR

  • authentication with HMAC-SHA256

  • simple backend (key/value store, filesystem, via ssh)

Feature Set (2)

  • FOSS (BSD license)

  • good docs

  • good platform / arch support

  • xattr / acl / bsdflags support

  • mount a backup via FUSE

  • Python 3.4+, a little Cython & C

  • good test coverage, CI

Deduplication (1)

  • No problem with:
    • VM images (sparse file support)
    • (physical) disks, LV snapshots
    • renamed huge directories
    • inner duplication of data set
    • historical duplication
    • duplication on different machines

 

Deduplication (2)

  • Content defined chunking:
    • "buzhash" rolling hash
    • cut data when hash has specific bit pattern,
      yields chunks with ~ 2^n bytes target size
    • n + other chunker params configurable
    • seeded, to avoid fingerprinting chunk lengths
       
  • Store chunks under id into store:
    • id = HASH(chunk), or
    • id = HMAC(id_key, chunk)
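
A minimal sketch of the two steps above, not Borg's actual chunker. Assumptions made for illustration: the cut condition is "lowest n bits of the hash are zero", the window and size limits are arbitrary, and the seed is used to generate the byte table (a simplification). All names (`make_table`, `chunkify`, `chunk_id`) are hypothetical.

```python
import hashlib
import hmac
import random

WINDOW = 48          # rolling window in bytes (assumed value)
MASK_BITS = 12       # n: target chunk size is roughly 2**12 bytes
MIN_SIZE = 1 << 10   # never cut before this many bytes
MAX_SIZE = 1 << 16   # always cut at this many bytes


def make_table(seed):
    """256 pseudo-random 32-bit values; a secret seed hides the cut points."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(256)]


def rotl32(value, count):
    """Rotate a 32-bit value left by count bits."""
    count %= 32
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF


def chunkify(data, table):
    """Yield chunks of data (bytes), cutting where the rolling hash matches."""
    mask = (1 << MASK_BITS) - 1
    start = 0
    h = 0
    for i, byte in enumerate(data):
        pos = i - start
        # buzhash update: rotate, mix in the new byte ...
        h = rotl32(h, 1) ^ table[byte]
        # ... and remove the byte that just left the window.
        if pos >= WINDOW:
            h ^= rotl32(table[data[i - WINDOW]], WINDOW)
        size = pos + 1
        if (size >= MIN_SIZE and (h & mask) == 0) or size >= MAX_SIZE:
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]


def chunk_id(chunk, id_key=None):
    """id = HASH(chunk) without a key, id = HMAC(id_key, chunk) with one."""
    if id_key is None:
        return hashlib.sha256(chunk).digest()
    return hmac.new(id_key, chunk, hashlib.sha256).digest()
```

Because a cut depends only on the last `WINDOW` bytes, inserting a few bytes near the start of a file shifts the data but leaves most later cut points, and therefore most chunk ids, unchanged.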

Now and Future

  • 1.0 released   --   important: use 1.0.9+
  • soon:  1.1  (new features, code cleanup)
     
  • 1.2   Crypto Enhancements
    • AES-GCM (AES-OCB? chacha20-poly1305? keccak?)
    • Key Management
    • Ciphersuite Flexibility
       
  • 1.2   Parallelization
    • "Serial Threaded Workers"?   (avoids races)
    • zeromq?

How you can help

Python / Cython / C? Help us with coding.

 

do a security review

 

do real-world performance tests / comparisons

 

find bugs / issues, improve docs

 

spread the word, borg is not well-known yet

 

sponsor development via bountysource

Borg Backup - Links

github.com/borgbackup
 

#borgbackup  on  chat.freenode.net

Questions / Feedback?

Meet me afterwards, breakfast table area.

 

Bonus: Crypto

  • OpenSSL (1.0 or 1.1), but only for the crypto primitives (AES in CTR mode)
  • uses hardware acceleration (AES-NI)
  • authentication is not hw accelerated: hmac-sha256 (1.0), faster blake2b (1.1)
  • in 1.2 hw accelerated AEAD modes (gcm, ocb, ...)
  • crypto hashes from python stdlib / OpenSSL / blake2b reference implementation
  • we use random from /dev/urandom (via Py stdlib)
  • btw: use sha512(-256), it is quite a bit faster than sha256 on 64-bit machines. Or even better: blake2b
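
To check the hash speed claim on your own machine, a rough sketch using only the Python stdlib. The helper name `throughput` is hypothetical, `blake2b` in `hashlib` needs Python 3.6+, and results depend heavily on CPU and OpenSSL build, so no numbers are given here.

```python
import hashlib
import timeit


def throughput(name, data, repeat=5):
    """Return the best observed MiB/s for the hashlib algorithm `name`."""
    seconds = min(
        timeit.repeat(
            lambda: hashlib.new(name, data).digest(),
            number=1,
            repeat=repeat,
        )
    )
    return len(data) / (1024 * 1024) / seconds


def compare(size=8 * 1024 * 1024):
    """Print throughput for the algorithms mentioned on the slide."""
    data = b"\0" * size
    for name in ("sha256", "sha512", "blake2b"):
        print("%-8s %8.1f MiB/s" % (name, throughput(name, data)))
```

Call `compare()` to print one line per algorithm.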

Bonus: Compression

  • lz4 is super fast - use it! often faster than no compression.
  • except if your remote repo connection is super slow, then use zlib or lzma.
  • borg 1.1 will use lz4 to predict compressibility (and then either use none, lz4 or zlib/lzma)
  • don't use lzma > level 6, it is pointless - higher levels mainly enlarge the dictionary, and we only have <= ~ 2 MiB chunks to compress.
  • you can use different compression in same repo.
  • existing chunks won't get recompressed.
  • 1.1 will have "recreate" to recompress.
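
The "predict compressibility" idea can be sketched as follows. This is an illustration only, not Borg 1.1's heuristic: lz4 is not in the Python stdlib, so zlib level 1 stands in as the cheap probe, and the function name `compress_auto` and the threshold are assumptions.

```python
import lzma
import zlib


def compress_auto(chunk, ratio_threshold=0.97):
    """Return (method, payload) after probing with a cheap compressor."""
    # Stand-in for the fast lz4 probe: if even a cheap pass barely
    # shrinks the chunk, the expensive compressor is not worth running.
    probe = zlib.compress(chunk, 1)
    if len(probe) >= len(chunk) * ratio_threshold:
        return "none", chunk
    return "lzma", lzma.compress(chunk, preset=6)


def decompress(method, payload):
    """Inverse of compress_auto for the two methods used above."""
    if method == "none":
        return payload
    return lzma.decompress(payload)
```

Already compressed or encrypted input (media files, archives) takes the "none" branch, so the slow compressor only runs where it can pay off.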

Bonus: Chunking / Dedupe

  • you can use different chunker params in same repo.
  • existing chunks won't get re-chunked.
  • 1.1 will have "recreate" to re-chunk.
  • differently cut chunks won't deduplicate.
  • deduplication is based on (hmac-)sha256 of chunks' plaintext, before compression / encryption.
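
A toy in-memory store illustrating the last point, together with "existing chunks won't get recompressed". It is a sketch under assumptions: the class `ChunkStore` is hypothetical, zlib stands in for whatever compression is configured, and encryption is left out.

```python
import hashlib
import hmac
import zlib


class ChunkStore:
    """Toy store: one entry per unique plaintext chunk."""

    def __init__(self, id_key):
        self.id_key = id_key
        self.objects = {}    # chunk id -> compressed bytes
        self.refcounts = {}  # chunk id -> number of references

    def put(self, chunk, level=6):
        # The id is computed over the plaintext, so the compression
        # level chosen for this call does not affect deduplication.
        cid = hmac.new(self.id_key, chunk, hashlib.sha256).digest()
        if cid not in self.objects:
            self.objects[cid] = zlib.compress(chunk, level)
        self.refcounts[cid] = self.refcounts.get(cid, 0) + 1
        return cid

    def get(self, cid):
        return zlib.decompress(self.objects[cid])
```

Storing the same chunk twice with different compression levels yields one stored object, compressed the way it was on first write.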

Bonus: Hash Table

  • own hashtable implementation in C
  • lots of chunks to manage, use memory efficiently
  • currently: rather simple open addressing with linear probing
  • maybe soon: robin hood hashing
  • performance measurements / comparisons to do
  • hashtable performance determines fast-skip speed: for each file we check mtime / size / inode number AND (via a chunks index hashtable lookup) that we have all its chunks in the repository
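
The fast-skip check can be sketched like this. Assumptions: a plain dict plays the files cache and a Python set plays the chunks index (Borg uses its own C hashtable), and the names `remember` and `can_skip` are hypothetical.

```python
import os


def remember(path, chunk_ids, files_cache):
    """Record a file's stat fingerprint and chunk ids after backing it up."""
    st = os.stat(path)
    files_cache[path] = {
        "stat": (st.st_mtime_ns, st.st_size, st.st_ino),
        "chunk_ids": list(chunk_ids),
    }


def can_skip(path, files_cache, chunk_index):
    """True if the file looks unchanged AND all its chunks are still known."""
    entry = files_cache.get(path)
    if entry is None:
        return False
    st = os.stat(path)
    if (st.st_mtime_ns, st.st_size, st.st_ino) != entry["stat"]:
        return False
    # One index lookup per chunk: this is where hashtable speed matters.
    return all(cid in chunk_index for cid in entry["chunk_ids"])
```

For a large unchanged file the stat comparison is a single check, while the chunk lookups scale with the number of chunks, which is why lookup cost dominates.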