Borg Backup

(a fork of Attic-Backup)

 

"I found the Holy Grail of backups."

(Stavros K. about Attic-Backup, 8/2013)

 

 

34c3 presentation by Thomas Waldmann

Borg - a fork of Attic

  • Attic: 2010-2015, good design, proven code

  • but:

    • development going slowly

    • some bugs and annoyances

    • not very open to new developers
       

  • Borg Backup: forked from Attic in May 2015

  • and:

    • a community project, bus_factor++

    • lots of fixes and good PRs merged

    • open and inviting to new contributors

    • faster paced, lots of activity

Feature Set (1)

  • easy and fast
  • content-defined chunking (*)
  • chunk deduplication (*)
  • lz4, zstd, zlib, lzma compression
  • encryption with aes256-ctr

  • authentication with
    hmac-sha256 or blake2b

  • simple backend (k/v, fs, via ssh)

Feature Set (2)

  • FOSS (BSD license)

  • good docs

  • good platform / arch support

  • xattr / acl / bsdflags support

  • mount a backup via FUSE

  • Python 3.4+, a little Cython & C

  • good test coverage, CI

Deduplication (1)

  • No problem with:
    • VM images (sparse file support)
    • (physical) disks, LV snapshots
    • renamed huge directories
    • inner duplication of data set
    • historical duplication
    • duplication on different machines

 

Deduplication (2)

  • Content defined chunking:
    • "buzhash" rolling hash
    • cut data when hash has specific bit pattern,
      yields chunks with ~ 2^n bytes target size
    • n + other chunker params configurable
    • seeded, to avoid fingerprinting chunk lengths
       
  • Store chunks in the store under their id:
    • id = HASH(chunk), or
    • id = MAC(id_key, chunk)
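
The chunking and id scheme above can be sketched in a few lines of Python. This is a minimal illustration, not borg's actual chunker (which is written in C). The window size, table construction, mask_bits, min_size and all function names are assumptions made for this example.

```python
import hashlib
import hmac
import random

WINDOW = 48  # bytes covered by the rolling hash (illustrative value)


def make_table(seed):
    # Seeded per-byte table: a different seed yields different cut points,
    # so chunk lengths do not fingerprint the data.
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(256)]


def rotl32(x, n):
    # Rotate a 32-bit value left by n bits.
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def chunkify(data, table, mask_bits=12, min_size=64):
    """Yield chunks; cut where the low mask_bits bits of the hash are zero.

    This gives a target chunk size of roughly 2**mask_bits bytes.
    """
    mask = (1 << mask_bits) - 1
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # buzhash-style update: rotate, then mix in the new byte
        h = rotl32(h, 1) ^ table[byte]
        if i - start >= WINDOW:
            # remove the byte that just left the window
            h ^= rotl32(table[data[i - WINDOW]], WINDOW)
        if i + 1 - start >= min_size and (h & mask) == 0:
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]


def chunk_id(chunk, id_key=None):
    # id = HASH(chunk) without a key, id = MAC(id_key, chunk) with one
    if id_key is None:
        return hashlib.sha256(chunk).digest()
    return hmac.new(id_key, chunk, hashlib.sha256).digest()
```

Because cut points depend only on the bytes inside the window, inserting data near the start of a file shifts the first chunk but leaves most later chunks (and their ids) unchanged.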

Now and Future

  • 1.0 "oldstable", widely distributed, use 1.0.9+
  • 1.1 "stable", recently released, use 1.1.4+
    (new features, code cleanup)

     
  • 1.2   Crypto Enhancements
    • AES-GCM (AES-OCB? chacha20-poly1305? keccak?)
    • Key Management
    • Ciphersuite Flexibility
       
  • 1.2   Parallelization
    • "Serial Threaded Workers"?   (avoids races)
    • zeromq?

How you can help

Python / Cython / C? Help us with coding.

 

do a security review

 

do real-world performance tests / comparisons

 

find bugs / issues, improve docs

 

spread the word, borg is not well-known yet

 

sponsor development via bountysource

 

help with the windows native port

Borg Backup - Links

github.com/borgbackup
 

#borgbackup on chat.freenode.net

Questions / Feedback?

Find me at the Python assembly (sometimes).

Or use IRC, github issues or the mailing list.

 

Bonus: Crypto

  • OpenSSL (1.0 or 1.1), but only for the crypto primitives (currently: AES in CTR mode)
  • uses hardware acceleration (AES-NI)
  • authentication is not hw accelerated:
    borg 1.0+:  hmac-sha256
    borg 1.1+:  additionally faster blake2b
  • borg 1.2 (future):  fast AEAD modes
    AES-OCB (HW accelerated)
    chacha20-poly1305 (quite quick in SW)
  • crypto hashes from python stdlib / OpenSSL / blake2b reference implementation
  • we use random data from /dev/urandom (via the Python stdlib)
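
The two authentication options can be compared with nothing but the Python standard library. This is a hedged sketch, not borg's code: the function names are made up for illustration, and it assumes Python 3.6+ where hashlib ships blake2b (on older Pythons a separate implementation is needed). Encryption is left out because the stdlib has no AES.

```python
import hashlib
import hmac
import os


def new_key(size=32):
    # Key material from the OS random source (hypothetical helper).
    return os.urandom(size)


def mac_hmac_sha256(key, data):
    # HMAC-SHA256 tag, 32 bytes
    return hmac.new(key, data, hashlib.sha256).digest()


def mac_blake2b(key, data):
    # Keyed BLAKE2b tag, truncated to 32 bytes via digest_size
    return hashlib.blake2b(data, key=key, digest_size=32).digest()


def verify(mac_func, key, data, tag):
    # Constant-time comparison to avoid leaking how many bytes matched
    return hmac.compare_digest(mac_func(key, data), tag)
```

Either function can be passed to verify(), for example verify(mac_blake2b, key, data, tag).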

Bonus: Compression

  • lz4 is super fast - use it! often faster than no compression at all.
  • zstd is also cool: offers a wide range from very fast to very good compression (borg 1.1.4+)
  • there is also zlib or lzma.
  • borg 1.1 can use lz4 to predict compressibility (and then either use none, lz4, zstd, zlib or lzma)
  • don't use lzma > level 6, it is pointless: small chunks!
  • you can use different compression in same repo.
  • existing chunks won't get recompressed.
  • 1.1 has "recreate" to recompress.
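
The "predict compressibility" idea can be sketched as follows. Note the substitution: lz4 is not in the Python standard library, so this example uses zlib at level 1 as the fast probe instead. The threshold value and the function names are assumptions for illustration only.

```python
import lzma
import zlib


def compress_auto(chunk, threshold=0.97):
    """Probe with a fast compressor, then decide how to store the chunk.

    zlib level 1 stands in for lz4 here (assumption for this sketch).
    """
    probe = zlib.compress(chunk, 1)
    if len(probe) >= threshold * len(chunk):
        # Probe barely shrank the data: store it uncompressed.
        return "none", chunk
    # Data looks compressible: spend time on the stronger compressor.
    # Preset 6 at most, since chunks are small.
    return "lzma", lzma.compress(chunk, preset=6)


def decompress(name, payload):
    if name == "none":
        return payload
    return lzma.decompress(payload)
```

Already compressed or encrypted input (media files, archives) ends up stored as-is, so no time is wasted on the expensive compressor.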

Bonus: Chunking / Dedupe

  • you can use different chunker params in same repo.
  • existing chunks won't get re-chunked.
  • 1.1 has "recreate" to re-chunk.
  • differently cut chunks won't deduplicate.
  • deduplication is based on (hmac-)sha256 of chunks' plaintext, before compression / encryption.
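
A toy chunk store makes these points concrete. It is a sketch under stated assumptions: ChunkStore and fixed_chunks are hypothetical names, fixed-size cutting stands in for the real chunker with two different parameter sets, and zlib stands in for whatever compression is configured.

```python
import hashlib
import hmac
import zlib


class ChunkStore:
    """Toy key/value store: id -> compressed chunk."""

    def __init__(self, id_key):
        self.id_key = id_key
        self.chunks = {}

    def put(self, plaintext):
        # The id is computed over the plaintext, before compression,
        # so the compression setting never affects deduplication.
        cid = hmac.new(self.id_key, plaintext, hashlib.sha256).digest()
        if cid not in self.chunks:
            self.chunks[cid] = zlib.compress(plaintext)
        return cid

    def get(self, cid):
        return zlib.decompress(self.chunks[cid])


def fixed_chunks(data, size):
    # Stand-in for a chunker; 'size' plays the role of chunker params.
    return [data[i:i + size] for i in range(0, len(data), size)]
```

Storing the same data twice with the same cut size adds nothing to the store. Storing it again with a different cut size adds all-new chunks, because differently cut chunks have different ids.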

Bonus: Hash Table

  • own hashtable implementation in C
  • lots of chunks to manage, use memory efficiently
  • doing rather simple hashing with linear probing
  • HT perf determines speed for unchanged files:
    we check file mtime|ctime / size / inode number
    AND
    (via a HT lookup) that we have all chunks in the repo
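
The unchanged-file check can be sketched like this. It is only an illustration: files_cache and chunk_index are plain Python containers standing in for borg's files cache and its C hash table, and the entry layout and function names are assumptions.

```python
import os


def remember(path, chunk_ids, files_cache):
    # Record the metadata seen at backup time plus the file's chunk ids.
    st = os.stat(path)
    files_cache[path] = {
        "size": st.st_size,
        "mtime_ns": st.st_mtime_ns,
        "inode": st.st_ino,
        "chunk_ids": list(chunk_ids),
    }


def is_unchanged(path, files_cache, chunk_index):
    """True if the file can be skipped without reading its contents."""
    entry = files_cache.get(path)
    if entry is None:
        return False
    st = os.stat(path)
    same_meta = (
        entry["size"] == st.st_size
        and entry["mtime_ns"] == st.st_mtime_ns
        and entry["inode"] == st.st_ino
    )
    # AND: every chunk must still be present in the repo (one lookup each)
    return same_meta and all(cid in chunk_index for cid in entry["chunk_ids"])
```

For a mostly unchanged data set, the work per file is one stat call plus one index lookup per chunk, which is why lookup speed dominates.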

BorgBackup LT 34c3 (updated 2017-12)

By Thomas Waldmann
