Borg Backup
"The holy grail of backup software?"
Thomas Waldmann
Borg - a fork of Attic
- Attic: about 5y old, good design, proven code
- but:
- development going slowly
- some bugs and annoyances
- not very open to new developers

- Borg Backup: fork of Attic, now ~2y old
- and:
- a community project, bus_factor++
- lots of fixes and good PRs merged
- open and inviting to new contributors
- faster paced, lots of activity
Feature Set (1)
- easy and fast
- content-defined chunking (*)
- chunk deduplication (*)
- lz4, zlib, lzma compression
- encryption with AES256-CTR
- authentication with HMAC-SHA256
- simple backend (k/v, fs, via ssh)
Feature Set (2)
- FOSS (BSD license)
- good docs
- good platform / arch support
- xattr / acl / bsdflags support
- mount a backup via FUSE
- Python 3.4+, a little Cython & C
- good test coverage, CI
Deduplication (1)
- No problem with:
  - VM images (sparse file support)
  - (physical) disks, LV snapshots
  - renamed huge directories
  - inner duplication of data set
  - historical duplication
  - duplication on different machines
Deduplication (2)
- Content defined chunking:
  - "buzhash" rolling hash
  - cut data when hash has specific bit pattern, yields chunks with ~ 2^n bytes target size
  - n + other chunker params configurable
  - seeded, to avoid fingerprinting chunk lengths
- Store chunks under id into store:
  - id = HASH(chunk), or
  - id = HMAC(id_key, chunk)
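To make the cut rule concrete, here is a minimal sketch of content-defined chunking with a buzhash-style rolling hash. It is not Borg's chunker: the constants `WINDOW`, `MASK_BITS` and `MIN_SIZE`, the way the seed builds the byte table, and the names `make_table` and `chunkify` are all assumptions chosen for illustration.

```python
import random

WINDOW = 48        # rolling window size in bytes (illustrative value)
MASK_BITS = 12     # n: cut when the low n hash bits are zero, ~2^n byte chunks
MIN_SIZE = 512     # hypothetical minimum chunk size


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def make_table(seed):
    """Build the per-byte hash table; a secret seed hides the cut points."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(256)]


def chunkify(data, seed=0):
    """Split bytes into content-defined chunks (sketch, not Borg's code)."""
    table = make_table(seed)
    mask = (1 << MASK_BITS) - 1
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = rotl32(h, 1) ^ table[byte]            # shift in the new byte
        if i - start >= WINDOW:                   # window full: drop oldest byte
            h ^= rotl32(table[data[i - WINDOW]], WINDOW)
        if i - start + 1 >= MIN_SIZE and (h & mask) == 0:
            chunks.append(data[start:i + 1])      # cut after this byte
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])               # trailing partial chunk
    return chunks
```

Because a cut depends only on the last `WINDOW` bytes, inserting data near the start of a file shifts all offsets but leaves most later chunks byte-identical, so they deduplicate against the earlier backup.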
Now and Future
- 1.0 released -- important: use 1.0.9+
- soon: 1.1 (new features, code cleanup)
- 1.2 Crypto Enhancements
  - AES-GCM (AES-OCB? chacha20-poly1305? keccak?)
  - Key Management
  - Ciphersuite Flexibility
- 1.2 Parallelization
  - "Serial Threaded Workers"? (avoids races)
  - zeromq?
How you can help
Python / Cython / C? Help us with coding.
do a security review
do real-world performance tests / comparisons
find bugs / issues, improve docs
spread the word, borg is not well-known yet
sponsor development via bountysource
Borg Backup - Links
github.com/borgbackup
#borgbackup on chat.freenode.net
Questions / Feedback?
Meet me afterwards, breakfast table area.
Bonus: Crypto
- OpenSSL (1.0 or 1.1), but only for the crypto primitives (AES in CTR mode)
- uses hardware acceleration (AES-NI)
- authentication is not hw accelerated, hmac-sha256 (1.0), faster blake2b (1.1)
- in 1.2 hw accelerated AEAD modes (gcm, ocb, ...)
- crypto hashes from python stdlib / OpenSSL / blake2b reference implementation
- we use random from /dev/urandom (via Py stdlib)
- btw: use sha512(-256), it is quite a bit faster than sha256 on 64bit machines. or even better: blake2b
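The hash speed claim is easy to check on your own machine. The sketch below times three hashes from the Python stdlib; the helper names `time_hash` and `compare` are made up for this example, and the numbers depend on the CPU (some CPUs have SHA-256 instructions) and on the OpenSSL build, so treat the result as a local measurement, not a general ranking.

```python
import hashlib
import timeit


def time_hash(name, data, repeat=5):
    """Return the best-of-`repeat` time in seconds to hash `data` once."""
    constructor = getattr(hashlib, name)      # e.g. hashlib.sha256

    def run():
        constructor(data).digest()

    return min(timeit.repeat(run, number=1, repeat=repeat))


def compare(size=16 * 1024 * 1024):
    """Time sha256, sha512 and blake2b over `size` zero bytes."""
    data = bytes(size)
    return {name: time_hash(name, data)
            for name in ("sha256", "sha512", "blake2b")}


if __name__ == "__main__":
    for name, seconds in sorted(compare().items(), key=lambda kv: kv[1]):
        print(f"{name:8s} {seconds * 1000:8.1f} ms")
```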
Bonus: Compression
- lz4 is super fast - use it! often faster than without.
- except if your remote repo connection is super slow, then use zlib or lzma.
- borg 1.1 will use lz4 to predict compressibility (and then either use none, lz4 or zlib/lzma)
- don't use lzma > level 6, it is pointless - we only have <= ~ 2 MiB chunks to compress.
- you can use different compression in same repo.
- existing chunks won't get recompressed.
- 1.1 will have "recreate" to recompress.
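The "predict compressibility" idea can be sketched as: try a cheap compressor first, and only spend time on the expensive one if the cheap one made progress. This is an illustration, not Borg's implementation. lz4 is not in the Python stdlib, so `zlib` at level 1 stands in for the fast probe here; the function name `choose_and_compress` and the `threshold` value are assumptions.

```python
import lzma
import zlib


def choose_and_compress(chunk, threshold=0.97):
    """Probe with a fast compressor, then decide how to store the chunk.

    Returns (method, payload). zlib level 1 is a stand-in for an lz4 probe.
    """
    probe = zlib.compress(chunk, 1)
    if len(probe) >= len(chunk) * threshold:
        # Probe barely shrank the data: likely already compressed or random,
        # so store it as-is and skip the expensive compressor.
        return "none", chunk
    # Data looks compressible: spend the CPU time on the heavier compressor.
    return "lzma", lzma.compress(chunk, preset=6)


def decompress(method, payload):
    """Inverse of choose_and_compress."""
    return payload if method == "none" else lzma.decompress(payload)
```

Storing the method next to each chunk is what lets differently compressed chunks live in the same repo.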
Bonus: Chunking / Dedupe
- you can use different chunker params in same repo.
- existing chunks won't get re-chunked.
- 1.1 will have "recreate" to re-chunk.
- differently cut chunks won't deduplicate.
- deduplication is based on (hmac-)sha256 of chunks' plaintext, before compression / encryption.
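A small sketch of why the id is taken from the plaintext: the same chunk then maps to the same id no matter how it was compressed. The `ChunkStore` class is hypothetical, uses `zlib` only, and leaves out encryption entirely; it only illustrates the id = HASH(chunk) / id = HMAC(id_key, chunk) rule.

```python
import hashlib
import hmac
import zlib


class ChunkStore:
    """Toy key/value chunk store keyed by a hash of the chunk plaintext."""

    def __init__(self, id_key=None):
        self.id_key = id_key      # secret key for keyed ids, or None
        self.objects = {}         # chunk id -> compressed bytes

    def chunk_id(self, plaintext):
        """id = SHA256(chunk), or HMAC-SHA256(id_key, chunk) if keyed."""
        if self.id_key is None:
            return hashlib.sha256(plaintext).digest()
        return hmac.new(self.id_key, plaintext, hashlib.sha256).digest()

    def put(self, plaintext, level=6):
        """Store the chunk unless its id is already present; return the id."""
        cid = self.chunk_id(plaintext)
        if cid not in self.objects:           # already stored: deduplicated
            self.objects[cid] = zlib.compress(plaintext, level)
        return cid

    def get(self, cid):
        """Return the plaintext for a chunk id."""
        return zlib.decompress(self.objects[cid])
```

Putting the same chunk twice with different compression levels stores it once. Two chunks that cover the same data but were cut at different positions have different plaintexts, hence different ids, and do not deduplicate.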
Bonus: Hash Table
- own hashtable implementation in C
- lots of chunks to manage, use memory efficiently
- currently: rather simple open addressing with linear probing
- maybe soon: robin hood hashing
- performance measurements / comparisons to do
- hashtable performance determines fast-skip speed: we check file mtime / size / inode number AND (via a chunks index hashtable lookup) that we have all chunks in the repository
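The fast-skip check can be sketched as follows. A Python dict and set stand in for Borg's files cache and C chunks index; the `FileEntry` layout and the function name `is_unchanged` are assumptions for illustration.

```python
import os
from collections import namedtuple

# Hypothetical files-cache record: what we saw when the file was last backed up.
FileEntry = namedtuple("FileEntry", "mtime_ns size inode chunk_ids")


def is_unchanged(path, files_cache, chunk_index):
    """Return True if the file can be skipped without reading its contents."""
    entry = files_cache.get(path)
    if entry is None:
        return False                          # never seen before
    st = os.stat(path)
    if (st.st_mtime_ns, st.st_size, st.st_ino) != (
            entry.mtime_ns, entry.size, entry.inode):
        return False                          # metadata changed: re-read file
    # One index lookup per chunk: this is where hashtable speed matters.
    return all(cid in chunk_index for cid in entry.chunk_ids)
```

For a mostly unchanged data set, nearly all time goes into these lookups, one per chunk of every file, which is why the hashtable implementation dominates fast-skip speed.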
BorgBackup LT (updated 04/2017)
By Thomas Waldmann