(a fork of Attic-Backup)
"I found the Holy Grail of backups."
(Stavros K. about Attic-Backup, 8/2013)
OSBAR 2017 presentation made by Thomas Waldmann
Borg - a fork of Attic
Attic: 2010-2015, good design, proven code
development going slowly
some bugs and annoyances
not very open to new developers
Borg Backup: forked from Attic in May 2015
a community project, bus_factor++
lots of fixes and good PRs merged
open and inviting to new contributors
faster paced, lots of activity
Feature Set (1)
- easy and fast
- content-defined chunking (*)
- chunk deduplication (*)
- lz4, zlib, lzma compression
- encryption with AES256-CTR
- authentication with HMAC-SHA256
- simple backend (k/v, fs, via ssh)
Feature Set (2)
- FOSS (BSD license)
- good platform / arch support
- xattr / acl / bsdflags support
- mount a backup via FUSE
- Python 3.4+, a little Cython & C
- good test coverage, CI
No problem with:
- VM images (sparse file support)
- (physical) disks, LV snapshots
- renamed huge directories
- inner duplication of data set
- historical duplication
- duplication on different machines
Content defined chunking:
- "buzhash" rolling hash
- cut data where the hash has a specific bit pattern; yields chunks with a target size of ~ 2^n bytes
- n + other chunker params configurable
- seeded, to avoid fingerprinting chunk lengths
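The chunking idea above can be sketched in a few lines of Python. This is a minimal illustration, not Borg's actual chunker: the window size, size limits, and the way the seed is turned into a table (`make_table`) are assumptions chosen for readability.

```python
import random

WINDOW = 48          # bytes in the rolling window (assumed value)
MASK_BITS = 12       # n: cut when the low n bits are zero, target ~2^n bytes
MIN_SIZE = 512       # assumed lower bound on chunk size
MAX_SIZE = 1 << 16   # assumed upper bound on chunk size


def make_table(seed):
    """256 pseudo-random 32-bit values; a secret seed hides the cut points."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(256)]


def rotl32(value, count):
    """Rotate a 32-bit value left by count bits."""
    count %= 32
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF


def chunk(data, seed=0):
    """Yield chunks of data, cutting where the rolling hash matches the mask."""
    table = make_table(seed)
    mask = (1 << MASK_BITS) - 1
    start = 0
    h = 0
    for i, byte in enumerate(data):
        pos = i - start                      # offset inside the current chunk
        h = rotl32(h, 1) ^ table[byte]       # shift in the new byte
        if pos >= WINDOW:
            # Remove the byte that just left the window.
            h ^= rotl32(table[data[i - WINDOW]], WINDOW)
        size = pos + 1
        if (size >= MIN_SIZE and (h & mask) == 0) or size >= MAX_SIZE:
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]                   # trailing partial chunk
```

Because a cut depends only on the bytes inside the window, inserting data near the start of a file usually changes just the first chunk; later cut points line up again.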
Store chunks under id into store:
- id = HASH(chunk), or
- id = HMAC(id_key, chunk)
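As a rough sketch of the two id schemes, assuming SHA-256 as the hash and a plain dict as the store (the `ChunkStore` class and its method names are hypothetical, for illustration only):

```python
import hashlib
import hmac
import os


def plain_id(chunk):
    """id = HASH(chunk): here SHA-256 over the chunk plaintext."""
    return hashlib.sha256(chunk).digest()


def keyed_id(id_key, chunk):
    """id = HMAC(id_key, chunk): ids cannot be computed without the key."""
    return hmac.new(id_key, chunk, hashlib.sha256).digest()


class ChunkStore:
    """Toy key/value store: a chunk already stored under its id is not stored again."""

    def __init__(self, id_key=None):
        self.id_key = id_key
        self.chunks = {}

    def put(self, chunk):
        cid = keyed_id(self.id_key, chunk) if self.id_key else plain_id(chunk)
        self.chunks.setdefault(cid, chunk)   # no-op if the id is already present
        return cid


store = ChunkStore(id_key=os.urandom(32))
ids = [store.put(c) for c in (b"alpha", b"beta", b"alpha")]
print(len(ids), "chunks written,", len(store.chunks), "chunks stored")
```

With the keyed variant, someone who can see the ids but lacks `id_key` cannot test whether a known piece of data is in the store.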
Now and Future
- 1.0 "oldstable", widely distributed, use 1.0.9+
- 1.1 "stable", recently released, use 1.1.2+ (new features, code cleanup)
1.2 Crypto Enhancements
- AES-GCM (AES-OCB? chacha20-poly1305? keccak?)
- Key Management
- Ciphersuite Flexibility
- "Serial Threaded Workers"? (avoids races)
How you can help
- Python / Cython / C? Help us code.
- do a security review
- do real-world performance tests / comparisons
- find bugs / issues, improve docs
- spread the word, borg is not well-known yet
- sponsor development via bountysource
Borg Backup - Links
#borgbackup on chat.freenode.net
Questions / Feedback?
Use IRC, github issues or the mailing list.
Bonus: Crypto
- OpenSSL (1.0 or 1.1), but only for the crypto primitives (AES in CTR mode)
- uses hardware acceleration (AES-NI)
- authentication is not hw accelerated, hmac-sha256 (1.0), faster blake2b (1.1)
- in 1.2 hw accelerated AEAD modes (gcm, ocb, ...)
- crypto hashes from python stdlib / OpenSSL / blake2b reference implementation
- we use random from /dev/urandom (via Py stdlib)
- btw: use sha512(-256), it is quite a bit faster than sha256 on 64-bit machines. Or even better: blake2b.
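The authentication and hash-speed points can be tried out with the Python stdlib alone. The sketch below is not Borg's code: the function names are made up for illustration, and `hashlib.blake2b` requires Python 3.6 or newer. Timings depend on the machine, so no numbers are claimed here.

```python
import hashlib
import hmac
import os
import time

key = os.urandom(32)   # OS random source, e.g. /dev/urandom on Linux


def mac_hmac_sha256(key, data):
    """HMAC-SHA256 tag over data."""
    return hmac.new(key, data, hashlib.sha256).digest()


def mac_blake2b(key, data):
    """Keyed BLAKE2b with a 32-byte tag."""
    return hashlib.blake2b(data, key=key, digest_size=32).digest()


def verify(mac_func, key, data, tag):
    """Constant-time comparison, so timing does not leak how many bytes matched."""
    return hmac.compare_digest(mac_func(key, data), tag)


def seconds_for(func, data, rounds=20):
    """Wall-clock seconds for `rounds` calls of func(data)."""
    start = time.perf_counter()
    for _ in range(rounds):
        func(data)
    return time.perf_counter() - start


if __name__ == "__main__":
    payload = os.urandom(1 << 20)
    for name in ("sha256", "sha512", "blake2b"):
        hasher = getattr(hashlib, name)
        print(name, seconds_for(lambda d: hasher(d).digest(), payload))
```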
Bonus: Compression
- lz4 is super fast - use it! often faster than no compression.
- except if your remote repo connection is super slow, then use zlib or lzma.
- borg 1.1 can use lz4 to predict compressibility (and then either use none, lz4 or zlib/lzma)
- don't use lzma > level 6, it is pointless - we only have <= ~ 2 MiB chunks to compress.
- you can use different compression in same repo.
- existing chunks won't get recompressed.
- 1.1 has "recreate" to recompress.
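The "predict compressibility with a fast compressor" idea might look roughly like this. Since lz4 is not in the Python stdlib, zlib level 1 stands in as the fast probe, and the sketch simplifies the decision to "none" versus the heavy compressor; the threshold and all names are assumptions, not Borg's actual values.

```python
import lzma
import os
import zlib

RATIO_THRESHOLD = 0.97   # assumed cut-off, not Borg's actual value


def fast_probe(chunk):
    """Stand-in for an lz4 pass: zlib level 1 is the fastest stdlib option."""
    return zlib.compress(chunk, 1)


def compress_auto(chunk, heavy="zlib"):
    """Return (method, payload); skip the heavy compressor for incompressible data."""
    probe = fast_probe(chunk)
    if len(probe) >= len(chunk) * RATIO_THRESHOLD:
        return "none", chunk                  # probe barely helped: store as-is
    if heavy == "lzma":
        return "lzma", lzma.compress(chunk, preset=6)
    return "zlib", zlib.compress(chunk, 6)


def decompress(method, payload):
    """Undo compress_auto using the method stored next to the payload."""
    if method == "none":
        return payload
    if method == "lzma":
        return lzma.decompress(payload)
    return zlib.decompress(payload)


for sample in (b"backup " * 10000, os.urandom(65536)):
    method, payload = compress_auto(sample)
    print(method, len(sample), "->", len(payload))
```

Storing the method next to each payload is also what lets chunks with different compression coexist in one store.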
Bonus: Chunking / Dedupe
- you can use different chunker params in same repo.
- existing chunks won't get re-chunked.
- 1.1 has "recreate" to re-chunk.
- differently cut chunks won't deduplicate.
- deduplication is based on (hmac-)sha256 of chunks' plaintext, before compression / encryption.
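A small experiment shows why differently cut chunks do not deduplicate. Fixed-size cutting is used here only as a stand-in for "two different sets of chunker params"; the helper names are hypothetical.

```python
import hashlib
import random


def fixed_chunks(data, size):
    """Toy chunker: fixed-size cuts stand in for one set of chunker params."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_ids(chunks):
    """Ids are computed over the plaintext, before compression or encryption."""
    return {hashlib.sha256(c).digest() for c in chunks}


rng = random.Random(7)
data = bytes(rng.getrandbits(8) for _ in range(64 * 1024))

ids_a = chunk_ids(fixed_chunks(data, 4096))
ids_b = chunk_ids(fixed_chunks(data, 5000))      # same data, different cut
ids_a_again = chunk_ids(fixed_chunks(data, 4096))

print(len(ids_a & ids_a_again), "of", len(ids_a), "ids shared with the same cut")
print(len(ids_a & ids_b), "ids shared after changing the cut")
```

Since the id is taken from the plaintext, changing only the compression of a chunk leaves its id, and therefore deduplication, unaffected.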
Bonus: Hash Table
- own hashtable implementation in C
- lots of chunks to manage, use memory efficiently
- currently: rather simple linear probing
- maybe soon: robin hood hashing
- performance measurements / comparisons to do
- hashtable performance determines fast-skip speed for unchanged files: we check file mtime|ctime / size / inode number AND (via a chunks index hashtable lookup) that we have all of the file's chunks in the repository
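The fast-skip check described above can be approximated as follows. This is a sketch under assumptions: a dict plays the files cache, a Python set plays the chunks index, each file is treated as a single chunk, and `FileEntry`, `can_skip`, and `remember` are hypothetical names.

```python
import hashlib
import os
import tempfile
from collections import namedtuple

FileEntry = namedtuple("FileEntry", "mtime_ns size inode chunk_ids")


def can_skip(path, files_cache, chunks_index):
    """True if the file looks unchanged and all of its chunks are still known."""
    entry = files_cache.get(path)
    if entry is None:
        return False
    st = os.stat(path)
    unchanged = (st.st_mtime_ns, st.st_size, st.st_ino) == entry[:3]
    # Each membership test on the chunks index is a hash table lookup.
    return unchanged and all(cid in chunks_index for cid in entry.chunk_ids)


def remember(path, files_cache, chunks_index):
    """Read the file once and record its (single-chunk) id and stat data."""
    with open(path, "rb") as f:
        cid = hashlib.sha256(f.read()).digest()
    chunks_index.add(cid)
    st = os.stat(path)
    files_cache[path] = FileEntry(st.st_mtime_ns, st.st_size, st.st_ino, (cid,))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as f:
        f.write(b"hello")
    cache, index = {}, set()
    print("first run, skip:", can_skip(path, cache, index))
    remember(path, cache, index)
    print("second run, skip:", can_skip(path, cache, index))
```

With many files and many chunks per file, the `cid in chunks_index` lookups dominate the second run, which is why the index implementation matters.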
BorgBackup technical LT (updated 11/2017)
By Thomas Waldmann