Borg Backup
(a fork of Attic)
"The holy grail of backup software"
Thomas Waldmann @ GPN16
Feature Set (1)
- simple & fast
- deduplication
- compression
- authenticated encryption
- easy pruning of old backups
- simple backend (k/v, fs, via ssh)
Feature Set (2)
- FOSS (BSD license)
- good docs
- good platform / arch support
- xattr / acl support
- FUSE support ("mount a backup")
Code
- 95% Python 3.4+, Cython (high-level code, glue code)
- 5% C (performance critical stuff)
- only ~12000 LOC total
- few dependencies
- unit tests, CI
Security
- Signatures / Authentication: no undetected corruption/tampering
- Encryption / Confidentiality: only you have access to your data
- FOSS in Python: review possible, no buffer overflows
Safety
- Robustness (by append-only design, transactions)
- Checkpoints every 5 minutes (between files)
- msgpack with "limited" Unpacker (no memory DoS)
Crypto Keys
- client-side meta+data encryption
- separate keys for separate concerns
- passphrase: PBKDF2, 100k rounds
- Keys:
  - none
  - repokey (replaces: passphrase-only)
  - passphrase protected keyfile
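The passphrase step can be sketched with Python's standard library. This is only an illustration: the slide says "pbkdf2" with 100k rounds, while the PRF (HMAC-SHA256 here), the salt size, the output length and the name `derive_key` are assumptions made for the example.

```python
import hashlib
import os

ROUNDS = 100_000  # "100k rounds" from the slide


def derive_key(passphrase: str, salt: bytes, length: int = 32) -> bytes:
    """Derive a key-encryption key from a passphrase with PBKDF2.

    Assumption: HMAC-SHA256 as the PRF; the slide only says "pbkdf2".
    """
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, ROUNDS, dklen=length
    )


salt = os.urandom(32)  # hypothetical salt size, stored next to the key
kek = derive_key("correct horse battery staple", salt)
print(len(kek), "byte key derived")
```

The derived key would then protect the actual key material (repokey or keyfile), so changing the passphrase does not require re-encrypting the repository.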
Crypto Cipher/MAC
- AEAD, Encrypt-then-MAC
  - AES256-CTR + HMAC-SHA256
  - Counter / IV deterministic, never repeats
- we're working on adding AES256-GCM, maybe also others (AES-OCB? chacha20-poly1305?)
- uses OpenSSL
  - Intel/AMD: AES-NI, PCLMULQDQ
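A minimal sketch of the Encrypt-then-MAC construction, using only the standard library. Python has no built-in AES, so a toy SHA-256 based counter-mode keystream stands in for AES256-CTR here; it is not Borg's cipher and must not be used for real data. The record layout (tag, then IV, then ciphertext) and the function names are assumptions for illustration.

```python
import hashlib
import hmac


def keystream_xor(enc_key: bytes, iv: int, data: bytes) -> bytes:
    """Toy counter-mode stream cipher: a stand-in for AES256-CTR, NOT secure."""
    out = bytearray()
    for block_no in range((len(data) + 31) // 32):
        counter = (iv + block_no).to_bytes(16, "big")
        pad = hashlib.sha256(enc_key + counter).digest()
        block = data[block_no * 32:(block_no + 1) * 32]
        out.extend(b ^ p for b, p in zip(block, pad))
    return bytes(out)


def seal(enc_key: bytes, mac_key: bytes, iv: int, plaintext: bytes) -> bytes:
    """Encrypt first, then MAC the IV and the ciphertext."""
    iv_bytes = iv.to_bytes(16, "big")
    ciphertext = keystream_xor(enc_key, iv, plaintext)
    tag = hmac.new(mac_key, iv_bytes + ciphertext, hashlib.sha256).digest()
    return tag + iv_bytes + ciphertext


def open_sealed(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    """Verify the MAC before touching the ciphertext, then decrypt."""
    tag, iv_bytes, ciphertext = blob[:32], blob[32:48], blob[48:]
    expected = hmac.new(mac_key, iv_bytes + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: data corrupted or tampered")
    return keystream_xor(enc_key, int.from_bytes(iv_bytes, "big"), ciphertext)


blob = seal(b"e" * 32, b"m" * 32, iv=0, plaintext=b"hello borg")
print(open_sealed(b"e" * 32, b"m" * 32, blob))
```

Because the counter must never repeat under the same key, a caller would advance the IV by the number of blocks consumed before sealing the next message.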
Compression
- none
  - no compression, 1:1 pass through, no cpu usage
- lz4
  - low compression, super fast
  - sometimes faster than w/o compression
- zlib
  - medium compression, medium fast, level 0..9
- lzma
  - high compression, slow, level 0..9
  - beware of higher levels of lzma: super slow and they do not compress better due to chunk size
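To get a feeling for the trade-off, one can compress a single chunk-sized buffer with the standard library's zlib and lzma modules. This is a rough sketch, not Borg's code: lz4 is left out because it is not in the standard library, and the sample data and chosen levels are arbitrary assumptions.

```python
import lzma
import os
import time
import zlib

# One sample "chunk": mostly repetitive data plus an incompressible tail.
chunk = (b"borg backup " * 4096) + os.urandom(4096)

compressors = {
    "none": lambda data: data,
    "zlib,1": lambda data: zlib.compress(data, 1),
    "zlib,9": lambda data: zlib.compress(data, 9),
    "lzma,6": lambda data: lzma.compress(data, preset=6),
}

results = {}
for name, compress in compressors.items():
    started = time.perf_counter()
    results[name] = compress(chunk)
    elapsed = time.perf_counter() - started
    ratio = len(results[name]) / len(chunk)
    print(f"{name:8s} {len(results[name]):8d} bytes  ratio {ratio:.3f}  {elapsed * 1000:.2f} ms")
```

Since each chunk is compressed on its own, the compressor never sees more than one chunk of context, which is consistent with the slide's warning that high lzma levels cost time without compressing better.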
Deduplication (1)
- No problem with:
  - VM images (sparse file support)
  - (physical) disk images
  - renamed huge directories/trees
  - inner deduplication of data set
  - historical deduplication
  - deduplication between different machines
Deduplication (2)
- Content defined chunking:
  - "buzhash" rolling hash
  - cut data when hash has specific bit pattern, yields chunks with 2^n bytes target size
  - n + other chunker params configurable now
  - seeded, to avoid fingerprinting chunk lengths
- Store chunks under id into store:
  - id = HASH(chunk) [without encryption]
  - id = HMAC(mac_key, chunk) [with encryption]
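The two ideas on this slide can be sketched together: a seeded buzhash-style rolling hash that cuts a chunk when the low n bits of the hash are zero, and a store keyed by the chunk id. This is a simplified illustration, not Borg's chunker: the window size, mask bits, minimum size, the base table, SHA-256 as HASH and the `ChunkStore` class are all assumptions for the example.

```python
import hashlib
import hmac
import random

MASK32 = 0xFFFFFFFF
WINDOW = 48      # hypothetical rolling window size in bytes
MASK_BITS = 12   # n: cut when the low n hash bits are zero (~2**n byte chunks)
MIN_SIZE = 512   # hypothetical minimum chunk size


def rotl(x, r):
    """Rotate a 32-bit value left by r bits."""
    r %= 32
    return ((x << r) | (x >> (32 - r))) & MASK32


def make_table(seed):
    """Byte -> 32-bit value table; XOR with the seed hides the cut points."""
    rng = random.Random(1234)  # fixed base table, just for this sketch
    return [rng.getrandbits(32) ^ seed for _ in range(256)]


def chunks(data, seed):
    """Yield content-defined chunks of data."""
    table = make_table(seed)
    cut_mask = (1 << MASK_BITS) - 1
    start, h = 0, 0
    for i, byte in enumerate(data):
        pos = i - start
        h = rotl(h, 1) ^ table[byte]                      # byte enters the window
        if pos >= WINDOW:
            h ^= rotl(table[data[i - WINDOW]], WINDOW)    # oldest byte leaves
        if pos + 1 >= MIN_SIZE and (h & cut_mask) == 0:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]


class ChunkStore:
    """Key/value store that keeps every distinct chunk only once."""

    def __init__(self, mac_key=None):
        self.mac_key = mac_key
        self.objects = {}

    def chunk_id(self, chunk):
        if self.mac_key is None:
            return hashlib.sha256(chunk).digest()                    # id = HASH(chunk)
        return hmac.new(self.mac_key, chunk, hashlib.sha256).digest()  # id = HMAC(mac_key, chunk)

    def put(self, chunk):
        cid = self.chunk_id(chunk)
        self.objects.setdefault(cid, chunk)  # already known: nothing new is stored
        return cid


rng = random.Random(42)
data = bytes(rng.getrandbits(8) for _ in range(200_000))
store = ChunkStore(mac_key=b"k" * 32)
first = [store.put(c) for c in chunks(data, seed=0x5EED)]
stored = len(store.objects)
second = [store.put(c) for c in chunks(data, seed=0x5EED)]
print(len(first), "chunks,", len(store.objects) - stored, "new objects on second run")
```

Because cut points depend on the content in the window rather than on file offsets, data that moves inside a file or between files tends to produce the same chunks again, and those are found by id instead of being stored twice.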
Fork from Attic (May 2015)
- attic has a good codebase
- attracted quite some devs
- lots of pull requests and activity
- but:
  - low / slow PR acceptance
  - 1 main developer with little time
  - rather wanted it as his little pet project
  - rather coding on his own than review code
  - "compatibility forever"
Borg - different goals
- developed by "The Borg Collective"
- more open development
- new developers are welcome!
- quicker development
- redesign where needed
- changes, new features
- incompatible changes with good reason
- thus: less "sta(b)le"
Borg, the year after forking
- attic repo: ~600 changesets
- borg repo: ~2400 changesets
- developers, developers, developers!
- active community: on github, irc channel, mailing list
- bug and scalability fixes, #5
- features! testing. platforms. docs.
Borg 1.0 (now)
- packaged for many Linux distributions
- also in *BSD and Mac OS X dists
- more or less works on Windows & Cygwin
- Slashdot's "borgbackup 1.0 released" coverage helped quite a bit.
- Happy users on Twitter, Reddit and the Blogosphere.
Borg 1.1 (soon)
- new features:
  - diff
  - recreate
  - with-lock
  - comment
  - compression: by .ext & heuristic
- better speed: FUSE, traversal, HDDs
- some source reorg / cleanup
Borg 1.2 (future)
- "serial multi-threading"
  - traverse, read, chunk
  - hash, dedup, compress, encrypt
  - store, sync
  - avoid CPU idle time
- but: avoid races, crypto issues
- crypto:
  - flexibility: add aes-gcm (speed)
  - improve key management
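One way to picture the "serial multi-threading" idea is a pipeline with one thread per stage, connected by bounded queues, so that hashing and compressing can overlap with reading and storing. This is a hypothetical sketch of the general pattern, not Borg's planned design; the stage split, queue sizes and function names are assumptions, and encryption is left out.

```python
import hashlib
import queue
import threading
import zlib

DONE = object()  # sentinel that flows through the pipeline to shut it down


def stage(func, inbox, outbox):
    """Run func on every item from inbox in its own thread, in order."""
    def run():
        while True:
            item = inbox.get()
            if item is DONE:
                outbox.put(DONE)
                return
            outbox.put(func(item))

    thread = threading.Thread(target=run)
    thread.start()
    return thread


def run_pipeline(chunks):
    q_read, q_hashed, q_packed = queue.Queue(8), queue.Queue(8), queue.Queue()
    threads = [
        # hash stage: chunk -> (id, chunk)
        stage(lambda c: (hashlib.sha256(c).digest(), c), q_read, q_hashed),
        # compress stage: (id, chunk) -> (id, compressed chunk)
        stage(lambda p: (p[0], zlib.compress(p[1])), q_hashed, q_packed),
    ]
    for chunk in chunks:  # "read" stage, here just feeding in-memory data
        q_read.put(chunk)
    q_read.put(DONE)

    store = {}  # "store" stage with dedup by id
    while True:
        item = q_packed.get()
        if item is DONE:
            break
        chunk_id, blob = item
        store.setdefault(chunk_id, blob)
    for thread in threads:
        thread.join()
    return store


store = run_pipeline([b"a" * 1000, b"b" * 1000, b"a" * 1000])
print(len(store), "unique chunks stored")
```

Keeping exactly one thread per stage preserves chunk order, which sidesteps some of the races the slide warns about; a shared encryption counter would still need care once stages run in parallel.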
Borg - you can be assimilated!
- test scalability / reliability / security
- be careful!
- find, file and fix bugs
- file feature requests
- improve docs
- contribute code
- spread the word
- create dist packages
- care for misc. platforms
Borg Backup - Links
borgbackup.readthedocs.io
#borgbackup on chat.freenode.net
Questions / Feedback?
- Just grab me at the conference!
- Thomas J Waldmann @ twitter
Borg Demo ->
Borg Internals & Ideas v
Error Correction?
- borg does error (and even tampering) detection
- but not (yet?) error correction
- kinds of errors / threat model:
  - single/few bit errors
  - defect / unreadable blocks
  - media failure (defect disk, ssd)
- see issue #225 for discussion
  - implement something in borg?
  - rely on other soft- or hardware solutions?
  - avoid futile attempts
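The difference between detection and correction can be shown with a plain checksum. In this sketch a CRC32 is stored in front of a payload; a single flipped bit is reliably noticed, but the checksum alone says nothing about which bit to repair. The record layout and function names are made up for the example and are not Borg's on-disk format.

```python
import zlib


def protect(payload: bytes) -> bytes:
    """Prefix the payload with its CRC32 (hypothetical record layout)."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def verify(record: bytes) -> bool:
    """Return True if the stored CRC32 matches the payload."""
    stored = int.from_bytes(record[:4], "big")
    return zlib.crc32(record[4:]) == stored


record = protect(b"chunk data")
damaged = bytearray(record)
damaged[7] ^= 0x01  # simulate a single bit error on the medium

print("intact record ok: ", verify(record))
print("damaged record ok:", verify(bytes(damaged)))
```

Recovering the original bytes would need redundancy beyond a checksum, for example a second copy or parity data, which is the open question on this slide.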
Modernize Crypto
- sha256, hmac-sha256, crc32 are slow
- aes is also slow, if not hw accelerated
- faster: poly1305, blake2, sha512-256, crc32c, chacha20
- issues:
  - openssl does not have much recent crypto
  - openssl 1.1 will have few more / recent algos
  - libsodium is nice for recent crypto
  - libsodium can't replace openssl (no aes256-ctr), adds a dependency
  - packaging issues due to libsodium / openssl 1.1?
Key Gen. / Management
- currently:
  - 1 AES key
  - 1 HMAC key
  - 1 chunker seed
  - stored highest IV value for AES CTR mode
  - encrypted using key passphrase
- ideas:
  - session keys? always start from IV=0.
  - per archive? per thread? per chunk?
  - asymm. crypto: encrypt these keys for receiver
RAM consumption
- borg >= 1.0 now has lower RAM consumption
- chunks, files and repo index kept in memory
- uses bigger chunks (2MiB, was: 64kiB)
- fewer chunks to manage -> smaller chunks index
- be careful on small machines (NAS, rpi, ...)
- or with huge amount of data / huge file count
- in the docs, there is a formula to estimate RAM usage
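The effect of the bigger chunk size is simple arithmetic. The sketch below counts chunks for 1 TiB of unique data at both target sizes; it is not the formula from the docs, and `entry_bytes` is a hypothetical per-entry size used only to turn counts into a rough index size.

```python
KIB, MIB, TIB = 2**10, 2**20, 2**40


def chunk_count(total_bytes: int, chunk_size: int) -> int:
    """Number of chunks if every chunk had exactly the target size."""
    return -(-total_bytes // chunk_size)  # ceiling division


def index_bytes(total_bytes: int, chunk_size: int, entry_bytes: int = 64) -> int:
    """Rough index size; entry_bytes is an assumed, hypothetical value."""
    return chunk_count(total_bytes, chunk_size) * entry_bytes


old = chunk_count(TIB, 64 * KIB)
new = chunk_count(TIB, 2 * MIB)
print("64 KiB chunks:", old)
print("2 MiB chunks: ", new)
print("reduction factor:", old // new)
```

Real chunks vary around the target size and deduplicated data only counts once, so these numbers are an upper-bound style estimate rather than a measurement.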
Hash Tables
- hash table implementation in C
- e.g. used for the chunks index
- uses closed hashing
- uses linear probing for collision handling
- sometimes slow
- use Robin Hood hashing?
- needs C or Cython developer
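For readers unfamiliar with the terms: closed hashing keeps all entries inside one array, and linear probing resolves a collision by walking to the next free slot. The following Python sketch shows that scheme in miniature; Borg's real table is written in C, and the class name, growth policy and lack of deletion here are simplifications assumed for the example.

```python
_EMPTY = object()  # marker for an unused slot


class LinearProbingTable:
    """Closed hashing (open addressing) with linear probing, insert/lookup only."""

    def __init__(self, capacity=8, max_load=0.75):
        self._keys = [_EMPTY] * capacity
        self._values = [None] * capacity
        self._used = 0
        self._max_load = max_load

    def _slot(self, key):
        """Return the index holding key, or the first empty slot on its probe path."""
        n = len(self._keys)
        i = hash(key) % n
        while self._keys[i] is not _EMPTY and self._keys[i] != key:
            i = (i + 1) % n  # collision: try the next slot
        return i

    def __setitem__(self, key, value):
        if self._used + 1 > self._max_load * len(self._keys):
            self._grow()
        i = self._slot(key)
        if self._keys[i] is _EMPTY:
            self._keys[i] = key
            self._used += 1
        self._values[i] = value

    def get(self, key, default=None):
        i = self._slot(key)
        return default if self._keys[i] is _EMPTY else self._values[i]

    def _grow(self):
        """Double the array and re-insert every entry."""
        old = [(k, v) for k, v in zip(self._keys, self._values) if k is not _EMPTY]
        self._keys = [_EMPTY] * (2 * len(self._keys))
        self._values = [None] * len(self._keys)
        self._used = 0
        for k, v in old:
            self[k] = v


table = LinearProbingTable()
for n in range(100):
    table[n] = n * n
print(table.get(7), table.get(1000))
```

Long runs of occupied slots are what make linear probing "sometimes slow"; Robin Hood hashing, mentioned on the slide, reorders entries during insertion to keep probe lengths more even.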
Chunk Index Sync
- problem: multiple clients updating same repo
- then: chunk index needs to get re-synced
- slow, esp. if remote, many and/or big archives
- local collection of single-archive chunk indexes
- needs lots of space, merging still expensive
- idea: "borgception"
  - backup chunks index into a secondary borg repo
  - fetch it from there when out of sync
- other ideas?
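The re-sync step can be pictured as merging per-archive indexes into one repository-wide index. In this sketch each single-archive index is assumed to map a chunk id to the number of references inside that archive, and merging sums those counts; the data model and names are simplified assumptions, not Borg's cache format.

```python
from collections import Counter


def merge_indexes(archive_indexes):
    """Sum per-archive reference counts into one repo-wide chunk index."""
    merged = Counter()
    for index in archive_indexes:
        merged.update(index)  # adds counts for ids that are already present
    return merged


# Hypothetical single-archive indexes: chunk id -> reference count
archive_a = {b"id-1": 1, b"id-2": 2}
archive_b = {b"id-2": 1, b"id-3": 1}

repo_index = merge_indexes([archive_a, archive_b])
for chunk_id, refcount in sorted(repo_index.items()):
    print(chunk_id, refcount)
```

The cost grows with the number and size of archives, since every cached index has to be read and merged, which is why the slide calls merging "still expensive".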
Borg - Demo
I'll show borgbackup using the
"all inclusive" binary.
You could also use:
- Source code checkout from github
- Release packages from PyPI
- Linux / BSD / ... packages
install
# download the binary and the gpg signature.
...
# verify the gpg signature
gpg --verify borg-linux64.asc borg-linux64
# install / fix permissions
cp borg-linux64 /usr/local/bin/borg
chmod 755 /usr/local/bin/borg
chown root:root /usr/local/bin/borg
init / create
# initialize a repository:
borg init /tmp/borg
# create a "first" archive inside this repo (verbose):
borg create -v --stats /tmp/borg::first ~/Desktop
# create a "second" archive (less verbose):
borg create /tmp/borg::second ~/Desktop
# even more verbose (archive names must be unique, so each run gets a new name):
borg create -v --list --stats /tmp/borg::third ~/Desktop
borg create -v --progress --stats /tmp/borg::fourth ~/Desktop
list / extract / check
# list repo / archive contents:
borg list /tmp/borg
borg list /tmp/borg::first
# extract ("restore") from an archive to cwd:
mkdir test ; cd test
borg extract /tmp/borg::third
# simulate extraction (good test):
borg extract -v --dry-run /tmp/borg::third
# check consistency of repo:
borg check -v /tmp/borg
info / delete / help
# info about archive:
borg info /tmp/borg::first
# delete archive:
borg delete /tmp/borg::first
# delete repo:
borg delete /tmp/borg
remote via ssh
# connect to remote borg via ssh:
# remote borg needs to be compatible with local
borg init ssh://user@host:22/mnt/backup/borg
borg create ssh://user@host:22/mnt/backup/borg::first ~
# also possible: using sshfs or other locally mounted
# network filesystems, but be careful: worse performance