Borg Backup
(a fork of Attic)
"The holy grail of backup software"
Thomas Waldmann @ shackday 2015
Feature Set (1)
- simple & fast
- deduplication
- compression
- authenticated encryption
- easy pruning of old backups
- simple backend (k/v, fs, via ssh)
Feature Set (2)
- FOSS (BSD license)
- good docs
- good platform / arch support
- xattr / acl support
- FUSE support ("mount a backup")
Code
- 91% Python3 + Cython (high-level code, glue code)
- 9% C (performance critical stuff)
- only ~6000 LOC total
- few dependencies
- unit tests, CI
Security
- Signatures / Authentication: no undetected corruption/tampering
- Encryption / Confidentiality: only you have access to your data
- FOSS in Python: review possible, no buffer overflows
Safety
- Robustness (by append-only design, transactions)
- Checkpoints every 5 minutes (between files)
- msgpack with "limited" Unpacker (no memory DoS)
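The robustness bullet rests on an append-only design with transactions. As a minimal sketch, assuming a hypothetical record format (PUT / DELETE / COMMIT tags with a CRC32 header, not Borg's actual segment layout), a log that is only ever appended to and replayed up to the last COMMIT could look like this:

```python
import io
import struct
import zlib

# Hypothetical record tags: the idea only, not Borg's on-disk format.
PUT, DELETE, COMMIT = 0, 1, 2
HEADER = struct.Struct("<IIB")  # crc32, payload length, tag


def _append(log, tag, payload=b""):
    # Records are only ever appended; existing bytes are never rewritten.
    crc = zlib.crc32(bytes([tag]) + payload)
    log.write(HEADER.pack(crc, len(payload), tag) + payload)


def put(log, key, value):
    _append(log, PUT, struct.pack("<I", len(key)) + key + value)


def delete(log, key):
    _append(log, DELETE, struct.pack("<I", len(key)) + key)


def commit(log):
    _append(log, COMMIT)


def replay(data):
    """Rebuild the key/value state from committed transactions only."""
    committed, pending = {}, {}
    pos = 0
    while pos + HEADER.size <= len(data):
        crc, length, tag = HEADER.unpack_from(data, pos)
        payload = data[pos + HEADER.size:pos + HEADER.size + length]
        if len(payload) < length or zlib.crc32(bytes([tag]) + payload) != crc:
            break  # torn or corrupt tail (e.g. crash mid-write): stop here
        pos += HEADER.size + length
        if tag == COMMIT:
            for key, value in pending.items():
                if value is None:
                    committed.pop(key, None)
                else:
                    committed[key] = value
            pending = {}
        else:
            (klen,) = struct.unpack_from("<I", payload)
            key = payload[4:4 + klen]
            pending[key] = payload[4 + klen:] if tag == PUT else None
    # Anything still pending was never committed and is discarded.
    return committed


log = io.BytesIO()
put(log, b"a", b"1")
commit(log)
put(log, b"b", b"2")  # "crash" before the next commit
print(replay(log.getvalue()))  # {b'a': b'1'}
```

Because nothing is overwritten, an interrupted run can only leave an uncommitted or torn tail, which replay ignores. In this picture, a checkpoint is simply an extra commit during a long backup.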
Crypto Keys
- client-side encryption of metadata and data
- separate keys for separate concerns
- passphrase: PBKDF2 with 100k rounds
- key modes:
  - none
  - repokey (replaces the passphrase-only mode)
  - passphrase-protected keyfile
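The passphrase and "separate keys" bullets can be sketched with Python's hashlib: PBKDF2-HMAC-SHA256 with 100,000 rounds stretches the passphrase into a key that would protect the stored key material, while the key material itself is a set of independent random keys. The key names and sizes below are illustrative assumptions, not Borg's key file format.

```python
import hashlib
import os

ROUNDS = 100_000  # the "100k rounds" from the slide


def derive_wrapping_key(passphrase: str, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256 turns a low-entropy passphrase into a 32-byte key
    # that would be used to encrypt the stored key material (keyfile/repokey).
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, ROUNDS, dklen=32)


def new_key_material() -> dict:
    # Independent random keys, one per concern (names are illustrative).
    return {
        "enc_key": os.urandom(32),       # confidentiality: cipher key
        "enc_hmac_key": os.urandom(32),  # authenticity: MAC over ciphertexts
        "id_key": os.urandom(32),        # chunk ids: id = MAC(id_key, chunk)
    }


salt = os.urandom(32)
wrapping_key = derive_wrapping_key("correct horse battery staple", salt)
keys = new_key_material()
```

Keeping the keys separate means, for example, that the key producing chunk ids is never also used as a cipher key.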
Crypto Cipher/MAC
- AEAD, Encrypt-then-MAC:
  - AES256-GCM / GHASH (exp.)
  - AES256-CTR + HMAC-SHAxxx
- counter / IV is deterministic and never repeats
- uses OpenSSL
  - Intel/AMD: AES-NI, PCLMULQDQ
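The Encrypt-then-MAC structure with a deterministic, never-repeating counter can be sketched as follows. Python's standard library has no AES, so this sketch swaps in a TOY keystream (SHA-256 over key and counter) in place of AES-256-CTR; it only illustrates the structure (counter handling, MAC over IV + ciphertext, verify before decrypt) and must not be used to protect real data. The blob layout (tag, 8-byte IV, ciphertext) is an assumption.

```python
import hashlib
import hmac
import struct

BLOCK = 32  # keystream bytes per counter value in this toy construction


def _keystream(key: bytes, iv: int, length: int) -> bytes:
    # TOY stand-in for AES-256-CTR: SHA-256(key || counter), counter = iv, iv+1, ...
    out = bytearray()
    counter = iv
    while len(out) < length:
        out += hashlib.sha256(key + struct.pack("<Q", counter)).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


class EncryptThenMAC:
    def __init__(self, enc_key: bytes, mac_key: bytes):
        self.enc_key, self.mac_key = enc_key, mac_key
        self.next_iv = 0  # deterministic counter, never reused under this key

    def encrypt(self, plaintext: bytes) -> bytes:
        iv = self.next_iv
        # Advance past every counter value this message consumes (at least one).
        self.next_iv += max(1, -(-len(plaintext) // BLOCK))
        header = struct.pack("<Q", iv)
        ciphertext = _xor(plaintext, _keystream(self.enc_key, iv, len(plaintext)))
        # Encrypt-then-MAC: authenticate the IV together with the ciphertext.
        tag = hmac.new(self.mac_key, header + ciphertext, hashlib.sha256).digest()
        return tag + header + ciphertext

    def decrypt(self, blob: bytes) -> bytes:
        tag, header, ciphertext = blob[:32], blob[32:40], blob[40:]
        expected = hmac.new(self.mac_key, header + ciphertext, hashlib.sha256).digest()
        # Verify first (constant-time compare), decrypt only authentic data.
        if not hmac.compare_digest(tag, expected):
            raise ValueError("MAC mismatch: data corrupted or tampered with")
        (iv,) = struct.unpack("<Q", header)
        return _xor(ciphertext, _keystream(self.enc_key, iv, len(ciphertext)))
```

Two encryptions of the same plaintext produce different blobs because the counter moves on, and any flipped bit is caught by the MAC check before decryption.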
Compression
- none
  - no compression, 1:1 pass-through, no CPU usage
- lz4
  - low compression, super fast
  - sometimes faster than without compression
- zlib
  - medium compression, medium speed, level 0..9
- lzma
  - high compression, slow, level 0..9
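The trade-off between the methods can be tried with Python's standard zlib and lzma modules. This is a minimal sketch, assuming the caller remembers which method was used per chunk; lz4 is left out because it needs a third-party package.

```python
import lzma
import zlib


def compress(data: bytes, method: str = "none", level: int = 6) -> bytes:
    if method == "none":
        return data                               # 1:1 pass-through, no CPU work
    if method == "zlib":
        return zlib.compress(data, level)         # level 0..9
    if method == "lzma":
        return lzma.compress(data, preset=level)  # preset 0..9
    raise ValueError(f"unsupported method: {method!r}")


def decompress(data: bytes, method: str) -> bytes:
    if method == "none":
        return data
    if method == "zlib":
        return zlib.decompress(data)
    if method == "lzma":
        return lzma.decompress(data)
    raise ValueError(f"unsupported method: {method!r}")


sample = b"borg backup " * 10_000
for method in ("none", "zlib", "lzma"):
    print(method, len(compress(sample, method)))
```

A real tool has to store a small type marker with each compressed chunk so the reader can pick the matching decompressor.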
Deduplication (1)
No problem with:
- VM images (sparse file support)
- (physical) disk images
- renamed huge directories/trees
- deduplication within a data set
- historical deduplication
- deduplication between different machines
Deduplication (2)
- Content-defined chunking:
  - "buzhash" rolling hash
  - cut data when the hash has a specific bit pattern,
    yielding chunks with a target size of 2^n bytes
    (n and the other chunker params are now configurable)
  - seeded, to avoid fingerprinting of chunk lengths
- Store chunks in the store under their id:
  - id = HASH(chunk) (no encryption)
  - id = MAC(mac_key, chunk) (encrypted repo)
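Content-defined chunking with a seeded buzhash can be sketched in pure Python. The window size (48 bytes), mask bits (12, so a ~4 KiB target beyond the minimum), min/max chunk sizes and the random base table are illustrative assumptions, not Borg's defaults; Borg's real chunker is written in C. The ids follow the two formulas above.

```python
import hashlib
import hmac
import random

WINDOW = 48  # rolling-window width in bytes (illustrative; must be <= min_size)


def rotl32(x: int, r: int) -> int:
    r %= 32
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF


def make_table(seed: int) -> list:
    # 256 pseudo-random 32-bit constants, XORed with a secret seed so that
    # chunk boundaries (and thus chunk lengths) differ per repository.
    base = random.Random(0)
    return [base.getrandbits(32) ^ seed for _ in range(256)]


def chunkify(data: bytes, seed: int = 0, mask_bits: int = 12,
             min_size: int = 1024, max_size: int = 65536) -> list:
    table = make_table(seed)
    mask = (1 << mask_bits) - 1  # cut when the low mask_bits bits are all zero
    chunks, start = [], 0
    while start < len(data):
        end = min(start + max_size, len(data))
        cut = end  # fallback: max_size reached or end of data
        i = start + min_size
        if i < end:
            h = 0
            for b in data[i - WINDOW:i]:  # hash of the first full window
                h = rotl32(h, 1) ^ table[b]
            while i < end:
                if h & mask == 0:
                    cut = i
                    break
                # Roll the window one byte: remove data[i - WINDOW], add data[i].
                h = rotl32(h, 1) ^ rotl32(table[data[i - WINDOW]], WINDOW) ^ table[data[i]]
                i += 1
        chunks.append(data[start:cut])
        start = cut
    return chunks


def chunk_id(chunk: bytes, mac_key=None) -> bytes:
    if mac_key is None:
        return hashlib.sha256(chunk).digest()                 # id = HASH(chunk)
    return hmac.new(mac_key, chunk, hashlib.sha256).digest()  # id = MAC(mac_key, chunk)
```

Because boundaries depend only on the bytes inside the window, inserting data near the start of a file changes only the first chunk or so; later chunks get the same ids and are stored only once.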
Borg, the present
- Borg Backup is a fork of Attic:
  - currently tracking Attic development
  - plus a lot of conservative PRs (stuff from the attic/merge branch "merge"):
    bug and scalability fixes
  - plus a lot of new stuff in the "experimental" branch ("exp.")
  - not compatible with Attic
Borg, what's different?
- developed by "The Borg Collective"
- more open development
- new developers are welcome!
- quicker development
- redesign where needed
- changes, new features
- incompatible changes with good reason
- thus: less "sta(b)le"
Borg, the future
- scalability improvements
- speed improvements
- architectural changes
- pull backups? backup-only mode?
- better logging / exception handling
- more backends? http / ftp / aws / google / ...
- other platforms / architectures
- BorgWeb GUI (for daily user needs)
- <you name it>
Borg - you can be assimilated!
- test scalability / reliability / security
- be careful!
- file bugs
- file feature requests
- improve docs
- contribute code
- spread the word
- create dist packages
- care for misc. platforms
Borg Backup - Links
borgbackup.github.io
#borgbackup on chat.freenode.net
Questions / Feedback?
- Just grab me at the conference!
- Thomas J Waldmann @ twitter
Borg Internals & Ideas
Multithreading
- GIL? No (big) problem, just release the GIL:
  - I/O: python file read, write/fsync
  - C: reader / chunker
  - C: id hashing
  - C: compression
  - C: encryption
- CPU usage (i5, 2 cores + HT):
  - no MT: 30-80%
  - with MT: 300%
- but: thread safety, race conditions!
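The "release the GIL" idea can be sketched at the Python level: in CPython, zlib and hashlib release the GIL while processing large buffers, so hashing and compressing chunks in a thread pool can use several cores. The worker count and the lock-protected dict standing in for a chunk index are assumptions of this sketch, not Borg's code; the lock addresses the slide's warning about race conditions on shared state.

```python
import hashlib
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk: bytes):
    # zlib and hashlib release the GIL on large buffers, so several threads
    # can hash/compress at the same time on different cores.
    return hashlib.sha256(chunk).digest(), zlib.compress(chunk, 6)


def backup(chunks, workers: int = 4):
    index = {}               # shared state: chunk id -> compressed chunk
    lock = threading.Lock()

    def store(chunk):
        chunk_id, compressed = process_chunk(chunk)  # runs concurrently
        with lock:           # serialize updates of the shared index
            index.setdefault(chunk_id, compressed)
        return chunk_id

    with ThreadPoolExecutor(max_workers=workers) as pool:
        ids = list(pool.map(store, chunks))  # results come back in input order
    return ids, index
```

Only the CPU-heavy calls run in parallel; the short critical section around the shared index keeps concurrent updates consistent.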
Hashes / MACs
- slow:
  - sha256 (and hmac-sha256)
  - crc32
- faster:
  - poly1305-AES
  - siphash (only 64-bit result)
  - blake2
  - xxhash (not cryptographic)
  - sha512-256
  - crc32c (Intel CPU instruction)
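A rough way to compare some of these candidates on your own machine is a small throughput check using only the Python standard library. The key and buffer sizes are illustrative; siphash, xxhash, poly1305 and crc32c are not exposed by the standard library and are therefore left out. Results depend heavily on CPU and build.

```python
import hashlib
import hmac
import time
import zlib

KEY = b"k" * 32  # illustrative MAC key

CANDIDATES = {
    "sha256": lambda d: hashlib.sha256(d).digest(),
    "hmac-sha256": lambda d: hmac.new(KEY, d, hashlib.sha256).digest(),
    # BLAKE2b accepts a key directly, giving a MAC in a single hashing pass.
    "blake2b-256 keyed": lambda d: hashlib.blake2b(d, digest_size=32, key=KEY).digest(),
    # CRC32 only detects accidental corruption; it is not a cryptographic hash.
    "crc32": lambda d: zlib.crc32(d).to_bytes(4, "big"),
}


def throughput_mb_s(fn, data: bytes, repeat: int = 10) -> float:
    """Rough MB/s of fn over data; numbers depend on CPU and build."""
    start = time.perf_counter()
    for _ in range(repeat):
        fn(data)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(data) * repeat / elapsed / 1e6


data = bytes(16 * 2**20)
for name, fn in CANDIDATES.items():
    print(f"{name:20} {throughput_mb_s(fn, data):8.0f} MB/s")
```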
Crypto
- authenticated encryption with associated data
- slow:
  - aes-ctr + hmac-sha256 (= 2 passes)
  - openssl + py stdlib
- faster (TODO / exp. branch):
  - aes-gcm (1 pass, Intel + AMD CPU instructions)
  - openssl
  - but: rare aes-gcm issue with weak keys
- nonce / IV / counter generation / management
  - session keys (per worker thread per backup)
  - use libsodium? No aes256-ctr support there.
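One way to read "session keys per worker thread per backup" is sketched below, under assumptions: a hypothetical scheme derives a fresh key for each (backup session, worker) pair from a master key with an HMAC-based, HKDF-like step, and each key then gets its own counter starting at zero. This is an illustration of the idea, not Borg's design.

```python
import hashlib
import hmac
import itertools
import os


class SessionKeys:
    """Hypothetical scheme: a fresh key per (backup session, worker thread)."""

    def __init__(self, master_key: bytes):
        self.master_key = master_key
        self.session_id = os.urandom(16)  # new random value for every backup run

    def worker_key(self, worker: int) -> bytes:
        # HMAC-based derivation (HKDF-like): distinct workers get distinct keys.
        info = self.session_id + worker.to_bytes(4, "big")
        return hmac.new(self.master_key, info, hashlib.sha256).digest()


class NonceCounter:
    """Per-key counter: starts at 0 and only increases, so no nonce repeats."""

    def __init__(self):
        self._counter = itertools.count()

    def next_nonce(self) -> bytes:
        return next(self._counter).to_bytes(12, "big")  # 96-bit, GCM-sized nonce
```

Since every worker encrypts under its own key, the workers never need to share or lock a counter, and no counter state has to survive between backups; the reader only needs the session id stored with the data to re-derive the key.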
RAM consumption (1)
- high RAM usage to achieve high speed
  (chunker bitmask default N=16, i.e. 64 KiB chunks)
- repo index (id -> storage segment, offset)
- chunks cache (id -> refcnt, size, csize)
- files cache (H(path) -> mtime, size, inode, chunks)
- estimates (in bytes):
  chunk_count ~= total_file_size / 2^N
  repo_index = chunk_count * 40
  chunks_cache = chunk_count * 44
  files_cache = total_file_count * 240 + chunk_count * 80
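The estimate formulas can be turned into a small calculator. This sketch simply codes the four formulas above (results in bytes). With 1 Mi files and 1 TiB of data it reproduces the 2.8 GiB figure, and it can also be evaluated for a larger N such as 20.

```python
def ram_estimate(total_file_count: int, total_file_size: int, n: int = 16) -> dict:
    """RAM estimates in bytes, using the slide's formulas (n = chunker mask bits)."""
    chunk_count = total_file_size / 2 ** n
    est = {
        "repo_index": chunk_count * 40,
        "chunks_cache": chunk_count * 44,
        "files_cache": total_file_count * 240 + chunk_count * 80,
    }
    est["total"] = sum(est.values())
    return est


est16 = ram_estimate(2**20, 2**40)        # 1 Mi files, 1 TiB, N=16
print(f"{est16['total'] / 2**30:.2f} GiB")  # 2.80 GiB
est20 = ram_estimate(2**20, 2**40, n=20)  # same data, N=20
print(f"{est20['total'] / 2**30:.2f} GiB")  # 0.39 GiB
```

All chunk-dependent terms shrink by 2^(20-16) = 16, but the 240 bytes per file in the files cache do not, which is why a larger N saves "up to" 1/16 rather than exactly 1/16.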
RAM consumption (2)
- 1 Mi files, 1 TiB data -> 2.8 GiB RAM
- use custom chunker params for little RAM + large storage:
  N=20 -> down to 1/16 of the RAM consumption
- maybe switch off the files cache
- use multiple repos, purge often
- use smaller ids (128 instead of 256 bits)
- help fix this:
  - use a different data structure than a hash table?
  - mmap'ed file?
  - provide on-disk fallback code?
1.0 release - soon!
- drop Python 3.2 / 3.3 legacy support
  - only support modern 3.4 and 3.5
  - less code: easier, simpler, more powerful
  - fewer bugs (3.2 and 3.3 were not that great)
- drop "deprecated" borgbackup/attic stuff
- make some minor, but incompatible changes
  - other command line syntax
  - other environment variable semantics
  - other defaults (e.g. bigger chunk size)
  - likely no major incompatible changes this time
- 1.x better reflects the state of the project than 0.x
- can still work on older systems:
  - "all inclusive" binaries
  - Python 3.5: "make altinstall" / backport
beyond 1.0
- better crypto:
  - faster (e.g. aes-gcm, faster ID hash)
  - more flexible (not hardcoded)
  - safer (random session keys?)
  - libsodium?
- multithreading?
- UI work? click? console + ncurses?
- storage api? storage backends?
- ...
Borg - Demo
I'll show borgbackup using the "all inclusive" binary.
You could also use:
- source code checkout from GitHub
- release packages from PyPI
- Linux / BSD / ... packages
install
# download the binary and the gpg signature.
...
# verify the gpg signature
gpg --verify borg-linux64.asc borg-linux64
# install / fix permissions
cp borg-linux64 /usr/local/bin/borg
chmod 755 /usr/local/bin/borg
chown root:root /usr/local/bin/borg
init / create
# initialize a repository:
borg init /tmp/borg
# create a "first" archive inside this repo (verbose):
borg create --stats /tmp/borg::first ~/Desktop
# create a "second" archive (less verbose):
borg create /tmp/borg::second ~/Desktop
# even more verbose:
borg create -v --stats /tmp/borg::third ~/Desktop
list / extract / check
# list repo / archive contents:
borg list /tmp/borg
borg list /tmp/borg::first
# extract ("restore") from an archive to cwd:
mkdir test ; cd test
borg extract /tmp/borg::third
# simulate extraction (good test):
borg extract -v --dry-run /tmp/borg::third
# check consistency of repo:
borg check /tmp/borg
info / delete / help
# info about archive:
borg info /tmp/borg::first
# delete archive:
borg delete /tmp/borg::first
# delete repo:
borg delete /tmp/borg
encrypted repo
# options, options, options, ...
borg init --help
# create an encrypted repo
# (pw-protected key file stored locally):
borg init -e keyfile /tmp/borg-enc
# create an encrypted repo
# (pw-protected key stored in the repo):
borg init -e repokey /tmp/borg-enc
# ... (same as before, but you need to give passphrase)
remote via ssh
# connect to remote borg via ssh:
# remote borg needs to be compatible with local
borg init ssh://user@host:22/mnt/backup/borg
borg create ssh://user@host:22/mnt/backup/borg::first ~
# also possible: using sshfs or other locally mounted
# network filesystems, but be careful: worse performance
BorgBackup Talk (updated)
By Thomas Waldmann