Borg Backup
(a fork of Attic)
"The holy grail of backup software"
Thomas Waldmann @ GPN 2015
Feature Set (1)
- simple & fast
- deduplication
- compression
- authenticated encryption
- easy pruning of old backups
- simple backend (k/v, fs, via ssh)
Feature Set (2)
- FOSS (BSD license)
- good docs
- good platform / arch support
- xattr / acl support
- FUSE support ("mount a backup")
Code
- 91% Python3 + Cython (high-level code, glue code)
- 9% C (performance critical stuff)
- only ~6000 LOC total
- few dependencies
- unit tests, CI
Security
- Signatures / Authentication: no undetected corruption/tampering
- Encryption / Confidentiality: only you have access to your data
- FOSS in Python: review possible, no buffer overflows
Safety
- Robustness (by append-only design, transactions)
- Checkpoints every 5 minutes (between files)
- msgpack with "limited" Unpacker (no memory DoS)
Crypto Keys
- client-side meta+data encryption
- separate keys for separate concerns
- passphrase: PBKDF2, 100k rounds
- Keys:
  - none
  - passphrase-only
  - passphrase protected keyfile
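The passphrase step can be sketched with the Python standard library. This is a minimal illustration, not Borg's actual key code: the hash function (SHA-256), the salt length, the 32-byte key size, and the helper name `derive_key` are assumptions; only PBKDF2 and the 100k round count come from the slide.

```python
import hashlib
import os

ROUNDS = 100000  # "100k rounds" from the slide

def derive_key(passphrase, salt, rounds=ROUNDS):
    """Stretch a passphrase into a 32-byte key with PBKDF2-HMAC-SHA256.

    SHA-256 and the 32-byte output length are assumptions made for this sketch.
    """
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, rounds, dklen=32)

# A random salt makes the same passphrase yield different keys per repository.
salt = os.urandom(32)
key = derive_key("correct horse battery staple", salt)
print(len(key), "byte key derived")
```

The high round count is what makes guessing passphrases expensive: every candidate passphrase costs an attacker the same 100k HMAC iterations.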
Crypto Cipher/MAC
- AEAD, Encrypt-then-MAC
  - AES256-GCM / GHASH (*)
  - AES256-CTR + HMAC-SHAxxx
- Counter / IV deterministic, never repeats
- uses OpenSSL
  - Intel/AMD: AES-NI, PCLMULQDQ
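The Encrypt-then-MAC construction with a deterministic counter can be sketched as follows. This is an illustration of the structure only: the standard library has no AES, so a toy keystream built from SHA-256 stands in for AES256-CTR and must not be used for real data. The function names, the 8-byte counter header, and HMAC-SHA256 as the MAC are assumptions of this sketch.

```python
import hashlib
import hmac
import struct

BLOCK = 32  # keystream block size of the toy cipher (SHA-256 output length)

def keystream_xor(enc_key, counter, data):
    """Toy CTR-mode stand-in (NOT AES): XOR data with SHA-256(key || counter) blocks."""
    out = bytearray()
    for block_no, start in enumerate(range(0, len(data), BLOCK)):
        pad = hashlib.sha256(enc_key + struct.pack(">Q", counter + block_no)).digest()
        out.extend(b ^ p for b, p in zip(data[start:start + BLOCK], pad))
    return bytes(out)

def seal(enc_key, mac_key, counter, plaintext):
    """Encrypt first, then MAC the counter header plus the ciphertext."""
    header = struct.pack(">Q", counter)
    ciphertext = keystream_xor(enc_key, counter, plaintext)
    tag = hmac.new(mac_key, header + ciphertext, hashlib.sha256).digest()
    return tag + header + ciphertext

def open_sealed(enc_key, mac_key, blob):
    """Verify the MAC before touching the ciphertext, then decrypt."""
    tag, header, ciphertext = blob[:32], blob[32:40], blob[40:]
    expected = hmac.new(mac_key, header + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC mismatch: data corrupted or tampered with")
    (counter,) = struct.unpack(">Q", header)
    return keystream_xor(enc_key, counter, ciphertext)

def blocks_used(length):
    """Number of counter values consumed by a message of this length."""
    return max(1, (length + BLOCK - 1) // BLOCK)

enc_key, mac_key = b"E" * 32, b"M" * 32   # separate keys for separate concerns
counter = 0
message = b"hello borg"
blob = seal(enc_key, mac_key, counter, message)
counter += blocks_used(len(message))      # advance so no counter value repeats
print(open_sealed(enc_key, mac_key, blob))
```

Advancing the counter by the number of blocks consumed is one simple way to keep counter values from ever repeating under the same key, which is the property the slide asks for.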
Compression
- Python stdlib:
  - zlib (medium fast)
  - lzma (slow, high compression) (*)
- blosc library: (*)
  - multithreaded, highly optimized
  - "faster than memcpy"
  - lz4 (superfast, reasonable compression)
  - lz4hc (very fast, "high compression")
  - zlib (faster than the implementation from stdlib)
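The speed versus ratio trade-off of the two stdlib compressors can be tried out with a few lines of Python. This is only a rough sketch: the sample data and the helper name `compare` are made up for illustration, the blosc codecs are left out because they need a third-party package, and timings will differ between machines.

```python
import lzma
import time
import zlib

def compare(data):
    """Compress data with zlib and lzma; return {name: (compressed_size, seconds)}."""
    results = {}
    for name, compress, decompress in [
        ("zlib", zlib.compress, zlib.decompress),
        ("lzma", lzma.compress, lzma.decompress),
    ]:
        started = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - started
        assert decompress(packed) == data  # round trip must be lossless
        results[name] = (len(packed), elapsed)
    return results

# Highly repetitive sample data (an assumption; real backups compress less well).
sample = b"Borg Backup deduplicates, compresses and encrypts. " * 20000
stats = compare(sample)
for name, (size, seconds) in stats.items():
    print(name, len(sample), "->", size, "bytes in", round(seconds, 3), "s")
```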
Deduplication (1)
- No problem with:
  - VM images (sparse file support)
  - disk images
  - renamed huge directories/trees
  - inner deduplication of data set
  - historical deduplication
  - deduplication between different machines
Deduplication (2)
- Content defined chunking:
  - "buzhash" rolling hash
  - cut data when hash has specific bit pattern, yields chunks with 2^n bytes target size
  - seeded, to avoid fingerprinting chunk lengths
- Store chunks under id into store:
  - id = HASH(chunk)
  - id = MAC(mac_key, chunk)
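The chunking and id scheme above can be sketched in pure Python. This is a simplified illustration, not Borg's C chunker: the window size, the 12-bit cut mask (roughly 4 KiB chunks), the way the seed fills the table, and the use of SHA-256 / HMAC-SHA256 for HASH and MAC are all assumptions of this sketch.

```python
import hashlib
import hmac
import random

WINDOW = 48              # rolling window in bytes (assumed value)
MASK = (1 << 12) - 1     # cut when the low 12 bits are zero: ~2^12 byte chunks

def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def chunkify(data, seed=0):
    """Split data at content-defined cut points using a seeded buzhash."""
    rng = random.Random(seed)
    table = [rng.getrandbits(32) for _ in range(256)]  # the seed changes the cut points
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = rotl32(h, 1) ^ table[byte]                  # roll the new byte in
        if i - start >= WINDOW:
            h ^= rotl32(table[data[i - WINDOW]], WINDOW)  # roll the oldest byte out
        if i - start + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def chunk_id(chunk, mac_key=None):
    """id = HASH(chunk) without a key, id = MAC(mac_key, chunk) with one."""
    if mac_key is None:
        return hashlib.sha256(chunk).digest()
    return hmac.new(mac_key, chunk, hashlib.sha256).digest()

rng = random.Random(1)
original = bytes(rng.getrandbits(8) for _ in range(200000))
shifted = b"0123456789" + original   # same content, moved by 10 bytes

ids_a = [chunk_id(c) for c in chunkify(original, seed=42)]
ids_b = [chunk_id(c) for c in chunkify(shifted, seed=42)]
shared = len(set(ids_a) & set(ids_b))
print(len(ids_a), "chunks,", shared, "shared after shifting the data")
```

Because cut points depend only on the bytes inside the rolling window, inserting data at the front changes the first chunk but the later chunks line up again, so their ids already exist in the store.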
Borg, the present
- Borg Backup is a fork of Attic:
  - currently tracking attic dev.
  - plus a lot of conservative PRs (stuff from attic/merge branch "merge")
  - (*) plus a lot of new stuff in "experimental" branch
  - not compatible with Attic
Borg, what's different?
- developed by "The Borg Collective"
- more open development
- new developers are welcome!
- quicker development
- redesign where needed
- changes, new features
- incompatible changes with good reason
- thus: less "sta(b)le"
Borg, the future
- scalability improvements
- speed improvements
- architectural changes
- pull backups? backup-only mode?
- better logging / exception handling
- more backends? http / ftp / aws / google / ...
- other platforms / architectures
- GUI? (needs a developer)
- <you name it>
Borg - you can be assimilated!
- test scalability / reliability / security
- be careful!
- file bugs
- file feature requests
- improve docs
- contribute code
- spread the word
- later: create dist packages
Borg Backup - Links
borgbackup.github.io
##borgbackup on chat.freenode.net
Questions / Feedback?
- Just grab me, I am here every day!
- Thomas J Waldmann @ twitter
Borg Demo ->
Borg Internals & Ideas v
Multithreading
- GIL? No (big) problem, just release the GIL:
  - I/O: python file read, write/fsync (ok)
  - C: reader / chunker (TODO)
  - C: id hashing (ok)
  - C: compression (ok)
  - C: encryption (TODO)
- CPU usage (i5, 2 Cores + HT)
  - no MT: 30-80%
  - with MT: 300%
- but: thread safety, race conditions!
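Why the GIL is not a big problem for this workload can be shown with stdlib calls that run in C. CPython's hashlib documents that it releases the GIL while hashing inputs larger than 2047 bytes, and zlib compression also runs outside the GIL, so worker threads can use several cores. The sketch below is an illustration only; the chunk sizes, the worker count, and the helper name `process_chunk` are assumptions and have nothing to do with Borg's real threading code.

```python
import hashlib
import zlib
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    """Hash and compress one chunk; both calls do their heavy work in C."""
    return hashlib.sha256(chunk).digest(), zlib.compress(chunk)

# 16 dummy chunks of 1 MiB each (made-up test data).
chunks = [bytes([i]) * (1 << 20) for i in range(16)]

# Each worker only touches its own chunk, so no locking is needed here.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_chunk, chunks))

print(len(results), "chunks hashed and compressed")
```

Keeping each work item independent, as `pool.map` does here, sidesteps the thread safety and race condition concerns; shared state such as caches or the repository index would need explicit synchronization.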
Hashes / MACs
- slow:
  - sha256 (and hmac-sha256)
  - crc32
- faster:
  - poly1305-AES
  - siphash (only 64bit result)
  - blake2
  - xxhash (not cryptographic)
  - sha512-256
  - crc32c (intel cpu instr.)
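A quick way to compare some of these candidates is a small throughput benchmark with what the Python standard library offers. This is a rough sketch under several assumptions: it needs a Python version that ships `hashlib.blake2b`, it models sha512-256 as a plain truncation of SHA-512, and it skips poly1305-AES, siphash, xxhash, and crc32c because they need third-party code. Results depend heavily on the CPU and on how the libraries were built, so the ordering may not match the slide.

```python
import hashlib
import time
import zlib

data = bytes(8 * 1024 * 1024)  # 8 MiB of zero bytes as dummy input

candidates = {
    "sha256": lambda d: hashlib.sha256(d).digest(),
    # assumption: SHA-512 truncated to 256 bits stands in for sha512-256
    "sha512-256": lambda d: hashlib.sha512(d).digest()[:32],
    "blake2b-256": lambda d: hashlib.blake2b(d, digest_size=32).digest(),
    "crc32": lambda d: zlib.crc32(d).to_bytes(4, "big"),
}

digests = {}
for name, func in candidates.items():
    started = time.perf_counter()
    digests[name] = func(data)
    elapsed = max(time.perf_counter() - started, 1e-9)  # avoid division by zero
    print(name, round(len(data) / elapsed / 1e6), "MB/s")
```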
Crypto
- authenticated encryption with associated data
- slow:
  - aes-ctr + hmac-sha256 (= 2 passes)
  - openssl + py stdlib
- faster:
  - aes-gcm (1 pass, intel + amd cpu instr.)
  - openssl
  - but: rare aes-gcm issue with weak keys
- Nonce / IV / Counter generation / management
- session keys (per worker thread per backup)
- passphrase mode -> keyfile mode, kf inside repo
RAM consumption (1)
- high RAM usage to achieve high speed
- repo index (id -> storage segment, offset)
- chunks cache (id -> refcnt, size, csize)
- files cache (H(path) -> mtime, size, inode, chunks)
- chunk_count ~= total_file_size / 65536
  repo_index = chunk_count * 40
  chunks_cache = chunk_count * 44
  files_cache = total_file_count * 240 + chunk_count * 80
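The estimate formulas can be turned into a small calculator. This is a direct transcription of the formulas on the slide; the function name `estimate_ram` is made up, and treating the results as bytes is an assumption (it is the reading that gives plausible totals).

```python
def estimate_ram(total_file_size, total_file_count, chunk_size=65536):
    """Estimate RAM needs (assumed to be in bytes) from the slide's formulas."""
    chunk_count = total_file_size // chunk_size
    repo_index = chunk_count * 40
    chunks_cache = chunk_count * 44
    files_cache = total_file_count * 240 + chunk_count * 80
    return {
        "chunk_count": chunk_count,
        "repo_index": repo_index,
        "chunks_cache": chunks_cache,
        "files_cache": files_cache,
        "total": repo_index + chunks_cache + files_cache,
    }

# Example: 1 Mi files holding 1 TiB of data in total.
estimate = estimate_ram(total_file_size=2**40, total_file_count=2**20)
for name, value in estimate.items():
    print(name, value)
print("total GiB:", round(estimate["total"] / 2**30, 1))
```

For this example the files cache is the largest of the three structures, which is why switching it off is the most effective way to save RAM.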
RAM consumption (2)
- 1 Mi files, 1 TiB data -> 2.8 GiB RAM
- be careful with little RAM + large storage (NAS)
- have swap or switch off the files cache
- use multiple repos, purge often
- soon: use larger chunk size
- use smaller ids (128 instead of 256 bits)
- help fix this:
  - use different data structure than hash table?
  - mmapped file?
  - provide on-disk fallback code?
Borg - Demo
I'll show a developer installation / recent code.
In the future:
Release packages on PyPI
Linux / BSD / ... packages
Installation Preps
# Debian / Ubuntu
# Python 3.x (>= 3.2) + Headers, Py Package Installer
apt-get install python3.4-dev python3.4 python3-pip
# we need OpenSSL + Headers for Crypto
apt-get install libssl-dev openssl
# ACL support Headers + Library
apt-get install libacl1-dev libacl1
# if you do not have gcc / make / etc. yet
apt-get install build-essential
# optional: lowlevel FUSE py binding - to mount backup archives
apt-get install python3-llfuse fuse
# optional: for unit testing
apt-get install fakeroot
system wide install
# A) later: system-wide install with pip, latest release:
sudo pip install borgbackup
# note: maybe you have to use pip3 to get the python3 pip
dev install from git
# B) isolated install, latest borg git repo code:
git clone https://github.com/borgbackup/borg.git
apt-get install python-virtualenv
virtualenv --python=python3 borg-env
source borg-env/bin/activate # always before using!
# install borg + dependencies into virtualenv
pip install cython # compile .pyx -> .c
pip install tox # optional, for running unit tests
cd borg
pip install -e .
# check your install
fakeroot -u tox
init / create
# initialize a repository:
borg init /tmp/borg
# create a "first" archive inside this repo (verbose):
borg create --progress --stats /tmp/borg::first ~/Desktop
# create a "second" archive (less verbose):
borg create /tmp/borg::second ~/Desktop
# even more verbose:
borg create -v --stats /tmp/borg::third ~/Desktop
list / extract / check
# list repo / archive contents:
borg list /tmp/borg
borg list /tmp/borg::first
# extract ("restore") from an archive to cwd:
mkdir test ; cd test
borg extract /tmp/borg::third
# simulate extraction (good test):
borg extract -v --dry-run /tmp/borg::third
# check consistency of repo:
borg check /tmp/borg
info / delete / help
# info about archive:
borg info /tmp/borg::first
# delete archive:
borg delete /tmp/borg::first
# delete repo:
borg delete /tmp/borg
crypto/compression
# options, options, options, ...
borg init --help
# create an encrypted repo:
borg init -e keyfile /tmp/borg-enc
# (*) later: compression options
borg init ...
# ... (same as before, but you need to give passphrase)
remote via ssh
# connect to remote borg via ssh:
# remote borg needs to be compatible with local
borg init ssh://user@host:22/mnt/backup/borg
borg create ssh://user@host:22/mnt/backup/borg::first ~
# also possible: using sshfs or other locally mounted
# network filesystems, but be careful: locks, perf.