Borg Backup
(a fork of Attic)
"The holy grail of backup software"
Thomas Waldmann @ EuroPython 2015
Feature Set (1)
- simple & fast
- deduplication
- compression
- authenticated encryption
- easy pruning of old backups
- simple backend (k/v, fs, via ssh)
Feature Set (2)
- FOSS (BSD license)
- good docs
- good platform / arch support
- xattr / acl support
- FUSE support ("mount a backup")
Code
- 91% Python3 + Cython (high-level code, glue code)
- 9% C (performance critical stuff)
- only ~6000 LOC total
- few dependencies
- unit tests, CI
Security
- Signatures / Authentication: no undetected corruption/tampering
- Encryption / Confidentiality: only you have access to your data
- FOSS in Python: review possible, no buffer overflows
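The "no undetected corruption/tampering" point can be illustrated with a MAC check on read. This is a minimal sketch using Python's stdlib, not Borg's code: HMAC-SHA256, the tag-before-data layout, and the names `seal` / `open_sealed` are assumptions made for illustration.

```python
import hashlib
import hmac
import os

mac_key = os.urandom(32)  # hypothetical secret key, known only to the client


def seal(data: bytes) -> bytes:
    """Prefix the data with its 32-byte MAC tag (assumed layout)."""
    tag = hmac.new(mac_key, data, hashlib.sha256).digest()
    return tag + data


def open_sealed(blob: bytes) -> bytes:
    """Return the data if the tag verifies, otherwise raise ValueError."""
    tag, data = blob[:32], blob[32:]
    expected = hmac.new(mac_key, data, hashlib.sha256).digest()
    # compare_digest compares in constant time to avoid timing side channels
    if not hmac.compare_digest(tag, expected):
        raise ValueError("data corrupted or tampered with")
    return data


blob = seal(b"some backup data")
assert open_sealed(blob) == b"some backup data"
```

Any bit flipped in the stored blob, whether by a failing disk or by an attacker without the key, makes `open_sealed` raise instead of returning wrong data.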
Safety
- Robustness (by append-only design, transactions)
- Checkpoints every 5 minutes (between files)
- msgpack with "limited" Unpacker (no memory DoS)
Crypto Keys
- client-side meta+data encryption
- separate keys for separate concerns
- passphrase: PBKDF2, 100k rounds
- Keys:
	- none
	- repokey (replaces: passphrase-only)
	- passphrase-protected keyfile
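The passphrase step can be sketched with Python's stdlib. Only PBKDF2 and the 100k rounds come from the slide; the PRF (HMAC-SHA256), the salt length, the 32-byte result, and the name `derive_key` are assumptions for illustration.

```python
import hashlib
import os

ROUNDS = 100000  # "100k rounds" from the slide


def derive_key(passphrase: str, salt: bytes, length: int = 32) -> bytes:
    """Derive a key from a passphrase with PBKDF2.

    Assumption: HMAC-SHA256 as PRF and a 32-byte result.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, ROUNDS, dklen=length
    )


salt = os.urandom(32)  # random salt, stored next to the protected key
key = derive_key("correct horse battery staple", salt)
print(len(key), "byte key derived")
```

The high round count makes every passphrase guess expensive; the derived key would then protect the actual repository keys rather than encrypt data directly.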
 
Crypto Cipher/MAC
- AEAD, Encrypt-then-MAC:
	- AES256-GCM / GHASH (exp.)
	- AES256-CTR + HMAC-SHAxxx
- Counter / IV deterministic, never repeats
- uses OpenSSL
- Intel/AMD: AES-NI, PCLMULQDQ
Compression
- Python stdlib:
	- zlib (medium fast, level 0..9)
	- lzma (slow, high compression) (exp.)
- blosc library (exp.):
	- multithreaded, highly optimized
	- "faster than memcpy"
	- lz4 (superfast, reasonable compression)
	- lz4hc (very fast, "high compression")
	- zlib (faster than the implementation from stdlib)
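The two stdlib compressors can be compared directly. A small sketch with made-up sample data; blosc is a third-party library and is left out here, and the actual ratios depend entirely on the input.

```python
import lzma
import zlib

# Hypothetical sample data: repetitive text, which compresses well.
data = b"Borg Backup deduplicates, compresses and encrypts. " * 2000

results = {
    "zlib level 1": zlib.compress(data, 1),
    "zlib level 9": zlib.compress(data, 9),
    "lzma": lzma.compress(data),
}

for name, blob in results.items():
    print(f"{name}: {len(data)} -> {len(blob)} bytes")

# Round trip: decompression must return the original bytes.
assert zlib.decompress(results["zlib level 9"]) == data
assert lzma.decompress(results["lzma"]) == data
```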
 
Deduplication (1)
- No problem with:
	- VM images (sparse file support)
	- (physical) disk images
	- renamed huge directories/trees
	- inner deduplication of data set
	- historical deduplication
	- deduplication between different machines
 
Deduplication (2)
- Content defined chunking:
	- "buzhash" rolling hash
	- cut data when hash has specific bit pattern, yields chunks with 2^n bytes target size
	- n + other chunker params configurable now
	- seeded, to avoid fingerprinting chunk lengths
- Store chunks under id into store:
	- id = HASH(chunk)
	- id = MAC(mac_key, chunk)
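Content defined chunking and id-based storage can be sketched in pure Python. This is an illustration of the idea, not Borg's C chunker: the window size, the minimum chunk size, the way the table is derived from the seed, SHA-256 / HMAC-SHA256 for the ids, and all function names are assumptions.

```python
import hashlib
import hmac
import random

WINDOW = 48     # rolling window in bytes (assumed value)
MIN_SIZE = 512  # assumed lower bound on chunk size, must be >= WINDOW


def rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    r %= 32
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF


def make_table(seed: int) -> list:
    """Seeded table of 256 random 32-bit values, one per byte value."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(256)]


def chunkify(data: bytes, table: list, n: int = 12) -> list:
    """Cut where the lowest n bits of the rolling hash are zero."""
    mask = (1 << n) - 1
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = rotl32(h, 1) ^ table[byte]  # shift in the new byte
        if i - start >= WINDOW:         # shift out the oldest byte
            h ^= rotl32(table[data[i - WINDOW]], WINDOW)
        if i - start + 1 >= MIN_SIZE and (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])     # trailing remainder
    return chunks


def chunk_id(chunk: bytes, mac_key: bytes = None) -> bytes:
    """id = MAC(mac_key, chunk) if a key is given, else id = HASH(chunk)."""
    if mac_key is None:
        return hashlib.sha256(chunk).digest()
    return hmac.new(mac_key, chunk, hashlib.sha256).digest()


rng = random.Random(1)
original = bytes(rng.getrandbits(8) for _ in range(200000))
shifted = b"8 bytes!" + original  # same content, moved by an insertion

table = make_table(seed=1234)  # in practice the seed would be secret
store = {}
for blob in (original, shifted):
    for chunk in chunkify(blob, table):
        store.setdefault(chunk_id(chunk), chunk)  # identical chunks stored once

print(len(store), "unique chunks for", len(original) + len(shifted), "bytes")
```

Because cut points depend on the content of the last `WINDOW` bytes and not on absolute offsets, the insertion at the front only changes the first chunks; later chunks get the same ids and are stored once. A fixed-size chunker would shift every boundary and deduplicate nothing here.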
 
Borg, the present
- Borg Backup is a fork of Attic:
	- currently tracking attic dev.
	- plus a lot of conservative PRs (stuff from attic/merge branch "merge")
	- bug and scalability fixes
	- plus a lot of new stuff in "experimental" branch ("exp.")
	- not compatible with Attic
Borg, what's different?
- developed by "The Borg Collective"
- more open development
- new developers are welcome!
- quicker development
- redesign where needed
- changes, new features
- incompatible changes with good reason
- thus: less "sta(b)le"
Borg, the future
- scalability improvements
- speed improvements
- architectural changes
- pull backups? backup-only mode?
- better logging / exception handling
- more backends? http / ftp / aws / google / ...
- other platforms / architectures
- BorgWeb GUI (for daily user needs)
- <you name it>
Borg - you can be assimilated!
- test scalability / reliability / security
- be careful!
- file bugs
- file feature requests
- improve docs
- contribute code
- spread the word
- create dist packages
- care for misc. platforms
Borg Backup - Links
borgbackup.github.io
 
#borgbackup on chat.freenode.net
Questions / Feedback?
- Just grab me at the sprints!
- Thomas J Waldmann @ twitter
Borg Demo ->
Borg Internals & Ideas v
Multithreading
- GIL? No (big) problem, just release the GIL:
	- I/O: python file read, write/fsync (ok)
	- C: reader / chunker (TODO)
	- C: id hashing (ok)
	- C: compression (ok)
	- C: encryption (TODO)
- CPU usage (i5, 2 Cores + HT):
	- no MT: 30-80%
	- with MT: 300%
- but: thread safety, race conditions!
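The "just release the GIL" point can be tried with the stdlib: `hashlib` releases the GIL while hashing larger buffers, so plain threads can hash several chunks in parallel. A minimal sketch; the chunk sizes, the worker count, and the name `chunk_id` are assumptions, and this is not Borg's worker design.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical workload: 16 chunks of 1 MiB each.
chunks = [os.urandom(1 << 20) for _ in range(16)]


def chunk_id(chunk: bytes) -> bytes:
    # The C hashing code runs without the GIL for large inputs,
    # so several of these calls can use different cores at once.
    return hashlib.sha256(chunk).digest()


serial_ids = [chunk_id(c) for c in chunks]

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() keeps input order, so results line up with `chunks`.
    threaded_ids = list(pool.map(chunk_id, chunks))

assert threaded_ids == serial_ids
```

`chunk_id` touches no shared state, which sidesteps the thread safety concern for this step; anything that updates a shared index or cache would still need locking.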
Hashes / MACs
- slow:
	- sha256 (and hmac-sha256)
	- crc32
- faster:
	- poly1305-AES
	- siphash (only 64bit result)
	- blake2
	- xxhash (not cryptographic)
	- sha512-256
	- crc32c (intel cpu instr.)
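Some of these candidates are available in Python's stdlib and can be timed on your own machine. A rough sketch: the buffer is made up, "sha512-256" is read here as SHA-512 truncated to 256 bits (an assumption; the standardized SHA-512/256 uses different initial values), and poly1305, siphash, xxhash and crc32c are omitted because they need third-party packages.

```python
import hashlib
import hmac
import time
import zlib

data = bytes(1 << 24)  # 16 MiB of zero bytes as a hypothetical test buffer

candidates = {
    "sha256": lambda d: hashlib.sha256(d).digest(),
    "hmac-sha256": lambda d: hmac.new(b"k" * 32, d, hashlib.sha256).digest(),
    "sha512 truncated to 256 bits": lambda d: hashlib.sha512(d).digest()[:32],
    "blake2b, 256 bit digest": lambda d: hashlib.blake2b(d, digest_size=32).digest(),
    "crc32": lambda d: zlib.crc32(d).to_bytes(4, "big"),
}

for name, func in candidates.items():
    start = time.perf_counter()
    digest = func(data)
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(digest) * 8} bits, {elapsed * 1000:.1f} ms")
```

Timings vary with CPU and with how the Python build's hash implementations were compiled, so treat a single run as an indication only.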
 
Crypto
- authenticated encryption with associated data
- slow:
	- aes-ctr + hmac-sha256 (= 2 passes)
	- openssl + py stdlib
- faster:
	- aes-gcm (1 pass, intel + amd cpu instr.)
	- openssl
	- but: rare aes-gcm issue with weak keys
- Nonce / IV / Counter generation / management
- session keys (per worker thread per backup)
RAM consumption (1)
- high RAM usage to achieve high speed (N=16, 64kB)
- repo index (id -> storage segment, offset)
- chunks cache (id -> refcnt, size, csize)
- files cache (H(path) -> mtime, size, inode, chunks)
- chunk_count ~= total_file_size / 2^N
- repo_index = chunk_count * 40
- chunks_cache = chunk_count * 44
- files_cache = total_file_count * 240 + chunk_count * 80
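The formulas above can be turned into a small estimator. A sketch that assumes the constants are bytes per entry; the function name `estimate_ram` and its parameters are made up for illustration.

```python
def estimate_ram(total_file_size: int, total_file_count: int, chunk_bits: int = 16) -> dict:
    """Estimate index/cache memory in bytes from the slide's formulas.

    chunk_bits is N, i.e. a target chunk size of 2^N bytes.
    """
    chunk_count = total_file_size // 2 ** chunk_bits
    repo_index = chunk_count * 40
    chunks_cache = chunk_count * 44
    files_cache = total_file_count * 240 + chunk_count * 80
    return {
        "chunk_count": chunk_count,
        "repo_index": repo_index,
        "chunks_cache": chunks_cache,
        "files_cache": files_cache,
        "total": repo_index + chunks_cache + files_cache,
    }


# 1 Mi files, 1 TiB of data, default N=16
est = estimate_ram(total_file_size=2 ** 40, total_file_count=2 ** 20)
print(f"{est['total'] / 2 ** 30:.1f} GiB")  # prints "2.8 GiB"
```

With `chunk_bits=20` the same data set comes out at about 0.39 GiB: the per-chunk terms shrink by a factor of 16, while the per-file part of the files cache stays the same.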
RAM consumption (2)
- 1 Mi files, 1 TiB data -> 2.8 GiB RAM
- use custom chunker params for little RAM + large storage -> N=20, up to 1/16 RAM consumption
- maybe switch off the files cache
- use multiple repos, purge often
- use smaller ids (128 instead of 256 bits)
- help fixing this:
	- use different data structure than hash table?
	- mmapped file?
	- provide on-disk fallback code?
Borg - Demo
I'll show a developer installation / recent code.
In the future:
- release packages on PyPI
- Linux / BSD / ... packages
Installation Preps
# Debian / Ubuntu
# Python 3.x (>= 3.2) + Headers, Py Package Installer
apt-get install python3.4-dev python3.4 python3-pip
# we need OpenSSL + Headers for Crypto
apt-get install libssl-dev openssl
# ACL support Headers + Library
apt-get install libacl1-dev libacl1
# if you do not have gcc / make / etc. yet
apt-get install build-essential
# optional: lowlevel FUSE py binding - to mount backup archives
apt-get install python3-llfuse fuse
# optional: for unit testing
apt-get install fakeroot
system wide install
# A) later: system-wide install with pip, latest release:
sudo pip install borgbackup
# note: maybe you have to use pip3 to get the python3 pip
dev install from git
# B) isolated install, latest borg git repo code:
git clone https://github.com/borgbackup/borg.git
apt-get install python-virtualenv
virtualenv --python=python3 borg-env
source borg-env/bin/activate   # always before using!
# install borg + dependencies into virtualenv
pip install cython  # compile .pyx -> .c
pip install tox   # optional, for running unit tests
cd borg
pip install -e .
# check your install
fakeroot -u tox
init / create
# initialize a repository:
borg init /tmp/borg
# create a "first" archive inside this repo (verbose): 
borg create --progress --stats /tmp/borg::first ~/Desktop
# create a "second" archive (less verbose):
borg create /tmp/borg::second ~/Desktop
# even more verbose:
borg create -v --stats /tmp/borg::third ~/Desktop
list / extract / check
# list repo / archive contents:
borg list /tmp/borg
borg list /tmp/borg::first
# extract ("restore") from an archive to cwd:
mkdir test ; cd test
borg extract /tmp/borg::third
# simulate extraction (good test):
borg extract -v --dry-run /tmp/borg::third
# check consistency of repo:
borg check /tmp/borg
info / delete / help
# info about archive:
borg info /tmp/borg::first
# delete archive:
borg delete /tmp/borg::first
# delete repo:
borg delete /tmp/borg
crypto/compression
# options, options, options, ...
borg init --help
# create an encrypted repo:
borg init -e keyfile /tmp/borg-enc
# (*) later: compression options
borg init ...
# ... (same as before, but you need to give passphrase)
remote via ssh
# connect to remote borg via ssh:
# remote borg needs to be compatible with local
borg init ssh://user@host:22/mnt/backup/borg
borg create ssh://user@host:22/mnt/backup/borg::first ~
# also possible: using sshfs or other locally mounted
# network filesystems,  but be careful: locks, perf.