Borg Backup

(a fork of Attic)

"The holy grail of backup software"

Thomas Waldmann @ GPN 2015

Feature Set (1)

simple & fast
deduplication
compression
authenticated encryption
easy pruning of old backups
simple backend (k/v, fs, via ssh)

Feature Set (2)

FOSS (BSD license)
good docs
good platform / arch support
xattr / acl support
FUSE support ("mount a backup")

Code

91% Python3 + Cython
(high-level code, glue code)
9% C
(performance critical stuff)
only ~6000 LOC total
few dependencies
unit tests, CI

Security

Signatures / Authentication
no undetected corruption/tampering
Encryption / Confidentiality
only you have access to your data
FOSS in Python
review possible, no buffer overflows

Safety

Robustness
(by append-only design, transactions)
Checkpoints
every 5 minutes (between files)
msgpack with "limited" Unpacker
(no memory DoS)
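The checkpoint rule ("every 5 minutes, between files") can be sketched as a loop that looks at the clock only after a file has been completely processed. This is a minimal Python sketch under that assumption. `backup_files`, `process_file` and `write_checkpoint` are hypothetical names, not Borg's API. The clock is injectable so the logic can be tested without waiting.

```python
import time

CHECKPOINT_INTERVAL = 5 * 60  # seconds ("every 5 minutes")


def backup_files(paths, process_file, write_checkpoint,
                 clock=time.monotonic, interval=CHECKPOINT_INTERVAL):
    """Back up paths in order and commit a checkpoint at most once per
    interval, only between files (hypothetical helper, not Borg's API)."""
    last = clock()
    for path in paths:
        process_file(path)          # a file is never split by a checkpoint
        now = clock()               # the clock is only consulted between files
        if now - last >= interval:
            write_checkpoint()      # e.g. commit a partial archive
            last = now
```

Because the check sits between files, a single huge file can delay a checkpoint past the 5 minutes. The benefit is that every checkpoint contains only whole files.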

Crypto Keys

client-side meta+data encryption
separate keys for separate concerns
passphrase: PBKDF2, 100k rounds
Keys:
- none
- passphrase-only
- passphrase-protected keyfile
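Passphrase stretching with PBKDF2 and "separate keys for separate concerns" can be sketched with Python's stdlib. This assumes PBKDF2-HMAC-SHA256 with 100k iterations. The 3 x 32-byte split into `enc_key` / `mac_key` / `id_key` is a hypothetical layout chosen for illustration and is not necessarily how Borg lays out its keys.

```python
import hashlib
import os

ITERATIONS = 100_000  # "pbkdf2 100k rounds"


def derive_keys(passphrase: str, salt: bytes) -> dict:
    """Stretch a passphrase with PBKDF2-HMAC-SHA256 and split the output
    into separate keys (hypothetical 3 x 32-byte layout)."""
    material = hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, ITERATIONS, dklen=96
    )
    return {
        "enc_key": material[0:32],   # encryption
        "mac_key": material[32:64],  # authentication of ciphertext
        "id_key": material[64:96],   # keyed chunk ids
    }


salt = os.urandom(32)  # random per repository / keyfile, stored next to it
keys = derive_keys("correct horse battery staple", salt)
```

Because `dklen=96` exceeds SHA-256's 32-byte output, PBKDF2 runs the full iteration count once per 32-byte block, so this costs three times as much as a single key. A keyfile design avoids that by generating the keys randomly and using the passphrase-derived key only to encrypt the keyfile.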

Crypto Cipher/MAC

AEAD, Encrypt-then-MAC
- AES256-GCM / GHASH (*)
- AES256-CTR + HMAC-SHAxxx
- Counter / IV deterministic, never repeats
uses OpenSSL
Intel/AMD: AES-NI, PCLMULQDQ
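The Encrypt-then-MAC structure and a deterministic, never-repeating CTR counter can be sketched in stdlib Python. AES itself needs OpenSSL (or a third-party package), so the ciphertext is treated as opaque bytes here. HMAC-SHA256 stands in for "HMAC-SHAxxx". The envelope layout `tag || iv || ciphertext` is a hypothetical choice.

```python
import hashlib
import hmac


class IVCounter:
    """Deterministic CTR-mode IVs: a 128-bit counter that only moves forward."""

    def __init__(self, start: int = 0):
        self.value = start

    def next_iv(self, plaintext_len: int) -> bytes:
        iv = self.value.to_bytes(16, "big")
        # CTR mode consumes one counter value per 16-byte block, so skip
        # ahead by the number of blocks (at least 1) to never reuse a value.
        self.value += max(1, (plaintext_len + 15) // 16)
        return iv


def seal(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    # Encrypt-then-MAC: authenticate IV + ciphertext, not the plaintext.
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return tag + iv + ciphertext


def unseal(mac_key: bytes, envelope: bytes):
    tag, iv, ciphertext = envelope[:32], envelope[32:48], envelope[48:]
    expected = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("MAC mismatch: corrupted or tampered data")
    return iv, ciphertext  # only now is it safe to decrypt
```

Verifying the MAC before decrypting means corrupted or tampered data is rejected without ever being fed to the cipher.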

Compression

Python stdlib:
- zlib (medium fast)
- lzma (slow, high compression) (*)
blosc library: (*)
- multithreaded, highly optimized
- "faster than memcpy"
- lz4 (superfast, reasonable compression)
- lz4hc (very fast, "high compression")
- zlib (faster than the implementation from stdlib)
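The two stdlib compressors can be used side by side if every stored chunk records which one produced it. This Python sketch uses hypothetical 1-byte type tags. blosc/lz4 are third-party and are left out.

```python
import lzma
import zlib

# Hypothetical 1-byte type tags, so each stored chunk says how to decompress it.
CODECS = {
    b"\x00": (lambda d: d, lambda d: d),                               # none
    b"\x01": (lambda d: zlib.compress(d, 6), zlib.decompress),         # zlib: medium fast
    b"\x02": (lambda d: lzma.compress(d, preset=6), lzma.decompress),  # lzma: slow, strong
}
TAGS = {"none": b"\x00", "zlib": b"\x01", "lzma": b"\x02"}


def compress(data: bytes, method: str = "zlib") -> bytes:
    tag = TAGS[method]
    return tag + CODECS[tag][0](data)


def decompress(blob: bytes) -> bytes:
    tag, payload = blob[:1], blob[1:]
    return CODECS[tag][1](payload)
```

Since the tag travels with the chunk, chunks compressed with different methods can coexist in one repository. Changing the method therefore does not require recompressing old data.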

Deduplication (1)

No problem with:
- VM images (sparse file support)
- disk images
- renamed huge directories/trees
- inner deduplication of data set
- historical deduplication
- deduplication between different machines

Deduplication (2)

Content defined chunking:
- "buzhash" rolling hash
- cut data where the hash has a specific bit pattern,
  yields chunks with a target size of 2^n bytes
- seeded, to avoid fingerprinting chunk lengths
Store chunks in the store, keyed by id:
- id = HASH(chunk) (unencrypted repo)
- id = MAC(mac_key, chunk) (encrypted repo)
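Content-defined chunking, id computation and a deduplicating store fit in a short pure-Python sketch. This is a simplified model, not Borg's C chunker. The window size (48), mask bits (n = 12, about 4 KiB targets to keep the demo small), min/max chunk sizes, and seeding the buzhash table via `random.Random(seed)` are all assumptions. SHA-256 and HMAC-SHA256 stand in for HASH and MAC.

```python
import hashlib
import hmac
import random


def rotl32(x: int, n: int) -> int:
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def make_table(seed: int) -> list:
    # 256 pseudo-random 32-bit values; a per-repo seed changes all cut points.
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(256)]


def chunkify(data: bytes, seed: int = 0, window: int = 48, mask_bits: int = 12,
             min_size: int = 1024, max_size: int = 65536) -> list:
    """Cut where the buzhash of the last `window` bytes has its low
    `mask_bits` bits all zero: roughly every 2**mask_bits bytes after min_size."""
    table = make_table(seed)
    mask = (1 << mask_bits) - 1
    chunks, start = [], 0
    while start < len(data):
        end = min(start + max_size, len(data))
        cut = end
        pos = start + max(min_size, window)
        if pos < end:
            h = 0
            for b in data[pos - window:pos]:  # hash the first full window
                h = rotl32(h, 1) ^ table[b]
            while pos < end:
                if h & mask == 0:
                    cut = pos
                    break
                # roll the window one byte: drop data[pos - window], add data[pos]
                h = (rotl32(h, 1) ^ rotl32(table[data[pos - window]], window)
                     ^ table[data[pos]])
                pos += 1
        chunks.append(data[start:cut])
        start = cut
    return chunks


def chunk_id(chunk: bytes, id_key: bytes = None) -> bytes:
    if id_key is None:
        return hashlib.sha256(chunk).digest()                # id = HASH(chunk)
    return hmac.new(id_key, chunk, hashlib.sha256).digest()  # id = MAC(key, chunk)


def store_chunks(store: dict, data: bytes, id_key: bytes = None, seed: int = 0) -> list:
    """Put each chunk into `store` under its id; known ids are not stored again."""
    ids = []
    for chunk in chunkify(data, seed=seed):
        cid = chunk_id(chunk, id_key)
        store.setdefault(cid, chunk)  # deduplication: existing id -> no new data
        ids.append(cid)
    return ids
```

Because cut points depend on content rather than offsets, inserting a few bytes at the front of a file changes only the first chunk or so. The rest resynchronize and deduplicate. With a secret seed, an observer of chunk lengths cannot easily tell which known file was backed up.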

Borg, the present

Borg Backup is a fork of Attic:
- currently tracking attic development
- plus a lot of conservative PRs
  (stuff from attic/merge branch "merge")
- (*) plus a lot of new stuff in "experimental" branch
- not compatible with Attic

Borg, what's different?

developed by "The Borg Collective"
more open development
new developers are welcome!
quicker development
redesign where needed
changes, new features
incompatible changes (if there is good reason)
thus: less "sta(b)le"

Borg, the future

scalability improvements
speed improvements
architectural changes
pull backups? backup-only mode?
better logging / exception handling
more backends? http / ftp / aws / google / ...
other platforms / architectures
GUI? (needs a developer)
<you name it>

Borg - you can be assimilated!

test scalability / reliability / security
be careful!
file bugs
file feature requests
improve docs
contribute code
spread the word
later: create dist packages

Borg Backup - Links

borgbackup.github.io

##borgbackup on chat.freenode.net

Questions / Feedback?

Just grab me, I am here on all days!
Thomas J Waldmann @ twitter

Borg Demo ->

Borg Internals & Ideas v

Multithreading

GIL? No (big) problem, just release the GIL:
- I/O: python file read, write/fsync (ok)
- C: reader / chunker (TODO)
- C: id hashing (ok)
- C: compression (ok)
- C: encryption (TODO)
CPU usage (i5, 2 Cores + HT)
- no MT: 30-80%
- with MT: 300%
but: thread safety, race conditions!
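The "release the GIL" approach can be sketched with a thread pool whose workers spend their time inside C code. CPython's documentation states that hashlib releases the GIL for data larger than 2047 bytes, and zlib's C implementation releases it while compressing. `process_chunk` and `process_all` are hypothetical names.

```python
import hashlib
import zlib
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk: bytes):
    # Both calls do their heavy lifting in C with the GIL released,
    # so several worker threads can use several CPU cores.
    chunk_id = hashlib.sha256(chunk).digest()
    return chunk_id, zlib.compress(chunk, 6)


def process_all(chunks, workers: int = 4):
    # Workers only compute; results come back in input order and any shared
    # state (e.g. the chunks cache) is updated by the caller's thread alone,
    # which sidesteps race conditions.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_chunk, chunks))
```

Keeping all mutation of shared indexes in one thread is a simple way to avoid the thread-safety problems the slide warns about. The cost is that this thread can become the bottleneck.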

Hashes / MACs

slow:
- sha256 (and hmac-sha256)
- crc32
faster:
- poly1305-AES
- siphash (only 64-bit result)
- blake2
- xxhash (not cryptographic)
- sha512-256
- crc32c (intel cpu instr.)
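Speed claims like these depend on the CPU and the OpenSSL build, so it is worth measuring locally. This micro-benchmark sketch uses only the stdlib. siphash, xxhash, poly1305 and crc32c need third-party packages and are omitted. Truncated SHA-512 has the same cost as SHA-512/256 but is NOT the same function, because SHA-512/256 uses different initial values.

```python
import hashlib
import hmac
import os
import time
import zlib

KEY = os.urandom(32)  # MAC key for the keyed candidates

CANDIDATES = {
    "sha256": lambda d: hashlib.sha256(d).digest(),
    "hmac-sha256": lambda d: hmac.new(KEY, d, hashlib.sha256).digest(),
    # truncated SHA-512: same cost as SHA-512/256, but not the same function
    "sha512[:32]": lambda d: hashlib.sha512(d).digest()[:32],
    "blake2b-256 keyed": lambda d: hashlib.blake2b(d, key=KEY, digest_size=32).digest(),
    "crc32": lambda d: zlib.crc32(d),
}


def throughput_mib_s(fn, data: bytes, repeat: int) -> float:
    start = time.perf_counter()
    for _ in range(repeat):
        fn(data)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return len(data) * repeat / elapsed / 2**20


def run(size: int = 4 * 2**20, repeat: int = 10) -> dict:
    data = os.urandom(size)
    return {name: throughput_mib_s(fn, data, repeat) for name, fn in CANDIDATES.items()}


if __name__ == "__main__":
    for name, mib_s in sorted(run().items(), key=lambda kv: kv[1]):
        print(f"{name:20s} {mib_s:8.0f} MiB/s")
```

Keyed BLAKE2b doubles as a MAC in a single pass, which is one reason it shows up in the "faster" list as a replacement for HMAC-SHA256.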

Crypto

authenticated encryption with associated data
slow:
- aes-ctr + hmac-sha256 (= 2 passes)
- openssl + py stdlib
faster:
- aes-gcm (1 pass, intel + amd cpu instr.)
- openssl
- but: rare aes-gcm issue with weak keys
Nonce / IV / Counter generation / management
session keys (per worker thread per backup)
passphrase mode -> keyfile mode, keyfile stored inside repo
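The slide only names the "session keys per worker thread per backup" idea. This Python sketch is one hypothetical way to do it. HMAC-SHA256 as the derivation function, the `b"session-key"` label and the `Session` class are assumptions. With a separate key per thread, each thread can run its own nonce counter from 0 without a shared, locked counter, and no (key, nonce) pair repeats. This holds as long as `backup_id` is unique per backup run (e.g. random bytes).

```python
import hashlib
import hmac


def session_key(master_key: bytes, backup_id: bytes, worker: int) -> bytes:
    """Derive an independent key per (backup, worker thread); hypothetical scheme."""
    info = b"session-key" + worker.to_bytes(4, "big") + backup_id
    return hmac.new(master_key, info, hashlib.sha256).digest()


class Session:
    """One worker thread's key plus its own 96-bit nonce counter."""

    def __init__(self, master_key: bytes, backup_id: bytes, worker: int):
        self.key = session_key(master_key, backup_id, worker)
        self._counter = 0

    def next_nonce(self) -> bytes:
        # 12-byte nonces, the usual size for AES-GCM; unique per key because
        # only this session ever uses self.key.
        if self._counter >= 2**96:
            raise OverflowError("nonce space exhausted, rekey")
        nonce = self._counter.to_bytes(12, "big")
        self._counter += 1
        return nonce
```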

RAM consumption (1)

high RAM usage to achieve high speed
repo index (id -> storage segment, offset)
chunks cache (id -> refcnt, size, csize)
files cache (H(path) -> mtime, size, inode, chunks)

chunk_count ~= total_file_size / 65536

repo_index = chunk_count * 40

chunks_cache = chunk_count * 44

files_cache = total_file_count * 240 +
              chunk_count * 80
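The formulas above turn directly into a small estimator. This Python sketch uses the slide's per-entry byte sizes as given. The `chunk_size` parameter is an assumption added to explore the "larger chunk size" idea.

```python
def estimate_ram(total_file_size: int, total_file_count: int,
                 chunk_size: int = 65536) -> dict:
    """Byte estimates using the per-entry sizes from the slide."""
    chunk_count = total_file_size // chunk_size
    est = {
        "repo_index": chunk_count * 40,
        "chunks_cache": chunk_count * 44,
        "files_cache": total_file_count * 240 + chunk_count * 80,
    }
    est["total"] = sum(est.values())
    return est


est = estimate_ram(total_file_size=2**40, total_file_count=2**20)
```

For 1 Mi files and 1 TiB of data this gives 3,003,121,664 bytes, about 2.8 GiB. With 1 MiB chunks instead of 64 KiB, the same data needs about 404 MiB, most of it the per-file part of the files cache.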

RAM consumption (2)

```
1 Mi files, 1 TiB data -> 2.8 GiB RAM
```
be careful with little RAM + large storage (NAS)
have swap or switch off the files cache
use multiple repos, purge often
soon: use larger chunk size
use smaller ids (128 instead of 256 bits)
help fix this:
- use a different data structure than a hash table?
- mmap'ed file?
- provide on-disk fallback code?

Borg - Demo

I'll show a developer installation / recent code.

In the future:

Release packages on PyPI

Linux / BSD / ... packages

Installation Preps


```
# Debian / Ubuntu

# Python 3.x (>= 3.2) + Headers, Py Package Installer
apt-get install python3.4-dev python3.4 python3-pip

# we need OpenSSL + Headers for Crypto
apt-get install libssl-dev openssl

# ACL support Headers + Library
apt-get install libacl1-dev libacl1

# if you do not have gcc / make / etc. yet
apt-get install build-essential

# optional: lowlevel FUSE py binding - to mount backup archives
apt-get install python3-llfuse fuse

# optional: for unit testing
apt-get install fakeroot
```

system wide install


```
# A) later: system-wide install with pip, latest release:

sudo pip install borgbackup

# note: maybe you have to use pip3 to get the python3 pip
```

dev install from git

```
# B) isolated install, latest borg git repo code:

git clone https://github.com/borgbackup/borg.git

apt-get install python-virtualenv
virtualenv --python=python3 borg-env
source borg-env/bin/activate   # always before using!

# install borg + dependencies into virtualenv
pip install cython  # compile .pyx -> .c
pip install tox   # optional, for running unit tests
cd borg
pip install -e .

# check your install
fakeroot -u tox
```

init / create

```
# initialize a repository:

borg init /tmp/borg

# create a "first" archive inside this repo (verbose):

borg create --progress --stats /tmp/borg::first ~/Desktop

# create a "second" archive (less verbose):

borg create /tmp/borg::second ~/Desktop

# even more verbose:

borg create -v --stats /tmp/borg::third ~/Desktop
```

list / extract / check

```
# list repo / archive contents:

borg list /tmp/borg
borg list /tmp/borg::first

# extract ("restore") from an archive to cwd:

mkdir test ; cd test
borg extract /tmp/borg::third

# simulate extraction (good test):

borg extract -v --dry-run /tmp/borg::third

# check consistency of repo:

borg check /tmp/borg
```

info / delete / help

```
# info about archive:

borg info /tmp/borg::first

# delete archive:

borg delete /tmp/borg::first

# delete repo:

borg delete /tmp/borg
```

crypto/compression

```
# options, options, options, ...

borg init --help

# create an encrypted repo:

borg init -e keyfile /tmp/borg-enc

# (*) later: compression options

borg init ...

# ... (same as before, but you need to enter the passphrase)
```

remote via ssh

```
# connect to remote borg via ssh:
# remote borg needs to be compatible with local borg

borg init ssh://user@host:22/mnt/backup/borg

borg create ssh://user@host:22/mnt/backup/borg::first ~

# also possible: using sshfs or other locally mounted
# network filesystems, but be careful: locks, performance.
```

