Borg Backup

(a fork of Attic)

 

"The holy grail of backup software"

 

 

Thomas Waldmann @ EuroPython 2015

Feature Set (1)

  • simple & fast
  • deduplication
  • compression
  • authenticated encryption

  • easy pruning of old backups

  • simple backend (key/value store on a filesystem, local or via ssh)

Feature Set (2)

  • FOSS (BSD license)

  • good docs

  • good platform / arch support

  • xattr / acl support

  • FUSE support ("mount a backup")

Code

  • 91% Python3 + Cython
    (high-level code, glue code)
  • 9% C
    (performance critical stuff)
  • only ~6000 LOC total
  • few dependencies
  • unit tests, CI

Security

  • Signatures / Authentication
    no undetected corruption/tampering
     
  • Encryption / Confidentiality
    only you have access to your data
     
  • FOSS in Python
    review possible, no buffer overflows in the Python code

Safety

  • Robustness
    (by append-only design, transactions)
     
  • Checkpoints
    every 5 minutes (between files)
     
  • msgpack with "limited" Unpacker
    (no memory DoS)
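The checkpoint idea can be sketched like this: files are processed one at a time, and once at least 5 minutes have passed a checkpoint is written before the next file starts, so no file is ever split by a checkpoint. This is a minimal sketch, not Borg's code; `backup_files`, `process_file`, `write_checkpoint` and the injectable `clock` are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Iterable

CHECKPOINT_INTERVAL = 5 * 60  # seconds ("every 5 minutes")


def backup_files(paths: Iterable[str],
                 process_file: Callable[[str], None],
                 write_checkpoint: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic,
                 interval: float = CHECKPOINT_INTERVAL) -> int:
    """Back up files one by one, checkpointing only between files.

    Returns the number of checkpoints written.
    """
    checkpoints = 0
    last = clock()
    for path in paths:
        process_file(path)                # a whole file, never interrupted
        now = clock()
        if now - last >= interval:        # due: checkpoint before the next file
            write_checkpoint()
            checkpoints += 1
            last = now
    return checkpoints
```

Passing the clock in as a parameter keeps the timing logic testable without waiting five real minutes.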

Crypto Keys

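  • client-side meta+data encryption

  • separate keys for separate concerns
    (encryption, MAC, chunk ids)

  • passphrase -> key via PBKDF2, 100k rounds

  • Keys:
    • none
    • repokey: passphrase-protected key stored in the repo
      (replaces: passphrase-only)
    • keyfile: passphrase-protected key file on the client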
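A sketch of the "passphrase -> PBKDF2, 100k rounds" step with Python's stdlib, assuming SHA-256 as the PBKDF2 hash (the slide does not name it) and a random per-key salt. The derived key would then protect the actual repo keys in the repokey or keyfile; `derive_key_encryption_key` is a hypothetical name.

```python
import hashlib
import os

PBKDF2_ROUNDS = 100_000  # "100k rounds"


def derive_key_encryption_key(passphrase: str, salt: bytes,
                              rounds: int = PBKDF2_ROUNDS) -> bytes:
    """Stretch a passphrase into a 32-byte key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, rounds, dklen=32)


salt = os.urandom(32)  # random per key, stored alongside the encrypted key material
kek = derive_key_encryption_key("correct horse battery staple", salt)
print(len(kek))  # 32
```

The 100k rounds make every passphrase guess expensive for an attacker, while the legitimate user pays that cost only once per run.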

Crypto Cipher/MAC

  • AEAD, Encrypt-then-MAC
    • AES256-GCM / GHASH (exp.)
    • AES256-CTR + HMAC-SHAxxx
    • Counter / IV deterministic, never repeats

  • uses OpenSSL
    • Intel/AMD CPUs: AES-NI, PCLMULQDQ acceleration
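One way to get a "deterministic, never repeating" counter IV for AES-CTR, sketched under assumptions: the class name `NonceCounter` is hypothetical, and the counter advances by the number of 16-byte blocks each message consumes, because CTR mode uses one counter value per block. Reusing any counter value under the same key would leak the XOR of two plaintexts.

```python
import threading


class NonceCounter:
    """Hypothetical helper: hands out 128-bit AES-CTR IVs from a counter."""

    def __init__(self, start: int = 0, bits: int = 128):
        self._next = start
        self._limit = 1 << bits
        self._lock = threading.Lock()  # several threads may encrypt concurrently

    def next_iv(self, blocks: int) -> bytes:
        """Reserve `blocks` AES blocks and return the IV for the first one."""
        if blocks < 1:
            raise ValueError("need at least one block")
        with self._lock:
            iv = self._next
            if iv + blocks > self._limit:
                raise OverflowError("counter space exhausted: a new key is required")
            self._next = iv + blocks  # CTR consumes one counter value per block
        return iv.to_bytes(16, "big")


ctr = NonceCounter()
message = b"x" * 100
iv = ctr.next_iv(blocks=(len(message) + 15) // 16)  # 7 blocks reserved
```

Since the counter must never go backwards, a real implementation would also have to persist its value (or reserve ranges ahead) so that a restart cannot reuse an IV.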

Compression

  • Python stdlib:
    • zlib  (medium fast, level 0..9)
    • lzma  (slow, high compression) (exp.)

  • blosc library (exp.):
    • multithreaded, highly optimized
    • "faster than memcpy"
    • lz4  (superfast, reasonable compression)
    • lz4hc  (very fast, "high compression")
    • zlib  (faster than the stdlib implementation)
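To see the stdlib trade-offs on your own data, a small comparison sketch; `compare_stdlib_compression` is a hypothetical helper, the sample data is made up, and blosc is left out because it is a third-party package. zlib level 0 only stores the data (slightly larger than the input), higher levels trade speed for ratio.

```python
import lzma
import zlib


def compare_stdlib_compression(data: bytes) -> dict:
    """Return the compressed size for a few zlib levels and for lzma."""
    sizes = {f"zlib-{level}": len(zlib.compress(data, level)) for level in (0, 1, 6, 9)}
    sizes["lzma"] = len(lzma.compress(data))  # default preset 6
    return sizes


sample = b"borg backup deduplicates and compresses data\n" * 2000
for name, size in compare_stdlib_compression(sample).items():
    print(f"{name:8} {size:7} bytes (raw: {len(sample)})")
```

The sample is extremely repetitive, so real backup data will show much smaller ratios; timing each call as well would show the speed side of the trade-off.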

       

Deduplication (1)

  • No problem with:
    • VM images (sparse file support)
    • (physical) disk images
    • renamed huge directories/trees
    • deduplication inside a single data set
    • historical deduplication (across backups over time)
    • deduplication between different machines
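A toy content-addressed store showing why renamed trees, repeated data inside one data set and repeated backups cost no extra space: chunks are stored under their hash, and only a refcount grows when a chunk is seen again (the same holds when several machines write to one store). To keep it short this sketch uses fixed 4 KiB chunks; Borg instead uses content-defined chunking (see Deduplication (2)), which also survives inserted or shifted data. `ChunkStore` and `backup` are hypothetical names.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunks only to keep the toy short


class ChunkStore:
    """Each unique chunk is stored once, plus a reference count."""

    def __init__(self):
        self.chunks = {}    # id -> chunk data
        self.refcount = {}  # id -> number of references

    def put(self, chunk: bytes) -> bytes:
        chunk_id = hashlib.sha256(chunk).digest()
        if chunk_id not in self.chunks:
            self.chunks[chunk_id] = chunk  # new data: store it
        self.refcount[chunk_id] = self.refcount.get(chunk_id, 0) + 1
        return chunk_id

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


def backup(store: ChunkStore, files: dict) -> dict:
    """An archive maps each path to the list of its chunk ids."""
    return {
        path: [store.put(content[i:i + CHUNK_SIZE]) for i in range(0, len(content), CHUNK_SIZE)]
        for path, content in files.items()
    }


store = ChunkStore()
day1 = {"photos/a.jpg": bytes(range(256)) * 64, "docs/b.txt": b"hello" * 1000}
backup(store, day1)
print(store.stored_bytes())  # 9096: the 4 identical chunks of a.jpg are stored once
day2 = {"old/photos/a.jpg": day1["photos/a.jpg"], "docs/b.txt": day1["docs/b.txt"]}
backup(store, day2)          # renamed directory, second backup
print(store.stored_bytes())  # still 9096
```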

 

Deduplication (2)

  • Content defined chunking:
    • "buzhash" rolling hash
    • cut data where the hash has a specific bit pattern,
      yields chunks with a target size of 2^n bytes
    • n + other chunker params are configurable now
    • seeded, to avoid fingerprinting chunk lengths

  • Store chunks under their id in the repo:
    • id = HASH(chunk)  (no encryption)
    • id = MAC(mac_key, chunk)  (with encryption)
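A pure-Python sketch of content-defined chunking with a buzhash-style rolling hash, plus the two id variants. Assumptions: the seeded table is generated with `random.Random(seed)`, the window and size parameters are small illustrative values (`mask_bits` plays the role of n), and the hash is 32-bit. Borg's real chunker is C code with its own table and defaults; `chunkify`, `make_table` and `chunk_id` are hypothetical names.

```python
import hashlib
import hmac
import random

MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def make_table(seed: int) -> list:
    """256 pseudo-random 32-bit values; a different seed gives different cut points."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(256)]


def chunkify(data: bytes, seed: int = 0, mask_bits: int = 12, window: int = 48,
             min_size: int = 1024, max_size: int = 16384) -> list:
    """Cut where the buzhash of the last `window` bytes has its low mask_bits bits zero."""
    table = make_table(seed)
    mask = (1 << mask_bits) - 1  # a cut is hit with probability ~1/2**mask_bits per byte
    chunks, start = [], 0
    while start < len(data):
        limit = min(start + max_size, len(data))
        cut = limit  # no boundary found: cut at max_size or at the end of data
        h = 0
        for i in range(start, limit):
            h = rotl32(h, 1) ^ table[data[i]]  # add the incoming byte
            if i - start >= window:
                h ^= rotl32(table[data[i - window]], window)  # drop the outgoing byte
            if i + 1 - start >= min_size and (h & mask) == 0:
                cut = i + 1
                break
        chunks.append(data[start:cut])
        start = cut
    return chunks


def chunk_id(chunk: bytes, id_key=None) -> bytes:
    """id = HASH(chunk) without encryption, id = MAC(id_key, chunk) with it."""
    if id_key is None:
        return hashlib.sha256(chunk).digest()
    return hmac.new(id_key, chunk, hashlib.sha256).digest()
```

Because a cut depends only on the last `window` bytes, inserting data near the start of a file changes only the chunks around the insertion; later boundaries, and therefore their chunk ids, stay the same.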

Borg, the present

  • Borg Backup is a fork of Attic:

    • currently tracking Attic development

    • plus a lot of conservative PRs
      (stuff from attic/merge branch "merge")

    • bug and scalability fixes

    • plus a lot of new stuff in "experimental" branch ("exp.")

    • not compatible with Attic

Borg, what's different?

  • developed by "The Borg Collective"

  • more open development

  • new developers are welcome!

  • quicker development

  • redesign where needed

  • changes, new features

  • incompatible changes with good reason

  • thus: less "sta(b)le"

Borg, the future

  • scalability improvements

  • speed improvements

  • architectural changes

  • pull backups? backup-only mode?

  • better logging / exception handling

  • more backends? http / ftp / aws / google / ...

  • other platforms / architectures

  • BorgWeb GUI (for daily user needs)

  • <you name it>

Borg - you can be assimilated!

  • test scalability / reliability / security

  • be careful!

  • file bugs

  • file feature requests

  • improve docs

  • contribute code

  • spread the word

  • create dist packages

  • care for misc. platforms

Borg Backup - Links

borgbackup.github.io
 

#borgbackup on chat.freenode.net

Questions / Feedback?

  • Just grab me at the sprints!

  • Thomas J Waldmann @ twitter

Borg Demo

Borg Internals & Ideas

Multithreading

  • GIL? No (big) problem, just release the GIL:

    • I/O: python file read, write/fsync (ok)

    • C: reader / chunker (TODO)

    • C: id hashing (ok)

    • C: compression (ok)

    • C: encryption (TODO)

  • CPU usage (i5, 2 Cores + HT)

    • no MT: 30-80%

    • with MT: 300%

  • but: thread safety, race conditions!
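A sketch of the "just release the GIL" idea using only the stdlib: hashing and compressing chunks in a thread pool. CPython's hashlib documents that it releases the GIL for inputs larger than 2047 bytes, and zlib also releases it while compressing, so the worker threads can use several cores. `process_chunk` and `process_all` are hypothetical names; Borg's own C code (chunker, crypto) is not shown.

```python
import hashlib
import zlib
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk: bytes):
    """Hash and compress one chunk; both C calls release the GIL on large inputs."""
    chunk_id = hashlib.sha256(chunk).digest()
    compressed = zlib.compress(chunk, 6)
    return chunk_id, compressed


def process_all(chunks, workers: int = 4):
    """Process chunks in parallel threads; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_chunk, chunks))
```

`process_chunk` touches no shared state, which sidesteps the race conditions on the slide; anything shared (caches, the repository writer) would need locking or a single consumer thread.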

Hashes / MACs

  • slow:

    • sha256 (and hmac-sha256)

    • crc32

  • faster:

    • poly1305-AES

    • siphash (only 64bit result)

    • blake2

    • xxhash (not cryptographic)

    • sha512-256

    • crc32c (intel cpu instr.)
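A sketch comparing two MACs from Python's stdlib: HMAC-SHA256 and keyed BLAKE2b (BLAKE2 has a built-in `key` parameter, so no HMAC wrapper is needed). The helper names are hypothetical, and the throughput numbers depend heavily on the CPU; on CPUs with SHA extensions, SHA-256 may well come out ahead.

```python
import hashlib
import hmac
import timeit


def mac_hmac_sha256(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def mac_blake2b(key: bytes, data: bytes) -> bytes:
    # keyed BLAKE2b, truncated to 32 bytes to match HMAC-SHA256
    return hashlib.blake2b(data, key=key, digest_size=32).digest()


def throughput_mb_s(fn, key: bytes, size: int = 1 << 20, repeat: int = 20) -> float:
    """Rough MB/s for MACing `size` zero bytes `repeat` times."""
    data = bytes(size)
    seconds = timeit.timeit(lambda: fn(key, data), number=repeat)
    return size * repeat / seconds / 1e6


if __name__ == "__main__":
    key = bytes(32)
    for fn in (mac_hmac_sha256, mac_blake2b):
        print(f"{fn.__name__:16} {throughput_mb_s(fn, key):8.1f} MB/s")
```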

Crypto

  • authenticated encryption with associated data

  • slow:

    • aes-ctr + hmac-sha256 (= 2 passes)

    • openssl + py stdlib

  • faster:

    • aes-gcm (1 pass, intel + amd cpu instr.)

    • openssl

    • but: rare aes-gcm issue with weak keys

  • Nonce / IV / Counter generation / management

  • session keys (per worker thread per backup)
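The "session keys per worker thread per backup" idea could be sketched with HKDF (RFC 5869) built from HMAC-SHA256: each (backup, worker) pair derives its own key from the master key, so each worker can keep an independent IV counter without coordinating with the others. This is an assumed design, not Borg's implementation; `session_key` and its info label are hypothetical.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) extract-then-expand with HMAC-SHA256."""
    if length > 255 * 32:
        raise ValueError("HKDF-SHA256 can output at most 8160 bytes")
    prk = hmac.new(salt or bytes(32), ikm, hashlib.sha256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand: T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def session_key(master_key: bytes, backup_id: bytes, worker: int) -> bytes:
    """Hypothetical scheme: one fresh 32-byte key per (backup, worker thread)."""
    info = b"borg-session|" + backup_id + b"|" + worker.to_bytes(4, "big")
    return hkdf_sha256(master_key, salt=b"", info=info)
```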

RAM consumption (1)

  • high RAM usage to achieve high speed (N=16, i.e. 64 KiB target chunk size)

  • repo index (id -> storage segment, offset)

  • chunks cache (id -> refcnt, size, csize)

  • files cache (H(path) -> mtime, size, inode, chunks)
     

  • approx. RAM usage in bytes:

      chunk_count  ~= total_file_size / 2^N
      repo_index   = chunk_count * 40
      chunks_cache = chunk_count * 44
      files_cache  = total_file_count * 240 + chunk_count * 80
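The RAM formulas as a small calculator; the per-entry byte counts are the slide's, the function name `estimate_ram_bytes` is hypothetical, and chunk_count uses the target chunk size, so the result is an estimate.

```python
def estimate_ram_bytes(total_file_count: int, total_file_size: int, n: int = 16) -> dict:
    """RAM estimate in bytes using the per-entry sizes from the slide."""
    chunk_count = total_file_size // 2 ** n
    repo_index = chunk_count * 40
    chunks_cache = chunk_count * 44
    files_cache = total_file_count * 240 + chunk_count * 80
    return {
        "repo_index": repo_index,
        "chunks_cache": chunks_cache,
        "files_cache": files_cache,
        "total": repo_index + chunks_cache + files_cache,
    }


GiB = 2 ** 30
for n in (16, 20):
    est = estimate_ram_bytes(total_file_count=2 ** 20, total_file_size=2 ** 40, n=n)
    print(f"N={n}: {est['total'] / GiB:.2f} GiB")
```

For 1 Mi files and 1 TiB of data this prints `N=16: 2.80 GiB` and `N=20: 0.39 GiB`. The N=20 figure is about 1/7 rather than 1/16 of the N=16 one, because the per-file part of the files cache (total_file_count * 240) does not shrink with larger chunks.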

RAM consumption (2)

  • 1 Mi files, 1 TiB data -> 2.8 GiB RAM
  • little RAM + large storage: use custom chunker params,
    e.g. N=20 (1 MiB chunks) -> up to 16x less RAM

  • maybe switch off the files cache

  • use multiple repos, purge often

  • use smaller ids (128 instead of 256 bits)

  • help fixing this:

    • use a different data structure than a hash table?

    • mmap'ed file?

    • provide on-disk fallback code?

Borg - Demo

I'll show a developer installation / recent code.

 

In the future:

Release packages on PyPI

Linux / BSD / ... packages

 

Installation Preps


# Debian / Ubuntu

# Python 3.x (>= 3.2) + Headers, Py Package Installer
apt-get install python3.4-dev python3.4 python3-pip

# we need OpenSSL + Headers for Crypto
apt-get install libssl-dev openssl

# ACL support Headers + Library
apt-get install libacl1-dev libacl1

# if you do not have gcc / make / etc. yet
apt-get install build-essential

# optional: lowlevel FUSE py binding - to mount backup archives
apt-get install python3-llfuse fuse

# optional: for unit testing
apt-get install fakeroot

system-wide install


# A) later: system-wide install with pip, latest release:

sudo pip install borgbackup

# note: on some systems you need pip3 to get the Python 3 pip

dev install from git

# B) isolated install, latest borg git repo code:

git clone https://github.com/borgbackup/borg.git

apt-get install python-virtualenv
virtualenv --python=python3 borg-env
source borg-env/bin/activate   # always before using!

# install borg + dependencies into virtualenv
pip install cython  # compile .pyx -> .c
pip install tox   # optional, for running unit tests
cd borg
pip install -e .

# check your install by running the unit tests:
fakeroot -u tox

init / create

# initialize a repository:

borg init /tmp/borg


# create a "first" archive inside this repo (verbose): 

borg create --progress --stats /tmp/borg::first ~/Desktop


# create a "second" archive (less verbose):

borg create /tmp/borg::second ~/Desktop


# even more verbose:

borg create -v --stats /tmp/borg::third ~/Desktop

list / extract / check

# list repo / archive contents:

borg list /tmp/borg
borg list /tmp/borg::first

# extract ("restore") from an archive to cwd:

mkdir test ; cd test
borg extract /tmp/borg::third

# simulate extraction (good test):

borg extract -v --dry-run /tmp/borg::third

# check consistency of repo:

borg check /tmp/borg

info / delete / help

# info about archive:

borg info /tmp/borg::first

# delete archive:

borg delete /tmp/borg::first

# delete repo:

borg delete /tmp/borg

crypto/compression

# options, options, options, ...

borg init --help

# create an encrypted repo:

borg init -e keyfile /tmp/borg-enc

# (*) later: compression options

borg init ...

# ... (same as before, but you need to enter the passphrase)

remote via ssh

# connect to remote borg via ssh:
# the remote borg version needs to be compatible with the local one

borg init ssh://user@host:22/mnt/backup/borg

borg create ssh://user@host:22/mnt/backup/borg::first ~


# also possible: using sshfs or other locally mounted
# network filesystems, but be careful: locking, performance


BorgBackup (EP 2015)

By Thomas Waldmann
