Borg Backup

(a fork of Attic)

 

"The holy grail of backup software"

 

 

Thomas Waldmann @ shackday 2015

Feature Set (1)

  • simple & fast
  • deduplication
  • compression
  • authenticated encryption

  • easy pruning of old backups

  • simple backend (k/v, fs, via ssh)

Feature Set (2)

  • FOSS (BSD license)

  • good docs

  • good platform / arch support

  • xattr / acl support

  • FUSE support ("mount a backup")

Code

  • 91% Python3 + Cython
    (high-level code, glue code)
  • 9% C
    (performance critical stuff)
  • only ~6000 LOC total
  • few dependencies
  • unit tests, CI

Security

  • Signatures / Authentication
    no undetected corruption/tampering
     
  • Encryption / Confidentiality
    only you have access to your data
     
  • FOSS in Python
    review possible, no buffer overflows

Safety

  • Robustness
    (by append-only design, transactions)
     
  • Checkpoints
    every 5 minutes (between files)
     
  • msgpack with "limited" Unpacker
    (no memory DoS)

Crypto Keys

  • client-side meta+data encryption
     
  • separate keys for separate concerns
     
  • passphrase: PBKDF2, 100k rounds
     
  • Keys:
    • none
    • repokey (replaces: passphrase-only)
    • passphrase protected keyfile
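
A minimal sketch of the passphrase stretching step, using only the Python standard library. The round count comes from the slide; the choice of HMAC-SHA256, the 32-byte salt and the helper name `derive_key` are assumptions made for illustration, not a statement of Borg's exact key file format.

```python
import hashlib
import os

ROUNDS = 100_000  # "100k rounds" from the slide


def derive_key(passphrase: str, salt: bytes, rounds: int = ROUNDS) -> bytes:
    """Stretch a passphrase into a 32-byte key with PBKDF2 (assumed: HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, rounds, dklen=32
    )


salt = os.urandom(32)  # assumed: a random salt stored next to the protected key
kek = derive_key("correct horse battery staple", salt)
print(len(kek), "byte key derived")
```

The derived key would then protect the real (random) repository keys, so changing the passphrase does not require re-encrypting the data.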

Crypto Cipher/MAC

  • AEAD, Encrypt-then-MAC
    • AES256-GCM / GHASH (exp.)
    • AES256-CTR + HMAC-SHAxxx
    • Counter / IV deterministic, never repeats
       
  • uses OpenSSL

     
  • Intel/AMD: AES-NI, PCLMULQDQ

Compression

  • none
    • no compression, 1:1 pass through, no cpu usage
  • lz4
    • low compression, super fast
    • sometimes faster than w/o compression
  • zlib
    • medium compression, medium fast, level 0..9
  • lzma
    • high compression, slow, level 0..9
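
A rough way to get a feel for the trade-off, sketched with the Python standard library only. lz4 is left out because it needs a third-party package; the sample data and the helper name `compare` are made up for illustration.

```python
import lzma
import zlib


def compare(data: bytes, level: int = 6) -> dict:
    """Return the compressed size in bytes for each stdlib codec."""
    return {
        "none": len(data),                                # 1:1 pass through
        "zlib": len(zlib.compress(data, level)),          # level 0..9
        "lzma": len(lzma.compress(data, preset=level)),   # preset 0..9
    }


sample = b"the quick brown fox jumps over the lazy dog\n" * 2000
sizes = compare(sample)
for name, size in sizes.items():
    print(f"{name:5s} {size:8d} bytes  ratio {len(sample) / size:6.1f}")
```

Real backup data compresses far less well than this repetitive sample, so measure with your own files before picking a level.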

Deduplication (1)

  • No problem with:
    • VM images (sparse file support)
    • (physical) disk images
    • renamed huge directories/trees
    • inner deduplication of data set
    • historical deduplication
    • deduplication between different machines

 

Deduplication (2)

  • Content defined chunking:
    • "buzhash" rolling hash
    • cut data when hash has specific bit pattern,
      yields chunks with 2^n bytes target size
    • n + other chunker params configurable now
    • seeded, to avoid fingerprinting chunk lengths
       
  • Store chunks under id into store:
    • id = HASH(chunk)
    • id = MAC(mac_key, chunk)
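
The idea can be sketched in a few lines of Python. This is a simplified buzhash-style chunker, not Borg's C implementation: the window size, the minimum chunk size, the small n and the way the seed builds the table are all assumptions chosen to keep the demo short.

```python
import hashlib
import hmac
import random

WINDOW = 48       # rolling window size in bytes (illustrative)
MASK_BITS = 12    # n: cut when the low n hash bits are zero, ~2^12 byte chunks
MIN_SIZE = 512    # illustrative lower bound so chunks do not get tiny


def make_table(seed):
    """Seeded table of 256 random 32-bit values, one per byte value."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(256)]


def rotl32(x, r):
    r %= 32
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF


def chunkify(data, table, mask_bits=MASK_BITS):
    """Yield content-defined chunks of data."""
    mask = (1 << mask_bits) - 1
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = rotl32(h, 1) ^ table[byte]               # shift the new byte in
        if i - start >= WINDOW:                      # shift the oldest byte out
            h ^= rotl32(table[data[i - WINDOW]], WINDOW)
        if i - start + 1 >= MIN_SIZE and (h & mask) == 0:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]


def store_chunks(data, table, mac_key, store):
    """Store each chunk under id = MAC(mac_key, chunk); return the id list."""
    ids = []
    for chunk in chunkify(data, table):
        chunk_id = hmac.new(mac_key, chunk, hashlib.sha256).digest()
        store.setdefault(chunk_id, chunk)            # duplicates are stored once
        ids.append(chunk_id)
    return ids


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(256 * 1024))
modified = b"0123456789" + original                  # insert 10 bytes at the front

table, mac_key, store = make_table(seed=1234), b"k" * 32, {}
ids_a = store_chunks(original, table, mac_key, store)
ids_b = store_chunks(modified, table, mac_key, store)
shared = len(set(ids_a) & set(ids_b))
print(f"{shared} of {len(ids_a)} chunks are shared after the insertion")
```

Because cut points depend only on the last few bytes of content, an insertion shifts the data but the chunker resynchronises, which is why most chunk ids survive. A fixed-size chunker would produce all-new chunks here.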

Borg, the present

  • Borg Backup is a fork of Attic:

    • currently tracking attic dev.

    • plus a lot of conservative PRs
      (stuff from attic/merge branch "merge")

    • bug and scalability fixes

    • plus a lot of new stuff in "experimental" branch ("exp.")

    • not compatible with Attic

Borg, what's different?

  • developed by "The Borg Collective"

  • more open development

  • new developers are welcome!

  • quicker development

  • redesign where needed

  • changes, new features

  • incompatible changes with good reason

  • thus: less "sta(b)le"

Borg, the future

  • scalability improvements

  • speed improvements

  • architectural changes

  • pull backups? backup-only mode?

  • better logging / exception handling

  • more backends? http / ftp / aws / google / ...

  • other platforms / architectures

  • BorgWeb GUI (for daily user needs)

  • <you name it>

Borg - you can be assimilated!

  • test scalability / reliability / security

  • be careful!

  • file bugs

  • file feature requests

  • improve docs

  • contribute code

  • spread the word

  • create dist packages

  • care for misc. platforms

Borg Backup - Links

borgbackup.github.io
 

#borgbackup on chat.freenode.net

Questions / Feedback?

  • Just grab me at the conference!

  • Thomas J Waldmann @ twitter

Borg Demo ->

Borg Internals & Ideas  v

Multithreading

  • GIL? No (big) problem, just release the GIL:

    • I/O: python file read, write/fsync

    • C: reader / chunker

    • C: id hashing

    • C: compression

    • C: encryption

  • CPU usage (i5, 2 Cores + HT)

    • no MT: 30-80%

    • with MT: 300%

  • but: thread safety, race conditions!
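
A small sketch of the principle, using only the standard library: CPython's `hashlib` releases the GIL while it hashes larger buffers, so id hashing of several chunks can overlap on multiple cores. The chunk size, worker count and the helper name `chunk_id` are arbitrary choices for the demo, not Borg's threading design.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def chunk_id(chunk: bytes) -> bytes:
    # The C hashing code runs without holding the GIL for large inputs,
    # so several of these calls can make progress at the same time.
    return hashlib.sha256(chunk).digest()


chunks = [os.urandom(1 << 20) for _ in range(32)]    # 32 chunks of 1 MiB

sequential = [chunk_id(c) for c in chunks]
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(chunk_id, chunks))      # map keeps input order

print("same ids:", parallel == sequential)
```

The workers here share no mutable state, which sidesteps the thread safety issues; a real pipeline with shared caches and indexes has to handle them explicitly.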

Hashes / MACs

  • slow:

    • sha256 (and hmac-sha256)

    • crc32

  • faster:

    • poly1305-AES

    • siphash (only 64bit result)

    • blake2

    • xxhash (not cryptographic)

    • sha512-256

    • crc32c (intel cpu instr.)
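
To check such claims on your own machine, a quick throughput comparison can be sketched with the standard library. Only algorithms that ship with Python are covered (no poly1305, siphash, xxhash or crc32c), the helper name `throughput` is made up, and the ranking depends heavily on CPU and library versions, so treat the numbers as local measurements only.

```python
import hashlib
import time
import zlib


def throughput(func, data: bytes, repeat: int = 5) -> float:
    """Return MiB/s for calling func(data) repeat times."""
    start = time.perf_counter()
    for _ in range(repeat):
        func(data)
    elapsed = time.perf_counter() - start
    return repeat * len(data) / (1 << 20) / elapsed


data = bytes(8 << 20)    # 8 MiB of zeros
candidates = {
    "sha256": lambda d: hashlib.sha256(d).digest(),
    "sha512": lambda d: hashlib.sha512(d).digest(),
    "blake2b": lambda d: hashlib.blake2b(d, digest_size=32).digest(),
    "crc32": lambda d: zlib.crc32(d),
}
results = {name: throughput(f, data) for name, f in candidates.items()}
for name, rate in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} {rate:8.0f} MiB/s")
```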

Crypto

  • authenticated encryption with associated data

  • slow:

    • aes-ctr + hmac-sha256 (= 2 passes)

    • openssl + py stdlib

  • faster (TODO / exp. branch):

    • aes-gcm (1 pass, intel + amd cpu instr.)

    • openssl

    • but: rare aes-gcm issue with weak keys

  • Nonce / IV / Counter generation / management

  • session keys (per worker thread per backup)

  • use libsodium? no aes256-ctr support there.

RAM consumption (1)

  • high RAM usage to achieve high speed
    (chunker bitmask default N=16, 64 KiB)

  • repo index (id -> storage segment, offset)

  • chunks cache (id -> refcnt, size, csize)

  • files cache (H(path) -> mtime, size, inode, chunks)
     

  • chunk_count ~= total_file_size / 2^N
    
    repo_index = chunk_count * 40
    
    chunks_cache = chunk_count * 44
    
    files_cache = total_file_count * 240 +
                  chunk_count * 80
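
The formulas above are easy to turn into a small estimator. This is a direct transcription of the slide's per-entry byte counts; the function name `estimate_ram` is made up, and the result is only as accurate as those constants.

```python
GiB = 2 ** 30


def estimate_ram(total_file_size: int, total_file_count: int, n: int = 16) -> dict:
    """Estimate index/cache sizes in bytes for chunker bitmask n."""
    chunk_count = total_file_size // 2 ** n
    sizes = {
        "repo_index": chunk_count * 40,
        "chunks_cache": chunk_count * 44,
        "files_cache": total_file_count * 240 + chunk_count * 80,
    }
    sizes["total"] = sum(sizes.values())
    return sizes


# 1 Mi files, 1 TiB of data, default N=16
est16 = estimate_ram(2 ** 40, 2 ** 20)
print(round(est16["total"] / GiB, 1), "GiB")    # 2.8 GiB

# same data set with bigger chunks, N=20
est20 = estimate_ram(2 ** 40, 2 ** 20, n=20)
print(round(est20["total"] / GiB, 1), "GiB")    # 0.4 GiB
```

Note that only the chunk-dependent terms shrink by 2^4 = 16 when going from N=16 to N=20; the per-file part of the files cache stays the same, so the total drops by less than that.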

RAM consumption (2)

  • 1 Mi files, 1 TiB data -> 2.8 GiB RAM
  • little RAM + large storage? use custom chunker params: N=20 cuts RAM consumption to as little as 1/16

  • maybe switch off the files cache

  • use multiple repos, purge often

  • use smaller ids (128 instead of 256 bits)

  • help fixing this:

    • use different data structure than hash table?

    • mmapped file?

    • provide on-disk fallback code?

1.0 release - soon!

  • drop python 3.2 / 3.3 legacy support
    • only support modern 3.4 and 3.5
    • less, easier, simpler, more powerful code
    • fewer bugs (3.2 and 3.3 were not that great)
  • drop "deprecated" borgbackup/attic stuff
  • make some minor, but incompatible changes
    • other command line syntax
    • other environment variable semantics
    • other defaults (e.g. bigger chunk size)
  • likely no major incompatible changes this time
  • 1.x better reflects state of the project than 0.x
  • can still work on older systems:
    • "all inclusive" binaries
    • python 3.5: "make altinstall" / backport

beyond 1.0

  • better crypto:
    • faster (e.g. aes-gcm, faster ID hash)
    • more flexible (not hardcoded)
    • safer (random session keys?)
    • libsodium?
  • multithreading?
  • UI work? click? console + ncurses?
  • storage api? storage backends?
  • ...

Borg - Demo

I'll show borgbackup using the

"all inclusive" binary.

 

You could also use:

 

Source code checkout from GitHub

Release packages from PyPI

Linux / BSD / ... packages

 

install

# download the binary and the gpg signature.

...

# verify the gpg signature

gpg --verify borgbackup-linux64.asc


# install / fix permissions

cp borg-linux64 /usr/local/bin/borg
chmod 755 /usr/local/bin/borg
chown root:root /usr/local/bin/borg

init / create

# initialize a repository:

borg init /tmp/borg


# create a "first" archive inside this repo (verbose): 

borg create --stats /tmp/borg::first ~/Desktop


# create a "second" archive (less verbose):

borg create /tmp/borg::second ~/Desktop


# even more verbose:

borg create -v --stats /tmp/borg::third ~/Desktop

list / extract / check

# list repo / archive contents:

borg list /tmp/borg
borg list /tmp/borg::first

# extract ("restore") from an archive to cwd:

mkdir test ; cd test
borg extract /tmp/borg::third

# simulate extraction (good test):

borg extract -v --dry-run /tmp/borg::third

# check consistency of repo:

borg check /tmp/borg

info / delete / help

# info about archive:

borg info /tmp/borg::first

# delete archive:

borg delete /tmp/borg::first

# delete repo:

borg delete /tmp/borg

encrypted repo

# options, options, options, ...

borg init --help

# create an encrypted repo
# (pw-protected key file stored locally):

borg init -e keyfile /tmp/borg-enc

# create an encrypted repo
# (pw-protected key file stored in repo):

borg init -e repokey /tmp/borg-enc


# ... (same as before, but you need to give passphrase)

remote via ssh

# connect to remote borg via ssh:
# remote borg needs to be compatible with local

borg init ssh://user@host:22/mnt/backup/borg

borg create ssh://user@host:22/mnt/backup/borg::first ~


# also possible: using sshfs or other locally mounted
# network filesystems, but be careful: worse performance

Links

borgbackup.github.io

 

#borgbackup on chat.freenode.net

 

Questions / Feedback?

  • Just grab me at the conference!

  • Thomas J Waldmann @ twitter

BorgBackup Talk (updated)

By Thomas Waldmann
