Borg Backup

(a fork of Attic)

 

"The holy grail of backup software"

 

 

Thomas Waldmann (12/2016)

Feature Set (1)

  • simple & fast
  • deduplication
  • compression
  • authenticated encryption

  • easy pruning of old backups

  • simple backend (k/v, fs, via ssh)

Feature Set (2)

  • FOSS (BSD license)

  • good docs

  • good platform / arch support

  • xattr / acl support

  • FUSE support ("mount a backup")

Code

  • 95% Python 3.4+, Cython
    (high-level code, glue code)
  • 5% C
    (performance critical stuff)
  • only ~12000 LOC total
  • few dependencies
  • unit tests, CI

Security

  • Signatures / Authentication
    no undetected corruption/tampering
     
  • Encryption / Confidentiality
    only you have access to your data
     
  • FOSS in Python
    review possible, no buffer overflows

Safety

  • Robustness
    (by append-only design, transactions)
     
  • Checkpoints
    every 5 minutes (between files)
     
  • msgpack with "limited" Unpacker
    (no memory DoS)

Crypto Keys

  • client-side meta+data encryption
     
  • separate keys for sep. concerns
     
  • passphrase strengthened via PBKDF2 (100k rounds)
     
  • Keys:
    • none
    • repokey (replaces: passphrase-only)
    • passphrase protected keyfile
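The passphrase-strengthening step can be sketched with Python's stdlib; the salt handling and 32-byte output length here are illustrative simplifications, not Borg's exact key-derivation layout:

```python
import hashlib, os

def derive_key(passphrase: bytes, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256 with 100k rounds, as on the slide; salt source
    # and derived-key length are simplified for illustration
    return hashlib.pbkdf2_hmac("sha256", passphrase, salt, 100_000, dklen=32)

# same passphrase + salt -> same key; the salt makes precomputed
# dictionary attacks against the passphrase useless
key = derive_key(b"correct horse battery staple", os.urandom(32))
```

The 100k iterations exist purely to make brute-forcing the passphrase expensive; the derived key then protects the real random data-encryption keys.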

Crypto Cipher/MAC

  • AEAD, Encrypt-then-MAC
    • AES256-CTR + HMAC-SHA256
    • Counter / IV deterministic, never repeats
    • we're working on adding AES256-GCM, maybe
      also others (AES-OCB? chacha20-poly1305?)

       
  • uses OpenSSL

     
  • Intel/AMD: AES-NI, PCLMULQDQ
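The Encrypt-then-MAC layering can be shown with stdlib HMAC alone; the AES256-CTR step is out of scope here, so the functions below only demonstrate the ordering (MAC over the ciphertext, constant-time verify before any decryption):

```python
import hmac, hashlib

def mac_then_store(mac_key: bytes, ciphertext: bytes) -> bytes:
    # Encrypt-then-MAC: the tag covers the ciphertext (IV omitted here)
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return tag + ciphertext

def verify_before_decrypt(mac_key: bytes, blob: bytes) -> bytes:
    tag, ciphertext = blob[:32], blob[32:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    # constant-time compare; refuse to touch the ciphertext on mismatch
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC check failed: corruption or tampering")
    return ciphertext  # only now would AES256-CTR decryption run
```

Verifying before decrypting is what makes corruption and tampering detectable without ever feeding attacker-controlled bytes to the cipher.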

Compression

  • none
    • no compression, 1:1 pass through, no cpu usage
  • lz4
    • low compression, super fast
    • sometimes faster than w/o compression
  • zlib
    • medium compression, medium fast, level 0..9
  • lzma
    • high compression, slow, level 0..9
    • beware of higher lzma levels: much slower, and they do not compress better, because the chunks are too small to benefit
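The level trade-off is easy to see with the stdlib codecs (lz4 is not in the stdlib, so it is omitted); the input size below roughly mimics one chunk:

```python
import zlib, lzma

data = b"some fairly repetitive example data " * 2000  # roughly one chunk

fast = zlib.compress(data, 1)    # low effort, quick
best = zlib.compress(data, 9)    # high effort, slower
xz = lzma.compress(data, preset=6)

# every level round-trips losslessly; higher effort mostly buys
# ratio, and on chunk-sized inputs the gains flatten out quickly
assert zlib.decompress(fast) == data and lzma.decompress(xz) == data
```

On small, chunk-sized buffers the dictionary never gets large, which is exactly why cranking up lzma buys time but little ratio.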

Deduplication (1)

  • No problem with:
    • VM images (sparse file support)
    • (physical) disk images
    • renamed huge directories/trees
    • inner deduplication of data set
    • historical deduplication
    • deduplication between different machines

 

Deduplication (2)

  • Content defined chunking:
    • "buzhash" rolling hash
    • cut data when hash has specific bit pattern,
      yields chunks with 2^n bytes target size
    • n + other chunker params configurable now
    • seeded, to avoid fingerprinting chunk lengths
       
  • Store chunks under id into store:
    • id = HASH(chunk)  [without encryption]
    • id = HMAC(mac_key, chunk)  [with encryption]
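The two ideas above can be sketched together; a toy polynomial rolling hash stands in for borg's seeded buzhash, and the mask width plays the role of `n`:

```python
import hmac, hashlib

def chunkify(data: bytes, mask_bits: int = 12, window: int = 16,
             base: int = 263, modulus: int = 1 << 32):
    """Content-defined chunking sketch: slide a rolling hash over the
    data and cut whenever its low `mask_bits` bits are all zero, giving
    chunks of ~2**mask_bits bytes on average. Toy hash, not buzhash."""
    mask = (1 << mask_bits) - 1
    shift_out = pow(base, window, modulus)  # weight of the departing byte
    h = 0
    start = 0
    for i, byte in enumerate(data):
        h = (h * base + byte) % modulus
        if i >= window:
            h = (h - data[i - window] * shift_out) % modulus
        if (h & mask) == 0 and i + 1 - start >= window:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]

def chunk_id(mac_key: bytes, chunk: bytes) -> bytes:
    # with encryption the chunk id is keyed (HMAC), so ids leak nothing
    # about chunk contents; without encryption a plain hash suffices
    return hmac.new(mac_key, chunk, hashlib.sha256).digest()
```

Because cut points depend only on the last `window` bytes, inserting data near the front of a file shifts chunk boundaries locally but leaves later chunks (and their ids) unchanged, which is what makes the dedup cases on the previous slide work.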

Fork from Attic (May 2015)

  • attic has a good codebase
  • attracted quite some devs
  • lots of pull requests and activity
     
  • but:
  • low / slow PR acceptance
  • 1 main developer with little time
  • he rather wanted it as his little pet project
  • preferred coding on his own over reviewing code
  • "compatibility forever"

Borg - different goals

  • developed by "The Borg Collective"

  • more open development

  • new developers are welcome!

  • quicker development

  • redesign where needed

  • changes, new features

  • incompatible changes with good reason

  • thus: less "sta(b)le"

Borg, the year after forking

  • attic repo:    ~600 changesets

  • borg repo: ~3300 changesets
  • developers, developers, developers!
  • active community:
    on github, irc channel, mailing list
  • bug and scalability fixes, #5
  • features!  testing.  platforms. docs.

Borg 1.0 (now)

  • packaged for many Linux distributions

  • also in *BSD and Mac OS X dists

  • more or less works on Windows & Cygwin

  • Slashdot's "borgbackup 1.0 released" coverage helped quite a bit.

  • Happy users on Twitter, Reddit and the Blogosphere.

Borg 1.1 (beta now)

  • new features:
    • diff
    • recreate
    • with-lock
    • comment
    • compression: by .ext & heuristic
    • blake2b id hash
  • better speed: FUSE, traversal, HDDs
  • some source reorg / cleanup

Borg 1.2 (future)

  • "serial multi-threading"
    • traverse, read, chunk
    • hash, dedup, compress, encrypt
    • store, sync
    • avoid CPU idle time
    • but: avoid races, crypto issues
       
  • crypto:
    • flexibility: add aes-gcm (speed)
    • improve key management
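The "serial multi-threading" idea (stages connected so CPU and I/O overlap) can be sketched with stdlib queues; this is an illustration of the pattern, not borg's actual pipeline, and a real design would bound every queue:

```python
import queue, threading, zlib

def stage(fn, inq, outq):
    # one pipeline stage: pull, transform, push; None is the shutdown sentinel
    while True:
        item = inq.get()
        if item is None:
            outq.put(None)
            return
        outq.put(fn(item))

def run_pipeline(chunks):
    # producer (this thread) -> compress stage (worker thread) -> collect;
    # the output queue is unbounded here to keep the sketch deadlock-free
    q_in, q_out = queue.Queue(maxsize=8), queue.Queue()
    worker = threading.Thread(target=stage,
                              args=(lambda c: zlib.compress(c, 1), q_in, q_out))
    worker.start()
    stored = []
    for c in chunks:
        q_in.put(c)       # blocks when the stage falls behind: backpressure
    q_in.put(None)
    while (item := q_out.get()) is not None:
        stored.append(item)
    worker.join()
    return stored
```

The point of the queues is exactly the bullet above: while one chunk is being compressed, the next can already be read, so the CPU does not idle waiting on disk. Getting the crypto (counter handling!) right across threads is the hard part.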

Borg - you can be assimilated!

  • test scalability / reliability / security

  • be careful!

  • find, file and fix bugs

  • file feature requests

  • improve docs

  • contribute code

  • spread the word

  • create dist packages

  • care for misc. platforms

Borg Backup - Links

borgbackup.readthedocs.io
 

#borgbackup on chat.freenode.net

Questions / Feedback?

  • Just grab me at the conference!

  • Thomas J Waldmann @ twitter

Borg Demo ->

Borg Internals & Ideas  v

Error Correction?

  • borg does error (and even tampering) detection

  • but not (yet?) error correction

  • kinds of errors / threat model:

    • single/few bit errors

    • defect / unreadable blocks

    • media failure (defect disk, ssd)

  • see issue #225 for discussion

  • implement something in borg?

  • rely on other soft- or hardware solutions?

  • avoid futile attempts

Modernize Crypto

  • sha256, hmac-sha256, crc32 are slow

  • aes is also slow, if not hw accelerated

  • faster: poly1305, blake2, sha512-256, crc32c, chacha20

  • we will support OpenSSL 1.1 for better crypto:

    • aes-ocb / aes-gcm

    • chacha20-poly1305

  • also use blake2b (borg 1.1)

  • see PR #1034 crypto-aead branch

Key Gen. / Management

  • currently:

    • 1 AES key

    • 1 HMAC key

    • 1 chunker seed

    • stored highest IV value for AES CTR mode

    • encrypted using key passphrase

  • ideas:
    • session keys? always start from IV=0.
    • per archive? per thread? per chunk?
    • asymm. crypto: encrypt these keys for receiver

RAM consumption

  • borg >= 1.0 now has lower RAM consumption

  • chunks, files and repo index kept in memory

  • uses bigger chunks (2MiB, was: 64kiB)

  • fewer chunks to manage -> smaller chunks index.

  • be careful on small machines (NAS, rpi, ...)

  • or with huge amounts of data / huge file counts

  • in the docs, there is a formula to estimate RAM usage
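The shape of such an estimate looks like this; the per-entry byte cost below is a hypothetical placeholder, not the real constant (check the borg docs for the actual formula):

```python
def estimate_index_ram(total_size_bytes: int,
                       avg_chunk_size: int = 2 * 1024 * 1024,
                       bytes_per_entry: int = 100) -> int:
    # hypothetical per-entry cost: the chunks index stores one entry per
    # unique chunk, so RAM scales with chunk count, not with data size alone
    n_chunks = total_size_bytes // avg_chunk_size
    return n_chunks * bytes_per_entry
```

This makes the bullet points above concrete: bigger chunks (2 MiB instead of 64 kiB) divide the chunk count, and thus the index RAM, by ~32.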

Hash Tables

  • hash table implementation in C

  • e.g. used for the chunks index

  • uses closed hashing

  • uses linear probing for collision handling

  • sometimes slow

  • use Robin Hood hashing? tried, but wasn't faster - huh!?
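A minimal Python sketch of the scheme (borg's real table is in C): closed hashing keeps every entry in one flat array, and linear probing just scans forward on collision. No resizing or deletion here, so it assumes the table never fills (deletion would need tombstones):

```python
class LinearProbeTable:
    """Closed hashing: all entries live in one flat slot array."""

    def __init__(self, capacity: int = 8):
        self.slots = [None] * capacity   # each slot: None or (key, value)

    def _probe(self, key):
        # start at the hashed slot, scan forward to the key or a free slot;
        # assumes load factor < 1, otherwise this loop never terminates
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)   # linear probing step
        return i

    def put(self, key, value):
        self.slots[self._probe(key)] = (key, value)

    def get(self, key):
        slot = self.slots[self._probe(key)]
        return slot[1] if slot else None
```

The flat array is why this is cache-friendly and memory-cheap compared to pointer-chasing designs; the long probe runs under high load are also why it is "sometimes slow".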

Chunk Index Sync

  • problem: multiple clients updating same repo

  • then: chunk index needs to get re-synced

  • slow, esp. if remote, many and/or big archives

  • local collection of single-archive chunk indexes

  • needs lots of space, merging still expensive

  • idea: "borgception"

    • backup chunks index into a secondary borg repo

    • fetch it from there when out of sync

  • other ideas?

Borg - Demo

I'll show borgbackup using the

"all inclusive" binary.

 

You could also use:

 

Source code checkout from github

Release packages from PyPI

Linux / BSD / ... packages

 

install

# download the binary and the gpg signature.

...

# verify the gpg signature

gpg --verify borg-linux64.asc


# install / fix permissions

cp borg-linux64 /usr/local/bin/borg
chmod 755 /usr/local/bin/borg
chown root:root /usr/local/bin/borg

init / create

# initialize a repository:

borg init /tmp/borg


# create a "first" archive inside this repo (verbose): 

borg create -v --stats /tmp/borg::first ~/Desktop


# create a "second" archive (less verbose):

borg create /tmp/borg::second ~/Desktop


# even more verbose:

borg create -v --list --stats /tmp/borg::third ~/Desktop
borg create -v --progress --stats /tmp/borg::fourth ~/Desktop

list / extract / check

# list repo / archive contents:

borg list /tmp/borg
borg list /tmp/borg::first

# extract ("restore") from an archive to cwd:

mkdir test ; cd test
borg extract /tmp/borg::third

# simulate extraction (good test):

borg extract -v --dry-run /tmp/borg::third

# check consistency of repo:

borg check -v /tmp/borg

info / delete / help

# info about archive:

borg info /tmp/borg::first

# delete archive:

borg delete /tmp/borg::first

# delete repo:

borg delete /tmp/borg

remote via ssh

# connect to remote borg via ssh:
# remote borg needs to be compatible with local

borg init ssh://user@host:22/mnt/backup/borg

borg create ssh://user@host:22/mnt/backup/borg::first ~


# also possible: using sshfs or other locally mounted
# network filesystems, but be careful: worse performance

Links

borgbackup.readthedocs.io

 

#borgbackup on chat.freenode.net

 

Questions / Feedback?

  • Just grab me at the ...!

  • Thomas J Waldmann @ twitter

BorgBackup Talk (updated 12/2016)

By Thomas Waldmann
