Borg Backup
(a fork of Attic)
"I found the Holy Grail of backups."
(Stavros K. about Attic-Backup, 8/2013)
Thomas Waldmann (PyCon DE, 2017-10-25)
ThomasWaldmann.__doc__
- Doing Python since 2001, Linux since it was on floppies, loves FOSS.
- Projects:
- MoinMoin Wiki
- nsupdate.info
- bepasty
- vpngw
- BorgBackup
- Contact: tw @ waldmann-edv . de
- Python Developer (freelance & remote)
It's a backup tool, one you might actually enjoy using.
- simple
- efficient
- secure
- safe
- free & open
simple
- each backup is a full backup
- FUSE = mount your backups
- easy pruning of old backups
- tooling: just borg, ssh, sh
- good help, manpages, docs
- single-file binary
- good fs / OS / arch support
efficient
- very fast for unchanged files
- chunk deduplication
- flexible compression
- sparse file support
- not flooding the fs cache
- sped up by a bit of C and Cython
- HW accelerated crypto
safe
- borg uses:
  - checksums
  - transactions
  - fs: syncing, atomic ops
  - backup repo = log-like KV store
  - checkpoints while backing up
  - off-site remote repositories
secure
- authenticated encryption
  - nothing to see in the repo
  - detect tampering / corruption
- ssh transport for remote repos
- append-only mode repos
- FOSS, you can see the code
Crypto
- client-side, metadata and data
- authenticated encryption (AE, EtM = encrypt-then-MAC)
  - aes256-ctr
  - hmac-sha256 or blake2b
  - counter management: a CTR counter value is never reused
- encrypted key material:
  - on client or in backup repo
  - passphrase protection: pbkdf2 + AES
- OpenSSL 1.0/1.1, only libcrypto
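A minimal sketch of the encrypt-then-MAC order of operations, using only Python's standard library. The stdlib has no AES, so the ciphertext is just an opaque byte string here (in borg it would be the aes256-ctr output). The 8-byte IV, the `tag | iv | ciphertext` layout and the names `seal` / `open_sealed` are assumptions made for illustration. They are not borg's storage format.

```python
import hashlib
import hmac
import os

IV_LEN = 8    # assumption: 64-bit counter/IV stored with each blob
MAC_LEN = 32  # hmac-sha256 tag size in bytes


def seal(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Encrypt-then-MAC: authenticate IV + ciphertext (not the plaintext)."""
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return tag + iv + ciphertext  # hypothetical layout: tag | iv | ciphertext


def open_sealed(mac_key: bytes, blob: bytes) -> tuple:
    """Check the MAC first; only authentic data reaches the decryptor."""
    tag = blob[:MAC_LEN]
    iv = blob[MAC_LEN:MAC_LEN + IV_LEN]
    ciphertext = blob[MAC_LEN + IV_LEN:]
    expected = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("MAC mismatch: data corrupted or tampered with")
    return iv, ciphertext  # next step (not shown): AES-CTR decryption


if __name__ == "__main__":
    mac_key = os.urandom(32)
    blob = seal(mac_key, (1).to_bytes(IV_LEN, "big"), b"<aes256-ctr output>")
    print(open_sealed(mac_key, blob))
    flipped = blob[:-1] + bytes([blob[-1] ^ 1])  # simulate a single bit error
    try:
        open_sealed(mac_key, flipped)
    except ValueError as exc:
        print("rejected:", exc)
```

The point of EtM is the order of operations. The MAC over IV and ciphertext is verified in constant time before any decryption happens. Corrupted or tampered data is therefore rejected without ever reaching the cipher.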
Compression
- chunk-based (not full file)
- algorithms: none, lz4, zlib, lzma
- with lz4, borg is often faster than with none (less data to encrypt, transfer and store)!
- "auto" mode:
  - first use lz4 as a predictor
  - if compressible: use the expensive compression
- "borg recreate" can recompress
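The "auto" idea can be sketched in a few lines. Python's stdlib has no lz4, so this sketch swaps in zlib level 1 as the cheap predictor and lzma as the expensive compressor. The 0.97 threshold and the name `auto_compress` are hypothetical and are not borg's actual parameters.

```python
import lzma
import os
import zlib

COMPRESSIBLE_RATIO = 0.97  # hypothetical: "compressible" = predictor saves over 3 percent


def auto_compress(chunk: bytes):
    """Return (method, payload) for one chunk.

    zlib level 1 stands in for lz4 as the cheap predictor (stdlib has no lz4).
    """
    predicted = zlib.compress(chunk, 1)  # fast trial run
    if len(predicted) < COMPRESSIBLE_RATIO * len(chunk):
        return "lzma", lzma.compress(chunk)  # worth paying for expensive compression
    return "none", chunk  # incompressible (e.g. already compressed media): store as-is


if __name__ == "__main__":
    for name, data in [("text", b"borg backup " * 10000), ("random", os.urandom(100000))]:
        method, payload = auto_compress(data)
        print(name, method, len(data), "->", len(payload))
```

The predictor run is cheap compared to lzma. As a result, incompressible chunks such as compressed media or encrypted files cost little extra time.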
Deduplication in SpaceTime
Deduplication "dimensions":
- inner deduplication of data set
  - copies of files, similar files
  - lots of zeros (sparse or not)
- historical deduplication in backup repo
  - many files don't change over time
  - they are all in each of your full backups
- deduplication between machines
  - just moved that big directory from m1 to m2?
  - same OS or data files everywhere?
  - will all dedup if machines share a backup repo.
Borg Deduplication
- Content defined Chunk Deduplication
  - cut a file into variable sized chunks; content defines where a cut happens (efficiently done using a rolling hash)
  - MAC(chunk) is the key for the KV store
- No problem with:
  - inserted / deleted / shifted file contents
  - renamed files / dirs
  - VM disk images: only a few chunks change
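Below is a toy sketch of content-defined chunking with a buzhash-style rolling hash. Chunks are stored under MAC(chunk) in a dict that stands in for the KV store. All constants (window, mask, min/max size) and the names `chunkify` / `store_file` are hypothetical demo values. Borg's real chunker is C code with much bigger chunks and its own parameters.

```python
import hashlib
import hmac
import random

# Toy parameters chosen for a readable demo, not borg's defaults.
WINDOW = 16            # bytes covered by the rolling hash
MASK = (1 << 6) - 1    # cut where the low 6 bits of the hash are 0
MIN_SIZE = 16          # never cut chunks smaller than this
MAX_SIZE = 1024        # always cut chunks reaching this size

_rng = random.Random(0)  # borg derives a chunker seed from its key; fixed here
TABLE = [_rng.getrandbits(32) for _ in range(256)]


def rotl32(x: int, n: int) -> int:
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def chunkify(data: bytes):
    """Yield content-defined chunks using a buzhash-style rolling hash."""
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = rotl32(h, 1) ^ TABLE[byte]
        if i >= WINDOW:  # remove the byte that just left the window
            h ^= rotl32(TABLE[data[i - WINDOW]], WINDOW)
        length = i + 1 - start
        at_boundary = i + 1 >= WINDOW and (h & MASK) == 0
        if (length >= MIN_SIZE and at_boundary) or length >= MAX_SIZE:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]


def store_file(data: bytes, id_key: bytes, store: dict) -> list:
    """Store each chunk under MAC(chunk); return the list of chunk IDs."""
    ids = []
    for chunk in chunkify(data):
        chunk_id = hmac.new(id_key, chunk, hashlib.sha256).digest()
        store.setdefault(chunk_id, chunk)  # a known chunk is stored only once
        ids.append(chunk_id)
    return ids


if __name__ == "__main__":
    rng = random.Random(1)
    original = bytes(rng.getrandbits(8) for _ in range(20000))
    edited = original[:5000] + b"INSERTED" + original[5000:]
    repo, key = {}, b"id-key"
    ids1 = store_file(original, key, repo)
    ids2 = store_file(edited, key, repo)
    print(len(ids1), "chunks;", len(set(ids2) - set(ids1)), "new after the insert")
```

Cut points depend only on the bytes inside the rolling window. Inserting 8 bytes in the middle of the file therefore changes only the chunks around the insertion. Soon afterwards, the chunker cuts at the same content again as in the original, and all later chunks deduplicate.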
$ borg info ssh://borg@myserver/repos/myrepo
                 Original size   Compressed size   Dedup size
All archives:         22.76 TB          18.22 TB    486.20 GB
                 Unique chunks      Total chunks
Chunk index:           6305006         272643223
Borg assimilated Data
Real stats from a real backup repository (shortened).
2 machines, 147 backup archives, 2.5 years.
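Here is some arithmetic on the numbers above. It assumes decimal units (1 TB = 10**12 bytes) and reads "Dedup size" as the compressed size of all unique chunks, i.e. what the repo actually stores.

```python
# Values from the borg info output above; decimal units assumed (1 TB = 10**12 B).
original_size = 22.76e12
compressed_size = 18.22e12
dedup_size = 486.20e9      # unique chunks, compressed, as stored in the repo
unique_chunks = 6305006
total_chunks = 272643223
archives = 147

compression_ratio = original_size / compressed_size  # compression alone
space_ratio = original_size / dedup_size              # compression + deduplication
refs_per_chunk = total_chunks / unique_chunks         # how often a chunk is reused
growth_per_archive = dedup_size / archives            # average repo growth per backup

print(f"compression: {compression_ratio:.2f}x")
print(f"compression + dedup: {space_ratio:.1f}x")
print(f"references per unique chunk: {refs_per_chunk:.1f}")
print(f"avg. repo growth per archive: {growth_per_archive / 1e9:.1f} GB")
```

With these inputs, the script prints about 1.25x for compression alone and 46.8x for compression plus deduplication. It also reports 43.2 references per unique chunk and an average repo growth of 3.3 GB per archive.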
Borg 1.2: the future
- multi-threading, actors, zeromq
  - fully use CPU and I/O
  - GIL is no big issue (I/O, C code)
- crypto enhancements:
  - AEAD API, faster AEAD ciphers
  - key and cipher agility
  - require OpenSSL 1.1?
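The "GIL is no big issue" point can be illustrated with plain threads. CPython's hashlib can release the GIL while hashing large buffers, so a thread pool can keep several cores busy. This sketch uses `concurrent.futures` threads as a stand-in. It is not the actor/zeromq design planned for borg 1.2, and `chunk_id` / `hash_chunks` are hypothetical names.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def chunk_id(chunk: bytes) -> bytes:
    # CPython's hashlib can release the GIL while hashing large buffers,
    # so several of these calls may run on different CPU cores at once.
    return hashlib.sha256(chunk).digest()


def hash_chunks(chunks, workers: int = 4) -> list:
    """Hash chunks in a thread pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(chunk_id, chunks))


if __name__ == "__main__":
    chunks = [os.urandom(2 * 1024 * 1024) for _ in range(8)]  # eight 2 MiB chunks
    print([cid.hex()[:12] for cid in hash_chunks(chunks)])
```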
Borg the Project
Borg Internals & Ideas
Error Correction?
- borg does error (and even tampering) detection
- but not (yet?) error correction
- kinds of errors / threat model:
  - single/few bit errors
  - defective / unreadable blocks
  - media failure (defective disk, ssd)
- see issue #225 for discussion:
  - implement something in borg?
  - rely on other soft- or hardware solutions?
  - avoid futile attempts, borg is application level
Modernize Crypto
- sha256, hmac-sha256 are slow
  - solved: borg 1.1 added blake2b
- zlib crc32 is slow
  - solved: borg 1.1 added fast crc32 C code
- AES-CTR + MAC 2-pass AE can be slow
  - todo: borg 1.2 will use OpenSSL 1.1 for:
    - AES-OCB (very fast, if hw accelerated)
    - chacha20-poly1305 (quite fast w/o hw accel.)
  - key / cipher agility
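For a rough comparison of the MAC variants, here are two ways of computing a 256-bit keyed chunk ID, plus a crc32 checksum. The sketch uses Python's `hashlib`, `hmac` and `zlib` (Python 3.6+ for `hashlib.blake2b`). The function names are hypothetical, and borg's exact blake2b construction may differ from the keyed mode used here. Timings depend on the CPU and the OpenSSL build.

```python
import hashlib
import hmac
import os
import timeit
import zlib


def id_hmac_sha256(id_key: bytes, data: bytes) -> bytes:
    # HMAC wraps SHA-256 (inner hash over the data, outer hash over the result)
    return hmac.new(id_key, data, hashlib.sha256).digest()


def id_blake2b_256(id_key: bytes, data: bytes) -> bytes:
    # BLAKE2b has a built-in keyed mode, so it works as a MAC on its own
    return hashlib.blake2b(data, digest_size=32, key=id_key).digest()


def crc32(data: bytes) -> int:
    # a checksum against accidental corruption only, not a MAC
    return zlib.crc32(data) & 0xFFFFFFFF


if __name__ == "__main__":
    key, data = os.urandom(32), os.urandom(2 * 1024 * 1024)
    for fn in (id_hmac_sha256, id_blake2b_256):
        seconds = timeit.timeit(lambda: fn(key, data), number=20)
        print(f"{fn.__name__}: {seconds:.3f} s")
    print(f"crc32: {timeit.timeit(lambda: crc32(data), number=20):.3f} s")
```

On CPUs without SHA hardware instructions, BLAKE2b in software is usually faster than SHA-256. Its keyed mode also makes the HMAC wrapping unnecessary.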
Key Gen. / Management
- currently:
  - 1 AES key
  - 1 MAC key
  - 1 chunker seed
  - stored highest IV value for AES CTR mode
  - encrypted using key passphrase
- ideas:
  - session keys? always start from IV=0.
  - per archive? per thread? per chunk?
  - asymm. crypto: encrypt these keys for receiver
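The stored highest IV value is what keeps AES-CTR counters from ever being reused. Below is a hypothetical sketch of such bookkeeping, in which a range of counter values is reserved and persisted before it is used. The class name, the JSON state file and the reservation size are all assumptions and do not reflect borg's implementation.

```python
import json
import os
import tempfile


class NonceManager:
    """Hand out AES-CTR counter values that are never reused (hypothetical sketch).

    Each AES block consumes one counter value, so a message of n blocks that
    starts at counter c uses c .. c+n-1 and the next message starts at c+n.
    """

    def __init__(self, state_path: str, reservation: int = 0x1000):
        self.state_path = state_path
        self.reservation = reservation      # how many values to reserve ahead
        self.next_iv = self._load()
        self.reserved_until = self.next_iv  # nothing reserved in this session yet

    def _load(self) -> int:
        try:
            with open(self.state_path) as f:
                return json.load(f)["highest_iv"]
        except FileNotFoundError:
            return 0

    def _store(self, value: int) -> None:
        tmp = self.state_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"highest_iv": value}, f)
            f.flush()
            os.fsync(f.fileno())             # make sure it is on disk
        os.replace(tmp, self.state_path)     # atomic rename

    def get_iv(self, blocks: int) -> int:
        """Return the first counter value for a message of `blocks` AES blocks."""
        if self.next_iv + blocks > self.reserved_until:
            # persist the new upper bound *before* using any value below it
            self.reserved_until = self.next_iv + blocks + self.reservation
            self._store(self.reserved_until)
        iv = self.next_iv
        self.next_iv += blocks
        return iv


if __name__ == "__main__":
    state = os.path.join(tempfile.mkdtemp(), "nonce.json")
    ivs = NonceManager(state, reservation=100)
    print(ivs.get_iv(3), ivs.get_iv(5))  # 0 3
    print(NonceManager(state).get_iv(1))  # after a "restart": 103
```

The design choice is to persist first and use later. After a crash or restart, some reserved counter values may be skipped, but none can be handed out twice.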
RAM consumption
- borg >= 1.0 has lower RAM consumption
- uses bigger chunks (2 MiB, was: 64 KiB)
- chunks, files and repo index are kept in memory
- fewer chunks to manage -> smaller chunks index
- be careful on small machines (NAS, raspi, ...)
- or with huge amounts of data / huge file counts
- in the docs, there is a formula to estimate RAM usage
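Here is a back-of-the-envelope count of chunks index entries for 1 TB of unique data, using the average chunk sizes mentioned above. It ignores per-entry overhead, the files cache and the repo index, all of which the formula in the docs accounts for. `chunk_count` is a hypothetical helper.

```python
def chunk_count(total_bytes: int, avg_chunk_size: int) -> int:
    """Rough number of chunks, assuming every chunk has the average size."""
    return -(-total_bytes // avg_chunk_size)  # ceiling division


KiB, MiB, TB = 2**10, 2**20, 10**12

old = chunk_count(1 * TB, 64 * KiB)  # pre-1.0 average chunk size
new = chunk_count(1 * TB, 2 * MiB)   # borg >= 1.0 average chunk size
print(f"{old:,} chunks -> {new:,} chunks ({old / new:.0f}x fewer index entries)")
```

Going from 64 KiB to 2 MiB chunks means about 32 times fewer entries: roughly 15.3 million vs. 0.48 million per TB.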
Hash Tables
- own hash table implementation in C
  - compact block of memory, no Python object overhead
  - e.g. used for the chunks index, repo index
- uses closed hashing (bucket array, no linked lists)
- uses linear probing for collision handling
- hash table performance is difficult to measure
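Here is a Python sketch of the two ideas named above: closed hashing in a single bucket array, plus linear probing. It only illustrates the technique, since borg's real hash table is C code with a compact binary layout. The class name, the 0.75 load limit and the omission of deletion are choices made for this sketch.

```python
import hashlib


class ChunkIndex:
    """Toy hash table: closed hashing (open addressing) with linear probing.

    Keys are 32-byte chunk IDs, values e.g. (refcount, size, csize) tuples.
    All buckets live in two preallocated lists; there are no per-bucket
    linked lists. Deletion (which would need tombstones) is left out.
    """

    MAX_LOAD = 0.75  # hypothetical: grow when more than 75 percent of buckets are used

    def __init__(self, capacity: int = 8):
        self._keys = [None] * capacity
        self._values = [None] * capacity
        self._used = 0

    def _slot(self, key: bytes) -> int:
        """Bucket holding `key`, or the first empty bucket on its probe path."""
        # chunk IDs are MAC outputs, i.e. already uniformly distributed,
        # so their first 4 bytes serve directly as the hash value
        i = int.from_bytes(key[:4], "little") % len(self._keys)
        while self._keys[i] is not None and self._keys[i] != key:
            i = (i + 1) % len(self._keys)  # linear probing: try the next bucket
        return i

    def __setitem__(self, key: bytes, value) -> None:
        if self._used + 1 > self.MAX_LOAD * len(self._keys):
            self._resize(2 * len(self._keys))
        i = self._slot(key)
        if self._keys[i] is None:
            self._used += 1
        self._keys[i], self._values[i] = key, value

    def __getitem__(self, key: bytes):
        i = self._slot(key)
        if self._keys[i] is None:
            raise KeyError(key)
        return self._values[i]

    def __contains__(self, key: bytes) -> bool:
        return self._keys[self._slot(key)] is not None

    def __len__(self) -> int:
        return self._used

    def _resize(self, capacity: int) -> None:
        entries = [(k, v) for k, v in zip(self._keys, self._values) if k is not None]
        self._keys = [None] * capacity
        self._values = [None] * capacity
        for k, v in entries:  # re-insert: bucket positions depend on capacity
            i = self._slot(k)
            self._keys[i], self._values[i] = k, v


if __name__ == "__main__":
    index = ChunkIndex()
    for n in range(1000):
        index[hashlib.sha256(b"chunk %d" % n).digest()] = (1, n, n)
    print(len(index), index[hashlib.sha256(b"chunk 42").digest()])
```

Linear probing keeps colliding entries in neighbouring buckets, which is cache-friendly. The price is that lookups get slower as the table fills, so a maximum load factor matters.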
Chunk Index Sync
- problem: multiple clients updating the same repo
  - then: the chunks index needs to get re-synced
  - slow, esp. if remote, many and/or big archives
- local collection of single-archive chunk indexes
  - needs lots of space, merging is still expensive
- idea: "borgception"
  - back up the chunks index into a secondary borg repo
  - fetch it from there when out of sync
- idea: "build chunks index from repo index" (in 1.1)
  - repo index knows all chunk IDs
  - but: no size/csize info in repo index
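Merging single-archive chunk indexes boils down to adding up reference counts, as in this sketch. The dict format `chunk_id -> (refcount, size, csize)` and the function name are assumptions. Every entry of every archive's index has to be visited, which is why merging stays expensive with many or big archives.

```python
def merge_chunk_indexes(indexes) -> dict:
    """Merge per-archive chunk indexes into one chunks index.

    Hypothetical format: chunk_id -> (refcount, size, csize). size and csize
    describe the chunk itself, so they must agree; refcounts add up.
    """
    merged = {}
    for index in indexes:  # every entry of every archive is visited
        for chunk_id, (refcount, size, csize) in index.items():
            if chunk_id not in merged:
                merged[chunk_id] = (refcount, size, csize)
                continue
            old_refcount, old_size, old_csize = merged[chunk_id]
            if (old_size, old_csize) != (size, csize):
                raise ValueError("inconsistent sizes for chunk %r" % (chunk_id,))
            merged[chunk_id] = (old_refcount + refcount, size, csize)
    return merged


if __name__ == "__main__":
    monday = {b"id-a": (2, 4096, 1200), b"id-b": (1, 2048, 2048)}
    tuesday = {b"id-a": (1, 4096, 1200), b"id-c": (3, 512, 100)}
    print(merge_chunk_indexes([monday, tuesday]))
```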
Python / Cython / C
- Python (90%):
  - easy, high level logic
- Cython (5%):
  - write pythonic code, get C-ish speed
  - access C data types, functions, easy "nogil"
  - simple interface code for C libs; we use that for OpenSSL, lz4 and own C code.
- C (5%):
  - used for the most resource-usage critical parts (CPU as well as RAM usage)
  - own C code, bundled C code
  - hard to maintain, debug
travis-ci.org
- hosted service
- free for FOSS
- automatically runs our tests:
  - for branches
  - for pull requests
  - on Linux and macOS
  - misc. Python versions
pytest & tox
- pytest:
  - pretty and simple tests, less boilerplate than stdlib "unittest"
  - powerful framework
  - have fun writing tests
  - optionally remote and parallel tests
- tox:
  - automates testing on all python versions
  - each in a freshly built virtual env
  - plus flake8 checker, for pep8 and more
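This is what "less boilerplate" looks like in practice. pytest collects plain `test_*` functions and uses the bare `assert` statement, so no TestCase classes or `assertEqual` calls are needed. The `chunk_id` function under test is hypothetical.

```python
# test_chunk_ids.py: plain functions and bare asserts are all pytest needs
import hashlib
import hmac


def chunk_id(key: bytes, data: bytes) -> bytes:
    """Hypothetical function under test."""
    return hmac.new(key, data, hashlib.sha256).digest()


def test_same_data_same_id():
    assert chunk_id(b"k", b"data") == chunk_id(b"k", b"data")


def test_key_changes_id():
    assert chunk_id(b"k1", b"data") != chunk_id(b"k2", b"data")


def test_id_length():
    assert len(chunk_id(b"k", b"")) == 32
```

Run it with `pytest test_chunk_ids.py`. Parallel and remote runs need a plugin such as pytest-xdist (`pytest -n 4`), and tox can run the same command in one virtual env per Python version.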
pyenv
- pull and build any python version you want
- easily switch between versions
- test on minimum requirement:
  - older point release == more bugs
  - 3.4.0, 3.5.0, 3.6.0 to find all the issues
- build / bundle on latest / greatest release:
  - newer point release == fewer bugs
  - 3.5.4 (or 3.6.2) to get the best build
vagrant, vbox, qemu
- automate VMs: create, start, provision, ..., shutdown, destroy
- e.g. run tests / builds on:
  - Linux (misc. dists, old / new, 32 / 64bit, ...)
  - BSD (FreeBSD, OpenBSD, NetBSD)
  - OS X
  - OpenIndiana
  - Windows (maybe)
  - PowerPC64 qemu VM with Debian, to test on a non-x86/x64 big-endian (BE) arch (most archs are little-endian, LE)
- fewer surprises ("oh, it does not work on X?")
pyinstaller
- creates a single-file binary, bundling together:
- your Python / Cython / C code
- (C)Python Interpreter of your choice
- all required Python stdlib libraries
- other required libraries
- but not the (g)libc
- We use it to build Linux, FreeBSD, OS X borg binaries.
- Intentionally build on "old" OS:
- all as-old or newer deployments will usually work
- preferably not too old: security updates wanted
Secure Releasing with GPG
- creepy: users execute downloaded blobs, as root.
- give them a chance to make sure it is authentic:
- release signing key fingerprint widely published
- public key uploaded to keyserver
- document how to use GPG to verify the signatures
- git repo: sign the release tags (or every commit)
- release files: sign them, detached sig
- note: just publishing hashes of files is no protection against attacks (only against accidental corruption)
setuptools_scm
- tired of bumping your version numbers?
- setuptools_scm makes versions from git tags:
- considers latest tag
- distance to that tag (commits)
- workdir state (uncommitted changes?)
- 1.2.3 (tagged release code)
- 1.2.4.dev3+gdeadbee (3 commits later)
- 1.2.4.dev3+gdeadbee.d20170709 (3 commits later + uncommitted changes)
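To make the version strings above concrete, here is a hypothetical re-implementation that produces exactly those three examples. It is not setuptools_scm's code. The real tool derives tag, distance, node and dirty state from git itself and supports other version schemes.

```python
from datetime import date
from typing import Optional


def guess_version(tag: str, distance: int, node: str, dirty_on: Optional[date] = None) -> str:
    """Mimic the version strings shown above (hypothetical re-implementation).

    tag      : latest git tag, e.g. "1.2.3"
    distance : number of commits since that tag
    node     : abbreviated commit hash with a "g" (git) prefix
    dirty_on : date of uncommitted changes in the work dir, if any
    """
    if distance == 0 and dirty_on is None:
        return tag  # exactly on a tagged release
    major, minor, patch = (int(p) for p in tag.split("."))
    version = f"{major}.{minor}.{patch + 1}.dev{distance}+{node}"  # guess the next release
    if dirty_on is not None:
        version += dirty_on.strftime(".d%Y%m%d")
    return version


if __name__ == "__main__":
    print(guess_version("1.2.3", 3, "gdeadbee", date(2017, 7, 9)))
```

In a real project you would not write this yourself. Passing `use_scm_version=True` to `setup()` (with `setup_requires=["setuptools_scm"]`) lets setuptools_scm compute the version at build time.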
Sphinx / Docs
- sphinx - generates html/pdf from reST
- reuse your argparse help:
  - build_usage: html cli usage docs
  - build_man: man pages
  - see archiver.py and setup.py
- reuse your github README:
  - include it in the docs intro
  - it's your "elevator speech"
- reuse your docs:
  - nice hosted docs can be your "home page"
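Here is the "reuse your argparse help" idea in miniature. `format_help()` already renders the full help text, which can be wrapped into a reST literal block for sphinx. The parser and the function names are hypothetical, and this is not borg's build_usage code.

```python
import argparse


def build_create_parser() -> argparse.ArgumentParser:
    # hypothetical mini CLI; borg's real parsers live in archiver.py
    parser = argparse.ArgumentParser(prog="borg-demo create",
                                     description="Create a backup archive.")
    parser.add_argument("--compression", default="lz4",
                        help="compression algorithm (default: %(default)s)")
    parser.add_argument("archive", help="name of the archive to create")
    return parser


def usage_rst(parser: argparse.ArgumentParser, title: str) -> str:
    """Render the parser's --help text as a reST section with a literal block."""
    lines = parser.format_help().splitlines()
    body = "\n".join("    " + line if line else "" for line in lines)
    return "%s\n%s\n\n::\n\n%s\n" % (title, "=" * len(title), body)


if __name__ == "__main__":
    print(usage_rst(build_create_parser(), "borg-demo create"))
```

This keeps a single source of truth: whenever an option or help string changes in the code, the next docs build picks it up.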
ReadTheDocs.org
- builds and hosts your docs:
  - url like borgbackup.readthedocs.io
  - automatically built from github tags
  - version selector, stable, latest
  - nice theme (also for mobile devices)
  - offers PDF (and other) downloads
  - uses sphinx
asciinema.org
- create cool demo "videos" of console tools
- embed them on:
  - home page
  - docs intro
  - github README
- selectable static preview screen
- adjustable playback speed
- copy & paste works from the "video"
- recording is json:
  - fix typos by editing it
  - commit it to your repo
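Fixing a typo in a recording can be scripted, too. This sketch assumes the asciicast v1 format, i.e. one JSON object whose "stdout" list holds `[delay, text]` frames; newer asciinema versions write a different, line-based format. `fix_typo` is a hypothetical helper, and it will not find a typo that is split across two frames.

```python
import json


def fix_typo(cast_json: str, wrong: str, right: str) -> str:
    """Replace a typo in every output frame of an asciicast v1 recording.

    Assumption: v1 format, i.e. one JSON object whose "stdout" list holds
    [delay, text] frames. Timing (the delays) is left unchanged.
    """
    cast = json.loads(cast_json)
    cast["stdout"] = [[delay, text.replace(wrong, right)] for delay, text in cast["stdout"]]
    return json.dumps(cast)


if __name__ == "__main__":
    demo = json.dumps({"version": 1, "width": 80, "height": 24, "duration": 1.5,
                       "stdout": [[0.5, "$ borg initt repo\r\n"], [1.0, "done\r\n"]]})
    print(fix_typo(demo, "initt", "init"))
```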
GitHub
- "Organisation" == a common ground
- Main Repo "borg" with good README
- Repo "community" with links to related stuff
- "Issues" (+ Labels)
- bugs / todo / planning
- ideas / feature requests / discussion
- questions -> docs enhancements
- bounties $$ via bountysource.com
- "Pull Requests" + Code Review
- "Milestones" for Release Planning
- "Releases" to publish changelog link, src, bin
Communication Channels
- Mailing List and Archive:
- borgbackup @ python.org
- slow, async, permanent
- IRC (and also matrix):
- #borgbackup @ chat.freenode.net
- quicker, sync/async, transient
- Twitter:
- @borgbackup on Twitter
- Usages:
- support
- discussion
- (release) announcements
Borg - you can be assimilated!
- test scalability / reliability / security
- find, file and fix bugs
- file and implement feature requests
- improve docs
- contribute or review code
- spread the word
- create dist packages
- care for misc. platforms (windows)
- donate funds via bountysource
For more information:
borgbackup.org
Questions / Feedback?
- Just grab me at the conference or at the sprints!
- Thomas J Waldmann @ twitter
BorgBackup Talk (updated 10/2017)
By Thomas Waldmann
borgbackup, the software and the project.