(a fork of Attic)
"I found the Holy Grail of backups."
(Stavros K. about Attic-Backup, 8/2013)
Thomas Waldmann (@home, 202-12)
It's a backup tool, one you might actually enjoy using.
ssh transport for remote repos
append-only mode repos
size obfuscation (1.2)
FOSS, you can see the code
$ borg info ssh://borg@myserver/repos/myrepo
Original size Compressed size Dedup size
All archives: 22.76 TB 18.22 TB 486.20 GB
Unique chunks Total chunks
Chunk index: 6305006 272643223
Real stats from a real backup repository (shortened).
2 machines, 147 backup archives, 2.5 years.
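The savings in the stats above can be worked out with a few lines of arithmetic. This is only a sketch: it assumes the sizes are in decimal units (1 TB = 1000 GB), and the variable names are made up for illustration.

```python
# Figures copied from the `borg info` output above.
# Assumption: decimal units, i.e. 1 TB = 1000 GB.
original_gb = 22.76 * 1000
compressed_gb = 18.22 * 1000
dedup_gb = 486.20

unique_chunks = 6305006
total_chunks = 272643223


def ratio(before: float, after: float) -> float:
    """Return how many times smaller `after` is than `before`."""
    return before / after


# Effect of compression alone (original vs. compressed size).
compression_ratio = ratio(original_gb, compressed_gb)
# Combined effect of compression and deduplication.
overall_ratio = ratio(original_gb, dedup_gb)
# Average number of references to each stored chunk.
refs_per_chunk = total_chunks / unique_chunks

print(f"compression alone:        {compression_ratio:.2f}x")
print(f"compression + dedup:      {overall_ratio:.2f}x")
print(f"references per chunk:     {refs_per_chunk:.2f}")
```

Most of the saving in this repository comes from deduplication across the 147 archives, not from compression.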
borg does error (and even tampering) detection
but not (yet?) error correction
kinds of errors / threat model:
single/few bit errors
defective / unreadable blocks
media failure (defective disk, SSD)
see issue #225 for discussion
implement something in borg?
rely on other soft- or hardware solutions?
avoid futile attempts, borg is application level
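Detection without correction can be illustrated with Python's standard library. This is a generic sketch, not borg's code: a checksum or hash stored next to the data reveals that a bit flipped, but it does not say which bit, so the data cannot be repaired from it. The helper `flip_bit` is hypothetical.

```python
import hashlib
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with exactly one bit inverted."""
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


chunk = b"some chunk of backup data" * 1000

# Values that would be stored alongside the chunk at backup time.
stored_crc = zlib.crc32(chunk)
stored_digest = hashlib.sha256(chunk).digest()

# Simulate a single bit error on the storage medium.
damaged = flip_bit(chunk, 12345)

# Both checks notice the damage, but neither can undo it.
crc_detects = zlib.crc32(damaged) != stored_crc
hash_detects = hashlib.sha256(damaged).digest() != stored_digest

print("crc32 detects error: ", crc_detects)
print("sha256 detects error:", hash_detects)
```

Correcting the error would need redundancy (for example parity data or a second copy), which is what the discussion above is about.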
sha256 and hmac-sha256 are slow
solved: borg 1.1 added blake2b
zlib crc32 is slow
solved: borg 1.1 added fast crc32 C code
AES-CTR + MAC 2-pass AE can be slow
todo: borg helium will use OpenSSL 1.1 for:
AES-OCB (very fast, if hw accelerated)
chacha20-poly1305 (quite fast w/o hw accel.)
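For the first item in the list above, both keyed hashes are available in Python's standard library, so the two options can be compared side by side. This is a sketch only: the key and data are placeholders, and it does not claim to show how borg itself builds its chunk IDs or MACs.

```python
import hashlib
import hmac
import os

# Placeholder key and data, for illustration only.
key = os.urandom(32)
data = b"chunk contents"


def mac_hmac_sha256(key: bytes, data: bytes) -> bytes:
    """Keyed hash built from SHA-256 via the HMAC construction."""
    return hmac.new(key, data, hashlib.sha256).digest()


def mac_blake2b(key: bytes, data: bytes) -> bytes:
    """Keyed hash using BLAKE2b's built-in key parameter, truncated to 256 bits."""
    return hashlib.blake2b(data, key=key, digest_size=32).digest()


tag_a = mac_hmac_sha256(key, data)
tag_b = mac_blake2b(key, data)

# Verification should use a constant-time comparison.
print(hmac.compare_digest(tag_a, mac_hmac_sha256(key, data)))
print(hmac.compare_digest(tag_b, mac_blake2b(key, data)))
```

Which one is faster depends on the CPU: SHA-256 benefits from hardware extensions where present, while BLAKE2b is designed to be fast in plain software. Measuring with `timeit` on the target machine is the only reliable answer.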
key / cipher agility (todo, borg helium)
currently:
1 AES key
1 MAC key
1 chunker seed
stored highest IV value for AES CTR mode
encrypted using key passphrase
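Why the highest IV value has to be remembered: in CTR mode, a counter value must never be reused with the same key. A minimal sketch of that bookkeeping follows. The class `CounterAllocator` is hypothetical and is not borg's implementation; it only shows the idea of handing out non-overlapping counter ranges.

```python
AES_BLOCK_SIZE = 16  # bytes per AES block


class CounterAllocator:
    """Hands out non-overlapping CTR counter ranges (hypothetical helper)."""

    def __init__(self, highest_used: int = 0):
        # First counter value that has not been used with this key yet.
        self.next_counter = highest_used

    def reserve(self, plaintext_len: int) -> int:
        """Reserve enough counter values for one message, return the start value."""
        blocks = -(-plaintext_len // AES_BLOCK_SIZE)  # ceiling division
        start = self.next_counter
        self.next_counter += blocks
        return start


allocator = CounterAllocator()
first = allocator.reserve(100)   # 100 bytes need 7 AES blocks
second = allocator.reserve(16)   # 16 bytes need 1 AES block

print(first, second, allocator.next_counter)
```

The value of `next_counter` is what would have to be persisted, so that a later session continues after it instead of starting again at 0.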
bigger chunks (e.g. 2 MiB, default) == lower RAM needs
smaller chunks (e.g. 64 KiB) == higher RAM needs
chunks, files and repo index kept in memory
fewer chunks to manage -> smaller chunks index.
be careful on small machines (NAS, raspi, ...)
or with huge amount of data / huge file count
in the docs, there is a formula to estimate RAM usage
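The shape of such an estimate can be sketched as follows. Loud assumption: the two per-entry byte costs below are illustrative placeholders, not authoritative numbers. Take the real constants from the documentation of the borg version in use. The function names are made up as well.

```python
# ASSUMED per-entry costs in bytes, for illustration only.
ASSUMED_BYTES_PER_CHUNK = 164   # repo index + chunks index + files cache share
ASSUMED_BYTES_PER_FILE = 240    # files cache entry


def estimate_chunk_count(total_file_size: int, chunk_size: int) -> int:
    """Approximate number of chunks for a given target chunk size."""
    return total_file_size // chunk_size


def estimate_ram_bytes(total_file_size: int, total_file_count: int,
                       chunk_size: int) -> int:
    """Rough RAM estimate: cost grows with chunk count and file count."""
    chunk_count = estimate_chunk_count(total_file_size, chunk_size)
    return (chunk_count * ASSUMED_BYTES_PER_CHUNK
            + total_file_count * ASSUMED_BYTES_PER_FILE)


TIB = 2 ** 40
MIB = 2 ** 20
KIB = 2 ** 10

# Example workload (made up): 1 TiB of data in 1 million files.
for label, chunk_size in (("2 MiB chunks", 2 * MIB), ("64 KiB chunks", 64 * KIB)):
    ram = estimate_ram_bytes(TIB, 1_000_000, chunk_size)
    print(f"{label}: ~{ram / MIB:.0f} MiB")
```

Whatever the exact constants are, the chunk count scales inversely with chunk size, so going from 2 MiB to 64 KiB chunks multiplies the chunk-related part of the estimate by 32.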
own hash table implementation in C
compact block of memory, no pyobj overhead
e.g. used for the chunks index, repo index
uses closed hashing (bucket array, no linked lists)
uses linear probing for collision handling
HT performance difficult to measure
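The collision handling described above can be sketched in Python. This is a teaching model, not borg's C code: Python lists hold object references, so it has none of the compact-memory benefit, and the class name, growth policy and load factor are assumptions. Deletion (which needs tombstones) is left out.

```python
class LinearProbingTable:
    """Toy hash table: one bucket array, collisions resolved by linear probing."""

    _EMPTY = object()  # marker for an unused bucket

    def __init__(self, capacity: int = 8):
        self._keys = [self._EMPTY] * capacity
        self._values = [None] * capacity
        self._used = 0

    def _probe(self, key) -> int:
        """Return the slot holding `key`, or the first empty slot on its probe path."""
        capacity = len(self._keys)
        index = hash(key) % capacity
        for _ in range(capacity):
            if self._keys[index] is self._EMPTY or self._keys[index] == key:
                return index
            index = (index + 1) % capacity  # step to the next bucket, wrapping
        raise RuntimeError("table is full")

    def _grow(self) -> None:
        """Double the bucket array and re-insert all entries."""
        entries = [(k, v) for k, v in zip(self._keys, self._values)
                   if k is not self._EMPTY]
        self.__init__(len(self._keys) * 2)
        for k, v in entries:
            self[k] = v

    def __setitem__(self, key, value) -> None:
        # Keep the load factor at or below 0.75 so probing always finds a free slot.
        if (self._used + 1) * 4 > len(self._keys) * 3:
            self._grow()
        index = self._probe(key)
        if self._keys[index] is self._EMPTY:
            self._used += 1
        self._keys[index] = key
        self._values[index] = value

    def __getitem__(self, key):
        index = self._probe(key)
        if self._keys[index] is self._EMPTY:
            raise KeyError(key)
        return self._values[index]


table = LinearProbingTable()
for i in range(100):
    table[f"chunk-{i}".encode()] = i
print(table[b"chunk-42"])
```

Because neighbouring buckets are adjacent in memory, a probe sequence in the real C implementation tends to stay within a few cache lines, which is one reason this layout is attractive for large indexes.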
problem: multiple clients updating same repo
then: chunk index needs to get re-synced
slow, esp. if remote, many and/or big archives
local collection of single-archive chunk indexes
needs lots of space, merging still expensive
repo index knows all chunk IDs
but: no size/csize info in repo index
XXX TODO do we have this in 1.1?
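The merge step behind the re-sync can be sketched like this. It is a simplified model, not borg's code: each single-archive index is assumed to map a chunk ID to a reference count plus the size/csize pair, and merging adds up the reference counts. `ChunkEntry` and `merge_indexes` are hypothetical names.

```python
from collections import namedtuple

# Assumed entry layout: reference count, plain size, compressed size.
ChunkEntry = namedtuple("ChunkEntry", "refcount size csize")


def merge_indexes(archive_indexes):
    """Merge single-archive chunk indexes into one overall chunks index."""
    merged = {}
    for index in archive_indexes:
        for chunk_id, entry in index.items():
            known = merged.get(chunk_id)
            if known is None:
                merged[chunk_id] = entry
            else:
                # Same chunk seen in another archive: only the refcount changes.
                merged[chunk_id] = known._replace(
                    refcount=known.refcount + entry.refcount)
    return merged


archive_1 = {b"id-a": ChunkEntry(1, 4096, 1200), b"id-b": ChunkEntry(2, 8192, 3000)}
archive_2 = {b"id-b": ChunkEntry(1, 8192, 3000), b"id-c": ChunkEntry(1, 2048, 900)}

chunks_index = merge_indexes([archive_1, archive_2])
print(chunks_index[b"id-b"].refcount)
```

Every entry of every archive has to be visited, which is why the merge stays expensive even when the per-archive indexes are cached locally. A repo index that only lists chunk IDs could not replace it, since the size and csize fields would be missing.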
1.2.3 (tagged release code)
1.2.4.dev3+gdeadbee (3 commits later)
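The two version strings above can be told apart mechanically. The regular expression below is an assumption fitted to exactly these two examples, not a complete version parser, and `parse_version` is a hypothetical helper.

```python
import re

# Pattern fitted to the two example strings above (assumption, not a full parser).
VERSION_RE = re.compile(
    r"^(?P<release>\d+(?:\.\d+)*)"
    r"(?:\.dev(?P<distance>\d+)\+g(?P<commit>[0-9a-f]+))?$"
)


def parse_version(version: str):
    """Return (release, commits_since_tag, commit_id); the last two are None for a tagged release."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    distance = match.group("distance")
    return (
        match.group("release"),
        int(distance) if distance is not None else None,
        match.group("commit"),
    )


print(parse_version("1.2.3"))
print(parse_version("1.2.4.dev3+gdeadbee"))
```

A bug report that includes the full string therefore identifies the exact commit, not just the nearest release.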
test scalability / reliability / security
find, file and fix bugs
file and implement feature requests
improve docs
contribute or review code
spread the word
create dist packages
care for misc. platforms (Windows)
donate funds via bountysource
tw @ waldmann-edv . de
Thomas J Waldmann @ twitter