Borg Backup
(2.0 alpha, updates & plans)
Thomas Waldmann (@home, 2022-07)
Borg Versions
- Borg 1.1:
- supported and very stable.
- final release 1.1.18, after that only critical fixes.
- Borg 1.2:
- first releases done: 1.2.0, 1.2.1
- Borg 2.0:
- borg2 branch, will eventually be merged into master
- 2.0.0a2 alpha release is out.
- breaking changes, bleeding edge for testers!
2.0 = breaking!
- break compatibility, issue #6602
- do all breaking changes in one release
- all is new: new repos, new server, new clients
- cli syntax cleanup
- get rid of all legacy
- simplify code base
- solve some fundamental issues (pending for 7 years)
- get rid of troublemakers, stuff that blocks progress
- no in-place repo upgrade:
- saves developer time
- less complex, fewer potential bugs, clean repos
- but instead: new borg archive transfer command
CLI: repos + archives
- no scp style repos any more: user@host:path
- no port possible
- parser sometimes confused this with local path
- ssh URL is better: ssh://user@host:port/path
- a port is possible here
- easy to parse / disambiguate
- archive name separate from repo, no "::" any more
- borg -r REPO diff ARCH1 ARCH2
- fixed number of archives: positional params
- borg -r REPO delete -a ARCH_GLOB --first 3
- some?: -a 'crap-*'
- want just one specific?: -a only_this_one
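Why the ssh:// form is easy to disambiguate can be sketched in a few lines of Python. This is only an illustration, not borg's actual location parser: the function name parse_repo_location and the returned dict layout are hypothetical, and anything that does not start with "ssh://" is simply treated as a local path.

```python
from urllib.parse import urlsplit


def parse_repo_location(location: str) -> dict:
    """Classify a repo location as an ssh URL or a local path (illustrative only)."""
    if location.startswith("ssh://"):
        parts = urlsplit(location)
        return {
            "proto": "ssh",
            "user": parts.username,
            "host": parts.hostname,
            "port": parts.port,  # None if no port was given
            "path": parts.path,
        }
    # No scheme prefix: never guess, always treat it as a local path.
    return {"proto": "file", "user": None, "host": None, "port": None, "path": location}


print(parse_repo_location("ssh://user@host:2222/srv/repo"))
print(parse_repo_location("user@host:path"))
```

With an explicit scheme there is nothing to guess: the old scp-style string in the second call is just a (strangely named) local path.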
repo vs. arch cmds
- commands work either on repo or on archive(s)
- borg rcreate = "repo create" (was: borg init)
- borg create = "archive create"
- borg rlist = "repo list"
- borg list = "archive list"
- borg delete = "archive delete"
- borg rdelete = "repo delete"
- exception: borg check syntax is unchanged
- works on repo and archives by default
- --repository-only
- --archives-only
borg repo concepts
A borg repo is a LOG.
(== stuff only gets appended at the end,
old stuff is never modified [only deleted])
A borg repo is a key/value store.
(key: chunk id = MAC(plaintext), value = ciphertext)
Low-level repo operations:
- PUT (append a new key/value pair)
- DELETE (register a delete for a previous put by key)
- COMMIT (finish a transaction, state is valid now)
Segment files
- contain a sequence of log entries created by repo ops
- a non-compact segment contains deleted PUT entries
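The log and key/value view can be modelled with a toy in-memory class. This is a sketch of the concept only: the class LogRepo and its method names are made up for illustration, and it ignores segment files, on-disk formats and encryption entirely.

```python
class LogRepo:
    """Toy append-only log offering PUT / DELETE / COMMIT (not borg's real code)."""

    def __init__(self):
        self.log = []       # entries are only ever appended
        self.committed = 0  # number of log entries covered by the last COMMIT

    def put(self, key, value):
        self.log.append(("PUT", key, value))

    def delete(self, key):
        self.log.append(("DELETE", key, None))

    def commit(self):
        self.log.append(("COMMIT", None, None))
        self.committed = len(self.log)

    def index(self):
        """Replay the committed part of the log into a key/value mapping."""
        state = {}
        for tag, key, value in self.log[: self.committed]:
            if tag == "PUT":
                state[key] = value
            elif tag == "DELETE":
                state.pop(key, None)
        return state

    def dead_puts(self):
        """Count committed PUT entries that a later DELETE made obsolete."""
        live, dead = set(), 0
        for tag, key, _ in self.log[: self.committed]:
            if tag == "PUT":
                live.add(key)
            elif tag == "DELETE" and key in live:
                live.remove(key)
                dead += 1
        return dead


repo = LogRepo()
repo.put(b"a", b"1")
repo.put(b"b", b"2")
repo.commit()
repo.delete(b"a")
repo.commit()
print(repo.index(), repo.dead_puts())
```

A log with dead_puts() > 0 corresponds to the "non-compact" case: the deleted PUT still occupies space until the log is rewritten.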
borg < 2.0: PUT
PUT log entry structure:
- crc32 = CRC32(header + content)
- header: size of entry
- header: tag (== PUT)
- header: 256bit key (chunk id)
- content: value (data)
Big design issue
One cannot check the header alone.
To check the crc32, one must read ALL: header+content
Slow if one is only interested in header values.
Correct size is important to seek to next entry.
borg2: PUT2
PUT2 log entry structure:
- crc32 = CRC32(header + digest)
- header: size of entry
- header: tag
- header: 256bit key (chunk id)
- digest: xxh64 = XXH64(header + content)
- content: value (data)
Notable
Can check the header without reading the content.
Better error detection by stronger and super fast xxh64.
crc32 covers header+digest, digest also covers header.
The (sometimes slow) CRC32 implementation is only used for a few bytes.
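A minimal sketch of the PUT2 idea, under stated assumptions: the field order, field widths, little-endian packing and the tag value TAG_PUT2 are guesses for illustration, and because XXH64 is not in the Python standard library, an 8-byte BLAKE2b digest is used as a stand-in for it. The point is only that the header can be verified from a small fixed-size prefix.

```python
import hashlib
import struct
import zlib

CRC = struct.Struct("<I")         # crc32 field
HEADER = struct.Struct("<IB32s")  # size of entry, tag, 256-bit key (assumed layout)
DIGEST_SIZE = 8                   # same width as an xxh64 digest
FIXED = CRC.size + HEADER.size + DIGEST_SIZE
TAG_PUT2 = 3                      # hypothetical tag value


def digest64(data: bytes) -> bytes:
    # Stand-in for XXH64: an 8-byte BLAKE2b digest from the standard library.
    return hashlib.blake2b(data, digest_size=DIGEST_SIZE).digest()


def pack_put2(key: bytes, value: bytes) -> bytes:
    if len(key) != 32:
        raise ValueError("key must be 32 bytes")
    header = HEADER.pack(FIXED + len(value), TAG_PUT2, key)
    digest = digest64(header + value)             # digest covers header + content
    crc = CRC.pack(zlib.crc32(header + digest))   # crc32 covers header + digest
    return crc + header + digest + value


def check_header(prefix: bytes):
    """Verify the crc32 from the first FIXED bytes only; return (size, tag, key)."""
    if len(prefix) < FIXED:
        raise ValueError("prefix too short")
    (crc,) = CRC.unpack_from(prefix, 0)
    header = prefix[CRC.size:CRC.size + HEADER.size]
    digest = prefix[CRC.size + HEADER.size:FIXED]
    if zlib.crc32(header + digest) != crc:
        raise ValueError("header crc32 mismatch")
    return HEADER.unpack(header)


def check_content(entry: bytes) -> bool:
    """Verify the content digest; this needs the whole entry."""
    header = entry[CRC.size:CRC.size + HEADER.size]
    digest = entry[CRC.size + HEADER.size:FIXED]
    return digest64(header + entry[FIXED:]) == digest
```

Here check_header() reads FIXED bytes no matter how large the value is, so the size field can be trusted for seeking to the next entry without touching the content.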
Borg < 2.0 crypto
old crypto issues
- potential nonce reuse and countermeasures:
- AES-CTR mode with 1 AES key per repo
- counter values (IV / nonce) must never be re-used
- complex counter management needed
- limited trust in repository
- local counter knowledge can be lost (e.g. disk defect)
- multiple clients need to trust repo
- to avoid counter issues, repos must use different keys
- no easy replication of encrypted chunks to other repo
- self-made layering of
- AES256-CTR + HMAC-SHA256
- AES256-CTR + BLAKE2b
- there are faster ready-to-use AEAD ciphers now
borg2: new crypto
new crypto features
- Fixes potential nonce reuse issue:
- random session id generated at start of a borg run
- session key derived from session id and master key
- counter (IV / Nonce) starts from 0 for each session
- no counter management needed, no risk of reuse
- OpenSSL >= 1.1.1 (including on OpenBSD), providing:
- super fast AES256-OCB (with AES hw acceleration); patents first licensed to FOSS, now abandoned.
- very fast CHACHA20-POLY1305 (pure sw implementation)
- use AAD of AEAD cipher to protect header / chunkid
- Argon2 KDF used for the borg key (was: pbkdf2)
- undecided: maybe adopt BLAKE3 for the chunk id hash
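The session key idea can be sketched as follows. This is not borg's actual key derivation: the HMAC-SHA256 construction, the b"session key" label, the 24-byte session id and the 12-byte big-endian nonce are all assumptions chosen for illustration, and the AEAD encryption itself is left out because it is not in the Python standard library.

```python
import hashlib
import hmac
import os

SESSION_ID_SIZE = 24  # assumed size, for illustration
NONCE_SIZE = 12       # assumed nonce width


def derive_session_key(master_key: bytes, session_id: bytes) -> bytes:
    # Illustrative KDF: a keyed hash over a fixed label and the session id.
    return hmac.new(master_key, b"session key" + session_id, hashlib.sha256).digest()


class Session:
    """One borg run: fresh random session id, derived key, counter starting at 0."""

    def __init__(self, master_key: bytes):
        self.session_id = os.urandom(SESSION_ID_SIZE)
        self.key = derive_session_key(master_key, self.session_id)
        self.counter = 0

    def next_nonce(self) -> bytes:
        nonce = self.counter.to_bytes(NONCE_SIZE, "big")
        self.counter += 1
        return nonce


master = os.urandom(32)
a, b = Session(master), Session(master)
print(a.key != b.key, a.next_nonce() == b.next_nonce())
```

Both sessions start their counter at 0, yet the (key, nonce) pairs differ because each session has its own key. The session id would have to be stored alongside the ciphertext so a reader can derive the same key again.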
hardlinks
- borg < 2.0 approach:
- first hardlink archived as regular file, with chunkid list
- second hardlink archived:
- refers back to first one by-name
- does not have own chunkid list
- problematic partial extraction, messy code, special cases
- borg 2.0 approach:
- hardlinks archived like a normal item
- regular files / HLs always have chunkid list (1st, 2nd, ...)
- if st_nlink > 1: item.hlid = H(st_dev, st_ino)
- rule: hlid is same -> items point to same inode (are HLs).
- symmetric: 1st HL archived the same way as 2nd, 3rd...
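The hlid rule can be written down in a few lines. This is a sketch: the slide only says H(st_dev, st_ino), so SHA-256 over two packed 64-bit integers is an assumed choice of H, and the function names are hypothetical.

```python
import hashlib
import os
import struct


def hlid(st_dev: int, st_ino: int) -> bytes:
    """H(st_dev, st_ino); SHA-256 over the packed pair is an assumption."""
    return hashlib.sha256(struct.pack("<QQ", st_dev, st_ino)).digest()


def item_hlid(st: os.stat_result):
    """Return a hardlink id if the inode has more than one link, else None."""
    if st.st_nlink > 1:
        return hlid(st.st_dev, st.st_ino)
    return None
```

Two paths whose os.stat() results yield the same non-None item_hlid() point to the same inode, regardless of which one was archived first.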
msgpack
- msgpack old spec - type confusion:
- did not differentiate between text and binary data
- text could be encoded, but comes back as binary
- if you get binary, it could have been binary or text
- msgpack new spec - roundtripping done right:
- text (str):
- comes back as str
- borg uses utf-8 with surrogate-escape handler
- gets encoded/decoded automatically
- binary (bytes):
- comes back as bytes
- gets stored "as is"
- borg2 uses the new spec, borg < 2.0 uses old spec.
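The text side of the roundtrip relies on Python's "surrogateescape" error handler, which can be shown without msgpack itself. The helper names to_wire and from_wire are made up for this sketch; only the str.encode / bytes.decode calls are standard Python.

```python
def to_wire(text: str) -> bytes:
    """Encode text as utf-8, turning escaped surrogates back into raw bytes."""
    return text.encode("utf-8", "surrogateescape")


def from_wire(data: bytes) -> str:
    """Decode utf-8, mapping undecodable bytes to lone surrogates."""
    return data.decode("utf-8", "surrogateescape")


raw_name = b"caf\xe9.txt"  # latin-1 encoded file name, not valid utf-8
name = from_wire(raw_name)
print(type(name).__name__, to_wire(name) == raw_name)
```

Even a file name that is not valid utf-8 becomes a str and encodes back to exactly the original bytes, so text can always be stored as text.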
new tar formats
These were initially intended for data migration.
--tar-format=PAX
ctime and atime support, all ts in ns resolution
could support more metadata, like xattrs, ACLs, ...,
but a lot of work to implement and test
--tar-format=BORG
Like PAX plus custom BORG.* PAX headers
for perfect round-tripping
of all borg supported fs item metadata.
Copy archive from repo1 to repo2:
borg export-tar ... repo1::A | borg import-tar ... repo2::A
Problem: no dedup, huge amount of data
borg2 transfer
Create a related new repo:
borg --repo NEWREPO rcreate --other-repo OLDREPO --encryption CIDH
Transfer archives:
borg --repo NEWREPO transfer --other-repo OLDREPO [--dry-run]
Efficiency
- deduplication: transfer each chunk only once
- no expensive re-compression / re-chunking
- but: re-encryption is required (but fast!)
- old chunks deduplicate with future chunks, requires:
related repo (key material), CIDH = compat. ID hash
- some (cheap) data conversions done on the fly:
cleanups, type conversions, msgpack changes
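The deduplicating part of a transfer can be modelled with plain dicts. This is a toy model, not borg's transfer code: repos are dicts mapping chunk id to stored bytes, archives map a name to a list of chunk ids, and the function transfer and the reencrypt callback are hypothetical.

```python
def transfer(archives, src_repo, dst_repo, reencrypt):
    """Copy every chunk referenced by the archives, each chunk id only once."""
    copied = 0
    for chunk_ids in archives.values():
        for chunk_id in chunk_ids:
            if chunk_id in dst_repo:
                continue  # already present: deduplicated, nothing to send
            # No re-chunking or re-compression, only re-encryption.
            dst_repo[chunk_id] = reencrypt(src_repo[chunk_id])
            copied += 1
    return copied


old = {b"c1": b"old:one", b"c2": b"old:two", b"c3": b"old:three"}
new = {}
archs = {"mon": [b"c1", b"c2"], "tue": [b"c1", b"c2", b"c3"]}
n = transfer(archs, old, new, lambda blob: b"new:" + blob[4:])
print(n, sorted(new))
```

The two archives reference five chunks in total, but only three distinct ids exist, so only three chunks are copied. Keeping the chunk ids unchanged is what lets future backups deduplicate against the transferred data.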
release N+1 plans
- in 2.0 we needed to keep some of the old stuff:
- borg transfer needs to read old repos / archives
- users need to transfer their archives to new repos
- in N+1 (2.1?) we can remove:
- AES-CTR mode, counter management code
- old style keys (pbkdf2, Encrypt and MAC)
- code for old repo index, old chunks index
- PUT(1) repo code
- zlib type bytes hack
- bigint stuff (replaced by msgpack Timestamp)
- msgpack-related "good that we know the type"
- hardlink_master processing
- old RPC protocols
- borg transfer / item code that converts old to new
Support the Borg
Contributions are welcome!
Code, documentation, review, testing, funding, ...
Just join us on GitHub and LiberaChat IRC #borgbackup.
Donations please via LiberaPay or BountySource:
https://www.borgbackup.org/support/fund.html
For more information:
borgbackup.org
Questions / Feedback?
- tw @ waldmann-edv . de
- Thomas J Waldmann @ twitter