Borg Backup

(2.0 alpha, updates & plans)

Thomas Waldmann (@home, 2022-07)

Borg Versions

Borg 1.1:
- supported and very stable.
- final release 1.1.18, after that only critical fixes.
Borg 1.2:
- first releases done: 1.2.0, 1.2.1
Borg 2.0:
- borg2 branch, will eventually get into master
- 2.0.0a2 alpha release is out.
- breaking changes, bleeding edge for testers!

2.0 = breaking!

break compatibility, issue #6602
do all breaking changes in one release
all is new: new repos, new server, new clients
cli syntax cleanup
get rid of all legacy
simplify code base
solve some fundamental issues (pending for 7 years)
get rid of troublemakers, stuff that blocks progress
no in-place repo upgrade:
- saves developer time
- less complex, fewer potential bugs, clean repos
but instead: new borg archive transfer command

CLI: repos + archives

no scp style repos any more: user@host:path
- no port possible
- parser sometimes confused this with local path
ssh URL is better: ssh://user@host:port/path
- a port is possible here
- easy to parse / disambiguate
archive name separate from repo, no "::" any more
- borg -r REPO diff ARCH1 ARCH2
  - fixed number of archives: positional params
- borg -r REPO delete -a ARCH_GLOB --first 3
  - some?: -a 'crap-*'
  - want just one specific?: -a only_this_one
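To illustrate why the ssh:// form is easy to parse, here is a minimal sketch using Python's standard URL parser. The helper name parse_repo_url is made up for this example and is not borg's own location parser.

```python
from urllib.parse import urlsplit

def parse_repo_url(url):
    """Split an ssh:// repo URL into its parts (illustrative helper only)."""
    parts = urlsplit(url)
    if parts.scheme != "ssh":
        raise ValueError(f"not an ssh URL: {url!r}")
    return {
        "user": parts.username,   # None if no user was given
        "host": parts.hostname,
        "port": parts.port,       # None if no port was given
        "path": parts.path,
    }

repo = parse_repo_url("ssh://user@host:2222/path/to/repo")
no_port = parse_repo_url("ssh://host/path/to/repo")
print(repo)
print(no_port)
```

The scheme prefix makes the string unambiguous: anything starting with ssh:// is remote, so it cannot be mistaken for a local path.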

repo vs. arch cmds

commands work either on repo or on archive(s)
- borg rcreate = "repo create" (was: borg init)
- borg create = "archive create"
- borg rlist = "repo list"
- borg list = "archive list"
- borg delete = "archive delete"
- borg rdelete = "repo delete"
exception: borg check syntax is unchanged
- works on repo and archives by default
- --repository-only
- --archives-only

borg repo concepts

A borg repo is a LOG.

(== stuff only gets appended at the end,
old stuff is never modified [only deleted])

A borg repo is a key/value store.

(key: chunk id = MAC(plaintext), value = ciphertext)

Low-level repo operations:

PUT (append a new key/value pair)
DELETE (register a delete for a previous put by key)
COMMIT (finish a transaction, state is valid now)

Segment files

contain a sequence of log entries created by repo ops
a non-compact segment contains deleted PUT entries
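The log and key/value views can be combined in a toy model. This is an in-memory sketch only: the class ToyRepo and its method names are invented for illustration and do not reflect borg's actual repository code or on-disk segments.

```python
class ToyRepo:
    """In-memory toy model of an append-only log used as a key/value store."""

    def __init__(self):
        self.log = []        # entries are only ever appended
        self.committed = 0   # number of log entries covered by the last COMMIT

    def put(self, key, value):
        self.log.append(("PUT", key, value))

    def delete(self, key):
        # The old PUT stays in the log; we only register its deletion.
        self.log.append(("DELETE", key, None))

    def commit(self):
        self.log.append(("COMMIT", None, None))
        self.committed = len(self.log)

    def index(self):
        """Replay the committed part of the log into a key/value mapping."""
        kv = {}
        for tag, key, value in self.log[: self.committed]:
            if tag == "PUT":
                kv[key] = value
            elif tag == "DELETE":
                kv.pop(key, None)
        return kv

repo = ToyRepo()
repo.put(b"id1", b"ciphertext-1")
repo.put(b"id2", b"ciphertext-2")
repo.commit()
repo.delete(b"id1")       # appended, but not committed yet
before = repo.index()     # still contains id1
repo.commit()
after = repo.index()      # id1 is gone, but its PUT is still in the log
```

After the second commit the log still holds the deleted PUT entry, which is what makes a segment "non-compact" in this model.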

borg < 2.0: PUT

PUT log entry structure:

crc32 = CRC32(header + content)
header: size of entry
header: tag (== PUT)
header: 256bit key (chunk id)
content: value (data)

Big design issue

One cannot check the header only.

To check the crc32, one must read ALL: header+content

Slow if one is only interested in header values.

Correct size is important to seek to next entry.
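A minimal sketch of such an entry, to make the problem concrete. The field order, integer widths and the tag value are assumptions chosen for this example, not a statement of borg's exact on-disk format.

```python
import struct
import zlib

TAG_PUT = 0                        # assumed tag value for this sketch
HEADER = struct.Struct("<IB32s")   # size of entry, tag, 256bit key

def pack_put(key, data):
    """Build crc32 + header + content."""
    size = 4 + HEADER.size + len(data)
    header = HEADER.pack(size, TAG_PUT, key)
    crc = zlib.crc32(header + data)   # crc32 covers header AND content
    return struct.pack("<I", crc) + header + data

def check_put(entry):
    """Verify the crc32; this needs every byte of the entry."""
    (crc,) = struct.unpack_from("<I", entry, 0)
    return zlib.crc32(entry[4:]) == crc

entry = pack_put(b"k" * 32, b"data" * 1000)
damaged = bytearray(entry)
damaged[-1] ^= 0xFF                   # flip bits in the last content byte
print(check_put(entry), check_put(bytes(damaged)))
```

Even if only the size or key is of interest, check_put has to read the full content to know whether those header values can be trusted.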

borg2: PUT2

PUT2 log entry structure:

crc32 = CRC32(header + digest)
header: size of entry
header: tag
header: 256bit key (chunk id)
digest: xxh64 = XXH64(header + content)
content: value (data)

Notable

Can check the header without reading the content.

Better error detection by stronger and super fast xxh64.

crc32 covers header+digest, digest also covers header.

The (sometimes slow) CRC32 implementation is only used for a few bytes.
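The PUT2 layout can be sketched the same way. Assumptions in this example: the field widths and tag value are illustrative, and because the Python standard library has no XXH64, an 8-byte BLAKE2b digest is used as a stand-in for it.

```python
import hashlib
import struct
import zlib

TAG_PUT2 = 3                       # assumed tag value for this sketch
HEADER = struct.Struct("<IB32s")   # size of entry, tag, 256bit key
DIGEST_SIZE = 8
PREFIX_SIZE = 4 + HEADER.size + DIGEST_SIZE   # crc32 + header + digest

def digest64(data):
    # Stand-in for XXH64 (not in the stdlib): 64-bit BLAKE2b digest.
    return hashlib.blake2b(data, digest_size=DIGEST_SIZE).digest()

def pack_put2(key, data):
    size = PREFIX_SIZE + len(data)
    header = HEADER.pack(size, TAG_PUT2, key)
    digest = digest64(header + data)          # digest covers header + content
    crc = zlib.crc32(header + digest)         # crc32 covers header + digest
    return struct.pack("<I", crc) + header + digest + data

def check_header(prefix):
    """Verify header + digest using only the first PREFIX_SIZE bytes."""
    (crc,) = struct.unpack_from("<I", prefix, 0)
    return zlib.crc32(prefix[4:PREFIX_SIZE]) == crc

def check_content(entry):
    """Verify the content against the stored digest (reads everything)."""
    header = entry[4:4 + HEADER.size]
    digest = entry[4 + HEADER.size:PREFIX_SIZE]
    return digest64(header + entry[PREFIX_SIZE:]) == digest

entry = pack_put2(b"k" * 32, b"data" * 1000)
damaged = bytearray(entry)
damaged[-1] ^= 0xFF                           # corrupt the content only
damaged = bytes(damaged)
```

With this layout a scan over a segment can validate size, tag and key from a fixed-size prefix and then seek to the next entry; the content check is a separate, optional step.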

Borg < 2.0 crypto

old crypto issues

potential nonce reuse and countermeasures:
- AES-CTR mode with 1 AES key per repo
- counter values (IV / nonce) must never be re-used
- complex counter management needed
- limited trust in repository
- local counter knowledge can be lost (e.g. disk defect)
- multiple clients need to trust repo
- to avoid counter issues, repos must use different keys
- no easy replication of encrypted chunks to other repo
self-made layering of
- AES256-CTR + HMAC-SHA256
- AES256-CTR + BLAKE2b
there are faster ready-to-use AEAD ciphers now

borg2: new crypto

new crypto features

Fixes potential nonce reuse issue:
- random session id generated at start of a borg run
- session key derived from session id and master key
- counter (IV / Nonce) starts from 0 for each session
- no counter management needed, no risk of reuse
OpenSSL >= 1.1.1 (including on OpenBSD), providing:
- super fast AES256-OCB (with AES hw acceleration).
  patents first licensed to FOSS, now abandoned.
- very fast CHACHA20-POLY1305 (pure sw implementation)
use AAD of AEAD cipher to protect header / chunkid
Argon2 KDF used for the borg key (was: pbkdf2)
undecided: maybe adopt BLAKE3 for the chunk id hash
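The session key idea can be sketched with the standard library alone. Assumptions in this example: HMAC-SHA256 serves as a simple stand-in for the real key derivation, the 24-byte session id and 12-byte nonce sizes are illustrative, and the AEAD encryption step itself is left out.

```python
import hashlib
import hmac
import os

def derive_session_key(master_key, session_id):
    # Stand-in KDF for this sketch: HMAC-SHA256 keyed with the master key.
    return hmac.new(master_key, b"session key" + session_id, hashlib.sha256).digest()

class Session:
    """One borg run: fresh random session id, derived key, counter from 0."""

    def __init__(self, master_key):
        self.session_id = os.urandom(24)   # random per run (size is an assumption)
        self.key = derive_session_key(master_key, self.session_id)
        self.counter = 0

    def next_nonce(self):
        nonce = self.counter.to_bytes(12, "big")
        self.counter += 1
        return nonce

master_key = os.urandom(32)
a = Session(master_key)
b = Session(master_key)
first_a = a.next_nonce()
first_b = b.next_nonce()
```

Both sessions start at the same counter value, yet they use different keys, so the same (key, nonce) pair is not reused and no counter state has to be persisted between runs.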

hardlinks

borg < 2.0 approach:
- first hardlink archived as regular file, with chunkid list
- second hardlink archived:
  - refers back to first one by-name
  - does not have own chunkid list
- problematic partial extraction, messy code, special cases
borg 2.0 approach:
- hardlinks archived like a normal item
- regular files / HLs always have chunkid list (1st, 2nd, ...)
- if st_nlink > 1: item.hlid = H(st_dev, st_ino)
- rule: hlid is same -> items point to same inode (are HLs).
- symmetric: 1st HL archived the same way as 2nd, 3rd...
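The hlid rule can be tried out on a temporary directory. In this sketch the hash H is assumed to be SHA-256 over a simple string encoding of (st_dev, st_ino); the function name hlid and that encoding are illustrative, not borg's exact implementation.

```python
import hashlib
import os
import tempfile

def hlid(st):
    """Return a hardlink id for a stat result, or None for a single-link inode."""
    if st.st_nlink > 1:
        return hashlib.sha256(f"{st.st_dev}/{st.st_ino}".encode()).digest()
    return None

with tempfile.TemporaryDirectory() as tmp:
    paths = {name: os.path.join(tmp, name) for name in ("first", "second", "single")}
    for name in ("first", "single"):
        with open(paths[name], "w") as f:
            f.write("data")
    os.link(paths["first"], paths["second"])   # second name for the same inode
    ids = {name: hlid(os.stat(path)) for name, path in paths.items()}

print(ids["first"] == ids["second"], ids["single"])
```

Each item is handled on its own: no lookup of a "first" hardlink by name is needed, and extracting only the second name works the same way as extracting the first.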

msgpack

msgpack old spec - type confusion:
- did not differentiate between text and binary data
- text could be encoded, but comes back as binary
- if you get binary, it could have been binary or text
msgpack new spec - roundtripping done right:
- text (str):
  - comes back as str
  - borg uses utf-8 with surrogate-escape handler
  - gets encoded/decoded automatically
- binary (bytes):
  - comes back as bytes
  - gets stored "as is"
borg2 uses the new spec, borg < 2.0 uses old spec.

new tar formats

These were initially intended for data migration.

--tar-format=PAX

ctime and atime support, all ts in ns resolution

could support more metadata, like xattrs, ACLs, ...,
but a lot of work to implement and test

--tar-format=BORG

Like PAX plus custom BORG.* PAX headers
for perfect round-tripping
of all borg supported fs item metadata.
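How PAX headers can carry extra metadata is easy to see with Python's tarfile module. The header name BORG.example and all values here are hypothetical; the real BORG.* header names are not listed on these slides.

```python
import io
import tarfile

payload = b"hello"
info = tarfile.TarInfo(name="file.txt")
info.size = len(payload)
info.mtime = 1656633600
info.pax_headers = {
    "atime": "1656633600.123456789",      # ns resolution as decimal string
    "ctime": "1656633600.987654321",
    "BORG.example": "custom metadata",    # hypothetical custom header
}

# Write a PAX-format tar archive into memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
    tar.addfile(info, io.BytesIO(payload))

# Read it back and collect the extended headers of the member.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    member = tar.getmembers()[0]
    headers = dict(member.pax_headers)
    content = tar.extractfile(member).read()

print(headers)
```

Readers that do not know a custom header can ignore it, while a reader that does know it can restore the corresponding metadata.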

Copy archive from repo1 to repo2:

borg export-tar ... repo1::A | borg import-tar ... repo2::A

Problem: no dedup, huge amount of data

borg2 transfer

Create a related new repo:

borg --repo NEWREPO rcreate --other-repo OLDREPO --encryption CIDH

Transfer archives:

borg --repo NEWREPO transfer --other-repo OLDREPO [--dry-run]

Efficiency

deduplication: transfer each chunk only once
no expensive re-compression / re-chunking
but: re-encryption is required (but fast!)
old chunks deduplicate with future chunks
- requires: related repo (key material)
- CIDH = compat. ID hash
some (cheap) data conversions done on the fly:
cleanups, type conversions, msgpack changes
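Why a related repo matters for deduplication can be sketched with a toy model. Assumptions: chunk ids are computed as HMAC-SHA256 over the plaintext with an id key, repos are plain dicts, and re-encryption is omitted; none of the names below are borg APIs.

```python
import hashlib
import hmac

def chunk_id(id_key, plaintext):
    # Chunk id = MAC(plaintext), keyed with the repo's id key.
    return hmac.new(id_key, plaintext, hashlib.sha256).digest()

def transfer(archive_ids, old_repo, new_repo):
    """Copy each referenced chunk once; return the number of chunks copied."""
    copied = 0
    for cid in archive_ids:
        if cid not in new_repo:
            new_repo[cid] = old_repo[cid]   # real borg would re-encrypt here
            copied += 1
    return copied

old_key = b"\x01" * 32
unrelated_key = b"\x02" * 32

# An old archive referencing three chunks, two of them identical.
chunks = [b"aaa", b"bbb", b"aaa"]
archive_ids = [chunk_id(old_key, c) for c in chunks]
old_repo = {cid: c for cid, c in zip(archive_ids, chunks)}

new_repo = {}
copied = transfer(archive_ids, old_repo, new_repo)

# A future backup sees b"aaa" again.
related_hit = chunk_id(old_key, b"aaa") in new_repo          # same id key
unrelated_hit = chunk_id(unrelated_key, b"aaa") in new_repo  # different id key
print(copied, related_hit, unrelated_hit)
```

With shared key material the future chunk maps to an id that is already present, so nothing needs to be stored again; with an unrelated key the same data would get a different id and be stored a second time.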

release N+1 plans

in 2.0 we needed to keep some of the old stuff:
- borg transfer needs to read old repos / archives
- users need to transfer their archives to new repos
in N+1 (2.1?) we can remove:
- AES-CTR mode, counter management code
- old style keys (pbkdf2, Encrypt and MAC)
- code for old repo index, old chunks index
- PUT(1) repo code
- zlib type bytes hack
- bigint stuff (replaced by msgpack Timestamp)
- msgpack-related "good that we know the type"
- hardlink_master processing
- old RPC protocols
- borg transfer / item code that converts old to new

Support the Borg

Contributions are welcome!

Code, documentation, review, testing, funding, ...

Just join us on GitHub and Libera.Chat IRC #borgbackup.

Donations please via Liberapay or Bountysource:

https://www.borgbackup.org/support/fund.html

For more information:

borgbackup.org

Questions / Feedback?

tw @ waldmann-edv . de
Thomas J Waldmann @ twitter