Borg Backup

(2.0 alpha, updates & plans)

Thomas Waldmann (@home, 2022-07)

Borg Versions

  • Borg 1.1:
    • supported and very stable.
    • final release 1.1.18, after that only critical fixes.
  • Borg 1.2:
    • first releases done: 1.2.0, 1.2.1
  • Borg 2.0:
    • borg2 branch, will eventually get into master
    • 2.0.0a2 alpha release is out.
    • breaking changes, bleeding edge for testers!

2.0 = breaking!

  • break compatibility (issue #6602)
  • do all breaking changes in one release
  • all is new: new repos, new server, new clients
  • cli syntax cleanup
  • get rid of all legacy
  • simplify code base
  • solve some fundamental issues (pending for 7 years)
  • get rid of troublemakers, stuff that blocks progress
  • no in-place repo upgrade:
    • saves developer time
    • less complex, fewer potential bugs, clean repos
  • but instead: new borg archive transfer command

CLI: repos + archives

  • no scp style repos any more:  user@host:path
    • no port possible
    • parser sometimes confused this with local path
  • ssh URL is better:  ssh://user@host:port/path
    • a port is possible here
    • easy to parse / disambiguate
  • archive name separate from repo, no "::" any more
    • borg -r REPO diff ARCH1 ARCH2
      • fixed number of archives: positional params
    • borg -r REPO delete -a ARCH_GLOB --first 3
      • some?:  -a 'crap-*'
      • want just one specific?:  -a only_this_one
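
To illustrate why the ssh:// form is easy to parse, here is a minimal sketch using Python's standard urllib.parse. The helper name parse_repo_location and the returned dict layout are made up for this example; this is not borg's actual location parser.

```python
from urllib.parse import urlsplit


def parse_repo_location(location):
    """Split a repo location into its parts (illustrative helper only)."""
    parts = urlsplit(location)
    if parts.scheme != "ssh":
        # No ssh:// scheme: this sketch simply treats the string as a local path.
        # Note that the old scp style "user@host:path" also ends up here,
        # which is exactly the ambiguity the ssh:// URL avoids.
        return {"scheme": "file", "user": None, "host": None,
                "port": None, "path": location}
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL has no explicit port
        "path": parts.path,
    }
```

With an ssh:// URL, user, host, port and path come out as separate fields. A string like user@host:path cannot be told apart from a local path by this helper.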

repo vs. arch cmds

  • commands work either on repo or on archive(s)
    • borg rcreate = "repo create"  (was: borg init)
    • borg create = "archive create"
    • borg rlist = "repo list"
    • borg list = "archive list"
    • borg delete = "archive delete"
    • borg rdelete = "repo delete"
  • exception: borg check syntax is unchanged
    • works on repo and archives by default
    • --repository-only
    • --archives-only

borg repo concepts

A borg repo is a LOG.

(== stuff only gets appended at the end,
old stuff is never modified [only deleted])

A borg repo is a key/value store.

(key: chunk id = MAC(plaintext), value = ciphertext)

Low-level repo operations:

  • PUT (append a new key/value pair)
  • DELETE (register a delete for a previous put by key)
  • COMMIT (finish a transaction, state is valid now)
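
The three operations can be sketched as a toy in-memory log. This is only a model of the concept: the class ToyLogRepo and its methods are invented for illustration and do not reflect borg's real repository code or on-disk format.

```python
class ToyLogRepo:
    """Append-only log of PUT / DELETE / COMMIT entries (toy model)."""

    def __init__(self):
        self.log = []        # entries are only ever appended
        self.committed = 0   # number of log entries covered by the last COMMIT

    def put(self, key, value):
        self.log.append(("PUT", key, value))

    def delete(self, key):
        # A delete is itself a new log entry; the old PUT is not modified.
        self.log.append(("DELETE", key, None))

    def commit(self):
        self.log.append(("COMMIT", None, None))
        self.committed = len(self.log)

    def replay(self):
        """Rebuild the key/value view from committed entries only."""
        index = {}
        for tag, key, value in self.log[: self.committed]:
            if tag == "PUT":
                index[key] = value
            elif tag == "DELETE":
                index.pop(key, None)
        return index
```

Entries appended after the last COMMIT are ignored by replay(), which mirrors the idea that the state is only valid once a transaction is finished.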

Segment files

  • contain a sequence of log entries created by repo ops
  • a non-compact segment contains deleted PUT entries

borg < 2.0:  PUT

PUT log entry structure:

  • crc32 = CRC32(header + content)
  • header: size of entry
  • header: tag (== PUT)
  • header: 256bit key (chunk id)
  • content: value (data)

 

Big design issue

The header cannot be checked on its own.

To check the crc32, one must read everything: header + content.

That is slow if one is only interested in header values.

A correct size is important for seeking to the next entry.
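
A small sketch of the issue, assuming an entry layout of crc32, size, tag, key, content in that order, with little-endian 32-bit integers and a 1-byte tag. The field widths, the byte order and the tag value are assumptions of this example, not taken from the slides.

```python
import struct
import zlib

TAG_PUT = 0                        # tag value chosen for this sketch
CRC = struct.Struct("<I")          # crc32
HEADER = struct.Struct("<IB32s")   # size, tag, 256-bit key


def pack_put(key, data):
    """Build one PUT entry: crc32 covers header + content."""
    size = CRC.size + HEADER.size + len(data)
    header = HEADER.pack(size, TAG_PUT, key)
    crc = zlib.crc32(header + data)
    return CRC.pack(crc) + header + data


def check_put(entry):
    """Verify an entry. Note: this needs the WHOLE entry, content included."""
    (crc,) = CRC.unpack_from(entry, 0)
    return zlib.crc32(entry[CRC.size:]) == crc
```

Even if only the size or key is of interest, check_put has to process every content byte before the header values can be trusted.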

borg2:  PUT2

PUT2 log entry structure:

  • crc32 = CRC32(header + digest)
  • header: size of entry
  • header: tag
  • header: 256bit key (chunk id)
  • digest: xxh64 = XXH64(header + content)
  • content: value (data)

 

Notable

Can check the header without reading the content.

Better error detection thanks to the stronger and very fast xxh64.

crc32 covers header+digest, digest also covers header.

The sometimes slow CRC32 implementation is only used for a few bytes.
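
The PUT2 idea can be sketched as follows. Assumptions of this example: the same field widths as a crc32 / size / tag / key layout with little-endian integers, an arbitrary tag value, and, because xxh64 is not in the Python standard library, an 8-byte BLAKE2b digest as a stand-in for XXH64.

```python
import hashlib
import struct
import zlib

TAG_PUT2 = 3                       # tag value chosen for this sketch
CRC = struct.Struct("<I")          # crc32
HEADER = struct.Struct("<IB32s")   # size, tag, 256-bit key
DIGEST_SIZE = 8                    # 64-bit digest, like xxh64
PREFIX_SIZE = CRC.size + HEADER.size + DIGEST_SIZE


def digest64(data):
    # Stand-in for XXH64 (assumption): 8-byte BLAKE2b from the stdlib.
    return hashlib.blake2b(data, digest_size=DIGEST_SIZE).digest()


def pack_put2(key, data):
    """crc32 covers header + digest, digest covers header + content."""
    size = PREFIX_SIZE + len(data)
    header = HEADER.pack(size, TAG_PUT2, key)
    digest = digest64(header + data)
    crc = zlib.crc32(header + digest)
    return CRC.pack(crc) + header + digest + data


def check_header(prefix):
    """Verify size / tag / key from the first PREFIX_SIZE bytes only."""
    (crc,) = CRC.unpack_from(prefix, 0)
    return zlib.crc32(prefix[CRC.size:PREFIX_SIZE]) == crc


def check_content(entry):
    """Verify the content via the digest (reads the whole entry)."""
    header = entry[CRC.size:CRC.size + HEADER.size]
    digest = entry[CRC.size + HEADER.size:PREFIX_SIZE]
    return digest64(header + entry[PREFIX_SIZE:]) == digest
```

check_header only touches a few dozen bytes regardless of how large the content is, so the CRC32 never runs over bulk data.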

Borg < 2.0 crypto

old crypto issues

  • potential nonce reuse and countermeasures:
    • AES-CTR mode with 1 AES key per repo
    • counter values (IV / nonce) must never be re-used
    • complex counter management needed
    • limited trust in repository
    • local counter knowledge can be lost (e.g. disk defect)
    • multiple clients need to trust repo
    • to avoid counter issues, repos must use different keys
    • no easy replication of encrypted chunks to other repo
  • self-made layering of
    • AES256-CTR + HMAC-SHA256
    • AES256-CTR + BLAKE2b
  • there are faster ready-to-use AEAD ciphers now

borg2:  new crypto

new crypto features

  • Fixes potential nonce reuse issue:
    • random session id generated at start of a borg run
    • session key derived from session id and master key
    • counter (IV / Nonce) starts from 0 for each session
    • no counter management needed, no risk of reuse
  • OpenSSL >= 1.1.1 (including on OpenBSD), providing:
    • super fast AES256-OCB (with AES hw acceleration).
      patents first licensed to FOSS, now abandoned.
    • very fast CHACHA20-POLY1305 (pure sw implementation)
  • use AAD of AEAD cipher to protect header / chunkid
  • Argon2 KDF used for the borg key (was: pbkdf2)
  • undecided: maybe adopt BLAKE3 for the chunk id hash
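
The session key idea can be sketched with the standard library alone. Everything concrete here is an assumption for illustration: the class name Session, the 24-byte session id, the 12-byte nonce, and HMAC-SHA256 as the derivation function. The slides do not state which KDF borg2 uses, and no actual AEAD encryption is performed.

```python
import hashlib
import hmac
import os


class Session:
    """One borg run: fresh random session id, derived key, counter from 0."""

    def __init__(self, master_key):
        self.session_id = os.urandom(24)   # random per run (size is an assumption)
        # Stand-in derivation (assumption): HMAC-SHA256(master_key, session_id).
        self.session_key = hmac.new(
            master_key, self.session_id, hashlib.sha256
        ).digest()
        self.counter = 0                   # no persistent counter state needed

    def next_nonce(self):
        """Return the next nonce for this session and advance the counter."""
        nonce = self.counter.to_bytes(12, "big")
        self.counter += 1
        return nonce
```

Two runs with the same master key both start at counter 0, but because their session keys differ, the (key, nonce) pair is never repeated.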

hardlinks

  • borg < 2.0 approach:
    • first hardlink archived as regular file, with chunkid list
    • second hardlink archived:
      • refers back to first one by-name
      • does not have own chunkid list
    • problematic partial extraction, messy code, special cases
  • borg 2.0 approach:
    • hardlinks archived like a normal item
    • regular files / HLs always have chunkid list (1st, 2nd, ...)
    • if st_nlink > 1:  item.hlid = H(st_dev, st_ino)
    • rule: hlid is same -> items point to same inode (are HLs).
    • symmetric:  1st HL archived the same way as 2nd, 3rd...
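
A minimal sketch of the hlid rule. The function names and the choice of SHA-256 for H are assumptions of this example; the slides only say that hlid is some hash of st_dev and st_ino.

```python
import hashlib
import os


def hardlink_id(st_dev, st_ino, st_nlink):
    """Return an id shared by all hard links of one inode, else None."""
    if st_nlink <= 1:
        return None  # not hardlinked: no hlid needed
    # H(st_dev, st_ino); SHA-256 is a stand-in for H (assumption).
    return hashlib.sha256(f"{st_dev}:{st_ino}".encode()).digest()


def hardlink_id_for_path(path):
    """Convenience wrapper that reads the values from the filesystem."""
    st = os.lstat(path)
    return hardlink_id(st.st_dev, st.st_ino, st.st_nlink)
```

Every hard link to the same inode yields the same id, so the first link needs no special treatment compared to the second or third.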

msgpack

  • msgpack old spec - type confusion:
    • did not differentiate between text and binary data
    • text could be encoded, but comes back as binary
    • if you get binary, it could have been binary or text
  • msgpack new spec - roundtripping done right:
    • text (str):
      • comes back as str
      • borg uses utf-8 with surrogate-escape handler
      • gets encoded/decoded automatically
    • binary (bytes):
      • comes back as bytes
      • gets stored "as is"
  • borg2 uses the new spec, borg < 2.0 uses old spec.
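
The surrogate-escape handling mentioned above can be shown with plain Python, without msgpack. The helper names are made up for this example; the point is only that arbitrary bytes (for example a file name that is not valid UTF-8) survive the trip through str.

```python
def name_to_text(raw: bytes) -> str:
    """Decode bytes to str; invalid UTF-8 bytes become lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def text_to_name(text: str) -> bytes:
    """Encode str back to bytes; lone surrogates become the original bytes."""
    return text.encode("utf-8", errors="surrogateescape")
```

For example, name_to_text(b"caf\xe9.txt") gives a str, and text_to_name turns it back into exactly the same bytes.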

new tar formats

These were initially intended for data migration.

--tar-format=PAX

ctime and atime support, all timestamps in ns resolution

could support more metadata, like xattrs, ACLs, ...,
but a lot of work to implement and test

--tar-format=BORG

Like PAX plus custom BORG.* PAX headers
for perfect round-tripping
of all borg supported fs item metadata.
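
How custom PAX headers travel inside a tar stream can be sketched with Python's tarfile module. The header key BORG.demo is a hypothetical name for this example; the slides do not list the actual BORG.* keys.

```python
import io
import tarfile


def write_pax(name, data, extra_headers):
    """Write one member into an in-memory PAX tar with extra pax headers."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tf:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        info.pax_headers = dict(extra_headers)  # custom key/value records
        tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_pax(blob):
    """Read the first member back and return its name and pax headers."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tf:
        member = tf.getmembers()[0]
        return member.name, dict(member.pax_headers)
```

Tools that do not know the custom keys can still extract the file, while a reader that does know them can restore the extra metadata.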

Copy archive from repo1 to repo2:

borg export-tar ... repo1::A | borg import-tar ... repo2::A

 

Problem: no dedup, huge amount of data

borg2 transfer

Create a related new repo:

borg --repo NEWREPO rcreate --other-repo OLDREPO --encryption CIDH

Transfer archives:

borg --repo NEWREPO transfer --other-repo OLDREPO [--dry-run]

Efficiency

  • deduplication: transfer each chunk only once
  • no expensive re-compression / re-chunking
  • but: re-encryption is required (but fast!)
  • old chunks deduplicate with future chunks
    • requires: related repo (key material), CIDH
    • CIDH = compatible ID hash
  • some (cheap) data conversions done on the fly:
    cleanups, type conversions, msgpack changes
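
The deduplicating transfer can be modelled with plain dicts. This is a toy sketch: the function transfer, the dict-based repos and the reencrypt callback are invented for illustration and are not borg's implementation.

```python
def transfer(old_repo, new_repo, archives, reencrypt):
    """Copy all chunks referenced by archives, each chunk only once.

    old_repo / new_repo: dict mapping chunk id -> stored (encrypted) bytes
    archives:            dict mapping archive name -> list of chunk ids
    reencrypt:           callable turning an old stored chunk into a new one
    """
    transferred = 0
    for name, chunk_ids in archives.items():
        for chunk_id in chunk_ids:
            if chunk_id in new_repo:
                continue  # already there: deduplicated, nothing to send
            # The chunk id stays the same (compatible ID hash), only the
            # encryption changes. No re-chunking or re-compression happens.
            new_repo[chunk_id] = reencrypt(old_repo[chunk_id])
            transferred += 1
    return transferred
```

Because chunk ids are kept, chunks created later in the new repo can deduplicate against the transferred ones.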

release N+1 plans

  • in 2.0 we needed to keep some of the old stuff:
    • borg transfer needs to read old repos / archives
    • users need to transfer their archives to new repos
  • in N+1 (2.1?) we can remove:
    • AES-CTR mode, counter management code
    • old style keys (pbkdf2, Encrypt and MAC)
    • code for old repo index, old chunks index
    • PUT(1) repo code
    • zlib type bytes hack
    • bigint stuff (replaced by msgpack Timestamp)
    • msgpack-related "good that we know the type"
    • hardlink_master processing
    • old RPC protocols
    • borg transfer / item code that converts old to new

Support the Borg

Contributions are welcome!

Code, documentation, review, testing, funding, ...

Just join us on GitHub and LiberaChat IRC #borgbackup.

Donations please via LiberaPay or BountySource:

https://www.borgbackup.org/support/fund.html

For more information:

borgbackup.org

Questions / Feedback?

  • tw @ waldmann-edv . de

  • Thomas J Waldmann @ twitter
