Borg Backup
(diving deeper and FAQ)
Thomas Waldmann (@home, 2021-07)
Borg Internals
Some insights into borg
architecture and data structures.
For more details, see our docs.
Borg Architecture
Borg encryption
Repo: Object Graph
borg compact
(borg 1.2)
previously: compact_segments()
Ideas:
log-like
(append, never modify in-place)
transactions
(completed on commit)
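The two ideas above can be sketched with a toy in-memory log. This is only an illustration, not borg's segment file format: the class `LogStore` and the tags `PUT`, `DELETE` and `COMMIT` are names assumed for this sketch.

```python
# Hypothetical in-memory model of a log-like store (not borg's real on-disk format).
class LogStore:
    def __init__(self):
        self.log = []  # entries are only ever appended, never modified in place

    def put(self, key, value):
        self.log.append(("PUT", key, value))

    def delete(self, key):
        self.log.append(("DELETE", key, None))

    def commit(self):
        self.log.append(("COMMIT", None, None))

    def committed_state(self):
        """Replay the log; entries only take effect once a COMMIT follows them."""
        state, pending = {}, []
        for entry in self.log:
            if entry[0] == "COMMIT":
                for tag, key, value in pending:
                    if tag == "PUT":
                        state[key] = value
                    else:
                        state.pop(key, None)
                pending = []
            else:
                pending.append(entry)
        return state

    def compact(self):
        """Build a fresh log holding only live, committed entries.

        Superseded PUTs, DELETEs and uncommitted entries are dropped,
        which is where the space savings come from.
        """
        state = self.committed_state()
        self.log = [("PUT", k, v) for k, v in state.items()]
        self.log.append(("COMMIT", None, None))
```

Deleting an object only appends a DELETE entry, so space is freed later, when compaction rewrites the log without the dead entries.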
FAQ
Some stuff that comes up again and again.
More details about this are in our docs.
Also, check the github issue tracker
in case you run into a problem.
Repos - 1 or N?
"One big or multiple smaller repos?"
Pro One
- Deduplication
- fewer repos to manage / fewer keys to back up
Pro Multiple
- less risk
- more parallel operations
- less memory needs at a time
- faster checks (and other whole-repo ops)
- more fine granular access management
Repo Copies?
"rsync / rclone vs. borg to another place?"
"rsync / rclone"
- client → borg → repo1 → rsync → repo1'
- + different requirements on the target (borg is not needed there)
- + less time needed on borg client
"borg directly to multiple target repos"
- client → borg → repo1
- client → borg → repo2
- + independent backups, no error propagation
- + no AES-CTR counter management issues (they can arise when a copied repo is later written to independently)
- note: for non-first backups, borg is rather quick.
Chunkers
Borg deduplicates based on chunks (not: whole files).
buzhash chunker (content-defined chunking):
rolling hash computed over window
window rolling over whole input file in 1 byte steps
if hash(window) & bitmask == 0: cut a chunk!
fixed chunker (borg 1.2+, fixed size chunks):
cutting a block device into blocks
cutting a LV in LEs
cutting a (fixed record size) DB into records
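A minimal sketch of content-defined chunking, assuming a simplified 32-bit buzhash with a seeded random table. The function names, the tiny window and the size limits are assumptions chosen for readability; borg's real chunker is implemented in C and differs in detail.

```python
import random


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def cdc_chunks(data, window=16, mask_bits=6, min_size=32, max_size=1024, seed=0):
    """Cut data where the rolling hash of the last `window` bytes has
    its lowest `mask_bits` bits all zero (toy parameters, not borg's)."""
    rng = random.Random(seed)
    table = [rng.getrandbits(32) for _ in range(256)]  # one random word per byte value
    mask = (1 << mask_bits) - 1
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = rotl32(h, 1) ^ table[byte]            # roll the incoming byte in
        pos = i - start                            # position inside the current chunk
        if pos >= window:
            # roll the byte that just left the window out again
            h ^= rotl32(table[data[i - window]], window)
        size = pos + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])       # cut a chunk!
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])                # remainder becomes the last chunk
    return chunks
```

Because a cut depends only on the bytes inside the window, inserting a byte near the start of a file changes the first chunk or two, after which the cut points line up with the old ones again and the remaining chunks deduplicate.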
buzhash chunker
borg create --chunker-params=PARAMS ...
buzhash,19,23,21,4095 (variable size, default)
chunk sizes in bytes: min 2^19, max 2^23, target 2^21
(produces chunks 0.5MiB <= target 2MiB <= 8MiB)
window size 4095 bytes
large chunks → low management overhead
buzhash,10,23,16,4095 (variable size)
produces chunks 1KiB <= target 64KiB <= 8MiB
small chunks → high management overhead
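The size arithmetic behind the two parameter sets can be checked with a few lines. The helper `buzhash_sizes` is a hypothetical name for this sketch and is not part of borg.

```python
def buzhash_sizes(params):
    """Turn 'buzhash,MIN_EXP,MAX_EXP,MASK_BITS,WINDOW' into sizes in bytes."""
    algo, min_exp, max_exp, mask_bits, window = params.split(",")
    if algo != "buzhash":
        raise ValueError("expected buzhash chunker params")
    return {
        "min": 2 ** int(min_exp),        # smallest chunk that may be cut
        "max": 2 ** int(max_exp),        # a cut is forced at this size
        "target": 2 ** int(mask_bits),   # statistically expected chunk size
        "window": int(window),           # rolling hash window size
    }


default = buzhash_sizes("buzhash,19,23,21,4095")
small = buzhash_sizes("buzhash,10,23,16,4095")
```

Here `default["target"]` is 2097152 bytes (2MiB) and `small["target"]` is 65536 bytes (64KiB), so for the same large input the second setting yields roughly 32 times as many chunks to manage.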
fixed chunker
borg create --chunker-params=PARAMS ...
fixed,4194304 (fixed chunk size)
fixed blocks of 4MiB size (e.g. LVM LEs)
fixed,65536,4096 (fixed size w/ header)
4KiB header followed by 64KiB blocks
Faster / way less CPU than buzhash, good if contents do not shift inside the input file (no insertions / deletions).
New in borg 1.2.
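A minimal sketch of fixed-size chunking with an optional header, mirroring the `fixed,65536,4096` example above. The function `fixed_chunks` is an assumed name for illustration, not borg's implementation.

```python
def fixed_chunks(data, block_size, header_size=0):
    """Split data into an optional header chunk plus fixed-size blocks."""
    chunks = []
    if header_size and data:
        chunks.append(data[:header_size])          # header is a chunk of its own
    for offset in range(header_size, len(data), block_size):
        chunks.append(data[offset:offset + block_size])  # last block may be short
    return chunks
```

No hashing over a rolling window is needed, which is why this is cheap. An in-place change touches only the block it falls into, whereas an insertion would shift every following block and defeat deduplication.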
chunking pitfalls
small chunks:
- might dedupe better / small granularity
- more chunks, bigger chunks index (RAM, disk)
big chunks:
- might dedupe worse / big granularity
- fewer chunks, smaller chunks index (RAM, disk)
problems usually on unbalanced systems:
lots of data, little RAM - amplified by small chunks.
keep in mind: each file will be at least 1 chunk!
So, if you have a lot of small files, the typical chunk size will be smaller than the chunker target size.
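A rough back-of-the-envelope estimate of the chunk count, assuming non-empty files and that large files are cut at about the target size. `estimate_chunks` is a hypothetical helper, and real counts depend on the content.

```python
def estimate_chunks(file_sizes, target=2 ** 21):
    """Estimate the chunk count: at least one chunk per (non-empty) file,
    else about size / target chunks, rounded up."""
    return sum(max(1, -(-size // target)) for size in file_sizes)


many_small = estimate_chunks([10_000] * 1000)   # 1000 files of 10 kB each
one_big = estimate_chunks([2 ** 30])            # a single 1GiB file
```

The 1000 small files (10 MB in total) give about 1000 chunks, while the single 1GiB file gives about 512, so the small-file data set needs more index entries for roughly 1% of the data.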
Indexes
chunks index (client): chunkid → (refcount, size, csize)
Chunk presence detection / reference counting /
garbage collection and statistics.
If lost, can be rebuilt from archives in repo.
repo index (server): chunkid → (segment, offset)
Find segment file and offset in there to read a chunk.
If lost, can be rebuilt from segment files.
Indexes implemented in C as in-memory hashtables.
Smaller chunks → more chunks → more memory usage!
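The two mappings can be modelled with plain Python dicts. This is a sketch of the idea only: borg's indexes are C hashtables, and the names `ChunkEntry`, `ChunksIndex` and `locate` are assumptions made here.

```python
from collections import namedtuple

ChunkEntry = namedtuple("ChunkEntry", "refcount size csize")


class ChunksIndex:
    """Client side: chunkid -> (refcount, size, csize)."""

    def __init__(self):
        self.table = {}

    def seen(self, chunkid):
        """Presence detection: a known chunk does not need to be stored again."""
        return chunkid in self.table

    def add_ref(self, chunkid, size, csize):
        old = self.table.get(chunkid)
        refcount = old.refcount + 1 if old else 1
        self.table[chunkid] = ChunkEntry(refcount, size, csize)

    def drop_ref(self, chunkid):
        """Return True when the last reference is gone (chunk may be deleted)."""
        entry = self.table[chunkid]
        if entry.refcount == 1:
            del self.table[chunkid]
            return True
        self.table[chunkid] = entry._replace(refcount=entry.refcount - 1)
        return False


def locate(repo_index, chunkid):
    """Server side: chunkid -> (segment, offset) tells where to read the chunk."""
    return repo_index[chunkid]
```

Each chunk costs one entry in each table, which is why the chunk count drives memory usage.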
Space vs. Time
If the chunks index (client) gets out of sync
with the repo, it needs to get rebuilt.
Fast way / needs much space on client:
Use chunks.archive.d/* (cached per-archive chunks indexes) to avoid having to query remote repo.
Space requirement is O(#archives * #chunks).
Slow way / needs less space on client:
$ rm -rf chunks.archive.d ; touch chunks.archive.d
Fetches all archive metadata from remote repo
to rebuild master chunks index.
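Conceptually, rebuilding the master chunks index means summing up the references of all archives. The sketch below assumes each per-archive index is a simple `{chunkid: count}` dict, which simplifies the real cached format; `merge_archive_indexes` is a hypothetical name.

```python
from collections import Counter


def merge_archive_indexes(per_archive):
    """per_archive: {archive_name: {chunkid: references_in_that_archive}}.

    Returns the master mapping chunkid -> total refcount.
    """
    master = Counter()
    for index in per_archive.values():
        master.update(index)   # adds the counts chunk by chunk
    return dict(master)
```

In the fast way the per-archive indexes are already on the client and only need merging. In the slow way each of them has to be derived again from archive metadata fetched from the repo.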
Files Cache
H(fullpath) → (size, ctime, inode, chunkids)
Processing a backup input file
stat(fullpath) and lookup H(fullpath) in files cache.
Miss → new or renamed file, read / chunk it and remember it in new cache entry.
Hit, but size, ctime or inode changed → file was changed, process it like a new file.
Hit and size, ctime and inode all match → file is unchanged!
FAST: No need to read the file to add it to the archive, just use the cached chunkids!
Note: In any case, borg needs to read flags, xattrs, acls from the filesystem.
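The decision logic above, as a minimal sketch. SHA-256 stands in for H, and `CacheEntry`, `path_key`, `process_file` and the `read_and_chunk` callback are assumptions made for illustration, not borg's internal API.

```python
import hashlib
import os
from collections import namedtuple

CacheEntry = namedtuple("CacheEntry", "size ctime_ns inode chunkids")


def path_key(fullpath):
    """H(fullpath): the cache lookup key (SHA-256 is an assumption here)."""
    return hashlib.sha256(os.fsencode(fullpath)).digest()


def process_file(files_cache, fullpath, st, read_and_chunk):
    """Return (status, chunkids) for one backup input file.

    st is the os.stat() result; read_and_chunk(fullpath) is the slow path
    that reads the file and returns its list of chunk ids.
    """
    key = path_key(fullpath)
    entry = files_cache.get(key)
    current = (st.st_size, st.st_ctime_ns, st.st_ino)
    if entry and (entry.size, entry.ctime_ns, entry.inode) == current:
        return "unchanged", entry.chunkids        # FAST: file is not read
    status = "modified" if entry else "added"     # hit with mismatch, or miss
    chunkids = read_and_chunk(fullpath)
    files_cache[key] = CacheEntry(*current, chunkids)
    return status, chunkids
```

A renamed file gets a different key, so it is a miss and is read again, although its chunks still deduplicate against the repo.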
Files Cache hits
H(fullpath) is the cache lookup key.
ctime/mtime,inode,size must match.
For fast backups, make sure that:
- the full absolute path does not accidentally change.
  ⚡️ using different mountpoints
- file ctimes behave normally.
  ⚡️ chmod/chown/chgrp -R ... changes lots of ctimes
- file mtimes behave normally.
  ⚡️ userspace tools fooling around with mtimes
- inode numbers are stable.
  ⚡️ network filesystems often do not have stable inodes
Tweaking via: --files-cache=ctime/mtime,inode,size
FC RAM needs
If you have a lot of files,
the files cache can grow rather large
(RAM and disk space).
env var BORG_FILES_CACHE_TTL [20]
files not seen for N backup runs are removed from the cache.
set N to at least the number of backup input data sets sharing the cache.
env var BORG_FILES_CACHE_SUFFIX [None]
use multiple files caches instead of a single one.
lower memory usage by keeping the files caches separate (e.g. per data set).
a FC curiosity
File(s) with newest timestamp are not put into the FC.
A failure scenario:
- newest file changed at time T
- snapshot at time T (within ts granularity)
- file changed again at time T (within ts granularity)
- borg backs up the snapshot, fc knows file with ts T
- later borg does another backup, fs file has ts T
- borg would think file is unchanged, because
files cache file timestamp T == fs file ts T
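Optimisation: touch /backupdata/dummyfile before the backup (and before the snapshot), so the dummy file carries the newest timestamp and the real files can go into the FC.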
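The rule and the dummy file trick in a few lines. This is a simplified model with integer timestamps; `entries_to_cache` and the file names other than the dummy file are assumptions for illustration.

```python
def entries_to_cache(files):
    """files: {path: timestamp}. Files carrying the newest timestamp
    are left out of the files cache."""
    if not files:
        return {}
    newest = max(files.values())
    return {path: ts for path, ts in files.items() if ts < newest}


files = {"/backupdata/vm.img": 100, "/backupdata/old.txt": 50}
without_dummy = entries_to_cache(files)

files["/backupdata/dummyfile"] = 101   # touched right before the backup
with_dummy = entries_to_cache(files)
```

Without the dummy file, `vm.img` is the newest file, is not cached and gets read and chunked again next time. With the dummy file, only the dummy file itself is left out.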
dealing w/ defects
Borg does a lot of checksumming,
so it often detects issues before they are noticed otherwise.
First, hw must work ok:
bad RAM (or CPU or mainboard): memtest86+
bad hdd / ssd: smartctl -t long / -a
replace any bad hardware
Then:
borg check [--repair] REPO
For more information:
borgbackup.org
Questions / Feedback?
- tw @ waldmann-edv . de
- Thomas J Waldmann @ twitter