Venti: a new approach to archival storage
Introduction & background
The Venti Archival Server
Applications
Implementation
Performance
Reliability & recovery
Related & future work
Archival storage is a second-class function in current computing environments.
Available storage capacity now exceeds the ability of many users to generate data, making it practical to archive data in perpetuity.
Write-once policy
A prevalent form of archival storage is magnetic tape.
Restoring data from tape can be tedious and error-prone.
A trade-off exists between the performance of backup and restore operations.
Snapshots avoid the trade-off between full and incremental backups.
Venti is a block-level network storage system.
It identifies data blocks by a hash of their contents.
Blocks are write-once, and writing (or replicating) the same data again is idempotent.
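The idea of addressing blocks by a hash of their contents can be illustrated with a minimal in-memory sketch. The `BlockStore` class and its method names are hypothetical, and SHA-1 is assumed as the hash function; nothing here reflects the actual Venti protocol or server code.

```python
import hashlib


class BlockStore:
    """Toy in-memory content-addressed store (illustrative only)."""

    def __init__(self):
        self._blocks = {}  # fingerprint (bytes) -> block contents

    def write(self, data: bytes) -> bytes:
        # The fingerprint depends on the contents alone.
        # SHA-1 is an assumption of this sketch.
        fp = hashlib.sha1(data).digest()
        # Write-once: an existing block is never overwritten, so
        # repeating a write changes nothing on the server.
        self._blocks.setdefault(fp, data)
        return fp

    def read(self, fp: bytes) -> bytes:
        data = self._blocks[fp]
        # A reader can verify a block against its own address.
        if hashlib.sha1(data).digest() != fp:
            raise ValueError("block does not match its fingerprint")
        return data

    def __len__(self):
        return len(self._blocks)


store = BlockStore()
first = store.write(b"hello venti")
second = store.write(b"hello venti")  # duplicate write of the same data
assert first == second and len(store) == 1
```

Because the address is computed from the data, a duplicate write returns the same fingerprint and consumes no extra space.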
However, magnetic disk storage is not as stable or permanent as optical media.
Using magnetic disks for Venti has the benefit of reducing the disparity in performance between conventional and archival storage.
Applications use the block-level service provided by Venti to store more complex data structures.
Data is divided into blocks and written to the server. To enable this data to be retrieved, the application must record the fingerprints of these blocks in additional blocks.
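One way to record fingerprints in additional blocks is to pack them into pointer blocks and repeat until a single root fingerprint remains. The following is a sketch under assumptions: the block size, the SHA-1 fingerprint, the `store` dictionary standing in for the server, and the function names are all hypothetical, and the tree depth is returned to the caller here rather than encoded in the blocks.

```python
import hashlib

BLOCK_SIZE = 8 * 1024            # hypothetical block size for this sketch
FP_SIZE = 20                     # SHA-1 digest length in bytes
FANOUT = BLOCK_SIZE // FP_SIZE   # fingerprints that fit in a pointer block

store = {}  # stand-in for the server: fingerprint -> block


def put(block: bytes) -> bytes:
    fp = hashlib.sha1(block).digest()
    store.setdefault(fp, block)
    return fp


def write_object(data: bytes):
    """Store data and return (root fingerprint, tree depth)."""
    fps = [put(data[i:i + BLOCK_SIZE])
           for i in range(0, max(len(data), 1), BLOCK_SIZE)]
    depth = 0
    # Pack fingerprints into pointer blocks until one root remains.
    while len(fps) > 1:
        fps = [put(b"".join(fps[i:i + FANOUT]))
               for i in range(0, len(fps), FANOUT)]
        depth += 1
    return fps[0], depth


def read_object(root: bytes, depth: int) -> bytes:
    """Walk from the root fingerprint back down to the data blocks."""
    block = store[root]
    if depth == 0:
        return block
    children = [block[i:i + FP_SIZE] for i in range(0, len(block), FP_SIZE)]
    return b"".join(read_object(fp, depth - 1) for fp in children)
```

With this layout the application only needs to remember the root fingerprint (and, in this sketch, the depth) to retrieve the whole object.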
Vac is an application for storing a collection of files and directories as a single object.
An important attribute of vac is that it writes each file as a separate collection of Venti blocks, thus ensuring that duplicate copies of a file will be coalesced on the server.
Vac also implements an incremental option based on the file modification times.
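An incremental option driven by modification times could work roughly as follows. This is only an illustration of the idea, not vac's actual logic: the `archive` function, its dictionary-based inputs, and the use of SHA-1 over whole file contents are assumptions made for the example.

```python
import hashlib


def archive(files, previous=None):
    """Return ({name: (mtime, fingerprint)}, number of files hashed).

    files:    name -> (mtime, contents)
    previous: the mapping returned by an earlier run, or None
    """
    previous = previous or {}
    result, hashed = {}, 0
    for name, (mtime, contents) in files.items():
        old = previous.get(name)
        if old is not None and old[0] == mtime:
            # Same modification time: reuse the earlier fingerprint
            # without reading the file contents again.
            result[name] = old
        else:
            result[name] = (mtime, hashlib.sha1(contents).hexdigest())
            hashed += 1
    return result, hashed


first, n_first = archive({"a": (1, b"x"), "b": (1, b"y")})
second, n_second = archive({"a": (1, b"x"), "b": (2, b"z")}, first)
assert (n_first, n_second) == (2, 1)  # only the modified file is rehashed
```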
In physical backup, the alternative to file-level tools such as vac, the disk blocks that make up the file system are copied directly without interpretation. This approach is simpler and potentially offers much higher throughput.
The simplest form of physical backup is to copy the raw contents of the disk drives to Venti. Main advantage: coalescing duplicate blocks.
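The effect of coalescing during a raw copy can be sketched with two successive images of the same disk. The block size, the `server` dictionary, and the `backup` function are hypothetical, and SHA-1 is assumed as the fingerprint.

```python
import hashlib

BLOCK_SIZE = 512  # hypothetical; kept small so the example is easy to follow
server = {}       # stand-in for the server: fingerprint -> block


def backup(image: bytes) -> list:
    """Copy every block of a raw image; return its fingerprints in order."""
    fps = []
    for off in range(0, len(image), BLOCK_SIZE):
        block = image[off:off + BLOCK_SIZE]
        fp = hashlib.sha1(block).digest()
        server.setdefault(fp, block)  # duplicate blocks are coalesced here
        fps.append(fp)
    return fps


# Eight blocks: six identical empty blocks, then an "A" and a "B" block.
monday = bytes(BLOCK_SIZE) * 6 + b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
tuesday = bytearray(monday)
tuesday[7 * BLOCK_SIZE:] = b"C" * BLOCK_SIZE  # the last block changes

backup(monday)
stored_after_monday = len(server)
backup(bytes(tuesday))
new_blocks = len(server) - stored_after_monday
assert stored_after_monday == 3 and new_blocks == 1
```

Both backups copy all eight blocks, yet the second one adds only the single block that changed.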
The new version of Plan 9 uses Venti instead of an optical jukebox. This equalizes access to the active and archival views of the file system. It also allows the cache to be quite small.
The implementation uses an append-only log of data blocks and an index that maps fingerprints to locations in this log. One main goal of the prototype is robustness.
Storage of data blocks is separated from the indexes used to locate them. In particular, blocks are stored in an append-only log on a RAID array of disk drives.
To ease maintenance, the log is divided into self-contained structures called arenas. Each arena contains a large number of data blocks and is sized to facilitate operations such as copying to removable media.
Data blocks are variable sized, up to a current limit of 52 KB. Each block is prefixed with a header that describes the contents of the block. The header provides integrity checking.
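Figure: block diagram of the Venti server (clients, network, block cache, index cache, index, data log).
Figure: format of the data log. An arena holds a header, data blocks, a directory, and a trailer. Data block header fields: magic, fingerprint, type, size, user, wtime, encoding, esize.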
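The separation between an append-only log and an index into it can be sketched as follows. This is a simplified illustration, not the real on-disk format: the header here carries only a fingerprint and a size (a hypothetical subset of the fields listed above, with assumed widths), arenas are not modeled, and the `DataLog` class is invented for the example.

```python
import hashlib
import struct

# Hypothetical header for this sketch: 20-byte fingerprint + 4-byte size.
HEADER = struct.Struct(">20sI")


class DataLog:
    def __init__(self):
        self.log = bytearray()  # append-only; existing bytes never change
        self.index = {}         # fingerprint -> offset of the block header

    def write(self, data: bytes) -> bytes:
        fp = hashlib.sha1(data).digest()
        if fp not in self.index:
            self.index[fp] = len(self.log)
            self.log += HEADER.pack(fp, len(data)) + data
        return fp

    def read(self, fp: bytes) -> bytes:
        off = self.index[fp]
        stored_fp, size = HEADER.unpack_from(self.log, off)
        start = off + HEADER.size
        data = bytes(self.log[start:start + size])
        # The header lets the reader check what it found at this offset.
        if stored_fp != fp or hashlib.sha1(data).digest() != fp:
            raise ValueError("corrupt block at offset %d" % off)
        return data

    def rebuild_index(self) -> dict:
        """Recreate the index by scanning the block headers in the log."""
        index, off = {}, 0
        while off < len(self.log):
            fp, size = HEADER.unpack_from(self.log, off)
            index[fp] = off
            off += HEADER.size + size
        return index
```

Because each block is self-describing, the index in this sketch holds no information that cannot be recomputed from the log alone.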
The uncached sequential read performance is particularly bad. The problem is that these sequential reads require a random read of the index. One possible solution is a form of read-ahead. When reading a block from the data log, it is feasible to also read several following blocks. These extra blocks can be added to the caches without referencing the index.
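The read-ahead idea can be sketched with a toy log in which each block is preceded by a header carrying its fingerprint. The header layout, the `append` helper, the `Reader` class, and the read-ahead depth are all assumptions for illustration; the cache here is unbounded, which a real server would not allow.

```python
import hashlib
import struct

HEADER = struct.Struct(">20sI")  # hypothetical: fingerprint + data size


def append(log: bytearray, index: dict, data: bytes) -> bytes:
    fp = hashlib.sha1(data).digest()
    if fp not in index:
        index[fp] = len(log)
        log += HEADER.pack(fp, len(data)) + data
    return fp


class Reader:
    def __init__(self, log, index, readahead=4):
        self.log, self.index, self.readahead = log, index, readahead
        self.cache = {}         # block cache: fingerprint -> data
        self.index_lookups = 0  # counts the expensive random index reads

    def read(self, fp: bytes) -> bytes:
        if fp in self.cache:
            return self.cache[fp]
        self.index_lookups += 1
        off = self.index[fp]
        # Read the requested block plus the next few blocks in the log.
        # Each header names its own fingerprint, so the extra blocks can
        # be cached without consulting the index.
        for _ in range(1 + self.readahead):
            if off >= len(self.log):
                break
            next_fp, size = HEADER.unpack_from(self.log, off)
            start = off + HEADER.size
            self.cache[next_fp] = bytes(self.log[start:start + size])
            off = start + size
        return self.cache[fp]
```

When blocks are read in the order they were written, most requests in this sketch are served from the cache and only every fifth read touches the index.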
Integrity checking and error recovery are of fundamental importance. Several tools are implemented along with Venti to achieve this: verifying the structure of an arena, checking that there is a one-to-one correspondence between blocks in the data log and entries in the index, and copying an arena to removable media.
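A consistency check between a log and its index might look like the sketch below. It does not describe the actual Venti tools: the header layout (fingerprint plus size), the `check` function, and the wording of the reported problems are hypothetical, and a truncated log is not handled.

```python
import hashlib
import struct

HEADER = struct.Struct(">20sI")  # hypothetical: fingerprint + data size


def check(log: bytes, index: dict) -> list:
    """Return the problems found; an empty list means log and index agree."""
    problems, seen, off = [], set(), 0
    while off < len(log):
        fp, size = HEADER.unpack_from(log, off)
        start = off + HEADER.size
        data = log[start:start + size]
        # Every block must still hash to the fingerprint in its header.
        if hashlib.sha1(data).digest() != fp:
            problems.append(("bad fingerprint", off))
        # Every block in the log must have a matching index entry.
        if index.get(fp) != off:
            problems.append(("missing or wrong index entry", off))
        seen.add(fp)
        off = start + size
    # Every index entry must point at a block that exists in the log.
    for fp in index.keys() - seen:
        problems.append(("index entry without block", index[fp]))
    return problems
```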
There is also a type identifier associated with each block. This integer is included in every read or write operation and has the effect of partitioning the server into multiple independent domains.
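The partitioning effect of the type identifier can be pictured by making the type part of the lookup key. The `TypedStore` class and the two type constants are hypothetical names chosen for this sketch, and SHA-1 is assumed as the fingerprint.

```python
import hashlib

DATA, POINTER = 0, 1  # hypothetical type values for this example


class TypedStore:
    def __init__(self):
        self._blocks = {}  # (type, fingerprint) -> data

    def write(self, block_type: int, data: bytes) -> bytes:
        fp = hashlib.sha1(data).digest()
        self._blocks.setdefault((block_type, fp), data)
        return fp

    def read(self, block_type: int, fp: bytes) -> bytes:
        # A block is only visible within the domain it was written to.
        return self._blocks[(block_type, fp)]


store = TypedStore()
fp = store.write(DATA, b"payload")
assert store.read(DATA, fp) == b"payload"
try:
    store.read(POINTER, fp)  # same fingerprint, different domain
    found = True
except KeyError:
    found = False
assert not found
```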
Several systems are similar to Venti. The Stanford Archival Vault, unlike Venti, has no way to share data between objects that are partially the same. Another system is the Read-Only Secure File System, though its focus is security rather than archival storage. Finally, the Elephant file system could incorporate Venti as the storage device for the permanent versions of files.