Venti: a new approach to archival storage

Contents

  • Introduction & background

  • The Venti Archival Server

  • Applications

  • Implementation

  • Performance

  • Reliability & recovery

  • Related & future work

  • Conclusion & Critique

Introduction & background

 

Archival storage is a second-class function in current computing environments.

 

Storage capacity now exceeds the ability of many users to generate data, making it practical to archive data in perpetuity.

 

Write-once policy

Introduction & background

 

A prevalent form of archival storage is magnetic tape.

Restoring data from a tape can be tedious and error prone.

A trade-off exists between the performance of backup and restore operations.

Introduction & background

 

Snapshots avoid the trade-off between full and incremental backups.

The Venti Archival Server

 

Venti is a block-level network storage system.

It identifies data blocks by a hash of their contents.

Storage is write-once, and writing duplicate data is idempotent.
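
As a minimal sketch of content-addressed, write-once storage: the example below assumes SHA-1 as the hash, and the `BlockStore` class and its method names are hypothetical, not Venti's actual interface. Because the address is derived from the contents, writing the same block twice yields the same fingerprint and stores nothing new.

```python
import hashlib


class BlockStore:
    """Toy in-memory block store addressed by the SHA-1 of each block."""

    def __init__(self):
        self._blocks = {}  # fingerprint (bytes) -> block contents (bytes)

    def write(self, data: bytes) -> bytes:
        fingerprint = hashlib.sha1(data).digest()
        # Write-once: an existing block is never overwritten, so a repeated
        # write of the same contents is a no-op (idempotent).
        self._blocks.setdefault(fingerprint, data)
        return fingerprint

    def read(self, fingerprint: bytes) -> bytes:
        data = self._blocks[fingerprint]
        # The reader can verify the block against the address it asked for.
        if hashlib.sha1(data).digest() != fingerprint:
            raise ValueError("block does not match its fingerprint")
        return data

    def __len__(self):
        return len(self._blocks)


store = BlockStore()
fp1 = store.write(b"hello venti")
fp2 = store.write(b"hello venti")  # duplicate write, coalesced
```

Both writes return the same fingerprint, and the store still holds a single block.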

The Venti Archival Server

 

Magnetic disk storage, however, is not as stable or permanent as optical media.

Using magnetic disks for Venti has the benefit of reducing the disparity in performance between conventional and archival storage.

Applications

Applications use the block-level service provided by Venti to store more complex data structures.

Data is divided into blocks and written to the server. To enable this data to be retrieved, the application must record the fingerprints of these blocks in additional blocks, called pointer blocks.
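
The idea of recording fingerprints in further blocks can be sketched as a small tree builder. This is only an illustration under stated assumptions: SHA-1 as the hash, a dictionary standing in for the server, and hypothetical names (`put`, `store_stream`, `load_stream`) with a tiny `BLOCK_SIZE` and `FANOUT` chosen so that a short input produces several levels.

```python
import hashlib

BLOCK_SIZE = 8          # hypothetical, tiny so the example builds a tree
FANOUT = 2              # hypothetical number of fingerprints per pointer block
FP_LEN = 20             # length of a SHA-1 digest in bytes

blocks = {}             # stands in for the server: fingerprint -> block


def put(block: bytes) -> bytes:
    fp = hashlib.sha1(block).digest()
    blocks.setdefault(fp, block)
    return fp


def store_stream(data: bytes):
    """Write data blocks, then pointer blocks, and return (root, depth)."""
    level = [put(data[i:i + BLOCK_SIZE]) for i in range(0, len(data), BLOCK_SIZE)]
    if not level:
        level = [put(b"")]
    depth = 0
    while len(level) > 1:
        # Pack groups of fingerprints into pointer blocks one level up.
        level = [put(b"".join(level[i:i + FANOUT]))
                 for i in range(0, len(level), FANOUT)]
        depth += 1
    return level[0], depth


def load_stream(root: bytes, depth: int) -> bytes:
    """Follow fingerprints from the root down to the data blocks."""
    block = blocks[root]
    if depth == 0:
        return block
    children = [block[i:i + FP_LEN] for i in range(0, len(block), FP_LEN)]
    return b"".join(load_stream(child, depth - 1) for child in children)


payload = b"archival storage with venti!"
root, depth = store_stream(payload)
```

A single root fingerprint (plus the tree depth, which this sketch returns separately) is enough to retrieve the whole stream, and storing identical data again yields the same root.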

[Figure: trees of pointer blocks P_0 ... P_4, each holding fingerprints H(p_1), H(p_2), ..., over data blocks D_0, D_1, ..., D_6, D_7, D_8; the roots are H(p_0) and H(p_3).]

Vac is an application for storing a collection of files and directories as a single object. 

VAC

An important attribute of vac is that it writes each file as a separate collection of Venti blocks, thus ensuring that duplicate copies of a file will be coalesced on the server. 

Vac also implements an incremental option based on the file modification times. 
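
A rough sketch of an incremental option driven by modification times, not vac's actual code: the `archive` function and its data layout are hypothetical, SHA-1 is assumed, and each file is reduced to a single fingerprint rather than a collection of blocks to keep the example short.

```python
import hashlib


def archive(files, previous=None):
    """files: name -> (mtime, contents); previous: name -> (mtime, fingerprint).

    Returns (snapshot, hashed) where snapshot maps name -> (mtime, fingerprint)
    and hashed lists the files whose contents actually had to be processed.
    """
    previous = previous or {}
    snapshot, hashed = {}, []
    for name, (mtime, contents) in sorted(files.items()):
        old = previous.get(name)
        if old is not None and old[0] == mtime:
            # Unchanged modification time: reuse the recorded fingerprint.
            snapshot[name] = old
        else:
            snapshot[name] = (mtime, hashlib.sha1(contents).hexdigest())
            hashed.append(name)
    return snapshot, hashed


day1 = {"a.txt": (100, b"alpha"), "b.txt": (100, b"beta")}
day2 = {"a.txt": (100, b"alpha"), "b.txt": (200, b"beta v2")}
snap1, hashed1 = archive(day1)
snap2, hashed2 = archive(day2, previous=snap1)
```

On the second run only `b.txt` is processed again; `a.txt` keeps the entry recorded in the earlier snapshot.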

Applications

In physical backup, the disk blocks that make up the file system are copied directly without interpretation. This keeps the process simple and potentially allows much higher throughput.

Physical backup

The simplest form of physical backup is to copy the raw contents of the disk drives to Venti. Main advantage: coalescing duplicate blocks.
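
To illustrate how duplicate blocks coalesce when raw images are copied, here is a small sketch. The `backup_image` helper, the 4-byte `BLOCK_SIZE`, and the use of SHA-1 are assumptions for the example, and a dictionary stands in for the server.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical; real file systems use much larger blocks


def backup_image(image: bytes, store: dict) -> list:
    """Copy a raw image block by block; return the list of fingerprints."""
    fingerprints = []
    for offset in range(0, len(image), BLOCK_SIZE):
        block = image[offset:offset + BLOCK_SIZE]
        fp = hashlib.sha1(block).digest()
        store.setdefault(fp, block)   # duplicates cost no extra space
        fingerprints.append(fp)
    return fingerprints


store = {}
monday = b"AAAABBBBCCCCAAAA"
tuesday = b"AAAABBBBDDDDAAAA"   # one block changed since Monday
fps_mon = backup_image(monday, store)
fps_tue = backup_image(tuesday, store)
```

Two full images of four blocks each end up occupying only four distinct blocks in the store, because repeated and unchanged blocks are stored once.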

Applications

The new version of Plan 9 uses Venti instead of an optical jukebox. This equalizes access to the active and archival views of the file system. It also allows the cache to be quite small.

Plan 9 file system

Implementation

 

The implementation uses an append-only log of data blocks and an index that maps fingerprints to locations in this log. One main goal of the prototype is robustness.

Storage of data blocks is separated from the indexes used to locate them. In particular, blocks are stored in an append-only log on a RAID array of disk drives. 
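
The separation between the log and the index can be sketched as follows. This is an illustrative toy, not the prototype's code: the `LogStore` class, the record layout (length, fingerprint, data), and SHA-1 are all assumptions, and an in-memory buffer stands in for the RAID array.

```python
import hashlib
import io
import struct


class LogStore:
    """Append-only data log plus a separate fingerprint -> offset index."""

    def __init__(self):
        self.log = io.BytesIO()   # stands in for the on-disk data log
        self.index = {}           # fingerprint -> offset of the record

    def write(self, data: bytes) -> bytes:
        fp = hashlib.sha1(data).digest()
        if fp in self.index:      # already stored; nothing is appended
            return fp
        offset = self.log.seek(0, io.SEEK_END)
        # Record layout (an assumption): 4-byte length, fingerprint, data.
        self.log.write(struct.pack(">I", len(data)) + fp + data)
        self.index[fp] = offset
        return fp

    def read(self, fp: bytes) -> bytes:
        self.log.seek(self.index[fp])
        (size,) = struct.unpack(">I", self.log.read(4))
        stored_fp = self.log.read(20)
        data = self.log.read(size)
        if stored_fp != fp:
            raise ValueError("index points at the wrong record")
        return data


s = LogStore()
a = s.write(b"first")
b = s.write(b"second")
s.write(b"first")                 # duplicate, log does not grow
```

Because each record carries its own fingerprint, the index holds no information that cannot be recomputed from the log.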

To ease maintenance, the log is divided into self-contained structures called arenas. Each arena contains a large number of data blocks and is sized to facilitate operations such as copying to removable media. 

Data blocks are variable-sized, up to a current limit of 52 KB. Each block is prefixed with a header that describes the contents of the block. The header provides integrity checking.
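
A hedged sketch of how a per-block header can support integrity checking. The field names follow the header diagram in this deck (magic, fingerprint, type, size, wtime, encoding, esize; the user field is omitted for brevity), but the field widths, the magic value, the encoding codes, and the use of SHA-1 and zlib are assumptions made for the example.

```python
import hashlib
import struct
import zlib

MAGIC = 0x56454E54            # hypothetical magic number
# magic, fingerprint, type, size, wtime, encoding, esize (widths assumed)
HEADER = struct.Struct(">I20sBHIBH")
RAW, DEFLATE = 0, 1           # hypothetical encoding codes


def pack_block(data: bytes, block_type: int, wtime: int) -> bytes:
    encoded = zlib.compress(data)
    encoding = DEFLATE
    if len(encoded) >= len(data):     # compression did not help
        encoded, encoding = data, RAW
    header = HEADER.pack(MAGIC, hashlib.sha1(data).digest(), block_type,
                         len(data), wtime, encoding, len(encoded))
    return header + encoded


def unpack_block(record: bytes) -> bytes:
    magic, fp, _type, size, _wtime, encoding, esize = HEADER.unpack_from(record)
    if magic != MAGIC:
        raise ValueError("bad magic number")
    encoded = record[HEADER.size:HEADER.size + esize]
    data = zlib.decompress(encoded) if encoding == DEFLATE else encoded
    # Integrity check: the decoded data must match the recorded size and hash.
    if len(data) != size or hashlib.sha1(data).digest() != fp:
        raise ValueError("block failed integrity check")
    return data


record = pack_block(b"x" * 1000, block_type=1, wtime=0)
```

A record whose header or contents have been damaged fails the check in `unpack_block` instead of being returned silently.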

[Figure: block diagram with three clients, the network, the block cache, the index cache, and the index.]

[Figure: layout of the data log. The log is a sequence of arenas (arena_0, ...). Each arena has a header, data blocks, a directory, and a trailer. Each data block is a header followed by data (header_1 data, header_2 data, ...); the directory lists a header and an offset per block (header_1 offset, header_2 offset, ...). Block header fields: magic, fingerprint, type, size, user, wtime, encoding, esize.]

Performance

 

The uncached sequential read performance is particularly bad. The problem is that these sequential reads require a random read of the index.

One possible solution is a form of read-ahead. When reading a block from the data log, it is feasible to also read several following blocks. These extra blocks can be added to the caches without referencing the index.
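
The read-ahead idea can be sketched with a toy reader that counts index lookups. Everything here is hypothetical (the `ReadAheadReader` class, the `READ_AHEAD` value, SHA-1, an unbounded cache); it only illustrates that blocks following the requested one can enter the cache without touching the index, since each log record carries its own fingerprint.

```python
import hashlib

READ_AHEAD = 3   # hypothetical number of extra blocks to prefetch


class ReadAheadReader:
    def __init__(self, blocks):
        # The log keeps (fingerprint, data) records in write order.
        self.log = [(hashlib.sha1(b).digest(), b) for b in blocks]
        self.index = {fp: pos for pos, (fp, _) in enumerate(self.log)}
        self.cache = {}
        self.index_lookups = 0

    def read(self, fp: bytes) -> bytes:
        if fp in self.cache:
            return self.cache[fp]
        self.index_lookups += 1          # the expensive random index read
        pos = self.index[fp]
        # Read the wanted block plus the following ones in one pass; each
        # record carries its own fingerprint, so no index access is needed.
        for next_fp, data in self.log[pos:pos + 1 + READ_AHEAD]:
            self.cache[next_fp] = data
        return self.cache[fp]


blocks = [bytes([i]) * 16 for i in range(8)]
reader = ReadAheadReader(blocks)
result = [reader.read(fp) for fp, _ in reader.log]
```

Reading eight blocks in write order costs two index lookups in this sketch instead of eight.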

Reliability & Recovery

 

Integrity checking and error recovery are of fundamental importance. Several tools are implemented alongside Venti for this purpose: verifying the structure of an arena, checking that there is a one-to-one correspondence between blocks in the data log and entries in the index, and copying an arena to removable media.
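
A minimal sketch of a consistency check between the log and the index, assuming SHA-1 fingerprints, a list standing in for the log, and no duplicate blocks in it. The function names `scan_log` and `check_index` are hypothetical and do not correspond to the actual Venti tools.

```python
import hashlib


def scan_log(log):
    """Recompute fingerprint -> position from the log records themselves."""
    return {hashlib.sha1(data).digest(): pos for pos, data in enumerate(log)}


def check_index(log, index):
    """Return (missing, stale): log blocks absent from the index, and
    index entries that do not point at a matching block in the log."""
    expected = scan_log(log)
    missing = {fp for fp in expected if fp not in index}
    stale = {fp for fp, pos in index.items() if expected.get(fp) != pos}
    return missing, stale


log = [b"block-a", b"block-b", b"block-c"]
good_index = scan_log(log)

damaged = dict(good_index)
lost_fp = hashlib.sha1(b"block-b").digest()
del damaged[lost_fp]                                  # entry lost
ghost_fp = hashlib.sha1(b"never written").digest()
damaged[ghost_fp] = 7                                 # entry with no block
```

The same scan that powers the check also shows how an index could be rebuilt from the log alone, since `scan_log` needs nothing but the stored blocks.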

Reliability & Recovery

 

There is also a type identifier associated with each block. This integer is included in every read and write operation and has the effect of partitioning the server into multiple independent domains.
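
One way to picture the partitioning effect is to treat the type as part of the lookup key. The `TypedStore` class and the `DATA` and `POINTER` codes below are hypothetical, and SHA-1 is assumed; the sketch shows only that a block written under one type is not visible under another.

```python
import hashlib


class TypedStore:
    def __init__(self):
        self._blocks = {}   # (type, fingerprint) -> data

    def write(self, block_type: int, data: bytes) -> bytes:
        fp = hashlib.sha1(data).digest()
        self._blocks.setdefault((block_type, fp), data)
        return fp

    def read(self, block_type: int, fp: bytes) -> bytes:
        # A KeyError here means no block of this type has that fingerprint.
        return self._blocks[(block_type, fp)]


DATA, POINTER = 0, 1        # hypothetical type codes
store = TypedStore()
fp = store.write(DATA, b"same bytes")
```

Reading `fp` with type `DATA` succeeds, while reading the same fingerprint with type `POINTER` fails, so each type behaves like its own domain.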

Related & future work

 

Several systems are similar to Venti. The Stanford Archival Vault, unlike Venti, has no way to share data between objects that are partially the same. Another is the Read-Only Secure File System, though its focus is security rather than archival storage. Finally, the Elephant file system could incorporate Venti as the storage device for the permanent versions of files.

Related & Future work

 

  • Venti could be distributed across multiple machines

  • Venti provides little security

  • Similarities between files (what if the data in a block shifts?)

Conclusion & Critique

 

 

  • The use of disk technology complicates random reads and writes; solid-state technologies may be worth exploring.
  • The performance penalty of archiving new data blocks is too large for a system whose purpose is the perpetual archival of information.
  • The authors say that performance is not as good as they expected but that the results look promising; however, they do not elaborate on why.
  • It is not easy to adopt in an arbitrary application; the application must suit Venti's properties and behavior.
  • The implementation of the full index check is frankly naïve.

Thank you.

By Luis Roman
