groff 1.23.0 added .MR to its -man macro package. The NEWS file states
that the inclusion of the macro "was prompted by its introduction to
Plan 9 from User Space's troff in August 2020." From d32deab it seems
that the name for Plan 9 from User Space's implementation was suggested
by groff maintainer G. Branden Robinson.

I'm not sure whether the intention was to make these definitions compatible, but
it would be nice if they were.

Currently, Plan 9 from User Space's .MR expects its second argument to
be parenthesized. groff's .MR does not. This results in extra
parentheses appearing in manual references when viewing Plan 9 from User
Space's manual pages on a system using groff.
468 lines
11 KiB
Groff
.TH VENTI 7
.SH NAME
venti \- archival storage server
.SH DESCRIPTION
Venti is a block storage server intended for archival data.
In a Venti server, the SHA1 hash of a block's contents acts
as the block identifier for read and write operations.
This approach enforces a write-once policy, preventing
accidental or malicious destruction of data. In addition,
duplicate copies of a block are coalesced, reducing the
consumption of storage and simplifying the implementation
of clients.
.PP
This manual page documents the basic concepts of
block storage using Venti as well as the Venti network protocol.
.PP
.MR Venti 1
documents some simple clients.
.MR Vac 1 ,
.MR vacfs 4 ,
and
.MR vbackup 8
are more complex clients.
.PP
.MR Venti 3
describes a C library interface for accessing
Venti servers and manipulating Venti data structures.
.PP
.MR Venti 8
describes the programs used to run a Venti server.
.PP
.SS "Scores
The SHA1 hash that identifies a block is called its
.IR score .
The score of the zero-length block is called the
.IR "zero score" .
.PP
Scores may have an optional
.IB label :
prefix, typically used to
describe the format of the data.
For example,
.MR vac 1
uses a
.B vac:
prefix, while
.MR vbackup 8
uses prefixes corresponding to the file system
types:
.BR ext2: ,
.BR ffs: ,
and so on.
.SS "Files and Directories
Venti accepts blocks up to 56 kilobytes in size.
By convention, Venti clients use hash trees of blocks to
represent arbitrary-size data
.IR files .
The data to be stored is split into fixed-size
blocks and written to the server, producing a list
of scores.
The resulting list of scores is split into fixed-size pointer
blocks (using only an integral number of scores per block)
and written to the server, producing a smaller list
of scores.
The process continues, eventually ending with the
score for the hash tree's top-most block.
Each file stored this way is summarized by
a
.B VtEntry
structure recording the top-most score, the depth
of the tree, the data block size, and the pointer block size.
One or more
.B VtEntry
structures can be concatenated
and stored as a special file called a
.IR directory .
In this
manner, arbitrary trees of files can be constructed
and stored.
.PP
Scores passed between programs conventionally refer
to
.B VtRoot
blocks, which contain descriptive information
as well as the score of a directory block containing a small number
of directory entries.
.PP
Conventionally, programs do not mix data and directory entries
in the same file. Instead, they keep two separate files, one with
directory entries and one with metadata referencing those
entries by position.
Keeping this parallel representation is a minor annoyance
but makes it possible for general programs like
.I venti/copy
(see
.MR venti 1 )
to traverse the block tree without knowing the specific details
of any particular program's data.
.SS "Block Types
To allow programs to traverse these structures without
needing to understand their higher-level meanings,
Venti tags each block with a type. The types are:
.PP
.nf
.ft L
VtDataType 000 \f1data\fL
VtDataType+1 001 \fRscores of \fPVtDataType\fR blocks\fL
VtDataType+2 002 \fRscores of \fPVtDataType+1\fR blocks\fL
\fR\&...\fL
VtDirType 010 VtEntry\fR structures\fL
VtDirType+1 011 \fRscores of \fLVtDirType\fR blocks\fL
VtDirType+2 012 \fRscores of \fLVtDirType+1\fR blocks\fL
\fR\&...\fL
VtRootType 020 VtRoot\fR structure\fL
.fi
.PP
The octal numbers listed are the type numbers used
by the commands below.
(For historical reasons, the type numbers used on
disk and on the wire are different from the above.
They do not distinguish
.BI VtDataType+ n
blocks from
.BI VtDirType+ n
blocks.)
.SS "Zero Truncation
To avoid storing the same short data blocks padded with
differing numbers of zeros, Venti clients working with fixed-size
blocks conventionally
`zero truncate' the blocks before writing them to the server.
For example, if a 1024-byte data block contains the
11-byte string
.RB ` hello " " world '
followed by 1013 zero bytes,
a client would store only the 11-byte block.
When the client later read the block from the server,
it would append zero bytes to the end as necessary to
reach the expected size.
.PP
When truncating pointer blocks
.RB ( VtDataType+ \fIn
and
.BI VtDirType+ n
blocks),
trailing zero scores are removed
instead of trailing zero bytes.
.PP
Because of the truncation convention,
any file consisting entirely of zero bytes,
no matter what its length, will be represented by the zero score:
the data blocks contain all zeros and are thus truncated
to the empty block, and the pointer blocks contain all zero scores
and are thus also truncated to the empty block,
and so on up the hash tree.
.SS Network Protocol
A Venti session begins when a
.I client
connects to the network address served by a Venti
.IR server ;
the conventional address is
.BI tcp! server !venti
(the
.B venti
port is 17034).
Both client and server begin by sending a version
string of the form
.BI venti- versions - comment \en \fR.
The
.I versions
field is a list of acceptable versions separated by
colons.
The protocol described here is version
.BR 02 .
The client is responsible for choosing a common
version and sending it in the
.B VtThello
message, described below.
.PP
After the initial version exchange, the client transmits
.I requests
.RI ( T-messages )
to the server, which subsequently returns
.I replies
.RI ( R-messages )
to the client.
The combined act of transmitting (receiving) a request
of a particular type, and receiving (transmitting) its reply
is called a
.I transaction
of that type.
.PP
Each message consists of a sequence of bytes.
Two-byte fields hold unsigned integers represented
in big-endian order (most significant byte first).
Data items of variable lengths are represented by
a one-byte field specifying a count,
.IR n ,
followed by
.I n
bytes of data.
Text strings are represented similarly,
using a two-byte count with
the text itself stored as a UTF-encoded sequence
of Unicode characters (see
.MR utf 7 ).
Text strings are not
.SM NUL\c
-terminated:
.I n
counts the bytes of UTF data, which include no final
zero byte.
The
.SM NUL
character is illegal in text strings in the Venti protocol.
The maximum string length in Venti is 1024 bytes.
.PP
Each Venti message begins with a two-byte size field
specifying the length in bytes of the message,
not including the length field itself.
The next byte is the message type, one of the constants
in the enumeration in the include file
.BR <venti.h> .
The next byte is an identifying
.IR tag ,
used to match responses to requests.
The remaining bytes are parameters of different sizes.
In the message descriptions, the number of bytes in a field
is given in brackets after the field name.
The notation
.IR parameter [ n ]
where
.I n
is not a constant represents a variable-length parameter:
.IR n [1]
followed by
.I n
bytes of data forming the
.IR parameter .
The notation
.IR string [ s ]
(using a literal
.I s
character)
is shorthand for
.IR s [2]
followed by
.I s
bytes of UTF-8 text.
The notation
.IR parameter []
where
.I parameter
is the last field in the message represents a
variable-length field that comprises all remaining
bytes in the message.
.PP
All Venti RPC messages are prefixed with a field
.IR size [2]
giving the length of the message that follows
(not including the
.I size
field itself).
The message bodies are:
.ta \w'\fLVtTgoodbye 'u
.IP
.ne 2v
.B VtThello
.IR tag [1]
.IR version [ s ]
.IR uid [ s ]
.IR strength [1]
.IR crypto [ n ]
.IR codec [ n ]
.br
.B VtRhello
.IR tag [1]
.IR sid [ s ]
.IR rcrypto [1]
.IR rcodec [1]
.IP
.ne 2v
.B VtTping
.IR tag [1]
.br
.B VtRping
.IR tag [1]
.IP
.ne 2v
.B VtTread
.IR tag [1]
.IR score [20]
.IR type [1]
.IR pad [1]
.IR count [2]
.br
.B VtRread
.IR tag [1]
.IR data []
.IP
.ne 2v
.B VtTwrite
.IR tag [1]
.IR type [1]
.IR pad [3]
.IR data []
.br
.B VtRwrite
.IR tag [1]
.IR score [20]
.IP
.ne 2v
.B VtTsync
.IR tag [1]
.br
.B VtRsync
.IR tag [1]
.IP
.ne 2v
.B VtRerror
.IR tag [1]
.IR error [ s ]
.IP
.ne 2v
.B VtTgoodbye
.IR tag [1]
.PP
Each T-message has a one-byte
.I tag
field, chosen and used by the client to identify the message.
The server will echo the request's
.I tag
field in the reply.
Clients should arrange that no two outstanding
messages have the same tag field so that responses
can be distinguished.
.PP
The type of an R-message will either be one greater than
the type of the corresponding T-message or
.BR VtRerror ,
indicating that the request failed.
In the latter case, the
.I error
field contains a string describing the reason for failure.
.PP
Venti connections must begin with a
.B hello
transaction.
The
.B VtThello
message contains the protocol
.I version
that the client has chosen to use.
The fields
.IR strength ,
.IR crypto ,
and
.IR codec
could be used to add authentication, encryption,
and compression to the Venti session
but are currently ignored.
The
.I rcrypto
and
.I rcodec
fields in the
.B VtRhello
response are similarly ignored.
The
.IR uid
and
.IR sid
fields are intended to be the identity
of the client and server but, given the lack of
authentication, should be treated only as advisory.
The initial
.B hello
should be the only
.B hello
transaction during the session.
.PP
The
.B ping
message has no effect and
is used mainly for debugging.
Servers should respond immediately to pings.
.PP
The
.B read
message requests a block with the given
.I score
and
.IR type .
Use
.I vttodisktype
and
.I vtfromdisktype
(see
.MR venti 3 )
to convert a block type enumeration value
.RB ( VtDataType ,
etc.)
to the
.I type
used on disk and in the protocol.
The
.I count
field specifies the maximum expected size
of the block.
The
.I data
in the reply is the block's contents.
.PP
The
.B write
message writes a new block of the given
.I type
with contents
.I data
to the server.
The response includes the
.I score
to use to read the block,
which should be the SHA1 hash of
.IR data .
.PP
The Venti server may buffer written blocks in memory,
waiting until after responding to the
.B write
message before writing them to
permanent storage.
The server will delay the response to a
.B sync
message until after all blocks in earlier
.B write
messages have been written to permanent storage.
.PP
The
.B goodbye
message ends a session. There is no
.BR VtRgoodbye :
upon receiving the
.BR VtTgoodbye
message, the server terminates the connection.
.PP
Version
.B 04
of the Venti protocol is similar to version
.B 02
(described above)
but has two changes to accommodate larger payloads.
First, it replaces the leading 2-byte packet size with
a 4-byte size.
Second, the
.I count
in the
.B VtTread
packet may be either 2 or 4 bytes;
the total packet length distinguishes the two cases.
.SH SEE ALSO
.MR venti 1 ,
.MR venti 3 ,
.MR venti 8
.br
Sean Quinlan and Sean Dorward,
``Venti: a new approach to archival storage'',
.IR "Usenix Conference on File and Storage Technologies" ,
2002.
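
As an aside on the manual page's "Zero Truncation" section, the convention can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from Plan 9 from User Space: the function names are made up for this sketch, and the choice to ignore a partial score at the end of a pointer block is my own assumption rather than something the page specifies.

```python
import hashlib

SCORE_SIZE = 20                          # a SHA1 score is 20 bytes
ZERO_SCORE = hashlib.sha1(b"").digest()  # score of the zero-length block


def score(block: bytes) -> bytes:
    """Return the SHA1 score that identifies a block."""
    return hashlib.sha1(block).digest()


def truncate_data(block: bytes) -> bytes:
    """Zero-truncate a data block: drop trailing zero bytes."""
    return block.rstrip(b"\0")


def truncate_pointers(block: bytes) -> bytes:
    """Zero-truncate a pointer block: drop trailing zero scores.

    Assumption for this sketch: bytes past the last whole score are ignored.
    """
    end = len(block) - len(block) % SCORE_SIZE
    while end >= SCORE_SIZE and block[end - SCORE_SIZE:end] == ZERO_SCORE:
        end -= SCORE_SIZE
    return block[:end]


def extend_data(block: bytes, size: int) -> bytes:
    """Undo data truncation after a read by padding with zero bytes."""
    return block.ljust(size, b"\0")
```

With these helpers, the 1024-byte `hello world` block from the manual page truncates to 11 bytes and pads back to its original contents, and an all-zero data block of any length truncates to the empty block, whose score is the zero score.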
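
The version 02 message framing described under "Network Protocol" in the manual page can be sketched the same way. Again this is a hedged illustration, not the libventi implementation: the helper names are hypothetical, and the numeric value used for `VtTread` is an assumption that should be checked against the enumeration in `<venti.h>`. The `block_type` argument is taken to be the on-wire type, so any conversion from `VtDataType` and friends is left to the caller.

```python
import struct

VT_TREAD = 12           # assumed value; take the real constant from <venti.h>
MAX_STRING_LEN = 1024   # maximum string length stated in the manual page


def pack_string(text: str) -> bytes:
    """Encode string[s]: a two-byte big-endian count, then UTF-8 bytes."""
    data = text.encode("utf-8")
    if b"\0" in data:
        raise ValueError("NUL is illegal in Venti text strings")
    if len(data) > MAX_STRING_LEN:
        raise ValueError("string longer than 1024 bytes")
    return struct.pack(">H", len(data)) + data


def pack_tread(tag: int, score: bytes, block_type: int, count: int) -> bytes:
    """Build a version 02 VtTread: tag[1] score[20] type[1] pad[1] count[2]."""
    if len(score) != 20:
        raise ValueError("a score is exactly 20 bytes")
    body = struct.pack(">BB20sBBH", VT_TREAD, tag, score, block_type, 0, count)
    # size[2] counts the bytes that follow it, not the size field itself.
    return struct.pack(">H", len(body)) + body


def split_message(buf: bytes):
    """Split one framed message into (message type, tag, parameter bytes)."""
    (size,) = struct.unpack(">H", buf[:2])
    body = buf[2:2 + size]
    return body[0], body[1], body[2:]
```

A request built with `pack_tread` is 28 bytes long: the two-byte size field followed by a 26-byte body. Version 04 would need a four-byte size field instead, which this sketch does not cover.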