Versioned Nodes Byte Trees
Byte trees are a general purpose format for storing byte-oriented data in versioned nodes.
Format
The general idea is a tree of nodes. Each node may hold some data in its body, which is logically followed by the data held by the nodes linked to by its parts.
Body
The body directly holds zero or more bytes of the stored data.
Parts
The data section of each part is an 8-bit variable length quantity, indicating the total length to use of the linked node’s data. As the size must be greater than zero, the stored value is one less than the actual value.
Note: That value may not match the actual size of the data under the linked node. Extra data is ignored and insufficient data is padded with zeros, much like holes in a sparse file.
VLQ8
VLQ8 is a non-self-terminating encoding for natural numbers. Conceptually it maps natural numbers to known-length byte strings sorted from shortest to longest, and then lexiographically. That is, 0 is the empty byte string, 1 is the byte string containing a single zero byte, 257 is the byte string containing two empty bytes, and so forth.
toVLQ8 = rec [] where
rec xs x = if x == 0 then xs else
rec (((x-1) `mod` 256) : xs) ((x-1) `div` 256)
Links
Links should be empty.
Discussion
There are several possible strategies for writing data into a byte tree. Branch nodes may or may not contain data and the branching style is arbitrary.
Storing data in the branch nodes decreases both the total number of nodes and the time to access data near the beginning (and scattered throughout). However, that also increases write costs, as that data must be rewritten when its children change.
All the usual tradeoffs apply to the branching styles - balanced trees have better worst-case charactaristics, but specifically unbalanced trees (like leaf nodes as the first and last children of each branch) can handle some common access patterns faster (on average).