Versioned Nodes Technical Introduction
Versioned nodes is a computer storage paradigm enabling distributed collaboration, either in real time or with disconnected operation.
It is designed to replace file systems, object stores, and some databases. Collaboration by exchanging files is asynchronous, but without version control or extreme care results in a mess of different conflicting versions. Centralized databases provide for real-time collaboration, but typically require constant connectivity to edit (and often view) their data. Versioned nodes allow for both real-time and asynchronous collaboration in a single paradigm, without requiring centralization or constant connectivity.
Versioned nodes are more akin to files in a vast file system than any contemporary database system. However, the paradigm departs from files in a number of ways, the three most important of which are:
- Where files require external version control systems, all modifiable nodes are inherently version controlled by containing references to the parent node (or nodes) that was modified to create that version.
- Access to files is controlled soley by the software of the machines that contain them, and encryption is an after-thought. Read access to a node requires not only possession of the node, but a shared key to decrypt it. Likewise, write access to a modifiable node requires a secret key to sign the newly created node.
- Files contain only data and thus can reference other files only in ad-hoc and indirect ways, and can not convey access to that other file. In contrast, nodes contain data, unforgeable references to other nodes, the shared keys required to read those other nodes, and, optionally, the secret keys required to update them.
These differences make versioned nodes as suited to a loosely connected network of machines with a variety of operators as files are suited to a single operator on a single machine.
Overview
This is a technical descriptions of the versioned nodes paradigm. As such it is focused on the paradigm, not the implementation or use of that paradigm.
Capabilities
Versioned nodes uses capabilities to provide access control. To fetch a
specific node or verify its authenticity, you must have the string of
bytes that forms a fetch
capability. For immutable nodes this is a
cryptographic hash, for mutable nodes it is either a signature
(identifying a specific version) or public key (identifying the braid of
versions).
Likewise, read the data on that node, you need its read
capability (a
shared encryption key). Creating a new version of a (mutable) node
requires a write
capability. You also need the write
capability to
read any write
capabilities embedded in the node.
If you have those capabilities you than can perform those actions, if you don’t, you can’t.
Generations
Versioned nodes makes an explicit affordance for a variety of underlying exchange formats, encryption algorithms, compression, and so forth. We call each combination of these a generation. A node and the capabilities to operate on that node have the same generation.
Nodes
Nodes are a fusion of some opaque data, an ordered list of parts (edges to nodes that are contained by this node), an ordered list of links (edges to nodes that are referenced by this node). Mutable nodes add a list of parents, the immediate predecessors of this node.
Nodes should be limited in size. Both in the number of links and the amount of data they hold. Most exchange formats will require the entire node loaded into memory to verify the authenticity of any piece of it.
Large amounts of data may be stored under a single node, by splitting it into smaller chunks, and constructing a tree of nodes. The leaves of the tree would contain only chunks of data, while the branches would list their child nodes as parts. See the byte sequence format for a one such node data format.
Blobs
The simplest nodes are known as blobs. They are immutable, in the sense
that once a fetch
capability has been created for a blob, it will
always point at the same data, parts, and links.
struct Blob {
caps: Caps<BlobId>,
data: Bytes,
parts: Vec<Edge>,
links: Vec<Edge>,
}
Blobs may independent (with no write
capability) or associated with a
braid (with the braid’s write
capability). They are typically
content-addressed; the same write
capability, data, parts, and links
will result in the same fetch
and read
capabilities.
The edges of independent blobs cannot contain write
capabilities.
Versions
Versions form the basis of modifiable nodes. While each version node is
itself immutable, as the version fetch
capability will always return
that version, a braid of versions share a single braid fetch
capability that addresses all of them.
struct Version {
caps: Caps<VersionId>,
braid: BraidId,
data: Bytes,
parts: Vec<Edge>,
links: Vec<Edge>,
parents: Vec<Parent>,
}
The parents of a version allow the system to build a directed acyclic graph of versions, and track a set of versions that don’t have other versions marking them as parents. If there are multiple latest versions in that set, the application can then merge them, possibly with the assistance of the operator.
It is usually desirable for exchange formats to deterministically derive the version identifiers. This allows multiple programs to automatically reconcile diverged versions. So long as the data, parts, links, and parents are the same, they would produce the same identifier.
Edges
The parts and links on a node are an ordered list of edges to other nodes. Each edge pairs some data with the capabilities that convey access that another node.
struct Edge {
data: Bytes,
caps: Caps<Id>,
}
struct Caps<T> {
fetch: T,
read: ReadCap,
write: Option<WriteCap>,
}
struct Parent {
fetch: VersionId,
read: Option<ReadCap>,
}
Exchange formats partially encrypt part edges, leaving only the fetch
capabilities exposed, and entirely encrypt link edges. This reflects a
trade-off between static confidentiality and exchange performance
(especially over high-latency networks). Links will generally only be
fetched from the network upon demand, while parts can be preemptively
fetched.
The Programming Interface
Most programs should not need to interact with the various exchange formats of nodes that go on the wire or are stored to disk. Instead, they interact with the decrypted and normalized nodes presented in the previous section.
Again, this lays out the basic paradigm, and makes some recommendations, and is not intended to be a formal specification.
Opening Nodes
The central mechanism is to fetch and read the content of a node.
For this the application requires the fetch
capability to identify the
node, the read
capability to decrypt it, and, optionally, the write
capability to decrypt other write
capabilities.
When the fetch
capability refers to a blob, there’s no ambiguity what
is returned.
fn open_blob (caps: Caps<BlobId>) ->
Result<Blob, Error>;
When the fetch
capability refers to a specific version, we don’t know
(necessarily) know what braid that version is part of. So the system
should return that too.
fn open_version (caps: Caps<VersionId>) ->
Result<(BraidId, Version), Error>;
When the fetch
capability refers to a braid of versions, things or
even more uncertain. The system (and format) might implement
migrations, and the version returned could belong to a completely
separate braid in addition to not knowing which version will be
returned.
fn open_braid (caps: Caps<BraidId>) ->
Result<(Caps<BraidVersionId>, Version), Error>;
struct BraidVersionId {
braid: BraidId,
version: VersionId,
}
Fetch and Read Errors
There are two error conditions for fetching nodes.
The more common would be a timeout, that the node service could not find the requested node in the time available. This could because the requested node is not present on the local machine or any of its available peers, or the requested node could simply not exist.
Another option is that there is a mismatch between the capabilities
presented. If the API doesn’t use the abbreviated Caps
, this could be
capabilities with different generations. If the capabilities have the
same generation, the read
or write
capability may fail authenticated
decryption, indicating that either the capability or the fetched node is
invalid.
Creating Nodes
A system to open nodes avails us little without some way to create them.
Creating a blob requires the contents of the blob and an optional
write
capability. The system can create a new one if needed, or not
have one at all if there are no write
capabilities being saved. The
read
and fetch
capabilities are derived from the contents of the
blob, and need to be returned to the application.
fn create_blob(braid: Option<WriteCap>
, data: Bytes, parts: Vec<Edge>, links: Vec<Edge>)
-> Result<Caps<BlobCap>, Error>;
Creating a new version of a braid requires the contents of the version,
what braid that version is for, and the read
and write
capabilities
for that version.
fn create_version(read: ReadCap, braid: WriteCap
, data: Bytes, parts: Vec<Edge>, links: Vec<Edge>
, parents: Vec<Parent>)
-> Result<Caps<VersionCap>, Error>;
But if there’s no existing braid, we need to be able to create one. We may want to specify the generation of braid to create, or just use the default.
fn create_braid(generation: Optional<Generation>)
-> Result<Caps<BraidId>, Error>;
Querying for Generations
Different generations may have different properties. They’ll use different cryptography, maybe with post-quantum signatures. Some might not encrypt the data on the node (e.g. so they can be sent via amateur radio where encryption is disallowed), some may compress the nodes. They may have different maximum amounts of data or numbers of edges. Determinism may be important, or not.
Many applications won’t especially care about all this, but some might, and systems should support queries for specific properties. Or even support a sophisticated user to specify what generation they want when the application queries.
Most applications though, should continue using the generation of the nodes the shell gives them to work with. Whether that’s an empty new braid or an existing one.
Finding Versions of a Braid
In order to merge concurrent updates, programs need to be able to fetch
a list of current tips. For some formats, this only requires the fetch
capability for the braid, but for others might encrypt the version
history and require the read
capability to decrypt.
fn fetch_braid_versions(braid: Caps<BraidId>)
-> Result<Vec<VersionCap>, Error>
Subscribing to a Braid
Some programs may want notifications of when new versions of a braid are available.
fn subscribe(braid: Caps<BraidId>)
-> Stream<Caps<BraidVersionId>>
Sharing or Saving
How can a program that holds no capabilities to existing nodes save its output such that the operator can find it again? Similarly, how do you share capabilities across other, non-node, channels?
fn share_node(braid: Caps<Id>) -> Result<(),Error>
Shells
There is a bunch of access that applications don’t need directly, but supports the applications and management of a system working with versioned nodes. Things like organizing nodes, managing connections to peers (both machines and people). Shells manage this sort of thing, and systems should have a separate programming interface for shells to interact with that is not available to general applications.
Migration
Braids of versions may be finalized, indicating that no version that is
not an ancestor of the final version should be accepted and indicating a
successor braid to continue modifications on. This allows transition
between generations, and recovery from compromised write
capabilities.
Finalizing a braid requires the migrate
capability. This is typically
unavailable to programs directly and split into a node-specific part
that is stored on the originating machine (and possibly shared with the
operator’s other machines) and an operator-specific part that is either
separately encrypted on the machine(s) or potentially stored in an
external secure enclave.
Not all generations will implement migration, see converge for one that does.