Versioned Nodes Technical Introduction
Versioned nodes is a computer storage paradigm enabling distributed collaboration, either in real time or with disconnected operation.
It is designed to replace file systems, object stores, and some databases. Collaboration by exchanging files is an intrinsically asynchronous process, and requires version control which, outside of programming, is often a tedious and manual process. Centralized databases provide for real-time collaboration, but typically require constant connectivity to edit (and often view) their data. Versioned nodes allow for both real-time and asynchronous collaboration in a single paradigm, without requiring centralization or constant connectivity.
Versioned nodes are more akin to files in a vast file system than any contemporary database system. However, the paradigm departs from files in a number of ways, the three most important of which are:
- Version control of files is mostly provided my manual intervention, but all modifiable nodes are inherently version controlled by containing references to the parent node (or nodes) that was modified to create that version.
- Access control of files is enforced by the software of the machines that contain them. Read access to a node requires not only possession of the node, but a shared key to decrypt it. Likewise, write access to a modifiable node requires a secret key to sign the newly created node.
- Files contain only data and thus can reference other files only in ad-hoc and indirect ways, and can not convey access to that other file. In contrast, nodes contain data, unforgeable references to other nodes, the shared keys required to read those other nodes, and, optionally, the secret keys required to update them.
These differences make versioned nodes as natural on a loosely connected network of machines controlled by a variety of operators as files are on a single machine.
Overview
This is a technical manual for versioned nodes. It first introduces the general paradigm, then discusses how applications and programs might interact with that paradigm.
Understand that this document does not dictate exchange formats, protocols, or even application programming interfaces for working with versioned nodes. It is a technical introduction, not a specification. In particular, there are a multitude of possible exchange formats, cryptography, and protocols that may fit into this paradigm that are not discussed here.
Nodes
The core of versioned nodes are of course the nodes. Nodes are the
fusion of some opaque data and a list of links to other nodes. These
links convey the capability to fetch
and read
those other nodes, and
may convey the capability to write
new versions of them.
Every node has a couple basic capabilities:
- a
fetch
capability, which uniquely identifies the node and allows it to be verified for authenticity. - a
read
capability, with which the node to be may be read once fetched.
Modifiable nodes have additional capabilities:
- an
write
capability, which allows new versions to be stored. - a
migrate
capability, which finalizes the node and indicates a new node, with different capabilities, to use for further modification.
Nodes should be limited in size. Both in the number of links and the
amount of data. Typically, the entire node must be read into memory to
verify its integrity, both to verify the fetch
capability and to check
any authenticated encryption.
Blobs
The simplest nodes are known as blobs. They are immutable, in the sense
that once a fetch
capability has been created for a blob, it will
always point at the same data.
struct Blob {
data: Bytes,
links: Vec<Link>,
}
Storing blobs is deterministic, a blob with the same data, links,
storage method and context will produce identical fetch
and read
capabilities. The data and read
capabilities are all encrypted as to
require the read
capability to expose (note! there are unencrypted
storage methods that have empty read
capabilities!). The context is
used to prevent guessing the other content of an encrypted blob, and,
even for unencrypted blobs, is also used to encrypt any write
capabilities so that they may not be used without it. This context is
typically the write
capability of a containing version. The fetch
capabilities are left as plaintext.
Versions
Versions form the bases of modifiable nodes. While each version is
immutable, its fetch
capability will always return that version, a
braid of versions share another fetch
capability that will return
the most recent versions available.
struct Version {
data: Bytes,
links: Vec<Link>,
parents: Vec<VersionCap>,
}
Storing versions is also deterministic. Again, the data and read
capabilities are encrypted as to require the read
capability to expose
(and again that may be with a null cipher for unencrypted storage
methods). Again, even in unencrypted versions, any write
capabilities
must be encrypted, in this case as to require the write
capability for
that version to decrypt. All fetch
capabilities, including for parent
versions, are left as plaintext.
Link Entries
The links on a node are an ordered list of capabilities, giving access to another node. Conceptually:
type Link = (FetchCap,ReadCap,Option<WriteCap>);
All the capabilities for a particular node will have the same generation
however, so actual data types will likely strip the generation from the
read
and write
capabilities.
In the exchange format, these links will be partially or completely
encrypted. The write
capabilities must be encrypted to require both
read
and write
capability of the node to extract. The read
capabilities are encrypted to require the read
capability, but the
read
capability may be null for unencrypted generations. However,
fetch
capabilities are often not encrypted, depending on the exchange
format. Encrypting fetch
capabilities provides better privacy, but
leaving them decrypted allows nodes to be synchronized and garbage
collected without hosts having access to their content.
Capabilities
Capabilities are unforgeable tokens that convey the authority to do something and may be sent between actors in a system. In formatted nodes capabilities are typically cryptographic keys or digests, but these may or may not be exposed to applications! This implies that applications should attempt to communicate them other than through nodes.
There are three types of capabilities that appear in nodes, fetch
,
read
, and write
. There are also three types of fetch
capabilities,
which indicate what sort of node should be fetched, a blob, a version,
or a braid.
enum FetchCap {
Blob(BlobCap),
Version(VersionCap),
Braid(BraidCap),
}
That gives five basic types of capabilities. Those basic capabilities should share a general structure of a generation identifier, which indicates what cryptography and format of node is in use, followed by an opaque byte string.
struct Cap<A> {
content: Bytes,
generation: usize,
}
type BlobCap = Cap<Blob>;
type BraidCap = Cap<Braid>;
type VersionCap = Cap<Version>;
type ReadCap = Cap<Read>;
type WriteCap = Cap<Write>;
In the service implementation of course, the size of the content may depend on the generation and type, but programs should not care.
The Programming Interface
Most programs should not interact with the various formats of nodes that go on the wire or are stored to disk. Instead, they interact with the decrypted and normalized nodes presented in the previous section.
Again, this lays out the basic paradigm, and makes some recommendations, and is not intended to be a formal specification.
Opening Nodes
The most basic requirement is to fetch the content of a node.
For this the application requires the fetch
capability to identify the
node, the read capability to decrypt it, and, optionally, the write
capability to decrypt other write
capabilities.
When the fetch
capability refers to a blob or a specific version,
there’s no ambiguity what is returned. However, if the fetch
capability refers to a braid, any version of that braid may be returned,
so reading that braid should also return the fetch
capability for the
version read. This is mostly important if the node may be updated - so
that the proper version can be referenced as a parent.
In a non-embedded context, there’s not much reason to fetch part of a node. As noted above nodes should be limited in size. An embedded API might add that functionality, or may simply have lower limits on data size.
A node opening interface might look something like this:
class OpenNode m where
openBlob :: Link BlobCap -> m Blob
openVersion :: Link VersionCap -> m (BraidCap, Version)
openBraid :: Link BraidCap -> m (Link KnownVersionCap, Version)
While all relevant information about a blob is present in the link, opening a particular version also will identify the braid that version belongs to (which may be needed to produce a replacement version).
Opening a braid is somewhat more complex. The returned version is similarly identified, but the braid might have been migrated. In which case, the returned version thus might be from a completely different braid! So the link must include both the version and braid capabilities.
struct KnownVersionCap {
braid: BraidCap,
version: VersionCap,
}
Links
Every function in the above definitions takes a Link
of some sort.
These links may be derived from individual entries of the links
present on opened blobs and versions.
struct Link<T> {
fetch: T,
read: ReadCap,
write: Option<WriteCap>,
}
Nodes may be opened for reading or writing - depending on whether the
write
capability is present in the provided link.
Fetch and Read Errors
There are two error conditions for fetching nodes.
The more common would be a timeout, that the node service could not find the requested node in the time available. This could because the requested node is not present on the local machine or any of its available peers, or the requested node could simply not exist.
Another option is that there is a mismatch between the capabilities
presented. If the API doesn’t use the abbreviated Link
, this could be
capabilities with different generations. If the capabilities have the
same generation, the read
or write
capability may fail authenticated
decryption, indicating that either the capability or the fetched node is
invalid.
Saving Nodes
A system to open nodes avails us little without some way to save them.
Saving a blob requires the blob and optionally a context. The context
can specify a generation, write
capability. Without the context, the
node service can use the default generation. With just the generation,
the node service cannot protect any write
capabilities attached to the
links - and should return an error if any are present. The read
and
fetch
capabilities for the blob are generated from the content of the
blob and returned.
Storing a version is slightly simpler. It requires a version, and the
fetch
, read
and write
capability for the braid. The fetch
capability for the created version is returned. Of course, the fetch
,
read
and write
capabilities for a braid must be created somehow, so
there must be an additional call to create them.
class SaveNode m where
saveBlob :: Maybe Context -> Blob -> m (Link BlobCap)
saveVersion :: WriteLink BraidCap -> Version -> m VersionCap
createBraid :: Maybe Generation -> m (WriteLink BraidCap)
WriteLink
here refers to a link that is guaranteed to have a write
capability, as that is obviously required in order to save a version.
struct WriteLink<T> {
fetch: T,
read: ReadCap,
write: WriteCap,
}
Finding Versions of a Braid
In order to merge concurrent updates, programs need to be able to fetch
a list of current tips. For some formats, this only requires the fetch
capability, but for others the version history may be encrypted and
require the read
capability to decrypt.
class LatestBraid m where
latestBraidVersions :: Link BraidCap -> m [VersionCap]
Subscribing to a Braid
Some programs may want notifications of when new versions of a braid are available.
class SubscribeBraid m where
subscribe :: Link BraidCap -> (Link VersionCap -> m ()) -> m ()
cancelSubscription :: BraidCap -> m ()
Sharing or Saving
How can a program that holds no capabilities to existing nodes save its output such that the operator can find it again? Similarly, how do you share capabilities across other, non-node, channels?
class ShareNode m where
shareNode :: Link FetchCap -> m ()
The Management Interface
Migration
Version Finalization
Braids of versions may be finalized, indicating that no version that is
not an ancestor of the final version should be accepted and indicating a
successor braid to continue modifications on. This is used to transition
between storage methods and crypto-systems, and to recover from
compromised write
capabilities.
Finalizing a braid requires the migrate
capability. This is typically
unavailable to programs directly and split into a node-specific part
that is stored on the originating machine (and possibly shared with the
operator’s other machines) and an operator-specific part that is either
separately encrypted on the machine(s) or potentially stored in an
external secure enclave.
Finals do not contain operator data, but they may contain read
capabilities.
struct Final {
last_version: VersionCap,
next_fetch: BraidCap,
next_read: Option<ReadCap>,
}
The read
capability for the next braid, if present, are encrypted such
that they require the read
capability for the previous braid to open.
This can allow the node service to return versions from the new braid in
response to queries for the old braid - without write
capability.