try·st·imu·li

Versioned Nodes Technical Introduction

Versioned nodes is a computer storage paradigm enabling distributed collaboration, either in real time or with disconnected operation.

It is designed to replace file systems, object stores, and some databases. Collaboration by exchanging files is an intrinsically asynchronous process, and requires version control which, outside of programming, is often a tedious and manual process. Centralized databases provide for real-time collaboration, but typically require constant connectivity to edit (and often view) their data. Versioned nodes allow for both real-time and asynchronous collaboration in a single paradigm, without requiring centralization or constant connectivity.

Versioned nodes are more akin to files in a vast file system than any contemporary database system. However, the paradigm departs from files in a number of ways, the three most important of which are:

  1. Version control of files is mostly provided my manual intervention, but all modifiable nodes are inherently version controlled by containing references to the parent node (or nodes) that was modified to create that version.
  2. Access control of files is enforced by the software of the machines that contain them. Read access to a node requires not only possession of the node, but a shared key to decrypt it. Likewise, write access to a modifiable node requires a secret key to sign the newly created node.
  3. Files contain only data and thus can reference other files only in ad-hoc and indirect ways, and can not convey access to that other file. In contrast, nodes contain data, unforgeable references to other nodes, the shared keys required to read those other nodes, and, optionally, the secret keys required to update them.

These differences make versioned nodes as natural on a loosely connected network of machines controlled by a variety of operators as files are on a single machine.

Overview

This is a technical manual for versioned nodes. We will detail the both the general paradigm and different components of systems working within that paradigm. First though, a general introduction.

A node is the fusion of some data, opaque to the system storing it, and some structured metadata concerning the relationship that node has with other nodes, which is available to that system. Each extant node has a unique identifier, and different versions that represent a single modifiable node share another identifier. These identifiers are perhaps analogous to inodes in a file system - they are not intended for human consumption.

Applications that work with versioned nodes interact with a local service that manages the local storage and caching of nodes, encryption, decryption, hashing, signing, and verifying of nodes, exchange of nodes with other machines, and so forth. The API presented by the service may vary, but must contain some basic functionality, similarly to how the file system API on different operating systems varies, but provides generally similar functionality.

There may also be various formats of nodes and protocols for exchanging them between machines. There are basic requirements for those formats and protocols, but different machines may support different ones (especially when it comes to protocols) and still interoperate to some extent.

Nodes

Abstractly, a node is a structure that contains some opaque data and a set of links to other nodes with some capabilities. From a graph theory perspective, a node represents a vertex in a directed graph and a set of outgoing edges. Additional information about each link may be in the opaque data or the node metadata.

Every node has a couple basic capabilities:

  • a fetch capability, which uniquely identifies the node and allows it to be verified for authenticity.
  • a read capability, with which the node to be may be read once fetched.

Modifiable nodes have additional capabilities:

  • an write capability, which allows new versions to be stored.
  • a migrate capability, which finalizes the node and indicates a new node, with different capabilities, to use for further modification.

Nodes should be limited in size. Both in the number of links and the amount of data. The entire node will need to be read into memory to verify its integrity, both to verify the fetch capability and to check any authenticated encryption.

Each link contains at least a fetch and read capability.

struct Link {
    fetch: FetchCap,
    read: Cap<Read>,
    write: Optional<Cap<Store>>,
}

migrate capabilities are not stored in links. See the migration section for details.

A link set is an array of links that has been sorted and deduplicated by the fetch capability.

struct LinkSet (Vec<Link>);

Blobs

The simplest nodes are known as blobs. They are immutable, in the sense that once a fetch capability has been created for a blob, it will always point at the same data.

struct Blob {
    data: Bytes,
    links: LinkSet,
}

Storing blobs is deterministic, a blob with the same data, links, storage method and context will produce identical fetch and read capabilities. The data and read capabilities are all encrypted as to require the read capability to expose (note! there are unencrypted storage methods that have empty read capabilities!). The context is used to prevent guessing the other content of an encrypted blob, and, even for unencrypted blobs, is also used to encrypt any write capabilities so that they may not be used without it. This context is typically the write capability of a containing version. The fetch capabilities are left as plaintext.

Versions

Versions form the bases of modifiable nodes. While each version is immutable, its fetch capability will always return that version, a braid of versions share another fetch capability that will return the most recent versions available.

struct Version {
    data: Bytes,
    links: LinkSet,
    parents: Vec<VersionFetchCap>,
}

Storing versions is also deterministic. Again, the data and read capabilities are encrypted as to require the read capability to expose (and again that may be with a null cipher for unencrypted storage methods). Again, even in unencrypted versions, any write capabilities must be encrypted, in this case as to require the write capability for that version to decrypt. All fetch capabilities, including for parent versions, are left as plaintext.

Version Finalization

Braids of versions may be finalized, indicating that no version that is not an ancestor of the final version should be accepted and indicating a successor braid to continue modifications on. This is used to transition between storage methods and crypto-systems, and to recover from compromised write capabilities.

Finalizing a braid requires the migrate capability. This is typically unavailable to applications and split into a node-specific part that is stored on the originating machine (and possibly shared with the operator’s other machines) and an operator-specific part that is either separately encrypted on the machine(s) or potentially stored in an external secure enclave.

Finals do not contain operator or application data, but they may contain read and write capabilities.

struct Final {
    last_version: VersionFetchCap,
    next_fetch: BraidFetchCap,
    next_read: Optional<ReadCap>,
    next_store: Optional<StoreCap>,
}

The read and write capabilities for the next braid, if present, are encrypted such that they require the read and write capabilities for the previous braid to open.

Capabilities

Capabilities are unforgeable tokens that convey the authority to do something and may be sent between actors in a system. In formatted nodes capabilities are typically cryptographic keys or digests, but these may or may not be exposed to applications! This implies that applications should attempt to communicate them other than through nodes.

There are three types of capabilities that appear in nodes, fetch, read, and write. There are also three types of fetch capabilities, which indicate what sort of node should be fetched, a blob, a version, or a braid.

enum FetchCap {
    Blob(BlobFetchCap),
    Version(VersionFetchCap),
    Braid(BraidFetchCap),
}

That gives five basic types of capabilities. Those basic capabilities should share a general structure of a generation identifier, which indicates what cryptography and format of node is in use, followed by an opaque byte string.

struct Cap<A> {
    generation: u32,
    content: Bytes,
    _phantom: Phantom<A>,
}

All capabilities for a particular node would use the same generation, but nodes of some formats may contain capabilities for nodes of other generations. Different systems may ascribe different meaning to the generations, which again implies that applications should not attempt to communicate them except attached to other nodes. This allows the collection of capabilities forming the Link for a node to be abbreviated somewhat:

struct Link<A> {
    fetch: A,
    read_raw: Bytes,
    write_raw: Option<Bytes>,
}

to ensure all the capabilities are at least of the same generation.

The Application Programming Interface

Applications do not, generally, interact with the various formats of nodes that go on the wire or are stored to disk. Instead, they interact with the decrypted and normalized nodes presented in the previous section.

Fetching and Reading

The most basic requirement is to fetch the content of a node.

For this the application requires the fetch capability to identify the node, the read capability to decrypt it, and, optionally, the write capability to decrypt other write capabilities.

When the fetch capability refers to a blob or a specific version, there’s no ambiguity what is returned. However, if the fetch capability refers to a braid, any version of that braid may be returned, so reading that braid should also return the fetch capability for the version read. This is mostly important if the node may be updated - so that the proper version can be referenced as a parent.

In a non-embedded context, there’s not much reason to fetch part of a node. As noted above nodes should be limited in size. An embedded API might add that functionality, or may simply have lower limits on data size.

A fetch API might look something like this:

class FetchNode m where
  fetchBlob :: Link BlobFetchCap -> m Blob
  fetchVersion :: Link VersionFetchCap -> m Version
  fetchBraid :: Link BraidFetchCap
             -> m (NonEmpty (VersionFetchCap, Version))

Fetch and Read Errors

There are two error conditions for fetching nodes.

The more common would be a timeout, that the node service could not find the requested node in the time available. This could because the requested node is not present on the local machine or any of its available peers, or the requested node could simply not exist.

Another option is that there is a mismatch between the capabilities presented. If the API doesn’t use the abbreviated Link, this could be capabilities with different generations. If the capabilities have the same generation, the read or write capability may fail authenticated decryption, indicating that either the capability or the fetched node is invalid.

Storing or Updating

A system to fetch nodes avails us little without some way to store them. Storing is a bit more complicated than fetching though.

Storing a blob requires the blob and optionally a generation or write capability. With neither, the node service can use an empty context with the default generation. With just the generation, the node service cannot store write capabilities. And with a write capability, the service can then store them. The read and fetch capabilities for the blob are generated from the content of the blob and returned.

Storing a version is somewhat simpler. It requires a version, and the fetch, read and write capability for the braid. The fetch capability for the created version is returned. Of course, the fetch, read and write capabilities for a braid must be created somehow, so there must be an additional call to create them.

class StoreNode m where
  storeBlob :: Maybe (Either Generation WriteCap) -> Blob -> m BlobCaps
  storeVersion :: BraidWriteCaps -> Version -> m VersionFetchCap
  createBraid :: Maybe Generation -> m BraidWriteCaps

Subscribing

Some applications may want notifications of when new versions of a braid are available.

class SubscribeNode m where
  subscribe :: BraidFetchCap -> (VersionFetchCap -> m ()) -> m ()
  cancelSubscription :: BraidFetchCap -> m ()

Sharing or Saving

How can a program that holds no capabilities to existing nodes save its output such that the operator can find it again? Similarly, how do you share capabilities across other, non-node, channels?

class SaveNode m where
  saveNode :: NodeCaps -> m ()
published updated