try·st·imu·li

Versioned Nodes Technical Introduction

Versioned nodes is a computer storage paradigm enabling distributed collaboration, either in real time or with disconnected operation.

It is designed to replace file systems, object stores, and some databases. Collaboration by exchanging files is an intrinsically asynchronous process, and requires version control which, outside of programming, is often a tedious and manual process. Centralized databases provide for real-time collaboration, but typically require constant connectivity to edit (and often view) their data. Versioned nodes allow for both real-time and asynchronous collaboration in a single paradigm, without requiring centralization or constant connectivity.

Versioned nodes are more akin to files in a vast file system than to any contemporary database system. However, the paradigm departs from files in a number of ways, the three most important of which are:

  1. Version control of files is mostly provided by manual intervention, but all modifiable nodes are inherently version controlled: each contains references to the parent node (or nodes) that was modified to create that version.
  2. Access control of files is enforced by the software of the machines that contain them. In contrast, read access to a node requires not only possession of the node but also a shared key to decrypt it. Likewise, write access to a modifiable node requires a secret key to sign the newly created node.
  3. Files contain only data and thus can reference other files only in ad-hoc and indirect ways, and cannot convey access to those other files. In contrast, nodes contain data, unforgeable references to other nodes, the shared keys required to read those other nodes, and, optionally, the secret keys required to update them.

These differences make versioned nodes as suitable for a loosely connected network of machines controlled by a variety of operators as files are for a single machine.

Overview

This is a technical manual for versioned nodes. It first introduces the general paradigm, then discusses how applications and programs might interact with that paradigm.

Understand that this document does not dictate exchange formats, protocols, or even application programming interfaces for working with versioned nodes. We are concerned with the data storage paradigm, not a specification of an implementation of that paradigm.

Capabilities

Versioned nodes uses capability security to provide access control. To fetch a node or verify its authenticity, you use a fetch capability. To read the data on that node, you need its read capability. Creating a new version of a (mutable) node requires a write capability. If you hold a capability, you can perform the corresponding action; if you don’t, you can’t.

These capabilities can be embedded in nodes, so a particular sort of access to one node can provide that same sort of access to others.

Generations

This paradigm makes an explicit affordance for a variety of underlying formats and encryption algorithms. We call each combination of these a generation. Each node and the capabilities that operate on that node belong to the same generation.
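As a rough illustration of the rule that a node's capabilities share one generation, the sketch below checks a bundle of capabilities before use. The `Generation` tag, the placeholder capability structs, and `shared_generation` are all hypothetical names invented for this example; the paradigm does not prescribe how a generation is represented.

```rust
/// Hypothetical generation tag; the paradigm does not fix its representation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Generation(pub u16);

/// Placeholder capabilities that carry only their generation.
/// Real capabilities would also hold key material, omitted here.
#[derive(Debug, Clone, Copy)]
pub struct FetchCap {
    pub generation: Generation,
}

#[derive(Debug, Clone, Copy)]
pub struct ReadCap {
    pub generation: Generation,
}

#[derive(Debug, Clone, Copy)]
pub struct WriteCap {
    pub generation: Generation,
}

/// Returns the generation shared by all presented capabilities,
/// or `None` if any of them disagrees with the fetch capability.
pub fn shared_generation(
    fetch: FetchCap,
    read: ReadCap,
    write: Option<WriteCap>,
) -> Option<Generation> {
    let generation = fetch.generation;
    if read.generation != generation {
        return None;
    }
    match write {
        Some(w) if w.generation != generation => None,
        _ => Some(generation),
    }
}
```

A system might run a check like this before attempting any decryption, so that mismatched capabilities are rejected early.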

Nodes

The core of versioned nodes is, of course, the nodes themselves. A node is the fusion of some opaque data, an ordered list of links (bundles of capabilities) to other nodes, and, for mutable nodes, a list of previous versions.

Nodes should be limited in size, both in the number of links and in the amount of data. Typically, the entire node must be read into memory to verify its integrity, both to check it against the fetch capability and to check any authenticated encryption.

Blobs

The simplest nodes are known as blobs. They are immutable, in the sense that once a fetch capability has been created for a blob, it will always point at the same data and links.

struct Blob {
    data: Bytes,
    links: Vec<Link<FetchCap>>,
}

Versions

Versions form the basis of modifiable nodes. Each version node is itself immutable, since its version fetch capability will always return that version, but a braid of versions shares a single braid fetch capability that addresses all of them.

struct Version {
    data: Bytes,
    links: Vec<Link<FetchCap>>,
    parents: Vec<Parent>,
}

The parents of a version allow the system to build a directed acyclic graph of versions, and to track the set of latest versions: those that no other version marks as a parent. If that set contains multiple versions, the application can then merge them, possibly with the assistance of the operator.
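The set of latest versions can be computed directly from the parent lists. The following is a minimal sketch, assuming versions are identified by a plain integer `VersionId` standing in for a version fetch capability; both that alias and the `tips` function are names made up for this example.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in for a version fetch capability (assumption: any hashable id works).
pub type VersionId = u64;

/// Given each known version and the parents it names, returns the versions
/// that no other known version marks as a parent, sorted ascending.
pub fn tips(parents_of: &HashMap<VersionId, Vec<VersionId>>) -> Vec<VersionId> {
    // Every id that appears in some version's parent list.
    let named_as_parent: HashSet<VersionId> =
        parents_of.values().flatten().copied().collect();

    let mut tips: Vec<VersionId> = parents_of
        .keys()
        .copied()
        .filter(|id| !named_as_parent.contains(id))
        .collect();
    tips.sort_unstable();
    tips
}
```

If two versions both name the same parent and nothing names them, both are returned and the application has concurrent updates to merge. A merged version that names both as parents then becomes the only tip.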

The links and parents on a node are an ordered list of related capabilities, each link conveying access to another node.

struct Link<T> {
    fetch: T,
    read: ReadCap,
    write: Option<WriteCap>,
}

struct Parent {
    fetch: VersionFetchCap,
    read: Option<ReadCap>,
}

In exchange formats, these links will be partially or completely encrypted. The write capabilities must be encrypted so that extracting them requires both the read and write capabilities of the node. The read capabilities are encrypted so that extracting them requires the read capability. However, the fetch capabilities might be left unencrypted, be encrypted to require the read capability, or be encrypted just enough to be concealed during cold storage. Encrypting fetch capabilities provides better privacy, but leaving them unencrypted allows nodes to be synchronized and garbage collected without hosts having access to the node’s content.

The Programming Interface

Most programs should not interact with the various formats of nodes that go on the wire or are stored to disk. Instead, they interact with the decrypted and normalized nodes presented in the previous section.

Again, this section lays out the basic paradigm and makes some recommendations; it is not intended to be a formal specification.

Opening Nodes

The central mechanism is to fetch and read the content of a node.

For this the application requires the fetch capability to identify the node, the read capability to decrypt it, and, optionally, the write capability to decrypt other write capabilities.

When the fetch capability refers to a blob, there’s no ambiguity about what is returned.

fn open_blob(link: Link<BlobFetchCap>)
    -> Result<Blob, Error>;

When the fetch capability refers to a specific version, we don’t necessarily know what braid that version is part of, so the system should return that too.

fn open_version(link: Link<VersionFetchCap>)
    -> Result<(BraidFetchCap, Version), Error>;

When the fetch capability refers to a braid of versions, things are even more uncertain. We don’t know which version will be returned and, because the system (and format) might implement migrations, the version returned could even belong to a completely separate braid.

fn open_braid(link: Link<BraidFetchCap>)
    -> Result<(Link<BraidVersionFetchCap>, Version), Error>;

struct BraidVersionFetchCap {
    braid: BraidFetchCap,
    version: VersionFetchCap,
}

Fetch and Read Errors

There are two error conditions for fetching nodes.

The more common would be a timeout: the node service could not find the requested node in the time available. This could be because the requested node is not present on the local machine or any of its available peers, or because the requested node simply does not exist.

The other is a mismatch between the capabilities presented. If the API doesn’t use the abbreviated Link, this could be capabilities with different generations. If the capabilities have the same generation, the read or write capability may fail authenticated decryption, indicating that either the capability or the fetched node is invalid.
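One way an interface might surface these conditions is as a small error type. This is only a sketch: the `OpenError` name, its variants, and the `is_retryable` helper are assumptions made for illustration, since this document leaves the shape of `Error` open.

```rust
use std::fmt;

/// Hypothetical error type for opening a node.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OpenError {
    /// The node service could not find the node in the time available.
    Timeout,
    /// The presented capabilities belong to different generations.
    GenerationMismatch,
    /// Authenticated decryption failed: the capability or the node is invalid.
    AuthenticationFailed,
}

impl OpenError {
    /// Assumption: only a timeout is worth retrying, because the node may
    /// still arrive from a peer later. A mismatch will not fix itself.
    pub fn is_retryable(self) -> bool {
        matches!(self, OpenError::Timeout)
    }
}

impl fmt::Display for OpenError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let message = match self {
            OpenError::Timeout => "node not found in the time available",
            OpenError::GenerationMismatch => "capabilities have different generations",
            OpenError::AuthenticationFailed => "authenticated decryption failed",
        };
        f.write_str(message)
    }
}

impl std::error::Error for OpenError {}
```

Separating the two mismatch cases lets an application tell a caller error (mixed generations) apart from a possibly corrupt or forged node.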

Saving Nodes

A system to open nodes avails us little without some way to create them.

Creating a blob requires the contents of the blob and an optional write capability. The system can create a new write capability if needed, or omit it entirely if no write capabilities are being saved. The read and fetch capabilities are derived from the contents of the blob, and need to be returned to the application.

fn create_blob(blob: Blob, write: Option<WriteCap>)
    -> Result<Link<BlobFetchCap>, Error>;
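To illustrate what deriving a fetch capability from a blob's contents could look like, here is a toy sketch. It uses the standard library's `DefaultHasher` purely as a stand-in: that hasher is not cryptographic and its algorithm is not guaranteed to stay the same across Rust releases, so a real generation would presumably use a cryptographic hash instead. The `BlobFetchCap(u64)` wrapper and `derive_blob_fetch_cap` are hypothetical names for this example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy fetch capability: a 64-bit digest of the blob's contents.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct BlobFetchCap(pub u64);

/// Derives a fetch capability from the blob's data and the fetch
/// capabilities of its links, so the same contents yield the same capability.
/// NOTE: DefaultHasher is a non-cryptographic placeholder, for illustration only.
pub fn derive_blob_fetch_cap(data: &[u8], links: &[BlobFetchCap]) -> BlobFetchCap {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    links.hash(&mut hasher);
    BlobFetchCap(hasher.finish())
}
```

Because the capability depends only on the contents, saving the same data and links twice yields the same capability, which matches the description of blobs as immutable.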

Creating a new version of a braid requires the contents of the version, what braid that version is for, and the read and write capabilities for that version.

fn create_version(version: Version, caps: WriteLink)
    -> Result<Link<VersionFetchCap>, Error>;

struct WriteLink {
    fetch: BraidFetchCap,
    read: ReadCap,
    write: WriteCap,
}

But if there’s no existing braid, we need to be able to create one. We may want to specify the generation of braid to create, or just use the default.

fn create_braid(generation: Option<Generation>)
    -> Result<WriteLink, Error>;

Querying for Generations

Different generations may have different properties. They’ll use different cryptography, maybe with post-quantum signatures. Some might not encrypt the data on the node (e.g. so they can be sent via amateur radio where encryption is disallowed), and some may encrypt fetch capabilities along with the data and the read capabilities.

Many applications won’t especially care about all this, but some might, and systems should support queries for specific properties. Systems might even let a sophisticated user specify which generation they want when the application queries.
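A property query of this kind might look like the sketch below. Everything here is hypothetical: the `GenerationProperties` fields, the `GenerationQuery` struct, and `find_generation` are invented for illustration, and a real system would likely expose more properties than these three.

```rust
/// Hypothetical description of what one generation provides.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct GenerationProperties {
    pub id: u16,
    pub encrypts_data: bool,
    pub encrypts_fetch_caps: bool,
    pub post_quantum_signatures: bool,
}

/// A query where `None` means "don't care" for that property.
#[derive(Debug, Clone, Copy, Default)]
pub struct GenerationQuery {
    pub encrypts_data: Option<bool>,
    pub encrypts_fetch_caps: Option<bool>,
    pub post_quantum_signatures: Option<bool>,
}

impl GenerationQuery {
    /// True when every property the query specifies matches the generation.
    pub fn matches(&self, g: &GenerationProperties) -> bool {
        self.encrypts_data.map_or(true, |want| want == g.encrypts_data)
            && self.encrypts_fetch_caps.map_or(true, |want| want == g.encrypts_fetch_caps)
            && self
                .post_quantum_signatures
                .map_or(true, |want| want == g.post_quantum_signatures)
    }
}

/// Returns the first available generation satisfying the query, if any.
/// Assumption: `available` is ordered by the system's own preference.
pub fn find_generation(
    available: &[GenerationProperties],
    query: &GenerationQuery,
) -> Option<GenerationProperties> {
    available.iter().copied().find(|g| query.matches(g))
}
```

An application that must transmit over amateur radio could set only `encrypts_data: Some(false)` and leave the other fields as `None`, while an application with no preference passes the default query and receives the system's first choice.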

Finding Versions of a Braid

In order to merge concurrent updates, programs need to be able to fetch a list of current tips. For some formats, this only requires the fetch capability for the braid, but others might encrypt the version history and require the read capability to decrypt it.

fn fetch_braid_versions(braid: Link<BraidFetchCap>)
    -> Result<Vec<VersionFetchCap>, Error>;

Subscribing to a Braid

Some programs may want notifications of when new versions of a braid are available.

fn subscribe(braid: Link<BraidFetchCap>)
    -> Stream<Link<BraidVersionFetchCap>>;

Sharing or Saving

How can a program that holds no capabilities to existing nodes save its output such that the operator can find it again? Similarly, how do you share capabilities across other, non-node, channels?

fn share_node(node: Link<FetchCap>)
    -> Result<(), Error>;

Shells

Some access is not needed by applications directly, but supports the applications and the management of a system working with versioned nodes: organizing nodes and managing connections to peers (both machines and people), for example. Shells manage this sort of thing, and systems should provide a separate programming interface for shells that is not available to general applications.

Migration

Braids of versions may be finalized. Finalization indicates that no version should be accepted unless it is an ancestor of the final version, and names a successor braid on which modifications continue. This is used to transition between storage methods and crypto-systems, and to recover from compromised write capabilities.
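The acceptance rule for a finalized braid amounts to an ancestry check over the version graph. The sketch below assumes versions are identified by a plain integer `VersionId` standing in for a version fetch capability, and assumes the final version itself is also accepted; both `VersionId` and `is_accepted_after_finalization` are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in for a version fetch capability (assumption for this sketch).
pub type VersionId = u64;

/// Returns true if `candidate` is the final version or one of its ancestors,
/// walking parent links from the final version.
pub fn is_accepted_after_finalization(
    parents_of: &HashMap<VersionId, Vec<VersionId>>,
    candidate: VersionId,
    final_version: VersionId,
) -> bool {
    let mut stack = vec![final_version];
    let mut seen = HashSet::new();
    while let Some(id) = stack.pop() {
        if id == candidate {
            return true;
        }
        // Visit each version once, even if reachable along several paths.
        if seen.insert(id) {
            if let Some(parents) = parents_of.get(&id) {
                stack.extend(parents.iter().copied());
            }
        }
    }
    false
}
```

A version on a side branch that was never merged into the final version fails this check, so it would be rejected once the braid is finalized.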

Finalizing a braid requires the migrate capability. This capability is typically unavailable to programs directly. It is split into a node-specific part that is stored on the originating machine (and possibly shared with the operator’s other machines) and an operator-specific part that is either separately encrypted on the machine(s) or stored in an external secure enclave.
