try·st·imu·li

13.77771

i realized that the extra byte of subtag in atlv unions was unnecessary and indeed harmful. in particular, it has been pushing me to overload tags when using a union would be natural, because the subtag wouldn’t indicate anything.

some context:

atlv is a generic data encoding scheme i devised because all the others suck. it had an extremely tiny specification:

value:	quant | binary | union | array
quant:	tag…00(vlq) (vlq)
binary:	tag…01(vlq) len(vlq) byte[len]
union:	tag…10(vlq) (vlq) value
array:	tag…11(vlq) len(vlq) value[len]
vlq:	0xxxxxxx | 1xxxxxxx vlq
byte:	xxxxxxxx
tag:	xxxxx | xxxxxxx tag

which has now gotten slightly smaller with this change:

-union:	tag…10(vlq) (vlq) value
+union:	tag…10(vlq) value
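to make the one-byte difference concrete, here is a minimal python sketch of an encoder for the grammar above. it is not the reference implementation. it assumes the vlq is big-endian (most significant 7-bit group first, continuation bit on every byte but the last) and that `tag…00` means the two kind bits are the low bits of a single vlq. the function names are made up for illustration.

```python
# kind bits taken from the grammar: 00 quant, 01 binary, 10 union, 11 array
QUANT, BINARY, UNION, ARRAY = 0, 1, 2, 3

def vlq(n: int) -> bytes:
    """base-128 integer, assumed big-endian; high bit set on all but the last byte."""
    if n < 0:
        raise ValueError("vlq encodes non-negative integers only")
    groups = [n & 0x7F]                    # last byte, continuation bit clear
    n >>= 7
    while n:
        groups.append(0x80 | (n & 0x7F))   # earlier bytes carry the continuation bit
        n >>= 7
    return bytes(reversed(groups))

def head(tag: int, kind: int) -> bytes:
    """tag with the two kind bits appended as its low bits, as one vlq."""
    return vlq((tag << 2) | kind)

def quant(tag: int, n: int) -> bytes:
    return head(tag, QUANT) + vlq(n)

def binary(tag: int, data: bytes) -> bytes:
    return head(tag, BINARY) + vlq(len(data)) + data

def array(tag: int, values: list) -> bytes:
    # values are already-encoded atlv values; len is the element count
    return head(tag, ARRAY) + vlq(len(values)) + b"".join(values)

def union(tag: int, value: bytes) -> bytes:
    """new form: the tag alone selects the variant."""
    return head(tag, UNION) + value

def old_union(tag: int, subtag: int, value: bytes) -> bytes:
    """old form, with the extra subtag vlq."""
    return head(tag, UNION) + vlq(subtag) + value

inner = quant(1, 5)
print(old_union(0, 0, inner).hex())  # 02000405
print(union(0, inner).hex())         # 020405
```

the subtag is itself a vlq, so the saving is one byte only while the subtag fits in seven bits, which covers the always-zero case.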

union is targeted at implementing sum types. but a compact encoding already makes the tag context-dependent: you know from context which type you’re encoding, so you can just give each constructor of a sum type its own tag.

but in converge i have a sum type of sum types (for all the capabilities), which is the obvious place to use union. except that the union has this extra subtag, which would then always be zero. so i got clever, and decided to encode the type of the inner sum type into the tag, along with its constructor.

unfortunately that makes it hard to keep track of which subtags go to which types, and it’s unclear how to manage adding more types to the outer sum type. sorting also becomes complicated, because logical sorting no longer matches lexicographic sorting.

but with the above change it now costs only a byte per capability to explicitly tag the type with a union. and it preserves sort order and makes the first character or two in base16/41/64 (whatever i end up using) very clearly designate the type of capability represented.
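a small python sketch of the sort-order point, under loud assumptions: the capability type numbers and constructor numbers are hypothetical, each constructor is modelled as a quant with a small payload, and the vlq is assumed big-endian as before. it only demonstrates the case where every tag is under 32, so each head fits in one byte.

```python
QUANT, UNION = 0, 2  # kind bits from the grammar

def vlq(n: int) -> bytes:
    """base-128 integer, assumed big-endian; high bit set on all but the last byte."""
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(0x80 | (n & 0x7F))
        n >>= 7
    return bytes(reversed(groups))

def head(tag: int, kind: int) -> bytes:
    return vlq((tag << 2) | kind)

def capability(cap_type: int, constructor: int, payload: int) -> bytes:
    """outer union tagged by capability type, wrapping one inner value.

    the inner value is a quant here purely for illustration.
    """
    inner = head(constructor, QUANT) + vlq(payload)
    return head(cap_type, UNION) + inner

# hypothetical (type, constructor, payload) triples, deliberately out of order
caps = [(2, 0, 7), (0, 1, 9), (1, 0, 3), (0, 0, 5)]

by_bytes = sorted(capability(*c) for c in caps)      # lexicographic byte order
by_logic = [capability(*c) for c in sorted(caps)]    # logical order
prefixes = [e.hex()[:2] for e in by_bytes]           # first two base16 characters

print(by_bytes == by_logic)  # True
print(prefixes)              # ['02', '02', '06', '0a']
```

with single-byte heads the first two base16 characters are exactly the outer union’s head, so they name the capability type directly.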
