13.79508

every so often i need to render some data in human-typeable form. sometimes that leads me to my one-word-per-byte encoding. other times it leads me to the various n characters per m bytes encodings.

assuming we are encoding evenly distributed data, and that we want to be able to seek within that data without reading from the front, we generally break the text to encode or decode into fixed-sized blocks. each block gets transformed into another fixed size. the ratio of those fixed sizes is approximately ratio of the logarithm of the alphabet sizes.

we can rearrange that to find alphabet sizes for different block ratios, 256 (the alphabet size of a byte) raised to various powers, rounded up.

	1	2	3	4	5	6	7	8
1	256
2	16	256
3		41	256
4		16	64	256
5			28	85	256
6			16	41	102	256
7				24	53	115	256
8				16	32	64	128	256
9					22	41	75	138
10					16	28	49	85
11						21	35	57
12						16	26	41
13							20	31
14							16	24
15								20
16								16

the ones i see around the most are base 16, 64, 32, and 85. there are at least two proposals for base 41 around.

base 28? the only code i see out there turns seven nibbles into six digits, which could also be done with base 26. but there are some nice properties available - case-insensitivity is with an alphabet of “@ABCDEFGHIJKLMNOPQRSTUVWXYZ[” or “`abcdefghijklmnopqrstuvwxyz{”.

i want:

fast and simple constant time implementation
embeddable in url fragments without further encoding
preserves sorting
relatively easy to narrate
compact as possible given the other constraints

base 16 (a-p or e-t) fulfills everything except 5
base 24 (a-x) is a bit more complicated but a bit more compact
base 28 (@-[?) is equally complicated, and a bit more compact
base 41 could be done with (A-Oa-z) or similar, still more complicated, and a bit harder to narrate

published 2024-01-15

← 13.78558
one very popular fully encrypted transport protocol over udp would make traffic analysis of fully encrypted protocols much harder.
13.79965 →
so i’m playing with a virtual machine idea based on closures. basic idea is that each procedure takes a fixed number of inputs, and a call either adds insufficient inputs and so returns a new closure or adds enough inputs and so invokes the procedure (which returns whatever it returns).