13.79508
every so often i need to render some data in human-typeable form. sometimes that leads me to my one-word-per-byte encoding. other times it leads me to the various n characters per m bytes encodings.
assuming we are encoding evenly distributed data, and that we want to be able to seek within that data without reading from the front, we generally break the text to encode or decode into fixed-sized blocks. each block gets transformed into another fixed size. the ratio of those fixed sizes is approximately ratio of the logarithm of the alphabet sizes.
we can rearrange that to find alphabet sizes for different block ratios, 256 (the alphabet size of a byte) raised to various powers, rounded up.
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | |
---|---|---|---|---|---|---|---|---|
1 | 256 | |||||||
2 | 16 | 256 | ||||||
3 | 41 | 256 | ||||||
4 | 16 | 64 | 256 | |||||
5 | 28 | 85 | 256 | |||||
6 | 16 | 41 | 102 | 256 | ||||
7 | 24 | 53 | 115 | 256 | ||||
8 | 16 | 32 | 64 | 128 | 256 | |||
9 | 22 | 41 | 75 | 138 | ||||
10 | 16 | 28 | 49 | 85 | ||||
11 | 21 | 35 | 57 | |||||
12 | 16 | 26 | 41 | |||||
13 | 20 | 31 | ||||||
14 | 16 | 24 | ||||||
15 | 20 | |||||||
16 | 16 |
the ones i see around the most are base 16, 64, 32, and 85. there are at least two proposals for base 41 around.
base 28? the only code i see out there turns seven nibbles into six digits, which could also be done with base 26. but there are some nice properties available - case-insensitivity is with an alphabet of “@ABCDEFGHIJKLMNOPQRSTUVWXYZ[” or “`abcdefghijklmnopqrstuvwxyz{”.
i want:
- fast and simple constant time implementation
- embeddable in url fragments without further encoding
- preserves sorting
- relatively easy to narrate
- compact as possible given the other constraints
- base 16 (a-p or e-t) fulfills everything except 5
- base 24 (a-x) is a bit more complicated but a bit more compact
- base 28 (@-[?) is equally complicated, and a bit more compact
- base 41 could be done with (A-Oa-z) or similar, still more complicated, and a bit harder to narrate