try·st·imu·li

13.77776

there are various exponential growth patterns in technology these days: compute, bandwidth, storage, display resolution… but not audio sampling rate.

we’re talking ten teraflops in game consoles, 10 gigabit wireless access points, 4 terabytes on a 22x80mm pcb, 7680x4320x30bpp screens… why don’t we have our podcasts streamed to us in 384kHz 64-bit surround sound?

because our ears can’t hear that much better.

the absolute dynamic range of human hearing is something like 140dB. each bit of linear pcm buys about 6dB, so that's roughly 23 bits. so in theory a 24-bit audio stream could represent the quietest sound you could hear and one at the threshold of pain at the same time. of course, you can't actually distinguish the first while the second is going on, which is to say that the momentary dynamic range is much lower than the absolute dynamic range.
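a quick python sketch of that conversion - sound pressure is an amplitude, so decibels map to bits via a factor of 20, not 10:

```python
import math

def db_to_bits(db: float) -> float:
    # a dynamic range of `db` decibels is a pressure (amplitude)
    # ratio of 10**(db/20); the bits of linear pcm needed to span
    # that ratio is log2 of it
    return math.log2(10 ** (db / 20))

def bits_to_db(bits: int) -> float:
    # inverse: each bit buys 20*log10(2), about 6.02dB
    return 20 * math.log10(2 ** bits)

print(f"{db_to_bits(140):.1f} bits")  # ~23.3
print(f"{bits_to_db(24):.1f} dB")     # ~144.5, comfortably over 140
```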

i haven’t (yet!) managed to find a good source for what our momentary dynamic range is (and maybe it’s called something else in the literature). but whatever it is, our recording and playback capabilities have had it covered for decades.

in the frequency domain, it's well established that human hearing tops out somewhere around 22kHz, and significantly lower than that for most adults. by the nyquist criterion you need a sample rate of at least twice a frequency to represent it, so anything over 44kHz covers essentially the whole human hearing range. we've been doing that commercially since the early 70s.
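a tiny numpy sketch of why that's the cutoff: once sampled at rate fs, a tone above fs/2 is indistinguishable from one folded back below it, so there's nothing to gain from capturing content up there:

```python
import numpy as np

fs = 44_100        # sample rate (hz)
f = 23_000         # a tone just above nyquist, fs/2 = 22_050

t = np.arange(441) / fs  # 10ms of sample instants
above = np.sin(2 * np.pi * f * t)
# its alias folds back to fs - f = 21_100 hz (phase-flipped)
folded = np.sin(2 * np.pi * (fs - f) * t)

# the two sampled sequences are identical up to sign
print(np.allclose(above, -folded))  # True
```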


we’re almost there with video too.

there’s little point in a 16k screen - the screen would need to extend outside your binocular vision for each pixel to contribute. we can make out about 60 pixels per degree. that’s 21.6k in a full circle, or ~150 million for a full sphere. 16k is 133 million pixels - that curved screen goes right around you.

each eye can see almost the full vertical range, and about 160° horizontally - as an upper bound, say 4/9ths of the sphere (160/360 of the horizontal circle). 4/9ths of 150 million is about 67 million, so 70 million pixels definitely covers the field of view of a single eye. so 140 million pixels at 120Hz for your vr goggles.
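the spherical pixel arithmetic from the last two paragraphs, as a python sketch:

```python
import math

ppd = 60                                  # pixels per degree we can resolve
print(ppd * 360)                          # 21_600 px in a full circle

px_per_sr = (ppd * 180 / math.pi) ** 2    # pixels per steradian
sphere = px_per_sr * 4 * math.pi          # 4*pi steradians in a sphere
print(f"{sphere / 1e6:.0f}M")             # ~149M for the full sphere

print(f"{15_360 * 8_640 / 1e6:.0f}M")     # 16k: ~133M pixels
print(f"{sphere * 4 / 9 / 1e6:.0f}M")     # one eye's 160° lune: ~66M
```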

much like hearing, human vision has a wide absolute dynamic range, about 90dB. and also much like hearing, the momentary dynamic range is significantly smaller. here at least we do encode on a perceptual, roughly logarithmic scale - known as gamma correction.
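the standard srgb transfer curve is the common example - a power law that spends more of the code values on the dark end, roughly matching perception:

```python
def srgb_encode(linear: float) -> float:
    # srgb: a short linear toe near black, then a ~1/2.4 power law
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1 / 2.4) - 0.055

# half the encoded range covers only the darkest ~21% of light
print(f"{srgb_encode(0.214):.3f}")  # ~0.500
```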

the more difficult problem is that there are whole swaths of colours that humans can see but that we can't reproduce in our typical colour spaces. we need more primaries in our displays to capture the greens, and further-out primaries for the deep violets and reds. and, even setting aside the problem of tetrachromats, we need to be using colour spaces that can talk about all those colours.

so maybe we get 40-bit colour: 16 bits of luminosity, plus 12 bits for each of two colour channels.

so that’s 140 million pixels at 120Hz and 40 bits per pixel, 670 Gbps. not quite 10 times the current display bandwidth.
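the multiplication, taking displayport 2.0's 80 Gbps as a stand-in for the current display link:

```python
goggles = 140e6 * 120 * 40 / 1e9   # pixels * hz * bits -> Gbps
print(goggles)                     # 672.0

dp2 = 80                           # displayport 2.0's raw rate, Gbps
print(goggles / dp2)               # 8.4 - not quite 10x
```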


the videos can then get bigger than the displays: the full 150 million pixels per eye, two eyes, 40 bits per pixel colour - a raw bandwidth of 1.4Tbps (at 120fps). but 8k (at 120fps) has a raw bandwidth of 120 Gbps and compresses to 100 Mbps. so maybe we'll see 1 Gbps video streams (or less, as the views to each eye overlap a lot). 10 Gbps FTTH is already a thing - granted, not around me (at least not easily). i'm genuinely curious what could inspire a desire for home internet connections with more bandwidth than that.
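scaling 8k's compression ratio up to the full-sphere stream (assuming codecs hold that ~1200:1 ratio at higher resolutions):

```python
raw_8k = 7680 * 4320 * 30 * 120 / 1e9   # ~119 Gbps raw
ratio = raw_8k / 0.1                    # vs ~100 Mbps delivered: ~1200:1

raw_vr = 150e6 * 2 * 40 * 120 / 1e9     # ~1440 Gbps raw, both eyes
print(raw_vr / ratio)                   # ~1.2 Gbps compressed
```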

this makes for a tentative upper bound on last-mile home bandwidth, with a current hard upper bound on cost of a strand of fiber, an $80 pair of bidi optics, and a port on a 10Gbit switch. that's cheap enough, and the bandwidth is high enough, that i don't think we'll have copper or wireless serving it. of course, if that's not actually enough bandwidth, it's relatively easy to upgrade that link to 40Gbps (though slightly less easy in the PON scenarios that are likely to be used to deploy to homes).
