one very popular fully encrypted transport protocol over udp would make traffic analysis of fully encrypted protocols much harder.
QUIC is a UDP-based replacement for TLS, with some nice additional features like roaming. once QUIC gets itself established, it’s got a very minimal plaintext payload.
- 1 byte flags (0x4c)
- ? byte destination connection id
the length of the destination connection id is set earlier in the protocol. unfortunately, almost all QUIC connections have one end on port 443, and you’re generally expected to have a https server also running on tcp 443 next to it.
dTLS is a TLS-secured UDP transport, most commonly used by WebRTC. there’s no expectation of running anything else next to it, and you usually see random ports on both sides. unfortunately, once it gets itself established, every packet contains a bunch of identifiable data:
- 1 byte packet type (0x17) marking application data
- 2 byte dtls version (0xfefd for 1.2)
- 2 byte epoch
- 6 byte counter
- 2 byte length
additionally, the epoch and counter are very often repeated as an eight byte nonce for the ciphertext.
unlike QUIC, there’s no roaming support, so firewalls can reasonably expect to see the entire lifecycle of the protocol. so masquarading as dTLS more or less requires running a dTLS connection.
WireGuard is a peer to peer VPN protocol - it supports roaming, and while it has a default port it’s not uncommon to use other ports. it enables a bunch of different topologies, and both point to point and mesh are common, so neither would be suspicious. and the traffic patterns it encapsulates mimic the protocol its transporting.
- 1 byte packet type (0x04 for transport data)
- 3 byte reserved field of zeros
- 4 byte receiver index
- 8 byte counter / nonce
unfortunately it also does a new handshake every so often, so anything doing traffic analysis can watch for that.
full encryption vs roaming
alice is exchange messages with 20 other people, including bob, bob’s ip address changes, and alice recieves a udp packet from bob’s new address. how does alice figure out which person sent the packet (and what key to use to decrypt)?
the easy way to do this is to have a field that identifies each connection, when ip addresses change, that identifier remains the same, and the receiver knows what key to use. this is what QUIC and WireGuard do.
unfortunately, this allows a passive observer watching one party to track the ip addresses of the parties it communicates with. we want only the receiver to be able to do that.
one alternative is to include an encrypted identifier. this is relatively expensive - it must use randomized encryption, and every peer is encrypting to the same key for every packet it sends. so either it’s a public key encryption method (very expensive) or we accept that the people talking to someone can track who they talk to.
another alternative is to generate the nonces in a predictable way, so that the receiver can pregenerate a few nonces for each contact and match on incoming nonces. this has no protocol overhead, and relatively little computational overhead for the receiver. it does introduce some fragility as the sender needs to be careful not to get into the same state and repeat nonces, it also raises the question of how far ahead should the receiver generate.
or the receiver can simply check the mac on the packet against known contacts. there’s a bunch of hueristics around which contacts to check, ones that were just recently sending you messages might be more likely to be sending you a message, but ones you haven’t heard from lately might have changed locations. this potentially opens the receiver up to denial of service attacks.
[ebash] puts hashing a 64 byte value with blake3 at about 200-250 cycles on amd64 architectures and 400-500 on aarch64. so as long as the mac is calculated without rehashing the whole message (for instance by hashing the message, and then doing keyed hash on the output) the linear search is not that expensive. a quick check on my computer shows ten million 32 byte hash and compare in one thread in 1.11 seconds. interestingly its significantly higher per hash with only 1000, jumping between ~0.5ms and ~1ms, while 10k is consistantly 1ms.
a saturated 10Gb link could deliver 12M of the smallest possible packets (28 bytes ipv4+udp, 24 byte nonce, zero data, 16 bytes mac tag, 38 bytes of ethernet and framing) per second. a single thread wouldn’t quite keep up with a single blake3 verification. siphash128-2-4 is about 5x as fast, which would leave plenty of headroom for known source verification, but that’s hardly enough to mitigate a linear search through every peer.
one could generate a random elliptic curve scalar for each packet, do ecdh with a key of the receiver’s, xor the result with the identifier, and then send the elliptic curve point and xored data. for a nice 40+ byte payload penalty and 24 microseconds per decryption (using ristretto) or 27 microseconds per decryption (using curve25519).
i guess if you’ve got more than 1000 peers it’s faster. still about a factor of 250 away from the goal of 10M packets per second.
i suppose there is always… mac the whole packet with a key the recipient shares with every peer. a random ristretto point does double duty as a nonce for the cipher. only actually use that ristretto point if the packet comes from an unknown source.
that doesn’t stop a peer for doing anonymous denial of service though.