As a person, Luther Tychonievich is one of the most interesting professors I ever had.
This was one of my first impressions: he explained to us on the first day of class that (despite (because of?) his having a PhD in computer science) he did not have an internet connection at home and would not be able to respond to our emails outside of working hours.
He has also blogged publicly and extensively[1]. He has written about math and computer science--his explanation of Cantor diagonalization[2] is the first one that clicked with me. On the other hand, he has blogged about his inner life[3]. Though I am personally irreligious, it makes me happy to see him blogging about something so personal and, frankly, unfashionable. I'd hazard a guess that being publicly Christian would probably not be in your interest if you're trying to make a career for yourself in academia, but he has clearly not tried to hide it. I admire the commitment he has to the dictates of his soul.
EDIT: Actually, in retrospect, this is not quite right. It maps every tree to a natural number, but not every natural number to a tree. Regardless, it can be handy in certain circumstances.
Another simpler way of doing this is to encode a tree with n levels as 2^n - 1 bits. The first bit is 1 if and only if this tree has a root. The next 2^(n-1) - 1 bits represent the left tree. The following 2^(n-1) - 1 bits represent the right tree. Recurse as needed until you get to a tree with only one level (which is either a single node when the bit is 1, or is nothing when the bit is 0).
I'm fairly sure this is not optimal, but it is much simpler.
Enumeration techniques can be really useful in surprising situations.
I am working on a planetary terrain rendering system for a game about spaceflight. The technique is based on conforming quadtrees but at no point there is an actual tree structure, it is encoded in a single integer. Starting from an origin point, the tree is traversed by moving to the adjacent subtree or parent node using simple bit twiddling.
The goal is to write a vertex shader than can efficiently generate triangle number n in quadtree m. This will be then fed to tesselation shaders for LOD subdivision.
For unlables rooted trees, there are also the matula numbers which naturally map between these trees and the natural numbers.
Also, if you want binary trees, another way to do that could be to take any bijection between positive natural numbers and pairs of natural numbers (including 0), and,
well, it is fairly straightforward.
Let 0 correspond to the tree of one vertex (no children), and then for any binary tree, associate it with the natural number corresponding to the pair of natural numbers which are associated with the two child subtrees.
The specific bijection used to pair them is up to you. This page seems to describe one example of this.
Knuth also has some interesting material on BDDs (binary decision diagrams), which can be used to encode, count, enumerate and uniformly sample a variety of combinatorial structures.
The concept of entropy pushes one to think of binary trees. Huffman coding used in real compressors use Huffman trees [1] that express the prefix code used to define entropy as in Colah's blog http://colah.github.io/posts/2015-09-Visual-Information/
Entropy is the expected lenght, so it's the weighted sum of the lenght of the paths from root to leaves, weighted by their frequency.
There's a simple recipe for arithmetically encoding recursive algebraic data types (in the functional programming sense) which is related to this.
What you might have seen is Goedel numbering where a finite sequence of natural numbers a_0, a_1, ..., a_n (where n isn't fixed but can vary per sequence) is mapped bijectively onto p_0^a_0 a_1^p_1 ... a_n^p_n where p_0, p_1, ... is an enumeration of the primes.
However, if you want to represent trees instead of sequences, you have a better, simpler option. The key is the existence of a bijective pairing function between N^2 and N, which you can write as <m, n> for m, n in N.
You have a lot of choices for how to construct the pairing function. But a curious fact is that there is essentially one polynomial pairing function and it's the one you saw in class when you learned that the rationals are countable: https://en.wikipedia.org/wiki/Fueter%E2%80%93P%C3%B3lya_theo....
From this perspective it should be clear that bit interleaving provides one such pairing function.
Here's a larger example of how you can use this for encoding algebraic data types. The data type in this article is an unlabeled binary tree:
data Tree = Leaf | Node Tree Tree
I will use square brackets to denote the encoding. Then
[Leaf] = 0
[Node a b] = 1 + 2*<[a], [b]>
So the tag for the sum type is the residue modulo 2, and if the tag is 1 you can divide by 2 (shift right by one bit) to extract the encoded pair of subtrees. If the sum type had three terms you could use 3 instead of 2:
data Tree = WhiteLeaf | BlackLeaf | Node Tree Tree
[WhiteLeaf] = 0
[BlackLeaf] = 1
[Node a b] = 2 + 3*<[a], [b]>
This also works with labeled trees:
data Tree = Leaf N | Node Tree Tree
[Leaf n] = 0 + 2*n
[Node a b] = 1 + 2*<[a], [b]>
If the label type wasn't a natural number, you'd just recursively encode it.
In the above I treated the <m, n> pairing as a black box that can be implemented in different ways (e.g. Cantor pairing, bit interleaving). You can abstract out the encoding of sums as well. You're looking for a bijection between N and Inl N | Inr N and I made the particular choice
[Inl n] = 0 + 2*n
[Inr n] = 1 + 2*n
But you have other options as well, though this is probably the simplest. And once you have an encoding for sums and products you can apply them recursively to encode any recursive algebraic data type.
Note that the sum and pair encoding functions are bijective on the natural numbers, but the corresponding encodings for algebraic data types aren't necessarily bijective, only injective. For the unlabeled binary tree example, the number 2 has a tag of 0 (leaf) but isn't the encoding of any tree. However, the encoding for labeled binary trees is bijective if the label type is bijectively encoded. The problem with this kind of recursively constructed encoding that terminates in finite types is that you obviously cannot have a bijection between a finite set and an infinite set like N.
To tie it back into the article, he uses this encoding:
data Tree = Leaf | Node Tree Tree
[Leaf] = 0
[Node a b] = 1 + <[a], [b]>
You can extend this to my example with two kinds of leaves:
data Tree = WhiteLeaf | BlackLeaf | Node Tree Tree
[WhiteLeaf] = 0
[BlackLeaf] = 1
[Node a b] = 2 + <[a], [b]>
Note that the tag ordering is all-important here. This only works when the tag ordering is "tail recursive". If you used 0 as the tag for Node, then there'd be an ambiguity between leaves and proper subtrees; you wouldn't know if 1 corresponded to a leaf or 0 + encoded_subtree_pair.
So, this kind of prefix-sum encoding is bijective but can only accommodate one unbounded term, which must be assigned the final tag. No tag ordering can work for this:
data Tree = Leaf | WhiteNode Tree Tree | BlackNode Tree Tree
[Leaf] = 0
[WhiteNode a b] = 1 + <[a], [b]>
[BlackNode a b] = 2 + <[a], [b]> // Wrong!
I haven't thought about ordinals for a long time, but this feels related to the fact that 1 + ω = ω is not equal to ω + 1.
Yeah, if you use the Moser-de Bruijn sequence for constructing a pairing function you just get bit interleaving. You define the map expand(x_0, x_1, ..., x_n) = (x_0, 0, x_1, 0, ..., 0, x_n), apply it to x and y, multiply expand(y) by 2 to shift it over, and add them to merge the now disjoint even/odd positions. Incidentally, this decomposition of the problem is exactly how you efficiently implement bit interleaving in software except you don't even need addition:
z = interleave(x, y) = expand(x) | (expand(y) << 1)
For the inverse you define the map compact(x_0, x_1, ..., x_2n) = (x_0, x_2, ..., x_2n) and then
The expand map can be implemented efficiently as a logarithmic number of shift-and-merge stages, and compact can be constructed by inverting each stage separately and composing them in reverse order. A stage has this form:
x' = x | (x << n)
You sometimes also see it written with xor. That's not necessary. What you care about is that the operation is bitwise (e.g. no carries from addition) and it must have 0 as an identity.
If x has 2n bits then this is reversible: the lower n bits in x and x' are the same, and the upper n bits in x and (x' >> n) are the same, so you can recover all 2n bits of x from x' with low(x') | high(x' >> n). You can cascade stages like this if you alternate them with appropriate masking to allow data-parallel operation. The net effect of the first stage, after masking, is to shift the upper n bits of x up by n bits while leaving the lower n bits in place. The next stage then does the same thing on the two halves in parallel with a shift of n/2, and so on until you're done. The masking prevents cross-talk between subvectors so you can operate on a single vector as if it were a set of independent subvectors. Or put another way, it enforces the precondition for the reversibility of each stage.
Anyway, just some random connections to low-level hacking.
I wonder if this can be used to improve an algorithm [1] I worked on a year ago for building nearly-comlete binary trees out of pre-sorted array of items.
This was one of my first impressions: he explained to us on the first day of class that (despite (because of?) his having a PhD in computer science) he did not have an internet connection at home and would not be able to respond to our emails outside of working hours.
He has also blogged publicly and extensively[1]. He has written about math and computer science--his explanation of Cantor diagonalization[2] is the first one that clicked with me. On the other hand, he has blogged about his inner life[3]. Though I am personally irreligious, it makes me happy to see him blogging about something so personal and, frankly, unfashionable. I'd hazard a guess that being publicly Christian would probably not be in your interest if you're trying to make a career for yourself in academia, but he has clearly not tried to hide it. I admire the commitment he has to the dictates of his soul.
[1]: https://www.cs.virginia.edu/~lat7h/blog/index.html
[2]: https://www.cs.virginia.edu/~lat7h/blog/posts/124.html
[3]: https://www.cs.virginia.edu/~lat7h/blog/posts/563.html