Representing Graphs by Knuth Trees (1974) [pdf]

tunesmith · on June 20, 2020

I have a hobby project that deals a lot with graphical data structures. Think like choose-your-own-adventure-stories, mostly hierarchical, but with some DAG and even an occasional loop (but those are annoying). Each node just has a list of what it points to, and all my programming just does a lot of recursion. It's doable; while I have a number of graphs, none of them will have more than 1000 nodes, so I'm not actually hitting performance problems. But I wonder if I'm missing something; each time I think of a new graph-based feature (show only the part of the graph that leads to this category of nodes; display the graph while hiding everything downstream of node X; possibilities are endless) I'm spending hours doing recursive programming making sure I get it just right. I know there are plenty of graph algorithms out there but they're all quite academic and it's hard to grasp the utility of each one - even here, I grasp what the Knuth Transform is but it's harder to grasp what it gains me. Is there something I'm missing for a good next step, like graph theory for dummies?

Also, I like storing my data in mysql for other reasons so I don't want to switch everything wholesale to a graph database, but I wonder if I should be loading the graphical part of the data into something in memory, like a graph cache, that I can easily query rather than having to do all the recursive programming myself. Does something like that exist, a simple free library that acts as an in-memory graph db?

BorisTheBrave · on June 20, 2020

It shouldn't take you hours to code these new features. And 1000 nodes isn't enough that you need fancy performance from a library.

I suspect you want to extract a few basic functions from what you've already written, and then combine them for new features. The posibilities are not endless, there's only so many things you are likely to want to do on a graph, and the vast majority (like TFA) are not useful to you

Here are some common operations: * list everythign reachable from a given node, depth first order * list everythign reachable from a given node, breadth first order * make the reverse graph (edges point the other way) * find the strongly connected component a node is in (a more general way of identifying loops)

Then you can build more complex queries out of those, rather than coding them every time. e.g.

> show only the part of the graph that leads to this category of nodes

get the reverse graph, for each node in the category, find everything reachable in the reverse graph, and union them together in one set

> display the graph while hiding everything downstream of node X

get the set of everything downstream, then list everything depth first, skipping stuff in the set.

It's not super efficient, but I don't think you care.

LouisSayers · on June 20, 2020

For a simple graph app I don’t see why you’d need anything more than what you’re doing plus maybe some Maps of category names to nodes and a hash table of node ids to nodes for quick lookups.

It doesn’t sound like too much work to create your own in-memory cache - if you even need it...

Graph dbs are more for searching for patterns in your data. In your case it sounds more like you just want to traverse the graph and display a subset of it.

refset · on June 20, 2020

Sounds like DataScript [0] could be a good fit for you. The Datalog query language is very powerful for writing recursive rules.

[0] https://github.com/tonsky/datascript

reese_john · on June 20, 2020

I have used Python's NetworkX before, and it was a great experience. It has a neat API, good performance and plenty of graph algorithms implemented.

867-5309 · on June 20, 2020

mysql databases can be stored in memory for non-critical data

george_ciobanu · on June 21, 2020

check out Datomic and Datalog.

svat · on June 21, 2020

The “Knuth Trees” or the “Knuth transform” that the article refers to in the first sentence is the subject of The Art of Computer Programming section 2.3.2 “Binary Tree Representation of Trees”. (This is under 2.3 “Trees” in Chapter 2 “Information Structures” of Volume 1 “Fundamental Algorithms.”) In TAOCP it is just called “the natural correspondence between forests and binary trees”.

The idea, roughly, is that corresponding to each node of the original tree (or forest), in the (resulting) binary tree, the left child denotes “first child [in original tree]” and the right child denotes “next sibling [in original tree]”. In TAOCP there are 15 pages about this idea (including exercises) (plus 4 pages of solutions). In this paper here, the author calls it the Knuth transform, and extends that idea to a larger class of graphs. (This is all in the first paragraph of the paper, but just posting this here in case the context is helpful to someone, as I didn't know all this before just now.)

mathgenius · on June 20, 2020

If you like that then you are going to love this:

"Seven Trees in One" by Andreas Blass. https://arxiv.org/abs/math/9405205

einpoklum · on June 20, 2020

Why is it useful, today, to represent graphs using Knuth trees? Especially when not all graphs are thus representable?

taeric · on June 20, 2020

I think this is what he called threaded trees. Just space efficient.

HTML dom has the necessary links. Let's you do dfs without a stack.

enriquto · on June 21, 2020

I guess you can always assign an orientation to an arbitrary graph so that it becomes representable? Some implementations of sparse matrices look essentially the same as the knuth transform.

ktm5j · on June 21, 2020

Hey, I used to run this webserver! John Pfaltz is a nice guy, glad to see his work up here.