A Pi-Powered Plan 9 Cluster

rcarmo · on March 14, 2019

I’d love to have read more about the cluster configuration, because it looks like the only machine that actually ran Plan9 was the one with the display...

I’ve been lurking in the 9fans mailing-list for years and kept a Pi running Plan9 as a sort of 24/7 “home console” (to log in to other machines) for a long time, but I recently upgraded that to a 3B+ and Raspbian instead due to the lack of a good enough web browser, and having a cluster of my own (now running k3s), I would have liked to see a reference configuration of some sort.

The Plan9/Inferno ecosystem has always been fascinating to me, but it’s so closed upon itself that it’s hard to, say, go out into GitHub (or equivalent) and find actively maintained tools (or even blog posts with updated info), and source archives seem to be maintained by fewer and fewer people...

MisterTea · on March 14, 2019

> , and source archives seem to be maintained by fewer and fewer people...

Untrue, the community is quite active and alive. I've lurked for years, but this past spring I stumbled on 9gridchan. I was hooked and drawn in by its simple interfaces, clean APIs (just look at the man page for dial(2)), fresh take on a C library, and great documentation.

Second, pure labs Plan 9 is pretty much unmaintained. 9front is the most up-to-date distribution, complete with a very active community and a small yet extremely talented group of core developers. You also have 9gridchan, which is maintained by another talented individual. 9gridchan is two things: first, a 9front fork called ANTS, and second, a publicly accessible grid system where users can connect to share files, chat in an IRC-like chat program (which is just a shared text file), and plumb messages, code and images. In addition, there's spawngrid, which provides on-demand CPU environments similar to Linux containers.

rcarmo · on March 15, 2019

Well, only the other day people were discussing how to keep the archives alive on the mailing-list. And the community hasn’t been very active in the sense of new projects, software ports, etc.

Plus the number of e-mails I get has been steadily decreasing over time...

MisterTea · on March 15, 2019

Depends on which community camp you're in. I'm more in the 9front camp where yes, you have to deal with a lot of younger people, some even young teens. They lurk 4chan and meme like children, but it's active and there are a lot of passionate people on there. There is even a 9front Discord client called disco that's written in Go. There is also an IRC bridge (##9fans on freenode).

The 9front camp might be off putting to some but I don't mind. It's good to see young people be passionate about something and actively participate.

calvinmorrison · on March 15, 2019

that and hubchat is where a lot of conversation goes on. #cat-v is a perennial winner of course.

There's plenty of ongoing work. The ANTS stuff in particular coming from mycroftiv is pretty interesting, though as usual somewhat wacky.

MisterTea · on March 16, 2019

He's a really nice guy though. 9front didn't do itself any favors with the arrogance and RTFM mentality, so I didn't feel comfortable asking questions. But mycroftiv really makes hubchat pleasant and welcoming, along with the others. I felt comfortable asking questions that I felt were dumb, e.g. why does 9fs 9fat fail in a drawterm session? "Oh you just have to `bind -a '#S' /dev` to get the serial ata controller in your shell's namespace" Cool people.

avhon1 · on March 14, 2019

Why do you think that only one of the four raspberry pis is running Plan 9? My understanding is that, with Plan 9, you can run it on multiple networked computers, and mount CPU servers, RAM servers, and disk servers from other machines onto your terminal. They seem all set to do just that on this setup.

rcarmo · on March 14, 2019

If you read through the text, it only mentions booting the one machine with Plan9. There is zero detail on the actual clustering part.

Just like you outline, I would expect some mention of which machines in the cluster run cpu, auth, venti, etc., since splitting those was actually one of the primary design criteria for real clusters (there used to be a nice doc on the Plan9 wiki for that).

Also, Richard’s image (and 9front’s) runs all services on the same node. To break those apart and run a cluster, you really need to reconfigure it, and there is no mention of that either.

_lwad · on March 14, 2019

The text is focused on the maker aspect of the project, and not on the software configuration for the cluster which can be found elsewhere. For that second part that you correctly found missing, you'd need to read a little bit more on documentation and other sources. It would be nice though, if he had some links for those interested in further research on the software part.

newaccoutnas · on March 14, 2019

There are thousands of RPi cluster designs, but not many detailed Plan 9 guides for said clusters.

_lwad · on March 15, 2019

Fair enough, but none of them have a glowing Glenda!

rcarmo · on March 15, 2019

Where is the cluster configuration then?

kristianp · on March 14, 2019

Reading this led me to wonder if Go is available on ARM Plan 9. There's a way to cross-compile it [1]; it requires a shell script to simulate git [2] (which isn't available on Plan 9) using HTTP calls to GitHub!

[1] https://github.com/golang/go/wiki/Plan9

[2] https://blog.gopheracademy.com/advent-2014/wrapping-git/

ainar-g · on March 14, 2019

It is available. At least https://build.golang.org/ lists plan9/arm as one of the builders. Interestingly enough, the source code build help page[1] doesn't list plan9/arm as a valid build target.

EDIT: There is an open issue[2] about updating that page.

[1] https://golang.org/doc/install/source

[2] https://github.com/golang/go/issues/28142

MisterTea · on March 14, 2019

There is a Go port, but upstream changes keep breaking builds on 9, so it's been difficult. The same situation led to Python being neglected, though it's still included in 9front but stuck at version 2.5.1. Makes for a nice calculator REPL though.

Most of that stems from the use of the Plan 9 C library, which is not ANSI but has many of the same functions. So the portability gap to ANSI C/Unix/POSIX is bridged via the APE (ANSI/POSIX Environment) library and the cpp preprocessor.

bibyte · on March 14, 2019

I am slightly disappointed that they finished it just when the system booted up. Why not show off the power of the cluster?

newnewpdro · on March 14, 2019

Presumably because they didn't actually do much more than get plan9 booting on a raspberry pi sharing an enclosure with other raspberry pis.

rbanffy · on March 14, 2019

Someone should eventually build a browser that runs tabs on multiple nodes of the cluster.

Maybe Slack could release a client that does that, running each connection on a separate node.

FridgeSeal · on March 14, 2019

> Maybe Slack could release a client that does that, running each connection on a separate node.

Why, so it can consume ram on even more machines at once? :P

phoobahr · on March 14, 2019

that's the joke.

sambull · on March 14, 2019

eh, Slack should just die in a fire. If I could name the most hated tool in my organization, that would be it. Just set that crap to DND at 5:30pm.

rob74 · on March 14, 2019

"The project provided an excuse to make use of a Vortex Core 47-key keyboard, which together with a mini white HDMI monitor, provided a particularly compact and suitably futuristic feeling setup."

I think the word he's looking for is "retrofuturistic" ;)

rbanffy · on March 14, 2019

It's kind of alt-future.

We invented something better than Unix. We just didn't want to use it that badly.

mruts · on March 14, 2019

Are we talking about Plan 9 or Lisp Machines?

rbanffy · on March 14, 2019

Both fit the bill at different time references. Plan 9 is closer to what we would expect a modern computer to be, with network transparency built in down to its very core. It's natural to spread your use across multiple machines from a single seat. Authentication, authorization and securing the channel against eavesdropping were not as big a concern then as they are now, when every network should be treated as if it's public and hostile.

I'm not sure how a Lisp machine of Symbolics or LMI heritage or a Smalltalk system would deal with distributed functionality across untrusted networks. From the language PoV, it seems natural to Smalltalk.

pjmlp · on March 14, 2019

I surely wouldn't like my GPGPU at the end of a network socket, instead of shared memory IPC.

rbanffy · on March 14, 2019

If the data resides on the same node as the GPGPU and you only do queries on it (as you would with a remote Jupyter or something like it) with little data being moved across the network, there is no reason its memory should be directly visible to the local CPU and little benefit if it is.

The only thing is that your OS will have to do some cluster management to make sure data will be close to the programs using it.

mruts · on March 14, 2019

If data is being transferred frequently between RAM, the CPU, or the GPU, it will kill performance anyways. So maybe a GPU is actually one of the better things to have over a socket..

rbanffy · on March 14, 2019

If your OS is able to deal transparently with heterogeneous hosts across different network transports, then the GPU and its memory will end up being treated as just another compute node hooked up to one of the other compute nodes by a ridiculously fast network.

pjmlp · on March 14, 2019

WebGL 2.0 is the best we got so far and it isn't impressive versus what modern GPUs are capable of.

aseipp · on March 14, 2019

Why? Remote GPU compute is a totally viable solution for many classes of problems, and systems like QNX or Plan 9 would actually do it properly and allow you to have powerful setups such as thin clients with little configuration, without constantly reinventing so many wheels.

Case in point, the other commentator mentioning Jupyter is an excellent example. Jupyter is kind of a classic problem where something like QNX would shine: it's a multi-process system that we expose over HTTP to get remote transports to other clients. In QNX, IPC is the remote transport, as well as the local transport, so the distinction between running the Jupyter notebook/kernel locally, split distributed, or all of it entirely on another machine is relatively transparent. This goes all the way up and down the stack -- from the core process layer to the GUI itself (so even GUI programs could be remote, and the desktop protocol proxies the command buffers to you to render locally.) Jupyter, as a system, always has an underlying transport layer for talking between processes, computing and transferring results. So your "GPGPU" being at the other end of a network socket is already a very common case, in fact -- one that it is designed for explicitly (for basically anyone who does DL, for instance.)

In something like QNX I'd be able to simply type the command `jupyter notebook` and the kernel would start on the machine in my other room (Threadripper with a nice GPU) and the notebook UX itself would start locally, they would talk immediately (due to policy/authorization being baked into the IPC/user/process mechanisms -- no HTTP Auth, etc) and there would be no need at the API layer to distinguish between local shared memory or remote network transports. It would always just work. I could boot up a GCloud machine with 8x TPUs and a $10,000 CPU and just "add" it to my network, run Jupyter again, and it would all be the same (except some latency, of course). I could just use a Raspberry Pi as my thin client for most purposes, honestly. Compute resources would be completely disaggregated, more or less.

Jupyter already does things like "compute the ggplot2 of some data on a remote machine, convert to png, tunnel it over HTTP into browser for display" -- what's the difference between using a socket and HTTP? Not much. You could even use HTTP as the layer-7 protocol over QNX IPC, if you wanted...

It's probably not a coincidence, in retrospect, that HTTP has exploded in popularity as an L7 application-layer protocol. Remote compute is a vital component of many systems today, and HTTP is one of the easiest ways to accomplish it thanks to the ubiquity of browser protocols (think of how much stuff tunnels over HTTPS now!). All mainstream operating systems make very hard distinctions between remote and local IPC mechanisms, so you might as well use HTTP, and bind to /run/app/local.sock or 0.0.0.0:443 and just issue GET requests. Boom, you have a local and remote system. It's the easiest way to get "local" and "remote" application transport all in the same purchase, even if it's error prone and crappy as hell.
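To make the "bind to a local socket or a TCP port and just issue GET requests" point concrete, here is a minimal sketch in Python, standard library only, assuming a Unix-like system. The `Hello` handler, the `UnixHTTPConnection` helper, the temporary socket path and the loopback port are all made up for illustration (they stand in for /run/app/local.sock and 0.0.0.0:443); it is not how Jupyter or any particular tool does it.

```python
import http.client
import os
import socket
import socketserver
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Hello(BaseHTTPRequestHandler):
    """One handler, reused unchanged for both transports."""

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep the demo quiet (and avoid formatting a Unix peer address).
        pass


class UnixHTTPConnection(http.client.HTTPConnection):
    """Hypothetical helper: an HTTP client that dials a Unix socket path."""

    def __init__(self, path):
        super().__init__("localhost")
        self.unix_path = path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.unix_path)


def fetch(conn):
    """Issue GET / on an already-constructed connection."""
    conn.request("GET", "/")
    resp = conn.getresponse()
    data = resp.read()
    conn.close()
    return resp.status, data


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "local.sock")  # stand-in for a local socket
        local = socketserver.UnixStreamServer(path, Hello)
        remote = HTTPServer(("127.0.0.1", 0), Hello)  # port 0: any free port
        for srv in (local, remote):
            threading.Thread(target=srv.serve_forever, daemon=True).start()
        try:
            port = remote.server_address[1]
            return [
                fetch(UnixHTTPConnection(path)),
                fetch(http.client.HTTPConnection("127.0.0.1", port)),
            ]
        finally:
            for srv in (local, remote):
                srv.shutdown()
                srv.server_close()


if __name__ == "__main__":
    print(demo())
```

The application code (`Hello` and `fetch`) is identical in both cases; only the few lines that pick the socket family differ, which is roughly the split that something like QNX would hide from you entirely.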

And, of course, if you are playing a game -- there's nothing stopping you from running everything locally at native speed!

Instead of systems like QNX which elegantly handle distributed computing at the core of the IPC/process/network mechanism in a single place, though -- we basically look doomed to reinvent all of it over bespoke transport/application/distribution protocols throughout the stack. It's a huge shame, IMO.

pjmlp · on March 14, 2019

I said a GPGPU, something capable of DirectX12, Metal, Vulkan, LibGNM fillrates of GBs per second, not a 2D HTML 5 canvas or WebGL 2.0 dropping frames on hardware capable of running GL ES perfectly fine in native code.

rbanffy · on March 15, 2019

If anyone gets interested in using QNX, there is a tutorial:

https://membarrier.wordpress.com/2017/04/12/qnx-7-desktop/

Koshkin · on March 14, 2019

It’s an old(ish) post (mostly showing how to build the enclosure), and the download link pointing at Bell Labs is dead. Plan9 has a new home at https://9p.io/plan9/.

driusan · on March 14, 2019

If you stumble across a labs URL (like in the article), you can usually just replace "plan9.bell-labs.com" with "9p.io". For instance, the link to Miller's RPi build that was linked to should be updated to http://9p.io/sources/contrib/miller/9pi.img.gz
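If you have a pile of old bookmarks, the substitution is easy to script. A minimal sketch in Python, assuming the rule really is a plain hostname swap as described above; the function name `rewrite_labs_url` is made up, and the old-style input URL in the example is an assumption inferred from the updated link.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "plan9.bell-labs.com"
NEW_HOST = "9p.io"


def rewrite_labs_url(url: str) -> str:
    """Swap the old labs hostname for the new one; leave other URLs alone."""
    parts = urlsplit(url)
    # Only an exact hostname match is rewritten (no ports or subdomains).
    if parts.netloc.lower() != OLD_HOST:
        return url
    return urlunsplit(parts._replace(netloc=NEW_HOST))


if __name__ == "__main__":
    # Assumed original form of the link mentioned above.
    print(rewrite_labs_url(
        "http://plan9.bell-labs.com/sources/contrib/miller/9pi.img.gz"))
```

Not every old path is guaranteed to exist at the new host, so treat the result as a first guess rather than a verified link.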

rbanffy · on March 14, 2019

I wonder if one could run it on a ClusterHAT.

https://shop.pimoroni.com/products/cluster-hat

hestefisk · on March 14, 2019

This is very cool. I love the simple windowing environment. It would be great to understand how he takes advantage of the clustered setup with only one workspace. How does he do networking, etc.?

abhinai · on March 14, 2019

I wonder if this post gave some seminal ideas to a new generation of OS or language builders! Really cool stuff.

pjmlp · on March 14, 2019

Plenty of ideas available in mainframes and safe OS systems that are yet to become mainstream.

FridgeSeal · on March 14, 2019

For those of us unaware of some of the intricacies and unusual/interesting features available in mainframe and other OS’s, could you provide some examples?

pjmlp · on March 14, 2019

Being written in memory safe system programming languages to start with.

Burroughs B5500, nowadays sold by Unisys as ClearPath.

OS/400, nowadays known as IBM i.

Using a database as file system, like OS/400 catalogs.

Containers were invented on OS/360, improved on other IBM models, and still offer resource management features not yet mainstream.

mruts · on March 14, 2019

I’m not entirely convinced that there will be a new generation of OS researchers. This, of course, is unfortunate.

kevintb · on March 14, 2019

Very cool! Like the setup and I always like hearing more about Plan 9 projects.

Jansi · on March 14, 2019

[flagged]

snazz · on March 14, 2019

Please stop posting these test comments. You’re being downvoted because they add nothing to the discussion.