Really love when program authors actually explain what is happening behind the scenes like this and even go into details about how they made the small details work.
I may or may not use this program, but I definitely learned another cool thing about Linux just now, and that will most certainly be useful to me sometime in the future in an altogether different context.
This is a clever use of fuse/sshfs and chroots to allow you to run programs on remote computers. However, for many workloads, isn't the network I/O going to outweigh the benefits of remote CPU cycles?
I really think this is a clever hack, but I'm not sure how useful it is. Is it really good for the transcoding example? I could see that being good if the remote machine has a better GPU, but I'm struggling to see how this could work for other use-cases. Maybe compiling (which was one of the main use-cases behind Sun's SGE)?
I do most of my work on HPC clusters, so moving compute out to remote nodes is something I'm very interested in... and I'd love to learn more about what kinds of workflows the authors had in mind.
I can totally imagine wanting to work on a thin client on your home network over a 1gbps network connection. Hell, if you optimized specifically for this and ran a 10gbps line right from your thin client to the server, it would be blazing fast.
Another application I could see is putting a loud server into a closet somewhere and just running network cables to it. Build a 4-GPU render node, then just send jobs to it from your thin MacBook.
Well for starters, no scp command at the beginning and end of each job. I'm sure you could work out an alternative solution, this is just one such solution.
Yeah, this feels like a good solution for when you only occasionally need to get the remote machine to do the work, e.g. the macbook example for simple dev work. In that case it probably is more convenient.
If you were relying heavily on the remote machine, maybe using a pinebook pro or something, ssh or rdp might make more sense.
It's cool to have more options though, I have considered workflows like this before especially in the context of replacing my laptop with something like a pinebook.
Per the readme, architectures aren't cross-compatible, so you can't run commands from an x86 box on an ARM box (and presumably vice-versa). That would rule out the pinebook and most SBCs as potential use-cases.
The thin machine could be your laptop, where you'd occasionally want to do a heavy operation on a workstation or server using the data that happens to live on your laptop. Sounds like a decent use case.
Of course, running the job directly on the workstation would work too, but that needs more setup. The occasional, ad hoc nature of it is what really makes this worthwhile.
The method in question doesn't support cross-architecture execution (at least not without some qemu hackery).
I can totally see how I would use this to transcode videos from my NAS using my main workstation's CPU, though, since both are connected through a 10 Gbps switch (I still need to figure out how to integrate this with Plex's transcoding flow).
A 4K 60 Hz 30 bpc display requires roughly 15 Gbps of raw pixel data to feed (3840 × 2160 pixels × 30 bits × 60 Hz ≈ 14.9 Gbps). The thin client thing is such a meme; the computing requirements keep scaling, so there is nothing "seamless" about them.
A ton of tasks don't require any graphical data streaming; in fact, aside from gaming, most resource-intensive applications can be run in the terminal and thus easily offloaded to another machine. Text editors are already getting distributed code checkers via LSP, and it wouldn't surprise me if in a couple years we could run the whole backend of IDEs on another computer and the local part can be more lightweight than most websites.
Then you only need to drive a 4k screen for displaying a desktop and maybe video playback, which can already be achieved with a passively cooled Raspberry Pi.
> it wouldn't surprise me if in a couple years we could run the whole backend of IDEs on another computer and the local part can be more lightweight than most websites.
Check out the official SSH/Docker/WSL Remote Development extensions for VS Code.
It could also use some more polish around handling SSH connections; I often manage to lock up TRAMP when I step away from the computer for 5 minutes. It's something that can probably be fixed with the right ssh_config/sshd_config, but I can't for the life of me figure out the right incantations.
For now, I just habitually do M-x tramp-cleanup-all-connections whenever I stop actively working with remote files, and let Emacs reconnect automatically when I save a remote file or refresh a remote folder listing.
(Incidentally, half of my use of TRAMP went away once I managed to configure a 256-color terminal on my sidearm; with that, Emacs in terminal mode looks almost indistinguishable from a GUI session.)
> It's something that can probably be fixed with the right ssh_config/sshd_config, but I can't for the life of me figure out the right incantations.
Have you tried something like this? It multiplexes all sessions to a given remote host over a single TCP connection and sends application-level pings every 4 minutes to keep the socket active. Solved most of my "stay connected" issues.
    Host *
      ControlPath ~/.ssh/cm-%r@%h:%p
      ControlMaster auto
      ControlPersist 10s
      ServerAliveInterval 240
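For what it's worth, my reading of those directives: ControlMaster/ControlPath make every ssh invocation to the same host share a single TCP connection, ControlPersist keeps that master connection around briefly after the last session closes, and ServerAliveInterval 240 has the client send a protocol-level keepalive every four minutes so idle connections don't get dropped by NAT or firewall timeouts.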
Meanwhile I have a 4G modem at home because it came with higher stability and more than double the bandwidth compared to a similarly priced cable connection. Sometimes the more traditional telecoms are working really hard at pulling an Intel.
I wrote the "more lightweight than most websites" part for a reason. VS Code is already horribly sluggish on my Thinkpad with an i7 CPU; I would go totally insane trying to run it on anything referred to as a thin client. Might as well go for a full-blown IDE at that point; some of them are actually trying to optimise for low-latency text input.
If you were sending the raw frames, sure, but wouldn't they just send the delta between frames (only the areas that changed), with some sort of compression algorithm and parameters, like VNC does?
SPICE solved this (years ago in fact) by doing various sorts of compression, including detection of video, and streaming that over the network. (https://www.spice-space.org/features.html)
I agree that computing requirements scale as we go, but surely you don't need 4k at 60Hz to browse the web now? I regularly play games using steam link now and while it's not the full quality and framerate, it's just convenient (and cheaper than having a full desktop pc in every room). Thin computing seems much more seamless and workable now than it did in my college years with Sun thin clients.
For a typical HPC cluster, there is a shared filesystem with exactly the same file structure across the cluster over a high-bandwidth network, which basically replaces "a clever use of fuse/sshfs and chroots" with something far better. OP's project is clearly useless in an HPC setting.
On the other hand, OP's project is essentially a Linux/POSIX version of what Plan 9 was designed to do and has been doing all along.
I miss openMosix[1]... It's been so long, but I remember running kernel compiles across two workstations ages ago... Double the core count! It actually gave a significant speedup over 100 Mbps switched LAN (I think it was just a crossover cable, no actual switch).
Which is why I'm curious about the applications they had in mind. I know that my workflow isn't typical for most people, so hopefully others have better uses in mind.
For me, I just submit ad hoc jobs to SLURM and call it a day. Not everyone has access to an HPC cluster or is comfortable setting such a system up for a small home cluster.
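For the ad hoc case, a one-liner is usually enough. A rough sketch (resource numbers and filenames are made up, and it assumes ffmpeg is installed on the nodes and the files live on the shared filesystem):

    # run a one-off transcode interactively on whatever node is free
    srun --ntasks=1 --cpus-per-task=16 --mem=8G --time=02:00:00 \
        ffmpeg -i input.mp4 -c:v libx265 output.mp4

    # or fire-and-forget as a batch job
    sbatch --cpus-per-task=16 --mem=8G --time=02:00:00 \
        --wrap="ffmpeg -i input.mp4 -c:v libx265 output.mp4"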
Anything (like this project) that makes HPC-like processing more accessible, I'm interested in.
When I developed this I was very much thinking of using this to run one-off compute heavy tasks, e.g. compressing a video, running scientific computations or rendering something in Blender. You would just rent a VM for a few hours with the hardware you need, set it up in a few minutes, run your job there and delete it again.
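If I'm reading the README right, the invocation is just the normal command prefixed with the destination, so the rented-VM workflow looks roughly like this (host address and filenames made up):

    # the CPU-heavy encode runs on the VM; input and output files stay on my laptop
    outrun user@1.2.3.4 ffmpeg -i input.mp4 -c:v libx265 output.mp4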
It reminds me of GNU Parallel[0], so I have to imagine it fits some use cases. I stumbled upon Parallel when I was trying to do user-level "distributed" computing at the day job, for running hundreds of simulations in a shorter amount of time. Never really did get it working (I had a hard enough time just ginning up a Python program to orchestrate test cases across local cores; a full run of tests still took 3.5 hours), and they're currently trying to get a SLURM cluster set up while I'm debugging Nvidia binary drivers and overloading our GitLab instance.
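For reference, Parallel can also farm jobs out over SSH by itself. A sketch of the idea (hostnames are made up, it assumes ffmpeg is already installed on the remote machines, and --trc copies each input over and brings the result back):

    # one encode per remote job slot, spread across two machines
    parallel -S server1,server2 --trc {.}.mkv \
        ffmpeg -i {} -c:v libx265 {.}.mkv ::: *.mp4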
> isn't the network I/O going to outweigh the benefits of remote CPU cycles?
Not if:
a) your network bandwidth is higher than your local disk's (not as unusual as you might think; spinning media are awfully slow)
b) the app you're using chunks up data in a reasonably efficient manner so as to not get bitten by latency issues (ffmpeg encoding likely meets that bar).
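(For rough numbers: gigabit Ethernet tops out around 125 MB/s and 10 GbE around 1.25 GB/s, while a single spinning disk typically sustains on the order of 100-250 MB/s sequential, so the network is often not the slowest link.)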
> for many workloads, isn't the network I/O going to outweigh the benefits of remote CPU cycles?
Yes, the README addresses this:
> File system performance remains a bottleneck, so the most suitable workloads are computationally constrained tasks like ray tracing and video encoding. Using outrun for something like git status works, but is not recommended.
> Maybe compiling (which was one of the main use-cases behind Sun's SGE)?
The problem is that your task will still be limited by the power of a single machine.
You need a means to run a task across more than one server, preferably a great many.
It's possible, but completely impractical, to run a shared memory system across a network.
If somebody figures out how to do that in a way that's usable in practice, and without a complete rewrite of the program, that will be really impressive, and ACM medal material.
It's not trivial, but it's certainly possible. <Un-named IC customer> has been doing level 1 image formation processing for decades using shared memory across a network. They're even doing it in Kubernetes now, utilizing libraries from Sandia National Laboratories that use a volume mount of an in-memory filesystem into the running containers, so you can scale your compute cluster up and down as your memory load varies. This basic setup is powering the entire US geointelligence enterprise. They call it the Tumbo Framework. There's a press release alluding to it here: https://www.sandia.gov/news/publications/labnews/articles/20...
I took the parent comment as meaning on a regular Ethernet+IP network. SGI UV and HP Superdome are semi-exotic hardware architectures. ScaleMP requires Infiniband, which is at least 10 times, if not 100, the price of Ethernet; it also requires fiber optic cable for runs > 5 meters.
I should add to my above comment about the IC doing this kind of thing in Kubernetes on EC2 instances that they utilize the Amazon Elastic Fabric Adapter, which bypasses the ethernet NIC to avoid all of the queueing delay you can't tolerate in applications that need to be basically unaware that some threads are local and some are remote. And obviously they make sure your servers are in the same or physically nearby racks.
UV and Superdome are custom hardware for huge NUMA boxes, so not a great comparison. ScaleMP is definitely valid though; a real shame it's stuck behind a license fee, it would be interesting to experiment with.
Somehow I think this may make it possible to "horizontally scale" the desktop experience. What if I have my main computer/CPU at home, have additional headless computers configured, and set up my Linux system so that everything besides the kernel/boot stuff is spread across the other computers?
I tried an openMosix cluster back in the day when it was fashionable. Cluster-Knoppix could transform a bunch of old PCs into an instant SSI cluster with almost zero configuration needed. Though I know it is not the most efficient way, I miss the simplicity of it.
A shell script that simply launches a new process on the most idle node of a network would be enough to get back some of that experience. Live migration would be even better, almost perfect for that. I hope some day I can easily build a Beowulf cluster out of my ARM SBCs with an experience similar to the old days of Cluster-Knoppix.
I remember back in the early 2000s thinking how cool it was that I had distributed C compilers helping my Gentoo installs along.
I really wanted to set up a university computer lab such that when computers are idle, their CPU cycles are donated to some massively parallel simulations. Worked great on small scale, but I moved on before we implemented it on a larger scale.
Well, that's just F@H/R@H/BOINC. But I think there are some ethical considerations before you sign up all your computers for that, or at least you should check the electricity cost.
Linux is getting features for process checkpointing and migration in the mainline kernel; see the CRIU (Checkpoint/Restore In Userspace) patchset. It needs proper namespacing of all system resources, but we're not far from that with all the container-enabling features that Linux has.
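The userspace tooling is already usable for simple cases. A minimal sketch, assuming criu is installed, the process name is hypothetical, and the job was started from a shell (hence --shell-job):

    # checkpoint a running process tree into ./ckpt
    sudo criu dump -t $(pidof my_long_job) -D ./ckpt --shell-job

    # restore it later, or on another (identical) machine after copying ./ckpt over
    sudo criu restore -D ./ckpt --shell-job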
Hmm, interesting idea. I was working on something a couple weeks ago after all the "αcτµαlly pδrταblε εxεcµταblε" stuff. My idea was to be able to run local programs, like your favorite fancy shell, on a remote machine or container that doesn't have it installed (think lightweight containers you need to work in). The idea was to have a parent program that runs your shell or other client program under ptrace to intercept and proxy all syscalls to a small agent on the remote machine/container. So the code would be running locally, but all of the syscalls would be executed remotely. I actually got it somewhat working, but gave up when I realized how difficult memory and file access would be. Files in particular were hard, since I couldn't disambiguate whether a file access was for a "local" or a "remote" file. Also, in the past I did something similar for Python programs: https://github.com/seiferteric/telepythy
This reminds me of playing with openMosix and ClusterKnoppix back in the day. The kernel would take care of migrating the processes and balancing the load automagically...
Ah man. openMosix. What a great piece of software that was. Many people might not know, but the developer behind openMosix founded a company around it called Qlusters, and that was the predecessor to Xen/XenSource, which he also founded, and then he went on to found KVM/Qumranet. To say he left a mark is an understatement.
> Llama is a tool for running UNIX commands inside of Amazon Lambda. Its goal is to make it easy to outsource compute-heavy tasks to Lambda, with its enormous available parallelism, from your shell.
This is very clever. I know that if you are using this solution a lot for the same purpose (let's say rendering) then it would be beneficial to come up with a more efficient setup for your particular application.
But this could be very useful for ad hoc, spur-of-the-moment things. Let's say I have a few servers that I have access to. Whenever I need to do something computation-heavy, and it could be something different every time (sometimes transcoding a video, sometimes compiling a library in Rust), I just outsource it to the servers and keep using my laptop for other stuff.
> a more efficient setup for your particular application
Yeah, once you get past ad hoc applications, one common way to do this is to set up a cluster/batch scheduler. This lets your jobs run on whichever node is available. You don't get the same chroot/fuse filesystem voodoo, but once you have a cluster, an NFS mount for applications is a common setup.
Plan 9 is still plenty alive and kicking. My profile has all the links you need to get started. I recommend 9front for PC hardware and either 9front or Miller's Pi image (based on vanilla Plan 9), which supports the Pi's wifi.
cpu/rcpu starts a remote cpu session, which can act like ssh in a way. Plan 9 doesn't do "remote" control but instead imports/exports resources directly. Everything is done through 9p. Very zen OS.
Yeah, surprised more people haven't mentioned this; it's very plan9-esque. Although as a side note for the viewers at home, because plan9 exposed all interfaces as files, the remote cpu server could use all resources on the client machine. Also, because of plan9's exclusive use of static linking and supporting file servers with many arch binaries "superimposed", the "server needs to be same architecture" req could be relaxed.
> Also, because of plan9's exclusive use of static linking and supporting file servers with many arch binaries "superimposed", the "server needs to be same architecture" req could be relaxed.
Plan 9 was designed from the get-go to easily build and run on different architectures. Static linking helps, and you do all your communication over 9p, so you don't care if the machine is arm, power, x86, mips, etc. The protocol is platform independent and you just rpc everything. So instead of poorly bolting a security library onto your program, you let someone experienced with security write a server and let it do the hard stuff. Then you statically link a little C lib into your code that just talks to the server.
And file servers are just little micro services running on your computer. plan 9 is more cloud ready and distributed than any poor excuse of an os nowadays.
I'm not sure I understand why static vs dynamic linking isn't completely orthogonal to this discussion; if you had your paths for dynamic linking in as orderly a state as we're expecting our xxx/bin paths to be, then dynamic linking would work just fine. There's no real reason that you couldn't have xxx/lib paths set up paralleling the xxx/bin directories and unioned together in the same way.
The static library fetish was one of the things that helped keep Plan 9 marginal (along with the generalized NIH or "Invented Here But By Bjarne So Who Cares" attitude).
Sun started the dynamic library fetish. Unix never used dynamic libs prior to that.
If you actually took the time to study plan 9 you would understand the architecture and why dynamic libs are not necessary. The idea is to move that logic into another program where it can run in its own namespace, in isolation. You then use pipes or 9p to communicate between them.
I wonder if this works for the Linux version of Dwarf Fortress? It would be an ideal workload - requires a lot of CPU but doesn't do a whole lot of disk or screen I/O. It would be an ideal way to run DF on a laptop.
I love this idea! I've never played Dwarf Fortress, but I just tested it on a fresh vps by setting [SOUND:NO] and [PRINT_MODE:TEXT] in df/data/init/init.txt and it seems to work fine!
I've been thinking frequently over this past decade that consumer-level distributed computing is the biggest casualty of closed, proprietary software/hardware.
Consider compressing several terabytes of disjoint data to a zip file using compute resources of pc, smartphone, tablet etc. at the same time.
I've seen some subtle attempts to make use of distributed computing in consumer space like Apple's Compressor for video encoding.
Of course distributed computing has been a staple in scientific research for a long time. I personally use dask and modin for some data exploration when I feel the urge for some distributed computing. I wanted to check out Julia's distributed computing capabilities, but it required a similar setup on all nodes, and I'm specifically interested in mixing platforms (architectures).
This is sort of a plot point in Silicon Valley where the Pied Piper team is trying figure out where to physically store files they keep as a remote backup/sync service for a mobile app and end up going with a fully-distributed peer-to-peer solution that uses no servers at all and just stores file shards directly on the phones of all of their users, like the CERN shared CPU time project but for disk space.
Thanks, I haven't watched that show. I'll try to watch that episode or season; don't know how that show is structured.
Common network storage access removed most of the headache with distributed computing in my explorations of that subject (OP is using a common network share to execute programs as well).
It goes like this: one of the programmers of the app hacks a smart fridge to irritate his roommate. Then one fine day, right during their biggest demo for an insurance data company, the garage where their servers are located loses power, so they hastily build a mobile data centre on a truck. They try to transport it to their friend's lab at a college, but on arrival they notice they forgot to lock the door and find their entire equipment scattered all over the road. Cue some emotional scenes, and then they get a call from the insurance guy telling them the app performed perfectly well.
Shockingly, they realise that when one of the programmers hacked the smart fridge, he accidentally linked their software to it, and the fridge uploaded the "malware", and thereby their entire content, to its network of every smart fridge there is on earth. That's when they realise the power of distribution and go P2P.
The last episode of Silicon Valley season 2 and the first episode of season 3 should do to get this.
Distributed computing is not super useful for video encoding (although it's certainly kind of useful) because there's overhead to ship all the raw video around and put it back together again. If you have multiple files to encode, you might as well just run one on each machine.
I could see this being really useful for the team I'm on, specifically for doing Docker builds of Rust binaries. The build always has to start from scratch, and being able to easily use a remote high-powered machine could really speed this up. Thanks for sharing!
Builds don't "have" to start from scratch. If you're building in a docker context you can mount a volume containing build artifacts (target directory). Or `sccache` if you want those artifacts stored in a remote store such as S3. I'm sure there's other solutions as well, but not clearing build artifacts in between builds would be a simpler win over maintaining separate, shared infrastructure.
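A minimal sketch of the volume approach, assuming a plain cargo project and the official rust image (volume names are arbitrary; the official image keeps its registry under /usr/local/cargo):

    # named volumes persist the registry and target dir between runs
    docker volume create cargo-registry
    docker volume create cargo-target

    docker run --rm -v "$PWD":/src -w /src \
        -v cargo-registry:/usr/local/cargo/registry \
        -v cargo-target:/src/target \
        rust:1 cargo build --release

If the build has to happen inside `docker build`, BuildKit cache mounts (RUN --mount=type=cache,target=...) get you roughly the same effect without a persistent volume.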
Ah. The follow-up question would be "how do you do that without root access?" But I now see that the docs say root access is required. I was hoping you'd have some magic way that didn't require it. Thanks for replying
If it's just the chroot part that needs root, that could maybe be offloaded to some standalone suid component, I guess? And/or maybe some of the lxc tools could help.
Actually looking at the (very good!) readme, I'm thinking that the most interesting part is the custom file system.
I've been looking for the "perfect" networked filesystem, and this is the first new effort I've seen that seems to check most boxes (not disconnected operation though, but that is a tall order...).
Nice! This is a little like one-off High Throughput Computing. It could be useful for media processing, and I imagine also for some workloads with large data sets.
A long time ago I contributed to HTCondor, which permits long-running processes to be automatically shuffled between N computers: https://research.cs.wisc.edu/htcondor/
Can this be used for building a convergent, dockable phone? What I mean is - if I have, say, a Pine Phone, can I build a dock with its own CPU and RAM, so that when I dock the phone, I get the extra horsepower from the dock, but without the dock, everything still works, albeit a little slower?
It's very impressive that the developer made this work, but it seems it would always be outperformed by simply having a set of uniform Linux servers and common NFS mounts serving as a compute farm, which is how people currently solve the problem.
Not everyone has a farm; some people just have a beefy machine in the other room whose power they'd love to tap into while working from the laptop on the couch.
So you can just put the laptop in your bag and go somewhere else and still have the same environment, just slower. No need to sync filesystems and packages etc.
> No need to first install the command on the other machine.
and
> It must be installed on your own machine and any machines that you'll be using to run commands. On those other machines you must make sure to install it globally, such that it can be started with a command like ssh user@host outrun.
Not sure if the readme is just out of date, or if I'm misunderstanding the initial statement about the _other machine_.
Very cool application of `chroot` et al. here. Would something like this be possible with remote direct memory access? That is, for lack of a better word, more "cleanly"? I assume not, since you'd end up transferring the binary etc., which the README on their repo already explains is something they _don't_ do.
For people who work mostly on a laptop but have access to various fast, stationary machines, outrun lets them use those CPUs and GPUs without endless rounds of "scp -r" or "rsync".
(Of course you should still rsync to your backup machines, but that's not necessarily the same box as your compute server.)
Very interesting concept and approach, mostly thanks to the very generic approach it takes. I wonder if one could implement something similar using some lightweight virtualization instead?
Clustering file requests into batches over time based debounced windows also seems like a good option. I wrote this kind of batching reverse proxy before for HTTP requests.
Nice, I like this project, but there's a problem: you assume here that my binary would run on the target machine, which is why I just have a shared filesystem and that's it. I see potential in the AI industry; if you could send microtasks to the cloud like this, it would be amazingly simple.
It's interesting how stuff that is totally common and considered so normal it isn't even mentioned is special elsewhere and can be seen as a novelty. The future not being distributed equally also holds for the IT world.
So arbitrage it for money. This is the future that threatens the need for centralized power. Get a smart washing machine to run Linux. Buy cycles (pun) when it's idle.
Sadly, everything that wastefully consumes too much of my CPU cycles is usually some bloated graphical app.
At my last job simply watching the logs in TravisCI via Google Chrome would slow my MacBook to a crawl.
At my new job my MacBook is fine until I join a Zoom meeting, especially if I try and screen share.
Anything I actually call from the terminal is optimized enough to never be a bother. If I worked in a space where it wasn't (big data, video transcoding, etc), I'd probably just run a powerful cloud instance and sync my code there.
Thanks for the software and the explanation!