I'm deploying Nix to real ROS robots right now at my day job. We're switching to it because it solves real problems associated with scaling the codebase and development team, particularly for domains adjacent to scientific computing.
Waaaay back in the day, I packaged Gazebo for Nix when I had an internship at a company that made robots. That was when I first dove in with NixOS, and decided that whatever packaging issues I ran into were just something I had to get good at dealing with and solve along the way. Nixpkgs was just a tiny fraction of its current size back then, so I ended up packaging several things! I ended up needing quite some help in IRC at the time for Gazebo, which I remember as having a very quirky and bespoke build system. Folks there were extremely kind and helpful.
I ended up abandoning the packages when the gig ended. Sorry about that! Pretty cool that a whole ROS environment is well-supported via Nix nowadays. :D
Awesome! Unfortunately the public story is not really that great, actually. There's a single maintainer doing most of it (@lopsided98) and we rolled our own based in part on his work, which we in turn open sourced in October for a ROSCon talk, but are not able to maintain in public long term unfortunately.
So it's doable for sure, but for most mere mortals, Ubuntu LTSes are definitely still the lowest friction path to working with and deploying ROS.
I would strongly suggest you not use Nix for mission critical computing. It is suitable only for hobby use cases unless you are prepared to review 100% of all code yourself, because no one else is.
For robots you should not need more than a Linux kernel and a minimal init shim to your own custom runtime binary anyway.
Bit of an odd take, to hold Nix to that standard when other systems happily pull the entire world down from pypi, npm, cargo, and other sites of unknown review-status. I actually find it easier to audit my dependencies and their patches, build logs, etc under Nix than I ever did under Ubuntu.
> For robots you should not need more than a Linux kernel and a minimal init shim to your own custom runtime binary anyway.
This might have been true in 2005, but it is IMO not aligned to modern realities, where:
- You have a huge list of dependencies, including painful-to-package stuff like OpenCV, PCL, CUDA, Tensorflow.
- You deal with proprietary things like TensorRT, GPU drivers, and vendor tools for flashing firmware onto sensors, PLCs, and the like.
- You rely on the fault isolation and self-monitoring/healing of a multi-process architecture.
- You need to cgroup portions of the system that are critical vs being more speculative.
- You have a bunch of asynchronous comms stuff going on, like streaming telemetry, logs, crash reports, and other assets. All of this has to be queued up and prioritized.
- You have to supply a user-ready workflow for updating the entire system down to the kernel and bootloader, with downtime measured in single-digit minutes.
None of these requirements will be met by a single binary and init shim solution.
I would never suggest trusting pypi, npm, cargo, etc. Those are all effectively remote code execution as a service. Those tools save you some time -writing- code but you still are on the hook to review it all just as you would review code from a peer. Why would strangers be trusted more than peers?
Operating systems should have a higher standard than random dev libraries. You should be able to trust they already have had a strict cryptographically enforced review process. Distros like Debian and Arch actually have a maintainer application and review process that includes verifying the maintainers' cryptographic signing keys. We can cryptographically prove who authored any given package, who approved it, and who approved the approvers.
When your threat model includes supply chain attacks, the only answer is to get really really specific about what you -need- to run your target jobs and ensure it comes from well signed and reviewed sources... then review the edge cases yourself.
As for your other points...
> - You have a huge list of dependencies, including painful-to-package stuff like OpenCV, PCL, CUDA, Tensorflow.
Those could be statically and deterministically compiled into your target application binary, or at a minimum the final build artifacts included in the cpio initramfs which in turn can be statically linked into the kernel. You do not need a full package manager, init system, or even a shell.
> - You deal with proprietary things like TensorRT, GPU drivers, and vendor tools for flashing firmware onto sensors, PLCs, and the like.
Sure. An init shim can do insmod to load custom kernel modules as needed in your initramfs.
> - You rely on the fault isolation and self-monitoring/healing of a multi-process architecture.
Nobody said you have to have a single process. Your pid1 binary can spin off any other processes or threads you need and run reapers for them. A few lines of code in most languages.
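To make the "spin off processes and run reapers" idea concrete, here is a minimal sketch of a supervisor loop in Python. It is an illustration under assumptions, not anyone's production init: the `supervise` function, the service names, and the restart policy are all hypothetical, and a real pid1 would also need signal handling, backoff, and logging.

```python
import os
import sys


def supervise(services, max_restarts=2):
    """Start every service, reap exits, and restart failures.

    services: dict mapping a (hypothetical) service name to its argv list.
    Returns a dict of how many times each service was restarted.
    """
    running = {}                                # pid -> service name
    restarts = {name: 0 for name in services}

    def start(name):
        argv = services[name]
        # P_NOWAIT returns the child's pid immediately instead of waiting.
        pid = os.spawnvp(os.P_NOWAIT, argv[0], argv)
        running[pid] = name

    for name in services:
        start(name)

    while running:
        pid, status = os.wait()                 # blocks until any child exits
        name = running.pop(pid, None)
        if name is None:
            continue                            # an orphan we inherited; reaping is enough
        if os.WIFEXITED(status):
            code = os.WEXITSTATUS(status)
        else:
            code = -os.WTERMSIG(status)         # killed by a signal
        if code != 0 and restarts[name] < max_restarts:
            restarts[name] += 1
            start(name)
    return restarts


if __name__ == "__main__":
    demo = {
        "ok": [sys.executable, "-c", "raise SystemExit(0)"],
        "flaky": [sys.executable, "-c", "raise SystemExit(3)"],
    }
    print(supervise(demo))
```

The loop itself is short, as claimed, but everything around it (backoff, dependency ordering, shutdown) is where the real effort goes.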
> - You need to cgroup portions of the system that are critical vs being more speculative.
cgroup system calls are very simple to perform in most programming languages.
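As a rough sketch of what that looks like in practice: with cgroup v2 the interface is plain file operations (mkdir plus writes) rather than dedicated syscalls. The example below assumes the unified hierarchy is mounted at `/sys/fs/cgroup`, that the `cpu` and `memory` controllers are enabled for the parent group, and that the caller has the privileges to write there. The function name `place_in_cgroup` and the group name are made up for illustration; the `root` parameter exists so the logic can be exercised against a scratch directory.

```python
from pathlib import Path


def place_in_cgroup(root, name, pid, cpu_weight=100, memory_max=None):
    """Create (or reuse) a cgroup v2 group and move a process into it.

    root: the cgroup v2 mount point, normally /sys/fs/cgroup (assumed).
    cpu_weight: relative CPU share; the kernel accepts 1..10000, default 100.
    memory_max: hard memory limit in bytes, or None to leave it unset.
    """
    if not 1 <= cpu_weight <= 10000:
        raise ValueError("cpu.weight must be between 1 and 10000")
    group = Path(root) / name
    group.mkdir(exist_ok=True)                  # mkdir creates the cgroup
    (group / "cpu.weight").write_text(f"{cpu_weight}\n")
    if memory_max is not None:
        (group / "memory.max").write_text(f"{memory_max}\n")
    # Writing a pid to cgroup.procs migrates that process into the group.
    (group / "cgroup.procs").write_text(f"{pid}\n")
    return group
```

On a real system a call might look like `place_in_cgroup("/sys/fs/cgroup", "critical", os.getpid(), cpu_weight=500)`, which would give the critical group five times the default CPU weight under contention.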
> - You have a bunch of asynchronous comms stuff going on, like streaming telemetry, logs, crash reports, and other assets. All of this has to be queued up and prioritized.
You can include any syslog binary you want for this shipped in your initramfs, or have everything bundle into the kernel stdout over a network where something external does the parsing. I do not know your requirements but there are many many ways to do that. I do not see what NixOS gives you that buildroot, busybox, or a single explicit choice of log collecting daemon can't.
> - You have to supply a user-ready workflow for updating the entire system down to the kernel and bootloader, with downtime measured in single-digit minutes.
If the entire OS is just a lean bzImage with everything you need statically linked into it, then a new one is downloaded to /boot, and then you reboot or kexec pivot. If boot fails, roll back. No need for a read/write filesystem other than some fixed directories you can mount in for cache/logs.
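The download-then-swap-then-roll-back flow could be sketched roughly like this. The file names `bzImage` and `bzImage.prev`, both function names, and the digest check are assumptions for illustration; the sketch leaves out fsync calls, bootloader integration, and the logic that decides a boot has failed.

```python
import hashlib
import os
import shutil
from pathlib import Path

CURRENT = "bzImage"        # hypothetical name of the image the bootloader loads
PREVIOUS = "bzImage.prev"  # hypothetical name of the fallback image


def install_image(boot_dir, new_image, expected_sha256):
    """Verify a downloaded image, keep the old one as fallback, then swap."""
    boot = Path(boot_dir)
    data = Path(new_image).read_bytes()
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("image digest mismatch; refusing to install")
    current = boot / CURRENT
    staged = boot / (CURRENT + ".new")
    staged.write_bytes(data)
    if current.exists():
        # Copy rather than rename so a bootable image exists at every step.
        backup_tmp = boot / (PREVIOUS + ".tmp")
        shutil.copyfile(current, backup_tmp)
        os.replace(backup_tmp, boot / PREVIOUS)
    os.replace(staged, current)                 # atomic on the same filesystem


def roll_back(boot_dir):
    """Restore the fallback image after a failed boot."""
    boot = Path(boot_dir)
    previous = boot / PREVIOUS
    if not previous.exists():
        raise FileNotFoundError("no previous image to roll back to")
    os.replace(previous, boot / CURRENT)
```

The rename-based swap is the part that keeps the update atomic; deciding when to call `roll_back` (a watchdog, a boot counter, a health check) is a separate design question this sketch does not answer.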
I realize a lot of this feels like handwaving, but I have been doing embedded linux systems for over a decade and have found there is always a path to a super lean, immutable, and deterministic/reproducible unikernel with nothing more than a few easily understood makefiles and dockerfiles.
If you ever want to chat about this stuff feel free to drop in #!:matrix.org
Lot of talk about embedded linux approaches for satellites and hsms in recent weeks.
You don't have to take on all of Nixpkgs to use Nix in that kind of context. Nix hackers have in fact spun off their own, more focused repos for such applications (e.g., NixNG, Not-OS).
It's pretty easy to audit your whole dependency tree with Nix if you want to.
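For a sense of what "the whole dependency tree" means here: every store path records the other store paths it references, and the set to audit is the transitive closure of those references. `nix-store --query --requisites <path>` prints that closure directly; the sketch below only illustrates the same computation in Python over a made-up reference map (the path names are hypothetical, not real store paths).

```python
def closure(root, references):
    """Return every path reachable from root via the references map.

    references: dict mapping a path to the paths it directly references,
    e.g. as reported per path by `nix-store --query --references`.
    """
    seen = set()
    stack = [root]
    while stack:
        path = stack.pop()
        if path in seen:
            continue                    # already visited (also handles self-references)
        seen.add(path)
        stack.extend(references.get(path, ()))
    return sorted(seen)


# Hypothetical example data, not real store paths.
EXAMPLE_REFERENCES = {
    "robot-app": ["opencv", "glibc"],
    "opencv": ["glibc", "zlib"],
    "zlib": ["glibc"],
    "glibc": ["glibc"],
    "unrelated": ["zlib"],
}
```

Calling `closure("robot-app", EXAMPLE_REFERENCES)` yields the finite list of things that would need review for that one artifact, which is usually far smaller than all of Nixpkgs.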
> Lot of talk about embedded linux approaches for satellites and hsms in recent weeks.
There was a talk at this year's NixCon about migrating to NixOS for a weather satellite system: https://youtu.be/RL2xuhU9Nhk
Not taking away from the rest of your post, but the reason you'd trust strangers more than peers for random packages is that the strangers are often some of the best in the world at what they do and your peers are most likely not.
Most programming language package maintainers are hobbyists new in their career with no idea what they are doing when it comes to security, or worse, actively malicious.
See the dozens of serious supply chain attacks or massive security oversights in recent years. The overwhelming majority of code in open source is not reviewed by anyone.
I'm talking about the most commonly used packages, not the long tail here.
Tokio for example is clearly maintained by some of the best people in the world at writing async runtimes. It is extremely unlikely that your peers would be able to do a better job at it than the Tokio team.
Someone being good at async runtimes does not mean they are versed in security. Also you have no easy proof the code that the Tokio team wrote is what actually made it into the binaries hosted by the Nix project. That is the nature of increasingly common supply chain attacks. The Nix tooling and package definitions themselves have very minimal supply chain integrity evidence. No author or reviewer signing, etc.
As for my peers, I work with some of the best security researchers in the world, and I myself have found and filed critical CVEs in widely depended on and trusted software like gnupg and terraform. I am not an expert by any means, but just a technical person willing to actually read some of the code we all rely on.
No one bothered to carefully review openssl before heartbleed.
Everyone assumes someone else is reviewing critical code with a security lens. It is always a bad assumption and it gives dangerous people that actually -do- review code a massive advantage.
If you ship code you copied off the internet for a critical use case without ensuring it receives qualified review, then you are as responsible for any bad outcomes as a chef who failed to identify toxic ingredients.
The current industry standard on software supply chain integrity is about as negligent as the medical industry before the normalization of basic sanitation practices. Yeah, it takes a lot of extra work, but that is the job.
Most supply chain attacks, such as poisoning cache.nixos.org with a backdoored binary that doesn't actually match the build definition given, are pretty orthogonal to whether there's a chain of trust on the git repo containing the package definitions.
Anyway, as far as robotics in particular, no one worth their salt is treating the computer or ROS as "trusted" for the purposes of last-mile safety; we're using safety-rated lasers, PLCs, and motor controllers for the physical safety part of the equation. The computer is critical in the sense that it's critical to keep the robot driving and therefore critical for business operations, but it's deliberately not in the loop that keeps humans or property from being physically harmed.