I am (obviously?) biased, but this is a great read by Rain, as it takes the reader through not just some of the illumos tooling, but also how compilers need to bootstrap themselves -- and why heterogeneous platforms are important. (As Rain elaborates in the piece, this issue was seen on illumos, but is in fact lurking on other platforms.)
This is easily one of my favourite types of posts (and this one was particularly gratifying). I wish I could go down these rabbit holes every day!
> [...] and why heterogeneous platforms are important
This prompts me to wonder vaguely whether there's any untapped juice in fuzzing approaches that might be relevant here. As in, how much of the platform (including configuration and heuristics, and so on) could be fuzzed as program input?
Thank you, glad you appreciated the post! I love writing up debugging/incident reports and this one was just really fun.
Regarding fuzzing... maybe? I've wondered that a couple of times myself but in reality there are really just a finite number of platforms, and so much of this is determined at compile time by library call availability. But I'm probably not thinking as deeply about this as someone could be, and I'd be interested to hear other folks' thoughts.
It makes me wonder though if illumos is worth it for a relatively small company to maintain. This bug came out of the larger ecosystem not knowing what to do for a niche OS.
What we're doing inherently requires deep integration up and down the stack. We'd still have to be doing OS-level work even if we used another operating system. But then we'd be at the mercy of upstream accepting patches, or keeping our own fork, and at that point, you're basically at the same spot we are now, but with less overall control.
> ...But then we'd be at the mercy of upstream accepting patches,
This point bothers me, but I can't say with confidence that it's completely wrong. I know there are occasional rifts within the open source world, but I wish I knew two things:
1) How much overhead (in totality) is there when contributing to a project you don't control?
2) How different is the end result of collaboration between distinct groups or individuals versus doing things separately?
It depends on the project, and we do contribute upstream to other things all the time.
My comment wasn't so much about the overhead of collaboration, but of the chance of there being significant differences in opinion, leading to a place where we'd basically have to fork anyway. Remember, in this specific context, we're talking about an operating system and hypervisor that are core to our product, and we're building our own hardware.
You can't get one single answer for these questions for the entirety of the open source community. Even cross-language norms can be different. These things are inherently tradeoffs.
I wasn't part of making the decision to use illumos, but having an extensive history of open source contributions I'm confident it is the right one (at least on this axis).
On top of what Steve said, illumos does support all of the required APIs here, but the Rust libc crate was just missing definitions for them. It's not a tremendously exotic platform the way something like Haiku is.
Edit: also worth pointing out (again) that the bug actually exists everywhere -- it was just being masked on the other platforms.
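For anyone who hasn't poked at the libc crate: "missing definitions" just means the platform-specific module doesn't declare the symbol yet, so Rust code can't call it even though the OS ships it. Purely to illustrate the shape of such a binding (pthread_attr_getstack is a stand-in here, not a claim about which symbols were actually missing for illumos):

// Illustrative shape of a libc-crate binding; pthread_attr_getstack is only a
// stand-in, not necessarily one of the definitions that was missing.
use libc::{c_int, c_void, pthread_attr_t, size_t};

extern "C" {
    fn pthread_attr_getstack(
        attr: *const pthread_attr_t,
        stackaddr: *mut *mut c_void,
        stacksize: *mut size_t,
    ) -> c_int;
}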
Last year I got sucked into poking at some of the cross building rust issues (specifically issues targeting Solaris and BSD and issues cross hosting on macOS). Illumos isn't terribly exotic but the rust bootstrapping process has a few rough edges that will cut you. Illumos suffers mainly because it's not popular enough to get a ton of attention by the rustc folks and because it's not quite Solaris.
That said, for CI, cross building is much easier to scale than tracking down every permutation. For something like Illumos, that's not too bad. But for Solaris/SPARC? Heh.
> Since cranelift-codegen was an optional component that can be disabled, the x.py tooling could notice if a build failed in such a component
I think there's quite a bit of utility in ensuring that as much of the Rust core as possible builds. As cranelift-codegen is optional, the scripts should be able to bundle up a distribution with everything that succeeded.
Edit: Just took a quick look, and it sure looks like cranelift isn't built by default (at least that's what config.example.toml says, and none of the defaults available via 'x setup' seem to override that).
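For reference, opting in would presumably just be a matter of setting the relevant key in config.toml, something along the lines of:

codegen-backends = ["llvm", "cranelift"]

(untested; I'm going off the key name as it appears in config.example.toml).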
Weird. I'm looking at master right now and `config.example.toml` has this comment:
# This is an array of the codegen backends that will be compiled for the rustc
# that's being compiled. The default is to only build the LLVM codegen backend,
# and currently the only standard options supported are `"llvm"`, `"cranelift"`
# and `"gcc"`. The first backend in this list will be used as default by rustc
# when no explicit backend is specified.
#codegen-backends = ["llvm"]
Now I've not messed with the build profiles at all, and I don't have the repo checked out so digging through it is tedious, but my assumption is the library profiles work by copying everything from src/bootstrap/defaults/config.library.toml into a config.toml at the current directory. There's nothing overriding the default codegen-backends value that I can see.
The defaults for the Config struct are set in src/bootstrap/src/core/config/config.rs and codegen-backends is indeed just "llvm" (line 1167).
Nothing in src/bootstrap/src/core/build_steps/compile.rs appears to override that list.
So that's all very curious (to me). Did the gcc backend get built as well?
Tangentially: a year on and the GitHub interface is still nasty to use, and it was one of the big motivating factors for me backing off of hacking on the cross-build stuff. Every day seems to bring a new WTF moment. If I could get one thing for my birthday it would be for Rust to wean itself off of GitHub.
These are pesky. The brute force search is a good idea, in that it breaks that cycle of almost needing to know the answer in order to discover it. (Unless you can surmise that the CWD is the crate dir, but let's assume that we don't want to depend on having such a moment of sheer "eureka!".)
> But there are also other bits of evidence that this theory doesn’t explain, or even cuts against. (This is what makes post-mortem debugging exciting! There are often contradictory-seeming pieces of information that need to be explained.)
I wish more people appreciated this; too many people are apt to sweep such discrepancies under the rug. This post does a good job of not just following through on them, but also showing how figuring some of them out ("why is our stack weird?") leads to the key insights: "oh we're using stacker and … $the_bug".
I do wonder how the author managed to notice that line in a 1.5k line stack trace, though. The "abrupt" address change would have probably gone unnoticed by me. (The only saving grace being a.) it's close to the bottom b.) most of the rest is repetitive, an artifact of a recursive descent parser recursing, and if we just consider that repetition "one chunk", it gets a lot smaller. I still dunno if I'd've seen it, though.)
It was actually my coworker Joshua who first noticed that, and then that dredged up a long-forgotten memory in me of having seen stacker on crates.io. Many eyes make bugs shallow!
Agree about the power of brute force solutions! When you're struggling to get a foothold, those kinds of approaches are extremely helpful. For the cwd thing specifically, it was clear in hindsight, and I should have known about it (having written nextest which has to handle crate cwds carefully). But I don't beat myself up too much over it, and now I know where to look next time this happens.
In BSD land you can specify the path and filename pattern for the core dump via a sysctl knob (kern.corefile).
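E.g. something along these lines on FreeBSD (the target directory has to exist already; %N and %P expand to the process name and PID):

sysctl kern.corefile=/var/coredumps/%N.%P.core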
A quick check of the docs: on Solaris or Illumos I'd eye coreadm(8) and maybe chuck something like this in my shell login script:
coreadm -p $HOME/coredumps/core.%f.%p
That'll put the executable name and PID in the dump filename and leave them all in ~/coredumps/… for all processes that are children of that login shell.
Re: rustc not calling stacker enough/at the right times: the behaviour on MSVC/Windows is for the compiler to rely on hitting the OS's guard page to extend the stack (rather than growing it itself), but also to emit a special routine in any function that uses more than a page of stack frame (to make sure the first thing the function does is poke every page in order, so the OS can grow the stack by the right amount).
I believe that's how rustc works on all platforms. rustc's soundness story depends on each call stack having one guard page, and ensuring that each time a frame is created every page gets poked at least once.
Oh, is stacker requesting stack growth that otherwise wouldn't happen? I always just assume address space reservations are cheap/free, and you can just mark a heap (hah) of space for your stacks. I guess it also has the behaviour that you'll find out which programs are stack hogs earlier.
Yes, stacker has to be called explicitly, and in the case of an arbitrarily recursive program every N recursion levels -- it's not something like a signal handler that sits in the background and gets activated when the thread runs out of stack space.
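To make that concrete, here's a minimal sketch of what an explicit call looks like (the Node type and the thresholds are made up; the two arguments to maybe_grow are the remaining-stack "red zone" to check for and the size of the new segment to allocate if we're below it):

struct Node { children: Vec<Node> }

fn walk(node: &Node) {
    // Each call cheaply checks whether we're within 64 KiB of the end of the
    // current stack; if so, stacker allocates a fresh 1 MiB segment and runs
    // the closure on it. Nothing happens unless this is actually called.
    stacker::maybe_grow(64 * 1024, 1024 * 1024, || {
        for child in &node.children {
            walk(child);
        }
    });
}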
It's funny the parallel of "detect when the mapped stack region is exhausted, and map the next page" and "detect when the stack region is/will exhaust, and reserve/allocate another region"
You can only grow the stack automatically if you "prereserve" the virtual address space for it. Then it becomes just a question of "committing" the memory at time of need. Default options to mmap() should actually give you this (the OS finds/fills the pages for you on first pagefault), but Solaris/Linux differ on overcommit behaviour.
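A rough sketch of that "prereserve, then commit" shape via the libc crate (constants are illustrative, error handling is elided, and as noted the lazy-commit/overcommit details differ per OS):

use libc::{mmap, mprotect, MAP_ANON, MAP_FAILED, MAP_PRIVATE, PROT_NONE, PROT_READ, PROT_WRITE};
use std::ptr;

const RESERVE: usize = 64 * 1024 * 1024; // claim 64 MiB of address space
const COMMIT: usize = 1024 * 1024;       // make the top 1 MiB usable now

unsafe fn reserve_stack() -> *mut u8 {
    // PROT_NONE: the range is reserved but no accessible memory is committed.
    let base = mmap(ptr::null_mut(), RESERVE, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
    assert!(base != MAP_FAILED);
    // Stacks grow down, so commit the top of the region; the rest can be
    // committed later (or on first fault, depending on the platform).
    let top = (base as *mut u8).add(RESERVE - COMMIT);
    assert!(mprotect(top.cast(), COMMIT, PROT_READ | PROT_WRITE) == 0);
    top.add(COMMIT) // highest address, i.e. the initial stack pointer for a downward-growing stack
}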
To "grow" a stack at runtime, if the virtual address space to just add mappings in the "right" place is unavailable (already used for something else, say), it means a stack switch. That isn't entirely impossible. But if done anywhere else but in the "base thread initialisation" (as seen in that example) it becomes complicated and hazardous. It's more like "coroutines" then, your current function "spawns" a func on a new callstack and on return, undoes that and cleans up. One can imagine things like thread-local signal handlers to "simulate" this transparently but that would become more of a piece of programming performance art. If there ever were an "unbelievable unix underbelly programming contest" (UUUPC) stuff like that might well make it.
I took a brief look into Windows' public API: even with a tight coupling between compiler/toolchain and OS, they don't offer a way to increase the stack region after a thread has been created.
But if you're on 64-bit, you can just create the threads with a huge stack size limit (e.g. 1 GB) and let the OS handle growing the actual stack size automatically. No need to reinvent the wheel.
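In Rust terms that's just a builder call (sketch; whether the 1 GiB is merely reserved or committed up front depends on the platform, though on Windows I believe it's treated as a reservation):

fn main() {
    let handle = std::thread::Builder::new()
        // Ask for a very large stack; the OS grows the committed portion as
        // the thread actually uses it (platform permitting).
        .stack_size(1024 * 1024 * 1024)
        .spawn(|| {
            // deeply recursive work would go here
        })
        .expect("failed to spawn thread");
    handle.join().unwrap();
}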
Definitely a fun read. Debugging crashes has, in the last decade or so, become something a bit like a "lost art". No one looks at coredumps in the cloud ...
I don't want to outdo you on Solaris debugging (there are plenty of old-time Solaris folks at Oxide who are totally capable of showing how to get things like open files and their contents from a coredump, or how to configure the system to include those should they not be there ... etc ... etc ... Solaris has the best coredumps, for all that's worth ...).
A note on the fix side of things though: while adding pthread_get_attr_np() for stack location/size gives Solaris the Linux interface, it already has its own for those - pthread_attr_getstack{size,addr}(), see https://docs.oracle.com/cd/E19455-01/806-5257/6je9h032l/inde... - I happen to remember this because I used it decades ago somewhere in the Solaris name lookup code to choose at runtime between using alloca() and malloc() ... don't ask. Those were different times.
> Solaris has the best coredumps for all that's worth
I remember debugging a gnarly c++ crash in some vendor code we had extended on solaris circa 1998 and I ended up with a coredump[1] which, when I loaded it into the sun workshop debugger, caused the debugger to dump core. That's one of those moments you go and get a coffee while figuring out what to do next.
One minor meta point if the author is (still) around: there is something strange with the styling of the hexadecimal literals in the code. Instead of having the prefix "0x", they look like "0×" even though they seem to be normal x's in the source.
> Generally, on Unix systems the default is to generate a file named core in the current directory of the crashing process.
Sounds like a horrible default. That's a security risk (working directory might be readable by untrusted users), and pollutes a random directory with a file that could cause problems for other applications processing files in that directory.
A fixed location inside the user's home directory feels like a much better choice to me.
The default on macOS is to chuck them in /cores (which seems quite reasonable to me).
Security-wise I wouldn't worry too much about the Solaris/Illumos defaults. There, dumps can be created in up to three contexts: system-wide "global", zone-wide "global", and local. All are created with mode 600 and global dumps are created with owner of uid 0. Local core dumps are owned by user that owns the process unless its uid/gid has changed (e.g. setuid/setgid), then the owner is the superuser like the global dumps.
Otherwise yeah I'm not a huge fan of leaving core dumps in the current directory. What if you're doing something on a read-only filesystem?
Very nice writeup and I appreciate the effort put into showing the process. I got nerd sniped yesterday playing around with how to find the isle_opt.rs filepath from the core file and didn't succeed but left some notes on scripting with lldb here https://gist.github.com/aconz2/aef366a7b198b8ac151df147fec32...
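For the crude, zero-dependency version of the same idea, scanning the raw core file for a known path fragment can at least turn up candidate offsets to work from (sketch; "core" and the needle are placeholders, and a real tool would stream or mmap rather than slurp the whole file):

use std::fs;

fn main() {
    let data = fs::read("core").expect("failed to read core file");
    let needle = b"isle_opt.rs";
    // Slide a window over the raw bytes and report every match.
    for (offset, window) in data.windows(needle.len()).enumerate() {
        if window == needle {
            println!("candidate at file offset {offset:#x}");
        }
    }
}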
DTrace allows a bunch of nicer filtering (for example by process name, or by function in call stack), but in this case truss was just a no-brainer. I'm not as comfortable with DTrace yet as I'd like to be!
Yes, truss has been based on / implemented in terms of DTrace since Solaris 10.
(DTrace userland is a library; you can, if you so choose, write your own tooling there. The command is "just a wrapper", with a lot of similarities to bpftrace on Linux.)
truss(1) is not DTrace-based. It continues to be implemented in terms of the process control and instrumentation facilities provided by the proc(5) file system, as it has been for a long time.
Extremely tangential, but what does something like Illumos/Solaris buy you in 2024 over something like FreeBSD or Linux?
This isn't some passive aggressive gotcha, I'm actually curious what people prefer about the Solaris distros nowadays. I know Zones and ZFS are cool, but FreeBSD supports Jails and ZFS out of the box; maybe there are cool features I'm not aware of.