Dtrace for Linux (wildebeest.org)
354 points by espadrine on Feb 14, 2018 | 104 comments



Unfortunately for DTrace, this is too late. Oracle should have done this years ago. Now Linux has a more powerful tracer built in, eBPF, and it would be a backwards step to switch the kernel code to DTrace (assuming the DTrace port is completed, which it is not). I'm sure this will not be lost on the maintainers, who have the ultimate say as to what is included in Linux mainline.

The only hope for DTrace is to have the frontend emit BPF bytecode. The bulk of this GPL DTrace code is no longer needed, only the user-level front end.


Tangential, but I feel the need to quote Dan Colascione:

"Ah Linux, in which a thread is a process, a semantic patch is a lint rule, and a packet filter is a system profiler."

;)

(but seriously, your work in exploring and expanding it has been wonderful. Thank you!)


What does this mean? I don't really understand the quote.


BPF started as the Berkeley Packet Filter, a language for declaring network packet filtering rules and, eventually, a pretty good JITted virtual machine runtime for applying those rules quickly. However, it has since evolved into a generic filtering VM and been applied to system tracing and other kernel-level filtering use cases.

Historically, threads on Linux were implemented as process-alike tasks and even had unique PIDs, which caused all sorts of hell for vanilla-POSIX threaded applications.

And in Linux, code-style differences across device drivers led not to a code style enforced by an automatic formatter, but to "semantic patches": patches that apply to code regardless of its formatting.


Threads still have PIDs! I have had an issue on some heavily loaded systems where a process dies but leaves a pidfile around, and then its PID is reused for a thread in another process. When I run a restart script for the process, it reads the pidfile, confirms that the PID is alive, and then kills it - thus shanking some random, completely unrelated process.

I have modified the script to make some more careful checks before killing a PID it finds in a pidfile. Really, we should just use a proper process manager, but that probably won't happen soon.
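A minimal Python sketch of the kind of "more careful checks" described above, assuming Linux /proc. It refuses an ID whose /proc/<id>/status reports a different Tgid (meaning the ID is a non-leader thread of some other process) or an unexpected Name. The helper name and the name-matching heuristic are hypothetical, not the commenter's actual script, and the check only narrows the race: the PID can still be reused between this check and the kill().

```python
def pid_from_pidfile_if_plausible(pidfile, expected_name):
    """Hypothetical helper: return the PID stored in pidfile only if it still looks like our daemon.

    Returns None if the pidfile is unreadable, the ID no longer exists, the ID belongs to a
    non-leader thread of some other process, or the process name does not match.
    """
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
        # /proc/<id> also resolves for thread IDs, even though they are not listed in /proc.
        with open(f"/proc/{pid}/status") as f:
            fields = {}
            for line in f:
                key, sep, value = line.partition(":")
                if sep:
                    fields[key] = value.strip()
    except (OSError, ValueError):
        return None

    # Only a thread-group leader (a whole process) has a Tgid equal to its own ID.
    if fields.get("Tgid") != str(pid):
        return None
    # Name is the kernel's comm value, which is truncated to 15 bytes.
    if fields.get("Name") != expected_name:
        return None
    return pid
```

The restart script would then signal only a non-None result.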


For ages now, I've wanted NT-style process handles. open(2)ing a process (maybe via its proc directory) should keep the corresponding process alive, even if as a zombie. This way, you'd be able to write perfectly robust versions of pkill without annoying TOCTOU PID reuse races.

Linus has rejected this mechanism due to the ability of an outstanding process handle to prevent process reclaim, permitting users to fill the process table --- but I think this argument is bogus: users not constrained by RLIMIT_NPROC can do that already, and we could count process handles against RLIMIT_NPROC.

Bonus points: allow operations like kill, ptrace, status-reading, etc., via the process handle FD instead of PID-accepting syscalls. This way, you'd be able to pass the FD around as a credential.

Even more bonus points: select(2) support for the FD for easy process death monitoring.


The argument is not bogus, but it may not be realistic. But this is: https://randomascii.wordpress.com/2018/02/11/zombie-processe...


So? How is that different from leaking any other kind of resource? There's nothing special about processes.


We've talked before about FreeBSD's process descriptors and EVFILT_PROC kevents, ne?


Process descriptors are unfortunately a little incomplete -- pdwait() was never implemented. They don't seem to be getting much use.


You don't need pdwait(). You can watch the process descriptor with kqueue to get the process exit status; look for EVFILT_PROCDESC in https://www.freebsd.org/cgi/man.cgi?kevent.


Threads are supposed to have PIDs, the PIDs of the processes that they variously belong to. What they are not supposed to have is different PIDs within a single process.

* http://jdebp.eu./FGA/linux-thread-problems.html

And you are relating this to a mechanism that has been known to be fundamentally flawed, in precisely this way, since the 1980s.


You mean, like systemd?


systemd still has a race condition when handling forking servers. There's no way to atomically send a signal to a cgroup, so what systemd does is read the PIDs in the cgroup and then iteratively send SIGKILL to each one. However, between reading the PID and sending the signal is the classic PID file race.

There is a way to atomically send a signal to a traditional process group, however. What I do for my daemons--at least, those which create subprocesses--is have the master become a new session and process group leader using setsid, open a master/slave PTY pair, and assign the new PTY as the controlling terminal. Child processes inherit the controlling terminal from the master, and if the master ever dies then _all_ children with the controlling terminal and process group will atomically get SIGHUP. As long as your subprocesses aren't actively trying to subvert you, it's bullet-proof behavior and more robust than any hack using cgroups.
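A minimal Python sketch of the master's setup step described above, assuming Linux: become a session leader with setsid(), open a PTY pair, and make the slave side the controlling terminal with the TIOCSCTTY ioctl. Children forked afterwards inherit the session, the process group and the controlling terminal, so when the session leader exits, the kernel sends SIGHUP to the terminal's foreground process group. The helper name is hypothetical.

```python
import fcntl
import os
import termios


def become_session_leader_with_pty():
    """Hypothetical helper: make the calling process a session leader that owns a fresh PTY.

    Returns (master_fd, slave_fd). Keep master_fd open for the daemon's lifetime: closing
    it hangs up the slave side, which would signal the children early.
    """
    # Fails with EPERM if we are already a process group leader; fork first in that case.
    os.setsid()
    master_fd, slave_fd = os.openpty()
    # Make the slave our controlling terminal; its foreground process group becomes ours.
    fcntl.ioctl(slave_fd, termios.TIOCSCTTY, 0)
    return master_fd, slave_fd
```

The master would call this once at startup and only then fork its workers.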

There's still the issue of figuring out how to kill the master process. Ideally the master process never forks away from, e.g., systemd. (Not sure if systemd will try to kill it directly, first, before relying on its cgroups hack. Also, sometimes becoming a session leader requires forking if, e.g., the invoker already made you a process group leader.) But if the master must be independent, the best way is for the master to be super simple and just have it open a unix domain socket to take start/stop commands.

But let's presume we want a failsafe method in case the master has some sort of bug and we need to send it SIGKILL. (This is what I always assume, actually.) No matter what you do there'll always be a race. However, the least racy way to get the PID using a PID file is using POSIX fcntl locks. fcntl locks provide a way to query the PID of the process holding a lock (no reading or writing of files involved; just a kernel syscall). Importantly, if the process dies it no longer holds the lock and so querying the owner cannot return a stale PID. So when I use a traditional "PID file", I don't write the PID to the file, I just have the master lock it. There's still the race between querying the pid and sending a signal, but at least you're not leaving a loaded gun around (i.e. a PID file with a stale PID written to it).
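A minimal Python sketch of the lock-based "PID file" described above, assuming Linux: the daemon holds a POSIX record lock and never writes its PID, and a controller asks the kernel who holds the lock via fcntl(F_GETLK). The struct flock layout string is an assumption that matches Linux on x86-64 and arm64; other platforms lay it out differently. F_GETLK never reports a conflict with locks held by the calling process itself, and a holder that closes any descriptor for the file drops its lock. Both function names are hypothetical.

```python
import fcntl
import os
import struct

# ASSUMPTION: struct flock on Linux x86-64/arm64 is
# short l_type; short l_whence; off_t l_start; off_t l_len; pid_t l_pid (native alignment).
FLOCK_FMT = "hhqqi"


def hold_pid_lock(path):
    """Daemon side: take an exclusive POSIX lock on path; keep the fd open while running."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    # Non-blocking, so a second instance fails immediately with OSError instead of waiting.
    fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    return fd


def lock_owner_pid(path):
    """Controller side: return the PID whose lock blocks a read lock on path, or None."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Whole-file query (start 0, length 0 = to EOF); a read lock conflicts with a write lock.
        query = struct.pack(FLOCK_FMT, fcntl.F_RDLCK, os.SEEK_SET, 0, 0, 0)
        reply = fcntl.fcntl(fd, fcntl.F_GETLK, query)
        l_type, _, _, _, l_pid = struct.unpack(FLOCK_FMT, reply)
        return None if l_type == fcntl.F_UNLCK else l_pid
    finally:
        os.close(fd)
```

Once the holder dies, the kernel releases the lock, so the query returns None instead of a stale PID; the window between the query and a later kill() remains.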

This method is no worse than systemd and arguably better in some respects.

Oddly, I don't think systemd even bothers with process groups. That's a shame, because it's really the only race-free way to kill a bunch of related processes. systemd could provide the option to spawn a service within a process group and to send SIGKILL to the process group first, before resorting to the cgroups hack to pick up any stragglers (i.e. those that intentionally left the process group). It could even provide the controlling terminal trick as an option. But it doesn't. AFAIK it just uses the imperfect cgroups hack.
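A minimal Python sketch of the process-group approach suggested above: start the service in its own session and process group (subprocess's start_new_session=True runs setsid() in the child), then deliver one signal to the whole group with killpg(). The helper names are hypothetical. Members that call setsid() or setpgid() themselves leave the group, which is the straggler case the comment would still hand to cgroups.

```python
import os
import signal
import subprocess


def start_in_own_group(argv, **popen_kwargs):
    """Hypothetical helper: start argv as the leader of a new session and process group."""
    # start_new_session=True runs setsid() in the child, so its PGID equals its PID.
    return subprocess.Popen(argv, start_new_session=True, **popen_kwargs)


def kill_group(proc, sig=signal.SIGKILL):
    """Send sig to every process in proc's group with a single killpg() call, then reap proc."""
    os.killpg(proc.pid, sig)
    return proc.wait()
```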


Right. I implemented the same controlling pty approach in Buck. It's amazing how quickly we forget robust older solutions and jump immediately to the new hotness.


I assume this was in response to "Really, we should just use a proper process manager, but that probably won't happen soon".

Yes, like systemd. Our servers even run systemd already!

Sadly, we don't have root on our servers, and user-level service support is broken on CentOS 7:

https://bugs.centos.org/view.php?id=8767

Maybe I could install another process manager of my own, and start it from cron. That really doesn't sound like fun, though. Is there a simple process manager which I can install and use without root privileges, and which won't make me restructure my whole deployment process?


So then it is not correct to say a semantic patch is a lint rule. Rather, semantic patches and lint rules are both tools that attack the same problem from different angles. Arguably semantic patches are far superior if they can really be made to work.


Linux uses Coccinelle (the semantic patch program) as a lint engine --- a job for which it's surprisingly well-suited. Try this:

  make coccicheck


The Linux kernel treats threads no different than processes that share an address space which is unusual in UNIX land.

Semantic patch probably refers to Coccinelle. Look it up: it's pretty cool, and it makes it easy to deal with API changes, renames, etc.

eBPF stands for extended Berkeley Packet Filter, where packet filter code is JITed to be fast. It's also used for DTrace-like tracing, along with kprobes, to run profiling code in the kernel safely.


The Linux kernel treats threads no different than processes that share an address space which is unusual in UNIX land.

They also share a signal handler table, open file table and current working directory. If a process-directed signal is sent to the process, any of the threads with the signal unblocked can handle it. If a default signal disposition causes one thread to exit (e.g. SIGKILL or SIGSEGV), then all threads in the process exit. The exit_group(2) syscall causes all threads to exit (it is used to implement the POSIX exit() libc function).

Really, "threads mostly are just processes" hasn't been true for a very long time. The only places where it still mostly holds are scheduling, where Linux schedules each thread independently (what POSIX calls PTHREAD_SCOPE_SYSTEM), and credentials, where each thread has its own set of current, real and effective uids / gids (and glibc has to use a signal-based hack to synchronise them).
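A small sketch, assuming Linux and Python 3.8+, of the NPTL-era arrangement discussed in this subthread: every thread sees the same getpid() value (the thread-group ID), each thread has its own kernel thread ID (threading.get_native_id(), i.e. gettid()), the main thread's TID equals the PID, and all of them are listed under /proc/self/task. The function name is hypothetical.

```python
import os
import threading


def observe_thread_ids(n=3):
    """Hypothetical helper: start n threads and report what each sees as its PID and TID.

    Returns (seen, tasks): seen is a list of (getpid(), native TID) pairs, one per worker,
    and tasks is the set of thread IDs listed in /proc/self/task while the workers run.
    """
    gate = threading.Barrier(n + 1)
    seen = []
    lock = threading.Lock()

    def worker():
        with lock:
            seen.append((os.getpid(), threading.get_native_id()))
        gate.wait()  # 1st rendezvous: every worker has recorded its IDs
        gate.wait()  # 2nd rendezvous: the main thread has finished reading /proc

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    gate.wait()
    tasks = {int(name) for name in os.listdir("/proc/self/task")}
    gate.wait()
    for t in threads:
        t.join()
    return seen, tasks
```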


The point of the quote was that when Solaris and the BSDs were all giving threads special treatment (LWPs/LWKT) for M:N threading, Linux from day one treated threads just like processes. The changes that treat threads specially to solve problems (cleaning up the manager-thread hackery, signal handling improvements, clone syscall improvements, PID semantics, etc.) came later with NPTL, as you said, but by that time Solaris 9 had already adopted a 1:1 threading model.


"Linux uses a 1-1 threading model, with (to the kernel) no distinction between processes and threads -- everything is simply a runnable task."

https://stackoverflow.com/a/809049

"semantic patches" probably refer to Coccinelle:

http://coccinelle.lip6.fr/

eBPF is extended Berkeley Packet Filter:

https://en.wikipedia.org/wiki/eBPF


As a non-expert, I still find this exciting. While eBPF may be more powerful, I find it far less approachable than DTrace. Perhaps this is just a failure to discover the right documentation. I use DTrace now and then on macOS and have resented that I can't really use it on Linux as well.


eBPF lacks a high level front end, in part because that's optional. Engineering effort has gone into the essentials first: kernel development of eBPF, basic user-level frameworks to use it (bcc), and lots and lots of testing and bug fixes to make that rock solid. bcc is not as easy to code in, but I've been able to port over many of my DTraceToolkit scripts already.

We're only more recently looking at higher level languages, as a nice to have. Here's one project:

https://github.com/ajor/bpftrace

We can also look at taking the /usr/sbin/dtrace code (licenses permitting), and calling into libbcc and emitting BPF bytecode. That'd give us a D language front-end to BPF, so I can run my old DTrace scripts directly (plus sell more copies of my DTrace book -- I joke ;-)

Help on either of those projects would be appreciated (bpftrace or DTrace-BPF).


Am I reading too much into the last sentence, or are they actually trying to use existing probes with the dtrace frontend:

> So with all those techniques now available in the linux kernel it will be exciting to see if dtrace for linux can unite them all.


With regard to tracing, eBPF has actually moved at a fairly brisk pace over the last year or two, and it will probably take a while for the documentation and knowledge to permeate.

The usability side you mention is what Brendan is suggesting is worth porting from DTrace: just the front end that the user sees. That would simply back onto eBPF, and the user would notice little difference.


IIRC one of the big things about dtrace vs systemtap is that dtrace could trace a process from userspace to kernel space and then back again. Can eBPF do this? How else do they compare?


That would not be recalling correctly: SystemTap does that, and so does eBPF (via uprobes and kprobes).

A large problem with SystemTap is that it was developed in the open, and in the early days people were scared off with its kernel panics. SystemTap built a negative brand: first impressions matter. DTrace, however, was largely built before its public release, and with the help of many other engineers and test teams. People's first impression of DTrace was of something that was already awesome. Of course, there are other problems with SystemTap: its process of compiling code and loading kernel modules was always riskier than an interpreter.

As for how eBPF compares: I summarized it last year http://www.brendangregg.com/blog/2016-10-27/dtrace-for-linux...


> and in the early days people were scared off with its kernel panics.

It is ironic that to this day those kernel panics are associated with systemtap (the visible user interface) rather than the kernel (where the vast majority of the responsible bugs actually lay). But perceptions indeed matter, even erroneous ones.


Note for freshness that systemtap can also emit eBPF directly now. So kernel modules are the fastest & most powerful way, but not the only one.


Thanks. How does the stap language being compiled into kernel modules differ from an interpreter? Both seem to have a limited surface (unless you're doing C directly in systemtap).


An option for correlated kernel and userspace tracing on Linux is LTTng. [0] For a comparison of LTTng and other tracers, check out the LTTng docs. [1]

[0] http://lttng.org/

[1] http://lttng.org/docs/v2.10/#doc-lttng-alternatives


Well, none of my dtrace knowledge works with eBPF, and eBPF is only useful on linux. This is still a very welcome change :)


Yeah, the compatibility element is really important. I've been doing tons of research around illumos, SmartOS, etc., lately and I'm hoping to deploy it in some projects soon. DTrace knowledge becoming properly portable across systems is a great thing.

I've been very impressed with illumos from a technical standpoint, as a Linux admin of over 10 years who's never really had cause to seriously wander into the now-esoteric "proper Unices". The combination of the massive relief that is ZoL and the disastrous aftermath of the [ongoing] Linuxland "container revolution" got me looking for more.

But before I commit to illumos in a substantial way, I'm trying to understand the social dynamics around it more, and whether the community is essentially just crossing fingers and hoping that Samsung doesn't shutter Joyent, as they've done to other open-source acquisitions, especially as alternate distributors like OmniTI have bowed out. So I've been doing lots of sleuthing and trying to piece together some of the drama behind this whole ecosystem.

A lot of the momentum on SmartOS seems like it gave out during the first half of 2014 when Brendan and many other important Sun alumni left Joyent. Bryan Cantrill addressed this in an AMA by saying that he thinks some of them just tired of fighting an uphill battle (which, IMO, is totally understandable and actually doesn't seem like it's too far off the mark), but the close timing with which basically everyone except Bryan Cantrill made this decision makes me wonder if there was some internal event at Joyent that signaled a need to head for the hills.

This theory is inflamed by Brendan's apparently complete abandonment of illumos and the tone he takes when he addresses illumos/Solaris, frequently telling people to avoid it and that Linux is much better nowadays. He didn't leave Joyent and continue to extoll the virtues of illumos or SmartOS, even though his farewell post says that his high opinion on the value of illumos tech hadn't changed. He left Joyent and seemed like he was relieved that he could finally let it all out and talk about the futility of illumos/Solaris.

These days, he frequently leaves comments saying things like (paraphrasing here) "You could use [illumos distro], but I don't know why you would." and "We'd be in trouble if we picked the wrong OS at big companies like Netflix, and none of us are picking Solaris."

It just feels like there are pieces of the puzzle missing here. I get that there is a long and storied history to illumos (as Cantrill points out in one talk, the codebase is older than he is!) and that everyone brings a lot of personal perspective and emotion to it, so I don't expect there to be a simple answer per se, but I just want to feel like I've got my arms more around it.

SmartOS really does seem like a saner system for people who want things that work without constantly fussing, but I don't want to become dependent on a system whose future appears to hinge completely on Cantrill's continued employment with Joyent, especially post-exit, as he is likely bound by golden handcuffs.

Node.js is another Joyent puzzler with its own long and tortured history (though obviously, much shorter than Solaris/illumos). It's strange, imo, that they were ever really involved in that in the first place. I can only guess that they were quick to snatch up Dahl and ride the wave before Google et al could get their wits about them and establish control over that segment, but maybe it has more to do with Joyent's relationship with web publishing via TextDrive, I don't really know. But if SmartOS weren't otherwise so charming, its integration of Node.js for system control scripts would seriously put me off.

Disclaimer: I have no idea what I'm talking about, I wasn't there, I don't know any of the participants involved, and I'm not trying to assign any particular feelings or values to anyone. This is just my read of the ecosystem and dynamic from reading stuff on the internet, and it is very likely wrong. Corrections are welcome.


I'm sorry, but Bryan did not "address" the real reasons I left Joyent. What he said was inaccurate.

I no longer believe in illumos. It made a lot of sense in 2010 when competing with Linux, a lot less by 2014, and very little today.

When we created illumos in August 2010, the latest Linux version was 2.6.35. The cutting-edge latest. Most people were still running older versions, 2.6.32, 2.6.27, etc. Linux had many performance issues at the time: it still had the Big Kernel Lock (BKL), performance issues with mutexes, IPC, ext4, XFS, dcache, VFS, slub, and other subsystems, no transparent huge pages, no NUMA balancing, no inbound or outbound network fanout, and before many TCP improvements: increasing the initial window size, early retransmit, fast open, tail loss probe, autocorking, anti-bufferbloat, and so on. Linux was slow. I worked on Solaris performance, and beating Linux was something I'd expect to do nine times out of ten.

illumos, based on OpenSolaris, seemed like a great idea. You get a faster, more scalable kernel, and you get three great features: ZFS, DTrace, and Zones. At one point we had "ZFS, DTrace, Zones" printed in big letters on T-shirts. I evangelized illumos. I believed in it.

However, all of those Linux performance issues were fixed in the years that followed. With the kernel version that fixed each one: BKL (2.6.37), mutexes (2.6.36, 3.10), IPC (2.6.33, 2.6.35), ext4 (2.6.37+), XFS (2.6.37+), dcache (2.6.38), VFS (3.1), slub (3.1), transparent huge pages (2.6.38), NUMA balancing (3.8), inbound fanout (RPS, 2.6.35), outbound fanout (XPS, 2.6.38), TCP window size (2.6.39), early retransmit (3.5), fast open (3.6, 3.7), tail loss probe (3.10), autocorking (3.14), anti-bufferbloat (3.14), etc.

It came down to ZFS, DTrace, and Zones as the differentiating features. If you look at Linux today, and what we run at Netflix, we already have ZFS, eBPF, and containers.

Anyway, I've discussed this topic before, here on hackernews:

https://news.ycombinator.com/item?id=13081465

You might still have some corner case reason to still run illumos. But bear in mind that Linux today, in 2018, is completely, completely, different to 2010 when we created illumos. 8 years is a long time in Linux.

edit: you have got me thinking, however. As someone who promoted illumos years ago (or any technology), and then stopped, I may have a responsibility to properly explain why I stopped (beyond answering questions when directly asked). I was hoping my Solaris to Linux post[1] was the last time I poured effort down the Solaris drain, but maybe I need one more.

[1] http://www.brendangregg.com/blog/2017-09-05/solaris-to-linux...


Just a short counter point.

While I do agree with the progress Linus has made, Linux in most organisations is still problematic in these areas. But I am not sure illumos is better. I think it would be, but...

Why? Usability and operational cost. You can get containers and ZFS and eBPF... but you need really recent versions of the kernel, which are not supported in most distros (there is still a huge number of machines out there stuck on the 2.x line). And they have a really big cost of entry.

Containers are still really hard to use reliably, epoll is still a mess, eBPF still has no high level frontend or books on using it, and ZFS is far from "turn key" ready on Linux.

Can you do it? Yes. But it means your infra team will pay a high cost, and there is no real vendor to help. Is it getting better? Allegedly, for some things like eBPF, yes.

But for ZFS, there's no hope until the license changes. And for containers, I stopped believing we can have an easy-to-use, secure-by-default container system on Linux. Hell, Docker kernel-panics the host far too often.

So I do think there is a place for something like illumos... but a lot of work still has to be done.

PS: also virtual networking. We need a real solution here. This is already making some companies move back from containers.


Yeah, I don't wanna drag this out in this thread since it's been hashed out many times, but these are the types of concerns that make illumos appealing to me.

You get a system that has been developed and thoroughly tested as a single cohesive whole. It has advanced features that you know work well out of the box, which require a lot of fiddling and custom hackery to replicate reliably on Linux. Anyone who has tried to get LXC working well on Linux will instantly appreciate the ease of firing up a zone in SmartOS; all that crap you fought for hours to make work on LXC works automatically, it works well, and it doesn't hide its guts from you.

I appreciate how the system coheres and operates together in a sensible, straightforward manner, that I trust to work without crashing or getting in my face or breaking between incremental updates.

You can't say that about Linux containerization. You install Docker and then overlayfs is a broken piece of crap. You update your overlay driver, and it's still a broken piece of crap. You try to work and hack around it, and you eventually find something that mostly-works, but it's disgusting and terrible, like having to do eight different commands as a single RUN line with continuations (just had to do this today; saved 200MB on my Docker image).

illumos looks like a potential modern refuge from this, where I can still run Linux containers thanks to LX zones, and where I can still keep the monkeys happy by giving them a Kubernetes and they're none-the-wiser that it's running on something that isn't going to be crapping itself every time I blink.

Crossbow is impressive, as you say. There are a lot of gems and features that are only half-implemented on Linux and a total nightmare to get working, that work out of the box on illumos. At least on paper.

I have a SmartOS VM that I'm playing with. Tried to grab an image from Docker Hub. imgadm doesn't support Docker v2 images. There's a pull request that's been open since September with the changes and no information about it in Gerrit. Tried to poke the Joyent employees who participated in this PR for an update and up to this point, radio silence. Not a very inspiring start.

It would've been a godsend to confidently and quickly be able to pull images from Docker repos and deploy them on SmartOS. Seems, however, that they want this to be hard if you don't use Triton, because they want you to buy Triton (yes, I know this is open-source; the setup demands at least a couple of machines, the "cloud on a laptop" is a VMware image that spawns multiple components (a la minikube) and I don't want to try to run one of those right now, and I want it to work on plain SmartOS!).

Lots of promise, but sadly, I'm not certain that it's on firm footing right now.


Brendan, thanks very much for replying to this. It's great to get clarity around it. Your blog and HN posts have been great resources as I've gone down this rabbit hole, and since I've read so many of them, I know that's not the first time you've heard it. Your work has been hugely valuable to this community.

Since the options for illumos employment are and were scant, the story that you were looking for a change of scenery and it just-so-happened to entail Linux is somewhat believable. I appreciate you clearing up the record on that; it seems like now it's clear that you left over doubts and uncertainties about the value of illumos itself.

The real key question is why, as Linux caught up to these features that you've listed, illumos wasn't pushing the envelope even farther, adding more features that you could continue to get excited about and continue to evangelize.

I think that's the key missing piece for me. It seems like that loss of faith had to be pretty momentous if you were willing to shift attention to Linux and spend years helping to develop eBPF, replicating the core featureset that already existed via DTrace.

So I'm trying to understand how to interpret that, and whether your departure + the departure of other key ex-Sun personnel around the same time should be taken as a clear vote of no confidence in illumos and/or other Solaris-derived products (potentially precipitated by a company-internal event that demonstrated this obviously, but which the rest of us don't know about), or whether that's too much to read into it based on associated personal factors.

This is important to me as a potential platform adherent because as awesome as Bryan seems, illumos is a harder sell when he's standing alone as one of its sole prominent proponents, whereas everyone else who was involved with OpenSolaris back in the day has accepted the reality of Linux dominance and moved on.

-----

EDIT: Hope we don't end up talking past each other in edits, but yeah, the thing that's been hard for me piecing this together is that it doesn't seem like there's any closure. Seemed like in a short period of time, you went from touting illumos frequently to never mentioning it unless it was to tell people that you don't think they should use it anymore. I'm trying to bridge that gap from the breadcrumbs left online, but thus far, coming up without satisfactory answers. There is still a tension here that I haven't been able to resolve.

With the post from last year, you talk about things like it's a foregone conclusion that Solaris and its derivatives are over and done. This makes your position sharper but it doesn't really help with a context or pathway for why that conclusion is so crisp when illumos and Joyent are still apparently viable.

I understand that the tone is conciliatory because you're trying to cheer up the Sun folks who got caught up in the layoff, and show them that there is light on the Linux side of things. But no mention of illumos at all? No "Hey, Bryan has some openings over there if you're still into the Solaris thing, I used to work for him and it was alright, he'll give me 500 smackers if he hires someone and you say you read about it on this blog"? A little weird.

You even do this in the parent comment here, where you describe time spent discussing illumos/Solaris as "down the Solaris drain", i.e., anything Solaris is non-productive. Why is it now "the Solaris drain"? Bryan and Joyent are trying to convince me that it's not a drain.

I know that Linux "caught up", but that doesn't feel like the only explanation here. There's something else. After all, you still speak highly of FreeBSD, which also lags Linux in several significant ways.

If it's just personal, I'm not asking you to get into the details and spread gossip, etc., but would be great to hear from you that "I just personally lost interest in that", "it was just about better employability on the Linux side, Solaris is still cool if that floats your boat, just doesn't float mine anymore", etc. instead of just hearing generic hand-wavy stuff from Bryan, who can't give the real deal as long as he wants to keep his job.

It just feels like there's something else here that's not out. I want to know what it is before I start running production applications on these platforms!


I'm definitely not trying to be "hand-wavy"; apologies if it's taken that way.

So let me make this slightly blunter: our goal in SmartOS and illumos is to solve systems problems -- and yes, we do it with the courage to go our own way where we see fit. If that worldview is a match for you, that's awesome. If that's not a match for you, that's fine too -- world domination is not, in fact, our goal. Yes, we are a small community, but that's fine with us: this is all open source; it's not going away.

As an aside, I personally don't understand why anyone would choose to spend their time denigrating small communities; if you want to write your software in (say) OCaml and deploy it on OpenBSD via a Nix derivative, why should anyone spend their time telling you that you are wasting yours? (These are three examples of communities that -- like SmartOS -- are small but technically interesting.)

And more importantly: why should you (or anyone, really) listen to a person who has defined themselves so negatively and so rigidly? Your technology choices -- like the books you like or the movies you enjoy or the ideas you have -- are your business; don't let anyone browbeat your choices out of you simply because they don't match their own.


I see the small community in illumos as a benefit, more or less. I'm not trying to denigrate it for its size. The issue is the attrition, the apparently-negative momentum, and the loss of prominent proponents and distributors, including Brendan, OmniTI, etc. That's bad when something is already small. It makes people wonder why others are fleeing.

My concern, as it relates to this thread, is around the spectacular loss of faith that Brendan has exhibited. You don't go from being a world-recognized authority on DTrace like Brendan to a Linux refugee overnight, especially not 3 years ago when you have to lay out all the plumbing for eBPF yourself.

So something happened. That's clear. I want to know what it is, at least proximally / vaguely, because if it can drive Brendan away from illumos, I assume it can also drive away those of us who are less involved.

If it was simply a personal dispute and had nothing to do with technical direction, that, of course, is one thing that doesn't reflect badly on illumos at all. People burn out sometimes, they want a change of scenery, whatever. But Brendan appears to stringently believe that illumos is now a bad choice in virtually all circumstances. It feels like there's something substantial at play. I don't think he's petty enough to bash the illumos kernel just because he had a spat or wanted to mix things up, especially considering it contains a not-insignificant portion of his own work.

I don't necessarily expect either of you to reveal details, and obviously, I respect that you have both moral and legal obligations as it relates to confidentiality surrounding personnel matters. I'm just talking about my read on the situation, and why I'm skittish, despite what I feel are some strong positives in SmartOS's favor. A world where Bryan Cantrill is the lone SmartOS holdout, after his compatriots have declared it a lost cause, is just nerve-wracking. I'm sure you can understand an outsider's perspective on that.

It'd be cool if you could discuss some of the performance ground that illumos has covered since 2014, as Brendan gave a pretty thorough list of ground that Linux has covered. His tone always seems to be "Solaris is a sad situation, but look guys, there's light at the end of the tunnel." He gives absolutely no intimation that illumos is a viable alternative, though he will entertain FreeBSD. He always phrases it as if everyone knows Solaris and its derivatives are fully dead, and he alternates between appearing relieved about it and appearing disappointed about it (I'm sure it's both).

I just want to know why he rules it out so strictly. It's weird, it doesn't add up. Missing variables. I don't think this attitude is sufficiently justified by "Linux caught up, that's all". When stuff is weird, I approach it cautiously, especially since my goal with SmartOS is something simple, stable, and powerful, no fuss. Weird things are fuss.


> A world where Bryan Cantrill is the lone SmartOS holdout, after his compatriots have declared it a lost cause, is just nerve-wracking.

I don't see where you are getting this idea. There are many contributors to Illumos from all over the globe. https://github.com/illumos/illumos-gate/graphs/contributors . Bryan stated in a previous post that Joyent now employs more former Sun engineers than it ever has.

> If it was simply a personal dispute and had nothing to do with technical direction

From the outside looking in when Brendan left, reading both the mailing lists and Twitter, this is exactly what it seemed like.


“A world where Bryan Cantrill is the lone SmartOS holdout, after his compatriots have declared it a lost cause, is just nerve-wracking.“

Bryan Cantrill will never be alone for as long as I draw breath: I will single-handedly continue working on illumos and SmartOS even if everybody else quits doing that. I have already made up my mind after a lengthy private exchange with Brendan Gregg a while ago.


OmniTI put OmniOS out to its active community: https://omniosce.org/

FYI.


Wow -- there's a lot here! Just to unpack this a bit...

First, there are a bunch of people that you might be unaware of doing important work in illumos and SmartOS (both at Joyent and in the broader community); I recommend taking a look at the actual repo[1], and some of the recent work we've done like LX-branded zones[2] and (more currently) bhyve.[3]

Second, in terms of the engineering team at Joyent: yes, people have come and gone over the years -- but we've grown a bunch since the acquisition by Samsung, adding many engineers from a wide variety of backgrounds. Yes, this has included some ex-Sun folks (we have more ex-Sun now than at any time in our history, if that kind of thing matters to you), but (importantly to me, anyway) it's also included a bunch of people new to the system. These new engineers bring new perspectives and fresh thinking -- whether they've had two decades of experience or are fresh out of school.

Finally, in terms of Joyent and node.js, it might be helpful to see my talk on platform as a reflection of values[4][5]; it offers my perspective on how and why we diverged from the node.js community -- but also why we're okay with that.

Hope all of this helps! Let me know if you have any questions (my Twitter DMs are always open) -- or introduce yourself to the community in #illumos or #smartos on Freenode!

[1] https://github.com/joyent/illumos-joyent

[2] https://www.youtube.com/watch?v=lnesNFulpPE

[3] https://github.com/joyent/illumos-joyent/tree/bhyve

[4] https://www.slideshare.net/bcantrill/platform-as-reflection-...

[5] https://vimeo.com/230142234


For what it's worth, I've been at Joyent for six years now. It's still a great place to work, and we're still fully invested in SmartOS. Samsung has some pretty seriously large computing workloads and they bought us to use our technology.

I would note also that while we have had staff come and go over the years, I don't believe there was a mass exodus of Sun alumni -- indeed, since the recent ahem reprioritisation of Solaris at Oracle we've picked up a bunch of new Sun alumni!

In short, we're keeping the dream alive and doing the work we think is important. The world has space for more than one approach to systems software, and we're keen to keep working on ours!


Disclaimer: I don't work on Solaris or Illumos at the moment. But I wish I were.

I worked on Solaris for 4 years, implementing zones and then being "the Solaris guy" at a big software company. We had a bit of RHEL too, as well as VMware and some AIX, but Solaris (both in VMware and on physical x86 and SPARC) was 90% of our systems. Back then I learned about Illumos and learned a lot from the community, from playing around with DTrace, and from browsing the code and watching talks.

For the past year I've worked exclusively with Linux at my new company. The other day I spun up a SmartOS instance in VirtualBox at home, and within 5 minutes I was able to run different distributions in LX zones and run DTrace inside them. I had known about those features for a long time but never tried them. What a breath of fresh air it was; I was amazed.

I follow the mailing lists for SmartOS and OmniOS. You'd be surprised how little drama there is on them; in fact, I don't recall seeing any in the few years since I subscribed. When OmniTI shut down OmniOS, after a few emails some people offered to create a community edition, and that's it: the system lives on. Some contributors (shoutouts to rm! but there are others, obviously) can answer a question about a deep kernel issue in a matter of minutes and have a fix under testing before the end of the day; and I'm not talking about paid support here.

Anyway, my point is, if I could find a systems engineer job to do infrastructure work (or any work really) around illumos, I would snap call and go there. No more systemd, nwm, lvm, and poor implementation of docker to suffer. I don't care if the entire system goes down in 5 years (which it won't, that's the beauty of open source), at least I would have spent 5 years working on a reliable, consistent, stable system inside which, when you expect something to work a certain way, it actually works that way.

I can understand Brendan's post. If your company has a strong IT team that regularly patches its systems and adopts the new features of Linux, Linux is probably the safe bet. But where I am (leaving soon, though), we still have some Red Hat 5.x (heck, we even have some HP-UX...), and I would much rather have even Solaris 10. Linux is probably making lots of progress, but the Illumos ecosystem is so advanced and "out of the box" that if I were building a company or advising one on its private infrastructure, I would choose it above anything else, because I know I could scale it to hundreds of physical hosts with very little operational overhead. Heck, they are even implementing bhyve now, although they already have KVM and invested a lot of energy in it, just because bhyve is better.


Indeed. There's probably some sort of Greek tragedy to be written about the story of tracing and Linux. Or maybe we can just go for a beer at the next LPC and laugh this one out ;)


Better? OK, could you please show me an eBPF one-liner equivalent to a DTrace command like this:

# dtrace -qn 'syscall::write:entry /execname == "mysqld"/ {self->stime = timestamp;} syscall::write:return /self->stime != 0/ {@LWrite = quantize(timestamp - self->stime);} tick-10s {printa(@LWrite);}'

??

The above now works not only on Solaris but on the Oracle UEK Linux kernel as well.
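For anyone not fluent in D, here is a rough Python model of what that one-liner computes, assuming a made-up stream of (probe, thread id, process name, timestamp) events rather than real probes. The entry clause stores a start time in a per-thread variable (DTrace's self->), the return clause fires only when that variable is non-zero (unset thread-locals read as 0), and quantize() counts each latency into a power-of-two bucket labelled by its lower bound. It illustrates the semantics only; it is not DTrace or eBPF code, and the function names are hypothetical.

```python
from collections import Counter


def quantize_bucket(value):
    """Bucket label DTrace's quantize() uses for a non-negative value:
    0 -> 0, 1 -> 1, 2..3 -> 2, 4..7 -> 4, 8..15 -> 8, and so on."""
    if value < 0:
        raise ValueError("this sketch only models non-negative latencies")
    return 0 if value == 0 else 1 << (value.bit_length() - 1)


def write_latency_histogram(events, execname="mysqld"):
    """Model of the one-liner over hypothetical (probe, tid, name, timestamp_ns) tuples.

    probe is "entry" (syscall::write:entry) or "return" (syscall::write:return).
    """
    stime = {}        # plays the role of the thread-local self->stime, keyed by thread id
    hist = Counter()  # plays the role of the @LWrite aggregation
    for probe, tid, name, ts in events:
        if probe == "entry" and name == execname:
            stime[tid] = ts  # {self->stime = timestamp;}
        elif probe == "return" and stime.get(tid, 0) != 0:
            # /self->stime != 0/ {@LWrite = quantize(timestamp - self->stime);}
            hist[quantize_bucket(ts - stime[tid])] += 1
    return dict(sorted(hist.items()))


if __name__ == "__main__":
    demo = [
        ("entry", 1, "mysqld", 1000), ("return", 1, "mysqld", 1005),  # 5 ns  -> bucket 4
        ("entry", 2, "bash", 2000), ("return", 2, "bash", 2100),      # ignored: not mysqld
        ("entry", 1, "mysqld", 3000), ("return", 1, "mysqld", 3020),  # 20 ns -> bucket 16
    ]
    print(write_latency_histogram(demo))  # prints {4: 1, 16: 1}
```

The tick-10s clause, which prints the aggregation every ten seconds, is left out; the model just returns the final histogram.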


# funclatency -i 10 -p `pgrep -n mysqld` sys_write

Whoops, that was easy. I think argdist can do this as well.

I'd have picked a different one-liner. It's not hard to come up with DTrace one-liners that are currently horribly hard with eBPF and bcc.

Which is why I said we could have DTrace's D as a front end to bcc/eBPF. Or the bpftrace project will do that, and already has similar one-liners. Check it out[1]. Either way, we win.

It's the kernel eBPF code that is more featured. It's great that Oracle is doing more with open source, and we may use some user level components, but Linux already has eBPF in the kernel now.

[1] https://github.com/ajor/bpftrace


Just FTR. Your example is wrong.

[root@domek ~]# /usr/share/bcc/tools/funclatency -i 10 -p `pgrep -n mysqld` sys_write
0 functions matched by "sys_write". Exiting.

The correct one is:

funclatency -i 10 -p `pgrep -n mysqld` vfs_write

Nevertheless, I'm not happy that in this case there is no analogue of a language I can use to correlate data from different probes. Tools like funclatency cover one specific class of cases; for implementing anything more sophisticated, I still want something like DTrace as a front end.
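On the "0 functions matched" error: the name has to match a kernel function the tracer can attach to, and syscall entry points have been renamed and aliased across kernel versions (newer x86-64 kernels, for example, expose __x64_sys_write). A quick way to see which candidates a given kernel actually has is to search /proc/kallsyms. The helper below is a hypothetical sketch of that; bcc may consult a different list (such as ftrace's available_filter_functions) when matching, so treat the result as a hint rather than a guarantee.

```python
import os
import re
import sys
from typing import List


def find_kernel_functions(kallsyms_text: str, pattern: str) -> List[str]:
    """Return code symbols (type T or t) whose names match the regex `pattern`.

    Each /proc/kallsyms line looks like: '<address> <type> <name> [module]'.
    """
    rx = re.compile(pattern)
    names = []
    for line in kallsyms_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[1] in ("T", "t") and rx.search(parts[2]):
            names.append(parts[2])
    return names


if __name__ == "__main__" and len(sys.argv) > 1 and os.path.exists("/proc/kallsyms"):
    # Hypothetical usage: python3 find_syms.py '^(__x64_)?(sys|ksys|vfs)_write$'
    with open("/proc/kallsyms") as f:
        print(find_kernel_functions(f.read(), sys.argv[1]))
```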


Oh wow, I had no idea that funclatency and biolatency existed. There aren't enough hours in the day to learn about everything that could be saving me time. Huge thanks!


I'm trying to "upgrade" from valgrind and understand the more advanced tools available but I'm having trouble understanding how perf relates to eBPF.

I can't quite make sense of your description "BPF makes perf tracing programmatic, and takes perf from being a counting & sampling-with-post-processing tracer, to a fully in-kernel programmable tracer"

Are you saying eBPF can make your programs trigger perf tracing on-demand?


How would USDT probes work with a BPF backend?


Like this: http://www.brendangregg.com/blog/2016-10-12/linux-bcc-nodejs...

Although that's a bit old, and Sasha and others have done improvements, so we probably need some updated USDT tracing docs.


I had somehow missed this; this is excellent news! The Oracle port of DTrace to Linux is (far and away) the most complete port; Kris van Hees and team did a very thorough job[1], and I'm elated (and honestly, surprised) that they somehow prevailed on Oracle to do the right thing here!

[1] https://www.youtube.com/watch?v=NElog3MvUC8


If ZFS were also relicensed, would Illumos be able to relicense OpenZFS as well, even with the divergence?


If they can get in touch with the developers or their employers, then yes. OpenZFS could be fully relicensed, since the main obstacle to doing so is the substantial Oracle ZFS base code not being licensed appropriately.


The majority of OpenZFS is under CDDL 1.0 or later, so Oracle can relicense it by releasing a new version of CDDL that permits relicensing any ZFS-related code to GPLv2. Wikipedia and the FSF did something similar to relicense Wikipedia from GFDL to CC.


OpenZFS can't be relicensed as GPL since FreeBSD won't be able to use it.

Perhaps it could be dual-licensed, not sure, but that would be bad for FreeBSD as well. If it would be integrated into the mainline Linux kernel, I don't believe FreeBSD could merge back subsequent ZFS changes done on the Linux side.

If OpenZFS were relicensed and developed outside the mainline Linux tree (as it is now), FreeBSD would be able to get all changes, but it would still be annoying to actually use ZFS on Linux for the same reason it is annoying to use it now, even though the legal uncertainties would disappear.


There is plenty of dual GPL-BSD (or even MIT) licensed code in the Linux kernel that can be merged back and forth with FreeBSD. Examples are most Intel-contributed drivers (BSD+GPL) and graphics code (MIT).


Instead of relicensing to GPL, it could be relicensed to BSD or MIT, both of which can be included in the mainline Linux kernel and in BSD land too, without compromising future development.

edit: stupid phone autocorrect on the train this morning. s/not/MIT/


> ... but it would still be annoying to actually use ZFS in Linux for the same reason it is annoying to use it now.

For those of us who are ignorant of what is annoying to use now (I raise my hand) can you elaborate? I'm fooling around with it on Debian Stretch and Ubuntu 16.04 LTS and haven't yet found it annoying (except for the challenge of finding where the various command line switches are documented...)

I was particularly surprised to find it was an 'apt-get' away on Debian, but I do have non-free repos enabled.


Running ZFS (boot&root) on both Ubuntu 17.10 on my home server and Debian Jessie on my laptop.

The ZFS experience between the two is heaven and hell. Ubuntu just ships the zfs.ko and basically everything works 100% of the time, zero effort.

Debian, OTOH, has to compile the module via DKMS for every minor kernel release... You'd better track those apt upgrades carefully and run `dkms status` after each one, or you might end up having to drop into a rescue shell on the next reboot, or try booting older kernels until you find one that works. Also, hopefully you didn't remove the last working kernel and/or zfs.ko. ZFS module upgrades are a gamble too (although none have failed... so far). Full disk encryption (including /boot) only adds to the complexity.

I set this up in January 2017; at the time it seemed like the coolest thing on Earth. In retrospect, it was foolish. I wish I had gone with ext4 for /boot and /, left /boot unencrypted, and used ZFS only for /home and maybe /var/lib.
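A small guard for the "run dkms status after each upgrade" advice: parse the output and don't reboot unless zfs is reported as installed for the kernel you are about to boot. This is a sketch under assumptions: the output layout differs between dkms versions, and the parser only understands the two shapes shown in its docstring. The function names are made up for illustration.

```python
from typing import List, Tuple


def parse_dkms_status(output: str) -> List[Tuple[str, str, str, str]]:
    """Turn `dkms status` text into (module, version, kernel, state) tuples.

    Assumed layouts (they vary between dkms versions):
      older: 'zfs, 0.7.5, 4.9.0-5-amd64, x86_64: installed'
      newer: 'zfs/0.7.5, 4.9.0-5-amd64, x86_64: installed'
    """
    rows = []
    for line in output.splitlines():
        if ": " not in line:
            continue  # blank or unrecognised line
        left, state = line.rsplit(": ", 1)
        fields = [f.strip() for f in left.split(",")]
        if "/" in fields[0]:  # newer layout: module/version in the first field
            module, version = fields[0].split("/", 1)
            rest = fields[1:]
        else:                 # older layout: module, version as separate fields
            module = fields[0]
            version = fields[1] if len(fields) > 1 else ""
            rest = fields[2:]
        kernel = rest[0] if rest else ""  # empty when no kernel field is present
        rows.append((module, version, kernel, state.strip()))
    return rows


def module_ready(output: str, module: str, kernel: str) -> bool:
    """True only if `module` is reported as installed for exactly `kernel`."""
    return any(m == module and k == kernel and s.startswith("installed")
               for m, _, k, s in parse_dkms_status(output))
```

Feed it the text printed by `dkms status` and the version string of the newly installed kernel (for example, the suffix of its /boot/vmlinuz-* file), and only reboot when it returns True.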


Thanks for the info. I don't recall if Stretch requires module rebuilds on kernel updates but that hasn't bothered me in the past. And I always reboot before removing old kernels just to make sure that I don't really need them.

I'm using ZFS on extra storage - not on /, /boot or /home. On one box I do have a ZFS volume mounted to my ~/Documents directory. I did try to install ZFS on the boot device and after thrashing a bit I decided to wait until the installer supports it. I have a file server and remote file server for backups and I'm preparing upgrades for both that put the served filesystem on ZFS and sends incremental snapshots to the remote to keep it synced. The existing system uses `rsync` to synchronize.
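For the rsync replacement, one incremental replication step usually boils down to a new snapshot plus zfs send -i piped into zfs receive on the remote. The sketch below only builds the argument lists and the equivalent shell pipeline; the dataset names, snapshot names and host are hypothetical, and it assumes the previous snapshot already exists on both sides.

```python
import shlex
from typing import List


def _join_quoted(args: List[str]) -> str:
    """Join arguments into one shell-safe command string."""
    return " ".join(shlex.quote(a) for a in args)


def incremental_backup_commands(dataset: str, prev_snap: str, new_snap: str,
                                remote_host: str, remote_dataset: str) -> List[List[str]]:
    """Argument lists for one incremental replication step (nothing is executed).

    Assumes dataset@prev_snap already exists locally and on the remote side.
    """
    snapshot = ["zfs", "snapshot", f"{dataset}@{new_snap}"]
    # -i sends only what changed between the two snapshots.
    send = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    # ssh hands one command string to the remote shell, so quote its words.
    receive = ["ssh", remote_host, _join_quoted(["zfs", "receive", remote_dataset])]
    return [snapshot, send, receive]


def as_shell_pipeline(send: List[str], receive: List[str]) -> str:
    """Render `send | receive` as a single shell command line."""
    return f"{_join_quoted(send)} | {_join_quoted(receive)}"


if __name__ == "__main__":
    snap, send, recv = incremental_backup_commands(
        "tank/files", "2018-02-13", "2018-02-14", "backup.example.com", "vault/files")
    print(_join_quoted(snap))
    print(as_shell_pipeline(send, recv))
```

Many setups also pass -F to zfs receive so the target is rolled back to its most recent snapshot before receiving; that is deliberately left out here.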


I have a ZFS on root Arch setup, unfortunately Gnome 3 is such garbage nowadays that I hardly ever use it. It still doesn’t do display scaling correctly for 4K and it just crashes all the time.


There is something seriously wrong with your setup.


AFAICT, the issue isn't mine alone: Gnome 3 likes to crash when the display is disconnected. Some quick googling shows many other users with the same issue. I have my monitors hooked up to a wifi plug to turn them both off.

I'm running linux-lts and the proprietary nvidia LTS driver (on a GTX 1070), along with all the LTS versions of SPL, zfs, etc. Never had a problem with system stability, just Gnome.

And the display scaling issue, I have 2x28" 4k monitors which means I need fractional scaling. 1x is too small, 2x is too big. Windows can do it, macOS does it, but Gnome does not. I thought this was supposed to be fixed by Canonical for 18.04 but that seems less and less likely. There's supposed to be a way to fiddle with xrandr but with two monitors I've never been able to get it to work.


Fedora here, never had any crash. However, my laptops are Intel GPU only which hotplug just fine, and the desktop, when it had an Nvidia card had no need to be hot plugged. Are you sure it is Gnome crashing, and not X server disappearing from under it?

I also have a 27" 4K monitor. Under X11, it is not Gnome's job to do framebuffer scaling; it is the display server's. Under Wayland it is, because then the Mutter compositor is the display server. So if you want macOS-style framebuffer scaling under X11, use xrandr (e.g. xrandr --output DP-2 --scale 1.15x1.15).

For the above reason, the scaling coming with Gnome 3.28 is for Wayland only - and that means no Nvidia, until they sort out their API issues.
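The arithmetic behind the xrandr trick, assuming the desktop itself is set to an integer 2x scale (since 1x is too small on these panels): xrandr --scale S makes the X screen S times larger than the panel, the desktop draws its 2x UI into that larger screen, and the server shrinks the result back onto the panel, so UI elements end up roughly 2/S times their 1x size. Note that xrandr selects the monitor with --output (--display names the X display). The function names are illustrative only.

```python
from typing import Tuple


def xrandr_scaled_layout(width: int, height: int, xrandr_scale: float,
                         desktop_scale: int = 2) -> Tuple[Tuple[int, int], float]:
    """Approximate X screen size and apparent UI scale for `--scale SxS` on one output.

    The X screen is S times the panel size; the desktop renders at `desktop_scale`
    into it and the server shrinks the image by S onto the panel.
    """
    screen = (round(width * xrandr_scale), round(height * xrandr_scale))
    apparent = desktop_scale / xrandr_scale  # UI size relative to an unscaled (1x) desktop
    return screen, apparent


def xrandr_scale_for(target: float, desktop_scale: int = 2) -> float:
    """The S to pass to xrandr so a `desktop_scale` desktop looks like `target`x."""
    return desktop_scale / target


if __name__ == "__main__":
    screen, apparent = xrandr_scaled_layout(3840, 2160, 1.15)
    print(screen, round(apparent, 3))       # prints (4416, 2484) 1.739
    print(round(xrandr_scale_for(1.5), 3))  # prints 1.333
```

So for a 1.5x look on a 2x desktop you would pass roughly --scale 1.333x1.333; the larger X screen also costs extra rendering work.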


The biggest issue is actually that both use a GUI installer that doesn't support ZFS.

Regarding rebuilding ZFS, isn't that done automatically when you upgrade your kernel?

On Funtoo I simply never upgrade my kernel automatically, EVER, and when I do, it's part of a script that rebuilds everything required. Little chance of a misfire.

ZFS as root is actually valuable: it means you can roll the entire system back if needed. I have no idea why you would want to give that up.


My debian install automatically builds the zfs module during each kernel upgrade.


FreeBSD is OK with CDDL but not GPL?


Yeah, the CDDL is weak copyleft, meaning derived works don't need to be licensed CDDL. The GPL is strong copyleft, and could force all of FreeBSD to adopt the GPL.


To clarify that a bit: The CDDL is a per-file copyleft; you must provide the sources of the CDDL-licensed source files. The GPL is a per-program copyleft; you must provide the sources of the entire program if it contains GPL-licensed code; the source of the entire program effectively becomes GPL-licensed if you incorporate GPL-licensed code.

If I made a closed derivative of the FreeBSD kernel, I'd have to provide the (CDDL) ZFS sources, but nothing else. If ZFS were GPL, I'd have to provide the sources to the entire kernel.


> If OpenZFS were relicensed and developed outside the mainline Linux tree (as it is now), FreeBSD would be able to get all changes, but it would still be annoying to actually use ZFS in Linux for the same reason it is annoying to use it now, even though the legal uncertainties would disappear

Couldn't it be developed outside the mainline Linux tree but still have improvements ported to the kernel? If I understand things correctly, that's basically how Xorg handles drivers.


If Oracle's sources would be GPL'ed all OpenZFS changes would also have to be put under the GPL (or dual licensed) if you want to use those. OpenZFS doesn't have a CLA I think so that might be somewhat of a problem. Then again most ZFS changes were made by people for their employers (Joyent, Delphix, etc.) so it might be that only a few companies (...and Allan Jude ;-) need to give their blessing.


I don't think the license change can be applied retroactively. I mean, Oracle owns the copyright to a major part of ZFS and can change the license on that. But for contributions made to OpenZFS, each developer owns the copyright to their own work.

I believe the original code could be treated as either CDDL or GPL, but code contributed before the license change would still be under the CDDL.

I think OpenZFS project owners would need to track down every contributor and get their permission to change the license. Perhaps it wouldn't be terribly hard with git/svn blame command. The only problem would be if the original authors were no longer available, but hopefully that code could be just rewritten by someone else.

Things would be much easier if every author would surrender their copyright to OpenZFS project, but some people might have problem with it since then you surrender all claims to the code.
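A rough first pass at the "track down every contributor with blame" idea: git blame --line-porcelain repeats an author-mail header for every surviving line, so counting those headers lists the people whose lines are still in a file. This is a sketch, not a legal audit: it misses authors whose code was later moved or rewritten (git blame's -M and -C options help with moved code), and the copyright holder may be an employer rather than the committer. The function names are hypothetical.

```python
import subprocess
import sys
from collections import Counter


def blame_authors(porcelain: str) -> Counter:
    """Count surviving lines per author e-mail in `git blame --line-porcelain` output.

    That format repeats an 'author-mail <...>' header before every source line, and
    the source line itself begins with a TAB, so file contents are never miscounted.
    """
    counts = Counter()
    for line in porcelain.splitlines():
        if line.startswith("author-mail "):
            counts[line[len("author-mail "):].strip("<>")] += 1
    return counts


def blame_file(path: str) -> Counter:
    """Run git blame on one tracked file (needs git on PATH, run inside the repo)."""
    result = subprocess.run(["git", "blame", "--line-porcelain", "--", path],
                            capture_output=True, text=True, check=True)
    return blame_authors(result.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    totals = Counter()
    for path in sys.argv[1:]:
        totals.update(blame_file(path))
    for email, lines in totals.most_common():
        print(f"{lines:8d}  {email}")
```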


> Things would be much easier if every author would surrender their copyright to OpenZFS project, but some people might have problem with it since then you surrender all claims to the code.

This is super-dangerous. There's a reason that copyright assignment agreements are rare. In a lot of jurisdictions, it's not even possible.

Most contributor license agreements instead require unlimited sublicensing grants, which provide de facto ownership of the copyrighted code rather than de jure ownership.


> There's a reason that copyright assignment agreements are rare. In a lot of jurisdictions, it's not even possible.

How is a software development business possible under those circumstances? In the US one of the core pieces of every employment contract (for software developers) is IP assignment.


I can only speak for Germany. Here, we have a big exception to the normal rule that you own the rights and cannot reassign them: everything you create during work hours under a normal employment contract is automatically the property of your employer. Even if you create stuff outside work that may be useful for the employer, the company has access and may get the rights (for a fair compensation according to the law).

Also, even though you cannot reassign ownership, you can grant exclusive rights. This is the closest analog to assignment of copyright.


Matches my knowledge, except for the following part:

> Even if you create stuff outside work that may be useful for the employer, the company has access and may get the rights (for a fair compensation according to the law).

Where does this come from?


I cannot tell you where exactly this stems from in law, but it is roughly the same mechanism as for inventions. If you create or invent something outside of your regular work that may be relevant to your employer, you have an obligation to inform them and offer them rights to it. There are several sources out there (in German) that describe this.


IP assignment itself is a bit of a hand-wavy term that hides a lot.

Those jurisdictions where you cannot assign authorship make a distinction between being an author and being able to distribute a copyrighted work. If you wrote something, you will forever be the author, and you cannot transfer the authorship. What you can transfer is the rights to distribution.


Solaris developers have an unhappy history with CLAs, you are unlikely to see them in that community ever again.


Really cool! However, isn't it a bit too late for DTrace to gain market share on Linux, now that SystemTap and the rest have gotten such a head start?

For me personally I like the idea of being able to use my DTrace skills (which I gained on FreeBSD and SmartOS) on Linux!


Well, it's never too late!

I mean, when reading articles like this one:

http://www.brendangregg.com/blog/2015-07-08/choosing-a-linux...

Linux tracing looks like a mess, with dozens of incompatible alternatives, each with its own gotchas. DTrace, on the other hand, is a mature, standard framework that also works on BSD and macOS. You get one programming language, D, and tons of resources out there.

So... my best wishes for this.


It was a mess back then, in 2015. It's probably more useful to look at where things are today rather than assume they are the same as they were a few years ago. Things move fast in this space; for example, just a year later Brendan wrote this on eBPF: http://brendangregg.com/blog/2016-10-27/dtrace-for-linux-201...


It would be very nice to be able to use the same Dtrace tracing skills and interface across a variety of modern Unix systems. I hope it isn't too late.


It looks like the DTrace port can be used by non-privileged users (I'm just guessing, because there are some uid checks in there). Just be aware that a bunch of fixes have been applied to the illumos version of DTrace, and I'm pretty sure at least one (maybe more) hasn't been applied to this version.


Time for ZFS now!

Even better yet, a new OpenSolaris under GPL instead of CDDL.


The new OpenSolaris is called illumos.

And the CDDL is actually fine. Give it a try :)


I have used OpenIndiana since its beginning; it's fine, but IMHO not there for serious work. FreeBSD with ZFS is miles ahead.

And the CDDL is not fine, as it is not compatible with the GPL, which makes it impossible to include it officially in the Linux kernel.


That day had better never come; if the last thing I dearly love, illumos (there is no OpenSolaris any more), ever gets relicensed under that fascist GPL, I'll never write another line of code again. That would be the last straw.


I can understand why someone does not like GPL, but calling a license which says you have to pass on the same rights you were given a 'fascist' license makes absolutely zero sense.


The issue with the GPL is that it requires passing on not only the source of any (modified) GPL component, but also the source of any other program part that happens to be linked together with the GPL code, even if it is in no way derived from it. This is pretty violent towards code not strictly related to the GPL code; whether the strong term used is appropriate is another question.


What does this mean for DTrace on FreeBSD?

Given that DTrace is now GPL'ed, does that kill DTrace on FreeBSD?

https://wiki.freebsd.org/DTrace


You can't retroactively revoke CDDL :)

And new DTrace userspace is actually under the Universal Permissive License now (which is, well, permissive). The Linux kernel module is GPL. I don't think FreeBSD needs anything from that module at all.


The UPL license also has some somewhat "unique" patent grant wording with regard to a "Larger Works" file, which can even reference external projects by name, and its supposed use as a CLA document.

Not being a lawyer (and similarly not a licensing or IP subject matter expert), and knowing the license was drafted by Oracle, I am somewhat suspicious of it.


Have you tried Sysdig? https://github.com/draios/sysdig


This is good news, I think.

I have heard so many people praise DTrace, but it not being available on Linux has so far kept me from looking at it. I have one FreeBSD box at home, one (ancient) laptop running OpenBSD, but all my other machines run various Linux distros. So there was little point for me to spend any time learning it.

Now there is. :D I think I know what I will be doing with my next vacation.


Thank God. eBPF is a security nightmare, as we saw with Spectre; they even have arrays now, and there is no infrastructure like there is with DTrace. I'll definitely compile it into my kernel for proper dev work.


Doesn't this criticism apply to any software that opens a communications channel between the kernel and user space? Also, how does eBPF relate to Spectre?


Yes, but syscalls are properly designed. Drivers are a huge problem, and always have been. eBPF, on the other hand, lacks proper security design; they added it afterwards, to some extent.

> how does eBPF relate to Spectre?

Please read any Spectre paper. Besides the known JavaScript attacks, eBPF is the easiest way to bypass kernel ASLR. Google is your friend.


Why is the dtrace language not vulnerable in the same way?


Because DTrace does not have the security holes eBPF has right now. DTrace only enables a little (e.g. only a hash, no arrays); eBPF, on the other hand, tries to disable the dangerous stuff, which is always a lost cause.



