I wrote Spall, one of the lightweight profilers mentioned in the post. I loved the author's blogpost on implicit in-order forests; it was neat to see someone else's take on trees for big traces, and it pushed me to go way bigger than I was originally planning!
Thankfully, Eytzinger-ordered 4-ary trees work totally fine at 165+ fps, even with 3+ billion functions, but I like to read back through that post once in a while just in case I hit that perf wall someday.
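For anyone unfamiliar with the layout, here's a minimal sketch of the index arithmetic for an implicit 4-ary tree stored in Eytzinger (breadth-first) order; the names are placeholders of mine, not Spall's actual code:

```python
# Implicit 4-ary tree in Eytzinger (breadth-first) order: the whole tree
# is one flat array and navigation is pure index arithmetic, so there
# are no child pointers to chase.

FANOUT = 4

def children(i: int) -> range:
    """Array indices of node i's children (root is index 0)."""
    return range(FANOUT * i + 1, FANOUT * i + FANOUT + 1)

def parent(i: int) -> int:
    """Array index of node i's parent."""
    return (i - 1) // FANOUT
```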
Working on timestamp delta-compression at the moment to pack events into much smaller spaces (roughly the sketch below), and hopefully getting to 10 billion events in 128 GB of RAM sometime soon, at least for native builds of Spall.

Thanks for the kick to keep on pushing!
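Here's that sketch: a hedged illustration assuming monotonically increasing timestamps and a LEB128-style varint encoding (my invention for this comment, not Spall's actual on-disk format):

```python
def encode_timestamps(timestamps: list[int]) -> bytes:
    """Store each timestamp as a varint-encoded delta from the previous
    one; dense traces then cost 1-2 bytes per event instead of 8."""
    out = bytearray()
    prev = 0
    for ts in timestamps:
        delta = ts - prev
        prev = ts
        while True:  # LEB128: 7 payload bits per byte, high bit = "more"
            byte = delta & 0x7F
            delta >>= 7
            out.append(byte | (0x80 if delta else 0))
            if not delta:
                break
    return bytes(out)
```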
In my opinion, the best way to interact with ETW is through DTrace. Microsoft's GUIs like WPA/Xperf are so buggy and unreliable that using them feels utterly futile. DTrace on Windows, on the other hand, is very usable.
A pretty good overview of open source solutions in the space.
Missing out on one of the most useful areas for tracing: time travel debugging. There are a number of interesting solutions there taking advantage of hardware trace, instrumentation, and deterministic replay. It's even better when you get full visualization integration: you can zoom in from a multi-minute trace onto a suspicious 200 ns function, double-click it, and backstep to that exact point in your program with a full reconstruction of memory at that time, so you can debug from that point.
Do you know of anyone who's built that kind of time travel debugging with trace visualization in the open, outside of JavaScript? I know about rr and Pernosco, but I don't know of trace-visualization integration for either of them; that would indeed be very cool. I definitely dream of having systems like this.
At undo.io we're interested in using our time travel capability beyond conventional time travel debugging - a recording file contains everything the program did, without any advance knowledge of what you need to sample, so there's a lot of potential to get other data out of it.
I just read your post and don't think it would take much to integrate with some of the visualisations you posted about, as a first step.
But that's not quite the kind of tracing you're talking about. We also built a printf-style interface to our recording files, which seems closer:
https://docs.undo.io/PostFailureLogging.html
Something like that but outputting trace events that can be consumed by Perfetto (say) would not be so hard to add. If we considered modifying the core record/replay engine then even more powerful things become possible.
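As a sense of how thin that glue could be: Perfetto's legacy importer accepts Chrome Trace Event JSON, so the output side might look something like this (the events and workflow here are invented for illustration; only the JSON field names come from the public trace-event format, not any Undo API):

```python
import json

# Two made-up "complete" events ("ph": "X"): name, start ts and dur in
# microseconds, plus pid/tid so the UI can assign them to lanes.
events = [
    {"name": "parse_request", "ph": "X", "ts": 1000, "dur": 250,
     "pid": 1234, "tid": 1},
    {"name": "handle_request", "ph": "X", "ts": 1300, "dur": 900,
     "pid": 1234, "tid": 1},
]

with open("trace.json", "w") as f:
    json.dump({"traceEvents": events}, f)

# trace.json can then be opened in https://ui.perfetto.dev
```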
I've seen undo.io several times at CppCon and have been thoroughly impressed with the demonstrations at the conference; I came to this thread specifically to recommend undo.io. I was particularly impressed this year by a demonstration of debugging stack smashing. I recently had to work around stack smashing in protobuf that happens before `main()` even starts, which seems like a perfect problem for undo.io to help debug :)
I'm still waiting on the keyserver to be able to run in Kubernetes though
There's no particularly good publicly visible documentation of the functionality, but it does do that, and it's a publicly purchasable product.
They also had TimeMachine + PathAnalyzer from the early 2000s, which was a time travel debugging solution with visualization, but they were only about as integrated as most of the solutions you see floating around today.
Not that I am aware of. They phase in and out of existence every so often because developing the technology is expensive and requires constant maintenance, but nobody wants to pay for tools, so they never catch on with enough resources to stay maintained.
As byefruit says above - we (undo.io) sell a Java Time Travel Debugger.
If anybody wants to try it, they should get in touch with us.
Our Java tech is based on an underlying record/replay engine that works at the level of machine instructions / syscalls to record the entire process. On top of that we've added the necessary cleverness to show what that means at Java level (so normal source-level debugging works).
That's different to e.g. Chronon, which I think was a pure Java solution:
https://blog.jetbrains.com/idea/2014/03/try-chronon-debugger...
It had some flexibility (e.g. only record certain classes) but at the cost of quite considerable slowdown and very large storage requirements.
Ah, I did not realize you all at Undo did a Java implementation as well. I knew about Chronon, which was probably the most well-known Java solution (as much as that means) during that spate of new time travel debuggers, but when I looked it up again for my comment it appeared to be defunct after years of being largely unmaintained.
The short answer is yes - but not as tightly as you'd think. We don't need a deep awareness of what the JVM is doing, e.g. its internal data structures are largely opaque to us.
When we need to reconstruct state we always have the option of time travelling the process and re-executing to drill down on the details, though that's only required when you're replaying a recording.
Hmm, so what do you do to answer questions like "what code corresponds to this address" or "what object is this allocation"? Run the recording, ask the JVM itself using its introspection interfaces in your replay by forking it?
At lower optimisation levels there's a register allocated by the JVM to refer back to the bytecode, which makes things easy. In principle they could change that with a JVM revision - but in practice they don't, so it's an easy cheat.
We have some ability to walk data structures and then re-compute the program's behaviour by other means, which I probably shouldn't get into here. I think we could fall back on that more-or-less completely if we couldn't retrieve the bytecode pointer directly.
The fact the JVM introduces Safe Points to help it transition between optimisation levels is quite helpful!
Our original intention was to always fork a copy of the JVM back in time to handle Java debug protocol requests but that turned out to be painful and, thankfully, also unnecessary.
Conceptually similar in that you can decide after-the-fact what state you want to see.
But Time Travel Debugging applies that to everything in the program, not just log statements: all function calls, variables, memory locations, etc. can be reconstructed after the fact without having to log them explicitly.
The author mentions dtrace in passing. If you're into "load bearing rants", check out bcantrill's recent rant on bpftrace silently losing events and why dtrace won't do that.
I haven't actually used bpftrace myself, only BCC. I can totally imagine it being more janky than DTrace; BCC is pretty janky even if I also think it's cool. In my eBPF tracing framework I had to add special counters to alert you if it ever lost any events, so it's plausible bpftrace didn't do that.
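For the record, roughly how those counters work: BCC's perf buffers accept a `lost_cb` callback that reports drops, so you can at least surface them. A minimal sketch (the probe itself is an arbitrary example, not my actual framework):

```python
from bcc import BPF

# Trivial probe: emit a timestamp on every clone() syscall.
b = BPF(text=r"""
#include <uapi/linux/ptrace.h>
BPF_PERF_OUTPUT(events);
int trace_enter(struct pt_regs *ctx) {
    u64 ts = bpf_ktime_get_ns();
    events.perf_submit(ctx, &ts, sizeof(ts));
    return 0;
}
""")
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="trace_enter")

lost_total = 0

def on_lost(count):
    # Called by BCC when the ring buffer overflowed and dropped events.
    global lost_total
    lost_total += count
    print(f"WARNING: dropped {count} events ({lost_total} total)")

def on_event(cpu, data, size):
    pass  # consume events here

b["events"].open_perf_buffer(on_event, lost_cb=on_lost)
while True:
    b.perf_buffer_poll()
```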
I think if you're working mostly with tracing/sampling specific applications you'll be more of a BCC person, while if you're hired to diagnose problems in a wide variety of applications then you might learn to like bpftrace more.
That's an absurd comment: eBPF and DTrace exist on orthogonal systems, and most using eBPF have never even used DTrace, let alone "moved on" from it. The systems are really quite different, and have different design centers; for the use case of instrumenting the system for purposes of understanding it, there are many regards in which eBPF remains behind DTrace -- one of which I elaborated on in the rant to which the parent is referring.[0]
That was true 15 years ago. eBPF and DTrace now exist on some of the same systems: Linux and Windows.
>and most using eBPF have never even used DTrace, let alone "moved on" from it
The performance and tracing groups at Microsoft certainly have. Same with Oracle, Netflix, and others.
>The systems are really quite different, and have different design centers; for the use case of instrumenting the system for purposes of understanding it
True, but unfortunately for DTrace, it is too late. Oracle should have done this years ago. Now Linux has a more powerful tracer built in, eBPF, and it would be a backwards step to switch the kernel code to DTrace. [0]
> I wanted to correlate packets with userspace events from a Python program, so I used a fun trick: Find a syscall which has an early-exit error path and bindings in most languages, and then trace calls to that which have specific arguments which produce an error.
Wow. This is some great engineering. Obviously that's what you'd do, but I'd never think of it in a thousand years!
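For anyone trying to picture the trick, a hedged sketch; the sentinel fd and payload scheme here are invented for illustration, not the author's exact code:

```python
import os

SENTINEL_FD = 0x7EADBEEF  # invented sentinel; no real fd will match it

def emit_marker(payload: int) -> None:
    """Fire a traceable userspace 'event' with no real side effects.

    lseek() on a bogus fd hits an early EBADF error path, but the
    sys_enter_lseek tracepoint still sees the arguments, so a kernel
    probe filtering on fd == SENTINEL_FD can read the offset as data.
    """
    try:
        os.lseek(SENTINEL_FD, payload, os.SEEK_SET)
    except OSError:
        pass  # EBADF is expected; the tracepoint already fired
```

A kernel-side probe on the lseek entry tracepoint that filters on the sentinel fd (e.g. via bpftrace or BCC) can then pull the payload out of the offset argument and correlate it with the packet trace.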
My team and I have been working on an IDE plugin that adds the powers of a traditional debugger to your apps running in production, without the overhead and redeployments associated with a traditional debugger.
People use it to analyze arbitrary variables during runtime to understand what is happening in their code. We charge $0 for it.
What a great way to recruit! The ending pitch to join Tristan at Anthropic is very alluring, if only I were competent enough in this area! Tristan does a great job covering the types of things one would be working on.
p.s. I think the blog post could use more screengrabs of the traces. Great first pass at it though, and screengrabs can be added over time!
I wish the industry had a better answer for deterministically profiling the execution cost of JavaScript. Attempts were made in Chromium by hooking into Linux perf, but that change has since been removed.
If anyone has any tips on how to trace JavaScript (not just profile by time, but deterministically measure the cost of it in CI), I'd love to hear tips!