Hacker News
Linux Uprobe: User-Level Dynamic Tracing (brendangregg.com)
93 points by adamnemecek on July 4, 2015 | 15 comments



Can someone explain to me how this works and what the advantages are over a regular user space debugger in a bit more detail?

I read the article, but I don't understand what the advantages are. So I googled and found this on uprobes [0], which mentions:

> Uprobes thus implements a mechanism by which a kernel function can be invoked whenever a process executes a specific instruction location.

This I don't understand. I was under the assumption that there are two ways to break at a specific location: software breakpoints, which replace the first byte of the instruction with 0xCC, and hardware breakpoints, which I've seen in Olly but have no actual idea how they work.

I just don't understand what role the kernel is playing here exactly. Obviously my knowledge in that area is also fairly limited.

[0] https://lwn.net/Articles/499190/


Depends what user space debugger you mean...

Some specific advantages of uprobes:

- Kernel tracing with user-level context. uprobes works with the other kernel tracing frameworks, so front-ends like ftrace, perf_events, and SystemTap can trace both user and kernel, and combine the results (especially programmatic tracers like SystemTap). Eg, let's say we wanted disk I/O latency by database query, or scheduler run-queue latency by application request.

- Full user-level visibility. You may have a language runtime tracer that shows what the language is doing (and does so better than uprobes can), but not system libraries. Eg, Java burning time in a compression library, or even its own libjvm for GC, which may not be visible (in the same manner) through method tracing.

- It's there by default. At my company people can run anything. So if there's a performance issue on an application I've never seen before, I have one way to dig in, even if there are no other options.

- Some debuggers are not made for real-time production use, as they halt the target. With uprobes, we can pose a question of the running software (latency of X, arguments of Y), and answer it quickly, with relatively low overhead.

- It can trace system wide. (I'm not sure many other debuggers can.) Eg, you could trace libnsl calls, across all processes.
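
To make "pose a question of the running software" a bit more concrete: with ftrace, a uprobe is defined by writing a one-line description to the tracing directory's uprobe_events file, in the form `p:EVENT PATH:OFFSET` for an entry probe or `r:EVENT PATH:OFFSET` for a return probe. The Python sketch below only builds that line; it does not touch the kernel. The helper name, the event names, and the offset are hypothetical placeholders (a real offset has to be looked up in the target binary's symbol table).

```python
def uprobe_definition(event, binary_path, offset, is_return=False, group=None):
    """Build one line in the format accepted by ftrace's uprobe_events file.

    event, binary_path and offset are illustrative inputs; this helper is a
    sketch and not part of any library.
    """
    kind = "r" if is_return else "p"          # r = return probe, p = entry probe
    name = f"{group}/{event}" if group else event
    return f"{kind}:{name} {binary_path}:{offset:#x}"


# Placeholder offset; the real value depends on the installed binary.
entry_line = uprobe_definition("readline_entry", "/bin/bash", 0x8db60)
return_line = uprobe_definition("readline_ret", "/bin/bash", 0x8db60,
                                is_return=True, group="uprobes")

print(entry_line)    # p:readline_entry /bin/bash:0x8db60
print(return_line)   # r:uprobes/readline_ret /bin/bash:0x8db60

# As root, such a line would typically be appended to
# /sys/kernel/debug/tracing/uprobe_events and the event then enabled.
# The probe is attached to the file, not to one process, which is what
# makes the tracing system wide.
```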

Some advantages of other user space debuggers:

- User to user tracing is more efficient than calling the kernel. LTTng has a user space implementation which beats the performance of user->kernel tracers by some factor. Some runtimes, like Java, have plenty of user-level tracing add-ons that are also much more efficient. (It's possible, like with LTTng, that uprobes could be implemented to do user->user tracing, and combine results afterwards. I don't know the status of this.)

- Custom user-level tracers (eg, with Java) can be better developed to handle the target language and context. Tracing Java methods with uprobes is extremely difficult (I have an idea of how to do it), but trivial with Java tracers designed to do that. (I should add: tracing Java native calls, like the workings of GC in libjvm, is well suited for uprobes.)


funny i was about to comment on this and then i realized the guru has come to comment on his post :P

i think the parent might find this article informative

http://www.brendangregg.com/blog/2014-05-11/strace-wow-much-...

brendan, what happened to ktap, have you stopped using it? i know it almost made it into mainline, but apparently as long as it's using a different architecture than perf it's a no go. but it also seemed to be a little more performant because of that


ktap was really promising, but when eBPF began kernel integration, ktap development was postponed until it could be rewritten to use eBPF. eBPF itself is still being integrated, part by part. I hope there's enough eBPF to restart work on ktap (or something like it) this year. ktap might have just missed out by unlucky timing. But hopefully we'll end up with a better tracer (ktap+eBPF) after the delay. I still want to see ktap finished.

I've spent so many hours with Linux tracing (and talking to its developers), I think my next post will be "Choosing a Linux tracer (Jul 2015)", where I briefly summarize the current state of tracers, and make recommendations. I think it might work as a blog post, with a clear timestamp, since it's a topic that changes from month to month.


I just wish they integrated dtrace into linux back in the day. It would have received a lot of support and improvement.

Now every platform has its own complicated set of tools. Even dtrace on osx is not the same as dtrace on freebsd :(


Linux integration would have been tricky, since Sun chose a license that they knew was incompatible with the GPL from the get go (http://www.slideshare.net/brendangregg/from-dtrace-to-linux/...). But yes, would have been nice! :)

Having such fragmentation in the Linux tracing space makes my job trickier, but for a lot of end-users it won't ultimately matter, given front-end analysis tools. In my current job, the team I'm on is building such a front-end analysis tool, and its users won't care much what the underlying tracer is, provided it meets their requirements.


Agreed. Abstracting the tracers to provide similar information is a challenge.

I had a plan once to create a SaaS around dtrace-like tools, but the space is just too fragmented and I couldn't find common ground. Not to mention that most users don't have the proper debugging symbols around, nor the required kernels. So I bailed. :)


Brendan answered with advantages, but for how: it's still just a software breakpoint, int3/0xCC on x86 as you say. But the round trip of dealing with that breakpoint is much tighter, because the handler function is called directly in the kernel trap, without even a context switch. Uprobes has about the minimal overhead that a software breakpoint can possibly have, whereas involving a userspace debugger requires a bunch of syscalls and context switches every time.
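
For anyone unsure what "a software breakpoint" means mechanically, here is a toy Python sketch of the patch-and-restore bookkeeping: save the original byte, overwrite it with 0xCC, and put it back when the probe is removed. It works on a plain bytearray standing in for a process's code, so nothing actually traps; the class name and the sample bytes are made up for illustration, and this is not how the kernel's uprobes code is structured.

```python
INT3 = 0xCC  # x86 one-byte breakpoint instruction


class BreakpointTable:
    """Toy model of software breakpoints over a buffer of machine code."""

    def __init__(self, code: bytearray):
        self.code = code
        self.saved = {}  # address -> original byte

    def insert(self, addr: int) -> None:
        if addr in self.saved:
            return  # already patched; keep the true original byte
        self.saved[addr] = self.code[addr]
        self.code[addr] = INT3

    def remove(self, addr: int) -> None:
        # Restore the original byte so the instruction runs normally again.
        self.code[addr] = self.saved.pop(addr)


# push rbp; mov rbp, rsp; ret  (a tiny x86-64 function body)
code = bytearray(b"\x55\x48\x89\xe5\xc3")
table = BreakpointTable(code)

table.insert(0)
patched = bytes(code)    # first byte is now 0xCC
table.remove(0)
restored = bytes(code)   # back to the original bytes

print(patched.hex())     # cc4889e5c3
print(restored.hex())    # 554889e5c3
```

The patching itself is the same whether a debugger or uprobes does it. The difference described above is in who handles the trap once the 0xCC is hit.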


I'm not sure how this works, but since there's no mention of it being limited to 4, it's probably some sort of software mechanism. OllyDbg hardware breakpoints use debug facilities of the hardware that have been there since the 386:

https://en.wikipedia.org/wiki/X86_debug_register

The limitation of hardware breakpoints is that there are only 4 available, but they cause less disturbance than replacing instructions with int3s.
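
As a rough illustration of why the limit is 4: the breakpoint addresses live in registers DR0 to DR3, and DR7 holds an enable bit plus a condition and length field for each of those four slots. The Python sketch below computes the DR7 bits for one slot using the layout from the Wikipedia page linked above. The function and constant names are mine, and nothing here touches real registers (that requires kernel privileges or a ptrace-style interface).

```python
# Condition (R/W) field values
RW_EXECUTE = 0b00     # break on instruction execution
RW_WRITE = 0b01       # break on data write
RW_READ_WRITE = 0b11  # break on data read or write

# Length (LEN) field values
LEN_1 = 0b00
LEN_2 = 0b01
LEN_4 = 0b11


def dr7_bits(slot: int, rw: int = RW_EXECUTE, length: int = LEN_1) -> int:
    """Return the DR7 bits that locally enable breakpoint `slot` (0..3)."""
    if not 0 <= slot < 4:
        raise ValueError("x86 has only four breakpoint address registers")
    if rw == RW_EXECUTE and length != LEN_1:
        raise ValueError("execution breakpoints must use the 1-byte length")
    local_enable = 1 << (slot * 2)           # L0..L3 at bits 0, 2, 4, 6
    rw_field = rw << (16 + slot * 4)         # R/W0..R/W3
    len_field = length << (18 + slot * 4)    # LEN0..LEN3
    return local_enable | rw_field | len_field


exec_bp = dr7_bits(0)                      # execution breakpoint in slot 0
watch_bp = dr7_bits(1, RW_WRITE, LEN_4)    # 4-byte write watchpoint in slot 1

print(hex(exec_bp))    # 0x1
print(hex(watch_bp))   # 0xd00004
```

Because the CPU compares addresses itself, the code being debugged is never modified, which is the "less disturbance" part.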



huh, i'm surprised the repost checker didn't catch this


I read it as up-robe at first and thought "that's a clever name - it's like peeking up a robe" - and only realised that it was meant to be "u-probe" when it mentioned kprobe.


Brendan, are you working out of the Los Gatos office?


Yes


Nice. Big fan of your work; I got interested in performance analysis at EarthLink, a Solaris shop at the time.

The culture at Netflix is appealing; if I wasn't building my own company, I'd apply.

Thanks for everything you do and keep up the great work!



