I’ve never used eBPF. Does anyone have some good resources for learning it?

tanelpoder · 2024-07-03T21:47:57

Brendan Gregg's site (and book) is probably the best starting point (he was involved in DTrace work & rollout 20 years ago when at Sun), and he was/is instrumental in pushing eBPF in Linux even further than DTrace ever went:

https://brendangregg.com/ebpf.html

bcantrill · 2024-07-04T00:20:35

Just a quick clarification: while Brendan was certainly an active DTrace user and evangelist, he wasn't involved in the development of DTrace itself -- or its rollout. (Brendan came to Sun in 2006; DTrace was released in 2003.) As for eBPF with respect to DTrace, I would say that they are different systems with different goals and approaches rather than one eclipsing the other. (There are certainly many things that DTrace can do that eBPF/BCC cannot, some of the details of which we elaborated on in our 20th anniversary of DTrace's initial integration.[0])

Edit: We actually went into much more specific detail on eBPF/BCC in contrast to DTrace a few weeks after the 20th anniversary podcast.[1]

[0] https://www.youtube.com/watch?v=IeUFzBBRilM

[1] https://www.youtube.com/watch?v=mqvVmYhclAg#t=12m7s

tanelpoder · 2024-07-04T00:35:25

Thanks, yes I was more or less aware of that (I'd been using DTrace since Solaris 10 beta in 2004 or 2003?)... By rollout I really meant "getting the word out there"... that's half the battle in my experience (that's why this post here! :-)

What I loved about DTrace was that once it was out, even in beta, it was pretty complete and worked. All the DTrace ports that I've tried, including on Windows (!) a few years ago, were very limited or had some showstopper issues. I guess eBPF was like that too some years ago, but by now it's pretty sweet even for more regular consumers who don't keep track of its development.

Edit: Oh, I wasn't aware of the timeline; I may have some dates (years) wrong in my memory.

abrookewood · 2024-07-04T00:37:00

Yes, not involved in DTrace itself, but he did write a bunch of DTrace Tools which led to an interesting meeting with a Sun exec: https://www.brendangregg.com/blog/2021-06-04/an-unbelievable...

anonfordays · 2024-07-04T05:31:06

>As for eBPF with respect to DTrace, I would say that they are different systems with different goals and approaches

For sure. Different systems, different times.

>rather than one eclipsing the other.

It does seem that DTrace has been eclipsed though, at least in Linux (which runs the vast majority of the world's compute). Is there a reason to use DTrace over eBPF for tracing and observability in Linux?

>There are certainly many things that DTrace can do that eBPF/BCC cannot

This may be true, but that gap is closing. There are certainly many things that eBPF can do that DTrace cannot, like Cilium.

tanelpoder · 2024-07-04T05:37:52

Perhaps familiarity with the syntax of DTrace, if coming from a Solaris-heavy enterprise background. But then again, too many years have passed since Solaris was a major mainstream platform. Oracle ships and supports DTrace on (Oracle) Linux, by the way, but DTrace 2.0 on Linux is a scripting frontend that gets compiled to eBPF under the hood.

Back when I tried to build xcapture with DTrace, I could launch the script and use something like /pid$oracle::func:entry/ but IIRC the probe was attached only to the processes that already existed and not any new ones that were started after loading the DTrace probes. Maybe I should have used some lower level APIs or something - but eBPF on Linux automatically handles both existing and new processes.

bch · 2024-07-04T16:16:57

> eBPF on Linux automatically handles both existing and new processes

Without knowing your particular case, DTrace does too; it’d certainly be tricky to use if you’re trying to debug software that “instantly crashes on startup” if it couldn’t do that. “execname” (not “pid”) is where I’d look, or perhaps that part of the predicate is skippable; regardless, it should be possible.

tanelpoder · 2024-07-04T19:57:21

For example I used something like "pid:module:funcname:entry" probe for userspace things (not pid$123 or pid$target, just pid to catch all PIDs using the module/funcname of interest). And back when I tested, it didn't automatically catch any new PIDs so these probes were not fired for them unless I restarted my DTrace script (but it was probably year <2010 when I last tested it).

Execname is a variable in DTrace and not a probe (?), so how would it help with automatically attaching to new PIDs? Now that I recall more details, there was no issue with statically defined kernel "fbt" probes or "profile"; the userspace pid one was where I hit this limitation.

bch · 2024-07-04T21:03:31

> Execname is a variable in DTrace and not a probe (?), so how would it help with automatically attaching to new PIDs?

You're correct, and I may have provided "a solution" to a misunderstanding of your problem - I don't think the "not matching new procs/pids" is inherent in DTrace, so indeed you might have run into an implementation issue (as it was 15 years ago). I misunderstood you as perhaps using a predicate matching a specific pid; my fault.

mgaunard · 2024-07-03T21:54:04

It lets you hook into various points in the kernel; ultimately you need to learn how the Linux kernel is structured to make the most of it.

Unlike a module, it can only really read data, not modify data structures, so it's nice for things like tracing kernel events.

The XDP subsystem is particularly designed for you to apply filters to network data before it makes it to the network stack, but it still doesn't give you the same level of control or performance as DPDK, since you still need the data to go to the kernel.

tanelpoder · 2024-07-03T22:00:41

Yep (the 0x.tools author here). If you look into my code, you'll see that I'm not a good developer :-) But I have a decent understanding of Linux kernel flow and kernel/app interaction dynamics, thanks to many years of troubleshooting large (Oracle) database workloads. So I knew exactly what I wanted to measure and how, just had to learn the eBPF parts. That's why I picked BCC instead of libbpf as I was somewhat familiar with it already, but fully dynamic and "self-updating" libbpf loading approach is the goal for v3 (help appreciated!)

tptacek · 2024-07-04T00:11:05

I was going to ask "why BCC" (BCC is super clunky) but you're way ahead of us. This is great work, thanks for posting it.

tanelpoder · 2024-07-04T00:28:14

Yeah, I already see limitations. The latest one was yesterday, when I installed earlier Ubuntu versions to see how far back this can go: even Ubuntu 22.04 didn't work out of the box, and I ended up with a BCC/kernel header mismatch issue [1], although the kernel itself supported it. A workaround was to download & compile the latest BCC yourself, but I don't want to go there, as the customers/systems I work on wouldn't go there anyway.

But libbpf with CO-RE will solve these issues as I understand, so as long as the kernel supports what you need, the CO-RE binary will work.

This raises another issue for me, though: it's not easy, but it is easier, for enterprises to download and run a single Python file plus a single C source file (with <500 lines of code to review) than a compiled CO-RE binary. But my long-term plan/hope is that I (we) get the Red Hats and AWSes of this world to just provide the eventual mature release as a standard package.

[1] https://github.com/iovisor/bcc/issues/3993

mgaunard · 2024-07-03T22:13:43

Myself I've only built simple things, like tracing sched switch events for certain threads, and killing the process if they happen (specifically designed as a safety for pinned threads).
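The check described above runs in the kernel as an eBPF program on sched_switch. As a much coarser user-space approximation (a swapped-in technique, not the eBPF approach described), one could poll a thread's `voluntary_ctxt_switches` and `nonvoluntary_ctxt_switches` counters in `/proc/PID/task/TID/status` and flag any increase. The function names are hypothetical, and polling only notices a switch after the fact, so treat this as a monitoring sketch under those assumptions rather than a safety mechanism.

```python
import time


def parse_ctxt_switches(status_text: str) -> int:
    """Sum voluntary + nonvoluntary context switches from /proc/.../status text."""
    total = 0
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches"):
            total += int(value)  # int() tolerates the surrounding tab/newline
    return total


def read_ctxt_switches(pid: int, tid: int) -> int:
    """Read the combined switch count for one thread (Linux only)."""
    with open(f"/proc/{pid}/task/{tid}/status") as f:
        return parse_ctxt_switches(f.read())


def watch_thread(pid: int, tid: int, interval: float = 0.1, rounds: int = 10) -> bool:
    """Poll one (hypothetically pinned) thread; True if it was switched out."""
    baseline = read_ctxt_switches(pid, tid)
    for _ in range(rounds):
        time.sleep(interval)
        if read_ctxt_switches(pid, tid) > baseline:
            return True
    return False
```

Where the original check kills the process as soon as a switch happens, this sketch only reports it, and only at the polling granularity chosen by `interval`.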

tanelpoder · 2024-07-03T22:22:42

Same here, until now. I built the earlier xcapture v1 (also in the repo) about 5 years ago. It just samples various /proc/PID/task/TID pseudofiles regularly, and it also allows you to get pretty far with the thread-level activity measurement approach, especially when combined with always-on, low-frequency on-CPU sampling with perf.
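A minimal sketch of the /proc sampling idea, assuming Linux and Python 3.9+. It is not xcapture's actual code: it simply reads every `/proc/PID/task/TID/stat` once per sample, extracts the thread name and state letter, and counts threads that are running (R) or in uninterruptible sleep (D). The function names are illustrative.

```python
import os
import time
from collections import Counter


def parse_task_stat(stat_line: str) -> tuple[str, str]:
    """Return (comm, state) from one /proc/PID/task/TID/stat line.

    comm sits inside parentheses and may itself contain spaces or ')',
    so we cut at the *last* ')' instead of splitting on whitespace.
    """
    start = stat_line.index("(")
    end = stat_line.rindex(")")
    comm = stat_line[start + 1:end]
    state = stat_line[end + 1:].split()[0]
    return comm, state


def sample_thread_states(states: tuple[str, ...] = ("R", "D")) -> Counter:
    """Take one sample: count threads per (comm, state) for the given states."""
    counts: Counter = Counter()
    for pid in filter(str.isdigit, os.listdir("/proc")):
        task_dir = f"/proc/{pid}/task"
        try:
            tids = os.listdir(task_dir)
        except OSError:  # process exited (or is inaccessible) since listing /proc
            continue
        for tid in tids:
            try:
                with open(f"{task_dir}/{tid}/stat") as f:
                    comm, state = parse_task_stat(f.read())
            except (OSError, ValueError, IndexError):  # thread exited mid-read
                continue
            if state in states:
                counts[(comm, state)] += 1
    return counts


def sample_loop(samples: int = 5, interval: float = 1.0) -> Counter:
    """Accumulate several samples taken `interval` seconds apart."""
    total: Counter = Counter()
    for _ in range(samples):
        total.update(sample_thread_states())
        time.sleep(interval)
    return total
```

Calling `sample_loop(samples=5, interval=1.0).most_common(10)` gives a rough "top threads by activity" view; a thread seen in R or D in many samples is spending correspondingly more wall-clock time in that state.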

tptacek · 2024-07-03T23:45:06

XDP, in its intended configuration, passes pointers to packets still on the driver DMA rings (or whatever) directly to BPF code, which can modify packets and forward them to other devices, bypassing the kernel stack completely. You can XDP_PASS a packet if you'd like it to hit the kernel, creating an skbuff, and bouncing it through all the kernel's network stack code, but the idea is that you don't want to do that; if you do, just use TC BPF, which is equivalently powerful and more flexible.

mgaunard · 2024-07-04T11:34:12

Yes for XDP there is a dedicated API, but for any of the other hooks like tracepoints, it's all designed to give you read-only access.

The whole CO-RE thing is about having a kernel-version-agnostic way of reading fields from kernel data structures.
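To illustrate why version-agnostic field access matters, here is a user-space toy analogy using Python's `ctypes` (not libbpf, and not how CO-RE is implemented). A reader with a field offset baked in at "compile" time reads the wrong bytes once a newer struct layout inserts a field, while a reader that looks the offset up by name in the running layout still gets the right value. The struct names and fields are hypothetical.

```python
import ctypes


# Two hypothetical layouts of the "same" kernel struct in different kernel builds.
class TaskOld(ctypes.Structure):
    _fields_ = [("state", ctypes.c_long), ("pid", ctypes.c_int)]


class TaskNew(ctypes.Structure):
    # A field inserted before `pid` shifts pid's offset in the newer build.
    _fields_ = [("state", ctypes.c_long), ("flags", ctypes.c_uint), ("pid", ctypes.c_int)]


# "Compiled" against the old headers: the offset is frozen as a constant.
PID_OFFSET_BAKED_IN = TaskOld.pid.offset


def read_pid_baked(raw: bytes) -> int:
    """Read pid at the offset that was correct only for the old layout."""
    return ctypes.c_int.from_buffer_copy(raw, PID_OFFSET_BAKED_IN).value


def read_int_field_relocated(raw: bytes, running_layout: type, field: str) -> int:
    """Look up the offset by field name in the *running* layout, then read it."""
    offset = getattr(running_layout, field).offset
    return ctypes.c_int.from_buffer_copy(raw, offset).value


# Memory as laid out by the newer "kernel".
raw = bytes(TaskNew(state=0, flags=7, pid=4242))
```

Here `read_pid_baked(raw)` returns the `flags` value instead of the pid, while `read_int_field_relocated(raw, TaskNew, "pid")` returns 4242. In real CO-RE, clang records each such field access as a relocation in the BPF object, and libbpf resolves it at load time against the running kernel's BTF type information (typically exposed at /sys/kernel/btf/vmlinux).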

tptacek · 2024-07-04T16:59:51

Right, I'm just pushing back on the DPDK thing.

mgaunard · 2024-07-05T11:17:54

DPDK polls the hardware directly from userland.

XDP reads the data in the normal NAPI kernel way, integrating with the IRQ system etc., which might or might not be desirable depending on your use case.

Then if you want to forward it to userland, you still need to write the data to a ring buffer, with your userland process polling it, at which point it's more akin to using io_uring.
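The producer/consumer pattern described above can be modeled with a toy fixed-capacity ring: the producer side drops records when the consumer falls behind, and the consumer periodically drains whatever has been published. This is a single-threaded Python model for illustration only, not the kernel's BPF ring buffer, which lives in memory shared between kernel and user space and is typically written with `bpf_ringbuf_reserve()`/`bpf_ringbuf_submit()` and consumed with libbpf's `ring_buffer__poll()`. The class and method names are hypothetical.

```python
from collections.abc import Iterator


class RingBuffer:
    """Fixed-capacity single-producer/single-consumer ring (toy, not thread-safe)."""

    def __init__(self, capacity: int) -> None:
        self.slots: list[object] = [None] * capacity
        self.capacity = capacity
        self.head = 0  # total records consumed so far (monotonic)
        self.tail = 0  # total records published so far (monotonic)
        self.dropped = 0

    def submit(self, record: object) -> bool:
        """Producer side: publish a record, or count a drop if the ring is full."""
        if self.tail - self.head == self.capacity:
            self.dropped += 1
            return False
        self.slots[self.tail % self.capacity] = record
        self.tail += 1
        return True

    def poll(self) -> Iterator[object]:
        """Consumer side: drain everything published so far."""
        while self.head < self.tail:
            record = self.slots[self.head % self.capacity]
            self.head += 1
            yield record
```

The drop counter is the important design point: if the userland poller is slower than the event rate, records are lost rather than blocking the producer, which is why a polling consumer must keep up with the workload it observes.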

It's mostly useful if you can write your entire logic in your eBPF program without going through userland, so it's nice for various tracing applications, filters or security checks, but that's about it as far as I can tell.

lathiat · 2024-07-04T01:14:58

I'll toot my own horn here. But there are plenty of presentations about it, Brendan Gregg's are usually pretty great.

"bpftrace recipes: 5 real problems solved" - Trent Lloyd (Everything Open 2023) https://www.youtube.com/watch?v=ZDTfcrp9pJI

jiripospisil · 2024-07-03T21:58:00

There's a bunch of examples over at https://github.com/iovisor/bcc

rascul · 2024-07-03T22:23:32

You might find some interesting stuff here

https://ebpf.io/