No mention of overhead, and finishes with an strace -c and "This option is very ...

wyldfire · on Dec 1, 2016

Brendan, I appreciate all your contributions to the software tool landscape.

> It's another to write a post that's dangerous.

Huh? It's called "introduction to strace". I agree with everything you said, it's easy to bork things up with strace. But I'd hardly call this article "dangerous."

90% of the time I'm running some script or opaque binary which gives some bogus unclear error or just stops executing (or appears to halt). strace is the perfect tool for finding out which syscall is failing and why.

But, yes, your warnings are good reminders that attaching to a service that's already running can have bad consequences.

brendangregg · on Dec 1, 2016

It's a tool with an unexpected side effect that it can hurt or kill production. And this post (and others like it) encourages people to use it without a single mention of "overhead" or dangers.

One may not need to belabor the dangers of tools like shutdown, ifconfig, fdisk, etc., since one should _expect_ dangers from administration tools. The worst tools are where the dangers are unexpected. People are going to learn it the hard way if you don't provide a warning.

whatupmd · on Dec 1, 2016

It looks like this was written Jan. 2014. Your post is a few months later.

The content on your site is great and personally I really appreciate it. Thanks for adding some context here.

g0xA52A2A · on Dec 1, 2016

> Because of the overheads, I wouldn't trust the high resolution timestamps also included in the blog post.

Naive question: why would the timestamps be inaccurate? Surely each step of the program's execution should just be slower? Am I missing something about how the timestamp of a syscall would be off?

Hello71 · on Dec 1, 2016

They're inaccurate in absolute terms for the reason you mentioned, but they're also inaccurate relative to each other, because not all of the program's execution is syscalls. Regular program code runs at the same speed (more or less), but syscalls are made much slower, so they appear to take more of the time than they really do.
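A rough way to see both effects is a toy additive model: user-space time is left alone, while every syscall pays a fixed extra tracing cost. This is only a sketch. The function name `timing_profile` and all the numbers (900 ms of user code, 10,000 syscalls at 10 us each, 100 us of tracing overhead per syscall) are made up for illustration and are not measurements of strace.

```python
def timing_profile(user_ms, n_syscalls, syscall_us, trace_overhead_us=0.0):
    """Return (total_ms, fraction of time spent in syscalls) for a simple additive model."""
    # Only the syscall portion is inflated by tracing; user code is unchanged.
    syscall_ms = n_syscalls * (syscall_us + trace_overhead_us) / 1000.0
    total_ms = user_ms + syscall_ms
    return total_ms, syscall_ms / total_ms


# Hypothetical workload: 900 ms of user code, 10,000 syscalls at 10 us each.
untraced = timing_profile(900.0, 10_000, 10.0)

# Same workload with an assumed (not measured) 100 us tracing cost per syscall.
traced = timing_profile(900.0, 10_000, 10.0, trace_overhead_us=100.0)

print(f"untraced: total={untraced[0]:.0f} ms, syscall share={untraced[1]:.0%}")
print(f"traced:   total={traced[0]:.0f} ms, syscall share={traced[1]:.0%}")
```

Under these assumed numbers the run goes from 1000 ms to 2000 ms (the absolute error), and the syscall share of the runtime goes from 10% to 55% (the relative error), even though the program's own code did not get any slower.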

bechampion · on Dec 1, 2016

I think an explanation would be that ptracing a process and tapping its syscalls will always add some latency. For example, time(process(open())) < time(ptrace(process(open()))), if that makes sense? It's similar to SELinux: there's a cost to tapping syscalls.

helper · on Dec 1, 2016

It's too bad no one has written a tool that keeps the familiar strace CLI flags and output format but uses perf/eBPF under the hood. If such a tool existed, I would be much more likely to reach for it than to fall back to strace.