
No mention of overhead, and finishes with an strace -c and "This option is very useful when trying to find out why a program is running slow." Well, it will run slow if you run "strace -c", because you're running "strace -c".

Because of the overheads, I wouldn't trust the high resolution timestamps also included in the blog post.

Also not mentioned: "strace -e" to select a single syscall doesn't reduce overhead -- you pay the tax for tracing all syscalls anyway.

Because of strace's behavior, there have also been situations in past kernels where it could hang processes and require a kill -9. Your application is now paused while you're madly typing at the command line -- if I did that on one of my instances at work (although I believe that bug was fixed a long time ago), request timeouts might have the instance fail over before I could finish hitting enter. That's not so bad, but in your environment it could be much worse. You may get such timeouts just from the overheads of strace.

It's one thing to write a post that has errors. It's another to write a post that's dangerous.

This is why I wrote http://www.brendangregg.com/blog/2014-05-11/strace-wow-much-...




Brendan, I appreciate all your contributions to the software tool landscape.

> It's another to write a post that's dangerous.

Huh? It's called "introduction to strace". I agree with everything you said, it's easy to bork things up with strace. But I'd hardly call this article "dangerous."

90% of the time I'm running some script or opaque binary which gives some bogus unclear error or just stops executing (or appears to halt). strace is the perfect tool for finding out which syscall is failing and why.

But, yes, your warnings are good reminders that attaching to a service that's already running can have bad consequences.


It's a tool with an unexpected side effect: it can hurt or kill production. And this post (and others like it) encourages people to use it without a single mention of "overhead" or dangers.

One may not need to belabor the dangers of tools like shutdown, ifconfig, fdisk, etc, since as administration tools one should _expect_ dangers. The worst tools are where the dangers are unexpected. People are going to learn it the hard way if you don't provide a warning.


It looks like this was written Jan. 2014. Your post is a few months later.

The content on your site is great and personally I really appreciate it. Thanks for adding some context here.


> Because of the overheads, I wouldn't trust the high resolution timestamps also included in the blog post.

Naive question: why would the timestamps be inaccurate? Surely each step of the program's execution should just be slower? Am I missing something about how the timestamp of a syscall would be off?


They're absolutely inaccurate for the reason you mentioned, but they're also relatively inaccurate because not all of the program's execution is syscalls. Regular program code runs at the same speed (more or less), but syscalls are made much slower, so they appear to take more time than they really do.
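To put rough numbers on that, here is a toy model in Python. Everything in it is made up for illustration: the workload (10,000 syscalls at 1 us each plus 90 ms of user-space work) and the 50 us of tracing overhead per syscall are assumptions, not measurements of strace.

```python
def syscall_share(n_syscalls, syscall_cost_us, cpu_time_us, trace_overhead_us=0.0):
    """Return (total runtime in us, fraction of that runtime spent in syscalls).

    Only the syscalls pay the tracing overhead; user-space time is unchanged.
    """
    syscall_time = n_syscalls * (syscall_cost_us + trace_overhead_us)
    total = cpu_time_us + syscall_time
    return total, syscall_time / total


# Hypothetical workload: 10,000 syscalls at 1 us each, plus 90 ms of user-space work.
untraced_total, untraced_share = syscall_share(10_000, 1.0, 90_000.0)

# Same workload, assuming tracing adds 50 us to every syscall (invented figure).
traced_total, traced_share = syscall_share(10_000, 1.0, 90_000.0, trace_overhead_us=50.0)

print(f"untraced: {untraced_total / 1000:.0f} ms total, {untraced_share:.0%} in syscalls")
print(f"traced:   {traced_total / 1000:.0f} ms total, {traced_share:.0%} in syscalls")
```

With these invented inputs the program goes from 100 ms to 600 ms, and the share of time attributed to syscalls goes from 10% to 85%. So the traced profile is not just uniformly stretched; the proportions themselves shift toward syscalls.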


I think the explanation is that ptrace-ing a process and tapping its syscalls will always add some latency. For example, time(process(open())) < time(ptrace(process(open()))), if that makes sense. It's similar to SELinux: there's a cost to tapping syscalls.
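If you want to check that inequality on your own machine rather than take it on faith, something like the following Python sketch might do. The workload (a child Python process calling os.getppid() in a loop) and the helper names are my own assumptions; the only strace invocation used is the "strace -c" already mentioned upthread. Try it on a test box, not on production.

```python
import subprocess
import sys
import time

# Hypothetical workload: a child Python process that issues many cheap syscalls.
WORKLOAD = [sys.executable, "-c", "import os\nfor _ in range(200000): os.getppid()"]


def time_command(cmd):
    """Run cmd to completion and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def slowdown(baseline_s, traced_s):
    """How many times slower the traced run was than the baseline."""
    return traced_s / baseline_s


def compare(workload=WORKLOAD, tracer=("strace", "-c")):
    """Time the workload alone, then under the tracer, and return the ratio."""
    baseline = time_command(list(workload))
    traced = time_command(list(tracer) + list(workload))
    return slowdown(baseline, traced)
```

On a Linux machine with strace installed and ptrace permitted, print(compare()) reports how many times slower the traced run was. A single run is noisy and includes interpreter startup, so treat the ratio as a rough indication only.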


It's too bad no one has written a tool that keeps the familiar strace CLI flags and output format but uses perf/eBPF under the hood. If such a tool existed, I would be much more likely to reach for it than to fall back to strace.



