You know, sometimes I get slightly eye-rolly at the tone of Julia's articles, but fuck, this sort of attitude in programming is so much better than the alternative (i.e. LKML). Bringing more positivity and jolliness into programming is important. Fuck the traditional macho programmer attitude. I'm gonna try to be more like Julia.
Also, I have noticed that she takes simple things we don't usually think deeply about and drills down into them, opening up the underlying complexity and pointing a metaphorical magnifying glass at it. (An example I quickly picked from the blog: http://jvns.ca/blog/2014/09/28/how-does-sqlite-work-part-1-p...)
I've done this to debug OpenStack in the past, and it worked very well. There are many similar projects for Python; I used this one since it's in the RHEL repo.
When I've needed to dump a stack trace, I've just included a signal handler which prints the call stack to a debug log.
That has been good enough for the problem of "wtf is this ruby process doing for _minutes_ at a time?" That doesn't get you flame graphs, but you can take a few snapshots and get an idea of what is happening.
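A minimal sketch of that approach, assuming a plain Ruby process (the signal choice and output format are illustrative):

```ruby
# Sketch: trap SIGUSR1 and dump every thread's backtrace to stderr,
# so `kill -USR1 <pid>` snapshots what the process is doing.
# Kernel#warn is used because trap handlers can't safely take mutexes,
# which rules out most loggers.
Signal.trap("USR1") do
  Thread.list.each do |thread|
    warn "Thread #{thread.object_id} (#{thread.status}):"
    warn (thread.backtrace || ["  <no backtrace>"]).join("\n")
  end
end
```

Taking a few of these snapshots a few seconds apart is usually enough to see where a stuck process is spending its time.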
For more involved perf debugging I've used ruby-prof.
“I'm constantly surprised by how many people don't know you can do this. It's amazing.”
I'm probably nitpicking, but I'm sad to see this in the article. One of the things I love about Julia's writing otherwise is that it's free of this sort of "I'm surprised that you don't know this simple thing" expression.
If you're planning ahead, you can have your application load rbtrace, which then allows connecting to a running process to see what it's doing, with options to limit output to slow calls, IO, or specific method calls.
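A hedged sketch of that setup: the application requires rbtrace at boot (guarded here so the snippet still runs without the gem installed), and you attach later from a shell. The PID and flags in the comments are illustrative examples of rbtrace's CLI, not a complete reference.

```ruby
# Load rbtrace at boot, if present, so the process can be attached to later.
begin
  require "rbtrace"
  rbtrace_loaded = true
rescue LoadError
  rbtrace_loaded = false
  warn "rbtrace gem not installed; live tracing unavailable"
end

# Then, from a shell on the same host (PID 12345 is illustrative):
#   rbtrace -p 12345 --slow=250        # only show calls slower than 250ms
#   rbtrace -p 12345 -m 'Proc#call'    # trace a specific method
```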
I hadn't heard of perf; very handy indeed. It's in the linux-tools package on distros using apt. You'll need to install both the generic package and the one matching your kernel version.
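For apt-based distros, the install plus a first sampling run looks roughly like this (package names vary slightly between Debian and Ubuntu, and the PID is illustrative):

```shell
# Install perf: the generic package plus the one matching the running kernel
sudo apt-get install linux-tools-common "linux-tools-$(uname -r)"

# Live view of where a running process (PID 12345) spends its time
sudo perf top -p 12345

# Or record with call graphs for 30 seconds, then browse the report
sudo perf record -g -p 12345 -- sleep 30
sudo perf report
```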
Of course there are equivalent programs for C. On GNU/Linux it's pstack (aka gstack). There's also gcore if you want an entire core dump you can analyze later with gdb.
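Roughly, assuming gdb's gcore and the pstack wrapper are installed (the PID and binary path are illustrative):

```shell
pstack 12345                            # print the native stack of every thread in PID 12345
gcore -o /tmp/core 12345                # write /tmp/core.12345 without killing the process
gdb /usr/bin/myprogram /tmp/core.12345  # analyze the dump later
```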
I'm not sure what you want to do. Is your problem that you want to see the native Java classes, or that you want to hide them and focus on the Ruby code? Or is the key thing that you want a Java flame graph?
Last time I needed this I knocked something up that could use the internals of Mission Control or VisualVM to turn their profile formats into flame graphs, but I doubt it would keep working across versions, and it really needed more work to be something anybody else could use sensibly.
It was, however, very easy to do (maybe half an hour's work) and produced very useful results. If you're using an invokedynamic-based language implementation, though, you'll want to filter a lot of internal stack frames out of the graph, or you'll have trouble seeing past the LambdaForms to what's really going on.
"I was going to paste the strace output of what gdb is actually doing, but it is 20 megabytes of system calls."
I think this is why you shouldn't run this on the production server itself: each call is very resource-intensive.
I believe the right way to analyze memory is to use gcore to dump the process's memory, scp the dump to a local VM running the same OS as production, download the same ruby binary that production is running, and then use gdb on the VM to analyze the dump.
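That workflow, sketched with illustrative hostnames, paths, and PID:

```shell
# On the production host: dump the process's memory without stopping it
sudo gcore -o /tmp/ruby-dump 12345

# From the analysis VM (same OS image as production): fetch the dump
# and the exact ruby binary production is running
scp prod:/tmp/ruby-dump.12345 prod:/usr/bin/ruby .

# Analyze offline, keeping the load off the production box
gdb ./ruby ./ruby-dump.12345
```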