I recently had to test compilation of a project for work on FreeBSD and feel like my eyes were opened.
I went into the task thinking "the BSDs are OLD" yet there must be something great about this operating system. It's beloved in certain communities and used at popular successful startups such as Netflix. There must be something I've been missing that those who cherish it have come to understand.
What I found was the perfect blend of modern and old school. It's not old feeling at all!
It had all of the newest GNU software (and otherwise) that I've come to know and love on other operating systems but also a sense of stability about the core that you don't get with Linux. There was a sense of underlying structure that had actually been planned out instead of discovered over time and it made the whole process of learning about how it worked a pleasure.
In addition, the FreeBSD manual was actually helpful and gave me a sense of completeness rather than "the text in this wiki is just scratching the surface of a complex wrapper for x that used to be y".
It's simple yet powerful, up to date but solid and I'd highly recommend FreeBSD as a result of the experience.
You have to install some things, poke around the system, do some general management, etc. I'm not claiming to be an expert, but I really liked what I saw, especially in the documentation (it looks outdated but the content is absolutely on point).
In addition to (or even instead of) htop, I strongly recommend atop [0]. This tool has been an invaluable help to me during a lot of diagnostic sessions.
It can collect a detailed memory usage profile of processes, and when combined with some smart scripting it has nice leak detection functionality [1]. Very useful when you run out of memory and want to find which daemon has used all of it.
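To give a rough idea of what that kind of scripting could look for (this is not the script from [1], just a minimal sketch): given a series of memory samples for one process, taken at regular intervals, fit a straight line and flag the process if usage climbs steadily. The function names and both thresholds are made up for illustration, and the samples are assumed to be RSS values in KB.

```python
from statistics import mean


def growth_per_sample(samples):
    """Least-squares slope of the samples, in KB per sampling interval."""
    n = len(samples)
    if n < 2:
        return 0.0
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(samples)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def looks_like_leak(samples, min_slope_kb=1.0, min_rising_fraction=0.8):
    """Hypothetical heuristic: usage trends upward and rarely drops.

    Both thresholds are arbitrary assumptions; tune them for your workload.
    """
    if len(samples) < 3:
        return False
    rising = sum(1 for a, b in zip(samples, samples[1:]) if b >= a)
    rising_fraction = rising / (len(samples) - 1)
    return (growth_per_sample(samples) >= min_slope_kb
            and rising_fraction >= min_rising_fraction)
```

A process that grows and then gives memory back (a cache, say) has a slope near zero over a long enough window, so it should not trip this check, while a slow steady climb will.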
An ncurses disk usage analyzer: http://dev.yorhel.nl/ncdu
I find this an extremely handy alternative to du; it's somewhat similar to TreeView on Windows.
While most sysadm books are years out of date, this one covers all the hot recent stuff like DTrace and its equivalents on Linux, pidstat, etc. Solid coverage, and the author (from Joyent) knows his stuff. Available on Safari too.
If you have multiple nodes, I recommend New Relic. It's a bit pricey, but if an issue arises in your stack, New Relic can help you immediately pinpoint where and what the issue is.
P.S. I can view New Relic on my phone, so if I get a PagerDuty alert, I can still see what's up if I'm at the beach.
Sounds like Munin would suffice for you. Also, it's free.
munin-monitoring.org
> not really that good
New Relic has such rich functionality that it is easy to overlook some of its utility. It took us a while to get it tuned to our needs, but now that we have it configured, I couldn't imagine running a high-availability web service with anything else. Suppose I get a PagerDuty alert for high memory usage on a server. I would then go look at that server in New Relic, see what processes are using the memory, see what the memory usage for that process has been like for the last 6 months, perhaps notice a slow steady increase in memory consumption, realize there's a memory leak, etc.
If you have multiple nodes, including load balancers and other SOA services, and want to link them all to view a request as one transaction (e.g. a request comes in through a load balancer, gets processed by app-server-1, which in turn calls service-2, which queries DB-1 or memcache), Appneta Traceview can connect them all.
You can also monitor your web app using Appview Web for synthetic monitoring, and monitor your network using Pathview.
If you're just starting out, the free tier is pretty good.
What the monitoring tools lack in specificity (and depending on your stack, they may provide varying levels of awesome; server monitoring is weak but improving), they make up for with a massive win on zero-configuration installation.
Just sign up, instrument, and start monitoring.
If you find bits lacking, there are almost always local tools you can use to supplement.
Yes, New Relic is pretty awesome. There's a lot of information you can monitor there, with a very easy installation. You can get a free account there as well, just to test it. I just made one, and I really like the simple UI: http://i.imgur.com/oGGfTrp.png
New Relic is really great. Especially if you use Ruby, as it has instrumentation deep into the application run level. Unmatched for finding bottlenecks really.
Yup, all of that application-level instrumentation, plus performance metrics and exception reporting, is also available for Python developers with https://appenlight.com.
Is there a tool that would allow collecting historical data on memory and CPU usage patterns of individual processes? In troubleshooting you frequently deal with a process that is "exploding" in memory and/or CPU usage, and either you are not there at the moment to run htop or you might not even be able to easily log in to the server to do checks.
"atop" can do it to some degree. When you run atop interactively, it uses the process accounting facility to find out not only what is running at the exact moment it takes a snapshot of the system, but also what processes started and exited since the last refresh interval.
Installing atop will also (depending on your distro etc.) set it up to snapshot the system state every 600 seconds. If you run "atop -r" you can review the recorded data from today or an older day, and switch between the 10-minute snapshots with t and T.
Personally I like "sar" for a quick text-only overview (sysstat package). Once enabled, you have a 10-minute snapshot of a huge amount of performance metrics (e.g. sar -r for memory, sar -b for disk). Of course, it's even better if you use something to collect them centrally (I signed up for DataDog, which takes very little effort to integrate compared to rolling your own stuff).
Scalyr [1] can do this. (Disclaimer: I am the founder of Scalyr, and it's a commercial product.) We aim to be a one-stop-shopping monitoring tool: collect everything you might want to collect, and let you analyze it in any way you want. To your question, we can collect CPU, memory, I/O, and other stats for specified processes [2], and give you graphs, rolled-up dashboards, and alerts on that data.
We're always looking for feedback, and we're happy to give out discounted or free accounts to startups. Drop me a line -- steve@[company domain] -- if you're interested.
An answer somewhat stolen from Stack Overflow says "ps -o rss $(pgrep executablename)", but I guess that assumes you only have one process running. Maybe it would be easier to put it in a script and use "ps -o rss $!"
Real simple way: Cron job to dump "top" to a file. It will tell you all processes and their memory/CPU usage every x minutes. Once you need data on a specific process, you can just grep its pid.
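If you'd rather have something easier to grep than raw top output, here is a minimal sketch of the same idea in Python: each run appends one timestamped line per process with its pid, RSS, CPU percentage, and command name. It assumes your ps accepts -A and the -o keywords rss, pcpu, and comm (true for procps on Linux; check the man page on other systems). The function names are made up for illustration.

```python
#!/usr/bin/env python3
"""Append one timestamped line per process to a log file (meant for cron)."""
import subprocess
import sys
import time


def parse_ps(text):
    """Turn headerless `ps` output (pid, rss, pcpu, comm) into tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 3)  # command is last, so it may contain spaces
        if len(parts) < 4:
            continue
        pid, rss_kb, cpu, command = parts
        rows.append((int(pid), int(rss_kb), float(cpu), command.strip()))
    return rows


def snapshot():
    """Run ps once and return the parsed rows (assumes -A and -o support)."""
    out = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "rss=", "-o", "pcpu=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out)


def append_snapshot(path):
    """Write the current snapshot to `path`, one line per process."""
    now = time.strftime("%Y-%m-%dT%H:%M:%S")
    with open(path, "a") as log:
        for pid, rss_kb, cpu, command in snapshot():
            log.write(f"{now} {pid} {rss_kb} {cpu} {command}\n")


# Only does anything when given exactly one argument: the log file path.
if __name__ == "__main__" and len(sys.argv) == 2:
    append_snapshot(sys.argv[1])
```

A crontab entry along the lines of "*/5 * * * * /usr/bin/python3 /usr/local/bin/proc_usage.py /var/log/proc-usage.log" (both paths hypothetical) would run it every 5 minutes, and you can then grep the log by pid or command name.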
It's a fairly old convention for adding links or references in plaintext. Used on mailing lists also. The alternative is to move to hypertext (not supported in some contexts, not preferred in other contexts), or to add the link or other citation information inline in parentheses, which can make text look cluttered.
I am a contributor to glances, very pleased to see it mentioned. Glances can also run as a server, which can then allow glances clients to connect, or even the Android app Android Glances.
I don't think PowerTop's been mentioned. Maybe because it's more useful for a laptop than a server. Once calibrated, it can output an HTML report on a machine's consumption, which includes a handy list of tunable power saving options. It was written by Intel so it may, or may not, work that well on other processors.
https://01.org/powertop
The default value for "log_format" is "combined" -- identical to the Apache "combined" log format -- so apachetop can read nginx log files without any needed changes.
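For reference, nginx's predefined "combined" format is: $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent". If you want to process the same logs yourself, a minimal parser sketch could look like the one below. The regex and function name are my own, and it assumes the quoted fields contain no escaped double quotes.

```python
import re

# One named group per field of the "combined" format.
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+|-) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_combined(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None
```

All values come back as strings; convert status and body_bytes_sent to integers if you need to aggregate them (Apache may log "-" for a zero-byte body, which is why the pattern accepts it).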
Strange definition of a "monitoring" tool. It essentially requires a human being to run the tool, look at the graph, and make a decision on the data. There isn't even a baseline to compare the data to. This isn't really monitoring; it's equivalent to typing df, looking at how many bytes on disk are being used, and doing this every 5 minutes.
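To make the distinction concrete, here is a minimal sketch of the df example turned into an actual check: a script compares usage against a limit and only needs a human when the limit is crossed. The function names and the 90% default are assumptions for illustration, and a fixed limit is only the crudest possible baseline.

```python
import shutil


def used_fraction(total, used):
    """Fraction of the disk in use, guarding against a zero-sized filesystem."""
    return used / total if total else 0.0


def check_disk(path="/", limit=0.90):
    """Return (alert, fraction): alert is True once usage reaches the limit."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    fraction = used_fraction(usage.total, usage.used)
    return fraction >= limit, fraction
```

Run something like this on a schedule and send a notification when alert is True, and nobody has to look at df every 5 minutes.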
I could never get apachetop to work on Debian or Ubuntu. It displays the data all right, but doesn't respond to most of my keypresses, and it segfaults. Maybe related to its not having been updated since 2005 or so? It's too bad, because it promises to do exactly what I want.
I think it is amazing how many great suggestions everyone here has added to this post. Much appreciated, thank you :-) I will add them to the post later, as a list with a URL to each website.
bwm-ng really has a neat and to-the-point interface, but it doesn't give a breakdown of individual established network sessions. Sometimes the individual details are needed too. That's where iftop comes in handy.
Yes, most people know of them :-) I just felt like presenting them to people who might not know of them and their strengths. Why not present tools, even though many know of them already? I think it is important to talk freely about any software, as there is always someone who might not know it. I appreciate this myself, and I am actually also new to *nix systems. If everyone had the same approach to information as you, I would not be a part of it.
All the suggestions here are amazing. I will take a look at this later, and at least make an edit with a list of the suggested software. Did not see this coming :-) Thank you everyone.
I didn't know about apachetop and glances.
I was actually trying to install an app like zabbix before I stumbled on this thread. If these tools can help me monitor the servers, that's even better!
Thanks @adionditsak for sharing.
* http://www.freshports.org/sysutils/htop/
* http://www.freshports.org/sysutils/py-glances/
* http://www.freshports.org/sysutils/apachetop/
Some others mentioned in the comments:
* http://www.freshports.org/net-mgmt/iftop/
* http://www.freshports.org/net-mgmt/bwm-ng/
* http://www.freshports.org/sysutils/xsysstats/
* http://www.freshports.org/sysutils/atop/
If you're running OS X / FreeBSD / Solaris, there are many useful DTrace scripts for system monitoring and profiling:
http://www.brendangregg.com/dtrace.html