Ask HN: How do I learn how to become a good sysadmin?

protomyth · on April 22, 2015

Start with two mantras:

1. I will know exactly what every command or script I run on a system I control is supposed to do - no exceptions. If I don't and am just following instructions, I really need to learn what it means and why. If you need to, set up a test system and snapshot before and after to see how things work.

2. I will document a lot. Imagine some poor person showing up after you have won the lottery (think happy thoughts, but watch out for buses just the same). Don't just blindly put down the steps; put the why down. If I cannot write why I am doing something then I need to think about it more.
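The "snapshot before and after" idea in the first mantra can be sketched in a few lines. This is a minimal illustration, not a replacement for real VM or filesystem snapshots: the function names are made up for the example, it only compares file contents (not permissions or ownership), and it would need error handling for unreadable files.

```python
import hashlib
import os


def snapshot(root):
    """Map every file under root to a SHA-256 digest of its contents."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            # Store paths relative to root so two snapshots are comparable.
            state[os.path.relpath(path, root)] = digest
    return state


def diff_snapshots(before, after):
    """Return (added, removed, changed) lists of relative paths, each sorted."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = sorted(
        path for path in before
        if path in after and before[path] != after[path]
    )
    return added, removed, changed
```

On a test box, one would call snapshot() on a directory such as /etc, run the unfamiliar script, call snapshot() again, and read the diff to see what the script actually touched.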

The rest just flows from those. Learn to program, be a tool builder, find the best way to learn and dive in, solve problems, and insist on consistent, repeatable, backed-up, secure systems.

Do remember though: all your successes will be hidden in the darkness and all your failures will be shown in the full light of day. It's not a fun gig at times.

TheCowboy · on April 22, 2015

This is good, and I'd add on to the part about the successes. Learn to document your successes and be able to verbally communicate why anything you do is important or useful to less technical users.

If you don't have a good boss who can see that you are good at what you do, you will have to be able to speak up if you want to be paid what you are worth.

You want to step up from being a computer janitor who needs to be told what to do to being someone who delivers value to the business, helps people get their jobs done, and can anticipate problems before they occur.

lighthawk · on April 22, 2015

> Learn to document your successes and be able to verbally communicate why anything you do is important or useful to less technical users.

And on the software engineer/developer side of things the same applies. This is why whenever I am given a self-assessment or asked to help with a review of myself, I go back through my git log, email, etc. looking for what I've done instead of just attempting to summarize based on memory. Then I keep a personal copy of my self-assessment. That way, I have a record of what I did, and so does my company. Wikis, file servers, and other document repositories change, and when you switch jobs, you have that available to look at to update your resume. If your company doesn't make you do at least annual and hopefully quarterly self-assessments, you should do it on your own.

hacknat · on April 22, 2015

I would like to add that you should be patient and kind to the developers with whom you are working. When they have product successes, they may call attention to your efforts as being tied to its success. This is a good thing; having your job tied to profit-center activity is very good. Even if attention isn't drawn to your work, smart developers know how incredibly valuable smart, communicative sysadmins are, and they will work to keep you working with them.

protomyth · on April 22, 2015

You should be patient and kind with everyone. Although, patient and kind does not include adding untested code to production to "fix" something late on a Sunday night. Insisting on proper deployment almost always makes developers[1] irritable. It also prevents http://dougseven.com/2014/04/17/knightmare-a-devops-cautiona...

1) I have more years as a developer than system admin (11 vs 7 and 5 as something I'm still not sure).

davidgerard · on April 27, 2015

"why" is the most important information ever.

I've been in my present position five years. Knowing where the bodies are buried (because I buried half of them) turns out to be one of my most useful functions, and whenever I am asked such a question I add the answer to the internal wiki ...

ephemer1c · on April 23, 2015

I don't think anyone knows what every single script and command does on a system. Has anyone read and audited all the init scripts for example? Never. And where would one find the time to do this? Certainly not on the job.

crypt1d · on April 22, 2015

Sorry to disagree with a lot of posts here.

>At the most basic level, when firing up a new server I follow the how to harden your server guides and install fail2ban, disable root login, enable ssh only login etc.

This is simply wrong. There are two things wrong with this approach:

1. If I understood correctly, you are essentially following random guides on the internet about setting up your security rig. Not too far from downloading untrusted binaries from the Internet.

2. It shows that you do not understand the core issue and what the main purpose of such tools is, how they function, etc.

This is a bad approach because it does not scale well. Security should be built from the ground up, not as icing on top. You have to interact with developers during the build cycle and be the paranoid one. So, for example, when somebody mentions FTP transfers you yell at them, instead of finding a workaround (e.g., setting up a box with FTP but filtering IP addresses with iptables or whatever).

Security is an architectural issue, not the issue of which tools to install.

That being said about security, being a good sysadmin for me also means striving for simplicity, documenting and standardizing everything and being meticulous.

How do you learn it? From experience. Listen to your senior colleagues and learn by doing. Also, RTFM (but seriously though, those tech notes are important).

Disclaimer: Used to be a team leader of a large UNIX/Linux team at the big blue. Now I do DevOps.

Mahn · on April 22, 2015

Following random guides on the internet doesn't necessarily have to be harmful if you don't simply copy-paste, but rather make an effort to understand what advice is being given, why it is being given, and form an opinion about it. For instance, assuming you start from the very beginning, if a guide suggests disabling root login and you do your own research to understand what root accounts are, what they can do, and why you would want to disable root login, and after doing your homework you happen to agree it is a good idea, then it's probably okay to follow that specific advice.

That doesn't mean you are going to learn everything there is to server administration by following a random guide on the internet, but for a small startup it's not necessarily a bad way to get started since at the end of the day you are going to learn more from experience than by reading 50 books.

kpcyrd · on April 22, 2015

There are some surprises if you go down that hole.

The best argument I could find for "disable root login" was "the attacker has to guess the username, too", which doesn't align with Kerckhoffs's principle and isn't the way security should be done, imo.

Also, fail2ban is a protection against bruteforce. If bruteforce is an issue for you, you're doing something wrong.

Please correct me if you know more.

rdl · on April 22, 2015

Disabling root login is part of the principle of making all access accountable to individuals, not to role accounts. Imagine how much more challenging things are forensically if you see a bunch of actions in the logs taken by "root" vs. by "joeg", the sysadmin who was fired a week later.

fail2ban helps with a lot of things. It keeps spam out of the logs. Some systems have high CPU cost per login (bcrypt), so similar systems can help prevent brute force attempts turning into (or being actually intended as) DoS.

thaumaturgy · on April 22, 2015

Some of the brute force attempts against servers are so relentless now that they can consume a significant amount of server resources just causing the server to say, "no. no. no. no. no. no. no. no. no...." They also fill up your log files, needlessly consuming disk space and making it a pain to crawl through logs later on to troubleshoot legitimate issues. Plus, you can hook Fail2Ban so that other services can use it to buff up their filters. For instance, if someone's spamming your mail server, your mail server can trigger Fail2Ban and then Fail2Ban can tell your web server to also block the IP (or network) to help reduce common sources of WordPress spam.

There are good reasons to use Fail2Ban, and the counterarguments that it doesn't actually improve security miss all the other benefits it brings.

And, I've read all of Theo de Raadt's arguments against these approaches. I understand and mostly agree with them. I get that with ssh key only authentication and sane services configuration and so on that people can hammer away at your server all day and never accomplish anything. But that still doesn't mean I want to provide a test bed for every dumb script kiddie on the internet (and there are many).
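The log-driven banning discussed above can be sketched as follows. This is a toy model of the idea, not Fail2Ban's actual implementation: the function name and threshold are invented for the example, and the regular expression assumes the usual OpenSSH "Failed password ... from IP port N ssh2" auth log line with IPv4 addresses only.

```python
import re
from collections import Counter

# The username in these lines is chosen by the remote side, so the pattern
# is anchored to the END of the line: the greedy ".*" swallows any fake
# "from 1.2.3.4 port 1 ssh2" text smuggled into the username.
FAILED_LOGIN = re.compile(
    r"Failed password for .* from (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"port \d+ ssh2\s*$"
)


def addresses_to_ban(log_lines, max_failures=3):
    """Return sorted IPv4 addresses with at least max_failures failed logins."""
    failures = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group("ip")] += 1
    return sorted(ip for ip, count in failures.items() if count >= max_failures)
```

The returned addresses would then be handed to whatever does the blocking (a packet filter, a web server deny list, and so on), which is the part that lets one service's logs protect another.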

danieldk · on April 22, 2015

Use the simplest tool possible. Fail2ban relies on log parsing, which is a possible attack vector.

The thing is that you can reach pretty much the same effect with a smaller attack surface and better efficiency using rate limiting in your packet filter.

E.g. in iptables the 'recent' module can do this; see man iptables-extensions and search for 'recent'. For example, you can set up a rule that any IP address making more than 5 connection attempts to port 22 in one minute gets put on a list that is DROPped.
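The logic of that rule (more than 5 attempts per minute lands an address on a drop list) can be modelled in a few lines. This is only a sketch of the idea in user space, not how the iptables 'recent' module is implemented; the class name is hypothetical, and unlike a real rule set this model never expires a block.

```python
from collections import defaultdict, deque


class ConnectionRateLimiter:
    """Block an address once it exceeds max_attempts within window_seconds."""

    def __init__(self, max_attempts=5, window_seconds=60.0):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts = defaultdict(deque)  # ip -> timestamps of attempts
        self._blocked = set()

    def allow(self, ip, now):
        """Record an attempt at time `now` (seconds); return False to drop it."""
        if ip in self._blocked:
            return False
        attempts = self._attempts[ip]
        # Forget attempts that have fallen out of the sliding window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        attempts.append(now)
        if len(attempts) > self.max_attempts:
            self._blocked.add(ip)
            return False
        return True
```

Six attempts from one address within a minute get the sixth dropped, while an address connecting every 20 seconds is never blocked.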

Edit: BTW, if you think the fail2ban attack vector is purely theoretical, you might want to check the CVEs:

http://www.cvedetails.com/vulnerability-list/vendor_id-5567/...

thaumaturgy · on April 22, 2015

iptables rate limiting still doesn't solve the problem of identifying attacks against one service so that they can be preemptively blocked by other services on other servers.

nstart · on April 22, 2015

You are right that following blog posts blindly is a bad idea. And I feel guilty that I didn't invest time to learn why I commented those lines in IP conf (it's bad enough that I don't remember which files I changed). So I've definitely got a place to start working on already.

The mentorship part is a lot more difficult. Good sysadmins are really difficult to find in Sri Lanka. I've worked with a few companies so far. Some examples.

1) One company I worked for didn't even have a policy of hashed passwords and protection against SQL injection. They were developing major enterprise software.

2) I worked as an internal systems developer for a non-IT team within another company. This was the one place I could have learnt the most at, but the IT team was this very opaque "don't tell people what exactly we do" kind of team.

3) One last example. At this other company that I worked at, the sysadmin was pretty good at keeping stuff up and running, but a lot of it was copy-pasted scripts. I got what I could out of the person but I couldn't pull out much.

Where I'm from, the main cyber security body of the country gave blank looks when asked about Heartbleed at a conference held shortly after the whole thing exploded.

All that to just sum up why I turned to HN to seek out advice as to what resources I should look at. A lot of threads on the net seem to veer more towards "be a good communicator" and "know your system". While necessary, it's a little too abstract for someone trying to find out what gaps exist and which ones need filling ASAP.

Thanks a lot for the advice. I'll probably start reading up on all those files I had to edit when hardening the server. That should provide a good starting point.

marcosdumay · on April 22, 2015

> The mentorship part is a lot more difficult. Good sysadmins are really difficult to find in Sri Lanka.

Congratulations, it looks like you are one of the top sysadmins in your country.

Maybe you should try to look in other places, or maybe you really should congratulate yourself, with no sarcasm at all, and start selling yourself as an expert.

bsbechtel · on April 23, 2015

You should reach out to @arunoda (https://twitter.com/arunoda), founder of Meteor Hacks. He's one of the leading voices in deployment architecture for the open source meteor.js project, and based in Sri Lanka.

nstart · on April 23, 2015

I have actually. He's one of my heroes in SL. He works close to where I'll be moving to soon (it's all in one IT park). Guy is very very humble. Very nice to talk to him. Also one of the few all in believers of TDD :D

drzaiusapelord · on April 22, 2015

> It shows that you do not understand the core issue and what the main purpose of such tools is, how they function, etc.

The people who write those guides don't either. I almost never see mention of things like OSSEC or running a WAF or even a basic IDS/IPS. Just lots of naive "hackers get in through passwords, don't use passwords in ssh and enable fail2ban." That's on top of security via obscurity like changing port numbers. Uh, what?

Security needs to have a layered onion approach:

Network> IDS/IPS, Firewall

Server> OSSEC, SELinux, smart permissions, WAF, AV

Storage/Filesystem> Backups, tripwire

DB> Hashed passwords, etc

Software> Patching schedule, writing secure software from day 1, not giving in to team/client demands for insecurity

Monitoring> OSSEC alerts, nagios/zabbix alerts, tripwire alerts, etc

Once you get used to a security stack, this stuff comes easy. The problem is cheap VPSes have turned devs into sysadmins who just plant a naked Linux box on the web, turn on fail2ban, and call it a day.

blumkvist · on April 22, 2015

Learning from blogposts is a widespread issue it seems. I see content marketers and growth hackers who have never done a linear regression in their life. How they are "marketing" and what they do day-to-day is beyond me.

If you want to learn something, go read (text)books. Not blog posts. You need core understanding. Nobody can give you that in a blog post. Even if the author actually knows what he is talking about, he is translating his knowledge into practical tips for a particular situation. He is applying his knowledge. You get applied knowledge, not real knowledge. When it comes time for you to do your job, the only things you know how to do are bits and pieces from situations which might or might not be applicable to your situation.

Think of it like raw vs. aggregate data. You can easily memorize by heart what the aggregations of the data are. You can learn mean, quartiles, max, min, sum. But what do you do when it comes time to filter? The job of knowledge workers is to filter data, not to memorize aggregations.

nstart · on April 22, 2015

Would this list of books be a good starting point?

http://everythingsysadmin.com/books.html

lamontcg · on April 22, 2015

That is insufficient. Try going through TCP/IP Illustrated Vol 1 at least and really learning TCP/IP, and use tcpdump to rip apart packets. You should learn in detail the TCP state diagram and how to read netstat output. Know what the TCP 3-way handshake is, and PMTU discovery and 4-way connection teardown, and be able to apply that knowledge to debugging.
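As a small exercise in reading netstat output against the TCP state diagram, here is a sketch that tallies connections per state. The sample text is invented for illustration and assumes the column layout of Linux `netstat -tan`; the function name is hypothetical.

```python
from collections import Counter

# Made-up sample in the layout printed by `netstat -tan` on Linux.
SAMPLE_NETSTAT = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:22             203.0.113.5:51234       ESTABLISHED
tcp        0      0 10.0.0.5:80             198.51.100.7:40112      TIME_WAIT
tcp        0      0 10.0.0.5:80             198.51.100.8:40113      TIME_WAIT
tcp        0      1 10.0.0.5:45000          192.0.2.10:443          SYN_SENT
"""


def count_tcp_states(netstat_output):
    """Count TCP sockets per state (sixth column of each tcp/tcp6 row)."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Header lines start with "Active" or "Proto" and are skipped.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states
```

Each state maps to a point in the diagram: SYN_SENT is a client that has sent its SYN and is waiting for the SYN/ACK, and TIME_WAIT is held by whichever side closed the connection first.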

Learn how to program in C at least well enough to write simple clients and servers, and write a trivial threaded application using something like pthreads (just a couple of ideas). Then start reading Advanced Programming in the Unix Environment. Really learn how to use strace -- you should know what the difference is between a system call and a library call and be familiar with many of the common system calls you see in the output of that. You should understand the sockets API and file IO APIs. You should be able to understand socket and file descriptors, understand the output of lsof, and be able to use it to debug problems.
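The advice above is to do this in C. As a shorter stand-in, here is the same shape of exercise in Python, whose socket methods are thin wrappers over the system calls of the same names, so running it under strace shows socket, bind, listen, accept and friends. It is a sketch for small messages on loopback only, and the function names are made up.

```python
import socket
import threading


def serve_one_echo(listener):
    """Accept a single connection and echo back whatever it sends."""
    conn, _peer = listener.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:  # empty read: the peer has closed its sending side
                break
            conn.sendall(data)


def echo_roundtrip(message):
    """Send message to a one-shot echo server on loopback; return the reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_one_echo, args=(listener,))
    server.start()

    chunks = []
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)  # half-close: tells the server we are done
        while True:
            data = client.recv(4096)
            if not data:
                break
            chunks.append(data)
    server.join()
    listener.close()
    return b"".join(chunks)
```

While it runs, lsof on the process shows the listening and connected descriptors, which ties the API calls back to the file descriptors mentioned above.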

Pick up the latest version of the Daemon book and learn how a Unix system is architected and what the different kernel subsystems are. Start poking around in the Linux kernel sources. Use your C knowledge to make some toy modifications to the kernel (a /proc entry that echoes back 'hello world' when you cat it for example). Read UNIX Systems for Modern Architectures by Schimmel as a start if you want to go further here.

Take the same approach to the higher level aspects of systems. Your package manager is important, so if you are on a RedHat system you should be able to take a trivial piece of C code that you wrote and properly package it so that you can build it, package it, and install it.

If you've built by hand or seen C packages built by hand you should have encountered 'configure; make; make install'. You should write a toy C program with a portability issue (find a Linux/FreeBSD portability issue and write a toy C program with that problem in it), and write an autoconf script so that you can run your code on either O/S.

Pick up either CFEngine, Puppet or Chef and learn how to use it (you can also add Salt and/or Ansible, but please don't consider yourself an expert if you don't understand the limitations of those two). Study Mark Burgess's Promise Theory.

Learn at least one high level scripting language that is in common use: Ruby, Python or Perl. You should also learn bash (and ideally learn old school /bin/sh and its differences with bash as well), but you cannot stop at just shell scripts.

Ideally you really learn how to program and write 10,000+ line object oriented programs, and write code that is tested. Learn what inheritance and composition are and why some software developers argue you should favor composition over inheritance. Learn the Law of Demeter and other principles.

Install and use something other than Linux to broaden your horizons a bit. FreeBSD at least. Solaris or one of the Solaris derived distros would be even better. Having a Mac as a laptop is also useful but I'm not sure it replaces playing with different server O/Sen.

Then, you definitely do also want Limoncelli's book and Nemeth's Unix and Linux System Administration Handbook, and you should be able to configure a wide range of different systems apps (sshd, ntpd, SMTP mailers, apache, nginx, etc). If all you can do is install and configure apps, though, then you're no better than a script kiddie running scripts downloaded off the internet that they couldn't write themselves. You won't understand what you're actually doing, and won't have a prayer of debugging some of the harder problems you can run into when architecting those apps and debugging issues that come up in running them in a large and successful environment.

ABNWZ · on April 22, 2015

This is very extensive. Great wealth of content and knowledge in here. Is this all strictly necessary? Or is this how to MASTER your craft?

lamontcg · on April 22, 2015

I certainly threw in things there which I would not consider a requirement for hiring. I DO expect basic knowledge of the TCP state diagram and ability to parse and explain netstat output in actual hiring interviews. I would not expect someone to have read Schimmel's book and know how kernel locking works in great detail.

My advice is to really learn how to write some toy C programs at minimum, learn the sockets API, and really have a copy of APUE and use it occasionally. I highly recommend a project where you do some kind of trivial 'hello world' patch to the actual kernel sources and things of that nature.

I have actually, on my job, seen issues that require this kind of knowledge. In one incident with massive amounts of TCP RSTs, I cracked open the kernel sources, tracked down where the TCP stack can issue RSTs, and found a problem involving TCP timestamps and the PAWS algorithm. Our layer 4 hardware load balancers at the time were rewriting packets while preserving all the random TCP timestamps from the clients, and that traffic was hitting very high velocity servers. When a TIME_WAIT state socket was hit with a 'backwards in time' TCP timestamp from a different client, it would issue a RST and tear down the connection because of PAWS. You might be able to find that kind of problem by studying the TCP RFCs and books like TCP/IP Illustrated instead of opening the kernel sources, but it's the kind of problem that you aren't going to be able to Google. And I don't really have the ability to write a [modern, multithreaded] TCP/IP stack (although I could certainly acquire it), but I have demonstrated the ability to read the sources and get real-life work done by doing so.
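The 'backwards in time' check at the heart of that incident can be sketched as below. This is a simplified model of the PAWS timestamp comparison described in RFC 7323, not the Linux kernel's code: it leaves out the RFC's exception for long-idle connections, says nothing about what a given stack does after rejecting a segment, and the function name and timestamp values are invented.

```python
TS_MODULUS = 2 ** 32  # TCP timestamp values are 32-bit and wrap around


def paws_rejects(ts_val, ts_recent):
    """True if ts_val is older than ts_recent in 32-bit wraparound order.

    ts_recent is the last timestamp the receiver accepted from this peer;
    ts_val is the timestamp on the newly arrived segment.
    """
    return 0 < (ts_recent - ts_val) % TS_MODULUS < 2 ** 31


# Two clients with unrelated timestamp clocks look like ONE peer when a
# load balancer rewrites addresses but passes the timestamps through.
client_a_ts = 900_000  # last timestamp seen on the connection
client_b_ts = 1_200    # a different client reusing the same address and port
```

With those values, paws_rejects(client_b_ts, client_a_ts) is true: to the server, the second client appears to be the first one travelling back in time.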

For all the candidates that I interview I want to see some indication that they can work without the Google safety net and show some ability to track down problems on their own.

angersock · on April 22, 2015

Much better than the "ignore blogposts" meme.

lamontcg · on April 22, 2015

I read blogposts all the time, I do use google first if I don't know the answer myself offhand. I have the background, though, to be able to determine what smells correct and don't just type in all the commands from the first random web page that google kicks back at me...

sciurus · on April 22, 2015

Yes. I've read all three of those and benefitted a lot. I recommend The Practice of System and Network Administration to anyone wanting to learn how to be a systems administrator. If you want to work in web operations, I also recommend The Practice of Cloud System Administration.

jpgvm · on April 22, 2015

At the end of the day it's important to find a mentor or join a bigger company where you can study under a more senior sysadmin.

These days things are getting more complicated; to be a good sysadmin you also have to be a good developer. Usually as good as, and sometimes better than, the people that write the apps that you will run, maintain and love long after they have decided they have something more shiny they could be working on.

You need to be able to disassemble and fix programs built by other developers, diagnose issues in many runtimes, understand kernel subsystems and the various issues you can run into in kernel land.

You need in-depth knowledge of C, system calls and the behaviour of hypervisors and hardware. Don't believe that running on EC2 and not programming in C gives you the luxury of glossing over these fundamentals.

You will also need networking knowledge. Even if you don't intend to run your own networking equipment, you will still need to understand things like the TCP handshake, what an skb is and why that's important, and the differences between select() and epoll().
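The select() versus epoll() difference is easiest to see side by side. With select() the full set of descriptors is handed to the kernel on every call; with epoll() descriptors are registered once and each wait returns only the ready ones. A minimal sketch using Python's select module follows; select.epoll exists only on Linux, and the helper names are made up.

```python
import select
import socket


def ready_with_select(sockets, timeout=1.0):
    """select(): the whole descriptor list is passed in on every call."""
    readable, _writable, _errors = select.select(sockets, [], [], timeout)
    return readable


def ready_with_epoll(sockets, timeout=1.0):
    """epoll(): register interest once, then wait for ready events.

    A real server would keep the epoll object for its whole lifetime;
    it is created and closed here only to keep the example self-contained.
    """
    by_fd = {sock.fileno(): sock for sock in sockets}
    poller = select.epoll()  # Linux only
    try:
        for fd in by_fd:
            poller.register(fd, select.EPOLLIN)
        return [by_fd[fd] for fd, _events in poller.poll(timeout)]
    finally:
        poller.close()
```

Both helpers give the same answer for a handful of sockets; the difference shows up with thousands of mostly idle connections, where rescanning the full set on every call becomes the bottleneck.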

Learning to be a good sysadmin is relatively easy, learning how to be a great sysadmin takes a good 10,000 hours.

mvanvoorden · on April 22, 2015

I've been a sysadmin for 16 years, I have worked in big and small companies and I don't agree with a great part of this post.

I'm not a developer, I have never needed to disassemble or fix programs built by others, never needed to understand kernel subsystems or anything else kernel related (except maybe how to replace a broken driver/module). I know nothing of C, I know just the basics of system calls and I've never heard of skb or select() and epoll().

I don't even like developing software, which is the reason I once joined the sysadmin team at the company I worked for and fell in love with this job.

I see how these skills are a nice extra, but I understand that knowing too much also creates the problem of having to work harder and stay more often after hours. And saying no isn't an option because nobody else knows how to handle the problems that may arise.

To me, a good sysadmin knows when to say no and spends the least effort in fixing errors in code. The developers should provide good programs or it doesn't get installed in production. Keep strict boundaries in what you do and don't do, to prevent being abused by your employer. If you are often the last one to leave because you're doing work that someone else should be doing, or are asked to be on standby after hours while being the only sysadmin, step on the brakes before this eventually burns you out or makes you hate your job (or even hobby).

Also, knowing all of what parent describes + all the 'normal' sysadmin knowledge is (eventually) impossible to keep up and will also take too much of your free time. You will regret this later in life.

lamontcg · on April 22, 2015

You are literally the problem with system administration.

The attitude that you don't need to know how to program or understand the architecture of the systems that you run on is a highly privileged attitude: that you are a practitioner who can remain ignorant of their tools.

The good news is that kids these days are being raised on SREs and there's a billion people in India who will take your job and won't have an attitude about learning how to program. Ultimately, you are going to be a curious dinosaur. You are a unique product of the late 90s Internet bubble where the need for system administrators expanded so much faster than the available supply and anyone halfway technically minded got hired for really good salaries and put in charge of servers.

That is going to change and evolve.

And I spent 5-10 years learning all that knowledge back in the 90s, and it's 15 years later and I can state with confidence that there are no regrets. If anything my only regret is that I didn't learn how to really practice "software development" as opposed to "programming" even earlier. Going forwards, I expect that more and more the lines between systems and software are going to blur, and MY advice to kids these days would be that they will only be able to be ignorant of software development practices at their own peril.

And I started doing PC Tech work in the late 80s, became a Unix admin in the mid-90s, managed the configuration management system and base configuration of a site that grew to 30,000 Linux servers in the 200Xs, and then after 15 years switched to being a Software Developer. I'm fully confident based on experience that you are offering absolutely terrible advice to someone who is just learning, and you are out of touch with the direction of your own field.

dang · on April 22, 2015

> You are literally the problem with system administration.

No personal attacks on HN, please.

redwood · on April 23, 2015

Your post seems to boil down to a logical conclusion that everyone better be a software developer, as if there is no value in abstracting to a layer where things aren't pieced together with compiled software. Feels rather myopic to me. There will always be work to do at every layer.

rdudek · on April 22, 2015

I agree. When we hire sysadmins, we look for folks with experience using various products and give extra bonus points to folks who are also SecOps. We couldn't care less if a person knows how to code. That's what we have the developers for.

marcosdumay · on April 22, 2015

Lots of people agree.

Those end up creating environments where everything just sorta-works enough to run, for as long as you put a constant stream of sysadmin time into it.

And then you see them doubting people who claim that a person can administer hundreds of servers, or that an admin can go away for a while and nothing will happen.

(Also, exactly what do you look for in SecOps? Most people that use that term are selecting for exactly the wrong things.)

jpgvm · on April 22, 2015

I guess I come from a different background, work for different companies and have dealt with different problems.

I am aware there is plenty of room for the traditional sysadmin in enterprise still but I am not quite sure how long those days will be around.

Google and Facebook were the first to discover the "Site Reliability Engineer" but they were far from the last.

SRE is becoming the new sysadmin and you can bet your life that they expect the skills I outlined.

lamontcg · on April 22, 2015

I was doing "SRE" level system engineering at Amazon back in 2001, long before Facebook ever existed.

Also when I grew up as a junior system engineer in the mid-90s I looked up to "system administrators" like Wietse Venema who coded in C and wrote tcp_wrappers and postfix, and Larry Wall who wrote his own programming language.

The System Administrator / Software Developer is not a new invention of Facebook/Google. What is "new" is the "System Administrator" that started getting hired in the late 90s who couldn't do anything other than install programs and configure them and maybe knew how to 'ps' and 'kill' and that's about it. Even the old school Enterprise-class system admins knew more about debugging crash dumps and using their tools even if they did call up IBM when they needed a patch.

jpgvm · on April 22, 2015

Very true, I should have been more specific when I say "created". Popularised is probably a better term.

I worked with some old and crusty Perl and C hackers that called themselves sysadmins that definitely would have qualified as "SRE"s during the 90s, it's just they were taken for granted.

All the term SRE has done is create a role that is more appreciated and better compensated with a better understood set of responsibilities and required skills.

mauricemir · on April 22, 2015

At British Telecom ALL of the developers in some engineering centers were expected to be able to do basic admin on their Sun workstations.

fsniper · on April 22, 2015

I'm sorry for you, my fellow sysadmin who has more years on his/her shoulders than me. I'm sorry because you lack the ultimate knowledge needed to be a good system administrator. You did not ever get hold of it.

The sysadmin practice is not only installing software and OS, doing basic troubleshooting. It's more than that. It's knowing systems inside out, from kernel internals to application internals. It's knowing and handling more areas than anyone else in the field.

If you do not know about these, you do not automate, you do not code, you do not debug, you become an IT helpdesk employee.

ocdtrekkie · on April 22, 2015

Absolutely agree. Sysadmins' interface with code is basically to be a compliance-checker. Software, be it from in-house developers or a third-party solution, needs to meet that compliance bar, or it doesn't go to production. If your in-house developers are unable to meet those compliance requirements, that's a them issue and not a you issue.

It's good for devs to know some ops, and ops to know some dev from a familiarity standpoint, particularly because mutual understanding fosters a good working relationship, but they're two very different roles.

lamontcg · on April 22, 2015

Continuous Delivery is going to eliminate that from the equation. The software developer will write software and tests, and if they pass the tests throughout the pipeline then it will ship to production automatically and potentially multiple times per day. That is much faster than a heavy ITIL-based change control process, which requires you to be inserted in the middle of it, will ever manage.

The companies that do this will have a velocity that outstrips the companies that do what you describe and will naturally have an advantage in the marketplace.

There will still be room for security compliance and someone in the company to be responsible for that, but they'll be writing tests that check software for compliance automatically, they won't be some drone in an office looking at a form and stamping their approval on a change request.

Your industry is changing. You may retire before it really hits you in the ass and kicks you out, but your advice for people getting into the industry is terrible.
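The "tests that check software for compliance automatically" idea above could look something like this in a pipeline. It is a sketch under stated assumptions: the two required settings are an example policy borrowed from the hardening discussion elsewhere in the thread, the function names are hypothetical, and the parser ignores sshd_config features such as Match blocks and Include files.

```python
# Example policy only: keyword (lowercased) -> required value.
REQUIRED_SSHD_SETTINGS = {
    "permitrootlogin": "no",
    "passwordauthentication": "no",
}


def parse_sshd_config(text):
    """Return {lowercased keyword: value}, keeping the first occurrence.

    sshd uses the first value it sees for a keyword, and keywords are
    case-insensitive, so both behaviours are mirrored here.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        settings.setdefault(parts[0].lower(), value)
    return settings


def compliance_failures(text, required=REQUIRED_SSHD_SETTINGS):
    """Return the sorted keywords that are missing or set to the wrong value."""
    settings = parse_sshd_config(text)
    return sorted(
        keyword for keyword, wanted in required.items()
        if settings.get(keyword, "").lower() != wanted
    )
```

A pipeline stage would fail the build whenever compliance_failures() returns a non-empty list. Treating a missing directive as a failure is a deliberate choice here: the policy has to be stated explicitly rather than inherited from defaults.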

ocdtrekkie · on April 22, 2015

The number of examples I have in my daily job demonstrates that no, the day when developers won't have any need for IT staff is never going to come. And while I make a point to have a relatively varied skill set, the truth is, trying to be both a developer and an admin is going to just mean you suck at both jobs.

RRRA · on April 23, 2015

Agreed, even finding a sysadmin-only that knows something is hard. When I talk to devs who have no idea about systems, security, networking, etc.: forget about them trying to wear 2 hats...

exelius · on April 22, 2015

Agreed, but for different reasons.

You still need a sysadmin to set up your CI systems in a continuous delivery world. You still need people to debug performance issues -- most developers don't know enough about system I/O to do that effectively.

You're right in that there's no longer some heavy-handed change control process, but wrong that sysadmins will go away. We just call them DevOps instead, but it's the same skill set, just embedded inside a dev team.

lamontcg · on April 22, 2015

In order to be able to usefully debug perf issues you need to have a fairly deep understanding of systems architecture though, or you're no better off than the software devs that you're slamming. There are a lot of "I don't need to learn programming" system admins whose knowledge of how to debug performance issues I could teach to a decent software developer in an afternoon. There are even more SAs who cargo cult completely incorrect assumptions and tend to throw money semi-randomly at problems because the last time they had a wonky server they upgraded the RAM, so this time if they spend the money on more RAM that'll fix it too.

And yes, I've had to painfully explain why Linux servers report nearly all their memory as being fully used, and what the VM's notion of free memory was, to a Principal Architect at a major internet firm with a string of PhDs who had built the software architecture of the company from its founding. This person controlled a huge chunk of the technical direction of the company and didn't know the first thing about the VM or how to tell if a server is really out of RAM or not. There's still going to be a skillset which is closer to the hardware, but people need to really have that skillset. Most people who call themselves SAs do not actually have that skillset, they're just semi-technical people who wield root power and throw their weight around the company.
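
A minimal sketch of that "is the server really out of RAM" check, in Python. It assumes a Linux kernel recent enough (3.14 or later) to report `MemAvailable` in /proc/meminfo; the sample numbers and the function names are invented for illustration.

```python
# Sample /proc/meminfo excerpt; the numbers are invented for illustration.
SAMPLE_MEMINFO = """\
MemTotal:       16384000 kB
MemFree:          512000 kB
MemAvailable:   12288000 kB
Buffers:          256000 kB
Cached:         11000000 kB
"""


def parse_meminfo(text):
    """Turn /proc/meminfo text into a dict of {field: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def used_fractions(fields):
    """Return (naive, realistic) fractions of memory in use.

    The naive figure treats everything that is not MemFree as used, so
    page cache counts against you. The realistic figure uses
    MemAvailable, the kernel's estimate of memory that applications
    could still get without swapping.
    """
    total = fields["MemTotal"]
    naive = 1 - fields["MemFree"] / total
    realistic = 1 - fields["MemAvailable"] / total
    return naive, realistic


if __name__ == "__main__":
    naive, realistic = used_fractions(parse_meminfo(SAMPLE_MEMINFO))
    print(f"naive: {naive:.0%} used, realistic: {realistic:.0%} used")
```

With the sample above the naive figure is about 97% while the realistic one is 25%, which is the gap being described. On a real host you would feed it the contents of /proc/meminfo instead of the sample string.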

exelius · on April 22, 2015

Yeah, a system administrator who doesn't know how to program is just as useless. Why avoid a supremely useful tool because you have some personal aversion to it? You describe people who really don't have a technical mindset to problem solving and were never taught that you need data to back up your hunches. Unfortunately, a lot of those people tend to get stuck in the "system administrator" role because many sysadmin tasks can be easy and repetitive, and putting them on those tasks keeps them away from anything they could break.

Architecture is a "10,000 foot view" type job. I do a good bit of architecture work, and I don't give a flying fuck about how much RAM is in a server or how many CPUs it has. I care about the function the server performs, whether that performance is adequate relative to current demand, and whether the architecture can scale to handle 10-100x that demand. When I was a sysadmin, I used to care about RAM/CPU tuning, but it's not relevant to my current work so I have forgotten a lot of technical details. I do know that if I think a system isn't adequately tuned to its use case, I can go talk to my performance testing team and they will investigate, generate a hypothesis and test that hypothesis.

exelius · on April 22, 2015

The entire problem here is the division of "us" and "them". From the customer's perspective, you are both "them". If the product doesn't work, you're both to blame no matter whose fault it was.

Work with your developers to make the application more supportable. Show them the problems you are having and ask for their help in fixing them. Too many sysadmins just throw it over the fence and say "not my problem, you fix it" and that's honestly not acceptable.

This is the entire reason "DevOps" is a thing. They are not fundamentally different roles; you're both involved in building a system that does things for customers. You bring different skills to the table, but that's often the case with every developer: you probably have a dev who is a whiz with databases, another who knows some other library really well... a sysadmin skill set is no different.

ocdtrekkie · on April 22, 2015

"Us and Them" makes it sound adversarial, but it's not meant to be. It's about who is best equipped to fix the problem. More often than not, a problem with an in-house developed piece of code will be best fixed by the person who wrote that code.

falcolas · on April 22, 2015

To use permissions for security, follow the principle of least access. If a user or program doesn't need access to something, don't give it to them. User permissions are the first tier of this; AppArmor and SELinux are the next (and correspondingly more complex) tier.

For example, for a web stack which communicates exclusively over the network stack, run each part of your stack as its own non-root user. Nginx can run as the nginx user, django/rails/node as a separate user which has no access to the nginx configs, and MySQL/PostgreSQL/Mongo as yet another user, which can't access the previous two.

By default, config files should not be owned by the user that the process reading them runs as; that user should instead get read access through the file's group. If the process owned the files, it could rewrite them.
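
A rough sketch of that ownership rule as a check, in Python. It models only the classic owner/group/other bits; the helper names and the numeric IDs (root as 0, an nginx user and group as 101) are assumptions for illustration, not anything a real system guarantees.

```python
import stat


def can_read(mode, file_uid, file_gid, proc_uid, proc_gids):
    """Classic Unix owner/group/other read check (no root, ACLs or capabilities)."""
    if proc_uid == file_uid:
        return bool(mode & stat.S_IRUSR)
    if file_gid in proc_gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


def can_write(mode, file_uid, file_gid, proc_uid, proc_gids):
    """Same check for the write bit.

    Simplified: a writable parent directory would also let a process
    replace the file, which this does not look at.
    """
    if proc_uid == file_uid:
        return bool(mode & stat.S_IWUSR)
    if file_gid in proc_gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


def config_is_safe(mode, file_uid, file_gid, proc_uid, proc_gids):
    """True if the service user can read the config but not rewrite it.

    An owner can always chmod its own file, so ownership by the service
    user counts as unsafe even when the write bit is currently cleared.
    """
    args = (mode, file_uid, file_gid, proc_uid, proc_gids)
    return proc_uid != file_uid and can_read(*args) and not can_write(*args)


# Hypothetical IDs: root is 0, the nginx service user and group are 101.
ROOT, NGINX = 0, 101

if __name__ == "__main__":
    # root:nginx with mode 0640: readable via the group, not writable.
    print(config_is_safe(0o640, ROOT, NGINX, NGINX, {NGINX}))
    # nginx:nginx with mode 0644: the service could rewrite its own config.
    print(config_is_safe(0o644, NGINX, NGINX, NGINX, {NGINX}))
```

For a real file, `os.stat(path)` supplies `st_mode`, `st_uid` and `st_gid` for the first three arguments.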

For logging, you have a few options. Syslog is simple and re-directable, which can help increase security. Writing to normal log files is perfectly acceptable as well, just be sure to write to a program-specific directory and set up a logrotate config.

Also, set up monitoring and notification on everything you possibly can. First, know when something has gone wrong before you're notified by your customers. Then figure out how to know something will go wrong before it actually goes wrong (like running out of disk space - this happens more often than you might imagine).

There are plenty of articles on hardening Linux and on sysadmin best practices on the web - I've only outlined a few here. When acting in the sysadmin role, be skeptical, be paranoid, and value service uptime above everything. With this attitude, you will be more inclined to automate server setup and limit user access, and less inclined to just throw things out there because they're shiny.

The real fun is when you design and develop a way to safely and securely let your fellow developers just throw code over the fence, because then they can do their job in a way that suits them, without compromising your servers.

dozzie · on April 22, 2015

> Also, set up monitoring and notification on everything you possibly can.

Notifying about everything is a very, very bad idea. You'll drown under notifications that do not matter and you will certainly miss those that do matter. There should be as few notifications as possible.

falcolas · on April 22, 2015

I used to believe this. I was aggravated by the pager, and would happily acknowledge problems in Nagios just to make the damned thing shut up. More and more pages would come in, the volume increasing each day as "non-issues" piled up.

The third time this behavior caused downtime for my clients, I wised up and took my own advice of "uptime is king". I took the "every alert is actionable" to heart, and took a few moments to realize that the action can also be against the monitoring.

Alert for disk at 80% on a 2TB volume? Action: Verify growth in Graphite isn't out of the ordinary, and increase the warning threshold to give you about 3-4 weeks of notice that you might need to get bigger disks.
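
The arithmetic behind that threshold change can be sketched like this in Python. The growth rate of 5 GB/day is an assumed number standing in for whatever your graphs actually show, and the function names are made up.

```python
def warning_threshold(capacity_gb, growth_gb_per_day, notice_days):
    """Fraction full at which to warn in order to get `notice_days` of lead time."""
    headroom_gb = growth_gb_per_day * notice_days
    return max(0.0, 1 - headroom_gb / capacity_gb)


def days_until_full(capacity_gb, used_gb, growth_gb_per_day):
    """Days left at the current growth rate; None if the volume is not growing."""
    if growth_gb_per_day <= 0:
        return None
    return max(0.0, (capacity_gb - used_gb) / growth_gb_per_day)


if __name__ == "__main__":
    # 2 TB volume, assumed growth of 5 GB/day, 4 weeks of notice wanted.
    print(warning_threshold(2000, 5, 28))
    # How much notice the old 80% threshold was giving at that growth rate.
    print(days_until_full(2000, 1600, 5))
```

Under those assumptions an 80% warning fires 80 days before the disk fills, far earlier than needed, while a threshold of roughly 93% gives the four weeks of notice.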

Alert for an excessive number of 404 responses? Browse the nginx logs and identify someone trying to hack your corporate-mandated WordPress install. Verify they aren't gaining any traction, and add exceptions to your 404 alert so you don't alert on the known (and non-whitelisted) endpoints they're hitting.
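
One way to picture those 404 exceptions, as a small Python sketch. It assumes the common "combined" access log layout; the ignored path prefixes and the sample log lines are hypothetical.

```python
import re
from collections import Counter

# Pulls the request path and status out of a combined-format log line.
REQUEST_RE = re.compile(r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')

# Hypothetical probe targets you have already checked and decided to ignore.
IGNORED_PREFIXES = ("/phpmyadmin", "/cgi-bin/")

SAMPLE_LOG = [
    '203.0.113.9 - - [22/Apr/2015:03:14:07 +0000] "GET /phpmyadmin/index.php HTTP/1.1" 404 162 "-" "curl/7.38.0"',
    '203.0.113.9 - - [22/Apr/2015:03:14:08 +0000] "POST /cgi-bin/test.cgi HTTP/1.1" 404 162 "-" "curl/7.38.0"',
    '198.51.100.4 - - [22/Apr/2015:03:15:00 +0000] "GET /pricing HTTP/1.1" 404 162 "-" "Mozilla/5.0"',
    '198.51.100.4 - - [22/Apr/2015:03:15:02 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0"',
]


def count_unexpected_404s(lines, ignored_prefixes=IGNORED_PREFIXES):
    """Count 404s per path, skipping paths that start with an ignored prefix."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match or match.group("status") != "404":
            continue
        path = match.group("path")
        if path.startswith(ignored_prefixes):
            continue
        counts[path] += 1
    return counts


if __name__ == "__main__":
    print(count_unexpected_404s(SAMPLE_LOG))
```

Only the /pricing miss survives the filter here, which is the kind of 404 that may point at a real broken link rather than a scanner.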

Alert for memory at 80%? What's consuming it; do I have a memory leak and need to restart something? If all is well, and MySQL is just being greedy, up the alert to 90%.

Disk capacity warning at 3am? Put some hours around the warning notification, and add a separate "things are growing out of control" alert which doesn't have hours.
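
A sketch of that split in Python: warnings are held to working hours while the "out of control" alert pages at any time. The severity names and the 09:00 to 18:00 window are assumptions.

```python
from datetime import time

# Assumed working hours during which non-urgent warnings may notify.
WORK_START = time(9, 0)
WORK_END = time(18, 0)


def should_notify(severity, at):
    """Decide whether an alert of this severity notifies at clock time `at`."""
    if severity == "critical":
        # "Things are growing out of control": no hours attached.
        return True
    if severity == "warning":
        return WORK_START <= at < WORK_END
    return False


if __name__ == "__main__":
    print(should_notify("warning", time(3, 0)))
    print(should_notify("critical", time(3, 0)))
```

The 3am capacity warning stays quiet until morning, and the critical variant still wakes someone up.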

API endpoint is not responding again, but it's not the system at fault? Add the API developer to the notifications for that alert and remove yourself for a week or two.

After an admittedly harrying week of this, you gain two things. One: an operational understanding of what is going on in your system. Two: an alerting system which is tuned to your use case, and which lets you know when you have real problems. Remember - uptime is king.

bbrazil · on April 22, 2015

I'd recommend My Philosophy on Alerting by Rob Ewaschuk - https://docs.google.com/document/d/199PqyG3UsyXlwieHaqbGiWVa... as a good approach to take.

We've fleshed this out a bit more in the Prometheus best practices at http://prometheus.io/docs/practices/alerting/

Taking this approach at our company greatly reduced the alert count and improved responsiveness with no degradation in service.

smt88 · on April 22, 2015

A good rule of thumb is that you should monitor everything you can (tempered with an understanding of performance requirements) and notify for anything that requires action, even if it's not an emergency. It's easy to categorize your notifications after you've received them.

jaimebuelta · on April 22, 2015

I am not a sysadmin, so there is probably better specific advice.

But, from the point of view of a developer, the thing that I appreciate the most in a fellow sysadmin is being calm and methodical at all times. Organised. Having a plan and knowing what to do. Being ready for disasters.

Do we need to update a security patch on every single server? Ok, list of servers, start with server one, finish with the last one, don't let any one fall behind.

A server suddenly catches fire? No problem, remove it from the load balancer, get a fire extinguisher, remove it, order another one, recover from backup.

Is there a problem on production? What could be wrong? Check logs, think a little, then try to fix it. Do a postmortem and come up with improvements. Try not to be bitten by the same thing again.

In my opinion, the core of being a good sysadmin is minimising risks and errors that threaten the stability of the system. Mistakes can (and will) happen, but the aim is to make them only once. I think a big component of that is learning from battle stories.

Processes and servers can fail, but the whole system should hold up.

nostalgiac · on April 23, 2015

Learning to be calm and methodical as a system administrator is something that comes with experience. It's a great way to tell experienced and inexperienced sysadmins apart.

As you said, if a server catches fire, the newbie sysadmin will freak out and start running around in circles screaming. The experienced sysadmin has seen it 10x before and knows it's not a big issue - removes, extinguishes and replaces it.

joatca · on April 22, 2015

I was once asked at an interview to give my three most important rules of sysadmin. This is what I said:

1. Never enter a command if you don't understand what it does.

2. Never make a change on a production system if you don't know how to undo it.

3. Get everything else wrong if you have to, but get the backups right.

I realize this paints with a broad brush but it's a good baseline.
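
Part of getting backups right is proving that a restore gives back what went in. A minimal Python sketch of that check, comparing a restored tree to the original by SHA-256; the function names are made up, and it assumes the original files have not changed since the backup was taken.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original_dir, restored_dir):
    """Return relative paths that are missing or different in the restored tree."""
    original_dir = Path(original_dir)
    restored_dir = Path(restored_dir)
    problems = []
    for src in sorted(original_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original_dir)
        dst = restored_dir / rel
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            problems.append(rel.as_posix())
    return problems
```

An empty list from `verify_restore` means every file in the original tree came back byte for byte; anything else names the files to investigate.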

bluedino · on April 22, 2015

Automation, documentation, and uninstallation.

A good sysadmin automates everything he can. It saves time, makes everything uniform, and reduces errors (it can multiply errors but at least they're all the same error).

A good sysadmin documents everything he does. Code, configurations, everything. You want it to be easy for not just yourself to figure out what you did, but your colleagues, customers, or your replacement.

A good sysadmin uninstalls programs that are no longer needed. He doesn't leave 50 old or unused versions of scripts lying around. Not just to save disk space or reduce system resources, but for security and to avoid confusion.

cyberrodent · on April 22, 2015

http://www.opsschool.org/en/latest/

dccoolgai · on April 22, 2015

Wow, I didn't know about this. Great resource, thanks for providing.

weaksauce · on April 22, 2015

Decent outline but mainly empty in the sections I looked at near the end.

aesthetics1 · on April 22, 2015

"MS Windows fundamentals 101" is blank. Is this a sign?!

putna · on April 22, 2015

Good resource!

Thriptic · on April 22, 2015

THANK YOU!

daxfohl · on April 22, 2015

Not on Hacker News! Most of the people here (including myself) are developer hacks that think they can do sysadmin, which is probably worse than not knowing it at all.

nstart · on April 22, 2015

I dunno. The replies I've got have so far pointed me in a fairly consistent direction that can't possibly leave me worse than where I am right now. What I do from there will be up to me (and anyone else who benefits from this thread) I guess.

mvanvoorden · on April 22, 2015

F*cking up a lot is a good start ;)

When you don't understand things like user permissions, the best approach is to make up some scenarios and write them down on paper. It makes these things easier to grasp when you can strike through impossible options or, for instance, visualize routes or outcomes.

Visualizing on paper also works very well for debugging networking. For instance, when a packet coming from the internet arrives at your gateway, how does it travel from there to its final destination and how does it travel back? Write down at every hop the port/subnet/netmask/gateway for both the incoming and (if applicable) the outgoing interface. Works great for finding connection issues.
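
The per-hop question (is the destination on this interface's subnet, or does the packet go to the gateway?) can be checked with Python's standard `ipaddress` module. A small sketch; the addresses are examples and `next_hop` is a made-up helper that ignores any routes beyond a single default gateway.

```python
import ipaddress


def next_hop(interface_cidr, default_gateway, destination):
    """Return the address a host would send the packet to next.

    interface_cidr is the interface address with its prefix length,
    for example "192.168.1.10/24".
    """
    iface = ipaddress.ip_interface(interface_cidr)
    dest = ipaddress.ip_address(destination)
    if dest in iface.network:
        # Same subnet: delivered directly on the local segment.
        return dest
    # Different subnet: handed to the default gateway.
    return ipaddress.ip_address(default_gateway)


if __name__ == "__main__":
    iface = ipaddress.ip_interface("192.168.1.10/24")
    print(iface.network, iface.netmask)
    print(next_hop("192.168.1.10/24", "192.168.1.1", "192.168.1.77"))
    print(next_hop("192.168.1.10/24", "192.168.1.1", "198.51.100.7"))
```

Running the same check for each hop, with that hop's own interface and gateway, gives the list you would otherwise write out on paper.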

Document your infrastructure. Documenting in detail can show design flaws that weren't visible before. It also helps a lot to gain more insight in how everything is connected and how it works together.

SpaceInvader · on April 22, 2015

One resource I always recommend is the FreeBSD handbook, can be found here: https://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/

nstart · on April 22, 2015

Thanks. Saving it to the reading list.

snorkel · on April 22, 2015

Just because Hadoop's install process sucks that doesn't make you a bad sysadmin. Back in the day when we were building our kernels most of us weren't verifying download package sigs either. Containers aren't making us dumber either.

A good sys admin just knows all of the working parts of a server and knows about the latest tools. The only way to get better is hands-on practice. Try setting up your own Hadoop cluster from source and run a job through it, then put Hive on it, then attempt to containerize a node ... which might not even work, but learning why it won't work becomes useful knowledge. Just stumble through it and learn.

mtalantikite · on April 22, 2015

Sysadmin is (unfortunately) a role that is on the decline, so from a pure employability perspective I'd suggest you focus more on the dev side.

As for the skills, I'd suggest running a Linux distro as your personal, everyday machine, not just a server you log into on AWS or DO every so often to configure (which, also -- don't do that. You don't want snowflakes in your environment). It'll force you to learn a lot about how the system actually works.

Try out a distro that doesn't hold your hand so much -- for me it was Gentoo in the early 2000s and Slackware before that. Always read the man pages. Learn all the tools for performance profiling and get used to reading your logs. Spend a lot of time learning how networking works -- maybe start with really understanding iptables which will lead you into lots of other parts of the networking stack. Read "The Design and Implementation of the FreeBSD OS" if you're interested in Unix beyond Linux.

Ultimately it's a skill that you need to learn by doing. Just don't stop the dev side of your life, because as things continue to be automated and abstracted away there are going to be fewer and fewer positions for generic sysadmins.

davidgerard · on April 27, 2015

The job title might change (and has changed).

I flatly do not believe the job will disappear until humans stop building technologies, and fucking them up. So, not any time soon.

ocdtrekkie · on April 22, 2015

Why would you say that the need for Sysadmins is on the decline? If anything, I should think it would be greater than ever. I'm very curious here.

mtalantikite · on April 22, 2015

It's not that sysadmins are going to disappear, it's just that with IaaS and the automation tooling that's been developed in the past decade teams don't need to be nearly as large. The role has also changed.

A few people can manage a deployment of a thousand server instances now fairly easily (I've been on teams like that). A decade ago you'd be renting colo space, racking/stacking yourself, managing your networks, swapping dead hardware, and managing all the software that goes on top (I've also been on a team like that). You'd need a large team dedicated to just ops and sysadmining.

Hiring today is different. A sysadmin didn't necessarily need to know how to code beyond some scripting with bash or perl. These days in order to manage the complexity of large cloud systems you probably should be a solid developer in addition to having a deep knowledge of systems. Or if you're a small startup you'll probably have your devs work additionally on your infrastructure or use a PaaS.

davidgerard · on April 27, 2015

> It's not that sysadmins are going to disappear, it's just that with IaaS and the automation tooling that's been developed in the past decade teams don't need to be nearly as large. The role has also changed.

The sysadmin's role is to automate themselves out of a job: you should not need to do anything twice.

For some reason, the job never disappears and new stuff keeps coming along.

(I do know one guy who successfully automated most of his job. He got bored and got a new job.)

neonfreon · on April 22, 2015

According to the US Bureau of Labor Statistics in 2012, the number of Sys and Network Admin jobs will increase by 12% between 2012 and 2022. (http://l.md/gr4)

That's a few years out of date now though. I wonder if there are newer statistics somewhere.

emodendroket · on April 22, 2015

I'm guessing because stuff like PaaS and contracted-out IT services make it easier for smaller shops to do without.

ephemer1c · on April 23, 2015

I've been doing Linux for 16 years, no big deal.

Most valuable advice I can offer is: learn an editor and use it for everything.

OpenSSH is boss for all networking issues.

Netfilter (iptables) once understood is used daily.

And something I call "The 2s Complement": know two of everything, two distros (RPM and .deb), two shells (bash and zsh), two MTAs, two... you get the picture.

Then you will be able to perform in every environment.

Use zsh with GRML completion, saves so much time.

For hacking, read all Phrack issues.

Know shell scripting and at least one other interpreted language like Python, Perl, PHP, Ruby.

Don't waste time on desktops, changing looks etc. I wasted so much time early on customising everything, great fun but no ROI whatsoever.

But the most important thing is... how to think! That, my friend, takes a long time, and only you can construct your own effective thought processes and algorithms.

Build terse mnemonics to aid in command options.

Oh yes, keep everything you code, script etc. for future reference and learning and improving.

Learn to code and then learn to think in code when sysadmining systems. Infrastructure as code.

Another thing, build scripts, configurations, solutions etc. as objects and reuse those objects, eg. a rsyslog configuration stanza for a UDP input that is portable and can be reused.

Sorry for waffling on I can think of so much more...

Start a blog and copy/paste your ideas and interesting work configs, code, scripts etc. there.

pjungwir · on April 22, 2015

If you want to learn to "think like a sysadmin", the book Time Management for System Administrators by Limoncelli is short, funny, and excellent---and covers far more than just time management. That will give you some sense of what to care about and why. It's very useful even as a dev or devops person.

If you are really serious about being a professional sysadmin, Limoncelli's other book is considered outstanding---but it's huge and I haven't read it yet. Again, it's a mix of non-technical goals and technical solutions. The Nemeth book is how I started (though I'm not a sysadmin) and it's also quite good, but more purely technical.

I wouldn't say to a beginner, "Know what every command does," but try to learn a new thing every day. Keep notes. I like to write private man pages (https://github.com/pjungwir/manpj), but do what works for you.

I think learning sysadmin skills mostly just takes time, so try to make each task double as a learning opportunity. Sometimes this requires a lot of digging, or reading, or "debugging". Here I probably agree with @protomyth: the holy grail is understanding. If you strive to achieve that (and don't just go with the first thing that appears to work) you will become better and better.

While you're learning the nitty-gritty tech stuff, also try to keep in mind the different priorities of a sysadmin over a developer. Both care about reducing their own effort and annoyances. A sysadmin wants stability (no 3 a.m. downtime), automation (easy to deploy/update/scale), transparency (monitoring, logging), auditability (logging), recoverability (failover, backups), controllability (runbooks), security. Maybe a real sysadmin could chime in with what I'm leaving out. :-)

davidgerard · on April 27, 2015

> Time Management for System Administrators by Limoncelli

YES. Best comment here.

SimpleUser245 · on April 22, 2015

All of the mentioned points are very good (and applicable) to performing the job properly. But for me it is a matter of curiosity. If I need a function to work, I want to know how it works. If I need an application installed, I want to know its underlying pieces, how it communicates, what dependencies it has, etc. At some point you have to draw the line of course, as it is impossible to know everything, but that line is generally easy to find. For example, I could put you to sleep talking on how TLS handshakes work, but I have no interest or desire in learning the math behind any used algorithms (although I do keep abreast of security issues with ciphers, etc.). Learning new options (say the proliferation of systemd in linux land) is always good, even if you ultimately decide not to implement it at this point in time. So just keep learning, always find new ways to do things, audit your own systems, etc.

dozzie · on April 22, 2015

How to become a good sysadmin? Understand how your system works and do everything to not interfere with it. This simple principle is the most important for a sysadmin, all the rest is an implication of it.

How to use logging (shipping, processing), how to install software (packaging systems), how one can separate services in different ways, how to communicate with systems, how to work with resources, how to tell what's going on in the OS (what's running, what uses what, what subsystems are under how much load) are all derivatives of "know your system and work with it instead of around/against it".

mauvm · on April 22, 2015

The opposite approach might also teach you a lot: try to hack your own server (even better with the help of someone who knows a decent amount of hacking). This won't ensure you'll fully secure your server, since you can't easily reach the level of "real" hackers, but it will give you a decent understanding of how a hacker might approach attacking your system.

I myself am also a "pretend to be" sysadmin. Not hardcore in any way. However I use Docker for encapsulating almost all the software components, basic security (no-password ssh logins etc.), and most importantly: logging.

Arubis · on April 22, 2015

There's a lot of good advice already in this thread, and I'd rather not just repeat it all in new phrasing. So, read it and weigh it and take what you like.

And then, when you're done, fire up a console window and unplug your mouse and put it somewhere that's really annoying to get to. Live with this for a week.

This will force you to live your life in a terminal, which means that all those little tasks become commands and scripts and configuration files. You will not have a GUI and you will have to understand how to make stuff work anyway.

Trust me, you will learn fast once you don't have a choice about it :)

rotten · on April 23, 2015

You have to be organized and thorough. You are going to do a lot of stuff no one appreciates or cares about unless you don't do it. (such as backups)

Organized and thorough and don't forget backups. You can never have too many backups.

cd /bin and cd /sbin and cd /usr/sbin. Do you know what every command in those directories does? No? Learn them. On some systems /sbin and /usr/sbin have different files in them. Why? Try to figure that out on your own.

cd $MANPATH, do you see more man pages that you didn't know existed? Read them.

Signs of a poor administrator: System clocks are all different; not sure if the backups work; don't know who every one who has an account on the system is; don't know what the system does; system way out of date on patches; large garbage files floating around; inconsistent and incomplete monitoring; don't know where the server is; etc...
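
For the "who has an account on the system" item, a starting point is listing local accounts that can actually log in. A Python sketch that reads passwd-format text; the set of non-login shells and the sample entries are assumptions, and it says nothing about LDAP or other directory accounts, or about locked passwords.

```python
# Shells commonly used to mark an account as non-interactive (an assumed list).
NON_LOGIN_SHELLS = {"/usr/sbin/nologin", "/sbin/nologin", "/bin/false"}

# Invented sample in /etc/passwd format: name:pw:uid:gid:gecos:home:shell
SAMPLE_PASSWD = """\
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
olduser:x:1001:1001::/home/olduser:/bin/zsh
"""


def login_accounts(passwd_text):
    """Return (name, uid) for every entry whose shell allows interactive login."""
    accounts = []
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7:
            continue  # skip malformed entries rather than guess
        name, _pw, uid, _gid, _gecos, _home, shell = fields
        if shell not in NON_LOGIN_SHELLS:
            accounts.append((name, int(uid)))
    return accounts


if __name__ == "__main__":
    for name, uid in login_accounts(SAMPLE_PASSWD):
        print(uid, name)
```

If a name on that list makes you ask "who is that?", that is the account to chase down.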

You don't have to be a top-notch system ENGINEER to be a good system ADMINISTRATOR. It helps (a lot), but I think attention to detail, thoroughness, and organization are core skills. Also, you are the interface between the machine and the human world. You. Be prepared to deal with people. Developers can get away with living with their heads buried in code. A System Administrator cannot. 90% of the problems a System Administrator faces in their job are not the technology, but the people trying to use it and manage it and mess with it and own it.

davidgerard · on April 27, 2015

A lot of the comments here are about technical detail. But knowing the technical details is just a prerequisite.

Half the job is having a sense for technologies.

* You will have projects dropped on you that you literally never heard of 24 hours before, and you will be expected to be able to support them. ... and, with some experience, you will.

* People will come to you with great new technologies! ... that you'll get a sense of disaster about. You will need to articulate your concerns in a manner that doesn't piss off everyone invested in the bad idea.

* Be humble, don't turn into an expert beginner. (Look that up.)

* Find other sysadmins to drink and bitch about work with. Your geekosphere will keep you balanced, help your career and be a vital source of info.

Politics is the other half of the job, even though you're expected to be a consummate techie. Treat every email as a press communication, it'll be picked over like one.

* This is not a technology festival, it's an organisation that does something. Always look from that angle.

* The BOFH stories are stories. Nobody, anywhere, actually wants to deal with the grumpy BOFH, in any circumstance.

I have been a sysadmin for 15 years. I fully expect to be gainfully employed until age 100 if I want to, because even in the future, nothing works.

kokey · on April 22, 2015

Experienced sysadmins know that a successful application is very likely to be in place for well over 5 years. They know this from having been the person who has had to take over such systems from others, several times.

They will know that any proprietary code will have to be supportable even after the developers are gone. The systems themselves will have to be supportable by new sysadmins. The system will have to be able to receive security patches and fixes. They will know all this will be needed for multiple systems in the same company. They will know that many promising technologies will come and go. They will know how much it sucks taking over systems where these things weren't thought through. They will remember systems they have left for others which did not have these things considered. They will remember abandoning systems themselves that became unmanageable because these things weren't considered.

Inexperienced developers and system administrators treat servers like new desktops, with desktop focussed operating systems, new software being added to it all the time, changes being made in an increasingly unrepeatable manner, and the whole thing being replaced every 2 years.

Naery · on April 24, 2015

I'm a Windows Server Administrator, have been for a very long time. And, if you'll permit it, I'll say I'm a damn good one. I think there are two things that have gotten me to where I am today: 1) Incessantly asking why, and 2) voracious reading and research in my lab.

I acquired some servers a while back, then got some switches, then a router, all enterprise-grade gear, and I set up some virtual servers, vswitches, etc... Basically, I made a ridiculously complicated home lab. To do so, I followed tutorials online, but at every step of the way, I asked "why is this necessary, what does this do, why do I do it this way". That gave me a great foundational understanding.

Then, I realized that certifications aren't restricted to a certain group of people, anyone can take them, so I started studying for certifications. The idea was, these large governing bodies think these are the kinds of things that experts should know, so if I learn these things, I should be an expert. Right? Not quite, but reading up on each of the features, etc, that are covered in the certification exams really expanded my horizons. Then of course, I started asking Why would I use that and things just snowballed. Hear about something, ask why it is what it is and why people need/want it, then read up on it.

My home lab has been one of my greatest learning tools. Everything I've read about I have set up, installed, configured in my home lab. Then, I usually ask my brother (who also does IT) to come break something in the lab. He of course doesn't tell me what it is, and I have to go figure out what my "junior admin" did wrong and repair it.

So, I guess the TL;DR version is this: You need three things: 1) A home lab. 2) The question Why. 3) Tons and tons of reading.

rlonstein · on April 22, 2015

There's a lot of good information in this thread, but I can add this: join a SA organization and/or special interest group:

   Usenix, https://www.usenix.org/

   LOPSA, https://www.lopsa.org/

Then read the journals, even the articles that don't interest you now, and follow the email lists.

lwhalen · on April 23, 2015

I'd like to add that LOPSA also has a mentorship program, free for all members.

6t6t6 · on April 23, 2015

Maybe my advice is more abstract than what you are hoping for, but this is what I learnt after 15 years of being a sysadmin.

- The good sysadmin is not the one who does difficult things, but the one who makes things easy. If you find yourself wanting to recompile a Linux kernel, you are probably taking a really bad approach to the problem.

- Document, document, document. Think about the bus factor. If a junior sysadmin is able to rebuild and manage your infrastructure using your documentation, you are doing things the right way.

- Never take a step forward without a backup plan. If you are going to make changes on the production servers, always have a plan B in case something goes wrong. Doing an `apt-get upgrade` and hoping that everything will be OK is not a good policy.

- Always remember that your job is to serve the other departments, so they can do awesome things. If too many people in the company know your name, you are doing something wrong.
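
The "plan B" point from the list above can be sketched for the simplest case, changing one config file: copy it first, apply the change, and put the copy back if a check fails. This is a Python illustration only; `apply_with_rollback` and the `validate` callback are hypothetical names, and real changes usually need more than a file copy to undo.

```python
import shutil
from pathlib import Path


def apply_with_rollback(path, new_text, validate):
    """Replace a file's contents, restoring the backup if anything fails.

    validate is a callable taking the Path and returning True when the
    new file is acceptable. Returns the path of the backup copy.
    """
    path = Path(path)
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # plan B exists before anything changes
    try:
        path.write_text(new_text)
        if not validate(path):
            raise ValueError(f"validation failed for {path}")
    except Exception:
        shutil.copy2(backup, path)  # roll back to the known-good copy
        raise
    return backup
```

The backup file is left in place on success as well, so there is still something to fall back to if a problem only shows up later.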

mobiplayer · on April 22, 2015

Aside from the technical side, a good sysadmin knows that the business has priority over how cool this or that technology is.

So every time you fix something, you're not fixing a computer, you are fixing a piece or tool of a business process. Changing your mindset will help you prioritize the really important things.

mordae · on April 22, 2015

> ... being a good sysadmin ... means striving for simplicity, documenting and standardizing everything and being meticulous.

This. And you learn by not doing this and still having to maintain an infrastructure. After a while you will either start, hang yourself or get fired.

WestCoastJustin · on April 22, 2015

Here's my brain dump on what goes into being a well rounded Sysadmin:

https://sysadmincasts.com/episodes/25-bits-sysadmins-should-...

hobarrera · on April 23, 2015

A lot of comments here already address how to get into the learning side, and the theoretical side.

If you also want to get your hands dirty and learn some more that way (i.e. through experience), get a VPS with BSD (my preference is OpenBSD) or some really bare Linux distro and set up your own email server (e.g. OpenSMTPD), IM server, etc. there. You'll learn plenty on the way: about servers themselves, how email works, management, and so on.

I learnt huge amounts doing this years ago, and have an extremely detailed understanding of how email, anti-spam, validation, etc work.

Don't do this "instead of" grabbing books though, do it "on top of", especially if you want to make a career out of it in future.

smutticus · on April 22, 2015

Let's all take a moment to lament the sorry state of man pages in Linux.

antod · on April 23, 2015

Lots of good advice about concrete skills mentioned here already - eg scripting, monitoring, documenting, low level OS knowledge etc.

I reckon a good sysadmin also has a hint of nagging paranoia or slight sense of impending doom and is never complacent about how well things are currently running or how secure they seem.

On top of this they need a good instinct for anticipating problems and evaluating risks so they can proactively fix problems before they arise.

Oh yeah - learn how DNS really works.
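One way into the wire-level side of DNS is to build a query by hand. The following is a minimal sketch, assuming the standard RFC 1035 message layout; the function name `build_dns_query` and the query ID are made up for illustration, and it only constructs the packet rather than sending it.

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (only RD, "recursion desired", is set), QDCOUNT=1,
    # ANCOUNT=0, NSCOUNT=0, ARCOUNT=0. Six 16-bit big-endian fields.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)

    # QNAME: every label is prefixed with its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"

    # QTYPE=1 (A record), QCLASS=1 (IN, the Internet class).
    question = qname + struct.pack("!HH", 1, 1)
    return header + question


packet = build_dns_query("example.com")
print(packet.hex())
```

Sending these bytes over UDP to port 53 of a resolver and decoding the reply by hand is a reasonable next exercise; comparing your bytes against a packet capture of a real lookup helps too.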

elwin · on April 22, 2015

There's a lot of good advice already, but here's one tip:

Learn how to read documentation. Consult man pages and official documentation before resorting to random people on websites. This is a skill that requires practice, because a lot of the material is mediocre. Some writers give overviews, some give examples, some list every feature. You may be more comfortable with one kind, but learn how to digest each one and extract the knowledge you need.

Ologn · on April 22, 2015

The article mentions certain things, but every piece of outside software you run is a potential security problem. Every piece of inside software is a potential security problem, for that matter. The authors don't even have to be malicious, just careless. How many remote holes did Drupal, Joomla and WordPress have? (A lot.)

Yes, the article is right that you should not just grab some random compiled binary and throw it on your production server. It mentions Debian. I suspect Red Hat and Suse have better solutions, as their customers demand it. Of course they may not have an officially blessed package of something that was released last month.

How to be a good sysadmin? For big installations there are production servers, staging servers, development servers, and then often some unofficial development servers. You control access to the production servers, and the procedure for doing releases is formalized. You apply server firmware updates, OS updates and package security updates. Do it regularly on staging, QA it, then do it on production.

Most security break-ins I have seen happen because a non-sysadmin, non-security person is doing something they're not supposed to. They're running an unauthorized server on their desktop not set up by the sysadmins, with a glaring security hole. Or an outside consultant is careless about how they connect to your systems, and someone breaks in through their account.

Maybe you're a sysadmin at a web site and you notice scripts trying to hack web usernames and passwords. Your workload is high, and you bring this to the attention of the head developers and management. No one cares: the amount of business logic management wants implemented in the short term is very high, and there is no time or budget for security. So you can either end your normal work at 6:15 PM and stay another hour at work each day fixing the problem, or ignore it and go home like everyone else.

I knew some people who were on the early tiger teams for the big accounting firms. They told me their success rate was 100% - they managed to get into the company systems every time. They also mentioned they were at a disadvantage, as they had to remain within the law (beyond the blessing of management to probe security), while others doing so would not.

As far as logging goes - syslog calls from programs go to syslogd. The messages can be sent to various places, including files under /var/log. You can tune facilities and logging levels in the syslog configuration file. Under systemd it might be different. Do you understand what I said in this paragraph? Good, you now know more than 95% of the Unix sysadmins I've interviewed over the past 20 years. I wish I was kidding.
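To make "facilities and logging levels" a bit more concrete: every syslog message carries a priority number, which is the facility code times 8 plus the severity code. The sketch below uses the standard codes from the syslog RFCs (only a subset of facilities is listed); the helper names `encode_pri` and `decode_pri` are my own, not part of any library.

```python
# Facility codes (subset) and severity names as defined by the syslog RFCs.
FACILITIES = {
    "kern": 0, "user": 1, "mail": 2, "daemon": 3, "auth": 4,
    "syslog": 5, "lpr": 6, "cron": 9, "local0": 16, "local7": 23,
}
# The list index is the severity code: 0 is the most urgent.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def encode_pri(facility: str, severity: str) -> int:
    """Return the number that goes in the <PRI> prefix of a syslog message."""
    return FACILITIES[facility] * 8 + SEVERITIES.index(severity)


def decode_pri(pri: int) -> tuple[str, str]:
    """Split a priority number back into (facility, severity) names."""
    facility_code, severity_code = divmod(pri, 8)
    names = {code: name for name, code in FACILITIES.items()}
    return names[facility_code], SEVERITIES[severity_code]


print(encode_pri("auth", "crit"))   # facility 4, severity 2
print(decode_pri(38))
```

A selector such as `auth.info` in a traditional syslog configuration file is matching on exactly these two values.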

rogeryu · on April 22, 2015

I think you are doing a decent job. However, when you want to do the same for a big(ger) company, or a bank or a multinational like Shell or IBM, the expectations go up. Then your setup will not be enough. I work like you and try to learn every day. I'm not a professional sysadmin and I don't think I would get a job if I applied for one, but for lack of the real thing, I'm doing the sysadmin job.

gauravgupta · on April 23, 2015

I would recommend following Hackr's System Administration section. It's been my starting point for most sysadmin learning I have had so far (even though I am a software developer) - http://hackr.io/tutorials/system-administration

ehershey · on April 22, 2015

The content at opsschool.org has the potential to be a great resource - http://www.opsschool.org/en/latest/security_101.html - but there's not a lot of "there" there. Maybe you'll like it more than I do.

joshbaptiste · on April 22, 2015

The best sysadmins I have seen understand the operating system from top to bottom. The easiest way to understand an OS without being an OS developer is to use tracing tools to solve problems, such as DTrace under FreeBSD or Solaris (Linux also has many tracing tools). I would also recommend watching the CS 162 series from Berkeley on YouTube.

emodendroket · on April 22, 2015

It's a different discipline and you don't really need to be an expert in that and development; you should pick one. That's not to say you should be totally ignorant of administration details, but, really, you could devote all of your time to it if you wanted.

digitalsushi · on April 22, 2015

My greybeard ISP unix boss in the late 90's taught me that if it's worth getting sued over, it's worth sending your /var/log/auth.log to a printer that can print line-by-line (a line printer).

daurnimator · on April 22, 2015

FWIW, I'm part of an 'online hackerspace' 'hashbang.sh' where we try and teach people to be better sysadmins.

https://hashbang.sh/#!

skywhopper · on April 22, 2015

As a long-time sysadmin, my best advice for learning is to give yourself the opportunity to break things. Constantly ask yourself questions about how things work and then see if you can answer them. I wouldn't even begin to start worrying about containers until you're more comfortable with the basics.

Some questions and projects to get you started:

Do you know how to start and stop services, how to tell what's running, how to see what network activity is happening?

Try installing Apache or Nginx on an EC2 instance, then see if you can change how it works. Change where it logs to, or change what port it listens on, or change what directory it pulls files from. Can you change where it reads its configuration from? What happens if you give it a bad configuration? Where does it log errors? Can you break it and then fix it?

Spin up a new EC2 instance and try to see if you can lock yourself out of it or otherwise disable it. What happens if you kill the SSH daemon? Or change the port? Or delete your user's entry in /etc/shadow? What happens if you run "rm -rf /"? What do all the fields in /etc/passwd do? What if you delete your home directory? Or "export PATH="? What happens when you delete files that a running program is using? What happens when you fill up the disk? When you run too many processes? Can you make the thing crash? When something gets screwed up, do you understand why?
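For the question about deleting files that a running program is using, here is a small experiment you can run safely in a scratch directory. It is a sketch that assumes Linux or another Unix-like system (Windows behaves differently), and the variable names are just for illustration.

```python
import os
import tempfile

# Create a scratch file, then delete its name while it is still open.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w+") as handle:
    handle.write("still here\n")
    handle.flush()

    os.unlink(path)                          # removes the directory entry only
    name_still_exists = os.path.exists(path)

    handle.seek(0)                           # the open descriptor still works
    contents = handle.read()

print(name_still_exists, repr(contents))
```

On Linux the name should be gone while the data stays readable until the file is closed. The same mechanism is why deleting a huge log file that a daemon still has open does not free the disk space until that process closes it or restarts.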

Run "ls /bin" and see how many of those commands you can explain. Pick one you don't know how to use but which you've heard of and try to figure out how to use it. Look at the man page, run the command with "-?" or "--help". Play around with it till you feel comfortable. Then pick another one tomorrow and do the same.

Run "ls /etc" and pick a file and try to find out what it's for. See if you can do something interesting by changing the contents of the file. You might need to reboot or restart a service. Tomorrow pick something else from /etc.

Do the same with /usr/bin, /usr/lib, /var/lib. Figure out where the files in /var/log come from and how you can write to them and how you can change their names and how many there are.

Set up two instances and see if you can get them to talk to each other. On EC2 each host has two network interfaces. Do you know how to find both IPs? Can you set up MySQL on one, and connect to it from the other? Once you get that to work, can you block it? Spin up a third instance and see if you can figure out how to make MySQL accessible to one and not the other. Once you figure out one way to do it, figure out another way. Can you get a service to listen on one IP and not the other? Both?
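On the "listen on one IP and not the other" question: at the socket level it comes down to which address the service binds to. A minimal sketch, using the loopback address so it runs on a single machine; `listen_on` is a hypothetical helper, and a real service such as MySQL exposes the same choice through its own configuration.

```python
import socket


def listen_on(address: str) -> socket.socket:
    """Return a TCP listener bound to `address` on an OS-chosen free port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((address, 0))    # port 0 asks the OS for any free port
    listener.listen(1)
    return listener


# Bound to loopback only: reachable from this host, not from the network.
listener = listen_on("127.0.0.1")
host, port = listener.getsockname()

client = socket.create_connection((host, port), timeout=2)
conn, _peer = listener.accept()
conn.sendall(b"hello")
received = client.recv(5)

client.close()
conn.close()
listener.close()
print(host, received)
```

Passing one of the instance's own interface addresses instead restricts the listener to that interface, while "0.0.0.0" listens on all IPv4 addresses. Trying each and connecting from the second instance shows the difference.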

What's the difference between UDP and TCP? Set up NFS. Set up a RAID. Break a RAID, rebuild it. Do you understand crontabs? syslog? Can you send mail to yourself?
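For crontabs, a useful first check is whether you can name the fields. The sketch below splits one line of a user crontab into its five schedule fields plus the command. The `CronEntry` type, the `parse_crontab_line` helper and the backup script path are invented for the example, and it deliberately ignores comments, environment lines and shortcuts like `@daily`.

```python
from typing import NamedTuple


class CronEntry(NamedTuple):
    minute: str
    hour: str
    day_of_month: str
    month: str
    day_of_week: str
    command: str


def parse_crontab_line(line: str) -> CronEntry:
    """Split a user crontab line into schedule fields and the command."""
    # Split on whitespace at most five times so the command keeps its spaces.
    fields = line.strip().split(None, 5)
    if len(fields) != 6:
        raise ValueError(f"expected 5 schedule fields and a command: {line!r}")
    return CronEntry(*fields)


# Hypothetical job: run a backup script at 02:30 every Monday.
entry = parse_crontab_line("30 2 * * 1 /usr/local/bin/backup.sh --full")
print(entry)
```

Note that the system-wide /etc/crontab has one extra field, the user to run as, between the schedule and the command.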

Once you're feeling more comfortable with the environment, and you start actually fulfilling a sysadmin role, your basic philosophy should be:

* Expect anything and everything to fail.

* Trust nothing, even your own software and machines.

* Grant the least possible access to make things work. Developers and vendors will always ask for more, and if you are not pushing back against their requests, you are probably giving them too much.

* Always have a plan for rolling back a change.

* When something breaks and you fix it, don't stop working until you understand what went wrong and you take some steps to avoid it in the future.

* When you have a task that takes more than one step to complete that you do more than once, write a script to do it for you.
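The last two points (have a rollback plan, script anything multi-step) combine naturally. Here is one possible sketch of a change script that keeps a backup and restores it if a check fails. The `apply_with_rollback` helper, the `.bak` naming and the toy `app.conf` file are all assumptions for the example, not a standard tool.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def apply_with_rollback(
    path: Path, new_text: str, validate: Callable[[Path], bool]
) -> bool:
    """Replace a file's contents, restoring the backup if validation fails."""
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)       # step 1: keep a copy to roll back to
    path.write_text(new_text)        # step 2: make the change
    if validate(path):               # step 3: check the result
        return True
    shutil.copy2(backup, path)       # step 4: roll back on failure
    return False


def has_port(config_path: Path) -> bool:
    """Toy validation: the config must contain a port setting."""
    return "port=" in config_path.read_text()


# Demo on a throwaway config file in a temporary directory.
config = Path(tempfile.mkdtemp()) / "app.conf"
config.write_text("port=8080\n")

good_change = apply_with_rollback(config, "port=9090\n", has_port)
bad_change = apply_with_rollback(config, "prot=9090\n", has_port)  # typo
final_text = config.read_text()
print(good_change, bad_change, repr(final_text))
```

For a real service the validation step would be something like the daemon's own config test, and the backup would more likely be a snapshot or a version-controlled file, but the shape stays the same.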

That should get you started...

nstart · on April 22, 2015

That... Was brilliant. My weekend plans seem to be pretty sorted. Thanks a lot for this. I wish I could reply with something more than just thanks but honestly, there's so much here. Just.. Thank you :)

skywhopper · on April 22, 2015

Glad you got some good ideas. When you're just learning, if you get stuck on something and get frustrated, just drop it and move on to something else. Sometimes you just need to come back later. Sometimes you'll figure out the solution while trying to solve or learn something else. Don't get hung up on any particular problem at first.

Ultimately, good luck and have fun!

captn3m0 · on April 22, 2015

https://serversforhackers.com/ finally launched a while back, and I thought I'd share it here.

denysonique · on April 22, 2015

The first step is: Install Gentoo

denysonique · on April 22, 2015

Some of you may want to downvote this comment; however, only those who have installed Gentoo know the deep meaning behind that sentence.

Fourkeys · on April 23, 2015

Then perhaps you could elaborate on the sentence, seeing as the thread is about helping the learning process of a sysadmin. Posting something that only someone experienced in it would "understand" is entirely unhelpful and the likely reason for any down votes.

mightymaike · on April 22, 2015

Putting in a lot of hours, especially when it comes to debugging problems created by end users and making edits on servers that are in production. Every sysadmin was fooling around in the beginning.

FabianBeiner · on April 22, 2015

I'd suggest crashing servers while trying. :)

collyw · on April 22, 2015

Don't take advice from Ubuntu forums.

hackuser · on April 22, 2015

You'll learn the technical concepts and rubrics eventually on your own; here is what I'd be thinking about to start:

1) It's great that you ask and seek to learn to do it right; that's a first step many don't get past.

2) There are big differences between hacking on something at home and professional system administration. Most important is cost: Your time is expensive but system downtime can be incredibly costly. [1] You need to anticipate and prevent the problem in the first place (you are the expert!), have resources prepared in case of failure (including expert knowledge of the system), and resolve it quickly. Also, you are paid as an expert to get results that boost the bottom line. You can't spend a day fiddling around with something and gratifying your curiosity.

3) There is a very wide range of knowledge and skill among sysadmins. You don't need a license to do it; anyone can print "System Administrator" on their business card. There are many, many ignorant hacks; lots of decent ones who don't think beyond what they are told; and few true experts. Who you surround yourself with will determine, to a great extent, where you fall in that range. You probably will adopt their standards and you will learn their way of doing things; you can spend your time learning either the knowledge and techniques of the hacks or those of the experts.

4) Invest the time and effort to learn core technologies [2] exhaustively and to learn best practices. Never shy away from difficult technical material; push yourself to find the best sources and develop the skill to understand complex material. Most of what's on the Internet is bullsh-t from and for amateur hackers, good enough for your home server. You can spend your career without understanding much of what you are doing -- there's enough work out there for the hacks too. Find the very best sources and people (see #3), and take the time to learn from them. Learn something once and it pays off forever.

5) Learn to solve problems. Worry less about learning techniques (e.g., arcane details of command line switches); anyone can look those up. The real challenge is staring at a screen, seeing something that you have no idea about (and which is not in books or on the Internet), and finding a way to solve it (with all the time and other pressures mentioned above). To do that, you need a model in your mind, a deep understanding, of the technologies and systems involved (see #4).

6) Put yourself in situations where you will be in command of the situation and with time in reserve for making major enhancements, learning more, and for unexpected crises; not where you are struggling to keep up with a flood of problems, frantically treading water (or slowly drowning). Also choose situations where you are prepared to handle the worst-case scenarios: When the sh-t hits the fan -- when all your plans and normal operations go to hell -- people will look to you to save them. Be their hero. (See #5.)

It can be a very intellectually stimulating and gratifying job. In every field, good people and true experts are hard to find. Make yourself into one and you will be in good shape.

---------

[1] Consider the cost of 1,000 people not working (avg hourly rate * 1,000 * hours), hours of orders missed, facilities shut down because your system is the bottleneck, deadlines missed, angry customers, embarrassment to the business and its executives, etc.

[2] What the core technologies are for you will vary. You can't know everything. Beware of investing time learning about technologies that change quickly. TCP/IP probably will stick around for a while; the app-of-the-week, maybe not.
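To put a number on the labor part of footnote [1], here is the formula as a tiny function. The $40 hourly rate and the three-hour outage are made-up figures purely for illustration; plug in your own.

```python
def downtime_labor_cost(people: int, avg_hourly_rate: float, hours: float) -> float:
    """Wages paid to idle staff during an outage: rate * people * hours."""
    return people * avg_hourly_rate * hours


# Hypothetical outage: 1,000 people idle for 3 hours at an average of $40/hour.
cost = downtime_labor_cost(people=1000, avg_hourly_rate=40.0, hours=3)
print(f"${cost:,.0f}")
```

That figure covers idle wages only; missed orders, missed deadlines and the other items in the footnote come on top of it.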

cymetica · on April 22, 2015

Learn to automate. Script.

dschiptsov · on April 22, 2015

Learn how to make a basic autotools project (configure.ac, Makefile.am, etc.)

Learn how FreeBSD's ports system (building packages from sources) works and why it is so.

Then learn what the Red Hat Package Manager (RPM) is and how to make a source RPM.

Get enlightened.)