This article needs a giant warning on it. Automating is brilliant, but an automated process is a liability. If no one is regularly checking the output of something that's been automated then you can't know if it's broken, and that could be catastrophic. Every story of "the backups had failed", "payments had been missed for months", or "inventory wasn't where it was supposed to be" is the story of an automated process no one was checking.
If you automate anything you need robust error reporting (which is not an email someone will ignore).
Every developer should take this advice to heart. If a customer asks for 'a daily email report', you should strongly advise against it.
- Most reports the customer asks for will be meaningless vanity metrics anyway
- If you only do error reporting (not sending an email if everything goes smoothly), you won't notice it if email is not being delivered.
- If you report everything (not just the errors), people will stop reading them
- A mailbox history is not a log
- you'll get a request just about every week to change the recipient list of the report.
- Who reads the reports on weekends?
I could go on with arguments for a while, but the short story is this: email is not suitable for logging and reporting. In fact: email is not suitable for a lot of things. But customers will ask you anyway, because that's the tool they know (if you have a hammer, all problems will look like nails).
My advice:
- Log everything to disk when possible (syslog, pipe)
- use a logging aggregator to filter and archive (fluentd, ELK, graylog)
- use an exception tracker for exceptions (sentry, raygun)
- make incidents actionable by creating tickets automatically (jira, zendesk, slack)
- if needed, use an incident response service (pagerduty, opsgenie)
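A minimal sketch of the first and fourth bullets, assuming Python's standard `logging` module. `JsonLineFormatter`, `TicketHandler` and `build_logger` are made-up names, the in-memory list only stands in for a real ticketing integration, and the `StringIO` stands in for a log file or a pipe to syslog:

```python
import io
import json
import logging


class JsonLineFormatter(logging.Formatter):
    """Render each record as one JSON object per line, which aggregators parse easily."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


class TicketHandler(logging.Handler):
    """Hypothetical stand-in for a ticketing integration: collects would-be tickets."""

    def __init__(self):
        super().__init__(level=logging.ERROR)  # only ERROR and above become tickets
        self.tickets = []

    def emit(self, record):
        self.tickets.append({"title": record.getMessage(), "source": record.name})


def build_logger(name, stream, ticket_handler):
    """Log everything to `stream`, and additionally route errors to the ticket handler."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    stream_handler = logging.StreamHandler(stream)
    stream_handler.setFormatter(JsonLineFormatter())
    logger.addHandler(stream_handler)
    logger.addHandler(ticket_handler)
    return logger


log_output = io.StringIO()  # assumption: replace with a file or syslog handler in practice
tickets = TicketHandler()
log = build_logger("nightly-backup", log_output, tickets)
log.info("backup started")
log.error("backup failed: disk full")
```

Everything lands in the log; only the error turns into something a person has to act on.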
>make incidents actionable by creating tickets automatically
Not really a serious concern, but what happens when ticket automation goes down? If all problems are auto-reported as tickets or tasks and that reporting breaks, it might take a while before anyone notices, especially if reported exceptions are rare. If it is normal to get tens of tickets every day, problems with ticket creation will be noticeable almost instantly; if tickets are very rare, not so much.
The second issue with automated exception tracking is that you lose the "huh, this is weird" mechanism that works when actual humans go through logs or reports. Any tool will of course be orders of magnitude faster and probably more accurate, but by relying solely on such automation you might miss the chance to notice "weird", atypical entries or rare, unexpected sequences of them. Then again, in most cases, I guess, simple statistical analysis might be a good substitute. And that can be automated.
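Both worries can be roughed out in a few lines. This is only a sketch under my own assumptions: `silent_too_long` treats a long gap since the last ticket (or a synthetic heartbeat ticket) as a problem in itself, and `is_unusual` is about the simplest statistical check there is, comparing today's count with the mean and standard deviation of previous days. The names and the threshold of 3 are illustrative, not anyone's actual setup.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def silent_too_long(last_ticket_at, now, max_silence):
    """True when the reporting pipeline has been quiet for longer than expected."""
    return now - last_ticket_at > max_silence


def is_unusual(history, today, threshold=3.0):
    """True when today's count is more than `threshold` standard deviations
    away from the mean of the previous days (needs at least two days of history)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is worth a look.
        return today != mu
    return abs(today - mu) / sigma > threshold
```

Note that `is_unusual` flags a sudden drop to zero as readily as a spike, which covers part of the "ticket creation silently broke" case when tickets are normally frequent.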
Don't let the perfect be the enemy of the good. 99.9XXXX% reliability is good enough. Eventually you have enough nines that your remaining risks are things like "nuclear war", "dinosaur-killer sized asteroid hitting the earth", et cetera.
Agreed, there is absolutely no point going further after reaching a certain reliability level. However, one thing is eliminating risks, the other is limiting the consequences of said risks. I strongly prefer 99.9% reliability where that 0.1% means some insignificant problem over 99.99% reliability where the remaining 0.01% means total disaster.
My point is that doing "too much" automation gives diminishing returns (which is not bad in itself), but might also disproportionately increase the consequences of that 0.xxxx1%
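The arithmetic behind that preference, as a sketch: expected loss is failure probability times the cost of a failure. The two cost figures below are invented purely for illustration.

```python
def expected_loss(failure_probability, cost_per_failure):
    """Expected cost of one run: probability of failure times what a failure costs."""
    return failure_probability * cost_per_failure


# Hypothetical numbers: 99.9% reliable, and a failure is a minor annoyance.
minor_case = expected_loss(0.001, 1_000)

# Hypothetical numbers: 99.99% reliable, but a failure is a disaster.
disaster_case = expected_loss(0.0001, 10_000_000)
```

With these made-up costs the "more reliable" system carries roughly a thousand times the expected loss, because the consequence grew far faster than the failure rate shrank.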
<cynical-response>
So you are telling me by adding a simple email tool, I can make the customer happy because I'm giving them what they want and not have to set up 5 tools and continue to pay them monthly?
</cynical-response>
It's your job to advise them to use the proper solution.
But the customer ultimately pays you, so if they really want an email monitoring solution, you should build it for them.
Also, the cost savings by automating the task should outweigh the monthly cost of the tools that are required to run it. If not, the task is probably not worth automating.
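That comparison is easy to put into numbers. A rough sketch, where the function name and every figure in the example calls are assumptions of mine:

```python
def monthly_net_saving(runs_per_month, minutes_saved_per_run, hourly_rate, monthly_tool_cost):
    """Labour saved per month by automating, minus what the tooling costs per month."""
    labour_saved = runs_per_month * minutes_saved_per_run / 60 * hourly_rate
    return labour_saved - monthly_tool_cost


# Hypothetical: a task run 60 times a month, 10 minutes each, at 60/hour, tools cost 200.
frequent_task = monthly_net_saving(60, 10, 60, 200)

# Hypothetical: a monthly task of 30 minutes with the same tool bill.
rare_task = monthly_net_saving(1, 30, 60, 200)
```

A negative result is the "probably not worth automating" case, at least on labour cost alone.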
We may have plateaued by now, but so far it's still a net gain: a new person comes in, learns things, often by reverse engineering, often improves the automated process, sometimes leaves, etc.
But what we do need is a better way to measure automation gains and losses - basically any process that is automated shows up as a net loss in human productivity, which is simply not true.
And vice versa any automated processes probably don't account correctly for the total energy and resource use and the cost of maintenance (both software and hardware).
Automating doesn't mean "do half the job". Just like running the backup script/routine manually means checking the output and making sure the backup is useable, automating that task means performing the same checks. Sending an e-mail automatically on failure shouldn't be the issue...
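To make "performing the same checks" concrete, here is a sketch using Python's standard `tarfile` and `hashlib`. It assumes a plain directory backed up to a gzipped tar written outside that directory; `make_backup` and `verify_backup` are illustrative names, not anyone's real backup routine.

```python
import hashlib
import tarfile
import zlib
from pathlib import Path


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_backup(source_dir: Path, archive_path: Path) -> dict:
    """Write a .tar.gz of source_dir and return {relative_path: sha256} for each file."""
    manifest = {}
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                relative = path.relative_to(source_dir).as_posix()
                archive.add(path, arcname=relative)
                manifest[relative] = sha256_of(path.read_bytes())
    return manifest


def verify_backup(archive_path: Path, manifest: dict) -> list:
    """Re-read the archive from disk; return a list of problems (empty means usable)."""
    found = {}
    try:
        with tarfile.open(archive_path, "r:gz") as archive:
            for member in archive.getmembers():
                if member.isfile():
                    found[member.name] = sha256_of(archive.extractfile(member).read())
    except (tarfile.TarError, OSError, EOFError, zlib.error) as exc:
        return [f"archive unreadable: {exc}"]
    problems = []
    for name, digest in manifest.items():
        if name not in found:
            problems.append(f"missing from archive: {name}")
        elif found[name] != digest:
            problems.append(f"checksum mismatch: {name}")
    return problems
```

The automated job would run `verify_backup` right after `make_backup` and treat a non-empty result as a failure to report, which is the same check a careful human would do by hand.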
We'd like to think that a process being manual means someone is paying attention. That's not always the case. You still need to be sure that checking is actually happening, whether the process is automated or manual.
Proper governance mechanisms are the way to go. It's one of the core aspects of the Monetary Authority of Singapore's FEAT regulation and I fully believe it makes sense.
I've been thinking about writing an essay about the flip side for a while, but maybe it only needs to be a toss-off HN comment, so...
In political/economic debates, people like to say: "if you tax something, you get less of it." And in turn "if you subsidize something, you get more of it."
You could think of automation as a labor subsidy. Or, on a rough level, as a tax on attention. That's not strictly correct, but it gets you in the right frame of mind: automation is, among other things, designed to achieve labor with diminished (or even absent) attention.
The problem is that diminished attention can have overlooked tradeoffs. If you know those tradeoffs, you may be able to automate around them. If you're not paying attention, you probably can't. If you've automated away attention, you might not notice until you've hit a problem at the scale of your automation.
Automation is valuable, absolutely. You probably want more of it. I do. And yet...
I particularly love the suggestion to just document what you're doing as the first step to proper automation. It's not that much work or cognitive effort. Sometimes a decent description of your procedure is all you will need and it tends to prove very valuable later. It can be also gradually improved. I was doing it often already, and this will remind me to persist.
This is very true. I stand up a server every now and then and just define the setup as a single-file Ansible playbook; pretty much no matter what, it'll be some combination of packages to install, users to create, config files to modify, etc. Why write it down when I can just define it as code and check it in somewhere?
But then the IT department comes along and hates it; they make a wiki page for each server with a list of instructions for what was done. Their approach is more flexible, as not every tool/step can be easily or robustly automated, and if you can just screenshot the config webpage and paste it into your wiki, that might be just as good as (and a lot faster than) spending an hour trying to bypass that step.
Plus, their approach assumes aggressive use of VM-level tools. Where I'm like "oh, I need N build slaves? Let me add them to my inventory file and re-run", they're like "N build slaves? Sure, I'll set up the first one manually and then clone the machine for the others."
I know there are real actual advantages to version controlled configuration, but there is something pragmatic about just doing the thing and keeping human-readable notes as you do it.
The second time I do something I make a checklist. At a certain point I decide I need to automate it, and I have all the steps mapped out to start from.
It works well in practice because a checklist is quite mutable: if something goes awry, it gets added as a step the next time, until I've covered the edge cases.
To make them I use vscode, markdown/journal plugin and mdpdf to print them to pdf, so they can be source controlled easily as well.
Reading this makes me realise on my team I'm the automation guru. That's my culture through and through. The rest of my team's culture is the opposite. I always want us to play a little more defense, so that we can over time get a lot more done. They always push back that "we're too busy to do it". Yet I continue to automate and things continue to improve. Sucks because we could be significantly more effective if it were the team culture.
I sort of feel it's the inverse. The last place I worked was heavily into automation and they had the one manager who always cautioned about relying on too much automation. I think that was a healthier dynamic tbh.
Too much vs. not enough are just different sets of problems.
Too much = you risk no one on the team being able to troubleshoot when things go wrong because no one has ever had to run through the process themselves before so no one understands it.
Too little = a non-trivial percentage of your bandwidth is eaten up by trivial things you keep doing over and over again, in the vein of servicing requests rather than solving problems.
In practice I find I'd rather have the first problem, and I can think of solutions to its problems that I'm happy enough with.
Sure, it's ok, but she's compromising on the best resource for advancing her programming ability: her team.
I learned a lot from my coworkers that I couldn't have learned at school or on the internet. The author of the comment seems passionate about programming, so I think she could be much happier with coworkers she can look up to and learn from.
I've recently been on a learn-about-economics spree, so I have a handy lens nearby with which to view this advice: you must not allow your gains to grow merely linearly (by constantly learning but not automating). The economic imperative is to allow gains to compound (by automating). Manual workflows cannot compound; automated workflows can.
(I may have been doing a lot of automation recently because, I think, bosses have paid a lot of money for an automation tool and feel the need to use it, so I'm feeling a little cynical about the whole thing)
I've struggled with this. I'm part of a smaller local faculty IT and our systems are created by a central IT.
Some problems I struggle with:
1. How do I automate in an environment of so many disparate systems? Some things are done through web interfaces, some by RDP'ing to an AD server, some through email, spreadsheets sitting on shared drives, etc.
2. How do I handle credentials? Central IT won't create generic accounts for us, even for testing, so all automation would have to be done through my credentials. How do I set up an automated system that uses my credentials safely?
3. Where do I run this system from? From my own laptop, a box under my desk, somewhere in the cloud where I'd have to beg security to allow it to do anything non-trivial? And how do I pass this on to the next person?
I've looked into RPA-type solutions (expensive), headless browsers (so unreliable), and Microsoft Flow (limited and confusing).
> How do I automate in an environment of so many disparate systems?
1. Using disparate tools if need be. Fortunately we have PowerShell these days which covers a whole lot of stuff including being ok at various web-based interactions. Anymore, if it can't be done with PowerShell (in a Windows environment that is) then I think it's probably too fragile to leave to automation anyway... ok, that's not entirely true. There is at least one important production system in our environment I have to automate using AutoHotKey and that works pretty well.
2. I have no idea why anyone would think not allowing generic accounts for this sort of thing would be a good idea, but under your own account shouldn't be a big deal as long as you leave appropriate documentation about what needs to happen in case you're hit by a bus. Many sins are forgivable if there is adequate documentation.
3. I usually just run automation scripts on the system I'm scripting. If I'm dropped into an environment and asked to figure out how some system is automated, where's the first place I'm going to look?
To help make the case to change the system, though, you could build a proof-of-concept that uses your credentials and present that - and the ACM Queue article! - to the folks you need to convince. Try showing it to someone who pays the bills too - they might be a useful ally to help change how things work if they recognise the productivity gains.
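For a proof-of-concept like that, one common and modest precaution is to keep the password out of the script itself and read it from the environment at run time. This is a sketch of that idea only, not a complete answer to the credentials question; `load_credential`, `MissingCredential` and the variable name `SVC_PASSWORD` are all hypothetical.

```python
import os


class MissingCredential(RuntimeError):
    """Raised when a required secret is not available to the script."""


def load_credential(name, environ=None):
    """Fetch a secret from the environment and fail loudly if it is absent.

    The secret's value is never put into the error message, so it cannot
    leak into logs or tickets through an exception.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name, "")
    if not value:
        raise MissingCredential(f"environment variable {name} is not set")
    return value
```

The script would call something like `load_credential("SVC_PASSWORD")` at start-up. That keeps the secret out of version control and makes the documentation for the next person a one-liner about which variable to set.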
Automation gives you consistency. If you automate the incorrect process, you’ll get it done consistently wrong.
It is still a win because you’ll have a better chance of diagnosing and improving something that fails the same way every time, as opposed to some manual process that has been done in a dozen different incorrect ways, where you learn nothing from mistakes because there are no automated controls to feed back into.
Most people and organizations have a strong negative gut reaction when automation does something wrong but are okay when a manual process yields mistakes because we largely accept humans as fallible (at most, someone gets fired, and more bureaucracy is introduced), but the long-term economics of automation look ultimately better.
I’ve seen time and time again apparently innocent manual processes that, when not recognized as a liability, rapidly turn into entire departments (and entire management hierarchies), at which point it’s impossible to improve, because changing culture is harder than changing code. Having enough instances of that is what creates corporate whales without competitiveness.
I never heard any excitement from people when we told them we would test some automation with their request, but otherwise I love being reminded of this. We tend to slack on it in our team, for all the reasons mentioned in this post.
What we have found helped us quite a bit is, after writing the initial runbook for a task, to write a second script (usually pretty short, maybe two dozen lines) that asks you questions and then generates customised steps. We then later use these functions in the actual automation.
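Roughly what such a script can look like, as a sketch. The "new server" task, the questions and the step wording are all invented for illustration; the point is only that the step-generating function stays separate from the question-asking one, so it can be reused later.

```python
def generate_steps(answers):
    """Turn the answers for a hypothetical 'new server' request into a checklist."""
    host = answers["hostname"]
    steps = [f"Create VM '{host}' in the {answers['environment']} environment."]
    for user in answers["users"]:
        steps.append(f"Create account '{user}' on {host}.")
    if answers["environment"] == "production":
        steps.append(f"Add {host} to monitoring and the backup schedule.")
    steps.append(f"Record {host} in the inventory.")
    return steps


def ask_questions(ask=input):
    """Collect the answers interactively; `ask` can be swapped out for testing."""
    hostname = ask("Hostname? ").strip()
    environment = ask("Environment (production/staging)? ").strip().lower()
    users = [u.strip() for u in ask("Users (comma separated)? ").split(",") if u.strip()]
    return {"hostname": hostname, "environment": environment, "users": users}
```

Run interactively, you would print `generate_steps(ask_questions())` as a numbered list and follow it by hand. Later the same `generate_steps` logic can feed the actual automation instead of a person.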
Always improve, but manual work is the craft that gets you to that improvement. Once the wheel has been created, no need to recreate it, but brakes might be nice, too.
I would argue not all manual work is a bug nor does all manual work need to be automated. For example, if you have a manual task that needs to be done once per quarter, is it really worth investing time and money to automate it? In contrast, if you have a manual task that you do multiple times per day, then perhaps that's worth investing resources to automate that task.
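A back-of-the-envelope way to compare those two cases, with every number made up for the sake of the example:

```python
def payback_years(build_hours, runs_per_year, hours_saved_per_run):
    """Years until the time spent building the automation is earned back."""
    return build_hours / (runs_per_year * hours_saved_per_run)


# Hypothetical: 20 hours to automate a half-hour task done once per quarter.
quarterly_task = payback_years(20, 4, 0.5)

# Hypothetical: the same effort for a half-hour task done three times a working day.
daily_task = payback_years(20, 3 * 250, 0.5)
```

With these figures the quarterly task takes ten years to pay back and the daily one a few weeks. This ignores maintenance and the value of consistency, so treat it as a first filter only.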
This article is short-sighted at best. I am a system administrator who sadly cannot do much automation.
The reason is simple: I manage systems not installed by me, where problems are often one-off issues and, more importantly, every system is installed a bit differently (differently enough that there is no portability for automation between systems).
Automation is awesome. But sometimes it just cannot be done, because of constraints put in place by others.