
This just means that the error message needs to be clearer. For example, after the error itself, it could give direct advice: “PERFORM THESE STEPS: You must define ENVVAR. Go to <wiki link>. Set ENVVAR to a proper value and restart the service.”

Notice the direct language. It reads like an order. The less direct the message, the higher the chances that the user will not act upon it.
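A minimal sketch of that idea, assuming a Python service that reads its configuration from the environment at startup. `require_env` and `missing_var_message` are hypothetical helpers, and the docs argument stands in for the "<wiki link>" placeholder. The message states the problem first, then gives numbered orders:

```python
import os
import sys


def missing_var_message(name: str, docs: str) -> str:
    """State the problem first, then give direct, numbered instructions."""
    return (
        f"ERROR: required environment variable {name} is not set.\n"
        "PERFORM THESE STEPS:\n"
        f"  1. Define {name}.\n"
        f"  2. Go to {docs} for valid values.\n"
        f"  3. Set {name} to a proper value and restart the service."
    )


def require_env(name: str, docs: str) -> str:
    """Return the variable's value, or exit with the instructions on stderr."""
    value = os.environ.get(name)
    if not value:
        # sys.exit with a string prints it to stderr and exits with status 1,
        # so the instructions are the last thing the operator sees.
        sys.exit(missing_var_message(name, docs))
    return value
```

At startup you would call something like `require_env("ENVVAR", "<wiki link>")` before doing any other work, so the service fails fast instead of limping along.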




>it could give direct advice: “PERFORM THESE STEPS: You must define ENVVAR. Go to <wiki link>. Set ENVVAR to a proper value and restart the service.”

Really, should logs also be documentation now? Just mindlessly logging the same "advice" over and over again each time the error happens?


Logs can definitely be a form of documentation.

I write software that is generally run low in the stack, quietly doing some mundane tasks that are business-critical but rarely thought about. If one of our clients has to mess with our software beyond the occasional update, that's a failing. Not all software is like this, but lots of it is -- its value is that no human needs to be involved.

I need to write log messages with the expectation of an audience who doesn't know much about the software -- it's been running uninterrupted for months or years and suddenly something has gone wrong. If the log line doesn't tell the user how to solve their problem, I will end up getting a call.


If it is that simple, then why doesn't the code fix it itself? But no, usually there are one, two, or three likely causes, and it could also be anything else. That kind of unexpected error often has no default fix at all.

No, the best thing is to point to the documentation that covers it, not to print whole man pages of docs in error messages.

> I write software that is generally run low in the stack

What stack, and how low? Me too, so low that I usually cannot return or even log a "see error code doc at http.." string, for various reasons (bandwidth, memory, performance). I only have error codes ;)
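When the device can only emit numeric codes, the text can live on the host side instead. A sketch under loudly stated assumptions: the device sends a 2-byte little-endian code, and `ERROR_TABLE` with its entries is entirely hypothetical. The point is that the explanation and the fix are kept in one table that ships with the tooling, not with the firmware:

```python
import struct

# Hypothetical code table, kept with the host-side tooling so the
# constrained device only ever transmits a small integer.
ERROR_TABLE = {
    0x0001: ("config value missing", "set the missing value and restart"),
    0x0002: ("sensor timeout", "check the sensor cable and power"),
}


def decode_error(frame: bytes) -> str:
    """Decode an assumed 2-byte little-endian error code into readable text."""
    (code,) = struct.unpack("<H", frame)
    if code not in ERROR_TABLE:
        return f"error 0x{code:04X}: unknown code (no entry in the table)"
    what, fix = ERROR_TABLE[code]
    return f"error 0x{code:04X}: {what}; fix: {fix}"
```

This only helps as long as the table survives alongside the code, which is exactly the concern raised further down the thread.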


In the case at hand, where an environment variable isn't set, how exactly should the code fix itself? Human interaction is necessary, which is the reason the log message should spell out what the human needs to do.

If I'm starting a service and see a pointer in the logs to documentation, that seems like an incredibly broken approach to me. Why would I look at missing or out-of-date documentation that may or may not be at hand when the code that knows the problem is right there and can just tell me? A log message like you're describing might as well say, "Something went wrong, but I don't want to tell you what. Instead check page 43 of the document in the third file cabinet from the left in that room over there on your right. No, your other right."


Similar issues arise with such documentation in error messages. There now has to be a process to make sure that all such information is always accounted for and updated whenever the system changes.

> Something went wrong, but I don't want to tell you what.

is a somewhat disingenuous example. Error logs should describe in exhaustive detail what went wrong. Ops needs that to analyze the situation, and the vendor will have much less trouble reproducing the error. However, suggesting specific fixes could be disastrous. Furthermore, documentation should already be in a form that operations can be expected to work with, even in crisis situations.


I don't want to have to hunt for documentation if it breaks. It may have been 30 years, everything but the binary has been lost, and the vendor is out of business. If in that situation all I get is an error code and a link to documentation that doesn't exist, I'd have to start reverse-engineering. And while doing so I'd definitely be cursing the coder who decided that saving a couple hundred bytes of log space in the case of an "abort the program"-severity event was worth dumping this in my lap.


Running such software is asking for a disaster already. At the very least, documentation should still exist, and operational frameworks like ITIL insist on that. It can happen, but it usually points to an operational culture that disregards maintenance, counting on being able to kick the can down the road as long as possible.


It will be so much fun when the implementation is refactored and half of these comments are forgotten about and no longer meaningful.


Exactly. At one of my previous workplaces, misattributed error messages piled up over time, so the suggested actions were often no help.

Not to mention that new or changed error messages caused a landslide of translation costs for the various languages. I guess this product has no localization? When I worked on a product that did have it, we had to go through a deliberate process to describe why we wanted to change a message, what the impact was, etc. Tell me you want 100 new messages and you will be stuck in meetings for the next month.

In their case, though, it seems they at least have the support in management for it. I hope it turns out better for them than it did for me.


A few months ago I got an error message that instructed me to reinstall the AWS CLI. I filed a ticket when that didn't work, and the team was annoyed with me because, obviously, the real problem was a Python configuration warning, with no suggested action, 10 lines up.


It depends on who and what the error is about, and when it happens. Failures generally follow a bathtub curve: a high rate at the start (usually configuration issues), a fairly fixed rate during operation, and then more at the end of the lifecycle (exhaustion, service hiccups on scale-in).

If it's early in the lifecycle, absolutely, because that's when it's most actionable. X is set wrong, Y can't be reached, etc.: guide whoever is operating the system on how to fix it.

If it's mid-cycle, it's often post-hoc, but context is worth its weight in gold. It's less about telling the operator how to fix it and more about why it broke, to avoid it in the future.

End of cycle, whatever.
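For the early-lifecycle case (configuration issues), one way to "guide whoever is operating the system" is to validate everything at startup and report every problem at once, each with its fix. A sketch, assuming settings come from a dict such as `os.environ`; the setting names `DB_HOST` and `DB_PORT` are hypothetical, and it only checks presence and format, not whether Y can actually be reached:

```python
from dataclasses import dataclass


@dataclass
class ConfigProblem:
    setting: str
    problem: str
    fix: str


def validate_config(env: dict) -> list:
    """Collect every startup problem instead of stopping at the first one."""
    problems = []
    if not env.get("DB_HOST"):
        problems.append(ConfigProblem(
            "DB_HOST", "is not set", "set DB_HOST to the database hostname"))
    port = env.get("DB_PORT", "5432")
    # isdigit() short-circuits before int() so non-numeric input is safe.
    if not port.isdigit() or not 1 <= int(port) <= 65535:
        problems.append(ConfigProblem(
            "DB_PORT", f"has invalid value {port!r}",
            "set DB_PORT to a number from 1 to 65535"))
    return problems


def report(problems: list) -> str:
    """One header line, then one 'what is wrong; how to fix it' line each."""
    lines = [f"{len(problems)} configuration problem(s) found:"]
    for p in problems:
        lines.append(f"  {p.setting} {p.problem}; fix: {p.fix}")
    return "\n".join(lines)
```

Reporting all problems in one pass saves the operator a restart per mistake, which matters most in exactly the phase where these errors cluster.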


Yes!

There are people who don't read formal documentation but do read logs, after all.

If the advice is the same over and over again, then yes, give the advice over and over again. I wouldn't want to assume that someone has read every line of the logs, or has started to read top-to-bottom, so the advice should always be among the most recent lines in the log, and the only way to ensure that is to give the advice again each time the error happens.
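A minimal sketch of repeating the advice on every occurrence, using Python's standard `logging` module. The helper names are hypothetical; the design choice is to put the error and its remedy in a single record, so the advice is always next to the most recent occurrence and any filter that keeps the error keeps the advice too:

```python
import logging


def error_with_advice(error: str, advice: str) -> str:
    """Join the problem and the remedy into one line so they can't be separated."""
    return f"{error} | WHAT TO DO: {advice}"


def log_error_with_advice(logger: logging.Logger, error: str, advice: str) -> str:
    """Log the error together with its advice, every time it happens."""
    message = error_with_advice(error, advice)
    # Pass the text as an argument rather than as the format string,
    # so a stray '%' in the advice can't break formatting.
    logger.error("%s", message)
    return message
```

Calling `log_error_with_advice(logging.getLogger("service"), "ENVVAR is not set", "Set ENVVAR and restart the service.")` on each failure yields one self-contained line per occurrence, and log compression handles the repetition cheaply.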


Yes! We have tools to filter what gets saved and compression that handles repeated text very well.

So why not provide docs on how to solve the error along with the error?


Logs actually are a form of documentation. Documentation can provide instructions on how to diagnose and fix problems, and that's what logs do: tell a human being what a problem is and how to fix it.

Remember that often the person reading the logs is not the person who wrote the software. Maybe it's an Ops person at 2AM trying to fix a broken deploy. Maybe it's a developer who joined the company 3 years after the software was written. Maybe the log is passing through an error message from 3 layers deep in the stack. The more literate your logs are, the better.


Errors on initialization, fatal errors, and non-recurrent errors that require human/support intervention should be documentation.


If the error results in the program shutting down, it’s once per fatal interaction.

In other words, yes.


Should logs more clearly let the user know how to fix problems? Yes.


This is fairly common in good error logs.


I think you're correct. To add to this (and I think it's the point the article was trying to make), I feel that errors written in fragmented language or "developer speak" are likely to get glossed over. I think the “Write it like you’re talking to a friend.” advice the article gives is spot on. Making the message more conversational invites better understanding.

I feel that, when it comes to messaging like this, we tend to adopt the attitude that our audience "is smart, and should figure the rest out". They may be. But they already have lots to do and plenty to figure out. Any opportunity we, the requestor, have to lighten their mental load is going to increase the odds that they'll be inclined to take action right away.


I’m not seeing how the message as it stands is any less direct or clear than what you’re saying it should be. It straight up tells you it can’t find the var and what to do about it.

Can you help me understand what isn’t clear about the message as is, or maybe point out the ambiguity to someone who just isn’t seeing it? I want to write better error messages but I share the frustration of the above poster. The message tells you specifically what to do, but you’re coming back saying it’s not clear.


I think the original error is quite clear, under normal circumstances.

Not OP, but I've noticed that people often get brain fog when something goes wrong and often need BIG, SHORT WORDS to shake out of it. Or really anything that can shake them out of the 'idunno' state of mind.

But maybe if something like that became standard, it would no longer work as a context switcher...


I think you're spot on, and I made a similar comment above.

It's easy to say "they can figure it out". Sure, in a restful state. But the people we're asking to take action already have a lot on their plate. Using plain, conversational language whenever possible with exceedingly clear steps means less mental exertion on the receiver. And since we need their help, anything we can do to make it easier on their end helps us.


These responses fascinate me, because with the example given, my mind first went to someone for whom English is a second language. I could understand that group having trouble with this message, or at least I would have an easier time understanding it, even if only a little.

For someone who was born speaking English and spoke it their entire lives, the example provided couldn’t possibly be more to the point in my opinion.

That said, I agree with the general idea, and yes, there are some pretty baffling and downright awfully written error messages and log entries that take a minute to grok (I just don’t think the example replied to is one of them).


Conversational errors can also be fatiguing. Often what you want is something short and dry that can be pattern matched. Compilers are pretty good at this because all their errors start the same way.

    Error in file foo/bar.c, line 32, missing semicolon. 

No conversation needed. These can then be complemented with more conversational language on the next line to explain why the semicolon is needed. Rust is quite good at this.
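A sketch of that split, loosely in the spirit of the `help:` notes rustc prints: a short, fixed-shape first line that can be pattern-matched, and an optional conversational line after it. The function name and the exact line format are assumptions modeled on the example above, not any real compiler's output:

```python
import re
from typing import Optional

# Fixed first-line shape, so tools (and tired humans) can pattern-match it.
DIAGNOSTIC_RE = re.compile(
    r"^Error in file (?P<path>[^,]+), line (?P<line>\d+), (?P<summary>.+)\.$"
)


def format_diagnostic(path: str, line: int, summary: str,
                      explanation: Optional[str] = None) -> str:
    """Short, dry first line; optional conversational help on the next line."""
    head = f"Error in file {path}, line {line}, {summary}."
    if explanation is None:
        return head
    return f"{head}\n  help: {explanation}"
```

Because the first line never changes shape, `DIAGNOSTIC_RE` can pull out the path and line number whether or not the `help:` line is present.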


Then there were the delightful (no, I actually mean the opposite) errors that g++ emitted (back when I last wrote C++ and compiled with g++), where all I could basically conclude was "OK, there is an error that was detected at line L, in file F, and I think it may be a type error". So I'd recompile with clang to actually understand what the error was, so I could fix it.


Some people don't read anything that isn't an all-caps command. They have learned helplessness from seeing too much useless error text in the past.


There's a type of error for which the user can be given detailed step-by-step instructions (permission issues, etc). But to some extent, errors should handle situations the programmer didn't expect. If it is possible to provide detailed step-by-step fixes, then the program should do those steps itself.

Adding a URL might not be a great plan: you never know how long an old copy of a program will stick around, and you might not control that website forever.


I can't tell if this is sarcasm or not. This obviously highlights a deeper issue in developer culture.

The example given was clear compared to 90% of other error messages, and saying that it needs to be "more clear" is almost dismissive.


Don't blame developer culture. If that error cannot be acted on, attribute it to incompetence, not culture.


Some of the errors that Gentoo portage can encounter do exactly this - and they do it with beautiful terminal colors that make it easy to figure out what you need to run, or where to go to figure out which of the three options you need.

The problem comes when there's a wall of "useless" logging/error messages, and the last one, or one near the end, is the one that actually matters. You have to explicitly call it out on a clear screen and make it obvious, and even then, people won't always read it.


It more likely means that the developer views the service as OP's responsibility. They'll view an order as something OP needs to do.

The clarity of the error message doesn't really matter if the recipient believes it is intended for somebody else.


The problem is people are not rational… and we try to solve that with software.

Many people just lock up when software doesn’t do what they expect.


Irrational people must be fired from IT.


Generally a pipe dream in my experience.


Lots of people find ways to irrationalize being rational.



