Write better error messages

munk-a · on Oct 19, 2022

I don't disagree with any points, but they missed a big one. If at all possible, include an application-unique (or, if you can manage it, globally unique) error code with each of your errors, e.g. YCOM-HN-9021. When you provide a clearly googleable string, you help your users resolve the issue on their own, and you can also set up Google Alerts on the string: if you roll out a new feature that took 3 months to develop and a week later Google tells you that YCOM-HN-9021 is up 9000%, you probably broke something. If at all possible, make yourself open to client communication, but most users won't reach out about an error. Users have very low trust in customer care in the modern world (and it is, honestly, often more trouble than it's worth), and they are more likely to turn to Reddit or technical forums for a solution. It is extremely advantageous to try to track these users.
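A minimal sketch of that alerting idea, assuming each logged error already carries its code and you can count occurrences per code over two time windows (say, the week before and the week after a release). The function name `find_spikes`, the 10x factor, and the minimum count are hypothetical choices, not features of any particular monitoring product.

```python
from collections import Counter


def find_spikes(baseline: Counter, current: Counter,
                factor: float = 10.0, min_count: int = 20) -> list:
    """Return the error codes whose count in `current` is at least `factor`
    times their count in `baseline`, sorted alphabetically."""
    spikes = []
    for code, count in current.items():
        if count < min_count:
            continue  # too rare to be a meaningful signal
        previous = baseline.get(code, 0) or 1  # a brand-new code counts as a baseline of 1
        if count / previous >= factor:
            spikes.append(code)
    return sorted(spikes)


# Hypothetical counts for the week before and the week after a release.
before = Counter({"YCOM-HN-9021": 3, "YCOM-HN-1001": 50})
after = Counter({"YCOM-HN-9021": 270, "YCOM-HN-1001": 55, "YCOM-HN-7777": 4})
print(find_spikes(before, after))  # ['YCOM-HN-9021']
```

The minimum count keeps a code that goes from 0 to 4 occurrences from paging anyone; tune both thresholds to your traffic.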

franga2000 · on Oct 20, 2022

If you do this, please make it alphanumeric, like the above comment's example, and make sure your prefix is unique. Modern search engines, especially Google, are very bad at finding literal strings when searched without quotes (and many users don't know to add them), and they have all sorts of gotchas that make finding errors nearly impossible.

Some bad examples (both from Microsoft):

- "Code -4" (putting a minus in front of a word makes Google exclude that word from results)

- "0x00071153" (search engines love to omit the 0x and give you a bunch of phone numbers instead)
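One way to act on this is to lint codes before shipping them. This is a sketch under an assumed house format of PREFIX-AREA-NUMBER (like YCOM-HN-9021); the pattern and the function name are hypothetical, and the checks only encode the two pitfalls listed above.

```python
import re

# Assumed house format: an alphabetic product prefix, an alphabetic area tag,
# and a number, joined by hyphens, e.g. "YCOM-HN-9021".
CODE_PATTERN = re.compile(r"^[A-Z]{3,}-[A-Z]{2,}-\d{3,}$")


def search_problems(code: str) -> list:
    """List the reasons `code` is likely to search badly (empty list if none)."""
    problems = []
    if code.startswith("-") or " -" in code:
        problems.append("a leading minus makes Google exclude the term")
    if code.lower().startswith("0x"):
        problems.append("search engines tend to drop the 0x prefix")
    if not CODE_PATTERN.match(code):
        problems.append("does not follow the PREFIX-AREA-NUMBER format")
    return problems


for candidate in ["YCOM-HN-9021", "Code -4", "0x00071153"]:
    print(candidate, search_problems(candidate))
```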

briHass · on Oct 20, 2022

This one helped me recently to access my UPS MyChoice account. I got an error message on a login attempt, but really didn't give enough of a darn to waste my time calling their support (as the message recommended). Reddit was full of reports with a tip that it was related to a forced password reset where the old password doesn't meet the new complexity requirements, and I was able to download the mobile app and reset my password.

UPS owes that Reddit user a beer for helping out at least 30 people (judging by upvotes) at zero cost to themselves.

wsh · on Oct 19, 2022

IBM has done this for decades. For many products, especially on the mainframe, every message has a unique identifier, typically with a prefix for the module or subsystem that generated it, a message number, and an action or severity code, such as “I” for information or “E” for error.

A series of Messages and Codes books is supposed to list every message, each with an explanation and suggested user, administrator, or programmer response.
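A small sketch of that structure: a message ID made of a component prefix, a number, and a severity letter, backed by a catalog that holds the explanation and the suggested response. The three-letter/three-digit layout, the `XYZ` prefix, and the catalog entry are hypothetical; real IBM message IDs vary by product.

```python
import re
from dataclasses import dataclass

# Hypothetical layout: 3-letter component prefix, 3-4 digit message number,
# and a one-letter action/severity code (I = information, W = warning, E = error).
MESSAGE_ID = re.compile(r"^(?P<prefix>[A-Z]{3})(?P<number>\d{3,4})(?P<severity>[IWE])$")
SEVERITIES = {"I": "information", "W": "warning", "E": "error"}


@dataclass(frozen=True)
class CatalogEntry:
    text: str           # message template shown on screen
    explanation: str    # what the manual says happened
    user_response: str  # what the reader should do about it


# A tiny stand-in for a "Messages and Codes" book.
CATALOG = {
    "XYZ042E": CatalogEntry(
        text="Data set {name} not found",
        explanation="The program could not locate the named data set.",
        user_response="Check the data set name and rerun the job.",
    ),
}


def render(message_id: str, **fields: str) -> str:
    """Format a catalogued message, rejecting IDs that break the layout."""
    match = MESSAGE_ID.match(message_id)
    if match is None:
        raise ValueError(f"malformed message id {message_id!r}")
    entry = CATALOG[message_id]
    return f"{message_id} ({SEVERITIES[match['severity']]}) {entry.text.format(**fields)}"


print(render("XYZ042E", name="PROD.INPUT"))
print(CATALOG["XYZ042E"].user_response)
```

Because the ID is the catalog key, the on-screen line stays short while the full explanation lives in the documentation.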

This was more important when messages were practically limited to a single line on a CRT or teletypewriter, and customers had only the printed or microfiche documentation, not online help or websites, but it’s still valuable today.

randomswede · on Oct 20, 2022

Likewise Digital: all VMS error codes are "SUB-S-NAME", so a permissions error when dealing with a file would be something like "%RMS-E-NOPERM" (RMS here being Record Management Services rather than Richard M. Stallman). I think that even harks back to various OSes on the PDP-11, but cannot say for sure.

kjellsbells · on Oct 21, 2022

I'm always amused when I see the HN community get excited about something that the greybeards figured out 40 or 50 years ago. Turns out grandma knew what she was doing after all. If only there were a better way to transmit the wisdom of the sages to this generation without eye-rolling over IBM and DEC.

Now get off my lawn!

dejj · on Oct 20, 2022

Error -41: sit by a lake

https://xkcd.com/1024/

PaulHoule · on Oct 20, 2022

Another one is to generate a globally unique id for that failure. In a web application the user can share the id with you and you can look in the log and see the associated error.

cratermoon · on Oct 20, 2022

That seems like a good idea, but the problem is that any ID long enough to be sensibly globally unique is likely to be a long string of random-ish characters, and unless the end user has at least a screen cap – highly unlikely – there's about a 0.0001% chance you'll get the code correctly.

On the flip side, if you're willing to give up globally unique for "unique enough within a reasonable time frame", then you can go with just a few characters or even short words.
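A sketch of the "few characters" approach. It draws IDs from Crockford's Base32 alphabet, which leaves out I, L, O and U so that IDs survive being read aloud or retyped; the function names and the six-character length are assumptions. Six characters give 32^6 (about 1.07 billion) values, so by the birthday bound there is roughly a 50% chance of at least one collision after about 38,000 IDs. That is why this only works as "unique enough" within a time window, for example alongside a timestamp.

```python
import secrets

# Crockford's Base32 alphabet: digits plus letters, minus I, L, O and U.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def short_incident_id(length: int = 6) -> str:
    """Random, human-friendly ID with 32**length possible values."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def normalize(user_input: str) -> str:
    """Undo common transcription slips before searching the logs:
    case, spaces and hyphens are ignored, O reads as 0, and I/L read as 1."""
    cleaned = user_input.strip().upper().replace("-", "").replace(" ", "")
    return cleaned.translate(str.maketrans({"O": "0", "I": "1", "L": "1"}))


incident = short_incident_id()
print(f"Something went wrong (reference {incident}).")
print(normalize("ab-c0 l"))  # ABC01
```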

matsemann · on Oct 20, 2022

We did this: we showed the semi-unique request ID / correlation ID to the user, or included it in the response headers of our APIs. So when people contacted us with a screenshot or a dump of a failed request, it was easy to find the exact one in our logs.
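A framework-free sketch of the same idea: reuse an incoming request ID or mint one, tag the log entry with it, and return it both in a response header and in the user-visible message. The header name `X-Request-ID` is a widespread convention rather than a formal standard, and `handle` is a hypothetical stand-in for whatever middleware hook your framework provides.

```python
import logging
import uuid

log = logging.getLogger("api")
REQUEST_ID_HEADER = "X-Request-ID"


def handle(request_headers: dict, handler):
    """Run `handler()` (which returns (status, body)) and return
    (status, response_headers, body), using one request ID throughout."""
    request_id = request_headers.get(REQUEST_ID_HEADER) or uuid.uuid4().hex
    try:
        status, body = handler()
    except Exception:
        # The full traceback and the ID go to the log; the user only gets the ID.
        log.exception("request %s failed", request_id)
        status, body = 500, f"Something went wrong. Reference: {request_id}"
    return status, {REQUEST_ID_HEADER: request_id}, body


def broken_handler():
    raise RuntimeError("connection to db01 refused")


print(handle({}, broken_handler))
print(handle({"X-Request-ID": "abc123"}, lambda: (200, "ok")))
```

This also keeps the user-facing text generic while the log keeps the details, which addresses the security concern raised further down the thread.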

PaulHoule · on Oct 20, 2022

Yeah, just like there is a high fraction of people who can't see 3D movies there are a lot of people who can't cut and paste.

stormking · on Oct 20, 2022

Just use both. A globally unique but static "error code" to google plus a serial number of that distinct instance of the failure.

nicbou · on Oct 20, 2022

This is really important if you have multilingual software. It lets users find help in all the languages they understand, not just their OS language.
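A sketch of that arrangement, reusing the example code YCOM-HN-9021 from the first comment: the code stays identical across languages while the text is translated, so a German-speaking user can still search for answers written in English. The catalog structure and the function name are hypothetical.

```python
# Hypothetical catalog: one stable code per error, translated text per language.
MESSAGES = {
    "YCOM-HN-9021": {
        "en": "Could not save your draft: the server did not respond.",
        "de": "Ihr Entwurf konnte nicht gespeichert werden: Der Server hat nicht geantwortet.",
    },
}


def localized_error(code: str, lang: str, fallback: str = "en") -> str:
    """Return the message in `lang` (or the fallback language), always
    prefixed with the untranslated code so it can be searched in any language."""
    texts = MESSAGES[code]
    return f"[{code}] {texts.get(lang, texts[fallback])}"


print(localized_error("YCOM-HN-9021", "de"))
print(localized_error("YCOM-HN-9021", "fr"))  # no French text yet: falls back to English
```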

dale_glass · on Oct 19, 2022

I'll add a few for developer-oriented messages.

* Say what the program was trying to do.

* Make the message unique and searchable.

* Make it detailed.

* FFS, include the filename or whatever else the program is having trouble with.

* If possible, include the source code location.

* If possible, include useful contextual information.

* Quote strings. Once in a while, some unexpected whitespace sneaks in somewhere and this can be hard to figure out.

E.g., don't just abort with "Open failed: NOT_FOUND". Abort with "job.c:2105 Failed to open job description file '/var/spool/jobs/125.json' when processing job #5 for user 'alice': NOT_FOUND".

This way I don't have to strace the damn thing to figure out what it's looking for, and I know which user it was for, so I don't have to dig around trying to figure out which database entry might contain the wrong information.

Also, context-free, generic error messages are awful. A large enough codebase may be impossible to search for some very common keywords.

If possible, googleable error codes are great to have, but they shouldn't replace the error message. It's ideal if you can search the source code and instantly find where the error message originates.
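A sketch of a helper that follows several points of that checklist at once: it records the caller's file:line, quotes strings with `repr()` so stray whitespace shows up, and appends contextual key/value pairs. The name `describe_failure` and the exact layout are hypothetical.

```python
import inspect
import os


def describe_failure(action: str, target: str, reason: str, **context) -> str:
    """Build "file.py:LINE <action> '<target>' (k=v, ...): <reason>".

    The location is the caller's, so the message points at the failing
    call site rather than at this helper.
    """
    caller = inspect.stack(context=0)[1]  # context=0: skip reading source lines
    where = f"{os.path.basename(caller.filename)}:{caller.lineno}"
    details = ", ".join(f"{key}={value!r}" for key, value in context.items())
    suffix = f" ({details})" if details else ""
    return f"{where} {action} {target!r}{suffix}: {reason}"


print(describe_failure("Failed to open job description file",
                       "/var/spool/jobs/125.json", "NOT_FOUND",
                       job=5, user="alice "))
```

Note that the trailing space in `'alice '` is only visible because of the quoting.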

at_a_remove · on Oct 19, 2022

Yup, all of these. Sometimes I look "around" the problem, like, "I found THIS directory but the file 'z.txt' was not in it!" or "Not only could I not find 'z.txt' I could not find THIS directory it was supposed to be in." Check to see that it is really a file, not a directory. "I found 'z.txt' in THIS directory but it was zero bytes in length!"
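A sketch of "looking around" a missing input file with `pathlib`, producing the kinds of messages described above; the function name `diagnose` and the wording are hypothetical.

```python
from pathlib import Path
from typing import Optional


def diagnose(path: Path) -> Optional[str]:
    """Explain as specifically as possible why `path` is not a usable input
    file; return None if it looks fine."""
    folder, name = str(path.parent), path.name
    if not path.exists():
        if not path.parent.exists():
            return f"Could not find {name!r}, and its directory {folder!r} does not exist either"
        return f"Found directory {folder!r}, but {name!r} was not in it"
    if path.is_dir():
        return f"{str(path)!r} is a directory, not a file"
    if path.stat().st_size == 0:
        return f"Found {name!r} in {folder!r}, but it is zero bytes long"
    return None


print(diagnose(Path("/definitely/not/here/z.txt")))
```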

In terms of "fail early," my larger programs have a section called Pre-Flight Checklist, which looks for files (and that they are files), databases, that the databases have the expected tables and the correct columns, and so on. Are the files sufficiently recent? More or less the expected length? Because this is ETL stuff, it's usually okay to push this stuff up as early as I can.
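A sketch of such a pre-flight checklist for an ETL job, assuming a file input and a SQLite database (`sqlite3` stands in for whatever database you actually use). The function name, the 24-hour freshness limit, and the table/column layout are hypothetical. Every check runs and all failures are returned together, so the job can refuse to start instead of dying halfway through.

```python
import sqlite3
import time
from contextlib import closing
from pathlib import Path


def preflight(input_file: Path, db_path: Path, required: dict,
              max_age_hours: float = 24.0) -> list:
    """Run every check before the job starts and return all failures.

    `required` maps table name -> iterable of column names that must exist.
    Table names are interpolated into a PRAGMA, so they must come from
    trusted configuration, never from user input.
    """
    failures = []
    if not input_file.is_file():
        failures.append(f"input {str(input_file)!r} is missing or not a regular file")
    else:
        age_hours = (time.time() - input_file.stat().st_mtime) / 3600
        if age_hours > max_age_hours:
            failures.append(f"input {str(input_file)!r} is {age_hours:.1f} h old "
                            f"(limit {max_age_hours} h)")
    if not db_path.is_file():
        # Checked first because sqlite3.connect() would silently create an empty DB.
        failures.append(f"database {str(db_path)!r} not found")
        return failures
    with closing(sqlite3.connect(db_path)) as conn:
        for table, columns in required.items():
            # PRAGMA table_info yields one row per column; index 1 is the name.
            found = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
            if not found:
                failures.append(f"table {table!r} is missing")
                continue
            missing = sorted(set(columns) - found)
            if missing:
                failures.append(f"table {table!r} lacks columns {missing}")
    return failures
```

A job would call `preflight(...)` first and exit with the joined list of failures if it is non-empty.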

redact207 · on Oct 19, 2022

For SaaS products, do this plus use structured logging, so you don't have to grep-parse log messages when searching your log collectors.

I.e., put all the metadata/log context in a hashmap alongside the error message.
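A sketch with Python's standard `logging` module: context travels as a dict through `extra`, and a custom formatter (the `JsonFormatter` name is hypothetical) emits one JSON object per record so a log collector can filter on keys instead of parsing text.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object: the caller-supplied context
    dict plus fixed fields (which win if a context key collides)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = dict(getattr(record, "context", {}))  # e.g. job_id, user, path
        payload.update(
            level=record.levelname,
            logger=record.name,
            message=record.getMessage(),
        )
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)  # non-JSON values become strings


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("jobs")
logger.addHandler(handler)

# Context goes in a dict via `extra`, not into the message text.
logger.error("failed to open job description file",
             extra={"context": {"job_id": 5, "user": "alice",
                                "path": "/var/spool/jobs/125.json"}})
```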

chiefalchemist · on Oct 19, 2022

A couple+ years ago my then employer required I take (what amounted to) Security Training 101 for Software Developers. I believe one of the client orgs expected everyone to go through the program.

That said, pretty much everything you're suggesting was considered a bad idea (for security), mainly because the more details you give away, the more a hacker can learn about the underlying system. The more they probe and possibly break things, the more you're showing your cards.

It was then the bland cryptic error msg made perfect sense to me.

dale_glass · on Oct 19, 2022

Well, everything depends on context, of course.

I'm talking here mostly of user-facing local applications -- like what would be in your mail client's logs, or the logs of a corporate service, where the logs are there for the admin's/dev's use.

Of course if you're sending feedback to a potential attacker things change considerably.

chiefalchemist · on Oct 20, 2022

I understand. But I'm going to assume the rule would be: do X, no exceptions. As you know, doing sec means living with a healthy amount of paranoia. Imagine granting an exception and being wrong.

Sec = better safe than sorry.

m-p-3 · on Oct 19, 2022

I'll also add to make them easy to copy to clipboard in the case of a GUI-based program.

It's easier to search and store in an incident management system.

worik · on Oct 20, 2022

For developers, maybe. But they need logs, not error messages.

For users, it needs to be brief: is it fatal (restart), temporary (try again), or limited to just this part (do something else)?

Adding words is unhelpful. More information means less communication.

drpixie · on Oct 20, 2022

Yes to all - and also don't include boilerplate text, or at least limit it to the minimum polite for your audience.

So if you must show a stack dump:

- don't put in lots of whitespace - it might look pretty but it makes it harder to read/parse.

- if you're giving the error file:line, don't bother showing the source code. If the source is meaningful to the reader, they've probably got access to the code, or are using an IDE.

trinovantes · on Oct 20, 2022

One thing I've recently started doing when a file-related error happens is to rerun the command under strace and see which file the program is trying to access.

golergka · on Oct 19, 2022

Also, make sure that sensitive information like users' passwords, emails, credit card numbers, etc., is filtered out of the logs and not sent to your servers.
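A sketch of one safety net using a standard `logging.Filter`. The `RedactingFilter` name and the regex patterns are hypothetical and deliberately simple; tune them to the data your application actually handles, and treat this as a backstop rather than a substitute for not logging secrets in the first place.

```python
import logging
import re

# Hypothetical patterns; tune them to the data your application handles.
PATTERNS = [
    (re.compile(r"\b(password|passwd|secret|token)=\S+", re.IGNORECASE), r"\1=[REDACTED]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
    (re.compile(r"\b\d(?:[ -]?\d){12,15}\b"), "[CARD]"),  # 13-16 digits
]


class RedactingFilter(logging.Filter):
    """Scrub sensitive substrings from the fully formatted message."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # apply %-args first so arguments get scrubbed too
        for pattern, replacement in PATTERNS:
            message = pattern.sub(replacement, message)
        record.msg, record.args = message, None
        return True  # keep the record; only its text changes


handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())  # on the handler, so records from child loggers are covered too
logging.getLogger().addHandler(handler)
logging.getLogger("auth").warning("login failed for %s password=%s",
                                  "alice@example.com", "Dragons87!")
```

This only covers the message text; tracebacks and structured context fields need the same treatment.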

Minor49er · on Oct 19, 2022

At a previous job, writing unambiguous error messages was discouraged. Everything just had to be "Oops! Something went wrong".

The reasoning was that "users can't do anything with information we tell them anyways", despite the overwhelming number of help desk tickets we'd get from "Oops!" appearing in a million different scenarios with no clear way for us to tell what error actually caused the message to appear.

Users naturally report the messages they see because they're helping us find the problem. I didn't get why that was such a hard concept to understand.

wongarsu · on Oct 19, 2022

That seems like peak uselessness. Even "Error code 0x00ad4829" is a more useful message, because even if it's useless to the user it is useful to somebody.

cogman10 · on Oct 19, 2022

There is some logic to it: "you don't want to expose your internals". Really useful messages might contain a lot of details about the tech stack you use (giving a nice hint into which CVEs to try).

That said, this is an easily solved problem. The best solution is to aggressively log errors AND prioritize having dev teams push that error count to 0. If an error happens, it's a bug.

The next way to solve it is simply a report button. Let the users click an "I'm mad at you for not working" button and embed something like a session ID that allows internal queries into what went wrong.

Error codes are a terrible solution, but perhaps an OK option if this is not hosted software. That said, a more user friendly approach would be a QR code with all the relevant details embedded.

marcosdumay · on Oct 19, 2022

> Really useful messages might contain a lot of details about the tech stack you use (giving a nice hint into which CVEs to try).

Nope. Useful messages contain details about what your software does. Anything about your tech stack is redundant and can be removed.

> The best solution is to aggressively log errors AND prioritize having dev teams push that error count to 0.

Many errors can only be reproduced by talking to users. And in the cases where your dev team isn't capable enough to remove all errors, you will still want to provide customer support and workarounds.

> The next way to solve it is simply a report button.

A report button is good. But neither session ID nor any data that you can reasonably add to your logs will be enough to let dev know what went wrong. Besides, your report button will have errors too.

And anyway, everything you said applies exclusively to people who create web applications. Many other types of applications exist, and everybody writing them is better off not following any of your recommendations.

berkes · on Oct 19, 2022

Why are error codes a terrible solution? I'd rather have an error "bad request f12793b2" than "bad request". Obviously I'd prefer "bad request, 'expiresAt cannot be after 2022-12-19'. Code f12793b2".

Having a unique ID to search for in documentation or even source code is, IMO, preferable. It's still rather technical and helps only those who can search such docs, but at least it gives something unique to google/search for.

bigiain · on Oct 20, 2022

> The best solution is to aggressively log errors

Until one day you find some random dev is logging failed authentication attempts and including the email and password in the logs…

(and the most amusing part of that incident was tracking down the offender by finding the earliest of those log lines and getting his real email password out of them… “Hey Phil, what’s ‘Dragons87!’?” “Ummm, what? That’s, errr, my gsuite password. How did you know?”)

slavik81 · on Oct 19, 2022

This seems to be the approach that Android takes. If you try to connect to a WiFi network and it fails, it just gives up. It won't tell you why it failed. This makes it very frustrating to figure out what's wrong. Maybe I wouldn't understand the error message, but at least it would provide a starting place for me to look up more information or ask for help from someone knowledgeable.

jrochkind1 · on Oct 19, 2022

> The reasoning was that "users can't do anything with information we tell them anyways",

I mean, I feel like the focus of the OP was on giving them something they could do something with. Like the information that their information was not lost; and the recommendation to change X or try again in Y way; and the fallthrough to contact customer support with a quick link.

The OP was definitely not recommending giving more specific technical info without thinking about what the user could do with it, but instead specifically thinking about what the user could do or would want to know (about their data/account, not about your under-the-hood services), and giving info to that end.

MattGaiser · on Oct 19, 2022

I have only worked at one place that wanted informative error messages.

All the others wanted to hide the reason, whether because "if we know the reason and tell the user, we seem incompetent", or because "then hackers will know which API call isn't working right" (apparently the network console in Chrome is beyond hackers), or because they wanted customers to be dependent on the support they paid for.

rockemsockem · on Oct 19, 2022

People who don't know anything about computer security use it as a bludgeon to not do the thing that they didn't want to do anyway.

m-p-3 · on Oct 19, 2022

The epitome of uselessness: making an error message so "user-friendly" that it doesn't help anyone.

At least a "Details" button to unmask the technical details would be useful in some way, while hiding the "ugliness" to the end-user.

vasergen · on Oct 29, 2022

The error the user sees and the one you log shouldn't be the same: you can still log complete information about an error while the user sees only "Oops, something went wrong".

PetahNZ · on Oct 19, 2022

As long as you are logging the error with the context somewhere that's fine. You could always include a timestamp or request ID with the user message to not give away information, but be able to easily search your logs for the occurrence.

Stratoscope · on Oct 19, 2022

20 years ago I was working on Acrobat at Adobe. I was mostly the "Windows guy" but also worked and tested on the Mac.

When I tried to install Acrobat on my Mac, I got this message:

"Your hard disk is too small"

My what is too small?!

Later, on Windows I got this unexpected popup:

"You are not here"

WTF?

I searched the code for that string and found it in a function named "CantHappen()". This function was called in numerous places where the programmer thought there was no possible way for the code to get to that place. But of course CantHappen() did happen.

As I looked through the code I found many other messages that were bizarre and incomprehensible and sometimes downright offensive.

So I started a project to go through all our messages and make them more clear and informative - and even better, when possible to not have the message at all but just take care of the situation.

The underlying cause of these bad messages was twofold:

1. Programmers never got raises for writing great error messages or finding ways to avoid them in the first place. We were just rated on how much work we got done.

2. We did have a product designer who was supposed to specify all user-facing messages. But the designer mainly considered the "happy path" and didn't think about edge cases. It was left to developers working under time pressure to handle those.

TacticalCoder · on Oct 19, 2022

> Later, on Windows I got this unexpected popup:
>
> "You are not here"

The absolute best I had in a Microsoft product was this (paraphrasing): "An error happened because your computer may be turned off". I still have a screenshot of that somewhere. What it meant was that a hypothetical computer I might have been trying to connect to (I wasn't; it was all local) was off, but that wasn't the case. This was seriously WTF.

The second most beautiful one, from another Microsoft product, was some software generating a password and asking me, in a pop-up window, to write it down. The problem was that the password was something like:

    9mZOvy9E(4)?6b(w(<$KcTU%>9T6cz0Z4YxgQ-<tw035X6S.dLE0[2n0"42`/S=S1{q5{)61s190':&6UHT.4hZXjO6b%l#X7v]~4tIT2Y0._ebFH,>2:G>%*P]7n4"

I probably also still have a screenshot of that somewhere.

Haven't used Microsoft stuff in two decades so it was a long time ago. But it's still seriously WTF.

layer8 · on Oct 19, 2022

The best error message on Windows is “The operation completed successfully”: https://www.google.com/search?q=the+operation+completed+succ...

…which is the text for result code zero, which is used to mean “no error”.

A_Venom_Roll · on Oct 20, 2022

Long time ago I ran into a "catastrophic failure" in Word, which sounds quite serious.

bigiain · on Oct 20, 2022

“Guru Meditation Error”

redbell · on Oct 20, 2022

There's also that hilarious Windows Phone error that prompts users to insert their Windows installation disc and restart their "computer".

im3w1l · on Oct 19, 2022

The paradox of CantHappen is that if the programmer truly thought it can't happen then there would be no need for it in the first place. The only reason to include it is because of a fear that it may in fact happen.

Funnily enough, Rust has unreachable!() for that case, but it also has std::hint::unreachable_unchecked() for code that is actually unreachable. Reaching the latter is undefined behavior; it exists to help the optimizer.

rablackburn · on Oct 19, 2022

I’m guilty of writing a “can’t happen” branch (as I’m sure many of us are).

I stripped it out before production after verifying it couldn’t actually happen, but it was something like an artefact of my thinking process while writing the code.

It feels like a kind of assertion of underlying assumptions, and I know enough to know I’m fallible.

I’m always careful to make the error message something reasonable if it ever did actually come up in a tech demo or something though. Anything else is tempting the Fates.

Aeolun · on Oct 20, 2022

I don’t understand why the message would be ‘Can’t happen’ in the first place.

I’d always make it something like “If you see this, something entirely unexpected went wrong.”

Showing and triggering the error is helpful in itself since it’ll generate a trace and all the attendant stuff.
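A Python sketch in the spirit of Rust's unreachable!() and the wording suggested above: the "impossible" branch raises an error that says what was assumed and which value actually arrived. `JobState`, `cant_happen`, and `UnexpectedStateError` are hypothetical names. Raising explicitly, rather than using `assert`, keeps the check in place when Python runs with `-O`, which strips `assert` statements.

```python
from enum import Enum


class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"


class UnexpectedStateError(AssertionError):
    """Raised on code paths the author believed were unreachable."""


def cant_happen(assumption: str, value: object) -> UnexpectedStateError:
    # Say what was assumed and what was actually seen, not "You are not here".
    return UnexpectedStateError(
        f"Internal error: reached a branch believed to be unreachable "
        f"({assumption}; got {value!r}). If you see this, please report it."
    )


def next_state(state: JobState) -> JobState:
    if state is JobState.QUEUED:
        return JobState.RUNNING
    if state is JobState.RUNNING:
        return JobState.DONE
    if state is JobState.DONE:
        return JobState.DONE
    raise cant_happen("next_state handles every JobState", state)


print(next_state(JobState.QUEUED))
```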

cratermoon · on Oct 20, 2022

"lp0 on fire" and "PC LOAD LETTER" are two good ones, too.

giancarlostoro · on Oct 19, 2022

What does unreachable!() do, actually? I had no idea that was a thing.

maleldil · on Oct 19, 2022

It terminates the program with panic!

https://doc.rust-lang.org/std/macro.unreachable.html

giancarlostoro · on Oct 20, 2022

So is the usual flow to throw these in places where code shouldn't execute but then create tests to try and trip it up to see if that is truly the case? I would hate to be running a release build with this, or does the compiler do something different depending on build type?

maleldil · on Oct 21, 2022

> I would hate to be running a release build with this

The usual argument is that the program would be in an invalid state if the condition were reached, so the only option is to crash. If it turns out to be a valid state, then the programmer can handle it in a branch. I don't think tests would catch this, because they would be operating under the same assumption that such states cannot exist. Maybe fuzz testing could surface an issue like this.

remram · on Oct 19, 2022

Rust has a few of those, they all panic but with different default messages: panic!(), todo!(), unimplemented!(), and unreachable!()

andirk · on Oct 20, 2022

ESLint complains if a `switch` statement has no `default` case (at least with the `default-case` rule enabled). Sometimes a default case can't actually be reached, but I've been trained to add it regardless. While developing, I add a message like "not possible", and sure enough it gets hit once in a while in dev due to something I didn't consider.

grandinj · on Oct 19, 2022

Probably just me, but I am less concerned with how good my error messages are, and more concerned with trying very very hard to make the errors happen closer to the cause of the problem, rather than further away.

"Fail early, fail hard"

i.e. if I can make the error message happen near the beginning of a process, I can get away with making it a hard error.

Hard errors in the middle of a multi-hour operation tend to annoy people.

Merad · on Oct 19, 2022

This is an attitude I really try to build up in junior devs. Soooo many people seem to default to writing code like "if input is null return null" (when input should never be null) or "if valueThatShouldBePositive < 0, silently skip the code that was going to use the value". If the app detects that something is in an invalid state, _I want it to break_. The worst problems to debug are the ones where you have to work backwards through miles of strange behavior and corrupted data to find the root cause, because the program tried valiantly to soldier on long after it had been shot through the heart with bad data.
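A minimal sketch of the fail-fast style, using a hypothetical `apply_discount` function: instead of returning None or skipping work when an argument is invalid, it raises immediately with the offending value in the message. In Python, `TypeError`/`ValueError` play roughly the role that `ArgumentNullException` and friends play in .NET.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Return the discounted price, refusing invalid input up front."""
    # Lenient style (avoid): `if price_cents is None: return None` lets bad
    # data travel further before anything notices.
    if price_cents is None:
        raise TypeError("apply_discount: price_cents must not be None")
    if price_cents < 0:
        raise ValueError(f"apply_discount: price_cents must be >= 0, got {price_cents}")
    if not 0 <= percent <= 100:
        raise ValueError(f"apply_discount: percent must be in [0, 100], got {percent}")
    return price_cents * (100 - percent) // 100


print(apply_discount(1000, 15))  # 850
```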

I guess this is because no one really teaches error handling. I assume a lot of students end up with a mindset of "just make the errors go away" instead of "deal with the errors effectively".

S201 · on Oct 19, 2022

Agreed; I've often wondered if this is a result of early CS classes usually expecting students to handle weird/bad inputs. It's only natural for a programmer to want to write a program that gracefully handles all reasonably bad inputs, like nulls. So we're taught early on to write defensive code that handles those. And that's fine when you're writing short, academic programs. But when the complexity goes up by a few orders of magnitude trying to gracefully handle that null value 10 levels deep in some parsing logic maybe isn't the best thing to do. Old habits die hard, however.

nightpool · on Oct 19, 2022

Yeah, this is a great point. Both overly defensive programming and (my personal least favorite) overly-commented code are instilled in students at a very early point in their careers by irresponsible teachers trying to find something to grade students on (Didn't handle negative values? 5 points off! Didn't leave a comment on every line? 1 point off per line!)

mdtusz · on Oct 20, 2022

I think this is a symptom of using weakly typed languages as well. If your argument types are declared to be options/eithers, then you need to handle the empty case, but usually it's easier and better to just move that optional handling further up the callstack or type system.

A lot of `if (input == null)` checks are because you're just not sure whether the argument being passed in will have a value, and it's too much work for your small feature PR to refactor the whole codebase to resolve it.

Use typescript/python-with-mypy/haskell/rust/whatever and this problem mostly disappears.

Merad · on Oct 20, 2022

> A lot of `if (input == null)` checks are because you're just not sure whether the argument being passed in will have a value, and it's too much work for your small feature PR to refactor the whole codebase to resolve it.

Null checks are totally fine, but it should be clear whether or not null is a valid input to the method. If the answer is 'no' then you should throw ArgumentNullException (or whatever's appropriate for the language), not silently ignore the bad input.

elboru · on Oct 19, 2022

When I was a jr dev, getting exceptions was synonymous with “me messing something up”. Null exceptions were especially annoying, so the naive approach is to check for nulls and skip the code that would cause the exception. And it “works”! You don’t get exceptions, and your code keeps running. It’s only when you have to fix difficult bugs by going through logs that you understand the value of having the right exception with the right message. And you learn to love them and start caring about them.

vbezhenar · on Oct 19, 2022

Exactly. Software must crash as soon as possible and include some context information which is necessary to further debug the problem.

lupire · on Oct 19, 2022

After it fails fast (thank you!), we also want to fix fast. So we need info.

wavesquid · on Oct 20, 2022

With the important corollary that you need to check for the erroneous condition both early and late.

Otherwise people start, e.g., checking in the frontend and not enforcing it in the backend in the worst case, or introduce TOCTOU bugs in the best case.
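A sketch of the TOCTOU half of that point, with hypothetical function names: checking for a file and then opening it leaves a window in which the file can vanish, so the sturdier pattern is to attempt the operation and report its failure with context, keeping the OS error as the cause.

```python
from pathlib import Path


def read_config_racy(path: Path) -> str:
    # Time of check...
    if not path.exists():
        raise FileNotFoundError(f"config file {str(path)!r} not found")
    # ...time of use: another process may delete or replace the file in between.
    return path.read_text()


def read_config(path: Path) -> str:
    # Let the operation itself be the check; add context, keep the original error.
    try:
        return path.read_text()
    except FileNotFoundError as err:
        raise RuntimeError(f"config file {str(path)!r} not found") from err
```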

tiborsaas · on Oct 19, 2022

That's not really a respectful practice. Error messages should be clear and actionable.

Users don't care if you consider an error soft or hard.

madeofpalk · on Oct 19, 2022

I think the point is that the higher up you fail, the harder it is to identify why you errored in order to give the user clear and actionable feedback.

tiborsaas · on Oct 19, 2022

That's possibly indicating a bad UI / information architecture if you are unable to tell that.

llanowarelves · on Oct 19, 2022

When you have nested exceptions being caught and rewrapped by other handlers, how do you determine which level is correct to show the user? Especially when it's a service class or something used by a lot of calling code.

It's implied that it would be the upper top-most exception handlers in that code path but those are gonna be more generic in their messages, and anything more detailed has to be manually wrapped to add useful description (that's not some internal developer exception).

Error codes may be the least bad solution, to fallback on.
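In Python, one answer is exception chaining: each layer wraps the lower-level exception with its own context using `raise ... from err`, so the top-level handler can show a short summary and code while a details view lists the whole chain. `ServiceError`, `load_profile`, `error_chain`, and the code `PROF-LOAD-01` are hypothetical.

```python
class ServiceError(Exception):
    """Top-level error carrying a user-facing summary and a searchable code."""

    def __init__(self, summary: str, code: str):
        super().__init__(f"{code}: {summary}")
        self.code = code


def load_profile(user_id: int) -> dict:
    try:
        raise ConnectionError("db01:5432 refused connection")  # stand-in for a real DB call
    except ConnectionError as err:
        # Wrap with context instead of swallowing; `from err` sets __cause__.
        raise ServiceError(f"could not load profile for user {user_id}", "PROF-LOAD-01") from err


def error_chain(exc) -> list:
    """Flatten an exception and its causes, outermost first."""
    chain = []
    while exc is not None:
        chain.append(f"{type(exc).__name__}: {exc}")
        exc = exc.__cause__ or exc.__context__
    return chain


try:
    load_profile(7)
except ServiceError as exc:
    print(f"Could not load your profile (code {exc.code}).")  # what the user sees
    for line in error_chain(exc):                            # what "Details" reveals
        print("  " + line)
```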

tiborsaas · on Oct 19, 2022

It's hard to give a generic answer to this. I just see way too many bad error messages that could be solved with a little more thought and copywriting skills.

Error messages are part of the user experience and they should not be an afterthought.

If errors are nested, list them all. Give generic feedback, and also provide a technical explanation that would help debugging. Most importantly, we should make the user feel safe and in control as much as possible.

llanowarelves · on Oct 19, 2022

I actually do like "collecting" the errors when possible, and having them return in the API response (for example). Instead of the common pattern where there's just singular "error:" in the top-level json.

Works great for things like validation.
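A sketch of that pattern for a JSON API, assuming a hypothetical order payload: every check runs, and the response carries an `errors` list with one entry per field instead of a single top-level message. Status 422 is one common choice for validation failures; 400 is also widely used.

```python
import json


def validate_order(payload: dict) -> list:
    """Return every validation problem, not just the first one found."""
    errors = []
    if not payload.get("customer_id"):
        errors.append({"field": "customer_id", "message": "is required"})
    quantity = payload.get("quantity")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int) or quantity < 1:
        errors.append({"field": "quantity",
                       "message": f"must be a positive integer, got {quantity!r}"})
    email = str(payload.get("email", ""))
    if "@" not in email:
        errors.append({"field": "email",
                       "message": f"does not look like an address: {email!r}"})
    return errors


def respond(payload: dict):
    errors = validate_order(payload)
    if errors:
        return 422, json.dumps({"errors": errors})
    return 200, json.dumps({"status": "accepted"})


print(respond({"quantity": 0, "email": "alice.example.com"}))
```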

P5fRxh5kUvp2th · on Oct 21, 2022

Fail Fast means your logging infrastructure is going to report to you more quickly to get the problem fixed.

As opposed to 6 months down the road, when someone finally notices an uptick in customer complaints and the set of potential problem sites is literally the entire software stack.

Fail fast is how stable software is made; the question is whether or not you think customers appreciate stable software.

hprotagonist · on Oct 19, 2022

I would, if I had any evidence at all that they would be read and acted on. I’m convinced even seemingly competent people are just rendered contextually blind by the appearance of any error at all.

In the past month, I’ve had about a dozen interactions like this:

  developer: your service crashed, here’s a screenshot of the last 5 lines of the crash

  me: do you see where the final text you just pasted is “RuntimeError: Did not find ENVVAR, ensure this is set to the proper value (see <internal wiki link>) and then restart this service”

  developer: yeah?

  me: well, did you do that thing?

  developer: what thing?

  me: <headdesk>

and this at work, where the developer in question is intimately acquainted with the context and purpose of the project.
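Since people often paste only the last few lines, one small mitigation is to fail at startup with a message whose final line is the fix. A sketch with a hypothetical `require_env` helper; the variable name and wiki URL in the usage comment are placeholders.

```python
import os


def require_env(name: str, docs_url: str) -> str:
    """Return the variable's value, or exit with the fix as the last line."""
    value = os.environ.get(name, "").strip()
    if not value:
        # SystemExit with a string prints it to stderr and exits with status 1.
        raise SystemExit(
            f"Missing required environment variable {name!r}.\n"
            f"FIX: set {name} (see {docs_url}) and restart this service."
        )
    return value


# Hypothetical usage at service startup:
# API_TOKEN = require_env("API_TOKEN", "https://wiki.example.com/setup#api-token")
```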

nkrisc · on Oct 19, 2022

The goal of writing better error messages isn't to help the people who never read error messages, it's to help the people who do and who you never have to hear from.

marklubi · on Oct 19, 2022

The trick that I've found is that each error message needs to be unique... not just the stack trace, but the actual wording of the message leading up to that.

Get a screenshot or the exact verbatim text of it, and you can identify exactly where in the code it originated.

User reports are unreliable, but when I can pinpoint where the message originated from, it massively cuts down on the troubleshooting time.

Too · on Oct 19, 2022

About that, the number of developers that can’t read, or even understand the value of, a stack trace is also astonishing.

If only I had a penny every time someone sent me a “log of the error”, that only contains the final line with the unhelpful message saying nothing but KeyError.

lamontcg · on Oct 19, 2022

At prior work we removed stack traces from the default error output because it was thought to "scare" too many users.

Then for years, almost without fail, when an error was pasted into a GH issue it would include the big "If submitting a bug report, please include the full stack trace at /var/log/stacktrace.out" message, without the stack trace. I surrounded it with whitespace and put it in all caps, and still nobody read it.

vladvasiliu · on Oct 19, 2022

Forget stack traces.

I've met multiple "web developers" (actually working on the backend or "full-stack", building API servers and whatnot) who came complaining that this or that server was "unreachable" and asked me to check whether it was up or whether the firewall let them through. Only to find they were getting HTTP 404 errors or the like, which were explicit in the errors they'd show me.

alisonatwork · on Oct 19, 2022

A useful thing here is not just to include a unique error code for the type of error (usually numeric), but also to generate some kind of short Base32 or similar hash and print that right next to the error message while logging it to your normal back end. Then whether people send you a screen shot, copy/paste, whatever, you can easily search the logs to find the exact event that occurred.

mceachen · on Oct 19, 2022

Better still: add a unique prefix to the error code, so it's googlable.

The Typescript team does this with compilation errors, like `TS12345: frobulating types cannot be transmuted`.

rmetzler · on Oct 19, 2022

Yes, that type of thing is pretty useful for linters. These error codes act as identifiers if you need to google them and whenever you need to configure the linter the way you like it or for one-off exceptions.

lucb1e · on Oct 19, 2022

> each error message needs to be unique

Include random numbers. "Error 7743929" is super easy to track down (grep -r 7743929 takes 2 seconds to type), you don't need a NATO alphabet to understand what they're saying on the phone in order to be able to search it correctly, its general purpose is understood internationally, and it won't change between versions (like when you'd encode a file name and line number, for example). When I first figured this out at, idk, 17 years old and mentioned the idea in a game making forum, people called me crazy, but I still use it and don't know of any better system.

Of course, this is alongside an actual error message to help the user help themselves. This is just to trace the line where it originated, which already helps a lot for small software projects like I make.
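A rough Python sketch of the random-number approach. The helper names are hypothetical, as is the assumption that source files match `*.py`. You pick a number that does not yet appear anywhere in the codebase, paste it into the raise site once, and never change it. `grep -r` then lands on exactly that line.

```python
import pathlib
import random


def read_sources(root, pattern="*.py"):
    # Concatenate every matching file under root; the *.py pattern is an assumption.
    return "\n".join(
        p.read_text(errors="ignore")
        for p in pathlib.Path(root).rglob(pattern)
        if p.is_file()
    )


def fresh_error_id(corpus, digits=7):
    """Return a random number with `digits` digits that does not occur in corpus."""
    while True:
        candidate = random.randint(10 ** (digits - 1), 10 ** digits - 1)
        if str(candidate) not in corpus:  # so `grep -r <number>` lands on one site
            return candidate


def load_config(path):
    try:
        return pathlib.Path(path).read_text()
    except OSError as exc:
        # The number was generated once, pasted in, and stays fixed across versions.
        raise RuntimeError(f"Error 7743929: could not read config file {path!r}") from exc
```

When writing a new raise site, call something like `fresh_error_id(read_sources("src"))` once and paste the result in as a literal.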

legulere · on Oct 19, 2022

In RFC 7807 all errors get a unique URI. Message texts might change or be translated into a language you don’t understand.
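Here is a minimal Python sketch of an RFC 7807 "problem details" body, which is served with the `application/problem+json` media type. The member names (`type`, `title`, `status`, `detail`, `instance`) come from the RFC. The values mirror the RFC's own out-of-credit example. The helper function itself is just an illustration.

```python
import json


def problem_details(type_uri, title, status, detail, instance=None):
    """Build an RFC 7807 problem object as a JSON string."""
    body = {
        "type": type_uri,  # stable URI identifying the problem *type*
        "title": title,    # short summary; should not vary between occurrences
        "status": status,  # the HTTP status code, repeated for convenience
        "detail": detail,  # explanation specific to this occurrence
    }
    if instance is not None:
        body["instance"] = instance  # URI reference for this specific occurrence
    return json.dumps(body)


print(problem_details(
    "https://example.com/probs/out-of-credit",
    "You do not have enough credit.",
    403,
    "Your current balance is 30, but that costs 50.",
    "/account/12345/msgs/abc",
))
```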

shadowgovt · on Oct 19, 2022

It turns out translating error messages is controversial.

Users, upon hitting an error, often go check Stack Overflow. If you localize your error messages, you Balkanize the collective wisdom on how to address the error (which will always be larger than your team's ability to troubleshoot errors and offer correctives in your documentation and FAQs).
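One common compromise, sketched here as an assumption rather than anything proposed in the thread: translate the message but always print the stable, untranslated code next to it. Searches from every language then converge on the same string. The catalog contents and the `ACME1042` code are hypothetical.

```python
# Hypothetical message catalog keyed by a stable, untranslated code.
CATALOG = {
    "en": {"ACME1042": "Could not connect to the license server."},
    "de": {"ACME1042": "Verbindung zum Lizenzserver fehlgeschlagen."},
}


def render_error(code, locale):
    # Fall back to English if the locale or the translated message is missing.
    text = CATALOG.get(locale, {}).get(code) or CATALOG["en"][code]
    # The code itself is never translated, so everyone searches the same string.
    return f"{code}: {text}"


print(render_error("ACME1042", "de"))
```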

BerislavLopac · on Oct 19, 2022

To be precise, each error type gets a unique URI.

A good way to take advantage of that is to have a central database of all error types, but not many companies bother to do that.
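A minimal in-process sketch of such a registry in Python, with hypothetical codes and URLs. A real central database would live outside any one service. The point is that every error type is declared once, duplicates are rejected, and the same data can generate a public reference page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorType:
    code: str
    title: str
    doc_url: str


# Hypothetical central registry: one entry per error type.
_REGISTRY = {}


def register(code, title, doc_url):
    if code in _REGISTRY:
        raise ValueError(f"duplicate error code {code}")  # codes must stay unique
    _REGISTRY[code] = ErrorType(code, title, doc_url)
    return _REGISTRY[code]


def lookup(code):
    return _REGISTRY[code]


def render_index():
    # The same data can generate a reference page listing every error type.
    entries = sorted(_REGISTRY.values(), key=lambda e: e.code)
    return "\n".join(f"{e.code}  {e.title}  {e.doc_url}" for e in entries)


OUT_OF_CREDIT = register("ACME2001", "Not enough credit", "https://example.com/errors/ACME2001")
```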

mi_lk · on Oct 19, 2022

> have a central database of all error types

Do you have any examples?

zem · on Oct 19, 2022

here's ours for pytype (a python type checker): https://google.github.io/pytype/errors.html

dylan604 · on Oct 19, 2022

I used to lean on line numbers, but those quickly fall out of sync with deployed code and with what's currently checked out and available for immediate debugging. I've also switched to using the unique text you mention, as it will always find the place in the code regardless of whether it has been moved.

I wish I had learned that earlier than I had.

EvanAnderson · on Oct 19, 2022

I am reminded of the classic non-intuitive survivorship bias example from WWII re: armoring bombers: https://en.wikipedia.org/wiki/Survivorship_bias#In_the_milit...

vkou · on Oct 19, 2022

Or, in the anecdote above, to help yourself, when you are inevitably contacted by the person who never reads error messages.

ajnin · on Oct 19, 2022

How many interactions didn't you have, because the developer read the error message, read the Wiki, and ultimately solved the issue themselves?

outworlder · on Oct 19, 2022

I have managed to get a lot of notoriety in my company by just:

1. Paying attention to error messages

2. Reading documentation

3. Looking up stuff I don't fully understand (including googling error messages)

That's it.

Some people don't even read error messages at all. I understand non-technical people doing that, but I've seen far too many engineers doing it. If anything doesn't go exactly as expected, they freeze. I have no idea how a person gets so far in their career without reading error messages. Actually, I do: those people ask others to figure out stuff for them. That's way prevalent in enterprise settings. Sure, collaboration is good, but I've seen a lot of instances where there's a massive imbalance – you'll have 10 people pinging a single person to 'unblock' them. They could have spent a couple of minutes trying to figure it out themselves.

I'll move mountains to help someone that comes to me after having done some basic homework to try to fix (or at least triage) an issue. It's very rare, though.

It's also amazing how many people will just go ahead without having read a single line of documentation of the thing they are working on. I've even had a developer dive into a Golang codebase without having _ever_ worked in the language. That would have been fine – that's how I learn new languages, just getting accustomed before doing some more formal training and exercises – except that he continued to not read the language documentation before asking a bunch of questions. Needless to say, the questions weren't good.

And number 3... just rubber ducky everything. If you can't explain it, you don't get it. Go read up on the topic. Sometimes I'll find out that I don't fully understand something as I'm writing an email to others.

tetha · on Oct 19, 2022

> I'll move mountains to help someone that comes to me after having done some basic homework to try to fix (or at least triage) an issue. It's very rare, though.

These are rare, but they also tend to be the really effective ones. We have a couple of teams who understand the stack, read documentation and read error messages. We generally don't hear of them for months and months, because they are too busy being productive.

But when we hear of them, it's usually time to push boundaries of the infrastructure and the processes. They tried everything and nothing worked and now it's time to make it work.

vladvasiliu · on Oct 19, 2022

> I'll move mountains to help someone that comes to me after having done some basic homework to try to fix (or at least triage) an issue. It's very rare, though.

This. I actually am OK with people not figuring out even basic stuff. But please, at least try to give the impression that you've put some effort in, instead of just trying to have me do your homework while you browse facebook or whatever.

dan_mctree · on Oct 19, 2022

> except that he continued to not read the language documentation before asking a bunch of questions

Can't really blame people for that too much; most language documentation is utterly unreadable unless you already know exactly what you're doing. And even if you do get it, it's in one eye and out the other. Most people just don't learn very well from reading technical information they don't need to use right away. You might be a happy exception who got to build up your notoriety that way.

samus · on Oct 21, 2022

This doesn't hold for any popular language. For those, a bazillion tutorials in various formats, books and example projects have been written.

Language documentation is for looking up nitty-gritty details. You go there if you already know what you're looking for. It works for some languages and for some people, but reading it from top to bottom is usually a horrible way to learn a programming language.

bonoboTP · on Oct 19, 2022

It's error message blindness, similar to ad blindness. Even if you make a great banner ad with some very useful information, or the perfect and affordable product for my life, I won't see it, because I mentally filter out ads since they are junk most of the time.

Some people develop the same with relation to error messages because most of them are not actionable, other than "stuff broke somehow, [gibberish] blabla". Even if your error message is impeccable, it's in the class of things that are noise.

If you come up to me at some busy tourist location, where I'm used to lots of scammers, I won't listen to you even if you are actually a nice person and just want to have a nice chat and we would be compatible friends.

Often it is a good strategy to just ask people. Documentation and comments get out of date very fast. If you are the kind of person who reads everything meticulously and googles around, reads manuals etc. you may be wasting a lot of time. Of course there is a right balance to find. Some people err too much on the side of not thinking themselves and immediately asking for handholding, but overall it's often the right thing to do.

In many cases I found that trying to reason out what was going on was hopeless, because when I eventually gave up and asked someone, it turned out that the solution was unguessable, something like "ah of course, that thing is out of date, do this magic incantation, then this and that, yeah we should update the docs sometime!".

A lot of knowledge is locked up inside people's brains and just spreads around as "rumors" on the grapevine. Is that state of affairs ideal? No. But it's realistic and people are going to adapt by asking first, thinking second.

BlargMcLarg · on Oct 19, 2022

Asking people is mostly a bad habit from a culture too ingrained in the whole 'ask first' thing, and oftentimes it is the people trying to help who are to blame.

I had this recently. Many individuals like to play hero and make sure I don't get stuck because their business is an undocumented mess. Before I've even read the thing and tried it, they're already trying to give me the answer. When I ask 'is this documented, and if so, how would it be discovered easily?', their first reaction is 'no', followed by a lengthy explanation which should be in the wiki and easy for newcomers to find.

And it shows when I forget a few days later, because my brain never put in the effort to get to the answer and my memory is that of a fruit fly.

Jiro · on Oct 19, 2022

There's also the situation where the program creator likes changing functionality on a whim, and every time you google up your problem, you find a solution for a version of the software that doesn't have the particular menu or whatever that you had the problem with.

(This is a big problem if you've ever had a problem with Android.)

zagrebian · on Oct 19, 2022

This just means that the error message needs to be more clear. For example, after the error itself, it could give direct advice: “PERFORM THESE STEPS: You must define ENVVAR. Go to <wiki link>. Set ENVVAR to a proper value and restart the service.”

Notice the direct language. It reads like an order. The less direct the message, the higher the chances that the user will not act upon it.
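A minimal Python sketch of an error message in this style. It states what is wrong first, then gives the fix as numbered imperative steps. The variable name and wiki URL are placeholders standing in for the `ENVVAR` and `<wiki link>` in the comment above.

```python
import os


def require_env(name, wiki_url):
    value = os.environ.get(name)
    if not value:
        # State what is wrong first, then give the fix as numbered, imperative steps.
        raise RuntimeError(
            f"{name} is not set.\n"
            "PERFORM THESE STEPS:\n"
            f"  1. Go to {wiki_url}\n"
            f"  2. Set {name} to a proper value.\n"
            "  3. Restart the service."
        )
    return value
```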

mariusmg · on Oct 19, 2022

>it could give direct advice: “PERFORM THESE STEPS: You must define ENVVAR. Go to <wiki link>. Set ENVVAR to a proper value and restart the service.”

Really, should logs also be documentation now? Just mindlessly logging the same "advice" over and over again each time the error happens?

ddulaney · on Oct 19, 2022

Logs can definitely be a form of documentation.

I write software that is generally run low in the stack, quietly doing some mundane tasks that are business-critical but rarely thought about. If one of our clients has to mess with our software beyond the occasional update, that was a failing. Not all software is like this, but lots of it is -- its value is that no human needs to be involved.

I need to write log messages with the expectation of an audience who doesn't know much about the software -- it's been running uninterrupted for months or years and suddenly something has gone wrong. If the log line doesn't tell the user how to solve their problem, I will end up getting a call.

throw827474737 · on Oct 19, 2022

If it is that simple, then why doesn't the code fix it itself? But no, usually there are one to three likely causes, but it could also be anything else, and that kind of unexpected error often has no default fix.

No, the best thing is to point to the documentation that has that, and not to print out man pages of docs in error messages.

> I write software that is generally run low in the stack

What stack, how low? Me too, so low that I usually cannot return or even log a "see error code doc at http.." string for various reasons (bandwidth, memory, performance) and only have error codes ;)

pwinnski · on Oct 19, 2022

In the case at hand, where an environment variable isn't set, how exactly should the code fix itself? Human interaction is necessary, which is the reason the log message should spell out what the human needs to do.

If I'm starting a service and see a pointer in the logs to documentation, that seems like an incredibly broken approach to me. Why would I look at missing or out-of-date documentation that may or may not be at hand when the code that knows the problem is right there and can just tell me? A log message like you're describing might as well say, "Something went wrong, but I don't want to tell you what. Instead check page 43 of the document in the third file cabinet from the left in that room over there on your right. No, your other right."

samus · on Oct 21, 2022

Similar issues arise with such documentation in error messages. There now has to be a process to make sure that all such information is always accounted for and updated correspondingly when the system changes.

> Something went wrong, but I don't want to tell you what.

is a somewhat disingenuous example. Error logs should describe in exhaustive detail what went wrong. Ops needs that to analyse the situation, and the vendor will have much less trouble reproducing the error. However, suggesting specific fixes could be disastrous. Furthermore, documentation should already be in a form that operations can be expected to work with, even in crisis situations.

an_ko · on Oct 19, 2022

I don't want to have to hunt for documentation if it breaks. It may have been 30 years and everything but the binary has been lost, and the vendor is out of business. If in that situation all I get is an error code and a link to documentation that doesn't exist, I'd have to start reverse-engineering. And while doing so I'd definitely be cursing the coder who decided that saving a couple hundred bytes of space in a log file in the event of an "abort the program"-severity event was worth dumping this in my lap.

samus · on Oct 21, 2022

Running such software is asking for a disaster already. At least documentation should still exist, and operational frameworks like ITIL insist on that. It can happen, but is usually telling of an operational culture that disregards maintenance, counting on being able to kick the can down the road as long as possible.

dementiapatent · on Oct 19, 2022

It will be so much fun when the implementation is refactored and half of these comments are forgotten about and no longer meaningful.

prerok · on Oct 19, 2022

Exactly. At one of my previous workplaces there was a cumulative effect of misattributed error messages, so the actions to perform were often of no help.

Not to mention that new or changed error messages caused a landslide of translation costs across various languages. I guess this product has no localization? Back when I worked on a product that had it, we had to go through a deliberate process to describe why we wanted to change a message, what the impact would be, etc. Tell me you want 100 new messages and you will be stuck in meetings for the next month.

In their case, though, it seems they at least have the support in management for it. I hope it turns out better for them than it did for me.

SpicyLemonZest · on Oct 19, 2022

I had an error message a few months ago that instructed me to reinstall the AWS CLI. I filed a ticket when that didn't work, and the team was annoyed with me because obviously the real problem was a Python configuration warning with no suggested action 10 lines up.

kortex · on Oct 19, 2022

It depends on who, what, and when the error is about. Failures generally follow a bathtub curve. You have a high rate at the start (usually configuration issues), a fairly fixed rate during operation, and then more at the end of the lifecycle (exhaustion, service hiccups on scale-in).

If it's early in the lifecycle, absolutely, because that's when it's most actionable. X is set wrong, Y can't be reached, etc.; guide whoever is operating the system on how to fix it.

If it's mid cycle, it's often post-hoc, but context is worth its weight in gold. Less about telling the operator how to fix and more about why it broke, to avoid in the future.

End of cycle, whatever.

pwinnski · on Oct 19, 2022

Yes!

There are people who don't read formal documentation but do read logs, after all.

If the advice is the same over and over again, then yes, give the advice over and over again. I wouldn't want to assume that someone has read every line of the logs, or has started to read top-to-bottom, so the advice should always be among the most recent lines in the log, and the only way to ensure that is to give the advice again each time the error happens.

chillfox · on Oct 19, 2022

Yes! We have tools to filter what gets saved and compression that handles repeated text very well.
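A quick way to check the compression claim with Python's standard `zlib` module. The advice text is a made-up stand-in. Real log lines carry timestamps and IDs that differ from line to line, so the real ratio will be noticeably lower than in this idealized case.

```python
import zlib

advice = ("ERROR ENVVAR is not set. Set ENVVAR to a proper value and restart "
          "the service. See the wiki page for details.\n")
# Simulate the same advice being logged 10,000 times.
log_text = (advice * 10_000).encode("utf-8")
compressed = zlib.compress(log_text, level=9)

print(len(log_text), len(compressed))
```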

So why not provide docs on how to solve the error along with the error.

0xbadcafebee · on Oct 19, 2022

Logs actually are a form of documentation. Documentation can provide instructions on how to diagnose and fix problems, and that's what logs do: tell a human being what a problem is and how to fix it.

Remember that often the person reading the logs is not the person who wrote the software. Maybe it's an Ops person at 2AM trying to fix a broken deploy. Maybe it's a developer who joined the company 3 years after the software was written. Maybe the log is passing through an error message from 3 layers deep in the stack. The more literate your logs are, the better.

Spivak · on Oct 19, 2022

Errors on initialization, fatal errors, and non-recurrent errors that require human/support intervention should be documentation.

hinkley · on Oct 19, 2022

If the error results in the program shutting down, it’s once per fatal interaction.

In other words, yes.

xboxnolifes · on Oct 20, 2022

Should logs more clearly let the user know how to fix problems? Yes.

eyelidlessness · on Oct 19, 2022

This is fairly common in good error logs.

ckozlowski · on Oct 19, 2022

I think you're correct. To add to this (and I think it's the point the article was trying to make), I feel that errors written in fragmented language or "developer speak" are likely to get glossed over. I think the “Write it like you’re talking to a friend.” advice the article gives is spot on. Making the message more conversational invites better understanding and comprehension.

I feel there's a trend, when it comes to disseminating messages like this, to adopt the attitude that our audience "is smart, and should figure the rest out". They may be. But they already have lots to do and plenty to figure out. Any opportunity we, the requestors, have to lighten their mental load is going to increase the odds that they'll be inclined to take action right away.

dvtrn · on Oct 19, 2022

I’m not seeing how the message as it already stands is any less direct or clear than what you’re saying it should be. It straight up tells you it can’t find the var and what to do about it.

Can you help me understand what isn’t clear about the message as is, or maybe point out the ambiguity to someone who just isn’t seeing it? I want to write better error messages but I share the frustration of the above poster. The message tells you specifically what to do, but you’re coming back saying it’s not clear.

j-bos · on Oct 19, 2022

I think the original error is quite clear, under normal circumstances.

Not OP, but I've noticed that people often get brain fog when something goes wrong and often need BIG, SHORT WORDS to shake them out of it. Or really anything that can shake them out of the 'idunno' state of mind.

But maybe if something like that became standard, it would no longer be a context switcher...

ckozlowski · on Oct 19, 2022

I think you're spot on, and I made a similar comment above.

It's easy to say "they can figure it out". Sure, in a restful state. But the people we're asking to take action already have a lot on their plate. Using plain, conversational language whenever possible with exceedingly clear steps means less mental exertion on the receiver. And since we need their help, anything we can do to make it easier on their end helps us.

dvtrn · on Oct 19, 2022

These are fascinating responses to me, as with the example given, my mind first went to someone for whom English is a second language. If that group had trouble with this message, I would understand, or at least have an easier time understanding it, even if only a little.

For someone who was born speaking English and spoke it their entire lives, the example provided couldn’t possibly be more to the point in my opinion.

Though I agree overall with the general idea and that yes there are some pretty baffling and downright awfully written error messages and log entries that take a minute to grok (I just don’t think the example replied to is one of them).

Too · on Oct 19, 2022

Conversational errors can also be fatiguing. Often what you want is something short and dry that can be pattern matched. Compilers are pretty good at this because all their errors start the same way.

    Error in file foo/bar.c, line 32, missing semicolon.

No conversation needed. These can then be complemented with more conversational language on the next line to explain why the semicolon is needed. Rust is quite good at this.
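A small Python sketch of that two-part shape. It has a dry, fixed-format first line that tools and eyes can pattern-match, plus an optional conversational note on the next line. The format string follows the example above. The `note:` prefix is an assumption loosely inspired by Rust's notes, not the exact output of any real compiler.

```python
import re


def format_diagnostic(path, line, message, note=None):
    # First line: a fixed, dry shape that is easy to scan for and to grep.
    text = f"Error in file {path}, line {line}, {message}."
    if note:
        # Optional conversational explanation on the following line.
        text += f"\n  note: {note}"
    return text


# Because every diagnostic starts the same way, tools can pick them out reliably.
DIAGNOSTIC = re.compile(r"^Error in file (?P<path>[^,]+), line (?P<line>\d+), (?P<message>.+)\.$")

report = format_diagnostic("foo/bar.c", 32, "missing semicolon",
                           "expected ';' after this statement")
print(report)
match = DIAGNOSTIC.match(report.splitlines()[0])
print(match["path"], match["line"])  # foo/bar.c 32
```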

randomswede · on Oct 20, 2022

Then there are the delightful (no, I actually mean the opposite) errors that g++ emitted (back when I last wrote C++ and compiled with g++), where I could basically go "OK, there is an error that was detected at line L, in file F; and I think it may be a type error", then recompile with clang so I could actually understand what the error was and fix it.

lupire · on Oct 19, 2022

Some people don't read anything that isn't an all-caps command. They have learned helplessness from seeing too much useless error text in the past.

bee_rider · on Oct 19, 2022

There's a type of error for which the user can be given detailed step-by-step instructions (permission issues, etc). But to some extent, errors should handle situations the programmer didn't expect. If it is possible to provide detailed step-by-step fixes, then the program should do those steps itself.

Adding a URL might not be a great plan: you never know how long an old copy of a program will stick around, and you might not control that website forever.

MiddleMan5 · on Oct 19, 2022

I can't tell if this is sarcasm or not; this is obviously highlighting a deeper issue in developer culture.

The example given was clear compared to 90% of other error messages, and saying that it needs to be "more clear" is almost dismissive.

Aperocky · on Oct 19, 2022

Don't blame developer culture: if that error cannot be acted on, attribute it to incompetence, not culture.

bombcar · on Oct 19, 2022

Some of the errors that Gentoo portage can encounter do exactly this - and they do it with beautiful terminal colors that make it easy to figure out what you need to run, or where to go to figure out which of the three options you need.

The problem can come when there's a wall of "useless" logging/error messages, and the last one or near the last one is the actual important one to look at. You have to explicitly call it out on a clear screen and make it obvious - and even then, people won't always read it.

pydry · on Oct 19, 2022

It more likely means that the developer views the service as OP's responsibility. They'll view an order as something OP needs to do.

The clarity of the error message doesn't really matter if the recipient believes it is intended for somebody else.

duxup · on Oct 19, 2022

The problem is people are not rational… and we try to solve that with software.

Many people just lock up when software doesn’t do what they expect.

vbezhenar · on Oct 19, 2022

Not rational people must be fired from IT.

duxup · on Oct 19, 2022

Generally a pipe dream in my experience.

hinkley · on Oct 19, 2022

Lots of people find ways to irrationalize being rational.

jimmytidey · on Oct 19, 2022

This is a context where people are used to seeing errors that they don't know what to do with.

If a web app pops up a well-written error, it is much more likely to be acted on than when an unmotivated dev sees some (probably badly formatted) text.

Every time I see an error in terminal with a link to documentation I'm delighted. And surprised.

Kalium · on Oct 19, 2022

Once upon a time, I worked at a financial startup (the company is irrelevant). I created a little harness around a static analysis tool. It would fail builds when a library had an outstanding vulnerability scored as HIGH or SEVERE with a patch available. The harness put a friendly error message around it. It ran roughly as follows:

> Hi! If you're reading this message, it's likely because this tool failed your build. To understand why and fix it, please click this link <link_to_internal_doc>. Below is a table that lists the packages you need to update and the version you need to update them to.

At the very top, in big flashing red text with siren animated GIFs, the doc had a link to the section explaining that they needed to update their libraries, with very clear, copy-paste-into-Dockerfile actionable guidance. The page also explained the broader context, such as the point of the tool and why we were doing this despite having a firewall and so on.

This is where you might be delighted and surprised.

What was perhaps less delightful and surprising were the consequences for me. About 4-6 times a week, I would then have a Slack conversation akin to this:

    Dev: Why did you break my build!?!

    Me: Can I see the error message?

    Dev: <pastes message above>

    Me: Thanks! Looking at the message, is there something unclear about the documentation? Does it not work?

    <ten minutes pass>

    Dev: Nope! Docs are great!

At this point the conversation would end.

nerdponx · on Oct 19, 2022

So? That's no excuse for a developer to disregard the content of an error message in their own application.

lijogdfljk · on Oct 19, 2022

It kinda is. Kinda like when documentation is so repeatedly outdated and incorrect, that when you need new information you just skip documentation entirely.

Are you wrong for skipping documentation? Yea, maybe. Is it entirely expected? Yea.

Based on the parent comment, at least.

monknomo · on Oct 19, 2022

And yet developers do disregard the content of error messages. Try to figure out why they disregard it. I doubt the answer is "because they're stupid". The answer probably also isn't "because they just aren't trying".

What could it be? Why do people read things and react in similar ways, even if they have different jobs? If only there was some field of study that could answer these mysteries.

bob1029 · on Oct 19, 2022

This is a lesson I learned while being system owner of the primary user interface that runs on a semiconductor factory floor. No amount of confirmation/warning dialogs will actually stop someone from doing a wrong thing. Doesn't matter how scary the language is. Here's an approximate sample of one:

  "DANGER! Confirming this action may result in 8 figures worth of scrap!!!"

Even if you are super careful and make sure your error messages are terse in all cases, you will still succumb to things like muscle memory among your users. I've caught myself mindlessly dismissing these while testing. How can I expect my users to be better than the person who developed the UI? That is unreasonable.

It got to a point where we started removing these alerts/confirmations because it was training people to do the wrong thing in a few places. If you have part of a UI where all actions are immediate and final, the game theory changes. The moment a user enters into one of these spaces, they are much more cautious.

If the user thinks the UI will save them, they may eventually tire of these protections and forget why they are there in the first place. I feel like this is very similar to the problem of driver assistance and partial self-driving capabilities today.

nicbou · on Oct 20, 2022

I like how GitHub asks you to type back the name of the repository you want to delete.
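A minimal Python sketch of a type-to-confirm guard like the one described. The function names and repository name are hypothetical. As the reply notes, copy-paste can still defeat it.

```python
def confirm_destructive(resource_name, typed):
    """Allow the action only if the user typed the resource name exactly."""
    # Surrounding whitespace is forgiven; case and spelling are not.
    return typed.strip() == resource_name


def delete_repository(name, typed):
    if not confirm_destructive(name, typed):
        return f"Name did not match; {name} was not deleted."
    return f"Deleting {name} ..."  # the real deletion would happen here


# In a CLI, `typed` would come from input(); it is hard-coded so the sketch runs unattended.
print(delete_repository("acme/website", "acme/website"))
```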

bob1029 · on Oct 20, 2022

For GitHub scary actions, I will not hesitate to copy and paste the expected repo name on the UI. I can do this so quickly my brain does not process the consequences in time.

grandinj · on Oct 19, 2022

Some developers are just lazy, and will likely need some kind of negative feedback to force them to confront their own laziness.

Which can be tricky, because the degree of negative feedback that is appropriate to the person in question can range from

"Polite one-on-one suggestion that you read the error message more than once before calling me"

to

"Full on yelling at the person in the middle of an open-plan office".

Thankfully, type II is rare, but they do occur.

bartread · on Oct 19, 2022

> Some developers are just lazy

I'm really lazy: if I were on the receiving end of emails with error messages that included instructions about how to fix said error I'd automate Freshdesk (or whatever ticketing system I was using) to respond with instructions specific to that error message, in the first instance, along with a note to get in touch again if that didn't solve the problem. I'd also set the ticket to autoresolve after a set period of time.

lupire · on Oct 19, 2022

Send a link to wiki. Last line of page is "if you have questions, reach out and include the keyword $THIS_PAGE_KEY in your message."

TillE · on Oct 19, 2022

I've seen this constantly over the years, people who absolutely refuse to read the simplest instructions, but instead require step-by-step hand-holding from you personally.

I have no idea how these people get through life at all.

nicbou · on Oct 20, 2022

I have no idea either.

For example, when someone asks me "how do I get a German work visa" and I reply with a link to a page titled "how to get a German work visa", which is among the first results on Google. A literal minute later, they ask me more questions that the page clearly answers.

Some people can't be arsed to read a 5 minute article you hand-delivered to them, and would rather have you type it back to them.

I think that some people just have zero respect for other people's time.

dagw · on Oct 19, 2022

I suspect that, at least subconsciously, they're to some extent doing that to punish you for writing 'bad' software that they have to struggle with. If they're going to suffer, you're going to suffer right alongside them.

0x457 · on Oct 19, 2022

Hey, let's jump on a quick call, so we can go through this together and maybe update docs if they're out of date?

rjmill · on Oct 19, 2022

> evidence at all that they would be read

I just had an idea: Put tracking info in the error URL. If your company has an internal URL shortener, that could do the trick.
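A sketch of the tracking idea using Python's standard `urllib.parse`. Error-specific query parameters are appended to the help link, so the wiki's access logs can show whether anyone followed it. The parameter names `err` and `ev` are made up. The approach also assumes the wiki server logs query strings.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tracked_help_url(base_url, error_code, event_id):
    # Keep any existing query parameters, then add the tracking ones.
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))
    query.update({"err": error_code, "ev": event_id})  # parameter names are made up
    return urlunsplit(parts._replace(query=urlencode(query)))


print(tracked_help_url("https://wiki.example.com/errors/envvar", "ACME1042", "K3Q7ZB2M"))
# https://wiki.example.com/errors/envvar?err=ACME1042&ev=K3Q7ZB2M
```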

More practically, I feel like it helps to put an empty line before the call to action. For many people, a traceback is just noise. The empty line helps split the useful info out from the traceback.

Or if it's a script/CLI (and you know the error reason) don't even show a traceback. Just print the error message to stderr, exit non-zero, and be done with it.
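A minimal Python sketch of that CLI pattern, using a hypothetical `ConfigError` for the known failure case. The known error gets a short message on stderr, an empty line before the call to action, and a non-zero exit status; 2 is chosen here because `argparse` also uses it for usage errors. Anything unexpected is not caught, so it still propagates with its full traceback.

```python
import sys


class ConfigError(Exception):
    """A known, expected failure whose message is meant for the user (hypothetical)."""


def main(argv):
    if len(argv) < 2:
        raise ConfigError("no input file given")
    print(f"processing {argv[1]}")
    return 0


def run(argv):
    try:
        return main(argv)
    except ConfigError as exc:
        # Known error: no traceback, just a short message on stderr.
        print(f"error: {exc}", file=sys.stderr)
        # Empty line so the call to action stands apart from the noise above it.
        print(f"\nTo fix this, run: {argv[0]} <input-file>", file=sys.stderr)
        return 2


# Wire it up in the real script with: sys.exit(run(sys.argv))
```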

coldacid · on Oct 19, 2022

The help desk guys are on the other side of a cubicle wall from my workstation, and almost every call I overhear about someone getting errors convinces me further that people not only don't pay attention to the error message, they don't pay attention to the people they're calling to help them get through the situation either.

residualmind · on Oct 19, 2022

Actually reading (and understanding, and acting upon) error messages seems to be part of the learning process of every developer. And while more senior devs usually do read error messages, even they sometimes, rather than reading them, will jump to behavior like "trying again a different way" before looking closely at what went wrong.

hinkley · on Oct 19, 2022

Developers often seem shocked that people can’t find the important error in a wall of text. A particular peeve is when the same error is reported three ways and the real error is sandwiched between others or scrolled off the screen due to spammy behavior.

JTbane · on Oct 19, 2022

>>>“RuntimeError: Did not find ENVVAR, ensure this is set to the proper value (see <internal wiki link>) and then restart this service”

I'm laughing as you could not make it clearer if you tried. PEBKAC

0xbadcafebee · on Oct 19, 2022

The problem is here: "RuntimeError:". Once they saw that, they stopped reading. "Did not find ENVVAR" [..] "ensure this is set to the proper value" [..] "and then restart the service" are also obscure and will stop them from reading.

Why is the user like this? Error message PTSD. Years of staring at obscure errors full of technical jargon that are not helpful to the user have left them scared to even look at the content of the error message. They have tried to Google these things before and failed, and now they just avoid it entirely and run for help.

I'm sure there's enough detail in the link you provided to help the user. But if that's the case, it will be better for the error message to simply say:

  A problem occurred, but don't worry! You can fix it yourself in 5 minutes! For instructions, visit https://internal-wiki-link/spaces/BLAH/AppUserRuntimeError#A013579

Even if you expect the user to be "smart enough" to fix their own problem, they are more likely to try it themselves if you make it seem easier.

Kalium · on Oct 19, 2022

I tried exactly this approach! What I got was a bunch of developers copy-pasting the error message with helpful URL at me and demanding to know what they should do. The number who followed the link and fixed the problem themselves was shockingly small.

Going out on a limb, I think we're all going astray by trying to parse the error messages our fellow developers are reacting to. A great many seem to handle any unfamiliar or unexpected error message by giving up, no matter how friendly or informative or helpful it may be.

bonoboTP · on Oct 19, 2022

They don't parse the error message as a natural language sentence talking to them. They take it as an opaque string, like a big error code. It literally passes through them without getting interpreted.

They learned that the affordance of these error messages is copy-pasting into some place: a Google search box, or a chat box asking for help. But it has no affordance of "interpret as an English sentence" for them.

0xbadcafebee · on Oct 19, 2022

If that's the case, then these people may just need training. It's likely that nobody has ever sat them down and explained that they have a responsibility to investigate their own issue. Often people feel they have to rush to get something done, and that they can't take time to troubleshoot. But if their bosses explained that, actually, it's fine if your work is a little late due to troubleshooting, they might do it themselves more often. You also may need to provide back-pressure by interacting via email/ticket.

Kalium · on Oct 19, 2022

That's a kind, caring, compassionate, empathetic approach founded on assuming good faith.

Unfortunately, it is perhaps not an ideal fit. I was mostly not dealing with the most junior and new of developers here. I was often dealing with senior developers who fully understood that they were responsible for investigating their own issues in a context where it was understood that troubleshooting takes time.

I often wound up regurgitating the error message back to them, asking them to point to the problems in the documentation getting in the way of them solving their own problems. This generally resulted in a conspicuous silence and the issues shortly thereafter being resolved.

The lesson I drew from this was not that the developers in question needed training. What I learned was that they needed to be convinced to treat these errors as natural-language strings they could interpret themselves.

quintussss · on Oct 19, 2022

Isn't this just survivor bias though? You only hear from those that fail to read and act on the error message.

lupire · on Oct 19, 2022

Use the error messages you wrote! Send them the link they sent you, and move on.

Joker_vD · on Oct 19, 2022

Well, imagine the error was simply "RuntimeError: Environment variable not set" instead, then how much of your time would have been wasted by those dozen interactions?

onion2k · on Oct 19, 2022

Shouldn't the app gracefully exit with a clear message, and not bail out in a way that looks like a crash? I'd guess that the person who wrote it hooked into the error handler because that was the easy thing to do rather than bother to write a nice way to exit properly.

To me, the fact that you've had this a dozen times points to a problem with the app more than with the people using it.

tlogan · on Oct 19, 2022

This is 100% correct.

In theory, every error should explain the input, explain the problem, and explain how to solve the problem (actions). That should help and reduce the number of support calls. However, error messages and the suggested actions for resolving them are read by maybe 1% of users.

The only way to improve your UI is to prevent errors and use standards / familiar design.
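To make the "input, problem, action" triple concrete, here is a small sketch. `ActionableError` and `parse_port` are hypothetical names, not from any library. Keeping the three parts as separate fields also lets a UI lay them out, or a logger record them, independently.

```python
class ActionableError(Exception):
    """Hypothetical error type carrying the input, the problem and the fix."""

    def __init__(self, user_input, problem, action):
        super().__init__(problem)
        self.user_input = user_input  # what the user supplied
        self.problem = problem        # what is wrong with it
        self.action = action          # what to do next

    def __str__(self):
        return (
            f"{self.problem}\n"
            f"  You entered: {self.user_input!r}\n"
            f"  To fix it: {self.action}"
        )


def parse_port(text):
    # Hypothetical validator that reports all three parts in one message.
    if not text.isdecimal() or not 1 <= int(text) <= 65535:
        raise ActionableError(
            text,
            "The port number is not valid.",
            "enter a whole number between 1 and 65535, such as 8080.",
        )
    return int(text)
```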

pizza · on Oct 19, 2022

I mean, it kinda makes sense. When you're coding, you're constructing something. When you're debugging, you're deconstructing something. I feel like it's natural for people to take a sec to code-switch, bc they were likely in a state of flow w/ considerable momentum up until they saw the error.

Taylor_OD · on Oct 19, 2022

It's a little annoying, but to be fair, because most error messaging is garbage, it's easy to start ignoring it. How often is the error message shown, with its little suggested fix, actually going to solve the problem in modern web development? 10% of the time? 25% of the time? I'd be shocked if it's that high.

CityCobra · on Oct 19, 2022

Still, if you write proper error messages then at least you can figure out what the issue was without SSHing into the person’s computer and checking their logs.

chillfox · on Oct 19, 2022

Don't send people somewhere else to learn how to fix the error. The more steps and indirection you add the fewer people will bother doing it themselves, especially if they can bump it to the developer. Make it easy for people to fix their own problems by being explicit, direct and complete. List all the steps and use formatting to make it visually easier to consume.

So your error message, while a far cry from the worst I have seen, is also pretty far from the good ones I have seen.

starkd · on Oct 19, 2022

I think his point was that the developer tends not to even investigate the ENVVAR at issue or visit the link. If the developer does investigate the link and still has an issue, then you have a point.

chillfox · on Oct 19, 2022

Pretty sure his problem was he got contacted about an issue he considers uninteresting, and his preferred solution is the user stops behaving like a human.

Reaching for the easiest way to solve a problem first is a very human thing to do, and in this case contacting him was easier than opening up a browser and reading an article that presumably is written in the same kind of language as the error message.

starkd · on Oct 19, 2022

I admit to doing this. Even many of the useful error messages that clearly indicate the fix are drowned out by the mass of output. I've made this mistake before, and I'll probably do so again.

chillfox · on Oct 19, 2022

I feel like this is a problem of overly chatty application logs + lack of formatting for errors.

If the volume of drivel was lowered and errors were formatted with spacing and color to stand out, then they would be easier to focus on.

So log errors to stderr, send it to a separate log file, and format it well (use multiple lines).
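With Python's standard `logging` module, that routing might look like the following sketch. The `errors.log` file name is an assumption. The callable filter keeps ERROR records out of the stdout activity stream, so each error shows up once on stderr and once in its own file instead of being buried in routine output.

```python
import logging
import sys


def configure_logging(error_log_path="errors.log"):
    """Routine activity to stdout; errors to stderr and to their own file."""
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    activity = logging.StreamHandler(sys.stdout)
    activity.setLevel(logging.DEBUG)
    # Keep ERROR and above out of the activity stream.
    activity.addFilter(lambda record: record.levelno < logging.ERROR)

    errors_console = logging.StreamHandler(sys.stderr)
    errors_console.setLevel(logging.ERROR)

    errors_file = logging.FileHandler(error_log_path, encoding="utf-8")
    errors_file.setLevel(logging.ERROR)

    root = logging.getLogger()
    root.setLevel(logging.INFO)
    for handler in (activity, errors_console, errors_file):
        handler.setFormatter(fmt)
        root.addHandler(handler)
```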

marcosdumay · on Oct 19, 2022

> So log errors to stderr, send it to a separate log file, and format it well (use multiple lines).

Oh, for sure. Never:

- send errors to the same log you send normal activity.

- default into logging things that aren't errors on the error log (make this possible to override if you want, but never the default).

- log the errors there but the necessary context on stdout, so it only reads correctly on a terminal. (E.g. build tools that print "entering target" to stdout, the error to stderr, and "leaving target" to stdout.)

- try to recover just to show a different error later.

randomswede · on Oct 20, 2022

Oh, boy, on the "never log expected failures as errors" front: I once worked with a database system that used optimistic transactions. Basically, each modification to a row carried, effectively, the original value of that row along with the update, and if the update failed (because the row had changed in the meantime), the API call raised an error saying that the transaction failed. So if you did a "SET column=(column+1) WHERE rowid=unique", the client could basically do an automatic retry.

But, it also logged each and every occurrence of this at "Error" severity, instead of at "Info" severity (it is, after all, expected to happen once in a while).

And of course, once our code switched over to using this, the first few times every team member had to deal with a production issue, the immediate reaction was "oh, no, the data store is unhealthy! look at this mass of error logs, I can see one every few minutes!". Thankfully, after the first team member (me, as it happens) spent half an hour reading the relevant parts of the design and implementation docs, we could frequently short-cut a lengthy investigation by "oh, you think $DB is bad because you are seeing transaction failures? no, that's expected, see $URL".
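A toy, in-memory sketch of optimistic concurrency with retry, to show where the log levels belong. `VersionedStore` is hypothetical and only stands in for the real database: each row carries a version number, a write succeeds only if the version is unchanged since the read, a conflict is logged at INFO, and only running out of retries is logged at ERROR.

```python
import logging
import threading

log = logging.getLogger("store")


class ConflictError(Exception):
    """Raised when the row changed between our read and our write."""


class VersionedStore:
    """Hypothetical in-memory stand-in for an optimistic-concurrency store."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def read(self, key):
        with self._lock:
            return self._rows.get(key, (0, 0))  # (value, version)

    def write(self, key, value, expected_version):
        with self._lock:
            _, version = self._rows.get(key, (0, 0))
            if version != expected_version:
                raise ConflictError(key)
            self._rows[key] = (value, version + 1)


def increment(store, key, max_attempts=5):
    """Read-modify-write with retry; a single conflict is routine."""
    for attempt in range(1, max_attempts + 1):
        value, version = store.read(key)
        try:
            store.write(key, value + 1, version)
            return value + 1
        except ConflictError:
            # Expected under contention: INFO, not ERROR.
            log.info("write conflict on %r (attempt %d), retrying", key, attempt)
    # Only exhausting the retries is worth an ERROR.
    log.error("gave up incrementing %r after %d conflicts", key, max_attempts)
    raise ConflictError(key)
```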

dvtrn · on Oct 19, 2022

I’m left wondering: at what point does the “give a man a fish/teach a man how to fish” method of pedagogy apply in terms of ‘acting like a human’ in this context?

Asking as someone who otherwise generally agrees that there are some truly poorly written errors and exceptions out there, but who has also been on the admittedly frustrating receiving end of constant requests for help deciphering error messages that stated the problem very plainly, from people who didn’t even try looking for the fishing rod.

chillfox · on Oct 19, 2022

Sure, clearly there are people who will never try or learn, but in general, as an industry, I feel like the vast majority of errors are very, very far from good.

Few error messages are well written, well formatted, and self-contained (usable to fix the issue without having to seek further information elsewhere). Sometimes you see errors that have one of those qualities, but rarely all of them.

There has been an effort over the last few years to improve compiler errors for some languages, but those same improvements have not reached applications.

llbeansandrice · on Oct 19, 2022

I feel like I can't get folks to open the log file and cmd-F "ERROR" half the time.

londons_explore · on Oct 19, 2022

A big part of this is to direct more of your development time into errors that happen more frequently.

Most systems I was involved in designing have some kind of error tracking system, so we can know exactly how often each error occurs.
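A sketch of turning raw tracker events into the kind of ranking that tells you where to spend effort. The `(installation_id, error_code)` event shape is an assumption. Counting each installation once per code gives a "seen by N% of installations" figure, rather than a raw event count that one noisy installation could dominate.

```python
from collections import Counter


def error_reach(events, total_installations, top=3):
    """Rank error codes by the share of installations that reported them.

    `events` is an iterable of (installation_id, error_code) pairs, a
    hypothetical shape for whatever your error tracker exports.
    """
    pairs = set(events)  # count each installation at most once per code
    per_code = Counter(code for _installation, code in pairs)
    return [
        (code, count / total_installations)
        for code, count in per_code.most_common(top)
    ]
```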

An error that never happened needs (usually) no attention.

An error that 28% of installations have seen needs a lot of attention. The error text should be translated into local languages, wiki pages should be written about how to resolve it, efforts should be made to auto-resolve the error. The error message should include helpful info, etc.

E.g. "SSH server can't start. Config file unreadable".

Could be split into:

SSH server can't start. Config file error on line 7. 'AllowPasswordLoogin' is an invalid setting. Did you mean 'AllowPasswordLogin'? If you want to make this change, 'sudo nano /etc/sshserver.conf' will let you change this config.
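The "did you mean" part needs nothing exotic: Python's `difflib.get_close_matches` is enough. This sketch assumes a simple `Key value` config format and a hypothetical list of known settings. It reports the file, the line number, and the closest valid key for each unknown setting.

```python
import difflib

# Hypothetical list of settings this server understands.
KNOWN_SETTINGS = ("AllowPasswordLogin", "ListenAddress", "Port")


def check_config(lines, path):
    """Return one actionable message per unknown setting in `lines`."""
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key = line.split(None, 1)[0]
        if key in KNOWN_SETTINGS:
            continue
        msg = f"{path}, line {lineno}: {key!r} is an invalid setting."
        close = difflib.get_close_matches(key, KNOWN_SETTINGS, n=1)
        if close:
            msg += f" Did you mean {close[0]!r}?"
        problems.append(msg)
    return problems
```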

magicalhippo · on Oct 19, 2022

If you're raising an exception deep in some internal code, provide as much detail as possible.

If the error bubbles up to the user, then either the information is over their head, in which case it's no different from a non-detailed error message, or the user/support person can actually act on it.

The most infuriating error I see is "file not found"... WHICH FILE?!

Of course if the error is found in the higher level due to some consistency check in the business logic, then yeah try to guide the user. But for internal stuff, try to help the person who needs to fix it or find a workaround. It might be you.
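A sketch of what answering "WHICH FILE?!" looks like at the point where the low-level error is caught. `SettingsError` and `load_settings` are hypothetical. The message names the file and the reason, and `raise ... from err` keeps the original exception attached as `__cause__` for whoever reads the traceback.

```python
import json


class SettingsError(Exception):
    """Hypothetical error type for problems loading the settings file."""


def load_settings(path):
    """Load JSON settings, naming the file and the reason on failure."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except OSError as err:
        # err.strerror is the OS reason, e.g. "No such file or directory".
        raise SettingsError(
            f"could not read settings file {path!r}: {err.strerror}"
        ) from err
    except json.JSONDecodeError as err:
        raise SettingsError(
            f"settings file {path!r} is not valid JSON "
            f"(line {err.lineno}, column {err.colno}): {err.msg}"
        ) from err
```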

lupire · on Oct 19, 2022

> The most infuriating error I see is "file not found"... WHICH FILE?!

Filenames might contain user data, which must not be logged outside of a database with proper access control, schema annotations, and access auditing.

We can only display an opaque object key, so authorized devs can look up the filename using secure tools.
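One way to read that suggestion, sketched under assumptions: the sensitive file name goes into an access-controlled store, and only an opaque random key is written to the log. `RestrictedNameStore` is a hypothetical dict-backed stand-in. A real one would be the audited database the comment describes.

```python
import secrets


class RestrictedNameStore:
    """Hypothetical stand-in for an access-controlled, audited name store."""

    def __init__(self):
        self._names = {}

    def register(self, filename):
        key = "obj_" + secrets.token_hex(8)  # random, carries no user data
        self._names[key] = filename
        return key

    def lookup(self, key):
        # In a real system this call would be authorized and audited.
        return self._names[key]


def not_found_message(store, filename):
    """A log-safe 'file not found' that still identifies which file."""
    return f"file not found (object key {store.register(filename)})"
```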

magicalhippo · on Oct 19, 2022

Fair enough. I work mostly with good old desktop applications, though, so if there's user data, it's almost always the user's data.

For the majority of errors in most applications, one can provide some helpful information. But yeah, one needs to be a bit careful if there's PII in the mix.

riskable · on Oct 19, 2022

> If you're raising an exception deep in some internal code, provide as much detail as possible.

> If the error bubbles up to the user,

...then you have an information disclosure vulnerability! There's a really good reason why we don't bubble up deep exceptions to end users: Attackers can use that info to gain information about your back end that they can use to find worse vulnerabilities.

Put all the detail you want in your logs. Keep the end users out of it. They shouldn't be able to tell what line broke things.

magicalhippo · on Oct 19, 2022

Yeah, things are a bit different with web apps. There, users usually can't do anything with the info even if they had details, so internal logs are clearly the place. But my point still stands: you want detailed info in those logs, not just a lone "file not found" without anything else.

redbell · on Oct 20, 2022

This reminds me of the two most annoying error messages of all time [for me].

The first one is from PayPal. Whenever I try to add a US bank account to my PayPal account, it says something like "You cannot add this bank account at this time, period"

After more than a year, it turned out that there was no way to add such an account for a foreigner, despite my friends [from the same country] being able to do it easily a couple of months before.

The second one was, poor me again, trying to edit the URL of a Facebook page I created for a side project, which should read FB.com/[SIDE_PROJECT]. FB kept rejecting my request with a generic/unexplained/unhelpful error message even though the page URL name was available.

About a year later, I got it working by, SIMPLY, having my phone number verified! How bad!!

upofadown · on Oct 19, 2022

There are fundamentally two classes of error message:

1. Information that can help a technically engaged person debug a problem.

2. Information that can help a user of the system understand what they have to do to overcome the problem.

Since most error messages are created by people responsible for debugging the system they tend to be of the 1st class. There has to be a way to provide different information based on who is getting the error.

mttjj · on Oct 19, 2022

> There has to be a way to provide different information based on who is getting the error.

Yes, this concept exists. The error message that is shown to the user (number 2) is what's discussed in the article. The error message that an engineer or someone else debugging the system should get (number 1) is the full stack trace and data dump that should be sent to the application log at the same time that the user is shown the error dialog.

Users can fix the problem by following the instructions in the error dialog and engineers or technical people can come back later and look at the more detailed stack trace to determine the best course of action.

MetaWhirledPeas · on Oct 19, 2022

> There has to be a way to provide different information based on who is getting the error.

This is already solved. Provide one error to the user and another to your logging system. In the user error provide a mechanism to point you to the logged error (even a simple timestamp helps).
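A minimal sketch of that split for a request handler. The full traceback goes to the log under a short random incident ID plus a UTC timestamp, and the user gets a plain message containing only the ID, so no internals leak. `handle_request` and its `(status, body)` return value are hypothetical.

```python
import logging
import uuid
from datetime import datetime, timezone

log = logging.getLogger("app")


def handle_request(action):
    """Run `action` and return (status, body) without leaking internals.

    `action` is any zero-argument callable standing in for real request work.
    """
    try:
        return 200, action()
    except Exception:
        incident = uuid.uuid4().hex[:12]
        when = datetime.now(timezone.utc).isoformat(timespec="seconds")
        # Developers get everything: ID, timestamp and the full traceback.
        log.exception("incident %s at %s", incident, when)
        # Users get no stack trace, just something to quote to support.
        return 500, (
            "Sorry, we couldn't complete your request because of a problem "
            f"on our side. If you contact support, mention reference {incident}."
        )
```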

tremon · on Oct 19, 2022

There's a fatal flaw in assuming that there's no overlap between groups 1 and 2.

FridayoLeary · on Oct 19, 2022

There's also a third class, “Oops! Something went wrong…”, which basically means "I don't know. Try reloading the page." Why this is better than a simple "error" is beyond me, but it's mildly frustrating.

koblas · on Oct 19, 2022

The error message that is presented to the user should always be clear and helpful. When an error is presented to the user, you should have matching logging (e.g. Sentry) that provides technical reporting on what happened. By having both in place, you have error handling that is complete and serves both communities.

lupire · on Oct 19, 2022

It's easy. Just provide both, with mark-up to label them.

residualmind · on Oct 19, 2022

Watched the new Quantum Leap yesterday (it's not great) and there was this really cringeworthy moment when something goes wrong with their awesome supercomputer and the screen flashes a giant "INTERNAL SYNTAX ERROR". Apparently, somebody didn't run their linter before sending people through time. Too bad.

thenerdhead · on Oct 19, 2022

As with everything, context matters. It's a great run-down of how to empower an error message. Many products can add so much value and save support resources by doing so.

There's one thing I wasn't sure about in this article, though. Did they talk to actual users about these empowered error messages, or even ask them what they want to see from the common error messages they run into? It seems rather difficult to empower error messages without first understanding the scenarios that got users into the error state to begin with. Next would be understanding whether these error messages are helpful to the users and asking them how they go about resolving these types of issues. All of that is hinted at in the "what makes a good error message" section.

ChrisMarshallNY · on Oct 19, 2022

The general approach that I take is that an error message is one of the most stressful occurrences that a user encounters, so it's incumbent upon me to make it as pain-free as possible.

First of all, unless I'm writing an engineering tool, my users aren't geeks and don't especially care why the error is happening (geeks always need to know why). They just need to know that what was expected did not happen. If there is a remedy, and it can be simply stated, then I can add that, but it needs to be short and simple. Longer stuff needs to go into some kind of secondary screen (which probably won't be read).

Also, I take the "shopkeeper" approach. The customer is always right, and it's never the customer's fault. I avoid any hints of blaming the user (even if it is their fault), and try to be polite and helpful[0].

Of course, the best way to deal with errors is to avoid them. I try to design good affordances.

The rules are different for SDKs, though. In that case, I tend to send a great deal of information back. I take advantage of Swift's enums, and the ability to associate data. It can allow me to nest error reports.

[0] https://littlegreenviper.com/miscellany/the-road-most-travel...

Terr_ · on Oct 19, 2022

Over time I've come to believe in the "grepability" of error messages, and the code-lines that construct them.

Sometimes the data (and error messages) flow up and down through so many different modules and APIs and job queues and whatnot that, when an error pops up, it saves a lot of developer time if you can just text-search the code repo(s) and see exactly the line that generated it in the first place.

xiphias2 · on Oct 19, 2022

A "Try again" button is the worst way to solve the problem of having no connection. GMail does it right by automatically retrying periodically while showing an error bar at the top of the screen, without stopping the user from using the application.

If Wix can save the data locally, why not just copy the GMail error interface and let the user decide when to connect to internet?
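A sketch of the GMail-style behavior described above: retry automatically on a capped exponential backoff and report progress through a status callback instead of a blocking dialog. `try_sync` and `on_status` are hypothetical hooks. In a real app the loop would run on a background thread or event loop. `sleep` is injectable so the logic can be tested without waiting.

```python
import time


def backoff_delays(base=1.0, factor=2.0, cap=60.0):
    """Yield capped exponential retry delays: 1, 2, 4, ... up to `cap` seconds."""
    delay = base
    while True:
        yield delay
        delay = min(delay * factor, cap)


def sync_until_online(try_sync, on_status, sleep=time.sleep, max_attempts=None):
    """Retry `try_sync` on its own, updating a status banner between attempts.

    `try_sync` returns True on success; `on_status` shows a non-blocking
    message. Both are hypothetical hooks supplied by the application.
    """
    for attempt, delay in enumerate(backoff_delays(), start=1):
        if try_sync():
            on_status("All changes saved.")
            return True
        if max_attempts is not None and attempt >= max_attempts:
            return False
        on_status(f"Offline. Changes are saved locally; retrying in {delay:.0f}s.")
        sleep(delay)
```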

cosmotic · on Oct 19, 2022

All the 'do this' versions suffer from the same problems as the 'don't do this' versions. Aside from fixing the tone, they are still generic, still inactionable, and still verbose.

_nalply · on Oct 19, 2022

It is my opinion that software problems tend to be analyzed along these four axes:

- Can an end-user solve the problem themselves? If so, tell them how, if not, display a generic error message telling them to ask for support (with an error identifier they can tell the support)

- Developers and end-users need different information: developers need as much information as possible, like file names, contents of important variables and especially where the error happened in the source code with a backtrace, sometimes even two backtraces: the backtrace for the cause of the error, too; and end-users only need to be told what they can do, but this needs to be worded clearly and carefully. This means that error messages need to be written twice.

- Is the problem serious? If so, report, crash and restart; if not, just report and abort the affected operation when necessary.

- The problem should be logged. Sometimes it can be sent to developers automatically.

simion314 · on Oct 19, 2022

My recent experience with Docker: I am a total newb, so I was running a tutorial step by step, then I got some error about apt certificates/keys/repo stuff. After a lot of googling, it turned out the issue was that there was not enough disk space, but the fucking error was pointing in a different direction. Also, this is a good example of why Stack Overflow is useful, for the dudes who hate on it and tell everyone else to RTFM.

This is why I love exceptions: I had an issue with a C# game, but with a stack trace I could figure out myself that the issue happens when the app initializes and fails to open a file.

I think we should always give users a detailed log and stack traces. Also, Docker should fucking have some way to catch the issue when there is not enough space and report the error properly.
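Whatever Docker itself does, the general fix for this class of problem is a precondition check that fails early with the real cause. A sketch using `shutil.disk_usage` from the standard library. The `NotEnoughSpaceError` type and the idea of estimating `needed_bytes` up front are assumptions.

```python
import shutil


class NotEnoughSpaceError(Exception):
    """Hypothetical error raised before starting work that needs disk space."""


def ensure_free_space(path, needed_bytes):
    """Fail early with the real cause instead of mid-way with a misleading one."""
    free = shutil.disk_usage(path).free
    if free < needed_bytes:
        raise NotEnoughSpaceError(
            f"not enough disk space on the filesystem holding {path!r}: "
            f"{needed_bytes / 1e9:.1f} GB needed, {free / 1e9:.1f} GB free. "
            "Free up space and try again."
        )
    return free
```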

robmoore121 · on Oct 19, 2022

I really like this. There are clear shibboleths which identify the author as a person who deeply respects and cares for the readers of error messages, and their experiences. It makes me hopeful for the future of software when I see that there are others. Thanks for sharing.

e40 · on Oct 19, 2022

Tried to download my data from takeout.google.com and got this error:

"500. It's an error."

Thanks, google. I tried to start a chat (I'm a Workspace customer) and could not continue because all the language choices were disabled (even English).