
Which platform gives you unformatted data if a format can't be determined? Or was the code similar to getDate(format) and format happened to be null? Was the race condition in your code or in the platform? (EDIT: I suppose these would be questions more for the developers of the third-party gateway.)



FTA:

> The bug was in a computer program the [third party] Gateway uses to translate payment messages between two formats. When the program was operating under load, the system tried to clear memory it believed to be unused (a process known as garbage collection).

> But because it was using an unsafe method to access memory, the code ended up reading memory that had already been cleared away, causing it not to know how to translate the date field in payment messages.


This is exactly the type of thing I would expect from old-school banking systems. Kudos to Monzo for even trying to bring sanity to this space.


IIUC, the problem seems to have been that the code was looking into freed memory and so the date format was essentially random data. I can imagine a case statement where you have a default case of "don't translate the date" with a comment over it saying, "This should never happen". I'm sure I've naively written similar code when I was sleepy and it tends to pass review because it's innocuous.

It's easy to be hard on the programmer -- probably crashing is better than data in the wrong format, but then you are just pushing out the problem to a different layer. Error handling is completely non-trivial in complex systems. Maybe they should have thrown an exception in that case, but are you sure it's going to be handled? What is the downside in that case? It could easily be worse -- we have no way of knowing. Sometimes it gets down to, "Well, you need to make sure there are no mistakes in the code". If we're going to go down that route, then the incorrect timing of the memory freeing is the real cause (or if I'm being particularly nasty I might say, "You really shouldn't be using threads" ;-) ).

I guess what I'm trying to say is that there is certainly a better way of doing defensive programming in this case, but I wouldn't be able to tell what it was without seeing the code. I also wouldn't expect any large codebase to be completely free of these kinds of problems because it's easy to make a mistake.
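
To make the trade-off concrete, here is a rough sketch of the two styles in Python. None of this is the Gateway's code, which nobody here has seen: `DateFormat`, `translate_lenient`, `translate_strict` and the date layouts are all made up for illustration. The lenient version is the "this should never happen" default that quietly passes the date through; the strict one refuses to continue, which only helps if some caller actually handles the error.

```python
from datetime import datetime
from enum import Enum


class DateFormat(Enum):
    # Hypothetical format tags; the real Gateway's formats are unknown.
    ISO = "iso"          # e.g. 2019-05-30
    COMPACT = "compact"  # e.g. 20190530


class UnknownDateFormat(ValueError):
    """Raised when the format tag is not one we know how to translate."""


def _translate_known(raw, fmt):
    # Returns the translated date, or None if the tag is not recognised.
    if fmt is DateFormat.ISO:
        return datetime.strptime(raw, "%Y-%m-%d").strftime("%d%m%Y")
    if fmt is DateFormat.COMPACT:
        return datetime.strptime(raw, "%Y%m%d").strftime("%d%m%Y")
    return None


def translate_lenient(raw, fmt):
    """'This should never happen' style: unknown tag means pass through."""
    translated = _translate_known(raw, fmt)
    # Silent default: the untranslated date flows on to the next system.
    return raw if translated is None else translated


def translate_strict(raw, fmt):
    """Same translation, but an unrecognised tag is rejected loudly."""
    translated = _translate_known(raw, fmt)
    if translated is None:
        raise UnknownDateFormat(f"cannot translate date, format tag {fmt!r}")
    return translated
```

With a garbage tag (standing in for whatever was read out of freed memory), `translate_lenient("2019-05-30", "junk")` hands back the input unchanged, while `translate_strict` raises. Whether raising is actually better depends on what the caller does with the exception, which is the part we can't see from the outside.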


I haven't seen the code personally, so I'm not sure. The condition was inside the code of the Gateway provider.



