Simple Testing Can Prevent Most Critical Failures

yeukhon · on Feb 1, 2015

I really enjoy going to USENIX and watching USENIX presentations. Almost all of them include the PDF and a pretty good quality recording of the presentation (plus audio for those who only wish to listen). The presentation mode is also awesome: slides and speaker are on the same page. I just can't stand it when the camera person or video editor constantly switches between the slides and the speaker. I want to read the slides as much as I enjoy the presenter's gestures.

I think one takeaway is not just to handle exceptions, but to actually monitor them and make use of them. Very recently I was debugging an application inside a container. The code caught Exception (the base class of all exceptions), so I had to ask the developer to log the stack trace. The stack trace finally revealed the actual problem, and I was able to write a PoC to test it and narrow the root cause down to the security of the container.
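To make the stack trace point concrete, here is a minimal Python sketch (not the code from the application I was debugging). The names run_task and task are hypothetical; the only real API used is the standard library's logging module, whose Logger.exception call logs at ERROR level and attaches the current traceback.

```python
import logging


def run_task(task, logger):
    """Run a hypothetical zero-argument callable and keep the stack trace on failure."""
    try:
        return task()
    except Exception:
        # A bare logger.error("task failed") would drop the traceback.
        # logger.exception records the message plus the full stack trace.
        logger.exception("task failed")
        # Re-raise so the failure is not silently swallowed by the catch-all.
        raise
```

If you really must catch the base class, logging the trace and re-raising at least leaves something to debug with later.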

With linters and static analysis we can probably nudge developers: "hey look, you are catching too much or too little."
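As a rough idea of what such a check could look like, here is a toy sketch using Python's standard ast module. The function name find_broad_handlers is made up for illustration, and it only covers the "catching too much" half: it flags bare except clauses and handlers for Exception or BaseException. Real linters do this far more thoroughly.

```python
import ast


def find_broad_handlers(source):
    """Return sorted line numbers of overly broad except clauses in source code."""
    broad_names = {"Exception", "BaseException"}
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        if node.type is None:
            # Bare "except:" catches everything, including KeyboardInterrupt.
            hits.append(node.lineno)
        elif isinstance(node.type, ast.Name) and node.type.id in broad_names:
            # "except Exception:" or "except BaseException:".
            hits.append(node.lineno)
    return sorted(hits)
```

Detecting "too little" is harder, since the tool would need to know which exceptions the called code can actually raise.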

Lastly, planning for failure with an HA design is critical. In AWS we have to prepare for the underlying host going bad (which in turn means we have to stop and start the EC2 instance to move it to a different host). It's easy to say but actually hard to do, as applications are often not truly stateless. PoC failure tests, stress tests, and performance tests are all necessary.