Oh absolutely. There are many levels of failure here. A few that I see as likely:
- Lack of testing of a deployment
- Lack of required procedures to validate a deployment
- Engineering management prioritizing release pace over stability/testing
- Management ranking tech debt/pentests/etc. far too low in priority
- Sales/etc. promising fast turnarounds that can’t feasibly be met while following proper standards
- Lack of top-down company culture of security and stability first, which should be a must for any security company
This outage wasn’t caused only by “the intern pushing release.” It was caused by a poor company culture (read: incorrect direction from the top), which resulted in a lack of testing of the program code, no testing environment for deployments, no formal deployment process, and someone messing up a definition file without a single other employee or automated system catching it.