August 28, 2018
Critical failure and the mechanics of causality
The biggest critical failure that I’ve ever experienced on a project happened in the first week of 2014 and caused a recurring outage that lasted a full four days.
It was my very first large-scale, distributed production system with multi-national integrations and a real-time messaging component. I had been on the project for three months and had inherited the tech lead role, which I shared with a senior colleague who had joined after me.