😦 The Real CrowdStrike Flaw Was In Deployment
By now, most of you are familiar with the story of CrowdStrike and the resulting worldwide outages. The initial, and angry, responses were along the lines of "Where are the devs? Where was QA?" As it turned out, the actual patch, or software code, was fine. The problem was in the config file that governs the behavior of the patch. Being flawed, it triggered the Microsoft Windows' blue screen of death. In a variation of "Who Watches The Watchers," the config file was indeed tested, but CrowdStrike's testing system was unknowingly broken. The config file passed when it shouldn't have. Nothing is perfect all the time, but what would have mitigated the fallout would have been a gradual or staggered rollout of the config file instead of the apparent "big-bang" release to all machines at once. The staggered rollout is an old and reliable technique. I can think of 2 reasons why it might have been forgotten or abandoned: inexperience or over-confide