How good is a backup, really?
As 24/7 operations go, it’s hard to think of a more critical sector than air traffic control. You would think, then, that uptime would be enough of a priority to justify special attention to infrastructure redundancy. Apparently, not so much in Belgium. To recap, a power surge is being blamed for taking down all air traffic control over the country for more than 5 hours. That means flights taking off from or landing at the European capital, but also overflights of a country wedged between Germany, France, Britain, and the Netherlands. So, not exactly quiet skies. (In fairness, flights from Antwerp were able to take off and fly at low altitude under visual flight rules over to Rotterdam, where the Dutch ATC could take over.)
Let’s gloss over for a moment the economic impact of stranding at least 20,000 passengers for 5 hours. All flights from, to, or over Belgium lost guidance over a country known for its bad weather. All because of a single power surge at ONE location, where the UPS/generator backup systems simply didn’t work.
In my early days in the field, I worked for a large organization that went through considerable (capital) expense to add a diesel generator to a large datacenter, but did not want to spend the (operational) budget to purchase fuel for it. While that is an extreme example, having worked with hundreds of customers over the years, I can count on two hands the environments that diligently went through emergency drills. If you host your own datacenter, it really behooves you to schedule a power failover test (as well as a test data restore, test cluster failovers…). If you host your data somewhere else, grill the hoster or Cloud provider about their approach to such testing. And apply the same skepticism to every portion of your Disaster Recovery or High Availability strategy. If something has not been tested in 6 months, assume that it will fail.
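That last rule is simple enough to automate. As a minimal sketch (the inventory of components and drill dates is entirely hypothetical), a script could keep the date of the last successful drill for each DR component and flag anything older than six months as presumed-failed:

```python
from datetime import date, timedelta

# Hypothetical inventory: date of the last *successful* drill per DR component.
last_tested = {
    "generator-failover": date(2025, 1, 10),
    "ups-load-transfer": date(2024, 6, 2),
    "backup-restore": date(2024, 11, 20),
}

MAX_AGE = timedelta(days=182)  # roughly six months

def stale_components(tests, today):
    """Return components whose last drill is older than MAX_AGE.
    Per the rule above, assume these will fail when needed."""
    return sorted(name for name, when in tests.items()
                  if today - when > MAX_AGE)

print(stale_components(last_tested, date(2025, 3, 1)))
# → ['ups-load-transfer']
```

Run from a scheduler, a check like this turns "we should really test that someday" into a standing alert that something in your recovery path is overdue for a drill.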