Windows IRL #3 – Scott's Blog

from Craig Hettich/The New York Times video

Today, the largest IT outage in history hit the Windows operating system when a CrowdStrike patch caused reboots to fail. Catastrophic. While this is not directly Microsoft’s fault, it does show the frailties in our ever-increasingly digital world.

As a technologist, I’ve always been leery of convenience and like to control if and when I make changes to the system, trying to the best of my ability to understand the impact. It’s increasingly difficult to ferret out details as the overall complexity increases, but I do avoid being the first one to upgrade so I can see if anyone reports problems.

But how did we get here?

In the beginning….

….updating Windows was often a manual task, as the operating system had no built-in way to update itself or installed applications. Even though commercial third-party tools attempted to fill the gap, certain use cases still required corporate IT administrators to manually apply updates to computers and servers. Home users had no choice but to download and install updates themselves and hope that everything worked; many, however, never updated, and their computers became increasingly out-of-date, sometimes so far that a reinstall might be the only solution.

Anyone with a lick of technical experience complained about the lack of a unified package manager, such as apt, yum, Homebrew, etc.

Then Automated Updates arrived…

…and updating your computer or server became much easier. Organizations often ran a local Software Update Services (SUS) instance to better control what got deployed. A former employer had quarterly meetings to discuss which updates to roll out generally, which to test, and which to reject. Windows Update agents would reach out to a local or global SUS instance to download whatever was approved. Home users had less control but – if I remember correctly – could reject updates, which was useful when you heard about something breaking.

Patch Tuesday Transitioned From Exception to Expected …

Patch Tuesday originally appeared very innocuous: small fixes that many users might not even notice – other than a system reboot – but it evolved into something bigger as security concerns moved to the forefront. Users and administrators have become inured to the updates being applied. You often can’t stop them, especially when a zero-day exploit is known or seen in the wild.

Microsoft is using the same mechanism to forcibly update Windows versions when end-of-support approaches – without asking for permission – potentially breaking applications that aren’t available on the updated version.

And now?!?

As anyone who knows me knows, I have been anti-Microsoft, anti-Windows for ~~years~~ decades, so this just further confirms my feelings. I expect organizations to ratchet back their update cycle and attempt to take more control than – obviously – is in place today, though inevitably circumstances might require raw acceptance of an update. I fully expect US and EU investigations, with Microsoft and CrowdStrike representatives called in both to explain and to be raked over the coals. I still expect to hear of at least one death caused by the outage. Major major major, and very scary all at the same time. Despite its size, I wonder if CrowdStrike survives.