So Microsoft went down last night. Now what?
As a Managed IT Services Provider, this is a text exchange you never want to have when you're sitting down for dinner. Unfortunately, in this scenario, I'm the green bubble, and this is an actual conversation I had last night with a team member.
Microsoft does a pretty good job of keeping admins up to date on their Microsoft365 Status Page and their Twitter account. So now you know which page I spent all evening refreshing with the F5 key. Judging from their Twitter feed, it's safe to assume a software update did not behave as planned.
At the time of this writing, Microsoft's official analysis is:
Translation: We rolled out an update that didn't work as expected and we're figuring out what happened. More to come.
Okay, Frank, that's nice and all, but what does it mean???
It means that Microsoft deployed an update or change, presumably to enhance the security and performance of Microsoft365. It's not possible to determine at this time what the change was or what it was for, but my guess is that it had something to do with Azure Active Directory (AAD), their authentication service. Basically, AAD runs all the magic that happens after you type in your username and password and click the Sign In button. This service allows you to securely access your Microsoft365 account for e-mail, OneDrive files, SharePoint, Teams chats...you get the picture. There's no reason to believe security was compromised, largely because it was impossible to log into your account.
So, Frank, sounds like we should just pull everything out of the cloud and go back to the old-fashioned on-premises way of doing things.
Sure, but only if you want your outages to be longer, more pronounced, and more expensive.
In the old days, before Microsoft had their Office365 line of services, your e-mail system would look something like this.
This diagram doesn't even address the hardware, power company, and the backhoe that just accidentally cut the internet cable connected to your e-mail server. All things that are realistically out of your control.
If any one of the components those icons represent went down, your IT admin would have to:
Identify which component was affected.
Determine what the problem was.
Troubleshoot the issue.
Call the vendor for support (hoping you still had a support contract in place), since the problem almost always occurred in the middle of the night, on the weekend, or during a holiday. The admin would then work their way through a phone tree, reach a support engineer, and spend the next 4 to 6 hours solving the problem.
Be at the office early the next morning, bright-eyed and bushy-tailed, ready to answer follow-up questions and explain to the CEO why the IT admin's crystal ball was broken and the outage wasn't predicted in advance.
Prior to operating in the cloud, that process could take days, depending on the complexity of the network and the resources at the IT admin's disposal.
Yes, last night's interruption sucked. I'm not telling you something you didn't already know. Yes, critical services were unavailable for several hours. However, rest assured these outages will also occur if you run those services on-prem. It's not a question of if but when. Subscribing to services such as Microsoft365 gets you not just the technology features and cost savings of cloud computing but also the army of engineers, programmers, and support personnel in Microsoft's employ, working around the clock to remediate issues. Last night's outage lasted approximately five hours. If you owned and operated your own e-mail system, how long do you think you would have been down?
If you'd like to discuss this in more detail, feel free, as always, to contact me directly:
-- Frank Diaz