Data centers have been around for a long time, and almost everything we do online depends on them. As I write this, I'm listening to Apple Music, which is streaming from an Apple data center. The backups, applications and widgets you use every day all work because a data center somewhere supports them. Over the years, we have learned which processes and procedures work by seeing what does and doesn't go well. Today we're going to look at some different maintenance activities, and we'll also discuss data center outages to see what we can learn.
In the past, you had to book a maintenance window (usually late at night or on a weekend) to update things like firmware or operating systems. It's important to note that this was a scheduled process that had to pass change control and configuration management first. I prefer to avoid doing things the old way if possible!
Preventive maintenance is a proactive activity. It could be installing more memory on a server or adding a server to a cluster. These tasks are often completed as a part of a planned data center maintenance period.
A planned outage is very much like planned maintenance, except that you take services offline while you perform the preventive maintenance. Sometimes this is because of hardware-related changes, like HBA firmware upgrades or maybe a router OS upgrade.
Unplanned outages occur when your applications or services stop working unexpectedly, often because preventive data center maintenance was skipped. They usually have a very negative impact on your company and are difficult to deal with.
What has changed in data centers over the years? We now have tools and processes in place to perform planned maintenance without an outage. That means you can complete data center maintenance during a weekday and eliminate overtime, and you can do more preventive maintenance because it no longer requires a planned outage.
VMware has the VMware Update Manager (VUM), which lets you put an ESXi host into maintenance mode and patch it without any outage to the VMs and applications, because the running VMs are first migrated to other hosts in the cluster. It is a semi-automated activity, and VUM can proactively let you know when there are pending updates.
VMware vSphere clusters are important constructs because they support outage-free patching with tools like VUM, and they also make it easy to add a host and scale out without an outage.
Dell EMC has an amazing utility called OpenManage Integration for VMware vCenter, which lets you do hardware-level firmware and driver updates while a server is in maintenance mode, so there is no impact to users or VMs. Other vendors have similar tools.
At this point, you can patch your VM hosts without an outage, and you can complete hardware firmware updates without an outage. That means more preventive data center maintenance gets done without planned outages.
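To make the idea concrete, here is a minimal Python sketch of patching a cluster one host at a time. It is a toy simulation and not VUM or any vendor API: the `Host` class, the `evacuate` and `rolling_patch` functions, the host and VM names, and the capacity numbers are all made up for illustration. The only point it tries to show is the order of operations: empty the host, patch it, bring it back, then move on to the next one.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity: int                       # max VMs this hypothetical host can run
    vms: list = field(default_factory=list)
    in_maintenance: bool = False
    patched: bool = False


def evacuate(host, cluster):
    """Move every VM off `host` onto other hosts that still have room."""
    targets = [h for h in cluster if h is not host and not h.in_maintenance]
    for vm in list(host.vms):
        candidates = [h for h in targets if len(h.vms) < h.capacity]
        if not candidates:
            raise RuntimeError(f"no spare capacity to move {vm} off {host.name}")
        # Pick the least-loaded host that can still take a VM.
        target = min(candidates, key=lambda h: len(h.vms))
        host.vms.remove(vm)
        target.vms.append(vm)


def rolling_patch(cluster):
    """Patch hosts one at a time so every VM keeps running somewhere."""
    for host in cluster:
        evacuate(host, cluster)
        host.in_maintenance = True      # the host is empty at this point
        host.patched = True             # stand-in for the real patch step
        host.in_maintenance = False     # the host rejoins the cluster


cluster = [
    Host("esx-01", capacity=4, vms=["web-1", "db-1"]),
    Host("esx-02", capacity=4, vms=["web-2"]),
    Host("esx-03", capacity=4, vms=["app-1"]),
]
rolling_patch(cluster)
running = sorted(vm for h in cluster for vm in h.vms)
print(running)
```

Every VM is still running somewhere when the loop finishes, and every host has been patched. The sketch also shows why spare capacity matters: if the remaining hosts cannot absorb the VMs from the host being patched, `evacuate` raises an error instead of taking anything offline.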
And, after what I heard recently, I will also suggest that clean power — filtered and balanced — is very important too!
What about unplanned outages?
This is the most important part. When something goes wrong, you need to manage that too, whether your whole data center is gone or just a big part of it. You need an orchestration tool like Veeam Availability Orchestrator that can help you power up replicas at your recovery site and get all of your services working again. The important thing here is to keep the outage as short as possible.
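As a rough illustration of what "orchestration" means here, the Python sketch below works out a safe power-up order for a handful of services and estimates how long the recovery would take. It does not use Veeam Availability Orchestrator or its API. The service names, the dependencies, and the boot times in minutes are all hypothetical, and it assumes that services with no dependency on each other can start in parallel.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical recovery plan: each service lists what must be up before it.
depends_on = {
    "dns": [],
    "database": ["dns"],
    "app-server": ["database"],
    "web-frontend": ["app-server", "dns"],
}

# Hypothetical time, in minutes, for each replica to boot and pass its checks.
boot_minutes = {"dns": 2, "database": 10, "app-server": 5, "web-frontend": 3}


def recovery_order(depends_on):
    """Return an order in which every service starts after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def ready_times(depends_on, boot_minutes):
    """Minute at which each service is up, starting each one as soon as
    all of its dependencies are ready."""
    ready = {}
    for svc in recovery_order(depends_on):
        start = max((ready[d] for d in depends_on[svc]), default=0)
        ready[svc] = start + boot_minutes[svc]
    return ready


order = recovery_order(depends_on)
ready = ready_times(depends_on, boot_minutes)
print(order)
print(max(ready.values()), "minutes until the last service is back")
```

With a plan like this written down ahead of time, the length of the outage comes down to the longest dependency chain rather than to how quickly someone can remember what to start first.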
If you have any questions or comments, just let me know.