How to Manage Your Data Center During a Heatwave
The recent heatwave that brought record temperatures to the UK caused cooling systems to fail at a London data center, resulting in downtime for Google and Oracle.
According to Oracle, “Following unseasonably high temperatures in the UK south (London) region, two cooler units in the data centre experienced a failure when they were required to operate above their design limits. As a result, temperatures in the data centre began to climb, which caused some systems to shut down as a protective measure.”
Similarly, Google reported that “there has been a cooling-related failure in one of our buildings” and that it had to shut some equipment off to prevent damage that would cause prolonged downtime. Still, some customers were affected.
Data centers are often designed to withstand high temperatures, but unprecedented heat can still push cooling systems past their limits and put the facility at risk of a disaster. To mitigate this risk, data center managers need a plan to maintain uptime while efficiently cooling their data centers.
During the heatwave, we surveyed our customers to find out how they are managing their data center operations with Data Center Infrastructure Management (DCIM) software. Here is what they are doing:
- Monitoring for hot spots. Hot spots are locations at the intake of IT equipment where insufficient cooling causes the temperature to exceed the recommended range. They pose a threat to equipment and increase the risk of outages. You should instrument the data center with environmental sensors so that, during a heatwave, you can proactively monitor for hot spots and remedy the issue before it becomes a larger problem. Our customers report using thermal time-lapse video to watch for hot spots forming on their 3D floor map.
- Monitoring power utilization. Similar to checking for hot spots, monitoring your power utilization will tell you whether a device or rack is consuming a lot of power and therefore generating a lot of heat. With DCIM software, you can set warning and critical thresholds on power loads so that you are automatically notified of threshold violations and can react accordingly.
- Ensuring redundancy. If high temperatures cause an outage or force you to turn equipment off, redundancy can keep your customers from being affected. DCIM software allows you to run failover simulation reports to identify which cabinets are at risk and what equipment can continue to function if a PDU goes down.
- Monitoring chillers, chiller towers, and generators. Keep an eye on your cooling and backup power equipment to ensure that it can keep up with the data center’s abnormally high temperatures.
- Being more aware. Customers are generally being more aware of the health status of their data centers. Even though everything may be working fine, it’s good to be extra vigilant about what’s happening in the data center to ensure you don’t miss an avoidable outage. Users are paying close attention to which customers are deploying new servers. Then, they are monitoring the impact that has on the data center.
- Issuing a “no work” order. Some customers have limited or stopped any work from happening in the data center to ensure that new deployments or changes don’t disrupt the already fragile environment.
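The threshold and failover checks described in the list above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular DCIM product works: the threshold values, the `RackReading` fields, and the function names are all assumptions made for the example, and real limits should come from your equipment specifications and your own policies.

```python
from dataclasses import dataclass

# Illustrative thresholds only (assumed for this sketch, not vendor limits).
INLET_WARNING_C = 27.0
INLET_CRITICAL_C = 32.0
POWER_WARNING_FRACTION = 0.6
POWER_CRITICAL_FRACTION = 0.8


@dataclass
class RackReading:
    """One hypothetical sample for a rack: inlet temperature and power draw."""
    rack: str
    inlet_temp_c: float
    load_kw: float
    capacity_kw: float


def classify(value: float, warning: float, critical: float) -> str:
    """Map a measurement to 'ok', 'warning', or 'critical'."""
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


def check_rack(reading: RackReading) -> dict:
    """Flag a possible hot spot and a high power load for one rack."""
    utilization = reading.load_kw / reading.capacity_kw
    return {
        "rack": reading.rack,
        "temperature": classify(
            reading.inlet_temp_c, INLET_WARNING_C, INLET_CRITICAL_C
        ),
        "power": classify(
            utilization, POWER_WARNING_FRACTION, POWER_CRITICAL_FRACTION
        ),
    }


def survives_pdu_failure(
    feed_a_kw: float, feed_b_kw: float, per_feed_capacity_kw: float
) -> bool:
    """Return True if one feed alone could carry the load of both feeds."""
    return feed_a_kw + feed_b_kw <= per_feed_capacity_kw


if __name__ == "__main__":
    sample = RackReading("A01", inlet_temp_c=29.5, load_kw=4.5, capacity_kw=5.0)
    print(check_rack(sample))
    print(survives_pdu_failure(2.0, 2.5, per_feed_capacity_kw=5.0))
```

In this sketch, a cabinet fed by two PDUs counts as at risk when the combined load of both feeds exceeds what a single feed can carry, which is the simplest form of the failover question. A real failover simulation would also account for breaker ratings, phases, and upstream equipment.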
Protect Your Data Center From Future Heatwaves
Our customers are the best data center managers in the world, and they are relying on DCIM software to ensure that heatwaves and other threats do not disrupt their data center operations.
See how second-generation DCIM software can protect you from data center disasters. Get your free test drive now.