NVIDIA H200 Power Requirements: Can Your Racks Support Them?

This week, NVIDIA unveiled what it calls “the world’s most powerful GPU for supercharging AI and HPC workloads,” the H200 Tensor Core GPU.

There is much hype around the H200 because it is the first GPU with HBM3e. The larger and faster memory will further accelerate generative AI and large language models and advance scientific computing for HPC workloads.

Read the NVIDIA press release.

Data centers around the world are racing to support the AI boom, and need to know if their existing facilities can accommodate the power-hungry infrastructure.

What Are the Power Requirements for the NVIDIA H200?

According to NVIDIA, the H200 Tensor Core GPU has a maximum Thermal Design Power (TDP) of 700W. This represents the maximum power consumption and amount of heat generated by a single H200 GPU under normal operating conditions.

The NVIDIA HGX H200 system combines the H200 Tensor Core GPUs with high-speed interconnects to form the world’s most powerful servers with configurations of four or eight GPUs.

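Therefore, the GPUs in an HGX H200 4-GPU system require up to 2.8kW (4 × 700W), and the GPUs in an HGX H200 8-GPU system require up to 5.6kW (8 × 700W). These figures cover the GPUs only; the CPUs, memory, networking, and fans in the host server add to the total.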
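
The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a sizing tool: the 700W TDP comes from the figures above, while the function names and the 17,300W rack budget are assumptions chosen for the example. The result covers GPU load only, so a real server will draw more.

```python
H200_TDP_W = 700  # per-GPU maximum TDP cited above


def gpu_power_w(gpu_count: int, tdp_w: int = H200_TDP_W) -> int:
    """Combined maximum GPU power in watts (GPUs only, not the whole server)."""
    if gpu_count < 0:
        raise ValueError("gpu_count must be non-negative")
    return gpu_count * tdp_w


def systems_that_fit(rack_budget_w: int, system_w: int) -> int:
    """Whole systems that fit within a rack power budget, judged by power alone."""
    if system_w <= 0:
        raise ValueError("system_w must be positive")
    return rack_budget_w // system_w


if __name__ == "__main__":
    for gpus in (4, 8):
        watts = gpu_power_w(gpus)
        print(f"HGX H200 {gpus}-GPU: {watts / 1000:.1f} kW of GPU load")
    # 17,300 W is an assumed rack budget used for illustration only.
    print(systems_that_fit(17_300, gpu_power_w(8)))
```

Working in whole watts keeps the division exact. Space, cooling, and port capacity still have to be checked separately.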

Can Your Racks Support NVIDIA H200 Systems?

The H200 system demands substantial power, cooling, space, and data and power port capacity. Calculating whether you have the capacity to deploy it from multiple data sources, manual math, and estimates is time-consuming, prone to human error, and risky.

However, there is a solution that can make your life easier.

Data Center Infrastructure Management (DCIM) software offers real-time power and environmental monitoring, accurate asset and circuit management, and intelligent capacity planning capabilities that make it easy to see if you have the available resources to accommodate increased rack densities.

DCIM software can be used to determine if your existing racks can support NVIDIA H200 systems via:

  • Automatic server power budgeting. Instead of derating nameplate values, let “Auto Power Budget” automatically calculate and update power budget profiles for each server instance in your data center based on their actual trended power consumption. More accurate power budget values often reveal that there is stranded rack power capacity that can be recovered to safely deploy more equipment in your existing racks. Comcast reported that this feature unlocked 40% more capacity in their existing facilities.
  • What-if analysis. Visualize the impact your planned NVIDIA H200 deployments will have on rack-level space and power utilization. What-if analysis with easy-to-understand dashboard charts shows at a glance whether you have the capacity to support these projects or need to purchase more.
  • Intelligent capacity search. Intelligent capacity search makes it fast and easy to find the best cabinet for new servers. Enter the make and model of the equipment you are deploying to get a list of all the cabinets with enough space, power, and port capacity to support it. Then, with one click, you can reserve those resources.
  • Built-in power circuit intelligence. Understand the power capacity and load at every hop in your power circuits. This information lets you know if new loads will trip a breaker, helps you balance all three phases, and simplifies redundancy planning. Automatic, interactive, and dynamic single-line diagrams let you visualize your power circuits to aid in power planning.
  • Correlated capacity reporting. On a 2D or 3D floor map of your data center, you can visualize, correlate, and analyze multiple capacity parameters at once with red-yellow-green color-coding of your racks to see where you have capacity. Combine space, power, cooling, weight, and other capacity constraints for a holistic view of capacity across any site.
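
To make the ideas of trended power budgeting and capacity search concrete, here is a minimal Python sketch. It is not Sunbird's implementation: the `Cabinet` and `Equipment` types, the 10% headroom rule, and all sample values are hypothetical assumptions for illustration. It derives a power budget from observed readings rather than a nameplate value, then filters cabinets that have enough space, power, and ports.

```python
from dataclasses import dataclass


@dataclass
class Cabinet:
    name: str
    free_rack_units: int
    power_capacity_w: int   # usable capacity of the circuits feeding the cabinet
    budgeted_load_w: int    # sum of power budgets of equipment already installed
    free_power_ports: int


@dataclass
class Equipment:
    model: str
    rack_units: int
    power_budget_w: int
    power_ports: int


def trended_budget_w(samples_w: list[int], headroom_pct: int = 10) -> int:
    """Budget = observed peak plus a safety margin (an assumed rule, not a standard)."""
    if not samples_w:
        raise ValueError("need at least one power reading")
    peak = max(samples_w)
    return peak + peak * headroom_pct // 100


def find_cabinets(cabinets: list[Cabinet], item: Equipment) -> list[str]:
    """Names of cabinets with enough space, power headroom, and free ports."""
    return [
        c.name
        for c in cabinets
        if c.free_rack_units >= item.rack_units
        and c.power_capacity_w - c.budgeted_load_w >= item.power_budget_w
        and c.free_power_ports >= item.power_ports
    ]


if __name__ == "__main__":
    # Hypothetical readings for an existing server with a 6,000 W nameplate.
    budget = trended_budget_w([3200, 3900, 4000, 3700])
    print(f"Trended budget: {budget} W, recovered vs nameplate: {6000 - budget} W")

    cabinets = [
        Cabinet("A1", 10, 8_000, 3_000, 4),
        Cabinet("A2", 6, 17_300, 2_000, 6),
        Cabinet("A3", 12, 17_300, 9_000, 8),
    ]
    # Rack units and port count are assumed; 5,600 W is the GPU-only figure.
    server = Equipment("HGX H200 8-GPU server", 8, 5_600, 6)
    print(find_cabinets(cabinets, server))
```

The gap between the nameplate value and the trended budget is the stranded capacity described in the first bullet. A production tool would also check cooling, weight, and upstream breaker limits.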

Tips to Monitor and Manage Your H200 Systems

When deploying the NVIDIA H200 system or a similar AI infrastructure, your responsibilities extend beyond merely identifying available capacity. Ongoing monitoring and management are crucial for maintaining uptime and controlling energy costs in high-density infrastructure.

  • Monitor in real time. DCIM software offers real-time monitoring of high-density power and environmental data to help keep services and applications running smoothly. Modern DCIM software can monitor over 10 billion data points daily, encompassing power data from intelligent rack PDUs, floor PDUs, busways, branch circuits, RPPs, and UPSs, as well as environmental data from temperature, humidity, airflow, and other sensors.
  • Set thresholds and alerts. By setting warning and critical thresholds on various parameters such as power loads, three-phase balance, rack PDU circuit breaker state changes, temperature, and humidity, you will receive automatic alerts when these thresholds are violated. This proactive notification allows you to investigate and resolve potential issues before they escalate into serious problems.
  • Identify hot spots and overcooling. Ensure optimal temperatures by identifying hot spots and airflow patterns through thermal map time-lapse videos of your data center floor. Additionally, see if cabinets are being overcooled beyond accepted thermal guidelines using patented ASHRAE psychrometric cooling charts.
  • Track KPIs for high densities. Utilize out-of-the-box dashboard charts and reports to track high-density KPIs, including available power, space, cooling, data/power port connections, delta T per cabinet, energy cost, and Power Usage Effectiveness (PUE). This provides a quick overview of the health and capacity of any site.
  • Leverage remote visualization. Visualize high-density racks and cabling on your 3D floor map with overlaid power and environmental data. See how everything connects with automatically updated network diagrams that include both active and passive components.
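
The threshold and KPI tips above can be illustrated with a short Python sketch. It is a simplified assumption of how such checks might look, not any product's alerting logic: the function names and the threshold values are hypothetical. PUE is computed as total facility power divided by IT equipment power, and delta T as exhaust temperature minus inlet temperature.

```python
from enum import Enum


class Severity(Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


def classify(value: float, warning: float, critical: float) -> Severity:
    """Compare a reading against upper warning and critical thresholds."""
    if warning > critical:
        raise ValueError("warning threshold must not exceed critical threshold")
    if value >= critical:
        return Severity.CRITICAL
    if value >= warning:
        return Severity.WARNING
    return Severity.OK


def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power over IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def delta_t(exhaust_c: float, inlet_c: float) -> float:
    """Temperature rise across a cabinet in degrees Celsius."""
    return exhaust_c - inlet_c


if __name__ == "__main__":
    # Assumed thresholds for a rack PDU load, in kW.
    print(classify(5.2, warning=4.5, critical=5.0))
    print(pue(total_facility_kw=1500.0, it_equipment_kw=1000.0))
    print(delta_t(exhaust_c=35.0, inlet_c=22.0))
```

In practice the thresholds would come from breaker ratings and thermal guidelines for your site, and readings would arrive continuously from PDUs and sensors.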

Bringing It All Together

As the AI market continues to grow, data centers must prepare to accommodate advanced systems like the NVIDIA H200.

Assessing whether your current facilities can handle such high-demand equipment can be a formidable task without the proper tools and data.

DCIM software addresses this by providing real-time monitoring and streamlined capacity planning for the effective management of high-density AI infrastructure.

With DCIM software, you can determine whether your racks can support AI workloads and help keep high-density infrastructure running smoothly. This includes maintaining optimal conditions and monitoring KPIs, which improves the efficiency and reliability of your data center.

Try DCIM for AI Infrastructure Management

See how Sunbird's second-generation DCIM software can help you plan and manage your high-density AI infrastructure. Get your free test drive today.

November 16, 2023