The first decade of the 21st century was one of rapid growth and change for data centers. For most of the decade, data center managers were forced to react to rapid, continuous changes dictated by the capacity and availability requirements of their organizations, and the density of the equipment being deployed to meet those requirements.
Now, data centers must enter a new stage of maturity marked by a more proactive approach to management to enable increased efficiency, better planning and higher levels of service. Achieving actionable visibility into data center operations requires the ability to collect, consolidate and analyze data across the data center, using advanced devices, sensors and management software.
The 10 steps outlined by Emerson Network Power, and highlighted in this slideshow, provide a systematic approach to building the foundation for data center infrastructure management by deploying and leveraging measurement, intelligent controls and centralized monitoring and management. Data centers employing these 10 prescribed point solutions for infrastructure performance monitoring stand to gain an operational, strategic and transformative advantage for their enterprise or business.
Click through for a 10-step process to comprehensive data center infrastructure monitoring, as outlined by Emerson Network Power.
One of the most significant consequences of the growth in data center density and complexity is the issue of heat density. As data center density has increased, cooling loads have grown and become more heterogeneous. It is no longer possible to manage temperatures on a facility level because rack densities may vary widely, creating hot spots in one zone while another zone is cooled below the desired temperature.
Installing a network of temperature sensors across the data center helps ensure that all equipment is operating within the ASHRAE recommended temperature range (64.4° F to 80.6° F). By sensing temperatures at multiple locations, the airflow and cooling capacity of the precision cooling units can be controlled more precisely, resulting in more efficient operation.
Additionally, the network of sensors can reduce cooling costs by allowing safe operation closer to the upper end of the temperature range—operating, for example, at 75° F instead of 65° F. According to an ASHRAE paper developed by Emerson Network Power, a 10° F increase in server inlet temperatures results in a 30 percent reduction in compressor power draw. Assuming the Computer Room Air Conditioning (CRAC) units supporting the facility are equipped with digital or unloading compressors, this reduction in compressor power draw translates into a 21 percent reduction in cooling energy costs.
The best practice is to attach at least one sensor to every rack; placing a sensor on every other rack is acceptable when racks are arranged in a hot aisle/cold aisle configuration and loading is uniform across the row. Sensors should be located near the top of the rack, where temperatures are generally highest. It is also advantageous to locate sensors near the end of the row, where they can detect hot air entering the cold aisle from the hot aisle.
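The per-rack check described above can be sketched in a few lines of code. This is a minimal illustration, not a real monitoring API; the rack names, readings and one-sensor-per-rack layout are hypothetical, while the temperature limits are the ASHRAE recommended range cited above:

```python
# Minimal sketch of checking rack-level temperature readings against the
# ASHRAE recommended range (64.4-80.6 F). Rack names and readings are
# hypothetical illustrations, not output from a real sensor network.
ASHRAE_MIN_F = 64.4
ASHRAE_MAX_F = 80.6

readings = {          # one sensor near the top of each rack
    "rack-01": 72.5,
    "rack-02": 81.9,  # hot spot: above the recommended range
    "rack-03": 63.0,  # overcooled: below the recommended range
}

def check_rack_temps(readings, lo=ASHRAE_MIN_F, hi=ASHRAE_MAX_F):
    """Return a list of (rack, temperature, condition) alerts."""
    alerts = []
    for rack, temp_f in sorted(readings.items()):
        if temp_f > hi:
            alerts.append((rack, temp_f, "hot spot"))
        elif temp_f < lo:
            alerts.append((rack, temp_f, "overcooled"))
    return alerts

for rack, temp_f, condition in check_rack_temps(readings):
    print(f"{rack}: {temp_f:.1f} F ({condition})")
```

In practice, readings like these would feed the cooling units' controls and a central monitoring system rather than a print loop.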
To gain a comprehensive picture of data center power consumption, power should be monitored at the Uninterruptible Power Supply (UPS), the room Power Distribution Unit (PDU) and within the rack. Measurements taken at the UPS provide a base measure of data center energy consumption that can be used to calculate Power Usage Effectiveness (PUE) and identify energy consumption trends. Monitoring the room PDU prevents overload conditions at the PDU and helps ensure power is distributed evenly across the facility.
The best view of IT power consumption comes from the power distribution units inside racks. Rack PDUs now feature integrated monitoring and control capabilities that enable continuous power monitoring. Because rack power consumption varies with the specific equipment in the rack and its load, each rack should be equipped with a PDU (two for dual-bus environments) capable of monitoring power consumption at the PDU level, at overload-protected receptacle groups and, where required, at the individual receptacle level.
These systems can provide PDU, branch-level and receptacle-level monitoring of volts, amps, kilowatts (kW) and kilowatt-hours (kWh). This provides the most direct measure of power consumption available to data center management and supports both higher data center efficiency and availability. In addition to enabling more effective power management, rack PDUs can be used to support more accurate chargeback of IT services and to identify stranded capacity.
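The roll-up from receptacle to branch to rack described above can be sketched as follows. All names, readings and the per-branch capacity limit are hypothetical illustrations:

```python
# Sketch of rolling up receptacle-level power readings to branch and
# rack totals, as a rack PDU with receptacle-level monitoring might
# report them. All names, readings and limits are hypothetical.
receptacle_kw = {
    # (branch, receptacle): kW draw
    ("A", 1): 0.42, ("A", 2): 0.55, ("A", 3): 0.75,
    ("B", 1): 0.38, ("B", 2): 0.97,
}
BRANCH_LIMIT_KW = 1.6  # illustrative per-branch capacity

branch_totals = {}
for (branch, _), kw in receptacle_kw.items():
    branch_totals[branch] = branch_totals.get(branch, 0.0) + kw

rack_total_kw = sum(branch_totals.values())
overloaded = [b for b, kw in branch_totals.items() if kw > BRANCH_LIMIT_KW]

print(f"rack total: {rack_total_kw:.2f} kW")
for branch in overloaded:
    print(f"branch {branch} overloaded: {branch_totals[branch]:.2f} kW")
```

The same per-receptacle figures could drive chargeback reports, since each receptacle maps to a known piece of IT equipment.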
With increasing densities, a single rack can now support the same computing capacity that used to require an entire room. Visibility into conditions in the rack can help prevent many of the most common threats to rack-based equipment, including accidental or malicious tampering, and the presence of water, smoke and excess humidity or temperature.
A rack monitoring unit can be configured to trigger alarms when rack doors are opened (and can even capture video of the event), when water or smoke is detected, or when temperature or humidity thresholds are exceeded. These “eyes inside the rack” can be connected to a central monitoring system where environmental data can be integrated with power data from the rack PDUs, while also providing local notification by activating a beacon light or other alarm if problems are detected. They should always be deployed in high-density racks and racks containing business-critical equipment.
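The threshold-and-event logic of a rack monitoring unit can be sketched like this. Event names, thresholds and actions are hypothetical illustrations; a real unit reports through its own protocol (SNMP traps, for example):

```python
# Sketch of how a rack monitoring unit's events might be mapped to
# alarm actions. Event fields, thresholds and action strings are
# hypothetical, not a real product's interface.
THRESHOLDS = {"temp_f": 80.6, "humidity_pct": 60.0}  # illustrative

def evaluate(event):
    """Return the list of actions for one sensor event dict."""
    actions = []
    if event.get("door_open"):
        actions.append("log door access; capture video")
    if event.get("water") or event.get("smoke"):
        actions.append("activate beacon; notify central monitoring")
    if event.get("temp_f", 0) > THRESHOLDS["temp_f"]:
        actions.append("raise temperature alarm")
    if event.get("humidity_pct", 0) > THRESHOLDS["humidity_pct"]:
        actions.append("raise humidity alarm")
    return actions

print(evaluate({"door_open": True, "temp_f": 82.1}))
```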
A single water leak can cost thousands of dollars in equipment damage, and many times more in lost data, customer transactions and enterprise productivity. Leak detection systems use strategically located sensors to detect leaks across the data center and trigger alarms before damage occurs. Sensors should be positioned at every point where fluids are present in the data center, including around water and glycol piping, humidifier supply and drain lines, condensate drains and unit drip pans.
A leak detection system can operate standalone or connect into the central monitoring system to simplify alarm management. Either way, it is an important part of the sensor network that gives data center managers visibility into operating conditions.
Intelligent controls integrated into room and row air conditioners allow these systems to maintain precise temperature and humidity control as efficiently as possible. They coordinate the operation of multiple cooling units to allow the units to complement rather than compete with each other, as sometimes occurs when intelligent controls are not present.
For example, one unit may get a low humidity reading that could trigger the precision cooling system’s internal humidifier. But before turning on the humidifier, the unit checks the humidity readings of other units and discovers that humidity across the room is at the high end of the acceptable range. Instead of turning on the humidifier, the system continues to monitor humidity to see if levels balance out across the room.
In one large data center’s carefully monitored retrofit, equipping 32 Liebert Deluxe precision cooling units with integrated Liebert iCOM intelligent controls reduced energy consumption by 200 kilowatt-hours per hour (an average 200 kW reduction), with a payback period of 1.2 years.
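The payback arithmetic behind a figure like this is straightforward to reproduce. The electricity rate and the implied project cost below are illustrative assumptions, not figures from the case study:

```python
# Back-of-envelope payback arithmetic for a 200 kW average reduction.
# The $/kWh rate is an assumed value for illustration only.
power_reduction_kw = 200.0   # average reduction from the retrofit
hours_per_year = 8760
rate_per_kwh = 0.10          # assumed electricity rate, $/kWh

annual_savings = power_reduction_kw * hours_per_year * rate_per_kwh
print(f"annual savings: ${annual_savings:,.0f}")

# A 1.2-year payback would imply a project cost of roughly:
implied_cost = annual_savings * 1.2
print(f"implied project cost: ${implied_cost:,.0f}")
```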
Integrated control systems on room- and rack-based cooling systems can also be used to enable preventive maintenance programs and speed response to system problems. Data collected by these systems enables predictive analysis of components and proactive management of system maintenance. Event logs, service history logs and spare parts lists all support more efficient service.
UPS systems now include digital controls with the intelligence to alter and optimize the performance of the UPS. They automatically calibrate the system and ensure the UPS is working properly. In addition, they ensure that the UPS switches between traditional operation and bypass during overloads, protecting the UPS system and the overall power infrastructure. This minimizes the need to make manual adjustments based on site conditions. Instead of requiring a service technician to manually adjust the analog controls, the UPS system itself monitors the conditions at the site (power factor, load and ambient temperature) and makes adjustments to maintain optimum performance.
Minimizing system downtime has been the traditional justification for data center infrastructure monitoring and it continues to be a powerful benefit. The ability to view immediate notification of a failure—or an event that could ultimately lead to a failure—through a centralized system allows for a faster, more effective response to system problems.
Equally important, a centralized alarm management system provides a single window into data center operations and can prioritize alarms by criticality to ensure the most serious incidents receive attention first. Every alarm needs to be gauged for its impact on operations. For example, it may be acceptable to defer repair of one precision cooling unit if 30 are working normally, but not if it is one of only two units.
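The redundancy rule in the example above can be sketched as a simple priority function. The severity labels and thresholds are illustrative, not drawn from any particular alarm management product:

```python
# Sketch of criticality-based alarm prioritization for cooling-unit
# failures: losing 1 of 30 units matters less than losing 1 of 2.
# Severity labels and cutoffs are illustrative assumptions.
def alarm_priority(units_total, units_failed):
    """Rank a cooling-unit failure by the redundancy that remains."""
    remaining = units_total - units_failed
    if remaining <= 0:
        return "critical"   # no cooling capacity left
    if remaining == 1:
        return "high"       # one unit away from an outage
    spare_ratio = remaining / units_total
    return "medium" if spare_ratio < 0.5 else "low"

print(alarm_priority(30, 1))  # one of thirty down
print(alarm_priority(2, 1))   # one of only two down
```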
Taken a step further, data from the monitoring system can be used to analyze equipment operating trends and develop more effective preventive maintenance programs.
Finally, the visibility into data center infrastructure provided by a centralized system can help prevent problems created by changing operating conditions. For example, the ability to turn off receptacles in a rack that is maxed out on power, but may still have physical space, can prevent a circuit overload. Alternately, alarms that indicate a rise in server inlet temperatures could dictate the need for an additional row cooling unit before overheating brings down the servers the business depends on.
Automating collection and analysis of data from the UPS and PDU monitoring systems can help reduce energy consumption while increasing IT productivity. Energy efficiency monitoring can track total data center consumption, automatically calculate and analyze PUE and optimize the use of alternative energy sources.
Using data from the UPS, the monitoring system can track UPS power output, determine when UPS units are running at peak efficiency, and report Level 1 (basic) PUE. Monitoring at the room or row PDU provides the ability to more efficiently load power supplies, dynamically manage cooling and automatically calculate Level 2 (intermediate) PUE. Panel board monitoring provides visibility into power consumption by non-IT systems, including lighting and generators, to ensure efficient use of those systems. Finally, rack-level monitoring provides the most accurate picture of IT equipment power consumption and can support Level 3 (advanced) PUE reporting. The ability to automate data collection, consolidation and analysis related to efficiency is essential to data center optimization and frees up data center staff to focus on strategic IT issues.
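The three PUE reporting levels can be illustrated with hypothetical readings. In each case PUE is total facility power divided by IT power, with IT power measured further downstream (and therefore lower) at each successive level:

```python
# Sketch of PUE at the three measurement levels described above.
# All readings are hypothetical; PUE = total facility power / IT power.
total_facility_kw = 1000.0
ups_output_kw = 620.0  # Level 1 (basic): IT power at the UPS output
pdu_output_kw = 600.0  # Level 2 (intermediate): at the PDU outputs
rack_it_kw = 560.0     # Level 3 (advanced): at the IT equipment itself

def pue(total_kw, it_kw):
    return total_kw / it_kw

for level, it_kw in [("Level 1", ups_output_kw),
                     ("Level 2", pdu_output_kw),
                     ("Level 3", rack_it_kw)]:
    print(f"{level}: PUE = {pue(total_facility_kw, it_kw):.2f}")
```

Note that the deeper measurement levels report a higher (less flattering) but more accurate PUE, because distribution losses between the UPS and the IT equipment are counted as overhead rather than as IT load.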
To prevent data loss and increase uptime, most data centers require a dedicated battery monitoring system. According to Emerson Network Power’s Liebert Services business, battery failure is the leading cause of UPS system loss of power. Utilizing a predictive battery monitoring method can provide early notification of potential battery failure. The best practice is to implement a monitoring system that connects to and tracks the health of each battery within a string. The most effective battery monitoring systems continuously track all battery parameters, including internal resistance, using a DC test current to ensure measurement accuracy and repeatability. Supported by a well-defined process for preventive maintenance and replacement, battery monitoring can significantly reduce the risk of dropped loads due to battery failure, optimize battery life and improve safety.
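The internal-resistance trending that underpins predictive battery monitoring can be sketched as follows. The string layout, baseline values and 25 percent rise threshold are illustrative assumptions:

```python
# Sketch of flagging batteries whose internal resistance has risen
# above baseline, a common predictive indicator of battery failure.
# Battery names, readings and the 25% threshold are illustrative.
baseline_mohm = {"batt-01": 3.2, "batt-02": 3.1, "batt-03": 3.3}
latest_mohm = {"batt-01": 3.3, "batt-02": 4.2, "batt-03": 3.4}
RISE_THRESHOLD = 0.25  # flag a 25% rise over baseline

def flag_batteries(baseline, latest, threshold=RISE_THRESHOLD):
    """Return (battery, fractional rise) pairs exceeding the threshold."""
    flagged = []
    for batt, base in baseline.items():
        rise = (latest[batt] - base) / base
        if rise > threshold:
            flagged.append((batt, round(rise, 2)))
    return flagged

print(flag_batteries(baseline_mohm, latest_mohm))
```

A flagged battery would then be scheduled for replacement under the preventive maintenance process described above, before it can drop the load.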
Data center remote monitoring can lift the burden of infrastructure monitoring from internal personnel and place it with an organization with resources devoted to this task, as well as deep infrastructure expertise. In addition to improved resource utilization, a dedicated monitoring organization can respond more quickly to portfolio issues.
For instance, in monitoring data across multiple facilities, the provider may be alerted to a problem caused by a certain manufacturer’s breaker. The manufacturer can then be notified quickly, heading off a potential problem across hundreds of sites, many of which contain similar equipment.