Data Center Journal

VOLUME 36 | FEBRUARY 2015

Issue link: https://cp.revolio.com/i/457819

and larger, more complex data centers. These new facilities will also need to be adequately powered, cooled and staffed and, if built according to past trends, they will be designed with full redundancy.

But redundancy isn't the same thing as "resiliency," which is to say what happens when one or more pieces of equipment fail or, more likely, are taken temporarily out of service for preventive maintenance, upgrades, etc. You may have two of everything, but the moment you take one component out, you may have a "single point of failure." And even when you do have two of everything, how is everything connected? How does the power load get redistributed when something is taken out, and what impact does this added load have? Are you in danger of this added "fail-over" load overloading other parts of your system? The last thing you want is for the failure of one component to result in a cascading failure of other parts. Understanding these potential overload scenarios and single points of failure is understanding the resiliency of the redundant system.

There are two ways that companies are using software to help deal with the resiliency issue. First, a few well-known organizations are applying a new philosophy, which is to forget about large-scale hardware redundancy. They build the fault tolerance into their software, through load balancing, virtualization and other techniques, which essentially move workload from server to server, from rack to rack, and from data center to data center, to plan for effective fail-over. This approach makes sense for web applications, where a user isn't greatly inconvenienced by an occasional server error and transfer. In many cases this can be done seamlessly, and even if not, at worst the user has to hit "refresh" in the browser or log in again. But what about applications which don't have this characteristic?
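
The overload question raised above can be explored with a small "what if" simulation. The sketch below is illustrative only and does not describe how any particular DCIM product works. The unit names, loads and capacities are made up, and it assumes that when a unit goes out of service its load is split equally among the units still running, and that any unit pushed past its capacity trips and sheds its load in turn.

```python
from typing import Dict, List, NamedTuple


class Outcome(NamedTuple):
    surviving: Dict[str, float]  # load (kW) on each unit still running
    tripped: List[str]           # units that overloaded as a consequence
    dropped_kw: float            # load lost if no unit is left to carry it


def simulate_outage(load_kw: Dict[str, float],
                    capacity_kw: Dict[str, float],
                    out_of_service: List[str]) -> Outcome:
    """Remove the named units and follow any cascade of overloads.

    Assumption (hypothetical model): a removed unit's load is shared
    equally by the remaining units; a unit above capacity trips next.
    """
    load = dict(load_kw)          # work on a copy
    queue = list(out_of_service)  # units still to be removed
    tripped: List[str] = []
    while queue:
        unit = queue.pop(0)
        orphaned = load.pop(unit)
        if not load:
            # Nothing left to pick up the load: total outage.
            return Outcome({}, tripped, orphaned)
        share = orphaned / len(load)
        for other in load:
            load[other] += share
        for other in load:
            if load[other] > capacity_kw[other] and other not in queue:
                queue.append(other)
                tripped.append(other)
    return Outcome(load, tripped, 0.0)


if __name__ == "__main__":
    # Hypothetical power units, each carrying 40 kW.
    load = {"A": 40.0, "B": 40.0, "C": 40.0}
    capacity = {"A": 100.0, "B": 70.0, "C": 55.0}
    for scenario in (["C"], ["A"], ["C", "A"]):
        print(scenario, simulate_outage(load, capacity, scenario))
```

In this made-up example, taking unit C out leaves A and B at 60 kW each, within their limits. Taking unit A out instead pushes C past its 55 kW rating, C's load then overloads B, and the full 120 kW is dropped. The last scenario shows that with C out for maintenance, A has become a single point of failure.
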
Many of the most sophisticated data centers support "mission-critical" applications. Facilities managers are concerned with providing 24/7 uptime, up to "six nines" availability. These metrics are supported by redundancy, reliability and an understanding of the resiliency of the system. These managers look at how many single points of failure the system has (particularly during outages). A good DCIM (Data Center Infrastructure Management) tool should allow the user to consider various "what if" scenarios and see what the weak points are if certain critical operations fail (or are taken out of service for scheduled maintenance). The tool helps managers be better prognosticators by providing sufficient system information to help optimize current operations, avoid critical system failure, and plan out what to do for specific types of system downtime. This added operational visibility allows data center managers to safely push workloads, space, power, cooling and network to their limits. This is the second way: resiliency can be improved by using DCIM software with "what if" capabilities.

Best DCIM Solution Features

A solid foundational DCIM solution is imperative for data centers pushing mission-critical facility capacity limits. With DCIM's real-time data on server capacity, physical space and temperature (hot and cold aisles), data center operators are better equipped to avert service disruptions. Why? Because real-time server capacity information enables data center operators to plan workload shifts to lesser-used equipment. In addition, DCIM software provides a holistic view of data center operations, which can be used for proactive capacity planning and to help avoid or delay costly physical plant expansion. It's important to note that simply gathering information about a data center is not sufficient.
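
As a reference point for the "six nines" availability figure mentioned earlier, the arithmetic can be sketched as follows. The function name is made up for illustration, and a 365-day year is assumed.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 (leap years ignored)


def downtime_budget_seconds(nines: int) -> float:
    """Downtime per year allowed by an availability of `nines` nines.

    For example, nines=6 means 99.9999% availability, so the allowed
    unavailability is one part in 10**6 of the year.
    """
    return SECONDS_PER_YEAR / 10 ** nines


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(f"{n} nines: {downtime_budget_seconds(n):.3f} s per year")
```

Six nines works out to roughly 31.5 seconds of downtime per year, compared with about 8.76 hours for three nines, which is why scheduled maintenance has to be planned without taking the load down.
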
Using a DCIM solution's robust analytics and easy-to-read reports and dashboards, and undertaking "what-if" analysis, can empower operators to make swift, intelligent business decisions. The more information at hand, the faster managers can make configuration changes. In addition to helping with resiliency, an ideal DCIM solution provides:

Temperature monitoring

With DCIM providing real-time information, data center operators can safely raise the overall operating temperature, saving on cooling costs, and quickly shift workload from higher-temperature to over-cooled areas of the data center. Cooling costs vary with electricity rates as well as weather conditions. This means shifting a workload to less expensive facilities and equipment can be expedited when managers have a single-pane-of-glass view of the entire facility, or of the entire portfolio of facilities.

Risk management

The closer physical servers are operated to their temperature limits, the more important alarming and alerting become. A real-time view of capacity limits is essential, as is the ability to know that equipment is nearing a limit while there is still enough time to correct the problem before it becomes critical.

Scalability

A DCIM solution allows operators to sustain greater workloads on existing equipment and refrain from purchasing unnecessary equipment to support expansion.

Normalizing data from various sources

A data center runs on a wide variety of equipment, on both the IT and facilities sides of the house. This challenge is magnified with greater numbers of facilities. The ability to have information from all those machines in one place (normalized, easy to read and actionable) helps ensure uptime 24x7x365.

Trending

Real-time information is only half the story. The other half is trending over time, which helps identify peak demand and determine when computing power might be less expensive because of lower demand.
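
The idea of trending for peak demand can be illustrated with a minimal sketch. It assumes, purely for illustration, that historical power readings are available as (hour of day, kW) pairs; the function names and sample values are hypothetical and not taken from any DCIM product.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def hourly_profile(samples: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Average the readings for each hour of the day.

    `samples` holds (hour_of_day, kw) pairs collected over many days.
    """
    by_hour = defaultdict(list)
    for hour, kw in samples:
        by_hour[hour].append(kw)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


def peak_and_trough(profile: Dict[int, float]) -> Tuple[int, int]:
    """Return the hours with the highest and the lowest average demand."""
    peak = max(profile, key=profile.get)
    trough = min(profile, key=profile.get)
    return peak, trough


if __name__ == "__main__":
    # Made-up readings from two days at three times of day.
    readings = [(2, 310.0), (14, 480.0), (20, 400.0),
                (2, 290.0), (14, 520.0)]
    profile = hourly_profile(readings)
    print(profile, peak_and_trough(profile))
```

The trough hour is only a candidate window for deferrable work. Whether power is actually cheaper then depends on the electricity tariff, which this sketch does not model.
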
Trending also provides efficiency metrics when changes are made to the configuration or when new energy-efficiency measures have been taken.

In summary, resiliency metrics provide data center operators with valuable information relating to system vulnerability and time to system failure, while helping to identify where the power load would go in the event of a man-made or natural disaster (or during scheduled maintenance).

About the author: Sev Onyshkevych is Chief Marketing Officer for FieldView Solutions, a leading developer of Data Center Infrastructure Management (DCIM) software, and "Big Data" analytics for optimizing the energy, thermal, and operational efficiency of the world's leading data centers.
