Data Center Journal

VOLUME 46 | OCTOBER 2016

Issue link: https://cp.revolio.com/i/734838

Significant data center power outages are more common than most of us think; some of them simply fail to make the headlines. Power outages, no matter how short, can be costly and cause irreversible damage to an organization's brand, reputation and customer loyalty. In addition to reputation and customer loss, the financial costs often amount to tens of thousands of dollars.

How much lost revenue is at stake? In 2013, for instance, Google's five-minute power outage cost the company $500,000 and, according to GoSquared, led to a 40% drop in worldwide Internet traffic.

So how can you take an active approach to preventing power failure in your data center? Consider these six steps to assessing power-chain risk:

1. Is Your Physical IT Infrastructure Mapped to Your Power Chain?

The first step in assessing your power-chain risk is to determine what devices actually make up your power chain, along with their location and life-cycle status. Then determine whether they are still under warranty. If not, consider retiring them, as it becomes more expensive to continue operating older devices than to replace them. Find out the last time each asset was serviced and by whom.

Determine what device is connected to what, as well as their respective dependencies. Doing so will help determine what might happen if a device fails or is taken offline when you need to make changes or work on a specific piece of equipment. Can you generate a specific report on demand? Not in a spreadsheet or a product request, but a report that will be clear and actionable to the person requesting it.

2. Do You Have a Single Pane of Glass Across All Data Centers and Your Data Rooms?

Once you determine what's in your power chain, you must then fully understand what the power chain is doing, and how and where power is being used. If yours is like most data centers, you already have some sort of monitoring system.
You may already be monitoring the facilities side of the infrastructure with a building-management system (BMS) or HVAC system. Most probably these systems are siloed, which only goes so far: you need transparency through a single pane of glass across your entire organization. It should include the following:

• All data centers
• Multiple BMS systems
• Mixed vendor/hardware
• Facilities equipment

Avoid becoming locked into one vendor by selecting a data center infrastructure management (DCIM) software solution that can interoperate with hardware and software from many different vendors.

Data center professionals occasionally question why they need a DCIM solution if they already have a BMS. It's important to remember that although a BMS is vital to a well-run data center, it doesn't address the IT devices and their energy needs and dependencies.

An important part of a current view of your data center is real-time monitoring and alarms across all these systems. Up-to-the-minute monitoring and alarms can alert a data center operator before something goes wrong, while there's still time to correct the problem and minimize the risks involved. A DCIM solution can also enable managers to see what areas might be over- or underutilized, so they can make changes to reduce risks and optimize energy use. Furthermore, it gives you the ability to see what's happening in your data center in real time.

3. Do You Have the Ability to Run Power-Failure Simulations?

Power-failure simulations allow data center managers to test their power chain's resilience, identify possible weak spots and determine where everything is connected. If you can test, in complete safety, what would happen in the event of a major power failure, or if a system or a piece of equipment failed, you can minimize risk by making changes to the power chain.
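To make the idea of a safe, model-based test concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular DCIM product works: the feed names, capacities, device names and loads are all hypothetical, and it assumes a device's load is split evenly across whichever of its feeds are still live.

```python
from dataclasses import dataclass


@dataclass
class Feed:
    name: str
    capacity_kw: float  # hypothetical rated capacity of the feed


@dataclass
class Device:
    name: str
    load_kw: float
    feeds: tuple  # names of the feeds this device is wired to


def simulate_feed_failure(feeds, devices, failed):
    """Return (devices that lose power, surviving feeds that are overloaded)."""
    live = {f.name: f for f in feeds if f.name not in failed}
    load = {name: 0.0 for name in live}
    dropped = []
    for dev in devices:
        survivors = [f for f in dev.feeds if f in live]
        if not survivors:
            # No redundant path left: the device goes dark.
            dropped.append(dev.name)
            continue
        # Assumption: load is shared evenly across the remaining feeds.
        share = dev.load_kw / len(survivors)
        for f in survivors:
            load[f] += share
    overloaded = sorted(n for n, kw in load.items() if kw > live[n].capacity_kw)
    return dropped, overloaded


# Hypothetical power chain: two feeds, two dual-corded racks, one single-corded switch.
FEEDS = [Feed("A", 10.0), Feed("B", 10.0)]
DEVICES = [
    Device("rack-1", 6.0, ("A", "B")),
    Device("rack-2", 6.0, ("A", "B")),
    Device("legacy-switch", 1.0, ("A",)),
]

if __name__ == "__main__":
    dropped, overloaded = simulate_feed_failure(FEEDS, DEVICES, {"A"})
    print("Devices without power:", dropped)
    print("Overloaded feeds:", overloaded)
```

In this made-up layout, failing feed A both drops the single-corded switch and pushes 12 kW onto a 10 kW feed B, which is exactly the kind of hidden cascade a simulation is meant to surface before it happens in production.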
If, for example, you can test in a model what would happen if the A side fails or is switched off, you could determine whether the B side can handle the load or will fail as well, potentially causing a disastrous cascading outage. You can identify systems without power redundancy. If you have everything mapped, you'll be able to determine what's mission critical and what isn't.

If you already run these "what if" power-failure simulations, all the better. When was the last time you conducted such a test? When is your next one planned? Are you confident enough to push that button and simulate a power failure in a production data center? If you document everything, you can also run reports and build a recovery plan.

4. Are All Incidents in Your Data Center Being Captured in an ITSM Service Desk?

It's important to keep track of anything that goes wrong so you can determine why an incident happened and avoid it in the future. Doing so requires full integration between the data center and the IT service management (ITSM) service desk, plus facilities, to document when a problem exists and make changes to improve uptime. You must determine the following:

• Is the number of incidents increasing or decreasing?
• When was your last power failure?
• Is there a discernible pattern?
• Are you trending toward a critical situation?

Also make sure that changes are synchronized in both directions: ITSM to data center and data center to ITSM.

5. Are You Using Trend Analysis to Identify Potential Risks of Failure Before They Happen?

Looking back in order to look forward means carefully monitoring and documenting what data center capacity is in use and when. This approach will help detect trends and patterns that occur over time to help plan for future capacity needs. You can predict what capacity you might need, look for fluctuations in demand and become active instead of reactive. Conducting all these operations will enable you to deliver better services.
But this approach depends on breaking down the information silos in your data center to gain visibility into all affected systems and make informed decisions. Complete real-time monitoring enables phase balancing, the ability to see what might break and the ability to plan for inevitable change instead of reacting to it.
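The trend analysis described in this step can be illustrated with a short Python sketch. It assumes hypothetical monthly peak-load readings in kilowatts and a hypothetical capacity limit, fits a simple least-squares line to the readings and estimates how many months remain before that line reaches the limit. A straight-line fit is only one of many possible forecasting choices and is used here purely for illustration.

```python
def months_until_limit(monthly_peak_kw, limit_kw):
    """Estimate months after the last reading until a linear trend hits limit_kw.

    Returns None when the fitted trend is flat or falling.
    """
    n = len(monthly_peak_kw)
    if n < 2:
        raise ValueError("need at least two readings to fit a trend")
    xs = range(n)  # month index: 0, 1, 2, ...
    mean_x = sum(xs) / n
    mean_y = sum(monthly_peak_kw) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, monthly_peak_kw))
    slope = sxy / sxx  # fitted growth in kW per month
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing = (limit_kw - intercept) / slope  # month index where the line meets the limit
    return max(0.0, crossing - (n - 1))


if __name__ == "__main__":
    # Hypothetical readings: peak load growing by about 2 kW per month.
    readings = [60, 62, 64, 66, 68, 70]
    print(months_until_limit(readings, limit_kw=80))
```

With documented history like this, a capacity question becomes a planning date rather than a surprise, which is the shift from reactive to active that the step describes.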
