Data Center Journal

VOLUME 56 | AUGUST 2018

Issue link: http://cp.revolio.com/i/1012043

  o IPs
  o Server names
  o Port communication for network design
  o Core-service dependencies
  o Replication method
• Playbook: This document provides sequenced tasks delineating the play-by-play efforts required for resources to validate and test failover.
• Runbook: This document will serve as a guide for people and processes in the event of a failover. It should encompass a detailed communication plan between critical groups, as well as a list of failover procedures and processes to get each application back up and running in a timely manner.
• Business-impact analysis: This document establishes and provides the acceptable amount of time during which the application will be unavailable (RTO) and the amount of data lost (RPO).

PHASE 3: TEST

Failover is the ability to switch application processing from a primary to a secondary location. Failback is the process of shifting back to the primary location after a failover.

Both capabilities require testing to ensure application continuity should a complete data center outage occur. Thus, creating and performing pilot simulations with a practice playbook of these steps helps vet the processes and ensure they work as designed. Testing will determine whether a system is capable of handling additional CPUs or servers during critical failures.

Organizations should test failover procedures and functions for each application. Successful testing avoids taking down the production site, preventing any operational disruption or degradation of normal business activity. If any of these problems occur, troubleshooting of failover procedures and functions is necessary.

PHASE 4: TABLETOP

The final phase involves conducting a tabletop or mock disaster to educate and validate the appropriate resources for readiness and preparation for a failover. The first step involves prioritizing the sequence in which the applications will be restored.
Next, all necessary resources and command-center requirements should be defined. A tabletop DR scenario should be performed every 6–12 months. It will help the enterprise's entire IT team become familiar with accessing the required documentation and using the documented processes. It will also allow the team to find errors, gaps and needed updates in documents, as well as ensure that the enterprise's DR strategy reflects the most current version of applications.

Regularly repeating this final phase helps organizations avoid the three common factors that may cause a disaster-recovery strategy to fail: a lack of testing, an absence of updated documentation and outdated DR-scenario training.

About the Author: Kevin Torf is a managing partner at T2 Tech Group.

CONCLUSIONS

Having an up-to-date DR strategy is essential for an enterprise's long-term welfare. The inability to effectively manage an unexpected outage can threaten an organization's survival; it has already harmed the competitive position of too many in today's fast-moving marketplace (especially financial services). Also, keeping the proficiencies of the people, processes and technologies current is crucial to ensuring optimal operational efficiency and success. Doing so entails regularly accounting for any new infrastructure components, applications or upgrades. Moreover, providing up-to-date training on any new technology is essential. Any developments within the organizational structure, such as mergers, acquisitions, and new regional or global initiatives, must be quickly addressed and incorporated into the DR strategy. Development of DR plans isn't a one-time exercise, but an ongoing commitment of due diligence by the organization and its IT department.
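
As a rough illustration of two ideas discussed above, the Phase 4 step of prioritizing the restore sequence and the RTO/RPO targets set by the business-impact analysis, the following Python sketch may help. It orders applications so that each one is restored only after the core services it depends on, breaking ties by a business priority, and then compares one failover test's measured downtime and data loss against the targets. The application names, priorities, dependency map and all numbers are hypothetical, and the article does not prescribe any particular algorithm; a topological sort is simply one reasonable way to respect core-service dependencies.

```python
import heapq
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class App:
    name: str
    priority: int        # hypothetical business priority; 1 = most urgent
    rto_minutes: float   # acceptable downtime (RTO)
    rpo_minutes: float   # acceptable data loss, in minutes of data (RPO)
    depends_on: list[str] = field(default_factory=list)


def restore_order(apps: list[App]) -> list[str]:
    """Order apps so dependencies come first; among ready apps, lowest priority number wins."""
    by_name = {a.name: a for a in apps}
    for a in apps:
        unknown = [d for d in a.depends_on if d not in by_name]
        if unknown:
            raise ValueError(f"{a.name} depends on undocumented apps: {unknown}")

    # Keys are apps, values are the apps that must be restored before them.
    sorter = TopologicalSorter({a.name: a.depends_on for a in apps})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies

    order: list[str] = []
    ready: list[tuple[int, str]] = []
    while sorter.is_active():
        for name in sorter.get_ready():
            heapq.heappush(ready, (by_name[name].priority, name))
        _, name = heapq.heappop(ready)
        order.append(name)
        sorter.done(name)  # may unlock apps that depend on this one
    return order


def meets_targets(app: App, downtime_minutes: float, data_loss_minutes: float) -> dict[str, bool]:
    """Compare one failover test's measurements against the app's RTO and RPO."""
    return {
        "rto_met": downtime_minutes <= app.rto_minutes,
        "rpo_met": data_loss_minutes <= app.rpo_minutes,
    }


# Hypothetical inventory, for illustration only.
apps = [
    App("intranet-wiki", priority=3, rto_minutes=480, rpo_minutes=240, depends_on=["dns"]),
    App("billing", priority=1, rto_minutes=60, rpo_minutes=15, depends_on=["database"]),
    App("database", priority=2, rto_minutes=30, rpo_minutes=5, depends_on=["dns"]),
    App("dns", priority=1, rto_minutes=15, rpo_minutes=0),
]

print(restore_order(apps))
# ['dns', 'database', 'billing', 'intranet-wiki']

billing = apps[1]
print(meets_targets(billing, downtime_minutes=45, data_loss_minutes=20))
# {'rto_met': True, 'rpo_met': False}
```

In this made-up run, billing comes back within its RTO but loses more data than its RPO allows, which is the kind of gap a failover test or tabletop exercise is meant to surface before a real outage.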
