Data Center Journal

VOLUME 47 | DECEMBER 2016

...correlated doesn't tell us which one caused the other to spike, however; correlation alone does not imply causation. We need to understand the cause/effect relationship between data sources.

The key to effective root-cause analysis lies in establishing cause/effect relationships between available data sources. It's critical to understand which data sources contain triggers that will affect the environment, the actual results of those triggers and how the environment responds to the changes.

This approach involves establishing basic relationships between collected data sources and correlating events, tickets, alerts and changes using cause/effect relationships. Examples include linking a change request to the actual changes in the environment, linking an APM alert to a specific environment and linking a log error to a particular web service. Because we're dealing with various levels of unstructured data, the linking process (or correlation) isn't obvious. This task is a perfect fit for machine learning, which can create general rules relating different data sources, determine how to link them to environments and decide when it makes sense to do so.

Machine learning can also build an environment dependency model based on environment topology, component dependencies and configuration dependencies. Such a model can be used to apply topology-based correlation, suppressing root-cause candidates that are unreachable from the environment in which the problem was reported. Alternatively, such a dependency diagram can be modeled as a probabilistic Bayesian network, which augments the model with probabilities of error propagation, defect spillover and influence. Building such a model by hand is practically impossible, as it requires specifying many probabilities of influence between environment components, even before addressing the constantly evolving environment structure.
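To make the topology-based correlation idea concrete, here is a minimal sketch in Python. The dependency graph, component names and suppression function are all hypothetical illustrations, not part of any particular ITOA product: the point is simply that once a topology model exists, candidates that cannot influence the affected environment can be filtered out by a reachability check.

```python
from collections import deque

def reachable(graph, start):
    """Return all components reachable from `start` by following
    dependency edges in the (hypothetical) topology graph."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

def suppress_candidates(graph, reported_in, candidates):
    """Keep only root-cause candidates that can actually influence
    the environment where the problem was reported."""
    influence_zone = reachable(graph, reported_in)
    return [c for c in candidates if c in influence_zone]

# Toy topology: edges point from a component to the components it depends on.
topology = {
    "web-app": ["app-server", "load-balancer"],
    "app-server": ["database", "config-service"],
    "load-balancer": [],
    "database": ["storage"],
}

print(suppress_candidates(topology, "web-app",
                          ["database", "dns", "storage"]))
# → ['database', 'storage']
```

In this toy run, "dns" is dropped because the web application does not depend on it, directly or transitively; a production model would of course be far larger and, as the article notes, learned rather than hand-specified.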
By using machine learning and vast amounts of data describing historical performance, however, it's possible to build a model that estimates all the required probabilities automatically and updates them on the fly.

Proactive, Not Reactive: Frequent-Pattern Mining, Classification and Forecasting

Incidents typically start with a change in the IT system, such as a configuration change, a change in request workload or a change in code. An incident causes particular components or services to fail, and the monitoring tools detect this failure, triggering an alert. At this point, the tier-one/tier-two team reacts to the incident, starts an investigation, proposes a fix and resolves the incident.

But if we can recognize a potentially risky change early, why wait for an incident? Preventive analytics can detect problems ahead of time, preventing them from turning into incidents. Machine learning powers preventive analytics with three approaches: frequent-pattern mining, classification and forecasting.

Frequent-pattern mining automatically crawls data to identify items that frequently appear together. The same algorithm powers retail-basket analysis, identifying which items are frequently bought together, which product customers are likely to purchase next and how to bundle products and services to maximize revenue. Frequent-pattern mining is well suited to the data routinely collected in day-to-day IT operations, enabling ITOA tools to understand how a change in one component will affect the performance of another.

Frequent-pattern mining can drive the classification approach, which identifies which components will be affected or what kind of impact a specific change will have. A change in firewall configuration might impair connectivity, for example, and a change in the number of visitors might reduce performance.

Forecasting can estimate the magnitude, severity and timing of a potential issue.
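The frequent-pattern idea can be sketched in a few lines. This toy example counts co-occurring pairs of operational events across time windows, the simplest form of the itemset counting that underlies algorithms like Apriori; the event names and windows are invented for illustration.

```python
from itertools import combinations
from collections import Counter

def frequent_pairs(transactions, min_support):
    """Count how often each pair of events co-occurs across operational
    'transactions' (e.g. all events observed in one time window) and
    keep the pairs seen at least `min_support` times."""
    counts = Counter()
    for events in transactions:
        for pair in combinations(sorted(set(events)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical event windows: each list holds the changes and alerts
# recorded in one time window.
windows = [
    ["firewall-config-change", "connectivity-alert", "cpu-spike"],
    ["firewall-config-change", "connectivity-alert"],
    ["code-deploy", "latency-alert"],
    ["firewall-config-change", "connectivity-alert", "code-deploy"],
]

print(frequent_pairs(windows, min_support=3))
# → {('connectivity-alert', 'firewall-config-change'): 3}
```

The surviving pair — firewall changes co-occurring with connectivity alerts — is exactly the kind of learned association the article describes: a signal that a particular change type tends to precede a particular failure mode.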
By looking into the past behavior of the system, the models can indicate how systems react to various changes in workload, configuration or infrastructure.

Conclusion

IT operations is ready to enter a third phase of growth. In the first phase, everything was done manually, an approach that doesn't scale. In the second phase, IT operations grew large enough to automate all the functions that could be automated, and it used detailed component instrumentation to ensure everything was running as it should. The infrastructure increased by orders of magnitude, as did the amount of collected data, alerts and complexity. At this stage, IT ops is ready for the next phase: turn learning algorithms loose on the collected data and let them extract the hidden insights.

IT operations analytics (ITOA) has a tremendous opportunity to apply machine-learning algorithms to existing problems, offering previously unavailable solutions to make sense of collected data, gain overall visibility into IT operations and facilitate self-learning analytics. Pedro Domingos, the author of The Master Algorithm, said, "The Industrial Revolution automated manual work and the information revolution did the same for mental work, but machine learning automates automation itself."

About the Author

Boštjan Kaluža, PhD, is chief data scientist at Evolven. He's done extensive research into artificial intelligence and intelligent systems, machine learning, predictive analytics and anomaly detection. Before Evolven, Boštjan served as a senior researcher in the Department of Intelligent Systems at the Jožef Stefan Institute and led research projects involving pattern and anomaly detection, machine learning and predictive analytics. Focusing on the detection of suspicious behavior and data analysis, Boštjan has published numerous articles in professional journals and delivered conference papers.
In 2013, he published his first book on data science, Instant Weka How-to, exploring how to employ machine learning using Weka. In 2016, Boštjan published his second book, Machine Learning in Java, exploring how to use important machine-learning Java libraries to solve various problems. Boštjan is also the author of and contributor to a number of patents in the areas of anomaly detection and pattern recognition.
