How to Detect (and Resolve) IT Ops/APM Issues Before Your Users Do
September 19, 2014

Kevin Conklin
Prelert

Among the most embarrassing situations for application support teams is first hearing about a critical performance issue from their users. With technology getting increasingly complex and IT environments changing almost overnight, the reality is that even the most experienced support teams are bound to miss a major problem with a critical application or service. One of the contributing factors is their continued reliance on traditional monitoring approaches.

Traditional tools limit us to monitoring for a combination of key performance indicator thresholds and failure modes that have already been experienced. So when it comes to finding new problems, the best case is an alert that describes the symptom (slow response time, failed transactions, etc.) rather than the cause. A very experienced IT professional will have seen many behaviors, and consequently can employ monitoring based on best practices and past experiences. But even the most experienced IT professional will have a hard time designing rules and thresholds that can monitor for new, unknown problems without generating a stream of noisy false alerts. Anomaly detection goes beyond the limits of traditional approaches because it learns from all of the data provided, whether a given behavior has happened before or not.

Anomaly detection works by identifying unusual behaviors in data generated by an application or service delivery environment. The technology uses machine learning and predictive analytics to establish baselines in the data and automatically learn what normal behavior is. The technology then identifies deviations in behavior that are unusually severe or may be causal to other anomalies, which is a clear indication that something is wrong. And the best part? This technology works in real time as well as in troubleshooting mode, so it's proactively monitoring your IT environment. With this approach, real problems can be identified and acted upon faster than before.
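To make the baseline idea concrete, here is a minimal sketch in Python. It is not how any particular product works: it assumes a single metric, uses a trailing window as the "learned" baseline, and flags a point when it sits more than a chosen number of standard deviations away. The function name `find_anomalies`, the `window` and `threshold` parameters, and the sample response times are all hypothetical.

```python
from statistics import mean, stdev

def find_anomalies(values, window=20, threshold=3.0):
    """Return (index, value, z_score) for points far from the trailing baseline."""
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]      # the most recent "normal" behavior
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            continue                         # flat baseline: no scale to compare against
        z = (values[i] - mu) / sigma         # deviation in units of standard deviation
        if abs(z) >= threshold:
            anomalies.append((i, values[i], z))
    return anomalies

# Hypothetical response times in milliseconds with one injected spike.
response_ms = [100 + (i % 5) for i in range(40)]
response_ms[30] = 400

flagged = find_anomalies(response_ms)
for index, value, z in flagged:
    print(f"index={index} value={value} z={z:.1f}")
```

Note that no fixed threshold on the metric itself was configured. The limit adapts to whatever the recent data looks like, which is the essential difference from a static rule such as "alert above 250 ms".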

More advanced anomaly detection technologies can run multiple analyses in parallel and analyze multiple data sources simultaneously, identifying related anomalies across the system. Thus, when a chain of events causes a performance issue, the alert contains all of the related anomalies. This helps support teams zero in on the cause of the problem immediately.
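One simple way to picture how related anomalies end up in a single alert is to group them by time. The sketch below is an illustration only: it assumes anomalies have already been detected per data source, and it uses closeness in time as a crude stand-in for the relationship analysis a real product would perform. The `group_related` function, the `max_gap_seconds` parameter, and the event data are hypothetical.

```python
def group_related(anomalies, max_gap_seconds=60):
    """Group anomalies whose timestamps follow each other within max_gap_seconds."""
    groups = []
    for event in sorted(anomalies, key=lambda e: e["time"]):
        if groups and event["time"] - groups[-1][-1]["time"] <= max_gap_seconds:
            groups[-1].append(event)         # close to the previous anomaly: same incident
        else:
            groups.append([event])           # long quiet gap: start a new incident
    return groups

# Hypothetical anomalies from three data sources (times in seconds).
events = [
    {"time": 1050, "source": "web", "metric": "response_time"},
    {"time": 1000, "source": "database", "metric": "query_latency"},
    {"time": 5000, "source": "storage", "metric": "disk_usage"},
    {"time": 1020, "source": "app", "metric": "queue_depth"},
]

incidents = group_related(events)
for incident in incidents:
    print(" -> ".join(f'{e["source"]}:{e["metric"]}' for e in incident))
```

Each printed line corresponds to one alert. The first groups the database, application and web anomalies in time order, which hints at where the chain of events started.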

Traditional approaches are also known to generate huge volumes of false alerts. Anomaly detection, on the other hand, uses advanced statistical analyses to minimize false alerts. Those few alerts that are generated provide more data, which results in faster troubleshooting.

Anomaly detection looks for significant variations from the norm and ranks severity by probability. Machine learning helps the system tell commonly occurring errors and routine spikes and drops in metrics apart from true anomalies, which are more accurate indicators of a problem. This can mean the difference between tens of thousands of alerts each day, most of which are false, and a dozen or so a week that should be pursued.
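Ranking by probability can be sketched as follows. This is a simplified illustration, not a vendor's algorithm: it assumes each metric is roughly normally distributed around its baseline, estimates how likely a value at least that extreme would be, suppresses anything that is not rare enough, and sorts the rest with the least probable first. The names `tail_probability`, `rank_by_severity` and `max_probability`, along with the sample data, are hypothetical.

```python
import math
from statistics import mean, stdev

def tail_probability(value, baseline):
    """Two-sided probability of a value at least this extreme, assuming normality."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return 1.0 if value == mu else 0.0
    z = abs(value - mu) / sigma
    return math.erfc(z / math.sqrt(2))

def rank_by_severity(observations, baselines, max_probability=0.001):
    """Keep only improbable observations, least probable (most severe) first."""
    scored = []
    for name, value in observations.items():
        p = tail_probability(value, baselines[name])
        if p <= max_probability:             # routine fluctuations are dropped here
            scored.append((name, value, p))
    return sorted(scored, key=lambda item: item[2])

# Hypothetical recent history and current readings for three metrics.
baselines = {
    "response_ms": [100, 102, 98, 101, 99, 100, 103, 97],
    "error_rate": [1, 2, 1, 3, 2, 1, 2, 2],
    "cpu_pct": [40, 42, 41, 39, 40, 43, 41, 40],
}
observations = {"response_ms": 112, "error_rate": 3, "cpu_pct": 47}

ranked = rank_by_severity(observations, baselines)
for name, value, p in ranked:
    print(f"{name}={value} probability={p:.2e}")
```

In this example the error rate of 3 is within its usual range and produces no alert, while the response time and CPU readings are both reported, with the less probable one listed first.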

Anomaly detection can identify the early signs of developing problems in massive volumes of data before they turn into major problems. Enabling IT teams to slash troubleshooting time and decrease the noise from false alarms empowers them to attack and resolve issues before they reach critical proportions.

If users do become aware of a problem, the IT team can respond "we're on it" instead of saying "thanks for letting us know."

Kevin Conklin is VP of Marketing at Prelert.
