Automated Analytics: The Third-Dimension of Application Performance Problem Solving
September 05, 2013
Jason Meserve

It doesn’t seem all that long ago that one would arrive at the office in the morning, find that the email system or web site was down and call IT to let them know. Sadly, that call would be the first indication IT had that it should check whether the reported system was indeed down.

That scenario is the first level of application performance analytics. It isn’t very proactive or smart and can lead to a lot of frustrated users. In 2013, if the first notice of an outage is coming from an employee or, worse, a customer, then IT needs to seriously investigate a new solution for alerting on problems. With the competition a click away and razor-thin margins, businesses today can’t afford slowdowns and outages, never mind ones that require an end user to report them.

This is why Application Performance Management (APM) systems were developed: to give IT a way of easily seeing problem spots in complex applications and drilling down into the varied layers of the application to find the root cause. The majority of today’s APM solutions accomplish this by setting thresholds and baselines (automatically or manually) and alerting when those lines in the sand are approached or crossed. This approach is great for alerting on extreme behavior and lighting up the red, yellow and green lights on an IT operator’s dashboard.
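As a rough illustration of threshold-and-baseline alerting, here is a minimal Python sketch. It is not taken from any particular APM product: the function names, the warn/critical values, the three-standard-deviation band and the sample response times are all assumptions made for the example.

```python
from statistics import mean, stdev

def traffic_light(value, warn, crit):
    """Map a metric value to a dashboard color using fixed thresholds."""
    if value >= crit:
        return "red"
    if value >= warn:
        return "yellow"
    return "green"

def baseline_band(history, k=3.0):
    """Derive a simple baseline band (mean +/- k standard deviations)."""
    mu, sigma = mean(history), stdev(history)
    return mu - k * sigma, mu + k * sigma

# Hypothetical response times in milliseconds for one component.
history = [120, 130, 125, 118, 135, 128, 122, 131]
low, high = baseline_band(history)

print(traffic_light(450, warn=300, crit=500))  # yellow
print(low <= 127 <= high)  # True: inside the learned band
print(low <= 400 <= high)  # False: outside the learned band
```

Both checks look at one metric on one component at a time, which is why they catch extreme values well but say nothing about how components behave together.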

Dashboards are important to Operations. If you’re responsible for a complex system, it helps to watch for extreme measurements on each component. In practice, however, although managing the components for extreme behavior helps, this never proves to be sufficient in keeping the system healthy or in restoring health to the system when it degrades or fails. Components interact with other components. Those interactions can be very important to the overall system, even when no extreme behavior is evident on any one component.

Consider an analogy. If a sick patient seeks care from three different specialists (each responsible for the health of one component of the system) and each specialist prescribes medication without considering the actions of the other specialists, then the interaction of the drugs can cause serious harm to the patient (i.e., the system) even though no single drug is prescribed in excess or would cause any ill effects alone.

In a similar manner, management of IT components in isolation, without consideration of the IT system as a whole and the interactions between all the components, is known to result in poor overall performance, more outages, and slower recovery times.

Let’s focus on an important fact: It’s very expensive to have an outage. “The most recent Enterprise Management Associates (EMA) research finds that for 25% of companies surveyed, an hour of downtime costs the business between $100,000 and $500,000. Another 29% report the cost of downtime to be between $75,000 and $100,000,” according to research published by EMA. And that’s just the bottom-line cost. What about customer loyalty and brand reputation? Damage those too badly and the company may never recover.

A Third Wave of Analytics

There’s a new, third wave of smarter, more sophisticated analytics hitting the APM market; these solutions are designed to help shorten the duration of outages and possibly prevent them by giving application operators earlier warnings of problems brewing beneath the surface. A recent APM Digest Q&A with Netuitive’s Nicola Sanna touched on the importance of having machine-driven analytics.

Today’s advanced analytical engines allow the IT practitioner to rise above the level of component management and practice a more efficient and effective form of systems management. Such an engine does not require thresholding, baselining or configuring for any specific application. Instead, the engine consumes raw data and then learns metric, component, and system behavioral patterns on its own. This means the engine learns from observation the difference between normal and abnormal behavior, not at the metric level, not at the component level, but at the systems level.

Sophisticated analytic engines use multivariate anomaly detection to find intervals of time when groups of metrics or application components are interacting with each other in a manner not consistent with their historical patterns. Visualization and analysis of the patterns from such groups of metrics during an abnormal interval reveal where impactful change occurred across multiple components, when the change occurred and the scope of the impact across those components. This provides a new type of insight not revealed by the other types of APM analysis. In most cases it can reveal root causes, or at least clues about root causes, including relationships the application operator would not otherwise have known.
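To make the idea concrete, here is a minimal Python sketch of one well-known multivariate technique, the Mahalanobis distance, applied to two metrics. It is not the method used by any particular analytics engine. The two metrics (requests per second and CPU percent), the sample history and the cutoff are hypothetical, and the cutoff of 9.21 is the 99% point of a chi-square distribution with two degrees of freedom, which assumes roughly Gaussian data.

```python
from statistics import mean

def fit(history):
    """Learn the means and 2x2 covariance of two metrics from (x, y) samples."""
    xs = [x for x, _ in history]
    ys = [y for _, y in history]
    mx, my = mean(xs), mean(ys)
    n = len(history)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in history) / (n - 1)
    return mx, my, sxx, syy, sxy

def mahalanobis_sq(point, model):
    """Squared Mahalanobis distance of a point from the learned joint pattern."""
    mx, my, sxx, syy, sxy = model
    dx, dy = point[0] - mx, point[1] - my
    det = sxx * syy - sxy ** 2  # determinant of the covariance matrix
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def z_scores(point, model):
    """Per-metric distance from the mean, in standard deviations."""
    mx, my, sxx, syy, _ = model
    return abs(point[0] - mx) / sxx ** 0.5, abs(point[1] - my) / syy ** 0.5

# Hypothetical history: CPU percent rises with requests per second.
noise = [1, -1, 2, -2, 0, 1, -1, 2, -2]
history = [(r, 0.15 * r + e) for r, e in zip(range(100, 501, 50), noise)]
model = fit(history)
CUTOFF = 9.21  # assumed cutoff, see the note above

for sample in [(300, 46), (150, 65)]:
    zx, zy = z_scores(sample, model)
    per_metric_ok = zx < 3 and zy < 3
    jointly_abnormal = mahalanobis_sq(sample, model) > CUTOFF
    print(sample, per_metric_ok, jointly_abnormal)
```

In this made-up data the second sample sits about one standard deviation from the mean on each metric, so per-metric checks pass, yet high CPU at low request volume breaks the learned relationship and the joint check flags it.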

This achievement of systems management over component management does not work if configuration is required. Neither the operator nor the administrator can be expected to know in advance the interactions which occur in a complex system. They cannot possibly construct rules, thresholds, and dashboards sufficient for capturing relationships they don’t even know about. Nor could they possibly maintain proper configuration over time as change occurs throughout the system. Fortunately, analytics technology has advanced to the point that zero-configuration monitoring and analysis systems are feasible.

Having automated analytics built right into the APM workflow can help application operators discover the source of problems in complex applications more quickly as they do not have to switch between various systems when problems arise. Making cutting-edge analytics part of the everyday APM environment can make IT operators more efficient, helping to reduce the time associated with outages and slowdowns.

This type of analysis harnesses the Big Data created by APM systems and delivers value. As APM monitors collect performance data from thousands of nodes every 15 seconds, the number of metrics being processed by an APM system quickly adds up. This data is already used for extreme alerting via thresholds that color traffic lights on dashboards, flow maps, and Top-N views. Now it’s possible to augment this component-centric, extreme-behavior-centric approach with machine-driven analytics that enable systems management by mining big data for potential problems, making those millions (or, in some cases, billions) of metrics even more valuable.
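A quick back-of-the-envelope calculation shows how fast the volume grows. The 15-second interval comes from the text above; the node count and the number of metrics per node are hypothetical figures chosen only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def samples_per_day(nodes, metrics_per_node, interval_s=15):
    """Count metric samples collected per day at a fixed polling interval."""
    return nodes * metrics_per_node * (SECONDS_PER_DAY // interval_s)

# Hypothetical environment: 5,000 nodes reporting 100 metrics each.
print(samples_per_day(5000, 100))  # 2880000000
```

Under those assumptions a single day produces close to three billion data points, far more than anyone could review by hand.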

With IT staffs spread thin, growing application complexity and increased user demand and expectations, application owners and operators need every insight possible into the performance of critical systems. Add advanced, automated analytics, the must-have next step in delivering that insight, to complement your existing alerts and give your team that critical edge they need to deliver business service reliability.

ABOUT Jason Meserve

Jason Meserve has been working in high-tech for over 15 years, and is currently a Product Marketing Manager at CA Technologies, where he focuses on Service Assurance solutions such as Application Performance Management. He built his tech resume in the 10 years he spent as a journalist at Network World, where he created everything from articles and features to blogs, videos and podcasts. Meserve has also held marketing and editorial positions at Constant Contact and Application Development Trends.

Related Links:

www.ca.com/apm

Q&A Part One: Netuitive's Nicola Sanna Talks About Aligning IT with the Business

Enterprise Management Associates Report: The Top-line and the Bottom-line Impact of Application Performance Challenges
