10 APM Capabilities Every IT Manager Should Have
April 19, 2012
Irad Deutsch

One of the common questions that every IT manager asks on a regular basis is, “Why is my application so slow today when everything was fine yesterday?” Application Performance Management (APM) is the only way to truly answer that question, and it is one of the must-have tools for every IT manager.

With this APM imperative in mind, the following are 10 capabilities every IT manager should look for when choosing an APM solution:

1. Real-time monitoring

Real-time monitoring is a must. When digging into a problem, tracking events as they occur is far more effective than “post-mortem” analysis. Many APM vendors claim to provide real-time monitoring, but sometimes they really mean “near real-time,” typically with delays of 30 seconds to five minutes. This restricts your ability to analyze and react to events as they happen. Make sure real-time is truly real-time. Real-time monitoring should answer questions such as: Who is doing what, how many resources are being consumed, and who is affecting whom right now?

2. Rich data repository

Sometimes you get lucky and witness a problem in real-time. But in most cases, this doesn’t happen. This is why a good APM solution must be able to collect all transaction activity and performance metrics into a rich but lightweight repository that you can analyze after the fact.

3. “Single anomaly” granularity

Some APM vendors store the statistics they gather but aggregate them, either to save disk space or because they cannot process that much data in a reasonable amount of time. Analyzing performance incidents based on aggregated data is like assessing a book by reading only its back cover. You get the general idea, but you cannot understand what really happened. That’s why a good APM solution must give you all of the granular information, including individual transactions and their characteristics, resource consumption, and traffic order (chain of events).
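To illustrate why aggregation hides single anomalies, here is a minimal Python sketch. The response times, the 1,000 ms threshold, and the function names are hypothetical and chosen only for illustration; they do not come from any particular APM product.

```python
from statistics import mean

# Hypothetical response times (ms) for one transaction type over one minute:
# 59 normal transactions and a single 6-second outlier.
response_times_ms = [100] * 59 + [6000]

def aggregate(samples):
    """Collapse raw samples into a single per-interval average."""
    return mean(samples)

def find_anomalies(samples, threshold_ms):
    """Return (index, value) for every raw sample above the threshold."""
    return [(i, v) for i, v in enumerate(samples) if v > threshold_ms]

average = aggregate(response_times_ms)
anomalies = find_anomalies(response_times_ms, threshold_ms=1000)

print(f"Aggregated view: average {average:.1f} ms")
print(f"Granular view: anomalies {anomalies}")
```

The one-minute average stays under 200 ms and looks healthy, while the raw samples show exactly which transaction took six seconds.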

4. Measuring Quality of Service (QoS) and Service Level Agreements (SLAs)

APM solutions are designed to improve the end user experience. Improving user experience starts by measuring it and identifying QoS and SLA anomalies. Only then can you make informed decisions and take action. You should also have the ability to compare user experience before and after a change is applied to your systems.
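As a rough sketch of measuring user experience before and after a change, the Python snippet below computes the share of transactions that met a response-time target. The 250 ms target and the sample values are assumptions made up for this example; a real SLA would define its own metric and threshold.

```python
def sla_compliance(response_times_ms, sla_threshold_ms):
    """Fraction of transactions that completed within the SLA threshold."""
    if not response_times_ms:
        raise ValueError("no samples to evaluate")
    within = sum(1 for t in response_times_ms if t <= sla_threshold_ms)
    return within / len(response_times_ms)

# Hypothetical samples collected before and after a system change.
before_change = [120, 180, 450, 90, 700, 210, 160, 130]
after_change = [110, 150, 190, 85, 240, 170, 140, 120]

before_pct = sla_compliance(before_change, sla_threshold_ms=250) * 100
after_pct = sla_compliance(after_change, sla_threshold_ms=250) * 100

print(f"SLA compliance before: {before_pct:.0f}%, after: {after_pct:.0f}%")
```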

5. Performance proactivity – enforcing QoS and SLA

Some APM solutions enable users to analyze performance data and identify root problems retroactively, but do nothing to enable real-time resolution of performance issues. Because these solutions are fundamentally passive, you have no choice but to wait for application performance to nosedive before corrective action can be taken. In these cases, the wait from issue identification to resolution can be hours or even days. QoS problems can be avoided only if you take proactive steps. A proactive APM solution can turn this: “I got a text message at 2:00 AM from our APM tool indicating that we had a QoS problem, so I logged into the system and solved it,” into: “I got a text message at 8:00 AM from our APM tool letting me know that at 1:50 AM a QoS problem was about to occur and it took care of it automatically.” Being proactive can be achieved in many ways, such as activating automatic scripts, managing system resources, and triggering third-party tools.

6. Detecting bottlenecks and root cause analysis

If an APM tool only notifies you that you ran out of system resources because of job X, then you don’t really have root cause analysis capabilities. Root cause analysis is when your APM tool tells you that this job usually runs at 8:00 PM, but because of a problem on a secondary system it started one hour late and collided with another job that was scheduled to run at that later time. APM tools must do the hard work of correlating many little pieces of data so that you can get to the source of the problem. Otherwise, you will find yourself trying to assemble a 1,000-piece puzzle while your CEO knocks on your door every five minutes looking for answers.

7. Chain reaction analysis

Analyzing a problem can take many shapes. The conventional way is by digging into the top-10 hit lists. But those top-10 lists always miss something: the chain of events. Who came first, who came after, “it was all fine until this transaction came in”, etc. Analyzing the chain of events before the system crashed is crucial if you wish to avoid this problem in the future. An APM tool should give you the ability to travel back in time and look into the granular metrics second by second, as if you were watching a movie in slow motion. This is possible only if the APM tool collects data at a very high level of granularity and does not lose it over time (i.e., it retains the raw collected metrics).
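The idea of replaying the chain of events can be sketched in a few lines of Python, assuming the tool has kept raw, timestamped events. The event tuples, transaction names, and the `events_before` helper are hypothetical and serve only to show the principle.

```python
from datetime import datetime, timedelta

def events_before(events, incident_time, window_seconds):
    """Return events in the window leading up to the incident, oldest first."""
    start = incident_time - timedelta(seconds=window_seconds)
    in_window = (e for e in events if start <= e[0] <= incident_time)
    return sorted(in_window, key=lambda e: e[0])

# Hypothetical raw event log: (timestamp, transaction, cpu_percent).
t0 = datetime(2012, 4, 19, 20, 0, 0)
events = [
    (t0 + timedelta(seconds=3), "report_job", 95),
    (t0 + timedelta(seconds=1), "login", 20),
    (t0 + timedelta(seconds=2), "order_entry", 25),
    (t0 - timedelta(seconds=30), "backup", 40),
]

incident_time = t0 + timedelta(seconds=3)
chain = events_before(events, incident_time, window_seconds=5)

# Replay the last five seconds in order, one event at a time.
for timestamp, transaction, cpu_percent in chain:
    print(timestamp.time(), transaction, f"{cpu_percent}% CPU")
```

A top-10 list would only report that `report_job` used the most CPU; the ordered replay also shows that everything was fine until it arrived.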

8. Performance comparisons

There are two main performance troubleshooting approaches that an APM tool should support: performance drill-downs into a specific period of time, and performance comparisons. If you have a performance problem now, but all was fine yesterday, you must assume that something has changed. Hunting for those changes will lead you to the root cause much more quickly than a conventional drill-down into the current problem's performance metrics. You should be able to answer questions like these in seconds: “Is this new storage system I just implemented faster than the old one we had?” and “Why is it working very well in QA but not in production?” If your APM tool collects and stores raw performance metrics, comparing those metrics lets you answer these questions easily and dramatically shorten your mean time to recovery.
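A comparison of this kind can be sketched as follows, assuming raw metrics are available for both periods. The metric names, sample values, and the 20% cutoff are assumptions for illustration only.

```python
from statistics import mean

def compare_periods(baseline, current, min_change_pct=20.0):
    """Return metrics whose average changed by at least min_change_pct,
    as {metric: percent change}, largest change first."""
    changes = {}
    for name in baseline.keys() & current.keys():
        base_avg = mean(baseline[name])
        cur_avg = mean(current[name])
        if base_avg == 0:
            continue  # percent change is undefined for a zero baseline
        change_pct = (cur_avg - base_avg) / base_avg * 100
        if abs(change_pct) >= min_change_pct:
            changes[name] = round(change_pct, 1)
    return dict(sorted(changes.items(), key=lambda kv: abs(kv[1]), reverse=True))

# Hypothetical raw metrics for the same hour yesterday and today.
yesterday = {"cpu_pct": [40, 42, 38], "disk_read_ms": [5, 6, 4], "active_sessions": [100, 105, 95]}
today = {"cpu_pct": [41, 43, 39], "disk_read_ms": [20, 22, 18], "active_sessions": [102, 104, 100]}

changed = compare_periods(yesterday, today)
print(changed)
```

Here CPU and session counts barely moved, so the comparison points straight at disk read latency as the thing that changed.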

9. Business Intelligence-like dashboard

When an APM tool stores millions of pieces of raw (and aggregated) data, it should also deliver a convenient way to slice and dice this data. Some APM tools decide for you how to process this data by providing a predefined set of graph and report templates. A good APM tool lets you decide how you want to slice and dice the data by giving you a flexible, easy-to-use BI-like dashboard where you can drag and drop dimensions and drill down by double-clicking in order to answer questions like, “Which user consumed most of my CPU, and which of the programs he/she used had the most impact?”

10. Chargeback capability

Bad performance usually starts with bad design or bad coding and very rarely stems from hardware faults. If a developer writes a poor piece of code, the IT division needs to spend more money on hardware or software licenses to deal with it. This is why it’s becoming popular in many organizations to turn this dynamic upside down: annual budgets are distributed among the application development divisions, which use the money to buy IT services from their IT division. If they write poor code, they ultimately need to pay more. This is workable only if the IT department has an APM tool that can measure and enforce resource usage by “tenant”. This approach has proven effective in helping companies reduce their IT budgets quite significantly.
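The arithmetic behind a simple chargeback model can be sketched in Python as below. The tenant names, the usage records, and the rate of 0.01 per CPU-second are all hypothetical; real chargeback schemes typically account for more than one resource.

```python
from collections import defaultdict

def chargeback(usage_records, rate_per_cpu_second):
    """Sum CPU seconds per tenant and convert the totals to a cost."""
    totals = defaultdict(float)
    for tenant, cpu_seconds in usage_records:
        totals[tenant] += cpu_seconds
    return {tenant: round(secs * rate_per_cpu_second, 2)
            for tenant, secs in totals.items()}

# Hypothetical per-tenant usage records: (tenant, cpu_seconds).
records = [("billing_team", 1200), ("web_team", 300), ("billing_team", 800)]

costs = chargeback(records, rate_per_cpu_second=0.01)
print(costs)
```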

ABOUT Irad Deutsch

Irad Deutsch is a CTO at Veracity group, an international software infrastructure integrator. Irad is also the CTO of MORE IT Resources - MoreVRP, a provider of application and database performance optimization solutions.

Related Links:

www.morevrp.com
