The Anatomy of APM – 4 Foundational Elements to a Successful Strategy
April 04, 2012

Larry Dragich
Auto Club Group


By embracing End-User-Experience (EUE) measurements as a key vehicle for demonstrating productivity, you build trust with your constituents in a very tangible way. The translation of IT metrics into business meaning (value) is what APM is all about.

The goal here is to simplify a complicated technology space by walking through a high-level view within each core element. I’m suggesting that the success factors in APM adoption center around the EUE and the integration touch points with the Incident Management process.

When looking at APM at 20,000 feet, four foundational elements come into view:

- Top Down Monitoring (RUM)
- Bottom Up Monitoring (Infrastructure)
- Incident Management Process (ITIL)
- Reporting (Metrics)


Top Down Monitoring

Top Down Monitoring, also referred to as Real-time Application Monitoring, focuses on the End-User-Experience. It has two components: passive and active. Passive monitoring is usually an agentless appliance that leverages network port mirroring. This low-risk implementation provides one of the highest values within APM in terms of application visibility for the business.

Active monitoring, on the other hand, consists of synthetic probes and web robots that report on system availability and on predefined business transactions. It is a good complement to passive monitoring, providing visibility into application health during off-peak hours when transaction volume is low.
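To make the active side concrete, here is a minimal sketch of a synthetic probe in Python. The function names and the idea of polling a list of URLs are illustrative assumptions, not features of any particular APM product; a real web robot would also replay full business transactions, not just single requests.

```python
# Minimal synthetic-probe sketch: check availability and response time
# for a set of predefined URLs (stand-ins for business transactions).
import time
import urllib.request
import urllib.error

def probe(url, timeout=5.0):
    """Fetch a URL; return (available, response_time_ms)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            ok = 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        ok = False
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return ok, elapsed_ms

def run_probes(urls):
    """One polling cycle over all predefined transactions."""
    return {url: probe(url) for url in urls}
```

Scheduled at a fixed interval (for example every few minutes), a loop like this keeps reporting on application health even when real-user transaction volume drops off.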

Bottom Up Monitoring

Bottom Up Monitoring, also referred to as Infrastructure Monitoring, usually ties into an operations manager tool that becomes the central collection point where event correlation happens. At a minimum, up/down monitoring should be in place for all nodes/servers within the environment. System automation is the key to the timeliness and accuracy of incidents created through the Trouble Ticket Interface.
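The up/down layer can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: the node list, the TCP-connect check as the definition of "up", and `create_incident()`, which in a real environment would be a call into the operations manager's trouble-ticket interface rather than a local dictionary.

```python
# Bottom-up up/down monitoring sketch: sweep a list of nodes and raise
# an incident for each one that fails a basic reachability check.
import socket

def is_up(host, port, timeout=3.0):
    """TCP connect check: can we reach the service port on this node?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def create_incident(host, port):
    """Hypothetical trouble-ticket hook (would call the real interface)."""
    return {"node": f"{host}:{port}", "severity": "critical", "state": "open"}

def sweep(nodes):
    """One monitoring sweep; open an incident for every down node."""
    return [create_incident(h, p) for h, p in nodes if not is_up(h, p)]
```

Running the sweep on a schedule and feeding its output straight into ticket creation is what gives the automation its timeliness; the accuracy depends on event correlation upstream so that one outage does not open a flood of duplicate incidents.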

Incident Management Process

The Incident Management Process as defined in ITIL is a foundational pillar supporting Application Performance Management (APM). In our case, the Incident Management, Problem Management, and Change Management processes had been established in the culture for a year before we began implementing the APM strategy.

A look at ITIL's Continual Service Improvement (CSI) model alongside the benefits of Application Performance Management shows that both are focused on improvement, with APM defining toolsets that tie together specific processes in Service Design, Service Transition, and Service Operation.

Reporting Metrics

Capturing the raw data for analysis is essential to a successful APM strategy. It is important to agree on a common set of metrics to collect, and then to standardize on a common view for presenting the real-time performance data.

Your best bet: alert on the averages and profile with percentiles. Use five-minute averages for real-time performance alerting, and percentiles for overall application profiling and Service Level Management.
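The rule of thumb above can be sketched with plain Python. The sample window, the 2-second alert threshold, and the nearest-rank percentile method are illustrative choices, not prescriptions from any specific tool.

```python
# "Alert on the averages, profile with percentiles": a single response-time
# spike barely moves the 5-minute average (so no noisy alert), but it shows
# up clearly in the high percentiles used for profiling and SLM reports.
import math
import statistics

def five_minute_average(samples_ms):
    """Average of one 5-minute window of response-time samples."""
    return statistics.fmean(samples_ms)

def percentile(samples_ms, p):
    """Nearest-rank percentile (0 < p <= 100) over a sample window."""
    ordered = sorted(samples_ms)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

window = [120, 135, 150, 140, 4200, 130, 125]  # one spike, in ms
avg = five_minute_average(window)
p95 = percentile(window, 95)
should_alert = avg > 2000  # alert threshold on the rolling average
```

Here the average stays under the threshold despite the 4.2-second outlier, while the 95th percentile surfaces it for the profiling report, which is exactly the separation of concerns the rule is after.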

Conclusion

As you go deeper in your exploration of APM and begin sifting through the technical dogma (e.g. transaction tagging, script injection, application profiling, stitching engines, etc.) for key decision points, take a step back and ask yourself why you're doing this in the first place: to translate IT metrics into an End-User-Experience that provides value back to the business.

If you have questions on the approach and what you should focus on first with APM, see Prioritizing Gartner's APM Model for insight on some best practices from the field.

Larry Dragich is Director of Enterprise Application Services at the Auto Club Group.

You can contact Larry on LinkedIn.


For a high-level view of a much broader technology space, see the slide show on BrightTALK.com, which presents "The Anatomy of APM" webcast in more context.
