In BSMdigest’s exclusive interview, Michael Sydor, Engineering Services Architect for CA Technologies, talks about APM and his new book APM Best Practices: Realizing Application Performance Management.
BSM: What did you see in the industry or market that caused you to decide to write the book?
MS: I had developed a set of best practices in response to a number of assessments of client practices and progress with APM which revealed some troubling trends, across all vendors. Organizations were simply unprepared to actually adopt APM, despite enterprise licensing (unlimited access to the technology), resulting in a relatively small number of operational deployments. There were also a number of ‘anti-patterns’ that I encountered that actually prevented broad adoption of APM.
Five years ago, we began actively building client teams to staff an “APM practice”, responsible for both deployments and the overall APM lifecycle – keeping the technology operating safely and efficiently. At that time it was obvious that a book was needed, in order to better prepare the marketplace for successful APM.
BSM: What is the basic difference between APM and the ITSM or monitoring infrastructure most companies already have in place?
MS: The difference is in the number of potential consumers of performance data and the collaborative infrastructure that has to be established in order to get the full value from performance monitoring. While APM looks and sounds like infrastructure monitoring, in terms of thresholds and alerts - what I simply call “Availability Monitoring” - it is really more about managing software/service quality.
While there is an obvious overlap between availability and performance in terms of alerts, the team that manages availability usually has nothing to do with getting performance problems addressed. You simply cannot reboot a server to solve a scalability issue or a software defect. For sure, infrastructure problems dominate the overall number of critical incidents, which is why I always position APM as complementary to availability monitoring. But when you consider what options the operations team actually has (reboot, open a bridge and reboot) versus what is needed (configuration or code change) you can start to appreciate how an organization can completely miss the APM value proposition – and then give up on the technology.
BSM: When the Help Desk is reporting application performance issues before the organization's monitoring tools, is this the time to move to APM?
MS: When the Help Desk is a better indicator of application performance – this is when APM is the only path to improving operational experience. One of the biggest limitations in using availability monitoring is that the resolution of an incident, actually seeing a change on your dashboard (trouble management), can be anywhere from 15 to 30 minutes. Your Help Desk is going to see increased call volume in 1-5 minutes, depending on the nature of the business - or as soon as users realize that they can’t complete their transactions. APM is going to detect trends hours before, and in the event of an outright failure, APM is going to alert in a minute or less.
For organizations that are looking ahead, the motivation to move to APM will be to exploit the additional visibility to better manage the application lifecycle, rather than finding out their current monitoring solution has run out of steam. And this will happen in proportion to the ‘cost’ of incurring poor user experience when their software systems are being used.
BSM: Can companies build on their legacy monitoring to achieve APM, or do they need a completely new APM solution, designed for the virtual and cloud worlds?
MS: There is much potential to leverage purpose-built or other customized monitoring simply because APM technology leverages features of the most modern development languages, which may leave some significant gaps for your older applications and services. My definition of APM actually begins with log files – which can have tremendous utility, provided that the volume of logging hasn’t already compromised performance.
When APM is deployed, critical applications are always going to be first, followed by compatible applications, and then by the integration of these earlier tools and technologies. The challenge is more about process than technology, because many of these earlier tools were never intended to be used by the greater community of users that APM is trying to bring together. Evaluations of your portfolio of technologies and existing processes – which I call visibility assessments – are key to the success of the APM initiative overall.
BSM: In the book you say, “APM is much more than a monitoring technology. It is a collaboration technology." What do you mean by this?
MS: Much of the real-time information about an application and its resources has been reserved for a small set of specialists, usually in an operational role. To improve the user experience you need to understand and account for performance capabilities throughout the lifecycle and often from a business perspective. Integrating these formerly isolated sources of information, along with disparate tools used in different environments (Development, QA, Production), is a daunting task. The ability to integrate these different sources of information into a single dashboard, and to tailor that dashboard to its audience, is what APM brings to the table.
We don’t have to "boil the ocean" – we start with the critical applications, and keep expanding the footprint with each deployment. We simply bring the most complete set of information about the application to the people who are best positioned to take advantage of it – and only the information that can be demonstrated to help address performance issues. That’s the essence of collaboration – lots of contributors, refining data and processes so that more folks benefit than did earlier.
BSM: What are the most common mistakes organizations make when selecting and deploying an APM solution?
MS: Restricting the technology to “Operations” only, and not ensuring that at least two persons are fully competent with the technology. That’s probably how 60% of APM initiatives stall or fail. The rest is mostly weakness in deploying and managing the APM environment.
Too many folks bypass the opportunity to participate in their initial deployments, in favor of turn-key deployments. It is really hard to pick up where someone else left off, after a year or more.
And after deployment issues, they then fail to consider any sort of collaboration with the performance data and techniques, and what it will take to break down the barriers to collaboration.
As the deployment pace increases, and the organization begins to exceed the capacity of their original APM footprint, the next area for trouble is simply managing the performance and capacity of the APM solution. This is, I believe, the legacy of the availability monitoring mindset. That earlier technology was something you deployed once and never had to think about again. The pace of APM adoption is what really surprises customers. Once the APM users understand what they can use the data for, adoption literally explodes, and what was a properly sized environment 1-2 years ago, suddenly falls apart.
BSM: In addition to technology, must an APM initiative also include an organizational evolution, to designate new stakeholders with responsibilities?
MS: This is a more difficult question. It all depends on the initial scope of the initiative, and usually that scope is excessively narrow, focusing on a single application or a small number of applications. I try to cultivate an executive sponsor who acts as a thought leader and corporate repository for the goals and progress of the various initiatives, and who understands how the initiative can and will grow.
The book is really an attempt to "virtualize" this role because an appropriate individual is not always available. So in the book I’ve given the reader the full story about what a successful APM initiative looks like and how to get there, no matter what your starting point. The benefits of APM don’t really change but you have to be pragmatic about what resources you can bring to bear. The more resources and support available, the more that the organization structure (of the APM initiative) becomes important.
BSM: In your book, it sounds like you say that organizations should not focus on ROI of an APM initiative?
MS: I really find that an ROI analysis, undertaken without any prior operational experience with APM, is little more than "plausible" and not at all reflective of reality. For example, while the cost of a critical incident can be accurately determined, the assumption that APM use will decrease the overall number of incidents is actually false. Initially, you will actually be finding more, not fewer, incidents, as a result of getting visibility into problems you never knew existed. The same situation applies if you are expecting to “reduce time spent on bridge calls”. If you don’t have appropriate processes to respond to performance problems – the actual remediation – then the resolution time grows without bound. None of the traditional measures of IT investment apply – except if you are refreshing an earlier APM initiative.
Instead, you narrow scope and act tactically in order to get a “demonstration of value”: identify, fix and validate a "real" problem. This works practically every time but you are only creating a future problem because you never achieved consensus or broader support.
You could also try the “better tools – better software quality” approach, which is the developer perspective. This misses the collaboration benefits as most organizations are heavily siloed and don’t even consider "collaboration" as a benefit, let alone how to associate a cost or savings with it.
The third route, and the real alternative to the ROI, is to undertake a detailed assessment and justification of your monitoring capabilities. It is relatively easy to identify the visibility gaps and the cost to “improve the situation”. And I’ve devoted a whole chapter to it so you can complete it on your own. This is a justification by preponderance of the evidence.
BSM: Is APM essential for Business Service Management?
MS: Absolutely. In the simplest sense, IT is responsible for the infrastructure. Business is responsible for the "business". Both perspectives will exploit APM visibility and information.
Traditionally, IT deploys monitoring technology which it uses to manage the infrastructure. It didn’t have to share that monitoring technology before and no one was really asking for it. Today, the business needs to see the end-user experience and they need it in "business-time" – not simply an availability report the next day or end of week. Today, with APM, IT manages the tool but the business is the primary consumer of the information.
About Michael Sydor
Michael J. Sydor is an Engineering Services Architect for CA Technologies. With more than 20 years in the mastery of high performance computing technology, he has significant experience identifying and documenting technology best practices, as well as designing programs for building, mentoring and operating successful client performance management teams.