Monitoring today's apps and digital architectures is a lot like aerial combat. Just when you think you've got everything under control, another slippery piece of tech gets right on your tail. Today that's containers and microservices – a big headache for operational flight aces.
When organizations adopt containers and microservice-style architectures in production, systems become incredibly complex. For operations teams it's a shock, because it means coming to grips with many new container nuances – plus letting go of the old monitoring rule book – because, well, it doesn't work anymore.
Depending on architectural patterns, containers introduce many more moving parts and dependencies – all meaning more system checks, events and alarms. Moreover, the convenience of empowering development teams with a disposable and immutable platform upon which to accelerate software delivery doesn't come without its own set of gotchas. Not least is being woefully unprepared for the increased rate of change and complexity. If not addressed, this inevitably leads to outages, more organizational stress – perhaps even backing away from a technology that's a business no-brainer.
To combat these issues, some organizations are taking a leaf out of the cloud-native book with design-for-failure methods and increasing redundancy at every level of the technology stack. Some are going further than resilience, developing autonomous microservices that can handle and exploit the complexity of containers so the entire system becomes stronger over time. All commendable practices, but again placing more demands on monitoring systems.
With so much complexity it's perhaps not surprising that analytics is now touted as the operational answer to the container complexity conundrum. And why not – take truckloads of data, logs and time-series, sprinkle in a liberal selection of algorithmic fairy dust – and hey presto – problem solved. Well, here's hoping. But before running down to the analytics store, it's worth assessing capabilities based on how they help teams make fast decisions in constantly changing, dynamic container environments.
Interestingly, there's a proven decision-making practice borrowed from aerial combat and adopted by DevOps teams – the OODA loop (observe, orient, decide and act) – that's an effective way of assessing the efficacy of application monitoring and analytics. By mapping OODA onto a production container environment, teams can build a good picture of what constitutes an effective monitoring strategy and where analytics is especially important.
Any half-decent DevOps or site reliability engineering team uses monitoring solutions to collect masses of data from as many sources as possible. In container environments where everything's in constant flux and events unfold rapidly, the best analytics (like fighter pilots) correlate information from multiple sources – continuously asking questions about what's happening across all the hosts, containers and pods, and what the impact on service is – essentially distilling mega amounts of information into actionable views. Here, good APM analytics use proven statistical methods to filter signal from noise – especially relevant for microservices, where dependencies can produce cascades of alarms.
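As an illustration of the kind of statistical filtering described above, here's a minimal sketch of a z-score test against a metric's recent history – one of the simplest ways to separate a genuine spike from normal variation. The function name, threshold and sample values are illustrative only, not tied to any particular APM product.

```python
import statistics

def is_anomalous(history, value, threshold=3.0):
    """Flag a metric value that deviates more than `threshold`
    standard deviations from its recent history (a basic z-score test)."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold

# Illustrative latency samples (ms) from one container
latencies = [102, 98, 105, 99, 101, 103, 97, 100]
print(is_anomalous(latencies, 240))  # a 240 ms spike stands out: True
print(is_anomalous(latencies, 104))  # within normal variation: False
```

Real monitoring systems use far more sophisticated baselining (seasonality, dynamic thresholds), but the principle is the same: alert on deviation from learned behavior, not on static limits.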
Analytics doesn't end with information capture. Once data has been sorted and processed, solutions should be capable of identifying the exact problem with as little noise as possible. Again, more advanced methods use analytics to gather evidence and guide cross-functional support teams towards resolutions. One example is the assisted triage feature of CA APM, which leverages graph theory and workflow to quickly orient teams towards solutions – again, especially valuable in container environments where problems can be multi-faceted.
Based on all the symptoms identified during OODA orientation, the best monitoring systems help teams decide what action to take to address an event. For increased value, systems should surface this information to the staff best positioned to act upon it – in the context of their work. A good application performance monitoring solution, for example, would detect a performance problem associated with a software build; a better one would incorporate pass/fail conditions into the process and allow cross-build performance comparisons – all directly accessible from a developer's workstation and fully integrated into DevOps workflows and practices (continuous integration in this example).
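The pass/fail build condition mentioned above can be sketched as a simple performance gate: compare a build's measured response time against the previous baseline and fail the pipeline on regression. The function, metric names and 10% tolerance here are hypothetical – actual CI integrations expose this through their own configuration.

```python
def performance_gate(baseline_p95_ms, build_p95_ms, max_regression=0.10):
    """Fail the build if its p95 response time regresses more than
    `max_regression` (10% by default) against the previous baseline."""
    limit = baseline_p95_ms * (1 + max_regression)
    return "pass" if build_p95_ms <= limit else "fail"

print(performance_gate(200, 210))  # within 10% of baseline -> pass
print(performance_gate(200, 260))  # 30% slower -> fail
```

Wiring a check like this into the pipeline turns performance from something discovered in production into a per-build, cross-build comparison a developer sees immediately.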
In dynamic container environments, app monitoring and analytics must allow for fast actions once decisions have been made. This means integrating monitoring and analytical horsepower into many more workflows and processes to increase range and impact. For example, using app monitoring analytics to determine which coding practices lead to the best performance outcomes and instantiating those across teams. Or integrating analytics with autoscaling to dynamically optimize workloads based on load predictions – with great analytics solutions, the improvement opportunities are endless.
Like fighter pilots, application monitoring solutions must process and respond at speed. Look for modern solutions that allow OODA loops to be implemented, and measure their effectiveness by how quickly they can act on problems and drive improvements.
To learn more about what it takes to be an IT operations analytics Top Gun, visit www.ITOA2Summit.com or download the Monitoring Redefined: Digital Experience Insights white paper.
Pete Waterhouse is Advisor, Product Marketing, at CA Technologies.