Our digital economy is intolerant of downtime. But consumers haven't just come to expect always-on digital apps and services. They also expect continuous innovation, new functionality and lightning-fast response times.
Organizations have taken note, investing heavily in teams and tools that supposedly increase uptime and free resources for innovation. But leaders have not realized this "throw money at the problem" approach to monitoring is burning through resources without much improvement in availability outcomes.
The Moogsoft State of Availability Report — which helps engineering teams and leaders uncover insights about availability KPIs, teams and tools — found that businesses are double-investing in monitoring. Organizations spend too much money on too many tools, and teams spend the majority of their days monitoring their monitoring tools.
This over-investment in incident management goes largely unnoticed by management. So does the fact that monitoring cycles siphon resources from the future-driven work that delights customers and keeps engineers engaged.
Here are a few common causes of this spend-more-for-less approach:
1. Sprawling single-domain monitoring tools
In a noble attempt to keep digital apps and services available to end users at all times, business leaders buy tools that monitor their increasingly large and complex IT infrastructures. In theory, these tools should speed fixes to performance-affecting issues by continuously scanning systems and notifying engineers about anomalies.
The problem is that teams have far too many tools. On average, engineers manage 16 monitoring tools, and that number can creep up to 40 as SLAs increase. Tool sprawl like this is unwieldy, and licensing, management and maintenance overheads are expensive. But the over-investment in monitoring doesn't stop there.
2. Days spent in monitoring cycles
IT monitoring tools should bear the brunt of monitoring itself. In principle, these tools relieve engineers of a fairly tedious task and enable them to deliver what customers want: bigger and better technology.
Unfortunately, teams spend far more time on monitoring than on any other task. Why? Engineers spin their wheels managing single-domain tools that are not integrated across the stack and that produce huge volumes of largely useless data. Teams facing a critical outage or incident waste valuable time investigating data from disparate tools and connecting the dots themselves.
3. Leadership-team misalignment
Business leaders do not see just how much time their teams spend on monitoring, and likely believe they're making sound monitoring investments. Leaders believe their teams spend about the same amount of their time on monitoring as they do on other daily (and often future-driven) responsibilities like automation, cloud transformation and development.
4. Stalling innovation and experimentation
With engineering teams stuck in monitoring cycles, something has to give. And unfortunately, that thing is innovation and experimentation — the very activities that delight customers and engage engineering teams. In other words, not only do organizations over-invest in monitoring, they do so to the detriment of customer experience improvements.
The solution: steps to tech stability
If you are part of an engineering team or lead one, chances are you're facing modern-day monitoring problems. Consider these best practices for breaking wasteful monitoring cycles and building your tech stability:
1. Baseline your tools. Audit your existing tools to understand how heavily each is used and what it costs. Then you can determine which of these assets advance availability goals and which just create more noise.
2. Consolidate your tools. Hold on to only those monitoring tools that provide value. Retire the rest to shrink your monitoring footprint, decrease total cost of ownership (TCO) and reduce noise.
3. Implement an artificial intelligence for IT operations (AIOps) solution. Make your next monitoring investment one that makes engineers' jobs less toilsome, not more. AIOps connects cloud and on-prem monitoring tools, giving engineers a central system of engagement for all monitoring activities. The platform alerts engineers to data anomalies and their root cause and automates the entire incident lifecycle.
4. Pay down your technical debt. With time back on your side, tackle the most relevant tech debt and increase system stability. Free even more time by automating away toil and continue to increase availability with chaos engineering.
5. Invest in the future. With time and money saved, refocus your investments on company-differentiating initiatives.
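As a rough illustration of the first two steps, the sketch below baselines a tool inventory and flags consolidation candidates. Everything in it is an assumption made for illustration: the tool names, cost figures, alert counts and the 20% signal threshold are hypothetical, and "share of alerts that led to real action" is just one possible proxy for whether a tool advances availability goals.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    name: str                # hypothetical tool name
    annual_cost: float       # license + maintenance cost (made-up figures)
    alerts: int              # alerts raised during the audit period
    actionable_alerts: int   # alerts that led to real engineering action

    @property
    def signal_ratio(self) -> float:
        """Share of alerts that were actionable (0.0 if the tool raised none)."""
        return self.actionable_alerts / self.alerts if self.alerts else 0.0


def baseline(tools, min_signal=0.2):
    """Split tools into keepers and consolidation candidates.

    min_signal is an assumed threshold, not an industry standard.
    """
    keep, review = [], []
    for tool in tools:
        (keep if tool.signal_ratio >= min_signal else review).append(tool)
    return keep, review


# Hypothetical inventory for illustration only.
inventory = [
    Tool("infra-metrics", 40_000, 1_000, 350),
    Tool("legacy-log-scanner", 25_000, 5_000, 100),
    Tool("synthetic-checks", 10_000, 200, 90),
]

keep, review = baseline(inventory)
savings = sum(tool.annual_cost for tool in review)

for tool in review:
    print(f"Review {tool.name}: {tool.signal_ratio:.0%} of alerts were actionable")
print(f"Potential annual savings if retired: {savings:,.0f}")
```

In practice the inputs would come from license records and incident tooling rather than hard-coded values, and a low-signal tool may still be worth keeping if it covers a domain nothing else monitors.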
Monitoring tools are essential to uptime. But monitoring cannot be the only thing teams do — especially when it hinders innovation and experimentation. Leaders must make more informed investments to monitor more effectively. Only then can organizations move from maintaining the customer experience to innovating the customer experience.