Our digital economy is intolerant of downtime. But consumers haven't just come to expect always-on digital apps and services. They also expect continuous innovation, new functionality and lightning-fast response times.
Organizations have taken note, investing heavily in teams and tools that supposedly increase uptime and free resources for innovation. But leaders have not realized this "throw money at the problem" approach to monitoring is burning through resources without much improvement in availability outcomes.
The Moogsoft State of Availability Report — which helps engineering teams and leaders uncover insights about availability KPIs, teams and tools — found that businesses are double-investing in monitoring. Organizations spend too much money on too many tools, and teams spend the majority of their days monitoring their monitoring tools.
This over-investment in incident management goes largely unnoticed by management. So does the fact that monitoring cycles siphon resources from the future-driven work that delights customers and keeps engineers engaged.
Here are a few common causes of this spend-more-for-less approach:
1. Sprawling single-domain monitoring tools
In a noble attempt to keep digital apps and services available to end users at all times, business leaders buy tools that monitor their increasingly large and complex IT infrastructures. In theory, these tools should speed fixes to performance-affecting issues by continuously scanning systems and notifying engineers about anomalies.
The problem is that teams have far too many tools. On average, engineers manage 16 monitoring tools, and that number can creep up to 40 as SLAs increase. Tool sprawl like this is unwieldy, and the license, management and maintenance overheads are expensive. But the over-investment in monitoring doesn't stop there.
2. Days spent in monitoring cycles
IT monitoring tools should bear the brunt of monitoring itself. In principle, these tools relieve engineers from spending too much time on a fairly tedious task and enable them to deliver what customers want: bigger and better technology.
Unfortunately, teams spend far more time on monitoring than on any other task. Why? Engineers spin their wheels managing single-domain tools that are not integrated across the stack and that produce huge volumes of largely useless data. Teams facing a critical outage or incident waste valuable time investigating data from disparate tools and connecting the dots themselves.
3. Leadership-team misalignment
Business leaders do not see just how much time their teams spend on monitoring, and likely believe they're making sound monitoring investments. Leaders believe their teams spend about the same amount of their time on monitoring as they do on other daily (and often future-driven) responsibilities like automation, cloud transformation and development.
4. Stalling innovation and experimentation
With engineering teams stuck in monitoring cycles, something has to give. And unfortunately, that thing is innovation and experimentation — the very activities that delight customers and engage engineering teams. In other words, not only do organizations over-invest in monitoring, they do so to the detriment of customer experience improvements.
The solution: steps to tech stability
If you are part of an engineering team or lead one, chances are you're facing modern-day monitoring problems. Consider these best practices for breaking wasteful monitoring cycles and building your tech stability:
1. Baseline your tools. Audit your existing tools to understand how heavily each is used and what it costs. Then you can determine which of these assets advance availability goals and which just create more noise.
2. Consolidate your tools. Hold on to only those monitoring tools that provide value. Otherwise, try to shrink your monitoring tools' footprint to decrease total cost of ownership (TCO) and reduce noise.
3. Implement an artificial intelligence for IT Operations (AIOps) solution. Make your next monitoring investment one that makes engineers' jobs less toilsome, not more. AIOps connects cloud and on-prem monitoring tools, giving engineers a central system of engagement for all monitoring activities. The platform alerts engineers to data anomalies and their root causes and automates the entire incident lifecycle.
4. Pay down your technical debt. With time back on your side, tackle the most relevant tech debt and increase system stability. Free even more time by automating away toil and continue to increase availability with chaos engineering.
5. Invest in the future. With time and money saved, refocus your investments on company-differentiating initiatives.
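To make the first two steps concrete, here is a minimal sketch of what a tool baseline could look like in Python. Everything in it is an illustrative assumption rather than something taken from the report: the `Tool` fields, the idea of scoring a tool by the share of its alerts that led to no action, the 0.8 noise threshold, and the sample figures are all hypothetical. Your own audit would use whatever utilization and cost data you actually collect.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    name: str
    annual_cost: float      # hypothetical: license plus maintenance, in dollars
    alerts: int             # alerts raised during the audit window
    actionable_alerts: int  # alerts that led to a real fix


def noise_ratio(tool: Tool) -> float:
    """Share of a tool's alerts that led to no action (0.0 to 1.0)."""
    if tool.alerts == 0:
        # Assumption: a tool that raised nothing is treated as pure noise.
        return 1.0
    return 1 - tool.actionable_alerts / tool.alerts


def consolidation_candidates(tools: list[Tool], max_noise: float = 0.8) -> list[Tool]:
    """Return the tools whose noise ratio exceeds the chosen threshold."""
    return [t for t in tools if noise_ratio(t) > max_noise]


# Made-up inventory, for illustration only.
inventory = [
    Tool("network-probe", 40_000, alerts=1_000, actionable_alerts=50),
    Tool("app-tracer", 60_000, alerts=200, actionable_alerts=120),
    Tool("legacy-pinger", 15_000, alerts=0, actionable_alerts=0),
]

candidates = consolidation_candidates(inventory)
candidate_names = [t.name for t in candidates]
potential_savings = sum(t.annual_cost for t in candidates)

print("Review for consolidation:", candidate_names)
print("Annual spend tied up in those tools:", potential_savings)
```

A score like this is only a starting point for the conversation. A noisy tool may still cover a domain nothing else monitors, so treat the output as a list of tools to review, not a list of tools to cut.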
Monitoring tools are essential to uptime. But monitoring cannot be the only thing teams do — especially when it hinders innovation and experimentation. Leaders must make more informed investments to monitor more effectively. Only then can organizations move from maintaining the customer experience to innovating the customer experience.