Optimizing Root Cause Analysis to Reduce MTTR
October 11, 2012

Ariel Gordon


Efficiently detecting and resolving problems is essential to keep supporting business services, and to minimize both the impact on those services and any financial loss.

The goal is to turn the tables on IT problems so that 80 percent of the time is spent on root cause analysis versus 20 percent on actually fixing the problem.

In resolving an issue, communication is critical for aligning different expert groups toward a common goal. Because each team holds a narrow view of its own domain and expertise, there is always a danger that the "big picture" will be missed. You don't want a lack of communication to result in blame games and finger pointing.

Some problem detection methods include:

- Infrastructure Monitoring: Monitoring the utilization of specific resources, such as disk, memory, and CPU, is effective for identifying availability failures, and sometimes for heading them off before they happen.

- Domain or Application Tools: These help, but overall problem detection remains a game of hide-and-seek: a manually intensive effort carried out under pressure to deliver a fix as quickly as possible.

- Dependency Mapping Tools: These map business services and applications to infrastructure components, and can help you generate a topology map that improves your root cause analysis process for the following reasons:

1. Connect Symptoms to Problems: A single map that relates a business service (the user's point of view) to its configuration items will help you detect problems faster.

2. Common Ground: The map ties in all elements so that different groups can focus on a cross-domain effort.

3. High-Level, Cross-Domain View: Teams can view problems not only in the context of their domain, but in a wider view of all network components. For example, a database administrator analyzing a slow database performance problem can examine the topology map to see the effect of networking components on the database.
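The idea of relating a business service to its configuration items can be sketched in a few lines of code. The example below is a minimal illustration, not any vendor's implementation: the service and component names, the `DEPENDS_ON` map, and both helper functions are hypothetical. It walks the dependency map from a business service down to its components, then reports the alerting components that have no alerting dependency beneath them, which makes them the most likely starting points for root cause analysis.

```python
from collections import deque

# Hypothetical topology: each key depends on the items in its list.
DEPENDS_ON = {
    "online-banking": ["web-frontend", "payments-api"],
    "web-frontend": ["load-balancer"],
    "payments-api": ["orders-db"],
    "orders-db": ["core-switch", "san-storage"],
    "load-balancer": ["core-switch"],
}


def configuration_items(service, depends_on):
    """Return every item the service depends on, directly or indirectly."""
    seen = []
    queue = deque(depends_on.get(service, []))
    while queue:  # breadth-first walk down the topology
        item = queue.popleft()
        if item in seen:
            continue
        seen.append(item)
        queue.extend(depends_on.get(item, []))
    return seen


def candidate_root_causes(service, depends_on, alerting):
    """Alerting items under the service with no alerting dependency of their own."""
    in_scope = [i for i in configuration_items(service, depends_on) if i in alerting]
    return [
        item
        for item in in_scope
        if not any(dep in alerting for dep in configuration_items(item, depends_on))
    ]


# Assumed symptoms: the API and database are slow, and a switch is also alerting.
alerts = {"payments-api", "orders-db", "core-switch"}
print(candidate_root_causes("online-banking", DEPENDS_ON, alerts))
# prints ['core-switch']
```

In this toy scenario the database administrator sees a slow database, but the map points to a networking component underneath it, which is the cross-domain view described above.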

Root cause analysis is complex, and no single tool or approach will give you full coverage. The idea is to plan a portfolio of tools that together deliver the most impact for your organization.

For instance, if you do not have a central event management console, consider implementing a topology-based event management solution. If most of your applications involve online transactions, look for a transaction management product that covers the technology stack common in your environment. Put differently, select the combination of tools that is right for your environment.

Once you have assessed which tools provide the most value, implement them in descending order of value so that you get the biggest impact first.
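One way to make that ordering concrete is to score each candidate tool and sort so that the highest-value one comes first. The sketch below is purely illustrative: the tool categories come from the examples above, but the scores and the `rollout_order` helper are hypothetical placeholders, not measured results.

```python
# Hypothetical (tool, value score) pairs; the scores are placeholders that
# each organization would replace with its own assessment.
candidates = [
    ("infrastructure monitoring", 5),
    ("topology-based event management", 9),
    ("transaction management", 7),
]


def rollout_order(tools):
    """Return tool names sorted so the highest-value tool is implemented first."""
    return [name for name, value in sorted(tools, key=lambda t: t[1], reverse=True)]


for step, name in enumerate(rollout_order(candidates), start=1):
    print(step, name)
```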

Ariel Gordon is VP Products and Co-Founder of Neebula.
