Take the War Out of the War Room
June 09, 2015

Nik Koutsoukos
Catchpoint


The development of new and more complex business technologies now happens so quickly that it is starting to outpace the rate at which IT organizations can effectively monitor the entire IT infrastructure and react to problems. This is particularly true as more enterprises adopt a hybrid model with some resources managed in the data center and some in cloud or SaaS-based environments. Simultaneously, IT organizations have become increasingly siloed as different personnel develop skillsets specific to different pieces of the IT infrastructure, such as database management, the network, information security, etc.

As a result, the “war room” – where IT personnel gather to diagnose and fix a problem – more often than not devolves into a session of finger pointing and delays. Remedying this situation demands a new approach to managing performance that enables IT to become more proactive instead of reactive, and more collaborative instead of siloed.

Riverbed recently held a webinar on this topic, and one of our presenters was Forrester Vice President and Principal Analyst Jean-Pierre Garbani. He opened his remarks with a statement that nicely summarizes how predictive analytics technologies have radically reshaped how any company does (or should do) business: “Every company board, IT organization and leadership team should assume that there are – or will be – new ways to more efficiently service customers.”

In other words, counting on the luxury of being able to time the development and release of new products, applications or services to slow-moving market trends is a thing of the past. Just ask the taxicab industry. After more than a century of enjoying a monopoly, it suddenly finds itself in a battle for its life against data-driven services like Uber and Lyft. Or consider the examples of Kodak, Blockbuster, Tower Records or Borders for evidence of how quickly a long-established business model can become obsolete.

Today companies can collect massive amounts of data and use predictive analytics technologies to surface invaluable information such as customer buying trends, supply chain capacity, and commodity price futures, or to provide customers with data-driven offers. Enterprises are pouring money and energy into creating innovative applications and getting them to market faster, better and cheaper. Agile and DevOps capabilities can reduce release cycles from months to mere days, and the funding for these investments typically comes from spending reductions in infrastructure.

These complexities can quickly overwhelm human abilities and make the job of resolving problems and maintaining systems increasingly difficult and time-consuming. That impacts service quality. Forrester has conducted a number of surveys and found that 56 percent of IT organizations resolve less than 75 percent of application performance problems within 24 hours, and in some cases, those performance issues can linger for months before resolution. Consider as examples outages that affect services like Gmail or Dropbox.

The root of the problem lies with the fact that IT grew up around domains such as the network, systems, applications, databases, etc., and each group needed domain data to do its job. That has driven a proliferation of domain-centric point tools, which helps each domain group, but also means that even for very simple transactions, domain teams only see part of the transaction, such as packet data or metrics from an app server. This incomplete visibility means domain teams see different things due to inconsistent data sets and differing analytic approaches. That leads to a lack of collaboration, warring tribes, and ultimately conflicting conclusions that inhibit fast time to resolution.

For example, last year Adobe’s move to cloud-based software backfired momentarily when database maintenance resulted in application availability issues. The company’s Creative Cloud service was unavailable for about a day, leaving users unable to access the web versions of apps such as Photoshop and Premiere. In total, the outage was said to have impacted at least a million subscribers. Other Adobe-related products were impacted during the downtime as well, including Adobe's Business Catalyst analytics tool. The company has since implemented procedures to prevent a similar outage from happening again.

This instance highlights the area where companies typically struggle to solve performance issues. Once a problem occurs, it usually doesn’t take long for a frustrated employee or customer to raise it with IT, and once the specific cause is identified, fixing and validating that fix should not take long. Where the delays occur is in the middle of that timeline: the diagnosis, or what Forrester refers to as the “Mean Time to Know” (MTTK).
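To make the timeline concrete, here is a minimal sketch of how an incident's resolution time breaks down into phases. The phase durations are purely illustrative assumptions, not figures from Forrester; the point is only that the diagnosis phase (MTTK) typically dwarfs detection, fixing, and validation combined.

```python
# Hypothetical incident timeline in minutes; all numbers are
# illustrative assumptions, not measured data.
phases = {
    "detect": 10,      # a frustrated employee or customer raises the issue
    "diagnose": 180,   # MTTK: finding the faulty component across silos
    "fix": 15,         # applying the fix once the cause is identified
    "validate": 10,    # confirming the fix resolved the problem
}

total = sum(phases.values())
mttk_share = phases["diagnose"] / total
print(f"Diagnosis (MTTK) accounts for {mttk_share:.0%} of resolution time")
```

Even with generous assumptions about detection and repair speed, diagnosis dominates the total, which is why shortening MTTK is where the biggest gains lie.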

Because an IT organization is typically divided into independent silos that have little interaction with each other, the diagnosis process rarely becomes a collaborative effort. The war room where personnel gather to battle the problem becomes a war against each other. Instead of one collaborative effort, each silo uses its own specialized tools to evaluate the issue, and can typically determine only that the fault lies with another group, without knowing which one. So the problem gets passed from group to group, a tedious and time-wasting exercise.

We will always have different, specialized groups within one IT organization to oversee services and applications such as end-user experiences, application monitoring, database monitoring, transaction mapping and infrastructure monitoring. What must change is the reliance on individual dashboards each group uses to monitor its own domain in isolation. The key is to roll all of that reporting information in real-time into one global dashboard that provides broad domain monitoring capabilities that can be abstracted and analyzed in a way that focuses on services and transactions. Providing this single source of truth will reconcile technology silos and support better incident and problem management processes.
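The aggregation described above can be sketched in a few lines. This is a simplified illustration, not a real monitoring product: the domain names, metric fields, and feed are all assumptions chosen to show the idea of pivoting per-silo metrics into one service-centric view.

```python
# Minimal sketch: rolling per-domain metrics into one service-centric view.
# All domain names, fields, and values below are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class DomainMetric:
    domain: str        # which silo reported it, e.g. "network", "database"
    service: str       # the business service the metric belongs to
    latency_ms: float  # the silo's slice of the transaction time
    healthy: bool      # the silo's own health verdict

def global_dashboard(metrics):
    """Group domain-level metrics by service so every team sees the
    same transaction, not just its own slice of it."""
    view = {}
    for m in metrics:
        svc = view.setdefault(m.service, {"domains": {}, "healthy": True})
        svc["domains"][m.domain] = m.latency_ms
        svc["healthy"] = svc["healthy"] and m.healthy
    return view

# Each silo reports its slice of the same "checkout" transaction...
feed = [
    DomainMetric("network", "checkout", 12.0, True),
    DomainMetric("database", "checkout", 240.0, False),
    DomainMetric("app", "checkout", 35.0, True),
]
# ...and the shared view makes the degrading domain visible to everyone.
print(global_dashboard(feed))
```

Because every group reads the same service-level view, the question shifts from "whose tool is right?" to "which domain in this transaction is degraded?"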

In other words, you take the war out of the war room. Each participant can find the right information needed to perform their tasks while also sharing that information with their peers so they can do the same.

Implementing this new approach to performance management will be a radical change for many organizations, and there may be initial resistance to overcome as groups worry their individual roles are at risk of marginalization. Again, the ultimate goal is not to eliminate specialized groups within one IT organization; it is to improve the collaboration among those groups. The result is performance management that is far less reactive: rather than waiting for a problem to occur before taking action, universal real-time monitoring can enable IT to anticipate when and where a problem may arise and fix it before the end user or customer even notices it. The most productive end user and happiest customer can often be the ones you never hear from because their experiences are always positive. That kind of silence is golden.

Nik Koutsoukos is CMO, Strategy & Product Leader, at Catchpoint