Flying Blind — The 2013 IT Operations Quotient Report
June 11, 2013

Sasha Gilenson
Evolven

IT Operations is now overwhelmed by the volume, velocity and variety of change and configuration data. Lacking insight or actionable information, teams find change and configuration problems a chronic pain.

As shown by recent surveys at the Gartner Data Center Summit and ServiceNow Knowledge13 conferences, where Evolven surveyed over 300 IT Operations professionals asking questions critical to IT operations management, 84% of IT professionals said that they want to significantly improve their IT operations management.

The 2013 IT OQ (Operations Quotient) Report provides a good indication to IT executives as to whether IT ops investments have yielded desired results, using the IT Operations Quotient (OQ), a metric for evaluating operational ability to support existing business services and incoming business requirements.

When an Incident Occurs, Can You Quickly Know What Changed?

Only 7% of the professionals surveyed indicated that, using their current IT management tools, they could quickly identify what changed in order to respond to problems and incidents.

The first question IT operations teams ask themselves when an incident occurs is "what changed?" Given the complexity and dynamics of the modern data center, with overwhelming configuration data and frequent changes, this question has become formidable to answer.

Between applications, environments, and individual instances, mistakes and unauthorized changes happen, demanding that IT ops spend significant amounts of time managing configuration values.

Traditional IT management tools were not designed to deal with the complexity and dynamics of the modern data center. These tools do not automatically collect data at a granular level, analyze all changes, and consolidate the results to extract meaningful information from the sea of raw change and configuration data.

Without systems to manage and organize this growth, IT will drown in its own data.
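
To make the "what changed?" question concrete, here is a minimal Python sketch that compares two configuration snapshots taken before and after an incident. It is an illustration only, not how any particular product works: the function name `diff_snapshots` and the flat parameter-to-value dictionaries are assumptions made for this example.

```python
def diff_snapshots(before, after):
    """Compare two flat configuration snapshots (parameter name -> value).

    Hypothetical helper for illustration; real environments would need
    nested files, packages, schemas, and so on.
    """
    # Parameters present only in the newer snapshot
    added = {k: after[k] for k in after.keys() - before.keys()}
    # Parameters that disappeared
    removed = {k: before[k] for k in before.keys() - after.keys()}
    # Parameters present in both whose values differ, as (old, new)
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    yesterday = {"db.pool.size": 50, "jvm.heap": "4g", "tls": "on"}
    today = {"db.pool.size": 20, "jvm.heap": "4g", "cache.ttl": 300}
    print(diff_snapshots(yesterday, today))
```

Even a comparison this simple shows why granular collection matters: the diff is only as useful as the snapshots are complete.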

Can You Automatically Validate that Your Release Deployed Accurately?

Only 8% of the participants surveyed agreed that they could currently automatically validate the accuracy of their deployments. Available release management tools are unprepared for one-off changes or changes that do not follow policy.

IT organizations regularly transition changes to production environments, checking changes throughout a set of pre-production environments.

Now IT is under even more pressure. To meet business requirements, application deployments have accelerated, and faster software deployment schedules have driven up the pace of change. The increasingly agile nature of application and infrastructure change leaves IT operations at a loss, inundated by change requests that run the gamut from the critical and high priority to the minor and unimportant.

With a typical environment having thousands of different system configuration parameters, any little change can impact performance. So it’s not surprising to see many companies going through painful stabilization periods after a release, as well as production outages.

Even when using automated tools for deployment, the lack of detailed visibility into the release means IT ops can’t ensure accurate, error-free deployments.
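
As a rough sketch of what automatic release validation could look like, the Python example below checks a deployed environment against the values a release was expected to set. The names `validate_release`, `expected`, and `deployed` are hypothetical, and the flat dictionaries stand in for whatever a real tool would collect.

```python
def validate_release(expected, deployed):
    """Return a list of problems where the deployed state does not
    match what the release was expected to set.

    Hypothetical helper for illustration. Parameters that exist in the
    environment but are not part of the release are ignored.
    """
    problems = []
    for name, want in sorted(expected.items()):
        if name not in deployed:
            problems.append(f"{name}: missing (expected {want!r})")
        elif deployed[name] != want:
            problems.append(f"{name}: found {deployed[name]!r}, expected {want!r}")
    return problems


if __name__ == "__main__":
    release = {"app.version": "2.1", "db.pool.size": 50}
    production = {"app.version": "2.1", "db.pool.size": 20, "tls": "on"}
    for problem in validate_release(release, production):
        print(problem)
```

An empty list means every parameter the release intended to set arrived as specified. It says nothing about one-off changes made outside the release, which would need a full before-and-after comparison.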

Can You Quickly Identify the Incident’s Root Cause?

As shown in this survey, the vast majority of IT professionals surveyed concurred that they lack the capabilities to quickly identify an incident’s root cause. IT organizations find themselves challenged when assessing a system failure and tracking down its root cause, such as a patch that wasn't deployed or a server that failed.

Any minute misconfiguration or omission of a single configuration parameter can quickly lead to an incident with high impact. With a vast number of these configuration parameters in play when an environment incident hits, finding the root cause consumes both precious time and manpower, making MTTR woefully high in most organizations.

The root cause of downtime and incidents often starts at the most granular level of configuration changes, where today's configuration management and change management tools don't provide visibility. The different groups in organizations, like IT Development, Support, and Operations, tend to point the finger of blame at one another and fail to diagnose or deal with the root cause of the problem.

After a major incident, root cause analysis should focus on the root cause of the failure, not only to resolve the incident but also to head off a recurrence. Even when IT teams manage to suppress a failure and operations can return to "normal", the true root cause may remain unresolved, leaving the organization exposed to further chaos.

Can You Automatically Verify the Consistency of Your Environments?

In our survey, only 5% of the respondents felt that they can currently verify the consistency of their environments automatically. Doing so means going into the fine, granular details, identifying the make-up of even minor changes, and processing enormous amounts of configuration data to verify consistency between servers and environments.

As IT organizations regularly transition changes to production environments, IT teams need to check changes throughout a set of pre-production environments that can include system test, performance test, UAT, staging, etc. (Changes are also mirrored in a Disaster Recovery environment.) IT has sought to diversify its workloads, spreading deployments over multiple IT environments to mitigate risk, yet also doubling complexity.

The high volume of changes means that not all changes consistently make their way to all environments (pre-prod, prod, DR). The configuration parameters must be validated for consistency in real time.
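
A consistency check across environments can be sketched in a few lines of Python. This is an assumption-laden illustration rather than a description of any real tool: `find_inconsistencies`, the `MISSING` marker, and the flat per-environment dictionaries (with hashable values such as strings and numbers) are all invented for the example.

```python
MISSING = "<missing>"  # hypothetical marker for a parameter absent from an environment


def find_inconsistencies(environments):
    """Given {environment name: {parameter: value}}, return the parameters
    whose values are not identical across every environment.

    Hypothetical helper for illustration; assumes hashable values.
    """
    # Collect every parameter seen in any environment
    all_keys = set()
    for config in environments.values():
        all_keys.update(config)

    drift = {}
    for key in sorted(all_keys):
        values = {env: config.get(key, MISSING) for env, config in environments.items()}
        # More than one distinct value (or a missing one) means drift
        if len(set(values.values())) > 1:
            drift[key] = values
    return drift


if __name__ == "__main__":
    envs = {
        "staging": {"jvm.heap": "2g", "tls": "on"},
        "prod": {"jvm.heap": "4g", "tls": "on"},
        "dr": {"tls": "on"},
    }
    print(find_inconsistencies(envs))
```

In practice some differences between environments are intentional (hostnames, credentials, capacity settings), so a real check would need a way to exclude or whitelist expected differences.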

IT Operations Analytics Helps

With performance at risk from any disruptions to stability, IT teams need to know exactly what has changed in an environment.

Managing IT environments with intelligent automated analytics will drive more sophisticated proactive processes like comparing environment states, validating releases, and verifying consistency of changes, helping to prevent or identify critical issues. So rather than continue to feed bloated system tools, IT Operations should strive to simplify and implement configuration management based on IT Operations Analytics, shifting the focus from what can’t be managed to what can be done about performance and availability.

Sasha Gilenson is the Founder and CEO of Evolven Software.
