There are probably more myths and misunderstandings about the term DevOps than there are hard facts. I recently sat down with Gene Kim and Patrick Debois - two of the DevOps movement's best-known practitioners - to cut through the noise. We swapped experiences and discussed how a CIO should be thinking about DevOps, when to consider a DevOps approach, the business case for adopting it, patterns of success and patterns of failure.
We agreed that there are three general attributes that provide a good indication that DevOps might help your organization:
1. Work in Progress (un-deployed change such as a new feature or application) is on the rise.
2. “Fragile”, poor-availability applications are finding their way into production, resulting in a low tolerance for experimentation.
3. Long waiting lines for line-of-business projects and rampant “shadow IT”.
Each of these symptoms can be traced back to a root cause problem where DevOps might help, but you need to take the time to understand the root cause. Let’s look at each.
Excess work in progress is often the root cause of fragile applications and change avoidance, while shadow IT results from frustrated line-of-business executives who feel they have no choice but to bypass corporate IT in order to execute against their goals.
If you're using Agile and Kanban boards, establishing and measuring KPIs for excess WIP is relatively straightforward. We're looking for work that has been started but not finished: code that is not being actively tested by QA and not in production being exercised by real users/customers. The simplest way to establish this measure is to chat with your Agile team leader and take a look at the boards.
If you're not at that stage of maturity, you can find WIP in other ways. First, check in with your Finance or Program Management Office and look for long running projects and/or projects burning through a lot of cash that haven't been delivered into production.
Second, simply go and chat with the LOB executives and get a sense of the tempo of delivery vs. project initiation.
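To make the WIP measure concrete, here is a minimal sketch of how you might count it from cards exported out of a board tool. The card fields, statuses and the 30-day staleness threshold are all invented for illustration; your board's export format will differ.

```python
from dataclasses import dataclass

# Hypothetical card export from a Kanban/Agile board tool.
@dataclass
class Card:
    title: str
    status: str    # e.g. "backlog", "in_progress", "in_qa", "blocked", "done"
    age_days: int  # days since work started

# WIP = started but not yet delivered to users: anything past the
# backlog that has not reached "done" (deployed to production).
WIP_STATUSES = {"in_progress", "in_qa", "blocked"}

def excess_wip(cards, age_threshold_days=30):
    """Return the WIP count and the cards idle past the threshold."""
    wip = [c for c in cards if c.status in WIP_STATUSES]
    stale = [c for c in wip if c.age_days > age_threshold_days]
    return len(wip), stale

cards = [
    Card("New pricing page", "in_qa", 12),
    Card("SSO integration", "in_progress", 65),
    Card("Data export", "blocked", 90),
    Card("Bug #1423", "done", 5),
]
count, stale = excess_wip(cards)
print(count, [c.title for c in stale])
```

The stale list is the interesting output: work that was started months ago and is still not in front of users is exactly the unrealized value this KPI is meant to surface.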
Measuring the fragility of applications and services is a more subtle exercise. Virtually everyone will complain that an application or service "crashes" from time to time. What we're looking for here are signs of systemic failure to establish hardened, resilient services where the root causes can be traced back to design time decisions. This is not about finger-pointing at developers for producing buggy code or operations staff for not having paid for enough "nines" in their 99.xxx% availability infrastructure.
The KPIs we're looking for here are preventable defects in production, extended fault diagnosis times due to insufficient diagnostic/log file telemetry, outages/bugs due to discrepancies between development, test and production environments. The ideal person to help with this process is your problem manager – assuming you’ve established a problem management process!
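As an illustration of two of those KPIs, the sketch below computes mean fault-diagnosis time and the share of production defects flagged as preventable. The incident records are invented; a real problem-management tool would export something similar.

```python
from statistics import mean

# Invented incident records for illustration only.
incidents = [
    {"id": 101, "diagnosis_hours": 1.5, "preventable": True},
    {"id": 102, "diagnosis_hours": 8.0, "preventable": True},
    {"id": 103, "diagnosis_hours": 0.5, "preventable": False},
    {"id": 104, "diagnosis_hours": 12.0, "preventable": True},
]

# Long diagnosis times often point at missing telemetry/log detail;
# a high preventable rate points at design-time decisions.
mean_diagnosis = mean(i["diagnosis_hours"] for i in incidents)
preventable_rate = sum(i["preventable"] for i in incidents) / len(incidents)

print(f"mean diagnosis time: {mean_diagnosis:.1f}h")
print(f"preventable defect rate: {preventable_rate:.0%}")
```

Tracked over time, a falling preventable-defect rate is a reasonable proxy for services becoming genuinely more resilient rather than merely better firefought.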
By far the easiest thing to measure is the waiting line of projects that cannot commence due to insufficient resources being available. Again your Program Management Office should be able to furnish you with this, although it might be easier to get at your next senior leadership team meeting (just remember to wear a flak jacket if you don’t already know the answer!).
Sadly, there are a number of anti-patterns which might disguise how bad the situation really is. I am mainly thinking of the use of shadow IT resources, particularly cloud services, to get around IT bottlenecks. These users are harder to find because they've often gone to great lengths to hide their footprints, especially if shadow IT is not endorsed by the CxO suite.
Thankfully, it's easier to identify where teams have sought to reduce waiting lines by increasing the number of projects they run in parallel. Again, your Agile team leaders and PMO should be able to furnish these numbers. But remember that it's a zero-sum game: your teams have simply traded the frustration of waiting for a project to start for the frustration of waiting for it to finish. Unrealized WIP increases, as does the pressure to ship code before it's ready, and before you know it you're in a worse position than when you started.
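The zero-sum effect above is just Little's Law from queueing theory: average lead time equals WIP divided by throughput, so running more projects in parallel without raising throughput only lengthens each project's lead time. A back-of-the-envelope sketch, with invented numbers:

```python
def lead_time_months(wip_projects, completions_per_month):
    """Little's Law: average time in system = WIP / throughput."""
    return wip_projects / completions_per_month

# A team that completes 2 projects a month, regardless of how many
# it has in flight (throughput is set by capacity, not by demand):
print(lead_time_months(4, 2))   # 4 in flight  -> 2 months each
print(lead_time_months(12, 2))  # 12 in flight -> 6 months each
```

Tripling the number of projects in flight triples the average wait for each of them, which is exactly the substitution of frustrations described above.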
“Wait a minute!” I hear you say, “Didn’t we implement Agile specifically to address all of these concerns?" Before you reach for your scrum master's throat, some clarification is in order.
Agile can and should improve the timeliness, quality and cost of new applications, maintenance and services. Where it has failed to achieve these goals is where it results in a localized optimization, that is, one that ignores the impact upstream and downstream of the development process. This leads to the symptoms described above, effectively undermining the Agile transformation and giving it a bad name; you wouldn't believe the number of Fortune 1000 IT shops I have visited where the mere mention of Agile near an operations manager results in a 30-minute tirade.
It's because of these near-sighted optimizations that "bad Agile" can create an impedance mismatch between development and the other parts of the IT supply chain, in particular security and operations functions. Agile teams function in terms of a "release per day" philosophy (i.e. the whole code package should be able to be compiled and "run" at the end of each working day) – something that you can only confirm by subjecting it to QA on a test environment.
Operations teams don’t have the time to build and deploy unfinished assets on a daily basis, so the work is left to the Dev and QA teams to jury-rig their own build and deployment processes. This typically results in a massive difference between the system under development and test and the ultimate production system.
Assets and artifacts from the Agile build process are not placed under effective change control and are rarely utilized downstream of development. The teams that will spend the most time with the application (75-95 percent of an application's lifecycle is spent in production rather than development) are not involved in the requirements management process until far too late, if at all.
DevOps, along with the principles of continuous integration, delivery and deployment, aims to reduce this impedance mismatch. It does so by aligning people and processes with automated tools along the value chain, helping to deliver applications and services faster at each stage of the application lifecycle.
There are too many patterns for success to cover here, so we'll save them for a later article. Suffice it to say that you should be aiming to reduce the quantum (the size) of changes and increase the frequency of change (a phrase borrowed from manufacturing is to "reduce the batch size"). This is a critical insight that runs against the common-sense experience that fewer changes mean fewer outages.
We've all had the experience of arriving at work on Monday morning only to find our bleary-eyed operations staff sleeping under their desks, having just saved the organization by backing out a major change that went horribly wrong. As a result, we've learned to avoid change. Change avoidance is a critical anti-pattern of a DevOps transformation.
That's not to say small changes can't hurt you; they are often the source of some of the biggest outages. An example is Google's outage several years back, when a single errant punctuation mark brought the service down. You need to change how you think about quality to catch defects earlier.
It's one of the reasons why Quality Assurance/testing must become a core discipline for DevOps and Agile shops. To make it stick and to keep the cost of quality low, it should be automated and comprehensive. Most important, it should reflect the production environment to the greatest degree possible, ideally by automating the dev-to-test-to-stage-to-production promotion process.
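One inexpensive way to start catching environment discrepancies automatically is a parity check that diffs each pipeline environment against production. The environment names and configuration keys below are invented; in practice these would come from your configuration-management or deployment tooling.

```python
# Invented environment configurations for illustration.
ENVIRONMENTS = {
    "test":    {"db": "postgres-12", "jvm": "11", "tls": "1.2"},
    "staging": {"db": "postgres-12", "jvm": "11", "tls": "1.3"},
    "prod":    {"db": "postgres-12", "jvm": "11", "tls": "1.3"},
}

def parity_violations(envs, reference="prod"):
    """Return settings where an environment differs from the reference."""
    ref = envs[reference]
    violations = []
    for name, cfg in envs.items():
        if name == reference:
            continue
        for key, value in cfg.items():
            if ref.get(key) != value:
                violations.append((name, key, value, ref.get(key)))
    return violations

# Run as a pipeline gate: any non-empty result should fail the build.
print(parity_violations(ENVIRONMENTS))
```

Wired into the build pipeline as a gate, a check like this turns "it worked in test" surprises into a failed build long before a release reaches production.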
Speaking of anti-patterns, much like a fool with a tool is still a fool, a Dev who thinks like Ops is still a flop if they're unable to execute due to organizational misalignment outside of IT. A CIO thinking about aligning incentives within their own organization should remember that people outside the IT group can be just as big a bottleneck to innovation.
As the most senior leader in the IT team, the CIO's role in a DevOps transformation is to anticipate those bottlenecks (such as overzealous compliance teams) and ensure they're on board and part of the process, not part of the problem.
DevOps is not an all-or-nothing proposition. Any transformation should start with identifying where its unique value can be applied.
Paul Muller is Vice President and Chief Evangelist at HP Software. His 20 years in the IT industry have been spent designing and deploying IT management systems, as well as in sales and marketing leadership roles both regionally and globally.