Automation has been around since the first server admin wrote a script. Since then, IT life has grown steadily more complex: multiple data centers, high availability, disaster recovery, and now the cloud combine into a constantly changing hybrid IT estate.
Over the last few decades, IT budgets have shrunk, in part because of recession. As a result, IT departments are being asked to do more with less, and the increase in work has amplified the need for automation.
IT Process Automation
IT process automation ranges from simple individual scripts to branching scripts to full-blown orchestration systems. In addition to commercial offerings, the open-source community has developed over a dozen projects.
Until recently, all automation required a human to kick off the process. Operations teams would manually go through the data they received to find the problem, which could take days and require war room assemblies. Once the problem was found, a fix could be implemented manually, with a script, or with an automation tool. As compliance became more important, so did using tools that could log the actions taken to implement a fix, but humans were still driving the automation.
Runbooks started as paper instructions on what to do when a well-known problem occurred. More recently, runbooks have become automated scripts or orchestration systems. This documentation and automation help move problem resolution to less experienced operators, a practice sometimes called shift-left.
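To make the idea of an automated runbook with a compliance trail concrete, here is a minimal sketch. It assumes a runbook is simply an ordered list of steps and that every action is written to an audit log; the `Runbook` class, the step functions, and the operator name are all hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class Runbook:
    """A named sequence of remediation steps with an audit trail."""
    name: str
    steps: List[Callable[[], str]]
    audit_log: List[str] = field(default_factory=list)

    def run(self, operator: str) -> bool:
        """Run each step in order, logging who did what; stop at the first failure."""
        for step in self.steps:
            stamp = datetime.now(timezone.utc).isoformat()
            try:
                result = step()
            except Exception as exc:
                self.audit_log.append(f"{stamp} {operator} {step.__name__}: FAILED ({exc})")
                return False
            self.audit_log.append(f"{stamp} {operator} {step.__name__}: {result}")
        return True


# Hypothetical steps; real ones would call your own tooling.
def check_disk_usage() -> str:
    return "disk at 93%"


def rotate_logs() -> str:
    return "rotated 4 log files"


disk_full = Runbook("disk-full", [check_disk_usage, rotate_logs])
succeeded = disk_full.run(operator="operator-1")
```

Because the steps and their order live in the runbook rather than in someone's head, a less experienced operator can run it, and the audit log records what was done and by whom.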
AIOps
AI has recently become a reality for IT operations: it can find problems and then activate the appropriate automation to fix them. The original AIOps definition focused on applying machine learning to the vast amount of data that IT operations receives from all its monitoring tools. According to Enterprise Management Associates (EMA), enterprises have more than 10 monitoring tools managing a hundred thousand metrics per day, not including log files.
This amount of data is too much for any human, or even a group of humans, to process. Hence the application of machine learning, which can process all this information in minutes or hours and point to the most likely root cause, or at least narrow the possibilities to a small number. Succinctly, AIOps turns IT operations data into operational insights to pinpoint the root cause of a problem.
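As a rough illustration of narrowing many metrics down to a few suspects, the sketch below ranks metrics by how far their latest value deviates from recent history. It uses a plain z-score as a simple statistical stand-in for the machine learning a real AIOps platform would apply; the function name, metric names, sample values, and the threshold of 3.0 are all assumptions made for the example.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def rank_anomalies(
    history: Dict[str, List[float]],
    latest: Dict[str, float],
    threshold: float = 3.0,
) -> List[Tuple[str, float]]:
    """Return (metric, z-score) pairs for metrics whose latest value
    deviates from their history, most anomalous first."""
    suspects = []
    for metric, values in history.items():
        if metric not in latest or len(values) < 2:
            continue  # not enough data to judge this metric
        spread = stdev(values)
        if spread == 0:
            continue  # flat history: z-score is undefined, so this sketch skips it
        z = abs(latest[metric] - mean(values)) / spread
        if z >= threshold:
            suspects.append((metric, z))
    return sorted(suspects, key=lambda pair: pair[1], reverse=True)


# Made-up sample data for three metrics.
history = {
    "db.latency_ms": [20, 22, 19, 21, 20],
    "web.cpu_pct": [40, 42, 41, 39, 40],
    "cache.hit_rate": [0.95, 0.96, 0.94, 0.95, 0.95],
}
latest = {"db.latency_ms": 95, "web.cpu_pct": 41, "cache.hit_rate": 0.95}

suspects = rank_anomalies(history, latest)
suspect_names = [name for name, _ in suspects]  # only db.latency_ms stands out
```

The point is the shape of the output: a short, ranked list of likely causes instead of every metric from every tool.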
Automated AIOps
Most recently, the definition of AIOps evolved to include automation. The idea is that once the machine learning determines the problem as described above, it kicks off the automation tools to fix the problem.
A recent survey found that about half the responding organizations allowed fully automated problem resolution, with no human involved. The other half wanted human review before acting, but even this is preferable to having a human take the time to decide which automation flow is required. This is a significant change from five years ago, when most organizations were very nervous about automated remediation.
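The two modes described here, fully automated resolution and human review before acting, can be sketched as a small dispatcher. This is an illustrative outline only: the problem labels, the `RUNBOOKS` table, and the `approve` callback are hypothetical, and a real system would call an orchestration tool rather than return strings.

```python
from typing import Callable, Dict, Optional, Set


# Hypothetical runbooks, keyed by the problem label the analysis layer emits.
def restart_service() -> str:
    return "service restarted"


def clear_disk_space() -> str:
    return "old logs removed"


RUNBOOKS: Dict[str, Callable[[], str]] = {
    "service_down": restart_service,
    "disk_full": clear_disk_space,
}


def remediate(
    problem: str,
    auto_approved: Set[str],
    approve: Optional[Callable[[str], bool]] = None,
) -> str:
    """Pick the runbook for a detected problem and run it, with or without review."""
    runbook = RUNBOOKS.get(problem)
    if runbook is None:
        return f"{problem}: no runbook, escalate to a human"
    if problem not in auto_approved:
        # Human review: the flow is already chosen, the reviewer only says yes or no.
        if approve is None or not approve(problem):
            return f"{problem}: held for human review"
    return f"{problem}: {runbook()}"


fully_automated = remediate("disk_full", auto_approved={"disk_full"})
held = remediate("service_down", auto_approved={"disk_full"})
approved = remediate("service_down", auto_approved={"disk_full"}, approve=lambda p: True)
```

Even in the review path, the system has already selected the automation flow, so the human only confirms it rather than working out which flow applies.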
Getting to Automated AIOps
How do you get from your current IT operations reality to automated AIOps? As the title implies, there are two parts: automation and AI. The table below shows the maturity curve from the automation perspective.
At the end of the day, the goal of the AI part is to take in data and automatically determine the problem. Some problems do not require machine learning to find, but the system must still be able to take in data and isolate the problem. Once you have AI identifying a problem, you can connect it to the automated runbook that will remediate it. On the Ops side, start by picking a single use case, for example "optimize event management," and then work with your teams to identify the problems they see repeatedly. Voila: you now have automated AIOps.
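One simple way to find the problems a team sees repeatedly is to count how often each one appears in past incident records. The sketch below assumes incidents have already been labeled by problem type; the labels, the sample history, and the cutoff of three occurrences are invented for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def automation_candidates(
    incidents: Iterable[str], min_count: int = 3
) -> List[Tuple[str, int]]:
    """Return (problem, occurrences) for problems seen at least min_count times,
    most frequent first."""
    counts = Counter(incidents)
    return [(problem, n) for problem, n in counts.most_common() if n >= min_count]


# Made-up incident history, one label per past incident.
past_incidents = [
    "disk_full", "service_down", "disk_full", "cert_expired",
    "disk_full", "service_down", "service_down", "disk_full",
]

candidates = automation_candidates(past_incidents)
```

The problems at the top of the list are the natural first runbooks to automate and connect to the detection side.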