IT engineers and executives are responsible for system reliability and availability. The volume of data can make it hard to be proactive and fix issues quickly. With over a decade of experience in the field, I know the importance of IT operations analytics and how it can help identify incidents and enable agile responses.
Start with: Everything You Need to Know About IT Operations Analytics - Part 1
How ITOA Applies Big Data Principles
ITOA works on Big Data principles. The purpose is to use your company's data for better business outcomes. The key Big Data steps are gathering, storing, and organizing data. These efforts enable you to perform analytics and visualizations.
ITOA unifies data from:
■ Data logs from the network, hardware, applications, and other system information
■ Monitoring solutions
■ Software agents that observe and report on the IT environment and resource usage
■ Virtual machine monitoring (VMM) software, also known as a hypervisor
The information flow from these tools is characterized by the three Vs of Big Data: velocity, volume, and variety. The various surveillance, monitoring, and reporting solutions produce data in large quantities, at high speed, in multiple formats, and from a variety of sources. The best practice is to use an analytics tool that brings together all your data sources and provides a unified view of your entire IT ecosystem.
The Big Data technologies that enable users to perform analytics on this data include open-source software frameworks such as Hadoop, which is often used to build data lakes, and NoSQL databases, which store unstructured data.
IT Operations Analytics mines these large data volumes to find patterns and relationships in the data. These findings are the basis for algorithmic models that spot anomalies.
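To make the idea of an anomaly-spotting model concrete, here is a minimal sketch that builds a baseline from historical readings and flags new readings that deviate strongly from it. The z-score technique, the function names, the threshold of 3.0, and the sample response times are all assumptions chosen for illustration; real ITOA products may use very different models.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Summarize historical readings as (mean, sample standard deviation)."""
    return mean(history), stdev(history)


def is_anomaly(value, baseline, threshold=3.0):
    """Flag a reading whose z-score against the baseline exceeds the threshold."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat history: any different value is unusual.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical response times in milliseconds (illustrative data only).
history_ms = [120, 118, 125, 122, 119, 121, 123, 120, 117]
baseline = build_baseline(history_ms)

new_readings = [121, 950, 119]
flagged = [v for v in new_readings if is_anomaly(v, baseline)]
print(flagged)  # [950]
```

Separating the baseline from the scoring step mirrors the text: the patterns are learned from historical data first, and new data is then compared against them.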
Working with data in this way represents a shift from the traditional approach in which IT Ops teams looked at data within each of their monitoring tools. Examining each piece in isolation leads to a fragmented view. One common pain point for teams was the need to toggle between screens to see each tool's output.
Big Data makes it possible to bring data from all the monitoring and reporting tools together, both for more effective analysis and for a simplified single-pane view for the user. IT teams gain a holistic picture of system performance. Doing this makes sense because the system's components interact, and issues in one area affect another.
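As a rough sketch of what bringing tool output together can look like, the example below normalizes records from two sources into one common event shape and merges them into a single timeline. The `Event` fields, the log line format, and the monitoring payload keys (`epoch`, `node`, `alert`) are hypothetical and stand in for whatever your tools actually emit.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Common shape shared by every source (an assumed schema)."""
    timestamp: datetime
    source: str
    host: str
    message: str


def from_syslog(line):
    # Assumed line format: "<ISO 8601 timestamp> <host> <message>"
    ts, host, message = line.split(" ", 2)
    return Event(datetime.fromisoformat(ts), "syslog", host, message)


def from_monitor(record):
    # Assumed payload: epoch seconds plus node and alert text.
    ts = datetime.fromtimestamp(record["epoch"], tz=timezone.utc)
    return Event(ts, "monitor", record["node"], record["alert"])


def unify(syslog_lines, monitor_records):
    """Merge both sources into one list ordered by time."""
    events = [from_syslog(line) for line in syslog_lines]
    events += [from_monitor(rec) for rec in monitor_records]
    return sorted(events, key=lambda e: e.timestamp)


timeline = unify(
    [
        "2024-01-01T10:00:05+00:00 web-01 checkout request timed out",
        "2024-01-01T10:00:00+00:00 db-01 connection pool exhausted",
    ],
    [{"epoch": 1704103202, "node": "web-01", "alert": "latency above limit"}],
)
for event in timeline:
    print(event.timestamp.isoformat(), event.source, event.host, event.message)
```

With everything in one ordered timeline, the database warning, the latency alert, and the checkout timeout appear next to each other instead of on three separate screens.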
Some people describe this integration as data-driven IT as opposed to tool-driven IT because the data set as a whole directs IT actions, not the output of individual tools.
This evolution dovetails with trends toward integrated monitoring architecture, cross-functional teams, and continuous monitoring and improvement. In addition, continuous integration, continuous deployment, and continuous delivery of code updates increase the value of ITOA.
IT Operations Analytics Architecture
To maximize ITOA performance, the architecture needs scalability, interoperability, integration, security, and flexibility. ITOA systems built on open-source tools facilitate this architecture.
An ITOA architecture should offer the following features:
■ Scalability: Can expand as systems and data volume grow without bottlenecks, usage restrictions, or cost barriers.
■ Interoperability: Works with all operating systems and programming languages; is open and nonproprietary.
■ Integration: Can integrate data in many ways, including through APIs, middleware, and virtualization; also provides uniform access and common storage methods.
■ Security: Does not put the organization's systems or data at risk.
■ Flexibility: Integrates data of all types, from all tools, in one store.
Many companies have built IT monitoring systems piecemeal, acquiring different tools for different needs such as network monitoring or applications support. This tends to result in an abundance, or even an excess, of tools. Each tool produces helpful but siloed data. Robust ITOA demands integrating data from all sources using Big Data principles.
ITOA architecture provides complete visibility into the IT environment by working with data from all sources. These include:
■ Agent Data: Data from monitoring and surveillance agents, which can include agents that detect software coding errors.
■ Human Data: Data resulting from human activity, including text, images, video, social media posts, and more; most ITOA systems can store this information, but IT Operations Analytics for this data type is still immature.
■ Machine Data: Data reported by the system itself, such as audit logs and event tracing.
■ Synthetic Data: Data created to test systems and services; this data emulates real data, including data that simulates customer transactions in different locations.
■ Wire Data: Data captured from network communications between systems, spanning the OSI model from Layer 2 (data link) to Layer 7 (application).
The operations analytics system must be able to handle the following:
■ Complex Queries: These use multiple parameters and may require joins across multiple data tables and nested subqueries.
■ High Query Volume: The system is able to serve concurrent queries.
■ Live Sync: The database automatically and continuously updates with new data from all sources.
■ Low Data Latency: Updates to data are visible within a few seconds.
■ Low Query Latency: Results are returned in near real time.
■ Mixed Data: Data of different types are stored together, minimizing cleaning and reducing latency.
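To illustrate the kind of complex query described in the list above, here is a small sketch that combines a join across two tables with a nested subquery. It uses Python's built-in `sqlite3` module with an in-memory database purely for demonstration; the table names, columns, and sample rows are hypothetical, and a production ITOA store would likely be a different engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hosts (host_id INTEGER PRIMARY KEY, name TEXT, tier TEXT);
    CREATE TABLE events (host_id INTEGER, severity TEXT, latency_ms REAL);
    INSERT INTO hosts VALUES (1, 'web-01', 'frontend'), (2, 'db-01', 'backend');
    INSERT INTO events VALUES
        (1, 'error', 900), (1, 'info', 110),
        (2, 'error', 300), (2, 'info', 90);
    """
)

# Count errors per host whose latency is above the overall average.
# The join links events to hosts; the subquery computes the average.
query = """
    SELECT h.name, COUNT(*) AS slow_errors
    FROM events AS e
    JOIN hosts AS h ON h.host_id = e.host_id
    WHERE e.severity = ?
      AND e.latency_ms > (SELECT AVG(latency_ms) FROM events)
    GROUP BY h.name
    ORDER BY h.name
"""
rows = conn.execute(query, ("error",)).fetchall()
print(rows)  # [('web-01', 1)]
conn.close()
```

The average latency across the four sample events is 350 ms, so only the 900 ms error on web-01 qualifies.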
Four Types of IT Operations Analytics and When to Use Them
ITOA includes the four common types of analytics: descriptive, diagnostic, predictive, and prescriptive. These progress in complexity and difficulty. Descriptive analytics looks at data to describe what has happened. Diagnostic analytics explains why it happened, and predictive analytics estimates what is likely to happen. Prescriptive analytics answers the question, "What should we do next?"
As organizations increase their experience with ITOA, they become increasingly capable and ready for a more difficult level of analytics. In an analytics maturity model, prescriptive analytics requires the most maturity.
■ Descriptive IT Operations Analytics: This type of analytics provides information about what has happened in the IT environment. For example, the ITOA system detects that customers are having trouble checking out from the company's e-commerce site, so the IT team can swing into action and fix the problem before more sales are lost. Another example is using historical data to calculate the IT Ops team's mean time to resolve (MTTR), the average amount of time it takes to fix an issue.
■ Diagnostic IT Operations Analytics: This helps pinpoint the source and cause of the IT problem. For example, ITOA through root cause analysis can highlight an issue with the integration to the e-commerce site's payment processor.
■ Predictive IT Operations Analytics: This tells you what is likely to happen. For example, based on historical data about past system crashes, ITOA can identify the system state, usage patterns, and other factors that are likely to cause a system outage in the future.
■ Prescriptive IT Operations Analytics: This supports better decision-making by telling you which actions will produce the best outcomes. It uses simulation and optimization algorithms. This area of ITOA is the least mature. Decision support from prescriptive analytics improves as ITOA becomes more proficient at working with data ambiguity. For example, analytics can tell you that the company is better off building a new data center now based on usage patterns, network traffic, geographic distribution of sales, growth trends, and the relative costs and maintenance needs of adding capacity to existing data centers versus building a new one.
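The MTTR calculation mentioned under descriptive analytics is simple enough to sketch. The example below assumes each incident is recorded as a pair of opened and resolved timestamps; the function name and the sample incidents are made up for illustration.

```python
from datetime import datetime


def mean_time_to_resolve(incidents):
    """Return the average resolution time in minutes.

    `incidents` is a list of (opened, resolved) datetime pairs.
    """
    if not incidents:
        raise ValueError("at least one incident is required")
    seconds = [
        (resolved - opened).total_seconds() for opened, resolved in incidents
    ]
    return sum(seconds) / len(seconds) / 60


# Hypothetical incident history: 30, 90, and 60 minutes to resolve.
incidents = [
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 9, 30)),
    (datetime(2024, 1, 9, 13, 0), datetime(2024, 1, 9, 14, 30)),
    (datetime(2024, 1, 17, 22, 15), datetime(2024, 1, 17, 23, 15)),
]
mttr_minutes = mean_time_to_resolve(incidents)
print(mttr_minutes)  # 60.0
```

Tracking this number over time is itself a descriptive metric: it reports what has happened without explaining why or predicting what comes next.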
Go to: Everything You Need to Know About IT Operations Analytics - Part 3