Using Machine Learning Analytics to Deliver Service Levels
September 21, 2016

Jerry Melnick
SIOS Technology


While the layers of abstraction created in virtualized environments afford numerous advantages, they can also obscure how the virtual resources are best allocated and how physical resources are performing. This can make maintaining optimal application performance a never-ending exercise in trial-and-error.

This post highlights some of the challenges encountered when using traditional monitoring and analytics tools, and describes how machine learning, as a next-generation analytics platform, provides a better way to meet SLAs by finding and fixing issues before they become performance problems. A future post will describe how machine learning analytics can also be used to allocate resources for optimal performance and cost-saving efficiency.

Most IT departments identify performance problems with tools that monitor a variety of discrete events against preset thresholds. For example, they set a specific threshold for CPU utilization, and whenever that threshold is exceeded, the tool fires off alerts. But the use of thresholds presents several challenges. Thresholds do not account for the interrelated nature of resources in virtualized environments, where a change to one resource can have a significant impact on another. Such interrelationships exist both within and across silos. Without a complete understanding of the environment across silos, users of threshold-based tools frequently discover that their attempts to solve a problem have simply moved it to a different silo.

Thresholds often generate "alert storms" of meaningless data and miss important correlations that might indicate a severe problem. They are ineffective at detecting the symptoms of subtle issues, such as "noisy neighbors" or datastore latency, that may signal a significant problem is imminent. These subtle issues may never exceed a threshold related to the root cause, or may exceed one only in short, random intervals, producing alerts that are frequently lost amid the "noise" of alert storms.
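The following minimal Python sketch illustrates both failure modes of a fixed threshold. The function name, the sample values, and the 85 percent limit are all hypothetical and chosen only for illustration; they do not come from any particular monitoring tool.

```python
def threshold_alerts(samples, threshold):
    """Return the indices of samples that exceed a fixed threshold."""
    return [i for i, value in enumerate(samples) if value > threshold]


# Hypothetical CPU utilization (%) sampled once a minute.
# Utilization more than doubles, yet never crosses the limit.
steady_drift = [40, 45, 50, 55, 60, 65, 70, 75, 80, 84]
# Healthy workload apart from two one-minute blips.
brief_spikes = [30, 31, 92, 30, 29, 31, 30, 91, 30, 31]

CPU_THRESHOLD = 85  # assumed static limit

print(threshold_alerts(steady_drift, CPU_THRESHOLD))  # []
print(threshold_alerts(brief_spikes, CPU_THRESHOLD))  # [2, 7]
```

The steadily worsening series raises no alert at all, while the otherwise healthy series raises two. Multiplied across thousands of objects, the second pattern is one way an alert storm can form.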

Even so-called dynamic thresholds cannot keep pace with the constant change in dynamic environments and, as a result, require significant ongoing IT intervention. Finally, while thresholds may alert IT to an issue, they rarely provide enough actionable information to resolve it. The exponential growth in the size and complexity of virtual environments has outstripped the ability of IT staff to set, manage, and continuously adjust threshold-based tools effectively. The time for an automated solution has come.

Advanced machine learning-based analytics software overcomes these and other challenges by continuously learning the many complex behaviors and interactions among interrelated objects (CPU, storage, network, applications) across the infrastructure. Unlike threshold-based tools, machine learning-based IT analytics solutions use this growing knowledge to identify the root cause(s) of performance problems with high accuracy and to make specific recommendations for resolving them cost-effectively.

This ability to aggregate, normalize, and then correlate and analyze hundreds of thousands of data points from different monitoring and management systems enables machine learning analytics solutions to transform massive volumes of data into meaningful insights across applications, servers and hosts, and storage and network infrastructures.
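As a rough illustration of the "correlate" step, the sketch below computes a Pearson correlation between metrics drawn from different silos. It is a deliberately simple stand-in, not the method any analytics product is known to use, and every metric name and value in it is invented for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical per-minute samples from three different silos.
neighbor_vm_iops = [100, 120, 110, 900, 950, 130, 115, 980, 105, 125]
datastore_latency_ms = [5, 5, 6, 22, 24, 6, 5, 25, 5, 6]
host_cpu_percent = [50, 52, 49, 51, 50, 48, 52, 51, 49, 50]

# Storage latency moves with the neighboring VM's I/O, not with host CPU.
print(round(pearson(neighbor_vm_iops, datastore_latency_ms), 2))
print(round(pearson(host_cpu_percent, datastore_latency_ms), 2))
```

In this made-up data the latency tracks the neighboring VM's I/O closely and host CPU only weakly, which is the kind of cross-silo relationship a per-metric threshold cannot see. Correlation alone does not establish a root cause; it only narrows where to look.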

As it gathers and analyzes this wealth of data, the machine learning analytics (MLA) system learns what constitutes normal behavior, and it is this baseline that gives the system the ability to detect anomalies and find root causes automatically.
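The idea of learning a baseline and flagging departures from it can be sketched with a simple z-score test. This is a minimal statistical illustration under assumed data, far simpler than a real machine learning analytics system; the function names, the latency values, and the limit of three standard deviations are all assumptions.

```python
from statistics import mean, stdev


def learn_baseline(history):
    """Summarize normal behavior as (mean, sample standard deviation)."""
    return mean(history), stdev(history)


def is_anomalous(value, baseline, z_limit=3.0):
    """Flag a value lying more than z_limit standard deviations from the mean."""
    center, spread = baseline
    if spread == 0:
        return value != center
    return abs(value - center) / spread > z_limit


# Hypothetical datastore latency (ms) observed during normal operation.
history = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3, 5.1, 4.9]
baseline = learn_baseline(history)

print(is_anomalous(5.2, baseline))  # False: within normal variation
print(is_anomalous(9.0, baseline))  # True: far outside the learned range
```

A 9 ms reading would pass unnoticed under a static limit of, say, 20 ms, but it stands out against what this datastore normally does. Because the baseline is derived from observed data, it can be recomputed as the environment changes rather than adjusted by hand.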

In addition to identifying root causes, advanced machine learning-based analytics solutions are able to simulate and predict the impact of changing resources and their allocations, which can be particularly useful for optimizing resource utilization and planning for expansion. This capability can also be useful for assessing whether there is adequate capacity to handle a partial or complete failover. These topics are worthy of a deeper dive in a future post.
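A failover capacity check of the kind described above can be approximated with a simple "what if" calculation. The sketch below is an aggregate check under stated assumptions: host names, capacity units, and the 80 percent headroom figure are hypothetical, and it ignores per-VM placement, memory, storage, and network constraints that a real simulation would need to consider.

```python
def can_absorb_failover(hosts, failed_host, headroom=0.8):
    """Check whether surviving hosts can take on the failed host's load.

    hosts maps a host name to {"capacity": ..., "used": ...} in the same
    arbitrary units. Each survivor may be filled to headroom * capacity.
    """
    displaced = hosts[failed_host]["used"]
    spare = sum(
        max(0.0, h["capacity"] * headroom - h["used"])
        for name, h in hosts.items()
        if name != failed_host
    )
    return displaced <= spare


# Hypothetical three-host cluster at two different load levels.
lightly_loaded = {
    "host-a": {"capacity": 100, "used": 30},
    "host-b": {"capacity": 100, "used": 50},
    "host-c": {"capacity": 100, "used": 40},
}
heavily_loaded = {
    "host-a": {"capacity": 100, "used": 60},
    "host-b": {"capacity": 100, "used": 50},
    "host-c": {"capacity": 100, "used": 70},
}

print(can_absorb_failover(lightly_loaded, "host-a"))  # True
print(can_absorb_failover(heavily_loaded, "host-a"))  # False
```

In the heavily loaded cluster the survivors have 40 units of spare capacity against 60 units of displaced load, so a failure of host-a could not be absorbed without breaching the assumed headroom.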

Jerry Melnick is President and CEO of SIOS Technology.
