Exploring the Convergence of Observability and Security - Part 2: Logs, Metrics and Traces
June 06, 2023

Pete Goldin
APMdigest


With input from industry experts — both analysts and vendors — this 8-part blog series will explore what is driving the convergence of observability and security, the challenges and advantages, and how it may transform the IT landscape.

Start with: Exploring the Convergence of Observability and Security - Part 1

One reason observability and security make a good pairing is that traditional telemetry signals — metrics, logs, and traces — are useful for maintaining both performance and security.

"The convergence of security and observability is happening throughout the observability landscape, and telemetry pipelines are enabling organizations to make that happen," explains Buddy Brewer, Chief Product Officer at Mezmo. "Security engineers, developers, and SREs use telemetry pipelines to access telemetry data effectively and efficiently. Many are also adopting standards like OpenTelemetry to ease their data ingestion woes and allow teams across the organization to use standardized data and break down silos."

Brewer cites a recent ESG report showing that metrics, logs, and traces account for 86% of application data by volume. He maintains that this data is essential for SecOps teams to understand what parts of an application are working properly, identify errors, and determine how to address those errors. The same report shows that 69% of SecOps teams regularly or continuously access data from these three sources.

"Traditional application performance signals help SecOps by serving as a proof point that you are watching for outlier issues, for example, you are able to see and flag when something doesn't look right in your system," says Jam Leomi, Lead Security Engineer at Honeycomb. "This outlier data is surfaced in real-time using observability tools and can serve as an early indicator that something malicious is going on."

"There are emerging use cases for issues such as Kubernetes security or CSPM, where there does seem to be a big advantage to adding security capabilities to the traditional three pillars of logs, metrics and traces for observability," says Asaf Yigal, CTO of Logz.io. "Whether you have ops-type teams that can act on that data themselves or use it as a better informed stream of data to channel to their dedicated security teams, the reality is that cloud apps and infrastructure are so complex and fast moving, security has to be part of the picture for everyone involved."

Leomi of Honeycomb adds that the convergence of tools can help distinguish between performance and security issues, saying, "While a lot of the data surfaced in observability tools can look like an average system bottleneck or performance issue, applying the security lens to it could bring to light potential indicators of a security event."

Colin Fallwell, Field CTO of Sumo Logic, agrees: "Many security incidents impact operations. For example, one can expect serious performance degradation to occur in a DDoS attack. Telemetry like tracing and logging data is naturally going to carry header information from web requests, IP information, and much, much more. Metrics are the canaries in the coal mine and serve as an early warning that something is wrong or trending out of the norm. All this data is valuable to security use cases as well. Deep application visibility and deviations from the norm on authentication, access, processing, and DB access are table stakes for operations and highly valuable to SecOps. Consider how valuable this data is to security teams when trying to understand the impact and blast radius of security events."
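
As a rough illustration of Fallwell's blast-radius point, the sketch below mines request records that already carry client IPs to answer "what did this address touch, and when?" The record format and the flagged address are hypothetical.

```python
# Hypothetical sketch: reusing request telemetry that already carries
# client IPs to scope the blast radius of a security event.
# Record shape and the suspicious IP are illustrative assumptions.
from collections import defaultdict

access_records = [
    {"ts": "2023-06-01T10:00:02Z", "client_ip": "203.0.113.7", "path": "/login", "status": 401},
    {"ts": "2023-06-01T10:00:09Z", "client_ip": "203.0.113.7", "path": "/login", "status": 200},
    {"ts": "2023-06-01T10:01:30Z", "client_ip": "203.0.113.7", "path": "/admin/export", "status": 200},
    {"ts": "2023-06-01T10:02:11Z", "client_ip": "198.51.100.4", "path": "/health", "status": 200},
]

suspicious_ip = "203.0.113.7"

# Which endpoints did the suspicious address touch, and when?
touched = defaultdict(list)
for record in access_records:
    if record["client_ip"] == suspicious_ip:
        touched[record["path"]].append(record["ts"])

for path, timestamps in touched.items():
    print(f"{suspicious_ip} hit {path} at {', '.join(timestamps)}")
```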

Performance signals provide technologists with a detailed look into the health of their applications — if there are bottlenecks, the signals can help locate where they occur and why, adds Joe Byrne, VP of Technology Strategy and CTO Adviser at Cisco AppDynamics. "For SecOps teams, detecting potential security threats before an attack is crucial, so having real-time insight into applications' performance would benefit them. SecOps teams can leverage observability tools to determine if any performance delays are due to vulnerabilities or security threats, allowing them to take immediate action to achieve resolution."

Let's look at each type of performance signal individually.

Logs

Log analytics tools have been serving cybersecurity teams for years, says Shamus McGillicuddy, VP of Research, Network Infrastructure and Operations, at Enterprise Management Associates (EMA). "Logs are a record of what happened on a device or piece of software. Real-time analysis will point to ongoing security incidents, and forensic analysis will help security teams reconstruct an incident."


Logs are time-stamped records of events, notes Roger Floren, Principal Product Manager at Red Hat. They provide a detailed record of application behavior and can be used to troubleshoot issues, identify performance bottlenecks, and detect security threats.
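
As a simple illustration of the real-time log analysis McGillicuddy describes, the sketch below scans time-stamped auth log lines for repeated failures. The log format, field layout, and alert threshold are assumptions made for the example.

```python
# Hypothetical sketch: scanning time-stamped auth logs for repeated
# failed logins. Log format and threshold are illustrative assumptions.
import re
from collections import Counter

raw_logs = """\
2023-06-01T10:00:01Z sshd[812]: Failed password for admin from 203.0.113.7
2023-06-01T10:00:04Z sshd[812]: Failed password for admin from 203.0.113.7
2023-06-01T10:00:07Z sshd[812]: Failed password for admin from 203.0.113.7
2023-06-01T10:00:12Z sshd[812]: Accepted password for admin from 203.0.113.7
"""

pattern = re.compile(r"Failed password for (\S+) from (\S+)")

failures = Counter()
for line in raw_logs.splitlines():
    match = pattern.search(line)
    if match:
        failures[match.groups()] += 1  # keyed by (user, source_ip)

THRESHOLD = 3
for (user, ip), count in failures.items():
    if count >= THRESHOLD:
        print(f"ALERT: {count} failed logins for {user!r} from {ip}")
```

The same records that raise this real-time alert also support the forensic reconstruction McGillicuddy mentions, since every event carries a timestamp and a source.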

"It's all about the logs to some extent — it always has been and always will be," says Yigal from Logz.io. "Consider that the SIEM — the virtual nervous system of the modern security ecosystem, for decades now — is a centralized repository for security data, and its primary job has always been to consume and provide analysis on top of mountains of log data. And this is telemetry running the full gamut from ITOps logs to security data coming in from other purpose-built security tooling. So, there's that: you have to maintain visibility and analysis into your log data, and it's a foundational element of security practices.

Ajit Sancheti, GM of Falcon LogScale at CrowdStrike, outlines the history: "DevOps, ITOps and SecOps teams need to be able to access different types of data for a variety of use cases, such as investigating threats, debugging network issues, maximizing application performance and much more. In the past, this meant that these individual teams would deploy siloed monitoring, SIEM and log management tools. Additionally, many of the log management tools on the market lacked the scale to centrally collect and store all logs and allow large numbers of users to simultaneously access and query this data."

"Today, organizations are finally able to log security and observability data in one place," Sancheti continues. "This is due to innovations like index-free logging architectures, which enable organizations to ingest a petabyte of data per day (or more)."

Chaim Mazal, Chief Security Officer at Gigamon, says the challenge is that logging tools see things in hindsight; they do not detect threats in real time. Only when log data and network-derived intelligence are integrated can SecOps teams detect threats or performance issues in real time, before they harm or slow the business.

"Once integrated and SecOps teams gain the deep observability required, they can shift toward a proactive security posture and ensure cloud security across their infrastructure whether it's located on-premises, in private clouds, in containers, or in the public cloud," Mazal adds.

Metrics

Performance metrics can also be used to identify security events in some cases.

"Deep performance signals such as identifying a workload's performance through metrics including CPU usage, system calls, memory usage, etc. allows security customers to determine aberrations from normal behavior," says Prashant Prahlad, VP of Cloud Security Products at Datadog.

For example, metrics can help identify a possible denial-of-service attack when an unexpected and dramatic spike in usage is seen, according to Kirsten Newcomer, Director, Cloud and DevSecOps Strategy at Red Hat.
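
A minimal sketch of the spike detection Newcomer and Prahlad describe: flag a metric sample that deviates sharply from a rolling baseline. The window, the four-sigma threshold, and the traffic numbers are illustrative choices, not any vendor's algorithm.

```python
# Hypothetical sketch: flag a request-rate sample that deviates
# sharply from a rolling baseline. Threshold is an arbitrary choice.
from statistics import mean, stdev

def is_spike(history, sample, sigmas=4.0):
    """Return True if `sample` sits more than `sigmas` standard
    deviations above the mean of the recent `history`."""
    if len(history) < 2:
        return False
    mu, sd = mean(history), stdev(history)
    return sd > 0 and (sample - mu) / sd > sigmas

requests_per_min = [480, 510, 495, 502, 489, 505]  # normal traffic
print(is_spike(requests_per_min, 517))   # False: within normal range
print(is_spike(requests_per_min, 9800))  # True: possible DoS signal
```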

Yigal from Logz.io adds, "We see massive value in helping organizations quickly translate their huge volumes of logs into more immediately useful metrics from the traditional IT ops side, saving both time and money. But there's also the notion of introducing more security content, creating and tracking more security-relevant trends, so we do see some organizations moving in this direction."

Traces

Some experts say the key observability signal that makes a difference for security is traces. Newcomer from Red Hat says traces provide data about how information is flowing through a system and can be used to visualize unexpected errors and events.

"Security staff have always been dealing with logs. Metrics are also helpful. Traces are a new kind of information that observability brings into the picture," explains Mike Loukides, VP of Emerging Tech Content at O'Reilly Media. "They let you ask detailed questions about what's happening in the application — the sorts of questions that could help you to spot a compromise early on."

"To take an overly simple example: any system that's online will see failed login attempts all the time. These will be in the logs, and they don't tell you much," he continues. "When a failed login attempt is followed by a successful login from the same IP address, that might tell you something — or it might be that an authorized user mistyped his password. That's about as far as logging will take you. But when that now-authorized user starts interacting with parts of the system that they shouldn't have access to, you know you have a real problem. You can ask questions like: How did they get in? When did they get in? And what did they do while they were in our system? And that's the kind of information that you're going to get from traces."

Prahlad from Datadog concludes, "The applications get instrumented with libraries for tracing, and the exact same traces are used to detect attacks. In many cases, SecOps teams detect these aberrations from the performance data and identify security issues much more quickly — all without additional instrumentation or performance overhead."

Go to: Exploring the Convergence of Observability and Security - Part 3: Tools

Pete Goldin is Editor and Publisher of APMdigest
