Exploring the Convergence of Observability and Security - Part 2: Logs, Metrics and Traces
June 06, 2023

Pete Goldin
APMdigest


With input from industry experts — both analysts and vendors — this 8-part blog series will explore what is driving the convergence of observability and security, the challenges and advantages, and how it may transform the IT landscape.

Start with: Exploring the Convergence of Observability and Security - Part 1

One reason why observability and security make a good pairing is that traditional telemetry signals — metrics, logs, and traces — help teams maintain both performance and security.

"The convergence of security and observability is happening throughout the observability landscape, and telemetry pipelines are enabling organizations to make that happen," explains Buddy Brewer, Chief Product Officer at Mezmo. "Security engineers, developers, and SREs use telemetry pipelines to access telemetry data effectively and efficiently. Many are also adopting standards like OpenTelemetry to ease their data ingestion woes and allow teams across the organization to use standardized data and break down silos."

Brewer cites a recent ESG report showing that metrics, logs, and traces account for 86% of application data by volume. He maintains that this data is essential for SecOps teams to understand what parts of an application are working properly, identify errors, and determine how to address those errors. The same report shows that 69% of SecOps teams regularly or continuously access data from these three sources.

"Traditional application performance signals help SecOps by serving as a proof point that you are watching for outlier issues, for example, you are able to see and flag when something doesn't look right in your system," says Jam Leomi, Lead Security Engineer at Honeycomb. "This outlier data is surfaced in real-time using observability tools and can serve as an early indicator that something malicious is going on."

"There are emerging use cases for issues such as Kubernetes security or CSPM, where there does seem to be a big advantage to adding security capabilities to the traditional three pillars of logs, metrics and traces for observability," says Asaf Yigal, CTO of Logz.io. "Whether you have ops-type teams that can act on that data themselves or use it as a better informed stream of data to channel to their dedicated security teams, the reality is that cloud apps and infrastructure are so complex and fast moving, security has to be part of the picture for everyone involved."

Leomi of Honeycomb adds that the convergence of tools can help distinguish between performance and security issues, saying, "While a lot of the data surfaced in observability tools can look like an average system bottleneck or performance issue, applying the security lens to it could bring to light potential indicators of a security event."

Colin Fallwell, Field CTO of Sumo Logic, agrees: "Many security incidents impact operations. For example, one can expect serious performance degradation to occur in a DDoS attack. Telemetry like tracing and logging data is naturally going to carry header information from web requests, IP information, and much, much more. Metrics are the canaries in the coal mine and serve as an early warning that something is wrong or trending out of the norm. All this data is valuable to security use cases as well. Deep application visibility, and deviations from the norm on authentication, access, processing, and DB access, are table stakes for operations and highly valuable to SecOps. Consider how valuable this data is to security teams when trying to understand the impact and blast radius of security events."

Performance signals provide technologists with a detailed look into the health of their applications; if there are any bottlenecks, the signals can help locate where they are occurring and why, adds Joe Byrne, VP of Technology Strategy and CTO Adviser at Cisco AppDynamics. "For SecOps teams, detecting potential security threats before an attack is crucial, so having real-time insight into applications' performance would benefit them. SecOps teams can leverage observability tools to determine if any performance delays are due to vulnerabilities or security threats, allowing them to take immediate action to achieve resolution."

Let's look at each type of performance signal individually.

Logs

Log analytics tools have been serving cybersecurity teams for years, says Shamus McGillicuddy, VP of Research, Network Infrastructure and Operations, at Enterprise Management Associates (EMA). "Logs are a record of what happened on a device or piece of software. Real-time analysis will point to ongoing security incidents, and forensic analysis will help security teams reconstruct an incident."


Logs are time-stamped records of events, notes Roger Floren, Principal Product Manager at Red Hat. They provide a detailed record of application behavior and can be used for troubleshooting issues, identifying performance bottlenecks, and detecting security threats.
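
A simple way to see why one log record can serve both audiences is to emit logs as structured, time-stamped JSON events. The sketch below uses only Python's standard library; field names such as src_ip are assumptions for illustration, not a prescribed schema:

```python
import json
import logging
import time

class JsonFormatter(logging.Formatter):
    """Emit each log record as a time-stamped JSON event."""
    def format(self, record):
        return json.dumps({
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "event": record.getMessage(),
            **getattr(record, "fields", {}),  # structured context, if provided
        })

logger = logging.getLogger("auth")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# The same record serves ops (troubleshooting) and SecOps (threat detection).
logger.warning("login_failed", extra={"fields": {"user": "alice", "src_ip": "198.51.100.23"}})
```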

"It's all about the logs to some extent — it always has been and always will be," says Yigal from Logz.io. "Consider that the SIEM — the virtual nervous system of the modern security ecosystem, for decades now — is a centralized repository for security data, and its primary job has always been to consume and provide analysis on top of mountains of log data. And this is telemetry running the full gamut from ITOps logs to security data coming in from other purpose-built security tooling. So, there's that: you have to maintain visibility and analysis into your log data, and it's a foundational element of security practices.

Ajit Sancheti, GM of Falcon LogScale at CrowdStrike, outlines the history: "DevOps, ITOps and SecOps teams need to be able to access different types of data for a variety of use cases, such as investigating threats, debugging network issues, maximizing application performance and much more. In the past, this meant that these individual teams would deploy siloed monitoring, SIEM and log management tools. Additionally, many of the log management tools on the market lacked the scale to centrally collect and store all logs and allow large numbers of users to simultaneously access and query this data."

"Today, organizations are finally able to log security and observability data in one place," Sancheti continues. "This is due to innovations like index-free logging architectures, which enable organizations to ingest a petabyte of data per day (or more)."

Chaim Mazal, Chief Security Officer at Gigamon, says the challenge is that logging tools see things in hindsight; they do not detect threats in real time. It's only when log data and network-derived intelligence are integrated that SecOps teams can detect threats or performance issues in real time, before they harm or slow the business down.

"Once integrated and SecOps teams gain the deep observability required, they can shift toward a proactive security posture and ensure cloud security across their infrastructure whether it's located on-premises, in private clouds, in containers, or in the public cloud," Mazal adds.

Metrics

Performance metrics can also be used to identify security events in some cases.

"Deep performance signals such as identifying a workload's performance through metrics including CPU usage, system calls, memory usage, etc. allows security customers to determine aberrations from normal behavior," says Prashant Prahlad, VP of Cloud Security Products at Datadog.

For example, metrics can help identify a possible denial-of-service attack if an unexpected and dramatic spike in usage is seen, according to Kirsten Newcomer, Director, Cloud and DevSecOps Strategy at Red Hat.
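
As a rough illustration of Newcomer's point, the sketch below flags a request-rate sample that sits far above its recent baseline. The three-sigma threshold and the sample values are assumptions for the example, not tuned production settings:

```python
from statistics import mean, stdev

def is_spike(history, current, threshold=3.0):
    """Return True when `current` sits more than `threshold` standard
    deviations above the recent baseline in `history`."""
    if len(history) < 2:
        return False  # not enough baseline to judge
    mu, sigma = mean(history), stdev(history)
    return sigma > 0 and (current - mu) / sigma > threshold

# Requests per minute over the last ten minutes, then a sudden surge.
baseline = [1210, 1190, 1255, 1230, 1180, 1220, 1240, 1205, 1215, 1225]
print(is_spike(baseline, 9800))  # True: a possible DoS signal worth flagging
```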

Yigal from Logz.io adds, "We see massive value in helping organizations quickly translate their huge volumes of logs into more immediately useful metrics from the traditional IT ops side, saving both time and money. But there's also the notion of introducing more security content, creating and tracking more security-relevant trends, so we do see some organizations moving in this direction."
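
One minimal sketch of that log-to-metric translation, assuming JSON auth logs shaped like the structured-logging example above, rolls failed logins up into a per-source-IP counter that is cheaper to store and easier to alert on than the raw lines; the field names are illustrative assumptions:

```python
import json
from collections import Counter

def failed_logins_by_ip(log_lines):
    """Derive a security-relevant metric (failed logins per source IP)
    from raw JSON log lines."""
    counts = Counter()
    for line in log_lines:
        event = json.loads(line)
        if event.get("event") == "login_failed":
            counts[event.get("src_ip", "unknown")] += 1
    return counts

sample = [
    '{"event": "login_failed", "src_ip": "198.51.100.23"}',
    '{"event": "login_failed", "src_ip": "198.51.100.23"}',
    '{"event": "login_ok", "src_ip": "203.0.113.7"}',
]
print(failed_logins_by_ip(sample))  # Counter({'198.51.100.23': 2})
```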

Traces

Some experts say traces are the observability signal that makes the biggest difference for security. Newcomer from Red Hat says traces provide data about how information flows through a system and can be used to visualize unexpected errors and events.

"Security staff have always been dealing with logs. Metrics are also helpful. Traces are a new kind of information that observability brings into the picture," explains Mike Loukides, VP of Emerging Tech Content at O'Reilly Media. "They let you ask detailed questions about what's happening in the application — the sorts of questions that could help you to spot a compromise early on."

"To take an overly simple example: any system that's online will see failed login attempts all the time. These will be in the logs, and they don't tell you much," he continues. "When a failed login attempt is followed by a successful login from the same IP address, that might tell you something — or it might be that an authorized user mistyped his password. That's about as far as logging will take you. But when that now-authorized user starts interacting with parts of the system that they shouldn't have access to, you know you have a real problem. You can ask questions like: How did they get in? When did they get in? And what did they do while they were in our system? And that's the kind of information that you're going to get from traces."

Prahlad from Datadog concludes, "The applications get instrumented with libraries for tracing, and the exact same traces are used to detect attacks. In many cases, SecOps teams detect these aberrations from the performance data and identify security issues much more quickly, all without additional instrumentation or performance overhead."

Go to: Exploring the Convergence of Observability and Security - Part 3: Tools

Pete Goldin is Editor and Publisher of APMdigest
