The Importance of Network Observability for Tech Companies
November 15, 2022

Nadeem Zahid
cPacket Networks

Tech companies tend to be the earliest adopters of IT and digital transformation trends, for obvious reasons. These companies have already embraced a cloud-first mentality and are well into migrating business-critical workloads to the cloud. However, that "tip of the spear" position on cloud adoption puts these companies at considerable risk of losing visibility into application workloads, leaving them struggling to detect performance issues and potential threats.

The challenge is that cloud monitoring and visibility are hard to achieve, especially in public clouds, which tend to be a black box when it comes to observability. This balancing act between enthusiastic cloud adoption and consistent, complete visibility is crucial for big tech to get right, for two reasons.

First, the heavy reliance on SaaS-based apps (both as a product offering and for internal usage) and cloud data means that IT teams must maintain network performance and rapidly troubleshoot in hybrid cloud environments. A few seconds (or even milliseconds) of performance latency can lead to frustrated employees and customers.

Second, tech companies are prime targets for attackers. The financial and reputational damage of a security breach, especially for high-value targets such as large fintech companies, can easily ruin a company's image and operation. Security teams need both a real-time, reliable feed of packet data for their NDR and firewall tools, and a store of packet data going back weeks for forensic investigations.

Building the visibility infrastructure to make these cloud networks observable is a complex technical challenge. But with careful planning and a few strategic decisions, it's possible to appropriately design, set up and manage visibility solutions for the cloud.

Observability Challenges for Security and NPM

One of the key mandates for IT teams is ensuring consistent performance, making network performance monitoring (NPM) a high priority. If there's a problem, IT needs the ability to quickly trace it to a specific application, and then to specific nodes or parts of the public/private cloud infrastructure, to solve the problem.

If the cloud provider is at fault, IT will need detailed packet data to prove an SLA is being violated. Without that data, troubleshooting can quickly devolve into useless finger pointing. (And turning on a cloud provider's built-in traffic mirroring only after a problem appears, then investigating, can take weeks.) To be useful, visibility must be in place before the issue arises.
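As an illustration of how packet data can substantiate a latency complaint, the sketch below estimates TCP connection-setup time per server from a capture file. It is a minimal example only, assuming a hypothetical capture file ("edge-capture.pcap") exported from a tap or broker and using the open-source Scapy library; a production NPM tool would compute far richer metrics.

```python
# Minimal sketch: estimate TCP handshake latency per server from a capture file,
# as one way to document latency evidence before escalating to a provider.
# The capture file name is a placeholder.
from collections import defaultdict
from scapy.all import rdpcap, IP, TCP

packets = rdpcap("edge-capture.pcap")   # packet data exported from a tap/broker
syn_times = {}                          # (client, server, sport, dport) -> SYN timestamp
rtts = defaultdict(list)                # server IP -> list of handshake RTTs

for pkt in packets:
    if not (IP in pkt and TCP in pkt):
        continue
    flags = pkt[TCP].flags
    if flags == "S":                    # client SYN: remember when the connection started
        key = (pkt[IP].src, pkt[IP].dst, pkt[TCP].sport, pkt[TCP].dport)
        syn_times[key] = pkt.time
    elif flags == "SA":                 # server SYN/ACK: match it back to the SYN
        key = (pkt[IP].dst, pkt[IP].src, pkt[TCP].dport, pkt[TCP].sport)
        if key in syn_times:
            rtts[pkt[IP].src].append(float(pkt.time - syn_times.pop(key)))

for server, samples in sorted(rtts.items()):
    avg_ms = 1000 * sum(samples) / len(samples)
    print(f"{server}: {len(samples)} handshakes, avg {avg_ms:.2f} ms")
```

Consistently elevated handshake times toward a provider-managed endpoint, backed by timestamps like these, make an SLA conversation far more productive than anecdotal reports.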

Unfortunately, you can't just flip a switch to get access to packet data through traffic mirroring. In particular, managing the "fire hose" of cloud data in real time for these mirroring scenarios is technically challenging.

Security is the other side of the observability coin, and (at the risk of stretching the metaphor to its breaking point) it has two sides of its own. The first is getting access to real-time packet data; this is similar to the performance monitoring challenge above, but with unique nuances. The second is the ability to save packets for forensic investigation.

For security purposes, real-time packet data feeds must go to security tools like NDR and firewalls. It is crucial not to miss any of these packets, which in the cloud makes an inline packet solution ideal. That said, security tools can often only ingest packets at 10G speeds, so faster connections will require a packet broker that can handle both 10G and 40/100G traffic. In terms of the packets themselves, traffic that traverses environments, either between an application and the open internet or between the data center and the cloud, is of particular interest to security teams, since these paths are likely entry points for an intruder. Unfortunately, this traffic can be particularly difficult to monitor.
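As a rough illustration of why cross-environment traffic is worth isolating, the sketch below flags packets where one endpoint sits inside internal address space and the other does not. It is a minimal example, assuming Scapy and illustrative RFC 1918 ranges; a packet broker or NDR sensor applies far more sophisticated filtering at much higher rates.

```python
# Minimal sketch: flag traffic that crosses the boundary between internal ranges
# and everything else (the internet or another environment). The CIDRs below are
# illustrative placeholders, not a recommendation.
import ipaddress
from scapy.all import sniff, IP

INTERNAL = [ipaddress.ip_network(c) for c in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_internal(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in INTERNAL)

def classify(pkt):
    if IP not in pkt:
        return
    src_in, dst_in = is_internal(pkt[IP].src), is_internal(pkt[IP].dst)
    if src_in != dst_in:  # one side internal, one side external: a boundary-crossing flow
        print(f"boundary traffic: {pkt[IP].src} -> {pkt[IP].dst}")

# Requires capture privileges; in practice the feed would come from a tap or broker mirror port.
sniff(prn=classify, store=False, timeout=30)
```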

For forensic analysis, security team investigations will require packet data that covers days or weeks of traffic between critical nodes. This means observability plans need to cover not just packet access, but capture and storage as well.
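To make the capture-and-storage requirement concrete, here is a minimal sketch of a rolling capture that keeps a fixed window of pcap files for later lookback. The rotation interval and retention period are placeholder assumptions; purpose-built capture appliances do this at line rate, with indexing and far larger retention.

```python
# Minimal sketch: a rolling capture that keeps roughly a week of pcap files
# for forensic lookback. Window size and retention are placeholder assumptions.
import glob
import os
import time
from scapy.all import sniff, wrpcap

CAPTURE_DIR = "./pcaps"
ROTATE_SECONDS = 300           # start a new file every 5 minutes
RETENTION_SECONDS = 7 * 86400  # keep roughly one week of traffic

os.makedirs(CAPTURE_DIR, exist_ok=True)

while True:
    stamp = time.strftime("%Y%m%d-%H%M%S")
    pkts = sniff(timeout=ROTATE_SECONDS, store=True)   # needs capture privileges
    if pkts:
        wrpcap(os.path.join(CAPTURE_DIR, f"capture-{stamp}.pcap"), pkts)

    cutoff = time.time() - RETENTION_SECONDS
    for path in glob.glob(os.path.join(CAPTURE_DIR, "capture-*.pcap")):
        if os.path.getmtime(path) < cutoff:             # age out files past retention
            os.remove(path)
```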

When setting up the monitoring infrastructure, several factors must also be weighed. At a basic level, the brokers, taps, capture devices, etc. all take up valuable rack space; consolidation, density and adequate topology planning are all critical. If data that's being monitored is sensitive or subject to privacy regulations, access to the visibility system and data must be controlled. The monitoring itself also creates a technical load on the network that must be accounted for (you don't want the monitoring itself to be the cause of performance issues).

Bridging the Visibility Gap

The appropriate monitoring infrastructure should be built around a subnet composed of a load balancer, virtual packet broker and storage appliance, with equipment placed throughout the network at key points. One strategy to conserve space, save money and maximize resources is to use brokers as the "power strip" that distributes packets to firewalls and other security or NPM tools at the correct speeds. The subnet can further connect packet capture and storage to forensic tools for investigation, and feed NPM tools with real-time data to quickly triangulate network issues like latency, allowing IT teams to determine fault and, if necessary, negotiate with the cloud provider.
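The "power strip" idea can be pictured as one mirrored feed copied out to several tool endpoints. The sketch below shows the concept only, with placeholder tool addresses; an actual virtual packet broker adds the filtering, load balancing and speed conversion that a toy forwarder cannot.

```python
# Conceptual sketch of the "power strip": one mirrored feed fanned out to several
# tool endpoints. The tool addresses and port are placeholders.
import socket
from scapy.all import sniff, raw

TOOL_ENDPOINTS = [("10.0.5.10", 4789), ("10.0.5.11", 4789)]  # e.g., NDR sensor, NPM probe
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def fan_out(pkt):
    payload = raw(pkt)                     # raw bytes of the mirrored frame
    for endpoint in TOOL_ENDPOINTS:
        sock.sendto(payload, endpoint)     # copy the same packet to every tool

# In practice the input would be the mirror/tap feed, not a local interface.
sniff(prn=fan_out, store=False, timeout=30)
```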

As mentioned, access to packet data in the public cloud is particularly difficult. The hyperscale providers all recognize the problems this lack of visibility causes, and each has taken a different path to solving it. AWS and GCP use similar approaches: AWS's VPC Traffic Mirroring and GCP's Packet Mirroring services. In basic terms, this mirroring duplicates network traffic to and from the client's applications and forwards it to cloud-native performance and security monitoring tool sets for assessment, and to capture devices for later analysis. This eliminates the need to deploy ad-hoc forwarding agents or sensors in each VPC instance for every monitoring tool. The raw data itself is not ready for analysis, however, and requires a virtual or cloud packet broker to ensure the right data gets to the right monitoring or security tools. That said, combining these mirroring options with virtual packet brokers can ultimately reduce cost, as a single stream only has to be mirrored once for the broker (as opposed to once per NPM or security tool).
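For a sense of what enabling this looks like on AWS, the sketch below wires up VPC Traffic Mirroring with boto3: a mirror target (the broker's network interface), a filter, and a session on the monitored workload's interface. All resource IDs and the region are placeholder assumptions, and GCP's Packet Mirroring follows an analogous but different API.

```python
# Minimal sketch: enabling AWS VPC Traffic Mirroring so a workload ENI's traffic
# is copied to a broker/collector target. All IDs and the region are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")    # region is an assumption

# 1. The mirror target: typically the ENI (or NLB) in front of the virtual packet broker.
target_id = ec2.create_traffic_mirror_target(
    NetworkInterfaceId="eni-0broker0000000000",        # placeholder broker interface
    Description="virtual packet broker ingest",
)["TrafficMirrorTarget"]["TrafficMirrorTargetId"]

# 2. A filter describing which traffic to copy; here, all inbound and outbound TCP.
filter_id = ec2.create_traffic_mirror_filter(
    Description="mirror all TCP"
)["TrafficMirrorFilter"]["TrafficMirrorFilterId"]

for direction in ("ingress", "egress"):
    ec2.create_traffic_mirror_filter_rule(
        TrafficMirrorFilterId=filter_id,
        TrafficDirection=direction,
        RuleNumber=100,
        RuleAction="accept",
        Protocol=6,                                    # TCP
        SourceCidrBlock="0.0.0.0/0",
        DestinationCidrBlock="0.0.0.0/0",
    )

# 3. The session ties a monitored workload ENI to the target through the filter.
ec2.create_traffic_mirror_session(
    NetworkInterfaceId="eni-0workload000000000",       # placeholder source interface
    TrafficMirrorTargetId=target_id,
    TrafficMirrorFilterId=filter_id,
    SessionNumber=1,
)
```

Because the broker sits behind a single mirror target, each workload's traffic only needs to be mirrored once, with the broker handling distribution to the individual NPM and security tools.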

Solving the visibility challenge with Azure is different, and requires using what's known as "inline mode" on certain virtual packet brokers. This allows the packet broker itself to monitor subnet ingress and egress traffic to capture, pre-process, and deliver packet data in real-time to security, performance management, analytics, capture and other solutions.

Developing this visibility topology is complex; many companies may not have the necessary in-house staff to handle it, and may need to work with service providers or vendors on the design and set-up. But whether handled in-house or outsourced, keep tool and infrastructure sprawl in mind: a mixture of virtual and physical devices can save rack space in data centers, and leveraging the cloud for a consolidated management view of all packet broker and capture solutions can save considerable time.

Tech companies often take the slings and arrows that come with early adoption. But paying careful attention to visibility and monitoring allows organizations to better weather these issues by staying on-top of threats and ensuring the network is operating according to plan.

Nadeem Zahid is VP of Product Management & Marketing at cPacket Networks