Don't Be the Next Instapaper
February 21, 2017

Michelle McLean
ScaleArc

Instapaper, a "read later" tool for saving web pages to read on other devices or offline, suffered an extensive outage two weeks ago. The site was unavailable for a day and a half, and even after restoring service, the company had to explain that its archives would be affected for another full week. Ultimately, it was able to restore the archives sooner, but the outage garnered extensive press and social media coverage.

The outage occurred because an indexing file that Instapaper relies on to reach all stored links exceeded the maximum file size supported on the older Amazon Web Services instance the site was first built on.

While Instapaper hit a unique problem — a file size limitation — its experience speaks to a much larger problem: scaling a database is difficult, and never quick. That basic fact explains why outages like the one Instapaper suffered are surprisingly common.

Engineering a scaled database, and then making the application changes needed to take advantage of that scaled-out database, is tough coding work indeed. We encounter companies with full control of their source code who are petrified to make the changes needed to scale database capacity. Perhaps it's an ecommerce app, and it's too close to Black Friday. Or maybe it's just a case of attrition: the folks who really understand that code base are long gone, and the current engineers don't dare mess with the inner workings of the app.

These kinds of meltdowns are common during surge events, like the one ESPN suffered with the launch of Fantasy Football or the one Macy's suffered last Black Friday. Sometimes customers can see these events coming (e.g., they're expecting a major traffic surge on Black Friday) and sometimes they simply can't (e.g., their product gets a nod from a celebrity and all of a sudden they're swamped).

When a traffic surge takes down your site, it usually means the data tier was already fragile. Scaling the web infrastructure is pretty easy, as is scaling internet capacity. But scaling the data tier itself is where the challenges lie.

The Instapaper crisis also illustrates how the cloud alone doesn't solve the challenge of scaling the data tier. While elasticity is a hallmark of cloud services, the physics around having an application talk to multiple instances of a database remains a challenge. We've seen some customers suffer from an inflated sense of confidence that running in the cloud takes away these difficulties.
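To make that challenge concrete, here is a minimal sketch of the kind of routing logic an application needs once it talks to more than one database instance. It assumes one writable primary and a set of read replicas. The class name, the server labels, and the verb list are hypothetical and chosen only for illustration; this is not how any particular product does it.

```python
import itertools

# Statements treated as writes in this sketch (an assumption, not a complete list).
WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP")


class QueryRouter:
    """Hypothetical router: writes go to the primary, reads rotate across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas configured, every query falls back to the primary.
        self._read_cycle = itertools.cycle(replicas or [primary])

    def route(self, sql):
        """Return the label of the server that should receive this statement."""
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb in WRITE_VERBS:
            return self.primary
        # Everything else is treated as a read and spread round-robin.
        return next(self._read_cycle)


router = QueryRouter("primary", ["replica-1", "replica-2"])
print(router.route("INSERT INTO links VALUES (1)"))
print(router.route("SELECT * FROM links"))
```

Even this toy version hints at why the work is hard: a read issued right after a write may land on a replica that has not caught up yet, and statements inside a transaction need to stay on one server. Adding cloud instances does nothing to handle those cases on its own.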

Don't wait for disaster to strike. Whether you're running on premises or in the cloud, keep a close eye on all metrics that reveal how "hot" your systems are running. Ensure your disaster recovery plan is robust and recently tested. Better yet, don't rely on disaster recovery. Instead, run in active/active mode, where you've got multiple instances of all critical systems running in different locales, with the remaining systems able to take on the full load if one portion fails.
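The active/active requirement comes down to simple arithmetic: after losing any one site, the sites that remain must still cover peak load. The sketch below checks the worst case, losing the largest site. The function name and the capacity figures are hypothetical, and capacity is assumed to be measured in the same unit as load (for example, requests per second).

```python
def survives_single_failure(site_capacities, peak_load):
    """Return True if the remaining sites can carry peak_load after the largest site fails."""
    if len(site_capacities) < 2:
        # A single site has nothing to fail over to.
        return False
    remaining = sum(site_capacities) - max(site_capacities)
    return remaining >= peak_load


# Two sites of 1,000 each cannot absorb a 1,200 peak if one of them fails.
print(survives_single_failure([1000, 1000], 1200))
# Three sites of 1,000 each can absorb an 1,800 peak with one site down.
print(survives_single_failure([1000, 1000, 1000], 1800))
```

Running a check like this against real peak numbers, not averages, is one way to tell whether a deployment is truly active/active or just two sites that each depend on the other staying up.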

Take steps now to scale your data tier and avoid these kinds of catastrophic outages. Those "Here's why we failed" engineering blog entries are no fun to write.

Michelle McLean is VP of Marketing at ScaleArc.
