Don't get me wrong, this isn't a political statement. Applications don't care if you are a Republican or a Democrat. They don't care if they were created in Canada or India, or if they are written in Ruby, Java or .NET. And the bottom line is that I don't care either, because the real cause of the site's performance problems is dead simple: lack of testing.
Load testing has been around for years. Vendors such as Keynote Systems and (Compuware) Gomez offer hosted load testing. Vendors such as NRG Global (AppLoader), Apica, HP (LoadRunner), and SmartBear offer in-house solutions. The products are out there, and range in price from inexpensive to enterprise-grade. So cost is really no excuse for NOT taking advantage of what these products have to offer.
Too many application development shops assume that finding and fixing code bugs is all that goes into making sure an application is production-ready, but that is simply not the case. While well-written code and well-designed sites are certainly a starting point, they don't guarantee quality applications.
Performance also depends on a host of infrastructure and configuration-related factors including:
- Network bandwidth and latency
- Contention from other applications
- Latency of "called" services
- Proximity of applications and users to the data
- Provisioning and configuration of web servers and load balancers
- Provisioning and configuration of physical and virtual servers
- Etc., etc., etc.
This list could have included hundreds of potential failure points. But the fact is that these failure points often don't actually "fail" until the application is stressed by multiple users. The biggest performance impact typically comes from a very simple source: the number of concurrent users of the system. And there is no way to ensure that the application is production-ready EXCEPT by stressing it to its load limits.
Load testing helps pinpoint the number of real users the system can handle before slowing down and, ultimately, breaking entirely. I've seen commercial applications that could supposedly handle an unlimited number of users bog down at 25 and break at 75. In that case, the problem was with the load balancer configuration; however, it could just as easily have been in any other infrastructure element.
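To make the idea concrete, here is a minimal Python sketch of a ramp-up test: fire a growing number of simultaneous requests and note the first level at which response time crosses a limit. It is an illustration under assumptions, not how any of the products named above work. The function names, the one-second threshold, and the `modeled_latency` stand-in (which simply mimics an application that bogs down at 25 users) are all hypothetical; in a real test you would pass in a function that calls your own application.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def measure_latency(request, users):
    """Run `users` concurrent calls to `request`; return the median latency in seconds."""
    def timed_call(_):
        start = time.perf_counter()
        request()
        return time.perf_counter() - start

    # One worker thread per simulated user, so all calls are in flight together.
    with ThreadPoolExecutor(max_workers=users) as pool:
        latencies = list(pool.map(timed_call, range(users)))
    return median(latencies)


def find_slowdown_point(measure, levels, threshold_s):
    """Return the first user count whose measured latency exceeds threshold_s, or None."""
    for users in levels:
        if measure(users) > threshold_s:
            return users
    return None


def modeled_latency(users):
    """Hypothetical stand-in for a real measurement: fast below 25 users, slow from 25 up."""
    return 0.2 if users < 25 else 3.0


# Ramp through increasing user counts and report where the slowdown starts.
print(find_slowdown_point(modeled_latency, [5, 10, 25, 50, 75], threshold_s=1.0))  # prints 25
```

Against a live system, the call would look something like `find_slowdown_point(lambda n: measure_latency(send_request, n), levels, 1.0)`, where `send_request` is whatever function (again, an assumed name) issues one request to the application under test. A thread pool on one machine only goes so far, which is part of why dedicated load testing tools exist.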
The bottom line is that the code was fine; it was the supporting infrastructure that was the problem. However, that didn't resonate with the buyer, who expected the application to support 1,000 concurrent users out of the box.
It's too late for this iteration of the Affordable Healthcare site — it broke under a load of thousands of concurrent users. Hopefully before version 2 is rolled out later in November, SOMEONE out there will have load tested the new version.
With the complexity of today's componentized, massively integrated applications, load testing has become a "must-have" rather than a "nice-to-have." In the case of the Affordable Healthcare site, lack of adequate load testing created a problem that became national news and generated a political firestorm.
But this is a problem that didn't have to happen.