The most destructive root cause, behind 75 percent of outages during big online events like Black Friday and Cyber Monday, is unplanned configuration changes to a system: IT and Ops teams find something they think might cause a problem and try to fix it immediately, unintentionally creating a much bigger issue for the web or mobile site.
The following are BigPanda's top recommendations for preventing outages throughout the entire holiday shopping season:
- Identify the systems that are mission critical to your business. Many companies don't, and instead try to treat their entire system as business critical. This is a mistake.
- Have a bulletproof plan for your critical services. Once you've identified your critical services, know exactly how you will keep them up. For instance, if Amazon checkout goes down, you need a disaster recovery plan for it. But if the Recommendation Engine has problems, that is not at the same level of criticality.
- Tier your services. Having 3-5 tiers makes prioritization and response much easier, quicker and more effective when there is a problem. And make sure you have a backup and failover plan for the highest tier of your services.
- You don't need failover for everything. IT and Ops teams who try to have failover for everything often discover that they don't have it ready for anything.
- Don't become overly focused on the components of infrastructure. Make sure you are spending more of your time and attention on your services.
- Make sure you have planned for load capacity. Not planning for the sheer volume of people visiting your web or mobile site accounts for 25 percent of outages during big online events.
- Use a tool that allows you to consolidate your IT data. Implementing an alert correlation platform allows IT and Ops teams to separate signal from noise and focus more on the customer experience by providing a consolidated view of their IT alert data. This allows them to stop being reactive firefighters and become proactive before an issue has the chance to affect the customer.
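
To make the tiering and alert-consolidation recommendations above more concrete, here is a minimal sketch in Python. It is not BigPanda's product or API; the service names, tier numbers, alert sources, and the five-minute grouping window are all hypothetical values chosen for illustration. The idea is simply to fold alerts for the same service that arrive close together into one incident, then list incidents for the most critical tier first.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical service-to-tier map; tier 1 is the most critical.
SERVICE_TIERS = {
    "checkout": 1,
    "search": 2,
    "recommendation-engine": 3,
}
DEFAULT_TIER = 3  # assumption: services missing from the map get the lowest tier


@dataclass(frozen=True)
class Alert:
    source: str       # monitoring tool that raised the alert (illustrative label)
    service: str      # service the alert is about
    message: str
    timestamp: float  # seconds since some reference point


def correlate(alerts, window_seconds=300):
    """Group alerts per service into incidents and order them by tier.

    An alert joins the current incident for its service when it arrives
    within window_seconds of the previous alert in that incident.
    Returns a list of (service, [alerts]) tuples.
    """
    by_service = defaultdict(list)
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        by_service[alert.service].append(alert)

    incidents = []
    for service, service_alerts in by_service.items():
        current = [service_alerts[0]]
        for alert in service_alerts[1:]:
            if alert.timestamp - current[-1].timestamp <= window_seconds:
                current.append(alert)
            else:
                incidents.append((service, current))
                current = [alert]
        incidents.append((service, current))

    # Most critical tier first, then the incident whose first alert is earliest.
    incidents.sort(
        key=lambda inc: (SERVICE_TIERS.get(inc[0], DEFAULT_TIER), inc[1][0].timestamp)
    )
    return incidents


if __name__ == "__main__":
    sample = [
        Alert("monitor-a", "recommendation-engine", "high latency", 0),
        Alert("monitor-b", "checkout", "5xx spike", 100),
        Alert("monitor-a", "checkout", "db connections exhausted", 160),
    ]
    for service, grouped in correlate(sample):
        print(service, [a.message for a in grouped])
```

In this sketch the two checkout alerts collapse into a single tier 1 incident that is listed ahead of the recommendation engine alert, even though the latter fired first. A real correlation platform would use far richer signals than a time window, so treat this only as a way to picture the prioritization.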
Michael Butt is Director of Product Marketing at BigPanda.