Unplanned configuration changes to a system are the most destructive root cause of outages during big online events like Black Friday and Cyber Monday, accounting for 75 percent of them. They happen when IT and Ops teams find something they think might cause a problem and try to fix it immediately, unintentionally creating a much bigger issue for the web or mobile site.
The following are BigPanda's top recommendations for preventing outages throughout the entire holiday shopping season:
- Identify the systems that are mission critical to your business. Many companies skip this step and try to treat their entire system as business critical, which is a mistake.
- Have a bulletproof plan for your critical services. Once you've identified your critical services, know exactly how you will keep them up. For instance, if Amazon's checkout goes down, that calls for a disaster recovery plan. Problems with the Recommendation Engine are not at the same level of criticality.
- Tier your services. Having 3-5 tiers makes prioritization and response much easier, quicker and more effective when there is a problem. And make sure you have a backup and failover plan for the highest tier of your services.
- You don't need failover for everything. IT and Ops teams who try to have failover for everything often discover that they don't have it ready for anything.
- Don't become overly focused on the components of infrastructure. Spend more of your time and attention on your services.
- Make sure you have planned for load capacity. Not planning for the sheer volume of people visiting your web or mobile site accounts for 25 percent of outages during big online events.
- Use a tool that allows you to consolidate your IT data. Implementing an alert correlation platform allows IT and Ops teams to separate signal from noise and focus more on the customer experience by providing a consolidated view of their IT alert data. This allows them to stop being reactive firefighters and become proactive before an issue has the chance to affect the customer.
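The tiering and alert consolidation recommendations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the service names, the tier numbers, the `Alert` fields, and the `needs_failover` and `correlate` helpers are all hypothetical, and the grouping shown is far simpler than what a real alert correlation platform does.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical service catalog: tier 1 is the most business critical.
SERVICE_TIERS = {
    "checkout": 1,
    "search": 2,
    "recommendation-engine": 3,
}


@dataclass(frozen=True)
class Alert:
    source: str   # monitoring tool that raised the alert (illustrative)
    service: str  # service the alert relates to
    message: str


def needs_failover(service: str) -> bool:
    """Only the highest tier gets a backup and failover plan."""
    return SERVICE_TIERS.get(service) == 1


def correlate(alerts):
    """Group alerts by service and order the groups by tier, most critical first."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[alert.service].append(alert)
    # Services missing from the catalog sort after every known tier.
    lowest = max(SERVICE_TIERS.values()) + 1
    return sorted(
        groups.items(),
        key=lambda item: (SERVICE_TIERS.get(item[0], lowest), item[0]),
    )


if __name__ == "__main__":
    incoming = [
        Alert("metrics", "recommendation-engine", "slow responses"),
        Alert("logs", "checkout", "HTTP 5xx spike"),
        Alert("metrics", "checkout", "latency above threshold"),
    ]
    for service, group in correlate(incoming):
        print(service, len(group), "failover" if needs_failover(service) else "no failover")
```

Grouping by service rather than by infrastructure component keeps the consolidated view aligned with what the customer experiences, and sorting by tier puts checkout-level problems ahead of lower-priority noise.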
Michael Butt is Director of Product Marketing at BigPanda.