Monitoring Pokémon Go: When Your App Breaks All Records
August 03, 2016

Payal Chakravarty
IBM

In July 2016, the world of gaming was taken over by a new phenomenon – Pokémon Go. Within a matter of days "augmented reality" became mainstream and the app, which was launched mainly in the US and Australia, overtook Tinder and Twitter in the total number of downloads. Pokémon Go surpassed the wildest expectations of its creators, Niantic Labs, and then some.

With popularity comes scale, and with scale comes an overload of requests to the gaming servers. If you are not prepared, requests fail and users get frustrated. Annoyed by Pokémon Go crashes, even non-technical users were talking about server status, and memes about the outages were circulating on social networks. Overnight, websites sprang up just to report whether the game was up or down in different countries. Because I work closely with the APM space, I found myself thinking through how Pokémon Go might be addressing the issue and what monitoring they would need to put in place to retain their popularity. Here is my list of probable solutions Pokémon Go could employ to improve the experience for their users and avid fans:

1. Synthetic monitoring

The first and most important question: Is the application up or down, and can users log in from around the world?

A game this popular would need to ensure five-nines availability and a high Apdex score. With synthetic end-user monitoring, simulated tests can be run from around the world to check availability and response time as often as every few seconds. The simulations can log in to the app and interact with it as a gamer would.
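To make those two targets concrete, here is a minimal sketch in Python. It assumes "five nines" means 99.999% availability and uses the standard Apdex formula (satisfied samples plus half of the tolerating samples, divided by all samples). The 0.5 second threshold and the function names are illustrative assumptions, not anything Niantic has published.

```python
def apdex(response_times_s, threshold_s=0.5):
    """Apdex = (satisfied + tolerating / 2) / total samples.

    satisfied:  response time <= T
    tolerating: T < response time <= 4T
    Anything slower counts as frustrated and adds nothing to the score.
    threshold_s (T) is an assumed value; pick one that fits the app.
    """
    if not response_times_s:
        raise ValueError("need at least one response time sample")
    satisfied = sum(1 for t in response_times_s if t <= threshold_s)
    tolerating = sum(
        1 for t in response_times_s if threshold_s < t <= 4 * threshold_s
    )
    return (satisfied + tolerating / 2) / len(response_times_s)


def allowed_downtime_minutes(availability, period_days=365):
    """Downtime budget, in minutes, for an availability target over a period."""
    return (1 - availability) * period_days * 24 * 60
```

With these definitions, a 99.999% target leaves a budget of only about five minutes of downtime per year, which is why checks need to run every few seconds rather than every few minutes.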

For example, when a user catches a Pokémon, the app makes an HTTP request to an API such as “catchPokemon” with a set of parameters. Continuously checking that these HTTP requests return a valid response code within a reasonable amount of time confirms that the “catching a Pokémon” capability is functioning correctly, so problems can be detected and fixed proactively. Synthetic monitoring also helps determine whether an issue was due to network latency.
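A single synthetic check of this kind might look like the sketch below. The `send` callable stands in for whatever performs the real HTTP request and returns its status code; it is injected so the pass/fail logic can be exercised without a network. The 2 second latency limit, the `CheckResult` type and the reason strings are assumptions for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    ok: bool
    status: int        # 0 when the request never completed
    elapsed_s: float
    reason: str


def run_check(
    send: Callable[[], int],
    max_latency_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
) -> CheckResult:
    """Run one synthetic transaction and judge it on status code and latency."""
    start = clock()
    try:
        status = send()  # hypothetical transport: performs the request
    except Exception as exc:
        return CheckResult(False, 0, clock() - start, f"request failed: {exc}")
    elapsed = clock() - start
    if not 200 <= status < 300:
        return CheckResult(False, status, elapsed, "unexpected status code")
    if elapsed > max_latency_s:
        return CheckResult(False, status, elapsed, "too slow")
    return CheckResult(True, status, elapsed, "ok")
```

In practice `send` would wrap a real HTTP client call against the endpoint under test, and the same check would be scheduled from several regions so that a failure seen from only one location points toward the network rather than the application.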

2. Mobile Real User Monitoring

Pokémon Go is a mobile game that is accessed only from mobile devices. Hence Mobile End User Monitoring with crash analytics is imperative to rapidly scope the problem.

Data points such as how often crashes occurred; which devices, OS versions and application versions were in use at the time; and which geographies the affected users came from are essential to isolating the problem. For example, an insight such as “crashes between 6 and 6:30 PM PST came from iOS v9 users on the West Coast, specifically when they attempted to transfer a Pokémon” gives an instant problem scope to delve deeper into.
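Scoping a problem this way is essentially a group-and-count over crash reports. The sketch below assumes each crash report is a dictionary with hypothetical fields `os`, `app_version`, `region` and `action`; a real crash analytics feed will have its own schema.

```python
from collections import Counter


def top_crash_groups(crashes, keys=("os", "app_version", "region", "action"), n=3):
    """Count crashes per combination of attributes and return the n largest groups.

    Reports missing a field are bucketed under "unknown" rather than dropped.
    """
    counts = Counter(
        tuple(crash.get(key, "unknown") for key in keys) for crash in crashes
    )
    return counts.most_common(n)


# Illustrative data only, not real crash reports.
sample = [
    {"os": "iOS 9", "app_version": "1.0", "region": "US-West", "action": "transfer"},
    {"os": "iOS 9", "app_version": "1.0", "region": "US-West", "action": "transfer"},
    {"os": "iOS 9", "app_version": "1.0", "region": "US-West", "action": "transfer"},
    {"os": "Android 6", "app_version": "1.0", "region": "AU", "action": "catch"},
]
worst_group, worst_count = top_crash_groups(sample, n=1)[0]
```

The largest group is the first place to look: if one combination of OS, version, region and action dominates, the problem scope is already narrow.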

Further, by tracing individual requests, one can delve into exactly what line of code or what services/microservices could have impacted a particular crash. This data becomes even more insightful if it can be correlated with Twitter sentiment analysis.

Comparing response time trends with throughput is another good way to evaluate whether slow responses were due to extra load or to an application bug.
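One rough way to make that comparison is to correlate the two series over the same time window. The sketch below uses a Pearson correlation and an assumed cutoff of 0.7: if latency rises and falls with throughput, load is the likely explanation; if latency climbs while throughput stays flat, the application itself deserves a closer look. This is a heuristic for triage, not a diagnosis.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def likely_cause(throughput_rps, latency_ms, cutoff=0.7):
    """Label a slowdown as load-related when latency tracks throughput closely."""
    r = pearson(throughput_rps, latency_ms)
    return "load-related" if r >= cutoff else "not explained by load"
```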

3. Server, Database, Application Server Monitoring

In order to deal with scale, the infrastructure to support the game needs to be monitored to spot bottlenecks easily. This requires automatic discovery and health check of all the components that the game runs on.

Since auto-scaling and high-resiliency failover will probably be turned on to cater to the load, the discovery needs to be truly dynamic so that it tracks any new nodes that come up. A dynamically discovered topology could have multiple components such as application servers, web servers, databases, load balancers and content delivery networks. Memory growth (to catch leaks), CPU consumption, database I/O and space utilization, queue depths and deadlocks are metrics whose trends need to be monitored continuously, with automatic baselines to help identify deviation from normal. Additionally, tracking and correlating errors from these various resources via log analysis can help diagnose issues rapidly.
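An automatic baseline can be as simple as the mean and standard deviation of recent samples for a metric, with new values flagged when they fall too far outside that range. The sketch below is one such approach, assuming a three-sigma band; commercial APM tools typically use more elaborate, seasonality-aware baselines.

```python
from statistics import mean, stdev


def deviates_from_baseline(history, value, sigmas=3.0):
    """Return True when value lies more than `sigmas` standard deviations
    from the mean of the recent history for the same metric."""
    if len(history) < 2:
        return False  # not enough data to form a baseline yet
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return value != baseline
    return abs(value - baseline) > sigmas * spread


# Illustrative CPU utilization samples (percent) for one node.
recent_cpu = [50, 52, 48, 51, 49]
spike = deviates_from_baseline(recent_cpu, 90)
normal = deviates_from_baseline(recent_cpu, 51)
```

Because the baseline is computed per metric and per node, a newly discovered node simply starts with an empty history and begins alerting once enough samples have accumulated.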

4. Predictive anomaly detection for the future

With sudden popularity, one thing that is bound to get out of control is the flood of alerts. To reduce alert noise and ensure that the right issues are being worked on, monitoring alerts need to be intelligent. Alerts should be generated by analyzing, correlating and de-duplicating sets of events, and should present enough information to enable faster debugging.
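De-duplication is the easiest of those three steps to illustrate. The sketch below collapses repeated events for the same resource and check into a single alert with a count, provided they arrive within an assumed five-minute window of the first occurrence. The event fields (`ts`, `resource`, `check`) are hypothetical.

```python
def deduplicate(events, window_s=300):
    """Collapse events sharing (resource, check) into one alert per window.

    Each alert records when the condition was first and last seen and how
    many raw events it absorbed, so responders see one item, not a flood.
    """
    alerts = []
    open_alerts = {}  # (resource, check) -> most recent alert for that key
    for event in sorted(events, key=lambda e: e["ts"]):
        key = (event["resource"], event["check"])
        current = open_alerts.get(key)
        if current is not None and event["ts"] - current["first_seen"] <= window_s:
            current["count"] += 1
            current["last_seen"] = event["ts"]
        else:
            current = {
                "resource": event["resource"],
                "check": event["check"],
                "first_seen": event["ts"],
                "last_seen": event["ts"],
                "count": 1,
            }
            open_alerts[key] = current
            alerts.append(current)
    return alerts


# Illustrative events; timestamps are seconds since an arbitrary start.
raw_events = [
    {"ts": 0, "resource": "db1", "check": "deadlock"},
    {"ts": 5, "resource": "web1", "check": "cpu"},
    {"ts": 10, "resource": "db1", "check": "deadlock"},
    {"ts": 20, "resource": "db1", "check": "deadlock"},
    {"ts": 1000, "resource": "db1", "check": "deadlock"},
]
alerts = deduplicate(raw_events)
```

Here five raw events become three alerts: one for the burst of database deadlocks, one for the web server CPU event, and a new deadlock alert once the window has passed.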

As an advanced setup, Pokémon Go monitoring should enable predictive anomaly detection to forecast trends in capacity and consumption of backend resources well before they become issues.
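The simplest form of such a prediction is a straight-line fit through recent utilization samples, extrapolated to the point where a limit would be reached. The sketch below does this with ordinary least squares. The 90% limit is an assumption, and real predictive analytics would account for seasonality and non-linear growth rather than a single trend line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def hours_until_limit(hours, usage_pct, limit_pct=90.0):
    """Estimate hours from the last sample until usage reaches limit_pct.

    Returns None when usage is flat or falling, and 0.0 when the fitted
    trend says the limit has already been reached.
    """
    slope, intercept = fit_line(hours, usage_pct)
    if slope <= 0:
        return None
    time_at_limit = (limit_pct - intercept) / slope
    return max(0.0, time_at_limit - hours[-1])
```

For example, disk usage sampled hourly at 40, 45, 50 and 55 percent grows by five points per hour, so the fitted trend reaches 90 percent seven hours after the last sample, leaving time to add capacity before users notice.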

Payal Chakravarty is a Program Director of Product Management for IBM Application Performance Management.
