Recently I was called in to diagnose a “slow network” issue at a customer site. The company’s internal IT team, no slouches, had already looked at the problem and had not come up with anything.
I take a slightly different approach. The first problem is that we’re all way too technical and often a user reporting “The network is slow” is taken literally.
Then an investigation of the network is performed to absolutely no avail, because the user(s) who reported it have no way on earth of telling if the network is slow or fast or anything else. They’re not network specialists or computer scientists and their language is loose – they’re actually observing that their “Application is Slow” — and presumably the application uses the network. A qualitative observation.
Best to start by chatting to them in detail and getting more qualitative observations. First, a bit of the old “Sherlock Holmes”.
When is it slow? All the time? Some of the time? At every location?
Did their colleagues agree at the time?
Were they all using the same application — or part of it?
Is it ever fast, or OK? When?
When is it slowest?
Like Holmes, it’s important to build up a good list of questions, and sometimes a pattern emerges and from the fog a light appears.
But, as I said, everything so far is qualitative. Circumstantial, you might say. It’s time to do some measurement and get quantitative. An APM system is installed to measure response times across the network (as well as network traffic), and as a parting shot I ask our reporting users to please note, for the next week or so, the times when “The Application is Slow”. I’ve educated them now and they begin to get the difference; I used a road analogy to explain the network. Everyone gets it.
By the way, the simple act of going and spending time with the users to discuss their woes has already had a hugely beneficial effect on them. The company cares. Not our company, though we absolutely do care; their company has finally listened!
The background homework has been done with the technical guys — I know the addresses of the principal servers, the end user sites, the major applications in use, and I’m recording transaction times, network and server portions.
All I need now is to have sufficient data collected with the APM, chat to the users to get their “times when the application was slow”, and correlate response times and network traffic with those to figure out what’s going on.
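That correlation step can be sketched roughly as below. This is only an illustration in Python, not how the APM tool itself works: the sample records, the field names (`network_ms`, `server_ms`, `link_util`) and the helper functions `in_slow_window` and `summarise` are all hypothetical, and a real APM export will look different. The idea is simply to split the measured transactions into those that fall inside the users' reported “slow” times and those that don't, then compare the network and server portions of the response time in each group.

```python
from datetime import datetime
from statistics import mean

# Hypothetical APM samples, invented for illustration:
# (timestamp, network portion in ms, server portion in ms, link utilisation 0..1)
samples = [
    (datetime(2024, 1, 8, 9, 0), 40, 300, 0.35),
    (datetime(2024, 1, 8, 9, 30), 45, 2900, 0.40),
    (datetime(2024, 1, 8, 10, 0), 42, 3100, 0.38),
    (datetime(2024, 1, 8, 11, 0), 41, 280, 0.90),
]

# Hypothetical "the application was slow" windows noted by the users.
slow_windows = [
    (datetime(2024, 1, 8, 9, 15), datetime(2024, 1, 8, 10, 15)),
]


def in_slow_window(ts, windows):
    """Return True if the timestamp falls inside any user-reported window."""
    return any(start <= ts <= end for start, end in windows)


def summarise(samples, windows):
    """Average the network/server portions for reported-slow vs other samples."""
    groups = {"reported_slow": [], "other": []}
    for ts, network_ms, server_ms, link_util in samples:
        key = "reported_slow" if in_slow_window(ts, windows) else "other"
        groups[key].append((network_ms, server_ms, link_util))

    summary = {}
    for key, rows in groups.items():
        if rows:  # skip empty groups so mean() is never called on no data
            summary[key] = {
                "network_ms": mean(r[0] for r in rows),
                "server_ms": mean(r[1] for r in rows),
                "link_util": mean(r[2] for r in rows),
            }
    return summary


if __name__ == "__main__":
    for group, stats in summarise(samples, slow_windows).items():
        print(group, stats)
```

With made-up numbers like these, the network portion barely moves between the two groups while the server portion jumps during the reported windows, which is the kind of pattern that would point away from the network. Real data is rarely this tidy, so treat the comparison as a starting point for questions rather than a verdict.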
The APM results are quite clear. “The Network” is not slow, though the “App” is. The users even demoed it being slow (it was awful, though that is a qualitative judgement). In fact the network is not generally maxed out, and when it is, the QoS policies are set to favor the apps in question.
In this case, for expediency, all the monitoring and measurement was done from the site where the (complaining) users are located. To find out more I need to set up base camp at the data center, but there is one thing I can already say in reply to “The Network is Slow”: NO IT ISN’T.
Or, at least, not this bit. The investigation is continuing at the data center ...
Jim Swepson is Pre-sales Technologist at Itrinegy.