NOTE: Numenta has
announced a strategic partnership with Avik Partners,
please read more about the future of
Grok for IT Analytics.
Like most engineering managers, I like to know when someone is manually touching
one of our servers. That’s especially true for production systems, but also
applies to QA servers. So, imagine my chagrin when Grok caught me red handed,
not just once, but twice this week!
This first example was when I upgraded one of our QA servers to Grok 1.3
(shameless plug – it’s available now!)
In this example, you can see a very slight increase in the number of bytes
received by the server, which was flagged very quickly. At the same time, the
CPU utilization starts to drop slightly, which Grok marked as yellow. Notice
that Grok picked up the update on 2 metrics at the same time, both right as the
process starts, before the metrics get into ranges that could be identified as
abnormal by most statistical techniques! Having a leading indicator – even by a
few minutes – that things are starting to behave atypically is a huge advantage!
Luckily for us, this was a totally innocuous change. And, investigation was
even easier because when I got the alert about the anomaly, I was still in the
middle of the update.
The second example from this week was when I decided doing some recovery testing
would be a good idea. I manually stopped all of the services on the QA server
and watched the CPU load drop rapidly again. This time, Grok picked up the
change very rapidly. But, notice that the new pattern also stabilizes very
quickly at ~20% CPU load, with the Grok anomaly score dropping back into the
green quickly. Grok then flags an anomaly around 12 hours later when I turned
the services back on and the CPU jumped back up to the ~85% mark. Now, if
you’re like me, you’ll have noticed a 3rd anomaly in the chart on the left,
right around 10PM. I drilled into that one and noticed that, sure enough, the
pattern is just slightly different at 10PM than it is at 9PM. It’s a visibly
subtle difference, but important nonetheless.
Stay tuned for our next “Anomaly of the Week”!