Anomaly detection

Anomalies highlighted on the graph

For trial and paid users, Server Density will continually analyse your data and use past behaviour to highlight anomalies so you can pick out things that you might otherwise miss.

As soon as you add a server, we start analysing its data. However, we wait 48 hours before returning any results so that there is enough history to pick up trends. After that initial 48 hours, we analyse the data hourly and display the results on the Server Density charts.

Metrics

Anomaly detection is performed on the following metrics:

The other metrics provide too much variability for anomaly detection to be of any use.

Example

A relatively common example in the data we've looked at is Apache busy workers. On busy web servers, you get a clear pattern that follows the daily cycle, and any deviations are flagged reliably. On mostly idle servers, you tend to get fewer than 5 workers, occasionally gaining or losing one or two, with no stable pattern. Lots of anomalies will be flagged, but they don't really mean anything.

Graphing and more info

Anomalies will be graphed and highlighted on the normal graphs for each metric. They are toggled off by default.

Anomalies highlighted on the server snapshot

Clicking the point where an anomaly is highlighted will take you to the snapshot as usual. The anomalous value will be highlighted there, and further information is displayed when you hover over the (i) in the bottom right corner.

Detection algorithm

Server Density uses a modified Holt-Winters algorithm. It uses past behaviour to predict future behaviour and future variability, and then measures whether current behaviour is within the bounds expected from that past behaviour.

It uses both the immediate past and the data from 24 hours ago to make its predictions. So if traffic increases every morning at 9am, it takes that into account and will not flag the increase as an anomaly.
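To make the idea concrete, here is a minimal sketch of a standard additive Holt-Winters forecaster with a 24-sample season. It is not Server Density's modified algorithm: the function name, the smoothing factors (alpha, beta, gamma) and the way the state is initialised are all assumptions chosen for illustration. It assumes one sample per hour.

```python
SEASON = 24  # assumed: one sample per hour, one season per day


def holt_winters_forecasts(values, alpha=0.3, beta=0.05, gamma=0.3, season=SEASON):
    """Return one-step-ahead forecasts for values[season:].

    Textbook additive Holt-Winters; all parameter values are illustrative.
    """
    if len(values) < 2 * season:
        raise ValueError("need at least two full seasons of data")

    # Initialise level, trend and seasonal offsets from the first two seasons.
    first = values[:season]
    second = values[season:2 * season]
    level = sum(first) / season
    trend = (sum(second) - sum(first)) / (season * season)
    seasonal = [v - level for v in first]

    forecasts = []
    for t in range(season, len(values)):
        slot = t % season  # hour of day
        # Prediction combines the recent level/trend with the offset
        # learned for this same hour on previous days.
        forecasts.append(level + trend + seasonal[slot])

        # Update the state with the value actually observed.
        y = values[t]
        prev_level = level
        level = alpha * (y - seasonal[slot]) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[slot] = gamma * (y - level) + (1 - gamma) * seasonal[slot]
    return forecasts
```

With a series that repeats the same daily shape, including a jump at 9am, the forecast for 9am already contains the jump, so the observed value does not look unusual.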

It also models expected deviation over a 24 hour cycle. So if your traffic not only increases during the day but also gets much more bursty, while it is not bursty at all overnight, then it applies a narrower threshold for anomalies overnight than it does during the daytime.
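One common way to get time-of-day-dependent thresholds is to keep a separate smoothed absolute error for each hour of the day and scale the band by it. The sketch below does that. The function name, the smoothing factor gamma and the width multiplier are assumptions, not values taken from Server Density.

```python
def seasonal_bands(actuals, forecasts, season=24, gamma=0.3, width=3.0):
    """Return a (lower, upper) band per sample, or None until the
    hour-of-day slot has seen at least one forecast error.

    Illustrative sketch: one smoothed absolute deviation per slot.
    """
    deviation = [None] * season
    bands = []
    for t, (y, f) in enumerate(zip(actuals, forecasts)):
        slot = t % season
        d = deviation[slot]
        # The band for this sample uses only deviation seen on earlier days.
        bands.append(None if d is None else (f - width * d, f + width * d))

        # Fold today's error into the estimate for this hour of day.
        err = abs(y - f)
        deviation[slot] = err if d is None else gamma * err + (1 - gamma) * d
    return bands
```

Hours that are usually quiet end up with a small deviation and therefore a narrow band, while bursty hours get a wide one.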

Finally, it doesn't flag up an anomaly every time the behaviour crosses the expected bounds - it only flags one when the behaviour crosses them more than a given number of times over a given period. So isolated fluctuations won't be flagged; the behaviour needs to be consistently anomalous.
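The "more than a given number of times over a given period" rule can be sketched with a sliding window. The window length and the allowed number of crossings below are made-up values, since the real ones are not stated here, and the function name is hypothetical.

```python
from collections import deque


def flag_anomalies(values, bands, window=6, allowed=2):
    """Flag a sample only when more than `allowed` of the last `window`
    samples fell outside their (lower, upper) band.

    Illustrative sketch; window and allowed are assumed values.
    """
    recent = deque(maxlen=window)  # True where a sample crossed its band
    flags = []
    for y, (lower, upper) in zip(values, bands):
        recent.append(not (lower <= y <= upper))
        flags.append(sum(recent) > allowed)
    return flags
```

A single spike leaves at most one crossing in the window, so nothing is flagged. Three crossings within six samples are enough to flag the third one.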

Training

The algorithm trains itself as it goes along - it will get more accurate at detecting anomalies the more history it has to work with. So after 48 hours it will start trying to detect anomalies, but it'll carry on getting better after that.

Also, if the server's behaviour patterns change, it will adjust itself. It will flag up anomalies initially (because the behaviour has changed), but it will settle into the new pattern fairly quickly - after a couple of days of consistent behaviour.
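In an exponentially smoothed model, the settling time follows from the smoothing factor: each day, the estimate for a given hour moves a fixed fraction of the way towards the new value. The sketch below counts the daily updates needed to get close. The function name, the gamma values and the 10% tolerance are assumptions for illustration, not Server Density's settings.

```python
def days_to_settle(old, new, gamma=0.5, tolerance=0.1):
    """Count daily updates until a smoothed per-hour estimate is within
    `tolerance` (as a fraction of the shift) of the new value.

    Illustrative sketch of exponential smoothing after a level change.
    """
    estimate = old
    shift = abs(new - old)
    days = 0
    while abs(new - estimate) > tolerance * shift:
        # One update per day for this hour-of-day slot.
        estimate = gamma * new + (1 - gamma) * estimate
        days += 1
    return days
```

With these assumed numbers, a larger gamma adapts faster: a metric moving from 100 to 200 settles in 2 days at gamma=0.7 and in 4 days at gamma=0.5. The trade-off is that a larger gamma also makes the model chase one-off fluctuations more readily.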

It works best in general on uninterrupted data, but if there are interruptions, it will do its best anyway. Without data, it does un-train itself: if you don't give it any data for several days, it will "forget" the old pattern of behaviour, but it will re-train itself once more data arrives.