loyalty.dev

Anomaly detection with Z-score

Anomaly detection lets you spot abnormal metric trends in your system by analysing historical data. It is also an effective tool for detecting fraud and mitigating risk.

A change in a metric could be benign or harmful. It is therefore crucial to have a robust anomaly detection system in place that can differentiate between genuinely unusual trends and minor fluctuations around the clock, and notify the relevant parties to investigate the cause of an anomaly.

There are many techniques to determine what counts as an anomaly, depending on the monitoring tool you're using. At Ascenda, we leverage Grafana's interactive visuals coupled with Prometheus's powerful querying language to build time series anomaly detection into our system using z-score.

Demystifying Z-score

Image by Florencia Mangini

Numbers aren't as intimidating once you put them into words. Z-score uses standard deviation (SD) as a measure of how far a value is from the average. The peak of the normal distribution curve represents the average; the further a value is from the average, the more likely it is to be an anomaly.
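Put as a formula, the z-score of a value x is simply its distance from the average, measured in standard deviations:

z = (x − average) / SD

A value sitting exactly on the average has a z-score of 0, while a value 3 SD above the average has a z-score of 3.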

According to the 3-sigma rule commonly associated with the normal distribution, 99.7% of our data should fall within 3 SD of the average. Any value more than 3 SD away would be considered an anomaly. However, this threshold can be tweaked according to your desired sensitivity.

Anomaly detection is the way to go, true or false?

What should you know before building your own anomaly detection with z-score?

Before we dig into its use cases at Ascenda, there's one caveat to note when deciding whether anomaly detection using z-score suits your use case: it only works for normally distributed data.

How do you know if your data is normally distributed? There are numerous techniques you can use to check. The simplest approach is to test whether your data's z-scores fall between -4 and +4 (don't worry, we'll get to this in a later section).

A z-score that is excessively large, e.g. a range of -15 to +15, signifies that your data has a long-tailed distribution, where the tail of the curve stretches further in one direction. This means that events occurring further away from the head are rare. To illustrate this using request response time as an example, we would expect most requests to have a fast response time (represented by the head of the curve), while a slow response time should be a rare case (represented by the tail of the curve). This suggests that a threshold alert would be a better option than anomaly detection, since we can determine the threshold for a slow response time ourselves.
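For instance, a plain threshold alert on response time might look like the sketch below. The histogram metric name and the 2-second cut-off are purely illustrative, not taken from our dashboards:

histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 2

This fires whenever the 95th percentile response time over the last 5 minutes crosses 2 seconds, with no statistics involved.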

Does z-score always go hand-in-hand with anomaly detection?

There are other techniques you can use for anomaly detection, such as seasonality. This topic deserves a deeper dive in a post of its own, but as a little segue: seasonality is suitable for metrics with predictable patterns. Notice that the graph below has a similar shape on weekdays and little to no activity on the weekends (7th to 8th May). This denotes that our data has a predictable pattern, and seasonality would be an effective anomaly detection technique for this metric.
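As a rough sketch of how this idea could be expressed in PromQL (the metric name here is purely illustrative), Prometheus's offset modifier lets you compare the current behaviour of a metric against the same moment one week earlier:

avg_over_time(some_metric[5m]) - avg_over_time(some_metric[5m] offset 1w)

A large difference at a time of week that is normally quiet would then be a candidate anomaly.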

Ascenda’s anomaly detection in action

Next, we’ll take a look at how this can be done.

A peek into z-score with Grafana x Prometheus

Let's walk through the use case of detecting anomalies in password reset failures, as an abnormal number of failed password resets could signify suspicious activity.

Here, we've tested with 1 week's worth of data to determine if our data is normally distributed. Since the resulting z-score falls between -4 and +4, our dataset is suitable for anomaly detection using z-score.

(
  max_over_time(Web_Controllers_PasswordResets_Create_Failed{environment="production"}[1w])
  - avg_over_time(Web_Controllers_PasswordResets_Create_Failed{environment="production"}[1w])
)
/
stddev_over_time(Web_Controllers_PasswordResets_Create_Failed{environment="production"}[1w])
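The query above looks at the largest positive z-score in the window. To check the lower end of the -4 to +4 range as well, one possibility (our own suggestion, not part of the original dashboard) is to swap max_over_time for min_over_time in the same expression:

(
  min_over_time(Web_Controllers_PasswordResets_Create_Failed{environment="production"}[1w])
  - avg_over_time(Web_Controllers_PasswordResets_Create_Failed{environment="production"}[1w])
)
/
stddev_over_time(Web_Controllers_PasswordResets_Create_Failed{environment="production"}[1w])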

Let's check the number of failed password resets over the past 5 minutes against the last 12 hours, and set an alert when our z-score exceeds 3 in accordance with the 3-sigma rule. From 7am onwards, notice that our z-score has exceeded 3.

(
  avg_over_time(Web_Controllers_PasswordResets_Create_Failed{environment="production"}[5m])
  - avg_over_time(Web_Controllers_PasswordResets_Create_Failed{environment="production"}[12h])
)
/
stddev_over_time(Web_Controllers_PasswordResets_Create_Failed{environment="production"}[12h])

However, if we look at the datapoints from 7pm the previous day to 7am the next day in the image below, the spike from 6am onwards causes a false positive: the data spanning the last 12 hours wasn't sufficient to detect the abnormal trend accurately, since activity is minimal during the night.

avg(increase(Web_Controllers_PasswordResets_Create_Failed{environment="production"}[5m]))

To reduce such false positives, we modify our parameters to compare against 1 week's worth of data. Notice that the graph becomes smoother and the false positive is avoided.

(
  avg_over_time(Web_Controllers_PasswordResets_Create_Failed{environment="production"}[5m])
  - avg_over_time(Web_Controllers_PasswordResets_Create_Failed{environment="production"}[1w])
)
/
stddev_over_time(Web_Controllers_PasswordResets_Create_Failed{environment="production"}[1w])
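To turn this into an alert, one option (a sketch on our part; wire it into your Grafana alert rule however suits your setup) is to append the 3-sigma threshold directly to the expression, so it only returns a value when the z-score exceeds 3:

(
  avg_over_time(Web_Controllers_PasswordResets_Create_Failed{environment="production"}[5m])
  - avg_over_time(Web_Controllers_PasswordResets_Create_Failed{environment="production"}[1w])
)
/
stddev_over_time(Web_Controllers_PasswordResets_Create_Failed{environment="production"}[1w])
> 3

Since z-scores can also be strongly negative, you could additionally alert on values below -3 if a sudden drop is meaningful for the metric.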

However, this is where the Goldilocks problem might come in:

  • Too little past data results in false positives.
  • With a significant amount of past data, we avoid giving too much weight to abnormal behaviour that might have occurred in the recent past, which could otherwise cause a true positive to go undetected.
  • Our data should span a reasonable amount of time to build more accurate anomaly detection. For a start, ask yourself whether the data you expect in the past 12 hours looks the same as the past 1 week, or the past 1 month. In this case, password reset activity differs between day and night, and between weekdays and weekends, but is expected to be the same week-on-week. Therefore, 1 week is a reasonable amount of time compared to 12 hours or a month.

To wrap it all up like a burrito

Anomaly detection makes our alerts smarter, and z-score is just one of many methods we can use to implement an anomaly detection alert. However, we need to ensure that our dataset is normally distributed. Because different metrics produce different kinds of datapoints, there's no one-size-fits-all solution. This might be a complex concept to grasp at first, but through experimentation you'll eventually get the hang of it!

If you're interested to find out more about what we do, check us out at https://www.ascendaloyalty.com/