Grubbs’ Test for Anomaly Detection

Problem Statement:

We receive a time series of daily counts and want to detect whenever the count changes drastically.

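Grubbs’ test assumes that the data are approximately normally distributed. It takes the point with the largest absolute z-score and compares that score with a critical value derived from the t-distribution at the chosen significance level (alpha). If the point is an outlier, we remove it and repeat the test on the remaining data. Here is the pseudocode: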

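GrubbsTest(X, alpha=0.05):
    outliers <- []
    while len(X) > 2:
        n <- len(X)
        Z <- abs(X - mean(X)) / sd(X)      # sd with n - 1 in the denominator
        zi, index <- max(Z), argmax(Z)
        if zi > threshold(n, alpha):
            append X[index] to outliers
            remove X[index]
        else:
            break
    return outliers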
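
In the pseudocode, zi is the Grubbs statistic G, and the threshold step is its critical value. The standard two-sided form, as given on the NIST handbook page listed under Reference, is shown below. Here \bar{X} and s are the sample mean and standard deviation, and t_{α/(2n), n−2} is the upper α/(2n) critical value of the t-distribution with n − 2 degrees of freedom. H0 is rejected when G exceeds the right-hand side.

```latex
G = \frac{\max_{i} \lvert X_i - \bar{X} \rvert}{s},
\qquad
G > \frac{n-1}{\sqrt{n}}
    \sqrt{\frac{t_{\alpha/(2n),\,n-2}^{2}}{n - 2 + t_{\alpha/(2n),\,n-2}^{2}}}
```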

 

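Traditionally, Grubbs’ test has the alternative hypothesis that exactly one outlier is present in the data. The pseudocode above modifies this to find all outliers by re-running the test after each removal. The generalized ESD test listed below is the formal version of this idea for up to r outliers.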
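
The repeated test can be sketched in Python as follows. This is a minimal sketch, not the original implementation. It assumes NumPy and SciPy are available. The names `grubbs_critical_value` and `grubbs_outliers` are hypothetical helpers, not library functions. The critical value uses the two-sided formula with the t quantile from `scipy.stats.t.ppf`.

```python
import numpy as np
from scipy import stats


def grubbs_critical_value(n: int, alpha: float = 0.05) -> float:
    """Two-sided Grubbs critical value for a sample of size n (n >= 3)."""
    # Upper alpha/(2n) critical value of Student's t with n - 2 degrees of freedom.
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    return float((n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2)))


def grubbs_outliers(x, alpha: float = 0.05) -> list[int]:
    """Repeat the two-sided Grubbs test, removing one outlier per pass.

    Returns the positions (in the original input) of the removed points,
    in the order they were removed.
    """
    values = np.asarray(x, dtype=float)
    remaining = np.arange(len(values))  # original positions still in the sample
    outliers = []
    while len(remaining) > 2:  # the critical value needs n - 2 >= 1
        sample = values[remaining]
        sd = sample.std(ddof=1)  # sample standard deviation (n - 1 denominator)
        if sd == 0:  # all remaining values are identical: nothing to flag
            break
        z = np.abs(sample - sample.mean()) / sd
        i = int(np.argmax(z))
        if z[i] <= grubbs_critical_value(len(sample), alpha):
            break  # the most extreme point is not an outlier, so stop
        outliers.append(int(remaining[i]))
        remaining = np.delete(remaining, i)
    return outliers


print(grubbs_outliers([10, 12, 11, 13, 12, 11, 10, 12, 95, 11]))  # → [8]
```

In the usage line, the spike at position 8 is removed on the first pass (G ≈ 2.85 against a critical value of about 2.29 for n = 10). The second pass on the remaining nine values finds nothing, so the loop stops.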

 

Test hypotheses:
Test                    Null hypothesis (H0)           Alternative hypothesis (Ha)
Grubbs’ test            No outliers in the data set    Exactly one outlier
Tietjen-Moore test      No outliers in the data set    Exactly k outliers
Generalized ESD test    No outliers in the data set    Up to r outliers
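
Repeating Grubbs’ test can suffer from masking. Two similar outliers inflate the standard deviation together, so the first pass may flag neither of them. The generalized ESD test avoids this in two steps. First, it computes all r statistics R_i, removing the most extreme point each time. Then it takes the largest i whose R_i exceeds its critical value λ_i. Below is a minimal sketch of that procedure (Rosner’s formulation), assuming NumPy and SciPy. `generalized_esd` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats


def generalized_esd(x, max_outliers: int, alpha: float = 0.05) -> list[int]:
    """Generalized ESD test for up to `max_outliers` outliers.

    Returns positions (in the original input) of the detected outliers.
    """
    values = np.asarray(x, dtype=float)
    n = len(values)
    remaining = np.arange(n)  # original positions still in the sample
    removed = []  # position removed at step i (i = 1..r)
    exceeds = []  # whether R_i > lambda_i at step i
    for i in range(1, max_outliers + 1):
        if len(remaining) < 3:  # lambda_i needs n - i - 1 >= 1 degrees of freedom
            break
        sample = values[remaining]
        sd = sample.std(ddof=1)
        if sd == 0:
            break
        dev = np.abs(sample - sample.mean())
        j = int(np.argmax(dev))
        r_i = dev[j] / sd  # test statistic R_i
        # Critical value lambda_i for the current sample of n - i + 1 points.
        p = 1 - alpha / (2 * (n - i + 1))
        t = stats.t.ppf(p, n - i - 1)
        lam_i = (n - i) * t / np.sqrt((n - i - 1 + t**2) * (n - i + 1))
        removed.append(int(remaining[j]))
        exceeds.append(bool(r_i > lam_i))
        remaining = np.delete(remaining, j)
    # The number of outliers is the LARGEST i with R_i > lambda_i, even if
    # an earlier R_i fell below its critical value (this handles masking).
    k = max((i + 1 for i, e in enumerate(exceeds) if e), default=0)
    return removed[:k]


print(generalized_esd([10, 11, 12, 11, 10, 12, 11, 100, 101], max_outliers=2))  # → [8, 7]
```

On this input, a single Grubbs pass at α = 0.05 flags nothing (G ≈ 1.78 against a critical value of about 2.22 for n = 9). Once 101 is set aside, R_2 ≈ 2.47 exceeds λ_2 ≈ 2.13, so both points are reported.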

 

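The pseudocode above uses the absolute z-score, so it is already a two-sided test. A one-sided variant checks only the largest value (or only the smallest) and uses α/n instead of α/(2n) when computing the t quantile for the critical value.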

 

We applied this to time-series anomaly detection and considered the following preprocessing options before running Grubbs’ test:

  • Raw counts (no preprocessing)
  • Residuals after STL decomposition
  • Residuals after fitting an ARIMA model

In our case, the raw counts worked well enough.
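
Whether raw counts suffice depends on how seasonal the series is. The sketch below uses synthetic data, not our production counts: four weeks with a weekend dip and one spike. As a simple, hypothetical stand-in for STL residuals, `seasonal_residuals` subtracts the median of each weekday. This is not STL. With statsmodels installed, `STL(series, period=7).fit().resid` from `statsmodels.tsa.seasonal` would be the closer equivalent. NumPy and SciPy are assumed.

```python
import numpy as np
from scipy import stats


def seasonal_residuals(counts, period: int = 7) -> np.ndarray:
    """HYPOTHETICAL stand-in for STL residuals (not STL itself): subtract the
    median of all points at the same position in the cycle. No trend removal."""
    y = np.asarray(counts, dtype=float)
    phase = np.arange(len(y)) % period
    # Median per phase (e.g. per weekday); assumes at least one full cycle.
    medians = np.array([np.median(y[phase == p]) for p in range(period)])
    return y - medians[phase]


def grubbs_outliers(x, alpha: float = 0.05) -> list[int]:
    """Iterative two-sided Grubbs test; returns original positions of outliers."""
    values = np.asarray(x, dtype=float)
    remaining = np.arange(len(values))
    outliers = []
    while len(remaining) > 2:
        sample = values[remaining]
        n, sd = len(sample), sample.std(ddof=1)
        if sd == 0:
            break
        z = np.abs(sample - sample.mean()) / sd
        i = int(np.argmax(z))
        t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
        g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
        if z[i] <= g_crit:
            break
        outliers.append(int(remaining[i]))
        remaining = np.delete(remaining, i)
    return outliers


# Synthetic example: four weeks of daily counts with a weekend dip,
# plus one spike on day 10.
week = [100, 104, 98, 102, 100, 20, 22]
counts = np.array(week * 4, dtype=float)
counts[10] = 170

print(grubbs_outliers(counts))                      # raw counts → []
print(grubbs_outliers(seasonal_residuals(counts)))  # residuals  → [10]
```

On the raw counts, the weekend values inflate the standard deviation, so the spike’s z-score (about 2.2) stays below the critical value for n = 28 (about 2.9). On the residuals, the spike is the only non-zero value and is flagged.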

 

Reference:

https://www.itl.nist.gov/div898/handbook/eda/section3/eda35h1.htm


 
