anomaly detection | Data Stories

These are notes from Andrew Ng’s machine learning course on Coursera.

Anomaly Detection Problem

An aircraft engine comes off the assembly line and you want to check whether it is okay.
- Features can be
  - heat generated
  - vibration intensity
Fraud detection in finance/retail
- Features would be based on the user’s activity
Monitoring CPUs in a data center
- Features would be memory used, CPU load, network traffic.
Generally we have far fewer positive (anomalous) examples than negative (normal) ones.

Solution Based on density Estimation

We fit a Gaussian distribution to the normal examples
For a new engine x we estimate its probability p(x)
If p(x) < epsilon, we flag the new engine as anomalous
We generally train one Gaussian per feature and multiply them, like naive Bayes
- That is to say, the off-diagonal elements of the covariance matrix in the multivariate Gaussian are zero
- More details in the last section of this post (Multivariate Gaussian)
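The per-feature density estimate described above can be sketched roughly as follows. This is a minimal illustration, not the course's code: the function names `fit_gaussians` and `density`, the synthetic heat/vibration data and the value of `epsilon` are all assumptions made for the example.

```python
import numpy as np

def fit_gaussians(X):
    """Estimate a mean and a variance for each feature (column) of X."""
    mu = X.mean(axis=0)
    var = X.var(axis=0)  # maximum-likelihood estimate (divides by m)
    return mu, var

def density(X, mu, var):
    """p(x) = product over features of the univariate Gaussian density."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * var)
    expo = np.exp(-((X - mu) ** 2) / (2.0 * var))
    return np.prod(coef * expo, axis=1)

# Synthetic "normal" engines: columns are heat and vibration (made-up values).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[50.0, 5.0], scale=[2.0, 0.5], size=(500, 2))
mu, var = fit_gaussians(X_train)

epsilon = 1e-4  # hypothetical threshold; tune it on a cross validation set
X_new = np.array([[50.5, 5.1],    # close to the training data
                  [65.0, 9.0]])   # far from the training data
p = density(X_new, mu, var)
is_anomaly = p < epsilon
print(is_anomaly)
```

Multiplying the per-feature densities is exactly the independence assumption mentioned above: no covariance between features is modeled.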

Model Evaluation

Once we have multiple models with different features, how do we evaluate which one is better?
We also need to tune the epsilon parameter.
We can use the standard setup of a train set, a cross validation set and a test set
The train set would have normal examples only. It is okay if a few anomalous samples slip in
- So training is unsupervised
- Predict y = 1 if p(x) < epsilon else 0
The cross validation and test sets contain labeled normal and anomalous examples
Bad metric
- classification accuracy (because classes are imbalanced)
Good metric
- TP, FP, TN, FN
- Precision/recall
- F1 score
Cross validation set is used for tuning epsilon
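As a rough sketch of tuning epsilon on the cross validation set, the code below scans candidate thresholds and keeps the one with the best F1 score. The helper names `precision_recall_f1` and `select_epsilon`, the number of scan steps and the density values are assumptions made for illustration only.

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the anomalous class (y = 1)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

def select_epsilon(p_cv, y_cv, n_steps=1000):
    """Scan thresholds between min and max p(x); keep the best F1 on the CV set."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), n_steps):
        y_pred = (p_cv < eps).astype(int)
        _, _, f1 = precision_recall_f1(y_cv, y_pred)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1

# Made-up CV densities: 8 normal examples followed by 2 anomalies.
p_cv = np.array([0.20, 0.18, 0.25, 0.22, 0.19, 0.21, 0.23, 0.17, 0.001, 0.004])
y_cv = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
eps, f1 = select_epsilon(p_cv, y_cv)

# Always predicting "normal" looks fine on accuracy but has an F1 of 0.
all_normal = np.zeros_like(y_cv)
accuracy = np.mean(all_normal == y_cv)
_, _, f1_all_normal = precision_recall_f1(y_cv, all_normal)
print(eps, f1, accuracy, f1_all_normal)
```

The last few lines show why classification accuracy is a bad metric here: a model that never flags anything still gets most examples right.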

Anomaly Detection vs Supervised Learning

A supervised model like logistic regression would require
- 1) More training examples
- 2) Somewhat balanced classes
Anomaly detection fits better when anomalous examples are too few to meet these requirements.

Feature Engineering

Since we are fitting a Gaussian, we need to transform any feature whose distribution does not look like one. Popular transformations are
- log (x)
- log (x + c)
- sqrt (x)
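One way to compare these transformations is to check how symmetric the feature becomes afterwards. The sketch below uses sample skewness as a rough proxy for "looks Gaussian"; the `skewness` helper, the synthetic log-normal feature and the offset `c = 1.0` are assumptions for illustration, and in practice a histogram is the usual check.

```python
import numpy as np

def skewness(x):
    """Sample skewness: roughly 0 for a symmetric, Gaussian-like feature."""
    centered = x - x.mean()
    return np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # synthetic right-skewed feature

c = 1.0  # hypothetical offset; useful when x can be 0
candidates = {
    "x": x,
    "log(x)": np.log(x),
    "log(x + c)": np.log(x + c),
    "sqrt(x)": np.sqrt(x),
}
skews = {name: skewness(values) for name, values in candidates.items()}
for name, s in skews.items():
    print(f"{name:>10}: skewness = {s:.2f}")
```

Which transformation works best depends on the feature, so it is worth trying several and keeping the one closest to symmetric.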
How to introduce a new feature
- We need to do this when p(x) is comparable for normal and anomalous samples
- Once you find an anomalous sample for which p(x) is not low enough, look closely at it
- The property that makes it anomalous is a new feature to add
Feature Engineering Recommendation
- Think about features that will be unusually high or low in case of an anomaly
- x5 and x6 can be good features in the image below.

Multivariate Gaussian

The shortcoming of individual Gaussians is that they cannot detect an anomaly that shows up only in the relationship between correlated features.
- The green sample in the image above will not be detected
To mitigate this we can hand-code ratio-based features
The original model is more popular because it scales well with the number of features.
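To make the difference concrete, here is a small sketch comparing the per-feature model with a full-covariance multivariate Gaussian on two synthetic, strongly correlated features. The function names, the correlation of 0.9 and the test points are assumptions chosen for the example.

```python
import numpy as np

def independent_density(X, mu, var):
    """Product of per-feature Gaussians (diagonal covariance)."""
    per_feature = np.exp(-((X - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
    return np.prod(per_feature, axis=1)

def multivariate_density(X, mu, sigma):
    """Multivariate Gaussian density with a full covariance matrix sigma."""
    n = mu.shape[0]
    diff = X - mu
    # Mahalanobis term diff^T Sigma^{-1} diff, computed row by row
    maha = np.sum((diff @ np.linalg.inv(sigma)) * diff, axis=1)
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(sigma))
    return np.exp(-0.5 * maha) / norm

# Synthetic correlated features (think CPU load and network traffic, standardized).
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.9], [0.9, 1.0]])
X_train = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=2000)

mu = X_train.mean(axis=0)
var = X_train.var(axis=0)
sigma = np.cov(X_train, rowvar=False)

# Both points are equally far from the mean per feature,
# but the second one breaks the correlation.
X_new = np.array([[1.5, 1.5], [1.5, -1.5]])
p_ind = independent_density(X_new, mu, var)
p_multi = multivariate_density(X_new, mu, sigma)
print("independent :", p_ind)
print("multivariate:", p_multi)
```

The per-feature model assigns the two points similar probabilities, while the multivariate model gives the second point a far lower one. The cost is estimating and inverting an n x n covariance matrix, which is why the simpler model scales better with the number of features.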

Problem Statement:

We receive a time series of count data every day and want to detect whenever there is a drastic change in the count.

Grubbs’ test assumes the input is approximately normally distributed and flags an outlier at the required significance level; its critical value is derived from the t-distribution. We remove the outlier and repeat the test. Here is the pseudocode:

Grubbs Test(X, p-val=0.05):
    Repeat :
        Z <- zscore(X)
        n <- len(X)
        zi, index <- max(abs(Z)), index(max(abs(Z)))
        if zi > threshold(n, p-val):
            remove X[index]
        else:
            break
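A runnable Python sketch of the pseudocode above could look like this. It assumes NumPy and SciPy are available and uses the two-sided critical value from the NIST handbook, G > ((n - 1) / sqrt(n)) * sqrt(t^2 / (n - 2 + t^2)), where t is the upper critical value of the t-distribution with n - 2 degrees of freedom at significance level alpha / (2n). The function names, the use of the sample standard deviation (ddof=1) and the daily counts are assumptions for illustration, not the original gist.

```python
import numpy as np
from scipy import stats

def grubbs_threshold(n, alpha=0.05):
    """Two-sided Grubbs critical value for n points at significance level alpha."""
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    return (n - 1) / np.sqrt(n) * np.sqrt(t ** 2 / (n - 2 + t ** 2))

def iterative_grubbs(x, alpha=0.05):
    """Repeatedly remove the most extreme point while it exceeds the threshold.

    Returns the indices (into the original array) of the removed points.
    """
    x = np.asarray(x, dtype=float)
    keep = np.arange(len(x))      # original indices still in the sample
    outliers = []
    while len(keep) > 2:          # the t-distribution needs n - 2 > 0
        values = x[keep]
        sd = values.std(ddof=1)   # sample standard deviation
        if sd == 0:
            break                 # all remaining values are identical
        z = np.abs(values - values.mean()) / sd
        i = int(np.argmax(z))
        if z[i] <= grubbs_threshold(len(values), alpha):
            break
        outliers.append(int(keep[i]))
        keep = np.delete(keep, i)
    return outliers

# Made-up daily counts with one obvious spike.
counts = [102, 98, 101, 99, 100, 97, 103, 100, 250, 99, 101, 98]
print(iterative_grubbs(counts))
```

The z-scores are recomputed after every removal, because a large outlier inflates the standard deviation and can hide smaller ones.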

Traditionally, Grubbs’ test has the alternative hypothesis that exactly one outlier is present in the data. In the pseudocode above we modified it to find all possible outliers.

Test	Null hypothesis (H0)	Alternative hypothesis (Ha)
Grubbs’ test	There are no outliers in the data set	There is exactly one outlier in the data set
Tietjen-Moore test	There are no outliers in the data set	There are exactly k outliers
Generalized ESD test	There are no outliers in the data set	There are up to r outliers

The above can be extended to two-sided tests as well.

We followed this approach for time-series anomaly detection and considered the following preprocessing options before applying Grubbs’ test:

Raw Count (No processing)
Residuals after STL decomposition
Residuals after fitting ARIMA

In our case the raw count worked well enough.

Reference:

https://www.itl.nist.gov/div898/handbook/eda/section3/eda35h1.htm
