Check outliers by z-score transformation

202307061152
Status:
Tags: Statistics Hypothesis testing

Convert raw scores on a dependent variable into z-scores to check for outliers. In a normal distribution, 95% of observations lie within the interval [-1.96; +1.96]. If more than 5% of observations have an absolute z-score larger than 1.96 (roughly 2), there is reason to suspect serious outliers.

An observation with an absolute z-score greater than 3 is very unlikely: in a normal distribution, only about 0.3% of observations fall that far from the mean.

Implementation

In MATLAB: zscore(X).
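
A minimal sketch of the check described above. The data are made up for illustration (200 standard-normal draws plus two planted extreme values), and the variable names are hypothetical. zscore is part of the Statistics and Machine Learning Toolbox and standardizes with the sample standard deviation by default.

```matlab
% Hypothetical data: 200 standard-normal draws plus two planted extremes.
rng(1);                               % fix the seed so the sketch is repeatable
x = [randn(200, 1); 6; -5.5];

z = zscore(x);                        % same as (x - mean(x)) ./ std(x)

shareBeyond196 = mean(abs(z) > 1.96); % roughly 0.05 is expected under normality
extremeIdx     = find(abs(z) > 3);    % observations worth a closer look

fprintf('Share with |z| > 1.96: %.3f\n', shareBeyond196);
fprintf('Observations with |z| > 3: %d\n', numel(extremeIdx));
```

A share well above 0.05, or any entries in extremeIdx, would be the cue to inspect those observations before deciding anything about exclusion.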

Further Testing

Code suspected outliers as 1 and all other observations as 0, then run a logistic regression with Outlier(0,1) as the binary dependent variable and all independent variables as predictors. If none of the predictors has a significant effect, there is no reason to exclude an outlier with an absolute z-score smaller than 3.
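
One way this follow-up test might look in MATLAB, as a sketch only. The predictors x1 and x2, the dependent variable y, and the 1.96 cutoff for "suspected" are all assumptions made for illustration. fitglm (Statistics and Machine Learning Toolbox) with a binomial distribution fits the logistic regression.

```matlab
% Hypothetical data: two independent variables and one dependent variable.
rng(2);
n  = 300;
x1 = randn(n, 1);
x2 = randn(n, 1);
y  = 0.5*x1 - 0.3*x2 + randn(n, 1);

% Code suspected outliers as 1, all other observations as 0.
% The 1.96 cutoff is an assumption; use whatever rule flagged the cases.
isOutlier = double(abs(zscore(y)) > 1.96);

% Logistic regression of the outlier indicator on all independent variables.
tbl = table(x1, x2, isOutlier);
mdl = fitglm(tbl, 'isOutlier ~ x1 + x2', 'Distribution', 'binomial');

disp(mdl.Coefficients);   % check the pValue column for significant effects
```

If no predictor shows a significant effect in mdl.Coefficients, the flagged cases are not systematically related to the independent variables, which is the condition the note gives for keeping them.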


First-pass statistical exploration


References