First-pass statistical exploration

Tags: Statistics, Hypothesis testing

After defining hypotheses, the first step in a statistical analysis is to visualize the data and look for any initial issues.

Visualize the distribution of the dependent variable for each group.

  • Symmetric or skewed distribution?
  • Unimodal or multimodal?
  • Conspicuous outliers?
  • Range of values?
  • Mean? (or should we look at some other measure, e.g. median?)
  • Standard deviation?
  • Most common values? (mode)
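
The numeric questions in the checklist above can be answered per group with a few lines of Python. This is only a minimal sketch using the standard library: the `describe` helper, the group names, and the sample values are all made up for illustration, and the sign of mean minus median is just a rough hint about skew, not a formal test.

```python
import statistics

def describe(values):
    """Return basic descriptive statistics for one group's values."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    return {
        "n": len(values),
        "min": min(values),                        # range of values: lower end
        "max": max(values),                        # range of values: upper end
        "mean": mean,
        "median": median,
        "stdev": statistics.stdev(values),         # sample standard deviation
        "modes": statistics.multimode(values),     # most common value(s)
        # Rough skew hint: positive suggests a longer right tail.
        "mean_minus_median": mean - median,
    }

# Hypothetical data: dependent variable split by group.
groups = {
    "control": [4, 5, 5, 6, 7, 5, 6],
    "treatment": [5, 6, 7, 7, 8, 9, 21],
}

summaries = {name: describe(vals) for name, vals in groups.items()}

for name, stats in summaries.items():
    print(name, stats)
```

In this made-up data the treatment group's mean sits well above its median, which is the kind of gap that suggests a skewed distribution or a conspicuous outlier and is worth confirming in a plot.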


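The boxplot tells us some order-based characteristics: the median (the line inside the box), the first and third quartiles (the edges of the box), and the lowest and highest values that don’t qualify as outliers (the ends of the whiskers). Outliers are values that lie more than 1.5*IQR (interquartile range) beyond the edges of the box (circle); values more than 3*IQR beyond the box are extreme outliers (star).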
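
The 1.5*IQR and 3*IQR rules can be sketched in code as follows. This is an illustration under stated assumptions rather than what any particular plotting tool does: `boxplot_summary` is a hypothetical helper, the sample values are invented, and the quartiles use the "inclusive" method from Python's `statistics` module, while other software may compute quartiles slightly differently and so flag slightly different points.

```python
import statistics

def boxplot_summary(values):
    """Compute the quantities a boxplot displays for one group."""
    # Assumption: "inclusive" quartiles; plotting tools differ slightly here.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1

    inside, mild, extreme = [], [], []
    for v in sorted(values):
        # Distance beyond the nearer box edge (zero or negative inside the box).
        distance = max(q1 - v, v - q3)
        if distance > 3 * iqr:
            extreme.append(v)      # would be drawn as a star
        elif distance > 1.5 * iqr:
            mild.append(v)         # would be drawn as a circle
        else:
            inside.append(v)

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],    # lowest value that is not an outlier
        "whisker_high": inside[-1],  # highest value that is not an outlier
        "mild_outliers": mild,
        "extreme_outliers": extreme,
    }

# Hypothetical sample values for one group.
result = boxplot_summary([2, 4, 5, 6, 7, 8, 9, 16, 30])
print(result)
```

For this sample the box runs from 5 to 9, so the IQR is 4: the value 16 lies 7 beyond the upper edge (more than 1.5*IQR = 6) and 30 lies 21 beyond it (more than 3*IQR = 12).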

Once you’ve visualized the data, you can check the outliers.
