First-pass statistical exploration

202307051856
Status:
Tags: Statistics Hypothesis testing

After defining hypotheses, the first step in a statistical analysis is to visualize the data and look for any initial issues.

Visualize the distribution of the dependent variable for each group.

  • Symmetric or skewed distribution?
  • Unimodal or multimodal?
  • Conspicuous outliers?
  • Range of values?
  • Mean? (or should we look at some other measure, e.g. median?)
  • Standard deviation?
  • Most common values? (mode)
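
Most of the questions above can be answered numerically alongside the plot. The following is a minimal sketch using only Python's standard library; the group names and values are made-up illustration data, `describe` is a hypothetical helper, and the skew measure (Pearson's second skewness coefficient) is just one simple choice among several.

```python
import statistics

def describe(values):
    """Return basic descriptive statistics for one group (hypothetical helper)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "median": median,
        "sd": sd,
        # multimode returns every value tied for most frequent
        "modes": statistics.multimode(values),
        # Pearson's second skewness coefficient: positive means a longer right tail
        "skew": 3 * (mean - median) / sd if sd > 0 else 0.0,
    }

# Made-up data: one dependent variable measured in two groups
groups = {
    "control": [4, 5, 5, 6, 7, 5, 6],
    "treatment": [5, 6, 7, 7, 8, 9, 21],
}

summaries = {name: describe(vals) for name, vals in groups.items()}
for name, stats in summaries.items():
    print(name, stats)
```

When the mean sits well above the median, as in the made-up "treatment" group, the distribution is probably right-skewed or contains a high outlier, and the median may be the better summary.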

Example:


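The boxplot shows several order-based characteristics: the median (not the mean), the quartiles (the edges of the box), and the lowest and highest values that don’t qualify as outliers (the ends of the whiskers). Points lying more than 1.5*IQR (interquartile range) beyond the edges of the box are marked as outliers (circle), and points more than 3*IQR beyond are marked as extreme outliers (star).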
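
The IQR rule can also be applied directly to the numbers. This is a rough sketch under stated assumptions: `classify_outliers` is a hypothetical helper, the data are made up, and the quartiles come from `statistics.quantiles` with `method="inclusive"`. Statistical packages compute quartiles in slightly different ways, so borderline points may be flagged differently than in a given boxplot.

```python
import statistics

def classify_outliers(values):
    """Flag points beyond 1.5*IQR (mild) and 3*IQR (extreme) from the box edges."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    mild, extreme = [], []
    for v in values:
        # Distance outside the box; negative or zero for points inside it
        distance = max(q1 - v, v - q3)
        if distance > 3 * iqr:
            extreme.append(v)
        elif distance > 1.5 * iqr:
            mild.append(v)
    return {"q1": q1, "q3": q3, "iqr": iqr, "mild": mild, "extreme": extreme}

# Made-up data with one mild and one extreme high value
result = classify_outliers([4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 14, 25])
print(result)
```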

Once you’ve visualized the data, you can examine any outliers it reveals.



References