First-pass statistical exploration
202307051856
Status:
Tags: Statistics Hypothesis testing
After defining hypotheses, the first step in a statistical analysis is to visualize the data and look for any initial issues.
Visualize the distribution of the dependent variable for each group.
- Symmetric or skewed distribution?
- Unimodal or multimodal?
- Conspicuous outliers?
- Range of values?
- Mean? (or should we look at some other measure, e.g. median?)
- Standard deviation?
- Most common values? (mode)
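As a rough complement to the plots, the checklist above can be sketched numerically per group. This is a minimal sketch using only Python's standard `statistics` module; the group names and values are hypothetical, made up for illustration.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for one group's values."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "modes": statistics.multimode(values),  # all most-common values
    }


# Hypothetical data: the dependent variable measured in two groups.
groups = {
    "control": [4, 5, 5, 6, 7, 5, 6],
    "treatment": [6, 7, 8, 8, 9, 8, 30],
}

summaries = {name: summarize(vals) for name, vals in groups.items()}
for name, stats in summaries.items():
    print(name, stats)
```

A mean well above the median (as in the hypothetical treatment group) hints at right skew or a high outlier, which is a reason to prefer the median for that group. Multimodality is easier to judge from a histogram than from these numbers.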
Example:
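The boxplot shows several order-based characteristics of the data: the median (the line inside the box), the first and third quartiles (the box edges), and the lowest and highest values that don’t qualify as outliers (the whisker ends). A value counts as an outlier if it lies more than 1.5*IQR (interquartile range) beyond the box edges (circle) and as an extreme outlier if it lies more than 3*IQR beyond them (star).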
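The 1.5*IQR and 3*IQR thresholds can be sketched in code as follows. This is an illustrative sketch, not the exact procedure of any particular statistics package: the function name and the data are hypothetical, and the quartiles are computed with `statistics.quantiles(..., method="inclusive")`, which may differ slightly from the quartile definition your software uses.

```python
import statistics


def classify_outliers(values):
    """Split values into mild (>1.5*IQR) and extreme (>3*IQR) outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    mild, extreme, inside = [], [], []
    for v in sorted(values):
        # Distance beyond the box edges; zero or negative if within reach.
        distance = max(q1 - v, v - q3)
        if distance > 3 * iqr:
            extreme.append(v)
        elif distance > 1.5 * iqr:
            mild.append(v)
        else:
            inside.append(v)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],    # lowest value that is not an outlier
        "whisker_high": inside[-1],  # highest value that is not an outlier
        "mild": mild,
        "extreme": extreme,
    }


# Hypothetical sample with one mild and one extreme high value.
sample = [10, 11, 12, 12, 13, 13, 14, 15, 19, 30]
result = classify_outliers(sample)
print(result)
```

With this sample the quartiles are 12 and 14.75, so the IQR is 2.75. The value 19 lies more than 1.5*IQR above the upper quartile and 30 lies more than 3*IQR above it.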
Once you’ve visualized the data, you can check the outliers: are they recording errors or genuine extreme values?