I am an unapologetic lover of boxplots, and as such I also am an unapologetic hater of barplots. Yet, about 90% of the time I’m asked to help someone make a figure in R, or more specifically in ggplot2, I’m asked for a barplot. So, this blog post is dedicated to why I think whenever possible you should use a boxplot over a barplot. If I’ve convinced you, there’s a handy summary provided at the end of the post that you can print out and distribute to any current barplot users.
TAKE AWAY POINTS FROM THIS POST
- Histograms are the best way to see the spread of your data.
- Boxplots are the next best way.
- Barplots are the worst way. Don’t use them.
For today’s post I’ve created two sets of fake data. Both have the same mean (100) and the same standard deviation (20), but as we’ll see, their distributions are very different.
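The original data-generation code isn't shown here, so the following is only a minimal sketch of one way to build two such data sets in base R. The sample size, the seed, the `rescale()` helper, and the choice of `log(runif(n))` for the left-skewed data are all assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible
n <- 1000     # assumed sample size

# Hypothetical helper: shift and scale a vector to an exact mean and sd
rescale <- function(x, mean_target = 100, sd_target = 20) {
  mean_target + sd_target * (x - mean(x)) / sd(x)
}

data_normal <- rescale(rnorm(n))
# Assumption: the log of uniform draws stands in for the "logarithmic" data;
# it has a long tail of low values
data_log <- rescale(log(runif(n)))

fake_data <- data.frame(
  group = rep(c("normal", "logarithmic"), each = n),
  value = c(data_normal, data_log)
)

# Both groups report the same mean and standard deviation
aggregate(value ~ group, data = fake_data,
          FUN = function(v) c(mean = mean(v), sd = sd(v)))
```

Rescaling after sampling forces the sample mean and standard deviation to match exactly, rather than only approximately as they would straight out of `rnorm()`.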
First we’ll make some histograms to show that the two data sets have clearly different distributions. The first data set has a normal distribution, but the second is logarithmic.
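A faceted histogram along these lines could be drawn in ggplot2 roughly as follows. This is a sketch rather than the code behind the original figures: the data are regenerated with assumed settings (seed, sample size, `log(runif(n))` for the skewed group), and the bin count is arbitrary.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed
n <- 1000     # assumed sample size

# Hypothetical helper: force mean 100 and sd 20
rescale <- function(x) 100 + 20 * (x - mean(x)) / sd(x)

fake_data <- data.frame(
  group = rep(c("normal", "logarithmic"), each = n),
  value = c(rescale(rnorm(n)), rescale(log(runif(n))))
)

# One histogram panel per data set, stacked so the shapes are easy to compare
p_hist <- ggplot(fake_data, aes(x = value)) +
  geom_histogram(bins = 30) +
  facet_wrap(~ group, ncol = 1) +
  labs(x = "Value", y = "Count")

p_hist
```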
This difference can be seen in the boxplots too, although to a lesser extent than in the histograms. The box for the normal distribution has roughly evenly sized quartiles on either side of the median, and only a couple of outliers. Conversely, the logarithmic data has a long, left-skewed distribution, as can be seen from all of the low-value outliers.
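As a rough sketch of the boxplot version, again using regenerated data with assumed settings rather than the original data sets, `geom_boxplot()` draws the figure and base R's `boxplot.stats()` lists the points that fall beyond the whiskers:

```r
library(ggplot2)

set.seed(42)  # arbitrary seed
n <- 1000     # assumed sample size

# Hypothetical helper: force mean 100 and sd 20
rescale <- function(x) 100 + 20 * (x - mean(x)) / sd(x)

data_normal <- rescale(rnorm(n))
data_log    <- rescale(log(runif(n)))  # assumed stand-in for the skewed data

fake_data <- data.frame(
  group = rep(c("normal", "logarithmic"), each = n),
  value = c(data_normal, data_log)
)

p_box <- ggplot(fake_data, aes(x = group, y = value)) +
  geom_boxplot() +
  labs(x = "Data set", y = "Value")

p_box

# Points beyond the whiskers (more than 1.5 * IQR from the box) in the skewed data
outliers_log <- boxplot.stats(data_log)$out
range(outliers_log)
```

In this sketch, every flagged point in the skewed data sits below the mean, which matches the pile of low-value outliers described above.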
Turning to our barplots, we see that they are identical! Since both data sets have the same mean and standard deviation, the barplots completely hide the difference between them. This can be a problem if, for example, you want to run a statistical test for differences between these two groups. One, many statistical tests assume the data are normally distributed, and a barplot will not warn you that part of your data is non-normal. Two, if you run a test that only looks at means, you won’t know that the two groups differ in their distributions.
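To see why the bars come out identical, here is a sketch of a typical barplot. It assumes bar height is the group mean and the error bars span one standard deviation either side, which may not match the original figure exactly. The data are regenerated with assumed settings.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed
n <- 1000     # assumed sample size

# Hypothetical helper: force mean 100 and sd 20
rescale <- function(x) 100 + 20 * (x - mean(x)) / sd(x)

data_normal <- rescale(rnorm(n))
data_log    <- rescale(log(runif(n)))  # assumed stand-in for the skewed data

# A barplot only ever sees these two numbers per group
summary_data <- data.frame(
  group      = c("normal", "logarithmic"),
  mean_value = c(mean(data_normal), mean(data_log)),
  sd_value   = c(sd(data_normal), sd(data_log))
)

p_bar <- ggplot(summary_data, aes(x = group, y = mean_value)) +
  geom_col() +
  geom_errorbar(aes(ymin = mean_value - sd_value,
                    ymax = mean_value + sd_value),
                width = 0.2) +
  labs(x = "Data set", y = "Mean value")

p_bar
```

Because `summary_data` holds the same mean and standard deviation in both rows, nothing about the shape of either distribution can make it into the plot.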
Hopefully, after this post, you see that whenever you are plotting data with a distribution, boxplots are preferable to barplots. If you want to spread the word, feel free to distribute this handy summary. Happy (hopefully boxplot) plotting!