Box-and-Whisker Plots

To understand box-and-whisker plots, you have to understand medians and quartiles of a data set.

The median is the middle number of a data set arranged in increasing order, or the average of the two middle numbers if there is an even number of data points.
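As a small illustration, the rule for the median can be written as a Python function. This is a minimal sketch; the function name `median` is our own choice. Python's standard library also offers `statistics.median`, which follows the same odd/even rule.

```python
def median(values):
    """Return the median of a list of numbers."""
    data = sorted(values)              # the median requires the data in increasing order
    n = len(data)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]               # odd count: the single middle value
    return (data[mid - 1] + data[mid]) / 2   # even count: average of the two middle values


print(median([2, 6, 7, 8, 8, 11, 12, 13, 14, 15, 22, 23]))  # 11.5
print(median([5, 1, 3]))                                     # 3
```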

The median (Q2) divides the data set into two parts, the lower half and the upper half. If the number of data points is odd, the median itself is left out of both halves. The lower quartile (Q1) is the median of the lower half, and the upper quartile (Q3) is the median of the upper half.
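The following sketch computes all three quartiles using the convention in this lesson: split the sorted data at the median, leave the median out when the count is odd, and take the median of each half. The names `median` and `quartiles` are illustrative. Other conventions exist, and library routines such as NumPy's `percentile` interpolate between data points, so they can return slightly different values for Q1 and Q3.

```python
def median(sorted_data):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_data)
    mid = n // 2
    if n % 2 == 1:
        return sorted_data[mid]
    return (sorted_data[mid - 1] + sorted_data[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3); Q1 and Q3 are the medians of the lower and upper halves.

    With an odd number of values, the median itself is left out of both halves.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values to form lower and upper halves")
    half = n // 2
    lower = data[:half]                # values before the median position
    upper = data[half + (n % 2):]      # skip the middle value when n is odd
    return median(lower), median(data), median(upper)
```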

Example:

Find Q1, Q2, and Q3 for the following data set, and draw a box-and-whisker plot.

2, 6, 7, 8, 8, 11, 12, 13, 14, 15, 22, 23

There are 12 data points. The middle two are 11 and 12. So the median, Q2, is 11.5.

The "lower half" of the data set is the set {2, 6, 7, 8, 8, 11}. The median here is 7.5. So Q1 = 7.5.

The "upper half" of the data set is the set {12, 13, 14, 15, 22, 23}. The median here is 14.5. So Q3 = 14.5.
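The steps of this example can be checked with Python's standard `statistics.median` function, slicing the sorted list into its lower and upper halves by hand.

```python
from statistics import median

data = [2, 6, 7, 8, 8, 11, 12, 13, 14, 15, 22, 23]  # already sorted, 12 values

q2 = median(data)          # average of the 6th and 7th values, 11 and 12
lower_half = data[:6]      # [2, 6, 7, 8, 8, 11]
upper_half = data[6:]      # [12, 13, 14, 15, 22, 23]
q1 = median(lower_half)    # average of 7 and 8
q3 = median(upper_half)    # average of 14 and 15

print(q1, q2, q3)  # 7.5 11.5 14.5
```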

A box-and-whisker plot displays the values Q1, Q2, and Q3, along with the extreme values of the data set (2 and 23, in this case).

A box-and-whisker plot shows a "box" with its left edge at Q1 and its right edge at Q3, a line inside the box at Q2 (the median), and "whiskers" extending from the box out to the minimum and maximum values.
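To make the layout concrete, here is a rough sketch that draws a one-line text version of such a plot. The function names, the `width` parameter, and the characters used (`+` for the minimum and maximum, `[` and `]` for the box edges, `|` for the median) are arbitrary choices for this illustration. It draws the whiskers all the way to the minimum and maximum, with no special handling of outliers.

```python
def median(sorted_data):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_data)
    mid = n // 2
    if n % 2 == 1:
        return sorted_data[mid]
    return (sorted_data[mid - 1] + sorted_data[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max); the median is left out of the halves when n is odd."""
    data = sorted(values)
    half = len(data) // 2
    lower, upper = data[:half], data[half + len(data) % 2:]
    return data[0], median(lower), median(data), median(upper), data[-1]


def ascii_box_plot(values, width=50):
    """Draw a one-line text box plot scaled to `width` characters."""
    lo, q1, q2, q3, hi = five_number_summary(values)
    if hi == lo:
        raise ValueError("all values are equal; nothing to scale")
    scale = (width - 1) / (hi - lo)          # characters per data unit

    def col(x):
        return round((x - lo) * scale)       # column index for a data value

    line = [" "] * width
    for i in range(col(lo), col(q1)):
        line[i] = "-"                        # left whisker: min up to Q1
    for i in range(col(q3) + 1, col(hi) + 1):
        line[i] = "-"                        # right whisker: Q3 up to max
    line[col(q1)], line[col(q3)] = "[", "]"  # box edges
    line[col(q2)] = "|"                      # median inside the box
    line[col(lo)] = line[col(hi)] = "+"      # whisker ends at min and max
    return "".join(line)


print(ascii_box_plot([2, 6, 7, 8, 8, 11, 12, 13, 14, 15, 22, 23]))
```

For the example data, the `|` sits slightly right of the middle of the box, because Q2 = 11.5 is closer to Q3 = 14.5 than to Q1 = 7.5.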

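Note that the plot divides the data into four parts that each contain about the same number of data points (roughly 25% of the data), even though the parts can have very different lengths. The left whisker represents the bottom 25% of the data, the left half of the box represents the second 25%, the right half of the box represents the third 25%, and the right whisker represents the top 25%.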
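To see that each part holds about a quarter of the data, we can count the values in each part for the first example. A value exactly equal to a quartile would have to be assigned to one side or the other. This sketch assigns it to the part on the right, but no value in this data set equals a quartile, so the choice does not matter here.

```python
data = [2, 6, 7, 8, 8, 11, 12, 13, 14, 15, 22, 23]
q1, q2, q3 = 7.5, 11.5, 14.5   # quartiles found in the example above

# Count how many data values fall in each of the four parts of the plot.
parts = {
    "left whisker (min to Q1)": sum(1 for x in data if x < q1),
    "left half of box (Q1 to Q2)": sum(1 for x in data if q1 <= x < q2),
    "right half of box (Q2 to Q3)": sum(1 for x in data if q2 <= x < q3),
    "right whisker (Q3 to max)": sum(1 for x in data if x >= q3),
}
for name, count in parts.items():
    print(f"{name}: {count} of {len(data)} values ({count / len(data):.0%})")
```

Each part contains 3 of the 12 values, even though the right whisker (14.5 to 23) is much longer than the right half of the box (11.5 to 14.5).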

Outliers

If a data value is very far away from the quartiles (either much less than Q1 or much greater than Q3), it is sometimes designated an outlier. Instead of being shown using the whiskers of the box-and-whisker plot, outliers are usually shown as separately plotted points.

A common definition of an outlier is a value that is less than Q1 or greater than Q3 by more than 1.5 times the interquartile range (IQR = Q3 − Q1). That is, an outlier is any number less than Q1 − (1.5 × IQR) or greater than Q3 + (1.5 × IQR).
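The outlier rule translates directly into code. In this sketch, `outlier_fences` and `find_outliers` are hypothetical helper names, and the multiplier `k` defaults to the 1.5 from the definition; it is a parameter only so the rule's constant is easy to see.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3, k=1.5):
    """List the values lying strictly outside the fences, in their original order."""
    lower, upper = outlier_fences(q1, q3, k)
    return [x for x in values if x < lower or x > upper]


# Applied to the data set of the example that follows (Q1 = 46, Q3 = 56):
data = [5, 40, 42, 46, 48, 49, 50, 50, 52, 53, 55, 56, 58, 75, 102]
print(outlier_fences(46, 56))        # (31.0, 71.0)
print(find_outliers(data, 46, 56))   # [5, 75, 102]
```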

Example:

Find Q1, Q2, and Q3 for the following data set. Identify any outliers, and draw a box-and-whisker plot.

5, 40, 42, 46, 48, 49, 50, 50, 52, 53, 55, 56, 58, 75, 102

There are 15 values, arranged in increasing order. So, Q2 is the 8th data point, 50.

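The lower half consists of the 7 values before the median, so Q1 is its middle value, the 4th data point, 46. The upper half consists of the 7 values after the median, so Q3 is the 12th data point, 56.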

The interquartile range is IQR = Q3 − Q1 = 56 − 46 = 10.

Now we need to find whether there are values less than Q1 − (1.5 × IQR) or greater than Q3 + (1.5 × IQR).

Q1 − (1.5 × IQR) = 46 − 15 = 31

Q3 + (1.5 × IQR) = 56 + 15 = 71

Since 5 is less than 31, and 75 and 102 are both greater than 71, there are 3 outliers: 5, 75, and 102.
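The whole calculation for this example can be reproduced by position in the sorted list. Python lists are numbered from 0, so the 4th, 8th, and 12th data points are at indices 3, 7, and 11.

```python
data = [5, 40, 42, 46, 48, 49, 50, 50, 52, 53, 55, 56, 58, 75, 102]  # 15 sorted values

# The median is the 8th value; each half has 7 values,
# so Q1 and Q3 are the 4th and 12th values.
q1 = data[3]
q2 = data[7]
q3 = data[11]

iqr = q3 - q1                   # 56 - 46 = 10
lower_fence = q1 - 1.5 * iqr    # 46 - 15 = 31
upper_fence = q3 + 1.5 * iqr    # 56 + 15 = 71

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(q1, q2, q3, iqr, lower_fence, upper_fence, outliers)
# 46 50 56 10 31.0 71.0 [5, 75, 102]
```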

In the box-and-whisker plot for this data set, the whiskers end at 40 and 58, the smallest and largest values that are not outliers, and the outliers 5, 75, and 102 are plotted as separate points.
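Finding where the whiskers stop is one more filtering step: keep only the values inside the fences and take their minimum and maximum. This sketch hard-codes the fences 31 and 71 computed above.

```python
data = [5, 40, 42, 46, 48, 49, 50, 50, 52, 53, 55, 56, 58, 75, 102]
lower_fence, upper_fence = 31, 71   # Q1 - 1.5*IQR and Q3 + 1.5*IQR from the example

# Whiskers stop at the most extreme values that are NOT outliers.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(whisker_low, whisker_high, outliers)  # 40 58 [5, 75, 102]
```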