When reporting ‘central tendencies’ in data (that is, finding the ‘middle’ values), you often find that two measures are used:-
Mean: This is the sum of all values, divided by the number of available values.
Median: If all values are placed in order, this is the midpoint – the central value in the list, or the average of the central two values if there are an even number of values.
Let me explain further. Look at the following simple set of data:-
1, 8, 10, 7, 4, 8, 6, 2, 3, 7
To work out the mean of this set, we add all of the figures together and divide by 10 (the number of values available). This would give us:-
1+8+10+7+4+8+6+2+3+7 = 56, which divided by 10 returns a value of 5.6. So in this case, 5.6 is the mean.
To get the median, we first need to sort the values in ascending order, which would give us:-
1, 2, 3, 4, 6, 7, 7, 8, 8, 10
As there are an even number of figures here, the median is the average of the two middle figures in the list (6 and 7), which is 6.5.
Pretty easy, huh?
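If you would rather let a computer do the arithmetic, the two calculations above can be sketched in a few lines of Python. This is only a minimal illustration: the function names `mean` and `median` are my own, and in practice Python's built-in `statistics` module already provides both (it is used here as a cross-check).

```python
import statistics

def mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted list, or the average of the two
    middle values when the list has an even number of entries."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [1, 8, 10, 7, 4, 8, 6, 2, 3, 7]

print(mean(data))    # 5.6
print(median(data))  # 6.5

# Cross-check against the standard library versions.
print(statistics.mean(data), statistics.median(data))
```

Note that `median` sorts a copy of the list, so the original order of `data` is left alone.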
So, which should you use and when? It depends on the data you have to hand and how you want to report on it. In our case the two are pretty close, mainly because there is only a small amount of data and the figures only range from 1 to 10.
What if the data were more like this?
1, 4, 8, 2, 9, 3, 7, 9, 2, 1026
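Let’s quickly work these out:-
Mean: 1+4+8+2+9+3+7+9+2+1026 = 1071, which divided by 10 gives 107.1.
Median: sorted, the values are 1, 2, 2, 3, 4, 7, 8, 9, 9, 1026. The two middle figures are 4 and 7, so the median is 5.5.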
Which would you use, taking the data into account, and why?
Well... take a look at the data. The value of 1026 is much, much higher than the rest. It’s what we would refer to as an outlier: a value that falls well outside the expected or observed range of the other values. The mean takes that outlying value into account, so it gets ‘dragged’ towards the larger value, giving a skewed view of the ‘middle’. The median, however, only cares where each value sits in the sorted list, not how large it is, so the outlier barely moves it. Given the data presented, that gives us a much more reasonable measure.
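To see the ‘drag’ for yourself, here is a small Python sketch that works out both measures for this data set, once with the 1026 included and once with it left out. The variable names are just my own choices for illustration; `statistics.mean` and `statistics.median` come from Python's standard library.

```python
import statistics

data = [1, 4, 8, 2, 9, 3, 7, 9, 2, 1026]

# The same data with the single outlying value left out.
without_outlier = [v for v in data if v != 1026]

mean_with = statistics.mean(data)
median_with = statistics.median(data)
mean_without = statistics.mean(without_outlier)
median_without = statistics.median(without_outlier)

print(f"With 1026:    mean = {mean_with:.1f}, median = {median_with:.1f}")
print(f"Without 1026: mean = {mean_without:.1f}, median = {median_without:.1f}")
```

Dropping that one value moves the mean from 107.1 all the way down to 5.0, while the median only shifts from 5.5 to 4.0. That is the whole argument for the median in one comparison.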
Which to use, and in what circumstances, does depend on the data. If the data you’re reporting on is ‘normal’ in nature (that is, it has no outliers and, when plotted on a graph, follows a ‘normal distribution’, or at least a symmetrical distribution, of values), then I would say go for the mean. If there are outliers present, then the median can give a much better view of the ‘middle’, as the distribution might be skewed one way or the other. As with all analysis, it’s often useful to use a little common sense when looking at the figures you have and choose appropriately.
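If you want something a bit more mechanical than eyeballing the figures, one common rule of thumb (my addition here, not the only way to do it) is to flag any value that sits more than 1.5 times the interquartile range outside the quartiles, often called Tukey's fences. The sketch below assumes that rule; the helper names `find_outliers` and `suggest_measure` are hypothetical, and the check only looks for outliers, not for whether the data is actually symmetrical.

```python
import statistics

def find_outliers(values, k=1.5):
    """Return values below Q1 - k*IQR or above Q3 + k*IQR.
    k=1.5 is a conventional choice, not a hard rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def suggest_measure(values):
    """Rough guide only: median if any outliers are flagged, else mean."""
    return "median" if find_outliers(values) else "mean"

first = [1, 8, 10, 7, 4, 8, 6, 2, 3, 7]
second = [1, 4, 8, 2, 9, 3, 7, 9, 2, 1026]

print(find_outliers(first), suggest_measure(first))
print(find_outliers(second), suggest_measure(second))
```

Treat the result as a prompt to go and look at the data rather than as a verdict. A flagged value might be a typo, or it might be the most interesting thing in the set.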
As a real world example we could ask why, in the BBC article I posted about yesterday, they used the median age of users for reporting rather than the mean.
Why? It’s quite simple. Because of the nature of Facebook, it’s pretty much a given that younger people will make up the majority of users, but the presence of ‘middle-aged and over’ users might skew the reported ‘middle’ value upwards if the mean were used.
As with most things, both figures can be useful, but think about the data you’re using.