An average is useful because it summarises a group of numbers into a single value. Everyone has heard of it, but not everyone knows that there are actually three different types of average, and even fewer people know how to use them correctly.
What is an Average?
Different Averages and How to Calculate them
Mean
The mean is the average most people reach for when they are asked to calculate an average. You calculate it by adding all your values together and then dividing by the number of values you have. For example, assume we had the following values:
125, 100, 50, 30, 74, 30, 1000
then our mean would be 201.29, i.e. (125+100+50+30+74+30+1000)/7 = 1409/7
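If you'd rather let a computer do the arithmetic, here is a minimal Python sketch of the same calculation. The variable names are just my own choice for illustration.

```python
values = [125, 100, 50, 30, 74, 30, 1000]

# Mean = sum of the values divided by how many values there are
mean = sum(values) / len(values)

print(round(mean, 2))  # 201.29
```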
Median
The median is less common but still used when analysing data with specific properties. To calculate it you first sort your numbers from smallest to largest and then literally pick the middle number. So using our dataset from before, we order our numbers like this
30,30,50,74,100,125,1000
and pick the number in the middle as our median. Therefore, our median would be 74.
Hopefully at this point you’ve noticed that our mean is 201.29 but our median is 74, using the same data. This is exactly why it’s important to know when to use a specific average. I’ll explain why this happens after we go over the last average.
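The sort-then-pick-the-middle steps can be sketched in Python as below. The function name `median` is my own, and the branch for an even number of values (averaging the two middle numbers) is the usual convention rather than something covered in this article.

```python
def median(values):
    """Sort the values and return the middle one."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: there is exactly one middle value
        return ordered[middle]
    # Even count: average the two middle values (the usual convention)
    return (ordered[middle - 1] + ordered[middle]) / 2


values = [125, 100, 50, 30, 74, 30, 1000]
print(sorted(values))  # [30, 30, 50, 74, 100, 125, 1000]
print(median(values))  # 74
```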
Mode
This type of average is rarely used because it has limited use in practice. To calculate it you pick the number that appears the most in your dataset. So using the same dataset our mode would be 30, because it appears twice whilst all the rest appear once.
30,30,50,74,100,125,1000
You can probably tell why this type of average isn’t used very often, especially when dealing with continuous data. However, it is useful in some situations and you’ve probably used it before without even realising!
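Counting how often each value appears is easy to sketch in Python. This is only an illustration using a plain dictionary; if several values tie for the top spot, this version simply returns the first one it meets.

```python
values = [125, 100, 50, 30, 74, 30, 1000]

# Count how many times each value appears
counts = {}
for value in values:
    counts[value] = counts.get(value, 0) + 1

# The mode is the value with the highest count
mode = max(counts, key=counts.get)

print(mode, counts[mode])  # 30 2
```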
How and When to use each type
Mean
The mean is best used when your data follows a normal distribution and does not have any major outliers or skew. By this I mean when your values are close to one another and there’s not a lot of variation between values.
Below are two examples using histograms; one is of normally distributed data and the other is not.
Normally distributed data will have a bell-shaped curve with the majority of values near the centre (aka the mean) and fewer values the further you move away from the centre. A normal distribution will also be symmetrical around the mean.
Non-normally distributed data will not be symmetrical around the mean and will usually not follow the typical bell-shaped curve. The image below is an example of an exponential distribution. The majority of the values are between 0 and 1 but it quickly tails off with no symmetry on the other side.
Median
Use this average when your data looks to be ‘skewed’ and doesn’t look ‘normal’. By this I mean data that looks a bit crazy and all over the place. For example, when the majority of values are clumped together but there are a few outlier values.
In the example below you can see the data is not normally distributed but seems to be mostly between 15 and 65. However, there are a few outliers around 100. The mean gets pulled towards those outliers and can give you a misleading number, whereas the median is barely affected by them and should provide a more representative figure.
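To see how much an outlier can move each average, here is a rough sketch using Python's built-in `statistics` module. I don't have the data behind the histogram, so this reuses the small dataset from earlier and treats 1000 as the outlier.

```python
from statistics import mean, median

values = [125, 100, 50, 30, 74, 30, 1000]

# Same data with the outlier (1000) taken out
without_outlier = [v for v in values if v != 1000]

print(round(mean(values), 2), median(values))                    # 201.29 74
print(round(mean(without_outlier), 2), median(without_outlier))  # 68.17 62.0
```

Removing a single value drops the mean from about 201 to about 68, while the median only moves from 74 to 62.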
Mode
The mode should be used when your data is categorical, ordinal or discrete. It should definitely not be used when data is continuous, because almost every value will be unique, so it will either provide an inaccurate estimate of the average or simply give an error.
A good example of when to use the mode is when collecting data on people’s favourite flavour of ice cream. Ice-cream flavours are categories like vanilla, chocolate, strawberry, etc. In this case you can use the mode to find which flavour received the most votes.
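As a quick sketch of the ice-cream example in Python, `collections.Counter` can tally the votes for you. The votes below are made up purely for illustration.

```python
from collections import Counter

# Made-up survey responses, for illustration only
votes = ["vanilla", "chocolate", "strawberry", "chocolate", "vanilla", "chocolate"]

# most_common(1) returns a list holding the top (value, count) pair
flavour, count = Counter(votes).most_common(1)[0]

print(flavour, count)  # chocolate 3
```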
Useful Tips
1. Always Calculate the mean and median
As a rule of thumb I would always calculate both the mean and median on any dataset as a quick way to identify whether your data is normal or not. If your data is ‘normal’ then your mean and median will be identical, or at least extremely similar. If they are widely different then you know your data is skewed and you can investigate it further.
It is also good practice to provide both the median and mean if they are very different as it is a good way to explain the structure of your data to non-data literate people.
2. Remember that real world data is rarely normal
Although most examples will tend to have nice normal data for you to practise on, keep in mind that real world data is rarely that simple. Also, errors when collecting, inputting or manipulating data can cause data to be skewed, inaccurate or generally messy with no pattern.
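For the first tip, a small helper along these lines can do the mean-versus-median check for you. The function name `compare_averages` is hypothetical, and how big a gap counts as "widely different" depends on your data, so treat this as a starting point rather than a proper test for normality.

```python
from statistics import mean, median


def compare_averages(values):
    """Return the mean, the median and the difference between them."""
    m = mean(values)
    med = median(values)
    # A large gap relative to the size of your values hints at skew or outliers
    return m, med, m - med


m, med, gap = compare_averages([125, 100, 50, 30, 74, 30, 1000])
print(round(m, 2), med, round(gap, 2))  # 201.29 74 127.29

# Symmetrical data: the mean and median agree, so the gap is zero
print(compare_averages([1, 2, 3, 4, 5]))  # (3, 3, 0)
```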
Learn about Rolling Averages
Learn about rolling averages and why they are essential when analysing trends and time series data.