Outliers
Sometimes a set of data may contain one or more values that are a long way away from the other values in the set. We call these values outliers, and we need to be very careful about the way we deal with them.
Let's have a look at an example where an outlier occurs.
Example: Skipping
Christo's teacher has decided to set her class a challenge to see who can improve their skipping the most over a two-week period. She records the number of times each member of the class can jump over a rope in five minutes, once at the beginning of the two-week period and once at the end.
Here are the results:
Name | Before | After | After - Before |
---|---|---|---|
Claire | 60 | 88 | 28 |
Angelyn | 63 | 92 | 29 |
Christo | 66 | 17 | -49 |
Greg | 61 | 90 | 29 |
Steve | 70 | 95 | 25 |
Josh | 59 | 86 | 27 |
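The After - Before column can be checked with a few lines of Python. This is a minimal sketch: storing the table in a dictionary called `results` (and the name `differences`) is just one convenient choice made for illustration, not part of the original example.

```python
# Skipping counts from the table: name -> (before, after)
results = {
    "Claire": (60, 88),
    "Angelyn": (63, 92),
    "Christo": (66, 17),
    "Greg": (61, 90),
    "Steve": (70, 95),
    "Josh": (59, 86),
}

# Improvement for each student = After - Before
differences = {name: after - before for name, (before, after) in results.items()}

for name, diff in differences.items():
    print(f"{name}: {diff}")
```

Running it prints one improvement per student, in table order. Christo's works out to 17 - 66 = -49, the only negative value.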
If we plot the differences on a number line, five of them sit close together between 25 and 29, while Christo's result lies far below all the others.
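If we calculate the mean improvement, we get:
Mean = (28 + 29 + (-49) + 29 + 25 + 27) ÷ 6 = 89 ÷ 6 ≈ 14.8
That is far lower than almost every individual improvement. So, what's going on? Christo's improvement (or lack thereof) is an outlier. Sometimes the best thing to do with outliers is to discard them. Let's see what happens without Christo's result:
Mean = (28 + 29 + 29 + 25 + 27) ÷ 5 = 138 ÷ 5 = 27.6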
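Both means can be reproduced with Python's standard `statistics` module. This sketch assumes the improvements are taken in table order, with Christo's value of 17 - 66 = -49 as the third entry; the list names are only for illustration.

```python
from statistics import mean

# Improvements (After - Before) from the table, in table order
with_christo = [28, 29, -49, 29, 25, 27]

# The same data with Christo's result (the third entry) left out
without_christo = [28, 29, 29, 25, 27]

print(round(mean(with_christo), 2))  # 14.83
print(mean(without_christo))         # 27.6
```

Removing a single value raises the mean by almost 13 jumps, which is why the decision to discard it needs a good reason.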
Is this the right thing to do? Can we just throw away data values that make things look bad?
Dealing with Outliers
We can't just throw away data values without a good reason; otherwise, we could be accused of fudging our results. Sometimes it is quite reasonable to have values that are much higher or much lower than the other values. For example,
- Measurements naturally vary: people can be heavier or lighter, shorter or taller.
- People can have bad days.
- Plants grow better if they get enough sunlight, nutrients and water, so a well-supplied plant may grow much taller than the rest.
An unusual value may have a good explanation that we haven't yet accounted for.
Let's see if we can find a reason for Christo's bad performance.
Skipping Example (continued)
It turns out that Christo decided it would be a good idea to see if he could juggle his soft toys while he skipped on the day of the second test. He'd jump over the rope, throw his toys up into the air, jump over the rope, catch them, and so on. Consequently, each jump took him much longer, and he couldn't complete anywhere near the same number of jumps in five minutes.
So, Christo's second result didn't measure his real skipping ability, and it was reasonable to throw it away.
In some cases, however, it really isn't a good idea to discard outliers. We need to consider each situation individually before making our decision. We also need to be able to justify our decisions when we write our report.
Effects of Outliers on the Mean, Median and Mode
In the example, we saw that the presence of outliers can have a huge effect on the mean. What about the median and mode?
The median for our data set
- With Christo was 27.5
- Without Christo was 28
The mode for our data set
- With Christo was 29
- Without Christo was 29
The median and mode stayed close to most of the data values whether or not Christo was included. These measures give a better indication of the centre of a data set that includes outliers. The mean is not so reliable.
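To see how much each measure moves when the outlier is removed, the three can be compared side by side. This is an illustrative sketch using Python's `statistics` module; the data are the improvements from the table, with Christo's value taken as -49.

```python
from statistics import mean, median, mode

# Improvements (After - Before) with and without Christo's result
with_christo = [28, 29, -49, 29, 25, 27]
without_christo = [28, 29, 29, 25, 27]

# Compare each measure of centre before and after removing the outlier
for name, measure in [("mean", mean), ("median", median), ("mode", mode)]:
    before = measure(with_christo)
    after = measure(without_christo)
    print(f"{name}: {before:.2f} -> {after:.2f} (change {after - before:.2f})")
```

The mean changes by about 12.77, the median by only 0.5, and the mode not at all.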
Description
This chapter series is on Data and is suitable for Year 10 or higher students. Topics include:
- Accuracy and Precision
- Calculating Means From Frequency Tables
- Correlation
- Cumulative Tables and Graphs
- Discrete and Continuous Data
- Finding the Mean
- Finding the Median
- Finding the Mode
- Formulas for Standard Deviation
- Grouped Frequency Distribution
- Normal Distribution
- Outliers
- Quartiles
- Quincunx
- Quincunx Explained
- Range (Statistics)
- Skewed Data
- Standard Deviation and Variance
- Standard Normal Table
- Univariate and Bivariate Data
- What is Data
Audience
Year 10 or higher students; some chapters are suitable for students in Year 8 or higher.
Learning Objectives
Learn about topics related to "Data"
Author: Subject Coach
Added on: 28th Sep 2018