What do measures of variability tell us?

This is not so much the case for nominal and ordinal variables. If the variable is nominal, the mode is the only measure of central tendency you can use. If the variable is ordinal, the median is probably your best bet because it provides more information about the sample than the mode does. If the distribution is symmetrical, the mean is the best measure of central tendency.

If the distribution is skewed, either positively or negatively, the median is more accurate. As an example of why the mean might not be the best measure of central tendency for a skewed distribution, consider the following passage from Charles Wheelan's Naked Statistics: Stripping the Dread from the Data: Bill Gates walks into the bar with a talking parrot perched on his shoulder.

The parrot has nothing to do with the example, but it kind of spices things up. Obviously, none of the original ten drinkers is any richer (though it might be reasonable to expect Bill Gates to buy a round or two). This isn't a bar where multimillionaires hang out; it's a bar where a bunch of guys with relatively low incomes happen to be sitting next to Bill Gates and his talking parrot.
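The effect of one extreme value on the mean, compared with the median, can be sketched in a few lines of Python. The income figures below are hypothetical placeholders chosen only for illustration; they are not taken from the passage.

```python
from statistics import mean, median

# Hypothetical annual incomes for the ten original patrons (illustrative only).
patrons = [35_000] * 10

# Hypothetical income for one extremely wealthy newcomer (illustrative only).
with_newcomer = patrons + [1_000_000_000]

print("Before:", mean(patrons), median(patrons))
print("After: ", round(mean(with_newcomer)), median(with_newcomer))
```

Under these assumed numbers the mean jumps by several orders of magnitude while the median does not move, which is why the median describes the typical patron better in a skewed distribution.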

In addition to figuring out the measures of central tendency, we may need to summarize the amount of variability we have in our distribution. In other words, we need to determine if the observations tend to cluster together or if they tend to be spread out. Consider an example with two samples. Sample 2 has no variability (all scores are exactly the same), whereas Sample 1 has relatively more (one case varies substantially from the other four).

In this course, we will be going over four measures of variability: the range, the interquartile range (IQR), the variance, and the standard deviation.

The range is the difference between the highest and lowest scores in a data set and is the simplest measure of spread. We calculate the range by subtracting the smallest value from the largest value. As an example, consider a data set whose maximum value is 85: the range is 85 minus the minimum value. Whilst using the range as a measure of variability doesn't tell us much, it does give us some information about how far apart the lowest and highest scores are.
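As a minimal sketch of the calculation, the snippet below computes the range of a small list of scores. The scores are hypothetical; only the maximum of 85 is taken from the text.

```python
# Hypothetical scores; only the maximum of 85 comes from the text.
scores = [23, 41, 56, 62, 70, 85]

# Range = largest value minus smallest value.
value_range = max(scores) - min(scores)
print(value_range)  # 85 - 23 = 62
```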

The word "quartile" basically means "quarter" or "fourth." Finding the quartiles of a distribution is as simple as breaking it up into fourths. Each fourth contains 25 percent of the total number of observations. Quartiles divide a rank-ordered data set into four equal parts. The values that divide each part are called the first, second, and third quartiles, and they are denoted by Q1, Q2, and Q3, respectively.

Q1 is the "middle" value in the first half of the rank-ordered data set. Q2 is the median value of the data set. Q3 is the "middle" value of the second half of the rank-ordered data set. Q4 would technically be the largest value in the dataset, but we ignore it when calculating the IQR (we already dealt with it when we calculated the range).
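The "middle of each half" definition can be sketched as follows. The function name `quartiles` and the data are hypothetical, and the sketch assumes at least two values; other textbooks and libraries use slightly different quartile conventions, so results may differ from theirs.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # first half of the rank-ordered data
    upper = ordered[-half:]   # second half (the middle value is skipped when n is odd)
    return median(lower), median(ordered), median(upper)

data = [2, 4, 4, 6, 8, 10, 12, 14]  # hypothetical data set
q1, q2, q3 = quartiles(data)
iqr = q3 - q1                        # spread of the middle half of the data
print(q1, q2, q3, iqr)
```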

Thus, the interquartile range is equal to Q3 minus Q1 (or the 75th percentile minus the 25th percentile, if you prefer to think of it that way). As an example, consider the following numbers: 1, 3, 4, 5, 5, 6, 7, ... Q1 is the middle value in the first half of the data set. Q3 is the middle value in the second half of the data set.

A box plot (also known as a box and whisker plot) splits the dataset into quartiles.

The body of the box plot consists of a "box" (hence the name), which goes from the first quartile (Q1) to the third quartile (Q3). Within the box, a horizontal line is drawn at Q2, which denotes the median of the data set. Two vertical lines, known as whiskers, extend from the top and bottom of the box. The bottom whisker goes from Q1 to the smallest value in the data set, and the top whisker goes from Q3 to the largest value.
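The five values a box plot draws (smallest value, Q1, median, Q3, largest value) can be computed without plotting anything. This is a rough sketch on hypothetical data, using the standard library's `statistics.quantiles` with its "inclusive" method; that is one of several quartile conventions and may not match the median-of-halves approach exactly.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a box plot displays."""
    ordered = sorted(values)
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

data = [1, 2, 3, 4, 5, 6, 7, 8, 20]  # hypothetical, positively skewed data
summary = five_number_summary(data)
low, q1, q2, q3, high = summary

print(summary)
print("bottom whisker length:", q1 - low)
print("top whisker length:", high - q3)
```

With this assumed data the top whisker is much longer than the bottom one, which is the visual signature of a positively skewed distribution.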

[Figure: a positively skewed box plot with the various components labeled.]

Outliers are extreme values that, for one reason or another, are excluded from the dataset. Q3 is the median of this section of the distribution: 9.

The variance is the average squared difference of the scores from the mean. To compute the variance in a population, subtract the mean from each score, square each difference, and take the mean of the squared differences. If the variance in a sample is used to estimate the variance in a population, it is important to note that samples are consistently less variable than their populations, so the sum of squared differences is divided by N − 1 rather than N.
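Written out, the population and sample variance take the standard forms below. The symbols are assumptions, since the original notation is not preserved here: X_i is a score, μ the population mean, M the sample mean, N the population size, and n the sample size.

```latex
\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}
\qquad
s^2 = \frac{\sum_{i=1}^{n} (X_i - M)^2}{n - 1}
```

Dividing by n − 1 instead of n makes the sample variance slightly larger, which compensates for samples being less variable than the populations they are drawn from.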

The standard deviation is the average amount by which scores differ from the mean. The standard deviation is the square root of the variance, and it is a useful measure of variability when the distribution is normal or approximately normal (see below on the normality of distributions).

The proportion of the distribution within a given number of standard deviations (or distance) from the mean can be calculated. A small standard deviation coefficient indicates a small degree of variability (that is, scores are close together); larger standard deviation coefficients indicate large variability (that is, scores are far apart). In the previous section, Variance, we computed the variance of scores on a Statistics test by calculating the distance from the mean for each score, then squaring each deviation from the mean, and finally calculating the mean of the squared deviations.

Since we already know the variance, we can use it to calculate the standard deviation. To do so, take the square root of the variance. The square root of 1. The standard deviation is 1.

Distributions with the same mean can have different standard deviations. As mentioned before, a small standard deviation coefficient indicates that scores are close together, whilst a large standard deviation coefficient indicates that scores are far apart. In this example, both sets of data have the same mean, but the standard deviation coefficient is different.
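The steps described above (deviations from the mean, squared deviations, their mean, then the square root) can be sketched as follows. All scores and both data sets are hypothetical, and the population form of the variance is assumed.

```python
from math import sqrt
from statistics import mean, pstdev

scores = [1, 2, 3, 4, 5]                 # hypothetical test scores
m = mean(scores)
deviations = [x - m for x in scores]     # distance of each score from the mean
squared = [d ** 2 for d in deviations]   # squared deviations
variance = mean(squared)                 # population variance
std_dev = sqrt(variance)                 # standard deviation
print(variance, std_dev)

# Two hypothetical sets with the same mean but different spread.
set_a = [4, 5, 6]
set_b = [1, 5, 9]
print(mean(set_a), mean(set_b))          # same mean
print(pstdev(set_a), pstdev(set_b))      # different standard deviations
```

The standard library's `pstdev` performs the same population calculation in one call; `stdev` would be the choice when the data are a sample used to estimate a population.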

In this example, the scores in Set A are 0.

Measures of central tendency and measures of variability together give you a complete picture of your data. Using simple random samples, you collect data from 3 groups. All three of your samples have the same average phone use, at 195 minutes, or 3 hours and 15 minutes. This is the x-axis value where the curves peak.

Although the data follow a normal distribution, each sample has a different spread. Sample A has the largest variability while Sample C has the smallest variability.

Range

The range tells you the spread of your data from the lowest to the highest value in the distribution.

To find the range, simply subtract the lowest value from the highest value in the data set. With H as the highest value and L as the lowest, the range is H minus L, measured here in minutes.

The interquartile range gives you the spread of the middle of your distribution. The interquartile range is the third quartile (Q3) minus the first quartile (Q1). This gives us the range of the middle half of a data set. To locate the quartiles, multiply the number of values in the data set (8) by 0.25 for Q1 and by 0.75 for Q3.

Q1 is the value in the 2nd position and Q3 is the value in the 6th position. The interquartile range of your data, in minutes, is the difference between these two values. Just like the range, the interquartile range uses only 2 values in its calculation.
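The position-based method can be sketched as below. The function name `iqr_by_position` and the phone-use values are hypothetical, and the sketch only handles data sets whose size is a multiple of 4 (such as the 8 values here), since the handling of fractional positions is not described.

```python
def iqr_by_position(values):
    """Return (Q1, Q3, IQR) using the 1-based positions n*0.25 and n*0.75."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0 or n % 4 != 0:
        raise ValueError("this sketch only handles sizes that are a multiple of 4")
    q1 = ordered[n // 4 - 1]       # 1-based position n * 0.25
    q3 = ordered[3 * n // 4 - 1]   # 1-based position n * 0.75
    return q1, q3, q3 - q1

# Hypothetical daily phone use in minutes for 8 people.
minutes = [60, 90, 120, 150, 180, 210, 240, 270]
print(iqr_by_position(minutes))    # values at the 2nd and 6th positions, and their difference
```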

But the IQR is less affected by outliers: the 2 values come from the middle half of the data set, so they are unlikely to be extreme scores.

Standard deviation

The standard deviation is the average amount of variability in your dataset.

It tells you, on average, how far each score lies from the mean. The standard deviation is affected by outliers (extremely low or extremely high numbers in the data set). And remember, the mean is also affected by outliers. The standard deviation has the same units as the original data.
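A small experiment shows how a single outlier moves both the mean and the standard deviation. The values are hypothetical, and the population standard deviation (`pstdev`) is assumed for simplicity.

```python
from statistics import mean, pstdev

typical = [10, 11, 12, 13, 14]   # hypothetical scores with no outlier
with_outlier = typical + [60]    # same scores plus one extremely high value

print(mean(typical), pstdev(typical))
print(mean(with_outlier), pstdev(with_outlier))
```

With these assumed values, one extra score pulls the mean well above every typical score and inflates the standard deviation more than tenfold.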

Ben Davis May 31,


