Exotic Flavors of Means

The mean we’ve looked at so far is known as the arithmetic mean; it functions by adding together the scores, and dividing by the number of scores. Another way of thinking about this is that

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} \frac{1}{n}\,x_i$$

In the expression on the right above, the mean is a sum of "pieces" of the individual scores; take "one n-th" of each x and add them all up. The term (1/n) can be thought of as a weight assigned to each score $x_i$; note that in the mean we’ve looked at, the weights are the same for all the scores. Also note that the sum of the weights over the n scores is 1.

 

Weighted mean

In the weighted mean, we allow for unequal weights for the different scores:

$$\bar{x}_w = \sum_{i=1}^{n} w_i\,x_i \quad \text{where} \quad \sum_{i=1}^{n} w_i = 1$$

Given different sets of weights, we can end up with different weighted means. We will come back to the concept of weighted mean later in the quarter. Weighted means are useful for determining the mean of data that is presented in a relative frequency table. The mean of the data is the mean of the values in the table, using the relative frequencies as weights.
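As a minimal sketch of the relative frequency idea, the Python below computes a weighted mean and checks it against the ordinary mean of the same data written out score by score. The function name `weighted_mean` and the small frequency table are made up for illustration; they are not from the course materials.

```python
def weighted_mean(values, weights):
    """Return the sum of w_i * x_i, assuming the weights sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * x for x, w in zip(values, weights))


# Hypothetical relative frequency table: each value with its relative frequency.
values = [1, 2, 3, 4]
rel_freqs = [0.1, 0.2, 0.3, 0.4]
table_mean = weighted_mean(values, rel_freqs)

# The same data written out as 10 raw scores (one 1, two 2s, three 3s, four 4s).
raw = [1] + [2] * 2 + [3] * 3 + [4] * 4
plain_mean = sum(raw) / len(raw)

# The ordinary mean is the special case where every weight is 1/n.
equal_weight_mean = weighted_mean(raw, [1 / len(raw)] * len(raw))

print(table_mean, plain_mean, equal_weight_mean)  # all 3.0, up to rounding
```

All three numbers agree (up to floating-point rounding), which is the sense in which the arithmetic mean is a weighted mean with equal weights.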

 

Geometric mean

Another measure of central tendency is the geometric mean. Instead of adding together the scores and dividing by n, in the geometric mean we multiply the scores together and take the n-th root.

$$GM = \sqrt[n]{x_1 x_2 \cdots x_n} = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$$

By taking the logarithm of each side, we can see that the log of the geometric mean is the (arithmetic) mean of the logs. This is because logs convert multiplication into addition, and exponentiation into multiplication.
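Written out, the step looks like this. Here GM is just a label for the geometric mean of positive scores x_1, ..., x_n, and any base of logarithm works:

```latex
\begin{align*}
\log \mathrm{GM}
  &= \log\left[(x_1 x_2 \cdots x_n)^{1/n}\right] \\
  &= \frac{1}{n}\,\log(x_1 x_2 \cdots x_n) \\
  &= \frac{1}{n}\sum_{i=1}^{n} \log x_i
\end{align*}
```

The second line uses the rule that a log turns an exponent into a multiplier; the third uses the rule that a log turns a product into a sum.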

 

Geometric means are useful when dealing with growth rates (in finance, for example). If a stock has a 10% return one year, a 15% return the next, and a 5% return the next, what is an appropriate measure of "average" return? The arithmetic mean of the returns is inappropriate in this case, because returns compound: each year’s return multiplies the value left by the previous year rather than adding to it. The best measure is based on the geometric mean; in particular, the geometric mean of the growth factors 1.10, 1.15, and 1.05. Compared to the arithmetic mean, the geometric mean gives relatively more weight to low values than to high values. Thus, for positive values the geometric mean is never greater than the arithmetic mean, and it is strictly less unless all the values are equal.
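The stock example can be checked numerically. The sketch below is illustrative only: the helper name `geometric_mean` is chosen here, and the code assumes Python 3.8 or later for `math.prod`.

```python
import math


def geometric_mean(xs):
    """Multiply the (positive) scores together and take the n-th root."""
    return math.prod(xs) ** (1 / len(xs))


# Growth factors for returns of 10%, 15%, and 5%.
growth = [1.10, 1.15, 1.05]

gm = geometric_mean(growth)        # roughly 1.0992
am = sum(growth) / len(growth)     # 1.10, up to rounding
average_return = gm - 1            # a bit under 10% per year

# Growing by the factor gm every year gives the same three-year result
# as the actual sequence of returns.
compounded = math.prod(growth)
same_result = math.isclose(gm ** 3, compounded)

# The log of the geometric mean is the arithmetic mean of the logs.
mean_of_logs = sum(math.log(x) for x in growth) / len(growth)

print(gm, am, average_return, same_result)
```

A constant yearly growth factor of `gm` reproduces the three-year outcome; the arithmetic mean of 1.10 would overstate it slightly.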

 

Measures of Dispersion/Spread

Let’s now consider some measures of a distribution’s spread or dispersion. These measures get at the following questions: How spread out are the values of a variable? How wide is the histogram?

 

Standard Deviation

We’d like a measure of how much scores typically deviate from the average score. So let’s start out with the deviation score:

$$x_i - \bar{x}$$

However, if a score is 5 units greater than the mean or 5 units less than the mean, that shouldn’t make any difference for our purposes in measuring spread, so we need to eliminate the sign of the deviation score. We can do this by squaring it, giving us the squared deviation:

$$(x_i - \bar{x})^2$$

We want to know what the typical squared deviation score is, and that’s what the mean of the squared deviations can give us:

$$\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

(The mean squared deviation is known as the variance.)

 

The variance is nice, but unfortunately it is in terms of squared units. For instance, if our data is in inches, then the units of the variance are squared inches. We’d like a measure of spread that has the same units as the original data. So we take the square root of the variance to get the standard deviation:

$$\mathrm{SD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

So, the path toward determining the SD is:

1. Start with the deviation scores

2. Square them (to remove the sign)

3. Take the mean of the squared deviation scores (the variance)

4. Convert back to the original units (take the square root)
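The path above can be sketched step by step in Python. The function names and the eight scores are invented for illustration. The code divides by n, matching the "mean of the squared deviations" described here; a formula that divides by n - 1 would give slightly different numbers.

```python
import math


def variance(xs):
    """Mean of the squared deviations from the mean (divides by n)."""
    n = len(xs)
    mean = sum(xs) / n
    deviations = [x - mean for x in xs]      # step 1: deviation scores
    squared = [d * d for d in deviations]    # step 2: square to remove the sign
    return sum(squared) / n                  # step 3: mean squared deviation


def sd(xs):
    """Square root of the variance, in the original units (step 4)."""
    return math.sqrt(variance(xs))


# Hypothetical scores with mean 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(scores))  # 4.0
print(sd(scores))        # 2.0
```

In the standard library, `statistics.pvariance` and `statistics.pstdev` use the same divide-by-n definition.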

 

Properties of the SD

1. SD can never be negative

2. lowest value is 0 (what can we say about the scores when the SD is 0?)

3. adding a constant amount to each score:

{x1, x2, x3, …, xn} → {x1+k, x2+k, x3+k, …, xn+k}: the SD stays the same (does the variance change?)

4. multiplying each score by a constant:

{x1, x2, x3, …, xn} → {x1*k, x2*k, x3*k, …, xn*k}: the SD is multiplied by |k| (the variance is multiplied by ?)

 

These last two properties make sense: shifting the histogram from side to side doesn’t change its spread. But multiplying by a number whose absolute value is greater than 1 stretches out the histogram, while multiplying by a number whose absolute value is less than 1 scrunches it.

 

5. For the normal distribution, about 68% of the scores are within 1 SD of the mean, about 95% of the scores are within 2 SDs of the mean, and about 99.7% of the scores are within 3 SDs of the mean. If a histogram resembles the normal distribution, these are good approximations for the histogram as well.
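Properties 3, 4, and 5 can be checked numerically. This is a sketch under stated assumptions: the scores and the constant k are made up, `pstdev` is the standard library's divide-by-n SD, and `NormalDist` (Python 3.8+) stands in for the normal curve.

```python
from statistics import NormalDist, pstdev

# Hypothetical scores and an arbitrary (negative) constant.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
k = -3

shifted = [x + k for x in scores]   # property 3: add k to every score
scaled = [x * k for x in scores]    # property 4: multiply every score by k

sd_original = pstdev(scores)
sd_shifted = pstdev(shifted)        # same as sd_original
sd_scaled = pstdev(scaled)          # abs(k) times sd_original

# Property 5: share of a normal distribution within z SDs of the mean.
std_normal = NormalDist()           # mean 0, SD 1


def share_within(z):
    """Area under the standard normal curve between -z and z."""
    return std_normal.cdf(z) - std_normal.cdf(-z)


print(sd_original, sd_shifted, sd_scaled)
print(share_within(1), share_within(2), share_within(3))
```

Using a negative k shows why the rule involves |k|: the SD itself never goes negative.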

 

The Interquartile Range

Another measure of spread that is useful when the data are asymmetric or contain a small number of extreme scores is the interquartile range. This is defined as the difference between the 75th percentile (also known as the 3rd quartile) and the 25th percentile (also known as the 1st quartile). In the same way that the median is not affected by extreme scores the way the mean is, the interquartile range is not affected by extreme scores the way the SD is.

 

Properties of the Interquartile Range

The interquartile range is the distance that covers the middle 50% of the scores.

For the normal distribution (the bell-shaped curve), the interquartile range is approximately 1.35 times the SD.
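Both claims about the interquartile range can be illustrated with the standard library (Python 3.8+). Treat this as a sketch: the data are invented, and textbooks and software use several slightly different conventions for percentiles, so the `"inclusive"` method chosen here is an assumption, not the course's definition.

```python
from statistics import NormalDist, pstdev, quantiles

# For the standard normal curve (SD = 1), the IQR is the distance
# between the 25th and 75th percentiles.
std_normal = NormalDist()
iqr_normal = std_normal.inv_cdf(0.75) - std_normal.inv_cdf(0.25)  # about 1.35


def iqr(xs):
    """3rd quartile minus 1st quartile, using one common percentile convention."""
    q1, _median, q3 = quantiles(xs, n=4, method="inclusive")
    return q3 - q1


# Hypothetical scores, then the same scores with the largest replaced
# by an extreme value.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
with_outlier = scores[:-1] + [90]

print(iqr_normal)
print(iqr(scores), iqr(with_outlier))        # the IQR does not move
print(pstdev(scores), pstdev(with_outlier))  # the SD grows many times over
```

One extreme score leaves the middle 50% of the data where it was, so the IQR is unchanged while the SD balloons.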

 

Standard units

We now have two measures that describe important properties of a distribution: the mean and SD. The mean splits the scores into high and low values. The SD is a unit of measurement for expressing how high or low a score is, relative to the mean.

 

Standard units, or z-scores, measure how many SDs away from the mean a score is. To determine a score’s value in standard units, subtract the mean, and then divide by the SD:

$$z = \frac{x - \bar{x}}{\mathrm{SD}}$$

To convert from standard units back into the original units, multiply the z-score by the SD, and then add the mean:

$$x = z \cdot \mathrm{SD} + \bar{x}$$

 

Properties of z-scores

The mean of a set of z-scores is 0 (why?)

The SD of a set of z-scores is 1 (why?)

If the histogram is close to the bell shape of the normal curve, then about 68% of the scores will be between -1 and 1 in z-units, and about 95% of the scores will be between -2 and 2 in z-units.

Given a bell-shaped histogram, z-scores of greater than 3 or less than -3 are very rare.

Given a bell-shaped histogram, we can use z-scores and a normal curve table to approximate areas of the histogram.

z-scores allow comparisons of the relative position of scores from different measurements. Example: test scores. They also allow comparison of the relative position of scores measured in different units.
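The conversions and the comparison idea can be sketched as follows. The function names, the scores, and the two tests are hypothetical. `fmean` and `pstdev` come from Python's `statistics` module (3.8+), with `pstdev` using the divide-by-n SD.

```python
from statistics import fmean, pstdev


def to_standard_units(xs):
    """Subtract the mean from each score, then divide by the SD."""
    mean, sd = fmean(xs), pstdev(xs)
    return [(x - mean) / sd for x in xs]


def from_standard_units(zs, mean, sd):
    """Multiply each z-score by the SD, then add the mean."""
    return [z * sd + mean for z in zs]


# Hypothetical scores with mean 5 and SD 2.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
z = to_standard_units(scores)
round_trip = from_standard_units(z, fmean(scores), pstdev(scores))

print(z)                    # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(fmean(z), pstdev(z))  # 0.0 1.0

# Comparing scores from two hypothetical tests with different scales.
z_test_a = (80 - 70) / 10   # score 80, mean 70, SD 10: 1 SD above the mean
z_test_b = (65 - 50) / 5    # score 65, mean 50, SD 5: 3 SDs above the mean
```

Although 80 is the larger raw score, the 65 sits much further above its own test's mean once both are expressed in standard units.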