UNIVERSITY OF CALIFORNIA, LOS ANGELES
Department of Economics

Economics 143 (Cameron) - Applied Regression Analysis

Classroom Handout #6:
Derivation of Means and Variances of the OLS Estimators for Slope and Intercept


IMPORTANT: You are not expected to memorize these derivations, or even to be able to reproduce them from scratch. The important thing to remember is that line 1 leads to the result in line 8, and line 9 leads to the result in line 11. Likewise, line 12 leads to the result in line 17, and line 18 leads to the result in line 23. Furthermore, note the difference between line 24 and the analogous concept for the univariate case.

Each sample you might have drawn for estimating an ordinary least squares (OLS) regression would have produced different point estimates for the slope and the intercept of the regression line. Over all possible samples, there will be a distribution for each of these estimators: intercept b1 and slope b2.

NOTE: a version of this page without detailed comments is also available

What are E(b1) and E(b2)? (i.e., are these estimators unbiased for B1 and B2?)

1   $E(b_2) = E\left[ \sum (X_i - \bar{X})(Y_i - \bar{Y}) \,/\, \sum (X_i - \bar{X})^2 \right]$,   let $c = \sum (X_i - \bar{X})^2$

The use of the abbreviation c for the term in the denominator just simplifies the algebra. We will substitute back for c later on. Distribute the $(Y_i - \bar{Y})$ term, and separate the summation operation into two pieces.

2   $= E\left[ \sum (X_i - \bar{X}) Y_i / c \;-\; \sum (X_i - \bar{X}) \bar{Y} / c \right]$

The numerator in the second term in this expression can have the constant quantity $\bar{Y}$ factored out from under the summation sign, because it does not vary over observations. The second term then becomes $\bar{Y} \sum (X_i - \bar{X}) / c$. Now the stuff inside this summation is the sum of the deviations of X from its mean, which is zero. Thus the second term in line 2 disappears.
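The step from line 1 to line 3 can be checked numerically. The sketch below uses a small made-up data set (the X and Y values are hypothetical, chosen only for illustration) to confirm that the deviations of X from its mean sum to zero, so that the numerator of line 1 equals the numerator of line 3.

```python
import numpy as np

# Hypothetical data, chosen only for illustration.
X = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
Y = np.array([3.0, 5.0, 4.0, 10.0, 13.0])

x_dev = X - X.mean()   # deviations of X from its mean

# The deviations sum to zero (up to floating-point rounding).
sum_dev = x_dev.sum()

# Numerator of line 1 versus numerator of line 3.
num_line1 = np.sum(x_dev * (Y - Y.mean()))
num_line3 = np.sum(x_dev * Y)

print("sum of deviations:", sum_dev)
print("numerator, line 1:", num_line1)
print("numerator, line 3:", num_line3)
```

The two numerators agree because the only difference between them is the term that multiplies the sum of deviations.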

3   $= E\left[ \sum (X_i - \bar{X}) Y_i / c \right]$

The Yi values are the only random variables in this story. The X values are treated as deterministic (non-random) amounts. As "constants," they can be factored out of the expectation operator (review the rules of expectations).

4   $= \left\{ \sum (X_i - \bar{X}) E[Y_i] \right\} / c$

Now we can write each individual value of Yi in terms of the parameters of the population regression function, since we assume that underlying our sample is a population wherein Yi = B1+B2Xi+ui.

5   $= \left\{ \sum (X_i - \bar{X}) E[B_1 + B_2 X_i + u_i] \right\} / c$

The expectation operation can be distributed across a sum.

6   $= \sum (X_i - \bar{X}) \left\{ E[B_1] + E[B_2 X_i] + E[u_i] \right\} / c$

The $E[B_1]$ is just $B_1$, because this parameter is some true, if unknown, constant value. Since the X's are deterministic, the expectation $E[B_2 X_i]$ is just $B_2 X_i$. By assumption $E[u_i]$ is zero (namely, the population regression function runs through the mean of the relevant conditional distributions of Y). Next, you can take the three terms in the "curly" brackets and distribute them, along with the $\sum (X_i - \bar{X})$, into three separate terms. The $B_1$ can be factored out of its summation, and $B_1 \sum (X_i - \bar{X})$ will be equal to zero because the sum of the deviations of a variable from its mean is zero. The third term disappears because $E[u_i]$ is zero. All that is left is the middle term, where the $B_2$ can be factored out of the summation, but not the $X_i$, because that varies with i.

7   $= B_2 \left\{ \sum (X_i - \bar{X}) X_i / c \right\}$

To get to the next step, we need to reverse the result we used in getting from line 1 to line 3. If we use $X_i$ in place of the $Y_i$ appearing there, it is likewise true that $\sum (X_i - \bar{X}) X_i$ is equal to $\sum (X_i - \bar{X})(X_i - \bar{X})$. This gets the algebra into a convenient form where it is easy to see that stuff cancels out, and we are left with the tidy result that $E[b_2] = B_2$.

8   $= B_2 \left\{ \sum (X_i - \bar{X})^2 / \sum (X_i - \bar{X})^2 \right\} = B_2$   ...thus $b_2$ unbiased
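Because the X values are deterministic, the expectation in line 4 can be computed exactly, without drawing any random samples. The following sketch assumes hypothetical X values and hypothetical true parameters B1 = 2 and B2 = 0.5; the weight vector w is just a convenient name, not notation from the handout, for the coefficient that each Yi receives in the formula for b2.

```python
import numpy as np

# Hypothetical fixed regressor values and true parameters.
X = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
B1, B2 = 2.0, 0.5

x_dev = X - X.mean()
c = np.sum(x_dev ** 2)   # the abbreviation c from line 1
w = x_dev / c            # coefficient on each Y_i in the formula for b2

# E[Y_i] = B1 + B2*X_i, because E[u_i] = 0 (line 5).
expected_Y = B1 + B2 * X

# Line 4: E(b2) is the weighted sum of the E[Y_i].
expected_b2 = np.sum(w * expected_Y)

print("sum of weights      :", np.sum(w))       # zero, so the B1 term drops out
print("sum of weights * X_i:", np.sum(w * X))   # one, so B2 survives intact
print("E(b2)               :", expected_b2)
```

The two printed sums are the numerical counterparts of the cancellations in lines 6 through 8.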

9   $E(b_1) = E[\bar{Y} - b_2 \bar{X}] = E[B_1 + B_2 \bar{X} + \bar{u} - b_2 \bar{X}]$

Above, we have substituted for the mean of Y an expression for it in terms of the parameters of the population regression function. This involves writing the individual values of $Y_i$ as $B_1 + B_2 X_i + u_i$, summing over all i = 1,...,n, and then dividing by n. This yields $B_1 + B_2 \bar{X} + \bar{u}$. The expectation operator can be distributed across the four terms in the square brackets.

10   $= E[B_1] + E[B_2 \bar{X}] + E[\bar{u}] - E[b_2 \bar{X}]$

The expected value of a true but unknown constant is just that constant, so E[B1]=B1. The X's being deterministic, the second term is also always equal to its expectation. The expected value (average) of the population regression function error term is zero. We have shown above that the slope estimator is an unbiased estimator for B2, so we can use that result here. Stuff cancels, and we see that the OLS intercept estimator is also an unbiased estimator for the true but unknown population intercept parameter.

11   $= B_1 + B_2 \bar{X} + 0 - \bar{X} E[b_2] = B_1 + B_2 \bar{X} - B_2 \bar{X} = B_1$   ...thus $b_1$ unbiased.
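The idea of "all possible samples" can be approximated by simulation. The sketch below is only an illustration under stated assumptions: the parameter values, the X grid, the normal error distribution, and the number of replications are all hypothetical choices. It draws many samples from the same population regression function and averages the resulting estimates.

```python
import numpy as np

rng = np.random.default_rng(143)   # the seed is arbitrary

X = np.linspace(1.0, 20.0, 25)     # fixed (deterministic) regressor values
B1, B2, sigma = 2.0, 0.5, 3.0      # hypothetical true parameters
reps = 20000                       # number of simulated samples

x_dev = X - X.mean()
c = np.sum(x_dev ** 2)

# One row of errors per simulated sample; errors are independent with mean zero.
u = rng.normal(0.0, sigma, size=(reps, X.size))
Y = B1 + B2 * X + u

b2 = (Y @ x_dev) / c                  # slope estimate in each sample (form of line 3)
b1 = Y.mean(axis=1) - b2 * X.mean()   # intercept estimate in each sample (form of line 9)

print("average b1:", b1.mean(), "  true B1:", B1)
print("average b2:", b2.mean(), "  true B2:", B2)
```

Individual estimates vary from sample to sample, but their averages should settle close to the true B1 and B2.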

What are Var(b1) and Var(b2)? (i.e. How "noisy" are the estimators for B1 and B2?)

12   $\mathrm{Var}(b_2) = \mathrm{Var}\left[ \sum (X_i - \bar{X}) Y_i / \sum (X_i - \bar{X})^2 \right] = \mathrm{Var}\left[ \sum (X_i - \bar{X}) Y_i / c \right]$

This uses the same version of the formula for b2 as appears within equation 3 above. This is a nice simple place to start the derivation of the variance of the OLS slope estimator. For the next line, we write out all the terms in the summation.

13   $= \mathrm{Var}\left[ \frac{X_1 - \bar{X}}{c} Y_1 + \frac{X_2 - \bar{X}}{c} Y_2 + \dots + \frac{X_n - \bar{X}}{c} Y_n \right]$

This is where we draw upon the fact that the OLS estimator is a "linear" estimator. The formulas for the parameters can be expressed as a linear function of the data on Y. The expanded summation can be viewed as nothing more than a conventional linear combination of independent random variables (i.e. coefficient times variable, plus coefficient times variable, plus coefficient times variable...). If our sample is randomly drawn, then the observations on Yi are statistically independent. For the variance of this linear combination of independently distributed random variables, we need to take the squares of the coefficients and multiply them by the variances of the individual random variables. This is exactly where the assumption of "no serial correlation in the errors" comes into the calculation of interesting quantities in regression analysis. If the Y values (and hence the error terms) are serially correlated, then we would have to deal with all the covariance terms in the general formula for the variance of a linear combination of random variables. In the absence of serial correlation in the Y values (and therefore the errors), all of these covariance terms are zero.
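The role of the covariance terms can be made concrete with the general formula for the variance of a linear combination, written in matrix form as w'Vw, where V is the covariance matrix of the Y values. The sketch below is an illustration only: the X values, the error variance, and in particular the serial-correlation pattern (covariances that decay as rho to the power |i - j|) are hypothetical assumptions, not something specified in this handout.

```python
import numpy as np

X = np.array([1.0, 2.0, 4.0, 7.0, 11.0])   # hypothetical fixed regressor values
sigma2 = 4.0                                # hypothetical common error variance

x_dev = X - X.mean()
w = x_dev / np.sum(x_dev ** 2)              # coefficients on Y_1, ..., Y_n in b2

# No serial correlation: every off-diagonal covariance is zero.
V_indep = sigma2 * np.eye(X.size)
var_general = w @ V_indep @ w               # general formula w'Vw
var_squares = np.sum(w ** 2 * sigma2)       # squared coefficients times variances

# Assumed serial correlation: Cov(Y_i, Y_j) = sigma2 * rho**|i - j|.
rho = 0.6
idx = np.arange(X.size)
V_serial = sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])
var_serial = w @ V_serial @ w               # covariance terms no longer vanish

print("independent, general formula:", var_general)
print("independent, squares only   :", var_squares)
print("serially correlated         :", var_serial)
```

With independent observations the two calculations coincide; once the off-diagonal covariances are nonzero, the simple sum of squared coefficients times variances no longer gives the right answer.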

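14   $= \left( \frac{X_1 - \bar{X}}{c} \right)^2 \mathrm{Var}(Y_1) + \left( \frac{X_2 - \bar{X}}{c} \right)^2 \mathrm{Var}(Y_2) + \dots + \left( \frac{X_n - \bar{X}}{c} \right)^2 \mathrm{Var}(Y_n)$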

Now it is convenient to put the drawn-out sum of terms back into summation notation, with one term for each value of i in the sample. But if we are assuming that the conditional variance of Y is the same, regardless of the value of X at which the distribution of Y is being considered (homoscedasticity assumption), then all the $\mathrm{Var}(Y_i)$ terms are equal to the same $\sigma^2$.

15   $= \sum \left[ (X_i - \bar{X})^2 / c^2 \right] \mathrm{Var}(Y_i) = \sum \left[ (X_i - \bar{X})^2 / c^2 \right] \sigma^2$

This common $\sigma^2$ value for all terms in the summation will factor out of the summation, and we can now reinstate the full definition of the abbreviation c. This is exactly where the assumption of homoscedastic errors comes into the calculation of interesting quantities in regression analysis. If the conditional error variances differed from observation to observation, we could not factor out the common $\sigma^2$ value as we do here.

16   $= \sigma^2 \sum \left[ (X_i - \bar{X})^2 / c^2 \right] = \sigma^2 \sum (X_i - \bar{X})^2 / \left( \sum (X_i - \bar{X})^2 \right)^2$

The value of $\sum (X_i - \bar{X})^2$ in the numerator will now cancel with one of the $\sum (X_i - \bar{X})^2$ terms in the denominator, leaving just $\sigma^2 / \sum (X_i - \bar{X})^2$. This can be tidied up by using the deviation notation introduced earlier in the course, $x_i = X_i - \bar{X}$:

17   $= \sigma^2 / \sum x_i^2$   ....   $\mathrm{s.e.}(b_2) = \sigma / \sqrt{\sum x_i^2}$
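The result in line 17 can be compared against the spread of slope estimates across many simulated samples. As before, this is a sketch under hypothetical assumptions (parameter values, X grid, normal and homoscedastic errors, number of replications), not a definitive check.

```python
import numpy as np

rng = np.random.default_rng(17)    # the seed is arbitrary

X = np.linspace(1.0, 20.0, 25)     # fixed regressor values
B1, B2, sigma = 2.0, 0.5, 3.0      # hypothetical true parameters
reps = 40000                       # number of simulated samples

x_dev = X - X.mean()
sum_x2 = np.sum(x_dev ** 2)        # sum of squared deviations of X

# Homoscedastic, serially uncorrelated errors, one row per sample.
Y = B1 + B2 * X + rng.normal(0.0, sigma, size=(reps, X.size))
b2 = (Y @ x_dev) / sum_x2          # slope estimate in each sample

var_formula = sigma ** 2 / sum_x2  # line 17
var_simulated = b2.var()           # variance of b2 across the simulated samples
se_formula = np.sqrt(var_formula)  # standard error of b2 from line 17

print("Var(b2), formula  :", var_formula)
print("Var(b2), simulated:", var_simulated)
print("s.e.(b2), formula :", se_formula)
```

Note that the formula shrinks as the X values become more spread out: more variation in X makes the slope estimate less noisy.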

18   $\mathrm{Var}(b_1) = \mathrm{Var}[\bar{Y} - b_2 \bar{X}] = \mathrm{Var}[B_1 + B_2 \bar{X} + \bar{u} - b_2 \bar{X}]$

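The X values are deterministic (not random variables), so the first two terms inside the square brackets in equation 18 are constants and have no covariance with anything. The two random terms, $\bar{u}$ and $b_2$, are also uncorrelated under our assumptions, because $\mathrm{Cov}(\bar{u}, b_2) = (\sigma^2 / nc) \sum (X_i - \bar{X}) = 0$. The variance of this sum of terms is therefore just the sum of their variances (the minus sign on the last term disappears because its coefficient gets squared).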

19   $= \mathrm{Var}(B_1) + \mathrm{Var}(B_2 \bar{X}) + \mathrm{Var}(\bar{u}) + \mathrm{Var}(b_2 \bar{X})$

The variance of anything that is not random is just zero, so the first two terms disappear. The population regression function error u is random, so we express the mean PRF error as a linear combination of the individual values of this random variable. In the last term, the constant $\bar{X}$ comes out of the variance as $\bar{X}^2$.

20   $= 0 + 0 + \mathrm{Var}\left( \sum u_i / n \right) + \bar{X}^2 \, \mathrm{Var}(b_2)$

For the variance of the mean PRF error term u, as usual, the variance of a mean is $\sigma^2 / n$. For the last term, we use the result just derived for the variance of the slope estimator.

21   $= \sigma^2 / n + \bar{X}^2 \left( \sigma^2 / \sum x_i^2 \right)$

Now we can factor out the $\sigma^2$ term. We are left with:

22   $= \sigma^2 \left[ (1/n) + \bar{X}^2 / \sum (X_i - \bar{X})^2 \right]$, which can alternatively be expressed as in line 23.

This takes a bit more messing around, namely creating a common denominator, expanding the square, collecting terms and simplifying. Remember that $\sum X_i = n\bar{X}$, and that $\sum \bar{X}^2 = n\bar{X}^2$. Eventually, you can get to:
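One way to fill in the intermediate algebra is sketched here. It uses only the two facts that the sum of the Xi equals n times the mean of X, and that summing the squared mean over n observations gives n times the squared mean; the lowercase x_i denotes the deviation of Xi from its mean.

```latex
\begin{align*}
\sum x_i^2 &= \sum (X_i - \bar{X})^2
            = \sum X_i^2 - 2\bar{X}\sum X_i + \sum \bar{X}^2 \\
           &= \sum X_i^2 - 2n\bar{X}^2 + n\bar{X}^2
            = \sum X_i^2 - n\bar{X}^2 , \\[4pt]
\sigma^2\left[ \frac{1}{n} + \frac{\bar{X}^2}{\sum x_i^2} \right]
           &= \sigma^2\left[ \frac{\sum x_i^2 + n\bar{X}^2}{n \sum x_i^2} \right]
            = \sigma^2\left[ \frac{\sum X_i^2}{n \sum x_i^2} \right] .
\end{align*}
```

The first two rows expand the square and collect terms; the last row creates the common denominator and then substitutes the expanded sum of squared deviations.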

23   $= \sigma^2 \left[ \sum X_i^2 / \left( n \sum x_i^2 \right) \right]$   ....   $\mathrm{s.e.}(b_1) = \sigma \sqrt{ \sum X_i^2 / \left( n \sum x_i^2 \right) }$
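The equivalence of lines 22 and 23 can also be verified numerically. The sketch below uses hypothetical X values and a hypothetical error variance. As a third check, it treats b1 as a linear combination of the Y values (the coefficient vector k is an illustrative name, not notation from the handout) and applies the "squared coefficients times variances" rule directly.

```python
import numpy as np

X = np.array([1.0, 2.0, 4.0, 7.0, 11.0])   # hypothetical fixed regressor values
sigma2 = 4.0                                # hypothetical error variance
n = X.size

x_dev = X - X.mean()
sum_x2 = np.sum(x_dev ** 2)

# Line 22 and line 23 are two ways of writing the same quantity.
var_b1_line22 = sigma2 * (1.0 / n + X.mean() ** 2 / sum_x2)
var_b1_line23 = sigma2 * np.sum(X ** 2) / (n * sum_x2)

# b1 = Ybar - b2 * Xbar is linear in Y, with coefficient k_i on Y_i.
k = 1.0 / n - X.mean() * x_dev / sum_x2
var_b1_linear = sigma2 * np.sum(k ** 2)

se_b1 = np.sqrt(var_b1_line23)

print("Var(b1), line 22           :", var_b1_line22)
print("Var(b1), line 23           :", var_b1_line23)
print("Var(b1), linear combination:", var_b1_linear)
print("s.e.(b1)                   :", se_b1)
```

All three variance calculations should agree up to floating-point rounding.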

For $\sigma^2$, use the sample variance with modified degrees of freedom (two parameters must be estimated before the residuals $e_i$ can be computed, so the divisor is n-2):

24   $s^2 = \sum e_i^2 / (n-2) = \frac{1}{n-2} \sum (Y_i - b_1 - b_2 X_i)^2$
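In practice the error variance is unknown, so the estimate from line 24 is plugged into lines 17 and 23 in its place. The sketch below walks through that calculation for a single small sample; the data are hypothetical and serve only to illustrate the arithmetic.

```python
import numpy as np

# Hypothetical sample, for illustration only.
X = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
Y = np.array([3.0, 5.0, 4.0, 10.0, 13.0])
n = X.size

x_dev = X - X.mean()
sum_x2 = np.sum(x_dev ** 2)

b2 = np.sum(x_dev * Y) / sum_x2   # slope estimate
b1 = Y.mean() - b2 * X.mean()     # intercept estimate

e = Y - b1 - b2 * X               # residuals
s2 = np.sum(e ** 2) / (n - 2)     # line 24: divide by n-2, not n-1

se_b2 = np.sqrt(s2 / sum_x2)                          # line 17 with s replacing sigma
se_b1 = np.sqrt(s2 * np.sum(X ** 2) / (n * sum_x2))   # line 23 with s replacing sigma

print("b1 =", b1, "  s.e.(b1) =", se_b1)
print("b2 =", b2, "  s.e.(b2) =", se_b2)
print("s^2 =", s2)
```

The divisor n-2 is the contrast with the univariate case, where only one parameter (the mean) is estimated before deviations can be computed and the divisor is n-1.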


Updated: January 26, 1998
Prepared by: Trudy Ann Cameron