IMPORTANT: You are not expected to memorize these derivations, or even to be able to reproduce them from scratch. The important thing to remember is that line 1 leads to the result in line 8, and line 9 leads to the result in line 11. Likewise, line 12 leads to the result in line 17, and line 18 leads to the result in line 23. Furthermore, note the difference between line 24 and the analogous concept for the univariate case.
Each sample you might have drawn for estimating an ordinary least squares (OLS) regression would have produced different point estimates for the slope and the intercept of the regression line. Over all possible samples, there will be a distribution for each of these estimators: intercept $b_1$ and slope $b_2$.
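As a rough illustration of this sample-to-sample variation, the sketch below draws a few samples from a made-up population and fits OLS to each. The population values (`B1 = 2`, `B2 = 0.5`, `SIGMA = 1`), the fixed X values, and the helper name `ols_fit` are all assumptions chosen for the example, not part of the course material.

```python
import random


def ols_fit(xs, ys):
    """Return (b1, b2): OLS intercept and slope for a simple regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    c = sum((x - x_bar) ** 2 for x in xs)  # sum of squared deviations of X
    b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / c
    b1 = y_bar - b2 * x_bar
    return b1, b2


# Hypothetical population: Y = 2 + 0.5 X + u, with u ~ N(0, 1).
B1, B2, SIGMA = 2.0, 0.5, 1.0
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # X values held fixed across samples

rng = random.Random(0)  # seeded so the run is repeatable
estimates = []
for _ in range(5):
    ys = [B1 + B2 * x + rng.gauss(0.0, SIGMA) for x in xs]
    estimates.append(ols_fit(xs, ys))

for b1, b2 in estimates:
    print(f"b1 = {b1:.3f}, b2 = {b2:.3f}")
```

Each printed pair is one draw from the sampling distributions of the two estimators.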
What are $E(b_1)$ and $E(b_2)$? (i.e., are these estimators unbiased for $B_1$ and $B_2$?)
1   $E(b_2) = E\left[ \sum (X_i - \bar{X})(Y_i - \bar{Y}) \,/\, \sum (X_i - \bar{X})^2 \right]$,   let $c = \sum (X_i - \bar{X})^2$
The use of the abbreviation $c$ for the term in the denominator just simplifies the algebra. We will substitute back for $c$ later on. Distribute the $(Y_i - \bar{Y})$ term, and separate the summation operation into two pieces.
2   $= E\left[ \sum (X_i - \bar{X}) Y_i / c \;-\; \sum (X_i - \bar{X}) \bar{Y} / c \right]$
The numerator in the second term in this expression can have the constant quantity $\bar{Y}$ factored out from under the summation sign, because it does not vary over observations. The second term then becomes $\bar{Y} \sum (X_i - \bar{X}) / c$. The summation is now the sum of the deviations of $X$ from its mean, which is zero. Thus the second term in line 2 disappears.
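The zero-sum property of deviations used here follows from the definition of the sample mean, $\bar{X} = \sum X_i / n$. Spelled out as a short worked step:

```latex
\sum_{i=1}^{n} (X_i - \bar{X})
  = \sum_{i=1}^{n} X_i - n\bar{X}
  = n\bar{X} - n\bar{X}
  = 0
```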
3   $= E\left[ \sum (X_i - \bar{X}) Y_i / c \right]$
The $Y_i$ values are the only random variables in this story. The $X$ values are treated as deterministic (non-random) amounts. As "constants," they can be factored out of the expectation operator (review the rules of expectations).
4   $= \left\{ \sum (X_i - \bar{X}) \, E[Y_i] \right\} / c$
Now we can write each individual value of $Y_i$ in terms of the parameters of the population regression function, since we assume that underlying our sample is a population wherein $Y_i = B_1 + B_2 X_i + u_i$.
5   $= \left\{ \sum (X_i - \bar{X}) \, E[B_1 + B_2 X_i + u_i] \right\} / c$
The expectation operation can be distributed across a sum.
6   $= \sum (X_i - \bar{X}) \left\{ E[B_1] + E[B_2 X_i] + E[u_i] \right\} / c$
$E[B_1]$ is just $B_1$, because this parameter is some true, if unknown, constant value. Since the $X$'s are deterministic, the expectation $E[B_2 X_i]$ is just $B_2 X_i$. By assumption $E[u_i]$ is zero (namely, the population regression function runs through the mean of the relevant conditional distributions of $Y$). Next, you can take the three terms in the curly brackets and distribute them, along with the $\sum (X_i - \bar{X})$, into three separate terms. The $B_1$ can be factored out of its summation, and $B_1 \sum (X_i - \bar{X})$ will be equal to zero because the sum of the deviations of a variable from its mean is zero. The third term disappears because $E[u_i]$ is zero. All that is left is the middle term, where the $B_2$ can be factored out of the summation, but not the $X_i$, because that varies with $i$.
7   $= B_2 \left\{ \sum (X_i - \bar{X}) X_i / c \right\}$
To get to the next step, we need to reverse the result we used in getting from line 1 to line 3. If we use $X_i$ instead of the $Y_i$ appearing above, it will likewise be true that $\sum (X_i - \bar{X}) X_i$ is equal to $\sum (X_i - \bar{X})(X_i - \bar{X})$. This gets the algebra into a convenient form where it is easy to see that stuff cancels out, and we are left with the tidy result that $E[b_2] = B_2$.
8   $= B_2 \left\{ \sum (X_i - \bar{X})^2 \,/\, \sum (X_i - \bar{X})^2 \right\} = B_2$   ...thus $b_2$ unbiased
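The identity used in moving from line 7 to line 8 can be checked by expanding the squared deviation and applying the zero-sum property of deviations once more. A short worked version:

```latex
\begin{align*}
\sum (X_i - \bar{X})^2
  &= \sum (X_i - \bar{X})(X_i - \bar{X}) \\
  &= \sum (X_i - \bar{X}) X_i \;-\; \bar{X} \sum (X_i - \bar{X}) \\
  &= \sum (X_i - \bar{X}) X_i \;-\; \bar{X} \cdot 0 \\
  &= \sum (X_i - \bar{X}) X_i
\end{align*}
```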
9   $E(b_1) = E\left[ \bar{Y} - b_2 \bar{X} \right] = E\left[ B_1 + B_2 \bar{X} + \bar{u} - b_2 \bar{X} \right]$
Above, we have substituted for the mean of $Y$ an expression for it in terms of the parameters of the population regression function. This involves writing the individual values of $Y_i$ as $B_1 + B_2 X_i + u_i$, summing over all $i = 1, \dots, n$, and then dividing by $n$. This yields $B_1 + B_2 \bar{X} + \bar{u}$. The expectation operator can be distributed across the four terms in the square brackets.
10   $= E[B_1] + E[B_2 \bar{X}] + E[\bar{u}] - E[b_2 \bar{X}]$
The expected value of a true but unknown constant is just that constant, so $E[B_1] = B_1$. The $X$'s being deterministic, the second term is also always equal to its expectation. The expected value (average) of the population regression function error term is zero. In the last term, $\bar{X}$ is a constant and factors out of the expectation, leaving $\bar{X} E[b_2]$. We have shown above that the slope estimator is an unbiased estimator for $B_2$, so we can use that result here. Stuff cancels, and we see that the OLS intercept estimator is also an unbiased estimator for the true but unknown population intercept parameter.
11   $= B_1 + B_2 \bar{X} + 0 - \bar{X} E[b_2] = B_1 + B_2 \bar{X} - B_2 \bar{X} = B_1$   ...thus $b_1$ unbiased.
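The two unbiasedness results can be checked numerically with a small Monte Carlo experiment. The sketch below is only an illustration under assumed values: the population parameters, the X grid, the number of replications, and the helper name `ols_fit` are hypothetical choices, and the errors are drawn as independent normals with a common variance.

```python
import random
from statistics import fmean


def ols_fit(xs, ys):
    """Return (b1, b2): OLS intercept and slope for a simple regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    c = sum((x - x_bar) ** 2 for x in xs)
    b2 = sum((x - x_bar) * y for x, y in zip(xs, ys)) / c  # form used in line 3
    b1 = y_bar - b2 * x_bar
    return b1, b2


# Hypothetical population regression function: Y = 2 + 0.5 X + u, u ~ N(0, 1).
B1, B2, SIGMA = 2.0, 0.5, 1.0
xs = [float(x) for x in range(1, 11)]  # deterministic X values, reused in every sample

rng = random.Random(42)
draws = []
for _ in range(20000):
    ys = [B1 + B2 * x + rng.gauss(0.0, SIGMA) for x in xs]
    draws.append(ols_fit(xs, ys))

mean_b1 = fmean([b1 for b1, _ in draws])
mean_b2 = fmean([b2 for _, b2 in draws])
print(f"average b1 over samples: {mean_b1:.4f} (true B1 = {B1})")
print(f"average b2 over samples: {mean_b2:.4f} (true B2 = {B2})")
```

With enough replications the two averages should sit close to the true parameter values, which is what lines 8 and 11 predict.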
What are $\mathrm{Var}(b_1)$ and $\mathrm{Var}(b_2)$? (i.e., how "noisy" are the estimators for $B_1$ and $B_2$?)
12   $\mathrm{Var}(b_2) = \mathrm{Var}\left[ \sum (X_i - \bar{X}) Y_i \,/\, \sum (X_i - \bar{X})^2 \right] = \mathrm{Var}\left[ \sum (X_i - \bar{X}) Y_i / c \right]$
This uses the same version of the formula for $b_2$ as appears within equation 3 above. This is a nice simple place to start the derivation of the variance of the OLS slope estimator. For the next line, we write out all the terms in the summation.
13   $= \mathrm{Var}\left[ \frac{X_1 - \bar{X}}{c} Y_1 + \frac{X_2 - \bar{X}}{c} Y_2 + \dots + \frac{X_n - \bar{X}}{c} Y_n \right]$
The expanded summation can be viewed as nothing more than a conventional linear combination of independent random variables (i.e. coefficient times variable, plus coefficient times variable, plus coefficient times variable...). If our sample is randomly drawn, then the observations on $Y_i$ are statistically independent. For the variance of this linear combination of independently distributed random variables, we need to take the squares of the coefficients and multiply them by the variances of the individual random variables.
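14   $= \left( \frac{X_1 - \bar{X}}{c} \right)^2 \mathrm{Var}(Y_1) + \left( \frac{X_2 - \bar{X}}{c} \right)^2 \mathrm{Var}(Y_2) + \dots + \left( \frac{X_n - \bar{X}}{c} \right)^2 \mathrm{Var}(Y_n)$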
Now it is convenient to put the drawn-out sum of terms back into summation notation, with one term for each value of $i$ in the sample. But if we are assuming that the conditional variance of $Y$ is the same, regardless of the value of $X$ at which the distribution of $Y$ is being considered (homoscedasticity assumption), then all the $\mathrm{Var}(Y_i)$ terms are equal to the same $\sigma^2$.
15   $= \sum \left[ (X_i - \bar{X})^2 / c^2 \right] \mathrm{Var}(Y_i) = \sum \left[ (X_i - \bar{X})^2 / c^2 \right] \sigma^2$
This common $\sigma^2$ value for all terms in the summation will factor out of the summation, and we can now reinstate the full definition of the abbreviation $c$.
16   $= \sigma^2 \sum \left[ (X_i - \bar{X})^2 / c^2 \right] = \sigma^2 \sum \left[ (X_i - \bar{X})^2 \,/\, \left( \sum (X_i - \bar{X})^2 \right)^2 \right]$
The value of $\sum (X_i - \bar{X})^2$ in the numerator will now cancel with one of the $\sum (X_i - \bar{X})^2$ terms in the denominator, leaving just $\sigma^2 / \sum (X_i - \bar{X})^2$. This can be tidied up by using the deviation notation introduced earlier in the course, $x_i = X_i - \bar{X}$:
17   $= \sigma^2 / \sum x_i^2$   ....   $\mathrm{s.e.}(b_2) = \sigma / \sqrt{\sum x_i^2}$
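The result in line 17 can be compared against a simulation. The following is a minimal sketch under assumed conditions: independent, homoscedastic normal errors, a made-up population (`B1 = 2`, `B2 = 0.5`, `SIGMA = 1`), a fixed X grid, and a hypothetical helper `ols_slope`. It compares the empirical variance of the slope across many samples with $\sigma^2 / \sum x_i^2$.

```python
import random
from statistics import variance


def ols_slope(xs, ys):
    """OLS slope in the form used in line 3: sum((Xi - Xbar) * Yi) / c."""
    x_bar = sum(xs) / len(xs)
    c = sum((x - x_bar) ** 2 for x in xs)
    return sum((x - x_bar) * y for x, y in zip(xs, ys)) / c


# Hypothetical population values, chosen only for illustration.
B1, B2, SIGMA = 2.0, 0.5, 1.0
xs = [float(x) for x in range(1, 11)]
x_bar = sum(xs) / len(xs)

# Line 17: Var(b2) = sigma^2 / sum of squared deviations of X.
theoretical_var = SIGMA ** 2 / sum((x - x_bar) ** 2 for x in xs)

rng = random.Random(7)
slopes = []
for _ in range(20000):
    ys = [B1 + B2 * x + rng.gauss(0.0, SIGMA) for x in xs]
    slopes.append(ols_slope(xs, ys))

empirical_var = variance(slopes)  # sample variance of the simulated slopes
print(f"theoretical Var(b2): {theoretical_var:.5f}")
print(f"empirical   Var(b2): {empirical_var:.5f}")
```

Spreading the X values further apart increases $\sum x_i^2$ and so shrinks the variance of the slope estimator, which can be seen by changing `xs` in the sketch.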
18   $\mathrm{Var}(b_1) = \mathrm{Var}\left[ \bar{Y} - b_2 \bar{X} \right] = \mathrm{Var}\left[ B_1 + B_2 \bar{X} + \bar{u} - b_2 \bar{X} \right]$
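The $X$ values are deterministic (not random variables), so the only random terms inside the square brackets in equation 18 are $\bar{u}$ and $b_2 \bar{X}$. The covariance between these two is zero, because it is proportional to the sum of the deviations of $X$ from its mean. The variance of this sum of terms is therefore just the sum of their variances.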
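The step from line 18 to line 19 needs the covariance between $\bar{u}$ and $b_2$ to be zero. A sketch of why this holds, assuming errors that are uncorrelated across observations with common variance $\sigma^2$ (the assumptions already used for line 14 and line 15). The weights $w_i$ are notation introduced only for this sketch:

```latex
% Assumed notation: w_i = (X_i - \bar{X}) / c, so that b_2 = \sum_i w_i Y_i.
% Since B_1 + B_2 X_i is non-random, only the u_i part of Y_i contributes.
\begin{align*}
\mathrm{Cov}(\bar{u},\, b_2)
  &= \mathrm{Cov}\Big( \tfrac{1}{n} \sum_i u_i,\; \sum_j w_j u_j \Big) \\
  &= \tfrac{1}{n} \sum_i w_i \, \mathrm{Var}(u_i)
     && \text{(errors uncorrelated across observations)} \\
  &= \frac{\sigma^2}{n c} \sum_i (X_i - \bar{X})
     && \text{(common variance } \sigma^2 \text{)} \\
  &= 0
     && \text{(deviations from the mean sum to zero)}
\end{align*}
```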
19   $= \mathrm{Var}(B_1) + \mathrm{Var}(B_2 \bar{X}) + \mathrm{Var}(\bar{u}) + \mathrm{Var}(b_2 \bar{X})$
The variance of anything that is not random is just zero, so the first two terms disappear. The population regression function error $u$ is random, so we express the mean PRF error as a linear combination of the individual values of this random variable. In the last term, the constant $\bar{X}$ comes out of the variance as $\bar{X}^2$.
20   $= 0 + 0 + \mathrm{Var}\left( \sum u_i / n \right) + \bar{X}^2 \, \mathrm{Var}(b_2)$
For the variance of the mean PRF error term $\bar{u}$, as usual, the variance of a mean is $\sigma^2 / n$. For the last term, we use the result just derived for the variance of the slope estimator.
21   $= \sigma^2 / n + \bar{X}^2 \left( \sigma^2 / \sum x_i^2 \right)$
Now we can factor out the $\sigma^2$ term. We are left with:
22   $= \sigma^2 \left[ \frac{1}{n} + \bar{X}^2 / \sum (X_i - \bar{X})^2 \right]$, or can be alternatively expressed as in line 23.
This takes a bit more messing around, namely creating a common denominator, expanding the square, collecting terms and simplifying. Remember that $\sum X_i = n\bar{X}$, and that $\sum \bar{X}^2 = n\bar{X}^2$. Eventually, you can get to:
23   $= \sigma^2 \left[ \sum X_i^2 \,/\, \left( n \sum x_i^2 \right) \right]$   ....   $\mathrm{s.e.}(b_1) = \sigma \sqrt{ \sum X_i^2 \,/\, \left( n \sum x_i^2 \right) }$
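The algebra connecting line 22 to line 23 can be laid out in two short steps, using $x_i = X_i - \bar{X}$ and $\sum X_i = n\bar{X}$. This is one possible route, and other orderings of the same manipulations work equally well:

```latex
\begin{align*}
\sum x_i^2
  &= \sum (X_i - \bar{X})^2
   = \sum X_i^2 - 2\bar{X} \sum X_i + n\bar{X}^2
   = \sum X_i^2 - n\bar{X}^2 \\[4pt]
\frac{1}{n} + \frac{\bar{X}^2}{\sum x_i^2}
  &= \frac{\sum x_i^2 + n\bar{X}^2}{n \sum x_i^2}
   = \frac{\sum X_i^2}{n \sum x_i^2}
\end{align*}
```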
For $\sigma^2$, use the sample variance with modified degrees of freedom (two parameters must be estimated before the residuals $e_i$ can be ascertained):
24   $s^2 = \sum e_i^2 / (n-2) = \frac{1}{n-2} \sum (Y_i - b_1 - b_2 X_i)^2$
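Putting lines 17, 23, and 24 together, the estimated standard errors for a single sample can be computed as in the sketch below. The function name `ols_with_standard_errors` and the five data points are made up for illustration. The function substitutes $s^2$ from line 24 for the unknown $\sigma^2$.

```python
import math


def ols_with_standard_errors(xs, ys):
    """Fit a simple OLS regression and estimate s^2, s.e.(b1), and s.e.(b2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_dev_sq = sum((x - x_bar) ** 2 for x in xs)  # sum of x_i^2 in deviation form

    b2 = sum((x - x_bar) * y for x, y in zip(xs, ys)) / sum_dev_sq
    b1 = y_bar - b2 * x_bar

    residuals = [y - b1 - b2 * x for x, y in zip(xs, ys)]
    s2 = sum(e ** 2 for e in residuals) / (n - 2)  # line 24: n - 2 degrees of freedom

    se_b2 = math.sqrt(s2 / sum_dev_sq)  # line 17 with s^2 in place of sigma^2
    se_b1 = math.sqrt(s2 * sum(x ** 2 for x in xs) / (n * sum_dev_sq))  # line 23
    return {"b1": b1, "b2": b2, "s2": s2, "se_b1": se_b1, "se_b2": se_b2}


# Made-up sample, used only to exercise the function.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
result = ols_with_standard_errors(xs, ys)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Dividing by `n - 2` rather than `n - 1` reflects the two estimated parameters, which is the contrast with the univariate sample variance.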