PURPOSE: to show how the estimator formulas for the slope and intercept of an ordinary least squares (OLS) regression are derived. You will not be required to reproduce these derivations; they are provided so that you can appreciate the intuition behind the process that leads to the formulas used, in the background, by SHAZAM and other regression software packages when you ask the program to find the "best-fitting linear relationship" between Y and X. Except for one very simple homework exercise, you will never again use these formulas explicitly. Let the software do it for you.
CAUTION: newly transformed to HTML; please report any remaining errors.
Sample Regression Function: $Y_i = b_1 + b_2 X_i + e_i$
implies $e_i = Y_i - b_1 - b_2 X_i$
All sums below (denoted $\sum$, or S in plain text) run from $i = 1$ to $i = n$ (i.e., over the entire sample).
OLS: $\min_{b_1, b_2} \sum e_i^2 = \min_{b_1, b_2} \sum (Y_i - b_1 - b_2 X_i)^2 = \min_{b_1, b_2} [\,*\,]$, where [*] denotes the sum of squared residuals.
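To make the minimand [*] concrete, here is a minimal Python sketch that evaluates the sum of squared residuals for two candidate lines. The data values and the function name `sum_squared_residuals` are hypothetical, chosen only for illustration.

```python
# Hypothetical sample data, chosen only for illustration.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]


def sum_squared_residuals(b1, b2, xs, ys):
    """Return the sum of e_i^2, where e_i = Y_i - b1 - b2 * X_i."""
    return sum((y - b1 - b2 * x) ** 2 for x, y in zip(xs, ys))


# Two candidate lines: OLS prefers whichever gives the smaller sum.
ssr_flat = sum_squared_residuals(4.0, 0.0, X, Y)    # horizontal line at Y = 4
ssr_sloped = sum_squared_residuals(2.2, 0.6, X, Y)  # an upward-sloping line

print(ssr_flat, ssr_sloped)
```

For these numbers the sloped line gives the smaller sum, so it is the better fit of the two by the least squares criterion.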
Recipe: take the first partial derivatives (slopes) of the minimand function [*] and set them equal to zero; then solve for the values of $b_1$ and $b_2$ that make these equations true.
First, with respect to $b_1$:
$\partial [*] / \partial b_1$: $\sum 2 (Y_i - b_1 - b_2 X_i)(-1) = 0$
same as: $\sum Y_i - n b_1 - b_2 \sum X_i = 0$
Then, with respect to $b_2$:
$\partial [*] / \partial b_2$: $\sum 2 (Y_i - b_1 - b_2 X_i)(-X_i) = 0$
same as: $\sum X_i Y_i - b_1 \sum X_i - b_2 \sum X_i^2 = 0$
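As a sanity check on the two partial derivatives above, the following sketch compares the analytic expressions with central finite differences at an arbitrary trial point. The data, the trial values, and all function names are assumptions made for illustration; this is not part of the derivation itself.

```python
# Hypothetical sample data, chosen only for illustration.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]


def ssr(b1, b2):
    """The minimand [*]: sum of squared residuals."""
    return sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y))


def d_ssr_d_b1(b1, b2):
    """Analytic derivative: sum of 2 (Y_i - b1 - b2 X_i) (-1)."""
    return sum(2 * (y - b1 - b2 * x) * (-1) for x, y in zip(X, Y))


def d_ssr_d_b2(b1, b2):
    """Analytic derivative: sum of 2 (Y_i - b1 - b2 X_i) (-X_i)."""
    return sum(2 * (y - b1 - b2 * x) * (-x) for x, y in zip(X, Y))


def central_difference(f, b1, b2, wrt, h=1e-6):
    """Numerical slope of f in the b1 or b2 direction."""
    if wrt == "b1":
        return (f(b1 + h, b2) - f(b1 - h, b2)) / (2 * h)
    return (f(b1, b2 + h) - f(b1, b2 - h)) / (2 * h)


b1_trial, b2_trial = 1.0, 0.5  # arbitrary trial point, not the minimum
analytic = (d_ssr_d_b1(b1_trial, b2_trial), d_ssr_d_b2(b1_trial, b2_trial))
numeric = (
    central_difference(ssr, b1_trial, b2_trial, "b1"),
    central_difference(ssr, b1_trial, b2_trial, "b2"),
)
print(analytic, numeric)
```

The two pairs should agree up to rounding error. Both slopes are nonzero at this trial point, which is why it is not the minimizing choice of the two parameters.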
These derivatives, set equal to zero, give us the so-called "normal equations" for ordinary least squares:
(1) $\sum Y_i = n b_1 + b_2 \sum X_i$
(2) $\sum X_i Y_i = b_1 \sum X_i + b_2 \sum X_i^2$
As soon as you have a sample of data on X and Y, you will know the $X_i$, the $Y_i$, and $n$.
The "normal equations" are two equations in two unknowns: $b_1$ and $b_2$.
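Because the normal equations are just a 2-by-2 linear system, they can also be solved mechanically once the sums are computed. The sketch below uses Cramer's rule, which is a different route from the substitution used in this handout but yields the same solution. The data and variable names are hypothetical.

```python
# Hypothetical sample data, chosen only for illustration.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(X)
sum_x = sum(X)
sum_y = sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)

# Normal equations written as a linear system in (b1, b2):
#   (1)  n     * b1 + sum_x  * b2 = sum_y
#   (2)  sum_x * b1 + sum_x2 * b2 = sum_xy
det = n * sum_x2 - sum_x * sum_x
if det == 0:
    # Happens when every X_i is the same, so no unique line exists.
    raise ValueError("X has no variation; the system has no unique solution")

# Cramer's rule for a 2-by-2 system.
b1 = (sum_y * sum_x2 - sum_x * sum_xy) / det
b2 = (n * sum_xy - sum_x * sum_y) / det

print(b1, b2)
```

Plugging the resulting values back into (1) and (2) should reproduce the left-hand sides, which is a quick way to confirm the solution.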
From (1), $b_1 = (\sum Y_i - b_2 \sum X_i)/n = \sum Y_i / n - b_2 (\sum X_i / n) = \bar{Y} - b_2 \bar{X}$, where $\bar{X}$ and $\bar{Y}$ are the sample means.
From (2), $b_2 = (\sum X_i Y_i - b_1 \sum X_i) / \sum X_i^2 = \sum X_i Y_i / \sum X_i^2 - b_1 (\sum X_i / \sum X_i^2)$
With these rearrangements, we recognize a system of two linear equations in $b_1$ and $b_2$. Each unknown parameter is expressed as a linear function of the other, with the "coefficients" in this case being functions of the data on $X_i$ and $Y_i$.
The strategy is to substitute the expression derived for $b_1$ (in terms of $b_2$) into equation (2), leaving $b_2$ as the only unknown in (2) after the substitution is made. This is just a messy version of high school algebra. The final result for $b_2$ can then be expressed in several different ways:
$b_2 = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - (\sum X_i)^2}$; or, dividing all terms by $n$,
$b_2 = \frac{\sum X_i Y_i - n \bar{X} \bar{Y}}{\sum X_i^2 - n \bar{X}^2}$;
or, rearranging,
$b_2 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$;
or, if we define $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$,
$b_2 = (\sum x_i y_i) / (\sum x_i^2)$ ... easiest version of slope estimator
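For readers who want to see the "messy high school algebra", the substitution step can be sketched as follows, writing $\bar{X}$ and $\bar{Y}$ for the sample means of X and Y. This is one possible ordering of the steps, not necessarily the one used in lecture.

```latex
\begin{align*}
\sum X_i Y_i &= b_1 \sum X_i + b_2 \sum X_i^2
  && \text{normal equation (2)} \\
&= (\bar{Y} - b_2 \bar{X}) \sum X_i + b_2 \sum X_i^2
  && \text{substitute } b_1 = \bar{Y} - b_2 \bar{X} \text{ from (1)} \\
&= n \bar{X} \bar{Y} + b_2 \Bigl( \sum X_i^2 - n \bar{X}^2 \Bigr)
  && \text{since } \sum X_i = n \bar{X} \\
\Rightarrow \quad b_2 &= \frac{\sum X_i Y_i - n \bar{X} \bar{Y}}{\sum X_i^2 - n \bar{X}^2}
\end{align*}

% The deviation-from-mean form follows from expanding the products:
\begin{align*}
\sum (X_i - \bar{X})(Y_i - \bar{Y})
  &= \sum X_i Y_i - \bar{X} \sum Y_i - \bar{Y} \sum X_i + n \bar{X} \bar{Y}
   = \sum X_i Y_i - n \bar{X} \bar{Y} \\
\sum (X_i - \bar{X})^2
  &= \sum X_i^2 - 2 \bar{X} \sum X_i + n \bar{X}^2
   = \sum X_i^2 - n \bar{X}^2
\end{align*}
```

Multiplying the numerator and denominator of the first result by $n$ gives the version written entirely in raw sums.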
The intercept estimate, $b_1 = \bar{Y} - b_2 \bar{X}$, is then trivial to calculate.
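Putting the pieces together, a minimal Python sketch of the slope and intercept estimators might look like the following. The function names and the data are hypothetical, and real regression packages add many refinements (standard errors, numerical safeguards) that are omitted here.

```python
def ols_slope_intercept(xs, ys):
    """Return (b1, b2) using the deviation-from-mean form of the slope."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of x_i * y_i and sum of x_i^2, where lowercase means deviations.
    sum_xy_dev = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_xx_dev = sum((x - x_bar) ** 2 for x in xs)
    if sum_xx_dev == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    b2 = sum_xy_dev / sum_xx_dev   # slope estimate
    b1 = y_bar - b2 * x_bar        # intercept estimate
    return b1, b2


def slope_from_raw_sums(xs, ys):
    """Slope via the raw-sums form, for comparison with the form above."""
    n = len(xs)
    numerator = n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)
    denominator = n * sum(x * x for x in xs) - sum(xs) ** 2
    return numerator / denominator


# Hypothetical sample data, chosen only for illustration.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

b1, b2 = ols_slope_intercept(X, Y)
b2_raw = slope_from_raw_sums(X, Y)
print(f"b1 = {b1:.4f}, b2 = {b2:.4f}, raw-sums b2 = {b2_raw:.4f}")
# prints: b1 = 2.2000, b2 = 0.6000, raw-sums b2 = 0.6000
```

The two slope functions are algebraically equivalent. The deviation form is usually preferred in hand calculation because the numbers involved stay smaller.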