UNIVERSITY OF CALIFORNIA, LOS ANGELES
Department of Economics

Economics 143 (Cameron) - Applied Regression Analysis

*Computing Lab Session #4:* __Multiple Regression; Omitted Variables Bias__

### Goals for this lab:

- Reinforce interpretation of simple and multiple regression output:
  - meaning of intercept; meaning(s) of slope(s)
  - point estimates
  - standard errors
  - t-ratios (and P-values; you do not need the tables in the text)
  - zero hypotheses
  - use of TEST command (for other hypotheses)
  - R-squared (adjusted R-squared)

- Reinforce issue of potential omitted variables bias in ANY regression
- Reinforce the connections between (a) verbal formulation of hypotheses, (b)
corresponding regression model and statistical hypotheses, (c) translation back
into verbal interpretation of results

### Tasks:

1. Omitted Variables Bias Example from Problem Set #4 (review problems; not to
be handed in). Concept: finding an apparent relationship in a simple regression
that actually isn't there, as a consequence of omitted variables bias. File
needed: n:colds.sha.

a.) Try a naive regression model that treats COLDS as a function of VISITS to
places of worship: OLS COLDS VISITS. If your atheist brother-in-law asserts that
more-religious people are light-weight hypochondriacs who are always getting sick,
what would this regression imply with respect to his assertion? Using the basic
regression results, test the null hypothesis that VISITS have no effect on COLDS.
(Also try an appropriate version of the TEST command to verify these results.) BE
SURE YOU KNOW HOW TO DO THIS AND HOW TO INTERPRET THE RESULTS.

b.) You wonder if any other factors besides religiosity affect health status.
AGE is a variable you have handy. You control for AGE before examining the effect
of VISITS on COLDS by running the regression OLS COLDS VISITS AGE. (Note that the
order of the explanatory variables doesn't matter; the results for each variable
will be unaffected.) Repeat the test of the hypothesis that VISITS have no effect
on COLDS. What happens now? Test the hypothesis that AGE has no effect on
COLDS.

c.) Examine a plot of the relationship between the two explanatory variables:
VISITS and AGE. Use PLOT VISITS AGE / GNU. (Why do you not use the LINEONLY or
LINE option?) Explain what accounts for the conflicting implications of the
simple regression and the multiple regression. Which model is more "right"?
Why?

d.) Compare the "fit" of the simple and multiple regression models. Which
model does a better job of explaining the observed variation across people in the
number of colds? What about the fact that one model has an unfair advantage in
that it uses more regressors? How do you control for this unfair advantage?

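
The mechanism behind Task 1 can be sketched outside SHAZAM with a small simulation. The Python below is only an illustration under stated assumptions: the data are synthetic (not the colds.sha data), the coefficients 0.1 and 0.05 are made up, and the helper names `simple_ols` and `fit` are hypothetical. AGE is built to drive both VISITS and COLDS, while VISITS has no direct effect on COLDS. The multiple-regression slope on VISITS is obtained by first removing the part of each variable explained by AGE (partialling out), which gives the same slope as a regression of COLDS on VISITS and AGE.

```python
import math
import random


def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (a, b, residuals)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b, [yi - a - b * xi for xi, yi in zip(x, y)]


def fit(y, x, control=None):
    """Slope on x, its t-ratio for H0: slope = 0, R-squared, adjusted R-squared.

    With `control`, x and y are first purged of the control variable
    (partialling out), which reproduces the multiple-regression slope on x.
    """
    n = len(y)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)  # total variation in y
    k = 2  # estimated coefficients: intercept + one slope
    if control is not None:
        x = simple_ols(control, x)[2]  # x with the control's influence removed
        y = simple_ols(control, y)[2]  # y with the control's influence removed
        k = 3  # intercept + two slopes
    _, slope, resid = simple_ols(x, y)
    ssr = sum(e * e for e in resid)  # residual sum of squares
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    se = math.sqrt(ssr / (n - k) / sxx)
    r2 = 1.0 - ssr / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)  # penalizes extra regressors
    return {"slope": slope, "t": slope / se, "r2": r2, "adj_r2": adj_r2}


# Synthetic data (an assumption, NOT the colds.sha data): AGE drives both
# VISITS and COLDS, while VISITS has no direct effect on COLDS.
rng = random.Random(143)
age = [rng.uniform(20, 80) for _ in range(500)]
visits = [0.1 * a + rng.gauss(0, 1) for a in age]
colds = [0.05 * a + rng.gauss(0, 1) for a in age]

simple = fit(colds, visits)
multiple = fit(colds, visits, control=age)
for name, res in (("simple", simple), ("multiple", multiple)):
    print(name, {key: round(val, 3) for key, val in res.items()})
```

With data built this way, the simple regression should show a clearly positive slope on VISITS with a large t-ratio, while the slope shrinks toward zero once AGE is controlled for. R-squared can never fall when a regressor is added, so adjusted R-squared is the comparison that corrects for the "unfair advantage" raised in part (d).
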
2. Omitted Variables Bias Example from Problem Set #4 (review problems; not to
be handed in). Concept: finding NO effect of one variable on another (or an
effect that is counter to what you might expect) when there actually is an effect,
but it is obscured by omitted variables. File needed: n:study.sha.

a.) Repeat the steps for the n:colds.sha example using the n:study.sha program
and data. Here you are interested in knowing whether hours of studying (STUDY)
have a statistically significant effect on midterm grades (MIDTERM). If you fail
to control for "ability," approximated by prior GPA, you find some
counter-intuitive results: an effect you might think should be there is not
statistically significant, and the sign is even "wrong." When you do control for
ability, something more reasonable appears.

b.) Be sure to examine the relationship between study hours and GPA for a
clue to the source of the bias in the simple regression. Ensure that you can
explain in words what accounts for the difference in results between the simple
and the multiple regressions in this case.

c.) For your preferred specification, test the hypothesis that study time has
no effect on midterm grade. What do you conclude, based on this specification?
Now test the hypothesis that an extra hour of study time, on average, will produce
a 5-point-higher midterm score. What does your model imply?

d.) Compare the "fit" of the simple and multiple regression models. Which
model does a better job of explaining the observed variation across people in
midterm scores? What about the fact that one model has an unfair advantage in
that it uses more regressors? How do you control for this unfair advantage?

e.) Contemplate the interpretation of the intercept in this model. Is it
meaningful? Why or why not?

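
The two tests in part (c) differ only in the hypothesized value that is subtracted from the estimate before dividing by the standard error. The Python sketch below shows that arithmetic. The estimate (3.2) and standard error (0.9) are hypothetical placeholders, not results from study.sha, and the function names are made up for this illustration. The P-value uses the standard normal as a large-sample approximation; in a small sample you would use the t distribution with n - k degrees of freedom instead.

```python
from statistics import NormalDist


def t_ratio(estimate, std_error, hypothesized=0.0):
    """t-ratio for H0: coefficient = hypothesized."""
    return (estimate - hypothesized) / std_error


def two_sided_p_normal(t):
    """Two-sided P-value using the standard normal as a large-sample
    approximation to the t distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


# Hypothetical estimate and standard error for the STUDY coefficient;
# these are placeholders, not results from study.sha.
b_study, se_study = 3.2, 0.9

t_zero = t_ratio(b_study, se_study)        # H0: STUDY has no effect
t_five = t_ratio(b_study, se_study, 5.0)   # H0: one more hour adds 5 points
p_zero = two_sided_p_normal(t_zero)
p_five = two_sided_p_normal(t_five)
print(f"H0 beta=0: t = {t_zero:.2f}, approx. P = {p_zero:.4f}")
print(f"H0 beta=5: t = {t_five:.2f}, approx. P = {p_five:.4f}")
```

The t-ratio printed in basic regression output covers only the zero hypothesis, which is why any other hypothesized value calls for the TEST command (or this calculation by hand). For a single linear restriction, the square of this t-ratio equals the corresponding F statistic.
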
3. (Time permitting) Exploring how to get fancy laser-printed plots that show
both raw data and fitted values: GNUPLOT option on SHAZAM PLOT command with
subsequent editing of the gnuplot command file.

a.) Consider a plot of the relationship between VISITS and AGE from the
colds.sha example (as an illustration of the technique). Append to the end of
your own version of colds.sha the following additional code:

    ols visits age / predict=vhat
    plot visits vhat age / gnu lineonly
    plot visits vhat age / gnu lineonly commfile=cold.gnu &
      datafile=cold.dat

The first plot command lets you see a screen version of the graphics plot. It
will have a straight line and a wiggly line. We are going to erase the wiggly
line and leave just its "dots." This requires the second plot command and
subsequent editing of the appropriate hidden file indicated in the cold.gnu output
file generated by this code. The second plot command won't send output to the
screen; instead, it will all go to files. Note (as before) that the names you
specify for the commfile= and datafile= options cannot be more than 8 characters
in total, including the "." and the extension, so if you are going to use
informative extensions like those above (.gnu and .dat), the first part of the
filenames should be no more than 4 characters long. Make a note of the name you
selected for the commfile. Now type STOP to exit SHAZAM.

b.) Now select TED, because we are going to peek into the cold.gnu file.
Note the name of the real gnuplot program that this file points to, exit TED, and
then enter TED again to edit this hidden file. This is where you can change the
title for the plot, and change the names of the variables if you like. In
particular, you want to find the line of code for the VISITS variable (not VHAT)
and delete the **w lines** part. When you are happy with your
modifications, save and exit TED.

c.) Now select GNUPLOT for Windows and tell the program the name of the file
you want processed to create a plot for the laser printer. This will be the
COLD.GNU file (or the hidden filename), if you have been proceeding as
above.

d.) As before, you can print the plot on the laser printer by right-clicking
on the plot after it appears on the screen, ensuring that the print options are to
your liking, and then sending the print task to the printer.

Updated: October 23, 1997

Prepared by: Trudy Ann Cameron