UNIVERSITY OF CALIFORNIA, LOS ANGELES
Department of Economics

Economics 143 (Cameron) - Applied Regression Analysis

Computing Lab Session #4: Multiple Regression; Omitted Variables Bias


Goals for this lab:

Tasks:

1. Omitted Variables Bias Example from Problem Set #4 (review problems; not to be handed in): Concept: Finding an apparent relationship in a simple regression that actually isn't there, as a consequence of omitted variables bias. File needed: n:colds.sha.

a.) Try a naive regression model that treats COLDS as a function of VISITS to places of worship: OLS COLDS VISITS. If your atheist brother-in-law asserts that more-religious people are light-weight hypochondriacs who are always getting sick, what would this regression imply with respect to his assertion? Using the basic regression results, test the null hypothesis that VISITS have no effect on COLDS. (Also try an appropriate version of the TEST command to verify these results.) BE SURE YOU KNOW HOW TO DO THIS AND HOW TO INTERPRET THE RESULTS.

b.) You wonder if any other factors besides religiosity affect health status. AGE is a variable you have handy. You control for AGE before examining the effect of VISITS on COLDS by running the regression OLS COLDS VISITS AGE. (Note that the order of the explanatory variables doesn't matter; the results for each variable will be unaffected.) Repeat the test of the hypothesis that VISITS have no effect on COLDS. What happens now? Test the hypothesis that AGE has no effect on COLDS.

c.) Examine a plot of the relationship between the two explanatory variables, VISITS and AGE. Use PLOT VISITS AGE / GNU. (Why do you not use the LINEONLY or LINE option?) Explain what accounts for the conflicting implications of the simple regression and the multiple regression. Which model is more "right"? Why?

d.) Compare the "fit" of the simple and multiple regression models. Which model does a better job of explaining the observed variation across people in the number of colds? What about the fact that one model has an unfair advantage in that it uses more regressors? How do you control for this unfair advantage?
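
For anyone who wants to see the mechanics of Task 1 outside SHAZAM, the following is a minimal Python sketch on synthetic data. It does not use the colds.sha data, and every number in the data-generating lines is an assumption chosen only for illustration: AGE is made to drive both VISITS and COLDS, while VISITS is given no true effect on COLDS. The helper names ols and invert are hypothetical, not part of any library. The sketch runs the simple and the multiple regression, forms the t-ratio for VISITS in each, and reports R-squared alongside adjusted R-squared, which is one way to offset the advantage of the model with more regressors.

```python
import random


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    k = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(m)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        piv = a[c][c]
        a[c] = [v / piv for v in a[c]]
        for r in range(k):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[k:] for row in a]


def ols(y, columns):
    """OLS of y on an intercept plus the given regressor columns.

    Returns (coefficients, standard errors, R-squared, adjusted R-squared).
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
           for r in range(k)]
    xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]
    inv = invert(xtx)
    b = [sum(inv[r][c] * xty[c] for c in range(k)) for r in range(k)]
    resid = [y[i] - sum(X[i][r] * b[r] for r in range(k)) for i in range(n)]
    sse = sum(e * e for e in resid)
    ybar = sum(y) / n
    sst = sum((v - ybar) ** 2 for v in y)
    s2 = sse / (n - k)                      # estimated error variance
    se = [(s2 * inv[r][r]) ** 0.5 for r in range(k)]
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (sse / (n - k)) / (sst / (n - 1))
    return b, se, r2, adj_r2


# Synthetic data (assumed, NOT the colds.sha data): AGE drives both variables.
random.seed(143)
n = 500
age = [random.uniform(20, 80) for _ in range(n)]
visits = [0.5 * a + random.gauss(0, 5) for a in age]
colds = [1 + 0.05 * a + random.gauss(0, 1) for a in age]   # no VISITS term

b1, se1, r2_1, adj1 = ols(colds, [visits])        # like OLS COLDS VISITS
b2, se2, r2_2, adj2 = ols(colds, [visits, age])   # like OLS COLDS VISITS AGE

t_simple = b1[1] / se1[1]   # t-ratio for H0: VISITS coefficient = 0
t_multi = b2[1] / se2[1]

print("simple:   b_visits=%.4f  t=%.2f  R2=%.3f  adjR2=%.3f"
      % (b1[1], t_simple, r2_1, adj1))
print("multiple: b_visits=%.4f  t=%.2f  R2=%.3f  adjR2=%.3f"
      % (b2[1], t_multi, r2_2, adj2))
```

In this constructed example the simple regression attributes the effect of the omitted AGE variable to VISITS, because the two regressors are correlated. Whether the same pattern holds in the lab data is for you to check with the SHAZAM commands above.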

2. Omitted Variables Bias Example from Problem Set #4 (review problems; not to be handed in): Concept: Finding NO effect of one variable on another (or an effect that is counter to what you might expect) when there actually is an effect, but it is obscured by omitted variables. File needed: n:study.sha.

a.) Repeat the steps for the n:colds.sha example using the n:study.sha program and data. Here you are interested in knowing whether hours of studying (STUDY) has a statistically significant effect on midterm grades (MIDTERM). If you fail to control for "ability" approximated by prior GPA, you find some counter-intuitive results; an effect you might think should be there is not statistically significant--and the sign is even "wrong." When you do control for ability, something more reasonable appears.

b.) Be sure to examine the relationship between study hours and GPA for a clue to the source of the bias in the simple regression. Ensure that you can explain in words what accounts for the difference in results between the simple and the multiple regressions in this case.

c.) For your preferred specification, test the hypothesis that study time has no effect on midterm grade. What do you conclude, based on this specification? Now test the hypothesis that an extra hour of study time, on average, will produce a 5-point-higher midterm score. What does your model imply?

d.) Compare the "fit" of the simple and multiple regression models. Which model does a better job of explaining the observed variation across people in midterm scores? What about the fact that one model has an unfair advantage in that it uses more regressors? How do you control for this unfair advantage?

e.) Contemplate the interpretation of the intercept in this model. Is it meaningful? Why or why not?
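
The two hypothesis tests in part c.) and the source of the bias in part b.) can be sketched in Python as follows. Again the data are synthetic and every number in the data-generating lines is an assumption (the study.sha data are not used): students with lower GPA are made to study more, and the true effect of an hour of study is set to 5 points. Rather than inverting a matrix, this sketch obtains the multiple-regression coefficients by partialling out the other regressor (the Frisch-Waugh-Lovell result), and it checks the standard omitted-variables identity: the simple-regression slope equals the STUDY coefficient plus the GPA coefficient times the slope from regressing GPA on STUDY. The helper names mean, slope, and residuals are hypothetical.

```python
import random


def mean(v):
    return sum(v) / len(v)


def slope(y, x):
    """OLS slope from regressing y on x (with an intercept)."""
    xb, yb = mean(x), mean(y)
    sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
    sxx = sum((xi - xb) ** 2 for xi in x)
    return sxy / sxx


def residuals(y, x):
    """Residuals from regressing y on x (with an intercept)."""
    b = slope(y, x)
    a = mean(y) - b * mean(x)
    return [yi - a - b * xi for xi, yi in zip(x, y)]


# Synthetic data (assumed, NOT the study.sha data).
random.seed(4)
n = 400
gpa = [random.uniform(2.0, 4.0) for _ in range(n)]
study = [12 - 2.5 * g + random.gauss(0, 1) for g in gpa]   # weaker students study more
midterm = [10 + 5 * s + 20 * g + random.gauss(0, 5) for s, g in zip(study, gpa)]

# Simple regression: MIDTERM on STUDY only.
b_simple = slope(midterm, study)

# Multiple regression coefficients via partialling out the other regressor.
r_mid = residuals(midterm, gpa)
r_study = residuals(study, gpa)
b_study = slope(r_mid, r_study)
b_gpa = slope(residuals(midterm, study), residuals(gpa, study))

# Omitted-variables identity: simple slope = b_study + b_gpa * delta,
# where delta is the slope from regressing GPA on STUDY.
delta = slope(gpa, study)
implied_simple = b_study + b_gpa * delta

# Standard error of b_study; 3 parameters estimated (intercept, STUDY, GPA).
resid = [m - b_study * s for m, s in zip(r_mid, r_study)]
s2 = sum(e * e for e in resid) / (n - 3)
se_study = (s2 / sum(s * s for s in r_study)) ** 0.5

t_zero = b_study / se_study            # H0: STUDY coefficient = 0
t_five = (b_study - 5.0) / se_study    # H0: STUDY coefficient = 5

print("simple slope = %.3f (implied by identity: %.3f)" % (b_simple, implied_simple))
print("multiple: b_study = %.3f, b_gpa = %.3f, delta = %.3f" % (b_study, b_gpa, delta))
print("t for H0: b=0 -> %.2f ;  t for H0: b=5 -> %.2f" % (t_zero, t_five))
```

Each t-ratio is compared with the critical value for n - 3 degrees of freedom (roughly 1.96 for a two-sided 5% test in a sample this large). Note that the test of a coefficient equal to 5 only changes the numerator; the standard error is the same one used for the test against zero.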

3. (Time permitting) Exploring how to get fancy laser-printed plots that show both raw data and fitted values: GNUPLOT option on SHAZAM PLOT command with subsequent editing of the gnuplot command file.

a.) Consider a plot of the relationship between VISITS and AGE from the colds.sha example (as an illustration of the technique). Append to the end of your own version of colds.sha the following additional code:
ols visits age / predict=vhat
plot visits vhat age / gnu lineonly
plot visits vhat age / gnu lineonly commfile=cold.gnu &
       datafile=cold.dat
The first plot command lets you see a screen version of the graphics plot. It will have a straight line and a wiggly line. We are going to erase the wiggly line and leave just its "dots." This requires the second plot command and subsequent editing of the appropriate hidden file indicated in the cold.gnu output file generated by this code. The second plot command won't send output to the screen; instead, it will all go to files. Note (as before) that the names you specify for the commfile= and datafile= options cannot be more than 8 characters in total, including the "." and the extension, so if you are going to use informative extensions like those above (.gnu and .dat), the first part of the filenames should be no more than 4 characters long. Make a note of the name you selected for the commfile. Now type STOP to exit SHAZAM.
b.) Now select TED, because we are going to peek into the cold.gnu file. Note the name of the real gnuplot program that this file points to, exit TED, and then enter TED again to edit this hidden file. This is where you can change the title for the plot, and change the names of the variables if you like. In particular, you want to find the line of code for the VISITS variable (not VHAT) and delete the "w lines" part, so that VISITS is drawn as points rather than as a connected line. When you are happy with your modifications, save and exit TED.

c.) Now select GNUPLOT for Windows and tell the program the name of the file you want processed to create a plot for the laser printer. This will be the COLD.GNU file (or the hidden filename), if you have been proceeding as above.

d.) As before, you can print the plot on the laser printer by right-clicking on the plot after it appears on the screen, ensuring that the print options are to your liking, and then sending the print task to the printer.

Updated: October 23, 1997
Prepared by: Trudy Ann Cameron