Due: February 3, 1998
INSTRUCTIONS: This homework set is intended to consolidate in your mind what happens when you ask SHAZAM to run an OLS regression.
NETWORK FILES NEEDED: transfer.dat, doodad.dat
NOTE: If you have BruinOnline or other Web access, you should be able to get the contents of these files by going to http://www.sscnet.ucla.edu/98W/econ143-1. Select the link to Problem Sets, find Problem Set #3, and click on the name of the file you want. Or, you can go directly to the list of Data Sets for the course and look for the ones you need by name. You should then be able to save the contents of the file to disk, edit out the <html> commands at the top and the bottom so it is just plain ASCII data, and proceed with the homework. See the instructions in the SHAZAM orientation handout.
1. Do a very small "simple regression" problem by hand, using the basic computations necessary to arrive at $b_1$, $b_2$, and $s^2 = \frac{1}{n-2} \sum e_i^2$. Also calculate the standard errors of the point estimates for $b_1$ and $b_2$. (You might want to postpone the standard error calculations until after we have covered this in lecture.) You will probably want to mimic some of the steps displayed in Table 5.4.
Assume that your data are as follows:
Verify your results using the appropriate one-line OLS command in SHAZAM. (Note that with this small number of observations, you will probably find it easiest to embed the data directly within the SHAZAM commands, rather than reading the data from a separate file. The handout on how to run SHAZAM explains how to do this, using the READ statement with no filename given.) NOTE: Be sure to regress Y on X (i.e. OLS Y X), not vice versa.
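The hand computations in Problem 1 can be cross-checked with a short Python sketch. The data below are hypothetical placeholders (the problem's own data are not reproduced here), and the function name simple_ols is an assumption made for illustration. The formulas are the standard simple-regression ones.

```python
# Hypothetical data (NOT the data assigned in the problem).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

def simple_ols(x, y):
    """OLS of y on x with an intercept, computed step by step."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products in deviation form.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx                 # slope
    b1 = y_bar - b2 * x_bar        # intercept
    resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - 2)   # estimated error variance
    se_b2 = (s2 / sxx) ** 0.5
    se_b1 = (s2 * sum(xi ** 2 for xi in x) / (n * sxx)) ** 0.5
    return {"b1": b1, "b2": b2, "s2": s2, "se_b1": se_b1, "se_b2": se_b2}

for name, value in simple_ols(X, Y).items():
    print(name, round(value, 4))
```

Running OLS Y X in SHAZAM on the same numbers should give matching values, up to rounding.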
2. Determine whether the following models are linear in the parameters, or the variables, or both. Which of these models can be estimated as linear regression models (possibly after transformation of the data)?
a.) $Y_i = B_1 + B_2 (1/X_i) + u_i$
b.) $Y_i = B_1 + B_2 \log(X_i) + u_i$
c.) $Y_i = B_1 X_i^{B_2} e^{u_i}$ (the exponent on e is $u_i$)
Note: ln and log are used interchangeably to signify natural logarithms (log to the base e). Base 10 logarithms are almost never used in econometrics.
4. Two special cases. Note: these exercises are easiest if you mimic the algebra covered in the two class handouts on OLS estimator formulas and derivation of the variances of these estimators. Just zero out the parameters that are not relevant.
a.) $Y_i = B_2 X_i + u_i$
In this model, the intercept term is absent (perhaps some theory tells us it should be exactly zero--that when X is zero, Y must also be zero). The model is therefore known as "regression through the origin." SHAZAM can estimate such a model by using the command OLS Y X / NOCONSTANT. For this model, show that:
i.) $b_2 = \sum X_i Y_i / \sum X_i^2$
ii.) $\mathrm{Var}(b_2) = \sigma^2 / \sum X_i^2$ (This question should be considered optional if we do not get to the discussion of the variance of a regression slope estimator before the problem set is due.)
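A numerical check can complement the algebra for the slope formula in regression through the origin. The sketch below uses hypothetical data, and the helper names are assumptions made for illustration. It computes the slope as the ratio of the sum of X times Y to the sum of X squared, then checks that the residuals are orthogonal to X and that nearby slopes give a larger sum of squared residuals. This is not a substitute for the derivation the problem asks for.

```python
# Hypothetical data, chosen only for illustration.
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.1, 3.9, 6.2, 7.8]

def slope_through_origin(x, y):
    """Slope estimate for the no-intercept model: sum(Xi*Yi) / sum(Xi^2)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

def ssr(b, x, y):
    """Sum of squared residuals for a candidate slope b (no intercept)."""
    return sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))

b2 = slope_through_origin(X, Y)

# First-order condition: residuals are orthogonal to X at the minimum.
moment = sum(xi * (yi - b2 * xi) for xi, yi in zip(X, Y))
print("b2 =", b2, " sum of Xi*ei =", moment)

# Slopes slightly below or above b2 fit worse.
print(ssr(b2, X, Y), ssr(b2 - 0.05, X, Y), ssr(b2 + 0.05, X, Y))
```

The same slope should appear on SHAZAM output from OLS Y X / NOCONSTANT when run on these numbers.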
b.) What happens if your population regression function (PRF) assumes the following form:
5. I have sent to the network (and posted on the web) a copy of the data file transfer.dat. Imagine that this file contains data on government transfer payments to families (transfer) and family expenditures on children (childexp). Look at the contents of n:transfer.dat using TED. Create your own SHAZAM command program using TED to accomplish the following tasks.
b.) Using all the data provided, estimate the parameters in a linear regression of "monthly expenditures on family's children" (childexp) on "monthly receipts of transfer payments" (transfer) and obtain the coefficient of determination (r-squared value) for the model. What does this coefficient imply in a simple regression model? (Text pp. 160-164)
c.) Does this model suggest that
(ii.) on average, families spend positive amounts on their children, even if they receive no transfer payments?
The answers to these questions concern the slope coefficient and intercept coefficient in the regression. (It is helpful to think about the verbal definition of the slope and intercept in any regression model. The slope is the "change in Y for a one-unit change in X." The intercept is the "expected value of Y when X is zero.")
d.) A little harder: Does this model suggest that, on average, for an additional dollar of transfer payments, these families tend to spend all of that additional dollar on the family's children? The answer to this also concerns the slope coefficient in the regression.
e.) Plot the data in a scattergram. Examine the plot carefully. Are there any points that are likely to be "influential" in the fitting of a regression line (such points are called "outliers")? Explain. From a simple plot of childexp against transfer, identify a range of values for, say, transfers, that includes ONLY the offending observations. Say this range includes values of 800 or more. Use the SKIPIF command to force SHAZAM to leave these observations out of subsequent calculations. The format will be skipif(transfer.ge.800). The ".ge." is the way SHAZAM compares variable values to some benchmark; here it means "greater than or equal to." Correspondingly, you could use .le., .gt., .lt., .eq., or .ne. for "less than or equal to," "greater than," "less than," "equal to," or "not equal to." Re-run the above tasks on this reduced data set. What happens to your results from the above regressions?
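The effect of skipping observations can be previewed outside SHAZAM with a small Python sketch. The numbers below are hypothetical (they are not the contents of transfer.dat), the helper name fit is an assumption, and the list filter is only an analog of skipif(transfer.ge.800). The sketch prints the intercept, slope, and r-squared with and without the excluded observations so the two fits can be compared.

```python
# Hypothetical data (NOT the contents of transfer.dat).
transfer = [100.0, 200.0, 300.0, 400.0, 500.0, 900.0]
childexp = [150.0, 180.0, 260.0, 290.0, 340.0, 200.0]

def fit(x, y):
    """Return (intercept, slope, r_squared) for OLS of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1 - rss / tss

# Analog of skipif(transfer.ge.800): keep only rows with transfer < 800.
kept = [(t, c) for t, c in zip(transfer, childexp) if not t >= 800]
kept_transfer = [t for t, _ in kept]
kept_childexp = [c for _, c in kept]

full = fit(transfer, childexp)
reduced = fit(kept_transfer, kept_childexp)
print("all observations:", full)
print("transfer < 800  :", reduced)
```

Comparing the two printed lines shows how much a single extreme observation can move the fitted line in this made-up example.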
b.) Change of scale: Now measure units in dozens (i.e. GENR QD=Q/12), re-estimate the model, and identify which quantities of interest on the regression output have changed and which have not. Why? What happens to the product (slope coefficient times variable) when you change the scale of measurement of an explanatory variable?
c.) Change of origin: Go back to the original quantity measure, Q, but now measure MC in "dollars in excess of $100." (i.e. GENR MC100=MC-100.) Which quantities are now different from the original model, which aren't, and why?
d.) Part (b.) represented a 'change of scale,' while part (c.) was a 'change of origin.' A special combination of a change of scale and a change of origin is called "standardization." Variable-by-variable, one first subtracts the mean and then divides by the standard deviation. A regression of standardized MC on standardized Q is interesting in that the slope coefficient(s) tell the number of standard deviations by which MC will change when Q changes by one standard deviation. When we begin considering models with more than one explanatory variable, this will be a useful way to compare the relative influence of different explanatory variables on the dependent variable. The units of the different explanatory variables will not matter. (Why?)
SHAZAM produces the coefficients for this "standardized" regression automatically on every run. Locate them on your output. How do these coefficients change between (a.), (b.), and (c.) above? Can you visualize why using a graph? Optional: Can you produce them explicitly by generating the standardized variables directly and regressing them? Try it. (HINT: You can get the means and the standard deviations using the "STAT MC Q / MEAN=mvars STDEV=svars" command. The mean of the first variable, MC, can then be referred to as mvars:1 and its standard deviation as svars:1; likewise, the mean of Q will be mvars:2 and the standard deviation of Q will be svars:2.)
e.) Optional: Reflect upon the validity of fitting a straight line to these data. Think back to Economics 1. What does economic theory have to say about the shape of an MC curve? What does a plot of MC against quantity suggest about the shape of the MC curve?
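The change of scale, change of origin, and standardization described in parts (b.) through (d.) can be tried out in a Python sketch before running them in SHAZAM. The Q and MC values below are hypothetical (they are not the contents of doodad.dat), and the helper names fit and standardize are assumptions. The list operations only mimic GENR QD=Q/12 and GENR MC100=MC-100. The sketch prints the intercept, slope, and r-squared for each version so they can be compared side by side.

```python
from statistics import mean, stdev

# Hypothetical quantity (Q) and marginal cost (MC) data, for illustration only.
Q = [12.0, 24.0, 36.0, 48.0, 60.0]
MC = [130.0, 118.0, 115.0, 121.0, 140.0]

def fit(x, y):
    """Return (intercept, slope, r_squared) for OLS of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1 - rss / tss

def standardize(v):
    """Subtract the mean, then divide by the sample standard deviation."""
    m, s = mean(v), stdev(v)
    return [(vi - m) / s for vi in v]

base = fit(Q, MC)                                    # original units
scaled = fit([q / 12 for q in Q], MC)                # Q measured in dozens
shifted = fit(Q, [m - 100 for m in MC])              # MC in dollars above 100
standardized = fit(standardize(Q), standardize(MC))  # both standardized

for label, result in [("original", base), ("Q in dozens", scaled),
                      ("MC minus 100", shifted), ("standardized", standardized)]:
    print(label, [round(value, 4) for value in result])
```

The standardized slope printed on the last line can be compared with the standardized coefficient that SHAZAM reports on its regular output.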