UNIVERSITY OF CALIFORNIA, LOS ANGELES
Department of Economics

Economics 143 (Cameron) - Applied Regression Analysis

Problem Set #3: Simple Regression; Estimation

Due: October 22, 1998

INSTRUCTIONS: This homework set is intended to consolidate in your mind what happens when you ask SHAZAM to run an OLS regression.

NETWORK FILES NEEDED: transfer.dat, doodad.dat. NOTE: If you have BruinOnline or other Web access, you should be able to get the contents of these files by going to http://www.sscnet.ucla.edu/98W/econ143-1. Select the link to Problem Sets, find Problem Set #3, and once inside, click on the name of the file you want. Or, you can go directly to the list of Data Sets for the course and look for the ones you need by name. You should then be able to highlight, copy, and save to a new file in SHAZAM the relevant contents of the file of interest, and proceed with the homework. Be sure to get the file extensions right (either *.sha or *.dat, accordingly). See the instructions in the SHAZAM for WINDOWS orientation handout.

1. Do a very small "simple regression" problem by hand, using the computations necessary to arrive at $b_1$, $b_2$, and $s^2 = \frac{1}{n-2} \sum e_i^2$. Also calculate the standard errors of the point estimates for $b_1$ and $b_2$. (You might want to postpone the standard error calculations if we do not get to this in time during the lectures.) You will probably want to mimic some of the steps displayed in the handout entitled "Table 5.4" which shows the kinds of step-by-step calculations that are now done much more efficiently by computers than by people. Assume that your data are as follows:

Y    6    5    1    0
X    1    2    3    4

Verify your results using the appropriate one-line OLS command in SHAZAM. (Note that with this small number of observations, you will probably find it easiest to embed the data directly within the SHAZAM commands, rather than reading the data from a separate file. See the handout on how to run SHAZAM for Windows for how to do this: use the READ statement with no filename given.) NOTE: Be sure to regress Y on X (i.e. OLS Y X), not vice versa.
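
If you would like a second cross-check on your hand arithmetic besides SHAZAM, the same steps can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the required SHAZAM work; the variable names (s_xy, s_xx, se_b1, se_b2) are chosen here for illustration, and the standard error formulas are the usual textbook ones for the two-variable model.

```python
import math

# Data from the problem statement.
Y = [6, 5, 1, 0]
X = [1, 2, 3, 4]
n = len(Y)

x_bar = sum(X) / n
y_bar = sum(Y) / n

# Sums of squares and cross-products in deviation-from-mean form.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
s_xx = sum((x - x_bar) ** 2 for x in X)

b2 = s_xy / s_xx          # slope estimate
b1 = y_bar - b2 * x_bar   # intercept estimate

# Residuals and the error variance estimate s^2 = sum(e_i^2) / (n - 2).
residuals = [y - (b1 + b2 * x) for x, y in zip(X, Y)]
s2 = sum(e ** 2 for e in residuals) / (n - 2)

# Standard errors of the two point estimates.
se_b2 = math.sqrt(s2 / s_xx)
se_b1 = math.sqrt(s2 * sum(x ** 2 for x in X) / (n * s_xx))

print(b1, b2, s2, se_b1, se_b2)
```

Each intermediate quantity here corresponds to a column total you would form in a hand-calculation table, so a mismatch can be traced to a specific step.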

2. Determine whether the following models are linear in the parameters, or the variables, or both. Which of these models can be estimated as linear regression models (possibly after transformation of the data)?

a.) $Y_i = B_1 + B_2 (1/X_i) + u_i$

b.) $Y_i = B_1 + B_2 \log(X_i) + u_i$

c.) $Y_i = B_1 X_i^{B_2} e^{u_i}$

Note: ln and log are used interchangeably to signify natural logarithms (log to the base e). Base 10 logarithms are almost never used in econometrics.
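
As a reminder of what "transformation of the data" can look like, here is a sketch of taking natural logs of both sides of a multiplicative model of the kind in part (c.). It assumes all $Y_i$, $X_i$, and $B_1$ are positive so that the logs exist. Deciding what this implies for linearity in the parameters and in the variables is left to you.

```latex
\ln Y_i = \ln\left( B_1 X_i^{B_2} e^{u_i} \right)
        = \ln B_1 + B_2 \ln X_i + u_i
```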

4. Two special cases. Note: these exercises are easiest if you mimic the algebra covered in the two class handouts on OLS estimator formulas and derivation of the variances of these estimators. Just zero-out the parameters that are not relevant.

a.) There are occasions when the two-variable population regression function (PRF) assumes the following form:

$Y_i = B_2 X_i + u_i$

In this model, the intercept term is absent (perhaps some theory tells us it should be exactly zero--that when X is zero, Y must also be zero). The model is therefore known as "regression through the origin." SHAZAM can estimate such a model by using the command OLS Y X / NOCONSTANT. For this model, show that:

i.) $b_2 = \sum X_i Y_i / \sum X_i^2$

ii.) $\mathrm{Var}(b_2) = \sigma^2 / \sum X_i^2$      (This question should be considered optional if we do not get to the discussion of the variance of a regression slope estimator before the problem set is due.)
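
A numerical check is no substitute for the algebra, but it can confirm that you are aiming at the right formula. The sketch below, in Python, reuses the four observations from Problem 1 purely as illustration data. It computes the slope from the formula in (i.) and checks two properties a least-squares solution should have. The helper name sse is hypothetical.

```python
# Illustration only: the four (X, Y) pairs from Problem 1.
Y = [6, 5, 1, 0]
X = [1, 2, 3, 4]

def sse(b):
    """Sum of squared residuals for the no-intercept line Y = b * X."""
    return sum((y - b * x) ** 2 for x, y in zip(X, Y))

# Slope from the formula to be shown in part (i.).
b2 = sum(x * y for x, y in zip(X, Y)) / sum(x ** 2 for x in X)

# At the least-squares slope the residuals are orthogonal to X,
# so this sum should be zero up to rounding error.
foc = sum(x * (y - b2 * x) for x, y in zip(X, Y))

# Moving the slope slightly in either direction should raise the SSE.
is_minimum = sse(b2) < sse(b2 + 0.01) and sse(b2) < sse(b2 - 0.01)

print(b2, foc, is_minimum)
```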

b.) What happens if your population regression function (PRF) assumes the following form?

$Y_i = B_1 + u_i$

The result for this specification is what you would find if you issued to SHAZAM the command OLS Y with no explanatory variables at all (but, of course, with some data on Y in the computer's memory). Compare this with the results of STAT Y.
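
The comparison with STAT Y can also be previewed numerically. The following Python sketch is only an illustration, again borrowing the Y values from Problem 1. It assumes the usual degrees-of-freedom convention of n minus the number of estimated parameters, which is n - 1 here, and the helper name sse is hypothetical.

```python
import math
import statistics

# Illustration only: the Y values from Problem 1.
Y = [6, 5, 1, 0]
n = len(Y)

def sse(b):
    """Sum of squared residuals when every Y_i is predicted by the constant b."""
    return sum((y - b) ** 2 for y in Y)

b1 = statistics.mean(Y)     # candidate least-squares estimate of B1
s2 = sse(b1) / (n - 1)      # one parameter estimated, so n - 1 degrees of freedom
se_b1 = math.sqrt(s2 / n)   # standard error of the estimated constant

# The sample mean should give a smaller SSE than nearby constants,
# and s2 should coincide with the ordinary sample variance of Y.
is_minimum = sse(b1) < sse(b1 + 0.1) and sse(b1) < sse(b1 - 0.1)
matches_variance = abs(s2 - statistics.variance(Y)) < 1e-12

print(b1, s2, se_b1, is_minimum, matches_variance)
```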
 

5. I have sent to the network (and posted on the web) a copy of the data file transfer.dat. Imagine that this file contains data on government transfer payments to families (transfer) and family expenditures on children (childexp). Look at the contents of n:transfer.dat. Create your own SHAZAM command file by opening a new file and entering appropriate commands to accomplish the following tasks.

a.) Read in the data using:
    sample 1 100
    read(n:transfer.dat) transfer childexp
Remember, if you have copied the file n:transfer.dat from the network to your own diskette, which resides, say, in your a: drive, you would refer to the file as a:transfer.dat.

b.) Using all the data provided, estimate the parameters in a linear regression of "monthly expenditures on family's children" (childexp) on "monthly receipts of transfer payments" (transfer). What are these parameter estimates and what are their standard errors?

c.) Does this model suggest that

(i.) on average, for each additional dollar of transfer payments, these families will spend some of that dollar on their children?

(ii.) on average, families spend positive amounts on their children, even if they receive no transfer payments?

The answers to these questions concern the slope coefficient and intercept coefficient in the regression. (It is helpful to think about the verbal definition of the slope and intercept in any regression model. The slope is the "change in Y for a one-unit change in X." The intercept is the "expected value of Y when X is zero.")

d.) A little harder: Does this model suggest that, on average, for an additional dollar of transfer payments, these families tend to spend all of that additional dollar on the family's children? The answer to this also concerns the slope coefficient in the regression.

e.) Plot the data in a scattergram. Examine the plot carefully. Are there any points that are likely to be "influential" in the fitting of a regression line (called "outliers")? Explain. From a simple plot of childexp against transfer, identify a range of values for, say, transfer, that includes ONLY the offending observations. Say this range includes values of 800 or more. Use the SKIPIF command to force SHAZAM to leave these observations out of subsequent calculations. The format will be skipif(transfer.ge.800). The ".ge." is the way SHAZAM compares variable values to some benchmark; it means "greater than or equal to." Correspondingly, you could use .le., .gt., .lt., .eq., or .ne. for "less than or equal to," "greater than," "less than," "equal to," or "not equal to." Re-run the above tasks on this reduced data set. What happens to your results from the above regressions?
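
To see why a single extreme observation can matter so much, here is a small Python sketch of the same idea as skipif(transfer.ge.800). The five data points are made up for illustration and are NOT the contents of transfer.dat; the helper simple_ols is likewise hypothetical.

```python
def simple_ols(x, y):
    """Return (intercept, slope) from an OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Made-up illustration data: four points on a line plus one extreme point.
transfer = [100, 200, 300, 400, 900]
childexp = [60, 110, 160, 210, 100]

full_b1, full_b2 = simple_ols(transfer, childexp)

# Analogue of skipif(transfer.ge.800): drop rows where transfer >= 800.
kept = [(t, c) for t, c in zip(transfer, childexp) if not t >= 800]
kept_transfer = [t for t, _ in kept]
kept_childexp = [c for _, c in kept]

reduced_b1, reduced_b2 = simple_ols(kept_transfer, kept_childexp)

print(full_b1, full_b2)
print(reduced_b1, reduced_b2)
```

In this made-up example the one extreme point pulls the fitted slope far away from the slope implied by the other four observations. Whether anything similar happens in the actual data is for you to find out.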


7. In this problem, you will explore the consequences of 'changes in scale' and 'changes in origin' in the measurement of either the dependent or the explanatory variable. Imagine that you have been supplied with six observations on the marginal costs incurred by the Acme Doodad company for the production of one additional doodad. Marginal costs (MC) depend crucially on the level of output (Q) at which the company is producing. The data are available on the network as the file n:doodad.dat (or you could type these data into your program or into your own data file).
MC ($)   117   111   109   114   126   131
Q (#)     94   106   118   130   142   154
a.) Using OLS MC Q, estimate a linear marginal cost "curve" for this firm using SHAZAM. Be sure to give the units associated with each coefficient on your annotated computer output.

b.) Change of scale: Now measure output in dozens (i.e. GENR QD=Q/12) and re-estimate the model. Identify which quantities of interest on the regression output have changed and which have not. Why? What happens to the product (slope coefficient times variable) when you change the scale of measurement of an explanatory variable?

c.) Change of origin: Go back to the original quantity measure, Q, but now measure MC in "dollars in excess of $100." (i.e. GENR MC100=MC-100.) Which quantities are now different from the original model, which aren't, and why?
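
The two transformations in parts (b.) and (c.) can also be tried outside SHAZAM. The Python sketch below uses the six doodad observations given above and mirrors GENR QD=Q/12 and GENR MC100=MC-100. It reports only the intercept and slope (not standard errors or other output), and the helper simple_ols is a hypothetical name introduced for illustration.

```python
def simple_ols(x, y):
    """Return (intercept, slope) from an OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Data from the problem statement.
MC = [117, 111, 109, 114, 126, 131]
Q = [94, 106, 118, 130, 142, 154]

# Part (a.): original units.
b1, b2 = simple_ols(Q, MC)

# Part (b.): change of scale, output measured in dozens.
QD = [q / 12 for q in Q]
b1_scale, b2_scale = simple_ols(QD, MC)

# Part (c.): change of origin, cost measured in dollars in excess of $100.
MC100 = [mc - 100 for mc in MC]
b1_origin, b2_origin = simple_ols(Q, MC100)

print("original:", b1, b2)
print("Q in dozens:", b1_scale, b2_scale)
print("MC minus 100:", b1_origin, b2_origin)
```

Comparing the three printed lines side by side is a quick way to form a conjecture before you explain the pattern in words.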

d.) Part (b.) represented a 'change of scale,' while part (c.) was a 'change of origin.' A special combination of a change of scale and a change of origin is called "standardization." Variable-by-variable, one first subtracts the variable's mean and then divides by the variable's standard deviation. A regression of standardized MC on standardized Q is interesting in that the slope coefficient(s) tell the number of standard deviations by which MC will change when Q changes by one standard deviation. When we begin considering models with more than one explanatory variable, this will be a useful way to compare the relative influence of different explanatory variables on the dependent variable. The units of the different explanatory variables will not matter. (Why?)

SHAZAM produces the coefficients for this "standardized" regression automatically on every run. Locate them on your output. How do these coefficients change between (a.), (b.), and (c.) above? Can you visualize why using a graph? Optional: Can you produce them explicitly by generating the standardized variables directly and regressing them? Try it. (HINT: You can get the means and the standard deviations using the "STAT MC Q / MEAN=mvars STDEV=svars" command. The mean of the first variable, MC, can then be referred to as mvars:1 and its standard deviation as svars:1; likewise, the mean of Q will be mvars:2 and the standard deviation of Q will be svars:2.)
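
For the optional part, the standardization steps can be sketched in Python as follows. This is an illustration under stated assumptions: it uses the sample standard deviation (n - 1 divisor), and the helpers simple_ols and standardize are hypothetical names. Compare the slope it produces with the standardized coefficient on your own SHAZAM output rather than taking this sketch as the definition SHAZAM uses.

```python
import statistics

def simple_ols(x, y):
    """Return (intercept, slope) from an OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

def standardize(v):
    """Subtract the mean, then divide by the sample standard deviation."""
    m = statistics.mean(v)
    s = statistics.stdev(v)
    return [(vi - m) / s for vi in v]

# Data from the problem statement.
MC = [117, 111, 109, 114, 126, 131]
Q = [94, 106, 118, 130, 142, 154]

# Regression in original units, then in standardized units.
b1, b2 = simple_ols(Q, MC)
a_std, beta_std = simple_ols(standardize(Q), standardize(MC))

# The same standardized slope obtained by rescaling the original slope.
implied = b2 * statistics.stdev(Q) / statistics.stdev(MC)

print(a_std, beta_std, implied)
```

Because both variables are centered before the regression, the fitted intercept in standardized units should come out as zero apart from rounding error.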

e.) Optional: Reflect upon the validity of fitting a straight line to these data. Think back to Economics 1. What does economic theory have to say about the shape of a MC curve? What does a plot of MC against quantity suggest about the shape of the MC curve?



Updated: 12:04 PM 10/14/98; Prepared by: Trudy Ann Cameron