UNIVERSITY OF CALIFORNIA, LOS ANGELES
Department of Economics
Fall 1997 Cameron
Economics 143 - Final Examination - Outline of Solutions in Red


INSTRUCTIONS: Answer all questions in the space provided (or indicate clearly where you have continued your answer on the back of the page). Calculators are NOT permitted. Reduce all computations to the simplest form so that anyone with a calculator could obtain the answer easily. Show your work and reasoning to the fullest extent possible so that part marks can be assigned as warranted. You have three hours to complete this exam. There are 25 questions (or sections) worth 5 points each. Total points = 125. Budget your time carefully. Exhibit pages should not be turned in with your exam. Remember: answer questions in a manner that reflects the econometric reasoning you have learned in this course.

1. The following questions refer to the computer output in EXHIBIT 1. These are hypothetical demand data concerning a cross-sectional sample of 50 communities wherein a firm markets its output. The dependent variable is Q = number of units sold per month in each market; the available explanatory variables are P = price charged per unit in that market, and INC = median family income in thousands of dollars per year in that market.

a.) Taking into account the information in STATISTICS 1, if we neglected to include income in a model to explain quantity demanded, what would you expect to happen to the slope coefficient on P, compared to the results that appear in REGRESSION 1A? Explain.

In these data, income and price have a correlation of only -0.06986. Thus the price coefficient will NOT be biased very much if income is omitted from the model. [Extra point allocated if anybody works out the direction of this small bias, but you would have to assume normality of the good (price coefficient thus more negative) or inferiority of the good (price coefficient thus less negative).]
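You can check the direction of the bias with the textbook omitted-variable-bias formula: plim(short-regression slope on P) = b_P + b_INC * cov(P, INC)/var(P). The minimal Python sketch below applies that formula and then confirms it with a small simulation. The coefficients, standard deviations, and the correlation of -0.5 used in the simulation are hypothetical. They are chosen larger than the -0.06986 in STATISTICS 1 so that the bias is easy to see, and they are not estimates from EXHIBIT 1.

```python
import math
import random


def ovb_bias(b_inc, corr_p_inc, sd_p, sd_inc):
    """Bias in the P slope when INC is omitted: b_INC * cov(P, INC) / var(P)."""
    cov = corr_p_inc * sd_p * sd_inc
    return b_inc * cov / sd_p ** 2


def simple_slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    xbar = sum(x) / len(x)
    ybar = sum(y) / len(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


# With the negative correlation reported in STATISTICS 1, the bias has the
# opposite sign to the income coefficient.
print(ovb_bias(1.0, -0.06986, 1.0, 1.0))    # normal good: negative bias
print(ovb_bias(-1.0, -0.06986, 1.0, 1.0))   # inferior good: positive bias

# Simulation with hypothetical parameters (a normal good, b_INC > 0).
random.seed(143)
n, b_p, b_inc, rho, sd_p, sd_inc = 20000, -3.0, 0.5, -0.5, 2.0, 10.0
p, q = [], []
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    pi = 10 + sd_p * z1
    inci = 40 + sd_inc * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    p.append(pi)
    q.append(100 + b_p * pi + b_inc * inci + random.gauss(0, 1))

short = simple_slope(p, q)                             # Q regressed on P alone
predicted = b_p + ovb_bias(b_inc, rho, sd_p, sd_inc)   # -3 + (-1.25) = -4.25
print(short, predicted)
```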

b.) Comment on the theoretical plausibility of the parameter point estimates in REGRESSION 1A. What is implied by these estimates about the nature of demand for this good?

Price is usually expected to have a negative effect on quantity demanded. In this model, however, so does income. For most goods, we expect quantity demanded to increase when income increases, so the negative sign on the slope with respect to INC is a little surprising. [Aside: suggests the good is "inferior."]

c.) In REGRESSION 1B, you attempt a more general specification. Based on the estimated parameters, what can you now infer about the nature of the relationship between quantity demanded and community incomes, controlling for variations in price? (Recall that goods for which an increase in income leads to an increase in quantity demanded are called "normal," whereas goods for which an increase in income leads to a decrease in quantity demanded are called "inferior.")

The coefficient on the quadratic term in income is statistically significant at the 5% level, so the relationship is as depicted. The good is "normal" for incomes below 37.8 and "inferior" for incomes above 37.8 (37.8 is INSIDE the range of INC, which is 21 to 69 thousand dollars). [Aside: this model is preferred to Regression 1A because the adjusted R2 value is higher.]
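The income turning point comes from setting dQ/dINC = b_INC + 2*b_INC2*INC to zero. The Python sketch below uses hypothetical coefficients, picked only so that the turning point equals 37.8. The actual REGRESSION 1B estimates would replace them. With b_INC2 negative, the marginal effect of income is positive below the turning point (normal) and negative above it (inferior).

```python
def turning_point(b_inc, b_inc2):
    """INC value at which dQ/dINC = b_inc + 2*b_inc2*INC equals zero."""
    return -b_inc / (2.0 * b_inc2)


def income_effect(b_inc, b_inc2, inc):
    """Marginal effect of income on quantity demanded at a given INC."""
    return b_inc + 2.0 * b_inc2 * inc


# Hypothetical coefficients (not from EXHIBIT 1), chosen so the turning point is 37.8.
b_inc, b_inc2 = 0.756, -0.01
inc_min, inc_max = 21.0, 69.0   # range of INC reported for this sample

tp = turning_point(b_inc, b_inc2)
print(tp, inc_min <= tp <= inc_max)              # turning point inside the data
print(income_effect(b_inc, b_inc2, 30.0) > 0)    # "normal" below the turning point
print(income_effect(b_inc, b_inc2, 50.0) < 0)    # "inferior" above it
```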

d.) At the means of the data (according to REGRESSION 1B), is demand price-elastic, or price-inelastic? Explain.

The elasticity at the means of q and p is -0.3891. This implies inelastic demand, since its absolute value is less than one: a 1% increase in price leads to only about a 0.39% decrease in quantity demanded.
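P enters REGRESSION 1B linearly, so the point elasticity at the sample means is the price slope times mean(P)/mean(Q). The sketch below uses a hypothetical slope and hypothetical means. In practice, they would come from REGRESSION 1B and STATISTICS 1.

```python
def elasticity_at_means(b_p, p_mean, q_mean):
    """Point price elasticity for a model that is linear in P: b_P * Pbar / Qbar."""
    return b_p * p_mean / q_mean


def classify(elasticity):
    """|e| < 1 is inelastic, |e| > 1 is elastic, |e| = 1 is unit elastic."""
    if abs(elasticity) < 1:
        return "inelastic"
    if abs(elasticity) > 1:
        return "elastic"
    return "unit elastic"


# Hypothetical inputs, not the EXHIBIT 1 values.
e = elasticity_at_means(b_p=-2.0, p_mean=5.0, q_mean=25.0)
print(e, classify(e))         # -0.4 inelastic
print(classify(-0.3891))      # the REGRESSION 1B value: inelastic
```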

e.) Since the point estimate of the coefficient on INC in REGRESSION 1B is not statistically significantly different from zero at the 5% level, it would be all right to drop this variable from the model. True, False, Uncertain? Explain.

Leaving out the linear term in a quadratic specification forces the maximum (or minimum) of the quadratic shape to occur at INC=0. This would be undesirable. [As we include higher powers of a variable, we generally retain the lower powers.]

f.) What does REGRESSION 1C tell you about the properties of the estimates produced by REGRESSION 1B?

We use the squared OLS residuals as a proxy for the true but unknown population regression error variance (sigma-squared). This regression shows that the conditional error variance is inversely related to the magnitude of P. Implications: we have heteroscedasticity. The OLS parameter point estimates are still unbiased, but their standard errors are calculated incorrectly, so the usual t- and F-test inferences are invalid.
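A simulation can illustrate the logic of REGRESSION 1C. It generates data whose error variance falls as P rises, fits OLS, and then regresses the squared residuals on P. This Python sketch assumes hypothetical parameters (error variance proportional to 1/P) and a one-regressor model, not the full REGRESSION 1B specification.

```python
import random


def ols(x, y):
    """Return (intercept, slope) from OLS of y on x."""
    xbar = sum(x) / len(x)
    ybar = sum(y) / len(y)
    slope = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
             / sum((a - xbar) ** 2 for a in x))
    return ybar - slope * xbar, slope


random.seed(1997)
n = 5000
p = [random.uniform(1.0, 10.0) for _ in range(n)]
# Hypothetical error standard deviation 3/sqrt(P), so Var(e | P) = 9/P.
q = [50.0 - 2.0 * pi + random.gauss(0.0, 3.0 / pi ** 0.5) for pi in p]

a, b = ols(p, q)
resid_sq = [(qi - a - b * pi) ** 2 for pi, qi in zip(p, q)]

# Auxiliary regression: squared OLS residuals on P.
aux_a, aux_b = ols(p, resid_sq)
print(b, aux_b)   # slope near -2 (still unbiased); auxiliary slope negative
```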

g.) Compare the results of REGRESSION 1B with those of REGRESSION 1D. Which specification would you prefer, and why? Does REGRESSION 1E have any bearing on your choice?

Preferred regression: Regression 1D

Since we have included a loglog option on the regression command in 1D, we can compare the maximized log-likelihoods. FOR THIS SAMPLE, 1D has a higher value (-178.531) than does 1B (-178.803), so the log-log model is preferred, if only slightly. Regression 1E reveals that logging the dependent variable does NOT eliminate the heteroscedasticity in this case, even though it sometimes does.
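The reason the two log-likelihoods can be compared is a change of variables. If log Q = Xb + e with normal errors, the density of Q itself picks up the Jacobian 1/Q. The log-likelihood expressed in terms of Q is therefore the log-likelihood for log Q minus the sum of log Q. The Python sketch below shows this standard adjustment. Whether SHAZAM's LOGLOG option reports exactly this quantity is an assumption that you should check against its manual, and the input numbers are hypothetical.

```python
import math


def gaussian_max_loglik(sse, n):
    """Maximized log-likelihood of a linear model with normal errors, given SSE."""
    return -0.5 * n * (math.log(2.0 * math.pi) + math.log(sse / n) + 1.0)


def loglik_in_levels(loglik_logq, q):
    """Convert a log-likelihood for log(Q) into one for Q (Jacobian term 1/Q)."""
    return loglik_logq - sum(math.log(qi) for qi in q)


# Hypothetical example: the same five observations of Q under both models.
q = [12.0, 15.0, 9.0, 20.0, 14.0]
ll_linear = gaussian_max_loglik(sse=30.0, n=len(q))    # model for Q
ll_log = gaussian_max_loglik(sse=0.12, n=len(q))       # model for log(Q)
ll_log_levels = loglik_in_levels(ll_log, q)            # comparable to ll_linear
print(ll_linear, ll_log_levels)
```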

h.) REGRESSION 1D suggests that neither income term makes a statistically significant contribution to explaining the level of demand, implying that we could leave income out of the model without much compromise of its explanatory power. True, False, Uncertain? Explain.

Following this regression is an F-test of the hypothesis that the coefficients on "linc" and "linc2" are jointly zero. Since the P-value for this F-test is less than 0.05, we reject this hypothesis; the income terms DO belong in the model.
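The joint F-test compares the sums of squared errors with and without the restrictions. In the sketch below, the SSE values are hypothetical. The choice k = 4 assumes REGRESSION 1D contains a constant, log price, linc, and linc2.

```python
def f_statistic(sse_restricted, sse_unrestricted, q, n, k):
    """F = [(SSE_R - SSE_U)/q] / [SSE_U/(n - k)] for q linear restrictions."""
    return ((sse_restricted - sse_unrestricted) / q) / (sse_unrestricted / (n - k))


# Hypothetical SSEs; n = 50 communities, q = 2 restrictions (linc = linc2 = 0).
F = f_statistic(sse_restricted=12.0, sse_unrestricted=10.0, q=2, n=50, k=4)
print(F)  # compare with the F(2, 46) 5% critical value, roughly 3.2
```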

i.) Based on the results of either REGRESSION 1C or REGRESSION 1E, choose among REGRESSIONS 1F through 1K for the next logical step in your analysis. Explain your choice.

Preferred regression: Regression 1J

(If you chose wrong in part (g.), we gave credit for choosing 1G instead.) Since we prefer Regression 1D, Regression 1E is the relevant auxiliary regression, and it shows that the squared error terms vary inversely with LP. Weighted least squares should put more weight on observations with smaller error variance, so we want the weights to be bigger when LP is bigger. Therefore, wt5=LP is probably the best choice among those available.
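Weighted least squares minimizes the sum of w_i times the squared residuals. Observations with larger weights therefore count more; here, those are the observations with larger LP, whose errors have smaller variance. The Python sketch below has one regressor and assumes the weight variable is proportional to the inverse of the error variance. The data are hypothetical, and REGRESSION 1J itself has more regressors.

```python
import math


def wls(x, y, w):
    """Return (intercept, slope) minimizing sum(w_i * (y_i - a - b*x_i)**2)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical prices; LP = log(price) must be positive to serve as a weight,
# which holds here because every price exceeds 1.
price = [2.0, 3.5, 5.0, 7.5, 10.0]
lp = [math.log(pr) for pr in price]
lq = [4.1, 3.9, 3.8, 3.6, 3.5]          # hypothetical log quantities
print(wls(lp, lq, w=lp))                 # weights = LP, as with wt5
```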

j.) What is the estimated price elasticity of demand at the means of the data for your chosen specification? -0.38154 (or, -0.4011 if you selected 1G above). Is it possible that the true underlying demand relationship could have unit elasticity, such that a small adjustment in price would have no effect on the revenues of the sellers? Explain how you would test this.

The test is easiest to implement in the constant-elasticity log-log specification, since it amounts to a hypothesis test of whether the slope on log price is -1. You would follow the OLS command with TEST LOGP=-1. If the P-value of this test is less than 0.05, you would reject. Alternatively, you can construct the specialized t-test by hand: [-0.38154-(-1)]/0.05640 equals about 0.62/0.056, which is roughly 11. This is clearly in the rejection region, so we reject the null hypothesis that elasticity is unitary.
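The hand calculation above uses only the point estimate and standard error from part (j). The null value is -1, not +1, because unit elasticity means a slope of -1 on log price.

```python
def t_stat(estimate, hypothesized, std_error):
    """t-ratio for H0: coefficient = hypothesized value."""
    return (estimate - hypothesized) / std_error


# Log-price slope and standard error from part (j); H0: elasticity = -1.
t = t_stat(estimate=-0.38154, hypothesized=-1.0, std_error=0.05640)
print(round(t, 2))   # 10.97, far beyond the two-tailed 5% critical value (about 2)
```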

2. The following questions pertain to EXHIBIT 2. These are real data, and we will explore a preliminary model to explain the observed quarterly time-series variation in automobile loans at commercial banks. The variables read by the program are defined as follows:

DATE = year and quarter in decimal form (e.g. 1960.25=1960:1)
AUTOCRED = consumer installment credit outstanding: automobiles, commercial banks (million $, end of month, not seasonally adjusted) [CITIBASE variable CCIUAC; monthly data averaged for each quarter (1960:1-1996:4)]
YP = gross national product, total [CITIBASE variable GNP; quarterly (1960:1-1996:4)].
R = nominal interest rate, measured as the rate on commercial paper, 6-mo (% per annum, not seasonally adjusted) [CITIBASE variable FYCP; monthly data averaged for each quarter (1960:1-1996:4)].
AUTOINV = inventories, business, retail durables, motor vehicle dealers; billions [CITIBASE variable GLRDA; quarterly (1960:1-1996:4)]
QTR1, QTR2, QTR3, QTR4 = set of quarterly dummy variables, equal to one during each respective quarter, zero otherwise.

a.) Which three variables in this data set are the most highly correlated?

YP, AUTOCRED, AUTOINV

b.) According to REGRESSION 2A, approximately what is the rate of change of outstanding automobile loans per year? 16160.

These are quarterly data, so the estimated change per year is 4[4040.7], which is approximately 16160 (million $ per year, since AUTOCRED is measured in millions of dollars).

c.) Based solely on REGRESSION 2B, does multicollinearity compromise our ability to discern the incremental effects on AUTOCRED of changes in any of the individual explanatory variables? Explain.

DW = 0.31 (P-value=0.000) implies AR(1) errors at a minimum. YP and AUTOINV are highly collinear, so their statistical significance may be compromised in Regression 2B. However, it is still very high, with t-ratios of 2.18 and 8.96, so despite the multicollinearity, we can still identify the distinct contributions of each variable. This conclusion, however, assumes that the OLS standard errors are valid; in fact, they are NOT. In reality, we do not yet know the true standard errors.
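The Durbin-Watson statistic and the rough estimate of rho it implies can be computed as follows. The residual vector in this Python sketch is a toy example, and DW ≈ 2(1 - rho) is a large-sample approximation.

```python
def durbin_watson(resid):
    """DW = sum of squared first differences of residuals / sum of squared residuals."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den


def rho_from_dw(dw):
    """Large-sample approximation DW ~ 2(1 - rho), so rho ~ 1 - DW/2."""
    return 1.0 - dw / 2.0


print(durbin_watson([1.0, -1.0, 1.0, -1.0]))   # 3.0: negative autocorrelation
print(rho_from_dw(0.31))                        # about 0.845 for REGRESSION 2B
```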

d.) Is there evidence of systematic "seasonal" variations in the level of AUTOCRED, according to REGRESSION 2B? Explain.

This model, which ignores serial correlation in the errors, suggests no seasonality: the F-test of the joint significance of the set of quarterly dummy variables fails to reject the hypothesis that the coefficients on QTR2, QTR3, and QTR4 are jointly zero. [However, note that these hypothesis tests are NOT valid, since the regression ignores serially correlated errors in computing standard errors of coefficients.]

e.) What is the purpose of REGRESSION 2C? What does it imply about the results obtained from REGRESSION 2B?

Regression 2C is intended to detect serial correlation up to order 4 in the initial OLS error terms. Despite the collinearity among the various lagged errors, ELAG1 through ELAG3 are strongly significant. We need an autoregressive error model. The point estimates in Regression 2B are unbiased, but its standard errors (and therefore its t-ratios) are wrong.
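The Python sketch below shows a simplified version of this kind of auxiliary regression: residuals regressed on a constant and their own first four lags. A full Breusch-Godfrey test would also include the original regressors, and the exact specification of REGRESSION 2C is the one in EXHIBIT 2. The residuals here are simulated AR(1) errors with a hypothetical rho of 0.8.

```python
import numpy as np


def lagged_residual_regression(resid, max_lag=4):
    """OLS of e_t on a constant and e_{t-1}, ..., e_{t-max_lag}; returns coefficients."""
    e = np.asarray(resid, dtype=float)
    y = e[max_lag:]
    cols = [np.ones_like(y)]
    for lag in range(1, max_lag + 1):
        cols.append(e[max_lag - lag:len(e) - lag])   # e_{t-lag} aligned with y
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Hypothetical AR(1) residuals (rho = 0.8) standing in for REGRESSION 2B's residuals.
rng = np.random.default_rng(143)
n = 2000
shocks = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.8 * e[t - 1] + shocks[t]

coef = lagged_residual_regression(e)
print(np.round(coef, 2))   # lag-1 coefficient near 0.8, higher lags near 0
```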

f.) Is REGRESSION 2D likely to be adequate to correct the problems revealed by REGRESSION 2C? Why or why not? Explain.

Regression 2D allows only for first-order serial correlation in the errors, whereas Regression 2C identified a more complex lag structure. These are quarterly data, so we might expect more than just AR(1): e_t = rho*e_{t-1} + epsilon_t. Instead, try AR(4): e_t = rho1*e_{t-1} + rho2*e_{t-2} + rho3*e_{t-3} + rho4*e_{t-4} + epsilon_t.
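In the feasible-GLS approach, correcting for AR(4) errors means applying the same quasi-difference filter to the dependent variable and to every regressor, then running OLS on the transformed data. The Python sketch below shows only that filter. The rho values are hypothetical. The sketch simply drops the first four observations; SHAZAM's AUTO command may handle them differently.

```python
def quasi_difference(series, rhos):
    """Return z_t - rho_1*z_{t-1} - ... - rho_p*z_{t-p} for t = p, ..., T-1.

    The first p observations are dropped in this simple sketch.
    """
    p = len(rhos)
    return [series[t] - sum(r * series[t - j] for j, r in enumerate(rhos, start=1))
            for t in range(p, len(series))]


# Hypothetical AR(4) coefficients, e.g. from a lagged-residual regression.
rhos = [0.5, 0.0, 0.0, 0.0]
print(quasi_difference([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], rhos))   # [3.0, 3.5]
```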

g.) Suppose the REGRESSION 2E was your preferred model. Does this specification suggest the presence of seasonal effects in AUTOCRED? Which months tend to have the highest amount of outstanding car loans? July, August, September (QTR3). Which months tend to have the lowest amount of outstanding car loans? January, February, March (QTR1).

Two of the seasonal dummies are now strongly statistically significant, and we are more inclined to believe this significance since we have remedied much of the serially correlated error problem by using the AUTO command.

h.) How do the implications of REGRESSION 2E differ from those of REGRESSION 2B concerning the effects on car loans of (a) nominal interest rates, and (b) car dealer inventories? Explain.

The apparent effect of R changes from large and highly significant to small and completely insignificant; same for the effects of AUTOINV. Both effects remain positive, but our hopes for using these variables to explain AUTOCRED have evaporated. It seems to be all just GNP and quarterly effects!

i.) Compare the goodness-of-fit of REGRESSION 2B with that of REGRESSION 2E.

R2 in Regression 2B is 0.9848; R2 in Regression 2E is 0.9994. Each model uses the same number of regressors, so these values are comparable. Moreover, both R2 values are computed for the original (untransformed) variables; recall the transformation of the data that is necessary to get the AUTO parameter estimates. We would expect the predicted and actual AUTOCRED series to track each other well in both cases, and almost perfectly for Regression 2E.

3. Since she knows you have taken Economics 143 at UCLA, you have been asked by your manager to evaluate an empirical research proposal prepared by an economist in another division of your company. As a dependent variable, the researcher plans to use the sales of your firm's product (laundry soap) in each of its 20 sales regions. Price differs across regions, so price is one logical explanatory variable. Also, since women typically do more laundry than men, and wealthier people have a higher proportion of clothing that requires dry-cleaning, the researcher also plans to use gender and household income to explain household demand for this product. What will be your main comment(s) about this proposal?

Dependent variable uses "sales region" as an observation, whereas some of the explanatory variables pertain to individual consumers. All variables need to correspond to the same unit of observation. I.e. ALL for sales regions, or ALL for households, not a mixture. Exception: model of household demands might use sales region data on explanatory variables for all households in the same sales region as proxies if some pertinent data is not available at the household level.

 
4. If your dependent variable is a (0,1) dummy variable that indicates the category to which an observation belongs, ordinary least squares is still the best way to estimate the average effects of changes in the explanatory variables on category membership. True, False, Uncertain? Explain.

False. OLS (the linear probability model) can produce predicted probabilities outside the [0,1] interval, and it also suffers from heteroscedasticity. The preferred strategy is to switch to maximum likelihood estimation of a probit (or logit) discrete choice model. Slopes can only be estimated up to a scale factor, but coefficient signs and t-ratios have the usual interpretations. We can therefore find out how specific variables affect the probability of a "1" (or "yes") outcome.
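A tiny hypothetical data set is enough to show the out-of-range problem. This Python sketch fits OLS to a (0,1) outcome and finds fitted "probabilities" below 0 and above 1. It then evaluates the logistic CDF, which a logit model would use; the CDF maps any linear index into (0, 1). The sketch evaluates that link function but does not fit a logit.

```python
import math


def ols(x, y):
    """Return (intercept, slope) from OLS of y on x."""
    xbar = sum(x) / len(x)
    ybar = sum(y) / len(y)
    slope = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
             / sum((a - xbar) ** 2 for a in x))
    return ybar - slope * xbar, slope


def logistic(z):
    """Logistic CDF, the link function of a logit model."""
    return 1.0 / (1.0 + math.exp(-z))


x = list(range(10))                    # hypothetical explanatory variable
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]     # (0,1) outcome
a, b = ols(x, y)
lpm = [a + b * xi for xi in x]
print(round(lpm[0], 3), round(lpm[-1], 3))                     # -0.182 1.182
print(all(0.0 < logistic(z) < 1.0 for z in (-5.0, 0.0, 5.0)))  # True
```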

 
5. Suppose you are reading an article concerning the effects of citizenship status on earnings and you encounter the following estimated model:
 

EARNi = 5.72 + 1.26 EXPi - 0.92 NONCi + 0.35 EXPi*NONCi
       (1.22) (0.31)      (0.55)       (0.20)

where EARNi = earnings;
          EXPi = job experience;
          NONCi = 1 if noncitizen; = 0 if citizen; i = 1,...,468.

and the parameter standard errors are given in parentheses below each point estimate.

Thus the t-ratios for the zero hypothesis for each estimated parameter, in order, are "more than 3," "more than 3," "less than 1.96," and "about 1.75."
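These t-ratios come from dividing each point estimate by its standard error. The Python sketch below uses only the numbers in the estimated equation, with the large-sample 1.96 cutoff (n = 468).

```python
# Point estimates and standard errors from the earnings equation above.
coefs = {"const": (5.72, 1.22), "EXP": (1.26, 0.31),
         "NONC": (-0.92, 0.55), "EXP*NONC": (0.35, 0.20)}

t_ratios = {name: est / se for name, (est, se) in coefs.items()}
for name, t in t_ratios.items():
    # |t| > 1.96 rejects H0: coefficient = 0 at the 5% level (two-tailed, large sample).
    print(f"{name:9s} t = {t:6.2f}  significant: {abs(t) > 1.96}")
```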

a.) Based on the point estimates, what is the average "starting salary" for a citizen? 5.72 For a non-citizen? 5.72 - 0.92 = 4.80.

Based on the point estimates, what is the return to experience for a citizen? 1.26. For a non-citizen? 1.26 + 0.35 = 1.61.

b.) Does citizenship status have a statistically significant effect on earnings? Explain.

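Individually, the coefficients on NONC and EXP*NONC are not statistically significantly different from zero. Individual insignificance does not imply joint insignificance, so we would need a joint test to answer this question. Insufficient information is provided to do an F-test: neither the restricted and unrestricted sums of squared errors nor the covariance between these two estimates is reported.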

c.) Does this model predict that citizens will always earn more than non-citizens (or vice-versa)? If not, how does the predicted earnings differential (citizens-noncitizens) change with experience? What formula would you use to determine the experience level at which the differential changes sign? To what would you compare the result of this calculation before concluding the relevance of a sign change in the differential?
 

On a plot of EARN against EXP, the fitted regression lines will cross somewhere, because the NONC intercept is lower than the CITIZEN intercept but its slope is greater. The predicted differential (citizens minus noncitizens) is 0.92 - 0.35 EXP, so it shrinks by 0.35 for each additional unit of experience and changes sign at EXP*. We need to find the value of EXP* at which EARN is equal for the two groups, then determine whether EXP* lies within the range of the data for this sample. The algebra is as follows:
5.72 + 1.26 EXP* = 4.80 + 1.61 EXP*
(1.61 - 1.26) EXP* = (5.72 - 4.80)
EXP* = (5.72-4.80)/(1.61-1.26) = 0.92/0.35, or about 2.6
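The crossover calculation can be checked using only the coefficients from the estimated equation. The Python sketch below computes the group-specific fitted earnings, the citizen-minus-noncitizen differential, and EXP*. The sample range of EXP, against which EXP* must be compared, is not reported here.

```python
# Coefficients from the earnings equation above.
b0, b_exp, b_nonc, b_inter = 5.72, 1.26, -0.92, 0.35


def predicted_earnings(exp, noncitizen):
    """Fitted EARN for a given experience level and status (1 = noncitizen)."""
    return b0 + b_exp * exp + b_nonc * noncitizen + b_inter * exp * noncitizen


def differential(exp):
    """Citizen minus noncitizen fitted earnings: -b_nonc - b_inter * exp."""
    return predicted_earnings(exp, 0) - predicted_earnings(exp, 1)


exp_star = -b_nonc / b_inter     # 0.92 / 0.35
print(round(exp_star, 2))        # 2.63
print(differential(0.0) > 0, differential(5.0) < 0)   # True True
```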

6. Suppose you have supervised twenty different studies (for twenty different firms) of employee sick days. For each firm, you collected individual employee records on sick days taken per year (SICKi) as a function of daily average intake of Vitamin C supplements (VITCi) by that employee. For each firm, you have estimated a model of the following form:

          SICKi = b1 + b2 VITCi + ei,      where i indexes individual employees.

Every one of these twenty different empirical models has shown that the coefficient b2 is negative and strongly statistically significantly different from zero. The empirical evidence is extremely robust across studies. Are you ready to order a press release announcing that taking of Vitamin C should become company policy for any firm that wishes to reduce losses due to employee health problems? Explain.
 

The explanatory variable, VITCi is endogenous, in that its value is determined by the health maintenance decisions of the individual employee. To make a policy recommendation, you would prefer to have a controlled experiment, where VITCi values were randomly assigned to different employees, and their resulting SICKi values recorded. If people who take more vitamin C are more careful about their health in lots of ways, they might be expected to have fewer sick days. Correlation does not imply causality.
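A simulation illustrates why the observational evidence can mislead. In this Python sketch, an unobserved, hypothetical "health-consciousness" trait raises vitamin C intake and lowers sick days. Vitamin C itself has no effect. OLS on self-chosen intake still finds a strong negative slope, while randomly assigned intake recovers the true effect of zero. All parameter values are made up for illustration.

```python
import random


def ols_slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    xbar = sum(x) / len(x)
    ybar = sum(y) / len(y)
    return (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
            / sum((a - xbar) ** 2 for a in x))


random.seed(6)
n = 10000
health = [random.gauss(0, 1) for _ in range(n)]           # unobserved trait
vitc_obs = [h + random.gauss(0, 1) for h in health]        # self-chosen intake
sick = [10 - 2 * h + random.gauss(0, 1) for h in health]   # VITC has no true effect
vitc_rand = [random.gauss(0, 2 ** 0.5) for _ in range(n)]  # randomly assigned intake

slope_obs = ols_slope(vitc_obs, sick)    # near -1: a spurious "effect"
slope_rand = ols_slope(vitc_rand, sick)  # near 0: the true effect
print(slope_obs, slope_rand)
```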

BONUS: (5 points)

Suppose you use a classroom survey to collect data on average hours of sleep per night (SLEEPi) as a function of age (AGEi). Everybody reports a value for SLEEPi, but 20% of your sample fails to report their ages. Suggest a model that will allow you to use all of the data and to estimate the effect of AGE on SLEEP, conditional on AGE being known, as well as expected SLEEP hours for the group that failed to report their age.

Create a new variable, say HAVEAGE=1 if age data are available, 0 if they are not. Interact this variable with the AGE variable, so that HAVEAGE*AGE = 0 if age data are unavailable, and =AGE if age data are available. Then run the regression:
SLEEPi = b1 + b2 HAVEAGEi + b3 HAVEAGEi*AGEi + ei
If age data are unavailable, the model becomes:
SLEEPi = b1 + ei
If age data ARE available, the model becomes:
SLEEPi = (b1 + b2) + (b3) AGEi + ei
The coefficient b3 is the effect of AGE on E[SLEEP], given that AGE data are in fact available. Thus, this is a conditional derivative.
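The Python sketch below fits the HAVEAGE model to a small, made-up survey. The missing ages can be filled with any number, because HAVEAGE*AGE is zero for non-reporters regardless of the fill value. Because the specification is saturated for the missing group, b1 equals that group's mean SLEEP. Among reporters, b1 + b2 is the intercept and b3 is the AGE slope.

```python
import numpy as np

# Hypothetical survey: None marks a respondent who did not report age.
age = [18, None, 20, 22, None, 24, None]
sleep = [8.2, 7.0, 8.0, 7.8, 8.0, 7.6, 9.0]

have_age = np.array([0.0 if a is None else 1.0 for a in age])
age_filled = np.array([0.0 if a is None else float(a) for a in age])  # fill value is irrelevant

# Columns: constant, HAVEAGE, HAVEAGE*AGE.
X = np.column_stack([np.ones(len(age)), have_age, have_age * age_filled])
coef, *_ = np.linalg.lstsq(X, np.array(sleep), rcond=None)
b1, b2, b3 = coef

print(b1)        # approximately 8.0: mean SLEEP of non-reporters
print(b1 + b2)   # approximately 10.0: intercept for reporters
print(b3)        # approximately -0.1: effect of AGE given AGE is known
```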

 


Updated: February 18, 1998
Prepared by: Trudy Ann Cameron