UNIVERSITY OF CALIFORNIA, LOS ANGELES
Department of Economics
Winter 1998
Economics 143 - Midterm Examination - Outlines of Solutions in Red

INSTRUCTIONS: Answer all questions in the spaces provided (or indicate clearly where you have continued your answer). Calculators are NOT permitted. Reduce all computations to the simplest form so that anyone with a calculator could obtain the answer easily. Show your work and reasoning to the fullest extent possible so that part marks can be assigned as warranted. You have 75 minutes to complete this exam. Each question is worth 10 points (and some are much easier than others). Total points = 150. This means roughly 5 minutes for each answer. Budget your time carefully. NOTE: these data are fictitious.

SCENARIO: The marketing sub-committee for a consortium of dealers of American-made luxury automobiles has hired your consulting firm to tell them about the determinants of demand for their products. For a random sample of 17 dealerships in different neighborhoods, you collect data on the average number of cars sold per month (cars_i), the median age of the population in the same zipcode as the dealership (age_i), the median household income (in thousands of dollars) from all sources in the same zipcode (inc_i), median price of luxury cars stocked at the dealership (in thousands of dollars) (price_i), and distance from the nearest foreign luxury car dealership (dist_i). The statistical analyses you perform are given in the Exhibits.

1. Fill in the blanks:

Across these 17 dealerships, what is the mean number of cars sold? 6.2659
What is the highest observed median price of luxury cars across these dealerships? $59,000
What is the standard deviation in median zipcode incomes across the sample? $38,230
Do the descriptive statistics you have just provided refer to the joint distribution of these three variables, or to their marginal distributions? marginal distributions
What is the correlation between age_i and inc_i in this sample? .58242
What are the units for this correlation measure? none; correlation is a unit-free measure of linear association

Almost everybody who studies the old midterms seems to have gotten this right, although many people did not heed my advice to give units. This advice was meant to force you to realize that all dollar amounts in the variables are in thousands. Not recognizing this can mess you up later on the exam.
 
2. Using the descriptive statistics only, test the hypothesis that the true marginal mean number of cars sold at ALL dealerships is 5 cars per month.

The thing to remember here is that this is a test of the mean of the cars variable (like a test of the mean of X). It is not a test about a regression slope or intercept. You would need to use s/(square root of n) as the standard error of X-bar. Construct a t-test statistic as follows: [6.2659-5]/(3.1415/(square root of 17)). If this test statistic is further from zero than the 5% critical value of a t-distribution with n-1=16 degrees of freedom (which is 2.120), then you would reject the hypothesis.
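The arithmetic for this test can be sketched in a few lines of Python. This is only an illustration using the figures quoted in the answer (mean 6.2659, standard deviation 3.1415, n = 17, critical value 2.120); the variable names are illustrative and do not come from the Exhibits.

```python
import math

# Figures quoted in the answer to question 2
n = 17
sample_mean = 6.2659   # mean cars sold per month
sample_sd = 3.1415     # sample standard deviation of cars
null_mean = 5.0        # hypothesized population mean
t_crit = 2.120         # 5% two-tailed critical value, 16 degrees of freedom

# Standard error of the sample mean: s / sqrt(n)
std_error = sample_sd / math.sqrt(n)

# t-test statistic for H0: mean = 5
t_stat = (sample_mean - null_mean) / std_error

# Reject only if the statistic is further from zero than the critical value
reject = abs(t_stat) > t_crit

print(f"standard error = {std_error:.4f}")
print(f"t statistic    = {t_stat:.4f}")
print("reject H0" if reject else "fail to reject H0")
```

With these numbers the statistic falls inside the critical value, so the hypothesis of 5 cars per month would not be rejected at the 5% level.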

3. Does Regression 1 make sense? Why or why not?

In this model, the dependent variable is AGE. This is the median age of the population in the same zipcode as the dealership. Such a model proposes that this median age will be affected by the rest of the variables. This is remotely possible, but not very likely. We would have to argue convincingly that the number of cars sold by a luxury-car dealership will influence the demographics of its neighborhood. The AGE variable, rather than being "endogenous," is probably one of the most "exogenous" variables in this story.

In grading this question, I found a surprising number of people who are not using the correct terminology. We regress the "Y" variable on the X variable. Several people talked about "regressing price, medinc, dist and spills on age." You can only regress one variable on a list of other variables (at least at this point in your career in econometrics).

4. The chairperson for the consortium says "I took Econ 1 and I know that demand curves slope downwards from left to right. I don't think much of your skills as an economist if the demand curve you estimate is not characterized by a negative slope." Based upon the relevant simple regression in the Exhibits, is it possible that there is a downward sloping demand curve for these cars? Explain how you have reached this conclusion.

The best way to answer this question is to construct a confidence interval for the true but unknown slope coefficient on the price variable. If this confidence interval includes any negative values, then we cannot reject ALL negative values for the true but unknown slope coefficient. Recall that confidence intervals contain the same information as is produced by t-tests of individual slope parameters. We can see by the t-test statistic and its associated P-value in Regression 2 that it is not possible to reject the zero hypothesis for this slope. Thus we are unlikely to be able to reject small negative values either (the P-value, 0.176, is nowhere near the edge of statistical significance, i.e. 0.05). Alternatively, you could look at the standard error estimate on the slope in this regression and see that a distance 1.96 standard errors to the left of the point estimate captures some negative values very easily. That would be the narrowest possible confidence interval if the degrees of freedom for the problem were huge. The actual confidence interval will be even wider than this.

Only a few people thought to use a confidence interval to describe the set of acceptable hypotheses. Some mentioned that if you could not reject zero, you probably couldn't reject small negative values either. Some talked about omitted variables bias, for which I gave partial credit, although it was not what I was looking for here.

5. Based on Regression 3, test the hypothesis that in order for a dealership to sell, on average, one additional luxury car per month, the neighborhood income needs to be $20,000 higher.

This requires a little thinking. If a difference in income of 20 led to a difference in cars sold of 1, then a difference in income of 1 would correspond to a difference in cars sold of 0.05 (one-twentieth). For the verbal hypothesis to be plausible, we would have to be unable to reject the null hypothesis that the slope on the inc variable in Regression 3 is 0.05. There are two special TEST statements associated with this regression. The first, TEST INC=20, is a "red herring." The second is the useful one: TEST INC=.05. With a P-value of about 0.73, we cannot reject this hypothesis, so the verbal statement is plausible.

A surprising number of people tried to plug $20,000 into the fitted regression equation for cars as a function of income. The question is not asking about the level of cars_i for an income level of $20,000. Instead, it asks about the change in the sales of cars for a $20,000 change in income. This is a slope question.
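The slope test for question 5 can be sketched as a t-test. The standard error of the income slope is not quoted in these solutions, so the sketch reconstructs it as s / sqrt(sum of squared deviations of inc), taking the sum of squared deviations to be (n-1) times the squared sample standard deviation of income ($38,230, i.e. 38.230 in the data's units). That reconstruction assumes the reported standard deviation uses the n-1 divisor; in an actual answer the standard error printed in Regression 3 should be used instead. The critical value 2.131 is the usual 5% two-tailed table value for 15 degrees of freedom.

```python
import math

n = 17
s = 2.3950            # standard error of the regression, Regression 3
sd_inc = 38.230       # sample standard deviation of inc (thousands of dollars)
b_inc = 0.055436      # estimated slope on inc, Regression 3
null_slope = 1 / 20   # one extra car per $20,000 means 0.05 cars per $1,000

# ASSUMPTION: sum of squared deviations recovered from the reported
# standard deviation, which is taken to use the n-1 divisor
sum_sq_dev = (n - 1) * sd_inc ** 2

# Standard error of the slope in a simple regression
se_b = s / math.sqrt(sum_sq_dev)

# t statistic for H0: slope = 0.05, with n-2 = 15 degrees of freedom
t_stat = (b_inc - null_slope) / se_b
t_crit = 2.131        # table value: 5% two-tailed, 15 degrees of freedom

print(f"se(b_inc)   = {se_b:.6f}")
print(f"t statistic = {t_stat:.4f}")
print("reject H0" if abs(t_stat) > t_crit else "fail to reject H0")
```

A statistic this close to zero agrees with the large P-value reported for TEST INC=.05.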

6. Based on Regression 3, what level of monthly sales would you expect for a dealership in a neighborhood with median income of $60,000? Give the formula for a point estimate and explain explicitly how a 95% confidence interval for this prediction would be constructed. Why should you use caution in making this prediction?

For this, you would plug 60 into the estimated model in Regression 3. Expected sales of cars in such a neighborhood would be -4.0973 + 0.055436*(60). The confidence interval for prediction would require the components necessary for the "big messy" formula for the confidence interval for the prediction of E[Y|X=60]. These components are: s = 2.3950; n = 17; X-bar = 186.94. You need to calculate the sum of the squared "little-x" deviations, x_i^2 (where x_i = X_i - X-bar), by using [s/(s.e.(b2))]^2. You would want to use caution in making this prediction because it is an "out-of-sample" prediction. An income value of 60 is outside the range in the data, which is 100 to 250. I guess luxury car dealerships locate in rich neighborhoods....
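A minimal sketch of the point estimate and the interval for E[Y|X=60] follows. Because s.e.(b2) is not quoted in these solutions, the sum of squared deviations is reconstructed here from the income standard deviation in question 1, as (n-1) times 38.230 squared; this assumes the n-1 divisor and is a stand-in for the [s/(s.e.(b2))]^2 route described above. The critical value 2.131 is the standard 5% two-tailed table value for 15 degrees of freedom.

```python
import math

# Quantities quoted in the answer (Regression 3)
b1 = -4.0973          # intercept
b2 = 0.055436         # slope on inc
s = 2.3950            # standard error of the regression
n = 17
x_bar = 186.94        # mean of inc (thousands of dollars)
x0 = 60.0             # income level for the prediction

# ASSUMPTION: sum of squared deviations of inc recovered from the
# sample standard deviation (38.230), taken to use the n-1 divisor
sum_sq_dev = (n - 1) * 38.230 ** 2

# Point estimate of expected monthly sales at inc = 60
y_hat = b1 + b2 * x0

# Standard error of the estimated conditional mean E[Y | X = x0]
se_mean = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sum_sq_dev)

t_crit = 2.131        # table value: 5% two-tailed, n-2 = 15 degrees of freedom
lower = y_hat - t_crit * se_mean
upper = y_hat + t_crit * se_mean

print(f"point estimate = {y_hat:.4f}")
print(f"95% interval   = ({lower:.3f}, {upper:.3f})")
```

The point estimate comes out negative, which is itself a warning sign: the fitted line is being pushed far below the observed income range of 100 to 250.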

7. You finally remember that demand functions are functions of several variables, not just one at a time. You estimate Regression 6 in order to ascertain the joint effects of all available demand determinants on the number of cars sold per month. Describe what appears to happen to the apparent effect of the income variable when you include the other variables in your model. WHY is the apparent effect of income different in the more-complex specification?

When income alone was an explanatory variable, it had a statistically significant positive effect on the number of cars sold (the P-value is 0.003 on the slope). However, when the other variables are included in the model, the slope coefficient on income changes. The slope on income is now statistically significant at the 10% level, but no longer at the 5% level. The effect of income in the simple regression was 0.055; in the multiple regression model, it is only 0.033. Based on the STAT output, we see that median income and median age are correlated across neighborhoods. When income alone was in the model, it was picking up some of the effect of age (which is also positive). Putting both variables into the model left income with a lesser share of the explanatory power shared by the two variables. The simple regression model appears to have been afflicted by omitted variables bias. The multiple regression model may suffer from multicollinearity. Remember that even a relatively innocuous amount of correlation between regressors (here income and age) can be enough to prevent rejection of the zero hypothesis for their two slopes if the amount of vertical dispersion in points around the regression plane (sigma-squared) is sufficiently large. This seems to be what is happening here.

8. In Regression 6, explain the use of the / auxrsqr option on the ols command. What does it tell you here?

The purpose of the auxrsqr option is to reveal potential sources of multicollinearity among the regressors. It reports the R-squared from regressing each explanatory variable on all of the others. Among these regressors, inc and age display the highest degree of linear dependence on the other regressors, whereas price and dist seem to be unrelated to the others in any linear fashion.

9. For Regression 6, test the hypothesis that none of the explanatory variables has any effect on the dependent variable. Explain your reasoning.

For this, we need to look at the automatic F-test in the analysis-of-variance-from-mean output. If the null hypothesis that all slopes are simultaneously zero is true, this F-test statistic will be relatively small (the explained sum of squares will be pretty small). The F-test statistic (with 4 and 12 degrees of freedom) is 5.639. The P-value associated with this value of the relevant F random variable is 0.009. Values at least this large will occur only 0.9% of the time if the null hypothesis is true, so this would be an unusual event under the null. We conclude that the null hypothesis is probably false. Therefore, at least one of the explanatory variables has some effect on the dependent variable.
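The reported F statistic can be reproduced from the sums of squares for Regression 6 that appear in the solution to question 14 (explained 103.07, residual 54.835). The sketch below only checks the arithmetic; the variable names are illustrative.

```python
# Sums of squares for Regression 6, as quoted in the solution to question 14
ess = 103.07      # explained sum of squares
rss = 54.835      # residual sum of squares
n = 17            # observations
k = 5             # coefficients: intercept plus inc, age, price, dist

# Overall F statistic for H0: all four slopes are zero
df_num = k - 1    # 4 restrictions
df_den = n - k    # 12 residual degrees of freedom
f_stat = (ess / df_num) / (rss / df_den)

# The same sums of squares give the ordinary R-squared
r_squared = ess / (ess + rss)

print(f"F({df_num},{df_den}) = {f_stat:.3f}")
print(f"R-squared = {r_squared:.4f}")
```

Both values agree with the figures in the output: an F statistic of about 5.639 and an R-squared of about 0.6527.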

10. One particularly self-assured dealer in the consortium brags that for years, he has claimed that in the luxury-car business, having a clientele that is older by 10 years is equivalent to having a clientele that is richer by $10,000. Do you have enough information in the output to test this informal "hypothesis"? Explain.

This requires a little advance thinking. The effect on sales of a median age that is ten years greater is ten times the effect of a median age that is one year greater, which is the slope coefficient on the age variable. Likewise, the effect of a clientele that is richer by $10,000 is ten times the effect of a clientele with an inc value that is greater by one unit (since inc is measured in thousands of dollars). This is just the slope coefficient on the inc variable. The assertion (in a linear model) is equivalent to the assertion that "older by 1 year is equivalent to richer by $1000 (or one unit)". So, this boils down to a test of whether the slope coefficients on income and age could be identical. Fortunately, you are provided with a specialized F-test for exactly this hypothesis. The P-value associated with the F-test of the restriction that the slopes be identical is 0.26033. We cannot reject the hypothesis of equal coefficients.
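Written out, the restriction and the test statistic behind it look roughly as follows. The symbols (beta for the true slopes, b for the estimates) are notation introduced here for illustration; the estimated variances and covariance would come from the coefficient covariance matrix of Regression 6, which is not reproduced in these solutions.

```latex
H_0:\; 10\,\beta_{age} = 10\,\beta_{inc}
\quad\Longleftrightarrow\quad
\beta_{age} - \beta_{inc} = 0

t = \frac{b_{age} - b_{inc}}
         {\sqrt{\widehat{\mathrm{Var}}(b_{age}) + \widehat{\mathrm{Var}}(b_{inc})
                - 2\,\widehat{\mathrm{Cov}}(b_{age}, b_{inc})}},
\qquad
F_{1,\,12} = t^{2}
```

With a single linear restriction, the F statistic is the square of this t statistic, so the two tests give the same P-value.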

 
11. In the different specifications in the Exhibits, what fraction of the variation in cars sold across dealerships can be explained by a model that uses only age? 50.46% or 0.5046. What fraction can be explained by a model that uses income, age, price and distance to nearest foreign luxury car dealership? 65.27% or 0.6527. Can these be compared? Why or why not?

These ordinary R-squared values cannot be compared because the second one would have been larger no matter what. Adding more variables can only improve the explained sum of squares; it can never reduce it, since at worst, the slopes on the additional variables will be zero. To compare goodness-of-fit across models with different numbers of regressors, you need to use the adjusted R-squared value. In this example, the adjusted R-squared values are 0.4715 and 0.5370, respectively. Thus, the larger model has a better goodness-of-fit for this particular sample than does the smaller model, even when we have penalized the fit for lost degrees of freedom by using adjusted R-squared.
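The adjustment can be checked with a short sketch using the standard formula, adjusted R-squared = 1 - (1 - R-squared)(n-1)/(n-k), where k counts the intercept as well as the slopes. The function name is illustrative.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k coefficients
    (k includes the intercept)."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k)

n = 17

# Age-only model: intercept + age, so k = 2
adj_small = adjusted_r_squared(0.5046, n, k=2)

# Full model: intercept + inc, age, price, dist, so k = 5
adj_large = adjusted_r_squared(0.6527, n, k=5)

print(f"age-only model: {adj_small:.4f}")
print(f"full model:     {adj_large:.4f}")
```

Up to rounding, these reproduce the 0.4715 and 0.5370 quoted above, and the larger model still comes out ahead after the degrees-of-freedom penalty.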

 
12. The demand function estimated in Regression 6 exhibits a negative intercept. Can demand functions have negative intercepts? Does this result and/or its statistical significance trouble you? Why or why not? Explain.

We would only be concerned about a negative intercept for a demand curve if the only explanatory variable were price. Then, the intercept would have to be positive (and the slope negative) if we were expecting to find an ordinary downward-sloping sort of a demand curve. However, if other variables are included, they essentially "shift" the demand curve intercept (on the Q-axis) by an amount equal to the slopes on those variables times the values of the variables. Each entity in the sample could have a different demand function intercept, while all share the same slope. In Regression 6, the intercept is different for each observation if each observation is characterized by distinct values of inc, age, and dist. The only time a negative coefficient on the constant term would bother us would be if it were possible that inc=age=dist=0 for some dealership in this story. Since this event will never occur (probably), the negative coefficient on the constant term is a non-issue.

 
13. In Regression 6, are the apparent effects of inc_i and age_i individually statistically significantly different from zero? Could these coefficients both be zero? Explain carefully. Do you have adequate information?

Neither the effect of income, nor the effect of age, is individually statistically significantly different from zero at the 5% level in this multiple regression. In order to test whether the two slopes could be jointly zero, we would prefer to run an F-test, using the following sequence of commands after Regression 6:
test
test inc=0
test age=0
end
However, this test does not appear to have been provided in the available output. Fortunately, a joint confidence ellipse for these two coefficients IS provided. We can look there for the answer to the question. Unfortunately, this is a "crummy plot" rather than a nice "gnuplot," so the resolution is not really sufficient. Still, from the locations of the four little "+" signs in the joint confidence ellipse, in conjunction with the information above the plot, we know that the lower-left "+" occurs at -0.001164 along the horizontal axis. The odds are very good, therefore, that the point (0,0) is outside the joint confidence ellipse. The F-test will probably reject the hypothesis that both slopes are simultaneously zero.

 
14. Is a model to explain cars_i that uses only inc_i and age_i (leaving out price_i and dist_i) an adequate representation of the factors that influence sales of US-made luxury cars at the dealerships represented by this sample? Comment and explain.

Here, we want an F-test for the significance of the incremental contribution of the pair of variables, price and dist. This test has not been ordered explicitly, but we have all the ingredients to construct it. We have the unrestricted model and the restricted model, and their respective explained sums of squares. The F-test statistic we would construct would be given by:
[ (103.07 - 95.940)/2 ] / [ 54.835 / 12 ] = [ (103.07 - 95.940)/2 ] / [ 4.5695 ]
If this test statistic value exceeds the 5% critical value for an F-distributed random variable with 2 and 12 degrees of freedom, we would reject the null hypothesis that the slopes on price and dist are jointly zero and would opt in favor of Regression 6 over Regression 7.
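The incremental F-test can be carried through numerically with the sums of squares given above. The 5% critical value for 2 and 12 degrees of freedom is taken here to be 3.89, the usual table value; it is not quoted in the Exhibits and should be checked against the tables provided.

```python
# Explained sums of squares quoted in the solution
ess_unrestricted = 103.07   # Regression 6: inc, age, price, dist
ess_restricted = 95.940     # Regression 7: inc and age only
rss_unrestricted = 54.835   # residual sum of squares, Regression 6

q = 2                       # restrictions: slopes on price and dist are zero
df_den = 17 - 5             # n minus coefficients in the unrestricted model

# F statistic for the joint contribution of price and dist
f_stat = ((ess_unrestricted - ess_restricted) / q) / (rss_unrestricted / df_den)

# ASSUMPTION: 5% critical value for F(2, 12), from standard tables
f_crit = 3.89

print(f"F({q},{df_den}) = {f_stat:.4f}")
if f_stat > f_crit:
    print("reject H0: price and dist add explanatory power")
else:
    print("fail to reject H0: the smaller model is not rejected")
```

With these figures the statistic is well below the critical value, so the restriction that the slopes on price and dist are jointly zero would not be rejected.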

 
15. (i.) Specifically, what do we call the distribution that appears in the histogram at the end of the Exhibits?

This is the marginal distribution of cars_i in this sample.

(ii.) Specifically, what do we call the scatterplot that appears at the end of the Exhibits?

This is the joint distribution of inc_i and age_i in the sample.

 
 


Updated: February 18, 1998
Prepared by: Trudy Ann Cameron