UNIVERSITY OF CALIFORNIA, LOS
ANGELES
Department of Economics
Winter 1998
Economics 143 - Midterm Examination - Outlines of
Solutions in Red
INSTRUCTIONS: Answer all questions
in the spaces provided (or indicate clearly where you have continued your
answer). Calculators are NOT permitted. Reduce all computations
to the simplest form so that anyone with a calculator could attain the
answer easily. Show your work and reasoning to the fullest extent possible
so that part marks can be assigned as warranted. You have 75 minutes to
complete this exam. All parts of both questions are worth 10 points
(and some are much easier than others). Total points = 150. This means
roughly 5 minutes for each answer. Budget your time carefully. NOTE: these
data are fictitious.
SCENARIO: The marketing sub-committee
for a consortium of dealers of American-made luxury automobiles has hired
your consulting firm to tell them about the determinants of demand for
their products. For a random sample of 17 dealerships in different neighborhoods,
you collect data on the average number of cars sold per month
(cars_i),
the median age of the population in the same zipcode as the dealership
(age_i), the median household income (in thousands of
dollars) from all sources in the same zipcode (inc_i),
median price of luxury cars stocked at the dealership (in
thousands of
dollars) (price_i), and distance from the nearest foreign
luxury car dealership (dist_i). The statistical analyses
you perform are given in the Exhibits.
1. Fill in the blanks:
Across these 17 dealerships, what is
the mean number of cars sold? 6.2659
What is the highest observed median
price of luxury cars across these dealerships? $59,000
What is the standard deviation in median
zipcode incomes across the sample? $38,230
Do the descriptive statistics you have
just provided refer to the joint distribution of these three variables,
or to their marginal distributions? marginal
distributions
What is the correlation between
age_i and
inc_i in this sample? .58242
What are the units for this correlation
measure? none; correlation is a unit-free measure of linear
association
Almost everybody who studies the old midterms seems to
have gotten this right, although many people did not heed my advice to give units.
This advice was meant to force you to realize that all dollar amounts in the
variables are in thousands. Not recognizing this can mess you up later on the
exam.
2. Using the descriptive statistics
only, test the hypothesis that the true marginal mean number of cars
sold at ALL dealerships is 5 cars per month.
The thing to remember here is that this is a test of
the mean of the cars variable
(like a test of the mean of X). It is not a test about a regression
slope or intercept. You would need to use s/(square root of n) as the standard error
of X-bar. Construct a t-test statistic as follows: [6.2659-5]/(3.1415/(square root of 17)).
If this test statistic is further from zero than the 5% critical value of a
t-distribution with n-1=16 degrees of freedom (which is 2.120), then you
would reject the hypothesis.
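For study purposes only (calculators were not permitted on the exam), the arithmetic can be checked with a short Python sketch. It assumes, as the outline above does, that 3.1415 is the sample standard deviation of cars and that the standard error of X-bar is s/(square root of n); the 2.120 critical value is the one quoted above.

```python
import math

# Values taken from the solution outline above
x_bar = 6.2659      # sample mean of cars
mu_0 = 5.0          # hypothesized mean
s = 3.1415          # sample standard deviation of cars (as used above)
n = 17              # number of dealerships
t_crit = 2.120      # 5% two-tailed critical value, 16 degrees of freedom

std_err = s / math.sqrt(n)          # standard error of X-bar
t_stat = (x_bar - mu_0) / std_err   # t-test statistic
reject = abs(t_stat) > t_crit       # two-tailed decision at the 5% level

print(f"standard error = {std_err:.4f}")
print(f"t statistic    = {t_stat:.4f}")
print("reject H0" if reject else "cannot reject H0")
```

With these inputs the statistic is roughly 1.66, short of 2.120, so the hypothesis of 5 cars per month would not be rejected.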
3. Does Regression 1 make sense?
Why or why not?
In this model, the dependent variable is AGE. This is
the median age of
the population in the same zipcode as the dealership. Such a model proposes
that this median age will be affected by the rest of the variables. This
is remotely possible, but not very likely. We would have to argue convincingly
that the number of cars sold by a luxury-car dealership will influence the
demographics of its neighborhood. The AGE variable, rather than being
"endogenous," is probably one of the most
"exogenous" variables in this story.
In grading this question, I found a surprising number of people who are not using
the correct terminology. We regress the "Y" variable on the X variable.
Several people talked about "regressing price, medinc, dist and spills on age."
You can only regress one variable on a list of other variables (at least at this
point in your career in econometrics).
4. The chairperson for the consortium
says "I took Econ 1 and I know that demand curves slope downwards from
left to right. I don't think much of your skills as an economist if the
demand curve you estimate is not characterized by a negative slope." Based
upon the relevant simple regression in the Exhibits, is it possible
that there is a downward sloping demand curve for these cars? Explain how
you have reached this conclusion.
The best way to answer this question is to construct a
confidence interval for the
true
but unknown slope coefficient on the price variable. If this confidence interval
includes any negative values, then we cannot reject ALL negative values for the
true but unknown slope coefficient. Recall that confidence intervals contain the
same information as is produced by t-tests of individual slope parameters. We can
see by the t-test statistic and its associated P-value in Regression 2 that it is
not possible to reject the zero hypothesis for this slope. Thus we are unlikely
to be
able to reject small negative values either (the P-value is nowhere near being
on the edge of statistical significance--i.e. 0.05--it is 0.176). Alternately,
you could look at the standard error estimate on the slope in this regression and
see that a distance 1.96 standard errors to the left of the point estimate
captures some negative values very easily. That would be the narrowest possible
confidence interval if the degrees of freedom for the problem were huge. The
actual
confidence interval will be even wider than this.
Only a few people thought to use a confidence interval to describe the set of
acceptable hypotheses. Some mentioned that if you could not reject zero, you
probably couldn't reject small negative values either. Some talked about
omitted variables bias, for which I gave partial credit, although it was not what
I was looking for here.
5. Based on Regression 3, test
the hypothesis that in order for a dealership to sell, on average, one
additional luxury car per month, the neighborhood
income needs to be $20,000
higher.
This requires a little thinking. If a difference in
income of 20 led to a
difference in cars sold of 1, then a difference in income of 1 would correspond to
a difference in cars sold of 0.05 (one-twentieth). For the verbal hypothesis to
be plausible, we would have to be unable to reject the null hypothesis that the
slope on the inc
variable in Regression 3 is 0.05. There are two special TEST statements
associated
with this regression. The first, TEST INC=20, is a "red herring." The second is
the useful one: TEST INC=.05. With a P-value of
about 0.73, we cannot reject
this
hypothesis, so the verbal statement is plausible.
A surprising number of people tried to plug $20,000 into the fitted regression
equation for cars as a function of income. The question is not asking about the
level of cars_i for an income level of $20,000. Instead, it asks about
the change in the sales of cars for a $20,000 change in income. This is a slope
question.
6. Based on Regression 3, what
level of monthly sales would you expect for a dealership in a neighborhood
with median income of $60,000? Give the formula for a point estimate and
explain explicitly how a 95% confidence interval for this prediction would
be constructed. Why should you use caution in making this prediction?
For this, you would plug 60 into the estimated model in
Regression 3. Expected
sales of cars in such a neighborhood would be -4.0973 + 0.055436* (60). The
confidence
interval for prediction would require the components necessary for the "big messy"
formula for the
confidence interval for prediction for the E[Y|X=60]. These components are:
s = 2.3950; n = 17; X-bar = 186.94. You need to calculate the sum of the
little-x_i^2
by using [s/(s.e.(b2))]^2. You would want to use caution in
making this prediction because it is an "out-of-sample" prediction. An income
value of 60 is
outside the range in the data, which is 100 to 250. I guess luxury car
dealerships
locate in rich neighborhoods....
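The recipe above can be sketched in Python, again for study purposes only. The intercept, slope, s, n and X-bar are the values quoted above. The standard error of the slope and the t critical value are NOT quoted in this outline, so the function takes them as arguments, and the numbers passed in the example call are hypothetical placeholders rather than values from the Exhibits.

```python
import math

# From Regression 3, as quoted in the solution outline above
b1 = -4.0973        # intercept
b2 = 0.055436       # slope on inc
s = 2.3950          # standard error of the regression
n = 17              # number of dealerships
x_bar = 186.94      # sample mean of inc (thousands of dollars)


def mean_prediction_interval(x0, se_b2, t_crit):
    """Point estimate and confidence interval for E[Y | X = x0].

    se_b2 and t_crit must be supplied by the caller: neither is quoted
    in the solution text, so they have to come from the Exhibits and a
    t-table (n - 2 = 15 degrees of freedom).
    """
    y_hat = b1 + b2 * x0
    sum_x_sq = (s / se_b2) ** 2     # sum of squared deviations of X
    se_fit = s * math.sqrt(1.0 / n + (x0 - x_bar) ** 2 / sum_x_sq)
    return y_hat, y_hat - t_crit * se_fit, y_hat + t_crit * se_fit


# HYPOTHETICAL se_b2 and t_crit, for illustration only
y_hat, lower, upper = mean_prediction_interval(60, se_b2=0.015, t_crit=2.131)
print(f"point estimate = {y_hat:.4f}")
print(f"interval       = ({lower:.4f}, {upper:.4f})")
```

The point estimate does not depend on the placeholder inputs. It comes out at about -0.77 cars per month, a negative number, which is one more reason to distrust a prediction this far outside the 100-to-250 income range.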
7. You finally remember that demand functions
are functions of several variables, not just one at a time. You estimate
Regression 6 in order to ascertain the
joint effects of all available
demand determinants on the number of cars sold per month. Describe what
appears to happen to the apparent effect of the income variable when you
include the other variables in your model. WHY is the apparent effect of
income different in the more-complex specification?
When income alone was an explanatory variable, it had a
statistically significant
positive effect on the number of cars sold (the P-value is 0.003 on the slope).
However,
when the other variables are included in the model, the slope coefficient on
income
changes. The slope on income is now statistically significant at the 10% level,
but no
longer at the 5% level. The effect of income in the simple regression was 0.055;
in the
multiple regression model, it is only 0.033. Based on the STAT output, we see
that median income and median age are correlated across neighborhoods. When
income
alone was in the model, it was picking up some of the effect of age (which is also
positive). Putting both variables into the model left income with a lesser share
of the
explanatory power shared by the two variables. The simple regression model
appears to have been afflicted by omitted variables bias. The multiple regression
model may suffer from multicollinearity. Remember that even a relatively
innocuous amount
of correlation between regressors (here income and age) can be enough to
prevent rejection of the zero hypothesis for their two slopes if the amount
of vertical dispersion in points around the regression plane (sigma-squared) is
sufficiently
large. This seems to be what is happening here.
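The omitted-variables story can be illustrated with a tiny Python sketch. The data here are entirely hypothetical (they are not the exam data): x1 plays the role of income, x2 the role of age, and y is built exactly from both. When x2 is left out, the slope on x1 absorbs part of the effect of x2, in proportion to how strongly x2 moves with x1.

```python
def slope(x, y):
    """OLS slope from a simple regression of y on x (with an intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# HYPOTHETICAL data: x1 stands in for income, x2 for age
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
beta1, beta2 = 2.0, 3.0         # "true" slopes used to build y
y = [1.0 + beta1 * a + beta2 * b for a, b in zip(x1, x2)]

short_slope = slope(x1, y)      # slope on x1 when x2 is omitted
delta = slope(x1, x2)           # slope from regressing x2 on x1
implied = beta1 + beta2 * delta # true slope plus the omitted-variable term

print(f"slope with x2 omitted = {short_slope:.4f}")
print(f"beta1 + beta2 * delta = {implied:.4f}")
```

The two printed numbers agree (up to rounding), and both exceed the true slope of 2.0 because x1 and x2 are positively related in this made-up sample. This mirrors the way the simple-regression income slope of 0.055 falls to 0.033 once age is included.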
8. In Regression
6, explain the
use of the / auxrsqr option on the ols command. What does it tell
you here?
The purpose of the auxrsqr option is to reveal
potential sources of
multicollinearity
among the regressors. Among these regressors, inc and age display the highest
degree of linear dependence on the other regressors, whereas price and dist
seem to be unrelated to the others in any linear fashion.
9. For Regression
6, test the
hypothesis that none of the explanatory variables has any effect
on the dependent variable. Explain your reasoning.
For this, we need to look at the automatic F-test in
the analysis-of-variance-from-means
output. If the null hypothesis that all slopes are simultaneously zero is true,
this F-test statistic will be relatively small (the explained sum of squares will
be pretty small).
The F-test statistic (with 4 and 12 degrees of freedom) is 5.639. The P-value
associated with this value of the relevant F random variable is 0.009. Values
this large or larger will occur only 0.9% of the time if the null hypothesis is
true, so this would be an unusual event. We conclude that the null hypothesis is probably false.
Therefore,
at least one of the explanatory variables has some effect on the dependent
variable.
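As a cross-check for study purposes, the same overall F-test statistic can be recovered from the R-squared of Regression 6 (0.6527, as reported in the Exhibits) using the standard relationship between F and R-squared. This is a sketch of that check, not part of the printed output.

```python
r_squared = 0.6527   # R-squared for Regression 6, as reported in the Exhibits
n = 17               # number of dealerships
k = 4                # slopes being tested: inc, age, price, dist

df_num = k           # numerator degrees of freedom
df_den = n - k - 1   # denominator degrees of freedom
f_stat = (r_squared / df_num) / ((1.0 - r_squared) / df_den)

print(f"F({df_num}, {df_den}) = {f_stat:.3f}")
```

The result is about 5.64, which agrees with the 5.639 in the output up to rounding in the reported R-squared.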
10. One particularly self-assured dealer
in the consortium brags that for years, he has claimed that in the luxury-car
business, having a clientele that is older by 10 years is equivalent to
having a clientele that is richer by $10,000. Do you have enough information
in the output to test this informal "hypothesis"? Explain.
This requires a little advance thinking. The effect on
sales of a median
age that is ten years greater is ten times the effect of a median age that
is one year greater, which is the slope coefficient on the age variable.
Likewise,
the effect of a clientele that is richer by $10,000 is ten times the effect of
a clientele with an inc value that is greater by one unit (since inc
is measured in thousands of dollars). This is just the slope coefficient on
the inc variable. The assertion (in a linear model) is equivalent to the
assertion that "older by 1 year is equivalent to richer by $1000 (or one unit)".
So, this boils down to a test of whether the slope coefficients on income and
age could be identical. Fortunately, you are provided with a specialized
F-test for exactly this hypothesis. The p-value associated with the
F-test of the restriction that the slopes be identical is 0.26033. We cannot
reject the hypothesis of equal coefficients.
11. In the different specifications
in the Exhibits, what fraction of the variation in cars sold across
dealerships can be explained by a model that uses only age? 50.46% or 0.5046. What
fraction can be explained by a model that uses income, age, price and distance
to nearest foreign luxury car dealership? 65.27% or
0.6527. Can these be compared?
Why or why not?
These ordinary R-squared values cannot be compared
because the second one
would have been larger no matter what. Adding more variables can only
improve the explained sum of squares; it can never reduce it, since at worst,
the slopes on the additional variables will be zero. To compare goodness-of-fit
across models with different numbers of regressors, you need to use the
adjusted R-squared value. In this example, the adjusted R-squared values
are 0.4715 and 0.5370, respectively. Thus, the larger model has a better
goodness-of-fit for this particular sample than does the smaller
model, even when we have penalized the fit for lost degrees of freedom by
using adjusted R-squared.
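The adjusted R-squared values quoted above can be reproduced from the ordinary R-squared values with the usual degrees-of-freedom penalty. A minimal sketch, assuming n = 17 and an intercept in both models:

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-squared for a model with k slopes plus an intercept."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


n = 17                                              # number of dealerships
adj_age_only = adjusted_r_squared(0.5046, n, k=1)   # model with age only
adj_full = adjusted_r_squared(0.6527, n, k=4)       # inc, age, price, dist

print(f"age only:   {adj_age_only:.4f}")
print(f"full model: {adj_full:.4f}")
```

Both results match the 0.4715 and 0.5370 above to about three decimal places; the small differences come from rounding in the reported R-squared values.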
12. The demand function estimated in
Regression 6 exhibits a negative intercept.
Can demand functions
have negative intercepts? Does this result and/or its statistical significance
trouble you? Why or why not? Explain.
We would only be concerned about a negative intercept
for a demand curve if
the only explanatory variable was price. Then, the intercept would have to be
positive (and the slope negative) if we were expecting to find an ordinary
downward-sloping
sort of a demand curve. However, if other variables are included, they
essentially "shift" the
demand curve intercept (on the Q-axis) by an amount equal to the slopes on those
variables times the values of the variables. Each entity in the sample could have
a different demand function intercept, while all share the same slope. In
Regression 6,
the intercept is different for each observation if each observation is
characterized
by distinct values of inc, age, and dist. The only time a negative coefficient on
the constant term would bother us would be if it was possible that inc=age=dist=0
for some
dealership in this story. Since this event will never occur (probably) the
negative
coefficient on the constant term is a non-issue.
13. In Regression 6, are the
apparent effects of inc_i and age_i
individually
statistically significantly different from zero? Could these coefficients
both be zero? Explain carefully. Do you have adequate information?
Neither the effect of income, nor the effect of age, is
individually statistically
significantly different from zero at the 5% level in this multiple regression.
In order to test whether the two slopes could be jointly zero, we would prefer
to run an F-test, using the following sequence of commands after Regression 6:
test
test inc=0
test age=0
end
However, this test does not appear to have been provided in the available output.
Fortunately, a joint confidence ellipse for these two coefficients IS provided. We
can
look there for the answer to the question. Unfortunately, this is a "crummy plot"
rather than a nice "gnuplot," so the resolution is not really sufficient. Still,
from the locations of the four little "+" signs in the joint confidence ellipse,
in conjunction with the information above the plot, we know that the lower-left
"+"
occurs at -0.001164 along the horizontal axis. The odds are very good, therefore,
that the point (0,0) is outside the joint confidence ellipse. The F-test will
probably
reject the hypothesis that both slopes are simultaneously zero.
14. Is a model to explain
cars_i
that uses only inc_i and age_i (leaving
out price_i and dist_i) an adequate
representation
of the factors that influence sales of US-made luxury cars at the dealerships
represented by this sample? Comment and explain.
Here, we want an F-test for the significance of the
incremental contribution of the
pair
of variables, price and dist. This test has not been ordered explicitly, but
we have all the ingredients to construct it. We have the unrestricted model
and the restricted model, and their respective explained sums of squares. The
F-test statistic we would construct would be given by:
[ (103.07 - 95.940)/2 ] / [ 54.835 / 12 ] =
[ (103.07 - 95.940)/2 ] / [ 4.5695 ]
If this test statistic value exceeds the 5% critical value for an F-distributed
random variable with 2 and 12 degrees of freedom, we would reject the null
hypothesis that the slopes on price and dist are jointly zero and would opt
in favor of Regression 6 over Regression 7.
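For anyone studying with a calculator at hand, the statistic above can be evaluated with a short Python sketch. The sums of squares are the ones in the formula above. The 5% critical value of 3.89 for 2 and 12 degrees of freedom is taken from a standard F-table, not from the Exhibits, so treat it as an outside assumption.

```python
ess_unrestricted = 103.07   # explained sum of squares, Regression 6
ess_restricted = 95.940     # explained sum of squares, Regression 7
rss_unrestricted = 54.835   # residual sum of squares, Regression 6
q = 2                       # restrictions: slopes on price and dist are zero
df_den = 12                 # n - k - 1 = 17 - 4 - 1
f_crit = 3.89               # ASSUMED: 5% value for F(2, 12) from a standard table

numerator = (ess_unrestricted - ess_restricted) / q
denominator = rss_unrestricted / df_den
f_stat = numerator / denominator

print(f"F({q}, {df_den}) = {f_stat:.4f}")
print("reject H0" if f_stat > f_crit else "cannot reject H0")
```

The statistic works out to about 0.78, well below the tabulated value. On these numbers the joint contribution of price and dist would not be judged statistically significant, so the data would not force a move from Regression 7 to Regression 6.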
15. (i.) Specifically, what do we call
the distribution that appears in the histogram at
the end of the
Exhibits?
This is the marginal distribution of cars_i
in this sample.
(ii.) Specifically, what do we call the
scatterplot that appears at the end of the
Exhibits?
This is the joint distribution of inc_i and
age_i in the
sample.
Updated: February 18, 1998
Prepared by: Trudy Ann Cameron