INSTRUCTIONS: Answer all questions in the space provided (or indicate clearly where you have continued your answer on the back of the page). Calculators are NOT permitted. Reduce all computations to the simplest form so that anyone with a calculator could obtain the answer easily. Show your work and reasoning to the fullest extent possible so that part marks can be assigned as warranted. You have three hours to complete this exam. There are 25 questions (or question sections) worth 5 points each except where noted. Total points = 125. Budget your time carefully. Exhibit pages should not be turned in with your exam. Remember: answer questions in a manner that reflects the econometric reasoning you have learned in this course.
1. Exhibit A describes an analysis of some fictitious data concerning graduate students' moviegoing behavior as a function of the average full price (including parking costs) of movies at theater complexes in the student's local area and the annual income of the student (in thousands of dollars per year).
a.) According to Regression A1 and Regression A2, are movies a normal or an inferior good, on average, for the graduate students in the sample? Or are they a normal good for lower-income students and an inferior good for higher-income students? Explain.
It does not appear that income has much effect on quantity no matter how it is incorporated into the model. It does not appear to be statistically significant if it enters linearly; nor is either term significant if it enters both in linear and quadratic form. Note, however, that residuals analysis following Regression A2 suggests that we should be cautious in drawing any inferences about anything from Regression A2.
b.) What does the diagnos / het output following Regression A2 tell us? What are the implications for the standard OLS results produced by Regression A2?
The diagnos / het output reports just the key results from a set of regressions of the squared errors from the previous regression on a variety of variables to which they might be related. The squared errors from a preliminary naive OLS regression are the best information we have about the sizes of the true conditional error variances, $\sigma_i^2$. Under homoscedasticity, these conditional error variances should be unrelated to any other variable we consider. These results show that they are not: we reject the hypothesis of no relationship between the error variance and each candidate variable, with one exception. The exception is the test for ARCH (AutoRegressive Conditional Heteroscedasticity). This is relevant only with time-series data, where there may be patterns in the error variance over time (as opposed to patterns in the sign of the error over time). Since these data have no particular order, it is not surprising that the variances associated with adjacent observations are unrelated.
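The auxiliary-regression logic behind this kind of diagnostic can be sketched in a few lines of Python. This is a minimal sketch on simulated data, and every number in it is hypothetical. It uses one candidate variable and an n*R^2 statistic in the spirit of the Breusch-Pagan/Koenker test. It is not a reproduction of SHAZAM's diagnos / het computations.

```python
import random

def simple_ols(x, y):
    """One-regressor OLS with an intercept: returns intercept, slope, R-squared, residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    syy = sum((yi - my) ** 2 for yi in y)
    r2 = 1 - sum(e * e for e in resid) / syy
    return intercept, slope, r2, resid

# Hypothetical data: the error standard deviation grows with price p.
rng = random.Random(0)
n = 500
p = [rng.uniform(5, 15) for _ in range(n)]
q = [20 - 0.8 * pi + rng.gauss(0, 0.3 * pi) for pi in p]

# Step 1: naive OLS of q on p. Step 2: regress the squared errors on p.
_, _, _, e = simple_ols(p, q)
e2 = [ei ** 2 for ei in e]
_, aux_slope, aux_r2, _ = simple_ols(p, e2)

# n * R^2 from the auxiliary regression, to be compared with the
# chi-square(1) 5% critical value of 3.84.
lm_stat = n * aux_r2
print(f"slope on p: {aux_slope:.3f}, n*R^2: {lm_stat:.1f}")
```

A large n*R^2 relative to the chi-square critical value is what "rejecting the hypothesis of no relationship" means for each candidate variable.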
c.) Consider Regression A3, Regression A4, Regression A5, and Regression A6. Is there one exogenous variable that is unambiguously the most closely related to the sizes of the unobserved individual conditional error variances, $\sigma_i^2$? Explain.
In the one-by-one regressions, the squared errors appear to be positively related to p and to p2. They are unrelated to y. When we regress the squared errors on all three candidates, we see a high degree of multicollinearity between p and p2, such that it is impossible to distinguish, statistically, the independent contributions of these two variables to explaining the squared errors if both are used. It looks like a toss-up which we choose from among p and p2 to capture the variations in the error variance across observations. We definitely do not want to use y.
d.) Among Regression A7, Regression A8, and Regression A9, which specification is inappropriate as a potential remedy for the problems afflicting Regression A2? Explain why.
Since the error variance appears to vary directly with the magnitude of either p or p2, and the weights should vary inversely with the error variance, the weights should vary inversely with either p or p2. The inappropriate weighting variable would be wt1. Either wt2 or wt3 is probably adequate, with perhaps the slightest edge to 1/p. The reason is that p is ever so slightly more statistically significant as an explanatory variable for the squared errors, and the R-squared value for that model is slightly higher.
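To make the weighting idea concrete, here is a minimal WLS sketch in Python for a one-regressor model. It assumes, hypothetically, that the error variance is proportional to p, so the weight is 1/p. The simulated data are invented, and this is not SHAZAM's implementation.

```python
import random

def wls(x, y, w):
    """Weighted least squares for y = a + b*x; w should be proportional to 1/variance."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical data whose error variance is proportional to price p.
rng = random.Random(1)
p = [rng.uniform(5, 15) for _ in range(2000)]
q = [20 - 0.8 * pi + rng.gauss(0, (0.5 * pi) ** 0.5) for pi in p]

ols_fit = wls(p, q, [1.0] * len(p))          # equal weights reproduce OLS
wls_fit = wls(p, q, [1.0 / pi for pi in p])  # weights vary inversely with p
print(ols_fit, wls_fit)
```

An equivalent route is to multiply every variable, including the constant, by the square root of the weight and then run OLS on the transformed data. Both routes give the same point estimates.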
e.) In this example, are the substantive implications of the fitted model altered by the use of weighted least squares methods? Discuss.
If my preferred model is taken to be Regression A9, the parameter point estimates change only very slightly; they change at all only because the WLS formulas differ from the OLS formulas. As for the standard errors, heteroscedasticity-corrected OLS standard errors, if we had them, would be expected to be larger than the WLS standard errors, since WLS is more efficient than OLS under heteroscedasticity. However, all we get from SHAZAM are the uncorrected OLS standard errors. These are simply wrong, because they have been calculated using a formula that assumes a common $\sigma^2$ can be factored out, when it cannot. All the same, there appear to be no meaningful or surprising changes in the implications of this model when we correct for heteroscedasticity. Unfortunately, this is not a general result: you never know whether a heteroscedasticity correction will make little difference to your estimates or inferences until you estimate the weighted least squares model and find out.
f.) If you did not have to worry about violations of the maintained hypotheses for OLS regarding error terms, would you prefer the linear specification in Regression A2 or the log-log specification in Regression A10? By what criterion? Explain.
In Regression A10, we have been careful to include the loglog option in the regression command, so SHAZAM knows that the dependent variable in this model has been logged. Thus the regression algorithm undertakes to adjust the formula for the log-likelihood to account for the logged dependent variable, thereby making the log-likelihood comparable to models which use the (raw) "levels" of the dependent variable. For the log-log model, the maximized log-likelihood is -253.223. For the levels model, it is -246.915. The higher value is obtained for the levels model, so it would be preferred. Recall that a log-likelihood can be interpreted as the log of the joint probability of observing the data that we have observed in our sample.
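The adjustment described above amounts to adding the log of the Jacobian of the transformation. If $\ln y$ is normal, the density of $y$ is the density of $\ln y$ divided by $y$. The comparable log-likelihood is therefore the log-log model's log-likelihood minus $\sum_i \ln y_i$.

The Python sketch below assumes normal errors and uses simulated data. The data are generated from a log-log model, so here the comparison favors the log-log specification, unlike the exhibit. It illustrates the idea and is not SHAZAM's internal code.

```python
import math
import random

def ols_residuals(x, y):
    """Residuals from a one-regressor OLS fit with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return [yi - a - b * xi for xi, yi in zip(x, y)]

def normal_loglik(resid):
    """Maximized normal log-likelihood, using the ML variance estimate SSR/n."""
    n = len(resid)
    sigma2 = sum(e * e for e in resid) / n
    return -0.5 * n * (math.log(2 * math.pi) + math.log(sigma2) + 1)

# Hypothetical data generated from a constant-elasticity (log-log) demand curve.
rng = random.Random(2)
price = [rng.uniform(5, 15) for _ in range(300)]
qty = [math.exp(3 - 0.9 * math.log(pr) + rng.gauss(0, 0.2)) for pr in price]

ll_levels = normal_loglik(ols_residuals(price, qty))
log_p = [math.log(v) for v in price]
log_q = [math.log(v) for v in qty]
ll_loglog_raw = normal_loglik(ols_residuals(log_p, log_q))  # likelihood of ln(q)

# Jacobian adjustment: f(q) = f(ln q) / q, so subtracting sum(ln q) expresses
# the log-log likelihood in terms of q itself, comparable with ll_levels.
ll_loglog = ll_loglog_raw - sum(log_q)
print(round(ll_levels, 3), round(ll_loglog, 3))
```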
g.) Sometimes, using a log-log model will eliminate a heteroscedasticity problem. Is this the case here? Explain. Mention the circumstances under which a logarithmic transformation of the dependent variable will perfectly remedy a heteroscedasticity problem.
Residuals analysis following the log-log model in Regression A10 is conducted in Regression A11. The squared errors from the log-log model (note that e is redefined in Regression A10) are statistically significantly related to the magnitude of the log of price. Thus, for these data, the log transformation does not succeed in eliminating the heteroscedasticity that plagues the levels model. Logging the dependent variable can sometimes eliminate heteroscedasticity, but only if the heteroscedasticity is of exactly the kind that logging removes. The errors must enter multiplicatively, as in $y_i = f(x_i)\,e^{u_i}$ with homoscedastic $u_i$, so that the standard deviation of $y_i$ is proportional to its conditional mean. If the heteroscedasticity is not of this kind, logging can fail to remedy the problem, or even make it worse.
2. The following questions pertain to EXHIBIT B. These are real data, and we will explore a preliminary model to explain the observed monthly time-series variation in new construction of public buildings for education. The variables read by the program are defined as follows:
There is also a variable called YROB, which is a CITIBASE annual data observation indicator running from 1964-01 to 1995-01. The annual population data are "spread" across all twelve months in the relevant year, since no more-frequent data on populations are available.
Yes, the coefficient on the time trend variable T is positive and appears to be hugely statistically significant (although we'll have more on this later). The point estimate suggests that public school expenditure has been growing, on average, by $4.4 million per month over the time period from January 1964 through December 1995.
b.) According to Regression B1, does new construction of public school buildings depend upon the numbers of kids of different ages in the population? Explain.
New public school construction appears to depend positively on the sizes of the first three cohorts (P1, P2, and P3), but not on the size of the oldest cohort of school-aged children (P4). Changes in the size of the P2 group appear to have the biggest influence on new public school construction. Again, we will see in a minute that we cannot really trust the t-ratios, although in this model they seem pretty "healthy."
c.) Is there multicollinearity among the regressors in Regression B1? Is it causing any problems of inference concerning the parameters in this model? Explain.
There is considerable multicollinearity among the sizes of the four different age cohorts of school-aged children. One would expect this to compromise the individual statistical significance of the slope coefficients on these variables. The strong linear relationship between P4 and the others might indeed explain the statistical insignificance of its coefficient in the regression. (We will argue in a minute that time-series hypothesis tests are always suspect until we have assessed the error properties. Thus, while it looks as though this multicollinearity has not ruined our ability to detect statistically-different-from-zero slope coefficients on three of the cohort-size variables, we cannot be sure.) In any event, multicollinearity makes the parameter standard errors larger, so we will fail to reject relatively more hypotheses. That is, our parameter estimates offer less resolution than might have been possible with uncorrelated regressors. Unfortunately, there is no option here of going back for a different sample: history is history. There might be some hope in future data, say over the next 20 years.
d.) Based on the output following Regression B1 and on the results of Regression B2, what do you suspect might be wrong with the results of Regression B1? Why?
Inconveniently, whoever ran these regressions for you did not ask for DWPVALUE (the exact Durbin-Watson test statistic and p-value). All you get with this output is the Durbin-Watson statistic (0.2884) and a point estimate of the AR(1) error correlation (rho = 0.85648). These look suspicious even though you do not have d-tables in which to look up the critical values. A d statistic far below 2, together with a rho near 1, smells like positively serially correlated errors of at least first order, and possibly higher order, since these are monthly data. Fortunately, Regression B2 provides fairly strong evidence of systematic relationships between the errors associated with time periods at particular intervals of spacing. Positive serial correlation in the errors generally means that the standard errors on the parameter estimates are understated, so the t-ratios are overstated and the p-values are too small. You end up rejecting many hypotheses that you really cannot reject. The point estimates are still unbiased, but your inferences are essentially worthless until you have corrected the problem.
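A back-of-the-envelope check relates the two reported numbers: the Durbin-Watson statistic is approximately $2(1 - \hat\rho)$. The helper names below are our own. This is a sketch of the textbook formulas, not SHAZAM output.

```python
def durbin_watson(e):
    """Durbin-Watson statistic: sum of squared first differences over the sum of squares."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(v * v for v in e)

def rho_from_dw(d):
    """Large-sample approximation d ~ 2(1 - rho), solved for rho."""
    return 1 - d / 2

print(rho_from_dw(0.2884))  # about 0.856
```

With d = 0.2884, the approximation gives a rho of about 0.856, very close to the reported 0.85648. Both are far from the no-autocorrelation benchmark of d = 2 (rho = 0).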
e.) Why does Regression B2 involve so many explanatory variables? Are we concerned that there might be multicollinearity among these regressors? Explain.
Since we have monthly data and many processes follow a regular annual cycle, we can expect a priori that twelfth-order autoregressive errors may be relevant. Thus, we want to explore at least the relationship between the current error and each of the first twelve lags of the error term. This is an interesting side issue, intended to show whether you understand serial correlation patterns in errors and can tie them to multicollinearity. Suppose the errors are serially correlated and you regress the current residual on twelve different lags of the same variable. Then each lagged residual $e_{t-j}$ will be correlated with its neighbors, such as $e_{t-j-1}$. If rho is large, there can be very substantial multicollinearity among these lags, and it can become difficult to discern the individual coefficients in this regression. Despite this problem, the first, eighth, and eleventh lags appear to be statistically significant here. Note that there would be no way to "fix" this multicollinearity, since all of the "variables" are just different past observations on the same variable.
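The multicollinearity point can be checked by simulation. This Python sketch assumes, hypothetically, pure AR(1) errors with rho = 0.85. It builds the twelve lag columns used in a regression like Regression B2 and measures how correlated neighboring lag columns are.

```python
import random

def corr(a, b):
    """Sample correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Hypothetical AR(1) errors with rho = 0.85, close to the value reported after Regression B1.
rng = random.Random(3)
rho = 0.85
u = [0.0]
for _ in range(5000):
    u.append(rho * u[-1] + rng.gauss(0, 1))

# Columns e_{t-1}, ..., e_{t-12}, each aligned with the current error u[max_lag:].
max_lag = 12
T = len(u)
lags = {j: u[max_lag - j:T - j] for j in range(1, max_lag + 1)}

print(round(corr(lags[1], lags[2]), 2))   # adjacent lags: near rho
print(round(corr(lags[1], lags[12]), 2))  # distant lags: near rho ** 11
```

With AR(1) errors, the correlation between lag columns j and k is roughly rho^|j-k|. Adjacent lags are therefore nearly collinear, while distant lags are much less so.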
f.) Explain succinctly the main tasks that are performed "behind the scenes" by SHAZAM when the AUTO command is used.
Consider an AR(1) model, where the presumption is that $u_t = \rho u_{t-1} + e_t$.

SHAZAM comes up with an initial estimate of the correlation between current and lagged error terms and uses it to transform the regression equation. Each variable (including the constant term) is replaced by its generalized difference: the current-period value minus rho times the value lagged one period. SHAZAM then runs a regression on these generalized differences. The error term in this regression is "fine" because the current error minus rho times the lagged error is just $e_t$, which is pure noise and therefore satisfies the OLS assumptions.

The "iteration" part: next, SHAZAM takes the point estimates from this generalized-difference regression and applies them to the raw X data to compute a fitted Y value for each observation. The difference between the actual and fitted Y is a new estimate of $u_t$. The correlation between this new $u_t$ and its lagged value gives a revised estimate of rho, which is used again to create the generalized differences. These iterations continue until some convergence criterion is met (either a stable, largest-achievable maximized log-likelihood or a stable, smallest-achievable sum of squared errors).

The final "fine-tuned" parameter estimates (intercept, slopes, rho value(s), and, of course, sigma-squared) can then be reported along with their asymptotic standard errors and associated asymptotic t-ratios. These allow us to test statistically the null hypotheses that individual rho parameters are zero.
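Here is a minimal Python sketch of the iteration just described, for a model with one regressor. It implements the Cochrane-Orcutt variant, which drops the first observation. SHAZAM's AUTO command may use a different variant and convergence rule by default, so treat this as an illustration of the idea, not a replica. The data are simulated with hypothetical parameters.

```python
import random

def ols(x, y):
    """One-regressor OLS with an intercept; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def cochrane_orcutt(x, y, tol=1e-8, max_iter=100):
    """Iterated estimation of y_t = a + b*x_t + u_t with u_t = rho*u_{t-1} + e_t."""
    a, b = ols(x, y)  # start from the naive OLS estimates
    rho = 0.0
    for _ in range(max_iter):
        # u_t: current (a, b) applied to the raw data.
        u = [yi - a - b * xi for xi, yi in zip(x, y)]
        new_rho = sum(u[t] * u[t - 1] for t in range(1, len(u))) / sum(v * v for v in u[:-1])
        # Generalized differences z_t - rho*z_{t-1}; the first observation is dropped.
        xs = [x[t] - new_rho * x[t - 1] for t in range(1, len(x))]
        ys = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        a_star, b = ols(xs, ys)
        a = a_star / (1 - new_rho)  # the transformed constant equals a*(1 - rho)
        if abs(new_rho - rho) < tol:  # stop once rho has stabilized
            break
        rho = new_rho
    return a, b, new_rho

# Hypothetical trending regressor with AR(1) errors (true rho = 0.8, slope = 0.5).
rng = random.Random(4)
n = 400
x = [t / 10 + rng.gauss(0, 1) for t in range(n)]
u = [0.0]
for _ in range(n - 1):
    u.append(0.8 * u[-1] + rng.gauss(0, 1))
y = [2.0 + 0.5 * xi + ui for xi, ui in zip(x, u)]
a_hat, b_hat, rho_hat = cochrane_orcutt(x, y)
print(round(a_hat, 3), round(b_hat, 3), round(rho_hat, 3))
```

The intercept is recovered by dividing the transformed constant by (1 - rho), because the generalized difference of a column of ones is a column of (1 - rho).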
g.) Consider the revised results in Regression B3: (i.) Does public school construction depend on demographics? Explain. (ii.) Does public school construction activity seem to anticipate future enrollments, or simply respond to current enrollments? Explain.
Conveniently, an F-test is provided for the hypothesis that the coefficients on the P1 through P4 variables are all simultaneously zero. This hypothesis is soundly rejected, even though ONLY P2 is now individually statistically significant. The P2 coefficient alone could account for this rejection, however. The more interesting test would have been a joint test of the coefficients that appear individually insignificant: P1, P3, and P4. You would certainly want to run such a test. If it weren't for the multicollinearity issue, we would probably conclude that public school construction starts only when there has been a surge in kindergarten-through-fourth-grade-aged children. If preschool populations (P1) had an individually statistically significant effect on new school construction, you could say that construction anticipates enrollments. The insignificance of this coefficient could still be due to multicollinearity, however, so we cannot be entirely certain about this conclusion.
h.) Regression B4 explores a more-general specification for the public school construction model. According to this model, does this new construction change systematically over time? Does it change systematically in response to populations of children in different age groups? Explain each answer carefully.
Regression B4 includes interaction terms between the time trend variable and each of the four population variables. This means that the derivative of new construction with respect to time is no longer constant and simply equal to the coefficient on the time variable. Likewise, the derivatives of new construction with respect to the sizes of each of the four relevant subpopulations now depend on time, rather than simply being constants. The coefficient on the linear term in T is no longer individually statistically significant. However, for the change in construction for a one-unit change in time to be zero, the coefficients on t, tp1, tp2, tp3, and tp4 would all have to be simultaneously zero. If we write out the relevant part of the main regression model, it is:
$$b_1 + b_2 T_i + \dots + b_7 T_i P_{1i} + b_8 T_i P_{2i} + b_9 T_i P_{3i} + b_{10} T_i P_{4i} + \dots + e_i$$
If we write out the time derivative of the estimated function, we get:
$$b_2 + b_7 P_{1i} + b_8 P_{2i} + b_9 P_{3i} + b_{10} P_{4i}$$
Individually, tp2 has a statistically significant coefficient. But we are provided with just the F-test we need: if the coefficients on t, tp1, tp2, tp3, and tp4 are all simultaneously zero, the time derivative is zero. Apparently, it is not.
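For reference, the textbook F statistic for a set of q zero restrictions compares the sums of squared residuals with and without the restrictions. The SSR values and k = 24 below are purely hypothetical. The value n = 384 is the number of months from January 1964 through December 1995. We cannot confirm from the exhibit exactly how SHAZAM computed the statistic it reports for Regression B4.

```python
def f_stat(ssr_restricted, ssr_unrestricted, q, n, k):
    """Textbook F statistic for q linear restrictions.

    ssr_restricted: SSR with the restrictions imposed (t, tp1, ..., tp4 dropped)
    ssr_unrestricted: SSR of the full model, which estimates k parameters
    """
    return ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / (n - k))

# Hypothetical SSR values and k; n = 384 months (January 1964 to December 1995).
f = f_stat(ssr_restricted=120.0, ssr_unrestricted=100.0, q=5, n=384, k=24)
print(round(f, 2))
```

Here q = 5 because the restriction sets the coefficients on t, tp1, tp2, tp3, and tp4 to zero.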
i.) Is there a "typical" seasonal pattern in public school construction expenditures? Characterize this pattern. Does it conform with your intuition?
The omitted month is January, so the basic intercept term is the intercept for January. The coefficients on the other monthly dummies tell how much expected new public school construction in that month differs from its expected level in January. For example, expected new public school construction in August is higher than in January by about $287 million. In February, it is lower than in January by $15.9 million, although this difference is not statistically significantly different from zero. What is the overall pattern? Public school construction is highest in the summer months, when the weather is good and most children are out of school. It is lowest in the winter months, when the weather is bad and attempts to build new structures at existing campuses would disrupt classes.
3. (10 points) Non-experimental data can sometimes make it very difficult to draw policy implications from regression analysis. Choose (a.) OR (b.)
a.) GUN CONTROL: Suppose your sample consists of households that have been victimized by robbery. The dependent variable takes a value of 1 if a household member is shot during the robbery and 0 otherwise. One of your explanatory variables is a dummy variable equal to 1 if there is a handgun present in the house, 0 otherwise. When a handgun is present in a household, an occupant of that house is much more likely to be shot in the process of a robbery than when no handgun is present. Therefore, to minimize injury and loss of life from robbery incidents, private ownership of handguns should be banned. Evaluate this policy proposal and the "evidence" upon which it is premised. Briefly describe the nature of the true "experiment" that would allow an unambiguous determination of the effect of handgun presence on robbery shootings via a regression like this.
Households get to choose whether or not to own a handgun. The reasons a household might choose to have a handgun might include fear of robbery by violent criminals who might also be armed. If handgun ownership is greater where the odds of violent robberies are higher, then it could even be the case that the presence of a handgun has no bearing whatsoever on whether a household member is shot in a robbery. It might just be an indicator for a more dangerous neighborhood, or for a more ostentatious home with lots of goodies that look ripe for robbery. It might also be an indicator for a more belligerent householder who is more likely to resist or attempt to attack a robber. If the model fails to control for all these other factors, it could look as though the mere presence of a gun leads to more homeowner shootings in robberies.
The "experiment" that would be necessary to discern the effects of handgun presence on homeowner shootings in robberies might be something like the following. Randomly give handguns to some households and ensure that other households do not have them. (Would the NRA let you do that?) After some suitable period of time, identify the households from this population that have been robbed. Compare the frequency of homeowner shootings in the robbed group that had handguns with the frequency in the robbed group that did not have handguns. The presence or absence of a handgun in the household would have been completely random (exogenous) and independent of anything unobservable about the household. The difference in homeowner shooting rates between these two groups, if statistically significant, would therefore tell you whether the apparent effect in the non-experimental data was real.
In the absence of an opportunity to conduct such an experiment (which would seem to be the case), a researcher could attempt first to model handgun holdings in terms of strictly exogenous variables, at least one of which does not itself belong in the shooting equation. The researcher could then use "two-stage" types of methods to purge the endogenous handgun-ownership variable of any correlation with the error term in the main model. Note that the main model in this case is probably going to be a probit or logit type model, but that is only a minor variation on the usual intuition.
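The confounding story above can be illustrated with a small simulation. In this Python sketch, which is entirely hypothetical, each draw is a robbed household. An unobserved "dangerous setting" raises both the chance of owning a gun and the chance of a shooting, while the gun itself has no causal effect at all. Comparing observational and randomly assigned gun ownership shows why only the experiment recovers the true (zero) effect.

```python
import random

rng = random.Random(5)

def shooting_gap(randomize, n=100_000):
    """Shooting rate with a gun minus without, over n simulated robbed households.

    By construction, gun presence has NO causal effect on shootings; only a
    hypothetical, unobserved 'dangerous setting' indicator does.
    """
    shot_with = n_with = shot_without = n_without = 0
    for _ in range(n):
        danger = rng.random() < 0.3
        if randomize:
            gun = rng.random() < 0.5  # experimenter assigns handguns by coin flip
        else:
            gun = rng.random() < (0.7 if danger else 0.2)  # households choose
        shot = rng.random() < (0.10 if danger else 0.02)
        if gun:
            n_with += 1
            shot_with += shot
        else:
            n_without += 1
            shot_without += shot
    return shot_with / n_with - shot_without / n_without

observational_gap = shooting_gap(randomize=False)  # looks as if guns "cause" shootings
experimental_gap = shooting_gap(randomize=True)    # close to the true effect of zero
print(round(observational_gap, 4), round(experimental_gap, 4))
```

In the observational version, gun owners are disproportionately drawn from dangerous settings, so their shooting rate is higher even though guns change nothing.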
b.) LEGALIZATION OF MARIJUANA: Suppose you have a
random sample of at-risk 18-year-olds. The dependent variable is the number of
times each teenager has used heroin. Among the explanatory variables is a dummy
variable that takes a value of 1 if the subject experimented with marijuana prior
to age 13, and 0 otherwise. You find that the coefficient on this dummy variable
is positive and strongly statistically significant. Therefore, we should not
legalize marijuana use (which would make it much more accessible to pre-teens)
since this will lead to widespread use of heroin. Evaluate this policy proposal
and the "evidence" upon which it is premised. Briefly describe the nature of the
true "experiment" that would allow an unambiguous determination of the effect of
pre-teen marijuana use on subsequent heroin use via a regression like this.
The same individuals who are making choices that lead to
heroin experimentation are making the earlier choices about marijuana
experimentation. There may be innate individual tendencies to seek and use mood-
or mind-altering substances. Perhaps one could call this an "addictive
personality." Or perhaps there are hereditary or social factors that predispose
certain youngsters to illegal drug use (a neighborhood or school drug culture, for
example). Pre-teen experimentation with marijuana may have no effect whatsoever
on the odds of later heroin use (other kids might have started with their parents'
liquor cabinet). But if pre-teen marijuana experimentation is an indicator for a
host of conditions that combine to lead teens to try heroin, then it could
certainly look like the marijuana use is "causing" the later heroin use.
The "experiment" that would be necessary to judge causality might be as
follows. Take a sample of early pre-teen at-risk children who have not yet tried
marijuana. Randomly assign them to two groups and make one group use marijuana
and ensure that nobody in the other (control) group does. (Sure would be hard to
get funding for that research!) Revisit the group when they turn 18 and compare
average heroin use rates in the two groups. If the rates are statistically
significantly different, then you will have demonstrated causality.
In the absence of any opportunity to conduct such a controlled experiment, the
researcher would have to work with non-experimental data. This would necessitate
constructing a model to explain pre-teen marijuana experimentation in terms of
solely exogenous variables, at least one of which must be excluded from the heroin-use equation. The fitted portion of this model could then be used
in the main model to explain later heroin use, ensuring that this revised
exogenous "pre-teen marijuana-use propensity" variable is uncorrelated with
unobserved components of the main model error term.
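The two-stage idea can be sketched in Python on simulated data. Everything here is hypothetical. The instrument z (some exogenous influence on early marijuana use that does not itself affect heroin use) is assumed to exist, both variables are treated as continuous, and by construction marijuana use has no causal effect on heroin use. A probit or logit first stage would change the details but not the logic.

```python
import random

def ols(x, y):
    """One-regressor OLS with an intercept; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

rng = random.Random(6)
n = 20_000
z = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical excluded instrument
a = [rng.gauss(0, 1) for _ in range(n)]  # unobserved propensity, never seen by the researcher
# Early marijuana use, treated as a continuous index for simplicity.
m = [0.8 * zi + ai + rng.gauss(0, 1) for zi, ai in zip(z, a)]
# Later heroin use depends on the propensity only: m has NO causal effect.
h = [2.0 * ai + rng.gauss(0, 1) for ai in a]

_, naive_slope = ols(m, h)          # biased upward by the shared propensity
g0, g1 = ols(z, m)                  # first stage: m on the instrument
m_hat = [g0 + g1 * zi for zi in z]  # fitted, "purged" version of m
_, two_stage_slope = ols(m_hat, h)  # second stage: h on fitted m
print(round(naive_slope, 3), round(two_stage_slope, 3))
```

The naive slope picks up the shared propensity, while the second-stage slope is close to zero. The second-stage OLS standard errors are not valid as printed; proper 2SLS routines correct them.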
This is a discrete-outcome, or dummy-dependent-variable, model. Thus, a probit or logit model is probably appropriate. Provided that after-school program eligibility is randomly assigned across school districts, independent of levels of gang activity, you might be able to make the desired assessment of the policy. If there is systematically greater (or lesser) access to after-school programs in areas where gangs are more active, you might have trouble with this "program evaluation."
Suppose the kid decides whether or not to belong to the gang. The kid also makes decisions about how much effort to put into school, which makes GPA a potentially endogenous variable. If gang membership influences household stability, absent parents may sometimes be a result, rather than a cause, of gang membership (or at least these two outcomes may be jointly determined by some of the same conditions in the neighborhood or local culture).
b.) Multicollinearity among the regressors can lead to problems in making
clear inferences about the effects of changes in individual explanatory variables
only in Ordinary Least Squares models. It is not a concern in fundamentally
nonlinear estimation methods such as probit or logit models. True, False,
Uncertain? Explain.
FALSE. OLS estimation methods are called linear
estimation methods (even if they are non-linear in the variables) because it is
possible to calculate the parameter point estimates as a solution to k equations
in k unknowns (there are k first-order conditions for the minimization of the sum
of squared errors function with respect to k unknown intercept and slope
parameters). Nonlinear models (like the probit and the logit) cannot be solved
this simply and it is necessary to use a search algorithm to find the best
parameter values (usually those which maximize the log-likelihood function for the
model). Almost all commonly used "regression-type" models, however, involve some
linear-in-parameters "index" of the explanatory variables. If any of the
explanatory variables are highly correlated, it will be difficult to identify the
separate slope coefficients on each of them.
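A small simulation supports the FALSE answer. This Python sketch fits a logit by Newton-Raphson, written out by hand to avoid library assumptions. It uses hypothetical data in which the two regressors have correlation 0 or 0.99, and it compares the standard error on the first slope coefficient.

```python
import math
import random

def fit_logit(x1, x2, y, iters=25):
    """Newton-Raphson MLE for P(y=1) = 1 / (1 + exp(-(b1*x1 + b2*x2))).

    Returns the estimates and their standard errors from the inverse
    information matrix (evaluated at the last iteration).
    """
    b1 = b2 = 0.0
    for _ in range(iters):
        g1 = g2 = h11 = h12 = h22 = 0.0
        for u, v, yi in zip(x1, x2, y):
            p = 1.0 / (1.0 + math.exp(-(b1 * u + b2 * v)))
            g1 += (yi - p) * u  # score (gradient of the log-likelihood)
            g2 += (yi - p) * v
            w = p * (1.0 - p)
            h11 += w * u * u    # information matrix entries
            h12 += w * u * v
            h22 += w * v * v
        det = h11 * h22 - h12 * h12
        b1 += (h22 * g1 - h12 * g2) / det  # Newton step: b += I^{-1} g
        b2 += (h11 * g2 - h12 * g1) / det
    return (b1, b2), (math.sqrt(h22 / det), math.sqrt(h11 / det))

def make_data(r, n=3000, seed=7):
    """Hypothetical data: corr(x1, x2) = r, true b1 = b2 = 0.5."""
    rng = random.Random(seed)
    x1, x2, y = [], [], []
    for _ in range(n):
        a = rng.gauss(0, 1)
        b = r * a + math.sqrt(1 - r * r) * rng.gauss(0, 1)
        p = 1.0 / (1.0 + math.exp(-(0.5 * a + 0.5 * b)))
        x1.append(a)
        x2.append(b)
        y.append(1 if rng.random() < p else 0)
    return x1, x2, y

_, se_low = fit_logit(*make_data(r=0.0))
_, se_high = fit_logit(*make_data(r=0.99))
print(se_low[0], se_high[0])
```

The standard error on x1 is several times larger when the regressors are highly correlated, just as it would be in OLS. This is because the logit index b1*x1 + b2*x2 is linear in the parameters.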
$$\mathrm{SERV}_i = 30.90 - 1.20\,\mathrm{TIME}_i + 9.30\,\mathrm{LEGAL}_i + 0.33\,\mathrm{TIME}_i \times \mathrm{LEGAL}_i$$
where $\mathrm{SERV}_i$ = value of social services utilized (in hundreds of dollars per year); and the parameter standard errors are given in parentheses, in the same order as the point estimates.
For a legal immigrant, the "LEGAL" variable takes on a value of 1, so all coefficients are relevant. Interpret the TIME variable as "number of PRIOR years in the US" (as per instructions during the exam). In the first year after arrival in the US, an immigrant would have a value of the TIME variable equal to zero. Therefore, fitted SERV will be just 30.90 + 9.30 = 40.20 (hundreds of dollars, i.e., $4,020 per year). For an undocumented immigrant, predicted SERV is just 30.90 ($3,090 per year), since the LEGAL dummy variable takes on a value of 0. Note, however, that the point estimate of 9.30 for the coefficient on LEGAL is less than twice its standard error (9.30/5.50 ≈ 1.69), so we cannot reject the hypothesis that there is no difference between the two groups.
b.) Based on the point estimates, how does utilization of social services vary with time in the US for a legal immigrant? _________________ For an undocumented immigrant? ____________________________
The derivative of SERV with respect to TIME is not a single constant, but is equal to (-1.20 + 0.33 LEGAL). Thus, for legal immigrants, SERV changes by -1.20 + 0.33 = -0.87 per year. Since SERV is measured in hundreds of dollars, social service utilization by legal immigrants who have been receiving services falls by about $87 per year. For undocumented immigrants, LEGAL = 0, so service utilization falls by 1.20 hundred dollars, or $120, per year. However, note that the point estimate of the coefficient on the interaction term, 0.33, is less than twice its standard error (0.33/0.20 = 1.65). We therefore cannot reject the hypothesis that there is no difference in the rates at which service utilization changes over time.
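The arithmetic in parts a.) and b.) can be checked with a few lines of Python, using the reported point estimates and standard errors. The dictionary keys are our own labels. Multiplying by 100 converts hundreds of dollars to dollars.

```python
# Point estimates and standard errors as reported in the (fictitious) article.
coef = {"const": 30.90, "TIME": -1.20, "LEGAL": 9.30, "TIME*LEGAL": 0.33}
se = {"const": 5.2, "TIME": 0.31, "LEGAL": 5.50, "TIME*LEGAL": 0.20}

def predicted_serv(time, legal):
    """Fitted SERV, in hundreds of dollars per year."""
    return (coef["const"] + coef["TIME"] * time
            + coef["LEGAL"] * legal + coef["TIME*LEGAL"] * time * legal)

def serv_slope(legal):
    """dSERV/dTIME = -1.20 + 0.33*LEGAL, in hundreds of dollars per year in the US."""
    return coef["TIME"] + coef["TIME*LEGAL"] * legal

t_ratio = {name: coef[name] / se[name] for name in coef}

for legal in (1, 0):
    # Dollars per year at TIME = 0, and the dollar change per additional year.
    print(legal, round(100 * predicted_serv(0, legal), 2), round(100 * serv_slope(legal), 2))
```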
c.) Overall, does legal/undocumented status have a statistically significant effect on utilization of social services? Explain.
This is a question about the derivative of SERV with respect to LEGAL, which shows up in two places in the estimated model. The formula for this derivative is 9.30 + 0.33 TIME. The only way for there to be NO effect of status on utilization would be if both the 9.30 and 0.33 coefficients were actually zero. As already noted, we cannot individually reject the zero hypothesis for either of these parameters. The relevant test, however, would be an F-test of whether they could be jointly zero. We are not given enough information to perform this test: we would need either the restricted sum of squared errors or the covariance between the two coefficient estimates. The question therefore cannot be answered with the available data. Of course, for the F-test to reject the joint hypothesis when the individual t-tests fail to reject the separate hypotheses, there would have to be some correlation between the two coefficient estimates, which arises from correlation between the two variables in question. Since one variable is LEGAL and the other is TIME*LEGAL, this is certainly a possibility.
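To see why the unreported correlation matters, here is a sketch of the large-sample version of the joint test (Wald, chi-square with 2 degrees of freedom). It is written in terms of the two reported t-ratios and the correlation r between the two coefficient estimates. The values of r are hypothetical, since the article does not report them. With n = 676, the F statistic is approximately W/2.

```python
import math

t_legal = 9.30 / 5.50  # t-ratio on LEGAL
t_inter = 0.33 / 0.20  # t-ratio on TIME*LEGAL

def wald_two(t1, t2, r):
    """Wald statistic for H0: both coefficients are zero.

    r is the correlation between the two coefficient ESTIMATES, which the
    article does not report; the values tried below are hypothetical.
    """
    return (t1 ** 2 + t2 ** 2 - 2 * r * t1 * t2) / (1 - r ** 2)

for r in (-0.5, 0.0, 0.5):
    w = wald_two(t_legal, t_inter, r)
    # For a chi-square with 2 degrees of freedom the p-value is exactly exp(-w/2).
    print(f"r={r:+.1f}  W={w:.2f}  F~W/2={w / 2:.2f}  p={math.exp(-w / 2):.4f}")
```

With r = 0, the joint test does not reject at the 5% level (the critical value is about 5.99). With r = -0.5 it rejects, and with r = 0.5 it is further from rejecting.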
d.) Does this model predict that legal immigrants will always utilize more social services than undocumented immigrants (or vice versa)? If not, how does the predicted utilization differential (legal minus undocumented) change with time in the US? When will predicted utilization be the same for both groups? Comment.
If we were to plot SERV against TIME for each of the two groups in the sample (the LEGAL = 1 and LEGAL = 0 groups), we would see that the intercept is higher for the LEGAL = 1 group and its slope is less negative. This means that the undocumented immigrants' utilization profile starts lower and drops more quickly. Thus the model says (within the range of the data only) that legal immigrants will always use more services. Utilization falls for both groups, but the utilization differential, 9.30 + 0.33 TIME, widens over time. The two fitted lines would cross only at TIME = -9.30/0.33 ≈ -28.2 years, an impossible value. Predicted utilization is therefore never the same for the two groups anywhere within the relevant range of the data.
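The utilization differential and the crossing point can be computed directly from the two relevant coefficients. This Python sketch uses only the reported point estimates.

```python
b_legal, b_inter = 9.30, 0.33  # coefficients on LEGAL and TIME*LEGAL

def differential(time):
    """Predicted SERV(legal) minus SERV(undocumented), in hundreds of dollars per year."""
    return b_legal + b_inter * time

crossing_time = -b_legal / b_inter  # where the two fitted lines would meet

for time in range(10):  # the sample covers fewer than 10 years in the US
    print(time, round(differential(time), 2))
print(round(crossing_time, 2))
```

Within the sample range, the differential grows from 9.30 (hundreds of dollars) at TIME = 0 to 12.27 at TIME = 9.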
6. Suppose you are working with individual
household survey data. If you do not have data at the individual household level
for one of your explanatory variables, you might be able to use group averages as
a proxy for this variable (e.g. 5-digit zip code median household income instead
of individual household incomes for a nation-wide sample). To the extent that the
groups you use are relatively homogeneous, the proxies may be very useful in
mitigating what would otherwise be omitted variables bias. The same strategy is
appropriate if you do not have any individual data for your desired dependent
variable. True, False, Uncertain? Explain, suggesting the best alternative if
you disagree.
Everything is fine here until you get to the part that asserts that you can do the same thing if you do not have any individual data for the desired dependent variable. The key issue here is the "unit of observation." It is possible to "spread" more-aggregated data across observations on the right-hand side of a regression model. (In fact, we did that with the annual data on population cohorts in question 2 above!) However, it is considered poor form to have right-hand-side variables at a lower level of aggregation than the dependent variable. For instance, we would not get very far if we had county-level unemployment rates as a dependent variable but individual data on people's education levels, genders, and ethnicities on the right-hand side. In a sense, there would be more than one observation on each X variable for each available value of the Y variable. Faced with this sort of situation, researchers generally resign themselves to doing the whole analysis at the higher level of aggregation. For example, they switch to county proportions of people at each of several education levels, county proportions of each of several ethnic groups, and so on. We lose the individual detail (and the proportion female, for example, will almost always be about 0.50), so it is often a shame that we do not have disaggregated data for the dependent variable (such as whether or not each individual has a job).
BONUS: (5 points) If you estimate a regression model and get a counter-intuitive sign on a slope coefficient, what sort of problem(s) do you initially suspect? Explain.
First guess: omitted variables bias. Second guess:
endogeneity bias. "Wrong" signs are generally signs that are biased so much by
one of the potential reasons for bias that the "true" sign is reversed. We saw an
example of this in the study.sha program in one of the
early labs. When we failed to control for GPA, it looked like more studying might
actually decrease your expected midterm grade, implying a policy recommendation
that one should study zero hours in order to maximize their grade.
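The study-time example can be mimicked with simulated data. This is a hypothetical Python sketch, not the study.sha data. Each study hour truly raises the midterm grade, but weaker students (lower GPA) study more, so omitting GPA flips the sign of the study-hours coefficient.

```python
import random

def slope(x, y):
    """Slope from a one-regressor OLS fit with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

def two_regressor_ols(x1, x2, y):
    """Slopes from OLS of y on x1, x2 and an intercept (normal equations on demeaned data)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(8)
n = 5000
gpa = [rng.gauss(3.0, 0.5) for _ in range(n)]
# Hypothetical assumption: weaker students (lower GPA) study more hours.
hours = [10 - 2 * g + rng.gauss(0, 1) for g in gpa]
# True model: each study hour RAISES the grade by 1.5 points, holding GPA fixed.
grade = [40 + 1.5 * h + 10 * g + rng.gauss(0, 3) for h, g in zip(hours, gpa)]

short_slope = slope(hours, grade)                            # GPA omitted
long_hours, long_gpa = two_regressor_ols(hours, gpa, grade)  # GPA controlled for
print(round(short_slope, 2), round(long_hours, 2))
```

The short-regression slope should be close to the omitted-variables-bias formula: 1.5 + 10 × cov(hours, GPA)/var(hours) = 1.5 + 10 × (-0.5/2) = -1.0.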
4. Assume your dependent variable takes on a value of 1
if a high-school student is affiliated with a gang and zero otherwise. Among your
explanatory variables are included: family income level, GPA in school, dummy
variables for father present in household and mother present in household,
eligibility for after-school programs, educational attainment of each parent, etc.
5. Suppose you are reading an article concerning
the effects of immigration status on utilization levels of social services among
legal and undocumented immigrants who have been in the US for less than 10 years
and who have been receiving social services. You encounter the following
estimated model. (Note that the sample producing these results is
fictitious.)
Standard errors, in the order intercept, TIME, LEGAL, TIME*LEGAL: (5.2) (0.31) (5.50) (0.20)
$\mathrm{TIME}_i$ = time spent in the US (in years); $\mathrm{LEGAL}_i$ = 1 if legal immigrant, = 0 if undocumented; $i = 1, \dots, 676$.
a.) Based on the point estimates, what is the average utilization of social services for a legal immigrant in the first year after arrival in the US? ______________ For an undocumented immigrant in the first year? ___________________________