PARTIAL INVENTORY OF PROPOSAL BLOOPERS
When I read a set of research proposals, I usually find a
large number of mistakes. Some of these are very
serious, some consist of assumptions that most researchers would
consider too restrictive for initial specifications, and others are
merely semantic, but all of them contribute to the impression that the
writer is not yet really an expert at empirical research. To give you
the most opportunities to learn from the mistakes of others (as well
as your own mistakes), I have put together this file of "bloopers."
Many of these are lifted verbatim from people's proposals; others are
somewhat paraphrased to focus on the problem. Remember that non-
native-English speakers confront greater challenges in exposition than
do other students. Sometimes I have cleaned up the syntax enough to
make it comprehensible. Other times, I have left it as is. Technical
writing in applied econometric work requires not only fluency in
English, but also fluency in the quasi-language we might call
"economese."
- I would expect the sex parameter to be positive if female since
companies are trying to raise their recruitment and retention rates for
women.
- I would expect this parameter to be positive since the model is
trying to prove the importance of job experience.
- Analyzing the output, I would be able to determine if any variables
should be eliminate due to multicollinearity.
- Does commuting to school a distance of twenty miles or greater have
an effect on a college student's academic performance?
- For the sake of simplicity, all variables have only two values:
high/strong=1 and low/weak=1.
- Communications skills and leadership are similarly shown by
extracurricular (sic) and work experience. Thus these variables need
not be included because they are assumed in the variables included in
the model.
- The dependent variable is the total annual sales of Y in the US.
...To estimate the total sales of Y, we can first find the average
sales of one city and then multiply by the number of cities in the US.
- The dummy variable, gender of singers, should be tested to see if
there is sexism involved [in determination of singers' wages].
- The intercept is expected to be positive.
- The next variable will be to assign a value or a null set to a city
in the county that has faced some sort of disaster.
- It will demonstrate that the most significant factors that come
into play regarding {topic} is Demographics, Party Affiliation,
District Location and the Election Schedule.
- [Our] dependent variable, RATE, is measured in ratio of graduating
students and non-graduating students.
- As for the size of the university we will use small for colleges
with less than 10,000 students, medium..., and large.... These
standards are given values of 0, 1, 2, respectively.
- For location, it will be divided through four regions: South,
West, East and Midwest. They will also be given values 0, 1, 2, 3,
respectively.
- High absolute value of [coefficient] b1 will show that WORK and
RATE is closely correlated.
- It is likely that our explanatory variables, INC and WORK, may be
directly related. This will impose multicollinearity problem in our
model.
- [In a proposal:] When looking at such regression model, we could
also expect that the R is relatively high, which shows the strong
relationship between the variables and wages.
- The t-ratio will also give a clear indication of the significance
of each variable.
- Specification: Y = X1 + X2 + X3 + X4 + e
- A 2.5-page paragraph.
- The price of X does not vary by much...therefore, it would be
useless to include these in our model because they would only
constitute as a shift in the slope intercept and not explain the
partial effect of our explanatory on our dependent variable.
- One concern in any model is correlation of variables.
- Years of driving experience may be correlated with an increased
amount of tickets. A remedy to this problem would be to consider the
proportional amount of speeding tickets relative experience on the road
by utilizing a logarithmic model.
- Time of departure as an explanatory variable in a linear-in-
variables model to explain commute duration.
- Let PVOTE=percentage of voting population who voted (dependent
variable) and let E1 =1 if completed high school education, 0
otherwise; E2=1 if completed undergrad education, 0 otherwise, etc.
- Dependent variable will be the sale of a [video] store. The two
explanatory variables will be the number of new movies released denoted
by N, and the number of new memberships denoted by M.
- There may be more explanatory variables...but since it is hard to
gather numerical data [on] those variables, we don't want to include
them in the model.
- Earnings= b0 + b1*education+... I can explore B1 by holding other
factors constants and use the multiple dummies as following: 0= did
not finish high school, 1 = high school graduate, 2=some college,
3=complete college, 4=post-graduate education.
- To explain the [race] effects, we have to consider the "dummy
explanatory variable" by holding the same level of education and work
experience, as following: Let 0 = Hispanic Females, 1 = Black Females,
2 = White Females, 3 = Hispanic Males, 4 = Black Males, 5 = White
Males.
- Once the model is calibrated with the actual data, it can be used
to actually predict quantity demanded for electric vehicles.
- Dependent variable: donations to charity in dollars. The data that
will needed to run the proposed regression can be obtained from
charitable foundations...
- Dependent variable: dollars spent on car insurance. The most
efficient and easiest way to obtain data for this regression analysis
is to include a questionnaire in the consumer's insurance billing
statement.
- Both B2 and B7 are dummy variables in this model. [B2 is shown as
coefficient on a variable called HOUSE_LOCi] B2 equals one if the
consumer lives in an urban environment. It is expected that a consumer
who lives in the city will spend more dollars for insurance because of
the higher crime rate in the city. With this in mind, the coefficient
B2 is omitted, that is, will be zero, when the consumer lives in the
suburbs.
- Model: NUND=B1 - MGDP + MUMP + MCIP + USGDP - USUMP + PEDER
- The Variance Inflation Factor should be examined to determine if
there is any multicollinearity beween the variables.
- Explanatory variables include SEAS = seasonality.
- In the case of .... the coefficient is expected to be more or less
constant [since] the exchange rate has remained more or less constant.
- One remedy for in case there is multiollinearity among the
variables is to get additional data since the source of the problem
could be a sample too small.
- Dependent variable: HOURS spent studying per week. Explanatory
variables include: CURRGPA = 1 if current grade point average of the
student is above 3.0, 0 otherwise. WORK = 1 if student works more than
10 hours per week; 0 otherwise.
- If females study a considerable amount more and their GPAs still
fall below 3.0, it could suggest that retention programs should be
implemented for females....
- Dependent variable: quantity of cigarettes demanded in millions
per month; Explanatory variables include: price of cigarettes, federal
fund for smoker cessation programs, yearly income per person D2 = 0 if
income <15,000, 1 otherwise; education D3 = 0 if non-highschool
graduates, 1 otherwise; age D4 = 0 below 18 yrs old, 1 otherwise.
- Model: GPA = b0 AID + b1 WORK + b2 FINC + b3 STUDY
- The effects of rap music on violent behavior... Does this type of
music with its violent and/or obscene content have any correlation with
the number of violent crimes committed?
- This factor is negatively sloped.
- The data for industrialization can be measured as a weighted sum of
the increase in production from the factories and manufacturing plants.
- However, we cannot let the regression decide the duration of the
rumor phase for us.
- Model: WEIGHT= b1 + b2(FASTFOOD) + e. [this equation] shows an
analysis of the affect of FASTFOOD on WEIGHT in the population,
controlling for other variables. WEIGHT= b1 + b2(TECHNO) + e. [this
equation] examines the affect of TECHNO (technology) on WEIGHT, holding
everything else constant.
- The dependent variable in this model is the mean S.A.T. scores for
the sample population. We will call this variable "satscore." For
this model we need really only one more variable which we will call
"income." This variable denotes the average income per family in this
random data set.
- Specifically, an observation could best be handled in this research
by choosing one student from one income level and comparing his or her
S.A.T. score with a student of differing family income.
- [One model that would be interesting would be] satscore=B[1] + B[2]
income + e. Another model that we would be interested in exploring is
given by the equation: satscore = B[1] - B[2] income + e.
- One way to test this hypothesis is to use the 'null hypothesis
test.' In this test, we set H[0] equal to 0.
- We can use this model [satscore= B[1] + B[2] income + e] to
forecast future S.A.T. scores for individuals from each level of family
income, holding everything else constant.
- Dependent variable: BOOKS. Independent variables include ONE= on
average, # paperbacks bought per month with under 100 pages, TWO= on
average, # paperbacks bought per month with 100-275 pages,
...THREE...FOUR... SIX = on average, # paperbacks bought per month with
over 600 pages, INC = monthly income, PRICE = price of each book
purchased and read by individual, CHILD= number of dependent children
living with individual....
- In addition, the explanatory variables would be tested against each
other to see whether there is any collinearity between each of them.
In those case, we will try to set up another model (e.g. qudratic form,
log-log form) for the test or we might use the dummy variable to get a
more accurate estimation.
- I want to be able to answer the expected GPA of a student with X
SAT score, W quality of high school and a top tenth percent income
bracket versus a student with similar characteristics except in the
bottom tenth percent income bracket. This way, we can determine the
extent to which individual factors effect GPA, and also determine how
much an increase in family income has an affect of quality of school,
and on SAT scores.
- School is a dummy variable because it is qualitative in nature.
- Another expectation of mine is that the coefficients β1, β2 and β3
will not be linear.
- The regressions we will run have the United States economy as its
base dependent variable. The United States trade deficit will also be
an important dependent variable. To analyze the correlations between
industries and specific effects, the health of various industries will
serve as dependent variables with other goods serving as the modifying
variables.
- MODEL: USGNPi = b0 + b1 FISHPROD + b2 CANDYPROD + b3 CARPROD + ei
- Since this study is intended to test the validity of tutorial or
retention programs based on family income, those who receive tutoring
will be excluded from the sample testing.
- The following list summarizes the explanatory variables that would
best explain the model:
- Evidently, β can be any real number, but within the same industry
it probably will not vary that much, that is VAR(B) will be small,
that's why it is extremely important to calculate it as accurate as
possible, avoiding rounding up. It also might be useful to change the
scale of measurement of β to, say β*e-03.
- Dependent variable: TV = television hours watched per week.
...The theory of demand will guide this model, so expect that to see
negative coefficients for the price of televisions,....
- The reason is because in the reality, there are more than we can
imagine of variables that will be use to find the dependent variable.
But since we cannot include everything in our model, we have to remain
any other variables constant.
- As I suspect, the coefficient for the races will be negative or
even constant. There shouldn't be any much correlation among them.
- The dependent variable will be who got jobs, with the explanatory
variables covering grades, internships, and involvement with extra-
curricular activities. ...The research is limited to the top 20
undergraduate universities because of the competitive nature in-which
those students desire the best jobs throughout the country.
- The dependent variable in this study is the dollar amount of chief
executives are compensated including stock option compensation. The
independent variable involved in this research include the various
performance indicators of a public company. Popular variable
indicators that will be used include: increase/decrease in annual
revenue; increase/decrease in annual income; increase/decrease in
Earnings per Share; company stock performance. .... The increase in
various variables, such as company stock price, naturally correlates to
the increase in the dependent variable.
- [Intercept] β0 depends on the independent variables.
- The admissions application allows for the following ethnic
categories: [lists 13 distinct categories]. The thirteen categories
are mutually exclusive and exhaustive, so one dummy variable is
sufficient for this characteristic. [proceeds to include 13 dummy
variables for race]
- The supposed model would want to infer that if a person was exposed
to nutrition labels and high number of advertising he/she would most
likely decrease their level of fat consumption. If it turns out that a
model of the counterfactual occurs it can be explained that people lied
in their surveys or simply a bad sample was chosen.
- [Cross-sectional data] The model can be used to predict the trend
of social inequality.
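Several items above assign the values 0, 1, 2, 3 to an unordered
category such as region, which imposes an ordering and equal spacing
that the categories do not have. A minimal sketch of the usual
alternative follows: one 0/1 indicator per category, with one category
left out as the baseline. The function name, the "D_" prefix, the
sample data, and the choice of South as the baseline are assumptions
made for illustration only; they do not come from any proposal.

```python
# Hypothetical illustration: all names and data here are invented.
REGIONS = ["South", "West", "East", "Midwest"]


def region_dummies(region, baseline="South"):
    """Return 0/1 indicators for every region except the baseline.

    Leaving one category out avoids perfect collinearity between the
    full set of indicators and the intercept term.
    """
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region!r}")
    return {f"D_{r}": int(region == r) for r in REGIONS if r != baseline}


# Three made-up observations, coded as indicator variables.
sample = ["South", "Midwest", "East"]
rows = [region_dummies(r) for r in sample]
for name, row in zip(sample, rows):
    print(name, row)
```

With this coding, each coefficient on an indicator is read as a
difference from the omitted baseline category, so four regions need
three indicators, not one variable taking four values and not four
indicators alongside an intercept.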
Updated: January 20, 1998
Prepared by: Trudy Ann Cameron