PARTIAL INVENTORY OF PROPOSAL BLOOPERS

When I read a set of research proposals, I usually find a large number of mistakes. Some of these are very serious, some consist of assumptions that most researchers would consider too restrictive for initial specifications, and others are merely semantic, but all of them contribute to the impression that the writer is not yet really an expert at empirical research. To give you the most opportunities to learn from the mistakes of others (as well as your own mistakes), I have put together this file of "bloopers." Many of these are lifted verbatim from people's proposals; others are somewhat paraphrased to focus on the problem. Remember that non-native-English speakers confront greater challenges in exposition than do other students. Sometimes I have cleaned up the syntax enough to make it comprehensible. Other times, I have left it as is. Technical writing in applied econometric work requires not only fluency in English, but also fluency in the quasi-language we might call "economese."
  1. I would expect the sex parameter to be positive if female since companies are trying to raise their recruitment and retention rates for women.
  2. I would expect this parameter to be positive since the model is trying to prove the importance of job experience.
  3. Analyzing the output, I would be able to determine if any variables should be eliminate due to multicollinearity.
  4. Does commuting to school a distance of twenty miles or greater have an effect on a college student's academic performance?
  5. For the sake of simplicity, all variables have only two values: high/strong=1 and low/weak=1.
  6. Communications skills and leadership are similarly shown by extracurricular (sic) and work experience. Thus these variables need not be included because they are assumed in the variables included in the model.
  7. The dependent variable is the total annual sales of Y in the US. ...To estimate the total sales of Y, we can first find the average sales of one city and then multiply by the number of cities in the US.
  8. The dummy variable, gender of singers, should be tested to see if there is sexism involved [in determination of singers' wages].
  9. The intercept is expected to be positive.
  10. The next variable will be to assign a value or a null set to a city in the county that has faced some sort of disaster.
  11. It will demonstrate that the most significant factors that come into play regarding {topic} is Demographics, Party Affiliation, District Location and the Election Schedule.
  12. [Our] dependent variable, RATE, is measured in ratio of graduating students and non-graduating students.
  13. As for the size of the university we will use small for colleges with less than 10,000 students, medium..., and large.... These standards are given values of 0, 1, 2, respectively.
  14. For location, it will be divided through four regions: South, West, East and Midwest. They will also be given values 0, 1, 2, 3, respectively.
  15. High absolute value of [coefficient] b1 will show that WORK and RATE is closely correlated.
  16. It is likely that our explanatory variables, INC and WORK, may be directly related. This will impose multicollinearity problem in our model.
  17. [In a proposal:] When looking at such regression model, we could also expect that the R is relatively high, which shows the strong relationship between the variables and wages.
  18. The t-ratio will also give a clear indication of the significance of each variable.
  19. Specification: Y = X1 + X2 + X3 + X4 + e
  20. A 2.5-page paragraph.
  21. The price of X does not vary by much...therefore, it would be useless to include these in our model because they would only constitute as a shift in the slope intercept and not explain the partial effect of our explanatory on our dependent variable.
  22. One concern in any model is correlation of variables.
  23. Years of driving experience may be correlated with an increased amount of tickets. A remedy to this problem would be to consider the proportional amount of speeding tickets relative experience on the road by utilizing a logarithmic model.
  24. Time of departure as an explanatory variable in a linear-in-variables model to explain commute duration.
  25. Let PVOTE=percentage of voting population who voted (dependent variable) and let E1 =1 if completed high school education, 0 otherwise; E2=1 if completed undergrad education, 0 otherwise, etc.
  26. Dependent variable will be the sale of a [video] store. The two explanatory variables will be the number of new movies released denoted by N, and the number of new memberships denoted by M.
  27. There may be more explanatory variables...but since it is hard to gather numerical data [on] those variables, we don't want to include them in the model.
  28. Earnings= b0 + b1*education+... I can explore B1 by holding other factors constants and use the multiple dummies as following: 0= did not finish high school, 1 = high school graduate, 2=some college, 3=complete college, 4=post-graduate education.
  29. To explain the [race] effects, we have to consider the "dummy explanatory variable" by holding the same level of education and work experience, as following: Let 0 = Hispanic Females, 1 = Black Females, 2 = White Females, 3 = Hispanic Males, 4 = Black Males, 5 = White Males.
  30. Once the model is calibrated with the actual data, it can be used to actually predict quantity demanded for electric vehicles.
  31. Dependent variable: donations to charity in dollars. The data that will needed to run the proposed regression can be obtained from charitable foundations...
  32. Dependent variable: dollars spent on car insurance. The most efficient and easiest way to obtain data for this regression analysis is to include a questionnaire in the consumer's insurance billing statement.
  33. Both B2 and B7 are dummy variables in this model. [B2 is shown as coefficient on a variable called HOUSE_LOCi] B2 equals one if the consumer lives in an urban environment. It is expected that a consumer who lives in the city will spend more dollars for insurance because of the higher crime rate in the city. With this in mind, the coefficient B2 is omitted, that is, will be zero, when the consumer lives in the suburbs.
  34. Model: NUND=B1 - MGDP + MUMP + MCIP + USGDP - USUMP + PEDER
  35. The Variance Inflation Factor should be examined to determine if there is any multicollinearity beween the variables.
  36. Explanatory variables include SEAS = seasonality.
  37. In the case of .... the coefficient is expected to be more or less constant [since] the exchange rate has remained more or less constant.
  38. One remedy for in case there is multiollinearity among the variables is to get additional data since the source of the problem could be a sample too small.
  39. Dependent variable: HOURS spent studying per week. Explanatory variables include: CURRGPA = 1 if current grade point average of the student is above 3.0, 0 otherwise. WORK = 1 if student works more than 10 hours per week; 0 otherwise.
  40. If females study a considerable amount more and their GPAs still fall below 3.0, it could suggest that retention programs should be implemented for females....
  41. Dependent variable: quantity of cigarettes demanded in millions per month; Explanatory variables include: price of cigarettes, federal fund for smoker cessation programs, yearly income per person D2 = 0 if income <15,000, 1 otherwise; education D3 = 0 if non-highschool graduates, 1 otherwise; age D4 = 0 below 18 yrs old, 1 otherwise.
  42. Model: GPA = b0 AID + b1 WORK + b2 FINC + b3 STUDY
  43. The effects of rap music on violent behavior... Does this type of music with its violent and/or obscene content have any correlation with the number of violent crimes committed?
  44. This factor is negatively sloped.
  45. The data for industrialization can be measured as a weighted sum of the increase in production from the factories and manufacturing plants.
  46. However, we cannot let the regression decide the duration of the rumor phase for us.
  47. Model: WEIGHT= b1 + b2(FASTFOOD) + e. [this equation] shows an analysis of the affect of FASTFOOD on WEIGHT in the population, controlling for other variables. WEIGHT= b1 + b2(TECHNO) + e. [this equation] examines the affect of TECHNO (technology) on WEIGHT, holding everything else constant.
  48. The dependent variable in this model is the mean S.A.T. scores for the sample population. We will call this variable "satscore." For this model we need really only one more variable which we will call "income." This variable denotes the average income per family in this random data set.
  49. Specifically, an observation could best be handled in this research by choosing one student from one income level and comparing his or her S.A.T. score with a student of differing family income.
  50. [One model that would be interesting would be] satscore=B[1] + B[2] income + e. Another model that we would be interested in exploring is given by the equation: satscore = B[1] - B[2] income + e.
  51. One way to test this hypothesis is to use the 'null hypothesis test.' In this test, we set H[0] equal to 0.
  52. We can use this model [satscore= B[1] + B[2] income + e] to forecast future S.A.T. scores for individuals from each level of family income, holding everything else constant.
  53. Dependent variable: BOOKS. Independent variables include ONE= on average, # paperbacks bought per month with under 100 pages, TWO= on average, # paperbacks bought per month with 100-275 pages, ...THREE...FOUR... SIX = on average, # paperbacks bought per month with over 600 pages, INC = monthly income, PRICE = price of each book purchased and read by individual, CHILD= number of dependent children living with individual....
  54. In addition, the explanatory variables would be tested against each other to see whether there is any collinearity between each of them. In those case, we will try to set up another model (e.g. qudratic form, log-log form) for the test or we might use the dummy variable to get a more accurate estimation.
  55. I want to be able to answer the expected GPA of a student with X SAT score, W quality of high school and a top tenth percent income bracket versus a student with similar characteristics except in the bottom tenth percent income bracket. This way, we can determine the extent to which individual factors effect GPA, and also determine how much an increase in family income has an affect of quality of school, and on SAT scores.
  56. School is a dummy variable because it is qualitative in nature.
  57. Another expectation of mine is that the coefficients β1, β2 and β3 will not be linear.
  58. The regressions we will run have the United States economy as its base dependent variable. The United States trade deficit will also be an important dependent variable. To analyze the correlations between industries and specific effects, the health of various industries will serve as dependent variables with other goods serving as the modifying variables.
  59. MODEL: USGNPi = b0 + b1 FISHPROD + b2 CANDYPROD + b3 CARPROD + ei
  60. Since this study is intended to test the validity of tutorial or retention programs based on family income, those who receive tutoring will be excluded from the sample testing.
  61. The following list summarizes the explanatory variables that would best explain the model:
  62. Evidently, β can be any real number, but within the same industry it probably will not vary that much, that is VAR(B) will be small, that's why it is extremely important to calculate it as accurate as possible, avoiding rounding up. It also might be useful to change the scale of measurement of β to, say β*e-03.
  63. Dependent variable: TV = television hours watched per week. ...The theory of demand will guide this model, so expect that to see negative coefficients for the price of televisions,....
  64. The reason is because in the reality, there are more than we can imagine of variables that will be use to find the dependent variable. But since we cannot include everything in our model, we have to remain any other variables constant.
  65. As I suspect, the coefficient for the races will be negative or even constant. There shouldn't be any much correlation among them.
  66. The dependent variable will be who got jobs, with the explanatory variables covering grades, internships, and involvement with extra-curricular activities. ...The research is limited to the top 20 undergraduate universities because of the competitive nature in-which those students desire the best jobs throughout the country.
  67. The dependent variable in this study is the dollar amount of chief executives are compensated including stock option compensation. The independent variable involved in this research include the various performance indicators of a public company. Popular variable indicators that will be used include: increase/decrease in annual revenue; increase/decrease in annual income; increase/decrease in Earnings per Share; company stock performance. .... The increase in various variables, such as company stock price, naturally correlates to the increase in the dependent variable.
  68. [Intercept] β0 depends on the independent variables.
  69. The admissions application allows for the following ethnic categories: [lists 13 distinct categories]. The thirteen categories are mutually exclusive and exhaustive, so one dummy variable is sufficient for this characteristic. [proceeds to include 13 dummy variables for race]
  70. The supposed model would want to infer that if a person was exposed to nutrition labels and high number of advertising he/she would most likely decrease their level of fat consumption. If it turns out that a model of the counterfactual occurs it can be explained that people lied in their surveys or simply a bad sample was chosen.
  71. [Cross-sectional data] The model can be used to predict the trend of social inequality.
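
Several of the items above (for example 13, 14, 28, 29, and 69) involve the coding of a categorical variable. One way to see what is at stake is the minimal sketch below. It uses made-up numbers: the data, the variable names, and the choice of South as the base category are all hypothetical and are not taken from any proposal. Under those assumptions, it compares (a) entering region as a single regressor coded 0, 1, 2, 3 with (b) an intercept plus three dummy variables, and (c) checks why an intercept plus all four dummies cannot be estimated.

```python
from statistics import mean

# Hypothetical data: region codes (0=South, 1=West, 2=East, 3=Midwest)
# and an outcome y whose regional means are deliberately not evenly spaced.
region = [0, 0, 1, 1, 2, 2, 3, 3]
y = [10.0, 12.0, 15.0, 17.0, 8.0, 10.0, 11.0, 13.0]

# (a) Region entered as ONE regressor coded 0..3 (simple OLS by formula).
# This forces each step South -> West -> East -> Midwest to shift y
# by the same amount, namely the slope.
xbar, ybar = mean(region), mean(y)
sxy = sum((x - xbar) * (v - ybar) for x, v in zip(region, y))
sxx = sum((x - xbar) ** 2 for x in region)
slope = sxy / sxx
intercept = ybar - slope * xbar
fitted_coded = [intercept + slope * g for g in range(4)]

# (b) Intercept plus three dummies (South omitted as the base category).
# With only these regressors, OLS fitted values equal the group means,
# so each dummy coefficient is that group's mean minus the base mean.
group_mean = [mean(v for r, v in zip(region, y) if r == g) for g in range(4)]
dummy_coefs = [group_mean[g] - group_mean[0] for g in range(1, 4)]

# (c) Intercept plus ALL four dummies: the dummies sum to 1 in every row,
# which reproduces the intercept column exactly (perfect collinearity).
all_dummies = [[1 if r == g else 0 for g in range(4)] for r in region]
dummies_sum_to_intercept = all(sum(row) == 1 for row in all_dummies)

print("group means:             ", group_mean)
print("fitted values, coded 0-3:", [round(f, 2) for f in fitted_coded])
print("dummy coefficients:      ", dummy_coefs)
print("all four dummies sum to the intercept column:", dummies_sum_to_intercept)
```

In this sketch the single coded regressor cannot reproduce the four group means, because it restricts the differences between adjacent categories to be equal. The three-dummy version imposes no such restriction.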

Updated: January 20, 1998
Prepared by: Trudy Ann Cameron