varselec.txt   rev 15feb97

Soc. 113, Variable Selection Assignment

Instructions:

Before selecting variables for your research project, you should have completed the scaling assignment. That assignment used a worksheet for variables that were assigned to you; this one uses a similar worksheet for variables that you choose yourself.

The variables to be selected will be used in REGRESSION ANALYSIS. One variable is singled out as the DEPENDENT variable, and the analysis addresses a question of the form, "Why do some cases score higher and others lower on the dependent variable?" For example, one might select RINCOM91 and ask why some respondents have higher incomes than others do. The sort of answer considered involves a hypothesis about an INDEPENDENT variable and a MECHANISM by which scores on the independent variable might affect scores on the dependent variable. For example, one might select EDUC as an independent variable, and hypothesize a mechanism in which employers believe that better educated employees are more productive, and are therefore willing to pay them more.

In conventional regression analysis both independent and dependent variables are directly measured. For example, both RINCOM91 and EDUC are included in the GSS dataset. (There do exist more complex forms of analysis involving "latent" or unmeasured variables.) However, the mechanisms are typically unmeasured, coming from sources such as "theory" or "common sense", rather than from the data being analyzed. For example, while the GSS includes variables for respondents' incomes and educations, it has no interviews of employers, asking whether employers do indeed regard education as important for employee productivity. (The variable REDUCEMP is a measure of employees', not their employers', views of the importance of education.)
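
As a rough illustration of what regression with one independent variable computes, the sketch below fits a straight line predicting a dependent variable from an independent variable by ordinary least squares. The function name simple_regression and the six pairs of scores are hypothetical and made up for this example; they are not GSS data, and the course's own statistical software may report its results differently.

```python
def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x for one independent variable."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need paired scores on at least two cases")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of the independent variable
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("independent variable has no variation")
    # Sum of cross-products of deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical scores for six cases (NOT real GSS data):
# educ stands in for EDUC, income for a dependent variable such as RINCOM91.
educ = [8, 10, 12, 12, 14, 16]
income = [9, 10, 13, 11, 15, 17]

intercept, slope = simple_regression(educ, income)
print("intercept:", round(intercept, 3), "slope:", round(slope, 3))
```

The slope estimates how much higher the dependent variable tends to be for each additional unit of the independent variable. It says nothing about which mechanism produces that difference; the mechanism still has to come from the hypothesis.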

Social researchers are seldom satisfied with just one independent variable, and MULTIPLE regression is the variant incorporating more than one independent variable hypothesized to affect the same dependent variable. For example, one might hypothesize that income is affected by AGE as well as educational level, with older people tending to have higher incomes than younger people do.

The AGE example illustrates that a variety of mechanisms might be hypothesized when they are not measured directly: (a) mechanisms directly related to improved job performance, as at least some workers acquire experience, practical knowledge, and maybe even wisdom, with increasing age; (b) ones that arise from bureaucratic procedures such as cost-of-living pay increases that are across-the-board rather than performance-based; and (c) mechanisms involving conspiracies among aged ageists who control the positions of power in which decisions about levels of compensation are made. None of these is directly measured in the GSS, so someone using AGE as an independent variable could interpret the empirical findings in various ways.

1. From the General Social Survey, select a DEPENDENT variable that satisfies the McFarland criteria for being approximately interval scale.

2. Select three INDEPENDENT variables that satisfy the McFarland criteria for being approximately interval scales, and for which you can come up with hypotheses about mechanisms whereby a respondent's score on the independent variable might affect his or her score on the dependent variable. We need independent variable and dependent variable scores on the same cases, so make sure all your variables were measured in the same year.

3. If you wish, you may also select an independent variable that does NOT satisfy the McFarland criteria for approximate interval scales, but that is (or can be recoded as) a dichotomy, with at least 100 cases in each of the two categories.
For example, SEX is already a dichotomy; and a dichotomy for childlessness could be constructed by recoding the variable for the number of children, CHILDS, collapsing the various non-zero numbers of children into a single category (_CHLDLES = 1 if CHILDS = 0; else _CHLDLES = 0 if CHILDS < 9).

Do not be disheartened if the hypothesis-generating part of the assignment seems difficult. Professional sociologists as well as students commonly find it difficult to formulate hypotheses, or formulate multiple and contradictory hypotheses, or subsequently find that their hypotheses are empirically false. Analyzing empirical data does NOT consist of merely confirming "what everybody already knew".

Turn in this assignment with the pages in the following order:
(a) worksheet for your dependent variable,
(b) worksheets for independent variables that are approximately interval scales,
(c) optional worksheet for a dichotomous variable, and
(d) copies of GSS-CODEBOOK pages describing the selected variables.
Fasten them together with a single staple in the upper left corner.

---------------------------------------------------------------
Soc. 113 Variable Selection Assignment

Student Name ______________________

1. Variable mnemonic ___________ Year ______ (Dependent) or (Independent)?

2. Which responses can be treated as meaningful values along a single substantively interpretable dimension?

3a. How many possible values lie on that dimension?
3b. Do they need to be reordered? How?
3c. Lowest value?
3d. Highest value?

4a. Which responses need to be recoded as missing data?
4b. -as some type of nonresponse?
4c. -as a response lying on some other dimension?

5. Describe the substantively interpretable dimension along which most observations lie; what, precisely, does this variable measure?

6a. What similar variables are also available (see subject index of the GSS codebook)?
6b. How do they differ?
6c. Is this the one best suited for measuring that concept?

7. Does this variable (after recoding, if specified above) satisfy the requirements for a McFarland approximation of an interval scale?

8a. Which is this to be used as? (a) Dependent variable. (b) One of the independent variables.
8b. If an independent variable, what is your dependent variable?
8c. If an independent variable, what year's data are you using? And was the dependent variable also measured in that same year?
8d. If an independent variable, describe the mechanism you hypothesize whereby scores on this independent variable might affect scores on the dependent variable.
---------------------------------------------------------------
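
For the optional dichotomous variable in instruction 3, the recode of CHILDS into a childlessness dichotomy and the check for at least 100 cases per category might be sketched as follows. This is only an illustration: the function names recode_childless and has_enough_cases are hypothetical, the list of CHILDS scores is made up rather than taken from the GSS, and values of 9 or above are left as missing on the assumption (implied by the recode rule in the instructions) that they do not count as numbers of children.

```python
from collections import Counter


def recode_childless(childs):
    """Apply the rule: 1 if CHILDS = 0; else 0 if CHILDS < 9; otherwise missing."""
    if childs is None:
        return None          # already missing
    if childs == 0:
        return 1             # childless
    if childs < 9:
        return 0             # one or more children
    return None              # 9 or above: treated as missing (assumption)


def has_enough_cases(values, minimum=100):
    """True if both categories of a 0/1 dichotomy have at least `minimum` cases."""
    counts = Counter(v for v in values if v is not None)
    return counts[0] >= minimum and counts[1] >= minimum


# Hypothetical CHILDS scores (NOT real GSS data)
childs_scores = [0] * 120 + [1] * 80 + [2] * 150 + [9] * 5

chldles = [recode_childless(c) for c in childs_scores]
print("enough cases in both categories:", has_enough_cases(chldles))
```

Cases recoded as missing are excluded from the counts, so the 100-case requirement is checked only against respondents who actually fall into one of the two categories.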