UCLA Soc. 210A, Topic 6, Study Design

28oct2000

Professor: David D. McFarland

Web Pages for Fall 2000

Syllabus for logistics
ClassWeb site for announcements, discussion board
Outline for course content

Topic 6: Study Design

Required Readings:
Moore and McCabe, ch 3, "Producing Data".
Hamilton, pp. 55-58, "Using Random Numbers and Random Sampling."
Warning: The "sample" command deletes from memory the cases that are not selected. This is sometimes unwanted: cases left out of one sample are no longer available if you later wish to draw a second, different sample. In Assignment 6 you will be asked to take multiple samples without using the "sample" command, in a way that keeps every case available for selection in later samples even if it was not selected in the current one. For each successive sample, create a new variable that equals the original variable on the cases selected in that sample and is missing on the cases not selected. Non-selection is thus handled by setting the variable to missing rather than by deleting the case.
Optional Readings:
Meier, Paul. 1989. "The Biggest Public Health Experiment Ever: The 1954 Field Trial of the Salk Poliomyelitis Vaccine." Pp. 3-14 in: Judith M. Tanur et al. Statistics: A Guide to the Unknown. 3rd edn. Pacific Grove CA: Wadsworth and Brooks Cole. [College QA 276.16 S84 1989]
Assignment 6
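
The keep-every-case approach described in the warning above can be sketched outside any particular statistics package. The Python below is a hypothetical illustration, not the Assignment 6 solution: the helper name `sample_as_missing`, the `income` values, and the use of `None` to stand for a missing value are all assumptions made for the example.

```python
import random

def sample_as_missing(values, n, rng):
    """Return a new list in which n randomly chosen cases keep their
    original value and every other case is set to None (missing).
    The input list is never changed, so every case remains available
    for selection in later samples."""
    chosen = set(rng.sample(range(len(values)), n))  # SRS without replacement
    return [v if i in chosen else None for i, v in enumerate(values)]

rng = random.Random(6)  # fixed seed so the draws can be reproduced
income = [12, 35, 27, 48, 19, 55, 31, 22, 40, 29]  # hypothetical data, 10 cases

# Three successive samples of 4 cases, each stored as a new "variable".
samples = {f"income_s{k}": sample_as_missing(income, 4, rng) for k in (1, 2, 3)}

for name, var in samples.items():
    kept = [v for v in var if v is not None]  # ignore missing values
    print(name, var, "mean =", sum(kept) / len(kept))
```

Because `income` itself is never modified, a case skipped by `income_s1` can still turn up in `income_s2` or `income_s3`.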

Topic 6 deals with design of empirical studies, in particular, those designs that involve the investigator's use of random numbers for such things as determining which cases will receive one or another treatment. Probabilities considered previously, in Topics 4 and 5, often pertained to the social phenomena under study, doing such things as determining whether the next birth would be male or female. Here in Topic 6, the probabilities are involved in the research process itself, doing such things as determining which potential respondents are chosen to actually be interviewed.
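
As a minimal sketch of the investigator using random numbers to decide which cases receive which treatment, the Python below randomly splits a list of participants into two groups. The function name `randomly_assign`, the seed, and the participant IDs are hypothetical; shuffling and cutting the list in half is just one simple way to do random assignment, not a procedure taken from any of the readings.

```python
import random

def randomly_assign(case_ids, rng):
    """Randomly split cases into a treatment group and a control group.

    Shuffling and cutting the list in half gives every possible split
    into two equal-sized groups the same chance (with an odd count,
    the treatment group gets the extra case)."""
    ids = list(case_ids)          # copy so the caller's list is untouched
    rng.shuffle(ids)              # random permutation of the cases
    half = (len(ids) + 1) // 2
    return sorted(ids[:half]), sorted(ids[half:])

rng = random.Random(6)            # seed chosen arbitrarily, for reproducibility
participants = list(range(1, 11))  # hypothetical participant IDs 1..10
treatment, control = randomly_assign(participants, rng)
print("treatment:", treatment)
print("control:  ", control)
```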

Populations and Samples

With each study design, we need to consider how it makes the connection between the theoretically relevant universe, the universe from which data are actually sampled, and the actual sample.

Example: "Postindustrial society" is instantiated by "The U. S. in 1994", whose population is "the quarter billion or so U. S. residents in 1994", which is sampled by "the three thousand or so respondents in the 1994 General Social Survey" (albeit with slippage from such matters as nonresponse). Part of that chain involves random selection, and that part can be traced backward using statistical inference. "Postindustrial society" is not a well-defined universe from which it would be possible to select a random sample, not least because it presumably includes not just past and present instances, but also future instances that do not yet exist.

Example: Sometimes, and particularly to scholars who live in or are otherwise strongly affected by it, the universe that is actually sampled is the theoretically relevant universe. For many researchers, "The Contemporary U. S." per se is of considerable interest, quite apart from its being an instance of some more general phenomenon such as "postindustrial society".

Causality

Another important aspect of a study design is how it deals with causality: what assurance it provides that the effects found are due to the variables considered, and not to some other variables. (This really goes beyond univariate statistics, the main emphasis this quarter, but the rationale for study designs would be a mystery without some consideration of causal relations among variables.)

"Controlling For" Other Variables by Exclusion, by Regression, or by Randomization

Example: How does one know that observed differences in labor force participation are really due to differences in social networks rather than, say, to gender? Stoloff et al. rule out gender effects by exclusion: they study only women. Similarly, various other variables, including rural-urban differences, school enrollment, and retirement status, are controlled by exclusion.

Other variables were controlled by incorporating them into the prediction equation, along with social networks. Examples include immigrant status, welfare experience, work experience, and age. More on controlling by regression in 210B and C.

Controlling by randomization would have required reorganizing study participants' lives, randomly assigning them to two groups, installing the one group into new social networks, and removing the other group from any networks in which they were already involved. For very good reasons, nothing along these lines was attempted.

Stoloff, Jennifer, Jennifer L. Glanville, and Elisa Jayne Bienenstock. 1999. "Women's Participation in the Labor Force: The Role of Social Networks." Social Networks 21 (#1, January): 91-108. (Online in sciencedirect.)

Random Sample of What?

In the intro stat course you would have studied "simple random sampling", with and without replacement. That served the important purpose of providing a rationale for statistical inference procedures.
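
For review, the two kinds of simple random sampling can be sketched with Python's standard `random` module; the 20-case frame, the sample size, and the seed are made up for illustration. The last lines check by simulation the property that makes SRS useful for inference: without replacement, every case has the same inclusion probability, n/N.

```python
import random

rng = random.Random(210)
population = list(range(1, 21))   # hypothetical sampling frame: case IDs 1..20
n = 5

srs_without = rng.sample(population, n)   # no case can be drawn twice
srs_with = rng.choices(population, k=n)   # a case may be drawn more than once
print("without replacement:", srs_without)
print("with replacement:   ", srs_with)

# Under SRS without replacement every case has inclusion probability n/N.
trials = 20_000
hits = sum(1 in rng.sample(population, n) for _ in range(trials))
print("share of samples containing case 1:", hits / trials,
      "vs n/N =", n / len(population))
```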

Here we will build on that in a couple of different directions. For one thing, many important datasets, including censuses, are not random samples from larger populations of theoretical interest; yet we use probability considerations to analyze the internal variability in such datasets. Our coverage will include examination of the rationale for the common practice of treating such datasets "as if" there were some larger population from which they were random samples.

This is a matter of some controversy among statisticians who concern themselves with the foundations of that discipline. My own take on it is that conventional statistics tell us something useful about the stability of the dataset, whether or not it is a random sample from some larger population about which one wishes to make inferences; and a dataset not large enough to give stable estimates of population parameters isn't large enough to have stable statistics of its own either.

Complex Probability Samples

Even datasets that are random samples are seldom 'simple' random samples, and we shall consider what needs to be modified when data come instead from more complex probability samples, such as the stratified cluster samples commonly used in actual large-scale surveys.

At the production end, a complex sample does not require a list of all members of the population from which to choose respondents. At the first stage one needs only a list of 'primary sampling units' (PSUs), such as counties, and some of those are chosen with probabilities proportional to size. At the second stage, smaller units such as city blocks are listed, but only for the selected PSUs, and a random selection is again made with probabilities proportional to size. Successive random selections choose housing units within each chosen block, and one respondent within each chosen housing unit. At each stage, only the chosen units need to be subdivided and listed for the next stage of selection. A complex sample is thus feasible where a simple random sample (SRS) would not be, for lack of a list of the whole population.
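
The first two stages of the selection described above can be sketched as follows. This is a hypothetical illustration: the counties, blocks, and household counts are invented, and it uses systematic PPS selection, one common way of drawing units with probability proportional to size, which need not match the procedure of any actual survey such as the GSS. The later stages (housing units within blocks, one respondent per housing unit) are omitted.

```python
import random

def systematic_pps(units, n, rng):
    """Choose n units with probability proportional to size (PPS) by
    systematic sampling along the cumulated sizes.

    units: list of (name, size) pairs. Assumes no single unit is larger
    than the sampling interval total/n; such a unit could be hit twice."""
    total = sum(size for _, size in units)
    interval = total / n
    start = rng.random() * interval               # random start in [0, interval)
    points = [start + k * interval for k in range(n)]
    chosen, upper, i = [], 0, 0
    for name, size in units:
        upper += size                              # unit covers [upper - size, upper)
        while i < n and points[i] < upper:
            chosen.append(name)
            i += 1
    return chosen

rng = random.Random(1994)

# Stage 1: hypothetical PSUs (counties), sized by household counts.
counties = [("A", 900), ("B", 300), ("C", 1200), ("D", 600), ("E", 1000)]
psus = systematic_pps(counties, 2, rng)

# Stage 2: hypothetical block listings. In the field these would be
# compiled only for the counties actually chosen at stage 1.
block_lists = {
    "A": [("A1", 300), ("A2", 400), ("A3", 200)],
    "B": [("B1", 100), ("B2", 150), ("B3", 50)],
    "C": [("C1", 500), ("C2", 400), ("C3", 300)],
    "D": [("D1", 200), ("D2", 200), ("D3", 200)],
    "E": [("E1", 450), ("E2", 350), ("E3", 200)],
}
blocks = {c: systematic_pps(block_lists[c], 2, rng) for c in psus}

print("PSUs:", psus)
print("blocks:", blocks)
```

One reason for PPS at both stages: if a fixed number m of housing units were then taken from each chosen block, each housing unit's overall selection probability would be (2 × county size / 4000) × (2 × block size / county size) × (m / block size) = 4m / 4000, the same for every housing unit in the frame.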

One effect of this type of sampling is that respondents are more geographically clustered than an equal number of respondents in an SRS would be. That, in turn, means that variables whose values also cluster geographically show less variability across respondents than they would in an SRS of the same size, so the sample carries less information than its raw count suggests. The 'design effect' (the ratio of the actual sampling variance to that of an SRS of the same size) of the GSS varies with the particular variable, but over a mixture of demographic and attitudinal variables it averages about 1.5. This means that the approximately 3,000 cases in 1994 are equivalent to approximately 2,000 cases in a simple random sample, as far as sampling variability is concerned. A huge sample is still huge, whether 2,000 or 3,000; where the difference matters is in rare subpopulations, where 90 cases, for example, are equivalent to only 60 SRS cases. For rare subpopulations, the Census public use data are better suited (see below).
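
The arithmetic behind those equivalences is simply the actual number of cases divided by the design effect. A minimal sketch, with the GSS figure of 1.5 treated as a single value even though the actual design effect varies by variable:

```python
def effective_n(n, deff):
    """Number of SRS cases giving the same sampling variance as n cases
    from a complex design whose design effect is deff."""
    return n / deff

deff_gss = 1.5   # rough average quoted in the text; varies by variable
print(effective_n(3000, deff_gss))   # → 2000.0  (about the full 1994 GSS)
print(effective_n(90, deff_gss))     # → 60.0    (a rare subpopulation)
```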

Another place the design effect matters is in significance tests, which will be covered in a couple of weeks. One of several reasons not to apply computer-generated significance tests mechanically is that the software may not take design effects into account.
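
A small hypothetical example shows how ignoring the design effect can change a conclusion. Since the design effect multiplies the sampling variance, it multiplies the standard error by its square root. The subgroup size, the 55% figure, and the design effect of 1.5 below are all made up for illustration; the test is an ordinary large-sample z-test for a proportion.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical subgroup: 400 respondents, 55% agree; test H0: p = 0.5.
n, p_hat, p0 = 400, 0.55, 0.5
se_srs = math.sqrt(p0 * (1 - p0) / n)   # SE that assumes an SRS
deff = 1.5                               # assumed design effect for this item
se_design = se_srs * math.sqrt(deff)     # variance x deff  =>  SE x sqrt(deff)

z_srs = (p_hat - p0) / se_srs
z_design = (p_hat - p0) / se_design
print(f"SRS formula:     z = {z_srs:.2f}, p = {two_sided_p(z_srs):.3f}")
print(f"design-adjusted: z = {z_design:.2f}, p = {two_sided_p(z_design):.3f}")
```

With these numbers the SRS formula gives z = 2.0, just past the conventional .05 cutoff, while the design-adjusted z is about 1.63, which is not.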

Survey Designs and Other Designs

Although important, surveys are not the only type of social research. We will briefly consider the features, and relative advantages, of different study designs, notably randomized experiments. (Soc. 212C is an entire course devoted to these matters.)

Volunteer Samples, Samples of Convenience
Cochran, William G., et al. 1954. Statistical Problems of the Kinsey Report on Sexual Behavior in the Human Male. Washington: American Statistical Association. [This critique appeared six years after the 1948 publication of the Kinsey Report, but it was several decades before Laumann et al., in 1994, published The Social Organization of Sexuality, based on a properly conducted survey using a probability sample.]
Sears, David O. 1986. "College Sophomores in the Laboratory: Influences of a Narrow Database on Social Psychology's View of Human Nature." Journal of Personality and Social Psychology 51: 515-530.
Censuses, and Studies With Difficult-to-Specify Theoretically Relevant Populations
Allen, Walter R., and Reynolds Farley. 1986. "The Shifting Social and Economic Tides of Black America, 1950-1980." Annual Review of Sociology 12: 277-306. [U. S. Census; also uses CPS.]
Kleinman, Lawrence C., Elizabeth A. Boyd, and John C. Heritage. 1997. "Adherence to Prescribed Explicit Criteria During Utilization Review: An Analysis of Communications Between Attending and Reviewing Physicians." Journal of the American Medical Association 278 (#6, 13 August): 497-501. [Like many studies, this has some aspects of "convenience" in choice of dataset, but it is an example of studies whose data and theoretically relevant universes are at least arguably close, unlike Sears' sophomores and 'human nature' above.]
Emigh, Rebecca Jean. 1997. "The Spread of Sharecropping in Tuscany." American Sociological Review 62: 423-442. [The data are "certainly not a random sample..." (p. 427), "and the population is hard to define" (p. 428).]
Berk, Richard A., Bruce Western, and Robert E. Weiss. 1995. "Statistical Inference for Apparent Populations." Ch. 11, pp. 421-458, in Peter V. Marsden, ed. Sociological Methodology 1995. Oxford: Blackwell. [This paper's Bayesian suggestions by no means settled the matter. See also discussions by Bollen; Firebaugh; Rubin; and reply by Berk et al., pp. 459-485 of the same volume.]
Census Public Use Data
California Census Research Data Center, at UCLA.
Tienda, Marta, and Franklin D. Wilson. 1992. "Migration and the Earnings of Hispanic Men." American Sociological Review 57: 661-682.
Treiman, Donald J., and Hye Kyung Lee. 1996. "Income Differences Among 31 Ethnic Groups in Los Angeles." Ch. 3, pp. 37-82 in Baron, James N., David B. Grusky, and Donald J. Treiman, eds. Social Differentiation and Social Inequality: Essays in Honor of John Pock. Boulder, Colorado: Westview Press.
Sample Surveys
One-time or repeated cross-sectional surveys.
Simple random samples vs. complex samples.
General Social Survey (GSS) online overviews from NORC and ICPSR.
Davis, James A., and Tom W. Smith. 1992. The NORC General Social Survey: A User's Guide. Newbury Park CA: Sage. Ch. 3, "Study Design", and ch. 4, "Sample Design and Weighting".
Lin, I-Fen, Nora Cate Schaeffer, and Judith A. Seltzer. 1999. "Causes and Effects of Nonparticipation in a Child Support Survey." Journal of Official Statistics 15: 143-166. [Use of both external data such as court records, and internal data, such as the number of telephone calls required to contact a respondent, to get at differences between respondents and nonrespondents.]
Quasi-Experiments such as Before/After Studies
Donato, Katharine M., Jorge Durand, and Douglas S. Massey. 1992. "Stemming the Tide? Assessing the Deterrent Effects of the Immigration Reform and Control Act." Demography 29: 139-157. (Online in jstor.)
Campbell, Donald T. 1989. "Measuring the Effects of Social Innovations by Means of Time Series." Pp. 93-103 in: Judith M. Tanur et al. Statistics: A Guide to the Unknown. 3rd edn. Pacific Grove CA: Wadsworth and Brooks Cole. [College QA 276.16 S84 1989]
Longitudinal Surveys or Panel Studies
Gamoran, Adam, and Robert D. Mare. 1989. "Secondary School Tracking and Educational Inequality: Compensation, Reinforcement, or Neutrality?" American Journal of Sociology 94 (#5, March): 1146-1183. (High School and Beyond)
England, Paula, George Farkas, Barbara Stanek Kilbourne, and Thomas Dou. 1988. "Explaining Occupational Sex Segregation and Wages: Findings from a Model with Fixed Effects." American Sociological Review 53 (#4, August): 544-558. (NLS)
Phillips, Meredith, Jeanne Brooks-Gunn, Greg J. Duncan, Pamela Klebanov, and Jonathan Crane. 1998. "Family Background, Parenting Practices, and the Black-White Test Score Gap." Ch. 4, pp. 103-145 in: Christopher Jencks and Meredith Phillips, eds. The Black-White Test Score Gap. Washington: Brookings Institution Press. (Children of NLSY; also PSID)
Laboratory (And Other Controlled Environment) Experiments
Alvarez, Rodolfo. 1968. "Informal Reactions to Deviance in Simulated Work Organizations: A Laboratory Experiment." American Sociological Review 33: 895-912.
Bonacich, Phillip. 1990. "Communication Dilemmas in Social Networks: An Experimental Study." American Sociological Review 55: 448-459.
Kollock, Peter. 1994. "The Emergence of Exchange Structures: An Experimental Study of Uncertainty, Commitment, and Trust." American Journal of Sociology 100 (#2): 313-345.
Spencer, Steven J., Claude M. Steele, and Diane M. Quinn. 1999. "Stereotype Threat and Women's Math Performance." Journal of Experimental Social Psychology 35: 4-28. (Online in idealibrary.)
Field Experiments
Meier, Paul. 1989. "The Biggest Public Health Experiment Ever: The 1954 Field Trial of the Salk Poliomyelitis Vaccine." Pp. 3-14 in: Judith M. Tanur et al. Statistics: A Guide to the Unknown. 3rd edn. Pacific Grove CA: Wadsworth and Brooks Cole.
Symposium: "Employment, Marriage, and the Deterrent Effect of Arrest for Domestic Violence: Replications and Re-Analysis". American Sociological Review 57(#5, October 1992): 679-708. Editor's Note by Gerald Marwell; papers by Lawrence W. Sherman and Douglas A. Smith; Antony M. Pate and Edwin E. Hamilton; Richard A. Berk, Alec Campbell, Ruth Klap, and Bruce Western.