UCLA Soc. 210A, Topic 4, Probability

11oct2000

Outline:

UCLA Soc. 210A, Topic 4, Probability

Professor: David D. McFarland

Web Pages for Fall 1999

Syllabus for logistics
ClassWeb site for announcements, discussion board
Outline for course content

Topic 4: Probability

Assignment 4

Topic 4 of the course covers preliminary aspects of probability, random variables, and distributions. Postponed to Topic 5 are: conditional probability, independence, Bayes' Rule, and related matters.

Assigned reading is:

Moore and McCabe pp 295-301 (part of Section 4.2, "Probability Models")
Section 4.3 ("Random Variables")
Section 4.4 ("Means and Variances of Random Variables")

The following lecture material supplements the textbook, covering several important aspects of probability not treated by Moore and McCabe.

On the topic of probability, we depart from the usual practice in statistics courses, of treating probability as being "about" such things as gambling with dice or in card games. Instead, we shall use examples that arise in the study of various social phenomena.

Probability has three distinct though related uses in sociology:

in theorizing (probability models of social phenomena)
in designing empirical studies (experimental randomization as well as sampling)
in analysis of data (statistical inference, but also analysis where inference to some larger population is not involved).

The first of these, which is usually neglected in introductory statistics courses, will be one main focus here in Topic 4; the other two will be pursued further in subsequent parts of the course.

For our purposes, probability is not about coin-tossing or dice-rolling, but rather about things of sociological interest that sometimes occur and other times do not. Married couples may divorce or stay together. Students may or may not advance to the next grade level. Cities may or may not experience disturbances.

When treating social phenomena as probabilistic, one can look at both the consequences and the determinants of the probabilities.

Consequences. For example, someone studying problem-solving might want to take individual probabilities of finding a correct solution, and use them to draw conclusions about a group's probability of finding a correct solution in problems of the same type.
Determinants. The same researcher may wish to assess whether there are gender differences in the probability of finding correct solutions to problems of that type.

In the above examples, probability is attributed to the social phenomena per se, whether or not some social researcher happens to be studying those social phenomena. For now, we postpone probabilities involving the conduct of the researcher: for example, the decision of which potential respondents to select for interviewing.

A classic article, written by one of the first PhDs from the UCLA Sociology Department, Frank Camilleri (1962), is still worth reading on the three different uses of probability in sociology.

Axioms of Probability

Consider a universal set, U, and its subsets, generic ones of which are here denoted A and B. Consider also a function p defined on the subsets of U. It is a probability function if it has the following properties:

For any subset A, the function value p(A) is a non-negative real number.
p(empty set) = 0
p(U) = 1.0
p(A union B) = p(A) + p(B) - p(A intersect B)

Remark: As it stands, this is an abstract mathematical system. It is about "a", not "the", probability function, and pertains to sets A, B, etc., which have not been given any particular empirical referents.

Remark: From definitions in elementary set theory, the "union" of two sets consists of the elements that are in either or both of them. The "intersection" consists of the elements that are in both.

Remark: The last axiom is a form of "additivity"; the subtracted term is merely a correction for double-counting.

Remark: The last axiom is also the most interesting. If probabilities of some sets are already known, one can use this axiom, with the known probabilities, to calculate the probabilities of other sets.

Remark: (notA), defined as the set whose intersection with A is empty and whose union with A is the universal set U, has probability 1-p(A). This is implied by the axioms stated above, and thus could be derived from them as a theorem.

Remark: There are several equivalent ways of axiomatizing probability theory, with things that are taken as axioms in one being theorems in another, and vice versa. The probability rules (axioms) in Moore and McCabe, page 298, for example, include the statement about p(notA) just described as a "theorem" here, and do not include anything about the probability of a union except for disjoint events, which would follow from their axioms as a theorem. Alternative axiomatizations agree on what the properties of probabilities are, and for our purposes it isn't too important which of those properties get called "axioms" and which "theorems". Indeed, the only reason for us to get involved in axiomatic definition of probability is to avoid the logical problems that arise in attempts to define probability as long-run relative frequency.

Interpretation 1. Probability as Long-run Relative Frequency

One way of giving the probability axioms empirical referents is to imagine a long sequence of "trials" under identical conditions, each of which might or might not produce the event whose probability is under discussion. If as the number of trials increases the proportion of occurrences approaches a stable limit, we call that the probability of the event. This traditional type of interpretation is called "frequentist" probability.

Example: When a small number of births is observed, the proportion female is highly variable, but as larger and larger numbers of births are observed, the proportion female may stabilize around 49%, in which case we can write p(female) = .49 approximately.

(The figure .49 is not some universal constant, however; conditions may vary. In data from rural Sichuan Province, China, in the decade 1969-1978 there were 15,813 reported births, and 7,762 or 49.1% were female; but in the following decade 1979-1988 both the number of births and the proportion female declined: 9,317 reported births, with 4,436 or 47.6% female. See Mason et al. 1996, Table 6.2.)

Interpretation 2. Probability as Degree of Belief

A different way of giving the probability axioms empirical referents is to present a person with various statements, and ask the subject to assign each statement a number between 0 and 1, with 0 indicating the subject believes it definitely false, 1 indicating he or she believes it definitely true, and intermediate values indicating intermediate degrees of belief.

Additional steps would be needed, however, to assure that the numbers thus obtained satisfy the additivity axiom.

This type of interpretation is often called "subjective" probability. Although that usage is somewhat misleading (see below), it does capture the feature that different reasonable people might assign different probabilities to the same outcome.

Example: As of October 2000, there are people prepared to assign probabilities to the outcomes of next month's presidential election. But it is not that they have observed Gore-vs-Bush elections many times in the past, all of them held under identical conditions, and counted the proportion in which each candidate won. Indeed, it might be argued that that election will be a unique event, that there will never be another election repeating those identical conditions.

Remark: Someone may have a "degree of belief" about things for which he or she would be unable to estimate "long-run relative frequency" due to lack of relevant data.

Introspection, rather than external data, is the direct source, in assessing degree of belief. However, that is not to say that degree of belief is impervious to empirical evidence. In forming their beliefs, reasonable people will have taken at least some account of whatever relevant external data they had.

Remark: What is here called "degree of belief" probability is sometimes described as "subjective" probability, and compared unfavorably to "objective" (i.e., frequentist) probability. But the frequentist version may also involve some subjective judgements: How does a frequentist determine whether or not some events under consideration occurred "under identical conditions"?

Interpretation 3. Probability as a Function of Variable Conditions

A third way of giving empirical referents to the concept of probability, and one which has come to the forefront in recent years, treats probability as a variable, in the sense that the probability of a particular outcome is taken to vary with conditions, not erratically, but systematically in a manner that can be represented by a mathematical function, whose parameters can then be estimated from empirical data. The estimated function can then be used to calculate a predicted probability of the outcome, given any combination of conditions, whether that particular combination has occurred many or few times or never in the available data.

Interpretation 1 above, the traditional frequentist interpretation, takes prior experience into account in estimation of a probability, but only prior experience under identical conditions. This third interpretation might loosely be described as considering the outcomes from "similar" as well as "identical" circumstances as providing relevant information about an outcome's probability.

This third interpretation uses such techniques as logistic regression, which are beyond the scope of 210A but will figure prominently in the subsequent quarters of the sequence.

Example: Shortly before 3 am on 16 October 1999, an earthquake occurred, initially rated as 7.0 magnitude, and with its epicenter about 130 miles east of Los Angeles, north of Joshua Tree. Within a few minutes the US Geological Survey had posted preliminary data, and somewhat later USGS seismologist Lucy Jones held a news conference that included several probability statements. For example, she was quoted as assigning .05 probability to the event "an even bigger earthquake in the next week".

That number is apparently not a frequentist probability, calculated as a proportion of outcomes in a large number of trials under identical conditions. At the time of that statement, according to the same news reports, there had been only three previous magnitude 7 or higher earthquakes anywhere in southern California (none of them on that same fault) since such things were first measured and recorded, hardly enough trials on which to calculate long-run relative frequencies.

Furthermore, an earthquake changes such conditions as stress levels in tectonic plates, so even a subsequent earthquake in exactly the same location would not be under identical conditions. Whatever the .05 figure may mean, it does not seem to be the result of some long-run relative frequency calculation.

Neither is the quoted .05 number merely one person's subjective probability, obtained from her introspection. Lucy Jones is not some layman picking numbers out of the air, but rather an expert, backed by the data, computers, and other resources of the USGS.

Rather, the quoted .05 figure is the result of a complicated calculation that takes into account various relevant information, including what has happened after earthquakes in varying locations (not just southern California), and of varying magnitudes (not just those 7.0 or greater).

Disagreements Among Interpretations

The above might suggest that disagreements would be endemic, with different observers quoting vastly different numerical values for the probability of the same outcome. In actual practice, however, such disagreements are not as common as might be expected, and typically arise when data are scarce, or when there is disagreement as to the relevance of available data.

Some Related Notation, Concepts and Terminology

Probabilities are variously stated as decimal fractions, percents, or common fractions, and the latter sometimes have numerator and denominator separated by the word "in" instead of a horizontal bar or slash. The probability 0.25, for example, might alternatively be expressed as "25%" or "1/4" or "1 in 4".

"Chance" (or its plural form) is often used as a synonym for "probability". For example, according to published direct quotations, the seismologist cited above actually said "5% chance", rather than ".05 probability".

"Odds" provides equivalent information, but the numbers are different. The odds in favor of an event are expressed as a ratio of the probabilities of the event occurring, and not occurring. An event with probability .25 of occurring has probability 1-.25 = .75 of not occurring, and thus odds .25/.75, which would be written as odds of 1/3, or 1 to 3. Odds and probability can each be converted into the other:

odds = prob / (1 - prob)
prob = odds / (1 + odds)

Remark: While probabilities have an upper limit of 1.0, which will sometimes be inconvenient, odds can take on any non-negative values. A further transformation, to log(odds), can take on any real values, having no lower limit either. While of no immediate concern at this point in the course, these matters will become relevant later in the 210 sequence when regression analysis is considered.

"Likelihood" is one example of a word ("bias" is another) that, in statistics, has a technical definition that differs from its everyday usage. It should not be used interchangably with "probability". The distinction is important; for one thing, the likelihoods in any particular discussion will not sum to 1.0 as probabilities do. We will deal with it later, especially in the context of statistical inference. Meanwhile, do NOT use "likelihood" as a synonym for "probability".

Analytic, Numerical, and Simulation Methods in Probability

In working probability problems, one starts off with some given information, and applies one of the things I will here call "methods" to that initial information, to produce conclusions.

"Analytic" methods involve manipulation of mathematical symbols (algebra or calculus), along the lines of solving an equation for the information desired, in terms of the information given. We shall do relatively little of that in this course. Mathematica software (not used in this course) is far better at it than stata, and mathematicians are far better at it than sociologists. This requires accepting a lot of results on faith, but, hey, one can only do so much with probability in a couple of weeks.

"Numerical" methods involve working out the arithmetic for particular cases. This too involves mathematical formulae, but treated more in the plugging-in-numbers style than in the solving equations style. The main challenges here are to select the appropriate formula and to select the appropriate numbers to use; stata is adept at the arithmetic per se.

"Simulation" methods (sometimes called Monte Carlo simulations) constitute a specific type of numerical methods, but deserve special mention. These make use of random number generators, to actually create and apply numerical probabilities. In Stata, we will use the random number generator, uniform(), in the generate command.

These three types of methods get used across the board in probability, whether in theorizing about social phenomena, in designing studies, or in interpreting data; and across the various probability interpretations, whether frequentist, subjectivist, or whatever.

Where do numerical probability values come from?

Discussions of probability do not stop at the abstract level of p(A) and p(B), but go on to specific numerical values, such as .49 or 1/4, or at least to intervals, such as ".05 or less". Where do such numerical values come from?

In fact, numerical values for probabilities arise in several different ways, here labeled "construction", "estimation", and "hypothesis". We will consider sociologically relevant examples of each.

Construction

We will consider three examples of devices or procedures that were constructed to produce specified probabilities. Our examples involve mechanical, electronic, and linguistic devices respectively.

Mechanical Construction: Draft Lottery.

Most lotteries, and other "games of chance" are of little sociological interest, and for that reason are not covered here. One major exception was the draft lottery near the end of the Vietnam war, which at least arguably had major social impact as well as affecting the lives of the individuals in the lottery.

Specifically, it has been argued that the lottery served to defuse much of the anti-war protest then rampant, by removing from risk most of the young men whose lives had been put on hold (with unpleasant results for families, employers, and girlfriends as well as the men themselves) pending resolution of their postponed obligations regarding military service. We shall not evaluate that argument, but merely mention it as the reason for discussing the draft lottery, while omitting, as of insufficient sociological relevance, such usual illustrations of probability theory as card games and coin-tossing gambles.

The first draft lottery, in 1970, included all of those who had college student deferments, some of whom had been in college for much of a decade. (Each subsequent lottery included only the latest cohort of 18 year olds.)

The lottery involved putting the 366 days of the (possibly leap) year on pieces of paper, mixing them, and selecting and recording them one by one. Men were to be drafted in the order that their birthdays had been chosen in the lottery, which meant that those whose birthdays had been chosen early would be drafted, and those whose birthdays had been chosen late would not.

The lottery was to be "fair", which, among other things, requires that no day have higher probability than any other day of being chosen first. Any particular one of the 366 days was to have the same probability, 1/366, of being selected first.

Thorough stirring is critical to attaining such equal probabilities. Without such stirring, the dates placed in the container last would remain near the top, and have higher probabilities of being selected than the dates placed in the container earlier, which would remain near the bottom.

The physical apparatus used in the lottery was constructed in a manner aimed at producing equal probabilities. The dated slips of paper were inserted in identical plastic capsules, which would be easier to stir thoroughly than the slips of paper themselves. The container used was a cylindrical drum, rather than say a box, whose square corners might hinder thorough stirring.

Although less than completely successful, the design and construction of the lottery device did produce approximately equal probabilities. For further discussion of the draft lottery, see Fienberg (1971).

Two points should be noted here: Certain features of constructed devices (e.g., shapes of capsules and the drum in which they were stirred) affect the outcomes they produce. And devices can be constructed to yield specified probabilities.

Electronic Construction: Random Number Generators

Paper slips in capsules, tumbled in a large drum, is a spectacle that can be watched and (at some level) understood by the average television viewer, and that is important for such matters as regaining public confidence to end a political crisis. But probabilities are commonly produced by other, less photogenic, means.

A random number table is a sequence of digits, constructed such that, regardless of what digits have come previously, the next digit has probability 1/10 each, of being 0, 1, 2, ..., or 9.

An entire book of random numbers was published by the Rand Corporation (1955), and selected pages from it are still commonly published as appendices in statistics textbooks.

These days, random numbers will commonly be used in a computer, and thus it is less trouble to also generate them in the same computer, rather than look them up in a book and type them in at a keyboard, or try to read them in using a scanner.

In stata, random numbers are generated using, naturally, the "generate" command. The "uniform()" option generates numbers between 0 and 1, giving more decimal places than would usually be of any use. If integers are desired, instead of decimal fractions, one can generate the decimal fractions, multiply by 10 (or 100, etc.), and discard all but the integer parts.

Remark: The distinction is not usually important, but just for the record: These are actually "pseudo-random" numbers, from an implementation of Marsaglia's KISS algorithm, rather than truly random numbers. By default, the exact same sequence of digits is generated in every stata session. However, the "set seed" command will override that default if so desired. (Also, the sequence of digits itself would repeat after the number of digits produced reaches a number too large to have a name, which is roughly 8 followed by 37 zeros. To repeat, the distinction is not usually important.)

Linguistic Construction: Probabilities and Percentiles

Probabilities are also sometimes constructed by manipulation of words.

A household has probability .25 of being in the top quartile of household income.

A child has probability .5 of scoring below the median on a test.

These and similar examples indicate something about how the language is constructed, what "median" and "quartile" mean. Appearances notwithstanding, they provide no information about income or test scores.

Not all tautologies are useless, however. Far from it! The most useful kind have the form, "If A, B, and C are true, then D is also true", with each of A, B, and C, being easily understood, but with D being more complex and not so obviously related to the others. Such tautologies are very important for theorizing in sociology as well as other fields. We shall explore various examples below, in conjunction with the concept of conditional probability

Estimation

The examples cited under the "Construction" heading were for situations where the investigator is designing some process, has in mind a desired probability, and needs a device or procedure that will provide the desired probability.

Next we turn to situations where the investigator is observing some ongoing social process and conceptualizes it as probabilistic, but then wishes to estimate the numerical values of the relevant probabilities, using empirical data.

Estimation: Proportion of Cases

We have already discussed use of an observed proportion as an estimate of the underlying probability. For example, in the discussion of births, the estimated p(female) = .49 is of this type.

Various statistical and substantive questions arise in making such estimates, and they will be dealt with later. Here we consider, in a preliminary way, two important questions:

How many cases are needed to provide an accurate estimate of the probability? The statistical theory needed to answer such questions will be considered in detail later this quarter, in conjunction with sampling distributions. For now, 100 cases or a few hundred would be sufficient for most purposes, although not for those where unusual precision is required (e.g., voter preferences before a close election) or for especially rare events such as riots (e.g., Spilerman 1970).
Which cases experience the "identical conditions" mentioned above, in the discussion of probability as long-run relative frequency? This is not just a statistical question, but also a substantive question, requiring the expert judgement of a sociologist.
Example: Is the probability that someone experiences a particular career pattern (e.g., changing occupations over the course of a decade) the same for everyone, and thus suitable for straightforward estimation by the proportion who experience that career pattern? In some of my own early work (McFarland 1970) I raised the possibility of a heterogeneous population and derived some of its consequences, but back then we did not know how to deal with such problems in anything beyond very simple cases. Methods applied more recently will be discussed shortly, under the heading, "Estimation: Parameters of Functions".

Estimation: Degree of Belief

"Relations of trust" constitute an important component of "social capital" in the rational action formulation due to James Coleman (1988). In elaborating this aspect of his theory, Coleman (1990, especially pp. 97-104) places major emphasis on p = the probability that the trustee is trustworthy, as assessed by the actor faced with a decision to trust or not. In many cases, there are few or no data relevant to calculating frequentist probability estimates, so it comes to degree-of-belief probability assessment.

Coleman does not lay out a step-by-step procedure for estimating such probabilities. Perhaps this is because the estimation of subjective probabilities is a topic that had already been worked on extensively, primarily by psychologists (see Luce et al. 1965; Coombs et al. 1970).

However, at some points Coleman (1990, especially page 103) seems to regard the probability of trustworthiness as a manipulable variable, to possibly be modified by collecting more information, rather than as a fixed quantity merely to be estimated. The idea is that collecting additional information could move the actor's subjective probability away from the critical point at which the decision could go either way, thereby increasing his or her degree of certainty that the decision being made (to trust or not) is the correct one. According to this viewpoint, as long as the actor is comfortable with his or her decision, the precise magnitude of the probability is not important.

Another possibility is that, as a sociologist, Coleman is more interested in the social determinants of trustworthiness than the numerical magnitude that some actor subjectively attributes to its probability. That would certainly be consistent with his approach to high school dropouts, which in the same 1988 paper he treats as the dependent variable of a logistic regression (one of the functions whose estimation will be discussed in the next section).

Furthermore, Coleman (1990, pages 94-95) does review time series data from 1966 through 1980 on the percent of survey respondents expressing a great deal of confidence in the people running nine types of major institutions in the U. S.

Remark: The GSS has many such items, for example CONCOURT measuring confidence in courts and the legal system. This scale goes from 1 = complete confidence, to 5 = no confidence at all. It could be recoded as follows:

          0 = no confidence at all
        .25 = very little confidence
        .50 = some confidence
        .75 = a great deal of confidence
          1 = complete confidence

This would at least provide scores between 0 and 1, and have the 0 at the proper end of the scale. It might or might not also be useful to think of those as estimates of the respondents' subjective probabilities of, say, the court deciding a particular case the way the respondent thinks is correct.

Remark: Anyone intending to collect respondents' subjective probabilities should be aware of two potential hazards. First, psychologists have found a systematic tendency wherein people's subjective probabilities are underestimates of large objective probabilities, and overestimates of small objective probabilities (Luce et al. 1965, page 322; Coombs et al. 1970, page 136). Second, in situations with more than two possible outcomes, one should not assume that the numbers people assign to the various outcomes will satisfy the additivity axiom, unless taking special steps to guarantee it.

Estimation: Parameters of Function

Thus far we have dealt with "the" probability of an event, a single number that might have been either stated in an hypothesis, or estimated from some data, or designed into some mechanism constructed to produce randomized outcomes. Next we consider the situation where the probability of a particular event takes on various numerical values, depending on some other events or variables. The probability of attending college, for example, is higher for someone whose parents attended college than for someone whose parents did not. Similarly, the probability of dying within the next year varies systematically with age.

We now turn to directly consider something alluded to several times above: treatment of a probability not as a fixed number to be estimated, but as an entire function whose parameters are to be estimated. Schematically, it might be represented as:

p = f(circumstances)

where p is the probability of some outcome under consideration, and the circumstances are taken to vary over the combinations occurring in available data, and possibly others as well.

Once the parameters were estimated, the right hand side could be used to calculate a predicted probability of the outcome under consideration, by plugging any combination of circumstances into the right side and carrying out the arithmetic. The result, however, might not be between 0 and 1, as is required of probabilities. This problem can be avoided by predicting not the probability itself, but a transformation thereof which can have any real value, namely the logit transformation. A predicted logit can then be transformed back into a predicted probability, using the inverse transformation.

logit(p) = ln((p/(1-p))

p = 1/(1+exp(-logit(p)))

Example: Stepan-Norris and Zeitlin (1991) considered the probability that a union contract was "pro-labor", and asked whether that probability differed, between communist-led unions and other unions. Their logit analysis involved transformation of probability to odds, thence to log of odds.

Example: The Coleman (1988) paper, discussed above in connection with trust, also included a logistic regression, with the probability of high school dropout taken to depend on various circumstances, including family socioeconomic status, ethnicity, number of siblings, and presence of both parents in the household.

Example: Kleinman, Boyd, and Heritage (1997) studied the probability that a physician would adhere to prescribed explicit criteria, and sought to determine how that probability was affected by various circumstances, including age, gender, severety and duration of the case. They used the Huber variant of logistic regression which also takes account of multiple cases involving the same physician.

Example: Berk et al. (1992) studied whether the probability of recedivism in domestic violence cases was affected by the way the previous violence was handled, as well as other circumstances. They too did logistic regression, but in discussing parameter estimates emphasized Bayesian confidence intervals rather than specific numerical estimates. (Bayesian statistics will be discussed several places, next in our treatment of conditional probability.)

Hypothesis

We have considered numerical probability values that arise by construction of devices to produce desired probabilities, and probability estimates that are calculated from empirical data. Next we turn to the third category, the numerical values that arise from hypotheses.

Hypothetical values of probabilities are often used in exercises that focus on the consequences of probabilities. Sometimes this is done to understand more fully the workings of a theory that involves multiple, interrelated hypotheses. Other times it is done to work out empirical predictions that, if contrary to observations, would discredit the hypotheses. Both types involve "What if...?" types of calculations.

Hypothetical probabilities are sometimes regarded as approximately true, other times regarded as clearly counterfactual, and still other times approached with an open mind (a "diffuse prior", in Bayesian terminology) and intention to evaluate them empirically.

Furthermore, hypothetical probabilities are sometimes regarded as illustrative, rather than descriptive of some particular set of circumstances. Illustrative probabilities are outside the realm of true/false, unless or until they are given specific empirical referents.

Let us consider illustrative probabilities first, as we turn now to some examples.

Illustrative probability values do not refer to specific events, but rather illustrate how interrelated processes work. For example, one item in Assignment 4 deals with hypothetical people making critical career decisions. The value p=.5 used there is only illustrative; the statement does not mention any specific set of people making specific decisions. The purpose of that assignment is to illustrate how inequalities can arise despite initially equal conditions, a modest bit of sociological theorizing. For that purpose, some other value such as p=.6 would have served about as well, although a value such as p=.51423 would have made the arithmetic messier.

Illustrative probability values might turn out to be inappropriate, if one turns from mere illustration to applying them in specific circumstances. For example, one important career decision an individual may face is which of several universities to attend, and someone deciding among five universities might be more interested in using p=.2, rather than p=.5, if he or she did any such calculations.

Other hypothetical probabilities are believed, at least approximately, but the investigator is more interested in working out some of their implications, than in obtaining more precise empirical estimates. This may involve sensitivity considerations, with the investigator having reason to expect that small inaccuracies in the probabilities used in the calculations would produce only small inaccuracies in the conclusions. Furthermore, in some cases, whatever losses are incurred by using probability values that are not quite accurate may be counterbalanced with gains in simplicity.

For example, in another paper (McFarland 1970b), I studied the effects of group size on the probability of unequal numbers of males and females, using p(female) = .5 throughout. The effects for which I was looking were sizeable sexual imbalances, which would have been little affected by instead using p(female) = .49 or some even more precise value. That would, however, have precluded one algebraic deduction and made the arithmetic more extensive as well as somewhat messier.

Sometimes one uses hypothetical probability values which are flatly disbelieved. Their implications may be calculated anyhow, perhaps to better understand a theory of which the probabilities in question play a part.

For example, Oliver and Glick (1982) asked: What would happen if various career-related probabilities for blacks were to change, and become identical to those for whites? They were not suggesting that that had happened, nor that it had much prospect of happening in the next couple of decades; far from it. Rather, they used it in arguing that affirmative action programs needed to be continued, indeed needed to be made even stronger. Key to that argument was the conclusion of their hypothetical probability exercies: Even if mobility probabilities were to suddenly be equalized, it would take several generations to leave behind the disadvantage of Blacks' unfavorable social origins.

For another example, Mare, Winship and Kubitschek (1984) wanted to assess the probability that students would be unemployed if they were not in school, a clearly counterfactual proviso. Yet the "What if...?" question is quite appropriate, in this case as in the previous example.

Hypothetical probability values are often questioned, rather than being either believed or disbelieved. This commonly arises in empirical work, particularly in cases with limited amounts of data, where appropriate cases are too few to warrant merely calculating the proportion of cases that experience some event, and taking it as that event's probability.

But in statistical inference the questioning of hypothetical probabilities is not mere skepticism. Instead, implications of particular hypothetical values are worked out, with the aim of assessing how well those implications match with empirical observations, thereby assisting in making inferences about the unknown values of probabilities.

Often the questioning involves working out implications of a null hypothesis, whose specific form will vary, but which generally suggests that in the particular phenomenon under study, nothing sociologically interesting is going on. Often the data are compatible with such a null hypothesis, especially when datasets are small. Statistical inference will be our subject most of the remainder of this quarter.

References

Baron, James N., David B. Grusky, and Donald J. Treiman, eds. 1996. Social Differentiation and Social Inequality. Boulder, Colorado: Westview Press.

Berk, Richard A., Alec Campbell, Ruth Klap, and Bruce Western 1992. "The Deterrent Effect of Arrest in Incidents of Domestic Violence: A Bayesian Analysis of Four Field Experiments." American Sociological Review 57 (October): 698-708. [Available on jstor.] [Also see related articles in the same issue.]

Camilleri, Santo F. 1962. "Theory, Probability, and Induction in Social Research." American Sociological Review 27 (#2, April): 170-178. [Available on jstor.]

Coleman, James S. 1988. "Social Capital in the Creation of Human Capital." American Journal of Sociology 94 (Supplement): S95-S120. [Available on jstor]

Coleman, James S. 1990. Foundations of Social Theory. Cambridge, Mass.: Harvard University Press. Chapter 5, "Relations of Trust".

Coombs, Clyde H., Robyn M. Dawes, and Amos Tversky 1970. Mathematical Psychology. Englewood Cliffs: Prentice Hall. Chapter 5, "Individual Decision Making".

Deutsch, Karl W., and William G. Madow. 1961. "A Note on the Appearance of Wisdom in Large Bureaucratic Organizations." Behavioral Science 6: 72-79.

Fienberg, Stephen E. 1971. "Randomization and Social Affairs: The 1970 Draft Lottery." Science 171: 255-261.

Kleinman, Lawrence C., Elizabeth A. Boyd, and John C. Heritage. 1997. "Adherence to Prescribed Explicit Criteria During Utilization Review." Journal of the American Medical Association 278 (#6, 13 August): 497-501.

Lorge, Irving, and Herbert Solomon. 1955. "Two Models of Group Behavior in the Solution of Eureka-type Problems." Psychometrika 20: 139-148.

Luce, R. Duncan, and Patrick Suppes 1965. "Preference, Utility, and Subjective Probability." Ch. 19, pp. 249-410, in: Handbook of Mathematical Psychology, Volume 3, edited by R. Duncan Luce, Robert R. Bush, and Eugene Galanter. New York: Wiley.

Mare, Robert D. 1981. "Change and Stability in Educational Stratification." American Sociological Review 46: 72-87. [Available on jstor]

Mare, Robert D., Christopher Winship, and Warren N. Kubitschek 1984. "The Transition from Youth to Adult: Understanding the Age Pattern of Employment." American Journal of Sociology 90: 326-358. [Available on jstor]

Mason, William M., William Lavely, Hiromi Ono, and Angelique Chan 1996. "The Decline of Infant Mortality in China: Sichuan, 1949-1988." Ch. 6, pp. 153-207 in: Baron, Grusky and Treiman (1996).

McFarland, David D. 1970a. "Intragenerational Social Mobility as a Markov Process: Including a Time-Stationary Markovian Model That Esxplains Observed Declines in Mobility Rates over Time." American Sociological Review 35 (June): 463-476. [Available on jstor] [Also see 1974 comment by Larry Schroeder and reply by McFarland in ASR 39: 883-885.]

McFarland, David D. 1970b. "Effects of Group Size on the Availability of Marriage Partners." Demography 7(November): 411-415. [Available on jstor]

Oliver, Melvin L., and Mark A Glick. 1982. "An Analysis of the New Orthodoxy on Black Mobility." Social Problems 29(No. 5, June): 511-523.

Rand Corporation. 1955. A Million Random Digits With 100,000 Normal Deviates. New York: Free Press.

Spilerman, Seymour 1970. "The Causes of Racial Disturbances: A Comparison of Alternative Explanations." American Sociological Review 35: 627-649. [Available on jstor.] [Also see comment and reply in ASR August 1972, pp. 490ff.]

Stepan-Norris, Judith, and Maurice Zeitlin 1991. "'Red' Unions and 'Bourgeois' Contracts?" American Journal of Sociology 96 (#5, March): 1151-1200. [Available on jstor.]