18oct2000

Outline:

UCLA Soc. 210A, Topic 5, Conditional Probability

Professor: David D. McFarland




Topic 5: Conditional Probability




In the preceding topic we dealt with "the" probability of an event, a single number that might have been stated in a hypothesis, estimated from some data, or designed into some mechanism constructed to produce randomized outcomes. Here we consider the situation where the probability of a particular event takes on various numerical values, depending on the circumstances or "conditions" under which it occurs. This brings us to Conditional Probability and related matters.

The probability of attending college, for example, is higher for someone whose parents attended college than for someone whose parents did not. Similarly, the probability of dying within the next year varies systematically with age.

Conditional probability considerations also arise in the design of research projects, such as sample surveys, which ordinarily preclude someone who has been sampled already from being sampled a second time in the same wave of the survey. Generally, the probability of any particular outcome at one stage of the sampling depends on what has happened at the previous stages.
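To fix ideas, recall the definition: the conditional probability of A given B is P(A|B) = P(A and B)/P(B). Here is a minimal sketch in Python of the college example above, with made-up counts rather than actual survey figures:

        # Hypothetical counts (not from any actual survey):
        # (parents attended, child attended) -> number of cases
        counts = {("yes", "yes"): 300, ("yes", "no"): 200,
                  ("no",  "yes"): 150, ("no",  "no"): 350}
        total = sum(counts.values())                     # 1000

        # P(child attended | parents attended)
        #   = P(both attended) / P(parents attended)
        p_both = counts[("yes", "yes")] / total          # .30
        p_parents = (counts[("yes", "yes")]
                     + counts[("yes", "no")]) / total    # .50
        print(p_both / p_parents)                        # .60

        # Compare the unconditional probability of attending:
        print((counts[("yes", "yes")]
               + counts[("no", "yes")]) / total)         # .45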


Frequentist Interpretation of Bayes' Rule: Turning a Table Around

Frequentists interpret Bayes' Rule as applicable to the following sorts of problems: The printed table shows what percent of high school dropouts are unemployed, but what you want to know instead is what percent of the unemployed are high school dropouts. Bayes' Rule tells how to turn the table around.

This interpretation is illustrated in some cross-classifications from the GSS data, in a Stata do-file and a log of the results.
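For readers working through this without Stata at hand, the arithmetic of turning a table around can be sketched in a few lines of Python. The numbers below are made up for illustration, not the actual GSS figures:

        # Hypothetical inputs (not the actual GSS figures):
        p_dropout = 0.15                # P(dropout)
        p_unemp_dropout = 0.12          # P(unemployed | dropout)
        p_unemp_grad = 0.04             # P(unemployed | not dropout)

        # Total probability of being unemployed:
        p_unemp = (p_unemp_dropout * p_dropout
                   + p_unemp_grad * (1 - p_dropout))     # .052

        # Bayes' Rule turns the table around:
        p_dropout_unemp = p_unemp_dropout * p_dropout / p_unemp
        print(round(p_dropout_unemp, 3))                 # .346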


Degree-of-belief Interpretation of Bayes' Rule: Revision of Prior Beliefs, in Light of New Evidence

Consider the problem of estimating p, the proportion of a population with a particular characteristic.

Before collecting and analyzing any new data, a Bayesian will assess his or her prior probability distribution. A Bayesian making inferences about a population proportion would ordinarily specify the prior in terms of a beta distribution, selecting parameter values that yield a particular beta distribution reflecting his or her best guess about p, and how certain or uncertain he or she is about it. The reason for choosing beta distributions is that they are "conjugate" to the process of estimating a population proportion, which means that a beta prior yields a beta posterior as well, with only the parameter values changed (Leamer pp. 40-51; Winkler and Hays pp. 498-506). The beta distribution has two parameters, and can, by appropriate choice of parameter values, be made single-peaked, flat, or bimodal, as well as centered or skewed in either direction. Thus, confining one's attention to beta distributions is not, in fact, very restrictive, as far as the shape of the distribution is concerned.
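To see the conjugacy at work: if the prior for p is Beta(a, b), and one then observes k cases with the characteristic out of n sampled, the posterior is Beta(a+k, b+n-k). A minimal sketch in Python, with made-up parameter values:

        # Conjugate updating for a population proportion p:
        # a Beta(a, b) prior plus k "successes" in n binomial
        # trials yields a Beta(a + k, b + n - k) posterior.
        def update_beta(a, b, k, n):
            return a + k, b + n - k

        a, b = 2, 8            # made-up prior, mean a/(a+b) = .2
        k, n = 4, 10           # the data used in the example below
        a_post, b_post = update_beta(a, b, k, n)
        print(a_post / (a_post + b_post))    # posterior mean .3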

Beta distributions, like normal distributions, are distributions of continuous variables, and their treatment requires calculus. Alas, unlike normal distributions, beta distributions are not widely discussed in basic statistics textbooks, with the calculus already worked out and the numerical values already tabulated. Thus, in this example, instead of a beta distribution, we will use a discrete distribution that requires only arithmetic, but that will give some of the flavor of an actual Bayesian analysis.

For illustration, suppose that based on similar previous studies or whatever, the Bayesian believes that p has a value that is probably around .2 or a little higher, but could be on either side of that. Not specific enough! We need particular numbers to insert in the calculations, so let's pick some particular numbers that are a precise instance of the vague ideas just expressed. Let's make the prior a discrete distribution, with positive probabilities only on multiples of .1: give p=.2 the highest prior probability, smaller positive probabilities to p=.1, .3, and .4, and zero prior probability to any other value of p. For example:

        p       prior

        0       0
        .1      .20
        .2      .40
        .3      .30
        .4      .10
        >.4     0

        sum     1.00

Notice that the numbers in the prior probability column are all non-negative, and sum to 1.0, as required of a probability distribution.

How would those prior probabilities be revised after observing some data? Suppose, for example, that 10 cases were observed, and 4 of the 10 had the characteristic being considered. Bayes' rule gives the formula: the posterior probability of each value of p is proportional to its prior probability times the likelihood of the observed data under that value.

One needs to find the likelihood of the observed 4 in 10, calculated separately using each of the p values; for a given p this is the binomial probability C(10,4) p^4 (1-p)^6. Alternatively, these can be looked up in tables of the binomial distribution, such as Moore and McCabe pp. T8-9, using the parts of the table for the probability of k=4 occurrences out of n=10 trials.

In the column for p=.10 we find L(.1|data) = .0112; in the column for p=.20 we find L(.2|data) = .0881; and similarly for the other likelihoods. Adding them as a third column makes the table:

 
        p       prior   likelihood

        0       0       0
        .1      .20     .0112
        .2      .40     .0881
        .3      .30     .2001
        .4      .10     .2508
        >.4     --      --

        sum     1.00    (not 1.0)

Notice that, unlike the prior probabilities, the likelihoods do not sum to 1.0. Recall my earlier warning, not to treat 'likelihood' as a synonym for 'probability'.
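If binomial tables are not at hand, the likelihood column can be computed directly from the binomial formula. A sketch in Python:

        from math import comb

        k, n = 4, 10
        for p in (0.1, 0.2, 0.3, 0.4):
            # binomial probability of k occurrences in n trials
            likelihood = comb(n, k) * p**k * (1 - p)**(n - k)
            print(p, round(likelihood, 4))
        # prints .0112, .0881, .2001, .2508, matching the table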

To complete the calculation of posterior probabilities, each likelihood is multiplied by the corresponding prior, giving the 4th column, and finally each of those products is divided by their sum, yielding the posterior probabilities in the 5th column.

                                        prior x
        p       prior   likelihood      likelihood      posterior

        0       0       0               0               0
        .1      .20     .0112           .00224          .018
        .2      .40     .0881           .03524          .287
        .3      .30     .2001           .06003          .490
        .4      .10     .2508           .02508          .205
        >.4     --      --              --              --

        sum     1.00    (not 1.0)       .12259          1.000

Remark: The posteriors are shown here with a spurious precision, merely to facilitate a student's working through the calculations. The numbers in the third decimal place are meaningless, and those in the second place also rather doubtful.
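The whole prior-to-posterior calculation can also be checked with a few lines of code. This sketch reproduces the table above:

        from math import comb

        prior = {0.1: 0.20, 0.2: 0.40, 0.3: 0.30, 0.4: 0.10}
        k, n = 4, 10

        # multiply each prior by the likelihood of the data ...
        product = {p: w * comb(n, k) * p**k * (1 - p)**(n - k)
                   for p, w in prior.items()}

        # ... then divide by the sum of the products to normalize
        total = sum(product.values())            # about .12259
        posterior = {p: v / total for p, v in product.items()}

        for p in sorted(posterior):
            print(p, round(posterior[p], 3))     # .018 .287 .490 .205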

Observation of 4 in 10 in the data led to the following revisions:

        p       prior   posterior

        .1      .20     .018
        .2      .40     .287
        .3      .30     .490
        .4      .10     .205

The data pulled probability toward the larger values of p: the most probable value is now p=.3 rather than p=.2, and p=.4 has become far more plausible than it was a priori.


Sequences of Events

Many phenomena of sociological interest come in sequences, rather than one-time occurrences. Schooling is completed one year at a time rather than in a single selection (Mare 1981). A career consists not of a single job, but of a sequence of related jobs, with increasing rewards, each building on the experience gained in the previous jobs. Intergenerational mobility takes place over multiple generations, not just one or two. Negotiations consist of a sequence of proposals and counter-proposals.

To consider sequences we need more notation and concepts.

Independence, as noted when it was first mentioned, is a very special circumstance, seldom found in observational settings. Most examples of independence are in constructed settings, such as experimental laboratories where different people record their individual decisions prior to discussing them.

Example: A classic paper by Lorge and Solomon (1955) provides a model of group decision making with the individuals operating independently.
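In their simplest model, each member of a group of size k solves the problem with the same probability p, independently of the others, and the group solves it whenever at least one member does. A minimal sketch, with made-up values of p and k:

        # Under independence, P(no member solves) = (1 - p)**k,
        # so P(at least one solves) = 1 - (1 - p)**k.
        def p_group_solves(p, k):
            return 1 - (1 - p)**k

        print(round(p_group_solves(0.2, 1), 3))   # .2, a lone individual
        print(round(p_group_solves(0.2, 5), 3))   # .672, a group of five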

Markovian models provide a kind of compromise between overly simplistic independence, on the one hand, and everything-depends-on-everything-else anarchy, on the other hand. Social status depends on one's parents, but not on all the ancestors back to Lucy or Adam and Eve; that sort of thing.

A Markov model takes more possibly relevant information into account than does an independence model, and thus may be closer to reality. But in fact, things may not be that simple either.

Examples: McFarland (1970a) and Oliver and Glick (1982) treated occupational mobility using Markovian models. Weingart et al. (1999) treat negotiations along similar lines.
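As a minimal sketch of the Markovian idea (the transition matrix below is hypothetical, not taken from any of the papers cited): each row gives the probability distribution of the next state, conditional only on the current state, and iterating the matrix carries the process forward one generation at a time.

        # Hypothetical transition matrix for three occupational
        # strata; rows are parent's stratum, columns are child's.
        P = [[0.6, 0.3, 0.1],     # upper
             [0.3, 0.4, 0.3],     # middle
             [0.1, 0.3, 0.6]]     # lower

        def step(dist, P):
            # one generation: new_j = sum over i of dist_i * P[i][j]
            return [sum(dist[i] * P[i][j] for i in range(len(P)))
                    for j in range(len(P))]

        # start everyone in the lower stratum, iterate 5 generations
        dist = [0.0, 0.0, 1.0]
        for g in range(5):
            dist = step(dist, P)
            print(g + 1, [round(x, 3) for x in dist])
        # the distribution settles toward the chain's stationary
        # distribution, regardless of the starting stratum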




References

Berk, Richard A., Alec Campbell, Ruth Klap, and Bruce Western. 1992. "The Deterrent Effect of Arrest in Incidents of Domestic Violence: A Bayesian Analysis of Four Field Experiments." American Sociological Review 57 (October): 698-708. [Available on jstor.] [Also see related articles in the same issue.]

Leamer, Edward E. 1978. Specification Searches: Ad Hoc Inference with Nonexperimental Data. New York: Wiley.

McFarland, David D. 1970a. "Intragenerational Social Mobility as a Markov Process: Including a Time-Stationary Markovian Model That Explains Observed Declines in Mobility Rates over Time." American Sociological Review 35 (June): 463-476. [Available on jstor.] [Also see 1974 comment by Larry Schroeder and reply by McFarland in ASR 39: 883-885.]

Oliver, Melvin L., and Mark A. Glick. 1982. "An Analysis of the New Orthodoxy on Black Mobility." Social Problems 29 (No. 5, June): 511-523.

Weingart, Laurie R., Michael J. Prietula, Elaine B. Hyder, and Christopher R. Genovese. 1999. "Knowledge and the Sequential Processes of Negotiation: A Markov Chain Analysis of Response-in-Kind." Journal of Experimental Social Psychology 35: 366-393. [Online in idealibrary.]

Winkler, Robert L., and William L. Hays. 1975. Statistics: Probability, Inference, and Decision. 2nd edn. New York: Holt.