The probability of attending college, for example, is higher for someone whose parents attended college than for someone whose parents did not. Similarly, the probability of dying within the next year varies systematically with age.
Conditional probability considerations also arise in the design of research projects, such as sample surveys, which ordinarily preclude someone who has been sampled already from being sampled a second time in the same wave of the survey. Generally, the probability of any particular outcome at one stage of the sampling depends on what has happened at the previous stages.
From the relative emphasis that statistics books give to independence and to the lack thereof, one might erroneously conclude that independence is the usual state of affairs. Not so in empirical phenomena. Indeed, we social researchers make a living examining the specific forms of dependence among the things we study.
P(A) = P(A|B) P(B) + P(A|notB) P(notB)

and similarly

P(B) = P(B|A) P(A) + P(B|notA) P(notA)

Substituting the latter into the multiplication rule for P(A|B) gives Bayes' rule:

P(A|B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|notA) P(notA)]
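As a numeric check of the formula, Bayes' rule can be applied to made-up numbers. The probabilities below are hypothetical, chosen only for illustration:

```python
# Bayes' rule: P(A|B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|notA) P(notA)]
# All input probabilities here are hypothetical, for illustration only.
p_A = 0.10             # prior probability of A
p_B_given_A = 0.90     # probability of B when A holds
p_B_given_notA = 0.20  # probability of B when A does not hold

# The denominator is P(B), by the law of total probability above.
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)
p_A_given_B = p_B_given_A * p_A / p_B

print(round(p_B, 4))          # 0.27
print(round(p_A_given_B, 4))  # 0.3333
```

Note that even though B is much more probable under A than under notA, the posterior P(A|B) stays well below 1 because the prior P(A) is small.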
This interpretation is illustrated in some cross-classifications from the GSS data, in a Stata do-file and a log of the results.
Before collecting and analyzing any new data, a Bayesian will assess his or her prior probability distribution. A Bayesian making inferences about a population proportion would typically specify the prior as a beta distribution, selecting parameter values so that the resulting distribution reflects his or her best guess about p, and how certain or uncertain he or she is about it. The reason for choosing beta distributions is that they are "conjugate" to the process of estimating a population proportion: a beta prior yields a beta posterior as well, only with shifted parameter values (Leamer pp 40-51; Winkler and Hays pp 498-506). The beta distribution has two parameters, and can, by appropriate choice of parameter values, be made single-peaked, flat, or bimodal, as well as symmetric or skewed in either direction. Thus, confining one's attention to beta distributions is not, in fact, very restrictive, as far as the shape of the distribution is concerned.
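The conjugate updating described above reduces to simple arithmetic on the beta parameters: a Beta(a, b) prior combined with k successes in n trials yields a Beta(a + k, b + n - k) posterior. A minimal sketch, with hypothetical parameter values:

```python
# Conjugate beta-binomial updating: a Beta(a, b) prior plus k successes in
# n trials gives a Beta(a + k, b + n - k) posterior.
# The parameter values below are hypothetical, for illustration only.
a, b = 2, 8    # prior roughly centered near p = .2 (mean = a/(a+b) = .2)
k, n = 4, 10   # observed 4 successes in 10 trials

post_a, post_b = a + k, b + (n - k)     # posterior parameters
prior_mean = a / (a + b)
post_mean = post_a / (post_a + post_b)  # (2+4)/(2+8+10) = .3

print(post_a, post_b)  # 6 14
print(post_mean)       # 0.3
```

The posterior mean, .3, sits between the prior mean (.2) and the sample proportion (.4), with the data and the prior each getting weight in proportion to their "sample sizes."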
Beta distributions, like normal distributions, are for continuous variables, and treatment of such requires calculus. Alas, unlike normal distributions, beta distributions are not widely discussed in basic statistics textbooks, with the calculus already worked out and the numerical values already tabulated. Thus, in this example, instead of a beta distribution, we will use a discrete distribution that requires only arithmetic, but that will give some of the flavor of an actual Bayesian analysis.
For illustration, suppose that, based on similar previous studies or other background knowledge, the Bayesian believes that p is probably around .2 or a little higher, but could be on either side of that. Not specific enough! We need particular numbers to insert in the calculations, so let's pick numbers that are a precise instance of the vague ideas just expressed. Let's make the prior a discrete distribution with positive probabilities only on multiples of .1: give p=.2 the highest prior probability, with smaller positive probabilities for p=.1, .3, and .4, and zero prior probability for any other value of p. For example:
    p    prior
    0      0
   .1     .20
   .2     .40
   .3     .30
   .4     .10
  >.4      0
  sum    1.00

Notice that the numbers in the prior probability column are all non-negative, and sum to 1.0, as required of a probability distribution.
How would those prior probabilities be revised after observing some data? Suppose, for example, 10 cases were observed, and 4 of the 10 had the characteristic being considered. How should the prior probabilities be revised? Bayes' rule gives the formula.
One needs to find the likelihood of the observed 4 in 10, calculated separately using each of the p values. Actually, these can be looked up in tables of the binomial distribution, such as Moore and McCabe pp T8-9, using the parts of the table for the probability of k=4 occurrences out of n=10 trials.
In the column for p=.10 we find L(.1|data) = .0112; in the column for p=.20 we find L(.2|data) = .0881; and similarly for the other likelihoods. Adding them as a third column makes the table:
    p    prior   likelihood
    0      0        0
   .1     .20      .0112
   .2     .40      .0881
   .3     .30      .2001
   .4     .10      .2508
  >.4     --       --
  sum    1.00    (not 1.0)

Notice that, unlike the prior probabilities, the likelihoods do not sum to 1.0. Recall my earlier warning not to treat 'likelihood' as a synonym for 'probability'.
To complete the calculation of posterior probabilities, each likelihood is multiplied by the corresponding prior, giving the 4th column; finally each of those products is divided by their sum, yielding the posterior probabilities in the 5th column.
    p    prior   likelihood   prior x likelihood   posterior
    0      0        0               0                 0
   .1     .20      .0112          .00224             .018
   .2     .40      .0881          .03524             .287
   .3     .30      .2001          .06003             .490
   .4     .10      .2508          .02508             .205
  >.4     --       --              --                --
  sum    1.00    (not 1.0)        .12259            1.000

Remark: The posteriors are shown here with a spurious precision, merely to facilitate a student's working through the calculations. The numbers in the third decimal place are meaningless, and those in the second place also rather doubtful.
Observation of 4 in 10 in the data led to the following revisions: the probability on p=.2 fell from .40 to about .29, while the probability on p=.3 rose from .30 to about .49 and the probability on p=.4 rose from .10 to about .21. The data, with a sample proportion of .4, pulled the distribution toward higher values of p.
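The whole table can be reproduced with a few lines of arithmetic. The sketch below uses the same discrete prior as above and the binomial formula for the likelihood of 4 occurrences in 10 trials:

```python
from math import comb

# Discrete prior on p, as in the table above.
prior = {0.1: 0.20, 0.2: 0.40, 0.3: 0.30, 0.4: 0.10}
k, n = 4, 10  # observed 4 in 10

# Binomial likelihood of the data at each value of p:
# L(p|data) = C(n,k) p^k (1-p)^(n-k).
lik = {p: comb(n, k) * p**k * (1 - p)**(n - k) for p in prior}

# Posterior: prior times likelihood, renormalized to sum to 1.
products = {p: prior[p] * lik[p] for p in prior}
total = sum(products.values())
posterior = {p: products[p] / total for p in prior}

for p in sorted(posterior):
    print(p, round(lik[p], 4), round(posterior[p], 3))
```

Running this recovers the likelihood and posterior columns of the table, without recourse to printed binomial tables.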
To consider sequences we need more notation and concepts.
Example: A classic paper by Lorge and Solomon (1955) provides a model of group decision making with the individuals operating independently.
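In the Lorge and Solomon setup (their "Model A"), if each of k individuals independently solves a problem with probability p, the group succeeds whenever at least one member does, so the group's success probability is 1 - (1 - p)^k. A sketch, with a hypothetical individual success probability:

```python
# Lorge-Solomon "Model A": with k independent individuals, each solving the
# problem with probability p, the group fails only if every member fails.
def group_solve_prob(p, k):
    return 1 - (1 - p) ** k

# Hypothetical individual success probability .3, group sizes 1 through 5.
for k in range(1, 6):
    print(k, round(group_solve_prob(0.3, k), 3))
```

Independence is doing all the work here: each additional member multiplies the probability that everyone fails by another factor of (1 - p).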
Markovian models provide a kind of compromise between overly simplistic independence, on the one hand, and everything-depends-on-everything-else anarchy, on the other hand. Social status depends on one's parents, but not on all the ancestors back to Lucy or Adam and Eve; that sort of thing.
A Markov model takes more possibly relevant information into account than does an independence model, and thus may be closer to reality. But in fact, things may not be that simple either.
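The Markov property says that the next state depends only on the current state, through a fixed matrix of transition probabilities. A minimal sketch, using a hypothetical two-class mobility table (the transition probabilities are invented for illustration):

```python
# Hypothetical two-state mobility chain: rows index the current class,
# columns the next generation's class; entries are transition probabilities.
P = [[0.7, 0.3],   # from class 0: stay with prob .7, move with prob .3
     [0.4, 0.6]]   # from class 1: move with prob .4, stay with prob .6

def step(dist, P):
    """One generation: multiply the class distribution by the transition matrix."""
    return [sum(dist[i] * P[i][j] for i in range(len(P)))
            for j in range(len(P))]

dist = [1.0, 0.0]  # start with everyone in class 0
for _ in range(20):
    dist = step(dist, P)
print([round(x, 3) for x in dist])  # [0.571, 0.429]
```

After enough generations the distribution approaches the chain's stationary distribution (here 4/7, 3/7), regardless of the starting state; this kind of long-run behavior is what the mobility papers below exploit.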
Examples: McFarland (1970a) and Oliver and Glick (1982) treat occupational mobility using Markovian models. Weingart et al. (1999) treat negotiations along similar lines.
Leamer, Edward E. 1978. Specification Searches: Ad Hoc Inference with Nonexperimental Data. New York: Wiley.
McFarland, David D. 1970a. "Intragenerational Social Mobility as a Markov Process: Including a Time-Stationary Markovian Model That Explains Observed Declines in Mobility Rates over Time." American Sociological Review 35 (June): 463-476. [Available on jstor] [Also see 1974 comment by Larry Schroeder and reply by McFarland in ASR 39: 883-885.]
Oliver, Melvin L., and Mark A. Glick. 1982. "An Analysis of the New Orthodoxy on Black Mobility." Social Problems 29 (No. 5, June): 511-523.
Weingart, Laurie R., Michael J. Prietula, Elaine B. Hyder, and Christopher R. Genovese. 1999. "Knowledge and the Sequential Processes of Negotiation: A Markov Chain Analysis of Response-in-Kind." Journal of Experimental Social Psychology 35: 366-393. (Online in idealibrary.)
Winkler, Robert L., and William L. Hays. 1975. Statistics: Probability, Inference, and Decision. 2nd edn. New York: Holt.