6nov2000

UCLA Soc. 210A, Topic 7, Sampling Distributions and Estimation

Professor: David D. McFarland


Web Pages for Fall 2000



Topic 7: Sampling Distributions and Estimation

We have already looked at probability distributions in the context where the social phenomenon under observation, not the observation process, was conceptualized as probabilistic. We will now review them, and then proceed to probability distributions that arise in the observation process itself, as when we select cases to interview, and that are used in inference: reasoning from observed data to inferences (guesses) about the population or process from which those data arose.

Some distributions generated by social processes

Some distributions utilized in inference

Point Estimation

Next we focus directly on estimation, a transition topic between descriptive and inferential statistics, at least insofar as a quantity calculated from data is interpreted as (an estimate of) the corresponding quantity for some larger population.

Ordinarily the investigator selects only a single sample, not the large number of replications suggested by the imagery of sampling distributions. Moreover, the investigator has no control over which of the outcomes in the sampling distribution he or she happens to get. Thus the strategy is to arrange the sampling distribution, which can be controlled (through the sampling design and the sample size), in such a manner that the vast majority of the possible samples would, if they happened to be the one actually selected, yield suitably accurate inferences about the population being sampled.
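The strategy above can be illustrated by simulation. The sketch below is a hypothetical illustration, not part of the course materials: it assumes a population in which a proportion 0.8 has some attribute, draws many samples of each of several sizes, and reports what share of those samples give a sample proportion within 0.04 of the true value. The function name and the numbers are illustrative assumptions.

```python
import random

def share_of_accurate_samples(pop_p, n, tolerance, reps=1000, seed=1):
    """Share of `reps` simulated samples of size n whose sample proportion
    falls within `tolerance` of the population proportion pop_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Assumption: each sampled case independently has the attribute with
        # probability pop_p (random sampling from a very large population).
        successes = sum(rng.random() < pop_p for _ in range(n))
        if abs(successes / n - pop_p) <= tolerance:
            hits += 1
    return hits / reps

# Hypothetical population proportion 0.8 and tolerance 0.04.
for n in (25, 100, 400, 1600):
    print(n, share_of_accurate_samples(0.8, n, tolerance=0.04))
```

The investigator cannot choose which sample turns up, but can choose n: as n grows, the share of possible samples that land within the tolerance rises toward 1.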

Specifically, if a particular statistic in the sample is to be used as an estimate of a parameter in the population, one would like:

Sometimes the sample counterpart of a population parameter is unbiased. In examining sample statistics from the TVHOURS variable, this appeared to be the case for the sample mean, but possibly not for the sample standard deviation, and certainly not for the sample maximum. The results from the five samples we examined do in fact generalize:

The standard error of a sample estimator:

Here it might be noted that reliability, in something like the sense used in statistics, is also a concern of at least some nonquantitative sociologists. Katz (1982) rejects such quantitative formulations, but suggests how similar concerns may be addressed in a style of research known as Analytic Induction. In this course we espouse careful use, not rejection, of quantitative tools. Sampling theory does not solve all our conceptual problems, particularly those involving indefinite, theoretically relevant populations from which the data at hand were not randomly selected; but it does tell us how to obtain data representative of a population from which they were randomly selected, and how much data is required to provide specified levels of reliability.
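The comparison of the sample mean, sample standard deviation, and sample maximum as estimators can be checked by simulation. This is a sketch under stated assumptions: the population below is a hypothetical, skewed set of 10,000 daily TV-hours values generated at random, not the actual TVHOURS data, and the sample size of 10 is arbitrary. Averaging each statistic over many samples approximates the center of its sampling distribution.

```python
import random
import statistics

def average_of_estimates(population, n, reps=5000, seed=2):
    """Average, over `reps` simple random samples of size n, of the sample
    mean, sample standard deviation (n - 1 divisor), and sample maximum."""
    rng = random.Random(seed)
    means, sds, maxes = [], [], []
    for _ in range(reps):
        sample = rng.sample(population, n)  # without replacement
        means.append(statistics.mean(sample))
        sds.append(statistics.stdev(sample))
        maxes.append(max(sample))
    return statistics.mean(means), statistics.mean(sds), statistics.mean(maxes)

# Hypothetical skewed population of whole-hour values, capped at 24.
pop_rng = random.Random(0)
population = [min(24, int(pop_rng.expovariate(1 / 3))) for _ in range(10_000)]

avg_mean, avg_sd, avg_max = average_of_estimates(population, n=10)
print("mean:", statistics.mean(population), "average of sample means:", avg_mean)
print("sd:  ", statistics.pstdev(population), "average of sample sds:", avg_sd)
print("max: ", max(population), "average of sample maxima:", avg_max)
```

In runs like this the average sample mean sits essentially on the population mean, the average sample standard deviation falls somewhat short of the population value, and the average sample maximum falls well short of the population maximum.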

Interval Estimation

Instead of estimating some population parameter with a single number based on sample data, an investigator sometimes prefers to use sample data to calculate the endpoints of an interval, in such a manner that the interval has a high probability of including the true value of the parameter being estimated. Such an interval is referred to as a confidence interval. Its endpoints are called confidence limits. The probability, computed before the sample is drawn, that the procedure yields an interval containing the true parameter value is called the confidence level.
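Because the confidence level describes the procedure over repeated samples rather than any one computed interval, it can be checked by simulation. A minimal sketch, assuming a hypothetical true proportion of 0.8, samples of n = 400, and the normal-approximation interval p ± 1.96 sqrt[p(1-p)/n] (often called the Wald interval); the function names are illustrative.

```python
import math
import random

def wald_interval(successes, n, z=1.96):
    """Normal-approximation interval for a proportion: p_hat +/- z * SE."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def coverage(true_p, n, reps=2000, seed=3):
    """Fraction of simulated samples whose interval contains true_p."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(reps):
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_interval(successes, n)
        if low <= true_p <= high:
            covered += 1
    return covered / reps

# Hypothetical settings: true proportion 0.8, sample size 400.
print(coverage(0.8, 400))  # typically close to 0.95
```

Any single interval either contains 0.8 or does not; the 95% refers to the long-run share of intervals that do. With small samples, or proportions near 0 or 1, this particular interval's actual coverage can fall noticeably below the nominal level.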

Desirable properties of confidence intervals

Bayesian Interval Estimation

As indicated earlier, Bayes' Rule is a theorem that follows directly from the probability axioms and the definition of conditional probability; it does not depend on any particular interpretation such as degree-of-belief. However, when a statistician is described as a "Bayesian", that ordinarily refers to someone using the degree-of-belief interpretation of probability.

Both frequentists and Bayesians use interval estimates, but they use somewhat different ways of describing them.

Formulae for Interval Estimates

The prototypical formula is for a parameter whose point estimate is unbiased and has a Gaussian sampling distribution, when we wish to calculate an interval estimate instead of a point estimate. Since the standard Gaussian distribution has 95% of its probability between -1.96 and +1.96, we can use as endpoints the values lying 1.96 standard errors above and below the point estimate.
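The reasoning can be written out as a short derivation. Assuming, as above, that the point estimate (written \hat\theta) is unbiased for the parameter \theta and Gaussian with standard error SE, the standardized quantity Z is standard Gaussian, and the event -1.96 <= Z <= 1.96 rearranges into an interval around \hat\theta:

```latex
Z = \frac{\hat\theta - \theta}{\mathrm{SE}} \sim N(0, 1)
\quad\Longrightarrow\quad
\Pr\bigl(\hat\theta - 1.96\,\mathrm{SE} \le \theta \le \hat\theta + 1.96\,\mathrm{SE}\bigr)
= \Pr\bigl(-1.96 \le Z \le 1.96\bigr) \approx 0.95
```

The random quantities are the endpoints, not \theta. When SE itself must be estimated from the sample, as in the proportion example, the 0.95 is an approximation.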

Example: Find a confidence interval for the proportion of all voters favoring a particular measure, based on the proportion of respondents in a sample favoring it. The sample proportion is an unbiased estimate of the population proportion, and it has a standard error of sqrt[p(1-p)/n]. In a sample of n=400, if p has a value near .8, this works out to sqrt[.8(1-.8)/400] = sqrt(.0004) = .02, and 1.96 times that is about .04, yielding an interval from .8-.04 to .8+.04, or .76 to .84. Thus instead of using .8 as a point estimate of the population proportion, one would use .76 to .84 as a 95% confidence interval.
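The arithmetic of the example can be reproduced in a few lines. This is a minimal sketch; the function name is hypothetical, and it carries the unrounded margin 1.96 × .02 = .0392 rather than .04, so the endpoints come out as .761 and .839, which round to the .76 and .84 given above.

```python
import math

def proportion_confidence_interval(p_hat, n, z=1.96):
    """Return (low, high, se) for the interval p_hat +/- z * sqrt(p_hat(1 - p_hat)/n)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the sample proportion
    margin = z * se                          # 1.96 standard errors for 95%
    return p_hat - margin, p_hat + margin, se

low, high, se = proportion_confidence_interval(0.8, 400)
print(f"SE = {se:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
# prints: SE = 0.020, 95% CI = (0.761, 0.839)
```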

Special Values in or out of Confidence Interval

Sometimes one may wish to know whether some special value, typically zero, is in a confidence interval. A value of 0 for some parameter might mean that the patterns in the data are simpler than anticipated, that is, that a simpler formula omitting that parameter will suffice for the data in hand. Such considerations lead directly to the next topic, tests of statistical hypotheses.


Feller, William. 1957. An Introduction to Probability Theory and Its Applications. Volume 1, 2nd edn. New York: Wiley. Section X.5, pages 238-241, "Variable Distributions".

Katz, Jack. 1982. "A Theory of Qualitative Methodology: The Social System of Analytic Fieldwork." Pages 197-218 in: Poor People's Lawyers in Transition. New Brunswick, NJ: Rutgers University Press. Reprinted, pages 127-148 in: Robert M. Emerson, ed. 1988. Contemporary Field Research: A Collection of Readings. Prospect Heights, IL: Waveland Press.

Seltzer, Judith A. 1991. "Legal Custody Arrangements and Children's Economic Welfare." American Journal of Sociology 96 (#4, January): 895-929.