University of California, Los Angeles
Winter 1998

seminar: FROM DATA TO PICTURES: GRAPHICAL REPRESENTATION OF QUANTITATIVE INFORMATION

Sociology 219A (for graduate students), 747-115-200
and 88b (for undergraduates), 347-261-200

Description: Graphical representation of quantitative information, for both the discovery of patterns in data and the presentation of such findings, on paper and on the World Wide Web. Aesthetic and representational principles which have been proposed for good graphic design. Emphasis on graphs depicting relationships among two or more variables (one of which may be "time"). Each section's coverage will begin with underlying theory, illustrated by examples of well- and poorly designed graphs, and proceed to the preparation and display of new graphs (based on data provided by students or the professor), ranging from freehand sketches to computer graphics.

Schedule: Tuesdays 11:00 - 1:50, PubPol 2292. (Some sessions will be held in one of the Social Science Computing labs, in PubPol 2035.)

Professor: David D. McFarland, 257 Haines Hall, phone 825-6380.
Email address: mcfarland@soc.ucla.edu
Web page: http://www.sscnet.ucla.edu/soc/faculty/mcfarland
Messages may be left with the secretaries in the main Sociology Department office, 264 Haines, phone 825-1313.
Office hours: Tuesdays after class, and by appointment, in 257 Haines Hall.

Requirements:
(1) Before each session, read the assigned selections and come prepared to discuss them.
(2) Each week bring in at least one graph related to some topic of interest to you (e.g., in your major field), typically a graph published in a book or newspaper. Be prepared to comment on it in relation to the concepts and principles covered in the seminar.
(3) During the quarter prepare a portfolio that includes those graphs, your written commentary about them, and perhaps redraws of some of the published graphs.
    You may go back and modify an earlier commentary if that seems appropriate in light of material covered later in the quarter.
(4) Create a World Wide Web site, and post to it samples of graphs you have created.
(5) Graduate students have a further requirement: Create a graph or series of graphs of some data from your own area of specialization.

Notes on computing:
(1) We will be using PCs running MS Windows in one of the Social Science Computing labs, and scanners in Powell.
(2) The seminar is NOT primarily about how to use graphics software; rather, it is about good graphic design that effectively assists in the discovery and display of patterns in data.
(3) Students will get a mixture of computer graphics demonstrations and hands-on experience. I intend to have the class use statistical software (STATA), image-editing software (PHOTOSHOP), and a web browser (NETSCAPE). I also plan to show demonstrations of, or output from, some other software.

BOOKS

Wallgren, Anders, Britt Wallgren, Rolf Persson, Ulf Jorner, and Jan-Aage Haaland. 1996. Graphing Statistics and Data: Creating Better Charts. Thousand Oaks: Sage. ISBN 0-7619-0599-5 [BOOKSTORE: required]

Weinman, Lynda. 1996. How to Prepare Images and Media for the Web. 2nd edn. Indianapolis: New Riders. (Note: This differs substantially from the first edition.) ISBN 1-56205-669-7 or 1-56205-715-4 [PERMANENT NON-CIRCULATING RESERVE, ARTS LIBRARY: T 385 W4545 1997] [BOOKSTORE: optional]

Cleveland, William S. 1994. The Elements of Graphing Data. Revised edn. Summit, NJ: Hobart Press. ISBN 0-9634884-1-4 [2 HR RESERVE, COLLEGE: QA 90 C54 1994] (Note: A 1985 edition with the same title, from Wadsworth, is not as nicely laid out, but has much the same content and similar pagination.)

Pike, Jayna. 1991. An Introduction to Computer Graphics Concepts: From Pixels to Pictures. Reading, MA: Addison-Wesley. [2 HR RESERVE, COLLEGE: T 385 P52 1991]

Tufte, Edward R. 1983. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press. [2 HR RESERVE, URL: HA 31 T83 1983] [2 HR RESERVE, COLLEGE: HA 31 T83 1983]

OUTLINE

0. Seminar Organization
--Differences between a seminar and a lecture course
--Regular weekly assignments
--Participation in class discussion
--Portfolio for the quarter

1. Introduction: Definitions, goals, examples.
Readings: Cleveland ch. 1 plus 150-154; Pike ch. 1; Tufte ch. 1 (skim 16-27 on maps); Wallgren ch. 1.
Graphics as interdisciplinary, with goals and tools from science, journalism, art, and elsewhere.
Our main goals are the discovery of patterns in data and the presentation of those findings to others.
Graphs: data, represented by geometric objects, according to correspondence rules.
Data: both qualitative and quantitative (see the section below on measurement scales).
Geometric objects: points, lines, rectangles, etc., but also such things as intervals or regions of white space.
Examples:
--Dot graphs, representing quantities by positions of dots. Cleveland 150-154.
--Bar graphs, representing quantities by lengths of rectangles. (To anticipate the section on measurement scales: The base variable, which distinguishes different dots or different bars, does not necessarily form even a nominal scale.)
Boundary questions: Is a birthday cake a graph? How about a thermometer?
Non-graphs that look like graphs: Logos, cartoons, etc., that include depictions of graphs but do not represent any underlying data.

2. Graphical Perception.
Readings: Cleveland ch. 4 plus 181-185 (on the vertical line graph); Tufte ch. 2 plus 107-112; Wallgren 72-73.
Cognitive and perceptual tasks involved in reading a graph, their varying difficulties, and the accuracy with which they are performed.

3. Graphing Fractions of a Whole.
Readings: Cleveland sec. 3.3; Wallgren ch. 5.
Examples:
--Pie chart and its shortcomings.
--Alternatives for displaying proportions, such as slices of a dollar bill or another bar of constant width. (But contrast Tufte 70.)
--Histogram (defined for present purposes as a bar graph whose base variable is at least an interval scale, and whose total area is 100%). Proposed redesign in Tufte 126-128.
--Frequency or probability density curve.
--Population pyramid.
--Cumulative frequency or probability distribution.
--Quantile (percentile) graph. Cleveland 136-139.
--Rank-size plot. Simon.
--Lorenz curve.
--Box graph; percentile graph with summary. Cleveland 139-143.

4. Posting to the World Wide Web
--Graphics file formats. Weinman, ch. 3.
--Bandwidth and file size issues. Weinman, ch. 4.
--Scanning issues. Weinman, ch. 12.

5. Scales of Measurement
Readings: pages 21-30 of: Stevens, Stanley S. 1951. "Mathematics, Measurement, and Psychophysics." Ch. 1, pp. 1-49 in: Stevens, Stanley S., ed. Handbook of Experimental Psychology. New York: Wiley. Also Cleveland sec. 2.5; Wallgren ch. 2.
Scales lying on a single dimension:
--ordinal
--interval
--ratio
Non-Stevens issues: Is there a meaningful upper anchor point, such as 100%? Are negative values meaningful?
More complex structures: the periodic table in chemistry; the color wheel; trees or other grouped data (Cleveland 152-154).
Less complex structures: nominal scales.
Rescaling of data appropriate to its level of measurement:
--Ranks or percentiles or quantiles. Cleveland 136-139.
--Logarithmic transformation of data. Cleveland 95-103, 120-126.
--Per-capita or other modifications of data may be useful despite (indeed, because of) failing Stevens' criterion.
(See additional material on transformations below, in connection with time series data.)

6. Graphing Bivariate Relationships
Readings: Tufte 94-95 on the scatterplot, plus 13-14 on Anscombe's quartet; Cleveland 154-165; Wallgren ch. 7.
Dots, circles, other symbols.
Sunflowers, jittering, and other devices for handling overlap.
Predicted values from a model or from a curve-fitting or smoothing procedure; residuals.

7. Color
Readings: Pike 10-12; 25; ch. 3; color plates at end of book. Tufte 153-154. Cleveland 209-212 with plates at front of book; 230-233.
(Note: Tufte and Cleveland make only minimal use of color; my preference is to understand color well enough to work around its limitations and use it both effectively and extensively.)
The visible spectrum of electromagnetic waves.
Light sources and the RGB color model: color on screen.
Light filters and the CMYK color model: color printing.
Hues, tints (white added), and shades (black added).
Shading and texture in monochrome.

8. Graphing Time Series (or other data with a single-valued dependent variable and an equally spaced independent variable).
Readings: Cleveland 180-192; Wallgren ch. 6.
Connected symbol graph (= line graph in common usage; however, it is the symbols that represent the data, and the lines only guide the eye from symbol to symbol).
Labeling of historical events.
Truncation, estimates, or other features of the end of a series.
Unequal time intervals produce distortion if clock-and-calendar time is the relevant type. However, for some purposes "time" is better measured as the count of successive marker events (parity progression, successive marriages, successive residential moves, etc.), which are not evenly spaced in clock-and-calendar time.
Comparison with a baseline series, or with a second observed series.
Graphing multiple time series with the same or different vertical scales.
Distinguishing the series with different symbols or different colors.

8a. Further consideration of data transformations and reorganizations (optional topics, to be covered if time permits).
"Real" = "nominal" adjusted for inflation.
Per-capita or other adjustments for size and growth.
Index numbers (the choice of base year may affect conclusions).
Seasonality, adjustment for seasonality, cycle plots. Cleveland 186-187.
Rationale and procedures for data smoothing.
Lowess or Loess (LOcally WEighted Scatterplot Smoother) procedure. Cleveland 168-180.
Cumulative scores rather than raw data.
Change or difference scores rather than raw data.

9. Graphing Multivariate Relationships
Aspects of 3-D geometry: Pike ch. 4.
Multiple bivariate graphs. Tufte 170-175.
Rugplots. Tufte 133, 135-136.
Scatterplot matrix. Cleveland 193-197.
Contour plots.
Surface plots. Tufte 147.
3-D scatterplots.
Severe limitations of methods for depicting 3-D data in two dimensions without motion.
Use of color as a third dimension. Pike color plates 11, 15.

10. Motion, Interactive Graphics.
Readings: Pike ch. 7; 145-146. Cleveland 93-94, 213-218.
Animation for 3-D graphics.
Spinning, brushing.
The ability to quickly and easily toggle labels and highlights, or to redraw for subsets of cases or from different viewing angles, etc., may eliminate some difficult choices that would be required if a single graph were expected to meet multiple and conflicting goals (e.g., Cleveland 46).

11. Compound Graphs, Sets of Related Graphs, Accompanying Tables and Text.
Making a causal argument, or telling successive parts of a single story.
Readings: Tufte ch. 9.