Measuring and Mapping the
Social Structure of Usenet
Presented at the 17th Annual
International Sunbelt Social Network Conference
Bahia Resort Hotel, Mission Bay, San Diego, California
February 13-16, 1997
Marc A. Smith
UCLA Department of Sociology
Notable Qualities of the Usenet:
Usenet Growth from 1979 to Present
Contrast Offline and Online Social Context
Offline, it is easy to determine:
- where people are
- where they are coming from or going to
- how they are grouped
- roughly what they are doing
Online, it is difficult to determine:
- the size and composition of the crowd
- the distribution of people into clusters
- variations in the social character of groups
A typical interface to the Usenet presents the social space in the form of a long flat alphabetically sorted list of newsgroups.

Contrast this with studies of physical spaces

William H. Whyte, City
Limitations of ethnographic research on online groups
Some groups are noisy or barren, others ordered and productive. Contrast:


Take a few moments to compare these groups. Without trying to judge which is better or worse, we can say that these groups differ in some important and countable ways. Alt.flame has longer threads; one is 49 messages long! In contrast, comp.lang.perl.misc shows no thread longer than 7 messages, and most threads have only one followup. In alt.flame a few people dominate the space, each posting dozens of messages, while in comp.lang.perl.misc, although some people post more than one message, a larger number of people participate.
These distinctions become apparent after reading a group for some time. But given the vast number of groups, it is impractical to construct by hand a broader map of the distinctions among all newsgroups.
Does alt.flame have a different daily pattern from comp.lang.perl.misc?
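One way to make the daily-pattern question countable is to bin each group's posts by hour of day and compare the resulting profiles. The sketch below assumes each message has already been reduced to a (newsgroup, posting time) pair with a timezone-aware time; the function names are hypothetical and this is not Netscan's own code.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_profile(posts):
    """Count posts per hour of day (0-23, UTC) for each newsgroup.

    posts: iterable of (newsgroup, timezone-aware datetime) pairs.
    Returns {newsgroup: [count for hour 0, ..., count for hour 23]}.
    """
    profile = defaultdict(lambda: [0] * 24)
    for group, when in posts:
        # Normalize to UTC so posts written in different time zones share one clock.
        hour = when.astimezone(timezone.utc).hour
        profile[group][hour] += 1
    return dict(profile)


def normalized(counts):
    """Turn raw hourly counts into each hour's share of the group's total."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


# Example: a post made at 18:34 in UTC-5 lands in the 23:00 UTC bin.
when = datetime.strptime("21 Feb 1997 18:34:52 -0500", "%d %b %Y %H:%M:%S %z")
print(hourly_profile([("comp.lang.perl.misc", when)])["comp.lang.perl.misc"][23])  # 1
```

Comparing normalized profiles, rather than raw counts, keeps a busy group from looking "different" merely because it is larger. Binning in UTC is a simplifying choice; it mixes the local daytime rhythms of posters in different time zones.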

Message headers contain valuable data
Data is collected from the message header, archiving the contents of the From, Newsgroups, Subject, Date, Organization, Lines, Message-ID, and References lines.
    From: jwjr@panix.com (James Wetterau)
    Newsgroups: comp.lang.perl.misc
    Subject: Re: Regexp to do minimal email validation
    Date: 21 Feb 1997 18:34:52 -0500
    Organization: Panix
    Lines: 18
    Message-ID: <5elbes$hra@panix.com>
    References: <5e30e3$eh4@its.hooked.net> <8clo8pmtjx.fsf@gadget.cscaper.com> <Matthew.Healy-2002971159050001@pudding.med.yale.edu>

    In article <Matthew.Healy-2002971159050001@pudding.med.yale.edu>,
    Matthew D. Healy <Matthew.Healy@yale.edu> wrote:
    >...
    >> *can't*. You can't. There's no point. Send the mail, and if it gets
    >> to them, it's the right address. <fred&barney@stonehenge.com> is a
    >> valid address, and would have been falsely rejected by your regexp.
    >
    >I just tried it, and it worked the _SECOND_ time, because the first
    >time I forgot to put single quotes around the address, causing Unix

    Double quotes work just fine with my (bash) and many other shells.
    --
    James Wetterau, Jr. | jwjr@panix.com (h) | <--- But some people call me Maurice
                        | jwjr@name.net (w)  |
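The header fields listed above can be pulled out of a raw article with Python's standard `email` package. This is a minimal sketch, assuming articles are available as plain text with headers separated from the body by a blank line; `header_record` and `FIELDS` are hypothetical names, not part of Netscan.

```python
import email
from email.utils import parsedate_to_datetime

# The header lines the archive keeps for each message.
FIELDS = ("From", "Newsgroups", "Subject", "Date", "Organization",
          "Lines", "Message-ID", "References")


def header_record(raw_article):
    """Extract the archived header fields from one raw Usenet article."""
    msg = email.message_from_string(raw_article)
    rec = {field: msg.get(field, "") for field in FIELDS}
    # Newsgroups is comma-separated; References is whitespace-separated.
    rec["Newsgroups"] = [g.strip() for g in rec["Newsgroups"].split(",") if g.strip()]
    rec["References"] = rec["References"].split()
    # Parse the Date line into a timezone-aware datetime (None if absent).
    rec["Date"] = parsedate_to_datetime(rec["Date"]) if rec["Date"] else None
    return rec
```

Keeping Newsgroups and References as lists makes the later measures (crossposting degree, thread reconstruction) simple loops over the parsed records.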
Netscan can examine each message in each newsgroup over any time period and generate measures such as:
Daily Rates of posts and posters
Average of:
- 67094 Posts/Day
- 17934 Posters/Day
- 4 Posts/Poster/Day
Hourly Rates of posts and posters
Average of:
- 3490 Posts/Hour
- 1200 Posters/Hour
- 3 Posts/Poster/Hour
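The daily and hourly averages above can be computed with one routine that groups posts into time buckets. A minimal sketch, assuming each post is reduced to a (poster, timezone-aware datetime) pair and that a poster is identified by the From line; `rates`, `by_day`, and `by_hour` are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime, timezone


def rates(posts, bucket):
    """Average posts, distinct posters, and posts per poster per time bucket.

    posts: iterable of (poster, timezone-aware datetime) pairs.
    bucket: function mapping a datetime to a bucket key (e.g. its UTC day).
    Averages are taken only over buckets that contain at least one post.
    """
    counts = defaultdict(int)
    posters = defaultdict(set)
    for who, when in posts:
        key = bucket(when)
        counts[key] += 1
        posters[key].add(who)
    n = len(counts)
    if n == 0:
        return {"posts": 0.0, "posters": 0.0, "posts_per_poster": 0.0}
    return {
        "posts": sum(counts.values()) / n,
        "posters": sum(len(s) for s in posters.values()) / n,
        # Mean of the per-bucket ratios, not the ratio of the two means.
        "posts_per_poster": sum(counts[k] / len(posters[k]) for k in counts) / n,
    }


def by_day(when):
    """Bucket key: the calendar day in UTC."""
    return when.astimezone(timezone.utc).date()


def by_hour(when):
    """Bucket key: the clock hour in UTC."""
    return when.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
```

The slides do not say whether posts per poster is the mean of per-period ratios or the ratio of the two averages. The sketch uses the former; the latter, applied to the figures above, gives 67094 / 17934 ≈ 3.7 per day and 3490 / 1200 ≈ 2.9 per hour, which round to the reported 4 and 3.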
Distribution of posting frequency
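A posting-frequency distribution counts how many posters contributed exactly k messages. A minimal sketch, assuming one poster identifier (here, the From line) per post; the function name is hypothetical.

```python
from collections import Counter


def posting_frequency_distribution(posters):
    """Map k -> number of posters who posted exactly k messages.

    posters: iterable with one poster identifier per post.
    """
    per_poster = Counter(posters)  # poster -> number of posts
    return dict(sorted(Counter(per_poster.values()).items()))
```

A group dominated by a few heavy posters, like alt.flame as described above, shows a long tail at large k, while a group with broad participation concentrates its posters near k = 1. Because the From line is only a proxy for a person, one person with several addresses is counted as several posters.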
Distribution of Thread to Post Ratios
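A thread-to-post ratio needs a rule for assigning messages to threads. This sketch assumes a thread is identified by its oldest ancestor, taken as the first Message-ID on the References line (or the message's own Message-ID when it starts a thread); the function names are hypothetical.

```python
from collections import defaultdict


def thread_root(message_id, references):
    """Assumed thread key: the first References entry, or the message itself."""
    return references[0] if references else message_id


def thread_to_post_ratios(records):
    """Return {newsgroup: distinct threads / posts}.

    records: iterable of (newsgroups list, message_id, references list) tuples.
    """
    posts = defaultdict(int)
    threads = defaultdict(set)
    for groups, mid, refs in records:
        root = thread_root(mid, refs)
        for g in groups:  # a crossposted message counts once in each group it names
            posts[g] += 1
            threads[g].add(root)
    return {g: len(threads[g]) / posts[g] for g in posts}
```

A ratio near 1 means most posts start a thread of their own; a low ratio means posts gather into longer conversations, as in the 49-message alt.flame thread mentioned earlier.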
Distribution of Crossposting Degree
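Crossposting degree is the number of distinct newsgroups named on a message's Newsgroups line. A minimal sketch that tallies how many messages have each degree; the function name is hypothetical.

```python
from collections import Counter


def crossposting_degree_distribution(newsgroups_lines):
    """Map degree -> number of messages posted to exactly that many groups.

    newsgroups_lines: iterable of raw Newsgroups header values.
    """
    degrees = Counter()
    for line in newsgroups_lines:
        # Deduplicate in case a group is listed twice on the same line.
        groups = {g.strip() for g in line.split(",") if g.strip()}
        degrees[len(groups)] += 1
    return dict(sorted(degrees.items()))
```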

Netscan data can reveal that newsgroups are located within neighborhoods created through crossposting.
Distribution of Crossposting Volume
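Crossposting volume between two groups can be counted as the number of messages that name both of them, which turns the Newsgroups lines into a weighted graph of neighborhoods. A minimal sketch, assuming each message's Newsgroups line is already split into a list; the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def crosspost_edges(newsgroups_lists):
    """Count how many messages each unordered pair of groups shares."""
    edges = Counter()
    for groups in newsgroups_lists:
        # Sorting gives each pair one canonical (a, b) key with a < b.
        for a, b in combinations(sorted(set(groups)), 2):
            edges[(a, b)] += 1
    return edges
```

A message posted to n groups contributes to n(n-1)/2 pairs, so a few heavily crossposted messages can add many edges at once.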
Hypothesis: 12.5K newsgroups collapse into ~800 metagroups
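One simple way to test a hypothesis like this is to merge any two groups that share at least a threshold number of crossposted messages and count the resulting connected components. The sketch below does this with a union-find structure; it is an illustrative technique, not necessarily the method behind the ~800 estimate, and `metagroups` and `min_shared` are hypothetical names.

```python
def metagroups(groups, edges, min_shared=1):
    """Cluster groups into connected components of the crossposting graph.

    groups: iterable of newsgroup names (isolated groups become singletons).
    edges: mapping (group_a, group_b) -> number of shared crossposted messages.
    min_shared: assumed threshold; pairs below it are not merged.
    Returns a sorted list of sorted clusters.
    """
    parent = {g: g for g in groups}
    for a, b in edges:
        parent.setdefault(a, a)
        parent.setdefault(b, b)

    def find(g):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), weight in edges.items():
        if weight >= min_shared:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    clusters = {}
    for g in parent:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(c) for c in clusters.values())
```

The number of metagroups depends heavily on `min_shared`: with a threshold of 1, a handful of stray crossposts can chain unrelated groups into one large component, so the count should be reported together with the threshold used.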

Limitations of Netscan data: Digital artifacts are incomplete, ambiguous, and potentially dangerous.
This research uncovers social spaces, subjecting them to a kind of panoptic surveillance.