Measuring and Mapping the
Social Structure of Usenet


Presented at the 17th Annual International Sunbelt Social Network Conference
Bahia Resort Hotel, Mission Bay, San Diego, California
February 13-16, 1997

Marc A. Smith
UCLA Department of Sociology



Netscan Overview:


Initial Focus: The Usenet


Notable Qualities of the Usenet:

Usenet Growth from 1979 to Present

 


Contrast Offline and Online Social Context

Offline, it is easy to determine:

Online, it is difficult to determine:


Existing Interfaces

A typical interface to the Usenet presents the social space in the form of a long flat alphabetically sorted list of newsgroups.

 

Contrast this with studies of physical spaces


William H. Whyte, City

 


Limitations of ethnographic research on online groups


Newsgroups vary dramatically

Some groups are noisy or barren; others are ordered and productive. Contrast:



Take a few moments to compare these groups. Without trying to judge which is better or worse, we can say that these groups differ in some important and countable ways. alt.flame has longer threads; one is 49 messages long. In contrast, comp.lang.perl.misc shows no thread longer than 7 messages, and most have only one followup. In alt.flame, a few people dominate the space, posting dozens of messages, while in comp.lang.perl.misc few people post more than one message and a larger number of people participate.

These distinctions do become apparent after reading a group for a period of time. But given the vast numbers of groups, it is impractical to manually construct a broader map of the distinctions between all newsgroups.
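The two countable differences described above, thread length and the concentration of posts among a few authors, could be tallied automatically along the following lines. This is a minimal sketch, not Netscan's actual method. The dictionary keys (`message_id`, `references`, `author`) and the function names are hypothetical, the toy messages are invented for illustration, and the sketch assumes that the first entry of a message's References line identifies the root of its thread.

```python
from collections import Counter


def thread_sizes(messages):
    """Count messages per thread, keyed by the thread's root Message-ID.

    Assumption: the first ID in References is the thread root; a message
    with no References starts its own thread.
    """
    sizes = Counter()
    for m in messages:
        root = m["references"][0] if m["references"] else m["message_id"]
        sizes[root] += 1
    return sizes


def posts_per_author(messages):
    """Count how many messages each author posted."""
    return Counter(m["author"] for m in messages)


def top_share(messages, k=1):
    """Fraction of all messages posted by the k most active authors."""
    if not messages:
        return 0.0
    counts = posts_per_author(messages)
    return sum(n for _, n in counts.most_common(k)) / len(messages)


# Invented toy data: one three-message thread and one unanswered post.
toy = [
    {"message_id": "<a>", "references": [], "author": "x"},
    {"message_id": "<b>", "references": ["<a>"], "author": "y"},
    {"message_id": "<c>", "references": ["<a>", "<b>"], "author": "x"},
    {"message_id": "<d>", "references": [], "author": "z"},
]

longest_thread = max(thread_sizes(toy).values())
share_of_top_poster = top_share(toy, k=1)
```

Run over every message in a group, the longest thread and the share of posts held by the top few authors give one number each that can be compared across groups without reading them.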

Does alt.flame have a different daily pattern than comp.lang.perl.misc?
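One way to approach a question like this is to bin each group's messages by day and by hour of day. The sketch below is illustrative only. The function names and the `(timestamp, author)` input shape are assumptions, the sample posts are invented, and it assumes all timestamps have already been converted to a single time zone. Averages are taken over days that have at least one post.

```python
from collections import defaultdict
from datetime import datetime


def daily_rates(posts):
    """Return (average posts per day, average distinct posters per day)."""
    posts_by_day = defaultdict(int)
    posters_by_day = defaultdict(set)
    for when, author in posts:
        day = when.date()
        posts_by_day[day] += 1
        posters_by_day[day].add(author)
    days = len(posts_by_day)
    if days == 0:
        return 0.0, 0.0
    avg_posts = sum(posts_by_day.values()) / days
    avg_posters = sum(len(s) for s in posters_by_day.values()) / days
    return avg_posts, avg_posters


def hourly_profile(posts):
    """Return a 24-element list: number of posts in each hour of the day."""
    counts = [0] * 24
    for when, _author in posts:
        counts[when.hour] += 1
    return counts


# Invented sample: (timestamp, author) pairs for a single group.
sample = [
    (datetime(1997, 2, 21, 18, 34), "a"),
    (datetime(1997, 2, 21, 19, 0), "a"),
    (datetime(1997, 2, 21, 18, 5), "b"),
    (datetime(1997, 2, 22, 9, 0), "a"),
]

avg_posts, avg_posters = daily_rates(sample)
profile = hourly_profile(sample)
```

Comparing the 24-hour profiles of two groups would show whether their activity peaks at different times of day.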


Usenet data structure


Message headers contain valuable data

Data is collected from each message header by archiving the contents of the From, Newsgroups, Subject, Date, Organization, Lines, Message-ID, and References lines. The message below is an example.

From: jwjr@panix.com (James Wetterau)
Newsgroups: comp.lang.perl.misc
Subject: Re: Regexp to do minimal email validation
Date: 21 Feb 1997 18:34:52 -0500
Organization: Panix
Lines: 18
Message-ID: <5elbes$hra@panix.com>
References: <5e30e3$eh4@its.hooked.net> <8clo8pmtjx.fsf@gadget.cscaper.com> <Matthew.Healy-2002971159050001@pudding.med.yale.edu>

In article <Matthew.Healy-2002971159050001@pudding.med.yale.edu>,
Matthew D. Healy <Matthew.Healy@yale.edu> wrote:
>...
>> *can't*. You can't. There's no point. Send the mail, and if it gets
>> to them, it's the right address. <fred&barney@stonehenge.com> is a
>> valid address, and would have been falsely rejected by your regexp.
>
>I just tried it, and it worked the _SECOND_ time, because the first
>time I forgot to put single quotes around the address, causing Unix

Double quotes work just fine with my (bash) and many other shells.

--
James Wetterau, Jr. |
jwjr@panix.com (h) | <--- But some people call me Maurice
jwjr@name.net (w) |
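
The header fields listed above can be pulled out of a message like this one with a few lines of code. The following is a minimal sketch using Python's standard `email` package, not a description of how Netscan itself is implemented. The `parse_header` function, the `FIELDS` list, and the extra `Address` key are hypothetical names introduced for illustration. The sample text is the header of the message shown above.

```python
from email.parser import HeaderParser
from email.utils import parseaddr, parsedate_to_datetime

# The header lines named in the text as the ones archived.
FIELDS = ["From", "Newsgroups", "Subject", "Date",
          "Organization", "Lines", "Message-ID", "References"]

SAMPLE = """\
From: jwjr@panix.com (James Wetterau)
Newsgroups: comp.lang.perl.misc
Subject: Re: Regexp to do minimal email validation
Date: 21 Feb 1997 18:34:52 -0500
Organization: Panix
Lines: 18
Message-ID: <5elbes$hra@panix.com>
References: <5e30e3$eh4@its.hooked.net> <8clo8pmtjx.fsf@gadget.cscaper.com> <Matthew.Healy-2002971159050001@pudding.med.yale.edu>
"""


def parse_header(raw):
    """Extract the archived fields from one raw message header."""
    msg = HeaderParser().parsestr(raw)
    record = {name: msg.get(name) for name in FIELDS}
    # Newsgroups is a comma-separated list; more than one entry is a crosspost.
    groups = (record["Newsgroups"] or "").split(",")
    record["Newsgroups"] = [g.strip() for g in groups if g.strip()]
    # References is a whitespace-separated list of earlier Message-IDs.
    record["References"] = (record["References"] or "").split()
    record["Lines"] = int(record["Lines"]) if record["Lines"] else None
    record["Date"] = (parsedate_to_datetime(record["Date"])
                      if record["Date"] else None)
    # Bare address part of the From line, usable as a poster identifier.
    record["Address"] = parseaddr(record["From"] or "")[1]
    return record


record = parse_header(SAMPLE)
```

Splitting Newsgroups and References into lists at parse time keeps the later counting steps (crossposts, thread membership) simple.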

Netscan Data

Netscan can examine each message in each newsgroup for any time period and generate a measure of the number of:


Netscan Levels of Analysis


Daily Rates of posts and posters

Average of:

Hourly Rates of posts and posters

Average of:


Hierarchy breakdowns


Distribution of posting frequency


Distribution of Thread to Post Ratios


Distribution of Crossposting Degree

Netscan data can locate newsgroups within neighborhoods created through crossposting.
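
A rough illustration of how such neighborhoods might be derived: every message whose Newsgroups line names more than one group contributes a link between each pair of groups it names, and groups joined by many links sit close together. This is a sketch under stated assumptions, not Netscan's actual procedure. The function names are hypothetical and the group names below are placeholders, not real newsgroups.

```python
from collections import Counter
from itertools import combinations


def crosspost_degree(newsgroups):
    """Number of distinct groups a single message was posted to."""
    return len(set(newsgroups))


def crosspost_links(messages):
    """Count, for each pair of groups, the messages posted to both.

    `messages` is an iterable of Newsgroups lists, one per message.
    Pairs are stored in sorted order so (a, b) and (b, a) are one key.
    """
    links = Counter()
    for groups in messages:
        for a, b in combinations(sorted(set(groups)), 2):
            links[(a, b)] += 1
    return links


# Placeholder data: the Newsgroups list of three invented messages.
posted_to = [
    ["example.a", "example.b"],
    ["example.b", "example.a", "example.c"],
    ["example.d"],
]

degrees = [crosspost_degree(g) for g in posted_to]
links = crosspost_links(posted_to)
```

The resulting pair counts form a weighted graph of newsgroups, which is the kind of structure standard social network measures can then be applied to.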


Distribution of Crossposting Volume


Next steps


Limitations of Netscan data: Digital artifacts are incomplete, ambiguous, and potentially dangerous.


Ethical Issues

This research uncovers social spaces, subjecting them to a kind of panoptic surveillance.


Marc A. Smith smithm@ucla.edu