Latin-1 Character Set for HTML

David D. McFarland

Latin-1 is the default character set for HTML, the one that is supposed to be available to every web browser. For that reason some familiarity with Latin-1 is useful for anyone who has occasion to read or write web pages containing symbols beyond the ordinary keyboard characters.

Latin-1's main feature is an extensive range of accented characters used in western European languages, and scholars who use those languages will be interested in Latin-1 for that reason.

Mathematical sociologists are among those for whose work the ubiquity of Latin-1 creates more problems than solutions. We are not alone, however. Many scholars have use for non-Roman alphabets, such as Cyrillic, Greek, or Hebrew, which are not included in Latin-1. Some need Arabic, Chinese, or other characters. (These needs arise in international commerce as well as in scholarship.)

Unicode, if and when it becomes widely implemented, promises to solve many of the current problems pertaining to languages that do not use Roman alphabets. That widespread implementation, in turn, might happen before the turn of the century. The most widely used computer platforms at this writing, Windows 3.1 and 95, do not support Unicode, but Microsoft has built Unicode capability into Windows NT 4.0, and it is reasonable to guess that revisions of other operating systems will do likewise.

What follows, then, is not a full account of Latin-1, but an account intended for mathematical sociologists. Our main concern herein is to know enough about Latin-1 to recognize and work around its limitations for the mathematical notation in our work.

Mathematical notation involves special characters, whose presence or absence in Latin-1 is our current topic. But mathematical notation also involves considerations beyond the availability of particular characters. For example, a mathematical expression may involve characters positioned above or below the baseline of the expression, such as limits of integration, or multiple levels of sub- or superscripts. Some characters must span several rows, such as the brackets enclosing a matrix. So the availability of an appropriate character set will solve many, but not all, of the problems faced in attempts to publish mathematical sociology on the Web.

7-Bit ASCII

The Latin-1 character set is an extension of the earlier ASCII character set, to which we turn first. 7-bit ASCII uses 7 bits to distinguish 128 (= 2 to the power 7) different codes, numbered 0 through 127. Codes 0-31 and 127 are reserved for control codes, most of which are rarely used these days. The main part of 7-bit ASCII covers:

Upper case letters
Lower case letters
Digits 0 through 9
32 punctuation marks, plus the space

These upper and lower case letters, digits, and punctuation marks can all be entered directly from the keyboard; no special codes are needed, but just for the record, the numeric codes are as follows:

0-31. Control characters, such as 12 = formfeed

32. Space

33-47. ! " # $ % & ' ( ) * + , - . /

48-57. Digits 0 through 9

58-64. : ; < = > ? @

65-90. Upper case A through Z

91-96. [ \ ] ^ _ `

97-122. Lower case a through z

123-126. { | } ~

127. Control character
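
The ranges above can be checked mechanically. The following Python sketch is only an illustration (the function name ascii_group is our own invention, not part of any standard); it sorts each of the 128 codes into the groups just listed and counts the members of each group, with the space counted separately from the punctuation marks.

```python
# Sort each 7-bit ASCII code into the groups listed in the text.
# ascii_group is a hypothetical helper name chosen for this illustration.
def ascii_group(code: int) -> str:
    if not 0 <= code <= 127:
        raise ValueError("7-bit ASCII codes run from 0 through 127")
    if code <= 31 or code == 127:
        return "control"
    if code == 32:
        return "space"
    ch = chr(code)
    if ch.isdigit():
        return "digit"
    if ch.isupper():
        return "upper"
    if ch.islower():
        return "lower"
    return "punctuation"

# Count how many codes fall into each group.
counts = {}
for code in range(128):
    group = ascii_group(code)
    counts[group] = counts.get(group, 0) + 1

for group, n in counts.items():
    print(group, n)
```

The counts add up to 128, the full range of a 7-bit code.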

7-bit ASCII offers little to the mathematical sociologist, being geared to an elementary school level of mathematics that includes addition ( + ) and subtraction ( - ), but not yet multiplication or division (whose symbols are missing from ASCII).

Workarounds have a long history, and a glance at a couple of historical workaround strategies may be instructive for our own attempts to use the World Wide Web for mathematical sociology.

Substitute a symbol with an appearance similar to the unavailable one. The letter x resembles the multiplication symbol, and has often served as substitute when the latter was unavailable.
Create a symbol by combining available symbols. In the days of typewriters, an acceptable division symbol could be created by overstriking (with the keystrokes : backspace - ). Alas, on a typical computer system backspace is destructive, producing an erasure rather than an overstrike.
Use an available symbol to convey the same meaning as the unavailable symbol. The slash symbol / commonly substitutes for the unavailable division symbol.
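
The first and third strategies amount to a lookup table. Here is a minimal Python sketch of that idea, assuming a hypothetical table named ASCII_STAND_INS and a hypothetical function to_ascii; neither comes from any library, and the fallback to a question mark is our own choice.

```python
# Hypothetical substitution table: symbols missing from 7-bit ASCII,
# mapped to the stand-ins described in the text.
ASCII_STAND_INS = {
    "\u00d7": "x",  # multiplication sign -> letter x (similar appearance)
    "\u00f7": "/",  # division sign -> slash (same meaning)
}

def to_ascii(text: str) -> str:
    """Replace each non-ASCII character by its stand-in, or by '?' if none is known."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)                    # already 7-bit ASCII
        elif ch in ASCII_STAND_INS:
            out.append(ASCII_STAND_INS[ch])   # known workaround
        else:
            out.append("?")                   # no workaround in the table
    return "".join(out)

print(to_ascii("6 \u00d7 4 \u00f7 3"))  # prints "6 x 4 / 3"
```

The overstrike strategy has no counterpart here, since it depends on a non-destructive backspace.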

We no longer have to go through such contortions to write about multiplication and division. However, the concept of "workaround" and the notion of pursuing various workaround strategies will be of continuing use to mathematical sociologists, as well as to various others whose work is outside of the market for which software vendors design their mainstream products.

8-Bit Latin-1

Clearly the 7-bit ASCII character set is inadequate for many purposes, so various organizations developed extensions, and the lack of agreement among those extensions is a continuing source of headaches. Here we consider just one of them, Latin-1, which is the default character set on the World Wide Web.

Adding an 8th bit to the transmission code doubled the total number of different characters from 128 (= 2 to the power 7) to 256 (= 2 to the power 8). To maintain backward compatibility (though at the cost of wasting space on no longer used control characters), the codes 0 through 127 were assigned to the same characters as in 7-bit ASCII. (Also, since 8-bit codes passing through equipment intended for 7-bit codes might have their 8th bits truncated, and since control codes could be dangerous, the codes from 128 to 159, which on truncation of their 8th bits would become the control codes 0 to 31, were not assigned printable characters. Code 255, which would truncate to 127, is nevertheless assigned to y umlaut ( ÿ ).)
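
The truncation argument is easy to verify with a little bit arithmetic. In the Python sketch below, strip_eighth_bit is a hypothetical name for what 7-bit equipment would do to an 8-bit code; the sketch lists the 8-bit codes that would land on a 7-bit control code.

```python
# Sizes of the two code spaces.
print(2 ** 7, 2 ** 8)  # prints "128 256"

def strip_eighth_bit(code: int) -> int:
    """Keep only the low 7 bits, as 7-bit equipment would (hypothetical helper)."""
    return code & 0x7F

def is_ascii_control(code: int) -> bool:
    return code <= 31 or code == 127

# 8-bit codes that become control codes once the 8th bit is lost.
risky = [c for c in range(128, 256) if is_ascii_control(strip_eighth_bit(c))]

print(risky[0], risky[-2], risky[-1])  # prints "128 159 255"
# Note: ISO 8859-1 leaves 128-159 without printable characters,
# but it does assign 255 (to y with diaeresis).
```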

The Latin-1 characters with numerical codes above 127 are mostly accented letters used in various European languages: c cedilla ( ç ), e grave ( è ), n tilde ( ñ ), u umlaut ( ü ), and such. These are needed for writing in French, German, Spanish, etc.
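
For the record, the numerical codes of the four examples just mentioned can be found by encoding them. This is a small Python sketch using the standard "latin-1" codec; the variable names are ours.

```python
# The four accented letters named in the text, written as escapes
# so the example does not depend on the encoding of this file.
examples = {
    "c cedilla": "\u00e7",
    "e grave": "\u00e8",
    "n tilde": "\u00f1",
    "u umlaut": "\u00fc",
}

codes = {}
for name, ch in examples.items():
    # Each Latin-1 character occupies exactly one byte; that byte is its code.
    codes[name] = ch.encode("latin-1")[0]
    print(name, ch, codes[name])
```

All four codes lie above 127, in the half of the table that Latin-1 adds to ASCII.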

These characters are not directly on the keyboard, and the ways of getting them into a document vary by computer platform and software. Some programs offer their own ways of entering non-keyboard characters. Here we mention only the ways that come with the operating system.

In DOS and Windows 3.1, non-keyboard characters can be entered by the "Alt-keypad method":
- Consult a table (partial table below; complete table in Appendix C of Raggett et al. 1996) to find the numerical code for the desired character.
- Toggle the NumLock key, if needed, until the NumLock light indicates the keypad is in numeric mode.
- Depress and hold down one of the ALT keys.
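- Enter the numerical code in the keypad. In Windows, type a leading zero before the code (for example, 0162 for the cent symbol); without it, the code is looked up in the DOS code page, whose upper half differs from Latin-1.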
- Release the ALT key.
- Note: Either ALT key will work, but one must use the number keys in the keypad, not those across the top row of the keyboard.
MacOS, Windows 95, and Windows NT have pull-down menus from which special symbols can be selected, then copied to the document being created. These are clumsy ordeals that drive touch-typists batty, but they work, and they do have the advantage of making it easy to find rarely used (and thus easily forgotten) symbols.
- In MacOS it is called "Key Caps", and is accessed from the pull-down menu obtained by clicking on the Apple icon.
- In Windows 95 and Windows NT 4.0 it is called "Character Map", and is accessed by selecting: Start, Programs, Accessories, Character Map.
Software firms don't always pay much attention to standards committees, but one such committee has proposed that web browsers should correctly render every Latin-1 character entered as a numerical code enclosed between &# and ; characters; for example, &#162; for the cent symbol ( ¢ ). Perhaps you will read this in an environment where that proposal has been implemented; at this writing the numeric-reference column in the table below is not working as proposed.
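
In HTML, such a numeric character reference is written with a number sign after the ampersand, as in &#162;. The Python sketch below shows one way an author might generate these references; to_numeric_references is a hypothetical helper of our own, while html.unescape is the standard-library function that converts references back into characters.

```python
import html

def to_numeric_references(text: str) -> str:
    """Replace every character above code 127 by its &#n; reference.

    Hypothetical helper for illustration; it does not escape the
    characters &, <, or >, which HTML also treats specially.
    """
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)

encoded = to_numeric_references("50\u00a2")
print(encoded)  # prints "50&#162;"

# Decoding the reference recovers the cent symbol.
decoded = html.unescape(encoded)
print(decoded == "50\u00a2")  # prints "True"
```

A page written this way contains only 7-bit ASCII, so it survives equipment that mangles the 8th bit.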

The Latin-1 characters with numbers above 127 consist mostly of accented letters; the exceptions are as follows:

n.    &#n;     Alt-n  Name
161.  &#161;   ¡      Inverted exclamation
162.  &#162;   ¢      Cent
163.  &#163;   £      Pound (currency)
164.  &#164;   ¤      Currency
165.  &#165;   ¥      Yen
166.  &#166;   ¦      Broken vertical bar
167.  &#167;   §      Section
168.  &#168;   ¨      Umlaut/diaeresis
169.  &#169;   ©      Copyright
170.  &#170;   ª      Feminine ordinal
171.  &#171;   «      Left angle quote
172.  &#172;   ¬      Not sign
173.  &#173;          Soft hyphen
174.  &#174;   ®      Registered Trade Mark
175.  &#175;   ¯      Macron
176.  &#176;   °      Degrees
177.  &#177;   ±      Plus/Minus
178.  &#178;   ²      Superscript 2
179.  &#179;   ³      Superscript 3
180.  &#180;   ´      Acute accent
181.  &#181;   µ      Micro sign
182.  &#182;   ¶      Paragraph
183.  &#183;   ·      Middle dot
184.  &#184;   ¸      Cedilla
185.  &#185;   ¹      Superscript 1
186.  &#186;   º      Masculine ordinal
187.  &#187;   »      Right angle quote
188.  &#188;   ¼      One quarter
189.  &#189;   ½      One half
190.  &#190;   ¾      Three quarters
191.  &#191;   ¿      Inverted question mark
...
215.  &#215;   ×      Multiplication
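
A table of this kind need not be typed by hand. The Python sketch below regenerates its rows from the code numbers, assuming the standard "latin-1" codec and the unicodedata module; the function name latin1_rows is ours, and the names printed are the official Unicode ones, which differ slightly from the informal names used above.

```python
import unicodedata

def latin1_rows(codes):
    """Build (code, numeric reference, character, Unicode name) rows.

    Hypothetical helper for illustration.
    """
    rows = []
    for n in codes:
        ch = bytes([n]).decode("latin-1")  # the Latin-1 character with code n
        rows.append((n, f"&#{n};", ch, unicodedata.name(ch)))
    return rows

# The same codes as in the table: 161 through 191, then 215.
rows = latin1_rows(list(range(161, 192)) + [215])
for n, ref, ch, name in rows:
    print(f"{n}. {ref} {ch} {name.title()}")
```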

So, just what does Latin-1 offer the mathematical sociologist? Not much: the plus-or-minus sign ( ± ), the multiplication sign ( × ), and one level of superscripts, provided the superscript you need happens to be 1, or 2, or 3.
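
One can test a wish list of symbols against Latin-1 directly. In the Python sketch below, the list of symbols is our own illustrative selection, not a survey of what mathematical sociologists need, and in_latin1 is a hypothetical helper built on the standard "latin-1" codec.

```python
def in_latin1(ch: str) -> bool:
    """Return True if the character has a code in Latin-1 (hypothetical helper)."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

# An illustrative wish list; the selection is an assumption of this sketch.
wish_list = {
    "plus-or-minus": "\u00b1",
    "multiplication": "\u00d7",
    "superscript 2": "\u00b2",
    "summation": "\u2211",
    "integral": "\u222b",
    "Greek alpha": "\u03b1",
    "less than or equal": "\u2264",
}

available = [name for name, ch in wish_list.items() if in_latin1(ch)]
missing = [name for name, ch in wish_list.items() if not in_latin1(ch)]

print("available:", available)
print("missing:", missing)
```

The summation sign, the integral sign, Greek letters, and the inequality sign all fall outside Latin-1.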

Next we will consider the Symbol character set, which is widely used for printed documents, but not yet available by default on all web browsers, as Latin-1 is supposed to be. A standards committee has proposed making it so, but, as mentioned earlier, software vendors don't always follow such recommendations. Nevertheless, Symbol has many characters of use to mathematical sociologists, and there are good reasons to be familiar with that character set.

References:

Raggett, Dave, Jenny Lam, and Ian Alexander. 1996. HTML 3: Electronic Publishing on the World Wide Web. Reading: Addison-Wesley.