Image Files in Web Publishing

By David D. McFarland

The present document covers selected aspects of graphics for Web publishing which have arisen in my own work. It deals with a variety of topics, from special twists arising in publishing Mathematical Sociology and other technical materials on the Web, to pointers for students who are Web publishing beginners. However, it is not intended to be, and most certainly is not, comprehensive; and in the topics it does touch upon, it skims over many subtle aspects that have not become especially salient in my work. Readers wishing further information are referred in particular to books by Kay and Levine, and by Weinman.

In-line versus Optional Images on Web Pages

Web browsers have two built-in capabilities for displaying images:

An in-line image is displayed automatically when the Web browser loads the HTML file that links to it. The link uses an image tag <img src="path/filename.ext"> filled in with the actual file information.
An optional image is displayed only if the person reading the Web page clicks on the link to it. The link uses an anchor tag <a href="path/filename.ext"> filled in with the actual file information.

In both cases, the image itself is in a file separate from the HTML file that links to it.
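
As a minimal sketch of the two kinds of link, the following Python functions build each tag from a file path. The function names, the example paths, and the link text are illustrative assumptions, not part of any particular site.

```python
from html import escape


def inline_image_tag(path: str) -> str:
    """Build an image tag: the browser fetches the image along with the page."""
    return f'<img src="{escape(path, quote=True)}">'


def optional_image_link(path: str, link_text: str) -> str:
    """Build an anchor tag: the image is fetched only if the reader clicks."""
    return f'<a href="{escape(path, quote=True)}">{escape(link_text)}</a>'


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    print(inline_image_tag("images/graph.gif"))
    print(optional_image_link("images/photo.jpg", "View the photo"))
```

Either string can be pasted into an HTML file; the image file itself stays separate in both cases.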

Vector vs. Image Graphics

Computer graphics has two major forms:

Image graphics (also called bitmap or raster graphics) operates at the level of pixels, which are displayed as dots of colored light on the computer screen, or dots of ink on paper.
Image graphics is captured by cameras or scanners, and is created and edited in paint and photo-editing software, such as Windows Paint, MacPaint, Paint Shop Pro, Corel Photo-Paint, or Adobe Photoshop. Common image file formats include TIFF, GIF, and JPEG.
Any program which outputs to the screen in graphics mode can also be thought of as creating image graphics, since screen images can be captured and saved in image files, which can then be posted on the World Wide Web. This observation is useful when one wishes to publish on the Web things the Web wasn't designed to publish, including mathematical expressions.
Vector graphics operates at the level of geometric figures such as lines and curves on an abstract page, independent of the characteristics of the device(s) on which the page will ultimately be displayed.
Vector graphics is created in illustration software, such as Adobe Illustrator or CorelDRAW, or may be created as output from other sources such as statistical software. PostScript, a complete page description language, provides a common format for expressing vector graphics.
Vector graphics can be displayed directly using pen plotters (if one has access to such devices), but not using printers or computer screens. To display vector graphics on the latter, or on the Web, it must first be rasterized into image graphics. However, vector graphics often plays an important part in the earlier stages of graphics production, when device independence is important.
Graphics metafiles are multi-part files that can have some parts in image formats, other parts in vector formats.

The remainder of this document is about image graphics only.

Image File Formats

There are literally dozens of different formats for image data files, but most creators of Web pages can safely ignore all but a couple of them. (See the book by Kay and Levine for information about any of the others, if needed, and for further details about these.)

GIF. Graphics Interchange Format, created in the 1980s by CompuServe, and publicly documented, is probably the most widely supported graphics image format. In particular, GIF image files can be displayed by all graphics mode Web browsers. The compression method it uses is lossless, i.e., the original image could be exactly reconstructed from the compressed version.
JPEG. The Joint Photographic Experts Group's JFIF file interchange format, specifically designed for color photographs, is the other Web browser image format. It uses a lossy compression method, so some detail is lost whenever an image is saved in this format.
TIFF. The Tagged Image File Format cannot be displayed by Web browsers. Designed by the desktop publishing pioneer firm Aldus (more recently merged into Adobe), jointly with Microsoft, it has multiple application-specific variants. Although that flexibility is desirable from some perspectives, it makes TIFF poorly suited for the Web, which aims at device independence. But because of its widespread adoption in the desktop publishing arena, a Web author is likely to encounter it, perhaps as the default format of the software that runs a scanner.
PNG. Maybe in the future Portable Network Graphics format will become important, but it hasn't happened yet, and many browsers out there can't read the PNG format. It was created in 1995, as a new public domain format, by a group of graphics developers who objected when someone at Unisys belatedly tried to start collecting royalties for the patent on the LZW compression algorithm, which is used in GIF. This is discussed in the book by Raggett, pages 330-333, but it remains to be seen whether his predictions regarding PNG are more accurate than his predictions regarding the HTML 3.0 <MATH> capability. For now, PNG seems to be an improvement over GIF in all respects but one, the ability of prospective audiences to display the files.

Which format(s) to use? Here are some suggestions.

First, notice that it may be advantageous to store a particular image in different formats during different stages of the process, depending on the circumstances of its creation, and keeping in mind the possible need for future revision.
Images which ended up as GIF files on my Web site have at earlier stages been in a variety of image (or, in some cases, vector or text) formats. The images in my Web page about the Symbol Character Set were originally in a PostScript (PS) file, which I created in order to use a PostScript interpreter to render the special characters in the Symbol font. Those in my page about the Keyboard Character Set originated in TeX and passed through DVI format before becoming GIF, because I was using TeX software to produce special characters and formatting there. Images I myself scanned were initially PCX or TIFF, the native formats of the two scanners I used. And photographs scanned at a Konica photo lab were initially in KQP format, the only option available there.
Consider the number of different colors in the image to be created for display on the Web. A data graph, for example, may contain only a handful of different colors, and be designed to make them visually distinct from one another. A color photograph, on the other hand, may have hundreds or thousands of different colors, with color transitions by fine gradations rather than at visually distinct boundaries. Because of the differences in their compression schemes, JPEG is better for items with fine color gradations, but GIF is better for items with only a few distinct colors (see elaborations in the items that follow).
JPEG was designed for color photographs, with their fine color gradations. A photo is represented as a set of pixels in 24 bit color. That means that each pixel is represented by 3 bytes, one byte (8 bits) for each of the additive primary colors, red, blue, and green. The value stored in one of those bytes specifies one of 256 (= 2 to the power 8) levels of that primary color, and the three values together specify which of over 16 million (= 256 x 256 x 256) possible colors that pixel will be. Each pixel has over 16 million possible colors, regardless of the colors of any other pixels in the same image. This provides color gradations as fine as most humans can detect.
All that color detail makes for huge files, which can be reduced somewhat by compression. JPEG compression is lossy, in that some of the data are lost, or rather discarded; it would be impossible to exactly reconstruct the original image from the compressed version. JPEG compression aims to discard the detail that will be least noticed in comparisons of original and compressed images. Roughly speaking, it works on small blocks of pixels and stores the fine variations within each block less precisely than the broad ones, with greater compression coming from coarser storage of that fine detail. It can be set for various levels of "lossiness", and one should visually inspect the results, to decide just how far the image can be compressed without excessive image degradation.
Since JPEG is lossy, it is important to save the original image in a lossless format, say as a TIFF file, and to go back to the lossless version if subsequent revisions are to be made. A lossy copy of the revised original will be better than a lossy copy of the revision of an image that is itself a lossy copy. Stated differently, each time a JPEG file is reloaded and resaved, the image is degraded further, a situation to be avoided. Do not revise a JPEG; discard it. Revise the TIFF from which the original JPEG was converted, save the revised TIFF, and convert a copy of the revised TIFF into a new JPEG.
GIF uses a palette of at most 256 colors. The palette size is a power of 2 (commonly 2, 16, or 256 colors), and before compression each pixel is represented by the corresponding number of bits (1, 4, or 8), specifying which palette color applies to that pixel. GIF allows no more than 256 different colors in the same image, although the 256 colors in the palette may be chosen as any 256 out of the 16.7 million available in 24 bit color.
Compression in GIF format follows the LZW algorithm, which looks for repeated patterns, such as the same color often being repeated in a string of adjacent pixels. In a data graph, adjacent pixels are often part of the same bar or line, hence the same color, and this kind of repetition is the sort of pattern on which LZW compression thrives. It is a lossless compression scheme, in that the original could be reconstructed exactly from the compressed version. It relies on replacing a lengthy but repeated pattern with a code which is short, but which is uniquely identified with the lengthy pattern so that the latter can be reconstructed exactly.
Currently I use GIF files on my Web site, even for photos. I initially had used JPEG for photos, but some colleagues saw only gray boxes instead of my photos. And these were colleagues with high-powered workstations, not starving students or Luddites running computer gear from yesteryear. So I switched to GIFs, which they displayed OK. A photo does give better appearance for smaller file size in JPEG rather than GIF format, but that's little consolation to people who can't display it at all.
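
The pattern-replacement idea behind LZW, described above, can be sketched in a few lines of Python. This is a simplified, textbook form of the algorithm operating on a list of palette indexes, not the exact GIF encoding (which also packs codes into variable-width bit fields and uses special clear and end codes). The function names and the sample row of pixels are assumptions made for illustration, and the input is assumed to contain only valid palette indexes.

```python
def lzw_compress(pixels, palette_size):
    """Replace repeated runs of palette indexes with shorter codes."""
    # Start with one table entry per palette color.
    table = {(i,): i for i in range(palette_size)}
    next_code = palette_size
    codes = []
    current = ()
    for p in pixels:
        candidate = current + (p,)
        if candidate in table:
            # Keep extending the pattern while it is already known.
            current = candidate
        else:
            # Emit the code for the known pattern, then learn the longer one.
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = (p,)
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes, palette_size):
    """Rebuild the exact original pixel list from the codes."""
    if not codes:
        return []
    table = {i: (i,) for i in range(palette_size)}
    next_code = palette_size
    previous = table[codes[0]]
    pixels = list(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The pattern being defined right now: previous plus its first pixel.
            entry = previous + (previous[0],)
        else:
            raise ValueError("invalid LZW code")
        pixels.extend(entry)
        table[next_code] = previous + (entry[0],)
        next_code += 1
        previous = entry
    return pixels


if __name__ == "__main__":
    # One row of a hypothetical bar graph: 20 pixels of color 1, then 20 of color 2.
    row = [1] * 20 + [2] * 20
    codes = lzw_compress(row, palette_size=4)
    print(len(row), "pixels became", len(codes), "codes")
    print("lossless:", lzw_decompress(codes, palette_size=4) == row)
```

Long runs of a single color, as in the bars of a data graph, shrink to a handful of codes, while an image whose neighboring pixels rarely repeat gains far less.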

Graphics Image Software

What software is needed for creation and editing of graphics images, or converting among different file formats?

Graphics arts professionals, who earn their livelihoods this way, spend several hundred dollars for Adobe Photoshop, which I occasionally use in computer labs, but do not have on my own computers at either office or home.

Paint Shop Pro seems to be well-regarded by its users who post to Usenet groups. I have not used it myself, but do believe it is the leader at the budget end of the market.

I advise students to use whatever software was bundled with the computers they have access to, and whatever is on the machines in the computer labs, to get a better idea of what features they already have available, and what features they want but lack, before making any major investment.

I'm not sure choice of image editing software makes much difference for someone who, like me, works mainly in text, and only occasionally gets very involved in image data. The basic things, at least, can be done in various programs, and without spending much money on software.

One program I actually use at home is PhotoFinish 3.0, a Windows-based sibling of PC-Paintbrush, created by the now defunct ZSoft. It came as a freebie that was bundled with a hand scanner I bought a few years ago, and runs on MS-Windows 3.11. It lacks the ability to deal with very large files, and lacks some of the fancier features, such as the ability to create transparent or animated GIFs. But it is able to acquire an image from a scanner or load it from a file in any of several formats; crop it; touch it up; and save it in GIF format.

MS-Windows itself includes a Paint accessory, but that has a fatal flaw for our purposes: it is unable to save in either GIF or JPEG format.

What about software for flattening color depth, or for format conversion?

Often it is unnecessary. Graphics editing software commonly includes those capabilities. Color depth may be a menu item. And a file conversion may involve the steps: load it, then "Save As..." a different format. Or it might actually be necessary to read the manual.

There is also some software useful for graphics format conversion, but incapable of full-fledged image editing. Lview is primarily for viewing, rather than editing, but it handles BMP and TGA formats, as well as JPG and GIF, and can open a file in one and save as a different format.

Also, in MS-Windows 3.11, the operating system itself can help get things from odd formats into GIF. Anything that can be displayed on the screen can be captured to the Clipboard using the PrtSc key. Then it can be pasted into Paint Shop Pro or whatever, and saved as GIF.

Reducing File Size and Download Time

One joke, not without justification, alleges that "WWW" stands for "World Wide Wait". And the most commonly awaited event is completion of a graphics image download.

Graphics image files are relatively large, at least compared to HTML files and other text files. A VGA screen filled with text takes roughly 2 kilobytes, in sharp contrast to the roughly 900 kilobytes taken by the same screen filled with a true color image. (The calculations are: 80 columns x 25 rows x 1 byte, versus 640 columns x 480 rows x 3 bytes, without compression in both cases.)
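
The arithmetic in the parenthetical can be checked with a short Python sketch. The function name is an illustrative assumption; the screen dimensions are the ones given above.

```python
def uncompressed_bytes(columns, rows, bytes_per_cell):
    """Storage for a screen with no compression: one cell per character or pixel."""
    return columns * rows * bytes_per_cell


if __name__ == "__main__":
    text_screen = uncompressed_bytes(80, 25, 1)      # one byte per character
    image_screen = uncompressed_bytes(640, 480, 3)   # three bytes per pixel
    print(text_screen)            # 2000 bytes, roughly 2 kilobytes
    print(image_screen)           # 921600 bytes
    print(image_screen / 1024)    # 900.0 kilobytes
```

By this count the true color screen is several hundred times the size of the text screen, before any compression is applied to either.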

A message posted on a Web site will not achieve its purpose if it takes so long to download that the members of its intended audience give up, and point their browsers elsewhere. Here are several suggestions for ways to reduce image file size, and the corresponding download time.

McFarland urges that, contrary to most of what you see elsewhere on the Web, you should use graphics primarily for content, not decoration. That, alone, would greatly reduce the downloading time and file storage requirements of most Web sites. Repeat for emphasis:
Content, Not Decoration!
Crop. In a photo of a person, for example, do you really need to show the shoes, or would a hip and higher cropped version serve your purposes as well? How about just the face?
Flatten Color Depth. 24 bit color is overkill for the vast majority of photographs, and for virtually all non-photographic artwork. 8 bit provides 256 colors, and requires only about one third as much file space or download time. Make a copy at 8 bit color, and try it on the Web. (But do keep a 24 bit version of the color photograph, in case you also want to print it. Color depth is more important for printing than for display on a monitor, because of its possible use in dithering as well as in direct display.)
Try to count the number of distinct colors in the graphic. A bar graph, for example, might only contain a handful of colors, in which case it could be represented perfectly well as a GIF file with 4 bit color depth, which provides for up to 16 colors.
Go to Grayscale or Black and White. That's right: gray and dull, like Professor McFarland. 6 bits would provide 64 levels of gray, about as fine a gradation as most people can visually detect. GIF options include 4 bits, which provides 16 levels. And, of course, 1 bit, which provides just black and white. Does the color convey an important part of the graphic's message, or is it mere decoration?
Reduce Detail and Image Size. Would a 150 x 150 pixel image (about 2" square on the screen) serve your purposes as well as the 300 x 300 pixel image? If so, it would reduce file size and download time by about 75%.
Use Thumbnails. A small, fast-downloading thumbnail version of a photo may be used as an inline link to a more detailed, but slower, optional version of the same picture.
Use interlaced image files, which first display every 8th row of pixels, then every 4th row, etc., until all rows are displayed. This does not reduce the total download time, but does significantly reduce the amount of time before the reader can see that something is happening.
Study Weinman's book for details and further suggestions.
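
Two of the suggestions above lend themselves to a quick sketch in Python: the savings from reducing color depth or image dimensions, and the order in which the rows of an interlaced file arrive. The savings figures are for uncompressed data only, so real files will differ. The row order follows the four-pass scheme of interlaced GIF (every 8th row from row 0, every 8th row from row 4, every 4th row from row 2, then every 2nd row from row 1), of which the description above is a simplification. The function names are assumptions made for illustration.

```python
def fraction_saved(old_width, old_height, old_bits, new_width, new_height, new_bits):
    """Fraction of uncompressed size saved by a smaller or shallower image."""
    old_size = old_width * old_height * old_bits
    new_size = new_width * new_height * new_bits
    return 1 - new_size / old_size


# (first row, step) for each of the four interlaced GIF passes.
GIF_PASSES = [(0, 8), (4, 8), (2, 4), (1, 2)]


def interlaced_row_order(height):
    """Order in which the rows of an interlaced GIF are stored and displayed."""
    order = []
    for first_row, step in GIF_PASSES:
        order.extend(range(first_row, height, step))
    return order


if __name__ == "__main__":
    # 300 x 300 reduced to 150 x 150 at the same color depth.
    print(fraction_saved(300, 300, 8, 150, 150, 8))   # 0.75
    # 24 bit color flattened to 8 bit at the same dimensions.
    print(round(fraction_saved(300, 300, 24, 300, 300, 8), 3))
    # Row order for a small image 16 rows high.
    print(interlaced_row_order(16))
```

For a 16-row image the first rows to arrive are 0 and 8, then 4 and 12, so a coarse impression of the whole picture appears after only a fraction of the file has been downloaded.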

References

Kay, David C., and John R. Levine. 1994. Graphics File Formats. Second edition. Blue Ridge Summit, PA: Windcrest/McGraw-Hill.

Raggett, Dave, Jenny Lam, and Ian Alexander. 1996. HTML 3: Electronic Publishing on the World Wide Web. Reading, MA: Addison-Wesley.

Weinman, Lynda. 1997. Designing Web Graphics.2 (sic). Second edition. Indianapolis: New Riders (Macmillan Computer Publishing).