Image Files in Web Publishing
By
David D. McFarland
The present document covers selected aspects of graphics for Web
publishing which have arisen in my own work. It deals with a
variety of topics, from special twists arising in publishing
Mathematical Sociology and other technical materials on the Web,
to pointers for students who are Web publishing beginners.
However, it is not intended to be, and most certainly is not,
comprehensive; and in the topics it does touch upon, it skims
over many subtle aspects that have not become especially salient
in my work. Readers wishing further information are referred in
particular to books by Kay and Levine, and by Weinman.
In-line versus Optional Images on Web Pages
Web browsers have two built-in capabilities for displaying images:
- An in-line image is displayed automatically when the
Web browser loads the HTML file that links to it. The link uses an
image tag <img src="path/filename.ext"> filled
in with the actual file information.
- An optional image is displayed only if the person
reading the Web page clicks on the link to it. The link uses
an anchor tag <a href="path/filename.ext"> filled in with the
actual file information.
In both cases, the image itself is in a file separate from the
HTML file that links to it.
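As a small illustration of the two kinds of link, the following Python sketch builds each tag as a string. The helper names inline_image and optional_image, the example file names, and the alt attribute (text a browser can show when the image is not displayed) are illustrative additions, not part of the discussion above.

```python
from html import escape


def inline_image(src: str, alt: str) -> str:
    """In-line image: the browser fetches and displays it automatically."""
    # escape() guards against quotes or ampersands in the path or text.
    return f'<img src="{escape(src)}" alt="{escape(alt)}">'


def optional_image(href: str, label: str) -> str:
    """Optional image: fetched only if the reader clicks the link text."""
    return f'<a href="{escape(href)}">{escape(label)}</a>'


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(inline_image("images/graph.gif", "Bar graph of survey results"))
    print(optional_image("images/photo.gif", "Full-size photo (GIF)"))
```

Pasted into an HTML file, the first tag shows the graph as soon as the page loads, while the second shows only the words "Full-size photo (GIF)" until the reader clicks them.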
Vector vs. Image Graphics
Computer graphics comes in two major forms, plus a hybrid of the two:
- Image graphics (also called bitmap or
raster graphics) operates at the level of pixels,
which are displayed as dots of colored light on the computer
screen, or dots of ink on paper.
Image graphics is captured by cameras and scanners, and is created
and edited in paint and photo-editing software, such as
Windows Paint, MacPaint, Paintshop Pro, Corel PhotoPaint, or
Adobe Photoshop. Common image file formats include TIFF, GIF, and
JPEG.
Any program which outputs to the screen in graphics mode can also
be thought of as creating image graphics, since screen images can
be captured and saved in image files, which can then be posted on
the World Wide Web. This observation is useful when one wishes to
publish on the Web things the Web wasn't designed to publish,
including mathematical expressions.
- Vector graphics operates at the level of
geometric figures such as lines and curves on an abstract page,
independent of the characteristics of the device(s) on which the
page will ultimately be displayed.
Vector graphics is created in illustration software, such
as Adobe Illustrator or Corel Draw, or may be produced as output
from other sources such as statistical software. A common format
for expressing vector graphics is PostScript, where it forms one
part of that complete page description language.
Vector graphics can be displayed directly using pen plotters (if
one has access to such devices), but not using printers or
computer screens. To display vector graphics on the latter, or on
the Web, it must first be rasterized into image graphics.
However, vector graphics often plays an important part in the
earlier stages of graphics production, when device
independence is important.
- Graphics metafiles are multi-part files that
can have some parts in image formats, and other parts in vector
formats.
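Rasterization can be made concrete with the simplest case, a straight line segment. The sketch below uses Bresenham's line algorithm, a classic integer-only method chosen here for illustration (the text above does not name any particular algorithm): given the endpoints of a vector line, it returns the pixels that best approximate it on a grid. The function name is hypothetical.

```python
def rasterize_line(x0: int, y0: int, x1: int, y1: int) -> list[tuple[int, int]]:
    """Return the pixels approximating the segment from (x0, y0) to (x1, y1).

    Integer-only Bresenham line algorithm, valid in every direction.
    """
    pixels = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1   # direction of each step in x
    sy = 1 if y0 < y1 else -1   # direction of each step in y
    err = dx + dy               # running error between ideal line and grid
    while True:
        pixels.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:            # error says: move one pixel in x
            err += dy
            x0 += sx
        if e2 <= dx:            # error says: move one pixel in y
            err += dx
            y0 += sy
    return pixels


if __name__ == "__main__":
    print(rasterize_line(0, 0, 4, 2))
```

For the segment from (0, 0) to (4, 2) it yields five pixels, one per column. Real rasterizers also handle line width, curves, and anti-aliasing, all omitted here.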
The remainder of this document is about image graphics
only.
Image File Formats
There are literally dozens of different formats for image data
files, but most creators of Web pages can safely ignore all but a
couple of them. (See the book by Kay and Levine for information
about any of the others, if needed, and for further details about
these.)
- GIF. Graphics Interchange Format, created in the 1980s by
CompuServe, and publicly documented, is probably the most widely
supported graphics image format. In particular, GIF image files
can be displayed by all graphics mode Web browsers. The
compression method it uses is lossless, i.e., the original
image can be exactly reconstructed from the compressed version.
- JPEG. The Joint Photographic Experts Group's JFIF (JPEG File
Interchange Format), specifically designed for color photographs,
is the other image format that Web browsers display. It uses a lossy
compression method, so some detail is lost whenever an image is
saved in this format.
- TIFF. The Tagged Image File Format cannot be
displayed by Web browsers. Designed by the desktop publishing
pioneer Aldus (since merged into Adobe) jointly with Microsoft,
it has multiple application-specific variants. Although that
flexibility is desirable from some perspectives, it makes TIFF
poorly suited for the Web, which aims at device independence.
But because of its widespread adoption in the desktop publishing
arena, it is likely to be encountered by a Web author, perhaps as
the default format of the software that runs a scanner.
- PNG. Maybe in the future the Portable Network Graphics
format will become important, but it hasn't happened yet, and
many browsers out there can't read the PNG format. It was created
in 1995, as a new public domain format, by a group of graphics
developers who objected when someone at Unisys belatedly tried to
start collecting royalties for the patent on the LZW compression
algorithm, which is used in GIF. This is discussed in the book by
Raggett, pages 330-333, but it remains to be seen whether his
predictions regarding PNG are more accurate than his predictions
regarding the HTML 3.0 <MATH> capability. For now, PNG
seems to be an improvement over GIF in all respects but one, the
ability of prospective audiences to display the files.
Which format(s) to use? Here are some suggestions.
- First, notice that it may be advantageous to store a
particular image in different formats during different
stages of the process, depending on the circumstances of its
creation, and keeping in mind the possible need for future
revision.
Images which ended up as GIF files on my Web site have at earlier
stages been in a variety of image (or, in some cases, vector or
text) formats: The images in my Web page about the Symbol
Character Set were originally in a PostScript (PS) file, which I
created in order to use a PostScript interpreter to render the
special characters in the Symbol font. Those in my page about the
Keyboard Character Set originated in TeX and passed through DVI
format before becoming GIF, because I was using TeX software to
produce special characters and formatting there. Images I myself
scanned were initially PCX or TIFF, the native formats of the two
scanners I used. And photographs scanned at a Konica photo lab
were initially in KQP format, the only option available there.
- Consider the number of different colors in the image to be
created for display on the Web. A data graph, for example, may
contain only a handful of different colors, and be designed to
make them visually distinct from one another. A color photograph,
on the other hand, may have hundreds or thousands of different
colors, with color transitions by fine gradations rather than at
visually distinct boundaries. Because of the differences in their
compression schemes, JPEG is better for items with fine color
gradations, but GIF is better for items with only a few distinct
colors (see elaborations in the next three items).
- JPEG was designed for color photographs, with their fine
color gradations. A photo is represented as a set of pixels in 24
bit color. That means that each pixel is represented by 3 bytes,
one byte (8 bits) for each of the additive primary colors, red,
blue, and green. The value stored in one of those bytes specifies
one of 256 (= 2 to the power 8) levels of that primary color, and
the three values together specify which of over 16 million (= 256
x 256 x 256) possible colors that pixel will be. Each pixel has
over 16 million possible colors, regardless of the colors of any
other pixels in the same image. This provides color gradations
as fine as most humans can detect.
All that color detail makes for huge files, which can be reduced
somewhat by compression. JPEG compression is lossy, in
that some of the data are lost, or rather discarded; it would be
impossible to exactly reconstruct the original image from the
compressed version. JPEG compression aims to discard the detail
that will be least noticed in comparisons of original and
compressed images. Basically this compression method assigns
identical codes to "similar" colors, with greater compression
coming from looser definition of "similar". It can be set for
various levels of "lossiness", and one should visually inspect
the results, to decide just how far the image can be compressed
without excessive image degradation.
- Since JPEG is lossy, it is important to save the
original image in a lossless format, say as a TIFF file, and to go back
to the lossless version if subsequent revisions are to be made. A
lossy copy of the revised original will be better than a lossy
copy of the revision of an image that is itself a lossy copy.
Stated differently, each time a JPEG file is reloaded and
resaved, the image is degraded further, a situation to be
avoided. Do not revise a JPEG; discard it. Revise the TIFF
from which the original JPEG was converted, save the revised
TIFF, and convert a copy of the revised TIFF into a new JPEG.
- GIF uses a palette whose size must be a power of two, from 2 up
to 256 colors; 2, 16, and 256 are the most common choices. Before
compression, each pixel is represented by 1 to 8 bits (1, 4, or 8
in those common cases), specifying which palette color applies to
that pixel. GIF allows no more than 256 different colors in the
same image, although the 256 colors in the palette may be chosen
as any 256 out of the 16.7 million available in 24 bit color.
Compression in GIF format follows the LZW (Lempel-Ziv-Welch) algorithm, which looks
for repeated patterns, such as the same color often being
repeated in a string of adjacent pixels. In a data graph,
adjacent pixels are often part of the same bar or line, hence the
same color, and this kind of repetition is the sort of pattern on
which LZW compression thrives. It is a lossless
compression scheme, in that the original could be reconstructed
exactly from the compressed version. It relies on replacing a
lengthy but repeated pattern with a code which is short, but
which is uniquely identified with the lengthy pattern so that the
latter can be reconstructed exactly.
- Currently I use GIF files on my Web site, even for photos. I
had initially used JPEG for photos, but some colleagues saw only
gray boxes instead of my photos. And these were colleagues with
high-powered workstations, not starving students or Luddites
running computer gear from yesteryear. So I switched to GIFs,
which displayed fine for them. A photo does look better for a
given file size in JPEG than in GIF format, but that's
little consolation to people who can't display it at all.
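The 24 bit color arithmetic described above can be checked directly. This minimal Python sketch packs the three one-byte channel values into a single integer and unpacks them again; the function names and the red-green-blue byte order are assumptions for illustration (file formats differ in how they order the three bytes).

```python
def pack_rgb(red: int, green: int, blue: int) -> int:
    """Pack three one-byte channel values (0-255) into one 24 bit integer."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each channel must fit in one byte (0-255)")
    return (red << 16) | (green << 8) | blue


def unpack_rgb(color: int) -> tuple[int, int, int]:
    """Recover the (red, green, blue) bytes from a packed 24 bit color."""
    return (color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF


LEVELS_PER_CHANNEL = 2 ** 8               # 256 levels of each primary
TOTAL_COLORS = LEVELS_PER_CHANNEL ** 3    # 256 x 256 x 256 possible colors

if __name__ == "__main__":
    orange = pack_rgb(255, 165, 0)
    print(hex(orange), unpack_rgb(orange))  # 0xffa500 (255, 165, 0)
    print(f"{TOTAL_COLORS:,} possible colors per pixel")  # 16,777,216
```

Because each pixel carries its own three bytes, any pixel can take any of the 16,777,216 values regardless of its neighbors, which is exactly what makes uncompressed true-color files large.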
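The idea of giving "similar" values identical codes can be illustrated with a toy: uniform quantization of a single color channel. This is not JPEG's actual method, which quantizes frequency components computed from 8 x 8 blocks of pixels rather than raw pixel values, but it shows the same trade-off: a coarser step leaves fewer distinct codes (data that compresses better) and a larger worst-case error. The function quantize and the step sizes are hypothetical.

```python
def quantize(values: list[int], step: int) -> list[int]:
    """Replace each 0-255 value by the center of its bucket of width `step`.

    All values in one bucket receive the same code, so the differences
    among them are discarded and cannot be recovered.
    """
    return [min(255, (v // step) * step + step // 2) for v in values]


gradient = list(range(256))  # a smooth ramp through all 256 levels

for step in (1, 16, 64):
    codes = quantize(gradient, step)
    distinct = len(set(codes))
    worst = max(abs(original - coded) for original, coded in zip(gradient, codes))
    print(f"step {step:2}: {distinct:3} distinct codes, worst error {worst}")
```

With these step sizes it reports 256, 16, and 4 distinct codes, with worst-case errors of 0, 8, and 32 levels: the looser the definition of "similar", the more detail is thrown away.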
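The LZW idea described above, replacing repeated patterns with short codes that can be expanded back exactly, can be sketched in Python. This is the textbook byte-oriented form of the algorithm, with hypothetical function names; GIF's own variant adds details omitted here, including variable-width codes that begin roughly one bit wider than the pixel values, special "clear" and "end of information" codes, and a limit of 4096 table entries (12-bit codes).

```python
def lzw_compress(data: bytes) -> list[int]:
    """Textbook LZW: emit one integer code per longest already-seen pattern."""
    table = {bytes([i]): i for i in range(256)}  # start with every single byte
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate              # keep extending a known pattern
        else:
            codes.append(table[current])     # emit the code for the known prefix
            table[candidate] = next_code     # remember the new, longer pattern
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the same table on the fly and reproduce the input exactly."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:              # pattern being defined by this step
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        output.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(output)


if __name__ == "__main__":
    # One hypothetical scan line of a bar graph: 40 pixels of palette color 1,
    # then 20 pixels of palette color 2.
    row = bytes([1] * 40 + [2] * 20)
    codes = lzw_compress(row)
    print(len(row), "pixels ->", len(codes), "codes")
    assert lzw_decompress(codes) == row  # lossless: exact reconstruction
```

For this row it produces 15 codes for 60 pixels. Each code needs more than 8 bits, so the saving in bytes is smaller than the count suggests, but it grows quickly on longer runs of the same color.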
Graphics Image Software
What software is needed for creation and editing of graphics
images, or converting among different file formats?
Graphic arts professionals, who earn their livelihoods this way,
spend several hundred dollars for Adobe Photoshop, which I
occasionally use in computer labs, but do not have on my own
computers at either office or home.
Paintshop Pro seems to be well-regarded by its users who post to
Usenet groups. I have not used it myself, but do believe it is
the leader at the budget end of the market.
I advise students to use whatever software was bundled with the
computers they have access to, and whatever is on the machines in
the computer labs, to get a better idea of what features they
already have available, and what features they want but lack,
before making any major investment.
I'm not sure the choice of image editing software makes much
difference for someone who, like me, works mainly in text, and
only occasionally gets very involved in image data. The basic
things, at least, can be done in various programs, and without
spending much money on software.
One program I actually use at home is PhotoFinish 3.0, a
Windows-based sibling of PC-Paintbrush, created by the now
defunct ZSoft. It came as a freebie that was bundled with a hand
scanner I bought a few years ago, and runs on MS-Windows 3.11. It
lacks the ability to deal with very large files, and lacks some
of the fancier features, such as ability to create transparent or
animated GIFs. But it is able to acquire an image from a scanner
or load it from a file in any of several formats, crop it, touch
it up, and save it in GIF format.
MS-Windows itself includes a Paint accessory, but that has a
fatal flaw for our purposes: it is unable to save in either GIF
or JPEG format.
What about software for flattening color depth, or for format
conversion?
Often it is unnecessary. Graphics editing software commonly
includes those capabilities. Color depth may be a menu item. And
a file conversion may involve the steps: load it, then "Save
As..." a different format. Or it might actually be necessary to
read the manual.
There is also some software useful for graphics format conversion,
but incapable of full-fledged image editing. Lview is primarily
for viewing, rather than editing, but it handles BMP and TGA
formats, as well as JPG and GIF, and can open a file in one and
save as a different format.
Also, in MS-Windows 3.11, the operating system itself can help
get things from odd formats into GIF. Anything that can be
displayed on the screen can be captured to the Clipboard using
the PrtSc key. Then it can be pasted into Paintshop Pro or
whatever, and saved as GIF.
Reducing File Size and Download Time
One joke, not without justification, alleges that "WWW" stands
for "World Wide Wait". And the most commonly awaited event is
completion of a graphics image download.
Graphics image files are relatively large, at least compared to
HTML files and other text files. A VGA screen filled with text
takes roughly 2 kilobytes, in sharp contrast to the roughly 900
kilobytes taken by the same screen filled with a true color
image. (The calculations are: 80 columns x 25 rows x 1 byte,
versus 640 columns x 480 rows x 3 bytes, without compression in
both cases.)
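The arithmetic in the parenthetical can be reproduced with a one-line helper. The function name is a hypothetical illustration, and "kilobyte" is taken here as 1024 bytes.

```python
def uncompressed_bytes(columns: int, rows: int, bytes_per_cell: int) -> int:
    """Size of a full screen before compression: cells times bytes per cell."""
    return columns * rows * bytes_per_cell


text_screen = uncompressed_bytes(80, 25, 1)     # 80 x 25 characters, 1 byte each
image_screen = uncompressed_bytes(640, 480, 3)  # 640 x 480 pixels, 3 bytes each

print(f"text:  {text_screen:,} bytes = {text_screen / 1024:.2f} KB")
print(f"image: {image_screen:,} bytes = {image_screen / 1024:.0f} KB")
print(f"ratio: about {image_screen / text_screen:.0f} to 1")
```

It reports 2,000 bytes (1.95 KB) for the text screen and 921,600 bytes (900 KB) for the true-color screen, a ratio of about 461 to 1.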
A message posted on a Web site will not achieve its purpose if it
takes so long to download that the members of its intended
audience give up, and point their browsers elsewhere. Here are
several suggestions for ways to reduce image file size, and the
corresponding download time.
- McFarland urges that, contrary to most of what you see
elsewhere on the Web, you should use graphics primarily for
content, not decoration. That, alone, would greatly
reduce the downloading time and file storage requirements of most
Web sites. Repeat for emphasis:
- Content, Not Decoration!
- Crop. In a photo of a person, for example, do you really
need to show the shoes, or would a hip and higher cropped version
serve your purposes as well? How about just the face?
- Flatten Color Depth. 24 bit color is overkill for the vast
majority of photographs, and for virtually all non-photographic
artwork. 8 bit provides 256 colors, and requires only about one
third as much file space or download time. Make a copy at 8 bit
color, and try it on the Web. (But do keep a 24 bit version of
the color photograph, in case you also want to print it. Color
depth is more important for printing than for display on a
monitor, because of its possible use in dithering as well as in
direct display.)
Try to count the number of distinct colors in the graphic. A bar
graph, for example, might only contain a handful of colors, in
which case it could be represented perfectly well as a GIF file
with 4 bit color depth, which provides for up to 16 colors.
- Go to Grayscale or Black and White. That's right: gray and
dull, like Professor McFarland. 6 bits would provide 64 levels of
gray, about as fine a gradation as most people can visually
detect. GIF options include 4 bits, which provides 16 levels.
And, of course, 1 bit, which provides just black and white. Does
the color convey an important part of the graphic's message, or
is it mere decoration?
- Reduce Detail and Image Size. Would a 150 x 150 pixel image
(about 2" square on the screen) serve your purposes as well as
the 300 x 300 pixel image? If so, it would reduce file size and
download time by about 75%.
- Use Thumbnails. A small, fast-downloading thumbnail version
of a photo may be displayed in-line as a link to a more detailed,
but slower, optional version of the same picture.
- Use interlaced image files, which first display every
8th row of pixels, then every 4th row, etc., until all rows are
displayed. This does not reduce the total download time, but
does significantly reduce the amount of time before the reader
can see that something is happening.
- Study Weinman's book for details and further suggestions.
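The advice to count colors and flatten color depth can be sketched as follows. The helper names, the five colors, and the 200 x 150 bar graph are hypothetical, and the sizes are for raw pixel data before any compression.

```python
def smallest_gif_depth(n_colors: int) -> int:
    """Fewest bits per pixel whose palette (2 ** bits entries) holds n_colors.

    GIF palettes hold 2, 4, 8, ..., 256 entries, i.e. 1 to 8 bits per pixel.
    """
    if not 1 <= n_colors <= 256:
        raise ValueError("a GIF image holds between 1 and 256 colors")
    return max(1, (n_colors - 1).bit_length())


def raw_bytes(n_pixels: int, bits_per_pixel: int) -> float:
    """Size of the pixel data before compression."""
    return n_pixels * bits_per_pixel / 8


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
RED, BLUE, GREEN = (255, 0, 0), (0, 0, 255), (0, 128, 0)

# Hypothetical 200 x 150 bar graph: white background, black axes, three bars.
graph = [WHITE] * 25_000 + [BLACK] * 1_000 + [RED] * 1_500 + [BLUE] * 1_500 + [GREEN] * 1_000

n_colors = len(set(graph))          # count the distinct (r, g, b) values
bits = smallest_gif_depth(n_colors)

for depth in (24, 8, 4, bits):
    print(f"{depth:2} bits per pixel: {raw_bytes(len(graph), depth):>8,.0f} bytes")
```

For this graph it finds 5 colors, so a 3-bit palette (8 entries) would suffice, and the 4-bit depth suggested above also works. The raw data shrinks from 90,000 bytes at 24 bits to 30,000 at 8 bits (the one-third figure) and 15,000 at 4 bits.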
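Going to grayscale can be sketched as computing a brightness value for each pixel and then rounding it to a fixed number of gray levels. The weights below are the common ITU-R BT.601 luma coefficients, an assumption here since the text does not specify a conversion, and the function name is hypothetical.

```python
def to_gray_level(rgb: tuple[int, int, int], levels: int = 64) -> int:
    """Map an (r, g, b) pixel to a gray level from 0 (black) to levels - 1 (white).

    Brightness uses the ITU-R BT.601 luma weights (an assumption; other
    weightings exist). 64 levels need 6 bits, 16 need 4, and 2 need 1.
    """
    r, g, b = rgb
    luma = 0.299 * r + 0.587 * g + 0.114 * b   # 0.0 (black) to 255.0 (white)
    return min(levels - 1, int(luma * levels / 256))


if __name__ == "__main__":
    for color in [(255, 255, 255), (255, 0, 0), (0, 128, 0), (0, 0, 0)]:
        print(color, [to_gray_level(color, n) for n in (64, 16, 2)])
```

Note that pure red and medium green land on nearly the same gray (levels 19 and 18 out of 64), so a graph that distinguishes its bars only by those two colors would lose that distinction; this is one way to check whether color carries part of the message.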
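For GIF, the interlaced row order described above is four passes over the rows, as laid out in the GIF89a specification. The sketch below lists the rows in that stored order; the function name is hypothetical.

```python
def gif_interlace_order(height: int) -> list[int]:
    """Row numbers of an image in the order an interlaced GIF stores them."""
    passes = [
        (0, 8),  # pass 1: every 8th row, starting with row 0
        (4, 8),  # pass 2: every 8th row, starting with row 4
        (2, 4),  # pass 3: every 4th row, starting with row 2
        (1, 2),  # pass 4: every 2nd row, starting with row 1
    ]
    return [row for start, step in passes for row in range(start, height, step)]


if __name__ == "__main__":
    print(gif_interlace_order(16))
```

For a 16-row image it prints [0, 8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]. After the first pass a browser can already sketch the whole picture, for instance by repeating each received row; after the second pass every 4th row is present, and after the third every 2nd.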
References
Kay, David C., and John R. Levine. 1994. Graphics File
Formats. Second edition. Blue Ridge Summit, PA:
Windcrest/McGraw-Hill.
Raggett, Dave, Jenny Lam, and Ian Alexander. 1996. HTML 3:
Electronic Publishing on the World Wide Web. Reading, MA:
Addison-Wesley.
Weinman, Lynda. 1997. Designing Web Graphics.2 (sic).
Second edition. Indianapolis: New Riders (Macmillan Computer
Publishing).