LaTeX is a pervasive markup language these days. But there are lots of problems in getting from a LaTeX source file to a final document — usually in either PostScript (PS) or Adobe's Portable Document Format (PDF) form — that can be sent to a printer, displayed on the Web, or e-mailed to other people.
Typical flaws in the final document appear as
While the individual pieces are more or less adequately documented, there's a lack of information on how they all fit together. This page introduces a discussion of these formatting problems.
LaTeX per se knows nothing about typefaces or graphics. It's merely a tool for arranging rectangular objects — the bounding boxes of characters or images — on pages. From the point of view of its layout engine, printed letter-forms (called glyphs in typesetting jargon) are merely reusable graphics to be arranged in rows (lines of type, from the user's point of view).
Both the glyphs of type fonts and the unique graphics used in illustrations come in two varieties: outline or vector graphics, in which the shapes of the printed forms are specified; and bitmapped or raster graphics, in which arrays of dots (such as drops of ink from an inkjet printer) are specified. The former are also called scalable graphics or fonts, because they can be enlarged or reduced by arbitrary factors without loss of quality. The latter are tied to the resolution of some particular output device, such as a printer or a terminal screen.
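The nomenclature is a little different for font glyphs and illustrative graphics. Here's how these terms are associated:
|Kind of form||Fonts||Graphics|
|fixed-resolution (device-dependent)||bitmapped||raster|
|scalable (device-independent)||outline||vector|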
But “bitmapped” fonts are really just rasterized graphics, and “outline” or “scalable” fonts are just vector graphics. The terminology developed differently in these cases, although the same ideas are really involved.
LaTeX normally keeps device-dependent (fixed-resolution) information separate from device-independent (size and shape) information. Vector and raster graphics (or fonts) are usually stored in different file formats, and require different utilities to manipulate them. These differences complicate the processing of LaTeX files; for example, some versions of the formatting program can handle only vector graphics, while others can handle only raster graphics. A further complication is that PostScript files can contain both vector and raster graphics.
Keep in mind that converting vector graphics (or outline fonts) to a fixed-resolution raster (or bitmap) is an irrevocable step. There is a loss of information, due to the finite resolution of the final result. This introduces “digitizing noise” and (despite efforts to mitigate it) “aliasing noise”. While there are utilities that can approximately reconstruct vector information from bitmapped data, the reconstruction is never exact.
That means that you'd like to keep everything in vector (scalable) form as long as possible, and let the rasterizing be done only at the final (display) step in the process.
Unfortunately, this isn't always possible. You might need to include a scanned image (such as a photograph) in your document, for example. However, there are programs to convert raster images to PostScript; so you can keep everything else in scalable, vector form, if you're careful.
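As one possible way to do that conversion, here is a minimal sketch that assumes ImageMagick is installed (version 6 provides the convert command; version 7 renames it magick). The function name raster_to_eps is a hypothetical choice for illustration. Note that the result is still a bitmap: the EPS file merely wraps the pixels and a bounding box so that dvips can place them, while everything else in the document stays in vector form.

```shell
# Hypothetical helper: wrap a raster image (JPEG, PNG, ...) in Level 2 EPS
# so that the latex/dvips route can include it.
# Assumes ImageMagick 6's "convert" is on the PATH (ImageMagick 7: "magick").
raster_to_eps() {
    src=$1                    # e.g. photo.jpg
    dest=${src%.*}.eps        # strip the last extension: photo.jpg -> photo.eps
    # The "eps2:" prefix asks for Level 2 EPS; the pixels are stored
    # as a bitmap, not vectorized.
    convert "$src" "eps2:$dest"
}
```

For example, raster_to_eps photo.jpg writes photo.eps, which can then be included like any other EPS figure.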
Of course, LaTeX doesn't care (or know) about the drawbacks of bitmaps. In fact, it originally was designed to use bitmapped fonts, and only later was rigged to allow the use of outline fonts.
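To see why the rasterizing step should come last, you can render the same PostScript page at two resolutions and compare the results. This is a minimal sketch assuming Ghostscript (gs) is installed; png16m is Ghostscript's 24-bit color PNG device, and the function name render_ps is hypothetical.

```shell
# Hypothetical helper: rasterize a PostScript file at a chosen resolution.
# Assumes Ghostscript's "gs" is on the PATH.
render_ps() {
    ps_file=$1                               # e.g. text.ps
    dpi=$2                                   # device resolution, dots per inch
    out=${ps_file%.ps}-${dpi}dpi-%d.png      # gs replaces %d with the page number
    # -o sets the output file and implies -dBATCH -dNOPAUSE.
    gs -q -dSAFER -sDEVICE=png16m -r"$dpi" -o "$out" "$ps_file"
}
```

After render_ps text.ps 72 and render_ps text.ps 600, the 72-dpi images show visibly jagged glyphs. The 600-dpi detail cannot be recovered exactly from them, whereas the PS file itself can be re-rendered at any resolution.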
Another problem is that LaTeX (and its parent, TeX) was originally designed to deal primarily with text, not graphics. The inclusion of graphics is therefore something of an ugly hack, not a natural part of the language. As a result, unwanted side effects plague the use of graphics in formatted documents.
Because the formatter knows only the bounding boxes of the objects laid out on the page, it normally produces a file that only tells where each glyph or graphic goes. Because this file contains no rasterizing information, it doesn't depend on the resolution of the final display device (monitor screen or printer) on which the page is ultimately seen. So this file is called a device-independent or DVI file. It's the normal output of the latex command:
text.tex ---latex---> text.dvi
To get the document displayed or printed, however, the glyphs and graphics must be combined with the DVI file somehow. There are several ways to do this, depending on what kind of output is wanted.
For PostScript output, the usual route is to convert the DVI file with dvips:
text.tex ---latex---> text.dvi ---dvips---> text.ps
(See my LaTeX formatting page for the details of this process.) Then, you'd use lpr text.ps to send the PS file to a printer.
This PS file could also be displayed on the monitor screen, using a utility like gv.
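The PostScript route just described might be scripted like this. It is a minimal sketch assuming a TeX distribution that provides latex and dvips; the function name tex_to_ps is hypothetical.

```shell
# Hypothetical helper: text.tex -> text.dvi -> text.ps
# Assumes latex and dvips (from a TeX distribution) are on the PATH.
tex_to_ps() {
    stem=$1                                           # "text" for text.tex
    # nonstopmode keeps latex from pausing for input when it hits an error.
    latex -interaction=nonstopmode "$stem.tex" || return
    dvips -o "$stem.ps" "$stem.dvi"                   # -o names the output file
}
```

Then tex_to_ps text && lpr text.ps prints the document, and gv text.ps previews it on the screen.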
Now suppose you want a PDF file. You could re-process the PS file:
text.tex ---latex---> text.dvi ---dvips---> text.ps ---ps2pdf---> text.pdf
but this is getting rather complicated. And you'll need special options for some of these commands to get a usable result, because ps2pdf can degrade images (for instance, by recompressing them lossily) if you aren't careful. If you want only a PDF file, and not a PS version, there are simpler ways to get it.
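Here is a minimal sketch of both PDF routes, assuming Ghostscript's ps2pdf script and pdflatex are installed; the function names ps_to_pdf and tex_to_pdf are hypothetical. The extra ps2pdf options are Ghostscript distiller parameters that keep fonts embedded and store images losslessly (Flate) at their original resolution, instead of letting Ghostscript pick lossy JPEG compression or downsample them. The second function shows one of the simpler ways: pdflatex writes PDF directly, with no DVI or PS step, although pdflatex itself reads PDF, PNG, and JPEG figures rather than EPS.

```shell
# Hypothetical helpers for the two PDF routes.
# Assumes Ghostscript's ps2pdf script and pdflatex are on the PATH.

# text.ps -> text.pdf, keeping images lossless and fonts embedded.
# PDFSETTINGS comes first so the later parameters override its defaults.
ps_to_pdf() {
    stem=$1
    ps2pdf -dPDFSETTINGS=/prepress -dEmbedAllFonts=true \
        -dAutoFilterColorImages=false -dColorImageFilter=/FlateEncode \
        -dAutoFilterGrayImages=false -dGrayImageFilter=/FlateEncode \
        -dDownsampleColorImages=false -dDownsampleGrayImages=false \
        "$stem.ps" "$stem.pdf"
}

# text.tex -> text.pdf directly, with no DVI or PS intermediate.
tex_to_pdf() {
    stem=$1
    pdflatex -interaction=nonstopmode "$stem.tex"
}
```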
Another consideration is whether or not to embed fonts in the PDF file. The basic PS fonts don't need to be embedded; but if you are using mathematics or other special characters, you'd better embed those fonts, even though they make the PDF file bigger. And don't forget that the licenses of many copyrighted commercial fonts forbid embedding them in full, though you can usually embed a subset containing just the few special characters you use.
And to have clean-looking characters, regardless of where they're printed or displayed, you need to embed outline rather than bitmapped fonts.
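To check what actually ended up in a PDF file, the pdffonts utility from Poppler (or xpdf) lists each font with its type and an "emb" column. Here is a minimal sketch, with hypothetical function names, that reads that listing; it assumes pdffonts's usual layout, in which each font line ends with the emb, sub, and uni columns followed by the two-part object ID. Type 3 fonts are flagged because dvips turns bitmapped (PK) fonts into Type 3 fonts, so a Type 3 entry is a strong hint of bitmapped glyphs, although Type 3 fonts can also contain outlines.

```shell
# Hypothetical helpers built on Poppler's "pdffonts" (assumed on the PATH).
# pdffonts prints two header lines, then one line per font ending in:
#   emb sub uni <object number> <generation>

# Print the names of fonts that are NOT embedded (nothing if all are).
unembedded_fonts() {
    pdffonts "$1" | awk 'NR > 2 && $(NF-4) == "no" { print $1 }'
}

# Print the names of Type 3 fonts, often bitmaps when the PDF came via dvips.
type3_fonts() {
    pdffonts "$1" | awk 'NR > 2 && / Type 3 / { print $1 }'
}
```

If unembedded_fonts text.pdf prints anything beyond the basic PS fonts, or type3_fonts text.pdf reports the math fonts, the PDF will not look clean everywhere.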
Unfortunately, regardless of what output format you need, there are complications if you want to include graphics; vector and raster graphics require differing special treatments. (Dealing with such problems is why these pages are here.)
Finally, there are conceptual difficulties. Both PostScript and LaTeX are complete programming languages, but DVI files are static descriptions of box placements. PDF files have some of the features of PostScript, but not all; they can also have additional features, like hypertext links, that don't exist in LaTeX and require special treatment to produce. Conversions among file formats can lose essential information.
In general, depending on which kind(s) of final product you want, different things must be added to the LaTeX input file, and different options must be used for the various post-processors. Here's a guide to the maze, to keep you out of the blind alleys:
|For problems with . . .||See . . .|
|incomprehensible error messages||LaTeX error discussion|
|PostScript output||formatting page|
|PDF output||LaTeX to PDF page|
|how to include figures (images)?||figures page|
|making Encapsulated PostScript||PS to EPS page|
|distorted PS figures||PS conversions page|
|figures and tables put in wrong places|| |
Copyright © 2005, 2006 Andrew T. Young