LaTeX overview

Introduction

LaTeX is a pervasive markup language these days. But there are lots of problems in getting from a LaTeX source file to a final document — usually in either PostScript (PS) or Adobe's Portable Document Format (PDF) form — that can be sent to a printer, displayed on the Web, or e-mailed to other people.

Typical flaws in the final document appear as

Notice that these have nothing to do with the difficulties of using LaTeX itself, but are due to interactions among various parts of the document-formatting system. Several other commands and utilities are necessary to get from a LaTeX source file to the final product; both font files and graphics must be incorporated in it; and all these pieces must work together to get satisfactory results.

While the individual pieces are more or less adequately documented, there's a lack of information on how they all fit together. This page introduces a discussion of these formatting problems.


NOTE: the specific recommendations made here apply only to Debian GNU/Linux systems. Some may be helpful on other versions of Linux or UNIX. Most are irrelevant to other operating systems.

Conceptual overview

LaTeX per se knows nothing about typefaces or graphics. It's merely a tool for arranging rectangular objects — the bounding boxes of characters or images — on pages. From the point of view of its layout engine, printed letter-forms (called glyphs in typesetting jargon) are merely reusable graphics to be arranged in rows (lines of type, from the user's point of view).
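To make the idea concrete, here is a toy Python sketch. It is an illustration only, not how TeX actually works. Each word is reduced to a hypothetical `Box` that has nothing but a width. Boxes are placed left to right, and a new row starts when the next one won't fit. TeX's real line breaker optimizes a whole paragraph at once, so this greedy fill is a deliberate simplification. The `Box` type, the `fill_lines` function and the widths used are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """All this toy layout engine knows about a word: a name and a width."""
    name: str
    width: float  # in points


def fill_lines(boxes, line_width, space):
    """Greedily place boxes in rows no wider than line_width.

    Returns a list of rows; each row is a list of (name, x_offset) pairs.
    A simplified stand-in for TeX's paragraph-wide line breaking.
    """
    rows, row, x = [], [], 0.0
    for box in boxes:
        gap = space if row else 0.0  # no space before the first box in a row
        if row and x + gap + box.width > line_width:
            rows.append(row)  # this box won't fit: close the current row
            row, x, gap = [], 0.0, 0.0
        row.append((box.name, x + gap))
        x += gap + box.width
    if row:
        rows.append(row)
    return rows


if __name__ == "__main__":
    words = [Box("LaTeX", 30.0), Box("arranges", 40.0), Box("boxes", 28.0)]
    for row in fill_lines(words, line_width=80.0, space=3.0):
        print(row)
```

Note that nothing here depends on what the boxes contain. A box that is wider than the line still gets a row of its own, which is roughly the situation TeX reports as an "overfull box".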

Both the glyphs of type fonts and the unique graphics used in illustrations come in two varieties: outline or vector graphics, in which the shapes of the printed forms are specified; and bitmapped or raster graphics, in which arrays of dots (such as drops of ink from an inkjet printer) are specified. The former are also called scalable graphics or fonts, because they can be enlarged or reduced by arbitrary factors without loss of quality. The latter are tied to the resolution of some particular output device, such as a printer or a terminal screen.

The nomenclature is a little different for font glyphs and illustrative graphics. Here's how these terms are associated:

                device-independent    fixed-resolution
    Fonts:      outline               bitmapped
    Graphics:   vector                raster

But “bitmapped” fonts are really just rasterized graphics, and “outline” or “scalable” fonts are just vector graphics. The terminology developed differently in these cases, although the same ideas are really involved.

LaTeX normally keeps device-dependent (fixed-resolution) information separate from device-independent (size and shape) information. Vector and raster graphics (or fonts) are usually stored in different file formats, and require different utilities to manipulate them. These differences introduce some complications into processing LaTeX files; for example, some versions of the formatting program can handle only vector graphics, while others use only raster graphics. A further complication is that PostScript files can use both vector and raster graphics.

Graphics problems

Keep in mind that converting vector graphics (or outline fonts) to a fixed-resolution raster (or bitmap) is an irrevocable step. There is a loss of information, due to the finite resolution of the final result. This introduces “digitizing noise” and (despite efforts to mitigate it) “aliasing noise”. While there are utilities that can approximately reconstruct vector information from bitmapped data, the reconstruction is never exact.

That means that you'd like to keep everything in vector (scalable) form as long as possible, and let the rasterizing be done only at the final (display) step in the process.
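A small Python experiment can demonstrate the loss. It is a sketch with made-up sizes, and the functions are hypothetical helpers written for this illustration. A disk defined by an equation stands in for a vector graphic. If you rasterize it coarsely and then enlarge the bitmap, you get a different, blockier result than if you rasterize the shape directly at the final resolution.

```python
def rasterize_disk(n):
    """Sample a unit-diameter disk (the 'vector' shape) on an n-by-n pixel grid.

    A pixel is inked if its centre lies inside the disk.
    """
    grid = []
    for row in range(n):
        y = (row + 0.5) / n - 0.5
        grid.append([((col + 0.5) / n - 0.5) ** 2 + y ** 2 <= 0.25 for col in range(n)])
    return grid


def enlarge_bitmap(grid, factor):
    """Enlarge a bitmap by pixel replication: all you can do once the outline is gone."""
    return [
        [pixel for pixel in row for _ in range(factor)]
        for row in grid
        for _ in range(factor)
    ]


def count_differences(a, b):
    """Number of pixels that differ between two equal-sized bitmaps."""
    return sum(p != q for ra, rb in zip(a, b) for p, q in zip(ra, rb))


if __name__ == "__main__":
    rasterized_early = enlarge_bitmap(rasterize_disk(8), 4)  # 8x8, blown up to 32x32
    rasterized_late = rasterize_disk(32)                     # rasterized at the final size
    print(count_differences(rasterized_early, rasterized_late), "of", 32 * 32, "pixels differ")
```

The pixels that differ lie along the edge of the disk. Those are exactly the places where the coarse bitmap's staircase cannot be undone.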

Unfortunately, this isn't always possible. You might need to include a scanned image (such as a photograph) in your document, for example. However, there are programs to convert raster images to PostScript; so you can keep everything else in scalable, vector form, if you're careful.
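The following Python sketch illustrates what such a conversion involves. It is a hypothetical helper, not any of the existing conversion programs. It wraps an 8-bit grayscale bitmap in a minimal Encapsulated PostScript file using PostScript's `image` operator. The pixels remain raster data and only the wrapper is PostScript. The image can therefore be scaled along with the rest of the page, but it gains no resolution.

```python
def gray_to_eps(pixels, scale=1):
    """Wrap an 8-bit grayscale bitmap in a minimal EPS file.

    pixels: list of rows (top row first) of integers 0 (black) .. 255 (white).
    scale:  integer size of one pixel in PostScript points (1/72 inch).
    """
    height, width = len(pixels), len(pixels[0])
    if any(len(row) != width for row in pixels):
        raise ValueError("all rows must have the same length")
    if any(not 0 <= v <= 255 for row in pixels for v in row):
        raise ValueError("pixel values must be in 0..255")

    hexdata = "".join(f"{v:02x}" for row in pixels for v in row)
    # Whitespace is ignored inside PostScript hex strings, so keep lines short.
    hexlines = [hexdata[i:i + 64] for i in range(0, len(hexdata), 64)]

    w_pt, h_pt = width * scale, height * scale
    return "\n".join([
        "%!PS-Adobe-3.0 EPSF-3.0",
        f"%%BoundingBox: 0 0 {w_pt} {h_pt}",
        "gsave",
        f"{w_pt} {h_pt} scale",  # image fills the unit square; stretch it to the box
        # columns, rows, bits per sample, then a matrix mapping the
        # top-to-bottom pixel rows onto the unit square
        f"{width} {height} 8 [{width} 0 0 -{height} 0 {height}]",
        "{<",  # data source: a procedure returning the hex string
        *hexlines,
        ">}",
        "image",
        "grestore",
        "showpage",
        "%%EOF",
    ])


if __name__ == "__main__":
    print(gray_to_eps([[0, 255], [255, 0]], scale=36))  # a 1-inch checkerboard
```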

Of course, LaTeX doesn't care (or know) about the drawbacks of bitmaps. In fact, it originally was designed to use bitmapped fonts, and only later was rigged to allow the use of outline fonts.

Another problem is that LaTeX (and its parent, TeX) was originally designed to deal primarily with text, not graphics. The inclusion of graphics is therefore something of an ugly hack, not a natural part of the language. As a result, unwanted side-effects plague the use of graphics in formatted documents.

From LaTeX to whatever

Because the formatter knows only the bounding boxes of the objects laid out on the page, it normally produces a file that only tells where each glyph or graphic goes. Because this file contains no rasterizing information, it doesn't depend on the resolution of the final display device (monitor screen or printer) on which the page is ultimately seen. So this file is called a device-independent or DVI file. It's the normal output of the latex command:

		  latex
	text.tex -------> text.dvi
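Device independence means that positions stay in abstract units until some driver picks a resolution. TeX normally writes DVI positions in scaled points (sp), with 65536 sp to the printer's point and 72.27 points to the inch. The Python sketch below shows the same stored position landing on different pixel counts for devices of different resolution. The helper names are hypothetical, and it uses plain rounding where real drivers apply more careful pixel-snapping rules.

```python
SP_PER_PT = 65536    # scaled points per printer's point
PT_PER_INCH = 72.27  # printer's points per inch


def pt_to_sp(points):
    """Convert printer's points to TeX scaled points (an integer)."""
    return round(points * SP_PER_PT)


def sp_to_pixels(sp, dpi):
    """Map a device-independent position onto a device with the given resolution."""
    inches = sp / SP_PER_PT / PT_PER_INCH
    return round(inches * dpi)


if __name__ == "__main__":
    margin = pt_to_sp(72.27)  # one inch, as it might be stored in a DVI file
    for dpi in (72, 300, 600):
        print(f"{dpi} dpi: {sp_to_pixels(margin, dpi)} pixels")
```

The DVI file records only `margin`. The choice of `dpi` belongs to whichever program finally puts the page on a screen or printer.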

To get the document displayed or printed, however, the glyphs and graphics must be combined with the DVI file somehow. There are several ways to do this, depending on what kind of output is wanted.

Screen previews

To get a quick look at the formatted text, you can use xdvi to preview the result on the monitor: xdvi text.dvi. This program uses the bitmapped fonts available to the X window system.

PostScript

You can invoke a post-processor that combines the positional information from the DVI file with the shapes from PS font-glyph files and the graphics files to make a printable PostScript file:

		  latex		    dvips
	text.tex -------> text.dvi -------> text.ps

(See my LaTeX formatting page for the details of this process.) Then you'd use lpr text.ps to send the PS file to a printer.

This PS file could also be displayed on the monitor screen, using a utility like gv.

PDF output

Now suppose you want a PDF file. You could re-process the PS file:

		  latex		    dvips	     ps2pdf
	text.tex -------> text.dvi -------> text.ps --------> text.pdf

but this is getting rather complicated. And you'll need special options to some commands to get a usable result, because ps2pdf can corrupt images if you aren't careful. If you only wanted a PDF file, and not a PS version, there are simpler ways to do that.
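The route in the diagram can be scripted. This Python sketch only assembles the command lines for the latex, dvips and ps2pdf chain and runs them in order. The names `build_commands` and `run_commands` are hypothetical. The sketch deliberately leaves out the special options mentioned above, which depend on your documents and your installation.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(tex_file, target):
    """Return the commands that turn tex_file into a .dvi, .ps or .pdf file
    by the latex -> dvips -> ps2pdf route (no special options added)."""
    stem = Path(tex_file).stem  # latex writes its output in the current directory
    dvi, ps, pdf = f"{stem}.dvi", f"{stem}.ps", f"{stem}.pdf"
    latex = ["latex", tex_file]
    dvips = ["dvips", "-o", ps, dvi]
    ps2pdf = ["ps2pdf", ps, pdf]
    routes = {"dvi": [latex], "ps": [latex, dvips], "pdf": [latex, dvips, ps2pdf]}
    if target not in routes:
        raise ValueError(f"unknown target {target!r}; expected dvi, ps or pdf")
    return routes[target]


def run_commands(commands):
    """Run each step in order, stopping at the first one that fails."""
    for cmd in commands:
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g.  python make_doc.py text.tex pdf   (the script name is hypothetical)
    run_commands(build_commands(sys.argv[1], sys.argv[2]))
```

Each extra target simply extends the previous route by one step. That is why the PDF route inherits every problem of the PS route.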

Another consideration is whether to embed fonts in the PDF file or not. The basic PS fonts don't need to be embedded; but if you are using mathematics or other special characters, you'd better include them, although they make the PDF file bigger. And don't forget that it's illegal to include entire copyrighted commercial fonts — though you can embed a few special characters from such a font.
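When only some characters of a font are embedded, the PDF specification requires the embedded font's name to begin with a subset tag. The tag is exactly six uppercase letters followed by a plus sign, as in a name like ABCDEF+CMR10 (an invented example). The Python sketch below recognizes that tag; its function names are hypothetical. How you list the font names in a given PDF is left to whatever inspection tool you use.

```python
import re

# A subset tag is exactly six uppercase letters followed by "+".
SUBSET_TAG = re.compile(r"[A-Z]{6}\+")


def is_subset_font(font_name):
    """True if the embedded font's name starts with a subset tag."""
    return SUBSET_TAG.match(font_name) is not None


def base_font_name(font_name):
    """Drop a leading subset tag, if present, to recover the font's own name."""
    return font_name[7:] if is_subset_font(font_name) else font_name
```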

And to have clean-looking characters, regardless of where they're printed or displayed, you need to embed outline rather than bitmapped fonts.

Complications

Unfortunately, regardless of what output format you need, there are complications if you want to include graphics; vector and raster graphics require differing special treatments. (Dealing with such problems is why these pages are here.)

Finally, there are conceptual difficulties. Both PostScript and LaTeX are complete programming languages, but DVI files are static descriptions of box placements. PDF files have some of the features of PostScript, but not all; they can have additional features, like hypertext links, that don't exist in LaTeX, and require special treatment to produce. Conversions among file formats can lose essential information.

In general, depending on which kind(s) of final product you want, different things must be added to the LaTeX input file, and different options must be used for the various post-processors. Here's a guide to the maze, to keep you out of the blind alleys:

For problems with . . .                    See . . .
incomprehensible error messages            LaTeX error discussion
PostScript output                          formatting page
PDF output                                 LaTeX to PDF page
how to include figures (images)?           figures page
making Encapsulated PostScript             PS to EPS page
distorted PS figures                       PS conversions page
figures and tables put in wrong places     floats page


NOTE: Some links on these pages will work only if the files are copied to a Debian GNU/Linux system. That's what I use; these pages were made primarily for my own convenience. Even if you are running Debian, the links won't work until you copy these files to your own box, because they're being served from a Sun running Solaris.

Copyright © 2005, 2006 Andrew T. Young

