Formatting documents with LaTeX

Introduction

Having finally learned a little about how to avoid some LaTeX problems, I thought I'd record some painful lessons here, in hopes of saving other writers some of the agony I went through. This page deals with the interface between LaTeX and the rest of the formatting system.


Caution: the details here refer to Debian Linux systems only. If you have something else, many of the file paths and links to local files below will be broken.

Overview of the LaTeX environment

Let's first distinguish between the commands used by a writer, and the infrastructure that allows documents to be previewed on-screen and printed on paper. Here's a crude diagram of how it works:

[Figure: simple diagram of the LaTeX workflow]

If your input file is called ms.tex, the formatted output of the latex command is a file ms.dvi, which can be turned into a printable PostScript file ms.ps by the dvips command. This is the surface the user sees, but a lot more is involved beneath it. Let's look first at the user commands, and then at the infrastructure.

Commands

The commands used vary from system to system. The general procedure, of course, is similar everywhere; but the particular commands and filenames I'll mention refer to Debian Linux only. (Even other flavors of Linux sometimes do things differently — as I discovered to my grief.)

Formatting and fonts

You start with a text editor (usually vi or emacs), and edit a file of input for LaTeX. Run the latex command on it, and (if you are lucky) you get a *.dvi file out. But that DVI file is useless without fonts and a way to display the formatted document.

That's because TeX and LaTeX know nothing about the shapes of the characters (called glyphs in this business) that a human being reads. Formatters only know how much space each glyph takes up on the page. All the formatter does is reserve space for things; the DVI file tells only where every character goes on the page, not what it looks like. (See the fonts page for more information about fonts.)
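To make the point concrete, here is a minimal Python sketch that reads only the preamble of a DVI file. The layout (opcode 247, an identification byte, the num/den/mag unit parameters, then a comment) follows the DVI format as documented for dvitype; the function names are hypothetical. Notice that nothing here describes a glyph shape: the preamble just fixes the unit of length, which TeX normally sets so that one DVI unit is one scaled point (1/65536 pt).

```python
import struct

PRE = 247  # opcode that begins every DVI file's preamble


def read_dvi_preamble(data: bytes) -> dict:
    """Decode the preamble: pre, i[1], num[4], den[4], mag[4], k[1], comment[k]."""
    if len(data) < 15 or data[0] != PRE:
        raise ValueError("not a DVI file: missing 'pre' opcode")
    ident = data[1]
    # num, den and mag are 4-byte big-endian integers
    num, den, mag = struct.unpack(">iii", data[2:14])
    k = data[14]
    comment = data[15:15 + k].decode("ascii", errors="replace")
    return {"id": ident, "num": num, "den": den, "mag": mag, "comment": comment}


def dvi_units_to_points(n: int, pre: dict) -> float:
    """Convert a DVI distance to TeX points (72.27 per inch)."""
    # num/den gives the size of one DVI unit in units of 1e-7 metre;
    # mag is the magnification times 1000.
    metres = n * pre["num"] / pre["den"] * 1e-7 * pre["mag"] / 1000
    return metres / 0.0254 * 72.27
```

Usage, assuming a file ms.dvi exists: `read_dvi_preamble(open("ms.dvi", "rb").read(300))`. The comment is at most 255 bytes, so the first 300 bytes always hold the whole preamble.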

Post-processing

To display the formatted text in readable form, you need a post-processor (like xdvi or dvips), which puts the glyphs from a font (and the images from graphics files) where the *.dvi file says they should go on the page.

But glyphs are represented in different ways, depending on the output medium. On screen, they're rows of little dots; a computer screen is a dot-matrix printer. So you need a map of where the dots go: a bitmapped font. Screen fonts are always bitmapped.

But the standard (especially in technical publishing) for printing on paper is PostScript (PS, for short). And PostScript fonts are (usually) outline fonts; the “rasterization” of the glyphs occurs in the PS interpreter, long after the *.ps file is written.

Unfortunately, these post-processors are usually called “printer drivers” by the TeX people — a usage that conflicts with standard UNIX/Linux terminology.

Infrastructure: Fonts and kpathsea

Fonts involve another layer of complexity. There are thousands of font files in even a “vanilla” installation of TeX. Most of them are in the format used by some digital font foundry, like Adobe — not the formats needed by TeX for font metrics, or by dvips for output.

When one of these programs needs font information, it turns to a font cache stored under /usr/share/texmf/fonts/. The tfm subdirectory holds font metrics for TeX, and the pk subdirectory holds glyph data. (tfm means TeX Font Metrics; pk means “packed” binary glyph data.)

But there are so many font files available that it's impractical to have them all pre-converted to the required formats. Instead, latex, dvips and other programs look to see whether a font is in the cache when it's needed. If it isn't, they use routines from a path-searching library called kpathsea to look for the font information in some other form. If the font is found, appropriate format-conversion programs and scripts are invoked to create the font on demand, and add the result to the cache.

Here's a diagram indicating roughly how this works:

[Figure: more detailed diagram of font lookup and the font cache]
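The lookup-or-create logic described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not kpathsea's code: the cache is modeled as a dict, and `search` and `convert` are hypothetical stand-ins for kpathsea's path search and for helper scripts such as mktextfm or mktexpk.

```python
from typing import Callable, Dict, Optional


def find_or_make_font(
    name: str,
    cache: Dict[str, bytes],
    search: Callable[[str], Optional[bytes]],
    convert: Callable[[bytes], bytes],
) -> bytes:
    """Return font data for `name`, converting and caching it on first use."""
    if name in cache:                  # 1. already converted: cache hit
        return cache[name]
    source = search(name)              # 2. look for the font in some other form
    if source is None:
        raise FileNotFoundError(f"font {name!r} not found in any form")
    result = convert(source)           # 3. convert to the format needed...
    cache[name] = result               # ...and add the result to the cache
    return result
```

The second request for the same font never calls `convert`, which is the whole point of the cache; it is also why an unwritable cache directory causes trouble, since every run then has to redo the conversion (or fails outright).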

Because these programs have thousands of files available, it's inefficient to search the whole /usr/share/texmf directory tree every time something is needed. Instead, the tree is indexed, and the index is stored in the /usr/share/texmf/ls-R database. This file is created by a program named mktexlsr; it's automatically updated when a new font is added to the cache.

Or at least it should be. Often, you find that the database file isn't writable by ordinary users. Or one of the font-cache directories isn't writable when you format a file that needs a new font. Then there are problems.
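To see why the ls-R database saves so much searching, here is a minimal Python sketch that reads one into an index from file name to directories. It assumes the usual layout: a leading comment line starting with %, then blocks that begin with a directory line such as ./fonts/tfm/public/cm: followed by the names in that directory. The function name is hypothetical; kpathsea does roughly the same thing internally, loading the database once into a hash table.

```python
from collections import defaultdict
from typing import Dict, List


def parse_ls_r(text: str, root: str) -> Dict[str, List[str]]:
    """Map each file name in an ls-R database to the directories holding it."""
    index: Dict[str, List[str]] = defaultdict(list)
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue                   # blank separator or comment line
        if line.endswith(":"):
            # Directory header, relative to the ls-R file's own directory
            rel = line[:-1]
            if rel.startswith("./"):
                rel = rel[2:]
            elif rel == ".":
                rel = ""
            current = f"{root}/{rel}".rstrip("/")
        elif current is not None:
            # Subdirectory names are listed too; they simply get indexed as well
            index[line].append(current)
    return dict(index)
```

After parsing, finding cmr10.tfm is a single dictionary lookup instead of a walk through thousands of directories. The catch is the one described above: a font created after the last run of mktexlsr is invisible to the index unless the database gets updated.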

The kpathsea library is also used to locate macro packages, and other files needed in formatting text. Even TeX input files like testpage.tex can be found this way. [The diagram really should have dotted lines all over the place to show the interactions, but I've left them out to keep it (relatively) simple.]

So kpathsea provides a sort of sub-infrastructure for the whole TeX formatting system. Unfortunately, it's well concealed from the user. Your only command-line access to it is the kpsewhich command, which has very inadequate documentation.

Documentation for the kpathsea library is available at /usr/share/doc/texmf/programs/kpathsea.pdf.gz and can be displayed directly with the texdoc kpathsea command. And how does the texdoc script find the right documentation file? Why, it uses kpsewhich, of course.

Complications

Configuration files

Configuration files all over the system must be set up correctly to make everything work together smoothly. Fortunately, there's texconfig, a comprehensive script that will set most of them up for you. Try it. (You can run it as an unprivileged user, if you like, to get the feel of it; you have to be root to change the files.) Section 2.5 of the old (1999) teTeX manual (in /usr/share/doc/texmf/tetex/TETEXDOC.pdf on Debian) has more information about these files. It is now superseded by section 2.6 of the newer (2005) version, at /usr/share/doc/texlive-base-bin-doc/tetex/TETEXDOC.pdf.

PDF files

These days, it's common to distribute documents in Adobe's Portable Document Format (PDF). A PDF file is a sort of modified PostScript, usually compressed. So even the *.ps file may not be the end of the line; you might need to convert it to PDF, using a command like ps2pdf. (Think of that as post-post-processing.)

Actually, it's better to convert directly to PDF by using pdflatex instead of regular latex, because there are fewer transformations to go through. (See the LaTeX to PDF page for details.)
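The difference in the number of steps is easy to see if you write the two routes out as command lines. This Python sketch only builds the commands (the function name is hypothetical); it uses latex, dvips with its -o output option, ps2pdf, and pdflatex, all mentioned on this page.

```python
from pathlib import Path
from typing import List


def build_commands(tex_file: str, route: str = "pdflatex") -> List[List[str]]:
    """Return the command lines that turn tex_file into a PDF file.

    route="dvips":    latex -> dvips -> ps2pdf (three transformations)
    route="pdflatex": pdflatex alone (one transformation)
    """
    stem = Path(tex_file).stem        # output files land in the current directory
    if route == "dvips":
        return [
            ["latex", tex_file],                            # ms.tex -> ms.dvi
            ["dvips", f"{stem}.dvi", "-o", f"{stem}.ps"],   # ms.dvi -> ms.ps
            ["ps2pdf", f"{stem}.ps", f"{stem}.pdf"],        # ms.ps  -> ms.pdf
        ]
    if route == "pdflatex":
        return [["pdflatex", tex_file]]                     # ms.tex -> ms.pdf
    raise ValueError(f"unknown route: {route!r}")
```

To actually run a route, pass each command to `subprocess.run(cmd, check=True)` in turn; `check=True` stops at the first step that fails instead of feeding a missing file to the next one.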

It's best to include the actual fonts in the PDF document, so people reading it won't see missing or incorrect glyphs. Use the pdffonts program to see which fonts are actually in a PDF file.
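If you check many PDF files, it helps to pick out the non-embedded fonts automatically. Here is a minimal Python sketch that scans pdffonts output. It assumes each font row ends with the columns emb, sub, uni, object and ID, so the emb flag is the fifth field from the right; the columns further left differ between pdffonts versions (some have an encoding column, some don't), and the type column can contain spaces, which is why the sketch counts from the right. The function name is hypothetical.

```python
from typing import List


def unembedded_fonts(pdffonts_output: str) -> List[str]:
    """Return the names of fonts that pdffonts reports as not embedded."""
    missing = []
    for line in pdffonts_output.splitlines()[2:]:   # skip header and dashed rule
        fields = line.split()
        if len(fields) < 6:
            continue                                # blank or malformed line
        if fields[-5] == "no":                      # the 'emb' column
            missing.append(fields[0])
    return missing
```

Feed it the text from `subprocess.run(["pdffonts", "ms.pdf"], capture_output=True, text=True).stdout`; an empty result means every font is embedded.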

Ghostscript

On Linux systems, anything having to do with PostScript (including PDF files) seems to interact with Ghostscript, invoked directly with the gs command, or indirectly through things like pdf2ps and epstopdf.

But Ghostscript has its own way of accessing fonts; see http://www.cs.wisc.edu/~ghost/doc/gnu/5.50/Fonts.htm for details.

Furthermore, gs usually introduces dithering artifacts in PostScript images. So you must be careful in using gs either directly or indirectly.

This brings us to the problem of including figures in documents.

Figures

Figures are like font glyphs: TeX decides where they should go, but the actual image data are separate from the *.dvi file. Only when a PostScript or PDF version of the document is made will the actual images be incorporated into the output file. Preparation of figures for inclusion in LaTeX documents is so complicated that I have a separate page devoted to this problem.

Briefly, there are two different ways to go: either convert all the figures to PostScript (actually, Encapsulated PostScript, or EPS), or convert each of them to JPEG, PDF, or PNG. If you merely want to print the document on a PostScript printer, it's simplest to convert everything to EPS. If you need to make a PDF file, the second way (JPEG, PDF, and PNG) will produce slightly smaller files and (usually) better image quality.
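The choice between the two routes boils down to which file types the graphicx package can include directly. This Python sketch checks whether a figure needs converting first; the table of accepted extensions is an assumption covering the common cases (EPS/PS for latex plus dvips; PDF, PNG and JPEG for pdflatex), so consult the graphics documentation for the full list your drivers support. The names are hypothetical.

```python
from pathlib import Path

# Assumed, simplified table of file types each route includes directly.
ACCEPTED = {
    "dvips": {".eps", ".ps"},
    "pdflatex": {".pdf", ".png", ".jpg", ".jpeg"},
}


def needs_conversion(figure: str, route: str) -> bool:
    """True if `figure` must be converted before `route` can include it."""
    try:
        accepted = ACCEPTED[route]
    except KeyError:
        raise ValueError(f"unknown route: {route!r}") from None
    return Path(figure).suffix.lower() not in accepted
```

Running every figure of a document through such a check before formatting shows at once whether you have mixed the two conventions, which is the usual cause of "unknown graphics extension" errors.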

Infrastructure again

Behind the scenes, there are many other programs that get called, directly or indirectly, by the ones you use at the command line. They fall into two main groups: things used by TeX, or whose products are used by TeX; and those used by the PostScript post-processors (mainly, gs).

TeX support

Adjuncts to TeX and LaTeX are of 3 or 4 main types:

Class files
(These used to be style files, under LaTeX 2.09.) They fix the general format of a document: article, letter, book, etc.
Macro packages
These provide special services: fancier table formats, easy inclusion of graphics, support for multiple languages in one document, access to PostScript fonts, and so on. If you know the name of a macro package, you can find its documentation with the texdoc command.
Font metrics
These tell where the top and bottom of each glyph fall, and how much space it requires relative to its neighbors. In PostScript terms, font metrics are the bounding boxes for the character glyphs, plus kerning (and ligature) information.

Fonts are complicated, but there are good explanations in the Font HOWTO and Alan Hoenig's book TeX Unbound.

PS support

Once you have a DVI file, you must convert it to PS (or its alter ego, PDF) to see the formatted text. Even xdvi, the X Window System program for displaying DVI files, will use PS fonts if you have requested them (though you have to make bitmapped versions for use on the screen).

So the font glyphs must be available to post-processors like dvips and dvipdfm. Because the DVI file already says where they go on the page, the post-processors don't need the font metrics; that's why glyphs and metrics are stored separately.

Note that you must have the resolution of your printer(s) available to the post-processors, if they are to produce printable files from bitmapped fonts. That means you must run texconfig and set up support for each and every printer on which you want to print formatted documents.
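A little arithmetic shows why the resolution matters. A bitmapped font is drawn for one device resolution, and TeX's point is 1/72.27 inch, so the number of pixels a glyph needs depends directly on the printer's dots per inch. This Python sketch computes the design size in pixels and the conventional PK file name (such as cmr10.600pk) for a given resolution; the function names are hypothetical.

```python
TEX_POINTS_PER_INCH = 72.27  # TeX's point; PostScript's "big point" is 1/72 inch


def design_size_in_pixels(design_size_pt: float, dpi: int,
                          magnification: float = 1.0) -> float:
    """How many device pixels the font's design size spans at this resolution."""
    return design_size_pt * magnification / TEX_POINTS_PER_INCH * dpi


def pk_name(font: str, dpi: int, magnification: float = 1.0) -> str:
    """Conventional PK file name for a font rendered at this resolution."""
    return f"{font}.{round(dpi * magnification)}pk"
```

A 10-point font spans about 83 pixels at 600 dpi but only about 42 at 300 dpi, so a bitmap made for one printer is the wrong size for another; that is why each printer's resolution has to be configured, and why the PK file names carry the resolution.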

If you need to include PS graphic images in figures, use the graphics macro package (or, preferably, its more capable cousin, graphicx). Use texdoc graphics to read the local Catalogue entry, or texdoc -l graphics to find the documentation on your system.

I have pages on converting PS to EPS, on conversions between PS and raster graphics formats (such as PNM), and on converting PS to PNG.

User support

With all this complexity, users need all the help they can get. A lot is available, but it's hard to find what you need, even when you know what you're looking for.

Documentation
How do you find information about these things? There's a lot of documentation under /usr/share/doc/texmf, but it's not cleanly organized. There are many manuals, in both PDF and HTML forms, in /usr/share/doc/texlive-latex-base-doc/latex/, which is pretty well organized.

Try the teTeX manual at /usr/share/doc/texlive-base-bin-doc/tetex/TETEXDOC.pdf.
There's also a LaTeX Reference Manual.
And commands like:
texdoc <macro_name>
texdoc web2c
texdoc kpathsea
texdoc dvips

See the HTML links
/usr/share/doc/texmf/helpindex.html (an alphabetic guide) and
/usr/share/doc/texmf/index.html (a descriptive guide to the documentation).
Support
There's more available than you'll find on your system, even with the tetex-extra package (now replaced by a whole swarm of texlive packages) installed. CTAN, the Comprehensive TeX Archive Network, makes zillions of macro packages, help files, and other stuff available. Debian used to provide a stale (but local) version of the CTAN Catalogue, but now you must consult the on-line version. And don't forget Google.

 

Copyright © 2005, 2006, 2010 Andrew T. Young

