PostScript file conversions

Introduction

The two main paths from LaTeX source to a PDF file require you to either convert all your images and graphics to Encapsulated PostScript (EPS), or convert them all from PostScript to something else. You can't intermix the two kinds of files. This means you need to be able to convert accurately between PS and non-PS graphics.

Another reason for wanting to convert back and forth is to use PostScript's nice fonts to annotate raster images and photographs that start out in some other graphics format. The idea is to convert the file to PS; add the annotations; then convert back to whatever you need. Once again, reliable conversions into and out of PostScript are necessary.

Although we think of PostScript mainly as a vector graphics language, it does support rasterized images. So these conversions are, in principle, quite possible. But in practice, they turn out to be awfully tricky to execute cleanly.

Problems

These conversions aren't as easy as using the netpbm package to convert among other graphics formats. For example, the otherwise very versatile anytopnm script can't handle PostScript. And, though there are programs called pnmtops and pstopnm, they have incompatible defaults and options, so that you can't just say

pnmtops image.pnm > image.ps
and then
pstopnm image.ps > image.pnm

and get the original image back again. (Indeed, that last command line will produce an empty image.pnm file, because pstopnm will write a file named image001.pnm.)

A major conflict between these two commands is that pnmtops generates a PostScript file that uses the setpagedevice command, which implicitly invokes initgraphics and so defeats the attempts of pstopnm to center and scale the image correctly. (This situation is mentioned, though rather indirectly, at the end of the pstopnm man page.) So you must ensure that the ignored transformation would actually have done nothing, if it had been executed.

Instead of using this troublesome pair of commands, you might suppose that you could use convert or the gimp to read one format and write the other. You can; but you usually find your image resized, and (often) mangled in the process. These programs actually use gs to interpret PostScript; so problems come from the arcane behavior of gs, and the difficulty of controlling it indirectly.

Conceptual difficulties

Which way is up?

One obvious source of confusion between PostScript and the PBM family of raster images is that the origin of coordinates is at the top of a PNM image, but the origin of a PostScript raster image (or page) is at its bottom.

However, this turns out not to be a serious problem. On the one hand, the conversion commands handle this directional problem automatically; on the other, even raw PostScript code has a simple way to invert an image (by changing the sign of one or more elements in the transformation matrix). Unless you delve into the actual PostScript code of a page, this reversal of the positive y direction isn't apparent.
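The sign change mentioned above can be illustrated with a small Python sketch. It assumes PostScript's six-element matrix convention [a b c d tx ty]; the function name and the 547×606 image size are chosen only for illustration.

```python
def apply_matrix(matrix, x, y):
    """Apply a PostScript-style matrix [a b c d tx ty] to the point (x, y)."""
    a, b, c, d, tx, ty = matrix
    return (a * x + c * y + tx, b * x + d * y + ty)

width, height = 547, 606  # illustrative image size in pixels

# Maps the unit square onto image space with row 0 at the top.
top_down = [width, 0, 0, -height, 0, height]
# Same mapping with the sign of the y element changed: row 0 at the bottom.
bottom_up = [width, 0, 0, height, 0, 0]

# Where does the upper left corner of the unit square land in each case?
print(apply_matrix(top_down, 0, 1))
print(apply_matrix(bottom_up, 0, 1))
```

With the first matrix the top of the unit square maps to row 0 of the image data; with the second it maps to the last row, so the picture comes out upside down.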

Pixels and points

There is another, less obvious, difficulty in such conversions. The image pixels in raster images are usually addressed by (dimensionless) row and column numbers. But in PostScript, the coordinates in raster images, like everything else, are expressed in points, which have dimensions: 1 pt = 1/72 inch. That means you have to keep track of units in converting to or from PostScript.
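As a minimal sketch of that bookkeeping, the following Python functions convert between pixel counts and PostScript points for a given resolution. The function names are hypothetical; only the 72 points per inch comes from the text.

```python
POINTS_PER_INCH = 72  # 1 pt = 1/72 inch

def pixels_to_points(pixels, dpi):
    """Physical length, in points, of a run of pixels at the given resolution."""
    return pixels * POINTS_PER_INCH / dpi

def points_to_pixels(points, dpi):
    """Number of pixels needed to cover a length in points at the given resolution."""
    return points * dpi / POINTS_PER_INCH

print(pixels_to_points(300, 300))  # one inch of pixels at 300 dpi
print(points_to_pixels(72, 100))   # one inch of points on a 100 dpi screen
```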

Another way of looking at the problem is this: although PostScript is device-independent as long as it's setting type and rendering vector graphics, it becomes device-dependent (or at least, resolution-dependent) when dealing with raster images. This fact is emphasized in section 6.1.1 of the PostScript Language Reference manual, which says

The distinction between document generation and document rendering is essential ….

From this point of view, PostScript's natural resolution is 72 dpi. But the “device resolution” of an X-Window screen is 100 dpi. Most laser printers are 300, 600, or 1200 dpi. These numbers are incompatible; you can't fit an integral number of 72-dpi pixels into any of them.
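The incompatibility is easy to verify. This short Python sketch (the function name is made up for illustration) expresses each device resolution as an exact number of device pixels per point; none of the ratios is a whole number.

```python
from fractions import Fraction

def device_pixels_per_point(dpi):
    """Exact ratio of device pixels to PostScript points (72 per inch)."""
    return Fraction(dpi, 72)

for dpi in (100, 300, 600, 1200):
    ratio = device_pixels_per_point(dpi)
    print(f"{dpi} dpi: {ratio} = {float(ratio):.3f} device pixels per point")
```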

Ghostscript complications

Worse yet, Ghostscript, which is invoked by the gs command, wants to render color and grayscale images at 60 image pixels per inch, using Floyd-Steinberg dithering. (That's because early color inkjet printers used a dot spacing of 360 dots per inch; this choice gave a dithered halftone spot of 6×6 or 36 ink drops.)

Furthermore, the various programs and scripts that use gs to do the actual work on PostScript files have different default resolutions. The Gimp, for example, uses 100 dpi on the X-Window system. The default is 300 dpi in pnmtops. And gs defaults to the 72 dpi natural to PostScript (i.e., one pixel per point).

You'd think this could be handled by setting a convenient resolution (or other options) in the GS_OPTIONS environment variable. But some of the scripts and programs that invoke gs override this on the gs command line.

Pixels vs. paper

Another consequence of PostScript's physical units is that the default PostScript output medium is a page of paper, either American “letter” size or European A4. And many programs that produce PostScript output are used to send output to a PostScript-compatible laser or inkjet printer. So most of the programs that write PostScript want to center your raster image on a sheet of paper.

This centering can add unwanted white margins to an image converted from PS to rows and columns of pixels, or trim off parts of the image that fall outside the imaginary sheet of paper. There are ways of coaxing most of the gs-using programs and scripts to do what you want. But it usually isn't their default behavior.

An Example: from graphics to PostScript

[Figure: simulation of green rims from 0 to 5 degrees altitude]

As an example of what can go wrong, and how to avoid it, consider the simulation of the green limb at various altitudes, depicted on the rim-simulation page.

Here's the composite image of the upper green rim of the low Sun at several different altitudes near the horizon. This image is the final result, annotated with altitudes (at the left) and a scale bar in the upper right. These details were added to the PostScript version of an original PNM file, which is shown below.

Starting form of the image

This is the original figure. Though converted to a PNG file for compactness, it shows exactly what the image looked like before conversion to PostScript. (Remember that PNG uses lossless compression; it preserves all the image detail.)

Note the smoothness of the upper limb in each sub-panel. Also, notice that the bottom edge of each sub-image is a sharp discontinuity. These details may not be obvious to the eye at normal screen-viewing distances, so I've enlarged a small part of the image.

In the enlargement below, you can see the individual pixels. It's just the upper left corner of the image to the left, magnified 4 times. Apart from the finite resolution, the image structure is quite smooth and regular.

[Figure: upper left corner of original image, enlarged 4x]

Now, suppose you take that original PPM image and convert it to PostScript with the command line

pnmtops start.pnm > ps1.ps

Then display it with gimp, accepting its default resolution of 100 dpi. Here's what you'll see:

[Figure: ps1.ps displayed by Gimp at 100 dpi]

The first thing you notice is that a large white area surrounds the image. That's because gs, by default, places it on a full page that corresponds to the default paper size. The white area at the bottom and left side is part of this page. (The image is truncated because I let the Gimp use the image's BoundingBox; evidently, it misplaced it. The image was scaled down to 50% of full size by the Gimp to fit on the screen.)

A second problem is that the image has been rescaled. If you use the ImageMagick utility identify to show the number of rows and columns in each version, you'll find that, while the original was 547×606, the PostScript file ps1.ps created by pnmtops has an image only 526×582, which is just 0.96 as large.

This reduction is a side effect of the scaling assumed by pnmtops. This program assumes the input image has 300 pixels per inch; and it produces PostScript output at exactly 72 dpi (the standard PS scaling). However, if you don't tell it how to scale things, it assumes a fictitious output device with a scale of (300/72 = 4.166666 …) — but then rounds this value to an integer, namely, 4. Now, 1/4 of 300 is 75; so the output ends up being scaled to a size 72/75 = 0.96 of the input size.

So if you accept the Gimp's default resolution of 100 dpi, the image is re-scaled to 100/72 of the original. As (100/72) × (72/75) = 100/75 = 4/3, what the Gimp displays is 4/3 the size of the original image: instead of 547×606, it's 730×808 pixels. (See the scales on the Gimp's window frame.)
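The arithmetic in the last two paragraphs can be checked with a short Python sketch. The function names are invented for illustration, and rounding the sizes up to whole pixels is an assumption; it happens to reproduce the 526×582 and 730×808 figures quoted above.

```python
from fractions import Fraction
import math

ASSUMED_INPUT_DPI = 300   # input resolution that pnmtops assumes by default
POINTS_PER_INCH = 72      # PostScript's unit

def pnmtops_default_scale():
    """Output size relative to input size when no scaling options are given."""
    pixels_per_point = round(Fraction(ASSUMED_INPUT_DPI, POINTS_PER_INCH))  # 25/6 rounds to 4
    effective_dpi = Fraction(ASSUMED_INPUT_DPI, pixels_per_point)           # 300/4 = 75
    return Fraction(POINTS_PER_INCH) / effective_dpi                        # 72/75

def scaled_size(width, height, scale):
    """Scale a size, rounding up to whole pixels (an assumption)."""
    return math.ceil(width * scale), math.ceil(height * scale)

ps_scale = pnmtops_default_scale()
gimp_scale = Fraction(100, POINTS_PER_INCH) * ps_scale  # Gimp at its default 100 dpi

print(float(ps_scale), scaled_size(547, 606, ps_scale))
print(gimp_scale, scaled_size(547, 606, gimp_scale))
```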

[Figure: ps1.ps displayed at 75 dpi, using BoundingBox]

You can get the Gimp to display ps1.ps at the right scale, and without distortions, if you specify 75 dpi; but the top and right edges of the image are still truncated (as shown at the left here) if you allow it to use the BoundingBox. The scales show that we have an area 547×606 pixels, all right; but it's not centered on the original image.

Though it isn't obvious, the top of the figure is missing. (Count the sub-panels: there were 6 in the original image, but the top one is missing here.) Also, the right edge has been truncated; this is most obvious in the lower right corner of the image.

The truncation and white border are caused by the Gimp's attempt to use the BoundingBox. Because you told it to use 75 dpi instead of 100, it shifted the displayed area only 75/100 as far up from the lower left corner as it should have, instead of all the way to the center of the page. But if you tell Gimp to ignore the BoundingBox, you get lots of surplus white space at the top and right edges, as shown below.

[Figure: ps1.ps displayed at 75 dpi, ignoring BoundingBox]

At the right is the image you get if you tell the Gimp to open ps1.ps at 75 dpi, but not to use the bounding box. Though the image is still displaced, you get to see all of it. But it's still surrounded by a large white area; in fact, the white-page background is now much larger than the size of the image, so Gimp shrinks the page to fit it on the screen.

You might think that these problems are just due to some bug in the Gimp, so that using gs or gv to display the image would work OK.

Let's first look at the display produced by

gs ps1.ps

which is shown below.

[Figure: ps1.ps displayed by gs]

Here you can see the image properly centered on the page. I've added a border around the full image to distinguish the white background of the gs display from that of the Web page.

Unfortunately, there are now little spiky artifacts projecting from the upper limb at regular intervals. Clearly, something is wrong.

If gs can't display the image properly, how about gv?

[Figure: ps1.ps displayed by gv]

The image at the left is what

gv ps1.ps

produces. The centering is correct, but there are again errors in the display. These appear as subtle irregularities in the solar limb. At first glance, they appear to be missing lines, which might be attributed to the incorrect scaling.

But the problem is more complicated than that. Let's use xmag to blow up an example (shown below):

[Figure: ps1.ps displayed by gv and magnified by xmag]

Here you can see that there's a step in the limb, all right; but there's also a vertical artifact, below and to the left of the jog in the limb. (Look directly below the word “new” in the xmag headings.) This vertical feature certainly can't be due to a missing line.

So the differences between the gs, gv, and gimp displays of the same image obviously arise in the displaying process, rather than in the PostScript image itself; but the nature of the problem is not immediately obvious.

In any case, we'd like to get rid of the centering problem that gimp makes evident, and the resulting unwanted white border that is displayed by gs and gimp.

To get rid of the white area, we need to use the -nocenter option to pnmtops. That doesn't in itself fix the problem; but it does at least get the image into the corner of the page.

To prevent the mis-scaling, you have to use the -equalpixels option of pnmtops, as well as the -nocenter option. This will produce an output that's at exactly 72 pixels per inch on the PostScript page. So the result of

pnmtops -nocenter -equalpixels start.pnm > ps2.ps

can be displayed correctly by gimp, but only if you tell it to use the correct dpi setting.

You might have expected that to be 72, but it's really 300, the value assumed by pnmtops at its input side. As a result, trying to display ps2.ps with either gs or gv, which assume 72 dpi, produces an image scaled down by 72/300 = 6/25 = 0.24, which is even farther from what you want. You just get a little postage-stamp image like the one shown at the left.

[Figure: the resulting postage-stamp image of ps2.ps]
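To see how small that postage stamp is, here is the same calculation as a Python sketch, applied to the 547×606 example image. Rounding to the nearest pixel is an assumption; it gives roughly 131×145 pixels.

```python
from fractions import Fraction

# Viewer resolution (72 dpi) divided by the resolution pnmtops assumed (300 dpi).
shrink = Fraction(72, 300)

print(shrink, float(shrink))
print(round(547 * shrink), round(606 * shrink))
```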

So let's try to correct for this by imposing 72 dpi on the input side, by specifying -dpi 72 as well:

pnmtops -nocenter -equalpixels -dpi 72 start.pnm > ps3.ps

Now the Gimp displays the image correctly at 72 dpi. And so does gv; though gs by itself, with no special options, still produces a garbled image. And we finally have an output image that identify says is 547×606, the same as the original version.

The curious thing is that all three PostScript versions have the same image data; it's only the transformation matrix and the centering of that image that change. In fact, the actual pixels are correctly converted to PostScript in every case, as you can confirm by printing any of these images on a good PostScript printer.

That means the problem is entirely in gs, which the Gimp and gv both use to display PostScript files.

Variations on this theme

There are a couple of additional points that need attention. First, if your input image is in “landscape” format, pnmtops will try to rotate it to fit on a standard page. To prevent this and preserve the proper orientation, you must add -noturn to the list of pnmtops options. (It doesn't hurt to do this, regardless of the original orientation.)

Second, the default is to generate uncompressed PostScript. The file can be made much smaller by adding -rle to the pnmtops options. However, the price of a smaller file is much more time spent in rasterizing the image. For example, a page with two large images on it took up 313 kB with the -rle option, instead of 7.6 MB without; but the compressed version took 2 hours to print, while the uncompressed one printed in 13 minutes (on an old, slow, HP LaserJet III). That's because the image had to be decompressed by the PostScript interpreter in the slow printer.

That means the compression saved a factor of 24 in disk space, but cost a factor of about 9 in printing time. Transmitting all those uncompressed bytes over the parallel port also kept the CPU load of my 1400 MHz Athlon near 70% for those 13 minutes. But modern printers are much faster; the printing time of the compressed version was about a minute on a LJ 4100N.
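For the record, the two factors quoted above follow directly from the measurements. This Python sketch assumes 1 MB = 1000 kB; the variable names are only illustrative.

```python
uncompressed_kb = 7.6 * 1000   # 7.6 MB without -rle (taking 1 MB = 1000 kB)
compressed_kb = 313            # 313 kB with -rle
size_ratio = uncompressed_kb / compressed_kb

compressed_minutes = 2 * 60    # 2 hours to print the compressed file
uncompressed_minutes = 13      # 13 minutes for the uncompressed one
time_ratio = compressed_minutes / uncompressed_minutes

print(f"disk space saved: factor of {size_ratio:.1f}")
print(f"printing time cost: factor of {time_ratio:.1f}")
```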

So your command line really should be:

pnmtops -nocenter -equalpixels -dpi 72 -noturn -rle start.pnm > ps3.ps

but bear in mind the tradeoffs required by the -rle option.

The Example Continued: from PostScript to PNM

It turns out that gs is also used to convert back from PostScript to PNM (or any other raster format): Ghostscript is the PS interpreter used by the pstopnm program, by convert, and by the Gimp.

So it's necessary to understand the options to gs to get these conversions done cleanly. It's a little easier to use pstopnm (rather than gs) to do the conversions, as it needs fewer options on its command line. But you still need to understand what it's telling gs to do.

The simple command

pstopnm ps3.ps

writes a file named ps3001.ppm, which looks like this:

[Figure: ps3.ps converted by pstopnm, no options]

You'll notice the unwanted white border is still here, though at least the image is now in the lower left corner of the “page”. The problem is that pstopnm, like pnmtops, wants to have a full page for its output image. [Not a bug, but a feature, right?]

Furthermore, this image has those nasty glitches along the solar limb; here's a magnified view of them:

[Figure: magnified view of ps3.ps converted by pstopnm, no options]

So we still have that problem to contend with. And displaying the image with the Gimp shows that, once again, the image has been re-scaled.

The man page for pstopnm indicates that we should be able to get rid of the unwanted white borders by adding -xborder=0 and -yborder=0 to the options.

The result of that change to the conversion is shown below.

[Figure: ps3.ps converted by pstopnm with -xborder=0 and -yborder=0]

The command

pstopnm -xborder=0 -yborder=0 ps3.ps

produced the image shown at the right. This clearly got rid of the unwanted borders; but there are still artifacts along the limb. And identify ps3.ps shows that this image is 607×673 instead of 547×606, so there's still unwanted scaling.

Once again, it turns out that the scaling is an unwanted feature of pstopnm: it assumes you want to fit the image into a standard page of paper, and enlarges it to just fit on the page.

The way around this is to tell the program what size you really want, by adding -xsize=547 and -ysize=606 to the options.

So we actually need the command line

pstopnm -xborder=0 -yborder=0 -xsize=547 -ysize=606 ps3.ps

to get what we want.

Actually, you should probably use -xmax and -ymax instead of -xsize and -ysize: if you don't, and the image turns out to be bigger than the number of points available on a standard sheet of paper, you'll get unwanted scaling again. (And note that the = signs can as well be replaced by spaces; pstopnm will parse the arguments correctly either way.)

Now we find that ps3001.ppm has the correct dimensions, displays correctly in the Gimp, and is in fact identical to our original file!

Discussion: the gs display problem

You might have noticed that gs messes up the display when (and only when) the image is re-scaled. When it's displayed pixel for pixel on the screen, there's no problem.

Typically, the artifacts have a blocky appearance: there are periodic discontinuities in the image in both x and y. That's a symptom of Floyd-Steinberg dithering — which, it turns out, is what gs uses to present continuous-tone images on bitmapped displays.

As a horrible example of bad dithering, here's what the Gimp shows if you open ps2.ps at 360 dpi:

[Figure: ps2.ps converted by Gimp at 360 dpi]

This display is enlarged to show details of the dithering artifacts. The effect produced looks like an uneven mixture of wide and narrow lines, badly out of synch.

There seems to be no way to prevent this unwanted dithering from the gs command line, apart from specifying a fictitious “page size” that exactly matches the image. It's a nuisance to have to run identify to learn what those required dimensions are, but it appears to be unavoidable.
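One way to reduce that nuisance, at least for files written by pnmtops with -equalpixels and -dpi 72 (so that one point equals one pixel), might be to read the dimensions from the file's %%BoundingBox comment and build the pstopnm command line from them. The Python sketch below is only an illustration under that assumption: the function names and the sample header are hypothetical, and it builds the argument list without running anything.

```python
import re

# Matches a numeric DSC bounding-box comment such as "%%BoundingBox: 0 0 547 606".
BBOX = re.compile(
    r"^%%BoundingBox:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)", re.MULTILINE
)

def bounding_box_size(ps_text):
    """Return (width, height) in points from the first numeric %%BoundingBox."""
    match = BBOX.search(ps_text)
    if match is None:
        raise ValueError("no numeric %%BoundingBox comment found")
    llx, lly, urx, ury = map(int, match.groups())
    return urx - llx, ury - lly

def pstopnm_command(ps_path, width, height):
    """Argument list for pstopnm, using the options discussed in the text."""
    return ["pstopnm", "-xborder=0", "-yborder=0",
            f"-xmax={width}", f"-ymax={height}", ps_path]

# Hypothetical header, standing in for the first lines of a real file.
sample = "%!PS-Adobe-2.0 EPSF-2.0\n%%BoundingBox: 0 0 547 606\n%%EndComments\n"
size = bounding_box_size(sample)
print(pstopnm_command("ps3.ps", *size))
```

The resulting list could be handed to subprocess.run. If the file was written at some other resolution, the bounding box is in points rather than pixels and would need to be converted first.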

More problems

Unfortunately, you sometimes can't avoid rescaling and the resulting dithering problems. For example, suppose you want to match the width of a figure to the width of the text column in a PDF file. Unless your original just happens to have the right width, it will have to be scaled to fit the column width. If you use gv to display the PDF file, it will be using gs again; the image will be scaled, and will suffer dithering errors.

However, there's a way to ameliorate the display problem. You can tell gs to interpolate instead of dithering. The problem is that you don't have access to the gs command line used by gv or the Gimp. But you can set the environment variable GS_OPTIONS to include the -dDOINTERPOLATE flag, like this:

export GS_OPTIONS='-dDOINTERPOLATE'

The resulting display will look a lot better, even though it's not perfect. Here's an example of setting that flag in the environment and then displaying ps2.ps with gimp at 360 dpi:

[Figure: ps2.ps converted by Gimp at 360 dpi with -dDOINTERPOLATE set]

As you can see, the limb detail is now smooth and looks acceptable. But the interpolation has produced an artifact at the edge between sub-panels of the figure: look at the row of half-bright pixels at the boundary. So this clearly isn't an acceptable way around the conversion problem; it just improves the display.
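If you would rather not change GS_OPTIONS for your whole shell session, the variable can be set for a single viewer process instead. The Python sketch below shows one way to do that; the helper names are hypothetical, and it assumes the viewer (gv here) passes its environment on to gs.

```python
import os
import subprocess

def env_with_gs_options(extra="-dDOINTERPOLATE", base=None):
    """Copy an environment, appending a flag to GS_OPTIONS if it is not there yet."""
    env = dict(os.environ if base is None else base)
    existing = env.get("GS_OPTIONS", "")
    if extra not in existing.split():
        env["GS_OPTIONS"] = (existing + " " + extra).strip()
    return env

def view(ps_path, viewer="gv"):
    """Launch the viewer on one file with the modified environment."""
    return subprocess.run([viewer, ps_path], env=env_with_gs_options(), check=False)

# Show the value a child process would see, without launching anything.
print(env_with_gs_options(base={})["GS_OPTIONS"])
```

Calling view("ps2.ps") would then start gv with the flag in place, leaving the parent shell's environment untouched.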

Plan B

There's another way to avoid these display/conversion problems. If you convert a PostScript file to PDF format, you can use xpdf to display the result. And it has an entirely different rendering engine; it does not use gs. Consequently, xpdf shows images without appreciable distortion.

Furthermore, there are conversion utilities based on xpdf that will produce much better images. For example, pdftopbm generates quite good bitmaps from line drawings. And pdfimages will extract equally good PPM images (both black-and-white and color) from PDF files.

Now the problem is to make sure the image wasn't corrupted when it was made into a PDF. The shell script ps2pdf, like its cousin, the Perl script epstopdf, invokes gs; but they only add a PDF wrapper to the original PostScript code, and so do not corrupt the images. Any artifacts occur when the PDF file is displayed, not when it's created.

Curiously, there is even some image corruption when the PDF file is displayed by Adobe's acroread reader. Apparently it also does some dithering and/or interpolation.


PostScript image formats

Another complication has to do with the way images are encoded in PostScript. Image data can be stored with 1, 2, 4, 8, or (in Level 2) 12 bits per component:

1 bit per component is appropriate for black-and-white figures, such as line drawings. It corresponds to the PBM bitmap format.

2 and 4 bits per component don't seem very useful.

8 bits per component is appropriate for grayscale (PGM) images (0 to 255). It's also used for full-color (PPM) images, with 3 (RGB) or 4 (CMYK) components per pixel.

While the pstopnm command lets you specify explicitly which of the PNM formats (PBM, PGM, or PPM) to write, the pnmtops command doesn't let you pick which PS image encoding is used. Instead, the latter maps input PBM files to 1 bit per component, PGM to 8, and PPM to 8×3.

A related complication is that the Gimp won't write PBM images, only PGM ones. And, to make matters worse, the identify command doesn't distinguish among PBM, PGM, and PPM, but reports them all as PNM. (You'll need to use the file command to make sure what's actually in a PNM file.)
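Another way to tell the three formats apart is to look at the two-byte magic number at the start of the file: P1 or P4 for PBM, P2 or P5 for PGM, and P3 or P6 for PPM. The Python sketch below does just that; the function names are made up for illustration.

```python
# Netpbm magic numbers: plain (ASCII) and raw (binary) variants of each format.
PNM_MAGIC = {
    b"P1": "PBM (plain)", b"P4": "PBM (raw)",
    b"P2": "PGM (plain)", b"P5": "PGM (raw)",
    b"P3": "PPM (plain)", b"P6": "PPM (raw)",
}

def pnm_kind(data):
    """Classify a file from its first bytes."""
    return PNM_MAGIC.get(bytes(data[:2]), "not a PNM file")

def pnm_kind_of_file(path):
    """Read just the magic number from a file on disk and classify it."""
    with open(path, "rb") as handle:
        return pnm_kind(handle.read(2))

# The header of a raw color image, such as the 547x606 example.
print(pnm_kind(b"P6\n547 606\n255\n"))
```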

The examples offered above all involved color images and PPM files, so these problems don't arise. But when you're dealing with black-and-white line drawings, you may need to invoke the pgmtopbm command to reduce the file sizes before converting to PostScript.

So, to sum up:

To go from PostScript to PPM (which can be converted to any other graphic format, like PNG), use

pstopnm -xborder=0 -yborder=0 -xmax=<width> -ymax=<height> image.ps

and the portable pixmap is written to image001.ppm. [Note that the output is named a ppm file, although the command is pstopnm.] You'll need to run identify image.ps first, to find the numerical values to use for <width> and <height>. And, if the original EPS file doesn't have the right page size, you could have more problems; see the page on converting EPS to PNG for details.

Usually, you'll want to convert the PNM file to PNG, which is enormously more compact (and uses lossless compression, so the compressed version is still an exact copy of the original). To do that:

pnmtopng image.pnm > image.png

Note that PNG files can contain much useful supplemental information: comments, rendering intent, color management instructions (including gamma), etc. See man pnmtopng and man pngcrush for details.

To go from PNM to PostScript, use

pnmtops -nocenter -equalpixels -dpi 72 image.pnm -noturn -rle > image.ps

but remember to use pgmtopbm to reduce file size first, if you need to.

Or to go from anything else to PostScript, use

anytopnm image.whatever | pnmtops -nocenter -equalpixels -dpi 72 -noturn -rle - > image.ps

CAUTION: the -rle option may increase printing time by an order of magnitude, so decide whether you prefer fast rendering or compact files before using it.

Copyright © 2005, 2006, 2010 Andrew T. Young

