Compose Sequences

Introduction

A cardinal rule of user-interface design is consistency: make everything work the same way, so the user doesn't get confused. Apple's popularity owes much to the careful integration of every part of its user interface.

No matter how many keys you have on your keyboard, there will always come a day when you need a character that doesn't have its own key. The basic solution to this problem involves Unicode, which provides a character for (almost) every symbol that has been widely used.

But how you invoke those characters varies from one application to another. HTML has its “entity references”, many of which resemble the special-symbol commands of LaTeX; and PostScript has its own glyph names for accented letters and mathematical symbols.

It's often possible to set up context-sensitive configuration files for the applications you use most, so that typing a standard key sequence produces the string needed to invoke a special character you use frequently. For example, the  vim  editor automatically recognizes a large number of special file types (HTML, LaTeX, etc.), and file-type-specific settings can make it respond appropriately when you need to enter something that isn't a single keyboard key.

Now that Debian has made UTF-8 locales standard, it's straightforward to use any Unicode character you want, provided you have a convenient way to type it. This page explains some details and discusses the design considerations.

The Compose Key

While “dead keys” can be used to add accents, a more general keyboard solution is a “Compose key”: you press it, then a short sequence of ordinary keys, and the combination produces a particular Unicode character. The Compose key lets you enter special characters such as the case fractions (½, ¾, etc.) and symbols like ¶ and §, in addition to letters with diacritical marks (á, è, ñ, and so on). The whole range of Unicode characters is available; you just have to choose the ones you use most often and set up convenient “compose sequences” for them.

You might think this had already been done. There are indeed various standard files of compose sequences, but they have a rather low signal-to-noise ratio. For one thing, they tend to contain both dead-key sequences and combinations for use with the nodeadkeys variant set in /etc/default/keyboard. For another, they also contain compose sequences for common keyboard characters like @, braces, and brackets, for the benefit of unusual keyboards that lack these keys.

It's nice that the deficiencies of non-standard keyboards can be made up with special compose sequences, but most people have a standard keyboard with a full set of marks like tilde (~), caret (^), and the “number sign” or “hash mark” (#). There's no point in filling your Compose table with sequences for characters you'll never need to compose.
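If you start from one of the distributed files, a filter along the following lines can strip such entries. This is a minimal Python sketch that assumes the file has already been parsed into a dict mapping the keys typed after Compose to the resulting string; the dict layout, the sample entries, and the prune_ascii name are all hypothetical.

```python
def prune_ascii(table: dict[str, str]) -> dict[str, str]:
    """Drop sequences whose result consists only of printable ASCII
    characters, i.e. things a standard keyboard already has keys for."""
    return {
        keys: result
        for keys, result in table.items()
        if not (result.isascii() and result.isprintable())
    }


# Hypothetical sample table: keys typed after Compose -> resulting string.
sample = {
    "++": "#",   # a keyboard-deficiency sequence for the hash mark
    "((": "[",   # ... and one for a bracket
    "12": "½",
    "ss": "ß",
}

print(prune_ascii(sample))  # {'12': '½', 'ss': 'ß'}
```

Only the entries that produce something your keyboard can't type directly survive, which is the part of the table worth keeping.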

Finally, there's really no standardization. For example, the compose sequences distributed for use with the console (in /etc/console-setup) sometimes conflict with those used in X (in /usr/share/X11/locale/en_US.UTF-8/Compose).

Anyway, you can always set up your own list of compose-key combinations. So let's consider how to make good choices.

Criteria for Good Sequences

Consistency

As I mentioned above, some of the compose sequences distributed for use with X don't agree with those used on the console. Whatever you do, make sure that these two sets of compose sequences agree; if you have to type something different depending on whether you're in X or on a text console, you'll get confused.

Unfortunately, complete agreement isn't always possible. Console compose sequences appear to be limited to two characters, whereas X sequences can be longer. So at least make sure that all the two-character combinations agree; you can add longer sequences for X if you need them.
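Checking this by hand is tedious, so here is a minimal Python sketch of the comparison. It assumes both tables have already been read into dicts mapping the keys typed after Compose to the result (parsing the real console and X file formats is left out), and the sample entries, including the conflicting  oo , are made up for illustration.

```python
def compare_tables(console: dict[str, str], x11: dict[str, str]):
    """Compare the two-key sequences of a console table and an X table.

    Returns (conflicts, console_only, x11_only), where conflicts lists
    (keys, console_result, x11_result) for sequences defined in both
    tables with different results.
    """
    # Only two-key sequences matter; longer ones are X-only by necessity.
    console2 = {k: v for k, v in console.items() if len(k) == 2}
    x11_2 = {k: v for k, v in x11.items() if len(k) == 2}

    conflicts = [
        (keys, console2[keys], x11_2[keys])
        for keys in sorted(console2.keys() & x11_2.keys())
        if console2[keys] != x11_2[keys]
    ]
    console_only = sorted(console2.keys() - x11_2.keys())
    x11_only = sorted(x11_2.keys() - console2.keys())
    return conflicts, console_only, x11_only


# Made-up tables for illustration.
console = {"PP": "¶", "oo": "°", "ss": "ß", "aa": "å"}
x11 = {"PP": "¶", "oo": "ø", "ss": "ß", "1/2": "½"}

print(compare_tables(console, x11))
# ([('oo', '°', 'ø')], ['aa'], [])
```

The three-key X sequence  1/2  is ignored, since the console can't use it anyway. Sequences found in only one table are reported too, because a missing entry is another way for the two environments to disagree.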

Memorability

First of all, the keys you strike after the Compose key should be easy to remember. We're obviously not going to choose  xq  as the compose sequence for the case fraction  ½ ; instead, everybody uses  12 . Likewise, everybody uses  ae  as the sequence for the ligature æ. These choices are obvious, and easy to remember.

But some are not so obvious. The file /etc/console-setup/compose.ISO-8859-1.inc contains the sequence  !p  for the pilcrow (¶), which isn't very mnemonic. Why a lower-case p ? I'd have expected an upper-case P. And the ! isn't an obvious choice, either.

Even less obvious choices in the same file are  *0  to make the degree sign (°), and  *A  to make Å. I suppose somebody thought the asterisk vaguely suggested a ring-shaped superscript, and that pairing it with zero (nothing) suggested the ring by itself. That reasoning seems far-fetched; it certainly wouldn't be easy for me to remember.

I'll return to these particular examples below.

Convenience

“Convenience” in this context means “easy to type”. This criterion favors keys that are close together on the keyboard, and doubled keys (hitting the same key twice) are especially easy. Some good examples are  !!  and  ??  to make the Spanish inverted forms of these punctuation marks (¡ and ¿), and  ss  for the German double-s, or “Eszett” (ß).

These work so well that I suggest using  PP  for the paragraph mark (¶) and  SS  for the section mark (§). (There is no practical conflict with the Eszett, which is almost always written in lower case, ß; Unicode's capital form, ẞ, is rarely used.)

Another good example of doubled keys is the “A-ring” letter (Å). Before the twentieth-century spelling reforms in Danish and Norwegian, this letter was written as a doubled a; and LaTeX uses \AA to generate it as well. So I use  AA  as its compose sequence, which makes  aa  the natural choice for the lower-case letter (å), again following the old spelling.

Similarly, a reasonable compose sequence for the “degree” sign (°) is  oo , another doubled key whose shape echoes the small raised circle of the sign.
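For X, the sequences suggested above can be written out as Compose-file entries. The following Python sketch prints them in the  <Multi_key> <P> <P> : "¶"  form used by the files under /usr/share/X11/locale, with the character's Unicode name as a trailing comment. The KEYSYMS table covers only a few punctuation keys, and the helper names are hypothetical.

```python
import unicodedata

# X keysym names for some punctuation keys. ASCII letters and digits are
# their own keysym names; extend this dict for any other keys you use.
KEYSYMS = {"!": "exclam", "?": "question", "*": "asterisk", "/": "slash",
           "+": "plus", "-": "minus", ".": "period", ",": "comma"}


def keysym(key: str) -> str:
    """Return the X keysym name for a single typed ASCII character."""
    if key.isascii() and key.isalnum():
        return key
    return KEYSYMS[key]  # a KeyError means the table above needs extending


def xcompose_line(keys: str, result: str) -> str:
    """Format one compose sequence as a line for an X Compose file."""
    events = " ".join(f"<{keysym(k)}>" for k in keys)
    # Backslashes and double quotes must be escaped inside the quoted string.
    quoted = result.replace("\\", "\\\\").replace('"', '\\"')
    names = ", ".join(unicodedata.name(c, "?") for c in result)
    return f'<Multi_key> {events} : "{quoted}"  # {names}'


# The doubled-key sequences suggested in the text.
SUGGESTED = {
    "!!": "¡", "??": "¿", "ss": "ß",
    "PP": "¶", "SS": "§",
    "AA": "Å", "aa": "å",
    "oo": "°",
}

for keys, result in SUGGESTED.items():
    print(xcompose_line(keys, result))
```

The first line printed is  <Multi_key> <exclam> <exclam> : "¡"  # INVERTED EXCLAMATION MARK . A personal ~/.XCompose commonly begins with  include "%L"  so that the locale's default table is kept alongside your own entries; whether the file is read at all depends on the input method in use.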

Familiarity

If you use LaTeX a lot, it makes sense to use some of its special-symbol names as compose sequences, particularly for Greek letters with short names, like  mu  (µ) and  pi  (π). Be careful here: Unicode has separate code points for the lower-case Greek letter mu (μ, U+03BC) and the sign used for the unit prefix “micro” (µ, U+00B5); some fonts draw them differently and some don't.
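The difference is easy to check with Python's standard unicodedata module. This short sketch prints the code point and official name of each character; the micro sign is the one inherited from ISO-8859-1, where it sits at 0xB5. It also shows that compatibility normalization (NFKC) maps the micro sign to Greek mu, so software that normalizes text this way can erase the distinction even when your font preserves it. The variable names are only for illustration.

```python
import unicodedata

MICRO = "\u00b5"  # the micro sign, also present in ISO-8859-1
MU = "\u03bc"     # the Greek small letter mu

for ch in (MICRO, MU):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# U+00B5  MICRO SIGN
# U+03BC  GREEK SMALL LETTER MU

# Compatibility normalization folds the micro sign into Greek mu.
print(unicodedata.normalize("NFKC", MICRO) == MU)  # True
```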

Copyright © 2011 – 2012 Andrew T. Young

