Booting Debian

Introduction

Nothing is more frustrating than making some small change to your system, and then discovering that it won't boot. This must have happened to me dozens of times over the years; and the problem is always something different and unexpected.

While there are several HOWTOs available, they deal with the details of isolated parts of the booting process. This page is intended to provide a broader view of the matter.

Bootstrapping

As computers have become more complex, the bootstrap process (which goes from cold metal to a running system) has become more complicated. So let's begin where I did, when a computer was a roomful of vacuum tubes that consumed 75 kW of power and required 40 tons of air-conditioning; core memory was 295,000 little ferrite donuts (i.e., magnetic cores — one for every bit in the 8192 words of memory); and booting was a lot simpler than it is today.

Ancient History

Back in 1959, when I started using computers, there were no operating systems. Every program was a stand-alone job. To get your program (on punched cards) into the machine, you had to put a “loader” at the front of your card deck.

A one-card loader contained 24 instructions that read your program into fixed locations in the machine's memory (all of 8192 words). When the operator had put your deck in the card reader, he punched a button marked LOAD CARDS on the control panel. This started the card reader, copied the first row of bits (the bottom row of the punched card) into the first two words of memory, and transferred control of the machine to the first location. Those two instructions had to read the rest of the card into successive locations in memory (i.e., core storage).

The rest of the instructions on that first card had to read in the rest of the cards in your program's “object deck”. Each card contained the address into which its first instruction was loaded. At the end of your absolute-binary deck was a card with the address of the first instruction in your program; the loader had to transfer control to this address to start your program. You were expected to end the program with a HALT instruction; when your program finished, the machine just sat there until the operator put another deck in the card reader and pressed the LOAD CARDS button again.

If you used FORTRAN, you got a “relocatable” deck instead of absolute binary cards. This required a “relocatable loader” that adjusted the address fields of some instructions to suit the places where they were stored in memory. The relocatable loader was a couple of dozen cards, containing instructions on how to adjust the addresses. It began with a one-card loader very similar to the absolute-binary loader: the instructions on the first card told the machine how to load the rest of the relocatable loader, and the whole relocatable loader loaded your relocatable routines. But, as there was still no operating system, the machine just stopped when your program was done.

Eventually, early in the 1960s, there appeared the first operating system, called the Fortran Monitor System. FMS could be told to compile your FORTRAN source deck to machine-language instructions (the so-called object deck), and then load and run the object deck! We were pretty impressed by that. It was so efficient that it could run somebody else's program automatically when yours finished! Amazing!

FMS lived on a big tape. The operator mounted the tape on a drive and then pressed the LOAD TAPE button, which turned on the tape unit, copied the first two words from the tape into memory, and transferred control to the first instruction. The first record on the tape had the equivalent of a one-card loader image, which loaded the system into memory, and transferred control of the machine to the system. Wow.

But the overall pattern of the whole process was the same: a tiny piece of machine code (the first 2 instructions) read in a little piece of code (the 1-card loader) that read in a bigger piece (the relocatable loader) that read in the final operating system. Each step made the running code more capable; piece by piece, we go from nothing to everything we need. So, metaphorically, the machine pulls itself up by its bootstraps, until it's fully up and running.

Fast Forward

Now here we are today. Machines are a lot smaller physically, but a lot more complicated. We have to bring lots of peripherals into operation, and we have to tell the hardware how to find files on a filesystem stored on one of them. So the bootstrap process has become more complicated. But the same pattern applies: we bring up the system gradually, starting from very simple initial steps to more complicated ones.

When you turn on your computer, it knows nothing about filesystems or operating systems. It goes through a short process stored in firmware: a flash-memory or EEPROM chip on the motherboard that holds a hardware-testing routine, the Power-On Self-Test or POST, which makes sure that the peripherals (disks, memory controllers, buses, keyboards, graphics and network-interface cards, monitors, etc.) all work. Configuration settings, such as which device to boot from, are kept in a small battery-backed CMOS memory.

The routines that allow the CPU to talk to the peripheral devices during this stage are in a part of the firmware called the Basic I/O System, or BIOS. It has become customary, if incorrect, to refer to everything in the firmware (and even the settings in the CMOS memory) as “the BIOS”, even though the BIOS is really just one part of it.

At the end of the POST, the initialization routine looks for a particular peripheral device (which might be a hard disk, an optical disk, a floppy disk, a USB device, or a network interface) where, according to the settings stored in the CMOS memory, there is an operating system to be booted. We'll call this designated device the boot device, without worrying much about what it really is. But, for the sake of definiteness, let's suppose it's a hard disk.

Data are stored on disks in logical segments called blocks. Typically, a block is 512 bytes. When the BIOS tries to boot an operating system, it reads the first block from the boot device into memory, and transfers control to the beginning of that block. (This block is usually called the Master Boot Record, or MBR.) Actually, only 446 bytes of the MBR are available for this first-stage loader, because the rest of it usually holds the disk's partition table.

You can see that the first 446 bytes of the boot block play the same role in a modern computer that the 24 instructions of a one-card loader did on the ancient mainframe I used back in 1959: they have to be able to load the rest of the boot-loading system into memory. Then that remaining, much larger, piece of boot-loader code has the job of loading the operating system itself into memory.
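To make the layout of that first block concrete, here is a minimal sketch in shell. It assumes the classic MS-DOS arrangement: 446 bytes of loader code, a 64-byte partition table, and a 2-byte signature. It works on a synthetic image in a temporary file, not on a real disk; the variable names are mine.

```shell
#!/bin/sh
# Sketch: check the layout of a classic 512-byte MBR on a synthetic image.
img=$(mktemp)

# Region sizes in bytes.
loader_size=446   # first-stage loader code
table_size=64     # partition table: four entries of 16 bytes each
sig_size=2        # boot signature, the bytes 0x55 0xAA

# Fake boot block: zeros where loader and table would be, then the signature.
dd if=/dev/zero of="$img" bs=1 count=$((loader_size + table_size)) 2>/dev/null
printf '\125\252' >> "$img"     # octal escapes for 0x55 0xAA

total=$(wc -c < "$img" | tr -d ' ')
echo "block size: $total bytes"

# The signature sits right after the partition table, at offset 510.
signature=$(dd if="$img" bs=1 skip=$((loader_size + table_size)) count=2 2>/dev/null \
            | od -An -tx1 | tr -d ' \n')
echo "signature: $signature"

rm -f "$img"
```

To look at a real boot block instead, you could (as root) copy it to a file with something like dd if=/dev/sda of=mbr.bin bs=512 count=1. The device name there is only an example; reading the block this way does not change the disk.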

Debian provides a man boot page that describes this process clearly. It wisely describes the 446-byte chunk of code in the MBR as the “primary loader”, and the rest of the OS loader as the “secondary loader”, which is a useful distinction. The primary loader works with absolute hardware addresses (i.e., the locations of physical blocks on the disk); the secondary loader contains modules that can read your filesystem, and load the files that contain the kernel and some utilities into memory.

Before we get to the complications, you might take a look at some files in your /boot directory. If you use LILO, you'll see several 512-byte files in /boot with names like boot.0200 and boot.0300, which are copies of old boot-blocks from a floppy disk and a hard disk, respectively. If you use Grub, look at /boot/grub; there's a 512-byte file named stage1, which is Grub's boot-block (if you're using Grub 1, now called “grub-legacy”). If you have Grub2, there are 512-byte files named boot.img and cdboot.img and diskboot.img. These files are the images of the primary-loader boot blocks used by these bootloaders.
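One quick way to spot such files is to ask find for everything that is exactly 512 bytes long. This is only a sketch: list_boot_blocks is a name I made up, and the demonstration runs on a throwaway directory rather than on /boot.

```shell
#!/bin/sh
# list_boot_blocks is a hypothetical helper, not a Grub or LILO command.
# It lists every regular file of exactly 512 bytes under a directory.
list_boot_blocks() {
    find "$1" -type f -size 512c
}

# Demonstration on a scratch directory with one 512-byte and one 100-byte file.
demo=$(mktemp -d)
dd if=/dev/zero of="$demo/boot.img" bs=512 count=1 2>/dev/null
dd if=/dev/zero of="$demo/other.mod" bs=100 count=1 2>/dev/null

found=$(list_boot_blocks "$demo")
echo "$found"

rm -rf "$demo"
```

On a real system you would call list_boot_blocks /boot instead. Because find descends into subdirectories, it doesn't matter exactly where under /boot your bootloader keeps its images.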

Upgrading problems

My last misadventure with booting occurred when I had to replace my old computer, and copied my files to a new hard disk. I thought I'd be clever and upgrade the filesystems from ext3 to ext4.

Here's a good rule of thumb:
Too clever is dumb.

Ogden Nash


Well, it turned out that early versions of Grub couldn't read ext4 filesystems; so I ended up having to boot from a rescue disk, and struggle to get things working normally again. (At that time, there wasn't the profusion of Web pages devoted to migrating from ext3 to ext4 filesystems that a Google search reveals today.) Even with a rescue disk, it's a nuisance to have to cope with Grub's peculiar user interface; without a rescue disk, it would have been a complete disaster.

My struggle was mostly due to my unfamiliarity with Grub, combined with the transition from the old version (now disparagingly known as “grub-legacy”) to Grub2 (now confusingly known as just plain “grub”).

The situation wasn't helped by the unclear terminology used in the grub documentation, where “install” is used for both the installation of the whole Grub system on a computer, and the process of setting it up to boot your machine. Worse than this overloading of the word “install” is the more complex overloading of the word “boot”: we have a boot partition whose mount-point is /boot; then there's the “boot image”: the file boot.img that gets installed (!) on the Master Boot Record by being copied from the boot directory by the grub-install command. Of course, the “image” files in the boot directory must not be confused with the graphic image files that are also discussed at length in the grub documentation. Worst of all, although it's bad enough that the words “boot” and “root” are very similar to begin with, the Grub configuration files contain menu-stanzas in which “root=something” points to the partition that contains the /boot directory, followed just two lines later by a linux command line in which “root=something” points to the partition that contains the root filesystem. So sometimes “root” means “root”, and sometimes “root” means “boot”.

Are you confused yet? I sure was.

Understanding Grub

Fortunately, there is a short account of setting up grub in The Debian Administrator's Handbook at http://debian-handbook.info/browse/stable/sect.config-bootloader.html. A longer account, reasonably well written, is at https://help.ubuntu.com/community/Grub2/Setup on the Ubuntu website. An even longer and more complete account of how Grub works is in its Wikipedia article.

Probably the best guide to recovering from Grub disasters, as well as the use of the normal Grub-shell command line, is “How to Rescue a Non-booting GRUB 2 on Linux”, which clearly explains how to use the infamous Grub Rescue shell. (I have a short treatment of it here.)

Besides the long and confusing info manual, the complete Grub documentation is available at http://www.gnu.org/software/grub/manual/, but it's huge: the PDF version is 130 pages long. However, it's somewhat clearer than the info page for Grub, because the cross-references are easier to follow in PDF format than in info.

Structure of Grub

Overview

Grub's grand strategy for booting a computer involves the ability to start any of several different operating systems. To handle that level of complexity, it has to be able to read any of several different filesystems, from any of several different storage devices. So the initial stages concentrate on getting the Grub system up to a considerable level of competence.

To make the process manageable, the authors of Grub devised a shell-like interpreter, and a scripting language devoted to booting. (This reminds me of PostScript: the PS interpreter concentrates on putting ink marks on a page, while the Grub interpreter concentrates on finding partitions on disks, and files on those partitions, as well as the special requirements of several common operating systems.) The Grub documentation awkwardly describes this scripting as its “command-line interface”; but it would be more sensible to just call it “Grubscript”. It's intended to look like standard POSIX shell-script, with common commands like echo and ls; but it's actually considerably different, so that the similarities are misleading rather than helpful. (The full syntax of Grubscript is hidden in section 5.2 of the Manual, “Writing full configuration files directly”.)

Because Grub's need for a fairly high-level understanding of disk partitions (and the filesystems on them) can't be met with the low-level approach of absolute block-number addresses that is the only thing available early in the booting process, it has to get the filesystem stuff and the script interpretation up quickly. That lets it (or the user) do everything else in the higher-level language of Grubscript, which handles the actual booting of the OS kernel.

The booting subsystem

Now let's look a bit closer.

I'll divide Grub into two areas. The first is the executable part that runs at boot time. Because Grub can boot any of several operating systems, it has to ask you which one to boot. So it presents a menu of possibilities that you can choose from; one will boot by default after a delay of a few seconds, if you make no choice. If it has problems finding the pieces needed to boot your selection, it drops you into a primitive shell that allows you to locate those pieces manually, and put together a workable Linux command line. (This is the dreaded command-line interface of the Grub “rescue shell”; a detailed discussion of dealing with these problems is in the Ubuntu Community Help Wiki.)

Notice that all the executable stuff that runs at boot-time has to run before a regular operating system is available. That includes not only the primary and secondary stages of the boot-loader, but also the stand-alone program that displays the menu and uses your menu selection to boot the selected OS, and the Grubscript interpreter with all its commands. In particular, the various filesystems don't get mounted on their mount points until the kernel has been booted and can read /etc/fstab; Grub sees just unmounted filesystems on isolated disk partitions. So modules that can do all these things by themselves, and read your filesystems to find them when they're needed, have to be part of Grub's boot-time apparatus. These modules are in the /boot/grub/ directory.

The installation subsystem

The other area of Grub is the part that sets up the part that runs at boot time. This is Grub's installation system: it organizes the boot-time menu, assembles the executable modules that actually boot your machine, and tells the boot-time stuff where (and how) to find everything, such as the kernel to be booted, and its root filesystem. It also installs the primary stage of the boot-loader in the MBR, and puts the locations of the later stages where the primary stage can find them.

These things are done by ordinary executable programs and scripts, such as:

grub-install     which writes the primary bootloader, boot.img, to the MBR
grub-mkconfig    which prints a new copy of the grub.cfg script to standard output
update-grub      which writes a new copy of grub.cfg to /boot/grub/grub.cfg

What's where

One reason Grub is so confusing is that pieces of it are scattered all over your disk. Some of them are in the /boot/grub directory, which holds many Grub executable modules and some configuration files; some are scripts in the /etc/grub.d directory, where you might expect to find Grub's configuration files; and some are environmental parameters for those scripts that are set in /etc/default/grub.

Likewise, there is no grub command, and no grub manpage. Instead, there are about 20 different commands that set up various pieces of the whole system. The installation commands are in /usr/sbin. It's like LaTeX: gloriously powerful and adaptable for the expert, but a nightmare for the casual user.

Connecting the pieces: grub.cfg

Fortunately, there is a single file that tells Grub what to do when it boots your machine: /boot/grub/grub.cfg. This is a simple Grubscript text file that describes Grub's whole configuration. If you learn to read this configuration file, you can understand how Grub carries out the booting process.

Because grub.cfg is generated by the shellscripts in the /etc/grub.d directory, using the environmental parameters in the /etc/default/grub file, you can think of it as the major link between Grub's installation subsystem and its booting subsystem. It's the main result of the installation process, and the main input to the booting process.
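For orientation, here is roughly what a small /etc/default/grub looks like. The file is ordinary shell variable assignments, read by the scripts in /etc/grub.d. The key names are standard ones; the values are only illustrative, not recommendations.

```shell
# Illustrative /etc/default/grub; the values are examples only.
GRUB_DEFAULT=0                       # menu entry booted when you make no choice
GRUB_TIMEOUT=5                       # seconds the menu waits before booting it
GRUB_CMDLINE_LINUX_DEFAULT="quiet"   # kernel arguments for the normal entries
GRUB_CMDLINE_LINUX=""                # kernel arguments for every Linux entry
```

Changes made here don't reach the boot-time menu until grub.cfg is regenerated.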

If you configure Grub correctly, the menu items presented by the booting subsystem will work correctly, and you'll never have to deal with the weird conventions and obscure commands of the rescue shell.

Understanding grub.cfg

So let's examine a typical grub.cfg file. Remember, this is written in Grubscript; so it resembles a normal shell script.

At the top, there are some general sections produced by the /etc/grub.d/00_header and /etc/grub.d/05_debian_theme scripts. These set up some basic information and make a few subroutines available to the secondary bootloader. For example, if your hard disk uses the traditional MS-DOS partitioning scheme, you'll see the line insmod part_msdos, which inserts a module that can read its partition table. If your system uses the ext2/ext3/ext4 group of filesystems, you'll see insmod ext2, which can read those. The timeout interval appears at the end of the 00_header section, copied directly from the value set in /etc/default/grub. Such things are detected automatically by the installation subsystem, and we don't expect to see problems here.

The interesting stuff begins with /etc/grub.d/10_linux, which sets up the menu entry for the default operating system. The word menuentry is followed by text describing the system to load, enclosed in single quotes; this will be displayed in the menu at boot time. Then come some --class declarations that tell the booting subsystem what kind of OS it's expected to boot. This first line ends with a left (opening) brace; everything that follows, up to a closing brace, is included in this menu item.

First comes load_video and some more module-insertions, which are pretty obvious. The first tricky item is

set root='(hd0,msdos1)'

which illustrates two of Grub's strange quirks. First of all, “root” here does not mean the root filesystem. Instead, it means the value of Grub's oddly-named “root” variable, which actually means the disk partition that contains the filesystem where the grub.cfg file lives. Well, if you have a separate /boot filesystem (as I do), this really is a pointer to that filesystem's partition; so “root” here really means “boot”.

But this pointer, (hd0,msdos1), is an example of Grub's peculiar notation. The hd0 part means the disk identified in the important file /boot/grub/device.map, which contains a persistent identifier for the disk Grub calls (hd0). Of course, the msdos1 part indicates the first primary partition on this disk. Though the hd0 part resembles the old scheme for naming IDE disks in the /dev directory, it's different — because this is Grubscript, not shell-script. (And these days, Linux calls the disk /dev/sda, or something like that; the old /dev/hdX names are gone.)

Next comes a line beginning with search, which tells the boot subsystem to look for a particular filesystem:

search --no-floppy --fs-uuid --set=root d074378b

The flag --no-floppy tells Grub not to waste time probing floppy drives. The flag --fs-uuid says that the filesystem is to be identified by its UUID; then come --set=root and the actual UUID itself (a string of gibberish at the end of the line). This line is a command in the Grubscript language that sets the Grubscript variable root to the Grubscript name of the partition whose UUID is the one specified.

After echoing a comment telling the user at boot-time that Grub is trying to boot the specified Linux kernel, the menu entry has a line beginning with linux that specifies the kernel's command-line:

linux /vmlinuz-3.2.0-4-686-pae root=UUID=0899983c…  ro video=630x480 quiet

First comes the name of the kernel file, followed by root=UUID= and the UUID of the kernel's root filesystem. The line ends with further arguments that will be fed to the kernel, like ro (meaning “mount the root FS read-only”), and a video mode. So, in the “linux” line, “root” means the root filesystem, unlike the usage a few lines earlier.

The menu entry ends by naming the initial ramdisk (initrd.img file) the kernel will use. Its name, like that of the vmlinuz file, is preceded by a slash, meaning that it lives in the top directory of the Grub-root partition — i.e., /boot.
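Putting the pieces together, the sketch below writes an illustrative menu entry to a temporary file and then pulls out the two different “root” settings with sed. The stanza is assembled from the fragments discussed above; the title, the --class values, the insmod lines and the initrd file name are my assumptions about what a typical entry contains, and the truncated UUIDs are left truncated.

```shell
#!/bin/sh
# Write a sample menu entry (a reconstruction for illustration, not copied
# from a real grub.cfg) to a temporary file.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
menuentry 'Debian GNU/Linux, with Linux 3.2.0-4-686-pae' --class debian --class gnu-linux --class gnu --class os {
    load_video
    insmod gzio
    insmod part_msdos
    insmod ext2
    set root='(hd0,msdos1)'
    search --no-floppy --fs-uuid --set=root d074378b
    echo 'Loading Linux 3.2.0-4-686-pae ...'
    linux /vmlinuz-3.2.0-4-686-pae root=UUID=0899983c… ro quiet
    echo 'Loading initial ramdisk ...'
    initrd /initrd.img-3.2.0-4-686-pae
}
EOF

# Grub's own "root" variable: the partition that holds /boot.
grub_root=$(sed -n "s/^[[:space:]]*set root='\(.*\)'.*/\1/p" "$cfg")

# The kernel's root= argument: the partition that holds the root filesystem.
kernel_root=$(sed -n 's/^[[:space:]]*linux .* root=\([^ ]*\).*/\1/p' "$cfg")

echo "Grub root (really boot): $grub_root"
echo "kernel root filesystem:  $kernel_root"

rm -f "$cfg"
```

Seeing the two values side by side makes the point: the first names a partition in Grub's notation, the second names a filesystem in the kernel's.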

The full syntax of a menuentry is hidden in section 14.1.1 of the Grub manual (or its info version). Some more detailed discussion of menuentry construction is in the Ubuntu wiki.

Debugging Grub

Most of the installation system works well automatically; so only a few changes usually need to be made to make Grub do what you want. The trick is to learn what parts need to be tweaked. You can experiment by adding menuentry items to the end of /etc/grub.d/40_custom, followed by running update-grub.
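It helps to know how 40_custom works: it is a shell script whose second line makes it print the rest of itself, so whatever you add below the header is copied into grub.cfg. The sketch below builds a stand-in copy in a scratch directory and runs it to show what it emits. The menu entry, its title and root=/dev/sda2 are made-up examples, not values from a real system.

```shell
#!/bin/sh
# Build a stand-in for /etc/grub.d/40_custom in a scratch directory.
dir=$(mktemp -d)
cat > "$dir/40_custom" <<'EOF'
#!/bin/sh
exec tail -n +3 $0
# Entries below this line are copied into grub.cfg as they stand.
menuentry 'Experimental entry' {
    set root='(hd0,msdos1)'
    linux /vmlinuz-3.2.0-4-686-pae root=/dev/sda2 ro
    initrd /initrd.img-3.2.0-4-686-pae
}
EOF
chmod +x "$dir/40_custom"   # only executable scripts are used in /etc/grub.d

# Run it the way the generator would: everything from line 3 on is printed.
output=$(sh "$dir/40_custom")
printf '%s\n' "$output"

rm -rf "$dir"
```

The two header lines never appear in the output, which is why you must leave the 'exec tail' line alone when you edit the real file.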

Better yet, you can change any of the scattered Grub configuration files, and then safely see what it does, with the simple command line

grub-mkconfig | less

which lets you examine the grub.cfg file that would have been generated and installed in /boot/grub/grub.cfg if you had run Debian's update-grub command. (The default output of grub-mkconfig goes to standard output; so your working grub.cfg file doesn't get changed this way.) So your actual configuration file is undisturbed, while you can see the effects of changes to files in /etc/grub.d or /etc/default/grub.

It's also useful to use grub-script-check to make sure your new grub.cfg is free of syntax errors. This program reads from its standard input, so it's easy to couple it to grub-mkconfig:

grub-mkconfig | grub-script-check

If everything is OK, grub-script-check produces no output.
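The two checks can be rolled into one helper that also shows what would change. This is a sketch under assumptions: preview_grub_cfg is a name I invented, it relies on the -o option of grub-mkconfig to write to a file, and on a real system it will probably need to be run as root.

```shell
#!/bin/sh
# preview_grub_cfg is a hypothetical helper, not part of Grub.
# It writes a candidate grub.cfg to a temporary file, checks its syntax,
# and shows how it differs from the configuration currently in use.
preview_grub_cfg() {
    live=${1:-/boot/grub/grub.cfg}
    candidate=$(mktemp) || return 1

    # -o sends the generated script to a file instead of standard output.
    grub-mkconfig -o "$candidate" || { rm -f "$candidate"; return 1; }

    if grub-script-check "$candidate"; then
        # Lines starting with "+" are new; lines starting with "-" would go away.
        diff -u "$live" "$candidate"
    else
        echo "syntax errors found in the generated configuration" >&2
    fi
    rm -f "$candidate"
}
```

Calling preview_grub_cfg with no argument compares against /boot/grub/grub.cfg; nothing is written there, so the working configuration stays untouched.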

Remember to keep an unaltered copy of any file you change, so you can quickly reverse the changes if they don't do what you want. And be especially careful not to leave an incorrect set of Grub files in place when installing a new kernel, as the upgrade will invoke update-grub automatically, and overwrite your grub.cfg.
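A small helper makes the habit of keeping an unaltered copy painless. In this sketch, keep_original is a made-up name, and the demonstration uses a scratch file standing in for /etc/default/grub.

```shell
#!/bin/sh
# keep_original is a hypothetical helper: it copies FILE to FILE.orig once,
# and never overwrites an existing backup with an already-edited file.
keep_original() {
    if [ -e "$1.orig" ]; then
        echo "backup already exists: $1.orig"
    else
        cp -p "$1" "$1.orig" && echo "saved $1.orig"
    fi
}

# Demonstration on a scratch copy standing in for /etc/default/grub.
scratch=$(mktemp -d)
echo 'GRUB_TIMEOUT=5' > "$scratch/grub"
keep_original "$scratch/grub"              # first call makes the backup
echo 'GRUB_TIMEOUT=30' > "$scratch/grub"   # now edit the working file
keep_original "$scratch/grub"              # second call leaves the backup alone
restored=$(cat "$scratch/grub.orig")
echo "backup still holds: $restored"

rm -rf "$scratch"
```

Refusing to overwrite an existing backup is the point of the design: after several rounds of experiments, the .orig file is still the one that worked.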

Understanding grub-mkconfig and update-grub

These two commands are almost identical, because update-grub is just a 1-line shell script that writes the output of grub-mkconfig to /boot/grub/grub.cfg. Debian invokes update-grub automatically when a kernel is upgraded.

And grub-mkconfig itself is just a POSIX shell script that executes the numbered scripts in /etc/grub.d in numerical order. (And you can intervene, if you want, by adding more numbered scripts to this directory.) So, if you can read standard Bourne shell code, you can figure these guys out. However, there are lots of tricky cross-references that can make following the details difficult. In particular, the operation is guided by the various Grub environmental variables that are set in /etc/default/grub.
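The heart of that process can be sketched in a few lines. This is a simplified stand-in, not the real grub-mkconfig: generate_cfg is my own name, the directory is a scratch one rather than /etc/grub.d, and the three tiny scripts only imitate the real ones by printing a line of text each.

```shell
#!/bin/sh
grub_d=$(mktemp -d)    # stand-in for /etc/grub.d

# Tiny stand-in scripts; real ones emit Grubscript the same way, on stdout.
printf '#!/bin/sh\necho "set timeout=5"\n' > "$grub_d/00_header"
printf '#!/bin/sh\necho "# menu entries would go here"\n' > "$grub_d/10_linux"
printf '#!/bin/sh\necho "never run"\n' > "$grub_d/20_disabled"
chmod +x "$grub_d/00_header" "$grub_d/10_linux"   # 20_disabled stays non-executable

# Run every executable file in the directory, in the order the shell's
# glob expansion sorts them, and concatenate what they print.
generate_cfg() {
    for script in "$1"/*; do
        if [ -f "$script" ] && [ -x "$script" ]; then
            echo "### BEGIN $script ###"
            "$script"
            echo "### END $script ###"
        fi
    done
}

cfg=$(generate_cfg "$grub_d")
printf '%s\n' "$cfg"

rm -rf "$grub_d"
```

Because only executable files take part, removing the execute bit from a numbered script is an easy way to switch it off without deleting it.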

NOTE: these variables are not the “Special environment variables” described in Chapter 13.1 of the Grub manual. Instead, the variables set in /etc/default/grub are the ones described as “keys” in Section 5.1 of the Manual, “Simple configuration handling”.

One important feature of the standard operation is that the currently running kernel is the one that will be selected in the first (i.e., number zero) menuentry stanza in the grub.cfg that is produced. And, as the default menu selection is also number zero, that means that Grub will normally boot the current OS again when you re-boot.

Or, if you decide to make a different menu item the default, you only need to run (as root) update-grub to rewrite /boot/grub/grub.cfg while you are running on the OS version you prefer. Then, when you re-boot, that version will automatically be the default.

 

Copyright © 2011, 2015, 2017 Andrew T. Young

