The GRUB2 Rescue Shell

Introduction

Nothing is more frustrating than making some small change to your system, and then discovering that it won't boot. This must have happened to me dozens of times over the years; and the problem is always something different and unexpected.

The most frustrating experience you can have with GRUB is to be dropped into its “Rescue shell”. You might get a cryptic error message, followed by

Entering rescue mode...
grub rescue>

No help; no advice on how to proceed; and even the Grub2 Manual tells you nothing useful. Yuck.

The rescue shell

The rescue shell is an exceedingly minimal, Spartan environment. It offers only a tiny subset of the regular Grub shell's commands:

set
unset
ls
insmod

These commands take no options, and the shell offers no help command, so there is no way even to list which commands are available. Common commands like cat and cd are missing. At first glance, it would seem impossible to do anything useful in this limited environment.

And yet, it's actually possible to go through the booting process manually — provided that you know something about the booting process, and how Grub handles it. In fact, the operations that Grub executes automatically can be done by hand, just with the limited means provided by the rescue shell.

Understanding Disaster

Actually, the fact that you see the grub rescue> prompt is good news in disguise. It means you haven't wiped out Grub's booting system. The hardware has gone through the POST process, and loaded the primary stage of the boot loader; but the information available to that stage (supposedly, the addresses of the disk blocks that hold the next stage) was wrong. Grub is alive, but it needs a little help.

The common causes of this situation are things that changed the locations of the next pieces of Grub's boot-time subsystem. Maybe you re-formatted the filesystem that holds those files; maybe you updated a kernel, but didn't re-generate Grub's configuration files; maybe you ran update-grub with incorrect partition identifiers in the /etc/grub.d/40_custom shell script. The result was that Grub's early stages that depend on absolute block addresses couldn't find the following pieces. You can tell the part of Grub that's already working how to proceed.

With luck, the error message that precedes the rescue prompt will tell you what Grub looked for but couldn't find. Write it down, and take appropriate action after you've re-booted your box.

Diagnosing disaster

The first thing to do is to find out what Grub knows already. At the

grub rescue>

prompt, enter the command:

set

and Grub will tell you what little it knows. Again, take note of this information, as it will provide clues to what you need to do next — as well as clues to fixing the problem before the next reboot.

The main Grub variables available at this point are usually the prefix and root parameters. What Grub calls prefix is the location (in Grub's peculiar notation) of the directory that holds Grub's pieces. In normal *IX terminology, this would usually be /boot/grub. But Grub doesn't have mounted filesystems yet; it only knows about disk partitions, which it calls things like (hd0,msdos1).

So, if you have a separate boot partition, which is normally mounted on /boot, Grub will need to have prefix set to an address of the form (hd0,msdos1)/grub. On the other hand, if your /boot directory is in your root-filesystem's partition, Grub will need to have prefix=(hd0,msdos1)/boot/grub.

Similarly, what Grub calls root is NOT the partition that contains the root filesystem, but the partition that contains the kernel and initial RAM filesystem (i.e., initramfs) files. That usually means that Grub thinks root=(hd0,msdos1).

Fixing the problem

First of all, make sure Grub can see the partitions that contain these vital parts. Tell the rescue shell:

ls

and it will show you the partitions it knows about. This will be a list of things like

(hd0,msdos1) (hd0,msdos2) . . .

which mean /dev/sda1, /dev/sda2, and so on. Remember that Grub's name for the first disk is hd0; the second disk is hd1, etc. — but the partitions themselves are numbered normally, starting at 1 instead of zero.

Don't be misled by a superficial resemblance between the rescue-shell ls command and the normal ls that you use at a bash prompt. The normal command has scads of options; the rescue-shell command is a stripped-down version with no options, and a different output format. Only the names are similar.

If the rescue shell had a cat command, you could list the contents of /boot/grub/device.map to learn Grub's correspondence between disks and names; but it doesn't, so you can't. Keep a copy of the device.map file printed out beforehand, because the devices might not map out the way you expect. (For example, on my machine, Grub thinks sda is hd0, as you'd expect; but it also thinks sdb is hd2, and sdc is hd1, which is screwy.)
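
If you did keep that copy of device.map, the translation from Grub's names to Linux names can be done mechanically. Here is a minimal sketch in POSIX shell. The function name grub_to_linux is made up for this example; it assumes that the map file has lines of the form "(hd0) /dev/sda", and that partition names are formed by simply appending the number, which is not true for NVMe or /dev/disk/by-id names.

```shell
#!/bin/sh
# Translate a Grub partition name, e.g. "(hd2,msdos1)", into a Linux device
# name by looking the disk up in a saved copy of device.map.
# Usage: grub_to_linux DEVICE_MAP_FILE GRUB_NAME   (hypothetical helper)
grub_to_linux() (
    map_file=$1
    # Split "(hd2,msdos1)" into the disk "hd2" and the partition number "1".
    disk=$(printf '%s\n' "$2" |
        sed -n 's/^(\(hd[0-9][0-9]*\),[a-z]*[0-9][0-9]*)$/\1/p')
    part=$(printf '%s\n' "$2" |
        sed -n 's/^(hd[0-9][0-9]*,[a-z]*\([0-9][0-9]*\))$/\1/p')
    # Fail on anything that does not look like a Grub partition name.
    [ -n "$disk" ] && [ -n "$part" ] || return 1
    # ASSUMPTION: device.map lines look like "(hd0)  /dev/sda".
    dev=$(awk -v d="($disk)" '$1 == d { print $2; exit }' "$map_file")
    [ -n "$dev" ] || return 1
    # ASSUMPTION: partitions are named by appending the number (sda -> sda1).
    printf '%s%s\n' "$dev" "$part"
)
```

With the screwy mapping described above (sdb listed as hd2), calling grub_to_linux device.map '(hd2,msdos1)' would print /dev/sdb1.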

Then, if Grub has the wrong values for either prefix or root, you can fix its error by telling the rescue shell something like

set prefix=(hd0,msdos1)/boot/grub

or whatever the correct value is in your box.

Remember that you can verify that Grub has the correct values for these parameters with a simple

set

command. And you can double-check by telling the rescue shell to list the contents of those directories:

ls $prefix/

Proceed to boot manually

Now you need to lead Grub through its normal booting sequence. The first step is to make sure it has its normal.mod module available:

insmod normal

This module is the “guts” of Grub2: it contains the apparatus for reading configuration files, and displaying the usual Grub boot menu. The correct prefix and root parameter values are needed, if Grub is to find any modules.

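Once the module is inserted, you can execute it as a new rescue-shell command. If Grub can then read its configuration file, the usual boot menu appears and you can boot from it; otherwise you get the regular grub> prompt, and carry on by hand. The command is simply: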

normal

Another vital module that must be loaded is the linux.mod file:

insmod linux

Now you can set up the linux kernel's command line:

linux /boot/vmlinuz-3.16.0-4-686-pae root=/dev/sda2 ro

or whatever is the correct name for your kernel. Note that the “root” in this line points (in normal terms) to your root filesystem's disk partition; it is different from Grub's root parameter! You can specify the root partition here in perfectly normal terms: by device name, as above; by UUID, as in root=UUID= followed by the UUID of the filesystem; or by filesystem label, as in root=LABEL=ROOT (if your root filesystem is labeled ROOT). (The --fs-uuid option belongs to Grub's own search command, not to the kernel's command line.) If you need additional command-line parameters, like video=640x480, be sure to add them here as well.

Similarly, you need to tell Grub where the initial RAMdisk filesystem is:

initrd /boot/initrd.img-3.16.0-4-686-pae

Make sure it matches the kernel version!

Now, with both the kernel and the initrd loaded, and their arguments supplied, Grub should be able to boot. Tell the rescue shell to do so:

boot

Once the linux and normal modules are inserted, most of Grub's apparatus will be working, and the larger set of normal GrubScript commands will be accessible. Or, if you get everything messed up and want to start over, re-boot the system: once the normal module is loaded, Grub's reboot command is available for this.
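
Since the rescue shell gives no help, it may be worth printing these commands out ahead of time, with your own values filled in. The following shell function is only a sketch: the name rescue_cheatsheet and its four arguments are invented for illustration, and it assumes the Debian-style file names vmlinuz-VERSION and initrd.img-VERSION used above. It leaves out the normal command, which would take over with the boot menu if it succeeded.

```shell
#!/bin/sh
# Print the rescue-shell commands for one machine, for keeping on paper.
# Usage: rescue_cheatsheet GRUB_PART BOOT_DIR KERNEL_VERSION ROOT_DEV
#   GRUB_PART       e.g. "(hd0,msdos1)": the partition holding the kernel
#   BOOT_DIR        "/boot" if /boot lives in the root filesystem,
#                   "" if GRUB_PART is a separate boot partition
#   KERNEL_VERSION  e.g. "3.16.0-4-686-pae"
#   ROOT_DEV        the kernel's root= value, e.g. "/dev/sda2"
rescue_cheatsheet() {
    cat <<EOF
set root=$1
set prefix=$1$2/grub
insmod normal
insmod linux
linux $2/vmlinuz-$3 root=$4 ro
initrd $2/initrd.img-$3
boot
EOF
}

# Example values taken from the text above; substitute your own.
rescue_cheatsheet '(hd0,msdos1)' /boot 3.16.0-4-686-pae /dev/sda2
```

Passing an empty string as the second argument gives the separate-boot-partition form, with prefix ending in /grub and the kernel path starting at /vmlinuz.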

Back to normal

When you are back up, be sure to fix the errors that caused the booting problem. If the configuration parameters are set correctly in the /etc/default/grub file and in any special scripts (like /etc/grub.d/40_custom), you should be able to get a correct /boot/grub/grub.cfg just by executing (as root):

update-grub

and

grub-install /dev/sda

(or whatever disk is marked as the boot device in your BIOS).

Sometimes, the error that brought up the rescue shell was

error: no such device:

followed by a UUID for some partition that has disappeared — usually, due to a re-made partition. In this case, you may need to invoke

update-initramfs -u -v

to get the correct partition named in the initrd.img file.
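
To see which UUIDs have gone stale before the next reboot, you can compare the UUIDs mentioned in a configuration file with those of the filesystems that actually exist. A rough sketch, assuming a grep that supports the -o option (GNU and BSD grep do); the function name stale_uuids is hypothetical, and the pattern only recognizes the 36-character UUIDs used by ext-type filesystems, not the short ones on FAT partitions.

```shell
#!/bin/sh
# List UUIDs that a config file refers to but that no known filesystem has.
# Usage: stale_uuids CONFIG_FILE UUID_LIST_FILE   (hypothetical helper)
#   CONFIG_FILE     e.g. a copy of /boot/grub/grub.cfg or /etc/fstab
#   UUID_LIST_FILE  one existing filesystem UUID per line
# Prints one stale UUID per line; the exit status is non-zero when there
# are none, because that is what the final grep returns.
stale_uuids() {
    # Pull out every 8-4-4-4-12 hex string, drop duplicates, then discard
    # the ones that appear (as whole lines) in the list of existing UUIDs.
    grep -oE '[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}' "$1" |
        sort -u |
        grep -vxFf "$2"
}
```

On a running system, the list of existing UUIDs could come from blkid -s UUID -o value (run as root), saved to a file and passed as the second argument.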

 

Copyright © 2015, 2017 Andrew T. Young

