This post describes one possible scenario in which GRUB hangs (freezes) with the four letters GRUB and a blinking cursor displayed on the screen, even before the boot menu appears. It might also display "Loading stage2..." and then hang. You might or might not be able to restart the computer with CTRL+ALT+DEL when this occurs. An explanation and a procedure for identifying or excluding the scenario are provided.
When GRUB hangs before displaying the boot menu, the immediate cause is most likely a failed attempt to load the stage2 file from the root partition ("root" refers to whatever was specified when installing GRUB into the boot sector). If you get no error messages at all, just a blinking cursor as mentioned above, then the underlying problem may lie in the implementation of BIOS interrupt 13h, on which the installed GRUB (i.e. stage1 stored in the boot sector) depends to read raw data from disk. GRUB cannot avoid the crash if its humble attempt to read sector n from disk leads to a locked-up computer. When this happens, some garbage characters might also appear on the screen (printed by the BIOS, not GRUB).
It seems that some BIOSes, even versions released as late as 2005, have big trouble reading sectors (blocks) beyond a certain boundary. In my test case (AMIBIOS, 80 GB Maxtor disk) the last readable sector turned out to be 66059279.
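A side observation that goes beyond the test described above: the boundary number is not arbitrary. Assuming a BIOS geometry of 16 heads and 63 sectors per track with a 16-bit cylinder count, and 512-byte sectors (all assumptions, not verified on that machine), the addressable range ends exactly at sector 66059279, roughly 33.8 GB into the disk. A minimal sketch of the arithmetic:

```python
# Assumed geometry (not confirmed for the machine in this post).
CYLINDERS = 65535          # largest value of a 16-bit cylinder count
HEADS = 16
SECTORS_PER_TRACK = 63
SECTOR_SIZE = 512          # bytes per sector, assumed


def chs_capacity(cylinders, heads, sectors_per_track):
    """Total number of sectors addressable with the given CHS geometry."""
    return cylinders * heads * sectors_per_track


total = chs_capacity(CYLINDERS, HEADS, SECTORS_PER_TRACK)
last_readable = total - 1  # sectors are numbered from 0
limit_bytes = total * SECTOR_SIZE

print(total)          # 66059280
print(last_readable)  # 66059279
print(limit_bytes)    # 33822351360
```

If your own boundary differs, the same calculation with other head or cylinder counts may show whether it lines up with a geometry limit too.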
In order to find out whether or not you are experiencing the same problem, you should perform these additional tests:
- Check the sector numbers where your root partition begins and ends (using u to change display units to sectors). (If it is the first disk partition, it's unlikely that you have the problem discussed here.)
- Insert a boot CD with a working GRUB menu (a simple /usr/sbin/grub shell is not enough; see note below). Press the 'c' key after the GRUB menu appears while booting from the CD. Now enter the command root (hdm,n), replacing m with the disk number (in Linux: 0 = hda, 1 = hdc) and n with the partition number (in Linux: 0 = hdm1, 1 = hdm2, etc.) on which the file /boot/grub/stage2 is supposed to be found. If it hangs immediately after entering the command, but does not hang for root (hd0,0), it is likely that you have the described problem.
- You can refine the diagnosis further by attempting to read individual sectors using the command cat (hd0)sector_num+1. For example, in my case cat (hd0)66059280+1 produced the error message "Error 18: Selected cylinder exceeds maximum supported by BIOS", attempts with a smaller sector number worked, and attempts to read a much higher sector number (where the actual root partition started) caused a hang. When you see this behavior, you can be quite certain that you have a broken BIOS.
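Since every failed probe may cost a reboot, it can help to narrow down the boundary by bisection rather than by guessing. The following is only a sketch: the helper names probe_command and next_probe are made up for illustration, and the "known good" sector in the example is a hypothetical value. It prints the next cat command to type at the GRUB prompt, using the same syntax as in the steps above.

```python
def probe_command(sector, disk=0):
    """Build the GRUB command that reads a single sector from (hdN)."""
    return "cat (hd%d)%d+1" % (disk, sector)


def next_probe(last_good, first_bad):
    """Return the midpoint sector to try next, or None once the gap is closed."""
    if first_bad - last_good <= 1:
        return None
    return (last_good + first_bad) // 2


# Hypothetical starting point: sector 60000000 was readable,
# sector 66059280 produced an error (the value from this post).
last_good, first_bad = 60000000, 66059280
sector = next_probe(last_good, first_bad)
if sector is not None:
    print(probe_command(sector))  # cat (hd0)63029640+1
```

After each attempt, update last_good or first_bad by hand according to whether the read succeeded, and run it again until next_probe returns None.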
Note that performing the above tests from a grub shell after having successfully booted (say, to another OS on the same PC or through a Live CD) will not reproduce the symptoms and therefore will not help with troubleshooting. The grub shell binary uses different system calls to read sectors than the actual stage2 binary, and these may work well where the native BIOS interrupt fails.
In my case, upgrading the BIOS didn't help. However, repartitioning and placing the root partition into lower sectors (swapping hdc2 with hdc3) solved the problem... or so I thought.
In a cruel twist of fate, the problem recurred with exactly the same symptoms just a few days later. It turned out that I had moved the root partition toward the beginning of the disk but had not resized it, so its end still stretched beyond the fatal sector. Editing menu.lst, or possibly simply rebooting (OpenSolaris), moved one of the files required by GRUB to higher sectors, making the partition unbootable again. Lesson: if you have the problem described here (which, due to its nature, you might only find out when your system suddenly and inexplicably stops booting), the whole partition that contains the GRUB files must lie in low sectors!
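The lesson amounts to a simple range check: it is the end of the partition that matters, not just its start. A minimal sketch, where partition_is_safe is a hypothetical helper and the partition boundaries in the example are invented for illustration (only the last readable sector comes from my test case):

```python
LAST_READABLE = 66059279  # last readable sector in my test case


def partition_is_safe(start, end, last_readable=LAST_READABLE):
    """True only if every sector from start to end (inclusive) is readable."""
    return 0 <= start <= end <= last_readable


# Moved toward the start of the disk but not resized: still unsafe.
print(partition_is_safe(20000000, 70000000))  # False
# Entirely below the boundary: safe.
print(partition_is_safe(63, 20000000))        # True
```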
Update: the recommendation of moving the boot partition into low sectors is apparently not enough with OpenSolaris. My machine won't boot again. The file system code in GRUB actually reads sectors that lie outside of the boot partition boundary. And even after hacking it to stop reading such sectors, the final configuration doesn't boot. Although both
module$ can be set and files are reported found, all I get on a boot attempt is a blinking cursor. I suppose some of the OpenSolaris kernel/boot loader may be trying to read the high sectors as well. This is where I say farewell to OpenSolaris (on that machine ;-)).