BIOS execution in QEMU: where it all starts

Reading Linux Inside (by 0xAX), and in particular the early steps of the kernel boot process, sparked my curiosity about the BIOS code executed immediately after power-on. Somewhat related to that, tinkering with QEMU had been on my backlog for quite some time. That made the recipe for a new (and quick) challenge: debug QEMU while it executes the first instructions of the BIOS firmware.

The first step was to set up an environment to compile and debug QEMU (the compiling capability is actually optional). My RPM-based environment runs on top of Fedora 31 (x86_64), and the QEMU version is 4.1.1. Binary translation mode was used for emulation (hardware acceleration may come in a follow-up).

With no other preamble, let’s go straight to the task.

ROM loading

Identifying where the ROM image gets loaded into host memory was a helpful starting point:

The rom structure in rom_add_file, once filled, reveals some interesting data about the ROM image:

  • File: /usr/share/qemu/bios-256k.bin
  • Size: 262144 bytes (256 KB)
  • Emulated physical address: 0xfffc0000
  • Host virtual address: 0x55555678da00 (*) (**)

(*) This value changes with each run.
(**) All host memory addresses in this article refer to the location where the BIOS image was initially loaded. This data is transferred to a pc.bios RAM block upon virtual machine reset. See more information here.

In old_pc_system_rom_init (referenced from the call stack), you can see the memory regions created in the pci address space and how the ROM image fits in there. First, a region named pc.bios, the same size as the ROM image, is created and added as a sub-region of pci at emulated physical address 0xfffc0000. Then an alias of pc.bios, named isa-bios and mapping its last 128 KB, is created and added as a sub-region of pci at emulated physical address 0xe0000. As a result, emulated physical addresses 0xe0000 and 0xfffe0000 both point to the beginning of the last 128 KB of the BIOS image (offset 0x20000), whereas address 0xfffc0000 points to the beginning of the BIOS image (offset 0x0).
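The mapping can be made concrete with a minimal C sketch that maps an emulated physical address back to a BIOS image offset. This is not QEMU code. The constants are the values quoted above, and bios_image_offset is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the text: pc.bios holds the whole 256 KB image at 0xfffc0000,
 * and the isa-bios alias exposes the image's last 128 KB at 0xe0000. */
#define BIOS_BASE     0xfffc0000u
#define BIOS_SIZE     0x40000u   /* 256 KB */
#define ISA_BIOS_BASE 0xe0000u
#define ISA_BIOS_SIZE 0x20000u   /* last 128 KB */

static_assert(ISA_BIOS_SIZE <= BIOS_SIZE, "isa-bios must fit inside pc.bios");

/* Returns the image offset backing an emulated physical address,
 * or -1 if the address falls in neither window. */
static long bios_image_offset(uint32_t paddr)
{
    if (paddr >= BIOS_BASE)  /* pc.bios runs up to the 4 GB boundary */
        return (long)(paddr - BIOS_BASE);
    if (paddr >= ISA_BIOS_BASE && paddr < ISA_BIOS_BASE + ISA_BIOS_SIZE)
        /* the alias starts at image offset BIOS_SIZE - ISA_BIOS_SIZE */
        return (long)(BIOS_SIZE - ISA_BIOS_SIZE + (paddr - ISA_BIOS_BASE));
    return -1;
}
```

With these constants, 0xe0000 and 0xfffe0000 both resolve to offset 0x20000, and 0xfffc0000 resolves to offset 0x0.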

CPU reset

The next interesting event, after memory and other hardware initialization, is CPU reset:

The Program Counter (EIP register in x86) is set to 0xfff0, the CS selector to 0xf000 and the CS base address to 0xffff0000. Adding the PC value to the CS base address, we conclude that the CPU will start at 0xfffffff0 (emulated physical address). Note: these values are set deliberately to comply with the x86 specification. In particular, the CS base is not selector * 16 at this point. In x86 real mode, once CS is reloaded (e.g. by a far jump), addresses are calculated as segment selector * 16 + offset.
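The two ways of forming a fetch address can be contrasted in a short C sketch. The macro and function names are hypothetical, and the values are the reset values quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Reset values from the text. */
#define RESET_CS_SELECTOR 0xf000u
#define RESET_CS_BASE     0xffff0000u  /* hidden base, not selector * 16 */
#define RESET_EIP         0xfff0u

/* At reset, the CPU fetches from the hidden CS base plus EIP. */
static uint32_t fetch_address(uint32_t cs_base, uint32_t eip)
{
    return cs_base + eip;
}

/* After CS is reloaded in real mode, the base becomes selector * 16. */
static uint32_t real_mode_address(uint16_t selector, uint32_t offset)
{
    assert(offset <= 0xffffu);  /* real-mode offsets are 16-bit */
    return ((uint32_t)selector << 4) + offset;
}
```

Applying the selector * 16 formula to the reset values would give 0xffff0 rather than 0xfffffff0. This is why the hidden CS base matters for the very first fetch.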

Doing some trivial math (0xfffffff0 - 0xfffc0000), we find that the first instruction to be executed is at BIOS image offset 0x3fff0. Since the BIOS image was loaded at host address 0x55555678da00 (see ‘ROM loading’ above), the first instruction should be at 0x5555567cd9f0.
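The same arithmetic can be written as a small C helper that turns an emulated physical address inside pc.bios into a host address. HOST_BIOS_LOAD is the load address observed in one run (it changes every time), and host_address is a hypothetical name. The result refers to the initially loaded copy, as noted in ‘ROM loading’.

```c
#include <assert.h>
#include <stdint.h>

/* Host address where the image was loaded in one particular run. */
#define HOST_BIOS_LOAD 0x55555678da00ull
#define BIOS_BASE      0xfffc0000u

/* Host address of the byte backing an emulated physical address in pc.bios. */
static uint64_t host_address(uint64_t host_load, uint32_t paddr)
{
    assert(paddr >= BIOS_BASE);  /* only the pc.bios window is handled */
    return host_load + (paddr - BIOS_BASE);
}
```

For the reset vector, host_address(HOST_BIOS_LOAD, 0xfffffff0) gives 0x5555567cd9f0.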

What do we have there?

Some 16-bit i8086 decoding:

That is a long jump backwards to emulated physical address 0xfe05b (CS 0xf000 * 16 + PC 0xe05b). Doing some trivial math again (0xfe05b - 0xe0000), that is 0x1e05b into the isa-bios memory region. Since isa-bios maps the last 128 KB of the BIOS image (starting at offset 0x20000), the jump destination is at BIOS image offset 0x3e05b (0x20000 + 0x1e05b). In terms of host addresses, that is 0x5555567cba5b.
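The decoding step can also be sketched in C. This assumes the five bytes at image offset 0x3fff0 are EA 5B E0 00 F0. That is the 16-bit encoding of ljmp $0xf000, $0xe05b: opcode 0xEA, then a little-endian 16-bit offset, then a 16-bit segment. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bytes at image offset 0x3fff0: ljmp $0xf000, $0xe05b. */
static const uint8_t reset_vector[] = { 0xEA, 0x5B, 0xE0, 0x00, 0xF0 };

struct far_jump {
    int valid;        /* 1 if the bytes decoded as a direct far jump */
    uint16_t offset;  /* new IP */
    uint16_t segment; /* new CS selector */
};

/* Decodes opcode 0xEA (jmp ptr16:16): a little-endian 16-bit offset
 * followed by a little-endian 16-bit segment. */
static struct far_jump decode_far_jump(const uint8_t *code, size_t len)
{
    struct far_jump j = { 0, 0, 0 };
    if (len >= 5 && code[0] == 0xEA) {
        j.valid   = 1;
        j.offset  = (uint16_t)(code[1] | (code[2] << 8));
        j.segment = (uint16_t)(code[3] | (code[4] << 8));
    }
    return j;
}

/* Real-mode destination: segment * 16 + offset. */
static uint32_t far_jump_target(struct far_jump j)
{
    assert(j.valid);
    return ((uint32_t)j.segment << 4) + j.offset;
}
```

Starting from the decoded target 0xfe05b, the isa-bios arithmetic above leads to image offset 0x3e05b.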

What do we have there?

Some 16-bit i8086 decoding again:

And so on. I won’t continue decoding, for the readers’ sanity, but you get the idea.

Binary translation

Our final step is to see this in action.

A few code pointers for this section:

In binary translation mode, QEMU retrieves the instructions to be executed and transforms them into native host instructions. Even though it’s possible to establish a relationship between both sets, it is anything but a one-to-one opcode translation. It looks more like instrumented code generated by a framework such as DynamoRIO. For example, a long jmp in the source architecture looks more like a couple of mov instructions that modify the CPU state structure (in memory) to emulate its effect.

Let’s see what the instructions decoded in the ‘CPU reset’ section look like when translated.

Source instruction:

Translated instructions block:

There you can see how the ljmp is represented by setting 0xf000 in the CS selector, 0xf0000 (0xf000 * 16) in the CS base address and 0xe05b in the PC. The RBP register points to the CPUX86State structure.
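The effect of those stores can be imitated with a simplified C sketch. struct guest_cpu is a hypothetical stand-in holding only the three fields involved. QEMU's real CPUX86State is much larger and lays these fields out differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified guest state: only what a real-mode ljmp touches. */
struct guest_cpu {
    uint16_t cs_selector;
    uint32_t cs_base;
    uint32_t eip;
};

/* State right after reset, as described in 'CPU reset'. */
static const struct guest_cpu cpu_at_reset = { 0xf000, 0xffff0000u, 0xfff0 };

/* Emulates "ljmp $sel, $off" in real mode with three stores into the
 * guest state; no host jump is involved. */
static struct guest_cpu real_mode_ljmp(struct guest_cpu cpu, uint16_t sel, uint16_t off)
{
    cpu.cs_selector = sel;
    cpu.cs_base     = (uint32_t)sel << 4;
    cpu.eip         = off;
    return cpu;
}

/* Address the next guest instruction will be fetched from. */
static uint32_t next_fetch(struct guest_cpu cpu)
{
    assert(cpu.eip <= 0xffffu);  /* real-mode offsets are 16-bit */
    return cpu.cs_base + cpu.eip;
}
```

After real_mode_ljmp(cpu_at_reset, 0xf000, 0xe05b), the next fetch address follows directly from the updated state: 0xfe05b.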

Source instructions:

Translated instructions block:

The translation of that comparison followed by a conditional jump is a bit more involved.

Source instructions:

Translated instructions block:

I won’t continue, but there are a few observations I want to make:

  1. the registers used for the actual computation in the host are not necessarily the same as in the source instruction (e.g. DX in the source is CX in the last translation block);
  2. register state (like any other state) has to be saved after the translation block is executed (e.g. see how the RBX register value is moved to the memory pointed to by RBP+0x10); and
  3. the instruction that updates the PC (e.g. the move of 0xe06a to RBP+0x80) is a good hint for working out how many source instructions were executed within the translation block.
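The RBP-relative offsets can be cross-checked with a small C sketch. It assumes that, on a 64-bit target, CPUX86State starts with 16 eight-byte general-purpose registers followed by eip (my reading of QEMU 4.1's target/i386/cpu.h). It also assumes the registers are indexed in x86 encoding order (EAX, ECX, EDX, EBX, ...). struct cpu_state_prefix and the REG_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the assumed beginning of CPUX86State on a 64-bit
 * target; the real structure continues with many more fields. */
struct cpu_state_prefix {
    uint64_t regs[16]; /* guest general-purpose registers */
    uint64_t eip;      /* guest program counter */
    uint64_t eflags;
};

/* Register indices in x86 encoding order. */
enum { REG_EAX, REG_ECX, REG_EDX, REG_EBX };

/* Byte offset of guest register i from the start of the state (RBP). */
#define REG_OFFSET(i) \
    (offsetof(struct cpu_state_prefix, regs) + (size_t)(i) * sizeof(uint64_t))

/* The PC sits right after the 16 registers: 16 * 8 = 0x80. */
static_assert(offsetof(struct cpu_state_prefix, eip) == 0x80, "eip expected at 0x80");
```

Under these assumptions, a store to RBP+0x80 updates the guest EIP. A store to RBP+0x10 lands in the guest EDX slot, which is consistent with DX being the register used in the source instructions.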

Bonus point

Where is the bios-256k.bin source code located?

It’s in the SeaBIOS repository. The BIOS image entry point (at offset 0x3fff0) is in the romlayout.S file.

Further reading:
