BIOS execution in QEMU: where it all starts

Reading Linux Inside (by 0xAX) and the early steps of the kernel boot process sparked my curiosity about the BIOS code executed immediately after power-on. On a related note, tinkering with QEMU has been on my backlog for quite some time. Together they made the recipe for a new (and quick) challenge: debug QEMU while it executes the first instructions of the BIOS firmware.

The first step was to set up an environment to compile and debug QEMU. The ability to compile is actually optional. My RPM-based environment runs on top of Fedora 31 (x86_64). The QEMU version is 4.1.1. Binary translation mode was used for emulation (hardware acceleration may come in a follow-up).

With no other preamble, let’s go straight to the task.

ROM loading

Identifying where the ROM image gets loaded into the host memory has been a helpful starting point:

pc_memory_init (hw/i386/pc.c)
- pc_system_firmware_init (hw/i386/pc_sysfw.c)
  - old_pc_system_rom_init (hw/i386/pc_sysfw.c)
    - rom_add_file (hw/core/loader.c)

The rom structure in rom_add_file, once filled, reveals some interesting data about the ROM image:

File: /usr/share/qemu/bios-256k.bin
Size: 262144 bytes (256 KB)
Emulated physical address: 0xfffc0000
Host virtual address: 0x55555678da00 (*) (**)

(*) This value changes with each run.
(**) All host memory addresses in this article refer to the location where the BIOS image was initially loaded. This data is transferred to a pc.bios RAM block upon virtual machine reset.

In old_pc_system_rom_init (referenced in the call stack above) it is possible to see how the memory regions are created and how the ROM image fits in. A region named pc.bios, of the same size as the ROM image, is created first and added as a sub-region of pci at the emulated physical address 0xfffc0000. An alias of pc.bios, named isa-bios, is then created to map its last 128 KB, and added as a sub-region of pci at the emulated physical address 0xe0000. As a result, both emulated physical addresses 0xe0000 and 0xfffe0000 point to the beginning of the last 128 KB of the BIOS image (offset 0x20000), whereas address 0xfffc0000 points to the beginning of the BIOS image (offset 0x0).
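
The layout above can be double-checked with a few lines of Python. This is only a sketch of the address arithmetic, not QEMU code: the helper name bios_offset and the REGIONS table are made up for illustration, and only the addresses and sizes come from the text.

```python
BIOS_SIZE = 256 * 1024       # size of bios-256k.bin
ISA_BIOS_SIZE = 128 * 1024   # isa-bios aliases the last 128 KB of pc.bios

# (region name, guest physical base, size, offset into the BIOS image)
REGIONS = [
    ("pc.bios", 0x100000000 - BIOS_SIZE, BIOS_SIZE, 0),
    ("isa-bios", 0x100000 - ISA_BIOS_SIZE, ISA_BIOS_SIZE,
     BIOS_SIZE - ISA_BIOS_SIZE),
]


def bios_offset(guest_addr):
    """Return (region name, BIOS image offset) for a guest physical address."""
    for name, base, size, image_offset in REGIONS:
        if base <= guest_addr < base + size:
            return name, image_offset + (guest_addr - base)
    raise ValueError(f"{guest_addr:#x} is not backed by the BIOS image")


for addr in (0xfffc0000, 0xfffe0000, 0xe0000):
    name, offset = bios_offset(addr)
    print(f"{addr:#010x} -> {name}, image offset {offset:#x}")
```

Both 0xfffe0000 and 0xe0000 resolve to image offset 0x20000, which is the aliasing described above.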

CPU reset

The next interesting event, after memory and other hardware initialization, is CPU reset:

x86_cpu_reset (target/i386/cpu.c)

The Program Counter (EIP register in x86) is set to 0xfff0, the CS selector to 0xf000 and the CS base address to 0xffff0000. Adding the PC value to the CS base address we conclude that the CPU will start at 0xfffffff0 (emulated physical address). Note: all of the previous values were artificially set to comply with the x86 specification. In x86 real mode, addresses during run time will be calculated as segment selector * 16 + offset.
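
To make the difference between the reset state and ordinary real-mode addressing concrete, here is a small sketch. The function names are mine; the values are the ones quoted above.

```python
RESET_CS_SELECTOR = 0xf000
RESET_CS_BASE = 0xffff0000
RESET_EIP = 0xfff0


def linear_address(cs_base, eip):
    """Segment base plus offset, truncated to 32 bits."""
    return (cs_base + eip) & 0xffffffff


def real_mode_base(selector):
    """Base that a real-mode load of CS derives from the selector."""
    return selector << 4


# At reset the CS base is preset and does not match selector * 16.
reset_vector = linear_address(RESET_CS_BASE, RESET_EIP)

# Same selector and offset once the base follows the real-mode rule.
same_offset_real_mode = linear_address(real_mode_base(RESET_CS_SELECTOR),
                                       RESET_EIP)

print(hex(reset_vector), hex(same_offset_real_mode))
```

The first value is 0xfffffff0, at the very top of the 4 GB space. The second is 0xffff0, just below 1 MB. The two only differ because of the preset CS base.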

Doing some trivial math (0xfffffff0 – 0xfffc0000) we realize that the first instruction to be executed is at the BIOS image offset 0x3fff0. If the BIOS image was loaded into the host address 0x55555678da00 (see ‘ROM loading’ above), then the first instruction should be at 0x5555567cd9f0.
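
Since the host address changes with each run, I find it handy to script the conversion. A minimal sketch, assuming the host base address is read from the rom structure on each run (guest_to_host is a hypothetical helper, not something QEMU provides):

```python
PC_BIOS_GUEST_BASE = 0xfffc0000   # where pc.bios is mapped in the guest
BIOS_SIZE = 0x40000               # 256 KB
HOST_IMAGE_BASE = 0x55555678da00  # from this particular run; yours will differ


def guest_to_host(guest_addr, host_base=HOST_IMAGE_BASE):
    """Translate a guest physical address inside pc.bios to a host address."""
    offset = guest_addr - PC_BIOS_GUEST_BASE
    if not 0 <= offset < BIOS_SIZE:
        raise ValueError(f"{guest_addr:#x} is outside pc.bios")
    return host_base + offset


host = guest_to_host(0xfffffff0)
print(f"(gdb) x/5xb {host:#x}")
```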

What do we have there?

(gdb) x/5xb 0x5555567cd9f0
0x5555567cd9f0: 0xea    0x5b    0xe0    0x00    0xf0

Doing some 16-bit x86 decoding:

ljmp $0xf000,$0xe05b

That is a long jump backwards to the emulated physical address 0xfe05b (CS 0xf000 * 16 + PC 0xe05b). Doing some trivial math again (0xfe05b – 0xe0000), that is 0x1e05b into the isa-bios memory region. Considering that isa-bios maps the last 128 KB of the BIOS image (starting at offset 0x20000), the jump destination is at BIOS image offset 0x3e05b (0x20000 + 0x1e05b). In terms of host addresses, that is 0x5555567cba5b.
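
The decoding and the arithmetic above can be reproduced as follows. This is a sketch that handles just this one encoding (opcode 0xEA followed by a 16-bit offset and a 16-bit selector); decode_ljmp16 is a name I made up.

```python
import struct


def decode_ljmp16(code):
    """Decode a 16-bit far jump: opcode 0xEA, offset, selector (little-endian)."""
    opcode, offset, selector = struct.unpack_from("<BHH", code)
    if opcode != 0xEA:
        raise ValueError(f"not a far jump: opcode {opcode:#x}")
    return selector, offset


code = bytes([0xea, 0x5b, 0xe0, 0x00, 0xf0])    # bytes from the gdb dump
selector, offset = decode_ljmp16(code)

target = (selector << 4) + offset               # real-mode physical address
image_offset = 0x20000 + (target - 0xe0000)     # isa-bios starts at 0x20000
host_addr = 0x55555678da00 + image_offset       # host base from this run

print(f"ljmp ${selector:#x},${offset:#x} -> {target:#x}")
print(f"image offset {image_offset:#x}, host address {host_addr:#x}")
```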

What do we have there?

(gdb) x/30xb 0x5555567cba5b
0x5555567cba5b: 0x2e    0x66    0x83    0x3e    0xe8    0x62    0x00    0x0f
0x5555567cba63: 0x85    0x24    0xf0    0x31    0xd2    0x8e    0xd2    0x66
0x5555567cba6b: 0xbc    0x00    0x70    0x00    0x00    0x66    0xba    0x09
0x5555567cba73: 0xf2    0x0e    0x00    0xe9    0x8b    0xee

Some 16-bit x86 decoding again:

cmpl $0x0,%cs:0x62e8
jne 0x0000d08a
xor %dx,%dx
mov %dx,%ss
mov $0x7000,%esp
mov $0xef209,%edx
jmp 0x0000cf04

And so on. I won’t continue decoding, for the reader’s sanity, but you get the idea.
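
One detail worth spelling out is where the jne and jmp destinations in the decoding above come from. Both use a 16-bit displacement relative to the next instruction, and the result wraps around within the 64 KB segment. A short sketch (the helper name and the byte positions are mine, derived from the dump):

```python
# The 30 bytes dumped at CS:0xe05b
CODE = bytes.fromhex(
    "2e66833ee862000f" "8524f031d28ed266"
    "bc0070000066ba09" "f20e00e98bee"
)
START = 0xe05b


def rel16_target(insn_addr, insn_len, rel16):
    """Destination of a near jump with a 16-bit displacement, wrapped to 16 bits."""
    return (insn_addr + insn_len + rel16) & 0xffff


# jne: bytes 0f 85 rel16 at dump index 7 (4 bytes long)
jne_rel = int.from_bytes(CODE[9:11], "little")
jne_target = rel16_target(START + 7, 4, jne_rel)

# jmp: bytes e9 rel16 at dump index 27 (3 bytes long)
jmp_rel = int.from_bytes(CODE[28:30], "little")
jmp_target = rel16_target(START + 27, 3, jmp_rel)

print(hex(jne_target), hex(jmp_target))
```

These come out as 0xd08a and 0xcf04, matching the decoded listing.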

Binary translation

Our final step is to see this in action.

A few code pointers for this section:

cpu_exec (accel/tcg/cpu-exec.c)
- tb_find (accel/tcg/cpu-exec.c)
  - tb_gen_code (accel/tcg/translate-all.c)
    - gen_intermediate_code (target/i386/translate.c)
      - translator_loop (accel/tcg/translator.c)
        - i386_tr_translate_insn (target/i386/translate.c)
          - disas_insn (target/i386/translate.c)
- cpu_loop_exec_tb (accel/tcg/cpu-exec.c)
  - cpu_tb_exec (accel/tcg/cpu-exec.c)
    - tcg_qemu_tb_exec (tcg/tcg.h)

When in binary translation mode, QEMU retrieves the instructions to be executed and transforms them into native host instructions. Even though it’s possible to establish a relationship between both sets, it is anything but a one-to-one opcode translation; it looks more like instrumented code generated by a framework such as DynamoRIO. To give an example, a long jmp in the source architecture will look more like a couple of mov instructions that modify the CPU state structure (in memory) to emulate the effect.

Let’s see what the instructions decoded in the ‘CPU reset’ section look like when translated.

Source instruction:

ljmp $0xf000,$0xe05b

Translated instructions block:

mov    -0x10(%rbp),%ebx
test   %ebx,%ebx
jl     0x7fffec8a3136 <code_gen_buffer+268>
movl   $0xf000,0xd0(%rbp)
movq   $0xf0000,0xd8(%rbp)
movq   $0xe05b,0x80(%rbp)
...

You can see there how the ljmp is represented by setting 0xf000 in the CS selector, 0xf0000 (0xf000 * 16) in the CS base address and 0xe05b in the PC. The RBP register points to the CPUX86State structure.
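
In other words, the translated block does not jump anywhere in the guest sense; it just edits the emulated CPU state. Here is a toy model of that effect. It is a deliberately simplified stand-in for CPUX86State with field names of my own; the offsets in the comments are the ones seen in the listing.

```python
from dataclasses import dataclass


@dataclass
class ToyCpuState:
    eip: int = 0xfff0             # written through 0x80(%rbp) in the listing
    cs_selector: int = 0xf000     # written through 0xd0(%rbp)
    cs_base: int = 0xffff0000     # written through 0xd8(%rbp)

    def next_fetch(self):
        """Guest address the next instruction is fetched from."""
        return (self.cs_base + self.eip) & 0xffffffff


def ljmp_real_mode(state, selector, offset):
    """Emulate a real-mode far jump as three stores to the CPU state."""
    state.cs_selector = selector
    state.cs_base = selector << 4
    state.eip = offset


state = ToyCpuState()
before = state.next_fetch()
ljmp_real_mode(state, 0xf000, 0xe05b)
print(hex(before), "->", hex(state.next_fetch()))
```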

Source instructions:

cmpl $0x0,%cs:0x62e8
jne 0x0000d08a

Translated instructions block:

mov    -0x10(%rbp),%ebx
test   %ebx,%ebx
jl     0x7fffec8a32e1 <code_gen_buffer+695>
mov    0xd8(%rbp),%rbx
add    $0x62e8,%rbx
mov    %ebx,%ebx
mov    %rbx,%rdi
shr    $0x7,%rdi
and    -0x20(%rbp),%rdi
add    -0x18(%rbp),%rdi
lea    0x3(%rbx),%rsi
and    $0xfffffffffffff000,%rsi
cmp    (%rdi),%rsi
mov    %rbx,%rsi
jne    0x7fffec8a32ed <code_gen_buffer+707>
add    0x18(%rdi),%rsi
mov    (%rsi),%ebx
movq   $0x0,0x98(%rbp)
mov    %rbx,0x90(%rbp)
movl   $0x10,0xa8(%rbp)
test   %rbx,%rbx
jne    0x7fffec8a32cb <code_gen_buffer+673>
data16 xchg %ax,%ax
jmpq   0x7fffec8a32b4 <code_gen_buffer+650>
movq   $0xe066,0x80(%rbp)
...

That comparison followed by a conditional jump was a bit more involved.
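
Most of the extra instructions appear to be the guest memory read itself. My reading of the listing is that it performs a lookup in a software TLB to turn the guest address (CS base + 0x62e8) into a host address, falling back to a slow path on a miss. The sketch below mirrors the instruction sequence under that assumption. The entry layout, the table size and all names are hypothetical and not taken from QEMU’s sources.

```python
from dataclasses import dataclass

PAGE_BITS = 12                   # assumed 4 KB guest pages
ENTRY_BITS = 5                   # assumed 32-byte entries (12 - 5 = 7, the shr)
PAGE_MASK = 0xfffffffffffff000   # constant used by the `and` in the listing


@dataclass
class Entry:
    tag: int = -1     # compared at (%rdi); -1 never matches a page address
    addend: int = 0   # read from 0x18(%rdi): host minus guest address


def fast_path(table, mask, guest_addr, size=4):
    """Return the host address for a load, or None when the slow path is needed."""
    guest_addr &= 0xffffffff                                       # mov %ebx,%ebx
    byte_offset = (guest_addr >> (PAGE_BITS - ENTRY_BITS)) & mask  # shr, and
    entry = table[byte_offset >> ENTRY_BITS]                       # add table base
    tag = (guest_addr + size - 1) & PAGE_MASK                      # lea 0x3, and
    if entry.tag != tag:                                           # cmp, jne
        return None
    return guest_addr + entry.addend                               # add 0x18(%rdi)


ENTRIES = 256                                  # hypothetical table size
table = [Entry() for _ in range(ENTRIES)]
mask = (ENTRIES - 1) << ENTRY_BITS

guest_addr = 0xf0000 + 0x62e8                  # CS base + offset from the cmpl
guest_page = guest_addr & PAGE_MASK
HOST_PAGE = 0x7f0000000000                     # made-up host address
table[(guest_addr >> PAGE_BITS) % ENTRIES] = Entry(
    tag=guest_page, addend=HOST_PAGE - guest_page)

hit = fast_path(table, mask, guest_addr)
miss = fast_path(table, mask, 0x1000)
print(hex(hit), miss)
```

Adding size - 1 before masking makes an access that straddles a page boundary fail the comparison, so it also takes the slow path.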

Source instructions:

xor %dx,%dx
mov %dx,%ss

Translated instructions block:

mov    -0x10(%rbp),%ebx
test   %ebx,%ebx
jl     0x7fffec8a3456 <code_gen_buffer+1068>
mov    0x10(%rbp),%rbx
xor    %ecx,%ecx
mov    %cx,%bx
mov    %rbx,0x10(%rbp)
movl   $0x0,0xe8(%rbp)
movq   $0x0,0xf0(%rbp)
movq   $0xe06a,0x80(%rbp)
...

I won’t continue, but there are a few observations that I want to make:

- the registers used for the actual computation on the host are not necessarily the same as those in the source instruction (e.g. DX in the source is CX in the last translation block);
- register state (like any other state) has to be saved after the translation block is executed (e.g. see how the value of RBX is moved to the memory pointed to by RBP+0x10); and
- the instruction that updates the PC (e.g. the move of 0xe06a to RBP+0x80) is a good hint for working out how many source instructions were executed within the translation block.
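
The last observation can be checked against the byte dump: adding the encoded lengths of the source instructions to the PC at the start of a block should give the PC value stored by that block. A small sketch, with instruction lengths counted by hand from the dump and the BLOCKS structure being my own:

```python
# (PC at block start, [(source instruction, encoded length)], PC stored by block)
BLOCKS = [
    (0xe05b, [("cmpl $0x0,%cs:0x62e8", 7), ("jne 0x0000d08a", 4)], 0xe066),
    (0xe066, [("xor %dx,%dx", 2), ("mov %dx,%ss", 2)], 0xe06a),
]


def pc_after(start, insns):
    """PC after executing the given instructions sequentially."""
    return start + sum(length for _, length in insns)


for start, insns, stored_pc in BLOCKS:
    status = "matches" if pc_after(start, insns) == stored_pc else "differs"
    print(f"{start:#x} + {len(insns)} instructions -> "
          f"{pc_after(start, insns):#x} ({status})")
```

For the first block, the stored value 0xe066 corresponds to the fall-through path, where the jne is not taken.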

Bonus point

Where is the source code of bios-256k.bin located?

It’s in the SeaBIOS repository. The BIOS image entry point (at offset 0x3fff0) is in the romlayout.S file.