Reading Linux Inside (by 0xAX) and the early steps of the kernel boot process sparked my curiosity about the BIOS code executed immediately after power-on. Somewhat related to that, tinkering with QEMU has been on my backlog for quite some time. That made the recipe for a new (and quick) challenge: debug QEMU while it executes the first instructions of the BIOS firmware.
The first step was to set up an environment to compile and debug QEMU. The ability to compile is actually optional. My RPM-based environment runs on top of Fedora 31 (x86_64). The QEMU version is 4.1.1. Binary translation mode has been used for emulation (hardware acceleration may come in a follow-up).
With no other preamble, let’s go straight to the task.
ROM loading
Identifying where the ROM image gets loaded into the host memory has been a helpful starting point:
- pc_memory_init (hw/i386/pc.c)
- pc_system_firmware_init (hw/i386/pc_sysfw.c)
- old_pc_system_rom_init (hw/i386/pc_sysfw.c)
- rom_add_file (hw/core/loader.c)
- old_pc_system_rom_init (hw/i386/pc_sysfw.c)
- pc_system_firmware_init (hw/i386/pc_sysfw.c)
The rom structure in rom_add_file, once filled, reveals some interesting data about the ROM image:
- File: /usr/share/qemu/bios-256k.bin
- Size: 262144 bytes (256 KB)
- Emulated physical address: 0xfffc0000
- Host virtual address: 0x55555678da00 (*) (**)
(*) This value changes with each run.
(**) All host memory addresses in this article refer to the location where the BIOS image was initially loaded. This data is transferred to a pc.bios RAM block upon virtual machine reset. See more information here.
In old_pc_system_rom_init (referenced from the call stack) it is possible to see the memory regions created under pci and how the ROM image fits in there. A region named pc.bios, of the same size as the ROM image, is created first and added as a sub-region of pci at the emulated physical address 0xfffc0000. An alias of pc.bios, named isa-bios, is then created, mapping its last 128 KB, and added as a sub-region of pci at the emulated physical address 0xe0000. As a result, both emulated physical addresses 0xe0000 and 0xfffe0000 point to the beginning of the last 128 KB of the BIOS image (offset 0x20000), whereas address 0xfffc0000 points to the beginning of the BIOS image (offset 0x0).
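The mapping described above can be sketched in a few lines of Python. This is only an illustration of the address arithmetic, not QEMU code: the REGIONS table and the image_offset helper are hypothetical names, and the sizes and base addresses are the ones quoted in the text.

```python
# Guest-physical windows onto the BIOS image, as described in the text.
BIOS_SIZE = 256 * 1024        # bios-256k.bin
ISA_BIOS_SIZE = 128 * 1024    # the alias covers the last 128 KB of the image

# (name, guest base address, size, offset of the window inside the image)
REGIONS = [
    ("pc.bios", 0xFFFC0000, BIOS_SIZE, 0),
    ("isa-bios", 0xE0000, ISA_BIOS_SIZE, BIOS_SIZE - ISA_BIOS_SIZE),
]

def image_offset(guest_addr):
    """Return (region name, offset into the BIOS image), or None if unmapped."""
    for name, base, size, image_start in REGIONS:
        if base <= guest_addr < base + size:
            return name, image_start + (guest_addr - base)
    return None

for addr in (0xFFFC0000, 0xFFFE0000, 0xE0000):
    name, off = image_offset(addr)
    print(f"{addr:#010x} -> {name}, image offset {off:#x}")
```

Both 0xfffe0000 and 0xe0000 resolve to image offset 0x20000, which is the aliasing the paragraph describes.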
CPU reset
The next interesting event, after memory and other hardware initialization, is CPU reset:
- x86_cpu_reset (target/i386/cpu.c)
The Program Counter (EIP register in x86) is set to 0xfff0, the CS selector to 0xf000 and the CS base address to 0xffff0000. Adding the PC value to the CS base address, we conclude that the CPU will start at 0xfffffff0 (emulated physical address). Note that these values are set artificially at reset to comply with the x86 specification: the CS base 0xffff0000 is not 0xf000 * 16. Once CS is reloaded in x86 real mode, addresses at run time are calculated as segment selector * 16 + offset.
Doing some trivial math (0xfffffff0 – 0xfffc0000) we realize that the first instruction to be executed is at the BIOS image offset 0x3fff0. If the BIOS image was loaded into the host address 0x55555678da00 (see ‘ROM loading’ above), then the first instruction should be at 0x5555567cd9f0.
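The arithmetic above, spelled out as a small Python sketch. The constant names are mine; the host base address is the value from this particular run and will differ on yours.

```python
CS_BASE = 0xFFFF0000              # CS base set by x86_cpu_reset
EIP = 0xFFF0                      # program counter after reset
BIOS_GUEST_BASE = 0xFFFC0000      # emulated physical address of pc.bios
BIOS_HOST_BASE = 0x55555678DA00   # host address of the image in this run

reset_vector = CS_BASE + EIP                  # first fetch address
image_off = reset_vector - BIOS_GUEST_BASE    # offset into bios-256k.bin
host_addr = BIOS_HOST_BASE + image_off        # where gdb can read the bytes

print(f"reset vector: {reset_vector:#x}")
print(f"image offset: {image_off:#x}")
print(f"host address: {host_addr:#x}")
```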
What do we have there?
    (gdb) x/5xb 0x5555567cd9f0
    0x5555567cd9f0: 0xea 0x5b 0xe0 0x00 0xf0
Doing some 16-bit 8086 decoding:
    ljmp $0xf000,$0xe05b
That is a long jump backwards to the emulated physical address 0xfe05b (CS 0xf000 * 16 + PC 0xe05b). Doing some trivial math again (0xfe05b - 0xe0000), that is 0x1e05b into the isa-bios memory region. Considering that isa-bios maps the last 128 KB of the BIOS image (starting at offset 0x20000), the jump destination is at BIOS image offset 0x3e05b (0x20000 + 0x1e05b). In terms of host addresses, that is 0x5555567cba5b.
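To double-check the decoding, here is a minimal Python sketch that unpacks the five bytes read with gdb (opcode 0xea, then a 16-bit offset and a 16-bit segment, both little-endian) and repeats the address arithmetic. Variable names are illustrative, and the host base address is the one from this run.

```python
import struct

raw = bytes([0xEA, 0x5B, 0xE0, 0x00, 0xF0])   # bytes at the reset vector

# Far jump in 16-bit mode: opcode, 16-bit offset, 16-bit segment selector.
opcode, offset, segment = struct.unpack("<BHH", raw)
assert opcode == 0xEA

target = (segment << 4) + offset              # real-mode address: seg * 16 + off

ISA_BIOS_GUEST_BASE = 0xE0000                 # guest address of isa-bios
ISA_BIOS_IMAGE_START = 0x20000                # last 128 KB of the image
BIOS_HOST_BASE = 0x55555678DA00               # host address in this run

image_off = ISA_BIOS_IMAGE_START + (target - ISA_BIOS_GUEST_BASE)
host_addr = BIOS_HOST_BASE + image_off

print(f"ljmp ${segment:#x},${offset:#x} -> guest {target:#x}")
print(f"image offset {image_off:#x}, host {host_addr:#x}")
```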
What do we have there?
    (gdb) x/30xb 0x5555567cba5b
    0x5555567cba5b: 0x2e 0x66 0x83 0x3e 0xe8 0x62 0x00 0x0f
    0x5555567cba63: 0x85 0x24 0xf0 0x31 0xd2 0x8e 0xd2 0x66
    0x5555567cba6b: 0xbc 0x00 0x70 0x00 0x00 0x66 0xba 0x09
    0x5555567cba73: 0xf2 0x0e 0x00 0xe9 0x8b 0xee
Some 16-bit 8086 decoding again:
    cmpl $0x0,%cs:0x62e8
    jne 0x0000d08a
    xor %dx,%dx
    mov %dx,%ss
    mov $0x7000,%esp
    mov $0xef209,%edx
    jmp 0x0000cf04
And so on. I won’t continue decoding, for the reader’s sanity, but you get the idea.
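The two jump targets in that listing can be recomputed from the raw bytes. The sketch below is a hand-rolled check, not a disassembler: the positions of the jne and jmp inside the dump are hard-coded from the decoding above, and rel16_target is a hypothetical helper.

```python
import struct

START_IP = 0xE05B   # IP of the first byte in the dump
code = bytes([
    0x2E, 0x66, 0x83, 0x3E, 0xE8, 0x62, 0x00, 0x0F,
    0x85, 0x24, 0xF0, 0x31, 0xD2, 0x8E, 0xD2, 0x66,
    0xBC, 0x00, 0x70, 0x00, 0x00, 0x66, 0xBA, 0x09,
    0xF2, 0x0E, 0x00, 0xE9, 0x8B, 0xEE,
])

def rel16_target(pos, insn_len):
    """Target of a 16-bit relative jump whose rel16 is the last two bytes."""
    (rel,) = struct.unpack_from("<h", code, pos + insn_len - 2)
    next_ip = START_IP + pos + insn_len
    return (next_ip + rel) & 0xFFFF      # IP wraps within the 64 KB segment

jne_target = rel16_target(pos=7, insn_len=4)    # 0f 85 rel16
jmp_target = rel16_target(pos=27, insn_len=3)   # e9 rel16

print(f"jne -> {jne_target:#06x}")
print(f"jmp -> {jmp_target:#06x}")
```

Both displacements are negative, so execution moves to lower addresses inside the same 0xf000 segment.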
Binary translation
Our final step is to see this in action.
A few code pointers for this section:
- cpu_exec (accel/tcg/cpu-exec.c)
- tb_find (accel/tcg/cpu-exec.c)
- tb_gen_code (accel/tcg/translate-all.c)
- gen_intermediate_code (target/i386/translate.c)
- translator_loop (accel/tcg/translator.c)
- i386_tr_translate_insn (target/i386/translate.c)
- disas_insn (target/i386/translate.c)
- i386_tr_translate_insn (target/i386/translate.c)
- translator_loop (accel/tcg/translator.c)
- gen_intermediate_code (target/i386/translate.c)
- tb_gen_code (accel/tcg/translate-all.c)
- cpu_tb_exec (accel/tcg/cpu-exec.c)
- cpu_loop_exec_tb (accel/tcg/cpu-exec.c)
- tcg_qemu_tb_exec (tcg/tcg.h)
- cpu_loop_exec_tb (accel/tcg/cpu-exec.c)
- tb_find (accel/tcg/cpu-exec.c)
When in binary translation mode, QEMU retrieves the instructions to be executed and transforms them into native host instructions. Even though it’s possible to establish a relationship between both sets, it is anything but a one-to-one opcode translation; it looks more like instrumented code generated by a framework such as DynamoRIO. To provide an example, a long jmp in the source architecture will look more like a couple of mov instructions that modify the CPU state structure (in memory) to emulate the effect.
Let’s see what the instructions decoded in the ‘CPU reset’ section look like when translated.
Source instruction:
    ljmp $0xf000,$0xe05b
Translated instructions block:
    mov -0x10(%rbp),%ebx
    test %ebx,%ebx
    jl 0x7fffec8a3136 <code_gen_buffer+268>
    movl $0xf000,0xd0(%rbp)
    movq $0xf0000,0xd8(%rbp)
    movq $0xe05b,0x80(%rbp)
    ...
Note how the ljmp is represented by writing 0xf000 to the CS selector, 0xf0000 (0xf000 * 16) to the CS base address and 0xe05b to the PC. The RBP register points to the CPUX86State structure.
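As a toy model of that effect (three stores into a state structure instead of a real jump), consider the following Python sketch. RealModeState and far_jump are hypothetical names for illustration only; they do not mirror the actual CPUX86State layout.

```python
from dataclasses import dataclass

@dataclass
class RealModeState:
    # Values right after CPU reset, as described in the 'CPU reset' section.
    cs_selector: int = 0xF000
    cs_base: int = 0xFFFF0000
    eip: int = 0xFFF0

def far_jump(state, selector, offset):
    """Emulate a real-mode far jump by updating the state in memory."""
    state.cs_selector = selector
    state.cs_base = selector << 4     # selector * 16
    state.eip = offset
    return state.cs_base + state.eip  # next fetch address

state = RealModeState()
next_fetch = far_jump(state, 0xF000, 0xE05B)
print(f"CS={state.cs_selector:#x} base={state.cs_base:#x} "
      f"EIP={state.eip:#x} -> {next_fetch:#x}")
```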
Source instructions:
    cmpl $0x0,%cs:0x62e8
    jne 0x0000d08a
Translated instructions block:
    mov -0x10(%rbp),%ebx
    test %ebx,%ebx
    jl 0x7fffec8a32e1 <code_gen_buffer+695>
    mov 0xd8(%rbp),%rbx
    add $0x62e8,%rbx
    mov %ebx,%ebx
    mov %rbx,%rdi
    shr $0x7,%rdi
    and -0x20(%rbp),%rdi
    add -0x18(%rbp),%rdi
    lea 0x3(%rbx),%rsi
    and $0xfffffffffffff000,%rsi
    cmp (%rdi),%rsi
    mov %rbx,%rsi
    jne 0x7fffec8a32ed <code_gen_buffer+707>
    add 0x18(%rdi),%rsi
    mov (%rsi),%ebx
    movq $0x0,0x98(%rbp)
    mov %rbx,0x90(%rbp)
    movl $0x10,0xa8(%rbp)
    test %rbx,%rbx
    jne 0x7fffec8a32cb <code_gen_buffer+673>
    data16 xchg %ax,%ax
    jmpq 0x7fffec8a32b4 <code_gen_buffer+650>
    movq $0xe066,0x80(%rbp)
    ...
That comparison followed by a conditional jump was a bit more involved.
Source instructions:
    xor %dx,%dx
    mov %dx,%ss
Translated instructions block:
    mov -0x10(%rbp),%ebx
    test %ebx,%ebx
    jl 0x7fffec8a3456 <code_gen_buffer+1068>
    mov 0x10(%rbp),%rbx
    xor %ecx,%ecx
    mov %cx,%bx
    mov %rbx,0x10(%rbp)
    movl $0x0,0xe8(%rbp)
    movq $0x0,0xf0(%rbp)
    movq $0xe06a,0x80(%rbp)
    ...
I won’t continue, but there are a few observations that I want to make:
- the registers used for the actual computation in the host are not necessarily the same as in the source instruction (e.g., DX in the source is CX in the last translation block);
- register state (like any other state) has to be saved after the translation block is executed (e.g., see how the RBX register value is moved to the memory pointed to by RBP+0x10); and
- the instruction that updates the PC (e.g., the move of 0xe06a to RBP+0x80) is a good hint for working out the number of source instructions executed within the translation block.
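The last observation can be checked with simple arithmetic: summing the byte lengths of the source instructions in each block should give the PC value stored at RBP+0x80. In this sketch the lengths are read off the gdb byte dump in the ‘CPU reset’ section, and the BLOCKS table is just an illustrative structure.

```python
START_IP = 0xE05B   # destination of the initial ljmp

# (source instruction, encoded length in bytes), grouped per translation block
BLOCKS = [
    [("cmpl $0x0,%cs:0x62e8", 7), ("jne 0x0000d08a", 4)],
    [("xor %dx,%dx", 2), ("mov %dx,%ss", 2)],
]

ip = START_IP
next_pcs = []
for block in BLOCKS:
    ip += sum(length for _, length in block)
    next_pcs.append(ip)   # fall-through PC once the block has run

for block, pc in zip(BLOCKS, next_pcs):
    names = "; ".join(name for name, _ in block)
    print(f"{names} -> next PC {pc:#x}")
```

The results, 0xe066 and 0xe06a, match the immediates written to RBP+0x80 in the two translated blocks above (for the first block, that is the path where the jne is not taken).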
Bonus point
Where is bios-256k.bin source code located?
It’s in the SeaBIOS repository. The BIOS image entry point (at offset 0x3fff0) is in the romlayout.S file.
Further reading:
very good article