Even though x86 memory segmentation is largely unused in 64-bit mode, the Linux kernel still initializes the CPU’s Global Descriptor Table (GDT) and points the gdt register to it. I was curious about its contents and how segment selectors can be used today. This article is a brief summary of my experiments and findings.
To begin with, we can read the gdt register in kernel space by calling native_store_gdt (arch/x86/include/asm/desc.h):
native_store_gdt(&gdt_data);

GDT address (%gdt): 0xffffffffff577000, size: 127
The register contains a virtual address rather than a physical one, which fits the x86 translation order: in protected mode, segmentation is applied before paging.
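Now that we know the GDT location and size, we can dump it from memory. The reported size of 127 is the table limit, that is, the offset of its last valid byte, so the table spans 128 bytes or 16 slots of 8 bytes. Each entry is described by struct desc_struct (arch/x86/include/asm/desc_defs.h).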
NULL entry (0)
{{{a = 0, b = 0}, {limit0 = 0, base0 = 0, base1 = 0, type = 0, s = 0, dpl = 0, p = 0, limit = 0, avl = 0, l = 0, d = 0, g = 0, base2 = 0}}}

GDT_ENTRY_KERNEL32_CS (1)
{limit0 = 65535, base0 = 0, base1 = 0, type = 11, s = 1, dpl = 0, p = 1, limit = 15, avl = 0, l = 0, d = 1, g = 1, base2 = 0}

GDT_ENTRY_KERNEL_CS (2)
{limit0 = 65535, base0 = 0, base1 = 0, type = 11, s = 1, dpl = 0, p = 1, limit = 15, avl = 0, l = 1, d = 0, g = 1, base2 = 0}

GDT_ENTRY_KERNEL_DS (3)
{limit0 = 65535, base0 = 0, base1 = 0, type = 3, s = 1, dpl = 0, p = 1, limit = 15, avl = 0, l = 0, d = 1, g = 1, base2 = 0}

GDT_ENTRY_DEFAULT_USER32_CS (4)
{limit0 = 65535, base0 = 0, base1 = 0, type = 11, s = 1, dpl = 3, p = 1, limit = 15, avl = 0, l = 0, d = 1, g = 1, base2 = 0}

GDT_ENTRY_DEFAULT_USER_DS (5)
{limit0 = 65535, base0 = 0, base1 = 0, type = 3, s = 1, dpl = 3, p = 1, limit = 15, avl = 0, l = 0, d = 1, g = 1, base2 = 0}

GDT_ENTRY_DEFAULT_USER_CS (6)
{limit0 = 65535, base0 = 0, base1 = 0, type = 11, s = 1, dpl = 3, p = 1, limit = 15, avl = 0, l = 1, d = 0, g = 1, base2 = 0}

Unused? (7)
{limit0 = 0, base0 = 0, base1 = 0, type = 0, s = 0, dpl = 0, p = 0, limit = 0, avl = 0, l = 0, d = 0, g = 0, base2 = 0}

GDT_ENTRY_TSS - 1st part (8)
{limit0 = 8303, base0 = 31872, base1 = 193, type = 11, s = 0, dpl = 0, p = 1, limit = 0, avl = 0, l = 0, d = 0, g = 0, base2 = 127}

GDT_ENTRY_TSS - 2nd part (9)
{limit0 = 34816, base0 = 65535, base1 = 0, type = 0, s = 0, dpl = 0, p = 0, limit = 0, avl = 0, l = 0, d = 0, g = 0, base2 = 0}

GDT_ENTRY_LDT - 1st part (10)
{limit0 = 0, base0 = 0, base1 = 0, type = 0, s = 0, dpl = 0, p = 0, limit = 0, avl = 0, l = 0, d = 0, g = 0, base2 = 0}

GDT_ENTRY_LDT - 2nd part (11)
{limit0 = 0, base0 = 0, base1 = 0, type = 0, s = 0, dpl = 0, p = 0, limit = 0, avl = 0, l = 0, d = 0, g = 0, base2 = 0}

GDT_ENTRY_TLS_MIN (12)
{limit0 = 0, base0 = 0, base1 = 0, type = 0, s = 0, dpl = 0, p = 0, limit = 0, avl = 0, l = 0, d = 0, g = 0, base2 = 0}

GDT_ENTRY_TLS (13)
{limit0 = 0, base0 = 0, base1 = 0, type = 0, s = 0, dpl = 0, p = 0, limit = 0, avl = 0, l = 0, d = 0, g = 0, base2 = 0}

GDT_ENTRY_TLS_MAX (14)
{limit0 = 0, base0 = 0, base1 = 0, type = 0, s = 0, dpl = 0, p = 0, limit = 0, avl = 0, l = 0, d = 0, g = 0, base2 = 0}

GDT_ENTRY_PER_CPU (15)
{limit0 = 0, base0 = 0, base1 = 0, type = 5, s = 1, dpl = 3, p = 1, limit = 0, avl = 0, l = 0, d = 1, g = 0, base2 = 0}

...
The first entries of the table describe a few flat memory segments with a base address of 0x0 and a size of 2^32 bytes. The size follows from the 20-bit limit value (0xFFFFF, split across limit0 and limit) and ‘page’ granularity (g = 1, so the limit counts 4 KiB units). I have not shown anything beyond entry 15 for space reasons, but I can tell you that what follows is all empty.
It’s worth noting that every entry is 8 bytes long (with a 4-byte base address) except for GDT_ENTRY_TSS and GDT_ENTRY_LDT, which are 16 bytes long (with an 8-byte base address). Of these two, GDT_ENTRY_TSS appears to be the only one holding any information. There is a selector register named tr pointing to this entry, which can be read by calling native_store_tr (arch/x86/include/asm/desc.h).
tr_register = native_store_tr();

TR (TSS) index to GDT: 0x0000000000000040
The tr register holds a segment selector, as any selector register does: bits 3 and up are an index into the GDT, so 0x40 refers to entry 8 (GDT_ENTRY_TSS). We can read that entry in the GDT, obtain the base address and finally read the TSS structure:
tss = (struct tss_struct*)(((unsigned long)(tss_entry_in_gdt->base3) << 32) | ((unsigned long)(tss_entry_in_gdt->base2) << 24) | ((unsigned long)(tss_entry_in_gdt->base1) << 16) | (unsigned long)(tss_entry_in_gdt->base0));

TSS entry address in GDT: 0xffff88007fc17c80
The structure that describes the TSS entry in the GDT is struct ldttss_desc64 (arch/x86/include/asm/desc.h). The structure that describes the TSS is struct tss_struct (arch/x86/include/asm/processor.h).
TSS stands for Task State Segment. Out of curiosity, I printed some of its values:
TSS sp0: 0xffffc90000534000
TSS sp1: 0x0
TSS sp2: 0x0
Current $rsp: 0xffffc90000533db8
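SP0 seems to contain the base of the current kernel stack, that is, its highest address, from which the stack grows downwards: $rsp is only 0x248 bytes below it.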
Going back to the GDT, I wondered how Thread-Local Storage (TLS) works if its entries (GDT_ENTRY_TLS_MIN to GDT_ENTRY_TLS_MAX) are empty. The fact that these entries have a 4-byte base address is an indication that they are not used in 64-bit mode; otherwise, TLS locations would be confined to the low 4 GiB of the virtual address space and would definitely not be suitable for kernel use.
In glibc’s source code, we see that the FS segment selector is not set directly with a mov instruction (which would indicate the index of a TLS entry in the GDT). However, there is a system call named arch_prctl, invoked with the ARCH_SET_FS code, which looks related.
Long story short, arch_prctl + ARCH_SET_FS sets neither the FS segment selector (which remains 0) nor the GDT entry (which remains empty), but a model-specific register, MSR_FS_BASE. The address passed to the system call is also kept in the task_struct.
current->thread.fsbase: 0x00007f9c59728b40
current->thread.fsindex: 0

%fs (fsindex): 0
MSR_FS_BASE: 0x00007f9c59728b40
We read the MSR_FS_BASE register in kernel space with:
rdmsrl(MSR_FS_BASE, msr_fs_base);
rdmsrl is located in arch/x86/include/asm/msr.h.
We read the FS segment selector in kernel space with:
savesegment(fs, fsindex);
savesegment is located in arch/x86/include/asm/segment.h.
Every memory access that uses the FS segment has the MSR_FS_BASE value added to its offset: the FS selector is not used as an index into the GDT, and the base address is not retrieved from there.
The GS selector works much like FS: user-space code can set its base with arch_prctl + ARCH_SET_GS, the segment selector register remains 0, and the address is written to MSR_GS_BASE and stored in task_struct. But there is a difference: the MSR_GS_BASE register holds that value only while user-space code is executing. As soon as execution moves to the kernel, the x86 swapgs instruction swaps its value with that of the MSR_KERNEL_GS_BASE register. The reason is that the kernel uses the GS base to point to its own per-CPU local storage area.
current->thread.gsbase: 0x00000000ffffffff
current->thread.gsindex: 0

%gs (gsindex): 0
MSR_GS_BASE: 0xffff88007fc00000
MSR_KERNEL_GS_BASE: 0x00000000ffffffff

---SWAPGS---

%gs (gsindex): 0
MSR_GS_BASE: 0x00000000ffffffff
MSR_KERNEL_GS_BASE: 0xffff88007fc00000
Finally, I wanted to know what happens if we write a GDT entry and load the FS segment selector with an index pointing to it, as in the old 32-bit x86 days.
Here is the kernel-side code for the experiment:
/* Selector = GDT index * 8 (index in bits 3 and up, TI = 0, RPL = 0). */
unsigned int new_fs_index = (GDT_ENTRY_TLS_MIN*8);
unsigned int tmp_memory_segment_address = 0xAABBCCDDU;
struct desc_struct tmp_gdt_entry = {0x0};
/* Fill in an 8-byte descriptor: 32-bit base, limit, and attribute bits. */
set_desc_base(&tmp_gdt_entry, (unsigned long)tmp_memory_segment_address);
set_desc_limit(&tmp_gdt_entry, PAGE_SIZE);
tmp_gdt_entry.type = 5; /* same type value as GDT_ENTRY_PER_CPU in the dump */
tmp_gdt_entry.dpl = 0;
tmp_gdt_entry.s = 1;
tmp_gdt_entry.p = 1;
tmp_gdt_entry.l = 1;
/* Write the descriptor into this CPU's GDT, then load the selector into FS. */
native_write_gdt_entry((struct desc_struct*)(get_cpu_gdt_rw(smp_processor_id())), GDT_ENTRY_TLS_MIN, (void*)&tmp_gdt_entry, 0x0);
loadsegment(fs, new_fs_index);
set_desc_base, set_desc_limit and native_write_gdt_entry are located in arch/x86/include/asm/desc.h, while loadsegment is in arch/x86/include/asm/segment.h.
This is the result:
%fs (fsindex) after set: 96
MSR_FS_BASE after set: 0x00000000aabbccdd
Immediately after the FS segment selector is set, its descriptor entry in the GDT is read and the MSR_FS_BASE register is set to that entry’s base address. The selector value 96 is GDT_ENTRY_TLS_MIN*8, that is, index 12. There is probably no reason to set MSR_FS_BASE this way and be unnecessarily limited to 32-bit addresses.