ELF binaries generated by the GCC compiler may contain a special section named .gcc_except_table, which is also known as LSDA (Language Specific Data Area). Despite its name, it’s generated by GCC’s language-agnostic back-end and is therefore language independent. In this article we will briefly describe it and answer when it is generated, what it contains and how it is used.
The .gcc_except_table section is related to exceptions in the sense of try-catch-finally control-flow blocks. Part of the information there is for handling the exception and the rest is for cleanup code (e.g., calling object destructors when the stack is unwound).
In a nutshell, GCC language front-ends (such as C++) generate try-catch-finally nodes which are appended to the Abstract Syntax Tree (AST). These nodes are then transformed into back-end nodes and, after multiple passes, simplified into jumps and labels. Information about exception regions and landing pads is kept in annotations associated with each function.
At the end of the process, when the ELF binary is generated, there are two outputs: 1) executable code for the exception or cleanup handlers (.text section), and 2) read-only information in the .gcc_except_table, .eh_frame and .eh_frame_hdr sections. The runtime uses the latter to decide how to handle an exception event, including stack unwinding, cleanup and handler (landing pad) selection. We won’t focus on the .eh_frame and .eh_frame_hdr sections this time, only on .gcc_except_table.
Let’s start with a simple piece of C code that uses exception-handling extensions (*):
    int main(int argc, char** argv) {
        volatile int a;
        __try {
            a = argc + argc;
        } __catch {
            a = argc;
        }
        return a;
    }
Once compiled, we can have a look at the ELF sections in the binary:
    readelf -S main

      [15] .eh_frame_hdr     PROGBITS         0000000000400694  00000694
           0000000000000034  0000000000000000   A       0     0     4
      [16] .eh_frame         PROGBITS         00000000004006c8  000006c8
           000000000000011c  0000000000000000   A       0     0     8
      [17] .gcc_except_table PROGBITS         00000000004007e4  000007e4
           0000000000000018  0000000000000000   A       0     0     4
For the x86-64 architecture, this is the assembly code of the main function:
    00000000004005e2 <main>:
      4005e2:   55                      push   %rbp
      4005e3:   48 89 e5                mov    %rsp,%rbp
      4005e6:   89 7d ec                mov    %edi,-0x14(%rbp)
      4005e9:   48 89 75 e0             mov    %rsi,-0x20(%rbp)
      4005ed:   8b 45 ec                mov    -0x14(%rbp),%eax
      4005f0:   01 c0                   add    %eax,%eax
      4005f2:   89 45 fc                mov    %eax,-0x4(%rbp)
      4005f5:   8b 45 fc                mov    -0x4(%rbp),%eax
      4005f8:   eb 08                   jmp    400602 <main+0x20>
      4005fa:   8b 45 ec                mov    -0x14(%rbp),%eax
      4005fd:   89 45 fc                mov    %eax,-0x4(%rbp)
      400600:   eb f3                   jmp    4005f5 <main+0x13>
      400602:   5d                      pop    %rbp
      400603:   c3                      retq
The code starting at address 0x4005fa looks like dead code. If we had to guess, it could be the exception handler, which should not be part of the expected execution flow. It’s interesting that this code was not optimized away and removed from the binary. But how can we be sure of our guess? Where is the try region located?
A dump of the .gcc_except_table section will be the first step to answer the previous questions:
    readelf -x ".gcc_except_table" main

    Hex dump of section '.gcc_except_table':
      0x004007e4 ff031501 0c000e00 00100318 01130e00 ................
      0x004007f4 00010000 00000000                   ........
The raw bytes look quite cryptic, as this is binary information. Dumping the generated assembly code, before it is turned into a binary, may be more helpful:
    gcc -g -S -o main_asm main.c

        .section .gcc_except_table,"a",@progbits
        .align 4
    .LLSDA0:
        .byte 0xff                          -> lp_format -> DW_EH_PE_omit -> @LPStart format ("omit")
        .byte 0x3                           -> tt_format -> @TType format ("udata4")
        .uleb128 .LLSDATT0-.LLSDATTD0       -> 15 -> Offset to the end of the @TType table (21 bytes)
    .LLSDATTD0:
        .byte 0x1                           -> DW_EH_PE_uleb128 -> Region (call-site) offsets format ("uleb128")
        .uleb128 .LLSDACSE0-.LLSDACSB0      -> 0c -> Regions (call-site) table length (12 bytes)
    .LLSDACSB0:
        .uleb128 .LEHB0-.LFB0               -> 00 -> region 0 start = offset 0 (pre try-block)
        .uleb128 .LEHE0-.LEHB0              -> 0e -> region 0 length = 14 bytes
        .uleb128 0                          -> region 0 landing pad = 0
        .uleb128 0                          -> region 0 action = 0
        .uleb128 .LEHB1-.LFB0               -> 10 -> region 1 start = offset 16 (try-block)
        .uleb128 .LEHE1-.LEHB1              -> 03 -> region 1 length = 3 bytes
        .uleb128 .L5-.LFB0                  -> 18 -> region 1 landing pad = offset 24 to exception handler
        .uleb128 0x1                        -> region 1 action = 1
        .uleb128 .LEHB2-.LFB0               -> 13 -> region 2 start = offset 19 (post try-block)
        .uleb128 .LEHE2-.LEHB2              -> 0e -> region 2 length = 14 bytes
        .uleb128 0                          -> region 2 landing pad = 0
        .uleb128 0                          -> region 2 action = 0
    .LLSDACSE0:
        .byte 0x1                           -> Action Record Table - filter
        .byte 0                             -> Action Record Table - next
        .align 4                            -> 00
        .long 0                             -> 00000000 -> Types Table (@TType)
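The region offsets in this listing are relative to the start of the function, so they can be checked against the disassembly by adding them to the address of main (0x4005e2). A minimal sketch of that arithmetic in C follows; the struct and function names are made up for illustration and are not GCC definitions.

```c
#include <stdint.h>

/* One call-site record, with offsets relative to the function start.
   Hypothetical type for illustration; GCC does not define it. */
struct call_site {
    uint64_t start;        /* region start offset */
    uint64_t length;       /* region length in bytes */
    uint64_t landing_pad;  /* handler offset, 0 if there is none */
    uint64_t action;       /* 0 = nothing to catch */
};

/* The three regions copied from the annotated listing. */
static const struct call_site main_regions[3] = {
    { 0x00, 0x0e, 0x00, 0 },  /* pre try-block  */
    { 0x10, 0x03, 0x18, 1 },  /* try-block      */
    { 0x13, 0x0e, 0x00, 0 },  /* post try-block */
};

/* Turn a function-relative offset into an absolute address. */
static uint64_t absolute_address(uint64_t func_start, uint64_t offset)
{
    return func_start + offset;
}
```

With func_start = 0x4005e2, region 1 starts at 0x4005f2 and is 3 bytes long, which is exactly the `mov %eax,-0x4(%rbp)` instruction that stores into `a` inside the try block. Its landing pad resolves to 0x4005fa, the code that looked dead in the disassembly, so the guess was right.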
lp_format indicates the format of the landing pad pointers (exception handlers) in the section. These pointers are offsets relative to a base, named @LPStart. GCC always emits DW_EH_PE_omit here, which means that @LPStart is the start of the function. Further information from both the compiler and runtime sides can be found in output_one_function_exception_table (gcc/except.c) and parse_lsda_header (libstdc++-v3/libsupc++/eh_personality.cc) respectively.
tt_format indicates the DWARF encoding (e.g., DW_EH_PE_udata4) for type entries in the Types Table, named @TType. This table exists because exceptions are typed and handlers can filter by type. The cfun->eh->ttype_data vector (gcc/except.h) contains the types data in GCC, and functions such as dw2_asm_output_encoded_addr_rtx (gcc/dwarf2asm.c) can be used to generate the output.
After tt_format, there is an offset to the end of the @TType table. A NULL entry in the @TType table means ‘all types’. Other types may be represented with pointers to type information structures.
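To make the offset concrete, here is a small sketch in C that locates the end of the @TType table in the 24 bytes of the hex dump and reads the entry that a filter value of 1 selects (filters index the table backwards from its end, as described with the Action Record Table). The function names are hypothetical, and the code assumes what holds for this example only: a one-byte offset field and 4-byte udata4 entries.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The 24 bytes of .gcc_except_table taken from the hex dump. */
static const uint8_t lsda[24] = {
    0xff, 0x03, 0x15, 0x01, 0x0c, 0x00, 0x0e, 0x00,
    0x00, 0x10, 0x03, 0x18, 0x01, 0x13, 0x0e, 0x00,
    0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};

/* Offset of the end of the @TType table within the LSDA.
   Assumes the offset fits in a single uleb128 byte (true here: 0x15)
   and is counted from the byte that follows the offset field. */
static size_t ttype_end_offset(const uint8_t *data)
{
    size_t offset_field = 2;  /* right after lp_format and tt_format */
    return offset_field + 1 + data[offset_field];
}

/* Fetch the udata4 entry for a 1-based filter, counting backwards from
   the end of the table. The value is read in host byte order; a result
   of 0 stands for the NULL ("all types") entry. */
static uint32_t ttype_entry(const uint8_t *data, size_t filter)
{
    uint32_t value;
    memcpy(&value, data + ttype_end_offset(data) - 4 * filter, sizeof value);
    return value;
}
```

For the example, the offset field sits at byte 2 and holds 0x15 (21), so the table ends at byte 3 + 21 = 24, which is the end of the section. The entry for filter 1 is the last four bytes, all zero, so the handler catches every type.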
Next, a value indicates the format of the Call-site offsets (DW_EH_PE_uleb128 in this example). Call-sites are function regions subject to the same exception or cleanup behavior.
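ULEB128 is the variable-length unsigned integer encoding defined by DWARF. The following is a minimal decoder sketch in C to show how the bytes are interpreted; it is not the routine used by GCC's runtime, and it does no bounds checking.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one unsigned LEB128 value starting at p.
   Each byte carries 7 payload bits, least-significant group first;
   the top bit is set on every byte except the last one.
   Stores the value in *out and returns the number of bytes consumed.
   Sketch only: no bounds checking, and encodings wider than 64 bits
   are not handled. */
static size_t read_uleb128(const uint8_t *p, uint64_t *out)
{
    uint64_t result = 0;
    unsigned shift = 0;
    size_t n = 0;
    uint8_t byte;

    do {
        byte = p[n++];
        result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);

    *out = result;
    return n;
}
```

Every value in the example is below 0x80, so each one fits in a single byte. That is why the hex dump lines up one-to-one with the `.uleb128` directives in the listing.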
Then, a value indicates the size of the Call-sites Table. This table may be generated by either dw2_output_call_site_table or sjlj_output_call_site_table functions (gcc/except.c) in GCC.
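Putting the header fields described so far together, the following C sketch reads them in order. The struct and the function name are hypothetical (the real runtime routine is parse_lsda_header), and the sketch only handles the case stated above where lp_format is DW_EH_PE_omit.

```c
#include <stddef.h>
#include <stdint.h>

#define DW_EH_PE_omit 0xff

/* Header fields in the order they appear (hypothetical type). */
struct lsda_header {
    uint8_t  lp_format;
    uint8_t  tt_format;
    uint64_t ttype_offset;      /* meaningful only if tt_format != omit */
    uint8_t  call_site_format;
    uint64_t call_site_length;  /* size of the Call-sites Table in bytes */
    size_t   header_size;       /* the Call-sites Table starts here */
};

/* Minimal ULEB128 reader so the block compiles on its own
   (no bounds checking). Returns the number of bytes consumed. */
static size_t read_uleb128(const uint8_t *p, uint64_t *out)
{
    uint64_t result = 0;
    unsigned shift = 0;
    size_t n = 0;
    uint8_t byte;

    do {
        byte = p[n++];
        result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);

    *out = result;
    return n;
}

/* Returns 0 on success, -1 if lp_format is not DW_EH_PE_omit (an
   encoded @LPStart would follow, which this sketch does not decode). */
static int parse_header_sketch(const uint8_t *p, struct lsda_header *h)
{
    size_t pos = 0;

    h->lp_format = p[pos++];
    if (h->lp_format != DW_EH_PE_omit)
        return -1;

    h->tt_format = p[pos++];
    h->ttype_offset = 0;
    if (h->tt_format != DW_EH_PE_omit)
        pos += read_uleb128(p + pos, &h->ttype_offset);

    h->call_site_format = p[pos++];
    pos += read_uleb128(p + pos, &h->call_site_length);

    h->header_size = pos;
    return 0;
}
```

Fed the first five bytes of the hex dump (ff 03 15 01 0c), this yields tt_format 0x3, a @TType offset of 21, call-site format 0x1 and a table length of 12, with the Call-sites Table starting at byte 5.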
Each Call-sites Table entry has the following values: start offset, length, exception or cleanup handler offset (or 0 if there isn’t one) and action. Action is an offset + 1 to the Action Record Table or 0 if there are no exceptions to catch.
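As a sketch of how a runtime could use these entries, the C code below scans a Call-sites Table for the region that contains a given instruction offset. The names are hypothetical, and the code assumes all four fields are uleb128-encoded as in this example; the actual lookup in libstdc++ handles more encodings and details.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal ULEB128 reader so the block compiles on its own
   (no bounds checking). Returns the number of bytes consumed. */
static size_t read_uleb128(const uint8_t *p, uint64_t *out)
{
    uint64_t result = 0;
    unsigned shift = 0;
    size_t n = 0;
    uint8_t byte;

    do {
        byte = p[n++];
        result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);

    *out = result;
    return n;
}

/* Decoded Call-sites Table entry (hypothetical type). */
struct call_site {
    uint64_t start;
    uint64_t length;
    uint64_t landing_pad;  /* 0 if there is no handler */
    uint64_t action;       /* 0, or offset + 1 into the Action Record Table */
};

/* Look for the region containing ip_offset (relative to the function
   start) in a table of table_len bytes. Returns 1 and fills *found on
   a match, 0 if no region covers the offset. */
static int find_call_site(const uint8_t *table, size_t table_len,
                          uint64_t ip_offset, struct call_site *found)
{
    size_t pos = 0;

    while (pos < table_len) {
        struct call_site cs;
        pos += read_uleb128(table + pos, &cs.start);
        pos += read_uleb128(table + pos, &cs.length);
        pos += read_uleb128(table + pos, &cs.landing_pad);
        pos += read_uleb128(table + pos, &cs.action);

        if (ip_offset >= cs.start && ip_offset < cs.start + cs.length) {
            *found = cs;
            return 1;
        }
    }
    return 0;
}
```

With the 12 table bytes of the example, an offset of 0x11 falls in region 1 and returns landing pad 0x18 with action 1, while an offset of 0x05 falls in region 0, which has no landing pad.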
The Action Record Table comes right after the Call-sites Table. Each entry has a filter value and an offset in bytes to the next entry (0 is used to finish the chain). The filter value is a reverse index into the @TType table, starting at 1. A filter value of 0 indicates that there is a cleanup function to be executed, regardless of the exception type. Chains of entries represent all the types handled by a single landing pad (catch statement). Entries are added by the add_action_record function (gcc/except.c) in GCC.
Finally, after the @TType table, there may be an Exception Specification Table. This table has the information held in the cfun->eh->ehspec_data vector (gcc/except.c). Its purpose is to declare the exception types that the function is allowed to throw, directly or indirectly. Each entry is, thus, an index into the @TType table. See information on how C++ uses Dynamic Exception Specification here.
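The chain walk can be sketched in C as follows. The function names are hypothetical. The sketch assumes that both fields of a record are stored as signed LEB128 and that the "next" value is a displacement measured from the position of the "next" field itself, which is my reading of the personality routine in libstdc++; treat both as assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one signed LEB128 value; returns the bytes consumed.
   Sketch only: no bounds checking. */
static size_t read_sleb128(const uint8_t *p, int64_t *out)
{
    uint64_t result = 0;
    unsigned shift = 0;
    size_t n = 0;
    uint8_t byte;

    do {
        byte = p[n++];
        result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);

    /* Sign-extend when the last byte has its sign bit (0x40) set. */
    if (shift < 64 && (byte & 0x40))
        result |= ~(uint64_t)0 << shift;

    *out = (int64_t)result;
    return n;
}

/* Walk the chain that a call-site action value points to and collect
   up to max filter values. Returns the number of filters stored.
   Assumptions: action is offset + 1 into the table (0 = no chain), and
   a "next" of 0 ends the chain. */
static size_t collect_filters(const uint8_t *action_table, uint64_t action,
                              int64_t *filters, size_t max)
{
    size_t count = 0;
    const uint8_t *rec;

    if (action == 0)
        return 0;

    rec = action_table + (action - 1);
    while (count < max) {
        int64_t filter, next;
        const uint8_t *next_field = rec + read_sleb128(rec, &filter);

        read_sleb128(next_field, &next);
        filters[count++] = filter;
        if (next == 0)
            break;
        rec = next_field + next;
    }
    return count;
}
```

For the example, the table is the two bytes 01 00 and region 1 has action 1, so the chain holds a single record with filter 1, which selects the NULL ("all types") entry of the @TType table.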
See more information about the gcc_except_table section here. I also recommend reading this blogpost series related to exceptions in C++.
(*) Experimental feature I’m developing for GCC. Update 2020-05-15: see here.