Quarkslab folks proposed a challenge: modify any compiler to generate code with some obfuscation but valid semantics. In other words, add some noise but keep the program logic valid.
Time to have some fun with C1, OpenJDK’s first-tier JIT compiler.
Obfuscation “noise” can be added at different stages of the compilation process. Adding it early may be useful to target multiple architectures, but then it’s necessary to check whether later optimization passes undo the effect.
The basic flow for C1 is the following: Java bytecodes -> IR (Intermediate Representation) -> LIR (Low-level Intermediate Representation) -> assembly code (architecture specific). A few key functions involved in these transformations: GraphBuilder::iterate_bytecodes_for_block (for IR generation), LIRGenerator::block_do (for LIR generation) and LIR_Assembler::emit_lir_list (for assembly generation).
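To make the early-versus-late trade-off concrete, here is a minimal toy sketch in Java. It does not model C1's real IR or any of its passes: the string-based "IR", the `add 0` noise and the `simplify` pass are all hypothetical, chosen only to show how noise inserted early can be silently removed by a later simplification step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EarlyNoiseDemo {
    // Hypothetical early-stage noise: an addition of zero before every real add.
    static List<String> addNoise(List<String> ir) {
        List<String> out = new ArrayList<>();
        for (String op : ir) {
            if (op.startsWith("add ")) {
                out.add("add 0");
            }
            out.add(op);
        }
        return out;
    }

    // Toy peephole pass: drops additions of zero, as a simplifier might.
    static List<String> simplify(List<String> ir) {
        List<String> out = new ArrayList<>();
        for (String op : ir) {
            if (!op.equals("add 0")) {
                out.add(op);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ir = Arrays.asList("load a", "add 1234", "store a");
        List<String> noisy = addNoise(ir);
        System.out.println("with noise:     " + noisy);
        System.out.println("after simplify: " + simplify(noisy));
    }
}
```

In this toy the simplifier returns exactly the original instruction list, which is the failure mode to watch for. Noise that the compiler cannot prove useless, or noise emitted after the last optimization pass, avoids it.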
This is my basic pseudo-random Java test code (don’t even try to make sense of it!):
    public class Main {
        private static volatile int a;
        private static String s;

        public static void main(String[] args) {
            System.out.println("Jit example started");
            int max_iterations = 15000;
            while(max_iterations-- > 0) {
                f();
            }
            System.out.println("a (final): " + a);
            System.out.println("s (final): " + s);
        }

        private static void f() {
            int i = 100;
            while (i-- > 0) {
                a = a + 1234;
            }
            s = new Integer(a).toString();
        }
    }
Function Main::f is called 15,000 times in a loop, so it becomes hot and is compiled by C1 first.
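Since the whole point is to keep the semantics valid, it helps to know what the test program must print no matter how the code is compiled. The following sketch replays the arithmetic of Main with plain int math; the class and method names are hypothetical and are not part of the original test or patch.

```java
public class ExpectedResult {
    // Replays what Main does to the field a: `calls` invocations of f(),
    // each adding `step` to a exactly `iterationsPerCall` times.
    static int expectedA(int calls, int iterationsPerCall, int step) {
        int a = 0;
        for (int c = 0; c < calls; c++) {
            for (int i = 0; i < iterationsPerCall; i++) {
                a += step;
            }
        }
        return a;
    }

    public static void main(String[] args) {
        // 15000 * 100 * 1234 = 1851000000, which still fits in an int
        System.out.println("a (final): " + expectedA(15000, 100, 1234));
    }
}
```

If the obfuscated build prints anything other than 1851000000 for both a and s, the noise has broken the program. On a stock HotSpot build, running the test with -XX:+PrintCompilation should also list Main::f among the compiled methods.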
This is Main::f in bytecodes:
    private static void f();
      Code:
         0: bipush        100
         2: istore_0
         3: iload_0
         4: iinc          0, -1
         7: ifle          23
        10: getstatic     #6    // Field a:I
        13: sipush        1234
        16: iadd
        17: putstatic     #6    // Field a:I
        20: goto          3
        23: new           #10   // class java/lang/Integer
        26: dup
        27: getstatic     #6    // Field a:I
        30: invokespecial #11   // Method java/lang/Integer."<init>":(I)V
        33: invokevirtual #12   // Method java/lang/Integer.toString:()Ljava/lang/String;
        36: putstatic     #8    // Field s:Ljava/lang/String;
        39: return
My obfuscation noise will be inserted before every addition operation; in this case, before the iadd instruction.
This will be my x86_64 innocuous noise:
    PUSHFQ
    PUSH RAX
    PUSH RDX
    tryAgain:
    RDTSC
    CMP EAX, EDX
    JE tryAgain
    POP RDX
    POP RAX
    POPFQ
I save RFLAGS, RAX and RDX first because the noise clobbers them: RDTSC writes the timestamp into EDX:EAX and CMP updates the flags. Restoring the original values afterwards is mandatory to keep the program state intact. In between, I read the timestamp with RDTSC and compare its lower and upper 32-bit halves. They will almost always be different. In the unlikely case that they are equal, we try again. Then we restore the original values and continue execution as if nothing happened.
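The control flow of that loop can be mimicked in Java as a rough analogy. This is only a sketch under stated assumptions: System.nanoTime() stands in for RDTSC (it is not the timestamp counter), and the class name and the injectable LongSupplier clock are hypothetical, added so the retry path can be exercised deterministically.

```java
import java.util.function.LongSupplier;

public class TimestampNoise {
    // Reads a 64-bit timestamp until its low and high 32-bit halves differ,
    // mirroring the RDTSC / CMP EAX, EDX / JE tryAgain loop.
    // Returns how many reads were needed.
    static int spinUntilHalvesDiffer(LongSupplier clock) {
        int reads = 0;
        long t;
        do {
            t = clock.getAsLong();
            reads++;
        } while ((int) t == (int) (t >>> 32));
        return reads;
    }

    public static void main(String[] args) {
        int a = 42; // stands for the program state the noise must not disturb
        int reads = spinUntilHalvesDiffer(System::nanoTime);
        System.out.println("reads: " + reads + ", a is still " + a);
    }
}
```

The loop only repeats when both halves happen to hold the same 32-bit value, so in practice it falls through on the first read and the surrounding computation sees no difference.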
On x86_64, this is the JIT-compiled Main::f method. The noise sits right before add $0x4d2,%esi (0x4d2 is 1234):
    0x7f03f570c0c5: cmp $0x0,%esi
    0x7f03f570c0c8: je 0x7f03f570c281
    0x7f03f570c0ce: mov $0x64,%edx
    0x7f03f570c0d3: jmpq 0x7f03f570c12f
    0x7f03f570c0d8: movabs $0x6d9a02658,%rdx
    0x7f03f570c0e2: mov 0x74(%rdx),%esi
    0x7f03f570c0e5: pushfq
    0x7f03f570c0e6: push %rax
    0x7f03f570c0e7: push %rdx
    0x7f03f570c0e8: rdtsc
    0x7f03f570c0ea: cmp %edx,%eax
    0x7f03f570c0ec: je 0x7f03f570c0e8
    0x7f03f570c0ee: pop %rdx
    0x7f03f570c0ef: pop %rax
    0x7f03f570c0f0: popfq
    0x7f03f570c0f1: add $0x4d2,%esi
    0x7f03f570c0f7: mov %esi,0x74(%rdx)
    0x7f03f570c0fa: lock addl $0x0,-0x40(%rsp)
    0x7f03f570c100: movabs $0x7f03d5cfcc60,%rdx
    0x7f03f570c10a: mov 0x24(%rdx),%esi
    0x7f03f570c10d: add $0x8,%esi
    0x7f03f570c110: mov %esi,0x24(%rdx)
I then decided to add some unaligned jumps as an anti-disassembly technique. Main::f now looks a bit more obscure:
    0x7f4b64c52865: push %rax
    0x7f4b64c52866: lea 0x5(%rip),%rax # 0x7f4b64c52872
    0x7f4b64c5286d: jmpq *%rax
    0x7f4b64c5286f: callq 0x7f4b66b0d32f
    0x7f4b64c52874: callq 0x7f4b95d47b15
    0x7f4b64c52879: jmp 0x7f4b64c5287c
    0x7f4b64c5287b: callq 0x7f4b5c39eabb
    0x7f4b64c52880: jmp 0x7f4b64c52883
    0x7f4b64c52882: callq 0x7f4ae61dc5e1
    0x7f4b64c52887: (bad)
    0x7f4b64c52888: rolb %cl,(%rax,%rax,1)
    0x7f4b64c5288b: add %cl,-0x7c0f8b8e(%rcx)
    0x7f4b64c52891: rex.R and $0xc0,%al
    0x7f4b64c52894: add %cl,-0x46(%rax)
    0x7f4b64c52897: (bad)
    0x7f4b64c52898: fmul %st(7),%st
    0x7f4b64c5289a: xor 0x7f(%rbx),%ecx
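The listing above shows why this confuses a disassembler: the jmp at 0x7f4b64c52879 targets 0x7f4b64c5287c, one byte into what the disassembler decodes as a callq at 0x7f4b64c5287b. A toy linear-sweep decoder in Java reproduces the effect. This is a sketch only: the byte sequence is hypothetical (not taken from the patch) and the decoder knows just five real x86 opcodes.

```java
import java.util.ArrayList;
import java.util.List;

public class LinearSweepDemo {
    // Hypothetical byte sequence:
    //   EB 01        jmp +1 (skips the next byte)
    //   E8           junk byte that looks like the start of a 5-byte call
    //   50 90 58 90  push %rax; nop; pop %rax; nop (the code that really runs)
    static final int[] CODE = {0xEB, 0x01, 0xE8, 0x50, 0x90, 0x58, 0x90};

    static String name(int opcode) {
        switch (opcode) {
            case 0xE8: return "call";
            case 0xEB: return "jmp";
            case 0x50: return "push";
            case 0x58: return "pop";
            case 0x90: return "nop";
            default:   return "(bad)";
        }
    }

    // Instruction length in bytes for the tiny subset handled here.
    static int length(int opcode) {
        switch (opcode) {
            case 0xE8: return 5; // call rel32
            case 0xEB: return 2; // jmp rel8
            default:   return 1;
        }
    }

    // Decodes instructions back to back from `start`, never following jumps.
    static List<String> sweep(int[] code, int start) {
        List<String> out = new ArrayList<>();
        int pc = start;
        while (pc < code.length) {
            int op = code[pc];
            out.add(pc + ":" + name(op));
            pc += length(op);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("linear sweep:     " + sweep(CODE, 0)); // [0:jmp, 2:call]
        System.out.println("from jump target: " + sweep(CODE, 3)); // [3:push, 4:nop, 5:pop, 6:nop]
    }
}
```

The sweep from offset 0 swallows the real instructions as the operand of a call that never executes, while decoding from the jump target (offset 3) recovers them. A disassembler that follows control flow instead of sweeping linearly is less easily fooled by this pattern.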
Download patch here (based on JDK rev 89111a0e6355).