Virtual calls often carry a significant performance cost. Not only do memory accesses to vtables take CPU cycles and potentially pollute caches, but the savings from method inlining are also lost. Good engineering practice calls for using expensive resources only when needed. Even when programming languages offer syntactic hints that let the developer decide (e.g. virtual or final method declarations), it’s ultimately up to a good compiler to perform a thorough analysis and optimize.
In this series of short examples, we will see how the C1 just-in-time compiler in the HotSpot Java Virtual Machine (JVM) performs Class Hierarchy Analysis (CHA) and deals with virtual calls. For each case, we will discuss what should happen, observe what actually happened and work out an explanation.
All the examples are based on the following Java class hierarchy, in which interface I extends interface H, and classes A1 and A2 implement I:
This is the base template code:
    interface H {
        void m();
    }

    interface I extends H {
    }

    class A1 implements I {
        public void m() {
            System.out.println("A1::m");
        }
    }

    class A2 implements I {
        public void m() {
            System.out.println("A2::m");
        }
    }

    public final class Main {
        // JIT-compiled (C1).
        // Called from Main::main.
        // Will contain a virtual callsite to ::m in each example.
        public static void j<n>(...) throws Throwable {
            ...
        }

        public static void main(String[] args) throws Throwable {
            ...
        }
    }
A few notes before moving on:
- The template is compiled with:

      javac -sourcepath . -d . Main.java

- The template is run with:

      java -classpath . -Xcomp -XX:TieredStopAtLevel=1 -XX:CompileOnly=Main::j1,Main::j2,Main::j3,H,I,A1,A2 -XX:CompileCommand=print,Main::j1 -XX:CompileCommand=print,Main::j2 -XX:CompileCommand=print,Main::j3 Main

- For each case n, we will fill in the Main::j<n> method (for example, Main::j1) and the Main::main method.
- We refer to virtual calls in a broad sense that includes interface calls in Java.
- All this work is based on JDK-17 running on Fedora Linux x86_64.
Example 1
    public static void j1() throws Throwable {
        I i = new A1();
        i.m();
    }

    public static void main(String[] args) throws Throwable {
        // Load A1. A2 is not loaded.
        new A1();
        j1();
    }
Regardless of class A2, the only possible target for the virtual callsite in j1 is A1::m. Even if class A1 is subclassed at some point during execution with an additional implementation of m (A1 is not final and classes can be loaded dynamically in the JVM), instances of that subclass cannot reach the virtual callsite. Thus, a static binding to the target method should be safe in this scenario.
At the bytecode level, this is what j1 looks like:

    public static void j1() throws java.lang.Throwable;
      Code:
         0: new           #7                  // class A1
         3: dup
         4: invokespecial #9                  // Method A1."<init>":()V
         7: astore_0
         8: aload_0
         9: invokeinterface #10,  1           // InterfaceMethod I.m:()V
        14: return
The virtual call is the one at instruction #9: invokeinterface. We know a few things about the callsite target method: its name is m, its signature is ()V (so it takes no arguments and returns void) and it is either declared in or inherited by an interface named I.
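As a small illustration of the ()V notation, the descriptor of a method with no parameters and a void return type can be produced from plain Java with java.lang.invoke.MethodType. This is only a sketch; the class name DescriptorDemo and its helper are made up for the example.

```java
import java.lang.invoke.MethodType;

// Hypothetical demo class, not part of the article's template.
class DescriptorDemo {
    // Builds the JVM descriptor for "no parameters, returns void".
    static String voidNoArgsDescriptor() {
        return MethodType.methodType(void.class).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(voidNoArgsDescriptor()); // prints "()V"
    }
}
```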
To better understand the compiler’s decisions, it’s helpful to scrutinize some of the JVM internal structures involved. The Java classes and interfaces H, I and A1 are represented at the JVM level as instances of the C++ class InstanceKlass. Let’s start with the interface I and explore from there:
    (gdb) set $IClass = (InstanceKlass*)0x8000c1210
    (gdb) x/s $IClass->_name->_body
    "I"
    (gdb) print $IClass->_methods->_length
    0
    (gdb) print $IClass->_local_interfaces->_length
    1
    (gdb) x/s $IClass->_local_interfaces->_data[0]->_name->_body
    "H"
    (gdb) print $IClass->_local_interfaces->_data[0]->_methods->_length
    1
    (gdb) set $MMethod = $IClass->_local_interfaces->_data[0]->_methods->_data[0]
    (gdb) x/s (**((Symbol**)((intptr_t*)(((char*)(((Method*)$MMethod)->_constMethod->_constants)) + sizeof(ConstantPool)) + (((Method*)$MMethod)->_constMethod->_name_index))))._body
    "m"
What we see is that I does not have any methods. If we look into the interfaces of I, there is H, which contains the method H::m. We have not looked into the flags of H::m, but it’s obviously abstract because H does not provide an implementation. Two simple remarks: 1) I inherits but does not contain m (hence its _methods array is empty); and 2) an abstract method is still represented with an entry in the _methods array.
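The same shape can be observed from the Java side with reflection. The following is a minimal sketch that only mirrors the gdb exploration at the language level; it does not touch InstanceKlass itself, and the class InterfaceShapeDemo and its helpers are hypothetical names introduced for the example.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

interface H { void m(); }
interface I extends H { }

// Hypothetical demo class, not part of the article's template.
class InterfaceShapeDemo {
    // Number of methods declared directly by the type (inherited ones excluded).
    static int declaredMethodCount(Class<?> c) {
        return c.getDeclaredMethods().length;
    }

    // True if the type itself declares an abstract, parameterless method with this name.
    static boolean declaresAbstract(Class<?> c, String name) {
        try {
            Method m = c.getDeclaredMethod(name);
            return Modifier.isAbstract(m.getModifiers());
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(declaredMethodCount(I.class));         // 0
        System.out.println(I.class.getInterfaces()[0].getName()); // H
        System.out.println(declaredMethodCount(H.class));         // 1
        System.out.println(declaresAbstract(H.class, "m"));       // true
        System.out.println(declaresAbstract(I.class, "m"));       // false
    }
}
```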
Let’s look down the hierarchy now:
    (gdb) set $IImplementor = (*((InstanceKlass**)((char*)$IClass+sizeof(InstanceKlass)+($IClass->_vtable_len*sizeof(void*))+($IClass->_itable_len*sizeof(void*))+($IClass->_nonstatic_oop_map_size*sizeof(void*)))))
    (gdb) x/s $IImplementor->_name->_body
    "A1"
    (gdb) set $A1Class = $IImplementor
    (gdb) print $A1Class->_methods->_length
    2
    (gdb) set $A1Method0 = $A1Class->_methods->_data[0]
    (gdb) x/s (**((Symbol**)((intptr_t*)(((char*)(((Method*)$A1Method0)->_constMethod->_constants)) + sizeof(ConstantPool)) + (((Method*)$A1Method0)->_constMethod->_name_index))))._body
    "<init>"
    (gdb) set $A1Method1 = $A1Class->_methods->_data[1]
    (gdb) x/s (**((Symbol**)((intptr_t*)(((char*)(((Method*)$A1Method1)->_constMethod->_constants)) + sizeof(ConstantPool)) + (((Method*)$A1Method1)->_constMethod->_name_index))))._body
    "m"
I has only one implementor: A1. This is true because A2 has not been loaded in this example. A1 has two methods: <init> (the constructor) and A1::m. Notice that H::m and A1::m are different methods, the latter being a concrete one.
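A rough Java-level counterpart of this observation is sketched below. Note one difference: reflection reports constructors separately from methods, whereas the _methods array seen in gdb holds <init> alongside m. The class MethodsDemo and its helper are hypothetical names used only for this example.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

interface H { void m(); }
interface I extends H { }
class A1 implements I { public void m() { System.out.println("A1::m"); } }

// Hypothetical demo class, not part of the article's template.
class MethodsDemo {
    // Looks up a parameterless method declared directly by the given type.
    static Method declared(Class<?> c, String name) {
        try {
            return c.getDeclaredMethod(name);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Method hm = declared(H.class, "m");
        Method am = declared(A1.class, "m");
        System.out.println(A1.class.getDeclaredConstructors().length); // 1
        System.out.println(A1.class.getDeclaredMethods().length);      // 1
        System.out.println(hm.equals(am));                             // false
        System.out.println(Modifier.isAbstract(hm.getModifiers()));    // true
        System.out.println(Modifier.isAbstract(am.getModifiers()));    // false
    }
}
```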
What A1 has in its vtable, after the slots inherited from its superclass (java.lang.Object), is a pointer to A1::m:
    (gdb) print (*(Method**)(((char*)$A1Class)+sizeof(InstanceKlass)+sizeof(void*)*$A1Class->_super->_vtable_len)) == $A1Method1
    true
More interestingly, we can now look at A1’s interface vtables:
    (gdb) print $A1Class->_local_interfaces->_length
    1
    (gdb) print $A1Class->_local_interfaces->_data[0] == $IClass
    true
    (gdb) print $A1Class->_itable_len
    7
    (gdb) x/s (*(InstanceKlass**)(((char*)$A1Class)+sizeof(InstanceKlass)+$A1Class->_vtable_len*sizeof(void*)+sizeof(void*)*0))->_name->_body
    "H"
    (gdb) set $HVtableOffset = (*(int*)(((char*)$A1Class)+sizeof(InstanceKlass)+$A1Class->_vtable_len*sizeof(void*)+sizeof(void*)*1))
    (gdb) x/s (*(InstanceKlass**)(((char*)$A1Class)+sizeof(InstanceKlass)+$A1Class->_vtable_len*sizeof(void*)+sizeof(void*)*2))->_name->_body
    "I"
    (gdb) set $IVtableOffset = (*(int*)(((char*)$A1Class)+sizeof(InstanceKlass)+$A1Class->_vtable_len*sizeof(void*)+sizeof(void*)*3))
A1 has two interface vtables: one for H and one for I. The gdb variables $HVtableOffset and $IVtableOffset hold the offset to each of them, counting from the start of InstanceKlass. We expect the I interface vtable to be empty because I declares no methods. However, we can look at the first (and only) entry of the interface vtable that corresponds to H:
    (gdb) print (*(Method**)(((char*)$A1Class)+$HVtableOffset)) == $A1Method1
    true
In the interface vtable that corresponds to H, A1 has a pointer to A1::m.
In summary, when we pass an instance of A1 (in object-oriented terms, a receiver) to a callsite whose target is either H::m or I::m, the method A1::m will be obtained from A1’s interface vtables and invoked. If the callsite has an A1::m target, the A1 vtable is used to obtain the A1::m reference.
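The observable effect of this dispatch can be sketched in plain Java. In this illustration m returns a String instead of printing (a deliberate change from the template, made only so the result is easy to check), and DispatchDemo with its three helpers is a hypothetical class. The comments name the bytecode javac is expected to emit for each static receiver type.

```java
interface H { String m(); }
interface I extends H { }
class A1 implements I { public String m() { return "A1::m"; } }

// Hypothetical demo class, not part of the article's template.
class DispatchDemo {
    static String viaH(H h)   { return h.m(); } // invokeinterface H.m
    static String viaI(I i)   { return i.m(); } // invokeinterface I.m
    static String viaA1(A1 a) { return a.m(); } // invokevirtual A1.m

    public static void main(String[] args) {
        A1 receiver = new A1();
        // The same receiver reaches A1::m whichever callsite target is used.
        System.out.println(viaH(receiver));  // A1::m
        System.out.println(viaI(receiver));  // A1::m
        System.out.println(viaA1(receiver)); // A1::m
    }
}
```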
Now let’s look back at the invokeinterface callsite with target I::m in Main::j1. If we could guarantee that the only possible receiver there is of type A1, then we know that the method to be invoked will be A1::m (obtained from A1’s interface vtables).
At the C1 level, this is what the virtual callsite in j1 looks like:
    [Verified Entry Point]
      # {method} {0x7fb9f88002b8} 'j1' '()V' in 'Main'
      0x7fba05622c80: mov %eax,-0x16000(%rsp)
      ...
      0x7fba05622c8c: movabs $0x801000400,%rdx ; {metadata('A1')}
      ...
      0x7fba05622cbc: mov %rdx,%rcx
      ...
      0x7fba05622cbf: movabs $0x800000000,%r10
      0x7fba05622cc9: sub %r10,%rcx
      0x7fba05622ccc: mov %ecx,0x8(%rax)
      ...
      0x7fba05622cd8: jmpq 0x7fba05622db5
      ...
      ;; patch entry point
      0x7fba05622db5: callq 0x7fba05628a00 ; ImmutableOopMap {}
                                           ; *getstatic out
                                           ; {runtime_call
                                           ; load_mirror_patching}
      ...
In the first instructions we see how an instance of A1 is created: %rax points to the new instance and, at offset 0x8 (inside the object header), a compressed pointer to the InstanceKlass of A1 is written (0x801000400 - 0x800000000 = 0x1000400). The last instruction listed is a trampoline that calls a patcher at run time. After a couple of patcher calls, the code to obtain System.out will be written starting at 0x7fba05622cd8. The important thing for us is that this code belongs to A1::m. As predicted, the virtual callsite was statically bound to A1::m and then inlined.
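The compressed pointer arithmetic can be replayed in a few lines of Java. This is a sketch under the assumption that the encoding uses a shift of zero, which is what the listing suggests since it only shows a subtraction before the 32-bit store; the class CompressedKlassDemo and its constants are hypothetical names.

```java
// Hypothetical demo class, not part of the article's template.
class CompressedKlassDemo {
    // Values copied from the disassembly: A1 metadata address and the subtracted base.
    static final long KLASS_ADDRESS = 0x801000400L;
    static final long ENCODING_BASE = 0x800000000L;

    // Assumption: shift of zero, so the narrow value is simply address minus base.
    static int encode(long klassAddress, long base) {
        return (int) (klassAddress - base);
    }

    public static void main(String[] args) {
        int narrow = encode(KLASS_ADDRESS, ENCODING_BASE);
        System.out.println("0x" + Integer.toHexString(narrow)); // prints 0x1000400
    }
}
```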
C1 decided in GraphBuilder::invoke that the callsite was monomorphic and could, thus, be statically bound to a method. These are some of the initial context values when it is called:
- Bytecodes::Code code: _invokeinterface
- ciMethod* target: H::m
- ciKlass* holder: I
- ciInstanceKlass* klass: H
- ciInstanceKlass* calling_klass: Main
- ciInstanceKlass* callee_holder: I
- ciInstanceKlass* actual_recv: I
The first action is trying to determine the exact type of the receiver. The JVM keeps track of the instructions that pushed values to the stack, so it’s possible to trace how the receiver got there. This brings us to the new A1() instruction, represented by an instance of the C++ class NewInstance (a subclass of Instruction):
    (gdb) x/s ((NewInstance*)0x7fffa4033510)->_klass->_name->_symbol->_body
    "A1"
The class NewInstance overrides the exact_type method to return the class. The variable type points to A1 and it’s an exact type. receiver_klass is set to A1 and the exact target method is located by calling ciMethod::resolve_invoke with the following parameters: target = H::m, calling_klass = Main and receiver_klass = A1. I won’t dive into the details of ciMethod::resolve_invoke, but the aforementioned JVM structures should give an idea of why A1::m is returned (a method with name m and signature ()V is found in the _methods array of the receiver_klass, A1). Once we know that static binding is possible, code is updated to Bytecodes::_invokespecial and there is no virtual call from then on.
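To give a feel for why an exact receiver type is enough to pick a single target, here is a toy model written with Java reflection. It is not HotSpot's ciMethod::resolve_invoke: the helper resolve is a hypothetical simplification that walks from the receiver class up through its superclasses and returns the first concrete, parameterless method with the requested name, ignoring default methods, access checks and overloads.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

interface H { void m(); }
interface I extends H { }
class A1 implements I { public void m() { } }

// Hypothetical toy model, not HotSpot code.
class ResolveSketch {
    // Returns the first concrete parameterless method called `name`, searching the
    // exact receiver class and then its superclasses; null if none is found.
    static Method resolve(Class<?> receiver, String name) {
        for (Class<?> c = receiver; c != null; c = c.getSuperclass()) {
            for (Method m : c.getDeclaredMethods()) {
                if (m.getName().equals(name)
                        && m.getParameterCount() == 0
                        && !Modifier.isAbstract(m.getModifiers())) {
                    return m;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Method target = resolve(A1.class, "m");
        System.out.println(target.getDeclaringClass().getName() + "::" + target.getName()); // A1::m
    }
}
```

With the receiver known to be exactly A1, the lookup yields one method and nothing is left to decide at run time, which is the property that lets the compiler drop the virtual dispatch.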