Class Hierarchical Analysis (CHA) examples in C1 (Hotspot JVM) – Part #3

Example 3

This article is a continuation of Part #2. The scenario discussed now is similar to the previous one. The only difference is that class A2 is also loaded:

public static void j2(I i) throws Throwable {
    i.m();
}

public static void main(String[] args) throws Throwable {
    new A2();
    j2(new A1());
}

With class A2 loaded, the virtual callsite in Main::j2 may receive an instance of it at any time. We therefore expect no static binding to be applied.

This is what C1’s Main::j2 assembly looks like at the I::m callsite:

[Verified Entry Point]
  # {method} {0x00007f5da08002d0} 'j2' '(LI;)V' in 'Main'
0x7f5db4e0b7c0: mov    %eax,-0x16000(%rsp)
...
0x7f5db4e0b7cd: movabs $0xffffffffffffffff,%rax
0x7f5db4e0b7d7: callq 0x7f5db4eabf20 ; ImmutableOopMap {}
                                     ; *invokeinterface m
                                     ; {virtual_call}
...
0x7f5db4e0b7ee: retq

The assumption seems right: the called method is neither statically bound nor inlined. However, instead of a virtual (itable-based) call to a method, we see a direct call into the JVM. Let’s set a breakpoint there and find out what is going on.

Once the breakpoint is hit, we step into the call. The callee is a trampoline that saves the native execution context (register values) and calls SharedRuntime::resolve_virtual_call_C:

0x7f5db4eabf20: push   %rbp
0x7f5db4eabf21: mov    %rsp,%rbp
0x7f5db4eabf24: pushfq 
0x7f5db4eabf25: sub    $0x8,%rsp
0x7f5db4eabf29: sub    $0x80,%rsp
0x7f5db4eabf30: mov    %rax,0x78(%rsp)
0x7f5db4eabf35: mov    %rcx,0x70(%rsp)
...
0x7f5db4eabf96: callq 0x7ffff6cdd4fa <SharedRuntime::resolve_virtual_call_C(JavaThread*)>
0x7f5db4eabf9b: movq   $0x0,0x2d8(%r15)

It turns out that the callsite is not yet linked to a method, so the symbolic information in the classfile has to be resolved. This is what the I::m reference looks like in the classfile’s Constant Pool:

With some formatting:

At run time, this information is available as a table that follows the regular fields of a ConstantPool object. Let’s look at entries 7 to 10:

(gdb) set $cp = current->_callee_target->_constMethod->_constants
(gdb) x/1xg (intptr_t*)((((char*)$cp)+sizeof(ConstantPool)))+7
0x7fffbbc000b0: 0x0000000000090008
(gdb) x/1xg (intptr_t*)((((char*)$cp)+sizeof(ConstantPool)))+8
0x7fffbbc000b8: 0x00000000000a0001
(gdb) x/1xg (intptr_t*)((((char*)$cp)+sizeof(ConstantPool)))+9
0x7fffbbc000c0: 0x000000000006000b
(gdb) x/1xg (intptr_t*)((((char*)$cp)+sizeof(ConstantPool)))+10
0x7fffbbc000c8: 0x00000008004b22f0
(gdb) x/s ((Symbol*)(*((intptr_t*)((((char*)$cp)+sizeof(ConstantPool)))+10)))->_body
0x8004b22f6:    "I"

There are some differences between the information in the classfile and at run time. In the gdb output above, the run-time entry #7 (InterfaceMethodref) holds the indexes of the method holder class (#8) and of the NameAndType (#9) entries, but not the 0x0B tag, and it is 8 bytes long. Entries are always 8 bytes long, and an auxiliary _tags array in the ConstantPool instance holds the entry types. A given index refers to the same entry in both the table and the _tags array. Entries that are UTF-8 strings are represented by a pointer to a Symbol object.

I won’t describe every difference, but I want to comment on class constant entries. As seen above, entry #8 for class I holds the UTF-8 reference 0x000a (entry #10, the "I" Symbol) but also a 0x0001 value in its lower bytes. This value is an index into the ConstantPool’s _resolved_klasses auxiliary array, which holds the Klass* of resolved classes. The C++ class that describes class constant entries in the run-time ConstantPool is CPKlassSlot. While the class is unresolved, its tag in the _tags array is JVM_CONSTANT_UnresolvedClass (0x64) instead of JVM_CONSTANT_Class as in the classfile’s Constant Pool; the tag is updated after resolution.
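To make the two encodings concrete, here is a small Java sketch that decodes entry #7 in its classfile form and entries #7 and #8 in their run-time form. It is a sketch under assumptions: the classfile bytes for #7 (tag 0x0B followed by two big-endian u2 indexes, as in the JVM specification) are reconstructed from the indexes seen at run time, and the bit layout of the run-time slots is inferred from the gdb values above rather than taken from HotSpot’s sources. CpEntryDecoder and its methods are hypothetical names.

```java
// Hypothetical helper: decodes a constant pool entry as stored in the classfile and at run time.
// Assumption: the run-time layout (two 16-bit halves in the low bits of an 8-byte slot)
// is inferred from the gdb output, not copied from HotSpot's sources.
class CpEntryDecoder {

    /** Classfile CONSTANT_InterfaceMethodref_info: u1 tag, u2 class_index, u2 name_and_type_index. */
    static int[] parseClassfileEntry(byte[] b) {
        int tag = b[0] & 0xFF;
        int classIndex = ((b[1] & 0xFF) << 8) | (b[2] & 0xFF);       // big-endian u2
        int nameAndTypeIndex = ((b[3] & 0xFF) << 8) | (b[4] & 0xFF); // big-endian u2
        return new int[] {tag, classIndex, nameAndTypeIndex};
    }

    /** Lowest 16 bits of a run-time slot. */
    static int low16(long slot) {
        return (int) (slot & 0xFFFF);
    }

    /** Next 16 bits (bits 16..31) of a run-time slot. */
    static int high16(long slot) {
        return (int) ((slot >>> 16) & 0xFFFF);
    }

    public static void main(String[] args) {
        // Entry #7 as it would appear in the classfile (reconstructed from the run-time indexes).
        int[] cf = parseClassfileEntry(new byte[] {0x0B, 0x00, 0x08, 0x00, 0x09});
        System.out.printf("classfile #7: tag=0x%02X class=#%d nameAndType=#%d%n", cf[0], cf[1], cf[2]);

        // Run-time slots copied from the gdb session. No tag here: it lives in the _tags array.
        long entry7 = 0x0000000000090008L;
        long entry8 = 0x00000000000a0001L;
        System.out.printf("run time #7: class=#%d nameAndType=#%d%n", low16(entry7), high16(entry7));
        // CPKlassSlot: UTF-8 name index in the upper half, _resolved_klasses index in the lower half.
        System.out.printf("run time #8: name=#%d resolvedKlassIndex=%d%n", high16(entry8), low16(entry8));
    }
}
```

Both forms yield the same class (#8) and NameAndType (#9) indexes; only the tag is missing from the run-time slot.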

InterfaceMethodref entries have to be resolved just like classes, and the same is true for other field, method and handle references. Instead of an auxiliary array such as _resolved_klasses, a structure called ConstantPoolCache holds the resolved information. This structure exists to avoid entering the runtime and thus improve performance: these entries are accessed frequently, and we have already seen how entering the runtime requires saving the whole context. The ConstantPoolCache associated with the ConstantPool in our example has a table (after its regular fields) with 5 entries:

(gdb) set $cpc = $cp->_cache
(gdb) print *((ConstantPoolCacheEntry*)(((char*)$cpc)+sizeof(ConstantPoolCache))+0)
$37 = {_indices = 1, _f1 = 0x0, _f2 = 0, _flags = 0}
(gdb) print *((ConstantPoolCacheEntry*)(((char*)$cpc)+sizeof(ConstantPoolCache))+1)
$38 = {_indices = 7, _f1 = 0x0, _f2 = 0, _flags = 0}
(gdb) print *((ConstantPoolCacheEntry*)(((char*)$cpc)+sizeof(ConstantPoolCache))+2)
$39 = {_indices = 11993102, _f1 = 0x7fffbbc00970, _f2 = 0, _flags = -1879048191}
(gdb) print *((ConstantPoolCacheEntry*)(((char*)$cpc)+sizeof(ConstantPoolCache))+3)
$40 = {_indices = 11993105, _f1 = 0x7fffbbc00dd0, _f2 = 0, _flags = -1879048191}
(gdb) print *((ConstantPoolCacheEntry*)(((char*)$cpc)+sizeof(ConstantPoolCache))+4)
$41 = {_indices = 12058642, _f1 = 0x7fffbbc002d0, _f2 = 0, _flags = -1874853887}

The _indices field holds the index of the corresponding ConstantPool entry in its lower 16 bits. Thus, the five cache entries above point to ConstantPool entries 1 (Object::<init> Methodref), 7 (I::m InterfaceMethodref), 14 (A1::<init> Methodref), 17 (A2::<init> Methodref) and 18 (Main::j2 Methodref).
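The decoding of _indices can be checked with a few lines of Java. The sketch below takes the five values printed by gdb and extracts the ConstantPool index from the lower 16 bits, as described above. It also reads bits 16 to 23, which, to my understanding of the ConstantPoolCacheEntry layout in the JDK versions that still have this class, hold the first resolved bytecode (0 while the entry is unresolved); treat that part as an assumption. CpCacheIndices is a hypothetical name.

```java
// Hypothetical helper decoding the ConstantPoolCacheEntry::_indices values from the gdb session.
// Bits 0-15: ConstantPool index (as described in the article).
// Bits 16-23: assumed to hold the first resolved bytecode, 0 while unresolved.
class CpCacheIndices {

    static int cpIndex(int indices) {
        return indices & 0xFFFF;
    }

    static int bytecode1(int indices) {
        return (indices >>> 16) & 0xFF;
    }

    static String name(int bc) {
        switch (bc) {
            case 0x00: return "(unresolved)";
            case 0xB7: return "invokespecial";
            case 0xB8: return "invokestatic";
            case 0xB9: return "invokeinterface";
            default:   return String.format("0x%02X", bc);
        }
    }

    public static void main(String[] args) {
        int[] observed = {1, 7, 11993102, 11993105, 12058642}; // _indices of cache entries 0..4
        for (int i = 0; i < observed.length; i++) {
            System.out.printf("cache[%d] -> cp #%d, bytecode %s%n",
                    i, cpIndex(observed[i]), name(bytecode1(observed[i])));
        }
    }
}
```

If that assumption holds, the output shows the A1 and A2 constructor entries resolved for invokespecial (0xB7) and Main::j2 for invokestatic (0xB8), while the I::m entry (cp #7) carries no bytecode yet, which fits the unresolved callsite we hit earlier.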

How does this work? The bytecode instructions in a classfile may contain references to the ConstantPool. This is what the I::m invocation in particular looks like:

With some formatting:

As read from the file, the reference in the bytecode instruction points to the ConstantPool entry. However, we’ve just said that the run-time information is held in a ConstantPoolCache entry, which has a different index. It turns out that bytecode instructions were patched (rewritten) after being loaded into memory, so references to the ConstantPool were replaced with references to the ConstantPoolCache where applicable. This is what Main::j2’s bytecodes, including the invocation of I::m, look like in memory:

(gdb) x/7xb (char*)(current->_callee_target->_constMethod)+sizeof(ConstMethod)
0x7fffbbc002c0: 0x2a    0xb9    0x01    0x00    0x01    0x00    0xb1

Notice how the original 0x0007 reference was changed to 0x0001 (now stored in little-endian byte order, hence 0x01 0x00), while the instructions aload_0 (0x2a), invokeinterface (0xb9) and return (0xb1) remained the same.
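The before-and-after reading of the invokeinterface operand can be sketched in Java. Classfile operands are big-endian per the JVM specification, while the rewritten cache index is read here in little-endian order, matching x86-64 and the 0x01 0x00 bytes dumped above. The original bytes are reconstructed from the text (reference 0x0007, count 1), and InvokeinterfaceOperand is a hypothetical name.

```java
// Hypothetical helper reading the invokeinterface operand of Main::j2 before and after rewriting.
class InvokeinterfaceOperand {
    static final int INVOKEINTERFACE = 0xB9;

    /** Classfile operands are big-endian (JVM specification). */
    static int readBigEndianU2(byte[] code, int pos) {
        return ((code[pos] & 0xFF) << 8) | (code[pos + 1] & 0xFF);
    }

    /** The rewritten cache index, read in native (little-endian on x86-64) order. */
    static int readLittleEndianU2(byte[] code, int pos) {
        return (code[pos] & 0xFF) | ((code[pos + 1] & 0xFF) << 8);
    }

    public static void main(String[] args) {
        // aload_0; invokeinterface #7, count=1, 0; return  (reconstructed classfile bytes)
        byte[] original  = {0x2a, (byte) 0xb9, 0x00, 0x07, 0x01, 0x00, (byte) 0xb1};
        // The same method after rewriting, as dumped from memory with gdb.
        byte[] rewritten = {0x2a, (byte) 0xb9, 0x01, 0x00, 0x01, 0x00, (byte) 0xb1};
        if ((original[1] & 0xFF) != INVOKEINTERFACE) {
            throw new IllegalStateException("expected invokeinterface at offset 1");
        }
        System.out.println("ConstantPool index:      " + readBigEndianU2(original, 2));
        System.out.println("ConstantPoolCache index: " + readLittleEndianU2(rewritten, 2));
    }
}
```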

The JVM resolves callsites lazily, both to speed up initialization and to avoid paying the cost for callsites that execution never reaches. As seen above, when a compiled method has unresolved callsites, a call to a resolution stub is left in their place. We should now expect a full method resolution, perhaps an update of the ConstantPoolCache entry for I::m, and the patching of the callsite in Main::j2’s assembly.
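The idea of lazy resolution can be illustrated with a toy Java model: a callsite starts out pointing at a resolution stub, and the first call resolves the target and patches the callsite so that later calls skip the slow path. This is only an analogy; HotSpot patches machine code rather than swapping a Java field, and LazyCallSite and its members are hypothetical names.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Toy model of lazy callsite resolution (hypothetical; not HotSpot code).
class LazyCallSite<R, T> {
    private final Supplier<Function<R, T>> resolver; // symbolic reference -> concrete target
    private Function<R, T> target;                   // what the "call instruction" points to
    int resolutions = 0;                             // how many times the slow path ran

    LazyCallSite(Supplier<Function<R, T>> resolver) {
        this.resolver = resolver;
        this.target = this::resolveAndPatch;         // unresolved: point at the "resolution stub"
    }

    private T resolveAndPatch(R receiver) {
        resolutions++;
        Function<R, T> resolved = resolver.get();    // expensive lookup, done only once
        target = resolved;                           // patch the callsite
        return resolved.apply(receiver);             // complete the call that triggered resolution
    }

    T call(R receiver) {
        return target.apply(receiver);
    }

    public static void main(String[] args) {
        LazyCallSite<String, Integer> site = new LazyCallSite<String, Integer>(() -> String::length);
        System.out.println(site.resolutions);   // nothing resolved before the first call
        System.out.println(site.call("abc"));   // resolves, patches, then calls
        System.out.println(site.call("hello")); // goes straight to the patched target
        System.out.println(site.resolutions);   // the slow path ran once
    }
}
```

If a callsite is never executed, its resolver never runs, which is the cost the JVM avoids by resolving lazily.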

I’ll leave callsite resolution for a future article, but let’s see what C1’s Main::j2 looks like right after it completes:

0x7f5db4e0b7c0: mov    %eax,-0x16000(%rsp)
...
0x7f5db4e0b7cd: movabs $0x801000400,%rax
0x7f5db4e0b7d7: callq  0x7fffe5601300
...
0x7f5db4e0b7ee: retq

There are two things to notice: 1) the value 0x801000400 is loaded into rax (instead of 0xffffffffffffffff), and 2) the call now goes to 0x7fffe5601300.

1st observation:

(gdb) x/s ((InstanceKlass*)0x801000400)->_name->_body
0x7ffff0184ee6: "A1"

The reference to class A1 means that callsite resolution was not based purely on symbolic linking information (which would have led to finding m’s position in H’s itable) but also took the current receiver’s type into account.

2nd observation:

0x7fffe5601300: mov    0x8(%rsi),%r10d
0x7fffe5601304: movabs $0x800000000,%r11
0x7fffe560130e: add    %r11,%r10
0x7fffe5601311: cmp    %rax,%r10
0x7fffe5601314: jne    0x7fffe56a1920
...
0x7fffe5601320: mov    %eax,-0x16000(%rsp)
...

Based on HotSpot’s Java calling convention on x86-64, we know that the receiver object is in the rsi register. The code reads the compressed class pointer from the object header (offset 8) and decodes it into an InstanceKlass* by adding the base 0x800000000. If the receiver’s class is A1, the method (which we presume to be A1::m) is executed; otherwise, there is a jump to 0x7fffe56a1920. The important point here is that j2 accounts for the possibility of a receiver at the I::m callsite whose type is not A1. This is evidence that there is no static binding.
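The three instructions at 0x7fffe5601300 amount to simple arithmetic, sketched below in Java: the 32-bit value at offset 8 of the header is zero-extended and added to the base 0x800000000. The shift of 0 is an assumption drawn from the absence of a shift instruction in the dump, and the narrow value 0x01000400 for A1 is derived as 0x801000400 minus the base rather than read from memory. NarrowKlass is a hypothetical name.

```java
// Hypothetical helper mirroring the class-pointer decode at 0x7fffe5601300.
class NarrowKlass {
    static final long BASE = 0x800000000L; // movabs $0x800000000,%r11
    static final int SHIFT = 0;            // assumed: no shift instruction appears in the dump

    static long decode(int narrowKlass, long base, int shift) {
        long zeroExtended = narrowKlass & 0xFFFFFFFFL; // mov 0x8(%rsi),%r10d zero-extends
        return base + (zeroExtended << shift);          // add %r11,%r10
    }

    public static void main(String[] args) {
        int a1Narrow = 0x01000400;    // derived: 0x801000400 - 0x800000000
        long expected = 0x801000400L; // value C1 loaded into rax (A1's InstanceKlass*)
        long decoded = decode(a1Narrow, BASE, SHIFT);
        System.out.println(Long.toHexString(decoded));
        System.out.println(decoded == expected); // equal: the cmp/jne pair would not jump
    }
}
```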

The code at 0x7fffe56a1920 leads to a SharedRuntime::handle_wrong_method_ic_miss(JavaThread*) runtime call. What we have seen is an optimization called an inline cache. The callsite is optimistically linked to a concrete method, but with safeguards in place: the rax register carries the receiver type for which the chosen method is correct, and this expectation is validated against the actual receiver at run time. Notice that rax is used so as not to interfere with the method arguments, which are held in the registers dictated by the calling convention. Methods have an unverified entry point that performs this check before the real (verified) one. This is faster than going through the ConstantPoolCache entry and the itable.
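The inline cache logic can be modelled in Java as a sketch. The callsite remembers one expected receiver class (the role rax plays above) and one target (the callq destination); a matching receiver goes straight to the target, and a mismatch takes a slow path. Everything here is hypothetical: on a miss, this sketch simply relinks to the new receiver class, as in the classic monomorphic scheme, whereas HotSpot’s miss handler may move the callsite to a different dispatch strategy. The first call also counts as a miss, standing in for the initial resolution.

```java
import java.util.function.Function;

// Toy monomorphic inline cache. InlineCache and its members are hypothetical names.
class InlineCache<T> {
    private final Function<Class<?>, Function<Object, T>> resolver; // slow path: resolve for a class
    private Class<?> expectedClass;     // like rax: the receiver class the target was chosen for
    private Function<Object, T> target; // like the callq destination
    int misses = 0;

    InlineCache(Function<Class<?>, Function<Object, T>> resolver) {
        this.resolver = resolver;
    }

    T invoke(Object receiver) {
        Class<?> actual = receiver.getClass(); // like reading and decoding the receiver's klass
        if (actual != expectedClass) {         // like cmp %rax,%r10 / jne <miss handler>
            misses++;
            expectedClass = actual;            // relink the cache to the new receiver class
            target = resolver.apply(actual);
        }
        return target.apply(receiver);         // then run the target, like the verified entry point
    }

    public static void main(String[] args) {
        InlineCache<String> site =
                new InlineCache<String>(cls -> r -> cls.getSimpleName() + ":" + r);
        System.out.println(site.invoke("x"));  // first call: miss, links to String
        System.out.println(site.invoke("y"));  // hit
        System.out.println(site.invoke(42));   // miss, relinks to Integer
        System.out.println("misses=" + site.misses);
    }
}
```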

Finally, let’s look at the CHA decision in GraphBuilder::invoke. These are some initial context values when called:

Bytecodes::Code code: _invokeinterface
ciMethod* target: H::m
ciKlass* holder: I
ciInstanceKlass* klass: H
ciInstanceKlass* calling_klass: Main
ciInstanceKlass* callee_holder: I
ciInstanceKlass* actual_recv: I

The values above are exactly the same as in Example 1 and Example 2. What happens next is identical to Example 2 up to the call to declared_interface->unique_implementor(). This time around, I’s implementor field points to I itself, indicating that there is more than one implementor:

(gdb) set $IClass = (InstanceKlass*)0x801000c28
(gdb) set $IImplementor = (*((InstanceKlass**)((char*)$IClass+sizeof(InstanceKlass)+($IClass->_vtable_len*sizeof(void*))+($IClass->_itable_len*sizeof(void*))+($IClass->_nonstatic_oop_map_size*sizeof(void*)))))
(gdb) print/x $IImplementor
$1 = 0x801000c28

Thus, singleton is null here, cha_monomorphic_target is null as well, and the conditions for static binding are not met.
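Why loading A2 leaves CHA without a singleton can be sketched with a toy Java model of the implementor field. Only the convention that the field points to the interface itself when there is not a single implementor comes from the text above; using null for "no implementor yet" and the update rule in addImplementor are my assumptions. KlassModel and its methods are hypothetical stand-ins for HotSpot’s classes.

```java
// Toy model of an interface's implementor field (hypothetical names and update rule).
class KlassModel {
    final String name;
    KlassModel implementor; // assumed: null = none, this = more than one, else the unique implementor

    KlassModel(String name) {
        this.name = name;
    }

    /** Records a newly loaded implementing class. */
    void addImplementor(KlassModel k) {
        if (implementor == null) {
            implementor = k;    // first implementor
        } else if (implementor != k) {
            implementor = this; // a second distinct implementor: mark as "more than one"
        }
    }

    /** Returns the single implementor, or null when there are none or several. */
    KlassModel uniqueImplementor() {
        return implementor == this ? null : implementor;
    }

    public static void main(String[] args) {
        KlassModel i = new KlassModel("I");
        i.addImplementor(new KlassModel("A1"));
        System.out.println(i.uniqueImplementor().name); // a unique implementor exists
        i.addImplementor(new KlassModel("A2"));
        System.out.println(i.uniqueImplementor());      // null: no singleton for CHA
    }
}
```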

Full series of related articles:

Part #1
Part #2
Part #3