Example 2
This article is a continuation of Part #1. We will discuss the following scenario now:
public static void j2(I i) throws Throwable {
    i.m();
}

public static void main(String[] args) throws Throwable {
    j2(new A1());
}
At first sight, this case resembles Example 1, the only difference being that the instance of A1 is created outside the JIT-compiled method (Main::j2). This subtle difference has a strong implication, though. It’s no longer possible to guarantee that an instance of A1 is the only one to reach the virtual callsite in Main::j2. In principle, any class dynamically loaded into the JVM that implements the I interface can be instantiated and passed to Main::j2. That said, A1 is currently the only implementor of I and A1::m has not been overridden so far. We should be able to establish a conditional static binding to A1::m and, whenever a class implementing I or a subclass of A1 that overrides m is loaded, Main::j2 has to be either recompiled without the binding or switched back to bytecode interpretation.
Deoptimizing a method earlier than strictly needed would hurt performance unnecessarily. While the previous approach may sound reasonable, we may wonder why loading a class should be enough to trigger the rollback if, perhaps, no invocation ever passes the new type. The answer is Java reflection and MethodHandles, which make invocation-graph analysis difficult. Furthermore, with only class A1 in existence, the type of the object received in Main::j2 still has to be checked at run time because an instance of any type can be spuriously passed. We will discuss this case later.
We can think of intermediate ideas, such as deoptimizing only after a run-time check detects that the receiver is of a type different from the one assumed. Inline caching is another powerful mechanism in which, instead of deoptimizing the compiled method, a per-virtual-callsite cache of types and resolved methods is built. I leave to the reader the exercise of analyzing how inlining would play out with inline caching. Every decision involves a trade-off and there are always elements to weigh: 1) how much are we penalizing the fast path of the most common scenario? 2) what is the real gain and for which cases? 3) how complex is the solution to implement? and 4) is it worth the gain?
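To make the inline caching idea more concrete, here is a minimal sketch of a monomorphic per-callsite cache. It is only an illustration in plain Java, not how HotSpot implements inline caches: the CallSite class, the A2 type and the resolve helper are hypothetical names introduced for this example.

```java
import java.util.function.Function;

// Hypothetical model of a monomorphic inline cache for one virtual call site.
final class InlineCacheSketch {
    interface I { String m(); }
    static final class A1 implements I { public String m() { return "A1.m"; } }
    static final class A2 implements I { public String m() { return "A2.m"; } }

    static final class CallSite {
        private Class<?> cachedType;               // receiver type seen on the last miss
        private Function<I, String> cachedTarget;  // target resolved for cachedType
        int misses;                                // number of slow-path resolutions

        String invoke(I receiver) {
            Class<?> type = receiver.getClass();
            if (type != cachedType) {
                // Slow path: resolve the target and remember it for this type.
                misses++;
                cachedTarget = resolve(type);
                cachedType = type;
            }
            // Fast path: one type comparison, then a call to the cached target.
            return cachedTarget.apply(receiver);
        }

        // Stand-in for the VM's method resolution (assumption: two known types).
        private static Function<I, String> resolve(Class<?> type) {
            if (type == A1.class) return r -> ((A1) r).m();
            if (type == A2.class) return r -> ((A2) r).m();
            return I::m; // generic interface dispatch for anything else
        }
    }

    public static void main(String[] args) {
        CallSite site = new CallSite();
        System.out.println(site.invoke(new A1())); // miss, cache is filled
        System.out.println(site.invoke(new A1())); // hit
        System.out.println(site.invoke(new A2())); // miss, cache is replaced
        System.out.println("misses = " + site.misses);
    }
}
```

In this sketch a receiver of an unexpected type only costs a re-resolution at that callsite, whereas the dependency-based approach throws away the whole compiled method.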
This is how C1 finally compiled Main::j2:
[Verified Entry Point]
  # {method} {0x00007faaa54002b0} 'j2' '(LI;)V' in 'Main'
  0x7faaad625fa0: mov %eax,-0x16000(%rsp)
  ...
  0x7faaad625fb6: movabs $0x801000400,%rax   ; {metadata('A1')}
  0x7faaad625fc0: mov 0x8(%rsi),%ebx
  0x7faaad625fc3: movabs $0x800000000,%r10
  0x7faaad625fcd: add %r10,%rbx
  0x7faaad625fd0: cmp 0x40(%rbx),%rax
  ...
  0x7faaad625fe8: jmpq 0x7faaad6260ed
  ...
  ;; patch entry point
  0x7faaad6260ed: callq 0x7faaad62bd80       ; ImmutableOopMap {}
                                             ; *getstatic out
                                             ; {runtime_call
                                             ; load_mirror_patching}
  ...
Fairly early in the method we see a cmp 0x40(%rbx),%rax comparison. The %rax value is a pointer to A1 (Klass*). The %rbx value, on the other hand, comes from the received object’s header and, after decompression, points to its class (Klass*). Offset 0x40 from a Klass (on x86-64) is the address of _primary_supers[1].
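The address arithmetic of the listing can be replayed with plain integers. The sketch below only reuses the constants visible in the disassembly. The narrow class value 0x01000400 is an assumption: it is the value the header of an A1 instance would need to hold for the decoded pointer to match the A1 metadata address.

```java
// Replays the address arithmetic of the listing with plain longs.
final class KlassDecodeSketch {
    static final long NARROW_KLASS_BASE = 0x800000000L; // movabs $0x800000000,%r10
    static final long A1_KLASS = 0x801000400L;          // movabs $0x801000400,%rax
    static final long CHECKED_OFFSET = 0x40L;           // cmp 0x40(%rbx),%rax

    // mov 0x8(%rsi),%ebx zero-extends the 32-bit header field;
    // add %r10,%rbx then adds the base to obtain the full Klass*.
    static long decode(int narrowKlass) {
        return NARROW_KLASS_BASE + Integer.toUnsignedLong(narrowKlass);
    }

    // Address whose contents are compared against %rax.
    static long comparedAddress(int narrowKlass) {
        return decode(narrowKlass) + CHECKED_OFFSET;
    }

    public static void main(String[] args) {
        int narrow = 0x01000400; // hypothetical header value of an A1 instance
        System.out.println(Long.toHexString(decode(narrow)));          // 801000400
        System.out.println(Long.toHexString(comparedAddress(narrow))); // 801000440
    }
}
```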
The _primary_supers array is a first-level cache of super classes. If the number of super classes of a class does not exceed a threshold (meaning that the class is not too deep in the hierarchy), its _primary_supers will be identical to its parent’s plus a self-reference entry at the end. Otherwise, its parent’s _primary_supers is replicated as-is.
Class A1 has only one super class (java.lang.Object) which, in turn, is at the top of the hierarchy. As a result, A1’s _primary_supers array will contain the following values: _primary_supers[0] = java.lang.Object, _primary_supers[1] = A1 (self-reference entry), _primary_supers[2] = 0x0, _primary_supers[...] = 0x0. There is an interesting observation here: any subclass of A1 must necessarily have a pointer to A1 at _primary_supers[1] (offset 0x40), no matter how deep in the hierarchy it is. In other words, not having a pointer to A1 at _primary_supers[1] means being neither A1 nor a subclass of A1. Thus, this check ensures that the received instance is a subtype of A1 or A1 itself, and it was added by C1’s LIR_Assembler::emit_typecheck_helper method. The Klass field _super_check_offset points to the self-reference entry in _primary_supers and helps C1 decide the offset.
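The constant-time check described above can be modeled in a few lines of Java. This is a simplified sketch, not HotSpot code: the Klass class here is a toy, the limit of 8 entries is an assumption about the threshold, and the depth field merely plays the role of _super_check_offset (an index instead of a byte offset).

```java
// Toy model of the _primary_supers array and the single-compare subtype check.
final class PrimarySupersSketch {
    static final int PRIMARY_SUPER_LIMIT = 8; // assumed threshold for this sketch

    static final class Klass {
        final String name;
        final int depth; // index of the self-reference entry (role of _super_check_offset)
        final Klass[] primarySupers = new Klass[PRIMARY_SUPER_LIMIT];

        Klass(String name, Klass parent) {
            this.name = name;
            this.depth = (parent == null) ? 0 : parent.depth + 1;
            if (parent != null) {
                // Start from a copy of the parent's array...
                System.arraycopy(parent.primarySupers, 0, primarySupers, 0, PRIMARY_SUPER_LIMIT);
            }
            if (depth < PRIMARY_SUPER_LIMIT) {
                // ...and append the self-reference entry when there is room.
                primarySupers[depth] = this;
            }
        }
    }

    // One load and one compare, independent of how deep the receiver's class is.
    static boolean isSubtypeOf(Klass receiver, Klass expected) {
        if (expected.depth >= PRIMARY_SUPER_LIMIT) {
            throw new UnsupportedOperationException("deep classes are not modeled here");
        }
        return receiver.primarySupers[expected.depth] == expected;
    }

    public static void main(String[] args) {
        Klass object = new Klass("java.lang.Object", null);
        Klass a1 = new Klass("A1", object);
        Klass deep = new Klass("C", new Klass("B", a1)); // A1 <- B <- C
        Klass other = new Klass("Other", object);

        System.out.println(isSubtypeOf(a1, a1));     // true
        System.out.println(isSubtypeOf(deep, a1));   // true
        System.out.println(isSubtypeOf(other, a1));  // false
        System.out.println(isSubtypeOf(object, a1)); // false
    }
}
```

The last case mirrors the unsafe cast scenario: a plain java.lang.Object has an empty slot at index 1, so the comparison against A1 fails.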
If the check succeeds, the rest of the method looks the same as Main::j1 in Example 1. As anticipated, A1::m was statically bound and inlined in Main::j2. If the check fails, execution goes to a throw_incompatible_class_change_error runtime call. It’s crucial to ensure that the received instance is of a valid type. In this case, since I has only one implementor, the valid type is A1. How is it possible to spuriously pass an object of an invalid type to Main::j2? MethodHandles can do the trick with an unsafe cast:
// Requires java.lang.invoke.MethodHandle and java.lang.invoke.MethodHandles.
private static I unsafeCast(Object obj) throws Throwable {
    MethodHandle mh = MethodHandles.identity(Object.class);
    mh = MethodHandles.explicitCastArguments(mh, mh.type().changeReturnType(I.class));
    return (I)mh.invokeExact((Object) obj);
}

public static void main(String[] args) throws Throwable {
    j2(unsafeCast(new Object()));
}
The console output for this case is:
Exception in thread "main" java.lang.IncompatibleClassChangeError
    at Main.j2(Main.java:76)
    at Main.main(Main.java:96)
If the check were not in place in the previous example, memory safety would have been compromised and the JVM could have crashed. Think how an instance of java.lang.Object would have illegally landed in a method where A1::m was inlined, and been used as if it were an instance of A1 or a subtype of it.
At the C1 level, the compilation of the virtual callsite in Main::j2 occurs in GraphBuilder::invoke. These are some initial context values when it is called:
Bytecodes::Code code: _invokeinterface
ciMethod* target: H::m
ciKlass* holder: I
ciInstanceKlass* klass: H
ciInstanceKlass* calling_klass: Main
ciInstanceKlass* callee_holder: I
ciInstanceKlass* actual_recv: I
The values above are exactly the same as in Example 1. The next step is to determine the exact type of the receiver (if any) here. Contrary to Example 1, the instruction that pushed the receiver to the top of the stack is not a NewInstance but a Local. Local represents values from method parameters, and uses its parent’s Instruction::exact_type. There will be an exact type only if the parameter’s declared type (I) is an exact class. While I is a class (in the sense of being an instance of the C++ InstanceKlass or its ciInstanceKlass wrapper), interfaces do not meet the requirement in ciInstanceKlass::exact_klass. Thus, we don’t have an exact type in this case.
The condition here is not met either because the declared type (I) is an interface. Execution then moves to this point. What is relevant for us is the call to target->find_monomorphic_target(...). These are the parameter values passed:
ciMethod* target: H::m
ciInstanceKlass* calling_klass: Main
ciInstanceKlass* declared_interface: I
ciInstanceKlass* singleton: A1
The returned cha_monomorphic_target value is A1::m. This is expected because A1::m is the only possible concrete target for the callsite. In order to move forward with this target, a conditional clause (dependency) has to be recorded here: I must have A1 as its unique implementor. In addition, a second conditional clause is recorded here to detect the dynamic loading of a subclass of A1 that redefines m. After these conditions are recorded, code is updated to a Bytecodes::_invokespecial value here and the call is no longer treated as virtual from that point onwards.
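The interplay between the monomorphic target lookup and the two recorded dependencies can be sketched with a toy model. Everything below is hypothetical and heavily simplified: ToyVm, ToyClass and CompiledJ2 are names made up for this example, and real HotSpot dependencies are checked by the VM during class loading rather than by a loop like this one.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of class hierarchy analysis (CHA) with recorded dependencies.
final class ChaDependencySketch {
    static final class ToyClass {
        final String name;
        final ToyClass parent;   // null for a root class
        final boolean declaresI; // lists I in its "implements" clause
        final boolean definesM;  // declares its own m()

        ToyClass(String name, ToyClass parent, boolean declaresI, boolean definesM) {
            this.name = name;
            this.parent = parent;
            this.declaresI = declaresI;
            this.definesM = definesM;
        }

        boolean isStrictSubclassOf(ToyClass other) {
            for (ToyClass c = parent; c != null; c = c.parent) {
                if (c == other) return true;
            }
            return false;
        }
    }

    static final class CompiledJ2 {
        final ToyClass boundReceiver; // class whose m() was statically bound
        boolean valid = true;         // false once a dependency is violated
        CompiledJ2(ToyClass boundReceiver) { this.boundReceiver = boundReceiver; }
    }

    static final class ToyVm {
        private final List<ToyClass> loaded = new ArrayList<>();
        private final List<CompiledJ2> dependents = new ArrayList<>();

        // Returns the single loaded class declaring I, or null if there is not exactly one.
        ToyClass uniqueImplementor() {
            ToyClass found = null;
            for (ToyClass c : loaded) {
                if (c.declaresI) {
                    if (found != null) return null;
                    found = c;
                }
            }
            return found;
        }

        // Binds i.m() statically when there is one possible target, and records
        // the compiled code so that later class loads can invalidate it.
        CompiledJ2 compileJ2() {
            ToyClass single = uniqueImplementor();
            if (single == null || !single.definesM) return null; // keep the interface call
            for (ToyClass c : loaded) {
                if (c.definesM && c.isStrictSubclassOf(single)) return null;
            }
            CompiledJ2 code = new CompiledJ2(single);
            dependents.add(code);
            return code;
        }

        // Class loading re-checks both dependencies of every dependent method.
        void load(ToyClass c) {
            loaded.add(c);
            for (CompiledJ2 code : dependents) {
                boolean secondImplementor = c.declaresI && c != code.boundReceiver;
                boolean overridingSubclass = c.definesM && c.isStrictSubclassOf(code.boundReceiver);
                if (secondImplementor || overridingSubclass) code.valid = false;
            }
        }
    }

    public static void main(String[] args) {
        ToyVm vm = new ToyVm();
        ToyClass a1 = new ToyClass("A1", null, true, true);
        vm.load(a1);
        CompiledJ2 code = vm.compileJ2();
        System.out.println(code.boundReceiver.name + " bound, valid = " + code.valid);
        vm.load(new ToyClass("B1", a1, false, false)); // subclass that inherits m
        System.out.println("after B1: valid = " + code.valid);
        vm.load(new ToyClass("B2", a1, false, true));  // subclass that overrides m
        System.out.println("after B2: valid = " + code.valid);
    }
}
```

Loading B1 leaves the binding intact because A1::m remains the only target, while loading B2 (or a second implementor of I) marks the compiled method as invalid.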
Full series of related articles: