Class Hierarchy Analysis (CHA) examples in C1 (HotSpot JVM) – Part #2

Example 2


This article is a continuation of Part #1. We will discuss the following scenario now:

At first sight, this case resembles Example 1; the only difference is that the instance of A1 is created outside the JIT-compiled method (Main::j2). This subtle difference has a strong implication, though. It’s no longer possible to guarantee that an instance of A1 is the only one to reach the virtual callsite in Main::j2. In principle, any class dynamically loaded into the JVM that implements the I interface can be instantiated and passed to Main::j2. That said, A1 is currently the only implementor of I, and A1::m has not been overridden so far. We should therefore be able to establish a conditional static binding to A1::m: whenever a class implementing I, or a subclass of A1 that overrides m, is loaded, Main::j2 has to be either recompiled without the binding or switched back to bytecode interpretation.
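A minimal sketch of a program matching this description could look as follows. Only the names I, A1, m, Main and j2 come from the article; the wrapper class, the int return type and the loop count are assumptions made to keep the sketch self-contained and to give the JIT a reason to compile j2.

```java
// Hypothetical reconstruction of the scenario, not the original listing.
class Example2Scenario {
    interface I {
        int m(); // return type is an assumption
    }

    static class A1 implements I {
        @Override
        public int m() {
            return 1;
        }
    }

    static class Main {
        // Virtual (interface) callsite: the static type of the receiver is I.
        static int j2(I i) {
            return i.m();
        }
    }

    public static void main(String[] args) {
        // The receiver is created outside j2, so j2 only knows it as an I.
        I receiver = new A1();
        int sum = 0;
        // Call j2 repeatedly so that it becomes hot enough to be JIT-compiled.
        for (int k = 0; k < 20_000; k++) {
            sum += Main.j2(receiver);
        }
        System.out.println(sum);
    }
}
```

Inside j2 the compiler sees a parameter of type I rather than a freshly allocated A1, which is the whole difference with respect to Example 1.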

Deoptimizing a method earlier than strictly needed would hurt performance unnecessarily. While the previous approach sounds reasonable, we may wonder why merely loading a class should be enough to trigger the rollback if, perhaps, no invocation ever passes the new type. The answer is Java reflection and MethodHandles, which make invocation-graph analysis difficult. Furthermore, even with A1 as the only class in existence, the type of the object received in Main::j2 still has to be checked at run time because an instance of any type can be spuriously passed. We will discuss this case later.
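To see why reflection gets in the way of invocation-graph analysis, consider the sketch below. The class A2 and the helper callJ2Reflectively are hypothetical names introduced for illustration. The receiver class is chosen by a string at run time and j2 is invoked through java.lang.reflect.Method, so no static call edge reveals that an A2 can reach the callsite.

```java
import java.lang.reflect.Method;

class ReflectiveCallSketch {
    public interface I { int m(); }

    public static class A1 implements I {
        public int m() { return 1; }
    }

    // Hypothetical second implementor: nothing names it at compile time.
    public static class A2 implements I {
        public int m() { return 2; }
    }

    public static int j2(I i) {
        return i.m();
    }

    // The class name could come from configuration, the network, etc.
    static int callJ2Reflectively(String className) {
        try {
            Class<?> klass = Class.forName(className);
            Object receiver = klass.getDeclaredConstructor().newInstance();
            Method j2 = ReflectiveCallSketch.class.getDeclaredMethod("j2", I.class);
            return (Integer) j2.invoke(null, receiver);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String name = ReflectiveCallSketch.class.getName() + "$A2";
        System.out.println(callJ2Reflectively(name));
    }
}
```

Since the compiler cannot know which strings will show up, reacting to the class load itself is the conservative choice.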

We can think of ideas in between, such as deoptimizing only after a run-time check detects that the receiver is of a type different from the one assumed. Inline caching is another powerful mechanism in which, instead of deoptimizing the compiled method, a per-virtual-callsite cache of types and resolved methods is built. I leave to the reader the exercise of analyzing how inlining would play out with inline caching. Every decision involves a trade-off and there are always elements to weigh: 1) how much are we penalizing the fast path of the most common scenario? 2) what is the real gain and for which cases? 3) how complex is the solution to implement? and 4) is it worth the gain?
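The inline caching idea can be modelled in plain Java, as a rough sketch. This is a toy model and not how HotSpot implements inline caches: the CallSite class, its misses counter and the resolve helper are all hypothetical, and a real VM caches code addresses rather than lambdas.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

class InlineCacheSketch {
    interface I { int m(); }
    static class A1 implements I { public int m() { return 1; } }
    static class A2 implements I { public int m() { return 2; } }

    /** One cache per callsite: receiver class -> resolved target. */
    static class CallSite {
        private final Map<Class<?>, ToIntFunction<I>> cache = new HashMap<>();
        int misses = 0;

        int invoke(I receiver) {
            Class<?> type = receiver.getClass();
            ToIntFunction<I> target = cache.get(type);
            if (target == null) {
                // Slow path: resolve once per receiver type, then remember it.
                misses++;
                target = resolve(type);
                cache.put(type, target);
            }
            // Fast path: call the cached target directly.
            return target.applyAsInt(receiver);
        }

        // Stand-in for method resolution; a VM would find the concrete
        // method of 'type' here instead of returning a generic reference.
        private static ToIntFunction<I> resolve(Class<?> type) {
            return I::m;
        }
    }

    public static void main(String[] args) {
        CallSite site = new CallSite();
        System.out.println(site.invoke(new A1()));
        System.out.println(site.invoke(new A2()));
        System.out.println(site.invoke(new A1()));
        System.out.println("misses: " + site.misses);
    }
}
```

A new receiver type costs one slow-path resolution at that callsite, but the compiled method itself stays valid.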

This is how C1 finally compiled Main::j2:

Fairly early in the method we see a cmp 0x40(%rbx),%rax comparison. The %rax value is a pointer to A1 (Klass*). The %rbx value, in turn, comes from the received object’s header and, after decompression, points to its class (Klass*). Offset 0x40 from the start of a Klass (on x86-64) is where _primary_supers[1] is stored.

The _primary_supers array is a first-level cache of super classes. If the number of super classes of a class does not exceed a threshold (meaning that the class is not too deep in the hierarchy), its _primary_supers will be identical to its parent’s plus a self-reference entry at the end. Otherwise, its parent’s _primary_supers is replicated as-is.

Class A1 has only one super class (java.lang.Object) which, in turn, is at the top of the hierarchy. As a result, A1's _primary_supers array will contain the following values: _primary_supers[0] = java.lang.Object, _primary_supers[1] = A1 (self-reference entry), _primary_supers[2] = 0x0, _primary_supers[...] = 0x0. There is an interesting observation here: any subclass of A1 must necessarily have a pointer to A1 at _primary_supers[1] (offset 0x40), no matter how deep in the hierarchy it is. In other words, not having a pointer to A1 at _primary_supers[1] means not being a subclass of A1. Thus, this check ensures that the received instance is of type A1 or of a subtype of A1, and it was added by C1’s LIR_Assembler::emit_typecheck_helper method. The Klass field _super_check_offset points to the self-reference entry in _primary_supers and helps C1 decide which offset to use.
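The construction of _primary_supers and the single-comparison check can be modelled in Java, as a rough sketch. KlassModel and its members are hypothetical names, an array index stands in for the byte offset held in _super_check_offset, and the limit of 8 entries is an assumption about the threshold mentioned above.

```java
class PrimarySupersSketch {
    static final int PRIMARY_SUPER_LIMIT = 8; // assumed threshold

    static final class KlassModel {
        final String name;
        final int depth; // number of super classes
        final KlassModel[] primarySupers = new KlassModel[PRIMARY_SUPER_LIMIT];

        KlassModel(String name, KlassModel parent) {
            this.name = name;
            this.depth = (parent == null) ? 0 : parent.depth + 1;
            if (parent != null) {
                // Start from a copy of the parent's array...
                System.arraycopy(parent.primarySupers, 0, primarySupers, 0, PRIMARY_SUPER_LIMIT);
            }
            if (depth < PRIMARY_SUPER_LIMIT) {
                // ...and append the self-reference entry if there is room.
                primarySupers[depth] = this;
            }
        }
    }

    /** Models the cmp: one load at the target's self-entry index, one comparison. */
    static boolean isSubtypeOf(KlassModel receiver, KlassModel target) {
        if (target.depth >= PRIMARY_SUPER_LIMIT) {
            throw new IllegalArgumentException("target has no self entry in the model");
        }
        return receiver.primarySupers[target.depth] == target;
    }

    public static void main(String[] args) {
        KlassModel object = new KlassModel("java.lang.Object", null);
        KlassModel a1 = new KlassModel("A1", object);
        KlassModel b = new KlassModel("B", a1);          // hypothetical subclass of A1
        KlassModel other = new KlassModel("Other", object); // unrelated class
        System.out.println(isSubtypeOf(a1, a1));
        System.out.println(isSubtypeOf(b, a1));
        System.out.println(isSubtypeOf(other, a1));
        System.out.println(isSubtypeOf(object, a1));
    }
}
```

In the model, A1 has depth 1, so the check always reads index 1 of the receiver's array, which mirrors the fixed 0x40 offset in the compiled code.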

If the check succeeds, the rest of the method looks the same as Main::j1 in Example 1. As anticipated, A1::m was statically bound and inlined in Main::j2. If the check fails, execution goes to a throw_incompatible_class_change_error runtime call. It’s crucial to ensure that the received instance is of a valid type. In this case, since I has only one implementor, the valid type is A1. How is it possible to spuriously pass an object of an invalid type to Main::j2? MethodHandles can do the trick with an unsafe cast:

The console output for this case is:

If the check had not been in place in the previous example, memory safety would have been compromised and the JVM could have crashed. Think of how an instance of java.lang.Object would have illegally landed in a method where A1::m was inlined, and been used as if it were an instance of A1 or of a subtype of it.

At C1 level, the compilation of the virtual callsite in Main::j2 occurs in GraphBuilder::invoke. These are some initial context values when called:

The values above are exactly the same as in Example 1. The next step is to determine the exact type of the receiver (if any). Contrary to Example 1, the instruction that pushed the receiver to the top of the stack is not a NewInstance but a Local. Local represents values from method parameters, and uses its parent’s Instruction::exact_type. There will be an exact type only if the parameter’s declared type (I) is an exact class. While I is a class (in the sense of being an instance of C++ InstanceKlass or its ciInstanceKlass wrapper), interfaces do not meet the requirement in ciInstanceKlass::exact_klass. Thus, we don’t have an exact type in this case.
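As a rough analogy in Java reflection terms, the decision can be sketched as below. It assumes that "exact" means a final class that is not an interface; the article only states that interfaces do not qualify, and the exactType helper is a hypothetical name.

```java
import java.lang.reflect.Modifier;

class ExactTypeSketch {
    interface I { int m(); }
    static class A1 implements I { public int m() { return 1; } }

    // Returns the declared type if it can serve as the exact receiver type,
    // or null when subtypes of it may still show up at run time.
    static Class<?> exactType(Class<?> declared) {
        boolean isFinal = Modifier.isFinal(declared.getModifiers());
        return (isFinal && !declared.isInterface()) ? declared : null;
    }

    public static void main(String[] args) {
        System.out.println(exactType(I.class));      // interface: no exact type
        System.out.println(exactType(A1.class));     // non-final class: no exact type
        System.out.println(exactType(String.class)); // final class: exact type
    }
}
```

With a declared type of I, the helper yields nothing, which matches the outcome described for the Local in Main::j2.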

The condition checked next is not met either, because the declared type (I) is an interface. Execution then proceeds to the part that is relevant for us: the call to target->find_monomorphic_target(...). These are the parameter values passed:

  • ciMethod* target:    H::m
  • ciInstanceKlass* calling_klass:    Main
  • ciInstanceKlass* declared_interface:    I
  • ciInstanceKlass* singleton:    A1

The returned cha_monomorphic_target value is A1::m. This is expected because A1::m is the only possible concrete target for the callsite. In order to move forward with this target, a conditional clause (dependency) has to be recorded: I must have A1 as its unique implementor. In addition, a second dependency is recorded to detect the dynamic loading of a subclass of A1 that overrides m. Once these dependencies are recorded, code is set to Bytecodes::_invokespecial and, from there on, the callsite is no longer treated as a virtual call.
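The two dependencies can be pictured with a small Java model, as a sketch only. LoadedClass, CompiledMethod and onClassLoad are hypothetical names and do not correspond to HotSpot's dependency machinery; for brevity the model inspects only the direct super class and the directly declared interfaces of each newly loaded class.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ChaDependencySketch {
    /** Minimal description of a class at load time. */
    static final class LoadedClass {
        final String name;
        final String superName;
        final List<String> interfaces;
        final List<String> methods;

        LoadedClass(String name, String superName, List<String> interfaces, List<String> methods) {
            this.name = name;
            this.superName = superName;
            this.interfaces = interfaces;
            this.methods = methods;
        }
    }

    /** Compiled code plus the assumptions it was compiled under. */
    static final class CompiledMethod {
        final String iface;       // e.g. "I"
        final String implementor; // e.g. "A1", assumed unique implementor
        final String boundMethod; // e.g. "m", assumed not overridden
        boolean valid = true;

        CompiledMethod(String iface, String implementor, String boundMethod) {
            this.iface = iface;
            this.implementor = implementor;
            this.boundMethod = boundMethod;
        }

        // Called for every class load; invalidates the code if an assumption breaks.
        void onClassLoad(LoadedClass k) {
            boolean newImplementor = k.interfaces.contains(iface) && !k.name.equals(implementor);
            boolean overridingSubclass = k.superName.equals(implementor) && k.methods.contains(boundMethod);
            if (newImplementor || overridingSubclass) {
                valid = false; // would trigger deoptimization or recompilation
            }
        }
    }

    public static void main(String[] args) {
        CompiledMethod j2 = new CompiledMethod("I", "A1", "m");
        // A subclass of A1 that does not override m keeps the binding valid.
        j2.onClassLoad(new LoadedClass("B", "A1", Collections.emptyList(), Arrays.asList("n")));
        System.out.println(j2.valid);
        // A subclass of A1 that overrides m breaks the second dependency.
        j2.onClassLoad(new LoadedClass("C", "A1", Collections.emptyList(), Arrays.asList("m")));
        System.out.println(j2.valid);
    }
}
```

Note that a harmless subclass of A1 leaves the compiled code untouched, while a new implementor of I or an overriding subclass invalidates it at load time, before any call is made.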
