Monomorphic to megamorphic callsite in C1

In our previous article, Callsite resolution in the JVM, we discussed what a monomorphic callsite looks like after inline caching:

Java

x86_64 assembly (generated by C1)
As a reminder, the RAX register is loaded with the Klass* for which the optimization is valid: a receiver object of the A1 type. The callq instruction that follows goes to the unverified entry of the A1::m method, where the Klass* in RAX is checked against the receiver object’s class. If the check succeeds, execution proceeds to the actual method. In this article we will see what happens when the check fails or, in other words, when the receiver object is not an instance of A1 and the optimization is no longer valid.
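The check described above can be modeled in plain Java. This is only a conceptual sketch, not HotSpot code: the CallSite class, its cachedKlass field and the miss counter are hypothetical names, and the comparison is placed in the callsite for simplicity, whereas in the generated code it lives in the unverified entry of the callee.

```java
public class InlineCacheSketch {
    interface I { String m(); }
    static class A1 implements I { public String m() { return "A1::m"; } }
    static class A2 implements I { public String m() { return "A2::m"; } }

    // Hypothetical stand-in for a callsite with a monomorphic inline cache.
    static class CallSite {
        // Plays the role of the Klass* loaded into RAX before the call.
        private final Class<?> cachedKlass;
        int misses = 0;

        CallSite(Class<?> cachedKlass) {
            this.cachedKlass = cachedKlass;
        }

        String invoke(I receiver) {
            // Models the comparison between the cached class and the receiver's class.
            if (receiver.getClass() == cachedKlass) {
                return "hit -> " + receiver.m();
            }
            // Cache miss: the optimization does not apply to this receiver,
            // so this model records the miss and uses ordinary dispatch.
            misses++;
            return "miss -> " + receiver.m();
        }
    }

    public static void main(String[] args) {
        CallSite site = new CallSite(A1.class);
        System.out.println(site.invoke(new A1())); // receiver matches the cached class
        System.out.println(site.invoke(new A2())); // receiver does not match
        System.out.println("misses = " + site.misses);
    }
}
```

What the JVM actually does on a miss is the subject of the article; the model only shows where the decision is taken.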
Continue reading “Monomorphic to megamorphic callsite in C1”

Callsite resolution in the JVM

In part #3 of the CHA series we found that the callsite to I::m in Main::j2, generated by C1, was initially unresolved. In this article we will describe how the JVM locates the symbolic information attached to this callsite and, taking the receiver object into consideration, finds the actual method to be invoked: A1::m.

Continue reading “Callsite resolution in the JVM”

Class Hierarchical Analysis (CHA) examples in C1 (Hotspot JVM) – Part #3

Example 3


This article is a continuation of Part #2. The scenario is similar to the previous one, with the only difference being that class A2 is now loaded:

With class A2 in existence, the virtual callsite in Main::j2 can receive an instance of it at any time. The expectation is, thus, that no static binding is applied.

This is what C1’s Main::j2 assembly looks like at the I::m callsite:

The assumption seems right: there is no static binding or inlining of the called method. However, we see a static call to the JVM instead of a virtual one (itable-based) to a method. Let’s set a breakpoint there and find out what is going on.
Continue reading “Class Hierarchical Analysis (CHA) examples in C1 (Hotspot JVM) – Part #3”

Class Hierarchical Analysis (CHA) examples in C1 (Hotspot JVM) – Part #2

Example 2


This article is a continuation of Part #1. We will discuss the following scenario now:

At first sight, this case resembles Example 1, with the only difference being that the instance of A1 is created outside the JIT-compiled method (Main::j2). This subtle difference has a strong implication, though. It’s no longer possible to guarantee that an instance of A1 is the only one to reach the virtual callsite in Main::j2. In principle, any class dynamically loaded into the JVM that implements the I interface can be instantiated and passed to Main::j2. That said, A1 is the only implementor of I and A1::m has not been overridden so far. We should be able to establish a conditional static binding to A1::m. Whenever a class implementing I, or a subclass of A1 that overrides m, is loaded, Main::j2 has to be either recompiled without the binding or switched back to bytecode interpretation.
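To make the idea of a conditional binding concrete, here is a minimal model in plain Java. It is a sketch under assumptions, not the JVM's mechanism: Hierarchy, CompiledJ2, load and invalidate are hypothetical names, and the model only tracks implementors of I (it ignores the case of a subclass of A1 overriding m).

```java
import java.util.ArrayList;
import java.util.List;

public class ChaDependencySketch {
    interface I { String m(); }
    static class A1 implements I { public String m() { return "A1::m"; } }
    static class A2 implements I { public String m() { return "A2::m"; } }

    // Hypothetical record of loaded implementors of I and of the code that depends on them.
    static class Hierarchy {
        private final List<Class<? extends I>> implementors = new ArrayList<>();
        private final List<CompiledJ2> dependents = new ArrayList<>();

        void load(Class<? extends I> klass) {
            implementors.add(klass);
            // A second implementor breaks the assumption behind the static binding.
            if (implementors.size() > 1) {
                for (CompiledJ2 compiled : dependents) {
                    compiled.invalidate();
                }
            }
        }

        boolean hasSingleImplementor() { return implementors.size() == 1; }
        void register(CompiledJ2 compiled) { dependents.add(compiled); }
    }

    // Hypothetical stand-in for the compiled version of Main::j2.
    static class CompiledJ2 {
        private boolean staticallyBound;

        CompiledJ2(Hierarchy hierarchy) {
            // Bind statically only while the assumption holds, and record the dependency.
            staticallyBound = hierarchy.hasSingleImplementor();
            if (staticallyBound) {
                hierarchy.register(this);
            }
        }

        void invalidate() { staticallyBound = false; }
        boolean isStaticallyBound() { return staticallyBound; }

        String call(I receiver) {
            if (staticallyBound) {
                // Direct call: valid only because A1 is the sole implementor.
                return "static:" + ((A1) receiver).m();
            }
            return "virtual:" + receiver.m();
        }
    }

    public static void main(String[] args) {
        Hierarchy hierarchy = new Hierarchy();
        hierarchy.load(A1.class);
        CompiledJ2 j2 = new CompiledJ2(hierarchy);
        System.out.println(j2.call(new A1()));
        hierarchy.load(A2.class); // invalidates the binding
        System.out.println(j2.call(new A2()));
    }
}
```

In the model, invalidation simply flips a flag; in the scenario described above it corresponds to recompiling Main::j2 without the binding or going back to the interpreter.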

Continue reading “Class Hierarchical Analysis (CHA) examples in C1 (Hotspot JVM) – Part #2”

Class Hierarchical Analysis (CHA) examples in C1 (Hotspot JVM) – Part #1

Virtual calls come at a significant performance cost on many occasions. It’s not only that memory accesses to vtables take CPU cycles and can pollute caches, but also that method-inlining opportunities are missed. Proper engineering practices require using expensive resources only when needed. Even when programming languages offer syntactic hints for the developer to make a decision (e.g. virtual or final method declarations), it’s ultimately up to a good compiler to perform a thorough analysis and optimize.
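As a small illustration of the syntactic hints mentioned above, the following Java sketch contrasts an overridable method with a final one. The Shape and Square classes are hypothetical and are not the classes used in the examples of this series.

```java
public class DispatchHintsSketch {
    static class Shape {
        // Overridable: instance methods in Java are virtual by default,
        // so the target depends on the runtime class of the receiver.
        double area() { return 0.0; }

        // final: no subclass can override it, so there is a single possible target.
        final String kind() { return "shape"; }
    }

    static class Square extends Shape {
        private final double side;

        Square(double side) { this.side = side; }

        @Override
        double area() { return side * side; }
    }

    static double totalArea(Shape[] shapes) {
        double total = 0.0;
        for (Shape shape : shapes) {
            total += shape.area(); // virtual callsite: Shape::area or Square::area
        }
        return total;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Square(2.0), new Shape() };
        System.out.println(totalArea(shapes));
        System.out.println(shapes[0].kind());
    }
}
```

Without the final keyword, the compiler has to find out by itself whether a callsite has a single possible target, which is where the analysis discussed in this series comes in.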

In this series of short examples, we will see how the C1 just-in-time compiler in the Hotspot Java Virtual Machine (JVM) performs Class Hierarchical Analysis (CHA) and deals with virtual calls. For each case, we will discuss what should happen, observe what actually happens and work out an explanation.

All the examples are based on the following Java classes topology:
Continue reading “Class Hierarchical Analysis (CHA) examples in C1 (Hotspot JVM) – Part #1”

JVM Class Relinker v1.0

Virtually every Java class needs references to other classes to achieve something meaningful. The example below, although not very meaningful itself, will suffice to show these references in action:

The javac compiler will take the Java source of these classes and turn it into four separate binary files called classfiles. Each of them includes the JVM executable instructions (bytecode), the data upon which they operate and linkage information that indicates references to external classes by name. References of this type are commonly known as symbolic references.

Continue reading “JVM Class Relinker v1.0”

Simple App: a Linux toy application for user and kernel space

Figure 1 – Simple App architecture

In the context of the Open Source Study Group, we decided to explore some Linux kernel APIs related to IPC and namespaces. An effective approach to this is exercising APIs from a crafted user-space application, while debugging at the kernel level. After some progress and setting breakpoints here and there, a challenge became evident: breakpoints in common functions are hit all the time from different processes. We need to filter out irrelevant hits, while staying focused on our application. This can be done, perhaps, by means of a conditional breakpoint in the debugger. Checking the binary image associated with the current task (looking into the mm_struct) should do the job, but debugging complexity starts growing rapidly.

Continue reading “Simple App: a Linux toy application for user and kernel space”

BIOS execution in QEMU: first I/O interaction

In the previous articles of this series, we analyzed how QEMU starts executing the BIOS firmware (see here) and how addresses are translated from the emulated physical space to the host memory (see here). The proposal now is to discuss what these first instructions are doing and what the first I/O interaction with virtual hardware looks like.

After the initial long jump at the entry point, the BIOS firmware has the following i8086 instructions:

The first comparison is to check whether the VM is resuming or rebooting. Assuming the latter, the stack segment selector is set to 0x0 and the stack pointer to 0x7000. According to the Memory Regions map, the stack will be located in ram-below-4g. An address value, which belongs to the pc.bios region, is finally loaded into the EDX register and a jump to a different block occurs. Source code is available here and here.
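The stack location can be verified with a quick calculation. Assuming the CPU is still using real-mode addressing at this point (physical address = segment * 16 + offset), the sketch below computes where SS:SP points. The class and method names are ours, chosen only for illustration.

```java
public class RealModeAddressSketch {
    // Real-mode translation: physical = (segment << 4) + offset,
    // with both segment and offset being 16-bit values.
    static int physicalAddress(int segment, int offset) {
        return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF);
    }

    public static void main(String[] args) {
        // Stack segment selector 0x0 and stack pointer 0x7000, as set by the BIOS.
        int stackTop = physicalAddress(0x0, 0x7000);
        System.out.printf("SS:SP = 0000:7000 -> physical 0x%X%n", stackTop);
    }
}
```

With a segment of 0x0 the physical address equals the offset, 0x7000, which is consistent with the stack being placed in ram-below-4g.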

Continue reading “BIOS execution in QEMU: first I/O interaction”

Memory corruption in FreeBSD’s ring-buffer (kernel)

Memory ordering has been the latest topic under discussion in the Open Source Study Group. We walked through several concepts such as atomic memory access, volatile declarations, compiler and CPU reordering with different semantics (including the acquire-release model), fences, compare-and-set (CAS) loops, ABA problems, lock-free algorithms and more. A few lock-free ring-buffer implementations were analyzed, including loki, dpdk and FreeBSD’s buf_ring (kernel).

We found a memory corruption bug in FreeBSD’s ring-buffer. Under specific thread scheduling conditions, and assuming a multiple-producers scenario, it could be possible to overwrite unread entries. However, we did not attempt full exploitation and our PoC (proof-of-concept) makes a few assumptions that may not accurately represent reality. As a result, it may well be the case that the bug is there but triggering it is not feasible under the constraints currently imposed by the context.

Continue reading “Memory corruption in FreeBSD’s ring-buffer (kernel)”