Plugging Bee JIT and Compiler

This post should have been written a long time ago. By now I have already forgotten many of the difficulties I found when plugging the JIT and the Smalltalk compiler into Bee, but on the upside I now have a much bigger picture of the system, so I can describe more things in more detail than before.

Where can we start? A good place is to compare our self-hosted runtime against the host VM. What are the differences between the JIT in Bee and the one in the host VM? To answer that, we have to understand what a JIT is, what it is made of, and how it is plugged into the system. Good! Now we have some concrete things to describe.

What is a JIT compiler


Just-in-time compiler. The last word says most of it: it is a compiler, and the just-in-time part refers to the fact that it is designed to work while the program is running, unlike compilers such as GCC, which compile programs before they are run.

Smalltalk code, in most implementations, is compiled to bytecodes. This comes from its early history; the famous Blue Book describes bytecodes extensively. When saving a method, the Smalltalk compiler is triggered to convert the text to bytecodes. So you might think that the Smalltalk compiler is a kind of JIT compiler, but it is not, at least not in the usual sense of the word. After compilation to bytecodes, at the point where the method is actually going to be invoked, those bytecodes have to be executed in some way. That is, invoking a method means performing computations that alter the program's execution context. The approach that JIT-based VMs take to execute methods is to translate them to native code just before execution.

In the case of Bee, the VM translates the method's bytecodes to native code. Running a method actually means executing the native code derived from its bytecodes. Bee is not an interpreted Smalltalk, which means that every method is nativized before it is run. Generation of a method's native code can happen at any time before its execution. Usually nativization happens during lookup: the activation of a message send causes the invocation of a compiled method; the lookup algorithm checks whether that method already has native code and, if not, asks the JIT to translate its bytecodes to native code.

But lookup is not the only moment at which native code for a method can be generated. Nativization could also be done ahead of time (AOT), and a JIT compiler that allows for this is sometimes called an AOT compiler. Bee not only provides, but also requires, an AOT compiler, as we will see below.

Internals of JIT compilers


The main work our JIT compiler does is to convert bytecodes to native code, but how does this conversion work? To start getting an idea, we can first see what bytecodes look like, and how they relate to source code. Consider this simple method:

sum
    ^3 + 4

The Smalltalk compiler will process this text and produce a CompiledMethod with these bytecodes:

[16r18] load R with SmallInteger 3
[16r19] push SmallInteger 4
[16rF7] send selector #+
[16r48] return


Basically, what we have is a stack machine with some special registers: R is the receiver and return register, and there are a few more not mentioned here. The work the JIT has to do, then, is to transform bytecodes into native instructions.

In this case, what we will get is something like:

prologue: ...
method: mov EAX, 7      ; load R with SmallInteger 3
        push 9          ; push SmallInteger 4
        call lookup #+
epilogue: ...
        ret

The method label describes the most interesting part, where bytecodes got converted to native instructions (EAX is both the receiver and return register in our calling convention). The literals 7 and 9 are the SmallIntegers 3 and 4 in their tagged form (2n + 1, the low bit marking a SmallInteger). The labels prologue and epilogue consist of a few instructions for constructing and destroying a stack frame.

The way our JIT compiler is implemented is very straightforward: it iterates over the bytecodes, assembling the corresponding native instructions for each one:

MethodNativizer>>#translateMethod
    self emitPrologueAndAlign.
    [self bytecodeIndex < self bytecodeSize] whileTrue: [
        self saveBytecodeNativeAddress.
        self translateSingleBytecode: self nextBytecode].
    self emitEpilogue
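
Just to give an idea of what #translateSingleBytecode: could look like, here is a minimal sketch that dispatches on the bytecode values of the example above. The emit* selectors are invented for illustration; the actual Bee nativizer handles many more bytecodes and encodes operands differently.

MethodNativizer>>#translateSingleBytecode: aBytecode
    "Hypothetical dispatch: assemble the native instructions for one bytecode.
     Real bytecode sets also encode operands, either in the bytecode value
     itself or in the bytes that follow it."
    aBytecode = 16r18 ifTrue: [^self emitLoadRWithNextLiteral].
    aBytecode = 16r19 ifTrue: [^self emitPushNextLiteral].
    aBytecode = 16rF7 ifTrue: [^self emitSendNextSelector].
    aBytecode = 16r48 ifTrue: [^self emitReturn].
    self error: 'bytecode not supported by this sketch'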

Plugging the JIT compiler into a classic VM


The final step, when you have a JIT compiler that is able to translate any method, is to plug it into the VM. A few places are affected by its presence, mainly the lookup mechanism and the garbage collector. In the case of a classic VM, the typical places would be:

Implementation of lookup

The VM will check if the method found has native code and, if not, trigger the nativizer, as in the following simplified code:

void lookupAndInvoke(oop *receiver, oop *selector) {
    Method *method = global_cache->lookup(receiver, selector);

    if (method->native_code() == nil)
    {
         nativize(method);
    }

    invoke(method);
}

Method *GlobalLookupCache::lookup(oop *receiver, oop *selector)
{
    Class *klass = receiver->class();
    Method *method = lookup_in_cache(klass, selector);
    if (method != nil)
        return method;

    // cache miss: do the full lookup and remember the result
    method = klass->find_in_method_dictionary(selector);
    this->add_to_cache(klass, selector, method);
    return method;
}

We won't describe here exactly how the cache is indexed, but you can think of it as just a low-level C array or vector.

When methods are changed in the system

The VM needs to be told when any method is changed in the system, so that it can update the cache. This is usually done with a primitive:

MethodDictionary>>#flushFromCache: aSymbol
    <primitive: FlushFromCodeCache>

The primitive could be implemented with something like this:

void FlushFromCodeCache(oop *selector)
{
    global_cache->remove_all_entries_with(selector);
}

During GC

The pointers in the low-level array have to be updated, as compiled methods, classes and selectors could be moved. This just requires a special case in the GC to trace the pointers in that array.

Finally, to make all this work, the VM is compiled to native instructions. Then the JIT, the primitives and the GC are just called from the corresponding places. How does the Bee JIT differ from the host VM's one, then?

Plugging the JIT into Bee's self-hosted runtime


The first thing to notice is that the Bee JIT is implemented in Smalltalk. Thus, it consists of a bunch of compiled methods, and not of "native code". So the Bee JIT cannot just be linked into the Bee self-hosted runtime and start nativizing methods as required during lookup. That takes a bit more effort, as it requires someone to first convert the JIT methods themselves to native code, a chicken-and-egg problem. But this problem can be "easily" fixed: we can take our Smalltalk JIT and execute it inside the host VM, using it to translate its own methods! That is, we make it nativize itself ahead of time. The result is a set of compiled methods with their associated native instructions.
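
To make the idea concrete, here is a minimal sketch of that self-nativization step, as it could be run inside the host VM. The class JITLibraryBuilder, the jitClasses collection and the nativeCode: setter are invented for illustration; only BeeNativizationEnvironment appears later in this post.

JITLibraryBuilder>>#nativizeJITMethods
    "Ahead-of-time pass: the JIT translates its own compiled methods,
     keeping the resulting native code next to each method.
     jitClasses and nativeCode: are hypothetical names."
    | environment |
    environment := BeeNativizationEnvironment current.
    self jitClasses do: [:aClass |
        aClass methodDictionary do: [:method |
            method nativeCode: (environment nativeCodeFor: method)]]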

We have to be careful, though, and make our nativized JIT behave slightly differently from the host VM JIT, as it should not generate code with references to the host VM. For example, when a message-send bytecode is translated, instead of calling the host VM lookup, it has to call a different one which is stored in the Bee kernel. The problem of referencing external things, particularly objects in the Bee kernel, is already solved by our Smalltalk libraries framework. The final step to plug in the nativizer is then to generate a Smalltalk library, one that can be loaded by the Bee kernel, and that contains both the JIT methods and their corresponding native code.
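
As a rough illustration of the kind of switch involved, translating a send bytecode could look roughly like this. Every selector here (emitSend:, emitPushLiteral:, emitCallTo:, environment, lookupRoutineAddress) is invented for illustration; in Bee the reference to the kernel's lookup routine is actually resolved through the libraries framework mentioned above.

MethodNativizer>>#emitSend: selector
    "Hypothetical sketch: the generated call must target the lookup routine
     living in the Bee kernel, not the host VM's lookup."
    self
        emitPushLiteral: selector;
        emitCallTo: self environment lookupRoutineAddress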

As for plugging the JIT into the Smalltalk world, many things are simplified because everything is implemented within the same paradigm. The lookup is already coded in Smalltalk, and the only thing needed is to call our JIT when we find a method that doesn't have native code yet:


Object>>#_lookupAndInvoke: selector
    | cm nativeCode |
    cm := self _cachedLookup: selector.
    cm == nil ifTrue: [^self doesNotUnderstandSelector: selector].
    cm prepareForExecution.
    nativeCode := cm nativeCode.
    ^self _transferControlTo: nativeCode code

CompiledMethod>>#prepareForExecution
    self isNativized ifFalse: [self nativize].
    nativeCode refresh

CompiledMethod>>#nativize
    nativeCode := BeeNativizationEnvironment current nativeCodeFor: self


Regarding the global lookup cache, as it is just a reachable Smalltalk array, there is no need for a special case to traverse it during GC. The #flushFromCache: implementation is not a primitive anymore; it is just another Smalltalk method that traverses the cache looking for methods that correspond to the flushed selector.

GlobalDispatchCache>>#flush: selector for: behavior
    | index |
    index := self indexOf: selector with: behavior.
    contents
        at: index put: nil;
        at: index + 2 put: nil;
        at: index + 4 put: nil
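
For completeness, #indexOf:with: could be something along these lines. The layout implied here (a bucket of three entries, each two slots wide, selected by hashing the selector and the behavior together) is only a guess made to match the flush code above, not a description of Bee's actual dispatch cache.

GlobalDispatchCache>>#indexOf: selector with: behavior
    "Hypothetical bucket computation: hash selector and behavior together,
     then map the hash to the first slot of a 6-slot bucket (3 entries x 2 slots)."
    | hash |
    hash := selector hash bitXor: behavior hash.
    ^hash \\ (contents size // 6) * 6 + 1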


Interestingly, the #nativize method is dynamically bound: it is loaded together with the JIT compiler library, that is, only if we tell Bee to load the JIT. Using the same technique we used for the JIT, we can ship any other library pre-nativized, so that we don't need to load the JIT if we don't want to, or if we don't plan to modify code dynamically.



Besides, the Smalltalk compiler is also packaged in an optional library, so we can get the following workflow:



This can be good for saving system resources, and also for performance reasons: as the code shipped with our libraries is nativized ahead of time, we can afford to spend more time optimizing it, delivering code that is much faster than naively JITted code. I'll write more about that in the future, but that's going to be in another post. I hope you enjoyed this one!



Comments

  1. Hi Pocho.

    Just a small detail. In #translateMethod you could save self bytecodeSize to a temporary so you don't have to send the same message for every bytecode. Something along the lines of

    MethodNativizer>>#translateMethod
        | size |
        self emitPrologueAndAlign.
        size := self bytecodeSize.
        [self bytecodeIndex < size] whileTrue: [
            self
                saveBytecodeNativeAddress;
                translateSingleBytecode: self nextBytecode].
        self emitEpilogue

