Design principles behind Bee Smalltalk's Execution Engine
I think the strongest design principle behind Bee Smalltalk is minimality. Bee works with a small kernel made of objects (SKernel), plus other libraries that are also made of objects. SKernel is truly minimal: it contains only basic objects (such as the meta hierarchy, collections, streams, FFI and basic OS interaction) and knows how to load more libraries.
To understand what SKernel is, think of a method closure: a set of methods that only send messages whose implementations are inside that same set. SKernel is such a closure. Other libraries, on the other hand, are similar to SKernel, but they are only partial method closures, as they depend on themselves and also on SKernel or on methods already loaded by other libraries. To be able to execute these methods, their bytecodes are nativized in advance and shipped with the library. This is done with NativeCode objects, which contain a ByteArray with the actual encoded x86 instructions and which are, like every other required object, included in the library itself.
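To make the definition concrete, here is a minimal Python sketch of a closure check. It is only a model under stated assumptions: each method is reduced to a hypothetical (class name, selector) key mapped to the selectors it sends, and because sends are dynamically dispatched, a selector counts as satisfied if any method in the set implements it. The names `unresolved_selectors` and `is_method_closure` are illustrative, not part of Bee.

```python
from typing import Dict, Set, Tuple

# Hypothetical model: a method is keyed by (class name, selector) and maps to
# the set of selectors it sends. Bytecodes and native code are ignored here.
MethodKey = Tuple[str, str]


def unresolved_selectors(methods: Dict[MethodKey, Set[str]]) -> Set[str]:
    """Selectors sent by some method in the set but implemented by none of them."""
    implemented = {selector for (_class_name, selector) in methods}
    sent = set().union(*methods.values())
    return sent - implemented


def is_method_closure(methods: Dict[MethodKey, Set[str]]) -> bool:
    """True if every selector sent inside the set is also implemented inside it."""
    return not unresolved_selectors(methods)


# Example: a tiny "kernel" that is closed, and a "library" that is not.
kernel = {
    ("Object", "printString"): {"printOn:"},
    ("Object", "printOn:"): {"nextPutAll:"},
    ("WriteStream", "nextPutAll:"): set(),
}
library = {
    ("Point", "printOn:"): {"nextPutAll:", "x"},
    ("Point", "x"): set(),
}

print(is_method_closure(kernel))                 # True
print(unresolved_selectors(library))             # {'nextPutAll:'}
print(is_method_closure({**kernel, **library}))  # True
```

For a complete closure like SKernel, `unresolved_selectors` returns an empty set; for a partial closure it returns exactly the selectors that must already be provided by SKernel or by previously loaded libraries.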
SKernel can be considered a library with no dependencies, so from now on I'll just say "library" to refer to both SKernel and any other library, unless a distinction is needed.
A library is a binary blob and, as you can imagine, it consists of objects. These objects can be pretty much anything. To create a library, you create a builder and start adding objects. To ease the task, objects may belong to projects, and projects can be manipulated through browsers (so you can add entire classes, with all of their methods, to a project). Then, the easiest way to generate a library is to tell the builder to work on a specific project.
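The builder workflow can be modelled in a few lines. This Python sketch is hypothetical: `Project`, `ClassDef`, `MethodDef` and `LibraryBuilder` are illustrative names rather than Bee's actual classes, and the builder only collects objects; serializing them into the binary blob is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MethodDef:
    """Hypothetical stand-in for a compiled method."""
    selector: str


@dataclass
class ClassDef:
    """Hypothetical stand-in for a class and its methods."""
    name: str
    methods: List[MethodDef] = field(default_factory=list)


@dataclass
class Project:
    """Groups objects so they can be added to a library together."""
    name: str
    objects: List[object] = field(default_factory=list)

    def add_class(self, cls: ClassDef) -> None:
        # Adding a class brings all of its methods along with it.
        self.objects.append(cls)
        self.objects.extend(cls.methods)


class LibraryBuilder:
    """Collects the objects that will be written into one library."""

    def __init__(self) -> None:
        self._objects: List[object] = []

    def add(self, obj: object) -> None:
        # Compare by identity so the same object is never added twice.
        if all(existing is not obj for existing in self._objects):
            self._objects.append(obj)

    def add_project(self, project: Project) -> None:
        for obj in project.objects:
            self.add(obj)

    def objects(self) -> List[object]:
        return list(self._objects)


point = ClassDef("Point", [MethodDef("x"), MethodDef("y")])
geometry = Project("Geometry")
geometry.add_class(point)

builder = LibraryBuilder()
builder.add_project(geometry)
builder.add(point)  # already present, so it is ignored
print(len(builder.objects()))  # 3
```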
SKernel executes without any VM. To make that possible, you need to include the native code of all its methods. This is nice because it lets you forget about the JIT (or interpreter) in SKernel, making it smaller. Additionally, if you want to create objects while sending messages in SKernel (which you most likely do), you need an arena, a place to allocate new objects. We did it simply by adding a small empty ByteArray to the closure. Currently we have a 4MB arena, but I think a 100KB one should be more than enough to load a more powerful object manager (maybe including a GC). It's interesting to note that the kernel doesn't need a full-blown GC. The GC can be (actually, will be) a separate library that the kernel loads, if you want it. Once loaded, the GC can allocate more space and start collecting. Of course, the collection algorithm will depend on which library you choose to load.
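Since the kernel doesn't need a GC, allocating from the arena can be as simple as bumping a pointer through the preallocated buffer. The Python sketch below assumes exactly that model; `Arena`, `ArenaExhausted` and the 4-byte alignment rule are illustrative assumptions, not Bee's actual object manager. What it shows is the failure mode: once the arena runs out, something else (such as a separately loaded GC library) has to provide more space.

```python
class ArenaExhausted(Exception):
    """Raised when the fixed arena has no room left."""


class Arena:
    """Bump-pointer allocator over a fixed, preallocated buffer (no freeing, no GC)."""

    ALIGNMENT = 4  # assumption: every object starts on a 4-byte boundary

    def __init__(self, size: int) -> None:
        self.memory = bytearray(size)  # plays the role of the empty ByteArray
        self.next_free = 0

    def allocate(self, nbytes: int) -> int:
        """Reserve nbytes (rounded up to the alignment) and return their offset."""
        aligned = (nbytes + self.ALIGNMENT - 1) & ~(self.ALIGNMENT - 1)
        if self.next_free + aligned > len(self.memory):
            free = len(self.memory) - self.next_free
            raise ArenaExhausted(f"need {aligned} bytes, only {free} left")
        offset = self.next_free
        self.next_free += aligned
        return offset


arena = Arena(16)
print(arena.allocate(5))  # 0 (5 bytes rounded up to 8)
print(arena.allocate(4))  # 8
print(arena.allocate(1))  # 12, which fills the arena
```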
Our current SKernel has an entry point that inspects the command-line arguments and sends a message according to them. In a development scenario, the first things you are likely to load are a compiler and a nativizer. Since the compiler is hosted in a library, having one is actually optional. When deploying, on the other hand, you can ship your already built library and directly perform initialization and run it.
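An entry point that turns command-line arguments into a message send could look roughly like this. It is a hypothetical Python analogue: `getattr` plays the role of Smalltalk's `perform:withArguments:`, and `Kernel`, `load` and `entry_point` are illustrative names, not Bee's real selectors.

```python
import sys
from typing import List


class Kernel:
    """Hypothetical receiver; each public method acts as a command."""

    def __init__(self) -> None:
        self.loaded: List[str] = []

    def load(self, *library_names: str) -> int:
        # Stand-in for loading libraries such as a compiler and a nativizer.
        self.loaded.extend(library_names)
        return 0


def entry_point(receiver: object, argv: List[str]) -> int:
    """Send the message named by argv[0] with the remaining arguments."""
    if not argv or argv[0].startswith("_"):
        print("usage: <selector> [args...]", file=sys.stderr)
        return 2
    method = getattr(receiver, argv[0], None)
    if not callable(method):
        print(f"{type(receiver).__name__} does not understand #{argv[0]}", file=sys.stderr)
        return 2
    return method(*argv[1:])


kernel = Kernel()
status = entry_point(kernel, ["load", "compiler", "nativizer"])
print(status, kernel.loaded)  # 0 ['compiler', 'nativizer']
```

In a real program you would call it as `entry_point(Kernel(), sys.argv[1:])`; the leading-underscore check simply keeps private Python methods from being reachable as commands.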
With this design, library loading has to be very fast, and it actually is. In a library file, objects are stored in almost the same way as they will be laid out in memory: they occupy the same size, and their headers already have all their bits set correctly. SmallIntegers are tagged, just like in memory. OOPs (direct pointers), however, have to be specially encoded. References to objects in the same library are stored as an address relative to the base of the library (an offset) instead of an absolute address. Objects in other libraries (external objects, such as SKernel classes or other globals) are referenced by name: all external references are collected into a table, each with a string that lets the loader find the correct object at load time, and in the slot of the referring object the address is replaced by an index into that externals table. To tell whether a reference is external or internal, we tag it in the second bit, taking advantage of the fact that objects are always allocated on 4-byte boundaries, so the two lowest bits of an internal reference are always zero.
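The encoding can be sketched in Python as follows. Several details are assumptions rather than facts about Bee: 32-bit words, SmallIntegers tagged with bit 0 set (leaving bit 1 free for the external tag), external indices stored as `index << 2 | 0b10`, and a loader that already knows which words are pointer slots (headers and byte contents are left untouched). All function and class names are hypothetical.

```python
from typing import Dict, List

WORD_MASK = 0xFFFFFFFF
SMALLINT_TAG = 0b01  # assumption: SmallIntegers have bit 0 set
EXTERNAL_TAG = 0b10  # bit 1 marks an index into the externals table


def encode_internal(address: int, library_base: int) -> int:
    """Writer side: store a same-library pointer as an offset from the base."""
    offset = address - library_base
    if offset < 0 or offset % 4 != 0:
        raise ValueError("internal objects are 4-byte aligned and above the base")
    return offset  # the two low bits are 00


class ExternalsTable:
    """Writer side: collects names of external objects, one entry per name."""

    def __init__(self) -> None:
        self.names: List[str] = []
        self._index: Dict[str, int] = {}

    def encode(self, name: str) -> int:
        if name not in self._index:
            self._index[name] = len(self.names)
            self.names.append(name)
        return (self._index[name] << 2) | EXTERNAL_TAG


def resolve_externals(names: List[str], globals_by_name: Dict[str, int]) -> List[int]:
    """Loader side: look every external name up once, before fixing slots."""
    missing = [n for n in names if n not in globals_by_name]
    if missing:
        raise LookupError(f"unresolved externals: {missing}")
    return [globals_by_name[n] for n in names]


def relocate_slot(word: int, load_base: int, resolved: List[int]) -> int:
    """Loader side: turn one encoded pointer slot into a real address."""
    if word & SMALLINT_TAG:
        return word  # SmallIntegers are stored exactly as in memory
    if word & EXTERNAL_TAG:
        return resolved[word >> 2]
    return (load_base + word) & WORD_MASK


table = ExternalsTable()
slots = [
    encode_internal(0x1010, library_base=0x1000),  # offset 0x10
    table.encode("OrderedCollection"),             # index 0 -> 0b10
    (3 << 1) | SMALLINT_TAG,                       # SmallInteger 3 -> 7
    table.encode("Smalltalk"),                     # index 1 -> 0b110
]
resolved = resolve_externals(table.names, {"OrderedCollection": 0xA000, "Smalltalk": 0xB000})
print([hex(relocate_slot(w, 0x40000, resolved)) for w in slots])
# ['0x40010', '0xa000', '0x7', '0xb000']
```

Because an internal offset never has its low two bits set, checking bit 0 and then bit 1 is enough to classify every pointer slot, and relocation is a single pass with no extra per-slot metadata.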
As you can imagine, it's not only that we want to create minimal libraries; we also want to make as few assumptions as possible about the environment where the system is going to run. I think this will allow us to build better and bigger things that fit our needs exactly.
Bee Smalltalk is based on which VM? Squeak? Cog? Something else?
It is based on a Digitalk derivative.
Sounds lovely. Are you using it in a particular application? Which applications do you think it fits best?
Well, the Bee metacircular kernel is not ready for daily use yet. However, the Bee project, as a Smalltalk kernel library with a host VM that loads other libraries, is being used by CaesarSystems, especially in their flagship product, PetroVR.
As for applicability, we expect to be able to use it anywhere Smalltalk is suitable. I also think that minimality will help in places where using Smalltalk isn't so comfortable. If we could add remote inspection and debugging, deployment would be very easy and Bee could fit nicely on servers.