Some analysis about the kernel size

By El Pocho - December 06, 2013

Over the last few days we've been working on integrating the code for loading libraries. This code required us to take a deep breath, do a lot of diving and a bit of surgery here and there. The whole changeset was done, but our process required us to split it into small pieces with tests before it could be integrated. As this was going to take some time, we thought it was also a good moment to take a break and introduce some long overdue refactorings. Happily, most of this tiresome work is now done.

With the kernel in hand and a library to load, we did some basic tests. One of the first things that got our attention was that the kernel was too big: our bee.exe kernel was 8.290.816 bytes long. You may think that's not much, but we were expecting (and would be comfortable with) something much smaller, on the order of 2 to 3 MB. For comparison, the kernel library with just the objects and the bytecodes, without native code or the PE (Windows executable format) tables, is only 788.763 bytes.

So, after scratching our heads for a while, we did some very interesting analysis of its contents. To ensure the analysis was rigorous, we created a statistician object that let us consistently compare different closure builds. As a first measure, we wanted to know how much of the file was made of objects and how much was overhead caused by putting them into a PE file. We found that only 5.901.176 bytes were occupied by actual objects. That leaves 2.389.640 bytes of PE overhead (a lot!).
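The split between objects and PE overhead is a plain subtraction over the figures above. A minimal sketch, where `size_breakdown` is a hypothetical helper name and not part of the statistician object mentioned in the post:

```python
# Figures quoted in the post, in bytes.
FILE_SIZE = 8_290_816      # bee.exe on disk
OBJECT_BYTES = 5_901_176   # space occupied by actual objects


def size_breakdown(file_size, object_bytes):
    """Split a file size into object bytes and container overhead.

    Hypothetical helper for illustration; not Bee's statistician.
    """
    overhead = file_size - object_bytes
    return {
        "objects": object_bytes,
        "overhead": overhead,
        "overhead_share": overhead / file_size,
    }


breakdown = size_breakdown(FILE_SIZE, OBJECT_BYTES)
print(breakdown["overhead"])                   # 2389640
print(f"{breakdown['overhead_share']:.1%}")    # share of the file that is overhead
```

So a bit more than a quarter of the file was container overhead rather than objects.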

Bee executable before a severe diet

Also, to our relief, we saw that all 29739 closure objects were very small, except for one: a 4 MB ByteArray. "Of course!" we said, "this is the arena". This object was actually a placeholder for the area of memory where new objects were going to be created. It was treated as an object only before initialization; after that it was converted into a raw buffer. This arena was an easy and early solution for reserving space for the allocation of new objects. Given the bloated executable, and taking advantage of the fact that we now support FFI, we changed it a bit. Now we just create a small protoarena of 4 KB and then migrate the GCSpace to a new buffer obtained by calling VirtualAlloc (we still need the protoarena because calling VirtualAlloc requires creating some objects).
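The protoarena idea can be pictured with a toy bump allocator that starts in a small buffer and then moves to a large one. This is only a sketch under stated assumptions: the `GCSpace` class below is a hypothetical Python model, not Bee's GCSpace; a `bytearray` stands in for the memory returned by VirtualAlloc; and copying the used prefix during migration is a guess about how existing objects are kept, not something the post describes.

```python
class GCSpace:
    """Toy bump allocator over a byte buffer (hypothetical model, not Bee's class)."""

    def __init__(self, buffer):
        self.buffer = buffer
        self.next_free = 0

    def allocate(self, size):
        """Reserve `size` bytes and return their offset in the buffer."""
        if self.next_free + size > len(self.buffer):
            raise MemoryError("space exhausted")
        offset = self.next_free
        self.next_free += size
        return offset

    def migrate_to(self, new_buffer):
        """Continue allocating in a larger buffer.

        Assumption: the used prefix is copied so old offsets stay valid.
        """
        if len(new_buffer) < self.next_free:
            raise ValueError("new buffer is smaller than the used area")
        new_buffer[: self.next_free] = self.buffer[: self.next_free]
        self.buffer = new_buffer


PROTOARENA_SIZE = 4 * 1024        # the 4 KB protoarena from the post
ARENA_SIZE = 4 * 1024 * 1024      # stand-in for the VirtualAlloc result

space = GCSpace(bytearray(PROTOARENA_SIZE))
first = space.allocate(64)            # objects needed before the big buffer exists
space.buffer[first] = 0xBE            # mark the first allocated byte
space.migrate_to(bytearray(ARENA_SIZE))
second = space.allocate(1024 * 1024)  # would not have fit in the protoarena
```

The point of the design is that only the 4 KB buffer has to live inside the executable; the large buffer exists only at run time.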

After these changes, without the arena, we have 5.901.176 - 4.194.320 = 1.706.856 bytes. This is a much more comfortable size and, even better, kernel generation became faster too (it now takes 38 seconds vs 52 previously). Even disassembly tools work faster, as they don't have to analyze the arena, which took a lot of time.
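The numbers in this step can be checked quickly. The sketch below only redoes the arithmetic with the figures from the post; reading the 16 bytes beyond 4 MiB as the ByteArray's own header is a guess, not something the post states.

```python
OBJECT_BYTES = 5_901_176   # all objects, arena included
ARENA_BYTES = 4_194_320    # the arena ByteArray

remaining = OBJECT_BYTES - ARENA_BYTES
print(remaining)           # 1706856

# The arena is exactly 4 MiB plus a few extra bytes
# (presumably object header; this is an assumption).
extra = ARENA_BYTES - 4 * 1024 * 1024
print(extra)               # 16

# Kernel generation time, before vs after (seconds).
speedup = 52 / 38
print(f"{speedup:.2f}x")
```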

To finish our analysis, we took a brief look at the PE overhead. In our 29739 objects, there were 130318 pointer slots. Each object also has 1 pointer in its header, so in total there are 160057 pointers. For each pointer we are currently adding a relocation in the PE (which is probably useless). Assuming it takes at least 4 bytes to store a relocation, that would make 640.228 bytes of space taken by relocations. But when we removed all relocations, the file got 2.3 MB smaller, so each relocation seems to consume around 16 bytes.
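The relocation estimate can be reproduced from the counts above. In this sketch the "2.3 MB" figure is read as 2.3 MiB, which is an assumption; with either reading the result lands in the same ballpark as the post's rough per-relocation figure.

```python
OBJECTS = 29_739
POINTER_SLOTS = 130_318

# One header pointer per object plus all pointer slots.
pointers = OBJECTS + POINTER_SLOTS
print(pointers)                    # 160057

# Lower bound if each relocation took only 4 bytes.
estimated_bytes = pointers * 4
print(estimated_bytes)             # 640228

# Observed saving after dropping relocations
# (assumption: "2.3 MB" read as 2.3 MiB).
observed_saving = 2.3 * 1024 * 1024
bytes_per_relocation = observed_saving / pointers
print(round(bytes_per_relocation, 1))
```

The gap between the 4-byte estimate and the observed saving is what suggests that each relocation costs several times more than expected.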

So this executable kernel, which only knows basic things about Numbers, Collections and Streams, and which can read files, do FFI and open libraries, now weighs 1.967.104 bytes. This is awesome! For comparison, Pharo.exe, which includes only the VM, with no Smalltalk objects or methods inside, takes 3.1 MB (to be fair, our kernel doesn't include a GC, but it shouldn't be much bigger with one). We also have the minimal kernel mentioned in previous posts, which doesn't include everything but just what is commonly used. It is just 351.744 bytes. Embedding Smalltalk is ringing louder and louder in my ears...

Here we leave a small exe to play with. It does nothing more than load a library with a test method and execute it. You are not going to see any output, but the return code of the execution should be 3 (a tagged 1, which means success!). To try:

On Linux/Mac with Wine:

$> wine bee.sll loadAndRunLibrary: TestLib.nsll
$> echo $?

Or on Windows (tested on an old WinXP), rename bee.sll to bee.exe and run:

$> bee.exe loadAndRunLibrary: TestLib.nsll

$> echo %errorlevel%
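The return code of 3 for a "tagged 1" is consistent with the common SmallInteger encoding in which the value is shifted left by one bit and the lowest bit is set. The sketch below assumes that encoding; the post does not spell out Bee's tagging scheme, and the function names are hypothetical.

```python
def tag_small_integer(value):
    """Encode an integer as a tagged word: shift left one bit, set the low bit.

    Assumed encoding, chosen because it maps 1 to 3 as in the post.
    """
    return (value << 1) | 1


def untag_small_integer(word):
    """Decode a tagged word back to the integer it carries."""
    if word & 1 == 0:
        raise ValueError("low bit clear: not a tagged integer")
    return word >> 1


print(tag_small_integer(1))      # 3
print(untag_small_integer(3))    # 1
```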

Bee.sll and TestLib.nsll
