Connecting the dots

We've been working very hard this year. The seed that was planted years ago has been growing underground, and we are now starting to see the first leaves pop out. After Smalltalks 2013 we had the chance to pause, take an overview and open up to wider perspectives. After some time, we got back to daily work and thought about the current situation and the next steps. We opened this blog to let people know about both the progress of this project and the ideas around it. Now that the year is over, we'd like to share an overview and our vision of the future.

We have lots of stuff going on here, and many things almost working that need a last push to get integrated. You know, that 1% boring stuff.

We now have this small kernel working with libraries, which is the base of the system. We also have a jitter, a garbage collector, and the starting point of a debugger, all of them written in Smalltalk, but none of them is plugged in yet. So we are asking ourselves: what do we do next, what do we need, and how do we make all this actually work together? They are already written; we just have to plug them in.

The work on libraries mentioned in the previous posts opened a lot of doors. From this point on we have started plugging things in and fixing what breaks without the host VM. The compiler, the jitter, the GC, browsers: what should go first? We think that we can go without GC for now, and it's an interesting idea to see how much can be done without it. I mean, what if you disabled the GC for a while? How long would your system run if you let it use as much memory as it wants without collecting? We are going to find out soon.

Connecting Bee dots!
So for now we decided to leave the GC unplugged and go for the JIT first. This will allow us to use plain SLLs, without native code, since generating the native code is what the JIT brings. After that, the compiler is next, and with it we regain a live Smalltalk. There is also a possibility that would be awesome: to compile on a remote host and inject the compiled methods into the guest. That is a live Smalltalk that needs neither a compiler nor a JIT. Think of the footprint of that: it is minuscule.

There is also a lot of performance work to be done to tune the runtime, but we want to do it rigorously and measure the impact. This involves two sides of the same coin: benchmarking, in order to compare many different algorithms and language implementations, and profiling, so that we can detect the hot areas that need improvement. We have been working to port SMark to Bee; fortunately it is very well written, an excellent piece of work. We are still missing the port of our profiler, but we almost have a small transcript-like profile log. There are some areas that we know need optimization and that are very easy to improve (low-hanging fruit). Re-enabling the PIC is the most prominent, but at least a monomorphic cache that patches the call site will do for now. Then, we also have to move to a hashed lookup instead of a linear one (yes, right now we are traversing the whole method dictionary on every send).
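To make those two optimizations concrete, here is a minimal sketch in Python (not Bee code). The names Behavior, linear_lookup, hashed_lookup and CallSite are made up for this example, and a real PIC or call-site patch works on native code rather than on objects like these. It only shows the idea: a hashed dictionary replaces the scan over every method, and a monomorphic cache skips the lookup entirely while the receiver class stays the same.

```python
class Behavior:
    """Hypothetical stand-in for a Smalltalk class."""

    def __init__(self, name, superclass=None):
        self.name = name
        self.superclass = superclass
        self.method_list = []   # (selector, function) pairs, scanned linearly
        self.method_dict = {}   # same methods, hashed by selector

    def add_method(self, selector, function):
        self.method_list.append((selector, function))
        self.method_dict[selector] = function


def linear_lookup(behavior, selector):
    """Slow path: scan every entry of every class in the chain."""
    while behavior is not None:
        for name, function in behavior.method_list:
            if name == selector:
                return function
        behavior = behavior.superclass
    return None


def hashed_lookup(behavior, selector):
    """One hash probe per class in the chain instead of a full scan."""
    while behavior is not None:
        function = behavior.method_dict.get(selector)
        if function is not None:
            return function
        behavior = behavior.superclass
    return None


class CallSite:
    """Monomorphic cache: remembers one receiver class and its method."""

    def __init__(self, selector, lookup=hashed_lookup):
        self.selector = selector
        self.lookup = lookup
        self.cached_class = None
        self.cached_method = None
        self.misses = 0

    def send(self, receiver_class, receiver, *args):
        if receiver_class is not self.cached_class:
            # Cache miss: do the full lookup and "patch" the site.
            self.misses += 1
            method = self.lookup(receiver_class, self.selector)
            if method is None:
                raise AttributeError(f"doesNotUnderstand: #{self.selector}")
            self.cached_class = receiver_class
            self.cached_method = method
        return self.cached_method(receiver, *args)


object_class = Behavior("Object")
object_class.add_method("yourself", lambda receiver: receiver)
point_class = Behavior("Point", object_class)
point_class.add_method("x", lambda receiver: receiver[0])

site = CallSite("x")
first = site.send(point_class, (3, 4))
second = site.send(point_class, (5, 6))  # same class: served from the cache
```

A polymorphic cache would keep a small list of (class, method) pairs per call site instead of a single pair; the single pair is the cheaper version that will do for now.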

Moving on to another area, we lack browsers. To make browsers work we need to rethink things related to Process, revise the signaling primitives and check some parts of the window messaging architecture.
Then we have to choose which one we need first; I think the debugger and the inspector. We made some tests during Smalltalks 2013 to implement an out-of-process debugger, where we made an initial implementation of halt with a command loop. It was really promising but a lot was missing, so we have to work on it. What I would like is to be able to debug local and remote images transparently. The first is going to be easier and the second super fun. Besides, the remote debugger should be able to work in two modes: assisted and unassisted. In assisted mode, the debugger issues commands to the guest asking it to step, continue, etc., by sending messages that the guest responds to. In unassisted mode, the debugger halts the guest and manipulates its memory directly. Assisted mode will be the most useful, but unassisted mode will let us do a lot of fancy things, like fixing broken images and making kernel changes at runtime.
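The difference between the two modes can be sketched in a few lines of Python. This is only an illustration under heavy assumptions: Guest, AssistedDebugger, UnassistedDebugger, the command names and the bytearray standing in for guest memory are all invented here, and a plain method call stands in for whatever transport a real out-of-process debugger would use.

```python
def store(address, value):
    """Build a toy instruction that writes one byte of guest memory."""
    def instruction(guest):
        guest.memory[address] = value
    return instruction


class Guest:
    """Hypothetical halted guest: instructions, a program counter, raw memory."""

    def __init__(self, instructions):
        self.instructions = instructions
        self.pc = 0
        self.memory = bytearray(16)

    def execute_one(self):
        if self.pc < len(self.instructions):
            self.instructions[self.pc](self)
            self.pc += 1

    def handle(self, command):
        """Body of the halt command loop: answer one debugger message."""
        if command == "step":
            self.execute_one()
        elif command == "continue":
            while self.pc < len(self.instructions):
                self.execute_one()
        elif command != "where":
            return {"ok": False, "error": f"unknown command {command}"}
        done = self.pc >= len(self.instructions)
        return {"ok": True, "pc": self.pc, "done": done}


class AssistedDebugger:
    """Assisted mode: every action is a message the guest must answer."""

    def __init__(self, send):
        self.send = send  # callable that delivers a command, returns the reply

    def step(self):
        return self.send("step")

    def resume(self):
        return self.send("continue")


class UnassistedDebugger:
    """Unassisted mode: touch guest memory directly, no guest cooperation."""

    def __init__(self, memory):
        self.memory = memory

    def peek(self, address):
        return self.memory[address]

    def poke(self, address, value):
        self.memory[address] = value


guest = Guest([store(0, 7), store(1, 9)])
assisted = AssistedDebugger(guest.handle)
after_step = assisted.step()          # guest runs one instruction and replies

raw = UnassistedDebugger(guest.memory)
raw.poke(2, 42)                       # patch memory behind the guest's back
after_resume = assisted.resume()      # guest runs to the end
```

The assisted path only works while the guest's command loop is healthy, which is why the unassisted path matters for broken images.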

In the research department there are so many ideas popping out of my brain that I feel my head is going to explode soon. We showed some bits of native multithreading at Smalltalks this year. It's very interesting, but it is going to be really hard to debug, so we'll need better tools before we continue trying that. There are a lot of experiments to do there. So before that I'd like to experiment with the optimizations described in Ungar's Self, mainly inlining. Also, I was thinking about compiling to different architectures. In our design, almost everything ends up in the Assembler as a backend. And this is a really nice Assembler in one sense: it doesn't target an explicit architecture but a virtual one, which was implemented loosely following this Wirfs-Brock tech report. It's a stack-based architecture with a small number of registers to store context (receiver, argument, temporary, method context). In our case, the assembler targets Intel x86, but we could implement another one that targets ARM, or maybe LLVM bytecode or even asm.js. These last things are probably not too close yet.
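As a rough picture of what retargeting could look like, here is a small Python sketch. Everything in it is an assumption made for illustration: the class names, the three virtual operations, and in particular the register and frame choices (receiver in eax or r0, an argument slot at offset 8), which are not Bee's actual conventions. The point is only that the code generator talks to the virtual operations and each backend decides how to spell them.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Maps virtual-machine operations to one concrete instruction set."""

    @abstractmethod
    def load_receiver_from_frame(self):
        ...

    @abstractmethod
    def push_receiver(self):
        ...

    @abstractmethod
    def return_to_caller(self):
        ...


class X86Backend(Backend):
    # Assumption: receiver register is eax, frame pointer is ebp.
    def load_receiver_from_frame(self):
        return "mov eax, [ebp+8]"

    def push_receiver(self):
        return "push eax"

    def return_to_caller(self):
        return "ret"


class ArmBackend(Backend):
    # Assumption: receiver register is r0, frame pointer is fp.
    def load_receiver_from_frame(self):
        return "ldr r0, [fp, #8]"

    def push_receiver(self):
        return "push {r0}"

    def return_to_caller(self):
        return "bx lr"


class VirtualAssembler:
    """Front end used by the code generator; knows nothing about the target."""

    def __init__(self, backend):
        self.backend = backend
        self.lines = []

    def load_receiver(self):
        self.lines.append(self.backend.load_receiver_from_frame())
        return self

    def push_receiver(self):
        self.lines.append(self.backend.push_receiver())
        return self

    def ret(self):
        self.lines.append(self.backend.return_to_caller())
        return self


def emit_sample(backend):
    """Emit the same virtual sequence for whichever backend is plugged in."""
    return VirtualAssembler(backend).load_receiver().push_receiver().ret().lines


x86_listing = emit_sample(X86Backend())
arm_listing = emit_sample(ArmBackend())
```

The same call to emit_sample yields an x86 listing or an ARM listing depending only on the backend object passed in.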

Finally, our code tracking is not as good as we would like, and to open up the project we'll need something better. We are thinking of using git to track sources, and will probably use Cypress for that. We are porting it to Bee, but we still have to find a way to integrate git into our development process and to stop managing changeset files by hand. Only after this will we be able to have a continuous integration server with automatic testing and profiling.

There are so many interesting topics and we want to wrap everything up, but we must focus on one item at a time if we want to be successful. I should go back to plugging in the JIT now.

Happy new year!


  1. Sounds interesting. Two things:

    1) Where can I get it and have a look myself, if at all possible?
    2) I would personally go for Mercurial rather than Git. It's just much easier to use and
    the implementation is much more sane.

    1. Hi Jan, I'm glad you asked. We're still working on opening the project. I believe that we should plug in the JIT first (or else no new code can be used), and we can leave the GC for later. We have been thinking about how to put everything in files so that other people can at least see the code.

      Regarding git vs. Mercurial, I think that both are probably just fine. For the use we are going to give it, it's just a matter of which one is more popular with other people.

      Hope to have more news soon, cheers

    2. That would be great if you release it. I'm experimenting (well, just started, time is scarce :-) with GC for truly multithreaded systems and this system looks like a nice fit for an experimental environment :-)

