Improving GC performance

GC itself

We should try to optimize the scavenging, which in my opinion is not fulfilling its role as well as it should.

Generated code

Comparison to Boehm GC

When an application is linked with the Boehm GC instead of our GC, it runs in about 75% of the time taken by the same application linked with our GC. However, the time spent in the GC itself is very different, to the advantage of the ISE GC, which also uses less memory: it spends half as much time in the GC and uses about 35MB less than the Boehm GC. So the only difference that remains is the generated code, and between the Boehm GC and our GC the one major difference is stack management: ours is manual, whereas Boehm's relies on the hardware stack.

               Time     Memory
Boehm          3m4s     85MB
Boehm in MT    3m9s     85MB
ISE            4m9s     51MB
ISE in MT      5m17s    51MB

What is striking is the difference between non-MT and MT in ISE. The reason is that thread-specific data is used extensively in the ISE GC, but not at all in Boehm. As a consequence, all the stack management routines go through an indirection ("eif_globals"), which is the only difference I can see at this point.
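The timings in the table above can be turned into ratios to make the comparison concrete. The sketch below only redoes that arithmetic; the helper names (`to_seconds`, `overhead_percent`, `print_ratios`) are illustrative and not part of any runtime.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a "XmYs" timing from the table into seconds. */
static int to_seconds(int minutes, int seconds) {
	assert(minutes >= 0 && seconds >= 0 && seconds < 60);
	return minutes * 60 + seconds;
}

/* Extra time taken by `slow' relative to `fast', in percent. */
static double overhead_percent(int fast, int slow) {
	assert(fast > 0);
	return 100.0 * (slow - fast) / fast;
}

/* Print the ratios discussed in the text, using the measured timings. */
static void print_ratios(void) {
	int boehm = to_seconds(3, 4), boehm_mt = to_seconds(3, 9);
	int ise = to_seconds(4, 9), ise_mt = to_seconds(5, 17);

	printf("Boehm time as %% of ISE time: %.1f\n", 100.0 * boehm / ise);
	printf("MT overhead, Boehm: %.1f%%\n", overhead_percent(boehm, boehm_mt));
	printf("MT overhead, ISE:   %.1f%%\n", overhead_percent(ise, ise_mt));
}
```

With these figures, Boehm runs in roughly 74% of the ISE time, and switching to MT costs Boehm under 3% while it costs ISE about 27%.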

I thought that EIF_GET_CONTEXT had a cost because it might be inefficient, but that does not seem to be the case. To test it, instead of using EIF_GET_CONTEXT to retrieve `eif_globals', I passed it as the first argument of all our routines. This avoids a call, but puts more on the stack. We get an improvement in speed, but only 8s out of more than 5m. We also get an improvement in the size of the generated code.

                                                   Time     Executable Size
ISE in MT using EIF_GET_CONTEXT                    5m17s    7,843,840 bytes
ISE in MT using argument passing of eif_globals    5m9s     7,741,440 bytes
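The two calling styles compared above can be sketched as follows. This is only an illustration under stated assumptions: `context_t`, `get_context`, `push_with_lookup`, `push_with_argument` and `pop_reference` are hypothetical names, and the per-thread lookup is modeled with a C11 `_Thread_local' variable, which is not necessarily how EIF_GET_CONTEXT is implemented.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_CAPACITY 64

/* Hypothetical stand-in for the per-thread runtime context ("eif_globals"). */
typedef struct context {
	void **slots[SLOT_CAPACITY];	/* addresses of registered references */
	size_t top;			/* number of slots in use */
} context_t;

/* Assumption: one context per thread, modeled with C11 thread-local storage. */
static _Thread_local context_t thread_context;

/* Style 1: every routine looks its context up (the role of EIF_GET_CONTEXT). */
static context_t *get_context(void) {
	return &thread_context;
}

static void push_with_lookup(void **reference) {
	context_t *ctx = get_context();	/* one lookup per routine call */
	assert(ctx->top < SLOT_CAPACITY);
	ctx->slots[ctx->top++] = reference;
}

/* Style 2: the caller passes the context as first argument, so no lookup. */
static void push_with_argument(context_t *ctx, void **reference) {
	assert(ctx->top < SLOT_CAPACITY);
	ctx->slots[ctx->top++] = reference;
}

/* Drop the most recently registered reference. */
static void pop_reference(context_t *ctx) {
	assert(ctx->top > 0);
	ctx->top--;
}
```

In both styles every access still goes through a pointer to the context, which would be consistent with the small gain measured above: removing the lookup call does not remove the indirection.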

Improving stack management

At the moment we have a global variable `loc_set' which tracks all references pushed on the stack. It works as follows:

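void Eiffel_routine (EIF_REFERENCE Current, EIF_REFERENCE arg1, EIF_INTEGER arg2) {
	EIF_REFERENCE loc1 = NULL;
	EIF_REFERENCE loc2 = NULL;
	RTLI (4);		/* Reserve 4 slots in `loc_set'. */
	RTLR(0, Current);	/* Register each reference, one slot at a time. */
	RTLR(1, arg1);
	RTLR(2, loc1);
	RTLR(3, loc2);
	...
	RTLE;			/* Release the 4 slots on exit. */
}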
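To make the mechanism concrete, here is a minimal, self-contained sketch of such a manual stack of references. It is not the runtime's code: `reference_t`, `loc_stack`, `frame_begin`, `frame_register`, `frame_end` and `count_live_roots` are hypothetical names that only mimic the roles of EIF_REFERENCE, `loc_set', RTLI, RTLR, RTLE and a collector walking the roots.

```c
#include <assert.h>
#include <stddef.h>

typedef void *reference_t;	/* hypothetical stand-in for EIF_REFERENCE */

#define LOC_STACK_CAPACITY 1024

/* Addresses of all reference variables currently live on the C stack. */
static reference_t *loc_stack[LOC_STACK_CAPACITY];
static size_t loc_top = 0;

/* Reserve `count' slots and return the index of the first (role of RTLI). */
static size_t frame_begin(size_t count) {
	assert(loc_top + count <= LOC_STACK_CAPACITY);
	size_t base = loc_top;
	loc_top += count;
	return base;
}

/* Record the address of one variable in a reserved slot (role of RTLR). */
static void frame_register(size_t base, size_t index, reference_t *variable) {
	loc_stack[base + index] = variable;
}

/* Release the slots reserved by the matching frame_begin (role of RTLE). */
static void frame_end(size_t base) {
	loc_top = base;
}

/* What a collector could do: walk every registered variable.
   Here it only counts the variables holding a non-NULL reference. */
static size_t count_live_roots(void) {
	size_t live = 0;
	for (size_t i = 0; i < loc_top; i++)
		if (*loc_stack[i] != NULL)
			live++;
	return live;
}

/* Same shape as the routine above: 4 references, registered one by one. */
static size_t example_routine(reference_t current, reference_t arg1) {
	reference_t loc1 = NULL;
	reference_t loc2 = NULL;
	size_t base = frame_begin(4);
	frame_register(base, 0, &current);
	frame_register(base, 1, &arg1);
	frame_register(base, 2, &loc1);
	frame_register(base, 3, &loc2);
	loc1 = arg1;			/* routine body: loc1 becomes non-NULL */
	size_t live = count_live_roots();
	frame_end(base);
	return live;
}
```

Because the stack stores the address of each variable rather than its value, a moving collector could also update the variables in place. The price is one store per reference on every routine entry.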

A new idea would be to do:

void Eiffel_routine (EIF_REFERENCE Current, EIF_REFERENCE arg1, EIF_INTEGER arg2) {
	struct locals {
		EIF_REFERENCE loc1;
		EIF_REFERENCE loc2;
		EIF_REFERENCE Current;
		EIF_REFERENCE arg1;
	} l;
	/* Clear only `loc1' and `loc2', the first two fields. */
	memset(&l, 0, 2 * sizeof(EIF_REFERENCE));
	l.Current = Current;
	l.arg1 = arg1;
	/* Register the 4 references with a single call. */
	add_loc_set (&l, 4);
	...
	remove_loc_set;
}
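A possible shape for the two new primitives is sketched below. The source does not define them, so everything here is an assumption: `add_loc_set' is taken to push one (address, count) pair, `remove_loc_set' is written as a function rather than a macro, and `reference_t`, `loc_block_t`, `loc_blocks` and `count_live_roots` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *reference_t;	/* hypothetical stand-in for EIF_REFERENCE */

/* One entry per routine activation: where its references start, and how many. */
typedef struct {
	reference_t *first;
	size_t count;
} loc_block_t;

#define LOC_SET_CAPACITY 1024
static loc_block_t loc_blocks[LOC_SET_CAPACITY];
static size_t loc_set_top = 0;

/* Assumed behavior: register a whole block of references in one step. */
static void add_loc_set(void *block, size_t count) {
	assert(loc_set_top < LOC_SET_CAPACITY);
	loc_blocks[loc_set_top].first = (reference_t *) block;
	loc_blocks[loc_set_top].count = count;
	loc_set_top++;
}

/* Assumed behavior: forget the most recently registered block. */
static void remove_loc_set(void) {
	assert(loc_set_top > 0);
	loc_set_top--;
}

/* What a collector could do: walk every reference of every block. */
static size_t count_live_roots(void) {
	size_t live = 0;
	for (size_t b = 0; b < loc_set_top; b++)
		for (size_t i = 0; i < loc_blocks[b].count; i++)
			if (loc_blocks[b].first[i] != NULL)
				live++;
	return live;
}

static size_t example_routine(reference_t current, reference_t arg1) {
	struct locals {
		reference_t loc1;
		reference_t loc2;
		reference_t current;
		reference_t arg1;
	} l;
	/* Walking the struct as an array relies on the absence of padding. */
	_Static_assert(sizeof(struct locals) == 4 * sizeof(reference_t),
		"struct locals must be laid out as 4 contiguous references");
	memset(&l, 0, 2 * sizeof(reference_t));	/* clears loc1 and loc2 */
	l.current = current;
	l.arg1 = arg1;
	add_loc_set(&l, 4);
	l.loc1 = l.arg1;			/* routine body */
	size_t live = count_live_roots();
	remove_loc_set();
	return live;
}
```

In this sketch, registration costs one push per routine instead of one per reference, and the routine body must access its references through `l'.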

The advantages I see are:

The disadvantages I see are: