Improving GC performance

GC itself

We should try to optimize scavenging, which in my opinion is not fulfilling its role as well as it should.

Generated code

Comparison to Boehm GC

When an application is linked with the Boehm GC instead of ours, it runs in about 75% of the time. Yet the time spent in the GC itself clearly favors ISE: our GC spends half as much time collecting, and the application uses 34MB less memory (51MB versus 85MB). So the remaining difference has to come from the generated code, and the major difference there between Boehm GC and our GC is stack management: ours is manual, whereas Boehm's relies on scanning the hardware (C) stack.

Configuration	Time	Memory
Boehm	3m4s	85MB
Boehm in MT	3m9s	85MB
ISE	4m9s	51MB
ISE in MT	5m17s	51MB

What stands out is the gap between the non-MT and MT versions of ISE: 68s (about 27%), compared with only 5s for Boehm. The reason is that our GC relies heavily on thread-specific data, whereas the Boehm-linked code does not use it at all. As a consequence, every stack management routine goes through an indirection (`eif_globals'), which is the only difference I can see at this point.

I suspected that EIF_GET_CONTEXT was costly because it might be implemented inefficiently, but this does not seem to be the case. To test it, instead of using EIF_GET_CONTEXT to retrieve `eif_globals', I passed `eif_globals' as the first argument of all our routines. This avoids a call but puts more on the stack. Speed improves, but only by 8s out of more than 5m (about 2.5%), and the executable shrinks by 102,400 bytes (about 1.3%).

Configuration	Time	Executable size
ISE in MT using EIF_GET_CONTEXT	5m17s	7,843,840 bytes
ISE in MT using argument passing of eif_globals	5m9s	7,741,440 bytes

Improving stack management

Currently, a global variable `loc_set' tracks all references pushed on the stack. It works like this:

void Eiffel_routine (EIF_REFERENCE Current, EIF_REFERENCE arg1, EIF_INTEGER arg2) {
	EIF_REFERENCE loc1 = NULL;
	EIF_REFERENCE loc2 = NULL;
	RTLI (4);
	RTLR(0, Current);
	RTLR(1, arg1);
	RTLR(2, loc1);
	RTLR(3, loc2);
	...
	RTLE;
}

A new idea would be to group all references in a single C structure and register it with one call:

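void Eiffel_routine (EIF_REFERENCE Current, EIF_REFERENCE arg1, EIF_INTEGER arg2) {
	struct locals {
		EIF_REFERENCE loc1;
		EIF_REFERENCE loc2;
		EIF_REFERENCE Current;
		EIF_REFERENCE arg1;
	} l;
		/* Zero only loc1 and loc2, which come first in the structure. */
	memset(&l, 0, 2 * sizeof(EIF_REFERENCE));
		/* Copy the reference arguments so the GC can update them. */
	l.Current = Current;
	l.arg1 = arg1;
		/* Register all 4 references with a single loc_set entry. */
	add_loc_set (&l, 4);
	...	/* The body must use l.Current, l.arg1, etc., not the parameters. */
	remove_loc_set;
}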

The advantages I see are:

Only one address computation is required, and `loc_set' always grows by exactly one element per call (less resizing of `loc_set').
Instead of zeroing each `locx' variable manually, we use `memset', which might be more efficient.

The disadvantages I see are:

The C stack frame might be larger, since we reserve space in it for copies of all reference arguments.
`memset' might not be inlined on some platforms, in which case it could be less efficient.