Internship Report of Jerome Bou Aziz (June 1999):

I was assigned to work on EiffelBench (known as ebench).
Thus I joined a team composed essentially of Emmanuel Stapf
here in Santa Barbara, but also Michael in Germany, Eric in
France, David Schwartz (for time to time on VMS) and from
time to time, depending on the good will of the INS officer
in charge, Emmanuel Texier (for the run-time).

The first challenge was to get to know ebench deep enough to
be able to work on it: it is a 2500 classes application !!!
Then my work was composed of various little tasks to improve
the compiler, its speed or some bug fixes. To make short a
long story, I choose to present in the rest of this document
one of those tasks: the rewriting of the dead code removal. 

==============================================================================
Table of contents

Introduction

The structure of an Eiffel program seen from the compiler
	The feature id
	The body id
	The routine id

A few word about dependencies
	What are the dependencies ?
	Structure of a server
	Structure of a FEATURE_DEPEDENDANCE

The dead code removal
	Introduction
	The algorithm itself
		The procedure record
		The marking process
			Marking the feature
			Propagation to the clients
			Propagation to the heirs

Problem due to the incrementality of the compiler
	Upgrade of the DEPEND_UNIT
	Upgrade of the CLASS_DEPENDANCE

Conclusion
==============================================================================


Introduction

The main problem with the former dead code removal degree was not a lack of
efficiency in removing dead code but how slowly it was doing it. The challenge
for the new dead code removal was to keep doing exactly the same thing but
much faster.

In fact the rewriting of the dead code removal took place in a larger process
of improvement of the compilation speed. A big customer had a huge system of
more then 3000 classes that took more than 110 hours to finalize. He wanted to
be able to compile it overnight. At the end of the process, he was able to
compile his system in a bit more than 13 hours.

The new dead code removal itself took nearly 2 weeks to come to what it is today.

A few words about the structure of an Eiffel system viewed from inside the
compiler...Got your id ?

An Eiffel program is a set of clusters, each of one is a set of classes, each
of one is a set of feature. Each feature can be either a function or an
attribute.

Inside the compiler the notion of cluster is an artificial notion that we
forget very early. In fact this notion is just taken into account in the
very first degrees of compilation (6 and 5: the parsing degrees) in case of
an override cluster (if two features have the same name, one in the override
cluster, one somewhere else, then it is not an error: the feature in the
override cluster in the right one, the over is to be forgotten).

During the interesting degrees (the compiling ones), the system is viewed as
a set of classes each being a set of features. Internally, this means that a
system is a set of CLASS_C characterized by a unique identifier called
CLASS_ID. A CLASS_C contains a FEATURE_TABLE and an ORIGIN_TABLE which list
all the features in the class.

The feature id

In the FEATURE_TABLE, each feature is identified by a FEATURE_ID The
FEATURE_TABLE is a structure whose interest is local to the class: the
FEATURE_ID is an identifier for a feature in a class, the same FEATURE_ID in
different classes designates different features. Even an inherited feature does
not have the same FEATURE_ID in the parent class and in the heir class. 

The body id

In the ORIGIN_TABLE, each feature is identified by a BODY_ID. On the contrary
of the FEATURE_ID, the BODY_ID is a global identifier: two features with the
same BODY_ID, even in two different classes, are always the same. A BODY_ID is
set to identify a feature body (as suggested in the name). It means that when
you inherit a feature, even if you rename it, it will keep the same BODY_ID in
the heir class. But, obviously, if you redefine it the body of the feature
changes, so does the BODY_ID: an other one will be created for the new feature.
The same thing happens when you modify a feature: it will be incrementally
recompiled and as you changed its body, it will be assigned a new BODY_ID. 

The routine id

So we have a local id, a global id attached to the body of a feature, but for
the moment we have no way to follow the inheritance tree, there is no common
id to the features of a same tree unless we do not redefine anything. An id
common to all the feature of a same line, redefined or not exists: it is called
the ROUTINE_ID because it characterizes a routine line: all the heirs of a same
routine have the same ROUTINE_ID. We do not have to worry about the heir of an
attribute because an attribute cannot be redefined. But what happens in case of
an inheritance diamond? Two features with the same name will become one in the
heir. Concerning the ROUTINE_ID, we have to keep a trace of both lines. We do
that by assigning a list of ROUTINE_ID to the heir feature: in this case the
first id in the list should be the one corresponding to the selected feature.
Note that the table indexed by the ROUTINE_IS is the POLY_TABLE which is filled
at degree minus 2. 

A few word about dependencies

What are the dependencies?

By dependencies,  we mean clientelism dependencies. A class A depends on a
class B if and only if a feature of A uses a feature of B. In other words,
A depends on B if and only if A is a client of B.

The dependencies are stocked in what we call a server. In ebench, a server is
a structure you can use nearly as a hash-table in the sense that it is some
data indexed by a key you can choose freely. Contrary to the hash-table, the
server is designed to be able to manage huge amounts of data using a limited
amount of memory. To accomplish that, it acts like a database: it stocks all
the data on disk using a cache in memory to make the consultation faster.

Structure of the server
	
The server we use for the dependencies is called DEPEND_SERVER. It stores
CLASS_DEPENDANCE indexed by their CLASS_ID. A CLASS_DEPENDANCE is a class that
stores the dependencies of each and every of the features of the class it is
linked to. Technically,  a  CLASS_DEPENDANCE is a hash-table (precisely an
EXTEND_TABLE which is a specialized HASH_TABLE inheriting form nothing and
modified to give a maximum of performance in the server case) storing
FEATURE_DEPENDANCE indexed by the BODY_ID of the studied feature.  To each
feature in the class, corresponds a FEATURE_DEPENDANCE in the CLASS_DEPENDANCE.
Note that it is possible to search for a FEATURE_DEPENDANCE in a
CLASS_DEPENDANCE by BODY_ID of course but also by its name.

Structure of a FEATURE_DEPENDANCE

A FEATURE_DEPENDANCE is a set of DEPEND_UNIT;  each DEPEND_UNIT gives all the
information you need to identify in every table the feature it represents. Each
DEPEND_UNIT in the FEATURE_DEPENDANCE stands for a feature on which depends the
owner of the FEATURE_DEPENDANCE.

Note about the DEPEND_UNIT: as you can see, there is some redundancy in the
DEPEND_UNIT structure: we have several identifiers for a same feature whereas
each of them stands for being globally unique. This was not the case before;
only one unique identifier was kept which let us retrieve the description of
the feature and so the rest of the identifiers. But this process turned out
to be really slow because of disk access and garbage collector overuse. 


The dead code removal

The dead code removal  is mainly accomplished by two classes FEAT_ITERATOR and
REMOVER. To eliminate the dead code, we use the DEPENDANCE information. 
To work, the algorithm needs 3 structures:
* the control list: it is the list of the DEPEND_UNIT still to be treated.
* the marked_table: it is an array of ROUT_ID_SET (list of ROUTINE_ID)
  indexed by BODY_ID. As we have seen, a feature can have more than one
  ROUTINE_ID (case of an inheritance diamond). This structure allows us
  to know if a feature of a certain BODY_ID has already been treated,
  under the view of  what ROUTINE_ID it was considered.
* the used_table: it is an array of BOOLEAN indexed with BODY_ID. At the
  end of the degree, this table will tell us which feature is alive.


The algorithm itself

The algorithm consists on going through the program seen as a tree and to mark
all the feature we can reach from the root (creation feature in the root class
of the system). At the end, all the features reached will be marked in the
used_table. The features not marked in the used_table are dead and should not
be generated.

The procedure record

The entry point in the algorithm is the procedure record: it takes the next
element in the control list and if this element is not already alive, it
launches the marking process on it and goes on and on until the control list
is empty. Note that as all the attributes have to be generated anyway, we do
not consider them in the dead code removal.

	record (feat: FEATURE_I; in_class: CLASS_C) is
			-- Record Eiffel routines reachable by feature `feat' from
			-- static type `in_class'.
		local
			next: FEATURE_I;
			traversal_unit: TRAVERSAL_UNIT;
			a_feature: FEATURE_I;
			dep: DEPEND_UNIT;
			static_class: CLASS_C;
			body_id: BODY_ID
		do
			from
				!! dep.make (in_class.id, feat);
				control.extend (dep)
			until
				control.empty
			loop
				dep := control.item;
				body_id := body_index_table.item (dep.body_index);
				if not (dep.rout_id.is_attribute or else is_alive (body_id.id)) then
					mark (dep.feature_id, dep.body_index, body_id, dep.id,
							dep.written_in, dep.rout_id)
				end;
				control.remove
			end
		ensure
			control_empty: control.empty
		end;


The marking process

The marking process works in 3 phases:
1) marking the feature alive
2) propagation to the clients
3) propagation to the heirs

1) Marking the feature

If the feature is not already alive, then we simply mark it alive.
Note: As far as the dead code removal is concerned, a feature that arrives to
this point is never alive. But the function mark is also used in the array
optimization context and there we do not know.


	mark_and_record (
					feature_id: INTEGER;
					body_index: BODY_INDEX;
					body_id: BODY_ID;
					actual_class_id: CLASS_ID;
					written_class_id: CLASS_ID) is
			-- Mark feature `feat' alive.
			-- (export status {NONE})
		local
			depend_list: FEATURE_DEPENDANCE;
			original_dependances: CLASS_DEPENDANCE;
			just_born: BOOLEAN;
			like_feature: LIKE_FEATURE;
			like_feat: FEATURE_I;
			feature_name: STRING;
			static_class, written_class: CLASS_C;
			depend_feature, original_feature: FEATURE_I;
			a_class: CLASS_C
		do
			just_born := not is_alive (body_id.id);
			debug ("dead_code")
				io.putstring ("MARKING: ");
				a_class := actual_class_id.associated_class;
				io.putstring (a_class.feature_table.feature_of_body_id (body_id).feature_name);
				io.putstring (" (bid: ");
				io.putint (body_id.id);
				io.putstring ("; fid: ");
				io.putint (feature_id);
				io.putstring (") of ");
				io.putstring (a_class.lace_class.name);
				io.putstring (" (");
				io.putint (a_class.id.id);
				io.putstring (") originally in ");
				a_class := written_class_id.associated_class;
				io.putstring (a_class.lace_class.name);
				io.putstring (" (");
				io.putint (a_class.id.id);
				io.putstring (")%N")
			end;
			if just_born then
				mark_alive (body_id.id);
				original_dependances := depend_server.item (written_class_id);
				depend_list := original_dependances.item (body_id);
				if depend_list /= void then
					propagate_feature (written_class_id, body_index, body_id, depend_list)
				end;
				debug ("dead_code")
					io.putstring ("La depend_list contient ");
					if depend_list /= void then
						io.putint (depend_list.count)
					else
						io.putstring ("aucun")
					end;
					io.putstring (" elements.%N");
					io.putstring ("-----------------------------------------------------%N%N%N")
				end
			end;
			debug ("dead_code")
				if not just_born then
					io.putstring ("already alive%N")
				end
			end
		end


2) Propagation to the clients

First, we search for the dependencies of the feature in the DEPEND_SERVER. To do
that, we ask the CLASS_DEPENDANCE corresponding to the class in which the
feature was originally written. Then, in this CLASS_DEPENDANCE, we look for the
FEATURE_DEPENDANCE corresponding to the BODY_ID of the feature.

If the feature has dependencies, for each client we check if it was previously
treated, that is to say the exact feature, exact inheritance branch (same
BODY_ID, same ROUTINE_ID):
* if so, we forget it, it is already alive.
* otherwise, we mark it treated, we mark it dead and we add it to the control
  list. We mark it dead because it may already be alive, treated as belonging
  to an other branch in a multiple inheritance scheme. This way, the new
  branch will be also treated completely (it is important for the phase 3: the
  propagation to the heirs).

	propagate_feature (
					written_class_id: CLASS_ID;
					original_body_index: BODY_INDEX;
					original_body_id: BODY_ID;
					depend_list: FEATURE_DEPENDANCE) is
			-- (export status {NONE})
		local
			depend_unit: DEPEND_UNIT;
			body_id: BODY_ID;
			rout_id: ROUTINE_ID
		do
			from
				depend_list.start
			until
				depend_list.after
			loop
				depend_unit := depend_list.item;
				debug ("dead_code")
					if depend_unit.is_special then
						io.putstring ("special%N")
					end
				end;
				if not depend_unit.is_special then
					body_id := body_index_table.item (depend_unit.body_index);
					debug ("dead_code")
						print_dep (depend_unit);
						if is_treated (body_id.id, depend_unit.rout_id) then
							io.putstring ("previously treated%N")
						end
					end;
					rout_id := depend_unit.rout_id;
					if not (is_treated (body_id.id, rout_id) or else rout_id.is_attribute) then
						mark_treated (body_id.id, rout_id);
						mark_dead (body_id.id);
						debug ("dead_code")
							print ("Since it was not treated, we marked bid: ");
							print (body_id.id);
							print (" dead%N")
						end;
						control.extend (depend_unit)
					end
				end;
				depend_list.forth
			end;
			array_optimizer.process (written_class_id.associated_class, original_body_index, original_body_id, depend_list)
		end;


3) The propagation to the heirs

We retrieve the POLY_TABLE corresponding to the ROUTINE_ID of the feature. In
this table are listed the routines of a same branch: they are the redefinitions
of a routine inherited from higher in the hierarchy.

Then we go through the POLY_TABLE and we treat every feature whose original
class conforms to the original class of the feature we are currently
considering. 

	mark (
		f: INTEGER;
		body_index: BODY_INDEX;
		body_id: BODY_ID;
		static_class_id: CLASS_ID;
		original_class_id: CLASS_ID;
		rout_id_val: ROUTINE_ID) is
			-- Mark feature and its redefinitions
			-- (from FEAT_ITERATOR)
			-- (export status {NONE})
		local
			table: ROUT_TABLE;
			old_position: INTEGER;
			unit: ROUT_ENTRY;
			body_id_unit: BODY_ID
		do
			mark_and_record (f, body_index, body_id, static_class_id, original_class_id);
			if tmp_poly_server.has (rout_id_val) then
				debug ("dead_code")
					print ("%NMarking Poly_table `");
					print (rout_id_val.id);
					print ("%'; for feature written in ");
					print (original_class_id.associated_class.name);
					print ("%N")
				end;
				table ?= tmp_poly_server.item (rout_id_val);
				check
					table_exists: table /= void
				end;
				from
					table.start
				until
					table.after
				loop
					unit := table.item;
					if unit.id.associated_class.simple_conform_to (original_class_id.associated_class) then
						old_position := table.position;
						if not is_alive (unit.body_id.id) then
							debug ("dead_code")
								io.putstring ("marking for rout_id: ");
								io.putint (rout_id_val.id);
								io.putstring ("%N")
							end;
							body_id_unit := body_index_table.item (unit.body_index);
							mark (unit.feature_id, unit.body_index, body_id_unit, unit.id, unit.written_in, rout_id_val)
						end;
						table.go_to (old_position)
					end;
					table.forth
				end
			end
		end;


Problems due to the incrementality of the compiler

Ebench is an incremental compiler. This means that when you make a modification
to your system, it will not recompile the whole system but just the part you
modified and its dependencies.

But when you modify the a feature, by definition its body changes. And so must
do its BODY_ID. But as we have seen the BODY_ID is one of the identifiers that
we save in the DEPEND_UNIT. So each time we change a BODY_ID we must also
upgrade all the DEPEND_UNIT and the CLASS_DEPENDANCE that contains it.

Upgrade of the DEPEND_UNIT

We do not really upgrade all the DEPEND_UNIT that contains it. In fact, we have
in the system a table, the BODY_INDEX_TABLE, that indexes all the BODY_Ids. In
the DEPEND_UNIT, we store a key in this table, the BODY_INDEX and we read the
BODY_ID corresponding in this table each time we need it. This way, when the
BODY_ID changes, we just have to upgrade an entry in this table.

Upgrade of the CLASS_DEPENDANCE

This is far more complicated. the CLASS_DEPENDANCE linked to the class in which
the modified feature was written is itself a table indexed by BODY_ID. So the
FEATURE_DEPENDANCE of the modified feature is the entry in this table of index
the old BODY_ID and should now become the entry of index the new BODY_ID.

The problem is that at the moment the BODY_ID changes, the only information we
have is the old BODY_ID and the new one. With that, we must search in the
DEPEND_SERVER, retrieve the CLASS_DEPENDANCE concerned and upgrade it.

To do so, in the DEPEND_SERVER, we have defined a table, the bid_cid_table to
link a BODY_ID to the CLASS_ID of the class in which it is written. This way,
we are able to retrieve the right CLASS_DEPENDANCE and upgrade it: we delete
the entry with the old BODY_ID and introduce a new one with new BODY_ID
containing the same thing.


Note: If a bug appear concerning a wrong upgrade of BODY_ID, please have a
look at the function change_body_ids from FEATURE_I.


Conclusion

On the technical point of view, although a few bugs at the beginning, I can say
that the new debugger is a real amelioration compared to what was before. It is
not a revolution since the algorithm itself is pretty much the same. But the
new implementation allowed a huge acceleration of the process: on a test
application of 500 classes, the former dead code removal took 5 minutes to work
whereas the new one need only a bit more then 20 seconds !!!

In this document is presented the last version of the debugger. In reality,
I cannot remember exactly how many version were tried before the good one,
but they were a lot. And this is very representative of my internship at ISE:
I did a lot of little things in fields I knew nothing about. This obliged me to
go each time  through a process of try/failure before finding an appropriate
solution.

Thanks to this internship, I learnt better how a OOL is working and also how
important are its various features. Before coming here, I have got to admit that
I did not understand well the importance of having or not multiple inheritance,
or the importance of genericity in the design of a program. But now I realize
how important those concepts are and how they can make life easier to the
poor system designer.

On a more human aspect, I would like to thanks Bertrand Meyer for this
internship, Annie Meyer for her help in the everyday life and I would like
to thanks all the team here at ISE for creating such a great atmosphere. I will
add a special thanks to Emmanuel Stapf, with whom I have worked more closely,
for his support and his help despite the bugs his baby suffered from time
to time because of me.