The implementation of generic conformance
=========================================

I) The problem and the previous state

The language definition requires that if
we check conformance between two types T1
and T2 then we must take into account generic
parameters, if any. For example, LIST [A]
conforms to LIST [B] if and only if A con-
forms to B.

Internally, in the compiler, this is properly
handled. However, externally, in the generated
system, this was not properly handled.

In the generated system we need to do a conformance
check in one of the following two situations:

1) Assignment attempt
2) A call to 'conforms_to', a feature of class
   GENERAL.

But the compiler was designed in such a way that
it assigns a single id (=an integer) to every type
in the system, subject to the constraint that in
generic derivations all actual generic parameters
of reference type are treated equal, that is,
a LIST [A] and a LIST [B] get the same id if
A and B are both reference types.

The type id is used during object creation and
it is stored in the objects header. Therefore,
at run-time, an object of type LIST [A] was
typewise indistinguishable from an object of
type LIST [B] if both A and B were reference
types. Hence an assignment attempt from one
to the other always succeeded even if A and
B were totally unrelated with respect to
ancestorship.

The generic conformance mechanism solves this
problem.

II) The goals

The first goal was to fully implement the language 
requirements for conformance checks at run-time.

The second goal was to be able to access the i-th
actual generic parameter of the type of an object
'o'.

III) The consequences

Consequence 1:

Given an object 'o' it must be possible at run-time 
to compute 'o's exact type, including, recursively, 
all its possible actual generic parameters.

So if, e.g. 'a' is an object of type LIST [A] and
'aa' one of type LIST [LIST [A]] we must be able
to extract this information from 'a' and 'aa' and
we must be able to extract the type of the actual
generic parameter.

Consequence 2:

We must be able to do the conformance check.

IV) An additional requirement

We must be able to deal with all possible
legal combinations of generic base classes
and generic parameters.

This requirement is not enforced by the
language and is not unanimously accepted so
let me explain:

Suppose in a system you have an entity of
type LIST [ANY] and another entity of type
STRING but you do not have any entity of
type LIST [STRING]. Suppose further that
your system reads a file and this file 
contains a LIST [STRING]:

    local
        l : LIST [ANY]
    do
        l ?= my_file.retrieved
        ...

The question is: should this assignment
attempt fail or not? I have the strong
feeling that it should not fail but
this is debatable. Another example is
a function

    f (a : ANY) : LIST [like a] is do "create the result" end

Then f(a) (a an ANY) should create a LIST [ANY],
f (f (a)) a LIST [LIST [ANY]], etc. ad infinitum.

If one thinks that the assignment attempt from
above should not fail and/or the function should
not be invalid then an immediate consequence is:

Consequence 3:

The set of all possible types occurring in
a given system can be infinite.

Of course this does not mean that in one
execution of one system there can be 
infinitely many types. It means that
there're potentially infinitely many
types - the compiler cannot compute
a complete list. 

Thus the set of all types which may occur
during the exectution of a given system
is in general not computable. And even 
if we forced it to be computable by limiting
the nesting depth of generic derivations,
it'll be huge.

In the sequel the additional assumption IV
will be assumed.

V) The basics of the new mechanism

Under assumption IV the compiler cannot assign
id's to all types in a system because the set
of these may be infinite. Hence the run-time
must do it somehow. The basic idea was then
to assign a sequence of ids to a generic type
and to create appropriate tables (actually only
one) which will enable the run-time to create new 
ids which uniquely identify each generic 
derivation. Such a sequence is called a 'cid' (read:
compound id). We use the ids already created by the 
compiler to build cids.

So if, for example, LIST has id 100, A has
id 200 and B has id 300 then the cids are

 LIST [A] : 100, 200
 LIST [B] : 100, 300

Now, whenever we create an instance of LIST [A]
or ([B]) we put the corresponding cid in the code
and then call a feature in the run-time which
converts the cid (=array of integers) into a
single id (=integer) and this id will be stored
in the header of the object. Different cids
will get different ids. And the ids assigned
to cids are larger than any id generated by
the compiler so that we can tell whether an
id belongs to a generic type or not. In the
above example we may get the ids

    500 for the cid 100, 200
    501 for the cid 100, 300

Conversely, given an object, we get the id from
the header and if the objects type was generic
this id uniquely identifies a cid which in turn
uniquely identifies a type. In the example, if
the type id of the object is 500 then we know
that the cid is 100, 200 and from this we know
that we have a LIST [A] and likewise if the id
is 501 then we know that we have a LIST [B].
 
What about recursion?

What, for example, is the cid for LIST [LIST [A]]?
Assuming the ids given to LIST (100) and A (200)
it's:

  LIST [LIST [A]] : 100, 100, 200

The run-time can handle this because it knows (see
below) that LIST takes one gen. param. The run-time
function which creates new ids will see the first
'100' of the cid. A simple table lookup tells it
that we have a LIST and that a LIST has only one
parameter. It then calls itself recursively on the
remainder of the list (here: 100, 200) and this
returns the id of LIST [A] (=500). Now we have 
a 'condensed' cid: 100, 500 and it gets a new
id (if it doesn't have one already), 502, say.
Then type id = 502 means cid = 100, 100, 200,
which means LIST [LIST [A]].

I think that this sufficies to explain the basic
idea.

VI) Generated tables

To implement it we generate just one table, the
parent table:

For every class C in the system we generate a
table of its parents where a parent class is
simply represented by its cid (which is simply
its id if the parent is non-generic). If C is
generic, e.g. C [T1, T2] and if C has a generic
parent with unresolved formals, e.g. P [T2],
we also code this information into the table.
That is, we know later that C [U, V] conforms
to P [V].

I said that we generate entries for every class;
this is not quite correct: we generate entries
for every generic derivation, that is only one 
entry for LIST [REFERENCE] but different entries 
for LIST [basic] or LIST [expanded].

We do not generate conformance tables any more
(in fact, currently (4.3), we do, but they are not
used any more).

The file in which the frozen/finalized table
resides is 'eparents.c' in the E1 subdirectory
of the generation. Of course we put table 
updates in the melt file, if necessary, and
'update.c' takes care of them.

VII) Conformance checks

It's quite obvious how conformance checks at
run-time can be done:

in an assignment attempt

    x ?= y

the compiler will generate enough info for the
declared type of x, that is, it generates a cid
if x is generic and generates a call into the
run-time which converts the cid into a single
id, otherwise it simply generates the type id
of the target x. If y is not void then it carries
an id which gives full info about its type. Hence
we have two type ids Tx and Ty and we want to know
whether Ty conforms to Tx. We ask the run-time.
It keeps a conformance table (optimized, that is,
packed bit table) for such queries. The table
is initially empty. If there's an entry for
(Ty/Tx) it returns the answer at once. Otherwise
it computes a new table for 'To whom does Ty
conform to?' and then it returns the answer.

Note that this computation is not exhaustive:
if you ask 'does LIST [A] conform to LIST [B]'
then not all possible types to which LIST [A]
may conform to are considered because that is
impossible in general - but it is carefully
recorded in another bit array whether a certain
conformance question has been answered already.

This is perhaps too cryptic so let's have an
example: Assume A, B and C are reference types.
Assume that the system first creates an object
of type LIST [C], then one of type LIST [A]
then one of type LIST [B]. Assume further that
these get ids 500, 501 and 502 resp. Now you
ask: does LIST [A] conform to LIST [B]. A bit
table for LIST [A] (id = 501) is computed as
follows: Suppose LIST [G] has a non-generic
parent P then it is recorded that LIST [A]
conforms to P. In this case id(P) < 501.
If P = P [G] is generic (this is extracted
from the above mentioned parent table) then
we first replace 'G' by 'A' then compute the
id of P [A] (which may be new) and then we
generate an entry in the table for LIST [A].
We process all the parents of LIST [G] and
recursively their parents until we're done
(= reached GENERAL through all the branches).

Now our table has entries for all ids <= 501
(and possibly some entries for ids > 501 if
new generic types have occurred in the process),
in particular one for LIST [C] (id = 500) and
this will in general be 0. But the question
whether or not LIST [A] conforms to LIST [C]
has not been answered in this process and 
LIST [C] has an entry in the table for LIST [A].
So without precaution a subsequent query for
conformance of LIST [A] to LIST [C] would be
answered with 'NO'. This is why we record in
a parallel bit array whether the question has
already been asked.

VIII) Extracting formal generic parameters

In a generic class C [T1, T2,..] we want to
be able to create an entity of formal generic
type Ti. To this end me must be able to
extract the i-th actual generic parameter
from Current. In the example from above
Current would be of type LIST [A] with
type id 500. The run-time keeps an array
of structures indexed by type id with one
entry per generic derivation. So in the
case at hand it would lookup the entry
at index 500 which, among other things,
contains an array of type ids, namely the
actual generics. So we have immediate
access to the i-th actual generic parameter.

However, another table is necessary to
get correct behaviour because when you 
ask "what is the i-th actual generic
parameter of Current?" then it can happen
that Current is not of generic type at all.

Example:

class C [G]

feature
    a : G

    r is do 'create a' end
end

class D inherit C [T] end

If 'r' is called with Current of type D then
we must be able to determine that the actual
generic is T. To this end we compute (if not
already present) ids for all parents of D
and recursively the parents of those until
we have an id for the ancestor type in 
question (here C [T]). Then we use this
id instead of the one stored in Current. 
The ids computed in this manner are stored
in a table (array) for D - the ancestor
id map.

IX) Accessing other tables

Various tables generated by the compiler,
such as feature tables, are indexed by
compiler generated type ids. In the example
from above the id generated by the compiler
for a LIST [REFERENCE] is 100 but the id
generated by the run-time for LIST [A] is
500 and this is stored in the object. We
cannot use 500 as an index to access a
feature of class LIST. Hence the run-time
computes a map (an array) which maps the
ids generated at run-time to the corres-
ponding id generated by the compiler.
This is the eif_cid_map used by the Dtype
macro. Here, eif_cid_map [500] = 100.

X) BIT types

Since BIT types come in various flavours
(BIT 8, BIT 12, etc.) they are treated in
the same ways as generic types: the run-time
assigns ids to them. This enables proper
conformance checks, e.g. between LIST [BIT n]
and LIST [BIT m]. The numeric parameter
is treated as if it were a generic parameter.

XI) TUPLE types

In the cid for a tuple a special code is
used to tell the run-time that it is a 
TUPLE type and the number of actual generics
is part of the cid (it cannot be read from 
a static table - parent table - because
the number of generics is not constant).

XII) anchored types

Anchored types are also encoded in the cid.
For example, the cid for LIST [like attr]
contains the information necessary to get
the exact type of 'attr'. Likewise for
anchors to arguments.

XIII) Optimizations

Although the compiler does not produce
unique ids for generic types it can, in
many cases, predict that the id generated
by the run-time in a certain creation
instruction will be the same in all
executions of that instruction. In this
case the value computed in the first
execution gets cached in a static local
variable. The next execution will not
need a table lookup to get the id for
the cid. Hence performance is nearly
the same as if the compiler had pre-
computed an id. This optimization is
done whenever the cid does not contain
anchors or formal generics. For example,
the id for a creation of a LINKED_LIST [STRING]
will be cached.