The implementation of generic conformance ========================================= I) The problem and the previous state The language definition requires that if we check conformance between two types T1 and T2 then we must take into account generic parameters, if any. For example, LIST [A] conforms to LIST [B] if and only if A con- forms to B. Internally, in the compiler, this is properly handled. However, externally, in the generated system, this was not properly handled. In the generated system we need to do a conformance check in one of the following two situations: 1) Assignment attempt 2) A call to 'conforms_to', a feature of class GENERAL. But the compiler was designed in such a way that it assigns a single id (=an integer) to every type in the system, subject to the constraint that in generic derivations all actual generic parameters of reference type are treated equal, that is, a LIST [A] and a LIST [B] get the same id if A and B are both reference types. The type id is used during object creation and it is stored in the objects header. Therefore, at run-time, an object of type LIST [A] was typewise indistinguishable from an object of type LIST [B] if both A and B were reference types. Hence an assignment attempt from one to the other always succeeded even if A and B were totally unrelated with respect to ancestorship. The generic conformance mechanism solves this problem. II) The goals The first goal was to fully implement the language requirements for conformance checks at run-time. The second goal was to be able to access the i-th actual generic parameter of the type of an object 'o'. III) The consequences Consequence 1: Given an object 'o' it must be possible at run-time to compute 'o's exact type, including, recursively, all its possible actual generic parameters. So if, e.g. 'a' is an object of type LIST [A] and 'aa' one of type LIST [LIST [A]] we must be able to extract this information from 'a' and 'aa' and we must be able to extract the type of the actual generic parameter. Consequence 2: We must be able to do the conformance check. IV) An additional requirement We must be able to deal with all possible legal combinations of generic base classes and generic parameters. This requirement is not enforced by the language and is not unanimously accepted so let me explain: Suppose in a system you have an entity of type LIST [ANY] and another entity of type STRING but you do not have any entity of type LIST [STRING]. Suppose further that your system reads a file and this file contains a LIST [STRING]: local l : LIST [ANY] do l ?= my_file.retrieved ... The question is: should this assignment attempt fail or not? I have the strong feeling that it should not fail but this is debatable. Another example is a function f (a : ANY) : LIST [like a] is do "create the result" end Then f(a) (a an ANY) should create a LIST [ANY], f (f (a)) a LIST [LIST [ANY]], etc. ad infinitum. If one thinks that the assignment attempt from above should not fail and/or the function should not be invalid then an immediate consequence is: Consequence 3: The set of all possible types occurring in a given system can be infinite. Of course this does not mean that in one execution of one system there can be infinitely many types. It means that there're potentially infinitely many types - the compiler cannot compute a complete list. Thus the set of all types which may occur during the exectution of a given system is in general not computable. And even if we forced it to be computable by limiting the nesting depth of generic derivations, it'll be huge. In the sequel the additional assumption IV will be assumed. V) The basics of the new mechanism Under assumption IV the compiler cannot assign id's to all types in a system because the set of these may be infinite. Hence the run-time must do it somehow. The basic idea was then to assign a sequence of ids to a generic type and to create appropriate tables (actually only one) which will enable the run-time to create new ids which uniquely identify each generic derivation. Such a sequence is called a 'cid' (read: compound id). We use the ids already created by the compiler to build cids. So if, for example, LIST has id 100, A has id 200 and B has id 300 then the cids are LIST [A] : 100, 200 LIST [B] : 100, 300 Now, whenever we create an instance of LIST [A] or ([B]) we put the corresponding cid in the code and then call a feature in the run-time which converts the cid (=array of integers) into a single id (=integer) and this id will be stored in the header of the object. Different cids will get different ids. And the ids assigned to cids are larger than any id generated by the compiler so that we can tell whether an id belongs to a generic type or not. In the above example we may get the ids 500 for the cid 100, 200 501 for the cid 100, 300 Conversely, given an object, we get the id from the header and if the objects type was generic this id uniquely identifies a cid which in turn uniquely identifies a type. In the example, if the type id of the object is 500 then we know that the cid is 100, 200 and from this we know that we have a LIST [A] and likewise if the id is 501 then we know that we have a LIST [B]. What about recursion? What, for example, is the cid for LIST [LIST [A]]? Assuming the ids given to LIST (100) and A (200) it's: LIST [LIST [A]] : 100, 100, 200 The run-time can handle this because it knows (see below) that LIST takes one gen. param. The run-time function which creates new ids will see the first '100' of the cid. A simple table lookup tells it that we have a LIST and that a LIST has only one parameter. It then calls itself recursively on the remainder of the list (here: 100, 200) and this returns the id of LIST [A] (=500). Now we have a 'condensed' cid: 100, 500 and it gets a new id (if it doesn't have one already), 502, say. Then type id = 502 means cid = 100, 100, 200, which means LIST [LIST [A]]. I think that this sufficies to explain the basic idea. VI) Generated tables To implement it we generate just one table, the parent table: For every class C in the system we generate a table of its parents where a parent class is simply represented by its cid (which is simply its id if the parent is non-generic). If C is generic, e.g. C [T1, T2] and if C has a generic parent with unresolved formals, e.g. P [T2], we also code this information into the table. That is, we know later that C [U, V] conforms to P [V]. I said that we generate entries for every class; this is not quite correct: we generate entries for every generic derivation, that is only one entry for LIST [REFERENCE] but different entries for LIST [basic] or LIST [expanded]. We do not generate conformance tables any more (in fact, currently (4.3), we do, but they are not used any more). The file in which the frozen/finalized table resides is 'eparents.c' in the E1 subdirectory of the generation. Of course we put table updates in the melt file, if necessary, and 'update.c' takes care of them. VII) Conformance checks It's quite obvious how conformance checks at run-time can be done: in an assignment attempt x ?= y the compiler will generate enough info for the declared type of x, that is, it generates a cid if x is generic and generates a call into the run-time which converts the cid into a single id, otherwise it simply generates the type id of the target x. If y is not void then it carries an id which gives full info about its type. Hence we have two type ids Tx and Ty and we want to know whether Ty conforms to Tx. We ask the run-time. It keeps a conformance table (optimized, that is, packed bit table) for such queries. The table is initially empty. If there's an entry for (Ty/Tx) it returns the answer at once. Otherwise it computes a new table for 'To whom does Ty conform to?' and then it returns the answer. Note that this computation is not exhaustive: if you ask 'does LIST [A] conform to LIST [B]' then not all possible types to which LIST [A] may conform to are considered because that is impossible in general - but it is carefully recorded in another bit array whether a certain conformance question has been answered already. This is perhaps too cryptic so let's have an example: Assume A, B and C are reference types. Assume that the system first creates an object of type LIST [C], then one of type LIST [A] then one of type LIST [B]. Assume further that these get ids 500, 501 and 502 resp. Now you ask: does LIST [A] conform to LIST [B]. A bit table for LIST [A] (id = 501) is computed as follows: Suppose LIST [G] has a non-generic parent P then it is recorded that LIST [A] conforms to P. In this case id(P) < 501. If P = P [G] is generic (this is extracted from the above mentioned parent table) then we first replace 'G' by 'A' then compute the id of P [A] (which may be new) and then we generate an entry in the table for LIST [A]. We process all the parents of LIST [G] and recursively their parents until we're done (= reached GENERAL through all the branches). Now our table has entries for all ids <= 501 (and possibly some entries for ids > 501 if new generic types have occurred in the process), in particular one for LIST [C] (id = 500) and this will in general be 0. But the question whether or not LIST [A] conforms to LIST [C] has not been answered in this process and LIST [C] has an entry in the table for LIST [A]. So without precaution a subsequent query for conformance of LIST [A] to LIST [C] would be answered with 'NO'. This is why we record in a parallel bit array whether the question has already been asked. VIII) Extracting formal generic parameters In a generic class C [T1, T2,..] we want to be able to create an entity of formal generic type Ti. To this end me must be able to extract the i-th actual generic parameter from Current. In the example from above Current would be of type LIST [A] with type id 500. The run-time keeps an array of structures indexed by type id with one entry per generic derivation. So in the case at hand it would lookup the entry at index 500 which, among other things, contains an array of type ids, namely the actual generics. So we have immediate access to the i-th actual generic parameter. However, another table is necessary to get correct behaviour because when you ask "what is the i-th actual generic parameter of Current?" then it can happen that Current is not of generic type at all. Example: class C [G] feature a : G r is do 'create a' end end class D inherit C [T] end If 'r' is called with Current of type D then we must be able to determine that the actual generic is T. To this end we compute (if not already present) ids for all parents of D and recursively the parents of those until we have an id for the ancestor type in question (here C [T]). Then we use this id instead of the one stored in Current. The ids computed in this manner are stored in a table (array) for D - the ancestor id map. IX) Accessing other tables Various tables generated by the compiler, such as feature tables, are indexed by compiler generated type ids. In the example from above the id generated by the compiler for a LIST [REFERENCE] is 100 but the id generated by the run-time for LIST [A] is 500 and this is stored in the object. We cannot use 500 as an index to access a feature of class LIST. Hence the run-time computes a map (an array) which maps the ids generated at run-time to the corres- ponding id generated by the compiler. This is the eif_cid_map used by the Dtype macro. Here, eif_cid_map [500] = 100. X) BIT types Since BIT types come in various flavours (BIT 8, BIT 12, etc.) they are treated in the same ways as generic types: the run-time assigns ids to them. This enables proper conformance checks, e.g. between LIST [BIT n] and LIST [BIT m]. The numeric parameter is treated as if it were a generic parameter. XI) TUPLE types In the cid for a tuple a special code is used to tell the run-time that it is a TUPLE type and the number of actual generics is part of the cid (it cannot be read from a static table - parent table - because the number of generics is not constant). XII) anchored types Anchored types are also encoded in the cid. For example, the cid for LIST [like attr] contains the information necessary to get the exact type of 'attr'. Likewise for anchors to arguments. XIII) Optimizations Although the compiler does not produce unique ids for generic types it can, in many cases, predict that the id generated by the run-time in a certain creation instruction will be the same in all executions of that instruction. In this case the value computed in the first execution gets cached in a static local variable. The next execution will not need a table lookup to get the id for the cid. Hence performance is nearly the same as if the compiler had pre- computed an id. This optimization is done whenever the cid does not contain anchors or formal generics. For example, the id for a creation of a LINKED_LIST [STRING] will be cached.