Hash Tables

Hash tables are a convenient mechanism to store and retrieve objects identified by unique keys.

Why use hash tables?

The main advantage of hash tables is the efficiency of the basic operations: store (put) and retrieve (item, remove).
The idea behind hash tables is to try to emulate the data structure that provides the ultimate in efficiency: the array. On an array a, for some integer i whose value lies within the bounds ofa, the basic operations are

a.put (x, i)
x := a.item (i)
x := a @ i

The first causes the value of a at index i to be x; the second (and the third, which is simply a syntactical variant) access the value at index i and assign it to x. With the usual computer architectures, these operations are very fast: because arrays items are stored contiguously in memory, a computer will need just one addition (base address plus index) and one memory access to perform a put or item.
Not only are the operation times small; they are constant (or more precisely bounded by a constant). This is a great advantage over structures such as lists or trees which you must traverse at least in part to retrieve an item, so that access and modification times grow with the number of items. With an array, disregarding the influence of other factors such as memory paging, the time for a put or item is for all practical purposes the same whether the array has five items or five hundred thousand. These properties make arrays excellent data structures for keeping objects. Unfortunately, they are only applicable if the objects satisfy three requirements:

Hash tables may be viewed as a rehabilitation mechanism for objects that do not naturally possess these three properties. If we are unable to find a natural index, we can sometimes devise an artificial one. To do so we must be able to find a key. Each key must uniquely identify the corresponding object; this is the same as property A2, making keys similar to indices. But keys are not necessarily integers (violating property A1), although it must be possible to associate an integer with each key. The mechanism that maps keys to integers is called the hashing function.
Thanks to the hashing mechanism we will indeed be able to store suitable objects into arrays, approaching the optimal efficiency of this data structure. The efficiency will not be quite as good, however, for two reasons:

With good implementations, however, it is possible to use hash tables with a performance that is not much worse than that of arrays and, most importantly, may be treated as if the time for a put, an item or a remove were constant. This will mean that you can consider operations such as

h.put (x, k)
h := a.item (k)

where h is a hash-table and k is a key (for example a string) as conceptually equivalent to the array operations mentioned above.
The quality of a hashed implementation will depend both on the data structure that will store the objects, and on the choice of hashing function. Class HASH_TABLE attempts to address the first concern; for the second, client developers will be responsible for choosing the proper hashing function, although Base provides a few predefined functions, in particular for class STRING.

When hash tables are appropriate

You may keep objects in a hash table if for each one of these objects you can find a key that uniquely identifies it. The objects and their keys may be of many possible kinds:

What this practically means is that in all cases you will need, in the generating class of the objects to be stored, a query (attribute or function) that gives the key. The type of the key is highly variable but must in all cases be a descendant of HASHABLE. This is true of both INTEGER (case H1) and STRING (case H2). The requirements for being a HASHABLE are not harsh: all you need is a function hash_code that returns a non-negative integer.

Using hash tables

Class HASH_TABLE takes two generic parameters:

class HASH_TABLE [G, H -> HASHABLE] ...

G represents the type of the objects to be stored in the hash table, H the type of their keys.
When viewed as an implementation of containers, HASH_TABLE, in a strict sense, represents bags rather than sets: unlike the other classes in this chapter, it allows an object to have two or more distinct occurrences in a single container. But this is only true if we consider a hash table as a repository of objects of type G. In reality each item of the table is identified by a pair of values, one from G and one from H. Because the keys must uniquely identify objects, the hash table viewed as a container of such pairs is indeed a set. The creation procedure make takes an integer argument, as in

create my_table.make (n)

The value of n indicates how many items the hash table is expected to have to accommodate. This number of items is not a hardwired size, just information passed to the class. In particular:

It is useful, however, to use a reasonable n upon creation: not too large to avoid wasting space, but not too small to avoid frequent applications of resizing, an expensive operation.