Using the XML parser
Event interfaces are the lowest level of communication with an XML parser. An event interface is a deferred class containing callback routines. Sources of events, such as a parser, have routines to attach a descendant of the event interface.
For each event interface there is a purely deferred class declaring the callbacks, from which clients inherit, and a 'source' class, from which event sources such as the parser inherit. For the main XML content events, the event interface is XM_CALLBACKS and the source is XM_CALLBACKS_SOURCE, which provides a set_callbacks feature; the parser inherits from it.
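The following sketch models the split between an event interface and its source. It is in Python rather than Eiffel so it runs standalone. `Callbacks` plays the role of the purely deferred class (via `abc`), `CallbacksSource` provides `set_callbacks`, and `ToyParser` and `Recorder` are hypothetical stand-ins for a parser and a client. The routine names are illustrative and are not the exact XM_CALLBACKS signatures.

```python
from abc import ABC, abstractmethod


class Callbacks(ABC):
    """Event interface: a 'deferred' class whose routines are callbacks."""

    @abstractmethod
    def on_start_tag(self, namespace, prefix, local_part): ...

    @abstractmethod
    def on_content(self, content): ...

    @abstractmethod
    def on_end_tag(self, namespace, prefix, local_part): ...


class CallbacksSource:
    """Event source: anything that emits events inherits set_callbacks."""

    def __init__(self):
        self.callbacks = None

    def set_callbacks(self, callbacks):
        self.callbacks = callbacks


class ToyParser(CallbacksSource):
    """Hypothetical source that emits the events for a single element."""

    def parse_element(self, name, text):
        # Namespace and prefix are None: nothing is resolved at this level.
        self.callbacks.on_start_tag(None, None, name)
        self.callbacks.on_content(text)
        self.callbacks.on_end_tag(None, None, name)


class Recorder(Callbacks):
    """Client: inherits the event interface and implements every callback."""

    def __init__(self):
        self.events = []

    def on_start_tag(self, namespace, prefix, local_part):
        self.events.append(("start", local_part))

    def on_content(self, content):
        self.events.append(("content", content))

    def on_end_tag(self, namespace, prefix, local_part):
        self.events.append(("end", local_part))


parser = ToyParser()
recorder = Recorder()
parser.set_callbacks(recorder)
parser.parse_element("greeting", "hello")
print(recorder.events)
# [('start', 'greeting'), ('content', 'hello'), ('end', 'greeting')]
```

The client never calls the parser's internals. It only inherits the interface and is attached through the source's `set_callbacks`.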
DTD events are covered separately, for parsers that support them, using XM_DTD_CALLBACKS and XM_DTD_CALLBACKS_SOURCE (with set_dtd_callbacks).
The public interface of XML parsers is represented in the deferred class XM_PARSER. Parsers are event sources, inheriting from the event source classes to provide set_callbacks and set_dtd_callbacks. The input document is set using parse_from_stream and similar features. If the parser supports it, which can be checked with is_incremental, incremental parsing routines are available to parse a document one chunk at a time.
Errors can be collected from the parser, but they are also forwarded as events through the event interface. An event filter stream, as described below, can produce its own errors that are not reflected in the parser (the event source). For that reason it is usually more sensible to collect errors downstream.
Several concrete parsers are available, which are descendants of this interface. The pure Eiffel parser is XM_EIFFEL_PARSER. The parser making use of the Expat C library is XM_EXPAT_PARSER. These classes can be created directly.
Because Expat adds an external C library dependency, a factory class is available: XM_EXPAT_PARSER_FACTORY. The value of is_expat_available depends on whether Expat has been compiled in, so code can act portably, for instance by falling back to the Eiffel parser.
On top of the event interface, the XML library provides a set of filters and a framework for using them. The filters are arranged in a stream, in a manner similar to a Unix shell pipeline.
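The fallback decision can be sketched as follows, modeled in Python. `ExpatParserFactory`, its constructor argument and `new_expat_parser` are hypothetical stand-ins. Only the idea of checking `is_expat_available` before choosing a parser comes from the text above. In the real library the flag reflects how the program was compiled, not a constructor argument.

```python
class EiffelParser:
    """Stand-in for the pure Eiffel parser, which is always available."""
    kind = "eiffel"


class ExpatParser:
    """Stand-in for a parser that wraps the optional Expat C library."""
    kind = "expat"


class ExpatParserFactory:
    """Hypothetical factory: is_expat_available reflects a build-time choice."""

    def __init__(self, compiled_with_expat):
        self.is_expat_available = compiled_with_expat

    def new_expat_parser(self):
        # Callers are expected to check is_expat_available first.
        if not self.is_expat_available:
            raise RuntimeError("Expat was not compiled in")
        return ExpatParser()


def new_parser(factory):
    """Portable client code: prefer Expat, else fall back to the Eiffel parser."""
    if factory.is_expat_available:
        return factory.new_expat_parser()
    return EiffelParser()


print(new_parser(ExpatParserFactory(True)).kind)   # expat
print(new_parser(ExpatParserFactory(False)).kind)  # eiffel
```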
Each component of a filter pipe is a descendant of a filter base class, XM_CALLBACKS_FILTER for content events, which has a next attribute. The default implementation of each event simply forwards the event to the next filter, so a filter that uses only a few events needs to redefine only those routines. Redefined routines are expected to do their processing and then forward the event to the next filter, for instance using Precursor. The class provides two routines that can be used as creation procedures. make_null sets next to a filter that does nothing on each event. This null filter, XM_CALLBACKS_NULL for content events, lets each component be used at any position in a pipe, including at the end, with the next filter set whenever convenient, while maintaining the invariant that next is not Void. set_next, the second creation procedure, sets next to a given filter.
From an Eiffel typing viewpoint, every filter in the stream has the same type, so each filter can be placed at any position in the pipe. Some filters may have extra ordering dependencies (one must come before another) that the static type system does not capture. This seems acceptable given the flexibility of the system, and because many practical filters can indeed be placed anywhere in a pipe. Each filter is also a small component with a clear interface. That gives much better encapsulation than some other event filter patterns, such as having each stage inherit from the previous one, which couples the components tightly.
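The filter base class pattern can be modeled in Python as below. The class names are hypothetical counterparts of XM_CALLBACKS_FILTER and XM_CALLBACKS_NULL, and the optional constructor argument stands in for the choice between make_null and set_next. `super()` plays the role of Eiffel's `Precursor`. Only three events are modeled.

```python
class Callbacks:
    """Event interface with empty default routines (illustrative subset)."""

    def on_start_tag(self, namespace, prefix, local_part):
        pass

    def on_content(self, content):
        pass

    def on_end_tag(self, namespace, prefix, local_part):
        pass


class CallbacksNull(Callbacks):
    """Null filter: does nothing on each event."""


class CallbacksFilter(Callbacks):
    """Filter base class: every event is forwarded unchanged to next."""

    def __init__(self, next_filter=None):
        # No argument: like make_null, so next is never None.
        # With an argument: like using set_next as the creation procedure.
        self.next = next_filter if next_filter is not None else CallbacksNull()

    def set_next(self, next_filter):
        self.next = next_filter

    def on_start_tag(self, namespace, prefix, local_part):
        self.next.on_start_tag(namespace, prefix, local_part)

    def on_content(self, content):
        self.next.on_content(content)

    def on_end_tag(self, namespace, prefix, local_part):
        self.next.on_end_tag(namespace, prefix, local_part)


class UpperCaseContent(CallbacksFilter):
    """Redefines only on_content: processes, then forwards via super()."""

    def on_content(self, content):
        super().on_content(content.upper())


class Collector(Callbacks):
    """End of the pipe: records content events."""

    def __init__(self):
        self.items = []

    def on_content(self, content):
        self.items.append(content)


upper = UpperCaseContent()   # created "null": safe even as the last stage
upper.on_content("dropped")  # forwarded to the null filter and discarded
sink = Collector()
upper.set_next(sink)         # the next filter is bound when convenient
upper.on_content("hello")
print(sink.items)            # ['HELLO']
```

Because `next` always points at some filter, no stage needs a Void check before forwarding.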
The content events are the core of the XML parser interface. They cover elements and attributes, in addition to less fundamental features such as comments and processing instructions. There are also events called at the start and at the end of parsing.
All events of XM_CALLBACKS that take tag or attribute names follow the same convention: the signature includes the namespace (a string representing the namespace URI), the name prefix and the local part. The parser is not expected to resolve namespaces. A filter introduced below resolves them, replacing the unresolved namespaces (Void) with resolved ones for everything downstream of it in the filter pipe. Whether a namespace is set can be checked with has_namespace.
To keep the interface consistently simple, it has only atomic events whose parameters are strings, never data structures. Data structures are built downstream, or as intermediate internal structures of a specific filter. In particular, this means there is one event per attribute.
A set of standard content event filters is available in the library. The factory class XM_CALLBACKS_FILTER_FACTORY provides creation routines, plus convenience routines to build pipes and bind the filters to each other. The filters can also be created directly; the factory is only there for convenience.
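To see what "atomic events with string parameters" means in practice, the Python sketch below flattens one element into its event sequence. Each name is split into prefix and local part, the namespace is left as None (Void), and each attribute gets its own event. `emit_element` and `split_qname` are hypothetical helpers. The `on_start_tag_finish` event marking the end of the attribute list is an assumption of this sketch.

```python
def split_qname(qname):
    """Split 'x:lang' into ('x', 'lang'); an unprefixed name gives (None, name)."""
    prefix, sep, local_part = qname.partition(":")
    return (prefix, local_part) if sep else (None, qname)


def emit_element(callbacks, qname, attributes, text):
    """Flatten one element into atomic, string-only events.

    namespace is always None (Void): the parser leaves it unresolved.
    """
    prefix, local_part = split_qname(qname)
    callbacks.on_start_tag(None, prefix, local_part)
    for name, value in attributes:  # one event per attribute
        a_prefix, a_local = split_qname(name)
        callbacks.on_attribute(None, a_prefix, a_local, value)
    callbacks.on_start_tag_finish()
    callbacks.on_content(text)
    callbacks.on_end_tag(None, prefix, local_part)


class Recorder:
    """Records every event as a tuple of strings (or None)."""

    def __init__(self):
        self.events = []

    def on_start_tag(self, namespace, prefix, local_part):
        self.events.append(("start_tag", namespace, prefix, local_part))

    def on_attribute(self, namespace, prefix, local_part, value):
        self.events.append(("attribute", namespace, prefix, local_part, value))

    def on_start_tag_finish(self):
        self.events.append(("start_tag_finish",))

    def on_content(self, content):
        self.events.append(("content", content))

    def on_end_tag(self, namespace, prefix, local_part):
        self.events.append(("end_tag", namespace, prefix, local_part))


# Events for: <x:item id="7" x:lang="en">text</x:item>
recorder = Recorder()
emit_element(recorder, "x:item", [("id", "7"), ("x:lang", "en")], "text")
for event in recorder.events:
    print(event)
```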
XM_PRETTY_PRINT_FILTER is a filter that prints out the event stream as an XML document, to the standard output, or a string. It can be placed anywhere in the stream, which may be convenient for debugging.
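A pretty-printing filter is useful for debugging precisely because it both writes and forwards. The Python model below writes each event as XML text to any writable target and then passes the event on unchanged, so it can be spliced in anywhere. The class and helper names are hypothetical, and character escaping is deliberately omitted, which a real printer must not do.

```python
import io


class NullCallbacks:
    """Does nothing on each event."""

    def on_start_tag(self, namespace, prefix, local_part):
        pass

    def on_attribute(self, namespace, prefix, local_part, value):
        pass

    def on_start_tag_finish(self):
        pass

    def on_content(self, content):
        pass

    def on_end_tag(self, namespace, prefix, local_part):
        pass


def qualified(prefix, local_part):
    return f"{prefix}:{local_part}" if prefix else local_part


class PrettyPrintFilter(NullCallbacks):
    """Writes events as XML text to out, then forwards them to next.

    Escaping of '<', '&' and quotes is omitted to keep the sketch short.
    """

    def __init__(self, out, next_filter=None):
        self.out = out
        self.next = next_filter if next_filter is not None else NullCallbacks()

    def on_start_tag(self, namespace, prefix, local_part):
        self.out.write("<" + qualified(prefix, local_part))
        self.next.on_start_tag(namespace, prefix, local_part)

    def on_attribute(self, namespace, prefix, local_part, value):
        self.out.write(f' {qualified(prefix, local_part)}="{value}"')
        self.next.on_attribute(namespace, prefix, local_part, value)

    def on_start_tag_finish(self):
        self.out.write(">")
        self.next.on_start_tag_finish()

    def on_content(self, content):
        self.out.write(content)
        self.next.on_content(content)

    def on_end_tag(self, namespace, prefix, local_part):
        self.out.write(f"</{qualified(prefix, local_part)}>")
        self.next.on_end_tag(namespace, prefix, local_part)


buffer = io.StringIO()  # a string target; sys.stdout would also work
printer = PrettyPrintFilter(buffer)
printer.on_start_tag(None, None, "a")
printer.on_attribute(None, None, "id", "1")
printer.on_start_tag_finish()
printer.on_content("hi")
printer.on_end_tag(None, None, "a")
print(buffer.getvalue())  # <a id="1">hi</a>
```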
Validation and namespace resolving filters appear in most standard pipes. XM_END_TAG_CHECKER generates an error event if an end tag does not match its start tag. XM_NAMESPACE_RESOLVER reads XML namespace declaration attributes (these events are not forwarded downstream) and adds a resolved namespace URI to all outgoing names.
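Because attributes arrive as separate events after the start tag, a namespace resolver cannot resolve the tag name until it has seen all of that tag's attributes. The simplified Python model below therefore buffers the start tag and its attributes as an internal structure, consumes `xmlns` declarations, and keeps a stack of scopes. This is a sketch of the technique, not XM_NAMESPACE_RESOLVER's implementation. Its simplifications are assumptions: `""` stands for "no namespace", undeclared prefixes are not reported as errors, the `xml` prefix is not predeclared, and only the events used here are handled.

```python
class Recorder:
    """Records forwarded events."""

    def __init__(self):
        self.events = []

    def on_start_tag(self, namespace, prefix, local_part):
        self.events.append(("start_tag", namespace, prefix, local_part))

    def on_attribute(self, namespace, prefix, local_part, value):
        self.events.append(("attribute", namespace, prefix, local_part, value))

    def on_start_tag_finish(self):
        self.events.append(("start_tag_finish",))

    def on_end_tag(self, namespace, prefix, local_part):
        self.events.append(("end_tag", namespace, prefix, local_part))


class NamespaceResolver:
    """Simplified resolver: buffers a tag until on_start_tag_finish."""

    def __init__(self, next_filter):
        self.next = next_filter
        self.scopes = [{}]  # stack of {prefix: URI}; key "" = default namespace
        self.pending_tag = None
        self.pending_attributes = []

    def lookup(self, prefix):
        return self.scopes[-1].get(prefix or "", "")

    def on_start_tag(self, namespace, prefix, local_part):
        self.pending_tag = (prefix, local_part)
        self.pending_attributes = []

    def on_attribute(self, namespace, prefix, local_part, value):
        self.pending_attributes.append((prefix, local_part, value))

    def on_start_tag_finish(self):
        scope = dict(self.scopes[-1])  # inherit enclosing declarations
        kept = []
        for prefix, local_part, value in self.pending_attributes:
            if prefix is None and local_part == "xmlns":
                scope[""] = value  # default namespace declaration
            elif prefix == "xmlns":
                scope[local_part] = value  # prefixed declaration
            else:
                kept.append((prefix, local_part, value))
        self.scopes.append(scope)
        prefix, local_part = self.pending_tag
        self.next.on_start_tag(self.lookup(prefix), prefix, local_part)
        for a_prefix, a_local, value in kept:
            # Unprefixed attributes are in no namespace, per XML Namespaces.
            uri = self.lookup(a_prefix) if a_prefix else ""
            self.next.on_attribute(uri, a_prefix, a_local, value)
        self.next.on_start_tag_finish()

    def on_end_tag(self, namespace, prefix, local_part):
        # Resolve before popping: the element's own declarations still apply.
        self.next.on_end_tag(self.lookup(prefix), prefix, local_part)
        self.scopes.pop()


# <root xmlns="urn:a" xmlns:b="urn:b" b:id="1"><b:child/></root>
recorder = Recorder()
resolver = NamespaceResolver(recorder)
resolver.on_start_tag(None, None, "root")
resolver.on_attribute(None, None, "xmlns", "urn:a")
resolver.on_attribute(None, "xmlns", "b", "urn:b")
resolver.on_attribute(None, "b", "id", "1")
resolver.on_start_tag_finish()
resolver.on_start_tag(None, "b", "child")
resolver.on_start_tag_finish()
resolver.on_end_tag(None, "b", "child")
resolver.on_end_tag(None, None, "root")
for event in recorder.events:
    print(event)
```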
Without XM_STOP_ON_ERROR_FILTER, the event flow may continue after an error. This filter stops all event forwarding from the first error, which it remembers for later use (has_error and last_error). It is useful in most standard pipes: an error condition is better collected here, where it includes errors from the preceding filters, than in the parser itself.
To produce the output as a tree structure (descendants of XM_NODE), use the filter XM_CALLBACKS_TO_TREE_FILTER. It expects resolved namespaces, so it belongs downstream of the namespace resolver.
XM_SHARED_STRINGS_FILTER saves memory, and possibly comparison time, by making all equal strings point to a single instance. Downstream consumers must then treat the strings as immutable. The sharing crosses event categories: for instance, if some content happens to equal an element name, both events receive the same string object.
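The Python model below shows why the stop-on-error stage is the natural place to collect errors. An end tag checker upstream reports a mismatch as an error event, and the stop-on-error filter records the first error and forwards nothing afterwards. The class names are hypothetical counterparts of the library filters. Not forwarding the error event itself is a choice made for this sketch.

```python
class Recorder:
    def __init__(self):
        self.events = []

    def on_start_tag(self, namespace, prefix, local_part):
        self.events.append(("start", local_part))

    def on_end_tag(self, namespace, prefix, local_part):
        self.events.append(("end", local_part))

    def on_error(self, message):
        self.events.append(("error", message))


class EndTagChecker:
    """Sends an error event downstream when an end tag does not match."""

    def __init__(self, next_filter):
        self.next = next_filter
        self.open_tags = []

    def on_start_tag(self, namespace, prefix, local_part):
        self.open_tags.append((prefix, local_part))
        self.next.on_start_tag(namespace, prefix, local_part)

    def on_end_tag(self, namespace, prefix, local_part):
        expected = self.open_tags.pop() if self.open_tags else None
        if expected == (prefix, local_part):
            self.next.on_end_tag(namespace, prefix, local_part)
        else:
            self.next.on_error(f"end tag {local_part} does not match start tag")

    def on_error(self, message):
        self.next.on_error(message)


class StopOnErrorFilter:
    """Forwards events until the first error, then forwards nothing.

    The first error is kept in has_error / last_error.
    """

    def __init__(self, next_filter):
        self.next = next_filter
        self.has_error = False
        self.last_error = None

    def on_start_tag(self, namespace, prefix, local_part):
        if not self.has_error:
            self.next.on_start_tag(namespace, prefix, local_part)

    def on_end_tag(self, namespace, prefix, local_part):
        if not self.has_error:
            self.next.on_end_tag(namespace, prefix, local_part)

    def on_error(self, message):
        if not self.has_error:  # remember only the first error
            self.has_error = True
            self.last_error = message


sink = Recorder()
stop = StopOnErrorFilter(sink)
checker = EndTagChecker(stop)  # pipe: checker -> stop -> sink
# Events for the malformed input <a><b></a></a>
checker.on_start_tag(None, None, "a")
checker.on_start_tag(None, None, "b")
checker.on_end_tag(None, None, "a")  # mismatch: b is still open
checker.on_end_tag(None, None, "a")  # matches, but the pipe has stopped
print(sink.events)      # [('start', 'a'), ('start', 'b')]
print(stop.last_error)  # end tag a does not match start tag
```

The error originates in a filter, not in the parser, which is why querying the parser alone would miss it.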
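Building a tree from events only needs a stack of open elements. The Python sketch below is a minimal model of that idea. `Element` is a hypothetical stand-in for the XM_NODE hierarchy, and attributes are keyed by (namespace URI, local part), which only makes sense once namespaces are resolved. Using `""` for an attribute with no namespace is an assumption of the sketch.

```python
class Element:
    """Minimal tree node (a stand-in for the XM_NODE hierarchy)."""

    def __init__(self, namespace, name):
        self.namespace = namespace
        self.name = name
        self.attributes = {}  # (namespace URI, local part) -> value
        self.children = []    # Element instances and text strings


class TreeBuilder:
    """Builds a tree from namespace-resolved events using a stack."""

    def __init__(self):
        self.document = None
        self.stack = []

    def on_start_tag(self, namespace, prefix, local_part):
        element = Element(namespace, local_part)
        if self.stack:
            self.stack[-1].children.append(element)
        else:
            self.document = element  # first element is the root
        self.stack.append(element)

    def on_attribute(self, namespace, prefix, local_part, value):
        self.stack[-1].attributes[(namespace, local_part)] = value

    def on_start_tag_finish(self):
        pass

    def on_content(self, content):
        if self.stack:  # ignore text outside the root element
            self.stack[-1].children.append(content)

    def on_end_tag(self, namespace, prefix, local_part):
        self.stack.pop()


# Resolved events for <doc xmlns="urn:x" lang="en"><p>hi</p></doc>
builder = TreeBuilder()
builder.on_start_tag("urn:x", None, "doc")
builder.on_attribute("", None, "lang", "en")
builder.on_start_tag_finish()
builder.on_start_tag("urn:x", None, "p")
builder.on_start_tag_finish()
builder.on_content("hi")
builder.on_end_tag("urn:x", None, "p")
builder.on_end_tag("urn:x", None, "doc")
root = builder.document
print(root.name, root.children[0].name, root.children[0].children)  # doc p ['hi']
```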
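String sharing amounts to interning through a table: the first instance of each value is kept and reused for every later equal string, whatever event it arrives in. The Python model below uses a dictionary for the table. The names are hypothetical. In Eiffel, STRING objects are mutable, which is why downstream code must not modify shared strings. Python strings are already immutable, so the sketch only demonstrates the sharing.

```python
class Recorder:
    def __init__(self):
        self.received = []

    def on_start_tag(self, namespace, prefix, local_part):
        self.received.append(local_part)

    def on_content(self, content):
        self.received.append(content)


class SharedStringsFilter:
    """Replaces each string by the first equal instance seen, across all
    event categories, then forwards the event."""

    def __init__(self, next_filter):
        self.next = next_filter
        self.pool = {}

    def share(self, s):
        if s is None:
            return None
        return self.pool.setdefault(s, s)  # first instance wins

    def on_start_tag(self, namespace, prefix, local_part):
        self.next.on_start_tag(
            self.share(namespace), self.share(prefix), self.share(local_part)
        )

    def on_content(self, content):
        self.next.on_content(self.share(content))


sink = Recorder()
shared = SharedStringsFilter(sink)
# Two equal strings built separately at run time.
shared.on_start_tag(None, None, "".join(["ti", "tle"]))
shared.on_content("".join(["ti", "tle"]))
tag_name, content = sink.received
print(tag_name == content, tag_name is content)  # True True
```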
To finish this section, here is an example of a filter pipe using the factory class convenience routine callbacks_pipe, which simply binds the next attribute of each filter in an array to the following filter and returns the first element:
    ...
    inherit
        XM_CALLBACKS_FILTER_FACTORY
    ...
        a_parser: XM_PARSER
    ...
        a_parser.set_callbacks (callbacks_pipe (<<new_end_tag_checker, new_namespace_resolver, new_stop_on_error, new_tree_builder>>))
    ...
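The binding performed by callbacks_pipe can be sketched in Python as a walk over adjacent pairs. This models the behavior described above, not the library's code. `Wrap` and `Collect` are hypothetical filters, and only the content event is modeled.

```python
class NullCallbacks:
    """Does nothing on each event."""

    def on_content(self, content):
        pass


class Filter(NullCallbacks):
    """Filter base: forwards to next, which defaults to a null filter."""

    def __init__(self):
        self.next = NullCallbacks()

    def set_next(self, next_filter):
        self.next = next_filter

    def on_content(self, content):
        self.next.on_content(content)


class Wrap(Filter):
    """Hypothetical filter: wraps the content in its label, then forwards."""

    def __init__(self, label):
        super().__init__()
        self.label = label

    def on_content(self, content):
        super().on_content(f"{self.label}({content})")


class Collect(Filter):
    """Last stage: records what reaches it, then forwards to its null next."""

    def __init__(self):
        super().__init__()
        self.items = []

    def on_content(self, content):
        self.items.append(content)
        super().on_content(content)


def callbacks_pipe(filters):
    """Bind each filter's next to the following filter; return the first."""
    for current, following in zip(filters, filters[1:]):
        current.set_next(following)
    return filters[0]


collect = Collect()
head = callbacks_pipe([Wrap("a"), Wrap("b"), collect])
head.on_content("x")
print(collect.items)  # ['b(a(x))']
```

Keeping a separate reference to `collect` is what allows the result to be read after the events have flowed through the pipe.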
In a real program, references may be kept to individual filters, to recover the result or check their state after processing. XM_TREE_CALLBACKS_PIPE provides a standard pipe with attributes for the interesting component filters.
By default, the parsers do not resolve external entities and produce an error if an external entity or a DTD is used. To use entities, an external resolver must be set with the parser's set_resolver routine. This routine sets a single resolver used for both external DTDs and external entities; there are also routines to set each of these separately.
A resolver is a class that opens a KI_CHARACTER_INPUT_STREAM given a system identifier (a string). An error is produced if no corresponding stream can be found. It is the responsibility of the client to close the stream.
Default concrete resolvers are provided: XM_FILE_EXTERNAL_RESOLVER, which uses system identifiers as local file names, and XM_STRING_EXTERNAL_RESOLVER, which resolves them in memory using a hash table of strings. There is also a null resolver, which is the default resolver of a parser.
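An in-memory resolver and a null resolver can be modeled in Python as below. `io.StringIO` stands in for KI_CHARACTER_INPUT_STREAM. The `resolve` / `last_stream` / `has_error` / `last_error` protocol is a hypothetical shape chosen for the sketch, not a confirmed Gobo signature. Note that the client closes the stream, as the text requires.

```python
import io


class StringExternalResolver:
    """Resolves system identifiers from an in-memory table of strings."""

    def __init__(self, strings):
        self.strings = strings
        self.last_stream = None
        self.has_error = False
        self.last_error = None

    def resolve(self, system_id):
        if system_id in self.strings:
            self.last_stream = io.StringIO(self.strings[system_id])
            self.has_error, self.last_error = False, None
        else:
            self.last_stream = None
            self.has_error = True
            self.last_error = f"cannot resolve {system_id}"


class NullExternalResolver(StringExternalResolver):
    """Resolves nothing: every system identifier produces an error."""

    def __init__(self):
        super().__init__({})


resolver = StringExternalResolver({"chapter.xml": "<p>hi</p>"})
resolver.resolve("chapter.xml")
stream = resolver.last_stream
try:
    text = stream.read()
finally:
    stream.close()  # closing the stream is the client's job
print(text)  # <p>hi</p>

null = NullExternalResolver()
null.resolve("chapter.xml")
print(null.has_error, null.last_error)  # True cannot resolve chapter.xml
```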
http://www.gobosoft.com