<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>

<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage 2.0">
<title>Geyacc: Context Dependency</title>
</head>

<body bgcolor="#FFFFFF">

<table border="0" width="100%">
    <tr>
        <td><font size="6"><strong>Context Dependency</strong></font></td>
        <td align="right"><a href="precedence.html"><img
        src="image/previous.gif" alt="Previous" border="0"
        width="40" height="40"></a><a href="error.html"><img
        src="image/next.gif" alt="Next" border="0" width="40"
        height="40"></a></td>
    </tr>
</table>

<hr size="1">

<p>The <em>geyacc</em> paradigm is to parse tokens first, then
group them into larger syntactic units. In many languages, the
meaning of a token is affected by its context. Although this
violates the <em>geyacc</em> paradigm, certain techniques (known
as &quot;kludges&quot;) may enable you to write <em>geyacc</em>
parsers for such languages. (Actually, &quot;kludge&quot; means
any technique that gets its job done but is neither clean nor
robust.)</p>

<h2>Semantic Info in Token Types</h2>

<p>The C language has a context dependency: the way an identifier
is used depends on what its current meaning is. For example,
consider this:</p>

<blockquote>
    <pre><font color="#808000">foo (x);</font></pre>
</blockquote>

<p>This looks like a function call statement, but if <font
color="#808000"><tt>foo</tt></font> is a typedef name, then this
is actually a declaration of <font color="#808000"><tt>x</tt></font>.
How can a <em>geyacc</em> parser for C decide how to parse this
input?</p>

<p>The method could be to have two different token types, <font
color="#FF0000"><tt>IDENTIFIER</tt></font> and <font
color="#FF0000"><tt>TYPENAME</tt></font>. When <font
color="#008080"><em><tt>read_token</tt></em></font> finds an
identifier, it looks up the current declaration of the identifier
in order to decide which token type to return: <font
color="#FF0000"><tt>TYPENAME</tt></font> if the identifier is
declared as a typedef, <font color="#FF0000"><tt>IDENTIFIER</tt></font>
otherwise.</p>

<p>The grammar rules can then express the context dependency by
the choice of token type to recognize. <font color="#FF0000"><tt>IDENTIFIER</tt></font>
is accepted as an expression, but <font color="#FF0000"><tt>TYPENAME</tt></font>
is not. <font color="#FF0000"><tt>TYPENAME</tt></font> can start
a declaration, but <font color="#FF0000"><tt>IDENTIFIER</tt></font>
cannot. In contexts where the meaning of the identifier is not
significant, such as in declarations that can shadow a typedef
name, either <font color="#FF0000"><tt>TYPENAME</tt></font> or <font
color="#FF0000"><tt>IDENTIFIER</tt></font> is accepted <font
face="Symbol">-</font> there is one rule for each of the two
token types.</p>

<p>This technique is simple to use if the decision of which kinds
of identifiers to allow is made at a place close to where the
identifier is parsed. But in C this is not always so: C allows a
declaration to redeclare a typedef name provided an explicit type
has been specified earlier:</p>

<blockquote>
    <pre><font color="#808000">typedef int foo, bar, lose;
static foo (bar); /* redeclare `bar' as static variable */
static int foo (lose); /* redeclare `foo' as function */</font></pre>
</blockquote>

<p>Unfortunately, the name being declared is separated from the
declaration construct itself by a complicated syntactic structure
<font face="Courier New">-</font> the <em>declarator</em>.</p>

<p>As a result, the part of <em>geyacc</em> parser for C needs to
be duplicated, with all the nonterminal names changed: once for
parsing a declaration in which a typedef name can be redefined,
and once for parsing a declaration in which that can't be done.
Here is a part of the duplication, with actions omitted for
brevity:</p>

<blockquote>
    <pre><font color="#800080">initdcl</font><font
color="#0000FF">: </font><font color="#800080">declarator maybeasm </font><font
color="#FF0000">'='</font><font color="#800080"> init</font><font
color="#0000FF">
    | </font><font color="#800080">declarator maybeasm</font><font
color="#0000FF">
    ;

</font><font color="#800080">notype_initdcl</font><font
color="#0000FF">: </font><font color="#800080">notype_declarator maybeasm </font><font
color="#FF0000">'='</font><font color="#800080"> init</font><font
color="#0000FF">
    | </font><font color="#800080">notype_declarator maybeasm</font><font
color="#0000FF">
    ;</font></pre>
</blockquote>

<p>Here <font color="#800080"><tt>initdcl</tt></font> can
redeclare a typedef name, but <font color="#800080"><tt>notype_initdcl</tt></font>
cannot. The distinction between <font color="#800080"><tt>declarator</tt></font>
and <font color="#800080"><tt>notype_declarator</tt></font> is
the same sort of thing.</p>

<p>There is some similarity between this technique and a lexical
tie-in (described next), in that information which alters the
lexical analysis is changed during parsing by other parts of the
program. The difference is here the information is global, and is
used for other purposes in the program. A true lexical tie-in has
a special-purpose flag controlled by the syntactic context.</p>

<h2><a name="tie_in">Lexical Tie-ins</a></h2>

<p>One way to handle context-dependency is the <em><strong>lexical
tie-in</strong></em>: a flag which is set by <em>geyacc</em>
actions, whose purpose is to alter the way tokens are parsed. For
example, suppose we have a language vaguely like C, but with a
special construct <font color="#808000"><tt>hex (HEX-EXPR)</tt></font>.
After the keyword <font color="#808000"><tt>hex</tt></font> comes
an expression in parentheses in which all integers are
hexadecimal. In particular, the token <font color="#808000"><tt>a1b</tt></font>
must be treated as an integer rather than as an identifier if it
appears in that context. Here is how you can do it:</p>

<blockquote>
    <pre><font color="#800080">expr</font><font color="#0000FF">: </font><font
color="#FF0000">IDENTIFIER</font><font color="#0000FF">
        { </font><font color="#008080">...</font><font
color="#0000FF"> }
    | </font><font color="#800080">constant</font><font
color="#0000FF">
        { </font><font color="#008080">...</font><font
color="#0000FF"> }
    | </font><font color="#FF0000">HEX '('</font><font
color="#0000FF">
        { </font><font color="#008080"><em>hexflag </em>:=<em> True</em></font><font
color="#0000FF"> }
      </font><font color="#800080">expr</font><font
color="#0000FF"> </font><font color="#FF0000">')'</font><font
color="#0000FF">
        { </font><font color="#008080"><em>hexflag </em>:=<em> False</em>;</font><font
color="#0000FF"> $$ </font><font color="#008080">:=</font><font
color="#0000FF"> $4 }
    | </font><font color="#800080">expr </font><font
color="#FF0000">'+' </font><font color="#800080">expr</font><font
color="#0000FF">
        { $$ </font><font color="#008080">:= <em>sum</em> (</font><font
color="#0000FF">$1</font><font color="#008080">,</font><font
color="#0000FF"> $3</font><font color="#008080">)</font><font
color="#0000FF"> }
    ...
    ;

</font><font color="#800080">constant</font><font color="#0000FF">: </font><font
color="#FF0000">INTEGER</font><font color="#0000FF">
        { </font><font color="#008080">...</font><font
color="#0000FF"> }
    | </font><font color="#FF0000">STRING</font><font
color="#0000FF">
        { </font><font color="#008080">...</font><font
color="#0000FF"> }
    ;
</font><em>...</em><font color="#0000FF">
%%
    </font><font color="#008080"><em>hexflag</em>:<em> BOOLEAN
</em></font><em>...</em></pre>
</blockquote>

<p>Here we assume that <font color="#008080"><em><tt>read_token</tt></em></font>
looks at the value of <font color="#008080"><em><tt>hexflag</tt></em></font>;
when it is true, all integers are parsed in hexadecimal, and
tokens starting with letters are parsed as integers if possible.</p>

<h2>Lexical Tie-ins and Error Recovery</h2>

<p>Lexical tie-ins make strict demands on any <a
href="error.html">error recovery rules</a> you have. The reason
for this is that the purpose of an error recovery rule is to
abort the parsing of one construct and resume in some larger
construct. For example, in C-like languages, a typical error
recovery rule is to skip tokens until the next semicolon, and
then start a new statement, like this:</p>

<blockquote>
    <pre><font color="#800080">stmt</font><font color="#0000FF">: </font><font
color="#800080">expr</font><font color="#0000FF"> </font><font
color="#FF0000">';'</font><font color="#0000FF">
        { </font><font color="#008080">...</font><font
color="#0000FF"> }
    | </font><font color="#FF0000">IF '(' </font><font
color="#800080">expr</font><font color="#FF0000"> ')'</font><font
color="#0000FF"> </font><font color="#800080">stmt</font><font
color="#0000FF">
        { </font><font color="#008080">...</font><font
color="#0000FF"> }
    </font>...<font color="#0000FF">
    | error</font><font color="#FF0000"> ';'</font><font
color="#0000FF">
        { </font><font color="#008080"><em>hexflag </em>:= <em>False</em></font><font
color="#0000FF"> }
    ;</font></pre>
</blockquote>

<p>If there is a syntax error in the middle of a <font
color="#808000"><tt>hex (EXPR)</tt></font> construct, this error
rule will apply, and then the action for the completed <font
color="#808000"><tt>hex (EXPR)</tt></font> will never run. So <font
color="#008080"><em><tt>hexflag </tt></em></font>would remain set
for the entire rest of the input, or until the next <font
color="#808000"><tt>hex</tt></font> keyword, causing identifiers
to be misinterpreted as integers. To avoid this problem the error
recovery rule itself clears <font color="#008080"><em><tt>hexflag</tt></em></font>.</p>

<p>There may also be an error recovery rule that works within
expressions. For example, there could be a rule which applies
within parentheses and skips to the close-parenthesis:</p>

<blockquote>
    <pre><font color="#800080">expr</font><font color="#0000FF">:</font> ...<font
color="#0000FF">
    | </font><font color="#FF0000">'(' </font><font
color="#800080">expr</font><font color="#FF0000"> ')'</font><font
color="#0000FF">
        { $$ </font><font color="#008080">:=</font><font
color="#0000FF"> $2 }
    | </font><font color="#FF0000">'('</font><font
color="#0000FF"> error </font><font color="#FF0000">')'</font><font
color="#0000FF">
    </font>...</pre>
</blockquote>

<p>If this rule acts within the <font color="#808000"><tt>hex</tt></font>
construct, it is not going to abort that construct (since it
applies to an inner level of parentheses within the construct).
Therefore, it should not clear the flag: the rest of the <font
color="#808000"><tt>hex</tt></font> construct should be parsed
with the flag still in effect.</p>

<p>What if there is an error recovery rule which might abort out
of the <font color="#808000"><tt>hex</tt></font> construct or
might not, depending on circumstances? There is no way you can
write the action to determine whether a <font color="#808000"><tt>hex</tt></font>
construct is being aborted or not. So if you are using a lexical
tie-in, you had better make sure your error recovery rules are
not of this kind. Each rule must be such that you can be sure
that it always will, or always won't, have to clear the flag.</p>

<hr size="1">

<table border="0" width="100%">
    <tr>
        <td><address>
            <font size="2"><b>Copyright © 1997</b></font><font
            size="1"><b>, </b></font><font size="2"><strong>Eric
            Bezault</strong></font><strong> </strong><font
            size="2"><br>
            <strong>mailto:</strong></font><a
            href="mailto:ericb@gobosoft.com"><font size="2">ericb@gobosoft.com</font></a><font
            size="2"><br>
            <strong>http:</strong></font><a
            href="http://www.gobosoft.com"><font size="2">//www.gobosoft.com</font></a><font
            size="2"><br>
            <strong>Last Updated:</strong> 7 September 1997</font><br>
            <!--webbot bot="PurpleText"
            preview="
$Date$
$Revision$"
            -->
        </address>
        </td>
        <td align="right" valign="top"><a
        href="http://www.gobosoft.com"><img
        src="image/home.gif" alt="Home" border="0" width="40"
        height="40"></a><a href="index.html"><img
        src="image/toc.gif" alt="Toc" border="0" width="40"
        height="40"></a><a href="precedence.html"><img
        src="image/previous.gif" alt="Previous" border="0"
        width="40" height="40"></a><a href="error.html"><img
        src="image/next.gif" alt="Next" border="0" width="40"
        height="40"></a></td>
    </tr>
</table>
</body>
</html>
