
Contents of /vendor/gobosoft.com/gobo/current/doc/gelex/patterns.html



Revision 90767
Tue Jan 22 00:56:30 2013 UTC by manus
File MIME type: text/xml
File size: 14273 byte(s)
Updated svn:eol-style to be native and svn:mime-style to be text/xml

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>

<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage 2.0">
<title>Gelex: Patterns</title>
</head>

<body bgcolor="#FFFFFF">

<table border="0" width="100%">
<tr>
<td><font size="6"><strong>Patterns</strong></font></td>
<td align="right"><a href="options.html"><img
src="../image/previous.gif" alt="Previous" border="0"
width="40" height="40"></a><a href="matching_rules.html"><img
src="../image/next.gif" alt="Next" border="0" width="40"
height="40"></a></td>
</tr>
</table>

<hr size="1">

<p>The patterns in the input are written using an extended set of
regular expressions. These are:</p>

<dl>
<dt><font color="#FF0000"><tt>x</tt></font></dt>
<dd>match the character <font color="#808000"><tt>x</tt></font>.</dd>
<dt><font color="#FF0000"><tt>.</tt></font></dt>
<dd>any character except newline.</dd>
<dt><font color="#FF0000"><tt>[xyz]</tt></font></dt>
<dd>a <em><strong>character class</strong></em>; in this
case, the pattern matches either an <font color="#808000"><tt>x</tt></font>,
a <font color="#808000"><tt>y</tt></font> or a <font
color="#808000"><tt>z</tt></font>.</dd>
<dt><font color="#FF0000"><tt>[abj-oZ]</tt></font></dt>
<dd>a <em>character class</em> with a range in it; matches an
<font color="#808000"><tt>a</tt></font>, a <font
color="#808000"><tt>b</tt></font>, any letter from <font
color="#808000"><tt>j</tt></font> through <font
color="#808000"><tt>o</tt></font>, or a <font
color="#808000"><tt>Z</tt></font>.</dd>
<dt><font color="#FF0000"><tt>[^A-Z]</tt></font></dt>
<dd>a <em><strong>negated character class</strong></em>,
i.e., any character but those in the class. In this case,
any character except an uppercase letter.</dd>
<dt><font color="#FF0000"><tt>[^A-Z\n]</tt></font></dt>
<dd>any character except an uppercase letter or a newline.</dd>
<dt><font color="#FF0000"><tt>r*</tt></font></dt>
<dd>zero or more <font color="#FF0000"><tt>r</tt></font>'s,
where <font color="#FF0000"><tt>r</tt></font> is any
regular expression.</dd>
<dt><font color="#FF0000"><tt>r+</tt></font></dt>
<dd>one or more <font color="#FF0000"><tt>r</tt></font>'s.</dd>
<dt><font color="#FF0000"><tt>r?</tt></font></dt>
<dd>zero or one <font color="#FF0000"><tt>r</tt></font>'s
(that is, &quot;an optional <font color="#FF0000"><tt>r</tt></font>&quot;).</dd>
<dt><font color="#FF0000"><tt>r{2,5}</tt></font></dt>
<dd>anywhere from two to five <font color="#FF0000"><tt>r</tt></font>'s.</dd>
<dt><font color="#FF0000"><tt>r{2,}</tt></font></dt>
<dd>two or more <font color="#FF0000"><tt>r</tt></font>'s.</dd>
<dt><font color="#FF0000"><tt>r{4}</tt></font></dt>
<dd>exactly four <font color="#FF0000"><tt>r</tt></font>'s.</dd>
<dt><font color="#FF0000"><tt>{name}</tt></font></dt>
<dd>the expansion of the &quot;<font color="#800080"><tt>name</tt></font>&quot;
<a href="description.html#definitions">definition</a>.</dd>
<dt><font color="#FF0000"><tt>&quot;[xyz]\&quot;foo&quot;</tt></font></dt>
<dd>the literal string: <font color="#808000"><tt>[xyz]&quot;foo</tt></font>.</dd>
<dt><font color="#FF0000"><tt>\X</tt></font></dt>
<dd>if <font color="#FF0000"><tt>X</tt></font> is an <font
color="#808000"><tt>a</tt></font>, <font color="#808000"><tt>b</tt></font>,
<font color="#808000"><tt>f</tt></font>, <font
color="#808000"><tt>n</tt></font>, <font color="#808000"><tt>r</tt></font>,
<font color="#808000"><tt>t</tt></font>, or <font
color="#808000"><tt>v</tt></font>, then the <font
size="2">ANSI-C</font> interpretation of <font
color="#808000"><tt>\X</tt></font>. Otherwise, a literal <font
color="#808000"><tt>X</tt></font> (used to escape
operators such as <font color="#FF0000"><tt>*</tt></font>).</dd>
<dt><font color="#FF0000"><tt>\0</tt></font></dt>
<dd>a null character (<font size="2">ASCII</font> code <font
color="#808000"><tt>0</tt></font>).</dd>
<dt><font color="#FF0000"><tt>\123</tt></font></dt>
<dd>the character with octal value <font color="#808000"><tt>123</tt></font>.</dd>
<dt><font color="#FF0000"><tt>\x2a</tt></font></dt>
<dd>the character with hexadecimal value <font
color="#808000"><tt>2a</tt></font>.</dd>
<dt><font color="#FF0000"><tt>(r)</tt></font></dt>
<dd>match an <font color="#FF0000"><tt>r</tt></font>;
parentheses are used to override <a href="#precedence">precedence</a>.</dd>
</dl>

<hr size="1" width="75%">

<dl>
<dt><font color="#FF0000"><tt>rs</tt></font></dt>
<dd>the regular expression <font color="#FF0000"><tt>r</tt></font>
followed by the regular expression <font color="#FF0000"><tt>s</tt></font>;
called <em><strong>concatenation</strong></em>.</dd>
</dl>
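<p>For the simple cases above, Python's <tt>re</tt> module happens to use the same notation, which makes it a convenient place to experiment. The sketch below checks a few patterns with <tt>re.fullmatch</tt>; it assumes Python's semantics coincide with gelex's for these particular cases and is not a test of gelex itself. Python has no <tt>{name}</tt> definitions, and <tt>re.escape</tt> merely stands in for gelex's quoted strings.</p>

```python
import re

# Each entry: (pattern, text, should the whole text match?).
# Assumption: Python's re agrees with gelex on these particular cases.
examples = [
    (r"x", "x", True),                     # x: the character x
    (r".", "\n", False),                   # .: anything except newline
    (r"[xyz]", "y", True),                 # character class
    (r"[abj-oZ]", "k", True),              # range j-o includes k
    (r"[abj-oZ]", "p", False),             # p lies outside j-o
    (r"[^A-Z]", "q", True),                # negated class
    (r"(ab){2,5}", "ababab", True),        # two to five repetitions
    (r"(ab){2,}", "ab", False),            # at least two required
    (r"x{4}", "xxxx", True),               # exactly four
    (r"\x2a", "*", True),                  # hexadecimal escape
    (r"\123", "S", True),                  # octal 123 is decimal 83, "S"
    (r"\0", "\x00", True),                 # null character
    (r"\*", "*", True),                    # \ turns an operator into a literal
    (re.escape('[xyz]"foo'), '[xyz]"foo', True),  # like gelex's "[xyz]\"foo"
]

for pattern, text, expected in examples:
    matched = re.fullmatch(pattern, text) is not None
    assert matched == expected, (pattern, text)
    print(f"{pattern!r:16} {text!r:12} matched: {matched}")
```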

<hr size="1" width="75%">

<dl>
<dt><font color="#FF0000"><tt>r|s</tt></font></dt>
<dd>either an <font color="#FF0000"><tt>r</tt></font> or an <font
color="#FF0000"><tt>s</tt></font>.</dd>
</dl>

<hr size="1" width="75%">

<dl>
<dt><font color="#FF0000"><tt>r/s</tt></font></dt>
<dd>an <font color="#FF0000"><tt>r</tt></font>, but only if it
is followed by an <font color="#FF0000"><tt>s</tt></font>.
The text matched by <font color="#FF0000"><tt>s</tt></font>
is included when determining whether this rule is the
&quot;longest match&quot;, but is then returned to the
input before the action is executed. So the action only
sees the text matched by <font color="#FF0000"><tt>r</tt></font>.
This type of pattern is called <em><strong>trailing
context</strong></em>. (There are some combinations of <font
color="#FF0000"><tt>r/s</tt></font> that <em>gelex</em>
cannot match correctly, such as <font color="#FF0000"><tt>zx*/xy</tt></font>.
See <em>gelex</em>'s <a href="limitations.html">limitations</a>
for details.)</dd>
<dt><font color="#FF0000"><tt>^r</tt></font></dt>
<dd>an <font color="#FF0000"><tt>r</tt></font>, but only at
the beginning of a line (i.e., when just starting to
scan, or right after a newline has been scanned).</dd>
<dt><font color="#FF0000"><tt>r$</tt></font></dt>
<dd>an <font color="#FF0000"><tt>r</tt></font>, but only at
the end of a line (i.e., just before a newline).
Equivalent to <font color="#FF0000"><tt>r/\n</tt></font>.</dd>
<dd>Note that <em>gelex</em>'s notion of &quot;newline&quot;
is exactly what is interpreted as <font color="#808000"><tt>%N</tt></font>
by the Eiffel compiler that was used to compile <em>gelex</em>;
in particular, on some <font size="2">DOS</font> systems
you must either filter out <font color="#FF0000"><tt>\r</tt></font>'s
in the input yourself, or explicitly use <font
color="#FF0000"><tt>r/\r\n</tt></font> for <font
color="#FF0000"><tt>r$</tt></font>.</dd>
</dl>
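<p>Python's lookahead assertion <tt>(?=s)</tt> gives a rough feel for trailing context: <tt>s</tt> has to follow, but the matched text covers only <tt>r</tt>. This is an analogy under Python semantics, not a model of gelex: a lookahead does not count toward gelex's longest-match comparison between rules, and Python imposes none of the restrictions described in gelex's limitations.</p>

```python
import re

# Analogy only: (?=...) requires the text to follow without consuming it,
# like gelex's trailing context r/s.  It does NOT reproduce gelex's rule
# that the length of s counts toward the "longest match" between rules.
trailing = re.compile(r"foo(?=bar)")          # roughly gelex's  foo/bar
m = trailing.match("foobar")
assert m is not None and m.group() == "foo"   # the action would see only "foo"
assert trailing.match("foobaz") is None        # no "bar" afterwards

# gelex's  r$  is equivalent to  r/\n ; as a lookahead that reads:
end_of_line = re.compile(r"foo(?=\n)")
assert end_of_line.match("foo\nnext") is not None
assert end_of_line.match("food") is None

# gelex's  ^r  matches at the start of input or right after a newline;
# Python needs re.MULTILINE for ^ to behave that way.
start_of_line = re.compile(r"^bar", re.MULTILINE)
print([m.start() for m in start_of_line.finditer("bar\nxbar\nbar")])
```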

<hr size="1" width="75%">

<dl>
<dt><font color="#800000"><tt>&lt;s&gt;</tt></font><font
color="#FF0000"><tt>r</tt></font></dt>
<dd>an <font color="#FF0000"><tt>r</tt></font>, but only in
start condition <font color="#800000"><tt>s</tt></font>
(see the discussion of <a href="start_conditions.html">start
conditions</a> for details).</dd>
<dt><font color="#800000"><tt>&lt;s1,s2,s3&gt;</tt></font><font
color="#FF0000"><tt>r</tt></font></dt>
<dd>same, but in any of start conditions <font
color="#800000"><tt>s1</tt></font>, <font color="#800000"><tt>s2</tt></font>,
or <font color="#800000"><tt>s3</tt></font>.</dd>
<dt><font color="#800000"><tt>&lt;*&gt;</tt></font><font
color="#FF0000"><tt>r</tt></font></dt>
<dd>an <font color="#FF0000"><tt>r</tt></font> in any start
condition, even an exclusive one.</dd>
</dl>
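<p>The following is a hypothetical Python model of how start-condition prefixes select the rules that are active in a given condition; it is not the code gelex generates. The rule table, the token names and the <tt>COMMENT</tt> condition are invented for illustration, and the treatment of rules without a prefix (active in every condition except exclusive ones) is an assumption based on flex's convention.</p>

```python
import re

# Hypothetical rule table: (start conditions, pattern, token name).
# None means the rule has no <...> prefix; "*" stands for gelex's <*>.
RULES = [
    (None, r"[a-z]+", "WORD"),
    ({"COMMENT"}, r"[^*]+", "TEXT"),
    ({"INITIAL", "COMMENT"}, r"[ \t]+", "SPACE"),
    ("*", r"\*", "STAR"),
]
# Assumption (flex-style semantics): COMMENT is an exclusive condition,
# so rules without a prefix are not active while the scanner is in it.
EXCLUSIVE = {"COMMENT"}


def is_active(conds, state):
    if conds == "*":
        return True                    # <*>r: every start condition
    if conds is None:
        return state not in EXCLUSIVE  # no prefix: non-exclusive conditions
    return state in conds              # <s1,s2>r: only the listed ones


def scan(text, state="INITIAL"):
    """Split `text` into (name, lexeme) pairs using the longest-match rule."""
    pos, tokens = 0, []
    while pos != len(text):
        candidates = []
        for index, (conds, pattern, name) in enumerate(RULES):
            if is_active(conds, state):
                m = re.compile(pattern).match(text, pos)
                if m and m.end() != pos:
                    candidates.append((m.end() - pos, -index, name))
        if not candidates:
            raise ValueError(f"no rule matches at offset {pos} in {state}")
        # Longest match wins; on a tie, the rule listed first wins.
        length, _, name = max(candidates)
        tokens.append((name, text[pos:pos + length]))
        pos += length
    return tokens


print(scan("ab *"))                    # scanned in INITIAL
print(scan("ab *", state="COMMENT"))   # scanned in COMMENT
```

<p>Scanning the same text in the two conditions yields different tokens: the unprefixed <tt>WORD</tt> rule is switched off in the exclusive condition, while the <tt>&lt;*&gt;</tt> rule <tt>STAR</tt> stays active in both.</p>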

<hr size="1" width="75%">

<dl>
<dt><font color="#FF0000"><tt>&lt;&lt;EOF&gt;&gt;</tt></font></dt>
<dd>an end-of-file.</dd>
<dt><font color="#800000"><tt>&lt;s1,s2&gt;</tt></font><font
color="#FF0000"><tt>&lt;&lt;EOF&gt;&gt;</tt></font></dt>
<dd>an end-of-file when in start condition <font
color="#800000"><tt>s1</tt></font> or <font
color="#800000"><tt>s2</tt></font>.</dd>
</dl>

<h2>Some notes on patterns</h2>

<p>Note that inside a character class, all regular expression
operators lose their special meaning except escape (<font
color="#FF0000"><tt>\</tt></font>) and the character class
operators, <font color="#FF0000"><tt>-</tt></font>, <font
color="#FF0000"><tt>]</tt></font>, and, at the beginning of the
class, <font color="#FF0000"><tt>^</tt></font>.</p>
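<p>Python's <tt>re</tt> module follows the same convention for the characters used below, so it can be used to try the idea out. This is an illustration under Python semantics, not a test of gelex.</p>

```python
import re

# Inside [...], operators such as * + . | ( ) are ordinary characters,
# so this class matches any single one of them literally.
ops = re.compile(r"[*+.|()]")
assert all(ops.fullmatch(c) for c in "*+.|()")
assert ops.fullmatch("a") is None

# \, - and ] keep their meaning inside a class, as does ^ in first
# position; escaping them (or moving ^ away from the front) makes
# them literal members of the class.
specials = re.compile(r"[\]\-\\^]")
print([c for c in "]-\\^y" if specials.fullmatch(c)])
```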

<p>The regular expressions listed above are grouped according to <a
name="precedence"><em><strong>precedence</strong></em></a>, from
highest precedence at the top to lowest at the bottom; each group
is separated from the next by a horizontal rule. Those grouped
together have equal precedence. For example,</p>

<blockquote>
<pre><font color="#FF0000">foo|bar*</font></pre>
</blockquote>

<p>is the same as:</p>

<blockquote>
<pre><font color="#FF0000">(foo)|(ba(r*))</font></pre>
</blockquote>

<p>since the <font color="#FF0000"><tt>*</tt></font> operator has
higher precedence than concatenation, and concatenation higher
than alternation (<font color="#FF0000"><tt>|</tt></font>). This
pattern therefore matches either the string <font color="#808000"><tt>foo</tt></font>
or the string <font color="#808000"><tt>ba</tt></font> followed
by zero-or-more <font color="#808000"><tt>r</tt></font>'s. To
match <font color="#808000"><tt>foo</tt></font> or zero-or-more <font
color="#808000"><tt>bar</tt></font>'s, use:</p>

<blockquote>
<pre><font color="#FF0000">foo|(bar)*</font></pre>
</blockquote>

<p>and to match zero or more occurrences of either <font color="#808000"><tt>foo</tt></font>
or <font color="#808000"><tt>bar</tt></font>:</p>

<blockquote>
<pre><font color="#FF0000">(foo|bar)*</font></pre>
</blockquote>
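<p>Python's <tt>re</tt> gives <tt>*</tt>, concatenation and <tt>|</tt> the same relative precedence, so the three patterns above can be compared directly with <tt>re.fullmatch</tt>. The sketch assumes that correspondence and exercises only these three patterns.</p>

```python
import re

# The three precedence examples above, written with identical syntax.
cases = {
    r"foo|bar*":   ["foo", "ba", "barrr"],
    r"foo|(bar)*": ["foo", "", "barbar"],
    r"(foo|bar)*": ["", "foobar", "barfoofoo"],
}
for pattern, texts in cases.items():
    for text in texts:
        assert re.fullmatch(pattern, text), (pattern, text)

# Strings that tell the readings apart:
print(re.fullmatch(r"foo|bar*", "barbar"))    # * binds only to the r
print(re.fullmatch(r"foo|(bar)*", "foobar"))  # | splits the whole pattern
```

<p>Both <tt>print</tt> calls report no match, whereas <tt>(foo|bar)*</tt> accepts <tt>foobar</tt>.</p>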

<p>A negated character class such as the example <font
color="#FF0000"><tt>[^A-Z]</tt></font> above will match a newline
unless <font color="#FF0000"><tt>\n</tt></font> (or an equivalent
escape sequence) is one of the characters explicitly present in
the negated character class (e.g., <font color="#FF0000"><tt>[^A-Z\n]</tt></font>).
This is unlike how many other regular expression tools treat
negated character classes, but unfortunately the inconsistency is
historically entrenched. Matching newlines means that a pattern
like <font color="#FF0000"><tt>[^&quot;]*</tt></font> can match
all of the remaining input if no further quote occurs in it.</p>
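<p>Python's <tt>re</tt> also lets a negated class match a newline, so the effect described above can be reproduced there. The strings below are made up for illustration, and the sketch shows Python's behaviour rather than gelex's.</p>

```python
import re

# Python's negated class [^"] also matches a newline,
# so a quoted-string pattern can run across line breaks.
quoted = '"hi\nthere" and "bye"'
assert re.match(r'"[^"]*"', quoted).group() == '"hi\nthere"'

# With no quote left in the input, [^"]* swallows everything to the end.
rest = 'no quotes\non these\nlines\n'
assert re.match(r'[^"]*', rest).group() == rest

# Listing \n inside the negated class confines the match to one line.
assert re.match(r'"[^"\n]*"', quoted) is None
print(repr(re.match(r'"[^"\n]*', quoted).group()))
```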

<p>A rule can have at most one instance of trailing context (the <font
color="#FF0000"><tt>/</tt></font> operator or the <font
color="#FF0000"><tt>$</tt></font> operator). The start
conditions, <font color="#FF0000"><tt>^</tt></font>, and <font
color="#FF0000"><tt>&lt;&lt;EOF&gt;&gt;</tt></font> patterns can
only occur at the beginning of a pattern and, like <font
color="#FF0000"><tt>/</tt></font> and <font color="#FF0000"><tt>$</tt></font>,
cannot be grouped inside parentheses. A <font color="#FF0000"><tt>^</tt></font>
which does not occur at the beginning of a rule or a <font
color="#FF0000"><tt>$</tt></font> which does not occur at the end
of a rule loses its special properties and is treated as a normal
character.</p>

<p>The following are illegal (the first contains two instances of
trailing context, and the second has a start condition that is not
at the beginning of the pattern):</p>

<blockquote>
<pre><font color="#FF0000">foo/bar$</font>
<font color="#800000">&lt;sc1&gt;</font><font color="#FF0000">foo</font><font
color="#800000">&lt;sc2&gt;</font><font color="#FF0000">bar</font></pre>
</blockquote>

<p>Note that the first of these can be written <font
color="#FF0000"><tt>foo/bar\n</tt></font>. The following will
result in <font color="#808000"><tt>$</tt></font> or <font
color="#808000"><tt>^</tt></font> being treated as a normal
character:</p>

<blockquote>
<pre><font color="#FF0000">foo|(bar$)
foo|^bar</font></pre>
</blockquote>

<p>If what is wanted is either <font color="#808000"><tt>foo</tt></font>
or <font color="#808000"><tt>bar</tt></font> followed by a newline,
the following could be used (the special <font color="#0000FF"><tt>|</tt></font>
action is explained in the <a href="actions.html">Actions</a>
section):</p>

<blockquote>
<pre><font color="#FF0000">foo</font> <font color="#0000FF">|</font>
<font color="#FF0000">bar$</font> <font color="#008080">-- action goes here</font></pre>
</blockquote>

<p>A similar trick will work for matching either <font color="#808000"><tt>foo</tt></font>
or <font color="#808000"><tt>bar</tt></font> at the beginning of a line.</p>
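<p>The <tt>|</tt> action can be pictured as &quot;use the same action as the next rule&quot;. The sketch below is a hypothetical Python model of that idea, assuming those semantics (the Actions section remains the authoritative description). <tt>ACTION_OF_NEXT</tt>, <tt>report</tt> and <tt>resolve</tt> are invented names, and the lookahead <tt>bar(?=\n)</tt> stands in for <tt>bar$</tt>.</p>

```python
import re

# Hypothetical marker for gelex's "|" action: "same action as the next rule".
ACTION_OF_NEXT = "|"


def report(lexeme):
    return f"matched {lexeme!r}"


rules = [
    (r"foo", ACTION_OF_NEXT),      # foo    |
    (r"bar(?=\n)", report),        # bar$   -- action goes here
]


def resolve(rules):
    """Replace each '|' action by the first real action that follows it."""
    resolved, pending = [], []
    for pattern, action in rules:
        pending.append(pattern)
        if action != ACTION_OF_NEXT:
            resolved.extend((p, action) for p in pending)
            pending = []
    if pending:
        raise ValueError("the last rule cannot use the '|' action")
    return resolved


table = resolve(rules)
for text in ["foo", "bar\n", "bar!"]:
    for pattern, action in table:
        m = re.match(pattern, text)
        if m:
            print(action(m.group()))
            break
    else:
        print(f"no rule for {text!r}")
```

<p>Both patterns end up sharing <tt>report</tt>, which is the effect the two-rule gelex fragment above achieves without putting <tt>$</tt> inside an alternation.</p>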

<hr size="1">

<table border="0" width="100%">
<tr>
<td><address>
<font size="2"><b>Copyright 1998-2005</b></font><font
size="1"><b>, </b></font><font size="2"><strong>Eric
Bezault</strong></font><strong> </strong><font
size="2"><br>
<strong>mailto:</strong></font><a
href="mailto:ericb@gobosoft.com"><font size="2">ericb@gobosoft.com</font></a><font
size="2"><br>
<strong>http:</strong></font><a
href="http://www.gobosoft.com"><font size="2">//www.gobosoft.com</font></a><font
size="2"><br>
<strong>Last Updated:</strong> 21 February 2005</font><br>
<!--webbot bot="PurpleText"
preview="
$Date$
$Revision$"
-->
</address>
</td>
<td align="right" valign="top"><a
href="http://www.gobosoft.com"><img
src="../image/home.gif" alt="Home" border="0" width="40"
height="40"></a><a href="index.html"><img
src="../image/toc.gif" alt="Toc" border="0" width="40"
height="40"></a><a href="options.html"><img
src="../image/previous.gif" alt="Previous" border="0"
width="40" height="40"></a><a href="matching_rules.html"><img
src="../image/next.gif" alt="Next" border="0" width="40"
height="40"></a></td>
</tr>
</table>
</body>
</html>
