When the generated scanner is run, it analyzes its input looking for
strings which match any of its patterns. If it finds more than one
match, it takes the one matching the most text (for trailing context
rules, this includes the length of the trailing part, even though it
will then be returned to the input). If it finds two or more matches of
the same length, the rule listed first in the flex
input file is
chosen.
Once the match is determined, the text corresponding to the match
(called the token) is made available in the global character
pointer yytext
, and its length in the global integer
yyleng
. The action corresponding to the matched pattern is
then executed (see Actions), and then the remaining input is scanned
for another match.
If no match is found, then the default rule is executed: the next
character in the input is considered matched and copied to the standard
output. Thus, the simplest valid flex
input is:
%%
which generates a scanner that simply copies its input (one character at a time) to its output.
Note that yytext
can be defined in two different ways: either as
a character pointer or as a character array. You can
control which definition flex
uses by including one of the
special directives %pointer
or %array
in the first
(definitions) section of your flex input. The default is
%pointer
, unless you use the ‘-l’ lex compatibility option,
in which case yytext
will be an array. The advantage of using
%pointer
is substantially faster scanning and no buffer overflow
when matching very large tokens (unless you run out of dynamic memory).
The disadvantage is that you are restricted in how your actions can
modify yytext
(see Actions), and calls to the unput()
function destroys the present contents of yytext
, which can be a
considerable porting headache when moving between different lex
versions.
The advantage of %array
is that you can then modify yytext
to your heart’s content, and calls to unput()
do not destroy
yytext
(see Actions). Furthermore, existing lex
programs sometimes access yytext
externally using declarations of
the form:
extern char yytext[];
This definition is erroneous when used with %pointer
, but correct
for %array
.
The %array
declaration defines yytext
to be an array of
YYLMAX
characters, which defaults to a fairly large value. You
can change the size by simply #define’ing YYLMAX
to a different
value in the first section of your flex
input. As mentioned
above, with %pointer
yytext grows dynamically to accommodate
large tokens. While this means your %pointer
scanner can
accommodate very large tokens (such as matching entire blocks of
comments), bear in mind that each time the scanner must resize
yytext
it also must rescan the entire token from the beginning,
so matching such tokens can prove slow. yytext
presently does
not dynamically grow if a call to unput()
results in too
much text being pushed back; instead, a run-time error results.
Also note that you cannot use %array
with C++ scanner classes
(see Cxx).