Calling Convention for yylex
The value that yylex returns must be the positive numeric code for the type of token it has just found; a zero or negative value signifies end-of-input.
When a token is referred to in the grammar rules by a name, that name in the parser implementation file becomes a C macro whose definition is the proper numeric code for that token type. So yylex can use the name to indicate that type. See Symbols.
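A minimal sketch of scanner code that relies on token names, assuming a grammar that declares tokens IF, ELSE and ID. The #define lines are hypothetical stand-ins for the definitions Bison would provide in the parser implementation file; their numbers are placeholders, since Bison chooses the actual codes and yylex should only ever use the names. The helper keyword_token is likewise a hypothetical name.

```c
#include <string.h>

/* Hypothetical stand-ins for the token-type definitions Bison
   generates.  Real code gets these from the generated file, and the
   values are whatever Bison assigns.  */
#define IF   258
#define ELSE 259
#define ID   260

/* Map a scanned word to its token type: keywords get their own
   named token type, every other word is an identifier.  */
static int
keyword_token (const char *word)
{
  if (strcmp (word, "if") == 0)
    return IF;
  if (strcmp (word, "else") == 0)
    return ELSE;
  return ID;
}
```

A yylex built this way would collect a word into a buffer and return keyword_token (buffer); because only the names appear in the scanner, it keeps working when Bison assigns different numbers.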
When a token is referred to in the grammar rules by a character literal, the numeric code for that character is also the code for the token type. So yylex can simply return that character code, possibly converted to unsigned char to avoid sign-extension. The null character must not be used this way, because its code is zero and that signifies end-of-input.
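The sign-extension issue arises because plain char is signed on many platforms, so a byte such as 0xE9 can convert to a negative int, which the parser would then read as end-of-input. A minimal sketch of the conversion, using a hypothetical helper named char_token:

```c
/* Convert a character that stands for itself as a token into the
   token type code that yylex returns.  Going through unsigned char
   keeps bytes above 0x7F positive even where plain char is signed.  */
static int
char_token (char c)
{
  return (unsigned char) c;
}
```

Note that char_token ('\0') still yields 0, the end-of-input code, which is why the null character cannot serve as a token this way.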
Here is an example showing these things:

```c
int
yylex (void)
{
  …
  if (c == EOF)     /* Detect end-of-input. */
    return 0;
  …
  if (c == '+' || c == '-')
    return c;      /* Assume token type for '+' is '+'. */
  …
  return INT;      /* Return the type of the token. */
  …
}
```
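To see these conventions working together, here is a complete sketch in the same shape as the example, under stated assumptions: it scans a string through a hypothetical lexer_input pointer instead of reading a stream, uses a hypothetical #define INT 258 in place of the Bison-generated definition, and stores integer values in a plain int yylval that stands in for the parser's semantic value. It is an illustration, not the manual's own code.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical stand-in for the Bison-generated definition of INT.  */
#define INT 258

/* Hypothetical input: the string being scanned.  */
static const char *lexer_input;

/* Stands in for the parser's yylval: the value of the last INT.  */
static int yylval;

/* Return the next input character as an unsigned char value,
   or EOF at the end of the string.  */
static int
next_char (void)
{
  if (*lexer_input == '\0')
    return EOF;
  return (unsigned char) *lexer_input++;
}

/* Give back the character just read.  Only valid right after
   next_char returned something other than EOF.  */
static void
unread_char (void)
{
  lexer_input--;
}

int
yylex (void)
{
  int c;

  /* Skip white space.  */
  do
    c = next_char ();
  while (c == ' ' || c == '\t' || c == '\n');

  if (c == EOF)                /* Detect end-of-input.  */
    return 0;

  if (c == '+' || c == '-')
    return c;                  /* Token type for '+' is '+'.  */

  if (isdigit (c))
    {
      int value = 0;
      while (isdigit (c))      /* isdigit (EOF) is false.  */
        {
          value = value * 10 + (c - '0');
          c = next_char ();
        }
      if (c != EOF)
        unread_char ();        /* Put back the first non-digit.  */
      yylval = value;
      return INT;              /* Return the type of the token.  */
    }

  /* Any other character stands for itself; next_char already
     converted it through unsigned char, so it is positive.  */
  return c;
}
```

For instance, with lexer_input set to " 12 + 3-", successive calls return INT (with yylval 12), '+', INT (with yylval 3), '-', and finally 0.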
This interface has been designed so that the output from the lex utility can be used without change as the definition of yylex.
If the grammar uses literal string tokens, there are two ways that yylex can determine the token type codes for them:

1. If the grammar defines symbolic token names as aliases for the literal string tokens, yylex can use these symbolic names like all others. In this case, the use of the literal string tokens in the grammar file has no effect on yylex.

2. yylex can find the multicharacter token in the yytname table. The index of the token in the table is the token type’s code. The name of a multicharacter token is recorded in yytname with a double-quote, the token’s characters, and another double-quote. The token’s characters are escaped as necessary to be suitable as input to Bison.
Here’s code for looking up a multicharacter token in yytname, assuming that the characters of the token are stored in token_buffer, and assuming that the token does not contain any characters like ‘"’ that require escaping.

```c
/* Find the entry that is exactly a double quote, the characters
   of token_buffer, and a closing double quote.  */
for (i = 0; i < YYNTOKENS; i++)
  {
    if (yytname[i] != 0
        && yytname[i][0] == '"'
        && ! strncmp (yytname[i] + 1, token_buffer,
                      strlen (token_buffer))
        && yytname[i][strlen (token_buffer) + 1] == '"'
        && yytname[i][strlen (token_buffer) + 2] == 0)
      break;
  }
/* If i == YYNTOKENS here, no entry matched.  */
```
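The same loop can be wrapped in a self-contained function for experimentation. In this sketch the yytname contents and the YYNTOKENS value are hypothetical (a real table is generated by Bison and also lists the nonterminals after the tokens), and lookup_string_token is an illustrative name. It returns the matching index, or -1 when no string token matches.

```c
#include <string.h>

/* Hypothetical miniature yytname table.  Literal string tokens are
   stored with their surrounding double quotes.  */
static const char *const yytname[] =
  { "$end", "error", "$undefined", "NUM", "\"<=\"", "\">=\"", 0 };
#define YYNTOKENS 6

/* Return the index in yytname of the literal string token spelled
   TOKEN_BUFFER, or -1 if there is none.  Like the loop above, this
   assumes the token contains no characters that need escaping.  */
static int
lookup_string_token (const char *token_buffer)
{
  size_t len = strlen (token_buffer);
  int i;

  for (i = 0; i < YYNTOKENS; i++)
    if (yytname[i] != 0
        && yytname[i][0] == '"'
        && strncmp (yytname[i] + 1, token_buffer, len) == 0
        && yytname[i][len + 1] == '"'
        && yytname[i][len + 2] == '\0')
      return i;
  return -1;
}
```

With this table, lookup_string_token ("<=") yields 4, while lookup_string_token ("<") yields -1 because the closing double quote must come right after the token’s characters.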
The yytname table is generated only if you use the %token-table declaration. See Decl Summary.