
Token Predicates


Besides recognizing tokens that appear literally in the grammar, lexical analysis often also handles items that are atoms for syntactic analysis but have structure at a lower level. Numbers are an example: syntactic analysis treats a number as a single token, while lexical analysis specifies it as a sequence of digits.

Blanks, comments, and other separators that do not contribute to the syntactic structure are usually filtered out by lexical analysis as white space. This gives one criterion for deciding whether an item should be handled as a token: can its components be separated by white space? In a Statement, a newline can (or must) appear between IF and the following Expression, so a Statement is described by syntactic rules. In a Number, white space between the digits is not allowed, because it would split the number into two tokens.
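The white space criterion can be tried out with a small hand-written scanner. The following Python sketch is only an illustration and is not how Gentle or Reflex describe tokens: the token classes, their patterns, and the function name tokenize are assumptions made up for this example.

```python
import re

# Hypothetical token classes for illustration; not taken from Gentle or Reflex.
# Order matters: earlier alternatives win, so the keyword IF is tried
# before the general VARIABLE pattern.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),                    # a sequence of digits
    ("IF", r"IF\b"),                          # keyword that appears literally
    ("VARIABLE", r"[A-Za-z][A-Za-z0-9]*"),    # assumed identifier shape
    ("ASSIGN", r":="),
    ("WHITESPACE", r"\s+"),                   # blanks and newlines
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Return a list of (token class, text) pairs, dropping white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(
                f"unexpected character {text[pos]!r} at position {pos}"
            )
        # White space separates tokens but is not passed on to the parser.
        if match.lastgroup != "WHITESPACE":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

With these assumed patterns, tokenize("x := 12") yields one NUMBER token, while tokenize("x := 1 2") yields two, because the blank ends the first number. A newline between IF and what follows merely separates two tokens and then disappears.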

A complex token is introduced as a predicate of the category token. A predicate of this category is not defined by rules.

For example, a token Variable can be declared as

   'token' Variable

and can then be used as a member of a rule body:

   'rule' Statement: Variable ":=" Expression

The declaration of a token only introduces its name; it does not provide rules. The actual description of the token is given outside the Gentle specification. In many cases existing descriptions can be reused.

(See the manual on the Reflex tool for details of how to describe tokens and white space conventions.)