A Parser for Expressions


This section presents a parser for expressions such as 10+20*30.

'root' expression

'nonterm' expression

   'rule' expression: expr2
   'rule' expression: expression "+" expr2
   'rule' expression: expression "-" expr2

'nonterm' expr2

   'rule' expr2: expr3
   'rule' expr2: expr2 "*" expr3
   'rule' expr2: expr2 "/" expr3

'nonterm' expr3

   'rule' expr3: Number(-> N)
   'rule' expr3: "-" expr3
   'rule' expr3: "+" expr3
   'rule' expr3: "(" expression ")"

'token' Number(-> INT)

In Gentle a parser is written as a grammar for the language to be processed. Such a grammar consists of a set of nonterminal symbols that represent phrases, and terminal symbols (tokens) that stand for themselves.

The nonterminal symbols that represent phrases are defined by rules listing the constituents of the phrase.

   'nonterm' expression
introduces the nonterminal expression. The rule

   'rule' expression: expression "+" expr2
specifies that a phrase of the class expression may be given by a (simpler) phrase of the class expression, followed by a plus symbol, followed by a phrase of the class expr2 (expressions involving operators with a higher priority than plus and minus). For example, if 10 is an expression and if 20*30 is an expr2, then 10+20*30 is an expression.

Terminal symbols appear in quotes (e.g "+") or are introduced by token declarations.

   'token' Number(-> INT)
introduces a token Number, the representation of which is not given in this specification. It is defined elsewhere as a nonempty sequence of decimal digits. The actual value of the symbol is of type INT (we ignore parameters in this section).

The line

   'root' expression
states that this specification is elaborated by invoking the expression parser.

As it stands, this program is not of much use: it reads and analyzes its input, and if the input is a valid expression, it does nothing. If the input is not an expression according to the given grammar, the program will flag the first token that does not allow us to complete the text read so far as a valid expression.

Note that the phrase structure imposed on the input reflects the binding strengths of the operators. E.g. 10+20*30 is parsed as the sum of 10 and the product of 20 and 30. This is not necessary if the task is just to reject invalid expressions, but it is helpful in the next example.