[NEXT] [PREV]    HANDBOOK / CASE STUDY / The Compiler /

Overall Structure


The task of a compiler is to translate a program written in a source language into a semantically equivalent program in a target language.

This task can be decomposed into subtasks

  • Discover the structure of the source program.
  • Process this structure to generate the target program.
In Gentle this can be written as

   Program(-> P) Translate(P)
Here, Program reads the source program and returns a structured representation (the ``abstract syntax'') in its output parameter P. This representation is then processed by Translate, which is invoked with P as an input parameter.

Consider the example MiniLAX program in Fig. 6.1.

   PROGRAM test;
     i: INTEGER
     i := 0
Fig. 6.1 Example Program in MiniLAX

For this program the predicate Program will deliver a value of P as shown in Fig. 6.2.

            dcl( , variable( integer),  ),
            id( ,  ),
            int( 0 ),
Fig. 6.2 Example Program in Abstract Syntax

This will be translated by Translate into the ICode program shown in Fig. 6.3.

    0:   ENT    1
    1:   LDA    0    3
    2:   LDC    1    0
    3:   STI
    4:   RET
Fig. 6.3 Example Program in ICode

The abstract syntax representation of a prgram is a structured value (a ``term'') that expresses the hierarchical composition of the program. Abstract syntax ignores syntactic sugar and concrete tokens, that only serve to make the input unambiguous. It merely specifies which alternative was used for a construct and what its constituents were.

A term type is introduced by a type declaration. For example, the following declaration introduces the abstract syntax of statements:

   'type' STMT
      assign(DESIG, EXPR, POS)
      read(DESIG, POS)
      write(EXPR, POS)
      call(IDENT, EXPRLIST, POS)
      if(EXPR, STMT, STMT, POS)
      while(EXPR, STMT, POS)
      seq(STMT, STMT)
These alternatives serve as templates for values: If E is an EXPR, S an STMT, and Pos the source coordinate of the phrase, then

   while (E, S, Pos)
is a value of type STMT.

Concrete syntax is described by grammar rules, e.g.

 'rule' Stat(-> while(E, S, Pos)) :
    "WHILE" Expr(-> E) @(-> Pos) "DO" StatSeq(-> S) "END"
This rule parses a phrase

    WHILE Expr DO StatSeq END
E is the abstract syntax of Expr, S the abstract syntax of StatSeq, and Pos the source coordinate of Expr. From these values, the abstract syntax of the phrase is constructed:

   while (E, S, Pos)
Translation is described by action rules, e.g.

   'rule' Statement(while(E, S, Pos)) :
       NewLabel(-> L1)
       NewLabel(-> L2)
       Expression(E -> T)
       CheckBool(T, Pos)
This rule unparses the term

   while (E, S, Pos)
thereby emitting the target code

       JMP L2
   L1: Statement
   L2: Expression
       FJP L1
where Statement is the code for S, and Expression is the code for E.

When the source program is translated into an internal representation, rules are selected by the parser according to the given source. Consider the rules

   'rule' Stat(-> read(V, Pos)) :
      "READ" "(" Desig(-> V) @(-> Pos) ")"
   'rule' Stat(-> write(E, Pos)) :
      "WRITE" "(" Expr(-> E) @(-> Pos) ")"
If a statement has the form READ(Desig), the first rule is selected and the internal form is read(V,Pos). If a statement has the form WRITE(Expr), the second rule is selected and the internal form is write(E,Pos).

When the internal representation is processed, rules are selected according to the given term. Consider the rules

   'rule' Statement(read(V, Pos)) :
      Designator(V -> T)
      CheckSimple(T, Pos)
      TypeCode(T -> N)
   'rule' Statement(write(E, Pos)) :
      Expression(E -> T)
      CheckSimple(T, Pos)
      TypeCode(T -> N)
If the predicate Statement is called with the term read(V, Pos), the first rule is selected and corresponding target code for reading an item is emitted. If the predicate is called with write(E, Pos), the the second rule is selected and emits code to write an item.

The overall organization of the compiler is expressed by the root clause, which is elaborated by executing its statements. Our specification starts with the root clause:

001  'root' 
002     Program(-> P) Translate(P) 

Now we have to define the abstract syntax and have to specify how a concrete source program is mapped to the abstract representation ( Program) and how this representation is processed to produce the target program ( Translate).