Notes for week 1
Day 1.
- Source Code, Object Code (and language)
- Types of object code
- Compile Sequence
Source Code -> Compiler -> Object Code -> Linker -> Executable Code
- Object Code does not have addresses of grammar objects that are not defined in the Source code.
- Linker finds external references
- Parts of a compiler
- Lexical analyzer
- Syntactical Analyzer (Parser)
- Intermediate Code Generation.
Temp3 = id2
Temp4 = id3
Temp2 = Temp3 * Temp4
Temp1 = Temp2
id1 = Temp1
- Optimization.
(if no registers required)
id1= id2 * id3
or (if registers required)
Temp3 = id2
Temp4 = id3
Temp2 = Temp3 * Temp4
id1 = Temp2
- Object Code generation convert intermediate code to machine code.
- Compiler Construction
- When no compilers are available.
- Create a compiler in assembly language
- That compiler can then compile any program in the source language
- If a compiler is available in C and we want a PL/1 compiler. Write the compiler in C and the effect is the same as A, except easier to create.
- Boot Strap
- Create a minimal compiler in assemby language
- Must be big enough to write the rest of the compiler
- Must be small enough so that you don't end up back in A
- May be done in steps. Each step after the first, a compiler that compiles more features of the language is written in the part of the language that is currently available.
(example: Minimal Pascal doesn't have strings. Extend the Pascal compiler by implementing strings in the pascal compiler (written in Pascal)
- Cross-compiler Compiler for a high-level language (eg. C) on one machine. Write the C compiler in C on this machine and produce code that is executable on another machine.
- Compiler-compiler.
- Retargetable compilers Pl/0 -> pcode. Write a pcode interpreter for the target machine (in a language available on the machine).
Lexical Analysis
- Responsiblities of Lexical analyzer.
- Finding lexemes and associating them with Tokens
- Removal of comments (cannot effectively deal with nested comments because they are not regular expressions)
- Case conversion all upper or lower case (as in Pascal and Lisp) or case sensitive grammar (such as C)
- Removal of White Space.
Example FORTRAN
do 100 i = 1, 30, 2
same as
do100i=1,30,2
- Interprete Compiler directives
- {$R-} in Turbo and other Pascals
- #include in C
- Comunicate with symbol table
- Prepare output listing
- Tokens and Lexemes
Day2.
- Finite State Automata.
- A finite set of state with rules for making transitions between states. One State is specified as the initial state.
- Example: vending machine sells candy at 25 cents per bar. Will accept nickels, dimes and quarters. The user is also allowed to select the type of bar. Select is ignored unless 25 cents or more has been deposited. Select in these cases returns change if there is an overpayment and resets the machine to the start state.
States:
| State# |
total money inserted
|
| 1 the start |
0 cents |
| 2 |
5 cents |
| 3 |
10 cents |
| 4 |
15 cents |
| 5 |
20 cents |
| 6 goal |
25 cents |
| 7 goal |
more than 25 cents |
Draw picture without edges.
Where can you get to from the start state? (Draw labeled edges)
Continue to get rest of labels
- When using a FSA for pattern recognition, there must be a set of terminal states called accepting states.
- Another example of a FSA that accepts strings
Table
| State |
Input |
| |
a |
b |
c |
| 1 |
2 |
1 |
3 |
| 2 |
4 |
2 |
1 |
| 3 |
3 |
4 |
2 |
| 4 |
1 |
3 |
2 |
- What character strings does this FSA record?
Test -> w1=abaabc
- What is the smallest string accepted? c
- What are the shortest strings beginning with a and b that are accepted?
- What is the shortest string beginning with c with more than one letter?
In lexical analysis, when a string hits a terminal state you may still want to consider the next character (as in w1). <, <=, and <> are all acceptable strings in Pascal. How do you know when to stop?
You can parse this type of graph by defining the class Exit which determines which state you will goto on input of a given character. Each state contains an array of Exit, that is used to determine what will happen on a given input.
class Exit
{
public :
Exit(){input ='\0';goToState=Error;}
Exit(char c, Symbol g){input = c; goToState =g;}
int match(char s)const {return s == input;}
Symbol nextState()const {return goToState;}
int operator != (const Exit &e)const
{return (input != e.input)&&(goToState != e.goToState);}
private :
char input;
Symbol goToState;
};
typedef Array<Exit> Exits;
class State
{
public :
State(){numExits =0;}
State(int n,const Exits &e):exits(e){numExits = n;}
int operator != (const State &s)const {return 0;} //temp fix
Symbol nextState(char c)const ; //if c is an exit return its goto otherwise return error state
private :
int numExits;
Exits exits;
};
typedef Array<State> States;
The actual function that determines the next state is given here.
Symbol State::nextState(char c)const
{
//change to find.
for (int i =0;i<exits.numberInArray();i++)
if (exits[i].match(c))//found an exit
return exits[i].nextState(); //go to the state of the exit
return Error; // no exit means error
}