Notes for week 1

Day 1.

  1. Source Code, Object Code (and language)
  2. Types of object code
  3. Compile Sequence Source Code -> Compiler -> Object Code -> Linker -> Executable Code
    1. Object Code does not have addresses of grammar objects that are not defined in the Source code.
    2. Linker finds external references
  4. Parts of a compiler
    1. Lexical analyzer
    2. Syntactical Analyzer (Parser)
    3. Intermediate Code Generation.
      Temp3 = id2
      Temp4 = id3
      Temp2 = Temp3 * Temp4
      Temp1 = Temp2
      id1 = Temp1
      
    4. Optimization.
      (if no registers required)
      id1= id2 * id3
      
      or (if registers required)
      Temp3 = id2
      Temp4 = id3
      Temp2 = Temp3 * Temp4
      id1 = Temp2
      
    5. Object Code generation convert intermediate code to machine code.
  5. Compiler Construction
    1. When no compilers are available.
      1. Create a compiler in assembly language
      2. That compiler can then compile any program in the source language
    2. If a compiler is available in C and we want a PL/1 compiler. Write the compiler in C and the effect is the same as A, except easier to create.
    3. Boot Strap
      1. Create a minimal compiler in assemby language
        1. Must be big enough to write the rest of the compiler
        2. Must be small enough so that you don't end up back in A
        3. May be done in steps. Each step after the first, a compiler that compiles more features of the language is written in the part of the language that is currently available. (example: Minimal Pascal doesn't have strings. Extend the Pascal compiler by implementing strings in the pascal compiler (written in Pascal)
    4. Cross-compiler Compiler for a high-level language (eg. C) on one machine. Write the C compiler in C on this machine and produce code that is executable on another machine.
    5. Compiler-compiler.
  6. Retargetable compilers Pl/0 -> pcode. Write a pcode interpreter for the target machine (in a language available on the machine).

Lexical Analysis

  1. Responsiblities of Lexical analyzer.
    1. Finding lexemes and associating them with Tokens
    2. Removal of comments (cannot effectively deal with nested comments because they are not regular expressions)
    3. Case conversion all upper or lower case (as in Pascal and Lisp) or case sensitive grammar (such as C)
    4. Removal of White Space.
      Example FORTRAN 
      do 100 i = 1, 30, 2
      same as
      do100i=1,30,2
      
    5. Interprete Compiler directives
      1. {$R-} in Turbo and other Pascals
      2. #include in C
    6. Comunicate with symbol table
    7. Prepare output listing
  2. Tokens and Lexemes

    Day2.

  3. Finite State Automata.
    1. A finite set of state with rules for making transitions between states. One State is specified as the initial state.
    2. Example: vending machine sells candy at 25 cents per bar. Will accept nickels, dimes and quarters. The user is also allowed to select the type of bar. Select is ignored unless 25 cents or more has been deposited. Select in these cases returns change if there is an overpayment and resets the machine to the start state. States:
      State# total money inserted
      1 the start 0 cents
      2 5 cents
      3 10 cents
      4 15 cents
      5 20 cents
      6 goal 25 cents
      7 goal more than 25 cents

      Draw picture without edges. Where can you get to from the start state? (Draw labeled edges) Continue to get rest of labels

    3. When using a FSA for pattern recognition, there must be a set of terminal states called accepting states.
    4. Another example of a FSA that accepts strings

      Table

      State
      Input
        a b c
      1 2 1 3
      2 4 2 1
      3 3 4 2
      4 1 3 2
      • What character strings does this FSA record?
        Test -> w1=abaabc
        
      • What is the smallest string accepted? c
      • What are the shortest strings beginning with a and b that are accepted?
      • What is the shortest string beginning with c with more than one letter?
    5. In lexical analysis, when a string hits a terminal state you may still want to consider the next character (as in w1). <, <=, and <> are all acceptable strings in Pascal. How do you know when to stop?

You can parse this type of graph by defining the class Exit which determines which state you will goto on input of a given character. Each state contains an array of Exit, that is used to determine what will happen on a given input.

class  Exit
{
public :
	Exit(){input ='\0';goToState=Error;}
	Exit(char c, Symbol g){input = c; goToState =g;}
	int match(char s)const {return  s == input;}
	Symbol nextState()const {return  goToState;}
	int operator != (const  Exit &e)const 
		{return  (input != e.input)&&(goToState != e.goToState);}
private :
	char input;
	Symbol goToState;
};

typedef  Array<Exit> Exits;
class  State
{
public :
	State(){numExits =0;}
	State(int n,const  Exits &e):exits(e){numExits = n;}
	int operator != (const  State &s)const {return  0;} //temp fix
  	Symbol nextState(char c)const ;  //if c is an exit return its goto otherwise return error state  
private :
	int numExits;
	Exits exits;
};

typedef  Array<State> States;

The actual function that determines the next state is given here.

Symbol State::nextState(char c)const 
{
//change to find.  
	for  (int i =0;i<exits.numberInArray();i++)
		if  (exits[i].match(c))//found an exit  
			return  exits[i].nextState(); //go to the state of the exit
  	return  Error; // no exit means error  
}