Notes for week 3

Day 1

  1. Converting NFA to a DFA.
    1. e-closure
      1. e-closure of a state s is the set of states that can be reached from s by a series of e-transitions.
      2. e-closure(T) where T is a set of states is U e-closure(s) for s e T.
      3. move(T,a) Set of NFA states to which there is a transiton on the input symbol a from some NFA state s in T.
    2. Create a DFA
      1. Initalizes DStates with e-closure(s0), this state is unmarked.
      2. While there is an unmarked state T in DStates
        1. mark T
        2. For each inpute symbol a
          1. U = e-closure(Move(T,a));
          2. If U is not in DStates add U as an unmarked state of DStates
          3. DTran[T,a] = U
    3. Example
      1. A = e-closure(0) = {0,1,2,4,7}
      2. Move(A,a) = {3,8} 2->3, 7->8 Nothing else moves for a B = e-closure({3,8}) ={1,2,3,4,6,7,8}; DTrans(A,a) = B
      3. Move(A,b) = {5} C = e-closure({5}) = {1,2,4,5,6,7}; DTrans(A,b) = C
      4. Other states D = {1,2,4,5,6,7,9}; E = {1,2,4,5,6,7,10}
      5. DTran
        State Input
          a b
        A B C
        B B D
        C B C
        D B E
        E B C
      6. Draw this DFA.
    4. If time do the example on page 49 in the book.

Creating a lexical analyzer using an NFA

  1. Modifications to the basic NFA used to recognize patterns. A. Scanner must ignor white space except when white space marks the end of a token. B. Cannot wait until the end of the string (the program) to determine if we match the pattern. Must match every time we get to a terminal state, then reset the state to the initial state. The lexical analyzer always returns something. If it fails to recognize a lexeme, it returns a null token. C. We want exactly one accept state for every token. D. Keywords can be handled either as a token recognized by the NFA or as an identifier that is found in a lookup table.
  2. An actual lexical analyzer.
    1. Consider the simple case of a language that consists of just {<, <=, =, >=, >, <>}
      1. List the initial and terminal states first
      2. Next determine intermediate states needed to get from the start state to the accept state.

      Day 2

    2. To write the lexical analyzer from the diagram, you need to initialize the states used in the lexical analizer you saw in week 1 .
      1. Create a enumerated type that corresponds to each of the states. The terminal states will be returned as symbol.
        enum 	Symbol {Zero,InLess,InGreater,Error, Less,NotEqual,LessEqual,Equal,Greater,GreaterEqual};
        
      2. Initize the states. For example Zero will have one exit to InLess (1), InGreater(2), Equal(7) and Error(3). Notice the white space keeps you in state zero.
        	Exits exits;
        	exits.insertItem(Exit(' ',Zero));
        	exits.insertItem(Exit('\t',Zero));
        	exits.insertItem(Exit('<',InLess));
        	exits.insertItem(Exit('=',Equal));
        	exits.insertItem(Exit('>',InGreater));
        	state[Zero] = State(exits.numberInArray(),exits.copy() );
        	//Clear before creating the next state  
        	exits.clearArray();
        
      3. Finally, in this case getNextSymbol needs only more from state to state until a terminal state is achieved. If you were accumulating a value such as a number or string, you would have special conditions that would record information if you were in a state that was an appropriate internal state, such as InIdent for an identifier.
        Symbol LexAnalizer::getNextSymbol()
        {
        	id = "";
        	Symbol currentState=Zero;
        
        	do 	{
        	  currentState = state[currentState].nextState(ch); //get the next state
        	  }while  (currentState<Error && ch !=0);//Error is the first terminal state
             
             if (ch ==0) //Ran out of source code  
                 return  Error;
             return  currentState;
        }
  3. What about the example {if, ident} ident = letter (letter|digit)*

Grammars

  1. Syntax and Semantics (in natural languages must know the semantics to determine the syntax, not in programming languages.)
  2. Definition of a grammar:
    1. A grammar G is a quadruple (T,N,S,R), where T is a finite set of terminal symbols
      N is a finite set of nonterminal symbols (T <-> N is empty)
      S is a unique starting symbol (S is nonterminal)
      R is a finit set of productions in the form a -> b, where a and b are strings of nonterminals and terminals.
    2. Example of a grammar
      T = {a, b, c}
      N = {S, B, C}
      S is the start 
      R {
      
      S -> aSBC S -> abC CB -> BC bB -> bb bC -> bc cC -> cc
      }
      • Notice that the left-hand sides of some of these productions are strings of terminals and/or nonterminals. Chomski classifies grammars that allow strings of symbols (at least one of which should be non-termina) on the left as Type 0 and 1 grammars. 0 requires that the left not be e , 1 requires that the right not be
        e
      • Type 2 grammars require the left-side be one non-terminal. These are Context free grammar
      • Type 4 grammars are regular expressions
    3. Derivations Build a parse tree for a+b*c with the following grammar
      E -> E + E
         -> E * E
         -> i
      
    4. Left-most and Right-most Derivations
      E => E * E
          => E + E * E
          => i + E * E
          => i + i * E
          => i + i * i
      
    5. Two ways to parse a + b * c (show trees) This grammar is ambiguous.
    6. Convert to unambiguous form:
      E -> E + T | E - T | T
      T -> T * F | T / F | F
      F -> (E) | i