Notes for week 4

Day 1

Top-Down Parsers

  1. Starts at the top of the parse tree and tries to construct the left-most derivation.
  2. Bottom-up parsers start at the leaves and build the parse tree from the bottom-up as a right-most derivation.
  3. Both types of parsers scan from left to right.
  4. Left recursions.

    1. Suppose we have the grammar (on page 83)
      E -> E + T | T
      T -> T * F | F
      F -> i | (E)
      
      1. If you parse i + i + i you would like to expand to
      2. Problem how does the compiler know to expand E -> E + T twice on the current input character and E->T on the third expansion?S
      3. Show infinite left recursion.
    2. The general problem of left-recursive productions. A -> Aa .
      1. Two cases
        1. Immediate left recursion A -> Aa
        2. Non-immediate left recursion A-> B a , B -> Ab .
      2. Immediate left recursion
        1. Only left-recursion causes problems A -> aA b and A -> a A do not cause any problems. Why?
        2. Must have a non-left recursive production.
        3. A -> A a | b
        4. Add a new non-terminal A' and replace productions in A by
          A -> a A'
          A' -> e | a A'
          
        5. What type of expressions does this parse? Are they the same as the original.
      3. Example (page 86) A -> Ac | Ad | e | f
        1. Break into left-recursive and non-left-recursive
          A -> Ac | Ad
          A -> e | f
          
        2. Create new non-terminal A'
        3. Create new set of productions:
          A -> eA' | fA'
          A' -> e | cA' | dA'
      4. Do the replacement for:
        E -> E + T | T
        T -> T * F | F
        F -> i | (E)
        
        1. Identify non-recursive part of each production.
          1. + T
          2. *F
        2. Add nonterminals for E and T productions .
          1. E -> TE'
               E' -> e | +T E'
            
          2. T -> F T'
                 T' -> e | + FT'
            

Day 2

Recursive Descent Compilers

  1. Review the top down approach
  2. Recursive descent for
    S -> if Condition then S;
       -> while Condition do S;
       -> ident := Expression;
       -> e 
    
    bool S()
    {
    	if  (sym == ifsym)
    	{     
                    lex.getsym();
    	    if  (Condition())
    	    {
    		if ( sym == thensym )
    			getsym
    		else 
    			error();
    		S;
    	     }
    	}
    	else if  (sym == whilesym)
    	…
    	else if ( sym == ident)
    
    	…
     no else if there is an e
    
  3. How can you store a grammar like this one in a program? One way is to create class Production that can be used in your compiler compiler. This class is given below along with the class RightSide.
    class  RightSide
    {
    public: 
    	RightSide(){numSyms = 0;}
    	void insert(int s){symSubs[numSyms++]=s;}
    	ostream& print(ostream &,const  LookUpTable &)const ;
    	ostream& printMarked(ostream &s,const  LookUpTable &l,int mark)const ;
    	int operator [](int i)const {return  symSubs[i];}
    	int length()const {return  numSyms;}
    	int findOccurences (int i,TokenStack &loc)const ;
    private: 
    	IntArray symSubs;//symbols used in right-hand side 
    	int numSyms; //how large is the right-hand side 
    };
    
    
    class  Production
    {
    public: 
    	Production(int ls=-1){leftSide = ls;}
    	void  insertRight(int i){rightSide.insert(i);}
    	int getSymI(int i)const {return  rightSide[i];}
    	int findOccurences (int i,TokenStack &loc)const ;
    	//returns the number of hits. loc is actual location of hits 
    	int length()const {return  rightSide.length();}
    	int operator != (const  Production &p){return  leftSide != p.leftSide;}
    	ostream& print(ostream &s, const  LookUpTable &l)const ;
    	ostream& printMarked(ostream &s,const  LookUpTable &l,int mark)const ;
    	// prints state entries 
    	int left()const {return  leftSide;}
            void error(const  String &s){_error = s;}
    private: 
    	int leftSide;// subscript of nonterminal on left 
    	RightSide rightSide;//stores the right side of a production 
            String _error;//string to describe an error at a given step 
    };
  4. In a top down compiler compiler, you will parse productions. Some samples are given here.
    void  TAGParser::parseRule()
    {
       if  (psym != Ident)
          code[numProduct++].error("Idenifier expected on Left side of a production" );
       String left = scanner.getId(); //get the left-hand side 
       int where = lookUp.locateInsert(left,NonTerminalToken); // insert if new otherwise find
                                                //where is where left is located in the table 
       psym = scanner.getNextSymbol();//get next symbol 
       int firstProduct = numProduct; //store so we can keep range of products
             //with left on left side 
       
       bool  error = false ;
       do   // get all products with this left hand side 
       {
          code[numProduct++]= Production(where);
          if  (psym == Arrow)
             psym = scanner.getNextSymbol();
          else 
          {
             code[numProduct++].error("Arrow expected between left"+
                   "  and right side of production" );
             error = true ;
          }
          expression();
       }
       while  (psym != Semicolon && psym!=Error  && !error);
       lookUp[where]->setFirstLast(firstProduct,numProduct-1);
       psym = scanner.getNextSymbol();
    }
    
    void TAGParser::term()
    {
    	if  (psym == Ident)
    	{
    		String sval = scanner.getId();
    		int symVal = lookUp.locateInsert(sval,NonTerminalToken);
    		code[numProduct-1].insertRight(symVal);
    		psym = scanner.getNextSymbol();
    	}
    
    	else 
    	  factor();
    }
  5. What happens if you have a grammar like this (if time permits)
    E -> TE'
    E' -> e | +T E'
    T -> F T'
    T' -> e | + FT'
    F -> id | (E)
    
  6. More complicated (page 94)
    S -> Ab | Bc
    A -> Df  | CA
    B -> gA | e
    C -> dC | c
    D -> h | i
    
    When you write S how do you determine whether or not to try A or B?

    Suppose you have the string hfb. Do you try A or B? You have to know what tokens can start A and B. The set of these items is called the First

    How do you think you would calculate first?
    First(S) = First(A) + First(B); ({h, i, d, c} + {g, e} ={h, i, d, c, g, e}
    First(A) = First(D) + First(C); (= {h, i}+ {d, c} = {h, i, d, c})
    First(B) = {g, e}
    First(C) = {d, c}
    First (D) = {h, i}
    
  7. Computing First.
    1. Start with an array subscripted on your grammar items N» T, of sets of terminals (call it Firsts)
    2. Set the first of each terminal to be itself (Firsts[t] = {t}) and Firsts of the nonterminals to be the empty set.
    3. Define the function First by
//int Epsilon;//will always be numTokens +2 use to represent the empty product
// Thus every set First will be large enough to contain any item from the grammar.  
void  TAGCompiler::findFirsts()//compute the set of all firsts  
{
	// Initialize the Firsts. First of a terminal is that terminal
	// while first of a non-terminal is initialize to empty 
	for  (int  count = 0;count<numTokens;count++)
		if  (lookUp[count]->kind()==TerminalToken)
			firsts[count] = MySet(Epsilon+1,1,count);
		else 
			firsts[count] = MySet(Epsilon+1);
	// Find the firsts for non-terminals by computing the first for the
	// left-hand side of each production in the grammar 
 	for  (int  marker = numProduct-1;marker >= 0;marker--)
		if  (firsts[code[marker].left()].empty())
			computeFirsts(code[marker].left());
}

void  TAGCompiler::computeFirsts(int nonTerm)// compute the firsts for one nonTermial 
 {
	int  prodFirst;
	int  prodLast;
	lookUp[nonTerm]->getFirstLast(prodFirst,prodLast);
	for  (int  count = prodLast;count>=prodFirst;count--)
	   if  (code[count].length()==0) //empty production adds Epsilon to first  
	      firsts[nonTerm] += Epsilon;
	   else 
	   {
	      int  mark = code[count].getSymI(0);
	      int  position = 1;
	      int  finished =0;
	      if  (lookUp[mark]->kind()==TerminalToken)
	         firsts[nonTerm]+= mark;
	      else 
	         do 
	         {             
	            if  (firsts[mark].empty())//if first[mark] not computed do it now  
	               computeFirsts(mark);
	            MySet temp = firsts[mark];
	            temp -=Epsilon;
	            firsts[nonTerm]|=temp;
	            if  (firsts[mark].contains(Epsilon)
		               &&  position < code[count].length()-1)
	            // if current item can begin with epsilon, then could see through
	            // to next item in code  
	            {
	               position++;
	               mark = code[count].getSymI(position);
	            }
	            else 
	            {
	               if  (firsts[mark].contains(Epsilon))
	               //then can see through this product because must be at end of code  
	                     firsts[nonTerm]+=Epsilon;
		       finished = 1;
	            }
	         }while  (!finished);
	      }
}

void  TAGCompiler::findFollows()// computer follows for all nonterminals 
{
	//numTokens now includes $ so only go through numTokens -1 
	for  (int count = 1;count < numTokens-1;count++)
		follows[count] = MySet(Epsilon+1);
	follows[0] = MySet(Epsilon+1,1,lookUp.find("$" ));// add $ to follows of first production 
	computeFollows(0);	

	for  (int count = 1;count<numTokens-1;count++)
		if (lookUp[count]->kind()==NonTerminalToken && follows[count].empty())
   		     computeFollows(count);
}

int TAGCompiler::treatEpsilonEnd(int prodNum,int &pos,int nonTerm,int &lSym)
{
	if (firsts[lSym].contains(Epsilon) &&// lSym is see through
 		pos < code[prodNum].length()-1) // and there is something to see 
	{
		pos++;
		lSym = code[prodNum].getSymI(pos);
		return  1; //continue with this value 
	}
	else  // nothing else to see 
	{
		if  (firsts[lSym].contains(Epsilon) && code[prodNum].left()!=nonTerm)
		{
		         lSym = code[prodNum].left();// nonTerm is followed by anything  
                                                     //following the left-hand side of Production
 			if  (follows[lSym].empty()) //has to be computed before it's used 
				computeFollows(lSym);
			follows[nonTerm] |= follows[lSym];
		}
		return  0;  //all done 
	}
}

void  TAGCompiler::computeFollows(int nonTerm) // compute the follows for a nonTerminal 
{
    TokenStack locs(10,5);
    for (int prodNum = 0;prodNum<numProduct;prodNum++)
        if (code[prodNum].findOccurences(nonTerm,locs))
        {
            while  (!locs.empty())
            {
                int pos = locs.remove();//retrieve one of the locations of nonTerm 
                if (pos < code[prodNum].length()-1) //it's not at the end 
                {
                    pos++;
                    int lSym = code[prodNum].getSymI(pos);
                    // find out how far beyond pos you can see. Add each item seen 
                    if  (lSym != nonTerm)//don't add current symbol to its follow 
                        do 
                        {
                            MySet temp = firsts[lSym];
                            temp -= Epsilon;
                            follows[nonTerm]|=temp;
                            //finish the production if more epsilons 
                        }
                        while 	(treatEpsilonEnd(prodNum,pos,nonTerm,lSym));
                }
                else if  (code[prodNum].left() != nonTerm)//nothing follows nonTerm 
                {
                    int lSym = code[prodNum].left();
                    if (follows[lSym].empty())// must find follows for lSym 
                        computeFollows(lSym);
                    //add follows of lSym to nonTerm follow 
                    follows[nonTerm]= follows[nonTerm] | follows[lSym];
                }
            }
        }
}