Rainforest

Sankuru

Implementing, customizing, extending, and troubleshooting Joomla/Virtuemart

Views: 2715
SocialTwist Tell-a-Friend

Machine translation

English Arabic Chinese (Simplified) German Japanese Russian Spanish



Re-use open source

What you need, often exists already, and covers your requirements for 80%. We will add the remaining 20% for you.
Php implementation of an LALR1 parser generator PDF Print E-mail
User Rating: / 0
PoorBest 
Written by erik   
Thursday, 24 December 2009 14:30

 

Here is my initial version (0.15) of a Php implementation of an LALR1 parser generator: lrparsers.zip.

Parser tables, along with a lexer table, are the core constituents of any compiler's front end.

This LALR parser generator is heavily based on my custom Php version of the map/reduce programming technique (functional programming). I will elaborate in future posts, on why I used a map/reduce basis to implement the parser generator.

In fact, I have implemented three LR parser generators:

  • LR0. No lookaheads. The other parsers call it to calculate the collection of grammar states.
  • SLR1. States based on LR0. Lookaheads based on grammar-wide follow sets. To test the LR0 parser against SLR1 examples.
  • LALR1. States based on LR0. Lookaheads based on individual state-wide follow sets (merged LR1 lookaheads).

 

I am still looking for a way to validate the LALR1 parser generator more thoroughly. I would use the output of existing parser generators more readily, if they did not generate very cryptical parser tables.

It is a commandline application in Php. You can generate the parser tables for a particular grammar as following:

$ ./lalr1cc examples/grammar07.txt -v

[RULES]
0 S : E ;
1 E : E + ( E ) ;
2 E : int ;

[START]
S

[NON-TERMINALS]
S E

[TERMINALS]
+ ( ) int

[REDUCTIONS] --> state number, rule number reduced, non-terminal reduced | lookahead tokens
1 0 S | EOF
2 2 E | ) + EOF
6 1 E | EOF + )

[TRANSITIONS]  --> transition number, state from, transition term, state to | lookahead tokens
0 0 E 1 | EOF +
1 0 int 2 | EOF +
2 1 + 3 | EOF +
3 3 ( 4 | EOF + )
4 4 E 5 | EOF + )
5 4 int 2 | ) +
6 5 ) 6 | EOF + )
7 5 + 3 | ) +

[STATES]
0 0-0|1-0/2-0 INIT --> state number, core rule positions | expanded rule positions
0-0 S : .E | EOF --> rule position, the dot shows the position in the rule | lookahead tokens
1-0 E : .E + ( E ) | EOF +
2-0 E : .int | EOF +
1 0-1/1-1
0-1 S : E. | EOF
1-1 E : E.+ ( E ) | EOF +
2 2-1
2-1 E : int. | ) + EOF
3 1-2
1-2 E : E +.( E ) | EOF + )
4 1-3|1-0/2-0
1-3 E : E + (.E ) | EOF + )
1-0 E : .E + ( E ) | ) +
2-0 E : .int | ) +
5 1-4/1-1
1-4 E : E + ( E.) | EOF + )
1-1 E : E.+ ( E ) | ) +
6 1-5
1-5 E : E + ( E ). | EOF + )

 

The -v option means "verbose". With this option, the parser generator will also output the individual state cursors in the [STATES] section of the parser tables. The individual state cursors are not needed for the LR parsing algorithm.

The example grammar above (grammar07.txt) comes from lecture on LALR1, apparently at Berkeley (http://users.cs.dal.ca/~jost/4131/lecture09_4up.pdf). The parser generates indeed the same states and lookahead tokens as in the solution in the lecture. This parser generator, however, does not operate by merging LR1 states, but by enriching directly the LR0 states.

Of course, I need much more further validation than the few examples included, to make sure the parser generator generates the tables correctly.

I would love to get some help in validating the parser generator. Therefore, if you feel like running it against other examples, feel free!

 


blog comments powered by Disqus
 
 
Joomla 1.5 Templates by Joomlashack