|
Here is my initial version (0.15) of a Php implementation of an LALR1 parser generator: lrparsers.zip.
Parser tables, along with a lexer table, are the core constituents of any compiler's front end.
This LALR parser generator is heavily based on my custom Php version of the map/reduce programming technique (functional programming). I will elaborate in future posts, on why I used a map/reduce basis to implement the parser generator.
In fact, I have implemented three LR parser generators:
- LR0. No lookaheads. The other parsers call it to calculate the collection of grammar states.
- SLR1. States based on LR0. Lookaheads based on grammar-wide follow sets. To test the LR0 parser against SLR1 examples.
- LALR1. States based on LR0. Lookaheads based on individual state-wide follow sets (merged LR1 lookaheads).
I am still looking for a way to validate the LALR1 parser generator more thoroughly. I would use the output of existing parser generators more readily, if they did not generate very cryptical parser tables.
It is a commandline application in Php. You can generate the parser tables for a particular grammar as following:
$ ./lalr1cc examples/grammar07.txt -v
[RULES] 0 S : E ; 1 E : E + ( E ) ; 2 E : int ;
[START] S
[NON-TERMINALS] S E
[TERMINALS] + ( ) int
[REDUCTIONS] --> state number, rule number reduced, non-terminal reduced | lookahead tokens 1 0 S | EOF 2 2 E | ) + EOF 6 1 E | EOF + )
[TRANSITIONS] --> transition number, state from, transition term, state to | lookahead tokens 0 0 E 1 | EOF + 1 0 int 2 | EOF + 2 1 + 3 | EOF + 3 3 ( 4 | EOF + ) 4 4 E 5 | EOF + ) 5 4 int 2 | ) + 6 5 ) 6 | EOF + ) 7 5 + 3 | ) +
[STATES] 0 0-0|1-0/2-0 INIT --> state number, core rule positions | expanded rule positions 0-0 S : .E | EOF --> rule position, the dot shows the position in the rule | lookahead tokens 1-0 E : .E + ( E ) | EOF + 2-0 E : .int | EOF + 1 0-1/1-1 0-1 S : E. | EOF 1-1 E : E.+ ( E ) | EOF + 2 2-1 2-1 E : int. | ) + EOF 3 1-2 1-2 E : E +.( E ) | EOF + ) 4 1-3|1-0/2-0 1-3 E : E + (.E ) | EOF + ) 1-0 E : .E + ( E ) | ) + 2-0 E : .int | ) + 5 1-4/1-1 1-4 E : E + ( E.) | EOF + ) 1-1 E : E.+ ( E ) | ) + 6 1-5 1-5 E : E + ( E ). | EOF + )
The -v option means "verbose". With this option, the parser generator will also output the individual state cursors in the [STATES] section of the parser tables. The individual state cursors are not needed for the LR parsing algorithm.
The example grammar above (grammar07.txt) comes from lecture on LALR1, apparently at Berkeley (http://users.cs.dal.ca/~jost/4131/lecture09_4up.pdf). The parser generates indeed the same states and lookahead tokens as in the solution in the lecture. This parser generator, however, does not operate by merging LR1 states, but by enriching directly the LR0 states.
Of course, I need much more further validation than the few examples included, to make sure the parser generator generates the tables correctly.
I would love to get some help in validating the parser generator. Therefore, if you feel like running it against other examples, feel free!
|