Rainforest

Sankuru

Implementeren, customiseren, uitbreiden, en troubleshooten van Joomla/Virtuemart

Views: 2713

Wij helpen met ...

Virtuemart
Joomfish
Andere extensies
SocialTwist Tell-a-Friend

Automatische vertaling

English Arabic Chinese (Simplified) German Japanese Russian Spanish



Hergebruik open source

Datgene wat U nodig hebt, bestaat vaak al, en dekt 80% van Uw behoeften. Wij zorgen voor de ontbrekende 20%.
Php implementation of an LALR1 parser generator PDF Afdrukken E-mail
Waardering: / 0
SlechtZeer goed 
Geschreven door erik   
donderdag 24 december 2009 14:30
There are no translations available.

 

Here is my initial version (0.15) of a Php implementation of an LALR1 parser generator: lrparsers.zip.

Parser tables, along with a lexer table, are the core constituents of any compiler's front end.

This LALR parser generator is heavily based on my custom Php version of the map/reduce programming technique (functional programming). I will elaborate in future posts, on why I used a map/reduce basis to implement the parser generator.

In fact, I have implemented three LR parser generators:

  • LR0. No lookaheads. The other parsers call it to calculate the collection of grammar states.
  • SLR1. States based on LR0. Lookaheads based on grammar-wide follow sets. To test the LR0 parser against SLR1 examples.
  • LALR1. States based on LR0. Lookaheads based on individual state-wide follow sets (merged LR1 lookaheads).

 

I am still looking for a way to validate the LALR1 parser generator more thoroughly. I would use the output of existing parser generators more readily, if they did not generate very cryptical parser tables.

It is a commandline application in Php. You can generate the parser tables for a particular grammar as following:

$ ./lalr1cc examples/grammar07.txt -v

[RULES]
0 S : E ;
1 E : E + ( E ) ;
2 E : int ;

[START]
S

[NON-TERMINALS]
S E

[TERMINALS]
+ ( ) int

[REDUCTIONS] --> state number, rule number reduced, non-terminal reduced | lookahead tokens
1 0 S | EOF
2 2 E | ) + EOF
6 1 E | EOF + )

[TRANSITIONS]  --> transition number, state from, transition term, state to | lookahead tokens
0 0 E 1 | EOF +
1 0 int 2 | EOF +
2 1 + 3 | EOF +
3 3 ( 4 | EOF + )
4 4 E 5 | EOF + )
5 4 int 2 | ) +
6 5 ) 6 | EOF + )
7 5 + 3 | ) +

[STATES]
0 0-0|1-0/2-0 INIT --> state number, core rule positions | expanded rule positions
0-0 S : .E | EOF --> rule position, the dot shows the position in the rule | lookahead tokens
1-0 E : .E + ( E ) | EOF +
2-0 E : .int | EOF +
1 0-1/1-1
0-1 S : E. | EOF
1-1 E : E.+ ( E ) | EOF +
2 2-1
2-1 E : int. | ) + EOF
3 1-2
1-2 E : E +.( E ) | EOF + )
4 1-3|1-0/2-0
1-3 E : E + (.E ) | EOF + )
1-0 E : .E + ( E ) | ) +
2-0 E : .int | ) +
5 1-4/1-1
1-4 E : E + ( E.) | EOF + )
1-1 E : E.+ ( E ) | ) +
6 1-5
1-5 E : E + ( E ). | EOF + )

 

The -v option means "verbose". With this option, the parser generator will also output the individual state cursors in the [STATES] section of the parser tables. The individual state cursors are not needed for the LR parsing algorithm.

The example grammar above (grammar07.txt) comes from lecture on LALR1, apparently at Berkeley (http://users.cs.dal.ca/~jost/4131/lecture09_4up.pdf). The parser generates indeed the same states and lookahead tokens as in the solution in the lecture. This parser generator, however, does not operate by merging LR1 states, but by enriching directly the LR0 states.

Of course, I need much more further validation than the few examples included, to make sure the parser generator generates the tables correctly.

I would love to get some help in validating the parser generator. Therefore, if you feel like running it against other examples, feel free!

 


blog comments powered by Disqus
 
 
Joomla 1.5 Templates by Joomlashack