Rainforest

Sankuru

Implémenter, personaliser, étendre et réparer Joomla/Virtuemart

Views: 14163

Nous vous aidons avec ...

Virtuemart
Joomfish
Autres extensions
SocialTwist Tell-a-Friend

Traduction automatique

English Arabic Chinese (Simplified) German Japanese Russian Spanish



Re-utilisons des sources libres

Les logiciels dont vous avez besoin, éxistent souvent déjà en source libre, et couvrent vos besoins à 80%. Nous ajouterons pour vous les 20% qui manquent.
The need for LALR1 packrat parsers PDF Imprimer E-mail
Note des utilisateurs: / 1
MauvaisTrès bien 
Écrit par erik   
Vendredi, 25 Décembre 2009 02:06
There are no translations available.

Traditionally, there has always been an important distinction between sentential languages (CFG), that can be parsed by LL or LR parsers, and regular grammars, that can be parsed by scanners.

An example of a sentential grammar:

start : animal ;
animal : mammal ;
animal : reptile ;
mammal : dog ;
mammal : lion ;
reptile : snake ;
reptile : lizard ;

An example of a regular grammar:

word: (letter) + ;

The power of sentential grammars is that terms on the right-hand side of a rule can re-appear on the left-hand side.  This allows the rules to be decomposed. The power of regular expressions is that you can use the Kleene operators, especially, to deal with constructs describing repetition. You cannot use those in sentential grammars, where you need to deal with repetition through recursion.

Of course, we would benefit from a unification in both grammar types. Therefore, it would really be a great thing if we could use Kleene operators in sentential grammars.

So, I thought that the idea of a unification effort was original and thought of of embarking on it by myself, until I discovered that Bryan Ford already got there, with his PEG approach, that is, his Parsing Expression Grammars.

His approach allows for regular expressions in sentential grammars. So, Ford's packrat parsers already allow for the desired regularity in sentential grammars.

That would leave me nothing to do, if it were not for the fact that Ford's packrat parsers are apparently built on the LL parsing algorithm. Wikipedia says so, but I still need to verify this by studying Ford's original paper more thoroughly.

We do not "like" LL parsers, because LR parsers are seriously more powerful than LL parsers, regardless of how many lookahead tokens the LL parsers use. Therefore, we still need an algorithm to generate LR instead of LL packrat parsers.

While building my own LALR parser generator in Php, I became aware of the fact that the LR0 states ("item sets") are based on relatively simple rule positions. A rule position could actually be extended to be the state in the underlying DFA for the rule, instead of just a cursor position in it.

If a grammar rule ("production") is represented by a DFA, it can be a regular expression in its terms, instead of just a simple sequence of terms. This extension would allow for the use of Kleene operators in a traditional LALR grammar.

Therefore, it must indeed be possible to build an LALR1 packrat parser generator instead of an LLk one.

Unfortunately, I have only just managed to finish my LALR implementation in Php, without actually sufficiently validating it. So, I cannot embark immediately on this extension, a Kleene LALR1 parser generator ("KLALR1") before doing more testing on my simple LALR1 ("SLALR1") implementation first.

But then again, it is still a race against time, because otherwise, someone else will build the Kleene LALR1 parser generator first, to produce an LR alternative to Ford's Kleene LLk parser generator, and beat me to the finish :-)

Of course, there is again no Php implementation for Ford's work. So, the research results are again not trickling down to the ones who could benefit from them and actually use them to solve real-world problems.

As I blogged about it before, the functional guys do not like Php, while the imperative guys do not want to use lisp or similar languages. Python would have been acceptable to both style of programmers, if it hadn't strayed as much from the C syntax.

I repeat: why not compromise on Javascript?

The real-world, imperative programmers already agree to use Javascript. They use it client-side all the time already. I think that they can be talked into using Javascript server-side too, especially, now that we have super-fast Javascript engine implementations such as v8 and tracemonkey.

The functional programmers have all the support they need, for functional programming constructs in Javascript (if not more) as in Python. Why not both move there, so that the communication finally starts flowing?

How much fun can it be to produce such interesting work, only to see that nobody uses it, because they do not want to bother with Haskell or Ruby and so? Come on, guys.

While I do understand a preference for Python, Ruby, or Haskell, I do not understand why anybody would prefer to publish their research efforts in Java or C#. What a waste of time. How could it be more convenient to work on parsers in that kind of stupid languages? Java and C# don't even have first-class functions.

Furthermore, by using Java or C# or other big-corporation languages, why help furthering a big-corporation, big-government agenda, which contains secret ambitions to exclude even more people from the benefits of technological progress?

I really prefer small and medium-size business, because they do not have the power to further secret agendas anyway. At least, they have to work for a living, instead of leeching, and always trying to collect big-corporation, big-government-protected monopoly rents to the detriment of everybody else. And Php is the choice of the small and medium-size business.

Php may not be the most beautiful language implementation in the history of mankind, but at least it does work. Furthermore, publishing in Php guarantees that you actually have an audience of real-world, real-life users.

 


blog comments powered by Disqus
 
 
Joomla 1.5 Templates by Joomlashack