Rainforest

Sankuru

Implementeren, customiseren, uitbreiden, en troubleshooten van Joomla/Virtuemart

Views: 1119

Wij helpen met ...

Virtuemart
Joomfish
Andere extensies
SocialTwist Tell-a-Friend

Automatische vertaling

English Arabic Chinese (Simplified) German Japanese Russian Spanish



Hergebruik open source

Datgene wat U nodig hebt, bestaat vaak al, en dekt 80% van Uw behoeften. Wij zorgen voor de ontbrekende 20%.

Gratis offerte

Vraag vandaag nog gratis een offerte aan.

Building a tokenizer for the EAV problem, part 5 PDF Afdrukken E-mail
Waardering: / 0
SlechtZeer goed 
Geschreven door erik   
dinsdag 23 juni 2009 05:17
There are no translations available.

I have finally gotten the first embryonic version together for the cRE compiler, which implements the reduction rules I have proposed in replacement of Thompson's algorithm for the reduction of Kleene's closures into state transitions, that is, a Turing table.

By the way, I still need to re-read Thompson's seminal paper again to invent new and sillier reasons why my approach would be better than his :-)

In the end, the game consists in producing the correct table for a pure Turing automaton, in such a way that it can match Kleene's word rules. Lexing ("tokenizing") has traditionally always been solved by using a Turing automaton.

This approach is, however, potentially much wider and seriously more applicable than it may look like. Any decidable programming problem could conceivably be projected unto a Turing table, and hence solved by a Turing automaton.

Therefore, the cRE compiler is just an example of a Turing translate rule cPPTT class, that accepts a program P in grammar G from program space PP(G) and projects P unto the Turing table T in the Turing table space TT, under the equivalence rule that given the same input I, the output O remains the same: O(P(I))==O(cPPTT(P(I))).

We do not often use such Turing translate rules. The domain of regular expression matching algorithms is an exception, in which we do use them; but even then, probably, because there is simply no alternative available.  But then again, I actually like using Turing translate rules. Therefore, I think it is worthwhile to explore if they can be applied more commonly. Turing proposed this approach in 1936 already. I actually think that he had more of a point, than generally accepted.

Unfortunately, the really interesting people in our field, such as Alan Turing, Kurt Gödel, John Nash, Stephen Kleene, and John Von Neumann, are all gone now.  Therefore, we have no longer access to the guidance that they were obviously able to provide, and for which there seems to be no contemporary replacement. Studying this field is almost like digging up old scriptures from the desert. The field does not produce that many new interesting publications, if any at all. For any real practical work, we still depend on the old guys. It is their original work that still powers the computer systems of today. The funny thing is that the old guys have probably never seen our computer systems. It is their original specifications, still.

In version 0.0.1 of the cRE compiler, I have provided an implementation for the Kleene star operator only. The testcRE.php script takes one command line argument, which should be the regular expression to reduce. For example:

$ php testcRE.php "ab(cd)*e"

produces the following output:

ab(cd)*e
testrule:WR:0/'<0>','a','<1>','b','<2>',[WR:0s0],KLOP_STAR,'e','<5>'
testrule:WR:0s0/'c','<3>','d','<4>'

This is just the sentence tree containing the tokens as nodes. A node is either a character range, a Kleene operator, a state, or a subrule.

Next, the program reduces the rule:

reduced:

testrule:WR:0r1/'<0>','a','<1>','b','<2>','e','<5>'
testrule:WR:0r2/'<0>','a','<1>','b','<2>',[WR:0s0],'e','<5>'
testrule:WR:0s0/'c','<3>','d','<4>'
testrule:WR:0r3/'<0>','a','<1>','b','<2>',[WR:0s0],[WR:0s0],'e','<5>'

I still need to implement the other reduction rules. I have just implemented the one for Kleene's star (the repetition operator):

<?php
require_once('WordRule.Reducer.php');
/*
Written by Dit e-mailadres is beveiligd tegen spambots, u heeft JavaScript nodig om het te kunnen bekijken
Phnom Penh, June 15, 2009

Licensed under the General Public License (GPL)

Word Rule Reducer for Kleene Star:

--- T{a(xyz)*b}=T{ab}+T{axyzb}+T{axyzxyzb}

*/

class WordRuleReducerStar extends WordRuleReducer
{
    const WR1=1;
    const WR2=2;
    const WR3=3;

    function reduce(&$wordRules,&$wordRule,&$node,$i)
    {
        /*
            0..op-1 == $prefix
            op..i-1 == $operand
            i..i = $operator
            i+1..n = $suffix

            WR gets replaced by:
            WR1=polygonize((0,op-1),(i+1,n))
            WR2=polygonize((0,op-1),(op,i-1),(i+1,n))
            WR3=polygonize((0,op-1),(op,i-1),(op,i-1),(i+1,n))

        */

        if($i==0) $wordRule->error('missing operand for kleene star',$wordRule,$i);

        //seek operand start
        $op=$i-1;
        $nodes=array_values($wordRule->nodes->collection);
        $node=$nodes[$op];
        while(!$node->isNTCharRange() && !$node->isNTSubRule() && $op>=0)
        {
            $op--;
            $node=$nodes[$op];
        }

        if($op<0) $wordRule->error('cannot find start of operand for kleene star',$wordRule,$i);

        $n=$wordRule->countNodes()-1;

        $wr1=$wordRule->polygonize(array(
                         array(0,$op-1)
                        ,array($i+1,$n)
                        )
                    ,self::WR1);

        $wr2=$wordRule->polygonize(array(
                         array(0,$op-1)
                        ,array($op,$i-1)
                        ,array($i+1,$n)
                        )
                    ,self::WR2);

        $wr3=$wordRule->polygonize(array(
                        array(0,$op-1)
                        ,array($op,$i-1)
                        ,array($op,$i-1)
                        ,array($i+1,$n)
                        )
                    ,self::WR3);

        $wordRules->removeWordRule($wordRule);
        $wordRules->addWordRule($wr1);
        $wordRules->addWordRule($wr2);
        $wordRules->addWordRule($wr3);
    }
}
?>

 

The other Kleene operators will follow in the next version. I also still need to implement the next steps in the compilation process, that is, the step flatten(), the step which expands the embedded word rules; and, deduct(), the step that creates the state transitions. The entire chain is then:

compile():

  1. parse()
  2. reduce()
  3. flatten()
  4. deduct()

 

At the end of the chain, the compile() process should create the same Turing table as the one produced by Thompson's algorithm.

You can download cRE version 0.0.1 here.

 


blog comments powered by Disqus
 
 
Joomla 1.5 Templates by Joomlashack