Rainforest

Sankuru

Implémenter, personaliser, étendre et réparer Joomla/Virtuemart

Views: 1003

Nous vous aidons avec ...

Virtuemart
Joomfish
Autres extensions
SocialTwist Tell-a-Friend

Traduction automatique

English Arabic Chinese (Simplified) German Japanese Russian Spanish



Re-utilisons des sources libres

Les logiciels dont vous avez besoin, éxistent souvent déjà en source libre, et couvrent vos besoins à 80%. Nous ajouterons pour vous les 20% qui manquent.

Devis gratuit

Demandez gratuitement un devis aujourd'hui.

The EAV problem is inherently a compiler construction problem PDF Imprimer E-mail
Note des utilisateurs: / 0
MauvaisTrès bien 
Écrit par erik   
Lundi, 25 Mai 2009 03:34
There are no translations available.

Changing EAV source code cannot be avoided

A good number of widely used Php applications draw an important part of their flexibility from the use of EAV databases. Examples of this include Magento and Sobi2.

Users clamour for this kind of flexibility.

Therefore, instead of hard fixing the fields in a product record, e-commerce platforms such as Magento allow the user to create user-defined fields.

For example, companies selling used cars, can add year of make, total miles, and other user-defined attributes to the product record, without having to change the source code for the application.

Not having to change the source code is a Good Thing (tm). But then again, it is not easy to achieve.

For example, even user-defined fields need to be validated. You can demand that a field must mandatorily be filled out, must be a valid date, is the start date of a period and must lie before the end date for that period, and so on. A good application framework will automate this kind of low-hanging fruit.

But what about the more complex and idiosyncratic stuff that invariably pops up? The Social Security Number for a person registered in New Zealand must be modulo-97 valid. The transaction date for a transaction may not lie before the date the account was opened. You probably came across a few examples of your own.

So, ok, adding code, and therefore, changing the source code is now unavoidable. How easy is it? I will show why -- without special measures -- changing an EAV-based application will inevitably become mind-numbing hard:

magentosites.com: Developing for Magento "Nothing turns the easy into impossible like Magento" is an endearing tag line given to Magento by developers who've wasted hours trying to make even the simplest of changes to the code.

 

An example EAV database

Say that D is a database, from the relational database space DD containing all relational databases. Deav is one particular database in the space DD. In that case eavD() is a function that projects any database D in the space DD unto Deav.

Deav is a database with the following tables:

entities(ename)
attributes(ename,aname)
values(ename,rowid,aname,value)

We take as an example, the following database D, with one table mytable:

mytable(mycol1,mycol2,mycol3):

mytable(aaaaa,XXX,123)
mytable(bbbbb,YYY,8766)
mytable(ccccc,ZZZ,6766)

eavD(D) is:

entities(mytable)
attributes(mytable,mycol1)
attributes(mytable,mycol2)
attributes(mytable,mycol3)
values(mytable,1,mycol1,aaaaa)
values(mytable,1,mycol2,XXX)
values(mytable,1,mycol3,123)
values(mytable,2,mycol1,bbbbb)
values(mytable,2,mycol2,YYY)
values(mytable,2,mycol3,8766)
values(mytable,3,mycol1,ccccc)
values(mytable,3,mycol2,ZZZ)
values(mytable,3,mycol3,6766)

 

An example EAV query

Say that Q is a valid query in the query space QQ associated to D, that is, QQ(D). The result for the query Q is R, which is a member of the result space QR associated to D and Q, that is, QR(D,Q). 

We take as an example, the following query Q:

select col1,col2,col3 from mytable where col1='aaaaa' order by col1

The result is QR(D, Q).

If we want to obtain the same result from Deav, we need to run the query Qeav:

select
  V1.value
, V2.value
, V3.value
from values as V1
left join (values as V2, values as V3)
on V1.rowid=V2.rowid
and V1.rowid=V2.rowid
where
     V1.ename='mytable'
and  V2.ename='mytable'
and  V3.ename='mytable'
and  V1.aname='mycol1'
and  V2.aname='mycol2'
and  V3.aname='mycol3'
--
and  V1.value='aaaaa'
order by V1.value

The result is QR(Deav,Qeav).

If both queries Q(D) and Qeav(Deav) are equivalent, then QR(D,Q)==QR(Deav,Qeav).

Therefore, we now seek to generally construct the function eavQ(Q) that projects the query Q on its equivalent query Qeav.

Qeav=eavQ(Q) as such that QR(D,Q) == QR(eavD(D),eavQ(Q)).

How do we construct the function eavQ(Q)?

 

Mixed EAV databases

Many EAV databases are not entirely EAV. The most important fields will not be in EAV style. Only the custom user fields will. Therefore, some columns in the same table will be relational and other columns in EAV style.

Say that col2 in mytable is relational while col1 is in EAV style. In that case, we only need to fix the occurrence of col1 in the following query:

select col1,col2 from mytable where col1='aaaaa' order by col1
 

When we replace col1 by its EAV counterpart subexpressions, we get:

select
    V1.value
    ,col2
from
    mytable
    left join value V1
    on V1.rowid=mytable.rowid
where
        V1.ename='mytable'
    and V1.aname='col1'
     and V1.value='aaaaa'
order by V1.value

It is definitely possible to replace just a subset of the columns with their EAV counterpart subexpressions.

This means that the function eavQ() can start from the existing relational query and replace EAV columns one by one, until all EAV columns have been replaced with their EAV counterpart subexpressions.

 

Building the function eavQ()

The function eavQ() accepts as input any valid query Q from the query space QQ associated with database D, that is, QQ(D). The function then returns the query Qeav from the query space QQeav associated with database Deav, that is, QQ(Deav), in such a way that the result sets QR(D,Q) and QR(Deav,Qeav) are equal.

The funtion eavQ() does not need to completely parse and understand the input query Q. It only needs to understand Q sufficiently enough, to be able to replace every expression containing EAV columns with their counterpart subexpressions. Therefore, instead of parsing Q in its full SQL glory, we can probably simplify the grammar for the input query Q, by blackboxing the typical SQL expressions:

GRAMMAR

<select>::=
select <expressions>
from <expressions>
[ left | right ] [ join ] <expressions> on <expressions>
[ where <expressions> ]
[ group by <expressions> ]
[ having <expressions> ]
[ order by <expressions> ]

<expressions>::= {<expression>}(,)
<expression>::=<tokens> || ( <select> )
<tokens>::={<other>}(,)

TOKENS

select
from
where
group
by
having
order
left
right
on
','
<other>

STRINGS

" {char|\"} "
' {char|\'} '

COMMENTS

// {char|\<nl>} <nl>
/* {char|*/} */

According to this grammar, the expressions are blackboxed, but can still contain SQL queries ("subqueries").

As you can see, the function eavQ() is a rewrite rule that needs to parse the input query, in order to produce the equivalent EAV output query. It maps the query space QQ(D) unto the query space QQ(eavD(D)).

The function eavQ will make live simpler for the developer. Instead of having to figure out the EAV query by himself, he can use the function eavQ() to do the rewrite for him:

results=query(eavQ(Q));

This problem is a compiler construction problem. Creating the function eavQ(Q) will solve the issue of developer mental overload, and give EAV source code a much more normal appearance.

 

The EAV performance issues

However, the function eavQ() will not solve all problems associated with EAV databases.

For example, you can see that EAV queries must massively join over the same table. The database server may balk at this and refuse to execute queries that are more complex than the maximum. Even if the database server is still willing to execute the potentially enormous EAV query, it will be subject to massive performance penalties.

Magento is notoriously slow. There are indeed good EAV reasons for that.

The problem may be alleviated partially by indexing the tables judiciously, and tricks like that; but in the end, you may expect your database server to succumb under the pressure of massively joining for each additional field that you need in the result set.You will simply need to match the growing load with exponentially (combinatorially) more hardware.

Now, it is still useful to create the eavQ(Q) function.

A project may historically be heavily invested in EAV, and be close to throwing in the towel and giving up in the battle against the runaway complexity that results from the need to construct and maintain EAV queries manually.

No matter how many resources and no matter how much money you throw at an EAV application, you will invariably hit the point at which yet more resources and yet more money will no longer help.

Instead of embarking on a complete application rewrite, however, the eavQ() function will make it possible to re-factor the source code gradually, and get rid of its excess EAV complexity. It will be possible to keep maintaining the application, without losing control over the associated maintenance costs.

Therefore, re-factoring the source code using a eavQ() function, is definitely the cheaper solution.

As I have time, I will try to develop in this blog, the underlying support functions, that is, a minimal compiler construction toolkit for Php, that will make it possible to eventually create the eavQ() function.

 


blog comments powered by Disqus
 
 
Joomla 1.5 Templates by Joomlashack