Rainforest

Sankuru

Implementeren, customiseren, uitbreiden, en troubleshooten van Joomla/Virtuemart

Views: 878

Wij helpen met ...

Virtuemart
Joomfish
Andere extensies
SocialTwist Tell-a-Friend

Automatische vertaling

English Arabic Chinese (Simplified) German Japanese Russian Spanish



Hergebruik open source

Datgene wat U nodig hebt, bestaat vaak al, en dekt 80% van Uw behoeften. Wij zorgen voor de ontbrekende 20%.
What's both right and wrong with relational databases PDF Afdrukken E-mail
Waardering: / 0
SlechtZeer goed 
Geschreven door erik   
zondag 18 oktober 2009 03:10
There are no translations available.

 

The main alternative to relational databases, is storing data in a large associative arrays (hashes). Some people call them object databases, other people call them object-relational databases, and others have other all kinds of different names for them. The strategy invariably amounts to storing (key,value) tuples in one way or the other. A system built on storing (key,value) tuples can indeed be massively faster in some ways.

 

Example 1: order processing

Imagine you store customer records, with associated sales orders. Each sales order contains the quantities for the products in the sales order. The database could look like this:

customer1
--order1
----product3,qty=35
----product7,qty=50
--order5
----product1,qty=10
----product9,qty=5
----product12,qty=3

customer5
--order3
----product1,qty=5
----product3,qty=20
----product4,qty=10
--order4
----product1,qty=3
----product8,qty=15

You also store supplier records, with associated purchase orders. Each purchase order contains the quantities for the products in the purchase order. The database could look like this:

supplier1
--order1
----product3,qty=100
----product7,qty=200
--order3
----product1,qty=50
----product8,qty=75

supplier2
--order2
----product3,qty=60
----product14,qty=40

The current inventory itself would look like this:

product1,qty=32
product2,qty=12
product3,qty=87
product4,qty=12
...

Stored in an object database, it would be possible to implement very fast lookup methods. For example:

customer=db.getCustomer(5)
supplier=db.getSupplier(2)
productInventory=db.getProduct(3)

The call db.getCustomer(id) would return the customer object with all its associated sales orders. The call db.getSupplier(id) would return the supplier object with all its associated purchase orders. The call db.getProduct(id) would return the product record and the current inventory. These calls would operate very fast. The object database would massively outperform the relational database.

However, now we need reports in the following style:

Product3
-inventory          +87
-purchase orders
--supplier1,order1 +100
--supplier2,order2  +60
-total             +160
-sales orders
--customer1,order1  -35
--customer5,order3  -20
-total              -55
grand total        +192

Producing this report, requires walking through all purchase orders and all sales orders. The existing methods would be very slow.

Of course, people who build object databases, will say that they may create indexes to speed up things. It can obviously be done. However, it would require additional logic to maintain these indexes in additional programming, when saving a customer or a supplier object. It would also require to create new methods to retrieve the data in an efficient manner. In fact, each new access strategy would either be massively slow, or else require new indexes and lots of additional programming to speed it up.

 

Example 2: web search

For example, it will be very fast for Google to give an answer in the style of:

keyword
-url1
-url2
-url3
...

However, in order to make this work fast, they probably stored it in an object database. Consequently, producing the following report is either impossible, or else requires massive additional programming:

url
-keyword1,position=12
-keyword2,position=3
-keyword3,position=45
...

Any adhoc search across keywords, will probably be almost impossibly slow to achieve. Google does offer a rudimentary service to look at the keywords related to a site at http://www.google.com/sktool. However, it does not return much data and it is way less impressive than what you get using their primary data access strategy. Google is sitting on a massive database connecting urls to keywords. However, besides returning pages with urls related to a particular set of keywords, Google seems to be unable to re-use the same data in other and probably also interesting ways.

 

Example 3: Email


Let's look at another example. Mailboxes could look like this:

user1
--received
----email1,from=99
----email5,from=12
----email20,from=14
----email35,from=99
--sent
----email2,to=11
----email8,to=18
----email14,to=99
----email38,to=12

user2
--received
----email3,from=98
----email4,from=99
----email18,from=11
----email22,from=97
--sent
----email4,to=98
----email7,to=12
----email19,to=99
----email25,to=11

Mail systems usually store their email in mailboxes, neatly separated per user. It makes sense to do this, because that would be the primary data access strategy for email.

Imagine, however, that user1 and user2 work together to answer each other's email. One of the views they would be interested in, is:

to and from: 99
--received
----email1, to user1
----email4, to user2
--sent
----email14, from user1
----email19, from user2

Again, it would take quite a bit of effort and additional programming for the system to search across all users' mailboxes to assemble such view. Note, however, that such view could be useful. It is impossible to exclude beforehand that a new way for looking at the same data, will not be useful. For mailboxes stored in a relational database, this view would take not more effort to assemble, than assembling the primary view. When using an object database, however, it becomes hard to do, and the results will be slow to use, requiring all kinds of tricks to keep the system usable in terms of performance.

 

Conclusion

Object databases favour one particular data access strategy to the detriment of all others. While this approach indeed allows massive scaling for that one particular primary data access strategy, it creates large problems for all alternative access strategies:

speed/complexity of programming:

                        primary              alternative
object database         very fast/easy       very slow/complex
relational database     medium/medium        medium/medium


Establishing another view on the same data, is relatively easy for relational access strategies, while it is quite complex for object databases. In other words, maintaining an application built on top of an object database will be hard. Whatever new requests the user comes up with, if they don't fall within the perimeter of the primary data access strategy, they will cost an inordinate amount of effort to implement.

Unless the storage requirements or performance requirements for the primary data access strategy are really extreme, such as in the case of Google storing (keyword,url) tuples, relational databases offer the better compromise between performance and the ability to look at the data through a different data access strategy.

 


blog comments powered by Disqus
 
 
Joomla 1.5 Templates by Joomlashack