John Chamberlain
Developer Diary
You Heard It Here First · Monday 12 January 2004
Database Technology Stuck in the 70s
When I was in graduate school ten years ago, part of the curriculum was the ancient science of database systems. This course is traditionally taught by the least exciting member of the faculty. We plodded through useless theoretical excursions such as the "relational algebra". Today, while the rest of technology jets ahead, database theory remains mired in the 1970s. When I was doing some database work with Oracle a couple of years ago I kept having the nagging urge to put on the Bee Gees. When asked how it was going I would say "Groovy, man".

Of course there is some movement. Object-oriented databases have become viable, but they are not really changing the landscape of data storage. In general, databases have not progressed past the basics of the relational model set down in the 1970s. You could swap in Paradox on a PC as your backend and not even know the difference. The object-oriented approach has proved too slow and complex to go anywhere; making queries efficient seems to be the problem.

My own feeling is that we should step backwards and forget the relational model, which was just a dumb idea as far as I am concerned. I think what is retarding growth in databases is the continuing use of this essentially flawed and ungrowable scheme. In my mind it makes much more sense to switch to the hierarchical model. Indeed, the first database designs were hierarchical: IBM's IMS, for example. The problem was that hierarchical DBs never developed a good query language, whereas the relational model had an easy-to-understand query syntax. For example, in Oracle's current support for hierarchical queries one uses a "CONNECT BY" clause, but this crude add-on does not allow hierarchical data to be accessed efficiently.
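
To make the complaint concrete, here is a minimal sketch of what storing a tree in a relational table looks like. It is not Oracle: it uses SQLite through Python's sqlite3 module, and a standard recursive common table expression (WITH RECURSIVE) stands in for CONNECT BY. The table "node", its columns, the sample rows and the helper "subtree" are all made up for illustration.

```python
import sqlite3

# Hypothetical schema: a file-system-like tree flattened into an adjacency list,
# where each row points at its parent row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [
        (1, None, "root"),
        (2, 1, "docs"),
        (3, 1, "src"),
        (4, 2, "diary.txt"),
        (5, 3, "main.c"),
    ],
)


def subtree(conn, root_id):
    """Return (depth, name) rows for root_id and all of its descendants."""
    sql = """
        WITH RECURSIVE walk(id, name, depth) AS (
            -- anchor row: the node we start from
            SELECT id, name, 0 FROM node WHERE id = ?
            UNION ALL
            -- recursive step: join each found node to its children
            SELECT n.id, n.name, walk.depth + 1
            FROM node AS n JOIN walk ON n.parent_id = walk.id
        )
        SELECT depth, name FROM walk ORDER BY depth, name
    """
    return conn.execute(sql, (root_id,)).fetchall()


rows = subtree(conn, 1)
for depth, name in rows:
    print("  " * depth + name)
```

Note that the tree shape appears nowhere in the table itself. It has to be rebuilt by joining the table against itself once per level, which is the translation cost I am talking about.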

The lack of a good hierarchical method is a real modern failure. Most data is in the form of trees. Think file system, table of contents, message board, balance sheet--it's all trees. So translating all this data to a relational scheme is expensive and restrictive and never really works well. There have been some advances in tree manipulation in a different field: XML. To use style sheets you have to merge two trees. This merge involves a complex set of rules with one incredibly abstruse rule at its center. Here it is:

             The node N matches a pattern P if and only if there is a node A 
             that is an ancestor-or-self of N, such that evaluating P as an 
             expression with A as the context node returns a node-set 
             that contains N.
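
Spelled out as code, the rule reads roughly as follows. This is only a sketch under some assumptions: it uses Python's xml.etree.ElementTree, whose findall supports just a small subset of XPath, and the function name "matches" and the sample document are my own inventions. ElementTree also has no document node above the root element, so a pattern that names the root element itself will not match here.

```python
import xml.etree.ElementTree as ET


def matches(root, node, pattern):
    """True if evaluating pattern from some ancestor-or-self of node yields node."""
    # ElementTree keeps no parent pointers, so build a child -> parent map first.
    parent = {child: p for p in root.iter() for child in p}
    a = node
    while a is not None:
        # Evaluate the pattern with `a` as the context node and look for `node`.
        if any(hit is node for hit in a.findall(pattern)):
            return True
        a = parent.get(a)  # climb to the next ancestor; None once past the root
    return False


doc = ET.fromstring(
    "<book><chapter><title>One</title><para>text</para></chapter></book>"
)
title = doc.find("chapter/title")
para = doc.find("chapter/para")

print(matches(doc, title, "chapter/title"))  # context node A is <book>
print(matches(doc, title, "title"))          # context node A is <chapter>
print(matches(doc, para, "chapter/title"))   # no ancestor-or-self works
```

The loop is the "there is a node A" part of the rule: the same pattern is tried from the node itself and then from each ancestor in turn, and one success is enough.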

Got that? Now you know why XSLT has not caught on. Using it requires understanding the above sentence. The problem with hierarchical queries is the same: how do you manipulate them with an easy language? Is such a language even possible? I think there must be a way, because this is how we naturally order things, so there must be a natural way to query them in the same structure. Maybe some day someone will create this language and databases can catch up with the rest of technology.

Revised 12 January 2004