Stories
Slash Boxes
Comments
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

Ovid (2709)

Ovid
  (email not shown publicly)
http://publius-ovidius.livejournal.com/
AOL IM: ovidperl (Add Buddy, Send Message)

Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Thursday October 20, 2005
08:02 PM

Databases Suck

[ #27261 ]

Anyone who has worked with databases long enough knows that the relational model does not map well to the real world. Further, the necessity of database normalization is not something inherent to the data; it's mostly (not entirely) an artifact of the relational model. For example, how do you store customer addresses? Two database might store it completely different ways yet both be normalized. The common example is to do something like this pseudo-code:

create table customer {
  id ...
  first_name ...
  address ...
  city_id ...
  state_id ...
  etc ...
)

Now let's say that you're having a sale and you want to send fliers to all of your customers saying "We're having a sale. We hate you and we don't want you to show up." Getting their addresses is easy.

Now business two is also having a sale but they want to send out two sets of fliers. The first set tells their customers that they hate them and don't want them to come and the second set is to all addresses for which they don't have a customer. Now they have problems if they're using the above table. They have to have a separate addresses table and the customer table looks like this:

create table customer {
  id ...
  first_name ...
  address_id ...
  etc ...

The problem here is that business knowledge is required to properly know how to normalize the database. For the second business, the address is not dependent on the customer, unlike the first business. In other words, the database reflects domain knowledge but you, the poor fool having to maintain it, can't tell from looking at a database schema if it's really reflecting business knowledge or if it's a mistake.

This is sort of silly. For the first business the addresses exist in the real world whether or not they have customers there. Further, their database is junk if a customer has a summer home and a winter home. Even then they might be facing performance problems and have gone with a deliberate decision to denormalize parts of the database. Despite what "purists" argue, sometimes this has to be done (my last company had an Oracle performance expert come in and finally admit that he couldn't speed up their billion row table which relied on aggregated custom data types).

Object databases and multivalue databases can get around many of these problems by allowing data to be arranged in a heirarchical fashion instead of a relational one (though they may use a relational database under the hood). By doing this, if properly designed, they trade some performance for better real world modeling. But there's still a problem. Consider a table where we store what a customer likes (more pseudo-code).

create table customers (
  id ...
  customer_name ...
  likes (multivalue, fk likes.id) ...
)

create table likes (
  id ...
  thing ...
)

With that sort of multivalue relationship, we can fetch everything John likes very easily:

SELECT @likes FROM customers WHERE customer_name = 'John'

We can also find out everyone who happens to like something John likes:

SELECT @likes
FROM   customers
WHERE  customer_name != 'John'
  AND  @likes IN (
    SELECT @likes FROM customers WHERE customer_name = 'John'
)

So far, so good. We've gotten a decent amount of power from violating first normal form. However, there's still a problem. These databases (from what I've seen) usually constrain relations on type, not capabilities. How do we say that John "likes" another customer? There's a very good chance we can't do that. The likes table might have a list of customer ids or it might have a list of product ids, sport ids, etc. Because these technologies seem to restrict things tightly based upon types (and it's tough to introduce an FK constraint on multiple tables on the fly), we still can't model the real world terribly well.

I suppose one could actually get something like this to work by reaching into the system tables or something, but it would be difficult. Further, if we tried to do something like this, I suspect that the slow object or multivalue databases would be even slower. Moving one step closer to reality is one step further from an application that responds.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More | Login | Reply
Loading... please wait.
  • I am stuck with an Oracle database that I can't change, has spaces filling up every table regardless of what is actually in the table and everything is a CHAR.

  • The real world is not hierarchical. Some parts of it are, some aren't. A system like MV which only lets you express hierarchical relationships is not going to be a better model of the real world than a SQL system.

    Your schema reflects your business rules. If you require that all customers have an address, there should be a constraint to this effect. If customers can have more than one address, I'm sure you know quite well how to model that as well.

    Also, SQL is not relational. It's a poor second cousin o
  • I forgot to mention this though I don't know how I forgot.

    All tables (all 1500 of them); every field is NOT NULL. Every field.

  • Anyone who has worked with software long enough knows that the relational model does not map well to the real world.

    Then pick any of the following:

    s/relational/procedural/
    s/relational/object oriented/
    s/relational/functional/
    s/relation/logical/
    s/relation/aspect oriented/
    s/relational/hierarchical/

    Continuing this list is left as an exercise for the reader :-)

    Models are not the real world. They're an abstraction chosen to make a particular problem domain easier to manipulate. Sometimes people pick the w