Stories
Slash Boxes
Comments

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

Ovid (2709)

Ovid
  (email not shown publicly)
http://users.easystreet.com/ovid/
AOL IM: ovidperl (Add Buddy, Send Message)

Check out my free CGI course [easystreet.com].

Grant Secretary and Steering Committee member for Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Friday April 21, 2006
05:31 AM

Ow! My unnormalized head!

[ #29398 ]

One Dawn Wulthius suggests that we stop normalizing our data.

Interestingly, I think her example is misguided, but not for the reasons one might think. Her example of "unnormalized" data is an example of a way one can define a set of data in a single attribute in an MV database. For example:

id | name  | email
------------------------------
1  | Bob   | bob@hotmail.com
------------------------------
2  | Sally | sally@hotmail.com
           | sally@yahoo.com

In that example, "email" is a multi-valued field which can contain a set (or possibly a bag) of email addresses (I do believe duplicates might be allowed, but I can't recall).

That seems to violate to first normal form (1NF), but this appears to cause problems in traditional databases for technical reasons, not mathematical ones. In a multi-value (MV) database, if you have a list of reports sent to specific email addresses, you can tie them to the particular email address you want and if someone deletes the email address, you can either throw an exception (because the report still exists) or have the equivalent of a cascading delete -- the report gets deleted.

This technique has several powerful benefits. Email addresses don't require a separate table, the query language can be simpler and many-to-many relations no longer require lookup tables. However, popular database systems don't support this (unless you do a lot of complicated work with rules or triggers).

I write that it seems to violate 1NF, but only because folks usually misunderstand 1NF. 1NF states that the domain of an attribute must contain single values. However, some (such as C.J. Date) argue that attributes don't need to contain indivisible values. A value can be a collection or an object. Traditional values allowed in attributes are not necessarily indivisible. A string can be broken down into characters. A date can be broken down into date parts. Integers can be broken down into prime factors. However, most would agree that dividing these types would be silly. So why do we rely on such arbitrary types? What is important is not that an attribute be indivisible; it's that the data contained in an attribute have a clearly defined type and operators which can act on that type (and in such a system, FK contraints which can work with user-defined types). Unfortunately, popular databases don't tend to support user-defined compound types very well and MV databases are merely a crippled implementation of the relational model.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More | Login | Reply
Loading... please wait.
  • Oh, good point. Is something like PostgreSQL's array type the proper solution here then?

    create table sometable (
       id serial,
       name varchar(50),
       email text[]
    );

    This would store a list of email addresses as a single obj

    • In other news, I'm surprised PostgreSQL doesn't have an "email_address" type. It seems to have everything else.

      You can create one easily enough with a CREATE DOMAIN statement. Here's a naive implementation:

      CREATE DOMAIN email_address AS TEXT
      CONSTRAIN
    • As Theory points out, you can create such a type. The problem is that if anything else needs to reference data within that type, Postgres does not allow any convenient way of doing so. This makes it pretty difficult to use Postgres for MV purposes.

      --
      Silence is Evil [easystreet.com]
      • PostgreSQL actually supports composite types.

        CREATE TYPE inventory_item AS (
            name            text,
            supplier_id     integer,
            price           n
    • How do you then update, insert, delete, or search for a single email address? If it makes things more complicated to deal with the data, I think I'd rather normalize. Hmm, looking at the postgres docs, searching for a value in an array:

      SELECT * FROM sal_e
      • The problem with CRUD on a single email address goes away if you have a language which supports that. SQL does not. While I confess that I don't remember my UniQuery (the query language for the MV database I used), imagine something like this:

        DELETE em
        --
        Silence is Evil [easystreet.com]