Stories
Slash Boxes
Comments
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

Ovid (2709)

Ovid
  (email not shown publicly)
http://publius-ovidius.livejournal.com/
AOL IM: ovidperl (Add Buddy, Send Message)

Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Friday April 21, 2006
04:31 AM

Ow! My unnormalized head!

[ #29398 ]

One Dawn Wulthius suggests that we stop normalizing our data.

Interestingly, I think her example is misguided, but not for the reasons one might think. Her example of "unnormalized" data is an example of a way one can define a set of data in a single attribute in an MV database. For example:

id | name  | email
------------------------------
1  | Bob   | bob@hotmail.com
------------------------------
2  | Sally | sally@hotmail.com
           | sally@yahoo.com

In that example, "email" is a multi-valued field which can contain a set (or possibly a bag) of email addresses (I do believe duplicates might be allowed, but I can't recall).

That seems to violate to first normal form (1NF), but this appears to cause problems in traditional databases for technical reasons, not mathematical ones. In a multi-value (MV) database, if you have a list of reports sent to specific email addresses, you can tie them to the particular email address you want and if someone deletes the email address, you can either throw an exception (because the report still exists) or have the equivalent of a cascading delete -- the report gets deleted.

This technique has several powerful benefits. Email addresses don't require a separate table, the query language can be simpler and many-to-many relations no longer require lookup tables. However, popular database systems don't support this (unless you do a lot of complicated work with rules or triggers).

I write that it seems to violate 1NF, but only because folks usually misunderstand 1NF. 1NF states that the domain of an attribute must contain single values. However, some (such as C.J. Date) argue that attributes don't need to contain indivisible values. A value can be a collection or an object. Traditional values allowed in attributes are not necessarily indivisible. A string can be broken down into characters. A date can be broken down into date parts. Integers can be broken down into prime factors. However, most would agree that dividing these types would be silly. So why do we rely on such arbitrary types? What is important is not that an attribute be indivisible; it's that the data contained in an attribute have a clearly defined type and operators which can act on that type (and in such a system, FK contraints which can work with user-defined types). Unfortunately, popular databases don't tend to support user-defined compound types very well and MV databases are merely a crippled implementation of the relational model.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More | Login | Reply
Loading... please wait.
  • Oh, good point. Is something like PostgreSQL's array type the proper solution here then?

    create table sometable (
       id serial,
       name varchar(50),
       email text[]
    );

    This would store a list of email addresses as a single object. Or have I missed the point again?

    In other news, I'm surprised PostgreSQL doesn't have an "email_address" type. It seems to have everything else.

    • In other news, I'm surprised PostgreSQL doesn't have an "email_address" type. It seems to have everything else.

      You can create one easily enough with a CREATE DOMAIN statement. Here's a naive implementation:

      CREATE DOMAIN email_address AS TEXT
      CONSTRAINT ck_email_address CHECK (
         VALUE ~ '^[^@]+@[^@]+$'
      );

      Modify the regular expression as needed to increase its accuracy. Then you can use it like any other PostgreSQL data type:

      CREATE TABLE person (
        id SERIAL PRIMARY KEY,
        name text

    • As Theory points out, you can create such a type. The problem is that if anything else needs to reference data within that type, Postgres does not allow any convenient way of doing so. This makes it pretty difficult to use Postgres for MV purposes.

      • PostgreSQL actually supports composite types.

        CREATE TYPE inventory_item AS (
            name            text,
            supplier_id     integer,
            price           numeric
        );

        CREATE TABLE on_hand (
            item      inventory_item,
            count     integer
        );

        SELECT (item).supplier_id FROM on_hand;

        It sounds like PostgreSQL even automatically makes a composite type for each table

    • How do you then update, insert, delete, or search for a single email address? If it makes things more complicated to deal with the data, I think I'd rather normalize. Hmm, looking at the postgres docs, searching for a value in an array:

      SELECT * FROM sal_emp WHERE pay_by_quarter[1] = 10000 OR
                                  pay_by_quarter[2] = 10000 OR
                               

      • The problem with CRUD on a single email address goes away if you have a language which supports that. SQL does not. While I confess that I don't remember my UniQuery (the query language for the MV database I used), imagine something like this:

        DELETE email
        FROM   customer
        WHERE  name  = ?
          AND  email = ?

        In other words, DELETE would operate on more than just tuples.

        How about searching for a particular email? Well, you show some sample syntax above.

        There are plenty of oth