
Ovid (2709)


Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Thursday September 05, 2002
07:36 PM

Asian vs. non-Asian

[ #7556 ]

I'm working on normalizing a legacy database for an international company and came across a weird problem. Someone decided that contacts for a company can just have a "name" field. It's not split up by first and last name, so we can't conveniently sort by last name. I'm aware that the family name frequently comes first for some Asians. What the heck do I do there? I need to move duplicate information from the office contacts and administrative users into a new table, but I don't know how to split up the names. Some of them have clearly "Westernized" their names, but which ones? Further, if I figure out which name is which, do I add some sort of boolean field to the table to allow the two names to be displayed in reverse? These issues are more subtle than I ever realized, and I don't have any convenient way to wade through this other than the anguished screams of users seeing their names displayed wrong. I feel sorry for the internationalization folks. I'm sure that they constantly have issues like this.

Following the XP "simplest thing that could possibly work" philosophy, I'm considering splitting the names into first and last and not trying to do any fancy tricks with the ordering. If someone's family name is their first name, it will still display as such. But what happens when they enter new information? I suppose we can just tell them that they input the data incorrectly, but no client ever likes to hear that. Hmm...
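
A minimal sketch of that "simplest thing" split, written in Python for brevity. The helper name split_name and the rule it uses (the last whitespace-separated token becomes "last", everything before it becomes "first") are assumptions for illustration, not anything taken from the legacy system. It does no reordering at all, so a family-name-first entry simply keeps the order it was typed in.

```python
def split_name(full_name):
    """Naively split a single "name" field into (first, last).

    Hypothetical helper: the last whitespace-separated token is treated
    as the last name and everything before it as the first name.
    No attempt is made to detect family-name-first ordering.
    """
    parts = full_name.split()
    if not parts:
        return ("", "")
    if len(parts) == 1:
        # A single token: keep it as "first" rather than guess.
        return (parts[0], "")
    return (" ".join(parts[:-1]), parts[-1])


if __name__ == "__main__":
    for name in ["Sean Michael Burke", "Wen Ho Lee", "Ovid"]:
        print(name, "->", split_name(name))
```

The split is mechanical, so its output is only as good as the order the data was entered in; that is the trade-off the paragraph above accepts.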

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Dealing with names is a mess. It's tempting to split things into "firstname", "initial", "lastname", but, as you've noticed, that falls apart pretty quickly, both for foreign names and for people who use honorifics.

    Consider using a single "fullname" field. If you're going to be using the database for generating form letters, add a "salutation" field. That way, "The Right Honorable Horatio Hornblower III" can be addressed as "Dear Horny,".
    • "Firstname" "initial" breaks down horribly for "J. David Blackstone."

      On the other hand, I always know it's a telemarketer when they ask for "David J. Blackstone."

      J. David works really hard, has a passion for writing good software, and knows many of the world's best Perl programmers
  • What you're dealing with is a mess. All you can do is piece it together the best that you can, and allow the end user to edit and make changes to their information. Those who care will make the edits, and those who don't won't. At that point, if the user doesn't like seeing the information the way it's presented, it's their own damned fault.

    I've come to believe that most people are pretty understanding when it comes to "computers" screwing up their names. It's when the situation can't be corrected th


    If things get any worse, I'll have to ask you to stop helping me.

  • Even here in Non-Asia, going from someone's full name to their family name is not easy.

    For example, many names of the form "X Y Z" are first name X, middle name Y, last name Z (like Sean Michael Burke)
    But many X Y Z names (typically in Latin America) are first name X, primary family name Y, and second (mother's) family name Z, like Mario Vargas Llosa -- he's Mr. Vargas Llosa (his father was Ernesto J. Vargas Maldonado, his mother was Dora Llosa Ureta). So:

    Sean Michael Burke -> Burke, Sean Michael
    Mario Vargas Llosa -> Vargas Llosa, Mario

    But wait, there's more! I've heard that when alphabetizing Icelanders' names, you alphabetize by the first name. That is, "Björk Guðmundsdóttir" is sorted under B -- but I think she's still Ms. Guðmundsdóttir, not Ms. Björk.

    An adaptation of what librarians do when cataloguing would be to add a "_" before the start of the last name so you know where the sortable stuff starts. But I don't think that tells you how to turn the name into [Title] [Lastnamepart]. I.e., if you tag Wen Ho Lee as "_Wen Ho Lee", that doesn't tell you whether it's Mr. Wen or Mr. Wen Ho -- it just tells you to sort under W. God only knows what we'd do if he had a son with the same name: Wen Ho, Jr., Lee?
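
The "_" marker idea can be sketched roughly as follows, in Python for brevity. The function name sort_key and the "Sortable part, Rest" output format are assumptions made for this illustration; the comment only says the marker shows where the sortable part starts. As noted there, this yields a sort position and nothing more: it does not say how to build [Title] [Lastnamepart].

```python
def sort_key(tagged_name):
    """Build a sortable string from a name tagged with a "_" marker.

    Hypothetical helper: the text after the first "_" is taken as the
    sortable part, and anything before it is appended after a comma.
    A name with no marker is sorted exactly as written.
    """
    before, marker, after = tagged_name.partition("_")
    if not marker:
        return tagged_name.strip()
    before = before.strip()
    after = after.strip()
    return f"{after}, {before}" if before else after


if __name__ == "__main__":
    names = [
        "Sean Michael _Burke",
        "Mario _Vargas Llosa",
        "_Björk Guðmundsdóttir",
        "_Wen Ho Lee",
    ]
    # casefold() makes the comparison case-insensitive; a real system
    # would want locale-aware collation instead of code-point order.
    for name in sorted(names, key=lambda n: sort_key(n).casefold()):
        print(sort_key(name))
```

Storing the marker inside the name keeps everything in one field, but the same information could just as well live in a separate column holding the offset of the sortable part.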