To my chagrin, in a data modelling meeting today I found myself asking "do we know who Batman is?" Lamentably, this bit of silliness is a rather vexing issue for the BBC. More properly, it's a vexing issue for the system on which I work.
If you watch iPlayer (something you can't do outside the UK, due to licensing issues), you may have wondered where its data comes from. I don't know the full source of that data, but I think it's safe to say that most of it comes from our system, PIPs ("Programme Information Platform" -- I've no idea how that 's' tagged along). We track programs¹ for the BBC. Programs, as you may be aware, are often on television, and they often have actors, news anchors, cocaine-snorting children's show presenters and so on. Being a properly agile team, we did the least amount of work necessary to model these people in our system (i.e., not much). Now that we have to model them more thoroughly, however, the true headache comes into play.
There is another team in the BBC which tracks people, and we don't want to duplicate too much of their work, but we do need a bit of information in our system so that when Paul David Hewson appears on a programme (sic), people know that this is actually the philanthropist/singer Bono. We also have to know, if a character named "Starbuck" shows up, whether it's Starbuck from the original Battlestar Galactica, the gender-swapped Starbuck from the remake, or the first mate of the Pequod. Clearly two of those are related, but should they even be represented as the same character?
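To make the two problems above concrete, here is a minimal sketch, assuming a simple in-memory store; the `Person` and `Character` records, their IDs and the `derived_from` link are all hypothetical and not how PIPs actually models anything. A credited name can resolve to a known person, but a bare character name only yields a list of candidates.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical records: IDs and field names are illustrative only.
@dataclass(frozen=True)
class Person:
    pid: str
    preferred_name: str
    other_names: Tuple[str, ...] = ()

@dataclass(frozen=True)
class Character:
    cid: str
    name: str
    source_work: str
    derived_from: Optional[str] = None  # cid of the character this one reimagines

PEOPLE = [Person("p1", "Bono", ("Paul David Hewson",))]

CHARACTERS = [
    Character("c1", "Starbuck", "Battlestar Galactica (1978)"),
    Character("c2", "Starbuck", "Battlestar Galactica (2004)", derived_from="c1"),
    Character("c3", "Starbuck", "Moby-Dick"),
]

def people_named(name: str) -> List[Person]:
    """Every person known by this name, preferred or otherwise."""
    return [p for p in PEOPLE if name == p.preferred_name or name in p.other_names]

def characters_named(name: str) -> List[Character]:
    """All candidate characters; the name alone cannot choose between them."""
    return [c for c in CHARACTERS if c.name == name]
```

Here `people_named("Paul David Hewson")` finds Bono, while `characters_named("Starbuck")` returns three candidates that still need an editor to pick one. Keeping the two Galactica Starbucks as separate records joined by a link is one way to record that they are related without committing to whether they are "the same character."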
Names are fun, too. We might have Madonna Louise Ciccone in our system but we had better not present her as Madonna Louise Ciccone. She's Madonna. And while we might refer to Mr. Scott Thompson, or Mr. Thompson for short, referring to him as Mr. Carrot Top is the height of silliness.
There's also the troubling issue of Elizabeth the Second, by the Grace of God, of Great Britain, Ireland and the British Dominions beyond the Seas Queen, Defender of the Faith. Of course, that's a bit wordy to read over and over, so we can refer to her as "Her Majesty the Queen", or "Queen Elizabeth" in a pinch. I expect that seeing her referred to in print as "Ms. Windsor" is going to get a few people fired. There are special rules for handling the titles of royalty and peerage, and I don't even know what peerage is, much less what those rules are. With luck, though, our editors won't automatically correct "Queen" to "Queen Elizabeth" and have her laying 2000 eggs a day.
And then there's Mr. Yao, also known as Yao Ming, a famous basketball player. Since Chinese names generally put the family name first, referring to him as Mr. Ming is wrong. But the ordering depends on the name, not on the nationality of the individual: I doubt many people refer to Jackie Chan as Mr. Jackie. Well, at least not to his face.
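The naming rules from the last few paragraphs can be sketched roughly as follows, assuming a name record with hypothetical `family_first` and `preferred` fields. It shows one possible design, not the BBC's: name order is a property of the name rather than of the person's nationality, a stage name or mononym is used for display but never combined with an honorific, and the honorific always attaches to the family name. Royal titles are deliberately out of scope here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class PersonName:
    given: str
    family: str
    family_first: bool = False       # hypothetical: order belongs to the name, not the nationality
    preferred: Optional[str] = None  # hypothetical: stage name or mononym used for display

    def display_name(self) -> str:
        # A preferred name ("Madonna", "Carrot Top") wins for ordinary display.
        if self.preferred:
            return self.preferred
        parts = (self.family, self.given) if self.family_first else (self.given, self.family)
        return " ".join(parts)

    def formal(self, honorific: str) -> str:
        # The honorific goes with the family name wherever it sits,
        # and never with a stage name.
        return f"{honorific} {self.family}"

yao = PersonName("Ming", "Yao", family_first=True)
jackie = PersonName("Jackie", "Chan")
carrot_top = PersonName("Scott", "Thompson", preferred="Carrot Top")
madonna = PersonName("Madonna Louise", "Ciccone", preferred="Madonna")
```

With these records, `yao.formal("Mr.")` gives "Mr. Yao" rather than "Mr. Ming", and `carrot_top.formal("Mr.")` gives "Mr. Thompson" while his display name stays "Carrot Top".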
We also have the question of "roles". What role does someone play? Though some might vehemently deny it, William Shatner is, in fact, an actor. But some people don't have such easy roles. What about a news story that covers a criminal investigation? While Dominic Tenney might have been an "alleged rapist", that would be a rather legally perilous note to list next to his name now that we know he was falsely accused. So there are legal issues we have to contend with as well.
And what about biographies? Victoria Beckham is also known as Posh Spice, but it's arguable that the two names should not share the same biography because, depending on which name you are using, you may be focusing on a different part of her life.
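One way to model this, sketched under the assumption that biographies hang off names rather than off the person, is to keep a single identity with several aliases, each carrying its own biography. `PersonRecord`, `Alias` and the placeholder biography strings are hypothetical illustrations, not real data.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Alias:
    name: str
    biography: str = ""

@dataclass
class PersonRecord:
    pid: str  # hypothetical: one identity, however many names
    aliases: Dict[str, Alias] = field(default_factory=dict)

    def add_alias(self, name: str, biography: str = "") -> None:
        self.aliases[name] = Alias(name, biography)

    def biography_for(self, name: str) -> str:
        # The biography belongs to the name in use, not to the person as a whole.
        return self.aliases[name].biography

victoria = PersonRecord("p2")
victoria.add_alias("Victoria Beckham", "(placeholder: biography covering her wider career)")
victoria.add_alias("Posh Spice", "(placeholder: biography focused on her Spice Girls years)")
```

Because both aliases share one `pid`, anything linking to the person still finds a single individual, while a programme crediting "Posh Spice" can show the biography written for that name.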
And have I mentioned Welsh, Gaelic, and other language translations?
These issues, and more, are what led us to sit around a table today and seriously discuss whether or not we knew who Batman really was. Data modelling is hard.
1. Actually, we track "programmes." I spell it the proper American way, but for some inexplicable reason most of my colleagues appear to be British and insist upon mangling the language, and I suppose I should humour them.