... and hi to all the hardworking people at the NSA.
Sorry for the distraction; please continue on to the next intercept.
As far as I am concerned, English is ultimately defined as the language spoken in England. The rest of the British Isles and the other parts of the British Empire currently over here competing at the Commonwealth Games have decided to use it as well (except for the special case of Canadian, but more on that later).
But (to drop briefly into a software analogy) thirteen colonies in the New World, in their wisdom, got sick of the English project lead for reasons unrelated to the language, mainly because they saw him as not quite benevolent enough for a benevolent dictator for life.
So they formed a breakaway group and forked the language to create American. To differentiate their project from the original, they hired an expensive interface usability consultant to help make the user interface simpler and easier to learn.
A number of years later, in an unfortunate turn of events for the 30% of the world still using English 1.2.48 (build 1970 or so), the project team for American managed to launch the new field of productised, shrink-wrapped software with American 1.0 as a compulsory dependency.
This has resulted in a ridiculous situation in which "English" in almost any software product actually means the American fork of English.
For a long time, those of us in the majority of English users have put up with this, because we were just happy that we weren't in the position of Traditional Chinese users, or of users of something really esoteric.
And to be fair, in some situations it's better to use American. When it comes to APIs, such as module, method and global variable naming, I think on balance it's better to give in and use American just to gain the consistency of a single API, rather than face the long-term edge cases and scaling problems you would get in an OO system having both WWW::Mechanize and WWW::Mechanise, or ->color and ->colour.
But Unicode has been around long enough to be standard practice now, and the situation is arguably worse for English than before.
Highly popular software packages now come with your choice of 15 different languages, except English.
English English has fallen into a hole. We all get annoyed by applications being in American, even when only 3 or 4 words in the interface differ, and yet, precisely because the change is so TINY, nobody can really be bothered to put in the effort to create and maintain an entire language pack just to change 4 words. And in any case, the application will bundle American as well, just in case any words are missing from the English translation of American.
To add insult to injury, the website of my favourite editor, UltraEdit, even lists the English-language version of the application with BOTH the American and British flags on it, yet provides only "English (American)", alongside "Spanish (International Sort)" and 6 other languages.
When I emailed them about the situation, I was informed: "We have no plans to implement a British translation of the program." And yet it's probably only 7 or 8 words that make the difference.
If the news presenters in Spain were all forced to speak Mexican or Portuguese, or if all French cinema switched to Québécois, the uproar would be huge.
And while here in Australia we are close enough to British that we don't really need our own translation, I pity even more the poor users of Canadian, a language with only 20-million-odd writers that is a weird hybrid caught between English and American. The payoff for creating Canadian translations is even smaller than it is for the billions of people who use English instead of American.
But one thing is clear in all this.
There are too many applications needing to be translated from American to English, not enough people to do them, and something of a lack of care factor, given how small the changes are and that they would need to be maintained over time.
So perhaps it's time to look at something different.
Perhaps it's time to look at implementing automated American-to-English translation (and vice versa) and then integrating it into our internationalisation systems, so that every American program with internationalisation support automatically gets an English and a Canadian translation.
It's been one of those little things I keep meaning to look at properly, but I never really had a good plan for how to do it.
The biggest problem I always faced in a general conceptual design was the data. Writing simple parsers or regex-based replacements wouldn't be hard, but what do you use for the data? It's often under restrictive copyright, or otherwise difficult to obtain, install, locate, and manage. There always seems to be some problem with it.
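To show why the code really is the easy part, here is a minimal sketch in Python of regex-based whole-word replacement that preserves capitalisation. The four-word mapping is hypothetical sample data standing in for a real dataset, and `to_british` is an illustrative name, not an existing API.

```python
import re

# Hypothetical sample data for illustration only; a real tool would load
# these pairs from a proper dataset rather than hard-coding them.
AMERICAN_TO_BRITISH = {
    "color": "colour",
    "center": "centre",
    "organize": "organise",
    "favorite": "favourite",
}

# A single case-insensitive alternation, anchored on word boundaries so that
# "color" does not match inside "colors" or "watercolorist".
_WORD_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, AMERICAN_TO_BRITISH)) + r")\b",
    re.IGNORECASE,
)


def _match_case(template: str, word: str) -> str:
    """Copy the capitalisation pattern of `template` onto `word`."""
    if template.isupper():
        return word.upper()
    if template[0].isupper():
        return word[0].upper() + word[1:]
    return word


def to_british(text: str) -> str:
    """Replace known American spellings with British ones, whole words only."""
    def replace(m):
        found = m.group(0)
        return _match_case(found, AMERICAN_TO_BRITISH[found.lower()])

    return _WORD_RE.sub(replace, text)


print(to_british("My favorite color is in the CENTER."))
```

Inflected forms such as "colors" are deliberately left alone unless they are listed too, which is exactly why the quality and coverage of the word list matters more than the replacement code.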
And so I took another look at the problem today, and finally found a sane path to fix it.
To celebrate the 2006 Melbourne Commonwealth Games, currently coming to a close, and as a token of respect and a gift to our head of state, Elizabeth the Second, by the Grace of God Queen of Australia and Her other Realms and Territories, Head of the Commonwealth (visiting here for probably the last time), I thought it appropriate to take the first step in the battle to reclaim English.
Lingua::EN::VarCon is a Data Package (a distribution that logically ties a specific data product to a Perl namespace) that provides access to the VarCon database, which draws on a wide variety of sources (see the largish copyright section of the documentation) and is compiled and released by the Word List SourceForge project.
It provides a set of 5 tables in tab-separated-columns format, containing a number of lists of conversion data. Most notable is the ABBC dataset, which lists the words that differ between the various written dialects of English and how each is spelled in English, in American, and in what I think are two different Canadian dialects.
The other tables contain various other bits and pieces, such as "footpath" vs "sidewalk" translations, that might not be entirely safe to apply all the time.
Although it currently contains just a simple set of methods to locate the files, as part of my commitment to the Battle for English I pledge to add new methods to Lingua::EN::VarCon that help others access the dataset in whatever format is most suitable for their use, be it BDB, an in-memory hash, or whatever else is needed.
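Turning a tab-separated table like this into a lookup is straightforward. The Python sketch below builds an American-to-British dictionary, but the column positions are assumptions (American in column 0, British in column 1) rather than the documented VarCon layout, so check the dataset's own documentation before relying on them. `load_spelling_map` is a hypothetical helper, not part of Lingua::EN::VarCon.

```python
import csv
from typing import Dict, Iterable


def load_spelling_map(
    lines: Iterable[str], american_col: int = 0, british_col: int = 1
) -> Dict[str, str]:
    """Build an American -> British lookup from tab-separated rows.

    The default column positions are assumptions for illustration only.
    """
    mapping: Dict[str, str] = {}
    # QUOTE_NONE: treat the file as raw TSV, so stray quote characters
    # are kept as data instead of being parsed as CSV quoting.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines and rows too short to hold both columns.
        if len(row) <= max(american_col, british_col):
            continue
        american = row[american_col].strip()
        british = row[british_col].strip()
        # Keep only rows where the two spellings actually differ; the first
        # occurrence of a given American word wins.
        if american and british and american != british:
            mapping.setdefault(american.lower(), british)
    return mapping


rows = ["color\tcolour\n", "center\tcentre\n", "\n", "fact\tfact\n"]
print(load_spelling_map(rows))
```

Because `csv.reader` accepts any iterable of strings, the same function works on an open file handle or on a list of lines in a test.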
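As a rough illustration of the "in-memory hash vs on-disk database" trade-off, here is a Python sketch that persists a spelling map to a key/value file and looks words up without loading the whole table. Python's standard `dbm.dumb` module stands in for Berkeley DB here, and `save_to_dbm` and `lookup` are hypothetical names; this is not the API of Lingua::EN::VarCon, whose new methods are only pledged above.

```python
import dbm.dumb
import os
import tempfile
from typing import Dict, Optional


def save_to_dbm(mapping: Dict[str, str], path: str) -> None:
    """Write an American -> British mapping to an on-disk key/value store."""
    # Flag "n" always creates a new, empty database at `path`.
    with dbm.dumb.open(path, "n") as db:
        for american, british in mapping.items():
            db[american.lower().encode("utf-8")] = british.encode("utf-8")


def lookup(path: str, word: str) -> Optional[str]:
    """Return the British spelling for `word`, or None if it is not listed."""
    with dbm.dumb.open(path, "r") as db:
        value = db.get(word.lower().encode("utf-8"))
    return value.decode("utf-8") if value is not None else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "spellings")
        save_to_dbm({"color": "colour", "center": "centre"}, path)
        print(lookup(path, "Color"), lookup(path, "sidewalk"))
```

A plain dictionary is faster once loaded; the on-disk store mainly pays off when many short-lived processes need a few lookups each and shouldn't parse the whole table every time.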
To HM Elizabeth II,
on the occasion of her visit
It seems the rebel colonials in America have gotten a little out of control and are creating spelling confusion and inconvenience for all 1.8 billion people in the 53 countries that make up the Commonwealth of Nations.
For my part, I hope that this gift goes some way towards rectifying this situation.
And thank you for your continuing support, and your continuing willingness to stay entirely out of our way.
I have the honour to be, Madam, Your Majesty's humble and obedient servant.