
The Trouble With hCard and Microformats in General

Tuesday, April 15th, 2008

If you haven’t already heard of Microformats, they are a set of simple, open data formats built upon existing and widely adopted standards. Basically, they allow you to use a combination of class names to mark up data in your page along the same lines as existing data formats. So, for example, you’d mark up contact info on a page using class names based on the popular vCard format. The resulting markup would be an hCard. In essence, Microformats aren’t in the business of reinventing the wheel; they’re all about reuse of existing patterns, and therein lies their genius.

<span class="tel">
    <span class="type">home</span>:
    <span class="value">+1.415.555.1212</span>
</span>

An example of hCard telephone markup, from: http://microformats.org/wiki/hcard

There is, however, one glaring problem with Microformats, and I ran into it head-on when I was marking up a page in French. Since property values need to be in the clear (in this case, “home”), and those values need to follow an established format (in this case, vCard), you can’t use any language but English. Yep, you heard me: Microformats (at least hCard) are English only. So much for i18n.

<span class="tel">
    <span class="type">Téléphone</span>:
    <span class="value">+1.415.555.1212</span>
</span>

An example of an invalid hCard due to a property value in French

I think the defining difference between hCard and vCard is that the former needs to have its property values in the clear, as visible text on the page, whereas the latter, as far as I can tell, does not. In other words, you can exchange vCards containing English property values using Japanese applications: since the vCard is simply a file, the Japanese app can open it, read the properties in English, and then display them with Japanese labels.
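To make that last point concrete, here is a minimal sketch of how an application might turn a canonical English type value into a localized label. The typeLabels table and the labelForType function are hypothetical names invented for this illustration; they are not part of vCard, hCard, or any library.

```javascript
// Hypothetical label table: canonical (English) type values mapped to
// display labels per language. Not part of any spec or library.
var typeLabels = {
    home: { en: "Home", fr: "Domicile", ja: "自宅" },
    work: { en: "Work", fr: "Travail", ja: "勤務先" }
};

// Returns the display label for a canonical type value in the given
// language, falling back to the raw value when no label is known.
function labelForType(type, lang) {
    var key = type.toLowerCase();
    var hasType = Object.prototype.hasOwnProperty.call(typeLabels, key);
    if (hasType && typeof typeLabels[key][lang] === "string") {
        return typeLabels[key][lang];
    }
    return type;
}

var frenchLabel = labelForType("home", "fr");   // "Domicile"
var japaneseLabel = labelForType("HOME", "ja"); // "自宅"
```

The stored data stays in English and only the label shown to the user changes, which is exactly what hCard can't do when the value itself has to be the visible text.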

I tried discussing the issue with the Microformats community, but the results were far from conclusive. I was told that the abbr element could be used to remedy this situation; however, the spec doesn’t discuss this in terms of i18n. Rather, the intended use of abbr is to differentiate between human- and machine-readable formats of the same content. For example, dates:

<abbr title="2008-04-15T00:00:00">April 15th, 2008</abbr>

If I’m way off base here or Microformats have evolved since to include i18n, please by all means let me know in the comments.

Worry Free JavaScript Internationalization (i18n)

Sunday, April 13th, 2008

There’s a right way and a wrong way to go about i18n. Just to be clear, I don’t subscribe to the idea of right vs. wrong unless there’s a good reason for it; otherwise, I just chalk it up to preference. Sometimes I consider something a matter of preference until someone points out a compelling argument either for or against it. It’s with that perspective that I address the issue of i18n.

I’ve often found that when first confronted with the need for i18n, the temptation is to do something like this:

var lang = getLang();
var msg = "";
if (lang === "en") {
    msg = "Hello world!";
} else if (lang === "fr") {
    msg = "Bonjour Monde !";
}

That will work fine, but what happens if there’s a bunch of text throughout the app that needs i18n? That’s a whole lot of if/else blocks. And what happens if you suddenly have to support three, four or fifteen languages?

Object literals to the rescue! JavaScript has this wonderful little thing called the object literal, written as a pair of curly braces: {}. It creates an object that can be nested within other objects and can contain pretty much anything. The syntax is very straightforward: all you need are key/value pairs separated by commas:

var data = {
    helloworld: {
        en: "Hello World!",
        fr: "Bonjour Monde !"
    }
};

In this case we have an object being assigned to a variable named data. That object in turn contains another object named helloworld. Finally, helloworld contains two strings, one named en and the other fr. These values can now be accessed like so:

msg = data.helloworld.en;

Of course, what we need is to be able to dynamically access the language node in our dataset. This is where bracket notation comes to the rescue. So far we’ve accessed our data via dot notation, but it’s also possible to access the same data via bracket notation, like so:

msg = data.helloworld["en"];

I’m sure you can see where this is going. Now that we can specify the last node of our dataset with a string value, all we need to do is substitute it with a variable.

var lang = getLang();
msg = data.helloworld[lang];

Here, getLang determines what the current language setting is and returns a string value accordingly. Once that string value has been received, it can be placed between the brackets of our data lookup and voilà! No more if/else logic, and this technique can be used to support any number of languages without ever having to modify the code itself. You want Spanish? Just add an es node to your dataset. That’s it, that’s all.
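One thing the lookup above doesn’t handle is a language that’s missing from the dataset, in which case you simply get undefined. Here is a minimal sketch of one way to deal with that. The translate function and its fallbackLang parameter are hypothetical names made up for this example, not something from the widget itself.

```javascript
var data = {
    helloworld: {
        en: "Hello World!",
        fr: "Bonjour Monde !"
    }
};

// Hypothetical helper: looks up a string by key and language, falling back
// to a default language, and finally to the key itself if nothing matches.
function translate(key, lang, fallbackLang) {
    var entry = data[key];
    if (!entry) {
        return key;
    }
    if (typeof entry[lang] === "string") {
        return entry[lang];
    }
    if (typeof entry[fallbackLang] === "string") {
        return entry[fallbackLang];
    }
    return key;
}

var french = translate("helloworld", "fr", "en");  // found: French string
var spanish = translate("helloworld", "es", "en"); // no es node: falls back to English
```

Returning the key as a last resort is a design choice: a visible "helloworld" on screen is usually easier to spot and fix than a blank label.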

Enjoy!

Update: I neglected to mention that I implemented this solution in the context of a Yahoo! Widget, where all of the data is stored locally on the desktop. This solution doesn’t make much sense in the context of a website, since you’d be sending way too much data down the pipe that will never be used. Thanks to AB for pointing out my oversight.
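If you did want to reuse the same dataset for a website, one possible workaround is to strip it down to a single language before sending it. The sketch below assumes the data structure used earlier in this post; pickLanguage and the goodbye entry are hypothetical additions for illustration only, and where this code would run (a build step, the server) is left open.

```javascript
var data = {
    helloworld: {
        en: "Hello World!",
        fr: "Bonjour Monde !"
    },
    goodbye: {
        en: "Goodbye!",
        fr: "Au revoir !"
    }
};

// Hypothetical helper: builds a flat { key: string } table containing only
// the strings for one language, skipping keys that lack that language.
function pickLanguage(dataset, lang) {
    var result = {};
    for (var key in dataset) {
        if (Object.prototype.hasOwnProperty.call(dataset, key)) {
            var entry = dataset[key];
            if (entry && typeof entry[lang] === "string") {
                result[key] = entry[lang];
            }
        }
    }
    return result;
}

var frenchOnly = pickLanguage(data, "fr");
var msg = frenchOnly.helloworld; // "Bonjour Monde !"
```

The page then only receives the strings it will actually display, and the lookup becomes a plain frenchOnly.helloworld with no language node at all.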