Multilingual Taxonomies – Hedden Information Management

We know that taxonomies help information-seekers browse or search for desired documents and information. Taxonomies provide the bridge between the user’s choice of words and the wording within the desired documents. But what if the user speaks a different language from that of the content? Documents can be translated (automatically if it’s just to get the general meaning, or by human translators when accuracy is important), but that’s only done after the document is found. To support the findability of foreign-language documents, what is needed is a bilingual or multilingual taxonomy (“bilingual” meaning in two languages, and “multilingual” meaning in three or more languages).

This Thursday, December 1, I will be presenting on the topic of multilingual taxonomies at the Gilbane Conference in Boston, where the focus is web and enterprise content management. This session, which I will share with co-speaker Ross Lehrer of WAND, appears to be the only one in the conference dedicated to taxonomies and the only presentation with the word “multilingual” in its name. The topic will be of interest both to those concerned with multilingual content but with no experience with taxonomies and to those with an interest in taxonomies but no experience with multilingual content.

The description of the session (which I did not write) on the conference website says: “Multilingual content dramatically expands the potential market for your products, and multilingual taxonomies often need to be part of your multilingual strategy.” This description applies better to my colleague’s presentation, especially since the taxonomies that his company builds are product taxonomies. My presentation, on the other hand, addresses taxonomies for more than just product websites, such as taxonomies for retrieving articles written in different languages.

The issue is whether the multilingual content is created and managed internally or externally to your organization. If your multilingual content is what your organization creates, such as additional language versions of a public website for a global market, then it is likely that the content in the different languages is managed internally but separately, by separate language teams. The content is similar but not identical in each language, and the taxonomies that support search and browse may also be created and managed separately. Having taxonomies in different languages, however, is not exactly the same as a “multilingual taxonomy.”

A good analogy would be a translated book. The book’s index should not simply be translated; rather, a new index should be created from the newly translated text by an indexer who is a native speaker of the target language. Consulting the original-language index is fine, but directly translating it will produce less-than-ideal results. Similarly, if you have a website translated into another language, and the website has a taxonomy for browsing to specific content pages, that taxonomy should not simply be translated. Rather, a new second-language taxonomy should be created, consulting the first taxonomy, of course.

By contrast, a truly multilingual taxonomy connects users who speak one language to content that is in another language. There needs to be a one-to-one correspondence between terms across the languages, and the different language versions need to be managed together. It’s somewhat complicated to design and create, but software tools are available for this, and the result is a powerful aid to searching and browsing across languages. What is important is to match your multilingual taxonomy design to your specific goal: either (1) serving different language markets, each with content in its own language; or (2) enabling users to access content in a language that they don’t speak.
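
To make the one-to-one correspondence more concrete, here is a minimal Python sketch of how such a taxonomy might be modeled. It is only an illustration, not how any particular taxonomy tool works: the class name `MultilingualTaxonomy`, the `search` function, the concept IDs, the document IDs, and the sample English and French terms are all hypothetical. The idea is that each concept carries exactly one label per language, and documents are tagged with language-neutral concept IDs, so a search term in one language can retrieve documents written in another.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class MultilingualTaxonomy:
    """Hypothetical model: one shared set of concepts, one label per language."""
    languages: Tuple[str, ...]
    # concept ID -> {language code -> preferred term}
    concepts: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def add(self, concept_id: str, labels: Dict[str, str]) -> None:
        # Enforce the one-to-one correspondence: a concept must have
        # a label in every language the taxonomy covers.
        missing = set(self.languages) - set(labels)
        if missing:
            raise ValueError(f"{concept_id} lacks labels for {sorted(missing)}")
        self.concepts[concept_id] = dict(labels)

    def find_concept(self, term: str, lang: str) -> Optional[str]:
        # Case-insensitive lookup of a term among the labels of one language.
        for concept_id, labels in self.concepts.items():
            if labels.get(lang, "").casefold() == term.casefold():
                return concept_id
        return None


def search(taxonomy: MultilingualTaxonomy,
           documents: Dict[str, Tuple[str, Set[str]]],
           term: str, lang: str) -> List[str]:
    """Return IDs of documents tagged with the term's concept, in any language."""
    concept_id = taxonomy.find_concept(term, lang)
    if concept_id is None:
        return []
    return sorted(doc_id for doc_id, (_, tags) in documents.items()
                  if concept_id in tags)


# Illustrative data only (assumed, not from any real taxonomy).
taxonomy = MultilingualTaxonomy(languages=("en", "fr"))
taxonomy.add("c1", {"en": "taxes", "fr": "impôts"})
taxonomy.add("c2", {"en": "customs duties", "fr": "droits de douane"})

# document ID -> (language of the document, concept IDs it is tagged with)
documents = {
    "doc-en-01": ("en", {"c1"}),
    "doc-fr-17": ("fr", {"c1"}),
    "doc-fr-20": ("fr", {"c2"}),
}

results = search(taxonomy, documents, "Taxes", "en")
print(results)  # ['doc-en-01', 'doc-fr-17']
```

In this sketch, the English term “taxes” finds the French document as well as the English one, which corresponds to goal (2). Separately created taxonomies in each language, as in goal (1), would have no shared concept IDs, so this kind of cross-language lookup would not be possible.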