A Voyage Into the Universalization of Localization

2011-02-05

This project is being developed alongside Dr. Kevin Scannell.

Problem

The current state of localization leaves much to be desired, as any localizer who has ever clashed with a website designer will attest. Today, localization is an oddly local task: translations are handled per website. This leads to three issues:

each website must implement its own, often decentralized, system for managing translations,
translations lag behind the primary-language text (e.g., an English website’s new text may not be translated into Irish for some time), and
websites cannot be translated at all without direct programmer intervention, even if users would find such functionality useful

These three problems are significant roadblocks to the ongoing effort to make the Internet more accessible.

Current Solutions

We are aware of only one current solution, Dakwak, which aims to localize websites into any of over 60 languages. Unfortunately, it neither removes the need for site developer intervention nor provides a more natural user interface for submitting translations.

Solution

Our proposed solution is a global localization system: localize once, reap the benefits everywhere. It will most likely be delivered as a JavaScript bookmarklet, which would work across browsers and platforms. The service will have three primary properties:

the user interface will allow in-place localization on the website itself, maintaining context and preventing the need to leave a familiar interface,
translations will be stored in a centralized location based on a convenient fragment size, most likely sentences, minimizing the needless duplication that the localization world sees today, and
users can localize any website without developer intervention, making the process one in which language communities can directly and effortlessly contribute to their web presence
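
As a rough illustration of the second property, here is a minimal sketch of a sentence-keyed translation store. Everything in it is an assumption made for illustration: the names `normalize`, `splitSentences`, `TranslationStore`, and `localize` are hypothetical, the store lives in memory rather than on a central server, and the sentence splitter is deliberately naive.

```javascript
// Hypothetical sketch: translations keyed by sentence, shared across all sites.

// Collapse whitespace so the same sentence matches regardless of page layout.
function normalize(text) {
  return text.replace(/\s+/g, " ").trim();
}

// Naive splitter: break after ., !, or ? when followed by whitespace.
// Abbreviations and other scripts would need something more careful.
function splitSentences(text) {
  return normalize(text)
    .split(/(?<=[.!?])\s+/)
    .filter((s) => s.length > 0);
}

class TranslationStore {
  constructor() {
    // Key: target language + NUL + normalized source sentence.
    this.entries = new Map();
  }

  key(lang, sentence) {
    return lang + "\u0000" + normalize(sentence);
  }

  add(lang, sentence, translation) {
    this.entries.set(this.key(lang, sentence), translation);
  }

  // Returns the stored translation, or null if nobody has translated it yet.
  lookup(lang, sentence) {
    return this.entries.get(this.key(lang, sentence)) ?? null;
  }
}

// Swap in every sentence that has a stored translation; leave the rest as-is.
function localize(text, lang, store) {
  return splitSentences(text)
    .map((s) => store.lookup(lang, s) ?? s)
    .join(" ");
}

// Sample data: one sentence translated into Irish ("ga").
const store = new TranslationStore();
store.add("ga", "Thank you.", "Go raibh maith agat.");
const localized = localize("Thank   you.\nPlease log in.", "ga", store);
```

Because the key is the sentence rather than the page, a sentence translated once on one website would be available on any other website that uses the same wording.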

Although we may use machine translation services such as Apertium to fill in the gaps, machine translation quality is not yet consistently high enough to produce fully readable text, which is why we focus on collaborative human translation.
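
One way to picture this priority order is the sketch below, which prefers a human translation and only falls back to a machine guess when none exists. The function name `translateFragment` and the `machineTranslate` callback are hypothetical; the callback is a stand-in for a service such as Apertium, and no real API is being called here.

```javascript
// Pick the best available translation for one fragment.
// `human` is a Map of source sentence -> human-provided translation.
// `machineTranslate` is a hypothetical callback standing in for a machine
// translation service; it may return null when it has nothing to offer.
function translateFragment(sentence, human, machineTranslate) {
  if (human.has(sentence)) {
    return { text: human.get(sentence), source: "human" };
  }
  const guess = machineTranslate(sentence);
  if (guess != null) {
    // Label the result so the interface can invite a human to review it.
    return { text: guess, source: "machine" };
  }
  // Nothing available: show the original text unchanged.
  return { text: sentence, source: "original" };
}

// Sample data and a fake machine translator, for illustration only.
const humanTranslations = new Map([["Thank you.", "Go raibh maith agat."]]);
const fakeMachine = (s) => (s === "Welcome." ? "Fáilte." : null);

const fromHuman = translateFragment("Thank you.", humanTranslations, fakeMachine);
const fromMachine = translateFragment("Welcome.", humanTranslations, fakeMachine);
const untouched = translateFragment("Log in.", humanTranslations, fakeMachine);
```

Keeping the `source` label alongside the text would let the interface mark machine-filled gaps as candidates for human correction.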

Ultimately, in addition to making the web a more accessible place, the goal is to use the human-provided translations to improve today’s machine translation systems. The design of the interface itself will need to take into account which system the translations are intended to train. Apertium, for example, uses rule-based translation, so it may be wise to provide an optional third step that lets the user identify the meaning of words not yet in the Apertium dictionary.
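
To make that optional step concrete, the sketch below finds the words in a sentence that are missing from a known-word list, so the interface could ask the user about just those words. The function name `unknownWords` is hypothetical, and the plain `Set` of lowercase words is only a stand-in; it does not reflect Apertium's actual dictionary format.

```javascript
// Return the distinct words in `sentence` that are not in `knownWords`.
// `knownWords` is a Set of lowercase words (a stand-in for a real dictionary).
// Words are runs of Unicode letters, so "don't" is split into "don" and "t".
function unknownWords(sentence, knownWords) {
  const words = sentence.toLowerCase().match(/\p{L}+/gu) ?? [];
  return [...new Set(words)].filter((w) => !knownWords.has(w));
}

// Sample data, for illustration only.
const known = new Set(["the", "cat", "chased"]);
const missing = unknownWords("The cat chased the dirigible.", known);
```

In this sample, `missing` holds only `"dirigible"`, the one word the user would be asked to describe.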

Next Steps

This post is only intended to outline the general idea as it stands. I have more specific implementation details in mind and underway, so I will follow up later with more information. Any input is certainly welcome.


michael schade
