A Voyage Into the Universalization of Localization

This project is being developed alongside Dr. Kevin Scannell.

Problem

The current state of localization leaves much to be desired, as any localizer who has ever had a conflict with a website designer will attest. Presently, localization is an oddly local task; that is, translations are handled per website. This leads to three issues:
  1. each website must implement its own, often decentralized, system for managing translations,
  2. translations lag behind the primary-language text (e.g., an English website’s new text may not see translation into Irish for some time), and
  3. websites cannot be translated at all without direct programmer intervention, even if users would find such functionality useful.

These three issues are significant roadblocks to the continued effort to make the Internet more accessible.

Current Solutions

We are aware of only one current solution, Dakwak, which aims to localize websites into any of over 60 languages. Unfortunately, this neither alleviates the need for site developer intervention nor provides a more natural user interface through which to submit the translations.

Solution 

Our proposed solution is to create a global localization system: localize once, reap the benefits everywhere. It will most likely be accessed through a JavaScript bookmarklet, which allows cross-browser, cross-platform use. This service will have a few primary properties:
  1. the user interface will allow in-place localization on the website itself, maintaining context and preventing the need to leave a familiar interface,
  2. translations will be stored in a centralized location based on a convenient fragment size, most likely sentences, minimizing the needless duplication that the localization world sees today, and
  3. users can localize any website without developer intervention, so that language communities can contribute directly and effortlessly to their web presence.
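
To make the second property more concrete, here is a minimal Python sketch of a sentence-keyed translation store. Everything in it is an assumption made for illustration: the `TranslationStore` class, the naive regex sentence splitter, and the SHA-1 fragment key are hypothetical and are not taken from any actual implementation of this project.

```python
import hashlib
import re


def split_sentences(text):
    """Naively split text into sentences on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def fragment_key(sentence):
    """Key a sentence by a hash of its whitespace-normalized, lowercased form."""
    normalized = " ".join(sentence.split()).lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


class TranslationStore:
    """In-memory stand-in for the centralized store (hypothetical)."""

    def __init__(self):
        # Maps (fragment key, language code) to a translated sentence.
        self._translations = {}

    def add(self, sentence, lang, translation):
        self._translations[(fragment_key(sentence), lang)] = translation

    def lookup(self, sentence, lang):
        return self._translations.get((fragment_key(sentence), lang))

    def localize(self, text, lang):
        """Translate sentence by sentence, leaving unknown sentences untouched."""
        return " ".join(self.lookup(s, lang) or s for s in split_sentences(text))


store = TranslationStore()
store.add("Thank you.", "ga", "Go raibh maith agat.")

# The same sentence on two different "websites" resolves to one stored translation.
print(store.localize("Thank you. Please come again.", "ga"))
print(store.localize("THANK  you.", "ga"))
```

Because the key depends only on the sentence and not on the site it came from, a sentence translated once is available wherever it reappears. A real system would need a far more careful sentence splitter and normalization scheme than this sketch uses.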

Although we may use machine translation services such as Apertium to fill in the gaps, machine translation quality is not yet consistently high enough to produce fully readable text, which is why we focus on collaborative human translation.
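
One way to picture this "fill in the gaps" behavior is a simple fallback chain: prefer a human translation, fall back to machine output, and otherwise leave the original text in place. The Python sketch below is only an assumption about how that might look. `choose_translation` and `fake_mt` are hypothetical names, and `fake_mt` is a stub that stands in for a machine translation backend; it does not use Apertium's actual interface.

```python
def choose_translation(sentence, human, machine_translate=None):
    """Return (text, source), preferring human, then machine, then the original."""
    if sentence in human:
        return human[sentence], "human"
    if machine_translate is not None:
        candidate = machine_translate(sentence)
        if candidate:
            return candidate, "machine"
    return sentence, "original"


def fake_mt(sentence):
    # Hypothetical stand-in for a machine translation backend; it knows one sentence.
    return {"Good morning.": "Maidin mhaith."}.get(sentence)


# Human-provided English-to-Irish translations collected so far.
human = {"Thank you.": "Go raibh maith agat."}

for s in ["Thank you.", "Good morning.", "See you soon."]:
    print(choose_translation(s, human, fake_mt))
```

Returning the source alongside the text would let an interface flag machine-filled sentences as candidates for human review.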
 
Ultimately, in addition to making the web a more accessible place, the goal is to use the human-provided translations to improve today’s machine translation systems. The design of the interface will need to account for the system it is meant to train. Apertium, for example, uses rule-based translation, so it may be wise to provide an optional third step that allows the user to identify the meaning of words not yet in the Apertium dictionary.

Conclusion

This post is only intended to outline the general idea as it stands. I have more specific implementation details in mind and underway, so I will follow up later with more information. Any input is certainly welcome.

 

Upcoming Talks: Photography and Accentuate.us

Strange Loop: Strange Passions

On October 14, I'll be giving an extended version of my July 21st Perl Mongers s/2 years/5 minutes/ of Photography lightning talk for Strange Loop's Strange Passions track. Alex tells me we'll be getting our hands on video thanks to the lovely InfoQ, so I'll post that and the slides as soon as they come my way.

Be sure to check out Matt Follett's Perl 6 talk, which he'll also be giving at Strange Loop.

Saint Louis University Math/CS Club: Accentuate Us!

Dr. Kevin Scannell and I will be giving a talk on November 10th to the Math/CS Club, starting around 4pm, to discuss Accentuate.us, a new system allowing you to type quickly and easily in over 100 languages. Video is not yet confirmed, but I'm looking into it.