Google's translation agenda

I haven’t had time to check this properly, but last week Google invited journalists to discover what’s cooking in its development kitchen, in the wake of the underwhelming Personalization launch. Apparently translation automation was on the menu, according to this report:

Officials from Google also announced that the company is working on a translation program. “Historically, the approach to building machine translation systems is to have expert machine linguists write down dictionaries and rules on how to translate, say, from Chinese to English” said researcher Franz Och. “Trying to write down all the rules on how to translate from Chinese to English is very hard.”

Instead, Google is fine-tuning a translation program that can automatically translate back and forth between documents in different languages. All the languages of the United Nations will be supported.

If true, this language spread means English, French, Spanish, Russian, Chinese, and Arabic. Pity that Franz Och can’t work on his native German.

A similar story put it slightly differently:

Instead, Google is fine-tuning a translation program that can automatically translate back and forth between documents in different languages — a sort of virtual Rosetta Stone.

That strange “instead” again. Going back and forth between languages is surely what translation automation programs do anyway, however they are crafted. The over-used Rosetta Stone meme is the wrong metaphor for translation as a process (a carved stone only displays the results of an act of translating), and it is misleading in a world in which translation will appear effortlessly instantaneous - you won’t ever need to see the original language alongside your translated version.

Although it’s still early days, Google’s translation program is good news. We need more and more such real-world translation efforts among the big web players. They enable us to test public acceptability thresholds fairly rapidly, explore rival technologies (statistics vs. rules vs. hybrids of both, etc.) more quickly than in the past in an age of near-infinite computing power, and steadily position translation as a ‘natural’ practice at the beating heart of information finding (or searching, as they still call it).

Posted by on 05/22 at 05:48 AM
  1. You missed the point, which is that Google is adopting a massively parallel corpus-based approach to MT, using its TBs of crawled data. Yes, other people are working on that too, but Google has more data, computing power, and PhDs than them. You missed the Rosetta Stone analogy too. Google is using the analogy correctly--Rosetta Stone refers to a multilingual KEY for translating other things, which is what it plans to use the web as.

    Posted by torazaburo  on  05/25  at  07:31 PM
  2. Looks like they are doing things LinearB already demonstrated on many occasions. Wouldn’t be surprised if there were many intersections.

    Posted by PetaMem  on  05/26  at  10:08 AM
  3. I was commenting more on the reports than the Google facts. But you’re right. Certainly Google is well endowed and we are all looking forward to seeing how well they can leverage the multilingual resources of the web. I also appreciate that the Rosetta Stone image is used to emphasize the exploitation of an existing translation base rather than creating one by encoding the rules that would enable the translation process from scratch. I wonder whether marshalling a few existing parsers might improve results on those sub-segmental chunks which are not inscribed in stone.

    Posted by  on  05/26  at  01:02 PM
  4. It’s interesting. A gentleman by the name of Seth Wagoner came up with a plan several years ago to harvest translated material from the web and use stats-based MT, and I’m sure there have been other people with similar ideas, so there’s nothing new in Google’s basic plan. I’d be interested to know how they would verify the quality of the translations they harvest, and where they’d harvest them from. I’m sure we’ve all come across a lot of horrendously translated websites (often companies from fast-growing Asian countries translate their websites themselves, which can be hilarious), so if Google were to use pure stats (i.e. common mistranslations being accepted as accurate because of their frequency) there could be some interestingly propagated ‘international English-isms’. Obviously the fun lovers at Google will be aware of this - but does anyone know how they would select translations of acceptable quality?

    Posted by christian  on  05/29  at  11:31 AM
