Translation Technology

Monday, February 22, 2010

Meedan Aims High

There’s great excitement in the Twitterverse today about Meedan, a San Francisco-based operation that offers Web users English news articles translated into Arabic, and vice versa. The site allows speakers of both languages to communicate with each other and, in addition to an international team of translators and editors, it uses IBM machine translation technology to expedite the communication. Any such development that encourages cultural and diplomatic understanding is very welcome.

The UK’s Guardian covers the story well here and here, and it has also been picked up by Wired, which describes Meedan (Arabic for “town square”) thus:

Think of it as a social network filled with people you don’t know, but want to understand.

You can also view a YouTube video that explains how the system works:

I am sure we will be hearing a lot more about this story soon.

Posted by ultan on 02/22 at 09:20 AM

Translation Technology • (0) Comments • (0) TrackbacksPermalink


Saturday, October 31, 2009

Facebook I18N: Way More Than A Token Gesture

Tokens (markers that are replaced at run-time by other text or values) in strings can be the bane of a translator’s life if used incorrectly, because they can get in the way of a correct translation. However, I love the way that the Facebook translation tool lets you comment on their use as you translate.

Looking at the options below for commenting on token use is an education in itself (the tokens concerned are {number} and {chat-service-name}).

This approach allows users to comment as much on the effectiveness of the internationalization (i18n) practice as on the quality of the translation.
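To see why token placement matters to translators, here is a minimal sketch of run-time token replacement, assuming tokens are written as lowercase names in braces like {number} and {chat-service-name}. The English and German strings, the `fill_tokens` helper, and the values are hypothetical illustrations, not Facebook’s actual strings or code. The point is that a translated template can move tokens wherever the target grammar needs them, which building the sentence by string concatenation in code would not allow.

```python
import re

# A token is a lowercase name in braces, e.g. {number} or {chat-service-name}.
TOKEN = re.compile(r"\{([a-z0-9-]+)\}")


def fill_tokens(template: str, values: dict[str, str]) -> str:
    """Replace each {token} in a (translated) template with its run-time value."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for token {{{name}}}")
        return values[name]

    return TOKEN.sub(replace, template)


# Hypothetical source string and translation: the German moves both tokens.
EN = "{number} of your friends are online in {chat-service-name}."
DE = "In {chat-service-name} sind {number} deiner Freunde online."
VALUES = {"number": "3", "chat-service-name": "Facebook Chat"}

print(fill_tokens(EN, VALUES))  # 3 of your friends are online in Facebook Chat.
print(fill_tokens(DE, VALUES))  # In Facebook Chat sind 3 deiner Freunde online.
```

Note that this sketch does nothing about plural agreement (“1 of your friends are…”), which is exactly the kind of issue a translator might want to flag when commenting on a {number} token.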

Facebook’s internationalization best practices for developers are here.

Facebook: Available in How Many Languages?

I’m sure we’re all very familiar with the vaunted Facebook crowdsourcing translation model by now. It’s been central to Facebook’s phenomenal international growth, and it’s a fantastic innovation, even the subject of a U.S. patent application. Anyone who supports the global sharing of information can only admire what Facebook have done, meself included!

But I’m stuck here. Maybe you can help me?

As a user experience (UX) professional, I can see how allowing users to translate their own content can be part of a compelling engagement strategy, and within that context I would have thought the entire user experience should be in the user’s language, not just part of it.

So, then, why is it that when we constantly read that Facebook is available in 65, 70, 80, whatever number of languages, we find that Facebook help is available in fewer than 10? Here is what Irish language (Gaeilge) users see under Help:

Irish language Facebook help screen showing the seven languages that have translated help.

Is it because:
a) The Facebook crowdsourcing translation tool doesn’t allow the help strings to be translated?
b) Facebook users don’t want to translate help because they don’t like or need it, or doing so just ain’t cool (or easy) anyway?
or
c) There’s a whole bunch of places out there populated by people way, way smarter than others, and they don’t need help in their own language?

As a localization professional working to a budget, I was sometimes faced with the prospect of presiding over a localization plan where help or documentation was not included and was left in English (actually, Facebook doesn’t seem to give users who switch to a language with no translated help the option of reading help in English instead). I wondered: if this approach was acceptable, then why was the help written in English in the first place?

For me, partial localization is fine if the market and user experience accept it, of course, though it’s clear that for some cultures it is a negative experience.

But what’s going on with community translation of user assistance like help?

Answers to the organizers of the next localization or UX conference, anywhere, please.

Posted by ultan on 10/31 at 01:59 AM


Friday, August 28, 2009

Technology Company in Silicon Valley Applies for a Patent

If you’d been following @localization on Twitter, you’d know it was announced days ago that Facebook has applied for a patent (in the U.S.) called “Community Translation On A Social Network” (this was covered on the Baltimore Sun’s Tech Blog before anywhere else).

The application, filed with the U.S. Patent and Trademark Office, seeks to patent the much-vaunted social network translation process we hear so much about, where Facebook “volunteers” contribute freely provided translations, which are then “voted” on as appropriate or not. The filing states:

Embodiments of the invention provide techniques for translating text in a social network. In one embodiment translations of text phrases are received from members of the social network. These text phrases include content displayed in a social networking system, such as content from social networking objects. A particular member is provided with content including a text phrase in a first language, and the member requests translation into another language. Responsive to this request, a translation of the text phrase is selected from a set of available translations. The selection is based on actions by friends of the member in the social network, the actions being associated with the set of available translations. These actions can include the viewing of or approval of translations by the friends, for example. The selected translation is then presented to the member requesting the translation.

(I admit being familiar with some of this kind of language, although lawyers were paid to explain it.)
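For the curious, here is one naive reading of the selection step in the abstract, sketched in Python. It is not Facebook’s implementation: the abstract doesn’t say how friends’ actions are weighed, so the action types, the `ACTION_WEIGHTS` values, and all names below are hypothetical. Each friend’s view or approval adds to a candidate translation’s score, non-friends are ignored, and the highest-scoring candidate is shown.

```python
from dataclasses import dataclass

# Hypothetical weights: an approval counts for more than a view.
ACTION_WEIGHTS = {"view": 1, "approve": 3}


@dataclass(frozen=True)
class Action:
    member: str          # member who performed the action
    translation_id: str  # candidate translation acted on
    kind: str            # "view" or "approve" (hypothetical action types)


def select_translation(candidates: dict[str, str], friends: set[str],
                       actions: list[Action]) -> str:
    """Pick the candidate with the highest score from friends' actions.

    Actions by non-friends are ignored; ties go to the candidate listed first.
    Raises ValueError if there are no candidates at all.
    """
    scores = {tid: 0 for tid in candidates}
    for action in actions:
        if action.member in friends and action.translation_id in scores:
            scores[action.translation_id] += ACTION_WEIGHTS.get(action.kind, 0)
    best_id = max(candidates, key=lambda tid: scores[tid])
    return candidates[best_id]


candidates = {"t1": "Mur", "t2": "Tableau"}  # hypothetical translations of "Wall"
friends = {"alice", "bob"}
actions = [
    Action("alice", "t2", "approve"),
    Action("bob", "t2", "view"),
    Action("carol", "t1", "approve"),  # not a friend: ignored
    Action("dave", "t1", "approve"),   # not a friend: ignored
]
print(select_translation(candidates, friends, actions))  # Tableau
```

Even this toy shows what makes the claim distinctive: two strangers approving “Mur” count for nothing against one friend’s approval of “Tableau”.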

This development would seem to offer great potential not only to extend the debate about crowdsourced translation (to whom the benefits really accrue, at what cost to others, and so on) but also to ask the question “where will this all end up?” Except that a linkedinfail-style debate hasn’t ignited (yet). Facebook, although wrong-footing the pundits very badly on this, is really doing what anyone else in the business (and it is a business) would attempt to do, so such a patent application isn’t surprising. It’s what tech companies do all the time, and why wouldn’t Facebook want to try to “own” a core global user engagement process like this? Amazon, after all, made a science out of it (and nice profits, after a bit).

However, Facebook is not alone in the social networking space in using this approach to translation (and the filing is specifically for “social networks”), but that’s a “prior art” issue that the U.S. Patent and Trademark Office will need to examine.

All I can say, as someone pursuing a patent application with the U.S. Patent and Trademark Office myself, is that these things can take years to conclude, even if the patent is eventually awarded.

Personally, I think this patent application will eventually fail, but yet again Facebook has made its translation process a headline event just by trying (nothing wrong with that). You may have an opinion on all this, so until we hear from the U.S. Patent and Trademark Office, we’d like to read it ...

Posted by ultan on 08/28 at 01:24 AM


Saturday, August 08, 2009

Translation Party

Bored? Try Translation Party.

Yes, if you’ve nothing to do because Twitter and Facebook are broken again, then amuse yourself for hours this way. Translate phrases backwards and forwards between English and Japanese using Google Translate until you reach a state of “equilibrium.” Or your boss appears.
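The “equilibrium” game is really a fixed-point search: keep round-tripping until the English comes back unchanged. Here is a minimal sketch, assuming the translators are plain functions. The toy dictionaries below stand in for Google Translate and are hypothetical; no real translation API is called.

```python
from typing import Callable


def translation_party(phrase: str, to_ja: Callable[[str], str],
                      to_en: Callable[[str], str], max_rounds: int = 20):
    """Round-trip EN -> JA -> EN until the English stops changing.

    Returns (final phrase, English versions seen, whether equilibrium was reached).
    """
    history = [phrase]
    current = phrase
    for _ in range(max_rounds):
        back = to_en(to_ja(current))
        if back == current:  # fixed point: the round trip changes nothing
            return current, history, True
        current = back
        history.append(current)
    return current, history, False  # gave up (or the boss appeared)


# Toy stand-ins for a real machine translation service.
TO_JA = {
    "Twitter is broken again": "ツイッターがまた壊れた",
    "Twitter broke again": "ツイッターがまた壊れた",
}
TO_EN = {"ツイッターがまた壊れた": "Twitter broke again"}

final, seen, settled = translation_party(
    "Twitter is broken again", TO_JA.__getitem__, TO_EN.__getitem__)
print(final, settled)  # Twitter broke again True
```

The `max_rounds` cap matters with real MT output, which can cycle between variants forever instead of settling.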


More on Translation Party from TechCrunch.

If you think that’s pointless, then consider that some people have no problem paying good money for back-translations.

Posted by ultan on 08/08 at 02:33 AM


Friday, July 17, 2009

Yamli - Arabic Without an Arabic Keyboard

NPR’s All Tech Considered has a great story about Yamli - a very cool technology helping users connect through Arabic.

You can use your regular keyboard and non-Arabic input to search in Arabic. According to the story:

“The idea is, if you don’t have an Arabic keyboard, you can type Arabic by spelling your words out phonetically.” .... “For example ... when you’re writing the word ‘falafel,’ Yamli will convert that to Arabic in your Web browser. We will go and search not only the Arabic script version of that search query, but also for all the Western variations of that keyword.”

The technology recently won “best in show” at a “new” technology forum at MIT. Very neat.

Posted by ultan on 07/17 at 11:56 AM


Monday, February 16, 2009

Getting It Sorted

A simple challenge to you: How do you alphabetically sort your translated XML or HTML content?

I’ve been to many L10n/I18n conferences, workshops, and sales events, and have read many, many articles in various industry publications over the years. And yet, when it comes to the part where you’re supposed to tell everyone how you alphabetically sorted all that translated HTML and XML so wonderfully handled by the best in modern technology, everybody ducks.

Why? I suspect the reason is that there are no out-of-the-box solutions that really work for every language anyway, so when it comes to the sorting part, it’s down to human intervention. Am I right?

So, how do you do it? Assume you’re translating HTML and need to re-sort an alphabetically sorted index or glossary for an online manual. How do you proceed? Is there a tool you use? Or is it a manual process? How much does this process cost?
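Why can’t you just hit “sort”? Because naive sorting compares character codes, and every language has its own alphabet order. A minimal sketch, assuming a simplified Swedish alphabet in which å, ä, and ö come after z: the `swedish_key` helper is hypothetical and ignores accents, case levels, and other refinements that a real collation (for example, a Unicode Collation Algorithm implementation such as ICU, with CLDR language tailorings) would handle.

```python
# Simplified Swedish alphabet order: a-z, then å, ä, ö.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}


def swedish_key(word: str) -> list[int]:
    """Sort key ranking letters by Swedish alphabet position, ignoring case.

    Characters outside the table (digits, punctuation, other accented letters)
    sort after all letters, by code point.
    """
    return [RANK.get(ch, len(SWEDISH_ALPHABET) + ord(ch)) for ch in word.lower()]


words = ["ört", "äpple", "apa", "ål", "Zebra"]
print(sorted(words))                   # ['Zebra', 'apa', 'äpple', 'ål', 'ört']
print(sorted(words, key=swedish_key))  # ['apa', 'Zebra', 'ål', 'äpple', 'ört']
```

The code-point sort gets both case and the å/ä order wrong. And a German glossary would need yet another key, since German dictionaries file ä alongside a rather than after z.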

Do you translate XML? How do you sort the transformed content? Do you render it from the XML file directly? Use a database? XSLT? Do it the hard way? Ever used the index-sort-as element in DITA? Again, assume I want a sorted index or glossary.
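For those who haven’t used it: in the DITA 1.2 specification, index-sort-as is a child of indexterm that supplies the string an entry should be sorted under. Here is a minimal sketch of honoring it with Python’s standard XML parser, assuming a hypothetical German glossary topic (DOCTYPE and most markup omitted) where “Ärger” is sorted as “Aerger”. The `sorted_index` helper is hypothetical, handles flat (not nested) index terms only, and still relies on plain case-folded string comparison for everything else.

```python
import xml.etree.ElementTree as ET

# Hypothetical German glossary topic fragment.
TOPIC = """<concept id="glossar">
  <title>Glossar</title>
  <prolog>
    <metadata>
      <keywords>
        <indexterm>Zeile</indexterm>
        <indexterm>Ärger<index-sort-as>Aerger</index-sort-as></indexterm>
        <indexterm>Bildschirm</indexterm>
      </keywords>
    </metadata>
  </prolog>
</concept>"""


def sorted_index(xml_text: str) -> list[str]:
    """Return index term labels, sorted by index-sort-as text when present."""
    root = ET.fromstring(xml_text)
    entries = []
    for term in root.iter("indexterm"):
        label = (term.text or "").strip()
        sort_as = term.find("index-sort-as")
        if sort_as is not None and sort_as.text:
            key = sort_as.text.strip()
        else:
            key = label
        entries.append((key.casefold(), label))
    return [label for _, label in sorted(entries)]


print(sorted_index(TOPIC))  # ['Ärger', 'Bildschirm', 'Zeile']
```

Without the index-sort-as hint, “Ärger” would land after “Zeile” on a code-point sort. The element helps, but someone still has to supply the right sort string for every term in every language.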

What do you recommend? If you have examples and details, let us know and I’ll record them here. Or even better, write an article about how you alphabetically sort your content and submit it for publication in Multilingual.

Posted by ultan on 02/16 at 10:39 AM


Saturday, November 29, 2008

DITA: The Obama of Global Content?

Is Darwin Information Typing Architecture (DITA) the answer to all our content globalization problems? On its own, no. The big issues remain fundamentally the same as before. Yet that’s not the impression we’re often given when out shopping for solutions; rather, we’re told that a switch to DITA will somehow solve issues of cost, quality, and content in every language. But then, most of us have yet to come across a vendor who wouldn’t say, “Yes, we can” (for a price) either, have we? Here’s my analysis…

Time and time again there’s a claim that using DITA leads to big translation savings, better-quality content to translate, easily delivered content in every language, and so on. Usually, DITA is positioned in this context alongside some content management system that is then plugged into various localization workflows.

This kind of DITA globalization solution stuff has been kicking around for 4 or 5 years now, and various “out of the box” solutions are pushed by various vendors. Of course, people have solutions to sell, white papers to post, and PowerPoint Karaoke to rehearse for the next localization conference, but all this touching faith in DITA per se from solutions vendors needs to be challenged.

Perhaps think about the following issues and questions when you’re considering a DITA globalization solution:

* If you have existing content, including translated material stored in a CMS or in TMs, especially content created in a non-structured environment, then how do you migrate to DITA? What about internal tags that might be stored in TMs? How does a format-based content creation system map to a structured environment? For example, what would you map the STRONG or B element in a format-based HTML environment to in DITA land? Or a heading level 5 equivalent in RTF? Oh, and can we see a large-scale solution, please? Not one based on the translation of a couple of hundred pages. Anyone who has been involved in these migration projects knows that it is not a trivial undertaking, even with customized tools. Sorry folks, no out-of-the-box solutions there.

* Why would DITA reduce word count, as has been claimed, if you can still write as much content as you like, in any way you like? Just like in any other environment, structured or otherwise, you need to establish authoring rules, educate about them, enforce the rules and then measure the resulting volume and re-use. DITA on its own will not help.

* Why would it improve content quality? It cannot. DITA is about structuring content, not QA of that content. You need manual or automatic review tools, or a combination of these. Just like any other authoring environment, you need a process for this. The tools and processes that might help, such as controlled authoring, work on the same principles as they do for non-DITA content.

* Why does DITA make product globalization easier (“content in any language”)? Just because you can structure your content does not mean the rendering of that content is automatically provided for. In fact, the two issues, structure and formatting, are deliberately kept separate. So, think about how you would render your XML content as Arabic PDF using XSL-FO (a little more complicated than CSS) or whatever. Why would using DITA make such rendering easier than if you used any other flavor of XML to write simple topics? As far as I can see, there are a good few DITA pushers out there who simply haven’t a clue about rendering in this regard. Oh, and how does DITA solve the old problem that nobody wants to address (and that I’ve been asking about for 10 years): the automatic and correct alphabetical sorting of localized content such as online and print indices, glossaries, and so on?

* What is the relationship between DITA, ITS (the W3C Internationalization Tag Set), and XLIFF? Do you translate DITA directly? If not, why not?

* How do you address the problem of topic-based authoring from a translation viewpoint? If you’re translating piecemeal, then obviously there is less overall context, so what happens when you assemble the topics using a bookmap? At a lower level of granularity, DITA element names like step or shortdesc don’t help that much (particularly if the content they’re supposed to express bears no relationship to the element name; but that problem exists in any XML environment).

* Translation rules: apart from some best practices from JoAnn Hackos and the DITA translation subcommittee (practices that I rarely see cited), has anyone considered the potential translation problems of conrefs and the challenges of indexing topic-based materials with keywords? There are other areas too. Even conrefs at the paragraph level present challenges for translation.

* What translation tools support DITA out of the box (I mean unspecialized DITA)? When I last checked, the leading tag-editing tool couldn’t do it, and required faffing about with INI files and so on to cater for DITA’s different non-specialized topic types. Plus, if there is any specialization of the DITA DTD or schema, then even if there were an out-of-the-box solution, content authors would still need to tell translators what those XML elements and attributes really meant. Er, just like you did years ago with HTML too…

* Most importantly, will DITA speed up the arrival of my economic stimulus check from the IRS?

So, is anyone up to the challenge of addressing these issues? Asking the questions and demonstrating the answers for real?

Posted by ultan on 11/29 at 10:53 AM


Wednesday, November 12, 2008

Google Reader: Blogs in Any Language

The Google Reader team have announced a feature whereby you can easily translate any subscribed blog into your language.

MAKE has more information.

So, now you have no excuse for missing that all-important news item, opinion, or comment anywhere!

Posted by ultan on 11/12 at 08:50 PM


Thursday, September 04, 2008

And Now For Something Completely Different ... Comic Books

Google have released their new Chrome browser. Nice and simple, though I won’t be switching to it as my main browser for a while yet. I’ll be sticking with Firefox.

That said, I was intrigued by the documentation that comes with it.

See for yourself at: http://www.google.com/googlebooks/chrome/#size=small&page=4

I really like this approach to getting the message across, and the way it challenges accepted notions of the user assistance that comes with these kinds of products. Good job.

But how will it fare in translation? Can it be exported to SVG? XLIFF? Can SDL WorldServer handle it? Or “volunteers”? Or will there be a different version for international markets (there should be an alternative version for accessibility anyway)?

How novel (groan)...

Posted by ultan on 09/04 at 03:47 PM


Thursday, May 15, 2008

New Languages in Google Translate

I see Google Translate has now added 10 languages (Bulgarian, Croatian, Czech, Danish, Finnish, Hindi, Norwegian, Polish, Romanian and Swedish), bringing the total to 23. There is also a new ability to perform cross-language searches, “For example, we now support Chinese translation to/from any of our languages (e.g., Chinese to French)”, they tell us.

Naturally, this means you’ll be able to find and access tonnes of content from local sources as you keep up with the summer’s unfolding developments in Tib..., er, I mean the Olympics in Beijing. Whatever.

On the subject of Polish - I see the .pl domain name has gone through the million mark.

Posted by ultan on 05/15 at 05:05 PM


Friday, March 28, 2008

Der Mundo, New Easy to Use Multilingual Blogging Tool

As he mentioned at the SV Localization UnConference, Brian McConnell of the Worldwide Lexicon (see previous Blogos posting) has announced Der Mundo, a new, easy-to-use multilingual blogging tool.

This is a free hosted blogging service, as easy to use as Twitter, with the WWL community translation tools built in. More information is available on how to sign up and use it at http://blog.dermundo.com.

Brian tells us:

With Der Mundo, it’s easy to sign up and start writing. The service auto-detects the reader’s language preferences and displays translations if present, or invites them to contribute. You can use it as a standalone service, or export RSS to your favorite publishing system.
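Brian doesn’t say how Der Mundo detects the reader’s language preferences. One common approach, assumed here, is to read the browser’s Accept-Language HTTP header. The `pick_language` helper below is a hypothetical, simplified sketch: it ranks language tags by their q-values, falls back from a regional tag like fr-CA to its primary language, ignores the * wildcard, and returns None when no translation exists, which is the point where a service like this would show the original and invite the reader to translate.

```python
from typing import Optional


def pick_language(accept_language: str, available: set[str]) -> Optional[str]:
    """Return the best available translation language for an Accept-Language
    header value, or None if nothing matches. `available` must be lowercase."""
    prefs = []
    for position, item in enumerate(accept_language.split(",")):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        quality = 1.0
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            prefs.append((-quality, position, tag))
    # Highest q first; ties keep the order the browser sent them in.
    for _, _, tag in sorted(prefs):
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # e.g. "fr-ca" -> "fr"
        if primary in available:
            return primary
    return None


print(pick_language("fr-CA,fr;q=0.8,en;q=0.5", {"fr", "de", "en"}))  # fr
print(pick_language("ja", {"fr", "de"}))  # None: show original, invite a translation
```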

I’ve signed us up for it - and will revisit as soon as this work stuff is out of the way...

Posted by ultan on 03/28 at 05:20 AM
