30 June 2010

How Translation Software Saves Mother Tongues

Illustration: Caleb Bennett



Every day, the Xiha Life homepage runs a playful poll designed to get members conversing. When I recently dropped by the social networking site, the questions were: “How many pets do you have in your home? Would you like more or less?” The survey sparked a lively thread: One member owned turtles and a dog, another wanted a rabbit, and a third argued she couldn’t have pets because she vacations so frequently.

But here’s the thing: Many of these commenters didn’t speak one another’s language. Some posted in English, others in Czech, one in Spanish, yet they were all participating in the same conversation. How? At the bottom of each comment on Xiha Life there’s a Translate button. Hit it and the site uses Google’s automatic-translation software to produce an on-the-fly version in your local language. The upshot is a delightful experience: People across the world engaging in a single conversation without ever leaving their mother tongue.

Automatic-translation software has long been treated as a joke because of how hilariously it mangles phrases. (After one Czech member described her collection of pythons, another poster chimed in seamlessly to admit, “I am so afraid by snakes.”) But in the past few years, something has shifted: The technology is now surprisingly mature. I discovered this in the spring when I had to read Web material written in Finnish, German, Spanish, and even Korean. I used Google’s Chrome browser, and it automatically reworked every foreign page into shockingly understandable English.

How have the machines become so adept? Mostly by using new “statistical” techniques. Instead of trying to teach a program the rules of language, computer scientists locate massive corpora of online documents previously translated by humans — say, UN proceedings, which are routinely available in six different languages, or bilingual newspapers. Then they train cloud computers to recognize which words and phrases match up across tongues.
That’s why Google is leading the pack: It’s best at finding oodles of documents to train its cloud. This method also means that the more the Web grows, the better our multilingual machines will get.
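To make the “statistical” idea concrete, here is a minimal sketch of one classic textbook technique for learning word correspondences from human-translated sentence pairs: IBM Model 1, trained with expectation-maximization. This is an illustrative assumption, not a description of Google’s actual system, which the article doesn’t detail. The three-sentence German-English corpus and the function names `train_ibm_model1` and `best_translation` are hypothetical.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(target_word | source_word) from sentence pairs with EM (IBM Model 1, no NULL word)."""
    src_vocab = {f for src, _ in pairs for f in src.split()}
    tgt_vocab = {e for _, tgt in pairs for e in tgt.split()}
    # Start uniform: every target word is equally likely for every source word.
    t = {(e, f): 1.0 / len(tgt_vocab) for f in src_vocab for e in tgt_vocab}

    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence counts c(e, f)
        total = defaultdict(float)  # expected counts c(f), used to normalise
        for src, tgt in pairs:
            src_words = src.split()
            for e in tgt.split():
                # E-step: share one unit of credit for e among the source words,
                # in proportion to how likely each one currently is to produce e.
                norm = sum(t[(e, f)] for f in src_words)
                for f in src_words:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: turn the expected counts back into probabilities.
        for e, f in t:
            t[(e, f)] = count[(e, f)] / total[f]
    return t


def best_translation(t, f):
    """Return the target word with the highest estimated t(e | f)."""
    return max((e for e, src in t if src == f), key=lambda e: t[(e, f)])


# Hypothetical toy corpus of human-translated German-English sentence pairs.
corpus = [("das Haus", "the house"), ("das Buch", "the book"), ("ein Buch", "a book")]
table = train_ibm_model1(corpus)
for word in ["das", "Haus", "Buch", "ein"]:
    print(word, "->", best_translation(table, word))
```

No grammar rules are supplied anywhere: “das” ends up paired with “the” only because the two keep turning up together across sentence pairs, which is also why feeding such a model more parallel text tends to sharpen its estimates.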

The geopolitical implications are profound. For years, pundits have wondered which language will eventually dominate. Will English remain the lingua franca? Will Mandarin ascend?
But maybe it’s no longer a competition. Machine translation could be good enough to obviate the need for a primary global language.

Certainly, any activity requiring serious precision — legal proceedings, business discussions, diplomatic negotiations — will still need expert human translators. And in the short run, English will probably dominate those fields. But most people don’t need that level of quality to chat with foreign friends or surf the international Web.

“Machine translation isn’t good enough to translate a book, but when you have someone you want to talk to, it really helps out a lot,” says Jani Penttinen, CEO of Xiha Life. “Your options are either don’t talk to that hot girl in China — or speak in a little funny way.” He baked translation into his social network because he wanted to seed crosslinguistic conversation. It has worked: Xiha’s 750,000 members come from hundreds of countries, yet no more than 5 percent of them hail from the same nation.

Some academics predict that auto-translation could even save minor languages from extinction. In Chile, for example, pressure to speak Spanish is eroding the indigenous language of the Mapuche people. Auto-translation might make it possible for the Mapuche to communicate with the outside world without abandoning their dialect. “It decreases the drive to consolidate into one dominant language if you can use your own,” says Jaime Carbonell, head of Carnegie Mellon’s Language Technologies Institute.

Welcome (or should I say bienvenue, maligayang pagdating, or välkommen) to a world where everyone can speak for themselves.

WIRED
