The following is an article published on the topic of machine translation by Masaharu Hayataki, an influential Japanese translator, copywriter and consultant.
“Considering how quickly technology evolves and what machine learning is capable of, concerns about job security among translators are understandable.”
In this article, I am going to explain why you can never rely on machine translation for your website and marketing materials, and why, no matter how much it evolves, machine translation can never replace really good translators.
Machines are not smart enough to learn a language
Machines are simply not smart enough to perform the intricate process of language learning:
“An intelligent being cannot treat every object it sees as a unique entity unlike anything else in the universe. It has to put objects in categories so that it may apply its hard-won knowledge about similar objects, encountered in the past, to the object at hand. But whenever one tries to program a set of criteria to capture the members of a category, the category disintegrates.” (How the Mind Works)
Machine translation is all about rules. Before machine learning, machine translation systems used large bilingual dictionaries and hard-coded rules for fixing the word order in the final output, which, as you can imagine, did not produce good translations.
That was in the 1950s. By the 21st century, computers had acquired far more computing power and storage, which enabled better-quality translations. However, the basic mechanism hasn’t changed: the system still relies on dictionaries and rules, and every single rule of translation must still be taught to the computer.
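The dictionary-plus-reordering approach described above can be illustrated with a minimal sketch. Everything in it is a made-up assumption for illustration: the four-word English-to-Spanish lexicon, the adjective list, and the single reordering rule are not taken from any real system.

```python
# Toy bilingual dictionary and adjective list (hypothetical, illustration only).
LEXICON = {"the": "el", "red": "rojo", "car": "coche", "house": "casa"}
ADJECTIVES = {"red"}


def rule_based_translate(sentence: str) -> str:
    """Translate word for word after applying one hard-coded word-order rule."""
    words = sentence.lower().split()
    reordered = []
    i = 0
    while i < len(words):
        # Hard-coded rule: move an adjective behind the word that follows it.
        if words[i] in ADJECTIVES and i + 1 < len(words):
            reordered.extend([words[i + 1], words[i]])
            i += 2
        else:
            reordered.append(words[i])
            i += 1
    # Dictionary lookup; words missing from the lexicon pass through unchanged.
    return " ".join(LEXICON.get(word, word) for word in reordered)


print(rule_based_translate("the red car"))
print(rule_based_translate("the red house"))
```

The first call happens to give acceptable Spanish ("el coche rojo"). The second gives "el casa rojo" because nothing in the rules knows about grammatical gender; correct Spanish would be "la casa roja". Each such gap needs yet another hand-written rule.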
A bachelor, of course, is simply an adult human male who has never been married. But now imagine that a friend asks you to invite some bachelors to her party. What would happen if you used the definition to decide which of the following people to invite?
Arthur has been living happily with Alice for the last five years. They have a two-year-old daughter and have never officially married.
Charlie is 17 years old. He lives at home with his parents and is in high school.
David is 17 years old. He left home at 13, started a small business, and is now a successful young entrepreneur leading a playboy’s lifestyle in his penthouse apartment.
The list, which comes from the computer scientist Terry Winograd, shows that the straightforward definition of ‘bachelor’ does not capture our intuitions about who fits the category. Knowing who a bachelor is is just common sense, but there’s nothing common about common sense. Somehow it must find its way into a human or robot brain. And common sense is not simply an almanac about life that can be dictated by a teacher or downloaded like an enormous database. No database could list all the facts we tacitly know, and no one ever taught them to us. (How the Mind Works)
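A minimal sketch of what happens when the dictionary definition is applied literally to the three people above. The field names are hypothetical, and two things are assumed rather than stated in the text: that Arthur is an adult, and that a 17-year-old does not count as an adult.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    is_adult: bool      # assumption: 17-year-olds are not adults
    is_male: bool
    ever_married: bool


def is_bachelor(person: Person) -> bool:
    """Literal definition: an adult human male who has never been married."""
    return person.is_adult and person.is_male and not person.ever_married


candidates = [
    Person("Arthur", is_adult=True, is_male=True, ever_married=False),
    Person("Charlie", is_adult=False, is_male=True, ever_married=False),
    Person("David", is_adult=False, is_male=True, ever_married=False),
]

invited = [p.name for p in candidates if is_bachelor(p)]
print(invited)
```

Under these assumptions the rule invites only Arthur, who lives with a partner and a daughter, and rejects David, whose lifestyle fits the everyday idea of a bachelor far better. The definition is applied correctly and the result still contradicts common sense.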
In How the Mind Works, Steven Pinker explains how complex and elegant the process of language learning in the human brain is.
Before machine learning, teaching a machine to recognize cats was impossible because you had to explain what a cat looks like. For example, “if an object has 4 legs, big eyes, little pointy ears and a hairy body, then it is a cat.” Of course, many things have 4 legs and big eyes, so computers would confuse cats and dogs. It was also necessary to explain what legs and eyes are, and so on. Defining cats and creating a corresponding algorithm was nearly impossible.
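The hand-written cat rule quoted above can be written down directly. This is only a sketch: the feature names are hypothetical, and it assumes something else has already extracted those features from the image, which is the hard part the paragraph mentions.

```python
def looks_like_cat(obj: dict) -> bool:
    """Hand-written rule: 4 legs, big eyes, little pointy ears, hairy body."""
    return (
        obj.get("legs") == 4
        and obj.get("big_eyes", False)
        and obj.get("pointy_ears", False)
        and obj.get("hairy", False)
    )


# Hypothetical feature descriptions of three objects.
cat = {"legs": 4, "big_eyes": True, "pointy_ears": True, "hairy": True}
small_dog = {"legs": 4, "big_eyes": True, "pointy_ears": True, "hairy": True}
table = {"legs": 4, "big_eyes": False, "pointy_ears": False, "hairy": False}

for name, obj in [("cat", cat), ("small dog", small_dog), ("table", table)]:
    print(name, looks_like_cat(obj))
```

The rule accepts the cat and rejects the table, but it also accepts the small dog, because nothing in the four criteria separates the two animals.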
Machine learning is the process of feeding computers millions of pictures in which objects such as cats, dogs, humans, churches, or specific people have been tagged. The computer then creates its own algorithms to recognize cats and other objects.
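As a rough illustration of learning from tagged examples instead of hand-written rules, here is a minimal nearest-centroid classifier. It is a deliberately simple stand-in, not how production image recognition works, and the two numeric features and all training values are invented for the example.

```python
from collections import defaultdict
from math import dist

# Hypothetical tagged examples: (ear pointiness, snout length), each on a 0..1 scale.
TRAINING = [
    ((0.9, 0.2), "cat"),
    ((0.8, 0.3), "cat"),
    ((0.3, 0.8), "dog"),
    ((0.4, 0.9), "dog"),
]


def train(examples):
    """Average the feature vectors of each tag to get one centroid per tag."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(centroids, features):
    """Pick the tag whose centroid is closest to the new example."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = train(TRAINING)
print(classify(centroids, (0.7, 0.3)))
print(classify(centroids, (0.2, 0.7)))
```

No one wrote a rule saying what a cat is. The boundary between the two tags comes entirely from the tagged data, so different data would produce a different boundary.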
Learning a language is more difficult than that because language learning includes learning what a feeling, a metaphor, or a lie is. A picture of a cat is always just that, a picture of a cat, but the word “bachelor” can mean many different things. Additionally, the definition of some words can be different from one place to another, and the meaning of a word can always change in the future.
Translation is way too irregular for computers to create reliable algorithms
Translation is much more complex than learning isolated words of a language. From one language to another, the grammar, order of words, and metaphors are different.
The Japanese language has three scripts: hiragana, katakana, and kanji. “こころ”, “ココロ” and “心” are all the same word, “KOKORO,” written in hiragana, katakana, and kanji respectively. It means something like mind, heart, or soul.
“こころ,” which is written in hiragana, gives a warm impression to readers. “ココロ,” which is written in katakana, sounds cold and artificial because katakana is often used for foreign words, and many engineering terms are foreign words. Science fiction comics or novels would use ココロ for the minds of robots and こころ for the minds of humans. The kanji “心” often has a more masculine or formal connotation.
Teaching those differences to a machine is a lot more difficult than teaching a machine to recognize cats. Maybe you can have machines learn it by tagging こころ in a book as warm or non-robotic like you would tag cats in images. However, unlike images of cats, nuances of words can be different in different contexts or even with the use of different fonts and colors. There are too many variables for machines to learn.
Creating machines that can learn to find the closest match in nuance between a pair of languages is impossible.
Statistically, machine translation won’t be able to perform creative translation.
Machines can’t learn the subtleties of translation. Feed them 1000 science fiction books in one language along with their Japanese translations, and they will figure out statistically that “mind” is translated to ココロ, not こころ or 心, when the word is near technology-related words. Machines won’t know what those words mean, but they can make the right choice of words through this kind of process.
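The statistical choice described above can be sketched with simple co-occurrence counts. The four training sentences, the stopword list, and the function names are all hypothetical, and real statistical systems are far more elaborate; the point is only that the choice is made by counting neighboring words.

```python
from collections import Counter, defaultdict

# Hypothetical aligned examples: an English sentence and the script a human
# translator chose for "mind" in the Japanese version.
EXAMPLES = [
    ("the robot mind processed the signal", "ココロ"),
    ("the android mind rebooted", "ココロ"),
    ("her mind was warm and gentle", "こころ"),
    ("a gentle mind full of kindness", "こころ"),
]
STOPWORDS = {"the", "a", "her", "was", "and", "of", "mind"}


def train(examples):
    """Count which context words appear alongside each rendering."""
    counts = defaultdict(Counter)
    for sentence, rendering in examples:
        counts[rendering].update(
            word for word in sentence.split() if word not in STOPWORDS
        )
    return counts


def choose_rendering(counts, sentence):
    """Pick the rendering whose training contexts overlap most with the sentence."""
    words = [word for word in sentence.split() if word not in STOPWORDS]
    return max(counts, key=lambda r: sum(counts[r][word] for word in words))


counts = train(EXAMPLES)
print(choose_rendering(counts, "the robot mind failed"))
print(choose_rendering(counts, "a warm mind"))
```

The sketch picks ココロ next to "robot" and こころ next to "warm" without any notion of what either word means. A sentence with no context word seen in training falls back to whichever rendering happens to come first.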
Statistical machine translation came into wide use at the beginning of the 21st century. With statistical translation, rather than coding expert knowledge into software and creating a lexicon and grammatical rules for translating one specific language into another, large amounts of text in both languages are fed to the software, and the computer is set to analyze this data. Google is already working on it by asking users of Google Translate to proofread machine-translated text.
Statistical translation lets the machine learn complex grammar and the appropriate choice of words by itself.
However, this is still not good enough. Why is that? Marketing translation and creative translation are not just translation. They need a process of transcreation. A language is always linked to its culture, so no matter how perfect a translation is, if the cultural elements of the language aren’t taken into account, the translated text will fail to communicate with readers the way the writer intended.
Westerners’ confidence often appears rude or arrogant to Japanese people. Japanese modesty often appears as a lack of confidence to Westerners. References to Game of Thrones or the Bible are very common in Western content but not in Japanese content.
Intuitively, we all know what good writing and bad writing are. If you are given two books, one written by a professional writer and another written by a high school student, you will most likely be able to guess which was written by the pro, but you will not be able to explain why the pro’s writing is better.
Good writing is a very subjective concept, so I cannot tell you what exactly a transcreation should be, but we know a machine won’t be creative enough to add the explanation of a biblical reference in the translated version or to change a Game of Thrones reference to a Naruto reference.
Good writing often looks like really bad writing to computers.
Machine translation built on machine learning works by giving the machine data, letting people tag the data, and letting the machine create a sorting algorithm. The machine creates the rules.
Good writing often breaks the rules.
Jack Kerouac is one of the best writers in history, but his writing is often grammatically incorrect and very unorthodox. If a machine were to judge the quality of his writing, it would rate it very poorly.
Very well-known copywriting also looks bad to computers:
“Got milk?” “Think out of the box.” “I’m loving it.” “Just do it.”
Computers will never come up with this type of copywriting. Taken literally, these slogans have nothing to do with computers, food, or sports.
Good writing is more than just writing. It’s a very complex and unquantifiable thinking process that machines cannot copy unless we invent machines that can pass a very high-level Turing test. For now, that is really just science fiction, and if it ever happens, people will likely start claiming that robots have souls and should have human rights.