Google Translatotron – The Translator Tool

Posted on June 1, 2019


Google Translatotron is a technology that translates speech directly from one language to another without first converting it into text.

Google has presented Translatotron, its new end-to-end speech-to-speech translation model, on its blog. It is nothing new that the company has been refining its translation models for years. What is new is that these AI models can now imitate the voice of the speaker.

Google’s main objective is to help people who speak different languages communicate with each other. To build this speech-to-speech system, they propose a single sequence-to-sequence model that moves away from cascading systems. According to Google, Translatotron brings advantages in speed, in avoiding errors that compound between stages, and in the translation itself.

Imitating Accents And Pronunciation Of Translations

Google tells us that Translatotron is based on an end-to-end model, which it presents as having advantages over traditional cascade systems. With this, they intend to demonstrate that speech can be translated from one language to another without an intermediate text representation in either language, something that cascading systems require.
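The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration of the data flow, not Google's code: the function names and the toy stand-in stages are hypothetical, and real systems pass audio and spectrograms rather than strings.

```python
def cascade_translate(audio, recognize, translate_text, synthesize):
    """Cascade: three separate stages with text in the middle."""
    source_text = recognize(audio)              # speech -> source-language text
    target_text = translate_text(source_text)   # source text -> target text
    return synthesize(target_text)              # target text -> speech


def direct_translate(audio, model):
    """End-to-end: a single model maps speech to speech, with no text step."""
    return model(audio)


# Toy stand-ins (hypothetical) so the sketch can be run.
toy_recognize = lambda audio: audio.replace("audio:", "")
toy_translate = lambda text: {"hola": "hello"}.get(text, text)
toy_synthesize = lambda text: "audio:" + text
toy_direct_model = lambda audio: {"audio:hola": "audio:hello"}.get(audio, audio)

cascade_result = cascade_translate(
    "audio:hola", toy_recognize, toy_translate, toy_synthesize
)
direct_result = direct_translate("audio:hola", toy_direct_model)
```

In the cascade version, a mistake in the recognition stage is passed on to every later stage. The direct version has no intermediate text in which such a mistake could occur.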

The new Google Translatotron tool takes the source spectrograms and directly generates new spectrograms with the content translated into the target language. To do this, it uses a neural vocoder, which converts the output spectrograms into audio waveforms. It also uses an encoder capable of preserving the characteristics of the recorded voice.
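For readers unfamiliar with spectrograms, here is a minimal sketch of how one can be computed from raw audio samples: the signal is cut into short overlapping frames and the strength of each frequency is measured in every frame. The frame length, hop size, and test tone are assumptions chosen for illustration. Real systems typically add a window function and a mel-scale filterbank, and nothing here reflects Translatotron's actual settings.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Return per-frame magnitude spectra (frames x frequency bins)."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        spectrum = []
        # Plain discrete Fourier transform, keeping non-negative frequencies.
        for k in range(frame_len // 2 + 1):
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(frame)
            )
            spectrum.append(abs(coeff))
        frames.append(spectrum)
    return frames


# A pure tone that completes 8 cycles in every 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = spectrogram(tone)

# Index of the strongest frequency bin in each frame.
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in spec]
```

Each row of `spec` describes one short slice of time, and each column describes one frequency band. A model like the one described above reads a grid of this kind in the source language and writes a new one in the target language.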

The main novelty of Translatotron is that it does not work in cascade and that it adds elements such as an encoder capable of retaining the speech characteristics of the recorded voice.
When it comes to training Translatotron, Google uses a multitask objective. The model is trained to predict the source and target transcriptions while simultaneously generating the final spectrograms.
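A training objective of this kind can be sketched as one combined loss: a main term for the generated spectrogram plus auxiliary terms for the two transcriptions. The specific loss functions and the `aux_weight` value below are assumptions for illustration, not the values Google used.

```python
import math


def mse(predicted, target):
    """Mean squared error between two equally shaped spectrograms."""
    diffs = [
        (p - t) ** 2
        for pred_frame, target_frame in zip(predicted, target)
        for p, t in zip(pred_frame, target_frame)
    ]
    return sum(diffs) / len(diffs)


def nll(token_probs):
    """Average negative log-likelihood of the correct transcript tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def multitask_loss(pred_spec, target_spec, source_probs, target_probs,
                   aux_weight=0.5):
    """Spectrogram loss plus weighted transcript losses (hypothetical weight)."""
    main = mse(pred_spec, target_spec)
    auxiliary = nll(source_probs) + nll(target_probs)
    return main + aux_weight * auxiliary
```

Here `source_probs` and `target_probs` stand for the probabilities the model assigned to the correct transcript tokens. The transcript terms only guide training. At translation time the model still goes directly from spectrogram to spectrogram.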

In short, Google Translatotron takes in the speaker’s voice, preserves the characteristics of their speech, and generates an output spectrogram translated into the target language that keeps those characteristics.

Emulating The Natural Language

Creating natural-sounding voice models has long been an obsession for Google. We can see it in the way Google Assistant speaks. Naturalness is the main difference Google seeks from other assistants and models.

Google itself admits that Translatotron’s results fall below those of traditional cascade systems, but says they demonstrate the viability of end-to-end speech systems, which was its main objective.

For comparison, they first show us how translation works under a cascade model.
In short, Google is putting its end-to-end speech translation model on the table.

Google highlights that Translatotron can preserve the characteristics of natural speech, a key point for making assistants and translators work in a “more human” way. With this, they seek to create a good starting point for future research and the development of speech translation systems.