
Google pulls back the curtain on Meet's live translation feature

Google has pulled back the curtain on how it developed the live translation feature for Google Meet. The company’s audio engineering and product management teams, along with Google DeepMind, were able to achieve what was apparently a five-year goal in just two years.

Fredric, who leads the audio engineering team for Meet, explained that Google knew instantaneous translation was necessary for live calls, and that breakthroughs in large models finally made it possible. It’s no secret that live translation has been a goal across Google’s services, and engineers from Pixel, Cloud, and Chrome all worked with Google DeepMind to make real-time speech translation a reality.

The old way of doing things was apparently clunky, to say the least. Previous audio translation tech had to go through a multi-step process: it would transcribe the speech, translate the text, and then convert it back to speech. As you can imagine, this led to some serious latency issues, with delays of 10 to 20 seconds. It made natural conversation pretty much impossible. Also, the translated voices were generic, so they didn’t have the inflections and mannerisms of a person speaking, which takes away from the overall experience.
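To make that latency problem concrete, here is a minimal, purely illustrative Python sketch of a cascaded pipeline of that kind. The function names and simulated delays are hypothetical placeholders, not Google's actual implementation; the point is simply that each stage has to finish before the next can start, so the per-utterance delays stack up.

```python
import time

# Illustrative sketch only: these functions are hypothetical stand-ins,
# not Google's pipeline. The sleeps simulate per-stage processing time.

def transcribe(audio: bytes) -> str:
    """Stand-in for a speech-to-text step."""
    time.sleep(0.5)
    return "hola, ¿cómo estás?"

def translate_text(text: str, target: str = "en") -> str:
    """Stand-in for a text-to-text machine translation step."""
    time.sleep(0.5)
    return "hello, how are you?"

def synthesize(text: str) -> bytes:
    """Stand-in for a text-to-speech step with a generic voice."""
    time.sleep(0.5)
    return b"<translated audio>"

def cascaded_translate(audio: bytes) -> bytes:
    # Each stage waits for the previous one, so latencies add up;
    # in real systems this cascade is where multi-second delays came from.
    start = time.time()
    text = transcribe(audio)
    translated = translate_text(text)
    out_audio = synthesize(translated)
    print(f"end-to-end latency: {time.time() - start:.1f}s")
    return out_audio

if __name__ == "__main__":
    cascaded_translate(b"<source audio>")
```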

According to Huib, who is the lead for product management on the audio quality side, the real breakthrough came from “large models,” which are different from the large language models (LLMs) we hear so much about. These models are capable of “one-shot” translation, meaning you send in audio and the model almost immediately starts outputting the translated audio.
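As a rough sketch of what that "one-shot" behaviour means in practice, the hypothetical snippet below streams translated audio chunks out as source audio chunks come in, instead of running separate transcription, translation, and synthesis passes. The class and function names are assumptions made for illustration only.

```python
from typing import Iterator

# Conceptual sketch only: SpeechToSpeechModel is a hypothetical stand-in
# for the kind of "large model" described above, not a real API.

class SpeechToSpeechModel:
    def translate_stream(self, audio_chunks: Iterator[bytes]) -> Iterator[bytes]:
        """Consume source-language audio chunks and yield translated audio
        chunks as soon as they are ready, rather than waiting for the
        whole utterance to pass through a multi-stage cascade."""
        for chunk in audio_chunks:
            # A real model would carry acoustic context (voice, inflection)
            # through to the output; here we just echo a placeholder.
            yield b"<translated:" + chunk + b">"

def microphone() -> Iterator[bytes]:
    """Stand-in for a live audio capture source."""
    for i in range(3):
        yield f"<audio chunk {i}>".encode()

if __name__ == "__main__":
    model = SpeechToSpeechModel()
    for out_chunk in model.translate_stream(microphone()):
        print(out_chunk)  # playback could begin after the first chunk
```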

Of course, building something this complex wasn’t without its challenges. One of the biggest hurdles was making sure the translations were high-quality, since things like a speaker’s accent, background noise, or network issues can throw a wrench in the works. The Meet and DeepMind teams had to work together to refine these models and adjust them based on real-world performance. The teams even brought in linguists and other language experts to help them understand all the little nuances of translation and accents.

Some languages, like Spanish, Italian, Portuguese, and French, are easier to integrate because they share a close linguistic affinity. On the other hand, languages with different structures, like German, were much more challenging because of things like grammar and common idioms.

Right now, the model translates most expressions literally, which can sometimes lead to some pretty funny misunderstandings. But Huib and Fredric expect that future updates, using more advanced LLMs, will be able to grasp and translate these nuances more accurately, even picking up on things like tone and irony. Until then, just having a live translator you can rely on is a huge deal, so it’s a win overall.

Source: Google
