Google is making progress on an artificial intelligence (AI) model that can support the world’s 1,000 most widely spoken languages. The Universal Speech Model (USM) is a family of speech models with two billion parameters, trained on 12 million hours of speech and 28 billion sentences of text. The model is already being used in Google’s products, such as YouTube.
Google trained the model in three steps. It began with self-supervised learning on audio-only data, which is available in large quantities across languages and therefore easier to scale. A second step used text data to improve the model’s quality and language coverage, and a final step fine-tuned it on downstream tasks such as automatic speech recognition (ASR).
Google found that the model’s word error rate (WER) was below 30 percent across 73 languages, an achievement for the company. For commonly used languages such as U.S. English, USM’s WER was six percent lower in relative terms than that of the state-of-the-art internal model the company used. Comparisons on publicly available datasets also showed that USM performs better at ASR and speech translation tasks.
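For readers unfamiliar with the metric, WER is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The sketch below illustrates that standard definition and what a "relative" reduction means. It is not Google's evaluation code; the function names, the whitespace tokenization, and the sample figures (a baseline WER of 10 percent) are assumptions made purely for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()   # assumption: simple whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Hypothetical figures: a baseline at 10% WER and a new model at 9.4%.
    print(relative_reduction(0.10, 0.094))
```

Under these made-up figures, a six percent relative reduction moves the WER from 10 percent to 9.4 percent, which is a much smaller change than a drop of six percentage points would be.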
The company is now working on using the model architecture and training pipeline to build its 1,000-language model. This development is part of Google’s push to incorporate AI into its products, which will be showcased at its annual I/O event. This year, the event is expected to feature more than 20 AI-powered products. As Google competes with Microsoft on AI-powered products, this boost is much needed.