Reaction author(s)

Maite Martín

Professor in the Department of Computer Science at the University of Jaén and researcher in the SINAI research group (Intelligent Systems for Information Access)

The paper presents SEAMLESSM4T, a multimodal, multilingual machine translation model developed to overcome current limitations in text and speech translation, including translation between low-resource languages. This unified model handles speech-to-speech, speech-to-text, text-to-text and text-to-speech translation, with support for up to 101 source languages and up to 36 target languages in the speech modalities.

In my view, one of the highlights of the model is its focus on studying and incorporating under-resourced languages, such as Maltese and Swahili, which have historically been excluded from technological advances in machine translation. Because they lack large volumes of labelled data and dedicated resources, these languages are often left behind in the development of advanced linguistic tools. The work addresses this gap by creating a massive corpus of aligned speech and text data. The corpus combines manually labelled data with automatically generated resources, which significantly extends the scope and accuracy of the model in under-represented languages. This effort not only improves the accessibility of translation technologies for these communities, but also marks an advance in linguistic inclusion by democratising access to advanced communication tools.

An equally relevant aspect of the work is the decision to make these data and tools available to the scientific community for non-commercial use. This approach fosters collaborative research by allowing other developers and researchers to use these resources to further advance machine translation, especially in multilingual and multimodal contexts. The publication of these resources not only consolidates the model as a benchmark in technological innovation, but also drives the development of more inclusive and equitable solutions, laying the foundations for a more open and dynamic research ecosystem.

The model, however, also faces important limitations. Although it improves translation accuracy for low-resource languages, the results are still inferior to those obtained for high-resource languages. In addition, aspects such as real-time interaction, expressiveness of the translated speech, and mitigation of gender bias and toxicity remain open challenges. These limitations suggest that, although SEAMLESSM4T represents a significant advance, there is still work to be done to optimise its implementation in practical scenarios.
