
Qifu Technology Robotics Team's Speech Paper Accepted by INTERSPEECH 2023


On June 1, it was reported that the paper "Eden TTS: A Simple and Efficient Parallel Text-to-Speech Architecture with Collaborative Duration-Alignment Learning" by Qifu Technology's robotics team was recently accepted by INTERSPEECH 2023, a leading international conference on speech and acoustics.

INTERSPEECH is the flagship international conference of the International Speech Communication Association (ISCA) and the world's largest comprehensive event in speech signal processing, with a strong international reputation and broad academic influence.


The research presented in the paper offers an innovative solution for text-to-speech application scenarios: an end-to-end differentiable, non-autoregressive neural network architecture for speech synthesis. Building on the close relationship between phoneme durations and alignment, the paper proposes a simple and efficient alignment learning method: first, a new energy-modified attention mechanism produces a guide alignment; the guide alignment is then used to compute phoneme durations; finally, a monotonic alignment is constructed from those durations. The method requires neither external alignment information nor an additional alignment loss function.
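To make the pipeline more concrete, the following is a minimal, illustrative sketch of the general idea described above (soft guide alignment, then phoneme durations, then a hard monotonic expansion). It uses ordinary scaled dot-product attention as a stand-in for the paper's energy-modified attention, which the article does not detail; all function names, shapes, and parameters are hypothetical rather than taken from the EdenTTS paper.

import torch
import torch.nn.functional as F

def guide_alignment(text_enc, mel_enc, temperature=1.0):
    # Soft text-to-frame alignment from scaled dot-product similarity.
    # text_enc: (B, T_text, D) phoneme encodings
    # mel_enc:  (B, T_mel, D) frame encodings
    # returns:  (B, T_mel, T_text) weights, each frame's row sums to 1
    energy = torch.bmm(mel_enc, text_enc.transpose(1, 2)) / (text_enc.size(-1) ** 0.5)
    return F.softmax(energy / temperature, dim=-1)

def durations_from_alignment(align):
    # Estimate per-phoneme durations by summing frame-level weights
    # over the frame axis; result has shape (B, T_text).
    return align.sum(dim=1)

def monotonic_expand(text_enc, durations):
    # Build a hard monotonic alignment by repeating each phoneme
    # encoding for its (rounded) duration, length-regulator style.
    dur = durations.round().clamp(min=1).long()
    expanded = [
        torch.repeat_interleave(text_enc[b], dur[b], dim=0)
        for b in range(text_enc.size(0))
    ]
    return torch.nn.utils.rnn.pad_sequence(expanded, batch_first=True)

# Toy usage with random encodings.
B, T_text, T_mel, D = 2, 6, 40, 16
text_enc = torch.randn(B, T_text, D)
mel_enc = torch.randn(B, T_mel, D)
align = guide_alignment(text_enc, mel_enc)
dur = durations_from_alignment(align)
frames = monotonic_expand(text_enc, dur)
print(align.shape, dur.shape, frames.shape)

In non-autoregressive TTS systems of this kind, a separate duration predictor typically supplies the durations at inference time, which is what allows all frames to be generated in parallel.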

In terms of business efficiency, the end-to-end differentiable design allows individual modules to be swapped out for different types of neural network components, giving the architecture good scalability and stability. Compared with mainstream autoregressive models, inference is more than 10 times faster, which meets the requirements of real-time speech synthesis.

In a multi-listener MOS evaluation, the method scored 4.32 out of 5: the naturalness and fluency of the synthesized speech approach those of the current best autoregressive models and are significantly better than comparable non-autoregressive models.

In addition, compared with similar methods, this approach cuts training time by more than 50%, significantly improving model training efficiency.

Qifu Technology has long been committed to investment and in-house research in the field of dialogue robots. Just two months earlier, another speech paper from Qifu Technology, "Multilevel Transformer for Multimodal Emotion Recognition," was accepted by the 48th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023).

"We are pleased to have achieved key results in understanding users and in optimizing expression. With Qifu GPT restructuring the company's business layers, we have greatly improved our ability to understand users, from speech to text and back to speech. Better recognition serves better expression and output, and we will continue to invest in cutting-edge technology to reshape the user experience," said Fei Haojun, Chief Algorithm Scientist at Qifu Technology. (One Orange)



