
Himalaya's self-developed Everest speech generation model achieves "rapid cloning" of a voice within 5 seconds

On October 31, at the 2023 Yunqi Conference, Himalaya showcased its latest self-developed voice technology, including the Himalaya Everest speech generation model and its second-generation intelligent voice interaction system.

According to reports, the Himalaya Everest speech generation model can quickly customize a voice's timbre and style. The technology supports real-time timbre conversion across a wide range of scenarios, giving a voice a creative "voice changing" ability, much like painting different "skins" onto it. Previously, the Himalaya Everest Laboratory team had created more than 37,000 audiobook albums through AIGC, and the daily playback time of AIGC works had exceeded 2.5 million hours.


At the Yunqi Conference, Himalaya showcased its self-developed Everest speech generation model. This large model is a collaboration between the Himalaya Everest Homo sapiens team and Northwestern Polytechnical University's ASLPLab. Built on a self-developed framework, it performs dense training of audio and text within a unified framework for speech generation tasks. It supports zero-shot learning and transfer of speech style and timbre, and allows any style to be combined with any timbre. The cloud-native big data platform that Himalaya built on Alibaba Cloud Data Lake 3.0 supplies a massive amount of high-quality data for training the large speech model, making it an indispensable "data engine" for the model.

According to Lu Heng, Chief Scientist of Himalaya and Head of the Everest Laboratory, the Himalaya speech generation model has made significant breakthroughs in voice customization, achieving "rapid cloning" of a voice within 5 seconds. With a very small amount of data, the model can clone a voice's basic timbre with 90% similarity and generate customized audio in just 10 seconds. In the future, this technology is expected to play a role in short video creation, digital voice dubbing, human-machine dialogue, celebrity IP replication, and other fields, generating enormous potential value and effectively addressing communication pain points in business scenarios.

Lv Ruitao, a senior product expert at the Himalaya Everest Laboratory, explained at the event that the speech model adopts a new speech codec based on speech vectors and semantic tokens. The speech vectors carry the acoustic details needed for high-fidelity speech reconstruction, while the semantic tokens (LLM) focus on modeling the linguistic content of speech, ultimately enabling efficient generation of highly expressive, high-fidelity speech (dialogue) content. In application scenarios, the speech model can be applied to tasks such as speech content generation, spoken dialogue, real-time timbre conversion, speech style transfer, speech-to-speech cross-lingual translation, and speaker anonymization.

Himalaya also showcased its second-generation intelligent voice interaction system, which is based on Alibaba Cloud's "Tongyi Qianwen" model and centered on "Bobo", the mascot of Himalaya's children's brand. The system improves Bobo's ability to hold natural, coherent conversations and brings out the character of the "Bobo" IP. The intelligent voice interaction system already provides services through the Himalaya Children's App and Himalaya, and Boboqiu offers companionship-style conversation features for parents and children. (One Orange)
