In just 15 minutes, iPhone can 'copy' your voice

Intelligent devices 2023-06-05 10:26:14 Source: Network

Stephen Hawking's "mechanical sound" is perhaps one of the most recognizable sounds in the world.

But that's not Hawking's own voice.

When Hawking lost his ability to speak to ALS, technology could not yet recreate his own voice. In fact, only a few people had access to speech synthesizers at all.

Today, people with ALS have more options for speech synthesis, but the cost and time required remain high, and adoption is still limited.

Recently, Apple announced a new accessibility feature called Personal Voice (not yet available). It not only lets users "back up" their own voice for free, but is also an interesting attempt to apply AI technology securely.

Just 15 minutes of 'tuning' to generate your voice

In an era when generative AI can imitate everything, using AI to imitate a person's voice no longer sounds novel. If anything, it feels like a security hazard.

What I am more curious about is how Apple can implement Personal Voice safely and efficiently.

Reportedly, iPhone, iPad, and Mac users only need to record 15 minutes of audio by following on-screen prompts, and Apple will use on-device machine learning to generate a voice that sounds like the user's.

By contrast, companies that provide professional speech synthesis for people who have lost their speech may require several hours of recordings made with professional equipment, at prices starting from several hundred dollars.

Another new accessibility feature, Live Speech, lets users type text and have it spoken aloud during phone calls, FaceTime calls, or face-to-face conversations, giving people who cannot speak another way to communicate.

By combining Personal Voice and Live Speech, users who have lost their speech can communicate in a generated voice that is close to their original one.

It's convenient, but how does Apple prevent someone from generating another person's voice from audio picked up online?

Randomization of materials.

While the 15 minutes of audio are being recorded, Apple randomly generates the text that users must read aloud, making it harder for anyone else to guess the material in advance.

Physical distance barrier.

Users must make the recording within 6 to 10 inches (approximately 15 to 25 centimeters) of the device.

During generation, all processing is done locally on the device by Apple's Neural Engine, with no need to upload anything to the cloud.

Once the voice has been synthesized, third-party applications that want to use Personal Voice must obtain explicit authorization from the user.

Even when a third-party application is authorized, Apple applies additional background protections so that the application cannot access the Personal Voice itself or the voice material the user recorded.

If you use the whole Apple "family bucket" of devices, you can generate your Personal Voice once and sync it across devices through iCloud, with end-to-end encryption.

Only after losing your voice do you understand how much it matters

People are emotional creatures, and sound is a strong emotional trigger.

Research has shown that when people hear their mother's voice, their bodies release oxytocin at levels similar to when they hug her. Another study suggests that hearing one's own voice strengthens self-motivation.

This sounds a bit abstract.

But once a voice is lost, its importance becomes apparent.

In March 2021, Ruth Brunton was diagnosed with ALS. By that Christmas, she could no longer speak.

About 25% of ALS patients have "bulbar onset" amyotrophic lateral sclerosis, which is mainly characterized by speech disorders or swallowing difficulties. Their speech may gradually become slurred and nasal, and they may eventually lose it altogether.

Brunton acted decisively and contacted a voice generation company immediately after her diagnosis.

It took a month of back-and-forth to record a corpus of more than 3,000 sentences, but the final result was not ideal.

That company used a technology called "unit selection".

Simply put, it generates speech through "concatenation": the recorded corpus is broken down into a large number of small speech units, which are then reassembled as needed.

With unit selection, the word 'Bob' can be split into different phonetic units, as illustrated by The Washington Post.

Speech generated this way is intelligible, but it can sound somewhat robotic and unnatural.
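As a rough illustration of the concatenation idea described above, here is a minimal Python sketch. It is not the vendor's actual system: the unit inventory, the pitch values, the placeholder "samples", and the pitch-difference join cost are all assumptions invented for this example. It picks one recorded unit per target sound so that neighboring units differ as little as possible in pitch, then glues the chosen units together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    phone: str      # phonetic label, e.g. "b"
    pitch: float    # average pitch in Hz (hypothetical feature)
    samples: tuple  # stand-in for a snippet of recorded audio


def join_cost(a, b):
    # Assumed cost: a larger pitch jump between neighbors sounds less smooth.
    return abs(a.pitch - b.pitch)


def select_units(targets, inventory):
    """Choose one unit per target phone, minimizing the total join cost."""
    # best[i][j] = (cheapest cost ending at candidate j, index of predecessor)
    best = [[(0.0, None) for _ in inventory[targets[0]]]]
    for i in range(1, len(targets)):
        previous = inventory[targets[i - 1]]
        row = []
        for candidate in inventory[targets[i]]:
            row.append(min(
                (best[i - 1][k][0] + join_cost(p, candidate), k)
                for k, p in enumerate(previous)
            ))
        best.append(row)
    # Walk back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(inventory[targets[i]][j])
        j = best[i][j][1]
    path.reverse()
    return path


def synthesize(targets, inventory):
    """Concatenate the samples of the selected units."""
    output = []
    for unit in select_units(targets, inventory):
        output.extend(unit.samples)
    return output


# Hypothetical inventory: two recordings of each sound at different pitches.
INVENTORY = {
    "b": [Unit("b", 110.0, (1, 2)), Unit("b", 180.0, (3, 4))],
    "aa": [Unit("aa", 175.0, (5, 6, 7)), Unit("aa", 120.0, (8, 9, 10))],
}

# 'Bob' as the phone sequence b-aa-b.
bob = synthesize(["b", "aa", "b"], INVENTORY)
```

Because every sound is stitched from fixed recordings, the seams between units remain audible whenever no well-matched neighbor exists in the inventory, which is one plausible reason concatenated speech can sound mechanical.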

The result blended Brunton's recordings with a Microsoft voice called 'Heather'. It not only bore no resemblance to her own voice, but even forced the British woman to 'speak' with an American accent.

https://s3.ifanr.com/wp-content/uploads/2023/06/real-ruth.m4a

Brunton's own true voice

https://s3.ifanr.com/wp-content/uploads/2023/06/heather.m4a

Trapped in this voice, Brunton "only spoke when necessary, no longer because she wanted to speak."

The playful banter she used to share with her husband disappeared, and Brunton became reluctant to join group conversations.

Even 'I love you', spoken in a voice that doesn't sound like one's own, seems to lose some of its meaning.

Six months later, Brunton and her husband fought to get back the original recordings and found another company that used AI to synthesize a voice much more like her own:

This may sound silly, but regaining my voice has given me more confidence.

John M. Costello, who heads the Augmentative Communication Program at Boston Children's Hospital, has noticed that patients who use more realistic generated voices seem better able to form deep connections with the people close to them.

On Christmas Day 2022, Brunton, who had regained her voice, recorded a holiday message with it.

https://s3.ifanr.com/wp-content/uploads/2023/06/ruth.m4a

However, Brunton contracted COVID-19 just after Christmas and died in February this year.

The night she left, her husband David held her hand all night:

We had two years to say goodbye. We agreed that we would say everything we wanted to say.

It's hard to imagine whether Brunton would have been able to freely say everything she wanted if she hadn't switched to a voice more like her own later on.

Accessible thinking ignites inspiration, AI ignites productivity

I have always believed that what accessible design unearths is the wealth of imagination that human diversity creates.

We approach people whose life experiences differ vastly from our own, listen to stories and experiences that are seldom told, and create new ways of living that we had never imagined but that are friendlier to more people.

Personal Voice can help ALS patients who have lost their speech regain their voice. It could also let me talk to others in my own voice when a "razor-blade" sore throat leaves me unable to speak. I even find it hard not to wonder whether I should use it to "back up" the voices of those close to me, in case they suddenly pass away one day.

And AI technology is the productive force that turns these imaginings into reality.

As editor Du mentioned before, although Apple has not joined the generative AI frenzy, it has always used AI to improve the user experience, by improving efficiency and protecting privacy.

Improving efficiency means improving machine learning algorithms and models that run locally.

In addition to Personal Voice, another accessibility feature in this preview, Point and Speak, also uses on-device machine learning.

In the future, visually impaired users will be able to turn their iPhone into a "point reader" by combining Point and Speak with VoiceOver in the iPhone's built-in Magnifier app: wherever they point a finger, the iPhone reads the text aloud.

Last year's Door Detection feature works on a similar principle, using on-device machine learning to help visually impaired users locate doors and to read aloud the information on the doors and surrounding signs.

As for privacy, in Steve Jobs's words: if you need users' data, "ask them. Every time."

This is particularly important in accessibility design. These features are built for people who are often overlooked by so-called "conventional design" and who are often more vulnerable, so it is all the more necessary to ensure that their privacy is not violated.

In this context, we can also initiate more discussions on data application rights and transparency.

In developing Personal Voice, Apple collaborated with the Team Gleason Foundation, a non-profit organization that helps ALS patients.

Blair Casey, the organization's CEO, has also been urging voice generation companies to agree on a standard set of recording material, so that users can record it once and compare the voices different companies generate, rather than gambling blindly as they do now.

Casey also advocates that voice generation companies give users their recorded voice data (since many users may lose their speech after recording), so that they can use it with other technologies in the future:

If better technology comes out, wouldn't you want to give it a try? If you can't retrieve your voice material, you won't be able to try it out.

AI may be the most powerful productive force of our time.

But how that force is used can be guided by people-oriented, accessible design.
