Voice Generator: OpenAI introduces AI model for voice cloning


Following the AI video model Sora, OpenAI has introduced “Voice Engine,” an AI model for cloning user-defined voices, which is also used by HeyGen for its eponymous lip-syncing AI video translator. From text input and a 15-second audio recording, Voice Engine can generate natural-sounding voices very close to the original speaker’s, as the audio examples in the company’s blog show.


With Voice Engine, OpenAI now appears to be focusing more on voice cloning, following Suno AI, ElevenLabs, and others. Voice Engine aims to translate and generate content in the speaker’s voice “so that YouTubers and businesses can reach more people smoothly and in their own voice.” According to OpenAI, the AI needs only a 15-second recording of a human speaker’s voice to copy it.

OpenAI is aware of the potential for abuse, especially in an election year. The company therefore says it is working with partners across the government, media, entertainment, education, and civil society sectors and taking their feedback into account during development. Since the end of last year, selected partners have been testing Voice Engine to gather experience. In early January, the company expanded its Terms of Use for AI tools accordingly.

According to its own information, the company is currently “deciding on a preview, but not a full release of this technology.” Based on the results of these small-scale tests, OpenAI says it will make a “more informed decision” in the future about “whether and how to use this technology on a large scale.”

According to the blog post, the company began developing the technology in late 2022; this work led to the integration of a speech feature into ChatGPT. Other projects using Voice Engine include “Age of Learning,” a reading aid for children and non-readers, content translation, and support for people who cannot speak. Dimagi Inc., a company specializing in the healthcare sector, also relies on Voice Engine and GPT-4, and the company Livox works with OpenAI on its communications app as well.

OpenAI cites as an example Lifespan’s Norman Prince Neurosciences Institute, which serves as a teaching institute and is using the communications app in a pilot project. The goal is to give patients back voices they have lost to oncological or degenerative diseases. For example, the voice of a young patient who could no longer speak fluently because of a brain tumor was restored; a video recorded for school served as input for the voice model.

Given the risks associated with generating human-like speech, OpenAI has implemented a number of security measures. These include, among other things, watermarking to trace the origin of any audio generated by Voice Engine, as well as proactive monitoring of its use.

OpenAI emphasizes that any widespread adoption of synthetic speech technology should be accompanied by ways for people to recognize such speech. According to OpenAI, the public should be educated to understand the capabilities and limitations of AI technologies, including the possibility of deceptive AI-generated content.
