Nvidia has developed Fugatto, a new artificial intelligence (AI) model that can generate and transform any mix of music, voices and sounds from text prompts. The company promises that the tool can create melodic compositions “never heard before.”
Fugatto (Foundational Generative Audio Transformer Opus 1) is the result of research carried out by the company's engineers. Rafael Valle, manager of applied audio research at Nvidia, says that “the intention was to create an AI engine that could understand and produce audio pieces just as humans do.”
How does Nvidia’s new AI model work?
According to the researchers, Fugatto is the first foundational generative AI model to show emergent properties. It can handle several different tasks contained in a single prompt, thanks to the interaction of skills that were trained individually. “This is a great advance towards a future in which unsupervised multitask learning in audio synthesis and transformation arises from the scale of data and models,” explains Valle.
The engine builds on voice modeling, vocoding and audio understanding technologies. It has 2.5 billion parameters and was trained on a bank of Nvidia DGX systems with 32 H100 Tensor Core GPUs. The engineers used a technique known as ComposableART, which lets the model combine instructions and small groups of data that it learned separately. As a result, the system can process complex requests and responses. “Fugatto can make a trumpet bark or a saxophone meow. [It] generates high-quality singing voices and sounds that change over time. It facilitates the creation of soundscapes never seen before,” say its developers.
Nvidia says that its new AI was trained on “a combined data set containing millions of audio samples.” The company did not give details about the sources of these materials. It said only that the data collection, research and development took more than a year.
The company led by Jen-Hsun Huang has been accused in the past of using content without authorization to train its AI models. The news organization Proof News found that the subtitles of 173,536 YouTube videos, taken from more than 48,000 channels, were used without consent by firms such as Anthropic, Nvidia, Apple and Salesforce to train their AI models. YouTube, Google’s video site, has explicit rules prohibiting this practice.
Nvidia has not revealed whether Fugatto will be available to the general public. Even so, the company suggests that “music producers could use the model to create the prototype of a song. An advertising agency would be able to apply this technology to adapt a campaign to multiple regions or situations, applying different accents and emotions to the voice-overs. Additionally, video game developers could use the model to modify pre-recorded assets in their title to adapt to changing actions. All from text instructions and optional audio inputs.”