Riffusion’s AI generates music from text using visual sonograms

[Image: An AI-generated image of musical notes exploding forth from a computer monitor. Credit: Ars Technica]

On Thursday, a pair of tech hobbyists released Riffusion, an AI model that generates music from text prompts by creating a visual representation of sound and converting it to audio for playback. It uses a fine-tuned version of the Stable Diffusion 1.5 image synthesis model, applying visual latent diffusion to sound processing in a novel way.

Created as a hobby project by Seth Forsgren and Hayk Martiros, Riffusion works by generating sonograms, which store audio in a two-dimensional image. In a sonogram, the X-axis represents time and the Y-axis represents frequency, with the brightness of each pixel encoding the amplitude of that frequency at that moment.
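To illustrate the sonogram idea, here is a minimal pure-Python sketch that turns a list of audio samples into a time-frequency magnitude image. This is a simplified, hypothetical stand-in for the concept, not Riffusion's actual pipeline (which works on mel-scaled spectrogram images fed through Stable Diffusion):

```python
import cmath
import math

def spectrogram(samples, window_size=64, hop=32):
    """Toy magnitude spectrogram: one column per time frame, one row per
    frequency bin. Illustrative only; real pipelines use an FFT library."""
    frames = []
    for start in range(0, len(samples) - window_size + 1, hop):
        frame = samples[start:start + window_size]
        # Hann window reduces spectral leakage at the frame edges
        windowed = [s * 0.5 * (1 - math.cos(2 * math.pi * i / (window_size - 1)))
                    for i, s in enumerate(frame)]
        # DFT magnitudes for the lower half of the bins
        # (a real-valued signal's spectrum is symmetric)
        mags = []
        for k in range(window_size // 2):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / window_size)
                      for n, x in enumerate(windowed))
            mags.append(abs(acc))
        frames.append(mags)  # frames[t][k]: X-axis = time, Y-axis = frequency
    return frames

# A 1000 Hz pure tone at an 8000 Hz sample rate: with 64-sample windows,
# each bin spans 8000/64 = 125 Hz, so the energy should peak at bin 8.
sr = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(512)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(peak_bin)  # → 8
```

Riffusion runs this idea in reverse as well: after the diffusion model produces a new sonogram image, an inverse transform converts the image back into an audible waveform.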

→ Continue reading at Ars Technica
