Imagine a professional musician who can experiment with new compositions without ever touching an instrument, or a small-business owner easily adding a soundtrack to their latest Instagram video ad. That is the promise of AudioCraft, our newest AI tool for generating high-quality, realistic audio and music from text.
AudioCraft consists of three models: MusicGen, AudioGen, and EnCodec. MusicGen, trained on Meta-owned and specifically licensed music, generates music from text prompts, while AudioGen, trained on public sound effects, generates audio from text prompts. Today we're excited to release an improved version of our EnCodec decoder, which enables higher-quality music generation with fewer artifacts. We're also releasing pre-trained AudioGen models, which let you generate environmental sounds and sound effects such as a dog barking, a car honking, or footsteps on a wooden floor. Finally, we're releasing all of the AudioCraft model weights and code.
AudioCraft: Generative AI for Audio Made Simple
While generative AI for images, video, and text has drawn plenty of attention, audio has tended to lag behind. Some work exists, but it is complicated and largely closed, so people cannot easily experiment with it. Generating high-fidelity audio of any kind requires modeling complex signals and patterns at varying scales. Music is arguably the hardest type of audio to generate because it comprises both local and long-range patterns, from a suite of notes to a global musical structure with multiple instruments.
The AudioCraft family of models produces high-quality audio with long-term consistency and is simple to use. Compared with prior work in the field, AudioCraft simplifies the overall design of generative models for audio, giving people the full recipe to play with the existing models Meta has developed over the past several years while also empowering them to push the limits and build their own models.
AudioCraft supports music, sound, compression, and generation, all in one place. Because it is easy to build on and reuse, people who want to design better sound generators, compression algorithms, or music generators can do so in the same code base and build on top of what others have done.
Looking ahead, we believe a solid open source foundation will stimulate innovation and complement the way we produce and listen to audio and music. With more controls, MusicGen could evolve into a new kind of instrument, much as synthesizers did when they first appeared.
We see the AudioCraft family of models as tools for musicians and sound designers to use for inspiration, rapid brainstorming, and iterating on their compositions in new ways. We're excited to see what people build with AudioCraft.