Generate Music Based on Natural Language Prompts Using Local LLMs #1324

DjagbleyEmmanuel · 2025-01-20T14:27:58Z

Hi team,

I’d love to see a feature where we can generate music or audio clips from natural language prompts. Imagine typing something like, “Create a calm piano melody for relaxation,” and the system generates a track that matches the vibe.

To make this even better, it would be amazing if this worked entirely offline using small, efficient local LLMs. This way, users could create music privately without relying on cloud services or heavy hardware.

LostRuins · 2025-01-20T15:11:13Z

It's actually a feasible idea, though the quality of the 40 t/s WavTokenizer is not great. I am waiting for the WavTokenizer author to release the 75 t/s model, which hopefully has the fidelity for this purpose: jishengpeng/WavTokenizer#42
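As a rough illustration of what the two rates mean for generation, here is a minimal sketch that assumes "t/s" refers to audio tokens per second of audio. The helper name `tokens_needed` and the 30-second clip length are hypothetical and chosen only for the example.

```python
def tokens_needed(duration_s: float, tokens_per_second: int) -> int:
    """Audio tokens a model would have to generate for a clip of this length."""
    return round(duration_s * tokens_per_second)


# Compare the two tokenizer rates mentioned above for an assumed 30 s clip.
for rate in (40, 75):
    print(f"{rate} t/s: {tokens_needed(30, rate)} tokens for a 30 s clip")
```

The higher rate spends more tokens on each second of audio, so it can carry more detail, but the model has to generate proportionally more tokens for a clip of the same length.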

There's also the problem of obtaining a high-quality tagged/captioned audio dataset. The current state of open datasets is extremely dismal: low quality, full of Synthslop, or just plain censored.

LostRuins added the enhancement (New feature or request) label on Jan 20, 2025
