Generate Music Based on Natural Language Prompts Using Local LLMs #1324

Open
DjagbleyEmmanuel opened this issue Jan 20, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@DjagbleyEmmanuel

Hi team,

I’d love to see a feature where we can generate music or audio clips from natural language prompts. Imagine typing something like, “Create a calm piano melody for relaxation,” and the system generates a track that matches the vibe.

To make this even better, it would be amazing if this worked entirely offline using small, efficient language models (LLMs). This way, users could create music privately without relying on cloud services or heavy hardware requirements.

@LostRuins
Owner

It's actually a feasible idea, though the audio quality of the 40 t/s WavTokenizer model is not great. I am waiting for the WavTokenizer author to release the 75 t/s model, which will hopefully have the fidelity needed for this purpose: jishengpeng/WavTokenizer#42

There's also the problem of obtaining a high-quality tagged/captioned audio dataset. The current state of open datasets is extremely dismal: low quality, full of Synthslop, or just plain censored.

@LostRuins LostRuins added the enhancement New feature or request label Jan 20, 2025