I’d love to see a feature where we can generate music or audio clips from natural language prompts. Imagine typing something like, “Create a calm piano melody for relaxation,” and the system generates a track that matches the vibe.
To make this even better, it would be amazing if this worked entirely offline using small, efficient language models (LLMs). This way, users could create music privately without relying on cloud services or heavy hardware requirements.
It's actually a feasible idea, though the quality of the 40 t/s WavTokenizer model is not great. I am waiting for the WavTokenizer author to release the 75 t/s model, which will hopefully have the fidelity for this purpose (see jishengpeng/WavTokenizer#42).
There's also the problem of obtaining a high-quality tagged/captioned audio dataset. The current state of open datasets is extremely dismal: low quality, full of synthslop, or just plain censored.
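To put the two token rates in perspective, here is a rough sketch of how many audio tokens a model would need to generate per clip. It assumes "t/s" means codec tokens per second of audio; the helper name `tokens_for_clip` and the 30-second clip length are made up for illustration and are not part of any existing API.

```python
def tokens_for_clip(duration_s: float, tokens_per_second: int) -> int:
    """Number of codec tokens needed to represent a clip of the given length."""
    return round(duration_s * tokens_per_second)


# Compare the two WavTokenizer rates mentioned above for a hypothetical 30 s clip.
for rate in (40, 75):
    print(f"{rate} t/s -> {tokens_for_clip(30, rate)} tokens for 30 s of audio")
```

The higher rate nearly doubles the sequence the language model has to produce, so better fidelity would come at the cost of longer generation for the same clip length.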