tts: add speaker file support #12048

Open
wants to merge 1 commit into
base: master

dm4
Contributor

@dm4 dm4 commented Feb 24, 2025

  • Added support for TTS speaker files, including a new command-line option --tts-speaker-file to specify the file path.
  • Implemented JSON handling in tts.cpp to load and parse speaker data, so that audio generation can use the speaker profile from the file.

@dm4 dm4 force-pushed the dm4/tts-speaker-file branch from ea8711d to bf3f5ee Compare February 24, 2025 11:02
@ngxson
Collaborator

ngxson commented Feb 26, 2025

@edwko Could you please take a look at this PR?

@edwko

edwko commented Feb 27, 2025

@ngxson @dm4 Looks good! Just a couple of thoughts: this would handle only v0.2, so it might make sense to do this more dynamically, maybe by adding versioning logic similar to PR #11287.

The version could come from common_get_builtin_chat_template, or I could add more metadata to the speaker files (like a version field) so the prompt can be constructed for the specific version.

// Something like this:

// Assumes `json` is an alias for a nlohmann JSON type.
static double get_speaker_version(const json & speaker) {
    if (speaker.contains("version")) {
        return speaker["version"].get<double>();
    }
    // The version could also come from the model itself:
    // if (common_get_builtin_chat_template(model) == "outetts-0.3") {
    //     return 0.3;
    // }
    return 0.2; // default when the speaker file has no version field
}

static std::string audio_text_from_speaker(const json & speaker) {
    std::string audio_text = "<|text_start|>";
    const double version = get_speaker_version(speaker);

    if (version <= 0.3) {
        // v0.3 separates words with <|space|>; earlier versions use <|text_sep|>
        const std::string separator = (version == 0.3) ? "<|space|>" : "<|text_sep|>";
        for (const auto & word : speaker["words"]) {
            audio_text += word["word"].get<std::string>() + separator;
        }
    } else {
        // Future version support could be added here
    }

    return audio_text;
}

// static std::string audio_data_from_speaker(json speaker) would also need some adjustments to support different versions.
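
One possible variation on the snippet above, as a rough sketch only: the version could be mapped to an enum once, so the separator choice does not depend on comparing doubles such as 0.3 for equality. The `speaker_version` enum, the `speaker_profile` struct (a stand-in for the parsed speaker JSON), and `parse_speaker_version` are hypothetical names for illustration and are not part of this PR. Only the two separators and the v0.2 default come from the comment above.

```cpp
#include <string>
#include <vector>

// Hypothetical enum, not part of the PR: one value per supported speaker format.
enum class speaker_version { v0_2, v0_3 };

// Hypothetical stand-in for the parsed speaker JSON.
struct speaker_profile {
    std::string version;             // e.g. "0.3"; empty if the file has no version field
    std::vector<std::string> words;  // the "word" entries of the speaker file
};

// Map the version string to an enum. Anything other than "0.3" (including a
// missing version) falls back to v0.2, mirroring the default in the snippet above.
static speaker_version parse_speaker_version(const std::string & version) {
    return version == "0.3" ? speaker_version::v0_3 : speaker_version::v0_2;
}

// Build the text part of the prompt with the version-specific separator.
static std::string audio_text_from_speaker(const speaker_profile & speaker) {
    const bool is_v0_3 = parse_speaker_version(speaker.version) == speaker_version::v0_3;
    const std::string separator = is_v0_3 ? "<|space|>" : "<|text_sep|>";

    std::string audio_text = "<|text_start|>";
    for (const auto & word : speaker.words) {
        audio_text += word + separator;
    }
    return audio_text;
}
```

With this sketch, a profile with version "0.3" and the words "hello" and "world" yields `<|text_start|>hello<|space|>world<|space|>`, while a profile without a version uses `<|text_sep|>` instead. Whether the version is best stored as a string or a number in the speaker file is an open design choice here.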
