Improved 'Running in Python' section in README #738

README.md (107 changes: 90 additions & 17 deletions)
@@ -59,7 +59,6 @@ You will need two files per voice:

The `MODEL_CARD` file for each voice contains important licensing information. Piper is intended for text to speech research, and does not impose any additional restrictions on voice models. Some voices may have restrictive licenses, however, so please review them carefully!


## Installation

You can [run Piper with Python](#running-in-python) or download a binary release:
@@ -72,7 +71,6 @@ If you want to build from source, see the [Makefile](Makefile) and [C++ source](
You must download and extract [piper-phonemize](https://github.com/rhasspy/piper-phonemize) to `lib/Linux-$(uname -m)/piper_phonemize` before building.
For example, `lib/Linux-x86_64/piper_phonemize/lib/libpiper_phonemize.so` should exist for AMD/Intel machines (as well as everything else from `libpiper_phonemize-amd64.tar.gz`).
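
As a rough sketch for an AMD/Intel machine, the steps might look like the following. The asset name comes from the text above, but the exact download URL and archive layout are assumptions, so verify them against the piper-phonemize releases page:

```sh
# Sketch only: the release URL and archive layout are assumptions --
# check the piper-phonemize releases page before relying on this.
mkdir -p "lib/Linux-$(uname -m)"
curl -L -o libpiper_phonemize-amd64.tar.gz \
  https://github.com/rhasspy/piper-phonemize/releases/latest/download/libpiper_phonemize-amd64.tar.gz
tar -xzf libpiper_phonemize-amd64.tar.gz -C "lib/Linux-$(uname -m)/"
# lib/Linux-x86_64/piper_phonemize/lib/libpiper_phonemize.so should now exist
```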


## Usage

1. [Download a voice](#voices) and extract the `.onnx` and `.onnx.json` files
@@ -113,20 +111,19 @@ The `piper` executable can accept JSON input when using the `--json-input` flag.
Optional fields include:

* `speaker` - string
  * Name of the speaker to use from `speaker_id_map` in config (multi-speaker voices only)
* `speaker_id` - number
  * Id of speaker to use from 0 to number of speakers - 1 (multi-speaker voices only, overrides "speaker")
* `output_file` - string
  * Path to output WAV file

The following example writes two sentences with different speakers to different files:

``` json
{ "text": "First speaker.", "speaker_id": 0, "output_file": "/tmp/speaker_0.wav" }
{ "text": "Second speaker.", "speaker_id": 1, "output_file": "/tmp/speaker_1.wav" }
```
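
For example, if those lines are saved to a file (here assumed to be named `sentences.jsonl`), they can be piped to `piper` together with the `--json-input` flag:

```sh
# sentences.jsonl is a hypothetical file containing the JSON lines shown above
cat sentences.jsonl | piper --model en_US-lessac-medium --json-input
```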


## People using Piper

Piper has been used in the following projects/papers:
@@ -149,33 +146,109 @@ See the [training guide](TRAINING.md) and the [source code](src/python).

Pretrained checkpoints are available on [Hugging Face](https://huggingface.co/datasets/rhasspy/piper-checkpoints/tree/main)


## Running in Python

See [src/python_run](src/python_run)

### **1. Installation**

First, install Piper using `pip`:

```sh
pip install piper-tts
```

### **2. Running Piper**

To generate speech from text, use the following command:

```sh
echo 'Welcome to the world of speech synthesis!' | piper \
  --model en_US-lessac-medium \
  --output_file welcome.wav
```

The first time you use a model, Piper will automatically download the required [voice files](https://huggingface.co/rhasspy/piper-voices/tree/v1.0.0).

To **control where voice models are stored**, specify `--data-dir` and `--download-dir`:

```sh
piper --model en_US-lessac-medium --output_file welcome.wav \
  --data-dir /path/to/data --download-dir /path/to/downloads
```
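
After the first run, the download directory should contain the two files needed per voice (the exact file names below are assumed from the model id):

```sh
# Assumed layout; file names follow the model id
ls /path/to/downloads
# en_US-lessac-medium.onnx  en_US-lessac-medium.onnx.json
```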

### **3. Using a GPU (Optional)**

If you want to **enable GPU acceleration**, install the `onnxruntime-gpu` package:

```sh
pip install onnxruntime-gpu
```

Then, run Piper with GPU support:

```sh
piper --model en_US-lessac-medium --output_file welcome.wav --cuda
```

**Note:**
* A properly configured CUDA environment is required.
* If you're using **NVIDIA GPUs**, you can set up CUDA using [NVIDIA's PyTorch containers](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch).
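
As an illustrative sketch only (the container image tag and the exact steps are assumptions, not part of this README), a container-based setup might look like this:

```sh
# Illustrative sketch: start an NVIDIA PyTorch container with GPU access
# (the image tag is only an example), then install Piper inside it.
docker run --rm -it --gpus all nvcr.io/nvidia/pytorch:23.10-py3
# Inside the container:
pip install piper-tts onnxruntime-gpu
echo 'GPU test.' | piper --model en_US-lessac-medium --output_file gpu_test.wav --cuda
```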

---

### **4. Running Piper in a Python Script**

If you prefer to use Piper directly in Python, follow these steps:

#### **Step 1: Install Dependencies**

Ensure you have Python 3.7+ installed. Optionally (but recommended), create a virtual environment and install Piper inside it:

```sh
# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate

# Install Piper
pip install piper-tts
```

#### **Step 2: Download a Voice Model**

Download a voice model before running Piper. Each voice needs two files: the `.onnx` model and its `.onnx.json` config. For example, for the `en_US-lessac-medium` voice used above:

```sh
mkdir -p models/en_US
# Paths follow the layout of the rhasspy/piper-voices repository on Hugging Face
wget -O models/en_US/en_US-lessac-medium.onnx \
  https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/lessac/medium/en_US-lessac-medium.onnx
wget -O models/en_US/en_US-lessac-medium.onnx.json \
  https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json
```

#### **Step 3: Generate Speech in Python**

Now you can use Piper from Python. The following is a minimal sketch using the `PiperVoice` class from `piper.voice` (shipped with `piper-tts`) together with the standard-library `wave` module:

```python
import wave

from piper.voice import PiperVoice

# Load the voice model (the .onnx.json config is expected next to the .onnx file)
voice = PiperVoice.load("models/en_US/en_US-lessac-medium.onnx")

# Synthesize speech from text directly into a WAV file
with wave.open("output.wav", "wb") as wav_file:
    voice.synthesize("Hello, I am Piper, your text-to-speech assistant.", wav_file)

print("Speech synthesis complete! Check output.wav")
```

#### **Step 4: Play the Audio**

Once the file `output.wav` is generated, play it with:

```sh
ffplay output.wav   # Linux/macOS (ffplay ships with ffmpeg)
start output.wav    # Windows
```