# Running Piper TTS with JavaScript in the Bun Runtime
I was looking for a simple Text-to-Speech (TTS) solution for my side projects and came across Piper TTS. Intrigued, I decided to give it a try.
While Piper TTS is a fantastic project, it's written in Python. I wanted to run it in JavaScript—because why not?
Although there are a few JavaScript ports of Piper TTS, I enjoy reinventing the wheel, so I decided to create my own version.
First, I examined the Python code to understand how it works. Piper uses `onnx` and `onnxruntime` to run the TTS model, which is also possible in JavaScript using `onnxruntime-node`, a library compatible with the Bun runtime.

## Prerequisites
Start by initializing a new Bun project:

```bash
bun init -y
```
## Step 1: Setting Up Piper Phonemizer
The first step in running Piper TTS with JavaScript is phonemization, where text is converted into phonemes. Phonemes are then used to generate the audio.
### Download the Phonemizer Binary
You’ll need to download the Piper Phonemizer binary from the releases page and place it in a folder, such as `vendor`. For this example, I downloaded the `piper-phonemize_windows_amd64.zip` file and extracted it to `vendor/piper-phonemize`.

### Creating the Phonemizer Script

Create a file named `phonemize.ts` and add the following code:

```typescript
import { $ } from "bun";

const scriptPath = "./vendor/piper-phonemize/bin/piper_phonemize_exe.exe";
const espeakPath = "./vendor/piper-phonemize/share/espeak-ng-data/";

type Phonemes = {
  phoneme_ids: number[];
  phonemes: string[];
  processed_text: string;
  text: string;
};

export const phonemize = async (text: string, lang: string = "en-us") => {
  // Pipe the text into the phonemizer binary and request JSON output.
  const phonemized = await $`echo "${text}" | ${scriptPath} -l ${lang} --espeak-data ${espeakPath} --json`;
  const buffer = phonemized.stdout;
  const str = buffer.toString();

  // The phonemizer emits one JSON object per line (one per sentence).
  const phonemes = str
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));

  return phonemes as Phonemes[];
};
```
This script uses Bun’s shell API to execute the command, parses the phonemizer's output, and returns an array of phonemes.
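To sanity-check the phonemizer before wiring up the model, a quick smoke test like the following should print one line per sentence (the file name `phonemize-test.ts` is just an example):

```typescript
// phonemize-test.ts: hypothetical smoke test, run with `bun phonemize-test.ts`.
import { phonemize } from "./phonemize";

const result = await phonemize("Hello world. How are you?");

// One entry per sentence, each with the IPA phonemes and the
// numeric IDs the ONNX model expects as input.
for (const sentence of result) {
  console.log(sentence.processed_text, "->", sentence.phonemes.join(""));
  console.log("phoneme_ids:", sentence.phoneme_ids.length, "ids");
}
```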
## Step 2: Setting Up Piper TTS

### Download the Voice Model
Before running Piper TTS, download a voice model and its configuration file from the Hugging Face model hub. For this example, I used `en_US-ryan-high`.

Place the following files in a `voice` folder:

- `en_US-ryan-high.onnx`
- `en_US-ryan-high.onnx.json`
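If you prefer the command line, something like this should fetch both files (the paths below reflect the `rhasspy/piper-voices` repository layout at the time of writing; double-check them on the model hub):

```bash
mkdir -p voice
curl -L -o voice/en_US-ryan-high.onnx \
  "https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/ryan/high/en_US-ryan-high.onnx"
curl -L -o voice/en_US-ryan-high.onnx.json \
  "https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/ryan/high/en_US-ryan-high.onnx.json"
```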
### Install Dependencies

Install the `onnxruntime-node` library:

```bash
bun add onnxruntime-node
```
### Create a PCM-to-WAV Converter

Create a file named `audio.ts` with the following code to convert PCM data to WAV format:

```typescript
export function pcm2wav(pcmData: Float32Array, numChannels: number, sampleRate: number): ArrayBuffer {
  const bitDepth = 32;
  const bytesPerSample = bitDepth / 8;
  const blockAlign = numChannels * bytesPerSample;
  const byteRate = sampleRate * blockAlign;
  const dataSize = pcmData.length * bytesPerSample;

  // 44-byte RIFF/WAVE header followed by the raw sample data.
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  writeString(view, 0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeString(view, 8, "WAVE");
  writeString(view, 12, "fmt ");
  view.setUint32(16, 16, true); // "fmt " chunk size
  view.setUint16(20, 3, true); // audio format 3 = IEEE float
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitDepth, true);
  writeString(view, 36, "data");
  view.setUint32(40, dataSize, true);

  // Append the 32-bit float samples after the header.
  let offset = 44;
  for (let i = 0; i < pcmData.length; i++) {
    view.setFloat32(offset + i * bytesPerSample, pcmData[i], true);
  }

  return buffer;
}

function writeString(view: DataView, offset: number, string: string) {
  for (let i = 0; i < string.length; i++) {
    view.setUint8(offset + i, string.charCodeAt(i));
  }
}
```
This function generates a WAV file header and appends PCM audio data to it.
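To verify the converter on its own, a quick standalone check like this (the file name is arbitrary) should produce an audible one-second 440 Hz tone in `test.wav`:

```typescript
// audio-test.ts: illustrative check for pcm2wav, run with `bun audio-test.ts`.
import { pcm2wav } from "./audio";

const sampleRate = 22050;
const samples = new Float32Array(sampleRate); // one second of mono audio

// Fill the buffer with a 440 Hz sine wave at half amplitude.
for (let i = 0; i < samples.length; i++) {
  samples[i] = 0.5 * Math.sin((2 * Math.PI * 440 * i) / sampleRate);
}

await Bun.write("./test.wav", new Blob([pcm2wav(samples, 1, sampleRate)]));
```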
## Step 3: Running Piper TTS
Create a file named `piper.ts` and write the following code:

```typescript
import ort from "onnxruntime-node";
import { phonemize } from "./phonemize";
import { pcm2wav } from "./audio";

const VOICE_DIR = "./voice";
const VOICE = "en_US-ryan-high";
const VOICE_ONNX_PATH = `${VOICE_DIR}/${VOICE}.onnx`;
const VOICE_CONFIG_PATH = `${VOICE_DIR}/${VOICE}.onnx.json`;

const voiceConfig = await Bun.file(VOICE_CONFIG_PATH).json();

export const predict = async (text: string) => {
  const phonemes = await phonemize(text);
  const phonemeIds = phonemes.flatMap((phoneme) => phoneme.phoneme_ids);

  // The sample rate lives under `audio` in the voice config, while the
  // inference settings (noise_scale, length_scale, noise_w) are top-level keys.
  const sampleRate = voiceConfig.audio.sample_rate;
  const inference = voiceConfig.inference;

  // int64 tensors require BigInt64Array data.
  const feed = {
    input: new ort.Tensor("int64", BigInt64Array.from(phonemeIds.map(BigInt)), [1, phonemeIds.length]),
    input_lengths: new ort.Tensor("int64", BigInt64Array.from([BigInt(phonemeIds.length)])),
    scales: new ort.Tensor("float32", [inference.noise_scale, inference.length_scale, inference.noise_w]),
  };

  const ortSession = await ort.InferenceSession.create(VOICE_ONNX_PATH);
  const output = await ortSession.run(feed);

  // The model outputs raw 32-bit float PCM; wrap it in a WAV container.
  const pcm = output.output.data;
  const blob = new Blob([pcm2wav(pcm as Float32Array, 1, sampleRate)], { type: "audio/x-wav" });
  await Bun.write("./output.wav", blob);
};
```
Finally, call `predict` at the end of `piper.ts` (Bun supports top-level await) and run the file with `bun piper.ts`:

```typescript
await predict("Hello world");
```
This creates a file named `output.wav` in the root directory containing the synthesized speech.

## Conclusion
This guide demonstrates how to set up and run Piper TTS using JavaScript in the Bun runtime. With this foundation, you can build your own voice assistant or integrate TTS into your applications.
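As a sketch of what that integration could look like, here is a minimal HTTP endpoint around `predict`; the port and query parameter are arbitrary choices, and a real service would want a unique output file per request:

```typescript
// server.ts: illustrative sketch, run with `bun server.ts`
// and open http://localhost:3000/?text=Hello%20world
import { predict } from "./piper";

Bun.serve({
  port: 3000,
  async fetch(req) {
    const text = new URL(req.url).searchParams.get("text") ?? "Hello world";
    await predict(text); // writes ./output.wav
    return new Response(Bun.file("./output.wav"), {
      headers: { "Content-Type": "audio/x-wav" },
    });
  },
});
```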