# Running Piper TTS with JavaScript in the Bun Runtime
I was looking for a simple Text-to-Speech (TTS) solution for my side projects and came across Piper TTS. Intrigued, I decided to give it a try.
While Piper TTS is a fantastic project, it's written in Python. I wanted to run it in JavaScript—because why not?
Although there are a few JavaScript ports of Piper TTS, I enjoy reinventing the wheel, so I decided to create my own version.
First, I examined the Python code to understand how it works. Piper uses `onnx` and `onnxruntime` to run the TTS model, which is also possible in JavaScript using `onnxruntime-node`, a library compatible with the Bun runtime.

## Prerequisites
Start by initializing a new Bun project:

```bash
bun init -y
```
## Step 1: Setting Up Piper Phonemizer
The first step in running Piper TTS with JavaScript is phonemization, where text is converted into phonemes. Phonemes are then used to generate the audio.
### Download the Phonemizer Binary
You’ll need to download the Piper Phonemizer binary from the releases page and place it in a folder, such as `vendor`. For this example, I downloaded the `piper-phonemize_windows_amd64.zip` file and extracted it to `vendor/piper-phonemize`.

### Creating the Phonemizer Script

Create a file named `phonemize.ts` and add the following code:

```typescript
import { $ } from "bun";

const scriptPath = "./vendor/piper-phonemize/bin/piper_phonemize_exe.exe";
const espeakPath = "./vendor/piper-phonemize/share/espeak-ng-data/";

type Phonemes = {
  phoneme_ids: number[];
  phonemes: string[];
  processed_text: string;
  text: string;
};

export const phonemize = async (text: string, lang: string = "en-us") => {
  // Pipe the text into the phonemizer binary and request JSON output.
  const phonemized = await $`echo "${text}" | ${scriptPath} -l ${lang} --espeak-data ${espeakPath} --json`;
  const buffer = phonemized.stdout;
  const str = buffer.toString();

  // The phonemizer emits one JSON object per line (one per sentence).
  const phonemes = str
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));

  return phonemes as Phonemes[];
};
```
This script uses Bun’s shell API to execute the command, parses the phonemizer's output, and returns an array of phonemes.
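To sanity-check the phonemizer before wiring up the model, a quick smoke test like the following should print one line per sentence (the file name `phonemize-test.ts` is just an example):

```typescript
// phonemize-test.ts: hypothetical smoke test, run with `bun phonemize-test.ts`.
import { phonemize } from "./phonemize";

const result = await phonemize("Hello world. How are you?");

// One entry per sentence, each with the IPA phonemes and the
// numeric IDs the ONNX model expects as input.
for (const sentence of result) {
  console.log(sentence.processed_text, "->", sentence.phonemes.join(""));
  console.log("phoneme_ids:", sentence.phoneme_ids.length, "ids");
}
```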
## Step 2: Setting Up Piper TTS

### Download the Voice Model
Before running Piper TTS, download a voice model and its configuration file from the Hugging Face model hub. For this example, I used `en_US-ryan-high`.

Place the following files in a `voice` folder:

- `en_US-ryan-high.onnx`
- `en_US-ryan-high.onnx.json`
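If you prefer the command line, something like this should fetch both files (the paths below reflect the `rhasspy/piper-voices` repository layout at the time of writing; double-check them on the model hub):

```bash
mkdir -p voice
curl -L -o voice/en_US-ryan-high.onnx \
  "https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/ryan/high/en_US-ryan-high.onnx"
curl -L -o voice/en_US-ryan-high.onnx.json \
  "https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/ryan/high/en_US-ryan-high.onnx.json"
```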
### Install Dependencies

Install the `onnxruntime-node` library:

```bash
bun add onnxruntime-node
```
### Create a PCM-to-WAV Converter

Create a file named `audio.ts` with the following code to convert PCM data to WAV format:

```typescript
export function pcm2wav(pcmData: Float32Array, numChannels: number, sampleRate: number): ArrayBuffer {
  const bitDepth = 32;
  const bytesPerSample = bitDepth / 8;
  const blockAlign = numChannels * bytesPerSample;
  const byteRate = sampleRate * blockAlign;
  const dataSize = pcmData.length * bytesPerSample;

  // 44-byte RIFF/WAVE header followed by the raw sample data.
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  writeString(view, 0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeString(view, 8, "WAVE");
  writeString(view, 12, "fmt ");
  view.setUint32(16, 16, true); // "fmt " chunk size
  view.setUint16(20, 3, true); // audio format 3 = IEEE float
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitDepth, true);
  writeString(view, 36, "data");
  view.setUint32(40, dataSize, true);

  // Append the 32-bit float samples after the header.
  let offset = 44;
  for (let i = 0; i < pcmData.length; i++) {
    view.setFloat32(offset + i * bytesPerSample, pcmData[i], true);
  }

  return buffer;
}

function writeString(view: DataView, offset: number, string: string) {
  for (let i = 0; i < string.length; i++) {
    view.setUint8(offset + i, string.charCodeAt(i));
  }
}
```
This function generates a WAV file header and appends PCM audio data to it.
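To verify the converter on its own, a quick standalone check like this (the file name is arbitrary) should produce an audible one-second 440 Hz tone in `test.wav`:

```typescript
// audio-test.ts: illustrative check for pcm2wav, run with `bun audio-test.ts`.
import { pcm2wav } from "./audio";

const sampleRate = 22050;
const samples = new Float32Array(sampleRate); // one second of mono audio

// Fill the buffer with a 440 Hz sine wave at half amplitude.
for (let i = 0; i < samples.length; i++) {
  samples[i] = 0.5 * Math.sin((2 * Math.PI * 440 * i) / sampleRate);
}

await Bun.write("./test.wav", new Blob([pcm2wav(samples, 1, sampleRate)]));
```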
## Step 3: Running Piper TTS
Create a file named `piper.ts` and write the following code:

```typescript
import ort from "onnxruntime-node";
import { phonemize } from "./phonemize";
import { pcm2wav } from "./audio";

const VOICE_DIR = "./voice";
const VOICE = "en_US-ryan-high";
const VOICE_ONNX_PATH = `${VOICE_DIR}/${VOICE}.onnx`;
const VOICE_CONFIG_PATH = `${VOICE_DIR}/${VOICE}.onnx.json`;

const voiceConfig = await Bun.file(VOICE_CONFIG_PATH).json();

export const predict = async (text: string) => {
  const phonemes = await phonemize(text);
  const phonemeIds = phonemes.flatMap((phoneme) => phoneme.phoneme_ids);

  // The sample rate lives under `audio` in the voice config, while the
  // inference settings (noise_scale, length_scale, noise_w) are top-level keys.
  const sampleRate = voiceConfig.audio.sample_rate;
  const inference = voiceConfig.inference;

  // int64 tensors require BigInt64Array data.
  const feed = {
    input: new ort.Tensor("int64", BigInt64Array.from(phonemeIds.map(BigInt)), [1, phonemeIds.length]),
    input_lengths: new ort.Tensor("int64", BigInt64Array.from([BigInt(phonemeIds.length)])),
    scales: new ort.Tensor("float32", [inference.noise_scale, inference.length_scale, inference.noise_w]),
  };

  const ortSession = await ort.InferenceSession.create(VOICE_ONNX_PATH);
  const output = await ortSession.run(feed);

  // The model outputs raw 32-bit float PCM; wrap it in a WAV container.
  const pcm = output.output.data;
  const blob = new Blob([pcm2wav(pcm as Float32Array, 1, sampleRate)], { type: "audio/x-wav" });
  await Bun.write("./output.wav", blob);
};
```
Finally, call `predict` at the end of `piper.ts` (Bun supports top-level await) and run the file with `bun piper.ts`:

```typescript
await predict("Hello world");
```
This creates a file named `output.wav` in the root directory containing the synthesized speech.

## Conclusion
This guide demonstrates how to set up and run Piper TTS using JavaScript in the Bun runtime. With this foundation, you can build your own voice assistant or integrate TTS into your applications.
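As a sketch of what that integration could look like, here is a minimal HTTP endpoint around `predict`; the port and query parameter are arbitrary choices, and a real service would want a unique output file per request:

```typescript
// server.ts: illustrative sketch, run with `bun server.ts`
// and open http://localhost:3000/?text=Hello%20world
import { predict } from "./piper";

Bun.serve({
  port: 3000,
  async fetch(req) {
    const text = new URL(req.url).searchParams.get("text") ?? "Hello world";
    await predict(text); // writes ./output.wav
    return new Response(Bun.file("./output.wav"), {
      headers: { "Content-Type": "audio/x-wav" },
    });
  },
});
```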