Chatterbox is fantastic. I created an API wrapper that also makes installation e...

mistersquid · 2025-06-12T18:37:17 1749753437

> Chatterbox is fantastic.

> I created an API wrapper that also makes installation easier (Dockerized as well) https://github.com/travisvn/chatterbox-tts-ap

Gave your wrapper a try and, wow, I'm blown away by both Chatterbox TTS and your API wrapper.

Excuse the rudimentary level of what follows.

Was looking for a quick and dirty CLI incantation to specify a local text file instead of the inline `input` object, but couldn't figure it.

Pointers much appreciated.

travisvn · 2025-06-12T18:50:01 1749754201

This API wrapper was initially made to support a particular use case where someone's running, say, Open WebUI or AnythingLLM or some other local LLM frontend.

A lot of these frontends have an option for using OpenAI's TTS API, and some of them allow you to specify the URL for that endpoint, allowing for "drop-in replacements" like this project.

So the speech generation endpoint in the API is designed to fill that niche. However, its usage is pretty basic and there are curl statements in the README for testing your setup.

Anyway, to get to your actual question, let me see if I can whip something up. I'll edit this comment with the command if I can swing it.

In the meantime, can I assume your local text files are actual `.txt` files?

mistersquid · 2025-06-12T18:52:02 1749754322

This is way more of a response than I could have even hoped for. Thank you so much.

To answer your question, yes, my local text files are .txt files.

travisvn · 2025-06-12T19:03:12 1749754992

Ok, here's a command that works.

I'm new to actually commenting on HN as opposed to just lurking, so I hope this formatting works..

  cat your_file.txt | python3 -c 'import sys, json; print(json.dumps({"input": sys.stdin.read()}))' | curl -X POST http://localhost:5123/v1/audio/speech \
    -H "Content-Type: application/json" \
    -d @- \
    --output speech.wav

Just replace the `your_file.txt` with.. well, you get it.

This'll hopefully handle any potential issues you'd have with quotes or other symbols breaking the JSON input.

Let me know how it goes!

Oh and you might want to change `python3` to `python` depending on your setup.

mistersquid · 2025-06-12T19:13:06 1749755586

> Just replace the `your_file.txt` with.. well, you get it.

> This'll hopefully handle any potential issues you'd have with quotes or other symbols breaking the JSON input.

> Let me know how it goes!

Wow. I'm humbled and grateful.

I'll update once I'm done with work and back in front of my hone nachine.

travisvn · 2025-06-13T02:04:13 1749780253

Hey — just pushed a big update that adds an (opt-in) frontend to test the API

For now, there's just a textarea for input (so you'll have to copy the `.txt` contents) — but it's a lot easier than trying to finagle into a `curl` request

Let me know if you have any issues!

mistersquid · 2025-06-13T03:59:12 1749787152

(Didn't carefully read your reply. What follows are the results of cat-ing a text file in the CLI. Will give the new textbox a whirl in the morning PDT. A truly heartfelt thanks for helping me work with Chatterbox TTS!)

Absolutely blown away.

I fed it the first page of Gibson's "Neuromancer" and your incantation worked like a charm. Thanks for the shell script pipe mojo.

Some other details:

  - 3:01 (3 mins, 1 sec) of generated .wav took 4:28 to process
  - running on M4 Max with 128GB RAM
  - Chatterbox TTS inserted a few strange artifacts which sounded like air venting, machine whirring, and vehicles passing. Very odd and, oddly, apropos for cyberpunk.
  - Chatterbox TTS managed to enunciate the dialog _as_ dialog, even going so far as to mimick an Australian accent where the speaker was identified as such. (This might be the effect of wishful listening.)

I am astounded.

travisvn · 2025-06-13T20:34:34 1749846874

An M4 Max with 128GB RAM? drools

What did your `it/s` end up looking like with that setup? MLX is fascinating to me. Apple made a really smart decision with the induction of its M-series.

With regard to the artifacts — this is definitely a known issue with Chatterbox. I'm unsure of where the current investigation on fixing it is at (or what the "tricks" are to avoid this), but it's definitely something that is eery among other things.

I appreciate your feedback through all of this!

Would love to have you on the Discord to keep in touch https://chatterboxtts.com/discord

mistersquid · 2025-06-15T01:34:38 1749951278

I'll follow up on Discord!

For those following along at home: frontend works (and is quite nice) after updating `vite.config.ts` with a proxy

  server: {
    proxy: {
      // Proxy all API requests to the FastAPI backend
      '/v1': 'http://localhost:4123',
    },
  },

nitroedge · 2025-06-15T05:37:16 1749965836

Spent an hour trying to get it running with a RTX 50 series, no luck, tried with PyTorch 2.7.

Seems built for 2.6.

"chatterbox-tts 0.1.2 requires torch==2.6.0, but you have torch 2.7.0+cu128 which is incompatible. chatterbox-tts 0.1.2 requires torchaudio==2.6.0, but you have torchaudio 2.7.0+cu128 which is incompatible."

venusenvy47 · 2025-06-12T12:29:25 1749731365

Would this be usable on a PC without a GPU?

travisvn · 2025-06-12T18:29:31 1749752971

It can definitely run on CPU — but I'm not sure if it can run on a machine without a GPU entirely.

To be honest, it uses a decently large amount of resources. If you had a GPU, you could expect about 4-5 gb memory usage. And given the optimizations for tensors on GPUs, I'm not sure how well things would work "CPU only".

If you try it, let me know. There are some "CPU" Docker builds in the repo you could look at for guidance.

If you want free TTS without using local resources, you could try edge-tts https://github.com/travisvn/openai-edge-tts