HyperTTS 2.6.0: OpenAI gpt-4o-mini-tts for Text to Speech, other updates

The OpenAI gpt-4o-mini-tts model is now supported in HyperTTS. This feature was contributed by Claus (thank you!). You have to select it in the voice options (the default for OpenAI is still tts-1-hd). This voice model accepts an optional instructions field, which you can use to tell the model to speak in a certain way, or to indicate that the source text is in a particular language. For more details, consult the OpenAI API reference. As with other neural and LLM-based models, the actual output will vary with the situation, so you’ll have to experiment. Claus’ feedback: “GPT-4o mini TTS is the first OpenAI TTS model that provides usable output for my Greek flashcards.” The common feedback with OpenAI (and also ElevenLabs) is that non-English output was not that good and suffered from an American accent. Hopefully this new model improves on that.

OpenAI gpt-4o-mini-tts model with instructions

So what’s next? People have been asking for Google Gemini TTS. I’ve been working on it for HyperTTS, but there’s a serious limitation: Google limits requests to 10 per minute, even on Tier 1 paid accounts. This means mass-generating Gemini audio will be a tedious process, and HyperTTS will need to implement some retry logic, which will be welcome anyway to handle the occasional timeout. Besides that, here are the issues that I will tackle in the coming weeks in HyperTTS.
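For the curious, the kind of retry logic described above could be sketched like this: retry on rate-limit errors with exponential backoff. The function and exception names here are hypothetical placeholders, not HyperTTS's actual design:

```python
import time

# Illustrative sketch of retry-with-backoff for rate-limited TTS requests.
# RateLimitError and with_retries are made-up names for this example.

class RateLimitError(Exception):
    """Stand-in for a provider's 'too many requests' error."""

def with_retries(request_fn, max_attempts=5, base_delay=1.0):
    """Call request_fn(), sleeping 1s, 2s, 4s, ... between rate-limited attempts."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * (2 ** attempt))

# Demo with a fake request that fails twice before succeeding:
calls = {"n": 0}

def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError()
    return "audio-bytes"

result = with_retries(flaky_request, base_delay=0.01)
```

At 10 requests per minute, a batch job would mostly sit in the sleep calls, which is why mass generation with Gemini will feel slow regardless of how the retries are implemented.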

Looking further out, in the coming months I’d like to make progress on an idea I started before Christmas: generating long-play audio files from Anki flashcard sounds, so that you can review your deck while walking. Kind of like a podcast. I have a working prototype but I need to finish it. I’m also actively thinking about how Language Tools can use LLMs. This is honestly an overdue feature given how good AI chatbots have become at translation, and also transliteration (for Chinese, you can confidently ask GPT-4 to convert text to Pinyin).
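As a rough illustration of the transliteration idea, an LLM-based Pinyin conversion boils down to a simple chat prompt. The prompt wording and helper below are assumptions for the sake of example, not a Language Tools design:

```python
# Illustrative sketch: build a chat-completion message list asking an LLM
# to transliterate Chinese text to Pinyin. Prompt wording is hypothetical.

def build_pinyin_prompt(chinese_text: str) -> list[dict]:
    """Return messages for a chat-completions transliteration request."""
    return [
        {
            "role": "system",
            "content": (
                "Convert the user's Chinese text to Pinyin with tone marks. "
                "Reply with the Pinyin only."
            ),
        },
        {"role": "user", "content": chinese_text},
    ]

messages = build_pinyin_prompt("你好")
# With the OpenAI SDK, this would feed into something like:
#   client.chat.completions.create(model="gpt-4", messages=messages)
```

The appeal is that one prompt pattern covers many languages and scripts, whereas dedicated transliteration services each need their own integration.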
