A guide to using Anki with speech-dispatcher + PiperTTS on Linux

Linux’s support for TTS has come a long way in the last few years, and it is now feasible to use Linux’s native TTS capabilities with Anki (and have good voices for it thanks to PiperTTS). I spent yesterday tinkering and getting everything to work, and decided to write a guide for anyone else who wants to do this.

This guide was written based on my experiences setting everything up on Debian stable (trixie). Exact details may differ for other distributions, but hopefully this will be enough to help figure the process out no matter what distribution you are using.

Part 1: speech-dispatcher

Speech-dispatcher is the service on Linux for handling TTS. It doesn’t do any TTS itself; its purpose is to act as a go-between, relaying requests from programs that want to use TTS to whatever TTS-handling software you have installed on your system.

On many systemd-based distributions, it runs as a user-level service, and can be managed with commands like systemctl --user restart speech-dispatcher. On some distros, it may run as a system-level service instead.

Where to find config files and logs:

  • The configuration files for speech-dispatcher can be found under /etc/speech-dispatcher/ (system-wide) or ~/.config/speech-dispatcher/ (user-level). Its primary configuration file is usually named speechd.conf. If you don’t have user-level config files and want to generate them, you can use the command spd-conf -uc.
  • Speech-dispatcher uses “modules” (text files telling it how to interact with different TTS software available on your computer). These modules are located in a subdirectory of the config directory, eg. /etc/speech-dispatcher/modules/ or ~/.config/speech-dispatcher/modules.
  • Where speech-dispatcher keeps its log files varies by distro. When running as a user-level service on Debian trixie, it put them in /run/user/1000/speech-dispatcher/log/ (1000 being my user ID; yours may differ). On some systems, when run as a user-level service it may use ~/.cache/speech-dispatcher/log/ instead. When run as a system-level service, the log directory is usually /var/log/speech-dispatcher. The main log file is named speech-dispatcher.log. Each module also has its own log file in the same directory; for example, if you set up a module named “piper”, its log file will be named piper.log.
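
Since the log location varies, here is a minimal sketch that checks the three locations listed above and prints the first one that exists. The function name first_existing_dir is made up for this example, and it assumes the /run/user/1000 path corresponds to your XDG runtime directory (/run/user/ followed by your user ID).

```shell
#!/bin/bash
# Print the first argument that is an existing directory; fail if none is.
# (first_existing_dir is a hypothetical helper, not part of speech-dispatcher.)
first_existing_dir() {
    local dir
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Candidate locations from the list above. Assumption: the /run/user/1000
# path is your XDG runtime directory, i.e. /run/user/<your user ID>.
runtime_dir="${XDG_RUNTIME_DIR:-/run/user/$(id -u)}"
if logdir=$(first_existing_dir \
        "$runtime_dir/speech-dispatcher/log" \
        "${HOME:-}/.cache/speech-dispatcher/log" \
        "/var/log/speech-dispatcher"); then
    echo "speech-dispatcher logs: $logdir"
else
    echo "No speech-dispatcher log directory found in the usual places" >&2
fi
```

Once you know the directory, something like tail -f "$logdir/speech-dispatcher.log" lets you watch the log while you test.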

Useful things to know about speech-dispatcher:

  • The manual is here
  • You can test speech-dispatcher with the command spd-say "Hello world". The default voice may not sound very good, but we’ll fix that in later parts of this guide.
  • You can view a list of available modules with spd-say -O, and specify a specific module to use with spd-say -o name_of_module.
  • You can view a list of available voices with spd-say -L, and specify a voice to use with spd-say -y name_of_voice.

Troubleshooting speech-dispatcher:

  • speech-dispatcher has a basic self-diagnostic you can invoke with spd-conf --diagnostics, which may help diagnose problems if you can’t get audio with basic command-line tests of spd-say.
  • For other troubleshooting, try setting LogLevel 5 (the highest level of detail) in speechd.conf.
  • speech-dispatcher is designed with the assumption that it may be used by people who rely on it for accessibility/screen-reader use, so if one voice or TTS module doesn’t work, it will automatically fall back to another so it can still make audio. If speech-dispatcher is using the wrong voice/wrong TTS software, you are probably seeing a fallback behavior.
  • If speech-dispatcher has encountered an error with a module, it will sometimes stop using that module and just go straight to the fallback. Restarting speech-dispatcher will get it to try that module again.
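
If you would rather not edit speechd.conf by hand each time you toggle logging, the LogLevel change can be scripted. This is only a sketch: set_log_level is a hypothetical helper, and it assumes your config has at most one LogLevel line (commented out or not), which it rewrites in place after saving a .bak copy.

```shell
#!/bin/bash
# Set "LogLevel <level>" in a speechd.conf-style file. An existing LogLevel
# line (commented out or not) is rewritten; otherwise a new line is appended.
# A backup of the original is kept as <file>.bak when a line is rewritten.
set_log_level() {
    local conf="$1" level="$2"
    if grep -Eq '^[#[:space:]]*LogLevel[[:space:]]' "$conf"; then
        sed -i.bak -E "s/^[#[:space:]]*LogLevel[[:space:]].*/LogLevel $level/" "$conf"
    else
        printf 'LogLevel %s\n' "$level" >> "$conf"
    fi
}

# Apply it to the user-level config, if there is one.
conf="${HOME:-}/.config/speech-dispatcher/speechd.conf"
if [ -f "$conf" ]; then
    set_log_level "$conf" 5
    echo "LogLevel set to 5 in $conf; restart speech-dispatcher to apply it"
fi
```

Remember to set the level back down afterwards, since level 5 logs are very chatty.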

Part 2: PiperTTS

PiperTTS is one of several options for getting good-quality modern TTS voices on Linux. Other options include Coqui and MaryTTS.

Instructions for installing PiperTTS will often say to run pip install piper-tts, but many modern Linux distributions treat the main system Python as distro-managed and will discourage this. On Debian, I wound up using pipx install piper-tts (as a user, not with sudo) to install piper in an isolated environment with its dependencies. That installed it as an executable located at ~/.local/bin/piper. You may be able to find other options for installing it at the piper-tts github.

There is a tool for installing and managing piper called pied. It is not made by the team behind piper; somebody else made it. I didn’t use it, so there won’t be any details about it in this guide.

If you did an install of piper under your user account, you’ll need to make sure the location you installed it to is in your path. To do that, add export PATH="${PATH}:/home/yourusername/.local/bin/" to your .bashrc file and then run source ~/.bashrc.
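
One small refinement, as a sketch: a plain export line appends the directory again every time .bashrc is sourced. The hypothetical path_append function below only adds the directory if it is not already in PATH, so you could use it in .bashrc instead.

```shell
# Append a directory to PATH only if it is not already listed.
# (path_append is a made-up helper name for this example.)
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                      # already present, do nothing
        *) PATH="${PATH:+$PATH:}$1" ;;    # otherwise add it at the end
    esac
}

path_append "$HOME/.local/bin"
export PATH
```

After sourcing .bashrc, command -v piper should print the full path to the piper executable if everything is in place.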

Once you’ve installed piper, you’ll need to get some voices for it. You can listen to voices online here, and each voice has a link to where you can download it. You’ll need the .onnx file and the .onnx.json file for each voice. For my downloaded voices, I created a directory ~/.local/share/piper/, then put the voices in ~/.local/share/piper/voices/, but you can put them anywhere.
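
Because every voice needs both files, a quick check can catch a missing download before you start debugging piper itself. A minimal sketch, assuming the voice directory used in this guide (check_voices is a hypothetical helper):

```shell
#!/bin/bash
# List each voice model in a directory and report whether its .onnx.json
# config file is present. Returns non-zero if any config file is missing.
check_voices() {
    local dir="$1" model status=0
    for model in "$dir"/*.onnx; do
        [ -e "$model" ] || continue      # the glob matched nothing
        if [ -f "$model.json" ]; then
            echo "ok       $(basename "$model" .onnx)"
        else
            echo "missing  $(basename "$model").json"
            status=1
        fi
    done
    return "$status"
}

# Directory from this guide; change it if you keep your voices elsewhere.
check_voices "${HOME:-}/.local/share/piper/voices" || true
```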

After you download voices, it’s time to test piper:

  • If your audio system is PulseAudio: echo "Hola mundo" | piper --model ~/.local/share/piper/voices/es_MX-claude-high.onnx --output-raw | paplay --raw --rate=22050 --channels=1 --format s16le
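  • If your audio system is PipeWire, replace paplay with pw-play (check pw-play --help, since its options may not match paplay’s exactly). On most PipeWire systems paplay itself also works, through the pipewire-pulse compatibility layer.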
  • As a fallback, you can also test with aplay: echo "Hola mundo" | piper --model ~/.local/share/piper/voices/es_MX-claude-high.onnx --output-raw | aplay -r 22050 -f S16_LE -t raw

There are lots of options you can pass to piper to do things like make the voice speak slower or faster; use piper --help to see them all.

Part 3: Writing a wrapper for PiperTTS

To simplify the next step of creating a module that lets speech-dispatcher use PiperTTS, I wrote a wrapper for PiperTTS that takes as input the information that speech-dispatcher supplies:

#!/bin/bash

# General settings
username='myusername'
voicedir="/home/$username/.local/share/piper/voices/"
pipercmd="/home/$username/.local/bin/piper" 

# Get the input parameters that come from speech-dispatcher
DATA=$1
LANGUAGE=$2
VOICE=$3
PITCH=$4
PITCH_RANGE=$5
RATE=$6

# Uncomment for debugging. 1>&2 sends output to the standard error so it will appear in the speech-dispatcher log file for your module.
# echo "DATA: $DATA" 1>&2
# echo "LANGUAGE: $LANGUAGE" 1>&2
# echo "VOICE: $VOICE" 1>&2
# echo "PITCH: $PITCH" 1>&2
# echo "PITCH_RANGE: $PITCH_RANGE" 1>&2
# echo "RATE: $RATE" 1>&2

# Find the model and the .json file, use the .json file to get the correct sample rate for the voice
model="${voicedir}/${VOICE}.onnx"
json="${model}.json"
sample_rate=$(grep sample_rate "$json" | sed 's/.*://g' | tr -d ' ,')

# Quoting "$DATA" keeps the text intact even if it contains characters like * or multiple spaces.
# If you have PipeWire, replace paplay with pw-play
printf '%s\n' "$DATA" | "$pipercmd" --model "${model}" --output-raw | paplay --raw --rate="$sample_rate" --channels=1 --format s16le

Currently this is a fairly basic wrapper that just uses the specified voice, finds the correct sample rate for the voice (22050 for most piper voices, 16000 for piper voices with “-low” in their names), and uses it to play audio. It also has the ability to output all options passed to it by speech-dispatcher, which is useful for troubleshooting.
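
The sample-rate lookup is the most fragile part of the wrapper, so here is a slightly more defensive sketch of that one step. It assumes the voice’s .onnx.json file has a "sample_rate": number entry on a single line, and it falls back to 22050 if the file or the key is missing (the fallback is an assumption of this example, not something piper does). get_sample_rate is a hypothetical helper name.

```shell
#!/bin/bash
# Print the sample rate stored in a piper voice's .onnx.json file, or a
# fallback value (default 22050, an assumption) if it cannot be found.
get_sample_rate() {
    local json="$1" fallback="${2:-22050}" rate=""
    if [ -r "$json" ]; then
        # Take the first "sample_rate": <digits> match and keep only the digits.
        rate=$(grep -o '"sample_rate"[[:space:]]*:[[:space:]]*[0-9][0-9]*' "$json" \
               | head -n 1 | grep -o '[0-9][0-9]*$' || true)
    fi
    printf '%s\n' "${rate:-$fallback}"
}

# Example with the voice used elsewhere in this guide.
get_sample_rate "${HOME:-}/.local/share/piper/voices/es_MX-claude-high.onnx.json"
```

In the wrapper, this would replace the sample_rate= line with sample_rate=$(get_sample_rate "$json").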

speech-dispatcher will pass along any information the software requesting the TTS gave regarding pitch and speed. My wrapper doesn’t use that information, but there is a post with some discussion about how to do that. (A few different people give different versions in that thread, I don’t know which one is correct).

I put my wrapper in ~/.local/bin/piper_wrapper and made it executable.

Part 4: Making speech-dispatcher use PiperTTS

To make speech-dispatcher use piper, you’ll need to create a module to tell it how to use it. I named mine piper.conf and put it in ~/.config/speech-dispatcher/modules (you could also put it in /etc/speech-dispatcher/modules/ ).

After creating your module, you’ll need the following lines in your speechd.conf file:

AddModule "piper" "sd_generic" "piper.conf"
DefaultModule piper

There will probably be an existing DefaultModule line in speechd.conf, which you should edit instead of adding a new one.
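
To double-check the result, a small script can confirm that both lines are active (not commented out) and that there is only one DefaultModule line. This is a sketch; check_speechd_conf is a made-up helper and the patterns assume the module is named piper, as in this guide.

```shell
#!/bin/bash
# Report whether a speechd.conf file contains an active (uncommented)
# AddModule line for "piper" and exactly one active DefaultModule line
# that points at piper. Returns non-zero if anything looks wrong.
check_speechd_conf() {
    local conf="$1" status=0
    if ! grep -Eq '^[[:space:]]*AddModule[[:space:]]+"piper"' "$conf"; then
        echo "missing: AddModule \"piper\" ..."
        status=1
    fi
    if ! grep -Eq '^[[:space:]]*DefaultModule[[:space:]]+"?piper"?[[:space:]]*$' "$conf"; then
        echo "missing: DefaultModule piper"
        status=1
    fi
    # More than one active DefaultModule line usually means a new line was
    # added instead of editing the existing one.
    if [ "$(grep -Ec '^[[:space:]]*DefaultModule[[:space:]]' "$conf")" -gt 1 ]; then
        echo "warning: more than one DefaultModule line"
        status=1
    fi
    return "$status"
}

conf="${HOME:-}/.config/speech-dispatcher/speechd.conf"
if [ -f "$conf" ]; then
    check_speechd_conf "$conf" && echo "speechd.conf looks fine"
fi
```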

The module I created at ~/.config/speech-dispatcher/modules/piper.conf looked like this:

GenericExecuteSynth "/home/myusername/.local/bin/piper_wrapper \"$DATA\" $LANGUAGE $VOICE $PITCH $PITCH_RANGE $RATE"
GenericLanguage "en-US" "en-US" "utf-8"
GenericLanguage "en-us" "en-US" "utf-8"
GenericLanguage "en-gb" "en-GB" "utf-8"
GenericLanguage "es" "es" "utf-8"
AddVoice "en-GB" "female1" "en_GB-alba-medium"
AddVoice "en-GB" "male1" "en_GB-northern_english_male-medium"
AddVoice "en-GB" "female2" "en_GB-southern_english_female-low"
AddVoice "en-US" "female1" "en_US-amy-medium"
AddVoice "en-US" "female2" "en_US-hfc_female-medium"
AddVoice "en-US" "male1" "en_US-joe-medium"
AddVoice "en-US" "female3" "en_US-kathleen-low"
AddVoice "en-US" "male2" "en_US-norman-medium"
AddVoice "es" "male1" "es_ES-davefx-medium"
AddVoice "es" "female1" "es_MX-claude-high"
DefaultVoice "es_MX-claude-high"

Mine is long because I had a lot of voices, yours doesn’t have to be if the only thing you’ll use TTS for is Anki.

To break down what the module does chunk-by-chunk:

GenericExecuteSynth "/home/myusername/.local/bin/piper_wrapper \"$DATA\" $LANGUAGE $VOICE $PITCH $PITCH_RANGE $RATE"

This is the line that tells speech-dispatcher how to run piper. I decided to have it pass data to my piper_wrapper script instead of putting one very long command in this line, because that makes debugging/testing easier.

GenericLanguage "en-US" "en-US" "utf-8"
GenericLanguage "en-us" "en-US" "utf-8"
GenericLanguage "en-gb" "en-GB" "utf-8"
GenericLanguage "es" "es" "utf-8"

These lines exist because without them speech-dispatcher will default to the ISO-8859-1 (latin1) character set that doesn’t work with Spanish, and accented characters like the á in está will come through as est� and cause piper to malfunction.

With a newer version of speech-dispatcher, you can (and should) replace all of the GenericLanguage lines in my example with just this:

GenericDefaultCharset "utf-8"

But the older version of speech-dispatcher on Debian trixie didn’t support that option. It also was quirky about capitalization, which is why I have two lines for US English, one of which renames en-us (lowercase) to en-US (uppercase). Hopefully in newer versions of speech-dispatcher that case-related behavior is fixed too.

Next are the lines for setting up voices for languages, which will look like this:

AddVoice "es" "male1" "es_ES-davefx-medium"
AddVoice "es" "female1" "es_MX-claude-high"
  • “es” specifies the language.
  • The next field gives a voice name/type recognized by certain types of software, which can only be male1-male3, female1-female3, child_male, and child_female. If you put other things besides the allowed values in that field, the voice will not be recognized by speech-dispatcher. But don’t worry, you’ll still see the actual voice names anywhere you interact with voice selection.
  • The last field gives the name you want speech-dispatcher to use for the voice when it invokes your wrapper. This field is also what you’ll see for voice names if you list available voices with spd-say -L.
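
Since a bad value in the second field makes the voice silently disappear, it can be worth checking your AddVoice lines mechanically. A minimal sketch using awk; check_addvoice_lines is a hypothetical helper, and comparing the names case-insensitively is an assumption of this example.

```shell
#!/bin/bash
# Check every AddVoice line in a module file: the second quoted field must
# be one of male1-male3, female1-female3, child_male or child_female.
# Prints one message per bad line and returns non-zero if any were found.
check_addvoice_lines() {
    awk -F'"' '
        /^[ \t]*AddVoice[ \t]/ {
            # Splitting on double quotes puts the second quoted field in $4.
            sym = tolower($4)
            if (sym !~ /^(male[123]|female[123]|child_male|child_female)$/) {
                printf "line %d: invalid voice type \"%s\"\n", NR, $4
                bad = 1
            }
        }
        END { exit bad + 0 }
    ' "$1"
}

conf="${HOME:-}/.config/speech-dispatcher/modules/piper.conf"
if [ -f "$conf" ]; then
    check_addvoice_lines "$conf" && echo "all AddVoice lines look valid"
fi
```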

Finally (and VERY important), there is the line that specifies the default voice to use when speech-dispatcher doesn’t know which one to choose based on a request. Due to the current limitations of the Anki addon for using speech-dispatcher, whatever voice you set in your DefaultVoice line will be the one used by Anki. If you do not set a DefaultVoice line, Anki’s request to speech-dispatcher will probably lead to the wrapper script being passed a request for the voice “no_voice”, the module failing, and speech-dispatcher falling back to a non-piper voice for future requests.

DefaultVoice "es_MX-claude-high"

Full documentation for writing custom modules for speech-dispatcher can be found here.

Part 5: Troubleshooting common problems getting PiperTTS to work with speech-dispatcher

  • Error message: The module does not have any voice configured, please add them in the configuration file, or install the required files

This usually means there’s something invalid in your AddVoice lines in the piper.conf module you created, often a value other than one of the allowed values for the second field.

  • Error message: UnicodeDecodeError: ‘utf-8’ codec can’t decode byte 0xe1 in position 3: invalid continuation byte

This error message is produced by piper and written to the piper.log file for the module when speech-dispatcher sends piper text using the wrong character set and piper can’t understand it (eg. my example with está vs. est� above). You’ll need to set the correct character set in your piper.conf module, using either GenericDefaultCharset (if your speech-dispatcher version is new enough to support it) or GenericLanguage (for older versions of speech-dispatcher).
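
To see the difference between the two encodings for yourself, or to add a check to the wrapper while debugging, something like the following sketch can help. It relies on iconv exiting with a non-zero status when it meets bytes that are not valid UTF-8; is_valid_utf8 is a hypothetical helper name.

```shell
#!/bin/bash
# Succeed if standard input is valid UTF-8. Relies on iconv exiting
# non-zero when it meets an invalid byte sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# "está bien" as UTF-8 (á = bytes 0xc3 0xa1) and as ISO-8859-1 (á = byte
# 0xe1), written with octal escapes so the example does not depend on the
# encoding of your terminal or editor.
if printf 'est\303\241 bien' | is_valid_utf8; then
    echo "UTF-8 bytes: ok"
fi
if ! printf 'est\341 bien' | is_valid_utf8; then
    echo "ISO-8859-1 bytes: not valid UTF-8"
fi
```

Inside the wrapper, a line such as printf '%s' "$DATA" | is_valid_utf8 || echo "DATA is not valid UTF-8" 1>&2 would write a note to piper.log whenever the text arrives in the wrong character set.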

  • Error message: Error: Module reported error in request from speechd (code 3xx): 300-Opening sound device failed. Reason: server audio is not supported.

This error message is perfectly normal and not a problem. What’s happening when you see this message is that your piper.conf module isn’t capable of sending audio output to the system audio server via speech-dispatcher, and is instead piping its output directly to paplay or pw-play.

If you have a verbose log-level set for speech-dispatcher, that line in speech-dispatcher.log will be followed by Output module does not support audio output through server, making it open audio by itself (which shows that speech-dispatcher is correctly handling the situation and nothing is actually wrong), but at less-verbose log output levels you may not see that follow-up line.

  • Error message: ValueError: Unable to find voice: /home/yourusername/.local/share/piper/voices/no_voice.onnx (use piper.download_voices)

This error message happens when speech-dispatcher can’t decide which of the AddVoice lines in piper.conf to use, so it sends the piper wrapper script a request for voice name “no_voice”. You can prevent this by including a DefaultVoice line in your piper.conf module.

I encountered this problem quite frequently due to problems with matching the requested language (en-US, en-us, es, es-MX, etc.) to my AddVoice lines. The locale on my computer is en-US (I’m learning Spanish, not using it full-time in my daily life!), and various software tools I’ve been using text-to-speech with often accidentally send their Spanish text to speech-dispatcher as language en-US.
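
As an extra safety net on top of the DefaultVoice line, the wrapper itself could refuse to pass a nonexistent voice to piper. This is a sketch of one way to do it, not something my wrapper above does; resolve_voice is a hypothetical helper, and the fallback voice is whichever one you choose.

```shell
#!/bin/bash
# Print the voice name the wrapper should actually use: the requested one
# if its .onnx file exists in the voice directory, otherwise the fallback.
resolve_voice() {
    local voicedir="$1" requested="$2" fallback="$3"
    if [ -n "$requested" ] && [ -f "$voicedir/$requested.onnx" ]; then
        printf '%s\n' "$requested"
    else
        # Goes to standard error, so it shows up in the module's log file.
        echo "voice '$requested' not found, using '$fallback'" 1>&2
        printf '%s\n' "$fallback"
    fi
}

# Possible use inside the wrapper, replacing the plain VOICE=$3 line:
# VOICE=$(resolve_voice "$voicedir" "$3" "es_MX-claude-high")
```

With this in place a request for “no_voice” still produces piper audio, and the log line tells you the language matching went wrong.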

  • General cause of problems: Not using absolute paths in your piper.conf module and piper_wrapper script.

When things are running non-interactively, they may not have access to the PATH setting you’d have in an interactive shell, and may or may not understand the use of ~ to mean home. So if logs show a problem finding the piper command or the piper_wrapper script, that is probably why.

  • General cause of problems: Needing to restart something

Whenever you make changes to your piper.conf setup, you’ll need to restart speech-dispatcher for the change to take effect. Also, if you fix an error in the piper_wrapper script, you’ll probably need to restart speech-dispatcher after that too, since if it detected an error it may have activated a fallback mode and stopped using piper.

If speech-dispatcher is running as a user-level service under systemd, the command to restart it is systemctl --user restart speech-dispatcher. If your distribution has it configured some other way, you’ll need a different command to restart it.

When you restart speech-dispatcher, TTS is likely to stop working in Anki, your web browser, etc. until you restart those programs so they see the new speech-dispatcher instance.

Part 6: Getting Anki to use speech-dispatcher

Anki currently doesn’t have native support for speech-dispatcher on Linux. (I am hoping that will change in the future now that it’s fairly standard across Linux distributions and can use good-quality modern voices!)

There is an addon for it written by forum member sudomain. It has some limitations, but I’m very grateful to sudomain for writing and sharing it!

After you install the addon, if you put {{tts-voices:}} in your card template to see available voices, it will show you exactly one option: {{tts en_US voices=speechd}}. If the tts entries in your Anki cards happen to use en_US for the language, you’re all set: they should work on Linux now (and you don’t need to specify a voice, since “speechd” is the only option, and it will use whatever you set as DefaultVoice in piper.conf).

But I ran into a problem with this, because my tts entries specified es_MX in order to get the right voice when using the cards on my iPhone: {{tts es_MX voices=Apple_Paulina:SpanishWord}}. Because the Anki addon for speech-dispatcher identifies itself as en_US, my existing es_MX TTS entries didn’t match/use it.

To fix this, I downloaded the addon source code from sudomain’s github repository, and edited the line in __init__.py that hard-coded en_US to hard-code es_MX instead. Then I removed the original addon, copied the modified code to ~/.local/share/Anki2/addons21/anki-TTS-speech-dispatcher-modified/, restarted Anki, and… done!! My es_MX TTS entries now work on Linux as well as on my iPhone.
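
If you prefer to script the edit, here is a rough sketch. It does not assume what the line in __init__.py looks like: it first prints every line containing the old locale so you can review them, then replaces all occurrences and keeps a .bak copy. swap_locale is a hypothetical helper, and the path in the usage comment is a placeholder for wherever you unpacked the addon source.

```shell
#!/bin/bash
# Replace every occurrence of one locale string with another in a file,
# keeping a .bak copy of the original next to it. The matching lines are
# printed first (with line numbers) so you can see what will change.
swap_locale() {
    local file="$1" old="$2" new="$3"
    grep -n "$old" "$file" || { echo "no '$old' found in $file" 1>&2; return 1; }
    sed -i.bak "s/$old/$new/g" "$file"
}

# Usage (placeholder path, adjust to your copy of the addon):
# swap_locale /path/to/addon-source/__init__.py en_US es_MX
```

If the review step shows en_US on more lines than the one you meant to change, edit the file by hand instead.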

Wrap-up

After spending yesterday tinkering with this, I can now use Linux Anki with modern, decent-sounding text-to-speech voices that use Anki’s normal {{tts: }} tags and which don’t require use of external cloud services.

If you don’t like the piper voices, piper is only one of many options for Linux - you can modify my example piper speech-dispatcher module to work with the TTS software of your choice. And since speech-dispatcher provides an interface layer for any software seeking to use a text-to-speech service, the process of getting it to work with Anki will be the same no matter whether you use piper, coqui, marytts, or something else.

The Anki addon for letting it use Linux’s speech-dispatcher has some limitations (can’t choose a voice, hard-coded en_US language, no speed adjustment), but it gets the job done. I am very grateful to sudomain for writing and sharing it!!

I hope Anki might include built-in support for speech-dispatcher on Linux in the future. Speech-dispatcher is now the standard for TTS on most Linux distributions, and it offers a consistent interface for Anki to use regardless of the underlying TTS back-end (piper, coqui, etc.) you get your voices from.

Just discovered the forum software here has an @ feature.

@sudomain, thank you for writing and sharing the addon to let Anki use speech-dispatcher, without that none of this would have been possible!!