Using native TTS across multiple platforms

This post is based on a Reddit post I wrote recently on the same topic. It is intended both to share an experience of learning a language with Anki and TTS, and to highlight a couple of issues.

My requirements: I am learning a language (Arabic), at an intermediate level and working toward advanced. That means a bit less than 10,000 notes to practice (words and sentences, with most cards designed to produce the foreign language).
I want to practice hearing the foreign language, and I review my Anki cards any time I get a chance, on any platform: Win11, AnkiWeb, AnkiDroid, AnkiMobile.

Complication: Add-ons that produce sound files (like AwesomeTTS) are not an adequate solution for me, because of the number and size of media files to generate, and the extra work required every time I add or modify a card.

Idea: Since native TTS voices are becoming quite good on all those platforms, can I teach Anki* to read cards on the fly exactly as I want?

Solution 1 and its challenge: the “new” native Anki TTS filter, {{tts}}
That looks like the best (easiest) solution. Unfortunately, I cannot make it work on more than one platform at a time (iOS or Win11), and I could not make it work on Android at all (on a couple of Samsung phones/tablets).
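
(For reference, a minimal sketch of the native filter syntax as described in the Anki manual; {{tts-voices:}} is a temporary helper you can put on a card to list the voices and lang codes each platform actually reports:)

{{tts ar_SA:Front}}
{{tts-voices:}}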

The issue seems to be language-specific (so you may be luckier than me with languages other than Arabic). The lang code for Arabic varies: on Windows it is ar_SA, while on iOS it is ar-001.
There seems to be no way to give Anki more than one language in the tts tag, so that it can fall back on a second or third choice when the first one is not available, for example:
{{tts ar_SA,ar-001:Front}} or {{tts lang:ar-001,ar_SA:Front}}

As for Android, I have no clue why it doesn’t work.

Solution 2: tinkering with the Web Speech API (i.e. JavaScript).**
I managed to build a script that works on Anki for Win11, AnkiWeb on Win11, AnkiWeb on iOS, and AnkiMobile (iOS). No luck with Android (neither AnkiWeb nor AnkiDroid), even though I have tried several TTS engines (Samsung, Google, and a purchased one: Acapela).

Your thoughts / ideas?

For those who are interested, below is an excerpt of the back of my main card template. The template shows 2 fields on the back: the word in ArabicMSA, which is spoken automatically (and repeated if I click on the word), and the sentences in Example (spoken only if I click on them).

{{FrontSide}}

<div style="padding-right:5%; padding-left:5%; background-color:lightgreen; color:black;" onclick="speakWordA();">
  <hr>
  <span style="font-weight: bold; direction: rtl;">{{ArabicMSA}}</span>

  <div style="font-size: xx-small; font-weight: regular; direction: ltr;">
    Audio:
    <span id="TTSmethod"> FILL-IN WITH SCRIPT </span>
    <span id="wordA" style="display: none;">
      {{ArabicMSA}}
    </span>
    <hr>
  </div>
</div>

<div style="padding-right:5%;padding-left:5%;font-size: small; font-weight: regular; direction: ltr;background-color:lightgreen;color:black;" onclick="speakExmple();" >
  <HR>
  <div id='exmple' style="text-align: justify ; font-size:large; font-weight: regular; direction: rtl">
    {{Example}}
  </div>
  <hr>
</div>

<script type="text/javascript">
  // the TTS flag may be replaced by something else (plateforme specific) at some point.
  document.getElementById('TTSmethod').textContent = "TTS";
  var w = document.getElementById("wordA");
  window.setTimeout("speakAR(w.innerText)", 500);
  var w3 = document.getElementById("exmple");

function speakAR(word) {
  // Create a promise-based function
  return new Promise((resolve, reject) => {
    // Check if speech synthesis is supported
    if (!('speechSynthesis' in window)) {
      console.error("Speech synthesis not supported");
      reject("Speech synthesis not supported");
      return;
    }

    const utterance = new SpeechSynthesisUtterance();
    utterance.text = word;
    utterance.volume = 0.8;
    utterance.rate = 1;
    utterance.pitch = 1;
    utterance.lang = "ar-SA";

    // Set up event handlers for the utterance
    utterance.onend = () => resolve();
    utterance.onerror = (event) => reject(`Speech synthesis error: ${event.error}`);

    // Function to find the best Arabic voice
    const findArabicVoice = () => {
      const voices = window.speechSynthesis.getVoices();
      // Try to find the Laila voice first
      let voice = voices.find(v => v.name === 'Laila');
      // If Laila isn't available, look for any Arabic voice
      if (!voice) {
        voice = voices.find(v => v.lang === 'ar-SA');
      }
      // If no exact match, try any voice that starts with 'ar'
      if (!voice) {
        voice = voices.find(v => v.lang.startsWith('ar'));
      }
      return voice;
    };

    // Function to start speaking with the best available voice
    const startSpeaking = () => {
      const voice = findArabicVoice();
      if (voice) {
        utterance.voice = voice;
      }
      // Cancel any ongoing speech
      window.speechSynthesis.cancel();
      // Start speaking
      window.speechSynthesis.speak(utterance);
    };

    // Get voices and handle browser differences
    const voices = window.speechSynthesis.getVoices();
    if (voices.length > 0) {
      // Voices already loaded (Safari and some other browsers)
      startSpeaking();
    } else if (typeof speechSynthesis.onvoiceschanged !== 'undefined') {
      // Wait for voices to load (Chrome and some other browsers)
      speechSynthesis.onvoiceschanged = () => {
        // Only execute once
        speechSynthesis.onvoiceschanged = null;
        startSpeaking();
      };
    } else {
      // For browsers that don't support onvoiceschanged (like Safari)
      // Try with a delay as a fallback
      setTimeout(startSpeaking, 100);
    }
  });
}


function speakWordA() {
  speakAR(w.innerText);
}

function speakExmple() {
  speakAR(w3.innerText);
}
</script>

Producing a cross-platform solution is still difficult; even in 2025, there does not seem to be an easy answer.

If you use the HyperTTS add-on, you can select the Google Translate voices. They seem to be portable across most platforms. However, not all languages are available on all platforms; mobile platforms especially have a more limited range.

If you want a consistent TTS voice across multiple platforms, I think you underestimate the value and overestimate the difficulty of creating “Collection” audio in HyperTTS, or “Batch” audio in its earlier cousin, AwesomeTTS. Between the two, I’d recommend HyperTTS; see the online Tutorials.

I’ve used both add-ons over the years for vocab and multiple sentences across my 7,000-note primary deck. The mp3 files they produce are tiny (5 KB or less for a word, 50 KB or less for a sentence), so I don’t find file size to be an issue. The total media for that deck is under 150 MB.

Yes, initially generating the audio files for a large deck takes time, but it’s something you can turn on and walk away from, getting it done in batches. For newly added notes, once you have your “rules” set up, you really can add your audio with as little as one click. [I like to do it field-by-field, to verify the audio is good, so it’s a few more clicks for me.]

It is a bother that they don’t use the same language code. But I wonder whether each OS would simply ignore the tag whose code it doesn’t understand. Have you tried separating the languages into different tags?

{{tts ar_SA:Front}}
{{tts ar-001:Front}}

If that doesn’t work with the {{tts}} tags, you could also try the [anki:tts] tags. Field Replacements - Anki Manual

The template editor might complain about it, but if it works when you’re studying, there’s no harm. It might show 2 play buttons. You can’t use platform-specific CSS to stop audio from auto-playing, but you should be able to use it to hide the play button that isn’t in use. Styling & HTML - Anki Manual
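
A rough sketch of what that could look like in the card’s Styling section, assuming the platform classes described in the manual (.win, .iphone, .ipad), that the play buttons carry the replay-button class, and that the {{tts ar_SA:Front}} tag comes first on the template (untested):

/* Hide the ar-001 button on Windows, and the ar_SA button on iOS. */
.win .replay-button:nth-of-type(2) { display: none; }
.iphone .replay-button:nth-of-type(1),
.ipad .replay-button:nth-of-type(1) { display: none; }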
