Using native TTS across multiple platforms

This post is based on a Reddit post I recently wrote on the same topic. It is intended both to share an experience of language learning with Anki and TTS, and to highlight a couple of issues.

My requirements: I am learning a language (Arabic) at an intermediate level, working toward advanced. That means a bit fewer than 10,000 notes to practice (words and sentences, with most cards designed to produce the foreign language).
I want to practice hearing the foreign language, and I review my Anki cards whenever I get a chance, on any platform: Win11, AnkiWeb, AnkiDroid, AnkiMobile.

Complication: add-ons that generate sound files (like AwesomeTTS) are not an adequate solution, because of the number and size of media files to generate, and because it is impractical to regenerate them every time I add or modify a card.

Idea: since native TTS voices are becoming quite good on all those platforms, can I teach Anki* to read cards on the fly exactly as I want?

Solution 1 and its challenge: the “new” native Anki TTS tag, {{tts}}
That looks like the best (easiest) solution. Unfortunately, I cannot make it work on more than one platform at a time (iOS or Win11), and I could not make it work at all on Android (on a couple of Samsung phones/tablets).

The issue seems to be language specific (so you may be luckier than I am with languages other than Arabic). The language code for Arabic varies: on Windows it is ar_SA, while on iOS it is ar-001.
There seems to be no way to give Anki more than one language in the {{tts}} tag, so that it can fall back on a second or third choice when the first one is not available, something like:
{{tts ar_SA,ar-001:Front}} or {{tts lang:ar-001,ar_SA:Front}}
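For reference, and as far as I understand the manual, the documented form of the tag takes a single language code, optionally with an ordered list of preferred voices (the voice names below are placeholders, not names I have verified on any platform):

{{tts ar_SA:Front}}
{{tts ar_SA voices=SomeVendor_SomeArabicVoice:Front}}

If I read the manual correctly, the fallback happens between voices of a single language, not between language codes, which is exactly what I would need here.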

As for Android, I have no clue why it doesn’t work.

Solution 2: tinkering with the Web Speech API (i.e. JavaScript).**
I managed to build a script that works on Anki on Win11, AnkiWeb on Win11, AnkiWeb on iOS, and AnkiMobile (iOS). No luck with Android (neither AnkiWeb nor AnkiDroid), even though I have tried several TTS engines (Samsung, Google, and a purchased one: Acapela).
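As a side note, to see which voices each platform actually exposes through the Web Speech API, I put a small diagnostic snippet on a test card (it also runs in a desktop browser). This is only a sketch, and the voiceList id is just a name I made up for the test card:

<div id="voiceList">(voice list goes here)</div>
<script>
  // List every voice the speech engine reports, with its language code.
  // getVoices() may return an empty array until voices have loaded,
  // so we also listen for the voiceschanged event.
  function listVoices() {
    var names = window.speechSynthesis.getVoices()
      .map(function (v) { return v.name + " (" + v.lang + ")"; });
    document.getElementById("voiceList").textContent =
      names.length ? names.join(", ") : "no voices reported";
  }
  if ("speechSynthesis" in window) {
    listVoices();
    window.speechSynthesis.onvoiceschanged = listVoices;
  } else {
    document.getElementById("voiceList").textContent = "Web Speech API not available";
  }
</script>

On each platform this shows the exact voice names and lang codes to target in the script below.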

Your thoughts/ideas?

For those who are interested, below is an extract of the back side of my main card template. The template shows two fields on the back: the word in ArabicMSA, which is spoken automatically (and repeated if I tap it), and the sentences in Example (spoken only when I tap them).

{{FrontSide}}

<div style="padding-right:5%; padding-left:5%; background-color:lightgreen; color:black;" onclick="speakWordA();">
  <hr>
  <span style="font-weight: bold; direction: rtl;">{{ArabicMSA}}
  </span>

  <div style="font-size: xx-small; font-weight: normal; direction: ltr;">
    Audio:
    <span id="TTSmethod"> FILL-IN WITH SCRIPT </span>
    <span id="wordA" style="display: none;">
      {{ArabicMSA}}
    </span>
    <hr>
  </div>
</div>

<div style="padding-right:5%;padding-left:5%;font-size: small; font-weight: regular; direction: ltr;background-color:lightgreen;color:black;" onclick="speakExmple();" >
  <HR>
  <div id='exmple' style="text-align: justify ; font-size:large; font-weight: regular; direction: rtl">
    {{Example}}
  </div>
  <hr>
</div>

<script type="text/javascript">
  // The "TTS" flag may be replaced by something else (platform specific) at some point.
  document.getElementById('TTSmethod').textContent = "TTS";
  var w = document.getElementById("wordA");
  // Speak the word automatically, half a second after the back of the card is shown.
  window.setTimeout(function () { speakAR(w.innerText); }, 500);
  var w3 = document.getElementById("exmple");

function speakAR(word) {
  // Create a promise-based function
  return new Promise((resolve, reject) => {
    // Check if speech synthesis is supported
    if (!('speechSynthesis' in window)) {
      console.error("Speech synthesis not supported");
      reject("Speech synthesis not supported");
      return;
    }
    const utterance = new SpeechSynthesisUtterance();
    utterance.text = word;
    utterance.volume = 0.8;
    utterance.rate = 1;
    utterance.pitch = 1;
    utterance.lang = "ar-SA";

    // Set up event handlers for the utterance
    utterance.onend = () => resolve();
    utterance.onerror = (event) => reject(`Speech synthesis error: ${event.error}`);

    // Function to find the best Arabic voice
    const findArabicVoice = () => {
      const voices = window.speechSynthesis.getVoices();
      // Try to find the Laila voice first
      let voice = voices.find(v => v.name === 'Laila');
      // If Laila isn't available, look for any ar-SA voice
      if (!voice) {
        voice = voices.find(v => v.lang === 'ar-SA');
      }
      // If no exact match, try any voice whose lang code starts with 'ar'
      if (!voice) {
        voice = voices.find(v => v.lang.startsWith('ar'));
      }
      return voice;
    };

    // Function to start speaking with the best available voice
    const startSpeaking = () => {
      const voice = findArabicVoice();
      if (voice) {
        utterance.voice = voice;
      }
      // Cancel any ongoing speech
      window.speechSynthesis.cancel();
      // Start speaking
      window.speechSynthesis.speak(utterance);
    };

    // Get voices and handle browser differences
    const voices = window.speechSynthesis.getVoices();
    if (voices.length > 0) {
      // Voices already loaded (Safari and some other browsers)
      startSpeaking();
    } else if (typeof speechSynthesis.onvoiceschanged !== 'undefined') {
      // Wait for voices to load (Chrome and some other browsers)
      speechSynthesis.onvoiceschanged = () => {
        // Only execute once
        speechSynthesis.onvoiceschanged = null;
        startSpeaking();
      };
    } else {
      // For browsers that don't expose onvoiceschanged:
      // try again after a short delay as a fallback
      setTimeout(startSpeaking, 100);
    }
  });
}


// Repeat the word when it is tapped
function speakWordA() {
  speakAR(w.innerText);
}

// Speak the example sentences when they are tapped
function speakExmple() {
  speakAR(w3.innerText);
}
</script>
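Finally, to mirror the fallback idea from Solution 1 inside the script itself, the voice lookup could take an ordered list of language codes instead of hard-coding ar-SA. The sketch below is untested speculation on my part (in particular, whether iOS reports ar-001 through the Web Speech API is an assumption), just to illustrate the idea:

<script>
  // Hypothetical generalisation of findArabicVoice: try each preferred
  // language code in order, then fall back to any voice whose code
  // starts with the base language.
  function findVoiceFor(langCodes, baseLang) {
    var voices = window.speechSynthesis.getVoices();
    for (var i = 0; i < langCodes.length; i++) {
      var code = langCodes[i];
      var match = voices.find(function (v) { return v.lang === code; });
      if (match) return match;
    }
    return voices.find(function (v) { return v.lang.indexOf(baseLang) === 0; });
  }

  // Example: prefer the Windows code, then the iOS one.
  // var voice = findVoiceFor(["ar-SA", "ar-001"], "ar");
</script>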