Complex field types (+associations?)

I was just pondering an issue and had an idea: it would be great to have something akin to “complex field types” (maps, for example).

Right now I have an “example” field where I put a couple of sentences relevant to the particular vocabulary item, then generate a recording using TTS and put it at the end of each sentence.
It looks like this:

<div>Fumar <b>perjudica</b> la salud.
    [[sound:elevenlabs-425bbba6-4b65c853-ea040ae6-2c28360c-c3bf24c2.mp3]]
</div>
<div>¿Sabías que los retrasos pueden <b>perjudicar</b> el proyecto?
    [[sound:elevenlabs-00247c79-39af0dbb-e6ad7fb3-26906a52-d6764467.mp3]]
</div>
<div>No quiero <b>perjudicar</b> a nadie con mis decisiones.
    [[sound:azure-25fceb26-a7abbbac-1fc86410-38586e71-bb93d0a0.mp3]]
</div>
<div>Una mala alimentación puede <b>perjudicar</b> gravemente al cuerpo.
    [[sound:azure-b0c114a6-b0baaac7-9024de3d-921d448d-b8df6e79.mp3]]
</div>

Which results in: (screenshot of the rendered card omitted)

Now, in general this works OK-ish, but inserting the audio manually is quite tedious, and it also limits templating somewhat, since everything is strictly based on the contents of the field itself.

If we had a field type like Map/Dict/Tuple, the field could take a form like:

[
  (sentence1, audio1),
  (sentence2, audio2),
  (sentence3, audio3)
]

Which would help a lot with both formatting (one would reference the field and its index) and automatic audio generation (“generate audio for all sentences from the first field and insert it into the second field”).
What’s more, it would be possible to randomly display only one sentence during reviews, which would add a bit of variation.
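Until such a field type exists, one way to approximate it is to store the pairs as JSON in a single field and unpack them with JavaScript on the card. This is only a sketch under assumptions: the field layout shown in the comment is invented, and since (as far as I know) Anki scans `[sound:...]` tags before card JS runs, the sketch emits an HTML `<audio>` element referencing the media file by name instead:

```javascript
// Sketch: parse (sentence, audio) pairs stored as JSON in one field.
// Hypothetical field contents, not an existing Anki convention:
//   [["Fumar perjudica la salud.", "a.mp3"], ["No quiero...", "b.mp3"]]
function renderExamples(rawField) {
  const pairs = JSON.parse(rawField);
  return pairs
    .map(
      ([sentence, audio]) =>
        `<div>${sentence} <audio controls src="${audio}"></audio></div>`
    )
    .join("");
}
```

The template would then pass the raw field text to `renderExamples` and write the result into the card body, so the per-sentence formatting lives in one place (the template) rather than being hardcoded into every note.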

inserting the audio manually is quite tedious

Anki has a built-in TTS.

What’s more, it would be possible to randomly display only one sentence during reviews

That’s already possible[1] using JavaScript. For example, I believe the Memrise template has this feature.
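A minimal sketch of that JS approach, assuming each example sentence sits in its own `<div>` (as in the field shown above) inside a wrapper with a hypothetical id, e.g. `<div id="examples">{{Example}}</div>` in the template:

```javascript
// Sketch: show only one randomly chosen example <div> during review.
// The "examples" container id is an assumption, not a built-in name.
function pickIndex(count) {
  return Math.floor(Math.random() * count);
}

if (typeof document !== "undefined") {
  const items = Array.from(document.getElementById("examples").children);
  const keep = pickIndex(items.length);
  // Hide every example except the randomly chosen one.
  items.forEach((el, i) => {
    el.style.display = i === keep ? "" : "none";
  });
}
```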


You can basically get “map fields” by having, say, eight fields for the sentences and audio (sentence1, audio1, sentence2, audio2, etc.).


  1. though arguably a bad idea for an SRS app


The Memrise template by itself only randomizes audio on listening cards, but randomizing sentences can indeed be added with JS; here is a solution.


The original suggestion is interesting, I think, and I had similar ideas when trying to come up with formatting for context on the cards. Similarly, each context item has to have multiple interconnected pieces of data:

  1. Text sentence
  2. Image (typically, a screenshot from a video)
  3. Source (URL for online articles, for example)
  4. Audio (I don’t use them myself, but an audio fragment of original material generally provides better context by containing non-verbal cues that simple TTS reading cannot relay)

When a card has several context items, there are two ways to go about it, neither of which seems optimal:

  1. Create several fields for each constituent (sentence, image, …), so that each would contain a list of only a specific type of media. The card template can then format the contents in any way using scripts to assemble pieces of data from different fields with respective list position. The downside is the necessity to keep track of all list indices when editing field contents. Also, some list items will be empty, which just doesn’t look convenient.
  2. Use a single field and pack all the related information together. The downside is that the formatting has to be hardcoded into each card, which makes it hard to edit or make significant changes to in the future.
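Approach 1 could be scripted roughly like this. It's a sketch under assumptions: the separator character, the function name, and the idea of leaving empty slots blank to keep indices aligned are all conventions I'm inventing here, not anything Anki defines:

```javascript
// Sketch: assemble context items from parallel "list" fields, where each
// field holds a separator-joined list ("|" is an arbitrary choice) and
// empty slots are left blank so list positions stay aligned.
function zipContexts(sentencesField, imagesField, sourcesField, sep = "|") {
  const split = (s) => s.split(sep).map((x) => x.trim());
  const sentences = split(sentencesField);
  const images = split(imagesField);
  const sources = split(sourcesField);
  // One object per context item; missing pieces become null.
  return sentences.map((sentence, i) => ({
    sentence,
    image: images[i] || null,
    source: sources[i] || null,
  }));
}
```

The card template would call this once and then render each item however it likes, which addresses the formatting half of the problem, though the bookkeeping burden when editing fields (keeping indices in sync by hand) remains exactly as described above.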

Even more complications arise when one takes into account that sources and notes do not have a straightforward correspondence in either direction: not only can a note have references to multiple sources, but multiple different notes can originate from the same source sentence just as well. With the current system, where each individual note is a self-sufficient, packed piece of data, redundancy is inevitable.

The logical continuation of this idea would be to make it possible to store abstract pieces of data, independently of any notes, and let notes reference them when needed. This can be applied to much more than just context sentences, but also has the potential to solve the problems with multiple word meanings, sibling detection, alternative pronunciations and spellings, etc. The current systems for ruby characters and clozes, if you think about it, are also special cases of complex data packed into a single field, which can be generalized and unified with the described approach.

Ultimately, however, it all leads too far from how Anki operates on the fundamental level, so a complete implementation doesn’t look realistic.

macOS TTS is absurdly :poop:
I’m mostly using Azure TTS (decent results) and recently started using ElevenLabs, which is just brilliant…
What’s more, it’s generated once and stored, so I can play it without delay.

Yeah, this is what I was considering, but it would make the edit window a bit cumbersome :slight_smile: