It’s quite tedious to insert audio manually.
Anki has built-in TTS: a template tag such as `{{tts en_US:sentence1}}` reads the field aloud, so no audio file is needed.
What’s more, it could be possible to display only one randomly chosen sentence during reviews.
That’s already possible[1] using JavaScript. For example, I believe the Memrise template has this feature.
You can approximate “map fields” by having, say, 8 fields for the sentences and audio (sentence1, audio1, sentence2, audio2, etc.).
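As a rough sketch of the JavaScript approach, the snippet below picks one non-empty sentence at random. The helper name `pickRandomIndex`, the `pair` class, and the idea of wrapping each sentence/audio pair in a hidden element are all my own assumptions for illustration, not something taken from the Memrise template. Whether the matching audio plays correctly depends on the Anki client, so treat that part as untested.

```javascript
// Hypothetical helper: return the index of a randomly chosen non-empty
// entry in `values`, or -1 if every entry is empty.
// `rng` defaults to Math.random and can be swapped out for testing.
function pickRandomIndex(values, rng = Math.random) {
  const candidates = [];
  values.forEach((value, index) => {
    if (typeof value === "string" && value.trim() !== "") {
      candidates.push(index);
    }
  });
  if (candidates.length === 0) return -1;
  return candidates[Math.floor(rng() * candidates.length)];
}

// Assumed template markup (not shown here): one element per pair, e.g.
//   <div class="pair" hidden>{{sentence1}} {{audio1}}</div>
// This part only runs where a DOM exists, such as inside a card.
if (typeof document !== "undefined") {
  const pairs = Array.from(document.querySelectorAll(".pair"));
  const chosen = pickRandomIndex(pairs.map((el) => el.textContent));
  if (chosen !== -1) {
    pairs[chosen].hidden = false; // reveal only the chosen pair
  }
}
```

Empty fields are skipped, so notes with fewer than 8 sentences still work.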
though arguably a bad idea for an SRS app ↩︎