I've never used audio cards, but the way Anki handles sound files seems a bit weird to me: it gives the user almost no control. Why do it in Python when it could be done with plain HTML too?
Basic example:
<audio src="sound.mp3" autoplay></audio>
That approach requires more work on the template side, but looking at the struggle here, I think it might be worth considering.
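As a rough sketch of what "more work on the template side" could look like: the snippet below swaps `autoplay` for the standard `controls` attribute so the browser's own play/pause/seek bar appears, and adds a button that slows playback. The field name `AudioFile` (holding a bare filename such as `sound.mp3`), the element id `card-audio`, and the button are all hypothetical, made up for illustration. I have not checked how Anki's media handling treats files referenced this way, so treat it as an idea rather than a tested template.

```html
<!-- Hypothetical card template fragment.
     Assumes a note field named "AudioFile" that contains only a filename, e.g. sound.mp3 -->
<audio id="card-audio" src="{{AudioFile}}" controls preload="auto"></audio>

<!-- Optional: a button that sets the standard playbackRate property to 75% speed -->
<button type="button"
        onclick="document.getElementById('card-audio').playbackRate = 0.75;">
  Slower
</button>
```

Adding `autoplay` back to the `<audio>` tag would restore the automatic playback from the basic example while keeping the visible controls.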
For reference: the MDN page "<audio>: The Embed Audio element".