Making one audio clip play only when you press play, while other audio plays automatically

How can I make my cards do this?

  1. I have a Chinese sentence and a word, both of which play automatically.
  2. The word is then defined orally in Chinese, but I only want this definition to play if the “play” button is pressed.

Why, you might ask?
a) Even when the word itself is easy for a beginner, understanding the oral definition isn’t always easy (at least for a beginner who can’t grasp the vocabulary being used to explain the word). As such, I’d like the user of the deck to be able to choose whether or not to play the definition audio, while the sentences and words still play automatically.
b) The word definition is “street” Chinese. That is to say, it was recorded on location somewhere in China, sometimes with me (some years ago, grunting away in toneless Mandarin) discussing the definition of the word. The content is raw, though cleaned up. It has the advantage of being “real”, but it could be daunting for someone starting out, and possibly annoying for a mid-level learner who doesn’t want to be bombarded with sound while still digesting the sentences and words and thinking about the correct answer.

I’m afraid Anki’s sound tags don’t support selective autoplay at the moment.