You've asked a lot of questions at once, so it's easy to get lost in them. If you ask one question at a time, someone can give you a short answer; a long back-and-forth probably won't be of much interest to the rest of the forum.
So there are two types of files that can be used in Anki?
Why two types? Are you talking about video and audio? Usually, yes, mp4 for video, but mp3 is still preferable for audio.
If you mean the "<audio>" and "<video>" tags, then yes, there are two of them, and both are controlled from JavaScript in the same way.
I haven't tested video files myself. But I did embed a YouTube video: an Anki card is essentially an HTML page, so a lot is possible.
I even made an add-on that can create a crossword puzzle:
There’s a link to the deck on the page https://ankiweb.net/shared/info/451577856, but it’s no longer valid. You can check it out here: https://ankiweb.net/shared/info/1254999438
Anki has its own player (Method 1), which relies on “[sound:…]” being pasted into the field when you copy the data. But getting the name of the pasted file from a script isn't so easy, and I need that name to play the file later through "<audio>" (Method 2; these are two playback methods, not two file types). For now you have to copy the name from the field yourself and paste it into another special field. Of course, an add-on could always put the file name on the page in a hidden element… I've thought about it, and maybe I'll write one. Having the name of the file being played would make everything easier. The good news is that ankiweb.net already renders all of its sounds as "<audio>", so a script can hook into the controls and read the file name.
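A minimal sketch of that last idea, reading a file name from an "<audio>" element. It assumes the element exposes the file through src or currentSrc and that the file name is the last part of the URL; neither is confirmed here for ankiweb.net's actual markup. The names fileNameFromSrc and mediaFileNames are made up for this example.

```javascript
// Hypothetical helper: turn a media URL into a bare file name.
// Assumes the file name is the last path segment of the URL.
function fileNameFromSrc(src) {
  // Drop any query string or fragment first.
  const path = src.split(/[?#]/)[0];
  const last = path.slice(path.lastIndexOf("/") + 1);
  try {
    // Names with spaces or non-ASCII letters are percent-encoded in URLs.
    return decodeURIComponent(last);
  } catch (e) {
    // Malformed escape sequence: fall back to the raw segment.
    return last;
  }
}

// Collect the file names for a list of <audio>/<video> elements.
function mediaFileNames(elements) {
  return Array.from(elements, (el) => fileNameFromSrc(el.currentSrc || el.src || ""));
}
```

On a page this could be called as mediaFileNames(document.querySelectorAll("audio")).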
You might ask, why not do away with “[sound:…]” altogether and just write the name of the desired audio or video file? Because Anki uses these fields to decide which media files to sync, and it simply won't see a bare file name. You'd have to copy the files manually into the media folder on your computer or smartphone, and AnkiWeb wouldn't have them at all. So there are complications, and it's not all that simple.
What if I create a JavaScript code that pauses a video when an audio starts playing, and then Anki updates and it stops working? Could we fix the code so it works again?
If they don’t drastically modify the HTML page, which they most likely won’t, then everything will work as is.
Many people still use older versions anyway, either because they find the add-ons more convenient there or simply because they're more used to them.
Can Anki do that? Remove that functionality even when using JavaScript?
It would be hard for them to do that, and they have other things to do. They aren't responsible for your script or any issues it might cause; they just build the standard functionality and make sure it works, and they shouldn't touch anything extra. Of course, they could add a feature that disables sound completely, but the system can do that for you, too.
If I embed a video using a standard HTML element instead of Anki’s [sound:…] player, will JavaScript be able to control it reliably (pause, change speed, etc.)?
Probably. As you can see, even with an embedded YouTube frame I have full control over a player whose code isn't even stored in Anki but is loaded remotely.
So, combining both ideas: if instead of using Anki’s [sound:…] player, I embed audio and video using standard HTML elements (<audio> and <video>) inside the card template, will JavaScript be able to control them reliably (for example, pause a video when audio starts playing, or change playback speed)?
In the Anki apps this works well. In a browser it's different: for security reasons the browser requires the user to click somewhere first, which counts as permission to play sound. So you can start playback from a button, but autoplay isn't that easy. In my case, the first time a side of the card is shown it stays silent, but once playback has been started, only part of the page content gets reloaded, and from then on the sound always plays.
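The click requirement can be handled roughly like this: try to play, and if the browser refuses, wait for the first click and try again. This is only a sketch. playWithFallback is a made-up name, and it assumes media.play() returns a promise that rejects when autoplay is blocked, which is how current browsers behave.

```javascript
// Hypothetical helper: try to autoplay; if the browser refuses, retry on the first click.
// `media` is an <audio>/<video> element, `clickTarget` is any element (for example a button).
function playWithFallback(media, clickTarget) {
  // Assumption: play() returns a promise that rejects when autoplay is blocked.
  return Promise.resolve(media.play()).then(
    () => "autoplay",
    () =>
      new Promise((resolve, reject) => {
        clickTarget.addEventListener(
          "click",
          () => {
            // We are now inside a user gesture, so play() should be allowed.
            Promise.resolve(media.play()).then(() => resolve("after-click"), reject);
          },
          { once: true }
        );
      })
  );
}
```

In a card template, clickTarget could be a "Play" button next to the "<audio>" element, or document.body if any click should count.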
Again, I’ll say it’s not perfect. Anki was created a long time ago, while HTML5 audio and video standards were introduced in 2008-2010.
And if that’s the case, is using standard HTML always safe?
I don't understand the question. HTML itself is just simple formatting, essentially bold and italics, but JavaScript, especially when third-party libraries are pulled in, can sometimes do suspicious things. Then again, you open web pages in your browser all the time, so how do you live with that?
If that’s true, could you create one JavaScript code that pauses a video when an audio starts playing, and another one to reduce playback speed?
As I already described, they have their own player, and I can't control it from JavaScript. I can control my own files. I kept the [sound:…] field because it's required for file synchronization and for playing sounds automatically when a card is shown, according to the deck's settings. If you want your own playback, you'll usually disable that automatic playback in the settings.
So, in my example deck, there's audio played through their [sound:…] fields, and there's my own audio that points to the same file as [sound:…], which I can control and change the speed of. If you play my audio and then their field, you'll see that my audio stops, since I've reverse-engineered their code and can react to it. Stopping Anki's own player is harder. I did try simply playing an empty, silent file, and that stopped it, but I removed that from the deck… though I'm still thinking about it.
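For audio and video that you embed yourself with "<audio>" and "<video>", the two things asked about (pausing one when another starts, and changing the speed) can be sketched as follows. This is not the code from my deck, and it does nothing to Anki's own [sound:…] player; pauseOthersOnPlay and setSpeed are names invented for the example.

```javascript
// Pause every other element in the group when one of them starts playing.
function pauseOthersOnPlay(mediaElements) {
  const group = Array.from(mediaElements);
  for (const el of group) {
    el.addEventListener("play", () => {
      for (const other of group) {
        if (other !== el && !other.paused) other.pause();
      }
    });
  }
}

// Set the playback speed on every element (1 is normal, 0.75 is slower).
function setSpeed(mediaElements, rate) {
  for (const el of mediaElements) {
    el.playbackRate = rate;
  }
}
```

In a card template both could be called on document.querySelectorAll("audio, video"), for example setSpeed(document.querySelectorAll("audio, video"), 0.75).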
If you'd rather study only in Anki and not in AnkiDroid, you can install various add-ons. There's a good one for slowing down audio; I use it: https://ankiweb.net/shared/info/312734862
I was planning on making an audio add-on myself, but that won't happen quickly. I have plenty of problems with other code, and, of course, everyone else has plenty of problems of their own.
Here's an idea: I'd like a special add-on that would add this functionality even to AnkiDroid. I have long audio files, like lectures, or I simply want to memorize a poem. I need to listen to a section of the audio, return to the beginning of that section, and listen again, possibly several times, and then move on to the next section and memorize that one. Basically, it's a small player with a special feature for students.
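To make the idea concrete, here is a rough sketch of such a section player as a card-template script for an "<audio>" element, not as an actual add-on. It assumes sections of a fixed length, and makeSectionLooper is a hypothetical name; a final section shorter than sectionSeconds simply plays to the end without looping.

```javascript
// Hypothetical section looper: repeats one fixed-length section until next() is called.
// `media` is an <audio> element; `sectionSeconds` is the section length in seconds.
function makeSectionLooper(media, sectionSeconds) {
  let index = 0; // which section is currently being repeated

  const sectionStart = () => index * sectionSeconds;
  const sectionEnd = () => (index + 1) * sectionSeconds;

  // "timeupdate" fires repeatedly while the audio is playing.
  media.addEventListener("timeupdate", () => {
    if (media.currentTime >= sectionEnd()) {
      media.currentTime = sectionStart(); // jump back and listen again
    }
  });

  return {
    // Replay the current section from its beginning.
    repeat() {
      media.currentTime = sectionStart();
    },
    // Move on to the next section; returns false if there is none.
    next() {
      const nextStart = (index + 1) * sectionSeconds;
      if (Number.isFinite(media.duration) && nextStart >= media.duration) return false;
      index += 1;
      media.currentTime = sectionStart();
      return true;
    },
    // Index of the section being repeated (0-based).
    current() {
      return index;
    },
  };
}
```

A "Next" button on the card would call next(), and a "Repeat" button would call repeat().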
So we're waiting for them to fix this in the AnkiDroid code, since that's the simplest solution and it should be implemented there.
I don't quite understand what you need. You'd have to describe the entire learning process in full, and what exactly is so good about having both audio and video on the same card. Why aren't they split into separate cards? There would still be a single note, but one card would show the video field and another the audio field. I don't know everything, of course, but maybe that's exactly what you need.