Support audio in zip files, mpv can do it

@dae

Would it be possible for Anki to support mpv’s ability to play sound from archive files, for example from a zip?

When I do

[user@user collection.media]$ mpv 'archive://sample.zip|become_1.mp3'
 (+) Audio --aid=1 (mp3 1ch 48000Hz)
AO: [pulse] 48000Hz mono 1ch float
A: 00:00:00 / 00:00:00 (63%) Cache: 0.0s

Exiting... (End of file)

mpv plays the audio that is inside the zip file. But putting this in Anki doesn’t produce sound:
[sound:'archive://sample.zip|become_1.mp3']

Why do I think this would be a useful feature? Because it is quicker to back up whole Anki folders when there are a few big files than when there are thousands of small files. Besides, and maybe this is more important to me, I like to have some order in my collections.

As a quick workaround, create a new folder in the add-ons folder, create a new __init__.py file in it with the following code, and restart Anki:

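import os
import types

from anki.sound import AVTag, SoundOrVideoTag
from aqt.sound import mpvManager, OnDoneCallback
from aqt import gui_hooks


def play(self, tag: AVTag, on_done: OnDoneCallback) -> None:
    """Replacement play() for the mpv player that also accepts archive:// names."""
    assert isinstance(tag, SoundOrVideoTag)
    self._on_done = on_done
    if tag.filename.startswith('archive://'):
        # Strip the scheme, make the zip path absolute, then put the scheme back,
        # giving e.g. archive:///path/to/collection.media/sample.zip|become_1.mp3
        filename = tag.filename[len('archive://'):]
        path = 'archive://' + os.path.join(os.getcwd(), filename)
    else:
        # Regular media file: resolve it against the current working directory.
        path = os.path.join(os.getcwd(), tag.filename)
    self.command("loadfile", path, "append-play")
    gui_hooks.av_player_did_begin_playing(self, tag)

# Bind the replacement to the existing mpv player instance.
mpvManager.play = types.MethodType(play, mpvManager)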

https://gist.github.com/kelciour/300b06e7723465abf1c731fe54099e19

It should work with [sound:archive://sample.zip|become_1.mp3] but Check Media will be a bit less helpful.

The following files are referenced by cards, but were not found in the media folder:

Missing: archive://sample.zip|become_1.mp3

The following files were found in the media folder, but do not appear to be used on any cards:

Unused: sample.zip
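
Since Check Media cannot look inside archives, a small standalone script could do that check instead. The following is only a sketch, not part of the add-on: the function name `find_missing_archive_refs` and the regular expression are illustrative assumptions, and it only understands the `[sound:archive://zip|member]` form used in this thread.

```python
import re
import zipfile
from pathlib import Path

# Matches [sound:archive://<zip name>|<member name>], the form used in this thread.
ARCHIVE_REF = re.compile(r"\[sound:archive://([^|\]]+)\|([^\]]+)\]")


def find_missing_archive_refs(field_text: str, media_dir: Path) -> list[str]:
    """Return the archive references in field_text that cannot be resolved."""
    missing = []
    for zip_name, member in ARCHIVE_REF.findall(field_text):
        zip_path = media_dir / zip_name
        try:
            with zipfile.ZipFile(zip_path) as zf:
                found = member in zf.namelist()
        except (FileNotFoundError, zipfile.BadZipFile):
            # A missing or unreadable zip means the reference cannot be played.
            found = False
        if not found:
            missing.append(f"archive://{zip_name}|{member}")
    return missing
```

Called with the text of a field and the path to collection.media, it returns an empty list when every archive reference points at an entry that exists. Plain `[sound:file.mp3]` references are ignored, since Check Media already covers those.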


I’m afraid this is unlikely to be implemented - I don’t see a big demand for it, and it would require extra work to support across all the Anki clients.


I haven’t tried the add-on yet. Would performance with a zip file containing about 70K audio files (not TTS-generated) be slower or unaffected?

It’s not practical - if you ever wanted to add, remove or modify a file, the entire zip would need to be synced again.

Thank you very much. It works. It is great to have such functionality.

Will you release it as an official add-on on the Anki add-ons page? I think many other users might be interested in using it, and not everyone follows this forum. Just an idea.

I think it may be practical, at least for some users.
Some add-ons that work with online dictionaries, for example “Cambridge Dictionary” and “AwesomeTTS”, make it possible to create thousands of cards with sound within a few hours and provide learning material for a few years. I have a deck with about 5,000 sound files and many more cards. Even if the content of the cards and deck is edited, the sound stays the same and can even be used to produce new cards, when only the sound file names are copied. Another add-on, “Searching, PDF Reading & Note-Taking in Add Dialog”, is a perfect tool for that. This leads me to the conclusion that I prefer to keep them in one file, because I have already decided that I am not going to remove these files.
Other sound files can still be added normally, I mean as single mp3 files, and will be stored as single files in the collection.media folder.
It is only a matter of how you address [sound: ] in a card.

And if the average sound file is only a few KiB, there is no need to remove it. When other files are added, maybe I will wait until I have another thousand and then repack the zip, which takes only a few seconds on an SSD.
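
For illustration, repacking could be scripted roughly like this. It is a minimal sketch under assumptions: `pack_audio` is a hypothetical helper, the archive name `sample.zip` is taken from the example above, and the original mp3 files are left in place rather than deleted.

```python
import zipfile
from pathlib import Path


def pack_audio(media_dir: Path, archive_name: str = "sample.zip") -> int:
    """Write every .mp3 in media_dir into one zip and return the file count."""
    archive_path = media_dir / archive_name
    mp3_files = sorted(media_dir.glob("*.mp3"))
    # ZIP_STORED skips recompression: mp3 data is already compressed,
    # so deflating it would cost time for almost no size gain.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for path in mp3_files:
            # arcname keeps entries flat so they match archive://sample.zip|name.mp3
            zf.write(path, arcname=path.name)
    return len(mp3_files)
```

Opening the archive in "w" mode rewrites it from scratch on every run, which matches the "wait and repack" approach rather than appending to an existing zip.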

I tested kelciour’s add-on on a zip archive containing about 5,000 mp3 files, and it works without any delay.

It’ll be a bit slower, but should be quite fast according to Wikipedia and probably unnoticeable.

A directory is placed at the end of a ZIP file. This identifies what files are in the ZIP and identifies where in the ZIP that file is located. This allows ZIP readers to load the list of files without reading the entire ZIP archive.

https://en.wikipedia.org/wiki/Zip_(file_format)#Design
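
The same idea can be seen with Python’s standard zipfile module, which is used here only to illustrate the format (mpv itself reads archives through libarchive, not through this code). The helper name `read_one_member` is made up for the example.

```python
import io
import zipfile


def read_one_member(archive, member: str) -> bytes:
    """Return the bytes of a single entry without extracting the others."""
    # ZipFile() reads the central directory at the end of the archive to
    # build its entry list; no member data is read at this point.
    with zipfile.ZipFile(archive) as zf:
        info = zf.getinfo(member)  # raises KeyError if the entry is absent
        # open() seeks to this entry's local header (info.header_offset)
        # and reads only this entry's data.
        with zf.open(info) as fh:
            return fh.read()
```

`archive` can be a path or a file-like object such as `io.BytesIO`. The cost of fetching one entry depends on the size of the directory and of that entry, not on the total size of the archive.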

But it can’t contain more than 65,535 files.

The original .ZIP format had a 4 GiB (2^32 bytes) limit on various things (uncompressed size of a file, compressed size of a file, and total size of the archive), as well as a limit of 65,535 (2^16 − 1) entries in a ZIP archive.

In version 4.5 of the specification … PKWARE introduced the “ZIP64” format extensions to get around these limitations.

The File Explorer in Windows XP does not support ZIP64, but the Explorer in Windows Vista and later do. [citation needed]

https://en.wikipedia.org/wiki/Zip_(file_format)#ZIP64
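
As a rough way to tell in advance whether a set of files would run into those limits, something like the following could be used. It is a sketch based only on the limits quoted above: `needs_zip64` and the two constants are illustrative names, and per-entry header overhead is ignored.

```python
ZIP_MAX_ENTRIES = 0xFFFF      # 65,535 entries
ZIP_MAX_BYTES = 0xFFFFFFFF    # largest value a 32-bit size field can hold


def needs_zip64(entry_sizes: list[int]) -> bool:
    """Rough check: would files of these sizes exceed the original ZIP limits?"""
    too_many = len(entry_sizes) > ZIP_MAX_ENTRIES
    too_big_entry = any(size > ZIP_MAX_BYTES for size in entry_sizes)
    # Header overhead is ignored, so the archive size is slightly underestimated.
    too_big_total = sum(entry_sizes) > ZIP_MAX_BYTES
    return too_many or too_big_entry or too_big_total
```

For example, 70,000 files of about 8 KB each stay far below 4 GiB in total but still need ZIP64 because of the entry count, while 5,000 such files do not.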

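If Windows 10 can’t create a Zip64 archive with 70K audio files, it should be possible with 7-Zip File Manager, which switches to Zip64 automatically when the original limits are exceeded (Deflate64 is a separate compression method, not the Zip64 format extension). mpv seems to use the libarchive library under the hood and should be able to read Zip64 just fine.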

Fri, 31 Jul 2020 at 04:10, Guillem Palau-Salvà via Anki Forums <anki2@discoursemail.com>:


I see.
The zip file has to be <100MB in order to be synced. If the user does not add, modify or delete any media reference in the affected notes, it won’t affect the syncing bandwidth at all.
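
A quick check for archives that would be too large to sync might look like this. It is only a sketch: the 100MB figure is the one stated in this thread, reading it as 100 * 10**6 bytes is an assumption, and `oversized_archives` is a hypothetical helper.

```python
from pathlib import Path

# Limit as stated in this thread. Whether it means 100 * 10**6 or 100 * 2**20
# bytes is an assumption here; the stricter decimal value is used.
SYNC_LIMIT_BYTES = 100 * 10**6


def oversized_archives(media_dir: Path, limit: int = SYNC_LIMIT_BYTES) -> list[str]:
    """Return the names of zip files in media_dir that are not below the limit."""
    return sorted(
        p.name for p in media_dir.glob("*.zip") if p.stat().st_size >= limit
    )
```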

I think this is a very niche convenience. I also use some non-recommended approaches via simple add-ons I created for myself. So I think this should remain an add-on and not become a main app feature.

Let me present my case as an outlier, very different from the average user’s. For two of my profiles I don’t use sync for the collection (due to the size limit), and hence not for the media either. Even back when I could have synced media, I did not, in order to spare the server bandwidth: I have ~22GB in ~480K media files (not counting TTS-generated files, which I delete once I am familiar enough with the language’s phonetics), and images are compressed and resized to a maximum of 750px in any dimension.

I have about 70K audio files from my Korean language notes. I already have all the media files I will ever need; some cards are still unseen, so some audio files have never been played yet. So I could call this selection of audio files static, i.e. I won’t add or modify any file at any point in the future. I will for other note types, or I may add images to the notes. So I can create a zip file for the audio files used in those Korean notes in order to ease the backup process. Interestingly enough, that is what I already do to back up my media files: I have a 7zip file for each “static media collection”. My 92K+ SVG files are another example, with an extremely high compression ratio.

Although I strongly advise against this solution for the average user, it might be useful to some degree for people like me, as:

  • I have media files that will be accessed very rarely, as they are referenced in very mature cards.
  • Some note types’ media files are already in the collection and no additions are going to take place, so having a .zip file is viable from this point of view.
  • The overall media file count is in the hundreds of thousands in my case, and most of those files belong to “static note types” in the sense that I won’t add, delete or modify any media file reference in the notes. Some of my note types have gone more than 5 years without a change other than tags being added automatically, usually leech.

I don’t want to push anything. I will just respond to the arguments.

Isn’t synchronizing via AnkiWeb a niche way of using Anki? Does every Anki user use two devices? I don’t know what percentage of Anki users do. I don’t; I only use the desktop app. It would take statistical data on how many Anki users actually sync to even start worrying about it or taking it into account. The same goes for implementing this feature: we don’t know how many users may find it useful, at least in some of their decks. What I mean is that you probably think you are right based on your assumptions, not on statistical data. Maybe I am wrong. Maybe you can compare total Anki downloads with the statistics for active, regular AnkiWeb synchronization.

I just don’t get the idea that one “niche” feature should block another “niche” feature. If I can’t sync a file larger than 100MB, I need to get used to that thought. If I can’t have it as a built-in feature, I will get used to that thought as well. But I don’t understand why some users can’t have something because others will not be able to use it. That is just not a valid argument to me. Wouldn’t it be enough to state in the Anki documentation that users need to consider this if they want to synchronize? My sample zip of 5,000 mp3 files takes about 36MiB.

Ok. I will accept whatever Damien decides.

I do not have access to any data. Based on the forums and user support, I think using sync is pretty common, either to use the phone as a second device or simply as a backup.

From a user standpoint, I am the first one who wants new features for niche needs, as having a large collection carries some repercussions. Being able to use a single profile and sync would be a nice experience, but I do understand the caveats this may carry for the server and bandwidth.

I based my assumption on prior discussions, and I have a good grasp of many user requests over the years to implement functionality that was not going to be widely used. That is not necessarily the answer I would prefer personally; it is the one I think will be given. I would be thrilled to see Anki incorporate many features that could be used and customised for a broad range of user preferences.

I do get that this would add a maintenance load for little return. I also understand that Damien and the contributors to the source code prioritize more mainstream features. For instance, 2.1.28 introduced a feature I requested 4 years ago, and 2.1.29/30 changed back some of the ways the stats are presented, based on popular demand.

Note that I am not complaining. I am so grateful that I have been able to use Anki for years, with more features as time passes.


Ok. Thank you for the explanation. I also want a more stable base, even at the cost of fewer built-in features.

Thank you very much for updating.