What is the right way to replace media files manually? Can I just (on Linux), e.g.:
cd ~/.local/share/Anki2/profile1/collection.media/
gm mogrify -strip *.jpg
The second command strips the metadata of these images, usually reducing the file size.
I’m asking because it seems Anki does some sum-checking of the media files. How does Anki deal with manually replaced media files? Do timestamps matter? etc.
I also wonder how syncing would then work…
Edit: I did a test run, replacing one image manually (with one smaller & also of a newer timestamp), and it worked as expected: syncing updated the image.
“Check Media” didn’t report anything wrong… So what’s the function of the checksum I heard about? (Edit: Wait, does Anki checksum media files at all? or only the notes (database)? (Edit: I remembered where I read about the media checksumming: here))
Should I just go ahead and do a batch replace?
Update: It appears I can just go ahead and replace the media files and then sync to AnkiWeb… If so, I do wonder what happens when the media files are replaced manually on two clients severally: what are the rules for deciding which version the file will end up being?
Anki uses timestamps to check whether media has changed, but these are provided by the OS and not saved inside a JPG. After all, Anki supports all kinds of media files and can’t rely on the metadata of a specific format. It would also defeat the purpose of checksumming if you needed to read the file first.
On the other hand, JPG metadata is part of the binary file and changes to it should trigger a sync just as any other operation overwriting the file.
(so that the atime and mtime of foo.jpg are not changed, though it’s replaced with a different image), Anki detected foo.jpg had changed and synced it to AnkiWeb.
(Update: No, I was wrong; I did not remember correctly. I just tested replacing media files while preserving timestamps: Anki did not detect the change. Then I touched two of the files, and Anki synced those two files to AnkiWeb. So a timestamp change (of both the collection.media directory and the file) is a necessary signal, probably triggering checksumming of the file.)
Do you mean that, regardless of timestamps, whenever a file is changed, Anki detects the change and takes it as the newest version? (Update: No; see update above.)
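For anyone who wants to repeat that experiment, here is a rough Python sketch of the "replace while preserving timestamps" step. The helper names and the demo file are made up for illustration, and SHA-1 is used only as a stand-in checksum; nothing here is Anki's own code. It just shows that the contents (and so the checksum) can change while the mtime stays the same, which is the case a timestamp-first check would miss.

```python
import hashlib
import os
import tempfile


def sha1_of(path):
    """Return the SHA-1 hex digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def replace_preserving_times(path, new_bytes):
    """Overwrite path with new_bytes, then restore its original atime/mtime."""
    st = os.stat(path)
    with open(path, "wb") as f:
        f.write(new_bytes)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))


with tempfile.TemporaryDirectory() as d:
    demo = os.path.join(d, "foo.jpg")  # hypothetical file, not a real JPEG
    with open(demo, "wb") as f:
        f.write(b"old image bytes")
    # Pin the timestamps to a whole-second value so the comparison is exact.
    os.utime(demo, (1_000_000_000, 1_000_000_000))

    mtime_before, sum_before = os.stat(demo).st_mtime_ns, sha1_of(demo)
    replace_preserving_times(demo, b"new image bytes")
    mtime_after, sum_after = os.stat(demo).st_mtime_ns, sha1_of(demo)

    same_mtime = mtime_before == mtime_after
    changed_sum = sum_before != sum_after
    print("mtime unchanged:", same_mtime, "| checksum changed:", changed_sum)
```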
My further question is this: does Anki have a good way to deal with this scenario?:
In the collection.media directory there’s a file named carbuncle.jpg, we call this version 0 (v0), which resides now on Anki Desktop, AnkiWeb, and AnkiDroid under the same account; v0 has a timestamp of 7 o’clock.
On Anki Desktop carbuncle.jpg is modified, its timestamp 8 o’clock. (v1)
On AnkiDroid carbuncle.jpg is modified a different way, its timestamp 9 o’clock (v2).
Now, if we sync Anki Desktop with AnkiWeb first, and then sync AnkiDroid with AnkiWeb, do we not end up with v2? And if we switch the order, do we not end up with v1?
To answer my own question: Yes, you can just replace the media files, and Anki will pick up the changes and sync them.
The long answer: how Anki detects changed media files appears to be:
1. Check the collection.media directory timestamp. If it has changed, then
2. Check the media files and find those with changed timestamps;
3. Checksum each of those files to determine whether the file has changed;
4. Sync changed files to AnkiWeb.
(So one way to ensure all media files are re-checksummed is to update (with touch) the timestamps of both the media directory and all the media files within.)
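The steps above can be sketched in Python as follows. This is only my reading of the behaviour, not Anki's actual implementation: `find_changed`, `touch_all`, and the `state` dictionary are hypothetical names, SHA-1 is an assumed checksum, and deleted files are not handled.

```python
import hashlib
import os


def sha1_of(path):
    """Return the SHA-1 hex digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def find_changed(media_dir, state):
    """Return names of files whose contents changed since the last call.

    state is a plain dict remembered between calls (hypothetical layout):
    {"dir_mtime": int, "files": {name: (mtime_ns, sha1)}}
    """
    dir_mtime = os.stat(media_dir).st_mtime_ns
    if state.get("dir_mtime") == dir_mtime:
        return []  # step 1: directory timestamp unchanged, skip everything

    files = state.setdefault("files", {})
    changed = []
    for name in sorted(os.listdir(media_dir)):
        path = os.path.join(media_dir, name)
        if not os.path.isfile(path):
            continue
        mtime = os.stat(path).st_mtime_ns
        old = files.get(name)
        if old is not None and old[0] == mtime:
            continue  # step 2: file timestamp unchanged, do not checksum
        csum = sha1_of(path)  # step 3: checksum only the candidates
        if old is None or old[1] != csum:
            changed.append(name)  # step 4: these would be synced
        files[name] = (mtime, csum)

    state["dir_mtime"] = dir_mtime
    return changed


def touch_all(media_dir):
    """Set the mtime of every file, and of the directory itself, to now."""
    for name in os.listdir(media_dir):
        os.utime(os.path.join(media_dir, name))
    os.utime(media_dir)
```

Under this model, `touch_all` forces every file to be re-checksummed on the next check, but only files whose checksum actually differs would be reported as changed.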
The question wrt multiple versions of changed files still awaits an answer, although that is of course an unusual situation.
Modification times do not affect syncing direction. If you change the same media file in different ways on multiple devices without syncing, the first device synced will win.
So when the second device syncs, instead of pushing its change, it pulls from the server? Would you elaborate a little how this negotiation is done, please?
This code (line 488 to 497) reads like it’s saying whenever lsum != rsum, download from server. Which would be consistent with our 2nd device pulling from the server, but why then does our 1st device push instead? Maybe I’m being a numskull, but the crucial difference in the two cases is not explained by this code, is it?
(Sorry if I’m trespassing too much on your time. Feel free to ignore this…)
In the first sync, since AnkiWeb won’t have any pending changes, there’s no conflict to deal with. In any case, you shouldn’t encounter this situation if you follow best practices: Syncing in Anki: Getting Started - YouTube
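To make the "first device synced wins" rule concrete, here is a toy Python model. It is purely an illustration of the behaviour described in this thread, not AnkiWeb's real protocol: the `Server` and `Client` classes, the `revision` counter, and the pull-before-push order are all assumptions of mine.

```python
class Server:
    """Toy stand-in for AnkiWeb: stores files and when each last changed."""

    def __init__(self, files):
        self.files = dict(files)  # name -> content
        self.revision = 0         # bumped for every accepted upload
        self.changed_at = {}      # name -> revision of its last change


class Client:
    """Toy stand-in for a device (Anki Desktop, AnkiDroid, ...)."""

    def __init__(self, server):
        self.files = dict(server.files)
        self.last_seen = server.revision  # server revision at our last sync
        self.dirty = set()                # names edited locally since then

    def edit(self, name, content):
        self.files[name] = content
        self.dirty.add(name)

    def sync(self, server):
        # Pull first: anything the server changed since our last sync
        # replaces our copy, even if we edited it locally.
        for name, rev in server.changed_at.items():
            if rev > self.last_seen:
                self.files[name] = server.files[name]
                self.dirty.discard(name)
        # Then push whatever local edits are left.
        for name in sorted(self.dirty):
            server.revision += 1
            server.files[name] = self.files[name]
            server.changed_at[name] = server.revision
        self.dirty.clear()
        self.last_seen = server.revision


def run(order):
    """Edit the same file on two clients, sync in the given order."""
    server = Server({"carbuncle.jpg": "v0"})
    clients = {"desktop": Client(server), "droid": Client(server)}
    clients["desktop"].edit("carbuncle.jpg", "v1")
    clients["droid"].edit("carbuncle.jpg", "v2")
    for who in order:
        clients[who].sync(server)
    return server.files["carbuncle.jpg"]


print(run(["desktop", "droid"]), run(["droid", "desktop"]))
```

In this model the first client finds nothing newer on the server and pushes; the second finds a change it has not seen yet, so it pulls and drops its own edit. Timestamps of the files play no part.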
Ah, I see. So AnkiWeb actually keeps a list of syncing clients? (So that it knows when to clear the pending state: when all clients have synced.) If so, there must be an “expiration date” for this pending state (or else, if one of the syncing clients fails to sync indefinitely, AnkiWeb will be stuck in the pending state). Is this correct? (Or maybe the pending state is per syncing client? More likely…)
(Edit: I think the most relevant sentence in the video is this: “Each time you move to a different device, it’s a good idea to sync twice: once at the start of the session, and again when you’re finished.”–and similarly: To ensure (can we ensure?) your changes will be pushed to AnkiWeb, sync once before you make these changes.)