Unused file—but it doesn't exist

Media check shows
Unused: ⁨neerlandais-4-3.mp3⁩
Unused: ⁨neerla3.mp3⁩
Unused: ⁨neerlandais-9-5.mp3⁩

I resolved all the typos for the neerlandais* files, but there is no neerla3.mp3⁩ in the directory.

I even did ls *n*e*e*r*l*a*3*.mp3 but nothing showed except neerlandais*.mp3

What happens if you use the button to delete unused media? Does the file still appear when you next use check media?

I didn’t want to delete any until I had resolved the notes they should have been on. But after I had done that, the check media window said there were 39 unused.

Then I clicked delete, and got
# Error

An error occurred. Please use Tools > Check Database to see if that fixes the problem.

If problems persist, please report the problem on our [support site](https://help.ankiweb.net/). Please copy and paste the information below into your report.

Anki 2.1.26 (70784154) Python 3.8.0 Qt 5.13.1 PyQt 5.14.1
Platform: Mac 10.15.
Flags: frz=True ao=False sv=1
Add-ons, last update check: 2020-07-22 11:16:07
Caught exception:
Traceback (most recent call last):
File "aqt/mediacheck.py", line 87, in <lambda>
File "aqt/mediacheck.py", line 151, in _on_trash_files
File "anki/media.py", line 119, in trash_files
File "anki/rsbackend.py", line 409, in trash_media_files
File "anki/rsbackend.py", line 257, in _run_command
anki.rsbackend.IOError: IOError { info: "Os { code: 1, kind: PermissionDenied, message: \"Operation not permitted\" }" }

So I did a check database and tried check media again. Same error, but “38 unused.”
Each time I do check media, there is one unused fewer than before. Then “delete unused” gets the above error.

When I got the unused count down to 35, I replaced 2.1.26 with 2.1.28 (7d8818f8). This did not fix it.

That’s odd, and one explanation is the directory entry is corrupt. Maybe the First Aid option in Disk Utility will reveal something?

That may be it. Disk Utility found no errors in any disk volume. But, as an extra precaution, I did
mkdir AnkiTmp
time rsync -av Anki2/ AnkiTmp
rm -rf Anki2
mv AnkiTmp Anki2
and relaunched Anki. Check media said there were 33 unused, and delete got no error.

But I have another hypothesis: I have a lot of files with filenames in Chinese, Korean, and Ktunaxa (and other non-ASCII languages). Locale is UTF8. Perhaps a UTF8 byte sequence caused the problem, and it just happened to be in the file deleted just before I did the above.

It’s not clear from the last message, but I guess running Check Media again reports no unused files since they’re in the media.trash folder now and everything is fine.

It’s been mentioned in the changelog (Changes):

Media check now places deleted files in a media.trash folder inside your profile, instead of placing the files in the system trash. You can use the Check Media function to either empty the trash, or restore the deleted files back to your media folder.

But getting back to the first message, if deleted files can be found in the media.trash folder, maybe it’s something related to isolation characters. At least, this is what happened to me a few times, when I tried to find a few unused files by copy-pasting their filenames from Media Check and got no results in the collection.media folder on Windows.

Recent Anki versions happen to use isolation characters around filenames - anki/rslib/src/i18n/mod.rs at 52cd9fc4b8409d5073261ab975bc1ce654d9123f · ankitects/anki · GitHub

They’re usually invisible.

Unused: ⁨neerlandais-4-3.mp3⁩
Unused: ⁨neerla3.mp3⁩
Unused: ⁨neerlandais-9-5.mp3⁩


Unused: \u2068neerlandais-4-3.mp3\u2069
Unused: \u2068neerla3.mp3\u2069
Unused: \u2068neerlandais-9-5.mp3\u2069
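
To search for a name copied from the Check Media window, the isolation marks can be stripped first. This is a minimal Python sketch, assuming the only extra characters are U+2068 and U+2069. The helper name `strip_isolates` is made up for illustration and is not part of Anki.

```python
# Hypothetical helper (not part of Anki): remove the Unicode isolation
# marks that surround filenames copied from the Check Media window.
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def strip_isolates(text: str) -> str:
    """Return text with every U+2068 / U+2069 removed."""
    return text.replace(FSI, "").replace(PDI, "")


# A line as it would be pasted from the Check Media report.
pasted = "Unused: \u2068neerla3.mp3\u2069"

# Drop the "Unused: " label and the invisible marks.
name = strip_isolates(pasted).split(": ", 1)[1]
print(repr(name))
```

The cleaned name can then be used in a file search or an `ls` pattern without the invisible characters getting in the way.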


That’s interesting, but “isolation characters” around three file names¹ don’t explain one of them being an error. Nor do they explain “delete unused” deleting only one and then crashing.

¹Actually, there were forty unused, then 39, then 38, … I just pasted the immediate context in the post.

The filename could also start with a space.

New related problem? I now have two files shown as unused that ARE in the media directory and ARE referenced on cards. Running the output of media check through od -xc reveals that something has added bytes E2 81 A8 to the beginning of each of these two filenames in the deck. ls | od -xc shows that the filenames do not start with those bytes.

When I delete the “d:n” and retype it in [sound:nl_Waar_woon_jeQ.mp3] (and in the other file) in the input file and import, it says that two notes are updated. But media check still says those two files are unused. When I do the same edit in the browser AFTER that import, the error goes away.

So it looks like Anki added those bytes during import!

Well, even weirder. I did that edit on both of the allegedly unused files. ONE of them is still listed as unused.

I added over a hundred MP3 files to the collection, edited them into the import file, and imported. Anki added three bytes after the “[sound:” (before the file name) to seven of the updated notes. I verified that the import file did not contain those bytes.

Result is that the correct file name is listed as Unused, while the altered file name is listed as Missing.
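
A rough Python sketch of why one file can land on both lists: if the check compares the names referenced in notes with the names on disk, a reference that carries an invisible character matches nothing, so the reference is reported missing and the real file is reported unused. This only illustrates the set logic. The function `check_media` is hypothetical and is not Anki's actual implementation.

```python
# Sketch only: Anki's real media check is implemented differently.
def check_media(referenced, on_disk):
    """Compare names used in notes with names present in the media folder."""
    referenced, on_disk = set(referenced), set(on_disk)
    missing = sorted(referenced - on_disk)  # used in a note, no such file
    unused = sorted(on_disk - referenced)   # file exists, no note uses it
    return missing, unused


# Assumed situation: the note's reference has an invisible U+2068
# in front of the name, while the file on disk is plain ASCII.
referenced = ["\u2068nl_Waar_woon_jeQ.mp3"]
on_disk = ["nl_Waar_woon_jeQ.mp3"]

missing, unused = check_media(referenced, on_disk)
print("Missing:", ascii(missing))  # ascii() makes the hidden character visible
print("Unused: ", ascii(unused))
```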

If you can reproduce it with the latest Anki version and all add-ons disabled, please attach a small .txt file that demonstrates the problem along with the steps we should take to trigger it.

I can try. Doesn’t seem to be predictable. Happened a couple of times more than a year ago, but has apparently happened a few times this week. My collection.media is huge, though! 920 Megabytes.

Here are excerpts copy/paste from the media check window. Note that the same two audio files are listed as both missing and unused, when they are NOT missing:

Missing: ⁨nl_Waar_woon_jeQ.mp3⁩⁩
Missing: ⁨nl_bent_u_soms.gif⁩
Missing: ⁨nl_de_koffiepauze.mp3⁩⁩
Missing: ⁨nl_de_voorstads.gif⁩
Missing: ⁨nl_een.gif⁩
Missing: ⁨nl_het_is_op_de_benedenverdieping.gif⁩
Missing: ⁨nl_hoe_gaat_het.gif⁩
Missing: ⁨noi.jpg⁩


The following files were found in the media folder, but do not appear to be used on any cards:
Unused: ⁨nl_Waar_woon_jeQ.mp3⁩
Unused: ⁨nl_de_koffiepauze.mp3⁩

Here is a shell transcript showing the extra bytes in that text, and in one copy/pasted from the browser:

WGroleau@MBP-WWG collection.media.backup % vi /tmp/tmp    
WGroleau@MBP-WWG collection.media.backup % od -xc !$
od -xc /tmp/tmp
0000000      4d0a    7369    6973    676e    203a    81e2    6ea8    5f6c
          \n   M   i   s   s   i   n   g   :     342 201 250   n   l   _
0000020      6157    7261    775f    6f6f    5f6e    656a    2e51    706d
           W   a   a   r   _   w   o   o   n   _   j   e   Q   .   m   p
0000040      e233    a981    81e2    0aa9    694d    7373    6e69    3a67
           3 342 201 251 342 201 251  \n   M   i   s   s   i   n   g   :
0000060      e220    a881    6c6e    625f    6e65    5f74    5f75    6f73
             342 201 250   n   l   _   b   e   n   t   _   u   _   s   o
0000100      736d    672e    6669    81e2    0aa9    694d    7373    6e69
           m   s   .   g   i   f 342 201 251  \n   M   i   s   s   i   n
0000120      3a67    e220    a881    6c6e    645f    5f65    6f6b    6666
           g   :     342 201 250   n   l   _   d   e   _   k   o   f   f
0000140      6569    6170    7a75    2e65    706d    e233    a981    81e2
           i   e   p   a   u   z   e   .   m   p   3 342 201 251 251 201
0000160      0aa9    694d    7373    6e69    3a67    e220    a881    6c6e
         251  \n   M   i   s   s   i   n   g   :     342 201 250   n   l
0000200      645f    5f65    6f76    726f    7473    6461    2e73    6967
           _   d   e   _   v   o   o   r   s   t   a   d   s   .   g   i
0000220      e266    a981    4d0a    7369    6973    676e    203a    81e2
           f 342 201 251  \n   M   i   s   s   i   n   g   :     250 201
0000240      6ea8    5f6c    6565    2e6e    6967    e266    a981    4d0a
         250   n   l   _   e   e   n   .   g   i   f 342 201 251  \n   M
0000260      7369    6973    676e    203a    81e2    6ea8    5f6c    6568
           i   s   s   i   n   g   :     342 201 250   n   l   _   h   e
0000300      5f74    7369    6f5f    5f70    6564    625f    6e65    6465
           t   _   i   s   _   o   p   _   d   e   _   b   e   n   e   d
0000320      6e65    6576    6472    6569    6970    676e    672e    6669
           e   n   v   e   r   d   i   e   p   i   n   g   .   g   i   f
0000340      81e2    0aa9    694d    7373    6e69    3a67    e220    a881
         342 201 251  \n   M   i   s   s   i   n   g   :     342 201 250
0000360      6c6e    685f    656f    675f    6161    5f74    6568    2e74
           n   l   _   h   o   e   _   g   a   a   t   _   h   e   t   .
0000400      6967    e266    a981    4d0a    7369    6973    676e    203a
           g   i   f 342 201 251  \n   M   i   s   s   i   n   g   :    
0000420      81e2    6ea8    696f    6a2e    6770    81e2    0aa9    0a0a
         342 201 250   n   o   i   .   j   p   g 342 201 251  \n  \n  \n
0000440      6854    2065    6f66    6c6c    776f    6e69    2067    6966
           T   h   e       f   o   l   l   o   w   i   n   g       f   i
0000460      656c    2073    6577    6572    6620    756f    646e    6920
           l   e   s       w   e   r   e       f   o   u   n   d       i
0000500      206e    6874    2065    656d    6964    2061    6f66    646c
           n       t   h   e       m   e   d   i   a       f   o   l   d
0000520      7265    202c    7562    2074    6f64    6e20    746f    6120
           e   r   ,       b   u   t       d   o       n   o   t       a
0000540      7070    6165    2072    6f74    6220    2065    7375    6465
           p   p   e   a   r       t   o       b   e       u   s   e   d
0000560      6f20    206e    6e61    2079    6163    6472    3a73    550a
               o   n       a   n   y       c   a   r   d   s   :  \n   U
0000600      756e    6573    3a64    e220    a881    6c6e    575f    6161
           n   u   s   e   d   :     342 201 250   n   l   _   W   a   a
0000620      5f72    6f77    6e6f    6a5f    5165    6d2e    3370    81e2
           r   _   w   o   o   n   _   j   e   Q   .   m   p   3 251 201
0000640      0aa9    6e55    7375    6465    203a    81e2    6ea8    5f6c
         251  \n   U   n   u   s   e   d   :     342 201 250   n   l   _
0000660      6564    6b5f    666f    6966    7065    7561    657a    6d2e
           d   e   _   k   o   f   f   i   e   p   a   u   z   e   .   m
0000700      3370    81e2    0aa9                                        
           p   3 342 201 251  \n                                        
0000706
WGroleau@MBP-WWG collection.media.backup % mv /tmp/tmp ~/Media_Check.txt
WGroleau@MBP-WWG collection.media.backup % file !$
file ~/Media_Check.txt
/Users/WGroleau/Media_Check.txt: UTF-8 Unicode text
WGroleau@MBP-WWG collection.media.backup % cat | od -xc
⁨[sound:nl_Waar_woon_jeQ.mp3⁩]
0000000      81e2    5ba8    6f73    6e75    3a64    6c6e    575f    6161
         342 201 250   [   s   o   u   n   d   :   n   l   _   W   a   a
0000020      5f72    6f77    6e6f    6a5f    5165    6d2e    3370    81e2
           r   _   w   o   o   n   _   j   e   Q   .   m   p   3 251 201
0000040      5da9    000a                                                
         251   ]  \n                                                    
0000043
WGroleau@MBP-WWG collection.media.backup % cd ../collection.media
WGroleau@MBP-WWG collection.media % ls *woon* | od -xc
0000000      6c6e    575f    6161    5f72    6f77    6e6f    6a5f    5165
           n   l   _   W   a   a   r   _   w   o   o   n   _   j   e   Q
0000020      6d2e    3370    6e0a    5f6c    6e65    6a5f    6a69    775f
           .   m   p   3  \n   n   l   _   e   n   _   j   i   j   _   w
0000040      6f6f    5f6e    656a    685f    6569    2e72    6967    0a66
           o   o   n   _   j   e   _   h   i   e   r   .   g   i   f  \n
0000060      6c6e    655f    5f6e    696a    5f6a    6f77    6e6f    6a5f
           n   l   _   e   n   _   j   i   j   _   w   o   o   n   _   j
0000100      5f65    6968    7265    6d2e    3370    6e0a    5f6c    6b69
           e   _   h   i   e   r   .   m   p   3  \n   n   l   _   i   k
0000120      775f    6f6f    5f6e    706f    645f    7469    615f    7264
           _   w   o   o   n   _   o   p   _   d   i   t   _   a   d   r
0000140      7365    672e    6669    6e0a    5f6c    6b69    775f    6f6f
           e   s   .   g   i   f  \n   n   l   _   i   k   _   w   o   o
0000160      5f6e    706f    645f    7469    615f    7264    7365    6d2e
           n   _   o   p   _   d   i   t   _   a   d   r   e   s   .   m
0000200      3370    6e0a    5f6c    616a    695f    5f6b    6f77    6e6f
           p   3  \n   n   l   _   j   a   _   i   k   _   w   o   o   n
0000220      685f    6569    2e72    6967    0a66    6c6e    6a5f    5f61
           _   h   i   e   r   .   g   i   f  \n   n   l   _   j   a   _
0000240      6b69    775f    6f6f    5f6e    6968    7265    6d2e    3370
           i   k   _   w   o   o   n   _   h   i   e   r   .   m   p   3
0000260      000a                                                        
          \n                                                            
0000261
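
The hex column of `od -x` is easy to misread, because it prints 16-bit words and on a little-endian machine each word shows its two bytes swapped. The Python sketch below, which assumes the dump came from a little-endian Mac, puts three words from the first dump line back into file order and names the resulting characters. The function name `od_words_to_bytes` is made up for illustration.

```python
import unicodedata


def od_words_to_bytes(words):
    """Convert 16-bit hex words as printed by `od -x` on a little-endian
    machine back into the original byte order (low byte first)."""
    out = bytearray()
    for word in words:
        value = int(word, 16)
        out.append(value & 0xFF)  # low byte comes first in the file
        out.append(value >> 8)    # high byte comes second
    return bytes(out)


# The three words that follow "Missing: " on the first dump line.
raw = od_words_to_bytes(["81e2", "6ea8", "5f6c"])
text = raw.decode("utf-8")

print(raw.hex(" "))
for ch in text:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
```

Read this way, the three bytes in front of the filename are E2 81 A8, which is the UTF-8 encoding of U+2068. The octal values 342 201 250 in the character rows of the dump say the same thing.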

If I browse to “[sound:⁨nl_Waar_woon_jeQ.mp3]”, select the “d:n” and re-type it, then do another media check, that file is no longer shown as unused nor as missing. But when I try the same fix on [sound:⁨nl_de_koffiepauze.mp3], that file still appears on both lists.

Once you’ve figured out a reproducible test case, please let me know.

At 11:15 31 July 2020,
ls *woon* *koffie* | od -xc
showed that there are no non-ASCII characters in either of the two filenames affected and that the files do exist in the directory.

Check Media said both are Unused. But the browser shows that

[sound:nl_Waar_woon_jeQ.mp3]
⁨[sound:nl_de_koffiepauze.mp3⁩]

are both not unused.

When I erase (in the browser) the entire field for [sound:nl_Waar_woon_jeQ.mp3] and retype it, Check Media then shows only the other one unused.

When I erase (in the browser) the entire field for [sound:nl_de_koffiepauze.mp3⁩] and retype it, Check Media still shows it as unused.

When I look at the koffiepauze card, the audio icon appears, but clicking on it produces no sound.

The latest backup before I did the above test can be acquired from https://Groleau.Email/backup-2020-07-30-12.39.38.colpkg

The deck is called Nederlands.

“There are bad characters in the fields” describes the current state of affairs; it does not tell us how those characters got into the fields in the first place. That is the key - if you figure out the steps that cause it, please let me know.

I found an editor that would reveal invisible characters. There was a zero-width non-graphic character at the beginning of the two problem fields before the [ and one before the ] at the end. I have no idea how they got there. They were not on any other field.

I removed them from the input file and re-imported. The error no longer occurs.
I do not know what character that was nor how it got there, so I don’t know how to reproduce it. If it happens again, I now know a not-very-fun way to fix it.
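
A less manual way to find such characters is to scan the import file for anything in Unicode category Cf (invisible format characters) and report where it sits. The Python sketch below does that. The function names and the sample line are made up for illustration, and the sample assumes the stray characters are U+2068 and U+2069.

```python
import unicodedata


def find_format_chars(text):
    """Yield (line, column, codepoint, name) for every character in
    Unicode category Cf (invisible format characters)."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                name = unicodedata.name(ch, "<unnamed>")
                yield line_no, col, f"U+{ord(ch):04X}", name


def remove_format_chars(text):
    """Return text without any category-Cf characters."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# Made-up import line with a hidden character before "[" and before "]".
sample = "koffiepauze\t\u2068[sound:nl_de_koffiepauze.mp3\u2069]\n"

hits = list(find_format_chars(sample))
for hit in hits:
    print(hit)
print(remove_format_chars(sample), end="")
```

Category Cf also contains characters that some scripts use on purpose, such as the zero-width joiner, so for a collection with many languages it is probably safer to review the reported positions than to strip everything blindly.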

But if Anki was interpreting the file extension as mp3? (where the ? stands for the mysterious character), then the error message makes sense.

But I can’t understand why removing them by retyping both fields in the Anki browser only fixed one of the two notes.

Maybe the koffiepauze card has a rogue character in the filename on disk, rather than the field?

Already checked that. They were not in the filename, but in the imported text file. However, as the file is sometimes exported, sometimes imported, I do not know where the characters came from. Still, both files had a rogue character between the “mp3” and the “]” so it’s strange that retyping one of them fixed it and retyping the other did not.

Anyway, removing those in the text file and re-importing resolved it. About a hundred new notes have since been added to the file using the same methods and re-imported and no new rogue characters have appeared.