BUG in main: All images in a field except last wrongly recognized as "unused"

In the development version of Anki (from commit f544bdd04), if a field contains multiple images, the Check media utility wrongly flags all images except the last one as unused.

For instance, I have a note type for Korean with an “image” field, and sometimes I add multiple images. In one example (file linked below), the HTML is:

<img alt="webp image" src="paste_1708544941408.webp">
<img alt="webp image" src="paste_1708544955995.webp">

In this specific case (and for some other cards) Anki reports the first image as unused in the “check media” menu, which resulted in me accidentally deleting the image.

This only happens when compiling Anki from the latest main branch (commit 0018f12). Disabling add-ons with --safemode didn’t help, but I found that this error does not happen in the latest stable release, 23.12.1.

Reproduction with MWE:

Here is a collection with a single note as an MWE to reproduce the bug:

To reproduce, import the collection into Anki compiled from main (after f544bdd04) and click Check media. You then get: Unused: paste_1708544941408.webp. If you check the single note in the card browser, you can see that the image is actually used.

Weirdly, if I export my decks (or just the problematic notes) and then import them into the test profile, I cannot reproduce the error. But when I export my whole collection and import it into the test profile, I can reproduce the issue. So to make the above MWE, I had to import my collection into the test profile and delete all unrelated notes and media.

Debug Info:

Anki 23.12.1 (0018f126) (src)
Python 3.11.7 Qt 6.6.2 PyQt 6.6.1
Platform: Linux-6.7.5-arch1-1-x86_64-with-glibc2.39

I ran a git bisect and the first bad commit is f544bdd04 (PR #2918) by @vaxr:

(Edit: now also added the first bad commit number to the bug description in my opening comment in this thread, so that maintainers find the important information more quickly.)

I hoped this would be something that I might fix myself, but this is a regex issue, so I’ll let somebody who understands regex figure out how to fix this without just reverting the feature that this change introduces (allowing > inside HTML attributes).

A hotfix could be done in extract_media_refs by repeatedly applying the regex and removing the match from the string until there are no more matches. That is what I would be able to do with my coding knowledge, but a proper solution would involve fixing the regex itself so that it matches all images.
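To make the hotfix idea concrete, here is a rough Python sketch (Anki's actual code is not in Python, and I have not copied its regex). The pattern GREEDY_IMG and the function name extract_media_refs_hotfix are made up for illustration: the pattern is just a stand-in that shows the same symptom, where a greedy `.*` skips over every src except the last. One detail I had not thought through above is that the whole match spans all the tags, so the loop can only cut out the last tag, not the full match.

```python
import re

# Hypothetical stand-in pattern, NOT Anki's real regex. The greedy ".*"
# runs to the end of the field and backtracks to the LAST src attribute,
# so a single search only ever captures the last image.
GREEDY_IMG = re.compile(r'<img\b.*\bsrc="([^"]*)"[^>]*>', re.DOTALL)


def extract_media_refs_hotfix(field: str) -> list[str]:
    """Collect every src by re-running the pattern and cutting out the
    tag that was just captured, until nothing matches any more."""
    refs = []
    while True:
        m = GREEDY_IMG.search(field)
        if m is None:
            break
        refs.append(m.group(1))
        # The match starts at the first "<img", so removing all of it would
        # throw away the earlier images. Cut only the tag that owns the
        # captured src: from the last "<img" before the capture to the end.
        tag_start = field.rfind("<img", 0, m.start(1))
        field = field[:tag_start] + field[m.end():]
    # Images were found last-to-first; restore document order.
    refs.reverse()
    return refs


field = (
    '<img alt="webp image" src="paste_1708544941408.webp">\n'
    '<img alt="webp image" src="paste_1708544955995.webp">'
)
print(GREEDY_IMG.findall(field))         # single pass: only the last image
print(extract_media_refs_hotfix(field))  # loop: both images
```

This is quadratic in the number of images and clearly a workaround, which is why fixing the pattern is the better route.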

By the way, maybe it’s obvious, but a fix should probably also add fields with multiple images as a test case.
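As an illustration of what such a test case could look like, here is a small Python sketch. Everything in it is an assumption on my side: IMG_TAG, SRC_ATTR and extract_img_srcs are hypothetical names, and the patterns are not the ones from Anki's source or from any PR. The idea is only that a per-tag pattern which consumes quoted attribute values as a unit can allow > inside attributes and still find every image.

```python
import re
import unittest

# Hypothetical per-tag pattern (an assumption, not Anki's regex): quoted
# attribute values are consumed whole, so a ">" inside quotes does not end
# the tag, and each <img> is matched on its own.
IMG_TAG = re.compile(r"""<img\b(?:[^>"']|"[^"]*"|'[^']*')*>""", re.IGNORECASE)
SRC_ATTR = re.compile(r"""\bsrc\s*=\s*(?:"([^"]*)"|'([^']*)')""", re.IGNORECASE)


def extract_img_srcs(field):
    """Return the quoted src value of every <img> tag, in document order."""
    srcs = []
    for tag in IMG_TAG.finditer(field):
        m = SRC_ATTR.search(tag.group(0))
        if m:
            srcs.append(m.group(1) if m.group(1) is not None else m.group(2))
    return srcs


class MultipleImagesTest(unittest.TestCase):
    def test_two_images_in_one_field(self):
        field = (
            '<img alt="webp image" src="paste_1708544941408.webp">\n'
            '<img alt="webp image" src="paste_1708544955995.webp">'
        )
        self.assertEqual(
            extract_img_srcs(field),
            ["paste_1708544941408.webp", "paste_1708544955995.webp"],
        )

    def test_gt_inside_attribute(self):
        field = '<img alt="a > b" src="x.png"><img src=\'y.png\'>'
        self.assertEqual(extract_img_srcs(field), ["x.png", "y.png"])
```

Saved to a file, this can be run with python -m unittest. It is not a real HTML parser: for example, it would also pick up data-src or a src= that appears inside another attribute's value, and it ignores unquoted src values.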

Update:

@vaxr reacted to my comments on his PR and submitted a bugfix PR: “Fix regex skipping over all src except the last” (Pull Request #3021, ankitects/anki).
