Unicode Kanji Normalization Duplication Bug

tsuru · May 11, 2022, 6:07pm

I’m having an issue with Anki’s Unicode normalization, specifically with certain kanji.

The bug (?) is this: if a ‘normalizable’ character (i.e., one that would be normalized by default) is in the first field of a note, and a different version of that note (same first field, different other fields) is imported from a text file, the existing note is not updated. A duplicate note is created instead.

I think this is similar to the issue in [can’t post a link]/t/unicode-normalisation/2531. I’ve tested this on both 2.1.49 and 2.1.51.

The character I’m using as example is 神, U+FA19, which normalizes to 神, U+795E.
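Since the two characters look identical on screen, it helps to check code points directly. Here is a minimal Python sketch using the standard unicodedata module. It assumes NFC is the normalization form in question; the helper names code_points and is_normalizable are made up for illustration and are not part of Anki.

```python
import unicodedata

COMPAT = "\uFA19"   # CJK compatibility ideograph (the one used in this report)
UNIFIED = "\u795E"  # unified ideograph it normalizes to


def code_points(text):
    """Return the code points of text as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


def is_normalizable(text, form="NFC"):
    """True if normalizing text to the given form would change it."""
    return unicodedata.normalize(form, text) != text


if __name__ == "__main__":
    print(code_points(COMPAT))
    print(code_points(unicodedata.normalize("NFC", COMPAT)))
    print(is_normalizable(COMPAT), is_normalizable(UNIFIED))
```

Running the same check on a field copied out of the card browser shows which of the two variants a note actually stores.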

Reproduction:
Prerequisite: mw.col.conf["normalize_note_text"] = False must already have been run in the debug console to disable normalization.

Create two text files, one containing

神[tab]old

and the other containing

神[tab]new

Import the first text file. This produces a note whose two fields are the kanji and ‘old’.
Import the second text file.
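Because an editor or the clipboard may silently swap U+FA19 for U+795E, the two files can also be generated with a script. This is only a sketch: the file names old.txt and new.txt and the function write_import_files are arbitrary choices for illustration, and the files are assumed to be UTF-8 with a real tab between the fields.

```python
from pathlib import Path

FIRST_FIELD = "\uFA19"  # the compatibility ideograph, written by escape so it cannot be normalized by accident


def write_import_files(directory="."):
    """Write old.txt and new.txt (hypothetical names) as tab-separated UTF-8 files."""
    directory = Path(directory)
    paths = []
    for name, second_field in (("old.txt", "old"), ("new.txt", "new")):
        path = directory / name
        path.write_text(f"{FIRST_FIELD}\t{second_field}\n", encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    for p in write_import_files():
        print(p)
```

Both files then share a byte-identical first field, so any mismatch on import cannot come from the files themselves.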

You would expect this to update the note’s second field to ‘new’.

Instead, you now have two notes: one with ‘old’ in the second field and one with ‘new’. Both have the same kanji in the first field and are consequently marked as duplicates.

For whatever reason, the bug doesn’t seem to happen when importing from an .apkg instead of a text file. If you repeat the process with .apkg files, the note is updated as expected and no duplicates are produced.

dae · May 12, 2022, 1:42am

Added to “check updating works correctly in new csv importer when not text not normalized” (Issue #1863, ankitects/anki, GitHub)
