Unicode Kanji Normalization Duplication Bug

I’m having an issue with Anki’s Unicode normalization, specifically with certain kanji.

The bug (?) is this: if the first field of a note contains a ‘normalizable’ character (i.e., one that would be normalized by default), and a different version of that note (same first field, different other fields) is imported from a text file, Anki does not update the existing note and instead creates a duplicate note.

I think this is a similar issue to [can’t post a link]/t/unicode-normalisation/2531. I’ve tested this on both 2.1.49 and 2.1.51.

The character I’m using as example is 神, U+FA19, which normalizes to 神, U+795E.
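To make the two code points easier to tell apart (they render almost identically), here is a minimal Python sketch using the standard `unicodedata` module. It assumes NFC is the normalization form involved; the helper names `codepoints` and `nfc` are just for illustration and are not part of Anki.

```python
import unicodedata

# Written as escapes so the two look-alike characters cannot be confused.
COMPAT = "\ufa19"   # CJK COMPATIBILITY IDEOGRAPH-FA19
UNIFIED = "\u795e"  # CJK UNIFIED IDEOGRAPH-795E


def codepoints(text: str) -> list[str]:
    """Return the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def nfc(text: str) -> str:
    """Normalize text to NFC (assumed here to be the relevant form)."""
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    print(codepoints(COMPAT), "->", codepoints(nfc(COMPAT)))
    print("equal before normalizing:", COMPAT == UNIFIED)
    print("equal after normalizing: ", nfc(COMPAT) == nfc(UNIFIED))
```

The same `codepoints` helper can be pointed at a field copied out of the card browser to check which of the two characters a note actually contains.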

Reproduction:
0. Run mw.col.conf["normalize_note_text"] = False in the debug console to disable normalization.

  1. Create two text files, one containing

神[tab]old

and the other containing

神[tab]new

  2. Import the first text file, producing a note whose two fields are the U+FA19 kanji and ‘old’.
  3. Import the second text file.
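Since copy-pasting the kanji can silently swap U+FA19 for U+795E, the two import files can also be generated with a short script. This is only a sketch: the file names `old.txt` and `new.txt` and the function `write_repro_files` are my own choices, and it assumes the importer reads tab-separated UTF-8 text.

```python
from pathlib import Path

# Escape sequence guarantees the compatibility ideograph, not U+795E.
COMPAT_KANJI = "\ufa19"


def write_repro_files(directory: Path) -> tuple[Path, Path]:
    """Write two tab-separated files that share the same first field."""
    old_path = directory / "old.txt"   # hypothetical file name
    new_path = directory / "new.txt"   # hypothetical file name
    old_path.write_text(f"{COMPAT_KANJI}\told\n", encoding="utf-8")
    new_path.write_text(f"{COMPAT_KANJI}\tnew\n", encoding="utf-8")
    return old_path, new_path


if __name__ == "__main__":
    for path in write_repro_files(Path(".")):
        first_field = path.read_text(encoding="utf-8").split("\t")[0]
        # Print the code point to confirm the file holds U+FA19.
        print(path, [f"U+{ord(ch):04X}" for ch in first_field])
```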

You would expect the second import to update the note’s second field to ‘new’.

Instead, you now have two notes, one with ‘old’ in the second field and the other with ‘new’. Both have the same U+FA19 kanji in the first field and are consequently marked as duplicates.

For whatever reason, I don’t think this bug happens when you’re importing from .apkg instead of a text file. That is, if you repeat the process but with .apkgs instead, the note is updated as expected, and no duplicates are produced.

Added to “check updating works correctly in new csv importer when not text not normalized” (Issue #1863, ankitects/anki on GitHub).