Unicode normalisation

Anki normalises Unicode, and I think this makes sense. In previous versions of Anki Desktop, when adding a new card, it would warn about duplicates based on normalised strings. However, in the most recent version (2.1.30), this no longer works.

For example, “á” and “á” are equivalent Unicode sequences: \0061\0301 and \00E1, respectively. Anki automatically converts to the latter. In v2.1.30, if there is already a card with \00E1, and you try to add a card with \0061\0301, there is no warning about the duplicate in the add card dialog (i.e. before creating the card). However, once you add the card it is normalised to \00E1 and so you get a duplicate.
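Assuming Anki normalises to Unicode NFC (canonical composition), Python's standard `unicodedata` module shows why the two strings differ as raw code points but match after normalisation. This is a minimal illustration, not Anki's own code:

```python
import unicodedata

decomposed = "a\u0301"   # U+0061 LATIN SMALL LETTER A + U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e1"   # U+00E1 LATIN SMALL LETTER A WITH ACUTE

# The raw code point sequences differ, so a plain string comparison fails.
print(decomposed == precomposed)          # False
print(len(decomposed), len(precomposed))  # 2 1

# NFC composes the base letter and the combining mark into a single code point.
nfc = unicodedata.normalize("NFC", decomposed)
print(nfc == precomposed)                 # True
print(f"U+{ord(nfc):04X}")                # U+00E1
```

A duplicate check that compares the raw strings would miss this match, even though both render as the same "á".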

As normalisation used to be applied in the add card dialog, this bug seems to be a regression in a recent version.

Taken from the changelog:

  • Unicode normalization: If you are studying rare CJK characters and wish to prevent them from being converted into modern equivalents, the following in the debug console will stop Anki from normalizing note text:
mw.col.conf["normalize_note_text"] = False
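The opt-out exists because NFC also maps CJK compatibility ideographs to their unified equivalents. The sketch below assumes the setting simply toggles NFC; `maybe_normalize` and its `normalize_note_text` parameter are hypothetical stand-ins for the config key above, not Anki's implementation:

```python
import unicodedata

def maybe_normalize(text: str, normalize_note_text: bool = True) -> str:
    """Hypothetical helper: apply NFC only when the setting is enabled."""
    if normalize_note_text:
        return unicodedata.normalize("NFC", text)
    return text

compat = "\uf900"  # U+F900 CJK COMPATIBILITY IDEOGRAPH-F900
print(f"U+{ord(maybe_normalize(compat)):04X}")         # U+8C48 (unified ideograph)
print(f"U+{ord(maybe_normalize(compat, False)):04X}")  # U+F900 (left unchanged)
```

Diacritics such as "á" are affected by the same switch, which is why turning it off is not a fix for the duplicate check.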

I’m not talking about rare CJK characters, but rather diacritics like “á” vs “á” (\0061\0301 vs \00E1). While some fonts have different glyphs for CJK compatibility characters, “á” and “á” should correctly be displayed identically.

I’m not trying to switch off conversion. I would like the add card dialog’s duplicate check to take Unicode equivalence into account, as it used to, so that I don’t add a duplicate card by mistake.
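A minimal sketch of a duplicate check that respects canonical equivalence, assuming existing field values are already stored in NFC. `is_duplicate` and `existing_fields` are hypothetical names, not Anki's API; the point is that the candidate text is normalised the same way it will be when the card is saved, before it is compared:

```python
import unicodedata

def is_duplicate(candidate: str, existing_fields: set) -> bool:
    """Hypothetical check: normalise the new text to NFC before looking it up."""
    return unicodedata.normalize("NFC", candidate) in existing_fields

# Existing note field, already stored in NFC (U+00E1).
existing_fields = {"\u00e1"}

# Typed in decomposed form (U+0061 U+0301): still detected as a duplicate.
print(is_duplicate("a\u0301", existing_fields))  # True
print(is_duplicate("a", existing_fields))        # False
```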

Thanks for the report, this will be fixed in the next beta.
