Sorting should produce a more alphabetical order

BernardoSulzbach · July 20, 2022, 5:07pm

As of Anki 2.1.54, when sorting by “Sort Field”, letters with diacritics (umlauts, for example) don’t quite end up where users would expect them. This is because the comparison is a plain collate nocase in SQLite. The nocase collation only folds ASCII letters and otherwise compares bytes, which causes Ü (U+00DC) to precede ä (U+00E4), even though a precedes u.
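To make the problem concrete, here is a minimal sketch that reproduces the ordering with Python's standard sqlite3 module. The table and column names (notes, sfld) are only illustrative and are not meant to match Anki's actual schema or query.

```python
import sqlite3

# In-memory database with a single text column standing in for the sort field.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (sfld TEXT)")
conn.executemany(
    "INSERT INTO notes VALUES (?)",
    [(w,) for w in ["ä", "Z", "Ü", "a", "U"]],
)

# NOCASE folds only ASCII letters; everything else is compared byte by byte.
rows = conn.execute("SELECT sfld FROM notes ORDER BY sfld COLLATE NOCASE")
nocase_order = [row[0] for row in rows]
print(nocase_order)  # ['a', 'U', 'Z', 'Ü', 'ä']
```

Both umlauts land after Z, and Ü comes before ä because its UTF-8 encoding is the smaller byte sequence.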

It doesn’t seem easy to do much better than this in SQLite. We could provide our own collation function, an idea that I don’t like very much, as we would then have to maintain some C code (I think). Alternatively, we could order by a string obtained by normalizing the “Sort Field” string in SQL, but I don’t believe that would be performant enough.
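For illustration only, this is roughly what a custom collation looks like when registered through a language binding instead of C. It is a sketch using Python's sqlite3.Connection.create_collation; the collation name "folded" and the fold helper are hypothetical, and this says nothing about how Anki would wire it up or how it would perform on a large collection.

```python
import sqlite3
import unicodedata


def fold(text: str) -> str:
    """Hypothetical sort key: strip combining marks, then casefold."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def folded_collation(a: str, b: str) -> int:
    """Three-way comparison on the folded keys, as SQLite expects."""
    ka, kb = fold(a), fold(b)
    return (ka > kb) - (ka < kb)


conn = sqlite3.connect(":memory:")
conn.create_collation("folded", folded_collation)
conn.execute("CREATE TABLE notes (sfld TEXT)")
conn.executemany(
    "INSERT INTO notes VALUES (?)",
    [(w,) for w in ["Zebra", "Übung", "apfel", "Ärger", "uhr"]],
)

rows = conn.execute("SELECT sfld FROM notes ORDER BY sfld COLLATE folded")
folded_order = [row[0] for row in rows]
print(folded_order)  # ['apfel', 'Ärger', 'Übung', 'uhr', 'Zebra']
```

The callback runs once per comparison, which is exactly where the performance concern comes from.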

I think we can get to some fairly “language-agnostic” alphabetical sorting order which would be better than the one we currently have. To give an example, for German, there are at least two standardized transformations to produce “more” alphabetical sorting orders:

DIN 5007 Variant 1 (“Dictionary order”)

ä = a
ö = o
ü = u
ß = ss

DIN 5007 Variant 2 (“Phone book order”)

ä = ae
ö = oe
ü = ue
ß = ss
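The two variants above can be sketched as sort keys in Python. The function name din_key and the sample names are made up for the example; only the four substitutions come from the lists above, and lowercasing first is my own simplification.

```python
# Substitution tables taken from the two DIN 5007 variants listed above.
DIN_5007_1 = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
DIN_5007_2 = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def din_key(text: str, table: dict) -> str:
    """Lowercase the text, then apply the variant's substitutions."""
    return text.lower().translate(table)


names = ["Mugler", "Müller", "Muffat"]

dictionary_order = sorted(names, key=lambda s: din_key(s, DIN_5007_1))
phone_book_order = sorted(names, key=lambda s: din_key(s, DIN_5007_2))

print(dictionary_order)  # ['Muffat', 'Mugler', 'Müller']
print(phone_book_order)  # ['Müller', 'Muffat', 'Mugler']
```

The same three names come out in a different order under each variant, which shows that even within one language there is no single "correct" transformation.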

Some discussion and research are needed to figure out what this language-agnostic transformation would look like. It doesn’t sound like a novel problem, so we can probably get some inspiration from existing open-source software. We would still need to figure out how to implement this in a performant way.

Preemptively, I strongly recommend against any solution that depends on the system locale. Anki is very often used to learn other languages, so I believe the locale you get from the system will often not be the one users want their decks sorted by.

dae · July 21, 2022, 3:51am

Anki already injects a unicase collator which is used for things like sorting decks. Before we could switch to it here though, we’d need to get an idea of the performance impact it would have, and make sure that it doesn’t break the current behaviour where numbers are sorted in a natural order.
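As a rough sketch of the kind of check this implies: in SQLite, a collation only applies when comparing text values, and integer values sort before text in numeric order no matter which collation is named. Whether Anki's natural ordering of numbers relies on this is an assumption on my part. The collation name demo_unicase is hypothetical and is not Anki's actual unicase collator.

```python
import sqlite3


def by_casefold(a: str, b: str) -> int:
    """Stand-in for a Unicode-aware, case-insensitive collation."""
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)


conn = sqlite3.connect(":memory:")
conn.create_collation("demo_unicase", by_casefold)

# No declared column type, so integers stay integers and text stays text.
conn.execute("CREATE TABLE notes (sfld)")
conn.executemany(
    "INSERT INTO notes VALUES (?)",
    [(10,), ("b",), (9,), ("A",), ("10",), ("9",)],
)

rows = conn.execute("SELECT sfld FROM notes ORDER BY sfld COLLATE demo_unicase")
mixed_order = [row[0] for row in rows]
print(mixed_order)  # [9, 10, '10', '9', 'A', 'b']
```

The integers 9 and 10 keep their numeric order, while the strings "10" and "9" are compared as text by the collation.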

BernardoSulzbach · July 21, 2022, 7:44am

Great! I didn’t know we had that. I can look into testing this if you want me to.

dae · July 22, 2022, 3:25am

I’ve added it to “Investigate performance impact of using unicase sorting for sfld column” (Issue #1979, ankitects/anki on GitHub).
