Sorting should produce a more alphabetical order

As of Anki 2.1.54, when sorting by “Sort Field”, letters with diacritics, such as umlauts, don’t quite end up where users would expect them to. This is because the comparison is a simple collate nocase in SQLite. The nocase collation only folds ASCII letters and compares everything else by its encoded value, which causes Ü (U+00DC) to sort before ä (U+00E4), even though a precedes U.
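To make the behaviour concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `notes` and the helper `sort_nocase` are made up for illustration; only the column name `sfld` and the `COLLATE NOCASE` ordering come from the discussion, and this is not Anki's actual query.

```python
import sqlite3


def sort_nocase(values):
    """Sort strings the way SQLite's built-in NOCASE collation orders them."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; only the column name mirrors Anki's sort field.
    conn.execute("CREATE TABLE notes (sfld TEXT)")
    conn.executemany("INSERT INTO notes VALUES (?)", [(v,) for v in values])
    rows = conn.execute("SELECT sfld FROM notes ORDER BY sfld COLLATE NOCASE")
    result = [row[0] for row in rows]
    conn.close()
    return result


# NOCASE folds only ASCII letters, so "A" and "b" interleave as expected,
# while "Ü" and "ä" land after "z" and "Ü" comes before "ä".
print(sort_nocase(["ä", "Ü", "b", "A", "z"]))
```

With this input, the non-ASCII letters end up after “z”, and Ü comes before ä.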

It doesn’t seem easy to do much better than this in SQLite. We could provide our own collation function, but I don’t like that idea very much, as we would then have to maintain some C code (I think). Alternatively, we could order by a string obtained by normalizing the “Sort Field” string in SQL, but I don’t believe that would be performant enough.
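As a rough sketch of what a custom collation function could look like, the snippet below registers one through Python's sqlite3 module rather than in C. Everything here is an assumption for illustration: the collation name `folded`, the helpers `fold` and `sort_with_collation`, and the folding rule itself (NFKD decomposition, dropping combining marks, then case folding). It says nothing about performance on a real collection.

```python
import sqlite3
import unicodedata


def fold(text):
    """Illustrative folding: strip combining marks, then case-fold."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def collate_folded(a, b):
    """Comparison callback: negative, zero, or positive, as SQLite expects."""
    ka, kb = fold(a), fold(b)
    return (ka > kb) - (ka < kb)


def sort_with_collation(values):
    conn = sqlite3.connect(":memory:")
    # "folded" is a made-up collation name for this sketch.
    conn.create_collation("folded", collate_folded)
    conn.execute("CREATE TABLE notes (sfld TEXT)")
    conn.executemany("INSERT INTO notes VALUES (?)", [(v,) for v in values])
    rows = conn.execute("SELECT sfld FROM notes ORDER BY sfld COLLATE folded")
    result = [row[0] for row in rows]
    conn.close()
    return result


print(sort_with_collation(["z", "Ü", "b", "ä"]))
```

The callback runs for every comparison SQLite makes while sorting, which is exactly where the performance concern comes from.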

I think we can get to some fairly “language-agnostic” alphabetical sorting order which would be better than the one we currently have. To give an example, for German, there are at least two standardized transformations to produce “more” alphabetical sorting orders:

  • DIN 5007 Variant 1 (“Dictionary order”)
      ä = a
      ö = o
      ü = u
      ß = ss
  • DIN 5007 Variant 2 (“Phone book order”)
      ä = ae
      ö = oe
      ü = ue
      ß = ss
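The two mappings above can be sketched as sort keys in Python. This is a simplified illustration that applies only the character substitutions listed; the names `DIN_5007_1`, `DIN_5007_2` and `din_key` are made up, and any further rules of the standard (such as tie-breaking between otherwise equal keys) are left out.

```python
# Substitution tables taken directly from the two variants listed above.
DIN_5007_1 = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
DIN_5007_2 = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def din_key(text, table):
    """Lowercase the text, then apply the chosen substitution table."""
    return text.lower().translate(table)


words = ["Mull", "Müller", "Mufti", "Mond"]

# Variant 1 treats "ü" as "u", so "Müller" sorts after "Mull".
print(sorted(words, key=lambda w: din_key(w, DIN_5007_1)))

# Variant 2 treats "ü" as "ue", so "Müller" sorts before "Mufti".
print(sorted(words, key=lambda w: din_key(w, DIN_5007_2)))
```

The same word list comes out in a different order under each variant, which shows why even a single language does not have one obvious “alphabetical” order.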

Some discussion and research are needed to figure out what this language-agnostic transformation would look like. It doesn’t sound like a novel problem, so we can probably get some inspiration from existing open-source software. We would still need to figure out how to implement this in a performant way.

Preemptively, I strongly recommend against any solution that depends on the system locale. Anki is very often used to learn other languages, so I believe the system locale will often not be the one users want their decks sorted by.

Anki already injects a unicase collator which is used for things like sorting decks. Before we could switch to it here though, we’d need to get an idea of the performance impact it would have, and make sure that it doesn’t break the current behaviour where numbers are sorted in a natural order.

Great! I didn’t know we had that. I can look into testing this if you want me to.

I’ve added it to GitHub as issue #1979 in ankitects/anki: “Investigate performance impact of using unicase sorting for sfld column”.