Long search query optimization/debugging/update

I am checking if I can upgrade from 2.1.22. I use Cardistry, so it will be to .33.

A 15 KB search query I use for prioritization takes 18 seconds with 2.1.22. With .40 and .33 or .26 (the latter is slow either way), it takes over 1.5 minutes.

Filtered decks based on related queries (much shorter and much longer) find nothing.

What should I do?

Well, without seeing the queries it will be hard to give any advice. Anyway, 15 kB is about 15,000 ASCII characters, which seems like an exorbitant length for a search query that’s generally entered manually. Why do you need such long queries?

  1. Important cards are to be exported as text to feed MorphMan’s Readability Analyzer for the generation of a frequency.txt file to prioritize vocab cards (cards with only one unknown morph) by frequency.
  2. To learn the more important new cards first, and to review only the most important ones in case I don’t have enough time in a day.

A short example query of that kind: example/важность-важное.ankisearch · master · aleksejrs / anki-priofiltergenerator · GitLab
But now many tags begin with an asterisk.

This is prepended to it: -is:learn -is:buried deck:/ (is:due OR is:new)

Can’t really help you with the issue of not finding any cards, as this depends on your personal collection.
But I did some quick benchmarking, and it does indeed look as if the search is drastically slower on 2.1.35 than on 2.1.22, and a whole lot slower again on 2.1.41.
I’ll dig a little deeper. (I have contributed some code to this part of Anki in the last year.)

The bad performance seems mostly restricted to tag searches. On 2.1.22, those were implemented with a fast SQL comparison, but they didn’t support escaping wildcards and, especially, didn’t respect tag boundaries. These issues have since been addressed, and tags are now compared by regex, which is probably also the main contributor to the performance loss.
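To make the trade-off concrete, here is a minimal sketch in Python with an in-memory SQLite database. It is not Anki's actual code: the `notes` table layout, the space-padded tag string, and the helper names `like_search` and `regex_search` are assumptions made for illustration. It shows how a wildcard turned into a `LIKE` pattern can match across two separate tags, while a regex that only lets the wildcard span non-whitespace stays inside one tag.

```python
import re
import sqlite3

def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" by calling regexp(Y, X).
    return int(value is not None and re.search(pattern, value) is not None)

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
# Assumed layout: all tags of a note in one space-padded string.
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, tags TEXT)")
conn.executemany(
    "INSERT INTO notes (id, tags) VALUES (?, ?)",
    [
        (1, " alpha-beta "),        # one tag that really matches alpha*beta
        (2, " alpha gamma beta "),  # three separate tags
    ],
)

def like_search(glob):
    # Hypothetical helper: "*" becomes "%", which also matches spaces.
    pattern = "% " + glob.replace("*", "%") + " %"
    rows = conn.execute(
        "SELECT id FROM notes WHERE tags LIKE ? ORDER BY id", (pattern,)
    )
    return [row[0] for row in rows]

def regex_search(glob):
    # Hypothetical helper: "*" becomes \S*, so it cannot cross a tag boundary.
    body = r"\S*".join(re.escape(part) for part in glob.split("*"))
    pattern = r"(?i)(?:^|\s)" + body + r"(?:\s|$)"
    rows = conn.execute(
        "SELECT id FROM notes WHERE tags REGEXP ? ORDER BY id", (pattern,)
    )
    return [row[0] for row in rows]

print(like_search("alpha*beta"))   # [1, 2]: note 2 is a false positive
print(regex_search("alpha*beta"))  # [1]
```

The regex version is correct, but every row has to be handed to a regex engine, which is one plausible reason for the slowdown on a query with many tag terms.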
It should be possible to have much faster and still correct tag matching, but not with the way tags are currently stored (as a single whitespace-separated string). I happen to have suggested that only recently; here is the answer I got:

that’s a change we could potentially make in the future, but it will make maintaining compatibility with older clients more of a headache & slow down the upgrade/downgrade process, so I think it’s best to keep the notes table untouched for now.

So for the time being, you’re out of luck, but maybe tag searches will become fast again in the future.

I don’t know the SQLite schema for current Anki, but are tags indexed? Indexes are cheap in SQLite.

Tags are packed into a single field, and need to be prefix matched, so indexes do not help us here.
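A quick way to see this is to ask SQLite for its query plan. The sketch below assumes a simplified `notes` table with a packed `tags` column and an index named `idx_notes_tags`; both are illustrative and not taken from Anki's schema. Because the pattern starts with a wildcard, the planner has to visit every row, with or without the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified layout: all tags of a note packed into one string.
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, tags TEXT)")
conn.execute("CREATE INDEX idx_notes_tags ON notes (tags)")
conn.executemany(
    "INSERT INTO notes (id, tags) VALUES (?, ?)",
    [(1, " noun verb "), (2, " adverb "), (3, " verb ")],
)

def plan(sql, params=()):
    # The last column of EXPLAIN QUERY PLAN output is the readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

packed_plan = plan("SELECT id FROM notes WHERE tags LIKE ?", ("% verb %",))
for detail in packed_plan:
    # The detail starts with SCAN (every row is visited), not SEARCH.
    print(detail)
```

The exact wording of the plan differs between SQLite versions, but it should report a scan rather than an index search.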

Thx, I need to download and look at the code to see how it’s put together (so busy with my daytime job just now). Another option is to keep a small SQLite database in memory for the most common queries and sync it with the file database from time to time, assuming the database hits for tags are the bottleneck. Or restructure the single field and prefix values into something easier to query in this in-memory database.
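As a rough sketch of that idea, the packed field could be split into one row per tag in an in-memory SQLite database, so that exact and prefix lookups become indexed comparisons. Everything here is hypothetical: the `note_tags` table, the function names, and the sample tags are made up for illustration, and keeping such a cache in sync with the real collection is the hard part that this sketch leaves out.

```python
import sqlite3

def build_tag_cache(packed_rows):
    """Build an in-memory lookup from (note_id, 'space separated tags') pairs."""
    cache = sqlite3.connect(":memory:")
    # The composite primary key doubles as an index ordered by tag.
    cache.execute(
        "CREATE TABLE note_tags ("
        "tag TEXT NOT NULL, note_id INTEGER NOT NULL, "
        "PRIMARY KEY (tag, note_id))"
    )
    cache.executemany(
        "INSERT OR IGNORE INTO note_tags (tag, note_id) VALUES (?, ?)",
        (
            (tag.lower(), note_id)
            for note_id, tags in packed_rows
            for tag in tags.split()
        ),
    )
    return cache

def notes_with_tag(cache, tag):
    rows = cache.execute(
        "SELECT note_id FROM note_tags WHERE tag = ? ORDER BY note_id",
        (tag.lower(),),
    )
    return [row[0] for row in rows]

def notes_with_tag_prefix(cache, prefix):
    # A prefix match expressed as a range, which an ordered index can serve.
    low = prefix.lower()
    high = low + "\U0010ffff"
    rows = cache.execute(
        "SELECT DISTINCT note_id FROM note_tags "
        "WHERE tag >= ? AND tag < ? ORDER BY note_id",
        (low, high),
    )
    return [row[0] for row in rows]

cache = build_tag_cache(
    [(1, " lang::ru verb "), (2, " lang::ru::noun "), (3, " language ")]
)
print(notes_with_tag(cache, "verb"))           # [1]
print(notes_with_tag_prefix(cache, "lang::"))  # [1, 2]
```

Whether this pays off depends on how often notes change compared with how often the long queries run, since every edit to a note's tags would have to be mirrored into the cache.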

A long time ago I had to do something similar with a then-known photo app, but I ended up just using NSMutableDictionary caches to speed up XML export writing.