I am a long-time Anki user and current Ph.D. student. In part prompted by this post on reddit, I’ve been thinking about how one might experimentally tweak the Anki scheduling algorithm. The most straightforward way one might try to find optimal spacing parameters would be just to run an RCT, but it seems possible to do better than that.
In particular, one direction that seems sensible would be to develop a scheduling bandit. This is not entirely trivial: at first blush, it seems that to get decent results you might need hierarchical models that pool across decks (within each individual user) and across users.
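To make the bandit idea concrete, here is a minimal sketch of Thompson sampling over a few candidate interval multipliers. Everything here is illustrative rather than part of any existing Anki code: the class name, the multiplier set, and especially the reward (whether the card is recalled at the next review) are placeholder assumptions. A real reward would have to trade retention off against review load, since maximizing recall alone would just favor the shortest interval.

```python
import random

class IntervalBandit:
    """Thompson sampling over a small set of candidate interval multipliers.

    Each arm keeps a Beta posterior over the probability that a review
    scheduled with that multiplier is answered correctly next time.
    (Illustrative only: a real reward must also penalize review load.)
    """

    def __init__(self, multipliers=(1.5, 2.0, 2.5, 3.0)):
        self.multipliers = multipliers
        # Beta(1, 1) uniform prior for each arm, stored as [alpha, beta].
        self.params = {m: [1, 1] for m in multipliers}

    def choose(self):
        # Sample a success probability from each arm's posterior and pick
        # the arm with the highest sample (Thompson sampling).
        samples = {m: random.betavariate(a, b)
                   for m, (a, b) in self.params.items()}
        return max(samples, key=samples.get)

    def update(self, multiplier, recalled):
        # Binary reward: was the card recalled at its next review?
        a, b = self.params[multiplier]
        if recalled:
            self.params[multiplier] = [a + 1, b]
        else:
            self.params[multiplier] = [a, b + 1]

bandit = IntervalBandit()
m = bandit.choose()          # pick a multiplier for this card's next interval
bandit.update(m, recalled=True)  # feed back the outcome at the next review
```

Hierarchical pooling (across decks or users) would replace the independent per-arm priors with priors shared across groups, but the arm-selection logic would stay the same.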
From an implementation perspective, the former (pooling across decks) doesn’t seem like it would be especially difficult, but the latter (pooling across users) seems potentially thorny. In particular, I don’t see how you could accomplish that completely “client side.” One solution would be to create an add-on which overrides the default scheduling algorithm and does (something like) the following:
1. Every time a user answers a card, the add-on posts the user (or, say, a hash of their AnkiWeb user ID), the deck, and information about the card (e.g., number of previous failures, whether the most recent answer was correct) to an API.
2. The bandit uses that information to choose the next review interval, which is returned in the API response.
3. The user’s local copy of Anki uses the response to schedule the next review.
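The round trip described above might look something like the sketch below. The endpoint URL and every field name are hypothetical placeholders, not an existing API; I’ve used the standard-library `urllib` here, though the `requests` library would work the same way.

```python
import hashlib
import json
import urllib.request

API_URL = "https://example.com/api/next-interval"  # hypothetical endpoint

def build_review_payload(ankiweb_id, deck_name, card_id, ease, prev_failures):
    """Assemble the review record from step 1 above.

    The user is identified only by a hash of their AnkiWeb ID, never the
    raw ID itself.
    """
    return {
        "user": hashlib.sha256(ankiweb_id.encode()).hexdigest(),
        "deck": deck_name,
        "card": card_id,
        "ease": ease,                   # 1 = Again ... 4 = Easy
        "prev_failures": prev_failures,
        "correct": ease > 1,
    }

def request_next_interval(payload, timeout=5):
    """POST the payload and return the server-chosen interval in days
    (steps 2 and 3).

    Returns None on any network error, so the add-on could fall back to
    Anki's default scheduling instead of blocking the review.
    """
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp)["interval_days"]
    except OSError:
        return None
```

In an actual add-on, `build_review_payload` would be fed from a reviewer hook, and the returned interval would be written back to the card.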
Step 2, since it happens completely server-side, isn’t an issue from an implementation perspective. Likewise, my sense (which could be completely wrong) is that it would be reasonably easy to use existing hooks to implement Step 3. However, I don’t have a clear enough understanding of the Anki source code to know what is involved in Step 1. For instance, is it possible to simply import the Python `requests` library or `urllib` to handle the API call? Base Anki obviously does something similar to sync with AnkiWeb, but I don’t know exactly how that functionality is implemented.
If that’s possible, then something like this wouldn’t actually be prohibitively difficult to put together. (Not to say that spinning up a server and API for the bandit is necessarily a snap, but it’s unquestionably implementable.)
Thanks! Also, if this or something similar has been tried before, please let me know!
I’ve thought a bit about this as well. Most discussions about optimizing SRS scheduling seem misguided to me, but I think you are correct that meaningful scheduling improvements should take into account a far broader context than per-card user ratings. Data from other users who are studying the same cards seems like a great place to start mining for that additional context.
Out of curiosity, is your thesis related to these questions/goals? I’m wondering whether you’d be interested in contributing research, design, or programming hours to this.
Yes, the requests library is available for add-ons.
@ankipalace is working on a project right now that would enable all of this as soon as this summer (including the infrastructure, API, and desktop add-on code). With those pieces in place, it would be trivial for us to extend the API and client-side add-on to facilitate curating this data and exposing it for anybody to analyze or to provide inputs to a scheduling bandit.
One path forward that might have potential would be to crowdfund a Kaggle competition.
Technical notes: I’m building out the API right now with the Django REST Framework and am considering adding MongoDB integration to allow for more flexibility in the payloads that the API can receive and store.
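The flexibility point is that a document store doesn’t force every review record into one fixed schema: two payloads with different shapes can land in the same collection. Both documents below are purely illustrative (invented field names and values):

```json
{"user": "3f9c", "deck": "Spanish", "ease": 3, "prev_failures": 1}
{"user": "3f9c", "deck": "Spanish", "ease": 2, "response_ms": 5400, "client": "desktop"}
```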
Hi @andrewsanchez —
Thanks for your response! In short, the answer is “yes”: I am interested in putting in the hours to (1) develop a modeling approach and (2) code the add-on. (I don’t know much about design, but, at least as I’ve been thinking about it so far, there shouldn’t be much design needed, since hopefully this can be a largely under-the-hood, drop-in replacement for the default scheduling algorithm.)
I’ve been talking with some more experienced students and postdocs in my lab who have experience in education research to think through modeling approaches, and I now have a better understanding of what seems sensible. In particular, I think it makes sense (as it usually does!) to start with a slightly simpler approach than what I described in my initial post and iterate. With the benefit of that discussion, it seems like it should be possible to do a reasonable first iteration using just the `requests` library (plus building some things server-side to collect data and train the prediction model), although, assuming the simpler approach actually works (which is, of course, not at all guaranteed), it would be interesting as a next step to try to take advantage of things like shared decks. Is there a git repo where I could learn a little more about the AnkiHub add-on you’re currently developing?