@DerIshmaelite here’s a sneak peek at Alex’s neural net:
It uses a modified version of the RWKV architecture, which combines the properties of an RNN and a Transformer. It has too many input features to list them all, so here is the short version: interval lengths, grades, review duration (aka answer time), note ID, deck ID, preset ID, sibling card information, hour of the day, day of the week, and the number of reviews done today (aka workload). Other features are either variations of these (such as reviews of new cards instead of reviews of non-new cards) or are derived from them via data transformations.
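To make the feature list above more concrete, here is a hypothetical sketch of how such per-review features might be assembled into a flat input vector. All names, encodings, and the cyclical time encoding are my assumptions for illustration, not the actual RWKV pipeline.

```python
# Hypothetical sketch: turning the listed review features into a flat vector.
# Field names, log scaling, and sin/cos time encoding are assumptions,
# not the real implementation.
import math
from dataclasses import dataclass

@dataclass
class Review:
    interval_days: float      # length of the preceding interval
    grade: int                # 1=Again, 2=Hard, 3=Good, 4=Easy
    answer_time_ms: int       # duration of the review (answer time)
    note_id: int
    deck_id: int
    preset_id: int
    hour_of_day: int          # 0-23
    day_of_week: int          # 0-6
    reviews_done_today: int   # workload

def to_features(r: Review) -> list[float]:
    # Cyclical sin/cos encoding is a common choice for hour/day features.
    return [
        math.log1p(r.interval_days),
        float(r.grade),
        math.log1p(r.answer_time_ms),
        math.sin(2 * math.pi * r.hour_of_day / 24),
        math.cos(2 * math.pi * r.hour_of_day / 24),
        math.sin(2 * math.pi * r.day_of_week / 7),
        math.cos(2 * math.pi * r.day_of_week / 7),
        float(r.reviews_done_today),
    ]

features = to_features(Review(3.0, 3, 4200, 101, 7, 1, 21, 4, 57))
print(len(features))  # 8
```

IDs (note/deck/preset) would presumably be embedded rather than fed in as raw numbers, which is why they are omitted from the vector here.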
RWKV-P predicts the result of a review at the time of the review. It does not have a forgetting curve in the traditional sense and instead predicts the probability of recall directly. Just like GRU-P, it may output unintuitive predictions: for example, it may never predict 100%, or it may predict that the probability of recall increases over time.
They are actually the same net, just in two different “modes” or “regimes”.
Here’s what RWKV-P + Anki would look like in practice:
- No “Optimize”, it would be pretrained on 10k users and then the same parameters would be used for everyone.
- No parameters window.
- Accurate probability of recall for any interval, even on the scale of minutes and seconds, unlike FSRS.
- No user-defined learning steps.
- It can accurately predict p(recall) (R) for cards on which FSRS cannot possibly perform well. Consider the following simplified example: the user was in a good mood and was spamming Good at first, but then their mood got worse, and now they are spamming Again. The sequence looks like this:
Good, Good, Good, Good, Good, Again, Again, Again, Again, Again
FSRS cannot take advantage of this pattern; RWKV-P can.
- No memory stability and difficulty values, and no forgetting curve graphs.
- Adapting the simulator to work with RWKV-P would be either impossible or insanely difficult.
- No intervals above answer buttons.
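The Good-then-Again example above can be illustrated with a toy model (not RWKV) that sees the whole grade sequence and weights recent outcomes more heavily, so the run of Again at the end dominates its estimate, whereas a plain average over the history stays at 50%.

```python
# Toy illustration (not RWKV): a sequence-aware predictor can weight recent
# reviews more, so a trailing run of failures pulls the estimate way down.
def recency_weighted_recall(grades, decay=0.5):
    """grades: 1=Again (fail), 3=Good (pass); most recent grade last."""
    total, weight_sum, weight = 0.0, 0.0, 1.0
    for g in reversed(grades):          # walk from most recent to oldest
        total += weight * (1.0 if g >= 3 else 0.0)
        weight_sum += weight
        weight *= decay                 # older reviews count less
    return total / weight_sum

seq = [3, 3, 3, 3, 3, 1, 1, 1, 1, 1]    # Good x5, then Again x5
plain = sum(1 for g in seq if g >= 3) / len(seq)
recent = recency_weighted_recall(seq)
print(plain, round(recent, 2))          # 0.5 0.03
```

The exponential decay here is just the simplest possible stand-in for "the net can learn to react to the recent pattern"; the real model would learn such weighting from data.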
Scheduling would instead be completely different: RWKV-P cannot produce meaningful intervals, so every hour/minute/second it would recalculate R for all cards and show you the cards whose R has fallen below the threshold, which is desired retention. Doing this every minute (let alone every second) would be extremely computationally expensive, so for the foreseeable future this is not viable.
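The threshold-based scheduling described above can be sketched in a few lines. `predict_recall` stands in for RWKV-P and is a made-up exponential-decay placeholder here; only the "due = R below desired retention" logic is the point.

```python
# Sketch of threshold scheduling: no intervals, a card is simply "due"
# once its predicted recall drops below desired retention.
import math

DESIRED_RETENTION = 0.9

def due_cards(cards, now_h, predict_recall):
    """Return cards whose predicted recall at time now_h is below threshold."""
    return [c for c in cards if predict_recall(c, now_h) < DESIRED_RETENTION]

# Stand-in predictor (illustration only; the real model would be RWKV-P):
def fake_predict(card, now_h):
    elapsed = now_h - card["last_review_h"]
    return 0.5 ** (elapsed / card["halflife_h"])

cards = [
    {"id": 1, "last_review_h": 0.0, "halflife_h": 5.0},    # forgets fast
    {"id": 2, "last_review_h": 0.0, "halflife_h": 200.0},  # forgets slowly
]
print([c["id"] for c in due_cards(cards, 2.0, fake_predict)])  # [1]
```

The expensive part is that `predict_recall` must be re-evaluated for every card on every pass, which is exactly the cost problem described below.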
We would need extremely aggressive optimization (in the “make it run faster” sense, not in the “find optimal parameters” sense) and/or hardware advancements for it to become viable. Hardware improves anyway, but at this rate it will be a while before we can run RWKV on 100k cards every single minute on a smartphone.
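A back-of-envelope calculation shows the scale of the problem. The per-card inference time below is a purely illustrative assumption, not a measurement:

```python
# Back-of-envelope cost estimate with purely illustrative numbers:
# if one forward pass per card took ~1 ms on a phone, rescoring a
# 100k-card collection would take ~100 s, longer than a 1-minute cycle.
cards = 100_000
ms_per_card = 1.0                       # assumed, not measured
total_s = cards * ms_per_card / 1000
print(total_s)                          # 100.0 seconds per full pass
```

Even if the per-card time were 10x smaller, a full pass would still take ~10 s every minute, on battery, which is why aggressive optimization or batching tricks would be needed.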
Also, depending on how you look at it, RWKV’s parameter count should be considered zero, since its parameters are not optimized for each user individually; everyone gets the same parameters. In this benchmark we consider FSRS-6 with default parameters to have 0 optimizable parameters because the parameters are static and the same for all users, so by the same logic RWKV and RWKV-P also have 0 optimizable parameters.