Hello @dae. Setting aside the technical difficulty of migrating from FSRS itself, what else do you regard as an obstacle to such a radical change, given that a neural net (with downscaled parameters) would most definitely bring its own set of benefits? Dekki.ai is already using a neural net of its own (granted, it hasn’t been benchmarked yet).
I disagree. These kinds of algorithms are usually difficult to interpret, and seeing how entire worlds explode over the FSRS parameters, which really aren’t that hard to understand, I don’t want to see the users when they get no explanation of why the intervals are the way they are. It’s also arguably more expensive than the current approach in terms of server and human resources. What benefit does it bring other than jumping on the AI hype train and being a fancy buzzword?
The effort is better spent elsewhere. Anki’s biggest issues have nothing to do with user intervals being off by 2–3%, which is probably the net benefit you’d get at scale from a change like this (I didn’t check any benchmarks, so feel free to correct me).
I find that interpretability of the parameters is only important for troubleshooting. @Expertium may have other opinions about this; if he and Jarrett decide it’s a major deal-breaker, I’ll agree. But if you are going to talk about the users, well… the users don’t have to understand how an interval comes about. That creates more problems than it solves. In fact, there is even an argument to be made for hiding the FSRS params completely from the options UI.
@Expertium
What benefit does it bring other than jumping on the AI Hype train and being a fancy buzzword?
This is where we start speaking about fine margins, where this discussion really is irrelevant to the average casual Anki user. It is definitely more accurate in terms of predicting the probability of recall, accurate <1d intervals, not having to keep optimising all the time, taking answer time into account, etc. So it’s not just some AI hype train buzzword.
This of course all comes with a cost in terms of computing power. There is no doubt about that. But there are more compact versions of a neural net that don’t have to possess a million parameters or so, compared to FSRS’s 21 parameters.
So yeah, this has its own advantages and disadvantages, as everything else really. But I saw Dekki do it, so why can’t Anki?
You probably only wanted a reply from dae, but I’ll mention some things.
- Anki needs to work on most modern devices. If FSRS-NN crashes and lags on mobile that’s a downside we should be avoiding.
- Isn’t it better to have dev power go into something more useful, like svelte migration?
And is this not already an issue with FSRS? For the average person, FSRS is already peak confusion; more complexity would hardly be an extra limiting factor for their comprehension. Plus, IMO most users don’t care how the app works.
AFAIK if the NN doesn’t do a post-training on individual user data then you don’t need additional sync server resources. @DerIshmaelite, would this need to be optimized on individual user data?
As far as I know, your device is the thing that is going to have to work harder for optimizing, not the server itself. But ask Expertium.
That is an IF. There is already a toggle to switch between FSRS and SM-2. If Dae wants an even more conservative approach, he might consider making a separate experimental NN build of Anki. It took a long time before FSRS first became integral to Anki; I am not expecting this to be any different.
- Isn’t it better to have dev power go into something more useful, like svelte migration?
There are all sorts of issues everywhere. Dev resources are stretched thin, but that doesn’t mean we should downplay opportunities to improve. If dae is convinced, he might set this high on his priority list. And an NN is certainly useful, maybe more for some users than others. I am not saying that he should do it right here and now, just that he should maybe consider it and see what his opinion is.
Alex (1DWalker on GitHub) is making a neural network that is far more accurate than FSRS, and he’s mostly done. It currently has several million parameters, but that number can be reduced, plus we can use some tricks like quantization (storing weights as 8-bit numbers instead of 32- or 16-bit ones) to make the NN faster.
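To make the quantization idea concrete, here is a minimal sketch of per-tensor affine (asymmetric) int8 quantization with NumPy. This is an illustration of the general technique only, not Alex’s actual implementation; it assumes the weight tensor is not constant (so the scale is nonzero).

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    # Per-tensor affine quantization: map float32 weights onto [-128, 127].
    # Illustrative sketch only, not the real model's quantization scheme.
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / 255.0          # assumes w_max > w_min
    zero_point = int(round(-128 - w_min / scale))
    q = np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    # Recover approximate float32 weights; the round-trip error per weight
    # is at most half a quantization step (scale / 2).
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000).astype(np.float32)
q, scale, zp = quantize_int8(w)
w_hat = dequantize(q, scale, zp)
# int8 storage is 4x smaller than float32 at a bounded accuracy cost.
```

In practice you would quantize each weight matrix separately (per-tensor or per-channel scales) and keep accumulations in higher precision, which is why quantization usually costs little accuracy while cutting memory and speeding up inference.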
Advantages of using Alex’s NN:
- Significantly more accurate predicted probability of recall, especially at <1d intervals
- Accurate intervals on the scale of minutes, something that FSRS cannot do because it lacks a short-term memory model
- No need to show parameters
- No need for the “Optimize” button - it would be pretrained on 10k users
- It can use more input features, such as deck ID, preset ID, sibling card information and workload (the number of reviews done that day) to further improve predictions
Disadvantages of using Alex’s NN:
- It will require a lot of work to make it fast enough, especially on mobile devices
- It’s hard to guarantee that it won’t do weird things, such as violating interval(Again) <= interval(Hard) <= interval(Good) <= interval(Easy), giving someone a 100-year first interval, or something else
Personally, I agree that development efforts are better spent elsewhere for now
I believe we are talking about storing and syncing the params. If optimisation is needed and if it’s not local, then just ditch the idea.
That’s kind of a huge win:
- No need for auto-optimisation.
- No need for a hundred presets.
I’ll just mention that Alex previously talked about setting hard rules on top of what the NN gives us. So we can have heuristics separate from the NN that guarantee things don’t go wrong.
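Such guardrails could be layered on top of the raw NN output as plain post-processing. The sketch below is hypothetical (the function name and limits are made up, not from Alex’s code): it clamps each suggested interval to a sane range and forces the button order to be non-decreasing, so the net can never hand out a 100-year first interval or an inverted Again/Hard/Good/Easy sequence.

```python
def apply_guardrails(intervals: dict,
                     min_days: float = 1.0 / 1440.0,   # 1 minute
                     max_days: float = 36500.0) -> dict:
    """Hypothetical hard rules applied to raw NN interval suggestions.

    Guarantees interval(Again) <= interval(Hard) <= interval(Good)
    <= interval(Easy), and keeps every interval inside [min_days, max_days].
    """
    order = ["again", "hard", "good", "easy"]
    clamped = {g: min(max(intervals[g], min_days), max_days) for g in order}
    # Carry the running maximum forward to enforce monotonicity.
    running = clamped["again"]
    for g in order:
        running = max(running, clamped[g])
        clamped[g] = running
    return clamped

raw = {"again": 2.0, "hard": 0.5, "good": 10.0, "easy": 40000.0}
safe = apply_guardrails(raw)
# safe == {"again": 2.0, "hard": 2.0, "good": 10.0, "easy": 36500.0}
```

The design choice here is that the NN stays a black box; correctness properties live in a few lines of deterministic code that are trivial to audit, which addresses the “hard to guarantee it won’t do weird things” disadvantage above.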
@DerIshmaelite here’s a sneak peek at Alex’s neural net:
The model uses a modified version of the RWKV architecture, which combines the properties of an RNN and a Transformer. It has too many input features to list, so here is a short version: interval lengths, grades, duration of the review (aka answer time), note ID, deck ID, preset ID, sibling card information, hour of the day, day of the week, and the number of reviews done today (aka workload). Other features are either variations of these (like reviews of new cards instead of reviews of non-new cards) or are derived from these via some data transformations.
RWKV-P predicts the result of a review at the time of the review. It does not have a forgetting curve in the traditional sense and predicts the probability of recall directly. Just like GRU-P, it may output unintuitive predictions; for example, it may never predict 100%, or predict that the probability of recall will increase over time.
They are actually the same net, just in two different “modes” or “regimes”.
Here’s what RWKV-P + Anki would look like in practice:
- No “Optimize”, it would be pretrained on 10k users and then the same parameters would be used for everyone.
- No parameters window.
- Accurate probability of recall for any interval, even on the scale of minutes and seconds, unlike FSRS.
- No user-defined learning steps.
- It can accurately predict p(recall) (R) for cards on which it is impossible for FSRS to perform well. Consider the following simplified example: the user was in a good mood and was spamming Good at first, but then his mood got worse, and now he is spamming Again. The sequence looks like this:
Good, Good, Good, Good, Good, Again, Again, Again, Again, Again
FSRS cannot take advantage of this pattern; RWKV-P can.
- No memory stability and difficulty values, and also no forgetting curve graphs.
- Adapting the simulator to work with RWKV-P would be either impossible or insanely difficult.
- No intervals above answer buttons.
Instead, scheduling would be completely different: every hour/minute/second, RWKV-P would calculate R for all cards, then show you the cards for which R is below the desired retention threshold.
You can’t really calculate intervals in a meaningful way using RWKV-P. So instead it would just recalculate R once per hour/minute/second and show you what needs to be reviewed. It would be extremely computationally expensive to do this every minute (let alone every second), so for the foreseeable future this is not viable.
We would need extremely aggressive optimization (in the “make it run faster” sense, not in the “find optimal parameters” sense) and/or hardware advancements for it to become viable. Hardware improves anyway, but at this rate it will be a while until we can run RWKV on 100k cards every single minute on a smartphone.
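The threshold-based scheduling described above can be sketched in a few lines. Everything here is hypothetical scaffolding: `predict_recall` is a stand-in for RWKV-P inference (the real model would consume the card’s full review history), and the decay formula inside it is a fake placeholder chosen only so the demo produces varied probabilities.

```python
from dataclasses import dataclass, field

@dataclass
class Card:
    card_id: int
    review_history: list = field(default_factory=list)  # NN input features

def predict_recall(card: Card) -> float:
    # Stand-in for RWKV-P inference; returns p(recall) in (0, 1].
    # Placeholder formula for demonstration only -- NOT the real model.
    return 1.0 / (1.0 + 0.1 * card.card_id)

def cards_due(cards: list, desired_retention: float) -> list:
    # One scheduling "tick": surface every card whose predicted R has
    # dropped below desired retention. In the scheme described above,
    # this would run over the whole collection once per hour/minute/second,
    # which is exactly why it is so computationally expensive.
    return [c for c in cards if predict_recall(c) < desired_retention]

cards = [Card(i) for i in range(10)]
due = cards_due(cards, desired_retention=0.9)
# With this placeholder model, cards 2..9 fall below R = 0.9.
```

The cost problem is visible in the structure: the work per tick is O(number of cards) full NN inferences, so a 100k-card collection polled every minute is 100k inferences per minute on-device, with no intervals to cache between ticks.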
Also, depending on how you look at it, the number of parameters in RWKV should be considered zero since its parameters are not being optimized for every user individually, it just uses the same parameters. In this benchmark we consider FSRS-6 with default parameters to have 0 optimizable parameters since parameters are static and the same for all users, so by the same logic RWKV and RWKV-P also have 0 optimizable parameters.
This sounds more or less like my Anki wet dream. But if RWKV technically has 0 optimizable params, why do you nonetheless say it requires aggressive optimization? Surely there has to be a different way. How does Dekki do it?
I specifically said
aggressive optimization (in the “make it run faster” sense)
It requires optimization in the sense that we need to make it run fast, not in the sense of finding optimal parameters.
The word “optimize” can refer to either.
How is Dekki’s LSTM different in that regard? They don’t seem to have that problem. Perhaps just use an LSTM like Dekki?
They probably just have much fewer parameters. I don’t know how much we can speed up RWKV without losing accuracy.
Would it be possible to use RWKV for the short term (what FSRS cannot do) and keep FSRS for the long term?
Additional complexity for no good reason, IMO.
Yeah, at that point you might as well use RWKV anyway. There is no scenario where using RWKV + FSRS is more convenient and less confusing than using only one of them
Another fun piece of info
RWKV-P just absolutely destroys everything else. Look at the top row, it’s basically just 100.0% everywhere. RWKV (not P) that has a forgetting curve in the traditional sense still outperforms FSRS-6 for 93.7% of users.
. . . Wow. Outperforms FSRS-6 for all 9,999 users??? I’d be interested in seeing how this performance continues for users outside of RWKV’s dataset that it was trained on, since you said it has 0 optimizable parameters.
This is based on users that it hasn’t been trained on. It’s optimized on 5k users and then evaluated on another 5k users. This procedure is repeated twice to cover the entire dataset. The metrics that you see here and above are all based on data that algorithms have NOT been trained on.
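The evaluation protocol described here is ordinary 2-fold cross-validation at the user level, which can be sketched as follows (the function name is mine, for illustration): split the users in half, train on one half and evaluate on the other, then swap, so every user’s metric comes from a model that never saw their reviews.

```python
def two_fold_splits(user_ids: list) -> list:
    # Train on one half of the users, evaluate on the other half, then swap.
    # Mirrors the 5k/5k procedure described above: every user is evaluated
    # exactly once, always by a model that was not trained on their data.
    half = len(user_ids) // 2
    first, second = user_ids[:half], user_ids[half:]
    return [(first, second), (second, first)]  # (train_set, eval_set) pairs

users = list(range(10_000))
folds = two_fold_splits(users)
evaluated = [u for _train, eval_set in folds for u in eval_set]
# Each user appears in exactly one eval set, and no fold's train and
# eval sets overlap -- so the reported metrics are all out-of-sample.
```

This is why “0 optimizable parameters” and “outperforms FSRS-6 for unseen users” are compatible claims: the shared, pretrained parameters are frozen, and the benchmark only ever scores users outside the training half.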
Holy shit… And people wonder why I am pushing for an NN like there is no tomorrow. Even an LSTM (presumably the same model as Dekki’s, just with fewer params) beats FSRS-6 by a large margin.
We could implement RWKV (not P), the overall experience and workflow would be more similar to FSRS, whereas with RWKV-P it would be more dissimilar and there would be more issues to solve. Though RWKV-P would also provide much more accurate predictions, so it makes more sense to implement that instead. Well, they are the same NN, just used in two different “modes” or “regimes”.
But again
- I asked Alex, he doesn’t want to spend 6-12 months doing Anki development stuff
- As for other devs, they probably aren’t very interested either and there are already 200 other issues to worry about
@dae Suggestion to Damien: Migrating to an NN (Neural Net) - #8 by Expertium
Your opinion is welcome. Just to be clear, as much as I care about predictive power, I’m neutral on implementing a neural net in Anki.
Also, I’m thinking of “FSRS and RWKV-P are both available, and users can choose”, NOT of “FSRS is removed”.
Also, just to clarify: even if you do say “yes”, the willingness of the guy who will actually do all the hard work to spend 6–12 months on Anki development will probably only go up marginally.
I’m just asking if, in theory, having a super-mega-ultra-accurate neural net that can predict the probability of recall based on interval lengths, grades, answer time, hour of the day, day of the week, workload, sibling cards’ reviews, note ID, deck ID, preset ID, phase of the fifth moon of Saturn, and the will of God sounds like something you would approve.