FSRS 5: Optimizer

So, besides this, what else are you willing to test? It seems that you and Jarrett have a lot on your plates.

Right now I’m just gonna wait for like 30 hours to test 1 vs 10 seeds. That stuff takes forever, even on a relatively small dataset.
EDIT: update, I finished doing it with 2 seeds on 200 collections, some with (relatively) few reviews, and some with a lot of reviews. Note that I deliberately selected collections close to the extreme ends - either a lot of reviews or not a lot.
Here is the graph of the differences as a function of the number of reviews:

For each collection, RMSE(2 seeds) is calculated simply as min(RMSE(seed 1), RMSE(seed 2)).

Average difference = -0.0005, meaning that running the optimizer twice with 2 different seeds and then selecting the minimum improves RMSE by 0.05 percentage points (absolute, not relative).
Max difference = -0.0086 (-0.86%) in favor of 2 seeds.
Average difference for collections with <5k reviews = -0.0007 (-0.07%) in favor of 2 seeds.
Average difference for collections with <1k reviews = -0.0008 (-0.08%) in favor of 2 seeds.
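To make the calculation concrete, here is a minimal sketch in Python. The collection names and RMSE values are made up, `best_of_seeds_difference` is a hypothetical helper (not part of the benchmark code), and it assumes the difference is measured against the first seed's RMSE.

```python
from statistics import mean


def best_of_seeds_difference(rmse_by_seed):
    """Return RMSE(best of all seeds) - RMSE(first seed) for one collection.

    A negative value means that picking the minimum over seeds helped.
    Assumption: the single-seed baseline is the first entry in the list.
    """
    baseline = rmse_by_seed[0]
    return min(rmse_by_seed) - baseline


# Hypothetical data: collection -> RMSE for seed 1 and seed 2
# (as fractions, so 0.05 means 5%).
results = {
    "collection_a": [0.0500, 0.0480],
    "collection_b": [0.0310, 0.0315],
    "collection_c": [0.0720, 0.0720],
}

differences = {name: best_of_seeds_difference(r) for name, r in results.items()}
average_difference = mean(differences.values())
max_improvement = min(differences.values())  # most negative = biggest gain
```

By construction the difference can never be positive: taking the minimum over seeds is at worst equal to the first seed's result.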

Here’s the distribution of differences:

It looks like an exponential distribution to me.

Of course, this is with just 2 seeds. I will keep running it with more seeds. More seeds = the minimum will be lower = the difference will be bigger.
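The "more seeds = lower minimum" point can be illustrated with a running minimum. This is just a sketch with made-up RMSE values for a single collection; `best_rmse_after_each_seed` is a hypothetical helper name.

```python
from itertools import accumulate


def best_rmse_after_each_seed(rmse_by_seed):
    """Running minimum: entry k is the best RMSE among the first k + 1 seeds."""
    return list(accumulate(rmse_by_seed, min))


# Hypothetical RMSE values from five optimizer runs on one collection.
rmse_by_seed = [0.0500, 0.0480, 0.0510, 0.0475, 0.0490]
running_best = best_rmse_after_each_seed(rmse_by_seed)
```

Each extra seed can only keep the best RMSE the same or lower it, so the gap to the single-seed result never shrinks as seeds are added.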
Also, this is only for 200 collections, but I don’t want to run it on 20k collections since it would take me months.

Fixed in Feat/align FSRS-rs with PyTorch Implementation by L-M-Sherlock · Pull Request #3540 · ankitects/anki · GitHub

Thank you Jarrett :grin:. Does this mean FSRS Python is as good as FSRS Rust now? How much decrease in RMSE does this bring?


I suppose this topic has served its purpose. @expertiumi would like to move the discussions about the sims and the new ideas to a new topic.

I’m benchmarking the new FSRS-rs now. We will know the result tomorrow.

After the recent updates, I have a question: are suspended cards taken into account now when optimizing parameters? Users like me don’t have an easy way to delete individual cloze cards other than suspending them (otherwise the complete note with all its cards would be deleted). So it would not make a lot of sense for the optimizer to include reviews of cards that we originally wanted to delete.

If it turns out to be true, I suggest reverting the change or at the very least including an option to turn off consideration of suspended cards during optimizing.


It’s the same as before.
Maybe you are confused by this (beta 4)
image
This is about something a bit different - updating memory states of suspended cards. Previously, they weren’t updated, meaning that if you un-suspend cards, they would have inaccurate memory states.

Ah, this clears things up a bit. Thanks :grin:

Here is the result:

Hmm, I don’t know how to interpret this. Does this mean that the new fix is not a significant improvement :question:

It shows that the latest version of the Rust optimizer (used in Anki) is as good as the Python optimizer.

However, the improvement is not very large to be honest. (Average RMSE decreased from 5.1% to 4.9%)

Fraction of cases where algorithm A (row) outperforms algorithm B (column)
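For anyone wondering how such a table can be built, here is a rough sketch. The RMSE values are invented, `win_fraction` is a hypothetical helper, and treating ties as a win for neither side is my assumption; the actual benchmark may handle ties differently.

```python
def win_fraction(rmse_a, rmse_b):
    """Fraction of collections where A has a strictly lower RMSE than B.

    Both lists must be aligned: index i refers to the same collection.
    Assumption: ties count as a win for neither algorithm.
    """
    if len(rmse_a) != len(rmse_b):
        raise ValueError("both algorithms need one RMSE per collection")
    wins = sum(a < b for a, b in zip(rmse_a, rmse_b))
    return wins / len(rmse_a)


# Hypothetical per-collection RMSE for two optimizers.
rmse = {
    "FSRS-5 (Python)": [0.048, 0.031, 0.072, 0.055],
    "FSRS-rs": [0.049, 0.031, 0.071, 0.056],
}

# table[row][column] = fraction of collections where row beats column.
table = {
    a: {b: win_fraction(rmse[a], rmse[b]) for b in rmse if b != a}
    for a in rmse
}
```

With ties excluded, the two off-diagonal entries for a pair of algorithms do not have to add up to 1.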

Interestingly, FSRS-5 (Python) is still very slightly better than FSRS-rs, and the difference is statistically significant.

These are p-values

Well, any decrease in RMSE that can be achieved, we will take. In the end, every small improvement in RMSE, however minuscule, accumulates and leads to an overall better FSRS.

With a t-test, it’s not statistically significant:

With the Wilcoxon test, it’s statistically significant, but the effect size is very small:

|r| < 0.1 means “no effect / very small effect”.
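As a rough illustration of where an r like this can come from, here is a pure-Python sketch of one common convention: r = Z / sqrt(N), with Z taken from the normal approximation of the Wilcoxon signed-rank statistic. This is an assumption on my part, not necessarily the exact formula used for the figure above. It skips the tie correction, drops zero differences, and uses the number of non-zero pairs as N (conventions differ on that).

```python
import math


def wilcoxon_effect_size(x, y):
    """Effect size r = Z / sqrt(n) for paired samples x and y.

    Z comes from the normal approximation of the signed-rank statistic,
    without tie or continuity correction. Zero differences are dropped.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # Sum of ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    return z / math.sqrt(n)


# Hypothetical per-collection RMSE for two optimizers.
r = wilcoxon_effect_size([0.048, 0.031, 0.072, 0.055],
                         [0.049, 0.030, 0.071, 0.057])
```

With many collections, even a tiny but consistent difference can produce a small p-value, which is why the effect size is worth looking at alongside it.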