FSRS 5: Optimizer

DerIshmaelite · October 27, 2024, 3:12pm

So, besides this, what else are you willing to test? It seems that you and Jarrett have a lot on your plates.

Expertium · October 27, 2024, 3:12pm

Right now I’m just gonna wait for like 30 hours to test 1 vs 10 seeds. That stuff takes forever, even on a relatively small dataset.
EDIT: update, I finished doing it with 2 seeds on 200 collections, some with (relatively) few reviews, and some with a lot of reviews. Note that I deliberately selected collections close to the extreme ends - either a lot of reviews or not a lot.
Here is the graph of the differences as a function of the number of reviews:

For each collection, RMSE(2 seeds) is calculated simply as min(RMSE(seed 1), RMSE(seed 2)).

Average difference = -0.0005, meaning that running the optimizer twice with 2 different seeds and then selecting the minimum lowers RMSE by 0.05 percentage points (absolute, not relative).
Max difference = -0.0086 (-0.86%) in favor of 2 seeds.
Average difference for collections with <5k reviews = -0.0007 (-0.07%) in favor of 2 seeds.
Average difference for collections with <1k reviews = -0.0008 (-0.08%) in favor of 2 seeds.
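A minimal sketch of how these numbers could be computed, for anyone who wants to reproduce the idea. The function names and the RMSE values are made up for illustration, and it assumes the baseline is the first seed's RMSE, which is not stated above.

```python
from statistics import mean

def best_of_seeds(rmse_by_seed):
    """RMSE(n seeds) = the minimum RMSE over all seeds run for one collection."""
    return min(rmse_by_seed)

def seed_differences(collections):
    """For each collection, return RMSE(all seeds) - RMSE(first seed only).

    `collections` maps a collection id to a list of RMSE values, one per seed.
    Every difference is <= 0 by construction, because a minimum cannot exceed
    any single element of the list.
    """
    return {cid: best_of_seeds(r) - r[0] for cid, r in collections.items()}

# Made-up RMSE values for three hypothetical collections, two seeds each.
example = {
    "a": [0.050, 0.048],
    "b": [0.031, 0.033],
    "c": [0.120, 0.110],
}

diffs = seed_differences(example)
avg_diff = mean(diffs.values())   # average difference across collections
max_gain = min(diffs.values())    # most negative difference = largest gain
print(diffs, avg_diff, max_gain)
```

Because the extra seeds can only keep or lower the minimum, a difference of exactly 0 just means the first seed was already the best one for that collection.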

Here’s the distribution of differences:

It looks like an exponential distribution to me.

Of course, this is with just 2 seeds. I will keep running it with more seeds. More seeds = the minimum will be lower = the difference will be bigger.
Also, this is only for 200 collections, but I don’t want to run it on 20k collections, it will take me months.

L.M.Sherlock · October 28, 2024, 5:11am

Fixed in “Feat/align FSRS-rs with PyTorch Implementation” by L-M-Sherlock (Pull Request #3540, ankitects/anki on GitHub).

DerIshmaelite · October 28, 2024, 7:51am

Thank you, Jarrett. Does this mean FSRS Python is as good as FSRS Rust now? How much of a decrease in RMSE does this bring?

I suppose this topic has served its purpose. @expertium I would like to move the discussions about the sims and the new ideas to a new topic.

L.M.Sherlock · October 28, 2024, 8:31am

I’m benchmarking the new FSRS-rs now. We will know the result tomorrow.

DerIshmaelite · October 28, 2024, 9:56am

After the recent updates, I have a question: are suspended cards now taken into account when optimizing parameters? Users like me have no easy way to delete individual cloze cards other than suspending them (otherwise the complete note with all its cards would be deleted), so it would not make much sense for the optimizer to include reviews of cards that we originally wanted to delete.

If this turns out to be true, I suggest reverting the change, or at the very least adding an option to exclude suspended cards during optimization.

Expertium · October 28, 2024, 10:09am

It’s the same as before.
Maybe you are confused by this (beta 4)

This is about something a bit different - updating memory states of suspended cards. Previously, they weren’t updated, meaning that if you un-suspend cards, they would have inaccurate memory states.

DerIshmaelite · October 28, 2024, 10:38am

Ah, this clears things up a bit. Thanks

L.M.Sherlock · October 29, 2024, 1:47am

Here is the result:

DerIshmaelite · October 29, 2024, 10:57am

Hmm, I don’t know how to interpret this. Does this mean that the new fix is not a significant improvement?

vaibhav · October 29, 2024, 11:08am

It shows the latest version of Rust optimizer (used in Anki) is as good as the Python optimizer.

However, the improvement is not very large to be honest. (Average RMSE decreased from 5.1% to 4.9%)

Expertium · October 29, 2024, 12:14pm

Fraction of cases where algorithm A (row) outperforms algorithm B (column)
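For reference, a table like this can be built by counting, for each pair of algorithms, the fraction of collections where one has a strictly lower RMSE than the other. This is only a sketch: the function name and the RMSE values are invented, and counting ties as "does not outperform" is an assumption, not necessarily how the benchmark handles them.

```python
def superiority_matrix(rmse):
    """Fraction of collections where algorithm A has a lower RMSE than B.

    `rmse` maps an algorithm name to a list of per-collection RMSE values,
    with the collections in the same order for every algorithm.
    Returns a dict keyed by (A, B). Ties count for neither side (assumption).
    """
    names = list(rmse)
    table = {}
    for a in names:
        for b in names:
            if a == b:
                continue
            wins = sum(x < y for x, y in zip(rmse[a], rmse[b]))
            table[(a, b)] = wins / len(rmse[a])
    return table

# Made-up RMSE values for four hypothetical collections.
example = {
    "FSRS-5": [0.040, 0.050, 0.030, 0.060],
    "FSRS-rs": [0.041, 0.049, 0.031, 0.060],
}

table = superiority_matrix(example)
print(table)
```

With ties excluded from both sides, the two fractions for a pair do not have to add up to 1.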

Interestingly, FSRS-5 (Python) is still very slightly better than FSRS-rs, and the difference is statistically significant.

These are p-values

DerIshmaelite · October 29, 2024, 12:41pm

Well, we will take any improvement in RMSE that can be achieved. In the end, every small improvement in RMSE, however minuscule, will accumulate and lead to an overall better FSRS.

L.M.Sherlock · November 7, 2024, 2:39am

With a t-test, it’s not statistically significant:

With a Wilcoxon test, it’s statistically significant, but the effect size is very small:

|r| < 0.1 means “no effect / very small effect”.
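One common way to get such an r from a Wilcoxon signed-rank test is r = Z / sqrt(N). Below is a rough stdlib-only sketch of that calculation. It assumes the normal approximation without tie or continuity corrections and takes N as the number of non-zero paired differences; the benchmark's actual code may differ on all of these points, and the input values are made up.

```python
import math

def wilcoxon_effect_size(x, y):
    """Return (z, r) for paired samples x and y, where r = z / sqrt(n).

    Uses the normal approximation to the signed-rank statistic with no tie
    or continuity correction; n is the number of non-zero differences.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # Sum of ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    return z, z / math.sqrt(n)

def is_negligible(r, threshold=0.1):
    """|r| < 0.1 is read as "no effect / very small effect", as stated above."""
    return abs(r) < threshold

# Made-up paired values; here positive and negative ranks balance exactly.
z, r = wilcoxon_effect_size([1, 0, 0, 4], [0, 2, 3, 0])
print(z, r, is_negligible(r))
```

This also shows why a result can be statistically significant and still negligible: with many collections, even a tiny but consistent difference pushes Z past the significance cutoff, while dividing by sqrt(N) keeps r small.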

system · December 7, 2024, 2:40am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
