So, besides this, what else are you willing to test? It seems that you and Jarrett have a lot on your plates.
Right now I'm just going to wait about 30 hours to test 1 vs 10 seeds. That stuff takes forever, even on a relatively small dataset.
EDIT: update, I finished doing it with 2 seeds on 200 collections, some with (relatively) few reviews, and some with a lot of reviews. Note that I deliberately selected collections close to the extreme ends - either a lot of reviews or not a lot.
Here is the graph of the differences as a function of the number of reviews:
For each collection, RMSE(2 seeds) is calculated simply as min(RMSE(seed 1), RMSE(seed 2)).
Average difference = -0.0005, meaning that running the optimizer twice with 2 different seeds and then selecting the minimum improves RMSE by 0.05 percentage points (absolute, not relative).
Max difference = -0.0086 (-0.86%) in favor of 2 seeds.
Average difference for collections with <5k reviews = -0.0007 (-0.07%) in favor of 2 seeds.
Average difference for collections with <1k reviews = -0.0008 (-0.08%) in favor of 2 seeds.
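For anyone who wants to reproduce the numbers above, here is a minimal sketch of the calculation. It assumes the "difference" is the best-of-seeds RMSE minus the RMSE from seed 1 alone, which is my reading of the post and not something stated outright. The collection names and RMSE values are made up for illustration.

```python
from statistics import mean

# Hypothetical per-collection RMSE, one value per optimizer seed.
# These numbers are invented and are not from the benchmark.
rmse_by_seed = {
    "collection_a": [0.0512, 0.0498],
    "collection_b": [0.0301, 0.0305],
    "collection_c": [0.0740, 0.0654],
}


def best_of_seeds_difference(rmses):
    """Return min(RMSE over all seeds) minus RMSE of the first seed.

    Assumption: the first seed is the single-seed baseline, so the
    result is always <= 0 (negative means more seeds helped).
    """
    baseline = rmses[0]
    best = min(rmses)
    return best - baseline


differences = {
    name: best_of_seeds_difference(rmses)
    for name, rmses in rmse_by_seed.items()
}
average_difference = mean(differences.values())
largest_improvement = min(differences.values())

print(f"average difference: {average_difference:.4f}")
print(f"max difference:     {largest_improvement:.4f}")
```

With this definition the difference can never be positive, which would be consistent with a one-sided distribution of differences.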
Here's the distribution of differences:
It looks like an exponential distribution to me.
Of course, this is with just 2 seeds. I will keep running it with more seeds. More seeds = the minimum will be lower = the difference will be bigger.
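The "more seeds = lower minimum" point can be sketched as a running minimum. This is only a toy model: it assumes each seed's RMSE is the same base value plus Gaussian noise, which is an assumption for illustration and not how the optimizer actually behaves.

```python
import random


def running_minimum(values):
    """Return the best (lowest) value seen after each additional seed."""
    best = float("inf")
    result = []
    for value in values:
        best = min(best, value)
        result.append(best)
    return result


rng = random.Random(0)  # fixed seed so the toy example is repeatable

# Hypothetical: RMSE from each optimizer seed modeled as 0.05 plus noise.
seed_rmses = [0.05 + rng.gauss(0, 0.002) for _ in range(10)]
best_so_far = running_minimum(seed_rmses)

for k, best in enumerate(best_so_far, start=1):
    print(f"{k:2d} seeds: best RMSE = {best:.4f}")
```

The running minimum can only stay the same or drop as seeds are added, so the gap to the single-seed result can only stay the same or grow.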
Also, this is only for 200 collections, but I don't want to run it on 20k collections; it would take me months.
Thank you, Jarrett. Does this mean FSRS Python is as good as FSRS Rust now? How much of a decrease in RMSE does this bring?
I suppose this topic has served its purpose. @expertiumi would like to move the discussions about the sims and the new ideas to a new topic.
I'm benchmarking the new FSRS-rs now. We will know the result tomorrow.
After the recent updates, I have a question: are suspended cards now taken into account when optimizing parameters? For users like me, who don't have an easy way to delete cloze cards other than suspending them (otherwise the whole note with all its cards would be deleted), it would not make much sense for the optimizer to include reviews of cards that we originally wanted to delete.
If it turns out to be true, I suggest reverting the change or at the very least including an option to turn off consideration of suspended cards during optimizing.
It's the same as before.
Maybe you are confused by this (beta 4)
This is about something a bit different - updating memory states of suspended cards. Previously, they weren't updated, meaning that if you un-suspend cards, they would have inaccurate memory states.
Ah, this clears things up a bit. Thanks
Hmm, I don't know how to interpret this. Does this mean that the new fix is not a significant improvement?
It shows the latest version of Rust optimizer (used in Anki) is as good as the Python optimizer.
However, the improvement is not very large to be honest. (Average RMSE decreased from 5.1% to 4.9%)
Fraction of cases where algorithm A (row) outperforms algorithm B (column)
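A table like this could be built roughly as follows. This is a sketch, not the benchmark's actual code: the RMSE values are invented, "outperforms" is assumed to mean strictly lower RMSE on the same collection, and ties count as a win for neither side.

```python
def win_fraction_matrix(rmse_by_algorithm):
    """Map (row, column) -> fraction of collections where row beats column.

    Assumes every algorithm has one RMSE per collection, in the same order.
    """
    names = list(rmse_by_algorithm)
    matrix = {}
    for row in names:
        for column in names:
            if row == column:
                continue
            pairs = list(zip(rmse_by_algorithm[row], rmse_by_algorithm[column]))
            wins = sum(1 for a, b in pairs if a < b)  # lower RMSE wins
            matrix[(row, column)] = wins / len(pairs)
    return matrix


# Hypothetical per-collection RMSE values, made up for illustration.
rmse_by_algorithm = {
    "FSRS-5 (Python)": [0.048, 0.051, 0.030, 0.062],
    "FSRS-rs": [0.049, 0.050, 0.031, 0.063],
}

fractions = win_fraction_matrix(rmse_by_algorithm)
for (row, column), fraction in fractions.items():
    print(f"{row} beats {column}: {fraction:.0%}")
```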
Interestingly, FSRS-5 (Python) is still very slightly better than FSRS-rs, and the difference is statistically significant.
These are p-values
Well, we will take any improvement in RMSE we can get. In the end, every small improvement, however minuscule, accumulates and leads to a better FSRS overall.
With a t-test, it's not statistically significant:
With Wilcoxon test, itâs statistically significant, but the effect size is very small:
|r| < 0.1 means "no effect / very small effect".
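For context, here is a rough sketch of where an effect size r for the Wilcoxon signed-rank test can come from. It assumes the common convention r = Z / sqrt(N), with N the number of non-zero paired differences, and uses the plain normal approximation without tie or continuity corrections. I don't know which convention the benchmark used, so treat this as illustrative; a library implementation may give slightly different numbers. The RMSE values are made up.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def wilcoxon_effect_size(a, b):
    """Return r = Z / sqrt(N) for paired samples a and b.

    Z comes from the normal approximation of the signed-rank statistic;
    zero differences are dropped. No tie or continuity correction.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    return z / math.sqrt(n)


# Hypothetical per-collection RMSE for two optimizers (invented values).
rmse_python = [0.048, 0.051, 0.030, 0.062, 0.045]
rmse_rust = [0.049, 0.050, 0.032, 0.061, 0.047]
print(f"r = {wilcoxon_effect_size(rmse_python, rmse_rust):.3f}")
```

With thousands of collections, even a tiny r can come with a small p-value, which is why the test can be "significant" while the effect is negligible.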