FSRS vs SM-2 in 2 separate profiles

Currently with FSRS 5 I get the following statistics on my Profile1:

Log loss: 0.4479, RMSE(bins): 2.76%. Smaller numbers indicate a better fit to your review history.

I also have a Profile2 that I used 2-3 years ago with SM-2. Although SM-2 is not based on recall probability, is there a way to generate an RMSE for that collection so I can compare the two?

No, you can’t generate RMSE on an SM-2 scheduling history. SM-2 wasn’t doing anything to predict your recall, so there’s nothing to compare it to.

RMSE (bins) can be interpreted as the average difference between the predicted probability of recalling a card (R) and the probability measured from the review history. For example, RMSE = 0.05 means that, on average, FSRS is off by 5 percentage points when predicting R.
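
To make the two metrics concrete, here is a minimal Python sketch. It assumes you already have a predicted R and a pass/fail outcome (1 or 0) for each review. The function names, the 20 equal-width bins and the choice to bin only by predicted R are simplifications for illustration; the binning FSRS actually uses may differ, so don't expect these numbers to match "Evaluate" exactly.

```python
import math
from collections import defaultdict


def rmse_bins(predictions, outcomes, n_bins=20):
    """Simplified RMSE(bins): group reviews by predicted R, then compare
    the mean prediction with the observed recall rate in each group.
    Binning by predicted R alone is an assumption made for this sketch."""
    bins = defaultdict(list)
    for p, y in zip(predictions, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    weighted_sq_err = 0.0
    for items in bins.values():
        mean_p = sum(p for p, _ in items) / len(items)  # average predicted R
        mean_y = sum(y for _, y in items) / len(items)  # measured recall rate
        weighted_sq_err += len(items) * (mean_p - mean_y) ** 2

    # Each bin is weighted by how many reviews fell into it.
    return math.sqrt(weighted_sq_err / len(predictions))


def log_loss(predictions, outcomes, eps=1e-15):
    """Average binary cross-entropy between predicted R and outcomes."""
    total = 0.0
    for p, y in zip(predictions, outcomes):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(predictions)


# Ten reviews, all predicted at R = 0.9, but only 8 were recalled.
predicted = [0.9] * 10
recalled = [1] * 8 + [0] * 2
print(rmse_bins(predicted, recalled))
print(log_loss(predicted, recalled))
```

In the example, the predicted recall is 90% and the measured recall is 80%, so the sketch reports an RMSE of about 0.10. Both metrics need a predicted probability per review as input, which is exactly what plain SM-2 doesn't provide.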

I found this:

SM-2-trainable: a variant of SM-2 where the parameters are trainable

in the open-spaced-repetition/srs-benchmark repository on GitHub (a benchmark for spaced repetition schedulers/algorithms).

How could I generate an SM-2-trainable model for comparison purposes?

I’ve moved your post to the FSRS category so someone familiar with the benchmarking models might see it.

That was made just for benchmarking purposes, so you’d have to copy-paste the Python code to use it on your own. The original SM-2 doesn’t predict probabilities, so in the benchmark LMSherlock added some extra formulas on top of it.
Theoretically, it’s possible to add the same probability-related formulas to Anki’s version of SM-2, hook it up to the optimizer and run optimization the same way as we do with FSRS…but why?
Btw, according to the benchmark, FSRS-5 outperforms SM-2 trainable in 97.4% of cases, so even if you level the playing field by making both optimizable and using the same optimizer, FSRS is still clearly better.
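
To give a rough idea of what "adding probability formulas on top" could look like, here is a hypothetical Python sketch. It is not the benchmark's code, and I'm not claiming these are the formulas LMSherlock used. It simply assumes that the interval SM-2 schedules is the point where recall should drop to 90%, and that memory decays exponentially; `recall_probability` and `target_retention` are names made up for this example.

```python
def recall_probability(elapsed_days, interval_days, target_retention=0.9):
    """Hypothetical mapping from an SM-2 interval to a recall probability.

    Assumption (not from the benchmark): recall equals target_retention
    when the elapsed time equals the scheduled interval, and decays
    exponentially before and after that point.
    """
    if interval_days <= 0:
        raise ValueError("interval_days must be positive")
    return target_retention ** (elapsed_days / interval_days)


# A card scheduled for 10 days:
print(recall_probability(0, 10))   # reviewed immediately
print(recall_probability(10, 10))  # reviewed on time
print(recall_probability(20, 10))  # reviewed 10 days late
```

Once every review has a predicted probability like this, log loss and RMSE (bins) can be computed for it the same way as for FSRS, and SM-2's parameters (starting ease and so on) could in principle be tuned to minimize them.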

Actually, on second thought, it would be fun and it would make it clear that SM-2 isn’t as good as FSRS. We could show both FSRS and SM-2 metrics side-by-side in “Evaluate”, so that people will be like “SM-2 has higher numbers, so it’s worse, ok, got it”.
Is it worth the effort to implement? Debatable. Will Jarrett do it? I wouldn’t bet on it.

Thank you! I was looking for the graph with the flat SM-2 line across the top, and I couldn’t think of where it was. But this is much clearer!
