It makes sense to compare two RMSE values only when they are calculated on exactly the same data.

FSRS 4.5 discards same-day reviews, but FSRS 5 uses them. So, does it make sense to compare the RMSE values obtained by clicking Evaluate on exactly the same collection in Anki 24.10 and Anki 24.06.3?

Here is the scenario in which comparing them would make sense (and I advise implementing Evaluate this way if it doesn’t already work like this):

When the user clicks Evaluate, FSRS should use ALL the reviews (including the same-day ones) to calculate the DSR at each review. Then, it should compare the predicted R and actual R ONLY for the first review of a day when calculating the RMSE.
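To make the proposal concrete, here is a minimal sketch in Python. It is not how Anki or FSRS actually implements Evaluate: the `Review` record and the function name are made up for illustration, the predicted R is assumed to have already been computed from the full history (same-day reviews included), and it uses a plain RMSE rather than whatever binning the real Evaluate applies.

```python
import math
from dataclasses import dataclass


@dataclass
class Review:
    # Hypothetical record, for illustration only.
    day: int            # day number on which the review happened
    predicted_r: float  # predicted R just before the review (from the full history)
    recalled: bool      # True if the card was recalled


def rmse_first_review_of_day(histories):
    """Plain RMSE over the first review of each day, across all card histories.

    `histories` is a list of per-card review lists, each in chronological order.
    Same-day repeats are skipped here, on the assumption that they were already
    used upstream when the predicted R values were produced.
    """
    squared_errors = []
    for reviews in histories:
        last_day = None
        for review in reviews:
            if review.day != last_day:
                actual = 1.0 if review.recalled else 0.0
                squared_errors.append((review.predicted_r - actual) ** 2)
            last_day = review.day
    if not squared_errors:
        return float("nan")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# One card: two reviews on day 0 (the second is a same-day repeat), one on day 3.
card = [Review(0, 0.9, True), Review(0, 1.0, False), Review(3, 0.8, False)]
result = rmse_first_review_of_day([card])
```

In this example the second review on day 0 is left out of the error, so only the day 0 and day 3 first reviews contribute to the RMSE.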

I don’t get why RMSE should be calculated that way. You can use the latest version to compare parameters generated in 24.06.3 with parameters generated in 24.10.

You are right. But, what if I want to evaluate the same parameters in 24.06.3 and 24.10? The formulas have changed slightly. So, the same parameters will produce slightly different DSR values in FSRS 4.5 and FSRS 5.

Apart from facilitating proper comparison with older Anki versions, calculating RMSE this way is important because the predicted R for same-day reviews is always 100% and comparing the actual R with a constant doesn’t really make sense.

Come next version (or later), we’ll have to change it again though. And it creates extra work for Evaluate. I’m still fine if you or anyone else wants to implement this in a way that DSR is only calculated the first time around (when you still have old params).

Maybe it should change with the short-term version? If not, this does make sense.

As of beta 2, FSRS has a “bug” in how it uses FSRS 4.5 parameters. This may be one reason why the RMSE is higher. The issue should be fixed in the next beta.

However, I am not 100% sure if the RMSE is directly comparable, which is why I have made this post.