Take this text from the tooltip Anki uses for Retention stats (formerly “True Retention”):
If you are using FSRS, your retention is expected to be close to your desired retention.
Now, that should be generally true, but —
- We can have backlogs
- We study outside the regular schedule in filtered decks
- We change DR from time to time
And even if none of that applies, you will at least have reviews of cards that weren’t rescheduled after the last regular optimisation.
The point is that retrievability at the time of review can be very different from the DR you’ve set in Anki. This makes comparison harder: you can’t simply compare your actual retention with whatever desired retention value you’ve configured.
Therefore, Anki should store retrievability for every review, just like it does for difficulty. That way, we could compute the average retrievability across all the reviews we have done and compare it to the actual retention.
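To make the comparison concrete, here is a minimal sketch of what that averaging would look like. It assumes the FSRS-4.5 power forgetting curve (with its decay of -0.5 and factor of 19/81, chosen so that R = 90% when the elapsed time equals the stability); the review data is made up for illustration.

```python
DECAY = -0.5
FACTOR = 19 / 81  # chosen so retrievability is 0.9 when elapsed == stability

def retrievability(elapsed_days: float, stability: float) -> float:
    """Predicted recall probability after elapsed_days, per the FSRS-4.5 curve."""
    return (1 + FACTOR * elapsed_days / stability) ** DECAY

# Hypothetical (elapsed_days, stability_at_review) pairs from a revlog
reviews = [(10, 10), (3, 12), (30, 15)]

avg_r = sum(retrievability(t, s) for t, s in reviews) / len(reviews)
print(f"average predicted retrievability: {avg_r:.3f}")
```

That `avg_r` is the number you would put next to the measured retention for the same set of reviews.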
I’ve also started to realise (and this is what prompted this post) that storing difficulty isn’t very useful for the user. Apart from being a really unintuitive number (what is 86.7% difficulty?), it is also not comparable across presets and collections. So what value does it provide? I really think storing retrievability is the better choice.
Retrievability depends on the FSRS parameters. And they change with every optimization. Why record information that becomes obsolete over time? Moreover, if necessary, it can be calculated.
If you change True Retention, it will no longer be True Retention. That this table lets you indirectly evaluate how well FSRS is working is only a secondary function.
To evaluate FSRS, there is already the "Check health when optimizing (slow)" option.
If you need some kind of numerical measure of FSRS accuracy, you could suggest returning log loss and RMSE.
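For reference, both metrics can be computed directly from predicted retrievability and actual outcomes (1 = recalled, 0 = forgotten). A small sketch with made-up numbers; note that the FSRS benchmark reports a binned variant, "RMSE (bins)", whereas this shows the plain per-review RMSE:

```python
import math

# Hypothetical predicted retrievabilities and actual review outcomes
preds = [0.9, 0.85, 0.7, 0.95]
outcomes = [1, 1, 0, 1]

# Log loss: average negative log-likelihood of the outcomes under the predictions
log_loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, y in zip(preds, outcomes)) / len(preds)

# Plain RMSE between predictions and outcomes (FSRS itself bins reviews first)
rmse = math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, outcomes)) / len(preds))

print(f"log loss: {log_loss:.4f}, RMSE: {rmse:.4f}")
```

Lower is better for both; either would give users a single number for how well FSRS is fitting their reviews.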
Do you mean we can retroactively calculate all the previous R values? That would be awesome then.
As for recording R values in the revlog, I agree it’s weird to record something that can become obsolete. But it felt like the only option; besides, Anki records difficulty anyway, which is unhelpful and can even be confusing (people get confused when the D value suddenly changes after an optimisation).
Rather, the goal is to evaluate individual performance. This is also useful if you’re comparing different hours of the day:
They’re similar graphs, but take a look at this for what I’m thinking about:
Similarly, I think Anki could show “expected retention” in the retention table (formerly “true retention”). Then you’d have both “actual retention” and “expected retention” to contrast.
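As a sketch of what that contrast could look like for one row of the table (all numbers hypothetical): actual retention is the pass rate, while expected retention is the mean predicted retrievability over the same reviews.

```python
# Hypothetical (predicted retrievability, outcome) pairs for one period,
# where outcome is 1 for a pass and 0 for a lapse
reviews = [(0.92, 1), (0.88, 1), (0.81, 0), (0.95, 1)]

actual = sum(y for _, y in reviews) / len(reviews)    # pass rate
expected = sum(r for r, _ in reviews) / len(reviews)  # mean predicted R

print(f"actual: {actual:.0%}, expected: {expected:.0%}")
```

A large, persistent gap between the two columns would be exactly the signal that FSRS predictions and real performance have drifted apart.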