Fixing Low Optimal Retention Bias

I ran a few experiments looking at optimal retention in FSRS. Simulations tend to prefer low target retentions, which is counterintuitive.

Here is a notebook showing what I mean, and proposing a solution which leads to sensible optimal retention plots.

The core point is that current optimal retention simulations model unseen cards as having retrievability of zero. This is not true, since we do not grade every new card Again; the true average retention of unseen cards is 1 − P(first grade is Again). This systematic underestimation of retrievability for unseen cards results in optimal retention plots favouring retentions that allow more new cards to be tested.
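As a minimal sketch of the proposed estimate (the function name and inputs are illustrative, not part of fsrs-rs):

```rust
// Sketch only: the proposed average retrievability for unseen cards.
// `p_first_again` would come from the user's first-rating distribution;
// this name is hypothetical, not the fsrs-rs API.
fn unseen_retention(p_first_again: f64) -> f64 {
    // If 25% of new cards get Again on first review, the other 75%
    // are recalled, so their average "retention" is 0.75, not 0.
    1.0 - p_first_again
}

fn main() {
    println!("{}", unseen_retention(0.25)); // 0.75
}
```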

Applying this modifier to simulations removes the bias towards low target retentions in optimal retention plots, giving sensible recommended optimal retentions for users.


IIRC, the current simulation already takes the user’s rating distribution for new cards into account.

Reference:


I didn’t mean that the distribution of first ratings during simulation is incorrect (sorry for the bad wording), but that when we compare the set of simulations used for the optimal retention plots, we are not comparing apples to apples.

Unseen cards currently do not contribute to total retention in the optimal retention graph, and some simulations have more unseen cards than others. Unseen cards do not have zero retention in reality (we do not grade 100% of them Again), so to avoid biasing the comparison towards simulations with fewer unseen cards (due to the current underestimation of their retention), the total retention measure needs to be adjusted to account for them.

simulate.rs::simulate explicitly excludes new cards when calculating memorized_cnt_per_day (one potential fix may be adding an else branch there to add the retention contributed by new cards).

We could also add this post hoc, e.g. in simulate.rs::optimal_retention, by using introduced_cnt_per_day to calculate the number of unseen cards remaining in the deck for each simulation.
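A rough sketch of the post-hoc version (the function signature and field meanings are my guesses at the shape of the simulation outputs, not the actual fsrs-rs API):

```rust
// Hypothetical post-hoc adjustment: credit unseen cards with their true
// average retention before computing efficiency. All names are illustrative.
fn adjusted_memorized(
    memorized_last_day: f64, // last entry of memorized_cnt_per_day
    deck_size: usize,        // total cards in the simulated deck
    introduced_total: usize, // sum over introduced_cnt_per_day
    p_first_again: f64,      // P(first grade is Again)
) -> f64 {
    let unseen = (deck_size - introduced_total) as f64;
    // Unseen cards get 1 - P(Again) instead of the implicit 0 they get today.
    memorized_last_day + unseen * (1.0 - p_first_again)
}

fn main() {
    // Example: 10_000-card deck, 6_000 introduced, 5_000 memorized among
    // the introduced cards, 25% of first ratings are Again.
    let m = adjusted_memorized(5_000.0, 10_000, 6_000, 0.25);
    println!("{m}"); // 5000 + 4000 * 0.75 = 8000
}
```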

From my own experience, the formula is (almost) fine (it needs to be updated with this branch: GitHub - Luc-Mcgrady/anki at cheesecake-alex, otherwise existing cards already count as a reward in the final computation). The issue is, as you said, whether the user has actually reviewed in the R ranges that are used to check for efficiency.

In my case, for about 1.5 years, I never reviewed anything at <70% R, which means my FSRS parameters were fitted only on the >70% range, and because of that I got a low decay (w[20]) of around 0.11.

Recently, I started running my own fork and reviewing my cards at around 30–60% R instead of the mandatory >70% (Desired Retention can’t be set lower).

Doing that, and now having reviews and good calibration for cards with R > 30%, we can see my “optimal R” evolve into a nicely reversed-U shape showing what is actually optimal in my case. It might be surprisingly low, but having run reviews at ~50% R for the past two months, I can definitely see why it has huge benefits for me (that’s off topic here, though; the point depends on the parameters, and for some users it might be a classical 70%).

But as you can see, if I ran a naive approach and searched for the “max efficiency”, it would still land at <10%. IMO, when we seek that maximum, we need to exclude R values that are not covered by any reviews.

Incidentally, it also means I think we need a greater amount of fuzz, or some protocol to make sure that some reviews happen at very low R, like [30, 50].

This is why I opened a PR to allow people to set DR as low as 10%: Allow 10% as minimum Desired Retention by JSchoreels · Pull Request #4462 · ankitects/anki · GitHub

Hmm, grading a card Again pushes its R to 100%, so it’s not 0%.

In fact, this is why optimal retention is often lower than expected: for many people with low decay, pressing “Again” on a new card means it forever keeps an R that is >0. So introducing 10 cards today and never reviewing them again gives you some Memorized “forever” (if decay is low enough) while Workload ≈ 0, meaning Efficiency peaks at very low R.

But by reviewing more things you don’t know, as explained in my previous comment, your decay will tend to increase (and if it doesn’t, it truly means you never totally forget anything, so the very-low-R peak is still justified).

In my case, my decay went from 0.11 to 0.35.
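To make the decay point concrete, here is a sketch using an FSRS-style power forgetting curve, R(t) = (1 + f·t/S)^(−decay) with f chosen so that R(S) = 0.9 (this is the general shape FSRS uses with a per-user decay; treat the exact numbers as an illustration, not a claim about any particular user's parameters):

```rust
// Sketch: FSRS-style power forgetting curve with a per-user decay.
// R(t) = (1 + factor * t / s)^(-decay), with factor chosen so R(s) = 0.9.
fn retrievability(t: f64, s: f64, decay: f64) -> f64 {
    let factor = 0.9f64.powf(-1.0 / decay) - 1.0;
    (1.0 + factor * t / s).powf(-decay)
}

fn main() {
    // A card with stability of 1 day, left unreviewed for 10 years:
    let r_low = retrievability(3650.0, 1.0, 0.11); // low decay
    let r_high = retrievability(3650.0, 1.0, 0.35); // higher decay
    println!("decay 0.11: R = {r_low:.2}"); // ~0.39: "memorized forever"
    println!("decay 0.35: R = {r_high:.2}"); // ~0.08: actually forgotten
}
```

With a low decay, a never-reviewed card retains nearly 40% retrievability a decade later at almost zero workload, which is exactly why the naive efficiency metric peaks at very low R.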

I agree that allowing retentions below 70% is sensible. I’ve also noticed the issue of FSRS only being fitted on reviews within a tight retention range; I’m playing around with some simulations to see the impact of that too (including the impact of adding more fuzz on how “correct” the fitted FSRS parameters are). I agree that adding more fuzz seems a sensible way to ensure FSRS has samples across the whole forgetting curve.

The curve you posted illustrates the point I’m making well; your optimal retention there is 0%, which is just the setting that means you never have to review cards again. This happens because the optimal retention curve underestimates the retention of unseen cards, biasing it towards target retentions that minimize the number of unseen cards (i.e. maximise the ratio of reviewing new cards vs reviewing learnt cards). This leads to nonsensical results such as yours.

It would be interesting if fitting FSRS on lower-retention cards improves the shape of the optimal retention graph. Even if it does, though, I think the issue I’m raising in this thread still stands; it may just coincidentally be made less severe by fixing FSRS being data-starved at low retentions.