25.02 possible bug - minimum recommended retention 0.70

The problem started to appear after the update to version 25.02.
For multiple presets, the minimum recommended retention is calculated as 0.70, which is surprisingly low. If I put a large value in “days to simulate”, like 3650, I get a more realistic result, like 0.82.
Could it be related to the fact that I have a lot of new cards in those decks?

Example weights:

0.1630, 0.3126, 23.7797, 48.4751, 6.4131, 0.9241, 2.3493, 0.0010, 1.2731, 0.0301, 0.8056, 1.9036, 0.0355, 0.1766, 2.3649, 0.1738, 3.5567, 0.2962, 1.1574

Encountered the same; interesting that changing the number of days has such an effect on the recommended value.

This is most likely related to the fact that pre-25.02, the simulator had a bug where the change in difficulty was clamped rather than the resultant difficulty itself. Also, CMRR will naturally increase as you increase the days to simulate, because the later days will have fewer reviews / less time spent.

The simulator and CMRR use the same code, so to see this, you divide the total time spent by the last value of “Memorized” (what CMRR tries to optimize for).

Here’s mine, for example (#1: 83% DR, #2: 70% DR (my CMRR)):

[screenshots of the two simulator runs]

83%: 8851 / 15.03 = 588.9 cards/day
70%: 8249 / 9.49 = 869.2 cards/day
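
Assuming CMRR maximizes memorized cards per unit of time spent, the comparison above is just a ratio. A minimal sketch (the function name is illustrative, not from the Anki codebase):

```python
def efficiency(memorized: float, time_spent: float) -> float:
    """Memorized cards per unit of time spent -- the ratio CMRR tries to maximize."""
    return memorized / time_spent

# Numbers from the screenshots above:
eff_83 = efficiency(8851, 15.03)  # ≈ 588.9
eff_70 = efficiency(8249, 9.49)   # ≈ 869.2

# 70% DR memorizes slightly fewer cards, but in far less time:
assert eff_70 > eff_83
```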

TLDR: The simulator was bugged before the update. 0.70 is now the correct value for you.


@L.M.Sherlock

Hmm, if expected CMRR changes like this, it might be better to calculate it for every semester and change it accordingly.


I understand that, from a pure simulation perspective, 70% DR should let me memorize more material than 85% DR.
However, from a practical standpoint, I am still skeptical. With 70% DR one is failing twice as many reviews as with 85% DR, which makes reviewing far more tiresome and decreases learning motivation.

It’s a minimum; going above it isn’t advised against.
Personally, I have 70% CMRR and I still opt for 80% DR. It’s a personal choice.

With 70% DR one is failing twice as many reviews

1-(70%/85%) ≈ 18% more reviews failed. (Or at least it should be if FSRS is doing its job well?)

Maybe I am doing the fast math wrong, but with 85% DR one is failing 15% of reviews, while with 70% DR one is failing 30% of reviews. Thus, twice as many.
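
The arithmetic can be spelled out (assuming actual retention matches desired retention, which is what FSRS aims for):

```python
def fail_rate(desired_retention: float) -> float:
    # If actual retention tracks desired retention, the failure
    # fraction is simply the complement of the retention target.
    return 1.0 - desired_retention

# 85% DR -> 15% failed; 70% DR -> 30% failed: twice as many.
ratio = fail_rate(0.70) / fail_rate(0.85)  # ≈ 2.0
```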


I’m stupid sometimes…


A little off-topic:
I am not sure how the current implementation of “Compute minimum recommended retention” works.
Does it take into account how many cards of the deck I’ve already learned? Or does it just simulate starting a fresh deck, with all new cards?

It simulates it without existing cards:
(Unsure how much the code’s going to help on this one, but have it anyway.)

It does take your existing reviews into account, though (beyond just your parameters).

(list of used attributes):


Just think in terms of odds rather than probabilities
odds=1/(1-p)

If p=99%, then odds=1/(1-0.99)=100
If p=85%, then odds=1/(1-0.85)=6.66
If p=70%, then odds=1/(1-0.7)=3.33

Sadly, a lot of people get this stuff wrong. Someone might say “But it’s 99% vs 99.9%, that’s a 0.9% difference, who cares about 0.9%?”, but in terms of odds that’s 100 vs 1000.
The closer the probability is to 0% or 100%, the better off you are by using odds instead. Doubly so if you have to present the numbers to people who are not math-savvy.
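
The conversion above is a one-liner in plain Python (using the post’s definition odds = 1/(1−p), i.e. reviews per failure):

```python
def odds(p: float) -> float:
    """Convert a retention probability into odds, as defined in the post: 1/(1-p)."""
    return 1.0 / (1.0 - p)

# The examples from above (approximate due to float rounding):
print(odds(0.99))  # ≈ 100
print(odds(0.85))  # ≈ 6.67
print(odds(0.70))  # ≈ 3.33
```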


I am starting to think that “Memorized” in the simulation is either misleading or calculated wrongly.
I’ve made two simulations, for DR 70% and 90%. This is on a smaller deck on which I’ve already gone through almost all new cards. There are 5830 cards in this deck.

After going through all cards with 70% DR, my real recall on this deck will obviously be lower than with 90% DR. However, the simulation result suggests that I will know the material equally well in both cases.

The big issue with that metric is that it considers total knowledge to be the sum of all cards’ R.

Let’s take this example:
Your DR is 80%. You have 80% R on 10 cards, so your score is 8. Let’s say your workload is currently 10 reviews/day.

You drop DR to 70%. A 240-day interval becomes a 444-day one, meaning you divided your workload by 1.85.

If you do indeed hit 70% retention, it means that with the same 10 reviews/day you can in fact handle 18.5 cards in your deck, and multiplied by 70%, you get a score of 12.95 instead of 8.

Since the DR interval scaling reduces workload so aggressively compared to the drop in DR itself, sure, it sounds like a good idea!

Until you realize FSRS doesn’t translate a DR=80% fit well into good intervals for DR=70%, and suddenly your retention drops to 50-60%. So now the score is 18.5 × 0.5 = 9.25. (True story.)
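
The arithmetic of the worked example, spelled out (all numbers from the post; note that 18.5 × 0.5 is exactly 9.25):

```python
# "Total knowledge" = number of cards x retention on them.
cards, dr_old = 10, 0.80
score_old = cards * dr_old            # 8.0

interval_scale = 444 / 240            # ≈ 1.85x longer intervals at 70% DR
cards_new = cards * interval_scale    # 18.5 cards for the same workload

score_at_70 = cards_new * 0.70        # ≈ 12.95 -- looks like a clear win
score_at_50 = cards_new * 0.50        # 9.25 -- if true retention sags to 50%
```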

Basically, now you know just about the same amount as before, but you only know it at a 50-60% rate, which was not really the goal, right?

Being able to reach a higher total score, a higher “Memorized”, by sacrificing per-card retention is only good for trying to pass an exam/test with the least amount of time. For anyone who cares about actually IMPROVING, it’s going full-speed backwards.

It’s a very, very bad metric, based on some smart calculation that makes you think you’ll be better off. Thing is, if you indeed want to perform as well as you can on a certain exam, doing as many cards as possible is probably the best strategy… but you won’t build any kind of mastery.

To me, this should not even be in the public build of Anki. People should be focused on building higher stability, or increasing retention, not on knowing less to still maximize some kind of average score.


Unless I misunderstand you, it seems like you raise 2 separate issues:

  1. Criticism of the sum of R as a metric
  2. FSRS not generalizing well across different retention levels

I disagree on the first one. Sure, we could make some kind of complicated metric that better reflects how much the user knows, but this one is (relatively) simple and works well if the algorithm itself works well.

Here’s an idea for a metric: average discounted stability.
It would be calculated as sum(R_i * S_i) / n, where R_i is retrievability of the ith card, S_i is the stability of the ith card, and n is the number of cards.
The difference between this and simply average S is that average S doesn’t take into account the fact that you won’t be able to recall 100% of your cards, only some fraction <100%.
While this would be a nice metric in the mathematical sense, it’s a lot less intuitive than “the number of cards that you are expected to remember” aka estimated total knowledge. As I said in my comment above, it’s not that we can’t come up with other metrics, it’s that estimated total knowledge is the most intuitive one.
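
A minimal sketch of the proposed metric, with made-up card states (not from any real simulation):

```python
# Hypothetical card states: (retrievability, stability-in-days) pairs.
cards = [(0.95, 120.0), (0.80, 45.0), (0.60, 10.0)]

def avg_discounted_stability(cards):
    """sum(R_i * S_i) / n -- each card's stability discounted by its recall probability."""
    return sum(r * s for r, s in cards) / len(cards)

def avg_stability(cards):
    return sum(s for _, s in cards) / len(cards)

# Discounting always lowers the average, since every R_i < 1.
assert avg_discounted_stability(cards) < avg_stability(cards)
```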

R itself is fine; it’s more about Total Knowledge. Considering 1000 items with 1% R (Total Knowledge = 10) to be “more knowledge” than 9 items with 100% R (Total Knowledge = 9) sounds off to me, and it’s also why I think the “minimum recommended retention” (which in fact optimizes Total Knowledge) tends to just advise lowering R as much as possible.

Now, is it worth making it more complex? Maybe, if that optimal DR were a really big deal (for example, if it’s used by a future iteration of FSRS, a smarter scheduler with varying DR, etc.), but right now I feel this metric is not that important in its current state, which is why I was even arguing about its current usefulness.

Also, the second point, how FSRS translates a DR=90% prediction into a 70% prediction, is indeed something that would need to improve substantially before switching DR “on the fly”. (Or maybe not translating it at all, but using a more aggressive recency weight and training different parameters for different DRs, though that slightly contradicts the idea of fitting a single forgetting curve.)

@L.M.Sherlock can you implement average discounted stability and see if CMRR gives vastly different outputs if we optimize for average discounted stability/time instead of total knowledge/time?

Perhaps we should move this to a new topic

EDIT: man, how does Jarrett always manage to make code I don’t understand…I’m trying to run optimal_retention from here and I can’t figure out how.

EDIT 2: ok, screw it, I’ll just hard-code FSRS parameters into the function itself.
Alright, so original CMRR with default FSRS parameters outputs 0.84. If I use average discounted stability instead of total knowledge, I get 0.93. So yeah, it matters a lot.
Nvm, this implementation is not correct. Jaaaarrreeeettt, do the thing for me plz… :sob:

@L.M.Sherlock I implemented it like this, I hope it’s correct.
avg_discount_s = (card_table[col["retrievability"]] * card_table[col["stability"]]).sum() / len(card_table[col["stability"]])

I need to do the [today] thing, but idk how. Basically, the way I did it above calculates the metric across all days (I think?), but we need to calculate it only based on the last day.

EDIT 3: Nvm, seems like it was actually correct. Probably. Well, if it is, then the new value with default FSRS parameters is 0.93, which is very different from 0.84.
This means that using average discounted stability would push optimal retention very high compared to using total knowledge.

Next I used parameters for one of my hardest decks where MRR is always at 0.7. I got 0.87 with the new metric.
Next I used parameters for a hard deck where MRR is 0.73. I got 0.85 with the new metric.
Next I used parameters for an easier deck where MRR is 0.87. I got 0.88.

So far
0.84 → 0.93
0.70 → 0.87
0.73 → 0.85
0.87 → 0.88

EDIT 4: I did some more testing to see if it ever gets stuck at 0.95. It doesn’t, that’s good.


Please open an issue in GitHub: open-spaced-repetition/fsrs-optimizer (FSRS Optimizer Package)

I will take a look when I’m available.


Done