Add analysis of review logs in the optimizer of FSRS

L.M.Sherlock · October 9, 2022, 10:47am

Continuing the discussion from Big update in FSRS4Anki v3.0.0:

In FSRS4Anki v3.2.0, I added a new feature that analyzes the review logs in a more explainable way:

       r_history  avg_ivl  avg_retention  stability  factor  group_cnt
1              1      1.7         0.7649     1.0391     inf       7978
2              2      1.0         0.9009     1.0893     inf        234
3              3      1.5         0.9622     5.3588     inf       9070
4              4      3.8         0.9656    12.1251     inf      11436
12           3,1      1.1         0.9260     1.5290  0.2853        410
13           3,2      3.6         0.9296     8.3895  1.5656       1091
14           3,3      3.9         0.9664    15.1840  2.8335       6527
15           3,4      8.8         0.9376    20.8551  3.8917        724
45         3,3,1      1.2         0.9408     2.2464  0.1479        239
46         3,3,2      6.5         0.9319    16.7267  1.1016        594
47         3,3,3      9.0         0.9602    23.5304  1.5497       5036
152      3,3,3,1      1.8         0.9460     3.5427  0.1506        357
153      3,3,3,2     22.8         0.8804    18.4659  0.7848        448
154      3,3,3,3     18.6         0.9412    35.1949  1.4957       3052
369    3,3,3,3,1      1.5         0.9519     3.9697  0.1128        249
370    3,3,3,3,2     23.3         0.8438    14.4558  0.4107        149
371    3,3,3,3,3     39.5         0.9135    46.9134  1.3330       1423
667  3,3,3,3,3,3     74.3         0.8528    55.5966  1.1851        411

r_history is the history of ratings across a card's reviews. avg_ivl is the average interval at which the cards were reviewed. avg_retention is the average retention. stability is the estimated memory state variable: approximately the interval that yields 90% retention. factor is stability / previous stability. group_cnt is the number of review logs used to compute the statistics.
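As a minimal sketch of how the factor column relates to the stability column, the snippet below recomputes factor for a few rows copied from the table. The `stability` dict and the `factor` helper are hypothetical names used only for illustration; they are not taken from the optimizer notebook. Returning inf for a single-rating history simply mirrors the inf shown in the table and is an assumption about how those rows should be read.

```python
import math

# Stability values copied from the table above, keyed by r_history.
stability = {
    "3": 5.3588,
    "3,1": 1.5290,
    "3,2": 8.3895,
    "3,3": 15.1840,
    "3,4": 20.8551,
    "3,3,3": 23.5304,
}


def factor(r_history: str) -> float:
    """Return stability / previous stability for one rating history."""
    # The previous history is everything before the last rating.
    previous, _, _ = r_history.rpartition(",")
    if not previous:
        # First review: there is no previous stability to divide by.
        return math.inf
    return stability[r_history] / stability[previous]


for history in stability:
    print(history, round(factor(history), 4))
```

For example, the row 3,3 divides its stability by the stability of the row 3, and the row 3,3,3 divides by the stability of 3,3.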

With a required retention of 90%, an avg_retention above 90% means the avg_ivl is too short.

So FSRS generates these intervals for me:

1:again, 2:hard, 3:good, 4:easy

first rating: 1
rating history: 1,3,3,3,3,3,3,3,3,3,3
interval history: 0,1,2,4,9,19,39,79,159,317,624
difficulty history: 0,7.3,7.2,7.2,7.1,7.1,7.0,7.0,6.9,6.9,6.8

first rating: 2
rating history: 2,3,3,3,3,3,3,3,3,3,3
interval history: 0,3,8,19,44,100,223,489,1052,2226,4631
difficulty history: 0,6.1,6.1,6.1,6.0,6.0,6.0,6.0,5.9,5.9,5.9

first rating: 3
rating history: 3,3,3,3,3,3,3,3,3,3,3
interval history: 0,6,16,42,107,265,641,1512,3483,7842,17280
difficulty history: 0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0

first rating: 4
rating history: 4,3,3,3,3,3,3,3,3,3,3
interval history: 0,8,24,69,192,517,1348,3409,8376,20022,46625
difficulty history: 0,3.8,3.8,3.9,3.9,3.9,3.9,4.0,4.0,4.0,4.0

These intervals are longer than the intervals given by Anki’s built-in scheduler. You can generate the analysis for yourself in this notebook:

Any feedback is welcome.

kuroahna · October 10, 2022, 2:50am

Still a bit confused with what I’m looking at in your table

L.M.Sherlock:

       r_history  avg_ivl  avg_retention  stability  factor  group_cnt
1              1      1.7         0.7649     1.0391     inf       7978
2              2      1.0         0.9009     1.0893     inf        234
3              3      1.5         0.9622     5.3588     inf       9070
4              4      3.8         0.9656    12.1251     inf      11436
12           3,1      1.1         0.9260     1.5290  0.2853        410
13           3,2      3.6         0.9296     8.3895  1.5656       1091
14           3,3      3.9         0.9664    15.1840  2.8335       6527
15           3,4      8.8         0.9376    20.8551  3.8917        724
45         3,3,1      1.2         0.9408     2.2464  0.1479        239
46         3,3,2      6.5         0.9319    16.7267  1.1016        594
47         3,3,3      9.0         0.9602    23.5304  1.5497       5036
152      3,3,3,1      1.8         0.9460     3.5427  0.1506        357
153      3,3,3,2     22.8         0.8804    18.4659  0.7848        448
154      3,3,3,3     18.6         0.9412    35.1949  1.4957       3052
369    3,3,3,3,1      1.5         0.9519     3.9697  0.1128        249
370    3,3,3,3,2     23.3         0.8438    14.4558  0.4107        149
371    3,3,3,3,3     39.5         0.9135    46.9134  1.3330       1423
667  3,3,3,3,3,3     74.3         0.8528    55.5966  1.1851        411

What does the first column represent? Such as the number 152 in

       r_history  avg_ivl  avg_retention  stability  factor  group_cnt
...
152      3,3,3,1      1.8         0.9460     3.5427  0.1506        357
...

And for r_history, you mention that it is “the history of ratings on each review”, so does 3,3,3,1 mean that you pressed Good, Good, Good, then Again?
“avg_ivl is the average interval when you reviewed cards”. Is this the average interval for the cards you reviewed that day? Or is it the average interval for the cards that you cumulatively reviewed up to that day?
“When the required retention is 90%, the avg_ivl is too short when the avg_retention is bigger than 90%”. I don’t think I fully understand this. Can you rephrase it, perhaps with some examples? Like in your example

       r_history  avg_ivl  avg_retention  stability  factor  group_cnt
...
152      3,3,3,1      1.8         0.9460     3.5427  0.1506        357
...

If you set required retention to 90%, then is the 1.8 day avg interval too short since your average retention is 0.9460? Is the 1.8 day average interval coming from Anki SM2?

Also, it’d be great to have an explanation of this in the notebook as you have described here

L.M.Sherlock · October 10, 2022, 3:07am

It is the index auto-generated by Pandas. You can ignore it.

Yes, but it doesn’t include reviews in learning steps shorter than 1 day.

It is the average interval for the cards that you cumulatively reviewed up to that day.

The average interval comes from Anki’s SM-2 scheduling plus any delay in when you actually did the reviews.

The average retention comes from your reviews at those intervals.

Anki’s manual said that:

For moderately difficult material, the average user should find they remember approximately 90% of mature cards that come up for review.

So if your retention is lower than 90%, the default interval is too long for you. If it is higher than 90%, the interval is too short.
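To make this concrete, here is a rough sketch that takes an observed average interval and retention and estimates the interval that would have given 90% retention. It assumes an exponential forgetting curve of the form R(t) = 0.9^(t/S), where S is the interval at which retention is 90%; that curve and the helper name `interval_for_target` are assumptions made for illustration, not the optimizer's actual fitting procedure.

```python
import math


def interval_for_target(avg_ivl: float, avg_retention: float,
                        target: float = 0.9) -> float:
    """Estimate the interval that would yield `target` retention.

    Assumes R(t) = target ** (t / S), so S = t * ln(target) / ln(R).
    """
    return avg_ivl * math.log(target) / math.log(avg_retention)


# Row 3,3,3,1 from the table: reviewed at 1.8 days with 0.9460 retention.
short_case = interval_for_target(1.8, 0.9460)
# Row 3,3,3,3,3,3 from the table: reviewed at 74.3 days with 0.8528 retention.
long_case = interval_for_target(74.3, 0.8528)

print(round(short_case, 1))  # longer than 1.8, so 1.8 days was too short
print(round(long_case, 1))   # shorter than 74.3, so 74.3 days was too long
```

Under this assumption, the 3,3,3,1 group could have waited about 3.4 days instead of 1.8, which is close to, but not the same as, the stability of 3.5427 in the table; the table value is presumably fitted from the individual review logs rather than from the group averages.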

L.M.Sherlock · October 10, 2022, 3:18am

I will add it in the next patch. Thanks for your suggestion.

jambamboleo · October 12, 2022, 8:38pm

BTW, what is the most efficient target retention?

Can we rely on the following calculation from Piotr Woźniak?

The relationship between the forgetting index and knowledge retention can accurately be expressed using the following formula:

Retention = -FI/ln(1-FI)

where

Retention - overall knowledge retention expressed as a fraction (0…1),

FI - forgetting index expressed as a fraction (forgetting index equals 1 minus knowledge retention at repetitions).

The above formula can be derived from the formula for the exponential decay of memory traces (R = e^(-d*t), where R - retention, d - decay constant, t - time)

The greatest overall increase in the optimal interval can be observed for the forgetting index of about 20%. The overall increase takes into the consideration the fact that for forgotten items, the optimal interval decreases. Therefore, for the forgetting index greater than 20%, the positive effect of long intervals on memory resulting from the spacing effect is offset by the increasing number of forgotten items.

The greatest overall knowledge acquisition rate is obtained for the forgetting index of about 20-30% (see Figure 3). This results from the trade-off between reducing the repetition workload and increasing the relearning workload as the forgetting index progresses upward. In other words, high values of the forgetting index result in longer intervals, but the gain is offset by an additional workload coming from a greater number of forgotten items that have to be relearned.

https://static.supermemo.com/old_articles/images/ol_fig6.gif

Bear in mind that forgetting index is not simply the inverse of retention rate: Forgetting index in SuperMemo - supermemo.guru. The formula is

retention = -(forgetting index)/ln(1-(forgetting index))

20% forgetting index corresponds to a ~90% retention rate
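The conversion can be checked with a few lines of Python. This is only a sketch of the quoted formula: `overall_retention` evaluates retention = -FI/ln(1-FI), and `average_retention_numeric` (a hypothetical helper) averages an exponential forgetting curve over one interval that ends when retention has dropped to 1 - FI, which is the derivation the quoted text alludes to.

```python
import math


def overall_retention(forgetting_index: float) -> float:
    """Quoted formula: retention = -FI / ln(1 - FI)."""
    return -forgetting_index / math.log(1.0 - forgetting_index)


def average_retention_numeric(forgetting_index: float, steps: int = 100_000) -> float:
    """Average R(t) = (1 - FI) ** (t / T) over one interval [0, T].

    Uses the midpoint rule with t / T sampled at (k + 0.5) / steps.
    """
    base = 1.0 - forgetting_index
    total = sum(base ** ((k + 0.5) / steps) for k in range(steps))
    return total / steps


for fi in (0.1, 0.2, 0.3):
    print(fi, round(overall_retention(fi), 3), round(average_retention_numeric(fi), 3))
```

For a forgetting index of 0.2, both functions give roughly 0.896, which matches the "~90%" figure above.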

L.M.Sherlock · October 13, 2022, 1:39am

In FSRS, retention is 1 - forgetting index. And the most efficient target retention depends on the parameters of FSRS.
