New progress in implementing the custom algorithm

Thank you, this was a great help. After reading the paper, it seems that their dataset is specifically about language learning, so scientifically speaking there is no basis to comment on other topics. However, I see no reason why this can’t be applied to diverse subject matter as the effect of future iterations will be marginal on subject matter - at the end of the day, the use case for this is a scheduling tool for reviewing material to learn. How do you see it?


I assume that it could be used in Anki with various kinds of learning material, and I am working on that.


Doesn’t @dae technically have access to a larger dataset of anyone who uses AnkiWeb? Maybe ask if he can share that with you for analysis.

This is my one doubt about this new algorithm. The work @L.M.Sherlock is doing is fantastic and I’m super appreciative, but I do have a concern about how well this will adapt outside of language learning. I’m hoping the optimizer takes care of that.

To be specific, I’ve always taken the Anki algorithm/SM-2 to be inefficient for my use case for SRS, i.e. it results in too many reviews and gives me an excessive workload. So I was surprised to see FSRS giving even shorter intervals in many cases, resulting in more reviews. The SM-18 algorithm by contrast generally results in much longer intervals and thus fewer reviews.

I would assume that’s due to large differences in SRS use between different people, and in the datasets used to develop the algorithms. From what I understand, Wozniak used much smaller datasets that were basically hand-picked: his own, and those of other experienced SuperMemo users. By the nature of SuperMemo, and the emphasis in that community, people are generally not using it for rote memorization of asemantic information, but rather for retaining things that have already been understood and learned, and that build upon existing knowledge in a semantic way, along with an emphasis on high-quality card formulation.

By contrast, the average SRS user in most apps is not as skilled, and the most common use case is rote memorization of new vocabulary in a foreign language - something that is not very semantic until 1) you’re getting quite good in the language, and 2) you get good at card formulation for language learning. So again, if we’re taking huge datasets, the average person just isn’t going to be that great. Using SRS well is a skill developed over time and with effort.

Of course, this difference in approach probably leads to SM-18 being pretty bad for retention for a brand new SRS user trying to rote memorize vocab in a new language, and SM-2/FSRS being pretty inefficient in terms of excess reviews for a highly skilled SRS user trying to retain semantic knowledge. The advantage of SM-2/FSRS, I guess, is that they cater for more users, in that they still fulfil their intended purpose of retention even if they are inefficient for certain users, whereas SM-18 simply won’t provide the intended retention for an unskilled user. That does provide a forcing function though, and is why those that stick with SuperMemo tend to become highly skilled SRS users.

Both SM-18 and FSRS adapt to the user though, so the big question in my mind is how quickly and how well they adapt for a user for whom the initial algorithm is far from optimal.


I am developing a new feature to analyze the review logs in a more explainable way. The short/long intervals would be more acceptable if you see the analysis of your own reviews.


To be specific, I’ve always taken the Anki algorithm/SM-2 to be inefficient for my use case for SRS, i.e. it results in too many reviews and gives me an excessive workload. So I was surprised to see FSRS giving even shorter intervals in many cases, resulting in more reviews. The SM-18 algorithm by contrast generally results in much longer intervals and thus fewer reviews.

Is it really that simple to just look at the intervals given by the scheduler to determine if it results in more reviews or not? And hence less workload? I feel like that’s overly simplistic and I also had that assumption when I was first trying to compare the Anki SM-2 vs FSRS scheduler. However, when I thought about it more, I feel like it’s much more complicated than that.

For example, if you’re using the default Anki settings (250% ease factor, 100% interval modifier, learning steps of 1min and 10min, a relearning step of 10 minutes, and a new interval after a lapse of 0%, meaning the interval is set back to 1 day when you fail the card and then pass it), and assuming you’re always pressing Good, then your intervals will be

1,3,8,20,50,125,313,783,1958,4895

since the formula is New interval = old interval * interval modifier * ease factor = old interval * 1 * 2.5 = old interval * 2.5
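As a quick check of the sequence above, here is a minimal sketch that applies the formula repeatedly. It is not Anki’s actual scheduler code: the function name is made up, fuzz is ignored, and rounding half up on each step is an assumption chosen because it reproduces the quoted numbers.

```python
import math


def sm2_good_only_intervals(first_interval=1, ease=2.5,
                            interval_modifier=1.0, reviews=10):
    """Intervals in days when every review is answered Good.

    Hypothetical helper: ignores fuzz, lapses, and the Hard/Easy buttons.
    """
    intervals = [first_interval]
    for _ in range(reviews - 1):
        raw = intervals[-1] * interval_modifier * ease
        # Assumption: round half up, applied to the already rounded interval.
        intervals.append(math.floor(raw + 0.5))
    return intervals


print(sm2_good_only_intervals())
# [1, 3, 8, 20, 50, 125, 313, 783, 1958, 4895]
```

Starting from a first interval of 4 days instead of 1 gives 4, 10, 25, 63, ... under the same assumptions.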

This kind of analysis completely ignores the user’s retention rate. What if the user’s retention rate is below 90%? Like 80%? Or even 70%? That means they’ll be pressing Again and hence doing more reviews in addition to their daily reviews.

I personally don’t know how to calculate this, and I’m not really sure this is correct, but let’s assume that we’re targeting a 90% retention rate and the user’s current retention rate with Anki SM-2 is 80%. That means we’re forgetting an additional 10% of cards relative to the target. Let’s also assume we have 400 cards to review each day. Then 10% of those 400 review cards, i.e. 40 cards, are forgotten (we press Again) beyond what the 90% target would allow. The relearning step is set to 10 minutes, so when you press Again to fail a card, it is shown again in 10 minutes. If we assume a 100% retention rate when relearning these cards, meaning you press Good on all 40 of them, then you have reviewed an additional 40 cards that day compared with the 90% target.

It’s quite unrealistic to assume a 100% retention rate for relearning lapsed cards, so let’s assume a 90% retention rate for lapsed cards instead. Using the same logic as above, with 400 cards to review and an additional 10% forgotten relative to the 90% target, we fail 40 cards. When we relearn them, 90% of the lapsed cards (36 cards) pass and move on to the next day. The other 10% of those 40 cards (4 cards) are forgotten again, so we press Again on them and then finally press Good. That is 4 more reviews on top of the 40, so in total we needed 44 additional reviews compared with the 90% target. It can be even more complicated than that, but this is just an example to consider.
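The arithmetic in the two paragraphs above can be written as a small sketch. The function name and the `rounds` parameter are made up for illustration, and the model is deliberately crude: every lapsed card gets one relearning review per round, and a fixed share of them fails again.

```python
def extra_reviews(failed_cards, relearn_retention, rounds):
    """Relearning reviews caused by `failed_cards` lapses in one day.

    Hypothetical model: each round reviews the cards still failing, and
    a share (1 - relearn_retention) of them fails again.
    """
    total = 0.0
    still_failing = float(failed_cards)
    for _ in range(rounds):
        total += still_failing
        still_failing *= 1.0 - relearn_retention
    return total


failed = 400 * 10 // 100  # 10% of 400 due cards -> 40 lapses

print(extra_reviews(failed, relearn_retention=1.0, rounds=2))         # 40 extra reviews
print(round(extra_reviews(failed, relearn_retention=0.9, rounds=2)))  # 44 extra reviews
```

If relearning continued until every card passed, the total would approach `failed / relearn_retention` (about 44.4 here), which is the sum of the geometric series.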

Now, if FSRS truly does give you a 90% retention rate for the intervals that it suggests, then despite giving shorter intervals, it could very well be that you do fewer reviews than with Anki SM-2, since you’re not forgetting as many cards and hence not having to relearn as many cards.

IMO, it’s really hard to compare the 2 algorithms just by looking at the intervals that they give. It’s really important to consider a bunch of things, such as the retention rate.

Of course, if a user already has a 90% retention rate using Anki SM-2, then the logic above won’t apply, and they probably won’t have a reason to switch over.

However, I wouldn’t overlook FSRS just yet. For some analysis, I’ve got a friend who’s learning Japanese with about 11k cards in his deck, who has an 88.7% young monthly retention rate, an 87.8% mature monthly retention rate, and an 88.4% young+mature monthly retention rate:

image

Here are the trained parameters and results from FSRS Optimizer v3.2.0

var w = [0.2566, 0.9675, 4.9943, -0.9537, -0.6965, 0.0093, 1.4051, -0.0352, 0.7127, 1.6794, -0.5165, 0.7874, 0.124];

1:again, 2:hard, 3:good, 4:easy

first rating: 1
rating history: 1,3,3,3,3,3,3,3,3,3,3
interval history: 0,1,1,3,6,13,28,60,126,261,536
difficulty history: 0,6.9,6.9,6.9,6.8,6.8,6.8,6.8,6.8,6.8,6.7

first rating: 2
rating history: 2,3,3,3,3,3,3,3,3,3,3
interval history: 0,1,3,7,17,41,96,222,504,1128,2488
difficulty history: 0,5.9,5.9,5.9,5.9,5.9,5.9,5.9,5.9,5.9,5.9

first rating: 3
rating history: 3,3,3,3,3,3,3,3,3,3,3
interval history: 0,2,6,16,42,109,276,685,1669,3993,9384
difficulty history: 0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0

first rating: 4
rating history: 4,3,3,3,3,3,3,3,3,3,3
interval history: 0,3,9,27,77,215,585,1557,4053,10328,25782
difficulty history: 0,4.0,4.0,4.1,4.1,4.1,4.1,4.1,4.1,4.1,4.1

You’ll notice that if you just press Good only, the change in intervals suggested by FSRS is

0 -> 2 = N/A
2 -> 6 = 6/2 = 300%
6 -> 16 = 16/6 = 266.67%
16 -> 42 = 42/16 = 262.5%
42 -> 109 = 109/42 = 259.5%
109 -> 276 = 276/109 = 253.21%
276 -> 685 = 685/276 = 248.19%
685 -> 1669 = 1669/685 = 243.65%
...
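For anyone who wants to repeat this on their own optimizer output, here is a small sketch that turns an interval history into step-to-step growth percentages. The function name is made up; the list is the Good-only interval history quoted above.

```python
def growth_percentages(intervals):
    """Percentage change between consecutive intervals.

    Steps that start from a 0-day interval are skipped, since the ratio
    is undefined there (the "N/A" row above).
    """
    result = []
    for previous, current in zip(intervals, intervals[1:]):
        if previous == 0:
            continue
        result.append(round(100 * current / previous, 2))
    return result


friend_good_only = [0, 2, 6, 16, 42, 109, 276, 685, 1669, 3993, 9384]

for pct in growth_percentages(friend_good_only):
    print(f"{pct:.2f}%")
```

The same function works on the Again, Hard, and Easy histories, which makes it easier to compare more than just the Good-only path.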

It actually gives larger intervals than Anki SM-2, where the change in interval is always 250%.

So he has an 88.4% young+mature monthly retention rate using Anki SM-2, but FSRS suggests giving him larger intervals at the beginning to target a 90% retention rate, before the change in interval starts to drop below 250%. That is interesting, because you’d expect him to be given more reviews to raise his retention from 88.4% to 90%. Perhaps this suggests that we shouldn’t just look at pressing Good only when comparing the 2 schedulers. There are more factors to consider.

Now, for another comparison, here are my own results, on about 19k Japanese cards as well.

image

My monthly young retention rate using Anki SM-2 is 88.2%, monthly mature retention rate is 81.8%, and young+mature monthly retention rate is 86.2%.

Here are my results after training with FSRS Optimizer v3.2.0

var w = [1.2879, 0.5135, 5.1439, -1.4261, -1.0481, 0.0074, 1.337, -0.029, 0.7063, 1.8312, -0.405, 0.7284, 0.5238];

1:again, 2:hard, 3:good, 4:easy

first rating: 1
rating history: 1,3,3,3,3,3,3,3,3,3,3
interval history: 0,1,2,4,7,13,23,42,75,133,236
difficulty history: 0,8.0,8.0,8.0,7.9,7.9,7.9,7.9,7.9,7.8,7.8

first rating: 2
rating history: 2,3,3,3,3,3,3,3,3,3,3
interval history: 0,2,4,9,19,41,87,183,380,783,1596
difficulty history: 0,6.6,6.6,6.5,6.5,6.5,6.5,6.5,6.5,6.5,6.5

first rating: 3
rating history: 3,3,3,3,3,3,3,3,3,3,3
interval history: 0,2,6,15,37,92,224,536,1265,2943,6754
difficulty history: 0,5.1,5.1,5.1,5.1,5.1,5.1,5.1,5.1,5.1,5.1

first rating: 4
rating history: 4,3,3,3,3,3,3,3,3,3,3
interval history: 0,3,9,26,74,205,559,1496,3927,10126,25655
difficulty history: 0,3.7,3.7,3.7,3.7,3.8,3.8,3.8,3.8,3.8,3.8

You can see that the change in interval when pressing Good only is

0 -> 2 = N/A
2 -> 6 = 300%
6 -> 15 = 250%
15 -> 37 = 246.67%
37 -> 92 = 248.65%
92 -> 224 = 243.48%
224 -> 536 = 239.29%
536 -> 1265 = 236%
...

These intervals suggest that I need to review the cards “more often” to increase my retention rate from the 86% monthly young+mature retention rate to 90% (or maybe I should look at the 81% monthly mature retention rate?). But again, is it really more reviews if it means that I pass more cards despite having shorter intervals? That is, instead of an 86% monthly young+mature retention rate using Anki SM-2 (or is it the 81% monthly mature retention rate?), I would have a 90% retention rate using FSRS. I’m not sure. But I don’t think it’s as simple as just looking at the intervals generated by Anki SM-2 and FSRS.

And of course, this is assuming that we truly do get 90% retention rate using FSRS. It could very well be that it doesn’t give us 90% retention rate.

Also, a few weeks ago I did some comparisons using Anki Simulator vs FSRS Simulator v1.5.1 (unfortunately the simulator has not been updated for the latest changes yet). I simulated learning 50 new cards per day for a deck of 20k cards and compared the two. Anki Simulator predicts about 1200 cards to review on the 400th day (the peak), whereas FSRS Simulator predicts only 1000 cards on that day. Since 1200/1000 = 1.2, Anki SM-2 gives me 20% more reviews at the peak than FSRS (equivalently, FSRS gives about 17% fewer, since 1000/1200 ≈ 0.83). Now, I’m not sure how accurate these simulators are, but it does seem to suggest that there’s more to it than just looking at the intervals.

Also, FSRS recently added a new feature where it retains your interval after a lapse, compared to Anki SM-2 where it just resets your card to 1 day. The post-lapse stability formula is quite complex and I don’t fully understand it, but this could also potentially save you some reviews. And it’s probably way better than setting your New interval setting in Anki to some constant percentage like 20% or 50%, since the formula makes it more adaptive.

I’m honestly not too sure how I feel about the Post-Lapse stability formula since according to Supermemo, they mention that

It has been shown long ago that the length of the first post-lapse optimum interval is best correlated with the number of memory lapses recorded for the item. Even then, post-lapse interval usually oscillates in the range of 1-4 days for the default forgetting index of 10%. The correlation between lapses and the PLS is not very useful in adding to the efficiency of learning. Some competitive spaced repetition software, as well as SuperMemo in its first years, experimented with re-learning hypotheses based on ancient wisdoms of psychology, e.g. by halving intervals after a memory lapse. Current data shows clearly that this approach is harmful, as it slows down the identification of leeches. Such an approach to handling forgotten items is a form of irrational procrastination.

Retaining 50% of your interval is harmful according to their data, as it slows down the identification of leeches, and their optimal post-lapse interval usually oscillates in the range of 1-4 days for the default forgetting index of 10%. But perhaps with the FSRS post-lapse stability formula, it could be beneficial. I’m not sure.

Another cool feature that he recently added deals with the “ease hell” problem known in Anki. This makes FSRS quite appealing, since the solution is baked into his algorithm, whereas in Anki you have to install the Auto Ease Factor or Straight Rewards add-on to work around Anki SM-2’s shortcomings. If you suffer from ease hell in Anki, then you actually do more reviews than you should (a bunch of cards stuck at 130% ease, meaning their intervals grow really slowly). Hence, we really shouldn’t just be looking at the intervals given by Anki SM-2 and FSRS when pressing only Good. It’s a lot more complicated than that.

Overall, it’ll be nice if we can get more accurate comparisons and analysis between Anki SM-2 and FSRS to convince users why FSRS could be better than SM-2, but I don’t think we have the data yet since it’s in its early stages and needs more users to test. I wonder if it’s possible to create some simulation comparisons between FSRS and Anki SM-2 like in his paper.


True retention only classifies cards by interval. The analysis generated by FSRS optimizer is more accurate.


It’s possible. I think the current formula of the model is stable and will not be modified frequently. I will update the simulator in a few days.


Right, mature cards are cards with >=21 day interval. That’s honestly quite an arbitrary number. And young cards are cards with <21 day interval.

The analysis generated by FSRS optimizer is more accurate.

Are you referring to the new analysis table that you recently added to v3.2.0?

In your other thread, you mentioned

The average interval comes from Anki SM2 and the delay in your actual reviews.
The average retention comes from your reviews at those intervals.
If your retention is less than 90%, it means that the default interval is too long for you. If it is higher than 90%, the interval is too short.

This is very interesting.

Here’s my friend’s pre-training analysis table:

           r_history  avg_interval  avg_retention  stability  factor  group_cnt
1                  1           1.0         0.6486     0.2435     inf       6891
2                  3           1.0         0.9532     2.1974     inf       8218
7                3,1           1.0         0.8732     0.7780  0.3541        395
8                3,2           2.4         0.8828     2.1599  0.9829        171
9                3,3           3.0         0.9513     6.3821  2.9044       7583
30             3,3,1           1.0         0.9407     1.7330  0.2715        372
32             3,3,3           6.9         0.9539    15.4016  2.4132       6978
81           3,3,3,1           1.0         0.9533     2.2030  0.1430        300
83           3,3,3,3          16.8         0.9501    35.8667  2.3288       6379
161        3,3,3,3,1           1.0         0.9512     2.1091  0.0588        317
163        3,3,3,3,3          41.8         0.9430    77.7765  2.1685       5771
256      3,3,3,3,3,1           1.0         0.9617     2.6979  0.0347        313
258      3,3,3,3,3,3         104.2         0.9344   166.4570  2.1402       3977
349    3,3,3,3,3,3,1           1.0         0.9570     2.4300  0.0146        278
350    3,3,3,3,3,3,3         226.5         0.9055   242.0319  1.4540       1191
455  3,3,3,3,3,3,3,1           1.0         0.9328     1.5146  0.0063        119

In particular, we can see that pressing Good 6 times for a hypothetical card:

           r_history  avg_interval  avg_retention  stability  factor
...
258      3,3,3,3,3,3         104.2         0.9344   166.4570  2.1402   
...

If I understand this table correctly, Anki SM-2 will give him an average interval of 104.2 days, whereas FSRS will suggest a stability of 166.4570 (approximately 166.4570 days, the interval predicted to give a 90% retention rate). So there’s a huge increase here for him.
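To make the link between avg_interval, avg_retention, and stability more concrete, here is a rough sketch. It assumes an exponential forgetting curve R(t) = 0.9^(t/S), so that the stability S is the interval at which predicted recall is 90%. That curve and both function names are assumptions for illustration; the optimizer fits stability from the individual reviews, so its figures will not match this back-of-envelope estimate exactly.

```python
import math

TARGET_RETENTION = 0.9  # stability is read as the interval that gives 90% recall


def stability_from_observation(avg_interval, avg_retention):
    """Estimate S from one (interval, retention) pair.

    Assumption: R(t) = 0.9 ** (t / S), solved for S.
    """
    return avg_interval * math.log(TARGET_RETENTION) / math.log(avg_retention)


def interval_for_retention(stability, requested_retention):
    """Interval at which predicted recall equals `requested_retention`."""
    return stability * math.log(requested_retention) / math.log(TARGET_RETENTION)


# Friend's row 258: average interval 104.2 days, average retention 0.9344.
estimate = stability_from_observation(104.2, 0.9344)
print(f"estimated stability: {estimate:.1f} days")

# Asking for 90% recall returns the stability itself.
print(f"{interval_for_retention(166.4570, 0.90):.1f} days")
```

Because the observed retention (93.44%) is above 90%, the estimate comes out longer than the 104.2-day interval and lands in the same ballpark as the table’s 166.4570. Asking for a lower requested retention, such as 0.85, stretches the interval further.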

By contrast, here is my table:

           r_history  avg_interval  avg_retention  stability  factor  group_cnt
1                  1           1.0         0.9223     1.3058     inf      16340
2                  3           1.0         0.9230     1.3918     inf       7852
6                3,1           1.1         0.9539     2.5066  1.8010        722
7                3,2           2.7         0.8365     1.5963  1.1469        706
8                3,3           2.8         0.9474     6.1363  4.4089       6235
19             3,3,1           1.1         0.9752     4.6229  0.7534        253
20             3,3,2           3.7         0.9639     9.4033  1.5324        721
21             3,3,3           6.0         0.9778    26.3843  4.2997       4772
52           3,3,3,2           6.5         0.9555    15.0628  0.5709        277
53           3,3,3,3          12.9         0.9643    34.9486  1.3246       3615
104        3,3,3,3,1           1.0         0.9744     4.0627  0.1162        117
105        3,3,3,3,2          16.9         0.8746    11.7997  0.3376        148
106        3,3,3,3,3          29.3         0.9398    46.1356  1.3201       2363
174      3,3,3,3,3,1           1.1         0.9779     4.8879  0.1059        131
175      3,3,3,3,3,2          41.5         0.8120    18.0562  0.3914        139
176      3,3,3,3,3,3          51.0         0.9252    65.0278  1.4095       1229
275    3,3,3,3,3,3,3          36.7         0.9645    97.8701  1.5051        579
394  3,3,3,3,3,3,3,3          86.1         0.8260    48.1421  0.4919        107

In particular, pressing Good 6 times for a hypothetical card

           r_history  avg_interval  avg_retention  stability  factor
...
176      3,3,3,3,3,3          51.0         0.9252    65.0278  1.4095  
...

Anki SM-2 will give me an average interval of 51 days for the card, and FSRS will give me a stability of 65.0278 (65.0278 days that is predicted to give us a 90% retention rate).

Interestingly enough, my friend’s data shows that Anki SM-2’s intervals are too short for him, and he can have larger intervals using FSRS, since his average retention is above 90%.

On the other hand, my data shows that Anki SM-2’s intervals are too long: there are instances where my average retention drops below 90%, particularly

           r_history  avg_interval  avg_retention  stability  factor
...
394  3,3,3,3,3,3,3,3          86.1         0.8260    48.1421  0.4919   

This drop in retention to 82.60% is huge, and I definitely feel like I’m doing more reviews using Anki SM-2 because of it. If my retention were 90%, I wouldn’t be doing as many reviews. FSRS may suggest shorter intervals than SM-2 for me, but I feel like there’s an optimal spot between interval spacing and retention rate where you do the fewest reviews. In other words, although Anki SM-2 gives you long intervals, it could give you more reviews if you’re not actually hitting that 90% retention rate. Conversely, FSRS could give you shorter intervals, but if that raises your retention rate to 90%, then you might be doing fewer cards, since you’re not failing so many cards and having to relearn them.

There are also some things to consider about how my friend and I review our cards. I tend to fail fast, with an average of 3-4 seconds of review time per card, whereas my friend averages 6-9 seconds per card. Taking a bit more time per card may affect our retention rates.

That requires the optimizer to find an optimal requested retention. In my paper, I implemented it in C++ because it is too complicated and Python is too inefficient for it. I will improve the performance of the code in the future.


Oh that’s interesting. With all the math libraries available, including ones written in CPython I had assumed Python would be more than up to number crunching tasks like this.

Is the Google Colab optimizer for fsrs4anki still behind the capability of your C++ one?

The Google Colab optimizer doesn’t include the module. You can see the module at:

Hi! Just wondering, for those of us doing reviews on AnkiDroid, is the correct solution to do the reviews on the phone and then, once a day, use the “Reschedule Cards” option from the FSRS4Anki Helper add-on?

Would that work?

Yes, it works, but the resulting schedules are not fully consistent.

Are the results produced by fsrs.js similar to the outcomes of this Anki build? I have integrated the .js module as part of a macro on a spreadsheet that I am using to run through bulk data.

I am guessing the .js module does not include the features of the optimizer here.

The default parameters generate results similar to the built-in scheduler.

Here is a basic comparison:


Rating sequence: 1,3,3,3,3,3,3,3,3,3

Anki’s intervals: 1,3,8,20,50,125,313,783,1958,4895

FSRS’s intervals: 1,3,8,19,44,98,207,421,822,1550


Rating sequence: 2,3,3,3,3,3,3,3,3,3

Anki’s intervals: 1,3,8,20,50,125,313,783,1958,4895

FSRS’s intervals: 2,5,13,32,73,159,331,659,1263,2338


Rating sequence: 3,3,3,3,3,3,3,3,3,3

Anki’s intervals: 1,3,8,20,50,125,313,783,1958,4895

FSRS’s intervals: 3,8,21,50,114,245,501,983,1854,3377


Rating sequence: 4,3,3,3,3,3,3,3,3,3

Anki’s intervals: 4,10,25,63,158,395,988,2470,6175,15438

FSRS’s intervals: 4,11,29,70,158,337,684,1329,2481,4471
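One way to read the comparison above is to ask at which review FSRS first schedules a shorter interval than Anki. The sketch below does that for the Good-only and Easy-first sequences copied from the post; the function name is made up for illustration.

```python
def first_shorter_review(anki, fsrs):
    """1-based review number where the FSRS interval first drops below
    Anki's, or None if that never happens in the given sequences."""
    for review_number, (a, f) in enumerate(zip(anki, fsrs), start=1):
        if f < a:
            return review_number
    return None


# Rating sequence 3,3,3,3,3,3,3,3,3,3
anki_good = [1, 3, 8, 20, 50, 125, 313, 783, 1958, 4895]
fsrs_good = [3, 8, 21, 50, 114, 245, 501, 983, 1854, 3377]

# Rating sequence 4,3,3,3,3,3,3,3,3,3
anki_easy = [4, 10, 25, 63, 158, 395, 988, 2470, 6175, 15438]
fsrs_easy = [4, 11, 29, 70, 158, 337, 684, 1329, 2481, 4471]

print(first_shorter_review(anki_good, fsrs_good))  # 9
print(first_shorter_review(anki_easy, fsrs_easy))  # 6
```

With the default parameters, FSRS starts out with longer intervals than Anki and only falls behind once the intervals are already long.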

I use 18 different deck configurations.
Should the optimization be done in a universal way, or for each one of the 18?

If your goals are significantly different in different decks, it is better to optimize the parameters for each one.


This is very helpful, thank you. Just to be clear - you are comparing fsrs4anki to fsrs.js? Or anki to fsrs?

anki to fsrs