I am not able to access the paper; can you comment on whether the answer is ‘yes’ or ‘no’? I would like to experiment with this algorithm for other topics as well, if it is a good fit.
You can gain access to the paper by using @L.M.Sherlock 's link https://www.maimemo.com/paper/
Thank you, this was a great help. After reading the paper, it seems that their dataset is specifically language learning, so scientifically speaking there is no basis to comment on other topics. However, I don’t see a reason why this can’t be applied to diverse subject matter, as I’d expect the subject matter to have only a marginal effect on future iterations. At the end of the day, the use case for this is a scheduling tool for reviewing material to learn. How do you see it?
I assume it could be used in Anki with all kinds of learning material, and I am working on that.
Doesn’t @dae technically have access to a larger dataset of anyone who uses AnkiWeb? Maybe ask if he can share that with you for analysis.
This is my one doubt about this new algorithm. The work @L.M.Sherlock is doing is fantastic and I’m super appreciative, but I do have a concern about how well this will adapt outside of language learning. I’m hoping the optimizer takes care of that.
To be specific, I’ve always found the Anki algorithm/SM-2 inefficient for my use case for SRS, i.e. it results in too many reviews and gives me an excessive workload. So I was surprised to see FSRS giving even shorter intervals in many cases, resulting in more reviews. The SM-18 algorithm, by contrast, generally results in much longer intervals and thus fewer reviews.
I would assume that’s due to large differences in how different people use SRS, and in the datasets used to develop the algorithms. From what I understand, Wozniak used much smaller datasets that were basically hand-picked: his own, and those of other experienced SuperMemo users. By the nature of SuperMemo, and the emphasis in that community, people are generally not using it for rote memorization of asemantic information, but rather for retaining things that have already been understood and learned and that build on existing knowledge in a semantic way, along with an emphasis on high-quality card formulation.
By contrast, the average SRS user in most apps is not as skilled, and the most common use case is rote memorization of new vocabulary in a foreign language, something that is not very semantic until 1) you’re getting quite good in the language, or 2) you get good at card formulation for language learning. So again, if we’re taking huge datasets, the average person just isn’t going to be that good at it. Using SRS well is a skill developed over time and with effort.
Of course, this difference in approach probably makes SM-18 pretty bad for retention for a brand-new SRS user trying to rote-memorize vocab in a new language, and SM-2/FSRS pretty inefficient in terms of excess reviews for a highly skilled SRS user trying to retain semantic knowledge. The advantage of SM-2/FSRS, I guess, is that they cater to more users: they still fulfil their intended purpose of retention, even if they are inefficient for certain users, whereas SM-18 simply won’t provide the intended retention for an unskilled user. That does provide a forcing function, though, and is why those who stick with SuperMemo tend to become highly skilled SRS users.
Both SM-18 and FSRS adapt to the user, though, so the big question in my mind is how quickly and how well they adapt for a user for whom the initial algorithm is far from optimal.
I am developing a new feature to analyze the review logs in a more explainable way. The short/long intervals would be more acceptable if you see the analysis of your own reviews.
Is it really that simple to just look at the intervals given by the scheduler to determine if it results in more reviews or not? And hence less workload? I feel like that’s overly simplistic and I also had that assumption when I was first trying to compare the Anki SM-2 vs FSRS scheduler. However, when I thought about it more, I feel like it’s much more complicated than that.
For example, suppose you’re using the default Anki settings (250% starting ease, 100% interval modifier, learning steps of 1m 10m, a relearning step of 10 minutes, and a New interval of 0% after a lapse, meaning the interval is set back to 1 day when you fail the card and then pass it), and you always press Good. Then your intervals will be
1,3,8,20,50,125,313,783,1958,4895
since the formula is New interval = old interval * interval modifier * ease factor = old interval * 1 * 2.5 = old interval * 2.5, rounded to whole days (e.g. 1 * 2.5 = 2.5 → 3).
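The Good-only SM-2 sequence above can be reproduced with a few lines of Python. This is a simplified sketch, not Anki’s actual scheduler code: it assumes a fixed 250% ease and 100% interval modifier, ignores fuzz and the bonus for overdue reviews, and rounds half up to whole days. The function name is made up for illustration.

```python
import math

def sm2_good_only_intervals(first=1, ease=2.5, interval_modifier=1.0, n=10):
    """Simplified Anki SM-2 intervals (days) when every review is answered Good.

    Assumes a fixed ease, no fuzz and no bonus for overdue reviews.
    """
    intervals = [first]
    while len(intervals) < n:
        nxt = intervals[-1] * interval_modifier * ease
        # Round half up; Python's round() would send 2.5 to 2 (banker's rounding)
        intervals.append(math.floor(nxt + 0.5))
    return intervals

print(sm2_good_only_intervals())         # [1, 3, 8, 20, 50, 125, 313, 783, 1958, 4895]
print(sm2_good_only_intervals(first=4))  # starting from the 4-day Easy interval
```

Starting from the 4-day Easy interval instead gives 4, 10, 25, 63, 158, ..., the same as the Anki row for Easy in the fsrs.js comparison further down the thread.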
This kind of analysis completely ignores the user’s retention rate. What if the user’s retention rate is below 90%? Like 80%? Or even 70%? That means they’ll be pressing Again and hence doing more reviews in addition to their daily reviews.
I’m not sure how to calculate this properly, so treat this as a rough example. Let’s assume we’re targeting a 90% retention rate and the user’s current retention rate with Anki SM-2 is 80%. That means we’re forgetting an additional 10% of cards beyond what the 90% target allows. With 400 cards to review each day, we forget 80 cards instead of 40, i.e. 40 extra cards on which we press Again. With a relearning step of 10 minutes, each failed card is shown again 10 minutes later. If we assume 100% retention for relearning these cards, meaning you press Good on all 40 of them, then you have done 40 additional reviews that day compared with hitting the 90% target.
It’s quite unrealistic to assume 100% retention for relearning lapsed cards, so let’s assume 90% retention for lapsed cards instead. As above, out of 400 cards to review we fail 40 extra cards. When we relearn them, 90% of them (36 cards) pass and move on to the next day, but the other 10% (4 cards) are forgotten again, so we press Again on those and finally press Good. That’s 4 more reviews, so in total we did 44 additional reviews compared with the 90% target. It can get even more complicated than that, but this is just an example to consider.
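The extra-review arithmetic in the last two paragraphs is a geometric series: each round of relearning fails the fraction (1 - relearning retention) of the previous round again. A rough sketch under the same simplifying assumptions (one relearning review per failed card per round, no effect on future intervals); the function and its parameters are hypothetical:

```python
def extra_relearning_reviews(daily_reviews, target_retention, actual_retention,
                             relearn_retention, max_rounds=None):
    """Extra reviews per day caused by retention falling short of the target.

    Each extra lapse costs one relearning review; the fraction (1 - relearn_retention)
    of those fails again and needs another round, and so on (a geometric series).
    """
    pending = daily_reviews * max(target_retention - actual_retention, 0.0)
    total = 0.0
    rounds = 0
    while pending > 1e-9 and (max_rounds is None or rounds < max_rounds):
        total += pending                  # one relearning review per failed card
        pending *= 1 - relearn_retention  # cards forgotten again while relearning
        rounds += 1
    return total

print(round(extra_relearning_reviews(400, 0.90, 0.80, relearn_retention=1.0), 2))  # 40.0
print(round(extra_relearning_reviews(400, 0.90, 0.80, relearn_retention=0.9, max_rounds=2), 2))  # 44.0
print(round(extra_relearning_reviews(400, 0.90, 0.80, relearn_retention=0.9), 2))  # 44.44
```

max_rounds=2 corresponds to the 40 + 4 = 44 calculation above; letting the series run to completion gives 40 / 0.9 ≈ 44.4.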
Now, if FSRS truly does give you a 90% retention rate at the intervals it suggests, then despite the shorter intervals it could very well be that you do fewer reviews than with Anki SM-2, since you’re not forgetting, and hence not relearning, as many cards.
IMO, it’s really hard to compare the 2 algorithms just by looking at the intervals they give. It’s really important to consider other things too, such as the retention rate.
Of course, if a user already has a 90% retention rate with Anki SM-2, then the logic above doesn’t apply, and they probably have no reason to switch over.
However, I wouldn’t write off FSRS just yet. For some analysis: a friend of mine who is learning Japanese, with ~11k cards in his deck, has an 88.7% young monthly retention rate, an 87.8% mature monthly retention rate, and an 88.4% young+mature monthly retention rate:
Here are the trained parameters and results from FSRS Optimizer v3.2.0
var w = [0.2566, 0.9675, 4.9943, -0.9537, -0.6965, 0.0093, 1.4051, -0.0352, 0.7127, 1.6794, -0.5165, 0.7874, 0.124];
1:again, 2:hard, 3:good, 4:easy
first rating: 1
rating history: 1,3,3,3,3,3,3,3,3,3,3
interval history: 0,1,1,3,6,13,28,60,126,261,536
difficulty history: 0,6.9,6.9,6.9,6.8,6.8,6.8,6.8,6.8,6.8,6.7
first rating: 2
rating history: 2,3,3,3,3,3,3,3,3,3,3
interval history: 0,1,3,7,17,41,96,222,504,1128,2488
difficulty history: 0,5.9,5.9,5.9,5.9,5.9,5.9,5.9,5.9,5.9,5.9
first rating: 3
rating history: 3,3,3,3,3,3,3,3,3,3,3
interval history: 0,2,6,16,42,109,276,685,1669,3993,9384
difficulty history: 0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0
first rating: 4
rating history: 4,3,3,3,3,3,3,3,3,3,3
interval history: 0,3,9,27,77,215,585,1557,4053,10328,25782
difficulty history: 0,4.0,4.0,4.1,4.1,4.1,4.1,4.1,4.1,4.1,4.1
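To see how the 13 weights in w turn into interval histories like the ones above, here is a minimal Python sketch of a Good-only simulation. It assumes the FSRS v3 formulas as I understand them from the fsrs4anki v3 scheduler: initial stability w[0] + w[1]·(G - 1), initial difficulty w[2] + w[3]·(G - 3), an exponential forgetting curve R = 0.9^(t/S), mean reversion of difficulty towards w[2], and the recall-stability update using w[6], w[7] and w[8]. The function name is hypothetical, and fuzz and the maximum interval are left out.

```python
import math

def fsrs_v3_good_only_intervals(w, first_rating=3, n=5, request_retention=0.9):
    """Intervals (days) for a card rated `first_rating` once, then Good (3) at every review.

    Sketch of the FSRS v3 rules with 13 weights w[0]..w[12]; no fuzz, no maximum interval.
    """
    ln09 = math.log(0.9)
    # Initial memory state from the first rating (1=again, 2=hard, 3=good, 4=easy)
    s = w[0] + w[1] * (first_rating - 1)
    d = min(max(w[2] + w[3] * (first_rating - 3), 1.0), 10.0)
    intervals = []
    for _ in range(n):
        # Schedule the next review when predicted recall falls to request_retention
        t = max(round(s * math.log(request_retention) / ln09), 1)
        intervals.append(t)
        # Retrievability at that review under the exponential curve R = 0.9^(t/S)
        r = math.exp(ln09 * t / s)
        # Good adds w[4] * (3 - 3) = 0 to difficulty; then mean reversion towards w[2]
        d = min(max(w[5] * w[2] + (1 - w[5]) * d, 1.0), 10.0)
        # Stability after a successful recall
        s = s * (1 + math.exp(w[6]) * (11 - d) * s ** w[7] * (math.exp((1 - r) * w[8]) - 1))
    return intervals

friend_w = [0.2566, 0.9675, 4.9943, -0.9537, -0.6965, 0.0093, 1.4051,
            -0.0352, 0.7127, 1.6794, -0.5165, 0.7874, 0.124]
print(fsrs_v3_good_only_intervals(friend_w, first_rating=3, n=4))  # [2, 6, 16, 42]
```

With my friend’s weights this gives 2, 6, 16, 42 for a first rating of Good, matching the first intervals listed above; treat it as a way to build intuition rather than a reference implementation.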
You’ll notice that if you only press Good, the change in intervals suggested by FSRS is
0 -> 2 = N/A
2 -> 6 = 6/2 = 300%
6 -> 16 = 16/6 = 266.67%
16 -> 42 = 42/16 = 262.5%
42 -> 109 = 109/42 = 259.5%
109 -> 276 = 276/109 = 253.21%
276 -> 685 = 685/276 = 248.19%
685 -> 1669 = 1669/685 = 243.65%
...
So the growth factor actually starts out larger than Anki SM-2’s constant 250%, and only drops below it from the 276-day interval onward.
So he has an 88.4% young+mature monthly retention rate with Anki SM-2, yet to target a 90% retention rate FSRS suggests larger intervals for him at the beginning, before the growth factor drops below 250%. That’s interesting, because you’d expect him to need more reviews to raise his retention from 88.4% to 90%. Perhaps this suggests that we shouldn’t only look at pressing Good when comparing the 2 schedulers. There are more factors to consider.
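The ratios above can be recomputed directly from an interval history. A small sketch (both function names are illustrative) that also reports the first interval after which growth falls below SM-2’s fixed 250%:

```python
def growth_ratios(intervals):
    """Percent change between consecutive intervals, skipping the 0 -> first step."""
    return [round(100 * b / a, 2) for a, b in zip(intervals, intervals[1:]) if a > 0]

def first_interval_growing_slower_than(intervals, threshold=250.0):
    """First interval whose next interval grows by less than `threshold` percent."""
    for a, b in zip(intervals, intervals[1:]):
        if a > 0 and 100 * b / a < threshold:
            return a
    return None

friend_good = [0, 2, 6, 16, 42, 109, 276, 685, 1669, 3993, 9384]
mine_good = [0, 2, 6, 15, 37, 92, 224, 536, 1265, 2943, 6754]
print(growth_ratios(friend_good))
print(first_interval_growing_slower_than(friend_good))  # 276
print(first_interval_growing_slower_than(mine_good))    # 15
```

For my friend’s Good-only history that is the 276-day interval (276 → 685 is about 248%); for my own Good-only history further down it is already the 15-day interval.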
Now, for another comparison, here are my own results, on ~19k Japanese cards as well.
My monthly young retention rate using Anki SM-2 is 88.2%, monthly mature retention rate is 81.8%, and young+mature monthly retention rate is 86.2%.
Here are my results after training with FSRS Optimizer v3.2.0
var w = [1.2879, 0.5135, 5.1439, -1.4261, -1.0481, 0.0074, 1.337, -0.029, 0.7063, 1.8312, -0.405, 0.7284, 0.5238];
1:again, 2:hard, 3:good, 4:easy
first rating: 1
rating history: 1,3,3,3,3,3,3,3,3,3,3
interval history: 0,1,2,4,7,13,23,42,75,133,236
difficulty history: 0,8.0,8.0,8.0,7.9,7.9,7.9,7.9,7.9,7.8,7.8
first rating: 2
rating history: 2,3,3,3,3,3,3,3,3,3,3
interval history: 0,2,4,9,19,41,87,183,380,783,1596
difficulty history: 0,6.6,6.6,6.5,6.5,6.5,6.5,6.5,6.5,6.5,6.5
first rating: 3
rating history: 3,3,3,3,3,3,3,3,3,3,3
interval history: 0,2,6,15,37,92,224,536,1265,2943,6754
difficulty history: 0,5.1,5.1,5.1,5.1,5.1,5.1,5.1,5.1,5.1,5.1
first rating: 4
rating history: 4,3,3,3,3,3,3,3,3,3,3
interval history: 0,3,9,26,74,205,559,1496,3927,10126,25655
difficulty history: 0,3.7,3.7,3.7,3.7,3.8,3.8,3.8,3.8,3.8,3.8
You can see that the change in interval when pressing Good only is
0 -> 2 = N/A
2 -> 6 = 300%
6 -> 15 = 250%
15 -> 37 = 246.67%
37 -> 92 = 248.65%
92 -> 224 = 243.48%
224 -> 536 = 239.29%
536 -> 1265 = 236.01%
...
These intervals suggest that I need to review the cards “more often” to raise my retention rate from 86% (monthly young+mature) to 90%, or perhaps from 81% if I should be looking at the monthly mature retention rate instead. But again, is it really more reviews if I pass more cards despite the shorter intervals, i.e. if I get 90% retention with FSRS instead of 86% (or 81%) with Anki SM-2? I’m not sure. But I don’t think it’s as simple as just comparing the intervals generated by Anki SM-2 and FSRS.
And of course, this is assuming that we truly do get 90% retention rate using FSRS. It could very well be that it doesn’t give us 90% retention rate.
Also, a few weeks ago I did some comparisons using Anki Simulator vs FSRS Simulator v1.5.1 (unfortunately the simulator hasn’t been updated for the latest changes yet). I simulated learning 50 new cards per day from a deck of 20k cards and compared the 2. Anki Simulator predicts about ~1200 cards to review at the peak (around day 400), whereas FSRS Simulator predicts only about 1000. That is 200 fewer reviews with FSRS, i.e. about 17% fewer (200/1200 ≈ 16.7%; equivalently, SM-2 needs 20% more reviews than FSRS). Now, I’m not sure how accurate these simulators are, but it does seem to suggest that there’s more to it than just looking at the intervals.
Also, FSRS recently added a new feature that retains part of your interval when a card lapses, whereas Anki SM-2 (with the default settings above) just resets the card to 1 day. The post-lapse stability formula is fairly complex and I don’t fully understand it, but this could also potentially save you some reviews. And it’s probably much better than setting the New interval setting in Anki to some constant percentage like 20% or 50%, since it adapts based on this formula.
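For reference, this is how I understand the post-lapse stability in FSRS v3, using the last four of the 13 weights: S' = w[9] · D^w[10] · S^w[11] · e^((1 - R)·w[12]), where D is difficulty, S the stability before the lapse and R the retrievability when the card was failed. A minimal sketch; the function name is hypothetical and I may be missing details of the actual implementation:

```python
import math

def post_lapse_stability(w, d, s, r):
    """Stability after pressing Again, as I understand the FSRS v3 formula.

    w: the 13 trained weights; d: difficulty (1-10); s: stability before the
    lapse (days); r: retrievability at the moment of the failed review.
    """
    return w[9] * d ** w[10] * s ** w[11] * math.exp((1 - r) * w[12])

friend_w = [0.2566, 0.9675, 4.9943, -0.9537, -0.6965, 0.0093, 1.4051,
            -0.0352, 0.7127, 1.6794, -0.5165, 0.7874, 0.124]
print(round(post_lapse_stability(friend_w, d=5.0, s=100.0, r=0.9), 1))  # 27.8
```

With my friend’s weights, a card with D = 5 and S = 100 days that lapses at R = 0.9 comes back with a stability of roughly 28 days by this formula, instead of being reset to 1 day as with a 0% New interval.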
I’m honestly not sure how I feel about the post-lapse stability formula, since SuperMemo’s documentation says:
It has been shown long ago that the length of the first post-lapse optimum interval is best correlated with the number of memory lapses recorded for the item. Even then, post-lapse interval usually oscillates in the range of 1-4 days for the default forgetting index of 10%. The correlation between lapses and the PLS is not very useful in adding to the efficiency of learning. Some competitive spaced repetition software, as well as SuperMemo in its first years, experimented with re-learning hypotheses based on ancient wisdoms of psychology, e.g. by halving intervals after a memory lapse. Current data shows clearly that this approach is harmful, as it slows down the identification of leeches. Such an approach to handling forgotten items is a form of irrational procrastination.
Retaining 50% of your interval is harmful according to their data, since it slows down the identification of leeches, and their optimal post-lapse interval usually oscillates in the range of 1-4 days for the default forgetting index of 10%. But perhaps with FSRS’s post-lapse stability formula it could be beneficial. I’m not sure.
Another cool feature he recently added deals with the well-known “ease hell” problem in Anki. This makes FSRS quite appealing, since the solution is built into his algorithm, whereas in Anki you have to install the Auto Ease Factor or Straight Rewards add-on to work around Anki SM-2’s shortcomings. If you suffer from ease hell in Anki, you actually do more reviews than you should (a bunch of cards stuck at 130% ease, meaning their intervals grow really slowly). Hence, we really shouldn’t just look at the intervals given by Anki SM-2 and FSRS when pressing only Good. It’s a lot more complicated than that.
Overall, it’ll be nice if we can get more accurate comparisons and analysis between Anki SM-2 and FSRS to convince users why FSRS could be better than SM-2, but I don’t think we have the data yet since it’s in its early stages and needs more users to test. I wonder if it’s possible to create some simulation comparisons between FSRS and Anki SM-2 like in his paper.
True Retention only classifies cards by interval. The analysis generated by the FSRS optimizer is more accurate.
It’s possible. I think the current formula of the model is stable and won’t be modified frequently. I will update the simulator in a few days.
Right, mature cards are cards with an interval of >= 21 days. That’s honestly quite an arbitrary number. And young cards are cards with an interval of < 21 days.
The analysis generated by FSRS optimizer is more accurate.
Are you referring to the new analysis table that you recently added to v3.2.0?
In your other thread, you mentioned
The average interval comes from Anki SM-2 and the delay of your actual reviews.
The average retention comes from your reviews at those intervals.
If your retention is less than 90%, it means that the default interval is too long for you. If it is higher than 90%, the interval is too short.
This is very interesting.
Here’s my friend’s pre-training analysis table:
r_history avg_interval avg_retention stability factor group_cnt
1 1 1.0 0.6486 0.2435 inf 6891
2 3 1.0 0.9532 2.1974 inf 8218
7 3,1 1.0 0.8732 0.7780 0.3541 395
8 3,2 2.4 0.8828 2.1599 0.9829 171
9 3,3 3.0 0.9513 6.3821 2.9044 7583
30 3,3,1 1.0 0.9407 1.7330 0.2715 372
32 3,3,3 6.9 0.9539 15.4016 2.4132 6978
81 3,3,3,1 1.0 0.9533 2.2030 0.1430 300
83 3,3,3,3 16.8 0.9501 35.8667 2.3288 6379
161 3,3,3,3,1 1.0 0.9512 2.1091 0.0588 317
163 3,3,3,3,3 41.8 0.9430 77.7765 2.1685 5771
256 3,3,3,3,3,1 1.0 0.9617 2.6979 0.0347 313
258 3,3,3,3,3,3 104.2 0.9344 166.4570 2.1402 3977
349 3,3,3,3,3,3,1 1.0 0.9570 2.4300 0.0146 278
350 3,3,3,3,3,3,3 226.5 0.9055 242.0319 1.4540 1191
455 3,3,3,3,3,3,3,1 1.0 0.9328 1.5146 0.0063 119
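Reading the table, the factor column appears to be each row’s stability divided by the stability of the row whose r_history is one rating shorter (e.g. 6.3821 / 2.1974 ≈ 2.9044 for 3,3). That is my interpretation, not something stated by the optimizer; a quick check on a few of my friend’s rows (the dictionary and function are illustrative):

```python
import math

# Subset of my friend's pre-training table: r_history -> stability (copied from above)
STABILITY = {
    "1": 0.2435,
    "3": 2.1974,
    "3,1": 0.7780,
    "3,2": 2.1599,
    "3,3": 6.3821,
    "3,3,1": 1.7330,
    "3,3,3": 15.4016,
    "3,3,3,1": 2.2030,
}

def factor(r_history, stability=STABILITY):
    """Stability of this rating history divided by the stability one rating earlier.

    Histories with a single rating have no parent row, matching the `inf` in the table.
    """
    if "," not in r_history:
        return math.inf
    parent = r_history.rsplit(",", 1)[0]
    return stability[r_history] / stability[parent]

for key in ("3,1", "3,2", "3,3", "3,3,1", "3,3,3", "3,3,3,1"):
    print(key, round(factor(key), 4))
```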
In particular, we can see that pressing Good 6 times for a hypothetical card:
r_history avg_interval avg_retention stability factor
...
258 3,3,3,3,3,3 104.2 0.9344 166.4570 2.1402
...
If I understand this table correctly, Anki SM-2 gives him an average interval of 104.2 days, whereas FSRS suggests a stability of 166.4570 (i.e. roughly 166 days is predicted to give a 90% retention rate). So there’s a huge increase here for him.
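Assuming the exponential forgetting curve of FSRS v3, R = 0.9^(t/S), stability S is the number of days after which predicted recall drops to 90%, and the interval for any other requested retention r is S · ln(r) / ln(0.9). A small sketch to compare an SM-2 interval against an estimated stability (function names are illustrative):

```python
import math

def retrievability(t, s):
    """Predicted probability of recall t days after a review, for stability s (days).

    Assumes the exponential forgetting curve R = 0.9 ** (t / s) used by FSRS v3,
    so R == 0.9 exactly when t == s.
    """
    return 0.9 ** (t / s)

def interval_for(s, request_retention=0.9):
    """Days until retrievability falls to request_retention under the same curve."""
    return s * math.log(request_retention) / math.log(0.9)

# Friend's 3,3,3,3,3,3 row: SM-2 average interval vs. estimated stability
print(round(retrievability(104.2, 166.4570), 3))  # 0.936
# My 3,3,3,3,3,3,3,3 row
print(round(retrievability(86.1, 48.1421), 3))    # 0.828
```

Plugging in his 104.2-day average interval and 166.4570 stability gives about 0.936, and the 86.1-day interval with 48.1421 stability from the 3,3,3,3,3,3,3,3 row of my table below gives about 0.828; both are close to the observed avg_retention values (0.9344 and 0.8260).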
By contrast, here’s my table:
r_history avg_interval avg_retention stability factor group_cnt
1 1 1.0 0.9223 1.3058 inf 16340
2 3 1.0 0.9230 1.3918 inf 7852
6 3,1 1.1 0.9539 2.5066 1.8010 722
7 3,2 2.7 0.8365 1.5963 1.1469 706
8 3,3 2.8 0.9474 6.1363 4.4089 6235
19 3,3,1 1.1 0.9752 4.6229 0.7534 253
20 3,3,2 3.7 0.9639 9.4033 1.5324 721
21 3,3,3 6.0 0.9778 26.3843 4.2997 4772
52 3,3,3,2 6.5 0.9555 15.0628 0.5709 277
53 3,3,3,3 12.9 0.9643 34.9486 1.3246 3615
104 3,3,3,3,1 1.0 0.9744 4.0627 0.1162 117
105 3,3,3,3,2 16.9 0.8746 11.7997 0.3376 148
106 3,3,3,3,3 29.3 0.9398 46.1356 1.3201 2363
174 3,3,3,3,3,1 1.1 0.9779 4.8879 0.1059 131
175 3,3,3,3,3,2 41.5 0.8120 18.0562 0.3914 139
176 3,3,3,3,3,3 51.0 0.9252 65.0278 1.4095 1229
275 3,3,3,3,3,3,3 36.7 0.9645 97.8701 1.5051 579
394 3,3,3,3,3,3,3,3 86.1 0.8260 48.1421 0.4919 107
In particular, pressing Good 6 times for a hypothetical card:
r_history avg_interval avg_retention stability factor
...
176 3,3,3,3,3,3 51.0 0.9252 65.0278 1.4095
...
Anki SM-2 will give me an average interval of 51 days for the card, and FSRS will give me a stability of 65.0278 (65.0278 days that is predicted to give us a 90% retention rate).
Interestingly enough, my friend’s data shows that Anki SM-2’s intervals are too short for him, and he can have larger intervals with FSRS, since his average retention is above 90%.
On the other hand, my data shows that some of Anki SM-2’s intervals are too long for me: there are rows where my average retention drops well below 90%, particularly:
r_history avg_interval avg_retention stability factor
...
394 3,3,3,3,3,3,3,3 86.1 0.8260 48.1421 0.4919
This drop in retention to 82.60% is huge, and I definitely feel like I’m doing more reviews with Anki SM-2 because of it. If my retention were 90%, I wouldn’t be doing as many reviews. FSRS may suggest shorter intervals than SM-2 for me, but I feel there’s an optimal balance between interval length and retention rate at which you do the fewest reviews. In other words, although Anki SM-2 gives you longer intervals, it could give you more reviews if you’re not actually hitting that 90% retention rate. Conversely, FSRS could give you shorter intervals, but if that raises your retention rate to 90%, you might end up doing fewer reviews, since you’re not failing and relearning as many cards.
There are also differences in how my friend and I review our cards. I tend to fail fast, with an average review time of 3-4 seconds per card, whereas my friend averages 6-9 seconds per card, taking a bit more time on each card, which may affect our retention rates.
That requires the optimizer to compute an optimal requested retention. In my paper, I implemented it in C++ because it is too complicated and Python is too inefficient for it. I will improve the performance of the code in the future.
Oh that’s interesting. With all the math libraries available, including ones written in CPython I had assumed Python would be more than up to number crunching tasks like this.
Is the Google Colab optimizer for fsrs4anki still behind the capability of your C++ one?
The Google Colab optimizer doesn’t include the module. You can see the module at:
Hi! Just wondering, for those of us doing reviews on AnkiDroid, is the correct approach to do the reviews on the phone and then, once a day, use the “Reschedule Cards” option from the FSRS4Anki Helper add-on?
Would that work?
Yes it works, but they are not fully consistent.
Are the results produced by fsrs.js similar to the outcomes of this Anki build? I have integrated the .js module as part of a macro in a spreadsheet that I am using to run through bulk data.
I am guessing the .js module does not include the features of the optimizer here.
The default parameters generate results similar to the built-in scheduler.
Here is a basic comparison:
Rating sequence: 1,3,3,3,3,3,3,3,3,3
Anki’s intervals: 1,3,8,20,50,125,313,783,1958,4895
FSRS’s intervals: 1,3,8,19,44,98,207,421,822,1550
Rating sequence: 2,3,3,3,3,3,3,3,3,3
Anki’s intervals: 1,3,8,20,50,125,313,783,1958,4895
FSRS’s intervals: 2,5,13,32,73,159,331,659,1263,2338
Rating sequence: 3,3,3,3,3,3,3,3,3,3
Anki’s intervals: 1,3,8,20,50,125,313,783,1958,4895
FSRS’s intervals: 3,8,21,50,114,245,501,983,1854,3377
Rating sequence: 4,3,3,3,3,3,3,3,3,3
Anki’s intervals: 4,10,25,63,158,395,988,2470,6175,15438
FSRS’s intervals: 4,11,29,70,158,337,684,1329,2481,4471
I use 18 different deck configurations.
Should the optimization be done once for all of them (one universal set of parameters) or separately for each of the 18?
If your goals are significantly different in different decks, to optimize parameters for each one is better.