Nice comparison, thanks for putting this together.
Would be really interesting to be able to get SM-18 into the comparison, but obviously not possible since it’s closed source.
It’s not possible to get much hard data on SM-18, but Guillem Palau, who was a heavy Anki user for 11+ years, has mentioned that he ended up with slightly higher retention on SM-18 with roughly half the repetitions of Anki’s SM-2.
This simulator is a great idea. I’ve been using a custom scheduler based on a speculation of how a certain app works, but could never get the retention above 90%. This simulator has helped me work out some quirks. Thanks.
FSRS looks promising. However, it requires the right weights for each study subject. This means more tuning and customization on the user’s side, which has always been a headache for those new to SRS. I would not advise users to go above 20% FI (below 89.62% retention). This has proven to harm learning, leading to problems down the line. It has been written about extensively on the SM blogs.
[quote]
Setting the forgetting index above 20% would be like giving up SuperMemo altogether and coming back to remembering only that what is easy to remember…Nevertheless, if you want to maximize the speed of learning with little control over what actually stays in your memory, set the forgetting index to 20%[/quote]
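For anyone wondering how a 20% FI lines up with a retention figure near 89.6%: a minimal sketch, assuming a simple exponential forgetting curve and a review scheduled exactly when recall drops to 1 - FI. Under those assumptions the average recall over the interval is FI / -ln(1 - FI). The function name is my own, and this is not necessarily how SuperMemo computes it.

```python
import math


def average_retention(forgetting_index: float) -> float:
    """Mean recall probability over one interval.

    Assumes R(t) = exp(-k * t) and a review scheduled at the moment
    R(t) has dropped to 1 - forgetting_index (an illustrative model only).
    """
    if not 0.0 < forgetting_index < 1.0:
        raise ValueError("forgetting_index must be strictly between 0 and 1")
    # Average of exp(-k * t) on [0, T] with exp(-k * T) = 1 - FI
    # simplifies to FI / -ln(1 - FI).
    return -forgetting_index / math.log(1.0 - forgetting_index)


for fi in (0.05, 0.10, 0.20):
    print(f"FI {fi:.0%} -> average retention {average_retention(fi):.2%}")
```

With FI = 20% this comes out at roughly 89.6%, in line with the figure above.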
I think if we had more shared profiles, we could analyze them in depth. Maybe create a tool to wipe the card data and export them for anonymous sharing? I didn’t want to take up the task myself; any brave souls? I’ll leave you with some screenshots comparing different algos; it took forever to generate these. (Seems like I’m limited to 5 images.)
I’ve been using a custom scheduler based on a speculation of how a certain app works, but could never get the retention above 90%
Are you able to share the details of the custom scheduler that you’re referring to here?
These graphs that you shared are interesting. I had never heard of the “kensho” algorithm before (I can’t find anything on Google either). It seems strange that it’s able to maintain a near 95% retention with the fewest repetitions of all the algorithms. Generally, the higher the retention rate, the more reviews you need to do (smaller intervals), so how is this possible?
It only counts the retention on the cards reviewed each day. If an algorithm only schedules reviews for a small set of cards, the retention could be higher. To avoid this kind of cheating, I will add a figure to show the retention on all cards.
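To make the difference concrete, here is a toy sketch of the two metrics. The card records, field names, and numbers are made up for illustration and are not the simulator’s actual data structures.

```python
def retention_on_reviewed(cards):
    """Average recall probability over the cards reviewed today only."""
    reviewed = [c for c in cards if c["reviewed_today"]]
    if not reviewed:
        return None  # nothing was reviewed, so the metric is undefined
    return sum(c["recall_prob"] for c in reviewed) / len(reviewed)


def retention_on_all(cards):
    """Average recall probability over every card in the collection."""
    return sum(c["recall_prob"] for c in cards) / len(cards)


# Hypothetical collection: the scheduler reviews just 2 well-known cards
# and leaves 8 half-forgotten cards unscheduled.
cards = (
    [{"recall_prob": 0.95, "reviewed_today": True}] * 2
    + [{"recall_prob": 0.50, "reviewed_today": False}] * 8
)

print(f"reviewed cards only: {retention_on_reviewed(cards):.2f}")
print(f"all cards:           {retention_on_all(cards):.2f}")
```

In this made-up case the reviewed-only number looks excellent while the whole-collection number is much lower, which is the effect described above.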
Sorry for the misunderstanding. Kensho is a code name for a scheduler I made for personal use. Nothing published. It’s a no-brainer which app I was referring to.
Your link above still points to the old version using Google Colaboratory.
Calling it cheating is a bit rude, don’t you think? The goal is to get to the cause of the problem. It would be nice if we could use a common apkg file for these tests. Can you provide a link to an apkg that I can use? I don’t really trust this collection. It’s full of old scheduling data, so it might be throwing off your simulator.
Full disclaimer: for the images below, I took the changes for “expected_memorization_per_day” and patched them into the old version (v3.3.2) for this test. It would be too much work otherwise.
That’s probably because I was messing around with different startup intervals and forgot to change it back for this test?
The result is less pronounced on a regular user’s collection, so it’s definitely something in my collection that’s giving the algo an advantage. I’ll dig into it later. This is addicting, but also mentally distracting.
Are you interested in making your scheduler algorithm open source for others to see and provide feedback? There’s not going to be much discussion here if all we can see is just the graphs.
To my surprise, I didn’t get the same results you did in your simulation. FSRS did manage to stick to the target retention rate (90%), but it gave so many more repetitions that I’m not sure it was worth it.
Here are the results with the 0.9 retention:
Setting the target retention to 85% did result in better performance from FSRS. But what does this mean? If I’m targeting 90% retention, should I just stick with default Anki?
I played around with the interval modifier until SM-2 was giving a similar retention to FSRS, and FSRS did achieve it in fewer repetitions. Sorry for the confusion!
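For anyone repeating this, a starting point for the interval modifier can be estimated instead of found by trial and error. This is a sketch of the rule of thumb from the Anki manual, log(desired retention) / log(current retention), which assumes an exponential forgetting curve. The function name is my own, and the result is only a first guess to refine in the simulator.

```python
import math


def interval_modifier(current_retention: float, desired_retention: float) -> float:
    """Estimate the interval multiplier that moves observed retention
    to the desired retention, assuming exponential forgetting."""
    for value in (current_retention, desired_retention):
        if not 0.0 < value < 1.0:
            raise ValueError("retention values must be strictly between 0 and 1")
    return math.log(desired_retention) / math.log(current_retention)


# Example: SM-2 currently yields 85% retention and the goal is 90%.
modifier = interval_modifier(current_retention=0.85, desired_retention=0.90)
print(f"suggested interval modifier: {modifier:.0%}")
```

A value below 100% shortens intervals (more reviews, higher retention), and a value above 100% lengthens them.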