Pass/Fail Grading as Default

Expertium · August 1, 2024, 5:57pm

Last time I did the following: I took data from the FSRS Anki 20k dataset, the largest publicly available dataset with spaced repetition data. Then I put people either in the “four button users” category or in the “two button users” category, based on how often they use Hard and Easy. I did this for many different thresholds; in other words, I varied what exactly counts as “using Hard and Easy a lot”.
Turns out, FSRS is more accurate for 2 button users.

While that analysis wasn’t bad, it had 2 caveats:

1. It’s hard to use 2 buttons inconsistently, but much easier to use 4 buttons inconsistently. What about 2 button users vs consistent 4 button users? Would the conclusion be different?
2. What if the choice of metric (log loss or RMSE) affects the conclusion?

In order to address 1, I made this survey: Button usage

The important part is this:

Consistent 4 button users were asked to submit their collections.

Initially, I was planning to get data from both 2 button users and 4 button users, but I didn’t get enough data from 2-button users, so I’ll just have to get it from FSRS Anki 20k. Anyone who uses Good+Again >95% of the time (and therefore uses Hard+Easy <5% of the time) counts as a 2 button user. As for 4 button users, I got 45 collections from my survey. 5 weren’t usable because the user didn’t select “Support older Anki versions” when exporting the collection. So that’s 40 in total; but 1 wasn’t processed by the optimizer for some reason, maybe due to a small number of reviews. That left me with 39 collections.
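The 95% rule above can be sketched in a few lines. This is only an illustration, not the script used for the analysis: the function names are hypothetical, and it assumes each collection has been reduced to a list of ratings coded 1 = Again, 2 = Hard, 3 = Good, 4 = Easy.

```python
from collections import Counter

# Assumed rating codes: 1 = Again, 2 = Hard, 3 = Good, 4 = Easy.
AGAIN, HARD, GOOD, EASY = 1, 2, 3, 4


def good_again_share(ratings):
    """Return the fraction of reviews answered with Again or Good."""
    counts = Counter(ratings)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("collection has no reviews")
    return (counts[AGAIN] + counts[GOOD]) / total


def is_two_button_user(ratings, threshold=0.95):
    """True if Good+Again make up more than `threshold` of all reviews."""
    return good_again_share(ratings) > threshold


# Hypothetical collection: 97 Good, 2 Again, 1 Hard -> 99% Good+Again.
example = [GOOD] * 97 + [AGAIN] * 2 + [HARD]
print(is_two_button_user(example))
```

The threshold is a parameter so that the same function can be reused with other cutoffs, as in the earlier analysis.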

So I ran FSRS-4.5 on 3035 collections of 2 button users and on 39 collections of consistent 4 button users, and recorded the values of RMSE and log loss. The table shows their average values.
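For readers unfamiliar with the two metrics, here is a minimal sketch of how they can be computed from predicted recall probabilities and actual outcomes (1 = recalled, 0 = forgotten). It shows the plain per-review definitions; the exact RMSE variant reported by the optimizer may differ (for example, it may be computed over bins of predictions), so treat this as an illustration under that assumption. The function names are hypothetical.

```python
import math


def log_loss(predicted, recalled, eps=1e-15):
    """Mean binary cross-entropy between predictions and 0/1 outcomes."""
    total = 0.0
    for p, y in zip(predicted, recalled):
        # Clip to avoid log(0) when a prediction is exactly 0 or 1.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(predicted)


def rmse(predicted, recalled):
    """Root mean square error between predictions and 0/1 outcomes."""
    squared = [(p - y) ** 2 for p, y in zip(predicted, recalled)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical predictions for four reviews and what actually happened.
predicted = [0.9, 0.8, 0.95, 0.6]
recalled = [1, 1, 1, 0]
print(log_loss(predicted, recalled), rmse(predicted, recalled))
```

Lower is better for both. Log loss punishes confident wrong predictions much more heavily than RMSE does, which is one reason the two metrics can disagree.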

The difference is not statistically significant for either metric, though I suspect that this is due to the small sample size of consistent 4 button users. Of course, it would be better if I had hundreds or thousands of collections from consistent 4 button users, but I can’t do much better than that. Surveying people on r/Anki and on Discord can only get me so far.
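One way to check a difference like this when the groups are very unequal in size is a permutation test on the difference of means. The sketch below is an assumption about how such a check could be done, not necessarily the test used here; the function name and the sample values are made up for illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random, keeping the group sizes fixed.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up per-collection log loss values, for illustration only.
two_button = [0.30, 0.35, 0.28, 0.40, 0.33, 0.31, 0.37, 0.29]
four_button = [0.34, 0.38, 0.32]
print(permutation_p_value(two_button, four_button, n_permutations=2000))
```

With only a few dozen collections in the smaller group, a test like this has little power, so a non-significant result does not show that the two groups are equally well predicted.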

TLDR: if you are wondering “Should I change my opinion more towards ‘4 buttons are better than 2’ or towards the opposite?”, the answer is neither. The results are inconclusive.
