Pass/Fail Grading as Default

Last time, I took data from the FSRS Anki 20k dataset, the largest publicly available dataset with spaced repetition data. Then I put people in either the “four button users” category or the “two button users” category, based on how often they use Hard and Easy. I did this for many different thresholds; in other words, I varied what exactly counts as “using Hard and Easy a lot”.
Turns out, FSRS is more accurate for 2 button users.

While that analysis wasn’t bad, it had 2 caveats:

  1. It’s hard to use 2 buttons inconsistently, but much easier to use 4 buttons inconsistently. What about 2 button users vs consistent 4 button users? Would the conclusion be different?
  2. What if the choice of metric (log loss or RMSE) affects the conclusion?

In order to address 1, I made this survey: Button usage

The important part is this:

Consistent 4 button users were asked to submit their collections.

Initially, I was planning to get data from both 2 button users and 4 button users, but I didn’t get enough data from 2-button users, so I’ll just have to get it from FSRS Anki 20k. Anyone who uses Good+Again >95% of the time (and therefore uses Hard+Easy <5% of the time) counts as a 2 button user. As for 4 button users, I got 45 collections from my survey. 5 weren’t usable because the user didn’t select “Support older Anki versions” when exporting the collection. So that’s 40 in total; but 1 wasn’t processed by the optimizer for some reason, maybe due to a small number of reviews. That left me with 39 collections.
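
A minimal sketch of the 2 button rule above, for illustration only (this is not the actual analysis script). It assumes each user's reviews are available as a list of Anki ratings, where 1 = Again, 2 = Hard, 3 = Good, 4 = Easy. The function names are hypothetical.

```python
from collections import Counter

# Anki rating codes: 1 = Again, 2 = Hard, 3 = Good, 4 = Easy.
AGAIN, HARD, GOOD, EASY = 1, 2, 3, 4


def good_again_share(ratings):
    """Return the fraction of reviews answered with Again or Good."""
    if not ratings:
        raise ValueError("no reviews to classify")
    counts = Counter(ratings)
    return (counts[AGAIN] + counts[GOOD]) / len(ratings)


def is_two_button_user(ratings, threshold=0.95):
    """True if Again+Good make up more than `threshold` of all reviews."""
    return good_again_share(ratings) > threshold


# Hypothetical user: 98 of 100 reviews are Again or Good.
example = [GOOD] * 96 + [AGAIN] * 2 + [HARD] + [EASY]
print(is_two_button_user(example))
```

Changing `threshold` is one way to vary what counts as “using Hard and Easy a lot”.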

So I ran FSRS-4.5 on 3035 collections of 2 button users and on 39 collections of consistent 4 button users, and recorded the values of RMSE and log loss. The table shows their average values.
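
For readers unfamiliar with the two metrics, here is a rough sketch of how they can be computed for one collection, given predicted recall probabilities and actual outcomes (1 = recalled, 0 = forgotten). This is a simplification: it uses plain per-review RMSE, whereas the FSRS optimizer, as far as I know, reports a binned variant. The function names are hypothetical.

```python
import math


def log_loss(predictions, outcomes, eps=1e-15):
    """Mean binary cross-entropy between predicted probabilities and 0/1 outcomes."""
    total = 0.0
    for p, y in zip(predictions, outcomes):
        # Clamp to avoid log(0) when a prediction is exactly 0 or 1.
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(predictions)


def rmse(predictions, outcomes):
    """Plain root mean square error (not the binned variant)."""
    squared = [(p - y) ** 2 for p, y in zip(predictions, outcomes)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical predictions for four reviews.
preds = [0.9, 0.8, 0.95, 0.6]
actual = [1, 1, 1, 0]
print(log_loss(preds, actual), rmse(preds, actual))
```

Lower is better for both. Averaging each metric over all collections in a group gives one number per group and per metric.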

The difference is not statistically significant for either metric, though I suspect that this is due to the small sample size of consistent 4 button users. Of course, it would be better if I had hundreds or thousands of collections from consistent 4 button users, but I can’t do much better than this. Surveying people on r/Anki and on Discord can only get me so far.
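
One way to check a difference like this is a permutation test on the per-collection metric values. The sketch below is an assumption about how it could be done, not necessarily the test used here; `permutation_test` is a hypothetical helper and the numbers in the example are made up.

```python
import random


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an estimated p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed:
            hits += 1
    # The +1 terms keep the estimate from ever being exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up log loss values for two small groups of collections.
group_a = [0.31, 0.35, 0.29, 0.40, 0.33]
group_b = [0.34, 0.30, 0.38, 0.36]
print(permutation_test(group_a, group_b, n_permutations=2000))
```

With groups as unequal as 3035 and 39, the smaller group limits how small a difference any such test can detect.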

TLDR: if you are wondering “Should I shift my opinion towards ‘4 buttons are better than 2’ or towards the opposite?”, the answer is neither. The results are inconclusive.
