Pass/Fail Grading as Default

Oh true. You’re right. So probably “better” knowledge acquisition rate.

Edit: Haha this is the 100th reply under this topic.

Edit2: Actually, reaction times will probably be much lower for 2 button users than for 4 button users.

@dae sorry for pinging you again and again, but what do you think of this suggestion? This thread has gotten very long, so let me quote your initial argument against it,

but this might not be true anymore.

Hey @Expertium, if DASH doesn’t differentiate between Hard/Good/Easy, how is RMSE different for 2 button users and 4 button users? Am I missing something?

That’s a good question. I guess because the intervals of 4 button users are more heterogeneous.


I think I forgot the fact that you were doing the analysis on SM2 users’ data. Thus the confusion :sweat_smile:

Also, if you have time, and you don’t think this is redundant, can you actually find the answer to this question from L.M.Sherlock,

Do 2-button users spend less time to remember more cards than 4-button users?

Preferably for users with at least a year of data, or two years like last time, albeit 6K users would be a very small sample size. Let me know if that would be possible for you.

I’m not sure how to measure it

Can you calculate the total time spent?

Isn’t the knowledge:workload ratio a viable way of doing that? AFAIK knowledge is just total stuff learnt and workload is total time spent. The average 2-button user will have a higher ratio than the average 4-button user if 2 buttons actually are better.

Edit: It was workload:knowledge.
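For what it’s worth, here is a minimal sketch of the ratio being discussed. Everything in it is an assumption made for illustration: the `UserStats` container is hypothetical, and “knowledge” is taken to be something like the sum of predicted recall probabilities over a user’s cards, which is only one possible definition. With workload on top (workload:knowledge), a lower value means less time spent per unit of knowledge.

```python
from dataclasses import dataclass


@dataclass
class UserStats:
    """Hypothetical per-user summary; not an Anki or FSRS data structure."""
    seconds_spent: float  # workload: total time spent reviewing
    knowledge: float      # assumed: e.g. sum of predicted recall probabilities


def workload_per_knowledge(user: UserStats) -> float:
    """Seconds of review time per unit of knowledge (lower is better)."""
    if user.knowledge <= 0:
        raise ValueError("knowledge must be positive")
    return user.seconds_spent / user.knowledge


def mean_ratio(users: list[UserStats]) -> float:
    """Average workload:knowledge ratio over a group of users."""
    if not users:
        raise ValueError("need at least one user")
    return sum(workload_per_knowledge(u) for u in users) / len(users)
```

Comparing `mean_ratio` for the two groups would be the simplest version of the comparison, though it ignores differences in collection age and content.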

Yes, but that would be difficult to do. @L.M.Sherlock help would be appreciated

Don’t cards just introduced have smaller workload than old cards?

2-button users could have lots of 4444 cards technically indistinguishable from 2222 cards.
And then they’ll switch between the two modes, and FSRS will be confused about Hard and Easy?

Yes. We will be looking at total workload though.

2=Hard? 4=Easy? But even then I don’t get what you are trying to say. Can you rephrase, sorry?

A card always rated Easy will be reviewed maybe 4 times. A card always rated Hard will be reviewed much more. With two buttons, the only difference will be a Fail or two.
In a way, an extremely easy card is a waste of time, and rating it Easy removes it. But can you trust yourself to decide to remove an overly easy card?
Although an easy card will probably be rated Good more often.

https://docs.ankiweb.net/studying.html?highlight=easiest#review-cards

Because ‘Easy’ rapidly increases the delay, it’s best used for only the easiest of cards. Usually you should find yourself answering ‘Good’ instead.

It seems to me that the Easy button adds less decision and Undo overhead than the Hard button, because I don’t think about pressing it often, unless the card is annoyingly easy. Otherwise I think of using it after I encountered the knowledge outside the review process.

What you said initially describes 4 button users. 2-button users refer to people who use Hard+Easy buttons less than a certain threshold. That was the original definition.

The reaction times are also similar. Easy has the lowest average reaction time, followed by Good, then Hard. Hard is hard-to-press it seems.
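A rough sketch of how the per-button averages could be computed, assuming the reviews are available as `(ease, time_ms)` pairs like the `ease` and `time` columns of Anki’s `revlog` table. The function name is made up for this example.

```python
from collections import defaultdict

# Anki's rating codes in the review log
BUTTON_NAMES = {1: "Again", 2: "Hard", 3: "Good", 4: "Easy"}


def mean_time_by_button(reviews):
    """Return the mean answer time in ms for each button that was pressed.

    `reviews` is an iterable of (ease, time_ms) pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ease, time_ms in reviews:
        totals[ease] += time_ms
        counts[ease] += 1
    return {ease: totals[ease] / counts[ease] for ease in counts}
```

The keys of the result can be mapped through `BUTTON_NAMES` for display.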

I actually expect the average reaction time for 4 button users to be higher than for 2 button users, because they’re dithering about which one among Easy/Good/Hard to use while our 2-button user has already moved on.

As far as I’m aware, the graphs above are only attempting to answer the question “Is FSRS less able to accurately predict the recall of a 4 button user?”. That’s only part of the puzzle, and what would be more interesting to show is whether hinting to the scheduler that some cards are hard or easy is worth the downsides of slower reviews, higher RMSE, etc.


If it doesn’t make the scheduler predict retrievability better, then I’m not sure Hard and Easy are doing anything beneficial here. I am only countering what you said - that it would hamper the scheduler’s performance. Is there anything else a scheduler does other than showing you the cards at the correct time (in this case, when R falls to a particular value)? Also to quote you,

This is from 2020 by the way.

In any case, I saw L.M.Sherlock creating a repo called Anki button usage so he might be working on something here.

(@Expertium I think I misquoted but here you go. I wanted to reply to this)


https://forms.gle/FB8iZuq36fWg9WULA
Hey everyone, here’s a survey. Depending on your answers, you may be asked to upload your Anki collection. Don’t worry if you’ve never done that before, the survey has a simple guide with extra steps for users who are concerned about privacy.
This is important, so I’d love to get as many respondents as possible.


Can you disable collecting names and emails?


Nope, it has to be activated to collect files.


Last time I did the following: I took data from the FSRS Anki 20k dataset, the largest publicly available dataset with spaced repetition data. Then I put people either in the “four button users” category or in the “two button users” category, based on how often they use Hard and Easy. I did this for many different thresholds, in other words, I varied what exactly counts as “using Hard and Easy a lot”.
Turns out, FSRS is more accurate for 2 button users.

While that analysis wasn’t bad, it had 2 caveats:

  1. It’s hard to use 2 buttons inconsistently, but much easier to use 4 buttons inconsistently. What about 2 button users vs consistent 4 button users? Would the conclusion be different?
  2. What if the choice of metric (log loss or RMSE) affects the conclusion?

In order to address 1, I made this survey: Button usage

The important part is this:

Consistent 4 button users were asked to submit their collections.

Initially, I was planning to get data from both 2 button users and 4 button users, but I didn’t get enough data from 2-button users, so I’ll just have to get it from FSRS Anki 20k. Anyone who uses Good+Again >95% of the time (and therefore uses Hard+Easy <5% of the time) counts as a 2 button user. As for 4 button users, I got 45 collections from my survey. 5 weren’t usable because the user didn’t select “Support older Anki versions” when exporting the collection. So that’s 40 in total; but 1 wasn’t processed by the optimizer for some reason, maybe due to a small number of reviews. That left me with 39 collections.
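To make the 2 button definition concrete, here is a minimal sketch of the classification rule. It assumes each user’s reviews are given as a list of Anki rating codes (1 = Again, 2 = Hard, 3 = Good, 4 = Easy); the function names are invented for this example and are not taken from the actual analysis script.

```python
from collections import Counter

AGAIN, HARD, GOOD, EASY = 1, 2, 3, 4  # Anki rating codes


def again_good_share(ratings):
    """Fraction of reviews answered with Again or Good."""
    counts = Counter(ratings)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("user has no reviews")
    return (counts[AGAIN] + counts[GOOD]) / total


def is_two_button_user(ratings, threshold=0.95):
    """True if Again+Good make up more than `threshold` of all reviews."""
    return again_good_share(ratings) > threshold
```

Varying `threshold` gives the kind of sweep over cutoffs used in the earlier analysis; 0.95 corresponds to the rule stated above.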

So I ran FSRS-4.5 on 3035 collections of 2 button users and on 39 collections of consistent 4 button users, and recorded the values of RMSE and log loss. The table shows their average values.
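For readers unfamiliar with the two metrics, here is a plain per-review sketch of both. It assumes `y_true` holds review outcomes (1 = recalled, 0 = forgotten) and `p_pred` holds the predicted probability of recall. The optimizer may compute RMSE over bins of reviews rather than per review; that variant is not reproduced here.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy between outcomes and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def rmse(y_true, p_pred):
    """Root mean square error between outcomes and predicted probabilities."""
    squared = [(y - p) ** 2 for y, p in zip(y_true, p_pred)]
    return math.sqrt(sum(squared) / len(squared))
```

For both metrics, lower is better, and the per-user values would then be averaged within each group.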

The difference is not statistically significant for either metric, though I suspect that this is due to the small sample size of consistent 4 button users. Of course, it would be better if I had hundreds or thousands of collections from consistent 4 button users, but I can’t do much better than that. Surveying people on r/Anki and on Discord can only get me so far.
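The post doesn’t say which significance test was used. As one possible illustration, here is a sketch of a two-sided permutation test on the difference in group means, which doesn’t need equal group sizes. The function name and defaults are assumptions for this example.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the estimated p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random reassignment of users to groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # +1 in numerator and denominator keeps the estimate away from exactly 0
    return (hits + 1) / (n_resamples + 1)
```

Here `a` and `b` would be the per-user metric values (RMSE or log loss) of the two groups. With only 39 users in one group, a large p-value mostly reflects low statistical power, which matches the caveat above.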

TLDR: if you are wondering “Should I change my opinion more towards “4 buttons are better than 2” or towards the opposite?”, the answer is neither. The results are inconclusive.


The OP started this thread by saying mainly two things:

four-tier grading is anything but intuitive and means you have to make up personal rules on how to grade your memory that you may or may not follow consistently

makes your reviews take more time to judge

(emphasis mine)

Dae had a minor objection which went along the lines of:

I suspect it would somewhat hamper the scheduler’s performance, FSRS or no.

This was further corroborated by LMSherlock. “Using hard and easy could help FSRS schedule your cards more accurately.”

That was when the point about how FSRS performs for 2 button users vs 4 button users was brought up.

Now the point is, when I look at the survey it seems a third of all 4 button participants are sure of their inconsistency! And some do not even know whether they are inconsistent or not. This was exactly the point brought up by trashmoder.

There have been some doubts about “How can adding 2 extra buttons make scheduling worse!” Well, this is how.

But in any case, this discussion has gone around in circles and now we are for some reason looking at “consistent 4 button use” vs “consistent 2 button use”. If people are not consistent then it doesn’t matter what results we get. It is of no use. Dae says,

I have no idea what the other part of the puzzle is.

To be clear, the idea isn’t to force 2 buttons on everyone. It is to make sure that Anki users who are new, or who don’t take much interest in grading, don’t have to deal with the ambiguity of 4 button grading. Let the default be Again + Good.
