Pass/Fail Grading as Default

Actually now that I think about it, this data only shows us that using Easy/Hard too much might be bad. Moderate use of those buttons still might have a positive effect on scheduling so dae maybe isn’t wrong.

I’m not sure what makes you say so. The graph shows that RMSE is lower for two-button users at almost every threshold, from 2.5% Hard+Easy all the way to 42.5% Hard+Easy.

Oh sorry, I didn’t look at the graph. I just read the text you posted here. If what you say is true, then Again/Good being the default makes the most sense to me. (Funny how SuperMemo, after years of research, uses 6 buttons and we’re talking about 2 buttons.)

Edit: The meta once was to use 2 buttons because of ease hell, then it became “you can use 4 buttons because ease hell is solved”, and now we’re back to 2 buttons.

Recent versions of the SuperMemo algorithm have reduced the weight placed on user grades to combat user bias. Grades within pass or fail (3 each) are still taken into account, but other factors have a greater impact, such as the priority set during review.

If I’m not wrong, the graph is showing that the less Hard+Easy is used, the lower the RMSE is? Dang, that’s crazy.

Sorry, but I think I don’t totally understand the graph. What exactly is meant by threshold here? Percentage of hard/easy usage? Does it apply only to 4-button users then?

I gave an example. Here’s a step-by-step explanation:

  1. Calculate how often the user uses Hard, in %
  2. Calculate how often the user uses Easy, in %
  3. Add them together
  4. If the sum exceeds the threshold, put the user into the “four button users” category, else put him into the “two button users” category
  5. Repeat steps 1-4 for many different values of the threshold, to get the full picture
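
The steps above can be sketched in Python roughly as follows. This is only an illustration, not the code used for the actual analysis: the per-user records, the field names (`hard_pct`, `easy_pct`, `rmse`) and the choice to summarize each group by its mean RMSE are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical per-user records (made-up numbers, for illustration only):
# share of Hard and Easy presses in %, plus that user's RMSE.
users = [
    {"hard_pct": 1.0, "easy_pct": 1.0, "rmse": 0.040},
    {"hard_pct": 5.0, "easy_pct": 10.0, "rmse": 0.055},
    {"hard_pct": 20.0, "easy_pct": 15.0, "rmse": 0.060},
    {"hard_pct": 0.5, "easy_pct": 0.0, "rmse": 0.035},
]


def split_by_threshold(users, threshold_pct):
    """Steps 1-4: sum Hard% and Easy%, then compare the sum to the threshold."""
    two_button, four_button = [], []
    for user in users:
        total = user["hard_pct"] + user["easy_pct"]
        if total > threshold_pct:
            four_button.append(user)
        else:
            two_button.append(user)
    return two_button, four_button


def sweep(users, thresholds):
    """Step 5: repeat the split for many thresholds and summarize each group."""
    results = []
    for threshold in thresholds:
        two, four = split_by_threshold(users, threshold)
        results.append({
            "threshold": threshold,
            # Assumption: each group is summarized by its mean RMSE.
            "two_rmse": mean(u["rmse"] for u in two) if two else None,
            "four_rmse": mean(u["rmse"] for u in four) if four else None,
        })
    return results


for row in sweep(users, [2.5, 12.5, 22.5, 32.5, 42.5]):
    print(row)
```

Note that the same user can land in different groups depending on the threshold, which is why the comparison is repeated across the whole range.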

You guys are missing the point I made before. You cannot “put” people into 4-button vs. 2-button groups based on thresholds of use. These groups are systematically different, so the conclusion that 2 buttons are more accurate than 4 buttons does not hold. You would have to take users who are in the 4-button group and randomize them into 2 groups: one that is forced to use 2 buttons and one that continues to have 4 buttons.

This is similar to what was done in SM when there was a change in buttons.

The current analysis will miss a degradation in the performance of the predictive algorithm because of differences between 4-button and 2-button users, which include not only experience (more advanced users) but also the complexity of the material they are using, etc.
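
A minimal sketch of the random assignment described above, assuming a list of user IDs who already use all 4 buttons. The function name and arm labels are hypothetical, and this covers only the assignment step, not the study itself.

```python
import random


def randomize_into_arms(user_ids, rng):
    """Shuffle existing 4-button users and split them into two equal-sized arms."""
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "forced_two_buttons": shuffled[:half],  # may only press Again/Good
        "keep_four_buttons": shuffled[half:],   # control: behaviour unchanged
    }


# A seeded generator makes the assignment reproducible.
arms = randomize_into_arms(range(10), random.Random(0))
print(arms)
```

Because the assignment is random, differences such as experience or material difficulty should be spread roughly evenly across both arms, which is what a threshold-based split cannot guarantee.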

I’m not sure what you mean by “priority set during review”, I believe the priority set for items has no impact on scheduling intervals.

Because that analysis was done correctly (and it now uses 5 buttons)

You would have to take users who are in the 4-button group, and randomize them into 2 groups, 1 that is forced to use 2 buttons and one that continues to have 4 buttons.

Well, I cannot do that, but I could take some collections, randomly replace some Hard and Easy with Good, and see whether it will degrade the performance of FSRS. Spoiler: yes, it will.
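
A rough sketch of the replacement step, assuming a review log is just a list of Anki rating values (1 = Again, 2 = Hard, 3 = Good, 4 = Easy). The function name and the `fraction` parameter are made up for illustration, and the sketch only rewrites the logged ratings. It leaves review dates untouched and does not run FSRS.

```python
import random

AGAIN, HARD, GOOD, EASY = 1, 2, 3, 4  # Anki's rating values


def collapse_some_ratings(ratings, fraction, rng):
    """Replace each Hard/Easy rating with Good with probability `fraction`."""
    out = []
    for rating in ratings:
        if rating in (HARD, EASY) and rng.random() < fraction:
            out.append(GOOD)
        else:
            out.append(rating)  # Again and Good are never changed
    return out


log = [GOOD, HARD, EASY, AGAIN, GOOD, EASY]
print(collapse_some_ratings(log, 0.5, random.Random(0)))
```

The modified log would then be fed to the optimizer and evaluated the same way as the original one, so the two RMSE values can be compared.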

I mean the priority of the element at the time you do the review (a repetition, in SM jargon). Priorities are dynamic, and the element’s priority will change slightly as soon as the user grades the recall.

It does. Just to name a few examples: a very low priority item will get longer intervals, either directly or indirectly due to a higher A-factor increase. Lapsing a very high priority item will give it a shorter next interval than lapsing a low priority item (the difference can be days or several weeks), etc.

Just for the record, SuperMemo has 5 buttons on the interface, but rating 0 (null recall) can still be entered via a keyboard shortcut, so the latest algorithm, SM-18, still has 6 grade options.

The button for grade 0 was removed to discourage users from overusing it.

I had no idea there was a hidden button! My guess is that it is practically not known by anyone and therefore rarely used.

In Anki, it would not make much difference if the default view showed only 2 buttons and one could turn on 4 buttons once in the settings and never have to worry about it again. But I suspect that people not using Anki is not due to 2 vs. 4 buttons. Many changes have been suggested that supposedly affect user retention, and I doubt they would have much effect even if all of them were implemented.

So what might that mean? Also, Hard is often pressed instead of Again, so there’s that.

Okay, but it still shows that the average 4-button user would only benefit from shifting to 2-button use. Or are you claiming that people who use 4 buttons will continue to see a higher RMSE?

It does not show this. That’s a misinterpretation of the data.

Okay, so what’s causing it? You mentioned the difficulty of the material, but what does that have to do with it?

Hey @Expertium, I saw someone quoting this from the FSRS wiki’s FAQ. Shouldn’t this be changed?

A12: Yes. FSRS is about equally accurate for people who rarely use “Hard” and “Easy” and for people who use all 4 buttons a lot. However, this is not the final conclusion, and as we gather more data, this conclusion may change.

A supplement to the current suggestion:

This correlation appeared to be weak due to the fact that all users tend to deploy their own grading systems, which is often inconsistent.

In that light, two grade systems would have the exact same effect on the algorithm as the six grade system.

Grade-retrievability correlations are also collected, however, their weight is negligible.

Source: First data-driven spaced repetition algorithm: Algorithm SM-8 - supermemo.guru

For anyone curious, if Anki works best for pass/fail with grades 1 (again) and 3 (good), in SuperMemo the equivalent is 2 (fail) and 4 (good).

Here I explain it in more detail (time stamped) https://youtu.be/P22ig_erHoE?t=622

I don’t have enough familiarity to know which buttons to replace with which (@sorata, the idea is to compare the result of that with the 2-button users), but is this even possible? Because changing a grade affects when the card would next be displayed, which is no longer in the dataset, so one doesn’t know which grade the user would have selected on that day since it didn’t take place.
In other words, any changing of grades affects future dates, which then affects the next date, etc.

Why are you pinging me, though? I understand what’s being attempted and what issues the other person raises. The question is: “What if the 4-button users start using 2 buttons? Would the RMSE improve or not?”

One very flawed way of doing that might be actually asking people to change their behaviour, preferably in the same deck they’ve been using before, then compare before/after results. But we can’t possibly do any of that. It would also be nice to have a control group and make everyone learn similar material, etc etc.

What the other person suggested doesn’t work for the reasons you stated.