@sorata @nmjkjm @suiyuan @Keks @L.M.Sherlock @dae I re-did the analysis with new RMSE values, here: FSRS-4.5 - Album on Imgur
The conclusion is the same, but I also added another interesting comparison.
EDIT: give it some time, for some reason imgur keeps cutting off my text.
EDIT 2: ok, I don’t know what’s happening, imgur text is just broken.
Text:
All users were put either in the “two buttons group” or in the “four buttons group”. If the share of reviews where the user pressed Hard plus the share where the user pressed Easy exceeded the threshold, the user was put in the “four buttons group”, otherwise in the “two buttons group”.
Example: a user pressed Hard 5% of the time and Easy 10% of the time. The threshold is 12%. 0.05+0.1 > 0.12, hence this user belongs in the “four buttons group”.
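The grouping rule can be written as a minimal Python sketch. The function name `button_group` and its parameter names are made up for illustration and are not taken from the actual analysis code.

```python
def button_group(hard_frac: float, easy_frac: float, threshold: float) -> str:
    """Assign a user to a group based on how often they press Hard and Easy.

    hard_frac and easy_frac are fractions of all reviews (0.05 means 5%).
    The names here are hypothetical; only the rule itself comes from the post.
    """
    if hard_frac + easy_frac > threshold:
        return "four buttons"
    return "two buttons"


# The example from the post: Hard 5%, Easy 10%, threshold 12%.
print(button_group(0.05, 0.10, 0.12))
```

With the numbers from the example, the sum of 0.15 exceeds 0.12, so the user lands in the “four buttons group”.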
Then I tried many different thresholds (x axis) and plotted the RMSE of both groups. The green area indicates statistical significance: where the curves are in the green area, the difference between them is unlikely to be a fluke (p-value<0.01). Where the curves are in the white area, the difference might be a fluke.
FSRS is more accurate for users who only use two buttons (lower RMSE = better).
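A rough sketch of the threshold sweep in Python, under a few assumptions: each user is a `(hard_frac, easy_frac, rmse)` tuple, and each group's curve is the plain mean of its users' RMSE values. Neither the data layout nor the averaging method is stated in the post, and the numbers below are made up. The significance test is left out because the post does not say which test was used.

```python
from statistics import mean


def sweep(users, thresholds):
    """For each threshold, split users into two groups and average their RMSE.

    users: list of (hard_frac, easy_frac, rmse) tuples (hypothetical layout).
    Returns a list of (threshold, two_button_mean, four_button_mean);
    a mean is None when the group is empty at that threshold.
    """
    rows = []
    for t in thresholds:
        four = [rmse for hard, easy, rmse in users if hard + easy > t]
        two = [rmse for hard, easy, rmse in users if hard + easy <= t]
        rows.append((t, mean(two) if two else None, mean(four) if four else None))
    return rows


# Made-up users, for illustration only.
users = [(0.00, 0.00, 0.04), (0.05, 0.10, 0.06), (0.20, 0.20, 0.08)]
for row in sweep(users, [0.05, 0.12, 0.30]):
    print(row)
```

Each returned row corresponds to one x-axis position on the plot, with one y value per group.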
I also put users into 3 different groups: those who use Again and Hard, those who use Again and Good, and those who use Again and Easy. To be in a group, a user had to use that pair of buttons 95% (or more) of the time and the other two buttons <=5% of the time.
The difference was statistically significant (p-value<0.01) for Again+Hard vs Again+Good and for Again+Easy vs Again+Good, but not for Again+Hard vs Again+Easy.
Oh, and of course most users were not included in any of those groups.
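A minimal Python sketch of that three-way grouping. It assumes the 95% applies to the two buttons combined (so the other two buttons together make up the remaining <=5%), which is one reading of the rule; the function and parameter names are made up.

```python
def pair_group(again, hard, good, easy, cutoff=0.95):
    """Return the user's button-pair group, or None if they fit no group.

    Arguments are fractions of all reviews per button and should sum to 1.
    Assumption: Again plus the user's most-used passing button must reach
    the cutoff together.
    """
    # Pick the passing button (Hard, Good or Easy) the user presses most.
    name, frac = max(
        (("Again+Hard", hard), ("Again+Good", good), ("Again+Easy", easy)),
        key=lambda pair: pair[1],
    )
    return name if again + frac >= cutoff else None


print(pair_group(0.10, 0.01, 0.87, 0.02))  # Again + Good cover 97% of presses
print(pair_group(0.10, 0.20, 0.50, 0.20))  # no pair reaches 95%
```

Users for whom no pair reaches the cutoff get `None`, which matches the remark that most users were not in any of the three groups.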
EDIT 3: I made a Reddit post: Reddit - Dive into anything
EDIT 4: screw imgur, here are the images