FSRS >= 5 is a significant regression for the minority of users who press Again a lot on new and learning cards

You mean Anki learning steps? If they are short, like 1h 1d, maybe they work well with FSRS

I mean “Learn” vs “Review” (and also “Relearn”, I forgot that’s a thing)

3 Likes

Would it be a problem if we let Anki handle only learning and relearning cards with short steps, and leave review cards to FSRS?

It will make FSRS’s predictions worse; we have known that since around FSRS-5.

2 Likes

Yes, looking only at the duration between reviews is much better.

2 Likes

Thanks for letting me know; sorry for suggesting this method.
Then we can go with an automatic reset when the card’s state changes from learning to review; it will work well enough and is easy to do.

Making learn/review/relearn an input feature (like interval lengths and grades are right now) for FSRS is one thing. Not letting FSRS see the card’s previous review history is…look man, just trust me when I say this, it’s not a good idea.

Okay, then let’s wait until we get a novel method.
Thanks for your hard work.

2 Likes

Okay, it’s great that your analyses would flag this case. I didn’t know it was so rare here. In LLMs, this is the case for most parameters, since they also store information about small towns, for example, that is only ever needed by an absolutely tiny number of users who will ever ask about these towns.

I mean, realistically, I won’t find myself in that situation. If changing something doesn’t affect metrics for 99% of users and makes metrics better for 1% of users, it will look like a reduction of logloss on average, so I will implement it anyway without having to think too hard.
Oversimplified: let’s say for users 1 and 2 logloss doesn’t change after I added a new formula to FSRS, and for user 3 it changes by -0.003. Then the average is
(0 + 0 - 0.003)/3 = -0.001, so I will be like “alright, let’s keep that new formula”.
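The toy arithmetic above, as a quick sketch (the per-user deltas are the hypothetical numbers from the example, not real benchmark output):

```python
# Hypothetical per-user changes in logloss after adding a new formula:
# users 1 and 2 are unaffected, user 3 improves by 0.003.
deltas = [0.0, 0.0, -0.003]

# The average delta across users decides whether the change is kept.
avg_delta = sum(deltas) / len(deltas)  # ≈ -0.001, so the formula stays
```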

Btw, it’s possible that some change has a very small or 0 impact on the average, but it improves worst-case performance for, say, 1% or 0.5% of users with the highest logloss and makes performance worse for top users with the lowest logloss, so on average it cancels out. So it makes the entire distribution more narrow. This is not something that I test right now, and idk if that happens often. I think no, it probably doesn’t happen very often. It would require performance on above-average and below-average users to cancel out in a very precise way.
It’s pure speculation at this point, but making the distribution of logloss more narrow could be a good thing. It means FSRS would be more consistent. Winners win less, but losers lose less, so to speak.

4 Likes

In any case, thanks again so much, expertium, for all the work that goes into all of this.

2 Likes

I plotted the distribution of logloss for different FSRS versions
…it looked better in my head


Vertical lines are averages. Also, the distributions become narrower:

version=FSRS v1, stand. dev.=0.3122, IQR=0.3333
version=FSRS v2, stand. dev.=0.2803, IQR=0.3184
version=FSRS v3, stand. dev.=0.2632, IQR=0.3024
version=FSRS v4, stand. dev.=0.1852, IQR=0.2587
version=FSRS-4.5, stand. dev.=0.1757, IQR=0.2496
version=FSRS-5, stand. dev.=0.1717, IQR=0.2432
version=FSRS-6-recency, stand. dev.=0.1631, IQR=0.2366
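For reference, the two spread measures listed above can be computed from per-user loglosses along these lines — a minimal sketch on synthetic data, not the actual benchmark code:

```python
import numpy as np

# Synthetic, right-skewed stand-in for one version's per-user loglosses.
rng = np.random.default_rng(0)
loglosses = rng.gamma(shape=4.0, scale=0.09, size=2000)

# Standard deviation and interquartile range of the distribution.
std = loglosses.std()
q1, q3 = np.percentile(loglosses, [25, 75])
iqr = q3 - q1
print(f"stand. dev.={std:.4f}, IQR={iqr:.4f}")
```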

4 Likes

How can we interpret this graph? Frequency seems to have gone down a lot; is that a good thing?

(I’m assuming logloss getting narrower is certainly a good thing)

Frequency is just “how many users have a logloss of around this much”; don’t look at its absolute values, they don’t matter.
Averages have been decreasing (vertical lines shifting to the left), that’s definitely good. Distributions getting narrower is less-obviously-good-but-still-good, it means fewer users get really bad results and FSRS is getting more consistent.

3 Likes

I was hoping that if I do density estimation, it would look better. Uh, well, not really

This is basically “how to avoid histograms if you’re a nerd”. At least here you don’t have to worry about the width of the bins, like on the previous graph. And it gives a better feel for how the distributions are getting narrower.
Notice that older versions have a thicker right tail. That’s bad: it means more people get extremely poor predictions. The right tail should be as thin as possible. Newer versions have a thinner right tail, which is good.
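A density estimate like the one plotted can be built by hand in a few lines — a sketch on synthetic data (a Gaussian kernel per data point, averaged), with a fixed bandwidth I picked arbitrarily:

```python
import numpy as np

def gaussian_kde_1d(sample, xs, bandwidth):
    """Kernel density estimate: one Gaussian bump per data point, averaged."""
    diffs = (xs[:, None] - sample[None, :]) / bandwidth
    kernels = np.exp(-0.5 * diffs**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (len(sample) * bandwidth)

# Synthetic, right-skewed stand-in for per-user loglosses.
rng = np.random.default_rng(1)
loglosses = rng.gamma(shape=4.0, scale=0.09, size=2000)

xs = np.linspace(-0.5, 2.0, 500)
density = gaussian_kde_1d(loglosses, xs, bandwidth=0.03)
```

Real KDE implementations (e.g. SciPy's `gaussian_kde`) choose the bandwidth automatically, which is what removes the bin-width worry mentioned above.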

4 Likes

That is solid improvement for all users. Very reassuring. Thank you!

1 Like

Why not just do a test?

I will benchmark this change:

new_d = torch.where(short_term, state[:, 1], self.next_d(state, X[:, 1]))

For short-term reviews, the difficulty will not change.
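The `torch.where` in that snippet selects, per card, either the old difficulty or the freshly computed one. A NumPy analog of the same selection (the state layout — difficulty in column 1 — is taken from the snippet; all numbers are made up, and `next_d` stands in for `self.next_d(state, X[:, 1])`):

```python
import numpy as np

# Hypothetical per-card state: column 1 holds the card's current difficulty.
state = np.array([[10.0, 5.2],
                  [ 3.0, 7.1],
                  [ 1.0, 4.0]])
next_d = np.array([5.5, 6.8, 4.3])          # stand-in for self.next_d(...)
short_term = np.array([True, False, True])  # is this a same-day review?

# Where the review is short-term, keep the old difficulty unchanged;
# otherwise take the newly computed one.
new_d = np.where(short_term, state[:, 1], next_d)
```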

2 Likes

If you are willing to share your collection with me, I could evaluate the change with your case.

2 Likes

@Expertium

The difference is negligible:

Model: FSRS-6-dev
Total number of users: 2061
Total number of reviews: 68679005
Weighted average by reviews:
FSRS-6-dev LogLoss (mean±std): 0.3334±0.1511
FSRS-6-dev RMSE(bins) (mean±std): 0.0474±0.0299
FSRS-6-dev AUC (mean±std): 0.7082±0.0820

Weighted average by log(reviews):
FSRS-6-dev LogLoss (mean±std): 0.3496±0.1612
FSRS-6-dev RMSE(bins) (mean±std): 0.0626±0.0396
FSRS-6-dev AUC (mean±std): 0.7051±0.0876

Weighted average by users:
FSRS-6-dev LogLoss (mean±std): 0.3515±0.1630
FSRS-6-dev RMSE(bins) (mean±std): 0.0649±0.0407
FSRS-6-dev AUC (mean±std): 0.7044±0.0895

parameters: [0.2116, 1.0897, 2.9447, 12.7109, 6.5001, 0.7207, 3.0567, 0.0142, 1.7844, 0.1558, 0.7581, 1.5011, 0.0523, 0.3266, 1.7133, 0.3781, 1.9568, 0.7399, 0.1184, 0.1267, 0.1799]

Model: FSRS-6
Total number of users: 2061
Total number of reviews: 68679005
Weighted average by reviews:
FSRS-6 LogLoss (mean±std): 0.3333±0.1509
FSRS-6 RMSE(bins) (mean±std): 0.0475±0.0299
FSRS-6 AUC (mean±std): 0.7081±0.0821

Weighted average by log(reviews):
FSRS-6 LogLoss (mean±std): 0.3496±0.1610
FSRS-6 RMSE(bins) (mean±std): 0.0627±0.0396
FSRS-6 AUC (mean±std): 0.7048±0.0880

Weighted average by users:
FSRS-6 LogLoss (mean±std): 0.3515±0.1629
FSRS-6 RMSE(bins) (mean±std): 0.0650±0.0407
FSRS-6 AUC (mean±std): 0.7041±0.0899

parameters: [0.2122, 1.0908, 2.9459, 12.7045, 6.4391, 0.679, 3.0999, 0.0213, 1.8084, 0.1802, 0.7802, 1.496, 0.0565, 0.3234, 1.7089, 0.3869, 1.9502, 0.7046, 0.1261, 0.1282, 0.1813]
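The three weighting schemes in the tables above can be sketched like this (per-user numbers are made up; only the weighting logic is the point):

```python
import numpy as np

# Hypothetical per-user loglosses and review counts for three users.
logloss = np.array([0.30, 0.40, 0.25])
reviews = np.array([100_000, 5_000, 500])

by_reviews = np.average(logloss, weights=reviews)              # heavy reviewers dominate
by_log_reviews = np.average(logloss, weights=np.log(reviews))  # compressed weights
by_users = logloss.mean()                                      # every user counts equally
```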
5 Likes

Thank you, though I will still benchmark it myself using FSRS-7. I wonder if the results will be similar. FSRS-7 uses fractional interval lengths instead of integer interval lengths, so maybe. And more importantly, FSRS-7 predicts p(recall) for same-day reviews.

3 Likes