FSRS5: Super low difficulty after optimization

I’m a bit late to the party, but I just installed RC2 for Anki 24.10 and optimized my deck with FSRS5. My RMSE went down from 2.78% to 1.57% (with the Python-version optimizer; 1.61% with the built-in one), but what concerns me is how nearly all of my cards now have a difficulty of 0%, including cards with rating histories like:

123123232323223333333
122232233331333
and even 12333

My requested retention is 96%, but my true retention is 98.9% over my full 189 days of use (I reformed my habits less than 5 months ago and have ignored reviews since 6/14/2024 when optimizing), and I don’t review tons of cards every day. I know that difficulty won’t affect my reviews, but it certainly doesn’t feel like 0% difficulty each time I see a card.

I even tried to bump my desired retention up to 98% and reschedule cards on change, but most cards still show 0% difficulty, and the peak workload in the projected forecast is 41 cards in 80-89 days.

I also simulated some reviews on a blank new card and answered 1223 and difficulty went from 65% to 36% to 17% to 0%. Interestingly enough, the FSRS Visualizer says it should be 65.38% to 57.46% to 51.71% to 37.56%. Here are my FSRS v5.2.0 optimized parameters: [6.7006, 18.4477, 18.1612, 18.1421, 6.8842, 0.8254, 1.2354, 0.2736, 1.8441, 0.0, 1.3177, 2.0384, 0.0044, 0.3869, 2.4027, 0.3577, 3.2153, 0.8074, 0.937]

Can I do anything about this? I would love to hear any experts’ thoughts on this and am willing to run tests. I’m not sure if this is a bug or (more likely) it’s just me, but here are some of my stats (after optimization, and, of course, I’m testing with backups):



Before optimization:

Please keep in mind that “Ignore cards reviewed before” ignores entire cards, not just part of their review history. Try turning it off (by setting a very old date).

I even tried to bump up my desired retention to 98% and rescheduled cards on change, but it’s still 0% difficulty for most cards

Increasing desired retention won’t affect D.

the FSRS Visualizer says it should be 65.38% to 57.46% to 51.71% to 37.56%.

It’s outdated.

First of all, the Rust version (built-in) and the Python version are now equivalent in the latest RC, so there is no reason to use the Python version. Previously, the Rust version was slightly worse because of a bug, but not anymore.

Anyway, this does seem quite strange. @L.M.Sherlock pinging so that you take a look. It probably has something to do with “Ignore cards reviewed before”.

With this user’s parameters and ratings=1,2,2,3, I get difficulty history: 0,6.9,4.2,2.5,1.0
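For anyone who wants to reproduce this, here is a minimal sketch of the FSRS-5 difficulty update as I understand it: initial difficulty, linear damping, and mean reversion toward the raw, unclamped D0(4). The formulas are my reading of the algorithm, not code copied from fsrs-rs. With the parameters posted above and ratings 1, 2, 2, 3 it produces the same history (the leading 0 in the quoted history is just the pre-review placeholder):

```python
import math

# FSRS-5 difficulty update (my reading of the algorithm, an assumption):
#   D0(G) = w4 - e^(w5*(G-1)) + 1
#   dD    = -w6*(G-3), linearly damped by (10 - D)/9
#   D''   = w7*D0(4) + (1 - w7)*D'   (mean reversion toward D0(4))
# Note: the mean reversion here uses the *raw* D0(4), matching the
# unclamped behavior discussed in this thread.
w = [6.7006, 18.4477, 18.1612, 18.1421, 6.8842, 0.8254, 1.2354, 0.2736,
     1.8441, 0.0, 1.3177, 2.0384, 0.0044, 0.3869, 2.4027, 0.3577,
     3.2153, 0.8074, 0.937]

def d0(g, clamp=True):
    d = w[4] - math.exp(w[5] * (g - 1)) + 1
    return min(max(d, 1.0), 10.0) if clamp else d

def next_d(d, g):
    delta = -w[6] * (g - 3)
    d_damped = d + delta * (10 - d) / 9                          # linear damping
    d_new = w[7] * d0(4, clamp=False) + (1 - w[7]) * d_damped    # mean reversion
    return min(max(d_new, 1.0), 10.0)

history = [d0(1)]
for g in [2, 2, 3]:
    history.append(next_d(history[-1], g))

print([round(d, 1) for d in history])  # [6.9, 4.2, 2.5, 1.0]
```

Note that the raw mean-reversion target here is D0(4) ≈ −4.01; clamping it first would change the whole trajectory, which is exactly what the discussion further down turns on.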

I also encountered strange behavior of D when using the “Ignore cards reviewed before” function. However, in my case, the date was set in the future.

The answer that was given to me:

I tried optimizing again without ignoring any cards and got a lower log loss of 0.0634 and a lower RMSE of 0.98%; however, all cards have 0% D now. The average interval increased to 1.77 years and the average stability to 4.87 years, from 3.9 months and 3.49 years respectively.

New params: [5.3131, 18.864, 18.9563, 18.5678, 6.7846, 1.0013, 1.2118, 0.2608, 1.9724, 0.0, 1.4316, 2.0897, 0.001, 0.4293, 2.4399, 0.4069, 3.5015, 0.7426, 0.8977]
Percent changes from ignoring reviews to not: [-20.7, 2.3, 4.4, 2.3, -1.4, 21.3, -1.9, -4.7, 7, 0, 8.6, 2.5, -77.3, 11, 1.5, 13.8, 8.9, -8, -4.2]

D0(1) = 6.7846
D0(2) = 5.0617
D0(3) = 0.3909
D0(4) = -12.30
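Assuming the FSRS-5 initial-difficulty formula D0(G) = w4 − e^(w5·(G−1)) + 1 without the usual clamp to [1, 10], these values follow (up to small rounding differences) from the re-optimized parameters posted above:

```python
import math

# Rough check of the D0 values above, assuming the FSRS-5 formula
# D0(G) = w4 - e^(w5*(G-1)) + 1 *without* the clamp to [1, 10].
# w4 and w5 are taken from the re-optimized parameters in this thread.
w4, w5 = 6.7846, 1.0013

d0 = [w4 - math.exp(w5 * (g - 1)) + 1 for g in (1, 2, 3, 4)]
for g, d in zip((1, 2, 3, 4), d0):
    print(f"D0({g}) = {d:.4f}")
```

The unclamped D0(4) comes out strongly negative, far below the usual floor of 1, which is what drags every card’s difficulty down.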

For some reason, your w7 parameter is very large. Therefore, with any answer, your difficulty is pulled toward D0(4), which clamps to the minimum of 1 (displayed as 0% difficulty).

rating history: 1,1,1,1
difficulty history: 0,6.8,2.4,1.0,1.0

Yeah, w[7] is super high. I’m also not sure why w[9] and w[12] are at 0 (they were higher with FSRS 4.5), which is causing that stability ballooning. Here are my revlogs if anyone wants to run an analysis: Google Drive. Here are also some cursory Python plots comparing my FSRS5 params to my previous ones and the defaults:


w[7] steers the distribution of card difficulty: a low value pushes it toward 100%, and a high value pushes it toward 0%.
In your case the parameter is very high, so at high retention all cards end up at 0%.
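To illustrate: with only Good (rating 3) answers the damped delta is zero, so difficulty just mean-reverts, D′ = w7·D0(4) + (1 − w7)·D, and converges geometrically to D0(4). A hypothetical sketch (the target of −4.0 and the w7 values are illustrative, not from any real optimization):

```python
# With only Good answers, D mean-reverts toward the target D0(4);
# a larger w7 only makes the convergence faster.
def settle(d, w7, target, steps=10):
    for _ in range(steps):
        d = w7 * target + (1 - w7) * d
    return d

start = 6.9  # a typical initial difficulty
for w7 in (0.05, 0.27, 0.6):
    print(w7, round(settle(start, w7, target=-4.0), 2))
# 0.05 -> 2.53, 0.27 -> -3.53, 0.6 -> -4.0; anything below 1 is
# clamped to 1 in FSRS, which Anki displays as 0% difficulty.
```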

I don’t think it’s problematic. Your true retention is very high, so FSRS thinks it’s unnecessary to distinguish your cards’ difficulty.

For a detailed diagnosis, you can use fsrs4anki/fsrs4anki_optimizer.ipynb at main · open-spaced-repetition/fsrs4anki to optimize your collection, and post these figures:



The model fits your reviews pretty well.

I can’t deny that. An RMSE of 0.96% is astonishing.

I’m unsure whether effectively neutralizing a few parameters, with D = 1.0 constant for almost all cards, is beneficial for FSRS5 or just something it has to deal with because of the D formula change from v4.5. Would it do better with some sigmoid function or exponential factor? Maybe something will happen in a few years that completely revamps D, or maybe it won’t exist anymore (I know that people have tried tons of things). It’s too bad that I won’t be able to sort cards by the difficulty statistic for now, but at least it doesn’t seem to be limiting FSRS5’s performance for me. I’d love to hear any of your thoughts.

OK, I figured it out. The mean_reversion is using the raw init_difficulty, which hasn’t been clamped.

I will fix it soon.
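A sketch of what the fix amounts to, as I understand it: clamp D0(4) to [1, 10] before using it as the mean-reversion target. The function names here are mine, not from the actual fsrs-rs patch; w4, w5, and w7 are taken from the parameters posted earlier in the thread:

```python
import math

# Buggy vs. patched mean reversion: the only difference is whether
# the target D0(4) is clamped to [1, 10] before being used.
w4, w5, w7 = 6.8842, 0.8254, 0.2736  # from the thread's parameters

def d0_raw(g):
    return w4 - math.exp(w5 * (g - 1)) + 1

def mean_reversion(d, clamp_target):
    target = d0_raw(4)
    if clamp_target:
        target = min(max(target, 1.0), 10.0)  # the patch
    return w7 * target + (1 - w7) * d

d = 6.9
print(round(mean_reversion(d, clamp_target=False), 2))  # buggy: 3.91 (target ~ -4.01)
print(round(mean_reversion(d, clamp_target=True), 2))   # fixed: 5.29 (target clamped to 1)
```

With the raw target of about −4.01, every single review drags D sharply downward; with the clamped target of 1, the pull is far gentler, which is why the patched model produces a less extreme difficulty distribution.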


Could you send your collection to me? I need to check whether the patch solves this problem. I also need to know your timezone and next_day_start_at.

By the way, according to the initial benchmark result, the patch makes the model worse.

Update:

The fixed model is ~1.36% worse than the previous one. Should we apply this patch?

Model: FSRS-rs
Total number of users: 4567
Total number of reviews: 154775095
Weighted average by reviews:
FSRS-rs LogLoss (mean±std): 0.3293±0.1533
FSRS-rs RMSE(bins) (mean±std): 0.0519±0.0330
FSRS-rs AUC (mean±std): 0.7027±0.0798

Weighted average by log(reviews):
FSRS-rs LogLoss (mean±std): 0.3545±0.1696
FSRS-rs RMSE(bins) (mean±std): 0.0701±0.0449
FSRS-rs AUC (mean±std): 0.6994±0.0889

Weighted average by users:
FSRS-rs LogLoss (mean±std): 0.3577±0.1721
FSRS-rs RMSE(bins) (mean±std): 0.0728±0.0465
FSRS-rs AUC (mean±std): 0.6984±0.0909

parameters: [0.4141, 1.1189, 3.0284, 15.7564, 7.1558, 0.5599, 1.8076, 0.0094, 1.5009, 0.1258, 0.9932, 1.9051, 0.1066, 0.2961, 2.3267, 0.2273, 2.9898, 0.5032, 0.6276]
Model: FSRS-rs-old
Total number of users: 4567
Total number of reviews: 154775095
Weighted average by reviews:
FSRS-rs-1 LogLoss (mean±std): 0.3286±0.1530
FSRS-rs-1 RMSE(bins) (mean±std): 0.0512±0.0330
FSRS-rs-1 AUC (mean±std): 0.7035±0.0798

Weighted average by log(reviews):
FSRS-rs-1 LogLoss (mean±std): 0.3541±0.1695
FSRS-rs-1 RMSE(bins) (mean±std): 0.0698±0.0451
FSRS-rs-1 AUC (mean±std): 0.7002±0.0886

Weighted average by users:
FSRS-rs-1 LogLoss (mean±std): 0.3573±0.1720
FSRS-rs-1 RMSE(bins) (mean±std): 0.0726±0.0466
FSRS-rs-1 AUC (mean±std): 0.6992±0.0906

parameters: [0.4092, 1.1189, 3.0727, 15.7654, 7.137, 0.5263, 1.7831, 0.0082, 1.5103, 0.1192, 0.9995, 1.9062, 0.1136, 0.2961, 2.3252, 0.2255, 2.9898, 0.505, 0.6308]

I would create a PR and get some other reviewers’ opinions. I think it’s kind of a big change, and it reduces average performance. I don’t know enough about programming to have a say, though. How does it affect the difficulty distribution for other people’s collections, or for your own?

How does it reduce average performance?

RMSE increases on average, so the patch would make FSRS less accurate for most people; that’s the cost of improving the difficulty distribution, I guess. Sorry for being unclear with “reducing average performance.”

Well, if the only benefit of this patch is a better difficulty distribution, so that sorting by Difficulty Ascending or Descending works better, then it makes little sense to me, since I don’t use those anymore now that Retrievability Descending is (arguably) the best sorting order. A bad trade-off. But if there is a different insight into this, I’m a bit undecided. @L.M.Sherlock

Here is the final result:

Model: FSRS-rs
Total number of users: 9999
Total number of reviews: 349923850
Weighted average by reviews:
FSRS-rs LogLoss (mean±std): 0.3279±0.1524
FSRS-rs RMSE(bins) (mean±std): 0.0518±0.0330
FSRS-rs AUC (mean±std): 0.6997±0.0774

Weighted average by log(reviews):
FSRS-rs LogLoss (mean±std): 0.3532±0.1692
FSRS-rs RMSE(bins) (mean±std): 0.0707±0.0459
FSRS-rs AUC (mean±std): 0.6993±0.0885

Weighted average by users:
FSRS-rs LogLoss (mean±std): 0.3565±0.1717
FSRS-rs RMSE(bins) (mean±std): 0.0736±0.0476
FSRS-rs AUC (mean±std): 0.6985±0.0906

parameters: [0.416, 1.1495, 3.16, 15.8143, 7.1478, 0.5661, 1.7997, 0.0096, 1.5061, 0.1246, 0.9961, 1.9041, 0.106, 0.2961, 2.3278, 0.2288, 3.0111, 0.5058, 0.6379]
Model: FSRS-rs-1
Total number of users: 9999
Total number of reviews: 349923850
Weighted average by reviews:
FSRS-rs-1 LogLoss (mean±std): 0.3271±0.1521
FSRS-rs-1 RMSE(bins) (mean±std): 0.0512±0.0331
FSRS-rs-1 AUC (mean±std): 0.7014±0.0778

Weighted average by log(reviews):
FSRS-rs-1 LogLoss (mean±std): 0.3528±0.1691
FSRS-rs-1 RMSE(bins) (mean±std): 0.0705±0.0460
FSRS-rs-1 AUC (mean±std): 0.7001±0.0885

Weighted average by users:
FSRS-rs-1 LogLoss (mean±std): 0.3561±0.1716
FSRS-rs-1 RMSE(bins) (mean±std): 0.0734±0.0478
FSRS-rs-1 AUC (mean±std): 0.6992±0.0905

parameters: [0.4127, 1.1488, 3.1878, 15.8143, 7.1333, 0.5271, 1.7733, 0.0084, 1.5148, 0.1191, 1.003, 1.9051, 0.1122, 0.2962, 2.3266, 0.2272, 3.0122, 0.5077, 0.6404]

It’s ~1% worse than the previous implementation, so my decision is not to fix it.


Please make sure that there are no inconsistencies between the Rust version and the Python version.