FSRS5: Super low difficulty after optimization

I’m a bit late to the party, but I just installed RC2 for Anki 24.10 and optimized my deck with FSRS5. My RMSE went down from 2.78% to 1.57% (with the Python-version optimizer; 1.61% with the built-in one), but what concerns me is how nearly all of my cards now have a difficulty of 0%, including cards with rating histories like:

123123232323223333333
122232233331333
and even 12333

My requested retention is 96%, but my true retention is 98.9% over my full 189 days of use (I reformed my habits less than 5 months ago and have ignored reviews since 6/14/2024 when optimizing), and I don’t review tons of cards every day. I know that difficulty won’t affect my reviews, but it certainly doesn’t feel like 0% difficulty each time I see a card.

I even tried to bump my desired retention up to 98% and reschedule cards on change, but most cards still show 0% difficulty, and the peak workload in the projected forecast is 41 cards in 80-89 days.

I also simulated some reviews on a blank new card and answered 1223 and difficulty went from 65% to 36% to 17% to 0%. Interestingly enough, the FSRS Visualizer says it should be 65.38% to 57.46% to 51.71% to 37.56%. Here are my FSRS v5.2.0 optimized parameters: [6.7006, 18.4477, 18.1612, 18.1421, 6.8842, 0.8254, 1.2354, 0.2736, 1.8441, 0.0, 1.3177, 2.0384, 0.0044, 0.3869, 2.4027, 0.3577, 3.2153, 0.8074, 0.937]

Can I do anything about this? I would love to hear any experts’ thoughts on this and am willing to run tests. I’m not sure if this is a bug or (more likely) it’s just me, but here are some of my stats (after optimization, and, of course, I’m testing with backups):



Before optimization:

Please keep in mind that “Ignore cards reviewed before” ignores entire cards, not just part of their review history. Try turning it off (by setting a very old date).

I even tried to bump up my desired retention to 98% and rescheduled cards on change, but it’s still 0% difficulty for most cards

Increasing desired retention won’t affect D.

the FSRS Visualizer says it should be 65.38% to 57.46% to 51.71% to 37.56%.

It’s outdated.

First of all, the Rust version (built-in) and the Python version are now equivalent in the latest RC, so there is no reason to use the Python version. Previously, the Rust version was slightly worse because of a bug, but not anymore.

Anyway, this does seem quite strange. @L.M.Sherlock pinging so that you take a look. It probably has something to do with “Ignore cards reviewed before”.

With this user’s parameters and ratings=1,2,2,3, I get difficulty history: 0,6.9,4.2,2.5,1.0
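For anyone who wants to reproduce this, here is a minimal sketch of the FSRS-5 difficulty update as I understand it: initial difficulty, linear damping, and mean reversion toward the raw, unclamped D0(4). The formulas are my reading of the algorithm, not code copied from fsrs-rs. With the parameters posted above and ratings 1, 2, 2, 3 it produces the same history (the leading 0 in the quoted history is just the pre-review placeholder):

```python
import math

# FSRS-5 difficulty update (my reading of the algorithm, an assumption):
#   D0(G) = w4 - e^(w5*(G-1)) + 1
#   dD    = -w6*(G-3), linearly damped by (10 - D)/9
#   D''   = w7*D0(4) + (1 - w7)*D'   (mean reversion toward D0(4))
# Note: the mean reversion here uses the *raw* D0(4), matching the
# unclamped behavior discussed in this thread.
w = [6.7006, 18.4477, 18.1612, 18.1421, 6.8842, 0.8254, 1.2354, 0.2736,
     1.8441, 0.0, 1.3177, 2.0384, 0.0044, 0.3869, 2.4027, 0.3577,
     3.2153, 0.8074, 0.937]

def d0(g, clamp=True):
    d = w[4] - math.exp(w[5] * (g - 1)) + 1
    return min(max(d, 1.0), 10.0) if clamp else d

def next_d(d, g):
    delta = -w[6] * (g - 3)
    d_damped = d + delta * (10 - d) / 9                          # linear damping
    d_new = w[7] * d0(4, clamp=False) + (1 - w[7]) * d_damped    # mean reversion
    return min(max(d_new, 1.0), 10.0)

history = [d0(1)]
for g in [2, 2, 3]:
    history.append(next_d(history[-1], g))

print([round(d, 1) for d in history])  # [6.9, 4.2, 2.5, 1.0]
```

Note that the raw mean-reversion target here is D0(4) ≈ −4.01; clamping it first would change the whole trajectory, which is exactly what the discussion further down turns on.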

I also encountered strange behavior of D when using the “Ignore cards reviewed before” function. However, in my case, the date was set in the future.

The answer that was given to me:

I tried optimizing again without ignoring any cards and got a lower log loss of 0.0634 and a lower RMSE of 0.98%; however, all cards have 0% D now. The average interval increased to 1.77 years and the average stability to 4.87 years, from 3.9 months and 3.49 years respectively.

New params: [5.3131, 18.864, 18.9563, 18.5678, 6.7846, 1.0013, 1.2118, 0.2608, 1.9724, 0.0, 1.4316, 2.0897, 0.001, 0.4293, 2.4399, 0.4069, 3.5015, 0.7426, 0.8977]
Percent changes from ignoring reviews to not: [-20.7, 2.3, 4.4, 2.3, -1.4, 21.3, -1.9, -4.7, 7, 0, 8.6, 2.5, -77.3, 11, 1.5, 13.8, 8.9, -8, -4.2]

D0(1) = 6.7846
D0(2) = 5.0617
D0(3) = 0.3909
D0(4) = -12.30
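Assuming the FSRS-5 initial-difficulty formula D0(G) = w4 − e^(w5·(G−1)) + 1 without the usual clamp to [1, 10], these values follow (up to small rounding differences) from the re-optimized parameters posted above:

```python
import math

# Rough check of the D0 values above, assuming the FSRS-5 formula
# D0(G) = w4 - e^(w5*(G-1)) + 1 *without* the clamp to [1, 10].
# w4 and w5 are taken from the re-optimized parameters in this thread.
w4, w5 = 6.7846, 1.0013

d0 = [w4 - math.exp(w5 * (g - 1)) + 1 for g in (1, 2, 3, 4)]
for g, d in zip((1, 2, 3, 4), d0):
    print(f"D0({g}) = {d:.4f}")
```

The unclamped D0(4) comes out strongly negative, far below the usual floor of 1, which is what drags every card’s difficulty down.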

For some reason, your w7 parameter is very large. Therefore, with any answer, your difficulty is pulled toward D0(4), which clamps to the minimum of 1 (displayed as 0% difficulty).

rating history: 1,1,1,1
difficulty history: 0,6.8,2.4,1.0,1.0

Yeah, w[7] is super high. I’m also not sure why w[9] and w[12] are at 0 (they were higher with FSRS 4.5), which is causing that stability ballooning. Here are my revlogs if anyone wants to run an analysis: Google Drive. Here are also some cursory Python plots comparing my FSRS5 params to my previous ones and the defaults:


w[7] steers the distribution of card difficulty: a low value pushes it toward 100%, and a high value pushes it toward 0%.
In your case the parameter is very high, so at high retention all cards end up at 0%.
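To illustrate: with only Good (rating 3) answers the damped delta is zero, so difficulty just mean-reverts, D′ = w7·D0(4) + (1 − w7)·D, and converges geometrically to D0(4). A hypothetical sketch (the target of −4.0 and the w7 values are illustrative, not from any real optimization):

```python
# With only Good answers, D mean-reverts toward the target D0(4);
# a larger w7 only makes the convergence faster.
def settle(d, w7, target, steps=10):
    for _ in range(steps):
        d = w7 * target + (1 - w7) * d
    return d

start = 6.9  # a typical initial difficulty
for w7 in (0.05, 0.27, 0.6):
    print(w7, round(settle(start, w7, target=-4.0), 2))
# 0.05 -> 2.53, 0.27 -> -3.53, 0.6 -> -4.0; anything below 1 is
# clamped to 1 in FSRS, which Anki displays as 0% difficulty.
```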

I don’t think it’s problematic. Your true retention is very high, so FSRS thinks it’s unnecessary to distinguish your cards’ difficulty.

For a detailed diagnosis, you can use fsrs4anki/fsrs4anki_optimizer.ipynb at main · open-spaced-repetition/fsrs4anki to optimize your collection, and post these figures:



The model fits your reviews pretty well.

I can’t deny that. An RMSE of 0.96% is astonishing.

I’m unsure whether effectively neutralizing a few parameters, with D = 1.0 constant for almost all cards, is beneficial for FSRS5 or just something it has to deal with because of the D formula change from v4.5. Would it do better with some sigmoid function or exponential factor? Maybe something will happen in a few years that completely revamps D, or maybe it won’t exist anymore (I know that people have tried tons of things). It’s too bad that I won’t be able to sort cards by the difficulty statistic for now, but at least it doesn’t seem to be limiting FSRS5’s performance for me. I’d love to hear any of your thoughts.

OK, I figured it out. The mean_reversion is using the raw init_difficulty, which hasn’t been clamped.

I will fix it soon.
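A sketch of what the fix amounts to, as I understand it: clamp D0(4) to [1, 10] before using it as the mean-reversion target. The function names here are mine, not from the actual fsrs-rs patch; w4, w5, and w7 are taken from the parameters posted earlier in the thread:

```python
import math

# Buggy vs. patched mean reversion: the only difference is whether
# the target D0(4) is clamped to [1, 10] before being used.
w4, w5, w7 = 6.8842, 0.8254, 0.2736  # from the thread's parameters

def d0_raw(g):
    return w4 - math.exp(w5 * (g - 1)) + 1

def mean_reversion(d, clamp_target):
    target = d0_raw(4)
    if clamp_target:
        target = min(max(target, 1.0), 10.0)  # the patch
    return w7 * target + (1 - w7) * d

d = 6.9
print(round(mean_reversion(d, clamp_target=False), 2))  # buggy: 3.91 (target ~ -4.01)
print(round(mean_reversion(d, clamp_target=True), 2))   # fixed: 5.29 (target clamped to 1)
```

With the raw target of about −4.01, every single review drags D sharply downward; with the clamped target of 1, the pull is far gentler, which is why the patched model produces a less extreme difficulty distribution.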


Could you send your collection to me? I need to check whether the patch solves this problem. I also need to know your timezone and next_day_start_at.

By the way, according to the initial benchmark result, the patch makes the model worse.

Update:

The fixed model is ~1.36% worse than the previous one. Should we apply this patch?

Model: FSRS-rs
Total number of users: 4567
Total number of reviews: 154775095
Weighted average by reviews:
FSRS-rs LogLoss (mean±std): 0.3293±0.1533
FSRS-rs RMSE(bins) (mean±std): 0.0519±0.0330
FSRS-rs AUC (mean±std): 0.7027±0.0798

Weighted average by log(reviews):
FSRS-rs LogLoss (mean±std): 0.3545±0.1696
FSRS-rs RMSE(bins) (mean±std): 0.0701±0.0449
FSRS-rs AUC (mean±std): 0.6994±0.0889

Weighted average by users:
FSRS-rs LogLoss (mean±std): 0.3577±0.1721
FSRS-rs RMSE(bins) (mean±std): 0.0728±0.0465
FSRS-rs AUC (mean±std): 0.6984±0.0909

parameters: [0.4141, 1.1189, 3.0284, 15.7564, 7.1558, 0.5599, 1.8076, 0.0094, 1.5009, 0.1258, 0.9932, 1.9051, 0.1066, 0.2961, 2.3267, 0.2273, 2.9898, 0.5032, 0.6276]
Model: FSRS-rs-old
Total number of users: 4567
Total number of reviews: 154775095
Weighted average by reviews:
FSRS-rs-1 LogLoss (mean±std): 0.3286±0.1530
FSRS-rs-1 RMSE(bins) (mean±std): 0.0512±0.0330
FSRS-rs-1 AUC (mean±std): 0.7035±0.0798

Weighted average by log(reviews):
FSRS-rs-1 LogLoss (mean±std): 0.3541±0.1695
FSRS-rs-1 RMSE(bins) (mean±std): 0.0698±0.0451
FSRS-rs-1 AUC (mean±std): 0.7002±0.0886

Weighted average by users:
FSRS-rs-1 LogLoss (mean±std): 0.3573±0.1720
FSRS-rs-1 RMSE(bins) (mean±std): 0.0726±0.0466
FSRS-rs-1 AUC (mean±std): 0.6992±0.0906

parameters: [0.4092, 1.1189, 3.0727, 15.7654, 7.137, 0.5263, 1.7831, 0.0082, 1.5103, 0.1192, 0.9995, 1.9062, 0.1136, 0.2961, 2.3252, 0.2255, 2.9898, 0.505, 0.6308]

I would create a PR and get some other reviewers’ opinions. I think it’s kind of a big change, and it reduces average performance. I don’t know enough about programming to have a say, though. How does it affect the difficulty distribution for other people’s collections, or for your own?

How does it reduce average performance?

RMSE increases on average, so the patch would make FSRS less accurate for most people; that’s the cost of improving the difficulty distribution, I guess. Sorry for being unclear with “reducing average performance.”

Well, if the only benefit of this patch is a better difficulty distribution, so that sorting by Difficulty Ascending or Descending works better, then it makes little sense to me, since I don’t use those anymore now that Retrievability Descending is (arguably) the best sorting order. A bad trade-off. But if there is a different insight into this, I’m a bit undecided. @L.M.Sherlock

Here is the final result:

Model: FSRS-rs
Total number of users: 9999
Total number of reviews: 349923850
Weighted average by reviews:
FSRS-rs LogLoss (mean±std): 0.3279±0.1524
FSRS-rs RMSE(bins) (mean±std): 0.0518±0.0330
FSRS-rs AUC (mean±std): 0.6997±0.0774

Weighted average by log(reviews):
FSRS-rs LogLoss (mean±std): 0.3532±0.1692
FSRS-rs RMSE(bins) (mean±std): 0.0707±0.0459
FSRS-rs AUC (mean±std): 0.6993±0.0885

Weighted average by users:
FSRS-rs LogLoss (mean±std): 0.3565±0.1717
FSRS-rs RMSE(bins) (mean±std): 0.0736±0.0476
FSRS-rs AUC (mean±std): 0.6985±0.0906

parameters: [0.416, 1.1495, 3.16, 15.8143, 7.1478, 0.5661, 1.7997, 0.0096, 1.5061, 0.1246, 0.9961, 1.9041, 0.106, 0.2961, 2.3278, 0.2288, 3.0111, 0.5058, 0.6379]
Model: FSRS-rs-1
Total number of users: 9999
Total number of reviews: 349923850
Weighted average by reviews:
FSRS-rs-1 LogLoss (mean±std): 0.3271±0.1521
FSRS-rs-1 RMSE(bins) (mean±std): 0.0512±0.0331
FSRS-rs-1 AUC (mean±std): 0.7014±0.0778

Weighted average by log(reviews):
FSRS-rs-1 LogLoss (mean±std): 0.3528±0.1691
FSRS-rs-1 RMSE(bins) (mean±std): 0.0705±0.0460
FSRS-rs-1 AUC (mean±std): 0.7001±0.0885

Weighted average by users:
FSRS-rs-1 LogLoss (mean±std): 0.3561±0.1716
FSRS-rs-1 RMSE(bins) (mean±std): 0.0734±0.0478
FSRS-rs-1 AUC (mean±std): 0.6992±0.0905

parameters: [0.4127, 1.1488, 3.1878, 15.8143, 7.1333, 0.5271, 1.7733, 0.0084, 1.5148, 0.1191, 1.003, 1.9051, 0.1122, 0.2962, 2.3266, 0.2272, 3.0122, 0.5077, 0.6404]

It’s ~1% worse than the previous implementation, so my decision is not to fix it.


Please make sure that there are no inconsistencies between the Rust version and the Python version.