FSRS parameters hitting boundaries(?) after optimization

Today one of my decks reached 1000 review records, so I pressed the “optimize” button. And I got this:


The parameters look suspicious to me in two ways:

  1. the 3rd and 4th parameters are somehow exactly the same (14.8363);
  2. the 2nd-to-last parameter in the first row is exactly 0.1000.

I tried adding some extra review records and optimizing again: the numbers changed, but 1) and 2) above still hold. This gives me the feeling that these parameters are hitting the “boundaries” of the parameter space during optimization, which is obviously not a good sign for a fit.

So my question: should I continue using these new parameters, or would it be better to revert to the defaults? I did observe some large changes with the new parameters (e.g. some intervals going 4d→15d), though I’m not sure if they are intended. The fit does improve according to the evaluator (log loss 0.5085→0.4757, RMSE 8.35%→5.23%); however, I’m worried that these boundary values may lead to nonsense intervals on some cards as I continue studying.

btw I think it would be nice if there were some documentation (understandable to an average user) explaining what these parameters mean (or better, visualizing the model in some interactive way as the user changes parameters). The FSRS scheduler feels better than SM-2 for me in general, but a significant advantage of the latter is that every parameter has a clear meaning, and I can change them myself. With FSRS, however, I have to trust that the optimizer is always doing the right thing…


I found this post easiest to follow for understanding what the optimizer is doing – see the last third of the “S, Memory Stability” section, and the final “Optimization aka training” section. The first 4 parameters are the initial Stability values assigned based on your first grading of the card. From the wiki, this explanation of what the scheduler does with the parameters was helpful too. [There are many more articles you can find on the Wiki, but YMMV on how accessible they are!]
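To make that concrete, the first 4 parameters are just a lookup from your first rating of a card to its initial stability in days. A minimal sketch (the weight values here are illustrative, not FSRS's actual defaults):

```python
# Hypothetical sketch: how FSRS seeds initial stability (S0) from the
# first rating of a card. These w0..w3 values are made up for
# illustration, not the real default parameters.
w = [0.4, 1.2, 3.2, 14.8]  # S0 in days for Again / Hard / Good / Easy

RATINGS = {"again": 0, "hard": 1, "good": 2, "easy": 3}

def initial_stability(rating: str) -> float:
    """Return the initial stability (days) for a card's first rating."""
    return w[RATINGS[rating]]

print(initial_stability("good"))  # 3.2 with these illustrative weights
```

So a card first graded Good starts with a longer-lived memory estimate than one graded Again, and everything after that builds on this starting point.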

It’s hard for some folks to trust a completely new algorithm – especially one that is as intricate as FSRS. But that doesn’t mean that FSRS is invalid, or that there are reasons to think it’s flawed.

If you just hit 1000 reviews, maybe you’d feel better keeping FSRS off for now. Save these parameters somewhere. And optimize it again when you hit 2000 reviews, so you can see how they compare. No one will force you to use FSRS between now and then. :wink:

As to your suspicions –

  1. Someone with a more intimate understanding of the algorithm might have a better answer for you, but your 3rd and 4th values being the same – to me that makes it look like your cards have similar outcomes whether you grade them Good or Easy when you first see them. As with so many things about FSRS – the numbers might say more about your grading habits than they do about the algorithm.
  2. That’s not suspicious. Or rather, it’s not any more suspicious than something like – 5 of your parameters end with “5” – or 8 of your parameters are <1. They are just numbers. There’s as much of a likelihood (setting aside the specific formulas and divisors/factors) for a parameter to be 0.1000 as 0.1111 or 0.1999 (which, by the way, one of mine is).
    [screenshot: my own parameters]

I’ll cast my vote for that being highly unlikely. [see below] But I think the more parameters you see, from different users and collections, the more comfortable you’ll be with how much variation there is.

They are. FAQ #6. But it’s a big jump to get used to.

On the contrary, the boundaries prevent the interval from being nonsense. And it usually happens when the number of review logs is low.
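For what it's worth, "boundaries" here just means each parameter is clipped to a fixed allowed range during training. A minimal sketch of the idea (the range values are made up for illustration, not FSRS's actual bounds):

```python
def clamp(value: float, lo: float, hi: float) -> float:
    """Clip an optimized parameter back into its allowed range."""
    return max(lo, min(hi, value))

# Illustrative bounds only: if the optimizer pushes a parameter below
# its floor, it gets pinned there -- which is why a suspiciously round
# value like 0.1000 can show up in the output.
w9_floor, w9_ceil = 0.1, 0.8
print(clamp(0.0371, w9_floor, w9_ceil))  # -> 0.1
```

With few review logs the raw fit can wander, and the clamp is what keeps the resulting intervals sane.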


Thanks for pointing them out, I’ll read them when I have time. imo they should be presented in some kind of official documentation though; I did not expect to find this on reddit lol

well, that probably makes sense. I didn’t do detailed statistics, but it’s possible that the cards graded 3 on first sight indeed perform better than (or the same as) those graded 4, partially due to the limited number of samples.

I feel it’s suspicious because of this

basically all the other numbers changed (by a non-trivial amount) after the re-optimization, but this one sticks at 0.1000. I don’t think this is a coincidence.

yea, that’s why boundaries exist, but it can also be interpreted the opposite way: hitting the boundaries means the fit is already somewhat ill-behaved and requires manual intervention to keep it from going wild. I think users should be notified in such scenarios, as (at the very least) it indicates that they should refit the parameters once more review records are available.

Anyway, I assume it’s better for me to revert to the defaults for now and revisit it later. Thanks for the discussion.

No, that happens sometimes. As for the parameters 3 and 4 being exactly the same - it happens if your initial stability for “Easy” is lower than for “Good”. S0(Again)≤S0(Hard)≤S0(Good)≤S0(Easy). If this inequality holds - good. If this inequality doesn’t hold, FSRS enforces it by changing S0 values.
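One way such an ordering constraint can be enforced (a sketch of the idea, not FSRS's actual code) is to raise each S0 to at least its left neighbour, which collapses S0(Good) and S0(Easy) into the same value whenever the raw fit puts Easy below Good:

```python
def enforce_s0_order(s0: list[float]) -> list[float]:
    """Force S0(Again) <= S0(Hard) <= S0(Good) <= S0(Easy) by
    raising each value to at least the previous one."""
    out = [s0[0]]
    for value in s0[1:]:
        out.append(max(value, out[-1]))
    return out

# If the raw fit put Easy (12.1) below Good (14.8363), both end up at
# 14.8363 -- the same pattern as the identical 3rd and 4th parameters
# in the original screenshot.
print(enforce_s0_order([0.4, 1.2, 14.8363, 12.1]))
```

That would explain why two parameters land on exactly the same number: it is the constraint firing, not a numerical fluke.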

Here: The Algorithm · open-spaced-repetition/fsrs4anki Wiki · GitHub

First of all, we should do as much math behind the scenes as possible to not make users confused. Not everyone is tech-savvy and math-savvy. Second of all, what would they do after being notified? Tweak parameters manually? That’s not a great idea, and defeats the whole point of having an optimizer in the first place.

Not necessarily. You can press “Evaluate”, write down RMSE, then do the same thing with the default parameters, and compare RMSE. Lower = better.


Here is an update after I checked out the FSRS formulas. The 0.1000 in the image I originally posted is w9 (I wasted like 1 hour before realizing that the first parameter is w0, not w1, lol), appearing in the stability update formula (copied from the reddit post linked in Danika_Dakika’s reply), where the new stability after a successful review includes the factor S^(−w9):

S′ = S · (e^(w8) · (11 − D) · S^(−w9) · (e^(w10·(1−R)) − 1) + 1)
Namely, w9 determines how the growth of intervals slows down as S increases; a small w9 means a mild slowdown. For this purpose, I think w9 = 0.1000 is quite reasonable (going from S = 5 hours to 30 days, S^(−w9) drops from 1.17 to 0.71). To be honest, I don’t see any reason to bound w9 at 0.1000, i.e. w9 < 0.1000 should be possible too. Indeed, I would even consider w9 < 0 to be a thing, because as the spacing grows large, it is more likely for users to encounter the learnt material outside Anki in that period (e.g. when learning a language), which effectively counts as reviews but does not trigger rescheduling. (This is outside the scope of the model itself and risks divergence, so idk if it’s actually a thing though.)
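Those two numbers are easy to check yourself; a quick sketch of just the isolated S^(−w9) term (not the full stability formula):

```python
def stability_decay_factor(s_days: float, w9: float = 0.1) -> float:
    """S ** -w9: the term that damps stability growth as S gets large."""
    return s_days ** -w9

# S = 5 hours = 5/24 days, and S = 30 days, both with w9 = 0.1:
print(round(stability_decay_factor(5 / 24), 2))  # -> 1.17
print(round(stability_decay_factor(30.0), 2))    # -> 0.71
```

So with w9 = 0.1 the growth multiplier shrinks by roughly 40% over that range of S; a smaller (or negative) w9 would make the slowdown milder still.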

That said, I have also seen cases where w9 seemingly should be larger than 0.1000. The images below show the FSRS parameters on another deck I’m learning. The first shows the auto-optimized parameters, where w9 is 0.1000; the second shows the result after manually changing w8 and w9, which somehow outperforms the original parameters (on this specific sample set, of course). idk what I should do with this though.



I wasted like 1 hour before realizing that the first parameter is w0 instead of w1 lol

Welcome to Python, where list indexing starts from 0, lol.

I’ve considered that as well, and I will benchmark it. There might be a very specific scenario where this could be the case. Suppose the user has several cards that interfere with each other, for example, maybe he’s learning English vocabulary and he’s trying to memorize “effective” and “efficient”. And they keep interfering with each other, and as a result, their S never gets higher than a couple of days. But once the user finds a good way to tell them apart, their S can finally grow to weeks/months/years. In this case, the increase in S would be very small initially, when the cards are interfering with each other, and much larger later, once the user resolves the issue. But then again, this needs to be benchmarked.

Also, it’s not very aesthetically pleasing when all parameters are positive and only one of them has a minus sign, but that can be solved by adjusting that parameter by a constant internally, inside FSRS’s code.


I completely forgot to share the result of my benchmark.
[screenshot: benchmark results]
It’s a small improvement, but it’s better than nothing.
