Measures to prevent new FSRS parameters from being worse than the old ones

A lot of users have been reporting that the log-loss and RMSE are worse after optimization. While I don’t know why this is happening, I have two suggestions:

  1. If the user has used the optimizer at least once, use his last parameters as a starting point for the new optimization rather than the default ones. I believe right now every time the user runs the optimizer, the default parameters are used as a starting point, which is suboptimal. Using that user’s last parameters can improve convergence, since the starting point will be closer to the optimum.
  2. Run the optimizer, evaluate both old and new parameters, and keep whichever results in a lower RMSE. Though the issue is that it’s possible for RMSE to decrease while log-loss increases, and vice versa. @L.M.Sherlock, I still need your opinion on this.
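To make the concern in point 2 concrete, here is a minimal sketch of how the two metrics can disagree. It is not the FSRS implementation: `binned_rmse` groups reviews by rounded-down predicted probability, which is a simplifying assumption (FSRS uses its own binning), and the predictions and outcomes are made up for illustration.

```python
import math
from collections import defaultdict

def log_loss(preds, outcomes):
    # Mean negative log-likelihood of the observed outcomes (1 = recalled, 0 = forgotten).
    return -sum(math.log(p if y else 1 - p) for p, y in zip(preds, outcomes)) / len(preds)

def binned_rmse(preds, outcomes):
    # Hypothetical calibration metric: bin by predicted probability (assumption),
    # then compare mean prediction with mean outcome per bin, weighted by bin size.
    bins = defaultdict(list)
    for p, y in zip(preds, outcomes):
        bins[int(p * 10)].append((p, y))
    total = 0.0
    for items in bins.values():
        mean_p = sum(p for p, _ in items) / len(items)
        mean_y = sum(y for _, y in items) / len(items)
        total += len(items) * (mean_p - mean_y) ** 2
    return math.sqrt(total / len(preds))

outcomes = [1] * 8 + [0] * 2           # made-up review history: 8 recalled, 2 forgotten
preds_a = [0.8] * 10                   # well calibrated on average, but not discriminating
preds_b = [0.95] * 8 + [0.3] * 2       # sharper predictions, slightly miscalibrated per bin

for name, preds in (("A", preds_a), ("B", preds_b)):
    print(name, round(log_loss(preds, outcomes), 4), round(binned_rmse(preds, outcomes), 4))
```

In this toy case, set A wins on the binned RMSE while set B wins on log loss, so "keep the better one" needs a rule for which metric decides.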

I think I need to complete this to-do first:

It will help me figure out how often the new parameters are worse than the old ones, and why.

My paper deadline is Feb 8th, 2024, so until then I can only take on small tasks that can be done in a few hours.

Also, it seems that at least a few users don’t know that you have to click “Save” to apply parameters after optimization. @dae how about this: if the user presses “Optimize” and the old parameters have been replaced with the new ones, then when he tries to close the settings window without saving, a pop-up will ask him something like, “Are you sure you want to quit without saving changes?”
I thought this was already implemented based on this comment, but I’m using 23.12.1 and there is no such window.

That comment was about the card layout screen.

For what it’s worth (n=1), this happens to me consistently, and it feels like it has been happening since 23.12, though I can’t be 100% sure. Not only were the 23.10 numbers better, but I also often get worse results if I generate new weights with 23.12 again after a few days. Of course I shouldn’t optimize so often anyway, but I’m just checking daily now to see how it varies. All of the RMSEs are below 5%, though, so I’m definitely in the good range.

One specific example: a jump from 2.89% (already optimized with 23.12) to 3.75% after a few days. Another one (same situation): 3.50% to 3.83%. Two other decks currently stay the same after optimization. From 23.10 to 23.12 there were some relatively big jumps, but I haven’t saved the numbers to share. If I recall correctly, one of them jumped from 2.x% to 4.x%.

To get the before and after numbers, I pressed evaluate first, optimized, evaluate again, so both numbers are from the same day before and after optimization.

According to my recent analysis, the new parameters are better than the old ones in only 60% of cases if you re-optimize every 1000 reviews.

If you don’t have any better ideas, I suggest implementing my second idea.

The next Anki release is not due soon. So, I think that it is better, at least for now, to let @L.M.Sherlock experiment more (whenever he finds time) and try to come up with a proper solution.

Before we can implement this, we’ll need to know how we should handle RMSE/log loss moving in different directions. Perhaps there should also be a change threshold (e.g. only small increase) where we don’t treat increases as a regression?

I think we should only consider the RMSE. As for the threshold, maybe a relative increase of up to 5% is acceptable.
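One possible reading of this rule, as a minimal sketch: compare only RMSE, and accept the new parameters unless their RMSE is more than 5% higher (relative) than the old one. The function name `choose_parameters` and the `tolerance` argument are hypothetical, and how the threshold should actually be applied has not been settled in this thread.

```python
def choose_parameters(old_rmse: float, new_rmse: float, tolerance: float = 0.05) -> str:
    """Return "new" unless the new RMSE is a regression beyond the tolerance.

    tolerance is the allowed relative increase (0.05 = 5%); this is an
    assumption about how the threshold would be applied, not a confirmed design.
    """
    if new_rmse <= old_rmse * (1 + tolerance):
        return "new"
    return "old"

# RMSE values expressed as fractions (0.035 = 3.5%).
print(choose_parameters(0.035, 0.030))   # improvement
print(choose_parameters(0.035, 0.036))   # ~2.9% relative increase, within tolerance
print(choose_parameters(0.0289, 0.0375)) # ~30% relative increase, beyond tolerance
```

Setting `tolerance=0` gives the strict "keep whichever has the lower RMSE" behaviour from the original suggestion.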

+1

Honestly, this:

  • Run the optimizer, evaluate both old and new parameters, and keep whichever results in a lower RMSE.

sounds like an idea that is practical and should be easy to implement until the optimization itself gets fixed/improved/changed.

This is literally how I optimize right now: since 23.12, whenever I optimize, my new parameters are usually worse than the ones I optimized before (I would say in > 80 % of the cases, as opposed to 40 % in @L.M.Sherlock’s simulation). So I always copy the last parameters before optimizing, so I can go back to them if optimizing does not improve the RMSE.

I once had RMSE increase twofold, while Log loss didn’t change much.

Most of my deteriorations are small-ish, like going from 2.17 % to 2.32 % (7 % relative) or from 5.42 % to 6.09 % (12 % relative), but some are huge, going from 6.38 % to 8.59 % (35 % worse relative!). These are just some examples from optimizing two random presets right now.
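For reference, the relative figures above are just (new − old) / old. A quick check of the three examples:

```python
def relative_change(old: float, new: float) -> float:
    # Relative change of an RMSE value, as a fraction of the old value.
    return (new - old) / old

examples = [(2.17, 2.32), (5.42, 6.09), (6.38, 8.59)]  # RMSE in percent, from the post above
changes = [relative_change(old, new) for old, new in examples]

for (old, new), change in zip(examples, changes):
    print(f"{old:.2f} % -> {new:.2f} %: {change * 100:.1f} % relative increase")
```

With a 5 % relative threshold like the one suggested earlier, all three of these would count as regressions.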

(Actually, above I said parameters get worse in > 80 % but now I think for me it’s more like >> 90 %. I have optimized a lot of decks and only remember actually getting better params once or twice.)

According to his analysis, the chances of getting better/worse parameters depend on the number of reviews done in between the two optimizations.

  • If you re-optimize after doing 1000 reviews, there is a 37% chance that you will get worse parameters.
  • If you re-optimize after doing 2000 reviews, there is a 29% chance that you will get worse parameters.
  • If you re-optimize after doing 4000 reviews, there is only a 14% chance that you will get worse parameters.

Deducing from this observation, if you re-optimize more often (say, after 100 reviews), you are more likely to get worse parameters than better parameters.

That makes sense, but if I wait with re-optimizing for, say, 2000 reviews, then I do 2000 reviews with non-optimal parameters (instead of only 100 or 500). As I see it, there is no downside to frequent optimization as long as the total number of reviews used for optimization is large enough (guaranteed, since we can only optimize after 1000 reviews).

That’s the problem - there is a downside, according to LMSherlock’s testing.

According to his analysis, the chances of getting better/worse parameters depend on the number of reviews done in between the two optimizations.

I can imagine that this is true, as the old parameters will possibly be less optimal for the newer reviews, which make up an increasingly large portion of the total number of reviews. Very useful to see this confirmed.

What I observe (with a modest number of reviews since the last optimization), is that the RMSE can go from 3.5% to 3% at one moment, but if you don’t apply the new parameters and do only about 15 extra reviews, the RMSE after optimization might increase to 4% instead, so it can be very sensitive to small changes. I assume that this is because the optimization gets stuck in a local minimum. I mean, if the old parameters are better, it means that it should in theory be possible to come up with the same or better parameters, right? I get that looking for the global minimum might take hours/days/weeks/years/eons, but is there any room for improvements there?

That’s something I’m wondering about myself. Unfortunately, I don’t know how to check whether FSRS gets stuck in a local minimum. I’m not sure if there is a way to check whether the optimizer has found the local or global minimum, aside from trying an unimaginably large number of combinations of parameters and evaluating every single one of them, which is obviously not feasible.
Perhaps @L.M.Sherlock has some thoughts on this.
One option that we could try is running the optimizer multiple times, with different initial parameters, and then choosing the best result out of all runs. Since the starting points will be different, the chances that every single time the optimizer will be stuck in a local minimum are lower. But that would make it much slower.
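A minimal sketch of the multi-start idea on a toy problem. This has nothing to do with the real FSRS loss: `loss` is a made-up one-dimensional function with one local and one global minimum, and plain gradient descent stands in for the actual optimizer. It only shows why different starting points can end in different minima, and how picking the best run helps.

```python
def loss(x: float) -> float:
    # Toy objective (assumption): global minimum near x = -1.30, local minimum near x = 1.13.
    return x**4 - 3 * x**2 + x

def grad(x: float) -> float:
    # Analytical derivative of the toy objective.
    return 4 * x**3 - 6 * x + 1

def gradient_descent(start: float, lr: float = 0.01, steps: int = 2000) -> float:
    x = start
    for _ in range(steps):
        x -= lr * grad(x)
    return x

starts = [-2.0, 0.5, 2.0]                      # hypothetical initial parameters
results = [gradient_descent(s) for s in starts]
best_x = min(results, key=loss)                # keep the run with the lowest loss

for s, x in zip(starts, results):
    print(f"start {s:+.1f} -> x = {x:+.3f}, loss = {loss(x):.3f}")
print(f"best: x = {best_x:+.3f}")
```

Two of the three starts end in the worse minimum here, and only the run from -2.0 finds the better one. The cost is as stated above: three runs take roughly three times as long.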

What’s the downside? Yes, you get a higher chance of a worse parameter set if you optimize frequently. But with your suggestion (use the better one, either old or new), I don’t see the downside. Both parameter sets are evaluated on the same reviews and we should always use the one that fits best at the current time. Using that one doesn’t make the next optimization worse (it has no impact on it). It lowers the chance that the next optimization improves the parameter set—but only because we already have a better fit (which is a good thing).

Example with made-up numbers:

  1. Assume we have a preset with 5,000 reviews and a current RMSE of 8.70 % (parameter set 1).
  2. Now, at 5,500 reviews, I optimize. The new optimization yields an RMSE of 8.10 % (parameter set 2). The old parameters (parameter set 1) have an RMSE of 8.75 % now (the 500 extra reviews give us more data and RMSE might change).
  3. At 6,000 reviews, I optimize again. The new optimization yields an RMSE of 8.30 % (parameter set 3). At this point, parameter set 1 has an RMSE of 8.55 % and parameter set 2 has an RMSE of 8.05 % (more reviews for evaluation → same parameters have a slightly different fit; could get better or worse).

Now, here is the point:

  • For the optimization at 6,000 reviews, it does not matter which parameter set I chose to use between 5,500 and 6,000 reviews (either parameter set 1 or 2). The optimization is just done on the reviews available at this point.
  • The only thing that changes is the relative comparison:
    • If we do not optimize at 5,500 reviews, we compare the new parameter set at 6,000 reviews (parameter set 3) with parameter set 1: We go from 8.55 % to 8.30 % RMSE → boom, better!
    • However, if we do optimize at 5,500 reviews, then we compare parameter set 3 with parameter set 2: We go from 8.05 % to 8.30 % RMSE → oh no, we got worse!
  • But the absolute fit will be the same: In both cases, parameter set 3 (optimization after 6,000 reviews) gives us an RMSE of 8.30 %. This does not get worse by optimizing in the middle, it just looks worse because parameter set 2 is even better (which we would not know if we skipped the optimizing step at 5,500 reviews).
  • What actually gets slightly worse if we do not optimize at 5,500 reviews: We do 500 extra reviews with slightly worse scheduling, resulting in unnecessary work.

Of course, this is an example where optimizing after 500 extra reviews helps. But with your proposal (keep the better parameter set), there is no disadvantage: If the optimization yields a worse parameter set after 5,500 reviews, we just ignore it and keep using parameter set 1.
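The same made-up example as a small sketch, using the figures from the comparison bullets above (8.55 % for parameter set 1 and 8.05 % for parameter set 2 at 6,000 reviews). The names `rmse_at`, `new_set_at` and `run_keep_better` are hypothetical; the function simply applies "keep whichever has the lower RMSE" at each point where we optimize.

```python
# RMSE (in percent) of each parameter set, evaluated on all reviews available at that point.
rmse_at = {
    5500: {"set1": 8.75, "set2": 8.10},
    6000: {"set1": 8.55, "set2": 8.05, "set3": 8.30},
}
# Parameter set produced by optimizing at each review count.
new_set_at = {5500: "set2", 6000: "set3"}

def run_keep_better(optimize_points):
    current = "set1"  # parameters in use before the first re-optimization
    for n in optimize_points:
        candidate = new_set_at[n]
        # Keep the new parameters only if they fit the current reviews better.
        if rmse_at[n][candidate] < rmse_at[n][current]:
            current = candidate
    return current, rmse_at[6000][current]

print(run_keep_better([5500, 6000]))  # optimize at both points
print(run_keep_better([6000]))        # skip the optimization at 5,500 reviews
```

With these numbers, optimizing at both points ends on parameter set 2 (8.05 %), while skipping the step at 5,500 reviews ends on parameter set 3 (8.30 %). The "failed" optimization at 6,000 reviews costs nothing under the keep-better rule.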