FSRS 5: Optimizer

I feel like this deserves its own topic.

@L.M.Sherlock

  1. Do you know why FSRS-rs is slightly worse than the Python implementation?
  2. I’m being annoying at this point, but please try my idea with seed = n_reviews (see the sketch below). Or at least find the old code you used to plot this; maybe I’ll be able to adapt it to test my idea (tbh, probably not). I get that you are burned out, but if there is a chance that we could improve the optimizer simply by making the seed depend on the number of reviews, that would be awesome.
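For reference, here is a minimal sketch of what I mean, assuming a hypothetical `optimize_fn(revlog, seed)` entry point (the real optimizer's API will differ):

```python
# Sketch of "seed = f(number of reviews)". `optimize_fn(revlog, seed)` is
# a hypothetical stand-in for the real optimizer entry point.
import torch

def optimize_with_review_count_seed(revlog, optimize_fn, base_seed: int = 2023):
    # Deriving the seed from the review count gives different collections
    # different initializations, while keeping any given collection's
    # optimization reproducible.
    seed = base_seed + len(revlog)
    torch.manual_seed(seed)
    return optimize_fn(revlog, seed=seed)
```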

This matter has troubled me for quite some time. I intend to resolve it, but I have not yet identified the cause. It may stem from one of the underlying Rust libraries, which would make the issue exceedingly difficult to pinpoint.

2 Likes

Finally, I located the bug. The Adam optimizer doesn’t work as expected after the parameter clipper is applied in FSRS-rs.

It cost me 8 hours… And I cannot fix it by myself. This bug requires a patch from the upstream library.
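For context on the pattern involved (not the fix): FSRS clips its parameters back into allowed ranges after gradient steps, which is the "parameter clipper" mentioned above. The toy PyTorch sketch below shows that step-then-clip pattern with a made-up loss and made-up bounds; the actual bug lives in how the upstream Rust library's Adam state interacts with the clipping, not in the pattern itself.

```python
# Toy illustration of the step-then-clip pattern, not the actual fsrs-rs
# code (which is Rust, built on an upstream deep-learning library).
import torch

w = torch.nn.Parameter(torch.tensor([0.5]))
opt = torch.optim.Adam([w], lr=4e-2)
lower, upper = 0.01, 10.0  # made-up bounds, not FSRS's real ones

for _ in range(100):
    loss = ((w - 3.0) ** 2).sum()  # stand-in loss, not FSRS's log-loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        w.clamp_(lower, upper)  # the "parameter clipper"
        # Adam keeps per-parameter running moments (exp_avg / exp_avg_sq).
        # The library must keep that state attached to the same parameter
        # across the clip; if clipping replaces the parameter and the state
        # is lost or mismatched, the updates go wrong.
```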

3 Likes

Is there a chance this will be fixed before Dae ships the final stable 24.11 release?

It’s highly unlikely. This bug is caused by a library that fsrs-rs depends on, so a fix requires a new release of that library. Moreover, fsrs-rs relies on an older version of it, and upgrading across breaking changes would involve an unpredictable amount of work.

3 Likes

Regarding seeds, I have a crappy idea.
We could add a new toggle: slow vs. fast optimization. Fast: only one seed. Slow: 10 different seeds, keeping the parameters that result in the lowest RMSE among all ten.
The problem is that “slow” would be only marginally better than “fast” in most cases, at the cost of making the optimizer 10x slower.
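Roughly, in Python, where `optimize_fn(revlog, seed)` and `rmse_fn(params, revlog)` are hypothetical stand-ins for the real optimization and evaluation entry points:

```python
# Sketch of the proposed "slow" mode: best-of-10 seeds by RMSE.
def optimize_slow(revlog, optimize_fn, rmse_fn, n_seeds: int = 10):
    best_params, best_rmse = None, float("inf")
    for seed in range(n_seeds):
        params = optimize_fn(revlog, seed=seed)  # one full optimization run
        rmse = rmse_fn(params, revlog)           # evaluate the result
        if rmse < best_rmse:
            best_params, best_rmse = params, rmse
    return best_params
```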
Also:
Upside: the user can now make a choice.
Downside: the user now has to make a choice.
The old flexibility vs. simplicity dilemma.
@sorata @vaibhav thoughts?

1 Like

I am team simplicity; flexibility can go down the drain.

1 Like

I don’t think adding such a toggle would be worth it. People would feel that they are missing out if they don’t enable slow optimization, but the improvement in RMSE won’t be proportional to the extra time spent.

By the way, does the greatest improvement in RMSE from different seeds occur only in small collections? If so, we could try multiple seeds in smaller collections but only a single seed in larger ones.
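For example (the 5,000-review threshold below is a made-up placeholder, not a measured cutoff):

```python
# Sketch: spend the extra seeds only where they might pay off.
def seed_count(n_reviews: int) -> int:
    return 10 if n_reviews < 5_000 else 1  # 5,000 is a placeholder cutoff
```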

Good question, I haven’t measured it. I’ll do it, but it will take a while.

1 Like

Not to be a jerk about it, but that was my idea 🙂

Did you test it and find a positive effect? I tested it in the thread that I just linked to, and I did not find that it improved anything.

I’d say we came up with it simultaneously.
Anyway, I couldn’t figure out a way to test it. I’m not that great at coding.

Maybe instead of having the toggle in the regular deck options, it could be somewhere in Anki’s preferences, like in the Review tab. That way, people who care about the possibly marginally better parameters can toggle it, and it wouldn’t add any complexity to the deck options for regular users.

And if/when automatic parameter optimization gets added to Anki, you can have settings related to that in the same area in preferences to keep things consistent.

Can you do something like you did for CMRR? Say, if optimization with the slow method takes 5 seconds, it’s probably worth trying; if it takes 20 seconds, optimize with one seed.

I’ll bet that’s not happening.

That’s what I will probably do, if it turns out that there is a significant (both statistically and practically) difference between using 1 and 10 seeds.
I’ll make the number of “runs” depend on the number of reviews, roughly as sketched below.
(well, I’ll ask Jarrett to implement it)
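Generalizing the small-vs-large split from earlier into a graduated schedule; every cutoff and run count here is a placeholder, not a measured value:

```python
# Sketch of "number of runs depends on the number of reviews".
def run_count(n_reviews: int) -> int:
    if n_reviews < 1_000:
        return 10  # small collections: extra runs are cheap and matter most
    if n_reviews < 10_000:
        return 5
    return 1  # large collections: one run; the seed barely matters
```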

What about the number of presets? Wouldn’t it be extremely painful if you had to optimize parameters for 240+ different presets?

Don’t make 240 presets, buddy.
But seriously though, it’s the number of reviews that matters. The amount of time spent on optimization depends on the number of reviews, not on the number of presets.

2 Likes

A bit unrelated… is there any progress on this?

No, but we will get a new dataset soon, and then we’ll measure how much the metrics improve if parameters are optimized for every preset/deck individually.
However, it will be difficult to figure out the optimal number of decks/presets. We will be able to calculate how much of a difference it makes on average (across 10k users with all kinds of decks and presets), but not which number gives you the biggest difference (or at least right now I can’t think of a way to do that).

We’ll also be able to identify siblings, but that doesn’t matter because:

  1. Even if we can incorporate sibling reviews into the optimization “on paper”, we can’t actually do that in Anki
  2. I can guarantee you Jarrett will be like “I’m taking a break, f*** your siblings and their reviews, call me again after 10 years”
1 Like