When/How to separate presets for FSRS

I started to use Anki by importing cards from SM with no repetition history, and have used FSRS from the beginning with default parameters. I have 3 decks with more than 400 reviews each, a few more with close to 400 each, and the remaining 7 decks have around 600 reviews in total (I don’t think there is a way to export a graph showing the distribution of reviews by deck). Some decks I haven’t used yet.

I have a single preset for all 30 decks. I believe I used the “Optimize” button only once, when it was recommended for all users due to an Anki version upgrade.

How do I determine how many presets to have for the purposes of FSRS, and which decks to combine into which presets? E.g., if I am looking at statistics per deck, what exactly am I looking for?

Or is it recommended to run this code (once I figure out how to do it), and if so, what do I look for in the results?

Version 24.04.1 (ccd9ca1a)


That code is experimental; forget about it.
Separating material into presets is subjective. If you feel like the material is different, make a different preset.
Also, please read this: fsrs4anki/docs/tutorial.md at main · open-spaced-repetition/fsrs4anki · GitHub

A better person to ask this is L.M.Sherlock, the man himself.

As for the main concern, here is what Sherlock says,

I haven’t found out an objective method. Here is an initial experiment: GitHub - open-spaced-repetition/fsrs-when-to-separate-presets

I say this a lot, but it always blows my mind how similarly most humans think, because that’s the same question I asked Sherlock. Here’s the issue: [Question]: Questions regarding Optimisation · Issue #646 · open-spaced-repetition/fsrs4anki · GitHub

Edit: But @Expertium saying it’s “subjective” generally means the truth depends on who is being asked. Here, even though the impression may be of subjectivity, there must lie undiscovered objective standards.

I saw that, but it just says:

If you have decks that vary wildly in difficulty, it is recommended to use separate presets for them because the parameters for easy decks and hard decks will be different.

In Card Difficulty under Stats, would I just look at “average difficulty” for each deck (and maybe combine decks with similar averages into one preset for now), or at what the distribution of difficulty looks like?

It means subjective difficulty, not FSRS difficulty. I don’t know how to word that better. Suggestions are welcome.

That’s the best way to phrase it, IMO. Expertium is saying: go with your gut.

But one more thing, and I’m calling this the pattern of learning. Say in maths, you need to practice the same thing again and again, but once learned, the memory becomes very stable. This is different with vocabulary, because there’s no “understanding” part. So partly it will depend on the essence or nature of the material.

Edit: BTW, saying “difficulty” might confuse people, because the same type of material often has different difficulty levels. Different cards will feel different, etcetera.

Why?

  • if all cards in a subdeck with many review cards sit around 60% FSRS difficulty, then the cards in the other subdecks vary too much for a preset tuned to this subdeck (assuming its difficulty is not actually that uniform).
  • I think Sherlock said that cards with 0% or 100% FSRS difficulty are scheduled worse than others.
  1. The FSRS difficulty is relative. You cannot compare two cards’ difficulty unless they are in the same preset.
  2. I haven’t said that. Actually, FSRS performs well on cards with high difficulty:

(image: graph of FSRS performance on high-difficulty cards)


I meant in the same preset (because this topic is about separating presets).


I think the graph above shows that it performs well enough for those cards.

@sorata @DerIshmaelite here are the results:

Correlation coefficient=-0.056

The correlation coefficient between the average RMSE and the number of presets is virtually 0. Visually, I was expecting to see a U-shaped curve with a minimum that corresponds to the best number of presets, but nope. And according to the benchmark, RMSE is actually 3-4% worse (relative) when FSRS is optimized on several presets rather than on the entire collection.
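The correlation described here is the standard Pearson coefficient between two per-collection quantities. A minimal sketch of that calculation, with invented placeholder data (the real analysis used the benchmark’s ~10,000 collections):

```python
import statistics

# Hypothetical per-collection data: (number of presets, average RMSE).
# These values are made up purely for illustration.
data = [(1, 0.052), (2, 0.048), (5, 0.055), (10, 0.050), (30, 0.053)]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

n_presets = [n for n, _ in data]
rmses = [r for _, r in data]
r = pearson(n_presets, rmses)
print(f"Correlation coefficient = {r:.3f}")
```

A value near 0, as reported above, means the number of presets tells you essentially nothing about the average RMSE.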

Note that I can’t extract the number of reviews for each preset from the file Jarrett gave me, only the total number of reviews across all presets.

P.S. Out of 9999 collections, the maximum number of presets is 130. So DerIshmaelite, your 273 (or whatever number it was) is literally off the charts.

EDIT: here’s the difference between the RMSE of FSRS-5 optimized on the entire collection and the average RMSE of FSRS-5 optimized on each preset.

Average difference (unweighted)=0.003
Average difference (weighted by n reviews of each user)=0.002

Positive difference means that FSRS-5-preset is worse than FSRS-5.
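The weighted vs. unweighted averages above differ only in whether each user’s difference counts equally or in proportion to their review count. A minimal sketch with invented per-user numbers (the real values come from the benchmark):

```python
# Hypothetical per-user RMSE differences (collection-level FSRS-5 minus
# preset-level FSRS-5) and each user's review count; values are invented.
diffs = [0.004, -0.001, 0.006, 0.002]
n_reviews = [1200, 300, 5000, 800]

# Unweighted: every user counts equally.
unweighted = sum(diffs) / len(diffs)

# Weighted: users with more reviews contribute proportionally more.
weighted = sum(d * n for d, n in zip(diffs, n_reviews)) / sum(n_reviews)

print(f"unweighted={unweighted:.4f}, weighted={weighted:.4f}")
```

Both averages being slightly positive is what supports the conclusion that per-preset optimization was, on average, marginally worse.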


The most obvious reason is a limitation of the app (or maybe I’m missing something): you can’t set a different Desired Retention for different decks using the same preset. So if you want a different DR for a deck, you have to put it in a different preset. Someone let me know if that’s wrong, because I’d like to change what I’m doing if so.

Another reason: I set up different presets for cards that have very different study “experiences”, or whatever the right word would be. Any cards that just involve seeing the front and trying to remember the back all go in the main default preset, which has the vast majority of my cards. Here are a couple of examples of presets that don’t fit that model:

Writing Chinese characters: I feel like the experience of these cards is different enough that it warrants its own preset. I’m not just looking and thinking, I’m writing the characters and seeing if I got them right.

Coding: I have flashcards that prompt me for a certain python or R function, and I have to use that function in my IDE and it has to work. I only do a few of these a day because each is time consuming. I may combine these with the writing character flashcards eventually because they both involve “doing” something physically instead of just thinking.

Colors: I have a deck that shows you a color and you have to remember what it is. This deck is wildly different from all the other decks, because it’s really hard to get all the different shades of red, blue, etc. correct. Like, I’ll guess navy blue and it will really be indigo, which looks almost exactly the same but not quite. For this deck, I’ve allowed myself to make two guesses, and if one is right I mark it right. The data for these cards is so different that I know it would skew my normal cards’ parameters, so I keep them separate.

My guess is most people will only need one preset, unless they really want a deck on a different DR.

The problem with what was done today is that you’ll get higher RMSE when you calculate it on a smaller number of reviews. That’s one of the factors that affect RMSE. See the linked GitHub issue.

Totally agree. I study for the 漢字検定試験 and that’s something I always found it harder to do with SM2 than other materials.

What was done today?

Edit: Oh, you mean Expertium’s post

I was asking you to read this: Discussing the new dataset and benchmark · Issue #129 · open-spaced-repetition/srs-benchmark · GitHub

1. Create one collection-preset.
2. Optimise on everything.
3. Select a deck.
4. Clone the current preset.
5. In the new preset, click Optimise.
6. If you get new parameters, you're better off. 

Jarrett says it’s too much work to do this, but I think if you do this you always get a better RMSE (given you have at least several hundred reviews in that preset).

I’m not sure that’s true. Did you test that somehow?

The problem is you can’t really test the RMSE on those particular cards while they’re in the old preset, because evaluating a preset always evaluates all of its cards; you can’t test subsets.

My guess is that certain subsets would actually have a higher RMSE within the larger preset, but the larger preset as a whole has a lower RMSE. So you are still getting an improvement when you give them their own preset, but you can’t really test that.


If I have a large number of reviews then yes. And I agree with your other post too. RMSE is a wacky thing.

AFAIK, Evaluate tests the cards the query finds. So if you optimized on a query that finds something other than what you want to evaluate, you need to change or remove the query to test the parameters. Also, “deck:current” applies to the deck that is actually current, not necessarily the one whose gear button you clicked.