When/How to separate presets for FSRS

I started to use Anki by importing cards from SM with no repetition history, and have used FSRS from the beginning with default parameters. I have 3 decks with more than 400 reviews each, a few more with close to 400 each, and the remaining 7 decks have around 600 reviews in total (I don’t think there is a way to export a graph showing the distribution of reviews by deck). Some decks I haven’t used yet.

I have a single preset for all 30 decks. I believe I used the “Optimize” button only once, when it was recommended for all users due to an Anki version upgrade.

How do I determine how many presets to have for the purposes of FSRS, and which decks to combine into which presets? E.g., if I am looking at statistics per deck, what exactly am I looking for?

Or is it recommended to run this code (once I figure out how to do it), and if so, what do I look for in the results?

Version 24.04.1 (ccd9ca1a)


That code is experimental; forget about it.
Separating material into presets is subjective. If you feel like the material is different, make a different preset.
Also, please read this: fsrs4anki/docs/tutorial.md at main · open-spaced-repetition/fsrs4anki · GitHub

A better person to ask this is L.M.Sherlock, the man himself.

As for the main concern, here is what Sherlock says,

I haven’t found out an objective method. Here is an initial experiment: GitHub - open-spaced-repetition/fsrs-when-to-separate-presets

I say this a lot, but it always blows my mind how similarly most humans think, because that’s the same question I asked Sherlock. Here’s the issue: [Question]: Questions regarding Optimisation · Issue #646 · open-spaced-repetition/fsrs4anki · GitHub

Edit: But @Expertium saying it’s “subjective” generally means the truth depends on who is being asked. Here, even though the impression may be of subjectivity, there must lie undiscovered objective standards.

I saw that, but it just says:

If you have decks that vary wildly in difficulty, it is recommended to use separate presets for them because the parameters for easy decks and hard decks will be different.

In Card Difficulty under Stats, would I just look at “average difficulty” for each deck (and maybe combine decks with similar averages into one preset for now), or at what the distribution of difficulty looks like?

It means subjective difficulty, not FSRS difficulty. I don’t know how to word that better. Suggestions are welcome.

That’s the best way to phrase it, IMO. Expertium is saying: go with your gut.

But one more thing, and I’m calling this the pattern of learning. Say in maths, you need to practice the same thing again and again, but once learned, the memory becomes very stable. This is different with vocabulary, because there’s no “understanding” part. So partly it will depend on the essence or nature of the material.

Edit: BTW, saying “difficulty” might confuse people, because the same type of material often has different difficulty levels. Different cards will feel different, etcetera.

Why?

  • if all cards in a subdeck with many review cards sit around 60% FSRS difficulty, then the cards in the other subdecks vary too much for a preset tuned to this subdeck (assuming its difficulty is not actually that uniform).
  • I think Sherlock said that cards with 0% or 100% FSRS difficulty are scheduled worse than others.
  1. The FSRS difficulty is relative. You cannot compare two cards’ difficulty unless they are in the same preset.
  2. I haven’t said that. Actually, FSRS performs well on cards with high difficulty:

(image: graph of FSRS performance on high-difficulty cards)


I meant in the same preset (because this topic is about separating presets).


I think the graph above shows that it performs well enough for those cards.

@sorata @DerIshmaelite here are the results:

Correlation coefficient=-0.056

The correlation coefficient between the average RMSE and the number of presets is virtually 0. Visually, I was expecting to see a U-shaped curve with a minimum that corresponds to the best number of presets, but nope. And according to the benchmark, RMSE is actually 3-4% worse (relative) when FSRS is optimized on several presets rather than on the entire collection.
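The correlation described here is the standard Pearson coefficient between two per-collection quantities. A minimal sketch of that calculation, with invented placeholder data (the real analysis used the benchmark’s ~10,000 collections):

```python
import statistics

# Hypothetical per-collection data: (number of presets, average RMSE).
# These values are made up purely for illustration.
data = [(1, 0.052), (2, 0.048), (5, 0.055), (10, 0.050), (30, 0.053)]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

n_presets = [n for n, _ in data]
rmses = [r for _, r in data]
r = pearson(n_presets, rmses)
print(f"Correlation coefficient = {r:.3f}")
```

A value near 0, as reported above, means the number of presets tells you essentially nothing about the average RMSE.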

Note that I can’t extract the number of reviews for each preset from the file Jarrett gave me, only the total number of reviews across all presets.

P.S. Out of 9999 collections, the maximum number of presets is 130. So DerIshmaelite, your 273 (or whatever number it was) is literally off the charts.

EDIT: here’s the difference between the RMSE of FSRS-5 optimized on the entire collection and the average RMSE of FSRS-5 optimized on each preset.

Average difference (unweighted)=0.003
Average difference (weighted by n reviews of each user)=0.002

Positive difference means that FSRS-5-preset is worse than FSRS-5.
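The weighted vs. unweighted averages above differ only in whether each user’s difference counts equally or in proportion to their review count. A minimal sketch with invented per-user numbers (the real values come from the benchmark):

```python
# Hypothetical per-user RMSE differences (collection-level FSRS-5 minus
# preset-level FSRS-5) and each user's review count; values are invented.
diffs = [0.004, -0.001, 0.006, 0.002]
n_reviews = [1200, 300, 5000, 800]

# Unweighted: every user counts equally.
unweighted = sum(diffs) / len(diffs)

# Weighted: users with more reviews contribute proportionally more.
weighted = sum(d * n for d, n in zip(diffs, n_reviews)) / sum(n_reviews)

print(f"unweighted={unweighted:.4f}, weighted={weighted:.4f}")
```

Both averages being slightly positive is what supports the conclusion that per-preset optimization was, on average, marginally worse.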


The most obvious reason is a limitation of the app (or maybe I’m missing something): you can’t set a different Desired Retention for different decks using the same preset. So if you want a different DR for a deck, you have to put it in a different preset. Someone let me know if that’s wrong, because I’d like to change what I’m doing if so.

Another reason: I set up different presets for cards that have very different study “experiences”, or whatever the right word would be. Any cards that just involve seeing the front and trying to remember the back all go in the main default preset, which has the vast majority of my cards. Here are a couple of examples of presets that don’t fit that model:

Writing Chinese characters: I feel like the experience of these cards is different enough that it warrants its own preset. I’m not just looking and thinking, I’m writing the characters and seeing if I got them right.

Coding: I have flashcards that prompt me for a certain python or R function, and I have to use that function in my IDE and it has to work. I only do a few of these a day because each is time consuming. I may combine these with the writing character flashcards eventually because they both involve “doing” something physically instead of just thinking.

Colors: I have a deck that shows you a color and you have to remember what it is. This deck is wildly different from all the other decks, because it’s really hard to get all the different shades of red, blue, etc. correct. Like, I’ll guess navy blue and it will really be indigo, which looks almost exactly the same but not quite. For this deck, I’ve allowed myself to make two guesses, and if one is right I mark it right. The data for these cards is so different that I know it would skew my normal cards’ parameters, so I keep them separate.

My guess is most people will only need one preset, unless they really want a deck on a different DR.

The problem with what was done today is that you’ll get higher RMSE when you calculate it on a smaller number of reviews. That’s one of the factors that affect RMSE. See the linked GitHub issue.

Totally agree. I study for the 漢字検定試験 and that’s something I always found it harder to do with SM2 than other materials.

What was done today?

Edit: Oh, you mean Expertium’s post

I was asking you to read this: Discussing the new dataset and benchmark · Issue #129 · open-spaced-repetition/srs-benchmark · GitHub

1. Create one collection-preset.
2. Optimise on everything.
3. Select a deck.
4. Clone the current preset.
5. In the new preset, click Optimise.
6. If you get new parameters, you're better off. 

Jarrett says it’s too much work to do this, but I think if you do this you always get a better RMSE (given you have at least several hundred reviews in that preset).

I’m not sure that’s true. Did you test that somehow?

The problem is you can’t really test the RMSE on those particular cards while they’re in the old preset, because evaluating a preset always evaluates all of its cards; you can’t test subsets.

My guess is that certain subsets would actually have a higher RMSE within the larger preset, but the larger preset as a whole has a lower RMSE. So you are still getting an improvement when you give them their own preset, but you can’t really test that.


If I have a large number of reviews then yes. And I agree with your other post too. RMSE is a wacky thing.

AFAIK, Evaluate tests the cards the query finds. So if you optimized on a query that finds something other than what you want to evaluate, you need to change or remove the query to test the parameters. Also, “deck:current” applies to the deck that is actually current, not necessarily the one whose gear button you clicked.