21 parameters seems like a very low number for a model in a world of billion-parameter models. I read that parameters are tested in simulations to see how much they improve predictions. But what if a parameter improves predictions only for a very small percentage of users, yet in absolute terms that is still hundreds or thousands of users who get better predictions?
I suspect that parameters are quite cheap. So why not, when in doubt, add a parameter for anything that might improve predictions?
A parameter isn’t something you can just add. I mean, it is, but a useful parameter is very different.
In the space of theoretical "parameters" that one could add, approximately 100% of them will do absolutely nothing to improve a given model. The number of parameters that actually do anything is extremely small. I'm surprised it's as high as 21.
I'll add that there are diminishing returns on the benefit of adding more parameters, and adding them can be costly: more parameters can cause overfitting (which gives you a worse result). Most Anki collections don't have that much training data in the grand scheme of things (at least if you want to optimize for a single individual's review history).
You want the model to generalize the patterns that exist in the data, not match the data perfectly. As an example, imagine fitting a polynomial to n data points. For n data points you can always find a polynomial of order n-1 that matches them exactly (i.e. f(x_i) = y_i for each data point). That doesn't make it useful for generalizing a pattern (see e.g. this figure, where fitting a polynomial to a bell curve gives huge deviations between the data points).
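To make that concrete, here is a tiny numpy sketch of the same idea (just an illustration, not related to FSRS): interpolate n points taken from a bell curve with a degree n-1 polynomial and compare the error at the points with the error between them.

```python
# Minimal sketch: a degree n-1 polynomial through n points from a bell curve
# matches the points (almost) exactly, but deviates a lot between them.
import numpy as np

n = 10
x = np.linspace(-3, 3, n)
y = np.exp(-x**2)                         # samples of a bell curve

coeffs = np.polyfit(x, y, deg=n - 1)      # degree n-1: enough to hit every point
fit = np.poly1d(coeffs)

x_dense = np.linspace(-3, 3, 1000)
print("error at the sample points:", np.max(np.abs(fit(x) - y)))
print("error between the points:  ", np.max(np.abs(fit(x_dense) - np.exp(-x_dense**2))))
```

The first error is near zero by construction; the second one is what generalization actually cares about.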
So the main points are:
The fewer data points you have, the fewer parameters you can optimize without overfitting.
Most review histories have a relatively small number of data points (especially compared to the billion-parameter models you see in e.g. LLMs or image classifiers).
I've actually trained a neural net with 1k parameters, and another guy on GitHub trained an NN with 9k parameters for our benchmark (GitHub - open-spaced-repetition/srs-benchmark: A benchmark for spaced repetition schedulers/algorithms, LSTM), and it outperforms FSRS, so overfitting isn't that big of an issue. Plus, regularization exists for a reason.
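(For anyone wondering what "regularization" means here: the most common form is just a penalty on large weights, e.g. weight decay, which in PyTorch is a single optimizer argument. Below is a minimal sketch; it is not the srs-benchmark code, and the layer sizes and data are made up.)

```python
# Illustration only (not the srs-benchmark code): L2 regularization via
# weight decay discourages large weights, so a small net trained on a
# small review history tends to generalize instead of memorizing it.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
# weight_decay adds an L2 penalty on the weights during optimization
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

x = torch.randn(256, 4)                     # stand-in for per-review features
y = torch.randint(0, 2, (256, 1)).float()   # stand-in for recall (1) / forget (0)

for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.binary_cross_entropy_with_logits(model(x), y)
    loss.backward()
    opt.step()
```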
He also made a neural net with 2.7 million parameters, but that one is different: it's not optimized on the data of every individual user, it's "pretrained" on 5 thousand users and evaluated on the other 5 thousand, and that is repeated twice to cover the entire dataset. I'm actually surprised that it works so well. This is a different approach compared to all other algorithms that we have benchmarked, which are trained on each user individually.
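In other words, it's roughly two-fold cross-validation at the user level. Here's a sketch of the scheme (my simplification; train_fn and eval_fn are placeholders, not functions from the benchmark repo):

```python
# Rough sketch of the split described above (my simplification, not the
# benchmark's actual code): pretrain on one half of the users, evaluate on
# the other half, then swap so that every user is evaluated exactly once.
def two_fold_by_user(users, train_fn, eval_fn):
    half = len(users) // 2
    folds = [(users[:half], users[half:]), (users[half:], users[:half])]
    results = []
    for train_users, test_users in folds:
        model = train_fn(train_users)                  # "pretrain" on ~5k users
        results += [eval_fn(model, u) for u in test_users]
    return results
```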
Bigger issues are:
More parameters = slower optimization.
FSRS isn’t a neural net. I can’t add a thousand parameters by changing one line of code. And the philosophy of FSRS is keeping it interpretable, so even if I made a hybrid neural FSRS, Jarrett wouldn’t approve. Well, I wouldn’t either, unless the benefits were absolutely massive.