Discount factor in training data

I wonder if it would be a good idea to add a temporal discount factor to the training (i.e. more recent data has more weight in the score to be optimised).

This may take into account the possibility that a user's memory curve for a collection is not static. For example, the more cards they learn from the collection, the easier (or harder) it becomes for them to learn new cards.
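For concreteness, here's a minimal sketch of what such a discount could look like (this is not FSRS's actual optimizer code; the variable names and the exponential form are just assumptions for illustration): each review gets a weight that decays with its age, and the weights are plugged into the log loss that the optimizer minimises.

```python
import numpy as np

def exponential_weights(review_age_days: np.ndarray, decay_rate: float) -> np.ndarray:
    """Weight = exp(-decay_rate * age); recent reviews (age ~ 0) get weight ~ 1."""
    return np.exp(-decay_rate * review_age_days)

def weighted_log_loss(p: np.ndarray, y: np.ndarray, w: np.ndarray) -> float:
    """Log loss with per-review weights, normalized so it stays on the usual scale."""
    eps = 1e-12
    losses = -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return float(np.sum(w * losses) / np.sum(w))

# Example: p = predicted recall probabilities, y = actual outcomes (1 = recalled),
# ages = days since each review.
p = np.array([0.9, 0.8, 0.7, 0.95])
y = np.array([1, 1, 0, 1])
ages = np.array([700.0, 400.0, 30.0, 1.0])
w = exponential_weights(ages, decay_rate=np.log(2) / 365)  # roughly a one-year half-life
print(weighted_log_loss(p, y, w))
```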


If you ever learn to write Hanzi/Kanji, you'll realise that it is basically drawing the same basic shapes again and again in different but similar combinations. Wouldn't be surprised if the memory curve is too steep in the beginning.

Yes, Hanzi/Kanji is a great example!

The problem is that it's unclear how to choose the value of such a factor.

We may let users pick this factor. For example, we could add an option "Review half-life" with something like a year as its default value. Instead of just ignoring all reviews before a given date, this new option would allow more fine-grained control over how old revlogs influence the parameters.
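A rough sketch of what such an option could compute (the function name and default are assumptions, not an existing setting): a review's weight halves every half-life, and the current "ignore reviews before a date" behaviour is essentially the limiting case of a 0/1 step function.

```python
def review_weight(age_days: float, half_life_days: float = 365.0) -> float:
    """Hypothetical 'Review half-life' weighting: weight halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)

# With the one-year default: a review from today gets weight 1.0,
# a one-year-old review gets 0.5, a two-year-old review gets 0.25.
for age in (0, 365, 730):
    print(age, review_weight(age))
```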

I don't think you understand what the problem is. If you change the factor, RMSE and log loss will change. This means that you cannot use log loss/RMSE to tell you which factor is best, since these metrics themselves would depend on it.

EDIT: to clarify, here’s an analogy. Imagine that you are reading statistics about the average wage. You read a publication from 2022. Then you read a publication from 2023, and it says that the average wage went down, but it also says that the way the average is calculated has been changed this year. Can you determine whether the reported average wage went down because people actually became poorer or because the methodology has changed? Nope.

The situation with choosing a factor for discounting past data is similar.

"But wouldn't that also be a problem with choosing a date?" you may ask. Yes, it is a problem. But it's much less severe, because choosing a date is easy and intuitive, whereas choosing some abstract number that affects the algorithm in some less-than-obvious way is not.
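To make the incomparability concrete, here's a toy numerical sketch (the numbers and the half-life weighting are purely illustrative): the very same predictions get different weighted log loss values under different discount factors, so the weighted metric can't tell you which factor is "best".

```python
import numpy as np

def weighted_log_loss(p, y, w):
    eps = 1e-12
    losses = -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return float(np.sum(w * losses) / np.sum(w))

p = np.array([0.9, 0.6, 0.8])      # identical predictions in every case
y = np.array([1, 0, 1])
ages = np.array([720.0, 180.0, 10.0])

for half_life in (180.0, 365.0, 1e9):   # 1e9 ~ "no discounting at all"
    w = 0.5 ** (ages / half_life)
    print(half_life, weighted_log_loss(p, y, w))  # three different values for the same model
```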

Thank you very much for the detailed explanation.

I was not talking about the difficulty of implementing a new algorithm to incorporate this factor (which, sure enough, is not negligible), nor about the negative impact of this factor on the interpretability of the metrics. I was only thinking about the possible improvement to memory curve fitting by adding this factor; after all, the idea of discounting past data is frequently used in machine learning.

Admittedly, because RMSE is not additive, how to discount data in RMSE is already a non-trivial question. On the other hand, log loss is additive, so (at least conceptually) introducing a discount factor would not cause mayhem in the rest of the algorithm.

I was only thinking about the possible improvement to memory curve fitting by adding this factor

But you can't assess whether changing the factor improves curve fitting; that's the crux. You would need some kind of meta-metric. Log loss is used to assess the goodness of fit of the algorithm. The meta-metric (if such a thing even exists) would be used to assess the validity of log loss with a given discount factor.

Very good point.