Due Column - Changing Days (from Whole Numbers to Decimals in Scheduling)

Right now all same-day reviews have an interval length of 0. This prevents us from developing and using a genuinely good short-term memory model, since as far as FSRS is concerned, 5 minutes and 5 hours are currently the same. If we could use accurate intervals (expressed as a fraction of a day), we could improve the short-term memory formula, improve sorting by retrievability for “learning” cards, and remove the “All learning cards have R=100%” placeholder.
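
As a rough sketch (not Anki’s actual data model or FSRS code), an interval expressed as a fraction of a day is just the elapsed seconds divided by 86 400, which is exactly what would let the model tell 5 minutes and 5 hours apart:

from datetime import datetime, timedelta

SECONDS_PER_DAY = 86_400

def fractional_interval(last_review: datetime, now: datetime) -> float:
    # Elapsed time as a fraction of a day instead of a whole number of days.
    return (now - last_review).total_seconds() / SECONDS_PER_DAY

start = datetime(2024, 1, 1, 9, 0)
# Both of these are "interval 0" today, but differ by a factor of 60 here:
print(fractional_interval(start, start + timedelta(minutes=5)))  # ~0.0035
print(fractional_interval(start, start + timedelta(hours=5)))    # ~0.2083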

@vaibhav we read from the revlog already when memory state is missing, but it has a performance cost, so I like your suggestion better. We could also fall back on a slower revlog read when the data is missing to address the older-client-reviews problem.

Have you been able to benchmark the improvement behind this?

Not yet. Well, strictly speaking, right now FSRS-5-secs performs worse than just FSRS-5, but that is without any changes to actually properly use fractional interval lengths.

So should we start an issue on GitHub to push for fractional intervals?

Not yet. Right now we don’t have a model that would benefit from fractional intervals.

This is based on data from people who had fixed learning and relearning intervals, though, so you can’t really say how it performs: the data is unsuitable for that test. You’d need data from people using that model (or at least a setup that doesn’t use fixed intervals for everything shorter than 24 hours) to know how it actually performs.

We can say how well it performs. We run FSRS and make it predict probabilities, then compare them to real labels (0 or 1). Clarification: I say FSRS-5-secs is worse, but the data is not the same. FSRS-5 doesn’t predict probabilities for same-day reviews. It just uses that data to refine its prediction for the next day. FSRS-5-secs actually tries to predict the probability for same-day reviews, unlike FSRS-5. The comparison is not 100% fair, and I don’t think it can be.
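
For reference, this is roughly what I mean by comparing predictions to labels; a minimal sketch using log loss as an example metric (the benchmark reports other metrics as well):

import math

def log_loss(predicted_r, labels):
    # Average negative log-likelihood of the real outcomes (1 = recalled, 0 = forgot).
    eps = 1e-7
    total = 0.0
    for p, y in zip(predicted_r, labels):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

# FSRS-5 is only scored on >=1-day reviews; FSRS-5-secs is scored on every review.
print(log_loss([0.9, 0.7, 0.95], [1, 0, 1]))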

If this is confusing, here’s an analogy: in FSRS-5, >=1d intervals are like exams. We evaluate it based on how well it does on them. Same-day reviews are like homework - FSRS-5 does its homework, but we do not evaluate it based on that.
In FSRS-5-secs, all reviews are exams.

You missed that important part. Could you benchmark it on the same data, filtering out sub-day intervals somehow when evaluating?

If we evaluate it with same-day reviews, then the data is not the same.
If we evaluate it without same-day reviews, then we’re not evaluating it on the data that we care about.
So there is no 100% fair way to compare the two.

No, you’d need to benchmark that one too. But for the comparison you’d also need to find out how well both perform on non-sub-day review data. If FSRS-5-secs is noticeably worse there, then you’d need to reconsider what we do here.

Oh, this is going to be a pain
@L.M.Sherlock can you add an extra command so that the test set loss is calculated without same-day reviews when using --secs?
So that we can evaluate FSRS-5-secs with and without predictions for same-day reviews

It means I need to include same-day reviews in training and then exclude them in testing. It’s easy, but the comparison is still unfair because the TimeSeriesSplit will give different results when we have more samples.
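
To illustrate that last point, a minimal sketch with scikit-learn’s TimeSeriesSplit (not the benchmark’s actual setup): with the same splitter, a different sample count puts the train/test boundaries in different places, so adding same-day reviews shifts which reviews end up in each fold.

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

for n_samples in (10, 14):  # e.g. without vs. with same-day reviews
    folds = TimeSeriesSplit(n_splits=3).split(np.zeros((n_samples, 1)))
    print(n_samples, [(len(train), list(test)) for train, test in folds])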

Well, at least it’s somewhat more fair

OK, I will try.

Edit: Done in add NO_TEST_SAME_DAY arg · open-spaced-repetition/srs-benchmark@e82cb71

Usage:

python other.py --model FSRS-4.5 --secs --no_test_same_day

What do you mean by excluding the same-day reviews in testing?

  • the same-day reviews will be used for updating the memory states (during testing) but their predicted R is not compared with the actual R. Comparison between predicted and actual R is made only for long-term reviews; or
  • the same-day reviews are not used at all during testing?

If it is the latter case, the metrics would definitely be worse because the parameters are optimised taking the same-day reviews into account and they are suddenly not available during inference.
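
To make the distinction concrete, here is a rough sketch of the first interpretation (hypothetical names and methods, not the actual benchmark code):

def evaluate_long_term_only(reviews, model):
    # Same-day reviews still update the memory state, but only >=1-day reviews are scored.
    predictions, labels = [], []
    state = None
    for review in reviews:  # chronological order
        p = model.predict_retrievability(state, review.elapsed_days)
        if review.elapsed_days >= 1:
            predictions.append(p)
            labels.append(review.recalled)  # 1 if recalled, 0 if forgotten
        state = model.update_state(state, review)
    return predictions, labels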

I mean the first one.

But the same-day reviews seem to have been removed before the memory states are calculated in L2193 of other.py.

If you mean the first one, won’t you need to filter out the same-day reviews after the above mentioned line?

I don’t understand this code well, but I am trying to prevent a possible misinterpretation of the benchmarking results.

The same-day reviews have been included in the features here:

Ok, so every item in the dataset has its own full t_history and r_history. 👍
Sorry for any confusion caused by my misunderstanding.