Due Column - Changing Days (from Whole Numbers to Decimals in Scheduling)

Right now all same-day reviews have an interval length of 0. This prevents us from developing and using a genuinely good short-term memory model, since as far as FSRS is concerned, 5 minutes and 5 hours are currently the same. If we could use accurate intervals (expressed as a fraction of a day), we could improve the short-term memory formula, improve sorting by retrievability for “learning” cards, and remove the “All learning cards have R=100%” placeholder.
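
As a rough sketch (not Anki’s actual data model or FSRS code), an interval expressed as a fraction of a day is just the elapsed seconds divided by 86 400, which is exactly what would let the model tell 5 minutes and 5 hours apart:

from datetime import datetime, timedelta

SECONDS_PER_DAY = 86_400

def fractional_interval(last_review: datetime, now: datetime) -> float:
    # Elapsed time as a fraction of a day instead of a whole number of days.
    return (now - last_review).total_seconds() / SECONDS_PER_DAY

start = datetime(2024, 1, 1, 9, 0)
# Both of these are "interval 0" today, but differ by a factor of 60 here:
print(fractional_interval(start, start + timedelta(minutes=5)))  # ~0.0035
print(fractional_interval(start, start + timedelta(hours=5)))    # ~0.2083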

@vaibhav we read from the revlog already when memory state is missing, but it has a performance cost, so I like your suggestion better. We could also fall back on a slower revlog read when the data is missing to address the older-client-reviews problem.

Have you been able to benchmark the improvement behind this?

Not yet. Well, strictly speaking, right now FSRS-5-secs performs worse than just FSRS-5, but that is without any changes to actually properly use fractional interval lengths.

So should we start an issue on GitHub to push for fractional intervals?

Not yet. Right now we don’t have a model that would benefit from fractional intervals.

This is based on data from people who had fixed learning and relearning intervals, though, so you can’t really say how it performs: the data is unsuitable for that test. You’d need data from people using that model (or at least a setup that doesn’t use fixed intervals for everything shorter than 24 hours) to know how it actually performs.

We can say how well it performs. We run FSRS and make it predict probabilities, then compare them to real labels (0 or 1). Clarification: I say FSRS-5-secs is worse, but the data is not the same. FSRS-5 doesn’t predict probabilities for same-day reviews. It just uses that data to refine its prediction for the next day. FSRS-5-secs actually tries to predict the probability for same-day reviews, unlike FSRS-5. The comparison is not 100% fair, and I don’t think it can be.
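
For reference, this is roughly what I mean by comparing predictions to labels; a minimal sketch using log loss as an example metric (the benchmark reports other metrics as well):

import math

def log_loss(predicted_r, labels):
    # Average negative log-likelihood of the real outcomes (1 = recalled, 0 = forgot).
    eps = 1e-7
    total = 0.0
    for p, y in zip(predicted_r, labels):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

# FSRS-5 is only scored on >=1-day reviews; FSRS-5-secs is scored on every review.
print(log_loss([0.9, 0.7, 0.95], [1, 0, 1]))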

If this is confusing, here’s an analogy: in FSRS-5, >=1d intervals are like exams. We evaluate it based on how well it does on them. Same-day reviews are like homework - FSRS-5 does its homework, but we do not evaluate it based on that.
In FSRS-5-secs, all reviews are exams.

You missed that important part. Could you benchmark it on the same data, filtering out sub-day intervals somehow when evaluating?

If we evaluate it with same-day reviews, then the data is not the same.
If we evaluate it without same-day reviews, then we’re not evaluating it on the data that we care about.
So there is no 100% fair way to compare the two.

No, you’d need to benchmark that one too. But for the comparison you’d also need to find out how well both perform on non-sub-day review data. If FSRS-5-secs is noticeably worse there, then you’d need to reconsider what we do here.

Oh, this is going to be a pain
@L.M.Sherlock can you add an extra command so that the test set loss is calculated without same-day reviews when using --secs?
So that we can evaluate FSRS-5-secs with and without predictions for same-day reviews

It means I need to include same-day reviews in training and then exclude them in testing. It’s easy, but the comparison is still unfair because the TimeSeriesSplit will give different results when we have more samples.
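
To illustrate that last point, a minimal sketch with scikit-learn’s TimeSeriesSplit (not the benchmark’s actual setup): with the same splitter, a different sample count puts the train/test boundaries in different places, so adding same-day reviews shifts which reviews end up in each fold.

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

for n_samples in (10, 14):  # e.g. without vs. with same-day reviews
    folds = TimeSeriesSplit(n_splits=3).split(np.zeros((n_samples, 1)))
    print(n_samples, [(len(train), list(test)) for train, test in folds])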

Well, at least it’s somewhat more fair

OK, I will try.

Edit: Done in add NO_TEST_SAME_DAY arg · open-spaced-repetition/srs-benchmark@e82cb71

Usage:

python other.py --model FSRS-4.5 --secs --no_test_same_day

What do you mean by excluding the same-day reviews in testing?

  • the same-day reviews will be used for updating the memory states (during testing) but their predicted R is not compared with the actual R. Comparison between predicted and actual R is made only for long-term reviews; or
  • the same-day reviews are not used at all during testing?

If it is the latter case, the metrics would definitely be worse because the parameters are optimised taking the same-day reviews into account and they are suddenly not available during inference.
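
To make the distinction concrete, here is a rough sketch of the first interpretation (hypothetical names and methods, not the actual benchmark code):

def evaluate_long_term_only(reviews, model):
    # Same-day reviews still update the memory state, but only >=1-day reviews are scored.
    predictions, labels = [], []
    state = None
    for review in reviews:  # chronological order
        p = model.predict_retrievability(state, review.elapsed_days)
        if review.elapsed_days >= 1:
            predictions.append(p)
            labels.append(review.recalled)  # 1 if recalled, 0 if forgotten
        state = model.update_state(state, review)
    return predictions, labels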

I mean the first one.

But the same-day reviews seem to have been removed before the memory states are calculated in L2193 of other.py.

If you mean the first one, won’t you need to filter out the same-day reviews after the above mentioned line?

I don’t understand this code well, but I am trying to prevent a possible misinterpretation of the benchmarking results.

The same-day reviews have been included in the features here:

Ok, so every item in the dataset has its own full t_history and r_history. 👍
Sorry for any confusion caused by my misunderstanding.