Improving FSRS for List Memorization

I’m sorry for the unusually long post, but since the changes proposed by this feature are quite substantial, I wanted to provide a detailed explanation and analysis to ensure that the underlying ideas are clearly understood and properly evaluated. The key ideas are highlighted in bold, serving as a summary of the proposal.

DESCRIPTION OF THE IDEA

For clarity, I will refer to any individual member of a list as an item. A card can, in principle, contain any number of items.

My idea is that, within a single card listing multiple items, the user should be able to rate each item individually, using the usual scale from “again” to “easy”.

As an example: in a card listing the items neutrophils, lymphocytes, monocytes, eosinophils, and basophils, the user might recall neutrophils, monocytes, and basophils but forget lymphocytes and eosinophils. The system would then allow separate ratings for each item, reflecting the actual recall performance more accurately.

My method assumes that the user wants the full list, not the individual items, to have a retrievability equal to the desired retention they have chosen. I will discuss the validity of this assumption later. After the same interval, and all other things being equal, a card with several items will generally have a lower retrievability than a basic card with a single item. In other words, lists are hard to remember and easy to forget. However, under the current system, and without using cloze deletions or creating multiple separate cards, there is no way to account for this in advance. As a result, the calculated intervals will often be excessively long, particularly in the early stages of learning the list, and even more so when the list is very large.

Also, the likelihood of correctly recalling all items (say, five) after a certain interval is higher if four out of five items were remembered in the previous review than if only one or two were. Currently, without cloze deletions or multiple separate cards, the user must press “again” in both situations if they define the threshold for “good” as recalling all items correctly.

Based on my basic knowledge of statistics, I would suggest that the optimal interval for FSRS to calculate is approximately the point at which the product of the retrievabilities of the individual items equals the desired retention (this treats the recall of each item as independent of the others). Suppose a card has three items. Let’s call the desired retention of the list R, and the retrievabilities of the items on a given day R1, R2, and R3. FSRS must find the day on which R1*R2*R3=R. Note that all the items will always be reviewed together on the same card, since there is a single interval, calculated for the entire list. I will return to this point later.
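
The search for the day on which R1*R2*R3=R could be sketched roughly as follows. This is only an illustration, not FSRS’s actual scheduling code. It assumes the FSRS-4.5 forgetting curve, R(t) = (1 + 19/81 * t/S)^(-0.5), where S is an item’s stability in days (other FSRS versions use different curve parameters), it assumes item recalls are independent, and all function names are hypothetical.

```python
from math import prod

# Assumed FSRS-4.5 forgetting-curve constants; R(t = S) = 0.9 by construction.
DECAY = -0.5
FACTOR = 19 / 81


def retrievability(t: float, stability: float) -> float:
    """Probability of recalling one item t days after its last review."""
    return (1 + FACTOR * t / stability) ** DECAY


def list_retrievability(t: float, stabilities: list[float]) -> float:
    """Probability of recalling every item, assuming independent recalls."""
    return prod(retrievability(t, s) for s in stabilities)


def list_interval(stabilities: list[float], desired_retention: float,
                  tol: float = 1e-6) -> float:
    """Find the day t where the product of item retrievabilities equals R."""
    if not stabilities or min(stabilities) <= 0:
        raise ValueError("need at least one item with positive stability")
    if not 0 < desired_retention < 1:
        raise ValueError("desired retention must be between 0 and 1")
    # The product only decreases over time, so bracket the answer, then bisect.
    lo, hi = 0.0, 1.0
    while list_retrievability(hi, stabilities) > desired_retention:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if list_retrievability(mid, stabilities) > desired_retention:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Three items with different stabilities (in days), desired retention 90%.
print(list_interval([10.0, 20.0, 5.0], 0.9))
```

With a single item the result reduces to the ordinary per-card interval. With several items it is always shorter than the interval of the weakest item alone.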

Since the overall retrievability is computed as the product of individual item probabilities, it decreases as the number of items on a card increases. This results in shorter intervals, which makes sense because recalling a larger list is inherently more difficult. The number of items recalled correctly also affects the interval calculation: for example, forgetting one item out of five (and recalling four correctly) may result in a calculated interval that is significantly longer than the previous one, rather than shortening. This reflects a more accurate adjustment based on partial success, rather than treating the card as completely failed.
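
To illustrate how the interval shrinks as items are added, here is a small sketch for the special case where every item has the same stability, so the day can be solved for directly. It assumes the FSRS-4.5 forgetting curve, R(t) = (1 + 19/81 * t/S)^(-0.5), and independent item recalls. The function name is hypothetical, and real items would rarely share one stability value.

```python
# Assumed FSRS-4.5 forgetting-curve constants.
DECAY = -0.5
FACTOR = 19 / 81


def equal_items_interval(stability: float, n_items: int,
                         desired_retention: float) -> float:
    """Interval (days) at which n equally stable items jointly reach R.

    With equal stabilities, R_item ** n = R, so each item must still be at
    R ** (1 / n) when the card comes due; the curve is then inverted for t.
    """
    per_item = desired_retention ** (1 / n_items)
    return stability / FACTOR * (per_item ** (1 / DECAY) - 1)


# Same stability (10 days) and desired retention (90%), growing list size.
for n in (1, 3, 5, 10):
    print(n, round(equal_items_interval(10.0, n, 0.9), 2))
```

Under these assumptions, one item is due after exactly its stability (10 days), and every additional item pulls the due date earlier.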

Finally, I would like to add that the card would still progress even if only one item were rated, since updating the interval for a single item is sufficient to update the card’s overall interval. This feature is particularly useful when, for example, a user is struggling with just one item: they could choose to review and rate only that item without recalling the remaining ones, which would then be automatically treated as skipped. In FSRS, skipping an item would simply indicate that it was not reviewed, accurately reflecting the user’s action.

The “set due date” action could still be applied, but it would affect the entire list as a whole.

ADVANTAGES

-accurate retrievability calculation

I have already mentioned the first advantage: the retrievability of an entire list can be calculated accurately, so that it matches the desired retention. Using cloze deletions or manually creating multiple cards only enables the calculation of retrievability for individual items. If desired retention is 90%, a list with five items represented by five cloze deletions has an overall probability of being recalled in its entirety of 0.9^5 ≈ 0.59, which is unacceptable for an exam that asks for whole lists. You can review the whole list right before the exam to make sure you remember it, but you may need to do this for several lists, multiple times. This approach is demanding, risky, and easy to overlook.
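
The 0.9^5 figure generalizes to any list length. A minimal sketch, assuming each item is recalled independently with the same probability (the function name is hypothetical):

```python
def whole_list_recall(per_item_retention: float, n_items: int) -> float:
    """Chance of recalling every item when each one is recalled independently."""
    return per_item_retention ** n_items


# Whole-list recall at 90% per-item retention for lists of growing size.
for n in (1, 3, 5, 10):
    print(n, round(whole_list_recall(0.9, n), 2))
```

For five items this gives about 0.59, matching the figure above, and it keeps falling as the list grows.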

-no interference

To address the problem described above, one might also consider creating a separate card that includes all items, as the Cloze Overlapper add-on does automatically. However, this approach introduces significant interference, an issue common to all lists memorised using cloze deletions. By contrast, using a single card to review all items means there are no siblings to cause interference.

OPEN QUESTIONS

-optimizing for lists vs items

Besides the technical challenges, I think the most significant issue with this approach is that optimizing intervals for the retrievability of a list will result in suboptimally short intervals for the retrievability of the individual items. If the desired retention is 90%, then one item may have a probability of 97%, another 94%, and so on, but never 89% or lower.

So, should intervals be optimized so that the desired retention applies to the entire list or rather to the individual items?

My tentative answer is: it depends. If your goal is to learn as many facts as possible, for example when you are learning a new language, then memorizing single-item cards, when feasible, is generally more efficient; Piotr Woźniak emphasizes this clearly in his minimum information principle. But if breaking the list down into single-item cards is impractical or not feasible, or if your goal is to recall the entire list accurately (e.g. for an exam), then optimizing for the whole list makes more sense, for the reasons I explained before (interference and too-low retention).

-substantial penalty for forgetting the entire list

The other important issue is that if you forget all the items in a list, let’s say five, you are strongly penalised, as if you had forgotten five separate cards in a row. Yet these two situations are not equivalent. In practice, it sometimes happens that seeing just one forgotten item allows me to recall the remaining four. In that sense, forgetting all the items may be closer to forgetting only one or a few of them. One possible solution could be to give the user the option to reveal the items one at a time, but I am not sure that this would solve the problem, and I do not know how to implement it. It should also be considered that such a strong penalty is justifiable if you take into account that memorising lists is in general much harder than memorising single facts.

P.S. LMSherlock suggested a more immediate workaround to achieve, for example, a desired retention of 90% for a list of five items, without having to modify Anki’s interface and core review process. You could create a note with five cloze deletions and set the desired retention for each individual cloze/item to 0.9^(1/5) ≈ 98%. This way, interference is still a concern, but the probability of recalling all five items correctly at the same time would approach the desired retention.
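
The per-item target in this workaround is just the n-th root of the list target. A minimal sketch, assuming independent recalls and a hypothetical helper name:

```python
def per_item_retention(list_retention: float, n_items: int) -> float:
    """Per-item desired retention so that all n items together reach the target."""
    return list_retention ** (1 / n_items)


target = per_item_retention(0.9, 5)
print(round(target, 3))       # per-item desired retention for five clozes
print(round(target ** 5, 3))  # multiplying back recovers the list target
```

Note how quickly the per-item target approaches 100% as the list grows, which is another way of seeing why long lists are expensive to maintain.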


Short answer: just make a cloze card for each of N items in the list
Long answer: this would require keeping N different difficulty, stability, and retrievability values (for each item) in the same card; and N different review histories for them. Anki can’t do that, and making Anki do that would be a feat that nobody would want to do unless there is a REALLY good reason to.


ADVANTAGES

-accurate retrievability calculation

I have already mentioned the first advantage, which is that it allows you to calculate the retrievability of an entire list accurately, so matching the desired retention. Using cloze deletions or manually creating multiple cards only enables the calculation of retrievability for individual items. if desired retention is 90%, a list with five items represented by five cloze deletions has an overall probability of being recalled in its entirety of 0.9^5=0.59, which is unacceptable for an exam where lists are asked. You can repeat the whole list right before the exam to make sure you remember it, but you may need to repeat several lists multiple times. This approach is demanding, risky, and easy to overlook.

-no interference

To approach the problem described above, one might consider creating ALSO a separate card that includes all items, as cloze overlapper add-on does automatically. However, this approach introduces significant interference, an issue common to all lists memorised using cloze deletions. By contrast, using a single card to review all items means there are no siblings to cause interference.

In this part of the post, I discuss the limits of using cloze deletions: in short, the final retention of a list doesn’t match the desired retention, and having more sibling cards is an issue. My item-by-item rating system, as I’ll call it, doesn’t have these issues.

To this, I’d like to add a very interesting and highly upvoted Reddit post: The cognitive neuroscience of memory and why some of “the 20 rules may be outdated”.

Among other things, the author criticizes the use of cloze deletions, at least in certain contexts. He goes into detail, but the key idea is that cloze deletion often is not a good simulation of a real-life scenario: In real situations, we’re not faced with a text with gaps to fill in, but rather with questions like “Repeat X,” where full recall is considerably harder.

Other Reddit users have expressed similar concerns.

making Anki do that would be a feat that nobody would want to do unless there is a REALLY good reason to

Whether item-by-item rating qualifies as a really good reason depends on what you mean by really good. Memorizing lists or other complex, undecomposable information has always been a major challenge in Anki — even with clozes or other strategies. I don’t claim my approach is the definitive solution, but it’s one more option, with its own trade-offs and new advantages.

In medicine and in other fields, having to memorize long and complex lists is the norm. If even a tiny fraction of what may be millions of Anki users discover that item-by-item rating works better for them than cloze deletions or other methods, the potential overall benefits (saved time, a stronger feeling of being in control, and better performance) could be huge.

If I’m being naive and such a change would require too drastic a redesign of how Anki handles cards, that’s fine — I wouldn’t insist too much. But if there’s any way to even approximate the idea I’ve proposed, I think it’s worth considering.

Any suggestions on how to make this more feasible are welcome.

I personally agree that cloze deletions aren’t the best choice, maybe even most of the time. I have to know a lot of lists (e.g. symptoms) and always use basic cards instead of Cloze for them, because I found early on that cloze didn’t work for me as well. They have

  • the issue of interference, as you said, and
  • they provide way too much context; context which I don’t have in a real-life clinical scenario.

Thus, basic cards are closer to reality compared to cloze cards.

However: I think Expertium is right in this case. Cloze is the way to go if you really want to have this kind of scheduling control. It starts with a simple problem after all, for which there is no simple answer: What exactly is “a list of items” in a non-cloze card (e.g. Basic card)? Is it where the user used ul, ol and li? What if the user doesn’t use those html bullet lists, but e.g. markdown style bullet lists? Or just uses a or a · character? What if the user “lists” the items in a paragraph?

You might argue: Yes, Anon, but that would be bad card design. And you might be right. But look at several shared decks out there: Most don’t really follow best practices. So how would Anki, from a technical point of view, know where an item starts and where it ends? Besides: What if there are multiple list-like structures? My cards, as an example, have the following areas / sections where lists occur:

  1. Question.
  2. Answer.
  3. Additional Remarks.
  4. Sources.

In short: I think it’s technically not feasible to do what you are proposing, which is why I think that Expertium is right regarding Cloze being the best option here.

I understand that there is a technical challenge: currently, Anki doesn’t have the proper framework to implement exactly what I’m proposing. Defining what constitutes an “item” is certainly an issue. I’m so accustomed to using plain text and creating my own cards that I might be underestimating the challenges posed by varied formatting. However, the problems you describe don’t seem impossible to overcome. After all, users can already divide a text into clozes, so why couldn’t they similarly divide text into items? The two tasks are very similar. Dividing a card into items is something the user, not Anki itself, can and should do.

You might argue that this approach is subjective: one user might call an entire paragraph an item, while another might consider a single step of a biochemical process an item. There is no objective definition of an “item.” However, this is not a new issue. Even now, cards vary in complexity. Asking “what is an item?” is essentially the same as asking, “what is the correct amount of information to include on the back of a card?” The only practical answer I can think of is: “try to be as consistent as possible”.

Unfortunately, the same user may often have one card like “Date of the French Revolution: 1789” and another card like “Definition of health by the WHO: Health is a state of complete physical, mental, and social well-being and not merely the absence of disease or infirmity.”

For users like you and me who currently use basic cards for lists, dividing a card into items — however arbitrary the definition of item — could even improve the quality of the data provided to FSRS. The improvement in data quality would come from the fact that a very complex card can be broken down into multiple simpler items, increasing consistency.

I suspect that a more feasible implementation of what I proposed could involve recycling the structure of a cloze deletion note, since it already allows a note to generate multiple cards, each with separate stability, difficulty, retrievability, and review history. The changes needed to transition from cloze deletions to item-by-item rating might include:

-Renaming “note with multiple cards” to “card with multiple items,” if necessary.

-Instead of spreading out the reviews of each item, allowing the user to review all items in a block or sequentially, so they can attempt to recall everything at once and then rate each item individually.

-Possibly adjusting the structure of the card so that the question and answer are clearly separated, since with cloze deletions they can sometimes be mixed.

-Computing a new interval for the entire list of items based on the individual retrievability, stability, and difficulty of each item, maybe using the multiplication method I described previously (R1*R2*R3=R).
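
As a very rough sketch of the data model these changes imply, a “card with multiple items” could keep one memory state and one review history per item, with skipped items simply left untouched. All class and field names below are hypothetical and do not reflect Anki’s actual schema. The FSRS update of difficulty and stability after each rating is deliberately left out, and so is the interval computation. Retrievability is not stored, since it can be derived from stability and the time elapsed.

```python
from dataclasses import dataclass, field
from typing import Optional

# The four answer buttons, numbered 1-4.
AGAIN, HARD, GOOD, EASY = 1, 2, 3, 4


@dataclass
class Item:
    """One list member with its own memory state and review history."""
    text: str
    difficulty: float
    stability: float
    history: list[tuple[int, int]] = field(default_factory=list)  # (day, rating)


@dataclass
class ListCard:
    """A single card whose answer is split into individually rated items."""
    question: str
    items: list[Item]

    def review(self, day: int, ratings: list[Optional[int]]) -> list[Item]:
        """Record one rating per item; None marks an item as skipped."""
        if len(ratings) != len(self.items):
            raise ValueError("expected one rating (or None) per item")
        if any(r is not None and r not in (AGAIN, HARD, GOOD, EASY) for r in ratings):
            raise ValueError("ratings must be 1-4 or None")
        rated = []
        for item, rating in zip(self.items, ratings):
            if rating is None:
                continue  # skipped: no history entry, state left unchanged
            item.history.append((day, rating))
            rated.append(item)
        return rated  # the items whose memory state would then be updated


card = ListCard("Types of white blood cells",
                [Item("neutrophils", 5.0, 3.0), Item("lymphocytes", 5.0, 3.0)])
card.review(day=0, ratings=[GOOD, None])  # only the first item was rated
```

A per-list interval would then be computed from the item states after each review, for example with the multiplication method.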
