A thread for those interested in low FSRS retention rates

I’m going to lay out a bit of my reasoning for targeting low retention, but I’m not here to argue memory science and I don’t want contributors to the thread to be led off track. The thread is for experiences with this approach, and thoughts on how to make it work best.

With that caveat out of the way, a few basic reasons one might want to do this. The evidence for the popular idea of a forgetting curve that resets with each successful recall is basically absent from the literature. Nor is there consistent evidence in favor of any particular spacing scheme: expanding intervals, equal intervals; on average, it doesn’t seem to matter. What does seem to hold is that the more effort a recall requires, the greater the boost to long-term retention, and this is true even if the card is failed. It’s very evident from the research that failing a recall is no indication that the material is being filed into long-term memory any less effectively. Likewise, at 70% retention each successful recall takes more cognitive effort than a successful recall does at 90% retention, which makes that recall worth even more for the durability of the memory.

This approach may therefore be especially useful when having information available in immediate memory for a test in the short term (say, a few weeks) is not one’s priority: something like learning a language when you’re in no rush to get conversational but want maximum progress two or three years in. By targeting low retention, one can probably get more bang for the buck out of each card’s reviews, and then multiply those savings by the overall reduction in time spent on flash cards. The sooner we initiate recall across larger gaps, the sooner we begin building long-term memories; there’s simply no reason to think we have to successfully juggle something in short-term memory (recalls in the handful of days after first learning) before the process of long-term consolidation can begin.

We may want learning steps and short-to-medium-term recalls simply for motivation and safety rails: to gauge our progress to some degree along the way, and to make sure we’re paying attention, trying to recall, and encoding the information at all times. It may even be that the only reason to have some minimum retention target (as opposed to just reading books and attempting recall of every unfamiliar word at whatever frequency it happens to appear) is to stay motivated, during a session, to keep attempting recall on every card, so that the pathways get the largest boost even if the card is failed.

If you’re familiar with the research supporting the above — or simply want to see how much value you can get out of Anki while using it less — this thread is for ideas and experiences.

My first experience has been working through the Jōyō kanji. I started with 8 new cards a day, targeting 90% retention, which meant around 60 reviews per day. I then decided to blast through the remaining 1,500 by doing 75 a day in a second deck, aiming for 70% retention so I wouldn’t get swamped by reviews in the process. This got me through a chunk of the list in 3 weeks that would otherwise have taken around 6 months, and a short time afterwards I felt very comfortable with the new information. Seeing how effective this was, I lowered the target retention in the primary deck to 70% as well. My reviews for that day dropped from 60 to 4, and after some time I still felt just as confident with the information. These experiences have me interested in aiming for even lower retention rates, spending even less time on Anki, and making sure that I really concentrate, work hard to retrieve, and strongly encode information in the time I do spend. In fact, I may go ahead and target 50% retention for a list of uncommon kanji and see how strong my confidence with that list is some months down the line.
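For anyone wondering where drops like 60 → 4 come from: they fall straight out of the forgetting curve. Here’s a minimal sketch, assuming the FSRS-4.5 power-law curve (the constants differ in other FSRS versions, and this ignores how stability itself grows from review to review):

```python
# Predicted recall probability in FSRS-4.5: R(t, S) = (1 + FACTOR * t / S) ** DECAY,
# calibrated so that R = 0.9 exactly when the elapsed time t equals the stability S.
DECAY = -0.5
FACTOR = 19 / 81  # chosen so (1 + FACTOR) ** DECAY == 0.9

def interval_for_retention(stability_days: float, desired_retention: float) -> float:
    """Days until predicted recall probability falls to desired_retention."""
    return stability_days / FACTOR * (desired_retention ** (1 / DECAY) - 1)

for r in (0.95, 0.90, 0.80, 0.70, 0.50):
    # Express the interval as a multiple of the stability S by setting S = 1.
    print(f"target {r:.0%}: interval ≈ {interval_for_retention(1, r):.2f} × S")
```

That works out to roughly 0.46×, 1×, 2.4×, 4.4×, and 12.8× the card’s stability, so dropping from 90% to 70% stretches every interval by a factor of about 4.4 even before the larger stability gains from harder recalls compound on top.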

2 Likes

You state that there’s no evidence for the fundamental principles behind Anki. (Hmm, why use it then?)

But there does appear to be some long-standing research, starting way back in the 1880s, including Piotr Wozniak’s work, all the way up to L.M. Sherlock’s recent research, published in ACM conference proceedings, which led to the development of FSRS, aided by a dataset of 1.5 billion reviews from 20k Anki users. So it does seem that this is evidence-based.

Meanwhile you make your own claims (“It’s very evident from the research”, “If you’re familiar with the research supporting the above”) without naming or describing that research. Most of us aren’t familiar with it, so that would be a helpful topic for your next post in this thread.

You say you don’t want to argue memory science, and fair enough, but I think you need a little more background exposition.

1 Like

It’s an interesting point about effort and memory, and something that confused me too. I made a forum post about it because I was under the impression that cards should remain permanently difficult in order to optimize memory: Any downsides to more and more mature cards?

For me, a benefit of Anki is that it eventually enables ‘fluency’ with topics. If my knowledge of something is vague and I need a lot of time to recall it, that’s not far off from just looking it up on an external device instead of remembering it.

I believe you need high retention in order to recall quickly and dependably (fluently). If you wish to recognize kanji in real-world scenarios, you can easily backtrack mentally, use further context, or simply plow on. But for production, if you ‘fail’ in a real-world task, there can be devious miscommunications, or the conversation can stall completely. As you said yourself, a lower target retention may not be ideal for becoming conversational.

That being said, how low can you go with target retention and still get those ‘fluency’ benefits? If there is some target retention rate that can be low enough to give you the benefits you mention while working for a variety of different types of tasks and notes, it could be interesting.

I think it would be useful to expand your experiment to decks quite different from vocabulary recognition, such as production as mentioned above, or something even more different. Many scientific experiments first push settings to fairly extreme values, as in your idea of setting retention to 50%. As long as it doesn’t completely break, trying such a low retention rate alongside very high retention rates, across a variety of types of decks, will give you a much better idea of whether and where there is some optimal value.

I’d like to see something that I can actually relate to this graph. It seems like the Compute optimal retention button tries to find the highest retention you can get in the given time; but if you give it infinite time, it’s happy to use it all to maximize retention (to 95%). What I want instead is to save time on low-priority cards and maximize the number of cards I’ve seen, as long as the failure rate is not too demotivating.
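To make that trade-off concrete, here’s a deliberately crude workload model (my own toy, not Anki’s simulator; the pass/fail costs are made-up numbers and stability growth is ignored, so it shows the direction of the trade-off rather than any optimum):

```python
# Toy model: a card of stability S gets one review every t(R) days, where t(R)
# comes from the FSRS-4.5 forgetting curve; passes cost ~5 s, failures ~20 s.
DECAY = -0.5
FACTOR = 19 / 81
PASS_SECONDS, FAIL_SECONDS = 5.0, 20.0  # invented costs, tune to taste

def interval_for_retention(stability_days: float, r: float) -> float:
    return stability_days / FACTOR * (r ** (1 / DECAY) - 1)

def seconds_per_day(stability_days: float, r: float) -> float:
    """Expected review time per day for one card held at target retention r."""
    cost = r * PASS_SECONDS + (1 - r) * FAIL_SECONDS
    return cost / interval_for_retention(stability_days, r)

for r in (0.95, 0.90, 0.80, 0.70, 0.50):
    print(f"target {r:.0%}: ≈ {seconds_per_day(30, r):.2f} s/day per card")
```

In this static picture, lower retention is always cheaper per card per day, which is exactly why an optimizer given unlimited time drifts toward high retention: with workload unconstrained, the knowledge side of the trade-off dominates.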

It’s an interesting point about effort and memory, and something that confused me too. I made a forum post about it because I was under the impression that cards should remain permanently difficult in order to optimize memory. [Any downsides to more and more mature cards?]

Yes. I think in the case of your question, let’s say you would quickly answer a card correctly for the next five years: it probably is true that waiting until five years and a day to review that card would be optimal for solidifying it in your fifteen-year memory. It’s just that the cost of reviewing it more frequently than that to maintain it over the same period (maybe a couple of minutes spread across years) is outweighed, for almost everyone, by the benefit of having the information available for all those years. The principle most likely applies, but who cares? If you stretch this too far, you might be optimizing your long-term long-term memory, but eventually you’re… well, you’re 150 years old and probably dead. We’re all really trying to optimize for some particular finite time frame, exam or no.

I believe you need high retention in order to recall quickly and dependably (fluently)

Well, if you start a deck today and want to recall everything in it ‘fluently’ three weeks from now, you need high retention, because only frequent reviews will keep all of it accessible in short-term memory right now. But the gap here is between “I can fluently pull this information from short-term memory right now” and “I can fluently pull this information from long-term memory two years from now.” Less frequent, more difficult, more widely spaced reviews seem to be simply superior for the latter, even when they are failed in the short term. I say lower retention isn’t ideal for becoming conversational with a set of words ~3 weeks from now because that’s a time frame where long-term storage is irrelevant.

Short and long-term memories might be compared to the RAM and HDD of a computer: just because the RAM can handle it, doesn’t mean the HDD is writing it down (see: cramming). And just because the RAM doesn’t know it, doesn’t mean the HDD isn’t writing it down (failing a recall has no negative impact on its long-term retention)! So we essentially have to choose between optimizing short- or long-term recall. The book Make It Stick harps on the fact that we just don’t have introspective awareness of what’s effective for our own memories… I would add that when we recall any piece of information, we have no introspective awareness of whether we pulled it from short- or long-term memory.

So, take a deck of 2,000 words and sentences in a language, and two people hitting ~60 cards per day for a month. The person aiming for 90% retention would have more short-term memory access for the first ~6 weeks; but looking ~6 months out, the person aiming for 70% retention would have encoded those terms more efficiently into long-term memory and would be more “fluent” with less work. The 90% person would have spent a lot of time juggling cards in short-term memory before even beginning to recall them over the larger gaps that are more conducive to long-term encoding. The 70% person would have many more failed reviews in the first ~6 weeks, but would have initiated that process with any particular card much sooner.

One review I read recently speculated that what I’m referring to as “juggling things in short-term memory over short periods” is essentially, still, a form of cramming, even if it is drawn out over days. The model they suggested went something like this: when we lay a memory down at any given time, we’re laying it amongst the network of pathways currently open in our brains. If we wait for “forgetting” to happen, what this really means is that those active networks have changed, and when we re-encode the memory, we do it against a new backdrop of active pathways in the brain. This re-encoding is optimal for the long-term, because this now means that if I try to recall that information a year from now, when who knows what “part of my brain I might be in,” there are now two different pathways in two different parts of my brain that will lead me there and there’s a better chance that I’ll find one of them.

I have certainly noticed something I could attribute to this with kanji: sometimes I’ll associate a keyword with a component, and this forms my initial mnemonic for remembering it. Later, I’ll have forgotten this keyword and will end up making a whole new way of remembering it. I’ll rely on the new way most of the time, but occasionally I’ll slip back and recall it by the method I laid down originally.

As for what 70% or 90% retention means, it’s easy to conflate this with other things we care about, and I’m not completely clear on how some of these concepts overlap myself. But we have to be careful not to read it as “I only know 70% of this information vs. 90% of it.” It means I only review the information at the point where I get 7 (or 9) out of 10 right. If I aim for 70% retention, the gap between reviews grows not only because I’m waiting for more of the material to be “forgotten,” but also because each successful recall has more impact, which in itself eventually means my memories will be more stable. This is a completely separate question from: on the day exactly 1 year from now, exactly how much of this information will I have down? Lower retention is not only likely more effective toward that goal; it also frees more time and mental bandwidth for taking in new information. Of course, it’s also possible that this backfires once we’re looking at recall over months and years: I’m not aware of any studies comparing different approaches to spacing over those time frames. (So personally I’d suggest keeping the maximum interval as low as you’re comfortable with; the net result of low initial retention plus a compressed maximum interval is to flatten the curve of spacing overall.)
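For what it’s worth, “each successful recall has more impact” is also how FSRS itself behaves. If I’m reading the algorithm notes right, a successful review multiplies stability roughly as follows (a sketch of the FSRS-4/4.5 form; the w_i are per-user fitted weights, D is difficulty, R is predicted retrievability at review time, and the hard/easy terms are the grade penalty/bonus):

```latex
S' = S \cdot \left( e^{w_8}\,(11 - D)\, S^{-w_9}
     \left( e^{\,w_{10}(1 - R)} - 1 \right)
     \cdot p_{\mathrm{hard}} \cdot b_{\mathrm{easy}} + 1 \right)
```

The factor e^{w_10 (1−R)} − 1 grows as R falls, so a recall made when predicted retention is 70% increases stability more than the same recall made at 90%. That’s the scheduler-level version of the claim above.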

I think it would be useful to expand your experiment to decks quite different from vocabulary recognition, such as production as mentioned above, or something even more different. Many scientific experiments first push settings to fairly extreme values, as in your idea of setting retention to 50%. As long as it doesn’t completely break, trying such a low retention rate alongside very high retention rates, across a variety of types of decks, will give you a much better idea of whether and where there is some optimal value.

Definitely. Someone claimed that below ~70% you’d paradoxically end up with more reviews instead of fewer. That may have been BS, but I had no particular reason to choose any other number, so I went with it. It seems production contributes a lot more to recognition than recognition does to production, so I wonder whether production isn’t simply more effective across the board. (For kanji, as soon as I faintly recognize a character I look it up in an app that gives vocabulary using it and has you draw it. This also means it’s too late to try an experiment like this with kanji: for a decent test we’d need something unfamiliar to everyone involved, maybe a very obscure language in a script that neither of us reads.)

This would be a good reason to create separate decks for cards of different priority levels! I’ve mentioned raising retention to 90% to get information into short-term access when it’s going to be actively needed soon, e.g. before a visit to a country that speaks the language. I probably don’t want ~30k-frequency literary vocabulary in the deck I bump to 90% retention a few weeks ahead of a trip for that purpose. You might also want a max interval of <1 yr for everyday vocabulary and a larger cap for literary or technical words.

If you forget all the time, intervals will not grow fast enough.

I am doing that, but I want to know the exact minimum usable retention.
The Compute button used to suggest 83% for most decks when I specified all cards and 1-3 years.

I set 83% for the least important decks, increasing by 1% per priority level (except for the top-priority decks, which get 93-96%).

But then I tried feeding it this:
(new cards per day × days in 3, 5, or 10 years) + (reviews due in the next 3, 5, or 10 years), with time per day set to my average from the last year.
Anki suggested 84-87% for most decks, though 80% for one. The suggestions weren’t proportional to how well I keep each deck reviewed on time, and the 5-year suggestion was usually lower than both the 3- and 10-year ones.
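In case it helps anyone reproduce this, here’s that card-count heuristic spelled out (the function name and the sample numbers are mine, purely illustrative):

```python
# Deck size to feed "Compute optimal retention" for a given planning horizon:
# the new cards you'll add over the horizon plus the reviews already due in it.
def cards_for_horizon(new_per_day: float, years: float, reviews_due: int) -> int:
    return round(new_per_day * 365 * years) + reviews_due

# Hypothetical example: 5 new cards/day, with made-up "due" counts per horizon.
for years, due in ((3, 4000), (5, 6500), (10, 9000)):
    print(f"{years}-year horizon: {cards_for_horizon(5, years, due)} cards")
```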

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.