[Anki Collection Size] Exceeds 250MB limit; need reduction methods, preserve schedules, tried standard fixes, ineffective

Dear Anki Community,

As a long-time Anki enthusiast, I am immensely grateful for the success it has brought to my studies and am proud to be part of this wonderful project. I would gladly contribute a monthly fee to ensure its continued existence and development.

Today, I seek your help in addressing a challenge that has arisen with my collection. After a decade of extensive use, primarily for medical studies, my collection has grown to about 400MB, exceeding the 250MB limit. I can still sync, but I cannot add new decks or set up AnkiHub, for which I just signed up.

Here are the key points of my situation:

•	Anki Version: Using the latest version, 23.12.1 on MacOS.
•	Collection Size: Currently at 408 MB (as shown on Ankiweb), over the 250 MB limit.
•	Decks and Cards: Several dozen decks with nearly 100,000 cards, many including large media files (which, to my understanding, don't affect the collection size).
•	Attempted Solutions: Followed standard recommendations like deleting unused decks, “Check Database”, using the “localize media” add-on and deleting empty note types, with no significant reduction in collection size.
•	Media Files: Not specifically compressed or removed.
•	Review history: I suspect the review history is the culprit. Deleting 25,000 cards from an unused deck did not make the collection significantly smaller, but deleting a few cards that had been in my collection for years did free up a few MB. I have statistics spanning the past ten years, but the review history is not at all important to me. I'd rather keep the scheduling data for each card.
•	Backups: Complete backup of my collection is available.
•	Synchronization: I usually synchronize with AnkiWeb multiple times daily when opening the iOS app or the desktop one.

I am seeking advice on how to reduce my collection size without compromising scheduling data. I am open to advanced methods (though I don't really speak Python or any database language). I would also welcome insights into how much review history affects collection size and, if it does, how to remove it without compromising my scheduling (and without starting my ten-year-old cards anew).

Thank you for your time and assistance. Any suggestions or guidance would be greatly appreciated!


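Last time, I planned to make heavy use of image occlusion, which would have put a heavy load on the Anki servers, and everyone jumped out to bully me. In the end I did not cause that kind of harm, and I found a solution, as the author suggested.

Since 2.1.57, you can set up your own server (before that, there was a community add-on that could do this, but it did not work for me). I tried the official built-in server-client and, yes, it worked. So the original poster only needs to do this. The limit can somehow be edited in the configuration file.

However, I am still on 2.1.54 myself. I sync only a few MB of Anki data with the server and sync the collection media myself, only to my SSD/TF card. Upgrading Anki has caused a lot of problems:

A. In 2.1.54 I use v2 and SM2, while 2.1.57 and later force you to use v3 and strongly encourage you to use FSRS.

B. Add-ons... some very, very basic add-ons stop working. Fortunately, the add-on authors are still alive (no joke, many people disappeared during the pandemic...). The add-ons I want most have been updated, but I still need time to test them.

In short: set up your own server.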

I love Chinese!

The other user is suggesting that you set up your own sync server.

Mm.
Yes.

Almost all of your collection is taken up by note data. You likely have one or more fields that include paragraphs or pages of text from a website, which will take up a lot of space when multiplied by many notes.

Thank you for your replies and considerations! I think self-hosting might be an option, but I don’t see myself setting it up, as I’m nowhere near proficient with the command line :blush:.

  • Regarding dae’s suggestion: Is there a smart way to determine which note type contains the fields taking up the most space, and by how much? I have quite a few from all the years of collecting decks, but I think I could browse through the types in an afternoon and report the effects here!
  • Also, is review history a significant factor, or not worth worrying about?
Thanks again, everyone, for your time and effort!

You can use regex to search for fields with a high number of characters, which should at least give you some clues where to look. For instance, to search for notes with fields containing 1000+ characters – re:.{1000} – or you can restrict that to a specific field – FIELDNAME:re:.{1000} – but you might need to start even higher than 1000 :sweat_smile: .
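For anyone curious what a pattern like .{1000} actually matches, here is a minimal Python sketch. It is only an illustration: the sample field texts are made up, and Python's `re` module is not the engine Anki's browser uses, so details such as newline handling may differ in Anki itself.

```python
import re

# Hypothetical field contents; real ones would come from your own notes.
fields = {
    "short": "la pomme",
    "long": "x" * 1500,
    "multiline": ("y" * 600) + "\n" + ("z" * 600),
}


def has_long_run(text: str, n: int = 1000) -> bool:
    """True if text contains at least n consecutive non-newline characters."""
    # In Python, "." matches any character except a newline by default.
    return re.search(r".{%d}" % n, text) is not None


# Collect the names of the sample fields that the pattern flags.
flagged = [name for name, text in fields.items() if has_long_run(text)]
print(flagged)
```

In this sketch the "multiline" sample is not flagged even though it holds 1,201 characters, because neither line reaches 1,000 on its own. Raising the number narrows the results to the very largest fields.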


Look for fields that look like they’ve been harvested from a website, like a ‘definition’ field. The more text you see, the more space it’s likely taking up.
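If browsing note types by hand gets tedious, a rough alternative is to rank note types by how much field text they hold, using Python's built-in `sqlite3` module. This is a sketch under assumptions, not an official tool: it assumes the collection file has a `notes` table with a `mid` column (note type id) and a `flds` column (all fields of a note joined into one string), and the demo table below is made up so the example runs on its own.

```python
import sqlite3
from collections import defaultdict


def field_bytes_by_notetype(conn):
    """Return (note type id, note count, total field bytes), largest first."""
    totals = defaultdict(lambda: [0, 0])  # mid -> [note count, bytes]
    for mid, flds in conn.execute("SELECT mid, flds FROM notes"):
        totals[mid][0] += 1
        totals[mid][1] += len(flds.encode("utf-8"))
    rows = [(mid, count, size) for mid, (count, size) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Demo on a made-up in-memory table (assumed layout, not a real collection).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, mid INTEGER, flds TEXT)")
conn.executemany(
    "INSERT INTO notes (mid, flds) VALUES (?, ?)",
    [
        (1, "front\x1fback"),
        (1, "front\x1fback"),
        (2, "mot\x1f" + "d" * 5000),
    ],
)

report = field_bytes_by_notetype(conn)
for mid, count, size in report:
    print(f"note type {mid}: {count} notes, {size} bytes of field text")
```

To try it on real data, close Anki, make a copy of the collection file, and connect to that copy instead of ":memory:" (skipping the CREATE and INSERT lines). Only ever open a copy this way, never the live collection. The sketch only reads the `notes` table, so it reports note type ids rather than names.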

The community add-on from before 2.1.54 is hard to set up; even a college grad who was top of their class failed at it.

But the one built into 2.1.57 is quite user-friendly, maybe even for dummies (idk). I guess a primary school student could set it up. (The college grad above did succeed in setting this one up.)

By using re:.{1000} and focusing on the suspended cards, I managed to delete 15,000 large, unused cards from a French deck. These cards contained embedded scripts and lengthy definitions (and represented the less useful Canadian French portion of the deck :grin:). This action effectively reduced my collection size by more than half. Now, syncing is smooth sailing. Thank you, @Danika_Dakika, and all others!


I am not sure if this is to be expected. If I search re:.{1000}, the results come up instantaneously but if I search re:.{2000}, it takes several minutes for the results to come up.
