Anki Addons Dataset: a detailed list of addons

Hello everyone,

When I started developing addons, I was looking for a good example of an existing addon that uses the same programming languages and values unit tests. It was difficult because there are nearly 2,000 addons, and unfortunately, the Addons List provides very limited details.

Recently, I created a dataset containing more comprehensive information scraped from the Addons List, GitHub, and the Anki Forum:

  1. AnkiWeb (the Addons List)

    1. Title
    2. Latest update
    3. Versions
    4. Rating, Likes, Dislikes
  2. GitHub

    1. Link to the GitHub repo
    2. Number of repo stars
    3. Last commit
    4. Programming languages
    5. Number of unit tests
    6. Number of GitHub Actions
  3. Anki Forum

    1. Support topic

I published it as a Hugging Face dataset (GitHub repo).

The most useful files are:

  1. Excel file for manual analysis: download
  2. JSON file for programmatic analysis (JSON schema)

The JSON file contains all parsed fields, whereas the Excel file includes only a subset.
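As a rough illustration of programmatic analysis, here is a minimal Python sketch that filters records by language and unit-test count. The field names (`title`, `github`, `languages`, `tests_count`) and the sample records are assumptions made up for this example; the real names are defined by the JSON schema.

```python
import json

# Hypothetical sample records; the actual structure is defined by the
# dataset's JSON schema and may use different field names.
SAMPLE = json.loads("""
[
  {"title": "Addon A", "github": {"languages": ["Python"], "tests_count": 12}},
  {"title": "Addon B", "github": {"languages": ["Python", "Svelte"], "tests_count": 0}},
  {"title": "Addon C", "github": null}
]
""")


def find_examples(addons, language, min_tests=1):
    """Return titles of addons that use `language` and have at least `min_tests` tests."""
    matches = []
    for addon in addons:
        # Addons without a linked repo are treated as having no GitHub data.
        github = addon.get("github") or {}
        uses_language = language in github.get("languages", [])
        if uses_language and github.get("tests_count", 0) >= min_tests:
            matches.append(addon["title"])
    return matches


print(find_examples(SAMPLE, "Python"))
```

With the real file, `SAMPLE` would be replaced by `json.load(...)` on the downloaded JSON, and the lookups adjusted to match the schema.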

I plan to refresh the dataset periodically, most likely once a month.

If you have any questions, feedback, or suggestions, please post a comment in this topic.


Interesting idea. I only skimmed it, but noticed the first add-on is already misnamed. You might want to double-check how you’re pulling the names; it could be an issue with the scraping or parsing logic.


The parser can override automatically scraped data with manually corrected values.
The name “Anki Monitor” was overridden (for testing purposes) in overrides.yaml.
I’ll remove this value.
Thanks for mentioning it!
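For anyone curious how such an override step might work, here is a simplified Python sketch. It is not the actual parser code: the function name `apply_overrides`, the record layout, and the IDs are assumptions for illustration, and a plain dict stands in for the parsed contents of overrides.yaml.

```python
def apply_overrides(scraped, overrides):
    """Return a copy of the scraped records with manual values taking precedence.

    Both arguments map an addon ID to a dict of fields (a hypothetical layout).
    """
    result = {}
    for addon_id, record in scraped.items():
        merged = dict(record)  # copy, so the scraped data stays untouched
        merged.update(overrides.get(addon_id, {}))  # manual values win
        result[addon_id] = merged
    return result


# Hypothetical data: one scraped title is wrong and gets corrected manually.
scraped = {
    "1": {"title": "Wrong Name", "stars": 5},
    "2": {"title": "Other Addon", "stars": 1},
}
overrides = {"1": {"title": "Correct Name"}}

merged = apply_overrides(scraped, overrides)
print(merged["1"]["title"])
```

Fields without an override keep their scraped values, so a leftover test entry in the overrides changes only the field it names.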

I’ve updated it.


This is against AnkiWeb’s ToS, so it would have been nice to ask first. If you plan to do this on an ongoing basis, please make sure you minimize the load you’re placing on AnkiWeb.

Coincidentally, I had made a fresh dump of all add-ons a few days prior to judge psutil usage, and I’ve just uploaded it to GitHub in case others will find it useful.

@abdo heads up


Hi @dae ,

I’d like to assure you that my script will not place any noticeable load on AnkiWeb:

  1. It caches all raw responses from AnkiWeb and the GitHub API, so it never reads the same page twice, even if I need to regenerate the dataset for the same date.
  2. I plan to update the dataset only once per month.
  3. The script reads each add-on page just once and caches it.
  4. Most of the information about the source code is retrieved from the GitHub API.
  5. It doesn’t retrieve anything from the Anki Forum.
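The caching idea from the list above can be sketched roughly as follows. This is an illustration rather than the real script: the class name `DatedCache`, the directory layout, and the `fetch` callback are assumptions, and a fake fetch function replaces real network requests.

```python
import hashlib
import tempfile
from pathlib import Path


class DatedCache:
    """Store raw responses on disk, one directory per dataset date."""

    def __init__(self, root, date):
        self.dir = Path(root) / date
        self.dir.mkdir(parents=True, exist_ok=True)

    def get(self, url, fetch):
        # Hash the URL to get a safe file name for the cached response.
        path = self.dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
        if path.exists():
            return path.read_text(encoding="utf-8")
        body = fetch(url)  # only reached on a cache miss
        path.write_text(body, encoding="utf-8")
        return body


# Fake fetch that records every call, standing in for an HTTP request.
calls = []


def fake_fetch(url):
    calls.append(url)
    return "<html>addon page</html>"


tmp = tempfile.mkdtemp()
cache = DatedCache(tmp, "2025-01-01")
first = cache.get("https://example.invalid/addon/1", fake_fetch)
second = cache.get("https://example.invalid/addon/1", fake_fetch)
print(len(calls))
```

Regenerating the dataset for the same date reuses the stored files, while a new date gets a fresh directory and therefore one new read per page.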

Sorry, I didn’t review any terms of use, as I considered this data to be open-source.

Unfortunately, your dump of add-on sources doesn’t contain all the information I would like to present in the dataset, so I can’t rely on it alone.


Thank you for your reply; that sounds ok.

The tarball I provided is a supplement, not a replacement. It’s mainly useful for grepping through the entire codebase to search for issues, or for gauging how commonly a particular feature/API is used.


@aleks_ya nice project. Definitely good to have an overview of all add-ons.

@dae Just out of curiosity: what server do you use to host AnkiWeb so it can handle so many requests? Which operating system and database?


I’m reluctant to share too much info, as I don’t want to make life any easier for the copycats, but the key point is scaling horizontally.


@dae, my motivation for creating the addons dataset was my own need to find good examples when I tried to develop my own addons (Note Size and Cross-Field Highlighter).

I wanted to find not just the most popular addon, but one that has unit tests, because I really detest manual testing and want to cover as much as possible with unit tests.

Recently, Svelte has started to replace PyQt in addons, and I wanted to have some examples of addons utilizing it. My dataset includes both the language used and the number of unit tests, so I (and other developers) can find what we need.