Can you add a feature to check word frequency via the Books Ngram Viewer? It’s an excellent tool for building effective decks! Currently, users have to verify each word manually. Would it be possible to automate this?
If you’re creating decks, you can use pre-made frequency lists for that. Or is this for a different use-case?
I tried these lists—even the ones with 219,000 words are missing many common ones. They’re not reliable. For myself, I wrote a Python script to process flashcards using an offline Books Ngram database.
This isn’t a feature that would be built into Anki. It would be different for every language and useless for folks who aren’t studying languages (or aren’t studying based on frequency). You can consider integrating your script into Anki as an add-on.
There are good lists for many languages (and you can ask in your language-specific learning community to find those). But using an Ngram database is just a different form of a frequency list. The quality of the list depends on how well the corpus is built and queried – which is also very language-specific.
Can you recommend any specific good free word lists for English? I only know about the wordfreq library for Python. And maybe the idea isn’t as good as I thought. I’ve always associated Anki with English. If I had been born an English speaker, I wouldn’t have needed to learn another language. Lucky them.
No, I don’t have any learning resources to suggest. Again – a community focused on learning that language is the best place to search for those.
Check out Refold’s Discord server. Most people there use Anki, so they can tell you what to use. Also, check the shared decks for your language and see what other people use.
It’s easy enough to add a clickable HTML link in your template text to the appropriate Ngram Viewer page. However, you have to use it sparingly or Google will temporarily block your searches (return no results) because of what it interprets as bot-like behavior.
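As a rough sketch of such a link, the Python below builds the anchor markup for a given word. The `content`, `year_start`, and `year_end` query parameters are what the Ngram Viewer appears to use in its own URLs, so treat them as assumptions that may change; `ngram_link` is a hypothetical helper name, not part of Anki.

```python
from html import escape
from urllib.parse import urlencode

# Assumed base URL of the Ngram Viewer graph page.
NGRAM_BASE = "https://books.google.com/ngrams/graph"

def ngram_link(word, year_start=1800, year_end=2019):
    """Return an HTML anchor that opens the Ngram Viewer for `word`."""
    # urlencode percent-encodes the word and turns spaces into '+'.
    query = urlencode(
        {"content": word, "year_start": year_start, "year_end": year_end}
    )
    url = f"{NGRAM_BASE}?{query}"
    # escape() makes the URL and label safe to embed in HTML.
    return f'<a href="{escape(url)}">{escape(word)} (Ngram)</a>'

print(ngram_link("serendipity"))
```

In a card template the word would come from a field reference rather than a Python argument (for example `{{Front}}`, assuming the field is named Front), and the link is only followed when you click it, which helps keep the request rate low.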
The Wortschatz Leipzig corpora will show you the N most commonly encountered terms in year Y for language L, compiled from various news media or other sources.
There’s a wide selection of languages (click on “All Languages”).
Here’s the download page specifically for English.
Here N can be 10K, 30K, 100K, 300K, or 1M.
The years available are typically recent (21st century), not the older historical data that Google Ngrams covers.
The frequency lists change from year to year. Obviously topical items like “COVID” feature heavily in some years and not others.
Because they just scoop up actual online text, they include all grammatical forms, not just dictionary forms. So you’ll have plurals, various gendered forms like masculine and feminine adjectives, verb conjugations and noun declensions, proper nouns, common words borrowed as-is from other languages, variant spellings and even some common misspellings, and so on.
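Because inflected forms get their own rows, a deck word may have to be looked up under the exact spelling on the card. Here is a minimal sketch of ranking deck words against such a list, assuming a tab-separated file with the columns id, word, count (check the layout of the file you actually download). The sample rows are invented, and `load_ranks` is a hypothetical helper.

```python
import csv
import io

def load_ranks(lines):
    """Map each lowercased word to its rank (1 = most frequent) by count."""
    counts = {}
    # QUOTE_NONE so that quote characters inside words are kept as-is.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 3 or not row[2].isdigit():
            continue  # skip malformed or header rows
        word = row[1].lower()
        # Merge case variants such as "The" and "the" into one entry.
        counts[word] = counts.get(word, 0) + int(row[2])
    # Sort by descending count, breaking ties alphabetically.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank for rank, word in enumerate(ordered, start=1)}

# Invented sample rows, not real corpus counts.
sample = io.StringIO(
    "1\tthe\t5000\n2\tThe\t700\n3\thouse\t120\n4\thouses\t40\n"
)
ranks = load_ranks(sample)

for word in ["house", "houses", "serendipity"]:
    print(word, ranks.get(word, "not in list"))
```

Merging case variants is a design choice that suits English. It would be wrong for a language like German, where capitalization distinguishes nouns, and it does nothing to group plurals or conjugations under a dictionary form.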