Is there a clever way to count how many unique kanji appear in a deck?

johncaiwa · February 3, 2023, 7:37am

I have over 16,000 cards. Many cards have sentences.

It seems like there should be some way to skim all the cards and get a total for the unique number of kanji that appear.

dae · February 5, 2023, 5:26am

You might be able to do this with the Japanese Support add-on’s stats?

johncaiwa · February 5, 2023, 4:54pm

Thanks

I checked it out, but it showed that I had 0 kanji in my deck of 16,000 cards.

Perhaps because they are all kanji compounds?

dae · February 6, 2023, 5:34am

The add-on expects your note type to be in a standard format, e.g., to include ‘Japanese’ in the name and to have text in a field called ‘Expression’.

johncaiwa · February 23, 2023, 4:19pm

For anyone studying Japanese, you might like this. I asked ChatGPT to rewrite the code. Now it outputs all the kanji in the Anki deck, writes how many times each kanji appears, and lists them in descending order of frequency.

import sys
import re

def main():
    # set input and output files
    input_file = "kanji.txt"
    output_file = "output.txt"
    
    # read input file
    with open(input_file, "r", encoding="utf-8") as f:
        text = f.read()

    # remove kanji within <div id=tag> tags
    text = re.sub(r'<div id=tag>.*?</div>', '', text, flags=re.DOTALL)

    # count kanji
    kanji_count = {}
    for char in text:
        if '\u4e00' <= char <= '\u9fff':
            if char in kanji_count:
                kanji_count[char] += 1
            else:
                kanji_count[char] = 1
    
    # sort kanji by frequency
    kanji_freq = [(kanji, freq) for kanji, freq in kanji_count.items()]
    kanji_freq.sort(key=lambda x: x[1], reverse=True)
    
    # write results to output file
    with open(output_file, "w", encoding="utf-8") as f:
        f.write(f"Total Unique Kanji: {len(kanji_freq)}\n\n")
        for kanji, freq in kanji_freq:
            f.write(f"{kanji}: {freq}\n")

if __name__ == "__main__":
    if len(sys.argv) == 1:
        main()
    else:
        print("Usage: python kanji.py")

Name the script kanji.py.
Name the exported list of cards kanji.txt.
It outputs to output.txt.

Kanji appearing in tags are not included.

If you don't know how to run the script, ChatGPT taught me when I asked.
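
The counting step in the script above can also be written with collections.Counter from the standard library. This is only a sketch under a few assumptions: the function name count_kanji and the sample string are made up for illustration, and the tag-stripping regex is copied from the script as-is, so it only matches tags written exactly as <div id=tag>.

```python
import re
from collections import Counter

def count_kanji(text):
    """Count characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF)."""
    # Drop <div id=tag> blocks first, the same way the original script does.
    text = re.sub(r'<div id=tag>.*?</div>', '', text, flags=re.DOTALL)
    return Counter(ch for ch in text if '\u4e00' <= ch <= '\u9fff')

# Hypothetical sample input; a real run would read kanji.txt instead.
counts = count_kanji("日本語の本<div id=tag>漢字</div>")

print(f"Total Unique Kanji: {len(counts)}")
# most_common() returns (kanji, frequency) pairs, highest frequency first.
for kanji, freq in counts.most_common():
    print(f"{kanji}: {freq}")
```

Counter.most_common() takes the place of the manual dictionary bookkeeping and the separate sort, so the script gets shorter without changing what it reports.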

Rumo · February 23, 2023, 7:41pm

Apparently, ChatGPT hasn’t scraped this forum yet and doesn’t know how to use the debug console.
Here is a more concise version you can execute in Anki directly without a Python installation:

text = ""
for nid in mw.col.find_notes('"deck:My Japanese deck"'):
    text += mw.col.get_note(nid).joined_fields()

kanji_count = {}
for char in text:
    if '\u4e00' <= char <= '\u9fff':
        kanji_count[char] = kanji_count.get(char, 0) + 1
    
kanji_freq = [(kanji, freq) for kanji, freq in kanji_count.items()]
kanji_freq.sort(key=lambda x: x[1], reverse=True)

print(f"Total Unique Kanji: {len(kanji_freq)}\n\n")
for kanji, freq in kanji_freq:
    print(f"{kanji}: {freq}")
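
One caveat with the range check used above: '\u4e00' to '\u9fff' covers only the main CJK Unified Ideographs block, so rarer characters in Extension A (U+3400 to U+4DBF) or the compatibility block (U+F900 to U+FAFF) are skipped. The sketch below widens the check and counts one field string at a time instead of building one large string. The names KANJI_RANGES, is_kanji, and count_kanji_in_fields are hypothetical, and the sample list stands in for real note text. In the debug console, the strings returned by mw.col.get_note(nid).joined_fields() could presumably be passed in the same way.

```python
from collections import Counter

# Assumed set of ranges: main block, Extension A, compatibility ideographs.
KANJI_RANGES = (
    ('\u4e00', '\u9fff'),
    ('\u3400', '\u4dbf'),
    ('\uf900', '\ufaff'),
)

def is_kanji(ch):
    """Return True if ch falls in any of the ranges listed above."""
    return any(lo <= ch <= hi for lo, hi in KANJI_RANGES)

def count_kanji_in_fields(fields):
    """Count kanji across an iterable of strings, one string per note."""
    counts = Counter()
    for field_text in fields:
        counts.update(ch for ch in field_text if is_kanji(ch))
    return counts

# Hypothetical sample data standing in for the joined fields of three notes.
sample = ["猫が好き", "猫と犬", "㐂ぶ"]
counts = count_kanji_in_fields(sample)

print(f"Total Unique Kanji: {len(counts)}")
for kanji, freq in counts.most_common():
    print(f"{kanji}: {freq}")
```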

daddydave · February 23, 2023, 8:09pm

ChatGPT is becoming Frank’s Red Hot sauce, I think. (If you don’t get the reference, their slogan is, “I put that s💩t on everything!”)

johncaiwa · February 23, 2023, 8:56pm

>>> text = ""
... for nid in mw.col.find_notes('"deck:日本語"'):
...     text += mw.col.get_note(nid).joined_fields()
... 
... kanji_count = {}
... for char in text:
...     if '\u4e00' <= char <= '\u9fff':
...         kanji_count[char] = kanji_count.get(char, 0) + 1
...     
... kanji_freq = [(kanji, freq) for kanji, freq in kanji_count.items()]
... kanji_freq.sort(key=lambda x: x[1], reverse=True)
... 
... print(f"Total Unique Kanji: {len(kanji_freq)}\n\n")
... for kanji, freq in kanji_freq:
... pp(    print(f"{kanji}: {freq}"))
Traceback (most recent call last):
  File "aqt.main", line 1774, in onDebugRet
  File "<string>", line 15
    pp(    print(f"{kanji}: {freq}"))
    ^
IndentationError: expected an indented block

Rumo · February 23, 2023, 9:01pm

If that is supposed to be a question, you seem to have pressed Ctrl+Shift+Enter instead of Ctrl+Enter, which wrapped the last line in pp() and broke the code.

johncaiwa · February 24, 2023, 6:47am

Out of curiosity, I asked ChatGPT to write code that can run in Anki's debug console, and it did, albeit not as concisely as your code.

Rumo · February 24, 2023, 2:35pm

I would be interested to see that code. When I tried it, ChatGPT told me very confidently, but also very incorrectly what to do.

johncaiwa · February 25, 2023, 1:10pm

You have to tell it that. Then it will correct its mistakes. It rarely ever spits out correct code on the first go.

system · March 27, 2023, 1:10pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
