Regular Expressions (RegEx) — Cheat Sheet?

Is there a database somewhere of RegEx patterns that can be used in Anki’s Card Browser?

E.g. (kindly provided by user Rumo):

Word count between x and y

"field:re:^\s*\S+(?:\s+\S+){x,y}\s*$"
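A quick way to check what this pattern actually counts is to try it with Python’s re module, which supports the same constructs used here. Note that the leading \S+ already matches the first word, so {x,y} only counts the words after it: the pattern matches x+1 to y+1 words in total. The helper names below are made up for this sketch.

```python
import re


def word_count_pattern(x: int, y: int) -> re.Pattern:
    """Rumo's pattern with concrete numbers in place of {x,y}."""
    # In an rf-string, "{{" and "}}" produce literal braces, so this
    # builds e.g. ^\s*\S+(?:\s+\S+){2,4}\s*$ for x=2, y=4.
    return re.compile(rf"^\s*\S+(?:\s+\S+){{{x},{y}}}\s*$")


def between(lo: int, hi: int) -> re.Pattern:
    """Hypothetical helper: match fields with lo to hi words in total (lo >= 1)."""
    # The leading \S+ is one word, so the repeated group needs one less.
    return word_count_pattern(lo - 1, hi - 1)


pattern = word_count_pattern(2, 4)
for text in ["one two", "one two three", "one two three four five",
             "one two three four five six"]:
    print(len(text.split()), bool(pattern.search(text)))
# 2 False
# 3 True
# 5 True
# 6 False
```

So in the browser, to find fields with 3 to 5 words you would write the quantifier as {2,4}.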

I found this very helpful, but I’m a noob.

For instance, how would the above RegEx have to be modified to also work with cloze fields whose hint words inside the curly brackets should not be counted?

Maybe Chapter 7 of Automate the Boring Stuff with Python can help you get the gist of regex?

Keep in mind that newer versions of Anki use a slightly different regex syntax (that of Rust’s regex engine).

There are actually several cheat sheets out there, but frankly? Just learn the basics and you’ll become independent of them.


Well, I’m not sure I understood what you wanted, but maybe someone else can improve this regex?

^\s*((\S+)|(\{\{(\s*\S*?)\:(\s*\S*?)\:(\s*\S*?)?\}\}))(?:\s+((\S+)|(\{\{(\s*\S*?)\:(\s*\S*?)\:(\s*\S*?)?\}\}))){x,y}\s*$

btw, there are also sites where you can write your regex interactively.

Edit: fixed the code.

Regex search in recent Anki versions uses Rust’s regex engine, as @cardosaum mentioned. Since Rust’s regex engine does not support lookahead/lookbehind assertions, it is difficult to filter cards by word count while excluding hints in cloze deletions.

Another option to achieve what you want would be to use Anki’s debug console. Try the following:

  • Let’s say you want to get cards whose Text field has 3-5 words, excluding hints in cloze deletions.

  • Change the values of target_field, min_word_count and max_word_count in the following code to suit your needs. If you want to limit the search to a specific deck, change the value of search_query to something like search_query = f'"{target_field}:*" "deck:your deck name"' (the same syntax as the browser search).

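    import re
    
    import aqt
    from anki.utils import ids2str, stripHTML
    
    # Settings: adjust these to suit your needs.
    target_field = "Text"
    min_word_count = 3
    max_word_count = 5
    
    # Only look at notes whose note type has a field with this name.
    search_query = f'"{target_field}:*"'
    word_pattern = re.compile(r"\w+")
    # Removes "{{c<num>::", "::hint words}}" and a leftover "}}",
    # so the clozed words are counted but the hint words are not.
    cloze_pattern = re.compile(r"{{c\d+::|::.*?}}|}}")
    
    # Open the browser and clear its search box before replacing its contents.
    browser = aqt.dialogs.open("Browser", mw)
    browser.form.searchEdit.lineEdit().setText("")
    browser._lastSearchTxt = ""
    model = browser.model
    result_note_ids = []
    
    model.beginReset()
    
    for note_id in mw.col.find_notes(search_query):
        note = mw.col.getNote(note_id)
        for key, value in note.items():
            if key == target_field:
                # Strip HTML and cloze markup, then count the remaining words.
                text = stripHTML(value)
                text = cloze_pattern.sub("", text)
                cnt = len(word_pattern.findall(text))
                if min_word_count <= cnt <= max_word_count:
                    result_note_ids.append(note_id)
    
    # Show all cards belonging to the matching notes in the browser table.
    sql = f"select id from cards where nid in {ids2str(result_note_ids)}"
    cards = mw.col.db.list(sql)
    model.cards = cards
    model.endReset()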
    
  • Open the debug console from Anki’s main window (Ctrl + Shift + ;), paste the above code into the top frame, and run it with Ctrl + Enter.

  • Anki’s browser window will open automatically, and only cards whose Text field has 3-5 words (not counting cloze hints) should appear in the browser table. If your collection is huge, the search may take tens of seconds.
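The counting logic itself does not depend on Anki, so it can be tried out in plain Python before running it in the debug console. This is a sketch under two assumptions: a simplified tag-stripping regex stands in for anki.utils.stripHTML (Anki’s own helper handles more cases), and the note ids and field contents are made-up sample data.

```python
import html
import re

# Same patterns as in the debug-console code above.
word_pattern = re.compile(r"\w+")
cloze_pattern = re.compile(r"{{c\d+::|::.*?}}|}}")
# Simplified stand-in for stripHTML (assumption): drop tags, decode entities.
tag_pattern = re.compile(r"<[^>]*>")


def count_words(field_html: str) -> int:
    """Count words in a field, ignoring HTML tags, cloze markers and hints."""
    text = html.unescape(tag_pattern.sub("", field_html))
    text = cloze_pattern.sub("", text)
    return len(word_pattern.findall(text))


def filter_ids(fields: dict[int, str], min_count: int, max_count: int) -> list[int]:
    """Return the ids whose field has between min_count and max_count words."""
    return [
        note_id
        for note_id, value in fields.items()
        if min_count <= count_words(value) <= max_count
    ]


# Hypothetical sample data: note id -> content of the Text field.
sample = {
    1: "The {{c1::capital::city}} of France",
    2: "<b>Paris</b> is {{c1::big::adjective with many words}}",
    3: "One",
}
print(filter_ids(sample, 3, 5))  # [1, 2]
```

Note 2 only passes because its four hint words are ignored; counted naively it would have seven words.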


RegEx are an extremely empowering tool in the digital world, and I second @cardosaum’s advice to familiarise yourself with them, @auntanki. I know they look intimidating, but they aren’t really more complicated than Anki. :wink:

That being said, I agree with @hkr that this particular task is really hard to accomplish with RegEx (at least without lookaround assertions). @cardosaum made a good start, but it doesn’t quite work yet and is already impossible to understand (at least for me) without external tools.
After all, RegEx aren’t made for counting.
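To see the problem concretely, cardosaum’s pattern can be tried in Python’s re module, which accepts the same syntax here. The sample field and the concrete quantifiers are assumptions for this sketch. The field aaa {{c1::some words::hint words}} has three words once the hint is removed, yet the pattern only matches when the quantifier allows four whitespace-separated tokens: the cloze alternative only allows whitespace right after {{ or a colon, so it cannot span the space inside "some words", and everything falls back to \S+ tokens, hint words included.

```python
import re

# One "word": either a run of non-spaces or cardosaum's cloze alternative.
TOKEN = r"((\S+)|(\{\{(\s*\S*?)\:(\s*\S*?)\:(\s*\S*?)?\}\}))"


def cardosaum_pattern(x: int, y: int) -> re.Pattern:
    """cardosaum's regex with concrete numbers in place of {x,y}."""
    return re.compile(rf"^\s*{TOKEN}(?:\s+{TOKEN}){{{x},{y}}}\s*$")


field = "aaa {{c1::some words::hint words}}"
print(bool(cardosaum_pattern(2, 2).search(field)))  # False: exactly 3 tokens required
print(bool(cardosaum_pattern(3, 3).search(field)))  # True: 4 tokens, hint words counted
```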

@hkr’s solution is great!
I think there’s a bug, though: the cloze pattern should be r"{{c\d+::.*?}}".
Also, I’d advise including at least Text:* (the field name in question) in the search query to speed up the search significantly.


Thank you for the reply, @Rumo!

In the code above, the purpose of cloze_pattern is not to match whole {{c<num>::<words>}} deletions, but to remove the leading {{c<num>:: and the trailing ::hint words}}. For example, if the content of the Text field is aaa {{c1::some words::hint words}} bbb {{c2::words with no hint}} ccc, executing text = cloze_pattern.sub("", text) converts it to aaa some words bbb words with no hint}} ccc. But I have just realized that I should also remove the trailing }} in case there are spaces between the clozed word and the trailing }}, so I fixed cloze_pattern.
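Running the final cloze_pattern from the code above (the version with the extra |}} alternative) on this example confirms that the clozed words are kept and only the hint disappears. For comparison, the sketch also applies Rumo’s suggested pattern, which removes each cloze deletion as a whole, answer words included. The example string is the one from this post; everything else is standard library.

```python
import re

example = "aaa {{c1::some words::hint words}} bbb {{c2::words with no hint}} ccc"
word_pattern = re.compile(r"\w+")

# hkr's final pattern: drop "{{cN::", "::hint}}" and a bare "}}".
strip_markers = re.compile(r"{{c\d+::|::.*?}}|}}")
# Rumo's suggestion: drop each whole cloze deletion, answer included.
strip_clozes = re.compile(r"{{c\d+::.*?}}")

kept = strip_markers.sub("", example)
print(kept)                             # aaa some words bbb words with no hint ccc
print(len(word_pattern.findall(kept)))  # 9
print(len(word_pattern.findall(strip_clozes.sub("", example))))  # 3
```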

That’s right. I fixed search_query.


Ah, he only wanted to exclude the hint words! I missed that. :slight_smile:
In that case, I’d just add the + after \d or something similar. I believe Anki allows clozes to be numbered from 1 to 500 (not necessarily consecutively).


Oh, sorry. I just noticed that you were pointing out the case of {{c<xxx>:: (cloze numbers with more than one digit). I fixed the code again.
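The difference between \d and \d+ only shows up with cloze numbers of two or more digits. This sketch assumes the earlier version of the pattern used a single \d (as the suggestion to add a + implies); the sample text is made up.

```python
import re

word_pattern = re.compile(r"\w+")
# Assumed earlier version: a single \d only matches c1 ... c9.
single_digit = re.compile(r"{{c\d::|::.*?}}|}}")
# Fixed version: \d+ also matches c10, c12, and so on.
multi_digit = re.compile(r"{{c\d+::|::.*?}}|}}")

text = "x {{c12::two words::hint}} y"

for pattern in (single_digit, multi_digit):
    cleaned = pattern.sub("", text)
    print(repr(cleaned), len(word_pattern.findall(cleaned)))
# 'x {{c12 y' 3
# 'x two words y' 4
```

With the single-digit pattern, "{{c12::" is never matched, so the hint alternative ::.*?}} starts at the :: after c12 and drops the clozed words as if they were hint words.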
