Regular Expressions (RegEx) — Cheat Sheet?

auntanki · November 5, 2020, 7:12am

Is there a database somewhere of regular expressions that can be used in Anki’s Card Browser?

Word count between x and y

"field:re:^\s*\S+(?:\s+\S+){x,y}\s*$"

I found this very helpful; but I’m a noob …

For instance, how would the above regular expression have to be modified to also work with cloze fields that contain hint words inside the curly brackets, which should not be counted?
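For reference, the quantifier in the pattern quoted above can be tried in plain Python, outside Anki. This is only a sketch: the helper name word_count_pattern is made up for the illustration, and Python's re module stands in for Anki's search engine, which may differ in edge cases. One thing it shows is that the leading \S+ already consumes one word, so {x,y} counts the additional words: with x=2 and y=4 the pattern accepts 3 to 5 words in total.

```python
import re


def word_count_pattern(x: int, y: int) -> re.Pattern:
    """Build the quoted pattern with concrete bounds (hypothetical helper)."""
    # One leading word (\S+) plus x..y further whitespace-separated words.
    return re.compile(r"^\s*\S+(?:\s+\S+){%d,%d}\s*$" % (x, y))


pattern = word_count_pattern(2, 4)  # accepts 3 to 5 words in total

for sample in ["one two", "one two three", "a b c d e", "a b c d e f"]:
    # Print each sample with whether the pattern accepts it.
    print(repr(sample), bool(pattern.match(sample)))
```

So to search for fields with 3 to 5 words in the browser, the bounds in the expression would be {2,4} rather than {3,5}.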

cardosaum · November 5, 2020, 11:10am

Maybe Chapter 7 of Automate the Boring Stuff with Python can help you get the gist of regex?

Keep in mind that newer versions of Anki use a slightly different regex syntax.

There are actually several cheat sheets out there, but frankly? Just learn the basics and you won’t need to depend on them.

cardosaum · November 5, 2020, 11:32am

Well, I don’t think I got what you wanted, but maybe someone else can enhance this regex?

^\s*((\S+)|(\{\{(\s*\S*?)\:(\s*\S*?)\:(\s*\S*?)?\}\}))(?:\s+((\S+)|(\{\{(\s*\S*?)\:(\s*\S*?)\:(\s*\S*?)?\}\}))){x,y}\s*$

btw, there are also sites where you can write your regex interactively.

hkr · November 5, 2020, 12:57pm

Edit: fixed the code.

Regex search in recent Anki versions uses Rust’s regex engine, as @cardosaum mentioned. Since Rust’s regex engine doesn’t seem to currently support arbitrary lookahead/lookbehind assertions, it might be difficult to sort out cards by word count while excluding hints in cloze deletions.

Another option to achieve what you want would be to use Anki’s debug console. Try the following:

Let’s say you want to get cards whose Text field has 3-5 words, excluding hints in cloze deletions.

Change the values of target_field, min_word_count and max_word_count in the following code to suit your needs. If you want to limit the search to a specific deck, change the value of search_query to something like search_query = f'"{target_field}:*" "deck:your deck name"' (the same syntax as the browser search).

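import re

import aqt
from aqt import mw
from anki.utils import ids2str, stripHTML

# Settings: field to inspect and the inclusive word-count range.
target_field = "Text"
min_word_count = 3
max_word_count = 5

# Only look at notes that have the target field (browser search syntax).
search_query = f'"{target_field}:*"'
word_pattern = re.compile(r"\w+")
# Strip the opening "{{c<num>::", any "::hint}}" tail, and a bare closing "}}".
cloze_pattern = re.compile(r"{{c\d+::|::.*?}}|}}")

# Open the browser and clear its search box.
browser = aqt.dialogs.open("Browser", mw)
browser.form.searchEdit.lineEdit().setText("")
browser._lastSearchTxt = ""
model = browser.model
result_note_ids = []

model.beginReset()

for note_id in mw.col.find_notes(search_query):
    note = mw.col.getNote(note_id)
    for key, value in note.items():
        if key == target_field:
            # Count words on plain text, with HTML and cloze markup removed.
            text = stripHTML(value)
            text = cloze_pattern.sub("", text)
            cnt = len(word_pattern.findall(text))
            if min_word_count <= cnt <= max_word_count:
                result_note_ids.append(note_id)

# Show every card that belongs to a matching note in the browser table.
sql = f"select id from cards where nid in {ids2str(result_note_ids)}"
cards = mw.col.db.list(sql)
model.cards = cards
model.endReset()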

Open the debug console from Anki’s main window, paste the above code into the top frame, and run it with Ctrl + Enter.
Anki’s browser window will open automatically, and only cards whose Text field has 3-5 words should appear in the browser table view. If your collection is huge, the search may take tens of seconds.
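If you switch between searching the whole collection and a single deck often, the query string could be built by a small function instead of being edited by hand. A minimal sketch, assuming only the two search terms mentioned above; build_search_query is a hypothetical helper, not part of Anki:

```python
from typing import Optional


def build_search_query(target_field: str, deck_name: Optional[str] = None) -> str:
    """Return a browser-style search string (hypothetical helper, not part of Anki)."""
    # Restrict the search to notes that have the target field.
    query = f'"{target_field}:*"'
    if deck_name is not None:
        # Optionally restrict the search to one deck as well.
        query += f' "deck:{deck_name}"'
    return query


print(build_search_query("Text"))
print(build_search_query("Text", "your deck name"))
```

The returned string can be assigned to search_query in the script above.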

Rumo · November 5, 2020, 3:00pm

Regular expressions are an extremely empowering tool in the digital world and I second @cardosaum’s advice to familiarise yourself with them, @auntanki. I know they look intimidating, but they aren’t really more complicated than Anki.

That being said, I agree with @hkr that this concrete task is really hard to accomplish with RegEx (at least without lookaround assertions). @cardosaum made a good start, but this doesn’t quite work yet and is already impossible to understand (at least for me) without external tools.
After all, RegEx aren’t made for counting.

@hkr’s solution is great!
I think there’s a bug, though: The cloze pattern should be r"{{c\d+::.*?}}".
Also, I’d advise including at least Text:* (the field name in question) in the search query to speed up the search significantly.

hkr · November 5, 2020, 4:13pm

Thank you for the reply, @Rumo!

In the code above, the purpose of cloze_pattern is not to parse {{c<num>::<words>}}, but to remove the leading {{c<num>:: and the trailing ::hint words}}. For example, if the content of the Text field is aaa {{c1::some words::hint words}} bbb {{c2::words with no hint}} ccc, it will be converted to aaa some words bbb words with no hint}} ccc by executing text = cloze_pattern.sub("", text). But I have just realized that I should also remove the trailing }} in case there are spaces between the clozed word and the trailing }}, so I fixed cloze_pattern.

That’s right. I fixed search_query.
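The substitution can also be tried outside Anki on the example string. The sketch below uses the fixed cloze_pattern from the script, so the trailing }} is removed as well; count_words_without_hints is just an illustrative helper name, and HTML stripping is left out because it needs Anki's own utilities.

```python
import re

word_pattern = re.compile(r"\w+")
# Removes the opening "{{c<num>::", any "::hint}}" tail, and a bare closing "}}".
cloze_pattern = re.compile(r"{{c\d+::|::.*?}}|}}")


def count_words_without_hints(text: str) -> int:
    """Count words after stripping cloze markup and hints (illustrative helper)."""
    return len(word_pattern.findall(cloze_pattern.sub("", text)))


sample = "aaa {{c1::some words::hint words}} bbb {{c2::words with no hint}} ccc"
stripped = cloze_pattern.sub("", sample)

print(stripped)                           # aaa some words bbb words with no hint ccc
print(count_words_without_hints(sample))  # 9
```

One caveat of this approach: a literal :: that appears outside a cloze is also treated as the start of a hint, so everything from it up to the next }} would be dropped.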

Rumo · November 5, 2020, 4:33pm

Ah, he only wanted to exclude the hint words! I missed that.
In that case, I’d just add the + or something similar. I believe Anki allows clozes to be numbered from 1 to 500 (not necessarily consecutively).

hkr · November 5, 2020, 4:42pm

Oh, sorry. I just noticed that you were pointing out the case of {{c<xxx>::. I fixed the code again.
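To illustrate why the + matters: assuming the earlier version of the pattern used \d without + (the thread does not show that version, so this is a guess), a two-digit cloze number would not be stripped correctly. A small sketch in plain Python:

```python
import re

# Assumed earlier form: only matches single-digit cloze numbers.
single_digit = re.compile(r"{{c\d::|::.*?}}|}}")
# Form used in the script above: matches cloze numbers of any length.
multi_digit = re.compile(r"{{c\d+::|::.*?}}|}}")

sample = "{{c12::two words::hint}} end"

print(single_digit.sub("", sample))  # {{c12 end
print(multi_digit.sub("", sample))   # two words end
```

With the single-digit form, the opening marker is not recognised, so the "::...}}" alternative swallows the clozed words together with the hint and the word count comes out wrong.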
