Help with Regex(?)


I’m translating some cards, but I’m having trouble using Find and Replace. Several cards contain an <img…> tag. I want to keep only these tags, along with their attributes, and remove the rest of the text. How can I do this?

I’m looking for a way to do this across thousands of cards, rather than deleting the words manually.

I think it’ll take a few steps to get this done using simple regular expressions (without lookahead and lookbehind assertions).

  1. Remove all text before the first tag and after the last tag of the HTML.

    Find: ^[^<]+|[^>]+$
    Replace: (leave empty)


  2. Remove all text between all HTML tags.

    Find: >[^<>]+<
    Replace: ><

  3. Replace the < and > of each <img> tag with a unique placeholder (so they can be restored later).

    Find: <img ([^>]+)>
    Replace: /:img $1:/

  4. Remove all remaining HTML tags.

    Find: <.*?>
    Replace: (leave empty)

  5. Restore each /:img ...:/ placeholder to <img ...>.

    Find: /:img (.*?):/
    Replace: <img $1>
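If you want to try the five steps outside Anki before touching your collection, they can be sketched in Python roughly like this. The function name keep_only_img and the sample strings are made up for illustration, and Python's re module writes $1 as \1, so treat this as a test harness for the patterns rather than what Anki runs internally.

```python
import re

def keep_only_img(html: str) -> str:
    """Apply the five find-and-replace steps in order (hypothetical helper)."""
    # Step 1: drop text before the first tag and after the last tag.
    html = re.sub(r'^[^<]+|[^>]+$', '', html)
    # Step 2: drop text sitting between two tags.
    html = re.sub(r'>[^<>]+<', '><', html)
    # Step 3: protect <img ...> tags with a placeholder.
    html = re.sub(r'<img ([^>]+)>', r'/:img \1:/', html)
    # Step 4: remove every remaining tag.
    html = re.sub(r'<.*?>', '', html)
    # Step 5: turn the placeholders back into <img ...> tags.
    html = re.sub(r'/:img (.*?):/', r'<img \1>', html)
    return html

sample = 'Front text <div>caption <img src="a.png"> more</div> tail'
print(keep_only_img(sample))  # prints: <img src="a.png">
```

One thing to watch: the closing placeholder :/ also appears inside https://, so if an image source is a full URL, step 5 would cut it off early. A different placeholder would be needed in that case.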

A possible alternative (which requires some programming knowledge) is to use the Ze Add Note Id add-on (AnkiWeb) to fill the first field with a unique ID, export your notes as HTML, and write a script that processes the exported text file and creates a .tsv file. The .tsv can have the same number of columns, or just two: the first column as is and the new text in the second column (the <img> tags in this case). Then import it back into Anki and choose ‘Update …’.
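A minimal sketch of such a script, using only the standard library. It assumes the export is tab-separated with the note ID in the first column and that header lines start with #; the function names and the sample data are hypothetical, so adjust them to whatever your export actually looks like.

```python
import csv
import io
import re

# Matches a whole <img ...> tag (assumes no ">" inside attribute values).
IMG_TAG = re.compile(r'<img\b[^>]*>', re.IGNORECASE)

def build_update_rows(exported_text: str) -> list:
    """Return [note_id, img_tags] rows from tab-separated export text."""
    reader = csv.reader(io.StringIO(exported_text), delimiter='\t')
    rows = []
    for row in reader:
        # Skip blank lines and "#..." header lines (assumed format).
        if not row or row[0].startswith('#'):
            continue
        note_id, fields = row[0], row[1:]
        images = IMG_TAG.findall('\t'.join(fields))
        rows.append([note_id, ' '.join(images)])
    return rows

def to_tsv(rows: list) -> str:
    """Serialize the rows as two-column .tsv text."""
    out = io.StringIO()
    csv.writer(out, delimiter='\t', lineterminator='\n').writerows(rows)
    return out.getvalue()

exported = (
    '#separator:tab\n'
    '101\t"Heart <img src=""heart.png""> anatomy"\n'
    '102\tNo picture here\n'
)
print(to_tsv(build_update_rows(exported)))
```

In a real run you would read the exported file from disk and write the result to a new .tsv file instead of printing it.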

Or maybe make a backup first and then run something like this somewhat old code in the debug console to batch process notes automatically with Python and BeautifulSoup. For example, this updates all notes in the “Default” deck and keeps only the <img> tags in the “Text” and “Sketchy” fields.

from bs4 import BeautifulSoup

# ----------- Options -----------
search = "deck:Default"
fields = ["Text", "Sketchy"]
# -------------------------------

nids = mw.col.findNotes(search)
print(f'Found {len(nids)} notes.')
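for nid in nids:
    note = mw.col.getNote(nid)
    for fld in fields:
        # Skip notes that do not have this field
        if fld not in note:
            continue
        soup = BeautifulSoup(note[fld], 'html.parser')
        # Keep only the <img> tags, joined by spaces
        images = [str(tag) for tag in soup.find_all('img')]
        note[fld] = ' '.join(images)
    # Save the modified note back to the collection
    note.flush()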

In case you ever run into something that really needs lookahead and lookbehind assertions, you could export the deck as .txt (or .json with CrowdAnki), perform the RegEx edits in your preferred text editor and then reimport the file.
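As a rough illustration of what lookaround buys you in an engine that supports it (Python's re here), the whole job fits in one pattern. The function name and the sample string are hypothetical, and the pattern assumes attribute values never contain a > character.

```python
import re

# Three alternatives, all replaced with nothing:
#   ^[^<]+           text before the first tag
#   (?<=>)[^<]+      text right after any tag (lookbehind for ">")
#   <(?!img\b)[^>]*> any tag that is not an <img> tag (negative lookahead)
NOT_IMG = re.compile(r'^[^<]+|(?<=>)[^<]+|<(?!img\b)[^>]*>')

def strip_all_but_img(html: str) -> str:
    """Remove everything except <img ...> tags (hypothetical helper)."""
    return NOT_IMG.sub('', html)

sample = 'A<img src="a.png">B<b>bold</b><img src="b.jpg">C'
print(strip_all_but_img(sample))  # prints: <img src="a.png"><img src="b.jpg">
```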


Thank you so much!!

Sorry for reviving the topic. I was studying Regex today and ended up finding an alternative solution.

Find: ((?:(<img.?>)).?)|.*?
Replace: $2
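The general idea here (capture the <img> tag, let the other alternatives match everything else, and replace each match with the captured group) can be tried out in Python as below. The pattern in this sketch is my own assumption and not a transcription of the one posted above, and the function name is made up.

```python
import re

# Group 1 captures an <img ...> tag; the other two alternatives match
# any other tag or a run of plain text, leaving group 1 unset.
IMG_OR_REST = re.compile(r'(<img\b[^>]*>)|<[^>]*>|[^<]+')

def keep_captured_img(html: str) -> str:
    """Replace every match with group 1, or with nothing if it is unset."""
    return IMG_OR_REST.sub(lambda m: m.group(1) or '', html)

sample = 'A<img src="a.png">B<b>bold</b><img src="b.jpg">C'
print(keep_captured_img(sample))  # prints: <img src="a.png"><img src="b.jpg">
```

A replacement function is used instead of a plain \1 so that matches without an <img> tag always become an empty string.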
