Help with Regex(?)

gustavosmen · September 17, 2021, 8:32pm

Hello!

I’m translating some cards, but I’m having trouble using Find and Replace. Several cards contain <img…> tags. I want to keep only these tags, along with the information inside them, and remove all the other text. How can I do this?
I’m looking for a way to do this across thousands of cards without deleting the words manually.

kelciour · September 18, 2021, 1:05am

I think it’ll require a few steps to get it done using simple regular expressions (without lookahead and lookbehind assertions).

1. Remove all text from the start and the end of the HTML.

   Find: ^[^<]+|[^>]+$
   Replace: (leave empty)

2. Remove all text between all HTML tags.

   Find: >[^<>]+<
   Replace: ><

3. Replace the <> in <img> with something unique (to be able to revert it later).

   Find: <img ([^>]+)>
   Replace: /:img $1:/

4. Remove all HTML tags.

   Find: <.*?>
   Replace: (leave empty)

5. Replace /:img ...:/ with <img ...> again.

   Find: /:img (.*?):/
   Replace: <img $1>
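
For anyone who wants to try the five steps outside Anki first, here is a minimal Python sketch that chains the same patterns with re.sub. The function name keep_only_img_tags is made up for illustration, and Python writes the back-reference as \1 where Find and Replace uses $1.

```python
import re


def keep_only_img_tags(html: str) -> str:
    """Apply the five Find/Replace steps in order (hypothetical helper)."""
    # Step 1: remove text before the first tag and after the last tag.
    html = re.sub(r'^[^<]+|[^>]+$', '', html)
    # Step 2: remove text sitting between two tags.
    html = re.sub(r'>[^<>]+<', '><', html)
    # Step 3: protect <img ...> tags with a temporary /:img ...:/ marker.
    html = re.sub(r'<img ([^>]+)>', r'/:img \1:/', html)
    # Step 4: remove every remaining HTML tag.
    html = re.sub(r'<.*?>', '', html)
    # Step 5: turn the temporary markers back into <img ...> tags.
    html = re.sub(r'/:img (.*?):/', r'<img \1>', html)
    return html


if __name__ == '__main__':
    sample = 'Hello <b>world</b> <img src="a.png"> bye'
    print(keep_only_img_tags(sample))
```

One caveat with the marker in steps 3 and 5: if an image attribute itself contains ":/" (for example a src starting with http://), the lazy match in step 5 stops too early. For local media file names this should not come up, but a different marker would be safer for remote URLs.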

A possible alternative (which requires some programming knowledge): use Ze Add Note Id - AnkiWeb to add a first field containing a unique ID, export your notes as HTML, and write a script that processes the exported text file and creates a .tsv file. That file can have either the same number of columns or just two: the first column as is and the new text in the second column (<img> in this case). Then import it back into Anki and choose ‘Update …’.
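
A rough sketch of what such a script could look like, assuming the export is tab-separated with the unique ID in the first column and the HTML in the remaining columns. The function names and the sample data are invented for illustration; a real export may need a different delimiter or header handling.

```python
import csv
import io
import re

# Matches a whole <img ...> tag; used to pull the tags out of each note.
IMG_TAG = re.compile(r'<img\b[^>]*>')


def build_update_rows(exported_text: str) -> list:
    """Return [note_id, img_tags] rows from tab-separated export text."""
    reader = csv.reader(io.StringIO(exported_text), delimiter='\t')
    rows = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        note_id, *fields = row
        images = IMG_TAG.findall(' '.join(fields))
        rows.append([note_id, ' '.join(images)])
    return rows


def to_tsv(rows: list) -> str:
    """Serialize the rows as two-column .tsv text."""
    buf = io.StringIO()
    csv.writer(buf, delimiter='\t', lineterminator='\n').writerows(rows)
    return buf.getvalue()


if __name__ == '__main__':
    exported = '1001\t"Heart <img src=""heart.png""> anatomy"\n1002\tNo image here\n'
    print(to_tsv(build_update_rows(exported)))
```

The first column is left untouched so Anki can match each row to the existing note on import.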

Or make a backup first and run something like the following (somewhat old) code in the debug console to batch process notes automatically with Python and BeautifulSoup. This example updates all notes in the “Default” deck and keeps only the <img> tags in the “Text” and “Sketchy” fields.

https://docs.ankiweb.net/misc.html?#debug-console

from bs4 import BeautifulSoup

# ----------- Options -----------
search = "deck:Default"
fields = ["Text", "Sketchy"]
# -------------------------------

nids = mw.col.findNotes(search)
print(f'Found {len(nids)} notes.')
for nid in nids:
    note = mw.col.getNote(nid)
    for fld in fields:
        if fld not in note:
            continue
        soup = BeautifulSoup(note[fld], 'html.parser')
        images = [str(tag) for tag in soup.find_all('img')]
        note[fld] = ' '.join(images)
    note.flush()
mw.reset()
print('Done!')

kleinerpirat · September 18, 2021, 9:52am

In case you ever run into something that really needs lookahead and lookbehind assertions, you could export the deck as .txt (or .json with CrowdAnki), perform the RegEx edits in your preferred text editor and then reimport the file.
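
As an illustration of what lookaround buys you, here is a small Python sketch (Python's re module standing in for a text editor; the function name is invented). With a lookbehind and a lookahead, the text between tags can be removed without matching the angle brackets themselves, so nothing has to be put back in the replacement.

```python
import re


def strip_text_between_tags(html: str) -> str:
    """Remove text that sits directly between two tags (hypothetical helper)."""
    # (?<=>) requires a '>' just before the match; (?=<) requires a '<' just
    # after it. Neither bracket is part of the match, so the replacement can
    # simply be empty.
    return re.sub(r'(?<=>)[^<>]+(?=<)', '', html)


if __name__ == '__main__':
    print(strip_text_between_tags('<b>bold</b> and <img src="a.png"><i>x</i>'))
```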

gustavosmen · September 18, 2021, 2:35pm

Thank you so much!!

gustavosmen · October 3, 2021, 9:58pm

Sorry for reviving the topic. I was studying Regex today and ended up finding an alternative solution.

Find: ((?:(<img.*?>)).*?)|.*?
Replace: $2
