Automatic removal of `<div></div>`

Sometimes I find my fields cluttered with useless <div></div> pairs or stray &nbsp; entities where plain spaces should be. It may be due to add-ons, aggressive copy-pasting, or God only knows what.
Sometimes I even find something like this:

Wouldn’t it be possible to add a simple feature that detects empty <div></div> pairs, or more generally “useless HTML junk”, and deletes them? Maybe when one checks the database, or when Anki shuts down?

4 Likes

Regarding the divs

You can use the “Find & Replace” function in the browser to remove these empty divs.

Cleaning up with Regex

To clean up monsters like the one in your image, regular expressions will not be as good as a proper Python script or parser. But you could try anyway:

Anki’s regular expression engine doesn’t support recursion such as (<div>)(?R)?(<\/div>) (Perl syntax), so you risk leaving unmatched tags behind with this regular expression:
<div>(<div>|<\/div>)*<\/div>

You could test this on a single note and see if it’s worth risking some side effects.
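
For anyone who prefers a script over Find & Replace, here is a minimal Python sketch of one way to avoid the unmatched-tag risk without regex recursion: remove only the innermost empty pair, and repeat until nothing changes. The function name `strip_empty_divs` is hypothetical, and treating whitespace-only divs as empty is an assumption. It works on a plain string and does not touch an Anki collection.

```python
import re

# One innermost empty div: an opening tag, optional whitespace, a closing tag.
# Treating whitespace-only divs as "empty" is an assumption of this sketch.
EMPTY_DIV = re.compile(r"<div>\s*</div>", re.IGNORECASE)


def strip_empty_divs(html: str) -> str:
    """Remove empty <div></div> pairs, including nested ones.

    Each pass deletes only balanced innermost pairs, so an outer pair
    becomes removable on the next pass and no tag is split from its partner.
    """
    while True:
        cleaned = EMPTY_DIV.sub("", html)
        if cleaned == html:  # nothing left to remove
            return cleaned
        html = cleaned


if __name__ == "__main__":
    print(strip_empty_divs("<div><div></div><div> </div></div>front text"))
```

Divs that contain real content, such as `<div>word</div>`, are left alone, and a stray tag that was already unbalanced stays where it was rather than being paired with the wrong partner.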

With the current version of Anki, these empty divs should not appear as often anymore, because Enter no longer inserts <div></div>:

Regarding non-breaking-spaces

I can reliably reproduce the insertion of &nbsp; by copy-pasting formatted content in the editor:

It doesn’t matter whether you paste it between different fields or into the same one. I’d really appreciate this getting fixed.

How to clean up non-breaking-spaces

In the meantime @aPaci, you can select all notes with Ctrl+A and (again) use the “Find & Replace” feature like this:
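
For a scripted variant of the same cleanup, a minimal Python sketch could look like the following. It assumes the field content is available as a plain string and that non-breaking spaces may appear either as the `&nbsp;` entity or as the raw U+00A0 character. The name `replace_nbsp` is hypothetical, and reading from or writing to the collection is left out.

```python
import re

# Match the HTML entity or the raw U+00A0 character.
NBSP = re.compile(r"&nbsp;|\u00a0")


def replace_nbsp(field_html: str) -> str:
    """Replace every non-breaking space in a field with a normal space."""
    return NBSP.sub(" ", field_html)


if __name__ == "__main__":
    print(replace_nbsp("pasted&nbsp;text with\u00a0odd spaces"))
```

As with Find & Replace, this converts every occurrence, so it is worth trying on a single note first.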


Related post

Anki inserts erroneous non-breaking spaces when pasting from clipboard

6 Likes

The non-breaking spaces are generated by the web toolkit, and it’s not trivial to work around them without breaking things.

2 Likes

I see that there is already a lot of work on this topic. What I was thinking of was a simple workaround: for example, when you check the database, an automatic process finds every non-breaking space and replaces it with a normal space…

1 Like

I would recommend against automatically converting nonbreaking spaces, as they are a legitimate character with legitimate uses. We use them relatively little in English, but some languages use them much more. For example, my work is in French for a small publisher, and they are in everyday use for punctuation in French. Similarly, I do use them with Anki, and sometimes I have to go out of my way to make sure they are not wrongly converted by Anki.
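
To illustrate the concern, here is a rough Python sketch of a more cautious cleanup that keeps non-breaking spaces where French typography typically places them: before `:`, `;`, `!`, `?` and `»`, and after `«`. The punctuation list and the function name `replace_nbsp_except_french` are assumptions for illustration only. Real usage varies by language and house style, so this is not a complete rule.

```python
import re

# A non-breaking space (entity or raw U+00A0) that is NOT directly after «
# and NOT directly before : ; ! ? or ».
# The punctuation set is an assumption, not an exhaustive typographic rule.
STRAY_NBSP = re.compile(r"(?<!«)(?:&nbsp;|\u00a0)(?![:;!?»])")


def replace_nbsp_except_french(text: str) -> str:
    """Convert non-breaking spaces to normal spaces, except next to
    punctuation where French typography expects them."""
    return STRAY_NBSP.sub(" ", text)


if __name__ == "__main__":
    print(replace_nbsp_except_french("Bonjour&nbsp;! pasted&nbsp;text"))
```

Even a filter like this will miss legitimate uses, such as between a number and its unit, which is part of why a blanket automatic conversion is risky.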

2 Likes

I saw find-and-replace mentioned. If anybody needs a GUI text preparation tool, I created detergent.io (there’s an npm package too). It lets you detect non-breaking space characters (U+00A0) and decide what to do with them, strip HTML, encode or decode entities (including non-breaking spaces), set the letter case, and collapse whitespace: a full toolkit. I originally created it to prepare text for pasting into HTML email templates, but it also helps me prepare text for Anki, especially lower-casing Cyrillic and German letters. Detergent is open source and is not monetized or tracked. I hope it helps somebody.

1 Like