Sometimes I find my fields cluttered with useless <div></div> or random &nbsp; instead of simple spaces. It may be due to add-ons, aggressive copy-pasting, or God knows what.
Sometimes I even find something like this:
Wouldn’t it be possible to add a simple feature that detects empty <div></div> or, more generally, “useless HTML junk” and deletes it? Maybe when one checks the database, or when Anki shuts down?
You can use the “Find & Replace” function in the browser to remove these empty divs.
Cleaning up with Regex
To clean up monsters like the one in your image, regular expressions will not be as good as a proper Python script / parser. But you could try anyways:
Anki’s regular expression engine doesn’t support recursion like (<div>)(?R)?(<\/div>) (Perl syntax), so a pattern cannot check that opening and closing tags are balanced. With this regular expression you would therefore risk leaving unmatched tags behind: <div>(<div>|<\/div>)*<\/div>
You could test this on a single note and see if it’s worth risking some side effects.
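To illustrate the risk outside of Anki, here is a minimal Python sketch that works on plain strings only (it does not touch a collection or use any Anki API). The function name `strip_empty_divs` is hypothetical. Instead of one big pattern, it removes the innermost `<div></div>` pair repeatedly until nothing changes, so it only ever deletes balanced pairs. Python's `re` module stands in for Anki's engine here, so the behaviour in Find & Replace may differ in details.

```python
import re

# The single-pass pattern from the post above, kept for comparison.
SINGLE_PASS = re.compile(r"<div>(<div>|</div>)*</div>")

# Matches only a completely empty pair (assumption: no whitespace inside).
EMPTY_PAIR = re.compile(r"<div></div>")


def strip_empty_divs(html: str) -> str:
    """Remove empty <div></div> pairs, innermost first, until none remain."""
    previous = None
    while previous != html:
        previous = html
        html = EMPTY_PAIR.sub("", html)
    return html


# Nested empty divs disappear completely.
print(strip_empty_divs("a<div><div></div></div>b"))  # ab

# On unbalanced input the single-pass pattern swallows the stray tag,
# while the repeated removal leaves it in place.
broken = "<div><div></div>"
print(repr(SINGLE_PASS.sub("", broken)))  # ''
print(strip_empty_divs(broken))           # <div>
```

Divs that contain any text are left alone by both approaches, so this only targets the completely empty ones.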
With the current version of Anki, these empty divs should not appear as often anymore, because Enter no longer inserts <div></div>:
Regarding non-breaking-spaces
I can reliably reproduce the insertion of &nbsp; by copy-pasting formatted content into the editor:
I see that there is already a lot of work on this topic. What I was thinking of was a simple workaround: when you check the database (this is just an example), an automatic process would find every non-breaking space and replace it with a normal space…
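As a rough idea of what such a pass would do to a single field, here is a small Python sketch on a plain string. The name `nbsp_to_space` is made up for illustration and nothing here is an existing Anki feature. It treats both the `&nbsp;` entity and the raw U+00A0 character as non-breaking spaces, and it returns a count so that one could review the changes before applying them.

```python
NBSP_ENTITY = "&nbsp;"
NBSP_CHAR = "\u00a0"


def nbsp_to_space(field: str) -> tuple[str, int]:
    """Replace non-breaking spaces with normal spaces.

    Returns the cleaned text and the number of replacements made.
    """
    count = field.count(NBSP_ENTITY) + field.count(NBSP_CHAR)
    cleaned = field.replace(NBSP_ENTITY, " ").replace(NBSP_CHAR, " ")
    return cleaned, count


cleaned, count = nbsp_to_space("one&nbsp;two\u00a0three")
print(cleaned)  # one two three
print(count)    # 2
```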
I would recommend against automatically converting nonbreaking spaces, as they are a legitimate character with legitimate uses. We use them relatively little in English, but some languages use them much more. For example, my work is in French for a small publisher, and they are in everyday use for punctuation in French. Similarly, I do use them with Anki, and sometimes I have to go out of my way to make sure they are not wrongly converted by Anki.
I saw find-and-replace mentioned. If anybody needs a GUI text preparation tool, I created detergent.io (there’s an npm package too), which lets you detect non-breaking space characters (U+00A0) and decide what to do with them. It can also strip HTML, encode and decode entities (including non-breaking spaces), set the letter case, and collapse whitespace: a full toolkit. I originally created it to prepare text for pasting into HTML email templates, but it also helps me prepare text for Anki, especially lower-casing Cyrillic and German letters. Detergent is open source and is neither monetized nor tracked. I hope it helps somebody.