Do not auto-remove double quotes from re: searches

omganotherusername · October 13, 2024, 3:29am

Is your feature request related to a problem? Please describe.
When I use regex to search through my notes, I switch between simple and complex syntax. What I noticed is that my double quotes around "re:" are removed when seemingly unnecessary. The problem then is that I have to keep re-adding them every time I want to match \d or anything a bit more complex. That becomes cumbersome and counterproductive.

Here’s a concrete example

I’m searching for anything starting in somato and ending in al, so I input:
"re:somato.*al"
which gets automatically edited when I press enter and the double quotes are removed.

I have to add double quotes to it, and remove all the extra spaces - busy work: "re:somato(?:somatic|visceral)|viscero(?:somatic|visceral)"

Describe the solution you’d like
I would like the search field to not edit my input and remove the double quotes, or at least grab everything after re: and consider it one chunk as regex, without editing whitespace before syntax like (?:this|that).

Describe alternatives you’ve considered
Copy pasting but you often do a lot of that while creating/editing cards, so it’s not feasible. I looked for a way to get a shortcut in there, nope. Another option would be to use some text-expander software but let’s be honest, that’s way too much trouble.

search keywords: regex re regular expression advanced search quote quotes double alternate escape metacharacter non-capturing capturing pipe syntax space automatic change

dae · October 15, 2024, 10:27am

@Rumo, any thoughts?

Rumo · October 16, 2024, 7:56pm

This would break any existing searches with search terms following a regex.

I know the auto-formatting can be annoying, but I think your example also demonstrates how valuable it is: Without it, you would have no indication that the search re:somato(?:somatic|visceral)|viscero(?:somatic|visceral) does not what you expect.

Maybe it would already help if the user could undo the normalization with Ctrl+Z. Then they could choose to edit the original input if that’s closer to what they intended. However, that seems to be not as easy as it sounds.

omganotherusername · October 18, 2024, 1:02am

I’m not sure I follow the train of thought on how this provides value.

Also, not sure what was meant by “does not what you expect”. Here’s an example of what the regex might match: regex101.com link.
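For readers without the regex101 link at hand, here is a minimal sketch of what the pattern matches when it is treated as one single regex. It uses Python's re module purely as a stand-in (an assumption; the thread does not confirm which engine Anki uses), and the sample strings are made up for illustration.

```python
import re

# The pattern from the thread, compiled as ONE regex (no Anki parsing involved).
PATTERN = re.compile(r"somato(?:somatic|visceral)|viscero(?:somatic|visceral)")

# Hypothetical note contents, invented for this example.
candidates = [
    "somatosomatic reflex",
    "somatovisceral reflex",
    "viscerosomatic reflex",
    "viscerovisceral reflex",
    "somatosensory cortex",
]

# Keep only the texts in which the pattern is found somewhere.
matches = [text for text in candidates if PATTERN.search(text)]
```

The first four strings match; "somatosensory cortex" does not, because neither alternative is present in it.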

I think the best approach would be to waste as little of the user’s time as possible. While an undo function would be expected of any text entry, that still requires someone to undo, then add the double quotes which were automatically removed without highlight or notification of said action, and then finally run the search again.

It’s probably a little less time consuming, but not significantly so. And instead of pressing CTRL + Z, I believe pressing CTRL + Arrow down will get the dropdown to pop up and highlight the last search term?

I also don’t understand the logic behind adding spaces before ( after re: when there’s no other whitespace in the pattern. I assume that if there’s whitespace, the search parser considers the next word to be an independent search term, outside of the regex? That is probably a behavior that would not hurt anyone if changed: consume everything after re: up to the next whitespace, since it can safely be assumed to be part of the intended regex.

I get how grabbing everything after re: would break searches built on existing behavior, and that’s ok, it was just a suggestion.

Rumo · October 18, 2024, 7:30pm

Let’s start from scratch with your example.

1. You type in re:somato(?:somatic|visceral)|viscero(?:somatic|visceral) and hit enter.
2. Anki replaces it with re:somato (?:somatic|visceral) |viscero (?:somatic|visceral). This tells you that your search isn’t interpreted as expected; Anki breaks it into 4 search terms.
3. You revert step 2 by removing the spaces.
4. You add double quotes to mark it as a single regex search term.

Undo would let you perform step 3 without the busy work, as you’ve called it.
Steps 1. and 4. are inevitable, because Anki can’t read your mind.
Step 2 is a deliberate design choice to give you a hint what went wrong if you don’t see the results you wanted to see.
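The splitting described in step 2 can be mimicked with a toy tokenizer. This is only a sketch under stated assumptions, not Anki's actual parser: the function name split_terms and its rules (whitespace and unquoted parentheses end a term, double quotes keep everything together) are invented here to reproduce the behavior reported in this thread.

```python
from typing import List


def split_terms(search: str) -> List[str]:
    """Toy model: split a search string into terms (NOT Anki's real parser)."""
    terms: List[str] = []
    current = ""
    i = 0
    while i < len(search):
        ch = search[i]
        if ch == '"':
            # Everything up to the closing quote stays in the current term.
            end = search.find('"', i + 1)
            if end == -1:
                end = len(search)
            current += search[i + 1:end]
            i = end + 1
        elif ch == "(":
            # An unquoted "(" ends the current term and starts a group.
            if current:
                terms.append(current)
                current = ""
            depth, j = 1, i + 1
            while j < len(search) and depth > 0:
                if search[j] == "(":
                    depth += 1
                elif search[j] == ")":
                    depth -= 1
                j += 1
            terms.append(search[i:j])
            i = j
        elif ch.isspace():
            # Whitespace separates terms.
            if current:
                terms.append(current)
                current = ""
            i += 1
        else:
            current += ch
            i += 1
    if current:
        terms.append(current)
    return terms


unquoted = split_terms("re:somato(?:somatic|visceral)|viscero(?:somatic|visceral)")
quoted = split_terms('"re:somato(?:somatic|visceral)|viscero(?:somatic|visceral)"')
```

Joining unquoted with single spaces gives the rewritten search from step 2 (four terms), while quoted contains a single term holding the whole regex.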

I cannot tell you whether it was a deliberate choice or an implementation detail at the time, but that’s how it is, and changing it would break existing searches.

omganotherusername · October 19, 2024, 4:04am

Oh, that’s what that meant…

I cannot think of a particular situation where NOT automatically going from re:somato(?:som... to re:somato (?:som would break an existing search. Existing searches most likely have the spaces already added, and while regex patterns can contain spaces, at that point it’ll again become obvious to the person that their search terms were modified, and they can proceed with the manual fixing.

Of course, I’m coming from a standpoint that I know what my regex should be. I don’t know if users rely on this behavior to somehow only maintain the first word as the regex pattern but then transform everything else in parentheses to individual regular search terms… seems far-fetched, but everything is possible.

So, it seems like there’s reticence towards modifying existing search behavior due to potential backlash from the existing userbase. In that case, allow me to propose 2 other, likely developer time intensive options:

1. Implement a re2: tag that takes everything after it as the regex pattern and does not modify that part of the search input. If any other search terms or filters need to be specified, they go before the re2: tag, and the behavior there can be the same as it is now.
2. Add an option in preferences, or a checkbox next to the search box that says don't modify my searches, which should turn off all of the automatic changes Anki makes to the search string and just runs what the user gives it. No results? Not Anki’s problem, user took training wheels off.

What do you think?

Rumo · October 20, 2024, 7:13am

A re2: tag would require rewriting a lot of the parser. While technically possible, such a syntax would be quirky. For instance, what if re2 is part of a group? What if you want to include multiple regex searches? How can you account for operator precedence if the order of search terms is stipulated?

An option to turn off the automatic changes is tempting, but Damien is generally reluctant to add yet another toggle to Anki, and I think that’s prudent. This forum is full of users who dug their own pit, so to speak, and I can’t blame them.

omganotherusername · October 20, 2024, 2:50pm

This doesn’t need special consideration. Nothing happens now when you have "re:(re:lung|reape?)", it’s just considered a literal string of characters re:lung. Think of re2: as being "re:<pattern>", where whatever is inside the double quotes is part of the regex pattern, regardless of it containing another re:.

Then you use a pipe | character to introduce an alternate within your regex pattern. re2: is meant to improve usability, not replace the current re:. The regex engine is self-sufficient when it comes to matching multiple strings and patterns of strings, in a specific order, preceded or not by a specific string, and so on.

You’d just strip re2:(.++) and keep capture group 1 as the regex pattern, then throw the rest of the user’s search input to the existing parser.
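A minimal sketch of that idea, assuming Python's re module as a stand-in for whatever Anki would actually use. The re2: prefix is only the proposal from this thread, and split_re2 is a hypothetical helper name. Possessive quantifiers such as .++ are only available in Python 3.11 and later, so the sketch uses a greedy .+ instead, which captures the same text here because nothing follows the group.

```python
import re
from typing import Optional, Tuple

# "re2:" followed by everything up to the end of the input (DOTALL: newlines too).
RE2_PREFIX = re.compile(r"re2:(.+)", re.DOTALL)


def split_re2(search: str) -> Tuple[str, Optional[str]]:
    """Return (text for the existing parser, raw regex pattern or None)."""
    m = RE2_PREFIX.search(search)
    if m is None:
        # No re2: tag, so the whole input goes to the normal parser.
        return search.strip(), None
    # Everything before re2: is parsed as usual; capture group 1 is the regex.
    return search[:m.start()].strip(), m.group(1)


rest, pattern = split_re2("deck:Anatomy tag:nerves re2:somato(?:somatic|visceral) reflex")
```

Here rest is "deck:Anatomy tag:nerves" and pattern is "somato(?:somatic|visceral) reflex", spaces included and untouched.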

I’m not quite sure I get what we call “operator precedence” and if you really mean if the order of search terms is or is NOT stipulated.

If you mean how would people know to put everything not regex before re2:, that would just be a simple NOTE: within the existing regular expression documentation page.

The example above of simply keeping the regex pattern and removing re2: prefix before passing the search string to the normal parser seems like a reasonable approach. Existing behavior doesn’t need to change, whatever the person wrote before re2: is still going to be parsed by Anki as usual.

I’m not familiar with Damien or the dev culture around Anki other than my interaction with you. Obviously whatever you decide is going to stay, no matter how shallow or deep my pit is.

I’m just trying to find a way to improve usability and cut down on time wasted when I use Anki, without rocking the boat too hard. Even if it’s a “developer-geared” flag that needs to be specified in a file per user profile, or an environment variable, I don’t care. It would save a lot of time in the long run.

Rumo · October 21, 2024, 7:16pm

I’m talking about Anki, not regex search groups. A search group requires a trailing ) which would be impossible with the suggested syntax after re2.

Regarding operator precedence, I meant you become restricted in the logical expressions you can build if individual search terms start stipulating a specific order.

Regex has kind of an OR operator with the pipe, but not an equivalent to Anki’s AND for example.

We’ve come full circle: There is currently a note in the docs that you have to put quotes around re searches.

omganotherusername · October 22, 2024, 4:10am

Actually, super easy, barely an inconvenience! Use re2:(.+)(?=\)) if you’re expecting one trailing ), or \){n} in the lookahead if you expect n trailing ). Or maybe some string functions can strip the expected number of ) from the end of the captured regex string.
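The lookahead idea can be tried out as follows. This is a sketch in Python's re module (an assumption, not Anki's engine); extract_re2 and its trailing_parens parameter are hypothetical names, and the caller is assumed to already know how many closing parentheses belong to Anki's grouping rather than to the regex.

```python
import re
from typing import Optional


def extract_re2(search: str, trailing_parens: int) -> Optional[str]:
    """Capture the text after re2:, leaving `trailing_parens` ")" uncaptured."""
    # For trailing_parens == 1 this builds: re2:(.+)(?=\){1})
    pattern = r"re2:(.+)(?=\){%d})" % trailing_parens
    m = re.search(pattern, search)
    return m.group(1) if m else None


# One Anki-level group around the search: the last ")" is not part of the regex.
inside_group = extract_re2("(tag:x re2:a(b|c))", 1)

# No surrounding group, and the caller says so: the pattern is kept whole.
no_group = extract_re2("re2:(a|b)", 0)

# No surrounding group, but one trailing ")" is wrongly expected:
# the regex loses its own closing parenthesis.
miscounted = extract_re2("re2:(a|b)", 1)
```

The first two calls return "a(b|c)" and "(a|b)". The third returns "(a|b", which shows that the approach depends on the number of trailing parentheses being known in advance.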

Yes, building a logic alongside regex is an interesting choice when wildcards (this stuff: d_g) exist. Regex can do everything the search logic does now, at the cost of verbosity. It would force the pattern to be more literal than needed, as it seems Anki limits the use of lookarounds. I believe Anki uses Python’s basic re module, given the error it returns for a verb like (*FAIL), but I’m not sure. Also, the basic re module implements fewer features than, for example, the regex package, which brings most PCRE features to Python. But I digress.
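To illustrate the "at the cost of verbosity" point, here is a sketch of AND and OR written as plain regex in Python's re module. Whether Anki's own engine accepts these lookaheads is not confirmed in this thread, so treat the example as an assumption about regex in general, with made-up search words and helper names.

```python
import re

# AND: both words must appear somewhere in the text, in any order.
# Each lookahead scans from the start without consuming anything.
BOTH = re.compile(r"^(?=.*lung)(?=.*nerve)", re.DOTALL)

# OR: the pipe is enough, either word matches.
EITHER = re.compile(r"lung|nerve")


def has_both(text: str) -> bool:
    return BOTH.search(text) is not None


def has_either(text: str) -> bool:
    return EITHER.search(text) is not None
```

With these helpers, has_both is true for "vagus nerve and lung" but false for "lung only", while has_either is true for both.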

I’m all for having options; people should have options to use regex with Anki search logic operators (and they can, just use re: ). Hence my audacity in asking you for yet another option which entails you working more than you already have, for free, which I obviously can never fully repay in "thank you"s or donations.

Indeed! And I do put quotes around my regex as I build it or adapt it to new searches, but then Anki thinks " shouldn’t be there and removes them for me, then adds some spaces throughout my regex pattern. I then put the quotes around re again, remove the superfluous spaces (which gets very tedious if I use spaces in my pattern already). Until again Anki takes the quotes away and messes with my pattern. So yes, full circle, every time I use the re: feature. And this circle unfortunately wastes time, creates frustration…

dae · October 22, 2024, 2:34pm

@Rumo Just brainstorming here - what if we updated maybe_quote() and needs_quotation() to make them force quotes when using a regex?
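As a rough illustration of that suggestion, here is a Python sketch. Only the names maybe_quote and needs_quotation come from the post above; the bodies below are hypothetical analogues written for this example, not Anki's actual code, and the set of characters treated as special is an assumption.

```python
import re

# Assumption for this sketch: whitespace and parentheses force quoting.
_SPECIAL = re.compile(r"[\s()]")


def needs_quotation(term: str) -> bool:
    """True if the term should be written back with double quotes."""
    # Proposed change: regex terms are always quoted, even when "simple".
    if term.startswith("re:"):
        return True
    return _SPECIAL.search(term) is not None


def maybe_quote(term: str) -> str:
    """Wrap the term in double quotes only when needs_quotation says so."""
    return '"' + term + '"' if needs_quotation(term) else term
```

Under these assumptions, maybe_quote("re:somato.*al") keeps its quotes after normalization, while a plain term such as deck:Anatomy is still written without them.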

Rumo · October 26, 2024, 9:13am

Yes, that probably wouldn’t do any harm.

dae · November 2, 2024, 11:26am

I’ve logged this on Do not auto-remove double quotes from re: searches · Issue #3547 · ankitects/anki · GitHub
