Hi @Xirai, thanks for reporting these cases!
- Sometimes other parts of the sentence get highlighted too.
The target word `ζ°[γγ]γ` is broken down (tokenized) at punctuation marks (square brackets in this case) into the tokens `ζ°`, `γγ` and `γ`. Each token is then highlighted in the text individually; there is no special logic for furigana.
Unfortunately, in this case I don't see a way to tell the 2nd `γ` apart so it can be skipped. Language processing is a complex task where false-positive and false-negative matches sometimes happen. There's no magic here.
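A minimal Python sketch of the token-by-token highlighting described above, showing why a single-kana token produces false positives. The split pattern, the `<b>` markup, the function names `tokenize` and `highlight`, and the example word `食[た]べる` are hypothetical illustrations, not the add-on's actual code.

```python
import re

# Hypothetical split pattern: square brackets (furigana notation), whitespace
# and a few common punctuation marks. Not the add-on's real pattern.
SPLIT_PATTERN = re.compile(r"[\[\]\s.,!?;:()]+")


def tokenize(word: str) -> list[str]:
    """Split a target word at punctuation, dropping empty pieces."""
    return [t for t in SPLIT_PATTERN.split(word) if t]


def highlight(sentence: str, word: str) -> str:
    """Wrap every occurrence of every token of `word` in <b>...</b>."""
    tokens = tokenize(word)
    if not tokens:
        return sentence
    # Try longer tokens first so a short token can't pre-empt a longer one.
    tokens.sort(key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in tokens))
    return pattern.sub(lambda m: f"<b>{m.group(0)}</b>", sentence)
```

For example, `tokenize("食[た]べる")` gives `['食', 'た', 'べる']`, and `highlight("たまごを食べる", "食[た]べる")` returns `<b>た</b>まごを<b>食</b><b>べる</b>`: the `た` in たまご is highlighted even though it has nothing to do with the target word, because each token is matched on its own.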
- It doesn't work when the word in the sentence uses a different form.
Supporting word forms in Japanese requires morphological analysis (word dictionaries). The current add-on version relies on regular expressions only.
For English, word forms are handled by replacing the last 2 characters with a wildcard (for words longer than 3 characters). E.g. `study` → `stu*`, which matches `study`, `studies`, `studying`, `studied`, etc. But this doesn't work for Japanese.
Maybe I'll add morphological analysis in future versions, but v1 only supports regular expressions.
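The English suffix-wildcard trick can be sketched in Python as follows. This is a minimal illustration under assumptions: the function name `word_form_pattern`, the use of `\w*` for the `*` wildcard, the leading `\b` word boundary and case-insensitive matching are all my choices, not necessarily how the add-on builds its patterns.

```python
import re


def word_form_pattern(word: str) -> re.Pattern:
    """Hypothetical sketch of the "drop the last 2 characters" trick.

    Words longer than 3 characters lose their last 2 characters, which are
    replaced by a wildcard (study -> stu*). Shorter words are matched as-is.
    Case-insensitive matching is an assumption.
    """
    if len(word) > 3:
        stem = re.escape(word[:-2])
        # \w* plays the role of "*": any run of word characters after the stem.
        return re.compile(rf"\b{stem}\w*", re.IGNORECASE)
    # Short words: whole-word match only.
    return re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
```

`word_form_pattern("study").findall("She studies hard; studying is fun, and he studied too.")` returns `['studies', 'studying', 'studied']`. The trade-off is false positives: the same `stu*` pattern also matches unrelated words such as `stuck` or `stuff`.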
Although I can't fix these 2 cases, please keep reporting any anomalies in highlighting or the user interface. Your reports help me fix issues I can't catch with my own dataset.