MorphMan: Is it possible to discover the most common phrases?

I have some French series with subtitles. I’m familiar with subs2srs but not with MorphMan… Is there a way to discover the repeated phrases or the collocations in these subtitles?

Does MorphMan treat “go” and “went” as the same word? Which languages are supported? Is it useful with phrases?
It is all about how to start by learning the easy stuff first, with a quick reordering. I don’t know whether any solutions exist.


When I looked at the MorphMan code, any language written in the Latin alphabet used very simple word matching, so “go” and “went” wouldn’t be registered as the same word. You just have to add more to the definition of each word if you feel it needs linking with other words.
French and German, without their own parser (which MorphMan doesn’t have, afaik), will quickly become a headache because of conjugation in French and agglutination in German.
The languages it handles well, afaik, are Japanese, Chinese, and any Latin-alphabet language that doesn’t add weird stuff before or after the root of the word.

It’s no use for sentences the way it is built right now: for Latin-alphabet languages it splits text into words on spaces and stores those words, and it likewise stores individual words for Japanese and Chinese.
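To make the point about space-based splitting concrete, here is a minimal sketch in Python. It is not MorphMan's actual code, just an illustration of what plain word matching does; the function name `split_into_words` and the sample sentences are made up for the example.

```python
import re

def split_into_words(sentence):
    """Lowercase a sentence and return its words, ignoring punctuation.

    This is only a stand-in for naive word matching: there is no
    lemmatisation, so inflected forms stay separate.
    """
    return re.findall(r"\w+", sentence.lower())

# Words the learner is assumed to know already (hypothetical data).
known = set(split_into_words("I go to school"))

# A new sentence containing an inflected form of a known word.
new_sentence = split_into_words("She went to school")

# "went" is reported as unknown even though "go" is known,
# because the two strings are simply different.
unknown = [w for w in new_sentence if w not in known]
print(unknown)
```

Because only single words are stored, nothing in this kind of matching records that two words tend to appear together, which is why it does not help with phrases.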

At first, you just use MorphMan to find new words. Later, once you know a lot of words, you can pick up sentences and collocations from your reading or from example sentences in the dictionary. Basically, you reach a point where MorphMan can mark a sentence as “comprehension” when you do not actually know the collocations in it. So, if you want, look from time to time at the sentences MorphMan has tagged as comprehension, delete the ones you understand, and examine the ones you don’t, or the ones where you find the grammar or a collocation interesting.
It’s possible to do what you want with a syntax analyser/parser, but that would be separate from MorphMan, and it gets tricky fast. If I were you, I’d either go for the slow method of picking up collocations with each new word you learn, explore a word via the dictionary when MorphMan presents it to you, or get a book specialising in that domain.
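Short of a real parser, a much cruder way to get at repeated phrases is to count word n-grams across the subtitle lines. The sketch below does only that: it is plain counting, not syntax analysis, and it is independent of MorphMan. It assumes you have already extracted the subtitle text into a list of strings (timestamps removed); `sample_lines`, `count_phrases` and `repeated_phrases` are names invented for the example.

```python
import re
from collections import Counter

def tokenize(line):
    """Lowercase a line and keep only runs of letters (accents included)."""
    return re.findall(r"[^\W\d_]+", line.lower())

def count_phrases(lines, n=2):
    """Count every run of n consecutive words within each line."""
    counts = Counter()
    for line in lines:
        words = tokenize(line)
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts

def repeated_phrases(lines, n=2, min_count=2):
    """Return (phrase, count) pairs seen at least min_count times, most frequent first."""
    return [(p, c) for p, c in count_phrases(lines, n).most_common() if c >= min_count]

# Hypothetical subtitle lines, standing in for text pulled from a subtitle file.
sample_lines = [
    "Je ne sais pas.",
    "Tu ne sais pas ce que tu dis.",
    "Je ne sais pas quoi faire.",
]

for phrase, count in repeated_phrases(sample_lines, n=3):
    print(count, phrase)
```

Since there is no lemmatisation here either, conjugated forms are counted as different phrases, so on French subtitles the counts will be spread over several variants of what is really the same expression.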
Personally, when I find a specific word hard to remember (a leech), I spend time reading dictionary example sentences for it in my target language, as a way to start exploring how the word is used. If a collocation is common, it will appear several times within the first 20 example sentences. It might even be the only way the word is used. I wouldn’t bother before the intermediate stage, though, as a lack of general vocabulary makes the exercise too hard.
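If you would rather have a script do the tallying over the example sentences, something like the following rough sketch could work. It counts which words show up right next to a target word in the first 20 sentences. The function name `neighbours`, the `window` and `limit` parameters, and the sample sentences are all assumptions made up for illustration; the sentences would have to be copied from whatever dictionary you use.

```python
import re
from collections import Counter

def neighbours(sentences, target, window=1, limit=20):
    """Count words appearing within `window` positions of `target`,
    looking only at the first `limit` sentences."""
    counts = Counter()
    for sentence in sentences[:limit]:
        words = re.findall(r"[^\W\d_]+", sentence.lower())
        for i, word in enumerate(words):
            if word == target:
                start = max(0, i - window)
                # Words just before and just after the target word.
                counts.update(words[start:i] + words[i + 1:i + 1 + window])
    return counts

# Hypothetical example sentences for the word "compte".
examples = [
    "Il faut tenir compte du prix.",
    "Elle a tenu compte de mon avis.",
    "Nous devons tenir compte du temps.",
]

for word, count in neighbours(examples, "compte").most_common():
    print(count, word)
```

Note that "tenir" and "tenu" are counted separately here, which is the same conjugation problem as before; reading the sentences yourself gets around that.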