Generating multiple cards from a simple PDF document

I have a PDF document with sentences in Chinese featuring the 2000 most common words. It is formatted the following way on every page:

124- 全/quán/ – All, whole

我们 全都通过了考试。
We all passed the exam.

125- 三/sān/ – Three

我 三天后回来。
I’ll be back in three days.

126- 又/yòu/ – And, also, again

他 又一次让我们失望了。
He failed us once again.

So each entry is a numbered line with the word, a blank line, the Chinese sentence, the English translation, and then another blank line before the next numbered line. There are no page numbers or other formatting quirks as far as I can tell.

I was wondering if it were possible to generate cards by taking the sentence in Chinese as the front and glomming together the numbered word lines and English translations as the back. I have no idea where to start with this sort of automation. With more than 200 pages, there are too many entries for it to be worth doing manually. Any hints would be much appreciated. :grinning:

PDFs make it hard to access the text properly. Try copying and pasting the passage into a decent text editor like Notepad++. It should preserve enough of the line structure that you can then use regex find-and-replace to convert it into a text format Anki can import.
Also, you should use a note type with six fields (number, word, phonetics, English, sentence, translation) rather than just front and back. That way you can do whatever you want with the data in the future and aren’t restricted to the one card type you currently need.
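
If you would rather script it than do the regex replacing by hand, here is a minimal Python sketch of the same idea. It assumes the pasted text keeps exactly the layout shown in the question (numbered line, Chinese sentence, English translation, with blank lines in between) and that the dash after the phonetics is either an en dash or a plain hyphen. The names HEADER, parse_entries, to_tsv and SAMPLE are made up for illustration, and the regex may need adjusting once you see what your PDF actually produces when pasted.

```python
import csv
import io
import re

# Matches lines like "124- 全/quán/ – All, whole" and captures
# number, word, phonetics and English meaning (assumed layout).
HEADER = re.compile(r"^(\d+)\s*-\s*(.+?)\s*/([^/]+)/\s*[–-]\s*(.+)$")

SAMPLE = """\
124- 全/quán/ – All, whole

我们 全都通过了考试。
We all passed the exam.

125- 三/sān/ – Three

我 三天后回来。
I’ll be back in three days.

126- 又/yòu/ – And, also, again

他 又一次让我们失望了。
He failed us once again.
"""


def parse_entries(text):
    """Return one six-field row per entry:
    [number, word, phonetics, english, sentence, translation]."""
    # Drop blank lines so every entry is three consecutive lines.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    rows = []
    i = 0
    while i < len(lines):
        m = HEADER.match(lines[i])
        if m is None or i + 2 >= len(lines):
            i += 1  # skip anything that is not a complete entry
            continue
        number, word, phonetics, english = m.groups()
        rows.append([number, word, phonetics, english, lines[i + 1], lines[i + 2]])
        i += 3
    return rows


def to_tsv(rows):
    """Serialize the rows as tab-separated text, one note per line."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_tsv(parse_entries(SAMPLE)), end="")
```

To use it on the real document, read your pasted text from a UTF-8 file instead of SAMPLE and write the result of to_tsv to a .txt file. Anki can import tab-separated text files and map each column to a field, so the six columns line up with the six-field note type. Lines that don't fit the pattern are silently skipped, so it is worth comparing the number of rows against the last entry number to catch anything the regex missed.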