Getting the OCR addon to run on Ubuntu

Hey there, maybe someone here can help me get one of the OCR add-ons (https://ankiweb.net/shared/info/450181164 or https://ankiweb.net/shared/info/1746010116) running on Ubuntu (latest version of Anki and latest LTS release of Ubuntu). I’ve tried all day but failed miserably.

So obviously I need to install tesseract-ocr somehow. I tried that with a plain sudo apt install tesseract-ocr. Then I figured I might need pytesseract, so I installed pip3 and ran pip install pytesseract. But that’s about as far as I got… I tried to set the Python path to the pytesseract script, but I’m not sure whether that worked, and the Anki add-ons still tell me that either tesseract is not installed or the path is not set correctly.

Luckily I was able to get it running in the meantime :slight_smile:

Please, if you could explain it, it would be great! :slight_smile:

Sure, I’ll try. So as I said, I first tried to install everything with simple terminal commands:
sudo apt install tesseract-ocr -y
sudo apt install libtesseract-dev -y
sudo apt install python3-pip -y
pip install pytesseract

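Then I tried to add the folder containing the pytesseract script to the search path, because I suspected that was what caused the error. This can be done by adding the following line to the very end of the ~/.bashrc file, saving it, and opening a new terminal: export PATH="<path_to_pytesseract_file>:$PATH" (note that this changes the shell's PATH, not PYTHONPATH, and that the quotes must be plain straight quotes).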
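To see whether a PATH change like that actually took effect, a small check along these lines might help. This is only a sketch: path_contains is a made-up helper name, and $HOME/.local/bin is just an assumed example directory, not necessarily where your pytesseract script lives.

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $1 is an entry of the
# colon-separated list $2 (defaults to the current PATH).
path_contains() {
    case ":${2:-$PATH}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example directory only; replace it with the folder you exported.
dir="$HOME/.local/bin"
if path_contains "$dir"; then
    echo "$dir is on PATH"
else
    echo "$dir is NOT on PATH"
fi
```

Run it in a freshly opened terminal, since an already open shell will not have re-read ~/.bashrc.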

But here’s the thing: I don’t know how much of this was actually relevant to getting it working in the end. I suspect pytesseract is needed, and libtesseract-dev as well, since I read somewhere that it’s a dependency of pytesseract. No certainty here.
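One way to narrow down which pieces are actually present is a quick report like the one below. It is a rough sketch with invented helper names (have_cmd, have_pymod, report), and it only looks at the system python3; Anki may use its own bundled Python, in which case the module check says nothing about what the add-on sees.

```shell
#!/bin/sh
# Succeed if command $1 can be found through PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Succeed if the system python3 can import module $1.
have_pymod() {
    have_cmd python3 && python3 -c "import $1" >/dev/null 2>&1
}

# Print "<label>: found" or "<label>: MISSING" depending on the check.
# $1 = label, remaining arguments = the check command to run.
report() {
    label=$1
    shift
    if "$@"; then
        echo "$label: found"
    else
        echo "$label: MISSING"
    fi
}

report "tesseract binary" have_cmd tesseract
report "pytesseract module (system python3)" have_pymod pytesseract
```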

What really did the trick was actually carefully following the instructions provided by tesseract (building from git):

sudo apt-get install automake ca-certificates g++ git libtool libleptonica-dev make pkg-config

for necessary tools.

git clone https://github.com/tesseract-ocr/tesseract.git

getting the master branch

cd tesseract
./autogen.sh
./configure
make
sudo make install
sudo ldconfig

to build everything. I ran the commands one after another, but I guess you could just copy and paste the whole block.
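If the apt package and the self-built version are both installed, it can be worth checking which tesseract the shell actually picks up. A minimal sketch, assuming the apt package lives under /usr/bin and the source build went to the default /usr/local prefix (resolve_cmd is a made-up helper name):

```shell
#!/bin/sh
# Print the location that command $1 resolves to, or "not found".
resolve_cmd() {
    command -v "$1" 2>/dev/null || echo "not found"
}

echo "tesseract resolves to: $(resolve_cmd tesseract)"

# Only ask for the version if a tesseract binary is reachable.
if command -v tesseract >/dev/null 2>&1; then
    tesseract --version || echo "tesseract is on PATH but failed to run"
fi
```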

After that, it still wasn’t running, because I had forgotten to put the traineddata file for the language I wanted (eng) into /usr/local/share/tessdata. The directory is write-protected, so I had to move the file with sudo: sudo mv eng.traineddata /usr/local/share/tessdata
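To confirm the language file ended up where tesseract looks for it, something like this might do. It is a sketch: has_traineddata is a hypothetical helper, and the default directory is simply the one mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if <lang>.traineddata exists in the
# tessdata directory $2 (defaults to /usr/local/share/tessdata).
has_traineddata() {
    lang=$1
    dir=${2:-/usr/local/share/tessdata}
    [ -f "$dir/$lang.traineddata" ]
}

if has_traineddata eng; then
    echo "eng.traineddata is in place"
else
    echo "eng.traineddata not found in /usr/local/share/tessdata"
fi

# If tesseract is installed, ask it which languages it can see.
if command -v tesseract >/dev/null 2>&1; then
    tesseract --list-langs || echo "tesseract could not list languages"
fi
```

If eng shows up in the --list-langs output, the add-on should at least be able to find the language data.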

After that, the thing was running. I’d be very interested to hear if it works for others as well.