Getting the OCR addon to run on Ubuntu

Nostalgia · March 19, 2023, 6:54pm

Hey there, maybe someone’s here who can help me get one of the OCR add-ons (https://ankiweb.net/shared/info/450181164 or https://ankiweb.net/shared/info/1746010116) running on Ubuntu (latest version of Anki and latest LTS release of Ubuntu). I’ve tried the whole day but failed miserably.

So obviously I somehow need to install tesseract-ocr. I tried that plainly with sudo apt install tesseract-ocr. Then I figured that I might need pytesseract, so I installed pip3 and ran pip install pytesseract. But that’s about as far as I got. I tried to set the Python path to the pytesseract script, but I’m not sure whether that was successful, and the Anki add-ons still tell me that either tesseract is not installed or the path is not set right.

Nostalgia · March 21, 2023, 7:57am

Luckily, I was able to get it running in the meantime.

gustavosmen · March 21, 2023, 8:15pm

Please, if you could explain it, it would be great!

Nostalgia · March 22, 2023, 9:40am

Sure, I’ll try. So as I said, I first tried to install everything with simple terminal commands:
sudo apt install tesseract-ocr -y
sudo apt install libtesseract-dev -y
sudo apt install python3-pip -y
pip install pytesseract

Then I tried to set the Python path to the folder with the pytesseract script because I suspected that this was causing the error. This can be done by adding the following line to the very end of the .bashrc file and saving it: export PATH="<path_to_pytesseract_file>:$PATH"
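Since the add-on’s complaint is about tesseract itself, it can help to check what the shell actually finds on the PATH after editing .bashrc and opening a new terminal. A minimal sketch: check_tesseract is a made-up helper name for illustration, while command -v is a standard shell builtin.

```shell
# check_tesseract is a hypothetical helper, not part of tesseract or Anki.
# It prints where the shell resolves `tesseract`, or reports that it is
# not on the PATH and returns a non-zero status.
check_tesseract() {
    if tesseract_bin=$(command -v tesseract); then
        echo "tesseract found at: $tesseract_bin"
    else
        echo "tesseract is not on PATH" >&2
        return 1
    fi
}
```

Running check_tesseract followed by tesseract --version in a fresh terminal should show which binary is picked up. Whether the add-on sees the same PATH as the terminal is a separate question that this check does not answer.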

But here’s the thing: I don’t know how much of this was actually relevant to getting the thing working in the end. I suspect that pytesseract is needed, as well as libtesseract-dev, which I read somewhere is a dependency of pytesseract. No certainty here.

What really did the trick was actually carefully following the instructions provided by tesseract (building from git):

sudo apt-get install automake ca-certificates g++ git libtool libleptonica-dev make pkg-config

for necessary tools.

git clone https://github.com/tesseract-ocr/tesseract.git

getting the master branch

cd tesseract
./autogen.sh
./configure
make
sudo make install
sudo ldconfig

to build and install it. I ran the commands one after another, but I guess you could just copy and paste the whole block.
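If the block is pasted in one go, a failing step (say, ./configure) will not stop the later ones from running. A small sketch of a wrapper that stops at the first failure; run_steps is a hypothetical helper name, not something tesseract ships.

```shell
# run_steps is a hypothetical helper (not part of tesseract): it runs each
# argument as a shell command, in order, and stops at the first failure.
run_steps() {
    for step in "$@"; do
        echo ">>> $step"
        if ! sh -c "$step"; then
            echo "step failed: $step" >&2
            return 1
        fi
    done
}

# Example, from inside the cloned tesseract directory:
#   run_steps "./autogen.sh" "./configure" "make" "sudo make install" "sudo ldconfig"
```

Each step is echoed before it runs, so it is easy to see which one broke.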

After that, it was still not running, because I had forgotten to put the traineddata file for the language I wanted (eng) into /usr/local/share/tessdata. The directory is write-protected, so I had to move the file with sudo: sudo mv eng.traineddata /usr/local/share/tessdata
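To confirm the language file ended up where tesseract looks for it, a quick check like the following may help. This is only a sketch: has_traineddata is a made-up helper name, and /usr/local/share/tessdata is the directory from the step above.

```shell
# has_traineddata is a hypothetical helper, not part of tesseract.
# Arguments: $1 = language code (e.g. eng), $2 = tessdata directory.
# Prints the path if <lang>.traineddata exists there, otherwise fails.
has_traineddata() {
    if [ -f "$2/$1.traineddata" ]; then
        echo "found: $2/$1.traineddata"
    else
        echo "missing: $2/$1.traineddata" >&2
        return 1
    fi
}

# Example:
#   has_traineddata eng /usr/local/share/tessdata
```

Afterwards, tesseract --list-langs should list eng if the installed binary picks up that directory.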

After that, the thing was running. I’d be very interested to hear if it works for others as well.
