OCR at your fingertips (part 2)
March 28, 2020. Here’s a five-minute tutorial on how to bind OCR capabilities to a shortcut on Mac. It’s really that simple!
High-level overview
The key insight is that those tasks would become trivial if only I could take a screenshot and magically paste it as text. Let’s develop this magic. The plan is:
- You hit ctrl shift cmd 4 to save a portion of the screen in pasteboard
- You hit a special paste shortcut that we create: ctrl cmd v
- Image in pasteboard is sent through OCR
- Resulting text is pasted onto the frontmost app
All of these steps turn out to be doable using the following open-source packages: pngpaste and tesseract.
Installing dependencies
With Homebrew:
brew install pngpaste tesseract
or manually install pngpaste and tesseract. Ensure that they are installed by running:
pngpaste -v
and
tesseract --version
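If you’d rather check both tools in one go, something like the following might help. It’s a minimal sketch: missing_tools is a helper name made up for this example, and it only relies on the POSIX command -v builtin to see whether each name resolves on your PATH.

```shell
#!/bin/sh
# missing_tools is a hypothetical helper, not part of pngpaste or tesseract.
# It prints the name of every given command that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Running missing_tools pngpaste tesseract should print nothing when both are installed; any name it prints still needs to be installed.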
New toys
Try it out! Hit ctrl shift cmd 4 to save a portion of the screen in pasteboard, then run this in a terminal:
pngpaste - | tesseract stdin stdout | pbcopy
This uses pngpaste to send the image to our OCR tool, tesseract. The result is sent to pbcopy, which places the resulting scanned text into our pasteboard, ready to be pasted. Try selecting this paragraph, running the above, and pasting the result into a text field.
Hopefully the results are acceptable! Hint: The next article will deal with refining our OCR results.
If nothing happened, try running pngpaste - | tesseract stdin stdout in a terminal, with an image in your pasteboard. Most likely, you need to set up Tesseract with language data.
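To see whether language data is the problem, you can ask tesseract which languages it knows about with tesseract --list-langs. The sketch below wraps that check. has_lang is a name invented for this example, and it assumes the listing prints one language code per line.

```shell
#!/bin/sh
# has_lang is a hypothetical helper. It reads a `tesseract --list-langs`
# listing on stdin and succeeds only if language code $1 is one of the lines.
# Assumption: the listing has one language code per line.
has_lang() {
  grep -qxF -- "$1"
}

# Example (needs tesseract installed; 2>&1 is there in case the
# listing is written to stderr rather than stdout):
#   tesseract --list-langs 2>&1 | has_lang eng && echo "English data found"
```

If the check fails for eng, the language data is missing or tesseract is looking in the wrong directory.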
The rise of automation
To truly get OCR to our fingertips, we’d like to run pngpaste - | tesseract stdin stdout | pbcopy at the touch of a button.
Open up Automator.app (preinstalled on all Macs) and create a Service (called a Quick Action in newer versions of macOS). Then drag in “Run Shell Script” from the left, and enter what we had above:
PATH="/usr/local/bin/:$PATH"
pngpaste - | tesseract stdin stdout | pbcopy
(I prepended /usr/local/bin/ to PATH, since that is where Homebrew installs pngpaste and tesseract for me.)
Add this to the end if you want to automatically paste the result afterwards!
osascript -e 'tell application "System Events" to keystroke "v" using {command down}'
Important: At the top, select “Service receives no input in any application.” We use pngpaste for input, so Automator would otherwise complain about input.
Here’s what you should end up with:
Save and give this service a name, like “Run OCR”.
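The script above runs the paste keystroke even when OCR didn’t produce anything, for example when the pasteboard holds no image. Here is a slightly more defensive sketch of the same pipeline. The function names ocr_clipboard and ocr_and_paste are made up for this example, and it assumes pngpaste exits with a non-zero status or writes nothing when there is no image to read. The /opt/homebrew/bin entry is there because that is Homebrew’s default location on Apple Silicon Macs.

```shell
#!/bin/sh
# Sketch only: ocr_clipboard and ocr_and_paste are hypothetical names.
# /usr/local/bin is Homebrew's prefix on Intel Macs, /opt/homebrew/bin on Apple Silicon.
PATH="/usr/local/bin:/opt/homebrew/bin:$PATH"

# OCR the image in the pasteboard and put the text back into the pasteboard.
# Returns non-zero if no image could be read, or if tesseract or pbcopy failed.
ocr_clipboard() {
  img=$(mktemp "${TMPDIR:-/tmp}/ocr.XXXXXX") || return 1
  status=1
  # Assumption: pngpaste fails, or writes nothing, when the pasteboard has no image.
  if pngpaste - > "$img" && [ -s "$img" ]; then
    # Capture the text first so a tesseract failure is not hidden by the pipe.
    if text=$(tesseract stdin stdout < "$img") && printf '%s' "$text" | pbcopy; then
      status=0
    fi
  fi
  rm -f "$img"
  return "$status"
}

# Run OCR, then send cmd-v to the frontmost app only if OCR succeeded.
ocr_and_paste() {
  ocr_clipboard &&
    osascript -e 'tell application "System Events" to keystroke "v" using {command down}'
}

# In the Automator action, call it as the last line:
# ocr_and_paste
```

The only behavioral difference from the two-line version is that nothing gets pasted when there is no image or when OCR fails.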
Binding to a shortcut ctrl cmd v
After saving this service with a name (say, Run OCR), open up System Preferences > Keyboard > Shortcuts > Services, and scroll all the way down to find Run OCR. All that’s left to do is click on Run OCR to bind a shortcut. I use ctrl cmd v.
Usage: Try taking a screenshot of this paragraph with ctrl shift cmd 4, then hit ctrl cmd v to paste it as text.
Feeling powerful yet?
Different languages
Something I do with OCR is translate comics that aren’t in English. For tesseract, this means you’d need to download language data to recognize languages that are not English. You can find this language data here. If you want to have tesseract recognize Korean as well as English, for example, then download and move kor.traineddata into the $TESSDATA_PREFIX directory. Then change the tesseract command like so:
tesseract -l eng+kor stdin stdout
You can make the list as long as you want, like eng+chi_sim+chi_tra+jpn+kor. Be warned that runtime becomes noticeably long with more than three languages, at least in my experience.
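Moving the file by hand is easy to get wrong if $TESSDATA_PREFIX is unset or points somewhere unexpected. A small sketch of a safer copy follows. install_traineddata is a hypothetical helper written for this example, not something that ships with tesseract, and it only checks that the file and the target directory exist before copying.

```shell
#!/bin/sh
# install_traineddata is a hypothetical helper, not a tesseract command.
# $1: path to a downloaded .traineddata file
# $2: tessdata directory to copy it into (for example "$TESSDATA_PREFIX")
install_traineddata() {
  src=${1:-}
  dest=${2:-}
  case $src in
    *.traineddata) ;;
    *) echo "not a .traineddata file: $src" >&2; return 1 ;;
  esac
  [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
  # An empty or wrong directory is rejected instead of guessed at.
  [ -d "$dest" ] || { echo "no such directory: $dest" >&2; return 1; }
  cp "$src" "$dest"/
}
```

For example, assuming the file was downloaded to ~/Downloads, install_traineddata ~/Downloads/kor.traineddata "$TESSDATA_PREFIX" refuses to run when the variable is empty instead of copying the file to the wrong place.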
Wrap-up
With very little code, we’ve bound OCR to a hotkey. In the next post we’ll explore ways to get more accurate results with OCR.