
OCR at your fingertips (part 2)


Here’s a five-minute tutorial on binding OCR to a keyboard shortcut on a Mac. It’s really that simple!

High-level overview

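The key insight is that those tasks become trivial if only I could take a screenshot and magically paste it as text. Let’s develop this magic. The plan is:

1. Take a screenshot of some text, straight into the pasteboard.
2. Run OCR on that image.
3. Put the recognized text back into the pasteboard, ready to be pasted.

All of this turns out to be doable using the following open-source packages:

- pngpaste, which reads an image from the pasteboard
- tesseract, an OCR engine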

Installing dependencies

With Homebrew:

brew install pngpaste tesseract

Or install pngpaste and tesseract manually. Check that both are installed by running:

pngpaste -v

and

tesseract --version
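If you’d rather check everything in one go, here is a minimal sketch of a helper that reports whether pngpaste, tesseract and pbcopy can be found on your PATH. The name `check_ocr_deps` is hypothetical (it is not part of either package); it only relies on the POSIX `command -v` builtin, so it should work in bash, zsh or sh.

```shell
# Hypothetical helper (not part of pngpaste or tesseract): report which of the
# tools this tutorial relies on can be found on PATH.
#   check_ocr_deps              -> checks pngpaste, tesseract and pbcopy
#   check_ocr_deps tool1 tool2  -> checks only the tools you name
check_ocr_deps() {
  if [ "$#" -eq 0 ]; then
    set -- pngpaste tesseract pbcopy
  fi
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool ($(command -v "$tool"))"
    else
      echo "missing: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  # Exit status = number of missing tools, so 0 means everything was found.
  return "$missing"
}
```

Running `check_ocr_deps` with no arguments should list all three tools as found; any “missing” line points at what still needs installing.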

New toys

Try it out! Hit Ctrl+Shift+Cmd+4 and grab a screenshot of some text; holding Ctrl copies the selection to the pasteboard instead of saving it to a file. Then, in a terminal, type:

pngpaste - | tesseract stdin stdout | pbcopy

This uses pngpaste to send the pasteboard image to our OCR tool, tesseract, which reads it from stdin and writes the recognized text to stdout. That text goes to pbcopy, which places it in the pasteboard, ready to be pasted. Try taking a screenshot of this paragraph, running the command above, and pasting the result into a text editor.

Hopefully the results are acceptable! Hint: The next article will deal with refining our OCR results.

If nothing happened, run pngpaste - | tesseract stdin stdout in a terminal with an image in your pasteboard, so the output (or any error message) shows up directly. Most likely, you need to set up Tesseract with language data.
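Because the one-liner is a plain pipeline, a failure in pngpaste or tesseract typically just leaves empty text in the pasteboard. Below is a minimal sketch of a wrapper (the function name `ocr_to_pasteboard` is hypothetical) that runs the same three steps through temporary files and stops with a message naming the step that failed. It assumes the same pngpaste and tesseract commands used above.

```shell
# Hypothetical wrapper around: pngpaste - | tesseract stdin stdout | pbcopy
# Same three steps, but each one goes through a temporary file so that we can
# tell which step failed instead of silently copying empty text.
ocr_to_pasteboard() {
  img=$(mktemp) || return 1
  txt=$(mktemp) || { rm -f "$img"; return 1; }

  # Step 1: pasteboard image -> PNG bytes (pngpaste fails if there is no image).
  if ! pngpaste - >"$img"; then
    echo "ocr: step 1 failed, no image in the pasteboard?" >&2
    rm -f "$img" "$txt"
    return 1
  fi

  # Step 2: PNG -> text. "stdin"/"stdout" tell tesseract to use the streams.
  if ! tesseract stdin stdout <"$img" >"$txt"; then
    echo "ocr: step 2 failed, is tesseract's language data installed?" >&2
    rm -f "$img" "$txt"
    return 1
  fi

  # Step 3: text -> pasteboard.
  pbcopy <"$txt"
  rc=$?
  rm -f "$img" "$txt"
  return "$rc"
}
```

Call `ocr_to_pasteboard` in a terminal with a screenshot in the pasteboard; if it fails, the message tells you which step to investigate.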

The rise of automation

To truly get OCR to our fingertips, we’d like to run pngpaste - | tesseract stdin stdout | pbcopy at the touch of a button.

Open Automator.app (preinstalled on all Macs) and create a new Service (called a Quick Action in macOS Mojave and later). Then drag “Run Shell Script” in from the library on the left, and enter what we had above:

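# Homebrew's bin directory: /usr/local/bin (Intel) or /opt/homebrew/bin (Apple Silicon)
PATH="/usr/local/bin:/opt/homebrew/bin:$PATH"
# Read the pasteboard image, OCR it, and copy the text back to the pasteboard
pngpaste - | tesseract stdin stdout | pbcopy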

(I prepended /usr/local/bin/ to PATH because Automator runs shell scripts with a minimal PATH that doesn’t include it, and that is where Homebrew installs pngpaste and tesseract for me.)

Add this line to the end if you also want the result pasted automatically:

osascript -e 'tell application "System Events" to keystroke "v" using {command down}'
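Putting these pieces together, the whole “Run Shell Script” body could look something like the minimal sketch below. The function name `run_ocr` and the `AUTO_PASTE` switch are hypothetical conveniences of this sketch, not Automator settings; `/opt/homebrew/bin` is included on the assumption that you might be on an Apple Silicon Mac, where Homebrew installs there instead of `/usr/local/bin`.

```shell
# Automator runs scripts with a minimal PATH, so add Homebrew's bin directories
# (/usr/local/bin on Intel Macs, /opt/homebrew/bin on Apple Silicon).
PATH="/usr/local/bin:/opt/homebrew/bin:$PATH"

# Hypothetical switch: 1 = also press Cmd+V afterwards, 0 = only copy the text.
AUTO_PASTE=${AUTO_PASTE:-1}

run_ocr() {
  # Screenshot in the pasteboard -> OCR -> recognized text back in the pasteboard.
  pngpaste - | tesseract stdin stdout | pbcopy
  if [ "$AUTO_PASTE" = 1 ]; then
    # Simulate Cmd+V in the frontmost app via System Events.
    osascript -e 'tell application "System Events" to keystroke "v" using {command down}'
  fi
}
```

Because the sketch only defines `run_ocr`, add a final line that simply calls `run_ocr` when you paste it into Automator. The first time the simulated Cmd+V runs, macOS may ask you to grant Accessibility permission before the keystroke goes through.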

Important: At the top, select “Service receives no input in any application.” Our input comes from pngpaste instead, and without this setting Automator complains about the service’s input.

Here’s what you should end up with:
[Screenshot: the finished Automator workflow]

Save and give this service a name, like “Run OCR”.

Binding to a shortcut (Ctrl+Cmd+V)

Next, open System Preferences > Keyboard > Shortcuts > Services, and scroll all the way down to find Run OCR. All that’s left is to click on Run OCR and assign a shortcut. I use Ctrl+Cmd+V.

Usage: Take a screenshot of this paragraph with Ctrl+Shift+Cmd+4, then press your shortcut in any text field, as if you were pasting text. (That’s why I chose Ctrl+Cmd+V: it’s almost like pasting.)

Feeling powerful yet?

Different languages

One thing I use OCR for is translating comics that aren’t in English. With tesseract, that means downloading language data for every language other than English. You can find this language data in the tesseract-ocr/tessdata repository on GitHub. If you want tesseract to recognize Korean as well as English, for example, download kor.traineddata and move it into the $TESSDATA_PREFIX directory. Then change the tesseract command like so:

tesseract -l eng+kor stdin stdout

You can make the list as long as you want, like eng+chi_sim+chi_tra+jpn+kor. Be warned, though: in my experience, runtime gets noticeably longer with more than three languages.
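If you switch language sets often, a small helper can build the `-l` argument for you and warn when a language isn’t installed yet. This is a minimal sketch; the function names are hypothetical, and it relies on `tesseract --list-langs`, which prints a header line followed by the installed language codes, one per line.

```shell
# Hypothetical helper: join language codes with "+", e.g. eng kor -> eng+kor.
tess_lang_arg() {
  (IFS=+; printf '%s\n' "$*")
}

# Hypothetical helper: succeed only if every given language is installed.
# `tesseract --list-langs` prints a header line, then one code per line;
# 2>&1 because some older tesseract versions print the list to stderr.
tess_langs_installed() {
  installed=$(tesseract --list-langs 2>&1) || return 1
  for lang in "$@"; do
    if ! printf '%s\n' "$installed" | grep -qxF "$lang"; then
      echo "missing language data: $lang.traineddata" >&2
      return 1
    fi
  done
}

# Hypothetical: OCR the pasteboard image with several languages, e.g. ocr_langs eng kor
ocr_langs() {
  [ "$#" -gt 0 ] || set -- eng
  tess_langs_installed "$@" || return 1
  pngpaste - | tesseract -l "$(tess_lang_arg "$@")" stdin stdout | pbcopy
}
```

With these defined, `ocr_langs eng kor` does the same job as the `-l eng+kor` command above, and `ocr_langs eng jpn` stops early if jpn.traineddata isn’t installed yet.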

Wrap-up

With very little code, we’ve bound OCR to a hotkey. In the next post we’ll explore ways to get more accurate results with OCR.

tagged: tutorials, mac