
SaltwashAR – my Python Augmented Reality application – has a new OCR (Optical Character Recognition) feature. Every time a robot faces the webcam, it reads out the words in the webcam image. Let’s take a gander:

Great. Sporty Robot is reading out the lyric from my song The Everyman.

But how does this work? First we need to get our filthy paws on an OCR module for Python – I’ve plumped for PyTesser, a wrapper for the open source Tesseract OCR engine sponsored by Google.

PyTesser 0.0.1 can be downloaded from https://pypi.python.org/pypi/PyTesser/ (note the PIL dependency).

To use PyTesser on my Windows 7 64-bit PC with Python Tools for Visual Studio I had to:

  1. Rename the downloaded pytesser_v0.0.1 folder to pytesser, and copy it to my Python site-packages folder C:\Anaconda2\Lib\site-packages
  2. Rename the pytesser.py file to __init__.py, open the file in an editor, and change the line import Image to from PIL import Image, and the line tesseract_exe_name = 'tesseract' to tesseract_exe_name = 'C:\Anaconda2\Lib\site-packages\pytesser\\tesseract' (note the double backslash at \\tesseract, as a single backslash would be interpreted as the tab escape \t. Oh Lord!)
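That double-backslash gotcha in step 2 is worth a quick demo. A minimal sketch (paths shortened for illustration – not the real Tesseract path):

```python
# in a normal string literal, \t is interpreted as a tab character
broken = 'pytesser\tesseract'
assert '\t' in broken

# doubling the backslash keeps the path intact
escaped = 'pytesser\\tesseract'
assert '\t' not in escaped

# a raw string does the same job with less noise
raw = r'pytesser\tesseract'
assert raw == escaped
```

A raw string would work just as well in the edited __init__.py, if you prefer it to doubled backslashes.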

Now let’s look at the code for the new SaltwashAR feature:

from features.base import Feature, Speaking
from pytesser import *

class OpticalCharacterRecognition(Feature, Speaking):
    # define region of interest borders
    TOP_BORDER = 20
    BOTTOM_BORDER = 20
    LEFT_BORDER = 120
    RIGHT_BORDER = 120

    def __init__(self, text_to_speech):
        Feature.__init__(self)
        Speaking.__init__(self, text_to_speech)

    def _thread(self, args):
        image = args
        # get region of interest
        height, width = image.shape[:2]
        roi = image[self.TOP_BORDER:height-self.BOTTOM_BORDER, self.LEFT_BORDER:width-self.RIGHT_BORDER]

        # convert image from OpenCV array to PIL format
        roi = Image.fromarray(roi)

        # get text from image
        text = image_to_string(roi)

        # convert text to speech (via the Speaking mixin)
        self._text_to_speech(text)

The _thread method is where all the shit happens, as a man of the cloth would say. We grab the latest webcam image passed in via the args parameter:


The printed words in the image are font Times New Roman size 48.

Note how the image also has a square pattern in it – this is the 2D marker upon which we render our 3D robot.

Now, we don’t want to inspect the entire webcam image for text – as it will likely have other stuff in it that will confuse PyTesser – so instead we cut out a region of interest:


Superb – PyTesser should have little trouble converting our region of interest into text. If you want to adjust the region of interest for the webcam image, simply tweak the values of the TOP_BORDER, BOTTOM_BORDER, LEFT_BORDER and RIGHT_BORDER constants.
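The cropping itself is plain NumPy slicing. Here’s a standalone sketch of the border arithmetic, using a dummy 640x480 frame in place of a real webcam image (the BOTTOM_BORDER value is my own illustrative choice):

```python
import numpy as np

TOP_BORDER, BOTTOM_BORDER = 20, 20
LEFT_BORDER, RIGHT_BORDER = 120, 120

# stand-in for a 640x480 webcam frame (height x width x BGR channels)
image = np.zeros((480, 640, 3), dtype=np.uint8)

# slice out the central region of interest
height, width = image.shape[:2]
roi = image[TOP_BORDER:height - BOTTOM_BORDER, LEFT_BORDER:width - RIGHT_BORDER]

print(roi.shape)  # (440, 400, 3)
```

Shrink the borders and the robot will scan more of the frame; grow them and it focuses on the centre.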

Next, we need to convert the region of interest to a PIL image format (as our webcam image was obtained using OpenCV).
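One detail worth knowing: OpenCV stores pixels as BGR, whereas PIL expects RGB. For near-black-and-white printed text the difference barely matters to the OCR, but here’s a hedged sketch of doing the conversion properly (a dummy frame stands in for the webcam image; cv2.cvtColor with COLOR_BGR2RGB achieves the same swap):

```python
import numpy as np
from PIL import Image

# stand-in for an OpenCV frame: every pixel pure blue in BGR order
frame = np.zeros((100, 200, 3), dtype=np.uint8)
frame[:, :] = (255, 0, 0)

# reverse the channel axis to get RGB, then hand the array to PIL
rgb = frame[:, :, ::-1].copy()
pil_image = Image.fromarray(rgb)

print(pil_image.getpixel((0, 0)))  # (0, 0, 255) – blue in RGB terms
```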

We are now ready to pass our region of interest to PyTesser, using its image_to_string function. The function spits out the text it has managed to recognise in the image. Sometimes it gets things wrong. But don’t we all.

Armed with the text, we can use the Text To Speech functionality built into SaltwashAR to let Sporty Robot speak the words to us. The God damn robot has learnt to read!
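Tesseract output often ends with stray newlines or a form-feed character, which can make the speech sound odd. Here’s a tiny cleanup helper of my own devising (not part of SaltwashAR) that you could run on the text before the speech call:

```python
def clean_ocr_text(text):
    """Collapse OCR output into a single speakable line."""
    # str.split() with no argument splits on any whitespace run,
    # including newlines and form feeds, so joining squeezes them out
    return ' '.join(text.split())

raw = 'HELLO\nWORLD\x0c'
print(clean_ocr_text(raw))  # HELLO WORLD
```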

And that is that. Please check out the SaltwashAR Wiki for details on how to install and help develop the SaltwashAR Python Augmented Reality application.



If you want to check the region of interest, simply import cv2 and add the following debug code after the crop:

cv2.imshow('ocr roi', roi)
cv2.waitKey(1)  # needed so the HighGUI window actually refreshes

A window will pop up, showing you the part of the webcam image that PyTesser will attempt to find words in.