
Monita, my Hispanic maid, has had a great idea for a children’s TV programme. ‘I will call it Mr Fruit,’ she told me, whilst working a Brillo Pad over my steel bathtub. ‘Aren’t you sailing a bit close to the wind, legally speaking, with regard to the Mr Men franchise?’ I replied. She screwed up her nose and spat on the scourer. ‘Let the dogs try sue me, sweetie!’ was her defiant response.

Anyhow, to help her create a pilot of the show – in an attempt to sell it to the media bigwigs – I unpacked my tiny Raspberry Pi computer and dusted down my webcam. Here’s the plan…

Monita will create a stage for her fruity characters to appear on, a simple model made from a cardboard box and a sheet of A4 paper as a backdrop. Upon the backdrop will appear the name of the character presently on stage.

I will point the webcam at the stage. Python code on my Raspberry Pi will receive a snap from the webcam, using it to determine the character in shot. To figure out the fruit, it will use OCR (optical character recognition) software to read the name of the character from the backdrop. Upon success, the code will announce through a set of speakers that particular character’s catchphrase. Those media bigwigs will burst into spontaneous rapturous applause and Monita will be filthy rich. I, on the other hand, will insert an ad in the local paper for a replacement maid.

Okay, so first I need to hunt down some OCR software to run on my Raspberry Pi. Tesseract is much touted, and actively supported by Google, so it seems a good fit. Let’s get it installed on the Pi:

sudo apt-get install tesseract-ocr

Next up, I need a Python wrapper, so as to use Tesseract from my code. Enter PyTesser. I simply downloaded the zip file and ran my own code alongside it, in the same directory.

Let’s have a look at the main program for detecting the toy fruits:

import cv2
from PIL import Image
from pytesser import *
from speech import Speech
from time import sleep

speech = Speech()

IMAGE_FILE = 'mister_fruits.jpg'

# open the webcam once, rather than on every pass of the loop
camera = cv2.VideoCapture(0)

# loop forever
while True:

    # save image from webcam
    img = camera.read()[1]
    cv2.imwrite(IMAGE_FILE, img)

    # load image
    img = Image.open(IMAGE_FILE)

    # detect words in image
    words = image_to_string(img).strip()
    print words

    # announce the arrival of Mr Puce!
    if words == 'Mr Puce':
        speech.text_to_speech("Watch out, here comes Mr Puce")

    # pause briefly before the next snap
    sleep(1)


Pretty straightforward. Inside an endless loop, we first grab a frame from the webcam and save it to file using OpenCV. Next we load the image in a format suitable for passing to PyTesser’s image_to_string method. If Tesseract’s OCR works well, we receive the name of the fruit character exactly as it appears in the image. Google’s Text To Speech service is then employed to announce the character’s catchphrase through some speakers, which, in the case of the pilot, is “Watch out, here comes Mr Puce”.
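One caveat: the exact string comparison above is brittle, since OCR output is often a letter or two off. Were Mr Fruit ever to go to a second series, something like the sketch below could tolerate minor misreads, using the standard library’s difflib. (The CHARACTERS list and the match_character helper are my own hypothetical names, not part of the pilot’s code.)

```python
import difflib

# hypothetical cast list -- only Mr Puce appears in the actual pilot
CHARACTERS = ['Mr Puce', 'Mr Lime', 'Mrs Plum']

def match_character(ocr_text, cutoff=0.8):
    # return the closest known character name, or None if nothing is close enough
    matches = difflib.get_close_matches(ocr_text.strip(), CHARACTERS, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# a typical OCR misread of 'Mr Puce' still resolves correctly
print(match_character('Mr Pucc'))
```

With this in place, a smudged backdrop reading ‘Mr Pucc’ would still match ‘Mr Puce’, while outright gibberish returns None and keeps the speakers quiet.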

Time for a demo. I run the code, and it saves the following webcam image to file:


But has the OCR software recognised the character’s name in the photo? Here’s a screenshot of the program output:


Hurray! The words in the image have been successfully converted to text by the Tesseract engine and the catchphrase of Mr Puce, the grumpy tomato, has been announced through a set of speakers attached to the Pi. Children the land over will marvel at the slick puppetry, dragging mothers to department stores to snap up a range of vitamin-based merchandise.

‘So, who’s first on the list?’ I asked Monita expectantly, ‘The CBeebies channel? Or maybe Nickelodeon?’

Her eyes grew as wide as saucers. ‘Darling, this only for village fête. It just a piece of shit, really.’

Deflated, I packed my Raspberry Pi in its box and tossed the webcam back into the corner of the room to gather dust.


Here’s the Speech class used by the main program, which calls Google’s Text To Speech service:

from subprocess import PIPE, call
import urllib

class Speech(object):
    # converts text to speech via Google's TTS endpoint
    def text_to_speech(self, text):
        # truncate text as google only allows 100 chars
        text = text[:100]
        # encode the text for use in a URL
        query = urllib.quote_plus(text)
        # build endpoint
        endpoint = "http://translate.google.com/translate_tts?tl=en&q=" + query
        # get google to translate and mplayer to play
        if call(["mplayer", endpoint], shell=False, stdout=PIPE, stderr=PIPE) != 0:
            print "Error translating text"
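The class leans on quote_plus to make the catchphrase safe to embed in the endpoint URL. A quick illustration of what that encoding actually produces (the class above targets Python 2’s urllib module; in Python 3 the same function lives in urllib.parse):

```python
from urllib.parse import quote_plus

phrase = "Watch out, here comes Mr Puce"
# spaces become '+' and the comma is percent-encoded as %2C
print(quote_plus(phrase))
```

So Mr Puce’s catchphrase travels to Google as Watch+out%2C+here+comes+Mr+Puce, well inside the 100-character limit the code truncates to.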