I am building a robot girlfriend for my pathetic buddy, Arkwood. In my previous post, Voice recognition with Python (Mark II), I wrote some Python code to determine whether Arkwood was saying the word Yes or No into the microphone.

‘But when I say Yes to my girlfriend,’ he moaned, ‘I am sometimes being sarcastic.’

It’s true. Arkwood is the most caustic person I have ever had the displeasure of acquainting myself with.

‘Not to worry,’ I replied, ‘I will build some gesture recognition into your girlfriend too, so that she can tell when you are being sarcastic.’

Here’s the code:

from webcam import Webcam
from detection import Detection
from audiorecord import AudioRecord
from audioanalysis import AudioAnalysis
from time import sleep

webcam = Webcam()
detection = Detection()
audio_record = AudioRecord()
audio_analysis = AudioAnalysis()

while True:
    # girlfriend's question to Arkwood
    print "Do you still love me?"

    # get Arkwood's voice from microphone
    voice_file = audio_record.voice()
    is_yes = audio_analysis.is_yes(voice_file)
    # get Arkwood's hand gesture from webcam
    image = webcam.get_current_frame()
    is_okay = detection.is_item_detected_in_image('haarcascade_okaygesture.xml', image.copy())
    is_vicky = detection.is_item_detected_in_image('haarcascade_vickygesture.xml', image.copy())
    # Arkwood's answer to girlfriend
    if is_yes and is_okay:
        print "I love you!"
    elif is_yes and is_vicky:
        print "Like, I would die a thousand times just to kiss your feet. Whatever."
    else:
        print "Snooker's on TV. Can't you bother me later?"

    # give Arkwood a break before nagging him again
    sleep(10)  # pause length in seconds is an assumed value

Once inside the while loop, Arkwood’s android sweetheart asks him “Do you still love me?”

Arkwood speaks into the microphone attached to my PC to say Yes or No (using audio slices, as per my previous post).

Next we take a snap from the webcam attached to my PC, of Arkwood’s hand gesture. The gesture will be one of two things:


An Okay hand gesture, which tells his girlfriend that he really does love her.


A Vicky hand gesture, which tells his girlfriend that he is just being sarcastic when he says that he loves her.

Now, in order to recognise the hand gesture in the webcam image, I have created two OpenCV Haar Feature-based Cascade Classifiers (details at the foot of this post).

All that is left to do is combine the voice and hand gesture recognition, so that Arkwood can answer his girlfriend accordingly…

If he has spoken the word Yes and made the Okay hand gesture then he responds “I love you!”

If he has spoken the word Yes and made the Vicky hand gesture then he responds sarcastically “Like, I would die a thousand times just to kiss your feet. Whatever.”

In all other cases he simply says “Snooker’s on TV. Can’t you bother me later?”
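
The three cases above can be collapsed into one small function, which makes the branching easy to check without a microphone or webcam to hand. This is only a sketch: `choose_reply` is a hypothetical name, not part of the original script, and it returns the reply instead of printing it.

```python
def choose_reply(is_yes, is_okay, is_vicky):
    """Pick Arkwood's reply from the voice and gesture flags.

    choose_reply is a hypothetical helper, not taken from the original script.
    """
    if is_yes and is_okay:
        # spoken Yes plus Okay gesture: sincere
        return "I love you!"
    if is_yes and is_vicky:
        # spoken Yes plus Vicky gesture: sarcastic
        return "Like, I would die a thousand times just to kiss your feet. Whatever."
    # anything else, including a spoken No or no recognised gesture
    return "Snooker's on TV. Can't you bother me later?"
```

If both gestures happen to be detected in the same frame, the Okay branch wins because it is checked first, which matches the order of the tests in the script.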

Time for a demo…



The audio graph and debug output confirm that Arkwood has spoken the word Yes into the microphone.


The Vicky hand gesture captured by the webcam has been successfully recognised by our haarcascade_vickygesture classifier.


With the word Yes and the Vicky hand gesture recognised, Arkwood answers his girlfriend with thick sarcasm.

‘What do you think about your robot girlfriend now?’ I asked him.

‘She’s great, but I’d really like to give her this hand gesture,’ Arkwood replied, moving his fingers vigorously about in the air.

I really can’t tell you what the hand gesture was. Suffice to say he was frogmarched to the bathroom, to wash his mouth out with soap and water.



My post Guitar detection using OpenCV details how to create a haar cascade classifier. I used 22 positive images for each hand gesture, and 200 negative images.

I created my Okay and Vicky training samples:

perl createtrainsamples.pl positives.dat negatives.dat samples 500 "./opencv_createsamples -bgcolor 0 -bgthresh 0 -maxxangle 1.1 -maxyangle 1.1 -maxzangle 0.5 -maxidev 40 -w 20 -h 27"
perl createtrainsamples.pl positives.dat negatives.dat samples 500 "./opencv_createsamples -bgcolor 0 -bgthresh 0 -maxxangle 1.1 -maxyangle 1.1 -maxzangle 0.5 -maxidev 40 -w 20 -h 26"

And used them to train classifiers:

opencv_haartraining -data haarcascade_osign -vec samples.vec -bg negatives.dat -nstages 40 -nsplits 2 -minhitrate 0.999 -maxfalsealarm 0.5 -npos 500 -nneg 200 -w 20 -h 27 -nonsym -mem 2048 -mode ALL
opencv_haartraining -data haarcascade_vsign -vec samples.vec -bg negatives.dat -nstages 40 -nsplits 2 -minhitrate 0.999 -maxfalsealarm 0.5 -npos 500 -nneg 200 -w 20 -h 26 -nonsym -mem 2048 -mode ALL

Our haarcascade_okaygesture and haarcascade_vickygesture classifiers completed 20 stages of training.
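
As a back-of-envelope check on those training targets: in a cascade, a sample has to pass every stage, so the per-stage rates multiply. The sketch below assumes each of the 20 completed stages exactly meets the -minhitrate 0.999 and -maxfalsealarm 0.5 targets, which is an idealisation on the training data and not a measured result. The function name `cascade_rates` is hypothetical.

```python
def cascade_rates(stages, min_hit_rate, max_false_alarm):
    """Overall (hit rate, false alarm rate) if every stage just meets its target.

    Hypothetical helper for illustration only; a real classifier will differ.
    """
    overall_hit = min_hit_rate ** stages
    overall_false_alarm = max_false_alarm ** stages
    return overall_hit, overall_false_alarm


hit, false_alarm = cascade_rates(20, 0.999, 0.5)
print("overall hit rate:    %.4f" % hit)          # about 0.98
print("overall false alarm: %.2e" % false_alarm)  # roughly 9.5e-07
```

In other words, the targets aim to keep about 98% of the positive samples while letting through fewer than one in a million negative windows, at least on the training set.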

I used the following parameters when detecting hand gestures with the classifiers:

items = item_cascade.detectMultiScale(gray_image, scaleFactor=1.1, minNeighbors=4, minSize=(200, 260))
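
The Detection class lives in another post, so its internals are not shown here. As a rough sketch of how the call above might be turned into a yes/no answer, the function below takes the cascade as an argument. `is_gesture_detected` is a hypothetical name, and the cascade is assumed to be any object with a `detectMultiScale` method, such as an OpenCV `cv2.CascadeClassifier`.

```python
def is_gesture_detected(item_cascade, gray_image):
    """Return True if the cascade finds at least one match in the image.

    Hypothetical helper: item_cascade is assumed to expose detectMultiScale,
    as an OpenCV cv2.CascadeClassifier does.
    """
    items = item_cascade.detectMultiScale(
        gray_image,
        scaleFactor=1.1,     # image size is reduced by 10% at each scale step
        minNeighbors=4,      # neighbours a candidate rectangle needs to be kept
        minSize=(200, 260),  # ignore candidates smaller than 200x260 pixels
    )
    # one (x, y, w, h) rectangle per detection; empty when nothing is found
    return len(items) > 0
```

With OpenCV installed, the cascade would typically be loaded with `cv2.CascadeClassifier('haarcascade_okaygesture.xml')` and the webcam frame converted with `cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)` before being passed in.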

Both classifiers can be found at my repository, along with some images to put them through their paces.

The Webcam and Detection classes can be found on my post Lego detection using OpenCV (Mark III).

Here’s the Okay gesture in its full glory:


With both hand gestures together, our haarcascade_okaygesture classifier picks out the Okay gesture:


…and our haarcascade_vickygesture classifier picks out the Vicky gesture:


Our Vicky gesture with a placebo Thumbs-Up gesture:


Our Okay gesture with a placebo Thumbs-Up gesture:


And I will bid my farewell with a brace of placebos: