
In my last post, OpenCV and OpenGL using Python, I was able to detect a Lego policeman in my webcam and then draw the policeman onto a 3D cube.

In this post I will use OpenCV to detect different hand gestures. I will then use OpenGL to manipulate a 3D cube, depending on the hand gesture detected.

But what hand gestures do we detect?

Here’s the Okay hand gesture:


Here’s the Vicky hand gesture:


I will use OpenCV Haar feature-based cascade classifiers to detect the hand gestures made into the webcam. My post Voice and hand gesture recognition details how I went about creating the Okay and Vicky classifiers. Both classifiers – along with some images to put them through their paces – can be found in my repository.

Okay, so now we know which hand gestures can be detected, how are we going to use them to manipulate the 3D cube?

It will work like this. If we detect the friendly Okay hand gesture then the cube will move towards us. If we detect the hostile Vicky hand gesture then the cube will move away from us. If we do not detect any hand gesture then the cube will stay exactly where it is.
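That rule boils down to a one-step update of the cube's z position. A sketch of the logic in isolation (the full code later folds this into the HandTracker class):

```python
def update_z_pos(z_pos, is_okay, is_vicky):
    """Okay pulls the cube closer, Vicky pushes it away,
    no gesture leaves it exactly where it is."""
    if is_okay:
        return z_pos + 1.0   # towards us
    elif is_vicky:
        return z_pos - 1.0   # away from us
    return z_pos             # unchanged

print(update_z_pos(-7.0, True, False))   # -6.0
print(update_z_pos(-7.0, False, True))   # -8.0
print(update_z_pos(-7.0, False, False))  # -7.0
```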

We’ll take a look at the Python code later, but first a demo…

Brilliant! We are manipulating the cube with our hands, moving it towards and away from us.

I’ve used some nice OpenGL effects. Texture mapping allows for a devil to be drawn on the cube. Blending allows the cube to be transparent. Lighting allows the cube to be brighter, adding a reddish tint. As always, the NeHe tutorials are a great help.
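For the curious, the glBlendFunc(GL_SRC_ALPHA, GL_ONE) setting used later is additive blending: each incoming fragment is scaled by its alpha and added to whatever is already in the framebuffer, clamped to 1.0. A quick sketch of that arithmetic:

```python
def blend_additive(src, dst, src_alpha):
    """GL_SRC_ALPHA, GL_ONE: out = src * alpha + dst * 1, clamped to [0, 1]."""
    return tuple(min(1.0, s * src_alpha + d) for s, d in zip(src, dst))

# a half-transparent red fragment over a dark grey background
print(blend_additive((1.0, 0.0, 0.0), (0.2, 0.2, 0.2), 0.5))
# (0.7, 0.2, 0.2)
```

Because the destination term is GL_ONE rather than GL_ONE_MINUS_SRC_ALPHA, the background always shows through – which is what gives the cube its glassy look.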

Mrs Elderflower, my 76-year-old neighbour, popped round for a donation to the church fête. As I handed her the Baileys cake fresh from the oven, she spied the cube on my computer monitor and let out a gasp.

‘It’s the devil incarnate! Beelzebub is on your screen!’

I explained to Mrs Elderflower that the devil could be moved into the distance with a hand gesture. No sooner had I told her than the old dear was using both of her bony hands, vigorously sticking up two fingers apiece into the webcam. ‘Begone you fucker!’ she screamed. Only once the cube had gone into the distance did her frail body rest easy, her breathing not so frantic.



Here’s the Python code, which combines OpenCV and OpenGL to detect hand gestures and manipulate a 3D cube:

from OpenGL.GL import *
from OpenGL.GLUT import *
from OpenGL.GLU import *
import cv2
from PIL import Image
from webcam import Webcam
from detection import Detection

class HandTracker:

    def __init__(self):
        self.webcam = Webcam()
        self.webcam.start()
        self.detection = Detection()

        self.x_axis = 0.0
        self.y_axis = 0.0
        self.z_axis = 0.0
        self.z_pos = -7.0

    def _handle_gesture(self):
        # get image from webcam
        image = self.webcam.get_current_frame()

        # detect hand gesture in image
        is_okay = self.detection.is_item_detected_in_image('haarcascade_okaygesture.xml', image.copy())
        is_vicky = self.detection.is_item_detected_in_image('haarcascade_vickygesture.xml', image.copy())

        if is_okay:
            # okay gesture moves cube towards us
            self.z_pos = self.z_pos + 1.0
        elif is_vicky:
            # vicky gesture moves cube away from us
            self.z_pos = self.z_pos - 1.0

    def _draw_cube(self):
        # draw the six textured faces of the cube
        glBegin(GL_QUADS)

        glTexCoord2f(0.0, 0.0); glVertex3f(-1.0, -1.0,  1.0)
        glTexCoord2f(1.0, 0.0); glVertex3f( 1.0, -1.0,  1.0)
        glTexCoord2f(1.0, 1.0); glVertex3f( 1.0,  1.0,  1.0)
        glTexCoord2f(0.0, 1.0); glVertex3f(-1.0,  1.0,  1.0)
        glTexCoord2f(1.0, 0.0); glVertex3f(-1.0, -1.0, -1.0)
        glTexCoord2f(1.0, 1.0); glVertex3f(-1.0,  1.0, -1.0)
        glTexCoord2f(0.0, 1.0); glVertex3f( 1.0,  1.0, -1.0)
        glTexCoord2f(0.0, 0.0); glVertex3f( 1.0, -1.0, -1.0)
        glTexCoord2f(0.0, 1.0); glVertex3f(-1.0,  1.0, -1.0)
        glTexCoord2f(0.0, 0.0); glVertex3f(-1.0,  1.0,  1.0)
        glTexCoord2f(1.0, 0.0); glVertex3f( 1.0,  1.0,  1.0)
        glTexCoord2f(1.0, 1.0); glVertex3f( 1.0,  1.0, -1.0)
        glTexCoord2f(1.0, 1.0); glVertex3f(-1.0, -1.0, -1.0)
        glTexCoord2f(0.0, 1.0); glVertex3f( 1.0, -1.0, -1.0)
        glTexCoord2f(0.0, 0.0); glVertex3f( 1.0, -1.0,  1.0)
        glTexCoord2f(1.0, 0.0); glVertex3f(-1.0, -1.0,  1.0)
        glTexCoord2f(1.0, 0.0); glVertex3f( 1.0, -1.0, -1.0)
        glTexCoord2f(1.0, 1.0); glVertex3f( 1.0,  1.0, -1.0)
        glTexCoord2f(0.0, 1.0); glVertex3f( 1.0,  1.0,  1.0)
        glTexCoord2f(0.0, 0.0); glVertex3f( 1.0, -1.0,  1.0)
        glTexCoord2f(0.0, 0.0); glVertex3f(-1.0, -1.0, -1.0)
        glTexCoord2f(1.0, 0.0); glVertex3f(-1.0, -1.0,  1.0)
        glTexCoord2f(1.0, 1.0); glVertex3f(-1.0,  1.0,  1.0)
        glTexCoord2f(0.0, 1.0); glVertex3f(-1.0,  1.0, -1.0)

        glEnd()

    def _init_gl(self, Width, Height):
        glClearColor(0.0, 0.0, 0.0, 0.0)
        glClearDepth(1.0)
        glShadeModel(GL_SMOOTH)
        glMatrixMode(GL_PROJECTION)
        glLoadIdentity()
        gluPerspective(45.0, float(Width)/float(Height), 0.1, 100.0)
        glMatrixMode(GL_MODELVIEW)

        # initialize lighting
        glLightfv(GL_LIGHT0, GL_AMBIENT, (0.5, 0.5, 0.5, 1.0))
        glLightfv(GL_LIGHT0, GL_DIFFUSE, (1.0, 0.8, 0.0, 1.0))
        glEnable(GL_LIGHT0)
        glEnable(GL_LIGHTING)

        # initialize blending
        glColor4f(0.2, 0.2, 0.2, 0.5)
        glBlendFunc(GL_SRC_ALPHA, GL_ONE)
        glEnable(GL_BLEND)

        # initialize texture
        image = Image.open("devil.jpg")
        ix = image.size[0]
        iy = image.size[1]
        image = image.tobytes("raw", "RGBX", 0, -1)

        glEnable(GL_TEXTURE_2D)
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR)
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR)
        glTexImage2D(GL_TEXTURE_2D, 0, 3, ix, iy, 0, GL_RGBA, GL_UNSIGNED_BYTE, image)

    def _draw_scene(self):
        # handle any hand gesture
        self._handle_gesture()

        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)
        glLoadIdentity()

        # position and rotate cube
        glTranslatef(0.0, 0.0, self.z_pos)
        glRotatef(self.x_axis, 1.0, 0.0, 0.0)
        glRotatef(self.y_axis, 0.0, 1.0, 0.0)
        glRotatef(self.z_axis, 0.0, 0.0, 1.0)

        # position lighting
        glLightfv(GL_LIGHT0, GL_POSITION, (0.0, 0.0, 2.0, 1.0))

        # draw cube
        self._draw_cube()

        # update rotation values
        self.x_axis = self.x_axis - 10
        self.z_axis = self.z_axis - 10

        glutSwapBuffers()

    def main(self):
        # setup and run OpenGL
        glutInit()
        glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE | GLUT_DEPTH)
        glutInitWindowSize(640, 480)
        glutInitWindowPosition(800, 400)
        glutCreateWindow("OpenGL Hand Tracker")
        glutDisplayFunc(self._draw_scene)
        glutIdleFunc(self._draw_scene)
        self._init_gl(640, 480)
        glutMainLoop()

# run an instance of Hand Tracker
handTracker = HandTracker()
handTracker.main()

Note: we can turn off depth testing when drawing transparent objects, i.e. glDisable(GL_DEPTH_TEST).

The Webcam class to obtain snaps from my webcam:

import cv2
from threading import Thread

class Webcam:

    def __init__(self):
        self.video_capture = cv2.VideoCapture(0)
        self.current_frame = self.video_capture.read()[1]

    # create thread for capturing images
    def start(self):
        Thread(target=self._update_frame, args=()).start()

    # continually update the current frame in the background
    def _update_frame(self):
        while True:
            self.current_frame = self.video_capture.read()[1]

    # get the current frame
    def get_current_frame(self):
        return self.current_frame
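The point of the background thread is that the render loop never waits on the camera: a producer keeps overwriting a shared attribute and readers simply take whatever is newest. Here is that keep-the-latest-value pattern as a toy sketch, stripped of the camera so it runs anywhere (a counter stands in for video_capture.read()):

```python
import time
from threading import Thread

class LatestValue:
    """Toy stand-in for the Webcam class: a producer thread keeps
    overwriting current_value; readers never block on the producer."""

    def __init__(self):
        self.current_value = 0
        self._running = True

    def start(self):
        Thread(target=self._update, daemon=True).start()

    def _update(self):
        # stands in for the continual video_capture.read() loop
        while self._running:
            self.current_value += 1
            time.sleep(0.001)

    def stop(self):
        self._running = False

latest = LatestValue()
latest.start()
time.sleep(0.05)
print(latest.current_value > 0)  # True - the producer has been updating
latest.stop()
```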

The Detection class to detect hand gestures:

import cv2

class Detection(object):

    def is_item_detected_in_image(self, item_cascade_path, image):
        # detect items in image (OpenCV frames are BGR, so convert accordingly)
        item_cascade = cv2.CascadeClassifier(item_cascade_path)
        gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        items = item_cascade.detectMultiScale(gray_image, scaleFactor=1.1, minNeighbors=4, minSize=(200, 260))

        # debug: draw rectangle around detected items
        for (x, y, w, h) in items:
            cv2.rectangle(image, (x, y), (x + w, y + h), (255, 0, 0), 2)

        # debug: show detected items in window
        cv2.imshow('OpenCV Detection', image)
        cv2.waitKey(1)

        # indicate whether item detected in image
        return len(items) > 0

I ran the code on my Windows 7 PC using Python Tools for Visual Studio.