In my previous post, Hand gestures with OpenCV and OpenGL, I was able to manipulate a 3D cube with hand gestures:

Cool. But what about some augmented reality, where the cube is actually floating in front of us? No problem:

As per the previous post, I am using OpenCV computer vision to obtain a snap from the webcam and inspect it for hand gestures. But this time I am also using the snap as a background for my spinning cube.

If the Okay hand gesture is detected, I use OpenGL to blend and render the cube; if the Vicky hand gesture is detected, the cube vanishes.

Let’s look at the Python code, with a walkthrough of the HandTracker class. First up, the import statements and initializer:

from OpenGL.GL import *
from OpenGL.GLUT import *
from OpenGL.GLU import *
import cv2
from PIL import Image
from webcam import Webcam
from detection import Detection
 
class HandTracker:
 
    def __init__(self):
        self.webcam = Webcam()
        self.webcam.start()
         
        self.detection = Detection()
 
        self.x_axis = 0.0
        self.z_axis = 0.0
        self.show_cube = False
        self.texture_background = None
        self.texture_cube = None

We are importing the required OpenGL and OpenCV libraries, as well as my Webcam and Detection classes (which can be found in said previous post).

__init__ initializes our Webcam and Detection classes. It also sets class instance variables for rotating the cube about its x and z axes, holding the cube and background textures, and a flag to determine whether the cube should be shown.

def _init_gl(self, Width, Height):
    glClearColor(0.0, 0.0, 0.0, 0.0)
    glClearDepth(1.0)
    glDepthFunc(GL_LESS)
    glEnable(GL_DEPTH_TEST)
    glShadeModel(GL_SMOOTH)
    glMatrixMode(GL_PROJECTION)
    glLoadIdentity()
    gluPerspective(45.0, float(Width)/float(Height), 0.1, 100.0)
    glMatrixMode(GL_MODELVIEW)
        
    # enable texture
    glEnable(GL_TEXTURE_2D)
    self.texture_background = glGenTextures(1)
    self.texture_cube = glGenTextures(1)

    # create cube texture 
    image = Image.open("devil.jpg")
    ix = image.size[0]
    iy = image.size[1]
    # tostring was removed in Pillow 3.0; tobytes is the modern equivalent
    image = image.tobytes("raw", "RGBX", 0, -1)

    glBindTexture(GL_TEXTURE_2D, self.texture_cube)
    glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST)
    glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST)
    glTexImage2D(GL_TEXTURE_2D, 0, 3, ix, iy, 0, GL_RGBA, GL_UNSIGNED_BYTE, image)

_init_gl will only be executed once, at the beginning of our program. It takes care of loading and binding the texture for our cube (the face of a devil!) alongside some initial OpenGL settings.
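One caveat worth flagging (my addition, not from the original post): pre-OpenGL-2.0 drivers require texture dimensions to be powers of two, so if devil.jpg is, say, 640×480, the upload can silently fail on older hardware. A minimal sketch of a guard for that, where nearest_pow2 is my own hypothetical helper:

```python
def nearest_pow2(n):
    # largest power of two that does not exceed n
    p = 1
    while p * 2 <= n:
        p *= 2
    return p

# e.g. before glTexImage2D, something like:
#   image = image.resize((nearest_pow2(ix), nearest_pow2(iy)))
```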

def _draw_scene(self):
    # handle any hand gesture
    self._handle_gesture()
 
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)
    glLoadIdentity()

    # draw background
    glBindTexture(GL_TEXTURE_2D, self.texture_background)
    glPushMatrix()
    glTranslatef(0.0,0.0,-11.2)
    self._draw_background()
    glPopMatrix()

    # draw cube if enabled
    if self.show_cube:
        glColor4f(1.0, 1.0, 1.0, 1.0)
        glBlendFunc(GL_SRC_ALPHA, GL_ONE)
        glEnable(GL_BLEND)
        glDisable(GL_DEPTH_TEST)

        glBindTexture(GL_TEXTURE_2D, self.texture_cube)
        glPushMatrix()
        glTranslatef(0.0,0.0,-7.0)
        glRotatef(self.x_axis,1.0,0.0,0.0)
        glRotatef(0.0,0.0,1.0,0.0) # no rotation about the y axis
        glRotatef(self.z_axis,0.0,0.0,1.0)
        self._draw_cube()
        glPopMatrix()

        glDisable(GL_BLEND)
        glEnable(GL_DEPTH_TEST)

        # update rotation values
        self.x_axis = self.x_axis - 10
        self.z_axis = self.z_axis - 10
 
    glutSwapBuffers()

_draw_scene will be executed continually by our program, redrawing the window. This is where all the shit happens, as the kids would say.

First we call our _handle_gesture method, which uses OpenCV to attempt to detect a hand gesture in our webcam (more on that later).

Then we draw the background, which is the current webcam snap.

Next – and only if our class instance variable show_cube is enabled – do we draw our cube. The cube is drawn on top of the background (notice how its glTranslatef z value is -7.0, which is closer to the screen than the background’s glTranslatef z value of -11.2). We blend the cube to make it transparent, and rotate it.
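As a back-of-the-envelope check on those depths (my own sketch, not part of the original code): with a 45° field of view, the frustum’s visible half-height at a given depth is depth × tan(fov/2). The background quad’s vertices sit at z = 4.0 after a translate of -11.2, i.e. at depth 7.2, where the frustum is roughly ±4 wide and ±3 tall – exactly the quad’s extents, so the background fills the window:

```python
import math

def visible_extents(fov_deg, aspect, depth):
    # half-width and half-height of the view frustum at the given
    # depth, from the perspective relation h = d * tan(fov / 2)
    half_h = depth * math.tan(math.radians(fov_deg) / 2)
    return half_h * aspect, half_h

# background quad: translated to z = -11.2, vertices at z = 4.0
half_w, half_h = visible_extents(45.0, 640 / 480, 11.2 - 4.0)
# half_w is close to 4.0 and half_h close to 3.0, matching the
# quad's x range of [-4, 4] and y range of [-3, 3]
```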

Finally, we update the rotation values of our class instance variables x_axis and z_axis. Otherwise the cube would stay in the same position each time the window is redrawn.
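One small tweak I’d suggest (not in the original): x_axis and z_axis decrease without bound, so wrapping them modulo 360 keeps the values tidy. A sketch, with wrap_angle as my own hypothetical helper:

```python
def wrap_angle(angle, step=10.0):
    # decrement the angle by the rotation step and keep the
    # result within [0, 360) -- an equivalent rotation for glRotatef
    return (angle - step) % 360

# e.g. in _draw_scene:
#   self.x_axis = wrap_angle(self.x_axis)
#   self.z_axis = wrap_angle(self.z_axis)
```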

def _handle_gesture(self):
    # get image from webcam 
    image = self.webcam.get_current_frame()
         
    # detect hand gesture in image
    is_okay = self.detection.is_item_detected_in_image('haarcascade_okaygesture.xml', image.copy())
    is_vicky = self.detection.is_item_detected_in_image('haarcascade_vickygesture.xml', image.copy())
 
    if is_okay:
        # okay gesture shows cube
        self.show_cube = True
    elif is_vicky:
        # vicky gesture hides cube
        self.show_cube = False

    # convert image to OpenGL texture format
    image = cv2.flip(image, 0)
    gl_image = Image.fromarray(image)     
    ix = gl_image.size[0]
    iy = gl_image.size[1]
    # tostring was removed in Pillow 3.0; tobytes is the modern equivalent
    gl_image = gl_image.tobytes("raw", "BGRX", 0, -1)
 
    # create background texture
    glBindTexture(GL_TEXTURE_2D, self.texture_background)
    glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST)
    glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST)
    glTexImage2D(GL_TEXTURE_2D, 0, 3, ix, iy, 0, GL_RGBA, GL_UNSIGNED_BYTE, gl_image)

As promised, the _handle_gesture method, which is called from our _draw_scene method.

First up, I fetch a snap from the webcam using my Webcam class.

Then I use my Detection class to detect whether there are any hand gestures in the snap. If an Okay hand gesture is detected then I set the show_cube class instance variable to True. If a Vicky hand gesture is detected then I set show_cube to False.
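Haar cascade detections can flicker from frame to frame, which would make the cube pop in and out. A simple smoothing idea (my own hypothetical helper, not in the original code) is to act on a gesture only after it has been seen in several consecutive frames:

```python
class GestureDebouncer:
    """Require a gesture to persist for several consecutive
    frames before acting on it, to smooth spurious detections."""

    def __init__(self, frames_required=3):
        self.frames_required = frames_required
        self.streak = 0

    def update(self, detected):
        # grow the streak while the gesture persists, reset otherwise
        self.streak = self.streak + 1 if detected else 0
        return self.streak >= self.frames_required
```

In _handle_gesture you could then flip show_cube only when the debouncer confirms the gesture, rather than on a single detection.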

Finally, I need to bind the webcam snap to my background texture. The image format is converted from OpenCV to OpenGL (which also requires the image to be flipped upside-down).
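If you prefer to skip PIL for the background, the same conversion can be sketched directly with NumPy (frame_to_gl_bytes is my own hypothetical helper; the original code instead flips with cv2.flip and lets the raw encoder handle channel order and orientation):

```python
import numpy as np

def frame_to_gl_bytes(bgr_frame):
    # OpenCV stores pixels as BGR with the origin at the top-left;
    # OpenGL wants RGB with the origin at the bottom-left
    rgb = bgr_frame[:, :, ::-1]  # reverse the channel order
    flipped = rgb[::-1, :, :]    # flip vertically
    data = np.ascontiguousarray(flipped).tobytes()
    return data, bgr_frame.shape[1], bgr_frame.shape[0]
```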

def _draw_background(self):
    # draw background
    glBegin(GL_QUADS)
    glTexCoord2f(0.0, 1.0); glVertex3f(-4.0, -3.0, 4.0)
    glTexCoord2f(1.0, 1.0); glVertex3f( 4.0, -3.0, 4.0)
    glTexCoord2f(1.0, 0.0); glVertex3f( 4.0,  3.0, 4.0)
    glTexCoord2f(0.0, 0.0); glVertex3f(-4.0,  3.0, 4.0)
    glEnd()

def _draw_cube(self):
    # draw cube
    glBegin(GL_QUADS)
    glTexCoord2f(0.0, 0.0); glVertex3f(-1.0, -1.0,  1.0)
    glTexCoord2f(1.0, 0.0); glVertex3f( 1.0, -1.0,  1.0)
    glTexCoord2f(1.0, 1.0); glVertex3f( 1.0,  1.0,  1.0)
    glTexCoord2f(0.0, 1.0); glVertex3f(-1.0,  1.0,  1.0)
    glTexCoord2f(1.0, 0.0); glVertex3f(-1.0, -1.0, -1.0)
    glTexCoord2f(1.0, 1.0); glVertex3f(-1.0,  1.0, -1.0)
    glTexCoord2f(0.0, 1.0); glVertex3f( 1.0,  1.0, -1.0)
    glTexCoord2f(0.0, 0.0); glVertex3f( 1.0, -1.0, -1.0)
    glTexCoord2f(0.0, 1.0); glVertex3f(-1.0,  1.0, -1.0)
    glTexCoord2f(0.0, 0.0); glVertex3f(-1.0,  1.0,  1.0)
    glTexCoord2f(1.0, 0.0); glVertex3f( 1.0,  1.0,  1.0)
    glTexCoord2f(1.0, 1.0); glVertex3f( 1.0,  1.0, -1.0)
    glTexCoord2f(1.0, 1.0); glVertex3f(-1.0, -1.0, -1.0)
    glTexCoord2f(0.0, 1.0); glVertex3f( 1.0, -1.0, -1.0)
    glTexCoord2f(0.0, 0.0); glVertex3f( 1.0, -1.0,  1.0)
    glTexCoord2f(1.0, 0.0); glVertex3f(-1.0, -1.0,  1.0)
    glTexCoord2f(1.0, 0.0); glVertex3f( 1.0, -1.0, -1.0)
    glTexCoord2f(1.0, 1.0); glVertex3f( 1.0,  1.0, -1.0)
    glTexCoord2f(0.0, 1.0); glVertex3f( 1.0,  1.0,  1.0)
    glTexCoord2f(0.0, 0.0); glVertex3f( 1.0, -1.0,  1.0)
    glTexCoord2f(0.0, 0.0); glVertex3f(-1.0, -1.0, -1.0)
    glTexCoord2f(1.0, 0.0); glVertex3f(-1.0, -1.0,  1.0)
    glTexCoord2f(1.0, 1.0); glVertex3f(-1.0,  1.0,  1.0)
    glTexCoord2f(0.0, 1.0); glVertex3f(-1.0,  1.0, -1.0)
    glEnd()

The _draw_background and _draw_cube methods are fairly self-explanatory. We are applying the appropriate texture (webcam snap for the background, devil face for the cube) and drawing the shapes.

    def main(self):
        # setup and run OpenGL
        glutInit()
        glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE | GLUT_DEPTH)
        glutInitWindowSize(640, 480)
        glutInitWindowPosition(800, 400)
        glutCreateWindow("OpenGL Hand Tracker")
        glutDisplayFunc(self._draw_scene)
        glutIdleFunc(self._draw_scene)
        self._init_gl(640, 480)
        glutMainLoop()
 
# run instance of Hand Tracker 
handTracker = HandTracker()
handTracker.main()

main is the entry point for all our code. We create our window, register our _draw_scene method as both the display and idle callback (so it runs on every redraw), and execute our aforementioned _init_gl method. Then we sit back and let the program loop forever (or until the window is closed, or crashes).

Indeed, we only need two lines of code outside of our HandTracker class. The first line creates an instance of the class; the second executes its main method.

And that is that! Drop me a line if anything is unclear, for one is eager to cleanse the opaque.

Ciao!

P.S.

The OpenGL transparency and blending article from TutorialsPlay was a great help. As always, the NeHe tutorials were a guiding light.

I ran the code on my Windows 7 PC using Python Tools for Visual Studio.