Arkwood, my weedy Belgian friend, was wailing on my shoulder. ‘She has dumped me for Wayne!’ he cried, turning my lapels into sodden lumps of fabric.

It was true enough. Daphne, the plump spotty girl who works down the chippy, had indeed fallen for Wayne the deep fat fryer. Wayne has the upper hand, you see, plying Daphne with free cod and chips.

‘Don’t worry,’ I told him, ‘I will soon fix your broken heart.’ I explained to Arkwood that I had that very minute finished the prototype of a robot girlfriend. ‘She is perfect for you! Why don’t you take her out on a date?’

The robot girlfriend uses the following technologies for an augmented reality experience:

  • Blender to create a 3D robot
  • OpenCV computer vision to detect a 2D glyph in my webcam
  • OpenGL graphics library to render the robot upon the glyph

I have also used Speech To Text and Text To Speech technologies to permit Arkwood to hold a conversation with his android sweetheart.

The Python code is at the foot of the post. But first, let’s watch the flames of romance flicker as Arkwood talks to his robot girlfriend.

Brings a tear to the eye! Granted, his robot girlfriend ain’t much of a looker, though we can apply some cosmetic surgery in time. We can also add lots of conversation, rather than just an answer to the question “what is your name”.
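
For the curious, here is one way the extra conversation might be bolted on. It is only a sketch: the REPLIES table, the normalise helper and the second question are my own inventions for illustration, and only the "what is your name" exchange and the fallback reply come from the program at the foot of the post. The idea is to look the question up in a dictionary, after tidying it in case the speech recogniser hands back capitals or punctuation.

```python
# Hypothetical lookup table; only the first entry comes from the post.
REPLIES = {
    'what is your name': 'my name is roberta',
    'how are you': 'all the better for seeing you',  # invented example
}

# The fallback reply used by the original program
DEFAULT_REPLY = 'what did you say, honey?'


def normalise(question):
    """Lower-case the text, drop punctuation and collapse extra spaces."""
    kept = [c for c in question.lower() if c.isalnum() or c.isspace()]
    return ' '.join(''.join(kept).split())


def answer(question):
    """Return the robot's reply, falling back to the default."""
    return REPLIES.get(normalise(question), DEFAULT_REPLY)
```

With this in place, the if/else in the drawing code could shrink to a single call along the lines of self.texttospeech.set_text(answer(question)), and new small talk becomes a matter of adding rows to the table.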

I quietly slipped out of the room – to turn a few pages of The Inscrutable Diaries Of Rodger Saltwash – keen to leave my buddy alone with his bionic babe.

Ciao!

P.S.

I ran the code on my Windows 7 PC using Python Tools for Visual Studio.

My previous post, 3D book reviews using Augmented Reality, provides the lowdown on augmented reality using Blender, OpenCV and OpenGL.

Here’s my main OpenGL program, adapted for the robot girlfriend:

from OpenGL.GL import *
from OpenGL.GLUT import *
from OpenGL.GLU import *
import cv2
from PIL import Image
import numpy as np
from webcam import Webcam
from glyphs import Glyphs
from speechtotext import SpeechToText
from texttospeech import TextToSpeech
from objloader import *
from constants import *

class OpenGLRobot:
 
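    # constants: multiplied element-wise with the view matrix to negate its
    # y and z rows, converting OpenCV's camera axes to OpenGL's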
    INVERSE_MATRIX = np.array([[ 1.0, 1.0, 1.0, 1.0],
                               [-1.0,-1.0,-1.0,-1.0],
                               [-1.0,-1.0,-1.0,-1.0],
                               [ 1.0, 1.0, 1.0, 1.0]])

    def __init__(self):
        # initialise webcam and start thread
        self.webcam = Webcam()
        self.webcam.start()

        # initialise glyphs
        self.glyphs = Glyphs()
        self.glyphs_cache = None

        # initialise speech to text
        self.speechtotext = SpeechToText()
        self.speechtotext.start()

        # initialise text to speech
        self.texttospeech = TextToSpeech()
        self.texttospeech.start()

        # initialise robot
        self.robot = None 

        # initialise texture
        self.texture_background = None

    def _init_gl(self, Width, Height):
        glClearColor(0.0, 0.0, 0.0, 0.0)
        glClearDepth(1.0)
        glDepthFunc(GL_LESS)
        glEnable(GL_DEPTH_TEST)
        glShadeModel(GL_SMOOTH)
        glMatrixMode(GL_PROJECTION)
        glLoadIdentity()
        gluPerspective(33.7, 1.3, 0.1, 100.0)
        glMatrixMode(GL_MODELVIEW)
        
        # assign robot
        self.robot = OBJ('robot.obj')
        
        # assign texture
        glEnable(GL_TEXTURE_2D)
        self.texture_background = glGenTextures(1)

    def _draw_scene(self):
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)
        glLoadIdentity()

        # get image from webcam
        image = self.webcam.get_current_frame()

        # convert image to OpenGL texture format
        bg_image = cv2.flip(image, 0)
        bg_image = Image.fromarray(bg_image)     
        ix = bg_image.size[0]
        iy = bg_image.size[1]
        # tobytes is the current Pillow name for the older tostring
        bg_image = bg_image.tobytes("raw", "BGRX", 0, -1)
 
        # create background texture
        glBindTexture(GL_TEXTURE_2D, self.texture_background)
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST)
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST)
        glTexImage2D(GL_TEXTURE_2D, 0, 3, ix, iy, 0, GL_RGBA, GL_UNSIGNED_BYTE, bg_image)
        
        # draw background
        glBindTexture(GL_TEXTURE_2D, self.texture_background)
        glPushMatrix()
        glTranslatef(0.0,0.0,-10.0)
        self._draw_background()
        glPopMatrix()

        # handle glyphs
        glyphs_handled = self._handle_glyphs(image)

        if glyphs_handled:
            
            # get question for robot
            question = self.speechtotext.get_text()

            # robot's answer
            if question:
                if question == 'what is your name':
                    self.texttospeech.set_text('my name is roberta')
                else:
                    self.texttospeech.set_text('what did you say, honey?')

        glutSwapBuffers()

    def _handle_glyphs(self, image):

        # attempt to detect glyphs
        glyphs = []

        try:
            glyphs = self.glyphs.detect(image)
        except Exception as ex: 
            print(ex)

        # manage glyphs cache
        if glyphs:
            self.glyphs_cache = glyphs
        elif self.glyphs_cache: 
            glyphs = self.glyphs_cache
            self.glyphs_cache = None
        else:
            return False

        for glyph in glyphs:
            
            rvecs, tvecs, _, _ = glyph

            # build view matrix
            rmtx = cv2.Rodrigues(rvecs)[0]

            view_matrix = np.array([[rmtx[0][0],rmtx[0][1],rmtx[0][2],tvecs[0]],
                                    [rmtx[1][0],rmtx[1][1],rmtx[1][2],tvecs[1]],
                                    [rmtx[2][0],rmtx[2][1],rmtx[2][2],tvecs[2]],
                                    [0.0       ,0.0       ,0.0       ,1.0    ]])

            view_matrix = view_matrix * self.INVERSE_MATRIX

            view_matrix = np.transpose(view_matrix)

            # load view matrix and draw robot
            glPushMatrix()
            glLoadMatrixd(view_matrix)
            glCallList(self.robot.gl_list)
            glColor3f(1.0, 1.0, 1.0)
            glPopMatrix()

        return True

    def _draw_background(self):
        # draw background
        glBegin(GL_QUADS)
        glTexCoord2f(0.0, 1.0); glVertex3f(-4.0, -3.0, 0.0)
        glTexCoord2f(1.0, 1.0); glVertex3f( 4.0, -3.0, 0.0)
        glTexCoord2f(1.0, 0.0); glVertex3f( 4.0,  3.0, 0.0)
        glTexCoord2f(0.0, 0.0); glVertex3f(-4.0,  3.0, 0.0)
        glEnd( )

    def main(self):
        # setup and run OpenGL
        glutInit()
        glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE | GLUT_DEPTH)
        glutInitWindowSize(640, 480)
        glutInitWindowPosition(800, 400)
        self.window_id = glutCreateWindow("OpenGL Glyphs")
        glutDisplayFunc(self._draw_scene)
        glutIdleFunc(self._draw_scene)
        self._init_gl(640, 480)
        glutMainLoop()
 
# run an instance of OpenGL Robot 
openGLRobot = OpenGLRobot()
openGLRobot.main()
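
The view matrix step in _handle_glyphs is the part most likely to cause head-scratching, so here is the same arithmetic written out in plain Python lists, with no NumPy or OpenCV required. Treat it as a sketch rather than a replacement: build_view_matrix is a name I have made up, and it assumes you already hold the 3x3 rotation matrix (what cv2.Rodrigues returns in the program) and the three translation values as plain numbers.

```python
def build_view_matrix(rmtx, tvec):
    """Combine a 3x3 rotation and a 3-value translation into a 4x4
    matrix laid out column by column, as OpenGL stores matrices."""
    # 4x4 pose: rotation in the top-left block, translation in the last column
    pose = [list(rmtx[r]) + [tvec[r]] for r in range(3)]
    pose.append([0.0, 0.0, 0.0, 1.0])

    # Negate the y and z rows. OpenCV's camera looks along +z with y
    # pointing down; OpenGL's looks along -z with y pointing up.
    for r in (1, 2):
        pose[r] = [-v for v in pose[r]]

    # Transpose, so that each row of the result is a column of the pose
    return [[pose[r][c] for r in range(4)] for c in range(4)]


# Example: no rotation, glyph one unit right, two down and three away
identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]
matrix = build_view_matrix(identity, [1.0, 2.0, 3.0])
```

In the example the translation ends up in the last row with its y and z values negated, which is where OpenGL expects to find it once the matrix is read column by column.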

And my Speech To Text class, which uses a Python Speech Recognition library to convert speech to text (I’ve configured the library to target Google Speech Recognition):

import speech_recognition as sr
from threading import Thread
import time

class SpeechToText:

    def __init__(self):
        self.recognizer = sr.Recognizer()
        self.text = None
         
    # create thread for capturing text
    def start(self):
        Thread(target=self._capture_text, args=()).start()
 
    def _capture_text(self):
        while True:
            self.text = None

            with sr.Microphone() as source:
                print("listening...")
                audio = self.recognizer.listen(source)

            try:
                self.text = self.recognizer.recognize_google(audio)
                print(self.text)
            except sr.UnknownValueError:
                print("Google Speech Recognition could not understand audio")
            except sr.RequestError:
                print("Could not request results from Google Speech Recognition service")
        
            time.sleep(5)

    # get the current text
    def get_text(self):
        text = self.text
        self.text = None
        
        return text
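
One caveat with the class above: get_text reads the text and then clears it in two separate steps, while the listening thread writes to the same attribute. If a new phrase were to land between those two steps it would be wiped before anyone heard it. The five second sleep makes that unlikely in practice, but a lock removes the doubt. The sketch below is my own illustration, not part of the original program, and TextMailbox is an invented name.

```python
from threading import Lock


class TextMailbox:
    """Holds at most one piece of text and hands it over exactly once."""

    def __init__(self):
        self._lock = Lock()
        self._text = None

    def put(self, text):
        # Called by the listening thread when a phrase is recognised
        with self._lock:
            self._text = text

    def take(self):
        # Called by the drawing thread: read and clear in one locked step
        with self._lock:
            text, self._text = self._text, None
        return text
```

SpeechToText could keep one of these in place of self.text, calling put after recognize_google succeeds and take from get_text.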

And my Text To Speech class, which uses pyttsx to convert text to speech:

import pyttsx
from threading import Thread

class TextToSpeech:

    def __init__(self):
        self.pyttsx = pyttsx.init()
        self.text = None
         
    # create thread for converting text
    def start(self):
        Thread(target=self._convert_text, args=()).start()
 
    def _convert_text(self):
        while True:
            if self.text:
                
                print(self.text)
                
                self.pyttsx.say(self.text)
                self.pyttsx.runAndWait()
                
                self.text = None

    # set the current text
    def set_text(self, text):
        self.text = text
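
A word of warning about the loop in _convert_text: while there is nothing to say it spins round as fast as it can, which keeps a processor core busy for no good reason. One possible alternative, sketched below under my own assumptions, is to have the thread block on a queue until a phrase arrives. SpeechWorker and its speak argument are invented for illustration; speak is any function that takes a string, so the class can be tried without a speech engine installed.

```python
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

from threading import Thread


class SpeechWorker:
    """Speaks queued phrases one at a time on a background thread."""

    def __init__(self, speak):
        # 'speak' is a hypothetical hook: any callable taking a string
        self._speak = speak
        self._phrases = queue.Queue()

    # create thread for converting text
    def start(self):
        worker = Thread(target=self._run)
        worker.daemon = True   # let the program exit without this thread
        worker.start()

    def _run(self):
        while True:
            phrase = self._phrases.get()   # sleeps until a phrase arrives
            self._speak(phrase)
            self._phrases.task_done()

    # queue a phrase to be spoken
    def set_text(self, text):
        self._phrases.put(text)

    # block until every queued phrase has been spoken
    def wait_until_spoken(self):
        self._phrases.join()
```

To try it without any audio, pass in a list's append method: create spoken = [], build SpeechWorker(spoken.append), call start, then set_text('my name is roberta') and wait_until_spoken, and the phrase turns up in the list. For the real thing, speak would be a small function that calls say and runAndWait on the pyttsx engine, as the class above does. Note one difference in behaviour: phrases queue up here rather than overwriting each other.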