In my last post, Robot girlfriend using Augmented Reality, I built a robot girlfriend for Arkwood out of some Python code.

The robot uses the following technologies for an augmented reality experience:

  • Blender to create a 3D robot
  • OpenCV computer vision to detect a 2D glyph in my webcam
  • OpenGL graphics library to render the robot upon the glyph

I also used Text To Speech and Speech To Text technologies to permit Arkwood to hold a conversation with his android sweetheart.

Today, my seedy Belgian chum said, ‘I want to marry her!’

Wow, that was quick. You see, he has only been dating the robot for six days. ‘No problem,’ I told him, ‘But she’ll need to seek her father’s permission.’

Here’s the plan. Once the robot father appears on the scene, the robot daughter will tell her daddy “I want to get married”.

Now, unfortunately for my buddy, the robot father knows just how pathetic and vile Arkwood really is, and so will promptly reply to his robot daughter, “no you cannot”.

Let’s take a peek at the two robots in chat (the daughter is pink, the father is blue):

Cool. The robot daughter uses Text To Speech to tell her father that she wants to get married. The robot father can listen to what his daughter says using Speech To Text – and only if he hears her say “I want to get married” will he use Text To Speech to tell her “no you cannot”.

And so on the conversation goes, back and forth – both robots chatting to each other, using Speech To Text to decipher what the other is saying and replying accordingly!
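The back-and-forth can be sketched without any audio at all. The snippet below is only an illustration, not the code used in this post: `reply_to` and `exchange` are hypothetical names, and a plain list stands in for the speakers and microphone.

```python
DAUGHTER_QUESTION = 'I want to get married'
FATHER_ANSWER = 'no you cannot'

def reply_to(heard):
    """Return the line to speak in response to what was heard, or None."""
    if heard == DAUGHTER_QUESTION:
        return FATHER_ANSWER       # the father refuses
    if heard == FATHER_ANSWER:
        return DAUGHTER_QUESTION   # the daughter asks again
    return None                    # anything else is ignored

def exchange(turns):
    """Simulate a number of spoken lines, with the daughter going first."""
    spoken = [DAUGHTER_QUESTION]
    while len(spoken) < turns:
        answer = reply_to(spoken[-1])
        if answer is None:
            break
        spoken.append(answer)
    return spoken

for line in exchange(4):
    print(line)
```

Because each line triggers the other, the chat never ends on its own; in this sketch the `turns` argument is what stops it.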

I told Arkwood the bad news. ‘Sorry, old fruit. But your sweetheart’s father will not permit the marriage.’

Arkwood blew into a rage. ‘Well, you better get coding some more Python! I want her to grind the old boy down.’



I ran the code on my Windows 7 PC using Python Tools for Visual Studio.

Now for the Python code. First, a new Marriage class:

from texttospeech import TextToSpeech
from speechtotext import SpeechToText

class Marriage:

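    DAUGHTER_QUESTION = 'I want to get married'
    FATHER_ANSWER = 'no you cannot'
    DAUGHTER_RATE = 150  # assumed value, missing from the listing; it only needs to be faster than FATHER_RATE
    FATHER_RATE = 50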

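    def __init__(self):
        # start the speech worker threads, otherwise nothing is ever spoken or heard
        self.text_to_speech = TextToSpeech()
        self.text_to_speech.start()
        self.speech_to_text = SpeechToText()
        self.speech_to_text.start()
        self.conversation_started = False

    def seek_permission(self):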

        # on the first call, the daughter opens the conversation
        if not self.conversation_started:
            self.conversation_started = True

            self.text_to_speech.set_text_and_rate(self.DAUGHTER_QUESTION, self.DAUGHTER_RATE)

        # obtain speech
        speech = self.speech_to_text.get_text()

        # reply to speech
        if speech == self.DAUGHTER_QUESTION:
            self.text_to_speech.set_text_and_rate(self.FATHER_ANSWER, self.FATHER_RATE)
        elif speech == self.FATHER_ANSWER:
            self.text_to_speech.set_text_and_rate(self.DAUGHTER_QUESTION, self.DAUGHTER_RATE)

Our Marriage class takes care of orchestrating the Text To Speech and Speech To Text between the two robots. Text To Speech plays through my computer speakers, and Speech To Text listens through my computer microphone.

Next up, the amended Text To Speech class:

import pyttsx
from threading import Thread

class TextToSpeech:

    def __init__(self):
        self.pyttsx = pyttsx.init()
        self.text = None
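        self.rate = None

    # create thread for converting text
    def start(self):
        Thread(target=self._convert_text, args=()).start()

    def _convert_text(self):
        while True:
            if self.text:
                print self.text

                # speak the text at the requested rate
                self.pyttsx.setProperty('rate', self.rate)
                self.pyttsx.say(self.text)
                self.pyttsx.runAndWait()

                # clear the text so it is only spoken once
                self.text = None
                self.rate = None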

    # set the current text and rate
    def set_text_and_rate(self, text, rate):
        self.text = text
        self.rate = rate

You’ll notice that I am changing the rate of the speech depending on whether the daughter or father is talking (the father speaks more slowly). I was rather hoping to change the voice between female and male instead, but my Windows 7 PC only has one voice: Microsoft Anna.
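Had more voices been installed, picking one by name might look something like the sketch below. It is an assumption-laden illustration rather than code from this project: `find_voice_id` is a hypothetical helper, the `Voice` tuple is a stand-in for the voice objects a speech engine reports, and the second voice and both ids are made up.

```python
from collections import namedtuple

# Stand-in for the voice objects a speech engine reports; only .id and .name are used.
Voice = namedtuple('Voice', ['id', 'name'])

def find_voice_id(voices, name_fragment):
    """Return the id of the first voice whose name contains name_fragment, else None."""
    wanted = name_fragment.lower()
    for voice in voices:
        if wanted in voice.name.lower():
            return voice.id
    return None

# Hypothetical list of installed voices (ids invented for the example).
installed = [Voice('anna-id', 'Microsoft Anna'), Voice('david-id', 'Microsoft David')]

print(find_voice_id(installed, 'anna'))
print(find_voice_id(installed, 'george'))

# With pyttsx the helper could be wired in roughly like this (untested):
#   engine = pyttsx.init()
#   voice_id = find_voice_id(engine.getProperty('voices'), 'david')
#   if voice_id:
#       engine.setProperty('voice', voice_id)
```

Returning None when nothing matches lets the caller fall back to changing the rate, as the class above does.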

Here’s the Speech To Text class:

import speech_recognition as sr
from threading import Thread

class SpeechToText:

    def __init__(self):
        self.recognizer = sr.Recognizer()
        self.text = None

    # create thread for capturing text
    def start(self):
        Thread(target=self._capture_text, args=()).start()

    def _capture_text(self):
        while True:
            if self.text is None:

                with sr.Microphone() as source:
                    print "listening..."
                    audio = self.recognizer.listen(source)

                try:
                    self.text = self.recognizer.recognize_google(audio)
                    print self.text
                except sr.UnknownValueError:
                    print("Google Speech Recognition could not understand audio")
                except sr.RequestError:
                    print("Could not request results from Google Speech Recognition service")

    # get the current text
    def get_text(self):
        text = self.text
        self.text = None
        return text

Pretty much as before. We’re using Google Speech Recognition to work out what the robots are saying to each other.
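The Marriage class compares the recognised text with the expected phrase exactly. I have not checked how consistently the recogniser capitalises or punctuates its results, so a more forgiving comparison could be sketched as follows. The `normalise` and `same_phrase` helpers are hypothetical and not part of the classes above.

```python
import re

DAUGHTER_QUESTION = 'I want to get married'

def normalise(text):
    """Lower-case the text, drop punctuation and collapse runs of whitespace."""
    if text is None:
        return ''
    text = re.sub(r"[^a-z0-9\s]", '', text.lower())
    return ' '.join(text.split())

def same_phrase(heard, expected):
    """Compare two phrases while ignoring case, punctuation and spacing."""
    return normalise(heard) == normalise(expected)

print(same_phrase('i want  to get married.', DAUGHTER_QUESTION))
print(same_phrase(None, DAUGHTER_QUESTION))
```

The `None` case matters because `get_text` returns `None` whenever nothing new has been heard.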

And here’s the main OpenGL program:

from OpenGL.GL import *
from OpenGL.GLUT import *
from OpenGL.GLU import *
import cv2
from PIL import Image
import numpy as np
from webcam import Webcam
from glyphs import Glyphs
from marriage import Marriage
from objloader import *
from constants import *

class OpenGLRobot:
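    # constants
    # flips the y and z axes to convert from OpenCV's camera coordinates to OpenGL's
    INVERSE_MATRIX = np.array([[ 1.0, 1.0, 1.0, 1.0],
                               [-1.0,-1.0,-1.0,-1.0],
                               [-1.0,-1.0,-1.0,-1.0],
                               [ 1.0, 1.0, 1.0, 1.0]])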

    def __init__(self):
        # initialise webcam and start thread
        self.webcam = Webcam()
        self.webcam.start()  # assumes Webcam exposes start(), like the speech classes

        # initialise glyphs
        self.glyphs = Glyphs()
        self.glyphs_cache = None

        # initialise marriage
        self.marriage = Marriage()

        # initialise robots
        self.robot_daughter = None 
        self.robot_father = None 

        # initialise texture
        self.texture_background = None

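    def _init_gl(self, Width, Height):
        glClearColor(0.0, 0.0, 0.0, 0.0)
        glClearDepth(1.0)
        glDepthFunc(GL_LESS)
        glEnable(GL_DEPTH_TEST)

        # set the projection, then switch back to the model view
        glMatrixMode(GL_PROJECTION)
        glLoadIdentity()
        gluPerspective(33.7, 1.3, 0.1, 100.0)
        glMatrixMode(GL_MODELVIEW)

        # assign robots
        self.robot_daughter = OBJ('robot_daughter.obj')
        self.robot_father = OBJ('robot_father.obj')

        # assign texture
        glEnable(GL_TEXTURE_2D)
        self.texture_background = glGenTextures(1)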

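    def _draw_scene(self):
        # clear the previous frame and reset the model view
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)
        glLoadIdentity()

        # get image from webcam
        image = self.webcam.get_current_frame()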

        # convert image to OpenGL texture format
        bg_image = cv2.flip(image, 0)
        bg_image = Image.fromarray(bg_image)     
        ix = bg_image.size[0]
        iy = bg_image.size[1]
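        bg_image = bg_image.tostring("raw", "BGRX", 0, -1)

        # create background texture
        glBindTexture(GL_TEXTURE_2D, self.texture_background)
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST)
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST)
        glTexImage2D(GL_TEXTURE_2D, 0, 3, ix, iy, 0, GL_RGBA, GL_UNSIGNED_BYTE, bg_image)

        # draw background behind the robots
        glBindTexture(GL_TEXTURE_2D, self.texture_background)
        glPushMatrix()
        glTranslatef(0.0, 0.0, -10.0)
        self._draw_background()
        glPopMatrix()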
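        # handle glyphs
        glyphs = self._handle_glyphs(image)

        # handle marriage: the robots only chat when both glyphs are in view
        glyph_names = [glyph[GLYPH_NAME_INDEX] for glyph in glyphs]

        if ROBOT_DAUGHTER in glyph_names and ROBOT_FATHER in glyph_names:
            self.marriage.seek_permission()

        glutSwapBuffers()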

    def _handle_glyphs(self, image):

        # attempt to detect glyphs
        glyphs = []

        try:
            glyphs = self.glyphs.detect(image)
        except Exception as ex:
            print(ex)

        # manage glyphs cache
        if glyphs:
            self.glyphs_cache = glyphs
        elif self.glyphs_cache:
            # nothing detected this frame, so reuse the last detection once
            glyphs = self.glyphs_cache
            self.glyphs_cache = None
        else:
            return glyphs

        for glyph in glyphs:
            rvecs, tvecs, _, glyph_name = glyph

            # build view matrix
            rmtx = cv2.Rodrigues(rvecs)[0]

            view_matrix = np.array([[rmtx[0][0],rmtx[0][1],rmtx[0][2],tvecs[0]],
                                    [rmtx[1][0],rmtx[1][1],rmtx[1][2],tvecs[1]],
                                    [rmtx[2][0],rmtx[2][1],rmtx[2][2],tvecs[2]],
                                    [0.0       ,0.0       ,0.0       ,1.0    ]])

            view_matrix = view_matrix * self.INVERSE_MATRIX

            view_matrix = np.transpose(view_matrix)

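            # load view matrix and draw robot
            glPushMatrix()
            glLoadMatrixd(view_matrix)

            # assumes the OBJ loader exposes its display list as gl_list
            if glyph_name == ROBOT_DAUGHTER:
                glCallList(self.robot_daughter.gl_list)
            elif glyph_name == ROBOT_FATHER:
                glCallList(self.robot_father.gl_list)

            glColor3f(1.0, 1.0, 1.0)
            glPopMatrix()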

        return glyphs

    def _draw_background(self):
        # draw background as a single textured quad
        glBegin(GL_QUADS)
        glTexCoord2f(0.0, 1.0); glVertex3f(-4.0, -3.0, 0.0)
        glTexCoord2f(1.0, 1.0); glVertex3f( 4.0, -3.0, 0.0)
        glTexCoord2f(1.0, 0.0); glVertex3f( 4.0,  3.0, 0.0)
        glTexCoord2f(0.0, 0.0); glVertex3f(-4.0,  3.0, 0.0)
        glEnd()

    def main(self):
        # setup and run OpenGL
        glutInit()
        glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE | GLUT_DEPTH)
        glutInitWindowSize(640, 480)
        glutInitWindowPosition(800, 400)
        self.window_id = glutCreateWindow("OpenGL Robot")
        glutDisplayFunc(self._draw_scene)
        glutIdleFunc(self._draw_scene)
        self._init_gl(640, 480)
        glutMainLoop()

# run an instance of OpenGL Robot
openGLRobot = OpenGLRobot()
openGLRobot.main()

The robot daughter is rendered to screen if her glyph has been detected. The robot father is rendered to screen if his glyph has been detected. However, the marriage chat will only occur if both robots are rendered to screen at the same time.
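The "both robots on screen" rule can be tried on its own with stand-in glyph tuples. This is a small sketch under assumptions: the constant values are illustrative (the real ones live in my constants module), the index follows the order in which the glyph tuple is unpacked in the main program, and `both_robots_present` is a hypothetical helper.

```python
# Assumed constants: the names come from the main program, the values are illustrative.
ROBOT_DAUGHTER = 'robot_daughter'
ROBOT_FATHER = 'robot_father'
GLYPH_NAME_INDEX = 3  # glyphs unpack as (rvecs, tvecs, _, glyph_name)

def both_robots_present(glyphs):
    """Return True only when the daughter and father glyphs are both detected."""
    glyph_names = [glyph[GLYPH_NAME_INDEX] for glyph in glyphs]
    return ROBOT_DAUGHTER in glyph_names and ROBOT_FATHER in glyph_names

# Stand-in glyphs: the pose data is irrelevant to this check, so None is used.
daughter = (None, None, None, ROBOT_DAUGHTER)
father = (None, None, None, ROBOT_FATHER)

print(both_robots_present([daughter]))
print(both_robots_present([daughter, father]))
print(both_robots_present([]))
```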