In my last post, Robot girlfriend using Augmented Reality, I built a robot girlfriend for Arkwood out of some Python code.
The robot uses the following technologies for an augmented reality experience:
- Blender to create a 3D robot
- OpenCV computer vision to detect a 2D glyph in my webcam
- OpenGL graphics library to render the robot upon the glyph
I also used Text To Speech and Speech To Text technologies to permit Arkwood to hold a conversation with his android sweetheart.
Today, my seedy Belgian chum said, ‘I want to marry her!’
Wow, that was quick. You see, he has only been dating the robot for six days. ‘No problem,’ I told him, ‘But she’ll need to seek her father’s permission.’
Here’s the plan. Once the robot father appears on the scene, the robot daughter will tell her daddy “I want to get married”.
Now, unfortunately for my buddy, the robot father knows just how pathetic and vile Arkwood really is, and so will promptly reply to his robot daughter, “no you cannot”.
Let’s take a peek at the two robots in chat (the daughter is pink, the father is blue):
Cool. The robot daughter uses Text To Speech to tell her father that she wants to get married. The robot father can listen to what his daughter says using Speech To Text – and only if he hears her say “I want to get married” will he use Text To Speech to tell her “no you cannot”.
And so on the conversation goes, back and forth – both robots chatting to each other, using Speech To Text to decipher what the other is saying and replying accordingly!
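That back-and-forth boils down to a tiny reply rule: hear a phrase, speak the counter-phrase. Here's a minimal sketch of just that rule, with the two phrases hard-coded (the full classes doing the actual speaking and listening follow below):

```python
# Minimal sketch of the turn-taking: given the phrase one robot
# just heard, return the phrase the other robot should speak next.
DAUGHTER_QUESTION = 'I want to get married'
FATHER_ANSWER = 'no you cannot'

def reply(heard):
    if heard == DAUGHTER_QUESTION:
        return FATHER_ANSWER       # the father refuses
    if heard == FATHER_ANSWER:
        return DAUGHTER_QUESTION   # the daughter asks again
    return None                    # unrecognised speech - stay silent
```

Feed either phrase in and the other pops out, forever; anything else and the robot stays schtum.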
I told Arkwood the bad news. ‘Sorry, old fruit. But your sweetheart’s father will not permit the marriage.’
Arkwood flew into a rage. ‘Well, you better get coding some more Python! I want her to grind the old boy down.’
I ran the code on my Windows 7 PC using Python Tools for Visual Studio.
Now for the Python code. First, a new Marriage class:
```python
from texttospeech import TextToSpeech
from speechtotext import SpeechToText

class Marriage:

    DAUGHTER_QUESTION = 'I want to get married'
    FATHER_ANSWER = 'no you cannot'
    DAUGHTER_RATE = 150
    FATHER_RATE = 50

    def __init__(self):
        self.text_to_speech = TextToSpeech()
        self.speech_to_text = SpeechToText()
        self.conversation_started = False

    def seek_permission(self):

        # check conversation has started
        if self.conversation_started == False:
            self.text_to_speech.start()
            self.speech_to_text.start()
            self.conversation_started = True
            self.text_to_speech.set_text_and_rate(self.DAUGHTER_QUESTION, self.DAUGHTER_RATE)
            return

        # obtain speech
        speech = self.speech_to_text.get_text()

        # reply to speech
        if speech == self.DAUGHTER_QUESTION:
            self.text_to_speech.set_text_and_rate(self.FATHER_ANSWER, self.FATHER_RATE)
        elif speech == self.FATHER_ANSWER:
            self.text_to_speech.set_text_and_rate(self.DAUGHTER_QUESTION, self.DAUGHTER_RATE)
```
Our Marriage class takes care of orchestrating the Text To Speech and Speech To Text between the two robots. I use my computer speakers to play Text To Speech. I use my computer microphone to detect Speech To Text.
Next up, the amended Text To Speech class:
```python
import pyttsx
from threading import Thread

class TextToSpeech:

    def __init__(self):
        self.pyttsx = pyttsx.init()
        self.text = None
        self.rate = None

    # create thread for converting text
    def start(self):
        Thread(target=self._convert_text, args=()).start()

    def _convert_text(self):
        while True:
            if self.text:
                print self.text
                self.pyttsx.setProperty('rate', self.rate)
                self.pyttsx.say(self.text)
                self.pyttsx.runAndWait()
                self.text = None
                self.rate = None

    # set the current text and rate
    def set_text_and_rate(self, text, rate):
        self.text = text
        self.rate = rate
```
You’ll notice that I am changing the rate of the speech depending on whether the daughter or father is talking (the father speaks more slowly). I was rather hoping to change the voice between female and male instead, but my Windows 7 PC only has one voice: Microsoft Anna.
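If your machine does have more than one installed voice, pyttsx can switch between them via `engine.getProperty('voices')` and `engine.setProperty('voice', ...)`. Here's a hedged sketch of a helper that picks a voice by scanning the installed voice names for a keyword — the keyword values are assumptions, since voice names vary between machines — and falls back to the first voice if nothing matches:

```python
def pick_voice(voices, keyword):
    # voices: the objects returned by engine.getProperty('voices')
    # in pyttsx, each with a .name and an .id
    for voice in voices:
        if keyword.lower() in voice.name.lower():
            return voice.id
    return voices[0].id  # fall back to the first installed voice

# hypothetical usage, assuming a second voice named 'David' exists:
#   engine = pyttsx.init()
#   engine.setProperty('voice', pick_voice(engine.getProperty('voices'), 'david'))
```

On my PC this would always fall back to Microsoft Anna, of course — hence the rate trick instead.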
Here’s the Speech To Text class:
```python
import speech_recognition as sr
from threading import Thread

class SpeechToText:

    def __init__(self):
        self.recognizer = sr.Recognizer()
        self.text = None

    # create thread for capturing text
    def start(self):
        Thread(target=self._capture_text, args=()).start()

    def _capture_text(self):
        while True:
            if self.text == None:
                with sr.Microphone() as source:
                    print "listening..."
                    audio = self.recognizer.listen(source)

                try:
                    self.text = self.recognizer.recognize_google(audio)
                    print self.text
                except sr.UnknownValueError:
                    print("Google Speech Recognition could not understand audio")
                except sr.RequestError:
                    print("Could not request results from Google Speech Recognition service")

    # get the current text
    def get_text(self):
        text = self.text
        self.text = None
        return text
```
Pretty much as before. We’re using Google Speech Recognition to work out what the robots are saying to each other.
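One wrinkle worth flagging: the Marriage class compares the recognized speech to the expected phrase with a plain `==`, but Google Speech Recognition is free to vary the casing and punctuation of what it returns. A small normalizing comparison would make the match more forgiving — this is a sketch of my own (the helper name and approach are mine, not part of the code above):

```python
import re

def matches(speech, expected):
    # Google Speech Recognition may vary casing and punctuation,
    # so normalise both sides before comparing
    normalise = lambda s: re.sub(r'[^a-z ]', '', s.lower()).strip()
    return speech is not None and normalise(speech) == normalise(expected)
```

Swapping `speech == self.DAUGHTER_QUESTION` for `matches(speech, self.DAUGHTER_QUESTION)` would keep the conversation going even if Google capitalizes the "I" or tacks on a comma.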
And here’s the main OpenGL program:
```python
from OpenGL.GL import *
from OpenGL.GLUT import *
from OpenGL.GLU import *
import cv2
from PIL import Image
import numpy as np

from webcam import Webcam
from glyphs import Glyphs
from marriage import Marriage
from objloader import *
from constants import *

class OpenGLRobot:

    # constants
    INVERSE_MATRIX = np.array([[ 1.0, 1.0, 1.0, 1.0],
                               [-1.0,-1.0,-1.0,-1.0],
                               [-1.0,-1.0,-1.0,-1.0],
                               [ 1.0, 1.0, 1.0, 1.0]])

    def __init__(self):
        # initialise webcam and start thread
        self.webcam = Webcam()
        self.webcam.start()

        # initialise glyphs
        self.glyphs = Glyphs()
        self.glyphs_cache = None

        # initialise marriage
        self.marriage = Marriage()

        # initialise robots
        self.robot_daughter = None
        self.robot_father = None

        # initialise texture
        self.texture_background = None

    def _init_gl(self, Width, Height):
        glClearColor(0.0, 0.0, 0.0, 0.0)
        glClearDepth(1.0)
        glDepthFunc(GL_LESS)
        glEnable(GL_DEPTH_TEST)
        glShadeModel(GL_SMOOTH)
        glMatrixMode(GL_PROJECTION)
        glLoadIdentity()
        gluPerspective(33.7, 1.3, 0.1, 100.0)
        glMatrixMode(GL_MODELVIEW)

        # assign robots
        self.robot_daughter = OBJ('robot_daughter.obj')
        self.robot_father = OBJ('robot_father.obj')

        # assign texture
        glEnable(GL_TEXTURE_2D)
        self.texture_background = glGenTextures(1)

    def _draw_scene(self):
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)
        glLoadIdentity()

        # get image from webcam
        image = self.webcam.get_current_frame()

        # convert image to OpenGL texture format
        bg_image = cv2.flip(image, 0)
        bg_image = Image.fromarray(bg_image)
        ix = bg_image.size[0]
        iy = bg_image.size[1]
        bg_image = bg_image.tostring("raw", "BGRX", 0, -1)

        # create background texture
        glBindTexture(GL_TEXTURE_2D, self.texture_background)
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST)
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST)
        glTexImage2D(GL_TEXTURE_2D, 0, 3, ix, iy, 0, GL_RGBA, GL_UNSIGNED_BYTE, bg_image)

        # draw background
        glBindTexture(GL_TEXTURE_2D, self.texture_background)
        glPushMatrix()
        glTranslatef(0.0,0.0,-10.0)
        self._draw_background()
        glPopMatrix()

        # handle glyphs
        glyphs = self._handle_glyphs(image)

        # handle marriage
        glyph_names = [glyph[GLYPH_NAME_INDEX] for glyph in glyphs]

        if ROBOT_DAUGHTER in glyph_names and ROBOT_FATHER in glyph_names:
            self.marriage.seek_permission()

        glutSwapBuffers()

    def _handle_glyphs(self, image):

        # attempt to detect glyphs
        glyphs = []

        try:
            glyphs = self.glyphs.detect(image)
        except Exception as ex:
            print(ex)

        # manage glyphs cache
        if glyphs:
            self.glyphs_cache = glyphs
        elif self.glyphs_cache:
            glyphs = self.glyphs_cache
            self.glyphs_cache = None
        else:
            return glyphs

        for glyph in glyphs:

            rvecs, tvecs, _, glyph_name = glyph

            # build view matrix
            rmtx = cv2.Rodrigues(rvecs)[0]

            view_matrix = np.array([[rmtx[0][0],rmtx[0][1],rmtx[0][2],tvecs[0]],
                                    [rmtx[1][0],rmtx[1][1],rmtx[1][2],tvecs[1]],
                                    [rmtx[2][0],rmtx[2][1],rmtx[2][2],tvecs[2]],
                                    [0.0       ,0.0       ,0.0       ,1.0    ]])

            view_matrix = view_matrix * self.INVERSE_MATRIX
            view_matrix = np.transpose(view_matrix)

            # load view matrix and draw robot
            glPushMatrix()
            glLoadMatrixd(view_matrix)

            if glyph_name == ROBOT_DAUGHTER:
                glCallList(self.robot_daughter.gl_list)
            elif glyph_name == ROBOT_FATHER:
                glCallList(self.robot_father.gl_list)

            glColor3f(1.0, 1.0, 1.0)
            glPopMatrix()

        return glyphs

    def _draw_background(self):
        # draw background
        glBegin(GL_QUADS)
        glTexCoord2f(0.0, 1.0); glVertex3f(-4.0, -3.0, 0.0)
        glTexCoord2f(1.0, 1.0); glVertex3f( 4.0, -3.0, 0.0)
        glTexCoord2f(1.0, 0.0); glVertex3f( 4.0,  3.0, 0.0)
        glTexCoord2f(0.0, 0.0); glVertex3f(-4.0,  3.0, 0.0)
        glEnd()

    def main(self):
        # setup and run OpenGL
        glutInit()
        glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE | GLUT_DEPTH)
        glutInitWindowSize(640, 480)
        glutInitWindowPosition(800, 400)
        self.window_id = glutCreateWindow("OpenGL Robot")
        glutDisplayFunc(self._draw_scene)
        glutIdleFunc(self._draw_scene)
        self._init_gl(640, 480)
        glutMainLoop()

# run an instance of OpenGL Robot
openGLRobot = OpenGLRobot()
openGLRobot.main()
```
The robot daughter is rendered to screen if her glyph has been detected. The robot father is rendered to screen if his glyph has been detected. However, the marriage chat will only occur if both robots are rendered to screen at the same time.
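That gate is a one-liner at the heart of `_draw_scene`, worth pulling out on its own. A sketch, with the glyph name values assumed (the real ones live in `constants.py`):

```python
ROBOT_DAUGHTER = 'ROBOT_DAUGHTER'  # assumed glyph name constants
ROBOT_FATHER = 'ROBOT_FATHER'

def both_robots_visible(glyph_names):
    # the marriage chat only runs when both glyphs are in frame
    return ROBOT_DAUGHTER in glyph_names and ROBOT_FATHER in glyph_names
```

Cover up either glyph and the conversation stops dead, mid-proposal.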