3D Graphics, Augmented Reality, Beautiful Soup, Blender, Computer Vision, Object Detection, OpenCV, OpenGL, Pose Estimation, PyOpenGL, Python, Python Tools for Visual Studio, Speech To Text, Text To Speech
In my last post, Robot father using Augmented Reality, I built a robot daughter and father out of some Python code.
The robots used the following technologies for an augmented reality experience:
- Blender to create a 3D robot
- OpenCV computer vision to detect a 2D marker in my webcam
- OpenGL graphics library to render the robot upon the marker
I also used Text To Speech and Speech To Text technologies to allow the robots to talk to each other.
Today I will recycle the robots. Today the robots will help me search the web!
Introducing the robots
Here’s Rocky Robot, who will help us search the web for rock music content:
She’s wearing her favourite Nirvana T-shirt.
And here’s Sporty Robot, who specialises in sport content:
He’s wearing a Hibernian football top.
As each robot appears on the screen, we can ask it a question. For example, I can ask Rocky Robot for “Pixies news” and she will tell me about the band’s new album.
Our web browser
How exactly does a robot search the web and tell us the latest news? Let’s take a look at the Browser class:
```python
from threading import Thread

import requests
from bs4 import BeautifulSoup

from texttospeech import TextToSpeech
from speechtotext import SpeechToText
from searchdatabase import *
from constants import *

class Browser:
    MIN_LINE_LENGTH = 60

    def __init__(self):
        self.text_to_speech = TextToSpeech()
        self.speech_to_text = SpeechToText()
        self.category = None
        self.is_enabled = False

    # create thread for processing content
    def start(self):
        Thread(target=self._process, args=()).start()

    def _process(self):
        while True:
            if self.is_enabled:
                # browser asks question
                self.text_to_speech.convert('What do you want to load, buddy?')

                # user gives answer
                answer = self.speech_to_text.convert()
                if not answer: continue

                # get url from search engine
                url = search_engine(self.category, answer)
                if not url: continue

                # browser tells user that content is being retrieved
                self.text_to_speech.convert("Cool. I will get you stuff now...")

                # get web content
                request = requests.get(url)
                soup = BeautifulSoup(request.text)

                # get text from web content
                [s.extract() for s in soup(['style', 'script', '[document]', 'head', 'title'])]
                text = soup.getText()

                # speak each line of text
                try:
                    for line in text.split('\n'):
                        if not self.is_enabled: break
                        if len(line) >= self.MIN_LINE_LENGTH:
                            self.text_to_speech.convert(line)
                except:
                    print "Browser: error converting text to speech"

                self.category = None
                self.is_enabled = False

    # load
    def load(self, category):
        self.category = category
        self.is_enabled = True

    # halt
    def halt(self):
        self.is_enabled = False
```
Before we get into the nuts and bolts of the browser, cast your eye on the load method. This is the method that our robot interacts with – for example, our Rocky Robot will pass it a category of “rock”. The method will enable the browser, so that we are able to process our robot’s load request.
We also have a halt method that the robot can use to disable the browser.
Now, all the shit happens in the _process method, as the kids would say.
Notice that the _process method runs in a thread, which is very important as we do not want to freeze our on-screen augmented reality experience while the robot searches the web.
At the heart of the method is a while loop, so the code runs continuously – but it only does any work once the browser has been enabled by a robot.
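The enable-flag pattern can be boiled down to a minimal, speech-free sketch – the Worker class and its processed list below are my own illustrative names, not part of the project:

```python
import time
from threading import Thread

class Worker:
    """Minimal sketch of a threaded worker gated by an enable flag."""

    def __init__(self):
        self.category = None
        self.is_enabled = False
        self.processed = []  # record of handled requests, for demonstration

    def start(self):
        # daemon thread, so it never blocks the main program from exiting
        Thread(target=self._process, daemon=True).start()

    def _process(self):
        while True:
            if self.is_enabled:
                # stand-in for the real work (ask, listen, fetch, speak)
                self.processed.append(self.category)
                self.category = None
                self.is_enabled = False
            time.sleep(0.01)

    def load(self, category):
        self.category = category
        self.is_enabled = True

    def halt(self):
        self.is_enabled = False

worker = Worker()
worker.start()
worker.load('rock')
time.sleep(0.1)  # give the thread a moment to pick up the request
print(worker.processed)
```

The main loop (here, the AR rendering) stays responsive because load only flips a flag; the heavy lifting happens on the worker thread.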
First up, the browser assumes the role of the robot, asking us “What do you want to load, buddy?”. As with my previous post, I am using Text To Speech to ask the question through the computer speakers (class at foot of post).
Next, we answer the question using Speech To Text through the computer microphone (again, class at foot of post).
Now that the browser knows what information we want, it will use our search engine to retrieve a suitable url. We pass the search engine our answer, which it will use as a search term. We also pass the engine a category, which is determined by the robot we are currently interacting with. For example, if Rocky Robot appears on the screen and we ask her for “Pixies news” then the category is “rock” and the term is “Pixies news”.
Armed with the url from our search engine, it is time to retrieve our web content. The requests library fetches the page, and Beautiful Soup parses the HTML. Notice how we ditch unwanted sections of the HTML (style, script, etc.) so that we are left only with readable text. The Stack Overflow question BeautifulSoup Grab Visible Webpage Text provided the detail.
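For illustration only, here is a rough sketch of the same visible-text idea using nothing but the standard library's html.parser – the project itself uses Beautiful Soup, and the VisibleTextParser class below is entirely my own:

```python
from html.parser import HTMLParser

class VisibleTextParser(HTMLParser):
    """Rough sketch: skip style/script/head/title, keep visible text."""

    SKIP_TAGS = ('style', 'script', 'head', 'title')

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a tag we want to ditch
        self.lines = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # only keep text that is outside the skipped tags and non-blank
        if self._skip_depth == 0 and data.strip():
            self.lines.append(data.strip())

html = """<html><head><title>Ignored</title><style>p {color: red}</style></head>
<body><p>Pixies are working on a new record.</p><script>var x = 1;</script></body></html>"""

parser = VisibleTextParser()
parser.feed(html)
print(parser.lines)
```

Beautiful Soup handles real-world, badly-formed HTML far more robustly, which is why the project uses it – this sketch just shows the shape of the idea.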
All that is left to do is loop through each line of our web content, using Text To Speech so that the robot can tell us all the online gossip on the Pixies. Each line of text has to be a certain minimum length for it to be spoken out through the computer speakers (after all, we are only interested in beefy bits of news).
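The filter itself is a one-line list comprehension. A sketch with some made-up page text:

```python
MIN_LINE_LENGTH = 60

# hypothetical page text: two beefy news lines, two stray menu items
text = """Pixies
The band are currently working on the follow-up to Indie Cindy, their sixth studio album.
Menu
Tickets for the forthcoming tour go on sale this Friday at nine o'clock sharp."""

# keep only lines long enough to be worth speaking aloud
beefy_lines = [line for line in text.split('\n') if len(line) >= MIN_LINE_LENGTH]

for line in beefy_lines:
    print(line)
```

Short navigation cruft like "Pixies" and "Menu" falls below the threshold and is never spoken.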
Notice that we also check on every line of text whether the browser is still enabled. Why do we do this? Well, if the robot is no longer in front of our webcam then the browser’s halt method is called to stop the browser processing our request. If we do not check whether the browser has been disabled, then the web content continues to be blabbed out of the speakers long after the robot has gone!
That’s it. The very last thing our _process method does is clear our category and disable the browser, ready for the next robot to appear on the scene.
Our search engine
Dead simple search engine:
```python
from constants import *

# search table
SEARCH_TABLE = [
    (ROCK, [('pixies news', 'https://www.teamrock.com/news/2015-06-03/pixies-working-on-6th-record'),
            ('lush reunion', 'http://www.nme.com/news/lush/88713')]),
    (SPORT, [('football', 'http://www.dailymail.co.uk/sport/football/index.html'),
             ('andy murray ice cream', 'http://www.sbnation.com/2015/8/18/9174215/andy-murray-dresses-up-as-an-ice-cream-worker')])]

# match category and term to database record
def search_engine(category, term):
    url = ''

    for record in SEARCH_TABLE:
        if record[0] == category:
            for category_item in record[1]:
                if category_item[0] == term.lower():
                    url = category_item[1]
                    break
        if url: break

    return url
```
We have a search table, with a separate record for each category. There is a ‘rock’ record for Rocky Robot to use. And a ‘sport’ record for Sporty Robot to use.
Each record in turn has a list of search terms with associated urls.
Our search_engine database method simply matches the category and term we supply to the correct record. For example, if we are currently interacting with Rocky Robot then the Browser class passes it the category “rock”. The words “Pixies news” that we speak into our computer microphone, in answer to Rocky Robot’s question “What do you want to load, buddy?”, become the search term (notice that the term is lowercased, to ensure that a match can be made with the database).
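As an aside, the same lookup could be expressed with nested dictionaries, which avoids the index juggling – a sketch with illustrative entries:

```python
# categories map to dictionaries of lowercase term -> url (illustrative entries)
SEARCH_TABLE = {
    'rock': {
        'pixies news': 'https://www.teamrock.com/news/2015-06-03/pixies-working-on-6th-record',
    },
    'sport': {
        'football': 'http://www.dailymail.co.uk/sport/football/index.html',
    },
}

def search_engine(category, term):
    # lowercase the spoken term so 'Pixies news' matches 'pixies news'
    return SEARCH_TABLE.get(category, {}).get(term.lower(), '')

print(search_engine('rock', 'Pixies news'))
```

An unknown category or term falls through to an empty string, just as in the list-based version.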
At the foot of this post is the main program which makes use of our browser and search engine, so that we can search the web using augmented reality. But first, a demonstration:
Superb. We can see the text being processed in the top left command line as we interact with each robot in the bottom right window. Notice how at the end of the video, Sporty Robot is having a few problems showing himself to us – the browser is rightly disabled until he sorts himself out!
So what’s next? Well, the search engine certainly needs to index a ton of web content, tagged up to the category of each robot, along with advanced matching algorithms. And the robots need to offer specialised services, such as Rocky Robot letting us know what concert tickets are available and Sporty Robot telling us when a goal has been scored. It would also be useful if the robots displayed some animation and used different voices, so we could really get to know them!
But that’s enough for now. I’m off to turn a few pages of The Inscrutable Diaries Of Rodger Saltwash. Poor Rodger, if only a robot could help solve all of his calamitous conundrums.
I ran the code on my Windows 7 PC using Python Tools for Visual Studio.
Here’s the main program, updated from my previous post to accommodate Rocky Robot and Sporty Robot:
```python
from OpenGL.GL import *
from OpenGL.GLUT import *
from OpenGL.GLU import *
import cv2
from PIL import Image
import numpy as np
from webcam import Webcam
from glyphs import Glyphs
from browser import Browser
from objloader import *
from constants import *

class OpenGLRobot:

    # constants
    INVERSE_MATRIX = np.array([[ 1.0, 1.0, 1.0, 1.0],
                               [-1.0,-1.0,-1.0,-1.0],
                               [-1.0,-1.0,-1.0,-1.0],
                               [ 1.0, 1.0, 1.0, 1.0]])

    def __init__(self):
        # initialise webcam and start thread
        self.webcam = Webcam()
        self.webcam.start()

        # initialise glyphs
        self.glyphs = Glyphs()
        self.glyphs_cache = None

        # initialise browser
        self.browser = Browser()
        self.browser.start()

        # initialise robots
        self.rocky_robot = None
        self.sporty_robot = None

        # initialise texture
        self.texture_background = None

    def _init_gl(self, Width, Height):
        glClearColor(0.0, 0.0, 0.0, 0.0)
        glClearDepth(1.0)
        glDepthFunc(GL_LESS)
        glEnable(GL_DEPTH_TEST)
        glShadeModel(GL_SMOOTH)
        glMatrixMode(GL_PROJECTION)
        glLoadIdentity()
        gluPerspective(33.7, 1.3, 0.1, 100.0)
        glMatrixMode(GL_MODELVIEW)

        # assign robots
        self.rocky_robot = OBJ('rocky_robot.obj')
        self.sporty_robot = OBJ('sporty_robot.obj')

        # assign texture
        glEnable(GL_TEXTURE_2D)
        self.texture_background = glGenTextures(1)

    def _draw_scene(self):
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)
        glLoadIdentity()

        # get image from webcam
        image = self.webcam.get_current_frame()

        # convert image to OpenGL texture format
        bg_image = cv2.flip(image, 0)
        bg_image = Image.fromarray(bg_image)
        ix = bg_image.size[0]
        iy = bg_image.size[1]
        bg_image = bg_image.tostring("raw", "BGRX", 0, -1)

        # create background texture
        glBindTexture(GL_TEXTURE_2D, self.texture_background)
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST)
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST)
        glTexImage2D(GL_TEXTURE_2D, 0, 3, ix, iy, 0, GL_RGBA, GL_UNSIGNED_BYTE, bg_image)

        # draw background
        glBindTexture(GL_TEXTURE_2D, self.texture_background)
        glPushMatrix()
        glTranslatef(0.0,0.0,-10.0)
        self._draw_background()
        glPopMatrix()

        # handle glyphs
        glyphs = self._handle_glyphs(image)

        # handle browser
        glyph_names = [glyph[GLYPH_NAME_INDEX] for glyph in glyphs]

        if ROCKY_ROBOT in glyph_names:
            self.browser.load(ROCK)
        elif SPORTY_ROBOT in glyph_names:
            self.browser.load(SPORT)
        else:
            self.browser.halt()

        glutSwapBuffers()

    def _handle_glyphs(self, image):

        # attempt to detect glyphs
        glyphs = []

        try:
            glyphs = self.glyphs.detect(image)
        except Exception as ex:
            print(ex)

        # manage glyphs cache
        if glyphs:
            self.glyphs_cache = glyphs
        elif self.glyphs_cache:
            glyphs = self.glyphs_cache
            self.glyphs_cache = None
        else:
            return glyphs

        for glyph in glyphs:

            rvecs, tvecs, _, glyph_name = glyph

            # build view matrix
            rmtx = cv2.Rodrigues(rvecs)[0]

            view_matrix = np.array([[rmtx[0][0],rmtx[0][1],rmtx[0][2],tvecs[0]],
                                    [rmtx[1][0],rmtx[1][1],rmtx[1][2],tvecs[1]],
                                    [rmtx[2][0],rmtx[2][1],rmtx[2][2],tvecs[2]],
                                    [0.0       ,0.0       ,0.0       ,1.0    ]])

            view_matrix = view_matrix * self.INVERSE_MATRIX
            view_matrix = np.transpose(view_matrix)

            # load view matrix and draw robot
            glPushMatrix()
            glLoadMatrixd(view_matrix)

            if glyph_name == ROCKY_ROBOT:
                glCallList(self.rocky_robot.gl_list)
            elif glyph_name == SPORTY_ROBOT:
                glCallList(self.sporty_robot.gl_list)

            glColor3f(1.0, 1.0, 1.0)
            glPopMatrix()

        return glyphs

    def _draw_background(self):
        # draw background
        glBegin(GL_QUADS)
        glTexCoord2f(0.0, 1.0); glVertex3f(-4.0, -3.0, 0.0)
        glTexCoord2f(1.0, 1.0); glVertex3f( 4.0, -3.0, 0.0)
        glTexCoord2f(1.0, 0.0); glVertex3f( 4.0,  3.0, 0.0)
        glTexCoord2f(0.0, 0.0); glVertex3f(-4.0,  3.0, 0.0)
        glEnd()

    def main(self):
        # setup and run OpenGL
        glutInit()
        glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE | GLUT_DEPTH)
        glutInitWindowSize(640, 480)
        glutInitWindowPosition(800, 400)
        self.window_id = glutCreateWindow("OpenGL Robot")
        glutDisplayFunc(self._draw_scene)
        glutIdleFunc(self._draw_scene)
        self._init_gl(640, 480)
        glutMainLoop()

# run an instance of OpenGL Robot
openGLRobot = OpenGLRobot()
openGLRobot.main()
```
The Text To Speech class:
```python
import pyttsx

class TextToSpeech:

    def __init__(self):
        self.pyttsx = pyttsx.init()

    # convert text to speech
    def convert(self, text):
        print text
        self.pyttsx.say(text)
        self.pyttsx.runAndWait()
```
And the Speech To Text class:
```python
import speech_recognition as sr

class SpeechToText:

    def __init__(self):
        self.recognizer = sr.Recognizer()

    # convert speech to text
    def convert(self):
        with sr.Microphone() as source:
            print "listening..."
            audio = self.recognizer.listen(source)

        text = None

        try:
            text = self.recognizer.recognize_google(audio)
            print text
        except sr.UnknownValueError:
            print("Google Speech Recognition could not understand audio")
        except sr.RequestError:
            print("Could not request results from Google Speech Recognition service")

        return text
```