In my last post, Robot father using Augmented Reality, I built a robot daughter and father out of some Python code.

The robots used the following technologies for an augmented reality experience:

  • Blender to create a 3D robot
  • OpenCV computer vision to detect a 2D marker in my webcam
  • OpenGL graphics library to render the robot upon the marker

I also used Text To Speech and Speech To Text technologies to allow the robots to talk to each other.

Today I will recycle the robots. Today the robots will help me search the web!

Introducing the robots

Here’s Rocky Robot, who will help us search the web for rock music content:

rocky_robot

She’s wearing her favourite Nirvana T-shirt.

And here’s Sporty Robot, who specialises in sport content:

sporty_robot

He’s wearing a Hibernian football top.

As each robot appears on the screen, we can ask it a question. For example, I can ask Rocky Robot for “Pixies news” and she will tell me about the band’s new album.

Our web browser

How exactly does a robot search the web and tell us the latest news? Let’s take a look at the Browser class:

from threading import Thread
import requests
from bs4 import BeautifulSoup
from texttospeech import TextToSpeech
from speechtotext import SpeechToText
from searchdatabase import *
from constants import *

class Browser:

    MIN_LINE_LENGTH = 60
       
    def __init__(self):
        self.text_to_speech = TextToSpeech()
        self.speech_to_text = SpeechToText()
        self.category = None
        self.is_enabled = False
         
    # create thread for processing content
    def start(self):
        Thread(target=self._process, args=()).start()

    def _process(self):
        while True:
            if self.is_enabled:
                # browser asks question
                self.text_to_speech.convert('What do you want to load, buddy?')

                # user gives answer
                answer = self.speech_to_text.convert()
                if not answer: continue

                # get url from search engine
                url = search_engine(self.category, answer)
                if not url: continue

                # browser tells user that content is being retrieved
                self.text_to_speech.convert("Cool. I will get you stuff now...")

                # get web content
                request = requests.get(url)
                soup = BeautifulSoup(request.text, 'html.parser')

                # get text from web content
                for s in soup(['style', 'script', '[document]', 'head', 'title']):
                    s.extract()
                text = soup.getText()
                
                # speak each line of text        
                try:
                    for line in text.split('\n'):
                        if not self.is_enabled: break
                                
                        if len(line) >= self.MIN_LINE_LENGTH:
                            self.text_to_speech.convert(line)
                except Exception:
                    print("Browser: error converting text to speech")

                self.category = None
                self.is_enabled = False

    # load
    def load(self, category):
        self.category = category
        self.is_enabled = True

    # halt
    def halt(self):
        self.is_enabled = False

Before we get into the nuts and bolts of the browser, cast your eye on the load method. This is the method that our robot interacts with – for example, our Rocky Robot will pass it a category of “rock”. The method will enable the browser, so that we are able to process our robot’s load request.

We also have a halt method that the robot can use to disable the browser.

Now, all the shit happens in the _process method, as the kids would say.

Notice that the _process method runs in a thread, which is very important as we do not want to freeze our on-screen augmented reality experience while the robot searches the web.

At the heart of the method is a while loop, which runs continuously – but only does any real work once the browser has been enabled by a robot.

First up, the browser assumes the role of the robot, asking us “What do you want to load, buddy?”. As with my previous post, I am using Text To Speech to ask the question through the computer speakers (class at foot of post).

Next, we answer the question using Speech To Text through the computer microphone (again, class at foot of post).

Now that the browser knows what information we want, it will use our search engine to retrieve a suitable url. We pass the search engine our answer, which it will use as a search term. We also pass the engine a category, which is determined by the robot we are currently interacting with. For example, if Rocky Robot appears on the screen and we ask her for “Pixies news” then the category is “rock” and the term is “Pixies news”.

Armed with the url from our search engine, it is time to retrieve our web content. The requests library fetches the page and Beautiful Soup parses the HTML. Notice how we ditch unwanted sections of the HTML (style, script, etc.) so that we are left only with readable text. The Stack Overflow article BeautifulSoup Grab Visible Webpage Text provided the detail.

All that is left to do is loop through each line of our web content, using Text To Speech so that the robot can tell us all the online gossip on the Pixies. Each line of text has to be a certain minimum length for it to be spoken out through the computer speakers (after all, we are only interested in beefy bits of news).

Notice that we also check on every line of text whether the browser is still enabled. Why do we do this? Well, if the robot is no longer in front of our webcam then the browser’s halt method is called to stop the browser processing our request. If we do not check whether the browser has been disabled, then the web content continues to be blabbed out of the speakers long after the robot has gone!

That’s it. The very last thing our _process method does is clear our category and disable the browser, ready for the next robot to appear on the scene.

Our search engine

Dead simple search engine:

from constants import *

# search table
SEARCH_TABLE = [
    (ROCK, [
        ('pixies news', 'https://www.teamrock.com/news/2015-06-03/pixies-working-on-6th-record'),
        ('lush reunion', 'http://www.nme.com/news/lush/88713')
    ]),
    (SPORT, [
        ('football', 'http://www.dailymail.co.uk/sport/football/index.html'),
        ('andy murray ice cream', 'http://www.sbnation.com/2015/8/18/9174215/andy-murray-dresses-up-as-an-ice-cream-worker')
    ])
]

# match category and term to database record
def search_engine(category, term):
    url = ''

    for record in SEARCH_TABLE:
        if record[0] == category:
            for category_item in record[1]:
                if category_item[0] == term.lower():
                    url = category_item[1]
                    break

        if url: break

    return url

We have a search table, with a separate record for each category. There is a ‘rock’ record for Rocky Robot to use. And a ‘sport’ record for Sporty Robot to use.

Each record in turn has a list of search terms with associated urls.

Our search_engine database method simply matches the category and term we supply to the correct record. For example, if we are currently interacting with Rocky Robot then the category passed from our Browser class is “rock”. The words “Pixies news” that we speak into our computer microphone, in answer to Rocky Robot’s question “What do you want to load, buddy?”, are used as the search term. Notice that the term is lowercased, to ensure that a match can be made with the table.
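As a quick sanity check, the lookup can be exercised on its own. Here is a minimal sketch with the table trimmed to a single entry – it assumes the category constants in constants.py are plain strings, which the real code may or may not do:

```python
# Minimal, self-contained sketch of the search_engine lookup.
# Assumption: the category constants (e.g. ROCK) are plain strings.
ROCK = 'rock'

SEARCH_TABLE = [
    (ROCK, [('pixies news',
             'https://www.teamrock.com/news/2015-06-03/pixies-working-on-6th-record')]),
]

def search_engine(category, term):
    # match the category first, then the lowercased search term
    for record_category, items in SEARCH_TABLE:
        if record_category == category:
            for item_term, item_url in items:
                if item_term == term.lower():
                    return item_url
    return ''

print(search_engine(ROCK, 'Pixies News'))   # the teamrock.com url
print(search_engine(ROCK, 'metallica'))     # '' - no match in the table
```

Because the spoken answer is lowercased before the comparison, “Pixies News” and “pixies news” both hit the same record.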

Demonstration

At the foot of this post is the main program which makes use of our browser and search engine, so that we can search the web using augmented reality. But first, a demonstration:

Superb. We can see the text being processed in the top left command line as we interact with each robot in the bottom right window. Notice how at the end of the video, Sporty Robot is having a few problems showing himself to us – the browser is rightly disabled until he sorts himself out!

So what’s next? Well, the search engine certainly needs to index a ton of web content, tagged up to the category of each robot, along with advanced matching algorithms. And the robots need to offer specialised services, such as Rocky Robot letting us know what concert tickets are available and Sporty Robot telling us when a goal has been scored. It would also be useful if the robots displayed some animation and used different voices, so we could really get to know them!
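On the matching front, even the standard library gets us part of the way. This is only a sketch of one possible approach (not from the original code): difflib can tolerate the small misrecognitions that Speech To Text tends to produce, rather than demanding an exact string match.

```python
# Sketch only: fuzzy term matching with difflib, so a slightly
# garbled Speech To Text result can still hit a table entry.
import difflib

TERMS = ['pixies news', 'lush reunion', 'football']

def fuzzy_match(term, candidates, cutoff=0.6):
    # return the closest candidate above the similarity cutoff, else None
    matches = difflib.get_close_matches(term.lower(), candidates,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(fuzzy_match('Pixies newz', TERMS))  # → 'pixies news'
print(fuzzy_match('cricket', TERMS))      # → None
```

Dropping this in place of the exact comparison in search_engine would make the robots a little more forgiving of a noisy microphone.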

But that’s enough for now. I’m off to turn a few pages of The Inscrutable Diaries Of Rodger Saltwash. Poor Rodger, if only a robot could help solve all of his calamitous conundrums.

Ciao!

P.S.

I ran the code on my Windows 7 PC using Python Tools for Visual Studio.

Here’s the main program, updated from my previous post to accommodate Rocky Robot and Sporty Robot:

from OpenGL.GL import *
from OpenGL.GLUT import *
from OpenGL.GLU import *
import cv2
from PIL import Image
import numpy as np
from webcam import Webcam
from glyphs import Glyphs
from browser import Browser
from objloader import *
from constants import *

class OpenGLRobot:
 
    # constants
    INVERSE_MATRIX = np.array([[ 1.0, 1.0, 1.0, 1.0],
                               [-1.0,-1.0,-1.0,-1.0],
                               [-1.0,-1.0,-1.0,-1.0],
                               [ 1.0, 1.0, 1.0, 1.0]])

    def __init__(self):
        # initialise webcam and start thread
        self.webcam = Webcam()
        self.webcam.start()

        # initialise glyphs
        self.glyphs = Glyphs()
        self.glyphs_cache = None

        # initialise browser
        self.browser = Browser()
        self.browser.start()

        # initialise robots
        self.rocky_robot = None 
        self.sporty_robot = None 

        # initialise texture
        self.texture_background = None

    def _init_gl(self, Width, Height):
        glClearColor(0.0, 0.0, 0.0, 0.0)
        glClearDepth(1.0)
        glDepthFunc(GL_LESS)
        glEnable(GL_DEPTH_TEST)
        glShadeModel(GL_SMOOTH)
        glMatrixMode(GL_PROJECTION)
        glLoadIdentity()
        gluPerspective(33.7, 1.3, 0.1, 100.0)
        glMatrixMode(GL_MODELVIEW)
        
        # assign robots
        self.rocky_robot = OBJ('rocky_robot.obj')
        self.sporty_robot = OBJ('sporty_robot.obj')
        
        # assign texture
        glEnable(GL_TEXTURE_2D)
        self.texture_background = glGenTextures(1)

    def _draw_scene(self):
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)
        glLoadIdentity()

        # get image from webcam
        image = self.webcam.get_current_frame()

        # convert image to OpenGL texture format
        bg_image = cv2.flip(image, 0)
        bg_image = Image.fromarray(bg_image)     
        ix = bg_image.size[0]
        iy = bg_image.size[1]
        bg_image = bg_image.tostring("raw", "BGRX", 0, -1)
 
        # create background texture
        glBindTexture(GL_TEXTURE_2D, self.texture_background)
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST)
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST)
        glTexImage2D(GL_TEXTURE_2D, 0, 3, ix, iy, 0, GL_RGBA, GL_UNSIGNED_BYTE, bg_image)
        
        # draw background
        glBindTexture(GL_TEXTURE_2D, self.texture_background)
        glPushMatrix()
        glTranslatef(0.0,0.0,-10.0)
        self._draw_background()
        glPopMatrix()

        # handle glyphs
        glyphs = self._handle_glyphs(image)
       
        # handle browser
        glyph_names = [glyph[GLYPH_NAME_INDEX] for glyph in glyphs]

        if ROCKY_ROBOT in glyph_names:
            self.browser.load(ROCK)
        elif SPORTY_ROBOT in glyph_names:
            self.browser.load(SPORT)
        else:
            self.browser.halt()

        glutSwapBuffers()

    def _handle_glyphs(self, image):

        # attempt to detect glyphs
        glyphs = []

        try:
            glyphs = self.glyphs.detect(image)
        except Exception as ex: 
            print(ex)

        # manage glyphs cache
        if glyphs:
            self.glyphs_cache = glyphs
        elif self.glyphs_cache: 
            glyphs = self.glyphs_cache
            self.glyphs_cache = None
        else:
            return glyphs

        for glyph in glyphs:
            
            rvecs, tvecs, _, glyph_name = glyph

            # build view matrix
            rmtx = cv2.Rodrigues(rvecs)[0]

            view_matrix = np.array([[rmtx[0][0],rmtx[0][1],rmtx[0][2],tvecs[0]],
                                    [rmtx[1][0],rmtx[1][1],rmtx[1][2],tvecs[1]],
                                    [rmtx[2][0],rmtx[2][1],rmtx[2][2],tvecs[2]],
                                    [0.0       ,0.0       ,0.0       ,1.0    ]])

            view_matrix = view_matrix * self.INVERSE_MATRIX

            view_matrix = np.transpose(view_matrix)

            # load view matrix and draw cube
            glPushMatrix()
            glLoadMatrixd(view_matrix)

            if glyph_name == ROCKY_ROBOT:
                glCallList(self.rocky_robot.gl_list)
            elif glyph_name == SPORTY_ROBOT:
                glCallList(self.sporty_robot.gl_list)
            
            glColor3f(1.0, 1.0, 1.0)
            glPopMatrix()

        return glyphs

    def _draw_background(self):
        # draw background
        glBegin(GL_QUADS)
        glTexCoord2f(0.0, 1.0); glVertex3f(-4.0, -3.0, 0.0)
        glTexCoord2f(1.0, 1.0); glVertex3f( 4.0, -3.0, 0.0)
        glTexCoord2f(1.0, 0.0); glVertex3f( 4.0,  3.0, 0.0)
        glTexCoord2f(0.0, 0.0); glVertex3f(-4.0,  3.0, 0.0)
        glEnd( )

    def main(self):
        # setup and run OpenGL
        glutInit()
        glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE | GLUT_DEPTH)
        glutInitWindowSize(640, 480)
        glutInitWindowPosition(800, 400)
        self.window_id = glutCreateWindow("OpenGL Robot")
        glutDisplayFunc(self._draw_scene)
        glutIdleFunc(self._draw_scene)
        self._init_gl(640, 480)
        glutMainLoop()
 
# run an instance of OpenGL Robot 
openGLRobot = OpenGLRobot()
openGLRobot.main()

The Text To Speech class:

import pyttsx

class TextToSpeech:

    def __init__(self):
        self.pyttsx = pyttsx.init()
 
    # convert text to speech
    def convert(self, text):
        print(text)

        self.pyttsx.say(text)
        self.pyttsx.runAndWait()

And the Speech To Text class:

import speech_recognition as sr

class SpeechToText:

    def __init__(self):
        self.recognizer = sr.Recognizer()
 
    # convert speech to text
    def convert(self):

        with sr.Microphone() as source:
            print("listening...")
            audio = self.recognizer.listen(source)

        text = None

        try:
            text = self.recognizer.recognize_google(audio)
            print(text)
        except sr.UnknownValueError:
            print("Google Speech Recognition could not understand audio")
        except sr.RequestError:
            print("Could not request results from Google Speech Recognition service")

        return text