
I’ll make this post brief, as Arkwood has just burnt the lasagne. My Belgian buddy exercises Google Street View for some shady stalking, but hates using a mouse. ‘My hands are usually busy elsewhere,’ he told me. Hm. Anyhow, I wrote some Python code so that he could use his voice instead of a mouse.

First up, we utilise Google’s speech to text service on the Raspberry Pi, so that we can tell our program where we want to go:

from constants import FORWARD, BACKWARD, LEFT, RIGHT, STOP
from speech import Speech
from storage import Storage
from time import sleep

speech = Speech()
storage = Storage()

#loop forever
while True:

    # request direction from Arkwood
    print("Please state your direction: {}, {}, {}, {}, {}".format(FORWARD, BACKWARD, LEFT, RIGHT, STOP))

    # get direction from Arkwood, through microphone
    direction = speech.speech_to_text('/home/pi/PiAUISuite/VoiceCommand/speech-recog.sh')

    # ensure direction is valid
    if direction in (FORWARD, BACKWARD, LEFT, RIGHT, STOP):

        # store direction
        storage.write_value("streetview.conf", direction)


We have a simple constants file to hold our directions:

FORWARD = "forward"
BACKWARD = "backward"
LEFT = "left"
RIGHT = "right"
STOP = "stop"
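
One thing I have not checked is whether the speech service always hands back lower-case text. If it ever returned 'Forward' with a capital letter, the strict membership test in the main loop would quietly drop it. Here is a minimal sketch of a more forgiving check, assuming that kind of variation can occur; parse_direction and DIRECTIONS are hypothetical names of my own, not part of the code above:

```python
FORWARD = "forward"
BACKWARD = "backward"
LEFT = "left"
RIGHT = "right"
STOP = "stop"

# hypothetical tuple gathering the constants for membership tests
DIRECTIONS = (FORWARD, BACKWARD, LEFT, RIGHT, STOP)

def parse_direction(text):
    """Return the matching direction constant, or None if text is not a direction."""
    # ignore surrounding whitespace and letter case
    cleaned = text.strip().lower()
    return cleaned if cleaned in DIRECTIONS else None
```

The main loop could then store the value only when parse_direction returns something other than None.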

To convert Arkwood’s voice (from the microphone attached to the Pi) into text, we employ the Speech class that I have talked about at length in my previous posts:

from subprocess import Popen, PIPE, call
import urllib  # Python 2: quote_plus lives directly in urllib

class Speech(object):

    # converts speech to text
    def speech_to_text(self, filepath):
        try:
            # utilise PiAUISuite to turn speech into text
            text = Popen(['sudo', filepath], stdout=PIPE).communicate()[0]

            # tidy up text
            text = text.replace('"', '').strip()

            # debug
            print(text)

            return text
        except Exception:
            print("Error translating speech")

    # converts text to speech
    def text_to_speech(self, text):
        try:
            # truncate text as google only allows 100 chars
            text = text[:100]

            # encode the text
            query = urllib.quote_plus(text)

            # build endpoint
            endpoint = "http://translate.google.com/translate_tts?tl=en&q=" + query

            # debug
            print(endpoint)

            # get google to translate and mplayer to play
            call(["mplayer", endpoint], shell=False, stdout=PIPE, stderr=PIPE)
        except Exception:
            print("Error translating text")

We only need to use the speech_to_text function from this class. And once we have the direction in text format, we store it in a file using a write_value function in my Storage class:

class Storage(object):

    # write to file
    def write_value(self, filename, value):
        try:
            # append, so the file keeps a history of directions
            with open(filename, "a") as value_file:
                value_file.write(value.strip() + '\n')
        except Exception:
            print("Error writing value")

    # read last value from file
    def read_last_value(self, filename):
        try:
            with open(filename) as value_file:
                lines = value_file.readlines()
                return lines[-1].strip()
        except Exception:
            print("Error reading last value")
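
To show how the two functions play together, here is a small round trip. It is only a sketch: the Storage class is repeated without its error handling so the snippet runs on its own, and a temporary folder (my assumption, purely for illustration) stands in for the folder on the Pi:

```python
import os
import tempfile

class Storage(object):
    # append a value to the file, one value per line
    def write_value(self, filename, value):
        with open(filename, "a") as value_file:
            value_file.write(value.strip() + "\n")

    # return the most recently appended value
    def read_last_value(self, filename):
        with open(filename) as value_file:
            lines = value_file.readlines()
            return lines[-1].strip()

storage = Storage()

# a throwaway location standing in for the Pi's streetview.conf
conf_path = os.path.join(tempfile.mkdtemp(), "streetview.conf")

storage.write_value(conf_path, "forward")
storage.write_value(conf_path, "right")

# the file now holds both directions; only the latest is returned
last_value = storage.read_last_value(conf_path)
print(last_value)
```

Because write_value appends rather than overwrites, the file doubles as a log of every direction Arkwood has barked at the Pi.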

Champion! So I guess you’re wondering what happens next? Well, I had planned to control Google Street View on the Raspberry Pi, but, as you may know, the Pi is as slow as a penguin. So that ain’t gonna fly. Instead, since we now have the directions stored in a file, we will use our Windows PC to pick up the latest value over the network and then render a browser window with the updated co-ordinates:

from storage import Storage
from constants import FORWARD, BACKWARD, LEFT, RIGHT, STOP
import webbrowser
from time import sleep

storage = Storage()

# set up co-ordinates
latitude = -27.286900
rotation = 180.12

#loop forever
while True:

    # WARNING! make sure you sleep or your computer will open
    # an insane amount of browsers and crash (my God, I know)
    sleep(5)  # assumed interval in seconds; tune to taste

    # retrieve last direction
    direction = storage.read_last_value(r"\\RASPBERRYPI\MyPython\Streetview\streetview.conf")

    # if direction is STOP then avoid launching browser
    if direction == STOP:
        continue

    # update co-ordinates with direction
    if direction == FORWARD:
        latitude -= 0.000100
    elif direction == BACKWARD:
        latitude += 0.000100
    elif direction == LEFT:
        rotation -= 90
    elif direction == RIGHT:
        rotation += 90

    # update url with new co-ordinates
    url = "https://maps.google.com/?layer=c&cbll={},152.974600&cbp=12,{},0,0,0".format(latitude, rotation)

    # open web browser with url
    webbrowser.open(url)

Lovely. For this simple demonstration, I am targeting a street called Coronation Street in Brisbane, Australia. Totally random. Once we have read the latest direction from our file on the Raspberry Pi, we can update the co-ordinates and launch a browser window to move forward, backward or turn left or right by 90 degrees. Way cool. Ideally I would not be launching browser windows but simply updating the current browser tab, but as I understand it Python cannot manage this browser behaviour. Anyhow, it keeps Arkwood’s hands free, which is the main thing.
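
The co-ordinate juggling can also be pulled out into a couple of small functions, which makes it easier to try out without launching a single browser. This is a minimal sketch rather than the code I actually ran: apply_direction and build_url are hypothetical names, and wrapping the rotation into the 0 to 360 range is my own assumption, since I have not checked how Street View treats values outside it:

```python
FORWARD, BACKWARD, LEFT, RIGHT, STOP = "forward", "backward", "left", "right", "stop"

LONGITUDE = "152.974600"  # fixed, as in the url above
STEP = 0.000100           # change in latitude per move
TURN = 90                 # degrees per turn

def apply_direction(latitude, rotation, direction):
    """Return the (latitude, rotation) pair after applying one direction."""
    if direction == FORWARD:
        latitude -= STEP
    elif direction == BACKWARD:
        latitude += STEP
    elif direction == LEFT:
        # assumption: keep the rotation within 0..360
        rotation = (rotation - TURN) % 360
    elif direction == RIGHT:
        rotation = (rotation + TURN) % 360
    return latitude, rotation

def build_url(latitude, rotation):
    """Build the Street View url for the given latitude and rotation."""
    return "https://maps.google.com/?layer=c&cbll={},{}&cbp=12,{},0,0,0".format(
        latitude, LONGITUDE, rotation)
```

Any other direction, 'stop' included, leaves the co-ordinates untouched, so the caller still decides whether to open a browser.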

Time for a demo. Here are the voice commands issued from the Raspberry Pi:

…and the Windows PC launching a browser window for the first voice command ‘forward’:

And since the second voice command was not made, ‘forward’ is re-read from file:

Now we say ‘right’, and the browser obliges:

When I say ‘stop’, no more browser windows are launched. Still need to get the Pi and the PC in sync a bit better, as I need to account for the time it takes the Pi to convert speech into text. But you get the general gist.
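
One way to tighten up the sync might be for the PC to act only on lines it has not seen before, so that 'forward' is not replayed while the Pi is still chewing on the next command. Here is a rough sketch, assuming the file is only ever appended to (which is how write_value behaves); read_new_values is a hypothetical helper, not part of my Storage class, and a temporary folder stands in for the share on the Pi:

```python
import os
import tempfile

def read_new_values(filename, lines_seen):
    """Return (new_values, total_lines) for lines appended since the last poll."""
    with open(filename) as value_file:
        # skip blank lines so a stray newline is not treated as a command
        lines = [line.strip() for line in value_file if line.strip()]
    return lines[lines_seen:], len(lines)

# a throwaway file standing in for streetview.conf on the Pi
conf_path = os.path.join(tempfile.mkdtemp(), "streetview.conf")

with open(conf_path, "a") as conf_file:
    conf_file.write("forward\n")

# first poll picks up 'forward'
first_poll, seen = read_new_values(conf_path, 0)

# nothing new has been said, so the second poll comes back empty
second_poll, seen = read_new_values(conf_path, seen)

with open(conf_path, "a") as conf_file:
    conf_file.write("right\n")

# third poll picks up only the new 'right'
third_poll, seen = read_new_values(conf_path, seen)
```

The PC loop would then open a browser only when a poll returns something, rather than every time it wakes up.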

Now, I’d better see what I can salvage from that burnt lasagne.


p.s. useful post on Stack Overflow regarding Google Street View