Monita, my Hispanic maid, was not altogether impressed with my earlier post, Wikipedia people on Raspberry Pi. I had used Python code to display images of all the linked people on her favourite wrestler’s Wikipedia page (WWE superstar Randy Orton). ‘But all the pictures are of old men, sweetie,’ she complained. Not one to take criticism lying down, I set about revising the software.

‘Okay,’ I announced, ‘how about this: you give the program your favourite pop star, and it follows the first link on their Wikipedia page to that of another person.’

‘Mm,’ she replied, suitably intrigued. ‘Sounds like fun, honey.’

So, let’s crank up Python Tools for Visual Studio and write the code:

from speech import Speech
from wikipedia import Wikipedia
from screen import Screen

speech = Speech()
wikipedia = Wikipedia()
screen = Screen()

# get person's name from microphone
print("State the pop star's full name")
full_name = speech.speech_to_text('/home/pi/PiAUISuite/VoiceCommand/speech-recog.sh')

while True:

    # get linked person
    link = wikipedia.get_person_link(full_name)
    
    # exit loop if nothing found
    if (link is None):
        break

    # display image of linked person
    screen.display_images(link.text, 1)
    
    # announce linked person through speakers
    speech.text_to_speech(link.text)

    # update name with linked person
    full_name = link.text

First, we give the program the name of our rock god, using Google’s Speech To Text service (through PiAUISuite’s speech-recog.sh script) to turn our voice from the microphone into text.

Next, we look for a link on the star’s Wikipedia page that points to another person. If one is found, we display an image of that person in a browser (taken from a Google Images search for their name), then use Google’s Text To Speech service to announce their name through a set of speakers.

Once this is done, we simply make the linked person our new star and start the process over, stopping only when no further link can be found.
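The loop above boils down to "keep following links until none is left, never revisiting a name". The sketch below isolates that logic so it can run without a microphone or network. It is an illustration only: `follow_chain` and `next_from_graph` are hypothetical names, and the made-up `GRAPH` dictionary stands in for Wikipedia (its names are borrowed from the demo at the end of this post, but the links between them are invented).

```python
def follow_chain(start, next_person):
    """Follow person-to-person links from `start` until none is found.

    `next_person` is a hypothetical stand-in for Wikipedia.get_person_link:
    it takes the current name and the set of names already visited, and
    returns the next name or None when the chain ends.
    """
    chain = [start]
    visited = {start}
    name = start
    while True:
        linked = next_person(name, visited)
        if linked is None:  # dead end: stop, just like the main loop
            break
        chain.append(linked)
        visited.add(linked)  # remembered so we never bounce back
        name = linked
    return chain


# Made-up link graph: each name maps to the links on its "page", in order.
GRAPH = {
    "Kurt Cobain": ["Michael Jackson"],
    "Michael Jackson": ["Kurt Cobain", "Austin Brown"],
    "Austin Brown": ["Michael Jackson", "Joe Jackson"],
    "Joe Jackson": ["Michael Jackson", "Austin Brown"],
}


def next_from_graph(name, visited):
    # Return the first linked name that has not been visited yet.
    for candidate in GRAPH.get(name, []):
        if candidate not in visited:
            return candidate
    return None


print(follow_chain("Kurt Cobain", next_from_graph))
# ['Kurt Cobain', 'Michael Jackson', 'Austin Brown', 'Joe Jackson']
```

Without the `visited` set, the made-up graph would ping-pong between Michael Jackson and Kurt Cobain forever, which is exactly what the post’s used_links list guards against.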

So let’s break this down a bit, and look at the class that handles finding the linked person:

import requests
from BeautifulSoup import BeautifulSoup

class Wikipedia(object):

    # class constants
    WIKIPEDIA = "http://en.wikipedia.org/wiki/"
    WIKIPEDIA_VALID_HREF = "/wiki/"
    BEHINDTHENAME_FORENAME = "http://www.behindthename.com/name/"
    BEHINDTHENAME_SURNAME = "http://surnames.behindthename.com/name/"
    BEHINDTHENAME_ERRORS = ["Error", "404 Not Found"]

    def __init__(self):
        # names already used, stored per instance (a class-level list
        # would be shared by every Wikipedia object)
        self.used_links = []

    # get webpage content
    def _get_soup(self, url):
        request = requests.get(url)
        return BeautifulSoup(request.text)

    # link text is a name candidate if it has exactly two words
    def _is_linktext_valid(self, text):
        return bool(text) and len(text.split()) == 2

    # behindthename.com shows an error heading for names it doesn't know
    def _is_name_valid(self, url, name):
        heading = self._get_soup(url + name).find('h1')
        return heading is None or heading.text not in self.BEHINDTHENAME_ERRORS

    # get hyperlink to a person, or None if no suitable link is found
    def get_person_link(self, full_name):
        try:
            # add name to used links
            self.used_links.append(full_name)

            # get wikipedia webpage
            wikipedia_soup = self._get_soup(self.WIKIPEDIA + full_name.replace(" ", "_"))

            # loop all wikipedia links
            for link in wikipedia_soup.findAll('a', href=True):

                # check link href valid (points to another wiki page)
                if self.WIKIPEDIA_VALID_HREF not in link['href']:
                    continue

                # check link text valid
                if not self._is_linktext_valid(link.text):
                    continue

                # check link not previously used
                if link.text in self.used_links:
                    continue

                forename, surname = link.text.split()

                # check link forename valid
                if not self._is_name_valid(self.BEHINDTHENAME_FORENAME, forename):
                    continue

                # check link surname valid
                if not self._is_name_valid(self.BEHINDTHENAME_SURNAME, surname):
                    continue

                # return link
                return link
        except Exception:
            # e.g. a network error, or no name from speech recognition
            print("Error getting person link")

        return None

The main point of interest here is the get_person_link function. It first records the name of our pop star in used_links (so that we don’t end up in an endless loop bouncing between two Wikipedia pages), then fetches their Wikipedia page with requests and parses it with the Beautiful Soup library.

Now we can loop through all the hyperlinks on the wiki page, making various checks to determine whether the current link points to another person. We check that the link has a valid href (i.e. it contains /wiki/, so it points to another wiki page), that its text resembles a person’s name (for now, we’ll treat link text of exactly two words as a suitable candidate), and that we haven’t already used it. The real magic comes from the behindthename.com website, which we use to check that the first word is a known forename and the second a known surname. If all is well, we return the link to another person; if no link passes every check, the function returns None and the main loop ends.
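To see the order of the checks in isolation, here is a minimal offline sketch. It assumes each link has been reduced to an `(href, text)` pair and that two small hard-coded name sets (hypothetical) can stand in for the behindthename.com lookups; `first_person_link` and `looks_like_name` are illustrative names, not part of the class above.

```python
# Hypothetical stand-ins for behindthename.com lookups (lower-cased).
KNOWN_FORENAMES = {"courtney", "michael"}
KNOWN_SURNAMES = {"jackson"}


def looks_like_name(text):
    # Same heuristic as _is_linktext_valid: exactly two words.
    return bool(text) and len(text.split()) == 2


def first_person_link(links, used, forenames, surnames):
    """Return the first (href, text) pair that passes every check, else None."""
    for href, text in links:
        if "/wiki/" not in href:        # must point to another wiki page
            continue
        if not looks_like_name(text):   # must look like "Forename Surname"
            continue
        if text in used:                # must not be a name already used
            continue
        forename, surname = text.split()
        if forename.lower() not in forenames:
            continue
        if surname.lower() not in surnames:
            continue
        return href, text
    return None


# Made-up links, in page order.
LINKS = [
    ("/wiki/Nirvana_(band)", "Nirvana"),
    ("https://example.com/fan-site", "Fan Site"),
    ("/wiki/Kurt_Cobain", "Kurt Cobain"),
    ("/wiki/Courtney_Love", "Courtney Love"),
    ("/wiki/Michael_Jackson", "Michael Jackson"),
]

print(first_person_link(LINKS, {"Kurt Cobain"}, KNOWN_FORENAMES, KNOWN_SURNAMES))
# ('/wiki/Michael_Jackson', 'Michael Jackson')
```

With these made-up sets, Courtney Love is rejected only because "love" is not a known surname, the same kind of miss that turns up in the demo.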

Great. Now, I won’t go into the Google services code or the displaying of images in a browser, as I’ve covered these in previous posts. But I will provide the code…

speech_to_text and text_to_speech functions:

import os
import urllib
from subprocess import Popen, PIPE, call

class Speech(object):

    # converts speech to text
    def speech_to_text(self, filepath):
        try:
            # utilise PiAUISuite to turn speech into text
            text = Popen(['sudo', filepath], stdout=PIPE).communicate()[0]

            # tidy up text
            text = text.replace('"', '').strip()

            # debug
            print(text)

            return text
        except Exception:
            print("Error translating speech")
            return None

    # converts text to speech
    def text_to_speech(self, text):
        try:
            # truncate text as google only allows 100 chars
            text = text[:100]

            # encode the text (Python 2; Python 3 uses urllib.parse.quote_plus)
            query = urllib.quote_plus(text)

            # build endpoint
            endpoint = "http://translate.google.com/translate_tts?tl=en&q=" + query

            # debug
            print(endpoint)

            # get google to translate and mplayer to play; mplayer's output
            # goes to /dev/null, because call() never reads a PIPE and
            # mplayer could block once the pipe buffer fills up
            with open(os.devnull, 'w') as devnull:
                call(["mplayer", endpoint], stdout=devnull, stderr=devnull)
        except Exception:
            print("Error translating text")
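The Speech class is written for Python 2. Under Python 3, two things change: `urllib.quote_plus` lives at `urllib.parse.quote_plus`, and `communicate()` returns bytes, so `text.replace('"', '')` raises a TypeError. The sketch below shows Python 3 versions of just the text-handling steps. `tidy_transcript` and `build_tts_endpoint` are hypothetical helper names, and the code only builds the URL; it does not contact Google.

```python
from urllib.parse import quote_plus

TTS_BASE = "http://translate.google.com/translate_tts?tl=en&q="


def tidy_transcript(raw):
    """Decode bytes from Popen(...).communicate()[0] and strip the quotes."""
    return raw.decode("utf-8", errors="replace").replace('"', "").strip()


def build_tts_endpoint(text, limit=100):
    """Truncate to `limit` characters, then URL-encode (spaces become '+')."""
    return TTS_BASE + quote_plus(text[:limit])


print(tidy_transcript(b'"Kurt Cobain"\n'))   # Kurt Cobain
print(build_tts_endpoint("Michael Jackson"))
# http://translate.google.com/translate_tts?tl=en&q=Michael+Jackson
```

Truncating before encoding keeps the 100-character limit about the spoken text itself, matching the order used in text_to_speech.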

display_images function:

import requests
from BeautifulSoup import BeautifulSoup
import webbrowser

class Screen(object):

    # displays google images
    def display_images(self, search_term, amount):
        try:
            # check params
            if amount < 1:
                return

            # get webpage; passing params lets requests URL-encode the
            # search term (names contain spaces)
            request = requests.get("https://www.google.co.uk/search",
                                   params={"q": "image " + search_term, "tbm": "isch"})
            soup = BeautifulSoup(request.text)

            # get images, skipping any <img> tag without a src attribute
            images = [img.get('src') for img in soup.findAll('img') if img.get('src')]

            # display up to `amount` images in a browser
            for src in images[:amount]:
                webbrowser.open(src, new=0)
        except Exception:
            print("Error displaying images")
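The image-picking step in display_images amounts to "collect every img src, keep the first `amount`". As a dependency-free alternative, here is a minimal sketch using the standard library’s html.parser in place of Beautiful Soup (a swapped-in technique, not the code above). `ImgSrcCollector` and `image_sources` are hypothetical names, and the sketch works on any HTML string rather than a live Google results page, whose markup may well differ.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags with no (or an empty) src
                self.sources.append(src)


def image_sources(html, amount):
    """Return at most `amount` image URLs found in `html`."""
    if amount < 1:
        return []
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.sources[:amount]


sample = '<p><img src="a.png"><img alt="no src"><img src="b.png"></p>'
print(image_sources(sample, 1))  # ['a.png']
```

Slicing with `[:amount]` also covers the case where fewer images exist than requested, which the original handled with a separate length check.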

‘Monita,’ I shouted expectantly, ‘it’s ready for a demo.’

So Monita spoke into the microphone attached to my Raspberry Pi. ‘Kurt Cobain,’ she said.

[Screenshot: pi_wikipeople_kurtcobain]

As you can see from the screenshot, it found a link on Kurt Cobain’s Wikipedia page to Michael Jackson. On Michael Jackson’s page it found a link to Austin Brown. On Austin Brown’s page it found a link to Joe Jackson. And there it stopped.

‘How come it did not find Courtney Love?’ Monita asked, quite reasonably. Hm. It seems that the behindthename.com website does not have an extensive list of surnames, so anyone whose surname it doesn’t list is skipped.

‘That’s not so good, honey,’ she stated dismissively.

Oh well. I thought it was pretty nifty, for what it’s worth.