
Arkwood is lonely. Arkwood is sad. Arkwood is also the most vile and repulsive human being on God’s green and concrete-strewn earth. So when he asked me to build a robot girlfriend to ease his solitude, I felt it a duty to society to do so.

In my last post, Voice and hand gesture recognition, I wrote some Python code to allow him to address his android sweetheart when she asked “Do you still love me?”

  • Arkwood spoke the word Yes or No into the microphone attached to my PC. I was able to use my audio slices technique to tell which word he uttered.
  • Arkwood used hand gestures, which I captured on a webcam. I used OpenCV Haar Cascade Classifiers to tell whether he was gesturing Okay or instead giving her the V sign.

‘But what I really need,’ he moaned, ‘is for my girlfriend to speak back to me. I long to hear her tender voice.’

Okay. Let’s see if we can do some really basic Text To Speech using Python. First up, I need to build a Python dictionary of words that Arkwood’s mechanical paramour can utter:

T2S_DICTIONARY = {"i": "i.wav", 
                  "love": "love.wav", 
                  "hate": "hate.wav", 
                  "you": "you.wav", 
                  "do": "do.wav", 
                  "me": "me.wav"}

Cool. Our dictionary matches each word to an appropriate sound file.

Next, we need to record the words. Now, I did tell you that Arkwood is one sick individual. So it came as no surprise when he said, ‘Let me record her voice. But you’ll need to add some effects so that it sounds like a girl.’ Oh dear.

I opened up my Magix Music Maker 14 software and Arkwood recorded the sentence “Do you love me I hate”:


I cut the sentence into individual words:


I normalized the sound. I also cranked the pitch of each word up to 8.0, so that Arkwood’s girlfriend will not sound exactly like Arkwood, but like a Smurf instead:
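If you don't have Magix to hand, a crude version of the Smurf effect can be sketched in Python with the standard library's wave module. This is not what I did above, just a rough stand-in: it copies the audio untouched but declares a higher sample rate, so the word plays back higher and also faster. The function name raise_pitch is made up for this example, and treating the 8.0 as semitones is my assumption, not something the Magix dial confirms.

```python
import wave

def raise_pitch(src_path, dst_path, semitones=8.0):
    """Copy a WAV file, declaring a higher sample rate so it plays higher and faster.

    Hypothetical helper: reading "8.0" as semitones is an assumption.
    """
    # each semitone multiplies the frequency by the twelfth root of two
    factor = 2.0 ** (semitones / 12.0)

    # read the original header and raw audio frames
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())

    new_rate = int(round(params.framerate * factor))

    # write the same frames back out under the faster sample rate
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.setframerate(new_rate)
        dst.writeframes(frames)

    return new_rate
```

A proper pitch shifter keeps the word the same length. This one shortens it, which is fine for a Smurf.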


Great. We now have a sound file for each word in our Python dictionary. Let’s create an AudioPlay class which will convert our text to speech:

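import pygame

class AudioPlay(object):

    # words the girlfriend can say, each mapped to its sound file
    T2S_DICTIONARY = {"i": "i.wav", "love": "love.wav", "hate": "hate.wav",
                      "you": "you.wav", "do": "do.wav", "me": "me.wav"}

    def __init__(self):
        # the mixer must be initialised before any sound can be loaded or played
        pygame.mixer.init()

    def text_to_speech(self, text):

        # look up each word, ignoring case
        for word in text.lower().split():
            sound_file = self.T2S_DICTIONARY.get(word)

            # words missing from the dictionary are skipped
            if sound_file:
                sound = pygame.mixer.Sound("audio/words/{}".format(sound_file))
                sound.play()

                # wait for the word to finish before starting the next one
                while pygame.mixer.get_busy():
                    pygame.time.wait(10)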

The class has one method, text_to_speech, which is passed the text we want to convert to speech. Let’s take a look and see what it does…

We loop through each word in the text, attempting to find a match in our aforementioned Python dictionary. If a match is found, we play the appropriate sound file using Pygame. We wait until the sound has finished playing before moving on to the next word.
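The lookup step can be tried on its own, with no speakers or Pygame required. The sketch below is an illustration rather than part of the AudioPlay class: the helper name words_to_files is made up for this example, and instead of playing anything it returns the sound files that would be played, in order.

```python
# Hypothetical helper, not part of the AudioPlay class: it only resolves
# words to file names so the lookup logic can be checked without audio.
T2S_DICTIONARY = {"i": "i.wav", "love": "love.wav", "hate": "hate.wav",
                  "you": "you.wav", "do": "do.wav", "me": "me.wav"}

def words_to_files(text, dictionary=T2S_DICTIONARY):
    """Return the sound file for each known word in text, in spoken order."""
    files = []
    for word in text.lower().split():
        sound_file = dictionary.get(word)
        if sound_file:  # unknown words are silently skipped
            files.append(sound_file)
    return files

if __name__ == "__main__":
    print(words_to_files("Do you love me"))
    print(words_to_files("I adore you"))
```

Note that split() leaves punctuation attached to the word, so "me?" would not match "me". That is why the phrases passed in carry no question mark.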

And that is it. The robot girlfriend can now speak to my buddy!

Time for a demo:

from audioplay import AudioPlay
from time import sleep

audio_play = AudioPlay()

while True:
    # girlfriend's question to Arkwood
    audio_play.text_to_speech("Do you love me")

    # Arkwood's answer to girlfriend (a stock No for this demo)
    is_yes = False

    # girlfriend's emotional outpouring
    if is_yes:
        audio_play.text_to_speech("I love you")
    else:
        audio_play.text_to_speech("I hate you")

    # give Arkwood a break before nagging him again
    # (5 seconds is an arbitrary choice)
    sleep(5)

Arkwood’s girlfriend asks him “Do you love me”, which we convert from text to speech and play through the computer speakers.

To keep things simple, I’ve replaced the voice and hand gesture recognition from the previous post with a stock answer. Basically, Arkwood’s reply to his girlfriend’s question is a heartless No.

His sweetheart understandably retorts through the speakers “I hate you”.

Notice how we can reuse sound files – the words You, Love and I each crop up more than once.
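To put a number on that reuse, here is a small sketch that tallies how often each word turns up across the three phrases in the demo. The names PHRASES and word_usage are invented for this example.

```python
from collections import Counter

# the three phrases the girlfriend speaks in the demo
PHRASES = ["Do you love me", "I love you", "I hate you"]

def word_usage(phrases):
    """Count how many times each lowercased word appears across the phrases."""
    counts = Counter()
    for phrase in phrases:
        counts.update(phrase.lower().split())
    return counts

if __name__ == "__main__":
    for word, count in word_usage(PHRASES).most_common():
        print(word, count)
```

Six recordings cover ten spoken words, so the more phrases she learns from the same vocabulary, the better the recordings pay off.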

‘What do you think?’ I asked my sordid Belgian chum.

Arkwood furrowed his brow. ‘She sounds a bit constipated.’

What the hell does he expect for a first stab! Now he wants her to dance. Can she boogie? Yes sir, she can boogie.