
Not so long ago, Arkwood asked me for some help in playing the retro Commodore 64 computer game, Commando. I dutifully obliged, creating a robot co-pilot that would tell him when grenade supplies appeared on the screen. But today, whilst playing the classic platform game Bubble Bobble, he cried, ‘Help! I need the robot to tell me how to tackle each level.’ ‘Fear not,’ I stated, ‘I have been studying the Python package PyBrain and its Reinforcement Learning task. It be just the ticket for your predicament!’

The objective is thus: add PyBrain Reinforcement Learning to the robot, so that it can learn how to help Arkwood complete each level of the computer game. If you want to get up to speed with Reinforcement Learning, try the PyBrain documentation, along with an excellent tutorial on Simon’s technical blog. My previous post, PyBrain on Raspberry Pi, provides detail on installing PyBrain on both a Windows 7 PC and a Raspberry Pi.

Okay, let’s have a gander at the main Python program:

from pybrain.rl.learners.valuebased import ActionValueTable
from pybrain.rl.learners import Q
from pybrain.rl.explorers import EpsilonGreedyExplorer
from pybrain.rl.agents import LearningAgent
from gameenvironment import GameEnvironment
from gametask import GameTask
from pybrain.rl.experiments import Experiment

# setup AV Table
av_table = ActionValueTable(6, 3)

# setup a Q-Learning agent
learner = Q(0.5, 0.0)
agent = LearningAgent(av_table, learner)

# setup environment
environment = GameEnvironment()

# setup task
task = GameTask(environment)

# setup experiment
experiment = Experiment(task, agent)

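# okay, let's play the game
while True:
    # ask for the level, speak a command and collect the result
    experiment.doInteractions(1)
    # update the Action Value table, then clear the agent's history
    agent.learn()
    agent.reset()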

Basically, we need to set up a bunch of components which will fit together to give us a robot that can help my Belgian buddy with his gameplay. We create an Action Value table, which will store six states (in our case, the first six levels of Bubble Bobble) and three actions (whether to veer towards the left, middle or right of screen when playing a certain level of the game). Next we create an Agent, which comprises a Q-Learning algorithm and the aforementioned Action Value table. Then we create an Environment and pass it to a Task (more on that later), before bringing everything together under an Experiment. Now all that’s left to do is loop and learn – this is the point at which Arkwood’s robot will help him with each level of his computer game.
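
To make the Action Value table a little less abstract, here is a minimal sketch of the textbook Q-Learning update on a six-by-three table, written in plain Python rather than PyBrain. The helpers make_table and q_update are my own inventions for illustration, and I am assuming that the 0.5 and 0.0 passed to Q are the learning rate and the discount factor respectively.

```python
# Plain-Python illustration only; this is not PyBrain's internal code.
ALPHA = 0.5       # learning rate (assumed to be the first argument to Q)
GAMMA = 0.0       # discount factor (assumed to be the second argument to Q)
NUM_STATES = 6    # levels 1 to 6
NUM_ACTIONS = 3   # left, middle, right


def make_table():
    """Return a 6 x 3 table of action values, all starting at zero."""
    return [[0.0] * NUM_ACTIONS for _ in range(NUM_STATES)]


def q_update(table, state, action, reward, next_state):
    """Apply one textbook Q-Learning update and return the new value."""
    best_next = max(table[next_state])
    old_value = table[state][action]
    new_value = old_value + ALPHA * (reward + GAMMA * best_next - old_value)
    table[state][action] = new_value
    return new_value


table = make_table()
# level 1 (state 0), "play right of screen" (action 2), reward of 1.0, twice over
first = q_update(table, 0, 2, 1.0, 0)
second = q_update(table, 0, 2, 1.0, 0)
print(first, second)
```

With a discount of zero, each value simply moves halfway towards the latest reward, so two successes in a row take the entry from 0.0 to 0.5 and then to 0.75.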

That’s a fairly brisk overview, but let’s get under the hood and find out exactly what is going on…

Here’s our implementation of the Task class:

from pybrain.rl.environments.task import Task
from gameinteraction import GameInteraction

class GameTask(Task):

    def __init__(self, environment):
        self.env = environment
        self.game_interaction = GameInteraction()

    def getObservation(self):
        return self.env.getSensors()

    def performAction(self, action):
        # pass the chosen action on to the environment
        self.env.performAction(action)

    def getReward(self):
        return self.game_interaction.get_result()

In the constructor, __init__, note that we create a reference to our Game Interaction class – which will be used to interact with Arkwood. Other than that, the class is configured much as it appears in the study material referred to above. We have a getObservation method that will determine the state of our environment (i.e. what level of the game my buddy is on). The performAction method will tell Arkwood whether to favour the left, middle or right of screen. The getReward method will determine whether Arkwood has completed the game level without losing a life.

Now for our Environment class, which the Task class manages:

from pybrain.rl.environments.environment import Environment
from gameinteraction import GameInteraction

class GameEnvironment(Environment):
    def __init__(self):
        self.game_interaction = GameInteraction()

    def getSensors(self):
        return self.game_interaction.get_level()

    def performAction(self, action):
        # speak the chosen action to Arkwood as a command
        self.game_interaction.give_command(action)

Not too much going on here. Again, it is simply concerned with calling our Game Interaction class.

Okay. Before we go any further, it’s worth stating that the classes I’ve outlined so far encompass the basics of PyBrain Reinforcement Learning. Have another gander at the main program, to confirm how all the components fit together.
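
If it helps, the way the pieces talk to each other can be sketched with a couple of stand-in classes. This is only my simplified picture of the observe, act, reward cycle that the Experiment runs for each interaction. StubTask, StubAgent and one_interaction are hypothetical names, and the canned values (level 1, action 2.0, a reward of 1.0) are made up for the demonstration.

```python
# Simplified stand-ins for illustration; these are not the PyBrain classes.
class StubTask(object):
    """Plays the part of GameTask, with canned answers instead of speech."""

    def __init__(self, log):
        self.log = log

    def getObservation(self):
        self.log.append("getObservation")
        return [0.0]  # level 1

    def performAction(self, action):
        self.log.append("performAction %s" % action)

    def getReward(self):
        self.log.append("getReward")
        return 1.0  # 'yes'


class StubAgent(object):
    """Plays the part of the agent; always picks action 2.0."""

    def __init__(self, log):
        self.log = log
        self.rewards = []

    def integrateObservation(self, observation):
        self.log.append("integrateObservation")

    def getAction(self):
        self.log.append("getAction")
        return 2.0

    def giveReward(self, reward):
        self.log.append("giveReward")
        self.rewards.append(reward)


def one_interaction(task, agent):
    """Observe the state, act on it, then hand the reward back to the agent."""
    agent.integrateObservation(task.getObservation())
    task.performAction(agent.getAction())
    reward = task.getReward()
    agent.giveReward(reward)
    return reward


log = []
reward = one_interaction(StubTask(log), StubAgent(log))
print(log)
```

The log shows the Task being asked for the level first, then being handed the Agent's action, and finally being asked for the reward, which is the order in which Arkwood gets quizzed.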

Right, let’s check out the Game Interaction class, which helps the Task and Environment classes handle their values:

from speech import Speech

class GameInteraction(object):

    def __init__(self):
        self.speech = Speech()

    # 6 states
    LEVEL = {
        '1': [0.0],
        '2': [1.0],
        '3': [2.0],
        '4': [3.0],
        '5': [4.0],
        '6': [5.0]
    }

    # 3 actions
    COMMAND = {
        0.0: "play left of screen",
        1.0: "play middle screen",
        2.0: "play right of screen"
    }

    # 2 rewards
    RESULT = {
        'no': -1.0,
        'yes': 1.0
    }

    def get_level(self):
        # ask Arkwood for game level
        self.speech.text_to_speech("What level are you on?")
        # wait for Arkwood's response
        response = ""
        while response not in self.LEVEL:
            response = self.speech.speech_to_text('/home/pi/PiAUISuite/VoiceCommand/speech-recog.sh').lower()

        return self.LEVEL[response]

    def give_command(self, action):
        # get command
        command_key = float(action)
        command = ""

        if command_key in self.COMMAND:
            command = self.COMMAND[command_key]

        # give command to Arkwood
        self.speech.text_to_speech(command)

    def get_result(self):
        # ask Arkwood for result
        self.speech.text_to_speech("Did you complete the level without losing a life?")

        # wait for Arkwood's response
        response = ""
        while response not in self.RESULT:
            response = self.speech.speech_to_text('/home/pi/PiAUISuite/VoiceCommand/speech-recog.sh').lower()

        return self.RESULT[response]

The constructor creates a reference to a Speech class. I’ve used this Speech class in many of my previous posts. It has a text_to_speech method, which utilises Google’s Text To Speech service to convert text into an audio file which can be ‘spoken’ out through a set of speakers attached to the computer. It also has a speech_to_text method, which utilises Google’s Speech To Text service to convert words spoken through a microphone attached to the computer into text.

Our Game Interaction class has a get_level method. We ask Arkwood, through our Speech class, what level of the game he is currently on. Once we have got his response, we return it to our Environment class.

Our Game Interaction class has a give_command method, called from our Environment class. It converts its Action parameter into a command that we can ‘speak’ to Arkwood.

Finally, our Game Interaction class has a get_result method. We ask Arkwood, through our Speech class, whether he completed the current game level without losing a life. Once we have got his response, we return it to our Task class.
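
The ‘keep asking until we get a sensible answer’ loop in get_level and get_result can be tried out without a microphone. Here is a rough sketch, assuming a hypothetical helper called wait_for_valid and a fake listener that stands in for the speech_to_text call.

```python
# Illustration only: wait_for_valid and fake_listen are hypothetical helpers.
RESULT = {'no': -1.0, 'yes': 1.0}


def wait_for_valid(listen, choices):
    """Keep calling listen() until it returns one of the keys in choices."""
    response = ""
    while response not in choices:
        response = listen().lower()
    return choices[response]


# pretend the microphone hears silence, then a mumble, then a clear answer
heard = iter(["", "Maybe", "YES"])


def fake_listen():
    return next(heard)


reward = wait_for_valid(fake_listen, RESULT)
print(reward)
```

Anything that is not a key in the dictionary is simply ignored and the question is put again, which is why a mumbled ‘maybe’ never reaches the Task class.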

That’s all the code in place. For completeness’ sake, here’s the Speech class:

from subprocess import Popen, PIPE, call
import urllib

class Speech(object):

    # converts speech to text
    def speech_to_text(self, filepath):
        try:
            # utilise PiAUISuite to turn speech into text
            text = Popen(['sudo', filepath], stdout=PIPE).communicate()[0]

            # tidy up text
            text = text.replace('"', '').strip()

            # debug
            print (text)

            return text
        except Exception:
            print ("Error translating speech")
            # return an empty string so that callers can simply ask again
            return ""

    # converts text to speech
    def text_to_speech(self, text):
        try:
            # truncate text as google only allows 100 chars
            text = text[:100]

            # encode the text
            query = urllib.quote_plus(text)

            # build endpoint
            endpoint = "http://translate.google.com/translate_tts?tl=en&q=" + query

            # debug
            print (endpoint)

            # get google to translate and mplayer to play
            call(["mplayer", endpoint], shell=False, stdout=PIPE, stderr=PIPE)
        except Exception:
            print ("Error translating text")
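
To see what the truncating and encoding steps in text_to_speech actually produce, here is a small sketch that only builds the endpoint string and never calls Google or mplayer. The function name build_tts_endpoint and the two constants are my own, and the try/except import is there merely so the snippet runs on either Python 2 or Python 3.

```python
try:
    from urllib import quote_plus         # Python 2, as used in the post
except ImportError:
    from urllib.parse import quote_plus   # Python 3

# hypothetical constants, lifted from the values used in the Speech class
TTS_BASE = "http://translate.google.com/translate_tts?tl=en&q="
MAX_CHARS = 100


def build_tts_endpoint(text):
    """Truncate the text to 100 characters, encode it and append it to the base URL."""
    return TTS_BASE + quote_plus(text[:MAX_CHARS])


endpoint = build_tts_endpoint("What level are you on?")
print(endpoint)
```

Spaces become plus signs and the question mark becomes %3F, so the robot's question ends up as What+level+are+you+on%3F on the end of the URL.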

Phew! What we really need now is a demo, to confirm that it all works. Arkwood sits down with his Windows 7 laptop, ready to play Bubble Bobble. I start the main Python program on my Raspberry Pi computer, which will ask Arkwood which level he is on and then give a command on what he should do. If the robot’s advice has helped Arkwood complete the level without the loss of life, then Arkwood tells the robot. The robot can then reinforce its learning, so the next time Arkwood tackles that level he will get better help.

Here’s Arkwood tackling the first and fourth levels of Bubble Bobble:



Note that our little green dragon always starts at the bottom left of the screen, so a command of ‘play right of screen’ is going to take him on a journey of discovery, whilst a command of ‘play left of screen’ is going to cool his heels.

And here’s the output of the main program, indicating how the robot has tried to help Arkwood:


Arkwood will continue to play Bubble Bobble around the clock. And the robot will be at hand throughout, learning how to serve up the best tactics for each level of the game.

‘How’s it going, Arkwood?’ I asked him at 6am. He had been up all night.

With red-rimmed eyes, he responded, ‘Yeah. The robot’s been a great help. Only problem is, I mastered the first six levels ages ago. Can’t it help me with later levels?’

No problem, I told him. I will simply add more states to the Action Value table and Game Interaction class. But he wasn’t listening to me. A pool of yellow liquid was busy forming on the carpet, under his crossed legs.

‘For fuck’s sake!’ I screamed. ‘Get to the toilet, you filthy animal!’

‘I can’t,’ he wailed. ‘I’m on the final level!’
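
As for adding more states, one way to avoid typing out the LEVEL dictionary by hand is to generate it. This is only a sketch: build_levels is a hypothetical helper, the figure of ten levels is picked out of the air, and it assumes the speech service keeps returning level numbers as digits.

```python
def build_levels(num_levels):
    """Map spoken level numbers ('1', '2', ...) to states ([0.0], [1.0], ...)."""
    return dict((str(n), [float(n - 1)]) for n in range(1, num_levels + 1))


# ten levels is just an example figure
LEVEL = build_levels(10)

# the Action Value table would need the same number of states
NUM_STATES = len(LEVEL)
print(NUM_STATES, LEVEL['10'])
```

The number of states passed to the Action Value table in the main program would need to match, which is why NUM_STATES is taken from the dictionary itself.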



If you want to see Bubble Bobble in all its finery, then sit back and watch this YouTube video.


I wrote all the code using Python Tools for Visual Studio 2.1 on my Windows 7 PC, before porting it to my Raspberry Pi.

Here’s a screenshot of PTVS in debug mode (we can see rewards being assigned to our Action Value table):