
Arkwood, my seedy Belgian friend, wants to get the highest ever score in the retro computer game Donkey Kong. ‘Please help me,’ he pleaded, ‘Just think of the fame and fortune, and all the pretty girls, if I am king of Kong!’ He was of course referring to the astonishing documentary The King of Kong: A Fistful of Quarters, which he has been watching non-stop for two weeks. ‘Okay, I will help you,’ I relented.

As it happens, I have already built a ‘co-pilot’ robot that can detect objects in computer games. The robot has also been used to learn how to offer advice during gameplay. All I need to do is put these two features together:

1. Detect the umbrella object in the computer game Donkey Kong
2. Learn how to offer advice on collecting the umbrella

Now, an umbrella strategy is hardly going to be the key to achieving a high score in Donkey Kong. But it will demonstrate how a robot can ‘see’ objects in a computer game, and how it can learn to help the game player. It’s a promising start.

So, before we look at the code, let’s recap the setup as per my previous posts…

First we train an OpenCV haar cascade classifier for detecting umbrella objects, using 68 positive images of an umbrella:


Our settings produced 10 stages of training:

perl createtrainsamples.pl positives.dat negatives.dat samples 500 "./opencv_createsamples -bgcolor 0 -bgthresh 0 -maxxangle 1.1 -maxyangle 1.1 -maxzangle 0.5 -maxidev 40 -w 22 -h 15"
opencv_haartraining -data haarcascade_umbrella -vec samples.vec -bg negatives.dat -nstages 20 -nsplits 2 -minhitrate 0.999 -maxfalsealarm 0.5 -npos 500 -nneg 200 -w 22 -h 15 -nonsym -mem 2048 -mode ALL

Next, we attach a webcam to our Raspberry Pi computer and point it at the Windows 7 laptop that Arkwood will be using to play Donkey Kong.

Finally, we run the Python code:

from webcam import Webcam
from detection import Detection
from speech import Speech

class GameInteraction(object):

    def __init__(self):
        self.webcam = Webcam()
        self.detection = Detection()    
        self.speech = Speech()

    # 8 states
    LEVEL = {
        '1': [0.0],
        '2': [1.0],
        '3': [2.0],
        '4': [3.0],
        '5': [4.0],
        '6': [5.0],
        '7': [6.0],
        '8': [7.0]
    }

    # 3 actions
    COMMAND = {
        0.0: "beeline to umbrella",
        1.0: "collect umbrella at your leisure",
        2.0: "avoid umbrella"
    }

    # 3 rewards
    RESULT = {
        'none': 1.0,
        '1': 0.0,
        '2': -1.0
    }
    def get_level(self):

        # await detection of umbrella
        while True:

            image = self.webcam.read_image()
            item_detected = self.detection.is_item_detected_in_image('haarcascade_umbrella.xml', image)

            if item_detected:
                break

        # ask Arkwood for game level
        self.speech.text_to_speech("What level are you on?")

        # wait for Arkwood's response
        response = ""
        while response not in self.LEVEL:
            response = self.speech.speech_to_text('/home/pi/PiAUISuite/VoiceCommand/speech-recog.sh').lower()

        return self.LEVEL[response]

    def give_command(self, action):

        # get command
        command_key = float(action)
        command = ""

        if command_key in self.COMMAND:
            command = self.COMMAND[command_key]

        # give command to Arkwood
        self.speech.text_to_speech(command)

    def get_result(self):
        # ask Arkwood for result
        self.speech.text_to_speech("How many lives did you lose on the level?")

        # wait for Arkwood's response
        response = ""
        while response not in self.RESULT:
            response = self.speech.speech_to_text('/home/pi/PiAUISuite/VoiceCommand/speech-recog.sh').lower()

        return self.RESULT[response]

My previous posts describe how the Game Interaction class slots into PyBrain’s Reinforcement Learning, so that the robot can learn how to offer advice to Arkwood, and how the robot can detect an object in a computer game via the webcam. So let’s concern ourselves only with how the Game Interaction class has been modified to work with Donkey Kong.
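Under the hood, what the learner maintains is essentially a table of values over 8 states (levels) and 3 actions (commands). The following is a simplified pure-Python sketch of that tabular update, not the actual PyBrain wiring from my earlier posts; the constants are illustrative:

```python
import random

# 8 states (levels) x 3 actions (commands), all values start at zero
NUM_STATES, NUM_ACTIONS = 8, 3
ALPHA, EPSILON = 0.5, 0.2  # learning rate, exploration rate

q_table = [[0.0] * NUM_ACTIONS for _ in range(NUM_STATES)]

def choose_action(state):
    # epsilon-greedy: mostly exploit the best known command, sometimes explore
    if random.random() < EPSILON:
        return random.randrange(NUM_ACTIONS)
    row = q_table[state]
    return row.index(max(row))

def update(state, action, reward):
    # one-step update: nudge the stored value toward the observed reward,
    # so repeatedly punished advice sinks and rewarded advice rises
    q_table[state][action] += ALPHA * (reward - q_table[state][action])
```

For instance, if ‘beeline to umbrella’ on level 3 keeps earning the -1.0 reward, its table entry drops and the greedy choice shifts to one of the other two commands.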

The __init__ constructor creates an instance of the Webcam, Detection and Speech classes.

The get_level method is concerned with the state of the game. First it waits until it detects an umbrella via the webcam. Then it asks Arkwood ‘What level are you on?’, via Google’s Text To Speech service. Arkwood responds, via Google’s Speech To Text service. For now we will concentrate on the first eight levels of Donkey Kong.

Note: Donkey Kong actually has four stages per level, so we’ll just treat each stage as a level in its own right. Also, the first stage of Donkey Kong does not have an umbrella, but for completeness’ sake we will include all stages.

The give_command method is concerned with performing an action. It tells Arkwood either to make a beeline for the umbrella, collect it at his leisure, or to ignore it.

The get_result method is concerned with obtaining a reward. It asks Arkwood ‘How many lives did you lose on the level?’ after the previous action has been carried out. For example, if he lost two lives whilst being instructed to beeline for the umbrella, a negative reward of -1.0 is administered (the robot uses this reward to learn how to offer better advice next time).
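Putting the three methods together, one learning episode runs state, action, reward in sequence. Here is a hypothetical driver loop of my own devising (the agent interface is assumed, not taken from the post) to show how the pieces connect:

```python
def run_episode(game, agent):
    # observe the state: wait for an umbrella, then ask which level
    state = game.get_level()[0]

    # pick an action from the learner and relay it as a spoken command
    action = agent.choose_action(int(state))
    game.give_command(float(action))

    # obtain the reward from Arkwood and feed it back to the learner
    reward = game.get_result()
    agent.update(int(state), action, reward)
```

In the real setup, PyBrain’s experiment classes play the role of this loop, calling into the Game Interaction class for sensing, acting and rewarding.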

Cool. What we need now is a demo, to find out if our robot can help Arkwood get a top score at Donkey Kong…

Arkwood sits down to play Donkey Kong on his Windows 7 laptop. I point the webcam, which is attached to my Raspberry Pi, at his laptop screen. ‘Okay, let’s go!’ my buddy shouts, and I start the Python code on my Pi:


Hurray! The umbrella has been detected and the robot co-pilot asks Arkwood, through a set of speakers, what level he is on. Arkwood tells his co-pilot, through a microphone, that he is on level 3.

Next the co-pilot tells Arkwood to ‘beeline to umbrella’, which he does.

Finally, Arkwood tells his co-pilot that the advice was disastrous, and that he lost 2 lives. The co-pilot will learn to offer better advice next time.


So now we have a robot that can spot objects, listen and talk to Arkwood, and learn how to help him – what’s next? Perhaps the robot can spot more than one object at a time, offering more sophisticated strategies? From a practical point of view, we need to ensure that the robot is able to interject at the right moments, so as not to put my waif-like chum off his gameplay.

‘You do realise,’ I said to Arkwood, ‘the co-pilot can only help to train you. When it comes to getting the world record at Donkey Kong, you will have to do it alone.’

My buddy was perfectly aware of the strict rules on submitting ‘record-breaking’ video tapes to Walter Day at Twin Galaxies. If the footage contained any chat between him and a robot, his score would be declared null and void.

Arkwood pondered for a moment then said, ‘Can’t you just make it sound like I’m having a conversation with my mum?’

‘You’re joking, right?’ I replied incredulously, ‘Not even Rihanna talks about umbrellas that much.’


The positive images used to train our classifier were sourced from webcam snaps of Donkey Kong on the Windows 7 laptop.

The umbrella on stage two was not detected as it was overlapping a set of ladders.