
Arkwood asked me to test him on shapes.

‘Okay,’ I said, ‘What is the name of a shape with five sides?’

Arkwood thought for a moment then replied, ‘Pomegranate.’

Oh dear – I’ll need to help him. I added a new shape detection feature to SaltwashAR – the Python Augmented Reality application – so that the robots can teach him the difference between a triangle and a square.

Here’s how it will work. Arkwood will draw a shape on a piece of paper and hold it up to the webcam attached to my PC. SaltwashAR will then detect the shape Arkwood has sketched, and Sporty Robot will announce the name of the shape through the computer speakers. Furthermore, our friendly robot will also render a 3D example of the shape e.g. a cube if Arkwood has drawn a square, a pyramid if Arkwood has drawn a triangle. Easy.

‘Don’t worry,’ I told my Belgian buddy, ‘you will soon be taught the name of each shape. No longer will you be the laughing stock of the village.’

Okay, so here’s the video of Sporty Robot in action, teaching Arkwood about the shapes he has drawn onto paper:

Nice work, Sporty!

When the square is successfully detected, the robot announces: “You have sketched a square, which I have used six times to build this cube”. The robot shows Arkwood a spinning cube.

When the triangle is successfully detected, the robot announces: “You have drawn a triangle, which has helped me build this lovely pyramid”. The robot shows Arkwood a spinning pyramid.

Note that a green line has been etched around each shape, to show it being detected.

But how the hell does the Shapes feature work? Let’s take a peek at the Python code a bit at a time…

from features.base import Feature, Speaking
from shapesfunctions import *
import numpy as np
import cv2
from threading import Thread
from time import sleep

class Shapes(Feature, Speaking):

    # region of interest constants
    TOP_BORDER = 10
    BOTTOM_BORDER = 10
    LEFT_BORDER = 10
    RIGHT_BORDER = 320

    # shape constants
    TRIANGULAR_POINTS = 3
    QUADRILATERAL_POINTS = 4
    SHAPE_MIN_AREA = 100

    def __init__(self, text_to_speech):
        Speaking.__init__(self, text_to_speech)
        self.is_pyramid = False
        self.is_cube = False
        self.rotation = 0
        self.background_image = np.array([])
        self.speech_thread = None

The Shapes class inherits from the Feature base class, which provides threading (all features run in threads so as not to block the main application process from rendering to screen). We also inherit from the Speaking base class, to let the robot’s mouth move when he speaks.

There are some constants defined, to handle shape detection – more on that later.

The class __init__ method is passed a Text To Speech parameter, so that the robot can talk to Arkwood. We have class variables to control the rendering of the spinning pyramid and cube, along with a background image to draw the green detection lines upon.

# start thread
def start(self, args=None):
    Feature.start(self, args)

    # draw rotating pyramid or cube
    self.rotation += 1

    if self.is_pyramid:
        draw_pyramid(self.rotation)
    elif self.is_cube:
        draw_cube(self.rotation)

# stop thread
def stop(self):
    Feature.stop(self)
    self.background_image = np.array([])

Okay, so next we override the thread start and stop methods. Why? Well, we cannot draw the spinning pyramid and cube within the feature’s thread, as it is not the correct context for OpenGL commands – so instead we draw them in the start method, which is executed on every screen render.

The actual OpenGL commands to draw the pyramid and cube are in a supporting functions file, which we call via draw_pyramid(self.rotation) and draw_cube(self.rotation). Notice how the rotation variable is being incremented on each screen render, so as to spin the 3D shapes.

The stop method clears the background image when the robot is no longer in front of the webcam.

# run thread
def _thread(self, args):
    image = args
    # get region of interest
    height, width = image.shape[:2]
    roi = image[self.TOP_BORDER:height-self.BOTTOM_BORDER, self.LEFT_BORDER:width-self.RIGHT_BORDER]

    # detect edges
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5,5), 0)
    edges = cv2.Canny(gray, 100, 200)

    # get contours
    contours, _ = cv2.findContours(edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    contours = sorted(contours, key=cv2.contourArea, reverse=True)[:6]

    # find shape
    shape_contour = np.array([])
    shape_points = 0

    for contour in contours:
        perimeter = cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, 0.01*perimeter, True)
        shape_points = len(approx)

        if shape_points == self.TRIANGULAR_POINTS or shape_points == self.QUADRILATERAL_POINTS:
            shape_contour = contour
            break

Now we are at the _thread method, where all the shit happens (as a geometry teacher would say).

We start by collecting the original background image from our webcam as a parameter, so that we can draw the green detection lines upon it.

Once we have the image’s width and height, we can use our aforementioned constants to cut out a region of interest in which to go searching for shapes. But why are we not looking at the whole of the webcam image for shapes? Well, the image also has a square 2D marker in it, to project the robot upon, which we don’t want to detect. So we will just use the right-hand side of the image for shape detection.
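The cropping itself is plain NumPy slicing. Here’s a minimal sketch using the feature’s border constants (I’m assuming a BOTTOM_BORDER of 10 to mirror the top, and a 480x640 frame as a stand-in for the webcam image):

```python
import numpy as np

# border constants from the Shapes class
TOP_BORDER = 10
BOTTOM_BORDER = 10   # assumed value, mirroring TOP_BORDER
LEFT_BORDER = 10
RIGHT_BORDER = 320

# stand-in for a 480x640 BGR webcam frame
image = np.zeros((480, 640, 3), dtype=np.uint8)

height, width = image.shape[:2]
roi = image[TOP_BORDER:height - BOTTOM_BORDER, LEFT_BORDER:width - RIGHT_BORDER]

print(roi.shape)  # (460, 310, 3)

# the slice is a view, so drawing on the roi also marks the full frame -
# which is how the green detection lines end up back in the background image
roi[:] = 255
print(image[TOP_BORDER, LEFT_BORDER, 0])  # 255
```

Because the slice is a view onto the frame, any lines drawn on the region of interest are already part of the background image.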

Great. Let’s go ahead and detect our shapes using OpenCV computer vision. The Canny Edge Detection and Contours functions help us pick out candidate shapes on the paper. Looping through our contours, we can determine whether any of those candidates have three points (a triangle), or four points (a square) – if so, we save the location of the shape within our region of interest and bail out.

Now, yes, I know that a shape with four points is not necessarily a square. It could be a rectangle. Or a diamond. But for now we will simply treat all quadrilaterals as squares.

Let’s see the rest of the thread function:

# if shape found...
if shape_contour.size > 0 and cv2.contourArea(shape_contour) >= self.SHAPE_MIN_AREA: 

    # draw green line around shape
    cv2.drawContours(roi, [shape_contour], -1, (0, 255, 0), 3)
    image[self.TOP_BORDER:height-self.BOTTOM_BORDER, self.LEFT_BORDER:width-self.RIGHT_BORDER] = roi
    self.background_image = image

    # draw pyramid or cube
    text = None

    if shape_points == self.TRIANGULAR_POINTS:
        self.is_pyramid = True
        self.is_cube = False
        text = "You have drawn a triangle, which has helped me build this lovely pyramid"
    else:
        self.is_pyramid = False
        self.is_cube = True
        text = "You have sketched a square, which I have used six times to build this cube"

    # tell user about shape
    if not self.speech_thread or not self.speech_thread.is_alive():
        self.speech_thread = Thread(target=self._speech_thread, args=(text,))
        self.speech_thread.start()
else:
    self.is_pyramid = False
    self.is_cube = False
    self.background_image = np.array([])

We check whether we have detected a shape of suitable size (and not just some squiggle on the paper).
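Under the hood, cv2.contourArea computes the area of the polygon (Green’s formula, equivalent to the shoelace formula for simple contours). Here’s a hedged pure-NumPy stand-in to show why the SHAPE_MIN_AREA threshold of 100 lets real shapes through but rejects tiny squiggles:

```python
import numpy as np

SHAPE_MIN_AREA = 100  # same threshold as the Shapes class

def polygon_area(points):
    # shoelace formula: the quantity cv2.contourArea returns for a simple polygon
    x, y = points[:, 0], points[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

# a 5x5 pixel square - squiggle-sized, too small to count as a shape
tiny = np.array([[0, 0], [5, 0], [5, 5], [0, 5]])

# a 50x50 pixel square - comfortably above the threshold
big = np.array([[0, 0], [50, 0], [50, 50], [0, 50]])

print(polygon_area(tiny) >= SHAPE_MIN_AREA)  # False (area 25)
print(polygon_area(big) >= SHAPE_MIN_AREA)   # True (area 2500)
```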

If so, we draw a green detection line around the shape, and embed the region of interest back into the original background image, ready to render to screen.

If the shape is a triangle, we set the is_pyramid class variable to True so that the spinning pyramid can be rendered to screen.

Otherwise the shape is a square, so we set the is_cube class variable to True for the spinning cube to be rendered to screen.

Notice that we also set the appropriate text for the robot to speak.

Lastly, we kick off another thread _speech_thread so that the robot can speak to us. Why another thread? Well, the robot takes a while to speak (and then pauses 4 seconds before speaking again) – but we don’t want the shape detection logic to halt whilst the robot chats.
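The is_alive guard is what stops the feature piling up talkers while detection keeps running. Here’s a stripped-down sketch of the pattern, with a list append plus a short sleep standing in for the robot’s speech and his pause:

```python
from threading import Thread
from time import sleep

speech_thread = None
spoken = []

def _speech_thread(text):
    spoken.append(text)  # stand-in for the text-to-speech call
    sleep(0.5)           # stand-in for the robot's pause between announcements

# frames arrive much faster than the robot can speak...
for frame in range(5):
    # ...but a new speech thread only starts once the previous one has finished
    if not speech_thread or not speech_thread.is_alive():
        speech_thread = Thread(target=_speech_thread, args=("You have drawn a triangle",))
        speech_thread.start()
    sleep(0.01)

speech_thread.join()
print(len(spoken))  # 1 - five frames, but only one announcement
```

The detection loop never blocks on speech; it just skips starting a new talker while one is still busy.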

Here is that _speech_thread:

# speech thread
def _speech_thread(self, text):
    self._text_to_speech(text)
    sleep(4)

And that is that. Sporty Robot can now tell Arkwood all about the shape he has drawn onto paper, with a spinning 3D shape to boot.

Please check out the SaltwashAR Wiki for details on how to install and help develop the SaltwashAR Python Augmented Reality application.

The NeHe tutorial 3D Shapes helped with the rendering of the pyramid and cube.

Arkwood is chuffed to bits with the Shapes feature. ‘What a sterling educational tool it is! Can you add more shape detection to it?’

‘Of course,’ I replied. After all, I can’t have my chum unable to distinguish between a pentagon and a pomegranate.



The region of interest can be viewed by adding the following code to the _thread method, immediately after the assignment to the roi local variable:

cv2.imshow('shapes roi', roi)

For example, here’s the region of interest I was working with:


As you can see, the triangle is within our region of interest and ready for detection.