Arkwood asked me to test him on shapes.
‘Okay,’ I said, ‘What is the name of a shape with five sides?’
Arkwood thought for a moment then replied, ‘Pomegranate.’
Oh dear – I’ll need to help him. I added a new shape detection feature to SaltwashAR – the Python Augmented Reality application – so that the robots can teach him the difference between a triangle and a square.
Here’s how it will work. Arkwood will draw a shape on a piece of paper and hold it up to the webcam attached to my PC. SaltwashAR will then detect the shape Arkwood has sketched, and Sporty Robot will announce the name of the shape through the computer speakers. Furthermore, our friendly robot will also render a 3D example of the shape e.g. a cube if Arkwood has drawn a square, a pyramid if Arkwood has drawn a triangle. Easy.
‘Don’t worry,’ I told my Belgian buddy, ‘you will soon be taught the name of each shape. No longer will you be the laughing stock of the village.’
Okay, so here’s the video of Sporty Robot in action, teaching Arkwood about the shapes he has drawn onto paper:
Nice work, Sporty!
When the square is successfully detected, the robot announces: “You have sketched a square, which I have used six times to build this cube”. The robot shows Arkwood a spinning cube.
When the triangle is successfully detected, the robot announces: “You have drawn a triangle, which has helped me build this lovely pyramid”. The robot shows Arkwood a spinning pyramid.
Note that a green line has been etched around each shape, to show it being detected.
But how the hell does the Shapes feature work? Let’s take a peek at the Python code a bit at a time…
```python
from features.base import Feature, Speaking
from shapesfunctions import *
import numpy as np
import cv2
from threading import Thread
from time import sleep

class Shapes(Feature, Speaking):

    # region of interest constants
    TOP_BORDER = 10
    BOTTOM_BORDER = 10
    LEFT_BORDER = 10
    RIGHT_BORDER = 320

    # shape constants
    TRIANGULAR_POINTS = 3
    QUADRILATERAL_POINTS = 4
    SHAPE_MIN_AREA = 100

    def __init__(self, text_to_speech):
        Feature.__init__(self)
        Speaking.__init__(self, text_to_speech)
        self.is_pyramid = False
        self.is_cube = False
        self.rotation = 0
        self.background_image = np.array([])
        self.speech_thread = None
```
The Shapes class inherits from the Feature base class, which provides threading (all features run in threads so as not to block the main application process from rendering to screen). We also inherit from the Speaking base class, to let the robot’s mouth move when he speaks.
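If you're curious what that threading plumbing might look like, here's a minimal sketch of a Feature-style base class. This is my own simplified stand-in, not the actual SaltwashAR source: a start method spins up a worker thread for the feature's logic, so the main render loop never blocks.

```python
from threading import Thread

# Simplified stand-in for the Feature base class (the real SaltwashAR
# implementation may differ): start() launches the feature's work on a
# background thread so the main application loop is never blocked.
class Feature(object):

    def __init__(self):
        self.thread = None

    def start(self, args=None):
        # only spin up a new worker if the previous one has finished
        if not self.thread or not self.thread.is_alive():
            self.thread = Thread(target=self._thread, args=(args,))
            self.thread.start()

    def stop(self):
        # wait for the worker thread to finish
        if self.thread:
            self.thread.join()
            self.thread = None

    def _thread(self, args):
        raise NotImplementedError  # subclasses do their work here

# tiny subclass to exercise the base class
class Echo(Feature):
    def _thread(self, args):
        self.result = args * 2

echo = Echo()
echo.start(21)
echo.stop()
print(echo.result)  # 42
```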
There are some constants defined, to handle shape detection – more on that later.
The class __init__ method is passed a Text To Speech parameter, so that the robot can talk to Arkwood. We have class variables to control the rendering of the spinning pyramid and cube, along with a background image to draw the green detection lines upon.
```python
    # start thread
    def start(self, args=None):
        Feature.start(self, args)

        # draw rotating pyramid or cube
        self.rotation += 1
        if self.is_pyramid:
            draw_pyramid(self.rotation)
        elif self.is_cube:
            draw_cube(self.rotation)

    # stop thread
    def stop(self):
        Feature.stop(self)
        self.background_image = np.array([])
```
Okay, so next we override the thread start and stop methods. Why? Well, we cannot draw the spinning pyramid and cube within the feature’s thread, as it is not the correct context for OpenGL commands – so instead we draw them in the start method, which is executed on every screen render.
The actual OpenGL commands to draw the pyramid and cube are in a supporting functions file, which we call via draw_pyramid(self.rotation) and draw_cube(self.rotation). Notice how the rotation variable is being incremented on each screen render, so as to spin the 3D shapes.
The stop method clears the background image when the robot is no longer in front of the webcam.
```python
    # run thread
    def _thread(self, args):
        image = args

        # get region of interest
        height, width = image.shape[:2]
        roi = image[self.TOP_BORDER:height-self.BOTTOM_BORDER,
                    self.LEFT_BORDER:width-self.RIGHT_BORDER]

        # detect edges
        gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (5, 5), 0)
        edges = cv2.Canny(gray, 100, 200)

        # get contours
        contours, _ = cv2.findContours(edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
        contours = sorted(contours, key=cv2.contourArea, reverse=True)[:6]

        # find shape
        shape_contour = np.array([])
        shape_points = 0

        for contour in contours:
            perimeter = cv2.arcLength(contour, True)
            approx = cv2.approxPolyDP(contour, 0.01 * perimeter, True)
            shape_points = len(approx)

            if shape_points == self.TRIANGULAR_POINTS or shape_points == self.QUADRILATERAL_POINTS:
                shape_contour = contour
                break
```
Now we are at the _thread method, where all the shit happens (as a geometry teacher would say).
We start by collecting the original background image from our webcam as a parameter, so that we can draw the green detection lines upon it.
Once we have the image’s width and height, we can use our aforementioned constants to cut out a region of interest in which to go searching for shapes. But why are we not looking at the whole of the webcam image for shapes? Well, the image also has a square 2D marker in it, to project the robot upon, which we don’t want to detect. So we will just use the right-hand side of the image for shape detection.
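To illustrate the slicing at play, here's a standalone toy using a dummy NumPy array in place of the webcam frame (the 480x640 frame size is my assumption, not something fixed by SaltwashAR):

```python
import numpy as np

# toy stand-in for a 480x640 webcam frame
image = np.zeros((480, 640, 3), dtype=np.uint8)

# same border constants as the Shapes class
TOP_BORDER, BOTTOM_BORDER = 10, 10
LEFT_BORDER, RIGHT_BORDER = 10, 320

# slice out the region of interest, trimming each border
height, width = image.shape[:2]
roi = image[TOP_BORDER:height-BOTTOM_BORDER, LEFT_BORDER:width-RIGHT_BORDER]

print(roi.shape)  # (460, 310, 3)
```

So the 320-pixel RIGHT_BORDER carves a big chunk out of the frame, leaving a 460x310 strip clear of the robot's 2D marker.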
Great. Let’s go ahead and detect our shapes using OpenCV computer vision. The Canny Edge Detection and Contours functions help us pick out candidate shapes on the paper. Looping through our contours, we can determine whether any of those candidates have three points (a triangle), or four points (a square) – if so, we save the location of the shape within our region of interest and bail out.
Now, yes, I know that a shape with four points is not necessarily a square. It could be a rectangle. Or a diamond. But for now we will simply treat all quadrilaterals as squares.
Let’s see the rest of the thread function:
```python
        # if shape found...
        if shape_contour.size > 0 and cv2.contourArea(shape_contour) >= self.SHAPE_MIN_AREA:

            # draw green line around shape
            cv2.drawContours(roi, [shape_contour], -1, (0, 255, 0), 3)
            image[self.TOP_BORDER:height-self.BOTTOM_BORDER,
                  self.LEFT_BORDER:width-self.RIGHT_BORDER] = roi
            self.background_image = image

            # draw pyramid or cube
            text = None

            if shape_points == self.TRIANGULAR_POINTS:
                self.is_pyramid = True
                self.is_cube = False
                text = "You have drawn a triangle, which has helped me build this lovely pyramid"
            else:
                self.is_pyramid = False
                self.is_cube = True
                text = "You have sketched a square, which I have used six times to build this cube"

            # tell user about shape
            if not self.speech_thread or not self.speech_thread.is_alive():
                self.speech_thread = Thread(target=self._speech_thread, args=(text,))
                self.speech_thread.start()
        else:
            self.is_pyramid = False
            self.is_cube = False
            self.background_image = np.array([])
```
We check whether we have detected a shape of suitable size (and not just some squiggle on the paper).
If so, we draw a green detection line around the shape, and embed the region of interest back into the original background image, ready to render to screen.
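Here's a toy demonstration of that embedding step (my own standalone sketch, with a solid green fill standing in for the drawn contour):

```python
import numpy as np

# toy stand-in for the webcam frame
image = np.zeros((100, 100, 3), dtype=np.uint8)

# cut out the region of interest, draw into it, embed it back
roi = image[10:90, 10:90].copy()
roi[:, :] = (0, 255, 0)           # stand-in for cv2.drawContours
image[10:90, 10:90] = roi         # embed the region back into the frame

print(tuple(image[50, 50]))  # (0, 255, 0) -- green, inside the region
print(tuple(image[5, 5]))    # (0, 0, 0) -- border left untouched
```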
If the shape is a triangle, we set the is_pyramid class variable to True so that the spinning pyramid can be rendered to screen.
Otherwise the shape is a square, so we set the is_cube class variable to True for the spinning cube to be rendered to screen.
Notice that we also set the appropriate text for the robot to speak.
Lastly, we kick off another thread _speech_thread so that the robot can speak to us. Why another thread? Well, the robot takes a while to speak (and then pauses 4 seconds before speaking again) – but we don’t want the shape detection logic to halt whilst the robot chats.
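The guard around the thread start is the important bit: a new speech thread is only spun up if the previous one has finished. Here's that pattern in isolation, a standalone sketch of my own with a short sleep standing in for the text-to-speech call:

```python
from threading import Thread
from time import sleep

spoken = []

def speak(text):
    spoken.append(text)   # stand-in for the text-to-speech call...
    sleep(0.5)            # ...which takes a while to finish

speech_thread = None

for text in ['first', 'second', 'third']:
    # only start a new speech thread if the last one has finished
    if not speech_thread or not speech_thread.is_alive():
        speech_thread = Thread(target=speak, args=(text,))
        speech_thread.start()

speech_thread.join()
print(spoken)  # ['first'] -- the robot was still mid-speech for the rest
```

The detection loop sails straight past while the robot is still talking, which is exactly what we want.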
Here is that _speech_thread:
```python
    # speech thread
    def _speech_thread(self, text):
        self._text_to_speech(text)
        sleep(4)
```
And that is that. Sporty Robot can now tell Arkwood all about the shape he has drawn onto paper, with a spinning 3D shape to boot.
Please check out the SaltwashAR Wiki for details on how to install and help develop the SaltwashAR Python Augmented Reality application.
The NeHe tutorial 3D Shapes helped with the rendering of the pyramid and cube.
Arkwood is chuffed to bits with the Shapes feature. ‘What a sterling educational tool it is! Can you add more shape detection to it?’
‘Of course,’ I replied. After all, I can’t have my chum unable to distinguish between a pentagon and a pomegranate.
The region of interest can be viewed by adding the following code to the _thread method, immediately after the assignment to the roi local variable:
```python
cv2.imshow('shapes roi', roi)
cv2.waitKey(2000)
cv2.destroyAllWindows()
```
For example, here’s the region of interest I was working with:
As you can see, the triangle is within our region of interest and ready for detection.