‘What the hell is a glyph?’ my angry neighbour Alistair said, a pair of hedge shears in his red raw hands.
Here’s a glyph:
Here’s another one:
‘What in damnation are they used for?’ he barked, jabbing at me with the blades.
Well, they can be put in places such as my smoking room:
And when a webcam spots them, we can superimpose an image on top:
Alistair bunched his sausage fingers and strolled over to my lawn with intent. ‘So, before I knock your block off, I will give you one last chance to tell me the purpose of such a venture.’ My neighbour is a very angry man. He drinks too much. His wife ran off with a sailor.
I explained that glyphs can be used for a number of purposes in computer vision. They can tell a robot its location, or instruct it what to do. But me, I want to use glyphs to render 2D and 3D images, so as to provide augmented reality.
There is a very interesting article over at AForge.NET on glyph recognition. AForge.NET is an open source C# framework for computer vision and artificial intelligence. In this post I am going to take a similar approach to the AForge.NET article, but instead use OpenCV and Python.
We will go through each stage in turn, inspecting the main code and its output. The supporting functions for the code will be at the foot of the post.
Stage 1: Read an image from our webcam
import cv2
from glyphfunctions import *
from webcam import Webcam

webcam = Webcam()
webcam.start()

QUADRILATERAL_POINTS = 4
SHAPE_RESIZE = 100.0
BLACK_THRESHOLD = 100
WHITE_THRESHOLD = 155
GLYPH_PATTERN = [0, 1, 0, 1, 0, 0, 0, 1, 1]

while True:

    image = webcam.get_current_frame()
We start with import statements for OpenCV, our supporting functions and our Webcam class, which we then initialize and start. A few constants are also set up for later use.
Next, we drop into a while loop, so that we can constantly fetch images from our webcam and inspect them for the presence of glyphs.
Stage 2: Detect edges in image
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5,5), 0)
    edges = cv2.Canny(gray, 100, 200)
Now that we have a snap from our webcam, let’s convert it to grayscale, blur it and detect edges using Canny:
As you can see, without using GaussianBlur we end up with a lot more noise:
Stage 3: Find contours
    contours, _ = cv2.findContours(edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    contours = sorted(contours, key=cv2.contourArea, reverse=True)[:10]

    for contour in contours:
OpenCV findContours allows us to form objects out of our edges. Here’s the first object detected:
The second object:
And the third:
In fact, we retrieve the ten largest contours by area and loop through them to attempt to find our glyph.
Stage 4: Shape check
        perimeter = cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, 0.01*perimeter, True)

        if len(approx) == QUADRILATERAL_POINTS:
Using OpenCV arcLength and approxPolyDP, we approximate the shape of our detected object. Objects that do not have four points will be discarded – after all, our glyphs are square-shaped.
Stage 5: Perspective warping
            topdown_quad = get_topdown_quad(gray, approx.reshape(4, 2))
So now that we’ve found a quadrilateral object, we need to determine if it is a glyph. But in order to inspect our object, we really need to get a top-down view of it.
Thankfully our get_topdown_quad function uses OpenCV getPerspectiveTransform and warpPerspective to transform our object from:
Stage 6: Border check
            resized_shape = resize_image(topdown_quad, SHAPE_RESIZE)

            if resized_shape[5, 5] > BLACK_THRESHOLD: continue
Next, I will resize the object to a consistent width of 100 pixels. My resize_image function makes use of OpenCV resize.
Once resized, we check inside the edge of our object for a dark pixel. If we do not find one, then we discard the object as it will not be a glyph (all glyphs have a black border).
Stage 7: Glyph pattern
            glyph_found = False

            for i in range(4):
                glyph_pattern = get_glyph_pattern(resized_shape, BLACK_THRESHOLD, WHITE_THRESHOLD)

                if glyph_pattern == GLYPH_PATTERN:
                    glyph_found = True
                    break

                resized_shape = rotate_image(resized_shape, 90)

            if glyph_found:
We are getting close to detecting our glyph! All we have to do is read the object’s cell pattern, rotating the object by 90 degrees between attempts until we have tried all four orientations, and check whether the pattern matches our constant:
GLYPH_PATTERN = [0, 1, 0, 1, 0, 0, 0, 1, 1]
But what does this pattern mean? Well, it tells us the series of black (0) and white (1) cells unique to our glyph from left to right, top to bottom. Can you see the dots in the image below, where my get_glyph_pattern function checks each cell of the glyph for a black or white pixel?
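To see why the loop has to re-read the pattern after each rotation, here is the 3x3 cell grid of our pattern rotated with NumPy (a small illustrative sketch; the rotation direction is immaterial here, since the loop tries all four orientations anyway):

```python
import numpy as np

GLYPH_PATTERN = [0, 1, 0, 1, 0, 0, 0, 1, 1]

# Lay the flat pattern out as the 3x3 cell grid it represents
grid = np.array(GLYPH_PATTERN).reshape(3, 3)

# A single 90-degree rotation yields a different flat pattern, which
# is why the main loop re-reads the cells after each rotate_image call
rotated = np.rot90(grid)
print(rotated.flatten().tolist())  # [0, 0, 1, 1, 0, 1, 0, 1, 0]
```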
Stage 8: Substitute glyph
                substitute_image = cv2.imread('substitute.jpg')
                image = add_substitute_quad(image, substitute_image, approx.reshape(4, 2))
                break
Fantastic! We have detected our glyph and can now substitute it for a 2D image. Once our substitute is read from disk, we use our add_substitute_quad function to transform it from:
The substitute is added to the webcam snap, replacing our glyph.
Stage 9: Show augmented reality
    cv2.imshow('2D Augmented Reality using Glyphs', image)
    cv2.waitKey(10)
All that is left to do is render our augmented image in a window:
On its own, the image is not that impressive. But with a continuous stream of frames from our webcam being interrogated, we can see augmented reality come to life!
AForge.NET have used glyph recognition as a foundation for 3D Augmented Reality.
I told my neighbour, Alistair, all that I had learnt. But he really couldn’t give a shit. He’s back outside, cutting his hedge vigorously whilst spitting venom.
What’s next? Well, if we want to detect the other glyph in the snap, we can add its pattern as a constant and tweak the code accordingly. Perhaps also a bit of work to blend the 2D image into the scene. And the border check could target more than one pixel, so as to be resilient to noise.
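On that last point, here is a hedged sketch of a more noise-resilient border check: instead of testing the single pixel at (5, 5), we could average a small patch just inside the edge. The patch size and position are my own assumptions, not part of the code above:

```python
import numpy as np

BLACK_THRESHOLD = 100

def has_black_border(resized_shape, patch=5):
    # Average a small patch just inside the top-left edge, so one
    # stray bright pixel cannot cause a glyph to be discarded
    corner = resized_shape[2:2 + patch, 2:2 + patch]
    return bool(corner.mean() < BLACK_THRESHOLD)

# Synthetic 100x100 grayscale shape: dark border, bright centre
shape = np.full((100, 100), 255, dtype=np.uint8)
shape[:10, :] = 0
shape[-10:, :] = 0
shape[:, :10] = 0
shape[:, -10:] = 0

print(has_black_border(shape))  # True
```

The same idea could be applied to all four sides of the border for extra robustness.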
Here are the supporting functions I promised:
import numpy as np
import cv2

def order_points(points):

    s = points.sum(axis=1)
    diff = np.diff(points, axis=1)

    ordered_points = np.zeros((4,2), dtype="float32")

    ordered_points[0] = points[np.argmin(s)]
    ordered_points[2] = points[np.argmax(s)]
    ordered_points[1] = points[np.argmin(diff)]
    ordered_points[3] = points[np.argmax(diff)]

    return ordered_points

def max_width_height(points):

    (tl, tr, br, bl) = points

    top_width = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    bottom_width = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    max_width = max(int(top_width), int(bottom_width))

    left_height = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    right_height = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    max_height = max(int(left_height), int(right_height))

    return (max_width, max_height)

def topdown_points(max_width, max_height):
    return np.array([
        [0, 0],
        [max_width-1, 0],
        [max_width-1, max_height-1],
        [0, max_height-1]], dtype="float32")

def get_topdown_quad(image, src):

    # src and dst points
    src = order_points(src)

    (max_width, max_height) = max_width_height(src)
    dst = topdown_points(max_width, max_height)

    # warp perspective
    matrix = cv2.getPerspectiveTransform(src, dst)
    warped = cv2.warpPerspective(image, matrix, max_width_height(src))

    # return top-down quad
    return warped

def add_substitute_quad(image, substitute_quad, dst):

    # dst (zero-set) and src points
    dst = order_points(dst)

    (tl, tr, br, bl) = dst
    min_x = min(int(tl[0]), int(bl[0]))
    min_y = min(int(tl[1]), int(tr[1]))

    for point in dst:
        point[0] = point[0] - min_x
        point[1] = point[1] - min_y

    (max_width, max_height) = max_width_height(dst)
    src = topdown_points(max_width, max_height)

    # warp perspective (with white border)
    substitute_quad = cv2.resize(substitute_quad, (max_width, max_height))

    warped = np.zeros((max_height, max_width, 3), np.uint8)
    warped[:, :, :] = 255

    matrix = cv2.getPerspectiveTransform(src, dst)
    cv2.warpPerspective(substitute_quad, matrix, (max_width, max_height), warped, borderMode=cv2.BORDER_TRANSPARENT)

    # add substitute quad
    image[min_y:min_y + max_height, min_x:min_x + max_width] = warped

    return image

def get_glyph_pattern(image, black_threshold, white_threshold):

    # collect pixel from each cell (left to right, top to bottom)
    cells = []

    cell_half_width = int(round(image.shape[1] / 10.0))
    cell_half_height = int(round(image.shape[0] / 10.0))

    row1 = cell_half_height*3
    row2 = cell_half_height*5
    row3 = cell_half_height*7
    col1 = cell_half_width*3
    col2 = cell_half_width*5
    col3 = cell_half_width*7

    cells.append(image[row1, col1])
    cells.append(image[row1, col2])
    cells.append(image[row1, col3])
    cells.append(image[row2, col1])
    cells.append(image[row2, col2])
    cells.append(image[row2, col3])
    cells.append(image[row3, col1])
    cells.append(image[row3, col2])
    cells.append(image[row3, col3])

    # threshold pixels to either black or white
    for idx, val in enumerate(cells):
        if val < black_threshold:
            cells[idx] = 0
        elif val > white_threshold:
            cells[idx] = 1
        else:
            return None

    return cells

def resize_image(image, new_size):
    ratio = new_size / image.shape[1]
    return cv2.resize(image, (int(new_size), int(image.shape[0]*ratio)))

def rotate_image(image, angle):
    (h, w) = image.shape[:2]
    center = (w / 2, h / 2)
    rotation_matrix = cv2.getRotationMatrix2D(center, angle, 1.0)
    return cv2.warpAffine(image, rotation_matrix, (w, h))
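As a quick sanity check on the corner-ordering logic, here is a self-contained sketch (the jumbled coordinates are made up for illustration):

```python
import numpy as np

# The corner-ordering approach: smallest coordinate sum is top-left,
# largest is bottom-right; smallest y-x difference is top-right,
# largest is bottom-left
def order_points(points):
    s = points.sum(axis=1)
    diff = np.diff(points, axis=1)
    ordered_points = np.zeros((4, 2), dtype="float32")
    ordered_points[0] = points[np.argmin(s)]
    ordered_points[2] = points[np.argmax(s)]
    ordered_points[1] = points[np.argmin(diff)]
    ordered_points[3] = points[np.argmax(diff)]
    return ordered_points

# Corners of a rough quadrilateral, given in a jumbled order
jumbled = np.array([[90, 95], [10, 5], [85, 10], [5, 90]], dtype="float32")
print(order_points(jumbled))
# [[10.  5.]   top-left
#  [85. 10.]   top-right
#  [90. 95.]   bottom-right
#  [ 5. 90.]]  bottom-left
```

Having a deterministic corner order is what lets getPerspectiveTransform map the quad to the top-down rectangle consistently.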
Adrian Rosebrock’s post 4 Point OpenCV getPerspective Transform Example was a great help in putting together the perspective warping code.
Here’s the Webcam class, which runs in a thread to avoid frame lag:
import cv2
from threading import Thread

class Webcam:

    def __init__(self):
        self.video_capture = cv2.VideoCapture(0)
        self.current_frame = self.video_capture.read()[1]

    # create thread for capturing images
    def start(self):
        Thread(target=self._update_frame, args=()).start()

    def _update_frame(self):
        while(True):
            self.current_frame = self.video_capture.read()[1]

    # get the current frame
    def get_current_frame(self):
        return self.current_frame