
‘What the hell is a glyph?’ my angry neighbour Alistair said, a pair of hedge shears in his red raw hands.

Here’s a glyph:


Here’s another one:


‘What in damnation are they used for?’ he barked, jabbing at me with the blades.

Well, they can be put in places such as my smoking room:


And when a webcam spots them, we can superimpose an image on top:


Alistair bunched his sausage fingers and strolled over to my lawn with intent. ‘So, before I knock your block off, I will give you one last chance to tell me the purpose of such a venture.’ My neighbour is a very angry man. He drinks too much. His wife ran off with a sailor.

I explained that glyphs can be used for a number of purposes in computer vision. They can tell a robot its location, or instruct it what to do. But me, I want to use glyphs to render 2D and 3D images, so as to provide augmented reality.

There is a very interesting article over at AForge.NET on glyph recognition. AForge.NET is an open source C# framework for computer vision and artificial intelligence. In this post I am going to take a similar approach to the AForge.NET article, but instead use OpenCV and Python.

We will go through each stage in turn, inspecting the main code and its output. The supporting functions for the code will be at the foot of the post.

Stage 1: Read an image from our webcam

import cv2
from glyphfunctions import *
from webcam import Webcam

webcam = Webcam()
webcam.start()

QUADRILATERAL_POINTS = 4
SHAPE_RESIZE = 100.0
BLACK_THRESHOLD = 100  # tune these two thresholds for your lighting
WHITE_THRESHOLD = 155
GLYPH_PATTERN = [0, 1, 0, 1, 0, 0, 0, 1, 1]

while True:

    image = webcam.get_current_frame()

We start by adding import statements for OpenCV, our supporting functions and Webcam class (which we initialize). A few constant values are also set up, for later use.

Next, we drop into a while loop, so that we can constantly fetch images from our webcam and inspect them for the presence of glyphs.

Stage 2: Detect edges in image

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (5,5), 0)
edges = cv2.Canny(gray, 100, 200)

Now that we have a snap from our webcam, let’s convert it to grayscale, blur it and detect edges using Canny:


As you can see, without using GaussianBlur we end up with a lot more noise:


Stage 3: Find contours

contours = cv2.findContours(edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2]  # [-2] copes with OpenCV versions that return two or three values
contours = sorted(contours, key=cv2.contourArea, reverse=True)[:10]

for contour in contours:

OpenCV findContours allows us to form objects out of our edges. Here’s the first object detected:


The second object:


And the third:


In fact, we retrieve the ten largest contours by area and loop through them, attempting to find our glyph.

Stage 4: Shape check

perimeter = cv2.arcLength(contour, True)
approx = cv2.approxPolyDP(contour, 0.01*perimeter, True)

if len(approx) == QUADRILATERAL_POINTS:

Using OpenCV arcLength and approxPolyDP, we approximate the shape of our detected object. Objects that do not have four points will be discarded – after all, our glyphs are square-shaped.

Stage 5: Perspective warping

topdown_quad = get_topdown_quad(gray, approx.reshape(4, 2))

So now that we’ve found a quadrilateral object, we need to determine if it is a glyph. But in order to inspect our object, we really need to get a top-down view of it.

Thankfully our get_topdown_quad function uses OpenCV getPerspectiveTransform and warpPerspective to transform our object from its skewed perspective in the webcam snap to a top-down view:




Stage 6: Border check

resized_shape = resize_image(topdown_quad, SHAPE_RESIZE)
if resized_shape[5, 5] > BLACK_THRESHOLD: continue

Next, I will resize the object to a consistent width of 100 pixels. My resize_image function makes use of OpenCV resize.

Once resized, we check inside the edge of our object for a dark pixel. If we do not find one, then we discard the object as it will not be a glyph (all glyphs have a black border).

Stage 7: Glyph pattern

glyph_found = False

for i in range(4):
    glyph_pattern = get_glyph_pattern(resized_shape, BLACK_THRESHOLD, WHITE_THRESHOLD)
    if glyph_pattern == GLYPH_PATTERN: 
        glyph_found = True
    resized_shape = rotate_image(resized_shape, 90)

if glyph_found:

We are getting close to detecting our glyph! All we have to do is check the glyph's pattern at each of four 90-degree rotations, looking for a match against our constant:

GLYPH_PATTERN = [0, 1, 0, 1, 0, 0, 0, 1, 1]

But what does this pattern mean? Well, it tells us the series of black (0) and white (1) cells unique to our glyph from left to right, top to bottom. Can you see the dots in the image below, where my get_glyph_pattern function checks each cell of the glyph for a black or white pixel?


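The four-rotation check is easier to see on the pattern itself. Here's a small sketch in plain Python (no OpenCV) that rotates the flattened 3x3 grid rather than the image; a candidate glyph matches if any of its four rotations equals our constant:

```python
# GLYPH_PATTERN flattens a 3x3 grid of black (0) and white (1) cells,
# left to right, top to bottom. Rotating that grid 90 degrees reorders
# the list; a candidate matches if any rotation equals the stored pattern.

def rotate_pattern(pattern):
    # rotate a flattened 3x3 grid 90 degrees clockwise
    grid = [pattern[0:3], pattern[3:6], pattern[6:9]]
    return [cell for row in zip(*grid[::-1]) for cell in row]

GLYPH_PATTERN = [0, 1, 0, 1, 0, 0, 0, 1, 1]

# precompute all four orientations of our glyph
rotations = [GLYPH_PATTERN]
for _ in range(3):
    rotations.append(rotate_pattern(rotations[-1]))

def is_our_glyph(candidate):
    return candidate in rotations
```

Rotating the small pattern list is equivalent to rotating the whole image, just cheaper; the main code rotates the image because it samples the cells afresh each time.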
Stage 8: Substitute glyph

substitute_image = cv2.imread('substitute.jpg')
image = add_substitute_quad(image, substitute_image, approx.reshape(4, 2))

Fantastic! We have detected our glyph and can now substitute it for a 2D image. Once our substitute is read from disk, we use our add_substitute_quad function to warp it into the perspective of the glyph in the original snap:




The substitute is added to the webcam snap, replacing our glyph.

Stage 9: Show augmented reality

cv2.imshow('2D Augmented Reality using Glyphs', image)
cv2.waitKey(10)  # gives OpenCV time to actually draw the frame

All that is left to do is render our augmented image in a window:


On its own, the image is not that impressive. But with a continuous stream of frames from our webcam being interrogated, we can see augmented reality come to life!

AForge.NET have used glyph recognition as a foundation for 3D Augmented Reality.

I told my neighbour, Alistair, all that I had learnt. But he really couldn’t give a shit. He’s back outside, cutting his hedge vigorously whilst spitting venom.



What’s next? Well, if we want to detect the other glyph in the snap, we can add its pattern as another constant and tweak the code accordingly. Perhaps also a bit of work to blend the 2D image into the scene. And the border check could target more than one pixel, so as to be resilient to noise.
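For that last improvement, here is one possible sketch of a multi-pixel border check (the function name and sampling parameters are my own invention, not part of the code above): it samples several points just inside each edge of the top-down grayscale quad and requires nearly all of them to be dark.

```python
import numpy as np

BLACK_THRESHOLD = 100  # assumed grayscale threshold, as in the main code

def border_is_black(topdown_quad, samples=10, inset=5, threshold=BLACK_THRESHOLD):
    # sample pixels just inside each edge of the top-down grayscale quad
    h, w = topdown_quad.shape[:2]
    xs = np.linspace(inset, w - 1 - inset, samples).astype(int)
    ys = np.linspace(inset, h - 1 - inset, samples).astype(int)

    points = ([(inset, x) for x in xs] +          # top edge
              [(h - 1 - inset, x) for x in xs] +  # bottom edge
              [(y, inset) for y in ys] +          # left edge
              [(y, w - 1 - inset) for y in ys])   # right edge

    dark = sum(1 for y, x in points if topdown_quad[y, x] < threshold)
    return dark >= 0.9 * len(points)  # tolerate a little noise
```

A single stray bright pixel no longer disqualifies a genuine glyph, and a single stray dark pixel no longer lets a non-glyph through.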

Here are the supporting functions I promised:

import numpy as np
import cv2

def order_points(points):

    s = points.sum(axis=1)
    diff = np.diff(points, axis=1)
    ordered_points = np.zeros((4,2), dtype="float32")

    ordered_points[0] = points[np.argmin(s)]
    ordered_points[2] = points[np.argmax(s)]
    ordered_points[1] = points[np.argmin(diff)]
    ordered_points[3] = points[np.argmax(diff)]

    return ordered_points

def max_width_height(points):

    (tl, tr, br, bl) = points

    top_width = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    bottom_width = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    max_width = max(int(top_width), int(bottom_width))

    left_height = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    right_height = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    max_height = max(int(left_height), int(right_height))

    return (max_width,max_height)

def topdown_points(max_width, max_height):
    return np.array([
        [0, 0],
        [max_width-1, 0],
        [max_width-1, max_height-1],
        [0, max_height-1]], dtype="float32")

def get_topdown_quad(image, src):

    # src and dst points
    src = order_points(src)

    (max_width,max_height) = max_width_height(src)
    dst = topdown_points(max_width, max_height)
    # warp perspective
    matrix = cv2.getPerspectiveTransform(src, dst)
    warped = cv2.warpPerspective(image, matrix, (max_width, max_height))

    # return top-down quad
    return warped

def add_substitute_quad(image, substitute_quad, dst):

    # dst (zero-set) and src points
    dst = order_points(dst)
    (tl, tr, br, bl) = dst
    min_x = min(int(tl[0]), int(bl[0]))
    min_y = min(int(tl[1]), int(tr[1]))

    for point in dst:
        point[0] = point[0] - min_x
        point[1] = point[1] - min_y

    (max_width,max_height) = max_width_height(dst)
    src = topdown_points(max_width, max_height)

    # warp perspective (with white border)
    substitute_quad = cv2.resize(substitute_quad, (max_width,max_height))

    warped = np.zeros((max_height,max_width, 3), np.uint8)
    warped[:,:,:] = 255

    matrix = cv2.getPerspectiveTransform(src, dst)
    cv2.warpPerspective(substitute_quad, matrix, (max_width,max_height), warped, borderMode=cv2.BORDER_TRANSPARENT)

    # add substitute quad
    image[min_y:min_y + max_height, min_x:min_x + max_width] = warped

    return image

def get_glyph_pattern(image, black_threshold, white_threshold):

    # collect pixel from each cell (left to right, top to bottom)
    cells = []
    cell_half_width = int(round(image.shape[1] / 10.0))
    cell_half_height = int(round(image.shape[0] / 10.0))

    row1 = cell_half_height*3
    row2 = cell_half_height*5
    row3 = cell_half_height*7
    col1 = cell_half_width*3
    col2 = cell_half_width*5
    col3 = cell_half_width*7

    cells.append(image[row1, col1])
    cells.append(image[row1, col2])
    cells.append(image[row1, col3])
    cells.append(image[row2, col1])
    cells.append(image[row2, col2])
    cells.append(image[row2, col3])
    cells.append(image[row3, col1])
    cells.append(image[row3, col2])
    cells.append(image[row3, col3])

    # threshold pixels to either black or white
    for idx, val in enumerate(cells):
        if val < black_threshold:
            cells[idx] = 0
        elif val > white_threshold:
            cells[idx] = 1
        else:
            # ambiguous cell: this is not one of our glyphs
            return None

    return cells

def resize_image(image, new_size):
    ratio = new_size / image.shape[1]
    return cv2.resize(image,(int(new_size),int(image.shape[0]*ratio)))

def rotate_image(image, angle):
    (h, w) = image.shape[:2]
    center = (w / 2, h / 2)
    rotation_matrix = cv2.getRotationMatrix2D(center, angle, 1.0)
    return cv2.warpAffine(image, rotation_matrix, (w, h))

Adrian Rosebrock’s post 4 Point OpenCV getPerspective Transform Example was a great help in putting together the perspective warping code.

Here’s the Webcam class, which runs in a thread to avoid frame lag:

import cv2
from threading import Thread
class Webcam:
    def __init__(self):
        self.video_capture = cv2.VideoCapture(0)
        self.current_frame = self.video_capture.read()[1]
    # create thread for capturing images
    def start(self):
        Thread(target=self._update_frame, args=()).start()
    def _update_frame(self):
        while True:
            self.current_frame = self.video_capture.read()[1]
    # get the current frame
    def get_current_frame(self):
        return self.current_frame