
In my last post, I was able to create a disparity map from a stereo image. An OpenCV disparity map can determine which objects are nearest to the stereo webcams by calculating the shift of each object between the ‘left eye’ and ‘right eye’ perspectives – the bigger the shift, the nearer the object.
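To make the relationship concrete: disparity is inversely proportional to depth. Here’s a minimal sketch, assuming a hypothetical focal length and webcam baseline (the numbers are illustrative, not measured from my rig):

```python
# depth = focal_length_px * baseline / disparity
focal_length_px = 700.0   # assumed focal length of the webcams, in pixels
baseline_m = 0.06         # assumed distance between the two webcams, in metres

def depth_from_disparity(disparity_px):
    """Convert a pixel shift into an approximate distance in metres."""
    return focal_length_px * baseline_m / disparity_px

print(depth_from_disparity(84.0))  # 0.5 -> a shift of 84 pixels puts the object half a metre away
print(depth_from_disparity(42.0))  # 1.0 -> half the shift, twice the distance
```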

In this post, I am going to calculate the disparity between a series of stereo images. But – on top of that – I am going to apply OpenCV Thresholding to the images, so as to remove background objects. And then I am going to apply OpenCV Morphological Transformations, so as to remove noise. I will end up with a series of stereo images which will clearly show the foreground object, ready for tracking.

Okay, so first up, here’s an example of the stereo images we will be handling:


As you can see, we have two Feng Shui tubes of incense sticks sitting on a table – an image from the left eye perspective and an image from the right eye perspective.

But how in God’s good name do we capture these images? Well, we can use ArkwoodAR, a Python Augmented Reality application for Google Cardboard. Here’s the cardboard glasses, fitted with a ‘left eye’ webcam and a ‘right eye’ webcam:


Splendid. Now we can capture a series of stereo images of the incense tubes. I’ve updated the ArkwoodAR code to do just that:

# increment counter
self.counter += 1

# save stereo image
filename = str(self.counter).zfill(4)

cv2.imwrite('image_left/image_left_{}.png'.format(filename), image_one)
cv2.imwrite('image_right/image_right_{}.png'.format(filename), image_two)

Pretty simple. We use OpenCV to save the stereo images from ArkwoodAR’s left and right webcams. Notice that we use zfill to pad our counter filename with leading zeros, so as to maintain sort order, e.g. image_left_0001.png, image_left_0002.png.
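If zfill is new to you, it simply left-pads the string with zeros to a fixed width:

```python
# zfill pads the counter out to four digits, so filenames sort correctly
for counter in (1, 42, 650):
    print(str(counter).zfill(4))
# 0001
# 0042
# 0650
```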

With our series of stereo images saved to disk, I can write a Python script that will load them in turn and compute disparity:

import cv2
import numpy as np

# disparity settings
window_size = 5
min_disp = 32
num_disp = 112-min_disp
stereo = cv2.StereoSGBM(
    minDisparity = min_disp,
    numDisparities = num_disp,
    SADWindowSize = window_size,
    uniquenessRatio = 10,
    speckleWindowSize = 100,
    speckleRange = 32,
    disp12MaxDiff = 1,
    P1 = 8*3*window_size**2,
    P2 = 32*3*window_size**2,
    fullDP = False
)

# morphology settings
kernel = np.ones((12,12),np.uint8)

counter = 450

while counter < 650:

    # increment counter
    counter += 1

    # only process every third image (so as to speed up video)
    if counter % 3 != 0: continue

    # load stereo image
    filename = str(counter).zfill(4)

    image_left = cv2.imread('image_left/image_left_{}.png'.format(filename))
    image_right = cv2.imread('image_right/image_right_{}.png'.format(filename))

    # compute disparity
    disparity = stereo.compute(image_left, image_right).astype(np.float32) / 16.0
    disparity = (disparity-min_disp)/num_disp

After importing the OpenCV Computer Vision package and NumPy scientific package, I create a stereo object with disparity settings. I also set a kernel for use with the morphological transformation. (Note: this is the OpenCV 2.4 constructor – on OpenCV 3 and later it is cv2.StereoSGBM_create, and some parameter names differ, e.g. SADWindowSize became blockSize.)

I start at image number 450, so as to avoid unwanted images, and loop through to image number 650. Notice that I am using the Python modulus operator to control the number of images that will be processed – in this example, every third image – so as to speed up the series of images that will play as a video.

Once each stereo image is loaded, it is computed for disparity. Here’s an example:


The disparity map shows the incense tube on the right in bright white pixels – bright white pixels mean that the tube is near the webcams.

The disparity map shows the incense tube on the left in dark grey pixels – dark grey pixels mean that the tube is not near the webcams.

The ‘left eye’ and ‘right eye’ images above the disparity map confirm that the right tube has more shift than the left tube – hence the right tube is calculated as nearer the webcams.

Notice that the front of the table is also shown in bright white pixels. Again, its shift between the left and right images has confirmed it is near the webcams.

Next, we can use OpenCV Thresholding to remove all grey pixels from the disparity map, so that we are left only with the objects near the webcams:

# apply threshold
threshold = cv2.threshold(disparity, 0.6, 1.0, cv2.THRESH_BINARY)[1]
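For anyone unsure what THRESH_BINARY does under the hood: every pixel above the 0.6 threshold becomes the maxval (1.0), and everything else becomes 0. Here’s the same operation sketched in plain NumPy on a toy array:

```python
import numpy as np

# toy disparity values, already normalised into the range [0, 1]
disparity = np.array([0.2, 0.55, 0.61, 0.9], dtype=np.float32)

# cv2.threshold(disparity, 0.6, 1.0, cv2.THRESH_BINARY) is equivalent to:
threshold = np.where(disparity > 0.6, 1.0, 0.0).astype(np.float32)

print(threshold)  # [0. 0. 1. 1.]
```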

And here is the result of thresholding:


Much better. Now we can concentrate on the incense tube nearer the webcams. But there is still some noise in the disparity map – for example, the bright white pixels at the far left of the image.

Finally, we can use OpenCV Morphological Transformations to remove the noise from the disparity map:

# apply morphological transformation
morphology = cv2.morphologyEx(threshold, cv2.MORPH_OPEN, kernel)
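Opening is simply an erosion followed by a dilation: the erosion eats away small specks of noise, and the dilation restores the surviving shapes to roughly their original size. Here’s a minimal NumPy sketch of the idea (a toy re-implementation to show the principle, not how OpenCV does it internally):

```python
import numpy as np

def erode(img, k):
    """Each output pixel is the minimum over its k x k neighbourhood."""
    h, w = img.shape
    padded = np.pad(img, k // 2, constant_values=0)
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            out[y, x] = padded[y:y+k, x:x+k].min()
    return out

def dilate(img, k):
    """Each output pixel is the maximum over its k x k neighbourhood."""
    h, w = img.shape
    padded = np.pad(img, k // 2, constant_values=0)
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            out[y, x] = padded[y:y+k, x:x+k].max()
    return out

def opening(img, k):
    """Morphological opening: erosion followed by dilation."""
    return dilate(erode(img, k), k)

# a 3x3 white block (our object) plus a single-pixel speck of noise
img = np.zeros((7, 7), dtype=np.float32)
img[1:4, 1:4] = 1.0   # the object
img[5, 5] = 1.0       # the noise

opened = opening(img, 3)
# the speck is gone, the 3x3 block survives intact
```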

And the result of morphology:


Perfect. Now we have a disparity map, with the incense tube nearest to the webcams and nothing else (well, apart from the front of the table).

From here, we can use OpenCV computer vision to track the foreground object.
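As a taste of what tracking might look like: once the disparity map is a clean binary blob, the centroid of the white pixels gives the object’s position in each frame. A minimal NumPy sketch (find_centroid is my own hypothetical helper – with OpenCV you would more likely reach for cv2.moments or cv2.findContours):

```python
import numpy as np

def find_centroid(binary):
    """Return the (x, y) centroid of the white pixels, or None if there are none."""
    ys, xs = np.nonzero(binary)
    if len(xs) == 0:
        return None
    return (float(xs.mean()), float(ys.mean()))

# toy binary map with one blob, standing in for the cleaned disparity map
blob = np.zeros((10, 10), dtype=np.float32)
blob[2:5, 6:9] = 1.0

print(find_centroid(blob))  # (7.0, 3.0)
```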



Here’s a video of the series of stereo images. First we compute disparity. Next we apply thresholding. Finally we apply morphological transformation.

And the OpenCV code that displays the images:

# show images
cv2.imshow('left eye', image_left)
cv2.imshow('right eye', image_right)
cv2.imshow('disparity', disparity)
cv2.imshow('threshold', threshold)
cv2.imshow('morphology', morphology)

# give HighGUI a moment to draw the windows
cv2.waitKey(50)

And the highlight of my weekend (which, shockingly, was not disparity maps).