In my post, Disparity of stereo images with Python and OpenCV, I was able to calculate the disparity of a stereo image:
As you can see, the stereo image at the top comprises a ‘left eye’ and a ‘right eye’ view of two tubes of incense sticks sitting on a table.
The disparity map below the stereo image shows the tube nearest to the camera in bright white and the tube further from the camera in dark grey. The disparity calculation works out how near an object is to the camera from the amount it shifts between the left eye and right eye views. Objects nearer the camera shift more than objects further away.
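The link between shift and nearness can be sketched with the standard pinhole stereo relation, depth = focal length × baseline / disparity. This is only a minimal sketch and not necessarily how the disparity map above was produced: the function name `depth_from_disparity` and every number in it (focal length, baseline, disparities) are assumptions made up for illustration.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Pinhole stereo model: depth = focal length * baseline / disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Hypothetical values, chosen only for illustration.
FOCAL_PX = 700.0    # assumed focal length, in pixels
BASELINE_M = 0.06   # assumed gap between the 'left eye' and 'right eye' shots

# The nearer tube shifts more between the two views, so its disparity is larger.
near_tube_depth = depth_from_disparity(FOCAL_PX, BASELINE_M, disparity_px=60.0)
far_tube_depth = depth_from_disparity(FOCAL_PX, BASELINE_M, disparity_px=30.0)

print(f"near tube: {near_tube_depth:.2f} m, far tube: {far_tube_depth:.2f} m")
```

Halving the disparity doubles the estimated depth, which is why the further tube comes out darker in a disparity map.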
But how does the brain perceive depth in the images it receives from the eyes? Turns out there are many depth cues.
There are monocular cues in the images it receives from just one eye. Here’s what the left eye sees:
And there are binocular cues in the images it receives from both eyes. Here’s what both eyes can see:
I will try to determine how the cues in the Wikipedia Depth perception article are represented in my image of two tubes of incense sticks sitting on a table. If I make a dog’s breakfast of it, be sure to let me know!
Motion parallax (monocular cues)
If I were to walk past the tubes, the tube on the right would shift more against the background than the tube on the left. Motion parallax indicates that the tube on the right is nearer to me.
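A rough way to put numbers on that shift, assuming a simple pinhole camera that moves sideways: a point at depth Z moves across the image by about focal length × sideways movement / Z. The function names and all the distances below are hypothetical, not measurements from my photo.

```python
def parallax_shift_px(focal_px, sideways_move_m, depth_m):
    """Image shift of a point at depth_m when the viewer steps sideways."""
    return focal_px * sideways_move_m / depth_m


def shift_against_background(focal_px, sideways_move_m, depth_m, background_depth_m):
    """Shift of an object measured relative to a more distant background."""
    object_shift = parallax_shift_px(focal_px, sideways_move_m, depth_m)
    background_shift = parallax_shift_px(focal_px, sideways_move_m, background_depth_m)
    return object_shift - background_shift


# Hypothetical scene: right tube at 0.7 m, left tube at 1.4 m, wall at 2.8 m.
FOCAL_PX = 700.0
STEP_M = 0.5

right_tube_shift = shift_against_background(FOCAL_PX, STEP_M, 0.7, 2.8)
left_tube_shift = shift_against_background(FOCAL_PX, STEP_M, 1.4, 2.8)

print(f"right tube: {right_tube_shift:.0f} px, left tube: {left_tube_shift:.0f} px")
```

With these made-up distances the nearer tube slides across the wall three times as far as the other one.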
Depth from motion (monocular cues)
If the table were on wheels and coming towards me, the tubes would expand in size. Depth from motion would allow me to perceive the distance of the tubes and how soon they would hit me in the face.
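The "how soon" part can be sketched as a time-to-contact estimate: if an object approaches at a steady speed, its image size divided by its rate of growth gives the time left before impact. The snippet below is a sketch under that constant-speed, pinhole-camera assumption; `time_to_contact` and the sizes fed into it are hypothetical.

```python
def time_to_contact(size_before_px, size_after_px, frame_gap_s):
    """Seconds until contact, measured from the later of two frames.

    Assumes a pinhole camera and an object approaching at constant speed,
    so that image size is inversely proportional to distance.
    """
    growth = size_after_px - size_before_px
    if growth <= 0:
        raise ValueError("object is not expanding, so it is not approaching")
    return size_before_px * frame_gap_s / growth


# Hypothetical tube: image size = 100 / distance, approaching at 0.5 m/s.
size_at_2_00_m = 100.0 / 2.00
size_at_1_95_m = 100.0 / 1.95   # 0.1 s later

seconds_left = time_to_contact(size_at_2_00_m, size_at_1_95_m, frame_gap_s=0.1)
print(f"time to contact: {seconds_left:.1f} s")
```

Note that the estimate needs neither the real size of the tube nor its real distance, only how quickly its image grows.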
Perspective (monocular cues)
The table narrows towards the wall and plug socket. Perspective indicates that the tube on the right is nearer to me, as it sits on a wider part of the table.
Curvilinear perspective (monocular cues)
Parallel lines such as the table’s can curve as they reach the outer extremes of our visual field – this distortion provides depth information.
Elevation (monocular cues)
Imagine if the tubes were sitting on a beach, with the blue sea and cloudy sky behind them separated by the horizon. The closer a tube is to the horizon, the further I perceive it to be from me.
Relative size (monocular cues)
We might not know the exact size of the tubes, but we know they are both the same size. Relative size tells us that the right tube is nearer to me, as it appears taller in the image.
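For two objects of the same real size, the ratio of their heights in the image is the inverse of the ratio of their distances. A minimal sketch, with pixel heights that are made up rather than measured from my image:

```python
def relative_distance(height_a_px, height_b_px):
    """How many times further away A is than B, assuming equal real size."""
    return height_b_px / height_a_px


# Hypothetical image heights of the two tubes.
LEFT_TUBE_PX = 120.0
RIGHT_TUBE_PX = 180.0

ratio = relative_distance(LEFT_TUBE_PX, RIGHT_TUBE_PX)
print(f"left tube is about {ratio:.1f}x as far away as the right tube")
```

This only ranks the tubes against each other. It says nothing about how far away either one actually is.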
Familiar size (monocular cues)
We already know the size of a tube of Feng Shui incense sticks, right? This previous knowledge can be used to confirm the depth of the tubes in the image.
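Once the real size is known, the same pinhole geometry gives an actual distance: distance = focal length × real height / image height. The tube height, focal length and pixel heights below are all assumptions for illustration.

```python
def distance_from_known_size(focal_px, real_height_m, image_height_px):
    """Pinhole estimate of distance from an object's known real height."""
    return focal_px * real_height_m / image_height_px


# Hypothetical values: a 25 cm tube seen through a 700 px focal length.
FOCAL_PX = 700.0
TUBE_HEIGHT_M = 0.25

right_tube_m = distance_from_known_size(FOCAL_PX, TUBE_HEIGHT_M, image_height_px=180.0)
left_tube_m = distance_from_known_size(FOCAL_PX, TUBE_HEIGHT_M, image_height_px=120.0)

print(f"right tube: {right_tube_m:.2f} m, left tube: {left_tube_m:.2f} m")
```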
Absolute size (monocular cues)
We might not know the size of a tube of Feng Shui incense sticks. And we might not have another tube in the image to compare it against. But if the tube is big then it seems nearer.
Occlusion (monocular cues)
Both tubes overlap the rear edge of the table, and there is a hint of the table corner overlapping the plug socket on the wall. Occlusion means we can rank the relative nearness of objects due to how objects overlap each other.
Texture gradient (monocular cues)
See how much clearer the picture and writing are on the right tube, compared to the left tube. Texture gradient suggests that the right tube is nearer to me.
Aerial perspective (monocular cues)
Does the left tube seem hazier to you? It does to me. The lower luminance contrast and lower color saturation of the left tube indicate that it is further away.
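One simple way to compare luminance contrast is Michelson contrast, (max − min) / (max + min), taken over the grey levels of each tube. This is just one possible measure, and the grey levels below are invented, not sampled from my image.

```python
def michelson_contrast(luminances):
    """Michelson contrast of a patch: (max - min) / (max + min)."""
    darkest, brightest = min(luminances), max(luminances)
    if brightest + darkest == 0:
        return 0.0
    return (brightest - darkest) / (brightest + darkest)


# Hypothetical grey levels (0-255) sampled from each tube.
RIGHT_TUBE_PATCH = [20, 60, 200, 240]    # deep darks and bright highlights
LEFT_TUBE_PATCH = [80, 100, 150, 170]    # washed out towards mid grey

right_contrast = michelson_contrast(RIGHT_TUBE_PATCH)
left_contrast = michelson_contrast(LEFT_TUBE_PATCH)

print(f"right tube: {right_contrast:.2f}, left tube: {left_contrast:.2f}")
```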
Defocus blur (monocular cues)
Do I defocus on the left tube? Is it blurry? Because the depth of focus of my eyes is limited, any such blur may indicate that the left tube is further away.
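Blur can be scored crudely by how abruptly brightness changes between neighbouring pixels: a sharp edge jumps in one step, a defocused edge ramps up gradually. The measure below (mean squared difference along a row) is a toy stand-in I have picked for illustration, and both rows of pixels are made up.

```python
def edge_sharpness(row):
    """Mean squared difference between neighbouring pixels along one row."""
    diffs = [(b - a) ** 2 for a, b in zip(row, row[1:])]
    return sum(diffs) / len(diffs)


# Hypothetical rows of grey levels crossing the same dark-to-light edge.
SHARP_EDGE = [0, 0, 0, 255, 255, 255]        # in focus: one abrupt jump
BLURRED_EDGE = [0, 51, 102, 153, 204, 255]   # defocused: a gradual ramp

sharp_score = edge_sharpness(SHARP_EDGE)
blurred_score = edge_sharpness(BLURRED_EDGE)

print(f"sharp: {sharp_score:.0f}, blurred: {blurred_score:.0f}")
```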
Lighting and shading (monocular cues)
Can you see the shadow cast by each tube? The left tube has a long shadow, stretching across the table towards me. The right tube has a short shadow towards me. Lighting and shading help determine the position of objects in space.
Kinetic depth effect (monocular cues)
What if we rotated a tube and cast its shadows onto a translucent screen – would the depth of its 3D shape be revealed?
Stereopsis, or retinal (binocular) disparity, or binocular parallax (binocular cues)
I am looking at the tubes from the perspective of my left eye and my right eye. The distance between the left tube and the table’s rear corner does not shift much between the left eye and right eye views. However, the distance between the right tube and the table’s rear corner shifts a good deal between the left eye and right eye views. Stereopsis – the disparity between my left eye and my right eye view – informs me that the right tube is nearer to me.
Shadow Stereopsis (binocular cues)
But what if the disparity between my left eye and my right eye view was the same for both the left and right tube? Well, a difference in the shadows they cast can be used to impart depth perception.
Convergence (binocular cues)
The muscles of my eyes stretch as both eyes turn inwards to fixate on an object, bringing together the left eye and right eye views. The difference in shift of the left tube from the centre point is not as great as the difference in shift of the right tube from the centre point. Converging on the right tube stretches the eye muscles more, and the sensation confirms that it is nearer to me.
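The amount the eyes must turn inwards can be sketched as a vergence angle: for an object straight ahead, angle = 2 × atan((eye separation / 2) / distance). The eye separation and tube distances below are assumed values, not measurements.

```python
import math


def vergence_angle_deg(eye_separation_m, distance_m):
    """Angle between the two lines of sight when fixating a point straight ahead."""
    return math.degrees(2 * math.atan((eye_separation_m / 2) / distance_m))


# Hypothetical values: eyes 63 mm apart, tubes at 0.7 m and 1.4 m.
EYE_SEPARATION_M = 0.063

right_tube_angle = vergence_angle_deg(EYE_SEPARATION_M, 0.7)
left_tube_angle = vergence_angle_deg(EYE_SEPARATION_M, 1.4)

print(f"right tube: {right_tube_angle:.2f} deg, left tube: {left_tube_angle:.2f} deg")
```

The angle shrinks quickly with distance, which fits the usual description of convergence as a cue that is mainly useful for nearby objects.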
Accommodation (monocular cues)
The eye muscles stretch the lens thinner as we focus on the left tube, and the sensation confirms that it is further away than the right tube.
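The focusing effort is usually expressed in dioptres, the reciprocal of the viewing distance in metres, so a further object calls for less lens power. A tiny sketch with assumed tube distances:

```python
def accommodation_dioptres(distance_m):
    """Focusing demand in dioptres for an object at distance_m."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return 1.0 / distance_m


# Hypothetical distances: right tube at 0.7 m, left tube at 1.4 m.
right_tube_demand = accommodation_dioptres(0.7)
left_tube_demand = accommodation_dioptres(1.4)

print(f"right tube: {right_tube_demand:.2f} D, left tube: {left_tube_demand:.2f} D")
```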