Arkwood was delighted with my endeavours to accost the postman at his front door – at least, that is what I thought. So it came as a surprise when he told me this morning, ‘Fuckin’ piece of shite, your code. A car caused it to trigger’. Granted, I hadn’t accounted for an automobile engaging with the system, and so set about introducing face detection to the Python program, so that it only struck up conversation with humans.

The OpenCV Python Tutorials provided the code, which I wrapped into my Webcam class:

import cv2
from datetime import datetime

class Webcam(object):

    WINDOW_NAME = "Arkwood's Surveillance System"

    # constructor
    def __init__(self):
        self.webcam = cv2.VideoCapture(0)

    # save image to disk
    def _save_image(self, path, image):
        filename = datetime.now().strftime('%Y%m%d_%Hh%Mm%Ss%f') + '.jpg'
        cv2.imwrite(path + filename, image)

    # obtain changes between images
    def _delta(self, t0, t1, t2):
        d1 = cv2.absdiff(t2, t1)
        d2 = cv2.absdiff(t1, t0)
        return cv2.bitwise_and(d1, d2)

    # detect faces in webcam
    def detect_faces(self):

        # get image from webcam
        img = self.webcam.read()[1]

        # do face/eye detection
        face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
        eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')

        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, 1.3, 5)

        for (x,y,w,h) in faces:

            # draw a rectangle around the face
            cv2.rectangle(img, (x,y), (x+w,y+h), (255,0,0), 2)

            roi_gray = gray[y:y+h, x:x+w]
            roi_color = img[y:y+h, x:x+w]
            eyes = eye_cascade.detectMultiScale(roi_gray)

            # draw a rectangle around each eye
            for (ex,ey,ew,eh) in eyes:
                cv2.rectangle(roi_color, (ex,ey), (ex+ew,ey+eh), (0,255,0), 2)

        # save image to disk
        self._save_image('WebCam/Detection/', img)

        # show image in window
        cv2.imshow(self.WINDOW_NAME, img)
        cv2.waitKey(10)

        # report whether we found any faces
        if len(faces) == 0:
            return False

        return True

    # wait until motion is detected 
    def detect_motion(self):

        # set motion threshold
        threshold = 170000

        # hold three b/w images at any one time
        t_minus = cv2.cvtColor(self.webcam.read()[1], cv2.COLOR_BGR2GRAY)
        t = cv2.cvtColor(self.webcam.read()[1], cv2.COLOR_BGR2GRAY)
        t_plus = cv2.cvtColor(self.webcam.read()[1], cv2.COLOR_BGR2GRAY)

        # now let's loop until we detect some motion
        while True:
          # obtain the changes between our three images 
          delta = self._delta(t_minus, t, t_plus)
          # display changes in surveillance window
          cv2.imshow(self.WINDOW_NAME, delta)
          cv2.waitKey(10)

          # obtain white pixel count i.e. where motion detected
          count = cv2.countNonZero(delta)

          # debug
          #print (count)

          # if the threshold has been breached, save some snaps to disk
          # and get the hell out of function...
          if (count > threshold):

              self._save_image('WebCam/Motion/', delta)
              self._save_image('WebCam/Photograph/', self.webcam.read()[1])

              return True

          # ...otherwise, let's handle a new snap
          t_minus = t
          t = t_plus
          t_plus = cv2.cvtColor(self.webcam.read()[1], cv2.COLOR_BGR2GRAY)

The XML files required for face and eye detection can be found on GitHub: haarcascade_frontalface_default.xml and haarcascade_eye.xml.

The Webcam class now has two public functions, detect_motion and detect_faces. detect_motion was discussed in my previous post, and is all to do with waiting until something moves in front of the webcam and breaches a threshold. detect_faces is the new function, which takes a snap from the webcam and determines whether the motion that triggered the threshold was a human or, say, a vehicle. If it’s got a face and eyes then it’s a human, was my logic.
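
The three-frame differencing at the heart of detect_motion can be sketched without a webcam at all. Here’s a minimal NumPy-only version of the _delta logic – the fake 4x4 frames and names are my own, for illustration:

```python
import numpy as np

# NumPy stand-in for the _delta method: a pixel counts as motion only
# if it changed between t0 and t1 AND between t1 and t2.
def delta(t0, t1, t2):
    d1 = np.abs(t2.astype(np.int16) - t1.astype(np.int16)).astype(np.uint8)
    d2 = np.abs(t1.astype(np.int16) - t0.astype(np.int16)).astype(np.uint8)
    return d1 & d2  # same effect as cv2.bitwise_and on uint8 images

# three fake 4x4 greyscale frames: a single bright pixel moves one step
t0 = np.zeros((4, 4), np.uint8)
t1 = np.zeros((4, 4), np.uint8)
t1[1, 1] = 255
t2 = np.zeros((4, 4), np.uint8)
t2[1, 2] = 255

d = delta(t0, t1, t2)
count = np.count_nonzero(d)  # the analogue of cv2.countNonZero(delta)
```

Only the pixel that differs in both deltas survives the AND, so a static background contributes nothing to the count – which is exactly why the threshold trick works.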

The rest of the program flows as in said previous post, using Google’s text to speech and speech to text services to converse with the visitor. Lovely.

from webcam import Webcam
from speech import Speech

webcam = Webcam()
speech = Speech() 

# wait until motion detected at front door
webcam.detect_motion()

# if faces at front door
if (webcam.detect_faces()):
    # ask visitor for identification
    speech.text_to_speech("State your name punk")

    # capture the visitor's reply
    visitor_name = speech.speech_to_text('/home/pi/PiAUISuite/VoiceCommand/speech-recog.sh')

    # ask visitor if postman
    speech.text_to_speech("Are you the postman " + visitor_name)

    # capture the visitor's reply
    is_postman = speech.speech_to_text('/home/pi/PiAUISuite/VoiceCommand/speech-recog.sh')

    # if postman, provide instruction
    if (is_postman == "yes"):
        speech.text_to_speech("Please leave the parcel at the back gate and leave")
        speech.text_to_speech("Fuck off")
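
One wrinkle worth noting: the is_postman == "yes" comparison is brittle, since Google’s speech-to-text may well come back with "Yes" or "yes." instead. A tiny hypothetical helper (my own addition, not part of the original program) to normalise replies before comparing:

```python
import string

# Hypothetical helper: normalise a speech-to-text reply so that
# "Yes", "yes." and " YES " all compare equal to "yes".
def normalise_reply(reply):
    if reply is None:
        return ""
    return reply.strip().lower().strip(string.punctuation)
```

With that in place, the check becomes if normalise_reply(is_postman) == "yes".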

Before installing the system to monitor my Belgian friend’s front door, I thought it best to take it for a test run. Click the image to see me playing the part of the postman and having my face clocked.


It worked a charm, so I took the gear round to Arkwood’s and set it up to wait for his first visitor. Rubbing my hands with glee, I trotted home and waited for my buddy to telephone with the good news.

The telephone rang a few hours later. Arkwood was in a rage.

‘A fuckin’ squirrel set the motion detection off! What a crock of crap.’

Hm. I hadn’t accounted for a furry rodent having a face and a set of eyes. Damn. Still, it gives me a good excuse to have a go at some facial recognition. Maybe I’ll even get a match on the postman without having to ask him for identification.