Cascade Classifier, Cygwin, Dileep Kumar, Google Text-To-Speech (TTS), Guitar Detection, Haar Cascades, Naotoshi Seo, Object Detection, OpenCV, OpenCV for Windows, Python, Raspberry Pi, Robin Mehner, Webcam
Arkwood said, ‘The Glastonbury Festival is shit. I just wanna watch rock ‘n’ roll, but they’re puttin’ on all sorts of pop and rap crap these days.’ I didn’t much care for his bad language, but nevertheless I replied, ‘Not to worry. I will write some Python code on my Raspberry Pi that will detect when guitars are being played onstage during the Glastonbury TV coverage. That way, you can just watch the rock ‘n’ roll bits’.
So I need some way of detecting when there are guitars on the TV screen. For this I will use OpenCV object detection. Now, I have previously posted about using OpenCV face detection in order to greet my postman. But now I will need to create my own OpenCV haar cascade classifier for detecting guitars. Thankfully there are some great posts already out in the wild, to help me get started.
I began by reading Dileep Kumar’s article on creating a classifier for detecting a pen. What I really like about this article is that it gets straight to the point, with a relatively simple example to get you up and running. As Dileep mentions, he needs far more images to make the classifier robust. But to get the basics in place, it’s well worth a read.
I then spent some considerable time on Naotoshi Seo’s OpenCV haartraining tutorial. So many other articles online make reference to this tutorial – and you can understand why. It is an extensive piece, incorporating some of Naotoshi’s scripts and resources which help in creating a classifier.
I also studied Robin Mehner’s post, which is a fantastically clear example of a classifier for a banana. This post really brought it all together for me because, after a weekend of researching OpenCV haar cascade classifiers, the head really starts to hurt. It’s just one of those things that you have to put a lot of time into, plus some sweat and tears. Well, maybe not tears. But almost.
Anyhow, on with the show.
So, first I take some snaps of my guitars, namely a Fender Stratocaster electric guitar, a Tanglewood Rebel 4K bass guitar and a Washburn acoustic. Maybe I should stick to just classifying electric guitars? We’ll see how it goes.
I took 15 photos in all (five of each guitar) – these will be my cropped ‘positive’ .png images.
For the ‘negative’ images, I took around 200 random snaps of my kitchen, garden, toilet U-bend – anywhere really, as long as there wasn’t a guitar in shot.
Next up, I downloaded Cygwin, which allows me to run Linux tools on my Windows 7 operating system. The setup exe provides a means of downloading the software from a selection of mirrored sites, as well as obtaining specific packages as and when I need them.
I’ll also need to grab the latest version (2.4.9) of OpenCV for Windows, to make use of the bin folder dlls and exes.
Using the Cygwin terminal, I created .dat files which contained the paths to my positive and negative images:
find ./Positive_Images -name '*.png' >positives.dat
find ./Negative_Images -name '*.png' >negatives.dat
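If Cygwin is not to hand, the same file lists could be produced from Python. This is only a minimal sketch of what the two `find` commands do; the function name `write_image_list` is my own invention, and unlike `find` it sorts the paths so the output is repeatable.

```python
import os

def write_image_list(image_dir, list_path, extension=".png"):
    """Write one image path per line, like `find DIR -name '*.png' > FILE`."""
    paths = []
    for root, _dirs, files in os.walk(image_dir):
        for name in files:
            # case-sensitive match, the same as find's -name test
            if name.endswith(extension):
                # forward slashes, as the Cygwin tools expect
                paths.append(os.path.join(root, name).replace(os.sep, "/"))
    paths.sort()  # assumption: a stable order is nicer; find does not sort
    with open(list_path, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
    return paths
```

Calling `write_image_list("./Positive_Images", "positives.dat")` and then the same for `./Negative_Images` should give files equivalent to the ones above.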
Next I used the terminal to execute a helpful script from Naotoshi Seo, which generates the required .vec files.
perl createtrainsamples.pl positives.dat negatives.dat samples 250 "./opencv_createsamples -bgcolor 0 -bgthresh 0 -maxxangle 1.1 -maxyangle 1.1 -maxzangle 0.5 -maxidev 40 -w 40 -h 20"
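To get a feel for what a script like this has to do, here is a rough Python sketch that spreads the 250 requested samples over the positive images and builds one `opencv_createsamples` command per image. It is not a port of Naotoshi Seo's Perl: the function name, the way the remainder is shared out and the use of `negatives.dat` as the background list are all my assumptions, and the real script may choose backgrounds and round the counts differently.

```python
def build_createsamples_commands(positives, total_samples, options,
                                 out_dir="samples"):
    """Split total_samples across positives and build one command per image."""
    base, extra = divmod(total_samples, len(positives))
    commands = []
    for index, image in enumerate(positives):
        # the first `extra` images take one more sample so the counts sum exactly
        count = base + (1 if index < extra else 0)
        name = image.replace("\\", "/").split("/")[-1]
        vec_path = "%s/%s.vec" % (out_dir, name)
        commands.append(
            ["opencv_createsamples", "-img", image, "-bg", "negatives.dat",
             "-vec", vec_path, "-num", str(count)] + list(options)
        )
    return commands
```

With my 15 positives and 250 samples, ten images would get 17 samples and five would get 16. Each command list could be handed to `subprocess.call` to produce one .vec file per positive image.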
I needed some way of combining these .vec files using Cygwin, and once again Naotoshi Seo comes to the rescue:
find samples/ -name '*.vec' > samples.dat
./mergevec.exe samples.dat samples.vec
Now we are ready to create our haar cascade classifier for our guitars. Let’s give these settings a try within a Windows command prompt:
opencv_haartraining -data guitarcascade -vec samples.vec -bg negatives.dat -nstages 12 -nsplits 2 -minhitrate 0.999 -maxfalsealarm 0.5 -npos 250 -nneg 99 -w 40 -h 20 -nonsym -mem 2048 -mode ALL
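It helps to see what -minhitrate and -maxfalsealarm imply for the whole cascade. Both are per-stage targets, so if we assume the stages behave independently (a rough rule of thumb, not a guarantee) the overall figures are those rates raised to the power of the number of stages. The helper name `overall_rates` is made up for this sketch.

```python
def overall_rates(stages, min_hit_rate, max_false_alarm):
    """Return the estimated cascade-wide (hit rate, false alarm rate)."""
    return min_hit_rate ** stages, max_false_alarm ** stages

# the settings used above: -nstages 12 -minhitrate 0.999 -maxfalsealarm 0.5
hit, false_alarm = overall_rates(12, 0.999, 0.5)
print("estimated overall hit rate:    %.4f" % hit)
print("estimated overall false alarm: %.6f" % false_alarm)
```

For these settings that works out at a hit rate of roughly 0.988 and a false alarm rate of 1 in 4096, at least on paper. With only 15 real photos behind the samples, the real-world numbers are unlikely to be that rosy.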
So let’s drum our fingers on the table and wait for the training process to complete on Arkwood’s PC. Whilst we are waiting, I should mention that I transferred my files to Arkwood’s box to do the training, as it is a much better spec than mine. i5 chipset, 8GB RAM, he says.
Hurray! The training has completed and we have a guitarcascade.xml output file – our very own guitar classifier.
So, how exactly are we going to use it to detect guitars on a TV screen? Well, this is where our Python code and the Raspberry Pi come in. I’ll attach a webcam to my Pi, pointing it towards the TV screen. Then I’ll run some Python on my Pi – which will take pictures of the TV screen with said webcam, and then use the guitar classifier to determine if there are any guitars in shot. If there are guitars, the program will inform Arkwood by way of announcing a message through a set of speakers attached to the Pi.
Here’s the main program:
from webcam import Webcam
from speech import Speech

webcam = Webcam()
speech = Speech()

# wait until guitars detected
while not webcam.detect_guitars():
    print ("no guitars detected yet")

# now tell Arkwood
speech.text_to_speech("come and watch some rock and roll")
And here’s the Webcam class that detects guitars:
import cv2
from datetime import datetime

class Webcam(object):

    WINDOW_NAME = "Arkwood's Surveillance System"

    # constructor
    def __init__(self):
        self.webcam = cv2.VideoCapture(0)

    # save image to disk
    def _save_image(self, path, image):
        filename = datetime.now().strftime('%Y%m%d_%Hh%Mm%Ss%f') + '.jpg'
        cv2.imwrite(path + filename, image)

    # detect guitars in webcam
    def detect_guitars(self):

        # get image from webcam (read() returns a success flag and the frame)
        ret, img = self.webcam.read()
        if not ret:
            return False

        # do guitar detection (webcam frames arrive in BGR channel order)
        guitar_cascade = cv2.CascadeClassifier('haarcascade_guitar.xml')
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        guitars = guitar_cascade.detectMultiScale(gray, 1.3, 5)

        for (x,y,w,h) in guitars:
            cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,0),2)

        # save image to disk
        self._save_image('WebCam/Detection/', img)

        # show image in window
        cv2.imshow(self.WINDOW_NAME, img)
        cv2.waitKey(2000)

        # tidy and quit
        cv2.destroyAllWindows()

        if len(guitars) == 0:
            return False

        return True
To announce a message to Arkwood, I use Google’s Text To Speech service:
from subprocess import PIPE, call
import urllib

class Speech(object):

    # converts text to speech
    def text_to_speech(self, text):
        try:
            # truncate text as google only allows 100 chars
            text = text[:100]

            # encode the text
            query = urllib.quote_plus(text)

            # build endpoint
            endpoint = "http://translate.google.com/translate_tts?tl=en&q=" + query

            # debug
            print(endpoint)

            # get google to translate and mplayer to play
            call(["mplayer", endpoint], shell=False, stdout=PIPE, stderr=PIPE)
        except:
            print ("Error translating text")
I am targeting Python 2.7.3 and OpenCV 2.3 on the Raspberry Pi.
So I guess you’ll want to ask me, How the devil did it perform?
Okay, I admit, it has not detected a guitar as such. In fact, it has detected Joey Ramone holding a baseball bat during a live performance of his song Beat on the Brat. But this is in the early stages of testing, using a YouTube video and a small sample of positive and negative images. I remain hopeful.
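One cheap way to make a stray baseball bat less likely to summon Arkwood would be to insist that several recent frames agree before announcing anything. The class below is only a sketch of that idea and not something I have run on the Pi; `DetectionSmoother` and its `window` and `required` values are hypothetical and would need tuning.

```python
from collections import deque

class DetectionSmoother(object):
    """Report a detection only when enough recent frames agree."""

    def __init__(self, window=5, required=3):
        self.required = required
        # deque with maxlen drops the oldest result automatically
        self.recent = deque(maxlen=window)

    def update(self, detected):
        # record the latest frame and count the hits still in the window
        self.recent.append(bool(detected))
        return sum(self.recent) >= self.required
```

In the main program the loop condition could then become `while not smoother.update(webcam.detect_guitars()):`, so a single false positive on its own would no longer trigger the announcement.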
I showed Arkwood the results. ‘Hey, even better!’ my malnourished buddy said, ‘Let’s make it recognise weapons, like guns and knives. And dead bodies!’ I told him No, I shall stick to the more timid grounds of rock ‘n’ roll, and aim to have the guitar classifier ready for Glastonbury Festival 2014. I’ll update the blog with my progress, as I gain a firmer grip on OpenCV.
OpenCV provides a convert_cascade tool for generating a partial classifier file during training, in case you want to check up on progress. It also provides an opencv_performance tool which you can run against the completed classifier file to determine if it is likely to be a dud or a roaring success. Again, Naotoshi Seo’s tutorial does a great job of detailing these tools.
Resources such as mergevec.exe (incl. dependencies highgui100.dll and cxcore100.dll) can be found on GitHub (Google blocked me downloading resources on Naotoshi Seo’s tutorial).
I got some extra packages for Cygwin using a command line approach highlighted in jessies wiki.
I converted my images from .jpg to .png using the ImageMagick Mogrify Command-Line Tool from within the Cygwin terminal:
mogrify -format png *.JPG
I was able to view images in my .vec file with the following Windows command:
opencv_createsamples -vec samples.vec -w 40 -h 20
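For a quick sanity check without opening the viewer, the header of the .vec file can be read from Python. This sketch assumes the layout I understand the OpenCV tools to write: a 12-byte little-endian header holding the sample count, the sample size in pixels and two unused 16-bit fields. Treat that layout, and the helper name `read_vec_header`, as assumptions rather than documented fact.

```python
import struct

def read_vec_header(path):
    """Return (sample_count, sample_size) from the start of a .vec file."""
    with open(path, "rb") as handle:
        header = handle.read(12)
    if len(header) < 12:
        raise ValueError("file too short to be a .vec file")
    # assumed layout: int32 count, int32 size, two int16 fields (little-endian)
    count, size, _unused_a, _unused_b = struct.unpack("<iihh", header)
    return count, size
```

If the assumption holds, `read_vec_header("samples.vec")` should report a sample size of 800 for my 40 by 20 samples, and a count matching the number of samples that were merged.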