‘Facial Recognition’ is a term which has become almost commonplace in the tech world, but for some it can still seem complicated and intimidating.
A wide variety of libraries exist which, combined with a little bit of smarts, make it achievable.
There is still a reasonable barrier to entry with face recognition; if you’re a complete non-techy, you might struggle to follow this guide.
Our goal was to create a system that can detect and identify faces from a real-time video stream, under two constraints:
- We could only use open-source tools.
- It had to be able to run on a Raspberry Pi.
Want to go straight to the good stuff? Check out the project on GitHub!
Firstly, the required hardware:
- Raspberry Pi (we’re going to be pretty much maxing out our Pi’s processing power so either get a kit with a fan included or get one separately)
- Raspberry Pi Camera Module
- A camera stand (this will help with camera positioning)
Once you’ve got your Raspberry Pi set up, you’ll then need to get all the necessary tools installed. There are two main libraries we will be using, face_recognition (which is built on Dlib) and OpenCV:
Dlib is a C++ toolkit used for a variety of machine learning solutions, including deep learning facial recognition. It offers face detection and face identification capabilities as well as facial feature identification.
The face_recognition library is an awesome wrapper around Dlib which allows us to write Python code rather than C++!
OpenCV is a popular open-source computer vision library. Combined with Python, it makes it incredibly quick to put together computer vision solutions and offers a wide array of capabilities including image processing and object detection.
To get them both installed, you can follow these handy tutorials:
In order to identify a face within an image, we first need to find the face. The face_recognition module offers face detection capabilities; however, it struggles to find faces that are not in the near field of vision. That makes it good for close-up photos but not great for what we wanted to achieve.
OpenCV has multiple options for face detection, one of which is a Haar Cascade Classifier. A whole library of these cascades exists; they are essentially out-of-the-box object detection models (though cascades can also be trained). The one of particular interest to us was the Frontal Face Cascade (haarcascade_frontalface_alt.xml).
Although it can detect faces that are further away than the face_recognition module can, it can only reliably detect faces that are looking straight ahead, which also doesn’t cut the mustard for us.
If a person’s head is even slightly tilted, it really struggles to identify it as a face and, unfortunately, people tend to move their heads. The method that we found to perform best for our context is the deep learning-based face detector in OpenCV. It is able to identify faces at a reasonable distance and from multiple angles.
We need two files in order to load the deep learning model: the .prototxt file and the .caffemodel file. The first describes the network architecture (the blueprint of the model) and the second holds the trained weights (in our case, trained for face detection).
detection_model = cv2.dnn.readNetFromCaffe("./caffe/deploy.prototxt","./caffe/res10_300x300_ssd_iter_140000.caffemodel")
Now we can start processing images. The model only accepts blobs with the correct dimensions, in this case 300x300 pixels. First resize the images and convert them into blobs.
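def normalize_frame(frame):
    # Remember the original size so boxes can be scaled back later
    h, w = frame.shape[:2]
    resized_frame = cv2.resize(frame, (300, 300))
    # Scale factor 1.0, 300x300 input and the per-channel mean values to subtract
    blob = cv2.dnn.blobFromImage(resized_frame, 1.0, (300, 300), (104.0, 177.0, 123.0))
    return resized_frame, blob, h, w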
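For intuition, here is a rough NumPy sketch of what we understand blobFromImage to do with these arguments when the image is already 300x300: subtract the per-channel mean, apply the scale factor, and reorder the array from height x width x channels into the batch x channels x height x width layout the network expects. The helper name manual_blob is ours, and this is an approximation for illustration rather than OpenCV’s actual implementation.

```python
import numpy as np


def manual_blob(image_bgr, scale=1.0, mean=(104.0, 177.0, 123.0)):
    """Approximate blobFromImage for an already-resized BGR image (illustrative only)."""
    pixels = image_bgr.astype(np.float32)
    # Subtract the mean from each channel (broadcast over the last axis)
    pixels -= np.array(mean, dtype=np.float32)
    pixels *= scale
    # HWC -> CHW, then add a leading batch dimension
    return pixels.transpose(2, 0, 1)[np.newaxis, ...]


# A flat grey dummy image stands in for a resized camera frame
dummy = np.full((300, 300, 3), 128, dtype=np.uint8)
blob = manual_blob(dummy)
print(blob.shape)  # (1, 3, 300, 300)
```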
Next we give the blob to the model and output the location of the face within the blob.
def detect_faces_in_frame(detection_model, blob):
    detection_model.setInput(blob)
    detections = detection_model.forward()
    face_locations = []
    # detections has shape (1, 1, N, 7): one row per candidate detection
    for i in range(0, detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > 0.8:
            face_locations.append(detections[0, 0, i, 3:7])
    return np.array(face_locations)
In the first two lines we are submitting the image to the model. We then iterate through the results and compare the model’s confidence to our confidence threshold of 0.8 to determine if each one is a face or not. If it is, then we add its location to our collection.
The indices 3:7 describe a box drawn around the face. More precisely, they hold the positions of the left, top, right and bottom edges of the box (in that order), each expressed as a fraction of the image width or height.
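The same filtering can also be written without an explicit loop. The sketch below uses a hand-made array in the shape the detector returns for one image, (1, 1, N, 7), where each row holds an image id, a class id, the confidence and the four box coordinates. The function name filter_detections and the sample numbers are made up for illustration.

```python
import numpy as np


def filter_detections(detections, threshold=0.8):
    """Return the box columns (3:7) of every row whose confidence beats the threshold."""
    rows = detections[0, 0]            # shape (N, 7)
    keep = rows[:, 2] > threshold      # boolean mask over the N candidates
    return rows[keep, 3:7]


# Three made-up candidates: two confident ones and one weak one
sample = np.array([[[
    [0, 1, 0.95, 0.10, 0.20, 0.30, 0.50],
    [0, 1, 0.40, 0.60, 0.60, 0.70, 0.80],
    [0, 1, 0.85, 0.55, 0.15, 0.75, 0.45],
]]])

boxes = filter_detections(sample)
print(boxes.shape)  # (2, 4)
```

Raising the threshold trades missed faces for fewer false positives; 0.8 is simply the value we settled on.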
Once we have the coordinate locations of all the faces in our image, we can label and identify them.
In order to use the face_recognition module to encode the faces, we need to convert our resized image from OpenCV’s BGR channel order to RGB.
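def convert_to_rgb(frame):
    # Reverse the channel axis: BGR -> RGB
    rgb = frame[:, :, ::-1]
    return rgb

def encode_faces_from_locations(frame, locations):
    # locations are pixel boxes in (top, right, bottom, left) order
    encodings = face_recognition.face_encodings(frame, locations)
    return encodings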
The face_encodings function translates the face into a 128-dimensional embedding: an array which represents its defining features as a face. The underlying model has been trained to represent faces accurately for comparison.
Now we need something to compare our faces to. We organised images into folders named after each person and created a script to iterate through each folder and encode each image. The encoding was then stored against the person. A person could be identified relatively reliably using only one image; however, we decided to write it in such a way that more images could be added to give more accurate and reliable results.
import os

import cv2
import numpy as np

# normalize_frame, detect_faces_in_frame, convert_to_rgb and
# encode_faces_from_locations are the helper functions defined above

# Load the detection model from disk
prototxt_path = "./caffe/deploy.prototxt"
model_path = "./caffe/res10_300x300_ssd_iter_140000.caffemodel"
detection_model = cv2.dnn.readNetFromCaffe(prototxt_path, model_path)

employee_encodings = []
employee_directory_path = "./employee_directory"

# Get employee folders
employee_folders = os.listdir(employee_directory_path)
for employee_folder in employee_folders:
    employee_folder_path = os.path.join(employee_directory_path, employee_folder)

    # Get all images of the employee
    images = os.listdir(employee_folder_path)
    for image_name in images:
        image_path = os.path.join(employee_folder_path, image_name)
        image = cv2.imread(image_path)
        resized_frame, blob, h, w = normalize_frame(image)

        # Find faces in the image and check that exactly one exists
        face_location = detect_faces_in_frame(detection_model, blob)
        if face_location.shape[0] != 1:
            raise ValueError("Multiple/no faces identified in training image", image_path)

        # Encode the face: relative box -> pixel box in face_recognition order
        left, top, right, bottom = face_location[0]
        rgb_frame = convert_to_rgb(resized_frame)
        face_encoding = encode_faces_from_locations(
            rgb_frame,
            [(int(top * 300), int(right * 300), int(bottom * 300), int(left * 300))],
        )[0]

        # Associate the encoding with the employee using the employee name
        # (a plain list avoids NumPy truncating long names to the width of the number strings)
        row = [employee_folder] + [str(x) for x in face_encoding]
        employee_encodings.append(row)

# Save the encodings
np.savetxt("employee_encodings.csv", employee_encodings, fmt='%s', delimiter=',')
Note that when we submit the faces for encoding, we alter both the ordering and the scale of the face location. OpenCV represents the position of the face in the image on a scale of 0 - 1, while the face_recognition module needs the actual pixel position. Before we can encode the image we need to translate the OpenCV location into the face_recognition location. Watch out: they also use different orderings, (left, top, right, bottom) for OpenCV versus (top, right, bottom, left) for face_recognition.
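As a small sketch of that translation, the helper below converts one relative box into the pixel tuple that face_recognition expects. The name to_face_recognition_box is hypothetical, and the clamping to the image bounds is our own precaution (on the assumption that a detector can occasionally report coordinates slightly outside 0 - 1) rather than something the original script does.

```python
def to_face_recognition_box(box, width=300, height=300):
    """Convert a relative (left, top, right, bottom) box to pixel (top, right, bottom, left)."""
    left, top, right, bottom = box

    def clamp(value, upper):
        # Truncate to an integer pixel and keep it inside the image
        return max(0, min(int(value), upper))

    return (
        clamp(top * height, height),
        clamp(right * width, width),
        clamp(bottom * height, height),
        clamp(left * width, width),
    )


print(to_face_recognition_box((0.25, 0.1, 0.75, 0.9)))  # (30, 225, 270, 75)
```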
All we need to do now is load our stored encodings and compare our new face’s encoding against them.
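Loading the stored encodings might look something like the sketch below. It assumes the file layout written by the training script (one row per image, the name first and then the encoding values) and uses a tiny in-memory stand-in with three-value encodings instead of the real 128; the function name load_encodings is ours.

```python
import io

import numpy as np


def load_encodings(source):
    """Read rows of 'name,v1,v2,...' into a 2-D array of strings."""
    # dtype=str keeps the name column intact; ndmin=2 keeps a single row two-dimensional
    return np.loadtxt(source, dtype=str, delimiter=",", ndmin=2)


# Stand-in for employee_encodings.csv (a real file path works the same way)
sample_csv = io.StringIO(
    "alice,0.1,0.2,0.3\n"
    "alice,0.2,0.2,0.4\n"
    "bob,0.9,0.8,0.7\n"
)

stored = load_encodings(sample_csv)
names = stored[:, 0]                    # first column: the person's name
vectors = stored[:, 1:].astype(float)   # remaining columns: the encoding
print(names, vectors.shape)
```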
import face_recognition
import numpy as np

def identify_face_from_encoding(encoding_to_compare, threshold, stored_employee_encodings):
    # First column holds the employee name, the rest is the stored encoding
    names = stored_employee_encodings[:, 0]
    encodings = stored_employee_encodings[:, 1:].astype(float)
    face_distances = face_recognition.face_distance(encodings, encoding_to_compare)

    # Get all the possible employee matches
    employees = np.unique(names)

    # Calculate the mean distance for each employee
    mean_distances = np.array([face_distances[names == employee].mean() for employee in employees])

    # Get the employee with the minimum average distance
    # (compared as floats rather than as strings)
    best = int(np.argmin(mean_distances))
    name, distance = str(employees[best]), float(mean_distances[best])

    # If the minimum distance is within the desired threshold then return the identified employee
    if distance <= threshold:
        return True, name, distance
    return False, 'Unknown', distance
Under the hood, face_distance calculates the Euclidean distance (the difference) between our encoding and each of our comparison encodings. Then we find the smallest mean of the distances over all the faces of a person and compare it against our given threshold to determine if the faces are similar enough to actually be classed as the same person.
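To make the arithmetic concrete, here is a dependency-free sketch of the same idea using toy three-value vectors instead of real 128-value encodings. The function name closest_person, the names and the numbers are all made up for illustration.

```python
import math
from collections import defaultdict


def closest_person(known, candidate, threshold):
    """known is a list of (name, vector) pairs; returns (recognised, name, mean_distance)."""
    distances = defaultdict(list)
    for name, vector in known:
        # Euclidean distance between a stored vector and the candidate
        distances[name].append(math.dist(vector, candidate))

    # Average the distances per person, then pick the closest person
    means = {name: sum(values) / len(values) for name, values in distances.items()}
    best = min(means, key=means.get)

    if means[best] <= threshold:
        return True, best, means[best]
    return False, "Unknown", means[best]


known = [
    ("alice", (0.0, 0.0, 0.0)),
    ("alice", (0.0, 0.0, 2.0)),
    ("bob", (3.0, 4.0, 0.0)),
]
print(closest_person(known, (0.0, 0.0, 1.0), threshold=1.5))  # (True, 'alice', 1.0)
```

Averaging over several images of the same person means a single unusually good or bad stored photo has less influence on the result.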
Visualising the Results
Now we have set out the groundwork, we can start recognising people in a live video. The full code can be viewed here but for now we will just look at the important bits.
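import time
from imutils.video import VideoStream  # assumption: VideoStream is the imutils helper class

video_capture = VideoStream(src=0).start()
time.sleep(1.0)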
Start up the video stream and give it time to get going before we start grabbing frames.
# Initialise the state the loop relies on (not shown in the original excerpt)
count = 0
face_locations, face_names = [], []

while True:
    # Grab a single frame of video
    frame = video_capture.read()
    resized_frame, blob, h, w = normalize_frame(frame)

    if count == 3:
        face_locations = detect_faces_in_frame(detection_model, blob)
        rgb_frame = convert_to_rgb(resized_frame)
        # Relative (left, top, right, bottom) -> pixel (top, right, bottom, left)
        face_encodings = encode_faces_from_locations(
            rgb_frame,
            [(int(top * 300), int(right * 300), int(bottom * 300), int(left * 300))
             for (left, top, right, bottom) in face_locations])
        face_names = [identify_face_from_encoding(encoding, threshold, stored_employee_encodings)
                      for encoding in face_encodings]
        count = 0
    count += 1
As in the training script from earlier, we are finding faces in the image, this time a frame from the video stream, and encoding them. Then we submit each encoding and our stored comparison encodings to our identification function.
We only run our detection every 3 frames to help the video run more smoothly. If you were running the script on a machine with more processing power, it wouldn’t be necessary to do this. We just wanted to be kind to the Raspberry Pi!
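The frame-skipping idea can be sketched independently of any camera. The version below uses a modulo check rather than the counter from our loop, and stand-in frames and a stand-in detector; process_stream and fake_detect are hypothetical names used only for this illustration.

```python
def process_stream(frames, detect, every=3):
    """Run detect on every `every`-th frame and reuse the last result in between."""
    results = []
    last = None
    for index, frame in enumerate(frames):
        if index % every == 0:
            last = detect(frame)   # the expensive step
        results.append(last)       # the cheap step: reuse what we already know
    return results


calls = []


def fake_detect(frame):
    # Stand-in for face detection + encoding; records which frames were processed
    calls.append(frame)
    return f"faces@{frame}"


out = process_stream(range(7), fake_detect)
print(calls)  # [0, 3, 6]
```

The trade-off is that boxes and labels can lag behind a fast-moving face by a frame or two.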
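    # (still inside the while loop)
    for (left, top, right, bottom), (recognised, name, certainty) in zip(face_locations, face_names):
        # Scale the relative box back up to the size of the original frame
        top = int(top * h)
        bottom = int(bottom * h)
        left = int(left * w)
        right = int(right * w)

        # If we know who it is, draw in green, else draw in red
        colour = (0, 255, 0)
        if not recognised:
            colour = (0, 0, 255)

        # Box around the face, plus a filled strip along the bottom for the label
        cv2.rectangle(frame, (left, top), (right, bottom), colour, 2)
        cv2.rectangle(frame, (left, bottom), (right, bottom - 30), colour, cv2.FILLED)
        font = cv2.FONT_HERSHEY_SIMPLEX
        cv2.putText(frame, name, (left + 6, bottom - 6), font, (right - left) / 250, (255, 255, 255), 1)

    cv2.imshow('Video', frame)
    # imshow only refreshes the window when waitKey is called; press q to quit
    # (this check is not in the original excerpt)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break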
Now we have a handy box highlighting each face and a label identifying who they are (if we know)!
Once you can identify faces, it’s totally up to you what you do. We kept it simple for this example and opted to just draw a box and the name of the person identified. You could easily add a “stranger danger” siren when an unrecognised face is detected.
There are a vast number of other cool and interesting things that people have done using face_recognition that you can add to your solution, such as adding makeup to detected faces. There are also, of course, many serious applications of facial recognition, all of which follow the same process as we have in this tutorial.
Just think what else computer vision could do, not just limited to faces and people. How could this be applied to your organisation?
If you’re looking for a technical partner to help your company design and implement cutting-edge solutions then please do get in touch. Otherwise, if you are interested in creating solutions to complex tech problems then check out our current job openings.
Also check out our slightly longer demo below: