Face Recognition System Design Considerations

Building a face recognition algorithm involves a whole string of design decisions, each of which will impact how well the system can identify faces.

Here is a partial list of these decisions (*):

– Choice of faces used to train the detection algo. Note that both a positive and negative set is needed. In other words you need to show it faces it should recognize, and faces (or other images) it should reject.
– Image preprocessing decisions (size, greyscale, normalization).
– Face alignment corrections.
– Number of eigen vectors to use. In other words, the number of dimensions of your face space.
– Number of low order eigen vectors to exclude. The idea being that low order vectors encode irrelevant (for face recognition) information like lighting.
– Similarity measures. Aka nearest neighbor measures. (Euclidean vs Mahalanobis etc)
– Choice of PCA methodology (Eigenfaces vs Fisherfaces).

After laboriously doing numerous comparisons across multiple training sets on my own, I discovered this paper published in 2001 where some researchers investigated these issues in a very systematic fashion.
Computational and Performance Aspects of PCA-based Face Recognition Algorithms, by H. Moon and P. J. Phillips, Perception, 2001, Vol 30, pg 303-321 and (NISTIR 6486)
http://www.nist.gov/customcf/get_pdf.cfm?pub_id=151458

I was reassured to see that their conclusions were close to my own (which I’ll write up later). It is worth reading the entire paper, but some of main conclusions I took away are:

– Image Normalization helps, but the particular implementation is not critical.
– Performance increases until approximately 200 eigenvectors, then decreases slightly. However much of the performance gain can be captured with 100 eigenvectors.
– Removing the first low-order eigen vectors is best, removing the 2nd helps slightly.
– The similarity measure has a huge effect. Using an enhanced Mahalanobis classifier gives the best result.

(*) note that I’m following the framework implemented in the OpenCV software, namely using Haar classifiers for detection and some form of PCA for recognition.