DEVELOPMENT OF A SYSTEM FOR CLASSIFYING AND RECOGNIZING A PERSON'S FACE
DOI: 10.31618/ESU.2413-9335.2020.4.73.677
1Boranbayev S.N., 2Amirtayev M.S.
1,2 L.N.Gumilyov Eurasian National University, Nur-Sultan, Kazakhstan
ABSTRACT
The purpose of this article is to summarize the knowledge gained in the development and implementation of a neural network for facial recognition. Neural networks are used to solve complex tasks that require analytical calculations similar to those performed by the human brain, and machine learning algorithms are their foundation. As input, the system receives an image containing people's faces and searches for faces in it using HOG (Histogram of Oriented Gradients). The result is a set of image regions with explicit face structures. To determine unique facial features, the face landmark algorithm is used, which finds 68 special points on the face. These points are used to center the eyes and mouth for more accurate encoding. Image encoding then produces an accurate "face map" consisting of 128 dimensions. Using the obtained data, the system identifies people's faces by combining the convolutional neural network with a linear SVM classifier.
Keywords: face recognition; method; algorithm; neural network; machine learning; deep learning; histogram; oriented gradients; affine transformations.
1. Introduction
Interest in face detection and recognition has always been significant, especially in connection with growing practical needs: security systems, credit card verification, forensic analysis, teleconferencing, etc. Although people identify faces with ease, it is not obvious how to teach a computer to do this, including how to encode and store digital images of faces. Unlike technologies based on fingerprint, iris, or retina scanning, authentication based on facial recognition does not require physical contact between the person and the device. Given the rapid development of digital technology, authentication based on a face image is the most acceptable option for mass use. Neural networks, in turn, are one of the most common methods of facial recognition.
Neural networks are trained on sets of training examples. Training amounts to correcting the weights of inter-neuron connections while solving an optimization problem by gradient descent. During training, key features are identified automatically, their significance is determined, and the relationships between them are built. It is assumed that the trained neural network (NN) will be able to apply the experience gained during learning to unknown images through its ability to generalize.
Various machine learning algorithms are used to create a neural network. After an empirical study of the most popular algorithms, the best-performing ones were selected.
2. System of classification and recognition of human face
To solve these tasks, the following main steps can be identified:
2.1 Search for all faces in the photo: this is necessary in order to select the area of the image that is passed on for further processing. The HOG (Histogram of Oriented Gradients) method is used for this purpose [1].
According to this algorithm, the image is first converted to grayscale, because color data is not needed to search for faces. The algorithm then looks at each pixel and its neighbors to find out how dark the current pixel is compared to the surrounding ones, and draws an arrow showing the direction in which the image gets darker. After this procedure is repeated for every pixel, each pixel is replaced with an arrow. These arrows are called gradients, and they show the flow from light to dark across the whole image. Dark and light images of the same person have different pixel brightness values, but the direction of brightness changes is the same, so the gradient representation is the same regardless of the brightness of the original image. To save resources and get rid of redundant information, the image is divided into squares of 16x16 pixels, and each square is replaced with an arrow pointing in the direction shared by the majority of its gradients [2].
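The gradient step described above can be illustrated with a minimal NumPy sketch. It is not the detector used in this work: the function names, the 9 orientation bins, and the random test image are assumptions made for illustration only. It computes per-pixel gradients, builds one orientation histogram per 16x16 square, and checks that a uniformly brighter copy of the image yields the same dominant directions.

```python
import numpy as np


def gradient_orientations(gray):
    """Per-pixel gradient magnitude and unsigned orientation in degrees, [0, 180)."""
    gy, gx = np.gradient(gray.astype(np.float64))  # change along rows, then columns
    magnitude = np.hypot(gx, gy)
    orientation = np.degrees(np.arctan2(gy, gx)) % 180.0
    return magnitude, orientation


def hog_cells(gray, cell=16, bins=9):
    """Magnitude-weighted histogram of gradient orientations for each square cell."""
    magnitude, orientation = gradient_orientations(gray)
    rows, cols = gray.shape[0] // cell, gray.shape[1] // cell
    # Map each orientation to a bin; the clamp guards against rounding at 180.
    bin_index = np.minimum((orientation // (180.0 / bins)).astype(int), bins - 1)
    histograms = np.zeros((rows, cols, bins))
    for r in range(rows):
        for c in range(cols):
            block = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            histograms[r, c] = np.bincount(
                bin_index[block].ravel(),
                weights=magnitude[block].ravel(),
                minlength=bins,
            )
    return histograms


# Synthetic stand-in for a grayscale photo and a uniformly brighter copy of it.
rng = np.random.default_rng(0)
dark = rng.integers(0, 100, size=(64, 64)).astype(np.float64)
bright = dark + 100.0

# The "majority arrow" of each square is the histogram bin with the largest weight.
dark_arrows = hog_cells(dark).argmax(axis=2)
bright_arrows = hog_cells(bright).argmax(axis=2)
print(np.array_equal(dark_arrows, bright_arrows))
```

Because a constant brightness offset cancels out in the pixel differences, both images produce identical arrows, which is the invariance the text relies on.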
Eventually, the original image is transformed so that the basic structure of the face is clearly visible. To find a face in a HOG image, you need to find the part of the image that most closely resembles a known HOG pattern extracted from a variety of other faces during training. An example of a fragment highlighted in an image is shown in Figure 1.
[Figure 1 panels: HOG version of our image; HOG face pattern generated from lots of face images]
Figure 1 - Search for a face in a photo [3]
2.2 Face position: to make it easier for the computer to work with faces that are turned, each image must be transformed so that the eyes and lips are always in a specific place. To solve this problem, we use the face landmark estimation algorithm [4]. The basic idea is that there are 68 special points (called landmarks) that exist on each face: the top of the chin, the outer edge of each eye, the inner edge of each eyebrow, and so on. A machine learning algorithm is then trained to find these 68 points on any face. Figures 2 and 3 show the selected face landmarks.
Figure 2 - 68 face landmarks [3]
Figure 3 - The identification of the landmarks of the face [5]
When the eyes and mouth are located in an image, the image can be rotated, scaled, and shifted so that the eyes and mouth are centered as well as possible. For centering, only basic image transformations that preserve parallel lines are used, such as rotation and scaling; these are called affine transformations.
This method of image transformation is based on the following principle: for each point in the final image, a fixed set of points in the source image is taken and interpolated according to their relative positions. The affine transformation is the most general one-to-one mapping of a plane to a plane that preserves straight lines and the ratio of lengths of segments lying on a single line. After this transformation, the face is centered in approximately the same position in the image, which makes the next step more accurate.
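As an illustration of such an alignment, the sketch below builds the 2x3 matrix of an affine transformation (rotation, uniform scaling, and shift) that moves two detected eye centers onto fixed template positions. The function names, the eye coordinates, and the template positions are hypothetical; the sketch transforms points only, not image pixels.

```python
import numpy as np


def similarity_from_eyes(left_eye, right_eye, target_left, target_right):
    """2x3 affine matrix (rotation, uniform scale, shift) that maps the detected
    eye centers onto the target eye positions."""
    src = np.subtract(right_eye, left_eye).astype(float)
    dst = np.subtract(target_right, target_left).astype(float)
    scale = np.linalg.norm(dst) / np.linalg.norm(src)
    angle = np.arctan2(dst[1], dst[0]) - np.arctan2(src[1], src[0])
    cos_a, sin_a = scale * np.cos(angle), scale * np.sin(angle)
    linear = np.array([[cos_a, -sin_a], [sin_a, cos_a]])
    shift = np.asarray(target_left, dtype=float) - linear @ np.asarray(left_eye, dtype=float)
    return np.hstack([linear, shift[:, None]])


def apply_affine(matrix, points):
    """Apply a 2x3 affine matrix to an array of (x, y) points."""
    points = np.asarray(points, dtype=float)
    return points @ matrix[:, :2].T + matrix[:, 2]


# Hypothetical eye centers of a tilted face and the template positions we want.
left_eye, right_eye = (30.0, 50.0), (70.0, 70.0)
target_left, target_right = (40.0, 60.0), (110.0, 60.0)

M = similarity_from_eyes(left_eye, right_eye, target_left, target_right)
aligned = apply_affine(M, [left_eye, right_eye])
print(aligned)
```

To warp the image itself, the same matrix could be passed to an image library routine such as OpenCV's `cv2.warpAffine`; that step is left out here to keep the sketch free of extra dependencies.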
2.3 Face encoding: the face image obtained after the first steps must be "encoded" in a unique way, since comparing the resulting image with all previously stored ones is impractical when the image database is large. So we need a way to take a few basic measurements from each face, which we can compare with the closest known measurements to find the most similar face. In the 1960s, Woodrow Bledsoe proposed an algorithm that identifies obvious facial features. However, measurements that seem obvious to people (eye color, nose size, and so on) do not really make sense to a computer that looks at individual pixels in an image. Researchers have shown that the most accurate approach is to let the computer decide what to measure: deep learning determines which parts of the face to measure better than people do [6].
To solve this problem, a deep convolutional neural network is trained to generate 128 dimensions (a face map) for each person (Figure 4). The idea of converting images into a list of computer-generated numbers is extremely important for machine learning (especially for automated translation).
Figure 4 - Measurement results for the test image [7]
The algorithm of image analysis: 3 images are analyzed to train the network:
1. Training image of a famous person's face
2. Another photo of the same famous person
3. An image of a completely different person
The algorithm then looks at the measurements it makes for each of these three images. Then it tweaks the neural network a little to make sure that the dimensions created for images #1 and #2 are more similar, and the dimensions for #2 and #3 are less similar. A schematic representation of the algorithm is shown in Figure 5.
Tweak neural net slightly so that the measurements for the two Will Farrell pictures are closer and the Chad Smith measurements are further away
Figure 5 - Demonstration of the algorithm [7]
After repeating this step m times for n images of different people, the neural network is able to reliably create 128 characteristics for each person. Any 10-15 different images of the same person should then give approximately the same characteristics.
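The tweaking step can be written as a triplet loss in the spirit of [6]. The sketch below is a simplification under stated assumptions: it applies one gradient step directly to three 128-dimensional vectors instead of to the network weights, and the margin, learning rate, and synthetic vectors are illustrative values, not those used to train the model in this article.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on the squared distances: same-person pairs should be closer than
    different-person pairs by at least `margin`."""
    pos_dist = np.sum((anchor - positive) ** 2)
    neg_dist = np.sum((anchor - negative) ** 2)
    return max(pos_dist - neg_dist + margin, 0.0)


def tweak(anchor, positive, negative, lr=0.01, margin=0.2):
    """One gradient-descent step on the three vectors (a stand-in for updating
    the network weights that produce them)."""
    if triplet_loss(anchor, positive, negative, margin) == 0.0:
        return anchor, positive, negative  # constraint already satisfied
    grad_anchor = 2.0 * (negative - positive)
    grad_positive = -2.0 * (anchor - positive)
    grad_negative = 2.0 * (anchor - negative)
    return (
        anchor - lr * grad_anchor,
        positive - lr * grad_positive,
        negative - lr * grad_negative,
    )


# Synthetic worst case: the same person's second photo starts far away,
# while a different person starts close to the first photo.
rng = np.random.default_rng(0)
photo_1 = rng.normal(size=128)
photo_2 = photo_1 + rng.normal(scale=1.0, size=128)   # same person
photo_3 = photo_1 + rng.normal(scale=0.1, size=128)   # different person

before = triplet_loss(photo_1, photo_2, photo_3)
after = triplet_loss(*tweak(photo_1, photo_2, photo_3))
print(before, after)
```

Each step pulls the two photos of the same person together and pushes the third photo away, which is the behavior shown schematically in Figure 5.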
The algorithm makes it possible to search the database for the image whose characteristics are closest to those of the image being looked up [8].
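A minimal sketch of such a search, assuming the face maps are stored as rows of a NumPy array and compared by Euclidean distance. The function name, the synthetic vectors, and the cut-off value of 0.6 are illustrative assumptions; the names are borrowed from the test folders shown later in this article.

```python
import numpy as np


def closest_face(known_encodings, known_names, query, tolerance=0.6):
    """Return the name of the nearest stored face map and its distance, or
    "unknown_person" when even the nearest one is farther than `tolerance`."""
    distances = np.linalg.norm(known_encodings - query, axis=1)
    best = int(np.argmin(distances))
    if distances[best] <= tolerance:
        return known_names[best], float(distances[best])
    return "unknown_person", float(distances[best])


# Synthetic 128-dimensional face maps standing in for a small database.
rng = np.random.default_rng(0)
known_names = ["barack-obama", "bill-gates", "steve-jobs"]
known_encodings = rng.normal(size=(3, 128))

# A new photo of the second person: almost the same face map.
new_photo = known_encodings[1] + 0.01 * rng.normal(size=128)
# A stranger: an unrelated face map.
stranger = rng.normal(size=128)

print(closest_face(known_encodings, known_names, new_photo))
print(closest_face(known_encodings, known_names, stranger))
```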
By training the network with different images of the same person, the algorithm learns to produce approximately the same dimensions for them. The resulting 128 dimensions are called a face map.
2.4 Search name by face map: a linear SVM (Support Vector Machine) classifier is used to find a person's name from a face map. This algorithm is based on the concept of a decision plane, which separates data belonging to different classes. One of the main advantages of SVM is its high training speed and, consequently, the ability to use a fairly large amount of source data for training. The method is used here to solve a binary classification problem: determining whether a photograph matches an identity in the existing sample. To do this, a classifier is trained that takes the measurements from the image being checked and reports which person in the system's training sample it most resembles.
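The idea of a separating plane can be sketched with a small linear SVM trained by subgradient descent on the hinge loss. This is an illustrative NumPy implementation on synthetic face maps, not the classifier used in the system; in practice a library implementation such as scikit-learn's `SVC(kernel="linear")` would normally be used. All names and hyperparameters below are assumptions.

```python
import numpy as np


def train_linear_svm(X, y, reg=0.01, lr=0.1, epochs=200):
    """Minimize reg/2 * |w|^2 + mean hinge loss by batch subgradient descent.
    Labels y must be +1 or -1."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        violating = margins < 1.0  # samples inside the margin or misclassified
        grad_w = reg * w - (y[violating][:, None] * X[violating]).sum(axis=0) / n_samples
        grad_b = -y[violating].sum() / n_samples
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, X):
    """Side of the separating plane: +1 for person A, -1 for person B."""
    return np.where(X @ w + b >= 0.0, 1, -1)


# Synthetic 128-dimensional face maps: photos of one person cluster together.
rng = np.random.default_rng(0)
person_a = rng.normal(size=128)
person_b = rng.normal(size=128)


def photos(center, count):
    return center + 0.1 * rng.normal(size=(count, 128))


X_train = np.vstack([photos(person_a, 20), photos(person_b, 20)])
y_train = np.array([1.0] * 20 + [-1.0] * 20)
w, b = train_linear_svm(X_train, y_train)

X_test = np.vstack([photos(person_a, 5), photos(person_b, 5)])
predicted = predict(w, b, X_test)
print(predicted)
```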
Other classifiers include the RVM (Relevance Vector Machine). In contrast to SVM, this method gives the probabilities with which the face image belongs to a given person, i.e., if SVM says "the image belongs to person 1", then RVM will say "the image belongs to person 1 with probability p and to person 2 with probability 1-p" [9]. In addition, the support vector method is unstable with respect to noise in the source data: if the training sample contains outliers, they significantly affect the construction of the separating hyperplane. The relevance vector method does not have this disadvantage.
Despite the existence of various algorithms, a general scheme of the face recognition process can be distinguished, as shown in Figure 6.
Figure 6 - Diagram of the face recognition process
The face recognition process consists of the following steps:
- localize faces in the image
- align the face image (geometrically and in luminance)
- detect features
- recognize the face, that is, compare the found features with the reference samples previously stored in the database.
3. Convolutional neural network for solving the recognition problem
The pre-trained model is a convolutional neural network (CNN), a type of neural network used to build state-of-the-art models for computer vision. The CNN used in this article was trained by Davis King on a dataset of approximately 1.2 million images and evaluated on the ImageNet classification dataset, which consists of 1000 classes. The architecture of this pre-trained model is based on ResNet-34, with 34 layers (Figure 7).
Figure 7 - Architecture of ResNet [10]
The convolutional layers of ResNet (Figure 7, third column) are based on the plain network (Figure 8, second column). The difference from the plain network is the shortcut connections, which turn the network into its residual version. Shortcut connections skip one or more layers and perform identity mapping. Their outputs are added to the outputs of the stacked layers (Figure 8).
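The effect of a shortcut connection can be shown with a small sketch in which dense layers stand in for the convolutions of [10]; the function names and layer sizes are illustrative assumptions. The residual block computes relu(F(x) + x), so when the stacked layers contribute nothing, the input still passes through unchanged.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def plain_block(x, w1, w2):
    """Two stacked layers without a shortcut."""
    return relu(relu(x @ w1) @ w2)


def residual_block(x, w1, w2):
    """The same two layers, with the input added back before the final ReLU."""
    return relu(relu(x @ w1) @ w2 + x)


rng = np.random.default_rng(0)
x = np.abs(rng.normal(size=(4, 8)))   # non-negative activations from a previous layer
zero_weights = np.zeros((8, 8))       # stacked layers that have learned nothing yet

print(np.allclose(plain_block(x, zero_weights, zero_weights), 0.0))
print(np.allclose(residual_block(x, zero_weights, zero_weights), x))
```

With zero weights the plain block loses the signal entirely, while the residual block reduces to the identity mapping.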
Figure 8 - Shortcut connections [10]
Here is a simple way of using ResNet-34 for face recognition:
•First, make sure that Python, OpenFace, and dlib are installed.
•Then execute this command: pip3 install face_recognition
•After that, create two folders, "pictures_of_people_i_know" and "unknown_pictures". The first folder should contain pictures of people that you already know; the second should contain pictures of people that you want to identify.
•The next step is to run the face_recognition command, which goes through both folders and tells you who is in each image:
face_recognition ./pictures_of_people_i_know/ ./unknown_pictures/ [11]
The results will look like this:
/unknown_pictures/obama.jpg,Barack_Obama.jpg - "obama.jpg" and "Barack_Obama.jpg" are the same person
/face_recognition_test/unknown_pictures/unknown.jpg,unknown_person - did not find any person [11]
4. Analysis of the results obtained
The software system uses the HOG method to detect faces in images and a convolutional neural network for recognition.
The following hardware and software were used for the work:
- Computer: MacBook Pro 13 2015 laptop, Intel Core i5-5257U CPU, 8 GB RAM; Intel Iris Graphics 6100 1536 MB graphics card
- Operating system: Mac OS 10.13.6, 64-bit
- Python 3.7.6 programming language; python-dev development libraries; NumPy library for scientific computing; dlib library, OpenFace [12]; Git distributed version control system.
Model        Training accuracy   Validation accuracy   Loss      Validation loss   Time
ResNet-18    0.83                0.87                  1.0436    1.006             2701 s
ResNet-34    0.8651              0.8519                1.5983    1.7510            4800 s
ResNet-50    0.8662              0.8095                4.3967    4.5914            5580 s
ResNet-101   0.8594              0.7884                8.4274    8.7475            6112 s
ResNet-152   0.8798              0.8836                11.943    12.05             9248 s
Table 1 - Comparison of ResNet models [13]
To evaluate this model, we need to compare it with other ResNet models of different depths; a ResNet CNN can contain up to 152 layers. In this research, the TensorFlow ResNet library is used for its simple deployment and for measuring accuracy. The ResNet-34 model reached a final training accuracy of 0.8651 and a validation accuracy of 0.8519. Its final training loss was 1.5983, and its validation loss was 1.7510 (Table 1). Table 1 summarizes the results of the analyzed ResNet models. Figure 9 compares the training and testing accuracy of the analyzed models, and Figure 10 shows their execution time.
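The comparison in Table 1 can be reproduced with a few lines of Python. The values are copied from the table; the dictionary layout and the helper name are illustrative. The sketch reports, for each model, the gap between validation and training accuracy, and picks the model with the best validation accuracy and the one with the shortest run time.

```python
# Rows of Table 1:
# (training accuracy, validation accuracy, loss, validation loss, time in seconds)
table_1 = {
    "ResNet-18": (0.83, 0.87, 1.0436, 1.006, 2701),
    "ResNet-34": (0.8651, 0.8519, 1.5983, 1.7510, 4800),
    "ResNet-50": (0.8662, 0.8095, 4.3967, 4.5914, 5580),
    "ResNet-101": (0.8594, 0.7884, 8.4274, 8.7475, 6112),
    "ResNet-152": (0.8798, 0.8836, 11.943, 12.05, 9248),
}


def accuracy_gap(row):
    """Validation accuracy minus training accuracy; negative values mean the
    model does worse on unseen data than on the training data."""
    training_accuracy, validation_accuracy = row[0], row[1]
    return validation_accuracy - training_accuracy


best_validation = max(table_1, key=lambda model: table_1[model][1])
fastest = min(table_1, key=lambda model: table_1[model][4])

for model, row in table_1.items():
    print(f"{model:<11} gap={accuracy_gap(row):+.4f} time={row[4]} s")
print("best validation accuracy:", best_validation)
print("shortest time:", fastest)
```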
Figure 9 - Training and testing accuracy [13]
Figure 10 - Execution time of models [13]
Figures 11-15 show screenshots that demonstrate the developed system.
[Screenshot of the IDE terminal; the recovered command and its output are reproduced below]
(demo) bash-3.2$ face_recognition ./known ./unknown
./unknown/donald-trump-010220.jpg,donald-trump
./unknown/Mark_Wahlberg_2017.jpg,unknown_person
./unknown/donald-trump-gt.jpg,donald-trump
./unknown/stevejobs1-w494.jpg,steve-jobs
./unknown/overview-Barack-Obama.jpg,barack-obama
./unknown/keanu-reeves-9454211-1-402.jpg,unknown_person
./unknown/keanu-reeves-9454211-1-402.jpg,unknown_person
./unknown/keanu-reeves-9454211-1-402.jpg,unknown_person
./unknown/keanu-reeves-9454211-1-402.jpg,unknown_person
./unknown/keanu-reeves-9454211-1-402.jpg,unknown_person
./unknown/keanu-reeves-9454211-1-402.jpg,unknown_person
./unknown/keanu-reeves-9454211-1-402.jpg,unknown_person
./unknown/keanu-reeves-9454211-1-402.jpg,unknown_person
./unknown/keanu-reeves-9454211-1-402.jpg,unknown_person
./unknown/Bill-Gates-2011.jpg,bill-gates
./unknown/SteveJobsBook.jpg,steve-jobs
./unknown/bill-gates.jpg,bill-gates
./unknown/250px-Michael_Jordan_in_2014.jpg,unknown_person
./unknown/6e15fcbe36b6edb183e73f286f40bc59.jpg,barack-obama
./unknown/bill-gates-lookalike.jpg,bill-gates
./unknown/michael-jordan.jpg,unknown_person
Figure 11 - Similarity list of faces
[Screenshot of the IDE terminal; the recovered command and its output are reproduced below]
(demo) bash-3.2$ face_recognition --show-distance true ./known ./unknown
./unknown/donald-trump-010220.jpg,donald-trump,0.4638884367605418
./unknown/Mark_Wahlberg_2017.jpg,unknown_person,None
./unknown/donald-trump-gt.jpg,donald-trump,0.4574101751196349
./unknown/stevejobs1-w494.jpg,steve-jobs,0.42558509709994863
./unknown/overview-Barack-Obama.jpg,barack-obama,0.3156328517587405
./unknown/keanu-reeves-9454211-1-402.jpg,unknown_person,None
./unknown/keanu-reeves-9454211-1-402.jpg,unknown_person,None
./unknown/keanu-reeves-9454211-1-402.jpg,unknown_person,None
./unknown/keanu-reeves-9454211-1-402.jpg,unknown_person,None
./unknown/keanu-reeves-9454211-1-402.jpg,unknown_person,None
./unknown/keanu-reeves-9454211-1-402.jpg,unknown_person,None
./unknown/keanu-reeves-9454211-1-402.jpg,unknown_person,None
./unknown/keanu-reeves-9454211-1-402.jpg,unknown_person,None
./unknown/keanu-reeves-9454211-1-402.jpg,unknown_person,None
./unknown/Bill-Gates-2011.jpg,bill-gates,0.4397822742091361
./unknown/SteveJobsBook.jpg,steve-jobs,0.4301890276437051
./unknown/bill-gates.jpg,bill-gates,0.46144767142021553
./unknown/250px-Michael_Jordan_in_2014.jpg,unknown_person,None
./unknown/6e15fcbe36b6edb183e73f286f40bc59.jpg,barack-obama,0.3207... (truncated in the screenshot)
./unknown/bill-gates-lookalike.jpg,bill-gates,0.5760797295932827
./unknown/michael-jordan.jpg,unknown_person,None
If the algorithm matches a face to a different but similar-looking person, a lower tolerance value is needed to make the face comparison stricter.
Figure 12 - Calculated face distance
Figure 13 - The results of setting lower tolerance
After making face comparisons stricter by lowering the tolerance to 0.54, we obtained better accuracy: in Figure 13 the Bill Gates lookalike is no longer matched, whereas in Figure 12 that person is matched as Bill Gates. This is because Figure 12 uses the default tolerance value of 0.6. Lowering this value gives stricter and, in this case, better results.
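The effect of the tolerance can be checked directly on the distances printed in Figure 12. The sketch below assumes that a match is kept when its distance does not exceed the tolerance; the helper names are illustrative, and only a subset of the files from the figure is included.

```python
# Distances to the best-matching known face, as printed in Figure 12.
figure_12_distances = {
    "donald-trump-010220.jpg": ("donald-trump", 0.4638884367605418),
    "donald-trump-gt.jpg": ("donald-trump", 0.4574101751196349),
    "overview-Barack-Obama.jpg": ("barack-obama", 0.3156328517587405),
    "Bill-Gates-2011.jpg": ("bill-gates", 0.4397822742091361),
    "bill-gates.jpg": ("bill-gates", 0.46144767142021553),
    "bill-gates-lookalike.jpg": ("bill-gates", 0.5760797295932827),
}


def label(name, distance, tolerance):
    """Keep the match only if the distance does not exceed the tolerance."""
    return name if distance <= tolerance else "unknown_person"


def classify_all(distances, tolerance):
    return {
        file: label(name, distance, tolerance)
        for file, (name, distance) in distances.items()
    }


default_result = classify_all(figure_12_distances, tolerance=0.6)
strict_result = classify_all(figure_12_distances, tolerance=0.54)

# Files whose label changes when the tolerance is lowered from 0.6 to 0.54.
changed = [f for f in figure_12_distances if default_result[f] != strict_result[f]]
print(changed)
```

Only the lookalike photo, at a distance of about 0.576, falls between the two thresholds, so it is the only result that changes; the genuine matches all lie below 0.47.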
# identify.py
import face_recognition
from PIL import Image, ImageDraw

# Load a reference photo of each known person and compute its 128-dimensional encoding.
imageOfFirstPerson = face_recognition.load_image_file('/Users/muklia/Downloads/demo/known/Nurlan.jpeg')
First_Person_face_encoding = face_recognition.face_encodings(imageOfFirstPerson)[0]

imageOfSecondPerson = face_recognition.load_image_file('/Users/muklia/Downloads/demo/known/Zhahan.jpeg')
Second_Person_face_encoding = face_recognition.face_encodings(imageOfSecondPerson)[0]

# Both lists are folded in the screenshot; they hold the known encodings and the matching names.
known_face_encodings = [...]
known_face_names = [...]

# Find and encode every face in the group photo.
test_image = face_recognition.load_image_file('/Users/muklia/Downloads/demo/groups/NurlanAndZhahan.jpeg')
face_locations = face_recognition.face_locations(test_image)
face_encodings = face_recognition.face_encodings(test_image, face_locations)

pil_image = Image.fromarray(test_image)
draw = ImageDraw.Draw(pil_image)

for (top, right, bottom, left), face_encoding in zip(face_locations, face_encodings):
    # Compare this face with the known encodings, not with the group photo's own encodings.
    matches = face_recognition.compare_faces(known_face_encodings, face_encoding)
    name = "Unknown"  # fallback label when no known face matches
    if True in matches:
        first_match_index = matches.index(True)
        name = known_face_names[first_match_index]

    # Draw a box around the face and a filled label with the name at its bottom edge.
    draw.rectangle(((left, top), (right, bottom)), outline=(0, 0, 0))
    # Note: ImageDraw.textsize was removed in Pillow 10; newer versions provide textbbox instead.
    text_width, text_height = draw.textsize(name)
    draw.rectangle(((left, bottom - text_height - 10), (right, bottom)), fill=(0, 0, 0), outline=(0, 0, 0))
    draw.text((left + 6, bottom - text_height - 5), name, fill=(255, 255, 255, 255))

del draw
pil_image.show()
Figure 14 - Example of facial recognition code for group photos
Figure 15 - The result of facial recognition code in group photos
5. Conclusion
In the course of the research, a complete system based on a convolutional neural network was implemented that solves the task set for it. The proposed classification algorithm is simple to implement and makes it possible to recognize people's faces effectively in images or in real-time video. Face recognition was implemented through the algorithms described above, which were observed to give effective results. The first algorithm, HOG, searches for faces in the given image. The second, face landmark estimation, finds 68 special points on the face. The next algorithm constructs a "face map" consisting of 128 dimensions. Lastly, the linear SVM classifier uses the face maps produced by the CNN to determine whose face it is.
The current implementation of the facial recognition algorithms is a scalable solution, so the results can be used to build complex functional systems in any area where this kind of problem needs to be solved.
The developed model can be used with photos, in real-time applications, or in special solutions for embedded systems.
References
1.Navneet Dalal, Bill Triggs, Histograms of Oriented Gradients for Human Detection [Electronic source] // URL: http://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf.
2."Extract HOG Features". [Electronic source] URL: https://www.mathworks.com/help/vision/ref/extracthogfeatures.html
3.M. Yang, N. Ahuja and D. Kriegman. Face recognition using kernel eigenfaces. Image Processing: IEEE Transactions - 2000. - Vol.1. pp. 37- 40
4.V. Kazemi and J. Sullivan. One millisecond face alignment with an ensemble of regression trees. In CVPR, 2014. [Electronic source] // URL: http://www.csc.kth.se/~vahidk/papers/KazemiCVPR14.pdf.
5.Facial recognition. [Electronic source] URL: https://ai.nal.vn/facial-recognition/
6.F. Schroff, D. Kalenichenko, and J. Philbin. Facenet: A unified embedding for face recognition and clustering. In Proc. CVPR, 2015. [Electronic source] // URL: https://www.cv-foundation.org/openaccess/content_cvpr_2015/app/1A_089.pdf.
7.Neha Rudraraju, Kotoju Rajitha, K. Shirisha, Constructing Networked, Intelligent and Adaptable Buildings using Edge Computing [Electronic source] // URL: http://ijrar.com/upload_issue/ijrar_issue_20542753.pdf
8.Dalal N., Triggs B. Histograms of Oriented Gradients for Human Detection // Proc. of the IEEE Conference Computer Vision and Pattern Recognition. 2005. pp. 886-893.
9.Christopher M. Bishop F.R.Eng. Pattern Recognition and Machine Learning. [Electronic source] // URL: https://goo.gl/WLqpHN.
10.Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition [Electronic source] // URL: https://arxiv.org/abs/1512.03385
11.Face recognition [Electronic source] // URL: https://github.com/ageitgey/face_recognition
12.Howse J. OpenCV Computer vision with Python. - Packt Publishing Ltd., UK. 2013.
13.Riaz Ullah Khan, Xiaosong Zhang, Rajesh Kumar. Analysis of ResNet and GoogleNet models for malware detection [Electronic source] // URL: https://www.researchgate.net/publication/327271897_Analysis_of_ResNet_and_GoogleNet_models_for_malware_detection