RECOGNITION OF UZBEK HANDWRITTEN TEXTS BY SELECTION OF ACTIVATION FUNCTIONS FOR CONVOLUTIONAL NEURAL NETWORK
1Iskandarova Sayyora Nurmamatovna, 2Kuchkarov Zoir Turdiboy ogli
1Ph.D., Associate Professor, Tashkent University of Information Technologies named after Muhammad al-Khorazmi
2Graduate student, Samarkand branch of Tashkent University of Information Technologies named after Muhammad al-Khorazmi
https://doi.org/10.5281/zenodo.7603595
Abstract. Activation functions are an important part of every neural network. This article examines the advantages and disadvantages of activation functions used in convolutional neural networks for recognizing handwritten text images. A comparative analysis against other neural network models is presented for the selected activation functions. A recognition accuracy of 95 percent was achieved with the resulting convolutional neural network.
Keywords: convolutional neural network, recognition, activation functions, Hopfield, Hamming.
Introduction
In handwritten text image recognition, good letter recognition results have been achieved with almost all neural network models. In letter recognition, Hopfield and Hamming neural network models, as well as convolutional neural network models, show results above 90-95 percent. Nevertheless, each model has its own disadvantages in certain cases. A high percentage of errors on visually similar letters was observed in the Hopfield neural network: the slight similarity between letters such as o and a leads to recognition errors. In the Hamming model, errors increased as the number of characters grew [1].
Main part
Using a multi-layer perceptron neural network, the shapes of letters in the Latin script were formed into a dataset. This dataset contains 300 forms of the 29 letters of the Latin script. A model was created on it and trained.
Table 1. Letter recognition models

Model type    Letter recognition (%)    Word and line recognition (%)
Perceptron    90                        61
Hamming       95                        85
Typically, each piece of recognition software uses a specific pattern or format to recognize handwritten text images, which creates a number of problems when building a general-purpose recognition system.
Convolutional neural networks provide partial invariance to scale changes, shifts, rotations, perspective changes, and other distortions. Convolutional neural networks combine three architectural ideas to ensure this invariance:
- local receptive fields (provide two-dimensional local connections between neurons);
- shared synaptic coefficients (ensure the detection of particular features in any part of the image and reduce the total number of weight coefficients; a worked comparison follows this list);
- a hierarchical structure with spatial subsampling.
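To make the saving from shared synaptic coefficients concrete, the following Python sketch compares the parameter count of a fully connected layer with that of a convolutional layer over the same input; the layer sizes are illustrative assumptions, not taken from the paper's network.

# Parameter-count comparison: fully connected vs. convolutional layer.
# All sizes here are illustrative, not from the paper's network.
h, w, c = 28, 28, 1          # input image: 28x28 grayscale
n_hidden = 100               # width of a dense layer
k, n_filters = 5, 16         # 5x5 kernels, 16 feature maps

dense_params = (h * w * c) * n_hidden + n_hidden    # weights + biases
conv_params = (k * k * c) * n_filters + n_filters   # shared kernels + biases

print(f"dense: {dense_params} parameters")   # 78500
print(f"conv:  {conv_params} parameters")    # 416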
Currently, the convolutional neural network and its modifications are among the best object recognition algorithms in terms of accuracy and speed. Since 2012, such networks have taken first place in the well-known ImageNet international pattern recognition competition.
It is for this reason that a convolutional neural network, based on the principles of the neocognitron and trained with the error backpropagation algorithm, was used.
Usually such a dataset allows identifying only the letter forms it contains. Because of the wealth of different letter forms in handwritten text, several datasets have been formed for a single language [2].
Errors occurred due to incorrect segmentation when performing recognition based on the trained model: in line, word, and letter segmentation, incorrect cropping leads to incorrect recognition results. We observed that separating the letters of a word written in Latin script can radically change its meaning, and such errors introduce a number of further complications.
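The paper does not detail its segmentation procedure; as an illustration of projection-based line segmentation in the spirit of the cited work [6-7], the following minimal Python sketch splits a binarized page into lines using a horizontal projection profile (the ink threshold is an assumption):

import numpy as np

def segment_lines(binary_image, min_ink=1):
    """Split a binarized page (ink = 1, background = 0) into text lines
    using a horizontal projection profile. An illustration only, not
    the segmentation procedure used in the paper."""
    profile = binary_image.sum(axis=1)        # ink pixels in each row
    in_line, start, lines = False, 0, []
    for y, ink in enumerate(profile):
        if ink >= min_ink and not in_line:    # a text line begins
            in_line, start = True, y
        elif ink < min_ink and in_line:       # the line has ended
            in_line = False
            lines.append(binary_image[start:y])
    if in_line:                               # page ends inside a line
        lines.append(binary_image[start:])
    return lines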
One of the steps in the development of a neural network is the selection of the activation function of its neurons. The form of the activation function largely determines the functional capabilities of the neural network and the method of training it. The classical backpropagation algorithm works well for two- and three-layer neural networks, but starts to run into problems as the depth increases. One of the reasons is the phenomenon known as the vanishing gradient.
As the error propagates from the output layer to the input layer, at each layer the current result is multiplied by the derivative of the activation function. For the traditional sigmoid activation function, the derivative is less than one over the entire domain, so the error approaches zero after several layers. Conversely, if the derivative of the activation function can exceed one, the error may grow exponentially as it propagates, leading to instability of the training procedure [3-4].
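A small numeric illustration of the vanishing gradient: the derivative of the sigmoid never exceeds 0.25, so an error delta propagated backwards through ten layers (an illustrative depth) shrinks to almost nothing even in the best case.

import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# The sigmoid derivative sigma(s) * (1 - sigma(s)) peaks at 0.25 (at s = 0),
# so even in the best case the error delta shrinks fourfold per layer.
delta = 1.0                 # error delta at the output layer
for _ in range(10):         # ten layers, an illustrative depth
    delta *= sigmoid(0.0) * (1.0 - sigmoid(0.0))
print(f"delta after 10 layers: {delta:.2e}")   # ~9.54e-07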
In this work, the hyperbolic tangent is used as the activation function in the hidden and output layers, and ReLU in the convolutional layers.
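The paper does not list the exact layer configuration; the following Keras sketch shows one possible network with this activation choice: ReLU in the convolutional layers, hyperbolic tangent in the hidden and output layers. The layer sizes and input shape are assumptions; the 29-unit output matches the 29-letter alphabet mentioned above.

from tensorflow import keras
from tensorflow.keras import layers

# A sketch of the activation choice described above. Layer sizes and the
# input shape are illustrative; 29 outputs correspond to the 29 letters.
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, 5, activation="relu"),   # convolutional layer: ReLU
    layers.MaxPooling2D(2),
    layers.Conv2D(32, 5, activation="relu"),   # convolutional layer: ReLU
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(100, activation="tanh"),      # hidden layer: tanh
    layers.Dense(29, activation="tanh"),       # output layer: tanh
])
model.compile(optimizer="adam", loss="mse")
model.summary()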
Picture 1. ReLU (rectified linear unit) graph. Picture 2. Hyperbolic tangent graph.

ReLU (rectified linear unit):
f(s) = max(0, s)
f'(s) = 1 for s > 0; rand(0.01, 0.05) for s < 0 (a leaky variant; the classical ReLU derivative is 0 for s < 0)

Advantages:
- free of resource-intensive operations;
- removes unnecessary parts of the signal;
- the gradient neither vanishes nor explodes;
- training is accelerated.

Disadvantages:
- not always reliable; it can "die" during training;
- strongly dependent on the initialization of the weights.

Hyperbolic tangent:
f(s) = (e^(2s) - 1) / (e^(2s) + 1)
f'(s) = 1 - f(s)^2

Advantages:
- simplicity of calculating the derivative from the value of the function itself;
- the range of values is -1 to 1.

Disadvantages:
- the gradient can vanish in the saturated regions;
- resource-intensive compared to ReLU.
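A compact NumPy sketch of both activation functions and the derivative identities listed above (the leaky slope for negative inputs is noted in a comment; the classical ReLU derivative is simply zero there):

import numpy as np

def tanh(s):
    """Hyperbolic tangent: f(s) = (e^(2s) - 1) / (e^(2s) + 1)."""
    return np.tanh(s)

def tanh_deriv_from_value(f_value):
    """f'(s) = 1 - f(s)^2, computed from the cached function value."""
    return 1.0 - f_value ** 2

def relu(s):
    """ReLU: f(s) = max(0, s), a thresholding of the input at zero."""
    return np.maximum(0.0, s)

def relu_deriv(s):
    """Classical ReLU derivative: 1 for s > 0, 0 otherwise.
    The leaky variant would return a small slope instead of 0."""
    return (s > 0).astype(float)

s = np.linspace(-2.0, 2.0, 5)
print(tanh_deriv_from_value(tanh(s)))  # equals 1 / cosh(s)^2
print(relu_deriv(s))                   # [0. 0. 0. 1. 1.]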
Activation function in hyperbolic tangent form
In this work, the hyperbolic tangent is used as the activation function for the hidden and output layers. This choice is determined by the following reasons:
- symmetric activation functions of the hyperbolic tangent type provide faster convergence than the standard logistic function;
- the function has a continuous first derivative;
- the function has a simple derivative that can be calculated from its own values, which saves computation [5-9], as sketched below.
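Because the derivative is recovered from the function value, the forward activations can be cached and reused in the backward pass with no extra exponentials; a minimal sketch with illustrative shapes:

import numpy as np

# Forward pass: cache the tanh activations (shapes are illustrative).
s = np.random.randn(64, 100)            # pre-activations of a hidden layer
a = np.tanh(s)                          # cached for the backward pass

# Backward pass: the derivative f'(s) = 1 - f(s)^2 is recovered from the
# cached value a, so no exponentials are recomputed.
upstream_delta = np.random.randn(64, 100)
delta_s = upstream_delta * (1.0 - a ** 2)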
The graph of the hyperbolic tangent function is shown in Picture 3 below:
Picture 3. The graph of the hyperbolic tangent function.
ReLU activation function
It is known that neural networks can approximate any complex function when they have a sufficient number of layers and a nonlinear activation function. Sigmoid and tangent activation functions are nonlinear, but they cause problems with vanishing or exploding gradients. A much simpler option is the rectified linear unit (ReLU) activation function, expressed by the following formula: f(s) = max(0, s).
The graph of the ReLU function corresponds to Picture 4 below:
Picture 4. Graph of the ReLU function.
Advantages of using the ReLU function:
- its derivative is equal to either one or zero, so gradients can neither explode nor vanish: multiplying the error delta by one leaves the delta unchanged, whereas the derivative of another function, such as the hyperbolic tangent, returns values of varying sign and magnitude, which strongly affects whether the gradient fades or grows; in addition, using this function makes the weights sparse;
- unlike sigmoids and hyperbolic tangents, which require resource-intensive operations such as exponentiation, ReLU can be implemented as a simple thresholding of the activation matrix at zero;
- it removes unnecessary detail by zeroing negative outputs.
Among the disadvantages, it can be noted that ReLU is not always reliable and may "die" during training. For example, a large gradient flowing through a ReLU may update the weights in such a way that the neuron never fires again; from then on, the gradient through that neuron is always zero, and the neuron is irreversibly disabled. At excessively high learning rates, up to 40% of ReLUs can be "dead" (i.e., never activated). This problem can be solved by choosing an appropriate learning rate [9-12].
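A simple diagnostic for this failure mode (not from the paper) is the fraction of units whose pre-activation is non-positive for every sample in a batch, and which therefore receive zero gradient:

import numpy as np

def dead_relu_fraction(pre_activations):
    """Fraction of ReLU units that never fire over a whole batch.
    A unit whose pre-activation is <= 0 for every sample passes zero
    gradient, so a persistently high fraction indicates "dead" ReLUs."""
    never_fires = (pre_activations <= 0).all(axis=0)   # per-unit check
    return float(never_fires.mean())

# Illustrative batch: pre-activations shifted negative so most units never fire.
batch = np.random.randn(128, 256) - 3.0
print(f"dead fraction: {dead_relu_fraction(batch):.2f}")   # roughly 0.84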
Picture 5. Letter training error: graph of the loss function over 38 training epochs.
Based on the trained model, the following analytical results were obtained.

Table 2. Comparative recognition results

Model type                      Letter recognition (%)    Word and line recognition (%)
Perceptron                      90                        61
Hamming                         95                        85
Convolutional neural network    95                        95
Conclusion
The convolutional neural network model has a much higher recognition percentage, reaching up to 95% accuracy on handwritten letters. Nevertheless, many problems were encountered in recognizing words and lines of text. By selecting activation functions for the convolutional neural network, we achieved a recognition rate of 95%. Algorithms and software for recognizing handwritten text images were created, and a comparative analysis of handwritten text image recognition was carried out using this software. A 95% recognition result was achieved in recognizing handwritten texts once a sufficient dataset was created.
REFERENCES
1. Singh, Y.P., Yadav, V.S., Gupta, A. & Khare, A. Bi-directional associative memory neural network method in the character recognition. Journal of Theoretical and Applied Information Technology, 2009, 5(4), pp. 382-386.
2. Johansen, editor. Proceedings of the Copenhagen Workshop on Gaussian Scale-Space Theory, May 10-13, 1996. Technical Report 96/19, ISSN 0107-8283.
3. Wshah, S., Shi, Z., Govindaraju, V. Segmentation of Arabic handwriting based on both contour and skeleton segmentation. In: International Conference on Document Analysis and Recognition, 2009, pp. 793-797.
4. N. Nilsson, Principles of Artificial Intelligence, ser. Symbolic Computation / Artificial Intelligence. Springer, 1982.
5. M. Bulacu, R. van Koert, L. Schomaker, and T. van der Zant, "Layout analysis of handwritten historical documents for searching the archive of the cabinet of the Dutch queen," in Document Analysis and Recognition, 2007. ICDAR '07. 9th International Conference on, vol. 1, 2007, pp. 357-361.
6. M. Arivazhagan, H. Srinivasan, and S. Srihari, "A statistical approach to line segmentation in handwritten documents," in Proceedings of Document Recognition and Retrieval XIV, SPIE, Tech. Rep., 2007.
7. R. Chamchong and C. C. Fung, "Text line extraction using adaptive partial projection for palm leaf manuscripts from Thailand," in Frontiers in Handwriting Recognition, 2012. ICFHR '12. 14th International Conference on, 2012, pp. 588-593.
8. A. Garz, A. Fischer, R. Sablatnig, and H. Bunke, "Binarization-free text line segmentation for historical documents based on interest point clustering," in Document Analysis Systems (DAS), 2012 10th IAPR International Workshop on, 2012, pp. 95-99.
9. G. Louloudis, B. Gatos, I. Pratikakis, and C. Halatsis, "Text line and word segmentation of handwritten documents," Pattern Recognition, 2009, vol. 42, no. 12, pp. 3169-3183.
10. O. Surinta, L. Schomaker, and M. Wiering, "A comparison of feature extraction and pixel-based methods for recognizing handwritten bangla digits," in Document Analysis and Recognition, 2013. ICDAR '13. 12th International Conference on, 2013, pp. 165-169.
11. J. Sauvola and M. Pietikainen, "Adaptive document image binarization," Pattern Recognition, 2000, vol. 33, no. 2, pp. 225-236.
12. T. R. Singh, S. Roy, O. I. Singh, T. Sinam, and K. M. Singh, "A new local adaptive thresholding technique in binarization," International Journal of Computer Science Issues, 2011, vol. 8, pp. 273-282.
13. Kudratov G., Eshmuradov D., Yadgarova M. General issues of protection of the backline computer networks. Science and Innovation, 2022, vol. 1, no. 8, pp. 684-688.