
CREATING CONVOLUTIONAL NEURAL NETWORK FOR HANDWRITTEN DIGIT RECOGNITION

Druzik Aliaksei Nikolaevich, Scientific supervisor: Gulyakina Natalia Anatolyevna, Belarusian State University of Informatics and Radioelectronics, Minsk, Belarus

E-mail: alexeydruzik@gmail.com

Abstract. The purpose of this article is to show the creation of a simple convolutional neural network for recognizing handwritten digits. The main tools used in this article are TensorFlow, Keras, and the MNIST dataset.

Key words: IT, artificial intelligence, augmentation, neural networks.

The simplest version of a neural network for recognizing handwritten digits from 0 to 9, considered in this article, uses only two layers: the input and output layers. Since the size of the source image is 28x28 pixels, the size of the input layer is 28x28 = 784 neurons [1]. Each of these neurons is connected to one of the pixels in the image. The output is a layer with 10 neurons, one per digit. The scheme of this convolutional neural network is presented in Fig. 1.

Fig. 1 Architecture of the simplest convolutional neural network

MNIST is one of the classic datasets on which it is customary to try all kinds of approaches to image classification (and not only classification) [2]. The set contains 60,000 (training part) and 10,000 (test part) black-and-white 28x28 pixel images of handwritten digits from 0 to 9. TensorFlow provides a standard script for downloading and unpacking this dataset and loading the data into tensors, which is very convenient.

Loading the MNIST dataset with samples and splitting it into train and test parts is shown in Fig. 2.

(X_train, y_train), (X_test, y_test) = mnist.load_data()

Fig. 2 Loading the MNIST data set with samples and splitting it
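For completeness, the snippet in Fig. 2 assumes that mnist has already been imported from the Keras datasets module. A minimal self-contained sketch of the loading step, with shape checks added here purely for illustration, might look like this:

import tensorflow as tf
from tensorflow.keras.datasets import mnist

# Download (on first run) and load the dataset as NumPy arrays
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# 60,000 training and 10,000 test grayscale images of 28x28 pixels
print(X_train.shape)  # (60000, 28, 28)
print(X_test.shape)   # (10000, 28, 28)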

Fig. 3 shows the first 20 elements of the MNIST dataset.

Fig. 3 Presenting first 20 elements of the MNIST dataset
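The article does not reproduce the plotting code for Fig. 3; a minimal sketch that renders the first 20 training images with matplotlib, assuming X_train and y_train from Fig. 2, could be:

import matplotlib.pyplot as plt

# Draw the first 20 training digits in a 2x10 grid with their labels
fig, axes = plt.subplots(2, 10, figsize=(12, 3))
for i, ax in enumerate(axes.flat):
    ax.imshow(X_train[i], cmap=plt.cm.binary)  # grayscale rendering
    ax.set_title(y_train[i])                   # true label above each digit
    ax.axis('off')
plt.show()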

Fig. 4 shows how the pixel values, which originally lie in the range from 0 to 255, are normalized for further work with the TensorFlow library.

X_train = tf.keras.utils.normalize(X_train, axis=1)
X_test = tf.keras.utils.normalize(X_test, axis=1)

Fig. 4 Normalizing the data (making length = 1)
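Note that tf.keras.utils.normalize rescales each row to unit length (an L2 norm of 1), which is what the caption "making length = 1" refers to. A common alternative, not used in the article, is simple scaling of the raw pixel values into the [0, 1] range:

# Alternative preprocessing: divide raw 0-255 pixel values by 255
X_train_scaled = X_train.astype('float32') / 255.0
X_test_scaled = X_test.astype('float32') / 255.0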

Next, let's look at creating the convolutional neural network itself (Fig. 5). Sequential serves to describe a neural network model that has input, hidden and output layers. Flatten is used to transform the multidimensional 28x28 input array into a flat input vector. In the first step, we create a layer of 128 neurons [3]; this layer helps the network learn patterns in the data. The second parameter is the activation function. As a rule, convolutional neural networks use the rectified linear activation function, or ReLU for short: a piecewise linear function that outputs the input directly if it is positive and zero otherwise. It has become the default activation function for many types of neural networks because a model that uses it is easier to train and often achieves better performance.

1 SCIENCE TIME 1

In the second step, we create an output layer whose size equals the number of recognized handwritten digits, in our case 10. This layer helps the network make predictions. Here the softmax activation function is used: it works like a voting system for the 10 neurons of the output layer, taking the numbers coming from these neurons and turning them into probabilities [4]. Imagine you have 10 numbers representing the network's confidence in different options, for example: [2.0, 3.0, 1.0, 0.1, 2.5, 1.8, 0.5, 1.2, 0.7, 2.2]. Softmax makes these numbers more understandable: it compresses them so that they add up to 1, answering the question "how sure is the network about each option?" After applying tf.nn.softmax, the numbers look like this: [0.113, 0.306, 0.041, 0.017, 0.186, 0.092, 0.025, 0.051, 0.031, 0.138]. These new numbers are the probabilities: the network is most confident (30.6%) about the second option (3.0), and not very confident (1.7%) about the fourth option (0.1).

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
model.add(tf.keras.layers.Dense(units=128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(units=10, activation=tf.nn.softmax))

Fig. 5 Create a neural network model
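The softmax arithmetic discussed above can be verified with a short NumPy check (not part of the original article):

import numpy as np

# The example confidence scores from the text
logits = np.array([2.0, 3.0, 1.0, 0.1, 2.5, 1.8, 0.5, 1.2, 0.7, 2.2])

# Softmax: exponentiate, then divide by the total so the result sums to 1
probs = np.exp(logits) / np.exp(logits).sum()

print(np.round(probs, 3))  # [0.113 0.306 0.041 0.017 0.186 0.092 0.025 0.051 0.031 0.138]
print(probs.sum())         # 1.0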

Fig. 6 shows the process of compiling and training our neural network.

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

Fig. 6 Compiling and training our neural network
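Fig. 6 shows the compilation call; the training itself is performed by model.fit on the normalized training data. A minimal sketch, with the epoch count chosen here purely for illustration:

# Train the compiled model on the normalized training data
model.fit(X_train, y_train, epochs=3)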

In the next step we need to evaluate the performance of the deep learning model on a test set of data using Keras [5]. The purpose of this step is to measure how well the model generalizes to unseen data and to compare the results with the training and validation sets. A good model should have a low loss and a high accuracy on all sets, avoiding overfitting or underfitting (Fig. 7).

val_loss, val_acc = model.evaluate(X_test, y_test)
model.save('handwritten_digits.model')

Fig. 7 Evaluating the model
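As a usage note, the metrics returned by evaluate can be printed, and the saved model can later be restored with tf.keras.models.load_model (the file name follows Fig. 7):

# Report how well the model generalizes to the unseen test set
print("Test loss: {:.4f}, test accuracy: {:.4f}".format(val_loss, val_acc))

# Restore the saved model later, e.g. in a separate prediction script
model = tf.keras.models.load_model('handwritten_digits.model')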

Then we need to recognize handwritten digits from images with the help of the created model. We read each file as an image using cv2 and extract the first channel (assuming it is a grayscale image) using [:, :, 0]. We invert the pixel values of the image using np.invert, so that the background is black and the digit is white. The script prints a message saying "The number is probably a {}", where {} is replaced by the index of the highest probability using np.argmax. For example, if the prediction is [0.1, 0.2, 0.3, 0.4, 0, 0, 0, 0, 0, 0], it will print "The number is probably a 3". Then we display the image using plt.imshow with a binary colormap and plt.show (Fig. 8, Fig. 9).

import os
import cv2
import numpy as np
import matplotlib.pyplot as plt

image_number = 1
while os.path.isfile(f"digits/digit{image_number}.png"):
    try:
        # Read the image and keep only the first (grayscale) channel
        img = cv2.imread(f"digits/digit{image_number}.png")[:, :, 0]
        # Invert so the background is black and the digit is white
        img = np.invert(np.array([img]))
        prediction = model.predict(img)
        print("The number is probably a {}".format(np.argmax(prediction)))
        plt.imshow(img[0], cmap=plt.cm.binary)
        plt.show()
        image_number += 1
    except:
        print("Error reading image! Proceeding with next image...")
        image_number += 1

Fig. 8 Load custom images and predict them


The number is probably a 7

Fig. 9 Displaying a random picture with the digit 7 and its prediction

The final result, with the predicted pictures of digits lying in the recognition folder, is shown in Fig. 10. In our case the number of recognized files is 5.

1/1 [==============================] - 0s 76ms/step
The number is probably a 7
1/1 [==============================] - 0s 17ms/step
The number is probably a 2
1/1 [==============================] - 0s 19ms/step
The number is probably a 9
1/1 [==============================] - 0s 16ms/step
The number is probably a 3
1/1 [==============================] - 0s 17ms/step
The number is probably a 5

Fig. 10 Displaying the predictions for all 5 images in the recognition folder


References:

1. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions". - arXiv:1409.4842. - 2014.

2. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, "Rethinking the Inception Architecture for Computer Vision". - arXiv:1512.00567. - 2015.

3. Kumar M.S. Beginning with Deep Learning Using TensorFlow. - New Delhi: BPB. - 2022.

4. Long L. Beginning Deep Learning with TensorFlow: Work with Keras, MNIST Data Sets, and Advanced Neural Networks. - Los Angeles: Apress. - 2022.

5. Ganegedara T. Natural Language Processing with TensorFlow: The definitive NLP book to implement the most sought-after machine learning models and tasks. - Los Angeles: Packt. - 2022.
