

УДК 62-405

Nie Jinliang,

Master's Degree Student

CIFAR10 IMAGE CLASSIFICATION BASED ON RESNET

Russia, St. Petersburg, Peter the Great St. Petersburg Polytechnic University

272729768@qq.com

Abstract. We trained four deep residual networks of different depths to classify the 60,000 low-resolution images of the Cifar10 dataset into 10 classes. We then compared the four variants (ResNet-20, ResNet-56, ResNet-110, ResNet-164) and found that, among all the networks we tried, ResNet-164 gave the best performance, achieving 94% accuracy on the test set. To reduce overfitting, we employed the regularization method "dropout" in the fully-connected layers, and we also used data augmentation, batch normalization and a decayed learning rate.

Keywords: base, classification, fully-connected layers, ResNet.

1. Introduction

In recent years deep convolutional neural networks have achieved a series of breakthroughs in the field of image classification. Deep convolutional neural networks (CNNs) have a layered structure in which each layer consists of convolutional filters. Convolving these filters with the input image produces the features for the next layer, and because the filter parameters are shared across the image they can be learnt quite easily. Although deeper networks usually perform better in classification, they are harder to train, mainly for two reasons:

• Vanishing / exploding gradients: sometimes a neuron dies during the training process and, depending on its activation function, it might never recover.

• Harder optimization: as the model introduces more parameters, the network becomes more difficult to train.

One effective way to address these problems is residual networks (ResNets). The main difference in ResNets is that they have shortcut connections running in parallel with their normal convolutional layers. Unlike the convolutional layers, these shortcut connections are always alive, and gradients can easily back-propagate through them, which results in faster training.

In this paper we use ResNets for the Cifar10 image classification task. In Section 2 we describe the Cifar10 dataset and Keras, the framework we used for our implementation. In Section 3 we explain how we designed our networks, the basic block that we employed in all of them, the data augmentation technique, and the tricks we used to prevent overfitting. In Section 4 we discuss our results and show how the best-performing ResNet compares to the others. In our conclusion in Section 5 we point out a few considerations that must be taken into account when designing a ResNet.

2. The dataset and implementations

In this section we describe the dataset we worked on and the framework we used for network implementation and model training.

2.1. Dataset

In this project we worked on the Cifar10 dataset. This dataset consists of a training set of 50,000 images and a test set of 10,000 images from 10 different classes of objects. All images in Cifar10 are 32 x 32 color images, and the 10 classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. There are 6,000 images of each class. Figure 1 shows a few sample images from different classes of the Cifar10 dataset.

Figure 1. A few sample images from the Cifar10 dataset

2.2. Keras

Keras is an open-source deep learning framework with wide support for neural network implementations. In this project we used this framework to implement and train the different ResNets. Keras provides many predefined neural network layers as well as packages that enable us to run our training algorithms on GPUs.
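As a minimal sketch (not necessarily the exact code used in this work), the dataset can be loaded and prepared with the predefined Keras loader; the pixel-scaling step here is an assumption:

from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

# 50,000 training and 10,000 test images, 32 x 32 pixels, 3 colour channels
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype("float32") / 255.0   # scale pixel values to [0, 1]
x_test = x_test.astype("float32") / 255.0
y_train = to_categorical(y_train, 10)         # one-hot labels for the 10 classes
y_test = to_categorical(y_test, 10)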

3. Network design and data augmentation

3.1. Network Architectures

Just like normal convolutional layers, residual blocks can be stacked to create networks of increasing depth. Below is the basic structure of the Cifar10 residual network; the depth is controlled by a multiplier n, which dictates how many residual blocks are inserted between each down-sampling layer. Down-sampling is done by increasing the stride of the first convolution layer in a residual block: whenever the number of filters is increased, the first convolution layer within a residual block performs the down-sampling. The structure of this architecture is shown below, and Figure 2 shows the basic block of a ResNet.

Group      Size                    Multiplier
Conv1      [3x3, 16]               -
Conv2      [3x3, 16] [3x3, 16]     n
Conv3      [3x3, 32] [3x3, 32]     n
Conv4      [3x3, 64] [3x3, 64]     n
Avg-Pool   8x8                     -
Softmax    10                      -

Figure 2. A ResNet basic block
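The sketch below shows how such an architecture can be assembled with the Keras functional API. It is an illustration of the structure in the table above rather than our exact training code, and the 1x1 projection on the shortcut is an assumption for the layers where the shape changes:

from tensorflow.keras import layers, models

def residual_block(x, filters, downsample=False):
    # Two 3x3 convolutions with batch normalization, plus a shortcut connection.
    # The first convolution uses stride 2 when the block also down-samples.
    stride = 2 if downsample else 1
    shortcut = x
    y = layers.Conv2D(filters, 3, strides=stride, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    if downsample or shortcut.shape[-1] != filters:
        # Project the shortcut with a 1x1 convolution when the shape changes.
        shortcut = layers.Conv2D(filters, 1, strides=stride, padding="same")(x)
    y = layers.Add()([y, shortcut])   # the shortcut lets gradients bypass the convolutions
    return layers.Activation("relu")(y)

def build_resnet(n=3, num_classes=10):
    # n residual blocks per stage with 16, 32 and 64 filters (Conv2-Conv4),
    # matching the Conv1 / Avg-Pool / Softmax layout in the table above.
    inputs = layers.Input(shape=(32, 32, 3))
    x = layers.Conv2D(16, 3, padding="same")(inputs)              # Conv1
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    for stage, filters in enumerate([16, 32, 64]):
        for block in range(n):
            x = residual_block(x, filters,
                               downsample=(stage > 0 and block == 0))
    x = layers.AveragePooling2D(pool_size=8)(x)                   # Avg-Pool 8x8
    x = layers.Flatten()(x)
    x = layers.Dropout(0.3)(x)                                    # dropout before the fully-connected layer
    outputs = layers.Dense(num_classes, activation="softmax")(x)  # Softmax over 10 classes
    return models.Model(inputs, outputs)

model = build_resnet(n=3)   # n = 3 gives a 20-layer network; larger n gives the deeper variants

With basic blocks of this kind the depth is 6n + 2, so n = 3, 9 and 18 correspond to ResNet-20, ResNet-56 and ResNet-110; ResNet-164 is usually assembled from deeper bottleneck units.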

3.2. Data Augmentation

As the data were fed into the network, we applied real-time data augmentation with the ImageDataGenerator class in Keras. This effectively results in random translations: images were randomly shifted horizontally and vertically by up to 10% of their size, and a horizontal flip was applied with probability 0.5.
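A sketch of this augmentation with the Keras ImageDataGenerator (the parameter values follow the description above; everything else is left at its default):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    horizontal_flip=True,     # random horizontal flip, applied with probability 0.5
    width_shift_range=0.1,    # random horizontal shift of up to 10% of the image width
    height_shift_range=0.1)   # random vertical shift of up to 10% of the image height

# datagen.flow(x_train, y_train, batch_size=32) then yields augmented batches
# that can be passed directly to model.fit(...).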

4. Results

4.1. Training and regularization

Each network (except ResNet-20 without BN and dropout) was trained for 200 epochs (full passes through the training dataset) with Adam, a batch size of 32, and the categorical cross-entropy loss. For each architecture the initial learning rate was set to 0.001 to warm up the network, and for all networks it was then decayed according to the following schedule: 1e-3 from epoch 0, 1e-4 from epoch 80, 1e-5 from epoch 120, 1e-6 from epoch 160, and 0.5e-6 from epoch 180. Dropout of 0.3 was applied before the fully-connected layer.
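A sketch of this training setup, reusing the model and datagen objects from the earlier sketches and assuming the step-wise decay is implemented through a Keras LearningRateScheduler callback:

from tensorflow.keras.callbacks import LearningRateScheduler
from tensorflow.keras.optimizers import Adam

def lr_schedule(epoch):
    # Piece-wise constant learning rate over the 200 training epochs.
    if epoch >= 180:
        return 0.5e-6
    if epoch >= 160:
        return 1e-6
    if epoch >= 120:
        return 1e-5
    if epoch >= 80:
        return 1e-4
    return 1e-3

model.compile(optimizer=Adam(learning_rate=1e-3),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

model.fit(datagen.flow(x_train, y_train, batch_size=32),
          epochs=200,
          validation_data=(x_test, y_test),
          callbacks=[LearningRateScheduler(lr_schedule)])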

4.2. Results on test set

Our results on Cifar10 are summarized in Table 1. In the first model we trained ResNet-20 with no batch normalization and no dropout; this network exhibits substantial overfitting, achieving only 74.6% accuracy on the test set. We also compared the 20-, 56-, 110- and 164-layer networks; Table 1 shows the behavior of these ResNets. The deeper ResNets benefit from the increased depth and exhibit higher training and test accuracy, and we obtained the best performance with ResNet-164.

Table 1. Classification accuracy on the test set

ResNet type                           Acc on training set   Acc on test set   Params
ResNet-20 (without BN and dropout)    90.3%                 74.6%             277K
ResNet-20                             98.2%                 90.1%             277K
ResNet-56                             98.4%                 91.3%             850K
ResNet-110                            98.1%                 93.39%            1.7M
ResNet-164                            99.2%                 94.2%             2.5M

5. Conclusion

As we explained in our results, adding a BN layer after the convolutional layers and using dropout improve accuracy in the image classification task. The trade-off is that residual networks are more prone to overfitting, which is undesirable. We showed that machine learning techniques such as a dropout layer and image augmentation can reduce this overfitting. We also observed that residual connections are especially powerful for very deep networks.

References

1. He K., Zhang X., Ren S., Sun J. Deep residual learning for image recognition // arXiv preprint arXiv:1512.03385. - 2015.

2. He K., Zhang X., Ren S., Sun J. Identity mappings in deep residual networks // arXiv preprint arXiv:1603.05027. - 2016.
