PPSUTLSC-2024
PRACTICAL PROBLEMS AND SOLUTIONS TO THE USE OF THEORETICAL LAWS IN THE SCIENCES OF THE 21ST CENTURY
TASHKENT, e-s MAY 2024
www.in-academy.uz
REAL TIME LOGO RECOGNITION USING YOLO ON ANDROID
Primbetov Abbaz
University of Tashkent for Applied Sciences, Gavhar Str. 1, Tashkent 100149, Uzbekistan
[email protected]
https://doi.org/10.5281/zenodo.13364888
Abstract: Humans can easily detect and identify objects present in an image. The human visual system is fast and accurate and can perform complex tasks, such as identifying multiple objects and detecting obstacles, with little conscious thought. For a long time, humans have been trying to make computers understand what is shown in images. With the availability of large amounts of data, faster Graphics Processing Units (GPUs), and better algorithms, we can now train computers to detect and classify multiple objects within an image with high accuracy. The goal of this paper is to implement an object detection model that is suitable, in terms of size and speed, to run on an Android device and detect logos in real time. The proposed approach is based on YOLOv2 (You Only Look Once), a state-of-the-art real-time object detector, and is trained on the FlickrLogos-32 dataset. The experimental results show a final mean average precision (mAP) of 82.53% and a speed of 35 fps (frames per second) on an NVIDIA GeForce GTX 1070.
Keywords: Object detection, Convolutional Neural Network (CNN), You Only Look Once (YOLO), Faster R-CNN (Region-based Convolutional Neural Networks).
INTRODUCTION
A logo is a graphical mark used to identify a company, organization, product or brand. Logos represent a company's name or a particular product or service, and they are used heavily in marketing [1-9]. Logos have become an integral part of a company's identity, and a well-recognized logo can increase a company's goodwill. A logo usually has a recognizable and distinctive graphic design, stylized name or unique symbol for identifying an organization. It is affixed, included, or printed on advertising, buildings, communications, literature, products, stationery, vehicles, etc. Logos can be seen everywhere in our daily surroundings: in the streets, in supermarkets, on products and services, on administrative documents, etc. Examples of different logos are shown in Figure 1. Logo detection is a challenging object recognition and classification problem because there is no clear definition of what constitutes a logo. A logo can be thought of as an artistic expression of a brand; it can be a (stylized) letter or text, a graphical figure or any combination of these [10]. Furthermore, some logos have a fixed set of colors and known fonts, while others vary widely in color and use specialized, unknown fonts [11-14]. Additionally, because a logo serves as a brand identity, there is no guarantee about its context or placement in an image; in reality, logos can appear on any product, background or advertising surface. The problem also has large intra-class variations, e.g. a specific brand may have several logo types (old and new Adidas logos, small and big versions of the Nike logo), and inter-class similarities, e.g. logos that belong to different brands but look similar (see Figure 2).
Figure 1: Logos appear everywhere in our surroundings.
Figure 2: Exemplar images of logo variations. Left: variations of the Adidas brand logo; note the different graphical figures. Right: logos of the brands Chanel, Gucci, Vodafone, Target, Beats, Bebo and Pinterest; note that these logos look similar but belong to different brands.
Related work
The problem of logo recognition has a rich research history. In the 1990s, the problem was mainly explored in information-retrieval use cases: an image descriptor was generated using affine transformations and stored in a database for retrieval [15]. There were also some neural-network-based approaches, but the networks were not as deep, nor the results as impressive, as in recent work. In the 2000s, with the advent of SIFT and related approaches, better image descriptors became possible. These methods build representations from image gradients that are invariant to affine transformations and robust to changes in lighting and to clutter.
More recent work in logo recognition uses deep neural networks, which offer superior performance with end-to-end pipeline automation, i.e. from image and logo identification to recognition. Multiple methods for object detection using CNNs have been presented in recent years [16-20]. The Region-Based Convolutional Neural Network (R-CNN) is an architecture that locates and classifies multiple objects by combining a CNN with an external region proposal method. A region proposal method is an algorithm that outputs a set of regions of interest, typically defined by bounding boxes [21]. A commonly used region proposal method is Selective Search, which proposes regions of interest using similarity measures based on color and visual features. R-CNN crops and resizes each region of interest and classifies it using a CNN. The original architecture uses a CNN with five convolutional layers and two fully connected layers, although any CNN classifier can be used.
More complex methods for object detection include Fast R-CNN and Faster R-CNN. Fast R-CNN is based on R-CNN, but the full image is first processed by the convolutional layers, and regions of the output of the last convolutional layer are then cropped and classified. The network consists of a set of convolutional layers, fully connected layers, an external region proposal method (typically Selective Search) and a Region of Interest (RoI) pooling layer. The RoI pooling layer applies max-pooling to each region of interest using a grid of fixed size (typically 7 x 7) [22]. Fast R-CNN also introduces a bounding box regressor, a layer that outputs a fine-tuned location for each bounding box.
Faster R-CNN is based on Fast R-CNN but replaces the external region proposal method with a Region Proposal Network (RPN). The RPN is a neural network that generates regions of interest from the features output by the last convolutional layer. It consists of a 3 x 3 sliding window that outputs a set of bounding boxes (typically 9) with different sizes and aspect ratios, and a fully connected layer that assigns a binary class (foreground or background) to each bounding box [23].
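To make the "typically 9" boxes per sliding-window position concrete, the following is a minimal sketch of how such a set can be generated from three sizes and three aspect ratios. The specific scales (128, 256, 512 pixels) and ratios (0.5, 1, 2) are assumptions taken from commonly used Faster R-CNN defaults, not values stated in this paper, and `make_anchors` is a hypothetical helper name.

```python
import math

def make_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return one (width, height) pair per scale/ratio combination.

    Each anchor keeps an area of scale**2 while its height/width
    ratio varies. Scales and ratios are illustrative assumptions.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            width = scale / math.sqrt(ratio)
            height = scale * math.sqrt(ratio)
            anchors.append((width, height))
    return anchors

anchors = make_anchors()
print(len(anchors))  # 9
```

Three scales times three ratios give the nine candidate boxes; the RPN then scores each one as foreground or background.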
Many object detection algorithms, including those described above, output several overlapping bounding boxes. To merge them, the Non-Maximum Suppression (NMS) algorithm is used: NMS removes a bounding box if it largely overlaps with another bounding box of the same class that has a higher confidence score. New deep-learning-based methods for object detection are constantly appearing, including the Single Shot Detector (SSD), You Only Look Once (YOLO) and YOLOv2. These methods are typically faster than Faster R-CNN but obtain lower accuracy. YOLO is a unified CNN-based object detection model proposed by Joseph Redmon et al. in 2016. It uses a single network to predict both object positions and class scores in one pass. The motivation is to reframe detection as a regression problem that maps the input image directly to class probabilities and box locations. Thanks to this unified design, YOLO's detection speed is many times faster than that of other state-of-the-art methods [24].
Network Architecture
YOLOv2 is an improved version of YOLOv1, introduced in (Redmon et al. 2016b). We based our project on YOLOv2 because it is both more accurate and faster than YOLOv1. The development team also released a "tiny" variant that is much smaller than the original [25-28]. The implementation based on this tiny model is called Tiny YOLOv2. Tiny YOLOv2 consists of 9 convolutional layers with leaky ReLU activations, interleaved with 6 max-pooling layers. This is much smaller than the regular model, which makes it well suited to Android. Figure 3 shows the structure of the network. Observe that after the 6 max-pooling layers, the 416x416 input image becomes a 13x13xD output.
Figure 3: The network of YOLOv2
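The reduction from a 416x416 input to a 13x13 grid can be checked with a few lines of arithmetic. The sketch below assumes the pooling strides listed in Table 1 (five layers with stride 2, the last with stride 1) and padding that preserves the size for the stride-1 layer; `output_grid_size` is a hypothetical helper, not part of any YOLO implementation.

```python
def output_grid_size(input_size, pool_strides):
    """Apply each max-pooling stride in turn and return the final spatial size."""
    size = input_size
    for stride in pool_strides:
        # Stride 2 halves the resolution; stride 1 leaves it unchanged
        # (assuming size-preserving padding).
        size //= stride
    return size

# Six max-pooling layers: five with stride 2, the last with stride 1.
POOL_STRIDES = [2, 2, 2, 2, 2, 1]
grid = output_grid_size(416, POOL_STRIDES)
print(grid)  # 13
```

The five stride-2 layers divide the resolution by 32, which is why 416 maps to 13.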
YOLO divides the image into a grid of 13 by 13 cells. In object detection we have to predict not only the class of an object but also its location and shape [29], so the output of an object detection network is somewhat more complicated than that of a classifier. In YOLOv2, the output is a 3-dimensional array (a Tensor in TensorFlow) of shape 13x13xD, where D depends on how many object classes we want to detect (for example, D = 30 for a single class with 5 anchor boxes). The first two dimensions (13x13) are the grid cells, so there are 169 grid cells in total. Each grid cell is 'responsible' for detecting 5 bounding boxes, that is, up to 5 boxes can be detected per grid cell. The network can therefore detect up to 169 x 5 = 845 boxes at once [30]. The number of bounding boxes per grid cell equals the number of anchor boxes we prepare, and it can be changed freely. For example, if we want to detect humans and cars and consider two anchor boxes sufficient (a vertical rectangle for humans and a horizontal rectangle for cars), the number 5 above becomes 2. (In the YOLOv2 paper this number is denoted 'B'.) Figure 4 shows what the output of the YOLOv2 network looks like.
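Because up to 845 candidate boxes can be produced per image, overlapping predictions are usually merged with Non-Maximum Suppression, as described in the related work. The following is a minimal sketch of greedy per-class NMS, not the implementation used in this project; the box format (x1, y1, x2, y2), the 0.5 overlap threshold and the sample detections are all assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(detections, iou_threshold=0.5):
    """Greedy NMS over a list of (box, score, class_id) tuples."""
    # Visit detections from highest to lowest confidence.
    ordered = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    for det in ordered:
        box, _, cls = det
        # Keep the box unless a kept box of the same class overlaps it too much.
        if all(cls != k[2] or iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept

detections = [
    ((10, 10, 110, 110), 0.9, 0),
    ((12, 12, 112, 112), 0.8, 0),    # heavy overlap with the first box, same class
    ((12, 12, 112, 112), 0.7, 1),    # same location but a different class
    ((200, 200, 260, 260), 0.6, 0),  # no overlap
]
kept = nms(detections)
print(len(kept))  # 3
```

Only the second detection is removed: it overlaps a higher-scoring box of the same class, while the third belongs to another class and the fourth does not overlap anything.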
Layer | Kernel | Stride/Filters | Output shape
Input | - | - | 416x416x3
Convolution | 3x3 | 1/16 | 416x416x16
MaxPooling | 2x2 | 2 | 208x208x16
Convolution | 3x3 | 1/32 | 208x208x32
MaxPooling | 2x2 | 2 | 104x104x32
Convolution | 3x3 | 1/64 | 104x104x64
MaxPooling | 2x2 | 2 | 52x52x64
Convolution | 3x3 | 1/128 | 52x52x128
MaxPooling | 2x2 | 2 | 26x26x128
Convolution | 3x3 | 1/256 | 26x26x256
MaxPooling | 2x2 | 2 | 13x13x256
Convolution | 3x3 | 1/512 | 13x13x512
MaxPooling | 2x2 | 1 | 13x13x512
Convolution | 3x3 | 1/1024 | 13x13x1024
Convolution | 3x3 | 1/1024 | 13x13x1024
Convolution | 1x1 | 1/175 | 13x13x175
Figure 4: The output of the network for YOLOv2
Each grid cell has a depth of D, whose value depends on the number of classes we want to detect. There are 13x13 = 169 grid cells in total, and each grid cell can detect up to B bounding boxes. Each bounding box has 5 + C properties (4 box coordinates, 1 confidence score and C class probabilities), so for C object classes the depth of a grid cell is
$$D = B \times (5 + C)$$
and the output tensor has shape $S \times S \times B \times (5 + C)$, where $S = 13$ is the grid size. In our case the number of classes is $C = 30$ and $B = 5$, so $D = 5 \times 35 = 175$.
Figure 5: This 13x13 tensor can be considered as a 13x13 grid representing the input image, where each cell of this tensor holds the 5 box definitions and 30 class probabilities.
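The depth formula and the box count can be verified with a short sketch using the values given in the text (S = 13, B = 5, C = 30). The way a cell's D values are split into B consecutive groups of 5 + C is an assumption about memory layout made for illustration, and `split_cell` is a hypothetical helper.

```python
S, B, C = 13, 5, 30        # grid size, anchor boxes per cell, classes (from the text)
VALUES_PER_BOX = 5 + C     # 4 box coordinates + 1 confidence + C class scores
D = B * VALUES_PER_BOX     # depth of each grid cell

def split_cell(cell_values):
    """Split one grid cell's flat list of D values into B per-box lists.

    Assumes the values of each box are stored contiguously.
    """
    if len(cell_values) != D:
        raise ValueError("expected %d values, got %d" % (D, len(cell_values)))
    return [cell_values[b * VALUES_PER_BOX:(b + 1) * VALUES_PER_BOX]
            for b in range(B)]

cell = list(range(D))      # dummy values standing in for real network output
boxes = split_cell(cell)
print(D, S * S * B, len(boxes), len(boxes[0]))  # 175 845 5 35
```

With these values the network emits 13 x 13 x 175 numbers per image, which decode into at most 845 candidate boxes.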
The input to the YOLOv2-tiny network is a 416x416x3 image, and the network has no fully connected layer [31] (Table 1).
Table 1: Details of the network
Experimental Results
In our project we used the FlickrLogos-32 dataset. The dataset contains photos showing brand logos and is meant for the evaluation of multi-class logo detection/recognition as well as logo retrieval methods on real-world images. Logos of 32 different classes and 6000 negative images were collected by downloading them from Flickr. The dataset includes images, ground truth, annotations (bounding boxes plus binary masks), evaluation scripts and precomputed visual features [32].
[Plot: Average Precision per logo class, ranging from 0.74 to 0.94 across classes such as starbucks, becks, texaco, adidas, nvidia and fedex; mAP = 82.53 %]
One of the most time-consuming and costly processes in constructing the FlickrLogos-32 database is annotating the logo objects in the collected product images. For each product image, an annotator needs to identify the logo objects, annotate the bounding box of each logo object, and then tag it with the corresponding logo class id. Figure 5 shows examples of logo object annotation on product images.
Figure 5: Instruction example of logo object annotation. The left-hand example is rejected because its bounding box is too loose.
Metric
mAP (mean Average Precision) is a popular metric for measuring the accuracy of object detectors such as YOLO and SSD. Average precision (AP) summarizes the precision/recall curve by averaging precision over recall values from 0 to 1. We first calculate the precision/recall curve for each class. We then compute a version of the measured curve in which precision is monotonically decreasing, by setting the precision for recall r to the maximum precision obtained for any recall r' >= r. Finally, we compute the AP as the area under this curve by numerical integration; no approximation is involved since the curve is piecewise constant. Averaging the AP over all classes gives the mean average precision (mAP), a value from 0 to 100% [33].
Figure 6: Mean average precision (mAP) results.
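The AP computation described above can be sketched in a few lines. This is an illustrative implementation of the area under the monotonically decreasing precision/recall envelope, not the evaluation script used in this project; the function names and the sample precision/recall values are assumptions.

```python
def average_precision(recalls, precisions):
    """Area under the monotonically decreasing precision/recall envelope.

    recalls must be sorted in increasing order, with matching precisions.
    """
    # Add sentinel points at recall 0 and recall 1.
    rec = [0.0] + list(recalls) + [1.0]
    prec = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(prec) - 2, -1, -1):
        prec[i] = max(prec[i], prec[i + 1])
    # The envelope is piecewise constant, so the area is a sum of rectangles.
    return sum((rec[i] - rec[i - 1]) * prec[i] for i in range(1, len(rec)))

def mean_average_precision(ap_per_class):
    """Average the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)

# Hypothetical precision/recall points for one class.
ap = average_precision([0.2, 0.4, 0.6, 0.8], [1.0, 0.5, 0.75, 0.6])
print(round(ap, 2))  # 0.62
```

In the example, the dip to 0.5 at recall 0.4 is lifted to 0.75 by the envelope step, so it does not penalize the final area.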
Training Darkflow and our custom CNN architecture took an immense amount of time. We trained our models in batches of 64 images split into 8 mini-batches, which allowed us to train on 64 images at every step [34]. On an NVIDIA GeForce GTX 1070, each step took 0.5 seconds. This allowed us to train each model for 2000 epochs, so that we could observe the early stopping point and the weights that gave the best accuracy. YOLO's implementation allowed us to save the weight files every 10000 steps, so we let training run overnight and scraped the accuracy in the morning using a script. Our results show that the model works best on the dataset above with a little fewer than 2000 epochs: we trained for up to 2000 epochs and the accuracy peaked at epoch 1500. We also experimented with different learning rates, but the accuracy never improved.
The trained model reaches a mAP (mean average precision) of 82% and tracks logos very smoothly. On an Android mobile phone (Honor 9), we measured the quantitative performance of logo detection in a series of experiments; the result is shown in Figure 7.
Figure 7: Logo detection on the Honor 9.
CONCLUSIONS
We trained the model on the FlickrLogos-32 dataset, and the experimental results show that YOLOv2 performs very well in real-time logo detection. In a comprehensive analysis of YOLOv2 on the FlickrLogos-32 dataset, we achieved a final mean average precision (mAP) of 82.53% and a speed of 30-35 FPS (frames per second) on an NVIDIA GeForce GTX 1070. Our models performed well at detection, with very low false-positive rates. The application runs smoothly on the current test hardware, and the main goal was achieved: a working application that uses a neural network model for object detection.
REFERENCES
[1] Fehérvári, I., Appalaraju, S. (2019, January). Scalable logo recognition using proxies. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV) (pp. 715-725). IEEE.
[2] Su, Hang, Xiatian Zhu, and Shaogang Gong. "Open logo detection challenge." arXiv preprint arXiv:1807.01964 (2018).
[3] Oliveira, G., Frazão, X., Pimentel, A., Ribeiro, B. (2016, July). Automatic graphic logo detection via fast region-based convolutional networks. In 2016 International Joint Conference on Neural Networks (IJCNN) (pp. 985-991). IEEE.
[4] Hoi, S. C., Wu, X., Liu, H., Wu, Y., Wang, H., Xue, H., Wu, Q. (2015). Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks. arXiv preprint arXiv:1511.02462.
[5] Shafiee, M. J., Chywl, B., Li, F., Wong, A. (2017). Fast YOLO: A Fast You Only Look Once System for Realtime Embedded Object Detection in Video. arXiv: Computer Vision and Pattern Recognition.
[6] Fehérvári, István, and Srikar Appalaraju. "Scalable logo recognition using proxies." 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2019.
[7] Ren, S., He, K., Girshick, R., Sun, J. (2015). Faster R-CNN: towards real-time object detection with region proposal networks. Neural information processing systems.
[8] Ren, S., He, K., Girshick, R., Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems (pp. 91-99).
[9] Le, Viet Phuong. "Logo detection, recognition and spotting in context by matching local visual features." PhD diss., Université de La Rochelle, 2015.
[10] Eggert, C., Brehm, S., Winschel, A., Zecha, D. and Lienhart, R., 2017, July. A closer look: Small object detection in faster R-CNN. In 2017 IEEE International Conference on Multimedia and Expo (ICME) (pp. 421-426). IEEE.
[11] Saitov E.B., Akhmedov U.M. Tracking maximum power point of photovoltaic modules, AIP Conference Proceedings, 2023, 2789, 060005, https://doi.org/10.1063/5.0145470
[12] Zikrillayev N., Ayupov K., Mamarajabova Z., Yuldoshev I., Saitov E., Sabitova I. Adaptation and optimization of a photovoltaic system for a country house, E3S Web of Conferences, 2023, 383, 04054, https://doi.org/10.1051/e3sconf/202338304054
[13] Saitov, E., Kudenov, I., Qodirova, F., Askarov, D., Sultonova, M. Analysis of the performance and economic parameters of different types of solar panels taking into account degradation processes. E3S Web of Conferences, 2023, 383, 04059 https://doi.org/10.1051/e3sconf/202338304059
[14] Saitov, E., Khushakov, G., Masharipova, U., Mamasaliyev, O., Rasulova, S. Investigation of the working condition of large power solar panel cleaning device. E3S Web of Conferences, 2023, 383, 04060, https://doi.org/10.1051/e3sconf/202338304060
[15] Zikrillayev, N., Saitov, E., Toshov, J., Muradov, B., Muxtorov, D. Autonomous Solar Power Plant for Individual use Simulation in LTspice Software Package Booster Voltage Converter. Proceedings of International Conference on Applied Innovation in IT, 2023, 11(1), pp. 207-211, https://doi.org/10.25673/101939
[16] Yuldoshov, B., Saitov, E., Khaliyarov, J., Toshpulatov, S., Kholmurzayeva, F. Effect of Temperature on Electrical Parameters of Photovoltaic Module. Proceedings of International Conference on Applied Innovation in IT, 2023, 11(1), pp. 291-295, https://doi.org/10.25673/101957
[17] Saitov, E., Jurayev, O., Axrorova, S., Ismailov, J., Baymirzaev, B. Conversion and use of Solar Energy Calculation Methodology for Photovoltaic Systems, Proceedings of International Conference on Applied Innovation in IT, 2023, 11(1), pp. 227-232, https://doi.org/10.25673/101942
[18] Zikrillayev, N.F., Shoabdurahimova, M.M., Kamilov, T., Norkulov, N., Saitov, E.B. Obtaining manganese silicide films on a silicon substrate by the diffusion method. Physical Sciences and Technology, 2022, 9(2), pp. 89-93, https://doi.org/10.26577/phst.2022.v9.i2.011
[19] Zikrillayev, N.F., Saitov, E.B., Ayupov, K.S., Yuldoshev, I.A. Determination of the resistance of external parameters to the degradation of the parameters of silicon photocells with input nickel atoms. Physical Sciences and Technology, 2022, 9(1), pp. 30-36, https://doi.org/10.26577/phst.2022.v9.i1.04
[20] Saitov, E.B., Kodirov, Sh., Beknazarova, Z.F., Nortojiyev, A., Siddikov, N. Developing Renewable Sources of Energy in Uzbekistan Renewable Energy Short Overview: Programs and Prospects. AIP Conference Proceedings, 2022, 2432, 020015, https://doi.org/10.1063/5.0090438
[21] E. B. Saitov; T. B. Sodiqov. Modeling an autonomous photovoltaic system in the MATLAB Simulink software environment, AIP Conf. Proc. 2432, 020022 (2022), https://doi.org/10.1063/5.0089914
[22] E. B. Saitov, Sh. Kodirov, B. M. Kamanov, N. Imomkulov, I. Kudenov, Increasing the efficiency of autonomous solar photovoltaic installations for power supply of agricultural consumers, AIP Conf. Proc. 2432, 040036 (2022), https://doi.org/10.1063/5.0090439
[23] F. Zikrillayev, E. B. Saitov, J. B. Toshov, B. K. Ilyasov, M. B. Zubaydullayev, A software package for determining the optimal composition and parameters of a combined autonomous power supply system based on renewable energy sources, AIP Conf. Proc. 2432, 020021 (2022), https://doi.org/10.1063/5.0090460
[24] Zikrillayev Nurullo, Toshov Javohir, Saitov Elyor, Sodikov Temur, Development and selection of equipment for a peak autonomous photo power plant, AIP Conf. Proc. 2552, 030020 (2023), https://doi.org/10.1063/5.0117594
[25] Turabdjanov Sadritdin, Toshov Javohir, Saitov Elyor, Axmedov Usmonjon, Research of electrophysical parameters of different types of solar panels taking into account degradation processes, AIP Conf. Proc. 2552, 030019 (2023), https://doi.org/10.1063/5.0117592
[26] Yu.M. Kurbonov, E.B. Saitov and B.M. Botirov, Analysis of the influence of temperature on the operating mode of a photovoltaic solar station, IOP Conference Series: Earth and Environmental Science, Volume 614, 1st International Conference on Energetics, Civil and Agricultural Engineering 2020, 14-16 October 2020, Tashkent, Uzbekistan, IOP Conf. Ser.: Earth Environ. Sci. 614, 012034 (2020), https://doi.org/10.1088/1755-1315/614/1/012034
[27] E.B. Saitov, Optimal model for additional operation of the storage system for photovoltaic wind power plants, E3S Web of Conferences 220, 01080 (2020), https://doi.org/10.1051/e3sconf/202022001080
[28] E.B.Saitov, Y.B.Sobirov, I.A.Yuldoshev, I.R. Jurayev and Sh.Kodirov, Study of Solar Radiation and Wind Characteristics in Various Regions of Uzbekistan, E3S Web of Conferences 220, 01061 (2020), https://doi.org/10.1051/e3sconf/202022001061
[29] E.B. Saitov, J.B. Toshov, A.O. Pulatov, B.M. Botirov and Yu.M. Kurbanov, Networked interactive solar panels over the roof photovoltaic system (PVS) and its cost analysis at Tashkent state technical University, E3S Web of Conferences 216, 01133 (2020), https://doi.org/10.1051/e3sconf/202021601133
[30] E.B. Saitov, Renewable energy development in Uzbekistan: current status, problems and solutions, E3S Web of Conferences 216, 01134 (2020), https://doi.org/10.1051/e3sconf/202021601134
[31] Sanjar Shoguchkarov, Isroil Yuldoshev, Elyor Saitov and Alisher Boliev, The effect of the surface geometry of a photovoltaic battery on its efficiency, E3S Web of Conferences 216, 01149 (2020), https://doi.org/10.1051/e3sconf/202021601149
[32] I. Sapaev, E. Saitov, N. Zoxidov and B. Kamanov, Matlab-model of a solar photovoltaic station integrated with a local electrical network, IOP Conference Series: Materials Science and Engineering, Volume 883, International Scientific Conference Construction Mechanics, Hydraulics and Water Resources Engineering (CONMECHYDRO - 2020), 23-25 April 2020, Tashkent Institute of Irrigation and Agricultural Mechanization Engineers, Tashkent, Uzbekistan, https://doi.org/10.1088/1757-899X/883/1/012116
[33] Javoxir Toshov and Elyor Saitov, Portable autonomous solar power plant for individual use, E3S Web of Conferences 139, 01087 (2019), https://doi.org/10.1051/e3sconf/201913901087
[34] M. K. Bakhadyrkhanov, S. A. Valiev, N. F. Zikrillaev, S. V. Koveshnikov, E. B. Saitov & S. A. Tachilin, Silicon photovoltaic cells with clusters of nickel atoms, Direct Conversion of Solar Energy into Electric Energy, Published: 22 February 2017. Volume 52, pages 278-281, (2016), https://doi.org/10.3103/S0003701X1604006X