GREENHOUSE PRODUCTIVITY ESTIMATION BASED ON THE
OPTIMIZED YOLOV5 MODEL
Eraliev Oybek Maripjon ugli
Teacher of Information Technology Department, Inha University in Tashkent, PhD
Rashidov Kodirjon Ilkhomjon ugli
Head of International Trade Department, Korea International University in
Ferghana, teacher
Eraliev Khojiakbar Abdinabi ugli
Teacher of Electrical Engineering Department, Ferghana Polytechnic Institute, PhD
candidate
eraliyevhoj [email protected]
Abstract: In modern agriculture, precision monitoring and efficient resource management are paramount for maximizing crop yields. This research presents a novel approach to greenhouse productivity estimation by leveraging the state-of-the-art YOLOv5 object detection model, tailored and optimized for a custom tomato dataset. The study focuses on detecting and classifying tomatoes into three categories (green, pink, and red), providing a comprehensive understanding of the ripening process in real time. The optimized YOLOv5 model demonstrated superior performance compared to the standard version, showcasing enhanced accuracy in tomato identification. The model was deployed in a real-world greenhouse equipped with a meticulously arranged seven-camera system, each camera capturing one row of tomato plants. By extrapolating the results from a single row to the entire greenhouse (comprising eight rows), an accurate estimation of overall productivity was achieved. A web application was developed to facilitate real-time monitoring of tomato plant states and key statistics. The application provides insights into the percentages of green, pink, and red tomatoes, allowing greenhouse operators to make informed decisions on resource allocation and management. The proposed methodology offers a scalable and practical solution for greenhouse productivity assessment, potentially revolutionizing the precision agriculture landscape. The findings contribute to the advancement of computer vision applications in agriculture, fostering sustainable and efficient practices in greenhouse cultivation.
Keywords: greenhouse productivity, YOLOv5 optimization, tomato detection, precision agriculture, real-time monitoring.
INTRODUCTION
In a time marked by remarkable technological progress, the agricultural industry is on the brink of a significant shift towards sustainability, effectiveness, and accuracy. As the cornerstone of human survival, agriculture confronts the daunting task of nourishing a rapidly growing world population while reducing its environmental impact. Greenhouse farming has emerged as a promising remedy to these complex issues. By offering a controlled setting, it enables farmers to grow crops with exceptional precision, maximizing the use of resources and increasing yields. Picture 1 illustrates the progression of agriculture from 12,000 B.C. onward, highlighting advancements in techniques, crop cultivation, and mechanization. Prehistoric farming relied on rudimentary tools such as sticks and sickles, manual harvesting, and animal husbandry. Modern agriculture integrates technology such as cellphones for remote crop monitoring and genetically modified seeds that enhance crop yield and resilience against pests and diseases. This technological evolution has likely contributed to a reduction in global food scarcity by improving crop quality and quantity.
Picture 1. Representation of the evolution of agricultural practices (the Picture was designed and prepared by the author).
LITERATURE REVIEW
Modern agriculture is undergoing a transformative shift with the integration of advanced technologies, paving the way for precision farming and enhanced productivity. Among these technologies, computer vision-based object detection models have emerged as powerful tools for automating crop monitoring processes. Recently, the use of computer vision in intelligent plant factory systems has increased significantly [1]-[5]. Almost all production links are covered by this technology, including raising seedlings, transplanting, managing, harvesting, and fruit sorting. By imitating human vision, it gathers information from images, evaluates it, and directs practical production [6], [7]. Compared with chemical and physical methods, machine vision captures plant data without causing any harm to the plants. Additionally, it operates consistently, cheaply, and with excellent efficiency. An image capture section, a data processing section, and a job execution section make up the traditional machine vision system. The first is used to take pictures and transmit them to the following component; it consists of image capture cards, optical systems, and light sources. The data processing component gathers and examines data from the photographs, makes a choice based on the outcomes of learning, and then delivers instructions to the final segment. Here, a computer system serves as the main element. The job execution portion, usually a mechanical module, carries out operations such as watering and fruit picking; robots, environmental control systems, and nutrition supply systems are typical in this section. Numerous reviews have discussed the status of agricultural computer vision applications. They mostly deal with field crops [8], [9], and only a few of them mention plant factories. A plant factory's environment is complex and distinct from the outdoors. In addition to the actual plants, there are irrigation pipelines, hanging ropes, mechanical devices, and other supporting infrastructure. Computer vision applications are made more difficult by the fact that the illumination conditions change often in accordance with the needs of the plants.
Object detection, a subfield of computer vision, has witnessed remarkable advancements in recent years and has found a multitude of applications in agriculture, particularly within the context of smart farm management. The ability to accurately identify and track objects, such as plants, pests, and diseases, has proven invaluable for enhancing crop health, optimizing resource utilization, and reducing the environmental impact of smart farm agriculture. This section delves into the various applications and methodologies of object detection in smart farm environments [10].
One of the primary applications of object detection in smart farm agriculture is the monitoring and management of plant growth. Computer vision techniques, notably CNNs, have demonstrated their capacity to identify individual plants and monitor their growth progress over time. This granular level of monitoring enables growers to tailor cultivation practices to the specific needs of each plant, ensuring optimal conditions for growth. By analyzing the data collected from object detection systems, growers can adjust variables such as irrigation, fertilization, and lighting to maximize crop yields. For instance, identifying overcrowded areas or uneven plant distribution allows for timely interventions, such as thinning or rearranging plants, to ensure that each plant receives sufficient resources and space for healthy development [11].
While object detection applications offer significant advantages for smart farm agriculture, several challenges and limitations must be considered. These include variations in lighting conditions, occlusions, and the diversity of plant species and growth stages. Developing robust object detection models that can perform effectively under these conditions remains an ongoing research challenge. Additionally, data annotation and model training can be time-consuming and resource-intensive. Large and diverse annotated datasets are essential for training accurate object detection models, and the availability of such datasets varies across agricultural contexts.
Based on reasonably regulated environmental conditions, plant factories are an innovative vertical agriculture solution that constitutes an advanced form of smart farm agriculture capable of producing a sustainable supply of vegetables, herbs, flowers, and other crops year-round [12]. They also act as an urban agriculture option, bringing high-quality, fresh, and nutritious plant products to cities so that people can eat freshly picked vegetables [13]. Grown in smart farms and plant factories, tomatoes are prized commodities. Tomato detection accuracy drops when small-target tomato varieties are obstructed by the dense foliage of tomato plants. Furthermore, to improve detection accuracy, detection systems frequently rely on large, complicated heavyweight models. These models raise the cost of mobile and intelligent devices by requiring a significant amount of processing power and storage.
Deep convolutional neural networks (DCNNs) have been successfully used in agriculture in recent years, opening up new research opportunities for tomato fruit recognition and classification using computer-vision-based DCNN detection methods. Depending on the number of detection steps, DCNN target detection techniques fall into one of two categories. (1) Two-stage detection techniques first generate candidate frames from the image, then classify and refine them. This kind of detection technique is based on convolutional neural networks (CNNs) and includes regional convolutional neural networks (RCNNs) [14], Fast RCNNs [15], Faster RCNNs [16], and so on. Two-stage detection models perform well in terms of recall and precision; however, because of their large network size and slow operation speed, their implementation in real-time detection scenarios is difficult. (2) One-stage detection techniques locate and categorize the target based on features derived directly from the input image. This kind of detection technique includes the You Only Look Once (YOLO) series [17]-[22] and the single-shot multibox detector (SSD) [23]. Owing to their network structure design, single-stage detection models can meet real-time performance requirements with rapid operation speed while matching the accuracy of two-stage detection models.
METHODOLOGY
This study focuses on the application of the YOLOv5 object detection model, optimized for a custom tomato dataset, to estimate greenhouse productivity in real time. Recent developments in deep learning and object detection have shown promising results in various agricultural applications. However, adapting these techniques to the specific needs of greenhouse cultivation, particularly for tomatoes, poses unique challenges. Achieving accurate detection and classification of tomatoes at different ripening stages is critical for precise monitoring and resource management. The optimization of the YOLOv5 model addresses the challenges associated with tomato detection, providing a robust solution for greenhouse productivity estimation.
Recent findings in computer vision and precision agriculture underscore the potential impact of such models in improving crop management strategies. This research builds upon these advancements, contributing a tailored approach for real-world implementation in a greenhouse environment.
Challenges in this area include the need for accurate and efficient detection across varying lighting conditions, occlusions within the dense foliage of tomato plants, and the dynamic nature of ripening stages. The proposed model aims to tackle these challenges, offering a reliable and scalable solution for greenhouse operators. As agriculture strives for sustainability and resource efficiency, the integration of optimized YOLOv5 for tomato detection stands as a significant step towards smart, data-driven greenhouse management. This research addresses a critical gap in the current literature and sets the stage for further advancements in precision agriculture methodologies.
DISCUSSION AND RESULTS
YOLOv5 Object Detection Model Network Architecture: The practice of locating a specific object in an image or video is known as object detection. Nowadays, computer vision makes extensive use of object detection algorithms, and both deep learning and classical machine learning employ several techniques for it. Object detection in plant leaves is one representative instance of this problem [24]. A number of detection techniques, including SIFT [25], Haar [26], HOG [27], and finally convolutional features [28], have been examined in the literature. Following feature extraction, objects in the feature space are identified using localizers or classifiers. The choice of an object detection technique is a critical decision in the development of any computer vision system, particularly in the context of this research on smart farm agriculture. The selection of YOLOv5 (You Only Look Once, version 5) as the primary object detection technique for this research is rooted in its impressive combination of real-time processing speed, accuracy, efficiency, ease of implementation, versatility, and strong community support. These advantages position YOLOv5 as a pivotal tool for achieving the goals of this research, including the precise monitoring of plants, pests, and environmental conditions, ultimately contributing to more efficient and sustainable smart farm agriculture practices.
One of the most recent iterations of YOLO is version 5 [29]. It offers a high level of accuracy in both detection and inference speed. The weight file of YOLOv5 is 90 percent smaller than that of YOLOv4, which makes it applicable to embedded devices for real-time detection. Compared to earlier YOLO versions, YOLOv5 boasts quick detection times, light weight, and high detection accuracy. Accuracy and effectiveness are crucial when identifying plant diseases; for example, the YOLOv5 design has been shown to enhance disease detection in bell pepper plants [24]. The architecture comes in four versions: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. The four designs differ from one another in the feature extraction module and the convolutional network kernel; they also differ in model size and in the number of model parameters.
In Picture 2, the YOLO framework is displayed. The basic concept of YOLO is to divide a picture into an S x S grid and use confidence and bounding-box prediction to identify the object in each grid cell. Each grid cell must predict its bounding boxes along with the confidence score associated with each. The intersection over union (IoU) equals one if the detected bounding box exactly matches the ground truth (GT). In this way, bounding boxes that differ in size from the actual box are penalized.
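For illustration, the IoU between a detected box and a ground-truth box, both given as [xmin, ymin, xmax, ymax] pixel coordinates, can be computed as in the minimal Python sketch below; this is a generic implementation for clarity, not the exact routine used inside YOLOv5.

def iou(box_a, box_b):
    # Boxes are [xmin, ymin, xmax, ymax] in pixel coordinates.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou([10, 10, 50, 50], [10, 10, 50, 50]))  # 1.0 for a perfect match, as noted above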
Picture 2. YOLOv5 model (the Picture was prepared by the author based on experiments conducted in a greenhouse).
The three components of the YOLOv5 model are the head, neck, and backbone, as seen in Picture 3. The backbone extracts specific features from the input image at various granularities. Following the backbone's feature extraction, the neck aggregates the features before passing them on to the following layer for prediction. Ultimately, the head predicts the class labels and generates the bounding boxes.
2 The Picture was prepared by the author based on experiments conducted in a greenhouse.
Picture 3. YOLOv5 model architecture (the Picture was prepared by the author using data obtained from experiments conducted in a greenhouse).
The core of YOLOv5 performs a slice operation for downsampling using the Focus layer. The Focus layer receives the original RGB input image, executes a slice operation to create a 12-channel feature map, and then applies a convolution with 32 kernels to create a 32-channel feature map. Cross Stage Partial (CSP) structures come in two varieties in YOLOv5: CSP1_X and CSP2_X. The first, CSP1_X, is used in the backbone to obtain rich gradient information while lowering the computation cost. The backbone also uses spatial pyramid pooling (SPP) to produce feature maps of a fixed size while preserving image detection accuracy.
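A simplified PyTorch sketch of this slice-and-concatenate idea is given below; it mirrors the Focus layer described above but omits the batch normalization and activation that the actual YOLOv5 implementation applies after the convolution.

import torch
import torch.nn as nn

class Focus(nn.Module):
    # Slice the input into four pixel-offset sub-images, concatenate them
    # along the channel axis, then fuse them with a single convolution.
    def __init__(self, c_in=3, c_out=32, k=3):
        super().__init__()
        self.conv = nn.Conv2d(c_in * 4, c_out, k, stride=1, padding=k // 2)

    def forward(self, x):
        # (B, 3, H, W) -> (B, 12, H/2, W/2) -> (B, 32, H/2, W/2)
        sliced = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                            x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.conv(sliced)

out = Focus()(torch.zeros(1, 3, 640, 640))
print(out.shape)  # torch.Size([1, 32, 320, 320])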
The main purposes of the YOLOv5 neck are to build feature pyramids, enhance the model's ability to identify objects of different sizes, and enable recognition of the same object at different scales. To aggregate the features, YOLOv5 uses the CSP2_X structure, the Feature Pyramid Network (FPN) [30], and the Path Aggregation Network (PAN) [31] as the neck. The loss function and non-maximum suppression (NMS) make up the YOLOv5 head. The loss function comprises three components: the bounding-box loss, the confidence loss, and the classification loss. The generalized IoU (GIoU) is used to determine the bounding-box loss [32]. When post-processing the detected targets, YOLOv5 uses weighted NMS to filter the multiple bounding boxes per target and eliminate duplicate boxes.
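Following the definition in [32], GIoU extends IoU with a penalty based on the smallest box enclosing both the prediction and the ground truth; the self-contained sketch below assumes boxes with positive area, and the bounding-box loss is then 1 - GIoU.

def giou(box_a, box_b):
    # Generalized IoU [32]: IoU minus the fraction of the smallest enclosing
    # box C that is not covered by the union of the two boxes.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    cx1, cy1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    cx2, cy2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    c_area = (cx2 - cx1) * (cy2 - cy1)
    return inter / union - (c_area - union) / c_area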
Hyperparameter Optimization of YOLOv5 Model: In machine learning, hyperparameters regulate several facets of training, and determining their ideal values can be difficult. Traditional techniques such as grid search can easily become unmanageable because of the enormous dimensionality of the search space. A genetic algorithm (GA) is a good option for hyperparameter search for three reasons: 1) the high dimensionality of the search space; 2) the unknown correlations among the dimensions; and 3) the cost of evaluating the fitness at each point. About thirty hyperparameters in YOLOv5 are employed for different training configurations. They are specified in *.yaml files in the /data/hyps directory. Correct initialization of these variables is crucial before evolving, since better starting estimates yield better final results. However, to save model optimization time, several hyperparameters related to data augmentation were removed from the search. The main hyperparameters, the values recommended by the YOLOv5 authors, and our optimized values are shown in Table 1.
Table 1
Optimized hyperparameters of the YOLOv5s model (the table was prepared by the author; entries include, among others, the warmup initial bias learning rate)
The value we aim to optimize is fitness. The default fitness function in YOLOv5 is defined as a weighted mixture of metrics, where [email protected] contributes 10% of the weight and [email protected]:0.95 contributes the remaining 90%, with precision (P) and recall (R) absent. Evolution is carried out for 300 generations from a baseline scenario that we aim to improve. In this study, a pretrained YOLOv5s is fine-tuned on COCO128 for 10 epochs. Crossover and mutation are the two most important genetic operators. In this work, mutation is employed to create new children with an 80% probability and a 0.04 variance, based on a combination of the best parents from all past generations. The outcomes are stored in runs/evolve/exp/evolve.csv, and the fittest hyperparameters are kept in runs/evolve/hyp_evolved.yaml.
Experimental Setup: Camera Installation in a Greenhouse
In the pursuit of advancing agricultural monitoring and productivity estimation, a dedicated experimental setup has been established for tomato detection in an outdoor smart farm environment. This section outlines the key components and methodologies employed in this endeavor. To facilitate real-time monitoring of tomato plants, seven FXT CCTV cameras have been strategically installed within the smart farm. These cameras provide continuous streaming, each capturing the growth and development of one row of tomato plants. There are eight rows of tomato plants within the smart farm. The deployment of multiple cameras ensures comprehensive coverage, allowing for detailed observation and analysis of each plant's status. The process of installing the cameras for tomato detection in the smart farm is shown in Picture 4 (a). A schematic illustration of the proposed strategy for detecting tomatoes and predicting the overall productivity of the smart farm is shown in Picture 4 (b). The goal of this setup is to predict the crop production of the smart farm by detecting the tomatoes in one row and multiplying by the number of rows of tomato plants.
Picture 4. a) Camera installation process in a smart farm, b) schematic illustration of camera installation (the Picture was prepared by the author using data obtained from experiments conducted in a greenhouse).
A robust object detection model has been implemented to identify and classify tomatoes within the smart farm. The model is trained to recognize three distinct classes of tomatoes: green, pink, and red. This multi-class classification allows for a nuanced understanding of the ripening stages, enabling precise monitoring of the entire tomato crop. Upon successful detection and classification of tomatoes, the system undertakes an essential step in estimating the overall productivity of the smart farm. By multiplying the detections from the monitored row by the number of rows of tomato plants within the smart farm, the system derives an estimate of the total tomato yield, as sketched below. This calculated productivity metric provides valuable insights into the smart farm's efficiency and the potential harvest output. This experimental setup not only enables real-time tracking of tomato growth but also incorporates a sophisticated object detection model to categorize tomatoes based on their ripeness. The subsequent productivity estimation serves as a critical metric for assessing the success of smart farm operations. The implementation of this tomato detection system represents a pioneering approach to smart farm agriculture, promising enhanced monitoring capabilities and improved productivity assessment. The insights gained from this experimental setup contribute to the broader landscape of precision agriculture, where technology plays a pivotal role in optimizing crop management and resource utilization.
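A minimal sketch of this extrapolation step is shown below, assuming the per-row detection counts have already been aggregated from a camera stream; the counts used here are illustrative, not measured values.

def estimate_productivity(row_counts, total_rows=8):
    # row_counts: detections from one monitored row, keyed by ripeness class.
    # Per-row totals are scaled up by the number of rows in the greenhouse.
    totals = {cls: n * total_rows for cls, n in row_counts.items()}
    grand = sum(totals.values())
    shares = {cls: 100.0 * n / grand for cls, n in totals.items()}
    return totals, shares

totals, shares = estimate_productivity({"green": 134, "pink": 22, "red": 44})
print(totals)  # {'green': 1072, 'pink': 176, 'red': 352}
print(shares)  # percentage of each ripeness class across the greenhouse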
Tomato Datasets: In this research, a comprehensive dataset has been curated to facilitate the training and evaluation of the tomato ripeness detection model. The dataset is a crucial component, encompassing a diverse range of images collected from various sources to ensure its representativeness and effectiveness in addressing the research objectives. A substantial portion of the dataset comprises images gathered from open-source repositories and publicly available datasets related to agriculture and food. These images offer a wide variety of contexts and scenarios, contributing to the robustness of the model. To enhance the authenticity of the dataset, real-world examples have been incorporated, capturing images of tomatoes at different stages of ripeness (green, pink, and red) obtained through direct observation and collection. These images provide a more accurate representation of the challenges faced in real-world scenarios. A unique aspect of the dataset collection involves crowdsourcing data through mobile devices: contributors are encouraged to share images of tomatoes, thereby expanding the dataset and incorporating diverse perspectives.
Images in the dataset exhibit a wide range of dimensions, mirroring the variability encountered in practical applications. This diversity ensures the model's adaptability to different image sizes and resolutions. The dataset comprises a total of 424 images, carefully curated to strike a balance between sufficiency and diversity. The distribution of ripeness levels within the dataset ensures a comprehensive representation of the tomato ripening process. This meticulously collected dataset serves as a valuable resource for training and evaluating the proposed tomato ripeness detection model, contributing to the robustness and applicability of the research outcomes. Some samples of the tomato dataset are shown in Picture 5.
Picture 5. Samples from the tomato dataset (the Picture was prepared by the author using data obtained from experiments conducted in a greenhouse).
An 8:1:1 ratio is used to divide the dataset into training, validation, and test sets. Preprocessing techniques are applied to the training set. Preprocessing the photos and unifying attributes such as image size and color ensures that parameters such as image size suit the needs of model training and reduces the noise introduced during image collection. The most often used image preparation techniques are resize, padding resize, and letterbox, as well as image vector normalization. The most common image transformation techniques are flipping, blurring, color enhancement, edge enhancement, logarithmic transformation, and image denoising. To enlarge the dataset, prevent overfitting of the model, ensure the dataset's availability, and accelerate model training, a mixture of the aforementioned strategies has been utilized to modify the gathered photos, as sketched below.
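The sketch below illustrates a few of these operations with OpenCV; the file name is hypothetical, and the exact parameter values used in this study may differ.

import cv2
import numpy as np

img = cv2.imread("tomato_001.jpg")            # hypothetical sample image

flipped = cv2.flip(img, 1)                    # horizontal flip
blurred = cv2.GaussianBlur(img, (5, 5), 0)    # blurring
letterboxed = cv2.copyMakeBorder(             # padding resize to a square canvas
    cv2.resize(img, (640, 480)), 80, 80, 0, 0,
    cv2.BORDER_CONSTANT, value=(114, 114, 114))
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
hsv[..., 1] *= 1.3                            # color (saturation) enhancement
enhanced = cv2.cvtColor(np.clip(hsv, 0, 255).astype(np.uint8), cv2.COLOR_HSV2BGR)
denoised = cv2.fastNlMeansDenoisingColored(img)  # image denoising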
The Python-based LabelImg tool is utilized for image calibration, serving as a graphical image annotation tool. Both the training-set and test-set images are annotated in YOLO format to generate the corresponding text files. One of these files contains the category names, while the others contain the positional information of the target frames. The corner coordinates [xmin, ymin, xmax, ymax] of a target frame denote the X and Y values of its upper-left and lower-right corners, respectively. The YOLO label data comprises [class, xcenter, ycenter, width, height], representing the category of the target box, its center coordinates, and its width and height.
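A minimal sketch of this conversion from corner coordinates to a normalized YOLO label line is shown below; the class index and pixel values in the example are illustrative.

def to_yolo_label(cls, xmin, ymin, xmax, ymax, img_w, img_h):
    # Convert corner coordinates to YOLO's normalized
    # [class, x_center, y_center, width, height] representation.
    xc = (xmin + xmax) / 2.0 / img_w
    yc = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# e.g. a red tomato (class 2) spanning pixels (120, 80)-(200, 160) in a 640x480 image
print(to_yolo_label(2, 120, 80, 200, 160, 640, 480))  # 2 0.250000 0.250000 0.125000 0.166667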
To facilitate testing on the custom dataset containing images of tomatoes, plants, and pests, the image data is converted into the required data format, and adjustments are made to the data configuration file and the model configuration file. Picture 6 illustrates the labeling process.
Picture 6. Labeling process of the tomato dataset (the Picture was prepared by the author using data obtained from experiments conducted in a greenhouse).
Performance Metrics: In this study, TensorBoard is set up to visualize the training process and dynamically monitor the model's operation and training state, so that the effect of training time on model performance and equipment conditions can be observed.
Let us briefly review the evaluation metrics applied to the detection results. To do so, we first define the following:
- A prediction designated c(i, i) defines the situation in which the detector properly identifies an instance of class i as a member of that same class. This can be viewed as a true positive.
- A prediction designated c(j, i) denotes an inaccurate classification of an instance of class j as an instance of class i by the detector. This can be considered a false positive.
- A prediction designated c(i, j) denotes the situation in which the detector misclassifies a member of class i as a member of class j. This can be considered a false negative.
From this, the precision of the detector associated with objects of class i is defined as:
$P_i = \dfrac{c(i,i)}{\sum_j c(j,i)}$ (1)
As a result, precision is defined as the ratio of the number of instances of a given class that have been correctly identified to all instances detected as that class. Recall is defined as follows:
$R_i = \dfrac{c(i,i)}{\sum_j c(i,j)}$ (2)
Recall, then, is the ratio of the number of times an instance of class i has been successfully identified to the total number of instances of class i available in the dataset.
Note that the precision and recall definitions given above consider only class i. When there are many classes, precision and recall are calculated as a weighted average over all classes, where the weight is typically the ratio of the number of examples in the dataset that belong to class i to the total number of instances in the dataset. Precision and recall can be synthesized using the F1 score, defined as follows:
$F1 = 2 \cdot \dfrac{P \cdot R}{P + R}$ (3)
The mean average precision (mAP) is used to rate the network's performance. According to its definition:
$\mathrm{mAP} = \dfrac{1}{N} \sum_{i=1}^{N} AP_i$ (4)
where $AP_i$, the average precision for class i, is the area under the curve produced by the precision-recall plot for the detection of instances of class i, and N is the number of classes.
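The sketch below computes these per-class metrics from a confusion matrix c, where c[j][i] counts instances of class j that were labeled as class i, directly following Equations (1)-(3); the example matrix is illustrative.

def precision_recall_f1(c, i):
    # c[j][i]: instances of class j that the detector labeled as class i.
    tp = c[i][i]
    detected_as_i = sum(c[j][i] for j in range(len(c)))  # column i, Eq. (1) denominator
    actual_i = sum(c[i][j] for j in range(len(c)))       # row i, Eq. (2) denominator
    p = tp / detected_as_i if detected_as_i else 0.0
    r = tp / actual_i if actual_i else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

c = [[50, 3, 1],   # illustrative 3-class confusion matrix (green, pink, red)
     [4, 40, 2],
     [0, 5, 45]]
print(precision_recall_f1(c, 0))  # metrics for the "green" class, about 0.926 each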
Performance of Optimized YOLOv5 on Tomato Dataset: Table 2 provides a comprehensive overview of the performance metrics for the YOLOv5 variants, measured in terms of mAP, precision (P), recall (R), and F1 score. From the table, the optimized YOLOv5s exhibits the highest mAP of 97.4%, closely followed by YOLOv5l at 97.2%. This metric reflects the model's proficiency in accurately localizing objects with a high degree of certainty. Moreover, the optimized YOLOv5s model achieves the highest precision at 92.7%, indicating its ability to minimize false positives and accurately classify detected objects, while YOLOv5x demonstrates the highest recall at 96.2%, showcasing its capability to identify the majority of relevant objects within the dataset. The optimized YOLOv5s model also leads in F1 score at 93.93%, harmonizing precision and recall. These performance metrics collectively highlight the nuanced strengths of each YOLOv5 variant. The optimized YOLOv5s model emerges as the standout performer, achieving a balance between precision, recall, and overall object detection accuracy.
Table 2
A comprehensive overview of the performance metrics for the YOLOv5 variants and the optimized YOLOv5s model (the table was prepared by the author)
According to Picture 7, the mAP of the optimized YOLOv5s model is compared to that of the standard YOLOv5s model over the training epochs. It is observed that the optimized model's performance is consistently higher than that of the standard model.
Picture 7. Comparison of the standard and optimized YOLOv5s detection models (the Picture was prepared by the author).
Several photos from the dataset were selected for testing to evaluate the model's performance, and the detection performance of the optimized YOLOv5s and classic YOLOv5s models under various conditions is displayed in the figures below. Picture 8 displays tomato detection under several challenging conditions, such as overlapping fruit and very small tomatoes. For example, the original YOLOv5s model cannot detect the tiny tomatoes in the first column of photos from the left, while our optimized model detects them. Under the overlapping conditions in the second and last columns, the optimized model detects the overlapped tomatoes successfully, while the original model skips them. In addition, the original model faces some challenges in classification: in the third column of photos, it classifies a pink tomato as a green tomato. In summary, when it comes to recognizing tiny and dense targets, the optimized YOLOv5s model outperforms the standard YOLOv5s, resulting in enhanced accuracy, detection, and identification. If a larger number of photos were considered, the proposed algorithm's accuracy could improve further.
Picture 8. Identifying tomatoes under several challenging conditions (the Picture was prepared by the author).
Picture 9 depicts the various performance indicators of the proposed optimized YOLOv5s model over 200 epochs of training and validation.
Picture 9. Performance metrics of the optimized model (the Picture was prepared by the author).
The optimized YOLOv5 model has been seamlessly integrated into the outdoor smart farm environment to address the primary objective of predicting the overall crop production within the tomato cultivation area. The intricate task of tomato detection involves the model processing images captured by strategically positioned cameras along a single row of tomato plants within the smart farm. Subsequently, the model diligently classifies the detected tomatoes into three distinct categories, namely green, orange, and red, as visually represented in Picture 10.
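A sketch of how such a deployment loop can be wired up is shown below; it assumes the optimized weights were saved as best.pt and loads them through the public torch.hub interface of the YOLOv5 repository, with the weight path, frame source, and class names being illustrative.

import torch

# Load the custom-trained detector (the weight path is illustrative).
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")

def count_ripeness(frame):
    # Run detection on one camera frame and tally tomatoes per ripeness class.
    results = model(frame)
    counts = {"green": 0, "pink": 0, "red": 0}  # class names as used in training
    for *_box, conf, cls in results.xyxy[0].tolist():
        name = model.names[int(cls)]
        if name in counts:
            counts[name] += 1
    return counts

# counts = count_ripeness("frame_001.jpg")  # accepts a path or a numpy image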
Picture 10. Tomato detection results in the outdoor smart farm (the Picture was prepared by the author).
The culmination of this detection process unfolds in the crop production prediction page of the bespoke web application, a transformative tool for smart farm farmers. Picture 11 provides a visual representation of this predictive interface. Offering real-time insights into the state of the tomato plants, the top frame of the web page serves as a dynamic window into the ongoing agricultural processes.
[Web application screenshot: panels titled "Crop Production Prediction" and "Tomato ripeness percentages"; displayed values: Green tomato: 67%, Orange tomato: 11%, Red tomato: 22%.]
Picture 11. Crop production prediction page of the web application (the Picture was prepared by the author).
Within the crop production prediction frame, the artificial intelligence model contributes a wealth of statistical information to empower farmers. Notably, the left side of the frame hosts a blue line graph depicting the total number of tomatoes within the smart farm, while the percentage of tomatoes ready for harvesting is highlighted in red. Meanwhile, on the right side of the frame, a bar chart delineates the distribution of total tomatoes across three distinct categories: green, orange, and red. The prominence of the red segment within the bar chart offers farmers a quick and intuitive estimate of the overall crop production status within the smart farm. In essence, the deployment of the optimized YOLOv5 model coupled with the innovative web application interface not only augments real-time monitoring capabilities for farmers but also provides a comprehensive and visually intuitive platform for anticipating and understanding the broader dynamics of crop production within the smart farm.
CONCLUSION
In conclusion, this research introduces a pioneering approach to greenhouse productivity estimation through the optimization of the YOLOv5 object detection model for a custom tomato dataset. The study demonstrates the model's efficacy in accurately identifying and classifying tomatoes into green, pink, and red categories, providing a real-time assessment of the ripening process. The optimization efforts yield superior performance compared to the standard YOLOv5 model, showcasing its potential for widespread adoption in precision agriculture. The deployment of the optimized YOLOv5 model in a real-world greenhouse equipped with a systematic seven-camera array proves its practical utility. Extrapolating the results to estimate overall productivity across eight rows of tomato plants illustrates the scalability and reliability of the proposed methodology. The web application developed for real-time monitoring empowers greenhouse operators with valuable insights, including the percentages of tomatoes at different ripening stages.
This research contributes to the growing body of literature at the intersection of computer vision and agriculture, particularly in the context of greenhouse cultivation. The findings address current challenges in tomato detection, such as variations in lighting conditions, occlusions, and dynamic ripening stages. As precision agriculture continues to evolve, the optimized YOLOv5 model emerges as a valuable tool for enhancing resource efficiency and decision-making in greenhouse management. The success of this study not only furthers our understanding of how advanced computer vision models can be tailored for specific agricultural contexts but also opens avenues for future research in optimizing and extending such models for diverse crop types. As the agricultural industry embraces technology-driven solutions, this research contributes to the ongoing discourse on sustainable and efficient crop management practices.
REFERENCES
1. T. Dewi, P. Risma, and Y. Oktarina, "Fruit sorting robot based on color and size for an agricultural product packaging system," Bull. Electr. Eng. Informatics, vol. 9, no. 4, pp. 1438-1445, 2020, doi: 10.11591/eei.v9i4.2353.
2. Z. Tian, W. Ma, Q. Yang, and F. Duan, "Application status and challenges of machine vision in plant factory—A review," Inf. Process. Agric., vol. 9, no. 2, pp. 195-211, 2022, doi: 10.1016/j.inpa.2021.06.003.
3. N. Schor, S. Berman, A. Dombrovsky, Y. Elad, T. Ignat, and A. Bechar, "Development of a robotic detection system for greenhouse pepper plant diseases," Precis. Agric., vol. 18, no. 3, pp. 394-409, 2017, doi: 10.1007/s11119-017-9503-z.
4. I. Bechar and S. Moisan, "Online counting of pests in a greenhouse using computer vision," VAIB 2010 - Vis. Obs. Anal. Anim. Insect Behav., pp. 1-4, 2010.
5. M. V. Giuffrida, "Leaf counting from uncontrolled acquired images from greenhouse workers".
6. S. Yang, L. Huang, and X. Zhang, "Research and application of machine vision in monitoring the growth of facility seedling crops," Jiangsu Agric. Sci, vol. 47, pp. 179-187, 2019.
7. Y. Ren, "Development of transplanting robot in facility agriculture based on machine vision," Dissertation, Zhejiang University, 2007.
8. H. Tian, T. Wang, Y. Liu, X. Qiao, and Y. Li, "Computer vision technology in agricultural automation —A review," Inf. Process. Agric., vol. 7, no. 1, pp. 1-19, 2020, doi: 10.1016/j.inpa.2019.09.006.
9. K. Lin, J. Chen, H. Si, and J. Wu, "A review on computer vision technologies applied in greenhouse plant stress detection," Commun. Comput. Inf. Sci., vol. 363, pp. 192-200, 2013, doi: 10.1007/978-3-642-37149-3_23.
10. Z. Li et al., "A high-precision detection method of hydroponic lettuce seedlings status based on improved Faster RCNN," Comput. Electron. Agric., vol. 182, 2021, doi: 10.1016/j.compag.2021.106054.
11. Z. Wu, R. Yang, F. Gao, W. Wang, L. Fu, and R. Li, "Segmentation of abnormal leaves of hydroponic lettuce based on DeepLabV3+ for robotic sorting," Comput. Electron. Agric., vol. 190, p. 106443, 2021, doi: 10.1016/j.compag.2021.106443.
12. L. Xi, M. Zhang, L. Zhang, T. T. S. Lew, and Y. M. Lam, "Novel Materials for Urban Farming," Adv. Mater., vol. 34, no. 25, pp. 1-28, 2022, doi: 10.1002/adma.202105009.
13. G. Ares, B. Ha, and S. R. Jaeger, "Consumer attitudes to vertical farming (indoor plant factory with artificial lighting) in China, Singapore, UK, and USA: A multi-method study," Food Res. Int., vol. 150, 2021, doi: 10.1016/j.foodres.2021.110811.
14. R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," Proc. IEEE Comput. Soc.
Conf. Comput. Vis. Pattern Recognit., pp. 580-587, 2014, doi: 10.1109/CVPR.2014.81.
15. T. Li et al., "Tomato recognition and location algorithm based on improved YOLOv5," Comput. Electron. Agric., vol. 208, p. 107759, 2023, doi: 10.1016/j.compag.2023.107759.
16. S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137-1149, 2017, doi: 10.1109/TPAMI.2016.2577031.
17. J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 779-788, 2016, doi: 10.1109/CVPR.2016.91.
18. J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," 2018, [Online]. Available: http://arxiv.org/abs/1804.02767
19. A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection," 2020, [Online]. Available: http://arxiv.org/abs/2004.10934
20. Z. Ge, S. Liu, F. Wang, Z. Li, and J. Sun, "YOLOX: Exceeding YOLO Series in 2021," pp. 1-7, 2021, [Online]. Available: http://arxiv.org/abs/2107.08430
21. J. Redmon and A. Farhadi, "YOLO9000: Better, faster, stronger," Proc. 30th IEEE Conf. Comput. Vis. Pattern Recognition, CVPR 2017, pp. 6517-6525, 2017, doi: 10.1109/CVPR.2017.690.
22. C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, "YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors," pp. 7464-7475, 2023, doi: 10.1109/cvpr52729.2023.00721.
23. W. Liu et al., "SSD: Single Shot MultiBox Detector BT - Computer Vision - ECCV 2016," B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds., Cham: Springer International Publishing, 2016, pp. 21-37.
24. M. P. Mathew and T. Y. Mahesh, "Leaf-based disease detection in bell pepper plant using YOLO v5," Signal, Image Video Process., vol. 16, no. 3, pp. 841-847, 2022, doi: 10.1007/s11760-021-02024-y.
25. D. G. Lowe, "Object recognition from local scale-invariant features," Proc. IEEE Int. Conf. Comput. Vis., vol. 2, pp. 1150-1157, 1999, doi: 10.1109/iccv.1999.790410.
26. C. P. Papageorgiou, M. Oren, and T. Poggio, "General framework for object detection," Proc. IEEE Int. Conf. Comput. Vis., pp. 555-562, 1998, doi: 10.1109/iccv.1998.710772.
27. N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," Proc. - 2005 IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognition, CVPR 2005, vol. I, pp. 886-893, 2005, doi: 10.1109/CVPR.2005.177.
28. J. Donahue et al., "DeCAF: A deep convolutional activation feature for generic visual recognition," 31st Int. Conf. Mach. Learn., ICML 2014, vol. 2, pp. 988-996, 2014.
29. G. Yang et al., "Face Mask Recognition System with YOLOV5 Based on Image Recognition," 2020 IEEE 6th Int. Conf. Comput. Commun., ICCC 2020, pp. 1398-1404, 2020, doi: 10.1109/ICCC51575.2020.9345042.
30. T. Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," Proc. 30th IEEE Conf. Comput. Vis. Pattern Recognition, CVPR 2017, pp. 936-944, 2017, doi: 10.1109/CVPR.2017.106.
31. S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, "Path Aggregation Network for Instance Segmentation," Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 8759-8768, 2018, doi: 10.1109/CVPR.2018.00913.
32. H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, and S. Savarese, "Generalized intersection over union: A metric and a loss for bounding box regression," Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 2019-June, pp. 658-666, 2019, doi: 10.1109/CVPR.2019.00075.