Road surface defect detection algorithm based on the YOLOv8s model

Qin Hongwu; Yan Yangyong; Jin Zuoyi; Chye En Un; Voronin V. V.

System analysis, control and information processing, statistics

Vestnik TOGU (Bulletin of Pacific National University). 2024. No. 2 (73)

UDC 004.942:625.8

DOI https://doi.org/10.38161/1996-3440-2024-2-51-62

Qin Hongwu, Yan Yangyong, Jin Zuoyi, Chye En Un, V. V. Voronin

THE IMPROVED ROAD DEFECT DETECTION ALGORITHM BASED ON YOLOv8s

Qin Hongwu - PhD, Professor, School of Electronic Information Engineering, Changchun University, Changchun, e-mail: [email protected] (China); Yan Yangyong - Master of Engineering; School of Electronic Information Engineering, Changchun University, Changchun, e-mail: [email protected] (China); Jin Zuoyi - Master of Engineering; School of Electronic Information Engineering, Changchun University, Changchun, e-mail: [email protected] (China); Chye En Un - Doctor of Technical Sciences, Professor of the Higher School of Cybernetics and Digital Technologies, Pacific National University, e-mail: [email protected]; Voronin V. V. - Doctor of Technical Sciences, Professor of the Higher School of Cybernetics and Digital Technologies, Pacific National University (Russia), email: [email protected].

Road defect detection is a key task for ensuring road safety and repairing damage in a timely manner. However, traditional manual inspection methods are inefficient and costly. To address these problems, an enhanced road defect detection algorithm, BL-YOLOv8, is proposed. The neck structure of the YOLOv8s model is reconstructed using the BiFPN concept, which reduces the model's parameters, computational load and overall size, and a dynamic large convolution kernel attention mechanism (LSK-attention) is introduced to expand the model's receptive field and improve detection accuracy. Experimental results show that, on the road defect data set (KITTI), the number of parameters is reduced by 23.03 %, the amount of calculation is reduced by 5.85 %, and mAP@0.5 is increased by 2.1 percentage points, verifying the effectiveness of the method for automatic road defect detection.

Keywords: YOLOv8s, LSK-attention, BiFPN, road defects, image recognition

Introduction

Cracks are a common problem on road surfaces and are detrimental to road safety and driving conditions. Traditional manual inspections are time-consuming, laborious, and subject to subjective judgment. In contrast, multi-functional road inspection vehicles are equipped with sensors such as GPS, cameras, and lidar, which can readily detect road defects, but their cost limits large-scale rollout. Therefore, it is of great significance to research and apply fast, efficient and accurate crack detection technology.

In recent years, deep learning methods have gradually replaced traditional target detection techniques. They are divided into one-stage and two-stage target detection methods. One-stage methods, such as YOLO [1-4] and SSD [5], predict target locations and categories directly from images and are simple and efficient. Two-stage methods, such as Faster R-CNN [6] and Mask R-CNN [7], first extract candidate regions and then perform classification and position regression on them, which gives higher accuracy. Wang et al. [8] proposed an AF-FPN network based on YOLOv5 for multi-scale traffic signal detection and analyzed the impact of different factors on detection performance. Arya et al. [9] used the Cascade R-CNN model and data augmentation to achieve an F1 score of 0.635 in the Global Road Damage Detection Challenge. Xie et al. [10] proposed a traffic sign detection algorithm based on YOLOv8s that uses Pconv convolution and a purpose-designed C2faster module to achieve a lightweight network structure while maintaining accuracy.

Although existing research has contributed to road damage detection, there is still room for improvement in accuracy and speed. Because the YOLO family offers a good balance of detection accuracy and speed, the YOLOv8s [11] framework was chosen as the baseline and optimized to improve the accuracy of the algorithm.

YOLOv8 network structure

YOLOv8 is a fast single-stage object detection algorithm consisting of an input stage, a backbone, a neck and an output head. The input stage performs mosaic data augmentation, adaptive anchor calculation and adaptive gray padding. The backbone and neck form the core of the network: the input image is processed by multiple Conv and C2f modules to extract feature maps at different scales. The improved C2f module, shown in Fig. 1, draws on the ELAN structure: it uses fewer standard convolutional layers and makes full use of Bottleneck modules to strengthen the gradient branches, which keeps the module lightweight while capturing richer gradient-flow information. The output feature maps are processed by the SPPF module, which combines them through pooling with different kernel sizes, and are then passed to the neck layer.

Fig. 1. C2f module structure diagram

The neck layer of YOLOv8 adopts the FPN+PAN structure to enhance the feature fusion ability of the model. This structure combines high-level and low-level feature maps using upsampling and downsampling to facilitate the transfer of semantic and localization features. In this way, the network can better integrate the features of objects at different scales, which improves its detection performance across object sizes.

Improved YOLOv8s model

An attention mechanism is introduced to improve model performance. After investigating various attention mechanisms (CAM, SAM, etc.), we chose to integrate LSK-attention into the YOLOv8s model. Combined with BiFPN, it not only reduces computational complexity but also improves detection accuracy, which makes the model a more feasible solution for practical applications.

Feature pyramid optimization

In YOLOv8s, feature maps are divided into B1-B5 (backbone network), P3-P4 (FPN) and N4-N5 (PAN). The traditional FPN structure may lose positioning information during fusion. The original model therefore uses the PAN-FPN structure, an optimized version of FPN that adds a bottom-up PAN path to make up for this shortcoming. In YOLOv8, the learning of positioning features is enhanced through the fusion of P4-N4 and P5-N5, which achieves complementary effects. The specific structure is shown in Fig. 2. Although the PAN-FPN structure enriches semantic and positioning information, large-scale feature maps are still insufficiently processed and some original information is lost, so there is room for improvement.

Fig. 2. Schematic diagram of the PAN-FPN structure in YOLOv8

To improve detection accuracy, a Bidirectional Feature Pyramid Network (BiFPN) is introduced to redesign the feature fusion of YOLOv8. In road defect detection, accuracy decreases because smaller cracks carry limited feature information. BiFPN makes fuller use of high-resolution features and expands the receptive field of the model. It is implemented as follows: no additional processing is applied to the feature map of a node with a single input path; when the feature maps of two input paths are fused, they are assumed to have the same scale, and the fusion enriches the spatial information of the resulting feature map, as shown in Fig. 3.

Fig. 3. Improved neck structure
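The text does not spell out how two same-scale feature maps are combined. In the original BiFPN formulation, the inputs are fused with learnable non-negative weights that are normalised to sum to approximately one ("fast normalized fusion"). The following is a minimal NumPy sketch under that assumption; the function name `fast_normalized_fusion`, the example weights and the `eps` value are illustrative and are not taken from the paper.

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-shape feature maps with non-negative, normalised weights.

    features: list of arrays with identical shape (C, H, W)
    weights:  one scalar per input (learnable in a real network; fixed here)
    """
    w = np.maximum(np.asarray(weights, dtype=np.float64), 0.0)  # ReLU keeps weights >= 0
    w = w / (w.sum() + eps)                                     # normalise to sum to ~1
    stacked = np.stack(features, axis=0)                        # (n_inputs, C, H, W)
    return np.tensordot(w, stacked, axes=1)                     # weighted sum -> (C, H, W)

# Hypothetical example: fuse two same-scale maps with equal weights.
p4_in = np.full((8, 4, 4), 3.0)
p4_td = np.ones((8, 4, 4))
fused = fast_normalized_fusion([p4_in, p4_td], weights=[1.0, 1.0])
print(fused.shape)
```

With equal weights the result is close to the element-wise mean of the two inputs; in training, the weights would let the network decide how much each path contributes.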

Dynamic large convolution kernel spatial attention mechanism

Attention mechanisms are widely used in computer vision, and there are many well-established implementations, such as SE, CBAM and BAM. LSK-attention is a newer attention mechanism that adaptively aggregates large-kernel feature information in space through a dynamic large convolution kernel selection mechanism. Compared with other methods, LSK-attention can extract and utilize feature information more effectively, thereby improving the performance of the model in target detection tasks. A structural comparison of LSKNet and SKNet is shown in Fig. 4.

Fig. 4. Comparison of LSKNet and SKNet architectures: (a) LSKNet; (b) SKNet

LSK-attention generates features with a wide receptive field through multiple large convolution kernels, as shown in Fig. 5, and uses depthwise separable convolution to reduce model parameters. It dynamically selects appropriate convolution kernels based on the local information of the input feature map, which lets it adapt to different target types and backgrounds, and it distributes weights adaptively through a spatial selection mechanism to increase attention to the spatial areas relevant to the detection target, thereby improving the success rate of target detection.

Fig. 5. LSK model. Legend: channel concatenation, element-wise addition, element-wise product, sigmoid

In order to dynamically select an appropriate spatial kernel, the input feature map is divided into multiple sub-feature maps. Subsequently, various convolution kernels of different sizes are applied to each sub-feature map, thereby generating multiple output feature maps. Then, these sub-output feature maps are concatenated, as shown in equation (1), and this concatenation produces an output feature map with increased channel dimensions.

$$\widetilde{U} = [\widetilde{U}_1; \ldots; \widetilde{U}_i] \tag{1}$$

Average pooling and maximum pooling are then applied to the concatenated feature map $\widetilde{U}$ along the channel dimension to extract the spatial relationship descriptors $SA_{avg}$ and $SA_{max}$, as shown in equation (2). After $SA_{avg}$ and $SA_{max}$ are concatenated, a convolutional layer $\mathcal{F}^{2 \to N}$ converts them into $N$ spatial attention maps, one for each of the $N$ depthwise convolutions, as expressed by equation (3).

$$SA_{avg} = P_{avg}(\widetilde{U}), \quad SA_{max} = P_{max}(\widetilde{U}) \tag{2}$$

$$\widehat{SA} = \mathcal{F}^{2 \to N}([SA_{avg}; SA_{max}]) \tag{3}$$

The spatial selection weight for each depthwise convolution is obtained by applying a sigmoid activation function to the corresponding spatial attention map. Each depthwise convolution feature map is then multiplied element-wise by its weight. Finally, a convolutional layer fuses the weighted feature maps to produce the final attention features. This process is expressed by equations (4) and (5).

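$$\widetilde{SA}_i = \operatorname{Sigmoid}(\widehat{SA}_i) \tag{4}$$

$$S = \mathcal{F}\left(\sum_{i=1}^{N} \left(\widetilde{SA}_i \cdot \widetilde{U}_i\right)\right) \tag{5}$$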
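Equations (1) to (5) can be traced step by step in a small NumPy sketch. It is only an illustration under stated assumptions: the function name `lsk_spatial_selection`, the weight arrays `w_mix` and `w_out`, and the use of 1x1 convolutions for both convolutional layers are hypothetical simplifications chosen for brevity, not the configuration used in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lsk_spatial_selection(branches, w_mix, w_out):
    """Spatial kernel selection over N branch outputs.

    branches: list of N arrays of shape (C, H, W), one per large-kernel branch
    w_mix:    (N, 2) weights of an assumed 1x1 convolution mapping the two
              pooled descriptors to N attention maps
    w_out:    (C, C) weights of an assumed 1x1 convolution fusing the result
    """
    u = np.concatenate(branches, axis=0)               # eq. (1): (N*C, H, W)
    sa_avg = u.mean(axis=0)                            # eq. (2): average over channels
    sa_max = u.max(axis=0)                             # eq. (2): maximum over channels
    pooled = np.stack([sa_avg, sa_max], axis=0)        # (2, H, W)
    sa_hat = np.einsum("nk,khw->nhw", w_mix, pooled)   # eq. (3): 2 -> N maps
    sa = sigmoid(sa_hat)                               # eq. (4): selection weights
    weighted = sum(sa[i][None] * branches[i]           # weight each branch per pixel
                   for i in range(len(branches)))      # (C, H, W)
    return np.einsum("dc,chw->dhw", w_out, weighted)   # eq. (5): final fusion

# Hypothetical example with N = 2 branches and C = 4 channels.
rng = np.random.default_rng(0)
branches = [rng.standard_normal((4, 6, 6)) for _ in range(2)]
s = lsk_spatial_selection(branches, rng.standard_normal((2, 2)), np.eye(4))
print(s.shape)
```

Because the selection weights are computed per pixel, each spatial location can favour a different branch, which is the sense in which the kernel selection is "spatial".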

Experimental results and analysis

Dataset

The KITTI dataset is widely used in autonomous driving and computer vision research. It was created by the Karlsruhe Institute of Technology (KIT) and the Toyota Technological Institute at Chicago and contains a variety of road scenes and perspectives. The data set contains 8 categories of labels, such as cracks and potholes, as shown in Fig. 6. In this study, Pedestrian and Person_sitting are combined into one category. The data set contains a total of 9053 images with labels, which are divided into training, test and validation sets in a ratio of 7:2:1.
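A 7:2:1 split of this kind can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `split_dataset`, the fixed random seed and the rule of assigning leftover items from integer rounding to the validation set are all assumptions.

```python
import random

def split_dataset(items, ratios=(7, 2, 1), seed=0):
    """Shuffle items and split them into train / test / validation subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)             # reproducible shuffle
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total      # integer arithmetic avoids float drift
    n_test = len(items) * ratios[1] // total
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]                 # remainder goes to validation (assumption)
    return train, test, val

# 9053 is the image count stated in the text; indices stand in for image files.
train, test, val = split_dataset(range(9053))
print(len(train), len(test), len(val))
```

Fixing the seed keeps the split reproducible across runs, which matters when several models are compared on the same test set.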

Fig. 6. Dataset categories

Experimental environment

The experiments were run on Windows 10 with Python 3.9, an Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz and an NVIDIA RTX A6000 GPU used for training. The model was trained for a total of 200 epochs. To ensure the accuracy of the experimental results, no pre-trained weights were used during training.

Evaluation indicators

In the experiments, the following indicators were used to evaluate the performance of the model: precision (P), recall (R) and mean average precision (mAP) at an IoU threshold of 0.5. Precision is the proportion of correctly predicted target objects among all predictions, where TP and FP denote the numbers of true positives and false positives. It is calculated as shown in formula (6).

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{6}$$

Recall is the proportion of correctly predicted target objects among all actual target objects, where FN denotes the number of false negatives (missed objects). It is calculated as shown in formula (7).

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{7}$$

AP is the area under the precision-recall curve of a specific category and is used to evaluate the average accuracy for that category. It is calculated as shown in formula (8).

$$AP = \int_0^1 P \, dR \tag{8}$$

Mean average precision (mAP) is the average of the AP values across all categories and provides a comprehensive assessment of model performance. mAP@0.5 is the mean average precision over all classes at an IoU threshold of 0.5, which is widely used in data set performance evaluation. It is calculated as shown in formula (9).

$$mAP = \frac{\sum AP}{N(\text{class})} \tag{9}$$
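Formulas (6) to (9) translate directly into a few lines of Python. The sketch below is illustrative only: the function names and the sample numbers are made up, and the area under the precision-recall curve is approximated with a plain trapezoidal rule, whereas common evaluation tools usually interpolate the curve first.

```python
def precision(tp, fp):
    """Formula (6): share of predictions that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Formula (7): share of ground-truth objects that are found."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def average_precision(recalls, precisions):
    """Formula (8): trapezoidal area under the P-R curve (recalls ascending)."""
    area = 0.0
    for i in range(1, len(recalls)):
        width = recalls[i] - recalls[i - 1]
        area += width * (precisions[i] + precisions[i - 1]) / 2.0
    return area

def mean_average_precision(ap_per_class):
    """Formula (9): mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)

# Made-up counts and curve points, for illustration only.
print(precision(tp=8, fp=2), recall(tp=8, fn=8))
print(average_precision([0.0, 0.5, 1.0], [1.0, 1.0, 0.0]))
print(mean_average_precision({"class_a": 0.5, "class_b": 0.7}))
```

Note that precision and recall differ only in the denominator: false positives penalise precision, while false negatives penalise recall.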

Improved algorithm evaluation experiments

To further verify the generality of the BL-YOLOv8 model, it was tested on real road images. The images were taken with an iPhone 14 Pro at ISO 80 and an aperture of f/1.78. Each captured image is 3024 × 4032 pixels. Multiple sets of sample images were captured to evaluate the performance of the model. The test results are shown in Fig. 7.

Fig. 7. Improved model detection effect

The detection results in Fig. 7 demonstrate the effectiveness of the BL-YOLOv8 model, which successfully identifies most road defects. This also indicates the strong generalization ability and robustness of the model. Nonetheless, some missed and false detections still occur. For example, in the inspection image in the upper left corner of the right image in Fig. 7, no longitudinal cracks are detected. This can be attributed to unclear boundaries of road cracks in some captured images, which result in detection bounding boxes that are only partially aligned with the crack boundaries.

Comparative experimental analysis

In order to further prove the performance of the improved algorithm in road scene target detection, comparative experiments were conducted with mainstream models on the KITTI data set. The results are shown in Table 1.

Table 1

Model          mAP@0.5 (KITTI)   Parameters (M)   GFLOPs
Faster R-CNN   0.563             27.31            602.95
YOLOv5s        0.559             7.02             13.8
YOLOv8s        0.591             10.16            25.6
BL-YOLOv8s     0.612             7.82             24.1

As Table 1 shows, the improved YOLOv8s algorithm outperforms the classic detection algorithms Faster R-CNN and YOLOv5s on the data set. It achieves the highest mAP@0.5 of the four models while using fewer parameters than Faster R-CNN and YOLOv8s; only YOLOv5s is smaller. The original YOLOv8s model achieves an mAP@0.5 of 59.1 % on the KITTI dataset, and the improved model raises this by a further 2.1 percentage points to 61.2 %. Compared with the original YOLOv8s model, the number of parameters is reduced by 23.03 % and the amount of calculation by 5.85 %.
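The percentages quoted from Table 1 can be rechecked with a few lines of arithmetic. In the sketch below the dictionary keys and the helper `relative_reduction` are illustrative names; the numbers are copied from Table 1. The parameter reduction comes out at about 23.03 %, while the computation reduction comes out at about 5.86 % when rounded to two decimals, so the reported 5.85 % appears to be truncated rather than rounded.

```python
# Values copied from Table 1; key names are illustrative.
table = {
    "Faster R-CNN": {"map50": 0.563, "params_m": 27.31, "gflops": 602.95},
    "YOLOv5s":      {"map50": 0.559, "params_m": 7.02,  "gflops": 13.8},
    "YOLOv8s":      {"map50": 0.591, "params_m": 10.16, "gflops": 25.6},
    "BL-YOLOv8s":   {"map50": 0.612, "params_m": 7.82,  "gflops": 24.1},
}

def relative_reduction(baseline, improved):
    """Reduction of `improved` relative to `baseline`, in percent."""
    return (baseline - improved) / baseline * 100.0

base, ours = table["YOLOv8s"], table["BL-YOLOv8s"]
param_cut = relative_reduction(base["params_m"], ours["params_m"])
flops_cut = relative_reduction(base["gflops"], ours["gflops"])

# Absolute mAP@0.5 gains over each other model, in percentage points.
gains = {name: (ours["map50"] - row["map50"]) * 100.0
         for name, row in table.items() if name != "BL-YOLOv8s"}

print(f"parameters: -{param_cut:.2f} %, GFLOPs: -{flops_cut:.2f} %")
for name, gain in gains.items():
    print(f"mAP@0.5 gain over {name}: {gain:.1f} points")
```

The mAP gains are absolute differences (percentage points), whereas the parameter and GFLOPs figures are relative reductions, so the two kinds of percentage should not be compared directly.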

Fig. 8. Comparative numerical chart

As shown in Fig. 8, the improved BL-YOLOv8 model performs well in precision, recall, mAP@0.5 (mean average precision at an IoU threshold of 0.5) and mAP@0.5:0.95 (mean average precision averaged over IoU thresholds from 0.5 to 0.95), and compares favorably with the other models.
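Both mAP variants depend on the intersection over union (IoU) between a predicted box and a ground-truth box. The sketch below shows the IoU calculation and the threshold grid behind mAP@0.5:0.95, assuming the common convention of ten thresholds in steps of 0.05; the function names and the `(x1, y1, x2, y2)` box format are illustrative choices, not details given in the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)   # zero if boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Assumed convention: thresholds 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

def thresholds_passed(iou_value):
    """Number of thresholds at which a match with this IoU counts as correct."""
    return sum(iou_value >= t for t in IOU_THRESHOLDS)

def mean_over_thresholds(map_by_threshold):
    """mAP@0.5:0.95 as the mean of the mAP values at each threshold."""
    return sum(map_by_threshold[t] for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
print(thresholds_passed(0.72))
```

A detection that overlaps its target loosely still counts at the 0.5 threshold but is rejected at the stricter ones, which is why mAP@0.5:0.95 is always less than or equal to mAP@0.5.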

Conclusion

Accurate detection of road defects is crucial for implementing smart road maintenance. This paper proposes a road defect detection model based on an enhanced version of YOLOv8s. The BiFPN method is used to reconstruct the neck structure of the original model, reducing the model size and enhancing its feature fusion ability. In addition, the LSK attention mechanism, which uses dynamic large convolution kernels, is introduced to improve the detection accuracy of the model. Experimental results show that the model performs effectively in detecting road defects in images captured by drones and on-board cameras, which indicates that it is suitable for various detection scenarios. The detection accuracy (mAP@0.5) of BL-YOLOv8 exceeds that of other mainstream object detection models, namely Faster R-CNN, YOLOv5s and YOLOv8s, by 4.9, 5.3 and 2.1 percentage points, respectively. Furthermore, the BL-YOLOv8s model improves detection accuracy while reducing computational and parameter requirements: the number of parameters is reduced by 23.03 %, and the amount of calculation is reduced by 5.85 %.

Acknowledgments

This work was supported by the Project of Jilin Provincial Development and Reform Commission (2023C042-4), the Innovation and Entrepreneurship Talent Funding Project of Jilin Province (2023RY17) and the project of Jilin Provincial Education Department (SJZD23-01).

References

1. Detection of Polystriata based on FlowCam images and YOLOv3 deep learning model / Liu Yang, Kong Fanzhou, Yu Rencheng et al. // Marine Science. 2024. № 1-9.

2. Bochkovskiy A., Wang C. Y., Liao H. Y. M. Yolov4: Optimal speed and accuracy of object detection // arXiv preprint arXiv. 2020.

3. YOLOv6: A single-stage object detection framework for industrial applications // arXiv preprint arXiv. 2022. № 9.

4. Wang C. Y., Bochkovskiy A., Liao H. Y. M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors // Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2023. P. 7464-7475.

5. UV image intensifier field of view defect detection method based on improved SSD algorithm / Ding Xiwen, Cheng Hongchang, Yuan Yuan et al. // Journal of Ordnance Industry. 2024. Vol. 16, № 1.

6. Aircraft target detection in remote sensing images based on transfer learning and improved Faster-RCNN / Zhou Shaohong, Fang Xinjian, Liu Xinyi et al. // Mechanical and Electrical Engineering Technology. 2024. Vol. 8, № 1.

7. Concrete disease instance segmentation based on improved mask-region convolutional neural network / Huang Caiping, Xie Xin, Zhou Yongkang et al. // Bridge Construction. 2023. № 53. P. 63-70. DOI: 10.20051/j.

8. Wang K., Liu M., Ye Z. An advanced YOLOv3 method for small-scale road object detection // Applied Soft Computing. 2021. № 112.

9. Improved YOLOv5 network for real-time multi-scale traffic sign detection / Wang J., Chen Y., Dong Z. et al. // Neural Computing and Applications. 2023. № 35. P. 7853-7865.

10. Xie Jing, Deng Yueming, Wang Runmin. Improved YOLOv8s traffic sign detection algorithm // Computer Engineering. 2024. Vol. 25, № 3.

11. NVW-YOLOv8s: An improved YOLOv8s network for real-time detection and segmentation of tomato fruits at different ripeness stages / Wang A., Qian W., Li A. et al. // Computers and Electronics in Agriculture. 2024. Vol. 19, № 2.

Title: Road surface defect detection algorithm based on the YOLOv8s model

Authors:

Qin Hongwu - Changchun University, Changchun, China; Yan Yangyong - Changchun University, Changchun, China; Jin Zuoyi - Changchun University, Changchun, China;

Chye En Un - Pacific National University, Khabarovsk, Russian Federation;

Voronin V. V. - Pacific National University, Khabarovsk, Russian Federation.

Abstract: Road surface defect detection is a key task for ensuring road traffic safety and the timely repair of damage. However, traditional manual inspection methods are inefficient and costly. To address these problems, an improved road defect detection algorithm, BL-YOLOv8, is proposed. By integrating the BiFPN concept, the structure of the YOLOv8s model is reconstructed, reducing the number of model parameters, lowering the computational load and increasing defect detection accuracy. Experimental results show that, on the road surface defect data set, the number of parameters is reduced by 23.03 %, the amount of computation is reduced by 5.85 %, and the mean average precision mAP@0.5 is increased by 2.1 %, confirming the effectiveness of the method for automatic defect detection.

Keywords: YOLOv8s model, object detection algorithms, BiFPN, road surface defect recognition.
