**1. Introduction**

Quantitatively gathering information on the growing status of fruit trees with current technology facilitates the digital management of orchard production and enhances its precision [1]. Flowering monitoring is one of the basic techniques of digital orchard management and is widely used for flower thinning, pest and disease control, and other management operations. In the apple-growing industry, pruning and thinning are required to obtain greater economic returns [2]. In the early stages of apple growth, proper flower and fruitlet thinning can increase individual fruit weight and yield [3]. Current flower thinning methods mainly include manual thinning [4], chemical thinning [5–7], and mechanical thinning [8].

Traditional flowering monitoring relies on human observation of specific fruit trees at specific times: experts enter the orchard, randomly select a few trees, and estimate their flowering state by eye; after comprehensive consideration, the overall flowering state of the orchard is obtained. Thinning after 28 days of bloom is ideal for obtaining larger, high-quality Fuji apples [3,9]. However, modern standardized orchards often cover large areas, and flowering times vary among fruit trees in different regions. It is therefore difficult to dynamically adjust thinning times and measures to the flowering information of specific trees, which reduces the efficiency and accuracy of thinning management decisions in modern standardized orchards. Consequently, there is an urgent need for a method that can monitor the various growth stages of apple flowers and quantify flowering intensity, laying the groundwork for real-time monitoring using Internet of Things (IoT) technology.

**Citation:** Zhou, X.; Sun, G.; Xu, N.; Zhang, X.; Cai, J.; Yuan, Y.; Huang, Y. A Method of Modern Standardized Apple Orchard Flowering Monitoring Based on S-YOLO. *Agriculture* **2023**, *13*, 380. https://doi.org/10.3390/agriculture13020380

Academic Editors: Cheng Shen, Zhong Tang and Maohua Xiao

Received: 8 January 2023; Revised: 31 January 2023; Accepted: 2 February 2023; Published: 4 February 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Current apple blossom monitoring methods remain insufficiently studied, and most have been developed in simple experimental environments, i.e., with suitable light, shooting angle, and shooting distance, where they achieve close to 100% detection accuracy. One study detected the stamens of fully open flowers at close imaging distances [10], but it disregarded the predictive value of early buds and semi-open flowers for fully open flowers and was suitable only for close-range detection. Other studies grouped flowers at all growth stages, or even whole flower clusters, into a single detection category [11,12], ignoring the interactions between flowers at different growth stages and therefore failing to monitor the complete flowering process accurately. A division of flowers into three detection stages [13] omitted the end-flowering stage, which marks the end of flowering and is needed to determine the flowering stage of fruit trees. Still other studies divided apple flowers into 6–8 stages for detection [14,15], devising even more categories. However, the similarity between flowers at different growth stages raises the cost of annotating so many categories, and flowers detected as a single clustered category cannot be counted individually.

Images captured at close range, in which apple blossoms at various stages occupy a high proportion of pixels, are easier to recognize and yield better detection performance. However, with the development of IoT monitoring devices, research based on high-resolution images of whole fruit trees acquired at a distance has become an inevitable trend. Some studies have obtained more complete images at long imaging distances, using vehicle-based [16] or uncrewed aerial vehicle [17,18] platforms. However, at such distances the tiny area of each flower reduces the visual variability between growth stages, so only fully open flowers, or even whole flower clusters, can be recognized as detection objects. In addition, none of the above studies observed the entire growth cycle of apple blossoms or tested their models under different weather conditions, so the resulting detection models may not generalize to a wide range of weather. The crux of these problems is that current detection algorithms cannot effectively detect tiny-pixel flowers in high-resolution apple tree images, let alone monitor the complete growth process in complex weather.

Convolutional neural networks are the standard models in computer vision. Related detection models fall into two groups according to whether they directly implement classification and localization: two-stage algorithms such as the Faster R-CNN series [19–21], and one-stage algorithms such as SSD [22] and the YOLO series [23,24]. Current apple flower detection algorithms are primarily divided into mask-based semantic segmentation and box-based object detection. Studies applying semantic segmentation to apple blossom detection have used DeepLab-ResNet [11], Mask R-CNN [12,13], and fully convolutional networks [16]. Although these algorithms can segment flowers, they cannot count them and are less effective at detecting large aggregations of flowers. Box-based apple flower detection algorithms can count flowers and enable further data analysis; among them, the YOLO family, especially YOLO v4 [25], has been widely improved and has achieved better detection results [10,15,26,27]. However, these studies examined flowers only at certain times, lacking monitoring of the whole flowering process and quantitative analysis of the flowers. Therefore, a method is needed that can accurately detect tiny apple blossoms in high-resolution images and enable multi-stage flower monitoring in the open world.

As the most advanced model in the widely used YOLO family, YOLOX [28] has superior detection performance and has been effectively applied in similar dense-detection tasks [29,30]. Although CNN models, including YOLOX, have a long history of success in target detection thanks to translation invariance and local correlation, a CNN's receptive field is restricted, making it difficult to gather global information. In contrast, the Transformer lacks translation invariance and local correlation but can capture long-range dependencies. Vision Transformer [31] therefore outperforms pure convolutional models on large datasets, which is especially relevant when massive datasets can be obtained through IoT technology.

Since the introduction of Vision Transformer, many works have tried combining CNNs and Transformers to let the network inherit the advantages of both and retain global and local features to the greatest extent. As a landmark work, Swin Transformer [32] introduced shifted windows as its defining feature. With self-attention at its foundation, Swin Transformer gathers global contextual information to establish long-range dependencies between targets and extract more robust features, demonstrating the potential to replace traditional convolutional networks as the new backbone in computer vision.

In order to monitor the complete flowering process using IoT technology, research based on high-resolution images of apple trees taken in complex weather is essential. However, such images are not only difficult to obtain; their typical characteristics, such as a complex and changeable environment, a tiny proportion of flower pixels, and hazy texture and color detail, also pose obstacles for flower detection and monitoring. This study took high-resolution images of apple blossoms across the complete growth stage in the open world as its research object and used the Slicing Aided Hyper Inference (SAHI) algorithm to generate mixed datasets containing both global and local information. An S-YOLO model was then designed based on Swin Transformer, achieving accurate detection of apple blossoms at four growth stages. An analysis model of the number and proportion of apple blossoms at each stage was established, enabling flowering-intensity and flowering monitoring of the orchard and even of specific fruit trees. This work provides further theoretical and technical support for monitoring orchard flowering growth using IoT technology.
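To make the slicing and per-stage analysis ideas concrete, the following is a minimal sketch, not the paper's implementation: it illustrates the tiling principle behind SAHI (cutting a high-resolution image into overlapping tiles so that tiny blossoms occupy a larger fraction of each inference input) and the computation of each growth stage's share of detected flowers. The tile size, overlap ratio, and stage labels are illustrative assumptions, not the study's settings.

```python
# Illustrative sketch (assumed parameters, not the paper's configuration):
# SAHI-style tiling of a high-resolution orchard image, plus per-stage
# count/share statistics over hypothetical detections.
from collections import Counter

def slice_boxes(width, height, tile=640, overlap=0.2):
    """Return (x0, y0, x1, y1) tiles covering the image with the given overlap."""
    step = int(tile * (1 - overlap))
    xs = sorted(set(list(range(0, max(width - tile, 1), step)) + [max(width - tile, 0)]))
    ys = sorted(set(list(range(0, max(height - tile, 1), step)) + [max(height - tile, 0)]))
    return [(x, y, min(x + tile, width), min(y + tile, height)) for y in ys for x in xs]

def stage_shares(detections):
    """Count detections per growth stage and return each stage's share of the total."""
    counts = Counter(stage for stage, _box in detections)
    total = sum(counts.values())
    return {stage: n / total for stage, n in counts.items()}
```

For example, `slice_boxes(1920, 1080)` yields eight overlapping 640 × 640 tiles. In a full pipeline, detections from each tile would be mapped back to global image coordinates and merged (e.g., by non-maximum suppression) before `stage_shares` is applied to the merged detection list.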

#### **2. Materials and Methods**
