Article

Evaluating the FLUX.1 Synthetic Data on YOLOv9 for AI-Powered Poultry Farming

1 Faculty for Information Systems and Technologies, University of Donja Gorica, Oktoih 1, 81000 Podgorica, Montenegro
2 DunavNET, Bul. Oslobodjenja 133/2, 21000 Novi Sad, Serbia
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(7), 3663; https://doi.org/10.3390/app15073663
Submission received: 28 February 2025 / Revised: 19 March 2025 / Accepted: 21 March 2025 / Published: 27 March 2025
(This article belongs to the Special Issue Applied Computer Vision in Industry and Agriculture)

Abstract

This research explores the role of synthetic data in enhancing the accuracy of deep learning models for automated poultry farm management. A hybrid dataset was created by combining real images of chickens with 400 FLUX.1 [dev]-generated synthetic images, aiming to reduce reliance on extensive manual data collection. The YOLOv9 model was trained on various dataset compositions to assess the impact of synthetic data on detection performance. Additionally, automated annotation techniques utilizing Grounding DINO and SAM2 streamlined dataset labeling, significantly reducing manual effort. Experimental results demonstrate that models trained on a balanced combination of real and synthetic images performed comparably to those trained on larger, augmented datasets, confirming the effectiveness of synthetic data in improving model generalization. The best-performing model, trained on 300 real and 100 synthetic images, achieved mAP = 0.829, while models trained on 100 real and 300 synthetic images reached mAP = 0.820, highlighting the potential of generative AI to bridge data scarcity gaps in precision poultry farming. This study demonstrates that synthetic data can enhance AI-driven poultry monitoring and reduce the reliance on real data collection.

1. Introduction

To optimize production while ensuring animal welfare, modern poultry farming requires advanced digital solutions that streamline operations and mitigate risks such as disease outbreaks and mortality [1]. Smart agriculture is transforming traditional farming by integrating machine learning (ML) models to improve efficiency, sustainability and resource management. In poultry farming, the precise monitoring of livestock health, growth, and anomalies is essential for optimizing production. However, challenges such as variable farm conditions, data scarcity, and labor-intensive monitoring hinder effective automation. To address this, computer vision and generative artificial intelligence (GenAI) are being utilized to enhance animal detection and tracking, enabling more accurate and scalable farm management. As AI-driven solutions advance, they provide farmers with data-driven insights and automation, reducing costs and improving productivity in modern poultry farming, including chicken counting, growth assessment, and automated mortality detection [2]. These solutions leverage computer vision and edge AI to analyze images from on-farm cameras, enabling real-time monitoring and early intervention. Additionally, integrating IoT sensors allows for the continuous tracking of environmental conditions such as temperature, humidity, CO2, and ammonia levels, which are vital for poultry health. The global demand for food is projected to rise by 60% between 2010 and 2050, with poultry production playing a crucial role in meeting this need [3].
By automating monitoring and analysis, AI can significantly reduce labor demands, improve feed efficiency and lower mortality rates. A previously conducted study investigated the application of deep learning for real-time monitoring in poultry farms using edge AI devices equipped with cameras [4]. AI-based models were developed for chicken detection and segmentation, enabling automated functions such as counting, mortality detection, and growth assessment. The research utilized Faster R-CNN (Regional Convolutional Neural Network) architectures, with AutoML employed to identify the most suitable model for the given dataset. The optimized models achieved high accuracy (mAP = 85% for object detection and mAP = 90% for instance segmentation). These models were deployed on edge AI devices and evaluated in real poultry farm environments, demonstrating their potential for early disease detection and automated farm management. The findings highlighted the effectiveness of integrating computer vision (CV) and the Internet of Things (IoT) in poultry farming, though further dataset expansion and model refinement were identified as necessary for improved performance.
Recent advancements in object detection have significantly improved the YOLO (You Only Look Once [5]) family of models, compared to earlier frameworks like Detectron2 with models such as Faster R-CNN [6] and Mask R-CNN [7]. The YOLO architecture enables real-time processing with high precision, especially with new versions [8]. YOLO-based models offer superior inference speed and real-time processing capabilities, making them highly suitable for applications requiring low-latency detection [9,10,11]. Recent evaluations demonstrate that YOLOv8 outperforms two-stage detectors such as Faster R-CNN and Mask R-CNN in agricultural and industrial environments, achieving higher mAP scores with reduced computational cost [10,11]. Additionally, studies show that optimized YOLO models can maintain competitive accuracy while operating at significantly higher FPS (frames per second) compared to traditional architectures [9], further reinforcing their advantage in real-time object detection scenarios. Today, various YOLO models are available, each offering slightly different performance characteristics (Figure 1). The most recent version is YOLOv12 [12]. While newer YOLO versions often introduce architectural improvements, they do not necessarily guarantee better accuracy or efficiency across all applications [13]. Several factors influence the performance of different YOLO versions, including dataset characteristics, object complexity, model size and hardware compatibility. In some cases, earlier YOLO versions may outperform newer ones due to their lighter architectures and optimized processing speeds for specific tasks. Since the primary objective of this study was to analyze how synthetic data influence model accuracy, the selection of the latest YOLO model was not a critical factor in the experimental design. In this study, YOLOv9-e was chosen as a good representative model.
It is well known that a good predictive deep learning model requires a large amount of data. Manually collecting and annotating data can be a very costly process during project implementation. In parallel to advancements in object detection, generative artificial intelligence is emerging as an important tool across various fields. One of the sectors where generative AI can be applied is agriculture, particularly in enhancing productivity and sustainability. Technologies like Digital Twin (DT) and generative AI can simulate farming conditions, supporting better decision-making and resource use [15]. One of the applications is crop disease diagnosis, where Generative Adversarial Networks (GANs) create synthetic images to improve detection models. This helps address data limitations and enhances early disease identification. Yield prediction and plant monitoring also benefit from AI, enabling farmers to estimate production and manage resources more effectively [16]. Generative AI also supports data enrichment by filling gaps in agricultural datasets, improving insights for farmers and policymakers. By generating synthetic data for soil analysis and climate trends, AI models become more adaptable to different farming conditions [17,18]. One of the main focuses of our study is the generation of highly realistic images, comparable to those captured in real life. FLUX.1 [19] demonstrates advantages over other state-of-the-art text-to-image models such as Ideogram 2.0, DALL-E 3, Stable Diffusion 3, and Midjourney, excelling in structured output generation, realism, and domain-specific tasks such as scientific and medical imaging [20]. While Midjourney and Stable Diffusion 3 might generate aesthetically appealing images, FLUX.1 is better at maintaining logical consistency, structured layouts, and accurate text generation, making it more versatile for applications beyond artistic renderings.
In this paper, we explore the feasibility of developing a highly accurate predictive AI model for poultry monitoring using synthetic data. We investigate how synthetic data can be effectively combined with real data to enhance deep learning model performance while reducing reliance on extensive real-world image collection. This study was conducted on a single farm under controlled conditions, which will be further explained in the following sections. The key contributions of this study are as follows:
  • Developed a hybrid dataset combining real and synthetic images generated by FLUX.1 to enhance AI-driven poultry monitoring;
  • Evaluated the impact of synthetic data on YOLOv9-e model accuracy, demonstrating its feasibility in reducing reliance on real-world data;
  • Implemented an automated annotation pipeline using Grounding DINO and SAM2, significantly reducing manual labeling effort;
  • Analyzed different dataset compositions to determine the optimal balance between real and synthetic images for object detection in poultry farming;
  • Provided a practical framework for integrating generative AI into smart agriculture applications, ensuring model generalization and efficiency;
  • Demonstrated that a combination of real and synthetic images can achieve comparable accuracy to larger, real-only datasets, validating the use of synthetic data augmentation.
The paper is structured as follows: Section 2 provides an overview of related work, Section 3 describes the materials and methods, including dataset preparation and experimental setup, Section 4 presents the results and discussion, and Section 5 concludes with key findings and future research directions.

2. Related Work

Modern poultry farming increasingly incorporates deep learning techniques to automate different tasks such as bird detection, animal counting, behavior analysis, mortality detection, etc. Deep learning models have improved automatic poultry counting, addressing challenges like occlusions and high-density environments, making farm management more efficient [21,22,23,24,25]. Behavior and movement analysis enables the early detection of anomalies, helping to monitor poultry welfare and identify potential health issues or stress factors [26,27,28]. These methods contribute to optimizing farm operations and improving animal well-being. Furthermore, deep learning is used for detecting deceased animals, which is crucial for disease prevention and biosecurity measures. AI-driven detection systems help automate this process, ensuring timely intervention and reducing manual labor requirements [29]. By integrating these solutions, poultry farming can enhance efficiency, reduce costs, and improve overall livestock management. Object detectors like YOLO and R-CNN variants have been widely applied in broiler and layer house environments [30,31]. YOLO-based models in particular have shown high accuracy in detecting chickens and even classifying their health status. For example, a thermal YOLOv8 model achieved an mAP of 98.8% in identifying broilers and pathological signs under challenging lighting [30]. The combination of YOLOv8 with DeepSORT tracking to monitor cage-free hens reached 94% multi-object tracking accuracy while detecting abnormal behaviors like smothering and piling [32]. These one-stage detectors offer a notable speed advantage: an augmented YOLO model ran at 37 FPS with 0.84 mAP, a 29% accuracy improvement over prior networks, far outpacing two-stage approaches [33]. Traditional two-stage detectors (e.g., Faster R-CNN) have also been used for poultry. An improved Faster R-CNN was used to detect stunned broilers, attaining 98% accuracy in poultry processing lines [34]. However, such R-CNN models are computationally heavier, often achieving only a few FPS, whereas YOLO variants enable near real-time monitoring. For finer analysis, Mask R-CNN has been explored for segmenting birds and behaviors. In another example, Mask R-CNN was used to recognize the preening behavior of hens, reporting 94% accuracy and robust recall/specificity in identifying individual hens’ actions [31]. This highlights that both one-stage (YOLO) and two-stage (Faster/Mask R-CNN) frameworks can successfully detect poultry and their activities, though YOLO-based methods are generally favored for real-time farm deployment due to their speed and adequate precision.
One of the challenges in applying deep learning to poultry monitoring is data scarcity—collecting and labeling farm images is labor-intensive and constrained by seasonality and environmental variability [35]. To address this, researchers have turned to generative AI for augmenting agricultural datasets. GAN-based data augmentation has become a focal approach in agri-vision research, yielding more diverse and balanced training images [35]. A GAN–MAE pipeline used to generate synthetic chicken face images improved detection accuracy by over 29% (mAP 0.84) compared to models without augmentation [33]. Similarly, a systematic review of GANs in agriculture highlighted substantial improvements in image classification and detection tasks with the addition of synthetic images [36]. Beyond GANs, diffusion models are emerging as powerful data generators. Recent work in the plant domain shows diffusion models producing highly realistic disease images, outperforming GANs in fidelity [37]. This suggests a strong potential to apply diffusion-based augmentation in poultry vision, although it remains largely untapped in animal agriculture to date [35]. Innovative strategies have also been proposed to generate entire synthetic datasets. For instance, a study on edge-case scenarios trained a tomato disease detector solely on iteratively refined synthetic images, achieving competitive performance on real data. This approach could be adapted to poultry farming by simulating flock images under different conditions [38]. Another study presents a comprehensive survey on the role of synthetic data in computer vision, emphasizing how GANs can generate realistic imagery that preserves essential features such as feather texture, lighting variations, and occlusions. The study highlights that well-trained GANs can reduce the dependency on costly real-world data collection, making them a practical tool for scaling deep learning applications in agricultural monitoring [39]. In another study, the authors explored the effectiveness of blending synthetic and real data for object detection and classification tasks. Their findings suggest that incorporating 30–50% GAN-generated images into training datasets can improve model generalization. However, the study also underscores the importance of domain adaptation techniques, as models trained on purely synthetic data may struggle with real-world variability if proper calibration is not applied [40].
Most existing related studies have demonstrated the value of synthetic data in agriculture, primarily comparing its effectiveness to real data. However, an extensive exploration of how varying proportions of real and synthetic images affect model performance is still lacking. This study takes a systematic approach by analyzing different dataset compositions, providing insights into the optimal balance for improving detection accuracy over real data alone. By doing so, a practical framework for integrating synthetic data in AI-driven poultry monitoring is offered, helping to address challenges related to data availability while maintaining model reliability.

3. Materials and Methods

This section describes the key tools and methods used for data collection and analysis, which were later utilized in the development of AI models. The new component, which is an addition to the previous version of the system [4], is related to generative AI. The IoT-based platform for smart poultry farms integrates real and synthetic data, edge computing, and cloud-based processing for automated monitoring. The methodology follows these key steps (Figure 2):
Step 1—Dataset Collection: Real images are captured by an AI-enabled camera (Edge 1) inside the closed poultry farm Radinović in Montenegro, under daylight conditions. Synthetic data are generated using a fine-tuned generative AI model (FLUX.1) [19,41] to reduce manual data collection efforts.
Step 2—Preprocessing: The dataset undergoes annotation, optional augmentation (rotation, scaling, and noise addition), and structuring to enhance generalization.
Step 3—Model Training: The YOLOv8 and YOLOv9 architectures are evaluated. After initial experiments, YOLOv9 is selected for fine-tuning due to its higher predictive accuracy.
Step 4—Edge Deployment: The trained model is ready to be deployed on Edge 2, which processes real-time camera input and provides local (on-device) inference for object detection.
Step 5—Cloud Integration: Prediction results could be transmitted via an API to a cloud-based platform for farm management. The web application offers insights into farm conditions, enabling automated monitoring and decision-making [42].
By integrating real and synthetic data, edge AI (an AI prediction model on the edge device), and cloud processing, the system reduces the reliance on real data, automates monitoring, and improves decision-making in poultry farming.
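As an illustration of Steps 4 and 5, the following minimal sketch shows how on-device inference and cloud reporting could be wired together in Python with the Ultralytics API. It is a hedged example only: the weights file name, camera index, and cloud endpoint are placeholders rather than the exact components of the deployed system.

import cv2
import requests
from ultralytics import YOLO

model = YOLO("yolov9e_poultry.pt")  # fine-tuned detection weights (placeholder file name)
CLOUD_API = "https://example.com/api/poultry/detections"  # hypothetical endpoint (Step 5)

cap = cv2.VideoCapture(0)  # on-farm camera feeding Edge 2 (Step 4)
ok, frame = cap.read()
if ok:
    result = model.predict(frame, conf=0.5, verbose=False)[0]
    payload = {
        "chicken_count": len(result.boxes),
        "boxes": result.boxes.xyxy.tolist(),  # [x1, y1, x2, y2] per detected bird
    }
    # Transmit the local inference result to the farm-management platform.
    requests.post(CLOUD_API, json=payload, timeout=5)
cap.release()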

3.1. Tools Selection and Setup

The development of a generative AI system for synthetic image generation requires a structured workflow that includes data collection, model selection, training and evaluation. Figure 3 illustrates the proposed workflow, outlining the essential steps involved in the selection and setup of tools. The methodology comprises the following key steps:
Step 1—Dataset Preparation: Images and sensor data are collected from cameras and sensors deployed in real-world environments, such as poultry farms. The raw data undergo cleaning, transformation and formatting to ensure they are structured and suitable for AI training.
Step 2—Data Annotation: Processed data are annotated to create a structured training set. Most of the annotation process is automated using Roboflow’s auto-labeling tools, including models like Grounding DINO [43], minimizing manual effort and improving efficiency.
Step 3—Model Selection and Training: Various generative AI models, including DALL·E 3, Midjourney, and FLUX.1, are evaluated to determine the most suitable option. FLUX.1 [dev] is selected due to its cost-effective fine-tuning capabilities and ability to generate realistic synthetic data. The selected model is fine-tuned using high-performance hardware to ensure high-quality image generation.
Step 4—Model Evaluation and Image Generation: The trained model’s performance is assessed based on human feedback, performance metrics, and qualitative analysis. Users provide text-based or parameterized prompts to guide the model in generating specific synthetic images.
Step 5—Final Output: The process generates high-quality, realistic synthetic images, enhancing the dataset, reducing the need for manual data collection, and improving model robustness.
This structured workflow ensures a systematic approach to synthetic image generation, leveraging AI-driven automation for efficiency and high-quality outputs.
To facilitate the development and experimentation of predictive models, several software tools and computational resources were selected. Python 3.11 was used as the primary programming language, along with Ultralytics [44] for testing and deploying YOLOv8 and YOLOv9 models. The experiments were conducted on A100 GPUs, ensuring efficient model training. For generative AI tasks and data annotation, FLUX.1 [dev] and Roboflow [45] were utilized, while Replicate [46] was employed for fine-tuning FLUX.1 [dev] with H100 GPUs. The dataset annotation process was streamlined using Roboflow’s auto-labeling feature, significantly reducing manual effort.
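For reference, the sketch below shows how a YOLOv9 detector can be fine-tuned and validated with the Ultralytics Python API. It is a minimal, hedged example: the dataset YAML path and run name are placeholders, and only the settings discussed in this paper (100 epochs, a single GPU) are set explicitly.

from ultralytics import YOLO

# Start from pretrained YOLOv9-e weights distributed by Ultralytics.
model = YOLO("yolov9e.pt")

# data.yaml points to one of the real/synthetic dataset compositions
# exported from Roboflow (placeholder path).
model.train(
    data="data.yaml",
    epochs=100,  # training plateaued at roughly 100 epochs in this study
    device=0,    # single A100 GPU
    name="yolov9e_poultry",
)

metrics = model.val()  # mAP50-95 on the validation split
print(metrics.box.map)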

3.2. Dataset Collection and Annotation

In our previous research, the real dataset for training AI models in chicken detection was created using images extracted from video recordings taken on farms in Montenegro and Serbia [4]. Approximately 4000 images were manually annotated for detection, later expanded with augmentation to 9000 using Roboflow. The dataset was split into 7550 images for training, 725 for validation, and 725 for testing. To enhance model generalization, data augmentation techniques such as rotation, brightness adjustment, saturation tuning and cropping were applied. The images were collected at different poultry growth stages, across various farms and environmental conditions, ensuring dataset diversity.
To enhance dataset diversity and reduce the reliance on real data, a synthetic data generation pipeline was implemented. The process began by selecting a balanced subset of real images, which was then used to fine-tune the generative model; in the literature, authors emphasize that 10–20 images are enough to produce sufficiently good results [47]. Once fine-tuned, the model was used to create synthetic images, which were subsequently integrated into the training dataset.
For the synthetic data, automatic annotation was integrated using the Grounding DINO model within the Roboflow platform. Grounding DINO is an advanced object detection model that leverages transformer-based detection and grounded pre-training, enabling zero-shot detection, meaning it can recognize objects without prior training on specific classes. In Roboflow, this model automates image annotation, streamlining dataset preparation for computer vision models. Grounding DINO achieved 52.5 mAP on the Common Objects in Context (COCO) zero-shot benchmark, demonstrating its capability to detect objects without additional training on COCO data [43]. This model significantly outperformed manual annotation and surpassed some other zero-shot models, including YOLO-World [48]. Despite these advantages, both models were affected by lighting conditions, object occlusion, and image quality. The annotation process involves uploading a dataset, selecting an image and using the Box Prompting (AI labeling) feature, where manually labeling just one object allows the model to predict bounding boxes for others. Adjusting the confidence threshold influences sensitivity—lower values increase object detection coverage but also raise the risk of false positives. This was particularly evident in images affected by lighting conditions and object occlusion, where manual adjustments were necessary. Once predictions are reviewed, annotations can be approved and saved, finalizing the dataset for model training (Figure 4a). Although instance segmentation was not within the scope of this research, the automatic annotation feature was also tested for this task. Object segmentation poses additional challenges, as models may misidentify entire flocks as a single chicken or fail to detect certain ones, especially in high-contrast lighting conditions. In such cases, manual correction is required. Segment Anything Model 2 (SAM2), developed by Meta AI Research, enhances segmentation by integrating image and video segmentation into a single framework, enabling faster predictions and improved generalization without retraining. Using a transformer-based architecture with memory mechanisms, SAM2 processes video content efficiently and has been trained on the largest video segmentation dataset, ensuring high performance [49]. Unlike bounding box annotation, segmentation requires manual interaction, where the user hovers over an object, prompting the model to generate a polygon mask. While this improves accuracy, it is more time-consuming, and Roboflow’s Smart Polygon feature remains a paid option for large-scale annotations (Figure 4b).
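This study used Roboflow’s hosted auto-labeling, but the same zero-shot behavior can be reproduced locally. The sketch below uses the Grounding DINO port available in the Hugging Face transformers library as a stand-in; the checkpoint name, image path, and thresholds are reasonable assumptions rather than the exact settings used here.

import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

model_id = "IDEA-Research/grounding-dino-base"  # public checkpoint (assumption)
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id)

image = Image.open("frame_0001.jpg")  # placeholder farm image
text = "a chicken."  # Grounding DINO expects lowercase queries ending with a period

inputs = processor(images=image, text=text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# A lower box threshold increases coverage but also the risk of false positives.
results = processor.post_process_grounded_object_detection(
    outputs,
    inputs.input_ids,
    box_threshold=0.35,
    text_threshold=0.25,
    target_sizes=[image.size[::-1]],
)[0]
for box, score in zip(results["boxes"], results["scores"]):
    print([round(v, 1) for v in box.tolist()], round(score.item(), 3))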
Automatic annotation significantly enhances dataset preparation by reducing manual effort, especially in complex tasks like object segmentation. Beyond using models like Grounding DINO and SAM2, pre-trained models with high prediction accuracy can further streamline annotation, improving efficiency. According to Roboflow researchers, their community has collectively saved over seventy years of manual annotation time through automation, highlighting its impact on accelerating model development [50]. Given that annotation is one of the most labor-intensive steps in developing high-precision predictive models, automation plays an important role in modern AI workflows. The following section explores the experiment setup and metrics for FLUX.1 [dev] and YOLOv9.

3.3. Experiment Models and Setup

The experiment was designed to assess the impact of synthetic data generation on the accuracy of AI models used for chicken detection. FLUX.1 is one of the text-to-image models developed by Black Forest Labs. This model generates high-quality images from natural language prompts, utilizing a hybrid architecture that combines multimodal and parallel diffusion transformer blocks scaled to 12 billion parameters [51,52,53]. It offers three variants: FLUX.1 [pro] for top performance, FLUX.1 [dev] for non-commercial applications (selected for this research), and FLUX.1 [schnell] for rapid, local development. A subset of 60 real images was used from a poultry farm in Montenegro (Radinović farm), representing three distinct growth stages of chickens: 20 images of small chickens, 20 of medium-sized chickens, and 20 of large chickens. This selection was designed to diversify the synthetic dataset and improve the model’s ability to generalize across different poultry growth stages. As previously noted, FLUX.1 [dev] requires only 10 to 20 real images to generate well-generalized synthetic images. To ensure a balanced dataset, 20 images were used for each class (small, medium, and large chickens), rather than mixing different sizes within a single category. The Replicate platform was used for fine-tuning, running on an H100 GPU with training parameters set at 1000 steps and LoRA rank = 16, following the recommended configurations. For dataset expansion, a fixed trigger word (“CHICKRAD”) was embedded into the fine-tuning process. This approach ensured that during synthetic image generation, the model consistently produced relevant chicken images while maintaining realism in texture, size, and pose variations. The fine-tuning process was completed in less than 30 min, ensuring the efficient adaptation of the model to poultry-specific features. The training (fine-tuning) cost on this platform was less than EUR 5 [54].
To achieve high-quality synthetic images, several parameters were optimized. The LoRA scale was set to 1.0, balancing the influence of the LoRA model in the generation process [55,56]. A higher value would strengthen the fine-tuned model’s impact, while a lower value would allow the base model to contribute more significantly. The guidance scale was set to 2.5, determining how closely the generated images adhered to the text prompt. Testing different values showed that 2.5 produced the most realistic results, striking the best balance between image diversity and prompt accuracy. The prompt strength parameter, which regulates the degree of influence from the given prompt, was left at its default value of 0.8 since it had minimal impact on image quality.
The text prompt used for generation described chickens in a natural farm setting, incorporating variations in color, positioning and depth to introduce greater diversity into the dataset. The entire generation process followed a systematic workflow, detailed in Algorithm 1, ensuring consistency and reproducibility. By integrating real and synthetic data, the extended dataset provided a more balanced and diverse set of images, enhancing the robustness of AI models trained for poultry detection. The optimized FLUX.1 [dev] model successfully generated high-quality realistic images that closely resembled real-world farm conditions, validating the effectiveness of synthetic data augmentation. The total cost for generating over 400 synthetic images was less than EUR 5. Consequently, the overall expense for fine-tuning the FLUX.1 [dev] model and producing a synthetic dataset exceeding 400 images amounted to approximately EUR 10 with an H100 GPU on the Replicate platform.
Algorithm 1 Implementation steps of generative AI for extending the dataset with synthetic data.
  • real_dataset ← collect_real_dataset()
  • real_dataset_chunk ← select_sample_from_real_dataset()
  • selected_generative_ai_model ← FLUX.1 [dev]
  • Model training
  • trigger_word ← CHICKRAD
  • lora_rank ← 16
  • steps ← 1000
  • train(selected_generative_ai_model)
  • Generate synthetic data
  • lora_scale ← 1
  • guidance_scale ← 2.5
  • prompt_strength ← 0.8
  • prompt ← A close-up isometric photo capturing a cluster of CHICKRAD nestled in a bed of soft, straw-colored hay. The CHICKRAD, in shades of white, brown, and soft yellow, are scattered across the hay, with some pecking, others resting. The variety in CHICKRAD sizes and their various angles create depth and liveliness in the scene. The hay background enhances the rustic feel, bringing out the delicate textures of their fluffy feathers.
  • generate_synthetic_data()
The synthetic dataset was expanded using a structured pipeline, generating three images per model call, as shown in Figure 5. The left column presents the input parameters used for the model, while the right column displays the corresponding generated images. These parameters have already been described. After generating these images, a small portion (5%) was manually excluded from the training dataset due to poor quality, lack of realism, or mismatch with the input prompt. The same manual procedure was applied to outliers among the real images. As previously mentioned, this research focused on a specific farm with unique conditions (e.g., lighting, surface, and colors). Any additional variations beyond these specific conditions were outside the scope of this study and were not included in the fine-tuning process.
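To make the pipeline in Algorithm 1 concrete, the sketch below shows how the fine-tuned LoRA could be invoked through the Replicate Python client with the parameter values reported above. The model identifier is a placeholder, and the exact input field names can vary between Replicate model versions, so this is an illustrative example rather than the production script.

import replicate

PROMPT = (
    "A close-up isometric photo capturing a cluster of CHICKRAD nestled in a "
    "bed of soft, straw-colored hay. The CHICKRAD, in shades of white, brown, "
    "and soft yellow, are scattered across the hay, with some pecking, others resting."
)

outputs = replicate.run(
    "your-account/chickrad-flux-lora",  # placeholder for the fine-tuned FLUX.1 [dev] LoRA
    input={
        "prompt": PROMPT,
        "lora_scale": 1.0,      # weight of the fine-tuned LoRA
        "guidance_scale": 2.5,  # adherence to the text prompt
        "prompt_strength": 0.8, # default value, minimal impact on quality
        "num_outputs": 3,       # three images per model call
    },
)
for i, url in enumerate(outputs):
    print(f"synthetic_{i:03d}.png -> {url}")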
YOLOv9 represents a significant advancement in real-time object detection, introducing key innovations to enhance performance and efficiency [57]. One of the central challenges in deep neural networks is the Information Bottleneck Principle, which highlights the potential loss of information as data passes through successive layers [58]. This is mathematically represented as
$I(X, X) \geq I(X, f_{\theta}(X)) \geq I(X, g_{\phi}(f_{\theta}(X)))$ (1)
where $I$ denotes mutual information, and $f$ and $g$ are transformation functions with parameters $\theta$ and $\phi$, respectively. To address this, YOLOv9 introduces Programmable Gradient Information (PGI), ensuring the preservation of essential data across the network’s depth, leading to more reliable gradient generation. PGI contains three key components: the main inference branch, auxiliary reversible branches for reliable gradient generation, and multi-layer auxiliary information for more effective deep supervision [59]. Additionally, the architecture incorporates reversible functions, which can be inverted without information loss, expressed as
$X = v_{\zeta}(r_{\psi}(X))$ (2)
with $\psi$ and $\zeta$ as parameters for the reversible function and its inverse, respectively. This design choice maintains complete information flow, enabling accurate parameter updates. Also, YOLOv9 uses the Generalized Efficient Layer Aggregation Network (GELAN), enhancing feature extraction and fusion, thereby improving accuracy and efficiency. These innovations collectively position YOLOv9 as one of the most effective AI models for computer vision tasks, such as object detection, instance segmentation, etc. By integrating advanced loss functions and optimization strategies, YOLOv9 balances detection accuracy and speed, making it a strong candidate for such tasks. The next section covers the evaluation metrics: YOLOv9 loss functions and mean average precision for assessing model accuracy.

3.4. Evaluation Metrics

After generating over 400 synthetic images, an experiment was conducted to evaluate the impact of synthetic data, produced using the fine-tuned FLUX.1 [dev] model, on the accuracy of the YOLOv9 predictive model. All experiments were conducted using an A100 GPU.
During the training process, one of the most important metrics is loss. The YOLOv9 loss function enhances previous architectures for better detection accuracy and training stability. It consists of three key components [57]:
  • Bounding box regression loss: CIoU (Complete Intersection over Union) loss, which incorporates distance, aspect ratio, and scale, ensuring faster convergence and precise localization [60];
  • Classification loss: Binary Cross-Entropy (BCE) loss with sigmoid activation, enabling multi-label classification per bounding box [61];
  • Objectness loss: focal loss, which reduces the impact of easy negatives and focuses on hard-to-classify cases, improving object–background distinction [62].
In the context of the loss function, bounding box regression is a fundamental aspect of object detection, determining how accurately a model predicts object locations within an image.
The Intersection over Union (IoU) metric is widely used to evaluate object detection performance [61]. It measures the overlap between a predicted bounding box $B_p$ and the ground truth bounding box $B_g$, defined as
$\mathrm{IoU} = \frac{|B_p \cap B_g|}{|B_p \cup B_g|}$ (3)
Equation (3) can be rewritten as follows:
$\mathrm{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}}$ (4)
The loss function is then computed:
$L_{\mathrm{IoU}} = 1 - \mathrm{IoU}$ (5)
While IoU effectively quantifies how well two bounding boxes align, it has several drawbacks when used as a loss function for training object detection models. One limitation is its lack of positional awareness, as IoU considers only the overlap area without accounting for the distance between the predicted and ground truth boxes. Even if $B_p$ and $B_g$ are close but do not overlap, the result will be 0, just as when the bounding boxes are far apart. IoU imposes no aspect ratio constraints, meaning that bounding boxes with significantly different aspect ratios can still have similar IoU values without being properly aligned. Another issue is scale sensitivity, where large bounding boxes may achieve high IoU scores even if they are poorly aligned with smaller objects [61]. Due to these limitations, using IoU alone as a loss function can lead to slow convergence and inaccurate bounding box predictions.
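The positional limitation is easy to reproduce. The self-contained Python sketch below, with arbitrary example boxes, shows that IoU, and therefore the loss $1 - \mathrm{IoU}$, is identical for a near-miss and for a box on the opposite side of the image.

def iou(a, b):
    """Intersection over Union, Equation (3), for boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

gt = (0, 0, 10, 10)
print(iou(gt, (11, 0, 21, 10)))     # 0.0: nearby but non-overlapping
print(iou(gt, (90, 90, 100, 100)))  # 0.0: far away, yet the same loss 1 - IoU = 1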
To overcome the IoU limitations, several improved loss functions have been proposed. Generalized IoU (GIoU) loss [63] introduces a penalty for non-overlapping boxes by considering the smallest enclosing area. Distance IoU (DIoU) loss enhances GIoU by incorporating a penalty based on the Euclidean distance between box centers [64].
YOLOv9 uses Complete Intersection over Union (CIoU) loss to optimize bounding box predictions by incorporating multiple geometric factors that enhance localization accuracy and model convergence speed. Unlike traditional IoU-based loss functions that rely solely on the overlap ratio between predicted and ground truth bounding boxes, CIoU introduces center distance, aspect ratio consistency, and scale awareness, making it a more effective loss function for object detection. CIoU loss is designed to address all three limitations of IoU by integrating the overlap, distance, and aspect ratio into a single loss function [61]:
$L_{\mathrm{CIoU}} = 1 - \mathrm{IoU} + \frac{\rho^2(b_p, b_g)}{c^2} + \alpha v$ (6)
where $\mathrm{IoU}$ represents the Intersection over Union between the predicted and ground truth bounding boxes; $\rho^2(b_p, b_g)$ is the squared Euclidean distance between the center points of the predicted box $b_p$ and the ground truth box $b_g$; $c$ denotes the diagonal length of the smallest enclosing box covering both $b_p$ and $b_g$; $v$ is an aspect ratio consistency term that penalizes mismatches in the shape of the bounding boxes; and $\alpha$ is a dynamic weighting factor that balances the contribution of the aspect ratio term.
The aspect ratio penalty term $v$ is defined as
$v = \frac{4}{\pi^2}\left(\arctan\frac{w_g}{h_g} - \arctan\frac{w_p}{h_p}\right)^2$ (7)
where $w_g, h_g$ and $w_p, h_p$ are the widths and heights of the ground truth and predicted boxes, respectively.
By combining the IoU, distance, and shape constraints, CIoU improves gradient flow and stabilizes training. This allows YOLOv9 to handle objects of varying sizes and shapes more effectively, especially in cluttered scenes.
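A minimal Python implementation of Equations (6) and (7) is sketched below for boxes in corner format. The boxes are arbitrary examples, and the weighting factor is computed as $\alpha = v / ((1 - \mathrm{IoU}) + v)$, following the original CIoU formulation rather than anything specific to this paper.

import math

def ciou_loss(bp, bg):
    """CIoU loss, Equation (6), for boxes in (x1, y1, x2, y2) format."""
    # IoU term
    ix1, iy1 = max(bp[0], bg[0]), max(bp[1], bg[1])
    ix2, iy2 = min(bp[2], bg[2]), min(bp[3], bg[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (bp[2] - bp[0]) * (bp[3] - bp[1])
    area_g = (bg[2] - bg[0]) * (bg[3] - bg[1])
    iou = inter / (area_p + area_g - inter)

    # Squared center distance rho^2 and enclosing-box diagonal c^2
    rho2 = ((bp[0] + bp[2]) / 2 - (bg[0] + bg[2]) / 2) ** 2 + \
           ((bp[1] + bp[3]) / 2 - (bg[1] + bg[3]) / 2) ** 2
    c2 = (max(bp[2], bg[2]) - min(bp[0], bg[0])) ** 2 + \
         (max(bp[3], bg[3]) - min(bp[1], bg[1])) ** 2

    # Aspect-ratio term v, Equation (7), and its dynamic weight alpha
    wp, hp = bp[2] - bp[0], bp[3] - bp[1]
    wg, hg = bg[2] - bg[0], bg[3] - bg[1]
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(wp / hp)) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)

    return 1 - iou + rho2 / c2 + alpha * v

print(round(ciou_loss((2, 2, 9, 11), (0, 0, 10, 10)), 4))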
YOLOv9 predicts an objectness score for each grid cell to determine whether it contains an object. Instead of using Softmax for multi-class classification, YOLOv9 applies BCE loss independently to each class [61]:
$L_{\mathrm{BCE}} = -y\log(p) - (1 - y)\log(1 - p)$ (8)
where $y$ is the ground truth label (1 for objects, 0 for background), and $p$ is the predicted probability of an object.
BCE enables multi-label classification, allowing objects to belong to multiple classes. It suits one-stage detectors, handling multiple objects per grid cell. Additionally, it ensures stable gradient flow, preventing vanishing gradients.
Standard object detection models often suffer from class imbalance, where easily classified background regions dominate training, leading to poor performance on objects that are harder to classify. YOLOv9 addresses this issue by using focal loss, which modifies standard BCE loss by reducing the weight of well-classified examples and focusing on harder cases. The formula for general focal loss is [62]
$L_{\mathrm{Focal}} = -\alpha(1 - p_t)^{\gamma}\log(p_t)$ (9)
where $p_t$ is the predicted probability for the correct class; $\alpha$ is a weighting factor to balance class distribution; and $\gamma$ (typically set to 2) controls the down-weighting of well-classified examples.
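The effect of the modulating factor can be checked with a few lines of Python. This is a toy illustration of Equations (8) and (9); the value $\alpha = 0.25$ is the common default from the original focal loss paper and is not a setting reported for YOLOv9 here.

import math

def bce_loss(y, p):
    """Binary cross-entropy for a single prediction, Equation (8)."""
    return -y * math.log(p) - (1 - y) * math.log(1 - p)

def focal_loss(y, p, alpha=0.25, gamma=2.0):
    """Focal loss, Equation (9); p_t is p for positives and 1 - p for background."""
    p_t = p if y == 1 else 1 - p
    return -alpha * (1 - p_t) ** gamma * math.log(p_t)

# An easy background cell (y=0, p=0.1) is down-weighted far more aggressively
# than a hard positive (y=1, p=0.3), which keeps a larger share of its loss.
print(round(bce_loss(0, 0.1), 4), round(focal_loss(0, 0.1), 4))  # ~0.1054 vs ~0.0003
print(round(bce_loss(1, 0.3), 4), round(focal_loss(1, 0.3), 4))  # ~1.204 vs ~0.1475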
Several advanced loss functions, such as Alpha-IoU [65], EIoU [66], Deep-EIoU [67], SIoU [68], etc., have been proposed to enhance object detection accuracy. Many deep learning object detectors, including YOLO models, are actively exploring hybrid IoU metrics that dynamically adapt during training. Additionally, recent approaches integrate transformer-based attention mechanisms to refine bounding box positioning beyond traditional IoU calculations [12].
Another important metric for evaluating model quality is accuracy. The accuracy of YOLOv9-e was assessed using the mean average precision (mAP) metric across 50 real images, evenly distributed by size and identical across all experimental setups, ensuring a consistent basis for comparison. The training was conducted for 100 epochs, after which mAP showed no significant improvement, indicating that it had plateaued with the given dataset and model. The mean average precision is a key metric for evaluating the performance of object detection models such as YOLO. It provides a comprehensive assessment by integrating several key performance indicators. To calculate mAP, it is important to define the following values: True Positives (TP)—correctly identified objects; False Positives (FP)—incorrectly identified objects; and False Negatives (FN)—missed objects that were not identified.
Precision measures the accuracy of positive predictions and is calculated as
$\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}$ (10)
Recall evaluates the model’s ability to identify all relevant objects and is determined by
$\mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$ (11)
mAP uses the standard Intersection over Union metric defined in (3). A higher IoU value indicates a more accurate localization of the object by the model.
To compute mAP, the model’s precision and recall are evaluated at various IoU thresholds, typically ranging from 0.5 to 0.95 in steps of 0.05. For each threshold, the average precision (AP) is calculated as the area under the precision–recall curve:
$\mathrm{AP} = \int_{0}^{1} P(r)\,dr$ (12)
where $P(r)$ represents the precision–recall curve. The final mAP is obtained by averaging the AP values across all classes and IoU thresholds:
$\mathrm{mAP}_{50:95} = \frac{1}{10}\sum_{t=0.50}^{0.95}\frac{1}{N}\sum_{i=1}^{N}\int_{0}^{1} P_i(r, t)\,dr$ (13)
where $N$ is the number of object classes, and $t$ denotes the IoU threshold.
The mAP metric (13) combines precision, recall, and localization accuracy (via IoU) to provide a holistic measure of the model’s detection performance. It is a crucial metric for evaluating YOLO-based object detection models, ensuring both classification and localization quality.
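As a sanity check for Equations (10)–(12), the sketch below computes AP at a single IoU threshold from a ranked list of detections, integrating precision over recall with the trapezoidal rule. This is a simplified illustration: production evaluators such as the Ultralytics validator use interpolated precision–recall curves, and the detection list here is synthetic toy data.

import numpy as np

def average_precision(scores, is_tp, num_gt):
    """AP at one IoU threshold: area under the precision-recall curve, Equation (12).

    scores: confidence of each detection; is_tp: 1 if the detection matched a
    ground truth box at the chosen IoU threshold, else 0; num_gt: number of
    ground truth objects.
    """
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(is_tp, dtype=float)[order]
    fp = 1.0 - tp
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(fp)
    recall = tp_cum / max(num_gt, 1)        # Equation (11)
    precision = tp_cum / (tp_cum + fp_cum)  # Equation (10)
    return float(np.trapz(precision, recall))

# Toy example: four detections, three ground truth chickens.
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 1], num_gt=3))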
The following section presents the results and analysis of these experiments.

4. Results and Discussion

The results presented in Table 1 provide insights into the impact of synthetic data on the YOLOv9-e model for object detection. YOLOv9-e was chosen as a strong representative of modern YOLO architectures to provide a consistent baseline for evaluating synthetic data augmentation rather than benchmarking different YOLO versions.
The initial experiment (Experiment 1) evaluated the baseline model without additional training data. The results show a relatively low mAP score of 0.245, which aligns with previous findings. These results confirm that, without additional annotated data or fine-tuning, the model struggles to achieve high accuracy. A clear trend is observed when comparing Experiments 2 and 3. Increasing the amount of real data leads to a notable improvement in model accuracy, rising from 0.796 to 0.822. This finding supports the well-established principle that expanding real datasets enhances model performance. However, Experiments 4 and 5, which assess the effect of using only synthetic data, reveal that increasing synthetic data alone does not improve accuracy (0.793 vs. 0.789). An important observation emerges when comparing Experiments 6 and 7. A combination of real and synthetic data leads to an mAP score of 0.821, closely matching the performance of the model trained exclusively on real data (0.822). This suggests that integrating synthetic and real data helps maintain high accuracy while reducing the need for extensive real-world data collection. Furthermore, Experiment 8 achieves the highest mAP score (0.829). These results indicate that adding a small percentage of synthetic data to real data can slightly improve model accuracy, which aligns with similar studies [40]. This improvement is likely due to better generalization introduced by synthetic data.
In Experiments 10–15, data augmentation was applied in three different ways: using only real images, only synthetic images, and a combination of both. The results indicate that augmentation did not improve model accuracy, despite increasing dataset size and diversity. The applied augmentation techniques included rotation, brightness adjustment (−15% to +15%), saturation change (−10% to +10%), and cropping (max 10%). These findings indicate that augmentation alone is not sufficient for improving performance when working with synthetic datasets. This can be attributed to the differences between the test data and augmented data, as well as the impact of multiple transformations. For instance, in Experiment 15, where data augmentation was applied only through rotation, the mAP was slightly lower compared to Experiment 8, which yielded the best results, but better than in other experiments with mixed data augmentation.
As mentioned, the purpose of this study was not to compare results with previous mAP values but rather to highlight the potential role of synthetic data in developing predictive models. Even though comparing mAP values with other research papers was not the primary focus of this study, the results obtained with a smaller dataset and a limited number of epochs remain comparable to existing studies in this field [4]. Based on the results and existing research, synthetic data can significantly enhance machine learning model training for chicken detection, particularly when real data are limited or unavailable. As generative AI models for synthetic data generation continue to evolve, their effectiveness in model training and dataset augmentation is expected to improve. However, it is essential to ensure that synthetic data accurately represent real-world characteristics to achieve high model precision. This research demonstrates that synthetic datasets and generative AI can significantly reduce the need for real data collection while maintaining high accuracy in YOLO-based object detection models.
Studies such as [35,36] have demonstrated that generative AI significantly improves object detection accuracy in agricultural applications by addressing dataset limitations. For example, in [33], researchers applied GAN-based data augmentation and reported a 29% improvement in mAP for chicken detection models, which aligns with the observations from Table 1 that synthetic data contribute to enhancing model generalization. Similarly, Nikolenko [39] emphasized the importance of synthetic data in preserving essential object characteristics, a principle that was validated by generating FLUX.1 [dev]-based poultry images to supplement a real dataset. The results also align with findings in other domains. For example, one study demonstrated that synthetic data could fully replace real data in tomato disease detection, achieving competitive results with entirely AI-generated datasets [38]. While our study does not suggest replacing real data entirely, the observed model accuracy when training on hybrid datasets confirms that synthetic data have an important role in supplementing real images and reducing manual annotation efforts.
Regarding data annotation, using Grounding DINO and SAM2 for this research follows similar efforts by Roboflow [50], which has reported significant reductions in annotation time by leveraging automated tools. These efficiency gains align with industry trends emphasizing automation in dataset preparation to streamline deep learning workflows. The finding in this research confirms that automated annotation is particularly beneficial in agricultural applications, where dataset labeling is one of the major bottlenecks. Comparing our best-performing model (300 real, 100 synthetic images, mAP = 0.829) with other poultry detection studies, our results are competitive but as already mentioned, this was not the focus of this study. Experiments from this research demonstrate that a hybrid dataset can maintain high accuracy while reducing dependency on extensive manual image collection. These findings contribute to the broader discussion on generative AI applications in precision agriculture. The results reinforce that synthetic data can effectively supplement real-world datasets and improve model accuracy in poultry monitoring. This research is part of the agroNET platform [42] and contributes to a broader in-production platform ecosystem. In this study, ethical considerations were taken into account regarding data privacy and the use of synthetic data. The synthetic images generated using FLUX.1 [dev] are entirely new and do not replicate or reconstruct real individuals, ensuring compliance with data protection regulations. By using synthetic data, the need for real image collection from farms is reduced, minimizing potential privacy concerns related to real-world data. Additionally, the dataset used in this research is publicly available via Roboflow, ensuring transparency and reproducibility while adhering to ethical AI principles.
Some limitations of this study include its focus on a single farm with specific environmental conditions. Although the fine-tuning dataset included diversity in chicken sizes, additional variations, such as different surfaces, chicken colors, and environmental settings, could further improve the model’s robustness. The dataset could also be expanded with more samples and tested on the latest state-of-the-art computer vision models, such as YOLOv11 and YOLOv12. Additionally, further testing and parameter optimization could enhance the generation of more realistic images. To support a deeper statistical analysis, techniques like cross-validation could be implemented on a larger validation dataset.

5. Conclusions

This study investigated the potential of synthetic data in enhancing object detection models for AI-driven poultry farming. The findings demonstrate that synthetic images generated with FLUX.1 [dev] can effectively supplement real-world data, particularly in cases where data collection is challenging or resource-intensive. Through experimental validation, it was observed that models trained on a combination of real and synthetic images achieved comparable accuracy to those trained exclusively on real data. This highlights the potential of synthetic data to improve model generalization, reduce data dependency, and enhance robustness in practical applications. A key contribution of this research is the development of a hybrid dataset that integrates real and synthetic images, offering a scalable and cost-effective approach to dataset expansion. The implementation of an automated annotation pipeline using Grounding DINO and SAM2 significantly reduced the manual effort required for labeling, streamlining the dataset preparation process. The results further confirm that while synthetic data alone do not outperform real data, their strategic integration into training datasets improves model efficiency and reduces the reliance on large-scale real-world data collection. This finding is particularly valuable for precision agriculture, where the availability of high-quality labeled datasets remains a major challenge.
Despite these advancements, there are still areas for further exploration. Future research should focus on expanding dataset diversity, including synthetic large-scale image and video datasets, to improve model generalization and enhance the realism of synthetic data by incorporating samples from various farm environments. Additionally, fine-tuning the balance between real and synthetic data under varying environmental conditions and object complexities could provide deeper insights into optimal dataset compositions. Beyond poultry farming, the methodologies introduced in this study could be adapted to other domains of smart agriculture, where synthetic data generation and automated annotation improve AI model development. Leveraging High-Performance Computing (HPC) will be essential for efficiently processing large datasets, enabling faster training, real-time validation, and broader applications in dynamic monitoring and precision agriculture. By continuously refining synthetic data generation and integration techniques, future advancements can lead to more scalable, accessible, and cost-effective AI solutions across agriculture and related fields.

Author Contributions

Conceptualization, T.P. and S.K.; Methodology, S.C. and T.P.; Software, S.C., I.J. and D.B.; Supervision, T.P. and S.K.; Validation, S.C., I.J. and D.B.; Writing—original draft, S.C. and T.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported in part by the projects HPC4S3ME and EuroCC2/EuroCC4SEE. HPC4S3ME received funding through the IPA II program, CFCU/MNE/213, call reference EuropeAid/172-351/ID/ACT/ME. The EuroCC2 and EuroCC4SEE projects received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreements No. 101101903 and No. 101191697. The JU receives support from the European Union’s Digital Europe Programme and Germany, Bulgaria, Austria, Croatia, Cyprus, the Czech Republic, Denmark, Estonia, Finland, Greece, Hungary, Ireland, Italy, Lithuania, Latvia, Poland, Portugal, Romania, Slovenia, Spain, Sweden, France, the Netherlands, Belgium, Luxembourg, Slovakia, Norway, Türkiye, the Republic of North Macedonia, Iceland, Montenegro, Serbia, and Bosnia and Herzegovina.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Conflicts of Interest

Author Srdjan Krco was employed by the company DunavNET. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI	Artificial Intelligence
AP	Average Precision
BCE	Binary Cross-Entropy
COCO	Common Objects in Context
CIoU	Complete Intersection over Union
CV	Computer Vision
DIoU	Distance Intersection over Union
DT	Digital Twin
GAN	Generative Adversarial Network
GenAI	Generative Artificial Intelligence
GIoU	Generalized Intersection over Union
HPC	High-Performance Computing
FP	False Positive
FN	False Negative
IoT	Internet of Things
IoU	Intersection over Union
LoRA	Low-Rank Adaptation
mAP	Mean Average Precision
ML	Machine Learning
RAM	Random Access Memory
R-CNN	Regional Convolutional Neural Network
SAM	Segment Anything Model
SSD	Single Shot Detector
TP	True Positive
YOLO	You Only Look Once

References

  1. USDA. Livestock and Poultry: World Markets and Trade, United States Department of Agriculture, Foreign Agriculture Service. 8 April 2022. Available online: https://www.fas.usda.gov/data/livestock-and-poultry-world-markets-and-trade (accessed on 25 February 2025).
  2. Ojo, R.O.; Ajayi, A.O.; Owolabi, H.A.; Oyedele, L.O.; Akanbi, L.A. Internet of Things and Machine Learning Techniques in Poultry Health and Welfare Management: A Systematic Literature Review. Comput. Electron. Agric. 2022, 200, 107266. [Google Scholar] [CrossRef]
  3. FAO. The Future of Food and Agriculture: Alternative Pathways to 2050; Food and Agriculture Organization of the United Nations: Rome, Italy, 2018; p. 228. [Google Scholar]
  4. Cakic, S.; Popovic, T.; Krco, S.; Nedic, D.; Babic, D.; Jovovic, I. Developing Edge AI Computer Vision for Smart Poultry Farms Using Deep Learning and HPC. Sensors 2023, 23, 3002. [Google Scholar] [CrossRef] [PubMed]
  5. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar] [CrossRef]
  6. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Cambridge, MA, USA, 8–13 December 2014; Volume 1, pp. 91–99. [Google Scholar] [CrossRef]
  7. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar] [CrossRef]
  8. Rabbani, A.; Hussain, M. YOLOv1 to YOLOv10: A Comprehensive Review of YOLO Variants and Their Application in the Agricultural. arXiv 2024, arXiv:2406.10139. [Google Scholar] [CrossRef]
  9. Ezzeddini, L.; Ktari, J.; Frikha, T.; Alsharabi, N.; Alayba, A.; Alzahrani, A.; Jadi, A.; Alkholidi, A.; Hamam, H. Analysis of the Performance of Faster R-CNN and YOLOv8 in Detecting Fishing Vessels and Fishes in Real Time. PeerJ Comput. Sci. 2024, 10, e2033. [Google Scholar] [CrossRef]
  10. Sapkota, R.; Ahmed, D.; Karkee, M. Comparing YOLOv8 and Mask R-CNN for Instance Segmentation in Complex Orchard Environments. Artif. Intell. Agric. 2024, 13, 84–99. [Google Scholar] [CrossRef]
  11. Camacho, J.; Morocho-Cayamcela, M.E. Mask R-CNN and YOLOv8 Comparison to Perform Tomato Maturity Recognition. In Communications in Computer and Information Science; Springer: Berlin/Heidelberg, Germany, 2023; pp. 1–12. [Google Scholar] [CrossRef]
  12. Alif, M.; Hussain, M. YOLOv12: A Breakdown of the Key Architectural Features. arXiv 2025, arXiv:2502.14740. [Google Scholar] [CrossRef]
  13. Geetha, A.S.; Hussain, M. A Comparative Analysis of YOLOv5, YOLOv8, and YOLOv10 in Kitchen Safety. arXiv 2024, arXiv:2407.20872. [Google Scholar]
  14. Ultralytics. YOLOv11 Overview. Available online: https://docs.ultralytics.com/models/yolo11/#overview (accessed on 27 February 2025).
  15. Liu, J.; Zhou, Y.; Li, Y.; Li, Y.; Hong, S.; Li, Q.; Liu, X.; Lu, M.; Wang, X. Exploring the Integration of Digital Twin and Generative AI in Agriculture. In Proceedings of the 15th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), Hangzhou, China, 26–27 August 2023; pp. 223–228. [Google Scholar] [CrossRef]
  16. Klair, Y.S.; Agrawal, K.; Kumar, A. Impact of Generative AI in Diagnosing Diseases in Agriculture. In Proceedings of the 2nd International Conference on Disruptive Technologies (ICDT), Greater Noida, India, 15–16 March 2024. [Google Scholar] [CrossRef]
  17. Narvekar, C.; Rao, M. Productivity Improvement with Generative AI Framework for Data Enrichment in Agriculture. Int. J. Recent Innov. Trends Comput. Commun. 2023, 11, 679–684. [Google Scholar] [CrossRef]
  18. Majumder, S.; Khandelwal, Y.; Sornalakshmi, K. Computer Vision and Generative AI for Yield Prediction in Digital Agriculture. In Proceedings of the 2nd International Conference on Networking and Communications (ICNWC), Chennai, India, 2–4 April 2024; pp. 1–6. [Google Scholar] [CrossRef]
  19. Yang, Z. The Ultimate FLUX.1 Hands-On Guide. Available online: https://medium.com/@researchgraph/the-ultimate-flux-1-hands-on-guide-067fc053fedd (accessed on 25 February 2025).
  20. Lei, J.; Zhang, R.; Hu, X.; Lin, W.; Li, Z.; Sun, W.; Du, R.; Zhuo, L.; Li, Z.; Li, X.; et al. IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models. arXiv 2025, arXiv:2501.13920. [Google Scholar] [CrossRef]
  21. Oñoro-Rubio, D.; López-Sastre, R.J. Towards Perspective-Free Object Counting with Deep Learning. In Computer Vision—ECCV 2016; Springer: Cham, Switzerland, 2016; Volume 9911. [Google Scholar] [CrossRef]
  22. Laradji, I.H.; Rostamzadeh, N.; Pinheiro, P.O.; Vazquez, D.; Schmidt, M. Where Are the Blobs: Counting by Localization with Point Supervision. In Computer Vision—ECCV 2018; Springer: Cham, Switzerland, 2018; Volume 11206. [Google Scholar] [CrossRef]
  23. Tian, M.; Guo, H.; Chen, H.; Wang, Q.; Long, Q.; Ma, Y. Automated pig counting using deep learning. Comput. Electron. Agric. 2019, 163, 104840. [Google Scholar] [CrossRef]
  24. Xu, B.; Wang, W.; Falzon, G.; Kwan, P.; Guo, L.; Chen, G.; Tait, A.; Schneider, D. Automated cattle counting using mask R-CNN in Quadcopter Vision System. Comput. Electron. Agric. 2020, 171, 105300. [Google Scholar] [CrossRef]
  25. Cao, L.; Xiao, Z.; Liao, X.; Yao, Y.; Wu, K.; Mu, J.; Li, J.; Pu, H. Automated Chicken Counting in Surveillance Camera Environments Based on the Point Supervision Algorithm: LC-DenseFCN. Agriculture 2021, 11, 493. [Google Scholar] [CrossRef]
  26. Cowton, J.; Kyriazakis, I.; Bacardit, J. Automated Individual Pig Localisation, Tracking and Behaviour Metric Extraction Using Deep Learning. IEEE Access 2019, 7, 108049–108060. [Google Scholar] [CrossRef]
  27. Lin, C.-Y.; Hsieh, K.-W.; Tsai, Y.-C.; Kuo, Y.-F. Automatic Monitoring of Chicken Movement and Drinking Time Using Convolutional Neural Networks. Trans. ASABE 2020, 63, 2029–2038. [Google Scholar] [CrossRef]
  28. Neethirajan, S. ChickTrack—A quantitative tracking tool for measuring chicken activity. Measurement 2022, 191, 110819. [Google Scholar] [CrossRef]
  29. Liu, H.-W.; Chen, C.-H.; Tsai, Y.-C.; Hsieh, K.-W.; Lin, H.-T. Identifying Images of Dead Chickens with a Chicken Removal System Integrated with a Deep Learning Algorithm. Sensors 2021, 21, 3579. [Google Scholar] [CrossRef]
  30. Elmessery, W.M.; Gutiérrez, J.; Abd El-Wahhab, G.G.; Elkhaiat, I.A.; El-Soaly, I.S.; Alhag, S.K.; Al-Shuraym, L.A.; Akela, M.A.; Moghanm, F.S.; Abdelshafie, M.F. YOLO-Based Model for Automatic Detection of Broiler Pathological Phenomena through Visual and Thermal Images in Intensive Poultry Houses. Agriculture 2023, 13, 1527. [Google Scholar] [CrossRef]
  31. Li, G.; Hui, X.; Lin, F.; Zhao, Y. Developing and Evaluating Poultry Preening Behavior Detectors via Mask R-CNN. Animals 2020, 10, 1762. [Google Scholar] [CrossRef]
  32. Yang, X.; Bist, R.; Paneru, B.; Chai, L. Monitoring Activity Index and Behaviors of Cage-Free Hens with Advanced Deep Learning Technologies. Poult. Sci. 2024, 103, 104193. [Google Scholar] [CrossRef]
  33. Ma, X.; Lu, X.; Huang, Y.; Yang, X.; Xu, Z.; Mo, G.; Ren, Y.; Li, L. An Advanced Chicken Face Detection Network Based on GAN and MAE. Animals 2022, 12, 3055. [Google Scholar] [CrossRef]
  34. Ye, C.; Yousaf, K.; Qi, C.; Liu, C.; Chen, K. Broiler Stunned State Detection Based on an Improved Fast R-CNN Algorithm. Poult. Sci. 2020, 99, 637–646. [Google Scholar] [CrossRef] [PubMed]
  35. Muhammad, A.; Salman, Z.; Lee, K.; Han, D. Harnessing the Power of Diffusion Models for Plant Disease Image Augmentation. Front. Plant Sci. 2023, 14, 1280496. [Google Scholar] [CrossRef] [PubMed]
  36. Lu, Y.; Chen, D.; Olaniyi, E.; Huang, Y. Generative Adversarial Networks (GANs) for Image Augmentation in Agriculture: A Systematic Review. Comput. Electron. Agric. 2022, 200, 107208. [Google Scholar] [CrossRef]
  37. Goyal, M.; Mahmoud, Q.H. A Systematic Review of Synthetic Data Generation Techniques Using Generative AI. Electronics 2024, 13, 3509. [Google Scholar] [CrossRef]
  38. Klein, J.; Waller, R.; Pirk, S.; Pałubicki, W.; Tester, M.; Michels, D.L. Synthetic Data at Scale: A Development Model to Efficiently Leverage Machine Learning in Agriculture. Front. Plant Sci. 2024, 15, 1360113. [Google Scholar] [CrossRef]
  39. Nikolenko, S.I. Synthetic Data for Deep Learning in Computer Vision: A Survey. arXiv 2019, arXiv:1909.11512. [Google Scholar]
  40. Seth, P.; Bhandari, A.; Lakara, K. Analyzing Effects of Fake Training Data on the Performance of Deep Learning Systems. arXiv 2023, arXiv:2303.01268. [Google Scholar]
  41. HuggingFace. FLUX.1-dev. Available online: https://huggingface.co/black-forest-labs/FLUX.1-dev (accessed on 25 February 2025).
  42. agroNET. Digital Farming Management. Available online: https://digitalfarming.eu (accessed on 26 February 2025).
  43. Liu, S.; Zeng, Z.; Ren, T.; Li, F.; Zhang, H.; Jiang, Q.; Yang, J.; Li, C.; Yang, J.; Su, H.; et al. Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection. arXiv 2023, arXiv:2303.05499. [Google Scholar]
  44. Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLO. Available online: https://github.com/ultralytics/ultralytics (accessed on 25 February 2025).
  45. Roboflow (Version 1.0) [Software]. Available online: https://roboflow.com (accessed on 25 February 2025).
  46. Replicate, Inc. Replicate Platform. Available online: https://replicate.com (accessed on 25 February 2025).
  47. Belekar, A. Fine-Tuning Flux.1 with Your Own Images: Top 3 Methods. Available online: https://blog.segmind.com/fine-tuning-flux-1-with-your-own-images-top-3-methods/ (accessed on 12 November 2024).
  48. Mullins, C.C.; Esau, T.J.; Zaman, Q.U.; Toombs, C.L.; Hennessy, P.J. Leveraging Zero-Shot Detection Mechanisms to Accelerate Image Annotation for Machine Learning in Wild Blueberry (Vaccinium angustifolium Ait.). Agronomy 2024, 14, 2830. [Google Scholar] [CrossRef]
  49. Ravi, N.; Gabeur, V.; Hu, Y.-T.; Hu, R.; Ryali, C.; Ma, T.; Khedr, H.; Rädle, R.; Rolland, C.; Gustafson, L.; et al. SAM 2: Segment Anything in Images and Videos. arXiv 2024, arXiv:2408.00714. [Google Scholar] [CrossRef]
  50. Roboflow Team. Segment Anything: Roboflow Image Segmentation. Available online: https://ai.meta.com/blog/segment-anything-roboflow-image-segmentation/ (accessed on 12 November 2024).
  51. FLUX, GitHub. Available online: https://github.com/black-forest-labs/flux (accessed on 15 March 2025).
  52. Yang, C.; Liu, C.; Deng, X.; Kim, D.; Mei, X.; Shen, X.; Chen, L.-C. 1.58-bit FLUX: Efficient Quantization of Text-to-Image Models. arXiv 2024, arXiv:2412.18653. [Google Scholar]
  53. Flux.1-Architecture-Diagram, GitHub. Available online: https://github.com/brayevalerien/Flux.1-Architecture-Diagram/tree/main (accessed on 27 February 2025).
  54. Replicate, Inc. Replicate Pricing. Available online: https://replicate.com/pricing (accessed on 27 February 2025).
  55. ShakkerLabs. FLUX.1 dev LoRA Antiblur. Available online: https://huggingface.co/Shakker-Labs/FLUX.1-dev-LoRA-AntiBlur (accessed on 12 February 2025).
  56. ShakkerLabs. FLUX.1 dev LoRA Add Details. Available online: https://huggingface.co/Shakker-Labs/FLUX.1-dev-LoRA-add-details (accessed on 12 February 2025).
  57. Wang, C.-Y.; Yeh, I.-H.; Liao, H.-Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv 2024, arXiv:2402.13616. [Google Scholar]
  58. Tishby, N.; Pereira, F.C.; Bialek, W. The Information Bottleneck Method. arXiv 2000, arXiv:physics/0004057. [Google Scholar]
  59. Yaseen, M. What Is YOLOv9: An In-Depth Exploration of the Internal Features of the Next-Generation Object Detector. arXiv 2024, arXiv:2409.07813. [Google Scholar] [CrossRef]
  60. Zheng, Z.; Wang, P.; Ren, D.; Liu, W.; Ye, R.; Hu, Q.; Zuo, W. Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation. IEEE Trans. Cybern. 2021, 52, 8574–8586. [Google Scholar] [CrossRef]
  61. Terven, J.R.; Cordova-Esparza, D.M.; Ramirez-Pedraza, A.; Chavez-Urbiola, E.A.; Romero-Gonzalez, J.A. Loss Functions and Metrics in Deep Learning. arXiv 2024, arXiv:2307.02694. [Google Scholar]
  62. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
  63. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection over Union: A Metric and a Loss for Bounding Box Regression. In Proceedings of the IEEE CVPR, Long Beach, CA, USA, 15–20 June 2019; pp. 658–666. [Google Scholar]
  64. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–8 February 2020. [Google Scholar]
  65. He, J.; Erfani, S.M.; Ma, X.; Bailey, J.; Chi, Y.; Hua, X.-S. Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression. In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS), Online, 6–14 December 2021. [Google Scholar]
  66. Zhang, Y.-F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and Efficient IOU Loss for Accurate Bounding Box Regression. Neurocomputing 2022, 506, 146–157. [Google Scholar] [CrossRef]
  67. Huang, H.-W.; Yang, C.-Y.; Sun, J.; Kim, P.-K.; Kim, K.-J.; Lee, K.; Huang, C.-I.; Hwang, J.-N. Iterative Scale-Up ExpansionIoU and Deep Features Association for Multi-Object Tracking in Sports. arXiv 2023, arXiv:2306.13074. [Google Scholar]
  68. Gevorgyan, Z. SIoU Loss: More Powerful Learning for Bounding Box Regression. arXiv 2022, arXiv:2205.12740. [Google Scholar]
Figure 1. YOLO performance comparison for object detection on COCO dataset [14].
Figure 2. A high-level system architecture of the IoT-based platform for smart poultry farms.
Figure 3. End-to-end workflow for synthetic image generation using AI.
Figure 4. Roboflow dataset auto-labeling: (a) object detection; (b) instance segmentation.
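To reproduce the auto-labeling step programmatically rather than through the Roboflow interface, the sketch below uses the open-source autodistill wrappers, which combine Grounding DINO [43] box proposals with SAM mask refinement. The paper's pipeline relied on Grounding DINO and SAM2 inside Roboflow, so the package choice, the prompt ontology, and the folder paths here are illustrative assumptions rather than the exact configuration used in this work.

```python
from autodistill.detection import CaptionOntology
from autodistill_grounded_sam import GroundedSAM

# Zero-shot labeler: Grounding DINO detects regions matching the text prompt
# and SAM refines them into instance masks (assumed packages: autodistill,
# autodistill-grounded-sam). The single-class ontology maps the prompt
# "chicken" to the dataset label "chicken".
base_model = GroundedSAM(ontology=CaptionOntology({"chicken": "chicken"}))

# Label every .jpg in ./images and write a YOLO-format dataset to ./dataset
# (hypothetical folder names).
base_model.label(input_folder="./images", extension=".jpg", output_folder="./dataset")
```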
Figure 5. Synthetic data creation in Replicate with FLUX.1 [dev].
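As a rough guide to the generation step shown in Figure 5, the sketch below issues a single text-to-image call to FLUX.1 [dev] through the Replicate Python client [46]. The prompt text and input parameters are illustrative assumptions; the actual prompts and LoRA settings used to produce the 400 synthetic images are described in the paper.

```python
import replicate  # requires the REPLICATE_API_TOKEN environment variable

# Minimal sketch: one FLUX.1 [dev] generation on Replicate. The prompt below
# is an assumed example, not the exact prompt used in the study.
output = replicate.run(
    "black-forest-labs/flux-dev",
    input={
        "prompt": "top-down view of white broiler chickens on a poultry farm "
                  "floor, realistic barn lighting, high level of detail",
        "num_outputs": 1,
    },
)

# Depending on the client version, items are URLs or file-like objects.
for i, item in enumerate(output):
    print(f"generated image {i}: {item}")
```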
Table 1. The impact of synthetic data on the accuracy of the YOLOv9-e model for object detection, tested on 50 real images. Number of training epochs: 100; batch size: 16. The Real, Synthetic, and Augmented columns give the number of training images of each type in the dataset split, and Total is the overall training set size.

Experiment ID | Real | Synthetic | Augmented | Total | mAP
1  | 0   | 0   | 0   | 0    | 0.245
2  | 100 | 0   | 0   | 100  | 0.796
3  | 400 | 0   | 0   | 400  | 0.822
4  | 0   | 100 | 0   | 100  | 0.793
5  | 0   | 400 | 0   | 400  | 0.789
6  | 50  | 50  | 0   | 100  | 0.801
7  | 200 | 200 | 0   | 400  | 0.821
8  | 300 | 100 | 0   | 400  | 0.829
9  | 100 | 300 | 0   | 400  | 0.820
10 | 400 | 0   | 800 | 1200 | 0.820
11 | 0   | 400 | 800 | 1200 | 0.792
12 | 200 | 200 | 800 | 1200 | 0.814
13 | 300 | 100 | 800 | 1200 | 0.812
14 | 100 | 300 | 800 | 1200 | 0.808
15 | 400 | 0   | 800 | 1200 | 0.827
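To make the training configuration behind Table 1 concrete, the sketch below fine-tunes a YOLOv9-e model with the Ultralytics API [44] for 100 epochs at batch size 16 and then evaluates it on the held-out test split. The dataset YAML name and image size are illustrative assumptions; each row of Table 1 corresponds to a different train split referenced by such a file.

```python
from ultralytics import YOLO

# Minimal sketch of one Table 1 experiment: fine-tune YOLOv9-e on a given
# real/synthetic/augmented split. "chickens.yaml" is a hypothetical dataset
# config pointing at the train split and the 50 real test images.
model = YOLO("yolov9e.pt")
model.train(data="chickens.yaml", epochs=100, batch=16, imgsz=640)

# Evaluate on the test split defined in the YAML and report mAP
metrics = model.val(split="test")
print(f"mAP@0.5      = {metrics.box.map50:.3f}")
print(f"mAP@0.5:0.95 = {metrics.box.map:.3f}")
```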
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
