Article

Real-Time Farm Surveillance Using IoT and YOLOv8 for Animal Intrusion Detection

1 Department of Smart Robot Convergence and Application Engineering, Pukyong National University, Busan 48513, Republic of Korea
2 Symbiosis Institute of Technology, Pune Campus, Symbiosis International (Deemed University), Pune 412115, India
3 Department of Spatial Information Engineering, Pukyong National University, Busan 48513, Republic of Korea
4 Department of Information and Communication Engineering, Pukyong National University, Busan 48513, Republic of Korea
5 Department of Artificial Intelligence and Big Data, Woosong University, Daejeon 34606, Republic of Korea
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Future Internet 2025, 17(2), 70; https://doi.org/10.3390/fi17020070
Submission received: 28 December 2024 / Revised: 23 January 2025 / Accepted: 3 February 2025 / Published: 6 February 2025

Abstract

This research proposes a technique for protecting agricultural fields against animal intrusion, addressing a key challenge in the agriculture industry. The suggested system guarantees real-time intrusion detection and quick reactions by combining advanced sensor technologies, image processing capabilities, and the Internet of Things (IoT), effectively safeguarding crops and reducing agricultural losses. This study involves a thorough examination of five models (Inception, Xception, VGG16, AlexNet, and YOLOv8) against three different datasets. The YOLOv8 model emerged as the most promising, with exceptional accuracy and precision, exceeding 99% on both metrics. The YOLOv8 model's performance was then compared with previous study findings, confirming its strong intrusion detection capabilities in agricultural settings. Using the capabilities of the YOLOv8 model, an IoT device was designed to provide real-time intrusion alarms on farms. The ESP32-CAM module was used to build this device, which integrates the model to enable efficient farm security measures. The incorporation of this technology has the potential to transform farm monitoring by providing farmers with timely, actionable knowledge to prevent possible threats and protect agricultural production.

1. Introduction

Agriculture, sometimes known as the “lifeblood of sustenance”, is critical to replenishing the food supply. However, merely producing these resources is insufficient, because they can be destroyed before they can be used, which wastes not only time and money but also valuable raw materials. As a result, protecting agricultural assets and farmlands is critical, especially given the rising issue of animal encroachment. This threat, posed by wild boars, macaques, porcupines, deer, monkeys, and bears, has not only proven to be extremely harmful but has even resulted in human deaths on occasion. Local farmers bear the brunt of this situation, including losses of up to 50% of their crops, exacerbated by their unwillingness to resort to severe measures due to strict wildlife rules. Figure 1 depicts animals invading agricultural regions. The most common expression of this conflict is the human–elephant conflict, which accounts for a significant share of the damage seen in many locations [1].
Agriculture is the backbone of both meeting the population’s food demands and supplying key raw materials to industry. Animal interference, on the other hand, results in massive crop losses in agricultural sectors. Crop devastation by marauding wild animals has recently become a major source of worry. Notably, the depredations of these animals have caused significant damage, occasionally resulting in human mortality. The negative impact on agricultural output is especially evident for staple crops grown in rural regions, such as potatoes and wheat, with small-scale farmers bearing the brunt of these losses, which can amount to 40% to 50% of their overall yield. Unfortunately, due to strict wildlife preservation rules, these farmers are prevented from taking more forceful action. The increasing cases of human–elephant conflict, notably in India, highlight the critical need for a technology that can quickly repel these trespassing animals upon their discovery [2].
This paper presents the Smart Farm Surveillance System (SFSS), an approach developed to protect agricultural lands from wildlife intrusion through real-time monitoring and detection using IoT and AI technologies. The fundamental goal of the SFSS is to provide proactive, real-time monitoring of agricultural and other critical areas through the use of state-of-the-art object detection algorithms. By combining cloud technology with AI-powered camera systems, this approach enables the rapid identification and mitigation of possible threats. The expanding agricultural industry, along with the need for strong security measures, is driving the growing demand for modern surveillance systems. The adoption of advanced technologies such as the SFSS is essential, given the shortcomings of traditional monitoring systems in recognizing and responding to real-time threats. This system addresses the difficulty of timely threat detection on farms by ensuring the early detection of intruders, animals, or unauthorized activity. Furthermore, through its integration with cloud-based services, its smooth data processing provides farmers with timely notifications and improves their overall crop management.
The SFSS is based on state-of-the-art object identification algorithms that use machine learning and deep learning techniques to recognize a wide range of objects, from human intrusions to specific animal species. To avert possible risks, it provides proactive and real-time surveillance, prompting alarms for suspicious activities such as wildlife encroachment or intruders. The integration of cloud technology enables remote access to surveillance feeds as well as safe data storage for analysis. It is adaptable, accommodating various farm sizes, and encourages humane security measures while delivering data-driven insights for crop management. Despite the initial outlay, the system’s cost-effectiveness and compliance with wildlife rules make it relevant worldwide, as it addresses the issue of wildlife interference in farming while maintaining sustainability and profitability. The SFSS has several agricultural uses and benefits. It is an effective wildlife deterrent, since it detects and alerts farmers to the presence of animals such as wild boars, macaques, porcupines, deer, monkeys, and bears, allowing farmers to take prompt action to prevent crop damage. Furthermore, the system excels at intruder detection, protecting against theft and unlawful access. It aids agricultural monitoring by providing data on growth, health, and environmental factors, which supports irrigation and pest management decisions. Environmental data collection encourages precision agriculture, which optimizes resource utilization. Some configurations can also monitor livestock, supporting animal care and security.
The advantages are numerous: it ensures early threat identification, which reduces losses and recovery efforts. Increased protection safeguards crops and property, while data-driven decision-making improves agricultural management and yields. Remote monitoring via cloud integration provides convenience and peace of mind. By decreasing harmful activities, it promotes sustainability and assists in compliance with animal conservation legislation, which has legal and ethical ramifications. The potential impact is significant, with the possibility for increased agricultural production, resource efficiency, and global food security [2]. It contributes to the preservation of biodiversity by aligning with environmental conservation aims. Furthermore, through promoting economic growth, it benefits rural communities and farmers of all sizes. The SFSS is a game-changing technology that promotes efficiency, sustainability, and security in agriculture. The SFSS is a complete system that includes many components and operational features and also possesses significant future potential. AI-powered cameras with powerful object identification algorithms capable of spotting dangers such as animals and intruders are among its components. Cloud infrastructure enables remote access to surveillance feeds by ensuring effective data storage and processing. Environmental sensors collect data on agricultural conditions, while an alerting system alerts farmers of any hazards that are recognized [3].
To detect relevant things effectively, the system requires an initial training of its AI algorithms. It provides continuous monitoring 24 h a day, seven days a week, with remote access via cellphones or laptops, allowing farmers to respond to alarms even from afar. Its data analysis yields useful insights, and its scalability enables customization for farms of various sizes. The system’s future potential is enormous. Agriculture might benefit from automation by integrating with IoT devices. AI advancements will improve detection precision, perhaps identifying individual animal species. Predictive analytics might anticipate dangers, allowing for proactive measures. This method, which has the potential for worldwide deployment, could handle wildlife encroachment issues on a large scale. Its use in environmental monitoring has the potential to conserve natural reserves and safeguard endangered species.

1.1. Literature Review

In recent years, multiple studies have presented IoT-based farm monitoring systems, with an emphasis on technologies such as PIR sensors, ultrasonic detection, and deep learning models for animal identification. However, few systems have attained the precision and real-time detection required for large-scale agricultural monitoring. Table 1 summarizes the research contributions made in this domain.
Recent improvements in agricultural security frameworks have utilized numerous technologies to improve the safeguarding of farming activities. One study [4] presents an advanced system that employs passive infrared (PIR) and ultrasonic sensors in conjunction with a Raspberry Pi single-board computer. This system efficiently identifies and classifies intruders based on their physical attributes, providing prompt notifications via email, GSM modules, and audible alarms. Furthermore, it integrates a camera to photograph possible intruders, thus furnishing farmers with immediate information about hazards. This method effectively detects a wide array of intrusions, including both fauna and people.
Another study presents a unique method that utilizes computer vision and artificial intelligence (AI) technology for animal identification and warning systems [5]. This system utilizes strategically placed cameras to oversee weak entry points and processes the visual data using OpenCV and a pretrained MobileNet SSD model. Upon recognizing an animal, the system initiates an audio alarm and disseminates warnings to the agricultural community, while simultaneously preserving a thorough register of all recognized species. This AI-driven monitoring not only boosts crop security but also assists in decreasing possible losses.
Paper [6] delineates a comparable agricultural security framework that incorporates PIR sensors, ultrasonic sensors, and a Raspberry Pi single-board computer to enhance farm security. This system includes four PIR sensors positioned at varied heights to detect motion and categorize intruders. Upon detection of an incursion, a camera is activated, and notifications are dispatched by email and through GSM modules. An audible alert serves to dissuade the intruder, creating a cost-effective solution to agricultural security concerns and delivering real-time warnings with minimal human intervention.
Moreover, another study specifies a novel apparatus for animal identification and agricultural protection that involves an array of sophisticated sensors, microcontrollers, and IoT technologies [7]. This system employs PIR sensors to monitor animal movements, while auxiliary sensors, including Light-Dependent Resistors (LDRs) and flame sensors, provide functionalities such as automated lighting management and forest fire detection. When animal activity is identified, the system activates acoustic signals, illuminates the area, and sends messages to both the farming community and forest authorities via an embedded Wi-Fi module. Night vision cameras boost its monitoring capabilities after dark, offering an effective deterrent that preserves crops without injuring wildlife.
Despite ongoing advances in agricultural production globally, the potential for automation in important areas such as temperature control remains underexplored. An eco-friendly, cost-effective solution leveraging the IoT aims to improve accessibility in Indian farming by providing real-time data on water management, temperature, electricity, and lighting.
Recent studies of the performance of drones in rice fields indicate the efficiency of agricultural spraying methods [22]. As the world population continues to expand, the integration of smart agriculture powered by IoT and LoRaWAN technologies is regarded as crucial for tackling agricultural concerns. IoT sensors are crucial in monitoring resource consumption and crop health, with LoRaWAN providing environmental evaluations, irrigation control, and animal tracking. In study [23], IoT sensors were used to gather information on resource utilization and crop conditions to enhance farming. This approach relies significantly on LoRaWAN, which offers environmental monitoring, irrigation control, and animal tracking. Moreover, a technique applying X-ray imaging paired with deep learning attained an accuracy rate of 86.01% in differentiating between viable and non-viable seeds [24]. Improvements in animal identification were investigated through the application of Weighted Co-occurrence Histograms of Oriented Gradients (W-CoHOG) for spotting animal incursions in agricultural settings [8]. This method employs thorough analyses of farm-captured imagery, boosting its detection skills by incorporating gradient direction and magnitude. The adoption of a sliding window approach improves detection across varied animal sizes and camera focal lengths.
Another important paper describes a system for identifying animal invasions in rice fields and other agricultural settings via sophisticated image processing techniques [9]. This complex procedure comprises gathering pictures via security cameras, followed by background removal, object segmentation, and object recognition, and then employing the Scale-Invariant Feature Transform (SIFT). Once an animal is discovered, an alarm message is used to inform farmers.
In a separate context, the DeepAnomaly technique [10], a hybrid deep learning–anomaly detection strategy, was used to discover abnormalities in agricultural landscapes. This approach capitalizes on the intrinsic uniformity of these landscapes, delivering high accuracy in anomaly identification. The authors [8] further employed support vector machines for object identification and landscape categorization, attaining a significant classification accuracy of 91.6%. The practical usefulness of this approach was demonstrated on a local, operating farm [11].
A thorough assessment of 135 significant articles from 2017 to 2022 offered insight into IoT applications within smart agriculture [12]. That detailed review underlines the crucial role of communication technologies in integrating IoT into agricultural operations, examining both the benefits and problems connected to IoT installations. The paper also suggests future research initiatives targeted at boosting agricultural sustainability, quality, and production through IoT improvements.
In addition, a full evaluation of the security procedures within smart agriculture has been undertaken, concentrating on the authentication and access control mechanisms deployed by IoT applications [15]. That research presents a security architecture optimized for agri-tech and creates a threat model that defines probable hostile behaviors in agricultural environments. Another study investigates the role of information and communication technologies (ICTs) in agriculture, analyzing new dangers, weaknesses, and developments in ICT solutions [16].
A systematic analysis using Cochrane reporting rules evaluated 588 research publications from 2015 to 2021, identifying major technologies and solutions within smart farming [17]. A full discussion on the usage of wireless sensors and IoT devices in agriculture was also offered, stressing their applications in soil management, crop monitoring, irrigation, and pest identification, along with the role of unmanned aerial vehicles (UAVs) in maximizing agricultural yields [18].
In light of the obstacles faced by traditional agricultural techniques in India, such as antiquated equipment and scarce skilled personnel, the emergence of drone technology provides potential alternatives for precision agriculture [19]. The cited article reviews current developments in drone capabilities for crop monitoring and pesticide application. The notion of smart agriculture, or “Agriculture 4.0”, is studied in the context of IoT and big data, while it is noted that these developments come with security concerns that require attention. These are crucial, as India ranks as the second-largest producer of food globally.

1.2. Contributions of This Work

The Smart Farm Surveillance System (SFSS) is unique in that it employs advanced algorithms, cloud integration, real-time danger detection, efficient data processing, adaptation to varied sectors, proactive speaker activation, and drones. These qualities together make it a game-changing solution that meets current surveillance and security requirements.
  • Advanced Object Identification Algorithms: YOLOv8, a state-of-the-art object detection algorithm, is utilized to enhance the accuracy and speed of identifying both animals and humans in real time. Its advanced identification mechanism is attributed to its ability to process high-resolution images with minimal latency, even in challenging conditions such as low light or occlusion. The model’s multi-scale detection capability ensures reliable performance across varying sizes and distances of detected objects.
  • Cloud-Based Real-Time Monitoring: The cloud-based architecture of this system supports real-time monitoring by processing data from multiple IoT sensors (cameras) simultaneously. The system’s advanced scalability ensures that, as additional sensors are integrated, the cloud infrastructure automatically adjusts to accommodate increased data loads without compromising performance. Real-time alerts are generated and delivered through cloud-based AI models that assess the likelihood of intrusion, continuously learning from new data inputs.
  • Efficient Data Processing: The system’s connectivity with the cloud allows for seamless data processing, allowing farmers to receive timely notifications and enhance crop management. This feature guarantees that any problems are handled as soon as possible.
  • Adaptability Beyond Farms: The system’s adaptability extends its utility to a wide range of industries, including railway track monitoring, border security, wildlife protection, forest fire detection, disaster response, industrial safety, environmental monitoring, pest control in agriculture, and oil and gas pipeline inspection. This versatility will transform approaches to monitoring in a variety of businesses.
  • Speaker Activation for Intruder Deterrence: Once an incursion is detected, the system activates a speaker system to dissuade possible threats. This one-of-a-kind feature offers an extra degree of security.

2. Methodology and Implementation

The proposed methodology introduces a novel strategy for a Smart Farm Surveillance System (SFSS) that leverages IoT and machine learning technologies to detect wildlife and human intrusions. This strategy offers enhanced accuracy and responsiveness compared to traditional monitoring methods, such as wooden scarecrows used to repel animals. For enhanced object recognition and quick user warnings, the suggested system takes advantage of cloud computing. Figure 2 presents the methodology of the proposed system.
To begin, live footage from the agricultural fields is effortlessly sent to the cloud servers using microcontrollers. These cloud servers perform two functions: they save real-time agricultural field imagery and they host machine learning algorithms that recognize animals and possible incursions. The implementation of the machine learning algorithms on cloud servers allows for the execution of intrusion detection in real time.
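As an illustration of this cloud-side flow, the minimal sketch below shows how incoming frames from a field microcontroller could be decoded, passed through a detector (YOLOv8, as described in Section 2.1.7), and used to trigger the alarm and notification steps. The HTTP endpoint, the pretrained yolov8n.pt weights, the alert class list, and the helper functions are illustrative assumptions rather than the exact implementation used in this study.

```python
import cv2
import numpy as np
from flask import Flask, request
from ultralytics import YOLO

app = Flask(__name__)
model = YOLO("yolov8n.pt")  # pretrained weights as a stand-in for the trained SFSS model

# Illustrative label set; the trained SFSS model would use its own classes.
ALERT_CLASSES = {"person", "cow", "elephant", "bear", "dog"}


def trigger_alarm():
    print("alarm: on-field speaker activation requested")  # placeholder action


def notify_owner(labels):
    print(f"sms: intrusion detected ({', '.join(sorted(labels))})")  # placeholder action


@app.route("/frame", methods=["POST"])
def receive_frame():
    # Decode the JPEG frame streamed by the field microcontroller
    frame = cv2.imdecode(np.frombuffer(request.data, np.uint8), cv2.IMREAD_COLOR)
    if frame is None:
        return {"error": "could not decode frame"}, 400
    result = model(frame, verbose=False)[0]
    detected = {model.names[int(c)] for c in result.boxes.cls}
    if detected & ALERT_CLASSES:
        trigger_alarm()
        notify_owner(detected)
    return {"detections": sorted(detected)}


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```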
When an incursion occurs, whether it is caused by animals or people entering the field, the system initiates an alarm mechanism meant to repel these invaders and inform surrounding persons. This warning method provides a quick reaction to any illegal presence, both frightening away possible dangers and informing individuals in the affected area.
In the case of animal or human invasions, the cloud server sends an SMS alert to the field owner as an added layer of protection and for user convenience. This timely notice provides the customer with a thorough understanding of any security breaches in their agricultural assets.
To examine the overall performance of the system, a thorough evaluation is performed to determine the feasibility and efficacy of the suggested strategy. A detailed study of the AI models’ capacity to identify intrusions is required for performance measurement. End-to-end testing of the system includes testing its video transmission, intrusion detection, and the effectiveness of its alarm systems. This comprehensive assessment guarantees that the system runs smoothly and offers the appropriate degree of security and intrusion detection.

2.1. Intrusion Detection

In agricultural contexts, the phrase “intruders” usually refers to animals and humans [24]. The possible consequences of these incursions in this environment are many. Animals, whether wild or domesticated, represent a severe hazard to crops by devouring or harming them. Human intruders, on the other hand, may have far from benevolent intentions, frequently motivated by the illegal act of crop stealing. Figure 3 depicts the architecture of the Smart Farm Surveillance System, showcasing its incorporation of IoT components, cloud computing, and real-time monitoring functionalities intended to protect agricultural fields from invasions, including by cattle. The device comprises two ESP32 microcontrollers, each fitted with a camera and a speaker, which continuously monitor the area and photograph any intruders. Live data are transferred to the cloud, where machine learning algorithms, such as YOLOv8, evaluate them in real time to detect possible threats. Upon the identification of an intruder, the cloud system interfaces with the ESP32 microcontrollers, initiating alarms and activating the speakers to repel the intruders. In addition, the farmers receive alerts via a cloud-based messaging system. The arrows in the figure represent the flow of data between the sensors and the cloud, emphasizing the system’s ability to react promptly and improve farm security.
The primary goal of this system is to provide an efficient and resilient tool for detecting intrusions on agricultural property. When such intrusions are detected, the essential premise is to trigger a rapid and responsive warning and alarm system. This method is critical in guaranteeing the quick interception of illegal entities, whether animals or people, preventing possible harm and theft [25].
The identification of intrusions into agricultural systems is critical. It is now feasible to reduce the danger of crop loss due to animal depredation and human malice by implementing sophisticated intrusion detection systems. The capacity to identify incursions in real time allows the agricultural community to preserve important crops and, as a result, the livelihoods of individuals who rely on these lands for food and economic well-being. Furthermore, intrusion detection has ecological importance because it reduces the harmful effect of unregulated animal infiltration on local ecosystems. As a result, the significance of such systems extends well beyond the agricultural field, making them a critical component of sustainable and secure farming operations.

2.1.1. Data Collection

The data gathering procedure within the proposed system is critical because it serves as the foundation for the construction and assessment of the deep learning models. A diverse approach to data collection was employed in order to ensure the system’s reliability.
The value of leveraging varied datasets cannot be overemphasized, especially in the context of applications with a broad scope and complexity. It is critical to evaluate the deep learning models’ performance over a range of situations in order to appropriately estimate their capabilities. As a result, three separate datasets were painstakingly built.
The first dataset, which contains 580 photos, is critical to the model training process. This dataset combines images from two Kaggle datasets, one with animals and the other with people. This consolidation results in a full dataset suitable for training, testing, and assessment. This fusion employs 30 separate sets of animals, with 60 photos of each. A total of 60 photographs of people and 60 images of clear fields were also included to give a well-rounded training environment. Figure 4 presents sample images from the primary dataset.
To make deep learning model training easier, the first dataset was carefully divided into three unique categories: animals, people, and no_intrusion (indicating clean fields). This three-tiered separation enabled the supervised learning of the models, with 70% of the dataset designated for training and the remaining 30% for testing. A further segmentation of the training subset into a 3:1 ratio guarantees that 75% is used for training and 25% for validation, boosting the resilience of the models’ training.
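A minimal sketch of this two-stage split (70/30 train/test, followed by a 3:1 train/validation split of the training portion) is shown below; the directory layout, file extension, and use of scikit-learn are illustrative assumptions rather than the exact tooling used in this study.

```python
from pathlib import Path
from sklearn.model_selection import train_test_split

classes = ["animals", "people", "no_intrusion"]
paths, labels = [], []
for label in classes:
    for img in Path("dataset_1").joinpath(label).glob("*.jpg"):  # assumed layout: dataset_1/<class>/*.jpg
        paths.append(str(img))
        labels.append(label)

# 70% train+validation, 30% test (stratified so each class keeps its proportion)
train_val, test, y_train_val, y_test = train_test_split(
    paths, labels, test_size=0.30, stratify=labels, random_state=42)

# Of the remaining 70%, keep 75% for training and 25% for validation (3:1 ratio)
train, val, y_train, y_val = train_test_split(
    train_val, y_train_val, test_size=0.25, stratify=y_train_val, random_state=42)

print(len(train), len(val), len(test))
```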
The second dataset, which includes 62 photos, maintains the three-class paradigm while synthesizing a new dataset from several Kaggle sources. By combining multiple sources, this dataset adds another layer to the training and assessment process, widening the models’ exposure to different data. Figure 5 presents sample images from the secondary dataset. In addition, Figure 6 shows sample images from the real farm dataset.
The third dataset features 100 precisely taken real-world agricultural photos sourced directly from local farm locations. These photographs were captured to intentionally recreate authentic agricultural conditions, providing a key validation ground for testing the proposed Smart Farm Surveillance System (SFSS). By selecting photos that perfectly match the actual operational contexts in which the system will be deployed, we ensured a full assessment of the model’s performance under genuine field conditions. This dataset marks a critical step in confirming the model’s practical application, allowing researchers to examine the YOLOv8 model’s detection skills in genuine agricultural settings with different terrain, illumination, and probable animal intrusion scenarios.
These three rigorously managed datasets give complete insight into the performance of the deep learning models. They allow for thorough model testing, training, and assessment, enabling them to deal with a broad range of probable circumstances in the field.

2.1.2. Model Selection

The process of choosing a model for the SFSS’s intrusion detection component is critical [26]. The model used for this job is critical to the system’s effectiveness since its ability to reliably recognize and categorize intruders, whether they are animals or people, determines the system’s overall performance. Inaccurate or poor models may jeopardize the system’s core goal of protecting agricultural land. Figure 7 presents the model selection procedure.
Model selection is based on a set of comprehensive criteria that take into consideration the system’s diverse needs. The models’ object identification skills, computational efficiency, the depth of their pretrained knowledge, and their adaptability to a cloud-based environment are all critical requirements. The models adopted must also be compatible with the system’s scope, which encompasses extensive agricultural areas and a variety of incursion situations.
A comparison study was performed to determine which model best meets these requirements. The performance of seven models was evaluated in this analysis: Inception, Xception, VGG16, AlexNet, YOLOv6, YOLOv7, and YOLOv8. The models were rigorously tested and evaluated, modeling real-world incursion situations frequently encountered in the agricultural environment. The capacity of each model to identify intruders, its effectiveness in processing massive amounts of data, and its adaptability to cloud infrastructure were all assessed.
The selection process is driven by the critical necessity of finding a model that can integrate easily into the cloud-based architecture of our system, manage real-time video feeds from farms, and identify and react to intrusions quickly. Finally, the model adopted acts as the foundation of the SFSS, determining the system’s capacity to preserve agricultural land and reduce the harm caused by animals and people.

2.1.3. Inception

The Inception model, a ground-breaking advancement in the fields of deep learning and computer vision, has emerged as a vital component in SFSS’s intrusion detection capabilities. Google Research originally unveiled this model, known as GoogLeNet, in 2014, signaling a paradigm change in the architecture of convolutional neural networks (CNNs).
The Inception model was created in response to the increasing need for highly efficient and resilient deep learning models and is known for its revolutionary architectural design. Inception deviates from the practice of regular CNNs, which typically apply a single filter size in each convolutional layer. It has a new design that integrates several filter sizes within the same layer, promoting a more holistic approach to feature extraction. This architectural innovation is embodied in the Inception module, which serves as the model’s backbone.
The use of the Inception model inside the SFSS helps to improve its intrusion detection capabilities. Its diverse characteristics make it ideal for this purpose. Notable characteristics of the Inception model include great object detection, high picture classification accuracy, and a superior capacity to handle a broad variety of object sizes.
The Inception model is used in the context of the agricultural Guard System to recognize animals and people in agricultural settings. Its ability to recognize a wide range of objects, along with its processing efficiency, enables it to handle real-time video feeds from farms. Its cloud-compatible design offers even smoother system interactions.
The Inception model makes use of transfer learning, a method that reuses previously learned knowledge from large image datasets. This provides it with a comprehensive grasp of objects and characteristics, which is useful in the context of intrusion detection.
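For illustration, the following sketch shows one common way to apply such transfer learning with an ImageNet-pretrained Inception backbone and a small classifier head for the three SFSS classes; the input size, head dimensions, and optimizer are example choices, not the settings used in this work.

```python
import tensorflow as tf

# ImageNet-pretrained Inception backbone; only the new classification head is trained.
base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, input_shape=(299, 299, 3))
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),  # animals / people / no_intrusion
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```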
The Inception model is critical to the SFSS because of its innovative design, varied feature extraction capabilities, and efficiency. It contributes to the system’s capacity to safeguard agricultural land, reduce damage, and react quickly to invasions.

2.1.4. Xception

Xception, a significant innovation in deep learning architecture, marked a watershed moment in the evolution of convolutional neural networks (CNNs). The letter “X” in Xception stands for “Extreme”, underlining its radical divergence from regular CNNs. This advancement was created to solve the shortcomings of prior models and improve their performance on complicated image analysis jobs.
The Xception concept is based on considerable depth and has an innate capacity to capture intricate visual details. The “Xception Block” is the network’s primary building component. It employs depth-wise separable convolutions in a novel manner to learn features. Xception implements a depth-wise separable convolution, which divides the operation into two steps: a depth-wise convolution followed by a point-wise convolution. This division improves the model’s ability to detect complicated patterns and correlations in data.
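The short sketch below illustrates the idea: a depth-wise convolution that filters each channel separately, followed by a 1×1 point-wise convolution that mixes channels, alongside the fused layer Keras provides for the same operation. The tensor shape and filter counts are arbitrary examples.

```python
import tensorflow as tf

x = tf.keras.Input(shape=(64, 64, 32))

# Step 1: depth-wise convolution filters each input channel independently
dw = tf.keras.layers.DepthwiseConv2D(kernel_size=3, padding="same")(x)
# Step 2: 1x1 point-wise convolution mixes information across channels
pw = tf.keras.layers.Conv2D(64, kernel_size=1, padding="same")(dw)

# Keras also offers the two steps as a single fused layer, as used throughout Xception
fused = tf.keras.layers.SeparableConv2D(64, kernel_size=3, padding="same")(x)

print(pw.shape, fused.shape)  # both produce (None, 64, 64, 64) feature maps
```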
Xception has exceptional capabilities, making it well suited for a variety of tasks in the context of the SFSS, including picture categorization, object identification, and intrusion detection. Its distinguishing characteristics include its capacity to extract fine-grained information from pictures, making it an excellent option for detecting complicated patterns and objects. Furthermore, Xception has a great track record of beating other models on difficult picture recognition tests.
Xception’s deep architecture and new design concepts allow it to excel in identifying intruders in agricultural fields, whether they be animals or people, within the agricultural Guard System. Its extraordinary capacity to record complex visual information corresponds to the system’s requirements, guaranteeing precise and efficient detection. When integrated with cloud architecture, this advanced model offers a dependable way of protecting agricultural land from possible dangers.

2.1.5. VGG16

VGG16 (Visual Geometry Group 16) is a well-known deep convolutional neural network (CNN) that has had a significant influence on the areas of computer vision and picture categorization. This model marks an important step forward in the development of deep learning architectures, having demonstrated its adaptability and efficacy in a broad range of applications.
The need for a deeper neural network capable of recognizing and categorizing sophisticated picture patterns and attributes led to the creation of VGG16. It was created by the University of Oxford’s Visual Geometry Group and soon became a standard in the field of image recognition.
This model is notable for its great depth, with 16 convolutional and fully connected layers. VGG16’s core is made up of 13 convolutional layers followed by 3 fully connected layers. These layers, with varying numbers of filters, are critical to collecting and processing picture data hierarchically. This deep architecture is the heart of VGG16, allowing it to learn complicated visual data representations.
In image classification tasks, VGG16 has become associated with excellent accuracy. It is praised for its capacity to detect and discriminate between a wide range of object classes in large-scale datasets, most notably in the prestigious ImageNet Large Scale Visual Recognition Challenge. VGG16 is distinguished by its uniform and consistent architecture, which employs repeated 3x3 convolutional filters. This standardization simplifies the model’s construction and improves its generalization.
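As a quick illustration of this structure, the following sketch loads the standard VGG16 definition (without pretrained weights) and counts its convolutional and fully connected layers; it is a verification sketch only, not part of the SFSS pipeline.

```python
import tensorflow as tf

# Load the standard VGG16 topology (weights=None avoids downloading pretrained weights)
vgg16 = tf.keras.applications.VGG16(weights=None, include_top=True)

conv_layers = [l for l in vgg16.layers if isinstance(l, tf.keras.layers.Conv2D)]
dense_layers = [l for l in vgg16.layers if isinstance(l, tf.keras.layers.Dense)]
print(len(conv_layers), len(dense_layers))  # expected: 13 convolutional, 3 fully connected
```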
VGG16’s excellent picture categorization skills are put to use in the context of the SFSS. Because of its capacity to detect detailed patterns and objects inside photos, it is an excellent option for detecting intruders in agricultural fields. The adoption of VGG16 as one of the system’s models guarantees that any unwanted presence, whether animal or human, is identified quickly. VGG16’s excellent and constant performance record makes it a trustworthy instrument for agricultural land security, particularly when applied within cloud infrastructure.

2.1.6. Alex Net

AlexNet was a seminal accomplishment in deep learning, particularly for image classification problems. It is widely regarded as the pioneering model that sparked the deep learning revolution, having been developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. AlexNet utilized a novel design and training procedures that established the groundwork for contemporary convolutional neural networks (CNNs).
During the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012, the AlexNet model rose to prominence. The goal of its creation was to outperform standard computer vision approaches and pave the way for deep learning models that can solve large-scale picture categorization tasks. AlexNet’s designers made crucial design decisions, such as using deep convolutional layers, rectified linear unit (ReLU) activation functions, dropout regularization, and data augmentation approaches. These elements all led to the model’s remarkable success.
The hallmark of AlexNet’s architecture is its numerous layers, which include convolutional, pooling, and fully connected layers. Convolutional layers collect picture characteristics via filters, while pooling layers minimize spatial dimensions to improve computing efficiency. Rectified linear units (ReLUs) are used as activation functions in AlexNet, which aid the model’s capacity to understand complicated nonlinear correlations within the data. Dropout layers reduce overfitting and hence improve generalization.
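For illustration, a compact AlexNet-style model reflecting these building blocks (stacked convolutions with ReLU, max pooling, and dropout in the dense head) is sketched below; the filter counts follow the original design, while the three-class output layer is an SFSS-specific assumption.

```python
import tensorflow as tf
from tensorflow.keras import layers

alexnet_like = tf.keras.Sequential([
    layers.Conv2D(96, 11, strides=4, activation="relu", input_shape=(227, 227, 3)),
    layers.MaxPooling2D(3, strides=2),
    layers.Conv2D(256, 5, padding="same", activation="relu"),
    layers.MaxPooling2D(3, strides=2),
    layers.Conv2D(384, 3, padding="same", activation="relu"),
    layers.Conv2D(384, 3, padding="same", activation="relu"),
    layers.Conv2D(256, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(3, strides=2),
    layers.Flatten(),
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),                    # dropout in the dense head reduces overfitting
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(3, activation="softmax"),  # three SFSS classes (assumption)
])
alexnet_like.summary()
```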
AlexNet is used for a variety of computer vision applications, with picture categorization being its core strength. Because of the model’s depth and creative architecture, it has achieved a cutting-edge performance on various benchmark datasets, including the ImageNet dataset. Because of its success, deeper and more complicated CNN architectures have been developed.
AlexNet’s key characteristics are its eight-layer-deep architecture, novel use of ReLU activation functions for quicker training, and introduction of GPU acceleration for efficient learning. Its capacity to capture detailed visual patterns and identify items across several categories highlights its versatility. In the framework of the agricultural Guard System, AlexNet’s picture classification expertise is used to identify intrusions in agricultural fields, whether they be animals or people. Its solid performance, learned from a large quantity of data, improves the system’s capacity to effectively preserve agricultural land.

2.1.7. Versions of YOLO Used

  • YOLOv6: A result of the development of the YOLO (You Only Look Once) family, YOLOv6 is renowned for its ability to accurately and quickly detect objects in real time. With an emphasis on more effective feature extraction and computational efficiency, YOLOv6 introduces network architectural advancements that yield a strong performance across various object identification applications. Its streamlined design makes it a valuable asset in the realm of efficient computer vision applications, enabling rapid deployment in systems requiring real-time analysis, such as autonomous vehicles or surveillance.
  • YOLOv7: You Only Look Once (YOLO) is a prominent real-time object identification system that has transformed the field of computer vision. Originally introduced by Joseph Redmon and colleagues and since extended by the wider computer vision community, the YOLO family culminates in YOLOv7, which is distinguished by its exceptional speed and accuracy in recognizing multiple objects within images or video streams.
    The evolution of YOLOv7 has been characterized by a series of incremental improvements, with each iteration enhancing the capabilities of the previous version. This model emerged from collaborative efforts within the computer vision community to address the challenges of object detection. From YOLOv1 to YOLOv7, the model underwent rigorous optimization techniques to achieve enhanced accuracy and speed. Advances in deep learning algorithms and hardware acceleration have significantly contributed to the development of YOLOv7.
    The architecture of YOLOv7 is notable for its simplicity and efficacy. The model is built on a single deep neural network capable of real-time object recognition. YOLOv7 employs a CNN architecture comprising a base network (e.g., CSPDarknet53 or CSPDarknet53-slim) and a detection head with multiple detection layers. These layers predict bounding boxes, class probabilities, and objectness scores for the objects present in the input image, resulting in a comprehensive list of identified items, complete with confidence ratings and class labels.
  • YOLOv8: The latest iteration of the YOLO series, YOLOv8, further advances the capabilities of real-time object detection. Building upon the strengths of its predecessors, YOLOv8 is designed for enhanced accuracy and efficiency in diverse application contexts. This model incorporates cutting-edge feature extraction techniques and employs advanced deep learning algorithms, resulting in significant improvements in both detection speed and precision.
    The architecture of YOLOv8 features an optimized deep neural network that facilitates rapid object recognition. It includes a refined CNN backbone, which enhances the model’s ability to extract relevant features while maintaining computational efficiency. YOLOv8’s detection head is equipped with sophisticated layers that predict bounding boxes, class probabilities, and objectness scores, similar to previous versions but with increased accuracy due to its enhanced training methodologies.
    Notably, YOLOv8 excels in real-time applications, making it suitable for environments such as autonomous driving and smart surveillance systems. Its robust performance and adaptability ensure that it remains at the forefront of object detection technology, responding effectively to the evolving demands of the computer vision landscape.
    Figure 8 shows the YOLOv7 deep learning model’s architecture for object recognition. The backbone, neck, and prediction heads make up the three fundamental components of the model’s design.
    Backbone: This section extracts features from the input image by processing it through a series of layers. These layers contain CBS (Convolution–BatchNorm–SiLU/ReLU) blocks, in which a convolution for feature extraction is followed by batch normalization and a nonlinear activation function for efficient and stable training. To reduce spatial dimensions, Max Pooling (MP) layers are interspersed with ELAN blocks, which are used to improve the network’s learning and representational capability.
    Neck: The features that the backbone extracted are further refined by this section of the network. Using modules such as CUC (likely a cross-stage feature fusion technique), ELAN, and REP (re-parameterized convolution blocks), it integrates features of various sizes. The SPPCSPC block is a spatial pyramid pooling module that pools features at different scales and concatenates them to capture contextual information at several levels.
    Prediction: The last step makes predictions about object detection based on these enhanced attributes. This allows the network to recognize objects of different sizes. It consists of a sequence of convolutional layers and detection blocks (called Detect 1, Detect 2, and Detect 3) that most likely correspond to the different scales at which objects are detected. The ‘Detect’ blocks produce predictions for their respective scales, while the ‘Conv’ layers modify the dimensionality of the features for the final prediction.
    YOLOv7 is well known for its many uses in real-time object identification. Its outstanding speed and accuracy make it ideal for situations requiring fast decision-making based on visual input. This approach has found applications in a variety of sectors, including autonomous cars, surveillance systems, and, most famously, the SFSS.
    YOLOv7’s key strengths are its ability to perform object detection jobs in real time, its exceptional accuracy in recognizing objects across many classes, and its adaptation to various hardware configurations. The use of YOLOv7 in the framework of the SFSS allows for the speedy and precise detection of intruders in agricultural areas, whether they are animals or people. Its ability to scan live video feeds and provide warnings instantly adds considerably to crop protection and reducing any damage or theft. The SFSS’s use of YOLOv7 underscores the model’s reputation as a must-have for real-time item identification in complicated, dynamic contexts.
  • YOLOv8: YOLOv8 pushes the limits of object-detecting technology and is a major advancement in the YOLO series. Outstanding accuracy in object recognition and localization is provided by YOLOv8 thanks to its sophisticated architectures and training methods. It was designed for high-fidelity detection, meaning it can identify minute details in challenging visual settings. Its use in vital applications such as public safety and traffic management confirms its standing as a premier instrument for state-of-the-art, real-time object detection systems.
    A detailed architectural diagram of YOLOv8, a sophisticated deep learning model for object identification tasks, is shown in Figure 9. These conclusions are drawn from the structure that is presented:
    - Backbone: In order to recognize objects of varying sizes, the model uses a multi-scale method, analyzing the input picture at several resolutions (P1 to P5). To capture complicated information, the backbone employs a sequence of convolutional (Conv) layers with different kernel sizes, strides, and padding. These layers gradually reduce their spatial dimensions while increasing their depth (channels). C2F blocks most likely perform cross-stage feature fusion, combining low-level and high-level features for a more comprehensive representation of the input data.
    - Head (YOLOv8Head): The head is made up of many detection blocks that predict object classes, bounding boxes, and objectness scores at various scales (P3, P4, P5). The architecture processes these scales in parallel, enabling it to handle larger objects effectively while retaining good resolution for small objects. Figure 9 also indicates the use of sophisticated approaches, such as CIoU and BCE loss functions, to increase localization and classification accuracy during training.
    - Details: A thorough explanation of certain procedures, such as Bottleneck, SPPF, and Conv, is given; they are crucial parts of the model that help keep computation efficient and refine feature maps. The model’s capacity to retain spatial information is improved by fusing information across multiple levels of the feature hierarchy through the use of skip connections and upsampling approaches. The use of BatchNorm2d and SiLU (Sigmoid Linear Unit) activation shows how normalization and nonlinear activation functions enable quicker and more stable training.
    - Detect (Detection Layers): Each detection block uses a number of convolutions prior to generating final predictions, and the detection layers are customized to the particular scale of the features. These layers most likely include class probability predictions and anchor box adjustments, which are essential for determining the precise location and class of objects in the picture.
    The YOLOv8 architecture highlights the significance of using multi-scale processing, feature fusion, and specific detection algorithms. It is a sophisticated and effective network built for fast, precise object recognition. Because of its sophisticated architecture, YOLOv8 can perform at the cutting edge of object detection tasks.
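In practice, fine-tuning and running YOLOv8 can be done with the ultralytics package, as in the hedged sketch below; the dataset YAML path, epoch count, image size, and test image name are illustrative placeholders rather than the exact configuration used in this study.

```python
from ultralytics import YOLO

# Start from pretrained weights and fine-tune on the surveillance dataset
model = YOLO("yolov8n.pt")
model.train(data="sfss_dataset.yaml",   # hypothetical dataset description file
            epochs=50, imgsz=640)

# Run inference on a single frame and inspect the predictions
results = model("farm_frame.jpg")       # hypothetical test image
for box in results[0].boxes:
    label = model.names[int(box.cls)]
    confidence = float(box.conf)
    print(f"{label}: {confidence:.2f}")
```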

2.1.8. Comparison of the Models

Table 2 provides a comprehensive analysis of five prominent object recognition models: Inception, Xception, AlexNet, VGG16, and the YOLO family. Each model is evaluated based on key parameters, including its architectural design, object detection speed, accuracy, training dataset requirements, and hardware compatibility. Among these, YOLOv8 emerges as a highly favorable choice for this research.
YOLOv8 distinguishes itself by delivering a one-shot detection architecture that smoothly incorporates object localization and classification in a single pass. This innovation, paired with its quick detection speed, makes it particularly suitable for real-time applications. Moreover, YOLOv8 achieves this performance without sacrificing accuracy, making it an ideal candidate for precision-critical applications.
While models such as Inception, Xception, and VGG16 display great detection accuracy, they frequently require significant training datasets [29,30,31]. In comparison, YOLOv8’s small dataset needs are a significant advantage, especially given its ability to retain high accuracy. Additionally, YOLOv8’s support for GPU acceleration considerably boosts its performance, particularly in real-time processing contexts.
YOLOv8 offers an optimal mix between speed and accuracy, establishing itself as the premier model for applications demanding real-time, high-precision object detection. Its unique design and hardware compatibility further boost its adaptability across a wide range of application situations.
YOLOv8 is the principal model used for object identification in the SFSS. The algorithm is trained on a bespoke dataset that includes photos of common agricultural invaders, such as animals and human trespassers. The dataset was carefully built to incorporate multiple environmental variables to ensure strong model performance in real-world scenarios. YOLOv8 analyzes incoming photos from the ESP32 sensors in real time, recognizing possible threats with excellent accuracy. Once an incursion is detected, the model provides a confidence score for the discovered entity, categorizing it as either an animal or a person. The system then sends a warning to the farmer through a cloud-based notification system and triggers an on-site alarm to frighten away the intruding agent.
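A simplified sketch of this post-processing step, mapping a YOLOv8 result to the “animal” or “person” alert categories with a confidence threshold, is given below; the class lists and the 0.5 threshold are illustrative assumptions.

```python
# Illustrative class groupings; the trained SFSS model defines its own label set.
ANIMAL_CLASSES = {"cow", "elephant", "bear", "deer", "monkey", "wild_boar"}
PERSON_CLASSES = {"person"}
CONF_THRESHOLD = 0.5  # assumed minimum confidence before an alert is raised


def classify_intrusion(result, names):
    """Map one YOLOv8 result object to 'animal', 'person', or None."""
    for box in result.boxes:
        if float(box.conf) < CONF_THRESHOLD:
            continue  # ignore low-confidence detections
        label = names[int(box.cls)]
        if label in PERSON_CLASSES:
            return "person"
        if label in ANIMAL_CLASSES:
            return "animal"
    return None
```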

2.1.9. Model Deployment

While several machine learning models were investigated, including Inception, Xception, VGG16, and several versions of YOLO, YOLOv8 was chosen for its superior real-time detection capabilities. Unlike other models that need large computing resources, YOLOv8 strikes an efficient balance between speed and accuracy, making it particularly suitable for use with IoT devices such as the ESP32, which has limited processing capacity. The other models are included for comparative purposes and were found to be less successful in meeting the specific requirements of this system, notably low-latency detection in a farm context.
After the assessment of their performances, the model that outperformed others and emerged as the ideal choice for deployment was YOLOv8. Its excellent combination of object detection speed and precision, together with its novel single-shot design, positioned it as the favored model for the SFSS.
The deployment method used entails hosting the specified model on a cloud server. This server design is suited for effective and scalable object detection activities, providing the required resources for real-time processing.
The operation begins with the transfer of live video feeds from the agricultural area to the cloud servers, enabled by microcontrollers. These servers provide two functions: they store the constant stream of data from the farms and house the deployed machine learning model responsible for intrusion detection.
Intrusion detection happens within the cloud environment, exploiting the capabilities of the YOLOv8 model. The system regularly examines the camera stream for indicators of infiltration, closely monitoring the fields. Upon sensing an unlawful presence, whether animal or human, the warning system is immediately triggered.
The alert system has a number of aspects. Initially, an alert is generated to deter prospective intruders, whether they are persons or animals posing a hazard to the crops. This alert serves as a substantial deterrent, urging would-be intruders to flee and averting probable harm.
Simultaneously, the cloud server sends an SMS alert to the field owner, warning them of the breach. This real-time warning is a vital component of the system, enabling owners to take rapid action in reaction to recognized dangers.
In essence, the model deployment procedure simplifies the entire operation. It merges the high-performance capabilities of the YOLOv8 model with cloud server architecture, generating a powerful, real-time, and rapid intrusion detection and alarm system for the security of agricultural areas.
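As a rough sketch of this continuous cloud-side monitoring loop, the following code reads frames from a live stream and runs the detector periodically; the stream URL and the one-frame-per-second sampling rate are assumptions made for illustration.

```python
import time

import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
stream = cv2.VideoCapture("http://esp32-field-cam.local:81/stream")  # hypothetical feed URL

while True:
    ok, frame = stream.read()
    if not ok:
        time.sleep(1.0)   # transient dropout: wait and retry
        continue
    result = model(frame, verbose=False)[0]
    if len(result.boxes) > 0:
        # Intrusion candidates found: hand off to the alarm and SMS pipeline
        labels = {model.names[int(c)] for c in result.boxes.cls}
        print("possible intrusion:", labels)
    time.sleep(1.0)       # sample roughly one frame per second
```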

2.2. Cloud Integration

The SFSS makes considerable use of cloud servers, reflecting the system’s preference for cloud-based applications. At its heart, this integration is critical for numerous fundamental system tasks.
  • Data Storage and Retrieval: Cloud servers are critical as repositories for the continuous video feed acquired from agricultural regions. These servers store a large amount of video data effectively, offering an archive that can be accessed as required. This role is critical not just for record-keeping but also for any post-incident analysis and inquiry.
  • Intrusion Detection in the Cloud: The video stream coming from the agricultural field is subjected to rigorous intrusion detection inside the cloud environment. To evaluate this stream in real time, the system employs AI models installed in the cloud. This method detects unwanted intrusions with greater precision and speed, ensuring that potential risks are discovered and treated as soon as possible.
  • Real-Time Alarm Mechanisms: The cloud integration extends to the alarm system, which is important to the proactive defensive approach of the SFSS. When an intrusion is discovered, the cloud server sends out real-time signals. These signals fulfill two functions. First, they activate on-field measures such as the ESP32, which may involve warning intruders and inhibiting further unauthorized activities. Second, and perhaps more importantly, SMS notifications are sent to the agricultural landowner. This real-time communication is a critical component of the system, allowing for rapid and informed reactions to recognized threats.
The use of cloud servers is critical to the SFSS’s high performance. The cloud enables data storage, real-time intrusion detection, and responsive alarm systems, resulting in a sophisticated and intelligent defensive system for defending agricultural fields against possible incursions, damage, and theft. Its extensive connection with the cloud emphasizes the system’s dependence on cutting-edge technology to assure the safety and security of farming.
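A minimal sketch of the frame-archiving role described above is shown below; writing timestamped, per-camera files to a local folder stands in for whatever cloud object storage an actual deployment would use.

```python
from datetime import datetime, timezone
from pathlib import Path

ARCHIVE_DIR = Path("farm_archive")  # stands in for the cloud storage bucket


def archive_frame(jpeg_bytes: bytes, camera_id: str) -> Path:
    """Store one JPEG frame under a timestamped, per-camera key for later review."""
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    destination = ARCHIVE_DIR / camera_id / f"{timestamp}.jpg"
    destination.parent.mkdir(parents=True, exist_ok=True)
    destination.write_bytes(jpeg_bytes)
    return destination
```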

2.3. Alarm System

An auditory alarm system is a critical component of the SFSS’s incursion deterrent technique. When possible intruders, whether animals or unauthorized people, violate the border of the agricultural field, this device emits sound or rings a bell. The careful incorporation of this auditory deterrent mechanism aims to increase the system’s overall effectiveness in protecting crops.
  • Speaker Deployment for Auditory Deterrence: The deployment of speakers strategically positioned throughout the agricultural field is crucial to this auditory deterrence method. These speakers act as aural sentinels, ready to react quickly when the system triggers them. They are outfitted with noises meant to frighten and repel intruders, successfully preventing any unwelcome presence in the region.
  • Integration with ESP32 for an Instant Response: The SFSS’s on-field intelligence resides in an ESP32, a powerful microcontroller, which is used to coordinate the auditory deterrent. Signals are routed from the cloud server to the ESP32 as soon as the system senses an intruder, whether animal or human. This acts as a trigger mechanism, causing the speaker to engage. Instant activation is critical because it provides a quick and forceful reaction to possible threats.
The SFSS’s use of auditory deterrents through speakers and the ESP32 is a manifestation of its comprehensive defensive approach. This audio-visual technology detects intrusion while also deploying a high-impact reaction mechanism. It efficiently scares away animals and trespassers by producing noises or ringing bells. This loud discouragement serves a dual purpose: it protects the crops while also alerting unauthorized persons to the existence of a strong security system. Because this system integrates cutting-edge technology with auditory alarm systems, it strengthens its position as a complete farm protection system.

2.4. SMS Alert

An SMS-based alert system is an important component of the SFSS. This method is precisely designed to send critical warnings to the farm owner’s mobile device, thereby alerting them to the presence of intruders on their agricultural grounds. The primary goal is to provide farmers with timely information, allowing them to take quick action to protect their crops from potential injury or theft. The purposes of the SMS alerts are as follows:
  • Ensure Prompt Intervention: The capacity to receive timely notifications is crucial in the context of contemporary farming, where distant monitoring is required due to the enormous expanses of farmland one farmer can cover. The SMS-based alert system meets this demand directly, guaranteeing that the farm owner is kept up to date in real time, regardless of their actual location. The benefit of this fast alarm system is that it allows for quick responses and mitigates any possible crop damage.
  • Integration of AI Model with Cloud System: The SMS-based alerts are driven by the SFSS's cloud-hosted intelligence. The AI model deployed in the cloud is central to intrusion detection: when it detects an irregularity or encroachment on agricultural land, it immediately triggers the SMS alert system, and the message is dispatched through the cloud infrastructure.
  • Farmer Empowerment: Beyond the technological elements, the actual value of the SMS-based alarm system rests in the farmer empowerment it provides. The system acts as a watchful and proactive guardian, ensuring that farm owners have the knowledge they need to take action to safeguard their agricultural interests. This method instills a feeling of security and trust in the agricultural community by bridging the physical divide and providing real-time information.
  • Security and Prevention: The SMS-based alert system serves a dual purpose in terms of prevention. First, it is an important instrument for countering possible risks from intruders, both animal and human, thus safeguarding crops. Second, it creates a climate of heightened security within the farmlands, deterring trespassers who know that their presence is being observed and instantly reported.
As an important component of the SFSS, the SMS-based alert system highlights the connection between sophisticated technology and practical agricultural demands. It allows farmers to safeguard their crops and creates a more secure and efficient agricultural environment by providing them with rapid alerts.
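As an illustration of how such an alert could be dispatched programmatically, the snippet below uses Twilio's Python SDK as one example SMS gateway. The article does not name the provider used, and the credentials and phone numbers shown are placeholders.

```python
# Illustrative SMS dispatch using Twilio as an example gateway; provider choice,
# credentials, and phone numbers are assumptions, not the system's actual setup.
from twilio.rest import Client

def send_sms_alert(message: str) -> None:
    client = Client("ACCOUNT_SID", "AUTH_TOKEN")   # placeholder credentials
    client.messages.create(
        body=message,
        from_="+10000000000",                      # gateway number (placeholder)
        to="+910000000000",                        # farm owner's number (placeholder)
    )

send_sms_alert("SFSS alert: animal intrusion detected near camera 1")
```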

2.5. Hardware Architecture

The SFSS's hardware design has been carefully engineered, weighing many considerations while stressing simplicity, practicality, and usefulness. The physical components of the system have been chosen to function in tandem, providing smooth and effective operation. An ESP32 microcontroller, a camera module, a speaker, a motor, and the necessary power supply make up the main hardware. Figure 10 presents the hardware architecture of the proposed system and compares it with the conventional approach.
  • Microcontroller ESP32: The ESP32 microcontroller serves as the central control unit at the core of the hardware design. It acts as the system’s brain, directing the operation of all other physical components. The microcontroller serves as the major data transmission interface, guaranteeing a smooth connection between the camera module, speaker, motor, and cloud servers.
  • Camera Module: A critical component of the system, the camera module is tightly tied to the ESP32 microcontroller. It is critical in enabling real-time video transmission from the farm to cloud servers. This continuous video stream serves as the basis for AI-driven intrusion detection, allowing for quick reaction mechanisms in the event of an abnormality.
  • Speaker: Another essential component of the hardware design, the speaker is directly connected to the ESP32 microcontroller. The ESP32 activates the speaker when it receives intrusion detection commands from the cloud server. This activation is effective in repelling intruders, whether animals or people, and so improves farm security.
  • Motor: The motor is a specialist component developed to improve the camera module’s efficiency. It is completely integrated with the camera and allows for 360-degree camera rotation. This function provides thorough coverage of agricultural land, guaranteeing that no blind patches go unnoticed. The regulated rotation of the motor offers an added degree of protection to the farms.
  • Power Supply: The system’s power supply is critical since it provides the necessary energy for all other hardware components to operate. It guarantees that the ESP32 microcontroller, camera module, speaker, and motor run smoothly, resulting in smooth and continuous functioning.
The harmonic combination of these hardware pieces emphasizes the SFSS’s simplicity and efficiency. This design not only enables the continual monitoring and preservation of agricultural fields, but it also serves as proof of technology’s potential within current farming techniques. This well-thought-out hardware ensemble serves as the foundation of a dependable and efficient farm intrusion detection system by addressing both security and functionality.
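As a small illustration of how one of these components could be driven, the MicroPython sketch below rotates the camera motor described above. It assumes a continuous-rotation servo on GPIO 13 controlled by 50 Hz PWM; the pin number and duty values are illustrative, not the exact hardware configuration used in this work.

```python
# MicroPython sketch of the camera-rotation motor control; assumes a
# continuous-rotation servo on GPIO 13 driven by 50 Hz PWM (placeholder values).
import time
from machine import Pin, PWM

servo = PWM(Pin(13), freq=50)

def rotate(duty: int, seconds: float) -> None:
    servo.duty(duty)        # ~77 is roughly the neutral (stop) pulse for many servos
    time.sleep(seconds)
    servo.duty(77)          # stop

# Sweep the camera slowly in one direction, then back, to cover the field.
rotate(90, 2.0)             # rotate one way for two seconds
rotate(60, 2.0)             # rotate back
```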
Figure 11 demonstrates a complicated electrical arrangement that includes several linked components. It shows the hardware of the system in the real world. Central to this arrangement is the ESP32 camera module, which permits the collection and processing of visual information. Adjacent to it is a power supply unit that enables the continued operation of the linked modules. On the left side, an FTDI Adapter acts as a USB-to-serial converter, enabling the connection of the ESP32 to PCs for programming and data transmission. Additionally, the design contains a distinctive lit buzzer, which works as either an alarm mechanism or a feedback tool for end users. The colored lines connecting the components illustrate the intricacy of their interconnections and symbolize the movement of data and power between the modules. Their painstaking arrangement on a green backdrop accentuates the relevance of each component of the system and emphasizes the requirement for perfect connections to ensure maximum performance. Also, Figure 12 presents the training of the ML models.

3. Results and Experiments

A variety of critical indicators are used to study machine learning model performance, which, when combined, give a thorough assessment of the models' efficacy. These critical indicators include the F1_score, accuracy, precision, recall, and the Confusion Matrix, each of which serves a distinct purpose in the evaluation process. Confusion Matrix: Presented in tabular form, the Confusion Matrix is an important tool for measuring a model's classification accuracy. This matrix is made up of four fundamental parts:
  • True Positives (TP): These are the cases that were correctly identified as positive.
  • True Negatives (TN): These are the cases that were correctly classified as negative.
  • False Positives (FP): These are cases that were incorrectly classified as positive (Type I errors).
  • False Negatives (FN): These are cases that were incorrectly classified as negative (Type II errors).
  • Accuracy: Accuracy, given in Equation (1), is the ratio of correctly predicted instances to the total number of instances, measuring the overall correctness of the model's predictions.
    $\mathrm{Accuracy} = \dfrac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}$   (1)
  • Precision: Precision, given in Equation (2), is the fraction of predicted positives that are true positives; it assesses the reliability of the model's positive predictions.
    $\mathrm{Precision} = \dfrac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$   (2)
  • Recall: Recall, given in Equation (3), is the proportion of true positives among all actual positives; it measures how effectively the model detects positive cases.
    $\mathrm{Recall} = \dfrac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$   (3)
  • F1_score: The F1_score, given in Equation (4), is the harmonic mean of precision and recall and summarizes the overall balance between the two.
    $F1\_\mathrm{score} = \dfrac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$   (4)
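For reference, the four metrics above can be computed for the three-class task (animal / clear / human) with scikit-learn, as in the short sketch below; the labels used are dummy values for illustration only.

```python
# Computing the metrics defined above for the three-class task with scikit-learn.
# y_true and y_pred are dummy illustrative labels, not results from this study.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_true = ["animal", "animal", "clear", "human", "human", "clear"]
y_pred = ["animal", "human",  "clear", "human", "animal", "clear"]

print(confusion_matrix(y_true, y_pred, labels=["animal", "clear", "human"]))
print("accuracy :", accuracy_score(y_true, y_pred))
# macro averaging treats the three classes equally, matching the per-class view
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1 score :", f1_score(y_true, y_pred, average="macro"))
```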

3.1. Training and Validation of the Models

In machine learning, “training” refers to the critical stage at which a model learns the ability to recognize complex patterns and provide predictions based on an annotated dataset. During this training phase, the model diligently optimizes its internal parameters—weights and biases—by closely examining the input data and comparing its predictions to the unquestionable truths contained in the training dataset. This process is essentially a matter of the model constantly trying to reduce the discrepancy between its predictions and the true values stored in the training dataset. After this stage, “validation” takes center stage, because it is the next stage of the model’s development. The cornerstone of objectivity, validation aims to understand the model’s capabilities in the realm of unknown data. It is a form of critical analysis that takes place when the trained model is exposed to a separate dataset—the validation dataset—that was intentionally omitted from the scope of its original training. This evaluation determines how well the model can generalize its knowledge to deal with new and unobserved cases, which serves as a benchmark for its proficiency.
The validation dataset acts as a critical assessment tool, separate from the training data, and gives insights into potential flaws such as overfitting. Overfitting happens when a model displays an outstanding performance on the training data but fails to generalize to fresh data. In the specific context of the agricultural surveillance project, the primary dataset was used to train and refine four well-known machine learning models: Inception, Xception, VGG16, and AlexNet.
These algorithms’ ability to decipher the meaning of objects and incursions in the dataset was developed through training. After training, a careful assessment was conducted, putting these models to the ultimate test on a validation dataset. This assessment measured their ability to skillfully extrapolate their learned knowledge and provide predictive identifications when confronted with new and unknown material. Training and validation are inextricably linked processes that must be addressed to develop models proficient in intrusion detection and to safeguard agricultural land.
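A hedged sketch of this training-and-validation loop is given below for one of the CNN baselines (a VGG16 backbone) using Keras transfer learning; the image size, optimizer, epoch count, and directory layout are assumptions rather than the authors' exact configuration.

```python
# Sketch of training and validating a CNN baseline (VGG16 backbone) with Keras.
# Directory names, input size, optimizer, and epoch count are assumptions.
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/train", image_size=(224, 224), batch_size=32)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/val", image_size=(224, 224), batch_size=32)

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False            # freeze the pretrained feature extractor

model = tf.keras.Sequential([
    tf.keras.layers.Lambda(tf.keras.applications.vgg16.preprocess_input),
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),   # animal / clear / human
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training accuracy is tracked on train_ds; validation accuracy on the held-out val_ds.
history = model.fit(train_ds, validation_data=val_ds, epochs=10)
```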
Figure 13 provides a detailed analysis of how training and validation accuracies have changed over several epochs for a group of well-known neural network architectures: Inception Net, Xception Net, AlexNet, and VGG16 Net. Both Xception Net and Inception Net exhibit very stable training accuracies, approaching roughly 1.0 over the course of one epoch. Although the validation accuracy of Inception Net does not change at all, that of Xception Net shows a clear upward trend, indicating a good level of model improvement. On the other hand, AlexNet exhibits dramatic fluctuations in its accuracy, suggestive of overfitting episodes at particular epochs. VGG16 Net has a high training accuracy to start. But after the fourth epoch, there is a noticeable drop in its validation accuracy, which is indicative of overfitting when compared to a consistently high training accuracy. To summarize, while Inception Net is a model with good stability and resilience, both AlexNet and VGG16 have observable overfitting weaknesses at different points in time.

3.2. Evaluation of the Models on the Test Set of the Primary Dataset

The assessment of machine learning models on the test dataset serves as a vital litmus test, offering an important benchmark for evaluating their real-world performance. The importance of this testing process stems from its capacity to replicate the models’ performance in the face of unique, previously unknown data, simulating the real-world conditions in which these models would be applied. It enables the evaluation of their generalization capacities, the measurement of their accuracy, and the uncovering of latent faults like overfitting that may have gone unnoticed throughout the training and validation stages. As a consequence, evaluating these models on the test dataset highlights their dependability and durability before they are used in production. The four aforementioned machine learning models—Inception, Xception, VGG16, and AlexNet—along with the pretrained YOLOv8 model, are utilized effectively within this study to conduct a comprehensive evaluation of their performance on the test dataset. This thorough research is crucial for assessing their usefulness in real-world situations, notably in the detection of agricultural incursions. When evaluated against previously unknown farm surveillance data, this approach provides a comprehensive assessment of these models’ capabilities in terms of accurate identification and categorization, thereby reinforcing the foundation of the system’s reliability and efficacy. Figure 14 compares the results of the models in the classification of images from the test dataset.
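As one concrete way to reproduce this kind of test-set evaluation for the YOLO family, the ultralytics API can be used as sketched below; the weight path and dataset YAML path are assumptions.

```python
# Sketch of evaluating a fine-tuned YOLOv8 model on the held-out test split
# using the ultralytics API; weight path and dataset YAML are placeholders.
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")
metrics = model.val(data="farm_dataset.yaml", split="test")
print("mAP@0.5:", metrics.box.map50)
print("mAP@0.5:0.95:", metrics.box.map)
```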
The confusion matrices for four alternative neural network models are depicted in Figure 15: Inception, Xception, AlexNet, and VGG. These matrices depict the models' classification performance across three categories: ‘animal’, ‘clear’, and ‘human’. The Inception model excels at correctly categorizing the ‘animal’ category, with just two misclassifications as ‘human’ and none as ‘clear’. Its ‘clear’ classification is practically flawless, with 19 accurate identifications and 1 erroneous identification as ‘human’. In the ‘human’ category, 16 photographs are accurately identified, while 4 are mislabeled as ‘animal’. The Xception model does an excellent job, properly classifying 190 photographs in the ‘animal’ category. In the ‘clear’ category, it made 19 correct choices and only one misclassification as ‘animal’. Its ‘human’ categorization, on the other hand, is less precise, with 12 proper classifications and 8 photographs mistakenly labeled as ‘animal’. Moving on to AlexNet, the ‘animal’ class contained 178 valid identifications, with 6 and 7 misclassifications as ‘clear’ and ‘human’, respectively. Its ‘clear’ classification was erratic, with 5 correct identifications and 15 misclassifications as ‘animal’. The ‘human’ class also produced uneven results, with 16 photographs labeled as ‘animal’, 1 as ‘clear’, and only 3 correctly identified as ‘human’. Finally, in the ‘animal’ category, the VGG model performs strongly, properly categorizing 189 photographs, with only one error each for ‘clear’ and ‘human’. In comparison to Yolo V6, which occasionally misclassified objects as either ‘animal’ or ‘human’, Yolo V7 improved significantly, showing fewer misclassifications and a larger percentage of true positives, a sign of increased model performance in terms of accurately identifying the given classes. The ‘clear’ class was identified precisely by Yolo V7, while the Yolo V8 model stands out, with 189 images correctly recognized as ‘animal’, only 2 images misclassified as ‘clear’, and none as ‘human’. The ‘clear’ and ‘human’ classes were identified perfectly, with all 20 photographs accurately detected without errors.
The performance characteristics of five distinct neural network models are compared in Table 3. These models are Inception, Xception, VGG16, AlexNet, and the versions of Yolo used. The evaluation comprises four important metrics: accuracy, precision, recall, and F1_score, which provide insight into the models' specific performance features. Inception achieves a 96.96% accuracy, nearly 97% precision and recall, and harmonic balance with a 96.77% F1_score. While Xception trails somewhat in accuracy (96.1%), it maintains a strong balance, with an F1_score of 95.87%. VGG16 performs strongly, with a 95.67% accuracy and an F1_score of 95.29%. AlexNet, on the other hand, displays a considerable drop in accuracy at 80.51%, as well as degraded precision and a lower F1_score, indicating potential constraints in precision-focused tasks. Yolo v6 and v7 also performed satisfactorily, with above 90% accuracy. YoloV8, however, is the standout performer, with excellent results across all metrics. To summarize, selecting an ideal model should not be based just on accuracy, but rather on a thorough examination of the trade-off between precision and recall, particularly in cases where misclassifications have major effects.

3.3. Evaluation of the Models Using the Secondary Dataset

It is impossible to overstate the significance of verifying the performance of machine learning models on a secondary dataset. This stage is essential for evaluating the models’ ability to generalize their learned patterns beyond the limits of the initial dataset. In real-world scenarios, these models will encounter data variances, distributions, and properties that differ from their training and validation sets. The addition of a new dataset simulates these real-world scenarios, ensuring that the models remain resilient and reliable when presented with previously unreported data. It contributes to demonstrating that the models’ usefulness is not limited to specific data but extends to a broader range, increasing their practical use and bolstering their credibility. Figure 16 compares the results of the models in the classification of images from the secondary dataset.
Figure 17 shows the confusion matrices for the performance of five different neural network models on the secondary dataset, including Inception, Xception, AlexNet, VGG, and Yolo V7, as well as their performance in classifying photographs into three categories: ‘animal’, ‘clear’, and ‘human’. Notable results include the Inception model correctly identifying 38 of 39 ’animal’ images and the Xception model correctly classifying all 39 ‘animal’ images. However, both Inception and Xception made slight misclassifications, most notably when distinguishing ‘animal’ from other categories. The AlexNet model struggles with ‘clear’ classification, which may reveal flaws in its category differentiation. VGG offers reliable findings that are comparable to Inception but differ in subtle ways. Yolo V6 made a few misclassifications, most notably between ‘animal’ and ‘clear’, as well as ‘human’. On the other hand, Yolo V8 shows a notable improvement, with reduced misclassifications for ‘animal’ and ‘clear’ and perfect classification for ‘human’, suggesting improved accuracy in differentiating between these categories. The most accurate model is the Yolo V8, which achieved a flawless classification of all categories.
Table 4 compares the performance characteristics of five neural network models: Inception, Xception, VGG16, AlexNet, and the versions of Yolo used. Inception scores a respectable 90% accuracy and recall, displaying a harmonic balance, with an F1_score of 90%. Xception maintains an accuracy of 90%, comparable to Inception, and a recall of 89%, resulting in an F1_score of 88%, favoring recall somewhat. VGG16 reaches a comparable 89% accuracy and recall, yielding an F1_score of 88% and matching the balance observed in Xception. AlexNet, on the other hand, trails significantly, with an accuracy of 68% and a precision of 63%. It has a recall rate of 68% and an F1_score of 62%, showing a substantial deficiency in balancing accuracy and recall. Yolo v6 and Yolo v7 have accuracies of 89% and 92%. With a 98% accuracy and 99% precision, YoloV8 emerges as the clear leader. It has an excellent balance, with a recall of 98% that matches its F1_score of 98%. YoloV8 significantly outperforms the other models in every category. Inception outperforms Xception and VGG16, which generate extremely similar outcomes. AlexNet, on the other hand, trails significantly across the board, signaling that it is inappropriate for applications demanding high accuracy or recall.

3.4. Evaluation of the Models on the Farm Dataset

Evaluating machine learning models on the Farm Dataset, which comprises photos from local farms, is crucial since it reflects the sorts of real-world scenarios in which the models are to be applied. The significance of this assessment arises from its ability to recreate the exact conditions and challenges that these models would face in real-world applications. The performance evaluation of the models using farm-specific data assures their responsiveness to the unique characteristics and problems associated with agricultural surveillance. This test validates the models’ ability to perform successfully in a real-world context, where they must consistently recognize and categorize objects and intrusions in agricultural settings, proving their practicability. It analyzes not only their technical proficiency but also their potential to boost agricultural security and efficiency. Figure 18 compares the results of the models in the classification of images from the real farm dataset.
Figure 19 illustrates the confusion matrices of four different neural network models (Inception, Xception, AlexNet, and VGG), which were tested for their ability to classify images into three distinct categories: ‘animal’, ‘clear’, and ‘human’. A closer look at the results reveals some intriguing discoveries. In the case of the Inception model, there is substantial accuracy in detecting ‘clear’ images, but significant misclassifications in terms of ‘animal’ categorization. Meanwhile, Xception scores well, notably in properly recognizing ‘human’ images, but it is substantially worse at distinguishing ‘animal’ ones. Despite major difficulty in ‘clear’ classification, AlexNet delivers an exceptional overall performance. While the VGG model is good at recognizing ‘clear’ images, it has a hard time distinguishing between ‘animal’ and ‘clear’. Every class in the Yolo V6 model shows some confusion, most notably when ‘clear’ is mistakenly classified as ‘human’. However, Yolo V7 performs better, classifying ‘animal’ with high accuracy, misclassifying ‘clear’ less frequently, and misclassifying ‘human’ in only a small percentage of cases. The Yolo V8 model, on the other hand, remains the peak of precision, with few misclassifications and a noticeable advantage in accuracy. These matrices provide a visual framework for comparing the models' overall performance, enabling better model selection.
Table 5 contains extensive information on the performance of five distinct neural network models, including Inception, Xception, VGG16, AlexNet, and the versions of Yolo we used. An in-depth analysis of their performance reveals the following information.
Inception accurately predicted the outcomes of 78% of the test samples. It has a precision of 77%, which implies that 77% of its positive classifications were correct, and a recall of 77%, meaning that it detected 77% of true positives. Its F1_score of 76% shows a solid balance of precision and recall. Xception lagged behind Inception in terms of precision, which dropped to 67%, indicating that only 67% of samples classified as positive were genuinely positive. Its recall stayed at 77%, indicating that it could still detect 77% of true positives; however, compared to Inception, its F1_score of 72% implies a modest decrease in performance. VGG16 correctly predicted the outcomes of 74% of samples, leading to a 74% accuracy. The precision of the model is 64%, which implies that 64% of its positive predictions were correct. With a recall rate of 74%, VGG16 accurately spotted 74% of positive occurrences. The model's F1_score of 68% shows a balanced trade-off between precision and recall, though it is lower than that of Inception and Xception. AlexNet has a 72% accuracy, meaning it predicted the outcomes of 72% of the test samples correctly. However, its precision is 64%, implying that only 64% of its positive classifications were correct, and its recall of 72% means that 72% of true positive instances were captured. Its F1_score of 67% shows a close balance of precision and recall, similar to VGG16's performance. Yolo v6 and Yolo v7 performed, comparatively, much better than the four models above. YoloV8 outperforms the competition with 99% accuracy, predicting outcomes almost flawlessly. Its 98% precision implies that virtually all events classified as positive were genuine positives, and its 98% recall means that it detected 98% of genuine positive instances. Furthermore, YoloV8's top-tier performance is supported by an F1_score of 98%, demonstrating an excellent balance of precision and recall. YoloV8 earns the top spot, with near-perfect scores in all categories. Although Inception and Xception have the same accuracy and recall rates, Xception is less precise. Both VGG16 and AlexNet exhibit lower accuracy values when compared to the leading models, resulting in somewhat lower F1 scores. Nonetheless, model selection should be firmly matched with the application's specific objectives, especially in terms of false positives and false negatives.

3.5. Comparison with Previous Research

In the model comparison in Table 6, Kragh et al.’s SVM has an accuracy of 91.60%, indicating its competence in binary classification tasks. Raghuvanshi et al.’s SVM, on the other hand, reaches an amazing 97%, highlighting the potential of SVMs with refined applications. Their RF model, on the other hand, has a lower score of 78%, indicating the heterogeneity of its performance within the same research. The W-CoHOG method developed by Andavarapu et al. achieves 93.30% accuracy, confirming the usefulness of its feature extraction technique. Roy et al.’s Bi-LSTM, built for sequential data, achieves 95% accuracy, demonstrating its suitability for time-series data. H.V. Le et al.’s CNN achieved 97% accuracy, demonstrating CNNs’ prowess with grid-based data. Yolo V8 achieves 99% accuracy, demonstrating YOLO’s object detection capabilities. Nonetheless, given the variety of methodologies used to solve the problem, the choice should be aligned with the unique needs of the task and subtleties of the dataset.

3.6. Result of Detection from Real Demo Camera

Figure 20, a composite, shows a comparison of livestreaming and object identification utilizing a sophisticated video processing system. The “Live Transmission” part on the left shows raw footage: the top frame shows an Ayrshire cow in its natural habitat, while the bottom frame shows two people. Figure 21 depicts the intrusion alarm alert messages received by the user. The “Detection” part on the right, in contrast, illustrates the system’s capacity to recognize and name things within the same situations. The cow, now labeled with a bounding box, shows the system’s capacity to recognize and classify animals. Similarly, one of the people in the bottom right shot is highlighted with a bounding box labeled “person”, indicating the system’s skill in recognizing humans. The contrast between the raw feed and processed output emphasizes the system’s efficacy in real-time object identification and the display of contextual information.
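A sketch of how the live ESP32-CAM transmission can be pulled and annotated in the manner of Figure 20 is given below. The stream URL (port 81, /stream) follows a common ESP32-CAM web-server default and is an assumption here, as are the camera IP and weight path.

```python
# Sketch of pulling the ESP32-CAM live stream and running YOLOv8 on each frame.
# The stream URL, camera IP, and weight path are placeholder assumptions.
import cv2
from ultralytics import YOLO

model = YOLO("best.pt")
cap = cv2.VideoCapture("http://192.168.1.50:81/stream")   # placeholder camera IP

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame)[0]
    annotated = results.plot()          # draw bounding boxes and class labels
    cv2.imshow("SFSS detection", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```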

3.7. Achieving Real-Time Intrusion Detection

The system supports data transfer between the ESP32 sensor and the cloud, leveraging many design decisions to ensure real-time operation with minimal latency. The real-time capability of the SFSS is achieved by a mix of lightweight, optimized models and efficient communication protocols.
  • Optimized Machine Learning Model: The YOLOv8 model is intended for real-time object identification in particular. Its design enables high-speed image processing with minimum computing cost, especially when working on resource-constrained devices like the ESP32. The model’s quick inference time—typically under 30 milliseconds per frame—ensures that object recognition occurs virtually instantly upon data reaching the cloud server.
  • Efficient Data Transmission Protocols: To minimize transmission delays, the system leverages the MQTT (Message Queuing Telemetry Transport) protocol, which is optimized for low-latency, lightweight data transport. MQTT is particularly suitable for IoT applications [28,29,30,31], providing a speedy and reliable connection between the ESP32 sensors and the cloud, thereby ensuring that data are delivered and analyzed in near real time.
  • Data Compression and Prioritization: The system leverages image compression techniques to minimize the size of the data transported from the ESP32 to the cloud, further lowering transmission time. Additionally, only crucial information, specifically frames indicating possible intrusions, is transferred to the cloud for processing, limiting the volume of data that require real-time handling (a short publishing sketch follows this list).
  • Cloud Processing Power: The cloud infrastructure deployed in this system is designed to scale automatically with the incoming data load, eliminating processing bottlenecks. The elastic nature of the cloud means that the model can accommodate large data volumes without compromising its real-time performance.
  • Latency Management: The whole round-trip time, from data collection to alert generation, is closely managed to keep it within real-time constraints. Empirical testing suggests that the end-to-end delay, from data acquisition at the sensor to warning delivery, typically ranges from 100 to 200 milliseconds, which is well within acceptable limits for real-time monitoring systems.
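The compression and prioritization steps above can be combined in a short gateway-side routine, sketched below with OpenCV and the paho-mqtt publishing helper. The broker address, topic name, JPEG quality, and the `suspicious` trigger flag are assumptions used for illustration.

```python
# Sketch combining compression and prioritization: JPEG-compress a frame and
# publish it over MQTT only when it has been flagged as a possible intrusion.
# Broker address, topic, and JPEG quality are illustrative assumptions.
import cv2
import paho.mqtt.publish as publish

def publish_if_suspicious(frame, suspicious: bool) -> None:
    if not suspicious:
        return                                    # prioritization: skip idle frames
    ok, jpeg = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 70])
    if ok:
        publish.single("farm/camera1/frame", payload=jpeg.tobytes(),
                       qos=1, hostname="broker.example.com")
```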

3.8. Reasons for Yolo V8 to Outperform the Other Models

In comparison to previous iterations and other prevalent models like Inception, Xception, VGG16, and AlexNet, the superior capabilities of the pretrained YOLOv8 model may be attributed to several pivotal enhancements and optimizations:
  • Advanced Architecture: YOLOv8 incorporates more sophisticated architectural changes, building upon the qualities of its predecessors. Improved feature extraction layers and complicated multi-scale object detection handling techniques are probably among these improvements, which lead to a more powerful and nuanced comprehension of complex visual inputs.
  • Enhanced Generalization: There has been a noticeable improvement in the model’s generalization capacity across different datasets and circumstances. Because agricultural surveillance frequently deals with changing situations and unknown components, this resilience is essential. Regardless of changes in the environment, YOLOv8 constantly achieves great performance thanks to its enhanced generalization.
  • Streamlined Efficiency: Efficiency, which is a defining feature of the YOLO series, is still emphasized in YOLOv8, but in a more simplified manner that cuts down on duplication and concentrates on critical calculations. This modification probably leads to substantially quicker inference times without sacrificing detection granularity or model fidelity.
  • Precision in Localization: It is anticipated that YOLOv8 has even more accurate object localization, which is crucial for real-time monitoring applications like agricultural surveillance. A proactive approach to managing agricultural resources can be substantially improved by a model having the capacity to precisely identify and categorize possible risks or incursions.
  • State-of-the-art Training Techniques: It is likely that YOLOv8 has upgraded its training methods to take advantage of the most recent developments in deep learning. These include innovative regularization techniques, loss functions, and data augmentation tactics that work together to enhance the model’s remarkable learning ability.
  • Hardware Optimization: It is possible that YOLOv8 keeps optimizing for GPU acceleration, making the most of contemporary technology. This optimization guarantees that the model may be deployed in field applications with strict real-time requirements and that it functions well in research settings as well.
YOLOv8’s excellent performance may be attributed to its sophisticated architecture, enhanced generalization, increased efficiency, accurate localization, state-of-the-art training methods, and optimized utilization of hardware. All of these elements work together to guarantee that YOLOv8 not only builds upon the achievements of its predecessor, YOLOv7, but also brings about important new developments that push the limits of object detection technology—particularly in demanding and dynamic domains like agricultural monitoring.

3.9. Performance Evaluation Under Challenging Environmental Conditions

In addition to the core evaluation of the Smart Farm Surveillance System (SFSS) using conventional datasets, analyzing the system's ability to adjust to varying environmental conditions, such as changes in weather and lighting, is essential. To achieve this, an additional dataset of 80 photos was gathered from multiple Kaggle datasets, containing difficult situations such as hazy images, varied weather (from rainy to sunny and cloudy), and nighttime scenes. This dataset was created specifically to assess the YOLOv8 model's resilience in practical settings where environmental factors can significantly degrade image quality.
Testing the YOLOv8 model on this difficult dataset showed that its intrusion detection accuracy was around 50%. Considering that no images of these challenging situations were present in the training and testing datasets, this performance is notable. The YOLOv8 model's behavior under these difficult circumstances is shown in Figure 22, which also demonstrates the model's detection ability across a range of image quality levels.
The absence of training data containing difficult conditions, such as low-resolution, blurry, or poorly lit photos, is the main reason for this decreased accuracy. These conditions can obscure important features needed for precise object detection, causing misclassifications or missed detections. For example, low contrast and inadequate illumination are common problems with nighttime photos, and cloudy or wet weather can introduce noise and distortions that make it more difficult for the model to correctly identify objects.
That the model still reaches 50% accuracy under such conditions suggests the intrinsic robustness and adaptability of the YOLOv8 architecture. The results show that although the model performs at a respectable level, there is still considerable room for improvement. To increase the model's accuracy and dependability in practical applications, future research could concentrate on supplementing the training dataset with images that depict these difficult circumstances. In addition, investigating mitigation strategies, such as image preprocessing methods that improve clarity and resolution, could strengthen the system's functionality in unfavorable environmental conditions.
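A minimal sketch of the kind of preprocessing mitigation mentioned above is given below using OpenCV: contrast-limited adaptive histogram equalization (CLAHE, a variant of histogram equalization) applied to the lightness channel, followed by light Gaussian smoothing, before a frame is passed to the detector. The parameter values are illustrative assumptions.

```python
# Sketch of preprocessing for low-light or hazy frames: CLAHE on the lightness
# channel plus mild Gaussian smoothing. Clip limit, tile size, and kernel size
# are illustrative values, not a tuned configuration from this study.
import cv2

def enhance_frame(frame):
    lab = cv2.cvtColor(frame, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab = cv2.merge((clahe.apply(l), a, b))       # equalize only the lightness channel
    enhanced = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
    return cv2.GaussianBlur(enhanced, (3, 3), 0)  # mild smoothing against sensor noise
```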

4. Discussion

The Smart Farm Surveillance System (SFSS) represents a transformative step forward in agricultural security, delivering innovative and efficient ways to secure crops and livestock. By leveraging cutting-edge IoT technology and powerful machine learning models, this system addresses significant issues in real-time intrusion detection while remaining able to adapt to varied agricultural situations. Beyond its immediate uses, the SFSS lays the groundwork for addressing deeper ramifications, including ethical considerations, cost-effectiveness, and sustainability, which are crucial for its widespread adoption and long-term success. This section digs into these areas, demonstrating the system's current impact and possibilities for future enhancement.
The deployment and maintenance of the Smart Farm Surveillance System (SFSS) entail both hardware and software costs, making it a cost-effective solution for farm security. The ESP32 controller, with its camera module, a critical component of the system, costs roughly INR 800. Other hardware components, such as speakers, motors, and a power supply, range between INR 500 and INR 1500, depending on the exact requirements and quality of the equipment. On the software side, the cloud servers used for real-time data processing and storage have a recurrent expense. Monthly cloud service rates can vary greatly, often ranging from INR 1500 to INR 5000, depending on the volume of data processed and the specific cloud provider. This clear cost structure guarantees that potential adopters may manage their budgets wisely while benefiting from the system’s robust features.
The energy efficiency of the SFSS is a significant part of its sustainable implementation. The ESP32 microcontroller, noted for its low power consumption, runs efficiently even during continuous monitoring, making it perfect for IoT-based applications. The adoption of optimal machine learning models like YOLOv8 ensures a low computational overhead, substantially reducing energy usage. On the cloud side, current cloud servers are designed with energy-efficient architectures, employing advanced cooling systems and renewable energy sources in many instances. By combining low-power IoT components with energy-conscious cloud infrastructure, the SFSS not only lowers its environmental imprint but also ensures its cost-effective and sustainable operation, complying with the principles of green technology.
As the Smart Farm Surveillance System (SFSS) continues to improve, it is necessary to address potential ethical difficulties affecting privacy, particularly in the context of human intrusion detection. The implementation of surveillance technologies raises concerns regarding the monitoring of persons without their consent, which could lead to privacy infringements and public outcry. However, it is vital to emphasize that similar ethical difficulties are currently managed in public venues, such as shops and malls, where CCTV cameras are commonly installed. In these cases, adequate disclosure, transparency in data collection, and adherence to privacy regulations help mitigate these concerns. By adopting comparable strategies, the SFSS can develop norms and standards that ensure responsible monitoring while explaining the system's purpose and advantages to local communities. Incorporating measures such as the anonymization of recorded data and stringent access controls can further protect individual privacy while still enabling effective intrusion detection. By proactively addressing these ethical challenges, the SFSS can strengthen its legitimacy and support its wider deployment in agricultural settings.

5. Future Scope

In its revolutionary approach to intrusion detection, the SFSS not only delivers efficient and secure solutions for farms, but it also opens up various paths for future advancements and expansions. This section investigates possible future breakthroughs and areas of growth in the hardware’s architecture.
Future research may concentrate on improving the resilience of the object detection models by training them on a selection of challenging photos, including those with blurred features, fluctuating lighting, and poor weather conditions. This would enhance the models' capacity to generalize and function efficiently in practical scenarios. Challenging scenes can also be addressed by applying image processing techniques, such as histogram equalization, Gaussian filtering, and edge enhancement, to preprocess the images before feeding them into the object detection model. Another viable avenue is the incorporation of night vision cameras alongside specialist imaging technologies, including thermal and infrared cameras, which can capture usable images in low-light or adverse weather conditions. These enhancements would considerably increase the system's robustness and flexibility under varied environmental circumstances. Further topics for future research include the following:
  • Low-Cost 360-Degree Camera Integration: The system may be modified in the future to include low-cost 360-degree cameras. This upgrade would give an even more complete perspective of the farm, ensuring that every area is always monitored. This development has the potential to dramatically improve the system’s security capabilities.
  • Owner Recognition and Exemption: The inclusion of owner recognition capabilities is a potential avenue for future growth. By storing the farm owner’s information in the system’s database, the technology could differentiate between the owner and intruders, significantly decreasing false alerts. When the owner is there, the system could stay dormant, reducing needless alarms.
  • Ultrasonic Sound Integration: Future studies might concentrate on adding ultrasonic noises into the capabilities of this system. Ultrasonic noises are well known for their ability to discourage animals. When intruders are discovered, the device may give an additional degree of safety by frightening away unwelcome animals.
  • Animal Type Identification and Adaptive Sound Generation: Modifying the system to recognize various sorts of animals visiting the farm is a fascinating path for future research. The system might then generate noises known to discourage certain animals. If a pet animal is identified, the system may emit noises that imitate the presence of bigger predators such as lions or tigers. This adaptive strategy enables a personalized reaction to various sorts of intrusions.
  • Live video access for Farmers: Future advancements might include giving farmers direct access to live video of their farm upon incursion detection, enhancing the system’s capabilities. Farmers would be able to watch the farm in real time, enabling them to evaluate the situation and take appropriate action.
  • Application Beyond Agriculture: This SFSS study may be applied to a variety of different sectors. Incorporating drones into the architecture brings up a world of possibilities, including railway track monitoring, border security, animal protection, forest fire detection, disaster response, industrial safety, and environmental monitoring. This technology has the ability to transform a wide range of industries by delivering efficient and cost-effective solutions.
The SFSS’s future offers the possibility of not only improving its capabilities for farm intrusion detection, but also of broadening its application to other regions, altering the way security and monitoring are conducted across many industries.

6. Conclusions

The SFSS is an innovative and efficient method of intrusion detection, providing farmers with a dependable and secure solution. This smart monitoring system uses cutting-edge technology and AWS capabilities to allow farmers to protect their crops and animals proactively. The technology guarantees farm security by offering real-time object detection, rapid alerts, and speaker control. The SFSS’s adaptability expands its value beyond the agricultural industry. The incorporation of cloud-based object identification opens up a world of possibilities in industries ranging from railway track monitoring to border security, wildlife protection, forest fire detection, disaster response, industrial safety, environmental monitoring, agricultural pest control, and oil and gas pipeline inspection. This adaptive technology revolutionizes surveillance approaches by providing real-time monitoring and threat identification, boosting security across a wide range of industries.
The SFSS is more than just a solution to today’s problems; it is also a foundation for future advancements. As technology advances and new possibilities emerge, the system’s ability to solve the security demands of a variety of industries will only improve. The SFSS is a beacon of innovation and efficiency in the surveillance and security industry, constantly refining intrusion detection systems and increasing its range of applications. This smart monitoring system shows the potential of employing technology to preserve and defend assets, whether they are crops on a farm or important infrastructure in other sectors, with its cutting-edge features and versatility.

Author Contributions

Conceptualization, T.S.D. and M.S.; methodology, S.M. and T.S.D.; validation, A.S.M.S.H. and A.K.; formal analysis, A.S.M.S.H. and M.S.; investigation, Y.-w.L. and T.S.D.; resources, Y.-w.L. and J.-Y.R.; data curation, S.M. and Y.-w.L.; writing—original draft preparation, A.S.M.S.H. and T.S.D.; writing—review and editing, Y.-w.L. and J.-Y.R.; supervision, M.S., Y.-w.L. and J.-Y.R.; funding acquisition, A.S.M.S.H. and J.-Y.R. All authors have read and agreed to the published version of this manuscript.

Funding

This work was supported by the Woosong University Academic Research Fund 2025, South Korea.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Acknowledgments

We are thankful to the National Research Foundation (NRF) Korea for sponsoring this research under Project BK21 FOUR (Smart Robot Convergence and Application Education Research Center, PKNU).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Anoop, N.R.; Krishnan, S.; Ganesh, T. Elephants in the farm–changing temporal and seasonal patterns of human-elephant interactions in a forest-agriculture matrix in the Western Ghats, India. Front. Conserv. Sci. 2023, 4, 1142325. [Google Scholar]
  2. Sinclair, M.; Fryer, C.; Phillips, C.J.C. The Benefits of Improving Animal Welfare from the Perspective of Livestock Stakeholders across Asia. Animals 2019, 9, 123. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  3. Dhanaraju, M.; Chenniappan, P.; Ramalingam, K.; Pazhanivelan, S.; Kaliaperumal, R. Smart Farming: Internet of Things (IoT)-Based Sustainable Agriculture. Agriculture 2022, 12, 1745. [Google Scholar] [CrossRef]
  4. Mane, A.; Mane, A.; Dhake, P.; Kale, S. Smart Intrusion Detection System for Crop Protection. Int. Res. J. Eng. Technol. IRJET 2022, 9, 2921–2925. [Google Scholar]
  5. Kommineni, M.; Lavanya, M.; Vardhan, V.H. Agricultural farms utilizing computer vision (ai) and machine learning techniques for animal detection and alarm systems. J. Pharm. Negat. Results 2022, 13, 3292–3300. [Google Scholar]
  6. Yadahalli, S.; Parmar, A.; Deshpande, A. Smart intrusion detection system for crop protection by using Arduino. In Proceedings of the 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 15–17 July 2020; pp. 405–408. [Google Scholar]
  7. Geetha, D.; Monisha, S.P.; Oviya, J.; Sonia, G. Human and Animal Movement Detection in Agricultural Fields. SSRG Int. J. Comput. Sci. Eng. 2019, 6, 15–18. [Google Scholar]
  8. Andavarapu, N.; Vatsavayi, V.K. Wild-animal recognition in agriculture farms using W-COHOG for agro-security. Int. J. Comput. Intell. Res. 2017, 13, 2247–2257. [Google Scholar] [CrossRef]
  9. Gogoi, M. Protection of Crops From Animals Using Intelligent Surveillance System. J. Appl. Fundam. Sci. 2015, 1, 200. [Google Scholar]
  10. Christiansen, P.; Nielsen, L.N.; Steen, K.A.; Jørgensen, R.N.; Karstoft, H. DeepAnomaly: Combining background subtraction and deep learning for detecting obstacles and anomalies in an agricultural field. Sensors 2016, 16, 1904. [Google Scholar] [CrossRef]
  11. Kragh, M.; Jørgensen, R.N.; Pedersen, H. Object detection and terrain classification in agricultural fields using 3D lidar data. In International Conference on Computer Vision Systems; Springer International Publishing: Cham, Switzerland, 2015; pp. 188–197. [Google Scholar]
  12. Quy, V.K.; Hau, N.V.; Anh, D.V.; Quy, N.M.; Ban, N.T.; Lanza, S.; Randazzo, G.; Muzirafuti, A. IoT-enabled smart agriculture: Architecture, applications, and challenges. Appl. Sci. 2022, 12, 3396. [Google Scholar] [CrossRef]
  13. AlZubi, A.A.; Galyna, K. Artificial Intelligence and Internet of Things for Sustainable Farming and Smart Agriculture. IEEE Access 2023, 11, 78686–78692. [Google Scholar] [CrossRef]
  14. Lee, S.; Ahn, H.; Seo, J.; Chung, Y.; Park, D.; Pan, S. Practical monitoring of undergrown pigs for IoT-based large-scale smart farm. IEEE Access 2019, 7, 173796–173810. [Google Scholar] [CrossRef]
  15. Vangala, A.; Das, A.K.; Chamola, V.; Korotaev, V.; Rodrigues, J.J. Security in IoT-enabled smart agriculture: Architecture, security solutions and challenges. Clust. Comput. 2023, 26, 879–902. [Google Scholar] [CrossRef]
  16. Demestichas, K.; Peppes, N.; Alexakis, T. Survey on security threats in agricultural IoT and smart farming. Sensors 2020, 20, 6458. [Google Scholar] [CrossRef]
  17. Elbeheiry, N.; Balog, R.S. Technologies driving the shift to smart farming: A review. IEEE Sens. J. 2022, 23, 1752–1769. [Google Scholar] [CrossRef]
  18. Ayaz, M.; Ammad-Uddin, M.; Sharif, Z.; Mansour, A.; Aggoune, E.H.M. Internet-of-Things (IoT)-based smart agriculture: Toward making the fields talk. IEEE Access 2019, 7, 129551–129583. [Google Scholar] [CrossRef]
  19. Hafeez, A.; Husain, M.A.; Singh, S.P.; Chauhan, A.; Khan, M.T.; Kumar, N.; Chauhan, A.; Soni, S.K. Implementation of drone technology for farm monitoring & pesticide spraying: A review. Inf. Process. Agric. 2022, 10, 192–203. [Google Scholar]
  20. de Araujo Zanella, A.R.; da Silva, E.; Albini, L.C.P. Security challenges to smart agriculture: Current state, key issues, and future directions. Array 2020, 8, 100048. [Google Scholar] [CrossRef]
  21. Shukla, A.; Jain, A. Smart Automated Farming System using IOT and Solar Panel. Sci. Technol. J. 2019, 7, 22–32. [Google Scholar] [CrossRef]
  22. Panjaitan, S.D.; Dewi, Y.S.K.; Hendri, M.I.; Wicaksono, R.A.; Priyatman, H. A Drone Technology Implementation Approach to Conventional Paddy Fields Application. IEEE Access 2022, 10, 120650–120658. [Google Scholar] [CrossRef]
  23. Pagano, A.; Croce, D.; Tinnirello, I.; Vitale, G. A Survey on LoRa for Smart Agriculture: Current Trends and Future Perspectives. IEEE Internet Things J. 2022, 10, 3664–3679. [Google Scholar] [CrossRef]
  24. Hong, S.J.; Park, S.; Lee, C.H.; Kim, S.; Roh, S.W.; Nurhisna, N.I.; Kim, G. Application of X-ray imaging and convolutional neural networks in the prediction of tomato seed viability. IEEE Access 2023, 11, 38061–38071. [Google Scholar] [CrossRef]
  25. Deshpande, A.V. Design and implementation of an intelligent security system for farm protection from wild animals. Int. J. Sci. Res. ISSN Online 2016, 5, 2319–7064. [Google Scholar]
  26. Pandey, S.; Bajracharya, S.B. Crop protection and its effectiveness against wildlife: A case study of two villages of Shivapuri National Park, Nepal. Nepal J. Sci. Technol. 2015, 16, 1–10. [Google Scholar] [CrossRef]
  27. Chang, B.R.; Tsai, H.F.; Hsieh, C.W. Accelerating the Response of Self-Driving Control by Using Rapid Object Detection and Steering Angle Prediction. Electronics 2023, 12, 2161. [Google Scholar] [CrossRef]
  28. Available online: https://github.com/ultralytics/ultralytics/issues/189 (accessed on 2 November 2023).
  29. Rao, K.; Maikhuri, R.; Nautiyal, S.; Saxena, K.G. Crop damage and livestock depredation by wildlife: A case study from Nanda Devi biosphere reserve, India. J. Environ. Manag. 2002, 66, 317–327. [Google Scholar] [CrossRef]
  30. Bavane, V.; Raut, A.; Sonune, S.; Bawane, A.; Jawandhiya, P. Protection of crops from wild animals using Intelligent Surveillance System. Int. J. Res. Advent Technol. IJRAT 2018, 10, 2321–9637. [Google Scholar]
  31. Vigneshwar, R.; Maheswari, R. Development of embedded based system to monitor elephant intrusion in forest border areas using internet of things. Int. J. Eng. Res. 2016, 5, 594–598. [Google Scholar]
  32. Raghuvanshi, A.; Singh, U.K.; Sajja, G.S.; Pallathadka, H.; Asenso, E.; Kamal, M.; Singh, A.; Phasinam, K. Intrusion detection using machine learning for risk mitigation in IoT-enabled smart irrigation in smart farming. J. Food Qual. 2022, 2022, 3955514. [Google Scholar] [CrossRef]
  33. Roy, B.; Cheung, H. A deep learning approach for intrusion detection in internet of things using bi-directional long short-term memory recurrent neural network. In Proceedings of the 28th International Telecommunication Networks and Applications Conference, Sydney, NSW, Australia, 21–23 November 2018; pp. 1–6. [Google Scholar]
  34. Le, H.V.; Ngo, Q.-D.; Le, V.-H. Iot botnet detection using system call graphs and one-class CNN classification. Int. J. Innov. Technol. Explor. Eng. 2019, 8, 937–942. [Google Scholar] [CrossRef]
Figure 1. Animal intrusion in agricultural fields.
Figure 2. Proposed methodology of the system.
Figure 3. Architectural diagram of intrusion detection using an SFSS.
Figure 4. Sample images from primary dataset.
Figure 5. Sample images from secondary dataset.
Figure 6. Sample images from real farm dataset.
Figure 7. Model selection.
Figure 8. Architecture of YOLOv7 [27].
Figure 9. Architecture of YOLOv8 [28].
Figure 10. Design of the system’s hardware.
Figure 11. Circuit diagram of the hardware system.
Figure 12. Training the ML models.
Figure 13. Comparative analysis of detection abilities of different models on test dataset.
Figure 14. Confusion matrices of the performance of the models on the test dataset.
Figure 15. Comparative analysis of detection capabilities of different models on secondary dataset.
Figure 16. Confusion matrices of the performance of the models on the secondary dataset.
Figure 17. Comparative analysis of detection capabilities of different models on the farm dataset.
Figure 18. Real-world implementation of the hardware.
Figure 19. Confusion matrices of the performance of the models on the real farm dataset.
Figure 20. Detection using Yolo V7 upon ESP32 transmission.
Figure 21. Alert messages sent to owner.
Figure 22. Evaluation of the performance of Yolo V8 on challenging images (blurry, varied weather, and different lighting conditions).
Table 1. Summary of research contributions of other studies.

| Authors | Approach | Features | Contribution |
|---|---|---|---|
| Dhake et al., 2022 [4] | Sensor technology (PIR and ultrasonic) | Motion detection, intruder classification | Automated agricultural security system with real-time alerts and intrusion categorization. |
| Vardhan et al., 2022 [5] | Computer vision and AI | Animal detection, auditory alarm, notifications | AI-powered wildlife intrusion detection system with real-time alerts and animal identification. |
| Yadahalli et al., 2020 [6] | Multi-sensor system (PIR, ultrasonic, camera) | Intrusion detection, visual and auditory alerts | Comprehensive agricultural security framework with intruder categorization and real-time notifications. |
| Geetha et al., 2019 [7] | Sensor technology (PIR, LDR, flame sensors) and IoT | Animal detection, auditory and visual alerts, forest fire detection | Smart farm protection system with diverse sensor capabilities, real-time alerts, and additional features. |
| Andavarapu et al., 2017 [8] | Computer vision (W-CoHOG) | Animal detection, feature extraction, classification | Computer vision-based wildlife intrusion detection system with improved feature extraction and accuracy. |
| Gogoi et al., 2015 [9] | Image processing (SIFT) | Animal detection, object identification | Hybrid approach for animal identification in agricultural fields using image processing and SIFT. |
| Nielsen et al., 2016 [10] | Deep learning and anomaly detection (DeepAnomaly) | Anomaly detection, human recognition | Deep learning-based system for accurate anomaly detection in agricultural areas, surpassing RCNN. |
| Kragh et al., 2015 [11] | Support vector machine | Object recognition and terrain classification | High overall classification accuracy for identifying various objects and vegetation in agricultural settings. |
| Lanza et al., 2022 [12] | Content analysis (2017–2022) | IoT in smart agriculture and architecture applications | Guidelines for IoT use, IoT’s importance, and the categorization of its applications, benefits, challenges, and future research directions. |
| Galyna et al., 2023 [13] | IoT and AI framework | IoT building blocks for the creation of smart sustainable agriculture platforms | Real-time weather data collection enabled by smart sensors. |
| Lee et al., 2019 [14] | Deep learning-based computer vision | Undergrown pig detection | Early warning system for undergrown pigs, combining image processing, deep learning, and real-time detection. |
| Vangala et al., 2023 [15] | Analysis of authentication and access control protocols | Security in smart agriculture | Proposed independent architecture, security requirements, threat model, attacks, and protocol performance. |
| Peppes et al., 2020 [16] | Analysis of ICT’s function in agriculture, focusing on new threats and weaknesses | IoT in smart agriculture and architecture applications | ICT breakthroughs, approaches, and mitigation in agriculture, with emphasis on their benefits and potential concerns. |
| Balog et al., 2023 [17] | Content analysis (2015–2021) | IoT in smart agriculture and architecture applications | Describes common SF practices, problems, and solutions while classifying contributions into key SF technologies and research areas. |
| Ayaz et al., 2019 [18] | Analysis of wireless sensors and IoT in agriculture | IoT applications, wireless sensors, UAVs | Examines IoT use in agriculture, sensors, and UAVs, and the potential benefits for precision agriculture. |
| Hafeez et al., 2022 [19] | Drone technology | Aerial monitoring, data collection, precision agriculture | Drones offer solutions to outdated farming methods, enable crop assessments, and improve crop management. Discusses advancements in drone technology for precision agriculture. |
| Araujo Zanella et al., 2020 [20] | Smart agriculture | IoT, big data, precision agriculture | Explores security challenges in Agriculture 4.0 and proposes solutions. Highlights the importance of security, computational capabilities, and the role of edge computing. |
| Shukla et al., 2019 [21] | Eco-friendly farming solution | IoT, real-time monitoring | Proposes an electronically operated system for Indian farming, addressing water problems, humidity, temperature control, electricity supplies, and lighting. |
| Panjaitan et al., 2022 [22] | Drones for spraying | UAVs, precision agriculture | Discusses drone implementation in paddy fields, assesses drone performance, and highlights the impact of drone-based spraying. |
| Pagano et al., 2022 [23] | LoRa/LoRaWAN in agriculture | IoT, LoRaWAN, wireless connectivity | Focuses on LoRa/LoRaWAN technology in agriculture, addressing challenges and suggesting solutions. Highlights the role of machine learning, AI, and edge computing. |
| Hong et al., 2023 [24] | Seed viability prediction | X-ray imaging, deep learning | Develops viability prediction technologies for tomato seeds using X-ray imaging. Achieves high accuracy in distinguishing between viable and non-viable seeds. |
Table 2. Comparison of object detection models.

| Model | Architecture | Detection Speed | Detection Accuracy | Training Dataset Size | Hardware Compatibility |
|---|---|---|---|---|---|
| Inception | Traditional CNN | Moderate | High | Large | GPU |
| Xception | Depthwise Separable Convolution | High | High | Large | GPU |
| AlexNet | Standard CNN | Moderate | Moderate | Moderate | CPU/GPU |
| VGG16 | Standard CNN | Moderate | High | Large | GPU |
| YOLO models | Single-Shot Detection | High | High | Moderate | GPU |
Table 3. Performance measures of different models on the test dataset (%).

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Inception | 96.96 | 96.89 | 97 | 96.77 |
| Xception | 96.10 | 96.42 | 96.1 | 95.87 |
| VGG16 | 95.67 | 95.87 | 95.67 | 95.29 |
| AlexNet | 80.51 | 76.66 | 80.51 | 78.02 |
| YOLOv6 | 94.41 | 95.37 | 94.34 | 94.85 |
| YOLOv7 | 98.15 | 98.38 | 98.07 | 98.22 |
| YOLOv8 | 99.10 | 99.21 | 99.05 | 99.12 |
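For reference, scores of the kind reported in Tables 3–5 can be derived from a model’s predictions and the corresponding confusion matrices (Figures 14, 16 and 19). The snippet below is a minimal sketch using scikit-learn; the weighted averaging over the animal classes and the toy labels are assumptions for illustration, since the paper does not state which averaging scheme was used.

```python
# Minimal sketch: deriving accuracy, precision, recall, and F1 from predictions.
# Assumption (not stated in the paper): class-wise scores are combined with a
# weighted average.
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)


def summarise(y_true, y_pred):
    """Return the four table metrics (as percentages) plus the confusion matrix."""
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0)
    return {
        "accuracy": 100 * acc,
        "precision": 100 * prec,
        "recall": 100 * rec,
        "f1_score": 100 * f1,
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }


# Toy example with three illustrative classes.
y_true = ["boar", "deer", "monkey", "boar", "deer", "monkey"]
y_pred = ["boar", "deer", "monkey", "deer", "deer", "monkey"]
print(summarise(y_true, y_pred))
```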
Table 4. Performance measures of different models on the secondary dataset (%).

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Inception | 96 | 91 | 90 | 90 |
| Xception | 89 | 90 | 89 | 88 |
| VGG16 | 89 | 89 | 89 | 88 |
| AlexNet | 68 | 63 | 68 | 62 |
| YOLOv6 | 89 | 93 | 89 | 90 |
| YOLOv7 | 92 | 93 | 92 | 92 |
| YOLOv8 | 98 | 99 | 98 | 98 |
Table 5. Performance measures of different models on the real farm dataset (%).

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Inception | 78 | 77 | 77 | 76 |
| Xception | 77 | 67 | 77 | 72 |
| VGG16 | 74 | 64 | 74 | 68 |
| AlexNet | 72 | 64 | 72 | 67 |
| YOLOv6 | 86 | 89 | 86 | 87 |
| YOLOv7 | 95 | 96 | 95 | 95 |
| YOLOv8 | 99 | 98 | 98 | 98 |
Table 6. Comparison with previous research.

| Ref | Approach | Accuracy |
|---|---|---|
| Kragh et al. [11] | SVM | 91.60% |
| Raghuvanshi et al. [32] | SVM | 97% |
| Raghuvanshi et al. [32] | RF | 78% |
| Andavarapu et al. [8] | W-CoHOG | 93.30% |
| Roy et al. [33] | Bi-LSTM | 95% |
| H.V. Le et al. [34] | CNN | 97% |
| Proposed method | YOLOv8 | 99% |
