Towards Secure and Efficient Farming Using Self-Regulating Heterogeneous Federated Learning in Dynamic Network Conditions

Puppala, Sai; Sinha, Koushik

doi:10.3390/agriculture15090934


by Sai Puppala * and Koushik Sinha

Computer Science Department, Southern Illinois University, 1230 Lincoln Dr, Carbondale, IL 62901, USA

* Author to whom correspondence should be addressed.

Agriculture 2025, 15(9), 934; https://doi.org/10.3390/agriculture15090934

Submission received: 20 March 2025 / Revised: 17 April 2025 / Accepted: 21 April 2025 / Published: 25 April 2025

(This article belongs to the Section Digital Agriculture)


Abstract:

The advancement of precision agriculture increasingly depends on innovative technological solutions that optimize resource utilization and minimize environmental impact. This paper introduces a novel heterogeneous federated learning architecture specifically designed for intelligent agricultural systems, with a focus on combine tractors equipped with advanced nutrient and crop health sensors. Unlike conventional FL applications, our architecture uniquely addresses the challenges of communication efficiency, dynamic network conditions, and resource allocation in rural farming environments. By adopting a decentralized approach, we ensure that sensitive data remain localized, thereby enhancing security while facilitating effective collaboration among devices. The architecture promotes the formation of adaptive clusters based on operational capabilities and geographical proximity, optimizing communication between edge devices and a global server. Furthermore, we implement a robust checkpointing mechanism and a dynamic data transmission strategy, ensuring efficient model updates in the face of fluctuating network conditions. Through a comprehensive assessment of computational power, energy efficiency, and latency, our system intelligently classifies devices, significantly enhancing the overall efficiency of federated learning processes. This paper details the architecture, operational procedures, and evaluation methodologies, demonstrating how our approach has the potential to transform agricultural practices through data-driven decision-making and promote sustainable farming practices tailored to the unique challenges of the agricultural sector.

Keywords:

smart farming; heterogeneous federated learning; dynamic networks; artificial intelligence; wireless communications

1. Introduction

As the global population continues to rise, projected to reach nearly 10 billion by 2050, the demand for food production is anticipated to increase significantly, necessitating a transformation in agricultural practices [1]. Traditional farming methods often fall short in meeting these burgeoning demands while also grappling with environmental sustainability challenges, such as water scarcity, soil degradation, and climate change [2]. In this context, the adoption of precision agriculture, which utilizes data-driven insights to optimize crop management and resource utilization, has emerged as a viable solution [3]. Recent advancements in the Internet of Things (IoT) have facilitated the deployment of a vast array of sensors and connected devices on agricultural equipment, such as tractors, enabling real-time data collection and analysis [4]. These sensors can monitor critical factors such as soil moisture levels, nutrient content, and crop health, providing farmers with actionable insights to enhance productivity and reduce waste [5]. However, the centralized data management approaches commonly employed in these systems raise significant concerns regarding data privacy and security, particularly given the sensitive nature of agricultural data [6].

Federated learning offers a promising alternative to traditional centralized data management systems by enabling decentralized model training across multiple devices while preserving data privacy [7]. In this paradigm, individual devices locally compute model updates based on their respective datasets and share only these updates with a central server, which aggregates the results to improve the global model [8]. This approach not only mitigates the risks associated with data transmission but also enhances the overall efficiency of the learning process by leveraging the computational power of edge devices [9]. Given the diverse and often unpredictable nature of agricultural environments, dynamic network conditions pose a significant challenge to the effective implementation of federated learning systems. Rural areas frequently experience fluctuations in connectivity due to factors such as geographical obstacles, weather conditions, and network congestion [10]. These variable network conditions can lead to interruptions in data transmission, delays in model updates, and potential loss of valuable insights. Therefore, it is crucial to develop robust strategies that can adapt to these dynamic conditions, ensuring that the federated learning framework remains effective and reliable even in the face of connectivity challenges [7]. In this paper, we propose a novel federated learning architecture specifically designed for smart agriculture that addresses these dynamic network conditions. Our system integrates a network of combine tractors equipped with nutrient and crop health sensors to facilitate real-time monitoring and predictive analytics. By leveraging a combination of federated learning and advanced sensor technologies, our architecture aims to optimize resource allocation and improve crop management practices, ultimately enhancing agricultural productivity while minimizing environmental impact.

The proposed architecture employs a two-tiered approach, consisting of local clusters of tractors that collaboratively train machine learning models and a global server that aggregates model updates. This structure allows for efficient data processing and reduces communication overhead, particularly in the context of fluctuating network conditions. To further enhance data security, we utilize the Salsa20 encryption algorithm to ensure that sensitive model updates are transmitted securely between devices and the global server. Our contributions are threefold. First, we present a comprehensive system architecture that leverages federated learning to optimize agricultural practices while accommodating dynamic network conditions. Second, we develop and implement efficient mechanisms for cluster formation and model aggregation, tailored to the unique challenges of agricultural environments. Finally, we conduct extensive simulations and real-world experiments to validate the effectiveness of our approach in improving crop yield and resource utilization.

2. Related Works

The integration of federated learning into agriculture has garnered attention due to the increasing demand for food security and sustainable practices. Traditional machine learning relies on centralized data collection, posing challenges in data privacy [7]. Federated learning, as a decentralized framework, enables multiple devices to collaboratively learn a model without sharing raw data, making it ideal for agricultural settings [7]. Numerous studies have explored the applications of federated learning in agriculture, such as crop yield prediction and pest detection. For instance, Akhter et al. [11] developed a federated system that enhances crop yield predictions by aggregating data from multiple farms while maintaining data privacy. Similarly, Li et al. [12] created a framework to analyze pest population data across farms, aiding informed pest management decisions while preserving data confidentiality. These advancements highlight the potential of federated learning to address specific agricultural challenges; however, they largely overlook the complexities introduced by dynamic network conditions inherent in rural environments. Dynamic network conditions pose challenges for federated learning deployment, especially in rural areas with unreliable connectivity. Recently, Gao et al. [13] proposed an adaptive federated learning approach that incorporates network quality metrics to optimize model updates, highlighting the need to address network variability; however, their approach lacks efficiency and suffers from latency issues. Moreover, the concept of heterogeneous federated learning has yet to be fully realized in agricultural applications. Current literature often applies uniform models without considering the varied capabilities and operational contexts of different devices used in agricultural practices. Yang et al. [14] highlighted the scalability of federated learning systems in large agricultural cooperatives, demonstrating their adaptability to diverse data sources. However, existing frameworks frequently fail to accommodate the heterogeneous nature of agricultural data, which can vary significantly across different farms and equipment.

IoT devices play a crucial role in agriculture, enabling real-time monitoring of environmental conditions [4]. For instance, Kumar et al. [15] designed an IoT-based precision irrigation system that optimizes water usage through sensor data. Security and privacy concerns in federated learning are critical, particularly in agriculture. Hossain et al. [16] examined homomorphic encryption techniques, allowing computations on encrypted data without revealing sensitive information, thus enhancing data privacy. Several studies have identified challenges and opportunities in federated learning for agriculture. Yang et al. [17] surveyed applications and barriers to implementation, offering future research directions. Emerging research also emphasizes the use of machine learning for soil health monitoring. Patel et al. [18] implemented a federated learning framework utilizing soil sensor data to predict nutrient deficiencies and recommend precise fertilizer applications. To address dynamic network conditions, Kondaveeti et al. [19] proposed a federated learning model that adapts to varying connectivity among IoT devices, suggesting that adaptive algorithms can enhance system resilience. Furthermore, Dembani et al. [20] explored the integration of federated learning with renewable energy sources in agricultural IoT systems, highlighting the potential for sustainable practices while maintaining data integrity. Lastly, recent work by Thompson et al. [21] examined the role of hybrid federated learning models in optimizing agricultural yield predictions across diverse environmental conditions, emphasizing the need for adaptable frameworks in the face of climate variability.

3. Motivation

The agricultural sector faces unprecedented challenges, including the need for increased productivity to meet the demands of a growing global population while simultaneously addressing environmental sustainability concerns [5]. Traditional farming methods often lead to inefficient resource use and adverse ecological impacts due to over-fertilization and pesticide application. To combat these issues, there is a pressing need for intelligent systems that can provide real-time insights and recommendations based on precise data analytics. The motivation behind this research is to develop a federated learning framework that enables farmers to harness the power of data-driven decision-making without compromising the privacy of sensitive agricultural data. Current research on federated learning (FL) in agriculture reveals several critical gaps that hinder the effective application of this promising technology. Firstly, there is a significant absence of studies focusing on the dynamic network conditions characteristic of rural agricultural environments, where inconsistent connectivity can impede timely data transmission and model updates [13]. Existing frameworks often overlook the necessity for adaptive strategies that can operate seamlessly in such variable conditions. Additionally, while resource optimization is paramount in agriculture, most FL approaches fail to adequately address resource utilization optimization, neglecting the importance of assessing computational power, energy efficiency, and latency concerning diverse agricultural machinery [14]. This oversight can lead to inefficiencies and reduced performance in federated learning systems.

Decentralized data processing remains underexplored, as many implementations still rely on centralized architectures that expose sensitive agricultural data to potential breaches [16]. This lack of focus on data security undermines the very privacy that FL aims to enhance. Lastly, the concept of heterogeneous federated learning has yet to be fully realized in agricultural applications; existing research tends to apply uniform models without considering the varied capabilities and operational contexts of different devices, such as tractors equipped with advanced sensors [19]. Addressing these missing pieces is essential for advancing the integration of federated learning in precision agriculture and ensuring its practical utility in fostering sustainable farming practices. Through this work, we aspire to contribute to the transformation of modern agriculture into a more efficient, sustainable, and technologically advanced industry. To further substantiate our motivation, we have included corresponding references throughout this section that highlight key research findings and gaps, as well as charts that illustrate the current state of federated learning applications in agriculture.

4. Proposed Research Focus

Before delving into the intricacies of the system architecture and its operational workflow, it is essential to first comprehend the key issues we have identified and prioritized in our research. Understanding these foundational concerns will provide valuable context for the implementation of our work and elucidate the specific challenges we aim to address. By clearly outlining these focal points, we can better appreciate how our proposed solutions align with the unique needs of precision agriculture and the overarching goals of our study.

4.1. Data Privacy

Data privacy refers to the rights and expectations of individuals regarding the handling of their personal information. It encompasses the principles that govern how data are collected, stored, and shared. In the context of precision agriculture, data privacy is particularly crucial, as farmers’ sensitive information, such as crop health data (including the condition of crops, growth rates, disease prevalence, pesticide usage, and yield forecasts), soil nutrient levels (which give insight into the fertility and health of the land), and operational practices, can be exploited if not adequately protected. Our proposed federated learning architecture prioritizes data privacy by ensuring that sensitive data remain on local devices rather than being transmitted to a central server. This is achieved through a decentralized approach where only model updates are shared, thereby minimizing the exposure of raw data. By protecting farmers’ confidential information, we foster trust and encourage the adoption of advanced agricultural technologies.

4.2. Data Security

While data privacy focuses on the rights of individuals, data security encompasses the technical measures and practices implemented to safeguard data from unauthorized access, corruption, or theft. In agriculture, where data can be vulnerable to cyber threats, robust data security measures are essential. Our architecture incorporates advanced encryption techniques, specifically the Salsa20 algorithm, to secure model updates transmitted between edge devices and the global server. This encryption ensures that even if data are intercepted during transmission, they remain unreadable and protected from unauthorized access. Furthermore, implementing access controls and authentication protocols within the system enhances security by ensuring that only authorized devices can participate in model training, thus reducing the risk of malicious attacks.

4.3. Network-Related Issues

Network-related issues pertain to the various challenges associated with connectivity, particularly in rural agricultural areas where infrastructure may be inconsistent or limited. Fluctuations in network availability can significantly affect the reliability of data transmission and the timeliness of model updates. In our proposed architecture, we specifically address Dynamic Network Conditions by implementing adaptive strategies that allow the system to function effectively despite these challenges. For example, we utilize techniques such as data buffering, compression of model updates, and peer-to-peer communication among nearby devices to ensure that critical information is transmitted reliably and efficiently, even when connectivity is poor. This focus on addressing network-related issues enhances the robustness of our architecture, ensuring that data-driven decision-making can continue without interruption.
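The buffering and compression ideas mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `UpdateBuffer`, the JSON serialization, the use of `zlib`, the drop-oldest policy, and the `send` callback are all assumptions chosen for illustration.

```python
import json
import zlib
from collections import deque
from typing import Callable, Deque, List


class UpdateBuffer:
    """Holds compressed model updates until the link is usable again."""

    def __init__(self, max_items: int = 32) -> None:
        # Assumed policy: once full, the oldest update is dropped first.
        self._queue: Deque[bytes] = deque(maxlen=max_items)

    def add(self, weights: List[float]) -> None:
        # Serialize the weights and compress them before queuing.
        payload = json.dumps(weights).encode("utf-8")
        self._queue.append(zlib.compress(payload))

    def flush(self, send: Callable[[bytes], bool]) -> int:
        """Send queued updates in order; stop at the first failure.

        `send` is a hypothetical transport callback that returns True
        when the payload was delivered. Returns the number sent.
        """
        sent = 0
        while self._queue:
            if not send(self._queue[0]):
                break
            self._queue.popleft()
            sent += 1
        return sent


def decode(blob: bytes) -> List[float]:
    """Inverse of the compression step, as a receiver might apply it."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

A tractor would call `add` after each local training round and `flush` whenever connectivity is judged adequate; updates that fail to send stay queued for the next attempt.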

4.4. Operational Efficiency

Operational efficiency refers to the capacity of the system to perform its intended functions in a manner that maximizes productivity while minimizing resource waste. In precision agriculture, achieving operational efficiency is vital for ensuring that resources—such as time, labor, and inputs—are utilized effectively. Our architecture enhances operational efficiency by intelligently classifying devices based on their computational power, energy consumption, and latency. This classification allows for optimized task allocation, ensuring that the most capable devices handle complex processing tasks, while less capable devices are assigned lighter workloads. By streamlining operations and reducing delays in model updates, our system contributes to a more productive agricultural environment where farmers can respond quickly to changing conditions.

4.5. Resource Utilization

Resource utilization focuses on the effective and efficient use of available resources, including computational power, energy efficiency, and network bandwidth. In agricultural applications, where devices may operate under varying conditions and constraints, optimizing resource utilization is essential for maintaining system performance. Our architecture takes a comprehensive approach to assess and classify devices based on these parameters. For instance, by evaluating each device’s computational capabilities and energy efficiency, we can allocate tasks strategically, ensuring that resources are used judiciously. This optimization not only enhances the overall efficiency of the federated learning process but also reduces operational costs for farmers, enabling them to maximize their returns while minimizing environmental impacts.

5. System Architecture

The primary objective of our innovative agricultural system is to deliver an integrated, energy-efficient, and environmentally conscious solution tailored specifically for the diverse needs of farming fields. To achieve this ambitious goal, we have meticulously designed and implemented a sophisticated architectural framework that incorporates two distinct types of sensors strategically positioned within the tractor. It is important to note that the number of sensors deployed on a tractor can vary significantly, ranging from a few hundred to several thousand. This variation is largely contingent upon the size of the tractor and its intended applications. For the purpose of our illustration, we have chosen to focus on combine tractors, which are equipped with two crucial types of sensors: nutrient sensors and crop health sensors. The overarching aim of our system is to leverage these advanced sensors to accurately predict the health of the crops being cultivated. By utilizing data gathered from the nutrient sensors, we can make informed decisions regarding the application of essential fertilizers and the targeted spraying of pesticides. This capability not only enhances the efficiency of agricultural practices but also promotes sustainable farming by ensuring that resources are utilized judiciously and effectively, ultimately contributing to a more productive and environmentally friendly farming ecosystem. The entire system has been meticulously engineered to ensure implementation in a manner that prioritizes both security and efficiency, distinguishing it from traditional methodologies that predominantly depend on a centralized global server. These conventional approaches often entail transmitting vital or sensitive data over the network, which can pose significant risks to data integrity and privacy.

In contrast, our innovative solution employs a federated learning approach. This paradigm shift means that, rather than transmitting raw sensor data across the network, we focus on sending only the trained model weights. This not only enhances data security but also mitigates the risks associated with transferring sensitive information, as the model weights encapsulate the learned insights without exposing the underlying data. Furthermore, we have conducted an in-depth exploration of the mechanisms involved in securely transmitting these model weights, which is elaborated upon in the subsequent sections.

Let us begin by exploring the foundational elements at the cluster level and gradually advance our understanding to encompass the various facets of the entire system.

5.1. System Architecture Components

Figure 1 shows the main components of the system. We first give a brief overview of each component and then examine them in detail as we walk through the operational workflow of the system shown in Figure 2.

5.1.1. Edge Devices (Tractors)

Equipped with nutrient and crop health sensors, these tractors collect critical data regarding soil and crop conditions. The data are processed locally to maintain privacy, with only model updates transmitted to the global server. The proposed architecture outlines several distinct phases in the lifecycle of the tractor, specifically Orphan, New, and Cluster-Node. In the following sections, we will provide a detailed explanation of each phase and the corresponding workflow, highlighting how tractors transition through these stages and their roles within the overall system.

5.1.2. Local Clusters

Tractors are organized into clusters based on operational capabilities and proximity. This clustering facilitates collaborative model training, enhancing the efficiency of the learning process while reducing communication overhead. The primary objective of forming clusters within the system is to minimize the overall communication to the global server. Each cluster is designated a leader node, which is responsible for aggregating and sending model updates to the global server. This approach significantly reduces the communication load, as individual tractors do not need to transmit their updates directly to the global server. Instead, they communicate with the leader node, which consolidates the information and efficiently transmits it on behalf of the entire cluster.

5.1.3. Global Server

The global server plays a pivotal role in aggregating model updates from local clusters. By employing the Federated Averaging (FedAvg) method (the concept of federated learning is elaborated upon in the sections that follow), it combines insights from multiple tractors while ensuring that sensitive data remain localized, thus maintaining data integrity and confidentiality.
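For readers unfamiliar with FedAvg, the following is a minimal sketch of the aggregation step as it is commonly described: a weighted average of parameter vectors, with each contribution weighted by its number of local training samples. The function name `fed_avg`, the flat-list representation of weights, and the assumption that each cluster leader reports a sample count are illustrative and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def fed_avg(updates: Sequence[Tuple[List[float], int]]) -> List[float]:
    """Average parameter vectors, weighting each by its local sample count.

    Each update is (weights, num_samples). Reporting a sample count
    alongside the weights is an assumption of this sketch.
    """
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("at least one update must carry samples")
    dim = len(updates[0][0])
    averaged = [0.0] * dim
    for weights, n in updates:
        if len(weights) != dim:
            raise ValueError("all updates must have the same shape")
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged
```

Because only parameter vectors and counts are exchanged, the raw sensor data never leave the tractors in this scheme.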

5.1.4. Data Transmission Mechanisms

This phase of the process occurs after the federated learning has been initiated, during which each tractor generates a new model. At this point, it is essential for these newly created models to be transmitted to the global server. To secure data exchange (here the data represent the model weights emitted from the tractor), the Salsa20 encryption algorithm is utilized. This lightweight encryption ensures that model updates are transmitted safely between the edge devices and the global server, preventing unauthorized access to sensitive information.

5.1.5. Dynamic Network Adaptation

This step is of paramount importance in the context of real-world applications, as it directly impacts the effectiveness and reliability of our system. To address the inherent variabilities in network conditions, we have carefully considered various cellular network types. This strategic approach ensures that our system can adapt to different connectivity scenarios, thereby enhancing its robustness and overall performance in practical environments. The proposed architecture incorporates adaptive data transmission strategies that respond to varying network conditions. The system assesses network quality using the Network Quality Index (NQI) to optimize communication strategies, ensuring timely model updates.

Let us begin by exploring the workflow of the system architecture, focusing specifically on the mechanisms of data transmission that have been implemented. This examination will provide insight into how data flow through the system, highlighting the processes that ensure efficient and secure communication between components.

5.2. Initial Cluster Formation

When we refer to a cluster, we are describing a grouping of tractors or IoT sensors that collectively manage specific models for predictive analysis (Algorithm 1). In this context, a single machine learning model is shared among all the tractors within the cluster (Figure 3).

Algorithm 1 Initial Cluster Formation for Predictive Analysis

Input:
  Set of tractors T (where $|T| = 10$)
  Set of sensors S
  List of clusters C
Output:
  Clusters $C_{1}$ (Nutrient Prediction) and $C_{2}$ (Crop Health Prediction)

procedure InitialClusterFormation
  // Step 1: Initialize empty clusters
  Create empty clusters $C_{1}$ and $C_{2}$
  // Step 2: Gather metadata for each sensor
  for each sensor $s \in S$ do
    Gather metadata for sensor s
    Store metadata in structure M // M contains attributes of each sensor
  end for
  // Step 3: Calculate metadata scores for attributes
  for each attribute a in metadata M do
    Calculate score $ϕ$ using:
      $ϕ_{1} = a_{7} \cdot 35^{6} + a_{6} \cdot 35^{5} + a_{5} \cdot 35^{4} + a_{4} \cdot 35^{3} + a_{3} \cdot 35^{2} + a_{2} \cdot 35^{1} + a_{1} \cdot 35^{0}$
    Calculate metadata score $M_{i}$ for each attribute:
      $M_{i} = ϕ + w_{i} \cdot C_{type}$ // Incorporate weight and cluster type
  end for
  // Step 4: Calculate total metadata score
  Calculate total metadata score M:
    $M = \sum_{i = 1}^{k} M_{i}$ // Sum of all attribute scores
  // Step 5: Evaluate each tractor’s capabilities
  for each tractor $t \in T$ do
    Evaluate computational power $x_{p}^{'}$, energy efficiency $x_{e}^{'}$, latency $x_{l}^{'}$, concurrency level $x_{c}^{'}$
    Scale values using min-max normalization:
      $x^{'} = a + \frac{(x - \min(x)) (b - a)}{\max(x) - \min(x)}$ // Normalize to range [a, b]
    Calculate consumption ability score $ρ$:
      $ρ = w_{1} \cdot x_{p}^{'} + w_{2} \cdot x_{e}^{'} + w_{3} \cdot x_{l}^{'} + w_{4} \cdot x_{c}^{'}$
  end for
  // Step 6: Prepare data for transmission
  Create array $χ$ to send to the global server:
    $χ = [ρ, M, λ, θ]$ // Contains scores and other necessary parameters
  // Step 7: Encrypt data before sending
  Encrypt $χ$ using the Salsa20 algorithm [22]
  // Step 8: Send encrypted data to the global server
  Send $χ$ to the global server
end procedure

It is essential for us to comprehend how these clusters are established and how they interact with a global server [23]. For instance, consider a scenario where we have approximately 10 tractors distributed across various locations in a farmland setting. Our objective is to organize these tractors into two distinct clusters: one dedicated to Nutrient Sensor Prediction and the other focused on Crop Health Sensor Prediction (notably, each tractor may contain both types of sensors) [24]. Now, let us explore the concepts and methodologies involved in forming these two clusters using our example of the 10 tractors.

5.2.1. Alpha Schema-Based Scoring

The Alpha Score aims to evaluate and identify the datasets linked to various machine learning models, enabling effective differentiation between them. This is crucial for decision-making regarding which models to aggregate at the global server, especially in a heterogeneous environment with multiple distinct models. Accurate model type identification is essential, as improper aggregations—combining incompatible models with different data distributions—can significantly reduce overall system accuracy. Such declines in predictive performance can undermine our federated learning approach and affect the reliability of insights from aggregated models. Thus, the Alpha Score is a vital metric for ensuring the integrity and effectiveness of the aggregation process at the global server, optimizing performance across our machine learning framework.

The initial and most critical step in our process is to identify and define the training dataset [25]. Each of the two sensors utilized in this analysis—the Nutrient Sensor and the Crop Health Sensor—possesses distinct metadata and corresponding columns that are integral to their operation and data collection. The metadata encompass essential information about the data, including attributes like measurement units, data types, and contextual details relevant to the specific sensor [26]. Meanwhile, the columns represent the actual data points collected, each corresponding to various parameters that the sensors are designed to monitor. Given the uniqueness of the metadata and the structure of the columns associated with each sensor, it is imperative to select an appropriate machine learning model that aligns with the specific characteristics and requirements of the data being analyzed [27]. To reach a well-informed conclusion regarding the most suitable machine learning model, we will adhere to the strategy outlined below. This strategy will guide our analysis and ensure that we consider all relevant factors in our decision-making process.

We initially calculate scores at the column or feature level using an alpha schema-based scoring approach [28]. To ensure consistent scoring for identical attributes, columns are arranged in alphabetical order. This ordering is crucial to avoid discrepancies in feature scoring.

Given a feature attribute represented by a string $a_{7}, a_{6}, \dots, a_{1}$, the formula for calculating its score is as follows:

$ϕ_{1} = a_{7} \cdot 35^{6} + a_{6} \cdot 35^{5} + a_{5} \cdot 35^{4} + a_{4} \cdot 35^{3} + a_{3} \cdot 35^{2} + a_{2} \cdot 35^{1} + a_{1} \cdot 35^{0}$ (1)

where each character in the attribute name is assigned a numeric value based on its position in the English alphabet (A = 0, B = 1, …, Z = 25) [29]. We chose 35 as the base because it accommodates not only the letters of the alphabet but also the digits 0 to 9 [30]. Special symbols in the metadata are not currently considered.
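As a rough illustration of Equation (1), the sketch below scores an attribute name as a base-35 positional number with A = 0 through Z = 25. The function names, the case-insensitive handling, and the choice to treat the first character as the most significant digit are assumptions. The text does not say how digits are mapped to values, so this sketch accepts letters only and rejects everything else.

```python
BASE = 35  # base stated in the text


def char_value(ch: str) -> int:
    """Map a single ASCII letter to its alphabet position (A = 0, ..., Z = 25)."""
    if len(ch) != 1 or not (ch.isascii() and ch.isalpha()):
        # Digits and special symbols are left out: their mapping is not
        # given in the source, so this sketch does not guess one.
        raise ValueError(f"unsupported character: {ch!r}")
    return ord(ch.upper()) - ord("A")


def alpha_score(name: str) -> int:
    """Positional base-35 score of an attribute name.

    Assumption: the first character is the most significant digit.
    """
    score = 0
    for ch in name:
        score = score * BASE + char_value(ch)
    return score
```

Because the score depends only on the characters of the name, two devices that expose the same column name obtain the same value, which is the property the scoring relies on.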

Subsequently, the metadata score for the first column is obtained by combining $ϕ_{1}$ with the column type, weighted by $w_{n}$, as illustrated below [31].

$M_{1} = ϕ_{1} + w_{1} \cdot C_{type}$ (2)

$M = \sum_{n = 1}^{k} M_{n}$ (3)

Note that $C_{type}$ represents the data type of the column and is associated with a unique value [32]. Finally, after calculating the metadata score for each column, we sum the scores to obtain the final score [33].
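Equations (2) and (3) can be sketched in a few lines of Python. The numeric codes in `TYPE_CODES`, the function names, and the tuple layout are hypothetical: the text only states that each data type is associated with a unique value and does not give the values or the weights.

```python
from typing import Iterable, Tuple

# Hypothetical numeric codes for column data types; the source only says
# that each type is associated with a unique value.
TYPE_CODES = {"int": 1, "float": 2, "str": 3}


def column_score(phi: int, weight: float, col_type: str) -> float:
    """Equation (2): M_n = phi_n + w_n * C_type."""
    return phi + weight * TYPE_CODES[col_type]


def total_metadata_score(columns: Iterable[Tuple[int, float, str]]) -> float:
    """Equation (3): sum of the per-column scores.

    Each column is given as (phi, weight, type name).
    """
    return sum(column_score(phi, w, t) for phi, w, t in columns)
```

For example, a dataset with one `float` column scored 35 (weight 0.5) and one `int` column scored 1 (weight 1.0) would receive a total of 38.0 under these assumed codes.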

5.2.2. Consumption Ability

Another crucial aspect of our system is its capacity for consumption. The Consumption Ability Score acts as a fundamental metric that the global server utilizes to cluster tractors or edge nodes within a federated learning environment [34]. This score is determined by evaluating several key factors, including computational power, energy efficiency, latency, and multitasking capabilities [35]. By systematically assessing these attributes, the Consumption Ability Score allows the global server to classify devices based on their performance and operational efficiency. This classification is essential for optimizing task allocation, as it ensures that devices are grouped and deployed according to their specific strengths and capabilities [36]. As a result, this strategic clustering not only boosts the overall efficiency of the federated learning processes but also enhances their effectiveness [37]. By capitalizing on the distinct characteristics of each device, our system is able to optimize resource utilization, reduce delays, and ultimately achieve superior outcomes in collaborative learning across a network of distributed edge devices [38].

Each attribute is first rescaled with min-max normalization, which transforms the original value x into a new value $x^{'}$ that lies within the range [a, b]. This normalization allows for a fair comparison of different attributes across devices, ensuring that each metric is scaled appropriately before being integrated into the Consumption Ability Score [39] and providing a balanced evaluation despite differences in operational conditions and hardware specifications. The formula is expressed as follows:

x^{'} = a + \frac{(x - min (x)) (b - a)}{max (x) - min (x)}

(4)

Because all values are mapped to a uniform range, the metrics are comparable regardless of their original scales [40]. Following this initial scaling, the Consumption Ability Score for an edge device is determined by calculating a weighted sum of the scaled values [41]. This integrates the metrics into a single performance index that gives a holistic view of the device’s capabilities and informs resource allocation and task assignment.

ρ = w_{1} \cdot x_{p}^{'} + w_{2} \cdot x_{e}^{'} + w_{3} \cdot x_{l}^{'} + w_{4} \cdot x_{c}^{'}

(5)

Here, $x_{p}^{'}$ represents Computational Power, $x_{e}^{'}$ represents Energy Efficiency, $x_{l}^{'}$ indicates Latency, and $x_{c}^{'}$ reflects Concurrency Level [42]. Each metric, scaled and assigned a weight $w_{i}$, quantifies each device’s operational efficiency and suitability to facilitate clustering.
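As a minimal sketch of Equations (4) and (5), the following Python functions rescale each metric across a set of devices and then form the weighted sum. The dictionary keys, the default range [0, 1], and the handling of the degenerate case where all devices report the same value are assumptions of this example. The text does not say whether latency is inverted before weighting, so the sketch applies the formula as written.

```python
def min_max_scale(values, a=0.0, b=1.0):
    """Rescale a list of raw values into [a, b] (Equation (4))."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All devices report the same value, so the formula is undefined;
        # returning the lower bound is a choice made for this sketch.
        return [a for _ in values]
    return [a + (x - lo) * (b - a) / (hi - lo) for x in values]


def consumption_scores(devices, weights):
    """Weighted sum of scaled metrics for every device (Equation (5)).

    devices is a list of dicts and weights a dict, both keyed by the
    hypothetical metric names below.
    """
    metrics = ["power", "energy", "latency", "concurrency"]
    scaled = {m: min_max_scale([d[m] for d in devices]) for m in metrics}
    return [
        sum(weights[m] * scaled[m][i] for m in metrics)
        for i in range(len(devices))
    ]
```

With equal weights of 0.25, a device that is best on every metric scores 1.0 and a device that is worst on every metric scores 0.0.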

5.2.3. Model Communications

Finally, we transmit the array of values to the global server. This array includes the final Alpha Schema-Based Score, the Consumption Ability Score, the latitude and longitude coordinates, and the model weights. Prior to sending these data to the global server, we encrypt them using the lightweight Salsa20 algorithm [22]. A detailed explanation of Salsa20 is provided in the following section.

χ = [ρ, M, λ, θ]

(6)

In this equation, $χ$ represents the final dataset that will be sent to the global server, comprising $ρ$, the Consumption Ability Score; M, the Alpha Schema-Based Score; $λ$, the latitude and longitude coordinates; and $θ$, the model weights [43]. Together, these elements form a comprehensive data package that will be dispatched to the global server.

5.3. Global Server Operations

At the global server, the first step is to decrypt the received message so that its contents can be accessed securely. The server then reads the value of M, the Alpha Schema-Based Score, and uses it to separate the nodes and their models into distinct groups with similar data. The second step is to compute the geographical distance between nodes using their latitude and longitude coordinates, which is essential for understanding the spatial relationships between the edge nodes or tractors within our network. The distance between any two tractors or edge nodes is determined as described below.

5.3.1. Proximity Assessment

The proximity assessment process focuses on evaluating the geographical closeness of nodes to enhance data communication efficiency and foster computational collaboration among devices. By utilizing geographical coordinates, the global server is able to organize devices that are situated physically closer to one another. This approach not only helps in reducing latency but also enhances the overall speed and reliability of federated learning tasks. Furthermore, assessing proximity aids in minimizing network congestion and optimizing bandwidth utilization, both of which are critical for the scalability and performance of federated learning systems.

Once nodes with similar data are identified, the global server extracts the latitude and longitude coordinates from the decrypted data of the client nodes. It then employs a proximity assessment strategy to form clusters. The process begins by creating a circular area with a defined radius, which is used to identify nodes that are in close proximity to each other.

To ensure privacy while utilizing location data for geographical clustering, we introduce a modification to the coordinates by applying an offset. This adjustment effectively conceals the actual positions of the nodes:

proxy_{lat} = lat + \frac{d_{y}}{radius} \cdot \frac{180}{π},

(7)

proxy_{long} = long + \frac{d_{x}}{radius} \cdot \frac{180}{π} / cos (lat \cdot \frac{π}{180}),

(8)

where $d_{x}$ and $d_{y}$ denote the displacements in the x and y directions, respectively, used to create a virtual circular area around the nodes. This method enables the server to identify and group nodes that are geographically close together within the same cluster. To ensure an even distribution of nodes across clusters, we impose limits on inclusion based on proximity, which is calculated using the following equations:

a = {sin}^{2} (\frac{Δ ϕ}{2}) + cos (ϕ_{1}) cos (ϕ_{2}) {sin}^{2} (\frac{Δ λ}{2}),

(9)

c = 2 \cdot atan 2 (\sqrt{a}, \sqrt{1 - a}),

(10)

distance = Radius \cdot c .

(11)

Here, $ϕ_{1}$ and $ϕ_{2}$ are the latitudes of the two nodes in radians, $Δ ϕ$ and $Δ λ$ are the differences in latitude and longitude between them, and Radius is the radius of the Earth. This approach facilitates the creation of well-distributed clusters. For orphan nodes that are not initially assigned to any cluster, we use the same Haversine distance [44] to measure distances and form new groups, ensuring comprehensive coverage throughout the network.
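The coordinate offset of Equations (7) and (8) and the distance of Equations (9)–(11) translate directly into code. The sketch below assumes a mean Earth radius of 6371 km for "Radius" and displacements expressed in kilometres; both are choices of this example rather than values given in the text.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed value for "Radius"


def proxy_coordinates(lat, lon, dx_km, dy_km, radius=EARTH_RADIUS_KM):
    """Offset a position by (dx, dy), following Equations (7) and (8)."""
    proxy_lat = lat + (dy_km / radius) * (180.0 / math.pi)
    proxy_lon = lon + (dx_km / radius) * (180.0 / math.pi) / math.cos(math.radians(lat))
    return proxy_lat, proxy_lon


def haversine_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Great-circle distance of Equations (9)-(11); inputs in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return radius * c
```

As a consistency check, shifting a point 1 km north with `proxy_coordinates` and measuring the result with `haversine_km` gives a distance of about 1 km.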

5.3.2. Driver Selection

After the cluster formation process is complete, the next step is to identify the driver node within each cluster. This selection is based on the Consumption Ability Score, denoted as $ρ$. The global server evaluates the $ρ$-values associated with the nodes in each cluster to determine which node possesses the highest score. The driver node is crucial as it serves as the primary point of communication and coordination within the cluster. By selecting the node with the highest $ρ$-value, the global server ensures that the most capable and efficient node is designated to lead the cluster’s activities. This decision not only optimizes the overall performance of the cluster but also enhances the reliability and effectiveness of the federated learning process. The driver node is then responsible for sending the cluster’s model weight updates.
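Driver selection as described above amounts to taking the node with the largest score in each cluster. A minimal sketch follows, assuming each node is a dictionary with hypothetical `id` and `rho` fields. How ties are broken is not specified in the text; here the first node encountered wins.

```python
def select_drivers(clusters):
    """Return, for every non-empty cluster, the id of the node with the highest rho."""
    return {
        cluster_id: max(nodes, key=lambda node: node["rho"])["id"]
        for cluster_id, nodes in clusters.items()
        if nodes  # an empty cluster has no driver
    }
```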

We implement the FedAvg aggregation process at both the driver node level and the global server level. At the driver node, FedAvg aggregates the updates received from the participating clients within its cluster: each client performs local training on its dataset, and the driver node combines these updates into a consolidated model that reflects the collective learning of the cluster. The global server, in turn, uses FedAvg to combine the model updates received from the driver nodes. This two-level application of FedAvg ensures that the global model is informed by the learning of all participating devices, promoting a collaborative and decentralized approach to model training. The following section explains the principles of FedAvg and of federated learning as a whole.

5.3.3. Federated Learning

Federated learning is a decentralized approach to machine learning that enables multiple clients or devices to collaboratively train a shared model while keeping their data localized [7]. This method enhances privacy and security by ensuring that sensitive data do not leave the client’s device [9]. Instead of sending raw data to a central server, clients compute local updates based on their individual datasets and share only these updates with the server [45]. One of the most commonly used algorithms in federated learning is Federated Averaging (FedAvg) [7]. FedAvg aggregates the model updates from participating clients by taking a weighted average based on the size of their local datasets [9]. The representation of the FedAvg update rule is given by the following:

w^{t + 1} = w^{t} + \frac{1}{N} \sum_{k = 1}^{K} n_{k} Δ w_{k}

(12)

where $w^{t + 1}$ denotes the updated global model weights after round t, $w^{t}$ represents the current global model weights, K is the number of clients that have sent updates, $n_{k}$ is the size of the dataset for client k, $N = \sum_{k = 1}^{K} n_{k}$ is the total number of samples across those clients, so that the sum is an average weighted by dataset size, and $Δ w_{k}$ is the model update from client k.
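The update rule of Equation (12) can be sketched in plain Python as follows. The sketch takes N to be the total number of samples held by the reporting clients, which is the normalization that makes the sum a weighted average by dataset size; this reading, and the representation of weights as flat lists, are assumptions of the example.

```python
def fedavg(global_weights, client_updates):
    """Apply the FedAvg update of Equation (12).

    client_updates is a list of (n_k, delta_w_k) pairs, where delta_w_k
    has the same length as global_weights.
    """
    total_samples = sum(n_k for n_k, _ in client_updates)
    if total_samples == 0:
        return list(global_weights)  # nothing to aggregate this round
    new_weights = list(global_weights)
    for n_k, delta in client_updates:
        for j, d in enumerate(delta):
            new_weights[j] += n_k * d / total_samples
    return new_weights
```

The same function can serve both levels of aggregation: a driver node would call it on the updates from its cluster, and the global server on the updates forwarded by the driver nodes.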

Figure 4 illustrates the federated learning architecture of our proposed framework, emphasizing the roles of driver nodes, client nodes, and their communication with the global server. The figure also gives a succinct overview of the unique characteristics of the global server. We have highlighted the local aggregations and checkpoints of each driver node, underscoring their critical contribution to the primary operations within the clusters.

5.3.4. Orphan Equipment

Another critical aspect of our system concerns what happens when a driver node fails or when a new tractor wishes to participate in the federated learning process. In such scenarios, we classify the newly introduced tractors, and the tractors left without a driver node, as “orphan nodes”: devices that are not currently affiliated with any active cluster. When orphan nodes are detected, the clustering process is re-run for them. Each orphan node either joins an existing cluster to which it can effectively contribute or, if no such cluster exists, starts a new one. This mechanism keeps the system robust and adaptable, allowing it to respond dynamically to changes in the network topology and to maintain continuous collaboration among devices.
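One possible way to place an orphan node is sketched below: the node joins the nearest cluster that lies within a distance threshold and still has room, and otherwise starts a new cluster. The threshold `max_km`, the size cap `max_size`, the cluster naming scheme, and the use of the closest member as the cluster's distance are all hypothetical choices; the text does not specify these details.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius=6371.0):
    """Great-circle distance in kilometres; inputs in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return radius * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def place_orphan(orphan, clusters, max_km=5.0, max_size=9):
    """Join the nearest eligible cluster, or create a new one.

    orphan is a (lat, lon) pair; clusters maps a cluster id to a list of
    (lat, lon) member positions and is modified in place.
    """
    best_id, best_dist = None, max_km
    for cluster_id, members in clusters.items():
        if not members or len(members) >= max_size:
            continue  # empty or full clusters are not candidates
        dist = min(haversine_km(*orphan, *m) for m in members)
        if dist <= best_dist:
            best_id, best_dist = cluster_id, dist
    if best_id is None:
        n = len(clusters) + 1
        while f"cluster-{n}" in clusters:  # avoid reusing an existing id
            n += 1
        best_id = f"cluster-{n}"
        clusters[best_id] = []
    clusters[best_id].append(orphan)
    return best_id
```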

5.4. Operations at Equipment Level

Having established the clusters, we now describe what occurs at the equipment level, specifically at the tractor level. There are four critical phases to consider: the training phase, the dispatch phase, the network phase, and the receiving phase.

5.4.1. The Training Phase

Initially, the global server transmits the model weights to the tractors or edge nodes. This transmission serves as the foundation for the training process that will occur on each tractor. The model weights at time t can be expressed as follows:

w^{t} = w^{t - 1} + Δ w,

(13)

where $w^{t - 1}$ represents the model weights received from the global server and $Δ w$ signifies the local updates made by the tractor during its training. The training of the model begins when the tractor is powered off for the night. During this period, the tractor engages in local training using the received base model, where each gradient descent step can be mathematically represented as follows:

w^{t + 1} = w^{t} - η \nabla L (w^{t}, D_{i}),

(14)

where $η$ is the learning rate and $\nabla L (w^{t}, D_{i})$ is the gradient of the loss function L calculated on the local dataset $D_{i}$ available to the tractor; the negative sign moves the weights in the direction that decreases the loss. As the tractor remains inactive overnight, it can dedicate its processing power to refining the model through training based on the data it has accrued, allowing the model to adapt and improve without interrupting the tractor’s primary functions during the day. In the morning, when the farmer starts using the tractor for tasks such as tilling or harvesting, the enhanced model is ready for real-time application. This model utilizes data from the Nutrient Sensor to make informed predictions regarding the optimal timing for fertilizer application, expressed as follows:

F_{opt} = f (N, C, T),

(15)

where $F_{opt}$ is the optimal timing for fertilizer application, N represents data from the Nutrient Sensor, C denotes crop characteristics or conditions, and T represents environmental factors. Additionally, the model employs information from the Crop Health Sensor to forecast the appropriate timing for pesticide spraying, represented as follows:

P_{opt} = g (H, C, T),

(16)

where $P_{opt}$ is the optimal timing for pesticide application, H represents data from the Crop Health Sensor, and the other variables retain their previous definitions. By integrating these advanced predictive capabilities into the daily operations of the tractor, the system maximizes agricultural efficiency and productivity, allowing farmers to make better-informed decisions that enhance crop yield and health, which can be described as follows:

Y = h (F_{opt}, P_{opt}, A),

(17)

where Y is the crop yield, $F_{opt}$ and $P_{opt}$ are the optimal timings for fertilizer and pesticide applications, respectively, and A represents additional agricultural practices or inputs.
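The overnight local training step can be illustrated with a small gradient descent routine. The linear model, the mean squared error loss, and the number of steps are illustrative assumptions, since the text does not specify the model trained on the tractor. The step subtracts the gradient, the usual convention for minimizing a loss, and the function returns the accumulated change $Δ w$ that appears in Equation (13).

```python
def local_gradient_step(weights, dataset, learning_rate=0.1):
    """One gradient descent step for a linear model.

    The loss is half the mean squared error, so its gradient is the mean
    of error * feature. dataset is a list of (features, target) pairs.
    """
    grad = [0.0] * len(weights)
    for features, target in dataset:
        error = sum(w * x for w, x in zip(weights, features)) - target
        for j, x in enumerate(features):
            grad[j] += error * x / len(dataset)
    return [w - learning_rate * g for w, g in zip(weights, grad)]


def local_update(received_weights, dataset, steps=50, learning_rate=0.1):
    """Train from the received weights and return the change delta_w."""
    weights = list(received_weights)
    for _ in range(steps):
        weights = local_gradient_step(weights, dataset, learning_rate)
    return [w - w0 for w, w0 in zip(weights, received_weights)]
```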

5.4.2. The Dispatch Phase

The dispatch phase transmits the final data array defined in Equation (6), as discussed in the Cluster Formation Overview section. The array comprises four components: the Alpha Schema-Based Score of Equation (3), which the global server uses to group the models into clusters; the Consumption Ability Score of Equation (5), which assesses the efficiency and resource utilization of the participating devices; the latitude and longitude coordinates, which describe the spatial distribution of the devices involved in the federated learning process; and the model weights, which represent the current state of the trained models used for making predictions. Before transmission, the array is encrypted as follows:

χ_{e n c} = {Encrypt}_{K, N} (χ)

(18)

where:

$χ_{enc}$ is the encrypted final data array.
K represents the shared secret key used for Salsa20 encryption.
N is the nonce that ensures uniqueness in the encryption process.

The final step is to encrypt the data with the lightweight Salsa20 algorithm to ensure their security and confidentiality during transmission. Salsa20 is a stream cipher known for its efficiency and robust security features, making it particularly suitable for resource-constrained environments such as edge devices. The following sections discuss Salsa20 in detail, covering key generation, nonce usage, and the encryption process itself, and explain how these elements contribute to the overall security framework of our system.

5.4.3. The Network Phase

The network phase represents a critical step in our system, as fluctuations in network connectivity can interrupt the model update process. This phase ensures the seamless transmission of data between the edge devices and the global server, which directly impacts the overall performance and reliability of the federated learning system [8]. For instance, consider a scenario where a farmer has completed harvesting their crops during the day. After finishing their work, the farmer parks the tractor at a location where network availability is significantly diminished, potentially due to geographical obstacles or infrastructural limitations [46]. In this context, the variability of the network can pose significant challenges for transmitting essential data. Our system accounts for the possibility of encountering various network types, including 3G, 4G, and 5G (Algorithm 2).

Algorithm 2 Adaptive Data Transmission Based on Network Type

Input:
    Encrypted data $χ_{enc}$
    Network type N (3G, 4G, 5G)
    Current model weights $M_{current}$
    Previous model weights $M_{previous}$
Output:
    Transmitted data to the global server
procedure AdaptiveDataTransmission
    if network type N = 3G then
        Calculate differential data: $D_{diff} = M_{current} - M_{previous}$
        Compress $D_{diff}$ to minimize payload.
        Transmit compressed data $D_{diff}$ to nearest neighbor in the cluster.
    else if network type N = 4G then
        Utilize checkpoint strategy to ensure data reliability.
        Check signal strength and compress data as needed.
        if signal strength is sufficient then
            Transmit data $χ_{enc}$ directly to the global server.
        else
            Transmit to a nearby edge device for relay.
        end if
    else if network type N = 5G then
        Assess bandwidth availability for real-time updates.
        Allocate resources dynamically based on demand: $R_{allocated} = min (R, D)$
        Perform local computations using Multi-Access Edge Computing (MEC).
        Transmit processed data to the global server with reduced latency.
    else
        Error: Unsupported network type.
    end if
end procedure
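The branching logic of Algorithm 2 can be expressed as a small planning function. This sketch only decides what to send and where; the payload and destination labels, the `signal_ok` flag, and the `available` and `demand` parameters are hypothetical names introduced for illustration, and the actual compression, checkpointing, and transmission steps are left out.

```python
def plan_transmission(network_type, signal_ok=True, available=0.0, demand=0.0):
    """Return a transmission plan for one round, mirroring Algorithm 2."""
    if network_type == "3G":
        # Low bandwidth: send only the compressed weight difference to a neighbor.
        return {"payload": "compressed_diff", "destination": "nearest_neighbor"}
    if network_type == "4G":
        # Checkpoint first, then send directly or relay depending on signal.
        destination = "global_server" if signal_ok else "relay_device"
        return {"payload": "encrypted_array", "destination": destination,
                "checkpoint": True}
    if network_type == "5G":
        # R_allocated = min(R, D), then send the processed data to the server.
        return {"payload": "processed_data", "destination": "global_server",
                "allocated": min(available, demand)}
    raise ValueError(f"Unsupported network type: {network_type}")
```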

Each of these network types possesses distinct characteristics in terms of data transfer speeds, latency, and reliability [47]. Therefore, depending on the network type available at the time of transmission, we adapt our method for sending the encrypted data $χ_{enc}$. For example, when only 3G connectivity is available, which typically offers lower bandwidth and higher latency than 4G and 5G [48], we may compress the data or prioritize the transmission of critical updates to ensure timely communication. Conversely, in a 5G environment, where data transfer speeds are significantly higher, we can transmit larger datasets more rapidly with less concern for interruptions [49]. These adaptive transmission strategies enhance the robustness of our system, ensuring that model updates proceed smoothly and efficiently despite variable network connectivity. This approach not only improves the overall user experience [50] but also contributes to the effectiveness of the federated learning process by minimizing the risk of data loss during transmission.

Third Generation (3G) technology was a significant advancement over its predecessor, 2G, providing enhanced data transmission capabilities [51]. Typical bandwidth for 3G networks ranges from 384 Kbps to 2 Mbps, depending on the specific implementation and conditions [52]. However, due to this relatively low bandwidth, 3G can struggle with data-intensive applications, leading to slower upload and download speeds, particularly in areas with high user density [53]. This limitation can cause delays in sending model updates and other critical data to the server. Some of the challenges associated with 3G networks include signal attenuation, interference from environmental obstacles, and network congestion [54]. These issues can lead to degraded performance in data transmission, making it crucial to develop strategies that enhance communication reliability. To address these challenges, we have implemented a close neighbor strategy, where the tractor or edge node communicates with the nearest neighbor within its cluster [55]. A cluster is defined as a grouping of similar models, typically consisting of fewer than ten devices. By focusing on nearby nodes within the cluster, the distance for data transmission is significantly reduced compared to communicating directly with the global server or driver nodes. This proximity minimizes the effects of signal attenuation and interference, which are more pronounced over longer distances.

To illustrate the benefits of this approach, let us assume a scenario where the driver node operates on a 3G network and is tasked with sending or receiving data from edge nodes or the global server. In this context, we employ a data-compression technique that involves transmitting only the differential data between the current model and the previous model received from the global server [56]. This method effectively reduces the amount of data that needs to be sent, thereby alleviating some of the burdens associated with limited bandwidth. We can express the differential data as follows:

D_{diff} = M_{current} - M_{previous}

(19)

where:

$D_{diff}$ represents the differential data that need to be transmitted.
$M_{current}$ denotes the model weights of the current version at the edge node.
$M_{previous}$ signifies the model weights received from the global server during the last update.

By sending only the differential data $D_{diff}$, we minimize the volume of information transmitted over the 3G network, thus optimizing the use of available bandwidth and enhancing the overall efficiency of the communication process. This strategy not only helps mitigate the inherent limitations of 3G technology but also ensures that the model updates can be maintained effectively within the federated learning framework.
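A minimal sketch of the differential transmission in Equation (19) is given below. It packs the weight differences as 64-bit floats and compresses them with `zlib` from the Python standard library; the choice of compressor and of the binary layout are assumptions of this example, as the text does not name a specific compression scheme.

```python
import struct
import zlib


def encode_diff(current, previous):
    """Compute D_diff = M_current - M_previous and compress it."""
    diff = [c - p for c, p in zip(current, previous)]
    raw = struct.pack(f"<{len(diff)}d", *diff)  # little-endian float64 values
    return zlib.compress(raw)


def apply_diff(previous, payload):
    """Rebuild the current weights from the previous weights and a payload.

    Floating-point rounding means the result can differ from the sender's
    weights in the last bits for arbitrary inputs.
    """
    raw = zlib.decompress(payload)
    diff = struct.unpack(f"<{len(raw) // 8}d", raw)
    return [p + d for p, d in zip(previous, diff)]
```

When only a few weights change between rounds, the difference vector is mostly zeros and compresses to a small fraction of the full weight array, which is where the bandwidth saving on 3G comes from.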

4G Data Transmission—Fourth Generation (4G) technology marked a drastic improvement in mobile broadband capabilities [57]. Offering bandwidths typically ranging from 5 Mbps to 100 Mbps, 4G networks facilitate faster data transfer rates, reduced latency, and improved overall performance [58]. This enhancement enables more efficient transmission of larger datasets, making it more suitable for applications that require real-time data exchange, such as those in federated learning environments [59]. While 4G technology presents significant advancements in mobile communication, offering enhanced speeds and improved connectivity, it is not without its challenges. Among the primary issues associated with 4G networks are signal strength and coverage limitations, network congestion, interference from physical obstacles, and, importantly, energy consumption concerns [60]. Signal strength and coverage can vary greatly, particularly in rural or remote areas where the density of cell towers may be insufficient to provide a strong and consistent signal [61]. This can lead to connectivity issues, resulting in slower data rates or even dropped connections, which negatively impact the user experience [62]. Network congestion is another challenge that arises as the number of connected devices increases, especially in densely populated urban environments [56]. When multiple users attempt to access data-intensive services simultaneously, the available bandwidth can become overwhelmed, leading to slower speeds and increased latency.

Interference and obstacles also play a crucial role in the performance of 4G networks. Physical barriers such as buildings and trees can obstruct signals, causing degradation in quality and reliability [63]. Additionally, electronic devices operating in the vicinity may introduce interference, further complicating communications [64]. Energy consumption is a critical consideration for devices operating on 4G networks. The enhanced data transfer capabilities and increased processing requirements can lead to significant battery drain, which is particularly concerning for mobile devices and Internet of Things (IoT) applications that rely on sustained connectivity [65]. To address these challenges effectively, we have implemented a checkpoint strategy. This approach will be elaborated upon in subsequent sections, detailing how it enhances data transmission and helps mitigate the issues associated with 4G networks [66]. Furthermore, we will discuss how data-compression techniques, previously utilized in 3G networks, can be adapted and employed in conjunction with the checkpoint strategy to optimize performance and improve overall efficiency in data handling. This combined methodology aims to ensure reliable communication and data integrity while maximizing resource utilization in our federated learning framework.

5G Data Transmission—Fifth Generation (5G) technology represents a revolutionary leap forward in mobile telecommunications, offering unprecedented improvements in mobile broadband capabilities [46]. With bandwidths typically exceeding 1 Gbps and theoretical maximum speeds reaching up to 10 Gbps, 5G networks facilitate ultra-fast data transfer rates and significantly reduced latency, often as low as 1 millisecond [49]. This remarkable enhancement not only supports more efficient transmission of vast datasets but also makes 5G particularly well-suited for applications that demand real-time data exchange, such as those found in autonomous vehicles, smart cities, and advanced federated learning environments [67]. Despite these significant advancements, 5G technology is not without its challenges. One of the primary issues is the need for a dense infrastructure of small cell towers to achieve optimal performance [5]. As 5G utilizes higher frequency bands, which are more susceptible to attenuation, the density of base stations must increase to provide consistent coverage. This requirement can lead to higher deployment costs and logistical challenges, particularly in rural or underserved areas where the installation of new infrastructure may be economically unfeasible [68].

Additionally, while 5G networks promise improved performance, they also face challenges related to network congestion, particularly as the number of connected devices continues to rise [48]. The introduction of numerous IoT devices and user equipment can strain the network, potentially leading to degraded performance during peak usage times [69]. Furthermore, interference from physical barriers such as buildings, trees, and various electronic devices can impact the reliability of the signal, posing a challenge to maintaining high-quality connections [70]. Energy consumption remains a critical consideration for devices operating on 5G networks. While 5G technology is designed for efficiency, the increased data transfer rates and the complexity of managing numerous connections can lead to significant energy demands [71]. This is particularly concerning for battery-powered devices, such as mobile phones and IoT sensors, that require sustained connectivity for optimal functionality [72]. The integration of Fifth Generation (5G) technology into our federated learning system addresses several key challenges associated with data transmission, enhancing overall performance and reliability. One of the primary advantages of 5G is its capability for dynamic resource allocation. This allows the network to adaptively manage bandwidth based on real-time demands from connected devices, such as tractors transmitting model updates [17]. The total available resources (R) can be allocated according to the demand (D) from these devices, represented mathematically as follows:

R_{a l l o c a t e d} = min (R, D)

This ensures that during peak usage periods, critical applications receive the necessary bandwidth, thereby minimizing the risk of congestion and maintaining smooth communication [73]. In addition, the implementation of Multi-Access Edge Computing (MEC) significantly reduces latency by bringing computation and storage closer to the end users [74]. Instead of sending all data to a central server for processing, local computations can be performed on the edge devices or nearby edge servers. The total time taken for data processing ($T_{total}$) can be expressed as follows:

T_{total} = T_{edge} + T_{transmission}

By minimizing the transmission time ($T_{transmission}$) through local processing, we effectively reduce $T_{total}$, allowing for faster model updates and more efficient data handling. Additionally, network slicing enables the creation of virtual networks tailored to specific application requirements [75]. Each slice can be allocated resources based on its unique needs, ensuring that real-time model updates from tractors receive priority bandwidth. The total resources allocated to the slices can be represented as follows:

S_{total} = \sum_{i=1}^{n} S_{i}

where $S_{i}$ denotes the resources allocated to slice $i$. This flexibility allows the system to support diverse applications simultaneously without degrading performance.
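As a rough illustration of the two allocation formulas above, the following Python sketch grants each slice the minimum of the remaining capacity and its demand, in priority order, and then sums the per-slice grants. The slice names, the priority ordering, and the bandwidth figures are hypothetical and are not taken from our deployment.

```python
def allocate_slices(total_resources, demands):
    """Grant each slice min(remaining, demand), highest priority first.

    `demands` is a list of (slice_name, demand) pairs that the caller has
    already sorted by priority; this ordering rule is an assumption.
    """
    remaining = total_resources
    grants = {}
    for name, demand in demands:
        granted = min(remaining, demand)  # R_allocated = min(R, D)
        grants[name] = granted
        remaining -= granted
    return grants


# Hypothetical demands in Mbps; tractor model updates get top priority.
grants = allocate_slices(
    100, [("model_updates", 60), ("telemetry", 30), ("video", 50)]
)
s_total = sum(grants.values())  # S_total = sum of S_i over all slices
print(grants, s_total)
```

In this toy run the lowest-priority slice receives only what is left, so the prioritized model updates are never starved.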

Dynamic Data Transmission: Dynamic data transmission is a critical aspect of our system. By this term we mean data transfer under fluctuating and inconsistent network conditions, which affect the reliability and efficiency of the transfer. For instance, consider the workflow in our system: the training process occurs overnight, when the tractor is not in operation. During this time, the model is refined using the accumulated data, ensuring that it is well-prepared for real-time application. Once the training process is successfully completed and the tractor is powered on in the morning for tasks such as tilling or harvesting, the enhanced model is ready for deployment.

At this stage, the next crucial step is to transmit the updated model array to either the driver node or the global server for cross-model aggregation. This aggregation is essential for consolidating insights from various models to improve overall predictive accuracy. However, a significant challenge arises during this data transmission phase: we may encounter variability in network conditions, including potential network unavailability. Such variability can stem from various sources, including fluctuations in signal strength, changes in user demand, or physical obstructions that interfere with connectivity. The implications of these network conditions can range from delayed data transmission to complete communication failures, which could hinder the timely application of the model’s predictions in the field. This phenomenon of network variability is not merely a theoretical concern; it is a practical issue that can occur frequently in real-world scenarios. Therefore, we recognize the importance of investigating this aspect further in the near future. By understanding how network conditions impact data transmission, we can develop more robust strategies to ensure the reliability and effectiveness of our system, ultimately leading to improved outcomes for farmers and agricultural practices.

5.4.4. The Receiving Phase

The receiving phase is a crucial component of our system, signifying the moment when model updates are received from either the driver node or the global server. This phase follows the completion of the training process, during which the updated model is dispatched to the designated driver node or global server. Once the training phase concludes, the tractor or edge node anticipates the return of the model weights that have been refined during the training process. It is important to note that the specific model received will vary based on the type of prediction being made. This variation is determined by the Alpha Score, as discussed in the Cluster Formation section. The Alpha Score serves as a foundational metric that influences the selection of the appropriate model for the given predictive task, ensuring that the predictions align with the current operational requirements.

However, in scenarios where the model update has not been received—often due to dynamic network conditions such as signal fluctuations or interruptions in connectivity—the tractor or edge node must adapt. In such cases, the node will initiate aggregation using the last updated model from the global server. This allows the tractor to proceed with its predictive tasks even in the absence of the latest model weights, ensuring that operations can continue smoothly. As the farmer begins using the tractor in the morning for various tasks, such as tilling or harvesting, the tractor will utilize the available model to make predictions. To enhance the effectiveness of this process, we will provide the farmer with updates regarding the status of the last received model. Additionally, we will suggest the optimal location for parking the tractor overnight to improve network conditions. This strategic positioning can facilitate better connectivity, thereby enhancing the likelihood of successfully dispatching the model or receiving the latest model weights and updates. This receiving and updating process is not a one-time event; rather, it is a continuous cycle that persists until the model converges or achieves the desired levels of accuracy. By maintaining this iterative approach, we ensure that the system remains responsive and adaptive, continually refining predictions based on the most current data and model updates available.

To effectively suggest the ideal network conditions and the timing of the last model update to the farmer, we can define a Network Quality Index (NQI) that evaluates the current state of the network based on several critical factors. The NQI is calculated using a formula that incorporates signal strength, available bandwidth, and latency. Specifically, the equation is expressed as follows:

NQI = w_{1} \cdot S + w_{2} \cdot B - w_{3} \cdot L

(20)

In this equation, S represents the signal strength, which is scaled from 0 to 1, B denotes the available bandwidth in megabits per second (Mbps), and L signifies the latency in milliseconds. The weights $w_{1}$, $w_{2}$, and $w_{3}$ reflect the relative importance of each factor in determining the overall network quality, enabling a comprehensive assessment of connectivity.
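A minimal Python sketch of the NQI calculation follows. Because S is dimensionless while B and L carry units (Mbps and ms), this sketch rescales B and L to the range [0, 1] before applying the weights. That normalization, the reference maxima, and the default weight values are illustrative assumptions rather than values from our system.

```python
def network_quality_index(signal, bandwidth_mbps, latency_ms,
                          w1=0.5, w2=0.3, w3=0.2,
                          max_bandwidth_mbps=100.0, max_latency_ms=500.0):
    """Compute NQI = w1*S + w2*B - w3*L with B and L rescaled to [0, 1].

    The weights and the two reference maxima are hypothetical defaults.
    """
    if not 0.0 <= signal <= 1.0:
        raise ValueError("signal strength must lie in [0, 1]")
    bw = min(bandwidth_mbps / max_bandwidth_mbps, 1.0)
    lat = min(latency_ms / max_latency_ms, 1.0)
    return w1 * signal + w2 * bw - w3 * lat


# Two hypothetical readings: a strong link and a weak link.
strong = network_quality_index(signal=0.9, bandwidth_mbps=80, latency_ms=40)
weak = network_quality_index(signal=0.3, bandwidth_mbps=2, latency_ms=400)
print(strong, weak)
```

A higher value indicates better connectivity; latency enters with a negative sign, so a slow link lowers the index even when the signal is strong.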

In conjunction with the NQI, we also consider the timing of the last model update to provide the farmer with actionable insights. We can represent this timing through the equation:

T_{suggestion} = \max(0, T_{threshold} - T_{last})

(21)

Here, $T_{suggestion}$ indicates the optimal timeframe for the farmer to park the tractor to ensure the best network conditions. $T_{threshold}$ is defined as the maximum allowable time since the last model update, while $T_{last}$ denotes the elapsed time since the last update was received. By calculating this value, we can suggest a specific time frame for the farmer’s activities to align with the ideal conditions for data transmission. Furthermore, to recommend an ideal parking location based on both the NQI and the timing of the last update, we can define a function:

\mathit{Ideal\_Location} = f(NQI, T_{suggestion})

(22)

In this context, $\mathit{Ideal\_Location}$ represents the best parking spot for the tractor, determined through a function f that integrates the current NQI and the suggested time for parking. This function utilizes predefined locations that have known network conditions, enabling the system to provide tailored recommendations based on real-time assessments.

Ultimately, we can combine these factors into an overall suggestion equation:

\mathit{Suggestion} = h(NQI, T_{last}, T_{threshold})

(23)

In this equation, $\mathit{Suggestion}$ encapsulates the recommendations for the farmer regarding the optimal time and location to park the tractor. The function h integrates the network quality index and the time since the last model update, yielding actionable insights that enhance connectivity and ensure efficient data transmission. By employing this comprehensive approach, we can significantly improve the farmer’s operational efficiency and the overall effectiveness of the federated learning system (Figure 5).
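The functions f and h are left abstract above. One possible reading, sketched below in Python, picks the predefined location with the highest known NQI and uses Equation (21) only to flag whether an update is overdue. The location names, NQI values, and the choice of hours as the time unit are hypothetical, and the selection rule is an assumption rather than the rule used in our system.

```python
def time_until_stale(t_threshold_h, t_last_h):
    """T_suggestion = max(0, T_threshold - T_last), here in hours."""
    return max(0.0, t_threshold_h - t_last_h)


def suggest_parking(known_locations, t_last_h, t_threshold_h):
    """Stand-in for h(NQI, T_last, T_threshold).

    `known_locations` maps a location name to its known NQI. The best
    location is simply the one with the highest NQI (an assumed rule).
    """
    hours_left = time_until_stale(t_threshold_h, t_last_h)
    best = max(known_locations, key=known_locations.get)
    return {
        "location": best,
        "hours_left": hours_left,
        "urgent": hours_left == 0.0,  # the last update is already overdue
    }


# Hypothetical predefined parking spots with previously measured NQI.
locations = {"barn": 0.42, "north_gate": 0.71, "silo": 0.55}
overdue = suggest_parking(locations, t_last_h=30, t_threshold_h=24)
on_time = suggest_parking(locations, t_last_h=10, t_threshold_h=24)
print(overdue, on_time)
```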

5.5. Evaluations at Equipment Level

The evaluation process at the edge node is a critical step in our system, as it directly influences decisions regarding both the timing and the method of dispatching the data array. Understanding the nuances of this evaluation is essential for optimizing data transmission and ensuring effective communication within the network. Now, let us delve into each aspect of this process to gain a deeper insight into its significance and functionality.

5.5.1. Checkpointing or Model Evaluation

To decide when to send the model data array, and to save energy by not sending redundant model updates, we have implemented a checkpointing mechanism that governs communication between the global server, driver nodes, and edge nodes. This process is integral to managing the flow of information and ensuring that only relevant model updates are transmitted. Checkpointing takes place at each node and utilizes the cosine similarity [76,77] and the dot product function to evaluate the significance of model weight changes. The primary purpose of this approach is to determine whether the model weights deviate from the previously established model weights. For example, in a scenario where an edge node continuously transmits the same model weights due to repeated training on an unchanged dataset, we can identify this redundancy during the federated aggregation process. By filtering out these non-essential models, we optimize computational resources and enhance overall system efficiency.

Similarly, if the driver node identifies that there are no meaningful changes in the model weights compared to the previous version sent to the global server, it will refrain from transmitting the latest model weights for aggregation. This selective transmission process has been demonstrated to significantly decrease the number of communication rounds required between the driver node and the global server, thereby conserving bandwidth and reducing latency. To quantify the similarity between the model weights, we employ the cosine similarity function, which is mathematically defined as follows:

\mathrm{Cosine}(A, B) = \frac{\sum_{i=1}^{n} A_{i} B_{i}}{\sqrt{\sum_{i=1}^{n} A_{i}^{2}} \sqrt{\sum_{i=1}^{n} B_{i}^{2}}}

Additionally, the dot product of two vectors A and B is calculated as follows:

A \cdot B = \sum_{i = 1}^{n} A_{i} B_{i}

In these equations, $A_{i}$ and $B_{i}$ represent the individual components of vectors A and B, respectively, across n dimensions in the model weight space. Within each node, both cosine similarity and dot product calculations are employed to compare the newly received model weights against the previous version. Initially, the cosine similarity check is executed, returning a flag of 1 if a discrepancy is detected and 0 if the weights remain unchanged. To reinforce the accuracy of these results, we further validate them by applying the dot product check. This dual verification process is critical in determining whether the model weights should be transmitted to the global server, thereby ensuring that only significant updates are communicated and enhancing the efficiency of our federated learning framework.
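The dual check can be sketched in Python as follows. The tolerances, and the specific way the dot product is used here (comparing the dot product of the old and new weights against the dot product of the old weights with themselves, which catches a pure rescaling that cosine similarity cannot see), are assumptions made for illustration; the text above does not fix these details.

```python
import math


def cosine_similarity(a, b):
    """Cosine(A, B) as defined above; returns 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def weights_changed(previous, current, cos_tol=1e-3, dot_tol=1e-3):
    """Return 1 if the update should be transmitted, 0 if it is redundant."""
    direction_changed = (1.0 - cosine_similarity(previous, current)) > cos_tol
    # Assumed dot-product cross-check: <prev, cur> versus <prev, prev>.
    reference = sum(x * x for x in previous)
    dot = sum(x * y for x, y in zip(previous, current))
    magnitude_changed = abs(dot - reference) > dot_tol * max(reference, 1e-12)
    return int(direction_changed or magnitude_changed)


old = [0.2, -0.5, 1.0]
print(weights_changed(old, [0.2, -0.5, 1.0]))   # identical weights
print(weights_changed(old, [0.4, -1.0, 2.0]))   # same direction, rescaled
print(weights_changed(old, [1.0, 0.5, -0.2]))   # different direction
```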

5.5.2. Proximity Evaluations

Proximity evaluation is a critical process that occurs when network conditions are less than optimal. For instance, if a tractor or edge node experiences challenges in establishing a stable connection or faces difficulties in sending or receiving model weights from other nodes, it becomes essential to implement a strategy that mitigates these issues. In such situations, we prioritize finding the closest neighboring node within the same cluster. By identifying a nearby node, we can initiate a connection with it, thereby enhancing the likelihood of successful data transmission. This approach leverages the principle of proximity to reduce the communication distance, which can significantly improve connection reliability, especially when faced with weaker network signals.

Moreover, it is important to note that the dispatch of model weights will only occur if there is a significant deviation in the model compared to the previously received version. This condition ensures that we are not transmitting redundant data, thereby optimizing bandwidth usage and maintaining the efficiency of the overall system. In our specific scenario, we have chosen to apply local proximity evaluation, particularly when operating under 3G network conditions. This choice is motivated by the inherent limitations of 3G technology, which can lead to increased latency and reduced data transfer speeds. By focusing on local proximity in these circumstances, we aim to enhance the effectiveness of our data transmission strategy, ensuring that model updates are communicated promptly and reliably, despite the challenges posed by the network environment. This proactive approach is essential for maintaining the integrity and performance of our federated learning system.
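Finding the closest neighboring node within a cluster can be sketched in Python as below. We assume here that each node reports its latitude and longitude and that great-circle (haversine) distance is an adequate proxy for link quality. The node identifiers and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def closest_neighbor(node_id, cluster):
    """Return the id of the nearest other node in `cluster`, or None.

    `cluster` maps node id -> (latitude, longitude).
    """
    lat, lon = cluster[node_id]
    others = [n for n in cluster if n != node_id]
    if not others:
        return None
    return min(others, key=lambda n: haversine_km(lat, lon, *cluster[n]))


# Hypothetical cluster of three tractors.
cluster = {
    "tractor_1": (37.71, -89.22),
    "tractor_2": (37.72, -89.22),
    "tractor_3": (37.80, -89.10),
}
print(closest_neighbor("tractor_1", cluster))
```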

5.6. Health Verifications

Ensuring the operational integrity of the communication channel between the global server and the driver nodes is of utmost importance in Federated Learning (FL) systems [7]. To achieve this, we introduce a health status verification mechanism that relies on the regular dispatch of model weights from the driver nodes to the global server [9]. This process involves the global server performing federated averaging with the received weights and subsequently relaying the updated weights back to the driver nodes [45]. In turn, these driver nodes disseminate these updates to the worker nodes within their respective clusters [34].

5.6.1. Driver Termination Scenario

In instances where a driver node becomes non-operational, resulting in the potential orphaning of all associated worker nodes, we identify two primary causes: computational overload due to non-aggregation tasks or disruptions in the power supply [78]. The cessation of model update transmissions from a driver node serves as an early indicator of these issues [79].

To proactively verify the operational status of driver nodes, the global server implements a check mechanism that examines the timestamp of the last received model update. If a significant delay is detected, a diagnostic ping is sent to the affected driver node [80]. The protocol for this health check is defined as follows:

\mathrm{HealthCheck}(D_{i}) = \begin{cases} \text{“Alive”}, & \text{if } D_{i} \text{ responds to ping}, \\ \text{“Dead”}, & \text{otherwise}. \end{cases}

(24)
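The timestamp check followed by a diagnostic ping might look like the Python sketch below. The silence threshold and the `ping` callable are placeholders supplied by the caller; no particular network API is assumed.

```python
import time


def health_check(last_update_ts, max_silence_s, ping, now=None):
    """Return "Alive" or "Dead" for one driver node, following Equation (24).

    `ping` is a caller-supplied callable that returns True when the driver
    replies. It is only invoked when the last model update is older than
    `max_silence_s` seconds (a hypothetical threshold).
    """
    now = time.time() if now is None else now
    if now - last_update_ts <= max_silence_s:
        return "Alive"  # a recent update was received, so no ping is needed
    return "Alive" if ping() else "Dead"


# Hypothetical timestamps in seconds, with stubbed ping results.
recent = health_check(100, 60, ping=lambda: False, now=120)
silent_dead = health_check(100, 60, ping=lambda: False, now=200)
silent_alive = health_check(100, 60, ping=lambda: True, now=200)
print(recent, silent_dead, silent_alive)
```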

5.6.2. Driver Status Verification

In the subsequent scenario, if the driver node acknowledges the ping from the global server, its status is marked as “Alive”. However, if an “Alive” response is received without any subsequent model updates, the driver node’s status is further scrutinized [81]. This is accomplished by comparing the previously transmitted model weights against the current ones using cosine similarity and dot product functions. Table 1 provides an overview of the health status of driver nodes as monitored by the global server [82]. This continuous monitoring facilitates the early detection of potential issues, enabling swift remedial actions to maintain system flow and enhance overall system robustness.

5.6.3. New Driver Node Election

Following the termination of a driver node, the Federated Learning (FL) system promptly initiates a decentralized driver selection mechanism to elect a new driver from the pool of available worker nodes. This process is grounded in criteria such as computational capacity and network stability, ensuring a democratic and consensus-based selection of the new driver. Once elected, the new driver node synchronizes with the global server to receive the latest model updates, which it then disseminates to the worker nodes within its cluster. This seamless transition and efficient update dissemination are crucial for ensuring the resilience of the FL system, maintaining uninterrupted operations and continuity of learning despite node failures.
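The election criteria can be illustrated with the Python sketch below. It assumes that every worker knows the (normalized) computational capacity and network stability of its peers, so that each node can evaluate the same deterministic score locally and arrive at the same winner without a coordinator. The weights, the metric names, and the tie-breaking rule are hypothetical and do not describe the exact consensus protocol.

```python
def elect_driver(workers, w_compute=0.6, w_network=0.4):
    """Pick the worker with the highest weighted score.

    `workers` maps node id -> {"compute": x, "network": y}, both in [0, 1].
    Ties are broken by node id so that every node reaches the same result.
    """
    if not workers:
        raise ValueError("no worker nodes available for election")

    def score(item):
        node_id, metrics = item
        value = w_compute * metrics["compute"] + w_network * metrics["network"]
        return (value, node_id)

    return max(workers.items(), key=score)[0]


# Hypothetical orphaned workers after a driver failure.
workers = {
    "w1": {"compute": 0.9, "network": 0.3},
    "w2": {"compute": 0.7, "network": 0.8},
    "w3": {"compute": 0.4, "network": 0.9},
}
print(elect_driver(workers))
```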

5.7. Secure Data Transmission

Data components are encrypted using the Salsa20 encryption scheme (Algorithm 3), ensuring secure transmission to the global server for parallel integration during cluster formation. Salsa20’s primary advantage lies in its performance efficiency as a stream cipher, which allows data to be encrypted in smaller chunks or “streams”. This results in lower latency during encryption and decryption, making it ideal for applications requiring real-time data processing, such as our federated learning framework.

Additionally, Salsa20 is optimized for speed across various hardware platforms, including those without specialized cryptographic support. In resource-constrained environments like edge devices (e.g., tractors with sensors), Salsa20 offers faster encryption and decryption times compared to AES, leading to quicker data transmission and reduced operational delays, critical for real-time decision-making in agricultural practices.

To protect the confidentiality of data exchanged between edge devices (driver nodes) and the global server, we implement Salsa20 encryption. This method is particularly beneficial in edge computing scenarios where AES hardware acceleration may be lacking. Each edge device encrypts its local dataset summary and geographical location using a shared secret key and a nonce. The global server then decrypts the received data using the same key and nonce, ensuring secure and confidential data exchange. Algorithm 3 details the encryption process using Salsa20. Each device $e_{i}$ in the set of edge devices $E = \{e_{1}, e_{2}, \dots, e_{n}\}$ maintains the confidentiality of its dataset summary $D_{i}'$ and geographical location $loc_{i}$ through encryption, which can be mathematically formalized as follows:

C_{i} = \mathrm{Encrypt}_{K,N}(D_{i}' ∥ loc_{i})

(25)

where $C_{i}$ signifies the encrypted data, K represents the shared 256-bit secret key, N is the nonce, and ∥ denotes concatenation. The decryption operation performed by the global server is similarly expressed as follows:

D_{i}', loc_{i} = \mathrm{Decrypt}_{K,N}(C_{i})

(26)

This encryption mechanism ensures the security of the data during transmission, leveraging the cryptographic strength of Salsa20.

Algorithm 3 Secure Data Transmission with Salsa20 Encryption

Input: Local dataset summary $D_{i}'$ and geographical location $loc_{i}$ for each edge device $e_{i}$.
Output: Securely transmitted encrypted data.
procedure SecureDataTransmission
    for each device $e_{i} \in E$ do
        // Step 1: Prepare key and nonce
        Obtain the shared 256-bit key K and generate a fresh nonce N.
        // Step 2: Serialize data
        Serialize $D_{i}'$ and $loc_{i}$ into bytes.
        // Step 3: Encrypt serialized data
        Encrypt serialized data using K and N with Salsa20 to obtain $C_{i}$.
        // Step 4: Transmit encrypted data
        Transmit $C_{i}$ and N to the server.
    end for
    On the server side:
    for each received pair $(C_{i}, N)$ do
        // Step 1: Decrypt data
        Decrypt $C_{i}$ using K and N to retrieve $D_{i}'$ and $loc_{i}$.
    end for
end procedure
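The steps of Algorithm 3 can be traced with the dependency-free Python sketch below, which follows the published Salsa20/20 specification (256-bit key, 64-bit nonce, 64-bit block counter). It is intended for illustration only: a deployment should rely on a vetted cryptographic library, and the JSON serialization, the helper names, and the sample values are assumptions that are not part of Algorithm 3.

```python
import json
import os
import struct

MASK = 0xFFFFFFFF
# Word indices for the column round, then the row round, of one double round.
_COLS = [(0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11)]
_ROWS = [(0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14)]


def _rotl(v, n):
    return ((v << n) & MASK) | (v >> (32 - n))


def quarter_round(y0, y1, y2, y3):
    y1 ^= _rotl((y0 + y3) & MASK, 7)
    y2 ^= _rotl((y1 + y0) & MASK, 9)
    y3 ^= _rotl((y2 + y1) & MASK, 13)
    y0 ^= _rotl((y3 + y2) & MASK, 18)
    return y0, y1, y2, y3


def salsa20_block(key, nonce, counter):
    """Produce one 64-byte keystream block."""
    sigma = struct.unpack("<4I", b"expand 32-byte k")
    k = struct.unpack("<8I", key)
    n = struct.unpack("<2I", nonce)
    state = [sigma[0], *k[:4], sigma[1], *n, counter & MASK,
             (counter >> 32) & MASK, sigma[2], *k[4:], sigma[3]]
    x = list(state)
    for _ in range(10):  # 10 double rounds = 20 rounds
        for a, b, c, d in _COLS + _ROWS:
            x[a], x[b], x[c], x[d] = quarter_round(x[a], x[b], x[c], x[d])
    return struct.pack("<16I", *[(x[i] + state[i]) & MASK for i in range(16)])


def salsa20_xor(key, nonce, data):
    """Encrypt or decrypt `data` (the operation is symmetric)."""
    if len(key) != 32 or len(nonce) != 8:
        raise ValueError("expected a 32-byte key and an 8-byte nonce")
    out = bytearray()
    for i in range(0, len(data), 64):
        block = salsa20_block(key, nonce, i // 64)
        out.extend(b ^ s for b, s in zip(data[i:i + 64], block))
    return bytes(out)


def encrypt_summary(key, summary, loc):
    """Steps 1-3: fresh nonce, serialize (JSON is an assumption), encrypt."""
    nonce = os.urandom(8)
    payload = json.dumps({"summary": summary, "loc": loc}).encode()
    return salsa20_xor(key, nonce, payload), nonce


def decrypt_summary(key, nonce, ciphertext):
    """Server side: decrypt with the shared key and the received nonce."""
    obj = json.loads(salsa20_xor(key, nonce, ciphertext))
    return obj["summary"], obj["loc"]


shared_key = bytes(range(32))  # placeholder key for the example only
c_i, n_i = encrypt_summary(shared_key, {"rows": 1200}, [37.71, -89.22])
print(decrypt_summary(shared_key, n_i, c_i))
```

A fresh nonce is drawn for every message because reusing a nonce with the same key would reuse the keystream.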

6. Experimental Results

We have deployed our system utilizing eight tractors equipped with two distinct sensor types: the Nutrient Sensor and the Crop Health Sensor [83]. Training data were sourced from a variety of online platforms, including USDA [84] and Kaggle, among others. Additionally, we employed web crawlers, developed using Jsoup, to gather sensor data from multiple sources. Our dataset comprises a total of 235,191 data points, encompassing both the Nutrient Sensor and the Crop Health Sensor [85]. In the following sections, we will delve into each execution step of our system, illustrated through practical examples.

6.1. Cluster Formation

The first step of our system involves the formation of clusters. To initiate the process of node splitting or creating new clusters, we deploy models to the eight tractors operating in the fields. These models are transmitted from a global server, where they have been trained on a sample of the training data. Our approach utilizes non-identically distributed data, meaning that each of the eight tractors operates on distinct, unevenly distributed datasets. Specifically, four tractors are assigned Nutrient Sensor data, while the remaining four are equipped with Crop Health Sensor data. Each tractor will commence its training process, with careful attention given to the implementation of cross-validation techniques to ensure the accuracy of our results and to mitigate the risk of overfitting. Upon the completion of the training phase, the tractors will transmit a data array back to the global server. This data array comprises several critical elements, including the Alpha-Schema Score, Consumption metrics, geographic coordinates (latitude and longitude), and the Model Weights, as illustrated in Figure 6. A comprehensive discussion of these methods can be found in the system architecture section.

When the global server receives these data arrays, its initial task is to parse the elements contained within. First, it validates the incoming data to ensure its integrity and reliability. A key component of this validation process involves interpreting the model type, which is inferred from the Alpha-Schema Score. This score serves as an indicator of whether the tractor has been trained using the Nutrient Sensor or the Crop Health Sensor. Following the identification of the model type, the global server proceeds to assess the battery status and computational power of the respective tractors. Utilizing the Consumption Ability score, the server determines the most suitable driver node for processing. It is important to note that the driver node may also be a local server, depending on the configuration of the data stream. For instance, if a tractor has been trained on a local model and subsequently sends its data to a local server, that server essentially acts as a proxy, facilitating communication between the end node and the global server.

After establishing the driver node, the next critical step involves calculating the physical distance between the various nodes. This distance is vital for two main reasons: first, it enables the segmentation of larger groups into smaller, more manageable clusters based on proximity; second, it helps identify the closest nodes for each cluster. Additionally, it is worth mentioning that the data array is transmitted to the global server using the lightweight Salsa20 encryption algorithm, which ensures secure data transmission. Further details regarding this encryption algorithm can be found in the system architecture section. The final step in our process involves the aggregation of model weights that have been extracted from the data arrays by the global server. To achieve this aggregation, we employ the Federated Averaging (FedAVG) technique. This method allows us to effectively combine the weights from the models trained on the different tractors. As a result, we generate two distinct models: one based on the Nutrient Sensor data and the other based on the Crop Health Sensor data. These aggregated models will then be transmitted back to their respective tractors, ensuring that each tractor receives the model most relevant to its data type (Figure 3).
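The per-sensor aggregation described above can be sketched in Python as follows. The sketch applies the standard sample-count-weighted form of Federated Averaging separately to each model type. The `model_type` field stands in for the type inferred from the Alpha-Schema Score, and the flat weight lists and sample counts are hypothetical.

```python
from collections import defaultdict


def fedavg_by_type(updates):
    """Average weight vectors per model type, weighted by sample count.

    Each update is a dict with keys "model_type", "weights" (a flat list
    of floats), and "n_samples".
    """
    grouped = defaultdict(list)
    for update in updates:
        grouped[update["model_type"]].append(update)

    aggregated = {}
    for model_type, group in grouped.items():
        total = sum(u["n_samples"] for u in group)
        dim = len(group[0]["weights"])
        aggregated[model_type] = [
            sum(u["weights"][i] * u["n_samples"] for u in group) / total
            for i in range(dim)
        ]
    return aggregated


# Hypothetical updates from three tractors.
updates = [
    {"model_type": "nutrient", "weights": [1.0, 2.0], "n_samples": 100},
    {"model_type": "nutrient", "weights": [3.0, 4.0], "n_samples": 300},
    {"model_type": "crop_health", "weights": [0.0, 10.0], "n_samples": 50},
]
global_models = fedavg_by_type(updates)
print(global_models)
```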

6.2. Equipment Operations

Upon receiving the updated global server model, each of the eight tractors will initiate its operational lifecycle process. To facilitate understanding, let us focus on a single tractor equipped with the Nutrient Sensor Model and explore its daily activities in detail. In the morning, as the farmer prepares to operate the tractor for tasks such as harvesting or tillage, the prediction process begins concurrently with the tractor’s movement through the fields. During this phase, the tractor actively assesses the nutrient levels of the crops. Should any deficiencies or issues be detected, the system will automatically issue a command to spray fertilizers, ensuring that the crops receive the necessary nutrients. This predictive monitoring and intervention will persist as long as the tractor is operational in the field.

As night falls, the tractor will shift its focus to the training phase, utilizing the data collected throughout the day during harvesting or tillage. It is important to note that many tractors are equipped with a CAB computer device, which is installed within the tractor itself. This computational resource will be leveraged to facilitate the training of the collected data. Once the training process is completed, the tractor will transmit the model weights to the designated driver node within its cluster. The driver node will concurrently gather model weights from other tractors operating in the vicinity. After collecting all relevant model weights, the driver node will apply the Federated Averaging (FedAVG) technique to combine these weights effectively. Following the aggregation, the driver node will validate the new model to determine whether it exhibits significant deviations from the previously received model from the global server. If the analysis indicates that the new model weights are indeed significantly different, only then will these updated weights be sent to the global server for further aggregation. For the purpose of this discussion, let us assume that the new model weights are indeed substantially different from the last update. In this case, the global server will receive the new model and perform a FedAVG aggregation at its level. Subsequently, the newly aggregated model will be sent back to the driver node.

Before morning arrives, this updated model obtained from the global server will be distributed back to each tractor within the cluster. As the farmer activates the tractor in the morning, it will be equipped with the latest model, enhancing its predictive capabilities and overall awareness concerning nutrient levels. This process facilitates the exchange of knowledge across the tractors without the need for direct sharing of the underlying data itself, thereby maintaining data privacy while improving operational efficiency.

6.3. Dynamic Networks

Let us now move on to another critical aspect of our system: managing the dynamic nature of wireless communication signals encountered by tractors in the field. It is well understood that tractors, like any other devices reliant on wireless communication, are subject to variable signal types depending on their location. For instance, while a farmer operates their tractor, they may experience fluctuations in network connectivity, such as transitioning from a 3G signal in one area to a 4G or even a 5G signal in another part of the field. Addressing this challenge is crucial, as inconsistent network connectivity can introduce potential roadblocks that may disrupt the seamless operation of our system. Variability in signal strength can affect data transmission rates, delay communications between the tractor and the global server, and ultimately impact the effectiveness of real-time monitoring and decision-making processes.

To mitigate these issues, we have implemented robust strategies that enhance the resilience of our system against fluctuating network conditions. This may include adaptive algorithms that can intelligently switch between available networks based on signal strength and reliability, ensuring continuous connectivity. Additionally, we may utilize data-buffering techniques that allow the tractors to store critical information locally during periods of poor connectivity, which can then be transmitted to the global server once a stable connection is re-established. By proactively addressing the challenges posed by dynamic signal types, we can enhance the reliability and efficiency of our system, ensuring that farmers can confidently operate their tractors and make informed decisions based on accurate, real-time data, regardless of their location in the field.
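The data-buffering idea can be illustrated with a small store-and-forward queue in Python. The class name, the capacity limit, and the `send` and `link_is_up` callables are hypothetical placeholders for whatever transport a tractor actually uses.

```python
from collections import deque


class UpdateBuffer:
    """Hold outgoing payloads locally until the link becomes usable."""

    def __init__(self, max_items=32):
        # When the buffer is full, the oldest payload is dropped first.
        self._queue = deque(maxlen=max_items)

    def enqueue(self, payload):
        self._queue.append(payload)

    def flush(self, send, link_is_up):
        """Send queued payloads in order; return how many were delivered.

        Stops as soon as the link drops or `send` reports a failure, so
        undelivered payloads stay queued for the next attempt.
        """
        sent = 0
        while self._queue and link_is_up():
            if not send(self._queue[0]):
                break
            self._queue.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._queue)


buffer = UpdateBuffer()
buffer.enqueue("weights_round_1")
buffer.enqueue("weights_round_2")

delivered = []
offline = buffer.flush(send=lambda p: True, link_is_up=lambda: False)
online = buffer.flush(send=lambda p: delivered.append(p) or True,
                      link_is_up=lambda: True)
print(offline, online, delivered)
```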

As illustrated in Figure 5, our system is designed to identify an optimal “smart location” for parking the tractor at night, facilitating both training processes and network convenience. The determination of this smart location is based on the Network Quality Index (NQI) metrics. While the tractor is in operation in the field, we continuously calculate the NQI to recommend to the farmer the most suitable location for parking. If the farmer chooses to park the tractor in this recommended smart location, the end-to-end process execution will be significantly expedited, minimizing any potential latencies that could arise from suboptimal network conditions. However, various factors, such as adverse weather conditions or other unforeseen circumstances, may lead the farmer to park the tractor at a random location instead. This decision could result in encountering a range of network conditions, which may complicate data transmission and communication. In our scenario, we have considered three distinct network types: 3G, 4G, and 5G. We have previously discussed our strategies for navigating the challenges posed by these varying network types in an earlier section. We encourage readers to refer to that section for a comprehensive understanding of how our system adapts to different network conditions, ensuring reliable operations regardless of the chosen parking location.

6.4. Performance Metrics

In this experimental analysis, we evaluated the performance of the tractor models and sensor systems within the framework. This evaluation is crucial for understanding the effectiveness of these systems in real-world agricultural applications, where precision and efficiency are paramount. The results are detailed in several tables covering accuracy, F1 score, precision, recall, ROC AUC, log loss, communication statistics, processing latency (Figure 7), and model latency. Each of these metrics contributes to assessing the performance and reliability of the models and sensors being studied.

To begin with, Table 2 reports the performance metrics for the eight tractor models after 30 epochs of training. The accuracy scores on the test data ranged from 0.8081 for Tractor 2 to 0.8551 for Tractor 1. Tractor 1 thus achieved the highest accuracy, which indicates its potential as a leading model in terms of reliability and effectiveness in agricultural tasks. The F1 score, the harmonic mean of precision and recall, further illustrates the performance landscape, with Tractor 1 obtaining a score of 0.7192 on the test dataset. This metric is particularly important, as it reflects the model’s ability to balance false positives and false negatives, which is essential in applications where misclassification can have significant repercussions. In terms of precision and recall, Tractor 1 again outperformed the others, achieving a precision of 0.8021 and a recall of 0.8394. These values indicate that the model correctly identifies a high proportion of positive instances while remaining sensitive enough to capture the majority of relevant cases. The ROC AUC scores, which gauge the model’s ability to discriminate between positive and negative classes, ranged from 0.7976 for Tractor 2 to 0.8289 for Tractor 7. This variation underscores the differing capabilities of the models under various conditions and task requirements. Notably, the log loss metric, which assesses the model’s confidence in its predictions, was lowest for Tractor 1 at 0.39982. This low value suggests that Tractor 1’s predictions were both accurate and confident, further supporting its status as a reliable model.

Moving beyond the tractor models, we also evaluated the performance of the nutrient and crop health sensors, as presented in Table 3 and Table 4. These tables compare the performance metrics across several training rounds, showing how the models evolved and improved with additional training. The accuracy and F1 scores for both sensor types improved substantially from Round 1 through Round 30. By Round 30, the accuracy of the nutrient sensors reached 0.918972, with an F1 score of 0.911952. This improvement illustrates the model’s ability to learn and adapt over time, indicating strong convergence and effective modeling of nutrient levels in the agricultural context. Similarly, the crop health sensor reached a final-round accuracy of 0.903333 and an F1 score of 0.859886, indicating that these sensors can accurately assess crop health indicators.

The analysis also covered communication and latency metrics, which are crucial for understanding the operational dynamics between the tractors and the global server, as shown in Table 5. The average number of communications per model was 10 for peer-to-peer interactions and 20 when communications with the driver are included. These statistics indicate the communication demands placed on the system, which must be met to ensure timely and effective model updates. The recorded average model latency of 99.63821 milliseconds is particularly noteworthy for real-time applications, as lower latency can lead to quicker decision-making and improved responsiveness in agricultural operations. Additionally, the cost drop metrics indicate that the framework reduces operational costs during model updates, thereby improving the overall efficiency of agricultural practices.

Furthermore, Table 6 reports the processing latency of the nutrient and crop health sensors over several rounds of training. The data reveal a clear trend: the introduction of checks markedly reduced processing latency (Figure 8). This is particularly evident in Round 30, where the latency for the nutrient sensor with checks enabled dropped to 1.3631 min, compared with 2.6431 min without checks. This finding suggests that integrating checks into the processing pipeline improves efficiency and shortens response times, which are essential for real-time agricultural management.

Taken together, the performance metrics for the tractor models and sensor systems indicate a strong correlation between the number of training epochs and the efficacy of the models, as shown by the consistent increases in accuracy and F1 score as training progressed. The communication and processing latency metrics (Table 7) further illustrate the operational advantages of the framework for real-time agricultural management. These results illustrate the scalability and robustness of the models developed under the framework and provide a foundation for subsequent research aimed at optimizing agricultural practices through machine learning and sensor integration, contributing to precision agriculture and sustainable farming.

7. Discussion

This section explores key research questions for understanding our proposed system architecture. We examine the mechanisms and principles that govern the framework, highlighting the interactions among its components within the agricultural context. By addressing these questions, we aim to clarify the operational efficiencies, design decisions, and strategies integral to our architecture, emphasizing its relevance to smart farming and federated learning, and to encourage further inquiry within the academic community.

7.1. System Performance Under Extreme Network Outages

Our framework is designed to handle the extreme and persistent network outages common in rural areas. Edge devices, such as tractors, can operate autonomously by storing sensor data and model updates locally, so that information needed for model training is preserved even without connectivity. This allows farmers to continue making informed decisions based on real-time data. Upon reconnection, a differential data transmission strategy sends only the changes since the last known state, which reduces the volume of data at risk during transmission and conserves bandwidth. The framework also incorporates a checkpointing mechanism that saves the model’s state at various training stages, enabling devices to revert to the most recent stable version during outages. When outages persist, proximity-based communication allows devices to connect with nearby edge nodes or local servers for updates, reducing latency and improving reliability. Adaptive algorithms assess real-time network conditions and adjust communication strategies to prioritize essential updates or compress data as necessary. A local buffering strategy queues critical data for transmission once connectivity is restored, ensuring that data are not lost. Additionally, the system provides real-time feedback on network status, advising farmers on optimal tractor parking locations to enhance connectivity.
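The differential transmission and local buffering ideas described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the framework: model parameters are represented as a flat dictionary of floats, and the names `compute_delta`, `apply_delta`, `OfflineBuffer`, and the tolerance `tol` are hypothetical.

```python
from collections import deque


def compute_delta(last_known, current, tol=1e-6):
    """Return only the parameters that are new or changed since `last_known`."""
    return {
        name: value
        for name, value in current.items()
        if name not in last_known or abs(value - last_known[name]) > tol
    }


def apply_delta(state, delta):
    """Return a copy of `state` with the changed parameters applied."""
    merged = dict(state)
    merged.update(delta)
    return merged


class OfflineBuffer:
    """Queue deltas while the link is down and flush them on reconnection."""

    def __init__(self):
        self.pending = deque()
        # State the receiver will hold once every queued delta is delivered.
        self.shadow = {}

    def record(self, current):
        """Queue the difference between `current` and the shadow state."""
        delta = compute_delta(self.shadow, current)
        if delta:
            self.pending.append(delta)
            self.shadow = apply_delta(self.shadow, delta)

    def flush(self, send):
        """Send queued deltas in order; return how many were sent."""
        sent = 0
        while self.pending:
            send(self.pending.popleft())
            sent += 1
        return sent


# Hypothetical usage: the "server" is just a dictionary updated by `send`.
server_state = {}


def send(delta):
    server_state.update(delta)


buffer = OfflineBuffer()
buffer.record({"w1": 0.10, "w2": 0.50, "b": 0.00})  # first snapshot: all new
buffer.record({"w1": 0.10, "w2": 0.55, "b": 0.00})  # only w2 changed offline
sent = buffer.flush(send)
```

In this toy run, the second queued message carries a single parameter instead of the full model, which is the bandwidth saving that a differential strategy is meant to provide. Parameter removal, compression, and acknowledgement handling are deliberately left out of the sketch.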

7.2. Scalability of the Proposed Heterogeneous FL Framework

Scalability is crucial as the number of edge nodes in agriculture can reach hundreds or thousands. Our design accommodates this growth through key mechanisms that ensure efficient operation and resource management. A hierarchical architecture forms clusters of edge devices, allowing local aggregation of model updates and significantly reducing communication overhead typically seen in centralized federated learning systems. Dynamic data transmission strategies prioritize essential updates and compress data, maintaining reliable communication and timely model updates as edge nodes increase. The federated averaging process aggregates contributions from multiple devices, enhancing convergence speed and model accuracy. This decentralized approach promotes resilience against network disruptions, allowing independent operation of edge nodes while contributing to the global model. To maintain performance as the scale increases, we have implemented a health verification mechanism that monitors edge node status. This proactive feature identifies non-operational devices, ensuring overall system reliability. Future work will validate scalability through real-world testing in diverse agricultural environments, refining our strategies for optimal performance.
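To illustrate how local aggregation within clusters reduces the number of updates that reach the global server, the sketch below applies sample-weighted federated averaging in two stages: first inside each cluster, then across clusters. The function names, the cluster labels, and the toy parameter vectors are assumptions made for illustration only.

```python
def weighted_average(updates):
    """Average parameter vectors weighted by their sample counts.

    `updates` is a list of (weights, num_samples) pairs. The result is
    returned in the same (weights, num_samples) form so that it can be
    fed into a higher aggregation level.
    """
    total = sum(n for _, n in updates)
    if total == 0:
        raise ValueError("no samples to aggregate")
    dim = len(updates[0][0])
    averaged = [
        sum(w[i] * n for w, n in updates) / total
        for i in range(dim)
    ]
    return averaged, total


def hierarchical_fedavg(clusters):
    """Aggregate inside each cluster first, then across clusters."""
    # One aggregated model per cluster (e.g. computed at the driver node).
    cluster_models = [weighted_average(members) for members in clusters.values()]
    # The global server only sees one update per cluster.
    global_model, _ = weighted_average(cluster_models)
    return global_model


# Hypothetical clusters: each member is (model weights, local sample count).
clusters = {
    "cluster_a": [([1.0, 2.0], 100), ([3.0, 4.0], 300)],
    "cluster_b": [([5.0, 6.0], 100)],
}
global_model = hierarchical_fedavg(clusters)
```

Because each cluster result carries its total sample count, the two-stage average equals the flat sample-weighted average over all three devices, while the global server receives two messages instead of three.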

7.3. Failure Recovery Mechanisms During Node Failures

We have established robust recovery mechanisms to address node failures, essential for operational continuity in agriculture. Each edge device can locally store critical model updates, allowing quick recovery from power loss or hardware failures by resuming from the last known state. A checkpointing system regularly saves the model’s state during training, enabling a return to the last successful checkpoint, thus reducing downtime. In cases where a node fails, proximity-based strategies allow neighboring devices to assume the failed node’s responsibilities, ensuring uninterrupted data processing and model updates. Dynamic node reassignment redistributes tasks among available nodes, supported by a health verification mechanism that continuously monitors operational status. Additionally, remote monitoring and diagnostics capabilities enable real-time tracking of node health, facilitating prompt issue resolution and minimizing downtime.
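The combination of health verification and proximity-based takeover described above can be sketched as follows. The sketch assumes that a node is considered dead when its last update is older than a timeout and that the nearest alive node, measured by Haversine distance, takes over. The 60 s timeout, the node names, the coordinates, and the function names are hypothetical and are not taken from our implementation.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def health_status(last_update_ms, now_ms, timeout_ms=60_000):
    """Label a node 'Alive' if it reported within the timeout, else 'Dead'."""
    return "Alive" if now_ms - last_update_ms <= timeout_ms else "Dead"


def reassign_failed_nodes(nodes, now_ms, timeout_ms=60_000):
    """Return (status per node, mapping of dead node -> nearest alive node)."""
    status = {
        name: health_status(info["last_update_ms"], now_ms, timeout_ms)
        for name, info in nodes.items()
    }
    alive = [name for name, s in status.items() if s == "Alive"]
    plan = {}
    for name, s in status.items():
        if s == "Dead" and alive:
            lat, lon = nodes[name]["position"]
            plan[name] = min(
                alive,
                key=lambda other: haversine_km(lat, lon, *nodes[other]["position"]),
            )
    return status, plan


# Hypothetical nodes with last-update timestamps (ms) and GPS positions.
nodes = {
    "driver_1": {"last_update_ms": 1_000_000, "position": (37.720, -89.220)},
    "driver_2": {"last_update_ms": 900_000, "position": (37.725, -89.220)},
    "driver_3": {"last_update_ms": 995_000, "position": (37.900, -89.000)},
}
status, plan = reassign_failed_nodes(nodes, now_ms=1_010_000)
```

In this toy example, `driver_2` has not reported for 110 s and is marked dead, and its work is assigned to `driver_1`, which is the closest node that is still alive. A production system would also need to account for the capacity of the node taking over, which this sketch ignores.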

7.4. Performance Deviations Due to Seasonal and Soil Variations

Agricultural practices are influenced by environmental factors such as seasonal changes and soil variations, which can significantly impact crop yields and system performance. Our framework anticipates performance deviations in response to these factors. Seasonal variations affect temperature, precipitation, and sunlight, which influence plant growth. Models must adapt to prioritize different agricultural needs throughout the seasons, ensuring relevance and effectiveness. Soil variations, including texture, composition, and nutrient content, can also affect model performance. Our system incorporates real-time data from sensors across fields to dynamically adjust its models based on specific soil conditions, enhancing prediction accuracy and providing tailored recommendations for resource utilization. The continuous learning aspect of our framework allows it to adapt to long-term climate and soil health changes. By leveraging data collected from edge devices, the system can identify trends and make predictive adjustments, ensuring robust performance amidst seasonal and soil-related challenges.

8. Future Work

There are several avenues for future research that could further enhance the capabilities of our proposed method. One promising direction is the exploration of hybrid federated learning (FL) models that combine centralized and decentralized approaches. By integrating the strengths of both approaches, we can potentially improve model accuracy and convergence rates while maintaining the benefits of data privacy and reduced communication costs. This integration could allow for more comprehensive learning from diverse data sources, ultimately leading to more informed agricultural practices. Additionally, we recognize the increasing importance of sustainability in agricultural operations. Future research will therefore investigate integrating our federated learning framework with renewable power resources. By harnessing solar, wind, or other renewable energy sources, we can power edge devices in remote agricultural settings, enhancing their operational efficiency and reducing reliance on traditional energy grids. This integration could also facilitate the deployment of more sophisticated sensors and computational resources in the field, further improving data quality and model performance. Future studies will also focus on enhancing the resilience of our framework under dynamic network conditions, optimizing communication strategies for varying connectivity levels to ensure continuous and reliable model updates. Through these research directions, we aim to advance our federated learning framework into a more powerful tool for enhancing productivity, resilience, and sustainability in agriculture.

9. Conclusions

The proposed agricultural system architecture represents an advance in integrating intelligent technologies within farming practices. By using a federated learning approach, the system ensures that sensitive data remain localized while enabling effective model training across multiple edge devices, specifically tractors equipped with nutrient and crop health sensors. This decentralization enhances data security and operational efficiency, reducing the risks associated with transmitting raw data over potentially insecure networks. The architecture enables the formation of clusters based on the operational characteristics of each tractor, using methodologies such as alpha schema-based scoring and consumption ability assessments. This grouping optimizes resource utilization and improves the overall effectiveness of the predictive analysis. Additionally, the architecture incorporates mechanisms for health verification, driver node selection, and secure data transmission. By employing the Salsa20 encryption algorithm, the system protects the confidentiality of data exchanged between edge devices and the global server, fostering a secure environment for collaborative learning. The focus on adaptive data transmission strategies, especially under varying network conditions, highlights the system’s resilience and adaptability in real-world agricultural settings. This approach enhances the reliability of model updates and helps farmers make timely, informed decisions based on real-time data insights.

Ultimately, the architecture paves the way for a more sustainable and productive agricultural ecosystem, equipping farmers with the necessary tools to optimize their operations while minimizing environmental impact. As the agricultural industry evolves, integrating such innovative technologies will play a crucial role in shaping the future of farming, driving efficiency, and promoting responsible resource management. The insights gained from this system may also serve as a foundation for future research and development, further enhancing the capabilities of intelligent agricultural practices.

Author Contributions

Conceptualization, S.P.; Methodology, S.P.; Software, S.P.; Validation, S.P.; Formal analysis, K.S.; Resources, S.P.; Data curation, S.P. and K.S.; Writing – original draft, S.P.; Writing – review & editing, K.S.; Supervision, K.S.; Project administration, K.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was conducted without any external funding or financial support from outside organizations. All aspects of this study, including design, data collection, analysis, and manuscript preparation, were carried out independently by the authors.

Institutional Review Board Statement

In the context of our research, it is important to clarify that our experiments did not involve the use of human participants or animals. As such, the ethical considerations typically associated with studies involving these subjects are not applicable in this case. Furthermore, the ethical review process has been deemed unnecessary, as our research relied exclusively on publicly available datasets. These datasets are accessible to the research community and do not require ethical approval for their use.

Data Availability Statement

In our experiments, we utilized a comprehensive array of publicly accessible datasets related to nutrient sensors and crop health sensors. For nutrient sensor data, we accessed various datasets provided by the USDA, which encompass critical information concerning soil nutrients and crop health, available at https://www.nass.usda.gov/. Additionally, the Global Soil Information Facilities (GSIF) offered us global soil nutrient information, including a variety of soil property datasets, which we accessed at https://www.globalsoilmap.net/. We also leveraged the Soil Data Mart, which serves as a repository for soil data, including nutrient information from multiple locations, accessible at https://sdmdataaccess.nrcs.usda.gov/. Furthermore, the FAO Global Land Cover Dataset provided valuable land cover data that we correlated with nutrient levels across different regions, which can be found at http://www.fao.org/geonetwork/srv/en/metadata.show?id=12683, accessed on 19 October 2024. Regarding crop health sensor datasets, we utilized NASA’s MODIS (Moderate Resolution Imaging Spectroradiometer) data, which provides satellite information for monitoring crop health and vegetation cover, accessible at https://modis.gsfc.nasa.gov/data/, accessed on 19 October 2024. We also incorporated Sentinel-2 satellite imagery data from the European Space Agency, which includes essential information on vegetation health and is available at https://scihub.copernicus.eu/dhus, accessed on 21 October 2024. Additionally, we employed CropScape, a web-based application that offers geospatial data on crop cover and health across the United States, which can be found at https://nassgeodata.gmu.edu/CropScape/. The datasets from the Precision Agriculture Data provided by the University of Nebraska-Lincoln were also instrumental in our research, offering insights into crop health and nutrient management practices, accessible at https://precisionag.unl.edu/resources, accessed on 19 October 2024. We utilized USDA Plant Health Data, which includes information on plant health monitored through various sensors and remote sensing technologies, found at https://www.aphis.usda.gov/aphis/resources/pests-diseases, accessed on 26 October 2024. Lastly, we explored a variety of datasets related to crop health and agricultural practices available for analysis on Kaggle at https://www.kaggle.com/datasets, accessed on 19 October 2024.

Conflicts of Interest

The authors wish to explicitly declare that there are no conflicts of interest associated with this research. This statement affirms that none of the authors have any financial, personal, or professional relationships that could be perceived as influencing the integrity or objectivity of the research findings presented in this study. We maintain a commitment to transparency and ethical standards in our work, ensuring that our conclusions are based solely on the data and analysis conducted during the research process. This declaration serves to reinforce our dedication to upholding the highest standards of academic integrity and trustworthiness in our publication.

Figure 1. The system architecture depicts the components of our framework, including driver and client nodes, their communication with the global server, and emphasizes the critical role of local aggregations and checkpoints in cluster operations.

Figure 2. The diagram above provides a detailed overview of the workflow for the proposed self-regulating heterogeneous federated learning system.

Figure 3. The figure illustrates the global server’s cluster formation from the dispatch data of tractors or IoT devices, with additional details in the system architecture section. Here, circles of different colors represent different model predictions.

Figure 4. The proposed architecture highlights driver and client nodes, their communication with the global server, and different training models represented by shapes within the cluster.

Figure 5. The tractor’s location and the calculation of the smart location using the Network Quality Index, which recommends optimal sites to enhance signal reception for farmers.

Figure 6. Dispatching of the data array sent by the global server for initial cluster formation. Each index contains the values needed for system operations.

Figure 7. Comparison of global processing latencies between existing federated learning systems and our framework, with and without checkpoints, showing how checkpoints influence latency and enhance learning efficiency.

Figure 8. Accuracy of our system and the existing system across eight tractors over a 30-epoch training phase; our system achieves comparable accuracy while ensuring energy-efficient communication.

Table 1. An overview of the health status of driver nodes as monitored by the global server.

Driver Node	Last Update (Timestamp)	Health Status
$Driver_{1}$	1426349234842	Alive
$Driver_{1}$	1426349294342	Alive
$Driver_{1}$	1426349294812	Dead
$Driver_{1}$	1426349294452	Alive
$Driver_{2}$	1426349296642	Dead
$Driver_{2}$	1426349245842	Alive
$Driver_{2}$	1426349296642	Dead
$Driver_{2}$	1426349245842	Alive
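
One common way to derive an Alive/Dead status such as the one in Table 1 is a heartbeat timeout on the last update timestamp. The sketch below is only an illustration of that idea: the article does not state the rule or threshold it uses, so `HEARTBEAT_TIMEOUT_MS`, `DriverRecord`, `health_status`, and the timestamps are all hypothetical and are not meant to reproduce the rows of Table 1.

```python
from dataclasses import dataclass

# Hypothetical threshold: the article does not state the timeout it uses.
HEARTBEAT_TIMEOUT_MS = 60_000


@dataclass
class DriverRecord:
    driver_id: str
    last_update_ms: int  # epoch milliseconds, like the Timestamp column


def health_status(record: DriverRecord, now_ms: int,
                  timeout_ms: int = HEARTBEAT_TIMEOUT_MS) -> str:
    """Return "Alive" if the last update is recent enough, else "Dead"."""
    age_ms = now_ms - record.last_update_ms
    return "Alive" if age_ms <= timeout_ms else "Dead"


# Illustrative values only (not taken from Table 1).
now = 1_700_000_100_000
fresh = DriverRecord("Driver_1", 1_700_000_095_000)  # updated 5 s ago
stale = DriverRecord("Driver_2", 1_700_000_000_000)  # updated 100 s ago

print(health_status(fresh, now), health_status(stale, now))
```

Under this assumed rule, a driver flagged "Dead" would be the trigger for the status verification and new driver election steps described in the health verification section.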

Table 2. Performance metrics of individual driver nodes (accuracy, F1 score, precision, and recall) after 30 training epochs, indicating their contribution to overall system performance.

Run Name	Accuracy (Train)	Accuracy (Test)	F1 Score (Train)	F1 Score (Test)	Precision (Train)	Precision (Test)	Recall (Train)	Recall (Test)
Tractor 1	0.8657	0.855054	0.8476	0.7192	0.8424	0.8021	0.8529	0.7394
Tractor 2	0.8116	0.808122	0.7816	0.6548	0.7779	0.8517	0.7852	0.6579
Tractor 3	0.8067	0.847833	0.7772	0.7086	0.7666	0.8957	0.7881	0.8219
Tractor 4	0.8416	0.8261	0.8180	0.6850	0.8133	0.8666	0.8235	0.8043
Tractor 5	0.8176	0.833393	0.7877	0.6868	0.7902	0.8868	0.7852	0.8868
Tractor 6	0.80679	0.86227	0.7787	0.7239	0.7612	0.8172	0.7970	0.83070
Tractor 7	0.81281	0.86227	0.7843	0.72391	0.7747	0.8172	0.7941	0.83070
Tractor 8	0.83580	0.81534	0.70683	0.68895	0.6566	0.8314	0.7647	0.85701
Average	0.8205	0.8262	0.7839	0.7032	0.7750	0.8400	0.7931	0.7868

Table 3. Performance metrics for the crop health sensors used in tractor training over 30 epochs, including accuracy, F1 score, precision, and recall.

	Crop Health Sensor
Runs	Accuracy	F1 Score	Precision	Recall	R-AUC
Round 1	0.496667	0.375518	0.327651	0.496667	0.981201
Round 5	0.530000	0.420684	0.381193	0.530000	0.991298
Round 10	0.583333	0.483966	0.450563	0.583333	0.969821
Round 15	0.683333	0.601602	0.568000	0.683333	0.978972
Round 20	0.796667	0.725252	0.690074	0.796667	0.971291
Round 25	0.903333	0.859886	0.835349	0.903333	0.971201
Round 30	0.903333	0.859886	0.835349	0.903333	0.996911

Table 4. Performance metrics of the nutrient sensors used in tractor training, indicating their effectiveness and accuracy in monitoring soil nutrient levels.

	Nutrient Sensors
Runs	Accuracy	F1 Score	Precision	Recall	R-AUC
Round 1	0.616667	0.520355	0.482679	0.616667	0.980191
Round 5	0.630000	0.549155	0.518471	0.630000	0.999821
Round 10	0.643333	0.554141	0.519656	0.643333	0.961098
Round 15	0.783333	0.705188	0.665296	0.783333	0.971291
Round 20	0.816667	0.748362	0.710571	0.816667	0.951091
Round 25	0.897333	0.892952	0.878691	0.910930	0.971291
Round 30	0.918972	0.911952	0.891291	0.939830	0.982133

Table 5. Occurrences and latencies of 3G, 4G, and 5G network connections for each tractor over 30 epochs, highlighting their impact on communication efficiency and latency during training.

Run Name	3G Occurrence	3G Latency	4G Occurrence	4G Latency	5G Occurrence	5G Latency
Tractor 1	1	31.24567	8	14.58743	21	0.34567
Tractor 2	0	35.67892	9	15.73456	21	0.78901
Tractor 3	4	32.98754	7	16.21897	19	0.45678
Tractor 4	1	36.41230	9	14.95123	20	0.98765
Tractor 5	3	33.13456	7	15.67890	20	0.23456
Tractor 6	1	30.98765	9	16.84512	20	1.03456
Tractor 7	5	31.76543	7	14.34567	18	0.54321
Tractor 8	0	34.23456	9	16.45678	21	0
Average	1.875	33.45061	8.125	14.33982	20	0.41961
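
The occurrence averages in Table 5 can be recomputed directly from the per-tractor rows. The short sketch below is illustrative only: the variable and function names are ours, and it assumes that each tractor's 3G, 4G, and 5G occurrences together account for the 30 training epochs, which is consistent with every row of the table.

```python
# Per-tractor network occurrences copied from Table 5: (3G, 4G, 5G).
occurrences = {
    "Tractor 1": (1, 8, 21),
    "Tractor 2": (0, 9, 21),
    "Tractor 3": (4, 7, 19),
    "Tractor 4": (1, 9, 20),
    "Tractor 5": (3, 7, 20),
    "Tractor 6": (1, 9, 20),
    "Tractor 7": (5, 7, 18),
    "Tractor 8": (0, 9, 21),
}
EPOCHS = 30  # number of epochs stated in the Table 5 caption

# Assumption check: every epoch is attributed to exactly one network type.
rows_sum_to_epochs = all(sum(row) == EPOCHS for row in occurrences.values())


def column_means(rows):
    """Average each column over all tractors."""
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))


mean_3g, mean_4g, mean_5g = column_means(list(occurrences.values()))
print(rows_sum_to_epochs)          # True
print(mean_3g, mean_4g, mean_5g)   # 1.875 8.125 20.0
```

The three means match the occurrence entries in the Average row.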

Table 6. Processing latency at the global server, i.e., the time to aggregate model weights and return them to edge nodes, for nutrient and crop health sensors, without and with checkpoints.

Runs	Nutrient Sensor, w/o Checkpoint (min)	Nutrient Sensor, w/ Checkpoint (min)	Crop Health Sensor, w/o Checkpoint (min)	Crop Health Sensor, w/ Checkpoint (min)
Round 1	2.1431	2.1431	0.41	0.41
Round 5	2.9131	2.9131	1.0731	1.0731
Round 10	2.5931	2.2031	0.49	0.34
Round 15	2.3631	2.1531	0.2869	0.1969
Round 20	2.9991	1.3311	0.36	0.21
Round 25	2.5131	1.1121	0.45	0.1469
Round 30	2.6431	1.3631	0.39	0.1069
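
To make the effect of checkpoints in Table 6 easier to read, the relative latency reduction per round can be computed from the tabulated values. This is a minimal sketch under our own naming (`latency_min`, `reduction_pct`); the percentage formula is a standard relative difference and is not taken from the article.

```python
# Global-server processing latencies in minutes, copied from Table 6:
# round -> (nutrient w/o, nutrient w/, crop health w/o, crop health w/)
latency_min = {
    1: (2.1431, 2.1431, 0.41, 0.41),
    5: (2.9131, 2.9131, 1.0731, 1.0731),
    10: (2.5931, 2.2031, 0.49, 0.34),
    15: (2.3631, 2.1531, 0.2869, 0.1969),
    20: (2.9991, 1.3311, 0.36, 0.21),
    25: (2.5131, 1.1121, 0.45, 0.1469),
    30: (2.6431, 1.3631, 0.39, 0.1069),
}


def reduction_pct(without_ckpt, with_ckpt):
    """Relative latency reduction, in percent, when checkpoints are used."""
    return 100.0 * (without_ckpt - with_ckpt) / without_ckpt


for rnd, (n_wo, n_w, c_wo, c_w) in latency_min.items():
    print(f"Round {rnd:>2}: nutrient {reduction_pct(n_wo, n_w):5.1f}%, "
          f"crop health {reduction_pct(c_wo, c_w):5.1f}%")
```

For Round 30 this gives a reduction of roughly 48% for the nutrient sensor and 73% for the crop health sensor, while Rounds 1 and 5 show no difference between the two settings.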

Table 7. Tractor communication metrics under varying network conditions, highlighting model latencies, cost drops, and achieved energy efficiencies.

Run Name	Peer–Peer Communications	With-Driver Communications	Model Latency (vs. Fed Learning)	Cost Drop (vs. Fed Learning)
Tractor 1	9	21	99.23145	60.48234
Tractor 2	9	21	99.66457	68.91567
Tractor 3	11	19	99.47532	65.23145
Tractor 4	10	20	99.81234	70.12984
Tractor 5	10	20	99.35011	62.78456
Tractor 6	10	20	99.74589	67.54321
Tractor 7	12	18	99.61428	61.09876
Tractor 8	9	21	99.21576	69.67890
Average	10	20	99.63821	60.85821

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
