1. Introduction
According to a United Nations (UN) report (
https://www.un.org/development/desa/en/news/population/world-population-prospects-2019.html, accessed on 12 December 2023), the global population is expected to grow to almost 10 billion people by 2050. Based on this estimate, the World Resources Institute (WRI) calculates that, to keep up with demand, food production will have to increase by at least 60% over the same period [
1]. However, the agricultural land that would be required far exceeds the land available today. This concept, referred to by the WRI as the
land gap, has serious climate implications: using that much land for agricultural purposes would destroy vital ecosystems, which in turn would contribute further to climate change, given that food production is already responsible for almost 25% of global greenhouse gas emissions [
1].
In recent years, many lines of research have started to explore the possibility of solving the land gap problem through the application of cutting-edge technologies, pushing agriculture itself toward a new era known, among other names, as
smart farming, precision agriculture, or
Agriculture 4.0 [
2]. This field of study has gained a lot of momentum in the last few years since it is considered one of the key contributors toward the UN’s Sustainable Development Goals (
https://sdgs.un.org/goals, accessed on 12 December 2023) introduced in 2015 in the 2030 Agenda for Sustainable Development (
https://sdgs.un.org/2030agenda, accessed on 12 December 2023).
Undoubtedly, Artificial Intelligence (AI) is a pivotal technology in the realm of smart agriculture, as noted in the works [
3,
4]. Its integration has revolutionized this field, opening up new avenues to enhance both the quantity and quality of crop yields, as well as to automate processes. This paves the way for intelligent, autonomous systems that can learn and make informed decisions. For instance, the study in [
5] highlights how AI can reduce chemical use by up to 90%, showcasing its efficacy in optimizing agricultural production processes, leading to more efficient farming, increased productivity, and reduced environmental impact.
In addition, Shankar et al. (2020) [
6] present insights on how AI-driven strategies can refine crop protection, bolstering sustainable agriculture. This research demonstrates the significant improvements AI can bring to crop management strategies. Further exploring AI’s role in environmentally conscious agriculture, the authors in [
7] review precision chemical weed management strategies and propose a new CNN-based modular spot sprayer. This innovation is a testament to how AI can be applied to develop more precise and efficient weed control solutions, reducing the overall chemical footprint in farming.
Furthermore, the work carried out in [
8] delves into how AI can alleviate environmental challenges posed by agriculture. It provides a comprehensive look at how AI can be implemented for more efficient crop production and monitoring while minimizing ecological footprints. Complementing these findings, Visentin et al. (2023) [
9] investigate a mixed-autonomous robotic platform for precise weed removal in both intra-row and inter-row settings. This development underscores the role of AI in precision agriculture, illustrating how robotic systems can be specialized for tasks like exact weed control, which helps in reducing chemical herbicide reliance and supports sustainable agriculture.
Additionally, the deployment of collaborative smart robots, as detailed in the work [
10], represents a significant advancement. In this case, a group of robots leverages AI to optimize harvesting routes, thereby boosting crop collection volumes. This not only exemplifies the increasing autonomy in agricultural systems but also their efficiency and ecological responsibility.
The integration of AI in agriculture, as mentioned, seeks not only to enhance productivity but also to ensure the welfare and efficiency of both the machinery and the workforce, emphasizing the potential of AI to aid human workers rather than substitute them. Implementing AI in conjunction with Unmanned Ground Vehicles (UGVs), for example, to assist workers in optimizing fruit harvesting or to accurately distribute phytosanitary products, highlights how well humans and machines can work together. Such synergies not only optimize agricultural processes but also ensure the protection of the environment and the sustainability of resources. This approach seeks a balance in which technology serves human efforts rather than replacing them, ensuring that the insights and expertise of human intervention remain an integral part of the process.
However, this integration brings its own challenges, which must be overcome before its widespread adoption in the agricultural domain can be considered.
The enormous growth and spread of AI in recent years is undeniable [
11,
12,
13]. It has spread so widely that it has become a revolution, applied in almost every sector imaginable: embedded finance [
14], business value [
15], transport management [
16], medicine [
17], Industry 4.0 [
18], and, of course, agriculture [
19,
20,
21].
Agriculture is precisely one of the domains where AI is gaining the most importance. The automation of processes through Machine Learning (ML) or Deep Learning (DL) based models is expected to allow machines (UGVs, Unmanned Aerial Vehicles (UAVs)) to substitute for humans in repetitive and costly tasks, which in turn is expected to increase the performance and efficiency of the task at hand.
However, the application of AI in production or operational environments in general, and in the agricultural domain in particular, still faces many challenges. One of the main open issues in AI right now is not so much the creation of the models themselves, but their deployment in production environments, their maintenance throughout their entire life cycle, and the management of the huge datasets that are usually involved. Rapid changes in models and data require continuous updates to production systems, and asset management, including model versions and data, should be as autonomous and optimal as possible. The true challenge of AI integration lies in adapting to increasingly rapid changes, optimizing that integration for real-world scenarios, and maintaining an organized workflow to implement these measures. At present, there is no single solution for this, but rather a plethora of tools that AI practitioners such as data scientists or ML engineers need to master before even getting to use them, let alone considering their usage in industrial environments. It is currently up to the data scientist or ML engineer to study the different tools and assess which ones are most suitable for building the whole workflow and model life cycle. This makes for a steep learning curve and hinders the acceptance of AI in domains such as agriculture, where it is completely disruptive and generates distrust in a technology that is often seen as a black box.
To address these issues, a new paradigm known as Machine Learning Operations (MLOps) has emerged. Its objectives are twofold: (1) automating the process of building ML models and deploying them to production; and (2) maintaining and monitoring these models throughout their whole life cycle to detect potential issues which could compromise the AI model’s performance and automate a response [
22,
23,
24], thereby increasing efficiency and scalability while reducing potential risks. By embracing the MLOps culture, developers unlock the advantages of optimized workflows, automated AI model deployment, and effective collaboration, leading to increased productivity and robustness, faster development cycles, and better performance of their AI models.
Therefore, the objective of this paper is to provide a solution for some of the aforementioned open challenges. We propose an open-source AI architecture based on the MLOps paradigm to reduce the complexity of developing and deploying AI models in agricultural contexts. The proposal seeks to improve upon state-of-the-art MLOps methods by providing a functional, tested architecture that is already used by several AI stakeholders. This solution aims to (1) flatten the learning curve associated with managing AI models without a centralized MLOps platform and (2) promote the acceptance of AI in agriculture by presenting an integrated approach to develop and deploy AI models, store datasets, and even gather data from different sources of information. It supports state-of-the-art IoT communication protocols such as Message Queuing Telemetry Transport (MQTT) [
25] and Hypertext Transfer Protocol (HTTP) [
26].
Hence, the main contributions of this paper are the following:
An AI architecture using open-source technologies for creating and producing AI models is presented, covering the whole life cycle of the AI model, from its creation to its deployment and monitoring.
The architecture builds a workflow made of state-of-the-art tools that enable data scientists and ML engineers to work more efficiently and rapidly, solving many problems in their day-to-day work.
The architecture supports access through different IoT protocols, such as HTTP and MQTT, easing communication with diverse devices.
The system is able to run different AI models at the same time, making optimal use of the hardware resources available in the cluster where the platform has been deployed.
The rest of the document is organized as follows. First, in
Section 2.1, the related work is reviewed. Then, the proposed architecture and its different components, which constitute the main contribution of this paper, are described in
Section 2.2. Insights gathered from key stakeholders who have already been exposed to the platform are summarized in
Section 3, and the findings are discussed in Section 4. Finally, the main conclusions are drawn in
Section 5.
3. Results
This section presents the results and findings obtained from the evaluation of the platform presented in this paper.
Firstly, practical applications of the graphical user interface are demonstrated; the interface is specifically designed to make inference systems accessible to individuals with non-technical backgrounds.
Secondly, a thorough analysis was conducted to assess the performance, usability, and effectiveness of the proposed architecture. As part of this process, potential users of the platform have been surveyed regarding the usage of this solution. This evaluation aimed to validate the platform’s capabilities and its potential impact in addressing the challenges outlined in
Section 1. The results provide valuable insights into the platform’s qualitative assessments, comparative analysis against existing solutions, scalability and efficiency, as well as real-world use cases. Additionally, limitations identified during the evaluation are discussed, and potential future directions for further improvement are suggested.
Finally, the platform’s position within the broader scope of MLOps must be considered. This is particularly important as MLOps is rapidly evolving and becoming more prominent in fields such as agriculture. As a recent research study shows, MLOps primarily aims to streamline AI operations, improve collaboration, and facilitate the transition from AI model design to deployment [
93]. The survey data, especially from AI professionals, supports the platform’s ability to effectively bridge the gap between data scientists and ML engineers, a sentiment also highlighted in [
94].
The section is organized as follows: first,
Section 3.1 presents the GUI designed for end-users to interact with inference systems in a simple, visual way. Second,
Section 3.2 presents a summary of how different stakeholders can use the proposed architecture; then,
Section 3.3 outlines the contents of the survey conducted, together with the insights extracted from them.
3.1. Web Platform
In this work, a Graphical User Interface (GUI) (
Figure 6) has been developed for individuals who lack a technical background in querying REST APIs or deploying AI models. Within the GUI, four inference models have been deployed for use by agricultural engineers. In addition, this GUI is publicly accessible [
76] via the Internet.
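For reference, a query to one of these inference endpoints can be assembled as a standard JSON-over-HTTP request. The following sketch, using only the Python standard library, builds a KServe-style prediction request; the URL, model name, and payload schema are illustrative assumptions, not the actual public endpoint of the deployed GUI.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint: the real URL of the platform's inference API
# is not reproduced here, so this address is purely illustrative.
INFERENCE_URL = "https://example.org/v1/models/tractor-detector:predict"

def build_inference_request(image_bytes: bytes) -> urllib.request.Request:
    """Build a JSON prediction request carrying a base64-encoded image.

    The payload layout (an "instances" list) follows a common serving
    convention; the deployed models may expect a different schema.
    """
    payload = {
        "instances": [
            {"image": {"b64": base64.b64encode(image_bytes).decode("ascii")}}
        ]
    }
    return urllib.request.Request(
        INFERENCE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Placeholder bytes stand in for a real image file read from disk.
req = build_inference_request(b"\x89PNG...")
# urllib.request.urlopen(req) would then send the request to the server.
```

In practice, the response would contain the model's detections (e.g., bounding boxes for tractors), which the GUI renders for the end-user.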
In this study, four distinct Artificial Intelligence models have been deployed for specific agricultural applications. These models represent cutting-edge integrations of AI in the realm of agriculture, addressing diverse challenges faced by the industry.
Firstly, a model dedicated to the detection of tractors utilizing computer vision techniques has been implemented (top left corner in
Figure 6). This model exemplifies the application of image recognition technologies in agricultural settings, enabling enhanced monitoring and management of farming equipment.
Secondly, the project’s focus extends to the precise detection of Botrytis, a significant fungal disease affecting various crops (top right corner in
Figure 6). Leveraging AI, a model that facilitates targeted spraying through advanced detection methods has been developed. This approach allows for precise application of fungicides, optimizing resource use and minimizing environmental impact.
Thirdly, pest management is addressed via the deployment of an AI model capable of detecting and counting insects in traps (bottom left corner in
Figure 6). This model facilitates the monitoring process, providing accurate, real-time data that is crucial for effective pest control strategies.
Finally, a model for the detection and quantification of weeds has been introduced (bottom right corner in
Figure 6). This model aids in the identification of unwanted flora, a key task for crop management and yield optimization. By accurately recognizing and counting weed species, this model supports more effective and environmentally conscious weed control practices.
Collectively, these AI models demonstrate the potential of Artificial Intelligence to revolutionize various aspects of agricultural operations, offering innovative solutions to longstanding challenges in the field.
3.2. How to Use the Platform
As introduced in
Section 2.2, one of the main motivations of the work presented in this paper is to provide a comprehensive architecture that eases the collaboration between two clear and distinct professional profiles that have emerged in recent years: data scientists and ML engineers. Without such an architecture, the collaboration between these roles is inefficient, which results in limited job parallelization. Typically, the ML engineer waits for the data scientist to finalize the model’s preparation and tuning, and must then build the production infrastructure from scratch. If issues arise, adjustments by the data scientist are followed by reintegration by the ML engineer. These development and deployment processes are lengthier due to the required coordination between both roles, as depicted in
Figure 7a.
The architecture proposed in this paper (
Figure 4) addresses this problem, empowering each role to excel in their respective stages. The proposed method delivers results in significantly shorter times than the traditional development methodology described above. In this process, collaboration between the data scientist and the ML engineer is crucial: the AI solution is defined jointly before development begins.
Figure 7b illustrates an initial “agreement” step, outlining the pipeline components and their respective inputs and outputs.
Once the data scientist is clear about their role and understands the ML engineer’s procedure, they can start designing AI solutions using Jupyter notebooks provided by Kubeflow. They also have the option to use Katib for hyperparameter tuning studies. Concurrently, to expedite the process, the ML engineer begins designing pipeline components, such as data collection and AI model deployment in KServe. Since this stage can also be executed with Kubeflow’s Jupyter notebooks, the ML engineer can integrate preliminary versions of the model into the full pipeline.
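The division of labor described above can be pictured as an agreed contract of component signatures. The following stdlib-only sketch is purely illustrative (the component names, signatures, and placeholder bodies are assumptions, not the platform's actual code); in the platform itself, each function would become a Kubeflow Pipelines component with the same inputs and outputs.

```python
from typing import Dict, List

# Each pipeline step is agreed upon as a named function with a fixed
# input/output contract. The bodies are stand-ins that the ML engineer
# (data collection, deployment) and the data scientist (model) can then
# implement independently and in parallel.

def collect_data(source: str) -> List[float]:
    """ML engineer's component: fetch raw samples from a source."""
    return [0.2, 0.7, 0.4]  # placeholder data

def preprocess(samples: List[float]) -> List[float]:
    """Shared contract: raw samples in, samples normalized to [0, 1] out."""
    peak = max(samples)
    return [s / peak for s in samples]

def predict(samples: List[float]) -> Dict[str, float]:
    """Data scientist's component: a trivial stand-in for the AI model."""
    return {"score": sum(samples) / len(samples)}

def run_pipeline(source: str) -> Dict[str, float]:
    """Chain the agreed components in order."""
    return predict(preprocess(collect_data(source)))

result = run_pipeline("mqtt://sensors/field-1")  # hypothetical source URI
```

Because the contract is fixed first, the ML engineer can wire preliminary model versions into the full pipeline before the final weights are ready, exactly as described above.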
Upon the data scientist’s model design completion, the ML engineer integrates the model, gathering the necessary weights for its pipeline implementation. During this phase, the ML engineer can also adjust the model’s input protocol, incorporating data from HTTP or MQTT. Thus, the ML engineer prepares the AI model for actual production deployment.
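Protocol adaptation can be as thin as a normalization layer in front of the model. The sketch below is a hedged illustration (topic, path, and field names are hypothetical; a real deployment would receive MQTT traffic via a client library such as paho-mqtt and HTTP traffic via the serving endpoint), showing both transports converging on one inference entry point.

```python
import json
from typing import Any, Dict

def from_mqtt(topic: str, payload: bytes) -> Dict[str, Any]:
    """Wrap an MQTT message into a transport-neutral dictionary."""
    return {"source": f"mqtt:{topic}", "data": json.loads(payload)}

def from_http(path: str, body: bytes) -> Dict[str, Any]:
    """Wrap an HTTP POST body into the same transport-neutral shape."""
    return {"source": f"http:{path}", "data": json.loads(body)}

def infer(message: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for the deployed model: reports where the data came
    from and how many fields it carried."""
    return {"source": message["source"], "n_fields": len(message["data"])}

# Both transports converge on the same inference entry point.
mqtt_result = infer(from_mqtt("farm/camera1", b'{"image_id": 7}'))
http_result = infer(from_http("/v1/models/weeds:predict", b'{"image_id": 7}'))
```

This keeps the model itself transport-agnostic: switching a data source from HTTP to MQTT only changes the adapter, not the pipeline.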
The outcome is a comprehensive and efficient pipeline covering the entire process of development, testing, and deployment of an AI model.
Finally, the proposed architecture allows data scientists and ML engineers to track the performance of the deployed model in real time by using the same platform. They can collect relevant metrics, monitor the model’s behavior, and make iterative improvements based on the feedback received. This feedback loop ensures continuous enhancement and adaptation of the model to changing requirements.
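As a minimal illustration of such a feedback loop, the sketch below keeps a rolling average of prediction confidences and raises a flag when it degrades; the window size and alert threshold are arbitrary example values, not the platform's actual monitoring configuration.

```python
from collections import deque

class ModelMonitor:
    """Illustrative monitor for a deployed model: tracks a rolling
    window of prediction confidences and flags possible degradation."""

    def __init__(self, window: int = 5, threshold: float = 0.6):
        self.confidences = deque(maxlen=window)
        self.threshold = threshold

    def record(self, confidence: float) -> bool:
        """Record one prediction's confidence; return True when the
        rolling average falls below the threshold."""
        self.confidences.append(confidence)
        avg = sum(self.confidences) / len(self.confidences)
        return avg < self.threshold

monitor = ModelMonitor()
# A gradual drop in confidence eventually trips the alert.
alerts = [monitor.record(c) for c in (0.9, 0.85, 0.4, 0.35, 0.3)]
```

In the platform, an alert of this kind would prompt the data scientist and ML engineer to investigate, retrain, or redeploy the model through the same pipeline.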
3.3. Conducted Interviews
To assess the utility and effectiveness of the proposed platform, interviews were conducted with six data scientists and five ML engineers involved in diverse agricultural field activities. The interviewees possessed mid-level seniority and had been working in the machine learning field for the last 3–5 years, giving the study a balanced perspective of experience and current industry practices. All responses were therefore given equal weight, as the respondents possess a comparable level of expertise. In
Section 3.3.1, the contents and structure of the survey are presented. Afterwards, in
Section 3.3.2, the results obtained from the surveys are presented on a rating scale from 1 to 5. Additionally, metrics such as the mean and standard deviation have been extracted to comprehend the results statistically.
3.3.1. Contents and Structure of the Survey
To evaluate the usability and adaptability of the platform in general, the surveyed professionals were invited to implement the platform in collaborative projects, aiming to understand its adaptability and performance in a real-world setting. One of the first aspects examined was the learning curve, measuring the time it took for professionals to become comfortable using the platform’s tools and architecture in an actual project.
The choice to center this evaluation on interviews with AI professionals arises from the intention to ensure that the proposed platform addresses tangible challenges in real-world settings. The insights gained from these interviews offer a comprehensive view of the platform’s capabilities. This emphasizes its effectiveness in closing communication and operational gaps between data scientists and ML engineers [
93]. This human-centric evaluation methodology highlights the commitment to utility and adaptability.
Additionally, special attention was given to understanding how the platform would impact the team dynamics. Factors such as collaboration and efficient task division were evaluated to determine whether the platform facilitates or complicates these processes. This is crucial, as any tool, no matter how advanced, must effectively integrate into a team’s workflow to be genuinely useful.
Another area of focus was the interaction between different professional profiles. The aim was to assess whether the platform simplifies communication and collaboration between data scientists and ML engineers, who often need to coordinate closely but from different technical perspectives.
The study also explored the transition of projects from development to production. This is key to understanding the platform’s versatility and its applicability at different stages of a project’s lifecycle.
In terms of data management, the user experience with the platform’s storage system (MinIO) was examined, especially concerning data handling and sharing. Lastly, the ease with which knowledge of the platform can be transferred to new users was evaluated, an aspect that is key for the long-term sustainability of projects.
In summary, the evaluation aimed to show how well the platform meets the needs and solves the challenges faced by both data scientists and ML engineers.
3.3.2. Metrics
For a better understanding of the results and their distribution, in addition to the qualitative feedback, each interviewee was asked to provide a rating for each criterion, ranging from 1 (most negative) to 5 (most positive). Each aspect mentioned in the previous section constitutes one of the evaluated criteria.
These ratings have been collected and organized into tables, where the "criterion" column displays the corresponding question, and the remaining table values store the scores given by the respondents.
Furthermore, to provide more context and understand the statistical variation in these values, the mean and standard deviation have been calculated. Additionally, the distribution has been assumed to be normal, and confidence intervals at the 70% confidence level have been constructed around the mean for each criterion:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad (1)$$

$$\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} \quad (2)$$

In Equations (1) and (2), the value $n$ represents the number of elements in the dataset, and $x_i$
represents the individual values in the dataset.
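For reproducibility, these statistics can be computed with the Python standard library alone. In the sketch below, the ratings are hypothetical placeholders rather than the actual survey responses, and the 70% interval uses the normal z-score, consistent with the normality assumption above.

```python
import statistics

def summarize_ratings(ratings, confidence=0.70):
    """Return the mean, sample standard deviation, and a normal-theory
    confidence interval around the mean for a list of 1-5 ratings."""
    n = len(ratings)
    mean = statistics.mean(ratings)   # Equation (1)
    std = statistics.stdev(ratings)   # Equation (2), sample std (n - 1)
    # Two-sided z-score for the requested confidence level, assuming
    # the ratings are approximately normally distributed.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * std / n ** 0.5
    return mean, std, (mean - margin, mean + margin)

# Hypothetical ratings from 11 respondents for one criterion.
mean, std, (lo, hi) = summarize_ratings([5, 4, 4, 5, 3, 4, 5, 4, 4, 3, 5])
print(f"mean={mean:.2f}, std={std:.2f}, 70% CI=({lo:.2f}, {hi:.2f})")
```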
According to
Table 1, the majority of the scores assigned to the data scientist team are above 3. This suggests that the infrastructure is accessible to data scientist profiles. As observed in
Table 2, the mean for every criterion is above 4 out of 5, with a standard deviation of around ±1. This indicates that the evaluated individuals feel comfortable with all the assessed criteria.
In
Table 3, the data reveal a noteworthy trend: the majority of the scores awarded to the Machine Learning engineer team surpass the 3-point mark, affirming the accessibility of the infrastructure for professionals specializing in Machine Learning. Based on the statistical analysis presented in
Table 4, the mean score for each criterion consistently registers above the 4-point threshold out of 5. Accompanied by a standard deviation of around ±1, these findings confirm the participants’ comfort and proficiency across the spectrum of evaluated criteria.
4. Discussion
A summary of the key findings and conclusions extracted from the interviews conducted with the aforementioned professionals is provided in
Table 5. In the following, the contents of the table are analyzed in detail.
Both data scientists and ML engineers encountered a somewhat steep initial learning curve when starting with MLOps tools, particularly with Kubeflow. However, this challenge was generally overcome after an initial period, leading to a more intuitive and user-friendly experience with the platform’s tools. It is worth mentioning that prior knowledge of related technologies, such as Kubernetes, was cited as beneficial for easing the learning process.
In terms of collaboration and teamwork, the platform was universally seen as a facilitator, although this benefit was not without its conditions. Both roles emphasized the need for a well-organized team and clear methodologies to fully leverage the platform’s capabilities. The structure of the platform allows for the division of tasks and parallel work streams, which can speed up project timelines and make the workflow more efficient; however, this requires the team to be well coordinated and possibly adhere to agile methodologies.
Data management was another common area of agreement. The centralized data storage capabilities provided by MinIO were highly valued by both data scientists and ML engineers. This feature not only simplifies data management but also enhances collaboration by ensuring that all team members have access to the latest versions of datasets.
When it comes to the handover of projects and the sharing of resources, the experiences were generally positive. The platform’s structure and the use of code for defining pipelines make it easier to transfer work between team members. However, the time required for handover could increase if the incoming member is not familiar with the platform’s technologies. Despite this, once the handover is complete, sharing data, models, and pipelines within the platform is relatively straightforward.
In conclusion, both data scientists and ML engineers find significant value in using MLOps tools, despite facing different challenges and benefits. The common themes that emerged from the study include the importance of documentation, the initial learning curve, and the benefits of centralized data storage.