1. Introduction and Related Work
Global warming and drought have rendered safe drinking water inaccessible for an estimated 4 billion individuals, roughly two-thirds of the worldwide population [1]. Concurrently, population growth necessitates increased agricultural activity to satisfy growing food demands, thereby exacerbating the depletion of water reserves. Previous contributions [2] analyze how regional water footprints for production and consumption may change under possible future scenarios.
Eutrophication is a principal contributor to the contamination of freshwater lakes and reservoirs. It results from an over-abundance of the factors essential for photosynthesis, such as sunlight, carbon dioxide, and nutrients [3]. Natural eutrophication occurs gradually as aquatic ecosystems age and sediments accumulate. Nonetheless, fertilizer runoff, primarily from intensive farming, accelerates and intensifies the process.
Amidst the climate crisis and the eutrophication of water bodies, harmful algal and cyanobacterial blooms (HACBs) have grown in intensity and frequency in recent decades. Cyanobacterial blooms in particular pose significant threats to public health and the environment. Cyanobacteria, a group of oxygen-producing bacteria, harness sunlight to metabolize carbon dioxide (CO2) into organic matter [4]. These bacteria emerged over 3 billion years ago and catalyzed the oxygenation of Earth's atmosphere, a pivotal, transformative event [5].
5]. Despite being commonly referred to as blue–green algae, they are not true algae; this term is reserved for eukaryotic photosynthetic organisms.
Figure 1 displays a recent HACB in an inland water reservoir.
The peril extends beyond aquatic environments: extracellular byproducts from HACBs have been detected in water and atmospheric samples far from the coastline [6]. Addressing such incidents may demand substantial financial resources, particularly when environmental damage is extensive, thus constraining the available response options. This challenge has motivated the search for systems capable of monitoring and mitigating environmental risks in aquatic settings.
Traditionally, since the 20th century, assessing water body health and predicting HACBs has relied upon sample collection by specialists or by automatic instruments positioned at fixed locations [7]. These samples subsequently undergo laboratory analysis. Such methods demand considerable effort, time, and financial investment. Moreover, the procedural inefficiency and delayed analytical results hamper comprehensive study and impede timely action by the entities responsible for water supplies and recreational water usage.
To safeguard the environment, developing new cost-effective early warning systems (EWSs) to track and forecast HACBs is imperative.
According to the United Nations Environment Programme, early warning is "the provision of timely and effective information, through identified institutions, that allows individuals exposed to hazard to take action to avoid or reduce their risk and prepare for effective response". The main features of an EWS are [8]:
Risk knowledge: risk assessment is performed to prioritize mitigation and prevention strategies;
Monitoring and prediction: systems with monitoring and prediction capabilities provide timely risk estimates;
Information dissemination: communication systems transmit reliable, synthetic, and simple warning messages to the authorities and the public;
Response: coordination, good governance, and appropriate action plans.
These EWSs must be capable of autonomous water analyses: processing informative data through modeling and simulation (M&S) techniques to predict the temporal and spatial dynamics of HACBs. For instance, machine learning models have proven effective for simulating HACBs in water systems [9]. Previous works also show that extracting knowledge from complex aquatic environment data sources is a challenge that requires extensive software engineering expertise [10].
Recent advancements in environmental monitoring have produced various predictive models and monitoring techniques aimed at the challenges posed by HACBs (see [11]). Despite these efforts, several open problems persist. One of the primary issues is the lack of real-time data acquisition and processing capabilities that can provide immediate insight into water quality and the presence of harmful algal blooms. Traditional methods often involve time-consuming sample collection and analysis, resulting in delays that hinder prompt response and mitigation. Furthermore, the integration of heterogeneous data sources, such as satellite imagery, in situ sensor networks, and meteorological data, remains a challenge: the effective fusion of these data streams is crucial for developing accurate and comprehensive predictive models. Finally, more sophisticated early warning systems are needed that not only detect the presence of HACBs but also predict their movement and growth dynamics, enabling preemptive action to protect public health and the environment [12].
The development of such a sophisticated EWS is a complex and multidisciplinary undertaking. It therefore requires collaboration between specialized teams focused on the parallel creation of the different subsystems of the system:
HACB modeling: modeling of water bodies and inference of HACB biodynamics, with a segment of the team comprising biologists [12];
Unmanned surface vehicles (USVs): guidance, navigation, and control (GNC) of USVs [13] and sensor instrumentation;
Software and Internet of Things (IoT): software development and system simulation and integration [14].
Our solution addresses these challenges by introducing a novel M&S-driven EWS integrative model that leverages an IoT architecture [15] to facilitate real-time data acquisition and processing. By employing autonomous boats equipped with sensors, our system gathers high-resolution data on the water quality parameters essential for detecting and predicting HACBs. The integration of cloud, fog, and edge computing layers enables the efficient processing and fusion of diverse data sources, providing a comprehensive view of the aquatic environment. Our predictive algorithms, situated within the fog layer, are designed to offer timely predictions of algal bloom dynamics, thus addressing the critical need for advanced early warning capabilities. This article outlines a methodology for managing the intricate process of developing such an EWS. A modular and hierarchical formalism is vital for modeling and simulating such complex systems, and it is instrumental throughout development. Accordingly, the Discrete Event System Specification (DEVS) formalism has been employed as a modeling technique that facilitates transitioning through the stages of development.
One of the main contributions of this paper is the detailed approach to subsystem verification and validation using the DEVS formalism. This methodology is particularly advantageous in the development of complex systems, such as the automated inland water monitoring system presented here. By leveraging DEVS, we can systematically replace virtual components with their real-world counterparts, enabling a gradual transformation from a virtual model to a fully operational real system. This approach not only saves costs by identifying and addressing potential issues early in the development process but also significantly reduces risks associated with deploying untested components in critical environments.
The ability to validate individual subsystems in isolation, and later in conjunction with other components, ensures a high level of confidence in the system’s reliability and performance before it is fully deployed. This paper demonstrates the practical application of DEVS for achieving a structured and efficient transition from model to reality in the development of an early warning system for HACBs.
The remainder of this article is organized as follows: first, the model-based development approach and the DEVS formalism used for problem modeling are detailed; then, the system is described, followed by an exposition of the development stages (design, implementation, integration, verification, and validation); finally, the conclusions are presented.
2. Model-Based Development
Model-based system engineering (MBSE) is an approach that uses models to design, verify, and validate complex systems [16]. This method has become increasingly popular in systems engineering because it provides a visual and formal representation of systems, facilitating understanding and communication among the various stakeholders involved in developing and maintaining a system.
Unlike industrial projects, research projects usually lack commercial solutions for some subsystems, and these subsystems will not be available until the final stages. Hence, it is necessary to advance through the development stages using simulated models of these subsystems. With respect to the typical V-diagram used in engineering projects, research projects must therefore verify and validate subsystems against other, simulated subsystems that are based on models. Although software can be functionally verified on simulated subsystems, validation must be performed on the real subsystem.
Figure 2 illustrates a hybrid V-model [17] with the software development stages of a research project. The V-model is called hybrid because it is not purely incremental: it allows agile iteration over designs and implementations based on the results obtained in the verification and validation stages.
For model-based development, after defining the system’s requirements and each subsystem, the team will design the functional models of the subsystems and then the software to merge them. Based on these designs, the functional models of the subsystems and the software logic will be implemented. In the integration stage, it must be ensured that the software correctly connects the different subsystems using their simulated models. The functional verification of subsystems is also performed on the simulated models, although the real subsystem can be used if available. On the other hand, in the subsystem validation stage, the real subsystem must be used, even if the rest of the subsystems are simulated. The final stage of system validation is where all the real subsystems should be used.
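As an illustration of this substitution principle, the sketch below (with hypothetical class names, not the project's actual interfaces) shows how a simulated subsystem and its real counterpart can share a common interface, so the surrounding verification logic remains unchanged when one replaces the other.

from abc import ABC, abstractmethod

class Sensor(ABC):
    # Common interface shared by the simulated and the real subsystem.
    @abstractmethod
    def read(self) -> dict: ...

class SimulatedSensor(Sensor):
    def __init__(self, trace_file: str):
        self.trace = open(trace_file)          # replay pre-recorded samples
    def read(self) -> dict:
        return {"raw": self.trace.readline().strip()}

class RealSensor(Sensor):
    def __init__(self, device):
        self.device = device                   # e.g., a serial port handle
    def read(self) -> dict:
        return {"raw": self.device.readline().decode().strip()}

def verification_run(sensor: Sensor, n: int = 10):
    # Verification logic is identical regardless of which variant is wired in.
    return [sensor.read() for _ in range(n)]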
The modeling-to-deployment process presented in this paper requires advanced modeling and simulation technologies. These technologies provide a means to understand, analyze, and predict the behavior of such systems before they are implemented in the real world. One such technology is DEVS [18], a formalism widely used in modeling and simulation.
The complexity of the systems involved requires an M&S formalism: these systems often comprise multiple interacting components, each with its own behaviors and interactions. Previous work has proposed methodologies based on model-driven engineering for developing secure data management applications [19] or for reducing the number of skills needed to manage a Big Data pipeline and supporting the automation of Big Data analytics [20].
Modeling and simulating such systems without a formalism can quickly become overwhelming and error-prone. In previous studies, the Architecture Analysis & Design Language and DEVS have been used incrementally to develop combined architecture and design models [21]. In this regard, DEVS offers several benefits that make it suitable for developing complex systems. One key advantage is the ability to design the system's structure using coupled models and hierarchy. This allows the system to be broken down into manageable components while capturing the interactions between them. By representing the system in this way, its behavior can be better understood, and informed decisions about its design and operation can be made. Another benefit is the separation of the simulator layer from the modeling layer. This separation allows for different simulation strategies, such as sequential, parallel, distributed, or even real-time simulation, without modifying the underlying model [22]. This flexibility is particularly valuable when dealing with complex systems that require different simulation approaches depending on the specific requirements or constraints.
Furthermore, DEVS provides a formal framework for modeling and simulation, which ensures consistency and rigor in the development process. The formalism allows for precise specification of system components, their behaviors, and their interactions, reducing ambiguity and facilitating communication among researchers and developers.
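For readers unfamiliar with the formalism, the following self-contained sketch (not the authors' implementation) illustrates the atomic-model contract at the heart of DEVS: internal and external transition functions, an output function, and a time-advance function.

import math

class Atomic:
    def __init__(self):
        self.sigma = math.inf        # time until the next internal event
    def delta_int(self): ...         # internal transition function
    def delta_ext(self, e, x): ...   # external transition after elapsed time e
    def lambdaf(self): ...           # output function, fired before delta_int
    def ta(self):                    # time-advance function
        return self.sigma

class Generator(Atomic):
    # Emits a tick every `period` time units.
    def __init__(self, period):
        super().__init__()
        self.period = period
        self.sigma = period
    def delta_int(self):
        self.sigma = self.period
    def lambdaf(self):
        return "tick"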
This section provides an overview of DEVS and discusses its key concepts and principles. The integration of real-time capabilities into DEVS is also explored, as it enables the simulation of systems that require timely and synchronized responses and constitutes an intermediate step toward deploying the real system.
4. Design Stage
The design phase begins with a basic coupled model that contains other models, atomic or coupled, to represent the functionality of each subsystem. Through an iterative process, the models and connections are refined until they represent the fundamental behavior of the subsystems.
Figure 5 shows the initial distributed system model, where the subsystems that form it and the flow of information between them can already be appreciated.
At this stage, the hardware and software that constitute the systems are not yet available, so input files are used to emulate the functionality of subsystems and events. The main objective of this model is to define the logic, timing, and message format between the atomic/coupled models.
In this design, the subsystems have been distributed in three layers. The edge layer includes two models: the ship with sensors and the weather station. The fog layer provides the behavior of the GCS, which employs three models: a local database, an inference model, and a trajectory planner. The cloud layer includes two models: a global database and an inference model trainer.
The logical behavior is simulated with files as follows. Initially, the inference model proposes a place and time to start data collection. Then, the planner sends the ship to that position. When the ship reaches the position, the ship’s sensors take measurements, which are sent back to the GCS.
The GCS merges these measurements and those from the weather station, which generates signals periodically. The inference model can now use these measurements to close the control loop, inferring a new position and time to take measurements. In parallel, all this information is uploaded to the cloud layer server, where the inference model is trained again. When new inference parameters are available, they are also updated in the model used in the fog layer.
To clarify, the inference and its training work with different temporal granularity. A new position is inferred approximately every 30 min, while the model is only trained once a day (using the history received during the previous day, which will be available at the end of the day).
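A hedged sketch of this control loop is given below; all component and method names are illustrative, not the system's actual API.

import datetime as dt

INFERENCE_PERIOD = dt.timedelta(minutes=30)   # a new waypoint roughly every 30 min
TRAINING_PERIOD = dt.timedelta(days=1)        # retraining once a day

def control_step(model, planner, usv, gcs, cloud):
    # 1. The inference model proposes where to sample next.
    where, _ = model.infer_next_waypoint()
    # 2. The planner routes the USV to that position.
    usv.follow(planner.plan_path(usv.position, where))
    # 3. On arrival, the ship's sensors take measurements.
    samples = usv.take_measurements()
    # 4. The GCS merges USV and weather-station measurements.
    merged = gcs.merge(samples, gcs.weather_feed())
    # 5. The merged data close the control loop locally...
    model.update(merged)
    # 6. ...and are uploaded for the daily retraining in the cloud.
    cloud.upload(merged)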
From this high-level design, design decisions are made iteratively. The structure of folders and files, the timing of messages, the type of messages, the separation between services and models, etc., have been refined. For example, through this process, it has been concluded that it is most advisable to generalize communications between subsystems and use a generic message class for the whole environment, as shown below.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Event:
    # A generic message class used to model events across the environment.
    id: str
    source: str
    timestamp: datetime = field(default_factory=datetime.now)
    payload: dict = field(default_factory=dict)
The flexibility of Python dictionaries, used in the payload field, allows all ports in the environment to share the same event class. Thanks to this, the development of web reports and graphical scenarios is substantially simplified.
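As a hypothetical usage example, a GPS fix could be wrapped as follows, with all subsystem-specific content confined to the payload:

fix = Event(
    id="gps-0001",                      # illustrative identifier
    source="usv.gps",                   # illustrative source name
    payload={"latitude": 40.4507, "longitude": -3.7261, "gps_qual": 1},
)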
8. Validation Stage
The next stage consists of transforming the simulator into real-time software that allows the deployment and validation of real subsystems. This must be done progressively, i.e., subsystem by subsystem. To do this, part of the software must be re-coded to run on the embedded hardware, while another part can evolve from the simulation version to a real-time version, taking advantage of the DEVS formalism. DEVS allows the model to work in both simulated and real time, as well as in accelerated real time, thus speeding up the validation process.
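A minimal sketch of the idea, under the assumption that the time-advance value sigma drives the wall-clock wait: a scale of 1 yields real time, a scale greater than 1 yields accelerated real time, and skipping the sleep entirely recovers as-fast-as-possible simulation.

import time

def wait_for_next_event(sigma: float, scale: float = 1.0):
    # Sleep for the model's time advance, compressed by `scale`.
    if sigma != float("inf"):
        time.sleep(sigma / scale)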
Previous studies have demonstrated robust management strategies for IoT sensors deployed in real environments, as well as coordination between indoor and outdoor sensors to extend real-time monitoring coverage [29,30]. To deploy and validate a subsystem, it must communicate with the rest of the subsystems. Therefore, in such a complex system, a mixed deployment is necessary, in which one subsystem runs on real hardware while the others are simulated.
All subsystems have already been modeled and integrated into the simulator. Most of them have already been developed and are in the verification stage. A real GPS at the edge layer and the WebReports service at the cloud layer have been validated.
8.1. GPS Validation
To carry out the validation phase of some subsystems, real Things must be integrated into real-time simulations, for which a common communication interface must be incorporated.
As explained in Section 2.2, the motivation behind I/O handlers is to adapt external events as input messages and to send output messages outside the simulator. The DEVS simulator was not originally designed to handle external events. This can be implemented in two ways: by event polling or by event injection.
Things integration by polling: The simulator requests information from the Things through a communications protocol. First, the necessary connections must be opened and the parameters configured. Once the simulation is launched, the processor is responsible for making a request to the polled device, where both the input data (captured by the corresponding read function) and the device status can be queried. Waiting for a response can be active, if execution blocks until a response is received, or passive, if the simulation continues to run regardless of whether a response arrives. The processor periodically checks for accumulated data on a specific virtual device. Subsequently, the information is processed, which may involve data analysis or decision-making based on the device status. After processing the response, the processor generally waits for a certain period before sending the next request, which avoids overloading the devices with recurring requests. This method is widely used in HTTP requests when working with web services or APIs [31].
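A minimal sketch of such a polling loop over HTTP is shown below; the endpoint and the response layout are assumptions for illustration, not the system's actual API.

import time
import requests

def poll_device(url: str, period_s: float = 5.0):
    while True:
        try:
            resp = requests.get(url, timeout=2.0)   # passive wait via timeout
            resp.raise_for_status()
            yield resp.json()                       # input data + device status
        except requests.RequestException:
            pass                                    # device silent: keep polling
        time.sleep(period_s)                        # avoid overloading the device

# e.g.: for reading in poll_device("http://device.local/api/status"): ...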
Things integration by event injection: In this method, an Injector is added to the simulator (external to the DEVS formalism), which allows external events to be incorporated into a soft real-time simulator [32]. External events occur outside the normal simulation flow, such as a hardware device sending a signal to indicate that it needs attention. The information from the sensor is received by the Injector, which feeds an event into the simulator through the corresponding input port. The simulator processes the event as soon as possible. Among the benefits of this type of integration is the speed of response, as it preserves the synchronization between the sensor time and the simulator clock. In this way, the data obtained by the sensor are directly reflected in the data the simulator works with.
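The following sketch illustrates the injection pattern with a thread-safe queue; the names are illustrative, and the print statement stands in for the simulator's input port.

import queue
import threading

injected_events = queue.Queue()

def handle_external_event(event):
    # Stand-in for the simulator's external input port.
    print("external event:", event)

def injector(sensor_stream):
    # Listener thread: push each external sample as soon as it arrives.
    for sample in sensor_stream:
        injected_events.put(sample)

def simulator_loop():
    while True:
        # Each event is processed as soon as possible, keeping sensor time
        # and the simulator clock in sync.
        handle_external_event(injected_events.get())

threading.Thread(target=injector, args=(iter([{"fix": 1}]),), daemon=True).start()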
Figure 10 shows the two integration schemes for a GPS sensor. In these examples, one can appreciate the connection of the external device with the simulator's virtual sensor and the way it communicates with the GCS. Although both methods work correctly with the designed system, we decided to integrate the GPS sensor by injection because its data are externally generated events.
For the sensor integrated by polling, access to the data provided by the real sensor maintains the state space diagram designed for simulated sensors, shown above in Figure 6. The virtual sensor starts from the OFF state, completes its initialization by notifying the simulator that it is a real sensor and specifying its data types and communication protocol, and transits to the ON state. If the simulator sends a request for data on the corresponding input port, the sensor transits to a real WORK state, where the real sensor reads data from the real world rather than from the simbody. After the read time interval, the data are available and the virtual sensor transits to the SEND state to provide the simulator with an event containing the information. Finally, the sensor returns to the ON state to wait for a new request from the simulator.
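A compact sketch of this state machine (state names taken from the text; the transition code itself is illustrative) could look as follows.

OFF, ON, WORK, SEND = "OFF", "ON", "WORK", "SEND"

class PolledSensor:
    def __init__(self, read_time: float):
        self.state, self.sigma = OFF, 0.0      # initialization pending
        self.read_time = read_time
    def delta_ext(self, e, request):
        if self.state == ON and request == "data":
            self.state, self.sigma = WORK, self.read_time  # read the real device
    def delta_int(self):
        if self.state == OFF:
            self.state, self.sigma = ON, float("inf")      # initialization done
        elif self.state == WORK:
            self.state, self.sigma = SEND, 0.0             # data are available
        elif self.state == SEND:
            self.state, self.sigma = ON, float("inf")      # wait for next request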
The sensor integrated with event injection, shown in Figure 11, is very similar to the simulated sensor diagram in its initial states. However, the sensor model remains in the ON state until it receives the external sensor data injection over its input port, which causes an external state transition to the SEND state and the sending of the sensor data to the simulator. In this way, sending sensor data to the simulator depends only on the arrival of an external interrupt carrying data from the external sensor. For this reason, the WORK state and the internal state transitions executed with a sigma period are dispensed with. Once the data injection is complete, the sensor returns to the ON state to continue normal operation. In this case, the simulator can also place the sensor in the OFF state as required.
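Extending the polled-sensor sketch above, the injected variant drops the WORK state and the sigma-driven internal transition: the external injection alone triggers the send.

class InjectedSensor(PolledSensor):
    def delta_ext(self, e, injected_data):
        if self.state == ON:
            self.data = injected_data
            self.state, self.sigma = SEND, 0.0   # emit immediately, then back to ON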
A commercial GPS sensor has been used for the validation stage, and a version of its drivers has been executed in real time. The messages delivered by the GPS sensor follow the protocol defined by the National Marine Electronics Association (NMEA) [33]. The messages provided by the GPS are continuously collected by the Injector in a parallel thread containing a Flask server. In this case, only the messages containing the position, the timestamp, the error, and the signal quality are of interest. The driver discards messages that do not carry these parameters.
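A hedged sketch of this filtering step is shown below, assuming the pynmea2 parsing library (the actual driver implementation may differ); only GGA sentences, which carry exactly the fields appearing in the log below, are kept.

import pynmea2

def parse_gps_line(raw: str):
    try:
        msg = pynmea2.parse(raw)
    except pynmea2.ParseError:
        return None
    if msg.sentence_type != "GGA":        # discard everything but position fixes
        return None
    return {
        "timestamp": msg.timestamp,
        "latitude": msg.latitude, "longitude": msg.longitude,
        "horizontal_dil": msg.horizontal_dil,
        "num_sats": msg.num_sats, "gps_qual": msg.gps_qual,
    }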
Once the desired information has been obtained from the GPS, the Injector adds the measurement values into the simulator as an external event. A log of the software driver is provided below.
...
Starting event 8 @ t = 12:12:17
Starting event 9 @ t = 12:12:17
event 8 finished @ t = 12:12:17
INFO:__main__:event 8
{
'timestamp': datetime.time(10, 12, 18),
'latitude': 40.45073216666667, 'longitude': -3.726055166666667,
'horizontal_dil': '1.42', 'num_sats': '07', 'gps_qual': 1
}
...
Starting event 14 @ t = 12:12:18
Starting event 15 @ t = 12:12:18
event 14 finished @ t = 12:12:18
INFO:__main__:event 14
{
'timestamp': datetime.time(10, 12, 19),
'latitude': 40.450736166666665, 'longitude': -3.7260635,
'horizontal_dil': '1.42', 'num_sats': '07', 'gps_qual': 1
}
...
Starting event 19 @ t = 12:12:19
Starting event 20 @ t = 12:12:19
event 19 finished @ t = 12:12:19
...
The validation of the GPS sensor reveals an anomaly: a 2 h and 1 s lag exists between the simulator timestamp and the GPS timestamp. The 2 h lag arises because the computer clock is set to local time, and the 1 s offset corresponds to a leap second: UTC is occasionally adjusted by one second to compensate for the slowing of the Earth's rotation.
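The offset can be reproduced with a few lines of Python; the date and the Europe/Madrid time zone are illustrative assumptions consistent with the coordinates in the log.

from datetime import datetime
from zoneinfo import ZoneInfo

sim = datetime(2023, 7, 1, 12, 12, 17, tzinfo=ZoneInfo("Europe/Madrid"))  # simulator, local time
gps = datetime(2023, 7, 1, 10, 12, 18, tzinfo=ZoneInfo("UTC"))            # GPS, UTC
print(gps - sim)  # 0:00:01 once the 2 h zone offset is accounted for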
8.2. WebReport Validation
This service resides in the cloud layer, and it is responsible for receiving data from the other layers, processing them, and displaying them on user request. The cloud layer includes servers and data centers that can store large amounts of data and perform data processing and analysis. Data analytics and machine learning, which require high computational power, can be performed on these servers.
As with the coupled fog model, the cloud model can run different services, but it is highly scalable so as to manage one or several water bodies. These services include running extensive data analyses on the data stored in the central database, or running training services to update the inference models coupled to the fog layer. In any case, these actions are always triggered by the arrival of new data files from the lower layers. For example, services have been implemented in the cloud layer to inform the water manager of the alerts or reports generated throughout the simulation day.
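As an illustration, file-arrival triggering of this kind can be sketched with the watchdog library; the watched path and the training entry point are hypothetical.

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

def retrain_inference_model(path: str):
    # Hypothetical service hook launched on data arrival.
    print("new data file, launching training:", path)

class NewDataHandler(FileSystemEventHandler):
    def on_created(self, event):
        if not event.is_directory:
            retrain_inference_model(event.src_path)

observer = Observer()
observer.schedule(NewDataHandler(), "/data/incoming", recursive=True)
observer.start()   # runs in a background thread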
No specific atomic models have been included for executing these services because they always run as processes installed in Docker containers, i.e., they have a distributed architecture. It is unnecessary to encapsulate them as DEVS atomic models; in other words, the cloud layer is seen as a centralized entity.
Adaptability is guaranteed as the framework has been designed with scalability in mind, which allows users to outsource and scale up critical services in case of bottlenecks. For example, resource-intensive atomic models, such as those designed for training services or data analysis, can be easily simulated in parallel or distributed computing architectures.
Figure 12 shows the connection structure of the cloud layer, including its components. The database aggregates the information delivered by the fog layer and the Things. The data are stored in directories with files hierarchically organized by the body of water being monitored, the date, and the type of signal recorded. The benefit of this scheme is that data from different Things can be stored regardless of whether they come from real or simulated subsystems. The higher-level subsystems, such as the web report and the other services, use the information provided by the database and can identify the nature of each signal by its file type; the services work the same way in either case, even when real-time simulations are used. Here, the files have been generated by simulation, but the validation of the service holds equally for real data files.
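A minimal sketch of this hierarchical layout (the root directory and the file format are assumptions) is given below.

from pathlib import Path
from datetime import date

def signal_path(root: str, water_body: str, day: date, signal: str) -> Path:
    # data/<water body>/<date>/<signal type>.csv
    p = Path(root) / water_body / day.isoformat() / f"{signal}.csv"
    p.parent.mkdir(parents=True, exist_ok=True)
    return p

# e.g., data/lake_washington/2008-09-01/algal_density.csv
print(signal_path("data", "lake_washington", date(2008, 9, 1), "algal_density"))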
Early warning systems are often accompanied by a web interface or application, which offers different services and has the potential to create a direct communication channel between citizens, researchers, and decision-makers [34]. There are countless applications for EWS monitoring, be it fire prevention [35], weather monitoring [36], geological hazards [37], biological hazards [38], etc. Among their most common features are alert generation, signal monitoring, signal analysis, and event forecasting. This type of web interface demonstrates the system's capability to integrate the collection, visualization, and analysis of sensor data on a single platform [39].
An advanced web application has been designed to monitor HACBs in different water bodies. For this purpose, the free and open-source Django web framework is used to develop the application efficiently, with a user-friendly interface that provides highly relevant information for the different user roles [40]. The interface has several views aimed at offering different services to users. The first of these is 'Alarm and Water body management', which provides the water manager with visual information on the recorded algal bloom measurements and the alarms generated in a water body. It is possible to delimit the regions of the map where the algae level alert service operates and to set the limit values. This information must be declared when initializing the simulator.
The ‘USV’s Monitoring’ view allows the user to keep track of the planning and trajectories of the USVs used for the monitoring of the HACBs. The behavior of the ship(s) can be observed dynamically over the recorded days. Another of the views incorporated in the web interface is the ‘Forecast’ view, which is used to obtain estimates of the behavior of algal blooms based on historical data.
Finally, the last implemented view is the 'Signal Analysis' view, which is designed to provide managers with a detailed examination of the signals captured by the USV sensors. This functionality allows pattern analysis, identification of possible events, and evaluation of the quality of the actions taken by authorities. The data obtained are presented graphically and interactively for more efficient interpretation. An 'Editor' view is currently under development, which will allow the user to configure the simulation scenarios and obtain data in regions of high interest or under certain specified conditions.
One of the views provided by the web service can be seen in Figure 13: the page intended for the water manager. This page shows a record of the different signals captured by the USV sensors for each available water body. In particular, the figure shows the measured and predicted algal density signal for the Lake Washington water body between 1 and 8 September 2008. The reactive interface allows filtering the signals by date and displaying the maximum, minimum, and average values and the instants at which they occurred. This screen provides information on the values collected by the ship's sensors in a specified body of water along its course. A list of alarms is automatically generated according to the maximum values of the selected detection parameters (algal, nitrate, and oxygen levels).
The services respond reactively to the parameters specified by the manager. In each view, the user can select the water body, the signals to be represented, and the date range of interest. In this way, the cloud layer can support authorities in making high-level decisions based on the information provided on the different water bodies.
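A hedged Django sketch of such a filtered view is given below; the Measurement model and its fields are hypothetical stand-ins for the project's actual schema, and the snippet is meant to live inside a Django app.

from django.db import models
from django.db.models import Avg, Max, Min
from django.shortcuts import render

class Measurement(models.Model):               # hypothetical schema
    water_body = models.CharField(max_length=64)
    signal = models.CharField(max_length=32)
    timestamp = models.DateTimeField()
    value = models.FloatField()

def signal_analysis(request):
    # Filter by the water body, signal, and date range chosen by the manager.
    qs = Measurement.objects.filter(
        water_body=request.GET["water_body"],
        signal=request.GET.get("signal", "algal_density"),
        timestamp__date__range=(request.GET["start"], request.GET["end"]),
    )
    context = {
        "series": list(qs.values("timestamp", "value")),
        "stats": qs.aggregate(Max("value"), Min("value"), Avg("value")),
    }
    return render(request, "signal_analysis.html", context)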
9. Conclusions
The presence of HACBs in aquatic ecosystems such as marshes, swamps, lakes, and rivers poses risks for recreational use and human consumption, adversely affects ecosystems, and directly impacts public health. Traditional efforts to detect these blooms have relied on reactive measures and isolated actions such as manual sample collection, automatic sampling at fixed locations, or the use of classical predictive models. However, new EWSs are being developed to provide proactive, contemporary, and automated monitoring systems.
In this project, we propose the design and development of an advanced EWS. Due to the system's complexity, we have undertaken a robust implementation based on model-based system engineering (MBSE) principles and driven by the DEVS approach. This methodology enables the progressive design, development, verification, validation, and deployment of a complex system such as the one presented here.
During the development of the various subsystems, simulating them is crucial for distributed validation and deployment. The control loops must be fully stabilized before deployment, using a system simulator to train the inference models.
For the design of the framework architecture, we have stratified the system into layers according to the IoT paradigm: cloud, fog, and edge. The system was then developed in a simulated environment, which allowed subsystems to be validated against simulated counterparts; this was followed by an incremental deployment process in which the real subsystems were progressively validated.
The research presented in this paper has successfully demonstrated the application of a modeling- and simulation-driven methodology for the development of an inland water monitoring system. The use of the DEVS formalism has provided a structured and systematic approach to the development process and enabled the simulation, validation, and incremental deployment of various subsystems. The layered IoT architecture has facilitated the integration of autonomous data collection units, predictive algorithms, and inference models, resulting in a robust and adaptable EWS for HACB surveillance.
The methodology outlined in this paper has significant implications for environmental monitoring and management. By leveraging the capabilities of DEVS modeling and real-time simulation, the developed system can provide timely and accurate predictions of HACB occurrences, thereby enhancing the ability of water managers and authorities to respond effectively to environmental threats. The modular design and the integration of real and simulated components offer a flexible and cost-effective solution for deploying complex monitoring systems in diverse aquatic environments.
Future work will focus on refining the inference models, improving the real-time data processing capabilities, and expanding the system’s functionality to cover a broader range of environmental parameters. The ultimate goal is to establish a comprehensive monitoring framework that can be adapted to various ecological contexts and contribute to the global effort to combat the adverse effects of climate change on water resources.