#### 4.3.2. Cloud Platform and Used Services

We decided to develop our application using AWS tools, but the same application architecture can be reproduced using equivalent services from other cloud providers. For regions impacted by flooding, high availability of the computing backend is imperative, given the need for quick analysis of incoming weather and real-time water level data. AWS offers high availability, including regional failovers in case a data center is taken offline. Deploying and redeploying resources on AWS can also be quickly automated using AWS CloudFormation [31], a tool that provisions specified resources (such as Lambda, EC2, RDS, etc.) from a provided script. The code written for the backend of the cloud-based system can be found at [32].

#### AWS Lambda

AWS provides a serverless computing platform known as AWS Lambda [21], which allows users to run custom functions on demand. The underlying infrastructure of Lambda is maintained by AWS, so the system developer only needs to choose the correct runtime environment for their code. Using Lambda, the sensors are queried for uplink data at specified intervals. Each uplink is then parsed, and the data are transformed to include only information pertinent to the application. The sensors' uplink data are uploaded to S3 for long-term storage and become available to be loaded into the MySQL database when needed. After the Lambda function finishes uploading the transformed data, it automatically shuts down, so the user pays only for the computing time and memory actually used rather than provisioning a continuously running machine (e.g., EC2). Lambda was chosen for our solution because of its ease of scaling as devices are added, its monitoring tools, its high availability, and its resource efficiency. For instance, if a new TTN application is added to the system, the existing Lambda function can be promptly updated to query its sensor data. Should multiple applications need to report data in overlapping intervals, the same Lambda function can run in parallel, with up to 1000 concurrent instances if needed.

The Lambda functions for this use case require modification of the default settings. We used the AWS SDK for pandas Lambda layer [33] to query TTN, parse the data, and store or read data from an S3 bucket. Python's pandas module is used to quickly transform and manipulate data, and the urllib3 module is used to send HTTP requests to The Things Network's storage integration to retrieve sensor data. Other configurations for the Lambda function include setting the allocated memory to 192 MB (determined by AWS Compute Optimizer [34]), a timeout limit of 1 min, and an hourly trigger. The triggering period can be adjusted based on application needs: shorter periods translate to lower latency between data becoming available on TTN and being stored in the S3 bucket, but also result in higher costs for the Lambda computing service.
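The query–parse–upload step described above can be sketched as a Lambda handler. This is a minimal illustration, not the deployed code: the TTN endpoint, API key, bucket name, and payload field names are all hypothetical placeholders, and the real decoded payload structure depends on the sensor model.

```python
import json


def transform_uplinks(raw_lines):
    """Parse TTN storage-integration output (newline-delimited JSON) and
    keep only the fields pertinent to the application."""
    records = [json.loads(line) for line in raw_lines.splitlines() if line.strip()]
    return [
        {
            "device_id": r["result"]["end_device_ids"]["device_id"],
            "received_at": r["result"]["received_at"],
            # flatten the decoded sensor payload (field names vary by device)
            **r["result"]["uplink_message"]["decoded_payload"],
        }
        for r in records
    ]


def lambda_handler(event, context):
    # pandas, urllib3, and boto3 are provided by the AWS SDK for pandas layer.
    import urllib3
    import boto3
    import pandas as pd

    # Hypothetical endpoint, key, and bucket; the real values differ.
    http = urllib3.PoolManager()
    resp = http.request(
        "GET",
        "https://eu1.cloud.thethings.network/api/v3/as/applications/"
        "my-app/packages/storage/uplink_message",
        fields={"last": "1h"},  # matches the hourly trigger period
        headers={"Authorization": "Bearer NNSXS.EXAMPLE-KEY",
                 "Accept": "text/event-stream"},
    )
    df = pd.DataFrame(transform_uplinks(resp.data.decode()))

    # Upload the transformed readings to S3 for long-term storage.
    boto3.client("s3").put_object(
        Bucket="my-sensor-data-bucket",
        Key=f"uplinks/{context.aws_request_id}.csv",
        Body=df.to_csv(index=False),
    )
    return {"rows": len(df)}
```

Keeping the parsing logic in a separate pure function makes it easy to test locally without AWS credentials or network access.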

Another use we make of AWS Lambda is to return stored sensor data requested through our RESTful API and to manage user authentication. When a query arrives from Amazon API Gateway, a Lambda function is first executed to check the authorization token provided in the API request and to authorize or deny it. If authorization is granted, a second Lambda function reads and parses the data stored in the long-term storage solution (the S3 bucket) and returns the requested data to API Gateway. This data-query function is configured with 512 MB of allocated memory, a compromise between cost and performance for serving the API, and a timeout limit of 1 min. The authorizer function uses the default settings of 128 MB of memory and a 3 s timeout because our currently adopted solution is simple: it only checks whether the provided authorization key matches a hardcoded string value.
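The token-checking authorizer can be sketched as follows, assuming a TOKEN-type Lambda authorizer that returns the IAM policy document API Gateway expects. The token value and principal name are placeholders; the deployed function compares against its own hardcoded string.

```python
import os

# Placeholder token; the deployed system uses a hardcoded string value.
EXPECTED_TOKEN = os.environ.get("API_TOKEN", "example-secret-token")


def lambda_handler(event, context):
    """Lambda authorizer: allow the request only when the token supplied in
    the authorization header matches the expected value."""
    token = event.get("authorizationToken", "")
    effect = "Allow" if token == EXPECTED_TOKEN else "Deny"
    return {
        "principalId": "api-user",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                # scope the policy to the method being invoked
                "Resource": event.get("methodArn", "*"),
            }],
        },
    }
```

A comparison against a single shared secret is the simplest possible scheme; as noted later in the text, a managed identity service could replace this function without changing the API surface.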

#### Amazon S3 Data Storage

Amazon Simple Storage Service (S3) is a cost-effective way to store data for extended periods. Data collected by the sensors are uploaded to S3 for long-term storage as a read-only copy of the raw data feed. These readings can be used to repopulate the database in case of a database failure or migration, which can be performed using the Python library created for this system. AWS also maintains a Python module (Boto3 [35]) that allows users to download a copy of the readings from S3 to a local machine. All readings in S3 are currently stored in the S3 Standard tier for regular access in this example application.
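Downloading a local copy of the readings with Boto3 might look like the sketch below. The bucket name, prefix, and flattened file-naming scheme are illustrative assumptions, not the system's actual layout.

```python
import pathlib


def local_name(key):
    """Map an S3 object key to a flat local filename (illustrative scheme)."""
    return key.replace("/", "_")


def download_readings(bucket, prefix, dest_dir):
    """Download all stored sensor readings under `prefix` to a local folder
    using Boto3, AWS's maintained Python module."""
    import boto3  # requires AWS credentials to be configured locally

    s3 = boto3.client("s3")
    dest = pathlib.Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    downloaded = []
    # list_objects_v2 returns at most 1000 keys per call; paginate over all.
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            s3.download_file(bucket, obj["Key"], str(dest / local_name(obj["Key"])))
            downloaded.append(obj["Key"])
    return downloaded


# Usage (hypothetical bucket and prefix):
# download_readings("my-sensor-data-bucket", "uplinks/", "./readings")
```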

Another use for S3 storage is hosting static websites. We used an S3 bucket to host our RESTful API documentation, built with the Swagger UI interface [29]. Our website is based on the Swagger UI demonstration provided on its GitHub page, adapted to read an OpenAPI 3.0 description of our API service. The static website presents the API server address, a description of the required header, and all accepted parameters, and it allows users to perform a trial API GET request with parameters they provide.

#### Amazon Elastic Compute Cloud

To host MySQL and Grafana, two Amazon Elastic Compute Cloud (Amazon EC2) instances were provisioned. Amazon EC2 provides a continuously running computing platform on the cloud, giving access to the database and Grafana whenever needed. The developed system uses t3.micro instances with 10 GB of storage, which fits the needs of this example application by minimizing costs while still maintaining reliable performance for the relatively low number of sensors currently in the system. A more capable instance could be used to serve a larger number of users or a use case requiring quicker response times. For this study, MySQL and Grafana were hosted on two separate EC2 instances for simpler management and increased flexibility, allowing, for example, easy replacement of the visualization software or on-demand use of the MySQL database to give applications fast data access. It may also be worthwhile to adopt the AWS Relational Database Service (RDS) [36] instead of an EC2 instance running MySQL as the system's database solution, and then scale the RDS database based on the application's requirements for maintainability and access speed. This option was considered but not implemented in this study because RDS comes at a higher cost; however, RDS has the advantage of built-in scalability as data volumes and user numbers grow. Section 5 includes a cost comparison between these alternatives for hosting the database and a discussion of the pros and cons of each.

#### 4.3.3. Relational Database Design and Implementation

As our relational database, we selected MySQL, a simple solution with wide community support. We deployed MySQL on the cloud through Bitnami [37], which provides a pre-configured virtual machine image ready to be loaded onto an Amazon EC2 instance. We created an entity relationship diagram (ERD) to normalize the sensor readings, as shown in Figure 4. The ERD is centered around the Measurements entity, which stores the value of each individual data point along with the time of data collection (Received\_at). The Devices entity stores the device's unique identifier (Device\_ID), the device's model (Device\_model), the last received battery reading (Last\_battery), and the last activity timestamp (Last\_activity). Similarly, the Locations entity contains the latitude, longitude, and altitude of each location where data are collected, along with a unique identifier for each location. For each measurement, the Variables entity stores the data point's unique display name and the unit of the variable. Each of the three other entities has a one-to-many relationship with the Measurements entity: each measurement is associated with exactly one device, variable, and location, while each device, variable, and location can be associated with many measurements. This ERD was developed by advancing an approach from previous related research [38]. This database design allows for easy further advancement and change, as additional devices and variables can be incorporated with little effort.
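The normalized schema described above can be sketched in SQL. The snippet below uses SQLite (via Python's standard library) purely so the example is self-contained; the deployed system runs MySQL, and column names beyond those named in the text are assumptions based on the ERD description.

```python
import sqlite3

# Sketch of the normalized schema from Figure 4; SQLite stands in for MySQL.
SCHEMA = """
CREATE TABLE Devices (
    Device_ID     TEXT PRIMARY KEY,
    Device_model  TEXT,
    Last_battery  REAL,
    Last_activity TEXT
);
CREATE TABLE Locations (
    Location_ID INTEGER PRIMARY KEY,
    Latitude    REAL,
    Longitude   REAL,
    Altitude    REAL
);
CREATE TABLE Variables (
    Variable_ID  INTEGER PRIMARY KEY,
    Display_name TEXT UNIQUE,
    Unit         TEXT
);
CREATE TABLE Measurements (
    Measurement_ID INTEGER PRIMARY KEY,
    Value          REAL,
    Received_at    TEXT,
    -- each measurement references exactly one device, variable, and location
    Device_ID      TEXT    REFERENCES Devices(Device_ID),
    Variable_ID    INTEGER REFERENCES Variables(Variable_ID),
    Location_ID    INTEGER REFERENCES Locations(Location_ID)
);
"""


def build_database(path=":memory:"):
    """Create an empty database with the normalized sensor-reading schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

The foreign keys in Measurements encode the many-to-one side of each relationship, so adding a new device or variable only requires one new row in the corresponding table.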

#### 4.3.4. Graphical User Interface

This system allows users to visualize and monitor data through Grafana, an open-source analytics platform for querying, visualizing, and alerting on data metrics. Grafana was selected as the software solution to visualize incoming data due to its dynamic dashboards, built-in alerting capabilities, and its specialization in time series data.

Grafana was deployed on the cloud through Bitnami [37], which provides a system image of a pre-configured Grafana stack on AWS. A connection was then made from Grafana to the MySQL database to access the data for visualization. Dashboards for each monitoring station were created to display relevant information for users. In Figure 5, we show an example of the dashboard for the water depth monitoring station. This dashboard includes a graph of the water depth over time, statistics on the water depth values for the set time range, a water level gauge of the current depth, a map of the sensor station location, and a gauge of the sensor battery level. The water depth graph and gauge allow users to view the current and past water levels in relation to a threshold of 0.4 m that signifies flooding. Grafana's built-in alert system can send notifications when the incoming data trigger a set alert rule. As an example, the water depth dashboard has alert rules set to send a notification through the messaging application Slack [39] if the 0.4 m threshold is met, although sending alerts to other systems or via email is also possible.

**Figure 5.** Grafana decision support dashboard of a water depth monitoring sensor.

#### 4.3.5. RESTful API

Our RESTful API serves as a programmatic interface for users to quickly download data from the sensors. We created the API using the Amazon API Gateway service [25] and Lambda functions, both to manage API access and to read, parse, and return data from our long-term data storage solution in AWS S3. To document our RESTful API and provide easy access to sensor data, we created a static website using Swagger UI, currently hosted in an AWS S3 bucket. We also enabled CORS in our API Gateway service, and we added a custom header with an authorization token for access control.

We described our API following the OpenAPI 3.0 framework and stored the description as a JSON file loaded into a specification variable in the JavaScript code of our documentation website. To download data using the API, the user is required to input a valid authorization token to be granted access. Although we are currently using a simple custom Lambda function to grant access, more comprehensive user access management tools, such as Amazon Cognito [40], can be used in future versions. The other available parameters to customize a sensor data request are: "application", which selects which TTN application to download data from; "device\_id", which selects devices using a unique identifier; and "last" or "start\_date" and "end\_date", which select the period of time to download data for. Using the "last" parameter, users can retrieve data collected from the specified day up to the time of querying. Using the "start\_date" parameter, users can specify the beginning of the time range of the dataset to download. By default, if only one of the "start\_date" or "end\_date" parameters is provided, data from the single specified day are returned. Using the API, the user can request datasets for any of the available sensors. In Figure 6, we illustrate a typical use of the API to request data from a pressure sensor through the Swagger graphical user interface. In Figure 7, we show a typical API call with parameters and the response.
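A programmatic call with these parameters can be sketched as below. The endpoint URL and the name of the custom authorization header are hypothetical placeholders; the real values are documented on the Swagger UI site.

```python
from urllib.parse import urlencode

# Hypothetical API Gateway endpoint and header name (the real values are in
# the Swagger UI documentation site).
API_URL = "https://example.execute-api.us-east-1.amazonaws.com/v1/data"
AUTH_HEADER = "x-api-token"


def build_request(token, application, device_id=None, last=None,
                  start_date=None, end_date=None):
    """Assemble the URL and headers for a sensor data GET request,
    including only the parameters the caller supplied."""
    params = {"application": application}
    if device_id:
        params["device_id"] = device_id
    if last:
        params["last"] = last
    if start_date:
        params["start_date"] = start_date
    if end_date:
        params["end_date"] = end_date
    return f"{API_URL}?{urlencode(params)}", {AUTH_HEADER: token}


# Sending the request (requires a valid token), e.g. with urllib3:
#   url, headers = build_request("my-token", "water-level-app", last="7d")
#   resp = urllib3.PoolManager().request("GET", url, headers=headers)
```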


**Figure 6.** Example of parameters for the sensor data download API, with the asterisk representing the required authorization token field.

**Figure 7.** Response from API using example parameters.

#### 5. Results and Discussion
