The technology development presented here includes both the hardware platform and the validation of the data analysis algorithm. Both are discussed in the following sections.
2.1. Technical Description
Design criteria for the sensor development included a low-power, low-cost, user-friendly hardware instrument to measure the performance and use of various development projects and relay this data directly to the internet for international dissemination. To meet the design criteria, key features were realized, including distributed processing between the hardware and the internet cloud, and remote, automated recalibration and reconfiguration.
The hardware platform developed is powered with five AA batteries, providing a 6–18 month lifetime while still achieving a high sampling rate of up to 8 Hz. Battery life is conserved through triggered event logging and infrequent reporting. Reports are transmitted over Wi-Fi or the cellular network, but sensors can also include a secure digital (SD) card backup. In addition to the collected data, the sensor also reports battery level and cellular signal strength to track system health and performance. Processing occurs in the internet "cloud", which also enables remote auto-calibration. Sensors can be tuned for different applications with 15 different standard voltage or current reference signal inputs. Data reported by the sensor can then be downloaded from any browser with a protected login. The software includes automatic and manual updating of sensor calibration, reporting, and alarm parameters, with the ability to be integrated with other web-based platforms through a provided API that is in development.
A commonly used data acquisition system design requires multiple components (sensor, microprocessor, logger, radio, antenna, power supply) that are packaged and sold separately, thereby increasing cost, complexity, and power consumption. Additionally, many existing systems require specialized software to collect and analyze the data. In contrast, the system presented here is a fully integrated hardware solution that combines the front-end sensor, the processing hardware, the radio, and the power supply. It is designed to maximize the value of the data and minimize power consumption. The data is transmitted to an internet-cloud platform that is accessible through any standard internet browser. This architecture has enabled the system to be significantly lower in cost and more accessible to the end-user than a similarly functional collection of off-the-shelf components.
Figure 1 shows the current industry-standard approach compared against the design presented in this paper.
Figure 1.
Example historical integrated data communication system, where each function is provided by a separate hardware component and data is analyzed with proprietary software on dedicated servers (top), and the architecture described herein, where the hardware platform is a fully integrated electronics board and all processing is conducted on-line (bottom).
Data loggers often have a tradeoff between sampling frequency and energy consumption, including recent sensor platforms deployed for environmental monitoring [13,14]. The design presented here addresses this issue by sampling at a comparatively high rate, between several times a minute and many times a second, while only logging and relaying the data when a reconfigurable, experimentally determined threshold value is reached. This minimizes power consumption and allows high-resolution logging of usage events while running off compact batteries for a targeted minimum of six months. The sensors relay all collected data and rely on the internet processing to aggregate and reduce it, thereby providing a more complete data set that allows more flexible analysis, unlike other recent power management strategies that reduce data on-board [15].
This platform combines commercially available front-end sensors, selected for specific applications including water treatment, cookstoves, sanitation, infrastructure, or other applications, with a comparator circuit board that samples these sensors at a reasonably high rate (up to 8 Hz, although nominally deployed at sample intervals between 1 and 10 s). The comparator boards monitor the sensors for trigger threshold events that start and end periodic local data logging. The comparator output is fed into a low-power microcomputer chip, where the relative time at which the parameter change occurs is logged. Logging continues until the parameter returns to a reconfigurable threshold. For example, when applied to water flow measurements, a transducer comparator examines the reported water pressure data and waits for a change indicating, perhaps, that a tap has been opened. When the sudden drop in water pressure is observed, the system starts logging the actual pressure readings until the user closes the tap. The stored events are coded to reduce the amount of data, and thereby the amount of energy required for transmission. An optional on-board SD card allows for local backup logging, as well as logging when cell phone towers are disabled or out of range. The reconfigurable threshold is in analog-to-digital units and is determined experimentally for each application.
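As a minimal sketch of this trigger logic, written in R for illustration only (the actual logic runs on the on-board microcomputer), the following simulates threshold-triggered logging; the baseline, threshold, and pressure trace are hypothetical values in analog-to-digital units.

```r
# Illustrative R simulation of threshold-triggered event logging.
# Samples are examined continuously, but only stored while the monitored
# parameter deviates from its baseline by more than the reconfigurable threshold.
simulate_trigger_logging <- function(pressure, baseline, threshold) {
  logging <- FALSE
  logged  <- integer(0)
  for (n in seq_along(pressure)) {
    deviation <- abs(pressure[n] - baseline)
    if (!logging && deviation > threshold) logging <- TRUE   # event start (e.g., tap opened)
    if (logging) {
      logged <- c(logged, n)                                 # store this sample locally
      if (deviation <= threshold) logging <- FALSE           # event end (parameter returned)
    }
  }
  logged                                                     # indices of samples that would be logged
}

# Hypothetical trace: a tap opening causes a pressure drop, then recovery.
trace  <- c(rep(512, 30), seq(500, 430, length.out = 40), rep(512, 30))
events <- simulate_trigger_logging(trace, baseline = 512, threshold = 20)
```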
A key feature of this sensor data acquisition platform is its nominally low power consumption of approximately 300 microamps. This is achieved through the use of some of the semiconductor industry's lowest-power microcomputers, manufactured by Microchip.com. During nominal operation, the sensor platform is in sleep mode, and all on-chip and off-chip peripherals are in a low-power mode until activated by a change in the sensed parameter. The most significant power usage occurs when each unit reports data and receives configuration parameters from the internet cloud database. Power usage is minimized by logging data locally and reporting on a user-configured schedule, from approximately every 5 min to once every 24 h. These report intervals can also be dynamically and autonomously optimized using cloud-based processing. For example, the sensor boards can be configured to report only when a certain threshold of data is recorded, rather than on a programmed schedule.
Additionally, several sensor inputs from different applications can be integrated into the same sensor board. For example, a single board with an integrated power supply, logger, and radio can take inputs from air quality and water quality sensors separately. Up to seven analog or digital inputs can be accommodated on a single board, along with eight additional binary (on/off) inputs. The boards report directly to the internet over HTTP and receive instructions and current time/date information from the cloud server. This significantly reduces the duration of the reporting. Should communications be disrupted by connectivity issues, such as maintenance on a cellular network tower, the sensor board returns to sleep mode after several connection attempts.
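As a rough illustration of this report-and-receive exchange (not the actual firmware), the R sketch below posts a report and retries a few times before giving up; the endpoint URL, payload fields, and retry count are hypothetical.

```r
library(httr)

# Post one report to the cloud server and retry on failure; on success the
# server's reply (configuration updates and current date/time) is returned.
report_with_retry <- function(payload, url, attempts = 3) {
  for (i in seq_len(attempts)) {
    resp <- try(POST(url, body = payload, encode = "json", timeout(10)), silent = TRUE)
    if (!inherits(resp, "try-error") && status_code(resp) == 200) {
      return(content(resp))          # e.g., updated thresholds and time/date
    }
  }
  NULL                               # all attempts failed: board would return to sleep
}

# Hypothetical usage with a made-up endpoint and payload fields:
# reply <- report_with_retry(list(sensor_id = 1234, battery_mv = 4800),
#                            "http://example.org/report")
```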
Each sensor board uses adaptive data compression coding algorithms to reduce the amount of data transmitted to the cloud server; less data transmitted equates to a shorter time the cellular module needs to be on, which improves battery life. In some versions, the sensor board could be deployed with a battery-charging solar panel, a heat-harvesting Peltier junction, or other energy harvesting technologies, and its battery voltage can be monitored more often to decide which power-saving mode to operate in. This allows the system to be more adaptable to the local environment and subject application. Each board can autonomously raise an emergency alarm, such as low battery capacity, and contact the internet cloud server independently of any local event triggers. An example application on a household water filter is shown in Figure 2.
Figure 2.
Sensors on household water filter. The dark blue waterproof box contains the electronics board. Pressure transducers are routed from the box (white cable) to the input bucket of the water filter.
Through the internet cloud (hosted on Amazon EC2), the data is then integrated with an online analysis and database system [16]. First, the sensor boards deliver raw data over HTTP on a reconfigurable period. Then, C++ routines process the raw data into a MySQL database, scaling readings when appropriate using sensor-specific calibration values, discarding corrupted data, and compiling reporting periods, cellular signal strength, and battery strength. Finally, R scripts [17] process the MySQL data tables with signal processing, statistical analysis, and aggregation routines, and generate MySQL event tables, charts, and downloadable CSV files.
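A sketch of the kind of R step described here, reading raw readings from the MySQL database and applying a sensor-specific linear calibration, is shown below; the DBI/RMySQL packages, table names, and column names are assumptions rather than the production schema.

```r
library(DBI)

# Connect to the MySQL database populated by the C++ ingestion routines.
# All connection details and the schema below are placeholders.
con <- dbConnect(RMySQL::MySQL(), dbname = "sensordata", host = "localhost",
                 user = "analysis", password = Sys.getenv("DB_PASS"))

raw <- dbGetQuery(con, "SELECT sensor_id, reading_time, adc_value
                        FROM raw_readings WHERE sensor_id = 1234")
cal <- dbGetQuery(con, "SELECT gain, offset FROM calibration WHERE sensor_id = 1234")

# Scale ADC counts to engineering units using the sensor-specific calibration,
# and drop obviously corrupted readings (outside an assumed 10-bit ADC range).
raw <- raw[raw$adc_value >= 0 & raw$adc_value <= 1023, ]
raw$scaled <- raw$adc_value * cal$gain + cal$offset

dbDisconnect(con)
```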
The distributed methods of data analysis allow some processing to be performed locally on the board, such as averaging, trigger events, logging, offsets, and gains. Separately, processing algorithms for summary statistics and alarm events may be run in the internet cloud (distributed between the C++ and R routines), allowing high performance with low power consumption. The internet cloud-based program can remotely reconfigure the hardware platforms and can cross-check individual reports against reports from similarly deployed sensors. The overall data management architecture is pictorially described in Figure 3.
Figure 3.
Data management architecture. The sensor technology is attached to the subject application, such as a water filter, water pump, or cookstove. The data is logged locally and periodically transmitted over the cellular networks to the internet. Updated calibration and configuration parameters are downloaded to the hardware. Initial analysis is conducted by C++ code that puts raw data into a MySQL database. R code is then applied to produce analytics and charts for display online.
2.2. Validation
Each sensor application is validated in at least two ways: laboratory and field trials. In laboratory testing, measured or known values of the target parameter (water volume, flow rate, gas concentration, etc.) are introduced to the sensor platform. The software signal processing algorithm is tuned to these known quantities through the use of laboratory calibration values, signal processing, and trial and error; this algorithm is then applied identically in the field validation and field deployments.
Field validation is performed using structured observations, in which a household that verbally consents to having a monitoring device and observers in their home is studied for one day. The duration of the observation (hours) depends on the application and on acquiring enough data to compare the structured observations to the sensor data. The signal algorithm may be adjusted toward greater correlation with field data in some cases, for example when the field data more closely approximates the typical use of interest. It is presumed that structured observations need only be conducted each time usage behavior is expected to be significantly different from previously validated environments. Structured observations are governed by ethics-committee-approved procedures (Portland State University in most cases thus far, Human Subjects Research Review Committee Proposal #11853).
Two examples of this analysis technique are presented here for illustration: the handpump and the household water filter. The household water filter sensor, on the Vestergaard Frandsen LifeStraw Family 2.0, includes two 1-psi pressure transducers, as shown in Figure 2: one mounted at the base of the six-liter input bucket and one in the base of the six-liter storage bucket. Sensor logging is started when a change in water level is detected, as well as during the gravity-driven treatment process. On the handpump, deployed on both the India Mark 2 and the AfriDev models, a 1-psi pressure transducer is mounted in the base of the pump head, detecting water that is lifted through the pipe and flows into the outlet tap.
To validate the water filter sensors, five sensor-equipped Vestergaard Frandsen LifeStraw Family 2.0 water filters were deployed in a rural village in Western Province, Rwanda. The instrumented filters were in each household for one day with same-day structured observations by a community health worker. Each observation was approximately ten hours in duration: five hours in the morning and five hours in the evening. The observer recorded which instrumented filter was being observed, when water was added to or taken out of the filter, the volume of water added/removed, and each time the filter was backwashed. To validate the community hand pump sensors, a sensor was fitted to an AfriDev hand pump in urban Rwanda. An observer then recorded when the pump was used over a period of a few hours. Structured observations were conducted using the smartphone application doForms [18], where the observer was prompted to select the type of observation and any other supporting information (e.g., in the case of the water filter, automatically scanned barcodes of the sensor and how much water was filtered). Additionally, the use of the smartphone application for the structured observations allowed automatic logging of the date and time. This is important, as manually recording time can lead to discrete data [4] that will not perfectly correlate with sensor data. Structured observation and sensor data were aggregated by R polling the doForms and MySQL databases, and were correlated based on the barcode of the sensor observed and the timestamps of each observed event and each sensor-detected event.
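The correlation step might look something like the following R sketch, merging the two sources on the sensor barcode and keeping pairs whose timestamps agree within a tolerance; the column names and tolerance are illustrative, not the actual doForms or MySQL schema.

```r
# observed: data frame of structured observations (columns: barcode, time as POSIXct)
# detected: data frame of sensor-detected events   (columns: barcode, time as POSIXct)
match_events <- function(observed, detected, tol_min = 10) {
  pairs <- merge(observed, detected, by = "barcode",
                 suffixes = c("_obs", "_sensor"))
  # Retain pairs whose observed and sensor timestamps agree within tol_min minutes.
  agree <- abs(difftime(pairs$time_obs, pairs$time_sensor, units = "mins")) <= tol_min
  pairs[agree, ]
}
```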
In the analysis, an online R script is applied to the raw data. The web-based raw data is polled based on the selected sensor ID. Timestamps and data type are then identified and parsed, and events and usage are identified from the parsed data.
In the case of the LifeStraw, the events are indicated by regions of near-constant slope in the raw data, as shown in Figure 4a. These regions correspond to the linear decrease in pressure associated with the draining top reservoir of the LifeStraw as water is filtered. To accurately detect these events it is necessary to robustly estimate the slope of the non-uniformly sampled raw data. This is accomplished using a sliding-window linear fit technique. The slope and residual error are calculated for a sliding window of length 20 min at one-minute intervals. In this manner a slope signal, s[n], and an error signal, e[n], are calculated on a uniform time grid n to facilitate standard signal processing methods. The error for each linear fit is normalized by the Euclidean norm of the raw data used for that fit. This normalization ensures that the maximum error is one and that zero error indicates perfectly collinear data. For a linear fit to be used, two requirements are placed on each window to ensure quality of fit: a minimum of 1/3 samples per minute is required (corresponding to seven samples per 20 min window), and at least one raw data point is required in each third of the window to verify that the raw data are spread evenly across the window. After calculating the uniformly sampled slope and error from the raw data, LifeStraw uses are determined by identifying regions with low error and near-constant slope within a specified slope range. This is accomplished by calculating a slope spectrum for each raw dataset.
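A compact R sketch of this sliding-window fit is given below, using the window length, spacing, and quality rules stated above; the function name and the exact handling of edge cases are assumptions.

```r
# Estimate slope s[n] and normalized error e[n] on a uniform one-minute grid
# from non-uniformly sampled data (t_min in minutes, y in ADC units).
slope_and_error <- function(t_min, y, win = 20, step = 1) {
  starts <- seq(min(t_min), max(t_min) - win, by = step)
  s <- e <- rep(NA_real_, length(starts))
  for (i in seq_along(starts)) {
    idx    <- which(t_min >= starts[i] & t_min < starts[i] + win)
    thirds <- floor(3 * (t_min[idx] - starts[i]) / win)          # which third: 0, 1, 2
    # Quality rules: >= 7 samples per 20 min window, and data in every third.
    if (length(idx) >= 7 && length(unique(thirds)) == 3) {
      fit  <- lm(y[idx] ~ t_min[idx])
      s[i] <- coef(fit)[2]                                        # slope of the linear fit
      e[i] <- sqrt(sum(residuals(fit)^2)) / sqrt(sum(y[idx]^2))   # error / Euclidean norm of data
    }
  }
  list(time = starts, slope = s, error = e)
}
```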
The slope spectrum is a novel tool developed to visualize the slope as a function of time, analogous to the spectrogram for frequency data. The first step in constructing the slope spectrum for this type of noisy raw data is to create binary masks for each slope bin, or range, indicating the time indices when the slope signal is within that bin. These signals are defined for each bin as

b_m[n] = 1 if s[n] falls within bin m, and b_m[n] = 0 otherwise,

where b_m[n] is the binary mask of the slope signal, s[n], for bin m. These masks are then weighted by the error, e[n], and convolved with a moving averager, h[n]. The length of the averager is chosen to be equal to that of the time window used for slope estimation. The slope spectrum is then defined as

S[m, n] = ((1 − e[n]) b_m[n]) ∗ h[n],
and will contain values between zero and one. An example of a slope spectrum and the corresponding raw data are shown in Figure 4.
Figure 4.
Example data set from the household water filter sensor: a section of raw data (a) and the corresponding slope spectrum (b). The raw pressure data shows three typical usage "events" wherein water is added to the input bucket (vertical line) and then is slowly filtered out (sloped line). The units are raw analog-to-digital counts, later converted to liters of water volume.
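A sketch of the slope spectrum computation in R, following the definitions above, is given below; the bin edges are illustrative, and the (1 − e[n]) weighting reflects the reconstruction that low fit error should strengthen the spectrum value.

```r
# Build the slope spectrum S[m, n] from uniformly sampled slope s and error e.
# bins: vector of slope bin edges; win_len: averager length in samples (20 here).
slope_spectrum <- function(s, e, bins, win_len = 20) {
  h <- rep(1 / win_len, win_len)                          # moving averager h[n]
  S <- matrix(0, nrow = length(bins) - 1, ncol = length(s))
  for (m in seq_len(length(bins) - 1)) {
    b <- as.numeric(s >= bins[m] & s < bins[m + 1])       # binary mask for slope bin m
    b[is.na(b)] <- 0
    w <- b * (1 - ifelse(is.na(e), 1, e))                 # weight the mask by the fit error
    S[m, ] <- as.numeric(stats::filter(w, h, sides = 2))  # convolve with h[n]
  }
  S[is.na(S)] <- 0                                        # edges of the record
  S                                                       # values lie between 0 and 1
}
```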
The individual uses of the LifeStraw are detected from the slope spectrum by taking the maximum of each column of S[m, n] and identifying when the resulting signal is above a specified threshold, 0.5 in this case. The resulting binary signal is defined as

x[n] = 1 if max_m S[m, n] > 0.5, and x[n] = 0 otherwise.
The rises and falls of x[n] correspond to the start and stop times of each use of the LifeStraw. These are identified using a first difference filter and need to be padded by a time equal to the threshold multiplied by the window length, 10 min in this work.
The identified start and stop times of each use of the LifeStraw are then cross-referenced against the data normalized from 0 to 6 liters. The value at the stop time is subtracted from the value at the start time to produce an estimate of the amount of water filtered during the event. These values are calibrated against both lab validation and field verification observations and then applied to all sensors of the same type; while the capability exists to tune each sensor individually, the data presented here use a design-level calibration applied to all unique deployments.
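Putting the last two steps together, a hedged R sketch of use detection and volume estimation is shown below: the 0.5 threshold and 10 min padding follow the text, while the handling of events cut off by the start or end of the record is an assumption.

```r
# Detect uses from the slope spectrum and estimate liters filtered per use.
# raw_liters: the raw signal normalized to the 0-6 L scale, on the same grid as S.
detect_uses <- function(S, raw_liters, threshold = 0.5, pad = 10) {
  x  <- as.numeric(apply(S, 2, max) > threshold)   # x[n]: column maxima above threshold
  dx <- diff(x)                                    # first difference marks rises and falls
  starts <- pmax(which(dx == 1) + 1 - pad, 1)      # pad start earlier by threshold * window
  stops  <- pmin(which(dx == -1) + pad, length(raw_liters))
  n_use  <- min(length(starts), length(stops))     # ignore events cut off by the record
  data.frame(start  = starts[seq_len(n_use)],
             stop   = stops[seq_len(n_use)],
             liters = raw_liters[starts[seq_len(n_use)]] - raw_liters[stops[seq_len(n_use)]])
}
```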