Overcoming Data Scarcity in Earth Science

A special issue of Data (ISSN 2306-5729). This special issue belongs to the section "Spatial Data Science and Digital Earth".

Deadline for manuscript submissions: closed (31 August 2019) | Viewed by 36155

A printed edition of this Special Issue is available.

Special Issue Editors


Guest Editor
Department of Fluid Mechanics and Environmental Engineering (IMFIA), School of Engineering, Universidad de la República, Montevideo 11300, Uruguay
Interests: surface hydrology; hydrologic and water-quality modeling; impact assessment of land use and climate change; urban hydrology and water quality

Guest Editor
Computer Science Department, Engineering College, Universidad de la República, Montevideo, Uruguay
Interests: optical networks; optimization; machine learning

Guest Editor
Institute of Fluid Mechanics and Environmental Engineering (IMFIA), Engineering College, Universidad de la República, Montevideo, Uruguay
Interests: water resources management; surface hydrology; flood modeling

Guest Editor
Computer Science Department, Engineering College, Universidad de la República, Montevideo, Uruguay
Interests: data management; open data; data quality

Special Issue Information

Dear Colleagues,

Environmental mathematical models are one of the key aids scientists use to forecast, create, and evaluate complex scenarios. These models rely heavily on data collected by direct field observations. However, a functional and comprehensive dataset of any environmental variable is hard to collect, mainly because of (i) the high cost of monitoring campaigns and (ii) the low reliability of the measurements (e.g., due to equipment malfunctions and/or issues related to the equipment location). A lack of sufficient Earth science data may lead to an inadequate representation of the complexity of an environmental system’s response to any type of input or change, both natural and human-induced. In such a case, before undertaking expensive studies to gather and analyze additional data, it is reasonable to first understand what improvement in estimates of system performance would result if all the available data were well exploited.

Missing data imputation is an important task in cases where it is crucial to use all available data and not discard records with missing values. Different approaches are available to deal with missing data. Traditional statistical data completion methods are used in different domains to deal with single and multiple imputation problems. More recently, machine learning techniques, such as clustering and classification, have been proposed to complete missing data.

This Special Issue on “Overcoming Data Scarcity in Earth Science” of the journal Data is designed to draw attention to the body of knowledge that aims to improve the capacity to exploit the available data to better represent, understand, predict, and manage the behavior of environmental systems at all practical scales.

Authors are encouraged to submit research articles, reviews, and short communications addressing this theme in this Special Issue.

Dr. Angela Gorgoglione
Dr. Alberto Castro Casales
Dr. Christian Chreties Ceriani
Dr. Lorena Etcheverry Venturini
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Data is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Earth science data
  • Data scarcity
  • Missing data
  • Data quality
  • Data imputation
  • Statistical methods
  • Machine learning
  • Environmental modeling
  • Environmental observations

Published Papers (7 papers)

Editorial

5 pages, 189 KiB  
Editorial
Overcoming Data Scarcity in Earth Science
by Angela Gorgoglione, Alberto Castro, Christian Chreties and Lorena Etcheverry
Data 2020, 5(1), 5; https://doi.org/10.3390/data5010005 - 01 Jan 2020
Cited by 10 | Viewed by 2789
Abstract
The Data Scarcity problem is repeatedly encountered in environmental research. This may induce an inadequate representation of the response’s complexity in any environmental system to any input/change (natural and human-induced). In such a case, before getting engaged with new expensive studies to gather and analyze additional data, it is reasonable first to understand what enhancement in estimates of system performance would result if all the available data could be well exploited. The purpose of this Special Issue, “Overcoming Data Scarcity in Earth Science” in the Data journal, is to draw attention to the body of knowledge that leads at improving the capacity of exploiting the available data to better represent, understand, predict, and manage the behavior of environmental systems at meaningful space-time scales. This Special Issue contains six publications (three research articles, one review, and two data descriptors) covering a wide range of environmental fields: geophysics, meteorology/climatology, ecology, water quality, and hydrology.
(This article belongs to the Special Issue Overcoming Data Scarcity in Earth Science)

Research

14 pages, 1355 KiB  
Article
Classification of Soils into Hydrologic Groups Using Machine Learning
by Shiny Abraham, Chau Huynh and Huy Vu
Data 2020, 5(1), 2; https://doi.org/10.3390/data5010002 - 19 Dec 2019
Cited by 47 | Viewed by 7780
Abstract
Hydrologic soil groups play an important role in the determination of surface runoff, which, in turn, is crucial for soil and water conservation efforts. Traditionally, placement of soil into appropriate hydrologic groups is based on the judgement of soil scientists, primarily relying on their interpretation of guidelines published by regional or national agencies. As a result, large-scale mapping of hydrologic soil groups results in widespread inconsistencies and inaccuracies. This paper presents an application of machine learning for classification of soil into hydrologic groups. Based on features such as percentages of sand, silt and clay, and the value of saturated hydraulic conductivity, machine learning models were trained to classify soil into four hydrologic groups. The results of the classification obtained using algorithms such as k-Nearest Neighbors, Support Vector Machine with Gaussian Kernel, Decision Trees, Classification Bagged Ensembles and TreeBagger (Random Forest) were compared to those obtained using estimation based on soil texture. The performance of these models was compared and evaluated using per-class metrics and micro- and macro-averages. Overall, performance metrics related to kNN, Decision Tree and TreeBagger exceeded those for SVM-Gaussian Kernel and Classification Bagged Ensemble. Among the four hydrologic groups, it was noticed that group B had the highest rate of false positives.

16 pages, 2606 KiB  
Article
Use of the WRF-DA 3D-Var Data Assimilation System to Obtain Wind Speed Estimates in Regular Grids from Measurements at Wind Farms in Uruguay
by Gabriel Cazes Boezio and Sofía Ortelli
Data 2019, 4(4), 142; https://doi.org/10.3390/data4040142 - 29 Oct 2019
Cited by 5 | Viewed by 2763
Abstract
This work assessed the quality of wind speed estimates in Uruguay. These estimates were obtained using the Weather Research and Forecast Model Data Assimilation System (WRF-DA) to assimilate wind speed measurements from 100 m above the ground at two wind farms. The quality of the estimates was assessed with an anemometric station placed between the wind farms. The wind speed estimates showed low systematic errors at heights of 87 and 36 m above the ground. At both levels, the standard deviation of the total errors was approximately 25% of the mean observed speed. These results suggested that the estimates obtained could be of sufficient quality to be useful in various applications. The assimilation process proved to be effective, spreading the observational gain obtained at the wind farms to lower elevations than those at which the assimilated measurements were taken. The smooth topography of Uruguay might have contributed to the relatively good quality of the obtained wind estimates, although the data of only two stations were assimilated, and the resolution of the regional atmospheric simulations employed was relatively low.

15 pages, 1469 KiB  
Article
Application of Rough Set Theory to Water Quality Analysis: A Case Study
by Maryam Zavareh and Viviana Maggioni
Data 2018, 3(4), 50; https://doi.org/10.3390/data3040050 - 07 Nov 2018
Cited by 11 | Viewed by 3792
Abstract
This work proposes an approach to analyze water quality data that is based on rough set theory. Six major water quality indicators (temperature, pH, dissolved oxygen, turbidity, specific conductivity, and nitrate concentration) were collected at the outlet of the watershed that contains the George Mason University campus in Fairfax, VA during three years (October 2015–December 2017). Rough set theory is applied to monthly averages of the collected data to estimate one indicator (decision attribute) based on the remainder indicators and to determine what indicators (conditional attributes) are essential (core) to predict the missing indicator. The redundant attributes are identified, the importance degree of each attribute is quantified, and the certainty and coverage of any detected rule(s) is evaluated. Possible decision making rules are also assessed and the certainty coverage factor is calculated. Results show that the core water quality indicators for the Mason watershed during the study period are turbidity and specific conductivity. Particularly, if pH is chosen as a decision attribute, the importance degree of turbidity is higher than the one of conductivity. If the decision attribute is turbidity, the only indispensable attribute is specific conductivity and if specific conductivity is the decision attribute, the indispensable attribute beside turbidity is temperature.

Review

14 pages, 735 KiB  
Review
A Lack of “Environmental Earth Data” at the Microhabitat Scale Impacts Efforts to Control Invasive Arthropods That Vector Pathogens
by Emily L. Pascoe, Sajid Pareeth, Duccio Rocchini and Matteo Marcantonio
Data 2019, 4(4), 133; https://doi.org/10.3390/data4040133 - 29 Sep 2019
Cited by 6 | Viewed by 3182
Abstract
We currently live in an era of major global change that has led to the introduction and range expansion of numerous invasive species worldwide. In addition to the ecological and economic consequences associated with most invasive species, invasive arthropods that vector pathogens (IAVPs) to humans and animals pose substantial health risks. Species distribution models that are informed using environmental Earth data are frequently employed to predict the distribution of invasive species, and to advise targeted mitigation strategies. However, there are currently substantial mismatches in the temporal and spatial resolution of these data and the environmental contexts which affect IAVPs. Consequently, targeted actions to control invasive species or to prepare the population for possible disease outbreaks may lack efficacy. Here, we identify and discuss how the currently available environmental Earth data are lacking with respect to their applications in species distribution modeling, particularly when predicting the potential distribution of IAVPs at meaningful space-time scales. For example, we examine the issues related to interpolation of weather station data and the lack of microclimatic data relevant to the environment experienced by IAVPs. In addition, we suggest how these data gaps can be filled, including through the possible development of a dedicated open access database, where data from both remotely- and proximally-sensed sources can be stored, shared, and accessed.

Other

8 pages, 4820 KiB  
Data Descriptor
System for Collecting, Processing, Visualization, and Storage of the MT-Monitoring Data
by Elena Bataleva, Anatoly Rybin and Vitalii Matiukov
Data 2019, 4(3), 99; https://doi.org/10.3390/data4030099 - 14 Jul 2019
Cited by 12 | Viewed by 2921
Abstract
On the basis of the Research Station of the Russian Academy of Sciences in Bishkek, a unique scientific infrastructure—a complex geophysical station—is successfully functioning, realizing a monitoring of geodynamic processes, which includes research on the network of points of seismological, geodesic, and electromagnetic observations on the territory of the Bishkek Geodynamic Proving Ground located in the seismically active zone of the Northern Tien Shan. The scientific and practical importance of monitoring the geodynamical activity of the Earth’s crust takes place not only in seismically active regions, but also in the areas of the location of particularly important objects, mining, and hazardous industries. Therefore, it seems highly relevant to create new software and hardware to study geodynamic processes in the earth’s crust of seismically active zones, based on integrated monitoring of the geological environment in the widest possible depth range. The use of modern information technology in such studies provides an effective data management tool. The considering system for collecting, processing, and storing monitoring electromagnetic data of the Bishkek geodynamic proving ground can help overcome the scarcity of experimental data in the field of Earth sciences.

11 pages, 1338 KiB  
Data Descriptor
A High-Resolution Global Gridded Historical Dataset of Climate Extreme Indices
by Malcolm N. Mistry
Data 2019, 4(1), 41; https://doi.org/10.3390/data4010041 - 13 Mar 2019
Cited by 34 | Viewed by 10597
Abstract
Climate extreme indices (CEIs) are important metrics that not only assist in the analysis of regional and global extremes in meteorological events, but also aid climate modellers and policymakers in the assessment of sectoral impacts. Global high-spatial-resolution CEI datasets derived from quality-controlled historical observations, or reanalysis data products are scarce. This study introduces a new high-resolution global gridded dataset of CEIs based on sub-daily temperature and precipitation data from the Global Land Data Assimilation System (GLDAS). The dataset called “CEI_0p25_1970_2016” includes 71 annual (and in some cases monthly) CEIs at 0.25° × 0.25° gridded resolution, covering 47 years over the period 1970–2016. The data of individual indices are publicly available for download in the commonly used Network Common Data Form 4 (NetCDF4) format. Potential applications of CEI_0p25_1970_2016 presented here include the assessment of sectoral impacts (e.g., Agriculture, Health, Energy, and Hydrology), as well as the identification of hot spots (clusters) showing similar historical spatial patterns of high/low temperature and precipitation extremes. CEI_0p25_1970_2016 fills gaps in existing CEI datasets by encompassing not only more indices, but also by being the only comprehensive global gridded CEI data available at high spatial resolution.