
Data, Volume 1, Issue 3 (December 2016) – 7 articles

Review
Standardization and Quality Control in Data Collection and Assessment of Threatened Plant Species
by Lloyd W. Morrison and Craig C. Young
Data 2016, 1(3), 20; https://doi.org/10.3390/data1030020 - 14 Dec 2016
Viewed by 4056
Abstract
Informative data collection is important in the identification and conservation of rare plant species. Data sets generated by many small-scale studies may be integrated into large, distributed databases, and statistical tools are being developed to extract meaningful information from such databases. However, a diversity of field methodologies may be employed across smaller studies, resulting in a lack of standardization and quality control, which makes integration more difficult. Here, we present a case study of the population-level monitoring of two threatened plant species with contrasting life history traits that require different field sampling methodologies: the limestone glade bladderpod, Physaria filiformis, and the western prairie fringed orchid, Platanthera praeclara. Although these species require different data collection methodologies because of their population sizes and plant morphology, the resulting data allow for similar inferences. Different sample designs may frequently be necessary for rare plant sampling, yet still provide comparable data. Various sources of uncertainty may be associated with data collection (e.g., random sampling error, methodological imprecision, observer error); these should be quantified whenever possible, included in data sets, and described in metadata. Ancillary data (e.g., abundance of other plants, physical environment, weather/climate) may be valuable, and the most relevant variables may be determined by natural history or empirical studies. Once data are collected, standard operating procedures should be established to prevent errors in data entry. Best practices for data archiving should be followed, and data should be made available for other scientists to use. Efforts to standardize data collection and control data quality, particularly in small-scale field studies, are imperative for future cross-study comparisons, meta-analyses, and systematic reviews.
(This article belongs to the Special Issue Biodiversity and Species Traits)
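The abstract recommends standard operating procedures to prevent data-entry errors and recording sources of uncertainty such as observer error. As a hedged illustration only, automated range checks applied at data entry might look like the minimal Python sketch below. The record fields (`plot_id`, `plant_count`, `percent_flowering`, and so on) and the valid ranges are hypothetical and are not taken from the authors' protocols.

```python
from dataclasses import dataclass


@dataclass
class PlotRecord:
    """One hypothetical monitoring record; fields are illustrative only."""
    plot_id: str
    year: int
    observer: str            # recorded so observer error can be estimated later
    plant_count: int
    percent_flowering: float


def validate(record: PlotRecord, min_year: int = 1990, max_year: int = 2100) -> list:
    """Return a list of problems; an empty list means the record passed all checks."""
    problems = []
    if not record.plot_id.strip():
        problems.append("missing plot_id")
    if not (min_year <= record.year <= max_year):
        problems.append(f"year {record.year} outside {min_year}-{max_year}")
    if not record.observer.strip():
        problems.append("missing observer")
    if record.plant_count < 0:
        problems.append("negative plant_count")
    if not (0.0 <= record.percent_flowering <= 100.0):
        problems.append("percent_flowering outside 0-100")
    return problems


if __name__ == "__main__":
    good = PlotRecord("P01", 2015, "obs1", 42, 35.0)
    bad = PlotRecord("", 2015, "", -3, 120.0)
    print(validate(good))  # []
    print(validate(bad))   # four problems reported
```

Rejecting or flagging a record at entry time is usually cheaper than cleaning it after it has been merged into a larger, distributed database.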

Article
Application of Taxonomic Modeling to Microbiota Data Mining for Detection of Helminth Infection in Global Populations
by Mahbaneh Eshaghzadeh Torbati, Makedonka Mitreva and Vanathi Gopalakrishnan
Data 2016, 1(3), 19; https://doi.org/10.3390/data1030019 - 13 Dec 2016
Cited by 6 | Viewed by 5369
Abstract
Human microbiome data from genomic sequencing technologies are accumulating rapidly, giving us insight into the bacterial taxa that contribute to health and disease. Predictive modeling of such microbiota count data to classify human infection by parasitic worms (helminths) can aid detection and management across global populations. Real-world microbiome datasets are typically sparse: they contain measurements for hundreds of bacterial species, of which only a few are detected in the analyzed bio-specimens. This sparsity means that more observations are needed for accurate predictive modeling, and it has previously been addressed with various feature-reduction methods. To our knowledge, integrative methods such as transfer learning have not yet been explored in the microbiome domain as a way to deal with data sparsity by incorporating knowledge from different but related datasets. One way to incorporate this knowledge is through a meaningful mapping among the features of these datasets. In this paper, we claim that such a mapping would exist among the members of each individual cluster, where clusters are grouped based on phylogenetic dependency among taxa and their association with the phenotype. We validate our claim by showing that models incorporating associations in such a grouped feature space show no performance deterioration for the given classification task. We test this hypothesis using classification models that detect helminth infection in the microbiota of human fecal samples obtained from Indonesia and Liberia. In our experiments, we first learn binary classifiers for helminth infection detection using Naive Bayes, Support Vector Machines, Multilayer Perceptrons, and Random Forest methods. In the next step, we add taxonomic modeling by using the SMART-scan module to group the data, and learn classifiers with the same four methods to test the validity of the resulting groupings. We observed performance improvements of 6% to 23% in the Area Under the receiver operating characteristic (ROC) Curve (AUC) and 7% to 26% in Balanced Accuracy (Bacc), over 10 runs of 10-fold cross-validation. These results show that using phylogenetic dependency to group our microbiota data results in a noticeable improvement in classification performance for helminth infection detection. These promising results from this feasibility study demonstrate that methods such as SMART-scan can be utilized in the future for knowledge transfer from different but related microbiome datasets by phylogenetically-related functional mapping, to enable novel integrative biomarker discovery.
(This article belongs to the Special Issue Biomedical Informatics)
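The evaluation described in this abstract (AUC and balanced accuracy over 10 runs of 10-fold cross-validation) can be sketched in plain Python. This is a minimal illustration assuming binary labels coded 0/1 with both classes present in every evaluated set. It is not the authors' pipeline, the function names are hypothetical, and the fold splitting shown is unstratified.

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for 0/1 labels (both classes must be present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def roc_auc(y_true, scores):
    """AUC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def repeated_kfold(n, k=10, runs=10, seed=0):
    """Yield (run, fold, train_idx, test_idx); each run reshuffles, then splits into k folds."""
    rng = random.Random(seed)
    for run in range(runs):
        idx = list(range(n))
        rng.shuffle(idx)
        for fold in range(k):
            test = idx[fold::k]
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield run, fold, train, test
```

Balanced accuracy is a natural companion to AUC here because, unlike plain accuracy, it is not inflated when one class (infected or uninfected) dominates the sample.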

Article
The Land Surface Temperature Synergistic Processor in BEAM: A Prototype towards Sentinel-3
by Ana Belen Ruescas, Olaf Danne, Norman Fomferra and Carsten Brockmann
Data 2016, 1(3), 18; https://doi.org/10.3390/data1030018 - 21 Oct 2016
Cited by 11 | Viewed by 5708
Abstract
Land Surface Temperature (LST) is one of the key parameters in the physics of land-surface processes on regional and global scales, combining the results of all surface-atmosphere interactions and energy fluxes between the surface and the atmosphere. With the advent of the European Space Agency (ESA) Sentinel-3 (S3) satellite, accurate LST retrieval methodologies are being developed that exploit the synergy between the Ocean and Land Colour Instrument (OLCI) and the Sea and Land Surface Temperature Radiometer (SLSTR). In this paper, we describe how one LST algorithm developed in the framework of the Synergistic Use of The Sentinel Missions For Estimating And Monitoring Land Surface Temperature (SEN4LST) project was implemented in the Basic ENVISAT Toolbox for (A)ATSR and MERIS (BEAM), and how it is used. The LST algorithm is based on the split-window technique with an explicit dependence on surface emissivity. The performance of the methodology is assessed using MEdium Resolution Imaging Spectrometer/Advanced Along-Track Scanning Radiometer (MERIS/AATSR) pairs, instruments with characteristics similar to those of OLCI and SLSTR, respectively. The LST retrievals were validated against in situ data measured over one year (2011) at three test sites and inter-compared with the standard AATSR level-2 product, with satisfactory results. The algorithm is implemented in BEAM using the MERIS/AATSR Synergy Toolbox as a basis. Specific details about the processor validation can be found in the validation report of the SEN4LST project.
(This article belongs to the Special Issue Temperature of the Earth)
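The abstract states only that the SEN4LST algorithm is a split-window technique with explicit emissivity dependence. A widely used split-window form of this kind (associated with Sobrino and co-workers) is LST = T11 + c1(T11 - T12) + c2(T11 - T12)^2 + c0 + (c3 + c4 W)(1 - e) + (c5 + c6 W)De, where T11 and T12 are brightness temperatures in the ~11 and ~12 micrometre channels, e is the mean emissivity of the two channels, De is their difference, and W is column water vapour. Whether SEN4LST uses exactly this form is an assumption here, and the coefficients are placeholders: operational values come from regression against simulated or measured data and are not reproduced. A minimal Python sketch:

```python
from typing import Sequence


def split_window_lst(t11: float, t12: float, emis11: float, emis12: float,
                     water_vapour: float, coeffs: Sequence[float]) -> float:
    """Generic split-window LST (K) with explicit emissivity terms (assumed form).

    t11, t12         brightness temperatures (K) in the ~11 and ~12 micrometre channels
    emis11, emis12   surface emissivities in the same two channels
    water_vapour     total column water vapour (g/cm^2)
    coeffs           placeholder coefficients c0..c6 (NOT the SEN4LST values)
    """
    c0, c1, c2, c3, c4, c5, c6 = coeffs
    dt = t11 - t12                       # brightness-temperature difference
    emis_mean = 0.5 * (emis11 + emis12)  # mean emissivity
    emis_diff = emis11 - emis12          # spectral emissivity difference
    return (t11 + c1 * dt + c2 * dt ** 2 + c0
            + (c3 + c4 * water_vapour) * (1.0 - emis_mean)
            + (c5 + c6 * water_vapour) * emis_diff)
```

The brightness-temperature difference carries the atmospheric correction, while the two emissivity terms correct for the surface not being a blackbody; this is why emissivity inputs (in the synergy setting, derived with help from the optical instrument) matter to the retrieval.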

Data Descriptor
Land Cover Data for the Mississippi–Alabama Barrier Islands, 2010–2011
by Gregory A. Carter, Carlton P. Anderson, Kelly L. Lucas and Nathan L. Hopper
Data 2016, 1(3), 16; https://doi.org/10.3390/data1030016 - 30 Sep 2016
Cited by 1 | Viewed by 5747
Abstract
Land cover on the Mississippi–Alabama barrier islands was surveyed in 2010–2011 as part of continuing research on island geomorphic and vegetation dynamics following the 2005 impact of Hurricane Katrina. Results of the survey include sub-meter GPS locations, lists of dominant vegetation species, and field photographs recorded at 375 sampling locations distributed among Cat, West Ship, East Ship, Horn, Sand, Petit Bois and Dauphin Islands. The survey was conducted during a period of intensive remote sensing data acquisition over the northern Gulf of Mexico by federal, state and commercial organizations in response to the 2010 Macondo Well (Deepwater Horizon) oil spill. The data provide ground reference information for the thematic classification of remotely sensed imagery and a record of land cover that may be used in future research.
(This article belongs to the Special Issue Geospatial Data)

Data Descriptor
SNiPhunter: A SNP-Based Search Engine
by Werner P. Veldsman and Alan Christoffels
Data 2016, 1(3), 17; https://doi.org/10.3390/data1030017 - 29 Sep 2016
Viewed by 4756
Abstract
Procuring biomedical literature is a time-consuming process. The genomic sciences software solution described here indexes literature from PubMed Central’s open access initiative and makes it available as a web application and through an application programming interface (API). The purpose of this tertiary data artifact, called SNiPhunter, is to assist researchers in finding articles relevant to a reference single nucleotide polymorphism (SNP) identifier of interest. A novel feature of this NoSQL (not only structured query language) database search engine is that it returns results ordered by the number of times a refSNP appears in each article, allowing the user to make a quantitative estimate of an article's relevance. Queries can also be launched using author-defined keywords. Additional features include a variant call format (VCF) file parser and a multiple-query file upload service. Software implementation in this project relied on Python and the NodeJS interpreter, as well as third-party libraries retrieved from GitHub.
(This article belongs to the Special Issue Biomedical Informatics)
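The abstract does not show SNiPhunter's implementation (Python and NodeJS with a NoSQL back end). As a hedged, in-memory sketch of the two ideas it describes, ranking articles by how often a refSNP identifier occurs and pulling identifiers out of a VCF file, consider the Python below. The function names and the dictionary of article texts are hypothetical; the only format facts relied on are that refSNP identifiers are written "rs" plus digits and that the VCF ID column is the third tab-separated field, with multiple IDs separated by semicolons.

```python
import re
from collections import Counter

# refSNP identifiers are "rs" followed by digits, e.g. rs334.
RSID = re.compile(r"\brs\d+\b", re.IGNORECASE)


def count_rsids(text: str) -> Counter:
    """Count every refSNP identifier mentioned in a text (case-insensitive)."""
    return Counter(m.group(0).lower() for m in RSID.finditer(text))


def rank_articles(articles: dict, rsid: str) -> list:
    """Return (article_id, mentions) pairs for one rsID, most mentions first."""
    rsid = rsid.lower()
    hits = [(aid, count_rsids(text)[rsid]) for aid, text in articles.items()]
    hits = [(aid, n) for aid, n in hits if n > 0]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


def rsids_from_vcf(lines) -> list:
    """Collect rsIDs from the ID column (3rd tab-separated field) of VCF data lines."""
    ids = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            # The ID column may hold several IDs separated by ";", or "." if missing.
            ids.extend(tok.lower() for tok in fields[2].split(";") if RSID.fullmatch(tok))
    return ids
```

Feeding each identifier returned by `rsids_from_vcf` into `rank_articles` mimics, in miniature, a multiple-query lookup driven by an uploaded VCF file.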

Data Descriptor
Technical Guidelines to Extract and Analyze VGI from Different Platforms
by Levente Juhász, Adam Rousell and Jamal Jokar Arsanjani
Data 2016, 1(3), 15; https://doi.org/10.3390/data1030015 - 24 Sep 2016
Cited by 10 | Viewed by 8065
Abstract
A growing number of Volunteered Geographic Information (VGI) and social media platforms have been expanding continuously, providing massive amounts of georeferenced data in many forms, including textual information, photographs, and geoinformation. These georeferenced data have been either actively contributed (e.g., adding data to OpenStreetMap (OSM) or Mapillary) or collected more passively by enabling geolocation while using an online platform (e.g., Twitter, Instagram, or Flickr). The benefit of scraping and streaming these data in stand-alone applications is evident; however, many users find it difficult to write scripts that scrape such diverse types of data. On 14 June 2016, a pre-conference workshop called “LINK-VGI: LINKing and analyzing VGI across different platforms” was held at the AGILE 2016 conference in Helsinki, Finland. The workshop gave interested researchers an opportunity to share ideas and findings on cross-platform data contributions. One portion of the workshop was a hands-on session that illustrated the basics of spatial data access through selected Application Programming Interfaces (APIs) and the extraction of summary statistics from the results. This paper presents the content of the hands-on session, including the scripts and guidelines for extracting VGI data. Researchers, planners, and interested end-users can use this paper to develop their own applications for any region of the world.
(This article belongs to the Special Issue Geospatial Data)
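The workshop scripts themselves are not reproduced in this listing. As a hedged illustration of API-based spatial data access followed by a simple summary statistic for one platform (OSM), the Python sketch below builds an Overpass QL query for a bounding box, posts it to a public Overpass API instance, and counts tag values in the JSON response. The Overpass endpoint, the `data` form field, the `(south, west, north, east)` bounding-box order, and the `elements`/`tags` response layout are real; the choice of Overpass, the helper names, and the example bounding box (roughly central Helsinki) are assumptions and not necessarily what the workshop used.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

# Public Overpass API instance; check its usage policy before sending many requests.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_query(key: str, bbox: tuple, timeout: int = 25) -> str:
    """Overpass QL for all OSM nodes carrying `key`; bbox is (south, west, north, east)."""
    south, west, north, east = bbox
    return (f"[out:json][timeout:{timeout}];"
            f'node["{key}"]({south},{west},{north},{east});'
            "out body;")


def fetch(query: str) -> dict:
    """POST the query as the form field `data` and decode the JSON response (needs network)."""
    payload = urllib.parse.urlencode({"data": query}).encode("utf-8")
    with urllib.request.urlopen(OVERPASS_URL, data=payload, timeout=60) as resp:
        return json.load(resp)


def summarize(response: dict, key: str) -> Counter:
    """Summary statistic: how many returned elements have each value of `key`."""
    return Counter(el.get("tags", {}).get(key, "<none>")
                   for el in response.get("elements", []))
```

With network access, `summarize(fetch(build_query("amenity", (60.15, 24.90, 60.20, 24.98))), "amenity").most_common(5)` would list the most frequent amenity types in that box. Other platforms (Twitter, Flickr, Mapillary) require their own authenticated APIs and are not covered by this sketch.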

Data Descriptor
688,112 Statistical Results: Content Mining Psychology Articles for Statistical Test Results
by Chris H. J. Hartgerink
Data 2016, 1(3), 14; https://doi.org/10.3390/data1030014 - 23 Sep 2016
Cited by 5 | Viewed by 20735
Abstract
In this data deposit, I describe a dataset that results from content mining 167,318 published articles for statistical test results reported according to the standards prescribed by the American Psychological Association (APA). Articles published by the APA, Springer, Sage, and Taylor & Francis were included (mining from Wiley and Elsevier was actively blocked). This content mining extracted 688,112 results from 50,845 articles. To provide a comprehensive set of data, the statistical results are supplemented with metadata from the articles they originate from. The dataset is provided as a comma-separated values (CSV) file in long format. For each of the 688,112 results, 20 variables are included, of which seven are article metadata and 13 pertain to the individual statistical results (e.g., reported and recalculated p-value). A five-pronged approach was taken to generate the dataset: (i) collect journal lists; (ii) spider journal pages for articles; (iii) download articles; (iv) add article metadata; and (v) mine articles for statistical results. All materials, scripts, etc. are available at https://github.com/chartgerink/2016statcheck_data and preserved at http://dx.doi.org/10.5281/zenodo.59818.
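The variables include both the reported and the recalculated p-value. The dataset was built with the author's own scripts (linked in the abstract); as a hedged sketch of what recalculating a p-value from an APA-formatted result involves, the Python below handles only z tests, because the two-tailed normal p-value is available in the standard library as erfc(|z|/sqrt(2)). Tests reported as t, F, r, or chi-square would need their respective distribution functions. The regular expression, the function names, and the half-unit rounding rule for "p =" results are assumptions, not the rules used to build the dataset.

```python
import math
import re

# APA-style z results, e.g. "z = 2.10, p = .036" or "Z = -2.5, p < .05".
Z_RESULT = re.compile(
    r"\bz\s*=\s*(?P<z>-?\d*\.?\d+)\s*,\s*p\s*(?P<comp>[<>=])\s*(?P<p>\d*\.\d+)",
    re.IGNORECASE)


def two_tailed_p(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z, i.e. erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def check_z_results(text: str):
    """Yield (z, comparison, reported p, recomputed p, consistent) for each z result."""
    for m in Z_RESULT.finditer(text):
        z = float(m.group("z"))
        reported_str = m.group("p")
        reported = float(reported_str)
        recomputed = two_tailed_p(z)
        comp = m.group("comp")
        if comp == "=":
            # Assumed rule: consistent if the recomputed p lies within half a unit
            # of the last reported decimal place.
            decimals = len(reported_str.split(".")[1])
            consistent = abs(recomputed - reported) <= 0.5 * 10 ** -decimals
        elif comp == "<":
            consistent = recomputed < reported
        else:
            consistent = recomputed > reported
        yield z, comp, reported, recomputed, consistent
```

For example, "z = 2.10, p = .036" is consistent (recomputed p is about 0.0357), whereas "z = 1.50, p = .01" is not (recomputed p is about 0.134).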
