Data | March 2017 - Browse Articles

114 KiB

Open Access Erratum

Erratum: Morrison, H., et al. Open Access Article Processing Charges (OA APC) Longitudinal Study 2015 Preliminary Dataset

by Heather Morrison, Guinsly Mondésir, Jihane Salhab, César Villamizar, Alexis Calvé-Genest and Lisa Desautels

Data 2017, 2(1), 11; https://doi.org/10.3390/data2010011 - 16 Feb 2017

Viewed by 3048

Abstract

The authors wish to make the following corrections to their paper [...] Full article

2892 KiB

Open Access Data Descriptor

Herbarium of the Pontifical Catholic University of Paraná (HUCP), Curitiba, Southern Brazil

by Rodrigo A. Kersten, João A. M. Salesbram and Luiz A. Acra

Data 2017, 2(1), 10; https://doi.org/10.3390/data2010010 - 10 Feb 2017

Cited by 1 | Viewed by 4523

Abstract

The main objective of this paper is to present the herbarium of the Pontifical Catholic University of Paraná (HUCP) and its collection. The history of the HUCP began in the mid-1970s with the foundation of the Biology Museum, which gathered both botanical and zoological specimens. In April 1979 the collections were separated and the HUCP was founded with preserved specimens of algae (green, red, and brown), fungi, and embryophytes. As of October 2016, the collection encompasses nearly 25,000 specimens from 4934 species, 1609 genera, and 297 families. Most of the specimens come from the state of Paraná, but there are also specimens from many other Brazilian states and from other countries, mainly in South America (Chile, Argentina, Uruguay, Paraguay, and Colombia) but also in other parts of the world (Cuba, USA, Spain, Germany, China, and Australia). Our collection includes 42 fungi, 258 gymnosperms, 299 bryophytes, 2809 pteridophytes, 3158 algae, 17,832 angiosperms, and only one type specimen, of Mimosa (Mimosa tucumensis Barneby ex Ribas, M. Morales & Santos-Silva; Fabaceae). We also run botanical education and education-for-sustainability programs for basic and high school students, as well as training for teachers. Full article

10739 KiB

Open Access Article

The Effectiveness of Geographical Data in Multi-Criteria Evaluation of Landscape Services †

by Roberta Mele and Giuliano Poli

Data 2017, 2(1), 9; https://doi.org/10.3390/data2010009 - 6 Feb 2017

Cited by 14 | Viewed by 4560

Abstract

The aim of the paper is to map and evaluate the state of the multifunctional landscape of the municipality of Naples (Italy) and its surroundings, through a Spatial Decision-Making support system (SDSS) combining a geographic information system (GIS) and a multi-criteria method, the analytic hierarchy process (AHP). We conceive a knowledge-mapping-evaluation (KME) framework in order to investigate the landscape as a complex system. The proposed methodology focuses on data gathering and processing. Therefore, both the authoritative and the unofficial sources, e.g., volunteered geographical information (VGI), are useful tools to enhance the information flow whenever quality assurance is performed. Thus, the maps of spatial criteria are useful for problem structuring and prioritization by considering the availability of context-aware data. Finally, the identification of landscape services (LS) and ecosystem services (ES) can improve the decision-making processes within a multi-stakeholder perspective involving the evaluation of trade-offs. The results show multi-criteria choropleth maps of the LS and ES with the density of services, the spatial distribution, and the surrounding benefits. Full article

(This article belongs to the Special Issue Geospatial Data)

2966 KiB

Open Access Data Descriptor

Data on Healthy Food Accessibility in Amsterdam, The Netherlands

by Marco Helbich and Julian Hagenauer

Data 2017, 2(1), 7; https://doi.org/10.3390/data2010007 - 26 Jan 2017

Cited by 2 | Viewed by 5283

Abstract

This data descriptor introduces data on healthy food supplied by supermarkets in the city of Amsterdam, The Netherlands. In addition to two neighborhood variables (i.e., share of autochthons and average housing values), the data comprises three street network-based accessibility measures derived from analyses using a geographic information system. Data are provided on a spatial micro-scale utilizing grid cells with a spatial resolution of 100 m. We explain how the data were collected and pre-processed, and how alternative analyses can be set up. To illustrate the use of the data, an example is provided using the R programming language. Full article

(This article belongs to the Special Issue Geospatial Data)

2319 KiB

Open Access Article

An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data

by Yuzhe Liu and Vanathi Gopalakrishnan

Data 2017, 2(1), 8; https://doi.org/10.3390/data2010008 - 25 Jan 2017

Cited by 60 | Viewed by 8633

Abstract

Many clinical research datasets have a large percentage of missing values that directly impacts their usefulness in yielding high accuracy classifiers when used for training in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine if increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. Then, we apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compared the performance of imputation-augmented models versus unaugmented models. We found that all four imputation-augmented models performed similarly to unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models. Full article

(This article belongs to the Special Issue Biomedical Informatics)

1927 KiB

Open Access Technical Note

Determination of Concentration of the Aqueous Lithium–Bromide Solution in a Vapour Absorption Refrigeration System by Measurement of Electrical Conductivity and Temperature

by Salem M. Osta-Omar and Christopher Micallef

Data 2017, 2(1), 6; https://doi.org/10.3390/data2010006 - 19 Jan 2017

Cited by 8 | Viewed by 11496

Abstract

Lithium–bromide/water (LiBr/water) pairs are widely used as the working medium in vapour absorption refrigeration systems, where the maximum expected temperature and LiBr mass concentration in solution are usually 95 ℃ and 65%, respectively. Unfortunately, published data on the electrical conductivity of aqueous lithium–bromide solution are few and contradictory. The objective of this paper is to develop an empirical equation for determining the concentration of the aqueous lithium–bromide solution during the operation of the vapour absorption refrigeration system when the electrical conductivity and temperature of the solution are known. The present study experimentally investigated the electrical conductivity of aqueous lithium–bromide solution at temperatures in the range from 25 ℃ to 95 ℃ and concentrations in the range from 45% to 65% by mass, using a submersion toroidal conductivity sensor connected to a conductivity meter. The results of the tests have shown this method to be an accurate and efficient way to determine the concentration of aqueous lithium–bromide solution in the vapour absorption refrigeration system. Full article

331 KiB

Open Access Article

Learning Parsimonious Classification Rules from Gene Expression Data Using Bayesian Networks with Local Structure

by Jonathan Lyle Lustgarten, Jeya Balaji Balasubramanian, Shyam Visweswaran and Vanathi Gopalakrishnan

Data 2017, 2(1), 5; https://doi.org/10.3390/data2010005 - 18 Jan 2017

Cited by 5 | Viewed by 6551

Abstract

The comprehensibility of good predictive models learned from high-dimensional gene expression data is attractive because it can lead to biomarker discovery. Several good classifiers provide comparable predictive performance but differ in their abilities to summarize the observed data. We extend a Bayesian Rule Learning (BRL-GSS) algorithm, previously shown to be a significantly better predictor than other classical approaches in this domain. It searches a space of Bayesian networks using a decision tree representation of its parameters with global constraints, and infers a set of IF-THEN rules. The number of parameters and therefore the number of rules are combinatorial in the number of predictor variables in the model. We relax these global constraints to learn a more expressive local structure with BRL-LSS. BRL-LSS entails a more parsimonious set of rules because it does not have to generate all combinatorial rules. The search space of local structures is much richer than the space of global structures. We design the BRL-LSS with the same worst-case time-complexity as BRL-GSS while exploring a richer and more complex model space. We measure predictive performance using Area Under the ROC curve (AUC) and Accuracy. We measure model parsimony performance by noting the average number of rules and variables needed to describe the observed data. We evaluate the predictive and parsimony performance of BRL-GSS, BRL-LSS and the state-of-the-art C4.5 decision tree algorithm, across 10-fold cross-validation using ten microarray gene-expression diagnostic datasets. In these experiments, we observe that BRL-LSS is similar to BRL-GSS in terms of predictive performance, while generating a much more parsimonious set of rules to explain the same observed data. BRL-LSS also needs fewer variables than C4.5 to explain the data with similar predictive performance. We also conduct a feasibility study to demonstrate the general applicability of our BRL methods on the newer RNA sequencing gene-expression data. Full article

(This article belongs to the Special Issue Biomedical Informatics)

156 KiB

Open Access Editorial

Acknowledgement to Reviewers of Data in 2016

by Data Editorial Office

Data 2017, 2(1), 4; https://doi.org/10.3390/data2010004 - 11 Jan 2017

Viewed by 2503

Abstract

The editors of Data would like to express their sincere gratitude to the following reviewers for assessing manuscripts in 2016.[...] Full article

7901 KiB

Open Access Data Descriptor

Scanned Image Data from 3D-Printed Specimens Using Fused Deposition Modeling

by Felix W. Baumann, Julian R. Eichhoff and Dieter Roller

Data 2017, 2(1), 3; https://doi.org/10.3390/data2010003 - 1 Jan 2017

Cited by 4 | Viewed by 6413

Abstract

This dataset provides high-resolution 2D scans of 3D printed test objects (dog-bone), derived from EN ISO 527-2:2012. The specimens are scanned in resolutions from 600 dpi to 4800 dpi utilising a Konica-Minolta bizHub 42 and Canon LiDE 210 scanner. The specimens are created to research the influence of the infill-pattern orientation and the print orientation on the geometrical fidelity and the structural strength. The specimens are printed on a MakerBot Replicator 2X 3D-printer using yellow (ABS 1.75 mm Yellow, REC, Moscow, Russia) and purple ABS plastic (ABS 1.75 mm Pink Lion&Fox, Hamburg, Germany). The dataset consists of at least one scan per specimen with the measured dimensional characteristics. For this, software is created and described within this work. Specimens from this dataset are either scanned on blank white paper or on white paper with blue millimetre marking. The printing experiment contains a number of failed prints. Specimens that did not fulfil the expected geometry are scanned separately and are of lower quality due to the inability to scan objects with a non-flat surface. For a number of the printed specimens, sensor data is acquired during the printing process. This dataset consists of 193 specimen scans in PNG format of 127 objects with unadjusted raw graphical data and a corresponding, annotated post-processed image. Annotated data includes the detected object, its geometrical characteristics and file information. Computer-extracted geometrical information is supplied for the images where automated geometrical feature extraction is possible. Full article

10074 KiB

Open Access Article

How to Make Sense of Team Sport Data: From Acquisition to Data Modeling and Research Aspects

by Manuel Stein, Halldór Janetzko, Daniel Seebacher, Alexander Jäger, Manuel Nagel, Jürgen Hölsch, Sven Kosub, Tobias Schreck, Daniel A. Keim and Michael Grossniklaus

Data 2017, 2(1), 2; https://doi.org/10.3390/data2010002 - 1 Jan 2017

Cited by 56 | Viewed by 26432

Abstract

Automatic and interactive data analysis is instrumental in making use of increasing amounts of complex data. Owing to novel sensor modalities, analysis of data generated in professional team sport leagues such as soccer, baseball, and basketball has recently attracted attention, with potentially high commercial and research interest. The analysis of team ball games can serve many goals, e.g., in coaching to understand effects of strategies and tactics, or to derive insights improving performance. Also, it is often crucial for trainers and analysts to understand why a certain movement of a player or groups of players happened, and what the respective influencing factors are. We consider team sport as group movement including collaboration and competition of individuals following specific rule sets. Analyzing team sports is a challenging problem as it involves joint understanding of heterogeneous data perspectives, including high-dimensional, video, and movement data, as well as considering team behavior and rules (constraints) given in the particular team sport. We identify important components of team sport data, exemplified by the soccer case, and explain how to analyze team sport data in general. We identify challenges arising when facing these data sets and we propose a multi-faceted view and analysis including pattern detection, context-aware analysis, and visual explanation. We also present applicable methods and technologies covering the heterogeneous aspects in team sport data. Full article

(This article belongs to the Special Issue Geospatial Data)

6863 KiB

Open Access Data Descriptor

Description of a Database Containing Wrist PPG Signals Recorded during Physical Exercise with Both Accelerometer and Gyroscope Measures of Motion

by Delaram Jarchi and Alexander J. Casson

Data 2017, 2(1), 1; https://doi.org/10.3390/data2010001 - 24 Dec 2016

Cited by 70 | Viewed by 19067

Abstract

Wearable heart rate sensors such as those found in smartwatches are commonly based upon Photoplethysmography (PPG) which shines a light into the wrist and measures the amount of light reflected back. This method works well for stationary subjects, but in exercise situations, PPG signals are heavily corrupted by motion artifacts. The presence of these artifacts necessitates the creation of signal processing algorithms for removing the motion interference and allowing the true heart related information to be extracted from the PPG trace during exercise. Here, we describe a new publicly available database of PPG signals collected during exercise for the creation and validation of signal processing algorithms extracting heart rate and heart rate variability from PPG signals. PPG signals from the wrist are recorded together with chest electrocardiography (ECG) to allow a reference/comparison heart rate to be found, and the temporal alignment between the two signal sets is estimated from the signal timestamps. The new database differs from previously available public databases because it includes wrist PPG recorded during walking, running, easy bike riding and hard bike riding. It also provides estimates of the wrist movement recorded using a 3-axis low-noise accelerometer, a 3-axis wide-range accelerometer, and a 3-axis gyroscope. The inclusion of gyroscopic information allows, for the first time, separation of acceleration due to gravity and acceleration due to true motion of the sensor. The hypothesis is that the improved motion information provided could assist in the development of algorithms with better PPG motion artifact removal performance. Full article

Data, Volume 2, Issue 1 (March 2017) – 11 articles
