Table of Contents

Data, Volume 3, Issue 2 (June 2018)

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view a paper in PDF format, click its "PDF Full-text" link and open it with the free Adobe Reader.
Displaying articles 1-12
Open AccessData Descriptor Taguchi Orthogonal Array Dataset for the Effect of Water Chemistry on Aggregation of ZnO Nanoparticles
Received: 29 May 2018 / Revised: 12 June 2018 / Accepted: 13 June 2018 / Published: 15 June 2018
PDF Full-text (1547 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
The dynamic nature of engineered nanoparticle (ENP) aggregation behavior and kinetics is of paramount importance in the field of toxicological and environmental nanotechnology. The Taguchi orthogonal array (OA) L27(3^13) matrix, based on a fractional factorial design, was applied to systematically evaluate the contribution and significance of water chemistry parameters (pH, temperature, electrolyte, natural organic matter (NOM) content and type) and their interactions in the aggregation behavior of zinc oxide nanoparticles (ZnO NPs). The NPs were dispersed into the solution using a probe-sonicator cell crusher (Bio-safer, 1200-90, Nanjing, China). The data were obtained from UV–Vis spectroscopy (Optizen 2120 UV, Mecasys, Daejeon, Korea), Fourier transform infrared spectroscopy (FT-IR 4700, JASCO Analytical Instruments, Easton, Pennsylvania, USA), and particle electrophoresis (NanoZS, Zetasizer, Malvern Instruments Ltd., Worcestershire, UK). The dataset revealed that the Taguchi OA matrix is an efficient approach to study the main and interactive effects of environmental parameters on the aggregation of ZnO NPs. In addition, the aggregation profile of ZnO NPs was significantly influenced by divalent cations and NOM. The FT-IR data suggest a possible mechanism of ZnO NP stabilization in the presence of different NOM. These data may be helpful to predict the aggregation behavior of ZnO NPs in environmental and ecotoxicological contexts. Full article

Open AccessArticle Interactive Data Framework and User Interface for Wisconsin’s Oversize-Overweight Vehicle Permits
Received: 13 April 2018 / Revised: 7 May 2018 / Accepted: 13 June 2018 / Published: 15 June 2018
PDF Full-text (4248 KB) | HTML Full-text | XML Full-text
Abstract
With continuing increases in the number of Oversize-Overweight (OSOW) vehicle permits issued in recent years, the management and analysis of OSOW permit data is becoming more inefficient and time-consuming. Large quantities of archived OSOW permit data are held by Departments of Transportation (DOTs) across the United States, and manual extraction and analysis of this data requires significant effort. In this paper, the authors present a new framework for analyzing Wisconsin’s historic OSOW permit program data. This framework provides an interactive, web-based interface to query the OSOW permit data, link OSOW records to geospatial data features, and dynamically visualize query results. The web-based interface offers scalability and broad accessibility to the data across different DOT divisions, and use cases. Furthermore, a user survey and heuristic evaluation of the interface demonstrate the project’s utility, and identify goals for future system development. Full article

Open AccessArticle UAT ADS-B Data Anomalies and the Effect of Flight Parameters on Dropout Occurrences
Received: 11 April 2018 / Revised: 23 May 2018 / Accepted: 5 June 2018 / Published: 8 June 2018
PDF Full-text (3568 KB) | HTML Full-text | XML Full-text
Abstract
This study analyzed the performance of automatic dependent surveillance-broadcast (ADS-B) data received from the Grand Forks, North Dakota International Airport. The purpose was to understand the vulnerabilities of the universal access transceiver (UAT) ADS-B system and recognize the effects on present and future air traffic control (ATC) operation. The Federal Aviation Administration (FAA) has mandated that all general aviation aircraft be equipped with ADS-B. Aircraft flying within the United States and below the transition altitude (18,000 feet) are more likely to install a UAT ADS-B. At present, unmanned aircraft systems (UAS) and autonomous ATC towers are being integrated into the aviation industry, and UAT ADS-B is a basic sensor for both class 1 and class 2 detect-and-avoid (DAA) systems. As a fundamental component of future surveillance systems, the anomalies and vulnerabilities of the ADS-B system need to be identified to enable a fully utilized airspace with enhanced situational awareness. The data received were archived in GDL-90 format and parsed into readable data. The anomaly detection of ADS-B messages was based on the FAA ADS-B performance assessment report. The data investigation revealed that ADS-B messages suffered from different anomalies, including dropout, missing payload, data jump, low confidence data, and altitude discrepancy. Among those studied, the most severe was dropout, which affected 32.49% of messages. Dropout is an incident in which ADS-B fails to update at the specified rate. Considering the potential danger, an in-depth analysis was carried out to characterize message dropout. Three flight parameters were selected to investigate their effect on dropout. Statistical analysis was carried out, and the Friedman statistical test identified that altitude affected dropout more than any other flight parameter. Full article
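The dropout definition in this abstract (a failure to update at the specified rate) can be illustrated with a minimal sketch. It assumes that a dropout is any gap between consecutive message timestamps from one aircraft that exceeds a caller-supplied threshold. The function names and the `max_gap_s` parameter are hypothetical, and the abstract does not state the threshold or the exact counting rule the authors used.

```python
from typing import List, Tuple


def find_dropouts(timestamps_s: List[float], max_gap_s: float) -> List[Tuple[float, float]]:
    """Return (previous, next) timestamp pairs whose spacing exceeds max_gap_s.

    Assumption: timestamps are in seconds and belong to a single aircraft.
    """
    ordered = sorted(timestamps_s)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap_s]


def dropout_fraction(timestamps_s: List[float], max_gap_s: float) -> float:
    """Fraction of consecutive-message intervals that count as dropouts."""
    if len(timestamps_s) < 2:
        return 0.0
    return len(find_dropouts(timestamps_s, max_gap_s)) / (len(timestamps_s) - 1)
```

For messages received at 0, 1, 2, 6, and 7 seconds with `max_gap_s=3.0`, the sketch reports one dropout, between 2 and 6 seconds. This interval-based fraction is only an illustration and is not necessarily the metric behind the 32.49% figure reported above.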

Open AccessArticle Improving the Efficiency of the ERS Data Analysis Techniques by Taking into Account the Neighborhood Descriptors
Received: 20 April 2018 / Revised: 28 May 2018 / Accepted: 29 May 2018 / Published: 30 May 2018
PDF Full-text (1557 KB) | HTML Full-text | XML Full-text
Abstract
Planning based on reliable information about the Earth’s surface is an important approach to minimize economic expenses conditioned by natural factors. Data collected by Earth remote sensing (ERS), as well as the analysis of such data using automated classification methods, are becoming more and more important for research and practice activities related to assessing the spatio-temporal structure and sustainability of the Earth’s surface. The analysis of the authenticity of the surrounding areas enables a more objective classification of land plots on the basis of spatial patterns. Combined use of various environmental descriptors enables high-quality handling of neighborhood properties, as each descriptor provides its own specific information about a geospatial system. Experiments have shown that the diagnostics of the emergent properties of such internal structure by analyzing the diversity of dynamic characteristics allows reducing exposure to noise, obtaining a generalized result, and improving the classification accuracy. Full article
(This article belongs to the Special Issue Data in Astrophysics & Geophysics: Research and Applications)

Open AccessData Descriptor Benthic Macroinvertebrate Diversity in the Middle Doce River Basin, Brazil
Received: 11 May 2018 / Revised: 18 May 2018 / Accepted: 21 May 2018 / Published: 22 May 2018
PDF Full-text (1511 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
This resource contains a checklist of the benthic macroinvertebrate community sampled biannually from 1999 to 2010 in eight natural lakes from the middle Rio Doce Valley lake system and eight river segments in the Piracicaba River basin (a sub-basin of the Doce River), Minas Gerais State, Brazil. Three of the lakes are located inside a protected state park and are surrounded by preserved vegetation (Atlantic Forest). The other five lakes are on private properties, surrounded by Eucalyptus plantations. The seven stretches of rivers have distinct degrees of anthropogenic impact. Samples were collected with a kick net and fixed with formaldehyde solution. Four phyla were represented: Mollusca, Annelida, Arthropoda, and Platyhelminthes. For Insecta, 76 families were identified, one family was identified for Crustacea, and nine families were identified for Mollusca. This subproject belongs to the International Long-Term Ecological Research Project (ILTER—Programa de Pesquisas Ecológicas de Longa Duração—PELD) site 4. Full article

Open AccessData Descriptor Plant Trait Dataset for Tree-Like Growth Forms Species of the Subtropical Atlantic Rain Forest in Brazil
Received: 18 April 2018 / Revised: 4 May 2018 / Accepted: 6 May 2018 / Published: 8 May 2018
PDF Full-text (3215 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
Plant functional traits have been incorporated in studies of vegetation ecology to better understand the mechanisms of ecological processes. For this reason, a global effort has been made to collect functional trait data for as many species as possible. In light of this, we identified the most common species of an area of 15,335 km² within the subtropical Atlantic Rain Forest in Southern Brazil. Then, we compiled functional trait information mostly from field samples, but also from herbaria and literature. The dataset presents traits of leaf, branch, maximum potential height, seed mass, and dispersion syndrome of 117 species, including trees, tree ferns, and palms. We also share images of anatomical features of branches used to measure wood traits. Data tables present mean trait values at individual and species level. Images of wood and stomatal features may be useful to assess other anatomical traits that were not covered in the data tables, for the anatomical determination of species, and/or for educational purposes. Full article

Open AccessData Descriptor Datasets for Aspect-Based Sentiment Analysis in Bangla and Its Baseline Evaluation
Received: 20 March 2018 / Revised: 30 April 2018 / Accepted: 2 May 2018 / Published: 4 May 2018
PDF Full-text (1945 KB) | HTML Full-text | XML Full-text
Abstract
With the extensive growth of user interactions through prominent advances of the Web, sentiment analysis has obtained more focus from both an academic and a commercial point of view. Recently, sentiment analysis in the Bangla language has increasingly been considered an important task, for which previous approaches have attempted to detect the overall polarity of a Bangla document. To the best of our knowledge, there is no research on the aspect-based sentiment analysis (ABSA) of Bangla text. This can be attributed to the lack of available datasets for ABSA. In this paper, we provide two publicly available datasets to perform the ABSA task in Bangla. One of the datasets consists of human-annotated user comments on cricket, and the other dataset consists of customer reviews of restaurants. We also describe a baseline approach for the subtask of aspect category extraction to evaluate our datasets. Full article

Open AccessData Descriptor RetroTransformDB: A Dataset of Generic Transforms for Retrosynthetic Analysis
Received: 28 March 2018 / Revised: 16 April 2018 / Accepted: 19 April 2018 / Published: 21 April 2018
PDF Full-text (666 KB) | HTML Full-text | XML Full-text
Abstract
Presently, software tools for retrosynthetic analysis are widely used by organic, medicinal, and computational chemists. Rule-based systems extensively use collections of retro-reactions (transforms). While there are many public datasets with reactions in synthetic direction (usually non-generic reactions), there are no publicly-available databases with generic reactions in computer-readable format which can be used for the purposes of retrosynthetic analysis. Here we present RetroTransformDB—a dataset of transforms, compiled and coded in SMIRKS line notation by us. The collection is comprised of more than 100 records, with each one including the reaction name, SMIRKS linear notation, the functional group to be obtained, and the transform type classification. All SMIRKS transforms were tested syntactically, semantically, and from a chemical point of view in different software platforms. The overall dataset design and the retrosynthetic fitness were analyzed and curated by organic chemistry experts. The RetroTransformDB dataset may be used by open-source and commercial software packages, as well as chemoinformatics tools. Full article

Open AccessData Descriptor Sigfox and LoRaWAN Datasets for Fingerprint Localization in Large Urban and Rural Areas
Received: 16 March 2018 / Revised: 5 April 2018 / Accepted: 5 April 2018 / Published: 10 April 2018
PDF Full-text (2875 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
Because of the increasing relevance of the Internet of Things and location-based services, researchers are evaluating wireless positioning techniques, such as fingerprinting, on Low Power Wide Area Network (LPWAN) communication. In order to evaluate fingerprinting in large outdoor environments, extensive, time-consuming measurement campaigns need to be conducted to create useful datasets. This paper presents three LPWAN datasets which are collected in large-scale urban and rural areas. The goal is to provide the research community with a tool to evaluate fingerprinting algorithms in large outdoor environments. During a period of three months, numerous mobile devices periodically obtained location data via a GPS receiver which was transmitted via a Sigfox or LoRaWAN message. Together with network information, this location data is stored in the appropriate LPWAN dataset. The first results of our basic fingerprinting implementation, which is also clarified in this paper, indicate a mean location estimation error of 214.58 m for the rural Sigfox dataset, 688.97 m for the urban Sigfox dataset and 398.40 m for the urban LoRaWAN dataset. In the future, we will enlarge our current datasets and use them to evaluate and optimize our fingerprinting methods. Also, we intend to collect additional datasets for Sigfox, LoRaWAN and NB-IoT. Full article
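As an illustration of how fingerprinting and the mean location estimation error fit together, the following sketch uses k-nearest-neighbours matching on signal-strength vectors. This is an assumption: the abstract does not say which algorithm the authors' basic implementation uses. The function names, the `(rssi_vector, (lat, lon))` tuple layout, and the default `k=3` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Fingerprint = Tuple[Sequence[float], Tuple[float, float]]  # (RSSI per gateway, (lat, lon))


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres on a spherical Earth (radius 6,371 km)."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def knn_locate(query_rssi: Sequence[float], training: List[Fingerprint], k: int = 3) -> Tuple[float, float]:
    """Average the positions of the k training fingerprints closest in RSSI space."""
    nearest = sorted(training, key=lambda fp: math.dist(query_rssi, fp[0]))[:k]
    lat = sum(fp[1][0] for fp in nearest) / len(nearest)
    lon = sum(fp[1][1] for fp in nearest) / len(nearest)
    return lat, lon


def mean_error_m(estimates: List[Tuple[float, float]], truths: List[Tuple[float, float]]) -> float:
    """Mean distance in metres between estimated and GPS ground-truth positions."""
    errors = [haversine_m(e[0], e[1], t[0], t[1]) for e, t in zip(estimates, truths)]
    return sum(errors) / len(errors)
```

Averaging latitude and longitude directly is a simplification that is reasonable only when the neighbours are close together, and `math.dist` requires Python 3.8 or later.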

Open AccessArticle Comparison between Simulation and Analytical Methods in Reliability Data Analysis: A Case Study on Face Drilling Rigs
Received: 16 December 2017 / Revised: 17 March 2018 / Accepted: 22 March 2018 / Published: 10 April 2018
PDF Full-text (6220 KB) | HTML Full-text | XML Full-text
Abstract
Collecting failure data and performing reliability analysis in an underground mining operation is challenging due to the harsh environment and high level of production pressure. Therefore, achieving an accurate, fast, and applicable analysis in a fleet of underground equipment is usually difficult and time consuming. This paper aims to discuss the main reliability analysis challenges in mining machinery by comparing three main approaches: two analytical methods (white-box and black-box modeling), and a simulation approach. For this purpose, the maintenance data from a fleet of face drilling rigs in a Swedish underground metal mine were extracted from the MAXIMO system over a period of two years and were applied for analysis. The investigations reveal that these approaches perform differently in ranking the studied machines and estimating their reliability. All of the methods provide similar outputs but, in general, the simulation estimates the reliability of the studied machines at a higher level. The simulation and white-box method sometimes provide exactly the same results, which is caused by their similar structure of analysis. On average, 9% of the data are missed in the white-box analysis due to a lack of sufficient data in some of the subsystems of the studied rigs. Full article

Open AccessData Descriptor SIMADL: Simulated Activities of Daily Living Dataset
Received: 1 March 2018 / Revised: 29 March 2018 / Accepted: 30 March 2018 / Published: 1 April 2018
PDF Full-text (810 KB) | HTML Full-text | XML Full-text
Abstract
With the realisation of the Internet of Things (IoT) paradigm, the analysis of the Activities of Daily Living (ADLs) in a smart home environment is becoming an active research domain. The existence of representative datasets is a key requirement to advance the research in smart home design. Such datasets are an integral part of the visualisation of new smart home concepts as well as the validation and evaluation of emerging machine learning models. Machine learning techniques that can learn ADLs from sensor readings are used to classify, predict and detect anomalous patterns. Such techniques require data that represent relevant smart home scenarios for training, testing and validation. However, the development of such machine learning techniques is limited by the lack of real smart home datasets, due to the excessive cost of building real smart homes. This paper provides two datasets for classification and anomaly detection. The datasets are generated using OpenSHS (Open Smart Home Simulator), a simulation tool for dataset generation. OpenSHS records the daily activities of a participant within a virtual environment. Seven participants simulated their ADLs for different contexts, e.g., weekdays, weekends, mornings and evenings. Eighty-four files in total were generated, representing approximately 63 days' worth of activities. Forty-two files of classification of ADLs were simulated in the classification dataset, and the other forty-two files are for anomaly detection problems, in which anomalous patterns were simulated and injected into the anomaly detection dataset. Full article

Open AccessArticle Associative Root–Pattern Data and Distribution in Arabic Morphology
Received: 28 January 2018 / Revised: 18 March 2018 / Accepted: 26 March 2018 / Published: 29 March 2018
PDF Full-text (1052 KB) | HTML Full-text | XML Full-text
Abstract
This paper presents a large-scale dataset for Arabic morphology from a cognitive point of view, considering the uniqueness of the root–pattern phenomenon. The focus is on studying this singularity in terms of estimating associative relationships between roots, as a higher level of abstraction for word meaning, and all their potential occurrences with multiple morpho-phonetic patterns. A major advantage of this approach resides in providing a novel balanced large-scale language resource, which can be viewed as an instantiated global root–pattern network consisting of roots, patterns, stems, and particles, estimated statistically for studying the morpho-phonetic level of cognition of Arabic. In this context, this paper asserts that balanced root distribution is an additional significant key criterion for evaluating topic coverage in an Arabic corpus. Furthermore, some additional novel probabilistic morpho-phonetic measures and their distribution have been estimated in the form of root and pattern entropies, besides bi-directional conditional probabilities of bi-grams of stems, roots, and particles. Around 29.2 million webpages of ClueWeb were extracted, filtered to remove non-Arabic text, and converted into a large textual dataset containing around 11.5 billion word forms and 9.3 million associative relationships. As this dataset predominantly considers the root–pattern phenomenon in Semitic languages, the acquired data might provide significant support for researchers interested in studying phenomena of Arabic such as visual word cognition, morpho-phonetic perception, morphological analysis, and cognitively motivated query expansion, spell-checking, and information retrieval. Furthermore, based on data distribution and frequencies, constructing balanced corpora will be easier. Full article
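The entropy and conditional-probability measures mentioned in this abstract can be sketched as follows, assuming the standard Shannon entropy over observed frequencies and a plain relative-frequency estimate for the conditional probability. The abstract does not give the authors' exact formulas, so both function names and the input layout are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Tuple


def shannon_entropy(items: Iterable[Hashable]) -> float:
    """Shannon entropy in bits of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_probability(pairs: Iterable[Tuple[Hashable, Hashable]],
                            given: Hashable, target: Hashable) -> float:
    """Estimate P(second == target | first == given) from (first, second) pairs."""
    pairs = list(pairs)
    given_count = sum(1 for first, _ in pairs if first == given)
    if given_count == 0:
        return 0.0
    joint = sum(1 for first, second in pairs if first == given and second == target)
    return joint / given_count
```

Applied to a list of (root, pattern) pairs, `shannon_entropy` over the patterns seen with one root would give a pattern entropy for that root, and swapping the pair order gives the probability in the other direction.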