Data, Volume 2, Issue 4 (December 2017) – 10 articles

Article
Investigating the Evolution of Linkage Dynamics among Equity Markets Using Network Models and Measures: The Case of Asian Equity Market Integration
by Biplab Bhattacharjee, Muhammad Shafi and Animesh Acharjee
Data 2017, 2(4), 41; https://doi.org/10.3390/data2040041 - 9 Dec 2017
Cited by 3 | Viewed by 4449
Abstract
The state of cross-market linkage structures and their stability over varying time periods play a key role in the performance of internationally diversified portfolios. Global investors have shown increasing interest in emerging capital markets in the Asian region. In this setting, an investigation into the temporal dynamics of cross-market linkage structures becomes significant for the selection and optimal allocation of securities in an internationally diversified portfolio. To this end, the current study employs weighted network models along with network metrics to decipher the underlying cross-market linkage structures among Asian markets. The study analyses the daily return data of fourteen major Asian indices for a period of 14 years (2002–2016). The topological properties of the network are computed using centrality measures and measures of influence strength and are investigated over temporal scales. In particular, the overall influence strengths and India-specific influence strengths are computed and examined over a temporal scale. Threshold filtering is also performed to characterize the dynamics related to the linkage structure of these networks. The impacts of the 2008 financial crisis on the linkage structural patterns of these equity networks are also investigated. The key findings of this study include: a set of central and peripheral indices, the evolution of the linkage structures over the 2002–2016 period and the linkage dynamics during times of market stress. The set of indices possessing influence over the Asian region in general and the Indian market in particular is also identified. The findings of this study can be utilized in effective systemic risk management and for the selection of an optimally diversified portfolio, resilient to system-level shocks.

Data Descriptor
GasLib—A Library of Gas Network Instances
by Martin Schmidt, Denis Aßmann, Robert Burlacu, Jesco Humpola, Imke Joormann, Nikolaos Kanelakis, Thorsten Koch, Djamal Oucherif, Marc E. Pfetsch, Lars Schewe, Robert Schwarz and Mathias Sirvent
Data 2017, 2(4), 40; https://doi.org/10.3390/data2040040 - 1 Dec 2017
Cited by 82 | Viewed by 9061
Abstract
The development of mathematical simulation and optimization models and algorithms for solving gas transport problems is an active field of research. In order to test and compare these models and algorithms, gas network instances together with demand data are needed. The goal of GasLib is to provide a set of publicly available gas network instances that can be used by researchers in the field of gas transport. The advantages are that researchers save time by using these instances and that different models and algorithms can be compared on the same specified test sets. The library instances are encoded in an XML (extensible markup language) format. In this paper, we explain this format and present the instances that are available in the library.

Article
Congestion Quantification Using the National Performance Management Research Data Set
by Virginia P. Sisiopiku and Shaghayegh Rostami-Hosuri
Data 2017, 2(4), 39; https://doi.org/10.3390/data2040039 - 25 Nov 2017
Cited by 6 | Viewed by 4939
Abstract
Monitoring of transportation system performance is a key element of any transportation operation and planning strategy. Estimation of dependable performance measures relies on analysis of large amounts of traffic data, which are often expensive and difficult to gather. National databases can assist in this regard, but challenges remain with respect to data management, accuracy, storage, and use for performance monitoring. In an effort to address such challenges, this paper showcases a process that utilizes the National Performance Management Research Data Set (NPMRDS) for generating performance measures for congestion monitoring applications in the Birmingham region. The capabilities of a relational database management system (RDBMS) are employed to manage the large amounts of NPMRDS data. Visual maps are developed using GIS software and used to illustrate congestion location, extent and severity. Travel time reliability indices are calculated and utilized to quantify congestion, and congestion intensity measures are developed and employed to rank and prioritize congested segments in the study area. The process for managing and using big traffic data described in the Birmingham case study offers an example that can be replicated by small and mid-size Metropolitan Planning Organizations to generate performance-based measures and monitor congestion in their jurisdictions.
(This article belongs to the Special Issue Transportation Data)

Data Descriptor
Antibody Exchange: Information Extraction of Biological Antibody Donation and a Web-Portal to Find Donors and Seekers
by Sandeep Subramanian and Madhavi K. Ganapathiraju
Data 2017, 2(4), 38; https://doi.org/10.3390/data2040038 - 21 Nov 2017
Cited by 3 | Viewed by 4472
Abstract
Bio-molecular reagents, such as antibodies, that are required in experimental biology are expensive, and their effectiveness, among other things, is critical to the success of the experiment. Although such resources are sometimes donated by one investigator to another through personal communication between the two, there is no previous study to our knowledge on the extent of such donations, nor a central platform that directs resource seekers to donors. In this paper, we describe, to our knowledge, a first attempt at building a web-portal titled Antibody Exchange (or, more generally, ‘Bio-Resource Exchange’) that attempts to bridge this gap between resource seekers and donors in the domain of experimental biology. Users on this portal can request or donate antibodies, cell-lines, and DNA constructs. This resource could also serve as a crowd-sourced database of resources for experimental biology. Further, we also studied the extent of antibody donations by mining the acknowledgement sections of scientific articles. Specifically, we extracted the name of the donor, his/her affiliation, and the name of the antibody for every donation by parsing the acknowledgements sections of articles. To extract annotations at this level, we adopted two approaches: a rule-based algorithm and a bootstrapped pattern learning algorithm. The algorithms extracted donor names, affiliations, and antibody names with average accuracies of 57% and 62%, respectively. We also created a dataset of 50 expert-annotated acknowledgements sections that will serve as a gold standard dataset to evaluate extraction algorithms in the future.

Article
Regionalization of a Landscape-Based Hazard Index of Malaria Transmission: An Example of the State of Amapá, Brazil
by Zhichao Li, Thibault Catry, Nadine Dessay, Helen Da Costa Gurgel, Cláudio Aparecido de Almeida, Christovam Barcellos and Emmanuel Roux
Data 2017, 2(4), 37; https://doi.org/10.3390/data2040037 - 2 Nov 2017
Cited by 3 | Viewed by 4183
Abstract
Identifying and assessing the relative effects of the numerous determinants of malaria transmission, at different spatial scales and resolutions, is of primary importance in defining control strategies and reaching the goal of the elimination of malaria. In this context, based on a knowledge-based model, a normalized landscape-based hazard index (NLHI) was established at a local scale, using a 10 m spatial resolution forest vs. non-forest map, landscape metrics and a spatial moving window. Such an index evaluates the contribution of landscape to the probability of human-malaria vector encounters, and thus to malaria transmission risk. Since the knowledge-based model is tailored to the entire Amazon region, such an index might be generalized at large scales for establishing a regional view of the landscape contribution to malaria transmission. Thus, this study uses an open large-scale land use and land cover dataset (i.e., the 30 m TerraClass maps) and proposes an automatic data-processing chain for implementing NLHI at large scale. First, the impact of coarser spatial resolution (i.e., 30 m) on NLHI values was studied. Second, the data-processing chain was established using the R language for customizing the spatial moving window and computing the landscape metrics and NLHI at large scale. This paper presents the results in the State of Amapá, Brazil. It offers the possibility of monitoring a significant determinant of malaria transmission at regional scale.

Data Descriptor
Database of Himalayan Plants Based on Published Floras during a Century
by Suresh Kumar Rana and Gopal Singh Rawat
Data 2017, 2(4), 36; https://doi.org/10.3390/data2040036 - 30 Oct 2017
Cited by 52 | Viewed by 11120
Abstract
The Himalaya is the largest mountain range in the world, spanning approximately ten degrees of latitude and elevations from 100 m asl to the highest mountain peak on Earth. The region varies in plant species richness, being highest in the biodiversity hotspot of Eastern Himalaya and declining towards the north-western parts of the Himalaya. We examined all published floras (31 floras in 42 volumes spanning the years 1903–2014) from the Indian Himalayan region, Nepal, and Bhutan to compile a comprehensive checklist of all gymnosperms and angiosperms. A total of 10,503 species representing 240 families and 2322 genera are reported. We evaluated all the botanical names reported in the floras for their updated taxonomy and excluded >3000 synonyms. Additionally, we identified 1134 species reported in these floras that presently remain taxonomically unresolved and 160 species with missing information in the global plant database (The Plant List, 2013). This is the most comprehensive estimate of plant species diversity in the Himalaya.
(This article belongs to the Special Issue Biodiversity and Species Traits)

Article
Earth Observation for Citizen Science Validation, or Citizen Science for Earth Observation Validation? The Role of Quality Assurance of Volunteered Observations
by Didier G. Leibovici, Jamie Williams, Julian F. Rosser, Crona Hodges, Colin Chapman, Chris Higgins and Mike J. Jackson
Data 2017, 2(4), 35; https://doi.org/10.3390/data2040035 - 23 Oct 2017
Cited by 6 | Viewed by 6542
Abstract
Environmental policy involving citizen science (CS) is of growing interest. In support of this open data stream of information, validation or quality assessment of the CS geo-located data for their appropriate usage in evidence-based policy making needs a flexible and easily adaptable data curation process that ensures transparency. Addressing these needs, this paper describes an approach for automatic quality assurance as proposed by the Citizen OBservatory WEB (COBWEB) FP7 project. This approach is based upon a workflow composition that combines different quality controls, each belonging to one of seven categories or “pillars”. Each pillar focuses on a specific dimension in the types of reasoning algorithms for CS data qualification. These pillars attribute values to a range of quality elements belonging to three complementary quality models. Additional data from various sources, such as Earth Observation (EO) data, are often included as part of the inputs of quality controls within the pillars. However, qualified CS data can also contribute to the validation of EO data. Therefore, the question of validation can be considered as “two sides of the same coin”. Based on an invasive species CS study, concerning Fallopia japonica (Japanese knotweed), the paper discusses the flexibility and usefulness of qualifying CS data, either when using an EO data product for the validation within the quality assurance process, or validating an EO data product that describes the risk of occurrence of the plant. Both validation paths are found to be improved by quality assurance of the CS data. To address the reliability of CS open data, the paper describes issues and limitations of the role of quality assurance for validation that arise from the quality of secondary data used within the automatic workflow (e.g., error propagation), paving the way for improvements in the approach.
(This article belongs to the Special Issue Open Data and Robust & Reliable GIScience)

Data Descriptor
The #BTW17 Twitter Dataset–Recorded Tweets of the Federal Election Campaigns of 2017 for the 19th German Bundestag
by Nane Kratzke
Data 2017, 2(4), 34; https://doi.org/10.3390/data2040034 - 20 Oct 2017
Cited by 8 | Viewed by 9718
Abstract
The German Bundestag elections are the most important elections in Germany. This dataset comprises Twitter interactions related to German politicians of the most important political parties over several months in the (pre-)phase of the German federal election campaigns in 2017. The Twitter accounts of more than 360 politicians were followed for four months. The collected data comprise a sample of approximately 10 GB of Twitter raw data, and they cover more than 120,000 active Twitter users and more than 1,200,000 recorded tweets. Even without sophisticated data analysis techniques, it was possible to deduce a likely political party proximity for more than half of these accounts simply by looking at the re-tweet behavior. This might be of interest for innovative data-driven party campaign strategists in the future. Furthermore, it is observable that, in Germany, supporters and politicians of populist parties make use of Twitter much more intensively and aggressively than supporters of other parties. In addition, established left-wing parties seem to be more active on Twitter than established conservative parties. The dataset can be used to study how political parties, their followers and supporters make use of social media channels in political election campaigns and what kind of content is shared.

Article
Temporal Statistical Analysis of Degree Distributions in an Undirected Landline Phone Call Network Graph Series
by Orgeta Gjermëni
Data 2017, 2(4), 33; https://doi.org/10.3390/data2040033 - 9 Oct 2017
Cited by 2 | Viewed by 4093
Abstract
This article aims to provide new results about the intraday degree sequence distribution considering phone call network graph evolution in time. More specifically, it tackles the following problem. Given a large amount of landline phone call data records, what is the best way to summarize the distinct number of calling partners per client per day? In order to answer this question, a series of undirected phone call network graphs is constructed based on data from a local telecommunication source in Albania. All network graphs of the series are simplified. Further, a longitudinal temporal study of the degree distributions is carried out on this network graph series. Power law and log-normal distribution fittings on the degree sequence are compared on each of the network graphs of the series. The maximum likelihood method is used to estimate the parameters of the distributions, and a Kolmogorov–Smirnov test associated with a p-value is used to define the plausible models. A direct distribution comparison is made through a Vuong test in the case that both distributions are plausible. Another goal was to describe the shape of the parameters’ distributions. A Shapiro–Wilk test is used to test the normality of the data, and measures of shape are used to define the distributions’ shape. The findings suggest that the log-normal distribution is the better model of the intraday degree sequence data of the network graphs. It is not possible to say that the distributions of log-normal parameters are normal.

Data Descriptor
Wi-Fi Crowdsourced Fingerprinting Dataset for Indoor Positioning
by Elena Simona Lohan, Joaquín Torres-Sospedra, Helena Leppäkoski, Philipp Richter, Zhe Peng and Joaquín Huerta
Data 2017, 2(4), 32; https://doi.org/10.3390/data2040032 - 3 Oct 2017
Cited by 139 | Viewed by 14713
Abstract
Benchmark open-source Wi-Fi fingerprinting datasets for indoor positioning studies are still hard to find in the current literature and existing public repositories. This is unlike other research fields, such as the image processing field, where benchmark test images such as the Lenna image or Face Recognition Technology (FERET) databases exist, or the machine learning field, where huge datasets are available, for example, at the University of California Irvine (UCI) Machine Learning Repository. It is the purpose of this paper to present a new openly available Wi-Fi fingerprint dataset, comprising 4648 fingerprints collected with 21 devices in a university building in Tampere, Finland, and to present some benchmark indoor positioning results using these data. The datasets and the benchmarking software are distributed under the open-source MIT license and can be found on the EU Zenodo repository.