Data, Volume 8, Issue 2 (February 2023) – 23 articles

Cover Story: Invasion by alien plant species such as Acacia spp. can significantly impact tropical forest ecosystems, although quantifications of nutrient fluxes for invaded lowland tropical rain forests in aseasonal climates remain understudied. We present a comprehensive dataset from a year-long study of litterfall production and leaf litter decomposition rates in two distinct tropical lowland forests affected by Acacia invasion located in Brunei Darussalam, Borneo. Our dataset is crucial in assessing the long-term impact of Acacia invasion on these high conservation value tropical forests and improves understanding of nutrient cycling and ecosystem processes in tropical forest ecosystems.
13 pages, 7719 KB  
Data Descriptor
Deep Learning with Northern Australian Savanna Tree Species: A Novel Dataset
by Andrew J. Jansen, Jaylen D. Nicholson, Andrew Esparon, Timothy Whiteside, Michael Welch, Matthew Tunstill, Harinandanan Paramjyothi, Varma Gadhiraju, Steve van Bodegraven and Renee E. Bartolo
Data 2023, 8(2), 44; https://doi.org/10.3390/data8020044 - 20 Feb 2023
Cited by 2 | Viewed by 3252
Abstract
The classification of savanna woodland tree species from high-resolution Remotely Piloted Aircraft Systems (RPAS) imagery is a complex and challenging task. Difficulties for both traditional remote sensing algorithms and human observers arise due to low interspecies variability (species difficult to discriminate because they are morphologically similar) and high intraspecies variability (individuals of the same species varying to the extent that they can be misclassified), and the loss of some taxonomic features commonly used for identification when observing trees from above. Deep neural networks are increasingly being used to overcome challenges in image recognition tasks. However, supervised deep learning algorithms require high-quality annotated and labelled training data that must be verified by subject matter experts. While training datasets for trees have been generated and made publicly available, they are mostly acquired in the Northern Hemisphere and lack species-level information. We present a training dataset of tropical Northern Australia savanna woodland tree species that was generated using RPAS and on-ground surveys to confirm species labels. RPAS-derived imagery was annotated, resulting in 2547 polygons representing 36 tree species. A baseline dataset was produced consisting of: (i) seven orthomosaics that were used for in-field labelling; (ii) a tiled dataset at 1024 × 1024 pixel size in Common Objects in Context (COCO) format that can be used for deep learning model training; and (iii) the annotations.
(This article belongs to the Topic Methods for Data Labelling for Intelligent Systems)

16 pages, 1573 KB  
Article
Federated Learning for Data Analytics in Education
by Christian Fachola, Agustín Tornaría, Paola Bermolen, Germán Capdehourat, Lorena Etcheverry and María Inés Fariello
Data 2023, 8(2), 43; https://doi.org/10.3390/data8020043 - 20 Feb 2023
Cited by 26 | Viewed by 8253
Abstract
Federated learning techniques aim to train and build machine learning models based on distributed datasets across multiple devices while avoiding data leakage. The main idea is to perform training on remote devices or isolated data centers without transferring data to centralized repositories, thus mitigating privacy risks. Data analytics in education, in particular learning analytics, is a promising scenario in which to apply this approach to address the legal and ethical issues related to processing sensitive data. Indeed, given the nature of the data to be studied (personal data, educational outcomes, and data concerning minors), it is essential to ensure that the conduct of these studies and the publication of the results provide the necessary guarantees to protect the privacy of the individuals involved and their data. In addition, the application of quantitative techniques based on the exploitation of data on the use of educational platforms, student performance, use of devices, etc., can account for educational problems such as the determination of user profiles, personalized learning trajectories, or early dropout indicators and alerts, among others. This paper presents the application of federated learning techniques to a well-known learning analytics problem: student dropout prediction. The experiments allow us to conclude that the proposed solutions achieve performance comparable to that of the centralized versions while avoiding the concentration of all the data in a single place for training the models.

17 pages, 6054 KB  
Data Descriptor
Dataset of Public Objects in Uncontrolled Environment for Navigation Aiding
by Teng-Lai Wong, Ka-Seng Chou, Kei-Long Wong and Su-Kit Tang
Data 2023, 8(2), 42; https://doi.org/10.3390/data8020042 - 20 Feb 2023
Cited by 3 | Viewed by 3044
Abstract
Computer vision is a new approach to navigation aiding that helps visually impaired people travel independently. A deep learning-based solution implemented on a portable device that uses a monocular camera to capture public objects could be a low-cost and handy navigation aid. By recognizing public objects in the street and estimating their distance from the user, visually impaired people are able to avoid obstacles in the outdoor environment and walk safely. In this paper, we created a dataset of public objects in an uncontrolled environment for navigation aiding. The dataset contains three classes of objects which commonly exist on pavements in the city. It was verified that the dataset was of high quality for object detection and distance estimation, and was ultimately utilized as a navigation aid solution.

12 pages, 832 KB  
Data Descriptor
VPAgs-Dataset4ML: A Dataset to Predict Viral Protective Antigens for Machine Learning-Based Reverse Vaccinology
by Zakia Salod and Ozayr Mahomed
Data 2023, 8(2), 41; https://doi.org/10.3390/data8020041 - 17 Feb 2023
Cited by 1 | Viewed by 3543
Abstract
Reverse vaccinology (RV) is a computer-aided approach for vaccine development that identifies a subset of pathogen proteins as protective antigens (PAgs) or potential vaccine candidates. Machine learning (ML)-based RV is promising, but requires a dataset of PAgs (positives) and non-protective protein sequences (negatives). This study aimed to create an ML dataset, VPAgs-Dataset4ML, to predict viral PAgs based on PAgs obtained from Protegen. We performed seven steps to identify PAgs from the Protegen website and non-protective protein sequences from Universal Protein Resource (UniProt). The seven steps included downloading viral PAgs from Protegen, performing quality checks on PAgs using the standard BLASTp identity check ≤30% via MMseqs2, and computational steps running on Google Colaboratory and the Ubuntu terminal to retrieve and perform quality checks (similar to the PAgs) on non-protective protein sequences as negatives from UniProt. VPAgs-Dataset4ML contains 2145 viral protein sequences, with 210 PAgs in positive.fasta and 1935 non-protective protein sequences in negative.fasta. This dataset can be used to train ML models to predict antigens for various viral pathogens with the aim of developing effective vaccines.

9 pages, 2368 KB  
Data Descriptor
Whole-Slide Images and Patches of Clear Cell Renal Cell Carcinoma Tissue Sections Counterstained with Hoechst 33342, CD3, and CD8 Using Multiple Immunofluorescence
by Georg Wölflein, In Hwa Um, David J. Harrison and Ognjen Arandjelović
Data 2023, 8(2), 40; https://doi.org/10.3390/data8020040 - 15 Feb 2023
Viewed by 3203
Abstract
In recent years, there has been an increased effort to digitise whole-slide images of cancer tissue. This effort has opened up a range of new avenues for the application of deep learning in oncology. One such avenue is virtual staining, where a deep learning model is tasked with reproducing the appearance of stained tissue sections, conditioned on a different, often less expensive, input stain. However, data to train such models in a supervised manner where the input and output stains are aligned on the same tissue sections are scarce. In this work, we introduce a dataset of ten whole-slide images of clear cell renal cell carcinoma tissue sections counterstained with Hoechst 33342, CD3, and CD8 using multiple immunofluorescence. We also provide a set of over 600,000 patches of size 256 × 256 pixels extracted from these images together with cell segmentation masks in a format amenable to training deep learning models. It is our hope that this dataset will be used to further the development of deep learning methods for digital pathology by serving as a dataset for comparing and benchmarking virtual staining models.

7 pages, 7537 KB  
Data Descriptor
Multi-Year On-Farm Trial Data on the Performance of Long- and Short-Duration Wheat Varieties against Sowing Dates in the Eastern Indo-Gangetic Plain of India
by Anurag Ajay, Madhulika Singh, Subhajit Patra, Harshit Ranjan, Ajay Pundir, Shishpal Poonia, Anurag Kumar, Deepak K. Singh, Pankaj Kumar, Moben Ignatius, Prabhat Kumar, Sonam R. Sherpa, Ram K. Malik, Virender Kumar, Sudhanshu Singh, Peter Craufurd and Andrew J. McDonald
Data 2023, 8(2), 39; https://doi.org/10.3390/data8020039 - 10 Feb 2023
Viewed by 4818
Abstract
Sub-optimal wheat productivity in the eastern Indo-Gangetic plain of India can largely be attributed to delayed sowing and the use of short-duration varieties. The second week of November is the ideal time for sowing wheat in eastern India, though farmers generally plant later. Late-sowing farmers tend to prefer short-duration varieties, leading to an additional yield penalty. To validate the effect of timely sowing and the comparative performance of long- and short-duration varieties, multi-location on-farm trials were conducted continuously over five years starting from 2016–2017. Ten districts were selected to ensure that all the agro-climatic zones of the region were covered. There were five treatments of sowing windows: (T1) 1–10 November, (T2) 11–20 November, (T3) 21–30 November, (T4) 1–15 December, and (T5) 16–31 December. Varietal performance was compared in T3, T4, and T5, as short-duration varieties are normally sown after 20 November. There is asymmetry in the distribution of samples within treatments and over the years due to the allocation of fields by farmers. Altogether, the trial was conducted at 3735 sites and captured 61 variables, including yield and yield-attributing traits. Findings suggested that grain yields of long-duration wheat varieties are better even under late-sown scenarios.

12 pages, 13919 KB  
Data Descriptor
Datasets of Groundwater Level and Surface Water Budget in a Central Mediterranean Site (21 June 2017–1 October 2022)
by Marco Delle Rose and Paolo Martano
Data 2023, 8(2), 38; https://doi.org/10.3390/data8020038 - 9 Feb 2023
Cited by 4 | Viewed by 2896
Abstract
This note makes available five years of data gathered at a measurement site equipped with a micrometeorological station and two monitoring wells. Series of data of hydrological and atmospheric variables make it possible to estimate the flux of water across the atmosphere-land interface and to calculate the water budget, which are crucial topics in climate and environmental sciences. The water-table measurements began in 2017, one of the driest years of the whole instrumental period of climate history for the Central Mediterranean. Data from the micrometeorological station have been used to construct two more datasets of daily and monthly totals of different terms of the surface water budget, from which the net infiltration has been estimated. An apparent decreasing trend characterizes both the time series of groundwater level and that of estimated infiltration in the considered period.
(This article belongs to the Section Spatial Data Science and Digital Earth)

15 pages, 6941 KB  
Data Descriptor
Experimental Spectroscopic Data of SnO₂ Films and Powder
by Hawazin Alghamdi, Olasunbo Z. Farinre, Mathew L. Kelley, Adam J. Biacchi, Dipanjan Saha, Tehseen Adel, Kerry Siebein, Angela R. Hight Walker, Christina A. Hacker, Albert F. Rigosi and Prabhakar Misra
Data 2023, 8(2), 37; https://doi.org/10.3390/data8020037 - 9 Feb 2023
Cited by 3 | Viewed by 3143
Abstract
Powders and films composed of tin dioxide (SnO₂) are promising candidates for a variety of high-impact applications, and despite the material’s prevalence in such studies, it remains of high importance that commercially available materials meet the quality demands of the industries that these materials would most benefit. Imaging techniques, such as scanning electron microscopy (SEM) and atomic force microscopy (AFM), were used in conjunction with Raman spectroscopy and X-ray photoelectron spectroscopy (XPS) to assess the quality of a variety of samples, such as powder and thin film on quartz with thicknesses of 41 nm, 78 nm, 97 nm, 373 nm, and 908 nm. In this study, the dependencies of the corresponding Raman, XPS, and SEM analysis results on properties of the samples, such as the thickness and form (powder versus film), are determined. The outcomes achieved can be regarded as a guide for performing quality checks of such products, and as a reference to evaluate commercially available samples.
(This article belongs to the Section Chemoinformatics)

14 pages, 2082 KB  
Data Descriptor
A Global Multiscale SPEI Dataset under an Ensemble Approach
by Monia Santini, Sergio Noce, Marco Mancini and Luca Caporaso
Data 2023, 8(2), 36; https://doi.org/10.3390/data8020036 - 5 Feb 2023
Cited by 4 | Viewed by 6065
Abstract
A new multiscale Standardized Precipitation Evapotranspiration Index (SPEI) dataset is provided for a reference period (1960–1999) and two future time horizons (2040–2079 and 2060–2099). The historical forcing is based on combined climate observations and reanalysis (WATer and global CHange Forcing Dataset), and the future projections are fed by the Fast Track experiment of the Inter-Sectoral Impact Model Intercomparison Project under representative concentration pathways (RCPs) 4.5 and 8.5 and by an additional Earth system model (CMCC-CESM) forced by RCP 8.5. To calculate the potential evapotranspiration (PET) input to the SPEI, the Hargreaves–Samani and Thornthwaite equations were adopted. This ensemble considers uncertainty due to different climate models, development pathways, and input formulations. The SPEI is provided for accumulation periods of potential moisture deficit from 1 to 18 months starting in each month of the year, with a focus on the within-period variability, excluding long-term warming effects on PET. In addition to supporting drought analyses, this dataset is also useful for assessing wetter-than-normal conditions spanning one or more months. The SPEI was calculated using the SPEIbase package.
(This article belongs to the Section Spatial Data Science and Digital Earth)

12 pages, 1128 KB  
Article
Accuracy Assessment of Machine Learning Algorithms Used to Predict Breast Cancer
by Mohamed Ebrahim, Ahmed Ahmed Hesham Sedky and Saleh Mesbah
Data 2023, 8(2), 35; https://doi.org/10.3390/data8020035 - 2 Feb 2023
Cited by 43 | Viewed by 8637
Abstract
Machine learning (ML) was used to develop classification models to predict individual tumor patients’ outcomes. Binary classification defined whether the tumor was malignant or benign. This paper presents a comparative analysis of machine learning algorithms used for breast cancer prediction. This study used a dataset obtained from the National Cancer Institute (NIH), USA, which contains 1.7 million data records. Classical and deep learning methods were included in the accuracy assessment. Classical decision tree (DT), linear discriminant (LD), logistic regression (LR), support vector machine (SVM), and ensemble techniques (ET) algorithms were used. Probabilistic neural network (PNN), deep neural network (DNN), and recurrent neural network (RNN) methods were used for comparison. Feature selection and its effect on accuracy were also investigated. The results showed that decision trees and ensemble techniques outperformed the other techniques, as they both achieved a 98.7% accuracy.
(This article belongs to the Special Issue Artificial Intelligence and Big Data Applications in Diagnostics)

16 pages, 335 KB  
Article
Neural Coreference Resolution for Dutch Parliamentary Documents with the DutchParliament Dataset
by Ruben van Heusden, Jaap Kamps and Maarten Marx
Data 2023, 8(2), 34; https://doi.org/10.3390/data8020034 - 1 Feb 2023
Cited by 3 | Viewed by 2519
Abstract
The task of coreference resolution concerns the clustering of words and phrases referring to the same entity in text, either in the same document or across multiple documents. The task is challenging, as it concerns elements of named entity recognition and reading comprehension, as well as others. In this paper, we introduce DutchParliament, a new Dutch coreference resolution dataset obtained through the manual annotation of 74 government debates, expanded with a domain-specific class. In contrast to existing datasets, which are often composed of news articles, blogs or other documents, the debates in DutchParliament are transcriptions of speech, and therefore offer a unique structure and way of referencing compared to other datasets. By constructing and releasing this dataset, we hope to facilitate the research on coreference resolution in niche domains, with different characteristics than traditional datasets. The DutchParliament dataset was compared to SoNaR-1 and RiddleCoref, two other existing Dutch coreference resolution corpora, to highlight its particularities and differences from existing datasets. Furthermore, two coreference resolution models for Dutch, the rule-based DutchCoref model and the neural e2eDutch model, were evaluated on the DutchParliament dataset. It was found that the characteristics of the DutchParliament dataset are quite different from those of the other two datasets, although the performance of the e2eDutch model does not seem to be significantly affected by this. Furthermore, experiments were conducted by utilizing the metadata present in the DutchParliament corpus to improve the performance of the e2eDutch model. The results indicate that the addition of available metadata about speakers has a beneficial effect on the performance of the model, although the addition of the gender of speakers seems to have a limited effect.
(This article belongs to the Section Information Systems and Data Management)

8 pages, 1865 KB  
Data Descriptor
Volatiles Emitted by Three Genovese Basil Cultivars in Different Growing Systems and Successive Harvests
by Michele Ciriello, Luigi Formisano, Youssef Rouphael and Giandomenico Corrado
Data 2023, 8(2), 33; https://doi.org/10.3390/data8020033 - 31 Jan 2023
Cited by 2 | Viewed by 2261
Abstract
The Genovese basil (Ocimum basilicum L.) is the essential ingredient in “pesto” sauce, and it has always had ample use in Mediterranean gastronomy. This horticultural type of basil is grown in the open field and harvested more than once during its cultivation cycle, but in recent decades it has increasingly been grown using alternative cultivation methods (e.g., soilless cultivation) that guarantee higher and more uniform production. The dataset presented in this contribution refers to the analysis of the aroma profile, by solid-phase microextraction and gas chromatography coupled with a mass spectrometer, of three different cultivars of Genovese basil (Aroma 2, Eleonora, and Italiano Classico) grown in the open field or floating raft system in two successive harvests. The data are a record of the variability of volatile organic compounds due to key agronomic factors, such as the genotype, the cultivation method, and the cut. They may be of interest for those concerned about the impact of different technical factors on the aroma and flavor of basil plants.

24 pages, 2563 KB  
Review
LGCM and PLS-SEM in Panel Survey Data: A Systematic Review and Bibliometric Analysis
by Zulkifli Mohd Ghazali, Wan Fairos Wan Yaacob and Wan Marhaini Wan Omar
Data 2023, 8(2), 32; https://doi.org/10.3390/data8020032 - 30 Jan 2023
Cited by 11 | Viewed by 7279
Abstract
The application of the Latent Growth Curve Model (LGCM) and Partial Least Squares Structural Equation Modeling (PLS-SEM) has gained much attention in panel survey studies. This study explores the distributions and trends of LGCM and PLS-SEM use in panel survey data. It highlights the gaps in the existing PLS-SEM approaches practiced by researchers in analyzing panel survey data. An integrated bibliometric analysis and systematic review were employed in this study. Based on the reviewed articles, LGCM and PLS-SEM showed an increasing trend of publication in panel survey data. Though LGCM was more popular than PLS-SEM for panel survey data, LGCM has several limitations, such as statistical assumptions, reliable sample size, number of repeated measures, and missing data. This systematic review identified five different PLS-SEM approaches practiced by researchers in analyzing panel survey data, namely, a pre- and post-approach with different constructs, a path comparison approach, a cross-lagged approach, a pre- and post-approach with the same constructs, and an evaluation approach. None of these approaches can establish a single structural model that represents the overall change across the repeated measures. Thus, the findings of this paper could help researchers choose a more appropriate approach to analyzing panel survey data.

13 pages, 1167 KB  
Data Descriptor
Runoff for Russia (RFR v1.0): The Large-Sample Dataset of Simulated Runoff and Its Characteristics
by Georgy Ayzel
Data 2023, 8(2), 31; https://doi.org/10.3390/data8020031 - 30 Jan 2023
Cited by 3 | Viewed by 2625
Abstract
Global warming challenges communities worldwide to develop new adaptation strategies that must be based on reliable data. As a vital component of life, river runoff comes into particular focus as a determining and limiting factor of water-related hazard assessment. Here, we present a dataset that makes it possible to estimate the influence of projected climate change on runoff and its characteristics. We utilize the HBV (in Swedish, Hydrologiska Byråns Vattenbalansavdelning) hydrological model and drive it with the ISIMIP (The Inter-Sectoral Impact Model Intercomparison Project) meteorological forcing data for both historical (1979–2016) and projected (2017–2099) periods to simulate runoff and the respective hydrological states and variables, i.e., state of the soil reservoir, snow water equivalent, and predicted amount of melted water, for 425 river basins across Russia. For the projected period, the bias-corrected outputs from four General Circulation Models (GCMs) under three Representative Concentration Pathways (RCPs) are used, making it possible to assess the uncertainty of future projections. The simulated runoff formed the basis for calculating its characteristics (191 in total), representing the properties of water regime dynamics. The presented dataset also comprises two auxiliary parts to ensure the seamless assessment of inter-connected hydro-meteorological variables and characteristics: (1) meteorological forcing data and its characteristics and (2) geospatial data. The straightforward use of the presented dataset makes it possible for many interested parties to identify and further communicate water-related climate change issues in Russia on a national scale.

6 pages, 364 KB  
Data Descriptor
Litterfall Production and Litter Decomposition Experiments: In Situ Datasets of Nutrient Fluxes in Two Bornean Lowland Rain Forests Associated with Acacia Invasion
by Salwana Md. Jaafar, Rahayu Sukmaria Sukri, Faizah Metali and David F. R. P. Burslem
Data 2023, 8(2), 30; https://doi.org/10.3390/data8020030 - 29 Jan 2023
Viewed by 2691
Abstract
It is increasingly recognized that invasion by alien plant species such as Acacia spp. can impact tropical forest ecosystems, although quantifications of nutrient fluxes for invaded lowland tropical rain forests in aseasonal climates remain understudied. This paper describes the methodology and presents data collected during a year-long study of litterfall production and leaf litter decomposition rates in two distinct tropical lowland forests in Borneo affected by Acacia invasion. The study is the first to present a comprehensive dataset on the impacts of invasive Acacia species on Bornean forests and can be used in future research to assess the long-term impact of Acacia invasion in these forest ecosystems. Extensive studies of nutrient cycling processes in aseasonal tropical lowland rainforests occurring on different soil types remain limited. Therefore, this dataset improves understanding of nutrient cycling and ecosystem processes in tropical forests and can be utilized by the wider scientific community to examine ecosystem responses in tropical forests.

16 pages, 7224 KB  
Data Descriptor
Retinal Fundus Multi-Disease Image Dataset (RFMiD) 2.0: A Dataset of Frequently and Rarely Identified Diseases
by Sachin Panchal, Ankita Naik, Manesh Kokare, Samiksha Pachade, Rushikesh Naigaonkar, Prerana Phadnis and Archana Bhange
Data 2023, 8(2), 29; https://doi.org/10.3390/data8020029 - 28 Jan 2023
Cited by 38 | Viewed by 18097
Abstract
Irreversible vision loss is a worldwide threat. Developing a computer-aided diagnosis system to detect retinal fundus diseases is extremely useful to ophthalmologists. Early detection, diagnosis, and correct treatment could save the eye’s vision. Nevertheless, an eye may be afflicted with several diseases if proper care is not taken. A single retinal fundus image might be linked to one or more diseases. Age-related macular degeneration, cataracts, diabetic retinopathy, glaucoma, and uncorrected refractive errors are the leading causes of visual impairment. Our research team at the center of excellence lab has generated a new dataset called the Retinal Fundus Multi-Disease Image Dataset 2.0 (RFMiD2.0). This dataset includes around 860 retinal fundus images, annotated by three eye specialists, and is a multiclass, multilabel dataset. We gathered images from a research facility in Jalna and Nanded, where patients across Maharashtra come for preventative and therapeutic eye care. Our dataset would be the second publicly available dataset consisting of the most frequent diseases, along with some rarely identified diseases. This dataset is auxiliary to the previously published RFMiD dataset. This dataset would be significant for the research and development of artificial intelligence in ophthalmology.

10 pages, 4067 KB  
Data Descriptor
A Drought Dataset Based on a Composite Index for the Sahelian Climate Zone of Niger
by Issa Garba, Zakari Seybou Abdourahamane and Alisher Mirzabaev
Data 2023, 8(2), 28; https://doi.org/10.3390/data8020028 - 28 Jan 2023
Cited by 2 | Viewed by 3189
Abstract
Agricultural drought monitoring in Niger is relevant for the implementation of effective early warning systems and for improving climate change adaptation strategies. However, the scarcity of in situ data hampers an efficient analysis of drought in the country. The present dataset was created for agricultural drought characterization in the Sahelian climate zone of Niger. The dataset comprises the three-month scale and monthly time series of a composite drought index (CDI) and their corresponding drought classes at a spatial resolution of 1 km² for the period 2000–2020. The CDI was generated from remote sensing data, namely CHIRPS (Climate Hazards Group InfraRed Precipitation with Stations) precipitation, and the normalized difference vegetation index (NDVI) and land surface temperature (LST) from MODIS (Moderate Resolution Imaging Spectroradiometer). A weighting technique combining entropy and Euclidean distance was applied in the CDI derivation. The CDI time series can be extracted from the dataset for any location in the study area using its geographic coordinates. Seasonal drought characteristics, such as onset, end, duration, severity and frequency, can then be computed from the CDI time series using the theory of runs. The dataset is relevant for the socio-economic assessment of drought impacts at small spatial scales, such as district and household level. Because it was generated entirely from remote sensing data, it is also important for the assessment of drought characteristics in remote areas or areas inaccessible due to civil insecurity in the country. Finally, by including temperature data, the dataset enables drought modelling under global warming.
(This article belongs to the Section Spatial Data Science and Digital Earth)
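The theory of runs mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that lower CDI values indicate drier conditions and that a drought event is a run of consecutive time steps below a threshold; the threshold value, the `DroughtEvent` type and the `drought_runs` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DroughtEvent:
    onset: int       # index of the first time step below the threshold
    end: int         # index of the last time step below the threshold
    duration: int    # number of consecutive time steps in the run
    severity: float  # cumulative deficit (threshold - value) over the run


def drought_runs(cdi: Sequence[float], threshold: float) -> List[DroughtEvent]:
    """Split a CDI time series into runs of values below `threshold`.

    Assumption (not from the source): values below the threshold count
    as drought, and severity is the summed deficit over the run.
    """
    events: List[DroughtEvent] = []
    start = None
    deficit = 0.0
    for i, value in enumerate(cdi):
        if value < threshold:
            if start is None:
                # A new run begins at this time step.
                start = i
                deficit = 0.0
            deficit += threshold - value
        elif start is not None:
            # The run ended at the previous time step.
            events.append(DroughtEvent(start, i - 1, i - start, deficit))
            start = None
    if start is not None:
        # The series ends while still inside a run.
        events.append(DroughtEvent(start, len(cdi) - 1, len(cdi) - start, deficit))
    return events


# Hypothetical monthly CDI values and threshold, for illustration only.
series = [1.0, 0.5, 0.25, 1.0, 0.5, 1.0]
events = drought_runs(series, threshold=0.75)
frequency = len(events) / len(series)  # events per time step
```

With such a function, the frequency of drought over a period follows from the number of detected events, while onset, end, duration and severity are read from each event directly.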

3 pages, 172 KB  
Editorial
Challenges and Perspectives of Open Data in Modelling Infectious Diseases
by Francesco Branda and Giorgia Lodi
Data 2023, 8(2), 27; https://doi.org/10.3390/data8020027 - 26 Jan 2023
Cited by 1 | Viewed by 2056
Abstract
The pandemic challenged the scientific community and governments around the world, who were looking for real-time answers but lacked the data or evidence to guide decision-making [...]
10 pages, 677 KB  
Data Descriptor
C2C e-Marketplaces and How Their Micro-Segmentation Strategies Influence Their Customers
by Sandra Castillo-Sotomayor, Nicholas Guimet-Cornejo and Manuel Luis Lodeiros-Zubiria
Data 2023, 8(2), 26; https://doi.org/10.3390/data8020026 - 19 Jan 2023
Cited by 4 | Viewed by 3311
Abstract
The purpose of this study is to contribute to the literature by examining how the micro-segmentation strategies developed by C2C e-marketplaces influence customer satisfaction, brand loyalty, trust, and brand equity, proposing a PLS-SEM model with seven hypotheses. An online questionnaire was answered by a sample of 403 people. The results were edited, coded, transformed, and finally analysed with the software Smart-PLS 3.3.7. The results confirm that the reflective model shows good reliability and validity and that six of the seven hypotheses were accepted. Furthermore, micro-segmentation mostly influences customer satisfaction, followed by brand equity and trust. On the other hand, the results indicate that customer satisfaction does not impact brand loyalty, and that micro-segmentation is the more significant construct in reaching brand loyalty in C2C e-marketplaces. This research contributes to knowledge about two issues unexplored by academia: micro-segmentation and C2C e-marketplaces.

8 pages, 848 KB  
Data Descriptor
How to Reach Green Word of Mouth through Green Trust, Green Perceived Value and Green Satisfaction
by Jose Antonio Román-Augusto, Camila Garrido-Lecca-Vera, Manuel Luis Lodeiros-Zubiria and Martin Mauricio-Andia
Data 2023, 8(2), 25; https://doi.org/10.3390/data8020025 - 19 Jan 2023
Cited by 6 | Viewed by 4561
Abstract
The production and consumption of green food products have become hot topics in marketing. Companies are implementing marketing strategies such as green perceived value, green trust, and green satisfaction to guarantee green word of mouth. An online questionnaire distributed through social media was used to collect the data. The sample consists of 297 people, whose responses were coded and analysed with the Smart-PLS software. The data described include the sociodemographic profile of the sample, the descriptive analysis of all items, the reliability and validity of the measures of the reflective model, and the evaluation of the results of the structural model. The four hypotheses included in the proposed PLS-SEM model were validated for a p-value of 0.001. The results confirmed the influence of green perceived value on green trust and green satisfaction. Moreover, the results highlight that green satisfaction and green trust influence green word of mouth.

19 pages, 9596 KB  
Communication
Basic Input Data for Audiences’ Geotargeting by Destinations’ Partial Accessibility: Notes from Slovakia
by Csaba Sidor, Branislav Kršák and Ľubomír Štrba
Data 2023, 8(2), 24; https://doi.org/10.3390/data8020024 - 19 Jan 2023
Viewed by 2200
Abstract
The presented notes focus partially on two of the basic elements (accessibility and image) of any managed tourism destination from the perspective of basic ETL processes over open and third-party data. The specific case investigates the usability of open government data on occupancy, combined with third-party data on online audiences’ engagement, for DMOs’ potential seasonal geotargeting using Openrouteservice’s APIs. For the pilot case, a Slovak (Central Europe) destination’s data on occupancy and the DMO’s website and social media engagement by origin were used to determine potential audiences’ accessibility by car. Testing of the pilot results on a sample of foreign markets indicates that, with a partial mix of the means of transportation, the vast majority of audiences are within a 4 h incoming trip. Although the preliminary tests indicate a linear correlation between the destination’s occupancy and online audiences’ share accessibility by car, the list of inputs still missing for further extrapolation remains long. The main addition to the field of tourism and destination management may be the partial reusability of the developed techniques for data extraction and transformation for further data overlays, which may save some time.
(This article belongs to the Special Issue Data-Driven Approach on Urban Planning and Smart Cities)

6 pages, 177 KB  
Editorial
Acknowledgment to the Reviewers of Data in 2022
by Data Editorial Office
Data 2023, 8(2), 23; https://doi.org/10.3390/data8020023 - 19 Jan 2023
Viewed by 1599
Abstract
High-quality academic publishing is built on rigorous peer review [...]
5 pages, 441 KB  
Data Descriptor
Transcriptome Dataset of Strawberry (Fragaria × ananassa Duch.) Leaves Using Oxford Nanopore Sequencing under LED Irradiation and Application of Methyl Jasmonate and Methyl Salicylate Hormones Treatment
by M Adrian, Roedhy Poerwanto, Eiichi Inoue and Deden Derajat Matra
Data 2023, 8(2), 22; https://doi.org/10.3390/data8020022 - 17 Jan 2023
Cited by 3 | Viewed by 3110
Abstract
This data descriptor introduces a transcriptome dataset of strawberry plant leaves exposed to an LED light treatment and the plant hormones methyl jasmonate (MeJA) and methyl salicylate (MeSA). The dataset consists of four libraries obtained from the leaves of strawberry plants treated with blue- and red-spectrum LEDs and with MeJA and MeSA, which allowed us to conduct a further analysis of the growth and development processes of strawberry plants. In addition, we describe in detail how the plants were prepared and treated and how the data were generated and processed. Further analysis of these data will help to improve our understanding of the molecular mechanisms of LED light and MeJA-MeSA in strawberry plants.