Data, Volume 10, Issue 8 (August 2025) – 16 articles

8 pages, 2781 KB  
Data Descriptor
Experimental Dataset of Greenhouse Gas Emissions from Laboratory Biocover Experiment
by Kristaps Siltumens, Inga Grinfelde and Juris Burlakovs
Data 2025, 10(8), 134; https://doi.org/10.3390/data10080134 - 21 Aug 2025
Abstract
The dataset presented in this manuscript consists of three distinct sets of data collected during a laboratory experiment aimed at quantifying the emissions of greenhouse gases (GHGs), specifically methane (CH4), carbon dioxide (CO2), and nitrous oxide (N2O). The experiment was conducted in three phases, each initiated at different times. The first phase began on 6 June 2022, using a biocover composed of 60% fine-fraction waste, 20% clay soil, and 20% stabilized compost. The second phase commenced on 26 August 2022, with two biocover variants: one composed of 50% fine-fraction waste and 50% clay soil, and the other consisting of 40% fine-fraction waste, 40% clay soil, and 20% shredded paper. The final phase started on 27 October 2022, introducing two biocovers: one containing 25% dried algae, 25% fine-fraction waste, 25% gravel (0–20 mm), and 25% ash, and the other composed of 40% fine-fraction waste, 40% dried algae, and 20% chernozem. Emission assessments were conducted three weeks after the biocover installation to allow for settling and stabilization, followed by weekly measurements two to three days before irrigation with 250 mL of water to simulate field conditions. GHG emission quantification was carried out using the Cavity Ring-Down Spectroscopy gas measurement device, Picarro G2508. This dataset offers substantial scientific value for advancing biocover technologies aimed at reducing GHG emissions in landfill environments, particularly for mitigating methane emissions. In addition to initial experimental use, the dataset offers a wide range of possibilities for reuse, including modeling landfill gas emissions, validating gas flow measurement methods, developing machine learning models, and performing meta-analyses. Its detailed structure facilitates multi-faceted environmental research and supports optimization of landfill management.

29 pages, 991 KB  
Article
GroupView: A Visual Framework for Exploring Group Membership Dynamics over Time
by Mithilesh Kumar Singh and Klaus Mueller
Data 2025, 10(8), 133; https://doi.org/10.3390/data10080133 - 21 Aug 2025
Abstract
Tracking group membership dynamics over time is a persistent challenge in visual analytics, particularly when dealing with complex, multidimensional datasets. Existing tools often struggle to visualize dynamic group transitions while preserving attribute relationships and maintaining consistent group definitions. To address this, we present GroupView, a visual framework for exploring temporal data and group dynamics. GroupView enables users to slice data into time-based segments and create dynamic groupings, facilitating the identification of trends and patterns that may otherwise remain hidden. Its features include automated grouping based on data similarities, combinatorial grouping for richer insights, and custom grouping for tailored analysis. A heuristic user study involving visualization experts provided feedback on usability and analytical value, highlighting the strengths of GroupView in intuitive exploration and insight discovery. These features position GroupView as a valuable tool for analysts and researchers working with evolving datasets, offering new avenues for uncovering trends and tracking group-level changes over time.

19 pages, 1398 KB  
Systematic Review
Data Science Project Barriers—A Systematic Review
by Natan Labarrère, Lino Costa and Rui M. Lima
Data 2025, 10(8), 132; https://doi.org/10.3390/data10080132 - 20 Aug 2025
Abstract
This study aims to identify and categorize barriers to the success of Data Science (DS) projects through a systematic literature review combined with quantitative methods of analysis. PRISMA is used to conduct the review and identify the barriers reported in the existing literature. With techniques from bibliometrics and network science, the barriers are hierarchically clustered using the Jaccard distance as a measure of dissimilarity. The review identified 27 barriers to the success of DS projects from 26 studies. These barriers were grouped into six thematic clusters: people, data and technology, management, economic, project, and external barriers. The barrier “insufficient skills” is the most frequently cited in the literature and the most frequently considered critical. From the quantitative analysis, the barriers “insufficient skills”, “poor data quality”, “data privacy and security”, “lack of support from top management”, “insufficient funding”, “insufficient ROI or justification”, “government policies and regulation”, and “inadequate, immature or inconsistent methodology” were identified as the most central in their cluster.
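The clustering step named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the barrier labels A to D, the study IDs, and the choice of average linkage are all assumptions made for illustration, since the abstract specifies only the Jaccard distance and hierarchical clustering.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance 1 - |A & B| / |A | B| between two sets."""
    union = a | b
    if not union:
        return 0.0  # convention: two empty sets count as identical
    return 1.0 - len(a & b) / len(union)


def cluster(items, n_clusters):
    """Agglomerative clustering with average linkage (an assumed linkage).

    items maps each label to the set of study IDs that mention it.
    """
    clusters = [[name] for name in items]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
            dist = sum(jaccard_distance(items[a], items[b]) for a, b in pairs) / len(pairs)
            if best is None or dist < best[0]:
                best = (dist, i, j)
        _, i, j = best
        # j > i always holds here, so deleting j leaves index i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical data: which studies (by ID) cite each barrier.
citations = {
    "A": {1, 2, 3, 4},
    "B": {2, 3, 4},
    "C": {5, 6, 7},
    "D": {5, 6, 8},
}

print(jaccard_distance(citations["A"], citations["B"]))  # 0.25
print(cluster(citations, 2))  # [['A', 'B'], ['C', 'D']]
```

Barriers that are cited by overlapping sets of studies end up close together, which is the intuition behind using co-citation sets as the basis for the dissimilarity measure.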

10 pages, 1572 KB  
Data Descriptor
Simultaneous EEG-fNIRS Data on Learning Capability via Implicit Learning Induced by Cognitive Tasks
by Chayapol Chaiyanan, Thanate Angsuwatanakul, Keiji Iramina and Boonserm Kaewkamnerdpong
Data 2025, 10(8), 131; https://doi.org/10.3390/data10080131 - 18 Aug 2025
Abstract
The development of real-time learning assessment tools is hindered by an incomplete understanding of the underlying neural mechanisms. To address this gap, this study aimed to identify the specific neural correlates of implicit learning, a foundational process crucial for skill acquisition. We collected simultaneous electroencephalography and functional near-infrared spectroscopy data from thirty healthy adults (ages 21–29) performing a serial reaction time task designed to induce implicit learning. By capturing both electrophysiological and hemodynamic responses concurrently at shared locations, this dataset offers a unique opportunity to investigate neurovascular coupling during implicit learning and gain deeper insights into the neural mechanisms of learning. The dataset is categorized into two groups: participants who demonstrated implicit learning (based on post-experiment interviews) and those who did not. This dataset enables the identification of prominent brain regions, features, and temporal patterns associated with successful implicit learning. This identification will form the basis for future real-time learning assessment tools.

8 pages, 529 KB  
Data Descriptor
An Extended Dataset of Educational Quality Across Countries (1970–2023)
by Hanol Lee and Jong-Wha Lee
Data 2025, 10(8), 130; https://doi.org/10.3390/data10080130 - 15 Aug 2025
Abstract
This study presents an extended dataset on educational quality covering 101 countries, from 1970 to 2023. While existing international assessments, such as the Programme for International Student Assessment (PISA) and Trends in International Mathematics and Science Study (TIMSS), offer valuable snapshots of student performance, their limited coverage across countries and years constrains broader analyses. To address this limitation, we harmonized observed test scores across assessments and imputed missing values using both linear interpolation and machine learning (Least Absolute Shrinkage and Selection Operator (LASSO) regression). The dataset includes (i) harmonized test scores for 15-year-olds, (ii) annual educational quality indicators for the 15–19 age group, and (iii) educational quality indexes for the working-age population (15–64). These measures are provided in machine-readable formats and support empirical research on human capital, economic development, and global education inequalities across economies.

11 pages, 697 KB  
Data Descriptor
A Multi-Sensor Dataset for Human Activity Recognition Using Inertial and Orientation Data
by Jhonathan L. Rivas-Caicedo, Laura Saldaña-Aristizabal, Kevin Niño-Tejada and Juan F. Patarroyo-Montenegro
Data 2025, 10(8), 129; https://doi.org/10.3390/data10080129 - 14 Aug 2025
Abstract
Human Activity Recognition (HAR) using wearable sensors is an increasingly relevant area for applications in healthcare, rehabilitation, and human–computer interaction. However, publicly available datasets that provide multi-sensor, synchronized data combining inertial and orientation measurements are still limited. This work introduces a publicly available dataset for Human Activity Recognition, captured using wearable sensors placed on the chest, hands, and knees. Each device recorded inertial and orientation data during controlled activity sessions involving participants aged 20 to 70. A standardized acquisition protocol ensured consistent temporal alignment across all signals. The dataset was preprocessed and segmented using a sliding window approach. An initial baseline classification experiment, employing a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) model, demonstrated an average accuracy of 93.5% in classifying activities. The dataset is publicly available in CSV format and includes raw sensor signals, activity labels, and metadata. This dataset offers a valuable resource for evaluating machine learning models, studying distributed HAR approaches, and developing robust activity recognition pipelines utilizing wearable technologies.
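The sliding window segmentation mentioned in this abstract can be sketched in a few lines of Python. The window size, stride, activity names, and the majority-vote labelling rule below are assumptions for illustration; the abstract does not state the parameters the authors used.

```python
from collections import Counter


def sliding_windows(samples, window_size, stride):
    """Yield (start_index, window) pairs of fixed length over a sequence."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    for start in range(0, len(samples) - window_size + 1, stride):
        yield start, samples[start:start + window_size]


def majority_label(labels):
    """Label a window by its most frequent per-sample label (assumed rule)."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical signal of 10 samples with per-sample activity labels.
signal = list(range(10))
labels = ["walk"] * 5 + ["sit"] * 5

windows = list(sliding_windows(signal, window_size=4, stride=2))
window_labels = [majority_label(w) for _, w in sliding_windows(labels, 4, 2)]

print([start for start, _ in windows])  # [0, 2, 4, 6]
print(window_labels)                    # ['walk', 'walk', 'sit', 'sit']
```

A stride smaller than the window size produces overlapping windows, which is a common way to increase the number of training segments from a fixed recording.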

27 pages, 9197 KB  
Data Descriptor
A Six-Year, Spatiotemporally Comprehensive Dataset and Data Retrieval Tool for Analyzing Chlorophyll-a, Turbidity, and Temperature in Utah Lake Using Sentinel and MODIS Imagery
by Kaylee B. Tanner, Anna C. Cardall and Gustavious P. Williams
Data 2025, 10(8), 128; https://doi.org/10.3390/data10080128 - 13 Aug 2025
Abstract
Data from earth observation satellites provide unique and valuable information about water quality conditions in freshwater lakes but require significant processing before they can be used, even with the use of tools like Google Earth Engine. We use imagery from Sentinel 2 and MODIS and in situ data from the State of Utah Ambient Water Quality Management System (AWQMS) database to develop models and to generate a highly accessible, easy-to-use CSV file of chlorophyll-a (which is an indicator of algal biomass), turbidity, and water temperature measurements on Utah Lake. From a collection of 937 Sentinel 2 images spanning the period from January 2019 to May 2025, we generated 262,081 estimates each of chlorophyll-a and turbidity, with an additional 1,140,777 data points interpolated from those estimates to provide a dataset with a consistent time step. From a collection of 2333 MODIS images spanning the same time period, we extracted 1,390,800 measurements each of daytime water surface temperature and nighttime water surface temperature and interpolated or imputed an additional 12,058 data points from those estimates. We interpolated the data using piecewise cubic Hermite interpolation polynomials to preserve the original distribution of the data and provide the most accurate estimates of measurements between observations. We demonstrate the processing steps required to extract usable, accurate estimates of these three water quality parameters from satellite imagery and format them for analysis. We include summary statistics and charts for the resulting dataset, which show the usefulness of these data for informing Utah Lake management issues. We include the Jupyter Notebook with the implemented processing steps and the formatted CSV file of data as supplemental materials. The Jupyter Notebook can be used to update the Utah Lake data or can be easily modified to generate similar data for other waterbodies. We provide this method, tool set, and data to make remotely sensed water quality data more accessible to researchers, water managers, and others interested in Utah Lake and to facilitate the use of satellite data for those interested in applying remote sensing techniques to other waterbodies.
(This article belongs to the Collection Modern Geophysical and Climate Data Analysis: Tools and Methods)
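The piecewise cubic Hermite interpolation named in this abstract can be illustrated with a small pure-Python sketch. It is not the authors' notebook code: the sample days and chlorophyll-a values are hypothetical, interior slopes follow the Fritsch-Carlson weighted harmonic mean, and the endpoint slopes use a simple one-sided secant, which is a simplification of what library implementations do.

```python
from bisect import bisect_right


def pchip_slopes(x, y):
    """Estimate node slopes; interior nodes use a weighted harmonic mean."""
    n = len(x)
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    delta = [(y[i + 1] - y[i]) / h[i] for i in range(n - 1)]
    slopes = [0.0] * n
    # Simplification (assumption): one-sided secant slopes at both ends.
    slopes[0], slopes[-1] = delta[0], delta[-1]
    for k in range(1, n - 1):
        # A sign change or flat segment marks a local extremum: keep slope 0.
        if delta[k - 1] * delta[k] > 0:
            w1 = 2 * h[k] + h[k - 1]
            w2 = h[k] + 2 * h[k - 1]
            slopes[k] = (w1 + w2) / (w1 / delta[k - 1] + w2 / delta[k])
    return slopes


def pchip(x, y, xq):
    """Evaluate the cubic Hermite interpolant at xq (x strictly increasing)."""
    slopes = pchip_slopes(x, y)  # recomputed per call to keep the sketch short
    k = max(0, min(bisect_right(x, xq) - 1, len(x) - 2))
    h = x[k + 1] - x[k]
    t = (xq - x[k]) / h
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * y[k] + h10 * h * slopes[k] + h01 * y[k + 1] + h11 * h * slopes[k + 1]


# Hypothetical observation days and chlorophyll-a estimates.
days = [0, 5, 10, 20]
chl_a = [10.0, 12.0, 12.0, 30.0]

daily = [pchip(days, chl_a, d) for d in range(0, 21)]
print(daily[5], daily[10], daily[20])  # 12.0 12.0 30.0
```

Unlike an unconstrained cubic spline, this scheme does not overshoot between observations when the data are monotone, which is one reason it is often chosen for filling gaps in irregularly sampled time series.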

16 pages, 2323 KB  
Article
Limitations of Influence-Based Dataset Compression for Waste Classification
by Julian Aberger, Lena Brensberger, Gerald Koinig, Benedikt Häcker, Jesús Pestana and Renato Sarc
Data 2025, 10(8), 127; https://doi.org/10.3390/data10080127 - 7 Aug 2025
Abstract
Influence-based data selection methods, such as TracIn, aim to estimate the impact of individual training samples on model predictions and are increasingly used for dataset curation and reduction. This study investigates whether selecting the most positively influential training examples can be used to create compressed yet effective training datasets for transfer learning in plastic waste classification. Using a ResNet-18 model trained on a custom dataset of plastic waste images, TracIn was applied to compute influence scores across multiple training checkpoints. The top 50 influential samples per class were extracted and used to train a new model. Contrary to expectations, models trained on these highly influential subsets significantly underperformed compared to models trained on either the full dataset or an equally sized random sample. Further analysis revealed that many top-ranked influential images originated from different classes, indicating model biases and potential label confusion. These findings highlight the limitations of using influence scores for dataset compression. However, TracIn proved valuable for identifying problematic or ambiguous samples, class imbalance, and fuzzy class boundaries. Based on the results, the utilized TracIn approach is recommended as a diagnostic instrument rather than for dataset curation.

37 pages, 2744 KB  
Article
Synergistic Evolution or Competitive Disruption? Analysing the Dynamic Interaction Between Digital and Real Economies in Henan, China, Based on Panel Data
by Yaping Zhu, Qingwei Xu, Chutong Hao, Shuaishuai Geng and Bingjun Li
Data 2025, 10(8), 126; https://doi.org/10.3390/data10080126 - 4 Aug 2025
Abstract
In the digital transformation era, understanding the relationship between digital and real economies is vital for regional development. This study analyses the interaction between these two economies in Henan Province using panel data from 18 cities (2011–2023). It incorporates policy support intensity through fuzzy set theory, applies an integrated weighting method to measure development levels, and uses regression models to assess the digital economy’s impact on the real economy. The coupling coordination degree model, kernel density estimation, and Gini coefficient reveal the coordination status and spatial distribution, while the ecological Lotka–Volterra model identifies the symbiotic patterns. The key findings are as follows: (1) The digital economy does not directly determine the state of the real economy. For example, cities such as Zhoukou and Zhumadian have low digital economy levels but high real economy levels. However, the development of the digital economy promotes the real economy without signs of diminishing returns. (2) The two economies are generally coordinated but differ spatially, with greater coordination in the Central Plains urban agglomeration. (3) The digital and real economies exhibit both collaboration and competition, with reciprocal mutualism as the dominant mode of integration. These insights provide guidance for policymakers and offer a new perspective on the integration of both economies.

11 pages, 3192 KB  
Data Descriptor
Carbon Monoxide (CO) and Ozone (O3) Concentrations in an Industrial Area: A Dataset at the Neighborhood Level
by Jailene Marlen Jaramillo-Perez, Bárbara A. Macías-Hernández, Edgar Tello-Leal and René Ventura-Houle
Data 2025, 10(8), 125; https://doi.org/10.3390/data10080125 - 1 Aug 2025
Abstract
The growth of urban and industrial areas is accompanied by an increase in vehicle traffic, resulting in rising concentrations of various air pollutants. This is a global issue that causes environmental damage and risks to human health. The dataset presented in this research contains records with measurements of the air pollutants ozone (O3) and carbon monoxide (CO), as well as meteorological parameters such as temperature (T), relative humidity (RH), and barometric pressure (BP). This dataset was collected using a set of low-cost sensors over a four-month study period (March to June) in 2024. The monitoring of air pollutants and meteorological parameters was conducted in a city with high industrial activity, heavy traffic, and close proximity to a petrochemical refinery plant. The data were subjected to a series of statistical analyses for visualization using plots that allow for the identification of their behavior. Finally, the dataset can be utilized for air quality studies, public health research, and the development of prediction models based on mathematical approaches or artificial intelligence algorithms.

7 pages, 1048 KB  
Data Descriptor
Dataset of Morphometry and Metal Concentrations in Coptodon rendalli and Oreochromis mossambicus from the Shongweni Dam, South Africa
by Smangele Ncayiyana, Neo Mashila Maleka and Jeffrey Lebepe
Data 2025, 10(8), 124; https://doi.org/10.3390/data10080124 - 1 Aug 2025
Abstract
The uMlazi River receives effluents from wastewater works before feeding the Shongweni Dam. However, local communities are consuming fish from this dam for protein supplements. This study was undertaken to investigate the metal concentrations in the water and sediment, the general health of Coptodon rendalli and Oreochromis mossambicus, and metal bioaccumulation. Sampling was conducted during the dry (July–August) and wet (November–December) seasons in 2021. Water was sampled using acid-pre-treated sampling bottles, whereas sediment was collected using the Van Veen grab at the inflow, middle, and dam wall. Fish were collected, and their tissues were digested using aqua regia. Metal concentrations were measured using inductively coupled plasma optical emission spectroscopy (ICP-OES). This data manuscript reports the physical parameters of the water and concentrations of antimony, arsenic, cadmium, copper, iron, manganese, lead, selenium, and strontium in the water and sediment from the Shongweni Dam. Moreover, the fish morphometric data and metal concentrations observed in the muscle are also presented. These data could be used as baseline information on metal concentrations in the Shongweni Dam. Moreover, they provide insight into the potential impact of wastewater effluents on metal increases in freshwater bodies.

21 pages, 22884 KB  
Data Descriptor
An Open-Source Clinical Case Dataset for Medical Image Classification and Multimodal AI Applications
by Mauro Nievas Offidani, Facundo Roffet, María Carolina González Galtier, Miguel Massiris and Claudio Delrieux
Data 2025, 10(8), 123; https://doi.org/10.3390/data10080123 - 31 Jul 2025
Abstract
High-quality, openly accessible clinical datasets remain a significant bottleneck in advancing both research and clinical applications within medical artificial intelligence. Case reports, often rich in multimodal clinical data, represent an underutilized resource for developing medical AI applications. We present an enhanced version of MultiCaRe, a dataset derived from open-access case reports on PubMed Central. This new version addresses the limitations identified in the previous release and incorporates newly added clinical cases and images (totaling 93,816 and 130,791, respectively), along with a refined hierarchical taxonomy featuring over 140 categories. Image labels have been meticulously curated using a combination of manual and machine learning-based label generation and validation, ensuring a higher quality for image classification tasks and the fine-tuning of multimodal models. To facilitate its use, we also provide a Python package for dataset manipulation, pretrained models for medical image classification, and two dedicated websites. The updated MultiCaRe dataset expands the resources available for multimodal AI research in medicine. Its scale, quality, and accessibility make it a valuable tool for developing medical AI systems, as well as for educational purposes in clinical and computational fields.

27 pages, 4973 KB  
Article
LSTM-Based River Discharge Forecasting Using Spatially Gridded Input Data
by Kamilla Rakhymbek, Balgaisha Mukanova, Andrey Bondarovich, Dmitry Chernykh, Almas Alzhanov, Dauren Nurekenov, Anatoliy Pavlenko and Aliya Nugumanova
Data 2025, 10(8), 122; https://doi.org/10.3390/data10080122 - 27 Jul 2025
Abstract
Accurate river discharge forecasting remains a critical challenge in hydrology, particularly in data-scarce mountainous regions where in situ observations are limited. This study investigated the potential of long short-term memory (LSTM) networks to improve discharge prediction by leveraging spatially distributed reanalysis data. Using the ERA5-Land dataset, we developed an LSTM model that integrates grid-based meteorological inputs and assesses their relative importance. We conducted experiments on two snow-dominated basins with contrasting physiographic characteristics, the Uba River basin in Kazakhstan and the Flathead River basin in the USA, to answer three research questions: (1) whether full-grid input outperforms reduced configurations and models trained on Caravan, (2) the impact of spatial resolution on accuracy and efficiency, and (3) the effect of partial spatial coverage on prediction reliability. Specifically, we compared the full-grid LSTM with a single-cell LSTM, a basin-average LSTM, a Caravan-trained LSTM, and coarser cell aggregations. The results demonstrate that the full-grid LSTM consistently yields the highest forecasting performance, achieving a median Nash–Sutcliffe efficiency of 0.905 for Uba and 0.93 for Middle Fork Flathead, while using coarser grids and random subsets reduces performance. Our findings highlight the critical importance of spatial input richness and provide a reproducible framework for grid selection in flood-prone basins lacking dense observation networks.
(This article belongs to the Special Issue New Progress in Big Earth Data)
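For readers unfamiliar with the metric reported in this abstract, the Nash–Sutcliffe efficiency (NSE) compares the squared error of a simulation against the variance of the observations: NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2). The Python sketch below uses hypothetical discharge values and is not taken from the paper.

```python
def nash_sutcliffe(observed, simulated):
    """Return the Nash-Sutcliffe efficiency of simulated vs. observed values.

    1.0 is a perfect match; 0.0 means the simulation is no better than
    predicting the observed mean; negative values are worse than the mean.
    """
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_sum = sum((o - mean_obs) ** 2 for o in observed)
    if variance_sum == 0:
        raise ValueError("NSE is undefined for constant observations")
    return 1.0 - squared_error / variance_sum


# Hypothetical daily discharge values (arbitrary units).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]

print(nash_sutcliffe(observed, observed))                    # 1.0
print(nash_sutcliffe(observed, [3.0] * 5))                   # 0.0
print(nash_sutcliffe(observed, [1.0, 2.0, 3.0, 4.0, 4.0]))
```

A median NSE, as quoted in the abstract, would be obtained by computing this score for each evaluation run or gauge and taking the median of the resulting values.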

9 pages, 2733 KB  
Data Descriptor
Investigating Mid-Latitude Lower Ionospheric Responses to Energetic Electron Precipitation: A Case Study
by Aleksandra Kolarski, Vladimir A. Srećković, Zoran R. Mijić and Filip Arnaut
Data 2025, 10(8), 121; https://doi.org/10.3390/data10080121 - 26 Jul 2025
Abstract
Localized ionization enhancements (LIEs) in the altitude range corresponding to the D-region ionosphere disrupt Very-Low-Frequency (VLF) signal propagation. This case study focuses on Lightning-induced Electron Precipitation (LEP), analyzing amplitude and phase variations in VLF signals recorded in Belgrade, Serbia, from worldwide transmitters. Due to the localized, transient nature of Energetic Electron Precipitation (EEP) events and the path-dependence of VLF responses, research relies on event-specific case studies to model reflection height and sharpness via numerical simulations. Findings show LIEs are typically under 1000 × 500 km, with varying internal structure. Accumulated case studies and corresponding data across diverse conditions contribute to a broader understanding of ionospheric dynamics and space weather effects. These findings enhance regional modeling, support aerosol–electricity climate research, and underscore the value of VLF-based ionospheric monitoring and collaboration in Europe.
(This article belongs to the Section Spatial Data Science and Digital Earth)

12 pages, 249 KB  
Data Descriptor
Time Series Dataset of Phenology, Biomass, and Chemical Composition of Cassava (Manihot esculenta Crantz) as Affected by Time of Planting and Variety Interactions in Field Trials at Koronivia, Fiji
by Poasa Nauluvula, Bruce L. Webber, Roslyn M. Gleadow, William Aalbersberg, John N. G. Hargreaves, Bianca T. Das, Diogenes L. Antille and Steven J. Crimp
Data 2025, 10(8), 120; https://doi.org/10.3390/data10080120 - 23 Jul 2025
Abstract
Cassava is the sixth most important food crop and is cultivated in more than 100 countries. The crop tolerates low soil fertility and drought, enabling it to play a role in climate adaptation strategies. Cassava generally requires careful preparation to remove toxic hydrogen cyanide (HCN) before its consumption, but HCN concentrations can vary considerably between varieties. Climate change and low inputs, particularly carbon and nutrients, affect agriculture in Pacific Island countries where cassava is commonly grown alongside traditional crops (e.g., taro). Despite increasing popularity in this region, there is limited experimental data about cassava crop management for different local varieties, their relative toxicity and nutritional value for human consumption, and their interaction with changing climate conditions. To help address this knowledge gap, three field experiments were conducted at the Koronivia Research Station of the Fiji Ministry of Agriculture. Two varieties of cassava with contrasting HCN content were planted at three different times coinciding with the start of the wet (September–October) or dry (April) seasons. A time series of measurements was conducted during the full 18-month or differing 6-month durations of each crop, based on destructive harvests and phenological observations. The former included determination of total biomass, HCN potential, carbon isotopes (δ13C), and elemental composition. Yield and nutritional value were significantly affected by variety and time of planting, and there were interactions between the two factors. Findings from this work will improve cassava management locally and will provide a valuable dataset for agronomic and biophysical model testing.

16 pages, 5175 KB  
Data Descriptor
From Raw GPS to GTFS: A Real-World Open Dataset for Bus Travel Time Prediction
by Aigerim Mansurova, Aigerim Mussina, Sanzhar Aubakirov, Aliya Nugumanova and Didar Yedilkhan
Data 2025, 10(8), 119; https://doi.org/10.3390/data10080119 - 23 Jul 2025
Abstract
This data descriptor introduces an open, high-resolution dataset of real-world bus operations in Astana, Kazakhstan, captured from GPS trajectories between July and September 2024. The data cover three high-frequency routes and have been processed into a GTFS format, enabling direct use with existing transit modeling tools. Unlike typical static GTFS feeds, this dataset provides empirically observed dwell times, run times, and travel times, offering a detailed snapshot of operational variability in urban bus systems. The dataset supports applications in machine learning–based travel time prediction, timetable optimization, and transit reliability analysis, especially in settings where live feeds are unavailable. By releasing this dataset publicly, we aim to promote transparent, data-driven transport research in emerging urban contexts.