Data, Volume 9, Issue 10 (October 2024) – 11 articles

9 pages, 3341 KiB  
Data Descriptor
Rainfall Erosivity over Brazil: A Large National Database
by Mariza P. Oliveira-Roza, Roberto A. Cecílio, David B. S. Teixeira, Michel C. Moreira, André Q. Almeida, Alexandre C. Xavier and Sidney S. Zanetti
Data 2024, 9(10), 120; https://doi.org/10.3390/data9100120 - 14 Oct 2024
Abstract
Rainfall erosivity (RE) represents the potential of rainfall to cause soil erosion, and understanding its impact is essential for the adoption of soil and water conservation practices. Although several studies have estimated RE for Brazil, currently, no single reliable and easily accessible database exists for the country. To fill this gap, this work aimed to review the research and generate a rainfall erosivity database for Brazil. Data were gathered from studies that determined rainfall erosivity from observed rainfall records and synthetic rainfall series. Monthly and annual rainfall erosivity values were organized on a spreadsheet and in the shapefile format. In total, 54 studies from 1990 to 2023 were analyzed, resulting in the compilation of 5516 erosivity values for Brazil, of which 6.3% were pluviographic, and 93.7% were synthetic. The regions with the highest availability of information were the Northeast (35.6%), Southeast (30.1%), South (19.9%), Central-West (7.7%), and North (6.7%). The database, which can be accessed on the Mendeley Data platform, can aid professionals and researchers in adopting public policies and carrying out studies aimed at environmental conservation and management basin development.
(This article belongs to the Section Spatial Data Science and Digital Earth)
19 pages, 8517 KiB  
Article
Data Mining Approach for Evil Twin Attack Identification in Wi-Fi Networks
by Roman Banakh, Elena Nyemkova, Connie Justice, Andrian Piskozub and Yuriy Lakh
Data 2024, 9(10), 119; https://doi.org/10.3390/data9100119 - 14 Oct 2024
Abstract
Recent cyber security solutions for wireless networks during internet open access have become critically important for personal data security. The newest WPA3 network security protocol has been used to maximize this protection; however, attackers can use an Evil Twin attack to replace a legitimate access point. The article is devoted to solving the problem of intrusion detection at the OSI model’s physical layer. To solve this, a hardware–software complex has been developed to collect information about the signal strength from Wi-Fi access points using wireless sensor networks. The collected data were supplemented with a generative algorithm considering all possible combinations of signal strength. The k-nearest neighbor model was trained on the obtained data to distinguish the signal strength of legitimate from illegitimate access points. To verify the authenticity of the data, an Evil Twin attack was physically simulated, and a machine learning model analyzed the data from the sensors. As a result, the Evil Twin attack was successfully identified based on the signal strength in the radio spectrum. The proposed model can be used in open access points as well as in large corporate and home Wi-Fi networks to detect intrusions aimed at substituting devices in the radio spectrum where IEEE 802.11 networking equipment operates.
(This article belongs to the Section Information Systems and Data Management)
9 pages, 1033 KiB  
Data Descriptor
A Dataset of Two-Dimensional XBeach Model Set-Up Files for Northern California
by Andrea C. O’Neill, Kees Nederhoff, Li H. Erikson, Jennifer A. Thomas and Patrick L. Barnard
Data 2024, 9(10), 118; https://doi.org/10.3390/data9100118 - 11 Oct 2024
Abstract
Here, we describe a dataset of two-dimensional (2D) XBeach model files that were developed for the Coastal Storm Modeling System (CoSMoS) in northern California as an update to an earlier CoSMoS implementation that relied on one-dimensional (1D) modeling methods. We provide details on the data and their application, such that they might be useful to end-users for other coastal studies. Modeling methods and outputs are presented for Humboldt Bay, California, in which we compare output from a nested 1D modeling approach to 2D model results, demonstrating that the 2D method, while more computationally expensive, results in a more cohesive and directly mappable flood hazard result.
21 pages, 325 KiB  
Article
Perception and Reuse of Open Data in the Spanish University Teaching and Research Community
by Christian Vidal-Cabo, Enrique Alfonso Sánchez-Pérez and Antonia Ferrer-Sapena
Data 2024, 9(10), 117; https://doi.org/10.3390/data9100117 - 11 Oct 2024
Abstract
Introduction. Open Government is a form of public policy based on the pillars of collaboration and citizen participation, transparency and the right of access to public information. With the help of information and communication technologies, governments and administrations carry out open data initiatives, making reusable datasets available to all citizens. The academic community, highly qualified personnel, can become potential reusers of this data, which would lead to its use for scientific research, generating knowledge, and for teaching, improving the training of university students and promoting the reuse of open data in the future. Method. This study was developed using a quantitative research methodology (survey), which was distributed by email in one context block and six technical blocks, with a total of 30 questions. The data collection period was between 15 March and 10 May 2021. Analysis. The data obtained through this quantitative methodology were processed, normalised, and analysed. Results. A total of 783 responses were obtained, from 34 Spanish provinces. The researchers come from 47 Spanish universities and 21 research centres, and 19 research areas of the State Research Agency are represented. In addition, a platform was developed with the data for the purpose of visualising the results of the survey. Conclusions. The sample thus obtained is representative and the conclusions can be extrapolated to the rest of the Spanish university teaching staff. In terms of gender, the study is balanced between men and women (41.76% W vs. 56.58% M). In general, researchers responding to the survey know what open data is (79.31%) but only 50.57% reuse open data. The main conclusion is that open government data prove to be useful sources of information for science, especially in areas such as Social Sciences, Industrial Production, Engineering and Engineering for Society, Information and Communication Technologies, Economics and Environmental Sciences. 
(This article belongs to the Section Information Systems and Data Management)
8 pages, 171 KiB  
Data Descriptor
Data Descriptor for “Understanding and Perception of Automated Text Generation among the Public: Two Surveys with Representative Samples in Germany”
by Angelica Lermann Henestrosa and Joachim Kimmerle
Data 2024, 9(10), 116; https://doi.org/10.3390/data9100116 - 11 Oct 2024
Abstract
With the release of ChatGPT, text-generating AI became accessible to the general public virtually overnight, and automated text generation (ATG) became the focus of public debate. Previously, however, little attention had been paid to this area of AI, resulting in a gap in the research on people’s attitudes and perceptions of this technology. Therefore, two representative surveys among the German population were conducted before (March 2022) and after (July 2023) the release of ChatGPT to investigate people’s attitudes, concepts, and knowledge on ATG in detail. This data descriptor depicts the structure of the two datasets, the measures collected, and potential analysis approaches beyond the existing research paper. Other researchers are encouraged to take up these datasets and explore them further as suggested or as they deem appropriate.
37 pages, 28638 KiB  
Article
Characterization and Dataset Compilation of Torque–Angle Curve Behavior for M2/M3 Screws
by Iván Juan Carlos Pérez-Olguín, Consuelo Catalina Fernández-Gaxiola, Luis Alberto Rodríguez-Picón and Luis Carlos Méndez-González
Data 2024, 9(10), 115; https://doi.org/10.3390/data9100115 - 6 Oct 2024
Abstract
This research explores the torque–angle behavior of M2/M3 screws in automotive applications, focusing on ensuring component reliability and manufacturing precision within the recommended assembly specification limits. M2/M3 screws, often used in tight spaces, are susceptible to issues like stripped threads and inconsistent torque, which can compromise safety and performance. The study’s primary objective is to develop a comprehensive dataset of torque–angle measurements for these screws, facilitating the analysis of key parameters such as torque-to-seat, torque-to-fail, and process windows. By applying Gaussian curve fitting and Gaussian process regression, the research models and simulates torque behavior to understand torque dynamics in small fasteners and remarks on the potential of statistical methods in torque analysis, offering insights for improving manufacturing practices. As a result, it can be concluded that the proposed stochastic methodologies offer the benefit of fail-to-seat ratio improvement, allow inference, reduce the sample size needed in incoming test studies, and minimize the number of destructive test samples needed.
15 pages, 2190 KiB  
Data Descriptor
Open and Collaborative Dataset for the Classification of Operational Transconductance Amplifiers for Switched-Capacitor Applications
by Francesco Gagliardi and Michele Dei
Data 2024, 9(10), 114; https://doi.org/10.3390/data9100114 - 3 Oct 2024
Abstract
This study introduces a collaborative and open dataset designed to classify operational transconductance amplifiers (OTAs) in switched-capacitor applications. The dataset comprises a diverse collection of OTA designs sourced from the literature, facilitating benchmarking, analysis and innovation in analog and mixed-signal integrated circuit design. Various evaluation methodologies, implemented through a companion Python notebook script, are discussed to assess OTA performances across different operating conditions and specifications. Several Figures of Merit (FoMs) are utilized as performance metrics to achieve significant performance classification. This study also uncovers intriguing behaviors and correlations among FoMs, providing valuable insights into OTA design considerations. By making the dataset openly available on platforms like GitHub, this work encourages collaboration and knowledge sharing within the integrated circuit design community, thereby enhancing transparency, reproducibility and innovation in OTA design research.
12 pages, 6417 KiB  
Data Descriptor
Dataset for Machine Learning: Explicit All-Sky Image Features to Enhance Solar Irradiance Prediction
by Joylan Nunes Maciel, Jorge Javier Gimenez Ledesma and Oswaldo Hideo Ando Junior
Data 2024, 9(10), 113; https://doi.org/10.3390/data9100113 - 29 Sep 2024
Abstract
Prediction of solar irradiance is crucial for photovoltaic energy generation, as it helps mitigate intermittencies caused by atmospheric fluctuations such as clouds, wind, and temperature. Numerous studies have applied machine learning and deep learning techniques from artificial intelligence to address this challenge. Based on the recently proposed Hybrid Prediction Method (HPM), this paper presents an original and comprehensive dataset with nine attributes extracted from all-sky images developed using image processing techniques. This dataset and analysis of its attributes offer new avenues for research into solar irradiance forecasting. To ensure reproducibility, the data processing workflow and the standardized dataset have been meticulously detailed and made available to the scientific community to promote further research into prediction methods for photovoltaic energy generation.
(This article belongs to the Topic Smart Energy Systems, 2nd Edition)
14 pages, 1676 KiB  
Article
Fundamentals of Analysis of Health Data for Non-Physicians
by Carlos Hernández-Nava, Miguel-Félix Mata-Rivera and Sergio Flores-Hernández
Data 2024, 9(10), 112; https://doi.org/10.3390/data9100112 - 27 Sep 2024
Abstract
The increasing prevalence of diabetes worldwide, including in Mexico, presents significant challenges to healthcare systems. This has a notable impact on hospital admissions, as diabetes is considered an ambulatory care-sensitive condition, meaning that hospitalizations could be avoided. This is just one example of many challenges faced in the medical and public health fields. Traditional healthcare methods have been effective in managing diabetes and preventing complications. However, they often encounter limitations when it comes to analyzing large amounts of health data to effectively identify and address diseases. This paper aims to bridge this gap by outlining a comprehensive methodology for non-physicians, particularly data scientists, working in healthcare. As a case study, this paper utilizes hospital diabetes discharge records from 2010 to 2023, totaling 36,665,793 records from medical units under the Ministry of Health of Mexico. We aim to highlight the importance for data scientists to understand the problem and its implications. By doing so, insights can be generated to inform policy decisions and reduce the burden of avoidable hospitalizations. The approach primarily relies on stratification and standardization to uncover rates based on sex and age groups. This study provides a foundation for data scientists to approach health data in a new way.
13 pages, 3690 KiB  
Article
Non-Linear Relationship between MiRNA Regulatory Activity and Binding Site Counts on Target mRNAs
by Shuangmei Tian, Ziyu Zhao, Beibei Ren and Degeng Wang
Data 2024, 9(10), 111; https://doi.org/10.3390/data9100111 - 25 Sep 2024
Abstract
MicroRNAs (miRNAs) exert regulatory actions via base pairing with their binding sites on target mRNAs. Cooperative binding, i.e., synergism, among binding sites on an mRNA is biochemically well characterized. We studied whether this synergism is reflected in the global relationship between miRNA-mediated regulatory activity and miRNA binding site count on the target mRNAs, i.e., leading to a non-linear relationship between the two. Recently, using our own and public datasets, we have enquired into miRNA regulatory actions: first, we analyzed the power-law distribution pattern of miRNA binding sites; second, we found that, strikingly, mRNAs for core miRNA regulatory apparatus proteins have extraordinarily high binding site counts, forming self-feedback-control loops; third, we revealed that tumor suppressor mRNAs generally have more sites than oncogene mRNAs; and fourth, we characterized enrichment of miRNA-targeted mRNAs in translationally less active polysomes relative to more active polysomes. In these four studies, we qualitatively observed an obvious positive correlation between the extent to which an mRNA is miRNA-regulated and its binding site count. This paper summarizes the datasets used. We also quantitatively analyzed the correlation by comparative linear and non-linear regression analyses. Non-linear relationships, i.e., accelerating rise of regulatory activity as binding site count increases, fit the data much better, conceivably a transcriptome-level reflection of cooperative binding among miRNA binding sites on a target mRNA. This observation is potentially a guide for integrative quantitative modeling of the miRNA regulatory system.
11 pages, 2193 KiB  
Article
Comprehensive Overview of Long-Term Ecosystem Research Datasets at LTER Site Oberes Stubachtal
by Bernhard Zagel, Hans Wiesenegger, Robert R. Junker and Gerhard Ehgartner
Data 2024, 9(10), 110; https://doi.org/10.3390/data9100110 - 25 Sep 2024
Abstract
This article provides a comprehensive overview of all currently available datasets of the Long-term Ecosystem Research (LTER) site Oberes Stubachtal. The site is located in the Hohe Tauern mountain range (Eastern Alps, Austria) and includes both protected areas (Hohe Tauern National Park) and unprotected areas (Stubach valley). While the main research focus of the site is on high mountains, glaciology, glacial hydrology, and biodiversity, the eLTER Whole-System Approach (WAILS) was used for data selection. This approach involves a systematic screening of all available data to assess their suitability as eLTER Standard Observations (SOs). This includes the geosphere, atmosphere, hydrosphere, biosphere, and sociosphere. These SOs are fundamental to the development of a comprehensive long-term ecosystem research framework. In total, more than 40 datasets have been collated for the LTER site Oberes Stubachtal and included in the Dynamic Ecological Information Management System - Site and Data Registry (DEIMS-SDR), the eLTER’s data platform. This paper provides a detailed inventory of the datasets and their primary attributes, evaluates them against the WAILS-required observation data, and offers insights into strategies for future initiatives. All datasets are made available through dedicated repositories for FAIR (findable, accessible, interoperable, reusable) use.
(This article belongs to the Section Spatial Data Science and Digital Earth)