
Data, Volume 6, Issue 1 (January 2021) – 6 articles

Cover Story: A key enabler of intelligent maintenance systems is the ability to predict the remaining useful lifetime of their components, i.e., prognostics. The development of data-driven prognostic models requires datasets with run-to-failure trajectories. However, large representative run-to-failure datasets are often unavailable in real applications because failures are rare in safety-critical systems. To foster the development of prognostics methods, we provide a new realistic dataset of run-to-failure trajectories for a fleet of aircraft engines. The dataset was generated with the Commercial Modular Aero-Propulsion System Simulation (CMAPSS) model developed at NASA. The damage propagation modelling builds on the modelling strategy from previous work and incorporates two new levels of fidelity: real flight conditions and a degradation process that depends on the operation history.
17 pages, 3387 KiB  
Article
The Hierarchical Classifier for COVID-19 Resistance Evaluation
by Nataliya Shakhovska, Ivan Izonin and Nataliia Melnykova
Data 2021, 6(1), 6; https://doi.org/10.3390/data6010006 - 15 Jan 2021
Cited by 10 | Viewed by 3735
Abstract
Finding dependencies in data requires analyzing the relations between dozens of parameters of the studied process and hundreds of possible sources of influence on it. The dependencies are nondeterministic, so modeling requires statistical methods for analyzing random processes. Part of the information is often hidden from observation or not monitored, which makes the collected information difficult to analyze. The paper aims to find frequent patterns and the parameters affected by COVID-19. The novelty of the paper is a hierarchical architecture that comprises supervised and unsupervised methods. It allows the development of an ensemble of methods based on k-means clustering and classification. The best classifiers in the ensemble are a random forest with 500 trees and XGBoost. Classifying the separated clusters gives accuracy that is 4% higher than analysis of the whole dataset. The proposed approach can also be used for personalized-medicine decision support in other domains. Feature selection identifies the following features as having the highest impact on COVID-19: age, sex, blood group, and a history of influenza.
(This article belongs to the Special Issue Data-Driven Modelling of Infectious Diseases)

14 pages, 2518 KiB  
Data Descriptor
Aircraft Engine Run-to-Failure Dataset under Real Flight Conditions for Prognostics and Diagnostics
by Manuel Arias Chao, Chetan Kulkarni, Kai Goebel and Olga Fink
Data 2021, 6(1), 5; https://doi.org/10.3390/data6010005 - 13 Jan 2021
Cited by 100 | Viewed by 16610
Abstract
A key enabler of intelligent maintenance systems is the ability to predict the remaining useful lifetime (RUL) of their components, i.e., prognostics. The development of data-driven prognostics models requires datasets with run-to-failure trajectories. However, large representative run-to-failure datasets are often unavailable in real applications because failures are rare in many safety-critical systems. To foster the development of prognostics methods, we develop a new realistic dataset of run-to-failure trajectories for a fleet of aircraft engines under real flight conditions. The dataset was generated with the Commercial Modular Aero-Propulsion System Simulation (CMAPSS) model developed at NASA. The damage propagation modelling used in this dataset builds on the modelling strategy from previous work and incorporates two new levels of fidelity. First, it considers real flight conditions as recorded on board a commercial jet. Second, it extends the degradation modelling by relating the degradation process to the engine's operation history. The dataset also provides the health class and the fault class. Therefore, besides its applicability to prognostics problems, the dataset can be used for fault diagnostics.

11 pages, 829 KiB  
Article
No-z Model for Magnetic Fields of Different Astrophysical Objects and Stability of the Solutions
by Evgeny Mikhailov, Daniela Boneva and Maria Pashentseva
Data 2021, 6(1), 4; https://doi.org/10.3390/data6010004 - 10 Jan 2021
Viewed by 2611
Abstract
A wide range of astrophysical objects, such as the Sun, galaxies, stars, planets, and accretion discs, have large-scale magnetic fields. Their generation is often based on the dynamo mechanism, which is connected with the joint action of the alpha-effect and differential rotation. These compete with turbulent diffusion. If the dynamo is intensive enough, the magnetic field grows; otherwise it decays. The magnetic field evolution is described by the Steenbeck–Krause–Raedler equations, which are quite difficult to solve. So, for different objects, specific two-dimensional models are used. For thin discs (a shape that corresponds to galaxies and accretion discs), the no-z approximation is usually used. Some of the partial derivatives are replaced by algebraic expressions, and the solenoidality condition is taken into account as well. The field generation is restricted by the equipartition value and saturates when the field becomes comparable with it. From the point of view of mathematical physics, the saturated field values can be characterized as stable points of the equations. The field can approach these values monotonically or with oscillations, depending on the type of stability of these points, that is, whether the point is a node or a focus. Here, we study the stability of such points and give examples for astrophysical applications.
(This article belongs to the Special Issue Astronomy in the Big Data Era: Perspectives)
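
The node/focus distinction mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a generic two-dimensional system linearized around a fixed point, with Jacobian entries `a`, `b`, `c`, `d`; these names and the sample values are hypothetical and are not taken from the paper's no-z equations.

```python
def classify_fixed_point(a, b, c, d):
    """Classify the fixed point of dx/dt = a*x + b*y, dy/dt = c*x + d*y.

    The Jacobian is [[a, b], [c, d]]. Its eigenvalues solve
    lambda^2 - trace*lambda + det = 0.
    """
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4.0 * det  # sign decides real vs. complex eigenvalues

    if det < 0:
        return "saddle"  # real eigenvalues of opposite sign
    if det == 0:
        return "degenerate"  # at least one zero eigenvalue
    if trace == 0:
        return "center"  # purely imaginary eigenvalues
    stability = "stable" if trace < 0 else "unstable"
    # Real eigenvalues give a monotonic approach (node);
    # complex eigenvalues give an oscillating approach (focus).
    kind = "node" if disc >= 0 else "focus"
    return f"{stability} {kind}"


# Hypothetical Jacobians, chosen only to show both stable cases.
monotonic = classify_fixed_point(-1.0, 0.0, 0.0, -2.0)
oscillating = classify_fixed_point(-1.0, -2.0, 2.0, -1.0)
print(monotonic, "|", oscillating)
```

A stable node corresponds to a field that approaches its saturated value monotonically, and a stable focus to one that approaches it with decaying oscillations.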

14 pages, 1002 KiB  
Data Descriptor
Drugs, Active Ingredients and Diseases Database in Spanish. Augmenting the Resources for Analyses on Drug–Illness Interactions
by Irene López-Rodríguez, César F. Reyes-Manzano, Israel Reyes-Ramírez, Tania J. Contreras-Uribe and Lev Guzmán-Vargas
Data 2021, 6(1), 3; https://doi.org/10.3390/data6010003 - 9 Jan 2021
Cited by 1 | Viewed by 4552
Abstract
Quantitative and qualitative data on active-ingredient drug composition are essential for characterizing near-field exposure of consumers to product-related chemicals, among other things. Equally important is the characterization of the relationship between one or many active ingredients in terms of the diseases they are prescribed for. Such evaluations, however, require quantitative information at different anatomical levels. To complement the available sources of information on active substances and diseases, we have designed a database with enough versatility to potentially be used in a variety of analyses. Using information provided by a well-established online pharmacological dictionary, we present a database with 11 tables that are easy to access and manipulate. Specifically, we present datasets containing the details of 12,827 marketed drug products, 40,164 diseases, 6231 active pharmaceutical ingredients and 4093 side effects. We exemplify the usefulness of our database with three simple visualizations, which confirm the importance of the data for quantifying the complexity of the associations among active substances, diseases and side effects. Although there are databases with detailed information on active substances and diseases, none of them is available in Spanish. Our work presents an option that contributes substantially to obtaining well-classified information for evaluating the roles of active pharmaceutical ingredients, diseases and side effects. These datasets also provide information about clinical and pharmacological groupings, which may be useful for clinical and academic researchers. The database will be regularly updated and extended with newly available Virtual Medicinal Products.
(This article belongs to the Section Chemoinformatics)

20 pages, 3212 KiB  
Article
The Use of National Strategic Reference Framework Data in Knowledge Graphs and Data Mining to Identify Red Flags
by Charalampos Bratsas, Evangelos Chondrokostas, Kleanthis Koupidis and Ioannis Antoniou
Data 2021, 6(1), 2; https://doi.org/10.3390/data6010002 - 4 Jan 2021
Cited by 9 | Viewed by 3714
Abstract
Red Flags in fiscal projects are warning signs that may indicate underlying problems with their implementation. In this paper, we present how National Strategic Reference Framework (NSRF) Open Data can be used with semantic web technologies and data mining techniques to build a knowledge-based system that identifies Red Flags. We collected the data from the Open Data API provided by the Greek Ministry of Economy and Finance. The data model consists of two ontologies: the Vocabulary of Fiscal Projects, which describes the fiscal projects, and the National Strategic Reference Framework Greece Vocabulary, which describes the Greek NSRF data. We transformed the data into RDF triples and uploaded them to an OpenLink Virtuoso Server, so that we could retrieve them via SPARQL queries. Performance indicators were defined to assess the state of each project, and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) was used to identify Red Flags. The user requirement is that rejected projects should raise Red Flags, in order to avoid project failure and to help the auditor organize the monitoring process efficiently without having to examine most of the non-problematic projects. We performed a use case scenario in which an auditor has to examine NSRF projects approximately 12 months before the end of the programming period. The system retrieved the fiscal information, calculated the performance indicators and identified the Red Flags. We then retrieved the last update of the project statuses after the end of the programming period and extracted the number of rejected projects, to test whether the user requirements were satisfied. Rejected projects make up 3.8% of all projects. The results of the use case scenario show that the RedFlags platform tends to identify project failures without raising Red Flags on projects that were not rejected. Therefore, the RedFlags platform, using open data, helps the auditor organize the monitoring process better.
(This article belongs to the Special Issue Challenges in Business Intelligence)
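
As a rough illustration of the DBSCAN step named in the abstract above, the following sketch clusters projects by two performance indicators and treats DBSCAN noise points as candidate Red Flags. The abstract does not say how Red Flags are derived from the clustering, so the noise-point rule, the indicator values, and the `eps` and `min_pts` settings are all assumptions made for this example.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = noise  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels


# Hypothetical (indicator_1, indicator_2) values for five projects.
indicators = [(0.90, 0.90), (0.92, 0.88), (0.88, 0.91), (0.91, 0.93), (0.10, 0.20)]
labels = dbscan(indicators, eps=0.1, min_pts=3)

# Assumption: a project that belongs to no dense cluster is a candidate Red Flag.
red_flags = [i for i, label in enumerate(labels) if label == -1]
print(labels, red_flags)
```

Here the first four projects form one dense cluster and the fifth, which sits far from the others, is labelled as noise and flagged.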

20 pages, 1380 KiB  
Article
OFCOD: On the Fly Clustering Based Outlier Detection Framework
by Ahmed Elmogy, Hamada Rizk and Amany M. Sarhan
Data 2021, 6(1), 1; https://doi.org/10.3390/data6010001 - 30 Dec 2020
Cited by 16 | Viewed by 3244
Abstract
In data mining, outlier detection is a major challenge because it plays an important role in many applications, such as medical data, image processing, fraud detection, intrusion detection, and so forth. A wide variety of clustering-based approaches have been developed to detect outliers. However, they are by nature time-consuming, which restricts their use in real-time applications. Furthermore, outlier detection requests are handled one at a time, which means that each request is initiated individually with a particular set of parameters. In this paper, the first clustering-based outlier detection framework, On the Fly Clustering Based Outlier Detection (OFCOD), is presented. OFCOD enables analysts to find outliers effectively, in time and on request, even within huge datasets. The proposed framework has been tested and evaluated using two real-world datasets with different features and applications: one with 699 records and another with five million records. The experimental results show that the proposed framework outperforms other existing approaches across several evaluation metrics.
(This article belongs to the Section Information Systems and Data Management)
