Data Analysis and Mining: New Techniques and Applications

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 August 2024) | Viewed by 12649

Special Issue Editor


Guest Editor
College of Computer Science & Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Interests: data mining; social network analysis; multimodal learning; graph data analysis; time series analysis

Special Issue Information

Dear Colleagues,

Learning hierarchical representations and finding useful patterns in data with differentiable models trained end-to-end has been among the greatest developments in data mining to date. Beyond its applications in traditional research fields such as computer vision, natural language processing, and recommendation systems, this data-driven approach shows great potential at the intersection of AI and science. From protein structure prediction to quantum artificial intelligence, data mining techniques are providing remarkable insight into data and have assisted in the discovery of scientific laws in various domains, contributing to a new research paradigm called AI for science.

Even though artificial general intelligence (AGI) is far from being realized, mining scientific data still finds many intriguing applications. Recent applications include, but are not restricted to, quantum physics, computational chemistry, molecular biology, fluid dynamics, software engineering, and other disciplines. This Special Issue invites the submission of papers presenting innovative ideas either in data mining algorithms or in their application to a specific research field. To facilitate the application of data mining technology and accelerate its industrial adoption, papers that present data mining tools for a specific domain are also welcome.

Dr. Donghai Guan
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • data mining
  • time series analysis
  • multimodal learning
  • social network analysis
  • classification
  • clustering
  • graph data analysis

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (8 papers)


Research

21 pages, 2515 KiB  
Article
Online Self-Learning-Based Raw Material Proportioning for Rotary Hearth Furnace and Intelligent Batching System Development
by Xianxia Zhang, Lufeng Wang, Shengjie Tang, Chang Zhao and Jun Yao
Appl. Sci. 2024, 14(19), 9126; https://doi.org/10.3390/app14199126 - 9 Oct 2024
Viewed by 753
Abstract
With the increasing awareness of environmental protection, the rotary hearth furnace system has emerged as a key technology that facilitates a win-win situation for both environmental protection and enterprise economic benefits. This is attributed to its high flexibility in raw material utilization, capability of directly supplying blast furnaces, low energy consumption, and high zinc removal rate. However, the complexity of the raw material proportioning process coupled with the rotary hearth furnace system’s reliance on human labor results in a time-consuming and inefficient process. This paper innovatively introduces an intelligent formula method for proportioning raw materials based on online clustering algorithms and develops an intelligent batching system for rotary hearth furnaces. Firstly, the ingredients of raw materials undergo data preprocessing, which involves using the local outlier factor (LOF) method to detect any abnormal values, using Kalman filtering to smooth the data, and performing one-hot encoding to represent the different kinds of raw materials. Afterwards, the affinity propagation (AP) clustering method is used to evaluate past data on the ingredients of raw materials and their ratios. This analysis aims to extract information based on human experience with ratios and create a library of machine learning formulas. The incremental AP clustering algorithm is utilized to learn new ratio data and continuously update the machine learning formula library. To ensure that the formula meets the actual production performance requirements of the rotary hearth furnace, the machine learning formula is fine-tuned based on expert experience. The integration of machine learning and expert experience demonstrates good flexibility and satisfactory performance in the practical application of intelligent formulas for rotary hearth furnaces. An intelligent batching system is developed and executed at a steel plant in China. 
The system provides a well-designed user interface and significantly enhances batching efficiency and product quality. Full article
(This article belongs to the Special Issue Data Analysis and Mining: New Techniques and Applications)
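The preprocessing pipeline the abstract describes (smoothing noisy measurements, encoding categorical raw materials) can be sketched in miniature. The snippet below is an illustrative 1-D Kalman smoother and a one-hot encoder with made-up noise parameters and data; it is not the authors' implementation.

```python
# Sketch of two preprocessing steps from the abstract: a 1-D Kalman filter
# to smooth noisy ingredient measurements, and one-hot encoding of
# raw-material categories. Parameter values q and r are illustrative.

def kalman_smooth(series, q=1e-3, r=0.25):
    """Smooth a 1-D series with a constant-state Kalman filter.
    q: process noise variance, r: measurement noise variance."""
    x, p = series[0], 1.0          # initial state estimate and its variance
    out = [x]
    for z in series[1:]:
        p += q                     # predict: variance grows by process noise
        k = p / (p + r)            # Kalman gain
        x += k * (z - x)           # update state toward the measurement
        p *= (1 - k)               # shrink variance after the update
        out.append(x)
    return out

def one_hot(categories):
    """One-hot encode a list of category labels (stable sorted ordering)."""
    vocab = sorted(set(categories))
    index = {c: i for i, c in enumerate(vocab)}
    vectors = [[1 if index[c] == i else 0 for i in range(len(vocab))]
               for c in categories]
    return vectors, vocab

smoothed = kalman_smooth([10.0, 10.4, 9.8, 15.0, 10.1])  # 15.0 is an outlier
vectors, vocab = one_hot(["dust", "sludge", "dust"])
```

Note how the filter damps the outlier at index 3 rather than tracking it exactly, which is the behavior the batching pipeline relies on before clustering.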

24 pages, 4787 KiB  
Article
Hierarchical Indexing and Compression Method with AI-Enhanced Restoration for Scientific Data Service
by Biao Song, Yuyang Fang, Runda Guan, Rongjie Zhu, Xiaokang Pan and Yuan Tian
Appl. Sci. 2024, 14(13), 5528; https://doi.org/10.3390/app14135528 - 25 Jun 2024
Viewed by 1057
Abstract
In the process of data services, compressing and indexing data can reduce storage costs, improve query efficiency, and thus enhance the quality of data services. However, different service requirements have diverse demands for data precision. Traditional lossy compression techniques fail to meet the precision requirements of different data due to their fixed compression parameters and schemes. Additionally, error-bounded lossy compression techniques, due to their tightly coupled design, cannot achieve high compression ratios under high precision requirements. To address these issues, this paper proposes a lossy compression technique based on error control. Instead of imposing precision constraints during compression, this method first uses the JPEG compression algorithm for multi-level compression and then manages data through a tree-based index structure to achieve error control. This approach satisfies error control requirements while effectively avoiding tight coupling. Additionally, this paper enhances data restoration effects using a deep learning network and provides a range query processing algorithm for the tree-based index to improve query efficiency. We evaluated our solution using ocean data. Experimental results show that, while maintaining data precision requirements (PSNR of at least 39 dB), our compression ratio can reach 64, which is twice that of the SZ compression algorithm. Full article
(This article belongs to the Special Issue Data Analysis and Mining: New Techniques and Applications)
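The decoupled error-control idea in the abstract can be illustrated with a toy: store a block at several fixed compression levels, then answer a query from the coarsest level whose maximum error satisfies the caller's bound. Simple quantization stands in for JPEG here, and all values are invented; this is a sketch of the concept, not the paper's system.

```python
# Multi-level lossy storage with error control applied at query time
# rather than at compression time (illustrative sketch).

def quantize(values, step):
    """Lossy 'compression': round each value to a multiple of step."""
    return [round(v / step) * step for v in values]

def max_error(orig, approx):
    return max(abs(a - b) for a, b in zip(orig, approx))

def build_levels(values, steps=(8.0, 4.0, 1.0, 0.25)):
    """Precompute approximations at several levels, coarsest first."""
    return [(s, quantize(values, s)) for s in steps]

def query(levels, values, error_bound):
    """Serve the coarsest stored level meeting the error bound."""
    for step, approx in levels:            # coarsest (cheapest) first
        if max_error(values, approx) <= error_bound:
            return step, approx
    return levels[-1]                      # finest level as a fallback

data = [3.2, 7.9, 12.4, 5.5]
levels = build_levels(data)
step, approx = query(levels, data, error_bound=0.5)
```

Because precision is checked per request, one stored hierarchy can serve many services with different accuracy demands, which is the decoupling the abstract argues for.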

16 pages, 370 KiB  
Article
Pairwise Likelihood Estimation of the 2PL Model with Locally Dependent Item Responses
by Alexander Robitzsch
Appl. Sci. 2024, 14(6), 2652; https://doi.org/10.3390/app14062652 - 21 Mar 2024
Cited by 2 | Viewed by 1016
Abstract
The local independence assumption is crucial for the consistent estimation of item parameters in item response theory models. This article explores a pairwise likelihood estimation approach for the two-parameter logistic (2PL) model that treats the local dependence structure as a nuisance in the optimization function. Hence, item parameters can be consistently estimated without explicit modeling assumptions of the dependence structure. Two simulation studies demonstrate that the proposed pairwise likelihood estimation approach allows nearly unbiased and consistent item parameter estimation. Our proposed method performs similarly to the marginal maximum likelihood and pairwise likelihood estimation approaches, which also estimate the parameters for the local dependence structure. Full article
(This article belongs to the Special Issue Data Analysis and Mining: New Techniques and Applications)
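For readers unfamiliar with the notation, the standard 2PL item response function and the pairwise log-likelihood are shown below in textbook form (not necessarily the article's exact notation):

```latex
% 2PL item response function: discrimination a_i, difficulty b_i
P(X_i = 1 \mid \theta) = \Psi\bigl(a_i(\theta - b_i)\bigr),
\qquad \Psi(z) = \frac{1}{1 + \exp(-z)}

% Pairwise log-likelihood: sum bivariate log-probabilities over item
% pairs, so only pairwise (not full joint) modeling assumptions are needed
\ell_{\mathrm{P}}(\boldsymbol{\gamma})
  = \sum_{i < j} \log P\bigl(X_i = x_i,\, X_j = x_j;\ \boldsymbol{\gamma}\bigr)
```

Replacing the full joint likelihood with the pairwise product is what lets the approach treat the local dependence structure as a nuisance rather than modeling it explicitly.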
27 pages, 3097 KiB  
Article
A Methodology for Knowledge Discovery in Labeled and Heterogeneous Graphs
by Víctor H. Ortega-Guzmán, Luis Gutiérrez-Preciado, Francisco Cervantes and Mildreth Alcaraz-Mejia
Appl. Sci. 2024, 14(2), 838; https://doi.org/10.3390/app14020838 - 18 Jan 2024
Viewed by 1218
Abstract
Graph mining has emerged as a significant field of research with applications spanning multiple domains, including marketing, corruption analysis, business, and politics. The exploration of knowledge within graphs has garnered considerable attention due to the exponential growth of graph-modeled data and its potential in applications where data relationships are a crucial component, potentially even more important than the data themselves. However, the increasing use of graphs for data storage and modeling presents unique challenges that have prompted advancements in graph mining algorithms, data modeling and storage, query languages for graph databases, and data visualization techniques. Although various methodologies for data analysis exist, they predominantly focus on structured data and may not be optimally suited to highly connected data. Accordingly, this work introduces a novel methodology specifically tailored for knowledge discovery in labeled and heterogeneous graphs (KDG), and it presents three case studies demonstrating its successful application in addressing various challenges across different application domains. Full article
(This article belongs to the Special Issue Data Analysis and Mining: New Techniques and Applications)

20 pages, 4842 KiB  
Article
SbMBR Tree—A Spatiotemporal Data Indexing and Compression Algorithm for Data Analysis and Mining
by Runda Guan, Ziyu Wang, Xiaokang Pan, Rongjie Zhu, Biao Song and Xinchang Zhang
Appl. Sci. 2023, 13(19), 10562; https://doi.org/10.3390/app131910562 - 22 Sep 2023
Cited by 1 | Viewed by 1114
Abstract
In the field of data analysis and mining, applying efficient indexing and compression techniques to spatiotemporal data can significantly reduce computational and storage overhead by controlling the volume of data and exploiting its spatiotemporal characteristics. However, traditional lossy compression techniques are hardly suitable due to their inherently random nature: they often inflict unpredictable damage on scientific data, which affects the results of data mining and analysis tasks that require a certain precision. In this paper, we propose the similarity-based minimum bounding rectangle (SbMBR) tree, a tree-based indexing and compression method, to address this problem. Our method hierarchically selects appropriate minimum bounding rectangles (MBRs) according to the given maximum acceptable errors and uses the average value contained in each selected MBR to replace the original data, achieving data compression with multi-layer loss control. This paper also provides the corresponding tree construction and range query processing algorithms for this indexing structure. To evaluate data quality preservation in cross-domain data analysis and mining scenarios, we use mutual information as the estimation metric. Experimental results demonstrate the superiority of our method over several typical indexing and compression algorithms. Full article
(This article belongs to the Special Issue Data Analysis and Mining: New Techniques and Applications)
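A 1-D analogue can illustrate the average-replacement idea with bounded loss: grow a segment greedily while every value stays within a given error bound of the segment mean, then store only (start, length, mean). This is an illustrative sketch with invented data, not the paper's MBR-tree algorithm.

```python
# Greedy 1-D segmentation with per-segment error control: each stored
# segment's values deviate from its mean by at most eps, so decompression
# error is bounded by eps everywhere.

def compress_segments(values, eps):
    """Return a list of (start, length, mean) segments covering `values`."""
    segments = []
    start = 0
    while start < len(values):
        end = start + 1
        while end < len(values):
            candidate = values[start:end + 1]
            mean = sum(candidate) / len(candidate)
            if max(abs(v - mean) for v in candidate) > eps:
                break                      # extending would violate the bound
            end += 1
        seg = values[start:end]
        segments.append((start, end - start, sum(seg) / len(seg)))
        start = end
    return segments

def decompress(segments):
    """Rebuild the series by repeating each segment's mean."""
    out = []
    for _, length, mean in segments:
        out.extend([mean] * length)
    return out

data = [1.0, 1.1, 0.9, 5.0, 5.2, 9.9]
segs = compress_segments(data, eps=0.3)
restored = decompress(segs)
```

Six values collapse to three (start, length, mean) triples here, while every reconstructed value stays within eps of the original, which is the multi-layer loss-control property in one dimension.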

30 pages, 5836 KiB  
Article
Machine Learning Ensemble Modelling for Predicting Unemployment Duration
by Barbora Gabrikova, Lucia Svabova and Katarina Kramarova
Appl. Sci. 2023, 13(18), 10146; https://doi.org/10.3390/app131810146 - 8 Sep 2023
Cited by 5 | Viewed by 2653
Abstract
Predictions of the unemployment duration of the economically active population play a crucial assisting role for policymakers and employment agencies in the well-organised allocation of resources (tied to solving problems of the unemployed, whether on the labour supply or demand side) and providing targeted support to jobseekers in their job search. This study aimed to develop an ensemble model that can serve as a reliable tool for predicting unemployment duration among jobseekers in Slovakia. The ensemble model was developed using real data from the database of jobseekers (those registered as unemployed and actively searching for a job through the Local Labour Office, Social Affairs, and Family) using the stacking method, incorporating predictions from three individual models: CART, CHAID, and discriminant analysis. The final meta-model was created using logistic regression and indicates an overall accuracy of the prediction of unemployment duration of almost 78%. This model demonstrated high accuracy and precision in identifying jobseekers at risk of long-term unemployment exceeding 12 months. The presented model, working with real data of a robust nature, represents an operational tool that can be used to check the functionality of the current labour market policy and to solve the problem of long-term unemployed individuals in Slovakia, as well as in the creation of future government measures aimed at solving the problem of unemployment. The measures from the state are financed from budget funds, and by applying the appropriate model, it is possible to arrive at the rationalization of the financing of these measures, or to specifically determine the means intended to solve the problem of long-term unemployment in Slovakia (this, together with the regional disproportion of unemployment, is considered one of the most prominent problems in the labour market in Slovakia). 
The model also has the potential to be adapted in other economies, taking into account country-specific conditions and variables, which is possible due to the data-mining approach used. Full article
(This article belongs to the Special Issue Data Analysis and Mining: New Techniques and Applications)
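The stacking structure the abstract describes can be sketched in a few lines: base-model predictions become the features of a logistic-regression meta-model. The base predictions and labels below are invented, and plain gradient descent stands in for a statistics package; the article's actual base learners are CART, CHAID, and discriminant analysis.

```python
# Minimal stacking sketch: a logistic-regression meta-model trained on
# (hypothetical) probability outputs of three base classifiers.

import math

def train_meta(base_preds, labels, lr=0.5, epochs=200):
    """Fit logistic regression on stacked base-model predictions via SGD."""
    w = [0.0] * len(base_preds[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(base_preds, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y                                  # log-loss gradient
            b -= lr * g
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
    return w, b

def predict(w, b, x):
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else 0

# Each row: three base models' probabilities that a jobseeker stays
# unemployed longer than 12 months (fabricated for illustration).
stacked = [[0.9, 0.8, 0.7], [0.2, 0.1, 0.3], [0.8, 0.9, 0.6], [0.1, 0.2, 0.2]]
labels = [1, 0, 1, 0]
w, b = train_meta(stacked, labels)
preds = [predict(w, b, x) for x in stacked]
```

In practice the base predictions fed to the meta-model should be out-of-fold to avoid leakage; this sketch skips that step for brevity.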

14 pages, 2476 KiB  
Article
Effects of the Hybrid CRITIC–VIKOR Method on Product Aspect Ranking in Customer Reviews
by Saif Addeen Ahmad Alrababah and Keng Hoon Gan
Appl. Sci. 2023, 13(16), 9176; https://doi.org/10.3390/app13169176 - 11 Aug 2023
Cited by 4 | Viewed by 1487
Abstract
Product aspect ranking is critical for prioritizing the most important aspects of a specific product/service to assist probable customers in selecting suitable products that can realize their needs. However, given the voluminous customer reviews published on websites, customers are hindered from manually extracting and characterizing the specific aspects of searched products. A few multicriteria decision-making methods have been implemented to rank the most relevant product aspects. As weights greatly affect the ranking results of product aspects, this study used objective methods in finding the importance degree of a criteria set to overcome the limitations of subjective weighting. The growing popularity of online shopping has led to an exponential increase in the number of customer reviews available on various e-commerce websites. The sheer volume of these reviews makes it nearly impossible for customers to manually extract and analyze the specific aspects of the products they are interested in. This challenge highlights the need for automated techniques that can efficiently rank the product aspects based on their relevance and importance. Multicriteria decision-making techniques can address the issue of product aspect ranking. These techniques seek to offer a methodical strategy for assessing and contrasting various product attributes based on various criteria. The subjective nature of determining weights for each criterion raises serious issues because it might lead to bias and inconsistent ranking outcomes. The CRITIC–VIKOR method was adopted in the product aspect ranking process. The statistical findings based on a benchmark dataset using NDCG demonstrate the superior performance of the method of using objective weighting to reasonably acquire subjective weighting results. Also, the results show that the product aspects ranked by using CRITIC–VIKOR could be considered guidelines for probable customers to make a wise purchasing decision. Full article
(This article belongs to the Special Issue Data Analysis and Mining: New Techniques and Applications)
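The objective-weighting step can be sketched with the standard CRITIC formulation: each criterion's weight combines its contrast (standard deviation) with its conflict (one minus correlation) against the other criteria. The aspect-score matrix below is invented, and the article pairs these weights with VIKOR for the final ranking.

```python
# Standard CRITIC weighting: w_j proportional to sigma_j * sum_k(1 - r_jk),
# computed here from a small made-up matrix of normalized aspect scores.

import math

def critic_weights(matrix):
    """matrix[i][j] = normalized score of alternative i on criterion j."""
    n, m = len(matrix), len(matrix[0])
    cols = [[row[j] for row in matrix] for j in range(m)]
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - mu) ** 2 for v in c) / n)
            for c, mu in zip(cols, means)]

    def corr(a, b, ma, mb, sa, sb):
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
        return cov / (sa * sb) if sa and sb else 0.0

    info = []
    for j in range(m):
        # conflict: how much criterion j disagrees with the others
        conflict = sum(1 - corr(cols[j], cols[k], means[j], means[k],
                                stds[j], stds[k]) for k in range(m))
        info.append(stds[j] * conflict)   # contrast * conflict
    total = sum(info)
    return [c / total for c in info]      # normalize to sum to 1

scores = [[0.9, 0.2, 0.5],   # rows: aspects, columns: criteria
          [0.4, 0.8, 0.6],
          [0.6, 0.5, 0.1]]
weights = critic_weights(scores)
```

Because the weights derive entirely from the score matrix, no analyst judgment enters the weighting, which is the bias-avoidance argument the abstract makes against subjective weighting.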

24 pages, 4667 KiB  
Article
A Dynamic Grid Index for CkNN Queries on Large-Scale Road Networks with Moving Objects
by Kailei Tang, Zhiyan Dong, Wenxiang Shi and Zhongxue Gan
Appl. Sci. 2023, 13(8), 4946; https://doi.org/10.3390/app13084946 - 14 Apr 2023
Viewed by 1549
Abstract
As the Internet of Things devices are deployed on a large scale, location-based services are being increasingly utilized. Among these services, kNN (k-nearest neighbor) queries based on road network constraints have gained importance. This study focuses on the CkNN (continuous k-nearest neighbor) queries for non-uniformly distributed moving objects with large-scale dynamic road network constraints, where CkNN objects are continuously and periodically queried based on their motion evolution. The present CkNN high-concurrency query under the constraints of a super-large road network faces problems, such as high computational cost and low query efficiency. The aim of this study is to ensure high concurrency nearest neighbor query requests while shortening the query response time and reducing global computation costs. To address this issue, we propose the DVTG-Index (Dynamic V-Tree Double-Layer Grid Index), which intelligently adjusts the index granularity by continuously merging and splitting subgraphs as the objects move, thereby filtering unnecessary vertices. Based on DVTG-Index, we further propose the DVTG-CkNN algorithm to calculate the initial kNN query and utilize the existing results to speed up the CkNN query. Finally, extensive experiments on real road networks confirm the superior performance of our proposed method, which has significant practical applications in large-scale dynamic road network constraints with non-uniformly distributed moving objects. Full article
(This article belongs to the Special Issue Data Analysis and Mining: New Techniques and Applications)
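A toy uniform-grid index conveys the cell-based filtering idea behind grid indexing for kNN: objects hash into cells, and a query expands rings of cells outward until the k-th candidate is provably inside the searched region. The paper's DVTG-Index additionally merges and splits subgraphs dynamically on a road network; everything below, including the coordinates, is invented for illustration.

```python
# Uniform-grid spatial index with ring-expansion kNN (Euclidean toy,
# not road-network distance).

import math
from collections import defaultdict

class GridIndex:
    def __init__(self, cell=1.0):
        self.cell = cell
        self.cells = defaultdict(list)    # (cx, cy) -> [(id, x, y)]

    def _key(self, x, y):
        return (int(math.floor(x / self.cell)),
                int(math.floor(y / self.cell)))

    def insert(self, oid, x, y):
        self.cells[self._key(x, y)].append((oid, x, y))

    def _all(self):
        return [o for objs in self.cells.values() for o in objs]

    def knn(self, x, y, k):
        cx, cy = self._key(x, y)
        radius = 0
        while True:
            # gather objects within `radius` cells (Chebyshev) of the query
            found = sorted(
                (math.hypot(ox - x, oy - y), oid)
                for (gx, gy), objs in self.cells.items()
                if max(abs(gx - cx), abs(gy - cy)) <= radius
                for oid, ox, oy in objs)
            # safe to stop once the k-th distance fits inside the ring,
            # or when every indexed object has been examined
            if (len(found) >= k and found[k - 1][0] <= radius * self.cell) \
                    or len(found) == len(self._all()):
                return [oid for _, oid in found[:k]]
            radius += 1

idx = GridIndex(cell=1.0)
idx.insert("a", 0.2, 0.2)
idx.insert("b", 0.9, 0.9)
idx.insert("c", 3.5, 3.5)
nearest = idx.knn(0.0, 0.0, 2)
```

Pruning whole cells before computing any distances is what makes grid-style indexes attractive for continuous queries: as objects move, only the affected cells need updating.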
