Big Data Engineering and Application

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (29 February 2024) | Viewed by 7169

Special Issue Editors

Associate Professor, School of Physics and Electronics, Central South University, Changsha 410017, China
Interests: big data, machine learning, data mining

Special Issue Information

Dear Colleagues,

We are inviting submissions to the Special Issue on Big Data Engineering and Application. 

Big data is at the forefront of current research and provides comprehensive, effective methods of data and information analysis for artificial intelligence and for many fields of intelligent decision-making. In this Special Issue, we invite submissions exploring cutting-edge research and recent advances in the fields of Big Data Engineering and Application. Both theoretical and experimental studies are welcome, as well as comprehensive reviews and survey papers.

The main topics include computing models, algorithms, frameworks, and related applications, as well as the optimization and application of machine learning theory in big data.

Dr. Linzi Yin
Dr. Zhiwen Chen
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • big data and data mining
  • big data and machine learning
  • big data engineering and application
  • big data-driven decision modeling
  • high performance big data learning architecture, algorithm and system

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (4 papers)

Research

12 pages, 4034 KiB  
Article
An Analysis of the 2008 Ms 8.0 Wenchuan Earthquake’s Aftershock Activity
by Haoyu Wu, Weijin Xu and Xia Wang
Appl. Sci. 2024, 14(11), 4754; https://doi.org/10.3390/app14114754 - 31 May 2024
Viewed by 409
Abstract
We investigated the magnitude–frequency relationship and decay pattern of an aftershock sequence using data from the 2008 Wenchuan earthquake. We analyzed the spatial variations in aftershock activity parameters b and p. The calculated b-value of the aftershock sequence is 0.89 ± 0.02, which is relatively small, probably owing to the absence of small earthquakes in the aftershock catalog. The p-value, indicating the decay rate of aftershock activity, is 1.05 ± 0.02, which is normal. The decay pattern of the Wenchuan aftershock sequence agrees well with the modified Omori law. Spatially, the b-value of the aftershock sequence varies mainly between 0.6 and 1.2, and the p-value varies between 0.6 and 1.8. Although the physical significance of the spatial variations in b- and p-values has not been clearly defined, in this study, the physical significance of the b-value is mainly related to changes in stress, P-wave velocity, and the density of media in the earthquake area, and that of the p-value is associated with the fault slip amount during the mainshock; the b- and p-values show a strong linear correlation. After the mainshock, stress decreased and increased in areas with large and small b-values, respectively; the regions with large and small b-values were associated with low and high P-wave velocities, respectively. The subsurface media experienced relatively high and low apparent velocities in areas with small and large b-values, respectively. The amount of fault slip was small and large in regions with small and large p-values, respectively, exhibiting a linear correlation between the fault slip amount and p-value. The results indicate that the spatial variations in the b- and p-values were related to the physical properties of the media in the earthquake area and focal earthquake mechanism.
(This article belongs to the Special Issue Big Data Engineering and Application)
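
The two parameters discussed in the abstract above can be illustrated with a minimal sketch. It assumes the b-value is estimated with the Aki maximum-likelihood formula (with Utsu's half-bin correction) and that the decay follows the modified Omori law n(t) = K / (t + c)^p; the abstract does not state which estimator the authors used, and the function names and arguments here are hypothetical.

```python
import math


def b_value_mle(magnitudes, completeness_mag, bin_width=0.0):
    """Aki maximum-likelihood b-value for events at or above the completeness magnitude.

    bin_width applies Utsu's half-bin correction for binned catalogs (assumption:
    the caller knows the catalog's magnitude bin size; 0.0 means no correction).
    """
    mags = [m for m in magnitudes if m >= completeness_mag]
    if len(mags) < 2:
        raise ValueError("need at least two events above the completeness magnitude")
    mean_mag = sum(mags) / len(mags)
    denom = mean_mag - (completeness_mag - bin_width / 2.0)
    if denom <= 0:
        raise ValueError("mean magnitude must exceed the corrected completeness magnitude")
    return math.log10(math.e) / denom


def omori_rate(t, k, c, p):
    """Modified Omori law: aftershock rate at time t after the mainshock."""
    return k / (t + c) ** p
```

A larger p makes the rate fall off faster with time, and a catalog missing small events raises the mean magnitude above the cutoff and therefore lowers the estimated b, which is consistent with the explanation the abstract offers for the relatively small b-value.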

20 pages, 5161 KiB  
Article
Anomaly Detection and Identification Method for Shield Tunneling Based on Energy Consumption Perspective
by Min Hu, Fan Zhang and Huiming Wu
Appl. Sci. 2024, 14(5), 2202; https://doi.org/10.3390/app14052202 - 6 Mar 2024
Viewed by 976
Abstract
Various abnormal scenarios might occur during the shield tunneling process, which have an impact on construction efficiency and safety. Existing research on shield tunneling construction anomaly detection typically designs models based on the characteristics of a specific anomaly, so the scenarios of anomalies that can be detected are limited. Therefore, the research objective of this article is to establish an accurate anomaly detection model with generalization and identification capabilities on multiple types of abnormal scenarios. Inspired by energy dissipation theory, this paper innovatively detects various anomalies in the shield tunneling process from the perspective of energy consumption and designs the AD_SI model (Anomaly Detection and Scenario Identification model of shield tunneling) based on machine learning. The AD_SI model first monitors the shield machine’s energy consumption status based on the VAE-LSTM (Variational Autoencoder–Long Short-Term Memory) algorithm with a dynamic threshold, thereby detecting abnormal sections. Secondly, the AD_SI model uses the correlation of construction parameters to represent different known scenarios and further clarifies scenarios of the abnormal sections, thus achieving anomaly identification. The application of the AD_SI model in a shield tunneling construction project demonstrates its capability to accurately detect and identify different anomalies, with a recall value exceeding 0.9 and F1 exceeding 0.8, thereby providing guidance for accurately detecting multiple types of anomaly scenarios in practical applications.
(This article belongs to the Special Issue Big Data Engineering and Application)
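
The "dynamic threshold" step mentioned in the abstract above can be pictured with a small sketch. It is not the AD_SI model: the VAE-LSTM is omitted and its reconstruction errors are taken as given, and the threshold rule (rolling mean plus k standard deviations over a trailing window) is an assumption, since the abstract does not specify one. The function name and parameters are hypothetical.

```python
from statistics import mean, pstdev


def dynamic_threshold_flags(errors, window=5, k=3.0):
    """Flag points whose error exceeds a rolling mean + k * std threshold.

    errors: per-step reconstruction errors (assumed to come from some model).
    The threshold at step i is computed only from the previous `window` errors,
    so it adapts as the normal operating level drifts.
    """
    flags = []
    for i, err in enumerate(errors):
        history = errors[max(0, i - window):i]
        if len(history) < 2:
            # Not enough history to estimate a spread; treat as normal.
            flags.append(False)
            continue
        threshold = mean(history) + k * pstdev(history)
        flags.append(err > threshold)
    return flags
```

Because an anomalous value enters the history of later steps, it temporarily widens the threshold; a production detector might exclude flagged points from the window.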

18 pages, 2614 KiB  
Article
A Fast Parallel Random Forest Algorithm Based on Spark
by Linzi Yin, Ken Chen, Zhaohui Jiang and Xuemei Xu
Appl. Sci. 2023, 13(10), 6121; https://doi.org/10.3390/app13106121 - 17 May 2023
Cited by 1 | Viewed by 2761
Abstract
To improve the computational efficiency and classification accuracy in the context of big data, an optimized parallel random forest algorithm is proposed based on the Spark computing framework. First, a new Gini coefficient is defined to reduce the impact of feature redundancy for higher classification accuracy. Next, to reduce the number of candidate split points and Gini coefficient calculations for continuous features, an approximate equal-frequency binning method is proposed to determine the optimal split points efficiently. Finally, based on the Apache Spark computing framework, the forest sampling index (FSI) table is defined to speed up the parallel training process of decision trees and reduce data communication overhead. Experimental results show that the proposed algorithm improves the efficiency of constructing random forests while ensuring classification accuracy, and is superior to Spark-MLRF in terms of performance and scalability.
(This article belongs to the Special Issue Big Data Engineering and Application)
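
To make the binning idea in the abstract above concrete, here is a minimal single-machine sketch. It uses the standard Gini impurity and plain equal-frequency binning, not the paper's redefined Gini coefficient, its approximate binning variant, or the FSI table, none of which are specified in the abstract. Function names are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Standard Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def equal_frequency_split_points(values, num_bins):
    """Candidate split points that cut the sorted values into roughly equal-sized bins.

    Returns at most num_bins - 1 thresholds instead of one per distinct value,
    which is what reduces the number of impurity evaluations per feature.
    """
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for b in range(1, num_bins):
        idx = b * n // num_bins
        if 0 < idx < n:
            # Midpoint between the last value of one bin and the first of the next.
            point = (ordered[idx - 1] + ordered[idx]) / 2
            if not points or point > points[-1]:
                points.append(point)
    return points
```

With 8 distinct values and 4 bins, only 3 candidate thresholds are scored rather than 7, at the cost of possibly missing the exact optimum.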

28 pages, 888 KiB  
Article
Efficient False Positive Control Algorithms in Big Data Mining
by Xuze Liu, Yuhai Zhao, Tongze Xu, Fazal Wahab, Yiming Sun and Chen Chen
Appl. Sci. 2023, 13(8), 5006; https://doi.org/10.3390/app13085006 - 16 Apr 2023
Cited by 5 | Viewed by 1944
Abstract
The typical hypothesis testing issue in statistical analysis is determining whether a pattern is significantly associated with a specific class label. This usually leads to highly challenging multiple-hypothesis testing problems in big data mining scenarios, as millions or billions of hypothesis tests in large-scale exploratory data analysis can result in a large number of false positive results. The permutation testing-based FWER control method (PFWER) is theoretically effective in dealing with multiple hypothesis testing issues. In reality, however, this theoretical approach confronts a serious computational efficiency problem. It takes an extremely long time to compute an appropriate FWER false positive control threshold using PFWER, which is almost impossible to achieve in a reasonable amount of time on medium- or large-scale data. Although some methods for improving the efficiency of the FWER false positive control threshold calculation have been proposed, most of them are stand-alone, and there is still considerable room for efficiency improvement. To address this problem, this paper proposes a distributed PFWER false-positive threshold calculation method for large-scale data. Computational efficiency increases significantly compared with current approaches. The FP-growth algorithm is used first for pattern mining, and the mining process reduces the computation of invalid patterns by using pruning operations and index optimization for merging patterns with index transactions. The distributed computing technique is introduced on this basis, and the constructed FP tree is decomposed into a set of subtrees, each corresponding to a subtask. All subtrees (subtasks) are distributed to different computing nodes. Each node independently calculates the local significance threshold according to the designated subtasks. Finally, all local results are aggregated to compute the FWER false positive control threshold, which is completely consistent with the theoretical result. A series of experimental findings on 11 real-world datasets demonstrate that the distributed algorithm proposed in this paper can significantly improve the computation efficiency of PFWER while ensuring its theoretical accuracy.
(This article belongs to the Special Issue Big Data Engineering and Application)
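
The permutation-based FWER threshold that the abstract above builds on can be sketched in a few lines. This is a generic, stand-alone max-statistic variant in the style of Westfall and Young, not the paper's distributed method: FP-growth mining and subtree distribution are omitted, the patterns are assumed to be already mined and given as 0/1 occurrence vectors, and the test statistic (absolute difference in pattern frequency between two classes) is a stand-in chosen for brevity. All names are hypothetical.

```python
import math
import random


def association_stat(pattern_present, labels):
    """Absolute difference in pattern frequency between class 1 and class 0."""
    n1 = sum(labels)
    n0 = len(labels) - n1
    if n1 == 0 or n0 == 0:
        raise ValueError("both classes must be present")
    hits1 = sum(1 for p, y in zip(pattern_present, labels) if p and y == 1)
    hits0 = sum(1 for p, y in zip(pattern_present, labels) if p and y == 0)
    return abs(hits1 / n1 - hits0 / n0)


def permutation_fwer_threshold(patterns, labels, alpha=0.05,
                               num_permutations=1000, seed=0):
    """Estimate a threshold t so that, under shuffled labels, the chance that
    ANY pattern's statistic exceeds t is at most about alpha."""
    rng = random.Random(seed)
    shuffled = list(labels)
    max_stats = []
    for _ in range(num_permutations):
        rng.shuffle(shuffled)  # break any real pattern-label association
        max_stats.append(max(association_stat(p, shuffled) for p in patterns))
    max_stats.sort()
    # Empirical (1 - alpha) quantile of the per-permutation maximum.
    index = min(len(max_stats) - 1, math.ceil((1 - alpha) * len(max_stats)) - 1)
    return max_stats[index]
```

A pattern would then be reported as significant only if its statistic on the real labels exceeds the returned threshold. The cost is proportional to the number of permutations times the number of patterns, which is the bottleneck the abstract describes.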