Search Results (222)

Search Parameters:
Keywords = high performance computing (HPC)

23 pages, 9219 KB  
Article
Uncertainty Quantification of the Impact of High-Pressure Compressor Blade Geometric Deviations on Aero Engine Performance
by Pengfei Tang, Jianzhong Sun, Jinchen Nian, Jilong Lu and Qin Liu
Aerospace 2025, 12(9), 767; https://doi.org/10.3390/aerospace12090767 - 27 Aug 2025
Viewed by 350
Abstract
High-pressure compressor (HPC) blades of aero engines inevitably exhibit various uncertain geometric deviations, which deteriorate engine performance and increase maintenance costs. Although the condition-based maintenance (CBM) strategy is increasingly adopted to reduce costs by tailoring repair actions based on condition monitoring data, maintenance practices often still rely on original equipment manufacturer (OEM) recommendations. To further refine the CBM strategy, this paper proposes an uncertainty quantification method based on the engine performance digital twin (PDT) model to quantify the impact of HPC blade geometric deviations on overall engine performance. The PDT model is developed by coupling computational fluid dynamics simulations with a zero-dimensional performance model using real operating data and is validated for high predictive accuracy. Surrogate models based on support vector regression are employed to efficiently quantify the impact of combined geometric deviations. The results show that combined deviations cause reductions in mass flow, pressure ratio, and efficiency while increasing exhaust gas temperature and specific fuel consumption. The proposed methodology is applied to a CBM scenario to demonstrate its effectiveness. In the real maintenance process, this method enables the prediction of performance after repair, facilitating optimized maintenance strategies.
(This article belongs to the Section Aeronautics)
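
For readers unfamiliar with the surrogate-based uncertainty quantification described in this abstract, the following is a minimal sketch of the general pattern: fit a support vector regression model on a handful of expensive model runs, then propagate uncertain geometric deviations through the cheap surrogate by Monte Carlo. The pdt_model function is a hypothetical stand-in for the paper's performance digital twin, not its actual model.

```python
# Hedged sketch: surrogate-based uncertainty propagation with SVR.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

def pdt_model(dev):
    # Placeholder for the expensive coupled CFD / zero-dimensional PDT
    # evaluation: maps geometric deviations (e.g. chord, thickness,
    # stagger) to an efficiency delta. Pure illustration.
    return -0.4 * dev[:, 0] ** 2 - 0.2 * dev[:, 1] + 0.1 * dev[:, 2]

# 1) Small design-of-experiments set where the expensive model is run.
X_train = rng.normal(0.0, 1.0, size=(80, 3))
y_train = pdt_model(X_train)

# 2) Fit the SVR surrogate on those samples.
surrogate = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
surrogate.fit(X_train, y_train)

# 3) Cheap Monte Carlo propagation of combined deviations.
X_mc = rng.normal(0.0, 1.0, size=(100_000, 3))
y_mc = surrogate.predict(X_mc)
print(f"efficiency delta: mean={y_mc.mean():.4f}, std={y_mc.std():.4f}")
```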

14 pages, 1412 KB  
Article
The Diagnostic and Prognostic Value of 18F-FDG PET/MR in Hypopharyngeal Cancer
by Cui Fan, Xinyun Huang, Hao Wang, Haixia Hu, Jichang Wu, Xiangwan Miao, Yuenan Liu, Mingliang Xiang, Nijun Chen and Bin Ye
Diagnostics 2025, 15(17), 2119; https://doi.org/10.3390/diagnostics15172119 - 22 Aug 2025
Viewed by 387
Abstract
Objective: To evaluate the diagnostic performance of fluorine 18 fluorodeoxyglucose positron emission tomography/magnetic resonance imaging (18F-FDG PET/MR) in the preoperative staging of hypopharyngeal cancer (HPC), compare it with conventional enhanced computed tomography (CT) and MR, and further explore the prognostic value of its metabolic and diffusion metrics for HPC. Methods: This retrospective study included 33 patients with pathologically confirmed HPC. All patients underwent preoperative 18F-FDG PET/MR, CT, and MR examinations. The staging performance of the three modalities was evaluated using pathological staging as a reference. Additionally, metabolic indicators and diffusion-related parameters from PET/MR were collected to investigate their impact on larynx preservation and survival. Results: PET/MR demonstrated accuracies of 90.9% and 71.4% in preoperative T and N staging, respectively, significantly higher than those of CT (54.5%, p = 0.001; 42.9%, p = 0.021) and MR (66.7%, p = 0.016; 42.9%, p = 0.021). Significant differences emerged in the maximum standard uptake value (SUVmax), metabolic tumor volume (MTV), minimum apparent diffusion coefficient (ADCmin), mean ADC (ADCmean), and combined ratios across different T stages, while SUVmax, mean SUV (SUVmean), total lesion glycolysis (TLG), and MTV varied significantly across different N stages. ADCmin and ADCmean showed good predictive capability for larynx preservation, with AUCs of 0.857 and 0.920 (p < 0.05), respectively. In Cox multivariate analysis of overall survival, high-level ADCmean (p = 0.004) and low-level TLG/ADCmean (p = 0.022) were significantly associated with better survival. Conclusion: In HPC, 18F-FDG PET/MR imaging significantly surpasses CT and MR in preoperative diagnostic staging. Its diffusion-related parameters have substantial prognostic value, with high ADC values associated with larynx preservation. ADCmean and TLG/ADCmean are potential prognostic indicators for HPC.

20 pages, 2233 KB  
Article
HPC Cluster Task Prediction Based on Multimodal Temporal Networks with Hierarchical Attention Mechanism
by Xuemei Bai, Jingbo Zhou and Zhijun Wang
Computers 2025, 14(8), 335; https://doi.org/10.3390/computers14080335 - 18 Aug 2025
Viewed by 368
Abstract
In recent years, the increasing adoption of High-Performance Computing (HPC) clusters in scientific research and engineering has exposed challenges such as resource imbalance, node idleness, and overload, which hinder scheduling efficiency. Accurate multidimensional task prediction remains a key bottleneck. To address this, we propose a hybrid prediction model that integrates Informer, Long Short-Term Memory (LSTM), and Graph Neural Networks (GNN), enhanced by a hierarchical attention mechanism combining multi-head self-attention and cross-attention. The model captures both long- and short-term temporal dependencies and deep semantic relationships across features. Built on a multitask learning framework, it predicts task execution time, CPU usage, memory, and storage demands with high accuracy. Experiments show prediction accuracies of 89.9%, 87.9%, 86.3%, and 84.3% on these metrics, surpassing baselines like Transformer-XL. The results demonstrate that our approach effectively models complex HPC workload dynamics, offering robust support for intelligent cluster scheduling and holding strong theoretical and practical significance.
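
A minimal sketch of the multitask setup this abstract outlines: a shared temporal encoder feeding one regression head per target (execution time, CPU, memory, storage). A plain LSTM stands in for the paper's Informer/LSTM/GNN hybrid and hierarchical attention; all dimensions are illustrative.

```python
# Hedged sketch: shared encoder, one head per prediction target.
import torch
import torch.nn as nn

class MultiTaskWorkloadPredictor(nn.Module):
    def __init__(self, n_features=16, hidden=64,
                 tasks=("time", "cpu", "mem", "storage")):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        # One small head per target, all sharing the encoder.
        self.heads = nn.ModuleDict({t: nn.Linear(hidden, 1) for t in tasks})

    def forward(self, x):
        # x: (batch, sequence_length, n_features) of past job/cluster metrics
        _, (h_n, _) = self.encoder(x)
        last = h_n[-1]                      # final hidden state
        return {t: head(last).squeeze(-1) for t, head in self.heads.items()}

model = MultiTaskWorkloadPredictor()
batch = torch.randn(8, 32, 16)             # 8 jobs, 32 steps, 16 features
preds = model(batch)
# Multitask loss: sum of per-target losses (dummy targets here).
loss = sum(nn.functional.mse_loss(p, torch.zeros_like(p))
           for p in preds.values())
```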

19 pages, 4384 KB  
Article
Dynamic Temperature-Responsive MW Pulsing for Uniform and Energy-Efficient Plant-Based Food Drying
by Mohammad U. H. Joardder and Azharul Karim
Energies 2025, 18(16), 4391; https://doi.org/10.3390/en18164391 - 18 Aug 2025
Viewed by 329
Abstract
This study takes a simulation-based approach to improving microwave (MW) convective drying using a temperature-responsive pulse ratio (TRPR) method. Traditional fixed-time pulse ratio (TimePR) techniques often result in uneven heating and reduced product quality due to uncontrolled temperature spikes. To address this, a physics-based model was developed using COMSOL Multiphysics 6.3, executed on a high-performance computing (HPC) platform. The TRPR algorithm dynamically adjusts MW ON/OFF cycles based on internal temperature feedback, maintaining the maximum point temperature below a critical threshold of 75 °C. The model geometry, food material (apple) properties, and boundary conditions were defined to reflect realistic drying scenarios. Simulation results show that TRPR significantly improves temperature and moisture uniformity across the sample. The TRPR method showed superior thermal stability over time-based regulation, maintaining a lower maximum COV of 0.026 compared to 0.045. These values are also well below the COV range of 0.05–0.26 reported in recent studies. Moreover, the TRPR system maintained a constant microwave energy efficiency of 40.7% across all power levels, outperforming the time-based system, whose efficiency was lower, ranging from 36.18% to 36.29%, with higher energy consumption and no proportional thermal or moisture removal benefits. These findings highlight the potential of the TRPR method to enhance drying performance, reduce energy consumption, and improve product quality in microwave-assisted food processing. This approach presents a scalable and adaptable solution for future integration into intelligent drying systems.
(This article belongs to the Section B: Energy and Environment)
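
The control idea in this abstract reduces to a temperature-triggered on/off loop. Below is a minimal sketch of such a TRPR controller, assuming the 75 °C threshold from the abstract; the thermal update is a toy lumped model, not the paper's COMSOL physics, and the hysteresis band is an added assumption to avoid rapid switching.

```python
# Hedged sketch: temperature-responsive pulse-ratio (TRPR) control loop.
THRESHOLD_C = 75.0      # critical peak temperature from the abstract
HYSTERESIS_C = 2.0      # assumed band to avoid chattering

def trpr_step(temp_c, mw_on):
    """Decide the next microwave state from the current peak temperature."""
    if temp_c >= THRESHOLD_C:
        return False                      # pulse OFF above threshold
    if temp_c <= THRESHOLD_C - HYSTERESIS_C:
        return True                       # pulse back ON after cooling
    return mw_on                          # hold state inside the band

# Toy simulation: heating when ON, convective cooling when OFF.
temp, mw_on, on_steps = 25.0, True, 0
for step in range(600):                   # e.g. 600 one-second steps
    mw_on = trpr_step(temp, mw_on)
    temp += 0.8 if mw_on else -0.3        # illustrative rates only
    on_steps += mw_on

print(f"final peak temp: {temp:.1f} C, "
      f"effective pulse ratio: {on_steps / 600:.2f}")
```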

32 pages, 2110 KB  
Article
Self-Attention Mechanisms in HPC Job Scheduling: A Novel Framework Combining Gated Transformers and Enhanced PPO
by Xu Gao, Hang Dong, Lianji Zhang, Yibo Wang, Xianliang Yang and Zhenyu Li
Appl. Sci. 2025, 15(16), 8928; https://doi.org/10.3390/app15168928 - 13 Aug 2025
Viewed by 513
Abstract
In HPC systems, job scheduling plays a critical role in determining resource allocation and task execution order. With the continuous expansion of computing scale and increasing system complexity, modern HPC scheduling faces two major challenges: a massive decision space consisting of tens of thousands of computing nodes and a huge job queue, as well as complex temporal dependencies between jobs and dynamically changing resource states. Traditional heuristic algorithms and basic reinforcement learning methods often struggle to effectively address these challenges in dynamic HPC environments. This study proposes a novel scheduling framework that combines GTrXL with PPO, achieving significant performance improvements through multiple technical innovations. The framework leverages the sequence modeling capabilities of the Transformer architecture and selectively filters relevant historical scheduling information through a dual-gate mechanism, improving long sequence modeling efficiency compared to standard Transformers. The proposed SECT module further enhances resource awareness through dynamic feature recalibration, achieving improved system utilization compared to similar attention mechanisms. Experimental results on multiple datasets (ANL-Intrepid, Alibaba, SDSC-SP2) demonstrate that the proposed components achieve significant performance improvements over baseline PPO implementations. Comprehensive evaluations on synthetic workloads and real HPC trace data show improvements in resource utilization and waiting time, particularly under high-load conditions, while maintaining good robustness across various cluster configurations.
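
The distinctive GTrXL ingredient referenced here is GRU-style gating where a vanilla Transformer uses residual connections. Below is a minimal sketch of one such gating unit applied after a self-attention sublayer; the paper's dual-gate mechanism and SECT module are not reproduced, and all dimensions are illustrative.

```python
# Hedged sketch: GRU-type gating in place of a residual connection (GTrXL).
import torch
import torch.nn as nn

class GatingUnit(nn.Module):
    """g(x, y): gated combination of sublayer input x and output y."""
    def __init__(self, d_model):
        super().__init__()
        mk = lambda: nn.Linear(d_model, d_model, bias=False)
        self.wr, self.ur = mk(), mk()
        self.wz, self.uz = mk(), mk()
        self.wg, self.ug = mk(), mk()

    def forward(self, x, y):
        r = torch.sigmoid(self.wr(y) + self.ur(x))      # reset gate
        z = torch.sigmoid(self.wz(y) + self.uz(x))      # update gate
        h = torch.tanh(self.wg(y) + self.ug(r * x))     # candidate state
        return (1 - z) * x + z * h                      # gated "residual"

d_model = 64
gate = GatingUnit(d_model)
attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
x = torch.randn(2, 10, d_model)          # (batch, queued jobs, features)
y, _ = attn(x, x, x)
out = gate(x, y)                          # replaces the usual x + y
```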

22 pages, 1750 KB  
Article
Towards Energy Efficiency of HPC Data Centers: A Data-Driven Analytical Visualization Dashboard Prototype Approach
by Keith Lennor Veigas, Andrea Chinnici, Davide De Chiara and Marta Chinnici
Electronics 2025, 14(16), 3170; https://doi.org/10.3390/electronics14163170 - 8 Aug 2025
Viewed by 559
Abstract
High-performance computing (HPC) data centers are experiencing rising energy consumption, despite the urgent need for increased efficiency. In this study, we develop an approach inspired by digital twins to enhance energy and thermal management in an HPC facility. We create a comprehensive framework that incorporates a digital twin for the CRESCO7 supercomputer cluster at ENEA in Italy, integrating data-driven time series forecasting with an interactive analytical dashboard for resource prediction. We begin by reviewing the relevant literature on digital twins and modern time series modeling techniques. After ingesting and cleansing sensor and job scheduling datasets, we perform exploratory and inferential analyses to understand key correlations. We then conduct descriptive statistical analyses and identify important features, which are used to train machine learning models for accurate short- and medium-term forecasts of power and temperature. These models feed into a simulated environment that provides real-time prediction metrics and a holistic “health score” for each node, all visualized in a dashboard built with Streamlit. The results demonstrate that a digital twin-based approach can help data center operators efficiently plan resources and maintenance, ultimately reducing the carbon footprint and improving energy efficiency. The proposed framework uniquely combines concepts inspired by digital twins with time series machine learning and interactive visualization for enhanced HPC energy planning. Key contributions include the novel integration of predictive models into a live virtual replica of the HPC cluster, employing a gradient-boosted tree-based LightGBM model. Our findings underscore the potential of data-driven digital twins to facilitate sustainable and intelligent management of HPC data centers.
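
A minimal sketch of the forecasting step this abstract describes: a LightGBM regressor trained on lagged node telemetry to predict near-term power draw. The feature names, horizon, and synthetic data are illustrative assumptions, not details of the CRESCO7 pipeline.

```python
# Hedged sketch: short-term power forecasting with lagged features + LightGBM.
import numpy as np
import pandas as pd
import lightgbm as lgb

rng = np.random.default_rng(1)
n = 2_000
df = pd.DataFrame({
    "power_w": 300 + 50 * np.sin(np.arange(n) / 48) + rng.normal(0, 5, n),
    "inlet_temp_c": 22 + rng.normal(0, 0.5, n),
    "cpu_load": rng.uniform(0.2, 1.0, n),
})

# Lagged features: predict power 12 samples ahead from the recent window.
for lag in (1, 2, 3, 6, 12):
    df[f"power_lag_{lag}"] = df["power_w"].shift(lag)
df["target"] = df["power_w"].shift(-12)
df = df.dropna()

features = [c for c in df.columns if c != "target"]
split = int(0.8 * len(df))
model = lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05)
model.fit(df[features][:split], df["target"][:split])
mae = np.abs(model.predict(df[features][split:]) - df["target"][split:]).mean()
print(f"test MAE: {mae:.2f} W")
```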

12 pages, 2807 KB  
Article
Evaluation of Hydroxyapatite–β-Tricalcium Phosphate Collagen Composites for Socket Preservation in a Canine Model
by Dong Woo Kim, Donghyun Lee, Jaeyoung Ryu, Min-Suk Kook, Hong-Ju Park and Seunggon Jung
J. Funct. Biomater. 2025, 16(8), 286; https://doi.org/10.3390/jfb16080286 - 3 Aug 2025
Viewed by 1000
Abstract
This study aimed to compare the performance of three hydroxyapatite–β-tricalcium phosphate (HA–β-TCP) collagen composite grafts in a canine model for extraction socket preservation. Eight mongrel dogs underwent atraumatic bilateral mandibular premolar extraction, and sockets were randomly grafted with HBC28 (20% high-crystalline HA, 80% β-TCP, bovine collagen), HBC37 (30% HA, 70% β-TCP, bovine collagen), or HPC64 (60% HA, 40% β-TCP, porcine collagen). Grafts differed in their HA–β-TCP ratio and collagen origin and content. Animals were sacrificed at 4 and 12 weeks, and the healing sites were evaluated using micro-computed tomography (micro-CT) and histological analysis. At 12 weeks, all groups showed good socket maintenance with comparable new bone formation. However, histological analysis revealed that HBC28 had significantly higher residual graft volume, while HPC64 demonstrated more extensive graft resorption. Histomorphometric analysis confirmed these findings, with statistically significant differences in residual graft area and bone volume fraction. No inflammatory response or adverse tissue reactions were observed in any group. These results suggest that all three HA–β-TCP collagen composites are biocompatible and suitable for socket preservation, with varying resorption kinetics influenced by graft composition. Selection of graft material may thus be guided by the desired rate of replacement by new bone.
(This article belongs to the Special Issue Biomechanical Studies and Biomaterials in Dentistry)

25 pages, 1157 KB  
Article
Investigating Supercomputer Performance with Sustainability in the Era of Artificial Intelligence
by Haruna Chiroma
Appl. Sci. 2025, 15(15), 8570; https://doi.org/10.3390/app15158570 - 1 Aug 2025
Viewed by 613
Abstract
The demand for high-performance computing (HPC) continues to grow, driven by its critical role in advancing innovations in the rapidly evolving field of artificial intelligence. HPC has now entered the era of exascale supercomputers, introducing significant challenges related to sustainability. Balancing HPC performance with environmental sustainability presents a complex, multi-objective optimization problem. To the best of the author’s knowledge, no recent comprehensive investigation has explored the interplay between supercomputer performance and sustainability. This paper addresses this gap by examining the balance between these two aspects over a five-year period, collecting and analyzing multi-year data on supercomputer performance and energy efficiency. The findings indicate that supercomputers pursuing higher performance often face challenges in maintaining top sustainability, while those prioritizing sustainability tend to fall short of top performance. The analysis reveals that both the performance and power consumption of supercomputers have been rapidly increasing over the last five years, and that the performance of the most computationally powerful supercomputers is directly proportional to power consumption. The energy efficiency gains achieved by some top-performing supercomputers become challenging to maintain in the pursuit of higher performance. The findings of this study highlight the ongoing race toward zettascale supercomputers and can provide policymakers, researchers, and technologists with foundational evidence for rethinking supercomputing in the era of artificial intelligence.
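
As a rough illustration of the kind of analysis this abstract describes, the sketch below relates Top500-style performance (Rmax) to power draw and derived energy efficiency. The CSV file and column names are assumptions, not the paper's actual dataset.

```python
# Hedged sketch: performance vs. power vs. efficiency over multiple years.
import pandas as pd

df = pd.read_csv("top500_2020_2025.csv")   # hypothetical multi-year export
# 1 TFLOPS/kW == 1 GFLOPS/W, so the ratio below is energy efficiency.
df["gflops_per_watt"] = df["rmax_tflops"] / df["power_kw"]

by_year = df.groupby("year").agg(
    rmax_total=("rmax_tflops", "sum"),
    power_total=("power_kw", "sum"),
    best_efficiency=("gflops_per_watt", "max"),
)
print(by_year)
# The abstract reports direct proportionality between performance and power
# among the most powerful systems; a simple correlation probes that claim.
print("Rmax/power correlation:",
      df["rmax_tflops"].corr(df["power_kw"]).round(3))
```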

24 pages, 4061 KB  
Article
The Impact of Hydrogeological Properties on Mass Displacement in Aquifers: Insights from Implementing a Mass-Abatement Scalable System Using Managed Aquifer Recharge (MAR-MASS)
by Mario Alberto Garcia Torres, Alexandra Suhogusoff and Luiz Carlos Ferrari
Water 2025, 17(15), 2239; https://doi.org/10.3390/w17152239 - 27 Jul 2025
Viewed by 439
Abstract
This study examines the use of a mass-abatement scalable system with managed aquifer recharge (MAR-MASS) as a sustainable solution for restoring salinized aquifers and improving water quality by removing dissolved salts. It offers a practical remediation approach for aquifers affected by salinization in coastal regions, agricultural areas, and contaminated sites, where variable-density flow poses a challenge. Numerical simulations assessed hydrogeological properties such as hydraulic conductivity, anisotropy, specific yield, mechanical dispersion, and molecular diffusion. A conceptual model integrated hydraulic conditions with spatial and temporal discretization using the FLOPY API for MODFLOW 6 and the IFM API for FEFLOW 10. Python algorithms were run on a high-performance computing (HPC) server, executing simulations in parallel to efficiently process a large number of scenarios, including both preprocessing of input data and post-processing of results. The study simulated 6950 scenarios, each modeling flow and transport processes over 3000 days of method implementation and focusing on mass extraction efficiency under different initial salinity conditions (3.5 to 35 kg/m3). The results show that the MAR-MASS effectively removed salts from aquifers, with higher hydraulic conductivity prolonging mass removal efficiency. Of the scenarios, 88% achieved potability (0.5 kg/m3) in under five years; among these, 79% achieved potability within two years, and 92% of cases with initial concentrations of 3.5–17.5 kg/m3 reached potability within 480 days. This study advances scientific knowledge by providing a robust model for optimizing managed aquifer recharge, with practical applications in rehabilitating salinized aquifers and improving water quality. Future research may explore MAR-MASS adaptation for diverse hydrogeological contexts and its long-term performance.
(This article belongs to the Section Hydrogeology)
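
A minimal sketch of the parallel scenario-sweep pattern this abstract describes: each scenario builds and runs a groundwater model, and many run concurrently on an HPC server. The body of build_and_run_scenario is a placeholder; the paper actually drives MODFLOW 6 through the FloPy API and FEFLOW 10 through its IFM API.

```python
# Hedged sketch: parallel execution of many model scenarios.
from concurrent.futures import ProcessPoolExecutor
import itertools

def build_and_run_scenario(params):
    conductivity, salinity = params
    # Placeholder for: build a FloPy MODFLOW 6 simulation, write inputs,
    # run, and post-process days-to-potability from the transport output.
    days_to_potable = 3000 * salinity / 35.0 / max(conductivity, 1e-6)
    return {"K": conductivity, "c0": salinity,
            "days": min(days_to_potable, 3000)}

if __name__ == "__main__":
    grid = list(itertools.product(
        [1.0, 5.0, 10.0, 50.0],          # hydraulic conductivity (m/d)
        [3.5, 17.5, 35.0],               # initial salinity (kg/m3)
    ))
    with ProcessPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(build_and_run_scenario, grid))
    potable_fast = sum(r["days"] <= 2 * 365 for r in results)
    print(f"{potable_fast}/{len(results)} scenarios potable within two years")
```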

39 pages, 3476 KB  
Article
Lattice Boltzmann Framework for Multiphase Flows by Eulerian–Eulerian Navier–Stokes Equations
by Matteo Maria Piredda and Pietro Asinari
Computation 2025, 13(7), 164; https://doi.org/10.3390/computation13070164 - 9 Jul 2025
Viewed by 315
Abstract
Although the lattice Boltzmann method (LBM) is relatively straightforward, it demands a well-crafted framework to handle the complex partial differential equations involved in multiphase flow simulations. For the first time to our knowledge, this work proposes a novel LBM framework to solve Eulerian–Eulerian multiphase flow equations without any finite difference correction, including very-large-density ratios and also a realistic relation for the drag coefficient. The proposed methodology and all reported LBM formulas can be applied to any dimension. This opens a promising avenue for simulating multiphase flows in large High Performance Computing (HPC) facilities and on novel parallel hardware. This LBM framework consists of six coupled LBM schemes—running on the same lattice—ensuring an efficient implementation in large codes with minimum effort. The preliminary numerical results show excellent agreement with the reference numerical solution obtained by a traditional finite difference solver.
(This article belongs to the Section Computational Engineering)
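
As background for the framework above, which stacks six coupled LBM schemes on one lattice, here is a minimal sketch of the basic building block: a single D2Q9 BGK collide-and-stream step for one distribution function. This is a plain single-phase illustration, not the paper's Eulerian–Eulerian multiphase scheme.

```python
# Hedged sketch: one D2Q9 BGK lattice Boltzmann step with periodic streaming.
import numpy as np

# D2Q9 discrete velocities and weights
c = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],[1,1],[-1,1],[-1,-1],[1,-1]])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)

def equilibrium(rho, u):
    cu = np.einsum("qd,xyd->xyq", c, u)
    usq = np.einsum("xyd,xyd->xy", u, u)[..., None]
    return rho[..., None] * w * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

def lbm_step(f, tau=0.8):
    rho = f.sum(-1)                                    # density moment
    u = np.einsum("xyq,qd->xyd", f, c) / rho[..., None]  # velocity moment
    f += (equilibrium(rho, u) - f) / tau               # BGK collision
    for q in range(9):                                 # periodic streaming
        f[..., q] = np.roll(f[..., q], shift=tuple(c[q]), axis=(0, 1))
    return f

nx = ny = 64
f = equilibrium(np.ones((nx, ny)), np.zeros((nx, ny, 2)))
for _ in range(100):
    f = lbm_step(f)
```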

32 pages, 1154 KB  
Article
A Case Study on Virtual HPC Container Clusters and Machine Learning Applications
by Piotr Krogulski and Tomasz Rak
Appl. Sci. 2025, 15(13), 7433; https://doi.org/10.3390/app15137433 - 2 Jul 2025
Viewed by 428
Abstract
This article delves into the innovative application of Docker containers as High-Performance Computing (HPC) environments, presenting the construction and operational efficiency of virtual container clusters. The study primarily focused on the integration of Docker technology in HPC, evaluating its feasibility and performance implications. A portion of the research was devoted to developing a virtual container cluster using Docker. Although the first Docker-enabled HPC studies date back several years, the approach remains highly relevant today, as modern AI-driven science demands portable, reproducible software stacks that can be deployed across heterogeneous, accelerator-rich clusters. Furthermore, the article explores the development of advanced distributed applications, with a special emphasis on Machine Learning (ML) algorithms. Key findings of the study include the successful implementation and operation of a Docker-based cluster. Additionally, the study successfully showcases a Python application using ML for anomaly detection in system logs, highlighting its effective execution in a virtual cluster. This research not only contributes to the understanding of Docker’s potential in distributed environments but also opens avenues for future explorations in the field of containerized HPC solutions and their applications in different areas.
(This article belongs to the Special Issue Novel Insights into Parallel and Distributed Computing)
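
A minimal sketch of the kind of log-anomaly detector the case study above runs on its container cluster: numeric features extracted per log line and scored with an IsolationForest. The log format, parsing, and feature choice are illustrative assumptions, not the paper's application.

```python
# Hedged sketch: ML-based anomaly detection in system logs.
import re
from sklearn.ensemble import IsolationForest

LOGS = [
    "INFO  job=42 runtime=118s mem=2048MB",
    "INFO  job=43 runtime=121s mem=2013MB",
    "ERROR job=44 runtime=9837s mem=63900MB",   # the outlier
    "INFO  job=45 runtime=117s mem=2100MB",
]

def featurize(line):
    # Pull numeric fields out of each (hypothetical) log line.
    runtime = int(re.search(r"runtime=(\d+)s", line).group(1))
    mem = int(re.search(r"mem=(\d+)MB", line).group(1))
    return [runtime, mem, int(line.startswith("ERROR"))]

X = [featurize(l) for l in LOGS]
clf = IsolationForest(contamination=0.25, random_state=0).fit(X)
for line, label in zip(LOGS, clf.predict(X)):
    print("ANOMALY" if label == -1 else "ok     ", line)
```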

25 pages, 1615 KB  
Article
Efficient Parallel Processing of Big Data on Supercomputers for Industrial IoT Environments
by Isam Mashhour Al Jawarneh, Lorenzo Rosa, Riccardo Venanzi, Luca Foschini and Paolo Bellavista
Electronics 2025, 14(13), 2626; https://doi.org/10.3390/electronics14132626 - 29 Jun 2025
Cited by 1 | Viewed by 587
Abstract
The integration of distributed big data analytics into modern industrial environments has become increasingly critical, particularly with the rise of data-intensive applications and the need for real-time processing at the edge. While High-Performance Computing (HPC) systems offer robust petabyte-scale capabilities for efficient big data analytics, the performance of big data frameworks, especially on ARM-based HPC systems, remains underexplored. This paper presents an extensive experimental study on deploying Apache Spark 3.0.2, the de facto standard in-memory processing system, on an ARM-based HPC system. This study conducts a comprehensive performance evaluation of Apache Spark through representative big data workloads, including K-means clustering, to assess the effects of latency variations, such as those induced by network delays, memory bottlenecks, or computational overheads, on application performance in industrial IoT and edge computing environments. Our findings contribute to an understanding of how big data frameworks like Apache Spark can be effectively deployed and optimized on ARM-based HPC systems, particularly when leveraging vectorized instruction sets such as SVE, and support the broader goal of enhancing the integration of cloud–edge computing paradigms in modern industrial environments. We also discuss potential improvements and strategies for leveraging ARM-based architectures to support scalable, efficient, and real-time data processing in Industry 4.0 and beyond.
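
A minimal sketch of the representative workload named above: K-means in Spark MLlib, as one might submit it to a Spark 3.x deployment. The data path, column names, and cluster settings are assumptions, not the paper's benchmark configuration.

```python
# Hedged sketch: a K-means workload in Spark MLlib.
from pyspark.sql import SparkSession
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler

spark = (SparkSession.builder
         .appName("kmeans-arm-benchmark")
         .getOrCreate())

# Hypothetical IIoT telemetry dataset; any numeric columns would do.
df = spark.read.parquet("hdfs:///data/iiot_telemetry.parquet")
features = VectorAssembler(
    inputCols=["temperature", "vibration", "pressure"],
    outputCol="features",
).transform(df)

model = KMeans(k=8, seed=42, featuresCol="features").fit(features)
print("within-set sum of squared errors:", model.summary.trainingCost)
spark.stop()
```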

21 pages, 6865 KB  
Article
Elegante+: A Machine Learning-Based Optimization Framework for Sparse Matrix–Vector Computations on the CPU Architecture
by Muhammad Ahmad, Sardar Usman, Ameer Hamza, Muhammad Muzamil and Ildar Batyrshin
Information 2025, 16(7), 553; https://doi.org/10.3390/info16070553 - 29 Jun 2025
Viewed by 512
Abstract
Sparse matrix–vector multiplication (SpMV) plays a significant role in the computational costs of many scientific applications such as 2D/3D robotics, power network problems, and computer vision. Numerous implementations using different sparse matrix formats have been introduced to optimize this kernel on CPUs and GPUs. However, due to the sparsity patterns of matrices and the diverse configurations of hardware, accurately modeling the performance of SpMV remains a complex challenge. SpMV computation is often a time-consuming process because of its sparse matrix structure. To address this, we propose a machine learning-based tool, namely Elegante+, that predicts optimal scheduling policies by analyzing matrix structures. This approach eliminates the need for repetitive trial and error, minimizes errors, and finds the best configuration for the SpMV kernel, enabling users to make informed decisions about scheduling policies that maximize computational efficiency. For this purpose, we collected 1000+ sparse matrices from the SuiteSparse Matrix Collection, converted them into the compressed sparse row (CSR) format, and performed SpMV computation while extracting 14 key sparse matrix features. After creating a comprehensive dataset, we trained various machine learning models to predict the optimal scheduling policy, significantly enhancing the computational efficiency and reducing the overhead in high-performance computing environments. Our proposed tool, Elegante+ (XGB with all SpMV features), achieved the highest cross-validation score of 79% and performed five times faster than the default scheduling policy during SpMV in a high-performance computing (HPC) environment.
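
A minimal sketch of the pipeline stage this abstract describes: load a matrix in CSR form, extract a few structural features, and query a trained classifier for the best scheduling policy. The feature set shown is a small illustrative subset, not the paper's exact 14 features, and the model file is hypothetical.

```python
# Hedged sketch: CSR feature extraction + scheduling-policy prediction.
import numpy as np
import scipy.sparse as sp
import xgboost as xgb

def csr_features(A: sp.csr_matrix) -> np.ndarray:
    row_nnz = np.diff(A.indptr)                # nonzeros per row
    return np.array([
        A.shape[0], A.shape[1], A.nnz,
        A.nnz / (A.shape[0] * A.shape[1]),     # density
        row_nnz.mean(), row_nnz.std(), row_nnz.max(),
    ])

A = sp.random(10_000, 10_000, density=1e-3, format="csr", random_state=0)
x = np.ones(A.shape[1])
y = A @ x                                      # the SpMV kernel itself

model = xgb.XGBClassifier()
model.load_model("elegante_plus.json")         # hypothetical trained model
policy = model.predict(csr_features(A).reshape(1, -1))[0]
print("predicted scheduling policy id:", policy)
```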

43 pages, 2159 KB  
Systematic Review
A Systematic Review and Classification of HPC-Related Emerging Computing Technologies
by Ehsan Arianyan, Niloofar Gholipour, Davood Maleki, Neda Ghorbani, Abdolah Sepahvand and Pejman Goudarzi
Electronics 2025, 14(12), 2476; https://doi.org/10.3390/electronics14122476 - 18 Jun 2025
Viewed by 1067
Abstract
In recent decades, access to powerful computational resources has brought about a major transformation in science, with supercomputers drawing significant attention from academia, industry, and governments. Among these resources, high-performance computing (HPC) has emerged as one of the most critical processing infrastructures, providing a suitable platform for evaluating and implementing novel technologies. In this context, the development of emerging computing technologies has opened up new horizons in information processing and the delivery of computing services. This paper therefore systematically reviews and classifies emerging HPC-related computing technologies, including quantum computing, nanocomputing, in-memory architectures, neuromorphic systems, serverless paradigms, adiabatic technology, and biological solutions. Within the scope of this research, 142 studies, mostly published between 2018 and 2025, are analyzed, and relevant hardware solutions, domain-specific programming languages, frameworks, development tools, and simulation platforms are examined. The primary objective of this study is to identify the software and hardware dimensions of these technologies and analyze their roles in improving the performance, scalability, and efficiency of HPC systems. To this end, in addition to a literature review, statistical analysis methods are employed to assess the practical applicability and impact of these technologies across various domains, including scientific simulation, artificial intelligence, big data analytics, and cloud computing. The findings of this study indicate that emerging HPC-related computing technologies can serve as complements or alternatives to classical computing architectures, driving substantial transformations in the design, implementation, and operation of high-performance computing infrastructures. This article concludes by identifying existing challenges and future research directions in this rapidly evolving field.

22 pages, 2191 KB  
Review
Towards Efficient HPC: Exploring Overlap Strategies Using MPI Non-Blocking Communication
by Yuntian Zheng and Jianping Wu
Mathematics 2025, 13(11), 1848; https://doi.org/10.3390/math13111848 - 2 Jun 2025
Viewed by 1026
Abstract
As high-performance computing (HPC) platforms continue to scale up, communication costs have become a critical bottleneck affecting overall application performance. An effective strategy to overcome this limitation is to overlap communication with computation. The Message Passing Interface (MPI), as the de facto standard for communication in HPC, provides non-blocking communication primitives that make such overlapping feasible. By enabling asynchronous communication, non-blocking operations reduce core idle time caused by data transfer delays, thereby improving resource utilization. Overlapping communication with computation is particularly important for enhancing the performance of large-scale scientific applications, such as numerical simulations, climate modeling, and other data-intensive tasks. However, achieving efficient overlapping is non-trivial and depends not only on advances in hardware technologies such as Remote Direct Memory Access (RDMA), but also on well-designed and optimized MPI implementations. This paper presents a comprehensive survey on the principles of MPI non-blocking communication, the core techniques for achieving computation–communication overlap, and some representative applications in scientific computing. Alongside the survey, we include a preliminary experimental study evaluating the effectiveness of the asynchronous progress mechanism on modern HPC platforms, to support the development of parallel programs by HPC researchers and practitioners.
(This article belongs to the Special Issue Numerical Analysis and Algorithms for High-Performance Computing)
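
The overlap pattern this survey covers follows a canonical three-step structure: post non-blocking sends/receives, compute on data that does not depend on the messages, then wait for completion. Below is a minimal mpi4py sketch of that pattern (run with, e.g., `mpirun -n 2 python overlap.py`); the ring topology and array sizes are illustrative.

```python
# Hedged sketch: computation-communication overlap with MPI non-blocking ops.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

field = np.full(1_000_000, float(rank))
recv_buf = np.empty(1, dtype=field.dtype)

# 1) Post non-blocking send/receive of boundary ("halo") values.
reqs = [
    comm.Isend(field[-1:], dest=right, tag=0),
    comm.Irecv(recv_buf, source=left, tag=0),
]

# 2) Overlap: compute on the interior while the messages are in flight.
interior_sum = field[1:-1].sum()

# 3) Complete communication, then use the received halo value.
MPI.Request.Waitall(reqs)
total = interior_sum + field[0] + recv_buf[0]
print(f"rank {rank}: local total with left halo = {total}")
```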
