
Advanced Technologies and Applications of High-Performance Computing and Parallel Computing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 May 2024) | Viewed by 3973

Special Issue Editors


Prof. Dr. Yunquan Zhang
Guest Editor
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
Interests: parallel computing; parallel programming; parallel computational model

Dr. Liang Yuan
Guest Editor
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
Interests: parallel computational model

Special Issue Information

Dear Colleagues,

Applied Sciences is a semi-monthly peer-reviewed, open access journal which provides an advanced forum for studies related to all aspects of applied physics, applied chemistry, applied biology, and engineering, environmental, and Earth sciences. It is free for readers and indexed within SCIE, Scopus, ESCI (Web of Science), Ei Compendex, MathSciNet, and many other databases. For more information, please check: https://www.mdpi.com/journal/applsci.

This Special Issue, “Advanced Technologies and Applications of High-Performance Computing and Parallel Computing” of Applied Sciences, invites original, high-quality work presenting novel research on high-performance computing. Featured articles should present innovative strategies that address different aspects of performance, such as parallelization, evaluation, algorithms, programming models, autotuning, co-design, and benchmarks.

Prof. Dr. Yunquan Zhang
Dr. Liang Yuan
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • performance optimization
  • performance evaluation
  • parallel algorithms
  • parallel programming models
  • HPC applications
  • HPC in AI
  • big data
  • hardware/software co-design
  • performance and energy efficiency/benchmarks
  • performance tuning

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found on the MDPI website.

Published Papers (4 papers)


Research

16 pages, 1194 KiB  
Article
CAL: Core-Aware Lock for the big.LITTLE Multicore Architecture
by Shiqiang Nie, Yingming Liu, Jie Niu and Weiguo Wu
Appl. Sci. 2024, 14(15), 6449; https://doi.org/10.3390/app14156449 - 24 Jul 2024
Viewed by 569
Abstract
The principle that “all cores are created equal” has dominated CPU (Central Processing Unit) design for decades because of its simplicity and effectiveness: the more cores a CPU has, the more performance the host delivers, but also the more power it consumes. Power saving, however, is a key goal for servers in data centers and for embedded devices (e.g., mobile phones). The big.LITTLE multicore architecture, which combines high-performance (big) cores with power-efficient (little) cores, has been developed by ARM (Advanced RISC Machine) and Intel to trade off performance against power efficiency. On this new heterogeneous computing architecture, traditional lock algorithms, which were designed for homogeneous architectures, no longer perform optimally because of the performance gap between big and little cores. In our preliminary experiment on the big.LITTLE multicore architecture, we observed that all of these lock algorithms exhibit sub-optimal performance. FIFO-based (First In First Out) locks suffer throughput degradation, while competition-based locks fall into two categories: big-core-friendly locks, whose tail latency increases significantly, and little-core-friendly locks, whose tail latency increases and whose throughput also degrades. Motivated by this observation, we propose CAL, a Core-Aware Lock for the big.LITTLE multicore architecture, which gives each core an equal opportunity to enter the critical section. The core idea of CAL is to use the slowdown ratio as the metric for reordering lock requests from big and little cores. Evaluations on benchmarks and a real-world application, LevelDB, confirm that CAL achieves its fairness goals on a heterogeneous computing architecture without sacrificing the performance of the big cores.
Compared to several traditional lock algorithms, CAL improves fairness by up to 67%; its throughput is 26% higher than FIFO-based locks and 53% higher than competition-based locks. In addition, CAL's tail latency remains consistently low. Full article
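The slowdown-ratio idea can be illustrated with a toy queue model. The sketch below is hypothetical Python, not the paper's implementation: the class name and the exact weighting rule are assumptions. It shows how weighting each waiter's elapsed wait by its core's slowdown ratio lets a slower little core overtake a big core that arrived earlier.

```python
class CoreAwareLockSim:
    """Toy model of a slowdown-aware lock queue (not the paper's algorithm).

    Each waiter is tagged with its core's slowdown ratio (little cores > 1.0).
    The next lock holder is the waiter with the largest slowdown-weighted
    wait time, so little cores are not starved by faster big cores.
    """

    def __init__(self):
        self.waiters = []  # list of (arrival_time, core_id, slowdown)

    def enqueue(self, arrival_time, core_id, slowdown):
        self.waiters.append((arrival_time, core_id, slowdown))

    def next_holder(self, now):
        # Weighted wait = elapsed wait time * slowdown ratio of the core.
        best = max(self.waiters, key=lambda w: (now - w[0]) * w[2])
        self.waiters.remove(best)
        return best[1]
```

For example, with a big core waiting since t=0 (ratio 1.0) and a little core since t=1 (ratio 2.0), at t=3 the little core's weighted wait (4) beats the big core's (3), so the little core acquires the lock first.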

33 pages, 2760 KiB  
Article
Developing a Platform Using Petri Nets and GPenSIM for Simulation of Multiprocessor Scheduling Algorithms
by Daniel Osmundsen Dirdal, Danny Vo, Yuming Feng and Reggie Davidrajuh
Appl. Sci. 2024, 14(13), 5690; https://doi.org/10.3390/app14135690 - 29 Jun 2024
Viewed by 549
Abstract
Efficient multiprocessor scheduling is pivotal in optimizing the performance of parallel computing systems. This paper leverages the power of Petri nets and the tool GPenSIM to model and simulate a variety of multiprocessor scheduling algorithms (the basic algorithms such as first come first serve, shortest job first, and round robin, and more sophisticated schedulers like multi-level feedback queue and Linux’s completely fair scheduler). This paper presents the evaluation of three crucial performance metrics in multiprocessor scheduling (such as turnaround time, response time, and throughput) under various scheduling algorithms. However, the primary focus of the paper is to develop a robust simulation platform consisting of Petri Modules to facilitate the dynamic representation of concurrent processes, enabling us to explore the real-time interactions and dependencies in a multiprocessor environment; more advanced and newer schedulers can be tested with the simulation platform presented in this paper. Full article
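The scheduling metrics the paper evaluates can be illustrated without Petri nets. The sketch below is a plain-Python stand-in, not a GPenSIM or Petri-net model; the function name and job format are invented. It computes turnaround time under round robin, one of the basic algorithms listed above, on a single processor with all jobs arriving at t=0.

```python
from collections import deque

def round_robin(jobs, quantum):
    """Simulate round robin on one processor; jobs = {name: burst time}.

    Returns the turnaround time of each job (completion time, since all
    jobs arrive at t=0). A stand-in for the metric evaluation, not a
    Petri-net model.
    """
    remaining = dict(jobs)
    queue = deque(jobs)          # FIFO ready queue
    t, turnaround = 0, {}
    while queue:
        name = queue.popleft()
        run = min(quantum, remaining[name])
        t += run
        remaining[name] -= run
        if remaining[name] == 0:
            turnaround[name] = t  # completion time = turnaround at t=0 arrival
        else:
            queue.append(name)    # preempted: back to the end of the queue
    return turnaround
```

Response time and throughput can be derived the same way by recording each job's first dispatch time and dividing completed jobs by total elapsed time.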

16 pages, 563 KiB  
Article
An Asynchronous Parallel I/O Framework for Mass Conservation Ocean Model
by Renbo Pang, Fujiang Yu, Yu Zhang and Ye Yuan
Appl. Sci. 2023, 13(24), 13230; https://doi.org/10.3390/app132413230 - 13 Dec 2023
Viewed by 962
Abstract
I/O is often a performance bottleneck in global ocean circulation models with fine spatial resolution. In this paper, we present an asynchronous parallel I/O framework and demonstrate its efficacy in the Mass Conservation Ocean Model (MaCOM) as a case study. By largely reducing I/O operations in the computing processes and overlapping output in the I/O processes with computation in the computing processes, the framework significantly improves the performance of the MaCOM. I/O optimization algorithms, which reorder output data to maintain data continuity and combine file accesses to reduce file operations, are provided to improve output bandwidth. In the MaCOM case study, up to 99% of the output cost in the I/O processes can be overlapped with computation as the output frequency decreases. With these I/O optimizations, 1D data output bandwidth is 3.1 times higher than before optimization at 16 I/O worker processes. Compared to a synchronous parallel I/O framework, the overall performance of MaCOM at 1024 computing processes is improved by 38.8% for a 7-day global ocean forecast with one output every 2 h. Full article
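The overlap of output and computation can be sketched as a producer/consumer pair. The code below is a toy analogue, assuming a single background I/O worker thread in place of the framework's dedicated I/O processes; all names are illustrative, and MPI-style process groups are replaced by a thread and a queue.

```python
import queue
import threading

def run_model(steps, compute_step, write_output):
    """Overlap output with computation.

    The compute loop hands each step's data to a dedicated I/O worker
    instead of writing synchronously, so computation of step n+1 can
    proceed while step n is still being written.
    """
    q = queue.Queue()

    def io_worker():
        while True:
            item = q.get()
            if item is None:          # sentinel: no more output
                break
            write_output(item)        # drains while compute continues

    t = threading.Thread(target=io_worker)
    t.start()
    for step in range(steps):
        data = compute_step(step)     # simulate one model step
        q.put(data)                   # asynchronous hand-off, no blocking write
    q.put(None)
    t.join()
```

The same shape applies when the "worker" is a pool of I/O processes: the computing side only pays the cost of the hand-off, and output cost hides behind computation as long as the writer keeps up.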

19 pages, 1494 KiB  
Article
Exploiting Data Similarity to Improve SSD Read Performance
by Shiqiang Nie, Jie Niu, Zeyu Zhang, Yingmeng Hu, Chenguang Shi and Weiguo Wu
Appl. Sci. 2023, 13(24), 13017; https://doi.org/10.3390/app132413017 - 6 Dec 2023
Viewed by 1381
Abstract
Although NAND (Not-AND) flash-based Solid-State Drives (SSDs) have demonstrated a significant performance advantage over hard disks, they still suffer from non-negligible performance under-utilization, as access conflicts often occur while servicing I/O requests because of resource sharing (e.g., several chips share one channel bus, and several planes share one data register inside a die). Much research has been devoted to minimizing access conflicts by redesigning I/O scheduling, cache replacement, and so on. These works achieve reasonable results; however, prior work has not fully exploited data similarity to alleviate access conflicts. The basic idea is that, since data duplication is common in many workloads, data with the same content from different requests can be distributed to addresses with minimal access conflict (i.e., addresses that do not share the same channel or chip), so a logical address is mapped to more than one physical address. The data can then be read from a candidate page when the channel or chip of its original address is busy. Motivated by this idea, we propose the Data Similarity aware Flash Translation Layer (DS-FTL), which mainly comprises a content-aware page allocation scheme and a multi-path read scheme. DS-FTL maximizes channel-level and chip-level parallelism and avoids read stalls induced by bus sharing. We also conducted a series of experiments on SSDsim, and the results demonstrate the effectiveness of our scheme: compared with the state of the art, it reduces read latency by 35.3% on average in our workloads. Full article
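The content-aware mapping can be sketched as a small table keyed by a content hash. The class below is a hypothetical illustration, not the paper's FTL: its names and channel layout are invented. Identical content written to different channels becomes a set of candidate copies, and a multi-path read picks whichever copy sits on a channel that is not busy.

```python
import hashlib

class ContentAwareFTL:
    """Toy sketch of the DS-FTL idea.

    Pages with identical content are tracked as copies on different
    channels, so a read can be redirected to an idle channel instead of
    stalling behind a busy one.
    """

    def __init__(self, channels):
        self.channels = channels
        self.copies = {}   # content hash -> set of channels holding a copy

    def write(self, data, channel):
        # Content-aware allocation: duplicate content joins an existing
        # copy set rather than creating a new logical mapping.
        h = hashlib.sha256(data).hexdigest()
        self.copies.setdefault(h, set()).add(channel)
        return h

    def read_channel(self, h, busy):
        # Multi-path read: prefer a copy on a channel that is not busy.
        for ch in self.copies.get(h, ()):
            if ch not in busy:
                return ch
        # All candidate channels busy: fall back to any copy (or None).
        return next(iter(self.copies.get(h, ())), None)
```

A real FTL would also track physical page numbers, garbage collection, and copy invalidation; the sketch keeps only the channel-selection logic that hides bus conflicts.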
