Performance Optimization and Performance Evaluation

A special issue of Algorithms (ISSN 1999-4893).

Deadline for manuscript submissions: closed (31 October 2022) | Viewed by 19800

Special Issue Editors


Prof. Dr. Yunquan Zhang
Guest Editor
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
Interests: parallel computing; parallel programming; parallel computational model

Dr. Liang Yuan
Guest Editor
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
Interests: parallel computational model

Special Issue Information

Dear Colleagues,

Algorithms is a peer-reviewed, open-access journal that provides an advanced forum for studies related to algorithms and their applications. It is published monthly online by MDPI, is free for readers, and is indexed in Scopus, ESCI (Web of Science), Ei Compendex, MathSciNet, and many other databases. For more information, please visit the journal website:

https://www.mdpi.com/journal/algorithms.

I am in charge of this Special Issue of Algorithms on the topic of “Performance Optimization and Performance Evaluation”. This Special Issue invites original, high-quality work presenting novel research on performance optimization. Featured articles should present novel strategies that address different aspects of performance, such as evaluation, algorithms, programming models, AI, co-design, and benchmarks. It is my pleasure to invite you to submit your work to this Special Issue.

Prof. Dr. Yunquan Zhang
Dr. Liang Yuan
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • performance optimization
  • performance evaluation
  • parallel algorithms
  • parallel programming models
  • HPC applications
  • HPC in AI
  • big data
  • hardware/software co-design
  • performance and energy efficiency/benchmarks
  • performance tuning

Published Papers (7 papers)


Research

17 pages, 621 KiB  
Article
Performance Evaluation of Open-Source Serverless Platforms for Kubernetes
by Jonathan Decker, Piotr Kasprzak and Julian Martin Kunkel
Algorithms 2022, 15(7), 234; https://doi.org/10.3390/a15070234 - 2 Jul 2022
Cited by 1 | Viewed by 4134
Abstract
Serverless computing has grown massively in popularity over the last few years, providing developers with a way to deploy function-sized code units without having to manage the underlying servers or deal with logging, monitoring, and scaling of their code. High-performance computing (HPC) clusters can profit from improved serverless resource-sharing capabilities compared to reservation-based systems such as Slurm. However, before running self-hosted serverless platforms in HPC becomes a viable option, serverless platforms must be able to deliver a decent level of performance. Other researchers have already pointed out a distinct lack of comparative benchmark studies on serverless platforms, especially for open-source, self-hosted platforms. This study takes a step towards filling this gap by systematically benchmarking two promising self-hosted, Kubernetes-based serverless platforms against each other. While the resulting benchmarks signal potential, they demonstrate that many opportunities for performance improvements in serverless computing are still being left on the table.
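The invocation-latency measurement behind such a comparison can be sketched in a few lines. The harness below is a generic illustration, not the paper's benchmark suite: `invoke` stands in for an HTTP call to a deployed function, and a real comparison would also measure cold starts and throughput under concurrent load.

```python
import statistics
import time

def benchmark_invocations(invoke, n=50):
    """Time n sequential invocations of a function and report median
    and tail latency in seconds."""
    latencies = []
    for _ in range(n):
        t0 = time.perf_counter()
        invoke()  # stand-in for an HTTP request to a serverless function
        latencies.append(time.perf_counter() - t0)
    lat = sorted(latencies)
    return {
        "p50": statistics.median(lat),
        "p95": lat[int(round(0.95 * (len(lat) - 1)))],
    }
```

Sequential timing like this isolates per-invocation overhead; platform comparisons typically repeat it across payload sizes and concurrency levels.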
(This article belongs to the Special Issue Performance Optimization and Performance Evaluation)

20 pages, 2715 KiB  
Article
Revisiting the Design of Parallel Stream Joins on Trusted Execution Environments
by Souhail Meftah, Shuhao Zhang, Bharadwaj Veeravalli and Khin Mi Mi Aung
Algorithms 2022, 15(6), 183; https://doi.org/10.3390/a15060183 - 25 May 2022
Cited by 1 | Viewed by 1989
Abstract
The appealing properties of secure hardware solutions such as trusted execution environments (TEEs), including low computational overhead, confidentiality guarantees, and a reduced attack surface, have prompted considerable interest in adopting them for secure stream processing applications. In this paper, we revisit the design of parallel stream join algorithms on multicore processors with TEEs. In particular, we conduct a series of profiling experiments to investigate the impact of alternative design choices for parallelizing stream joins on TEEs, including (1) execution approaches, (2) partitioning schemes, and (3) distributed scheduling strategies. From the profiling study, we observe three major performance impediments: (a) the computational overhead introduced by the cryptographic primitives associated with page-swapping operations, (b) the restrictive Enclave Page Cache (EPC) size, which limits the supported amount of in-memory processing, and (c) the lack of vertical scalability to support the increasing workloads often required for near-real-time applications. Addressing these issues allowed us to design SecJoin, a more efficient parallel stream join algorithm that exploits modern scale-out architectures with TEEs, optimizing performance without trading off security. We present our model-driven parameterization of SecJoin and share our experimental results, which show up to four-fold improvements in throughput and latency.
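As background, the classic single-threaded symmetric hash join that such parallel designs build on can be sketched as follows. The event format and `key` function are illustrative; inside an enclave, the two hash tables below are precisely the state that presses against the EPC limit identified above.

```python
from collections import defaultdict

def symmetric_hash_join(events, key):
    """Symmetric hash join over an interleaved stream of ('R', tuple) /
    ('S', tuple) events: each arriving tuple first probes the opposite
    stream's hash table for matches, then is inserted into its own."""
    table_r, table_s = defaultdict(list), defaultdict(list)
    matches = []
    for side, tup in events:
        k = key(tup)
        if side == "R":
            matches.extend((tup, other) for other in table_s[k])
            table_r[k].append(tup)
        else:
            matches.extend((other, tup) for other in table_r[k])
            table_s[k].append(tup)
    return matches
```

Partitioning schemes of the kind the paper profiles split the key space so that each worker owns disjoint slices of these tables.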

18 pages, 602 KiB  
Article
Trinity: Neural Network Adaptive Distributed Parallel Training Method Based on Reinforcement Learning
by Yan Zeng, Jiyang Wu, Jilin Zhang, Yongjian Ren and Yunquan Zhang
Algorithms 2022, 15(4), 108; https://doi.org/10.3390/a15040108 - 24 Mar 2022
Cited by 1 | Viewed by 2523
Abstract
Deep learning, with increasingly large datasets and complex neural networks, is widely used in computer vision and natural language processing. A resulting trend is to split and train large-scale neural network models across multiple devices in parallel, known as parallel model training. Existing parallel methods are mainly based on expert design, which is inefficient and requires specialized knowledge. Although automatic parallelization methods have been proposed to solve these problems, they consider only a single optimization objective, run time. In this paper, we present Trinity, an adaptive distributed parallel training method based on reinforcement learning, to automate the search for and tuning of parallel strategies. We build a multidimensional performance evaluation model and use proximal policy optimization to co-optimize multiple objectives. Our experiments used the CIFAR10 and PTB datasets with the InceptionV3, NMT, NASNet and PNASNet models. Compared with Google's Hierarchical method, Trinity achieves up to 5% reductions in runtime, communication, and memory overhead, and up to a 40% increase in parallel strategy search speed.
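Trinity trains a PPO policy for this search; as a much simpler illustration of the same search space, a random search over op-to-device placements scored by a multi-objective cost model looks like the sketch below. All names here are hypothetical, and the cost model is supplied by the caller (in Trinity's setting it would combine runtime, communication, and memory estimates).

```python
import random

def search_placement(ops, devices, cost, n_samples=200, seed=0):
    """Sample random assignments of ops to devices and keep the placement
    with the lowest cost under the supplied multi-objective cost model."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(n_samples):
        placement = {op: rng.choice(devices) for op in ops}
        c = cost(placement)
        if c < best_cost:
            best, best_cost = placement, c
    return best, best_cost
```

A learned policy replaces the blind sampling step with a distribution that improves as rewards come back, which is where the reported 40% search speedup comes from.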

16 pages, 8774 KiB  
Article
KMC3 and CHTKC: Best Scenarios, Deficiencies, and Challenges in High-Throughput Sequencing Data Analysis
by Deyou Tang, Daqiang Tan, Weihao Xiao, Jiabin Lin and Juan Fu
Algorithms 2022, 15(4), 107; https://doi.org/10.3390/a15040107 - 24 Mar 2022
Viewed by 2252
Abstract
Background: k-mer frequency counting is an upstream process of many bioinformatics data analysis workflows. KMC3 and CHTKC are representative partition-based and non-partition-based k-mer counting algorithms, respectively. This paper evaluates the two algorithms and presents their best applicable scenarios and potential improvements using multiple hardware contexts and datasets. Results: KMC3 uses less memory and runs faster than CHTKC on a regular configuration server. CHTKC is efficient on high-performance computing platforms with large available memory, many threads, and low I/O bandwidth. When tested with various datasets, KMC3 is less sensitive to the number of distinct k-mers and is more efficient for tasks with relatively low sequencing quality and long k-mers. CHTKC performs better than KMC3 on counting assignments with large-scale datasets, high sequencing quality, and short k-mers. Both algorithms are affected by I/O bandwidth, and decreasing the influence of the I/O bottleneck is critical, as our tests show improvement from filtering and compressing consecutive first-occurring k-mers in KMC3. Conclusions: KMC3 is more competitive for running the counter on ordinary hardware resources, and CHTKC is more competitive for counting k-mers in super-scale datasets on higher-performance computing platforms. Reducing the influence of the I/O bottleneck is essential for optimizing k-mer counting algorithms, and filtering and compressing low-frequency k-mers is critical for relieving the I/O impact.
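For reference, the core of a non-partitioned, in-memory k-mer counter (the hash-table approach CHTKC represents) fits in a few lines; partition-based tools such as KMC3 instead bin k-mers to disk first to bound memory. A minimal sketch:

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every length-k substring across a collection of reads using a
    single in-memory hash table (no disk partitioning)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts
```

The memory and I/O trade-offs the paper measures follow directly from where this table lives: entirely in RAM (CHTKC-style) versus streamed through disk bins (KMC3-style).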

19 pages, 1861 KiB  
Article
Accelerate Incremental TSP Algorithms on Time Evolving Graphs with Partitioning Methods
by Shalini Sharma and Jerry Chou
Algorithms 2022, 15(2), 64; https://doi.org/10.3390/a15020064 - 14 Feb 2022
Cited by 1 | Viewed by 2412
Abstract
In time-evolving graphs, the graph changes at each time interval, and previously computed results become invalid. We addressed this issue for the traveling salesman problem (TSP) in our previous work and proposed an incremental algorithm in which the TSP tour is computed from the previous result instead of from the whole graph. In our current work, we have applied three partitioning methods, named vertex size attribute, edge attribute, and k-means, to the TSP problem and compared the resulting TSP tours. We have also examined the effect of increasing the number of partitions on the total computation time. Through our experiments, we have observed that the vertex size attribute performs best because of the balanced number of vertices in each partition.
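Of the three schemes, k-means partitioning is easy to sketch for vertices with 2D coordinates. The toy version below is an assumption-laden illustration, not the paper's implementation: it only groups vertices into clusters, after which each cluster would be toured independently and the sub-tours stitched together.

```python
import math
import random

def kmeans_partition(points, k, iters=20, seed=0):
    """Partition 2D vertices into k clusters by Lloyd's algorithm so each
    cluster can be handled by an independent TSP sub-solver."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[j].append(p)
        # Recompute each center as its cluster's centroid (keep old center
        # if a cluster went empty).
        centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return clusters
```

Note how cluster sizes fall out of the data here, whereas the vertex size attribute method, which the experiments favor, balances vertex counts by construction.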

15 pages, 682 KiB  
Article
An ADMM Based Parallel Approach for Fund of Fund Construction
by Yidong Chen, Chen Li and Zhonghua Lu
Algorithms 2022, 15(2), 35; https://doi.org/10.3390/a15020035 - 25 Jan 2022
Viewed by 2179
Abstract
In this paper, we propose a parallel algorithm for a fund of funds (FOF) optimization model. Based on the structure of the objective function, we create an augmented Lagrangian function and separate the quadratic term from the nonlinear term using the alternating direction method of multipliers (ADMM), which yields two new subproblems that are much easier to compute. To accelerate the convergence of the proposed algorithm, we use an adaptive step size method that adjusts the step parameter according to the residual of the dual problem at every iteration. We parallelize the proposed algorithm and implement it on CUDA with block storage for the structured matrix, which is shown to be up to two orders of magnitude faster than the CPU implementation on large-scale problems.
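The quadratic/nonlinear splitting can be illustrated on a toy problem. This is a generic ADMM sketch, not the paper's FOF model or CUDA implementation: the nonlinear term is stood in by an l1 penalty, which has a closed-form proximal step, and the residual-balancing rho update mimics the adaptive step size idea.

```python
import numpy as np

def admm_split(Q, c, lam=0.1, rho=1.0, n_iter=200):
    """Minimize 0.5 x'Qx + c'x + lam*||z||_1 subject to x = z.
    The quadratic subproblem is a linear solve; the nonlinear subproblem
    is soft-thresholding; rho adapts from the primal/dual residuals."""
    n = Q.shape[0]
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)  # u: scaled dual
    for _ in range(n_iter):
        # x-update: solve (Q + rho*I) x = -c + rho*(z - u)
        x = np.linalg.solve(Q + rho * np.eye(n), -c + rho * (z - u))
        # z-update: prox of the l1 term (soft-thresholding)
        v = x + u
        z_old = z
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        u = u + x - z
        # adaptive step: rebalance rho against the two residuals
        r = np.linalg.norm(x - z)            # primal residual
        s = rho * np.linalg.norm(z - z_old)  # dual residual
        if r > 10 * s:
            rho *= 2.0; u /= 2.0
        elif s > 10 * r:
            rho /= 2.0; u *= 2.0
    return x, z
```

The two subproblems are independent of each other within an iteration, which is what makes a GPU implementation with block-structured matrices attractive.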

18 pages, 1248 KiB  
Article
A New Algorithm for Simultaneous Retrieval of Aerosols and Marine Parameters
by Taddeo Ssenyonga, Øyvind Frette, Børge Hamre, Knut Stamnes, Dennis Muyimbwa, Nicolausi Ssebiyonga and Jakob J. Stamnes
Algorithms 2022, 15(1), 4; https://doi.org/10.3390/a15010004 - 24 Dec 2021
Viewed by 2526
Abstract
We present an algorithm for the simultaneous retrieval of aerosol and marine parameters in coastal waters. The algorithm is based on a radiative transfer forward model for a coupled atmosphere-ocean system, which is used to train a radial basis function neural network (RBF-NN) to obtain a fast and accurate method for computing radiances at the top of the atmosphere (TOA) for given aerosol and marine input parameters. The inverse modelling algorithm employs multidimensional unconstrained non-linear optimization to retrieve three marine parameters (the concentrations of chlorophyll and mineral particles, as well as absorption by coloured dissolved organic matter (CDOM)) and two aerosol parameters (the aerosol fine-mode fraction and aerosol volume fraction). We validated the retrieval algorithm using synthetic data and found that, for both low and high sun, it predicts each of the five parameters accurately, both with and without white noise added to the TOA radiances. When varying the solar zenith angle (SZA) and retraining the RBF-NN without noise added to the TOA radiance, we found the algorithm to predict the CDOM absorption, chlorophyll concentration, mineral concentration, aerosol fine-mode fraction, and aerosol volume fraction with correlation coefficients greater than 0.72, 0.73, 0.93, 0.67, and 0.87, respectively, for 45° ≤ SZA ≤ 75°. By adding white Gaussian noise to the TOA radiances with varying values of the signal-to-noise ratio (SNR), we found the retrieval algorithm to predict these five parameters well, with correlation coefficients greater than 0.77, 0.75, 0.91, 0.81, and 0.86, respectively, for high sun and SNR ≥ 95.
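The surrogate at the heart of the method is a standard Gaussian RBF network, whose forward pass is compact enough to sketch. The centers, widths, and output weights below are illustrative placeholders; in the retrieval setting they would be fitted to radiative transfer simulations.

```python
import numpy as np

def rbf_nn_forward(x, centers, widths, weights):
    """Gaussian RBF network: hidden activations are Gaussians of the
    squared distance from input x to each center; the output is a linear
    combination of those activations."""
    sq_dist = np.sum((x[None, :] - centers) ** 2, axis=1)
    phi = np.exp(-sq_dist / (2.0 * widths ** 2))
    return phi @ weights
```

Because this evaluates in microseconds rather than the seconds a full radiative transfer solve would take, the inverse problem can call it inside an optimization loop over the five retrieved parameters.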
