Peer-Review Record

AAQAL: A Machine Learning-Based Tool for Performance Optimization of Parallel SPMV Computations Using Block CSR

Appl. Sci. 2022, 12(14), 7073; https://doi.org/10.3390/app12147073
by Muhammad Ahmed 1, Sardar Usman 2,†, Nehad Ali Shah 3, M. Usman Ashraf 4,*,†, Ahmed Mohammed Alghamdi 5, Adel A. Bahadded 6 and Khalid Ali Almarhabi 7
Submission received: 5 May 2022 / Revised: 19 June 2022 / Accepted: 5 July 2022 / Published: 13 July 2022
(This article belongs to the Special Issue Applied and Innovative Computational Intelligence Systems)

Round 1

Reviewer 1 Report

The paper proposes a machine learning-based tool for performance optimization of parallel SpMV computations using block CSR. To reduce pressure on the storage subsystem, the paper uses the BCSR storage format to optimize the performance of the SpMV kernel. This is a worthwhile research question, but the paper still has some problems.

(1) The logic of the abstract is very poor; please revise this part carefully. The abstract devotes too much space to introductory material, whereas it should mainly describe the contributions of this paper.

(2) The introduction is not very logical. It needs to clarify why this research was undertaken and what problems it solves.

(3) For the compression algorithm proposed in the article, the accuracy of the compressed results needs to be shown. Compared with the uncompressed data, does the compression change the intrinsic characteristics of the original data?

(4) The figures in the article become blurred when zoomed in; please use vector graphics.

(5) The article compares the results of several machine learning-based prediction models, but these results do not demonstrate the superiority of the proposed compression algorithm; a comparison with existing compression algorithms is necessary.

(6) There are many errors in English usage and abbreviations; please check and correct them carefully.

Author Response

We are really grateful to the respected reviewer for spending precious time on our manuscript and giving constructive comments and suggestions. We have addressed and responded to all the valued comments one by one in the attached document.

Author Response File: Author Response.docx

Reviewer 2 Report

The paper introduces a machine learning approach to determining optimal block sizes for block-CSR matrix storage and SpMV operations. Using a wide range of matrices from a variety of problem domains, the authors determine a number of features, ranging from trivially computable ones to ones that require significant computation over all matrix data. These features are then used by a number of fairly simple machine learning algorithms to automatically determine the best block size. The results are evaluated extensively using various performance metrics and accuracy measures.
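To illustrate what "choosing a block size for block-CSR (BCSR)" involves, here is a minimal pure-Python sketch, not the authors' AAQAL code, of converting a CSR matrix to BCSR with dense r×c blocks and running SpMV on the result. All function names are hypothetical. The fill ratio (stored values divided by true nonzeros) counts the explicit zeros a given block size introduces; the product itself is unchanged, which also bears on Reviewer 1's question (3) about whether the format alters the data.

```python
from typing import List, Tuple


def csr_to_bcsr(n_rows: int, indptr: List[int], indices: List[int],
                data: List[float], r: int, c: int
                ) -> Tuple[List[int], List[int], List[List[float]]]:
    """Convert a CSR matrix to BCSR with dense r x c blocks (illustrative).

    Returns (b_indptr, b_indices, b_values): a CSR-like index over block
    rows, the block-column index of each stored block, and each block's
    r*c values in row-major order. Explicit zeros inside a block are fill-in.
    """
    n_brows = (n_rows + r - 1) // r
    b_indptr, b_indices, b_values = [0], [], []
    for bi in range(n_brows):
        blocks = {}  # block column -> dense row-major r*c values
        for i in range(bi * r, min((bi + 1) * r, n_rows)):
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                bj = j // c
                block = blocks.setdefault(bj, [0.0] * (r * c))
                # Accumulate so duplicate CSR entries are summed, as in CSR SpMV.
                block[(i - bi * r) * c + (j - bj * c)] += data[k]
        for bj in sorted(blocks):
            b_indices.append(bj)
            b_values.append(blocks[bj])
        b_indptr.append(len(b_indices))
    return b_indptr, b_indices, b_values


def bcsr_spmv(n_rows: int, r: int, c: int, b_indptr: List[int],
              b_indices: List[int], b_values: List[List[float]],
              x: List[float]) -> List[float]:
    """Compute y = A x with A stored in BCSR."""
    y = [0.0] * n_rows
    for bi in range(len(b_indptr) - 1):
        for k in range(b_indptr[bi], b_indptr[bi + 1]):
            bj, block = b_indices[k], b_values[k]
            for ii in range(r):
                i = bi * r + ii
                if i >= n_rows:  # last block row may be partial
                    break
                acc = 0.0
                for jj in range(c):
                    j = bj * c + jj
                    if j < len(x):  # last block column may be partial
                        acc += block[ii * c + jj] * x[j]
                y[i] += acc
    return y


def csr_spmv(n_rows: int, indptr: List[int], indices: List[int],
             data: List[float], x: List[float]) -> List[float]:
    """Reference CSR SpMV for comparison."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y


def fill_ratio(b_indices: List[int], r: int, c: int, nnz: int) -> float:
    """Stored values (including fill-in zeros) per true nonzero."""
    return len(b_indices) * r * c / nnz


if __name__ == "__main__":
    # 4x4 example: [[1,2,0,0],[0,3,0,0],[0,0,4,5],[0,0,6,0]]
    indptr = [0, 2, 3, 5, 6]
    indices = [0, 1, 1, 2, 3, 2]
    data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    x = [1.0, 1.0, 1.0, 1.0]
    bp, bidx, bv = csr_to_bcsr(4, indptr, indices, data, 2, 2)
    print(csr_spmv(4, indptr, indices, data, x))    # [3.0, 3.0, 9.0, 6.0]
    print(bcsr_spmv(4, 2, 2, bp, bidx, bv, x))      # [3.0, 3.0, 9.0, 6.0]
    print(fill_ratio(bidx, 2, 2, len(data)))        # 8 stored / 6 nonzeros
```

The dictionary keyed by block column keeps the conversion to one pass per block row; production code would more likely use preallocated arrays, for example SciPy's `csr_matrix.tobsr(blocksize=(r, c))`.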

The strength of the paper is that it studies a very wide range of matrices, making it potentially applicable to a wide range of problems. The measures are reasonably simple, as are the machine learning algorithms, making them easy to include in applications, and the performance advantages are worth it. The key weakness of the paper is that it doesn't motivate how AAQAL would be used, and does not discuss the involved overheads. It is clear that once the matrix is converted to block CSR with the optimal/predicted block size, there is a performance advantage, but what is the cost of computing "important" features, what is the cost of running the machine learning model (for inference), and most importantly what is the cost of converting a matrix from CSR to BCSR? Finally, what are the overall speedups (with and without including time for conversion) compared to just basic CSR?
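The overhead question can be made concrete with a simple cost model, sketched here as an illustration rather than taken from the paper. Assume k SpMV calls on the same matrix, per-call kernel times t_CSR and t_BCSR, and a one-time overhead t_o (feature extraction + model inference + CSR-to-BCSR conversion). The end-to-end speedup is then k·t_CSR / (t_o + k·t_BCSR), and the conversion pays off once k ≥ t_o / (t_CSR − t_BCSR). The function names and the timings below are hypothetical placeholders, not measurements from the paper.

```python
import math


def overall_speedup(t_csr: float, t_bcsr: float, k: int,
                    t_overhead: float = 0.0) -> float:
    """End-to-end speedup of BCSR over CSR for k SpMV calls on one matrix.

    t_overhead is the one-time cost (feature extraction + model inference
    + CSR-to-BCSR conversion); pass 0.0 for the kernel-only speedup.
    """
    return (k * t_csr) / (t_overhead + k * t_bcsr)


def break_even_iterations(t_csr: float, t_bcsr: float,
                          t_overhead: float) -> float:
    """Smallest number of SpMV calls after which BCSR is no slower overall."""
    saving = t_csr - t_bcsr
    if saving <= 0:
        return math.inf  # BCSR kernel is not faster, so overhead never pays off
    return math.ceil(t_overhead / saving)


if __name__ == "__main__":
    # Placeholder timings (arbitrary units), not measurements from the paper.
    t_csr, t_bcsr, t_o = 1.0, 0.75, 5.0
    print(overall_speedup(t_csr, t_bcsr, 100))        # kernel-only speedup
    print(overall_speedup(t_csr, t_bcsr, 100, t_o))   # 1.25 including overhead
    print(break_even_iterations(t_csr, t_bcsr, t_o))  # 20
```

Reporting both numbers (with and without t_o) and the break-even count is one way to answer the reviewer's two speedup questions at once.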

The paper is overall well written and easy to follow, though there are numerous typos (a few are listed below). However, the performance figures are inconsistent and poorly formatted (different sizes, scales, and borders; no vector graphics); this has to be addressed and cleaned up.

Minor comments:
- Sec. 2.1.1: why is Y capitalized? It is supposed to be a vector, and vectors are usually denoted by lowercase letters.
- page 7 line 328 - who is "Author"?
- page 8 line 296 - 309 - no need to say "In step XX" or "segment" etc. 
- Section 4.1: why mention OpenMP and the operating system multiple times? Openmpi-3.0.0 does not implement OpenMP, so why is it there?
- What hardware was used to run the tests? How many times was each test run? Was there any significant noise in the measurements? I am sure there was for small matrices...
- How is "Score" computed?
- Section 4.6 line 522 SPMV all caps
- Section 4.6 what is "hit and trail"?

Author Response

We are really grateful to the respected reviewer for spending precious time on our manuscript and giving constructive comments and suggestions. We have addressed and responded to all the valued comments one by one in the attached document.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

The authors have not addressed my main concerns (the minor ones they did), namely those in the following paragraph:
The key weakness of the paper is that it doesn't motivate how AAQAL would be used, and does not discuss the involved overheads. It is clear that once the matrix is converted to block CSR with the optimal/predicted block size, there is a performance advantage, but what is the cost of computing "important" features, what is the cost of running the machine learning model (for inference), and most importantly what is the cost of converting a matrix from CSR to BCSR? Finally, what are the overall speedups (with and without including time for conversion) compared to just basic CSR?

These have to be addressed before the paper's results can be considered relevant and appropriately placed in the state of the art.

Author Response

We thank the respected reviewer very much for the valued comments. We have provided our justification in the attached file, as per the given comments.

Author Response File: Author Response.docx

Round 3

Reviewer 2 Report

The authors have sufficiently addressed my concerns.

Author Response

We thank the editor very much for the constructive comments and suggestions to improve our paper. We have updated the manuscript according to the given comments.

Author Response File: Author Response.docx
