Feature-Interaction-Enhanced Sequential Transformer for Click-Through Rate Prediction
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The paper presents a method for predicting the click-through rate (CTR) using a combination of feature extraction and sequential modelling. The work is well-written and cohesive, and I only have a few minor queries:
1. Is there a specific reference for AutoInt? Citing this when it is first introduced (in line 144) would be helpful.
2. There are also several abbreviations introduced throughout the work; a nomenclature listing them for easy reference should be provided.
3. In line 242, it is stated that `if the number of AutoInt layers... is n, the output of the last AutoInt layer... becomes the input of the Transformer layer.' Are there any other cases besides this (i.e., if the number of layers is say, n-1 instead)? What would happen in those cases?
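To make the question concrete, here is a minimal, hypothetical sketch (not the authors' code; all names are stand-ins) of the stacking described in line 242: regardless of how many interaction layers are stacked, only the output of the last one feeds the next stage, so a stack of n-1 layers would behave the same way structurally.

```python
# Hypothetical sketch: n stacked AutoInt-style layers feeding a Transformer
# layer. Each "layer" here is a toy stand-in transforming a single number.
def run_stack(x, layers, final_stage):
    for layer in layers:           # apply the stacked interaction layers in order
        x = layer(x)
    return final_stage(x)          # the last layer's output feeds the next stage

autoint_layers = [lambda v: v + 1] * 3   # n = 3 stacked layers (arbitrary)
transformer = lambda v: v * 10           # stand-in for the Transformer layer
result = run_stack(0, autoint_layers, transformer)
print(result)  # → 30
```

With n-1 layers the only change is one fewer pass through the loop; the hand-off to the final stage is identical.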
4. What is meant by `scale factor \sqrt{d_h} to avoid large values.' in line 376? What are these values of, and how large can they be allowed to become before being detrimental?
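As background for this question, the standard motivation for a \sqrt{d_h} scale factor in dot-product attention can be illustrated numerically. The sketch below is a minimal illustration under the assumption that the paper uses standard scaled dot-product attention (the function and variable names are hypothetical, not from the paper): dot products of random d_h-dimensional query/key vectors have standard deviation about \sqrt{d_h}, so dividing by \sqrt{d_h} keeps the pre-softmax logits near unit scale and prevents the softmax from saturating.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_logit_std(d_h, n=10000, scaled=True):
    """Std of n random query-key dot products, with optional 1/sqrt(d_h) scaling."""
    q = rng.standard_normal((n, d_h))
    k = rng.standard_normal((n, d_h))
    logits = (q * k).sum(axis=1)        # dot products q_i . k_i
    if scaled:
        logits = logits / np.sqrt(d_h)  # the sqrt(d_h) scale factor in question
    return logits.std()

# Without scaling, logit spread grows like sqrt(d_h) (~8 for d_h = 64),
# pushing the softmax toward saturation; with scaling it stays near 1.
print(attention_logit_std(64, scaled=False))
print(attention_logit_std(64, scaled=True))
```

"Too large" in practice means logits spread wide enough that the softmax assigns near-one-hot weights and its gradients vanish, which is why the division is applied before the softmax.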
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The paper represents the original idea of the authors. The literature review is extensive and relevant. The methodology is presented well, and it appears that everything the authors did could be reproduced. The experimental analysis is very comprehensive and of high quality: various scenarios and the impact of individual parts of the proposed model were analyzed, and two publicly available databases were used. My main complaint concerns the conclusion, namely its very short form. A great deal has been analyzed and the work contains a large number of experiments, but the conclusions drawn from so many analyses are very limited. Clear conclusions should be stated on the basis of the individual analyses that were carried out; the conclusion should spell out the direct findings of each experiment. Otherwise, readers may get the impression that some of the experimental analyses are superfluous. As for the graphical representation of the results, I think it could be better in some parts:
- Figure 1 could be made larger so that the very small text on the lowest layer is legible.
- In Figure 2 all the text is in bold; I am not sure this is right, since not all the symbols are matrices.
- Figure 6 could be enlarged.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
Most of my comments have been satisfactorily addressed; there are, however, still some abbreviations missing from the list. While not every abbreviation needs to be included (especially if it is only used within the same few lines or paragraph), the following are referred to on later pages and so should be included in the list for convenience:
- Markov chains (MC)
- factorization machine (FM)
- product-based neural networks (PNN)
- Gated Recurrent Unit (GRU)
- attentional update gate (AUGRU)
Additionally, check if `attentional update gate' corresponds to AUGRU (since the abbreviation implies otherwise), and if the Model label `PNN' in Table 3 is correct (or if it should be `IPNN').
Author Response
Please see the attachment.
Author Response File: Author Response.pdf