Peer-Review Record

Extending Context Window in Large Language Models with Segmented Base Adjustment for Rotary Position Embeddings

Appl. Sci. 2024, 14(7), 3076; https://doi.org/10.3390/app14073076
by Rongsheng Li 1, Jin Xu 1, Zhixiong Cao 1, Hai-Tao Zheng 1,2,* and Hong-Gee Kim 3
Reviewer 1: Anonymous
Reviewer 3:
Reviewer 4: Anonymous
Submission received: 23 February 2024 / Revised: 29 March 2024 / Accepted: 30 March 2024 / Published: 6 April 2024
(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Segmented Base Adjustment for Rotary Position Embeddings (SBA-RoPE) is a newly proposed technique for improving long-text processing by extending the context window. The method is compared against existing state-of-the-art approaches such as Position Interpolation (PI), NTK, and YaRN. The researchers fine-tuned the Pythia-2.8B model on the PG-19 dataset and conducted passkey retrieval experiments. In terms of model perplexity, the proposed method either outperforms the state-of-the-art methods or ranks second-best in most cases. It also achieves high passkey-retrieval accuracy across different context window sizes with a scaling factor of s = 4, although it is less effective with a scaling factor of s = 2, where it is second-best in most cases. Overall, the article is well written and presents promising results.
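
As background for readers comparing these schemes: all of them reinterpret RoPE's per-channel rotation frequencies θ_j = base^(−2j/d). The Python sketch below contrasts PI and NTK-aware scaling (whose formulas are standard and commonly implemented this way) with a hypothetical segmented-base variant; the paper's exact segmentation and base schedule are not given in this record, so `segmented_base_inv_freq` and its `bases` argument are illustrative only, not the authors' actual method.

```python
import torch

def rope_inv_freq(dim: int, base: float = 10000.0) -> torch.Tensor:
    """Standard RoPE inverse frequencies: theta_j = base^(-2j/dim)
    for frequency channels j = 0 .. dim/2 - 1."""
    return 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))

def pi_inv_freq(dim: int, scale: float, base: float = 10000.0) -> torch.Tensor:
    """Position Interpolation (PI): equivalent to dividing every position
    index by the scaling factor s, i.e. shrinking all frequencies by 1/s."""
    return rope_inv_freq(dim, base) / scale

def ntk_inv_freq(dim: int, scale: float, base: float = 10000.0) -> torch.Tensor:
    """NTK-aware scaling: keep positions intact but enlarge the single
    shared base, which mostly stretches the low-frequency channels."""
    return rope_inv_freq(dim, base * scale ** (dim / (dim - 2)))

def segmented_base_inv_freq(dim: int, bases: list[float]) -> torch.Tensor:
    """Hypothetical segmented-base adjustment: split the dim/2 frequency
    channels into contiguous segments and give each segment its own base.
    Only a structural illustration of the idea named in the title."""
    half = dim // 2
    seg = half // len(bases)
    inv_freq = torch.empty(half)
    for k, b in enumerate(bases):
        lo = k * seg
        hi = half if k == len(bases) - 1 else lo + seg
        j = torch.arange(lo, hi).float()
        inv_freq[lo:hi] = 1.0 / (b ** (2 * j / dim))
    return inv_freq
```

For example, `segmented_base_inv_freq(64, [10000.0, 20000.0, 40000.0, 80000.0])` would assign progressively larger bases to the lower-frequency segments; whether the paper's schedule follows this direction is an assumption here.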

 

However, the article could be improved with some minor textual changes. For instance, the sentence "There are existing methods, such as Positional Interpolation (PI)[12], Neural Tangent Kernel (NTK)[13], and YaRN[14], to expand the context window, each addressing different aspects of the challenge, but they have their own limitations" is difficult to comprehend in its current form. Similarly, the sentence "This approach not only preserves the model's performance on tasks within the original context window but also enhances its capacity to generalize to tasks that necessitate longer contexts" could use some rephrasing.

Comments on the Quality of English Language

The English is very good.

Author Response

Dear Reviewer,

We greatly appreciate your thoughtful review and constructive feedback on our manuscript. Your comments have provided us with valuable insights that have helped us refine our work and improve its clarity and readability. In response to your suggestions, we have made the following revisions:

1. We have carefully revised the sentences you highlighted for clarity and ease of understanding.

2. Beyond these specific sentences, we have also reviewed the manuscript as a whole to identify and revise any other sections that might benefit from clearer expression or more direct language.

We believe these revisions address your concerns and enrich the manuscript, making it more accessible to our readers. We are grateful for the opportunity to improve our work based on your feedback and hope that our revisions meet your expectations.

Thank you once again for your invaluable input and guidance.

Reviewer 2 Report

Comments and Suggestions for Authors

The paper introduces a novel approach called SBA-RoPE (Segmented Base Adjustment for Rotary Position Embeddings) to efficiently extend the context window in large language models (LLMs) by adjusting the base of rotary position embeddings (RoPE) in a segmented manner. This method stands out by offering a smart solution to a significant challenge in natural language processing: extending the capacity of models to process and understand longer text sequences. The approach is methodically validated through experiments that demonstrate SBA-RoPE's effectiveness in extending context windows without compromising the model's performance, even on texts exceeding original training lengths. Moreover, the paper contributes to the field by highlighting its potential for practical application in improving the performance of LLMs on long-text processing tasks, making it a valuable resource for researchers and practitioners looking to enhance model capabilities in handling extended contexts.
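
The experimental validation mentioned here includes the passkey-retrieval task cited in Reviewer 1's summary. As a sketch of the standard protocol (following the common formulation of Mohtashami and Jaggi, 2023; the authors' exact prompt wording and filler text may differ):

```python
import random

def make_passkey_prompt(passkey: int, n_filler: int) -> str:
    """Hide a passkey at a random depth inside repeated filler text and
    ask the model to repeat it; retrieval accuracy is then the fraction
    of generations that contain the passkey."""
    filler = ("The grass is green. The sky is blue. "
              "The sun is yellow. Here we go. There and back again. ")
    needle = (f"The pass key is {passkey}. Remember it. "
              f"{passkey} is the pass key. ")
    depth = random.randint(0, n_filler)
    return ((filler * depth) + needle + (filler * (n_filler - depth))
            + "What is the pass key? The pass key is")

# Larger n_filler pushes the prompt past the original context window,
# which is exactly what the extended-context methods are tested on.
prompt = make_passkey_prompt(passkey=random.randint(10000, 99999), n_filler=400)
```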

While the paper effectively addresses extending the context window in LLMs, it primarily focuses on the technical and methodological aspects, with less emphasis on the broader implications of this enhancement. For instance, the discussion on how SBA-RoPE could impact various NLP applications, such as document summarization or question answering over long documents, is somewhat limited. Additionally, while the paper compares SBA-RoPE with other methods like PI, NTK, and YaRN, a more detailed analysis of the trade-offs between these approaches, considering factors such as computational efficiency, ease of implementation, and scalability, could provide a clearer guide for choosing among them. Finally, the paper could benefit from a deeper exploration of the limitations and potential challenges in deploying SBA-RoPE in real-world scenarios, including its integration with different model architectures and its performance on diverse datasets.

To enhance the paper, the authors could expand the discussion section to explore the broader implications of SBA-RoPE on various NLP applications that deal with long texts, providing insights into potential use cases and benefits. A more comprehensive comparison with existing methods, detailing the trade-offs in terms of computational resources, scalability, and practical implementation challenges, would help readers better understand the positioning of SBA-RoPE in the landscape of solutions for extending context windows. Additionally, addressing potential limitations and challenges in real-world applications, including considerations for different languages, domains, and model architectures, would offer a more nuanced view of SBA-RoPE's applicability and areas for future research. Expanding on these aspects could significantly enhance the paper's contribution to the field and its utility for a wider audience.

Author Response

Dear Reviewer,

We are deeply grateful for your comprehensive review and insightful feedback on our manuscript. Your constructive comments have not only helped us identify areas for improvement but also guided us in enhancing the depth and breadth of our paper. In response to your suggestions, we have undertaken the following revisions to address the concerns raised:

1. **Expanded Discussion on NLP Applications**: We have significantly expanded the "Discussions of the Results" section to include a more detailed discussion on how SBA-RoPE impacts various NLP applications, particularly focusing on document summarization and question answering over long documents. This expansion aims to provide readers with a clearer understanding of the practical implications and potential benefits of our approach in real-world applications, highlighting its versatility and effectiveness in handling extended text sequences.

2. **Comprehensive Comparison with Existing Methods**: Furthermore, we have enriched the same section by incorporating a more thorough comparison of SBA-RoPE with other existing methods such as PI, NTK, and YaRN. This includes an in-depth analysis of the trade-offs involved in terms of computational efficiency, ease of implementation, and scalability. Our goal here is to offer a comprehensive guide that aids readers in making informed decisions when choosing among different methods for extending context windows in LLMs, based on their specific needs and constraints.

These revisions have been made with the intent to not only address your concerns but also to significantly enhance the value of our paper to the research community and practitioners in the field. We believe that these changes have improved our manuscript by providing a more comprehensive understanding of SBA-RoPE's potential impact, practical applications, and the considerations needed for its implementation.

Thank you once again for your invaluable feedback and the opportunity to improve our work. We hope that our revisions meet your expectations and look forward to any further suggestions you might have.

Reviewer 3 Report

Comments and Suggestions for Authors

The article is generally well written and structured, although it is not always easy to read. It presents interesting contributions from a methodological point of view.

Apart from the typos described below, which must be corrected, the article is very interesting in terms of its possible contribution to the state of the art and is ready to be published. I therefore propose a minor revision, conditional on the correction of these typos.

On line 21, the acronym LLM appears with no space separating it from the text immediately before it; the same occurs with the acronym ICL. The spacing before references is also inconsistent, with some references preceded by a space and others not. Please review these issues.

Equations (5), (6), and (7) are numbered, but the text in fact contains only two equations. In addition, equations should be referenced in the body of the document, as is done for tables and figures.

Table 2 is not correctly formatted in the document.

Author Response

Dear Reviewer,

First and foremost, we would like to express our sincere gratitude for your constructive feedback on our manuscript. Your insights have been invaluable in enhancing the quality and clarity of our work. Below, we address each of your comments in detail:

1. **Corrections to Acronyms and References Spacing**: We have carefully reviewed the manuscript and corrected the issues related to the spacing around acronyms and references. Specifically, we have ensured that there is a proper space before the acronyms "LLM" and "ICL" wherever they appear in the text. Additionally, we have standardized the spacing before and after references throughout the document to maintain consistency.

2. **Equation Numbering**: Upon your observation, we realized the discrepancy in the numbering of equations. We have now removed the numbering for Equation (7) to correct this mistake. We appreciate your attention to detail, which helped us identify and rectify this error.

3. **Table 2 Formatting**: We have conducted a thorough review of Table 2 as per your suggestion. The table did appear slightly shifted towards the left, which was an unintended formatting issue. This was due to our attempt to fit the table within the column width using the `\begin{adjustwidth}{-\extralength}{0cm}` command, as recommended by the MDPI LaTeX template for cases where content does not fit within the standard column width. We believe that the formatting, while slightly adjusted, adheres to the acceptable standards provided by the template. However, we are open to further adjustments should there be a recommended approach to better align the table within the document's format.
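
For concreteness, the template pattern referred to above looks roughly like the following. This is a sketch only: `\extralength` is defined by the MDPI document class, `adjustwidth` comes from the changepage package the class loads, and the column layout and caption are placeholders, not the manuscript's actual Table 2.

```latex
% Sketch of the MDPI-template pattern for tables wider than the text
% column; placeholder content only.
\begin{table}[H]
  \caption{Placeholder caption.}
  \begin{adjustwidth}{-\extralength}{0cm}
    \centering
    \begin{tabular}{lcc}
      \toprule
      Method & \dots & \dots \\
      \midrule
      SBA-RoPE & \dots & \dots \\
      \bottomrule
    \end{tabular}
  \end{adjustwidth}
\end{table}
```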

 

We hope that the revisions and clarifications provided address the concerns raised. We are committed to contributing valuable insights to the field and believe that the revised manuscript is now in a better state for publication. We look forward to your further instructions and are more than willing to make additional adjustments if required.

 

Thank you once again for your thorough review and helpful suggestions.

Reviewer 4 Report

Comments and Suggestions for Authors

Firstly, I would like to acknowledge that the article is well researched, well grounded, and well referenced. It addresses a contemporary discourse on extending the context window in large language models. The experiments on the Pythia model are well explained and supported by the various equations presented. The data presentation is detailed and well explained. I enjoyed reading and reviewing the article. The literature review is comprehensive and supported by current citations.

However, there are a few aspects that I have picked up that I feel should be addressed to strengthen the article, as stated below:

1). On page 4, below Line 135, you have numbered Equations 5 to 7, yet there are only two equations (i.e., 5 and 6). Please either remove the number 7 or add the equation.

2). Lines 286 to 305: Please provide separate sections for “Discussions of the Results” and “Conclusions and Recommendations”. My main concern is that everything is currently compressed into a single brief section that lacks a detailed discussion of your findings. Similarly, the contributions of your study to “literature” and “practice” in the field of large language models (LLMs) are missing. Please add these.

3). Reference number “20” on page 11: the page numbers for the conference proceedings are missing. Please ensure that all references are uniformly formatted.

4). The rest of the document looks good, and I have no issues with the results and discussions.

Author Response

Dear Reviewer,

Firstly, we would like to extend our sincere gratitude for your thorough review and constructive feedback on our manuscript. Your insights and suggestions have significantly contributed to refining our work, making it more comprehensive and impactful. We have addressed each of your comments as detailed below:

1. **Correction of Equation Numbering**: As you rightly pointed out, there was an inconsistency in the numbering of equations on page 4, below line 135, where Equations 5 to 7 were mentioned, but only two equations (5 and 6) were actually present. We have resolved this issue by removing the reference to the non-existent Equation 7. We appreciate your attention to this detail, which helped us correct the error and ensure the accuracy of our mathematical representations.

2. **Expansion of Discussion and Conclusion Sections**: We have taken your advice to heart regarding the need for more detailed discussions of our findings and the contributions of our study to both literature and practice in the field of large language models (LLMs). Accordingly, we have separated the original discussion section into two distinct sections: “Discussions of the Results” and “Conclusions and Recommendations.” This restructuring has allowed us to delve deeper into the implications of our findings, providing a clearer and more comprehensive analysis that enriches the reader's understanding of our study's significance. We believe these enhancements will greatly strengthen the article's contribution to the ongoing discourse in our field.

3. **Correction of Reference Formatting**: Upon reviewing reference number 20 on page 11, we identified and corrected the issue of the missing page numbers for the conference proceedings. We have ensured that all references are now uniformly formatted, adhering to the journal's guidelines.

We are confident that these revisions have addressed your concerns and have further polished our manuscript. We hope that our responses and the changes made reflect our commitment to contributing valuable research to the field. We are grateful for the opportunity to improve our work based on your feedback and look forward to any further suggestions you might have.

Thank you once again for your invaluable feedback and guidance.
