Review

A Survey of Semantic Parsing Techniques

School of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China
* Author to whom correspondence should be addressed.
Symmetry 2024, 16(9), 1201; https://doi.org/10.3390/sym16091201
Submission received: 25 June 2024 / Revised: 26 August 2024 / Accepted: 4 September 2024 / Published: 12 September 2024
(This article belongs to the Section Computer)

Abstract:
In the information age, semantic parsing technology drives efficiency improvements and accelerates the process of intelligence. However, it faces challenges such as complex language understanding, data inflation, inadequate evaluation, and the difficulty of applying advanced large models. This study analyzes these challenges and looks ahead to the technology's development trends. Specifically, this study adopts a systematic review method that strictly follows the PRISMA framework, deeply analyzes the key ideas, methods, problems, and solutions of traditional and neural network methods, and examines model performance, API applications, datasets, and evaluation mechanisms. Through literature analysis, the technology is classified according to its application scenarios. The practical contributions of these applications are then summarized, current limitations such as data size, model performance, and resource requirements are analyzed, and future directions such as dataset expansion, real-time performance enhancement, and industrial applications are envisioned. The results show significant, far-reaching advances in semantic parsing technology. Traditional and neural network methods complement each other to promote theoretical and practical innovation. In the future, with the continuous progress and deepening application of machine learning, semantic parsing research needs to further address logical reasoning and evaluation in order to cope with technical challenges and lead new developments in natural language processing and AI.

1. Introduction

Semantic parsing technology is the process of transforming natural language into machine-processable representations of meaning, enabling automated language processing and the development of human-computer interaction. It plays a key role in the fields of natural language processing (NLP), engineering, and computer programming. Traditional semantic representation methods mainly rely on the principle of semantic compositionality [1,2]: they combine manual features and simple logic rules, parsing sentence semantics by composing basic semantic units into more complex structures. This approach excels in interpretability, data efficiency, and domain-specific applications, but is limited by the complexity of manual design and a lack of generalization ability.
With the development of machine learning and deep learning, the problems of early-stage semantic parsing techniques have been further addressed. Contrasting with traditional semantic parsing techniques is the neural network-based approach, which shows significant advantages in handling complex natural language phenomena and in generalization. In recent years, many researchers have studied the methods and models of neural network-based semantic parsing. For example, researchers have proposed a novel globally guided selective context network, which adaptively selects contextual information and thus improves parsing in different application scenarios [3]. Neural network representations can be broadly classified into three categories: symbolic methods, pure neural network methods, and neural-symbolic methods [4]. Symbolic methods rely on predefined grammar rules and word lists to transform natural language into candidate logical statements and select the most probable ones through a linear model [5,6,7]. A symbolic semantic analyzer uses the generated grammar rules to derive results and finds the most probable derivation through a conditional probability model. Each derivation is represented using hand-extracted features obtained from a corpus or a partially structured meaning representation. Despite its strengths in interpretability and data efficiency, this approach is limited in generalization because rules and features must be designed manually.
The pure neural network approach treats semantic parsing as a machine translation problem, in which natural language is directly converted into structured meaning representations through an encoder–decoder model [8]. The encoder maps natural language to hidden-layer representations, while the decoder generates the meaning representations step by step. A commonly used model is the Sequence-to-Sequence (Seq2Seq) model, which generates a series of logical statement tokens; alternatively, a Sequence-to-Tree (Seq2Tree) decoder can generate a tree structure to ensure syntactic correctness. Contextual information is utilized in two ways in this approach: either by building context-aware encoders that encode the historical corpus or structured meaning representations into the representation, or by using a context-aware decoder that reuses or revises previously predicted representations to generate the current structured meaning representation. This approach does not rely on a generative grammar and is efficient and flexible, as it realizes the semantic transformation entirely through neural networks. However, it typically requires large amounts of annotated data and computational resources and poses challenges for interpretability.
Neuro-symbolic approaches, on the other hand, attempt to combine the strengths of both, utilizing neural networks to generate features while drawing on the syntactic prior knowledge of symbolic approaches. This approach also uses an encoder–decoder model, but it generates sequences of actions rather than simple tokens and is constrained by syntactic rules when generating abstract syntax trees, thus ensuring syntactic and semantic correctness. It considers both natural language sentences and contextual information, encodes them through a perceptual encoder, and feeds the encoded results, along with previous logical statements, into a perceptual decoder to generate new logical statements [9,10]. This approach improves generalization while maintaining interpretability.
With the rapid pace of technological change, large language model technology is taking the neural network and neural-symbolic approaches to semantic parsing to new heights. In recent years, breakthroughs in deep learning, the growing availability of computational resources, and the accumulation of huge amounts of training data have jointly driven the development of advanced language model architectures such as the Transformer [11]. This milestone not only provides a solid foundation for the innovation of large language models but also further promotes progress across the whole semantic parsing field. Large language models, built on neural networks with billions of parameters, are continuously trained and optimized through self-supervised learning on vast amounts of unlabeled text data. Representative models such as Bidirectional Encoder Representations from Transformers (BERT) [12], SpanBERT [13], XLNet [14], the Robustly Optimized BERT Pretraining Approach (RoBERTa) [15], the GPT family [16,17,18,19], and PANGU-Σ [20], mostly pre-trained on huge web corpora, can provide deep insights into the complex patterns, subtleties, and intrinsic connections of language. These large language models have become the cornerstone of semantic parsing techniques dedicated to finding universal strategies for encoding text sequences. Instead of relying on simple categorization goals and the direct processing of individual words, these approaches adopt more comprehensive and sophisticated text parsing strategies than traditional semantic parsing methods. However, in the face of such a wide range of semantic parsing methods, it becomes particularly important to evaluate them comprehensively and reliably. Although some studies have revealed the potential and advantages of these techniques, a comprehensive review of their recent advances, possibilities, and limitations is still lacking.
In addition, researchers have extensively explored various aspects of semantic parsing techniques in several studies. For example, Zhang's [21] study provides a comprehensive overview of the progress of syntactic analysis and semantic parsing, introduces the mainstream methods and models of constituent parsing and dependency parsing, and explores dependency graph parsing as well as cross-domain, cross-lingual, and federated parsing models, together with their applications and corpus development. Kumar P et al. [22] focused on innovative explorations of semantic parsing, especially how syntactic structure builds meaning and lexical processing capabilities in the context of knowledge bases, while evaluating the performance of semantic parsers. Lee C et al. [23] analyzed the evolution of semantic parsing by reviewing semantic parsing techniques in comparison with program synthesis and pointed out the challenges ahead. However, these studies often lack in-depth analyses of the core ideas, specific implementations, problems, and solutions of semantic parsing methods. There are also few detailed discussions of the performance evaluation metrics, datasets, evaluation methods, and applications of semantic parsing methods in specific domains. Therefore, there is an urgent need to study these key aspects in depth to understand and advance the development of semantic parsing technology more comprehensively. Based on this, this study adopts a systematic review methodology for the literature review, closely following the key aspects specified in the PRISMA [24] (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) framework. Firstly, this study clarifies that the objective of the review is to comprehensively and accurately capture the literature closely related to semantic parsing technologies and defines the scope of the study, which covers semantic parsing technologies and their multi-disciplinary applications. Then, with carefully defined keywords and strings, this study constructed a sophisticated search strategy across several authoritative academic databases and Google Scholar to comprehensively retrieve relevant literature. At the literature screening stage, this study set strict inclusion and exclusion criteria and conducted an initial screening based on titles and abstracts, followed by a full-text review of the retained literature to ensure its relevance and quality. To further strengthen quality assessment, we invited three NLP experts to independently score 152 semantic parsing papers and verified the scientific validity and effectiveness of the scoring system through intraclass correlation coefficient (ICC) analysis. Subsequently, we de-duplicated the literature using the literature management software Zotero, supplemented by manual comparison, and retained the most representative and authoritative versions after comprehensive assessment. In the data extraction and coding stage, we systematically structured the final selection of literature and constructed a detailed coding scheme covering basic information, research methods, performance indicators, dataset use, and application areas. Finally, we wrote a detailed review report based on the extracted and coded data, ensuring that the report followed the requirements of the PRISMA framework to enhance the transparency and credibility of the study.
This series of rigorous and systematic steps ensures the comprehensiveness, accuracy, and reliability of the literature review, and lays a solid foundation for subsequent in-depth exploration of the field of semantic parsing. On this basis, this study focuses on the core topics in the field, aiming to improve comprehensive understanding by systematically analyzing the logical relationships between different semantic parsing approaches, their key ideas, methods, and problems, and exploring the corresponding solutions. At the same time, we comprehensively evaluate the performance of different semantic parsing models in multiple task contexts, especially their differences on key metrics such as accuracy and F1, to provide a scientific basis for model optimization. In addition, this study takes an in-depth look at the datasets and evaluation methods currently used for evaluating semantic parsing models, and identifies and improves possible deficiencies in them, to ensure the fairness and effectiveness of evaluation. For specific domains such as healthcare, law, and finance, we explore the practical contributions and limitations of semantic parsing techniques and propose future research directions based on these limitations to facilitate the application and development of the technology in these domains. Finally, this study examines the major issues and challenges currently facing semantic parsing technology, as well as the key technologies and research directions that deserve further attention, to provide valuable references and inspiration for the continuous progress and innovative development of the technology.
The systematic review methodology adopted in this study ensures the transparency and verifiability of the research through a clear and repeatable process. By constructing a dataset of 152 high-quality papers spanning multiple years, from classical studies to cutting-edge explorations, the development of semantic parsing technology is comprehensively covered. The dataset covers 79 top conferences and journals in the field of natural language processing, ensuring the authority and breadth of the literature. This study also selects 30 papers from the ArXiv preprint platform published between 2013 and 2024, which represent the latest research results and trends in semantic parsing technology, adding timeliness and foresight to the dataset. In addition, papers spanning multiple years were selected from two important academic conferences, ACL and AAAI, to further enrich the diversity and depth of the dataset.
By analyzing the above dataset, readers will gain comprehensive insight into the progress and wide range of applications of semantic parsing techniques. From traditional manual-feature approaches based on the principle of semantic compositionality to modern techniques based on neural networks, especially large language models, this study analyzes in detail the core ideas, implementations, and the advantages and disadvantages of different approaches. By comparing symbolic methods, pure neural network methods, and neural-symbolic methods, the study reveals the significant advantages of neural networks in dealing with complex natural language phenomena and improving generalization ability. Meanwhile, the study also discusses the performance evaluation metrics of semantic parsing technology and the selection of datasets and analysis methods, and demonstrates the practical value of the technology through domain-specific application cases. Finally, the study looks ahead to the future development of semantic parsing technology, points out the limitations of current techniques and potential research directions, and provides valuable references and insights for researchers and practitioners in related fields.
The core contribution points of this study are presented in the form of a list of key points:
  • In-depth exploration of semantic parsing methods and problem solutions: we explore in detail the core ideas and technical implementations of semantic parsing methods and propose solutions to potential problems with comprehensive and in-depth discussions and analyses.
  • Systematic Comparison of Model Performance in Multi-task Contexts: We systematically compare the performance of semantic parsing models under different evaluation metrics in multiple real-world task scenarios, which provides strong support for model optimization and application.
  • Detailed analysis of datasets: We conduct a comprehensive analysis of the datasets used for training and evaluation of semantic parsing models, revealing the characteristics of the datasets and their impact on model performance.
  • Summary of interfaces in real-world application scenarios: We summarize the interfaces of semantic parsing models in real-world application scenarios, providing practical reference information for model application and integration.
  • Detailed Analysis of Domain-Specific Applications and Future Prospects: For specific domains, we analyze in detail the application examples and limitations of semantic parsing models, and look forward to future research directions, which guide in-depth research in this area.
  • Review of Technical Challenges and Future Research Directions: We review the main problems and challenges faced by semantic parsing technology at present, and at the same time look forward to its future development trends and research directions, aiming at providing valuable references and inspirations for the further development of this field.
This study explores the various aspects of semantic parsing methods in a systematic and in-depth manner through a clear chapter structure. The study begins with an overview of current leading research in the field, building a research context for the reader. This is followed by a detailed description of the methodology, which ensures the comprehensiveness, rigor, and credibility of the synthesis of semantic parsing techniques and lays a solid foundation for subsequent research. Section 4 reveals the research lineage and trends in the field of semantic parsing through an in-depth analysis of 152 top papers published between 1994 and 2024. Section 5 focuses on the core of the semantic parsing approaches, exploring their ideas, implementation methods, and performance, and analyzing their application to Application Programming Interfaces (APIs). Section 6 provides an in-depth analysis of the datasets and evaluation methodologies, offering practical guidance to researchers. Section 7 demonstrates the practical application value of semantic parsing models through case studies. The study also points out the challenges and problems faced by semantic parsing approaches, which are discussed in depth in Section 8. Finally, Section 9 summarizes the whole study, highlighting the stepwise development, current constraints, and emerging challenges of semantic parsing techniques, and pointing the way for future research.

2. Literature Review

Before proceeding to the literature review, we clarify a few key definitions to lay a clear foundation for the subsequent discussion. Combinatory Categorial Grammar (CCG) is a syntactic model for natural language processing that describes the structure and syntactic relations of sentences by defining the combinatory categories of lexical items and the corresponding combination rules. Dependency-Based Compositional Semantics (DCS) is a semantic formalism proposed in the field of NLP, especially for semantic parsing and semantic representation. In our literature review, we must particularly mention the Seq2Seq model, a landmark model in NLP. With its distinctive encoder–decoder architecture, the Seq2Seq model provides strong support for sequence generation tasks, especially machine translation, text summarization, and dialog systems. The core of the model is to encode the input sequence into a fixed-length vector and then decode the output sequence, which breaks through the limitations that traditional NLP places on input length. With the introduction of Long Short-Term Memory (LSTM), the Gated Recurrent Unit (GRU), and the Transformer, the Seq2Seq model has been continuously optimized, most notably through the groundbreaking self-attention mechanism of the Transformer. In semantic parsing, the Seq2Seq model likewise shows great potential: it can generate logical output sequences directly from natural language text, which is useful in knowledge base question answering, code generation, and other scenarios. Closely related to the current research, the work of Zhang M [21], Kumar P et al. [22], and Lee C et al. [23] does not directly mention Seq2Seq, but its discussion of the evolution of syntactic analysis, dependency parsing, and semantic parsing techniques provides an important foundation and inspiration for the development and application of Seq2Seq-based semantic parsing.
Semantic parsing, as a key area of Artificial Intelligence (AI), has been developing rapidly in recent years, with interdisciplinary teams contributing many innovations that advance text-matching technology. As shown in Table 1, Zhang M [21] reviewed the progress of syntactic and semantic parsing, covering constituent parsing, dependency parsing, and cross-domain models, while Kumar P et al. [22] focused on the syntactic construction of meaning and the lexical processing capability of knowledge bases and explored the application of CCGs and formal languages. Lee C et al. [23] contrasted semantic parsing with program synthesis, analyzed the evolution of neural-symbolic and other methods, and discussed the progress and challenges of code generation frameworks to guide future research. Together, these studies broaden the boundaries and application potential of semantic parsing.
Table 1 provides an in-depth comparison of the breadth and depth of several review papers on semantic parsing, pointing out that although most of the studies make use of large language models, studies such as Zhang M [21], Kumar P et al. [22], and Lee C et al. [23] have not yet analyzed the key ideas, problems, solutions, logical trajectories, performance comparisons, API interfaces, and specific applications of semantic parsing methods. In contrast, this study attempts to fill these gaps by not only analyzing the core ideas and methodologies of semantic parsing approaches in detail, but also exploring the problems and corresponding solutions and evaluating the performance of semantic parsing models on different tasks through multiple metrics. In addition, this study provides a systematic overview of dataset construction, the selection of evaluation metrics, and the design of application programming interfaces for semantic parsing models, along with an in-depth analysis of actual application cases in specific domains, which offers a more detailed and comprehensive reference for researchers and practitioners in the field. Through these comprehensive analyses and in-depth explorations, this study not only reveals the strengths and shortcomings of semantic parsing technology but also provides new insights into open problems and challenges, giving strong support for future research directions and strategies.

3. Methodology

To map the PRISMA process, this paper refers to previous studies [25,26] and uses a Shiny app to draw a flowchart that meets the requirements of the PRISMA framework, as shown in Figure 1. The diagram shows the total number of databases, the literature screening, the quality assessment, and the whole process of synthesizing and analyzing the results, providing the reader with an intuitive overview.

3.1. A Comprehensive Systematic Review and Evaluation of Semantic Parsing Techniques in the PRISMA Framework

In this study, the PRISMA framework [24] was strictly followed in the systematic review of semantic parsing techniques. Starting from clarifying the review objectives and research scope, we comprehensively searched multiple authoritative databases for literature closely related to semantic parsing techniques, using well-defined keywords and sophisticated search strategies. Subsequently, based on strict inclusion and exclusion criteria, the literature was initially screened and then reviewed in full text to ensure high relevance and quality. The 152 selected papers were then scientifically evaluated through independent scoring and reliability analysis by NLP experts. In the data processing stage, the most representative versions were retained using software-based de-duplication and manual comparison, and the literature was systematically structured to construct an exhaustive coding scheme. Finally, based on these data, a detailed review report was written to ensure the transparency, comprehensiveness, and credibility of the study, providing a solid reference for subsequent research. The following is a detailed account of these steps.
Firstly, in developing a comprehensive and precise literature review on semantic parsing technology, we started by defining the objective of the review, namely to comprehensively capture the literature closely related to semantic parsing technology. Subsequently, we defined the scope of the study to ensure coverage of semantic parsing technologies and their applications in multiple domains such as healthcare, economics, industry, and agriculture. Next, the study carefully defined a series of keywords and strings to guide the subsequent literature search: “semantic parsing”, “symbolic approach”, “neural approach”, “symbolic neural approach”, “neural parsing”, “neural network”, “NLP”, “natural language programming”, “knowledge representation”, “logical reasoning”, “context understanding”, “pre-trained language models”, “semantic parsing and medicine”, “semantic parsing and programming”, “semantic parsing and economics”, “semantic parsing and industry”, and “semantic parsing and agriculture”.
Then, in selecting databases to support this study, we considered multiple dimensions and finally targeted Web of Science, Engineering Village, IEEE, and SpringerLink as the main information sources, supplemented by Google Scholar to expand the search, for a total of 21 databases, as shown in Figure 2. These databases stand out, first, for their professionalism and authority: Web of Science and SpringerLink ensure high-quality literature through strict academic screening, while IEEE and Engineering Village are deeply engaged in engineering technology and computer science, providing the most cutting-edge research results in the field. Secondly, their breadth and comprehensiveness allow us to cover the multiple research methods within semantic parsing technology; whether symbolic, neural, or neural-symbolic, abundant resources can be found in these databases. Furthermore, efficient search tools and comprehensive literature information, including citations, abstracts, and full-text downloads, greatly enhance research efficiency. Finally, Google Scholar, as a supplement, further enriches our literature base with its huge index and convenient search interface, ensuring the comprehensiveness and accuracy of the research.
Although it is important to rely on well-known databases and search engines to obtain the formal literature on semantic parsing technology, grey literature should not be underestimated. Grey literature is not subject to the traditional publishing process, but it is rich in new technological developments, practical experience, and diverse perspectives, providing cutting-edge information that complements formal research. When selecting grey literature, we focused on the reliability and credibility of the content; ArXiv is the preferred source of grey literature because of its timeliness, breadth, diversity, openness, and high-quality preprints. It not only rapidly reflects the latest research results but also promotes cross-disciplinary understanding and academic exchange. Although preprints are not formally reviewed, they are often of high academic value, which compensates for the limited timeliness and comprehensiveness of the formal literature and provides strong support for research on semantic parsing techniques. In this study, 353 papers were initially retrieved from the 21 databases.
In refining the data collection strategy for this study, we formulated strict inclusion and exclusion criteria as shown in Table 2, to ensure that the selected literature is highly relevant to semantic parsing technology and comprehensively reflects the latest progress in the field. The inclusion criteria cover domain relevance, keyword matching, high quality and wide recognition of the literature, and full-text accessibility. Exclusion criteria address non-relevant domains, keyword mismatches, low-quality or not widely recognized literature, and literature that is not available in full text. Together, these criteria ensured that the literature was systematically collected and reviewed, providing a solid, high-quality database for subsequent research.
The initially screened literature was then reviewed in full text to further confirm its relevance and quality. We invited three experts in the field of NLP as scorers to ensure comprehensive coverage and authoritative judgment. Before scoring, we formulated detailed scoring rules covering multiple dimensions such as innovativeness, academic value, experimental design and results, research relevance, and writing quality, each with a clear 0–10 scale, to achieve a comprehensive measure of paper quality. During the scoring process, we assigned papers in a random or logical order to ensure that each scorer proceeded independently, strictly followed the scoring rules, and read and analyzed the papers in depth, thus guaranteeing the independence and rigor of the assessment. After scoring was completed, we collected the data and used Python's pingouin library for reliability analysis, obtaining a high degree of inter-rater agreement (an ICC3k value of 0.95) by calculating the intraclass correlation coefficient (ICC). This not only verified the scientific validity of the scoring system but also indirectly confirmed the close relevance of the 152 selected papers to the semantic parsing field and their academic value. The analysis lays a solid data foundation for the subsequent research and ensures its depth and reliability.
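As a minimal sketch of this reliability analysis, the snippet below computes the ICC with pingouin on a long-format scoring table; the column names and scores are hypothetical stand-ins for the actual expert data.

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-format table: one row per (paper, rater) pair.
scores = pd.DataFrame({
    "paper": sorted(list(range(1, 6)) * 3),
    "rater": ["A", "B", "C"] * 5,
    "score": [8.5, 8.0, 8.7, 6.0, 6.5, 6.2, 9.1, 9.0,
              8.8, 7.2, 7.0, 7.5, 5.5, 5.8, 5.4],
})

icc = pg.intraclass_corr(data=scores, targets="paper",
                         raters="rater", ratings="score")
# ICC3k: consistency of the mean of k fixed raters, the statistic
# reported in this study (0.95 over the 152 papers).
print(icc.loc[icc["Type"] == "ICC3k", ["Type", "ICC", "CI95%"]])
```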
To ensure the uniqueness of the literature, we used literature management software (e.g., Zotero) to perform automatic de-duplication, supplemented by manual comparison of potential hidden duplicates, after which we comprehensively evaluated and retained the most representative and authoritative versions of the literature. In the data extraction and coding stage, we systematically structured the final selection of 152 articles and constructed a detailed coding scheme covering basic information, research methodology, performance indicators, use of datasets, and application areas.
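To make this de-duplication step concrete, here is a minimal sketch assuming the candidate records were exported (e.g., from Zotero) as a CSV file with "title" and "doi" columns; the field names and the fallback normalization are illustrative assumptions, and borderline cases would still go to manual review as described above.

```python
import csv
import re

def norm_key(title: str, doi: str) -> str:
    """Prefer the DOI as a duplicate key; otherwise use a normalized title."""
    if doi:
        return doi.strip().lower()
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def deduplicate(path: str) -> list[dict]:
    seen, unique = set(), []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            key = norm_key(row.get("title", ""), row.get("doi", ""))
            if key not in seen:   # keep the first occurrence of each record
                seen.add(key)
                unique.append(row)
    return unique
```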
Finally, in the result reporting stage, we wrote a detailed review report based on the extracted and coded data, which followed the requirements of the PRISMA framework and included sections on methods, results, discussion, and conclusions to ensure the transparency and credibility of the study. Through this series of rigorous and meticulous steps, we completed a comprehensive review of the literature related to semantic parsing techniques.

3.2. Inter-Rater Reliability Assessment and Data Quality

In the complex process of data collection and analysis, ensuring high-quality data is the cornerstone of scientific research and directly affects the accuracy, credibility, and reproducibility of the results. For subjective rating studies, inter-rater reliability assessment is particularly critical. In this study, the ratings of three independent raters on 152 samples were quantitatively analyzed with Cohen's Kappa statistic [27]. Under the usual interpretation of Cohen's Kappa, a value of 0.40–0.60 indicates fair agreement, 0.61–0.75 good agreement, and above 0.75 excellent agreement. The results show that the Kappa values for all rater pairs were above 0.75: 0.897 between rater 1 and rater 2 and 0.869 between rater 1 and rater 3 demonstrate excellent agreement, and 0.784 between rater 2 and rater 3 also exceeds the 0.75 threshold for excellent agreement. These findings provide strong evidence of a high degree of inter-rater agreement and expertise, enhancing the reliability of the scoring results and laying a solid foundation for subsequent research.
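A minimal sketch of this pairwise check with scikit-learn is shown below. Since Cohen's Kappa requires categorical labels, the sketch assumes the 0–10 scores were first binned into discrete quality categories; the data are randomly generated placeholders rather than the study's actual ratings.

```python
from itertools import combinations
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
base = rng.integers(0, 3, size=152)     # hypothetical category per paper
ratings = {f"rater{i}": np.clip(base + rng.integers(-1, 2, size=152), 0, 2)
           for i in (1, 2, 3)}

# Pairwise Cohen's kappa over all rater combinations.
for (a, xa), (b, xb) in combinations(ratings.items(), 2):
    print(f"{a} vs {b}: kappa = {cohen_kappa_score(xa, xb):.3f}")
```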

4. Overview of Results

Figure 3 and Figure 4 provide a detailed depiction of the distribution of the research literature on semantic parsing technology by year, region, and journal. Firstly, the statistics in Figure 3 show that research on semantic parsing technology has grown significantly since 2015. This trend resembles the growth of research in machine vision; both reflect the sharp increase in demand for AI technology across industries, the continuous improvement of computer hardware, and the rapid development and wide dissemination of AI technology. As for the literature sources, Figure 4 shows the detailed distribution across 79 top conferences and journals. The literature mainly concentrates on the field of NLP, reflecting the close connection between semantic parsing techniques and NLP. Among them, 31 papers from 2013 to 2024 were selected from the ArXiv preprint platform, representing the latest research results and developments in semantic parsing technology. In addition, 18 papers from 2003 to 2023 were selected from ACL, and three papers from 1996 to 2023 from AAAI. These papers not only provide a window into the current state of research on semantic parsing technology but also offer valuable references and insights for future research.

5. Key Technologies for Semantic Parsing

In the field of NLP, the classification of parsing methods is multidimensional and deep. By technological basis, one distinguishes traditional methods (based on compositional and symbolic techniques, with a focus on manual rules) from neural network methods (using deep learning to learn features automatically). Implementations cover symbolic, neural, and neural-symbolic approaches that combine the two. In terms of model structure, the Seq2Seq model is designed symmetrically around an encoder and a decoder, typifying the advantages of neural methods. In terms of processing mechanism, Seq2Seq realizes efficient conversion from input to output. Datasets and evaluation methods serve as the cornerstone of practice, highlighting the necessity of classification. The common goal is the precision and efficiency of semantic expression; this pursuit drives technical progress and deepens the intrinsic connections and symmetry between methods. Together, the technical foundation, implementation method, model structure, processing mechanism, dataset, evaluation method, and goal build a multi-dimensional framework for classifying parsing methods in NLP.
Based on the above multidimensional classification, this section introduces two families of semantic parsing methods in NLP: traditional semantic parsing methods and neural network-based semantic parsing methods [2]. Traditional semantic parsing methods include compositional and ordinary symbolic methods, while neural network-based semantic parsing methods include symbolic, neural, and neural-symbolic methods. The neural-symbolic method combines the symbolic method with the neural network method, reflecting the symmetry between the two. In particular, the neural method utilizes the Seq2Seq model, which consists of an encoder and a decoder; this structure also embodies symmetry. The Seq2Seq encoder encodes features from the history, while the Seq2Seq decoder decodes using the inputs, the encoded features, and copying and co-reference methods. Both families have corresponding datasets and evaluation methods. This further exemplifies the symmetry between traditional and neural network-based semantic parsing methods, in that both pursue more accurate and efficient semantic representations, as shown in Figure 5.
Building on the above categorization, this study also compares the traditional and neural network semantic parsing methods in NLP from the perspective of efficiency. Traditional methods, such as compositional and symbolic methods, are limited in efficiency because they rely on complex manual rules and notation systems and are prone to bottlenecks, especially when dealing with large-scale data. By contrast, neural network methods, especially the Seq2Seq model, are more efficient and handle large-scale data with ease thanks to automatic feature learning, and the encoder–decoder design of Seq2Seq streamlines the processing flow and further speeds up parsing. However, although the neural network method is efficient, its training cost is high and the models are complex and hard to interpret, which may limit efficiency in specific application scenarios. Therefore, when choosing a parsing method, one must weigh training cost, resource requirements, and application requirements to find the best balance. Overall, the neural network method shows a clear efficiency advantage, but practical applications must consider several factors.

5.1. Traditional Semantic Parsing Methods

Before proceeding with the study of traditional semantic parsing methods, we clarify a few key definitions to lay a clear foundation for the subsequent discussion. This study defines Relaxed Combinatory Categorial Grammar (Relaxed CCG) as an extension or variant of traditional CCG that aims to improve the handling of natural language sentences by relaxing certain restrictions or adding flexibility. Learning to Parse Database Queries Using Inductive Logic Programming (LPDQUILP) uses inductive logic programming to automate the parsing of natural language database queries into Structured Query Language (SQL) statements. Traditional semantic representation methods apply the principle of semantic compositionality [1,2] in conjunction with manual features and simple logic, combining simple semantic units into more complex structures to understand the semantics of a sentence. This study summarizes models of typical traditional semantic representation methods, such as the Constructive Heuristics Induction for Language Learning (CHILL) parser, the algorithm for learning weighted combinatory categorial grammars, dependency-based compositional semantics, and ordinary symbolic algorithms. The following summary covers four key ideas and methods and 14 key problems with their corresponding solutions (please refer to Table 3).
Typical traditional semantic representation methods include the CHILL parser and the algorithm for learning weighted combinatory categorial grammars. Researchers used inductive logic programming techniques to construct a natural language database query parser that converts natural language queries into SQL statements [28]. However, the approach faces multiple challenges, such as dataset limitations, the readability of syntax rules, inductive bias, and complex query processing; approaches such as remote supervision, transfer learning, and visualization techniques can mitigate these issues. An algorithm for learning weighted combinatory categorial grammars has been proposed to parse sentences into a lambda-calculus representation of their underlying semantics [33]. The key idea is that the model uses an online learning algorithm to learn a Relaxed CCG, which in turn parses natural language sentences into logical form. However, the algorithm can fall into local optima, has difficulty handling complex sentence structures, and needs more efficient representations of logical forms. To address these problems, it is recommended to explore robust optimization algorithms, incorporate external knowledge using fine-grained CCG models, investigate efficient and accurate representations of logical forms, such as graph neural network-based methods, and try to integrate multiple representation methods.
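As an illustration of the target representation, consider a GeoQuery-style question and the lambda-calculus logical form such a parser would produce; this specific pair is illustrative rather than drawn from the cited papers:

$$\text{“What states border Texas?”} \;\mapsto\; \lambda x.\,\mathit{state}(x) \wedge \mathit{borders}(x, \mathit{texas})$$

Here the logical form denotes the set of entities x that are states and share a border with Texas; the parser's task is to learn this mapping from sentences to such expressions.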
DCS [38] is a method for learning semantic parsers from question–answer pairs, in which the logical form is modeled as a latent variable. The model uses dependency structures to implement a compositional semantic representation and computes the semantic representation of an entire sentence through syntactic analysis, semantic type and operator assignment, and local composition operations [38]. However, it suffers from dependency parsing errors, difficulty with complex semantic reasoning, and loss of semantic information. Suggested remedies include more robust syntactic analysis, research on advanced semantic representations, enhancing the expressiveness of semantic representations, and exploring global composition operations and attention mechanisms. The key dimensions of semantic parsing are the breadth of knowledge sources and the depth of logical composition. Existing approaches usually compromise on one of these two aspects, but the Strongly Typed Logical Forms parsing algorithm extends the knowledge sources and improves the depth of parsing by converting semi-structured tables into knowledge graphs [44]. However, the algorithm still faces issues such as limited data volume, table noise, and multilingual limitations; data augmentation, noise handling modules, and multilingual support are suggested to enhance the model's robustness and cross-language parsing capability.
By analyzing the above methods and models, this study summarizes their similarities and differences. In terms of similarities, they all aim to convert natural language into machine-understandable forms, rely on data to train models or learn rules, and require feature engineering to represent input sentences and structure outputs. In terms of differences, inductive logic programming uses logic programming techniques for reasoning and query parsing; online learning algorithms and logical form representations use machine learning to learn semantic parsing models; the DCS semantic parsing model uses dynamic semantic representations within compositional semantics; and compositional semantic parsing of semi-structured tables parses natural language and structured tables interactively. These methods draw on different sources of training data and differ in complexity and scalability. Appropriate methods should be selected for specific scenarios and needs, and researchers can explore combinations or improvements of these methods to increase the accuracy, efficiency, and scalability of semantic parsing.

5.2. A Neural Network-Based Approach to Semantic Parsing

Traditional semantic representation methods have advantages in interpretability, data efficiency, and domain-specific performance, but suffer from the challenges of manual design and complexity. In contrast, neural network-based approaches perform better on complex natural language phenomena and generalize more easily. Neural network-based representations include symbolic methods, neural network methods, and neural-symbolic methods, as shown in Figure 6. Symbolic methods use predefined grammar rules and word lists to transform natural language into candidate logical statements, and linear models are commonly used to select the most likely correct ones. Neural network approaches view semantic parsing as a neural machine translation problem; a commonly used model is the Seq2Seq model, which generates a series of logical utterance tokens, or a tree structure via the Seq2Tree decoder to ensure syntactic correctness. The neuro-symbolic approach combines a neural network with a symbolic approach, using a neural network to generate features while drawing on the syntactic prior knowledge of the symbolic approach. It also uses the Seq2Seq model to generate logical statements, but it generates sequences of actions rather than tokens and builds abstract syntax trees constrained by syntactic rules to ensure syntactic and semantic correctness.

5.2.1. Symbolic Methods

Symbolic methods use predefined grammar rules and word lists to transform natural language into candidate logical statements, and linear models are often used to select the statements most likely to be correct. The symbolic semantic analyzer uses the generated grammar rules to derive results and finds the most likely derivation using a conditional probability model. Each derivation is represented using hand-extracted features obtained from a corpus or a partially structured meaning representation; a commonly used scoring model is the log-linear model. The model (shown in Equation (1)) quantifies the likelihood of a derivation d, given input x and context C, as a conditional probability. This probability is computed from a weighted sum of the feature functions ϕ(x, C, d), which are manually extracted from the corpus or structured information and reflect the complex relationship between the derivation, the input, and the context. The normalization factor ensures that the probabilities of all candidate derivations sum to 1, satisfying the basic requirement of a probability distribution and enabling the model to compare the relative likelihoods of different derivations. The log-linear model is therefore well suited to the scoring system of the symbolic semantic analyzer: it makes use of a flexible feature representation and effectively evaluates the plausibility of derivations through a conditional probability model.
$$p(d \mid x, C) = \frac{e^{\theta^{\top}\phi(x, C, d)}}{\sum_{d' \in G(x, C)} e^{\theta^{\top}\phi(x, C, d')}} \quad (1)$$
ϕ(x, C, d) denotes the feature vector extracted from the utterance x, the contextual information C, and the derivation d. G(x, C) is the set of candidate derivations for the input utterance x and context C. θ denotes the model parameters.
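The following toy sketch shows how Equation (1) is evaluated in practice: each candidate derivation is reduced to a hand-built feature vector, scored against learned weights, and normalized with a softmax. The features and weights here are hypothetical.

```python
import numpy as np

theta = np.array([1.2, -0.4, 0.8])   # learned model parameters

# phi(x, C, d) for three candidate derivations of the same utterance x.
phi = np.array([
    [1.0, 0.0, 1.0],                 # derivation d1
    [0.0, 1.0, 1.0],                 # derivation d2
    [1.0, 1.0, 0.0],                 # derivation d3
])

scores = phi @ theta                 # theta . phi(x, C, d) per derivation
p = np.exp(scores - scores.max())    # numerically stable softmax
p /= p.sum()                         # p(d | x, C) over G(x, C)
print(p, "-> choose derivation", int(p.argmax()) + 1)
```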
Before proceeding to the study of symbolic semantic parsing methods, we clarify a few key definitions to lay a clear foundation for the subsequent discussion. Learning Context-Dependent Mappings from Sentences to Logical Form (LCDMSLF) converts sentences to logical forms through context-dependent mappings. This study summarizes six key ideas and methods and 21 problems and solutions of the symbolic approach. The core idea is to integrate contextual information into the feature model and to cast context-aware decoding as a structured prediction problem. The approach illustrated in Figure 7 first feeds textual content and logical statements into the model for processing, generating the corresponding results. Subsequently, the output of the model is transformed into a logical language representation by applying syntactic rules. The key ideas and methods of typical models in the symbolic approach, together with the key problems identified and their corresponding solutions, are summarized in Table 4. Each problem has a corresponding solution, and certain problems may have multiple solutions.
Typical methods of the symbolic approach include LCDMSLF, neural semantic parsing through feedback, and deep semantic parsing using an upper ontology, as shown in Table 4. Firstly, CCG parsers are utilized to convert sentences into context-independent logical forms [5], which are then mapped to the output using contextual information, solving the problem of context-dependent mapping from sentences to logical forms [5]. However, CCG parsers face challenges such as limited knowledge representation, ambiguity disambiguation, and corpus dependency, which call for expanding the knowledge base, exploiting contextual information, and applying machine learning. Secondly, high-quality semantic parsing is advanced by mapping natural language directly to SQL through neural sequence models, quickly building natural language interfaces and optimizing performance through user feedback [7]. Despite the challenges of user engagement, human interaction, model generalization, and noisy feedback, the practicality and performance of the method can be improved through strategies such as optimizing the interaction interface, applying automatic annotation algorithms, expanding application scenarios, enhancing generalization, and filtering noisy data. Then, a method is proposed to rapidly construct semantic analyzers for new domains from zero training samples, trained by generating logical forms from a simple grammar and paraphrasing them into natural language [56]. The study verified its effectiveness [56], but the generic grammar has limitations. To address these issues, researchers propose introducing complex query structures and fine-grained semantic analysis, enabling the rapid construction of applicable semantic parsers.
In addition, for the challenges of long-text semantic parsing, a deep semantic parsing approach using an upper-layer ontology has been proposed [59], combining FrameNet annotations with BERT's distributed contextual representations [59]. Although it faces challenges such as a large training data requirement, reliance on prior knowledge, and high computational cost, more efficient semantic parsing is expected through solutions such as optimized annotation, more efficient knowledge acquisition, and lightweight models. Then, to let the decoder copy or revise previously generated parses, a memory-based context-dependent semantic parsing model [64] was proposed that stores contextual semantic information in a memory matrix [64]. The model faces issues of computational cost, dialogue history length, data sparsity, and interpretability, but can be improved through accelerated computation, long-sequence handling, data augmentation and transfer learning, and interpretability enhancements. Finally, to address syntax error detection in programming languages, the use of parser combinators has been explored [68]: abstract syntax trees are constructed to detect errors through standard execution or preprocessing patterns [68]. Despite the limitations of high memory consumption and detecting only syntax errors, optimizing the memory footprint and using static code analysis tools are expected to improve the utility and performance of the approach for syntax error detection and other types of error detection.
By analyzing the above methods, we summarize their similarities and differences. The similarities include mapping sentences to logical forms, considering contextual information to capture sentence meaning, and using neural networks for training and inference. However, these approaches differ in specific techniques and formulations. Context-dependent mapping uses dependencies and contextual information to improve the logical representation; feedback mechanisms train the neural semantic parser through interactive learning with the user; rapid construction methods build semantic parsers for new domains with zero training examples; deep semantic parsing combines ontological knowledge using upper-ontology methods; memory-based methods use memory networks to store and retrieve information; and parser combinator methods detect syntactic errors. In addition, these methods differ in training data and complexity. This has led more scholars to explore combinations and integrations of methods, improve the quality and diversity of training data, and introduce more contextual information and semantic knowledge. Meanwhile, newer techniques such as parser combinators and memory mechanisms are also worth exploring further to advance neural network-based symbolic methods in semantic parsing.

5.2.2. Neural Network Approach

The symbolic method focuses on interpretability and data efficiency but has weak generalization capabilities and requires the manual design of rules and features. In contrast, the neural network method automatically learns feature representations through end-to-end learning and has strong generalization ability. The neural network-based semantic representation method transforms semantic parsing into a neural machine translation problem. As shown in Figure 8, it includes natural utterances and contextual information as inputs, which are encoded by a perceptual encoder. The encoding results are fed into the perceptual decoder along with the previously generated logical utterances to generate new logical utterances.
The neural network approach treats semantic parsing as a machine translation problem and uses an encoder–decoder model to translate natural language into structured meaning representations. The encoder maps the natural language to a hidden-layer representation and the decoder progressively generates a linearized structured meaning representation. This provides an efficient way to handle semantic parsing tasks: the semantic transformation is performed by neural networks and does not rely on a generative grammar. Equation (2) describes the process in detail. The encoder maps the input natural language sequence X into a series of hidden-layer representations h_t and ultimately into a vector C containing the global context. The decoder initializes its hidden state s_0 from C and gradually generates a sequence of structured meaning representations Y, which is converted into the final structured meaning representation Z by the output function g(.). This framework does not rely on complex generative grammars but uses the learning capability of neural networks to learn semantic transformation rules directly from data, making it a reasonable and effective choice for semantic parsing tasks. The basic framework is formulated as follows:
$$\begin{aligned}
X &= \{x_1, x_2, \ldots, x_T\}\\
h_0 &= \mathrm{initialize\_hidden\_state}()\\
h_t &= \mathrm{Encoder}(x_t, h_{t-1})\\
C &= h_T\\
Y &= \{y_1, y_2, \ldots, y_T\}\\
s_0 &= f(C)\\
s_t &= \mathrm{Decoder}(y_t, s_{t-1})\\
z_t &= g(s_t)\\
Z &= \{z_1, z_2, \ldots, z_T\}
\end{aligned} \quad (2)$$
X denotes the input natural language sequence and x_t denotes an input word. h_0 denotes the initial hidden state and initialize_hidden_state() denotes the initialization operation. h_t and h_{t-1} denote the hidden states at steps t and t − 1, respectively. Encoder(.) denotes an encoder network, such as a Recurrent Neural Network (RNN), a Transformer encoder, or a large language model encoder. C denotes the context vector. y_t denotes a token of the partially generated structured meaning representation. s_0 denotes the initial hidden state of the decoder and f(.) denotes the function that initializes it. s_t denotes the decoder hidden state at step t. z_t denotes the structured meaning representation output at step t and g(.) denotes the output function. Z denotes the output sequence of structured meaning representations. Decoder(.) denotes a decoder network, such as an RNN or Transformer decoder.
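A minimal PyTorch sketch of this encoder–decoder framework is shown below, using GRUs for Encoder(.) and Decoder(.); the vocabulary sizes, dimensions, and random inputs are illustrative assumptions, not the configuration of any model surveyed here.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, hidden=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, hidden)
        self.tgt_emb = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)    # g(.)

    def forward(self, x, y):
        _, h_T = self.encoder(self.src_emb(x))     # C = h_T
        s, _ = self.decoder(self.tgt_emb(y), h_T)  # s_0 = f(C) = C here
        return self.out(s)                         # z_t = g(s_t)

model = Seq2Seq(src_vocab=100, tgt_vocab=50)
x = torch.randint(0, 100, (2, 7))   # batch of natural-language token ids
y = torch.randint(0, 50, (2, 5))    # shifted logical-form token ids
print(model(x, y).shape)            # (2, 5, 50): scores over each z_t
```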
The neural network approach maps the input utterances to a continuous vector representation through an encoder and gradually generates the target semantic representation through a decoder. Contextual information is utilized in two ways in the method: first, a context-aware encoder is built to encode historical corpus or structured meaning representations into the representation; second, a context-aware decoder is used to reuse or revise previously predicted representations to generate the current structured meaning representation.
Before proceeding with the neural network approach, let us clarify a few key definitions to lay a clear foundation for the subsequent discussion. Question Rewriting Guided Context-Dependent Text-to-SQL Semantic Parsing (QURG) is an approach that enhances a Text-to-SQL system's context understanding so that it generates SQL queries more accurately. Good-Enough Compositional Data Augmentation (GCDA) generates new training samples by recognizing and replacing local phrases that are permitted in a common context. Retrieval-Based Neural Code Generation (RECODE) improves the performance of the code generation task by introducing explicit references to existing code examples into the neural code generation model through subtree-based retrieval. This study summarizes the 14 key ideas and approaches of the neural network approach as well as 59 key problems and solutions, as shown in Table 5. Each problem has a corresponding solution, while some problems may have multiple solutions.
The Seq2Seq model utilizes a perceptual context encoder to encode the preceding or previously generated logical statements into semantic features, and the perceptual decoder then uses the contextual information from the encoder to compute the final semantic representation, as shown in Figure 9. A representative instance of this structure is the QURG approach [71], which improves the contextual understanding of Text-to-SQL through question rewriting and two-stream matrix encoding but faces challenges in accuracy, efficiency, and complex context parsing. To optimize it, model training should be improved, encoder efficiency enhanced, domain knowledge fused, and technology integration explored.
Problems with the encoder’s use of context include failure to utilize task-specific a priori knowledge, over-reliance on lexicons, templates, and linguistic features, type constraints and entity linking issues, difficulty in providing combinatorial inductive bias for both conditional and non-conditional sequential models, and difficulty in parsing long and complex texts.
Firstly, in semantic parsing, explicit modeling of logical laws is crucial. The data restructuring approach in neural semantic parsing improves model performance by injecting prior knowledge [39] but suffers from noise, efficiency, and application limitations. To optimize the method, data quality needs to be controlled, the parameters and training process need to be optimized, and applications in other domains need to be explored. Secondly, the attention-enhanced Seq2Seq model [8] was proposed to solve the generality problem of traditional semantic parsing methods by utilizing an encoder–decoder structure and an attention mechanism. Despite its improved performance, the model still faces challenges such as computational resource consumption, gradient problems, long-term dependency, and training data requirements. To address these issues, strategies such as accelerated hardware, improved network structures, regularization, and research on interpretable attention mechanisms can be used. Then, to solve the type-constraint and entity-linking problems in semantic parsing, neural semantic parsing of semi-structured tables with type constraints was proposed [6]. The method ensures well-formed logical forms through special embedding and linking modules and a type-constrained grammar, and its encoder–decoder structure encodes and decodes questions and table entities using RNNs and LSTMs. However, it is only applicable to semi-structured tabular data and requires extensive manual intervention for data preprocessing. In the future, the parser could be extended to other types of data, and automated algorithms could reduce manual intervention, e.g., multimodal data extension methods and automated algorithms to detect and correct data errors and inconsistencies.
Then, to address the problem of compositional inductive bias in conditional and unconditional sequence models, a data augmentation protocol [84] was proposed that generates new samples by identifying and replacing local phrases allowed in a common context (a toy sketch of this idea follows this paragraph). However, this method may be affected by letter case, symbols, synonyms, spelling errors, and contextual variation. These problems can be addressed by unifying case, selectively retaining or excluding specific symbols, building a vocabulary of synonyms, using Transformer models for spelling correction, and using fuzzy matching algorithms to handle contextual changes. Finally, to solve the problem of parsing long, complex discourse, researchers have proposed augmenting the neural semantic parser with iterative discourse segmentation, alternating between segmenter and parser [2]. However, there are problems with accuracy, efficiency, long-range dependency, model capacity, and training stability. Parser performance can be improved through advanced utterance-boundary recognition models, sampling and averaging techniques, attention mechanisms and parallel computation, more expressive RNN structures and increased model capacity, as well as gradient clipping, better optimization algorithms, and residual connections.
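To illustrate the flavor of this augmentation protocol [84], here is a toy, self-contained sketch. The single-token fragments, one-word context windows, and string-level logical-form substitution are simplifications chosen for brevity; they are not the paper's actual algorithm.

```python
import re
from collections import defaultdict

def gcda_augment(pairs, pad="</s>"):
    """Toy sketch of GCDA-style augmentation: tokens that occur in the same local
    environment (one word of context on each side) are treated as interchangeable;
    swapping them yields new (utterance, logical form) training pairs."""
    env_to_fragments = defaultdict(set)
    for utt, _ in pairs:
        toks = [pad] + utt.split() + [pad]
        for i in range(1, len(toks) - 1):
            env_to_fragments[(toks[i - 1], toks[i + 1])].add(toks[i])

    augmented = set()
    for utt, lf in pairs:
        toks = [pad] + utt.split() + [pad]
        for i in range(1, len(toks) - 1):
            for alt in env_to_fragments[(toks[i - 1], toks[i + 1])] - {toks[i]}:
                new_utt = " ".join(toks[1:i] + [alt] + toks[i + 1:-1])
                new_lf = re.sub(rf"\b{re.escape(toks[i])}\b", alt, lf)
                if new_lf != lf:              # only keep swaps mirrored in the LF
                    augmented.add((new_utt, new_lf))
    return augmented - set(pairs)

pairs = [("fly to boston", "flight(dest=boston)"),
         ("fly to denver", "flight(dest=denver)"),
         ("drive to boston", "ground(dest=boston)")]
print(gcda_augment(pairs))   # {('drive to denver', 'ground(dest=denver)')}
```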
The Seq2Seq model also utilizes contextual information through a perceptual context decoder. It does this in three ways [4]: using conditional probabilities to compute the final semantic representation, copying contextual information, and exploiting context through co-referential resolution. These approaches improve the decoding and semantic representation capabilities under contextual conditions.
The perceptual context decoder faces three main challenges in utilizing contextual information. Firstly, the current sentence information is incomplete, which makes it difficult for the decoder to generate well-formed semantic expressions. Secondly, the replication approach lacks knowledge acquisition and reasoning capabilities. Finally, neural network-based semantic parsing methods have difficulty designing specific actions and utilizing symbolic contextual representations to process contextual information in co-referential resolution.
Firstly, the context-aware decoder model generates semantic representations using conditional probabilities and contextual information but faces the problem of incomplete sentence information. Incomplete sentence information poses a significant challenge in the decoding process of the Seq2Seq model, mainly because the decoder relies on a limited, compressed context vector at the initial stage, which makes it difficult to fully grasp the complete semantics of the input sentence. As decoding progresses, the decoder is restricted to a "local view": it can only make predictions based on the partial sequence generated so far and the limited context vector, which increases the risk of producing outputs that do not match the overall context. Although context-aware decoders aim to improve generation accuracy, they are not fully effective when information is incomplete, especially for long texts or complex reasoning tasks, and may produce biased outputs because they cannot capture long-range dependencies or implicit semantic structures. Solving the problem of incomplete sentence information is therefore key to improving the decoding performance of the Seq2Seq model. To this end, researchers have proposed combining previous sentences and historical semantic representations to optimize the decoding process, introducing typical models such as grammar-based decoding, globally normalized neural models, edit-based SQL query generation, and bottom-up neural semantic parsing to generate semantic representations more accurately. To overcome the challenges of context-aware decoder models in generating accurate semantic representations, researchers have proposed the grammar-based decoding semantic parser [93], which combines context modeling with grammatical decoding in an attention-based sequence-to-sequence architecture (a minimal sketch of such constrained decoding follows this paragraph). However, this method still faces problems such as limited context modeling capability, lack of knowledge, data sparsity, lexical ambiguity, and syntactic difficulties. To address these issues, more complex model architectures, external knowledge bases, pre-training, and external lexical and syntactic resources can be considered. The globally normalized neural model [97] considers inter-tag interactions and dependencies by jointly normalizing the output space to enhance semantic parsing performance. However, it faces problems such as high computational complexity, difficulty in modeling tag dependencies, limited structural expressiveness, and data scarcity. These can be mitigated by approximation methods to reduce computation, hierarchical modeling to strengthen tag dependency handling and structural expressiveness, and transfer learning and data augmentation techniques to alleviate data scarcity.
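The core guarantee of grammar-based decoding can be shown in a few lines: at each step the decoder's scores are masked so that only tokens legal in the current grammar state keep probability mass. This is a generic sketch with a hand-supplied valid-token set and toy vocabulary, not the specific parser of [93].

```python
import torch

def grammar_constrained_step(logits, valid_token_ids):
    """One decoding step under a grammar: tokens illegal in the current grammar
    state get -inf score, so all probability mass stays on legal tokens."""
    mask = torch.full_like(logits, float("-inf"))
    mask[valid_token_ids] = 0.0
    return torch.softmax(logits + mask, dim=-1)

# Suppose the grammar state after 'SELECT' only allows column symbols 1 and 3.
logits = torch.randn(5)                                 # scores over a 5-symbol vocabulary
probs = grammar_constrained_step(logits, torch.tensor([1, 3]))
print(probs)                                            # nonzero only at indices 1 and 3
```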
The edit-based SQL query generation approach [101] improves generation quality by exploiting the interaction history and adopts an encoder–decoder architecture comprising an utterance-table encoder, a turn-attention mechanism, and a table-aware decoder. However, it underutilizes contextual information and allocates decoder attention imperfectly. To solve these problems, a more powerful context encoder, a multi-layer attention mechanism, and combinations with other attention mechanisms can be used, while regularization and control mechanisms can balance attention allocation and avoid over-attention and information loss. The quality of the semantic representation parsed by the decoder remains limited by the quality of the context encoding, which is still a challenge. Researchers have empirically investigated the sensitivity of models to unnatural changes or perturbations artificially introduced into their contexts during testing in order to understand how models use dialogue history [104]. However, existing methods are limited to specific categories of models. It is therefore recommended to introduce more categories of models for experiments, such as BERT, GPT, and XLNet, and compare them with existing models to evaluate their performance and advance NLP research; this would improve the reliability and breadth of the experimental results and provide a reference for future research. The bottom-up neural semantic parsing approach [105] enhances compositional generalization by constructing candidate parses from logical-form leaf nodes, using a bottom-up decoding strategy and lazy expansion to build the set of candidate parses. However, it suffers from unordered n-ary relations and a heavy user-annotation burden. Heuristic search, pruning techniques, optimization algorithms, parallel computing, optimized data structures, and domain knowledge can be used to handle unordered n-ary relations, while treating semantic parsing as a text-to-text problem and fine-tuning a large-scale language model can reduce the annotation burden. A sketch of the edit-based view of query generation follows.
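As promised above, a minimal sketch of the edit-based view: the new query is expressed as keep/emit operations over the previous turn's query, so unchanged spans need not be regenerated. The difflib-based alignment is an illustrative stand-in for the model's learned editing mechanism, not the architecture of [101].

```python
import difflib

def edit_plan(prev_sql, gold_sql):
    """Express the new query as keep/emit operations over the previous turn's
    query instead of generating it from scratch."""
    prev, gold = prev_sql.split(), gold_sql.split()
    ops = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=prev, b=gold).get_opcodes():
        if tag == "equal":
            ops.append(("keep", prev[i1:i2]))
        elif tag in ("replace", "insert"):
            ops.append(("emit", gold[j1:j2]))
        # 'delete' spans of the previous query are simply dropped
    return ops

prev = "SELECT name FROM city WHERE country = 'USA'"
gold = "SELECT name , population FROM city WHERE country = 'USA'"
print(edit_plan(prev, gold))
# [('keep', ['SELECT', 'name']), ('emit', [',', 'population']), ('keep', ['FROM', ...])]
```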
Second, the perceptual context decoder uses contextual information to generate queries by replication, which mainly suffers from a lack of knowledge acquisition and reasoning capabilities. The approach maps dialogues to executable queries while considering historical information [106]. To improve performance, more complex recurrent units, attention mechanisms, combinations with other model architectures, and truncation of long sequences can be used. Methods for conversational semantic parsing over tables require knowledge acquisition and reasoning capabilities, but such methods are not yet fully developed. To address this, researchers have proposed knowledge-aware semantic parsers that integrate different types of knowledge to improve performance [107]. However, their encoders suffer from long-range dependency issues, limited expressiveness, sensitivity to input order, and a large number of parameters. These can be mitigated with more powerful RNN units, attention mechanisms, SRNNs, parameter sharing, regularization techniques, and pruning algorithms. A generic sketch of the replication (copy) mechanism follows.
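The replication idea can be made concrete with a standard copy mechanism: the final output distribution mixes the decoder's generation distribution with attention over context tokens. This is a generic sketch with toy shapes, not the exact architecture of [106,107].

```python
import torch

def copy_mixture(gen_logits, copy_scores, src_token_ids, p_gen):
    """Final output distribution of a copy-augmented decoder: a p_gen-weighted
    mix of the generation distribution and attention over context tokens."""
    p_vocab = torch.softmax(gen_logits, dim=-1) * p_gen
    p_copy = torch.softmax(copy_scores, dim=-1) * (1.0 - p_gen)
    out = p_vocab.clone()
    out.scatter_add_(0, src_token_ids, p_copy)   # route copy mass to vocabulary slots
    return out

gen_logits = torch.randn(10)                     # generation scores over a 10-word vocab
copy_scores = torch.randn(4)                     # attention scores over 4 context tokens
src_ids = torch.tensor([2, 5, 5, 7])             # vocabulary ids of those context tokens
p = copy_mixture(gen_logits, copy_scores, src_ids, p_gen=0.6)
print(p.sum())                                   # tensor(1.0000): a valid distribution
```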
Third, the perceptual context decoder exploits context using co-referential resolution [110] and introduces REF tags and pointer networks to handle references [120]. Although neural network-based semantic parsing approaches can obtain good contextual representations, designing specific actions and processing symbolic contextual representations remains difficult. To address this, RECODE introduces retrieval-based neural code generation that uses subtree retrieval to improve the generation of complex code structures. However, sentence-similarity scoring methods suffer from limitations in semantic comprehension, which can be alleviated with advanced NLP techniques, combinations of rule-based and machine learning methods, and attention mechanisms. Language-dependency problems also arise when handling abstract syntax trees (ASTs) and can be dealt with by improving AST design, introducing semantic information, and similar measures. A toy sketch of the retrieval step is given below.
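The retrieval step behind RECODE-style generation can be sketched as follows: retrieve the most similar training example (here by crude word overlap, a deliberate simplification of real similarity scoring) and expose the expression subtrees of its code's AST as candidate building blocks for generation.

```python
import ast

def retrieve_subtrees(query, examples):
    """Toy RECODE-style retrieval: pick the training example whose description
    overlaps most with the query, then expose the expression subtrees of its
    code's AST as candidate building blocks."""
    def overlap(a, b):
        return len(set(a.lower().split()) & set(b.lower().split()))
    desc, code = max(examples, key=lambda ex: overlap(query, ex[0]))
    tree = ast.parse(code)
    subtrees = [ast.dump(n) for n in ast.walk(tree) if isinstance(n, ast.expr)]
    return desc, subtrees

examples = [("sort the list in reverse order", "xs.sort(reverse=True)"),
            ("open a file for reading", "open(path, 'r')")]
desc, subtrees = retrieve_subtrees("sort items in reverse order", examples)
print(desc)            # 'sort the list in reverse order'
print(subtrees[0])     # dump of the xs.sort(...) call subtree
```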
This study analyzes and summarizes a variety of approaches in neural semantic parsing, covering data restructuring, attention mechanisms, table processing, global consistency, dialogue segmentation, grammar-based parsing, global normalization, cross-domain querying, dialogue history exploitation, bottom-up semantic parsing, sentence-to-query mapping, Web form applications, and RECODE. Methods that aim to improve the effectiveness of neural semantic parsing include: improved data reorganization and annotation; attention mechanisms to improve accuracy; dedicated models for tabular data; global consistency through joint normalization of the entire output space, which accounts for interactions and dependencies between output labels; iterative segmentation of dialogues to provide context; grammar-guided parsing; edit operations to generate SQL queries that generalize better; effective use of dialogue history; bottom-up semantic parsing; automated mapping of sentences to queries; enhanced comprehension based on web forms; and RECODE's retrieval methodology introduced into neural code generation. Examining the key ideas, methods, problems, and solutions of these approaches gives a better understanding of progress and challenges in neural semantic parsing.

5.2.3. Neuro-Symbolic Method

The symbolic method has weak generalization ability and requires manual design of rules and features; the neural network method requires large amounts of labeled data and computational resources and has poor interpretability. The neuro-symbolic method tries to combine symbolic reasoning and neural network representation to obtain both interpretability and generalization ability. This study summarizes 2 core ideas and methods as well as 16 key problems and solutions for typical models of the neuro-symbolic method. The core idea is to combine symbolic logic and neural networks to achieve semantic parsing of natural language sentences, addressing the difficulty that neural network-based methods have in processing contextual information and designing specific actions. A context encoder perceives the input sentence and context to produce encoding results; a context-aware decoder then takes the encoding results together with previous logical statements to produce decoding results. Finally, the decoding results are transformed into logical statements using grammar rules, as shown in Figure 10; a toy sketch of this grammar step follows.
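The grammar step of this pipeline can be illustrated with a hypothetical action grammar: the neural decoder would emit the action sequence, and deterministic rules assemble it into a well-formed logical statement. The rule names and stack discipline below are invented for illustration and are not taken from any surveyed system.

```python
# Hypothetical action grammar: PUSH actions place entities or relations on a
# stack; rule actions combine stack items into larger logical forms.
RULES = {
    "FIND": lambda ent, rel: f"find({ent}, {rel})",
    "COUNT": lambda sub: f"count({sub})",
}

def actions_to_logical_form(actions):
    """Grammar step of a neuro-symbolic parser: a decoded action sequence is
    deterministically assembled into a well-formed logical statement."""
    stack = []
    for act in actions:
        if act.startswith("PUSH:"):
            stack.append(act.split(":", 1)[1])
        else:
            rule = RULES[act]
            arity = rule.__code__.co_argcount        # number of rule arguments
            args = [stack.pop() for _ in range(arity)][::-1]
            stack.append(rule(*args))
    return stack.pop()

print(actions_to_logical_form(["PUSH:president", "PUSH:spouse_of", "FIND", "COUNT"]))
# -> count(find(president, spouse_of))
```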
In Table 6, the key ideas and methods, the problems identified, and the corresponding solutions of typical neuro-symbolic semantic parsing models are summarized. These approaches incorporate grammar into the decoding step or treat the symbolic representation as an intermediate representation while applying neural networks to representation learning. Each problem has a corresponding solution, and some problems may have multiple possible solutions.
The dialog-to-action parser handles the omission phenomenon by mapping dialogue statements into logical action sequences [10]. A top-down generation model with an encoder–decoder structure and a dialogue memory management component generates logical forms, and advanced breadth-first search is used for efficiency. However, the model suffers from problems of context symmetry, computation, memory consumption, and decoder dependency. Improvements include introducing an attention mechanism, simplifying the RNN, a Transformer decoder, prior knowledge, beam search, and more training data; these measures are expected to improve model performance and the quality of generated sequences. In neuro-symbolic methods, retrieved data can support contextual semantic parsing models, combining retrieval with meta-learning to improve performance [9]. However, bidirectional RNNs suffer from long training times and high memory consumption; solutions include faster hardware, smaller models, lightweight network structures, regularization techniques, more data, model compression, and visualization techniques. LSTM decoders are also problematic and can be improved through gradient clipping, other recurrent structures, regularization, attention mechanisms, variable-length sequence processing, batching, and parallel computing.
Neural semantic parsing with extremely rich symbolic meaning representations utilizes lexical ontology hierarchies to introduce novel symbolic representations that enhance the semantic richness and interpretability of neural semantic parsers. However, the implementation encounters problems such as large-scale language model tuning, hierarchical learning challenges, and the complexity of categorical coding; to cope with these challenges, model architectures, training strategies, and new ontologies and similarity measures can be optimized. The weakly supervised inference approach with neuro-symbolic methods fuses neural networks and symbolic logic, aiming to build efficient and interpretable neuro-symbolic NLP systems that break through the limitations of traditional deep learning models. Currently, it faces challenges such as high training costs, the need to manually design inference patterns, and insufficient handling of complex semantics; optimizing algorithms, exploring automatic inference-pattern discovery, and introducing advanced semantic understanding and multimodal fusion methods are the key paths to improving system efficiency and automation. The neuro-symbolic learning approach that generates logic constraints likewise fuses neural networks and symbolic logic to construct a system that achieves efficient intelligent learning under weak supervision while ensuring learning accuracy and preventing logic degradation. However, it faces challenges such as learning complexity and resource consumption; further development can be promoted by optimizing the design and algorithms and combining advanced technologies.
This study compares and analyzes the semantic parsing techniques of the above models and methods and finds common ground: all of them use neural networks to parse the semantics of dialogues, generate logical outputs, and rely on big-data training to improve performance. The differences lie in the diversity of parsing goals, model architectures, and application domains. Some models are dedicated to dialogue-to-logic mapping, facilitating reasoning and knowledge graphs; others use retrieved data to enhance contextual understanding and suit database queries. Neuro-symbolic approaches are particularly notable, enhancing the semantic richness and interpretability of parsers through novel symbolic representations and demonstrating the potential of fusing neural networks and symbolic logic to build efficient and interpretable NLP systems. Neuro-symbolic learning achieves accurate and consistent learning under weak supervision, provides new ways to transfer natural language into logical forms, and supports contextual parsing. These approaches have great potential in semantic understanding, and further integration of techniques and exploration of new applications, such as knowledge question answering and automatic planning, are needed to improve parsing accuracy and robustness.
Before investigating the performance of different matching models on different datasets, this study gives some definitions of the evaluation methods. Accuracy (Acc.) is a commonly used metric for classification performance, indicating the proportion of correctly predicted samples among all samples. Precision (Prec.) is defined with respect to the prediction results and indicates how many of the samples predicted as positive are truly positive. Recall (Rec.) is defined with respect to the original samples and indicates how many of the positive cases in the sample were predicted correctly. BLEU (Bilingual Evaluation Understudy) is a widely used automatic metric for assessing the quality of machine translation output, scored by comparing the similarity between the system output and a set of reference translations; a simple illustration of these metrics is given below. This paper delves into the performance comparison of different matching models on different datasets, aiming to provide researchers with a comprehensive and detailed analytical perspective. Comparative experiments and data analyses show that different matching models do differ significantly in performance, as shown in Table 7.
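For reference, the set-based metrics above can be computed as follows. This is a plain illustration of the definitions, with exact-match accuracy as commonly used in semantic parsing; the slot strings are invented examples.

```python
def precision_recall_f1(pred, gold):
    """Prec./Rec./F1 over sets of predicted vs. gold items (e.g., predicted slots)."""
    tp = len(set(pred) & set(gold))
    prec = tp / len(pred) if pred else 0.0
    rec = tp / len(gold) if gold else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

def exact_match_accuracy(preds, golds):
    """Acc. as used for semantic parsing: fraction of predictions identical to gold."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

print(precision_recall_f1(["dest=boston", "time=am"], ["dest=boston"]))  # (0.5, 1.0, 0.67)
print(exact_match_accuracy(["SELECT * FROM t"], ["SELECT * FROM t"]))    # 1.0
# BLEU measures n-gram overlap with reference outputs; a standard implementation
# is nltk.translate.bleu_score.sentence_bleu(references, hypothesis).
```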
In semantic parsing on the ATIS dataset, we observe a clear contrast between methods and an overall increase in accuracy. Specifically, the initial neural semantic parsing via feedback approach achieved an accuracy of 79.24%; the introduction of the data recombination technique then increased accuracy to 83.3%. The globally normalized neural model combined with the Copy data recombination strategy likewise achieved 83.3% accuracy on the ATIS test set. The semantic parsing approach based on the attention-enhanced encoder–decoder model pushed accuracy further to 84.6%, highlighting the potential of the attention mechanism for improving model performance. Meanwhile, the online learning of relaxed CCG grammars approach also shows strong performance in logical form parsing: its single-pass parsing achieved 90.61% Prec., 81.92% Rec., and an 85.06% F1 score on the ATIS test set, and the partial-match performance of LCDMSLF on the ATIS DEC94 test set is even better, at 95% Prec., 96.5% Rec., and 95.7% F1.
In the semantic parsing task on the GEO dataset, LJK11 w/augmented triggers demonstrated an accuracy of 87.6% on the GEO test set, showing the model's strength in data augmentation and trigger-word recognition. In contrast, the neural semantic parsing via feedback approach achieved an accuracy of 82.5%, slightly less impressive but demonstrating the role of the feedback mechanism in enhancing performance. The quickly-constructed semantic parsing method reached only 56.4% accuracy on the GEO880 dataset, a result that contrasts with the other methods and highlights that rapidly constructed parsers may struggle to achieve the desired performance when resources are limited. Subsequently, the data restructuring technique for neural semantic parsing achieved an accuracy of 89.3% on the GEO dataset with the AWP + AE + C2 model, exceeding LJK11 w/augmented triggers and demonstrating the importance of data restructuring and model optimization for semantic parsing performance. The attention-enhanced encoder–decoder model also achieved an accuracy of 87.1%, showing that the attention mechanism helps the model focus on key information and improve parsing accuracy. Finally, iterative dialogue-based segmentation techniques, especially the SEQ2SEQ + PDE* model, achieved 90.7% accuracy on the GEO test set, a remarkably high score that not only sets a record for semantic parsing accuracy on the GEO dataset but also demonstrates the strong potential of iterative segmentation strategies for dialogue understanding and semantic parsing. Overall, these methods show a growing trend of accuracy on the GEO dataset, providing a variety of efficient and accurate solutions for semantic parsing.
In the JOBS dataset, LJK11 w/augmented triggers demonstrated good performance on the JOBS test set, answering questions with 95% accuracy and significantly outperforming the 90% accuracy achieved by the SEQ2TREE model on the same dataset. In the semantic parsing task on the SparC dataset, the models show different levels of performance. Among them, the memory-based semantic parsing method MenCE + Liu et al. (2020) [93] using Men-Grammar achieved query and interaction accuracies of 40.3% and 16.7%, respectively, on the test set, a relatively low result suggesting that the method may face challenges in handling complex semantic parsing tasks. In contrast, Ours + BERT, a grammar-based decoding semantic parser, achieved significant improvements on the development set, with question and interaction matching accuracies of 52.6% and 41%, respectively, demonstrating the importance of syntactic information in improving semantic parsing performance. In addition, Ours + query attention and sequence editing (w/gold query), an edit-based SQL query generation model for cross-domain context-dependent problems, also achieves notable performance on the test set, with 47.9% and 25.3% accuracy for question and interaction matching, respectively, further validating the model's effectiveness in handling cross-domain context-dependent problems. In the WIKITABLEQUESTIONS dataset, for the semantic parsing task over semi-structured tables, the combined semantic parsing model achieves an accuracy of 37.1% on the WIKITABLEQUESTIONS test data. However, the neural network approach that introduces type constraints achieves 42.7% accuracy on the WIKITABLEQUESTIONS validation set, a significant performance improvement. This comparison not only reflects the importance of type constraints in improving the accuracy of table semantic parsing but also signals that neural semantic parsing methods with type constraints have greater potential in structured data parsing.
This paper also outlines the APIs of semantic parsing models. Different APIs have their own advantages and disadvantages in terms of performance, ease of use, and scalability, and the choice of a suitable API depends on the specific application scenario and requirements, as shown in Table 8. When comparing the NLP APIs mentioned above, we can analyze them along the dimensions of providing company, supported languages, interface type, application scenarios, advantages, and disadvantages. From the perspective of providing companies, these APIs come from well-known technology companies at home and abroad, such as BosonNLP, Baidu AI Open Platform, Tencent Cloud, Azure, Google Cloud, and IBM Watson, each with deep accumulation and technical strength in NLP. In terms of supported languages, coverage varies: APIs from international companies such as Google Cloud and IBM Watson usually support multiple languages, including English, Chinese, French, and Spanish, while companies focusing on specific markets may mainly support Chinese or another single language. In terms of interface type, most of these APIs provide common HTTP interfaces such as RESTful endpoints for easy integration by developers (a generic calling pattern is sketched below). In terms of application scenarios, these NLP APIs cover a wide range of tasks, including text classification, sentiment analysis, named entity recognition, and semantic role labeling, and developers can choose the right APIs according to their needs. In terms of advantages, the NLP APIs of major companies offer strong technical strength and rich application scenarios: for example, Google Cloud's NLP APIs excel in semantic analysis and text generation, while IBM Watson's API has deep application experience in vertical areas such as healthcare and finance. However, these APIs also have drawbacks. First, the technical threshold can impose learning costs on beginners. Second, some APIs have strict call limits and high fees, which may be unfriendly to small-scale applications or startups. In addition, owing to language and cultural differences, some APIs may not perform well on texts in specific languages or cultural contexts. To sum up, each of these NLP APIs has its own strengths and weaknesses, and developers should weigh their own needs, technical capabilities, and budget when choosing one.
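Although each provider defines its own URL, authentication scheme, and request/response schema, the RESTful calling pattern is broadly similar across the services in Table 8. The endpoint, payload fields, and auth header below are placeholders, not any real provider's interface.

```python
import requests

# Placeholder endpoint: each real provider (Baidu AI, Google Cloud, IBM Watson, ...)
# defines its own URL, authentication scheme, and request/response schema.
API_URL = "https://api.example.com/v1/semantic-parse"

def parse_text(text, api_key):
    """Generic RESTful calling pattern shared by typical NLP APIs:
    POST JSON containing the text, authenticate with a key, read back JSON."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json={"text": text, "lang": "en"},
        timeout=10,
    )
    resp.raise_for_status()          # surface quota or auth errors early
    return resp.json()
```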

6. Datasets and Evaluation Indicators

Research on semantic parsing methods relies on a range of datasets and evaluation criteria, as shown in Table 9, which outlines the 23 English-language datasets and five evaluation methods covered in this study. These datasets are widely used to train and improve various semantic parsing models, such as feedback-based neural semantic parsing methods, semantic parsing techniques for iterative dialogue segmentation, and contextual semantic parsing models supported by retrieved data. As Table 9 shows, all the listed datasets are in English, reflecting that current semantic parsing research focuses mainly on English. Although this helps advance English semantic parsing technology, it also suggests that more multilingual datasets are needed to support cross-lingual semantic parsing research in the future. In terms of evaluation methods, common criteria in the field include Acc., Prec., Rec., the F1 value, and BLEU. These metrics provide a basis for quantifying the performance of semantic parsing models; the F1 value, as the harmonic mean of precision and recall, provides a balanced evaluation result. It is worth noting that although evaluation methods such as BLEU, ROUGE, and F1 can assess the key contributions in the output information, they still have limitations: these automatic metrics usually cannot understand semantics as deeply as humans do and make a correspondingly reasonable assessment. Therefore, when designing new evaluation methods, we need to consider how to better approximate human semantic comprehension to provide more accurate and comprehensive evaluation results.
Datasets and evaluation metrics are critical to the success of semantic parsing models. A high-quality dataset is the basis for model training and should have enough samples, diversity, and broad coverage; such a dataset ensures that the model learns a variety of contexts and background knowledge during training, covering different domains and subjects. A dataset that lacks representativeness or diversity may cause the model to perform poorly in a particular domain or context. In semantic parsing, commonly used datasets include ATIS, Geo880, and SparC, which provide researchers with rich training and testing resources. However, relying on these datasets alone may not suffice to comprehensively evaluate a model, so continuously collecting and refining datasets to cover a wider range of scenarios and contexts is key to advancing semantic parsing research. The selection of appropriate evaluation metrics is equally crucial for quantifying performance in context understanding tasks. Acc., as a measure of overall performance, is well suited to tasks where the data are balanced and global accuracy is the concern. However, in some cases we also need to focus on the quality and coverage of the model's predictions, and this is where Prec. and Rec. become particularly important. To evaluate a model comprehensively, researchers often adopt the F1 value, the harmonic mean of Prec. and Rec., which balances precision and comprehensiveness and is a suitable metric for most semantic parsing tasks. It is worth noting that BLEU, a common metric in machine translation, is also applied in some semantic parsing tasks, but it mainly measures the similarity between the output text and reference translations, which is not the same as the core concern of semantic parsing. Therefore, evaluation indicators should be chosen according to the specific needs of the task and the characteristics of the data.
In summary, datasets and evaluation metrics play a crucial role in the success of semantic parsing models. A high-quality and comprehensive dataset can provide sufficient training resources for the model, while appropriate evaluation metrics can accurately quantify the model’s performance in context-understanding tasks. Therefore, continuous improvement of datasets and the design of more targeted and comprehensive evaluation metrics are key to advancing semantic parsing research.

7. Domain-Specific Applications

This section aims to comprehensively explore the wide range of applications of matching methods in multiple domains, with a special focus on the convergence of applications at the intersection of healthcare, computer science, NLP, and database management systems (DBMS). At the same time, we also cover practical applications in economics, industry, and agriculture. By carefully analyzing the data listed in Table 10, we can recognize the core contribution of semantic parsing models in each domain. In addition, we discuss in depth the limitations these models may encounter in practice and give future research directions accordingly. Overall, semantic parsing technology shows broad application prospects and high research value in many fields.
In the medical field, semantic parsing technology achieves high-precision transformation of EHR natural language questions into logical forms and effective identification and processing of unknown terms, providing a solid foundation for building efficient medical question-answering systems. In the computer field, semantic parsing technology has shown remarkable applications, especially in NLPG and automatic code generation. Through deep learning techniques, especially Transformer-based models, researchers have succeeded in transforming human linguistic intentions into programming languages, such as Java, thus improving the efficiency of software engineering. These techniques not only advance the development of transformation techniques from natural language to programming language but also provide valuable directions for future research by delving into aspects such as semantic parsing and neural symbolic methods. Future research will explore ways to customize language models, optimize source code representations, and so on, to further improve the accuracy and efficiency of automatic code generation.
The application of semantic parsing technology in the field of NLP and DBMS focuses on the task of text-to-SQL parsing, i.e., automatically transforming the user’s query requirements expressed in natural language into SQL query statements. This technology significantly improves the accuracy and efficiency of query transformation through deep learning approaches and pre-trained language models. Despite the challenges of semantic encoding and transformation, existing research has proposed various strategies to cope with them. In addition, the NLIDB framework utilizes semantic parsing techniques to address the challenge of users accessing databases without SQL knowledge, expanding database access and enhancing user experience. These advances not only demonstrate the value of semantic parsing technology in a wide range of applications in the NLP and DBMS fields but also provide important directions and insights for future research.
The application of semantic parsing technology in the economic domain focuses on KBQA, especially when dealing with economic data and financial knowledge. Despite the challenges of schema and fact complexity, existing approaches such as information retrieval and neural semantic parsing have made progress in extracting information from knowledge bases to answer natural language questions. Future research will seek to further improve the performance of KBQA systems to support more accurate economic analyses and decision-making.
The application of semantic parsing technology in industry lies mainly in log parsing and Text-to-SQL conversion tasks. With the FastRAT model, researchers significantly improved the decoding speed of the Text-to-SQL task and extended the model's ability to handle multilingual inputs, demonstrating the effectiveness of the technique for cross-lingual, complex database query tasks. The development of the ECLIPSE system further demonstrates the efficiency and cross-language performance of semantic parsing techniques in industrial log parsing, combining data-driven template matching and large-scale language models to significantly improve log parsing accuracy and efficiency. The introduction of the Log Parser framework optimizes log parsing for IICTSs using deep learning techniques, providing strong support for safe and reliable production. These studies and applications demonstrate the value and potential of semantic parsing technology in improving the efficiency of data processing and analysis in the industrial field.
The application of semantic parsing technology in agriculture is mainly reflected in improving the user interactivity and ease of use of agricultural measurement and control systems. By introducing the AMR-based AMR-OPO semantic parsing framework, the technology transforms the user's natural language input into structured OPO triples and accurately extracts the entities in agricultural instructions using the BERT-BiLSTM-ATT-CRF-OPO entity recognition model. This innovative approach not only simplifies the interaction between the user and the agricultural system but also improves the system's intelligent response capability and brings a better user experience.

8. Open Issues, Challenges, and Future Research Directions

8.1. Open Problems and Challenges

8.1.1. Open Problems and Challenges I: Obtaining Semantic Trajectories for Reasonable Contexts

Related techniques in semantic representation focus on extracting contextual semantic information, and one of the key challenges is finding reasonable contextual semantic trajectories. Through the acquisition of semantic trajectories, high-quality contextual information can be obtained and used to build semantic parsing models. However, there are relatively few studies on contextual semantic trajectories. Some studies have used context-based meta-reinforcement learning methods that draw on the experience of previous training tasks, gaining common knowledge from them and adapting to new tasks with a small number of interactions. However, previous context-based meta-reinforcement learning methods neglected the importance of collecting informative trajectories to generate distinctive contexts, leading to degraded contextual information [148]. Poor extraction of contextual information trajectories limits the ability to obtain high-quality contexts, which affects the effectiveness of semantic parsing. Therefore, capturing reasonable contextual semantic trajectories remains a challenge in semantic representation. Further research and development of effective methods are needed to ensure that reasonable, high-quality information trajectories are extracted from contexts, thereby improving the performance and effectiveness of semantic parsing.

8.1.2. Open Issues and Challenges II: Delineating Contextual Logic Boundaries

In NLP, neural semantic parsers cannot capture the combinatorial structure in discourse, which leads to poor generalization when dealing with unseen semantic combinations and sentences with complex structures. To address this problem, neural semantic techniques focus on exploiting combinatorial structure and ensembles of multiple neural parsers, enabling better parsing of the semantics of long, complex sentences. In contrast, less research has been done on semantic parsing techniques that utilize contextual logic boundaries. Some studies have attempted to augment neural semantic parsers by utilizing combinatorial principles and have identified finding causal relationships in context as a challenge [4]. However, this approach mainly focuses on the combinatorial structure in discourse and does not fully consider the contextual logic of the discourse, leading to limitations in the accuracy of semantic expressions. The lack of an approach for finding the boundaries of contextual logic limits accurate and comprehensive semantic parsing of content that spans a logical range. Therefore, capturing reasonable contextual logical boundaries is a challenge in semantic representation. Further research and development of effective methods are needed to ensure that logical boundaries in context can be accurately identified and utilized when parsing semantics, thereby improving the accuracy and completeness of semantic representations.

8.1.3. Open Issues and Challenges III: Small Chinese Semantic Expression Datasets and Unreasonable Evaluation Methods

Chinese semantic expressions face two major challenges: a lack of suitable Chinese datasets and unreasonable evaluation methods. Currently, most context-based semantic parsing models are trained mainly on English datasets [2], which limits their adaptability to Chinese application scenarios. Although some studies have used Chinese datasets to train semantic parsing models, these datasets are limited in number and difficult to collect [104] and thus cannot meet the needs of various Chinese application scenarios. Therefore, increasing the resources of Chinese datasets is an important challenge that requires more effort. In addition, commonly used evaluation criteria such as BLEU, Rec., and the F1 value cannot, in some cases, understand the semantics of the original text and the statements to be evaluated as deeply as humans do, and their applicability is limited. For example, some models use the BLEU criterion, which is widely used in NLP but does not apply well to tasks outside machine translation [149]. This creates the same problem for evaluation methods as for semantic understanding, i.e., unreasonable evaluation criteria. Although evaluation criteria are updated and iterated every year, these new criteria are not widely used. Unreasonable evaluation methods limit the accurate assessment of the generalization ability of semantic representation models. In summary, Chinese semantic expressions face two challenges: a lack of suitable Chinese datasets and unreasonable evaluation methods. Solving them requires increasing the development and collection of Chinese datasets and researching reasonable evaluation guidelines to promote Chinese semantic expression.

8.1.4. Open Issues and Challenges IV: State-of-the-Art Large Language Models Are Difficult to Make Public and Exploit in Semantic Parsing Techniques

State-of-the-art large language models, such as Generative Pre-trained Transformer 4 Omni (GPT-4o) [150], have shown impressive potential in semantic parsing technologies. GPT-4o is a Transformer-based multimodal, multilingual generative large model trained by OpenAI; it is twice as fast as its predecessor GPT-4 at half the price. However, the current situation, in which these models are difficult to make public and widely available, highlights serious challenges facing the AI field. Firstly, the conflict between commercial interests and open source is a central issue. Leading companies like OpenAI have invested substantial resources in training and optimizing these large-scale language models and therefore tend to protect them as commercial assets rather than open-source them. While this strategy protects the company's interests, it also limits the widespread application and further development of these advanced technologies. Meanwhile, individuals and small and medium-sized teams face tremendous difficulties in developing and optimizing large language models: computing power, data, and funding are the three major constraints on their progress. Training a large language model requires huge computational resources, which is almost unimaginable for individuals and small teams, and collecting and labeling large amounts of training data likewise requires substantial investment, a major challenge for teams with limited funds. As a result of these limitations, the development of semantic parsing technologies is constrained by technical bottlenecks and limited innovation; many researchers and developers cannot take advantage of the latest language modeling techniques to advance semantic parsing, which to some extent limits the rate of innovation and development in the field.

8.2. Future Research Directions

8.2.1. Direction I: Effective Contextual Semantic Trajectory Acquisition Methods

The technical results of this research direction can address Challenge 1. Context-based Meta-RL methods train policies conditioned on latent contexts to enhance model generalization [151]. These techniques can acquire common knowledge from previous training-task experience and adapt to new tasks with a small number of interactions. To address the problem that previous context-based Meta-RL approaches ignore the importance of collecting informative trajectories to generate distinctive contexts, CCM techniques can serve as an effective semantic trajectory acquisition method that improves the quality of contextual information and thus provides high-quality contextual content for semantic parsing. The first contribution of the CCM approach is an unsupervised training framework for context encoders based on contrastive learning: by treating transitions from the same task as positive samples and transitions from different tasks as negative samples, contrastive learning can directly distinguish contexts in the latent space without modeling irrelevant dependencies. Its second contribution is an exploration strategy based on information gain: to collect as much trajectory information as possible, a theoretical lower bound on the exploration objective is derived within the contrastive learning framework and used as an intrinsic reward to train a separate exploration policy [148]. Thus, CCM can improve Meta-RL by refining the search for semantic trajectories; the contrastive objective at its core is sketched below.
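The contrastive part of this training framework can be sketched with a standard InfoNCE-style loss over transition embeddings. The shapes and the single-positive setup below are illustrative simplifications, not CCM's exact objective.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Contrastive objective for context encoders: a transition from the same
    task (positive) is pulled toward the anchor embedding, while transitions
    from other tasks (negatives) are pushed away."""
    a = F.normalize(anchor, dim=-1)
    pos = F.normalize(positive, dim=-1)
    neg = F.normalize(negatives, dim=-1)                 # (n_neg, dim)
    logits = torch.cat([(a * pos).sum(-1, keepdim=True),
                        neg @ a], dim=0) / temperature   # (1 + n_neg,)
    # The positive is at index 0, so cross-entropy against label 0 maximizes
    # its similarity relative to all negatives.
    return F.cross_entropy(logits.unsqueeze(0), torch.tensor([0]))

loss = info_nce(torch.randn(16), torch.randn(16), torch.randn(8, 16))
print(loss)   # scalar training loss for the context encoder
```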

8.2.2. Direction II: Precise Contextual Logic Boundary Delineation

The technical results of this research direction can address Challenge 2. The semantic parsing iterative expression segmentation model can delineate contextual logical boundaries and thus parse semantics within logical spans. This model can also use the GPT-4 model [19] in semantic parsing to achieve a more stable delineation of logical boundaries. These techniques not only help delineate the logical boundaries of the context but also, to an extent, find the logical content associated with the context, since delineating the logical boundaries also identifies the logically associated content. In addition, REFUEL [152] can complement the iterative expression segmentation model as a method for collecting evidence of contextual relevance, finding logical evidence of contextual relevance and thus contributing to the logical segmentation of contexts.

8.2.3. Direction III: Development of Chinese Parsing Datasets and Rational Evaluation Criteria

The technical results of this research direction can address Challenge 3. Firstly, as mentioned before, current research focuses on English datasets and scenarios, while the development of Chinese datasets is crucial given the growth of Chinese application scenarios; this requires training semantic parsing models on Chinese datasets that match the corresponding application scenarios. Although some Chinese datasets have been used to train semantic parsing models, they only meet the needs of specific application scenarios. There are two ways to solve this problem: one is to pre-train a context-based semantic parsing model on large amounts of unlabeled data and then fine-tune it for the target application scenario; the other is to use typical methods for low-resource settings, including meta-learning, supervision, data augmentation, semi-supervised learning, and self-supervised learning [4]. Secondly, there are likewise two solutions to the problem of evaluation criteria. First, train a semantic similarity model to evaluate whether the expressed semantics are faithful to the original semantics, without relying on co-occurrence information. Second, promote the latest and most advanced evaluation methods to replace outdated ones; in this way, evaluation technology can genuinely progress rather than remain at the paper level.

8.2.4. Direction IV: Challenges and Strategies for Breakthroughs with Large Language Models in Semantic Parsing Technology

To overcome the current challenges faced by the AI field in semantic parsing technology, especially the difficulty of utilizing advanced large language models (e.g., GPT-4o [150]), a series of strategies and measures are needed. Firstly, cross-company and cross-organization collaboration is essential for developing and optimizing large language models: pooling resources, expertise, and experience can drive technological advancement and widespread adoption, achieving a win–win between commercial interests and technological development. Secondly, the development of lightweight models should be encouraged to cope with the high computing power and data requirements of large models, so that more individuals and small and medium-sized teams can participate in developing semantic parsing technology. At the same time, public datasets and benchmarking platforms should be established to provide researchers with a unified testing environment and promote technology iteration and optimization. Finally, government policy and financial support are also crucial: formulating relevant policies and providing funding can encourage cooperation among enterprises, research institutions, and universities, promote the research, development, and application of large language model technology, and help the innovation and development of the technology.

9. Conclusions

Through systematic literature review and in-depth analyses, this study comprehensively explores semantic parsing techniques at both the theoretical and practical levels. At the theoretical level, we find that semantic parsing techniques show remarkable diversity and complementarity and have undergone stepwise development and made significant progress. Traditional methods, rooted in the principle of semantic composability and relying on manual features and simple logic, have built a solid foundation for semantic parsing. These methods show unique advantages in interpretability, data efficiency, and domain-specific applications, providing valuable theoretical support and practical references for subsequent research. Meanwhile, the rapid rise of neural network methods, especially the integration and innovation of symbolic methods, pure neural network methods, and neural symbolic methods, has greatly broadened the research boundary of semantic parsing. The different focuses of these methods in terms of interpretability, generalization ability, data efficiency, etc. have prompted a deeper understanding of the nature of semantic parsing and promoted the cross-fertilization and theoretical innovation of multiple disciplines such as computational linguistics and cognitive science.
At the practical level, this study reveals the revolutionary changes that semantic parsing techniques have brought to NLP applications. Neural network-based approaches, owing to their excellence in processing complex natural language phenomena and enhancing system versatility, have greatly raised the level of intelligence and efficiency in areas such as intelligent customer service, intelligent Q&A, medical e-health record analysis, automatic source code generation from natural language descriptions, industrial log parsing, and agricultural measurement and control systems. The widespread adoption of these technologies not only significantly improves user experience but also promotes the rapid development, transformation, and upgrading of related industries.
In addition, this study looks into the future direction of semantic understanding technology based on current technological advances. We find that more in-depth research in specialized areas with small training data, breakthroughs in chip technology to drive computing power, and innovations in logical reasoning and evaluation methods are expected to open new paths in the theory and application of semantic expression and to further advance overall progress in NLP and AI.

Author Contributions

Conceptualization, P.J. and X.C.; methodology, P.J.; investigation, P.J.; resources, X.C.; data curation, P.J.; writing—original draft preparation, P.J.; writing—review and editing, P.J.; visualization, X.C.; supervision, X.C.; project administration, X.C.; funding acquisition, X.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the intelligent integrated media platform R&D and application demonstration project (PM21014X).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

  1. Pelletier, F.J. The Principle of Semantic Compositionality. Topoi 1994, 13, 11–24. [Google Scholar] [CrossRef]
  2. Guo, Y.N.; Lin, Z.Q.; Lou, J.G.; Zhang, D.M. Iterative Utterance Segmentation for Neural Semantic Parsing. In Proceedings of the 35th AAAI Conference on Artificial Intelligence/33rd Conference on Innovative Applications of Artificial Intelligence/11th Symposium on Educational Advances in Artificial Intelligence, Online, 2–9 February 2021; Volume 35, pp. 12937–12945. [Google Scholar]
  3. Jiang, J.; Liu, J.; Fu, J.; Zhu, X.X.; Li, Z.C.; Lu, H.Q. Global-Guided Selective Context Network for Scene Parsing. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 1752–1764. [Google Scholar] [CrossRef] [PubMed]
  4. Li, Z.; Qu, L.; Haffari, G. Context Dependent Semantic Parsing: A Survey. arXiv 2020, arXiv:2011.00797. [Google Scholar]
  5. Zettlemoyer, L.S.; Collins, M. Learning Context-Dependent Mappings from Sentences to Logical Form. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Suntec, Singapore, 2–7 August 2009; pp. 976–984. [Google Scholar]
  6. Krishnamurthy, J.; Dasigi, P.; Gardner, M. Neural Semantic Parsing with Type Constraints for Semi-Structured Tables. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017; pp. 1516–1526. [Google Scholar]
  7. Iyer, S.; Konstas, I.; Cheung, A.; Krishnamurthy, J.; Zettlemoyer, L. Learning a Neural Semantic Parser from User Feedback. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017; pp. 963–973. [Google Scholar]
  8. Dong, L.; Lapata, M. Language to Logical Form with Neural Attention. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, 7–12 August 2016; pp. 33–43. [Google Scholar]
  9. Guo, D.Y.; Tang, D.Y.; Duan, N.; Zhou, M.; Yin, J. Coupling Retrieval and Meta-Learning for Context-Dependent Semantic Parsing. In Proceedings of the 57th Annual Meeting of the Association-for-Computational-Linguistics (ACL), Florence, Italy, 28 July–2 August 2019; pp. 855–866. [Google Scholar]
  10. Guo, D.Y.; Tang, D.Y.; Duan, N.; Zhou, M.; Yin, J. Dialog-to-Action: Conversational Question Answering Over a Large-Scale Knowledge Base. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NIPS), Montréal, QC, Canada, 3–8 December 2018; Volume 31. [Google Scholar]
  11. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  12. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. Bert: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  13. Joshi, M.; Chen, D.; Liu, Y.; Weld, D.S.; Zettlemoyer, L.; Levy, O. Spanbert: Improving Pre-Training by Representing and Predicting Spans. Trans. Assoc. Comput. Linguist. 2020, 8, 64–77. [Google Scholar] [CrossRef]
  14. Yang, Z.L.; Dai, Z.H.; Yang, Y.M.; Carbonell, J.; Salakhutdinov, R.; Le, Q.V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. arXiv 2019, arXiv:1906.08237. [Google Scholar]
  15. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. Roberta: A Robustly Optimized Bert Pretraining Approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
  16. Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. 2018. Available online: https://www.mikecaptain.com/resources/pdf/GPT-1.pdf (accessed on 21 May 2024).
  17. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language Models Are Unsupervised Multitask Learners. OpenAI Blog 2019, 1, 9. [Google Scholar]
  18. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A. Language Models Are Few-Shot Learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
  19. Achiam, J.; Adler, J.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S. GPT-4 Technical Report. arXiv 2023, arXiv:2303.08774. [Google Scholar]
  20. Ren, X.; Zhou, P.; Meng, X.; Huang, X.; Wang, Y.; Wang, W.; Li, P.; Zhang, X.; Podolskiy, A.; Arshinov, G. PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing. arXiv 2023, arXiv:2303.10845. [Google Scholar]
  21. Zhang, M. A Survey of Syntactic-Semantic Parsing Based on Constituent and Dependency Structures. Sci. China Technol. Sci. 2020, 63, 1898–1920. [Google Scholar] [CrossRef]
  22. Kumar, P.; Bedathur, S. A Survey on Semantic Parsing from the Perspective of Compositionality. arXiv 2020, arXiv:2009.14116. [Google Scholar]
  23. Lee, C.; Gottschlich, J.; Roth, D. Toward Code Generation: A Survey and Lessons from Semantic Parsing. arXiv 2021, arXiv:2105.03317. [Google Scholar]
  24. Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G.; the PRISMA Group. Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. Ann. Intern. Med. 2009, 151, 264–269. [Google Scholar] [CrossRef]
  25. Ogunleye, B.; Zakariyyah, K.I.; Ajao, O.; Olayinka, O.; Sharma, H. A Systematic Review of Generative AI for Teaching and Learning Practice. Educ. Sci. 2024, 14, 636. [Google Scholar] [CrossRef]
  26. Zawacki-Richter, O.; Marín, V.I.; Bond, M.; Gouverneur, F. Systematic Review of Research on Artificial Intelligence Applications in Higher Education—Where Are the Educators? Int. J. Educ. Technol. High. Educ. 2019, 16, 39. [Google Scholar] [CrossRef]
  27. McHugh, M.L. Interrater Reliability: The Kappa Statistic. Biochem. Medica 2012, 22, 276–282. [Google Scholar] [CrossRef]
  28. Zelle, J.M.; Mooney, R.J. Learning to Parse Database Queries Using Inductive Logic Programming. In Proceedings of the 13th National Conference on Artificial Intelligence (AAAI 96)/8th Conference on Innovative Applications of Artificial Intelligence (IAAI 96), Portland, OR, USA, 4–8 August 1996; pp. 1050–1055. [Google Scholar]
  29. Li, Z. Semantic Parsing in Limited Resource Conditions. arXiv 2023, arXiv:2309.07429. [Google Scholar]
  30. Hoque, M.N.; Ghai, B.; Kraus, K.; Elmqvist, N. Portrayal: Leveraging NLP and Visualization for Analyzing Fictional Characters. In Proceedings of the 2023 ACM Designing Interactive Systems Conference, Pittsburgh, PA, USA, 10–14 July 2023; pp. 74–94. [Google Scholar]
  31. Dou, L.; Gao, Y.; Pan, M.; Wang, D.; Che, W.; Zhan, D.; Lou, J.-G. MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 13–14 February 2023; Volume 37, pp. 12745–12753. [Google Scholar]
  32. Maulud, D.H.; Zeebaree, S.R.; Jacksi, K.; Sadeeq, M.A.M.; Sharif, K.H. State of Art for Semantic Analysis of Natural Language Processing. Qubahan Acad. J. 2021, 1, 21–28. [Google Scholar] [CrossRef]
  33. Zettlemoyer, L.; Collins, M. Online Learning of Relaxed CCG Grammars for Parsing to Logical Form. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, 28–30 June 2007; pp. 678–687. [Google Scholar]
  34. Zhang, X.; Le Roux, J.; Charnois, T. Higher-Order Dependency Parsing for Arc-Polynomial Score Functions via Gradient-Based Methods and Genetic Algorithm. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, Online, 20–23 November 2022; pp. 1158–1171. [Google Scholar]
  35. Ng, A.Y. Feature Selection, L1 vs. L2 Regularization, and Rotational Invariance. In Proceedings of the Twenty-First International Conference on Machine Learning, Banff, AB, Canada, 4–8 July 2004; p. 78. [Google Scholar]
  36. Peters, M.E.; Neumann, M.; Logan, R.L.; Schwartz, R.; Joshi, V.; Singh, S.; Smith, N.A. Knowledge Enhanced Contextual Word Representations. In Proceedings of the Conference on Empirical Methods in Natural Language Processing/9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 43–54. [Google Scholar]
  37. Glorot, X.; Anand, A.; Aygun, E.; Mourad, S.; Kohli, P.; Precup, D. Learning Representations of Logical Formulae Using Graph Neural Networks. In Proceedings of the Neural Information Processing Systems, Workshop on Graph Representation Learning, Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
  38. Liang, P.; Jordan, M.I.; Klein, D. Learning Dependency-Based Compositional Semantics. Comput. Linguist. 2013, 39, 389–446. [Google Scholar] [CrossRef]
  39. Jia, R.; Liang, P. Data Recombination for Neural Semantic Parsing. In Proceedings of the 54th Annual Meeting of the Association-for-Computational-Linguistics (ACL), Berlin, Germany, 7–12 August 2016; pp. 12–22. [Google Scholar]
  40. Petit, A.; Corro, C. On Graph-Based Reentrancy-Free Semantic Parsing. arXiv 2023, arXiv:2302.07679. [Google Scholar] [CrossRef]
  41. Zong, W.; Wu, F.; Chu, L.-K.; Sculli, D. A Discriminative and Semantic Feature Selection Method for Text Categorization. Int. J. Prod. Econ. 2015, 165, 215–222. [Google Scholar] [CrossRef]
  42. Li, Y.; Yang, M.; Zhang, Z. A Survey of Multi-View Representation Learning. IEEE Trans. Knowl. Data Eng. 2018, 31, 1863–1883. [Google Scholar] [CrossRef]
  43. Lei, S.; Yi, W.; Ying, C.; Ruibin, W. Review of Attention Mechanism in Natural Language Processing. Data Anal. Knowl. Discov. 2020, 4, 1–14. [Google Scholar]
  44. Pasupat, P.; Liang, P. Compositional Semantic Parsing on Semi-Structured Tables. In Proceedings of the 53rd Annual Meeting of the Association-for-Computational-Linguistics (ACL)/7th International Joint Conference on Natural Language Processing of the Asian-Federation-of-Natural-Language-Processing (IJCNLP), Beijing, China, 26–31 July 2015; pp. 1470–1480. [Google Scholar]
  45. Al Sharou, K.; Li, Z.; Specia, L. Towards a Better Understanding of Noise in Natural Language Processing. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), Online, 1–3 September 2021; pp. 53–62. [Google Scholar]
  46. Li, Z.; Haffari, G. Active Learning for Multilingual Semantic Parser. arXiv 2023, arXiv:2301.12920. [Google Scholar]
  47. Krishnamurthy, J.; Mitchell, T. Weakly Supervised Training of Semantic Parsers. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jeju Island, Republic of Korea, 12–14 July 2012; pp. 754–765. [Google Scholar]
  48. Yadav, R.K.; Jiao, L.; Granmo, O.-C.; Goodwin, M. Interpretability in Word Sense Disambiguation Using Tsetlin Machine. In Proceedings of the 13th International Conference on Agents and Artificial Intelligence, Vienna, Austria, 4–6 February 2021; pp. 402–409. [Google Scholar]
  49. Wang, X.; Sun, H.; Qi, Q.; Wang, J. SETNet: A Novel Semi-Supervised Approach for Semantic Parsing. In Proceedings of the 24th European Conference on Artificial Intelligence, Santiago de Compostela, Spain, 29 August–8 September 2020; IOS Press: Amsterdam, The Netherlands, 2020; pp. 2236–2243. [Google Scholar]
  50. Duffy, K.; Bhattamishra, S.; Blunsom, P. Structural Transfer Learning in NL-to-Bash Semantic Parsers. arXiv 2023, arXiv:2307.16795. [Google Scholar]
  51. Zhang, L.; Xie, X.; Xie, K.; Wang, Z.; Lu, Y.; Zhang, Y. An Efficient Log Parsing Algorithm Based on Heuristic Rules. In Proceedings of the Advanced Parallel Processing Technologies: 13th International Symposium, APPT 2019, Tianjin, China, 15–16 August 2019; pp. 123–134. [Google Scholar]
  52. Amershi, S.; Cakmak, M.; Knox, W.B.; Kulesza, T. Power to the People: The Role of Humans in Interactive Machine Learning. AI Mag. 2014, 35, 105–120. [Google Scholar] [CrossRef]
  53. Clark, K.; Manning, C.D. Improving Coreference Resolution by Learning Entity-Level Distributed Representations. In Proceedings of the 54th Annual Meeting of the Association-for-Computational-Linguistics (ACL), Berlin, Germany, 7–12 August 2016; Association of Computational Linguistics—ACL: Stroudsburg, PA, USA, 2016; pp. 643–653. [Google Scholar]
  54. Fu, B.; Qiu, Y.; Tang, C.; Li, Y.; Yu, H.; Sun, J. A Survey on Complex Question Answering over Knowledge Base: Recent Advances and Challenges. arXiv 2020, arXiv:2007.13069. [Google Scholar]
  55. Chen, Y.; Das, M. An Automated Technique for Image Noise Identification Using a Simple Pattern Classification Approach. In Proceedings of the 2007 50th Midwest Symposium on Circuits and Systems, Montreal, QC, Canada, 5–8 August 2007; pp. 819–822. [Google Scholar]
  56. Wang, Y.S.; Berant, J.; Liang, P. Building a Semantic Parser Overnight. In Proceedings of the 53rd Annual Meeting of the Association-for-Computational-Linguistics (ACL)/7th International Joint Conference on Natural Language Processing of the Asian-Federation-of-Natural-Language-Processing (IJCNLP), Beijing, China, 26–31 July 2015; pp. 1332–1342. [Google Scholar]
  57. Bai, J.; Liu, X.; Wang, W.; Luo, C.; Song, Y. Complex Query Answering on Eventuality Knowledge Graph with Implicit Logical Constraints. arXiv 2023, arXiv:2305.19068. [Google Scholar]
  58. Landauer, T. Latent Semantic Analysis: Theory, Method and Application. In Computer Support for Collaborative Learning; Routledge: Abingdon, UK, 2023; pp. 742–743. [Google Scholar]
  59. Laukaitis, A.; Ostasius, E.; Plikynas, D. Deep Semantic Parsing with Upper Ontologies. Appl. Sci. 2021, 11, 9423. [Google Scholar] [CrossRef]
  60. Hsu, W.N.; Zhang, Y.; Glass, J. Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
  61. Zhang, S.; Jafari, O.; Nagarkar, P. A Survey on Machine Learning Techniques for Auto Labeling of Video, Audio, and Text Data. arXiv 2021, arXiv:2109.03784. [Google Scholar]
  62. Abu-Salih, B. Domain-Specific Knowledge Graphs: A Survey. J. Netw. Comput. Appl. 2021, 185, 103076. [Google Scholar] [CrossRef]
  63. Jiao, X.; Yin, Y.; Shang, L.; Jiang, X.; Chen, X.; Li, L.; Wang, F.; Liu, Q. Tinybert: Distilling Bert for Natural Language Understanding. arXiv 2019, arXiv:1909.10351. [Google Scholar]
  64. Jain, P.; Lapata, M. Memory-Based Semantic Parsing. Trans. Assoc. Comput. Linguist. 2021, 9, 1197–1212. [Google Scholar] [CrossRef]
  65. Goyal, P.; Dollár, P.; Girshick, R.; Noordhuis, P.; Wesolowski, L.; Kyrola, A.; Tulloch, A.; Jia, Y.; He, K. Accurate, Large Minibatch Sgd: Training Imagenet in 1 Hour. arXiv 2017, arXiv:1706.02677. [Google Scholar]
  66. Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q.V. Learning Transferable Architectures for Scalable Image Recognition. In Proceedings of the 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 8697–8710. [Google Scholar]
  67. Ribeiro, M.T.; Singh, S.; Guestrin, C. Model-Agnostic Interpretability of Machine Learning. arXiv 2016, arXiv:1606.05386. [Google Scholar]
  68. Kuznetsov, M.; Firsov, G. Syntax Error Search Using Parser Combinators. In Proceedings of the IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (ElConRus), St. Petersburg/Moscow, Russia, 26–28 January 2021; pp. 490–493. [Google Scholar]
  69. Vilares, D.; Gómez-Rodríguez, C. Transition-Based Parsing with Lighter Feed-Forward Networks. arXiv 2018, arXiv:1810.08997. [Google Scholar]
  70. Gomes, I.; Morgado, P.; Gomes, T.; Moreira, R. An Overview on the Static Code Analysis Approach in Software Development; Faculdade de Engenharia da Universidade do Porto: Porto, Portugal, 2009; Volume 16. [Google Scholar]
  71. Chai, L.; Xiao, D.; Yan, Z.; Yang, J.; Yang, L.; Zhang, Q.-W.; Cao, Y.; Li, Z. QURG: Question Rewriting Guided Context-Dependent Text-to-SQL Semantic Parsing. In Proceedings of the 20th Pacific Rim International Conference on Artificial Intelligence, Jakarta, Indonesia, 15–19 November 2023; pp. 275–286. [Google Scholar]
  72. Yıldırım, M.; Okay, F.Y.; Özdemir, S. Big Data Analytics for Default Prediction Using Graph Theory. Expert Syst. Appl. 2021, 176, 114840. [Google Scholar] [CrossRef]
  73. Zhang, F.; Peng, M.; Shen, Y.; Wu, Q. Hierarchical Features Extraction and Data Reorganization for Code Search. J. Syst. Softw. 2024, 208, 111896. [Google Scholar] [CrossRef]
  74. Mahmoudi, O.; Bouami, M.F. RNN and LSTM Models for Arabic Speech Commands Recognition Using PyTorch and GPU. In Proceedings of the International Conference on Artificial Intelligence & Industrial Applications, Meknes, Morocco, 17–18 February 2023; pp. 462–470. [Google Scholar]
  75. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation. arXiv 2014, arXiv:1406.1078. [Google Scholar]
  76. Huang, Z.; Xu, W.; Yu, K. Bidirectional LSTM-CRF Models for Sequence Tagging. arXiv 2015, arXiv:1508.01991. [Google Scholar]
  77. Luong, M.-T.; Pham, H.; Manning, C.D. Effective Approaches to Attention-Based Neural Machine Translation. arXiv 2015, arXiv:1508.04025. [Google Scholar]
  78. Pluščec, D.; Šnajder, J. Data Augmentation for Neural NLP. arXiv 2023, arXiv:2302.11412. [Google Scholar]
  79. Merity, S.; Keskar, N.S.; Socher, R. Regularizing and Optimizing LSTM Language Models. arXiv 2017, arXiv:1708.02182. [Google Scholar]
  80. Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv 2014, arXiv:1409.0473. [Google Scholar]
  81. Dhingra, B.; Liu, H.X.; Yang, Z.L.; Cohen, W.W.; Salakhutdinov, R. Gated-Attention Readers for Text Comprehension. In Proceedings of the 55th Annual Meeting of the Association-for-Computational-Linguistics (ACL), Vancouver, BC, Canada, 30 July–4 August 2017; pp. 1832–1846. [Google Scholar]
  82. Zhao, N.; Li, H.; Wu, Y.; He, X. JDDC 2.1: A Multimodal Chinese Dialogue Dataset with Joint Tasks of Query Rewriting, Response Generation, Discourse Parsing, and Summarization. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 12037–12051. [Google Scholar]
  83. Bilal, M.; Ali, G.; Iqbal, M.W.; Anwar, M.; Malik, M.S.A.; Kadir, R.A. Auto-Prep: Efficient and Automated Data Preprocessing Pipeline. IEEE Access 2022, 10, 107764–107784. [Google Scholar] [CrossRef]
  84. Andreas, J. Good-Enough Compositional Data Augmentation. arXiv 2019, arXiv:1904.09545. [Google Scholar]
  85. Li, W.; Srihari, R.K.; Niu, C.; Li, X. Question Answering on a Case Insensitive Corpus. In Proceedings of the ACL 2003 Workshop on Multilingual Summarization and Question Answering, Sapporo, Japan, 11 July 2003; pp. 84–93. [Google Scholar]
  86. Kakkar, V.; Sharma, C.; Pande, M.; Kumar, S. Search Query Spell Correction with Weak Supervision in E-Commerce. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Toronto, ON, Canada, 9–14 July 2023; Volume 5, pp. 687–694. [Google Scholar]
  87. Nurcahyawati, V.; Mustaffa, Z. Improving Sentiment Reviews Classification Performance Using Support Vector Machine-Fuzzy Matching Algorithm. Bull. Electr. Eng. Inform. 2023, 12, 1817–1824. [Google Scholar] [CrossRef]
  88. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  89. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  90. Pascanu, R.; Gulcehre, C.; Cho, K.; Bengio, Y. How to Construct Deep Recurrent Neural Networks. arXiv 2013, arXiv:1312.6026. [Google Scholar]
  91. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  92. Sun, R.-Y. Optimization for Deep Learning: An Overview. J. Oper. Res. Soc. China 2020, 8, 249–294. [Google Scholar] [CrossRef]
  93. Liu, Q.; Chen, B.; Guo, J.; Lou, J.-G.; Zhou, B.; Zhang, D. How Far Are We from Effective Context Modeling? An Exploratory Study on Semantic Parsing in Context. arXiv 2020, arXiv:2002.00652. [Google Scholar]
  94. Azizi, S.; Mustafa, B.; Ryan, F.; Beaver, Z.; Freyberg, J.; Deaton, J.; Loh, A.; Karthikesalingam, A.; Kornblith, S.; Chen, T.; et al. Big Self-Supervised Models Advance Medical Image Classification. In Proceedings of the 18th IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 3458–3468. [Google Scholar]
  95. Vig, J.; Madani, A.; Varshney, L.R.; Xiong, C.; Socher, R.; Rajani, N.F. Bertology Meets Biology: Interpreting Attention in Protein Language Models. arXiv 2020, arXiv:2006.15222. [Google Scholar]
  96. Zafar, A.; Anwar, B. Analysis of Semantic and Syntactic Properties of Urdu Verb by Using Machine Learning. Pak. J. Soc. Sci. 2023, 43, 103–122. [Google Scholar]
  97. Huang, C.Y.; Yang, W.; Cao, Y.S.; Zaiane, O.; Mou, L.L. A Globally Normalized Neural Model for Semantic Parsing. In Proceedings of the 5th Workshop on Structured Prediction for NLP (SPNLP)/5th Workshop on Online Abuse and Harms (WOAH), Online, 6 August 2021; pp. 61–66. [Google Scholar]
  98. Dyer, C.; Ballesteros, M.; Ling, W.; Matthews, A.; Smith, N.A. Transition-Based Dependency Parsing with Stack Long Short-Term Memory. arXiv 2015, arXiv:1505.08075. [Google Scholar]
  99. Iman, M.; Arabnia, H.R.; Rasheed, K. A Review of Deep Transfer Learning and Recent Advancements. Technologies 2023, 11, 40. [Google Scholar] [CrossRef]
  100. Pellicer, L.F.A.O.; Ferreira, T.M.; Costa, A.H.R. Data Augmentation Techniques in Natural Language Processing. Appl. Soft. Comput. 2023, 132, 109803. [Google Scholar] [CrossRef]
  101. Zhang, R.; Yu, T.; Er, H.Y.; Shim, S.; Xue, E.R.; Lin, X.V.; Shi, T.Z.; Xiong, C.M.; Socher, R.; Radev, D.; et al. Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing/9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 5338–5349. [Google Scholar]
  102. Fang, M.; Peng, S.; Liang, Y.; Hung, C.-C.; Liu, S. A Multimodal Fusion Model with Multi-Level Attention Mechanism for Depression Detection. Biomed. Signal Process. Control 2023, 82, 104561. [Google Scholar] [CrossRef]
  103. Yang, G.; Liu, S.; Li, Y.; He, L. Short-Term Prediction Method of Blood Glucose Based on Temporal Multi-Head Attention Mechanism for Diabetic Patients. Biomed. Signal Process. Control 2023, 82, 104552. [Google Scholar] [CrossRef]
  104. Sankar, C.; Subramanian, S.; Pal, C.; Chandar, S.; Bengio, Y. Do Neural Dialog Systems Use the Conversation History Effectively? An Empirical Study. arXiv 2019, arXiv:1906.01603. [Google Scholar]
  105. Crouse, M.; Kapanipathi, P.; Chaudhury, S.; Naseem, T.; Astudillo, R.; Fokoue, A.; Klinger, T. Laziness Is a Virtue When It Comes to Compositionality in Neural Semantic Parsing. arXiv 2023, arXiv:2305.04346. [Google Scholar]
  106. Suhr, A.; Iyer, S.; Artzi, Y. Learning to Map Context-Dependent Sentences to Executable Formal Queries. arXiv 2018, arXiv:1804.06868. [Google Scholar]
  107. Sun, Y.B.; Tang, D.Y.; Xu, J.J.; Duan, N.; Feng, X.C.; Qin, B.; Liu, T.; Zhou, M. Knowledge-Aware Conversational Semantic Parsing over Web Tables. In Proceedings of the 8th CCF International Conference on Natural Language Processing and Chinese Computing (NLPCC), Dunhuang, China, 9–14 October 2019; Volume 11838, pp. 827–839. [Google Scholar]
  108. Wang, P.; Zhang, J.; Lian, X.; Lu, L. Stacked Recurrent Neural Network Based High Precision Pointing Coupled Control of the Spacecraft and Telescopes. Adv. Space Res. 2023, 71, 692–704. [Google Scholar] [CrossRef]
  109. Neill, J.O. An Overview of Neural Network Compression. arXiv 2020, arXiv:2006.03669. [Google Scholar]
  110. Hayati, S.A.; Olivier, R.; Avvaru, P.; Yin, P.C.; Tomasic, A.; Neubig, G. Retrieval-Based Neural Code Generation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Brussels, Belgium, 31 October–4 November 2018; pp. 925–930. [Google Scholar]
  111. Aghdam, S.N.; Hossayni, S.A.; Sadeh, E.K.; Khozouei, N.; Bidgoli, B.M. Persian Semantic Role Labeling Using Transfer Learning and BERT-Based Models. arXiv 2023, arXiv:2306.10339. [Google Scholar]
  112. Sherborne, T.; Lapata, M. Meta-Learning a Cross-Lingual Manifold for Semantic Parsing. Trans. Assoc. Comput. Linguist. 2023, 11, 49–67. [Google Scholar] [CrossRef]
  113. Ang, R.J. Rule-Based and Machine Learning Approaches to AI. Can. J. Nurs. Inform. 2023, 18, 2. [Google Scholar]
  114. Li, R.H.; Cheng, L.L.; Wang, D.P.; Tan, J.M. Siamese BERT Architecture Model with Attention Mechanism for Textual Semantic Similarity. Multimed. Tools Appl. 2023, 22, 46673–46694. [Google Scholar] [CrossRef]
  115. Evtikhiev, M.; Bogomolov, E.; Sokolov, Y.; Bryksin, T. Out of the Bleu: How Should We Assess Quality of the Code Generation Models? J. Syst. Softw. 2023, 203, 111741. [Google Scholar] [CrossRef]
  116. Holtzman, A.; Buys, J.; Du, L.; Forbes, M.; Choi, Y. The Curious Case of Neural Text Degeneration. arXiv 2019, arXiv:1904.09751. [Google Scholar]
  117. Zhang, K.; Wang, W.; Zhang, H.; Li, G.; Jin, Z. Learning to Represent Programs with Heterogeneous Graphs. In Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, Pittsburgh, PA, USA, 16–17 May 2022; pp. 378–389. [Google Scholar]
  118. Feng, Z.; Guo, D.; Tang, D.; Duan, N.; Feng, X.; Gong, M.; Shou, L.; Qin, B.; Liu, T.; Jiang, D. Codebert: A Pre-Trained Model for Programming and Natural Languages. arXiv 2020, arXiv:2002.08155. [Google Scholar]
  119. Stork, C.H.; Haldar, V. Compressed Abstract Syntax Trees for Mobile Code. In Proceedings of the Workshop on Intermediate Representation Engineering, Trinity College, Dublin, Ireland, 13–14 June 2002. [Google Scholar]
  120. Vinyals, O.; Fortunato, M.; Jaitly, N. Pointer Networks. Adv. Neural Inf. Process. Syst. 2015, 28, 2692–2700. [Google Scholar]
  121. Phyu, H.P.; Naboulsi, D.; Stanica, R. Machine Learning in Network Slicing—A Survey. IEEE Access 2023, 11, 39123–39153. [Google Scholar] [CrossRef]
  122. Kitaev, N.; Kaiser, Ł.; Levskaya, A. Reformer: The Efficient Transformer. arXiv 2020, arXiv:2001.04451. [Google Scholar]
  123. Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. Bart: Denoising Sequence-to-Sequence Pre-Training for Natural Language Generation, Translation, and Comprehension. arXiv 2019, arXiv:1910.13461. [Google Scholar]
  124. Freitag, M.; Al-Onaizan, Y. Beam Search Strategies for Neural Machine Translation. arXiv 2017, arXiv:1702.01806. [Google Scholar]
  125. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
  126. Yang, W.; Xie, Y.; Tan, L.; Xiong, K.; Li, M.; Lin, J. Data Augmentation for Bert Fine-Tuning in Open-Domain Question Answering. arXiv 2019, arXiv:1904.06652. [Google Scholar]
  127. Chetlur, S.; Woolley, C.; Vandermersch, P.; Cohen, J.; Tran, J.; Catanzaro, B.; Shelhamer, E. Cudnn: Efficient Primitives for Deep Learning. arXiv 2014, arXiv:1410.0759. [Google Scholar]
  128. Abadi, M.; Barham, P.; Chen, J.; Chen, Z. Adaptive Dropout: A Novel Regularization Technique for Deep Neural Networks. Available online: http://www.arxivgen.com/pdfs/adaptive_dropout_a_n-3p5.pdf (accessed on 21 May 2024).
  129. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
  130. Han, S.; Mao, H.; Dally, W.J. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. arXiv 2015, arXiv:1510.00149. [Google Scholar]
  131. Pascanu, R.; Mikolov, T.; Bengio, Y. On the Difficulty of Training Recurrent Neural Networks. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 1310–1318. [Google Scholar]
  132. Cambria, E.; White, B. Jumping NLP Curves: A Review of Natural Language Processing Research. IEEE Comput. Intell. M 2014, 9, 48–57. [Google Scholar] [CrossRef]
  133. Huang, Y.P.; Cheng, Y.L.; Bapna, A.; Firat, O.; Chen, M.X.; Chen, D.H.; Lee, H.; Ngiam, J.; Le, Q.V.; Wu, Y.H.; et al. GPipe: Efficient Training of Giant Neural Networks Using Pipeline Parallelism. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 8–14 December 2019; Volume 32. [Google Scholar]
  134. Zhang, X.; Bouma, G.; Bos, J. Neural Semantic Parsing with Extremely Rich Symbolic Meaning Representations. arXiv 2024, arXiv:2404.12698. [Google Scholar]
  135. Liu, X.; Lu, Z.; Mou, L. Weakly Supervised Reasoning by Neuro-Symbolic Approaches. arXiv 2023, arXiv:2309.13072, 665–692. [Google Scholar]
  136. Li, Z.; Huang, Y.; Li, Z.; Yao, Y.; Xu, J.; Chen, T.; Ma, X.; Lu, J. Neuro-Symbolic Learning Yielding Logical Constraints. Adv. Neural Inf. Process. Syst. 2024, 36. [Google Scholar]
  137. Roberts, K.; Patra, B.G. A Semantic Parsing Method for Mapping Clinical Questions to Logical Forms. In Proceedings of the AMIA Annual Symposium, Washington, DC, USA, 4–8 November 2017; Volume 2017, p. 1478. [Google Scholar]
  138. Espejel, J.L.; Alassan, M.S.Y.; Chouham, E.M.; Dahhane, W.; Ettifouri, E.H. A Comprehensive Review of State-of-the-Art Methods for Java Code Generation from Natural Language Text. Nat. Lang. Process. J. 2023, 3, 100013. [Google Scholar] [CrossRef]
  139. Shin, J.; Nam, J. A Survey of Automatic Code Generation from Natural Language. J. Inf. Process Syst. 2021, 17, 537–555. [Google Scholar]
  140. Qin, B.; Hui, B.; Wang, L.; Yang, M.; Li, J.; Li, B.; Geng, R.; Cao, R.; Sun, J.; Si, L. A Survey on Text-to-Sql Parsing: Concepts, Methods, and Future Directions. arXiv 2022, arXiv:2208.13629. [Google Scholar]
  141. Deng, N.; Chen, Y.; Zhang, Y. Recent Advances in Text-to-SQL: A Survey of What We Have and What We Expect. arXiv 2022, arXiv:2208.10099. [Google Scholar]
  142. Ahkouk, K.; Mustapha, M.; Khadija, M.; Rachid, M. A Review of the Text to SQL Frameworks. In Proceedings of the 4th International Conference on Networking, Information Systems & Security, Kenitra, Morocco, 26 November 2021; pp. 1–6. [Google Scholar]
  143. Noor, S. Semantic Parsing for Knowledge Graph Question Answering. Int. J. Hum. Soc. 2024, 4, 33–45. [Google Scholar]
  144. Vougiouklis, P.; Papasarantopoulos, N.; Zheng, D.; Tuckey, D.; Diao, C.; Shen, Z.; Pan, J. FastRAT: Fast and Efficient Cross-Lingual Text-to-SQL Semantic Parsing. In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, Nusa Dua, Bali, 7 April 2023; Volume 1, pp. 564–576. [Google Scholar]
  145. Zhang, W.; Cheng, X.; Zhang, Y.; Yang, J.; Guo, H.; Li, Z.; Yin, X.; Guan, X.; Shi, X.; Zheng, L. ECLIPSE: Semantic Entropy-LCS for Cross-Lingual Industrial Log Parsing. arXiv 2024, arXiv:2405.13548. [Google Scholar]
  146. Yang, Y.; Wang, B.; Zhao, C. Deep Learning-Based Log Parsing for Monitoring Industrial ICT Systems. Appl. Sci. 2023, 13, 3691. [Google Scholar] [CrossRef]
  147. Yuan, W.; Yang, M.; Gu, H.; Xu, G. Natural Language Command Parsing for Agricultural Measurement and Control Based on AMR and Entity Recognition. J. Intell. Fuzzy Syst. 2024, 1–16. [Google Scholar] [CrossRef]
  148. Zheng, Y.Z.; Wang, H.B.; Dong, B.H.; Wang, X.J.; Li, C.S. HIE-SQL: History Information Enhanced Network for Context-Dependent Text-to-SQL Semantic Parsing. In Proceedings of the 60th Annual Meeting of the Association-for-Computational-Linguistics (ACL), Dublin, Ireland, 22–27 May 2022; pp. 2997–3007. [Google Scholar]
  149. Reiter, E. A Structured Review of the Validity of BLEU. Comput. Linguist. 2018, 44, 393–401. [Google Scholar] [CrossRef]
  150. OpenAI. Introducing GPT-4o: Our Fastest and Most Affordable Flagship Model. 2024. Available online: https://platform.openai.com/docs/guides/vision (accessed on 14 June 2024).
  151. Fakoor, R.; Chaudhari, P.; Soatto, S.; Smola, A.J. Meta-q-Learning. arXiv 2019, arXiv:1910.00125. [Google Scholar]
  152. Gao, Y.F.; Zhu, H.H.; Ng, P.; dos Santos, C.N.; Wang, Z.G.; Nan, F.; Zhang, D.J.; Nallapati, R.; Arnold, A.O.; Xiang, B.; et al. Answering Ambiguous Questions through Generative Evidence Fusion and Round-Trip Prediction. In Proceedings of the Joint Conference of 59th Annual Meeting of the Association-for-Computational-Linguistics (ACL)/11th International Joint Conference on Natural Language Processing (IJCNLP)/6th Workshop on Representation Learning for NLP (RepL4NLP), Online, 1–6 August 2021; pp. 3263–3276. [Google Scholar]
Figure 1. PRISMA literature search flowchart.
Figure 2. Graph of search sources versus number of papers.
Figure 3. Graph of number of papers vs. time.
Figure 4. Classification map of semantic parsing methods.
Figure 5. Classification map of semantic parsing methods.
Figure 6. Semantic representation based on neural networks.
Figure 7. Semantic parsing based on the symbolic method.
Figure 8. Semantic parsing based on neural networks.
Figure 9. Seq2Seq semantic parsing model using contextual form.
Figure 10. Symbolic neural network semantic parsing method.
Table 1. Comparison among leading studies.
Author | Based on a Large Language Model for Semantic Parsing | The Key Ideas and Methods of the Semantic Parsing Approach Are Outlined and Analyzed in Detail | Detailed Overview and Analysis of the Problems and Solutions of the Semantic Parsing Approach | Detailed Overview and Analysis of the Logical Trajectories of Typical Semantic Parsing Approaches | Performance Comparison of Semantic Parsing Models for Different Tasks Using Different Metrics | Overview of the Semantic Parsing Model API | Overview and Analysis of Semantic Parsing Model Datasets and Evaluation Metrics | Application of Semantic Parsing Models to Specific Domains | Main Research Work
Zhang M (2020) [21] | × × × × × × × | This paper surveys progress in syntactic and semantic parsing in NLP, including parsing techniques and cross-domain and cross-lingual models, and discusses parser applications and corpus development for research guidance.
Kumar P, et al. (2020) [22] | × × × × × × × | This paper delves into semantic parsing, examining the syntactic formation of meaning, lexical diversity, formal grammars such as CCG, and semantic combination methods with λ-calculus and λDCS, assessing logical-parser performance and its capacity to handle complex issues using benchmark data.
Lee C, et al. (2021) [23] | × × × × × × × × | This review covers semantic parsing techniques, program synthesis comparison, evolutionary trends, neuro-symbolic methods, supervised applications, code generation progress, challenges, and future research directions.
Ours | This study deeply analyzes semantic parsing, assesses models, surveys applications, addresses challenges, and points to future research for field advancement.
√ indicates the presence of this element; × indicates the absence of this element.
Table 2. Literature exclusion and inclusion criteria.
Classifications | Standard Description
Inclusion criteria | Domain relevance: The literature must belong to the field of NLP, with a particular focus on semantic parsing techniques.
Keyword matching: The document must contain the keywords “semantic parsing”, “semiotic approach”, “neural approach”, or “semiotic neural approach”, or their relevant variants (a minimal keyword-screening sketch follows this table).
High quality and widely recognized: The literature should come from authoritative, comprehensive, and resource-rich databases, such as Web of Science, Engineering Village, IEEE, and SpringerLink, as well as from widely recognized, high-impact literature found through Google Scholar.
Full-text accessibility: The literature must be accessible in full text for in-depth review and analysis.
Exclusion criteria | Unrelated fields: Literature that does not belong to the field of NLP or does not focus on semantic parsing techniques will be excluded.
Keyword mismatch: Documents that do not contain the keywords “semantic parsing”, “semiotic”, “neural”, or “semiotic neural”, or their related variants, will be excluded.
Lower quality or not widely recognized: Literature from unknown or unreliable sources, or literature that is not widely recognized, will be excluded.
Unavailability of full text: If the full text of a document is not available, it cannot be reviewed and analyzed in depth and will therefore be excluded.
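To make the keyword-matching criterion in Table 2 concrete, the following minimal Python sketch filters literature records by the inclusion keywords. The record structure, field names, and sample entries are illustrative assumptions; the actual screening additionally involved manual checks of domain relevance, source quality, and full-text availability.
```python
# Minimal sketch of keyword-based screening (illustrative assumptions: records
# are dicts with "title" and "abstract" fields; the real screening was manual).
KEYWORDS = [
    "semantic parsing",
    "semiotic approach",
    "neural approach",
    "semiotic neural approach",
]

def matches_keywords(record: dict) -> bool:
    """True if the record mentions any inclusion keyword (case-insensitive)."""
    text = f"{record.get('title', '')} {record.get('abstract', '')}".lower()
    return any(keyword in text for keyword in KEYWORDS)

papers = [
    {"title": "A Neural Approach to Semantic Parsing", "abstract": "..."},
    {"title": "Image Denoising with CNNs", "abstract": "no related terms"},
]
included = [p for p in papers if matches_keywords(p)]
print([p["title"] for p in included])  # only the first record passes
```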
Table 3. Overview of key ideas, approaches, key problems, and solutions of traditional semantic parsing methods.
Modeling/Methodology | Key Ideas | Approaches | Key Problems | Solutions
Parsing Database Queries Using Inductive Logic Programming [28] | Using inductive logic programming to automate the parsing of natural language database queries into SQL statements | CHILL learns search and relational control with inductive logic programming to build parsers: it trains on a corpus of sentences paired with database queries and maps sentences directly to executable queries (a minimal rule-based sketch follows this table). | (1) Dataset size limitation problem
(2) Grammar rule readability problem
(3) Generalization bias problem
(4) Restriction problem for natural language queries
Remote supervision, transfer learning, knowledge transfer or active, continuous and meta-learning techniques [29] are utilized, combined with visual and interactive methods [30], techniques such as cross-validation [31] are employed to reduce bias, and techniques such as contextual and semantic analysis [32] are introduced to enhance natural language understanding.
Online Learning Algorithms and Representations of Logical Forms [33] | The model uses an online learning algorithm to learn a Relaxed CCG that parses natural language sentences into logical form. | A gradient-descent-based online learning algorithm trains Relaxed CCG models and transforms CCG analysis results into logical forms to support the inference process. | The gradient-descent-based online learning algorithm risks falling into a local optimum, the method remains limited in processing complex sentences, and better logical-form representations still need to be explored. | Explore robust optimization algorithms (e.g., natural gradient, genetic algorithms) [34] with regularization techniques [35] to prevent overfitting; use CCG grammars combined with techniques such as word vectors and attention mechanisms to process complex sentences, introducing external knowledge to enhance inference [36]; and research efficient logical-form representations such as graph-based neural networks [37], integrating a variety of representations to adapt to different scenarios.
Dependency-based combinatorial semantics (DCS) semantic parsing model [38] | Using dependency structures for combinatorial semantic representation | (1) Syntactic analysis
(2) Semantic type and operator assignment
(3) Local combinatorial operations
Errors in dependency syntactic analysis, challenges in complex semantic reasoning, information loss from simplification in the DCS model, and accumulation of local combinatorial operation errors with missing global information all affect the performance and accuracy of the method. | More robust dependency syntactic analysis and integration methods are used to reduce errors [39]; graphical models or event semantic representations are explored to deal with complex semantics [40]; semantic representations are augmented to improve model expressiveness and generalization through semantic feature selection [41] and multi-view representation learning [42]; and global combination and attention mechanisms [43] are introduced to deal with long-distance dependencies.
Combinatorial Semantic Parsing of Semi-Structured Tables [44] | Increases both the breadth of knowledge sources and the depth of semantic parsing | Converts semi-structured tables to knowledge graphs, parses natural language with the graph information, and selects the highest-scoring logical form for execution to obtain answers. | (1) Data volume limitation
(2) Form noise processing
(3) Multi-language limitation
(1) Data enhancement [12]
(2) Noise processing model [45]
(3) Multi-language support [46]
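To illustrate the compositional, rule-driven style shared by the traditional methods in Table 3, the following minimal Python sketch maps Geoquery-style questions to logical forms through hand-written pattern rules, with matched entity mentions filling the template slots. The rules and the logical notation are illustrative assumptions, not a reproduction of CHILL [28] or DCS [38]; note that any utterance outside the rule set fails to parse, which is precisely the generalization limitation recorded in the table.
```python
# Minimal sketch of rule-based compositional parsing (illustrative rules and
# Geoquery-style notation; not the CHILL system itself).
import re

# Each rule pairs a surface pattern with a logical-form template.
RULES = [
    (re.compile(r"what rivers run through (\w+)"),
     "answer(R, (river(R), traverses(R, {0})))"),
    (re.compile(r"what is the capital of (\w+)"),
     "answer(C, capital({0}, C))"),
]

def parse(utterance: str) -> str | None:
    """Return the logical form from the first matching rule, or None."""
    u = utterance.lower().rstrip("?")
    for pattern, template in RULES:
        match = pattern.fullmatch(u)
        if match:
            # Matched entity mentions become constants in the logical form.
            return template.format(*match.groups())
    return None  # uncovered utterances expose the generalization problem

print(parse("What rivers run through Texas?"))
# -> answer(R, (river(R), traverses(R, texas)))
```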
Table 4. Summary of key ideas, methods, problems, and solutions of the symbolic approach.
Modeling/Methodology | Key Ideas | Approaches | Key Problems | Solutions
LCDMSLF [5] | Converting sentences to logical forms via context-dependent mapping | Sentences are parsed using a CCG lexicon and linear models, hidden-variable perceptrons are optimized, approximate search evaluates the output, and correctness is determined by comparison with the gold standard (a minimal candidate-scoring sketch follows this table). | Knowledge representation limitations, ambiguity disambiguation difficulties, corpus dependency, complexity and efficiency issues, and unknown-vocabulary processing challenges. | Expand the knowledge base for broader linguistic coverage [47], employ contextual cues and machine learning for effective disambiguation [48], and enhance parser generalization with more annotations or advanced techniques like semi-supervised learning [49] and transfer learning [50]. Boost parser efficiency with optimized algorithms, heuristics, or parallel processing [51], and enrich lexical context with speculation and external resources.
Neural semantic parsing methods through feedback [7] | Training a neural semantic parser by interactive learning with users | A neural sequence model maps utterances directly to SQL, avoiding intermediate steps and enabling fast online deployment; user feedback and crowdsourced SQL annotations reduce workload and improve efficiency and accuracy. | Inconsistent user participation, manual interaction dependency, object limitations, insufficient generalization, noisy user feedback. | Enhance user engagement [52], minimize manual interaction costs [53], and expand application to diverse tasks such as customer questions and answers (Q&A) and dialog systems [54]. Improve model generalization through data augmentation and integrating prior knowledge [39], while filtering out noisy feedback using expert evaluation and machine learning classification [55].
Rapid construction of a semantic analysis method [56] | Building semantic parsers in new domains with zero training examples | A grammar generates logical forms paired with canonical natural language; parsers are trained on the collected paraphrases, and the impact and cross-domain effects are explored. | (1) Nested quantization problem
(2) Mimetic problems
Address nested quantization with complex queries [57], including subqueries, and enhance mimesis with detailed semantic analysis [58] using modifiers and logical operators.
Deep semantic parsing methods using upper ontologies [59] | Semantic parsing using FrameNet annotation and a BERT-based upper ontology for distributed representation of sentence contexts | Combining the WordNet ontology, PropBank role labelling, and a multi-source corpus, the model annotates semantic role sets for identifying physics-engine predicates and parameters in 3D VR simulations. | (1) High demand for training data
(2) Dependence on prior knowledge problem
(3) High demand for computational resources problem
(1) Optimizing annotation e.g., expanding the training dataset through techniques such as semi-supervised learning [60] and automatic annotation [61]
(2) Improving the acquisition of domain expertise [62].
(3) Choosing smaller models e.g., Tinybert [63]
A memory-based approach to semantic parsing [64] | Semantic parsing through memorized contextual semantic information | The model uses a memory matrix to store contextual semantics and a memory controller to manage access, combines a discourse-phrase encoder to turn inputs into vectors, and uses decoder attention to interact and produce parsed results. | (1) High consumption of computing resources
(2) Conversation history length limitation
(3) Data sparsity problem
(4) Poor interpretability problem
(1) Accelerated computation [65]
(2) Use architectures suited to long sequences, such as Transformers [11].
(3) Data enhancement and transfer learning [66].
(4) Explanatory enhancement [67].
Syntax error search methods using parser combinators [68] | Searching for syntax errors with parser combinators | Input code and choose a check mode: standard or preprocess. Standard mode catches AST errors; preprocess mode checks the syntax and builds the AST. | (1) Large memory consumption problem
(2) Can only detect syntax errors problem
(1) Optimization of memory space occupation [69]
(2) Static code analysis tools [70]
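The symbolic pipeline that recurs throughout Table 4, in which a grammar proposes candidate logical forms and a linear model over hand-crafted features selects the most probable one, can be sketched as follows. The features, weights, and candidates here are illustrative assumptions; systems such as LCDMSLF [5] learn the weights from a corpus (e.g., with a hidden-variable perceptron) rather than setting them by hand.
```python
# Minimal sketch of linear-model scoring over candidate logical forms
# (illustrative features and weights; real weights are learned from data).
from collections import Counter

def features(utterance: str, logical_form: str) -> Counter:
    """Hand-crafted indicator features over (utterance, logical form) pairs."""
    feats = Counter()
    for word in utterance.lower().split():
        for predicate in ("flight", "city", "capital"):
            if predicate in logical_form:
                feats[f"co:{word}|{predicate}"] += 1  # word/predicate co-occurrence
    feats["size"] = logical_form.count("(")  # mild bias on logical-form size
    return feats

def score(weights: dict, feats: Counter) -> float:
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())

weights = {"co:flights|flight": 1.5, "co:boston|city": 1.0, "size": -0.1}
utterance = "show flights from boston"
candidates = ["lambda x (flight x)", "lambda x (city x)"]  # from a grammar
best = max(candidates, key=lambda lf: score(weights, features(utterance, lf)))
print(best)  # -> lambda x (flight x)
```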
Table 5. Overview of key ideas, approaches, key problems, and solutions of semantic parsing methods for neural networks.
Modeling/Methodology | Key Ideas | Approaches | Key Problems | Solutions
QURG [71] | Explicitly handle question and context dependencies to improve Text-to-SQL understanding and accurately generate SQL. | Question rewriting completion, constructing editing matrices, two-stream encoder processing of question context, linking natural language to databases, and generating SQL queries. | Rewrite accuracy affects SQL quality, encoder processing efficiency is limited, and parsing complex contexts is difficult. | Optimize rewriting models and datasets, improve encoder design for efficiency, and incorporate domain knowledge and state-of-the-art techniques for model robustness.
Data recombination for neural semantic parsing [39] | Data recombination | Input a dataset, generate new examples using a context-free grammar model, recombine samples, then train a Seq2Seq RNN model. | Data recombination faces problems of noise, dependency, increased training time, and application limitations. | (1) Controlling the quality of data recombination
(2) Determining the optimal parameters for data recombination [72]
(3) Optimizing the model training process [47]
(4) Exploring applications in other domains [73]
Attention-enhanced encoder–decoder semantic parsing model [8] | The attention-enhanced encoder–decoder encodes utterances as vectors and conditions on them to generate logical forms (a minimal sketch follows this table). | The encoder–decoder model incorporates an attention mechanism to convert linguistic sequences into logical form. | High consumption of computational resources, gradient problems, sequence-order sensitivity, difficulty with long-term dependencies, large data requirements, risk of overfitting, alignment requirements, poor interpretability. | Use GPU acceleration [74], alternative recurrent models (e.g., GRU [75], Transformer [11]), bidirectional LSTMs [76], optimized attention mechanisms [77], data augmentation [78], regularization [79], improved alignment methods [80], and interpretable attention mechanisms [81].
A neural semantic parsing approach for semi-structured tables with type constraints [6] | Specialized entity embedding with a linking module, raw link embedding, and a type-constrained grammar restricting decoder actions to preserve logical-form compliance. | Encoder–decoder recurrent networks, entity embedding links, type-constrained decoding, question–answer supervision, and training by marginal log-likelihood over enumerated logical forms. | The parser is limited to semi-structured tabular data, and data preprocessing is highly dependent on manual intervention. | (1) Extending the applicability of parsers [82]
(2) Automating the data preprocessing process [83]
GECA [84] | Generating new training samples by recognizing and substituting local phrases that occur in common contexts | The data augmentation protocol identifies substitutable fragments, uses co-occurrence as evidence, deletes them to form templates, and fills the templates to generate new samples. | Case sensitivity, symbols and punctuation, synonyms and near-synonyms, misspellings and typos, contextual variations | Unify the case of all input texts to eliminate case differences [85]; handle symbols and punctuation flexibly according to contextual needs; construct a glossary of synonyms and near-synonyms to extend the matching range; use the Transformer model for context-sensitive spelling correction [86]; and employ a fuzzy matching algorithm [87] to allow a certain degree of textual difference, improving the match’s inclusiveness and robustness.
Iterative utterance segmentation for neural semantic parsing [2] | A new framework that iteratively segments utterances to enhance the neural semantic parser. | Iterative segmentation extracts spans, the parser maps them to meaning representations, and these are integrated into the full meaning. | Utterance segmentation accuracy, computational efficiency, long dependencies, model capacity, and training stability challenges. | (1) Improve segmentation accuracy using advanced utterance-boundary recognition models, e.g., a pre-trained language model such as Generative Pre-trained Transformer 4 (GPT-4) [2].
(2) Introduce attention mechanisms [80], computational resource allocation, and parallel computing to reduce the computational burden.
(3) Use more complex RNN structures (e.g., LSTM [88]) to solve the long-dependency problem.
(4) Increase encoder–decoder capacity, deep network structures, and residual connections [89] to enhance expressive power.
(5) Use gradient clipping [90], other optimization algorithms [91], residual connections [89], and skip connections to deal with gradient explosion and vanishing [92].
Syntax-based decoding semantic parser [93] | Collaboration between context modelling and grammar-based decoding to improve semantic parsing performance. | Attentional sequence modelling, a question encoder with syntax-based decoding, and text-to-SQL conversion. | Limited contextual modelling capability, lack of common-sense and background knowledge, data sparsity and labelling difficulties, and lexical ambiguity and syntactic difficulties | (1) Using more sophisticated model architectures such as the Transformer [11].
(2) Introducing external knowledge bases [36] or pre-trained language models
(3) Using unsupervised learning [94] or semi-supervised learning methods [60] to utilize unlabeled data for pre-training.
(4) Introducing external lexical resources [95], syntactic analyzers, and other aids to help the model better handle lexical ambiguities and syntactic difficulties [96].
Globally normalized neural models for semantic parsing [97] | Jointly normalize the output space, considering inter-label role dependencies. | The TranX system is based on context-free grammars: neural networks encode the input, grammar rules are predicted autoregressively with normalized probabilities, and training maximizes the overall probability. | (1) High computational complexity
(2) Label-dependent modeling is difficult
(3) Limited structural expressiveness
(4) Data scarcity
Use approximation methods such as pruning and sampling to reduce computation; decompose dependencies through hierarchical modelling and construct models step by step [98]; introduce graph structures [40] or deep networks to enhance structural expressiveness; leverage existing knowledge with the help of transfer learning [99] and expand the training set by combining with data augmentation techniques [100], in order to effectively solve the problem of insufficient data.
Edit-based SQL query generation for cross-domain context-dependent questions [101] | Editing previously predicted queries to improve generation quality using the interaction history. | Utterance–table encoding processes the input, turn attention considers the history to enhance understanding, and table-aware decoding generates responses. | (1) The problem of incomplete utilization of contextual information.
(2) The problem of the decoder’s attention mechanism overly focusing on some of the inputs
(1) Use more powerful context encoders, such as Transformer-based models [11].
(2) Introduce multi-layer attention mechanisms [102], incorporate other attention variants such as multi-head attention [103], and add regularization [79] and control mechanisms.
Validating the ability of neural dialog systems to effectively utilize dialog history [104] | An empirical approach to testing model sensitivity and exploring the use of dialogue history information. | Empirically investigating model sensitivity to contextual perturbations and understanding dialogue history use. | Few categories of models used for testing | Conduct experiments with more classes of models, such as BERT [12], GPT-4 [19], and XLNet [14], and compare them to previous models.
A bottom-up approach to neural semantic parsing generation [105] | Bottom-up decoding, lazy extension to build candidates, and minimal cost to boost generalization. | The neural encoder parses the input and generates real-valued vectors; the neural decoder iterates to generate a graphical representation. | (1) The problem of unordered n-ary relations
(2) The problem of user annotation burden
(1) Dealing with unordered n-ary relations can use heuristic search, pruning techniques, optimization algorithms, parallel computing, optimized data structures, and domain knowledge.
(2) Treat semantic parsing as a text-to-text problem and use fine-tuned large-scale language models (e.g., GPT-4) to reduce the user annotation burden.
Learning to map context-dependent sentences to executable formal queries [106] | Mapping the interaction corpus to formal queries while considering historical information. | Attentional encoder–decoder models, combined with historical dialogue, generate queries and update the reusable context history to improve performance. | (1) Long-term dependency problems
(2) High computational resource requirements
(3) Lack of parallelism
(4) Loss of information
(1) Use more complex recurrent units
(2) Introduce an attention mechanism [80]
(3) Consider other model architectures, such as the Transformer [11], or combinations of architectures
(4) Context cutting and segmentation
Knowledge-aware conversational semantic parsing over web tables [107] | Improving semantic parsing performance by integrating various types of knowledge | Question encoding, table encoding, controller coordination, column–operator–value prediction, copy actions, and generation of parsed statements. | (1) Long dependency problems
(2) Limitations on expressiveness
(3) Sensitive to input order
(4) High number of parameters
The use of more powerful RNN units, introduction of attention mechanisms [80], use of stacked recurrent neural networks [108] (SRNN), parameter sharing, regularization techniques and pruning algorithms to reduce the number of parameters and decrease model complexity [109].
RECODE [110] | Subtree retrieval introduces code examples with explicit references to improve neural code generation performance. | Dynamic programming retrieves similar sentences, extracts AST subtrees, aligns modifications, and increases the decoding probability of retrieved pieces, enhancing code generation accuracy and efficiency. | Semantic understanding limitations, data sparsity and ambiguity, contextual incoherence, balancing precision and abstraction, language dependence, abstraction-degree limitations, complexity and redundancy, lack of contextual information, maintenance difficulties. | Use advanced semantic understanding techniques (semantic role labelling [111], semantic parsing [112]), combinations of rules and machine learning [113], attention [80] and context encoding [114], large-scale dataset training (the CoNaLa dataset [115]), manual review and feedback [116], flexible AST representations [97], semantic information fusion [117], contextual considerations [118], model compression and optimization [119], and dynamic update strategies.
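As a concrete reference point for the attention-enhanced encoder–decoder rows in Table 5, the following PyTorch sketch wires a recurrent encoder, dot-product attention, and a recurrent decoder into a sequence-to-sequence parser trained with teacher forcing. The vocabulary sizes, hidden dimension, and toy batch are illustrative assumptions, not the published architecture or hyperparameters of [8].
```python
# Minimal PyTorch sketch of an attention-enhanced Seq2Seq semantic parser
# (illustrative dimensions; not the published model of [8]).
import torch
import torch.nn as nn

class Seq2SeqParser(nn.Module):
    def __init__(self, src_vocab: int, tgt_vocab: int, dim: int = 64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTMCell(dim, dim)
        self.out = nn.Linear(2 * dim, tgt_vocab)  # [state; context] -> token

    def forward(self, src, tgt):
        enc_out, (h, c) = self.encoder(self.src_emb(src))        # (B, S, D)
        h, c = h.squeeze(0), c.squeeze(0)
        logits = []
        for t in range(tgt.size(1)):                             # teacher forcing
            h, c = self.decoder(self.tgt_emb(tgt[:, t]), (h, c))
            # Dot-product attention over the encoder states.
            attn = torch.softmax(enc_out @ h.unsqueeze(-1), dim=1)  # (B, S, 1)
            context = (attn * enc_out).sum(dim=1)                   # (B, D)
            logits.append(self.out(torch.cat([h, context], dim=-1)))
        return torch.stack(logits, dim=1)                        # (B, T, V)

# Toy usage: utterance token ids in, logical-form token ids out.
model = Seq2SeqParser(src_vocab=100, tgt_vocab=50)
src = torch.randint(0, 100, (2, 7))   # batch of 2 utterances, length 7
tgt = torch.randint(0, 50, (2, 5))    # gold logical forms, length 5
logits = model(src, tgt)
loss = nn.CrossEntropyLoss()(logits[:, :-1].reshape(-1, 50), tgt[:, 1:].reshape(-1))
loss.backward()
```
Swapping the LSTM for a Transformer [11] or adding copy mechanisms [120] follows the same skeleton, which is why the solutions column above repeatedly points to such substitutions.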
Table 6. Overview of key ideas, approaches, key problems, and solutions in neural symbolic methods.
Modeling/Methodology | Key Ideas | Approaches | Key Problems | Solutions
A method for mapping statements in a dialog to logical forms [10] | Mapping dialogue statements into logical forms of sequential actions | Dialogue memory management with an encoder–decoder structure | Incomplete context symmetry, increased computational memory consumption, strong decoding dependency, limited sequence generation, error propagation, slow training, over-dependence on inputs, poor generalization. | Model performance and generation quality are comprehensively improved by introducing an attention mechanism [80], using lightweight RNN variants (simplified GRU or LightGRU [121]), optimizing the decoder [122], fusing prior knowledge [123], beam search [124], label smoothing [125], and increasing training data [126].
Contextual semantic parsing model supported by retrieved data [9]. Key idea: retrieve data points as evidence for contextual semantic parsing. Approach: (1) a retriever; (2) a meta-learner. Key problems: training is time- and memory-consuming, prone to overfitting, poorly interpretable, subject to gradient problems and sequence-length limits, and slow. Solutions: enhance efficiency with faster hardware [127], model downsizing, and lightweight RNNs [121]; apply regularization [128], dropout [129], data augmentation [100], and model compression [130] for better generalization; simplify the model with visualization and clear structures; stabilize gradients with clipping [131] and GRUs [75]; leverage attention [80], Transformers [11], and BERT [12] for long sequences; and accelerate training with batching [132] and parallelism [133] (a sketch combining several of these remedies follows this table).
Neural semantic parsing with extremely rich symbolic meaning representation [134]. Key idea: introduce lexical ontologies and novel symbols to enhance the semantic richness and interpretability of neural parsers. Approach: design a neural taxonomic semantic parser whose novel symbols represent predicates, evaluated against traditional methods. Key problems: large language models are less effective than expected, hierarchical learning is under-explored, categorical encoding needs optimization, and the ontologies and similarity measures need extension. Solutions: (1) analyze the causes of model failure in depth and optimize the architecture and training strategy; (2) explore hierarchical learning methods; (3) simplify the classification encoding; (4) experiment with different ontologies and similarity measures.
Weakly Supervised Reasoning with Neuro-Symbolic Methods [135]. Key idea: fuse neural networks with symbolic logic to strengthen interpretable reasoning in NLP and open the deep learning black box. Approach: build neuro-symbolic systems that integrate explicit symbolic reasoning, optimized with reinforcement learning for weakly supervised reasoning in NLP. Key problems: training is expensive, reasoning patterns must be designed manually, and complex semantic relationships are handled inadequately. Solutions: (1) optimize the training algorithms; (2) explore techniques for automatically discovering reasoning patterns; (3) introduce more advanced semantic understanding and multimodal fusion methods.
Neuro-symbolic learning with generated logical constraints [136]. Key idea: integrate neural networks and symbols into a neuro-symbolic system for weakly supervised learning, enhancing AI capability. Approach: train networks in parallel under basic logical constraints, use difference-of-convex programming for accuracy, and apply trust regions to prevent degradation and stabilize neuro-symbolic learning. Key problems: end-to-end learning is complex and resource-intensive, and logical-constraint learning is prone to degradation. Solutions: (1) optimize the framework design; (2) explore efficient algorithms; (3) apply trust-region techniques together with language models to reduce cost, improve efficiency, prevent degradation, and advance neuro-symbolic learning.
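Several remedies recur across Table 6, notably attention [80], gated recurrent units, label smoothing [125], and gradient clipping. The PyTorch sketch below shows how they fit together in a single decoding step; all dimensions, tensors, and token ids are illustrative assumptions, not a reproduction of any surveyed model.

    # Sketch of recurring remedies from Table 6: one GRU decoder step with
    # dot-product attention over encoder states, label-smoothed cross-entropy,
    # and gradient clipping. All sizes and token ids are illustrative.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    hidden, vocab = 256, 1000
    encoder_states = torch.randn(1, 12, hidden)   # (batch, src_len, hidden)
    dec_hidden = torch.randn(1, hidden)           # current decoder state
    embed = nn.Embedding(vocab, hidden)
    gru = nn.GRUCell(2 * hidden, hidden)          # input = [token; context]
    out_proj = nn.Linear(2 * hidden, vocab)
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)  # label smoothing

    prev_token = torch.tensor([42])               # previously emitted token id
    gold_token = torch.tensor([7])                # gold next token id

    # Dot-product attention: weight encoder states by relevance to the decoder.
    scores = torch.bmm(encoder_states, dec_hidden.unsqueeze(2)).squeeze(2)
    context = torch.bmm(F.softmax(scores, dim=1).unsqueeze(1),
                        encoder_states).squeeze(1)

    # One GRU decoding step conditioned on the attended context.
    dec_hidden = gru(torch.cat([embed(prev_token), context], dim=1), dec_hidden)
    logits = out_proj(torch.cat([dec_hidden, context], dim=1))

    # Label-smoothed loss and gradient clipping to stabilize training.
    loss = criterion(logits, gold_token)
    loss.backward()
    params = list(gru.parameters()) + list(out_proj.parameters())
    torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)

A full decoder repeats this step token by token, feeding each prediction back as the next input.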
Table 7. Comparison of semantic parsing model performance on different tasks.

The metrics reported below are accuracy (Acc.), precision (Prec.), recall (Rec.), F1, and BLEU (a computational sketch of the core metrics follows the table).
LPDQUILP [28]: Acc. 84%. In the best experiments, the CHILL-induced parser, comprising 1100 lines of Prolog code, answered new queries with 84% accuracy.
Dynamic CCG Adaptation for Logical Parsing [33]: Single-pass parsing achieved Prec./Rec./F1 of 90.61%/81.92%/85.06% on the ATIS test set and 95.49%/83.2%/88.93% on the Geo880 test set.
DCS [38]: LJK11 with augmented triggers reached 87.6% accuracy on the GEO test set and answered 95% of questions correctly on the JOBS test set.
Semantic Parsing of Semi-Structured Tables [44]: The model achieves 37.1% accuracy on the WIKITABLEQUESTIONS test data, against 76.6% for the oracle.
LCDMSLF [5]: On the ATIS DEC94 test set, the model reached partial-match Prec./Rec./F1 of 95%/96.5%/95.7% and fully recovered 83.7% of the logical forms.
Neural Semantic Parsing Methods via Feedback [7]: The model achieved 79.24% accuracy on ATIS and 82.5% on Geo.
Rapidly building a semantic analysis method [56]: The parser achieves 56.4% accuracy on GEO880.
A memory-based approach to semantic parsing [64]: Using the Men-SnipCopy codec, the MenCE model reaches ATIS query, strict-representation, and relaxed-representation accuracies of 45.3%, 70.2%, and 69.8%, respectively. MenCE + Liu et al. (2020) [93] with Men-Grammar achieves question/interaction accuracies of 28.4%/6.2% on the CoSQL test set and 40.3%/16.7% on the SparC test set.
Data Restructuring for Neural Semantic Parsing [39]: Model AWP + AE + C2 reached 89.3% accuracy on GEO, AE + C3 reached 83.3% on ATIS, and AWP + AE + C2 averaged 77.5% on OVERNIGHT.
Enhanced Encoder–Decoder Semantic Parsing with Attention Mechanism [8]: SEQ2TREE achieves 90%, 87.1%, and 84.6% accuracy on the JOBS, GEO, and ATIS test sets, respectively, and an F1 of 74.2% on the IFTTT test set.
Neural Semantic Parsing for Semi-Structured Tables under Type Constraints [6]: Acc. 84.2% and 84.6%, F1 84.1% and 86%; the model achieves 42.7% accuracy on the WIKITABLEQUESTIONS validation set.
Iterative Dialogue Segmentation for Neural Semantic Parsing [2]: SEQ2SEQ + PDE achieves 90.7% accuracy on the GEO test set and 85.4% on the FORMULAS test set; BASEPARSER2 (TRANSFORMER + COPY*) + PDE achieves 72.2% on COMPLEXWEBQUESTIONS.
Syntax-based Decoding Semantic Parser [93]: Ours + BERT achieves question/interaction matching accuracies of 52.6%/29.9% on the SparC development set and 41%/14% on the CoSQL development set.
Global Normalized Neural Models for Semantic Parsing [97]: Copy + data recombination achieves 83.3% accuracy on the ATIS test set; Ours (local) + Reranking reaches a BLEU score of 28.39 on the CoNaLa test set; Ours (local) achieves 73.79% accuracy on the Spider test set.
Edit-Driven SQL Generation for Cross-Domain Contextual Queries [101]: Ours + utterance-table BERT embedding achieves 53.4% accuracy on the Spider test data; Ours + query attention and sequence editing (w/ gold query) achieves 47.9% question and 25.3% interaction matching accuracy on the SParC test set; on the ATIS test set, the method achieves 43.9%, 68.5%, and 68.1% accuracy for query, strict representation, and relaxed representation, respectively.
Bottom-up Neural Semantic Parsing Generation [105]: LSP + T5-base achieves 38.3%, 81.3%, and 25.1% accuracy on ATIS, Geoquery, and Scholar, respectively; LSP with an LSTM encoder achieves 86.4% on Geoquery.
Learning Sentence-to-Formal Query Mappings [106]: The Full-GOLD model achieves query, strict-representation, and relaxed-representation accuracies of 47.4 ± 1.3, 72.3 ± 0.5, and 72.0 ± 0.5, respectively.
Knowledge-Aware Conversational Semantic Parsing for Web Forms [107]: CAMP + TU + LM achieves accuracies of 45.5%, 13.2%, 70.3%, 42.6%, and 24.8% on ALL, SEQ, POS 1, POS 2, and POS 3, respectively.
RECODE [110]: RECODE achieves 19.6% accuracy on Hearthstone and 72.8% on Django, with BLEU scores of 78.4% and 84.7%, respectively.
Dialogue-to-Logical Form Mapping Method [10]: On the Clarification question type, HRED + KVmem, ContxIndp-SP, and D2A achieve recall of 25.09%, 0.01%, and 19.36% and precision of 12.13%, 0.01%, and 17.36%, respectively; on the Verification (Boolean) question type, their accuracies are 21.04%, 20.38%, and 45.05%.
Retrieval-Augmented Contextual Semantic Parsing [9]: Without retrieved examples, Seq2Action achieves 9.15% Exact and 23.34% BLEU on the CONCODE test set; with context-aware retrieval, Seq2Action + MAML reaches 10.50% and 24.40%. On the CSQA test set, HRED + KVmem, D2A, S2A, S2A + EditVec, EditVec + RAndE, and RAndE + MAML achieve Clarification F1 scores of 16.35%, 18.31%, 18.9%, 18.42%, 18.70%, and 19.12% and Verification (Boolean) accuracies of 21.04%, 45.05%, 51.17%, 47.81%, 55.00%, and 50.16%, respectively.
Table 8. Comparison of semantic parsing APIs.

Each entry lists the provider, supported languages, interface type, application scenarios, strengths, and drawbacks; most services expose RESTful interfaces over HTTP(S) (a generic invocation sketch follows the table).
BosonNLP (Boson Data): Languages: mainly Chinese. Interface: RESTful API over HTTP. Scenarios: e-commerce sentiment analysis, news categorization, and more. Strengths: leading Chinese word segmentation and a full range of solutions from basic to advanced text analysis. Drawbacks: weak parsing of domain terminology; some advanced features require payment.
gugudata.com (gugudata.com): Languages: multilingual. Interface: RESTful API over HTTPS. Scenarios: text similarity detection, content recommendation, and similar tasks. Strengths: second-level response times, semantically accurate NLP algorithms, and a continuously updated, guaranteed service. Drawbacks: pay-per-use, with domain text pre-processing offered as a customized service.
Baidu AI Open Platform (Baidu): Languages: multilingual. Interface: RESTful API over HTTP. Scenarios: NLP tasks such as text classification, lexical analysis, word-sense disambiguation, and entity recognition. Strengths: rich APIs built on a large corpus and mature NLP technology, simplifying developer integration. Drawbacks: requires an API key and compliance with usage limits; complex tasks must be combined with other technologies.
Tencent Cloud (Tencent): Languages: multilingual. Interface: RESTful API. Scenarios: text categorization, entity recognition, sentiment analysis, etc. Strengths: a stable, easy-to-integrate API backed by Tencent's user data and corpora. Drawbacks: premium features are fee-based, and domain-specific customization may require additional development.
Azure (Microsoft): Languages: multilingual. Interface: RESTful API. Scenarios: language understanding, text analysis, text generation, etc. Strengths: diverse NLP API services serving a global user base. Drawbacks: premium features are fee-based and may need to be combined with other tools for specific needs.
Google Cloud NLP API (Google Cloud): Languages: multilingual. Interface: RESTful API. Scenarios: entity recognition, sentiment analysis, text classification, etc. Strengths: powerful NLP technology with a rich, easy-to-use API. Drawbacks: premium features are fee-based, and availability is limited in some regions.
IBM Watson Natural Language Understanding (IBM): Languages: multilingual. Interface: RESTful API. Scenarios: text analytics, sentiment analysis, entity recognition, etc. Strengths: rich NLP APIs backed by IBM's deep AI experience. Drawbacks: premium features are fee-based; specific needs require additional configuration or development.
Lingju Semantic Understanding API (Lingju): Languages: multilingual. Interface: HTTP online service interface. Scenarios: semantic analysis, Q&A systems, and voice assistants. Strengths: semantic understanding analytics with support for private deployment. Drawbacks: may require paid access or customized services.
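All of the services in Table 8 follow the same invocation pattern: POST a JSON payload over HTTP(S) and parse the structured response. The sketch below shows the generic shape with the requests library; the endpoint URL, payload fields, and authorization header are placeholders, since each vendor defines its own schema and key management.

    # Generic pattern for the RESTful NLP services in Table 8: POST a JSON
    # payload over HTTPS and read back structured analysis results. The
    # endpoint URL, field names, and auth header are placeholders.
    import requests

    API_URL = "https://api.example-nlp.com/v1/parse"   # placeholder endpoint
    API_KEY = "YOUR_API_KEY"                           # placeholder credential

    payload = {"text": "Book a flight from Guilin to Beijing tomorrow",
               "tasks": ["entity_recognition", "sentiment", "parse"]}
    resp = requests.post(API_URL,
                         json=payload,
                         headers={"Authorization": f"Bearer {API_KEY}"},
                         timeout=10)
    resp.raise_for_status()                 # surface HTTP errors early
    print(resp.json())                      # e.g., entities, sentiment, parse

Consult each provider's documentation for the exact request schema, rate limits, and authentication flow.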
Table 9. Comparison of semantic parsing model training datasets and evaluation methods.

Datasets covered: Geoquery, ATIS, Geo880, JOBS, WIKITABLEQUESTIONS, SParC, CoSQL, OVERNIGHT, IFTTT, Quirk, COMPLEXWEBQUESTIONS, FORMULAS, CoNaLa, Spider, CFQ, Scholar, SequentialQA, Hearthstone, Django, CSQA, and CONCODE.

Datasets used (as reported in Table 7) and evaluation methods, by model:
LPDQUILP [28]: Acc.
Dynamic CCG Adaptation for Logical Parsing [33]: ATIS, Geo880; Prec./Rec./F1.
DCS [38]: GEO, JOBS; Acc.
Semantic Parsing of Semi-Structured Tables [44]: WIKITABLEQUESTIONS; Acc.
LCDMSLF [5]: ATIS; Prec./Rec./F1/Acc.
Neural Semantic Parsing Methods via Feedback [7]: ATIS, Geo; Acc.
Rapidly building a semantic analysis method [56]: GEO880; Acc.
A memory-based approach to semantic parsing [64]: ATIS, CoSQL, SparC; Acc.
Data Restructuring for Neural Semantic Parsing [39]: GEO, ATIS, OVERNIGHT; Acc.
Enhanced Encoder–Decoder Semantic Parsing with Attention Mechanism [8]: JOBS, GEO, ATIS, IFTTT; Acc./F1.
Neural Semantic Parsing for Semi-Structured Tables under Type Constraints [6]: WIKITABLEQUESTIONS; Acc.
Iterative Dialogue Segmentation for Neural Semantic Parsing [2]: GEO, FORMULAS, COMPLEXWEBQUESTIONS; Acc.
Syntax-based Decoding Semantic Parser [93]: SparC, CoSQL; Acc.
Global Normalized Neural Models for Semantic Parsing [97]: ATIS, CoNaLa, Spider; Acc./BLEU.
Edit-Driven SQL Generation for Cross-Domain Contextual Queries [101]: Spider, SParC, ATIS; Acc.
Bottom-up Neural Semantic Parsing Generation [105]: ATIS, Geoquery, Scholar; Acc.
Learning Sentence-to-Formal Query Mappings [106]: Acc.
Knowledge-Aware Conversational Semantic Parsing for Web Forms [107]: Acc.
RECODE [110]: Hearthstone, Django; Acc./BLEU.
Dialogue-to-Logical Form Mapping Method [10]: CSQA; Prec./Rec./Acc.
Retrieval-Augmented Contextual Semantic Parsing [9]: CONCODE, CSQA; BLEU/F1/Acc.

Reported dataset sizes: Geoquery, 250; ATIS, 5410; Geo880, 880; JOBS, 640; WIKITABLEQUESTIONS, 22,033; SParC, 4298 sequences (12k+ questions); CoSQL, 30k+ turns (10k+ annotated queries); IFTTT, 86,960; Quirk, 18,496; FORMULAS, 37,000; CoNaLa, 2379; Spider, 8695; SequentialQA, 6066; Hearthstone, 665; Django, 18,805; CSQA, 196K.

Reported dataset sources: ATIS, telephone call recordings; Geo880 and JOBS, Ray Mooney; WIKITABLEQUESTIONS, Wikipedia; SParC, Yale; CoSQL, the Wizard-of-Oz method; OVERNIGHT, the Amazon crowdsourcing platform; IFTTT, IFTTT; the remainder, Google, Google Code, and other sources.
Table 10. Comparison of different semantic parsing models for domain-specific studies.

Each entry lists the domain, year and authors, main contributions, limitations, and future research directions.
Medicine (2017, Roberts K., et al.) [137]: Contributions: fuses rule-based and machine-learning techniques in an EHR-data-to-logical-form method that reaches 95.6% parsing precision, improving system utility, reliability, and handling of unknown terms for medical Q&A systems. Limitations: despite this progress, the question set needs expansion, system applicability and integration need improvement, and generalization must be strengthened and validated. Future directions: extend the question set, optimize the semantic model, and improve generalization; build an end-to-end Q&A system validated in real environments to support medical information integration and service applications.
Computers (2023, Espejel J. L., et al.) [138]: Contributions: surveys deep learning for Java code generation, analyzing strengths and weaknesses, examining evaluation practice, and outlining research directions. Limitations: evaluation of Java code generation emphasizes syntactic matching while neglecting semantics, and high resource requirements and time-consuming training limit application. Future directions: develop semantic evaluation metrics, research efficient models, optimize training, and pursue cross-language generation to drive innovation in programming and software development.
(2021, Lee C., et al.) [23]: Contributions: surveys NLP applications in software engineering, covering advances in natural-language-to-programming-language conversion and deep semantic parsing. Limitations: evaluation methods for semantic parsing are limited, assess deep semantics inadequately, and are prone to misclassification; the lack of a unified code semantics hinders cross-language understanding. Future directions: develop a semantic-inference evaluator, unify the semantic representation of code, and extend semantic parsing to improve user interaction and overall parsing capability.
(2021, Shin J., et al.) [139]: Contributions: provides an in-depth overview of natural language programming (NLPG), explores natural-language-to-source-code conversion, and proposes customized models and optimized representations to improve code generation accuracy, efficiency, and programming capability. Limitations: automatic code generation must balance naturalness, adaptability, and completeness; although machine learning helps, existing methods struggle to satisfy all three. Future directions: combine deep learning with statistical modelling, fine-tune on pre-training corpora, and optimize automated coding to improve naturalness and adaptability.
Cross-application areas of NLP and database management systems (DBMSs) (2022, Qin B., et al.) [140]: Contributions: reviews deep learning advances in text-to-SQL parsing, covering datasets and pre-trained modelling approaches, identifying challenges, and charting future directions (an execution-based validation sketch for text-to-SQL follows this table). Limitations: insufficient datasets limit performance in some scenarios; pre-training techniques pose optimization challenges; model capacity is inadequate for complex queries. Future directions: diversify data, advance pre-training techniques, pursue cross-language parsing, and incorporate domain knowledge to broaden applications.
(2022, Deng N., et al.) [141]: Contributions: reviews text-to-SQL research across datasets, methods, and evaluation, summarizing challenges and strategies, identifying shortcomings, and exploring future directions. Limitations: systems perform well on benchmarks but degrade in cross-domain real-world applications; robustness to noise needs improvement. Future directions: pursue cross-domain generalization, improve robustness, exploit prompt learning, and innovate evaluation to promote practical adoption of the technology.
(2021, Ahkouk K., et al.) [142]: Contributions: proposes an NLIDB framework that translates natural language to SQL, simplifying database access for non-technical users, broadening usage, and improving the experience. Limitations: handling of complex queries is weak, the framework is sensitive to input phrasing, and generalization is poor, affecting accuracy and robustness. Future directions: strengthen the framework's NLP processing, robustness, and accuracy; explore new NLP techniques; and optimize the user experience for efficient database services.
Economics (2024, Noor S.) [143]: Contributions: analyzes why knowledge base question answering (KBQA) research has progressed slowly, pointing to schema and factual complexity, discussing the limitations of existing methods, and highlighting the underutilization of semantic parsing results. Limitations: the SPICE dataset contributes to dialogue semantic parsing but is limited in size, coverage, and representation, and lacks baseline model evaluations. Future directions: extend SPICE's scale and coverage, study further linguistic phenomena and cross-language applications, and develop advanced models to enhance dialogue understanding and interaction.
(2020, Fu B., et al.) [54]: Contributions: reviews KBQA progress on complex Chinese questions, combining information retrieval with neural semantic parsing, surveying future directions, and presenting the team's results. Limitations: KBQA systems handling complex questions face challenges in interpretability, reasoning, knowledge base data, and large-scale real-time efficiency. Future directions: improve interpretability, enhance question handling, expand the knowledge base, optimize queries, and explore multimodality to deliver accurate answers.
Industry (2023, Vougiouklis P., et al.) [144]: Contributions: introduces the FastRAT model, which significantly improves text-to-SQL decoding speed and cross-lingual performance through a decoder-free design and multilingual pre-training. Limitations: FastRAT decodes efficiently, but its deterministic design limits SQL coverage, some queries cannot be decoded, and the baseline evaluation may be incomplete. Future directions: extend FastRAT to support full SQL, enhance decoding, introduce comprehensive baseline evaluations, and improve cross-lingual capability.
(2024, Zhang W., et al.) [145]: Contributions: the ECLIPSE system fuses template matching with a large language model to optimize cross-language industrial log parsing and releases the ECLIPSE-BENCH benchmark. Limitations: ECLIPSE struggles with extreme logs and large-scale data, and its cross-language capability is bounded by language model support. Future directions: explore advanced algorithmic models, optimize index storage, extend cross-language coverage, and validate ECLIPSE's industrial utility.
(2023, Yang Y., et al.) [146]: Contributions: the LogParser framework applies deep learning to log parsing in industrial ICT systems (IICTSs), significantly improving accuracy and efficiency and supporting safe, reliable production. Limitations: LogParser is expensive to train, requires customization for specific logs, needs regular updates to track changes, and must cope with large-scale diversity. Future directions: optimize the deep learning models, improve generalization, incorporate tools to enhance analysis, automate updates, and extend LogParser's applications.
Agriculture (2024, Yuan W., et al.) [147]: Contributions: the AMR-OPO framework combines a BERT-BiLSTM-ATT-CRF-OPO model to transform agricultural users' utterances into OPO triples, improving interaction with agricultural measurement and control systems and the user experience. Limitations: the results are notable, but data scale, cross-domain validation, and real-time performance require further study. Future directions: explore cross-domain uses of AMR-OPO, optimize algorithms for real-time performance, iterate on the user experience, and expand dataset size and diversity.
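For the text-to-SQL systems discussed above, a common validation step is execution-based checking: run the generated query against a sandbox database and treat execution errors as parse failures. The sketch below illustrates this with Python's built-in sqlite3 module; the schema, data, and generated query are illustrative assumptions, not drawn from any surveyed system.

    # Sketch of execution-based validation for text-to-SQL: execute the
    # generated query in an in-memory sandbox and catch failures. The
    # schema, row, and query below are illustrative.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flights (origin TEXT, dest TEXT, price REAL)")
    conn.execute("INSERT INTO flights VALUES ('Guilin', 'Beijing', 820.0)")

    generated_sql = "SELECT MIN(price) FROM flights WHERE origin = 'Guilin'"
    try:
        result = conn.execute(generated_sql).fetchall()
        print("executable, result:", result)   # [(820.0,)]
    except sqlite3.Error as err:
        print("rejected by executor:", err)    # treat as a parse failure

Execution accuracy of this kind complements exact-match comparison, since semantically equivalent queries can differ in surface form yet return identical results.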