A Survey of Semantic Parsing Techniques
Abstract
1. Introduction
- In-depth exploration of semantic parsing methods and their problems: We examine in detail the core ideas and technical implementations of semantic parsing methods and propose solutions to their potential problems, with comprehensive and in-depth discussion and analysis.
- Systematic comparison of model performance in multi-task contexts: We systematically compare the performance of semantic parsing models under different evaluation metrics across multiple real-world task scenarios, providing strong support for model optimization and application.
- Detailed analysis of datasets: We comprehensively analyze the datasets used for training and evaluating semantic parsing models, revealing their characteristics and their impact on model performance.
- Summary of interfaces in real-world application scenarios: We summarize the interfaces (APIs) of semantic parsing models in real-world application scenarios, providing practical reference information for model application and integration.
- Detailed analysis of domain-specific applications and future prospects: For specific domains, we analyze application examples and limitations of semantic parsing models in detail and look ahead to future research directions, guiding in-depth research in this area.
- Review of technical challenges and future research directions: We review the main problems and challenges currently facing semantic parsing technology and outline its future development trends and research directions, aiming to provide valuable references and inspiration for the further development of this field.
2. Literature Review
3. Methodology
3.1. A Comprehensive Systematic Review and Evaluation of Semantic Parsing Techniques in the PRISMA Framework
3.2. Inter-Rater Reliability Assessment and Data Quality
4. Overview of Results
5. Key Technologies for Semantic Parsing
5.1. Traditional Semantic Parsing Methods
5.2. A Neural Network-Based Approach to Semantic Parsing
5.2.1. Symbolic Methods
5.2.2. Neural Network Approach
5.2.3. Neuro-Symbolic Method
6. Datasets and Evaluation Metrics
7. Domain-Specific Applications
8. Open Issues, Challenges, and Future Research Directions
8.1. Open Problems and Challenges
8.1.1. Open Problems and Challenges I: Obtaining Semantic Trajectories for Reasonable Contexts
8.1.2. Open Issues and Challenges II: Delineating Contextual Logic Boundaries
8.1.3. Open Issues and Challenges III: Scarce Chinese Semantic Parsing Datasets and Unreasonable Evaluation Methods
8.1.4. Open Issues and Challenges IV: State-of-the-Art Large Language Models Are Difficult to Open Up and Exploit in Semantic Parsing Techniques
8.2. Future Research Directions
8.2.1. Direction I: Effective Contextual Semantic Trajectory Acquisition Methods
8.2.2. Direction II: Precise Contextual Logic Boundary Delineation
8.2.3. Direction III: Development of Chinese Parsing Datasets and Rational Evaluation Criteria
8.2.4. Direction IV: Challenges and Strategies for Large Language Models in Advancing Semantic Parsing Technology
9. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Pelletier, F.J. The Principle of Semantic Compositionality. Topoi 1994, 13, 11–24. [Google Scholar] [CrossRef]
- Guo, Y.N.; Lin, Z.Q.; Lou, J.G.; Zhang, D.M. Iterative Utterance Segmentation for Neural Semantic Parsing. In Proceedings of the 35th AAAI Conference on Artificial Intelligence/33rd Conference on Innovative Applications of Artificial Intelligence/11th Symposium on Educational Advances in Artificial Intelligence, Online, 2–9 February 2021; Volume 35, pp. 12937–12945. [Google Scholar]
- Jiang, J.; Liu, J.; Fu, J.; Zhu, X.X.; Li, Z.C.; Lu, H.Q. Global-Guided Selective Context Network for Scene Parsing. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 1752–1764. [Google Scholar] [CrossRef] [PubMed]
- Li, Z.; Qu, L.; Haffari, G. Context Dependent Semantic Parsing: A Survey. arXiv 2020, arXiv:2011.00797. [Google Scholar]
- Zettlemoyer, L.S.; Collins, M. Learning Context-Dependent Mappings from Sentences to Logical Form. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Suntec, Singapore, 2–7 August 2009; pp. 976–984. [Google Scholar]
- Krishnamurthy, J.; Dasigi, P.; Gardner, M. Neural Semantic Parsing with Type Constraints for Semi-Structured Tables. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017; pp. 1516–1526. [Google Scholar]
- Iyer, S.; Konstas, I.; Cheung, A.; Krishnamurthy, J.; Zettlemoyer, L. Learning a Neural Semantic Parser from User Feedback. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017; pp. 963–973. [Google Scholar]
- Dong, L.; Lapata, M. Language to Logical Form with Neural Attention. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, 7–12 August 2016; pp. 33–43. [Google Scholar]
- Guo, D.Y.; Tang, D.Y.; Duan, N.; Zhou, M.; Yin, J. Coupling Retrieval and Meta-Learning for Context-Dependent Semantic Parsing. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), Florence, Italy, 28 July–2 August 2019; pp. 855–866. [Google Scholar]
- Guo, D.Y.; Tang, D.Y.; Duan, N.; Zhou, M.; Yin, J. Dialog-to-Action: Conversational Question Answering Over a Large-Scale Knowledge Base. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NIPS), Montréal, QC, Canada, 3–8 December 2018; Volume 31. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. Bert: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
- Joshi, M.; Chen, D.; Liu, Y.; Weld, D.S.; Zettlemoyer, L.; Levy, O. Spanbert: Improving Pre-Training by Representing and Predicting Spans. Trans. Assoc. Comput. Linguist. 2020, 8, 64–77. [Google Scholar] [CrossRef]
- Yang, Z.L.; Dai, Z.H.; Yang, Y.M.; Carbonell, J.; Salakhutdinov, R.; Le, Q.V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. arXiv 2019, arXiv:1906.08237. [Google Scholar]
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. Roberta: A Robustly Optimized Bert Pretraining Approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
- Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. 2018. Available online: https://www.mikecaptain.com/resources/pdf/GPT-1.pdf (accessed on 21 May 2024).
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language Models Are Unsupervised Multitask Learners. OpenAI Blog 2019, 1, 9. [Google Scholar]
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A. Language Models Are Few-Shot Learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
- Achiam, J.; Adler, J.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S. GPT-4 Technical Report. arXiv 2023, arXiv:2303.08774. [Google Scholar]
- Ren, X.; Zhou, P.; Meng, X.; Huang, X.; Wang, Y.; Wang, W.; Li, P.; Zhang, X.; Podolskiy, A.; Arshinov, G. PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing. arXiv 2023, arXiv:2303.10845. [Google Scholar]
- Zhang, M. A Survey of Syntactic-Semantic Parsing Based on Constituent and Dependency Structures. Sci. China Technol. Sci. 2020, 63, 1898–1920. [Google Scholar] [CrossRef]
- Kumar, P.; Bedathur, S. A Survey on Semantic Parsing from the Perspective of Compositionality. arXiv 2020, arXiv:2009.14116. [Google Scholar]
- Lee, C.; Gottschlich, J.; Roth, D. Toward Code Generation: A Survey and Lessons from Semantic Parsing. arXiv 2021, arXiv:2105.03317. [Google Scholar]
- Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G.; the PRISMA Group. Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. Ann. Intern. Med. 2009, 151, 264–269. [Google Scholar] [CrossRef]
- Ogunleye, B.; Zakariyyah, K.I.; Ajao, O.; Olayinka, O.; Sharma, H. A Systematic Review of Generative AI for Teaching and Learning Practice. Educ. Sci. 2024, 14, 636. [Google Scholar] [CrossRef]
- Zawacki-Richter, O.; Marín, V.I.; Bond, M.; Gouverneur, F. Systematic Review of Research on Artificial Intelligence Applications in Higher Education—Where Are the Educators? Int. J. Educ. Technol. High. Educ. 2019, 16, 39. [Google Scholar] [CrossRef]
- McHugh, M.L. Interrater Reliability: The Kappa Statistic. Biochem. Medica 2012, 22, 276–282. [Google Scholar] [CrossRef]
- Zelle, J.M.; Mooney, R.J. Learning to Parse Database Queries Using Inductive Logic Programming. In Proceedings of the 13th National Conference on Artificial Intelligence (AAAI 96)/8th Conference on Innovative Applications of Artificial Intelligence (IAAI 96), Portland, OR, USA, 4–8 August 1996; pp. 1050–1055. [Google Scholar]
- Li, Z. Semantic Parsing in Limited Resource Conditions. arXiv 2023, arXiv:2309.07429. [Google Scholar]
- Hoque, M.N.; Ghai, B.; Kraus, K.; Elmqvist, N. Portrayal: Leveraging NLP and Visualization for Analyzing Fictional Characters. In Proceedings of the 2023 ACM Designing Interactive Systems Conference, Pittsburgh, PA, USA, 10–14 July 2023; pp. 74–94. [Google Scholar]
- Dou, L.; Gao, Y.; Pan, M.; Wang, D.; Che, W.; Zhan, D.; Lou, J.-G. MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 12745–12753. [Google Scholar]
- Maulud, D.H.; Zeebaree, S.R.; Jacksi, K.; Sadeeq, M.A.M.; Sharif, K.H. State of Art for Semantic Analysis of Natural Language Processing. Qubahan Acad. J. 2021, 1, 21–28. [Google Scholar] [CrossRef]
- Zettlemoyer, L.; Collins, M. Online Learning of Relaxed CCG Grammars for Parsing to Logical Form. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, 28–30 June 2007; pp. 678–687. [Google Scholar]
- Zhang, X.; Le Roux, J.; Charnois, T. Higher-Order Dependency Parsing for Arc-Polynomial Score Functions via Gradient-Based Methods and Genetic Algorithm. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, Online, 20–23 November 2022; pp. 1158–1171. [Google Scholar]
- Ng, A.Y. Feature Selection, L1 vs. L2 Regularization, and Rotational Invariance. In Proceedings of the Twenty-First International Conference on Machine Learning, Banff, AB, Canada, 4–8 July 2004; p. 78. [Google Scholar]
- Peters, M.E.; Neumann, M.; Logan, R.L.; Schwartz, R.; Joshi, V.; Singh, S.; Smith, N.A. Knowledge Enhanced Contextual Word Representations. In Proceedings of the Conference on Empirical Methods in Natural Language Processing/9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 43–54. [Google Scholar]
- Glorot, X.; Anand, A.; Aygun, E.; Mourad, S.; Kohli, P.; Precup, D. Learning Representations of Logical Formulae Using Graph Neural Networks. In Proceedings of the Neural Information Processing Systems, Workshop on Graph Representation Learning, Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
- Liang, P.; Jordan, M.I.; Klein, D. Learning Dependency-Based Compositional Semantics. Comput. Linguist. 2013, 39, 389–446. [Google Scholar] [CrossRef]
- Jia, R.; Liang, P. Data Recombination for Neural Semantic Parsing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL), Berlin, Germany, 7–12 August 2016; pp. 12–22. [Google Scholar]
- Petit, A.; Corro, C. On Graph-Based Reentrancy-Free Semantic Parsing. arXiv 2023, arXiv:2302.07679. [Google Scholar] [CrossRef]
- Zong, W.; Wu, F.; Chu, L.-K.; Sculli, D. A Discriminative and Semantic Feature Selection Method for Text Categorization. Int. J. Prod. Econ. 2015, 165, 215–222. [Google Scholar] [CrossRef]
- Li, Y.; Yang, M.; Zhang, Z. A Survey of Multi-View Representation Learning. IEEE Trans. Knowl. Data Eng. 2018, 31, 1863–1883. [Google Scholar] [CrossRef]
- Lei, S.; Yi, W.; Ying, C.; Ruibin, W. Review of Attention Mechanism in Natural Language Processing. Data Anal. Knowl. Discov. 2020, 4, 1–14. [Google Scholar]
- Pasupat, P.; Liang, P. Compositional Semantic Parsing on Semi-Structured Tables. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL)/7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (IJCNLP), Beijing, China, 26–31 July 2015; pp. 1470–1480. [Google Scholar]
- Al Sharou, K.; Li, Z.; Specia, L. Towards a Better Understanding of Noise in Natural Language Processing. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), Online, 1–3 September 2021; pp. 53–62. [Google Scholar]
- Li, Z.; Haffari, G. Active Learning for Multilingual Semantic Parser. arXiv 2023, arXiv:2301.12920. [Google Scholar]
- Krishnamurthy, J.; Mitchell, T. Weakly Supervised Training of Semantic Parsers. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jeju Island, Republic of Korea, 12–14 July 2012; pp. 754–765. [Google Scholar]
- Yadav, R.K.; Jiao, L.; Granmo, O.-C.; Goodwin, M. Interpretability in Word Sense Disambiguation Using Tsetlin Machine. In Proceedings of the 13th International Conference on Agents and Artificial Intelligence, Vienna, Austria, 4–6 February 2021; pp. 402–409. [Google Scholar]
- Wang, X.; Sun, H.; Qi, Q.; Wang, J. SETNet: A Novel Semi-Supervised Approach for Semantic Parsing. In Proceedings of the 24th European Conference on Artificial Intelligence, Santiago de Compostela, Spain, 29 August–8 September 2020; IOS Press: Amsterdam, The Netherlands, 2020; pp. 2236–2243. [Google Scholar]
- Duffy, K.; Bhattamishra, S.; Blunsom, P. Structural Transfer Learning in NL-to-Bash Semantic Parsers. arXiv 2023, arXiv:2307.16795. [Google Scholar]
- Zhang, L.; Xie, X.; Xie, K.; Wang, Z.; Lu, Y.; Zhang, Y. An Efficient Log Parsing Algorithm Based on Heuristic Rules. In Proceedings of the Advanced Parallel Processing Technologies: 13th International Symposium, APPT 2019, Tianjin, China, 15–16 August 2019; pp. 123–134. [Google Scholar]
- Amershi, S.; Cakmak, M.; Knox, W.B.; Kulesza, T. Power to the People: The Role of Humans in Interactive Machine Learning. AI Mag. 2014, 35, 105–120. [Google Scholar] [CrossRef]
- Clark, K.; Manning, C.D. Improving Coreference Resolution by Learning Entity-Level Distributed Representations. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL), Berlin, Germany, 7–12 August 2016; Association for Computational Linguistics: Stroudsburg, PA, USA, 2016; pp. 643–653. [Google Scholar]
- Fu, B.; Qiu, Y.; Tang, C.; Li, Y.; Yu, H.; Sun, J. A Survey on Complex Question Answering over Knowledge Base: Recent Advances and Challenges. arXiv 2020, arXiv:2007.13069. [Google Scholar]
- Chen, Y.; Das, M. An Automated Technique for Image Noise Identification Using a Simple Pattern Classification Approach. In Proceedings of the 2007 50th Midwest Symposium on Circuits and Systems, Montreal, QC, Canada, 5–8 August 2007; pp. 819–822. [Google Scholar]
- Wang, Y.S.; Berant, J.; Liang, P. Building a Semantic Parser Overnight. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL)/7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (IJCNLP), Beijing, China, 26–31 July 2015; pp. 1332–1342. [Google Scholar]
- Bai, J.; Liu, X.; Wang, W.; Luo, C.; Song, Y. Complex Query Answering on Eventuality Knowledge Graph with Implicit Logical Constraints. arXiv 2023, arXiv:2305.19068. [Google Scholar]
- Landauer, T. Latent Semantic Analysis: Theory, Method and Application. In Computer Support for Collaborative Learning; Routledge: Abingdon, UK, 2023; pp. 742–743. [Google Scholar]
- Laukaitis, A.; Ostasius, E.; Plikynas, D. Deep Semantic Parsing with Upper Ontologies. Appl. Sci. 2021, 11, 9423. [Google Scholar] [CrossRef]
- Hsu, W.N.; Zhang, Y.; Glass, J. Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
- Zhang, S.; Jafari, O.; Nagarkar, P. A Survey on Machine Learning Techniques for Auto Labeling of Video, Audio, and Text Data. arXiv 2021, arXiv:2109.03784. [Google Scholar]
- Abu-Salih, B. Domain-Specific Knowledge Graphs: A Survey. J. Netw. Comput. Appl. 2021, 185, 103076. [Google Scholar] [CrossRef]
- Jiao, X.; Yin, Y.; Shang, L.; Jiang, X.; Chen, X.; Li, L.; Wang, F.; Liu, Q. Tinybert: Distilling Bert for Natural Language Understanding. arXiv 2019, arXiv:1909.10351. [Google Scholar]
- Jain, P.; Lapata, M. Memory-Based Semantic Parsing. Trans. Assoc. Comput. Linguist. 2021, 9, 1197–1212. [Google Scholar] [CrossRef]
- Goyal, P.; Dollár, P.; Girshick, R.; Noordhuis, P.; Wesolowski, L.; Kyrola, A.; Tulloch, A.; Jia, Y.; He, K. Accurate, Large Minibatch Sgd: Training Imagenet in 1 Hour. arXiv 2017, arXiv:1706.02677. [Google Scholar]
- Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q.V. Learning Transferable Architectures for Scalable Image Recognition. In Proceedings of the 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 8697–8710. [Google Scholar]
- Ribeiro, M.T.; Singh, S.; Guestrin, C. Model-Agnostic Interpretability of Machine Learning. arXiv 2016, arXiv:1606.05386. [Google Scholar]
- Kuznetsov, M.; Firsov, G. Syntax Error Search Using Parser Combinators. In Proceedings of the IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (ElConRus), St. Petersburg/Moscow, Russia, 26–28 January 2021; pp. 490–493. [Google Scholar]
- Vilares, D.; Gómez-Rodríguez, C. Transition-Based Parsing with Lighter Feed-Forward Networks. arXiv 2018, arXiv:1810.08997. [Google Scholar]
- Gomes, I.; Morgado, P.; Gomes, T.; Moreira, R. An Overview on the Static Code Analysis Approach in Software Development; Faculdade de Engenharia da Universidade do Porto: Porto, Portugal, 2009; Volume 16. [Google Scholar]
- Chai, L.; Xiao, D.; Yan, Z.; Yang, J.; Yang, L.; Zhang, Q.-W.; Cao, Y.; Li, Z. QURG: Question Rewriting Guided Context-Dependent Text-to-SQL Semantic Parsing. In Proceedings of the 20th Pacific Rim International Conference on Artificial Intelligence, Jakarta, Indonesia, 15–19 November 2023; pp. 275–286. [Google Scholar]
- Yıldırım, M.; Okay, F.Y.; Özdemir, S. Big Data Analytics for Default Prediction Using Graph Theory. Expert Syst. Appl. 2021, 176, 114840. [Google Scholar] [CrossRef]
- Zhang, F.; Peng, M.; Shen, Y.; Wu, Q. Hierarchical Features Extraction and Data Reorganization for Code Search. J. Syst. Softw. 2024, 208, 111896. [Google Scholar] [CrossRef]
- Mahmoudi, O.; Bouami, M.F. RNN and LSTM Models for Arabic Speech Commands Recognition Using PyTorch and GPU. In Proceedings of the International Conference on Artificial Intelligence & Industrial Applications, Meknes, Morocco, 17–18 February 2023; pp. 462–470. [Google Scholar]
- Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation. arXiv 2014, arXiv:1406.1078. [Google Scholar]
- Huang, Z.; Xu, W.; Yu, K. Bidirectional LSTM-CRF Models for Sequence Tagging. arXiv 2015, arXiv:1508.01991. [Google Scholar]
- Luong, M.-T.; Pham, H.; Manning, C.D. Effective Approaches to Attention-Based Neural Machine Translation. arXiv 2015, arXiv:1508.04025. [Google Scholar]
- Pluščec, D.; Šnajder, J. Data Augmentation for Neural NLP. arXiv 2023, arXiv:2302.11412. [Google Scholar]
- Merity, S.; Keskar, N.S.; Socher, R. Regularizing and Optimizing LSTM Language Models. arXiv 2017, arXiv:1708.02182. [Google Scholar]
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv 2014, arXiv:1409.0473. [Google Scholar]
- Dhingra, B.; Liu, H.X.; Yang, Z.L.; Cohen, W.W.; Salakhutdinov, R. Gated-Attention Readers for Text Comprehension. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL), Vancouver, BC, Canada, 30 July–4 August 2017; pp. 1832–1846. [Google Scholar]
- Zhao, N.; Li, H.; Wu, Y.; He, X. JDDC 2.1: A Multimodal Chinese Dialogue Dataset with Joint Tasks of Query Rewriting, Response Generation, Discourse Parsing, and Summarization. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 12037–12051. [Google Scholar]
- Bilal, M.; Ali, G.; Iqbal, M.W.; Anwar, M.; Malik, M.S.A.; Kadir, R.A. Auto-Prep: Efficient and Automated Data Preprocessing Pipeline. IEEE Access 2022, 10, 107764–107784. [Google Scholar] [CrossRef]
- Andreas, J. Good-Enough Compositional Data Augmentation. arXiv 2019, arXiv:1904.09545. [Google Scholar]
- Li, W.; Srihari, R.K.; Niu, C.; Li, X. Question Answering on a Case Insensitive Corpus. In Proceedings of the ACL 2003 Workshop on Multilingual Summarization and Question Answering, Sapporo, Japan, 11 July 2003; pp. 84–93. [Google Scholar]
- Kakkar, V.; Sharma, C.; Pande, M.; Kumar, S. Search Query Spell Correction with Weak Supervision in E-Commerce. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Toronto, ON, Canada, 9–14 July 2023; Volume 5, pp. 687–694. [Google Scholar]
- Nurcahyawati, V.; Mustaffa, Z. Improving Sentiment Reviews Classification Performance Using Support Vector Machine-Fuzzy Matching Algorithm. Bull. Electr. Eng. Inform. 2023, 12, 1817–1824. [Google Scholar] [CrossRef]
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Pascanu, R.; Gulcehre, C.; Cho, K.; Bengio, Y. How to Construct Deep Recurrent Neural Networks. arXiv 2013, arXiv:1312.6026. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Sun, R.-Y. Optimization for Deep Learning: An Overview. J. Oper. Res. Soc. China 2020, 8, 249–294. [Google Scholar] [CrossRef]
- Liu, Q.; Chen, B.; Guo, J.; Lou, J.-G.; Zhou, B.; Zhang, D. How Far Are We from Effective Context Modeling? An Exploratory Study on Semantic Parsing in Context. arXiv 2020, arXiv:2002.00652. [Google Scholar]
- Azizi, S.; Mustafa, B.; Ryan, F.; Beaver, Z.; Freyberg, J.; Deaton, J.; Loh, A.; Karthikesalingam, A.; Kornblith, S.; Chen, T.; et al. Big Self-Supervised Models Advance Medical Image Classification. In Proceedings of the 18th IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 3458–3468. [Google Scholar]
- Vig, J.; Madani, A.; Varshney, L.R.; Xiong, C.; Socher, R.; Rajani, N.F. Bertology Meets Biology: Interpreting Attention in Protein Language Models. arXiv 2020, arXiv:2006.15222. [Google Scholar]
- Zafar, A.; Anwar, B. Analysis of Semantic and Syntactic Properties of Urdu Verb by Using Machine Learning. Pak. J. Soc. Sci. 2023, 43, 103–122. [Google Scholar]
- Huang, C.Y.; Yang, W.; Cao, Y.S.; Zaiane, O.; Mou, L.L. A Globally Normalized Neural Model for Semantic Parsing. In Proceedings of the 5th Workshop on Structured Prediction for NLP (SPNLP)/5th Workshop on Online Abuse and Harms (WOAH), Online, 6 August 2021; pp. 61–66. [Google Scholar]
- Dyer, C.; Ballesteros, M.; Ling, W.; Matthews, A.; Smith, N.A. Transition-Based Dependency Parsing with Stack Long Short-Term Memory. arXiv 2015, arXiv:1505.08075. [Google Scholar]
- Iman, M.; Arabnia, H.R.; Rasheed, K. A Review of Deep Transfer Learning and Recent Advancements. Technologies 2023, 11, 40. [Google Scholar] [CrossRef]
- Pellicer, L.F.A.O.; Ferreira, T.M.; Costa, A.H.R. Data Augmentation Techniques in Natural Language Processing. Appl. Soft. Comput. 2023, 132, 109803. [Google Scholar] [CrossRef]
- Zhang, R.; Yu, T.; Er, H.Y.; Shim, S.; Xue, E.R.; Lin, X.V.; Shi, T.Z.; Xiong, C.M.; Socher, R.; Radev, D.; et al. Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing/9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 5338–5349. [Google Scholar]
- Fang, M.; Peng, S.; Liang, Y.; Hung, C.-C.; Liu, S. A Multimodal Fusion Model with Multi-Level Attention Mechanism for Depression Detection. Biomed. Signal Process. Control 2023, 82, 104561. [Google Scholar] [CrossRef]
- Yang, G.; Liu, S.; Li, Y.; He, L. Short-Term Prediction Method of Blood Glucose Based on Temporal Multi-Head Attention Mechanism for Diabetic Patients. Biomed. Signal Process. Control 2023, 82, 104552. [Google Scholar] [CrossRef]
- Sankar, C.; Subramanian, S.; Pal, C.; Chandar, S.; Bengio, Y. Do Neural Dialog Systems Use the Conversation History Effectively? An Empirical Study. arXiv 2019, arXiv:1906.01603. [Google Scholar]
- Crouse, M.; Kapanipathi, P.; Chaudhury, S.; Naseem, T.; Astudillo, R.; Fokoue, A.; Klinger, T. Laziness Is a Virtue When It Comes to Compositionality in Neural Semantic Parsing. arXiv 2023, arXiv:2305.04346. [Google Scholar]
- Suhr, A.; Iyer, S.; Artzi, Y. Learning to Map Context-Dependent Sentences to Executable Formal Queries. arXiv 2018, arXiv:1804.06868. [Google Scholar]
- Sun, Y.B.; Tang, D.Y.; Xu, J.J.; Duan, N.; Feng, X.C.; Qin, B.; Liu, T.; Zhou, M. Knowledge-Aware Conversational Semantic Parsing over Web Tables. In Proceedings of the 8th CCF International Conference on Natural Language Processing and Chinese Computing (NLPCC), Dunhuang, China, 9–14 October 2019; Volume 11838, pp. 827–839. [Google Scholar]
- Wang, P.; Zhang, J.; Lian, X.; Lu, L. Stacked Recurrent Neural Network Based High Precision Pointing Coupled Control of the Spacecraft and Telescopes. Adv. Space Res. 2023, 71, 692–704. [Google Scholar] [CrossRef]
- Neill, J.O. An Overview of Neural Network Compression. arXiv 2020, arXiv:2006.03669. [Google Scholar]
- Hayati, S.A.; Olivier, R.; Avvaru, P.; Yin, P.C.; Tomasic, A.; Neubig, G. Retrieval-Based Neural Code Generation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Brussels, Belgium, 31 October–4 November 2018; pp. 925–930. [Google Scholar]
- Aghdam, S.N.; Hossayni, S.A.; Sadeh, E.K.; Khozouei, N.; Bidgoli, B.M. Persian Semantic Role Labeling Using Transfer Learning and BERT-Based Models. arXiv 2023, arXiv:2306.10339. [Google Scholar]
- Sherborne, T.; Lapata, M. Meta-Learning a Cross-Lingual Manifold for Semantic Parsing. Trans. Assoc. Comput. Linguist. 2023, 11, 49–67. [Google Scholar] [CrossRef]
- Ang, R.J. Rule-Based and Machine Learning Approaches to AI. Can. J. Nurs. Inform. 2023, 18, 2. [Google Scholar]
- Li, R.H.; Cheng, L.L.; Wang, D.P.; Tan, J.M. Siamese BERT Architecture Model with Attention Mechanism for Textual Semantic Similarity. Multimed. Tools Appl. 2023, 22, 46673–46694. [Google Scholar] [CrossRef]
- Evtikhiev, M.; Bogomolov, E.; Sokolov, Y.; Bryksin, T. Out of the Bleu: How Should We Assess Quality of the Code Generation Models? J. Syst. Softw. 2023, 203, 111741. [Google Scholar] [CrossRef]
- Holtzman, A.; Buys, J.; Du, L.; Forbes, M.; Choi, Y. The Curious Case of Neural Text Degeneration. arXiv 2019, arXiv:1904.09751. [Google Scholar]
- Zhang, K.; Wang, W.; Zhang, H.; Li, G.; Jin, Z. Learning to Represent Programs with Heterogeneous Graphs. In Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, Pittsburgh, PA, USA, 16–17 May 2022; pp. 378–389. [Google Scholar]
- Feng, Z.; Guo, D.; Tang, D.; Duan, N.; Feng, X.; Gong, M.; Shou, L.; Qin, B.; Liu, T.; Jiang, D. Codebert: A Pre-Trained Model for Programming and Natural Languages. arXiv 2020, arXiv:2002.08155. [Google Scholar]
- Stork, C.H.; Haldar, V. Compressed Abstract Syntax Trees for Mobile Code. In Proceedings of the Workshop on Intermediate Representation Engineering, Trinity College, Dublin, Ireland, 13–14 June 2002. [Google Scholar]
- Vinyals, O.; Fortunato, M.; Jaitly, N. Pointer Networks. Adv. Neural Inf. Process. Syst. 2015, 28, 2692–2700. [Google Scholar]
- Phyu, H.P.; Naboulsi, D.; Stanica, R. Machine Learning in Network Slicing—A Survey. IEEE Access 2023, 11, 39123–39153. [Google Scholar] [CrossRef]
- Kitaev, N.; Kaiser, Ł.; Levskaya, A. Reformer: The Efficient Transformer. arXiv 2020, arXiv:2001.04451. [Google Scholar]
- Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. Bart: Denoising Sequence-to-Sequence Pre-Training for Natural Language Generation, Translation, and Comprehension. arXiv 2019, arXiv:1910.13461. [Google Scholar]
- Freitag, M.; Al-Onaizan, Y. Beam Search Strategies for Neural Machine Translation. arXiv 2017, arXiv:1702.01806. [Google Scholar]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
- Yang, W.; Xie, Y.; Tan, L.; Xiong, K.; Li, M.; Lin, J. Data Augmentation for Bert Fine-Tuning in Open-Domain Question Answering. arXiv 2019, arXiv:1904.06652. [Google Scholar]
- Chetlur, S.; Woolley, C.; Vandermersch, P.; Cohen, J.; Tran, J.; Catanzaro, B.; Shelhamer, E. Cudnn: Efficient Primitives for Deep Learning. arXiv 2014, arXiv:1410.0759. [Google Scholar]
- Abadi, M.; Barham, P.; Chen, J.; Chen, Z. Adaptive Dropout: A Novel Regularization Technique for Deep Neural Networks. Available online: http://www.arxivgen.com/pdfs/adaptive_dropout_a_n-3p5.pdf (accessed on 21 May 2024).
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
- Han, S.; Mao, H.; Dally, W.J. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. arXiv 2015, arXiv:1510.00149. [Google Scholar]
- Pascanu, R.; Mikolov, T.; Bengio, Y. On the Difficulty of Training Recurrent Neural Networks. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 1310–1318. [Google Scholar]
- Cambria, E.; White, B. Jumping NLP Curves: A Review of Natural Language Processing Research. IEEE Comput. Intell. M 2014, 9, 48–57. [Google Scholar] [CrossRef]
- Huang, Y.P.; Cheng, Y.L.; Bapna, A.; Firat, O.; Chen, M.X.; Chen, D.H.; Lee, H.; Ngiam, J.; Le, Q.V.; Wu, Y.H.; et al. GPipe: Efficient Training of Giant Neural Networks Using Pipeline Parallelism. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 8–14 December 2019; Volume 32. [Google Scholar]
- Zhang, X.; Bouma, G.; Bos, J. Neural Semantic Parsing with Extremely Rich Symbolic Meaning Representations. arXiv 2024, arXiv:2404.12698. [Google Scholar]
- Liu, X.; Lu, Z.; Mou, L. Weakly Supervised Reasoning by Neuro-Symbolic Approaches. arXiv 2023, arXiv:2309.13072. [Google Scholar]
- Li, Z.; Huang, Y.; Li, Z.; Yao, Y.; Xu, J.; Chen, T.; Ma, X.; Lu, J. Neuro-Symbolic Learning Yielding Logical Constraints. Adv. Neural Inf. Process. Syst. 2024, 36. [Google Scholar]
- Roberts, K.; Patra, B.G. A Semantic Parsing Method for Mapping Clinical Questions to Logical Forms. In Proceedings of the AMIA Annual Symposium, Washington, DC, USA, 4–8 November 2017; Volume 2017, p. 1478. [Google Scholar]
- Espejel, J.L.; Alassan, M.S.Y.; Chouham, E.M.; Dahhane, W.; Ettifouri, E.H. A Comprehensive Review of State-of-the-Art Methods for Java Code Generation from Natural Language Text. Nat. Lang. Process. J. 2023, 3, 100013. [Google Scholar] [CrossRef]
- Shin, J.; Nam, J. A Survey of Automatic Code Generation from Natural Language. J. Inf. Process Syst. 2021, 17, 537–555. [Google Scholar]
- Qin, B.; Hui, B.; Wang, L.; Yang, M.; Li, J.; Li, B.; Geng, R.; Cao, R.; Sun, J.; Si, L. A Survey on Text-to-Sql Parsing: Concepts, Methods, and Future Directions. arXiv 2022, arXiv:2208.13629. [Google Scholar]
- Deng, N.; Chen, Y.; Zhang, Y. Recent Advances in Text-to-SQL: A Survey of What We Have and What We Expect. arXiv 2022, arXiv:2208.10099. [Google Scholar]
- Ahkouk, K.; Mustapha, M.; Khadija, M.; Rachid, M. A Review of the Text to SQL Frameworks. In Proceedings of the 4th International Conference on Networking, Information Systems & Security, Kenitra, Morocco, 26 November 2021; pp. 1–6. [Google Scholar]
- Noor, S. Semantic Parsing for Knowledge Graph Question Answering. Int. J. Hum. Soc. 2024, 4, 33–45. [Google Scholar]
- Vougiouklis, P.; Papasarantopoulos, N.; Zheng, D.; Tuckey, D.; Diao, C.; Shen, Z.; Pan, J. FastRAT: Fast and Efficient Cross-Lingual Text-to-SQL Semantic Parsing. In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, Nusa Dua, Bali, 1–4 November 2023; Volume 1, pp. 564–576. [Google Scholar]
- Zhang, W.; Cheng, X.; Zhang, Y.; Yang, J.; Guo, H.; Li, Z.; Yin, X.; Guan, X.; Shi, X.; Zheng, L. ECLIPSE: Semantic Entropy-LCS for Cross-Lingual Industrial Log Parsing. arXiv 2024, arXiv:2405.13548. [Google Scholar]
- Yang, Y.; Wang, B.; Zhao, C. Deep Learning-Based Log Parsing for Monitoring Industrial ICT Systems. Appl. Sci. 2023, 13, 3691. [Google Scholar] [CrossRef]
- Yuan, W.; Yang, M.; Gu, H.; Xu, G. Natural Language Command Parsing for Agricultural Measurement and Control Based on AMR and Entity Recognition. J. Intell. Fuzzy Syst. 2024, 1–16. [Google Scholar] [CrossRef]
- Zheng, Y.Z.; Wang, H.B.; Dong, B.H.; Wang, X.J.; Li, C.S. HIE-SQL: History Information Enhanced Network for Context-Dependent Text-to-SQL Semantic Parsing. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL), Dublin, Ireland, 22–27 May 2022; pp. 2997–3007. [Google Scholar]
- Reiter, E. A Structured Review of the Validity of BLEU. Comput. Linguist. 2018, 44, 393–401. [Google Scholar] [CrossRef]
- OpenAI. Introducing GPT-4o: Our Fastest and Most Affordable Flagship Model. 2024. Available online: https://platform.openai.com/docs/guides/vision (accessed on 14 June 2024).
- Fakoor, R.; Chaudhari, P.; Soatto, S.; Smola, A.J. Meta-q-Learning. arXiv 2019, arXiv:1910.00125. [Google Scholar]
- Gao, Y.F.; Zhu, H.H.; Ng, P.; dos Santos, C.N.; Wang, Z.G.; Nan, F.; Zhang, D.J.; Nallapati, R.; Arnold, A.O.; Xiang, B.; et al. Answering Ambiguous Questions through Generative Evidence Fusion and Round-Trip Prediction. In Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics (ACL)/11th International Joint Conference on Natural Language Processing (IJCNLP)/6th Workshop on Representation Learning for NLP (RepL4NLP), Online, 1–6 August 2021; pp. 3263–3276. [Google Scholar]
Author | Based on a Large Language Model for Semantic Parsing | The Key Ideas and Methods of the Semantic Parsing Approach Are Outlined and Analyzed in Detail | Detailed Overview and Analysis of the Problems and Solutions of the Semantic Parsing Approach | Detailed Overview and Analysis of the Logical Trajectories of Typical Semantic Parsing Approaches | Performance Comparison of Semantic Parsing Models for Different Tasks Using Different Metrics | Overview of the Semantic Parsing Model API | Overview and Analysis of Semantic Parsing Model Datasets and Evaluation Metrics | Application of Semantic Parsing Models to Specific Domains | Main Research Work |
---|---|---|---|---|---|---|---|---|---|
Zhang M (2020) [21] | | | | | | | | | This paper surveys progress in syntactic and semantic parsing in NLP, including parsing techniques and cross-domain and cross-lingual models, and discusses parser applications and corpus development for research guidance. |
Kumar P, et al., (2020) [22] | | | | | | | | | This paper delves into semantic parsing, examining the syntactic formation of meaning, lexical diversity, formal grammars such as CCG, and semantic composition methods with λ-calculus and λ-DCS, and assesses logical parser performance and its capacity to handle complex issues using benchmark data. |
Lee C, et al., (2021) [23] | | | | | | | | | This review covers semantic parsing techniques, comparison with program synthesis, evolutionary trends, neuro-symbolic methods, supervised applications, code generation progress, challenges, and future research directions. |
Ours | | | | | | | | | This study deeply analyzes semantic parsing, assesses models, surveys applications, addresses challenges, and points to future research for the advancement of the field. |
Classifications | Standard Description |
---|---|
Inclusion criteria | Domain relevance: The literature must belong to the field of NLP, with a particular focus on semantic parsing techniques. |
Keyword matching: The document must contain the keywords “semantic parsing”, “symbolic approach”, “neural approach”, or “neuro-symbolic approach”, or their relevant variants. |
High quality and widely recognized: Literature should come from authoritative and comprehensive databases, such as Web of Science, Engineering Village, IEEE, and SpringerLink, as well as from widely recognized, high-impact literature found through Google Scholar. |
Full-text accessibility: The literature must be accessible in full text for in-depth review and analysis. | |
Exclusion criteria | Unrelated fields: Literature that does not belong to the field of NLP or does not focus on semantic parsing techniques will be excluded. |
Keyword mismatch: Documents that do not contain the keywords “semantic parsing”, “symbolic”, “neural”, or “neuro-symbolic”, or their related variants, will be excluded. |
Lower quality or not widely recognized: Literature from unknown or unreliable sources, or literature that is not widely recognized, will be excluded. |
Unavailability of full text: If the full text of a document is not available, it cannot be reviewed and analyzed in depth and will therefore be excluded. |
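As a concrete illustration of the inter-rater reliability assessment described in Section 3.2, the following minimal Python sketch computes Cohen's kappa [27] over two reviewers' include/exclude screening decisions. The function name and the decision labels in the example are hypothetical, not taken from the study's screening data.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items where the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum(freq_a[l] * freq_b[l] for l in labels) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical screening decisions from two reviewers.
a = ["include", "exclude", "include", "include", "exclude", "include"]
b = ["include", "exclude", "exclude", "include", "exclude", "include"]
print(f"kappa = {cohens_kappa(a, b):.3f}")  # kappa = 0.667
```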
Modeling/Methodology | Key Ideas | Approaches | Key Problems | Solutions |
---|---|---|---|---|
Parsing Database Queries Using Inductive Logic Programming [28] | Using inductive logic programming to automatically parse natural language database queries into SQL statements | CHILL uses inductive logic programming to learn search-control and relational rules and build a parser: it is trained on a corpus of sentences paired with database queries and maps sentences directly to executable queries. | (1) Dataset size limitation problem (2) Grammar rule readability problem (3) Generalization bias problem (4) Restriction problem for natural language queries | Remote supervision, transfer learning, knowledge transfer, or active, continual, and meta-learning techniques [29] are utilized, combined with visual and interactive methods [30]; techniques such as cross-validation [31] are employed to reduce bias; and techniques such as contextual and semantic analysis [32] are introduced to enhance natural language understanding. |
Online Learning Algorithms and Representations of Logical Forms [33] | The model uses an online learning algorithm to learn a relaxed CCG that parses natural language sentences into logical forms (a toy sketch of grammar-driven parsing appears after this table). | A gradient-descent-based online learning algorithm trains relaxed CCG models and transforms CCG analysis results into logical forms to support the inference process. | The gradient-descent-based online learning algorithm risks falling into local optima, the method remains limited in its handling of complex sentences, and better logical form representations still need to be explored. | Explore robust optimization algorithms (e.g., natural gradient, genetic algorithms) [34] with regularization techniques [35] to prevent overfitting; use CCG grammars combined with techniques such as word vectors and attention mechanisms to process complex sentences, and introduce external knowledge to enhance inference [36]; and research efficient logical form representations such as graph-based neural networks [37], integrating a variety of representations to adapt to different scenarios. |
Dependency-based compositional semantics (DCS) semantic parsing model [38] | Using dependency structures for compositional semantic representation | (1) Syntactic analysis (2) Semantic type and operator assignment (3) Local composition operations | Dependency syntactic analysis errors, complex semantic reasoning challenges, information loss from simplification in the DCS model, and the accumulation of local composition errors with missing global information all affect the performance and accuracy of the method. | More robust dependency syntactic analysis and integration methods are used to reduce errors [39]; graphical models or event semantic representations are explored to deal with complex semantics [40]; semantic representations are augmented to improve model expressiveness and generalization through semantic feature selection [41] and multi-view representation learning [42]; and global composition and attention mechanisms [43] are introduced to deal with long-distance dependencies. |
Compositional Semantic Parsing of Semi-Structured Tables [44] | Increases the breadth of knowledge sources and the depth of semantic parsing | Semi-structured tables are converted to knowledge graphs, natural language is parsed with graph information, and the highest-scoring logical form is selected and executed to obtain answers. | (1) Data volume limitation (2) Table noise processing (3) Multi-language limitation | (1) Data augmentation [12] (2) Noise processing models [45] (3) Multi-language support [46] |
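To make the grammar-driven mapping in the rows above concrete, here is a deliberately tiny, pattern-based stand-in for a rule-based parser that maps a question to a GeoQuery-style logical form. The patterns and predicate names are invented for illustration; real systems such as CHILL [28] or CCG-based parsers [33] induce their rules from data rather than hand-writing them.

```python
import re

# Hand-written pattern -> logical-form templates, GeoQuery-flavored.
# Both the regexes and the predicate names are illustrative assumptions.
RULES = [
    (re.compile(r"what is the capital of (\w+)"),
     lambda m: f"answer(C, capital({m.group(1)}, C))"),
    (re.compile(r"what rivers run through (\w+)"),
     lambda m: f"answer(R, (river(R), traverses(R, {m.group(1)})))"),
]

def parse(utterance: str) -> str:
    u = utterance.lower().strip("?")
    for pattern, build in RULES:
        m = pattern.fullmatch(u)
        if m:
            return build(m)
    raise ValueError(f"no rule covers: {utterance!r}")

print(parse("What is the capital of Texas?"))
# -> answer(C, capital(texas, C))
```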
Modeling/Methodology | Key Ideas | Approaches | Key Problems | Solutions |
---|---|---|---|---|
LCDMSLF [5] | Converting sentences to logical forms via context-dependent mapping | Sentences are parsed using a CCG lexicon and linear models, a hidden-variable perceptron is optimized, an approximate search evaluates the output, and correctness is determined by comparison with the gold standard. | Knowledge representation limitations, ambiguity disambiguation difficulties, corpus dependency, complexity and efficiency issues, and unknown vocabulary processing challenges. | Expand the knowledge base for broader linguistic coverage [47], employ contextual cues and machine learning for effective disambiguation [48], and enhance parser generalization with more annotations or advanced techniques like semi-supervised learning [49] and transfer learning [50]. Boost parser efficiency with optimized algorithms, heuristics, or parallel processing [51], and enrich lexical context with speculation and external resources. |
Neural semantic parsing methods through feedback [7] | Training a neural semantic parser by interactive learning with users | A neural sequence model maps utterances directly to SQL, avoiding intermediate steps and enabling fast online deployment; user feedback and crowd-sourced SQL annotations reduce workload while improving efficiency and accuracy. | Inconsistent user participation, manual interaction dependency, task limitations, insufficient generalization, noisy user feedback. | Enhance user engagement [52], minimize manual interaction costs [53], and expand application to diverse tasks such as customer question answering (Q&A) and dialog systems [54]. Improve model generalization through data augmentation and integrating prior knowledge [39], while filtering out noisy feedback using expert evaluation and machine learning classification [55]. |
Rapid construction of a semantic parsing method [56] | Building semantic parsers in new domains with zero training examples | Building optimized grammar-to-logical-form natural language processes, training parsers, and exploring impact and cross-domain effects. | (1) Nested quantification problem (2) Paraphrasing problems | Address nested quantification with complex queries [57], including subqueries, and enhance paraphrase handling with detailed semantic analysis [58] using modifiers and logical operators. |
Deep semantic parsing methods using upper ontologies [59] | Semantic parsing using FrameNet annotation and a BERT-based upper ontology for distributed representation of sentence contexts | Combining the WordNet ontology, PropBank role labelling, and a multi-source corpus, the model annotates semantic role sets for identifying physics engine predicates and parameters in 3D VR simulations. | (1) High demand for training data (2) Dependence on prior knowledge (3) High demand for computational resources | (1) Optimizing annotation, e.g., expanding the training dataset through techniques such as semi-supervised learning [60] and automatic annotation [61] (2) Improving the acquisition of domain expertise [62] (3) Choosing smaller models, e.g., TinyBERT [63] |
A memory-based approach to semantic parsing [64] | Semantic parsing through memorized contextual semantic information | The model uses a memory matrix to store contextual semantics, a memory controller to manage access, an utterance-and-phrase encoder to turn inputs into vectors, and a decoder attention mechanism to interact with memory and produce parsed results. | (1) High consumption of computing resources (2) Conversation history length limitation (3) Data sparsity problem (4) Poor interpretability problem | (1) Accelerated computation [65] (2) Long-sequence processing with models such as the Transformer [11] (3) Data augmentation and transfer learning [66] (4) Interpretability enhancement [67] |
Syntax error search methods using parser combinators [68] | Searching for syntax errors with parser combinators (see the parser-combinator sketch after this table) | Input code and choose a checking mode: standard or preprocessing. The standard mode catches AST errors; the preprocessing mode checks syntax and builds the AST. | (1) Large memory consumption problem (2) Only syntax errors can be detected | (1) Optimization of memory space occupation [69] (2) Static code analysis tools [70] |
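The parser-combinator idea in the last row ([68]) composes small parsing functions into larger ones, so syntax errors surface as failed combinations. The following minimal Python sketch is our own illustration; the combinator names and the toy expression grammar are assumptions, not taken from [68].

```python
import re

# A parser is a function (text, pos) -> (value, new_pos), or None on failure.
def char(c):
    def p(s, i):
        return (c, i + 1) if i < len(s) and s[i] == c else None
    return p

def regex(pattern):
    rx = re.compile(pattern)
    def p(s, i):
        m = rx.match(s, i)
        return (m.group(), m.end()) if m else None
    return p

def seq(*parsers):                      # run all parsers in order
    def p(s, i):
        out = []
        for q in parsers:
            r = q(s, i)
            if r is None:
                return None
            v, i = r
            out.append(v)
        return (out, i)
    return p

def alt(*parsers):                      # first parser that succeeds
    def p(s, i):
        for q in parsers:
            r = q(s, i)
            if r is not None:
                return r
        return None
    return p

# Toy grammar: expr := number | "(" expr "+" expr ")"
def expr(s, i):
    return alt(regex(r"\d+"),
               seq(char("("), expr, char("+"), expr, char(")")))(s, i)

def check(src):
    r = expr(src, 0)
    if r is None or r[1] != len(src):
        raise SyntaxError(f"syntax error in {src!r}")
    return r[0]                         # nested-list AST

print(check("(1+(2+3))"))               # nested-list AST of the expression
# check("(1+2")  would raise SyntaxError: unbalanced parenthesis
```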
Modeling/Methodology | Key Ideas | Approaches | Key Problems | Solutions |
---|---|---|---|---|
QURG [71] | Explicitly handle question and context dependencies to improve Text-to-SQL understanding and accurately generate SQL. | Question rewriting and completion, construction of editing matrices, a two-stream encoder processing the question context, linking natural language to the database, and generating SQL queries. | Rewriting accuracy affects SQL quality, encoder processing efficiency is limited, and parsing complex contexts is difficult. | Optimize the rewriting models and datasets, improve the encoder design for efficiency, and incorporate domain knowledge and state-of-the-art techniques for model robustness. |
Data recombination for neural semantic parsing [39] | Data recombination | From an input dataset, new examples are generated with an induced synchronous context-free grammar and recombined samples, and a Seq2Seq RNN model is then trained on the augmented data. | Data recombination faces problems of noise, dependency, increased training time, and application limitations. | (1) Controlling the quality of data recombination (2) Determining the optimal parameters for data recombination [72] (3) Optimizing the model training process [47] (4) Exploring applications in other domains [73] |
Attention-enhanced encoder–decoder semantic parsing model [8] | The attention-enhanced encoder–decoder model encodes the corpus as vectors and regulates the vectors to generate logical forms. | The encoder–decoder model incorporates an attention mechanism to convert linguistic sequences into logical forms. | High consumption of computational resources, gradient problems, sequence-order sensitivity, long-term dependency difficulties, large data requirements, risk of overfitting, alignment requirements, poor interpretability. | The use of GPU acceleration [74], replacement of RNN models (e.g., GRU [75], Transformer [11]), application of bidirectional LSTMs [76], optimization of attention mechanisms [77], data augmentation [78], regularization [79], improvement of the alignment method [80], and introduction of interpretable attention mechanisms [81] |
A neural semantic parsing approach for semi-structured tables with type constraints [6] | Specialized entity embedding and linking modules, with a type-constrained grammar restricting decoder actions to preserve well-formed logical forms. | Encoder–decoder recurrent networks, entity embedding and linking, type-constrained decoding, question–answer supervision, marginal log-likelihood over enumerated logical forms, efficient training. | The parser is limited to semi-structured tabular data, and data preprocessing is highly dependent on manual intervention. | (1) Extending the applicability of parsers [82] (2) Automating the data preprocessing process [83] |
GECA [84] | Generating new training samples by recognizing and substituting local phrases that occur in a common context (a toy recombination sketch appears after this table) | The data augmentation protocol identifies substitutable fragments, uses co-occurrence as evidence, deletes fragments to form templates, and fills templates to generate new samples. | Case sensitivity, symbols and punctuation, synonyms and near-synonyms, misspellings and typos, contextual variations | Unify the case of all input texts to eliminate case differences [85]; handle symbols and punctuation flexibly according to contextual needs; construct a glossary of synonyms and near-synonyms to extend the matching range; use a Transformer model for context-sensitive spelling correction [86]; and employ a fuzzy matching algorithm [87] to tolerate a certain degree of textual difference, improving matching inclusiveness and robustness. |
Iterative-based utterance segmentation for neural semantic parsing [2] | A new framework for iterative corpus segmentation to enhance neural semantic parsers. | Iterative segmentation extracts spans, the parser maps them to meaning representations, and these are integrated into a full meaning. | Utterance segmentation accuracy, computational efficiency, long dependencies, model capacity, and training stability challenges. | (1) Improve segmentation accuracy using advanced utterance boundary recognition models, e.g., pre-trained language models such as GPT-4 [2]. (2) Introduce attention mechanisms [80], computational resource allocation, and parallel computing to reduce the computational burden. (3) Use complex RNN structures (e.g., LSTM [88]) to solve the long-dependency problem. (4) Increase encoder–decoder capacity, deep network structures, and residual connections [89] to enhance expressive power. (5) Use gradient clipping [90], other optimization algorithms [91], and residual [89] and skip connections to deal with gradient explosion and vanishing problems [92]. |
Syntax-based decoding semantic parser [93] | Collaboration between context modelling and grammar-based decoding to improve semantic parsing performance. | Attentional sequence modelling, question encoding with grammar-based decoding, and text-to-SQL conversion. | Limited contextual modelling capability, lack of common sense and background knowledge, data sparsity and labelling difficulties, and lexical ambiguity and syntactic difficulties | (1) Use more sophisticated modelling architectures such as the Transformer [11]. (2) Introduce external knowledge bases [36] or pre-trained language models. (3) Use unsupervised [94] or semi-supervised learning methods [60] to exploit unlabeled data for pre-training. (4) Introduce external lexical resources [95], syntactic analyzers, and other aids to help the model better handle lexical ambiguities and syntactic difficulties [96]. |
Globally normalized neural models for semantic parsing [97] | Jointly normalize the output space, considering inter-label role dependencies. | The TranX system is based on context-free grammars: neural networks encode inputs, grammar rules are predicted autoregressively with normalized probabilities, and training maximizes the overall probability. | (1) High computational complexity (2) Label-dependency modeling is difficult (3) Limited structural expressiveness (4) Data scarcity | Use approximation methods such as pruning and sampling to reduce computation; decompose dependencies through hierarchical modelling and construct models step by step [98]; introduce graph structures [40] or deep networks to enhance structural expressiveness; and leverage existing knowledge through transfer learning [99], combined with data augmentation techniques [100] to expand the training set and effectively address insufficient data. |
Editing-based SQL query generation for cross-domain context-dependent questions [101] | Editing previously predicted queries to improve generation quality using the interaction history. | Utterance–table encoding processes the input, turn attention considers history to enhance understanding, and table-aware decoding generates responses. | (1) Incomplete utilization of contextual information (2) The decoder's attention mechanism overly focusing on parts of the input | (1) Use more powerful context encoders such as the Transformer [11]. (2) Introduce multi-level attention mechanisms [102], incorporate multi-head attention [103], or add regularization [79] and control mechanisms. |
Validating the ability of neural dialog systems to effectively utilize dialog history [104] | An empirical approach to testing model sensitivity and exploring dialogue history information use. | Empirically investigating model sensitivity to contextual perturbations and understanding dialogue history use. | Fewer categories of models used for testing | Experiments were conducted using more classes of models such as BERT [12], GPT4 [19], and XLNet [14], and they were compared to previous models. |
A bottom-up approach to neural semantic parsing generation [105] | Bottom-Up Decoding, Lazy Extension Build Candidates, and Minimal Cost Boosting Generalization. | The neural encoder parses the input and generates real-valued vectors, and the neural decoder iterates to generate a graphical representation. | (1) The problem of unordered nary relations (2) The problem of user annotation burden | (1) Dealing with unordered nary relations can be done using heuristic search, pruning techniques, optimization algorithms, parallel computing, optimized data structures, and domain knowledge exploitation. (2) Consider semantic parsing as a text-to-text problem and use fine-tuned large-scale language models (e.g., GPT4) to reduce the user annotation burden. |
Previous Context Related Sentence to Executable Form Query Mapping Learning [106] | Mapping the interaction corpus as a formal query, considering historical information. | Attentional coding and decoding models, combined with historical dialogue generation queries, to update contextual reuse history and improve performance. | (1) Long-term dependency problems (2) High computational resource requirements (3) Lack of parallelism (4) Loss of information | (1) Use of more complex loop units (2) Introduce an attention mechanism [80] (3) Consider the use of other model architectures, such as Transformer [11], in combination with other model architectures. (4) Context cutting and segmentation |
Semantic Parsing of Knowledge-Aware Conversations Based on Web Forms [107] | Improve semantic parsing performance by integrating various types of knowledge. | Question encoding, table encoding, controller coordination, column/operator/value prediction, copy actions, and generation of parsed statements. | (1) Long-dependency problems (2) Limited expressiveness (3) Sensitivity to input order (4) Large number of parameters | Use more powerful RNN units, introduce attention mechanisms [80], use stacked recurrent neural networks (SRNNs) [108], and apply parameter sharing, regularization techniques, and pruning algorithms to reduce the number of parameters and decrease model complexity [109]. |
RECODE [110] | Subtree retrieval introduces code examples with explicit references to improve neural code generation. | Dynamic programming retrieves similar sentences, AST subtrees are extracted and aligned with modifications, and their decoding probabilities are boosted, improving code generation accuracy and efficiency. | Limited semantic understanding, ambiguity and data sparsity, contextual incoherence, balancing precision against abstraction, language dependence, limits on the degree of abstraction, complexity and redundancy, lack of contextual information, maintenance difficulties. | Use advanced semantic understanding techniques (semantic role labeling [111], semantic parsing [112]); combine rules with machine learning [113]; apply attention [80] and context encoding [114]; train on large-scale datasets (e.g., CoNaLa [115]); incorporate manual review feedback [116]; adopt flexible AST representations [97]; fuse semantic information [117]; account for context [118]; and use model compression/optimization [119] with dynamic update strategies. |
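To make the idea behind global normalization [97] concrete, the minimal Python sketch below scores a few toy candidate action sequences in two ways: chaining a per-step (local) softmax versus applying a single softmax over whole-sequence scores. The candidate names and scores are invented for illustration, and a real parser normalizes over prefix-dependent alternatives rather than a fixed candidate pool; this is not the implementation of [97].

```python
# Minimal sketch: locally vs. globally normalized scoring of candidate
# grammar-rule sequences. All candidates and scores are toy values.
import math

# Unnormalized per-step scores for three candidate rule sequences.
candidates = {
    "SELECT-col-FROM-tbl": [2.1, 1.7, 0.9],
    "SELECT-agg-FROM-tbl": [2.1, 0.4, 1.2],
    "SELECT-col-WHERE":    [1.3, 1.5, 1.1],
}

def local_log_prob(steps, pool):
    """Locally normalized: softmax over the pool at every step."""
    logp = 0.0
    for t, s in enumerate(steps):
        z = sum(math.exp(seq[t]) for seq in pool)
        logp += s - math.log(z)
    return logp

def global_log_prob(steps, pool):
    """Globally normalized: one softmax over whole-sequence scores."""
    z = sum(math.exp(sum(seq)) for seq in pool)
    return sum(steps) - math.log(z)

pool = list(candidates.values())
for name, steps in candidates.items():
    print(f"{name}: local={local_log_prob(steps, pool):.3f} "
          f"global={global_log_prob(steps, pool):.3f}")
```

Global normalization avoids the label bias introduced by chaining local softmaxes, but the partition function then ranges over the full output space, which is exactly why the table above lists pruning and sampling approximations as remedies for the computational cost.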
Modeling/Methodology | Key Ideas | Approaches | Key Problems | Solutions |
---|---|---|---|---|
A method for mapping statements in a dialog to logical forms [10] | Map dialogue utterances into logical forms composed of sequential actions. | Encoder–decoder architecture with dialogue memory management. | Incomplete context symmetry, increased computation and memory consumption, strong decoding dependencies, limited sequence generation, error propagation, slow training, over-dependence on inputs, poor generalization. | Model performance and generation quality are improved by introducing an attention mechanism [80], using lightweight RNN variants (e.g., simplified GRUs or LightGRU [121]), optimizing the decoder [122], fusing prior knowledge [123], beam search [124], label smoothing [125], and increasing training data [126]. |
Contextual semantic parsing model supported by retrieved data [9] | Retrieve data points as evidence for contextual semantic parsing. | (1) Retriever (2) Meta-learner | Training is time- and memory-consuming, prone to overfitting, poorly interpretable, subject to gradient problems, limited in sequence length, and slow. | Enhance efficiency with faster hardware [127], model downsizing, and lightweight RNNs [121]. Apply regularization [128], dropout [129], data augmentation [100], and model compression [130] for better generalization. Improve interpretability through visualization and clear structures. Mitigate gradient problems with pruning [131] and GRUs [75]. Handle long sequences with attention [80], Transformers [11], and BERT [12]. Accelerate training with batching [132] and parallelism [133]. |
Neural Semantic Parsing with Extremely Rich Symbolic Meaning Representations [134] | Introduce lexical ontologies and novel symbols to enhance the semantic richness and interpretability of neural parsers. | Design of a neural semantic parser in which novel symbols represent predicates, evaluated against traditional methods. | Large language models are not as effective as expected, hierarchical learning is under-explored, categorical encoding needs optimization, and ontologies and similarity measures need to be extended. | (1) Analyze the reasons for model failure in depth and optimize the model architecture and training strategies (2) Explore hierarchical learning methods (3) Simplify the categorical encoding (4) Experiment with different ontologies and similarity measures |
Weakly Supervised Reasoning with Neuro-Symbolic Methods [135] | Fuse neural networks with symbolic logic to enhance interpretable reasoning in NLP and open up deep learning's black box. | Build neuro-symbolic systems that integrate explicit symbolic reasoning, optimized with reinforcement learning for weakly supervised reasoning in NLP. | Training is expensive, reasoning patterns must be designed manually, and complex semantic relationships are not handled adequately. | (1) Optimize training algorithms (2) Explore techniques for automatically discovering reasoning patterns (3) Introduce more advanced semantic understanding and multimodal fusion methods |
Neuro-symbolic learning yielding logical constraints [136] | Integrate neural networks with symbols, building a neuro-symbolic system for weakly supervised learning and enhancing AI capability. | Train the network and logical constraints in parallel, use difference-of-convex programming for accuracy, and apply trust regions to prevent degradation, stabilizing neuro-symbolic learning. | End-to-end learning is complex and resource-intensive, and logic-constraint learning suffers from degradation. | (1) Optimize the framework design (2) Explore efficient algorithms (3) Apply trust-region techniques together with language models to reduce cost, improve efficiency, and prevent degradation, advancing neuro-symbolic learning (see the sketch following this table) |
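To illustrate how a symbolic constraint can become a differentiable training signal, as discussed in the neuro-symbolic rows above, the sketch below implements a simple semantic-loss-style penalty for the constraint "exactly one output variable is true". This generic construction is assumed for illustration only; it is not the difference-of-convex formulation of [136].

```python
# Minimal sketch of a differentiable "logic loss" for the constraint
# "exactly one of k Bernoulli outputs is true" (illustrative only).
import math

def exactly_one_prob(p):
    """P(exactly one x_i = 1) for independent x_i ~ Bernoulli(p_i)."""
    total = 0.0
    for i, pi in enumerate(p):
        term = pi
        for j, pj in enumerate(p):
            if j != i:
                term *= 1.0 - pj
        total += term
    return total

def logic_loss(p):
    # Penalize the network when the constraint is unlikely to hold;
    # in training, this term is added to the usual task loss.
    return -math.log(max(exactly_one_prob(p), 1e-12))

print(logic_loss([0.9, 0.05, 0.05]))  # small: constraint nearly satisfied
print(logic_loss([0.5, 0.5, 0.5]))    # larger: constraint often violated
```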
Models | Acc./Prec./Rec./F1/BLEU | Task
---|---|---|
LPDQUILP [28] | Acc. is 84%. | In the best experiments, the CHILL-induced parser, comprising 1100 lines of Prolog code, achieves 84% accuracy in responding to new queries. |
Dynamic CCG Adaptation for Logical Parsing [33] | Prec./Rec./F1 were 90.61%/81.92%/85.06% and 95.49%/83.2%/88.93%, respectively. | Single-pass parsing achieved Prec., Rec., and F1 of 90.61%, 81.92%, and 85.06% on the ATIS test set, and 95.49%, 83.2%, and 88.93% on the Geo880 test set. |
DCS [38] | Acc. of 87.6% and 95%. | LJK11 w/ augmented triggers achieved 87.6% accuracy on the GEO test set and answered 95% of questions accurately on the JOBS test set. |
Semantic Parsing of Semi-Structured Tables [44] | Acc. of 37.1%; oracle 76.6%. | The model achieves 37.1% accuracy on the WIKITABLE-QUESTION test data, compared with 76.6% for the oracle. |
LCDMSLF [5] | Prec./Rec./F1 were 95%/96.5%/95.7%; Acc. was 83.7%. | The model achieved partial-match Prec., Rec., and F1 of 95%, 96.5%, and 95.7% on the ATIS DEC94 test set, fully recovering 83.7% of the logical forms. |
Neural Semantic Parsing Methods via Feedback [7] | Accuracies were 79.24% and 82.5%, respectively. | The model was tested on the ATIS and Geo datasets with accuracies of 79.24% and 82.5%, respectively. |
Rapidly building a semantic analysis method [56] | 56.4% accuracy. | The parser in this paper achieves 56.4% accuracy on GEO880. |
A memory-based approach to semantic parsing [64] | Query, strict-representation, and loose-representation accuracies were 45.3%, 70.2%, and 69.8%; query and interaction accuracies were 28.4% and 6.2%, and 40.3% and 16.7%, respectively. | The MenCE model with the Men-SnipCopy encoder–decoder achieved ATIS query, strict-representation, and loose-representation accuracies of 45.3%, 70.2%, and 69.8%, respectively. MenCE + Liu et al. (2020) [93] with Men-Grammar achieved query and interaction accuracies of 28.4% and 6.2% on the CoSQL test set, and 40.3% and 16.7% on the SparC test set, respectively. |
Data Restructuring for Neural Semantic Parsing [39] | Acc. of 89.3%, 83.3%, and 77.5%. | Model AWP + AE + C2 achieved 89.3% accuracy on the GEO dataset, model AE + C3 achieved 83.3% on the ATIS dataset, and model AWP + AE + C2 averaged 77.5% accuracy on OVERNIGHT. |
Enhanced Encoder–Decoder Semantic Parsing with Attention Mechanism [8] | Accuracies were 90%, 87.1%, and 84.6%, respectively; F1 was 74.2%. | Model SEQ2TREE achieved accuracies of 90%, 87.1%, and 84.6% on the JOBS, GEO, and ATIS test sets, respectively, and an F1 of 74.2% on the IFTTT test set. |
Neural Semantic Parsing for Semi-Structured Tables under Type Constraints [6] | Acc. of 84.2% and 84.6%; F1 of 84.1% and 86%. | The model in this paper achieves 42.7% accuracy on the WIKITABLEQUESTIONS validation set. |
Iterative Dialogue Segmentation for Neural Semantic Parsing [2] | Accuracies were 90.7%, 85.4%, and 72.2%, respectively. | Model SEQ2SEQ + PDE achieved 90.7% accuracy on the GEO test set and 85.4% on the FORMULAS test set; model BASEPARSER2 (TRANSFORMER + COPY *) + PDE achieved 72.2% on COMPLEXWEBQUESTIONS. |
Syntax-based Decoding Semantic Parser [93] | Question- and interaction-matching accuracies on the development sets were 52.6% and 41%, and 29.9% and 14%, respectively. | Ours + BERT achieved question- and interaction-matching accuracies of 52.6% and 41% on the SparC development set, and 29.9% and 14% on the CoSQL development set, respectively. |
Global Normalized Neural Models for Semantic Parsing [97] | Accuracies were 83.3% and 73.79%; the BLEU score was 28.39%. | Copy + data recombination achieves 83.3% accuracy on the ATIS test set. Ours (local) + Reranking achieves a BLEU score of 28.39% on the CoNaLa test set. Ours (local) achieves 73.79% accuracy on the Spider test set. |
Edit-Driven SQL Generation for Cross-Domain Contextual Queries [101] | Accuracies were 53.4%, 47.9%, 25.3%, 43.9%, 68.5%, and 68.1%. | Model Ours + utterance-table BERT Embedding achieves 53.4% accuracy on the Spider test data. Model Ours + query attention and sequence editing (w/ gold query) achieves 47.9% and 25.3% accuracy for question and interaction matching on the SParC test set, respectively. This paper's method achieves 43.9%, 68.5%, and 68.1% accuracy for query, strict representation, and loose representation on the ATIS test set, respectively. |
Bottom-up Neural Semantic Parsing Generation [105] | Accuracies were 38.3%, 81.3%, 25.1%, and 86.4%. | The accuracy of LSP + T5-base is 38.3%, 81.3%, and 25.1% on ATIS, Geoquery, and Scholar, respectively. The accuracy of model LSP + LSTM Encoder on Geoquery is 86.4%. |
Learning Sentence-to-Formal Query Mappings [106] | Accuracies were 47.4 ± 1.3, 72.3 ± 0.5, and 72.0 ± 0.5, respectively. | The Full-GOLD model achieved query, strict-representation, and relaxed-representation accuracies of 47.4 ± 1.3, 72.3 ± 0.5, and 72.0 ± 0.5, respectively. |
Knowledge-Aware Conversational Semantic Parsing for Web Forms [107] | Accuracies were 45.5%, 13.2%, 70.3%, 42.6%, and 24.8%, respectively. | The accuracies of CAMP + TU + LM for ALL, SEQ, POS 1, POS 2, and POS 3 were 45.5%, 13.2%, 70.3%, 42.6%, and 24.8%, respectively. |
RECODE [110] | Acc. and BLEU were 19.6% and 72.8%, and 78.4% and 84.7%, respectively. | RECODE's Acc. and BLEU were 19.6% and 72.8% on HS, and 78.4% and 84.7% on Django, respectively. |
Dialogue-to-Logical Form Mapping Method [10] | Rec./Prec. were 25.09%/12.13%, 0.01%/0.01%, and 19.36%/17.36%; Acc. was 21.04%, 20.38%, and 45.05%. | For the Clarification question type, the HRED + KVmem, ContxIndp-SP, and D2A methods achieved Rec./Prec. of 25.09%/12.13%, 0.01%/0.01%, and 19.36%/17.36%, respectively. For the Verification (Boolean) question type, their accuracies were 21.04%, 20.38%, and 45.05%, respectively. |
Retrieval-Augmented Contextual Semantic Parsing [9] | Exact/BLEU were 9.15%/23.34% and 10.50%/24.40%; F1 was 16.35%, 18.31%, 18.90%, 18.42%, 18.70%, and 19.12%; Acc. was 21.04%, 45.05%, 51.17%, 47.81%, 55.00%, and 50.16%. | Without retrieved examples, Seq2Action achieved 9.15% Exact and 23.34% BLEU on the CONCODE test set; with context-aware retrieval, Seq2Action + MAML achieved 10.50% Exact and 24.40% BLEU. On the CSQA test set, the models HRED + KVmem, D2A, S2A, S2A + EditVec, EditVec + RAndE, and RAndE + MAML achieved F1 values of 16.35%, 18.31%, 18.90%, 18.42%, 18.70%, and 19.12% on the Clarification question type, and accuracies of 21.04%, 45.05%, 51.17%, 47.81%, 55.00%, and 50.16% on the Verification (Boolean) question type, respectively (a sketch of computing exact match and BLEU follows this table). |
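Most rows above report exact-match accuracy and/or BLEU. The short sketch below shows one common way to compute both over predicted logical forms. The gold and predicted strings are invented examples, the crude tokenizer stands in for the benchmark-specific normalization each paper applies, and the smoothed sentence-level BLEU assumes the nltk package.

```python
# Sketch: exact-match accuracy and average sentence BLEU for predicted
# logical forms (toy data; real benchmarks normalize forms first).
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

gold = ["answer(state(loc_2(city(cityid('austin', _)))))",
        "answer(count(city(loc_2(stateid('texas')))))"]
pred = ["answer(state(loc_2(city(cityid('austin', _)))))",
        "answer(city(loc_2(stateid('texas'))))"]

def tok(s):
    # Crude tokenizer: pad parentheses with spaces, then split.
    return s.replace("(", " ( ").replace(")", " ) ").split()

# Exact match: a prediction counts only if the whole string matches.
acc = sum(p == g for p, g in zip(pred, gold)) / len(gold)

# Smoothed sentence-level BLEU, averaged over the examples.
smooth = SmoothingFunction().method1
bleu = sum(sentence_bleu([tok(g)], tok(p), smoothing_function=smooth)
           for p, g in zip(pred, gold)) / len(gold)

print(f"exact-match acc = {acc:.2f}, avg BLEU = {bleu:.3f}")
```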
API | Provider | Supported Languages | Interface Type | Application Scenarios | Advantages | Drawbacks
---|---|---|---|---|---|---|
BosonNLP (Boson Data) | Boson Data | Mainly supports Chinese | RESTful API, based on the HTTP protocol | E-commerce sentiment analysis, news categorization, and more. | Leading performance in Chinese word segmentation; provides a full range of solutions from basic to advanced text analysis. | Parsing of specialized terminology is insufficient; some advanced features require payment. |
gugudata.com | gugudata.com | Multilingual | RESTful API, supports the HTTPS protocol | Suitable for text similarity detection, content recommendation, and similar scenarios. | Second-level response times, semantically accurate NLP algorithms, and a continuously updated, guaranteed service. | Pay-per-use; domain-specific text pre-processing requires customized services. |
Baidu AI Open Platform | Baidu | Multilingual | RESTful API, based on the HTTP protocol | Suitable for NLP tasks such as text classification, lexical analysis, word sense disambiguation, and entity recognition. | Baidu leverages a large corpus and mature NLP technology to provide rich APIs that simplify developer integration. | An API key is required; usage restrictions apply, and complex tasks must be combined with other technologies. |
Tencent Cloud | Tencent | Multilingual | RESTful API interface | Text categorization, entity recognition, sentiment analysis, etc. | Tencent leverages its user data and corpus to provide a stable, easy-to-integrate API. | Premium features are fee-based, and domain-specific customization may require additional development. |
Azure | Microsoft | Multilingual | RESTful API interface | Language understanding, text analysis, text generation, etc. | Microsoft provides diverse NLP API services backed by a global user base. | Premium features are fee-based and must be combined with other tools for specific needs. |
Google Cloud NLP API | Google Cloud | Multilingual | RESTful API | Entity recognition, sentiment analysis, text classification, etc. | Google's NLP technology is powerful, and the API is rich and easy to use. | Premium features are fee-based and limited in some regions. |
IBM Watson Natural Language Understanding | IBM | Multilingual | RESTful API interface | Text analytics, sentiment analysis, entity recognition, etc. | IBM provides rich NLP-related APIs built on its deep AI expertise. | Premium features are fee-based; specific needs require additional configuration or development. |
Lingju Semantic Understanding API | Lingju | Multilingual | HTTP online service interface | Semantic analysis, Q&A systems, and voice assistants | Provides semantic understanding analytics and supports private deployment. | May require paid access or customized services (a generic invocation sketch follows this table). |
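Because every interface above is exposed as a RESTful HTTP service, client-side integration looks much the same across providers. The sketch below shows the generic pattern with Python's requests library; the endpoint URL, payload fields, and response shape are hypothetical placeholders, not any listed vendor's documented contract.

```python
# Generic sketch of calling a RESTful semantic-analysis API.
# Endpoint, payload fields, and response schema are hypothetical.
import requests

API_URL = "https://api.example-nlp.com/v1/semantic-parse"  # placeholder
API_KEY = "YOUR_API_KEY"  # the providers above generally require a key

def parse_text(text: str, lang: str = "zh") -> dict:
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"text": text, "lang": lang},  # hypothetical payload
        timeout=10,
    )
    resp.raise_for_status()  # surface quota/authentication errors
    return resp.json()

if __name__ == "__main__":
    print(parse_text("列出德州所有的城市"))  # "List all cities in Texas"
```

In practice, providers differ mainly in authentication, rate limits, and response schemas, which is why the drawbacks column repeatedly flags paid tiers and the need for provider-specific development.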
LLMs/Dataset | Geoquery | ATIS | Geo880 | JOBS | WIKITABLE-QUESTION | SParC | CoSQL | OVERNIGHT | IFTTT | WIKITABLEQUESTIONS | COMPLEXWEBQUESTIONS | FORMULAS | CoNaLa | Spider | CFQ | Geoquery | Scholar | SequentialQA | Hearthstone | Django | CSQA | CONCODE | CSQA | Evaluation Algorithm
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
LPDQUILP [28] | Acc. | |||||||||||||||||||||||
Dynamic CCG Adaptation for Logical Parsing [33] | Prec./Rec./F1. | |||||||||||||||||||||||
DCS [38] | Acc. | |||||||||||||||||||||||
Semantic Parsing of Semi-Structured Tables [44] | Acc. | |||||||||||||||||||||||
LCDMSLF [5] | Prec./Rec./F1/Acc. | |||||||||||||||||||||||
Neural Semantic Parsing Methods via Feedback [7] | Acc. | |||||||||||||||||||||||
Rapidly building a semantic analysis method [56] | Acc. | |||||||||||||||||||||||
A memory-based approach to semantic parsing [64] | Acc. | |||||||||||||||||||||||
Data Restructuring for Neural Semantic Parsing [39] | Acc. | |||||||||||||||||||||||
Enhanced Encoder–Decoder Semantic Parsing with Attention Mechanism [8] | Acc./F1 | |||||||||||||||||||||||
Neural Semantic Parsing for Semi-Structured Tables under Type Constraints [6] | Acc. | |||||||||||||||||||||||
Iterative Dialogue Segmentation for Neural Semantic Parsing [2] | Acc. | |||||||||||||||||||||||
Syntax-based Decoding Semantic Parser [93] | Acc. | |||||||||||||||||||||||
Global Normalized Neural Models for Semantic Parsing [97] | Acc./BLEU |||||||||||||||||||||||
Edit-Driven SQL Generation for Cross-Domain Contextual Queries [101] | Acc. | |||||||||||||||||||||||
Bottom-up Neural Semantic Parsing Generation [105] | Acc. | |||||||||||||||||||||||
Learning Sentence-to-Formal Query Mappings [106] | Acc. | |||||||||||||||||||||||
Knowledge-Aware Conversational Semantic Parsing for Web Forms [107] | Acc. | |||||||||||||||||||||||
RECODE [110] | Acc./BLEU |||||||||||||||||||||||
Dialogue-to-Logical Form Mapping Method [10] | Prec./Rec./Acc. | |||||||||||||||||||||||
Retrieval-Augmented Contextual Semantic Parsing [9] | BLEU/F1/Acc. |||||||||||||||||||||||
Size | 250 | 5410 | 880 | 640 | 22,033 | 4298; 12k+ | 30k+, 10k+ | - | 86,960 | 18,496 | - | 37,000 | 2379 | 8695 | - | - | - | 6066 | 665 | 18,805 | 196K | - | - | - |
Source | Qn | telephone call recordings | Ray Mooney | Ray Mooney | Wikipedia | Yale | Wizard-of-Oz method | Amazon crowdsourcing platform | IFTTT | - | Other | Codes | Other | - | - | - | - | - | - | - | - | - |
Domains | Time/Author | Main Contributions | Limitation | Future Research Directions |
---|---|---|---|---|
Medicine | (2017, Roberts K, et al.) [137] | Fusing rule-based and machine learning approaches, this work proposes a method for mapping EHR data to logical forms that achieves 95.6% parsing precision, enhancing system utility, reliability, and handling of unknown terms in support of medical Q&A systems. | Semantic parsing research has made a breakthrough, but the question set needs to be expanded, the applicability and integration of the system improved, and generalization strengthened and verified in future work. | Extend the question set, optimize the semantic model, and improve generalization; build an end-to-end Q&A system, validate it in real environments, and facilitate medical information integration and service applications. |
Computers | (2023, Espejel J L, et al.) [138] | An overview of deep learning applications in Java code generation, analyzing advantages and disadvantages, examining evaluation practices, and outlining research directions. | Java code generation is evaluated mainly on syntax while ignoring semantics; heavy resource requirements limit application; and time-consuming training remains a challenge. | Develop semantic evaluation metrics, research efficient models, optimize training, support cross-language generation, and drive innovation in programming and software development. |
(2021, Lee C, et al.) [23] | An overview of NLP applications in software engineering, exploring advances in natural language to programming language conversion and deep semantic parsing techniques. | Semantic parsing evaluation methods are limited, do not adequately assess deep semantics, and are prone to misclassification; the lack of unified code semantics hinders cross-language understanding. | In the future, a semantic inference evaluator needs to be developed to unify the semantic representation of the code and extend the semantic parsing to optimize the user interaction and improve the overall semantic parsing capability. | |
(2021, Shin J, et al.) [139] | This paper provides an in-depth overview of natural language programming (NLPG), explores natural language to source code conversion, and proposes customized models, optimized representations aiming to improve code generation accuracy and efficiency and enhance programming capabilities. | Automatic code generation research needs to balance naturalness, adaptability, and completeness, and although machine learning helps, the limitations of existing methods make it difficult to satisfy them all. | Deep learning combined with statistical modelling, pre-training corpus fine-tuning, optimizing automated code, improving naturalness and adaptability, and driving technological innovation. | |
Cross-application areas of NLP and Database Management (DBMS) | (2022, Qin B, et al.) [140] | This paper reviews advances in deep learning for text-to-SQL parsing, covering datasets, pre-training model approaches, pointing out challenges, exploring future directions, and informing research. | Insufficient datasets limit scenario performance; optimization challenges in pre-training techniques; insufficient model capacity to handle complex queries. | Text-to-SQL research focuses on diverse data, pre-training techniques, cross-language parsing, and combining domain knowledge to broaden applications and promote technology development. |
(2022, Deng N, et al.) [141] | This paper reviews Text-to-SQL research, covering datasets, methods, and evaluations, summarizing challenges and strategies, pointing out shortcomings, and exploring future directions. | Text-to-SQL systems perform well on benchmarks, but performance drops in cross-domain real-world applications; robustness needs improvement, and noise has a notable impact. | Text-to-SQL research should address cross-domain settings, improve robustness, explore prompt learning, and innovate in evaluation to promote practical application of the technology. |
(2021, Ahkouk K et al.) [142] | This paper proposes the NLIDB framework for transforming natural language into SQL, simplifying database access for non-technical users, broadening the scope of use, and improving the experience. | Current frameworks open SQL querying to non-technical users, but complex query handling is weak, and input sensitivity and poor generalization affect accuracy and robustness. | Research should improve the framework's NLP processing, robustness, and accuracy, explore new NLP techniques, optimize the user experience, and promote efficient database services. |
Economics | (2024, Noor S.) [143] | This paper analyzes the reasons for slow progress in Knowledge Base Question Answering (KBQA) research, points out schema and factual complexity, discusses the limitations of existing methods, and highlights the underutilization of semantic parsing results. | The SPICE dataset contributes to dialogue semantic parsing but is limited in size, coverage, and representation, and lacks baseline model evaluation. | Research should extend SPICE's scale and coverage, delve into linguistic phenomena and cross-language applications, develop advanced models, and enhance dialogue comprehension and interaction. |
(2020, Fu B., et al.) [54] | KBQA progresses on complex Chinese problems, combines information retrieval and neural semantic parsing, looks at future directions, and demonstrates team results. | KBQA systems deal with complex problems and face interpretive, reasoning, knowledge base data and large-scale real-time efficiency challenges. | KBQA research will improve interpretability, enhance question handling, expand the knowledge base, optimize queries, explore multimodality and provide accurate answers. | |
Industry | (2023, Vougiouklis P, et al.) [144] | This paper introduces the FastRAT model, which significantly improves Text-to-SQL decoding speed and cross-language performance with a frameless decoder and multilingual pre-training. | The FastRAT model decodes effectively, but the deterministic design limits SQL coverage, some queries are difficult to decode, and the baseline assessment may be incomplete. | The research will extend FastRAT to support full SQL, enhance decoding, introduce baseline comprehensive evaluation, and improve cross-linguistic capabilities. |
(2024, Zhang W, et al.) [145] | The ECLIPSE system fuses template matching with a large language model to optimize cross-language industrial log parsing, and launches the ECLIPSE-BENCH benchmark. | ECLIPSE faces challenges with extreme logging and large-scale data, and its cross-language capability is limited by language model support. | Research will explore advanced algorithmic models, optimize index storage, extend cross-language support, and validate ECLIPSE's industrial utility. |
(2023, Yang Y, et al.) [146] | The LogParser framework applies deep learning to log parsing in industrial ICT systems (IICTSs), significantly improving accuracy and efficiency and supporting safe, reliable production. | LogParser is expensive to train, requires customization for specific logs and regular updates to adapt to changes, and must cope with large-scale diversity. | Research will optimize deep learning models, improve generalization, incorporate tools to enhance analysis, automate updates, and extend LogParser's applications. |
Agriculture | (2024, Yuan W, et al.) [147] | The AMR-OPO framework combines the BERT-BiLSTM-ATT-CRF-OPO model to transform agricultural users' language into triples, enhancing the interaction and experience of agricultural measurement and control systems. | The results are promising, but data scale, cross-domain validation, and real-time performance still need optimization and further study. | Research will explore AMR-OPO across domains, optimize algorithms to improve real-time performance, iterate on user experience, and expand data size and diversity. |