Automated Conditional Statements Checking for Complete Natural Language Requirements Specification
Abstract
1. Introduction
2. Related Work
3. The Approach
3.1. Conditional Statements Extraction
3.2. Conditional Statements Clustering
3.2.1. The Embedding-Based Clustering
- Tokenization: break down statements into words;
- Part of Speech (POS) tagging: label words with known lexical categories;
- Word selection: keep only the verbs, nouns, and adjectives;
- Stop-word removal: remove commonly used words;
- Lemmatization: reduce each word to its base form.
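The five preprocessing steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: `TOY_POS`, `STOP_WORDS`, and `TOY_LEMMAS` are hypothetical hand-written stand-ins, not resources from the paper. In practice they would be replaced by a real tokenizer, POS tagger, stop-word list, and lemmatizer, for example those shipped with NLTK, which the implementation table lists.

```python
import re

# Hypothetical toy resources (assumptions for illustration only).
TOY_POS = {
    "when": "ADV", "the": "DET", "and": "CONJ",
    "engine": "NOUN", "pressure": "NOUN",
    "is": "VERB", "running": "VERB", "high": "ADJ",
}
STOP_WORDS = {"when", "the", "and", "is"}
TOY_LEMMAS = {"running": "run", "is": "be"}
KEPT_TAGS = {"VERB", "NOUN", "ADJ"}


def preprocess(statement):
    """Return the content words of a conditional statement in base form."""
    # 1. Tokenization: break the statement into lowercase words.
    tokens = re.findall(r"[a-z]+", statement.lower())
    # 2. POS tagging: unknown words default to NOUN in this toy tagger.
    tagged = [(tok, TOY_POS.get(tok, "NOUN")) for tok in tokens]
    # 3. Word selection: keep only verbs, nouns, and adjectives.
    selected = [tok for tok, tag in tagged if tag in KEPT_TAGS]
    # 4. Stop-word removal: drop commonly used words.
    content = [tok for tok in selected if tok not in STOP_WORDS]
    # 5. Lemmatization: map each word to its base form.
    return [TOY_LEMMAS.get(tok, tok) for tok in content]
```

For the statement "When the engine is running and the pressure is high", this sketch keeps `engine`, `run`, `pressure`, and `high`.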
3.2.2. The Subject-Based Grouping
3.2.3. Compound Statement Splitting
- Shared subjects: the statement is composed of two or more independent conditional clauses connected by coordinating conjunctions, but the subject is omitted in the second and subsequent clauses;
- Non-shared subjects: the statement comprises two or more independent conditional clauses connected by coordinating conjunctions, and each clause has its own subject.
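A minimal sketch of how the two cases above might be handled is given below. It is not the authors' implementation: the `AUXILIARIES` set is a hypothetical cue for locating the verb, and splitting on every "and"/"or" is a simplification that would also split conjunctions inside noun phrases. A clause that starts with a verb is treated as the shared-subject case and inherits the subject of the preceding clause.

```python
import re

# Hypothetical verb cue used to locate where the subject ends (an assumption).
AUXILIARIES = {"is", "are", "was", "were", "has", "have"}


def split_compound(statement):
    """Split a compound conditional statement into simple ones."""
    match = re.match(r"(?i)^(when|if|while)\s+(.+)$", statement.strip())
    if not match:
        return [statement.strip()]
    keyword, body = match.group(1), match.group(2)
    # Split the body on coordinating conjunctions.
    clauses = re.split(r"\s+(?:and|or)\s+", body)
    result, subject = [], None
    for clause in clauses:
        words = clause.split()
        verb_at = next(
            (i for i, w in enumerate(words) if w.lower() in AUXILIARIES), None
        )
        if verb_at == 0 and subject:
            # Shared subject: the clause starts with a verb, so reuse the subject.
            clause = f"{subject} {clause}"
        elif verb_at:
            # Non-shared subject: remember this clause's own subject.
            subject = " ".join(words[:verb_at])
        result.append(f"{keyword} {clause}")
    return result
```

With this sketch, "When the door is open and is locked" yields "When the door is open" and "When the door is locked", while "When the door is open or the alarm is active" yields two statements that each keep their own subject.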
3.3. Requirements Incompleteness Detection
3.3.1. The Negative Particles-Based Detection
3.3.2. The Antonyms-Based Detection
4. Empirical Evaluation
4.1. Research Questions (RQs)
4.2. Study Design
4.2.1. The Datasets
4.2.2. The Implementation
4.2.3. The Metrics
4.3. Descriptions of the Studies
4.3.1. Study I: Selection of Sentence Embedding Methods
- Word2Vec+TF-IDF: Word2Vec [44] maps each word in the dataset to a vector, while TF-IDF [45] assigns a score to each word. Combining the two yields weighted representations of the conditional statements. We use the implementations of Word2Vec and TF-IDF in the Gensim library, with the pre-trained Word2Vec model from Google [46].
- Doc2Vec: Doc2Vec [47] is an unsupervised algorithm that learns fixed-length feature representations from sentences. We use the implementation of Doc2Vec in the Gensim library.
- BERT: BERT [48] is an unsupervised, deeply bidirectional approach to pre-training language representations. The implementation of BERT in Anaconda is used.
- Skip-Thought: Skip-Thought is an unsupervised algorithm that provides a general distributed sentence encoder. We downloaded the source code of Skip-Thought from GitHub [49] and used the encoder of the pre-trained model to generate the representations of the conditional statements. Skip-Thought requires Python 2.7.
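To illustrate the first option in the list above, the sketch below computes a TF-IDF-weighted average of word vectors for each statement. The exact combination rule is not spelled out here, so the weighted average and the smoothed IDF formula are assumptions, and the `vectors` argument is a hypothetical stand-in for a pre-trained Word2Vec lookup.

```python
import math
from collections import Counter


def tfidf_weighted_embeddings(docs, vectors):
    """Embed each tokenized statement as a TF-IDF-weighted mean of word vectors.

    docs: list of token lists; vectors: dict mapping word -> tuple of floats.
    """
    n_docs = len(docs)
    # Document frequency: in how many statements each word occurs.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    dim = len(next(iter(vectors.values())))
    embeddings = []
    for doc in docs:
        counts = Counter(doc)
        total = [0.0] * dim
        weight_sum = 0.0
        for word, count in counts.items():
            if word not in vectors:
                continue  # skip out-of-vocabulary words
            tf = count / len(doc)
            # Smoothed IDF (an assumption) so that no weight becomes zero.
            idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0
            weight = tf * idf
            weight_sum += weight
            for k, value in enumerate(vectors[word]):
                total[k] += weight * value
        if weight_sum > 0:
            total = [value / weight_sum for value in total]
        embeddings.append(total)
    return embeddings
```

Words that occur in fewer statements receive a larger weight, so they pull the statement vector more strongly toward their own direction than words shared by most statements.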
4.3.2. Study II: Selection of Clustering Methods
- K-Means: K-Means is a commonly used clustering algorithm based on Euclidean distance. We use the K-Means implementation in Scikit-learn. The number of clusters is set to 12 according to the sum of squared errors (SSE).
- Spectral Clustering: Spectral clustering [51] is based on graph theory. We use the implementation of spectral clustering provided by Scikit-learn. The number of clusters is also set to 12.
- Agglomerative Clustering: Agglomerative clustering [52] is a hierarchical clustering approach. We use the implementation of agglomerative clustering provided by Scikit-learn. The number of clusters is also set to 12.
- DBSCAN: DBSCAN [53] is a density-based clustering approach that does not require the number of clusters to be specified. We use the implementation of DBSCAN provided by Scikit-learn.
- Affinity Propagation: Affinity propagation [54] is a graph-based clustering method that does not require the number of clusters to be specified. We use the implementation of affinity propagation provided by Scikit-learn.
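The K-Means item above chooses the number of clusters from the SSE. The sketch below shows the idea with a plain-Python Lloyd's algorithm, used here only as a dependency-free stand-in for the Scikit-learn implementation (where the fitted `KMeans` object exposes the same quantity as `inertia_`). The deterministic initialization and the fixed iteration count are simplifying assumptions.

```python
import math


def kmeans_sse(points, k, iterations=20):
    """Run a basic K-Means and return the sum of squared errors (SSE)."""
    # Deterministic initialization: pick k points spread over the input order.
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda c: math.dist(point, centers[c]))
            clusters[nearest].append(point)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    # SSE: squared Euclidean distance from each point to its closest center.
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)


def sse_curve(points, max_k):
    """SSE for k = 1..max_k; the 'elbow' of this curve suggests a value for k."""
    return {k: kmeans_sse(points, k) for k in range(1, max_k + 1)}
```

On data with two well-separated groups, the SSE drops sharply from k = 1 to k = 2 and only slightly afterwards, which is the kind of elbow that motivates a particular choice of k.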
4.3.3. Study III: The Performance of the Proposed Approach
4.4. Results and Analysis
4.4.1. RQ1: What Is the Optimal Solution for Each Machine Learning Module?
4.4.2. RQ2: How Well Does Our Approach Perform?
- Template non-conformance: The proposed approach finds opposite conditions by using negative particles and antonyms. Consequently, when the opposite of a conditional statement is expressed without a negative particle or an antonym, it will not be found. For example, for the conditional statement “When there are GPS events to report”, it is difficult for our approach to find its opposite “When there are none GPS events”.
- Unknown words: We found that many proper nouns are used in the requirements specification. In most cases, they appear as abbreviations. Part-of-speech tagging often achieves low accuracy on these words, which affects the grouping of the conditional statements.
- Incorrect keywords: When specifying requirements, analysts may use some words to name objects such as a button. For example, in the conditional statement “When the screen prompts ‘DORMANCY LIGHT: ON’”, the word DORMANCY refers to a light and not to a state. In this case, it is incorrect to detect incompleteness by finding the antonym of this word. However, our approach cannot identify these cases, which results in some wrongly reported detections.
- Ill-formed sentences: After clustering the conditional statements, the proposed approach groups them according to their subjects. In practice, however, the subject may be missing and the statement incomplete. For example, in the conditional statement “When transmitted”, the subject is clearly missing. This lowers the precision of the proposed approach.
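The first limitation can be made concrete with a small sketch of opposite-condition matching. It is an illustration only: the particle list, the rule of inserting "not" after "is"/"are", and the exact-match comparison are assumptions, and the three antonym pairs are a small sample of the kind listed in the antonym table. Statements for which no opposite is found are reported as candidates for incompleteness.

```python
import re

NEGATIVE_PARTICLES = {"not", "no", "never"}  # assumed particle list
ANTONYMS = {"valid": "invalid", "accept": "decline", "lock": "unlock"}
ANTONYMS.update({right: left for left, right in list(ANTONYMS.items())})


def normalize(statement):
    """Lowercase a statement and split it into words."""
    return re.findall(r"[a-z]+", statement.lower())


def candidate_opposites(statement):
    """Generate word sequences that would express the opposite condition."""
    words = normalize(statement)
    candidates = set()
    # Negative-particle-based: drop an existing particle or insert "not".
    if any(w in NEGATIVE_PARTICLES for w in words):
        candidates.add(tuple(w for w in words if w not in NEGATIVE_PARTICLES))
    else:
        for i, w in enumerate(words):
            if w in {"is", "are"}:
                candidates.add(tuple(words[: i + 1] + ["not"] + words[i + 1:]))
    # Antonym-based: swap one word for its antonym.
    for i, w in enumerate(words):
        if w in ANTONYMS:
            candidates.add(tuple(words[:i] + [ANTONYMS[w]] + words[i + 1:]))
    return candidates


def statements_without_opposite(statements):
    """Return the statements whose opposite condition is not in the group."""
    seen = {tuple(normalize(s)) for s in statements}
    return [s for s in statements if not (candidate_opposites(s) & seen)]
```

In this sketch, "When the signal is valid" and "When the signal is invalid" match each other through the antonym pair, whereas "When there are GPS events to report" would still be reported even if "When there are none GPS events" were present, because neither a particle swap nor an antonym swap produces that wording.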
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Brooks, F.P. No Silver Bullet: Essence and Accidents of Software Engineering. Computer 1987, 20, 10–19.
- Davies, J.; Woodcock, J. Using Z: Specification, Refinement, and Proof; Prentice Hall International: Hoboken, NJ, USA, 1996.
- Peterson, J.L. Petri Nets. ACM Comput. Surv. 1977, 9, 223–252.
- Group, O.M. UML Resource Page. 2021. Available online: http://www.uml.org/ (accessed on 26 March 2021).
- Holt, J. UML for Systems Engineering: Watching the Wheels; IET: London, UK, 2004; Volume 4.
- Group, O.M. Official OMG SysML Site. 2021. Available online: http://www.omgsysml.org/ (accessed on 27 March 2021).
- Alexander, I.F.; Maiden, N. Scenarios, Stories, Use Cases: Through the Systems Development Life-Cycle, 1st ed.; Wiley Publishing: Hoboken, NJ, USA, 2004.
- Alexander, I.F.; Beus-Dukic, L. Discovering Requirements: How to Specify Products and Services; John Wiley & Sons: Hoboken, NJ, USA, 2009.
- Luisa, M.; Mariangela, F.; Pierluigi, N.I. Market Research for Requirements Analysis Using Linguistic Tools. Requir. Eng. 2004, 9, 40–56.
- Da Silva, A.R.; Savić, D. Linguistic Patterns and Linguistic Styles for Requirements Specification: Focus on Data Entities. Appl. Sci. 2021, 11, 4119.
- Schwitter, R. Controlled Natural Languages for Knowledge Representation. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, Beijing, China, 23–27 August 2010; Association for Computational Linguistics: Stroudsburg, PA, USA, 2010; pp. 1113–1121.
- De Roeck, A.; Willis, A.; Nuseibeh, B.; Chantree, F. Identifying Nocuous Ambiguities in Natural Language Requirements. In Proceedings of the 14th IEEE International Requirements Engineering Conference, Minneapolis, MN, USA, 11–15 September 2006; Computer Society: Los Alamitos, CA, USA, 2006; pp. 59–68.
- Gervasi, V.; Zowghi, D. Reasoning about Inconsistencies in Natural Language Requirements. ACM Trans. Softw. Eng. Methodol. 2005, 14, 277–330.
- Arora, C.; Sabetzadeh, M.; Briand, L.; Zimmer, F. Automated Checking of Conformance to Requirements Templates Using Natural Language Processing. IEEE Trans. Softw. Eng. 2015, 41, 944–968.
- Arora, C.; Sabetzadeh, M.; Briand, L.; Zimmer, F. Automated Extraction and Clustering of Requirements Glossary Terms. IEEE Trans. Softw. Eng. 2017, 43, 918–945.
- Misra, J. Terminological inconsistency analysis of natural language requirements. Inf. Softw. Technol. 2016, 74, 183–193.
- Moitra, A.; Siu, K.; Crapo, A.; Chamarthi, H.; Durling, M.; Li, M.; Yu, H.; Manolios, P.; Meiners, M. Towards Development of Complete and Conflict-Free Requirements. In Proceedings of the 2018 IEEE 26th International Requirements Engineering Conference (RE), Banff, AB, Canada, 20–24 August 2018; pp. 286–296.
- Filipovikj, P.; Rodriguez-Navas, G.; Nyberg, M.; Seceleanu, C. SMT-Based Consistency Analysis of Industrial Systems Requirements. In Proceedings of the Symposium on Applied Computing, Marrakech, Morocco, 4–6 April 2017; Association for Computing Machinery: New York, NY, USA, 2017; pp. 1272–1279.
- Filipovikj, P.; Nyberg, M.; Rodriguez-Navas, G. Reassessing the pattern-based approach for formalizing requirements in the automotive domain. In Proceedings of the 2014 IEEE 22nd International Requirements Engineering Conference (RE), Karlskrona, Sweden, 25–29 August 2014; pp. 444–450.
- Kim, M.; Park, S.; Sugumaran, V.; Yang, H. Managing requirements conflicts in software product lines: A goal and scenario based approach. Data Knowl. Eng. 2007, 61, 417–432.
- Moser, T.; Winkler, D.; Heindl, M.; Biffl, S. Automating the Detection of Complex Semantic Conflicts between Software Requirements. In Proceedings of the 23rd International Conference on Software Engineering and Knowledge Engineering, Miami Beach, FL, USA, 7–9 July 2011.
- Viana, T.; Zisman, A.; Bandara, A.K. Identifying Conflicting Requirements in Systems of Systems. In Proceedings of the 2017 IEEE 25th International Requirements Engineering Conference (RE), Lisbon, Portugal, 4–8 September 2017; pp. 436–441.
- Pohl, K. Requirements Engineering Fundamentals: A Study Guide for the Certified Professional for Requirements Engineering Exam-Foundation Level-IREB Compliant; Rocky Nook, Inc.: San Rafael, CA, USA, 2016.
- Mavin, A.; Wilkinson, P.; Harwood, A.; Novak, M. Easy Approach to Requirements Syntax (EARS). In Proceedings of the 2009 17th IEEE International Requirements Engineering Conference, Atlanta, GA, USA, 31 August–4 September 2009; pp. 317–322.
- Mavin, A.; Wilkinson, P. Big Ears (The Return of “Easy Approach to Requirements Engineering”). In Proceedings of the 2010 18th IEEE International Requirements Engineering Conference, Sydney, Australia, 27 September–1 October 2010; pp. 277–282.
- Yang, H.; De Roeck, A.; Gervasi, V.; Willis, A.; Nuseibeh, B. Analysing Anaphoric Ambiguity in Natural Language Requirements. Requir. Eng. 2011, 16, 163.
- Ferrari, A.; Gnesi, S. Using collective intelligence to detect pragmatic ambiguities. In Proceedings of the 2012 20th IEEE International Requirements Engineering Conference (RE), Chicago, IL, USA, 24–28 September 2012; pp. 191–200.
- Efremov, A.; Gaydamaka, K. INCOSE guide for writing requirements. Translation experience, adaptation perspectives. In Proceedings of the CEUR Workshop Proceedings, Como, Italy, 9–11 September 2019; pp. 164–178.
- Top Engineering Tips: How to Write an Exceptionally Clear Requirements Document. White Paper. QRA Corp. 2018. Available online: https://www.qracorp.com/write-clear-requirements-document/ (accessed on 17 August 2021).
- Anderberg, M.R. Guide for Writing Requirements; INCOSE Publications Office: San Diego, CA, USA, 2019; Volume 3.
- Automating the INCOSE Guide for Writing Requirements; White Paper; QRA Corp: Halifax, NS, Canada, 2019.
- RAT—AUTHORING Tools. 2021. Available online: https://www.reusecompany.com/rat-authoring-tools (accessed on 16 August 2021).
- QVscribe. 2021. Available online: https://qracorp.com/qvscribe/ (accessed on 15 August 2021).
- IBM Engineering Requirements Quality Assistant. 2021. Available online: www.ibm.com/products/requirements-quality-assistant (accessed on 16 August 2021).
- Thompson, S.A. A Discourse Explanation for the Cross-linguistic Differences in the Grammar of Interrogation and Negation. Case Typology Gramm. Honor. Barry J. Blake 1998, 309, 341.
- Kiros, R.; Zhu, Y.; Salakhutdinov, R.; Zemel, R.S.; Torralba, A.; Urtasun, R.; Fidler, S. Skip-Thought Vectors. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–10 December 2015.
- Bird, S.; Klein, E.; Loper, E. Natural Language Processing with Python; O’Reilly Media, Inc.: Newton, MA, USA, 2009.
- Oxford, O.E. Oxford English Dictionary; Oxford University Press: Oxford, UK, 2009.
- Cornog, M.W. The Merriam-Webster Dictionary of Synonyms and Antonyms; Merriam-Webster, Inc.: Springfield, MA, USA, 1992.
- Anaconda Individual Edition. 2021. Available online: https://www.anaconda.com (accessed on 19 February 2021).
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Řehůřek, R.; Sojka, P. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, Valletta, Malta, 22 May 2010; ELRA: Valletta, Malta, 2010; pp. 45–50.
- Macqueen, J. Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, CA, USA, 21 June–18 July 1967; Volume 1, pp. 281–297.
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. arXiv 2013, arXiv:1301.3781.
- Rajaraman, A.; Ullman, J.D. Mining of Massive Datasets; Cambridge University Press: Cambridge, UK, 2011.
- Google Code: Word2vec. 2013. Available online: https://code.google.com/archive/p/word2vec/ (accessed on 18 February 2021).
- Le, Q.V.; Mikolov, T. Distributed Representations of Sentences and Documents. arXiv 2014, arXiv:1405.4053.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2019, arXiv:1810.04805.
- Zhu, Y.; Kiros, R.; Zemel, R.; Salakhutdinov, R.; Urtasun, R.; Torralba, A.; Fidler, S. Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books. arXiv 2015, arXiv:1506.06724.
- Rand, W.M. Objective Criteria for the Evaluation of Clustering Methods. J. Am. Stat. Assoc. 1971, 66, 846–850.
- Von Luxburg, U. A Tutorial on Spectral Clustering. arXiv 2007, arXiv:0711.0189.
- Anderberg, M.R. Cluster Analysis for Applications: Probability and Mathematical Statistics: A Series of Monographs and Textbooks; Academic Press: Cambridge, MA, USA, 2014; Volume 19.
- Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, 2–4 August 1996; pp. 226–231.
- Frey, B.J.; Dueck, D. Clustering by Passing Messages between Data Points. Science 2007, 315, 972–976.
- Buckland, M.; Gey, F. The Relationship between Recall and Precision. J. Am. Soc. Inf. Sci. 1994, 45, 12–19.
Antonym pair | Antonym pair | Antonym pair |
---|---|---|
hide-appear | build-destroy | complicated-regulation |
abide-violate | enhance-alleviates | approval-prohibition |
nadir-zenith | ascends-down | mandatory-optional |
forbid-permit | mutual-separate | mobile-immobile |
permanent-temporary | contrary-similar | interrupt-continue |
lateral-vertical | decode-encode | incongruity-compatibility |
disappear-appear | dismantle-construction | direct-indirect |
activate-failure | continue-interrupt | relieve-enhanced |
accurate-inaccurate | divestiture-restore | agree-disagree |
valid-invalid | erasing-preserve | disadvantage-advantage |
accept-decline | huddle-disperse | invalidation-efficacious |
opacity-transparent | fall-rise | acquire-bereavement |
approve-disapprove | legitimate-illegitimate | propagate-restrain |
air-ground | hinder-unobstructed | progress-regress |
fasten-unfasten | impede-expedite | fix-replace |
abate-aggravation | excess-lack | excess-lack |
equal-unequal | partial-entire | caution-indiscretion |
forward-backward | exit-entrance | retract-confirm |
internal-external | track-lose | inadequacy-abundance |
deliver-collect | lock-unlock | sporadic-frequently |
disobey-obey | empty-fill | horizontal-vertical |
gargantuan-negligible | discover-miss | ... |
Tool | Version |
---|---|
Anaconda | 4.9.2 |
Python | 3.8.5 |
Scikit-learn | 0.23.2 |
Gensim | 3.8.3 |
NLTK | 3.5 |
Embedding method | Rand Index | Mutual Information |
---|---|---|
Word2Vec+TF-IDF | 0.0693 | 0.6185 |
Doc2Vec | 0.0704 | 0.6382 |
BERT | 0.1197 | 0.6944 |
Skip-Thought | 0.1265 | 0.7334 |
Clustering method | Rand Index | Mutual Information |
---|---|---|
K-Means | 0.1265 | 0.7334 |
Spectral Clustering | 0.0658 | 0.6814 |
Agglomerative Clustering | 0.1215 | 0.7331 |
DBSCAN | 0.1979 | 0.8519 |
Affinity Propagation | 0.3107 | 0.8667 |
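For readers unfamiliar with the Rand Index [50] reported in the two tables above, the sketch below computes its plain, unadjusted form as the fraction of statement pairs on which two clusterings agree. Whether the reported values use this form or a chance-adjusted variant is not stated in this excerpt, so the code should be read as an illustration of the metric rather than a reproduction of the reported numbers.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of item pairs that both clusterings treat the same way."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        return 1.0  # fewer than two items: nothing to disagree on
    agreements = sum(
        # A pair agrees if it is together in both clusterings or apart in both.
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agreements / len(pairs)
```

The score depends only on which items are grouped together, not on the label names, so `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0.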
Approach | Precision | Recall | F1 |
---|---|---|---|
Our approach | 0.4231 | 0.6875 | 0.5238 |
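The F1 value in the table above is consistent with the usual definition of F1 as the harmonic mean of precision and recall. The short check below assumes that definition; the function name `f1_score` is chosen here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reproduce the reported F1 from the reported precision and recall.
reported_f1 = round(f1_score(0.4231, 0.6875), 4)
print(reported_f1)  # 0.5238
```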
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liu, C.; Zhao, Z.; Zhang, L.; Li, Z. Automated Conditional Statements Checking for Complete Natural Language Requirements Specification. Appl. Sci. 2021, 11, 7892. https://doi.org/10.3390/app11177892