User:BjornKoemans/NLP for Requirements Engineering

From Wikipedia, the free encyclopedia

Natural Language Processing (NLP) can be used in the field of Requirements Engineering to extract information from unstructured natural language (NL). NLP techniques can be applied in almost all requirements engineering phases, from requirements elicitation to requirements management. A range of tools has been developed that incorporate NLP tasks or techniques to support requirements engineering tasks. For example, NLP can help identify customer needs during requirements elicitation by processing interviews and feedback written in NL, and it can be used to formalise ambiguous NL requirements to prevent conflicts.

Natural Language Processing (NLP)

Natural Language Processing is an interdisciplinary subfield that combines linguistics, computer science and artificial intelligence to study the interactions between human language and computers, with a specific focus on developing algorithms and techniques that let computers process and analyse large quantities of natural language data. NLP can be used for a wide range of applications, including language translation, question answering, sentiment analysis, topic modelling, text generation and text summarisation. NLP relies on computational models and can be combined with other technologies, such as speech recognition, to process natural language.

Usage of NLP in RE

Requirements Engineering (RE) is a critical step in the software development process, as it helps to ensure that the product meets the customer's demand. Requirements formally specify the characteristics that the system must possess to fulfil the needs of stakeholders. RE is a natural language-intensive field and takes various natural language artefacts into account during the RE lifecycle, such as requirements documents, user stories, reviews, product descriptions and privacy policies. Because requirements are written in NL, they are easy for the stakeholders involved to read, write and understand, even when those stakeholders have little to no experience with RE. However, NL also introduces room for ambiguity and informal representations, which can result in communication errors and deviations in implementations. Additionally, large amounts of NL are hard to process manually. To support RE tasks that involve NL, NLP tasks and tools can be deployed.[1]

NLP can automate and simplify the requirements engineering process by analysing natural language artefacts and extracting useful information from them. NLP can therefore support RE tasks in general and make them more efficient and less error-prone.

NLP has multiple use cases in RE. One is specifying requirements: NLP techniques are used to extract requirements from the natural language texts at hand and to formulate them in a specified format. NLP can also be used to validate requirements for consistency and completeness, meaning that requirements that are too generic or too specific, that do not conform to a format, or that are inconsistent or conflicting can be identified and possibly also improved automatically. NLP also has use cases in the later RE phases, such as managing requirements, which covers tracing and monitoring the requirements. With NLP, requirements can be monitored and traced automatically, based on natural language artefacts that describe how the project progresses and how requirements change over time.[2]
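
The validation use case can be illustrated with a minimal Python sketch. It flags requirements that do not conform to a format or that contain vague wording. The "The <subject> shall <action>." template, the list of vague terms and the function name check_requirement are assumptions made for this illustration; they are not taken from the cited studies or from any specific tool.

```python
import re

# Hypothetical template (an assumption): "The <subject> shall <action>."
TEMPLATE = re.compile(r"^The\s+[\w\s-]+?\s+shall\s+.+\.$")

# Hypothetical list of vague terms; a real checker would use a curated lexicon.
VAGUE_TERMS = {"fast", "easy", "user-friendly", "flexible", "appropriate", "etc"}


def check_requirement(text):
    """Return a list of human-readable issues found in one requirement."""
    issues = []
    if not TEMPLATE.match(text.strip()):
        issues.append("does not follow the 'The <subject> shall <action>.' template")
    # Extract lower-case words, keeping hyphenated words such as "user-friendly".
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    vague = sorted(set(words) & VAGUE_TERMS)
    if vague:
        issues.append("vague terms: " + ", ".join(vague))
    return issues


requirements = [
    "The system shall export reports as PDF.",
    "The system shall be fast and user-friendly.",
    "Users can log in.",
]
for req in requirements:
    print(req, "->", check_requirement(req) or "no issues found")
```

A pattern-based check like this only covers surface form. Detecting conflicting or inconsistent requirements would need deeper linguistic analysis than this sketch attempts.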

RE activities

The main objective of NLP for RE is to support requirements engineers in performing RE activities that involve processing and analysing natural language requirements documents. The RE activities targeted and supported by NLP are detection, extraction, modelling, tracing and relating, classification, and search and retrieval. These activities are carried out in different phases of the RE lifecycle.

List of 139 NLP-based tasks extracted by Zhao et al.

Detection occurs in the requirements analysis phase of the RE lifecycle, extraction is done during requirements elicitation, tracing and relating corresponds to requirements management, modelling is performed within the requirements design phase, and classification is covered throughout the entire RE lifecycle. During all activities, various NLP tasks and tools can be used to process natural language generated in communication with customers, for example through customer interviews, prototyping, workshops and other requirements elicitation techniques. With tools such as Stanford CoreNLP, natural language from stakeholder interviews, for example, can be processed into a range of data, such as tokens, linguistic annotations, named entities, and linguistic dependencies and relations.[3]

NLP tasks for RE activities

NLP-based tasks are small preprocessing steps within the overall NLP pipeline. Various NLP-based tasks can be used during RE activities. Zhao et al. extracted a total of 139 tasks from various studies; the ten most frequently used NLP tasks are shown in the table below.[4] Every task in the table is applicable to all six RE activities. The full list of all 139 NLP tasks can be found in the accompanying image.

No. | NLP task | Description
1 | POS Tagging | Part-of-speech tagging (POS tagging or PoS tagging or POST) is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context.
2 | Tokenization | Tokenization, also called lexical analysis or lexing, is the process of converting a sequence of characters into a sequence of lexical tokens (strings with an assigned and thus identified meaning).
3 | Parsing | Parsing is the process of analysing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar.
4 | Stop-Words Removal | Stop words are words that are filtered out (i.e. stopped) before or after processing of natural language data (text) because they are insignificant. Removing the insignificant words puts the focus on the words that carry important information.[5]
5 | Term Extraction | Terminology extraction, also called term extraction, automatically extracts relevant terms from a given language artefact.
6 | Stemming | Stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form.
7 | Lemmatization | Lemmatisation (or lemmatization) in linguistics is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word's lemma, or dictionary form.
8 | Similarity Measures | Similarity measures compute the similarity between concepts/terms included in natural language artefacts in order to perform estimations.[6] These measures can be applied right after the terms in the natural language artefacts have been converted into a machine-readable format.[7]
9 | TF-IDF | TF-IDF, also called TF*IDF, TFIDF, tf-idf, or Tf–idf, is short for term frequency–inverse document frequency and is a numerical statistic that is intended to reflect how important a word is to a document in a collection of language artefacts.
10 | Sentence Splitting | Sentence splitting is an NLP technique in which the text of a natural language artefact is split into sentences at ‘.’ or ‘\n’ characters.[8]
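
Several of the tasks in the table can be chained into a small preprocessing pipeline. The following sketch uses only the Python standard library to show sentence splitting, tokenization, stop-word removal and a deliberately crude form of stemming. The stop-word list, the suffix rules and all function names are assumptions made for illustration. The tools listed further below implement these steps far more thoroughly.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "be", "shall", "in"}


def split_sentences(text):
    """Split after sentence-ending punctuation followed by whitespace, or at newlines."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Lower-case the sentence and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Return one list of stemmed, stop-word-free tokens per sentence."""
    return [
        [crude_stem(t) for t in remove_stop_words(tokenize(s))]
        for s in split_sentences(text)
    ]


text = "The system shall store user passwords. Passwords are encrypted in the database."
print(preprocess(text))
```

In this example "passwords" and "Passwords" both reduce to "password", so later steps such as similarity measures treat them as the same term.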

NLP tools for RE activities

NLP tools are software systems or software libraries that combine several NLP-based tasks in order to serve a certain NLP goal.[9] Various NLP tools can be used for several RE activities; the most frequently used NLP tools according to the study of Zhao et al. are shown in the table below.[10]

No. NLP tools Detection Extraction Modelling Tracing & Relating Classification Search & Retrieval Description
1 Stanford CoreNLP x x x x x x Natural language processing toolkit in Java that enables users to derive linguistic annotations for text.[11]
2 GATE x x x x x x Open source software toolkit, specifically for information extraction.
3 NLTK x x x x x x Platform for Python programs to work with natural language.
4 Apache OpenNLP x x x x x x Machine learning toolkit with most of the standard NLP tasks such as language detection, tokenisation, sentence segmentation and part-of-speech tagging.
5 WEKA x x x x x Broader tool for data analysis and predictive modelling that can be used for text mining and classification tasks.
6 spaCy x x x x x Library for advanced NLP that features neural network models, mainly for part-of-speech tagging and text categorisation.
7 Genia Tagger x x x Tagger that outputs base forms and named entity tags and is specifically tuned for biomedical texts.
8 WMATRIX x x x Software tool for corpus analysis and comparison.
9 LingPipe Toolkit x x Tool for processing texts using computational linguistics, used for finding names and objects and correcting spelling.
10 Trigrams 'n' Tags POS Tagger x x x System for very efficient statistical part-of-speech tagging.
11 RapidMiner x x x x Broad data science platform, which can be used for sentiment analysis.
12 Scikit-learn x x Python machine learning library that can be used for text classification and processing.
13 Sketch Engine x x x Software for corpus management and text analysis.
14 Gensim x x Library for unsupervised topic modelling, document indexing and retrieval etc.
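
To show what combining tasks such as TF-IDF weighting and similarity measures computes, for example when relating requirements to each other, the sketch below implements both in plain Python. It does not use the API of any tool in the table. The unsmoothed tf × log(N/df) weighting is only one of several common variants, and the function names and example requirements are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Return one {term: weight} dict per tokenised document."""
    n_docs = len(documents)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical, already tokenised requirements.
docs = [
    "user shall reset password by email".split(),
    "password reset email is sent to user".split(),
    "system shall export monthly report as pdf".split(),
]
vecs = tf_idf_vectors(docs)
print(round(cosine_similarity(vecs[0], vecs[1]), 3))  # the two password requirements
print(round(cosine_similarity(vecs[0], vecs[2]), 3))  # password vs. export requirement
```

The first pair shares four terms and therefore scores higher than the second pair, which shares only "shall". A tracing tool could use such scores to suggest candidate links between related requirements.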

References

  1. Dalpiaz, Fabiano; Ferrari, Alessio; Franch, Xavier; Palomares, Cristina (2018). "Natural Language Processing for Requirements Engineering: The Best Is Yet to Come". IEEE Software. 35 (5): 115–119. doi:10.1109/ms.2018.3571242. ISSN 0740-7459.
  2. Ferrari, Alessio; Zhao, Liping; Alhoshan, Waad (2021). "NLP for Requirements Engineering: Tasks, Techniques, Tools, and Technologies". 2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). IEEE. doi:10.1109/icse-companion52605.2021.00137.
  3. Zhao, Liping; Alhoshan, Waad; Ferrari, Alessio; Letsholo, Keletso J.; Ajagbe, Muideen A.; Chioasca, Erol-Valeriu; Batista-Navarro, Riza T. (2021-04-17). "Natural Language Processing for Requirements Engineering". ACM Computing Surveys. 54 (3): 1–41. doi:10.1145/3444689. ISSN 0360-0300.
  4. Zhao, Liping; Alhoshan, Waad; Ferrari, Alessio; Letsholo, Keletso J.; Ajagbe, Muideen A.; Chioasca, Erol-Valeriu; Batista-Navarro, Riza T. (2021-04-17). "Natural Language Processing for Requirements Engineering". ACM Computing Surveys. 54 (3): 1–41. doi:10.1145/3444689. ISSN 0360-0300.
  5. Khanna, Chetna (2021-02-10). "Text pre-processing: Stop words removal using different libraries". Medium. Retrieved 2023-01-24.
  6. Slimani, Thabet (2013-10-18). "Description and Evaluation of Semantic Similarity Measures Approaches". International Journal of Computer Applications. 80 (10): 25–33. doi:10.5120/13897-1851.
  7. Briggs, James (2021-09-02). "Similarity Metrics in NLP". Medium. Retrieved 2023-01-24.
  8. Beladev, Moran (2021-08-01). "NLP: Splitting Text into Sentences". Medium. Retrieved 2023-01-24.
  9. Alzayed, Assad; Al-Hunaiyyan, Ahmed (2021). "A Bird's Eye View of Natural Language Processing and Requirements Engineering". International Journal of Advanced Computer Science and Applications. 12 (5). doi:10.14569/ijacsa.2021.0120512. ISSN 2156-5570.
  10. Zhao, Liping; Alhoshan, Waad; Ferrari, Alessio; Letsholo, Keletso J.; Ajagbe, Muideen A.; Chioasca, Erol-Valeriu; Batista-Navarro, Riza T. (2021-04-17). "Natural Language Processing for Requirements Engineering". ACM Computing Surveys. 54 (3): 1–41. doi:10.1145/3444689. ISSN 0360-0300.
  11. Manning, Christopher; Surdeanu, Mihai; Bauer, John; Finkel, Jenny; Bethard, Steven; McClosky, David (2014). "The Stanford CoreNLP Natural Language Processing Toolkit". Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Stroudsburg, PA, USA: Association for Computational Linguistics. doi:10.3115/v1/p14-5010.

See Also