BERTopic for Enhanced Idea Management and Topic Generation in Brainstorming Sessions
Abstract
:1. Introduction
2. Related Works
- Traditional group brainstorming: This is the classic process where a group of people physically gather to generate ideas interactively and collaboratively.
- Virtual group brainstorming: This method uses digital tools such as online brainstorming software to allow participants to contribute remotely in real time or asynchronously.
- Individual electronic brainstorming: Participants generate ideas individually using electronic tools, then share them with the group for discussion and evaluation.
- Hybrid brainstorming: This approach combines elements of traditional group brainstorming with electronic tools to facilitate the collection, categorization, and presentation of ideas.
3. Materials and Methods
3.1. Data Preparation
3.1.1. Database Construction
3.1.2. Data Augmentation
3.1.3. Data Preprocessing
3.2. Topic Modeling and Clustering
3.2.1. Dimensionality Reduction
Algorithm 1: BERTopic Process for Topic Modeling and Clustering [27] |
Input:
|
Output:
|
Steps:
|
3.2.2. Clustering
3.2.3. Topic Representation
- : frequency of word in class .
- : frequency of word x across all classes.
- A: average number of words per class.
4. Results and Discussion
4.1. Evaluation Metrics
4.2. Quantitative Evaluation
4.3. Qualitative Evaluation
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
CBIE | Content-based information extraction |
DP | Data preprocessing |
DBSCAN | Density-Based Spatial Clustering of Applications with Noise |
HDBSCAN | Hierarchical DBSCAN |
LDA | Latent Dirichlet Allocation |
LSA | Latent Semantic Analysis |
NLP | Natural language processing |
NMF | Non-negative matrix factorization |
OCR | Optical character recognition |
pLSA | Probabilistic Latent Semantic Analysis |
UMAP | Uniform Manifold Approximation and Projection |
References
- Memmert, L.; Tavanapour, N. Towards Human-AI-Collaboration in Brainstorming: Empirical Insights into the Perception of Working with a Generative AI. ECIS 2023 Research Papers. Available online: https://aisel.aisnet.org/ecis2023_rp/219 (accessed on 15 April 2024).
- Tang, L.; Peng, Y.; Wang, Y.; Ding, Y.; Durrett, G.; Rousseau, J.F. Less Likely Brainstorming: Using Language Models to Generate Alternative Hypotheses. Proc. Conf. Assoc. Comput. Linguist. Meet. 2023, 2023, 12532–12555. [Google Scholar] [CrossRef] [PubMed]
- Barki, H.; Pinsonneault, A. Small Group Brainstorming and Idea Quality: Is Electronic Brainstorming the Most Effective Approach? Small Group Res. 2001, 32, 158–205. [Google Scholar] [CrossRef]
- Cheddak, A.; Ait Baha, T.; El Hajji, M.; Es-Saady, Y. Towards a Support System for Brainstorming Based Content-Based Information Extraction and Machine Learning. In Proceedings of the International Conference on Business Intelligence, Beni-Mellal, Morocco, 27–29 May 2021; Fakir, M., Baslam, M., El Ayachi, R., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 43–55. [Google Scholar] [CrossRef]
- Paulus, P.B.; Baruah, J.; Kenworthy, J. Chapter 24—Brainstorming: How to get the best ideas out of the “group brain” for organizational creativity. In Handbook of Organizational Creativity, 2nd ed.; Reiter-Palmon, R., Hunter, S., Eds.; Academic Press: Cambridge, MA, USA, 2023; pp. 373–389. [Google Scholar] [CrossRef]
- Russell, T.M. Interactive Ideation: Online Team-Based Idea Generation Versus Traditional Brainstorming. Ph.D. Thesis, University of Minnesota, Minneapolis, MN, USA, 2019. [Google Scholar]
- Paulus, P.B.; Kenworthy, J.B. Effective brainstorming. In The Oxford Handbook of Group Creativity and Innovation; Oxford University Press: Oxford, UK, 2019; pp. 287–305. [Google Scholar]
- Paulus, P.B.; Yang, H.C. Idea generation in groups: A basis for creativity in organizations. Organ. Behav. Hum. Decis. Process. 2000, 82, 76–87. [Google Scholar] [CrossRef]
- Deckert, C.; Mohya, A.; Suntharalingam, S. Virtual whiteboards & digital post-its–incorporating internet-based tools for ideation into engineering courses. In Proceedings of the SEFI 2021: 49th Annual Conference Blended, Virtual, 13–16 September 2021; pp. 1370–1375. [Google Scholar]
- Dhaundiyal, D.; Pant, R. Tools for Virtual Brainstorming & Co-Creation: A Comparative Study of Collaborative Online Learning; Indiana University Southeast: New Albany, IN, USA, 2022. [Google Scholar]
- Wieland, B.; de Wit, J.; de Rooij, A. Electronic Brainstorming With a Chatbot Partner: A Good Idea Due to Increased Productivity and Idea Diversity. Front. Artif. Intell. 2022, 5, 880673. [Google Scholar] [CrossRef] [PubMed]
- Ekramipooya, A.; Boroushaki, M.; Rashtchian, D. Application of natural language processing and machine learning in prediction of deviations in the HAZOP study worksheet: A comparison of classifiers. Process Saf. Environ. Prot. 2023, 176, 65–73. [Google Scholar] [CrossRef]
- Evangelopoulos, N.E. Latent semantic analysis. Wiley Interdiscip. Rev. Cogn. Sci. 2013, 4, 683–692. [Google Scholar] [CrossRef] [PubMed]
- Hofmann, T. Unsupervised Learning by Probabilistic Latent Semantic Analysis. Mach. Learn. 2001, 42, 177–196. [Google Scholar] [CrossRef]
- Zhang, C.; Yang, Q.; Zhang, J.; Gou, L.; Fan, H. Topic Mining and Future Trend Exploration in Digital Economy Research. Information 2023, 14, 432. [Google Scholar] [CrossRef]
- Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022. [Google Scholar]
- Hwang, S.J.; Lee, Y.K.; Kim, J.D.; Park, C.Y.; Kim, Y.S. Topic Modeling for Analyzing Topic Manipulation Skills. Information 2021, 12, 359. [Google Scholar] [CrossRef]
- Egger, R.; Yu, J. A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts. Front. Sociol. 2022, 7, 886498. [Google Scholar] [CrossRef] [PubMed]
- Mendonça, M.; Figueira, Á. Topic Extraction: BERTopic’s Insight into the 117th Congress’s Twitterverse. Informatics 2024, 11, 8. [Google Scholar] [CrossRef]
- Creţulescu, R.G.; Morariu, D.I.; Breazu, M.; Volovici, D. DBSCAN algorithm for document clustering. Int. J. Adv. Stat. It C Econ. Life Sci. 2019, 9, 58–66. [Google Scholar] [CrossRef]
- Ros, F.; Guillaume, S.; Riad, R.; El Hajji, M. Detection of natural clusters via S-DBSCAN a Self-tuning version of DBSCAN. Knowl.-Based Syst. 2022, 241, 108288. [Google Scholar] [CrossRef]
- Barman, P.C.; Iqbal, N.; Lee, S.Y. Non-negative Matrix Factorization Based Text Mining: Feature Extraction and Classification. In International Conference on Neural Information Processing; King, I., Wang, J., Chan, L.W., Wang, D., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 703–712. [Google Scholar]
- de Groot, M.; Aliannejadi, M.; Haas, M.R. Experiments on Generalizability of BERTopic on Multi-Domain Short Text. arXiv 2022, arXiv:2212.08459. [Google Scholar] [CrossRef]
- Shorten, C.; Khoshgoftaar, T.M.; Furht, B. Text Data Augmentation for Deep Learning. J. Big Data 2021, 8, 101. [Google Scholar] [CrossRef] [PubMed]
- Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv 2019, arXiv:1908.10084. [Google Scholar] [CrossRef]
- McInnes, L.; Healy, J.; Melville, J. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv 2018, arXiv:1802.03426. [Google Scholar]
- Grootendorst, M. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv 2022, arXiv:2203.05794. [Google Scholar]
- Bhattacharjee, P.; Mitra, P. A survey of density based clustering algorithms. Front. Comput. Sci. 2021, 15, 1–27. [Google Scholar] [CrossRef]
- Malzer, C.; Baum, M. A Hybrid Approach To Hierarchical Density-based Cluster Selection. In Proceedings of the 2020 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), Karlsruhe, Germany, 14–16 September 2020; pp. 223–228. [Google Scholar] [CrossRef]
- Abdelrazek, A.; Eid, Y.; Gawish, E.; Medhat, W.; Hassan, A. Topic modeling algorithms and applications: A survey. Inf. Syst. 2023, 112, 102131. [Google Scholar] [CrossRef]
- Röder, M.; Both, A.; Hinneburg, A. Exploring the Space of Topic Coherence Measures. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining. Association for Computing Machinery, WSDM ’15, Shanghai, China, 2–6 February 2015; pp. 399–408. [Google Scholar] [CrossRef]
Method | Advantages | Limitations | Interpretability | Computational Complexity |
---|---|---|---|---|
LDA [16] | Easy to interpret | Simplified model May lack semantic representation | High | Moderate |
Handles large datasets | ||||
NMF [22] | Interpretable results Effective for dimensionality reduction | Sensitive to noise Requires preprocessed data | Moderate | Moderate |
LSA [13] | Uses semantic structure Works well for large corpora | Sensitivity to noisy data Less interpretable than LDA | Moderate | Moderate |
PLSA [14] | Sophisticated probabilistic model Captures nonlinear relationships | More complex to implement | Moderate | High |
Sensitivity to overfitting | ||||
BERTopic [23] | Utilizes BERT for contextual representation Captures complex semantic relationships Precise results for large datasets | Resource-intensive | Moderate | High |
Less interpretable than traditional approaches Requires learning phase on large data |
Stage | Algorithms | Parameters |
---|---|---|
Embeddings | AraBert-v02 | ――――― |
Dimensionality reduction | UMAP | n_neighbors = 15 − n_components = 5 − metric = cosine |
Clustering | HDBSCAN | default parameters |
Topic representation | c-TF-IDF | bm25_weighting − reduce_frequent_words |
No. | Top-10 Keywords | Annotations |
---|---|---|
Topic 1 | تأثير، سلبي، ضياع، للطلاب، تطوير، الزمن، النجاح، المهارات، الأكاديمي | التحديات التعليمية |
Negative, impact, loss, for students, development, time, success, skills, academic, waste | Educational challenges | |
Topic 2 | لتحفيزهم، يسبب، إخراج، تفادقم، التلاميذ، لتحسين، الالتزام، للمواد | تحفيز التلاميذ |
To motivate, cause, direct, aggravate, students, to improve, commitment, to academic, subjects, lessons | Motivating students | |
Topic 3 | توفير، تعليمية، المنزلية، برامج، للطلاب، تنظيم، كيفية، انشاء، تشجيع، الطلاب | الدعم التعليمي المنزلي |
Providing, educational, home, programs, for students, organizing, how to, create, encouraging, students | Home educational support | |
Topic 4 | التعليمية، الفرص، زيادة، الفرصة، تجاهل، تفاقم، الأسرة، انقسام، فقدان، الانقسام | فرص التعليم |
Educational, opportunity, increase, opportunity, neglect, aggravation, family, division, loss, division | Education opportunities | |
Topic 5 | لتكوين، غياب، الأسرة، فاشل، جهل، المدرسي، مواطن، بالمساطر، ضعف، الأنانية | نظام التعليم |
Formation, absence, family, failure, ignorance, school, citizen, rulers, weakness, selfishness | Educational system |
Topic Model | Number of Topics | Coherence (c_v) | Coherence (c_umass) | |||
---|---|---|---|---|---|---|
without DP | with DP | without DP | with DP | without DP | with DP | |
BERTopic | 29 | 25 | 0.029 | −0.002 | −6.630 | −7.983 |
LDA 5-100 | 5 | −0.147 | −0.151 | −7.380 | −7.240 | |
LDA 10-100 | 10 | −0.139 | −0.171 | −7.297 | −8.292 | |
LDA 20-100 | 20 | −0.172 | −0.169 | −8.688 | −8.623 | |
LDA 30-100 | 30 | −0.180 | −0.195 | −9.441 | −9.454 | |
NMF | 30 | −0.018 | 0.005 | −8.002 | −8.689 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Cheddak, A.; Ait Baha, T.; Es-Saady, Y.; El Hajji, M.; Baslam, M. BERTopic for Enhanced Idea Management and Topic Generation in Brainstorming Sessions. Information 2024, 15, 365. https://doi.org/10.3390/info15060365
Cheddak A, Ait Baha T, Es-Saady Y, El Hajji M, Baslam M. BERTopic for Enhanced Idea Management and Topic Generation in Brainstorming Sessions. Information. 2024; 15(6):365. https://doi.org/10.3390/info15060365
Chicago/Turabian StyleCheddak, Asma, Tarek Ait Baha, Youssef Es-Saady, Mohamed El Hajji, and Mohamed Baslam. 2024. "BERTopic for Enhanced Idea Management and Topic Generation in Brainstorming Sessions" Information 15, no. 6: 365. https://doi.org/10.3390/info15060365
APA StyleCheddak, A., Ait Baha, T., Es-Saady, Y., El Hajji, M., & Baslam, M. (2024). BERTopic for Enhanced Idea Management and Topic Generation in Brainstorming Sessions. Information, 15(6), 365. https://doi.org/10.3390/info15060365