Previous Issue
Volume 8, June
 
 

Big Data Cogn. Comput., Volume 8, Issue 7 (July 2024) – 2 articles

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • PDF is the official format for papers published in both, html and pdf forms. To view the papers in pdf format, click on the "PDF Full-text" link, and use the free Adobe Reader to open them.
Order results
Result details
Select all
Export citation of selected articles as:
24 pages, 7762 KiB  
Article
Semantic Non-Negative Matrix Factorization for Term Extraction
by Aliya Nugumanova, Almas Alzhanov, Aiganym Mansurova, Kamilla Rakhymbek and Yerzhan Baiburin
Big Data Cogn. Comput. 2024, 8(7), 72; https://doi.org/10.3390/bdcc8070072 - 27 Jun 2024
Viewed by 343
Abstract
This study introduces an unsupervised term extraction approach that combines non-negative matrix factorization (NMF) with word embeddings. Inspired by a pioneering semantic NMF method that employs regularization to jointly optimize document–word and word–word matrix factorizations for document clustering, we adapt this strategy for [...] Read more.
This study introduces an unsupervised term extraction approach that combines non-negative matrix factorization (NMF) with word embeddings. Inspired by a pioneering semantic NMF method that employs regularization to jointly optimize document–word and word–word matrix factorizations for document clustering, we adapt this strategy for term extraction. Typically, a word–word matrix representing semantic relationships between words is constructed using cosine similarities between word embeddings. However, it has been established that transformer encoder embeddings tend to reside within a narrow cone, leading to consistently high cosine similarities between words. To address this issue, we replace the conventional word–word matrix with a word–seed submatrix, restricting columns to ‘domain seeds’—specific words that encapsulate the essential semantic features of the domain. Therefore, we propose a modified NMF framework that jointly factorizes the document–word and word–seed matrices, producing more precise encoding vectors for words, which we utilize to extract high-relevancy topic-related terms. Our modification significantly improves term extraction effectiveness, marking the first implementation of semantically enhanced NMF, designed specifically for the task of term extraction. Comparative experiments demonstrate that our method outperforms both traditional NMF and advanced transformer-based methods such as KeyBERT and BERTopic. To support further research and application, we compile and manually annotate two new datasets, each containing 1000 sentences, from the ‘Geography and History’ and ‘National Heroes’ domains. These datasets are useful for both term extraction and document classification tasks. All related code and datasets are freely available. Full article
Show Figures

Figure 1

18 pages, 414 KiB  
Article
ReJOOSp: Reinforcement Learning for Join Order Optimization in SPARQL
by Benjamin Warnke, Kevin Martens, Tobias Winker, Sven Groppe, Jinghua Groppe, Prasad Adhiyaman, Sruthi Srinivasan and Shridevi Krishnakumar
Big Data Cogn. Comput. 2024, 8(7), 71; https://doi.org/10.3390/bdcc8070071 - 27 Jun 2024
Viewed by 402
Abstract
The choice of a good join order plays an important role in the query performance of databases. However, determining the best join order is known to be an NP-hard problem with exponential growth with the number of joins. Because of this, nonlearning approaches [...] Read more.
The choice of a good join order plays an important role in the query performance of databases. However, determining the best join order is known to be an NP-hard problem with exponential growth with the number of joins. Because of this, nonlearning approaches to join order optimization have a longer optimization and execution time. In comparison, the models of machine learning, once trained, can construct optimized query plans very quickly. Several efforts have applied machine learning to optimize join order for SQL queries outperforming traditional approaches. In this work, we suggest a reinforcement learning technique for join optimization for SPARQL queries, ReJOOSp. SPARQL queries typically contain a much higher number of joins than SQL queries and so are more difficult to optimize. To evaluate ReJOOSp, we further develop a join order optimizer based on ReJOOSp and integrate it into the Semantic Web DBMS Luposdate3000. The evaluation of ReJOOSp shows its capability to significantly enhance query performance by achieving high-quality execution plans for a substantial portion of queries across synthetic and real-world datasets. Full article
Show Figures

Figure 1

Previous Issue
Back to TopTop