Big Data and Cognitive Computing

Journal Browser

► Journal Browser

Big Data Cogn. Comput., Volume 8, Issue 7 (July 2024) – 2 articles

Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
You may sign up for e-mail alerts to receive table of contents of newly released issues.
PDF is the official format for papers published in both, html and pdf forms. To view the papers in pdf format, click on the "PDF Full-text" link, and use the free Adobe Reader to open them.

24 pages, 7762 KiB

Open AccessArticle

Semantic Non-Negative Matrix Factorization for Term Extraction

by Aliya Nugumanova, Almas Alzhanov, Aiganym Mansurova, Kamilla Rakhymbek and Yerzhan Baiburin

Big Data Cogn. Comput. 2024, 8(7), 72; https://doi.org/10.3390/bdcc8070072 - 27 Jun 2024

Viewed by 343

Abstract

This study introduces an unsupervised term extraction approach that combines non-negative matrix factorization (NMF) with word embeddings. Inspired by a pioneering semantic NMF method that employs regularization to jointly optimize document–word and word–word matrix factorizations for document clustering, we adapt this strategy for term extraction. Typically, a word–word matrix representing semantic relationships between words is constructed using cosine similarities between word embeddings. However, it has been established that transformer encoder embeddings tend to reside within a narrow cone, leading to consistently high cosine similarities between words. To address this issue, we replace the conventional word–word matrix with a word–seed submatrix, restricting columns to ‘domain seeds’—specific words that encapsulate the essential semantic features of the domain. Therefore, we propose a modified NMF framework that jointly factorizes the document–word and word–seed matrices, producing more precise encoding vectors for words, which we utilize to extract high-relevancy topic-related terms. Our modification significantly improves term extraction effectiveness, marking the first implementation of semantically enhanced NMF, designed specifically for the task of term extraction. Comparative experiments demonstrate that our method outperforms both traditional NMF and advanced transformer-based methods such as KeyBERT and BERTopic. To support further research and application, we compile and manually annotate two new datasets, each containing 1000 sentences, from the ‘Geography and History’ and ‘National Heroes’ domains. These datasets are useful for both term extraction and document classification tasks. All related code and datasets are freely available. Full article

► Show Figures

Figure 1

18 pages, 414 KiB

Open AccessArticle

ReJOOSp: Reinforcement Learning for Join Order Optimization in SPARQL

by Benjamin Warnke, Kevin Martens, Tobias Winker, Sven Groppe, Jinghua Groppe, Prasad Adhiyaman, Sruthi Srinivasan and Shridevi Krishnakumar

Big Data Cogn. Comput. 2024, 8(7), 71; https://doi.org/10.3390/bdcc8070071 - 27 Jun 2024

Viewed by 402

Abstract

The choice of a good join order plays an important role in the query performance of databases. However, determining the best join order is known to be an NP-hard problem with exponential growth with the number of joins. Because of this, nonlearning approaches to join order optimization have a longer optimization and execution time. In comparison, the models of machine learning, once trained, can construct optimized query plans very quickly. Several efforts have applied machine learning to optimize join order for SQL queries outperforming traditional approaches. In this work, we suggest a reinforcement learning technique for join optimization for SPARQL queries, ReJOOSp. SPARQL queries typically contain a much higher number of joins than SQL queries and so are more difficult to optimize. To evaluate ReJOOSp, we further develop a join order optimizer based on ReJOOSp and integrate it into the Semantic Web DBMS Luposdate3000. The evaluation of ReJOOSp shows its capability to significantly enhance query performance by achieving high-quality execution plans for a substantial portion of queries across synthetic and real-world datasets. Full article

(This article belongs to the Special Issue Knowledge Graphs in the Big Data Era: Navigating the Confluence of Distribution, Visualization, and Advanced Computational Models)

► Show Figures

Figure 1

Journal Menu

Journal Browser

Big Data Cogn. Comput., Volume 8, Issue 7 (July 2024) – 2 articles

Further Information

Guidelines

MDPI Initiatives

Follow MDPI