Machine Learning and Knowledge Extraction

Machine Learning and Knowledge Extraction is an international, peer-reviewed, open access, monthly journal on machine learning and applications. See our video on YouTube explaining the MAKE journal concept.

Quartile Ranking JCR - Q1 (Engineering, Electrical and Electronic | Computer Science, Artificial Intelligence | Computer Science, Interdisciplinary Applications)

All Articles (673)

Soft-Prompted Semantic Normalization for Unsupervised Analysis of the Scientific Literature

  • Ivan Malashin,
  • Dmitry Martysyuk and
  • Aleksei Borodulin
  • + 3 authors

Mapping thematic structure in large scientific corpora enables the systematic analysis of research trends and conceptual organization. This work presents an unsupervised framework that leverages large language models (LLMs) as fixed semantic inference operators guided by structured soft prompts. The framework transforms raw abstracts into normalized semantic representations that reduce stylistic variability while retaining core conceptual content. These representations are embedded into a continuous vector space, where density-based clustering identifies latent research themes without predefining the number of topics. Cluster-level interpretation is performed using LLM-based semantic decoding to generate concise, human-readable descriptions of the discovered themes. Experiments on ICML and ACL 2025 abstracts demonstrate that the method produces coherent clusters reflecting problem formulations, methodological contributions, and empirical contexts. The findings indicate that prompt-driven semantic normalization combined with geometric analysis provides a scalable and model-agnostic approach for unsupervised thematic discovery across large scholarly corpora.

5 March 2026

Research landscape of LLM-based semantic representation and adaptation.
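The density-based clustering step described in the abstract above can be illustrated with a minimal sketch. The abstract does not name a specific algorithm, so DBSCAN is assumed here purely for illustration; the 2-D points, `eps`, and `min_pts` are hypothetical stand-ins for real abstract embeddings and tuned parameters.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: returns one label per point, -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point joins as a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, so keep expanding
    return labels

# Hypothetical 2-D "embeddings": two dense groups and one isolated point.
embeddings = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 0)]
theme_labels = dbscan(embeddings, eps=0.5, min_pts=3)
print(theme_labels)
```

The number of clusters is never passed in; it falls out of the density parameters, which is the property the abstract highlights.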

Multiple Sclerosis (MS) and Myelitis are serious inflammatory spinal cord disorders with overlapping clinical symptoms and radiological characteristics, making accurate differentiation challenging yet clinically essential. Early and precise diagnosis is critical for guiding treatment strategies and improving patient outcomes. In this study, we propose KhayyamNet, a novel hybrid deep learning architecture designed to fuse complementary local and global representations for the accurate diagnosis of MS and Myelitis using spinal MRI. To improve robustness and generalization capability, a comprehensive preprocessing strategy including data augmentation and intensity normalization is also applied to reduce noise and address data variability. The proposed architecture combines three complementary deep learning models for feature extraction: Xception for high-level semantic features, Convolutional Neural Networks (CNNs) for fine-grained local patterns, and Vision Transformers (ViTs) for global contextual representations via attention mechanisms. Extracted features are then fused and refined using the Minimum Redundancy Maximum Relevance (MRMR) algorithm to eliminate redundancy and retain the most informative signals. Finally, a Random Forest (RF) classifier utilizes the optimized feature set to achieve accurate and robust differentiation between MS, Myelitis, and control spinal MRIs. Experimental results demonstrate that KhayyamNet outperforms existing methods by achieving an average classification accuracy of 98.15±0.80%. This framework demonstrates promising performance for the automated analysis of spinal MRIs and shows potential to assist in the differentiation of MS and Myelitis. While these findings highlight the potential of KhayyamNet for automated MRI interpretation, its evaluation is limited to a single-center dataset, and further validation on external multi-center data is required.

5 March 2026

A block diagram of the proposed method.
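The MRMR feature-refinement step mentioned in the abstract above can be sketched as a greedy selection loop. This is only an illustration under stated assumptions: absolute Pearson correlation stands in for the relevance and redundancy measures (the abstract does not specify them), the feature names and values are hypothetical, the toy labels are binary, and the downstream Random Forest classifier is omitted.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else 0.0

def mrmr_select(features, target, k):
    """Greedy mRMR sketch: pick the feature maximizing relevance minus mean redundancy."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                abs(pearson(features[name], features[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy
        best = max((n for n in features if n not in selected), key=score)
        selected.append(best)
    return selected

# Hypothetical fused features; "vit_feat" is a near-duplicate of "cnn_feat".
labels = [0, 0, 0, 1, 0, 1, 1, 1]
features = {
    "cnn_feat": [-2, -2, 0, 0, 0, 0, 2, 2],
    "vit_feat": [-1.5, -1.5, -0.5, -0.5, -0.5, -0.5, 2.5, 2.5],
    "xception_feat": [-1, 1, -1, 1, -1, 1, -1, 1],
}
chosen = mrmr_select(features, labels, k=2)
print(chosen)
```

In this toy data the near-duplicate feature is individually more relevant than `xception_feat`, yet it is passed over because it adds little beyond what `cnn_feat` already provides.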

Semantic and Engineering-Based Embedding for Classification List Development

  • Jadeyn Feng,
  • Allison Lau and
  • Michael Stewart
  • + 2 authors

The creation and application of classification category labels are essential tasks for transforming complex information into structured knowledge. Categories are used for summary and reporting purposes and have historically been identified by domain experts based on their past experiences and norms. Our interest lies in the general case where expert-generated category lists require improvement, and unsupervised learning, on its own, struggles to effectively identify categories for multi-class classification of human-generated texts. We hypothesise that including an annotated knowledge graph (KG) in an embedding process will positively impact unsupervised clustering performance. Our goal is to identify clusters that can be labelled and used for classification. We look at unsupervised clustering of Maintenance Work Order (MWO) texts. MWOs capture vital observations about equipment failures in process and heavy industries. The selected KG contains a mapping of equipment types to their inherent function based on the IEC 81346-2 international standard for classification of objects in industrial systems. Performance is assessed by statistical analysis, subject matter experts, and Normalized Mutual Information score. We demonstrate that Word2Vec Bi-LSTM and Sentence-BERT NN embedding methods can leverage equipment inherent function information in the KG to improve failure mode cluster identification for the MWO. Organisations seeking to use AI to automate assignment of a failure mode code to each MWO currently need test sets classified by humans. The results of this work suggest that a semantic layer containing a knowledge graph mapping equipment types to inherent function, and inherent function to failure modes could assist in quality control for automated failure mode classification.

4 March 2026

Demonstration of engineering chain of thought relating a specific physical object class (differential) to sub-systems and their inherent function, thence to each function’s associated functional failure, and the ways in which the functional failure can be observed. The list of parts and their functions is taken from the IEC 81346-2 Standard.
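The Normalized Mutual Information score used for assessment in the abstract above can be computed from two label lists as in the sketch below. The arithmetic-mean normalization is an assumption (the abstract does not state which variant is used), and the failure-mode labels and cluster ids are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def nmi(true_labels, cluster_labels):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, cluster_labels))
    count_true = Counter(true_labels)
    count_cluster = Counter(cluster_labels)
    mi = sum(
        (n_tc / n) * math.log(n_tc * n / (count_true[t] * count_cluster[c]))
        for (t, c), n_tc in joint.items()
    )
    denom = (entropy(true_labels) + entropy(cluster_labels)) / 2
    # Both labelings constant: treat as perfect agreement by convention.
    return mi / denom if denom > 0 else 1.0

# Hypothetical expert failure-mode codes versus cluster assignments.
expert = ["leak", "leak", "wear", "wear"]
matching_clusters = [1, 1, 0, 0]      # same grouping, different ids
unrelated_clusters = [0, 1, 0, 1]     # grouping independent of the expert codes
print(nmi(expert, matching_clusters), nmi(expert, unrelated_clusters))
```

Because the score depends only on how items are grouped, relabeling cluster ids does not change it.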

Anomaly detection on tabular data is widely used in fraud detection, predictive maintenance, and medical screening. While heterogeneous ensembles combining multiple detection paradigms achieve strong performance, their computational cost limits deployment in latency-sensitive or resource-constrained environments. We propose KD-AnomalyNet, a teacher–student framework that distills anomaly knowledge from a high-capacity ensemble into a lightweight neural model for efficient inference. Beyond performance replication, we study how anomaly representations transfer during distillation. To this end, we introduce a noise perturbation analysis that serves as a diagnostic probe for representation stability without introducing additional trainable components. Experiments on ten benchmark datasets show that the distilled model preserves up to 98.5% of the teacher’s AUC-ROC on the nine capacity-sufficient datasets (84.7% mean retention across all ten datasets) while achieving 26–181× inference speedups. Our analysis reveals which forms of anomaly knowledge transfer reliably—global outliers (78% transfer) and isolation-based detection (88% retention)—and which degrade under compression—local outliers (20% transfer) and neighborhood-based detection (76% retention)—providing practical guidance for deploying distilled anomaly detectors.

3 March 2026

The KD-AnomalyNet workflow. The teacher provides soft anomaly scores ($s_T$), confidence weights ($c$), and reconstruction targets ($\hat{x}_T$), which are transferred to the student via a temperature-annealed distillation engine.
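The AUC-ROC retention figures quoted in the abstract above can be made concrete with a small sketch. It assumes that retention means the student's AUC-ROC divided by the teacher's, which the abstract does not define explicitly; the labels and anomaly scores are hypothetical.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (anomaly, normal) pairs ranked correctly; ties count half."""
    anomalies = [s for y, s in zip(labels, scores) if y == 1]
    normals = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in anomalies
        for b in normals
    )
    return wins / (len(anomalies) * len(normals))

# Hypothetical ground truth (1 = anomaly) and anomaly scores from each model.
y_true = [0, 0, 0, 1, 1]
teacher_scores = [0.10, 0.20, 0.30, 0.80, 0.90]
student_scores = [0.10, 0.20, 0.85, 0.80, 0.90]  # one normal point scored too high

teacher_auc = auc_roc(y_true, teacher_scores)
student_auc = auc_roc(y_true, student_scores)
retention = student_auc / teacher_auc  # assumed definition of "retention"
print(f"teacher={teacher_auc:.3f} student={student_auc:.3f} retention={retention:.1%}")
```

The pairwise formulation is quadratic in the number of samples, which is fine for a toy example but would normally be replaced by a rank-based computation on real benchmark data.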

Mach. Learn. Knowl. Extr. - ISSN 2504-4990