Supervised and Unsupervised Classification Algorithms (2nd Edition)

A special issue of Algorithms (ISSN 1999-4893). This special issue belongs to the section "Algorithms for Multidisciplinary Applications".

Deadline for manuscript submissions: 31 August 2024 | Viewed by 11167

Special Issue Editors


Dr. Mario Rosario Guarracino
Guest Editor
Department of Economics and Law, University of Cassino and Southern Lazio, 03043 Cassino, Italy
Interests: data science; statistical network analysis; supervised classification

Special Issue Information

Dear Colleagues,

Supervised and unsupervised classification algorithms are the two main branches of machine learning methods. Supervised classification refers to the task of training a system on labeled data divided into classes and assigning new data to those existing classes. The process consists of computing a model from a set of labeled training data and then applying the model to predict the class labels of incoming unlabeled data. It is called supervised learning because the labeled training data set supervises the learning process. Supervised learning algorithms are divided into two categories: classification and regression.

In unsupervised classification, the data being processed are unlabeled; in the absence of prior knowledge, the algorithm searches for similarities in the data to form clusters and assign classes. Unsupervised learning algorithms are divided into three categories: clustering, density estimation, and dimensionality reduction.
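
To make the distinction concrete, the following minimal sketch (using scikit-learn) fits a supervised classifier on labeled data and then runs a clustering algorithm on the same data with the labels withheld.

```python
# Minimal illustration of the two settings: a supervised classifier is
# trained with labels, while a clustering algorithm must discover the
# class structure on its own.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Supervised classification: the labeled training set supervises the fit.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("predicted class for first sample:", clf.predict(X[:1]))

# Unsupervised classification: only X is given; clusters stand in for classes.
km = KMeans(n_clusters=3, n_init=10).fit(X)
print("cluster assignments of first five samples:", km.labels_[:5])
```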

Applications range from object detection in biomedical images and disease prediction to natural language understanding and generation.

Submissions are welcome for both traditional classification problems and new applications. Potential topics include, but are not limited to, image classification, data integration, clustering approaches, and feature extraction.

Dr. Mario Rosario Guarracino
Dr. Laura Antonelli
Dr. Pietro Hiram Guzzi
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • supervised classification algorithms
  • clustering algorithms
  • network analysis
  • community extraction
  • data science
  • biological knowledge extraction


Published Papers (8 papers)


Research

17 pages, 368 KiB  
Article
Smooth Information Criterion for Regularized Estimation of Item Response Models
by Alexander Robitzsch
Algorithms 2024, 17(4), 153; https://doi.org/10.3390/a17040153 - 06 Apr 2024
Viewed by 666
Abstract
Item response theory (IRT) models are frequently used to analyze multivariate categorical data from questionnaires or cognitive tests. To reduce the model complexity of item response models, regularized estimation is now widely applied, adding a nondifferentiable penalty function such as the LASSO or SCAD penalty to the log-likelihood function in the optimization. In most applications, regularized estimation repeatedly estimates the IRT model on a grid of regularization parameters λ. The final model is selected for the parameter that minimizes the Akaike or Bayesian information criterion (AIC or BIC). In recent work, it has been proposed to directly minimize a smooth approximation of the AIC or the BIC for regularized estimation. This approach circumvents the repeated estimation of the IRT model, so the computation time is substantially reduced. The adequacy of the new approach is demonstrated by three simulation studies focusing on regularized estimation for IRT models with differential item functioning, multidimensional IRT models with cross-loadings, and the mixed Rasch/two-parameter logistic IRT model. The simulation studies found that the computationally less demanding direct optimization based on the smooth variants of the AIC and BIC had comparable or improved performance relative to the ordinarily employed repeated regularized estimation based on the AIC or BIC.
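
As a rough illustration of the idea, the sketch below (not the author's IRT implementation) replaces the nonsmooth parameter count in the BIC of a plain linear model with a differentiable surrogate and minimizes the criterion directly; the smoothing constant eps is a hypothetical choice.

```python
# Toy sketch: minimize a smooth BIC directly instead of refitting on a
# grid of regularization parameters. The nonsmooth "number of nonzero
# parameters" is replaced by the differentiable surrogate b^2/(b^2 + eps).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.5] + [0.0] * (p - 2))
y = X @ beta_true + rng.normal(size=n)

def smooth_bic(beta, eps=1e-3):
    rss = np.sum((y - X @ beta) ** 2)
    k = np.sum(beta**2 / (beta**2 + eps))   # smooth surrogate parameter count
    return n * np.log(rss / n) + np.log(n) * k

res = minimize(smooth_bic, x0=np.zeros(p), method="L-BFGS-B")
print(np.round(res.x, 2))   # coefficients of the directly selected model
```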

20 pages, 6112 KiB  
Article
Multi-Augmentation-Based Contrastive Learning for Semi-Supervised Learning
by Jie Wang, Jie Yang, Jiafan He and Dongliang Peng
Algorithms 2024, 17(3), 91; https://doi.org/10.3390/a17030091 - 20 Feb 2024
Viewed by 1005
Abstract
Semi-supervised learning has proven effective in utilizing unlabeled samples to mitigate the problem of limited labeled data. Traditional semi-supervised learning methods generate pseudo-labels for unlabeled samples and train the classifier using both labeled and pseudo-labeled samples. However, in data-scarce scenarios, reliance on labeled samples for initial classifier generation can degrade performance. Methods based on consistency regularization have shown promising results by encouraging consistent outputs for different semantic variations of the same sample obtained through diverse augmentation techniques. However, existing methods typically utilize only weak and strong augmentation variants, limiting information extraction. Therefore, a multi-augmentation contrastive semi-supervised learning method (MAC-SSL) is proposed. MAC-SSL introduces moderate augmentation, combining outputs from moderately and weakly augmented unlabeled images to generate pseudo-labels. A cross-entropy loss ensures consistency between strongly augmented image outputs and the pseudo-labels. Furthermore, MixUp is adopted to blend outputs from labeled and unlabeled images, enhancing consistency between re-augmented outputs and new pseudo-labels. The proposed method achieves state-of-the-art accuracy in extensive experiments conducted on multiple datasets with varying numbers of labeled samples. Ablation studies further investigate each component's significance.
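
A schematic PyTorch-style sketch of the pseudo-labeling step, as we read it from the abstract, is given below; model, the three augmentation functions, and the confidence threshold tau are hypothetical placeholders rather than the authors' code.

```python
import torch
import torch.nn.functional as F

def mac_ssl_step(model, x_u, weak_aug, moderate_aug, strong_aug, tau=0.95):
    """One unlabeled-batch step of the scheme described in the abstract."""
    with torch.no_grad():
        # Average weakly and moderately augmented predictions into a pseudo-label.
        p_weak = torch.softmax(model(weak_aug(x_u)), dim=1)
        p_mod = torch.softmax(model(moderate_aug(x_u)), dim=1)
        conf, pseudo = ((p_weak + p_mod) / 2).max(dim=1)
        mask = (conf >= tau).float()          # keep only confident pseudo-labels
    # Consistency loss: strongly augmented outputs should match the pseudo-labels.
    logits_strong = model(strong_aug(x_u))
    loss_u = (F.cross_entropy(logits_strong, pseudo, reduction="none") * mask).mean()
    return loss_u
```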

23 pages, 4215 KiB  
Article
Frequent Errors in Modeling by Machine Learning: A Prototype Case of Predicting the Timely Evolution of COVID-19 Pandemic
by Károly Héberger
Algorithms 2024, 17(1), 43; https://doi.org/10.3390/a17010043 - 19 Jan 2024
Viewed by 1634
Abstract
Background: The development and application of machine learning (ML) methods have become so fast that almost nobody can follow their developments in every detail. It is no wonder that numerous errors and inconsistencies in their usage have spread with similar speed, independently of the task (regression or classification). This work summarizes frequent errors committed by certain authors with the aim of helping scientists avoid them. Methods: The principle of parsimony governs the train of thought. Fair method comparison can be completed with multicriteria decision-making techniques, preferably by the sum of ranking differences (SRD). Its coupling with analysis of variance (ANOVA) decomposes the effects of several factors. Earlier findings are summarized in a review-like manner: the abuse of the correlation coefficient and proper practices for model discrimination are also outlined. Results: Using an illustrative example, the correct practice and methodology are summarized as guidelines for model discrimination and for minimizing prediction errors. The following factors are all prerequisites for successful modeling: proper data preprocessing, statistical tests, suitable performance parameters, appropriate degrees of freedom, fair comparison of models, and outlier detection, to name a few. A checklist is provided, in a tutorial manner, on how to present ML modeling properly. The advocated practices are reviewed briefly in the discussion. Conclusions: Many of the errors can easily be filtered out with careful reviewing. It is every author's responsibility to adhere to the rules of modeling and validation. A representative sampling of the recent literature outlines correct practices and emphasizes that no error-free publication exists.
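
For readers unfamiliar with SRD, the sketch below illustrates the principle only (not the validated SRD implementation): rank each method's results over the test cases, compare with the ranking of a reference column, and sum the absolute rank differences.

```python
# Sum of ranking differences (SRD), illustrative version: the reference
# is taken as the row-wise average, a common consensus choice.
# Smaller SRD = closer agreement with the reference ranking.
import numpy as np
from scipy.stats import rankdata

def srd(results, reference=None):
    """results: array of shape (n_cases, n_methods)."""
    if reference is None:
        reference = results.mean(axis=1)    # consensus reference column
    ref_rank = rankdata(reference)
    return np.array([np.abs(rankdata(results[:, j]) - ref_rank).sum()
                     for j in range(results.shape[1])])

rng = np.random.default_rng(1)
scores = rng.normal(size=(20, 4))           # 20 test cases, 4 methods
print(srd(scores))                          # one SRD value per method
```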

24 pages, 5543 KiB  
Article
A Multi-Class Deep Learning Approach for Early Detection of Depressive and Anxiety Disorders Using Twitter Data
by Lamia Bendebane, Zakaria Laboudi, Asma Saighi, Hassan Al-Tarawneh, Adel Ouannas and Giuseppe Grassi
Algorithms 2023, 16(12), 543; https://doi.org/10.3390/a16120543 - 27 Nov 2023
Viewed by 1616
Abstract
Social media occupies an important place in people's daily lives, where users share content such as thoughts, experiences, events, and feelings. The massive use of social media has led to the generation of huge volumes of data. These data constitute a treasure trove, allowing the extraction of large amounts of relevant information, particularly when deep learning techniques are involved. In this context, various research studies have investigated the detection of mental disorders, notably depression and anxiety, through the analysis of data extracted from the Twitter platform. However, although these studies achieved very satisfactory results, they relied mainly on binary classification models, treating each mental disorder separately. It would be better to develop systems capable of dealing with several mental disorders at the same time. To address this point, we propose a well-defined methodology involving the use of deep learning to develop effective multi-class models for detecting both depressive and anxiety disorders through the analysis of tweets. The idea is to test a large number of deep learning models, ranging from simple to hybrid variants, to examine their strengths and weaknesses. Moreover, we use grid search to find suitable values for the learning-rate hyperparameter, given its importance in training models. Our work is validated through several experiments and comparisons considering various datasets and other binary classification models, with the aim of showing the effectiveness of both the assumptions used to collect the data and the use of multi-class rather than binary-class models. Overall, the results obtained are satisfactory and very competitive compared with related works.
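
As a runnable stand-in for the learning-rate grid search described above, the sketch below (scikit-learn, synthetic data) loops over candidate rates and keeps the one with the best validation accuracy; the authors' actual models are deep networks trained on tweets.

```python
# Grid search over the learning-rate hyperparameter of a small MLP;
# illustrates the selection loop, not the authors' architectures or data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=600, n_classes=3, n_informative=6,
                           random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

best_lr, best_acc = None, 0.0
for lr in [1e-4, 1e-3, 1e-2, 1e-1]:
    clf = MLPClassifier(hidden_layer_sizes=(64,), learning_rate_init=lr,
                        max_iter=300, random_state=0).fit(X_tr, y_tr)
    acc = clf.score(X_val, y_val)       # validation accuracy for this rate
    if acc > best_acc:
        best_lr, best_acc = lr, acc
print(f"best learning rate: {best_lr} (validation accuracy {best_acc:.3f})")
```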

27 pages, 9031 KiB  
Article
Supervised Methods for Modeling Spatiotemporal Glacier Variations by Quantification of the Area and Terminus of Mountain Glaciers Using Remote Sensing
by Edmund Robbins, Thu Thu Hlaing, Jonathan Webb and Nezamoddin N. Kachouie
Algorithms 2023, 16(10), 486; https://doi.org/10.3390/a16100486 - 19 Oct 2023
Viewed by 1150
Abstract
Glaciers are important indicators of climate change, as changes in their physical features, such as area, respond to measurable fluctuations in climate factors such as temperature, precipitation, and CO2. Although a general retreat of mountain glacier systems has been identified in relation to centennial trends toward warmer temperatures, a great deal more information on regional climate variations can potentially be extracted by mapping the time history of the terminus position or surface area of glaciers. The remote nature of glaciers renders direct measurement impractical on anything other than a local scale. Considering the sheer number of mountain glaciers around the globe, ground measurements of terminus position are available for only a small percentage of glaciers, and ground measurements of glacier area are rare. In this project, changes in the terminus and area of the Franz Josef and Gorner glaciers were quantified in response to climate factors using satellite imagery taken by Landsat at regular intervals. Two supervised learning methods, a parametric method (multiple regression) and a nonparametric method (generalized additive model), were implemented to identify climate factors that impact glacier changes. Local temperature, CO2, and precipitation were identified as significant factors for predicting changes in both the Franz Josef and Gorner glaciers. Spatiotemporal quantification of glacier change is an essential task for modeling glacier variations in response to global and local climate factors. This work provides valuable insights into the quantification of glacier surface area from satellite imagery, with potential implementation as a generic approach.
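
The two supervised approaches named in the abstract can be sketched as follows on synthetic stand-ins for the predictors (temperature, CO2, precipitation) and the response (glacier area); this assumes the statsmodels and pygam packages and is not the authors' pipeline.

```python
# Parametric multiple regression (statsmodels) and a nonparametric
# generalized additive model (pygam) fit to synthetic climate-like data.
import numpy as np
import statsmodels.api as sm
from pygam import LinearGAM, s

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))               # temperature, CO2, precipitation
area = (5 - 1.2 * X[:, 0] - 0.8 * X[:, 1] + 0.3 * X[:, 2]
        + rng.normal(scale=0.5, size=120))  # synthetic glacier area

ols = sm.OLS(area, sm.add_constant(X)).fit()      # multiple regression
print(ols.pvalues)                                 # per-factor significance

gam = LinearGAM(s(0) + s(1) + s(2)).fit(X, area)   # one smooth term per factor
gam.summary()
```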

26 pages, 35873 KiB  
Article
Quantitative and Qualitative Comparison of Decision-Map Techniques for Explaining Classification Models
by Yu Wang, Alister Machado and Alexandru Telea
Algorithms 2023, 16(9), 438; https://doi.org/10.3390/a16090438 - 11 Sep 2023
Viewed by 1160
Abstract
Visualization techniques for understanding and explaining machine learning models have gained significant attention. One such technique is the decision map, which creates a 2D depiction of the decision behavior of classifiers trained on high-dimensional data. While several decision-map techniques have been proposed recently, such as Decision Boundary Maps (DBMs), Supervised Decision Boundary Maps (SDBMs), and DeepView (DV), there is no framework for comprehensively evaluating and comparing them. In this paper, we propose such a framework by combining quantitative metrics and qualitative assessment. We apply our framework to DBM, SDBM, and DV using a range of synthetic and real-world classification techniques and datasets. Our results show that none of the evaluated decision-map techniques consistently outperforms the others in all measured aspects. Separately, our analysis exposes several previously unknown properties and limitations of decision-map techniques. To support practitioners, we also propose a workflow for selecting the most appropriate decision-map technique for given datasets, classifiers, and requirements of the application at hand.
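
A much-simplified sketch of the decision-map idea is shown below: project the data to 2D and color a dense grid by classifier decisions. The techniques compared in the paper (DBM, SDBM, DeepView) classify grid points mapped back to the original high-dimensional space via an inverse projection; for a self-contained toy, this sketch instead trains the classifier in the projected space.

```python
# Toy decision map: color a 2D grid over a PCA projection by the
# decisions of a classifier trained in the projected space.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X2 = PCA(n_components=2).fit_transform(X)        # 2D projection of the data
clf = KNeighborsClassifier().fit(X2, y)

# Classify every point of a dense grid covering the projection.
xs = np.linspace(X2[:, 0].min(), X2[:, 0].max(), 300)
ys = np.linspace(X2[:, 1].min(), X2[:, 1].max(), 300)
gx, gy = np.meshgrid(xs, ys)
zz = clf.predict(np.c_[gx.ravel(), gy.ravel()]).reshape(gx.shape)

plt.contourf(gx, gy, zz, levels=10, alpha=0.4)   # the decision map
plt.scatter(X2[:, 0], X2[:, 1], c=y, s=5)
plt.show()
```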

20 pages, 447 KiB  
Article
Optimized Tensor Decomposition and Principal Component Analysis Outperforming State-of-the-Art Methods When Analyzing Histone Modification Chromatin Immunoprecipitation Profiles
by Turki Turki, Sanjiban Sekhar Roy and Y.-H. Taguchi
Algorithms 2023, 16(9), 401; https://doi.org/10.3390/a16090401 - 23 Aug 2023
Cited by 1 | Viewed by 1689
Abstract
It is difficult to identify histone modifications from high-throughput sequencing datasets. Although multiple methods have been developed to identify histone modification, most are not specific to histone modification but are general methods that aim to identify protein binding to the genome. In this study, tensor decomposition (TD)- and principal component analysis (PCA)-based unsupervised feature extraction with optimized standard deviation, previously applied successfully to gene expression and DNA methylation, was used to identify histone modification. Histone modification along the genome is binned within regions of length L. Considering the principal components (PCs) or singular value vectors (SVVs) that PCA or TD attributes to samples, we can select PCs or SVVs attributed to regions. The selected PCs and SVVs further attribute p-values to regions, and adjusted p-values are used to select regions. The proposed method identified various histone modifications successfully and outperformed various state-of-the-art methods. It is expected to serve as a de facto standard method for identifying histone modification. For reproducibility, and to ensure that the systematic analysis of our study is applicable to datasets from different gene expression experiments, we have made our tools publicly available for download from GitHub.
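
A rough sketch of the general recipe, as we understand it, is given below: decompose a regions-by-samples signal matrix, attribute p-values to regions from a selected component under a Gaussian assumption, and keep regions passing a Benjamini-Hochberg adjustment. The optimized standard deviation, which is central to the paper, is omitted here.

```python
# Illustrative PCA/SVD-based unsupervised feature extraction over binned
# genomic signal; synthetic data stand in for histone-modification counts.
import numpy as np
from scipy.stats import chi2
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
signal = rng.poisson(5.0, size=(2000, 6)).astype(float)  # regions x samples
signal[:50] += 20.0                                       # enriched regions

u, sv, vt = np.linalg.svd(signal - signal.mean(axis=0), full_matrices=False)
score = u[:, 0]                                  # component attributed to regions
p = chi2.sf((score / score.std()) ** 2, df=1)    # per-region p-values
reject, p_adj, _, _ = multipletests(p, alpha=0.05, method="fdr_bh")
print("regions selected:", int(reject.sum()))
```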

17 pages, 599 KiB  
Article
Two Medoid-Based Algorithms for Clustering Sets
by Libero Nigro and Pasi Fränti
Algorithms 2023, 16(7), 349; https://doi.org/10.3390/a16070349 - 20 Jul 2023
Cited by 1 | Viewed by 1078
Abstract
This paper proposes two algorithms for clustering data that are variable-sized sets of elementary items. An example of such data occurs in the analysis of medical diagnoses, where the goal is to detect human subjects who share common diseases, so as to possibly predict future illnesses from previous medical history. The first proposed algorithm is based on K-medoids, and the second extends the random swap algorithm, which has proven capable of efficient and careful clustering; both algorithms depend on a distance function over data objects (sets), which can use application-sensitive weights or priorities. The proposed distance function makes it possible to exploit several seeding methods that can improve clustering accuracy. A key factor in the two algorithms is their parallel implementation in Java, based on functional programming using streams and lambda expressions. The use of parallelism smooths out the O(N²) computational cost behind K-medoids and clustering indexes such as the Silhouette index, and allows for the handling of non-trivial datasets. This paper applies the algorithms to several benchmark case studies of sets and demonstrates how accurate and time-efficient clustering solutions can be achieved.
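
The paper's implementation is parallel Java; the compact sequential Python sketch below illustrates K-medoids over variable-sized sets with a plain Jaccard distance, omitting the weighted distance and the seeding methods discussed in the abstract.

```python
# K-medoids over sets: assign each set to its nearest medoid under
# Jaccard distance, then re-pick each medoid as the cluster member
# minimizing total intra-cluster distance.
import random

def jaccard(a: set, b: set) -> float:
    return 1.0 - len(a & b) / len(a | b) if (a or b) else 0.0

def k_medoids(data, k, iters=20, seed=0):
    random.seed(seed)
    medoids = random.sample(range(len(data)), k)
    for _ in range(iters):
        # Assign every set to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, s in enumerate(data):
            clusters[min(medoids, key=lambda m: jaccard(s, data[m]))].append(i)
        # Re-pick each medoid as the member minimizing intra-cluster distance.
        medoids = [min(idx, key=lambda c: sum(jaccard(data[c], data[j]) for j in idx))
                   for idx in clusters.values() if idx]
    # Final assignment of every set to its nearest medoid.
    return medoids, {i: min(medoids, key=lambda m: jaccard(s, data[m]))
                     for i, s in enumerate(data)}

records = [{"flu", "asthma"}, {"flu", "cough"}, {"diabetes"}, {"diabetes", "obesity"}]
print(k_medoids(records, k=2))
```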
