Mathematical Methods in Machine Learning and Data Science

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: 31 January 2025

Special Issue Editors


Guest Editor
Faculty of Engineering, Canadian University Dubai, Dubai, United Arab Emirates
Interests: operator algebras; machine learning

Guest Editor
College of Arts and Sciences, Department of Mathematics and Statistics, American University of Sharjah, Sharjah P.O. Box 26666, United Arab Emirates
Interests: statistics; probability and random processes; business mathematics; statistical modeling and data science

Guest Editor
Department of Mathematics and Statistics, American University of Sharjah, Sharjah P.O. Box 26666, United Arab Emirates
Interests: generalized statistical distributions arising from the hazard function; statistical inference of probability models; characterization of distributions; bivariate and multivariate weighted distributions

Special Issue Information

Dear Colleagues,

This Special Issue is devoted to recent advances in the use of mathematical methods in data science and machine learning. It covers a range of topics of interest to scholars applying quantitative, optimization, combinatorial, logical, topological, geometric, statistical, algebraic, and algorithmic methods to diverse areas of data science and machine learning. Novel methods, new applications, comparative analyses of models, case studies, and state-of-the-art review papers are particularly welcome.

Mathematical methods have underpinned every major advance in data science and machine learning, from reproducing kernel Hilbert spaces and early back-propagation to more recent tools such as random matrix theory and graph theory. Combined with the enormous amounts of data and computing power now available, these methods have propelled machine learning to astonishing results, achieving near-human performance on many tasks. In response to these advances, the objective of this Special Issue is to present a collection of notable mathematical and statistical methods in data science and machine learning. We invite scholars from around the world to contribute to a comprehensive collection of papers on this important theme.

Dr. Firuz Kamalov
Prof. Dr. Hana Sulieman
Dr. Ayman Alzaatreh
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • mathematical methods
  • machine learning
  • data science
  • optimization
  • mathematical statistics
  • algorithms
  • linear algebra
  • dimensionality reduction
  • topology
  • geometry
  • logic
  • combinatorics
  • fuzzy logic
  • time series
  • regression
  • classification
  • imbalanced data
  • feature selection

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (2 papers)


Research

34 pages, 10312 KiB  
Article
Next-Generation Diagnostics: The Impact of Synthetic Data Generation on the Detection of Breast Cancer from Ultrasound Imaging
by Hari Mohan Rai, Serhii Dashkevych and Joon Yoo
Mathematics 2024, 12(18), 2808; https://doi.org/10.3390/math12182808 - 11 Sep 2024
Abstract
Breast cancer is one of the most lethal and widespread diseases affecting women worldwide. As a result, it is necessary to diagnose breast cancer accurately and efficiently utilizing the most cost-effective and widely used methods. In this research, we demonstrated that synthetically created high-quality ultrasound data outperformed conventional augmentation strategies for efficiently diagnosing breast cancer using deep learning. We trained a deep-learning model using the EfficientNet-B7 architecture and a large dataset of 3186 ultrasound images acquired from multiple publicly available sources, as well as 10,000 synthetically generated images using generative adversarial networks (StyleGAN3). The model was trained using five-fold cross-validation techniques and validated using four metrics: accuracy, recall, precision, and the F1 score measure. The results showed that integrating synthetically produced data into the training set increased the classification accuracy from 88.72% to 92.01% based on the F1 score, demonstrating the power of generative models to expand and improve the quality of training datasets in medical-imaging applications. This demonstrated that training the model using a larger set of data comprising synthetic images significantly improved its performance by more than 3% over the genuine dataset with common augmentation. Various data augmentation procedures were also investigated to improve the training set’s diversity and representativeness. This research emphasizes the relevance of using modern artificial intelligence and machine-learning technologies in medical imaging by providing an effective strategy for categorizing ultrasound images, which may lead to increased diagnostic accuracy and optimal treatment options. The proposed techniques are highly promising and have strong potential for future clinical application in the diagnosis of breast cancer.
(This article belongs to the Special Issue Mathematical Methods in Machine Learning and Data Science)
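The evaluation protocol the abstract describes (synthetic images added to the training data only, with k-fold cross-validation scored on real data) can be sketched in miniature. Everything below is a hypothetical stand-in: a one-feature threshold "model" replaces EfficientNet-B7, scalar features replace ultrasound images, and a handful of hand-picked values replace the real and StyleGAN3-generated datasets; only the protocol itself (synthetic samples never enter a validation fold) is the point.

```python
# Toy sketch of k-fold evaluation with synthetic training-data augmentation.
def kfold_indices(n, k):
    """Split range(n) into k contiguous validation folds (real CV would
    shuffle and stratify)."""
    size = n // k
    return [list(range(i * size, (i + 1) * size)) for i in range(k)]

def f1(y_true, y_pred):
    """F1 score for binary labels, computed from scratch."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(p == 1 and t == 0 for t, p in zip(y_true, y_pred))
    fn = sum(p == 0 and t == 1 for t, p in zip(y_true, y_pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# Real data: one scalar "feature" per image; label 1 = malignant.
real_x = [0.1, 0.8, 0.2, 0.9, 0.15, 0.85, 0.3, 0.7, 0.25, 0.75]
real_y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
# Synthetic (GAN-generated) samples: added to training folds only,
# never to the validation fold.
syn_x = [0.05, 0.95, 0.1, 0.9]
syn_y = [0, 1, 0, 1]

scores = []
for val_idx in kfold_indices(len(real_x), k=5):
    train_idx = [i for i in range(len(real_x)) if i not in val_idx]
    train_x = [real_x[i] for i in train_idx] + syn_x
    train_y = [real_y[i] for i in train_idx] + syn_y
    # "Training": place a threshold midway between the two class means.
    mean0 = sum(x for x, y in zip(train_x, train_y) if y == 0) / train_y.count(0)
    mean1 = sum(x for x, y in zip(train_x, train_y) if y == 1) / train_y.count(1)
    thr = (mean0 + mean1) / 2
    preds = [int(real_x[i] > thr) for i in val_idx]
    scores.append(f1([real_y[i] for i in val_idx], preds))

print(sum(scores) / len(scores))  # 1.0 on this perfectly separable toy data
```

The design point worth noting is that synthetic samples are appended only after the fold split, so the reported F1 always reflects performance on genuine data.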

14 pages, 1461 KiB  
Article
Prompt Optimization in Large Language Models
by Antonio Sabbatella, Andrea Ponti, Ilaria Giordani, Antonio Candelieri and Francesco Archetti
Mathematics 2024, 12(6), 929; https://doi.org/10.3390/math12060929 - 21 Mar 2024
Abstract
Prompt optimization is a crucial task for improving the performance of large language models for downstream tasks. In this paper, a prompt is a sequence of n-grams selected from a vocabulary. Consequently, the aim is to select the optimal prompt concerning a certain performance metric. Prompt optimization can be considered as a combinatorial optimization problem, with the number of possible prompts (i.e., the combinatorial search space) given by the size of the vocabulary (i.e., all the possible n-grams) raised to the power of the length of the prompt. Exhaustive search is impractical; thus, an efficient search strategy is needed. We propose a Bayesian Optimization method performed over a continuous relaxation of the combinatorial search space. Bayesian Optimization is the dominant approach in black-box optimization for its sample efficiency, along with its modular structure and versatility. We use BoTorch, a library for Bayesian Optimization research built on top of PyTorch. Specifically, we focus on Hard Prompt Tuning, which directly searches for an optimal prompt to be added to the text input without requiring access to the Large Language Model, using it as a black-box (such as for GPT-4 which is available as a Model as a Service). Albeit preliminary and based on “vanilla” Bayesian Optimization algorithms, our experiments with RoBERTa as a large language model, on six benchmark datasets, show good performances when compared against other state-of-the-art black-box prompt optimization methods and enable an analysis of the trade-off between the size of the search space, accuracy, and wall-clock time.
(This article belongs to the Special Issue Mathematical Methods in Machine Learning and Data Science)
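The continuous-relaxation idea in the abstract can be illustrated with a toy sketch. Note the stand-ins: random two-dimensional "embeddings" replace a real n-gram vocabulary, a synthetic scoring function replaces the black-box LLM evaluation, and plain random search replaces the Gaussian-process acquisition loop that BoTorch would provide. The essential step being shown is the projection: candidates live in the continuous space, but each one is rounded back to its nearest vocabulary tokens before the black box is queried.

```python
# Toy sketch of hard prompt tuning via a continuous relaxation.
import math
import random

random.seed(0)

VOCAB = ["good", "great", "movie", "review", "terrible", "boring"]
# Each token gets a fixed (here random) embedding in R^2.
EMB = {t: (random.uniform(-1, 1), random.uniform(-1, 1)) for t in VOCAB}
PROMPT_LEN = 3  # prompt = sequence of 3 tokens -> search space R^(3x2)

def project(z):
    """Round one continuous point to the nearest vocabulary token."""
    return min(VOCAB, key=lambda t: math.dist(EMB[t], z))

def score(prompt):
    """Synthetic black-box objective, standing in for downstream-task
    accuracy obtained by querying the LLM with the candidate prompt."""
    return sum(t in ("good", "great") for t in prompt) / len(prompt)

best, best_val = None, -1.0
for _ in range(200):
    # Propose a candidate in the continuous space (random search here;
    # Bayesian Optimization would propose via an acquisition function) ...
    z = [(random.uniform(-1, 1), random.uniform(-1, 1))
         for _ in range(PROMPT_LEN)]
    # ... project it back to a discrete prompt, then query the black box.
    prompt = [project(p) for p in z]
    val = score(prompt)
    if val > best_val:
        best, best_val = prompt, val

print(best, best_val)
```

Because the objective is only ever evaluated on projected (discrete) prompts, the black box needs no gradients and no model internals, which is what makes the approach applicable to Model-as-a-Service LLMs.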
