Applied Sciences

Editorial

4 pages, 199 KiB

Open Access Editorial

Machine Learning Methods with Noisy, Incomplete or Small Datasets

by Cesar F. Caiafa, Zhe Sun, Toshihisa Tanaka, Pere Marti-Puig and Jordi Solé-Casals

Appl. Sci. 2021, 11(9), 4132; https://doi.org/10.3390/app11094132 - 30 Apr 2021

Cited by 18 | Viewed by 3663

Abstract

In this article, we present a collection of fifteen novel contributions on machine learning methods with low-quality or imperfect datasets, which were accepted for publication in the special issue “Machine Learning Methods with Noisy, Incomplete or Small Datasets”, Applied Sciences (ISSN 2076-3417). These papers provide a variety of novel approaches to real-world machine learning problems where available datasets suffer from imperfections such as missing values, noise or artefacts. Contributions in applied sciences include medical applications, epidemic management tools, methodological work, and industrial applications, among others. We believe that this special issue will bring new ideas for solving this challenging problem, and will provide clear examples of application in real-world scenarios.

(This article belongs to the Special Issue Machine Learning Methods with Noisy, Incomplete or Small Datasets)

Research

47 pages, 20585 KiB

Open Access Feature Paper Article

The INSESS-COVID19 Project. Evaluating the Impact of the COVID19 in Social Vulnerability While Preserving Privacy of Participants from Minority Subpopulations

by Karina Gibert and Xavier Angerri

Appl. Sci. 2021, 11(7), 3110; https://doi.org/10.3390/app11073110 - 31 Mar 2021

Cited by 5 | Viewed by 2914

Abstract

This paper presents the results of the INSESS-COVID19 project, carried out under a special call to help with the COVID19 crisis in Catalonia. The technological infrastructure and methodology developed in this project allow the quick screening of a territory to obtain a fast and reliable diagnosis of an unexpected situation, providing relevant information to support informed decision-making and the design of strategies and policies. One of the challenges of the project was to extract valuable information from direct participatory processes in which specific target profiles of citizens are consulted, and to distribute participation across the whole territory. Having many variables with a moderate number of citizens involved (in this case about 1000) implies a risk of violating statistical secrecy when multivariate relationships are analyzed, thus putting at risk the anonymity of the participants, as well as their safety when vulnerable populations are involved, as is the case in INSESS-COVID19. In this paper, the entire data-driven methodology developed in the project is presented, and the treatment of small population subgroups to preserve statistical secrecy is described. The methodology is reusable with any other underlying questionnaire, as the data science and reporting parts are fully automated.

20 pages, 3892 KiB

Open Access Article

Severity Classification of Parkinson’s Disease Based on Permutation-Variable Importance and Persistent Entropy

by Jigang Tong, Jiachen Zhang, Enzeng Dong and Shengzhi Du

Appl. Sci. 2021, 11(4), 1834; https://doi.org/10.3390/app11041834 - 19 Feb 2021

Cited by 17 | Viewed by 3765

Abstract

Parkinson’s disease (PD) is a neurodegenerative disease that causes chronic and progressive motor dysfunction. As PD progresses, patients show different symptoms at different stages of the disease. Severity assessment is inefficient and subjective when it relies on manual diagnosis. In addition, abnormal gait occurs only occasionally and subject selection was limited. Therefore, few-shot learning based on small sample sets is critical to solving the problem of insufficient sample data in PD patients. Using datasets from PhysioNet, this paper presents a method based on permutation-variable importance (PVI) and persistent entropy of topological imprints, and uses a support vector machine (SVM) as a classifier to achieve the severity classification of PD patients. The method includes the following steps: (1) Divide the data into gait cycles, and calculate the gait characteristics of each cycle. (2) Use the random forest (RF) method to obtain the leading factors differentiating the gait of patients at different severity levels. (3) Use time-delay embedding to map the data into a topological space, and use topological data analysis based on persistent homology to obtain the persistent entropy. (4) Use the Borderline-SMOTE (BSM) method to balance the sample data. (5) Use the SVM to classify the samples by PD severity level. An accuracy of 98.08% was achieved by 10-fold cross-validation, so our method can be used as an effective means of computer-aided diagnosis of PD, and has important practical value.
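
As a rough illustration of step (3), the sketch below shows a time-delay embedding and the usual definition of persistent entropy, i.e. the Shannon entropy of the normalized bar lengths of a persistence barcode. The function names, the embedding parameters and the toy barcode are assumptions made for this example; the persistent homology computation itself is not shown and would come from a dedicated library.

```python
import math


def delay_embed(series, dim, tau):
    """Map a 1-D series to points (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    n_points = len(series) - (dim - 1) * tau
    return [tuple(series[i + k * tau] for k in range(dim)) for i in range(n_points)]


def persistent_entropy(intervals):
    """Shannon entropy of the normalized bar lengths of a persistence barcode."""
    lengths = [death - birth for birth, death in intervals if death > birth]
    total = sum(lengths)
    if total == 0:
        return 0.0
    return -sum((length / total) * math.log(length / total) for length in lengths)


# Hypothetical barcode: in practice the (birth, death) pairs would be produced
# by a persistent homology computation on the embedded point cloud.
barcode = [(0.0, 1.0), (0.0, 1.0), (0.0, 2.0)]
print(delay_embed([1, 2, 3, 4, 5], dim=2, tau=2))
print(persistent_entropy(barcode))
```

A barcode whose bars all have the same length gives the maximum entropy, log(n) for n bars, while a single dominant bar pushes the value towards zero.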

18 pages, 494 KiB

Open Access Article

Investigating Health-Related Features and Their Impact on the Prediction of Diabetes Using Machine Learning

by Hafiz Farooq Ahmad, Hamid Mukhtar, Hesham Alaqail, Mohamed Seliaman and Abdulaziz Alhumam

Appl. Sci. 2021, 11(3), 1173; https://doi.org/10.3390/app11031173 - 27 Jan 2021

Cited by 64 | Viewed by 6235

Abstract

Diabetes Mellitus (DM) is one of the most common chronic diseases, leading to severe health complications that may cause death. The disease affects individuals, the community, and the government due to the continuous monitoring, lifelong commitment, and cost of treatment it requires. The World Health Organization (WHO) ranks Saudi Arabia among the top 10 countries in diabetes prevalence worldwide. Since most of its medical services are provided by the government, the cost of treatment in terms of hospital and clinical visits and lab tests represents a real burden due to the large scale of the disease. The ability to predict the diabetic status of a patient with only a handful of features can allow cost-effective, rapid, and widely available screening of diabetes, thereby lessening the health and economic burden caused by diabetes alone. The goal of this paper is to investigate the prediction of diabetic patients and compare the role of HbA1c and FPG as input features. By using five different machine learning classifiers, together with feature elimination through feature permutation and hierarchical clustering, we established good performance in terms of accuracy, precision, recall, and F1-score on the dataset, implying that our data or features are not bound to specific models. In addition, the consistent performance across all the evaluation metrics indicates that there was no trade-off or penalty among them. Further analysis was performed on the data to identify the risk factors and their indirect impact on diabetes classification. Our analysis showed strong agreement with the risk factors of diabetes and prediabetes stated by the American Diabetes Association (ADA) and other health institutions worldwide. We conclude that by analyzing the disease using selected features, important factors specific to the Saudi population can be identified, whose management can help control the disease. We also provide some recommendations learned from this research.
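
The idea of feature permutation mentioned in this abstract can be sketched as follows: shuffle one feature column at a time and record how much the accuracy drops. This is a generic, model-agnostic illustration and not the paper's pipeline; the toy data, the threshold classifier `toy_model` and all parameter values are hypothetical.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model predicts the correct label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical data: feature 0 decides the label, feature 1 is irrelevant.
X = [[5.0, 1.0], [5.5, 0.0], [7.0, 1.0], [8.0, 0.0], [6.0, 1.0], [9.0, 0.0]]
y = [0, 0, 1, 1, 0, 1]


def toy_model(row):
    """Stand-in classifier: thresholds feature 0 and ignores feature 1."""
    return int(row[0] > 6.0)


print(permutation_importance(toy_model, X, y))
```

A feature whose shuffling leaves the score unchanged, such as feature 1 here, is a candidate for elimination.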

20 pages, 3902 KiB

Open Access Article

Shadow Estimation for Ultrasound Images Using Auto-Encoding Structures and Synthetic Shadows

by Suguru Yasutomi, Tatsuya Arakaki, Ryu Matsuoka, Akira Sakai, Reina Komatsu, Kanto Shozu, Ai Dozen, Hidenori Machino, Ken Asada, Syuzo Kaneko, Akihiko Sekizawa, Ryuji Hamamoto and Masaaki Komatsu

Appl. Sci. 2021, 11(3), 1127; https://doi.org/10.3390/app11031127 - 26 Jan 2021

Cited by 24 | Viewed by 9560

Abstract

Acoustic shadows are common artifacts in medical ultrasound imaging. The shadows are caused by objects that reflect ultrasound, such as bones, and they appear as dark areas in ultrasound images. Detecting such shadows is crucial for assessing the quality of images, and serves as a pre-processing step for further image processing or recognition aimed at computer-aided diagnosis. In this paper, we propose an auto-encoding structure that estimates the shadowed areas and their intensities. The model first splits an input image into an estimated shadow image and an estimated shadow-free image through its encoder and decoder. Then, it combines them to reconstruct the input. By generating plausible synthetic shadows based on relatively coarse domain-specific knowledge about ultrasound images, we can train the model using unlabeled data. If pixel-level labels of the shadows are available, we also utilize them in a semi-supervised fashion. In experiments on ultrasound images for fetal heart diagnosis, we show that our method achieved a DICE score of 0.720 and outperformed conventional image processing methods and a segmentation method based on deep neural networks. The experiments also indicate the capability of the proposed method to estimate the intensities of shadows and the shadow-free images.
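
For reference, the DICE score reported above is commonly computed as twice the overlap between the predicted and reference masks divided by the sum of their sizes. The sketch below assumes binary masks given as nested lists; the function name and the toy masks are hypothetical.

```python
def dice_score(pred_mask, true_mask):
    """DICE = 2 * |A intersect B| / (|A| + |B|) for two binary masks."""
    pred = [bool(v) for row in pred_mask for v in row]
    true = [bool(v) for row in true_mask for v in row]
    intersection = sum(p and t for p, t in zip(pred, true))
    size_sum = sum(pred) + sum(true)
    # Two empty masks are treated as a perfect match by convention.
    return 1.0 if size_sum == 0 else 2.0 * intersection / size_sum


# Hypothetical 2x3 masks: 1 marks a shadow pixel.
predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
print(dice_score(predicted, reference))
```

The score ranges from 0 (no overlap) to 1 (identical masks).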

16 pages, 769 KiB

Open Access Article

Data-Dependent Feature Extraction Method Based on Non-Negative Matrix Factorization for Weakly Supervised Domestic Sound Event Detection

by Seokjin Lee, Minhan Kim, Seunghyeon Shin, Sooyoung Park and Youngho Jeong

Appl. Sci. 2021, 11(3), 1040; https://doi.org/10.3390/app11031040 - 24 Jan 2021

Cited by 7 | Viewed by 2767

Abstract

In this paper, feature extraction methods are developed based on the non-negative matrix factorization (NMF) algorithm to be applied in weakly supervised sound event detection. Recently, various features and systems have been developed to tackle the problems of acoustic scene classification and sound event detection. However, most of these systems use data-independent spectral features, e.g., Mel-spectrogram, log-Mel-spectrum, and gammatone filterbank. Some data-dependent feature extraction methods, including the NMF-based methods, recently demonstrated the potential to tackle the problems mentioned above for long-term acoustic signals. In this paper, we further develop the recently proposed NMF-based feature extraction method to enable its application in weakly supervised sound event detection. To achieve this goal, we develop a strategy for training the frequency basis matrix using a heterogeneous database consisting of strongly- and weakly-labeled data. Moreover, we develop a non-iterative version of the NMF-based feature extraction method so that the proposed feature extraction method can be applied as a part of the model structure, similar to the modern “on-the-fly” transform method for the Mel-spectrogram. To detect the sound events, the temporal basis is calculated using the NMF method and then used as a feature for the mean-teacher-model-based classifier. The results are then refined using an event-wise post-processing method. To evaluate the proposed system, simulations of the weakly supervised sound event detection were conducted using the Detection and Classification of Acoustic Scenes and Events 2020 Task 4 database. The results reveal that the proposed system has F1-score performance comparable with the Mel-spectrogram and gammatonegram and exhibits 3–5% better performance than the log-Mel-spectrum and constant-Q transform.
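
To make the roles of the frequency basis and the temporal basis concrete, the sketch below factorizes a small non-negative matrix V (frequency by time) as V ≈ WH using the standard iterative multiplicative updates for the squared error. It is a textbook NMF illustration only, not the non-iterative variant or the training strategy developed in the paper; the toy matrix, the rank and the iteration count are assumptions.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frob_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh))


def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Multiplicative-update NMF: W is the frequency basis, H the temporal basis."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H


# Hypothetical 4-bin x 3-frame magnitude "spectrogram" built from two patterns.
V = [[1.0, 0.0, 2.0],
     [2.0, 0.0, 4.0],
     [0.0, 3.0, 1.0],
     [0.0, 6.0, 2.0]]
W, H = nmf(V, rank=2)
print(frob_error(V, W, H))
```

In this picture each column of H is the activation of the learned frequency patterns in one time frame, which is the kind of quantity the abstract describes feeding to the classifier.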

16 pages, 1211 KiB

Open Access Article

Comparison of Dengue Predictive Models Developed Using Artificial Neural Network and Discriminant Analysis with Small Dataset

by Permatasari Silitonga, Alhadi Bustamam, Hengki Muradi, Wibowo Mangunwardoyo and Beti E. Dewi

Appl. Sci. 2021, 11(3), 943; https://doi.org/10.3390/app11030943 - 21 Jan 2021

Cited by 19 | Viewed by 3464

Abstract

In Indonesia, dengue has become one of the hyperendemic diseases. Dengue consists of three clinical phases: the febrile phase, the critical phase, and the recovery phase. Many patients have died in the critical phase due to the lack of proper and timely treatment. Therefore, we developed models that can predict the severity level of dengue based on the laboratory test results of the corresponding patients using Artificial Neural Network (ANN) and Discriminant Analysis (DA). In developing the models, we used a very small dataset. It is shown that ANN models developed using logistic and hyperbolic tangent activation functions with 70% training data yielded the highest accuracy (90.91%), sensitivity (91.11%), and specificity (95.51%). This is the proposed model in this research. The proposed model will be able to help physicians predict the severity level of dengue patients before they enter the critical phase. Furthermore, it will make it easier for physicians to treat dengue patients early, so that fatal cases can be avoided.
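
The three metrics quoted above have simple definitions in terms of confusion-matrix counts. The sketch below computes them for a binary labelling, which is a simplifying assumption; how the paper aggregates them over severity levels is not stated here, and the labels below are made up for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    # Sensitivity: share of true positives found; specificity: share of negatives kept.
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, sensitivity, specificity


# Hypothetical labels: 1 = severe case, 0 = non-severe case.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(y_true, y_pred))
```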

17 pages, 2777 KiB

Open Access Article

Applying Knowledge Inference on Event-Conjunction for Automatic Control in Smart Building

by Hangli Ge, Xiaohui Peng and Noboru Koshizuka

Appl. Sci. 2021, 11(3), 935; https://doi.org/10.3390/app11030935 - 20 Jan 2021

Cited by 8 | Viewed by 2505

Abstract

Smart buildings are an emerging IoT-based application in which energy efficiency, human comfort, automation, and security can be managed more effectively. However, at the current stage, a unified and practical framework for knowledge inference inside the smart building is still lacking. In this paper, we present a practical proposal for knowledge extraction on event-conjunction for automatic control in smart buildings. The proposal consists of a unified API design, an ontology model, and an inference engine for knowledge extraction. Two types of models, finite state machines (FSMs) and Bayesian networks (BNs), have been used to capture state transitions and to fuse sensor data. In particular, because the number of time-interval observations between two correlated events was too small for reliable estimation, we utilized the Markov Chain Monte Carlo (MCMC) sampling method to optimize the sampling on time intervals. The proposal has been put into use in a real smart building environment. A 78-day data collection of light states and elevator states has been conducted for evaluation. Several events have been inferred in the evaluation, such as room occupancy, elevator movement, and the conjunction of both. The inference of users’ elevator waiting times revealed the potential and effectiveness of automatic control of the elevator.
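
To give a feel for how MCMC can help when only a handful of time intervals are observed, the sketch below runs a random-walk Metropolis sampler over the rate of an exponential waiting-time model with a flat prior. The exponential model, the prior, the step size and the five intervals are all assumptions chosen for illustration; the abstract does not specify the model used in the paper.

```python
import math
import random


def log_posterior(rate, intervals):
    """Log posterior of an exponential rate under a flat prior on rate > 0."""
    if rate <= 0:
        return -math.inf
    return len(intervals) * math.log(rate) - rate * sum(intervals)


def metropolis(intervals, n_samples=20000, burn_in=2000, step=0.3, seed=0):
    """Random-walk Metropolis sampler for the rate parameter."""
    rng = random.Random(seed)
    rate = 1.0
    logp = log_posterior(rate, intervals)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = rate + rng.gauss(0.0, step)
        logp_new = log_posterior(proposal, intervals)
        # Accept with probability min(1, posterior ratio); proposals <= 0 are rejected.
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            rate, logp = proposal, logp_new
        if i >= burn_in:
            samples.append(rate)
    return samples


# Hypothetical intervals (in minutes) between two correlated events.
intervals = [2.0, 1.5, 3.0, 2.5, 1.0]
samples = metropolis(intervals)
print(sum(samples) / len(samples))
```

Under these assumptions the exact posterior is a Gamma distribution with mean (n + 1) / sum(intervals), here 0.6, which gives a simple sanity check for the sampler's output.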

18 pages, 672 KiB

Open Access Article

Innovatively Fused Deep Learning with Limited Noisy Data for Evaluating Translations from Poor into Rich Morphology

by Despoina Mouratidis, Katia Lida Kermanidis and Vilelmini Sosoni

Appl. Sci. 2021, 11(2), 639; https://doi.org/10.3390/app11020639 - 11 Jan 2021

Cited by 2 | Viewed by 3031

Abstract

Evaluation of machine translation (MT) into morphologically rich languages has not been well studied despite its importance. This paper proposes a classifier, that is, a deep learning (DL) schema for MT evaluation, based on different categories of information (linguistic features, natural language processing (NLP) metrics and embeddings), using a machine learning model trained on noisy and small datasets. The linguistic features are string-based for the language pairs English (EN)–Greek (EL) and EN–Italian (IT). The paper also explores the linguistic differences that affect evaluation accuracy between different kinds of corpora. A comparative study between using a simple embedding layer (mathematically calculated) and pre-trained embeddings is conducted. Moreover, the impact of feature selection and dimensionality reduction on classification accuracy is analyzed. Results show that using a neural network (NN) model with different input representations produces results that clearly outperform the state-of-the-art for MT evaluation for EN–EL and EN–IT, by an increase of almost 0.40 points in correlation with human judgments on pairwise MT evaluation. It is observed that the proposed algorithm achieved better results on noisy and small datasets. In addition, for a more integrated analysis of the accuracy results, a qualitative linguistic analysis has been carried out in order to address complex linguistic phenomena.

18 pages, 443 KiB

Open Access Article

An Effective Multi-Label Feature Selection Model Towards Eliminating Noisy Features

by Jun Wang, Yuanyuan Xu, Hengpeng Xu, Zhe Sun, Zhenglu Yang and Jinmao Wei

Appl. Sci. 2020, 10(22), 8093; https://doi.org/10.3390/app10228093 - 15 Nov 2020

Cited by 2 | Viewed by 2106

Abstract

Considerable effort has consistently been devoted to feature selection for dimension reduction in various machine learning tasks. Existing feature selection models focus on selecting the most discriminative features for learning targets. However, this strategy is weak in handling two kinds of features, namely the irrelevant and the redundant ones, which are collectively referred to as noisy features. These features may hamper the construction of optimal low-dimensional subspaces and compromise the learning performance of downstream tasks. In this study, we propose a novel multi-label feature selection approach by embedding label correlations (dubbed ELC) to address these issues. In particular, we extract label correlations for reliable label space structures and employ them to steer feature selection. In this way, label and feature spaces can be expected to be consistent and noisy features can be effectively eliminated. An extensive experimental evaluation on public benchmarks validated the superiority of ELC.

17 pages, 15864 KiB

Open Access Article

Convolution-GRU Based on Independent Component Analysis for fMRI Analysis with Small and Imbalanced Samples

by Shan Wang, Feng Duan and Mingxin Zhang

Appl. Sci. 2020, 10(21), 7465; https://doi.org/10.3390/app10217465 - 23 Oct 2020

Cited by 8 | Viewed by 3473

Abstract

Functional magnetic resonance imaging (fMRI) is a commonly used method of brain research. However, due to the complexity and particularity of the fMRI task, it is difficult to find enough subjects, resulting in a small and, often, imbalanced dataset. A dataset with few samples causes overfitting of the learning model, and the imbalance makes the model insensitive to the minority class, which has long been a problem in classification. It is therefore of great significance to classify fMRI data with small and imbalanced samples. In the present study, we propose a 3-step method for a small and imbalanced fMRI dataset from a word-scene memory task. The steps of the method are as follows: (1) An independent component analysis is performed to reduce the dimension of the data; (2) The synthetic minority oversampling technique is used to generate new samples of the minority class to balance the data; (3) A convolution-Gated Recurrent Unit (GRU) network is used to classify the independent component signals, indicating whether the subjects are performing episodic memory tasks. The accuracy of the proposed method is 72.2%, which improves the classification performance compared with traditional classifiers such as support vector machines (SVM), logistic regression (LGR), linear discriminant analysis (LDA) and k-nearest neighbor (KNN), and this study provides a biomarker for evaluating the reactivation of episodic memory.
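
Step (2) can be sketched with the basic SMOTE idea: pick a minority sample, pick one of its nearest minority-class neighbours, and create a new point somewhere on the segment between them. The version below is a simplified, dependency-free illustration with hypothetical data and parameters, and it assumes at least two minority samples; practical work would normally rely on an established implementation.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the chosen sample.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D minority-class samples.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6)
print(new_points)
```

Because every synthetic point lies between two existing minority samples, the new data stay inside the region already occupied by that class.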

13 pages, 7061 KiB

Open Access Article

Multifrequency Impedance Method Based on Neural Network for Root Canal Length Measurement

by Xiaoyue Qiao, Zheng Zhang and Xin Chen

Appl. Sci. 2020, 10(21), 7430; https://doi.org/10.3390/app10217430 - 22 Oct 2020

Cited by 9 | Viewed by 3133

Abstract

Root canal therapy is the most fundamental and effective approach for treating endodontic disease and periapicalitis. The length of the root canal must be accurately measured in order to remove the pathogenic substances in it. This study presents a multifrequency impedance method based on a neural network for root canal length measurement. A circuit system was designed that generates currents at frequencies from 100 Hz to 20 kHz, in order to augment the impedance-ratio data with different combinations of frequencies. Several impedance ratios and other quantified characteristics, such as the type of tooth and file, were selected as features to train a neural network model that could predict the distance between the file and the apical foramen. The model is evaluated with leave-one-out cross-validation, adopts the Adam optimizer and regularization, and has two hidden layers with nine and five nodes, respectively. The neural network-based multifrequency impedance method exhibits nearly 95% accuracy, compared with the dual-frequency impedance ratio method (which demonstrated no more than 85% accuracy in some situations). This method may eliminate the influence of human and environmental factors on the measurement of the root canal length, thereby increasing measurement robustness.
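
Two ingredients of this description lend themselves to a short sketch: building impedance-ratio features from every pair of measurement frequencies, and enumerating leave-one-out splits for a small dataset. The impedance magnitudes, the direction of the ratio (higher frequency over lower) and the function names are assumptions for illustration, not values or conventions taken from the paper.

```python
from itertools import combinations


def impedance_ratio_features(impedance_by_freq):
    """Ratio |Z(f_high)| / |Z(f_low)| for every pair of measured frequencies."""
    freqs = sorted(impedance_by_freq)
    return {
        (low, high): impedance_by_freq[high] / impedance_by_freq[low]
        for low, high in combinations(freqs, 2)
    }


def leave_one_out(n_samples):
    """Yield (train_indices, test_index) pairs, holding out one sample at a time."""
    for i in range(n_samples):
        yield [j for j in range(n_samples) if j != i], i


# Hypothetical impedance magnitudes (ohms) keyed by frequency in Hz.
measurement = {100: 5200.0, 500: 4100.0, 5000: 2600.0, 20000: 1300.0}
features = impedance_ratio_features(measurement)
print(features)
print(list(leave_one_out(3)))
```

With four frequencies this produces six ratios per measurement, which shows how combining frequencies enlarges the feature set available from a small number of recordings.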

14 pages, 650 KiB

Open Access Article

Training Set Enlargement Using Binary Weighted Interpolation Maps for the Single Sample per Person Problem in Face Recognition

by Yonggeol Lee and Sang-Il Choi

Appl. Sci. 2020, 10(19), 6659; https://doi.org/10.3390/app10196659 - 23 Sep 2020

Cited by 1 | Viewed by 2286

Abstract

We propose a method of enlarging the training dataset for the single-sample-per-person (SSPP) face recognition problem. The appearance of the human face varies greatly, owing to various intrinsic and extrinsic factors. In order to build a face recognition system that can operate robustly in an uncontrolled, real environment, it is necessary for the algorithm to learn from various images of the same person. However, owing to limitations in the collection of facial image data, only one sample can typically be obtained, which limits the performance and usability of the method. This paper proposes a method that analyzes the changes in pixels in face images associated with variations by extracting the binary weighted interpolation map (B-WIM) from neutral and variational images in the auxiliary set. Then, a new variational image for the query image is created by combining the given query (neutral) image and the variational image of the auxiliary set based on the B-WIM. In comparative face recognition experiments on SSPP training data from various facial-image databases, the proposed method shows superior performance compared with other methods.

18 pages, 993 KiB

Open Access Article

Learning Optimal Time Series Combination and Pre-Processing by Smart Joins

by Amaia Gil, Marco Quartulli, Igor G. Olaizola and Basilio Sierra

Appl. Sci. 2020, 10(18), 6346; https://doi.org/10.3390/app10186346 - 11 Sep 2020

Cited by 3 | Viewed by 2370

Abstract

In industrial applications of data science and machine learning, most of the steps of a typical pipeline focus on optimizing measures of model fitness to the available data. Data preprocessing, instead, is often ad hoc, and not based on the optimization of quantitative measures. This paper proposes the use of optimization in the preprocessing step, specifically studying a time series joining methodology, and introduces an error function to measure the adequacy of the joining. Experiments show how the method allows monitoring preprocessing errors for different time slices, indicating when a retraining of the preprocessing may be needed. Thus, this contribution helps quantify the implications of data preprocessing on the results of data analysis and machine learning methods. The methodology is applied to two case studies: synthetic simulation data with controlled distortions, and a real scenario of an industrial process.

23 pages, 11542 KiB

Open Access Feature Paper Article

Automatic Classification of Morphologically Similar Fish Species Using Their Head Contours

by Pere Marti-Puig, Amalia Manjabacas and Antoni Lombarte

Appl. Sci. 2020, 10(10), 3408; https://doi.org/10.3390/app10103408 - 14 May 2020

Cited by 6 | Viewed by 2965

Abstract

This work deals with the task of distinguishing between different Mediterranean demersal species of fish that share a remarkably similar form and that are also used for the evaluation of marine resources. The experts who are currently able to classify these types of species do so by considering only a segment of the contour of the fish, specifically its head, instead of using the entire silhouette of the animal. Based on this knowledge, a set of features to classify contour segments is presented to address both a binary and a multi-class classification problem. Besides the difficulty of discriminating between very similar forms, we are limited by small, unreliably labeled image data sets. The results obtained were comparable to those obtained by trained experts.

Review

20 pages, 1017 KiB

Open Access Review

Decomposition Methods for Machine Learning with Small, Incomplete or Noisy Datasets

by Cesar Federico Caiafa, Jordi Solé-Casals, Pere Marti-Puig, Sun Zhe and Toshihisa Tanaka

Appl. Sci. 2020, 10(23), 8481; https://doi.org/10.3390/app10238481 - 27 Nov 2020

Cited by 19 | Viewed by 7841

Abstract

In many machine learning applications, measurements are sometimes incomplete or noisy, resulting in missing features. In other cases, and for different reasons, the datasets are originally small, and therefore, more data samples are required to derive useful supervised or unsupervised classification methods. Correct handling of incomplete, noisy or small datasets in machine learning is a fundamental and classic challenge. In this article, we provide a unified review of recently proposed methods based on signal decomposition for missing-feature imputation (data completion), classification of noisy samples and artificial generation of new data samples (data augmentation). We illustrate the application of these signal decomposition methods in diverse selected practical machine learning examples including brain-computer interfaces, classification of epileptic intracranial electroencephalogram signals, face recognition/verification and water network data analysis. We show that a signal decomposition approach can provide valuable tools to improve machine learning performance with low-quality datasets.
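
As a toy instance of decomposition-based data completion, the sketch below fits a rank-1 model V ≈ u vᵀ to the observed entries of a matrix by alternating least squares and uses the model to fill in the missing ones. It is only meant to convey the principle; the rank-1 assumption, the example matrix and the function name are hypothetical, and the code assumes every row and column has at least one observed, non-zero entry.

```python
def rank1_complete(V, n_iter=100):
    """Fill entries marked None using a rank-1 fit to the observed entries."""
    n, m = len(V), len(V[0])
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Least-squares update of each row factor over its observed columns.
        for i in range(n):
            obs = [j for j in range(m) if V[i][j] is not None]
            u[i] = sum(V[i][j] * v[j] for j in obs) / sum(v[j] ** 2 for j in obs)
        # Least-squares update of each column factor over its observed rows.
        for j in range(m):
            obs = [i for i in range(n) if V[i][j] is not None]
            v[j] = sum(V[i][j] * u[i] for i in obs) / sum(u[i] ** 2 for i in obs)
    return [[V[i][j] if V[i][j] is not None else u[i] * v[j] for j in range(m)]
            for i in range(n)]


# Hypothetical rank-1 data (outer product of [1, 2, 3] and [1, 2, 4]) with one gap.
V = [[1.0, 2.0, 4.0],
     [2.0, 4.0, 8.0],
     [3.0, 6.0, None]]
completed = rank1_complete(V)
print(completed[2][2])
```

Real data are rarely exactly low-rank, so in practice the rank and any regularization would have to be chosen with care.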
