Mach. Learn. Knowl. Extr., Volume 1, Issue 2 (June 2019) – 11 articles

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view a paper in PDF format, click on the "PDF Full-text" link and use the free Adobe Reader to open it.
11 pages, 2641 KiB  
Article
Generalization of Parameter Selection of SVM and LS-SVM for Regression
by Jiye Zeng, Zheng-Hong Tan, Tsuneo Matsunaga and Tomoko Shirai
Mach. Learn. Knowl. Extr. 2019, 1(2), 745-755; https://doi.org/10.3390/make1020043 - 19 Jun 2019
Cited by 9 | Viewed by 4246
Abstract
A Support Vector Machine (SVM) for regression is a popular machine learning model that aims to solve nonlinear function approximation problems wherein explicit model equations are difficult to formulate. The performance of an SVM depends largely on the selection of its parameters. Choosing between an SVM that solves an optimization problem with inequality constraints and one that solves the least squares of errors (LS-SVM) adds to the complexity. Various methods have been proposed for tuning parameters, but no article puts the SVM and LS-SVM side by side to discuss the issue using a large real-world dataset, which could be problematic for existing parameter tuning methods. We investigated both the SVM and LS-SVM with an artificial dataset and a dataset of more than 200,000 points used for the reconstruction of the global surface ocean CO2 concentration. The results reveal that: (1) the two models are most sensitive to the parameter of the kernel function, which lies in a narrow range for scaled input data; (2) the optimal values of other parameters do not change much for different datasets; and (3) the LS-SVM performs better than the SVM in general. The LS-SVM is recommended, as it has fewer parameters to tune and yields a smaller bias. Nevertheless, the SVM has the advantages of consuming fewer computing resources and taking less time to train. The results suggest initial parameter guesses for using the models. Full article
(This article belongs to the Section Learning)
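As a rough illustration of the kernel-parameter sensitivity discussed above, the following sketch (not the authors' implementation) tunes the RBF parameter gamma for scikit-learn's SVR and for kernel ridge regression, which stands in for an LS-SVM-style least-squares model; the synthetic data and the parameter grid are assumptions for demonstration only.
```python
# Minimal sketch: sensitivity of SVR and an LS-SVM-style model (kernel ridge)
# to the RBF kernel parameter gamma on scaled data. Illustrative only.
import numpy as np
from sklearn.svm import SVR
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))                  # synthetic inputs
y = np.sin(X[:, 0]) * np.cos(X[:, 1]) + rng.normal(0, 0.1, 500)
X = StandardScaler().fit_transform(X)                  # scale inputs first

grid = {"gamma": np.logspace(-3, 2, 6)}                # kernel parameter dominates
svr = GridSearchCV(SVR(kernel="rbf", C=10.0, epsilon=0.01), grid, cv=5)
lssvm = GridSearchCV(KernelRidge(kernel="rbf", alpha=0.1), grid, cv=5)

for name, model in [("SVR", svr), ("kernel ridge (LS-SVM-like)", lssvm)]:
    model.fit(X, y)
    print(name, "best gamma:", model.best_params_["gamma"],
          "CV R^2:", round(model.best_score_, 3))
```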

30 pages, 18220 KiB  
Article
Optimal Clustering and Cluster Identity in Understanding High-Dimensional Data Spaces with Tightly Distributed Points
by Oliver Chikumbo and Vincent Granville
Mach. Learn. Knowl. Extr. 2019, 1(2), 715-744; https://doi.org/10.3390/make1020042 - 05 Jun 2019
Cited by 8 | Viewed by 5405
Abstract
The sensitivity of the elbow rule in determining an optimal number of clusters in high-dimensional spaces that are characterized by tightly distributed data points is demonstrated. The high-dimensional data samples are not artificially generated, but they are taken from a real-world evolutionary many-objective optimization. They comprise Pareto fronts from the last 10 generations of an evolutionary optimization computation with 14 objective functions. The choice for analyzing Pareto fronts is strategic, as it is squarely intended to benefit the user who only needs one solution to implement from the Pareto set, and therefore a systematic means of reducing the cardinality of solutions is imperative. As such, clustering the data and identifying the cluster from which to pick the desired solution is covered in this manuscript, highlighting the implementation of the elbow rule and the use of hyper-radial distances for cluster identity. The Calinski-Harabasz statistic was favored for determining the criteria used in the elbow rule because of its robustness. The statistic takes into account the variance within clusters and also the variance between the clusters. This exercise also opened an opportunity to revisit the justification of using the highest Calinski-Harabasz criterion for determining the optimal number of clusters for multivariate data. The elbow rule predicted the maximum end of the optimal number of clusters, and the highest Calinski-Harabasz criterion method favored the number of clusters at the lower end. Both results are used in a unique way for understanding high-dimensional data, despite being inconclusive regarding which of the two methods determines the true optimal number of clusters. Full article
(This article belongs to the Section Data)
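A minimal sketch of the clustering step described above: run k-means over a range of cluster counts and score each partition with the Calinski-Harabasz statistic, so that both the elbow of the curve and its maximum can be inspected. The blob data stands in for the paper's Pareto-front samples and is purely illustrative.
```python
# Minimal sketch: k-means over a range of k, scored with the Calinski-Harabasz
# statistic (between-cluster vs. within-cluster variance). Illustrative data only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

X, _ = make_blobs(n_samples=600, centers=5, n_features=14, cluster_std=0.5, random_state=0)

scores = {}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = calinski_harabasz_score(X, labels)

best_k = max(scores, key=scores.get)                 # highest-criterion rule
print("scores:", {k: round(v, 1) for k, v in scores.items()})
print("k with highest Calinski-Harabasz score:", best_k)
```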

17 pages, 356 KiB  
Article
Exploration Using Without-Replacement Sampling of Actions Is Sometimes Inferior
by Stephen W. Carden and S. Dalton Walker
Mach. Learn. Knowl. Extr. 2019, 1(2), 698-714; https://doi.org/10.3390/make1020041 - 24 May 2019
Cited by 2 | Viewed by 3307
Abstract
In many statistical and machine learning applications, without-replacement sampling is considered superior to with-replacement sampling. In some cases, this has been proven, and in others the heuristic is so intuitively attractive that it is taken for granted. In reinforcement learning, many count-based exploration strategies are justified by reliance on the aforementioned heuristic. This paper will detail the non-intuitive discovery that when measuring the goodness of an exploration strategy by the stochastic shortest path to a goal state, there is a class of processes for which an action selection strategy based on without-replacement sampling of actions can be worse than with-replacement sampling. Specifically, the expected time until a specified goal state is first reached can be provably larger under without-replacement sampling. Numerical experiments describe the frequency and severity of this inferiority. Full article
(This article belongs to the Section Learning)
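The comparison at the heart of the paper can be probed empirically with a toy process; the sketch below estimates the expected number of steps to reach a goal when exploratory actions are drawn with versus without replacement. The single-state process and its success probabilities are invented for illustration and are not the class of processes analyzed in the paper.
```python
# Minimal sketch: Monte Carlo comparison of expected steps to reach a goal
# when exploratory actions are sampled with vs. without replacement.
import random

SUCCESS_PROB = [0.05, 0.05, 0.05, 0.6]   # chance each action reaches the goal (toy values)

def steps_with_replacement(rng):
    steps = 0
    while True:
        steps += 1
        a = rng.randrange(len(SUCCESS_PROB))           # uniform, with replacement
        if rng.random() < SUCCESS_PROB[a]:
            return steps

def steps_without_replacement(rng):
    steps, pool = 0, []
    while True:
        if not pool:                                   # refill and reshuffle the pool
            pool = list(range(len(SUCCESS_PROB)))
            rng.shuffle(pool)
        steps += 1
        a = pool.pop()                                 # each action used once per round
        if rng.random() < SUCCESS_PROB[a]:
            return steps

rng = random.Random(0)
n = 20000
print("with replacement   :", sum(steps_with_replacement(rng) for _ in range(n)) / n)
print("without replacement:", sum(steps_without_replacement(rng) for _ in range(n)) / n)
```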

14 pages, 600 KiB  
Article
DOPSIE: Deep-Order Proximity and Structural Information Embedding
by Mario Manzo and Alessandro Rozza
Mach. Learn. Knowl. Extr. 2019, 1(2), 684-697; https://doi.org/10.3390/make1020040 - 24 May 2019
Cited by 4 | Viewed by 2416
Abstract
Graph-embedding algorithms map a graph into a vector space with the aim of preserving its structure and its intrinsic properties. Unfortunately, many of them are not able to encode the neighborhood information of the nodes well, especially from a topological perspective. To address this limitation, we propose a novel graph-embedding method called Deep-Order Proximity and Structural Information Embedding (DOPSIE). It provides topology and depth information at the same time through the analysis of the graph structure. Topological information is provided through clustering coefficients (CCs), which are connected to other structural properties, such as transitivity, density, characteristic path length, and efficiency, useful for representation in the vector space. The combination of individual node properties and neighborhood information constitutes an optimal network representation. Our experimental results show that DOPSIE outperforms state-of-the-art embedding methodologies in different classification problems. Full article
(This article belongs to the Section Network)
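To make the role of clustering coefficients concrete, the sketch below assembles simple per-node structural feature vectors with networkx; this only illustrates the kind of topological and neighborhood information DOPSIE draws on and is not the authors' embedding algorithm.
```python
# Minimal sketch: per-node structural features from clustering coefficients
# plus simple neighborhood statistics. Illustrative only, not DOPSIE itself.
import networkx as nx
import numpy as np

G = nx.karate_club_graph()                      # example graph

cc = nx.clustering(G)                           # clustering coefficient per node
deg = dict(G.degree())

features = {}
for v in G.nodes():
    neigh_deg = [deg[u] for u in G.neighbors(v)] or [0]
    features[v] = np.array([
        cc[v],                                  # local transitivity
        deg[v],                                 # node degree
        np.mean(neigh_deg),                     # depth-1 neighborhood information
    ])

print("node 0 feature vector:", features[0])
```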

31 pages, 659 KiB  
Review
Large-Scale Simultaneous Inference with Hypothesis Testing: Multiple Testing Procedures in Practice
by Frank Emmert-Streib and Matthias Dehmer
Mach. Learn. Knowl. Extr. 2019, 1(2), 653-683; https://doi.org/10.3390/make1020039 - 15 May 2019
Cited by 7 | Viewed by 4995
Abstract
A statistical hypothesis test is one of the most eminent methods in statistics. Its pivotal role comes from the wide range of practical problems it can be applied to and the sparsity of data requirements. Being an unsupervised method makes it very flexible in adapting to real-world situations. The availability of high-dimensional data makes it necessary to apply such statistical hypothesis tests simultaneously to the test statistics of the underlying covariates. However, if applied without correction, this leads to an inevitable increase in Type 1 errors. To counteract this effect, multiple testing procedures have been introduced to control various types of errors, most notably the Type 1 error. In this paper, we review modern multiple testing procedures for controlling either the family-wise error rate (FWER) or the false-discovery rate (FDR). We emphasize their principal approach, allowing them to be categorized as (1) single-step vs. stepwise approaches, (2) adaptive vs. non-adaptive approaches, and (3) marginal vs. joint multiple testing procedures. We place a particular focus on procedures that can deal with data with a (strong) correlation structure because real-world data are rarely uncorrelated. Furthermore, we also provide background information making the often technically intricate methods accessible to interdisciplinary data scientists. Full article
(This article belongs to the Section Learning)
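As a small illustration of the two error-control families reviewed here, the sketch below applies a single-step FWER procedure (Bonferroni) and a stepwise FDR procedure (Benjamini-Hochberg) to the same simulated p-values via statsmodels; the mixture of null and non-null p-values is an assumption made only for the demonstration.
```python
# Minimal sketch: Bonferroni (FWER control) vs. Benjamini-Hochberg (FDR control)
# applied to the same simulated p-values.
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
p_null = rng.uniform(size=950)                       # p-values from true null hypotheses
p_alt = rng.beta(0.1, 10, size=50)                   # non-null p-values, concentrated near 0
pvals = np.concatenate([p_null, p_alt])

for method, label in [("bonferroni", "Bonferroni (FWER)"),
                      ("fdr_bh", "Benjamini-Hochberg (FDR)")]:
    reject, _, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(f"{label}: {reject.sum()} of {len(pvals)} hypotheses rejected")
```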

12 pages, 642 KiB  
Article
Prediction by Empirical Similarity via Categorical Regressors
by Jeniffer Duarte Sanchez, Leandro C. Rêgo and Raydonal Ospina
Mach. Learn. Knowl. Extr. 2019, 1(2), 641-652; https://doi.org/10.3390/make1020038 - 15 May 2019
Cited by 5 | Viewed by 2719
Abstract
A quantifier of similarity is generally a type of score that assigns a numerical value to a pair of sequences based on their proximity. Similarity measures play an important role in prediction problems with many applications, such as statistical learning, data mining, biostatistics, finance and others. Based on observed data, where a response variable of interest is assumed to be associated with some regressors, it is possible to make response predictions using a weighted average of observed response variables, where the weights depend on the similarity of the regressors. In this work, we propose a parametric regression model for continuous response based on empirical similarities for the case where the regressors are represented by categories. We apply the proposed method to predict tooth length growth in guinea pigs based on Vitamin C supplements, considering three different dosage levels and two delivery methods. The inferential procedure is performed through maximum likelihood and least squares estimation under two types of similarity functions and two distance metrics. The empirical results show that the method yields accurate, low-dimensional models, facilitating interpretation of the parameters. Full article
(This article belongs to the Section Data)
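The prediction rule described above, a similarity-weighted average of observed responses, can be sketched as follows; the mismatch-count distance, the exponential similarity with a fixed weight, and the toy dose/delivery data are illustrative assumptions rather than the paper's fitted parametric model.
```python
# Minimal sketch: similarity-weighted prediction from categorical regressors.
# Similarity is exp(-w * d), with d a simple mismatch count. Toy data only.
import numpy as np

def similarity(x, z, w=1.0):
    mismatches = sum(a != b for a, b in zip(x, z))   # categorical distance
    return np.exp(-w * mismatches)

def predict(x_new, X_obs, y_obs, w=1.0):
    weights = np.array([similarity(x_new, x, w) for x in X_obs])
    return float(np.dot(weights, y_obs) / weights.sum())

# observed data: (dose level, delivery method) -> response (illustrative numbers)
X_obs = [("low", "OJ"), ("low", "VC"), ("mid", "OJ"),
         ("mid", "VC"), ("high", "OJ"), ("high", "VC")]
y_obs = np.array([13.2, 8.0, 22.7, 16.8, 26.1, 26.1])

print(predict(("mid", "OJ"), X_obs, y_obs))
```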

11 pages, 588 KiB  
Article
Enhancing Domain-Specific Supervised Natural Language Intent Classification with a Top-Down Selective Ensemble Model
by Gard B. Jenset and Barbara McGillivray
Mach. Learn. Knowl. Extr. 2019, 1(2), 630-640; https://doi.org/10.3390/make1020037 - 18 Apr 2019
Cited by 3 | Viewed by 4498
Abstract
Natural Language Understanding (NLU) systems are essential components in many industry conversational artificial intelligence applications. There are strong incentives to develop a good NLU capability in such systems, both to improve the user experience and, in the case of regulated industries, for compliance reasons. We report on a series of experiments comparing the effects of optimizing word embeddings versus implementing a multi-classifier ensemble approach and conclude that, in our case, only the latter approach leads to significant improvements. The study provides a high-level primer for developing NLU systems in regulated domains, as well as providing a specific baseline accuracy for evaluating NLU systems for financial guidance. Full article
(This article belongs to the Section Data)
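A minimal sketch of the ensemble idea, a set of simple classifiers over TF-IDF features combined by voting, is given below; the toy utterances, intent labels, and choice of base classifiers are assumptions for illustration and do not reproduce the authors' top-down selective ensemble.
```python
# Minimal sketch: a voting ensemble for intent classification over TF-IDF features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.ensemble import VotingClassifier
from sklearn.pipeline import make_pipeline

# toy utterances and intents (illustrative, not the paper's data)
texts = ["what is my balance", "transfer money to savings", "how do I invest in bonds",
         "show recent transactions", "move funds to my ISA", "is this fund high risk"]
intents = ["balance", "transfer", "advice", "balance", "transfer", "advice"]

ensemble = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    VotingClassifier([
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", MultinomialNB()),
        ("svm", LinearSVC()),
    ], voting="hard"),
)
ensemble.fit(texts, intents)
print(ensemble.predict(["please move money to savings"]))
```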

19 pages, 1640 KiB  
Article
Real-Time Vehicle Make and Model Recognition System
by Muhammad Asif Manzoor, Yasser Morgan and Abdul Bais
Mach. Learn. Knowl. Extr. 2019, 1(2), 611-629; https://doi.org/10.3390/make1020036 - 17 Apr 2019
Cited by 27 | Viewed by 5849
Abstract
A Vehicle Make and Model Recognition (VMMR) system can provide great value in terms of vehicle monitoring and identification based on vehicle appearance, in addition to the typical recognition based on the vehicle's attached license plate. A real-time VMMR system is an important component of many applications, such as automatic vehicle surveillance, traffic management, driver assistance systems, traffic behavior analysis, and traffic monitoring. A VMMR system has a unique set of challenges and issues, including image acquisition, variations in illumination and weather, occlusions, shadows, reflections, the large variety of vehicles, inter-class and intra-class similarities, and the addition/deletion of vehicle models over time. In this work, we present a unique and robust real-time VMMR system which can handle the challenges described above and recognize vehicles with high accuracy. We extract image features from vehicle images and create feature vectors to represent the dataset. We use two classification algorithms, Random Forest (RF) and Support Vector Machine (SVM), in our work. We use a realistic dataset to test and evaluate the proposed VMMR system; the vehicle images in the dataset reflect real-world situations. The proposed VMMR system recognizes vehicles on the basis of make, model, and generation (manufacturing years), while existing VMMR systems can only identify the make and model. Comparison with existing VMMR research demonstrates the superior performance of the proposed system in terms of recognition accuracy and processing speed. Full article
(This article belongs to the Section Data)
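The feature-extraction-plus-classifier pipeline described in the abstract can be sketched generically as below; the HOG features, the random arrays standing in for vehicle crops, and the classifier settings are illustrative assumptions, not the paper's actual features or configuration.
```python
# Minimal sketch: fixed-length image features (HOG) fed to Random Forest and SVM
# classifiers. Random arrays stand in for real vehicle images; illustrative only.
import numpy as np
from skimage.feature import hog
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
images = rng.random((40, 64, 128))            # placeholders for grayscale vehicle crops
labels = np.arange(40) % 4                    # placeholder make/model classes

X = np.array([hog(img, pixels_per_cell=(16, 16), cells_per_block=(2, 2)) for img in images])

for name, clf in [("Random Forest", RandomForestClassifier(n_estimators=100, random_state=0)),
                  ("SVM", SVC(kernel="rbf", C=10.0))]:
    scores = cross_val_score(clf, X, labels, cv=4)
    print(name, "mean CV accuracy:", scores.mean().round(3))
```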

21 pages, 3799 KiB  
Article
A Novel Heterogeneous Swarm Reinforcement Learning Method for Sequential Decision Making Problems
by Zohreh Akbari and Rainer Unland
Mach. Learn. Knowl. Extr. 2019, 1(2), 590-610; https://doi.org/10.3390/make1020035 - 16 Apr 2019
Cited by 4 | Viewed by 3026
Abstract
Sequential Decision Making Problems (SDMPs) that can be modeled as Markov Decision Processes can be solved using methods that combine Dynamic Programming (DP) and Reinforcement Learning (RL). Depending on the problem scenarios and the available Decision Makers (DMs), such RL algorithms may be designed for single-agent systems or multi-agent systems that either consist of agents with individual goals and decision making capabilities, which are influenced by other agents' decisions, or behave as a swarm of agents that collaboratively learn a single objective. Many studies have been conducted in this area; however, when concentrating on available swarm RL algorithms, one obtains a clear view of the areas that still require attention. Most of the studies in this area focus on homogeneous swarms and, so far, systems introduced as Heterogeneous Swarms (HetSs) merely include very few, i.e., two or three, sub-swarms of homogeneous agents, which either, according to their capabilities, deal with a specific sub-problem of the general problem or exhibit different behaviors in order to reduce the risk of bias. This study introduces a novel approach that allows agents, which are originally designed to solve different problems and hence have higher degrees of heterogeneity, to behave as a swarm when addressing identical sub-problems. In fact, the affinity between two agents, which measures the compatibility of agents to work together towards solving a specific sub-problem, is used in designing a Heterogeneous Swarm RL (HetSRL) algorithm that allows HetSs to solve the intended SDMPs. Full article
(This article belongs to the Section Learning)

15 pages, 811 KiB  
Article
Towards Robust Text Classification with Semantics-Aware Recurrent Neural Architecture
by Blaž Škrlj, Jan Kralj, Nada Lavrač and Senja Pollak
Mach. Learn. Knowl. Extr. 2019, 1(2), 575-589; https://doi.org/10.3390/make1020034 - 04 Apr 2019
Cited by 21 | Viewed by 4207
Abstract
Deep neural networks are becoming ubiquitous in text mining and natural language processing, but semantic resources, such as taxonomies and ontologies, are yet to be fully exploited in a deep learning setting. This paper presents an efficient semantic text mining approach, which converts semantic information related to a given set of documents into a set of novel features that are used for learning. The proposed Semantics-aware Recurrent deep Neural Architecture (SRNA) enables the system to learn simultaneously from the semantic vectors and from the raw text documents. We test the effectiveness of the approach on three text classification tasks: news topic categorization, sentiment analysis and gender profiling. The experiments show that the proposed approach outperforms the approach without semantic knowledge, with the highest accuracy gain (up to 10%) achieved on short document fragments. Full article
(This article belongs to the Section Learning)
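The two-input idea described above, learning jointly from raw text and from semantic vectors, can be outlined roughly as a two-branch network in Keras; the layer sizes, vocabulary size, sequence length, and three-class output are assumptions for illustration, and this does not reproduce the authors' SRNA.
```python
# Minimal sketch: a recurrent branch over token sequences and a dense branch over
# document-level semantic feature vectors, merged before classification.
import numpy as np
from tensorflow.keras import layers, Model

max_len, vocab_size, sem_dim, n_classes = 100, 5000, 50, 3

tokens_in = layers.Input(shape=(max_len,), name="token_ids")
sem_in = layers.Input(shape=(sem_dim,), name="semantic_features")

x = layers.Embedding(vocab_size, 64)(tokens_in)
x = layers.LSTM(64)(x)                                  # recurrent text branch
s = layers.Dense(32, activation="relu")(sem_in)         # semantic-feature branch

out = layers.Dense(n_classes, activation="softmax")(layers.Concatenate()([x, s]))
model = Model(inputs=[tokens_in, sem_in], outputs=out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# toy forward pass on random placeholder data
tokens = np.random.randint(0, vocab_size, size=(8, max_len))
sem = np.random.rand(8, sem_dim).astype("float32")
print(model.predict([tokens, sem]).shape)               # (8, n_classes)
```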

23 pages, 7227 KiB  
Article
Artificial Intelligence to Enhance Aerodynamic Shape Optimisation of the Aegis UAV
by Yousef Azabi, Al Savvaris and Timoleon Kipouros
Mach. Learn. Knowl. Extr. 2019, 1(2), 552-574; https://doi.org/10.3390/make1020033 - 04 Apr 2019
Cited by 6 | Viewed by 4661
Abstract
This article presents an optimisation framework that uses stochastic multi-objective optimisation, combined with an Artificial Neural Network (ANN), and describes its application to the aerodynamic design of aircraft shapes. The framework uses the Multi-Objective Particle Swarm Optimisation (MOPSO) algorithm, and the obtained results confirm that the proposed technique provides highly optimal solutions in less computational time than other approaches to the same design problem. The main idea was to focus computational effort on worthwhile design solutions rather than exploring and evaluating all possible solutions in the design space. It is shown that the number of valid solutions obtained using ANN-MOPSO compared to MOPSO for 3000 evaluations grew from 529 to 1006 (90% improvement) with a penalty of only 8.3% (11 min) in computational time. It is demonstrated that, by including an ANN, the ANN-MOPSO with 3000 evaluations produced a larger number of valid solutions than the MOPSO with 5500 evaluations, and in 33% less computational time (64 min). This is taken as confirmation of the potential power of ANNs when applied to this type of design problem. Full article
(This article belongs to the Section Network)
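The general surrogate-assisted idea, using a neural network to focus expensive evaluations on promising designs, can be sketched as below; the toy objective function, the screening threshold, and the network size are assumptions for illustration and are not the authors' ANN-MOPSO framework.
```python
# Minimal sketch: a neural-network surrogate pre-screens candidate designs so
# that only promising ones reach the expensive objective evaluation.
import numpy as np
from sklearn.neural_network import MLPRegressor

def expensive_objective(x):                    # stand-in for an aerodynamic evaluation
    return np.sum((x - 0.3) ** 2, axis=1)

rng = np.random.default_rng(0)
X_seen = rng.random((200, 6))                  # designs already evaluated
y_seen = expensive_objective(X_seen)

surrogate = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
surrogate.fit(X_seen, y_seen)

candidates = rng.random((1000, 6))             # e.g., particles proposed by MOPSO
predicted = surrogate.predict(candidates)
promising = candidates[predicted <= np.quantile(predicted, 0.1)]   # keep the best 10%

print(f"expensive evaluations avoided: {len(candidates) - len(promising)} of {len(candidates)}")
y_true = expensive_objective(promising)        # evaluate only the screened designs
```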
