Machine Learning on Scientific Data and Information

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Applications".

Deadline for manuscript submissions: closed (31 December 2020) | Viewed by 126811

Special Issue Editor


Dr. Ka-Chun Wong
Guest Editor
Department of Computer Science, City University of Hong Kong, Hong Kong, China
Interests: bioinformatics; data science; machine learning; deep learning; medical informatics; cancer genomics

Special Issue Information

Dear Colleagues,

In recent years, we have witnessed the explosive growth of high-throughput scientific data in different disciplines, such as bioinformatics and computational biology. Nonetheless, traditional algorithms can suffer from poor scalability, noise, and the curse of dimensionality. To address these issues together, new scalable machine learning algorithms have to be developed.

Therefore, we have initiated this Special Issue in the hope that researchers will work together to alleviate these challenges and transform them into opportunities for scientific advancement by proposing different kinds of machine learning algorithms.

Dr. Ka-Chun Wong
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Machine Learning
  • Data Science
  • Bioinformatics
  • Computational Biology

Published Papers (9 papers)


Research


21 pages, 5303 KiB  
Article
Evaluation of Tree-Based Ensemble Machine Learning Models in Predicting Stock Price Direction of Movement
by Ernest Kwame Ampomah, Zhiguang Qin and Gabriel Nyame
Information 2020, 11(6), 332; https://doi.org/10.3390/info11060332 - 20 Jun 2020
Cited by 96 | Viewed by 9716
Abstract
Forecasting the direction and trend of stock price is an important task which helps investors to make prudent financial decisions in the stock market. Investment in the stock market has a big risk associated with it. Minimizing prediction error reduces the investment risk. Machine learning (ML) models typically perform better than statistical and econometric models. Also, ensemble ML models have been shown in the literature to be able to produce superior performance than single ML models. In this work, we compare the effectiveness of tree-based ensemble ML models (Random Forest (RF), XGBoost Classifier (XG), Bagging Classifier (BC), AdaBoost Classifier (Ada), Extra Trees Classifier (ET), and Voting Classifier (VC)) in forecasting the direction of stock price movement. Eight different stock data from three stock exchanges (NYSE, NASDAQ, and NSE) are randomly collected and used for the study. Each data set is split into training and test set. Ten-fold cross validation accuracy is used to evaluate the ML models on the training set. In addition, the ML models are evaluated on the test set using accuracy, precision, recall, F1-score, specificity, and area under receiver operating characteristics curve (AUC-ROC). Kendall W test of concordance is used to rank the performance of the tree-based ML algorithms. For the training set, the AdaBoost model performed better than the rest of the models. For the test set, accuracy, precision, F1-score, and AUC metrics generated results significant to rank the models, and the Extra Trees classifier outperformed the other models in all the rankings.
(This article belongs to the Special Issue Machine Learning on Scientific Data and Information)

14 pages, 4710 KiB  
Article
Adversarial Hard Attention Adaptation
by Hui Tao, Jun He, Quanjie Cao and Lei Zhang
Information 2020, 11(4), 224; https://doi.org/10.3390/info11040224 - 18 Apr 2020
Viewed by 2470
Abstract
Domain adaptation is critical to transfer the invaluable source domain knowledge to the target domain. In this paper, for a particular visual attention model, saying hard attention, we consider to adapt the learned hard attention to the unlabeled target domain. To tackle this kind of hard attention adaptation, a novel adversarial reward strategy is proposed to train the policy of the target domain agent. In this adversarial training framework, the target domain agent competes with the discriminator which takes the attention features generated from the both domain agents as input and tries its best to distinguish them, and thus the target domain policy is learned to align the local attention feature to its source domain counterpart. We evaluated our model on the benchmarks of the cross-domain tasks, such as the centered digits datasets and the enlarged non-centered digits datasets. The experimental results show that our model outperforms the ADDA and other existing methods.
(This article belongs to the Special Issue Machine Learning on Scientific Data and Information)

13 pages, 4062 KiB  
Article
Short-Term Solar Irradiance Forecasting Based on a Hybrid Deep Learning Methodology
by Ke Yan, Hengle Shen, Lei Wang, Huiming Zhou, Meiling Xu and Yuchang Mo
Information 2020, 11(1), 32; https://doi.org/10.3390/info11010032 - 6 Jan 2020
Cited by 53 | Viewed by 4330
Abstract
Accurate prediction of solar irradiance is beneficial in reducing energy waste associated with photovoltaic power plants, preventing system damage caused by the severe fluctuation of solar irradiance, and stationarizing the power output integration between different power grids. Considering the randomness and multiple dimension of weather data, a hybrid deep learning model that combines a gated recurrent unit (GRU) neural network and an attention mechanism is proposed forecasting the solar irradiance changes in four different seasons. In the first step, the Inception neural network and ResNet are designed to extract features from the original dataset. Secondly, the extracted features are inputted into the recurrent neural network (RNN) network for model training. Experimental results show that the proposed hybrid deep learning model accurately predicts solar irradiance changes in a short-term manner. In addition, the forecasting performance of the model is better than traditional deep learning models (such as long short term memory and GRU).
(This article belongs to the Special Issue Machine Learning on Scientific Data and Information)

23 pages, 20960 KiB  
Article
Weakly Supervised Learning for Evaluating Road Surface Condition from Wheelchair Driving Data
by Takumi Watanabe, Hiroki Takahashi, Yusuke Iwasawa, Yutaka Matsuo and Ikuko Eguchi Yairi
Information 2020, 11(1), 2; https://doi.org/10.3390/info11010002 - 19 Dec 2019
Cited by 5 | Viewed by 4325
Abstract
Providing accessibility information about sidewalks for people with difficulties with moving is an important social issue. We previously proposed a fully supervised machine learning approach for providing accessibility information by estimating road surface conditions using wheelchair accelerometer data with manually annotated road surface condition labels. However, manually annotating road surface condition labels is expensive and impractical for extensive data. This paper proposes and evaluates a novel method for estimating road surface conditions without human annotation by applying weakly supervised learning. The proposed method only relies on positional information while driving for weak supervision to learn road surface conditions. Our results demonstrate that the proposed method learns detailed and subtle features of road surface conditions, such as the difference in ascending and descending of a slope, the angle of slopes, the exact locations of curbs, and the slight differences of similar pavements. The results demonstrate that the proposed method learns feature representations that are discriminative for a road surface classification task. When the amount of labeled data is 10% or less in a semi-supervised setting, the proposed method outperforms a fully supervised method that uses manually annotated labels to learn feature representations of road surface conditions.
(This article belongs to the Special Issue Machine Learning on Scientific Data and Information)

33 pages, 8560 KiB  
Article
Semantic Information G Theory and Logical Bayesian Inference for Machine Learning
by Chenguang Lu
Information 2019, 10(8), 261; https://doi.org/10.3390/info10080261 - 16 Aug 2019
Cited by 8 | Viewed by 6195
Abstract
An important problem in machine learning is that, when using more than two labels, it is very difficult to construct and optimize a group of learning functions that are still useful when the prior distribution of instances is changed. To resolve this problem, semantic information G theory, Logical Bayesian Inference (LBI), and a group of Channel Matching (CM) algorithms are combined to form a systematic solution. A semantic channel in G theory consists of a group of truth functions or membership functions. In comparison with the likelihood functions, Bayesian posteriors, and Logistic functions that are typically used in popular methods, membership functions are more convenient to use, providing learning functions that do not suffer the above problem. In Logical Bayesian Inference (LBI), every label is independently learned. For multilabel learning, we can directly obtain a group of optimized membership functions from a large enough sample with labels, without preparing different samples for different labels. Furthermore, a group of Channel Matching (CM) algorithms are developed for machine learning. For the Maximum Mutual Information (MMI) classification of three classes with Gaussian distributions in a two-dimensional feature space, only 2–3 iterations are required for the mutual information between three classes and three labels to surpass 99% of the MMI for most initial partitions. For mixture models, the Expectation-Maximization (EM) algorithm is improved to form the CM-EM algorithm, which can outperform the EM algorithm when the mixture ratios are imbalanced, or when local convergence exists. The CM iteration algorithm needs to combine with neural networks for MMI classification in high-dimensional feature spaces. LBI needs further investigation for the unification of statistics and logic.
(This article belongs to the Special Issue Machine Learning on Scientific Data and Information)

14 pages, 4315 KiB  
Article
Multi-Regional Online Car-Hailing Order Quantity Forecasting Based on the Convolutional Neural Network
by Zihao Huang, Gang Huang, Zhijun Chen, Chaozhong Wu, Xiaofeng Ma and Haobo Wang
Information 2019, 10(6), 193; https://doi.org/10.3390/info10060193 - 4 Jun 2019
Cited by 9 | Viewed by 4040
Abstract
With the development of online cars, the demand for travel prediction is increasing in order to reduce the information asymmetry between passengers and drivers of online car-hailing. This paper proposes a travel demand forecasting model named OC-CNN based on the convolutional neural network to forecast the travel demand. In order to make full use of the spatial characteristics of the travel demand distribution, this paper meshes the prediction area and creates a travel demand data set of the graphical structure to preserve its spatial properties. Taking advantage of the convolutional neural network in image feature extraction, the historical demand data of the first twenty-five minutes of the entire region are used as a model input to predict the travel demand for the next five minutes. In order to verify the performance of the proposed method, one-month data from online car-hailing of the Chengdu Fourth Ring Road are used. The results show that the model successfully extracts the spatiotemporal features of the data, and the prediction accuracies of the proposed method are superior to those of the representative methods, including the Bayesian Ridge Model, Linear Regression, Support Vector Regression, and Long Short-Term Memory networks.
(This article belongs to the Special Issue Machine Learning on Scientific Data and Information)

17 pages, 2103 KiB  
Article
Visual Analysis Scenarios for Understanding Evolutionary Computational Techniques’ Behavior
by Aruanda Meiguins, Yuri Santos, Diego Santos, Bianchi Meiguins and Jefferson Morais
Information 2019, 10(3), 88; https://doi.org/10.3390/info10030088 - 28 Feb 2019
Cited by 1 | Viewed by 4099
Abstract
Machine learning algorithms are used in many applications nowadays. Sometimes, we need to describe how the decision models created output, and this may not be an easy task. Information visualization (InfoVis) techniques (e.g., TreeMap, parallel coordinates, etc.) can be used for creating scenarios that visually describe the behavior of those models. Thus, InfoVis scenarios were used to analyze the evolutionary process of a tool named AutoClustering, which generates density-based clustering algorithms automatically for a given dataset using the EDA (estimation-of-distribution algorithm) evolutionary technique. Some scenarios were about fitness and population evolution (clustering algorithms) over time, algorithm parameters, the occurrence of the individual, and others. The analysis of those scenarios could lead to the development of better parameters for the AutoClustering tool and algorithms and thus have a direct impact on the processing time and quality of the generated algorithms.
(This article belongs to the Special Issue Machine Learning on Scientific Data and Information)

16 pages, 6804 KiB  
Article
An Effective Feature Segmentation Algorithm for a Hyper-Spectral Facial Image
by Yuefeng Zhao, Mengmeng Wu, Liren Zhang, Jingjing Wang and Dongmei Wei
Information 2018, 9(10), 261; https://doi.org/10.3390/info9100261 - 22 Oct 2018
Cited by 1 | Viewed by 2693
Abstract
The human face as a biometric trait has been widely used for personal identity verification but it is still a challenging task under uncontrolled conditions. With the development of hyper-spectral imaging acquisition technology, spectral properties with sufficient discriminative information bring new opportunities for a facial image process. This paper presents a novel ensemble method for skin feature segmentation of a hyper-spectral facial image based on a k-means algorithm and a spanning forest algorithm, which exploit both spectral and spatial discriminative features. According to the closed skin area, local features are selected for further facial image analysis. We present the experimental results of the proposed algorithm on various public face databases which achieve higher segmentation rates.
(This article belongs to the Special Issue Machine Learning on Scientific Data and Information)

Review


68 pages, 7541 KiB  
Review
Text Classification Algorithms: A Survey
by Kamran Kowsari, Kiana Jafari Meimandi, Mojtaba Heidarysafa, Sanjana Mendu, Laura Barnes and Donald Brown
Information 2019, 10(4), 150; https://doi.org/10.3390/info10040150 - 23 Apr 2019
Cited by 952 | Viewed by 87402
Abstract
In recent years, there has been an exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to be able to accurately classify texts in many applications. Many machine learning approaches have achieved surpassing results in natural language processing. The success of these learning algorithms relies on their capacity to understand complex models and non-linear relationships within data. However, finding suitable structures, architectures, and techniques for text classification is a challenge for researchers. In this paper, a brief overview of text classification algorithms is discussed. This overview covers different text feature extractions, dimensionality reduction methods, existing algorithms and techniques, and evaluations methods. Finally, the limitations of each technique and their application in real-world problems are discussed.
(This article belongs to the Special Issue Machine Learning on Scientific Data and Information)