Emerging Trends and Challenges in Supervised Learning Tasks

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: closed (20 July 2021) | Viewed by 22097

Special Issue Editor


Guest Editor
Department of Mathematics and Computer Science, University of Cagliari, 09124 Cagliari, Italy
Interests: data mining and machine learning; high-dimensional data analysis; feature selection

Special Issue Information

Dear Colleagues,

The adoption of data mining and machine learning methods has grown exponentially in recent years, with an ever-increasing number of reported applications. Despite the remarkable and rapid progress in this field, the complexity of real-world data poses significant challenges for both researchers and practitioners. In the context of supervised learning tasks, the quality of the models built from real data may strongly depend on several factors, such as data dimensionality, the number of patterns available for training, the number of problem classes, the level of class imbalance, and the variability of the concepts over time. Further, we often encounter datasets that are affected by data quality problems, such as incompleteness or noise. While new and more sophisticated learning approaches are constantly being explored, many questions remain unanswered about their large-scale applicability and utility in real-world scenarios. The aim of this Special Issue is to bring together contributions that discuss problems and solutions in this area, especially from an application-oriented perspective, with a main emphasis on advanced supervised methods for learning and gaining knowledge from complex data.

Topics of interest include but are not limited to:

- Data pre-processing for supervised learning tasks;

- Dimensionality reduction and feature selection techniques;

- Learning from high-dimensional data;

- Learning from imbalanced data;

- Learning from data streams and IoT data;

- Learning in the presence of concept drift;

- Data quality issues in supervised learning;

- Noise robustness of learning algorithms;

- Issues in model evaluation and selection;

- Cost-sensitive learning;

- Ensemble learning;

- Deep learning;

- Case studies and real-world applications.

Dr. Barbara Pes
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Data mining
  • Machine learning
  • Supervised methods
  • Knowledge discovery
  • Real-world applications

Published Papers (6 papers)


Editorial


2 pages, 170 KiB  
Editorial
Special Issue on Emerging Trends and Challenges in Supervised Learning Tasks
by Barbara Pes
Information 2021, 12(11), 481; https://doi.org/10.3390/info12110481 - 19 Nov 2021
Viewed by 1385
Abstract
With the massive growth of data-intensive applications, the machine learning field has gained widespread popularity [...]
(This article belongs to the Special Issue Emerging Trends and Challenges in Supervised Learning Tasks)

Research


26 pages, 4809 KiB  
Article
VERONICA: Visual Analytics for Identifying Feature Groups in Disease Classification
by Neda Rostamzadeh, Sheikh S. Abdullah, Kamran Sedig, Amit X. Garg and Eric McArthur
Information 2021, 12(9), 344; https://doi.org/10.3390/info12090344 - 26 Aug 2021
Cited by 7 | Viewed by 2934
Abstract
The use of data analysis techniques in electronic health records (EHRs) offers great promise in improving predictive risk modeling. Although useful, these analysis techniques often suffer from a lack of interpretability and transparency, especially when the data is high-dimensional. The emergence of a type of computational system known as visual analytics has the potential to address these issues by integrating data analysis techniques with interactive visualizations. This paper introduces a visual analytics system called VERONICA that utilizes the natural classification of features in EHRs to identify the group of features with the strongest predictive power. VERONICA incorporates a representative set of supervised machine learning techniques (namely, classification and regression tree, C5.0, random forest, support vector machines, and naive Bayes) to support users in developing predictive models using EHRs. It then makes the analytics results accessible through an interactive visual interface. By integrating different sampling strategies, analytics algorithms, visualization techniques, and human-data interaction, VERONICA assists users in comparing prediction models in a systematic way. To demonstrate the usefulness of our proposed system, we use the clinical dataset stored at ICES to identify the best representative feature groups in detecting patients who are at high risk of developing acute kidney injury.

16 pages, 1041 KiB  
Article
Learning from High-Dimensional and Class-Imbalanced Datasets Using Random Forests
by Barbara Pes
Information 2021, 12(8), 286; https://doi.org/10.3390/info12080286 - 21 Jul 2021
Cited by 15 | Viewed by 3369
Abstract
Class imbalance and high dimensionality are two major issues in several real-life applications, e.g., in the fields of bioinformatics, text mining and image classification. However, while both issues have been extensively studied in the machine learning community, they have mostly been treated separately, and little research has thus far been conducted on which approaches might be best suited to deal with datasets that are class-imbalanced and high-dimensional at the same time (i.e., with a large number of features). This work contributes to this challenging research area by studying the effectiveness of hybrid learning strategies that integrate feature selection techniques, to reduce the data dimensionality, with methods that cope with the adverse effects of class imbalance (in particular, data balancing and cost-sensitive methods are considered). Extensive experiments have been carried out across datasets from different domains, leveraging a well-known classifier, the Random Forest, which has proven to be effective in high-dimensional spaces and has also been successfully applied to imbalanced tasks. Our results give evidence of the benefits of such a hybrid approach compared to using feature selection or imbalance learning methods alone.

16 pages, 893 KiB  
Article
Individualism or Collectivism: A Reinforcement Learning Mechanism for Vaccination Decisions
by Chaohao Wu, Tong Qiao, Hongjun Qiu, Benyun Shi and Qing Bao
Information 2021, 12(2), 66; https://doi.org/10.3390/info12020066 - 04 Feb 2021
Cited by 6 | Viewed by 2729
Abstract
Previous studies have pointed out that, from the perspective of public health, it is hard to achieve herd immunity in a population and thereby effectively stop disease propagation if individuals make vaccination decisions based only on individualism. In reality, individuals often exist in groups and cooperate within or among communities. Meanwhile, social studies have suggested that we cannot ignore the existence and influence of collectivism when studying individuals’ decision-making. With this in mind, we formulate two vaccination strategies: an individualistic strategy and a collectivist strategy. Under the former, individuals decide whether to vaccinate after evaluating their own perceived risk and cost, while the latter focuses on evaluating their contribution to their communities. More significantly, we propose a reinforcement learning mechanism based on policy gradient. Each individual can adaptively pick one of these two strategies after weighing their probabilities with a two-layer neural network whose parameters are dynamically updated as his or her vaccination experience accumulates. Experimental results on scale-free networks verify that the reinforcement learning mechanism can effectively improve the vaccine coverage level of communities. Moreover, communities can always get higher total payoffs at lower cost, compared with a purely individualistic strategy. Such performance mostly stems from individuals adaptively picking the collectivist strategy. Our study suggests that public health authorities should encourage individuals to make vaccination decisions from the perspective of their local mixed groups. It is especially worth noting that individuals with low degrees are more significant, as their vaccination behaviors can more sharply improve the vaccination coverage of their groups and greatly reduce epidemic size.

27 pages, 1839 KiB  
Article
Coming to Grips with Age Prediction on Imbalanced Multimodal Community Question Answering Data
by Alejandro Figueroa, Billy Peralta and Orietta Nicolis
Information 2021, 12(2), 48; https://doi.org/10.3390/info12020048 - 21 Jan 2021
Cited by 15 | Viewed by 1935
Abstract
For almost every online service, it is fundamental to understand patterns, differences and trends revealed by age demographic analysis; consider, for example, the discovery of malicious activity, including identity theft, violation of community guidelines and fake profiles. In the particular case of platforms such as Facebook, Twitter and Yahoo! Answers, user demographics have impacts on their revenues and user experience; demographics assist in ensuring that the needs of each cohort are fulfilled via personalizing and contextualizing content. Although technology has been made more accessible, thereby becoming ever more prevalent in both personal and professional lives, older people continue to trail Gen Z and Millennials in its adoption. This trailing brings about an under-representation that has a harmful influence on the demographic analysis and on supervised machine learning models. To that end, this paper makes a pioneering attempt to examine this and other major challenges facing three distinct modalities (i.e., texts, images and metadata) when dealing with community question answering (cQA) platforms. As for textual inputs, we propose an age-batched greedy curriculum learning (AGCL) approach to lessen the effects of their inherent class imbalances. When built on top of FastText shallow neural networks, AGCL achieved an increase of ca. 4% in macro-F1-score with respect to baseline systems (i.e., off-the-shelf deep neural networks). With regard to metadata, our experiments show that random forest classifiers significantly improve their performance when individuals close to generational borders are excluded (up to 20% more accuracy); and by experimenting with neural network-based visual classifiers, we discovered that images are the most challenging modality for age prediction. In fact, it is hard for a visual inspection to connect profile pictures with age cohorts, and there are considerable differences in their group distributions with respect to metadata and textual inputs. All in all, we envisage that our findings will be highly relevant as guidelines for constructing assorted multimodal supervised models for automatic age recognition across cQA platforms.

17 pages, 611 KiB  
Article
Popularity Prediction of Instagram Posts
by Salvatore Carta, Alessandro Sebastian Podda, Diego Reforgiato Recupero, Roberto Saia and Giovanni Usai
Information 2020, 11(9), 453; https://doi.org/10.3390/info11090453 - 18 Sep 2020
Cited by 27 | Viewed by 8194
Abstract
Predicting the popularity of posts on social networks has taken on significant importance in recent years, and several social media management tools now offer solutions to improve and optimize the quality of published content and to enhance the attractiveness of companies and organizations. Scientific research has recently moved in this direction, with the aim of exploiting advanced techniques such as machine learning, deep learning and natural language processing to support such tools. In light of the above, in this work we aim to address the challenge of predicting the popularity of a future post on Instagram, by defining the problem as a classification task and by proposing an original approach based on Gradient Boosting and feature engineering, which led us to promising experimental results. The proposed approach exploits big data technologies for scalability and efficiency, and it is general enough to be applied to other social media as well.