Big Data Analytics: Correspondence Factor Analysis, Clustering and Classification Algorithms and Applications

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (30 April 2023) | Viewed by 6725

Special Issue Editors


Prof. Dr. Sofia Anastasiadou
Guest Editor
Department of Early Childhood Education, Faculty of Education, University of Western Macedonia, Koila, 50100 Kozani, Greece
Interests: qualitative and quantitative methods in social sciences; applied statistics; implicative statistical analysis; multivariate statistical analysis; biostatistics; meta-analysis; structural equation models; big data; big data applications

Dr. Andreas Masouras
Guest Editor
Department of Economics and Business, Neapolis University, Pafos, Cyprus
Interests: digital marketing and communication; simulation

Dr. Christos Papademetriou
Guest Editor
Department of Economics and Business, Neapolis University Pafos, Pafos 8042, Cyprus
Interests: HRM; entrepreneurship; corporate governance; leadership

Dr. Stavros Souravlas
Guest Editor
Department of Applied Informatics, University of Macedonia, 54636 Thessaloniki, Greece
Interests: cloud computing; parallel and distributed computing; parallel algorithms; grid computing; digital design; computer architecture

Dr. Stefanos Katsavounis
Guest Editor
Department of Production and Management Engineering, Democritus University, 57100 Xanthi, Greece
Interests: scheduling; RCMPSP; project management; graph theory and modeling; heuristics

Special Issue Information

Dear Colleagues,

Data analysis, analytics, big data algorithms, and artificial intelligence are having an enormous impact on all fields of scientific research. Clustering and classification algorithms, data analysis with Python or R, correspondence factor analysis, multiple correspondence analysis, hierarchical clustering, and other statistical methods offer a rich context for both theoretical contributions and applied research.

Topics of interest include methodological approaches to data analysis; big data and artificial intelligence research; parallel and distributed data analysis; applications in the biomedical, medical, social sciences, humanities, education, economics, management, marketing, and computer science fields; and ethics, including models and best practices in theory and applications. Emphasis is also placed on correspondence factor analysis, on clustering and classification algorithms and their applications, and on ethical issues regarding big data and artificial intelligence.

The goal of this Special Issue is to provide interested readers with a collection of papers describing recent developments in intelligent data analysis. Topics of interest include, but are not limited to:

  • Theory and models;
  • Clustering and classification algorithms;
  • Cognitive computing;
  • Computational intelligence;
  • Data mining techniques, big data mining, and data analytics;
  • Algorithms and simulation;
  • Parallel and distributed data analysis;
  • Big data analysis, algorithms of big data sets and data analysis with Python;
  • Correspondence factor analysis of data sets and correspondence factor analysis of big data sets;
  • Dimensional data: application of model-based clustering;
  • Hierarchical clustering and hierarchical clustering of massive, high dimensional data sets;
  • Multiple correspondence analysis in massive data sets;
  • Big data and correspondence analysis in machine learning;
  • Comparison of patterning methods;
  • Multiple correspondence analysis, discriminant correspondence analysis or barycentric discriminant analysis, implicative statistical analysis: theory and applications;
  • Web information analysis and complex data analysis;
  • Biomedical data analysis;
  • Biomedical and medical data applications;
  • Deep learning;
  • Visualizing data using correspondence analysis;
  • Machine learning;
  • Practical applications of data mining;
  • Uncertainty in big data;
  • Artificial intelligence algorithms;
  • Artificial intelligence ethics;
  • Artificial intelligence in education;
  • Big data algorithms;
  • Big data ethics;
  • Big data in education.

Prof. Dr. Sofia Anastasiadou
Dr. Andreas Masouras
Dr. Christos Papademetriou
Dr. Stavros Souravlas
Dr. Stefanos Katsavounis
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (4 papers)


Research

19 pages, 7698 KiB  
Article
K-Means Clustering of 51 Geospatial Layers Identified for Use in Continental-Scale Modeling of Outdoor Acoustic Environments
by Katrina Pedersen, Ryan R. Jensen, Lucas K. Hall, Mitchell C. Cutler, Mark K. Transtrum, Kent L. Gee and Shane V. Lympany
Appl. Sci. 2023, 13(14), 8123; https://doi.org/10.3390/app13148123 - 12 Jul 2023
Cited by 1 | Viewed by 1204
Abstract
Applying machine learning methods to geographic data provides insight into spatial patterns in the data and assists in interpreting and describing environments. This paper investigates the results of k-means clustering applied to 51 geospatial layers, selected and scaled for a model of outdoor acoustic environments, in the continental United States. Silhouette and elbow analyses were performed to identify an appropriate number of clusters (eight). Cluster maps are shown and the clusters are described, using correlations between the geospatial layers and clusters to identify distinguishing characteristics for each cluster. A subclustering analysis is presented in which each of the original eight clusters is further divided into two clusters. Because the clustering analysis used geospatial layers relevant to modeling outdoor acoustics, the geospatially distinct environments corresponding to the clusters may aid in characterizing acoustically distinct environments. Therefore, the clustering analysis can guide data collection for the problem of modeling outdoor acoustic environments by identifying poorly sampled regions of the feature space (i.e., clusters that are not well represented in the training data).
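The silhouette and elbow analyses mentioned in this abstract can be illustrated with a small standard-library sketch. Everything below is an assumption made for illustration: the toy 2-D points stand in for the paper's 51 scaled geospatial layers, the function names are hypothetical, and a deterministic farthest-first initialisation is swapped in so the run is reproducible. It is not the authors' implementation.

```python
import math

def farthest_first(points, k):
    """Deterministic initialisation (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from the chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c])) for p in points]

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, inertia)."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous centroid.
            updated.append(tuple(sum(x) / len(members) for x in zip(*members)) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, inertia

def mean_silhouette(points, labels):
    """Average silhouette score; singleton clusters score 0 by convention."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy data: three well-separated blobs of five points each (hypothetical).
offsets = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)]
blob_origins = [(0, 0), (10, 0), (0, 10)]
points = [(ox + dx, oy + dy) for ox, oy in blob_origins for dx, dy in offsets]

results = {k: kmeans(points, k) for k in range(2, 6)}
inertias = {k: inertia for k, (_, inertia) in results.items()}  # elbow curve
silhouettes = {k: mean_silhouette(points, labels) for k, (labels, _) in results.items()}
best_k = max(silhouettes, key=silhouettes.get)
print("inertia by k:", inertias)
print("silhouette by k:", silhouettes)
print("k with the highest silhouette:", best_k)
```

In practice the elbow is read off the inertia curve by eye, while the silhouette gives a single score per k. On real, high-dimensional data the two criteria may disagree, which is presumably why the abstract reports using both.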

12 pages, 1271 KiB  
Article
A Machine-Learning-Inspired Opinion Extraction Mechanism for Classifying Customer Reviews on Social Media
by Fahad M. Alotaibi
Appl. Sci. 2023, 13(12), 7266; https://doi.org/10.3390/app13127266 - 19 Jun 2023
Cited by 2 | Viewed by 1400
Abstract
Machine learning frameworks categorizing customer reviews on online products have significantly improved sales and product quality for major manufacturers. Manually scrutinizing extensive customer reviews is imprecise and time-consuming. Current product research techniques rely on text mining, neglecting audio and image components, resulting in less productive outcomes for researchers and developers. AI-based machine learning frameworks that consider social media and online buyer reviews are essential for accurate recommendations in e-commerce shops. This research paper proposes a novel machine-learning-based framework for categorizing customer reviews that uses a bag-of-features approach for feature extraction and a hybrid DNN framework for robust classification. We assess the performance of our machine learning framework using AliExpress and Amazon e-commerce product review data provided by customers, and we achieve a classification accuracy of 91.5% with only 8.46% fallout. Moreover, when compared with state-of-the-art models, our proposed model shows superior performance in terms of sensitivity, specificity, precision, fallout, and accuracy.
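For readers unfamiliar with the evaluation terms in this abstract, the sketch below shows how accuracy, sensitivity, specificity, precision, and fallout are conventionally derived from a binary confusion matrix. The function name and the counts are hypothetical and are not taken from the paper; a multi-class evaluation would typically apply these formulas per class (one-vs-rest), which is an assumption here.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (hypothetical helper)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),    # positive predictive value
        "fallout": fp / (fp + tn),      # false positive rate = 1 - specificity
    }

# Illustrative counts only; they do not reproduce the paper's results.
metrics = binary_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Fallout and specificity always sum to one, so reporting both is redundant in the binary case but can be convenient when comparing against work that reports only one of them.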

18 pages, 4922 KiB  
Article
Unsupervised Deep Embedded Clustering for High-Dimensional Visual Features of Fashion Images
by Umar Subhan Malhi, Junfeng Zhou, Cairong Yan, Abdur Rasool, Shahbaz Siddeeq and Ming Du
Appl. Sci. 2023, 13(5), 2828; https://doi.org/10.3390/app13052828 - 22 Feb 2023
Cited by 1 | Viewed by 1996
Abstract
Fashion image clustering is the key to fashion retrieval, forecasting, and recommendation applications. Manual labeling-based clustering is both time-consuming and less accurate. Currently, popular methods for extracting features from data use deep learning techniques, such as a Convolutional Neural Network (CNN). These methods can generate high-dimensional feature vectors, which are effective for image clustering. However, high dimensions can lead to the curse of dimensionality, which makes subsequent clustering difficult. The fashion images-oriented deep clustering method (FIDC) is proposed in this paper. This method uses a CNN to generate a 4096-dimensional feature vector for each fashion image through transfer learning, then performs dimensionality reduction through a deep-stacked auto-encoder model, and finally performs clustering on these low-dimensional vectors. High-dimensional vectors can represent images, and dimensionality reduction avoids the curse of dimensionality during clustering tasks. A distinctive feature of the method is the joint learning and optimization of the dimensionality reduction process and the clustering task. The optimization process is performed using two algorithms: back-propagation and stochastic gradient descent. The experimental findings show that the proposed method, called FIDC, has achieved state-of-the-art performance.

17 pages, 1174 KiB  
Article
A Novel Density Peaks Clustering Algorithm with Isolation Kernel and K-Induction
by Shichen Zhang and Kai Li
Appl. Sci. 2023, 13(1), 322; https://doi.org/10.3390/app13010322 - 27 Dec 2022
Cited by 2 | Viewed by 1334
Abstract
The density peaks clustering (DPC) algorithm can process data of any shape and is simple and intuitive. However, the distance between any two high-dimensional points tends to be consistent, which makes it difficult to distinguish the density peaks and easily produces “bad label” delivery. To surmount the above-mentioned defects, this paper puts forward a novel density peaks clustering algorithm with isolation kernel and K-induction (IKDC). The IKDC uses an optimized isolation kernel instead of the traditional distance. The optimized isolation kernel solves the problem of converging distances between high-dimensional samples by increasing the similarity of two samples in a sparse domain and decreasing the similarity of two samples in a dense domain. In addition, the IKDC introduces three-way clustering, uses core domains to represent dense regions of clusters, and uses boundary domains to represent sparse regions of clusters, where points in the boundary domains may belong to one or more clusters. At the same time as determining the core domains, the improved KNN and average similarity are proposed to assign as many points as possible to the core domains. The K-induction is proposed to assign the leftover points to the boundary domain of the optimal cluster. To confirm the practicability and validity of IKDC, we test on 10 synthetic and 8 real datasets. The comparison with other algorithms showed that the IKDC was superior to other algorithms in multiple clustering indicators.
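As background for this abstract, the following is a minimal sketch of the baseline density peaks clustering idea that IKDC builds on: estimate a local density for each point, measure its distance to the nearest denser point, pick the points where both are large as centers, and let every other point inherit the label of its nearest denser neighbor. It uses plain Euclidean distance and a hard cutoff `d_c`. The isolation kernel, three-way clustering, and K-induction steps described in the paper are not implemented, and the function name, parameters, and toy data are assumptions.

```python
import math

def density_peaks(points, d_c, n_clusters):
    """Baseline DPC sketch; returns (labels, center_indices)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # densest point: use its largest distance
            continue
        # Nearest point among those ranked denser than i.
        j = min(order[:rank], key=lambda h: dist[i][h])
        delta[i], parent[i] = dist[i][j], j
    # Centers combine high density with a large distance to any denser point.
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_clusters]
    labels = [-1] * n
    for cluster_id, i in enumerate(centers):
        labels[i] = cluster_id
    # Remaining points inherit the label of their nearest denser neighbor;
    # visiting in density order guarantees that neighbor is already labeled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers

# Toy data: two blobs, each a unit square plus its midpoint (hypothetical).
offsets = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)]
points = [(ox + dx, dy) for ox in (0, 10) for dx, dy in offsets]
labels, centers = density_peaks(points, d_c=1.2, n_clusters=2)
print("labels:", labels)
print("center indices:", centers)
```

The label-inheritance step is where the “bad label” propagation mentioned in the abstract can arise: one wrong assignment near a cluster boundary is passed on to every sparser point that depends on it.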