The Second International Workshop on Parallel and Distributed Data Mining

A special issue of Algorithms (ISSN 1999-4893).

Deadline for manuscript submissions: closed (31 January 2019)

Special Issue Editors


Guest Editor
1. Department of Engineering for Innovation, University of Salento, 73100 Lecce, Italy
2. Euro-Mediterranean Centre on Climate Change Foundation, Lecce, Italy
Interests: parallel, distributed, and grid/cloud/P2P computing; data mining; machine learning; deep learning; security and cryptography

Guest Editor

Guest Editor

Guest Editor
1. Department of Engineering for Innovation, University of Salento, Via per Monteroni, 73100 Lecce, Italy
2. Euro-Mediterranean Centre on Climate Change Foundation, Lecce, Italy
Interests: high performance computing; grid and cloud computing; distributed data management

Special Issue Information

Dear Colleagues,

This Special Issue is devoted to the growing interest in the design and implementation of parallel and distributed data mining algorithms. The Workshop on Parallel and Distributed Data Mining (WPDM 2018) is an international forum that brings together researchers and practitioners working on the high-performance aspects of data mining algorithms and the novel applications they enable. Datasets grow so rapidly that exabytes of data are generated every day, and extracting and inferring useful knowledge from such volumes requires parallel processing techniques: traditional sequential software simply cannot cope with them. Moreover, when data are generated in different places (such as wireless sensor networks and/or Internet of Things devices), they may be geographically spread out and cannot be sent to a centralized site, hence the need for distributed processing algorithms. Topics relevant to this Special Issue cover the scope of the WPDM 2018 Workshop (http://sara.unisalento.it/~cafaro/WPDM2018/):

  • Parallel data mining algorithms using MPI and/or OpenMP
  • Parallel data mining algorithms targeting GPUs and many-core accelerators
  • Parallel data mining applications exploiting FPGAs
  • Distributed data mining algorithms
  • Benchmarking and performance studies of high-performance data mining applications
  • Novel programming paradigms to support high-performance computing for data mining
  • Performance models for high-performance data mining applications and middleware
  • Programming models, tools, and environments for high-performance computing in data mining
  • Caching, streaming, pipelining, and other optimization techniques for data management in high-performance computing for data mining

Extended versions of papers presented at WPDM 2018 are sought, but this call for papers is fully open to all those who wish to contribute by submitting a relevant research manuscript.

Assoc. Prof. Massimo Cafaro
Dr. Italo Epicoco
Dr. Marco Pulimeno
Prof. Giovanni Aloisio
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (1 paper)


Research

21 pages, 941 KiB  
Article
Heterogeneous Distributed Big Data Clustering on Sparse Grids
by David Pfander, Gregor Daiß and Dirk Pflüger
Algorithms 2019, 12(3), 60; https://doi.org/10.3390/a12030060 - 07 Mar 2019
Cited by 4
Abstract
Clustering is an important task in data mining that has become more challenging due to the ever-increasing size of available datasets. To cope with these big data scenarios, a high-performance clustering approach is required. Sparse grid clustering is a density-based clustering method that uses a sparse grid density estimation as its central building block. The underlying density estimation approach enables the detection of clusters with non-convex shapes and without a predetermined number of clusters. In this work, we introduce a new distributed and performance-portable variant of the sparse grid clustering algorithm that is suited for big data settings. Our computed kernels were implemented in OpenCL to enable portability across a wide range of architectures. For distributed environments, we added a manager–worker scheme that was implemented using MPI. In experiments on two supercomputers, Piz Daint and Hazel Hen, with up to 100 million data points in a ten-dimensional dataset, we show the performance and scalability of our approach. The dataset with 100 million data points was clustered in 1198 s using 128 nodes of Piz Daint. This translates to an overall performance of 352 TFLOPS. On the node-level, we provide results for two GPUs, Nvidia’s Tesla P100 and the AMD FirePro W8100, and one processor-based platform that uses Intel Xeon E5-2680v3 processors. In these experiments, we achieved between 43% and 66% of the peak performance across all computed kernels and devices, demonstrating the performance portability of our approach.
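For readers unfamiliar with the manager–worker pattern mentioned in the abstract, the sketch below illustrates the general idea in C with MPI: rank 0 hands out chunks of data on demand and accumulates partial results, while the remaining ranks process whatever chunk they receive. The chunk size, the number of chunks, the message tags, and the process_chunk() placeholder are illustrative assumptions; this is not the authors' implementation, which couples the MPI layer with OpenCL kernels for sparse grid density estimation.

/*
 * A minimal sketch of an MPI manager-worker scheme: rank 0 (the manager)
 * distributes chunks of data on demand and accumulates partial results,
 * while the remaining ranks (the workers) process the chunks they receive.
 * Chunk size, number of chunks, tags, and process_chunk() are placeholders,
 * not the authors' implementation.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define TAG_WORK 1
#define TAG_STOP 2

/* Placeholder for the per-chunk computation (in the paper this role is
 * played by OpenCL density-estimation/clustering kernels); here it just
 * sums the chunk so the sketch stays self-contained. */
static double process_chunk(const double *chunk, int n) {
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += chunk[i];
    return s;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int chunk_len = 1024;   /* illustrative chunk size     */
    const int num_chunks = 64;    /* illustrative amount of work */

    if (rank == 0) {              /* manager */
        double *data = malloc((size_t)num_chunks * chunk_len * sizeof(double));
        for (int i = 0; i < num_chunks * chunk_len; ++i) data[i] = 1.0;

        int next = 0, active = 0;
        /* Seed every worker with one chunk (as long as work remains). */
        for (int w = 1; w < size && next < num_chunks; ++w, ++next, ++active)
            MPI_Send(data + (size_t)next * chunk_len, chunk_len, MPI_DOUBLE,
                     w, TAG_WORK, MPI_COMM_WORLD);
        /* Workers that received no initial chunk can stop right away. */
        for (int w = active + 1; w < size; ++w)
            MPI_Send(NULL, 0, MPI_DOUBLE, w, TAG_STOP, MPI_COMM_WORLD);

        double total = 0.0;
        while (active > 0) {
            double partial;
            MPI_Status st;
            MPI_Recv(&partial, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            total += partial;
            if (next < num_chunks) {  /* hand the now-idle worker more work */
                MPI_Send(data + (size_t)next * chunk_len, chunk_len,
                         MPI_DOUBLE, st.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
                ++next;
            } else {                  /* nothing left: tell the worker to stop */
                MPI_Send(NULL, 0, MPI_DOUBLE, st.MPI_SOURCE, TAG_STOP,
                         MPI_COMM_WORLD);
                --active;
            }
        }
        printf("total = %f\n", total);
        free(data);
    } else {                      /* worker */
        double *chunk = malloc(chunk_len * sizeof(double));
        while (1) {
            MPI_Status st;
            MPI_Recv(chunk, chunk_len, MPI_DOUBLE, 0, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;
            double partial = process_chunk(chunk, chunk_len);
            MPI_Send(&partial, 1, MPI_DOUBLE, 0, TAG_WORK, MPI_COMM_WORLD);
        }
        free(chunk);
    }

    MPI_Finalize();
    return 0;
}

Handing chunks out on demand, rather than partitioning the data statically, is what allows such a scheme to balance load across heterogeneous nodes, which is the setting the paper targets.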