The Second International Workshop on Parallel and Distributed Data Mining

A special issue of Algorithms (ISSN 1999-4893).

Deadline for manuscript submissions: closed (31 January 2019)

Special Issue Editors


Guest Editor
1. Department of Engineering for Innovation, University of Salento, 73100 Lecce, Italy
2. Euro-Mediterranean Centre on Climate Change Foundation, Lecce, Italy
Interests: parallel, distributed, and grid/cloud/P2P computing; data mining; machine learning; deep learning; security and cryptography

Guest Editor

Guest Editor

Guest Editor
1. Department of Engineering for Innovation, University of Salento, Via per Monteroni, 73100 Lecce, Italy
2. Euro-Mediterranean Centre on Climate Change Foundation, Lecce, Italy
Interests: high performance computing; grid and cloud computing; distributed data management

Special Issue Information

Dear Colleagues,

This Special Issue is devoted to the growing interest in the design and implementation of parallel and distributed data mining algorithms. The Workshop on Parallel and Distributed Data Mining (WPDM 2018) is an international forum that brings together researchers and practitioners working on the high-performance aspects of data mining algorithms and the novel applications they enable. Datasets grow so rapidly that exabytes of data are generated every day, and extracting and inferring useful knowledge from such volumes requires parallel processing techniques: traditional sequential software simply cannot cope with them. Moreover, when data are generated in different places (such as wireless sensor networks and/or Internet of Things devices), they may be geographically spread out and cannot be sent to a centralized site, hence the need for distributed processing algorithms. Topics relevant to this Special Issue cover the scope of the WPDM 2018 Workshop (http://sara.unisalento.it/~cafaro/WPDM2018/):

  • Parallel data mining algorithms using MPI and/or OpenMP
  • Parallel data mining algorithms targeting GPUs and many-core accelerators
  • Parallel data mining applications exploiting FPGAs
  • Distributed data mining algorithms
  • Benchmarking and performance studies of high-performance data mining applications
  • Novel programming paradigms to support high-performance computing for data mining
  • Performance models for high-performance data mining applications and middleware
  • Programming models, tools, and environments for high-performance computing in data mining
  • Caching, streaming, pipelining, and other optimization techniques for data management in high-performance computing for data mining

Extended versions of papers presented at WPDM 2018 are sought, but this call for papers is fully open to all those who wish to contribute by submitting a relevant research manuscript.

Assoc. Prof. Massimo Cafaro
Dr. Italo Epicoco
Dr. Marco Pulimeno
Prof. Giovanni Aloisio
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (1 paper)


Research

21 pages, 941 KiB  
Article
Heterogeneous Distributed Big Data Clustering on Sparse Grids
by David Pfander, Gregor Daiß and Dirk Pflüger
Algorithms 2019, 12(3), 60; https://doi.org/10.3390/a12030060 - 07 Mar 2019
Cited by 4
Abstract
Clustering is an important task in data mining that has become more challenging due to the ever-increasing size of available datasets. To cope with these big data scenarios, a high-performance clustering approach is required. Sparse grid clustering is a density-based clustering method that uses a sparse grid density estimation as its central building block. The underlying density estimation approach enables the detection of clusters with non-convex shapes and without a predetermined number of clusters. In this work, we introduce a new distributed and performance-portable variant of the sparse grid clustering algorithm that is suited for big data settings. Our computed kernels were implemented in OpenCL to enable portability across a wide range of architectures. For distributed environments, we added a manager–worker scheme that was implemented using MPI. In experiments on two supercomputers, Piz Daint and Hazel Hen, with up to 100 million data points in a ten-dimensional dataset, we show the performance and scalability of our approach. The dataset with 100 million data points was clustered in 1198 s using 128 nodes of Piz Daint. This translates to an overall performance of 352 TFLOPS. On the node-level, we provide results for two GPUs, Nvidia’s Tesla P100 and the AMD FirePro W8100, and one processor-based platform that uses Intel Xeon E5-2680v3 processors. In these experiments, we achieved between 43% and 66% of the peak performance across all computed kernels and devices, demonstrating the performance portability of our approach.
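For readers unfamiliar with the manager–worker pattern mentioned in the abstract, the sketch below illustrates the general idea in C with MPI: rank 0 hands out chunks of data on demand and accumulates partial results, while the remaining ranks process whatever chunk they receive. The chunk size, the number of chunks, the message tags, and the process_chunk() placeholder are illustrative assumptions; this is not the authors' implementation, which couples the MPI layer with OpenCL kernels for sparse grid density estimation.

/*
 * A minimal sketch of an MPI manager-worker scheme: rank 0 (the manager)
 * distributes chunks of data on demand and accumulates partial results,
 * while the remaining ranks (the workers) process the chunks they receive.
 * Chunk size, number of chunks, tags, and process_chunk() are placeholders,
 * not the authors' implementation.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define TAG_WORK 1
#define TAG_STOP 2

/* Placeholder for the per-chunk computation (in the paper this role is
 * played by OpenCL density-estimation/clustering kernels); here it just
 * sums the chunk so the sketch stays self-contained. */
static double process_chunk(const double *chunk, int n) {
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += chunk[i];
    return s;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int chunk_len = 1024;   /* illustrative chunk size     */
    const int num_chunks = 64;    /* illustrative amount of work */

    if (rank == 0) {              /* manager */
        double *data = malloc((size_t)num_chunks * chunk_len * sizeof(double));
        for (int i = 0; i < num_chunks * chunk_len; ++i) data[i] = 1.0;

        int next = 0, active = 0;
        /* Seed every worker with one chunk (as long as work remains). */
        for (int w = 1; w < size && next < num_chunks; ++w, ++next, ++active)
            MPI_Send(data + (size_t)next * chunk_len, chunk_len, MPI_DOUBLE,
                     w, TAG_WORK, MPI_COMM_WORLD);
        /* Workers that received no initial chunk can stop right away. */
        for (int w = active + 1; w < size; ++w)
            MPI_Send(NULL, 0, MPI_DOUBLE, w, TAG_STOP, MPI_COMM_WORLD);

        double total = 0.0;
        while (active > 0) {
            double partial;
            MPI_Status st;
            MPI_Recv(&partial, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            total += partial;
            if (next < num_chunks) {  /* hand the now-idle worker more work */
                MPI_Send(data + (size_t)next * chunk_len, chunk_len,
                         MPI_DOUBLE, st.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
                ++next;
            } else {                  /* nothing left: tell the worker to stop */
                MPI_Send(NULL, 0, MPI_DOUBLE, st.MPI_SOURCE, TAG_STOP,
                         MPI_COMM_WORLD);
                --active;
            }
        }
        printf("total = %f\n", total);
        free(data);
    } else {                      /* worker */
        double *chunk = malloc(chunk_len * sizeof(double));
        while (1) {
            MPI_Status st;
            MPI_Recv(chunk, chunk_len, MPI_DOUBLE, 0, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;
            double partial = process_chunk(chunk, chunk_len);
            MPI_Send(&partial, 1, MPI_DOUBLE, 0, TAG_WORK, MPI_COMM_WORLD);
        }
        free(chunk);
    }

    MPI_Finalize();
    return 0;
}

Handing chunks out on demand, rather than partitioning the data statically, is what allows such a scheme to balance load across heterogeneous nodes, which is the setting the paper targets.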