Big Data Algorithmics

A special issue of Algorithms (ISSN 1999-4893). This special issue belongs to the section "Databases and Data Structures".

Deadline for manuscript submissions: closed (31 December 2020) | Viewed by 12842

Special Issue Editors


Prof. Andrea Pietracaprina
Guest Editor
Department of Information Engineering, University of Padova, I-35131 Padova, Italy
Interests: big data algorithmics and data mining; algorithms and data structures for parallel and hierarchical platforms; models of computation; wireless interconnection networks and routing

Prof. Geppino Pucci
Guest Editor
Department of Information Engineering, University of Padova, I-35131 Padova, Italy
Interests: algorithms and data structures for big data; models of computation; wireless interconnection networks and routing; graph analytics; high-performance numerical kernels; data mining

Prof. Francesco Silvestri
Guest Editor
Department of Information Engineering, University of Padova, I-35131 Padova, Italy
Interests: algorithms and data structures for big data; parallel and memory-efficient algorithms; algorithms for similarity search; models of computation

Special Issue Information

Dear Colleagues,

The last two decades have witnessed an impressive growth in the amount of digital data produced in a wide spectrum of application domains. The effective exploitation of these data is of great value to science, business, government, and society, and has been recognized as one of the most important scientific challenges of the 21st century. Due to their volume, velocity, heterogeneity, and noise, big data demand a sharp paradigm shift with respect to traditional computing. Novel computational frameworks such as MapReduce and Streaming have recently established themselves as standards for big data processing, both in academia and in industry. The effective use of these frameworks requires new approaches to algorithm design. For instance, algorithmic solutions must take into account that the data are typically distributed or streamed and, in any case, cannot fit in a single machine's local memory. Moreover, since exact solutions are often computationally infeasible for very large and possibly noisy inputs, suitable tradeoffs must be sought between the accuracy of the returned solution and computational efficiency.

Despite the considerable effort that the research community has been devoting to big data algorithmics, there is a dire need for new contributions, as well as for efforts to clean up and systematize recent advances in the field. This Special Issue aims to take a step in this direction by soliciting original and unpublished contributions in the area. Topics of interest include (but are not limited to) those listed below.

TOPICS OF INTEREST

  • Computational Models for Big Data Processing
  • MapReduce Algorithms
  • Streaming Algorithms
  • Parallel and Distributed Algorithms for Data Intensive Computing
  • Approximation Algorithms for Massive Instances of Optimization Problems
  • Algorithms for Large Graphs
  • Models and Algorithms for Large Evolving Data and Sequences
  • Models and Algorithms for Large Uncertain Data
  • Dimensionality Reduction Techniques
  • Compact Data Structures and Sketches
  • Sampling Methodologies
  • Impossibility Results

Prof. Andrea Pietracaprina
Prof. Geppino Pucci
Prof. Francesco Silvestri
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the journal website.

Published Papers (3 papers)


Research

27 pages, 949 KiB  
Article
Finding Top-k Nodes for Temporal Closeness in Large Temporal Graphs
by Pierluigi Crescenzi, Clémence Magnien and Andrea Marino
Algorithms 2020, 13(9), 211; https://doi.org/10.3390/a13090211 - 29 Aug 2020
Cited by 14 | Viewed by 3528
Abstract
The harmonic closeness centrality measure associates, to each node of a graph, the average of the inverses of its distances from all the other nodes (assuming that unreachable nodes are at infinite distance). This notion has been adapted to temporal graphs (that is, graphs in which edges can appear and disappear over time), and in this paper we address the question of finding the top-k nodes for this metric. Computing the temporal closeness of one node can be done in O(m) time, where m is the number of temporal edges. Therefore, computing the closeness of all nodes exactly, in order to find those with the highest closeness, would require O(nm) time, where n is the number of nodes. This time complexity is impractical for large temporal graphs. Instead, we show how this measure can be efficiently approximated by using a “backward” temporal breadth-first search algorithm and a classical sampling technique. Our experimental results show that the approximation is excellent for nodes with high closeness, allowing us to detect them in practice in a fraction of the time needed to compute the exact closeness of all nodes. We validate our approach with an extensive set of experiments.
(This article belongs to the Special Issue Big Data Algorithmics)
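To make the sampling idea concrete, here is a minimal sketch of such an estimator on a static, unweighted graph: for each of a few sampled target nodes we run a BFS and credit every reached node with the inverse of its distance, and averaging over the sample approximates the harmonic closeness. This is only an illustration of the general sampling scheme, not the authors' algorithm, which replaces the plain BFS with a backward temporal breadth-first search; all function names are ours.

```python
import random
from collections import deque, defaultdict

def bfs_distances(adj, source):
    """Standard BFS distances from `source` in an unweighted graph,
    given as a dict mapping each node to a list of neighbours."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def estimate_harmonic_closeness(adj, num_samples, seed=0):
    """Estimate the harmonic closeness of every node by sampling target nodes.

    Each sampled target t contributes 1/d(v, t) to node v (0 if unreachable);
    averaging these contributions over the sampled targets approximates the
    average inverse distance, i.e. the harmonic closeness of v.
    """
    random.seed(seed)
    nodes = list(adj)
    estimate = defaultdict(float)
    for t in random.sample(nodes, num_samples):
        dist = bfs_distances(adj, t)  # the paper uses a backward *temporal* BFS here
        for v, d in dist.items():
            if v != t:
                estimate[v] += 1.0 / d
    return {v: estimate[v] / num_samples for v in nodes}

# Example: top-2 nodes by estimated closeness on a toy graph.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3], 5: []}
est = estimate_harmonic_closeness(adj, num_samples=3)
print(sorted(est, key=est.get, reverse=True)[:2])
```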

34 pages, 752 KiB  
Article
Mining Sequential Patterns with VC-Dimension and Rademacher Complexity
by Diego Santoro, Andrea Tonon and Fabio Vandin
Algorithms 2020, 13(5), 123; https://doi.org/10.3390/a13050123 - 18 May 2020
Cited by 10 | Viewed by 4598
Abstract
Sequential pattern mining is a fundamental data mining task with applications in several domains. We study two variants of this task: the first is the extraction of frequent sequential patterns, whose frequency in a dataset of sequential transactions is higher than a user-provided threshold; the second is the mining of true frequent sequential patterns, which appear with probability above a user-defined threshold in transactions drawn from the generative process underlying the data. We present the first sampling-based algorithm to mine, with high confidence, a rigorous approximation of the frequent sequential patterns from massive datasets. We also present the first algorithms to mine approximations of the true frequent sequential patterns with rigorous guarantees on the quality of the output. Our algorithms are based on novel applications of the Vapnik–Chervonenkis dimension and Rademacher complexity, advanced tools from statistical learning theory, to sequential pattern mining. Our extensive experimental evaluation shows that our algorithms provide high-quality approximations for both problems we consider.
(This article belongs to the Special Issue Big Data Algorithmics)
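As a rough illustration of the sample-and-mine template (not the paper's actual algorithms, whose sample sizes follow from VC-dimension and Rademacher-complexity bounds), the sketch below estimates pattern frequencies on a random sample of transactions and keeps every candidate whose sample frequency clears a slightly lowered threshold. The function names and the simplification to sequences of single items (rather than itemsets) are ours.

```python
import random

def is_subsequence(pattern, transaction):
    """True if `pattern` occurs as an order-preserving subsequence of `transaction`.
    Simplification: single items instead of the itemsets used in the paper."""
    it = iter(transaction)
    return all(item in it for item in pattern)

def frequency(pattern, transactions):
    """Fraction of transactions that contain `pattern`."""
    return sum(is_subsequence(pattern, t) for t in transactions) / len(transactions)

def sample_and_mine(dataset, candidates, theta, sample_size, eps, seed=0):
    """Estimate frequencies on a random sample and keep every candidate whose
    sample frequency is at least theta - eps/2.

    In the paper, sample_size and eps are linked through VC-dimension /
    Rademacher-complexity bounds so that, with high probability, the output
    contains every theta-frequent pattern and nothing with true frequency
    below theta - eps; here they are plain inputs.
    """
    random.seed(seed)
    sample = random.sample(dataset, sample_size)
    return [p for p in candidates if frequency(p, sample) >= theta - eps / 2]

# Example: two candidate patterns over a toy dataset of item sequences.
data = [list("abcd"), list("acd"), list("abd"), list("bcd"), list("abc")]
print(sample_and_mine(data, [list("ad"), list("cb")], theta=0.6, sample_size=4, eps=0.2))
```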

16 pages, 1062 KiB  
Article
Deterministic Coresets for k-Means of Big Sparse Data
by Artem Barger and Dan Feldman
Algorithms 2020, 13(4), 92; https://doi.org/10.3390/a13040092 - 14 Apr 2020
Cited by 7 | Viewed by 3894
Abstract
Let P be a set of n points in R^d, let k ≥ 1 be an integer, and let ε ∈ (0, 1) be a constant. An ε-coreset is a subset C ⊆ P with appropriate non-negative weights (scalars) that approximates any given set Q ⊆ R^d of k centers. That is, the sum of squared distances from every point in P to its closest point in Q equals, up to a factor of 1 ± ε, the corresponding weighted sum over the points of C with respect to the same k centers. If the coreset is small, we can solve problems such as k-means clustering or its variants (e.g., discrete k-means, where the centers are restricted to lie in P, or in other restricted zones) on the small coreset to get faster provable approximations. Moreover, it is known that such coresets support streaming, dynamic, and distributed data using the classic merge-and-reduce trees. The fact that the coreset is a subset implies that it preserves the sparsity of the data. However, existing coresets of this kind are randomized, and their size has at least a linear dependency on the dimension d. We suggest the first such coreset whose size is independent of d. This is also the first deterministic coreset construction whose resulting size is not exponential in d. Extensive experimental results and benchmarks are provided on public datasets, including the first coreset of the English Wikipedia, computed using Amazon's cloud.
(This article belongs to the Special Issue Big Data Algorithmics)
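To pin down the guarantee, the sketch below spells out the ε-coreset condition as a cost comparison: the weighted k-means cost of C must stay within a 1 ± ε factor of the cost of P for the same centers. This is an illustrative check (assuming NumPy), not the authors' construction; a genuine ε-coreset must satisfy the inequality for every choice of k centers, whereas this helper only spot-checks a single one.

```python
import numpy as np

def kmeans_cost(points, centers, weights=None):
    """(Weighted) sum of squared distances from each point to its closest center.
    points: (n, d) array, centers: (k, d) array, weights: length-n array or None."""
    sq = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # shape (n, k)
    closest = sq.min(axis=1)
    if weights is None:
        weights = np.ones(len(points))
    return float((weights * closest).sum())

def satisfies_coreset_bound(P, C, w, Q, eps):
    """Check (1 - eps) * cost(P, Q) <= cost_w(C, Q) <= (1 + eps) * cost(P, Q)
    for a single candidate center set Q."""
    full = kmeans_cost(P, Q)
    approx = kmeans_cost(C, Q, w)
    return (1 - eps) * full <= approx <= (1 + eps) * full

# Toy example: the full set used as its own coreset with unit weights is a 0-coreset.
P = np.random.rand(100, 5)
Q = np.random.rand(3, 5)
print(satisfies_coreset_bound(P, P, np.ones(100), Q, eps=0.1))  # True
```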
