Data Compression, Communication and Processing

A special issue of Algorithms (ISSN 1999-4893).

Deadline for manuscript submissions: closed (15 November 2011) | Viewed by 67758

Special Issue Editor


Prof. Dr. Bruno Carpentieri
Guest Editor
Dipartimento di Informatica ed Applicazioni "Renato M. Capocelli", Università di Salerno, Via Ponte Don Melillo, 84084 Fisciano (SA), Italy
Interests: data compression; information theory; algorithms; parallel computing

Special Issue Information

Dear Colleagues,

A strict relationship links data compression, data communication and data processing, as witnessed by the growing interest in efficient compression, communication and processing techniques that the new media demand. This Special Issue is devoted to the many facets of this relationship and explores the current state of the art of research in compression and communication.

The topics of interest of this Special Issue cover the scope of the CCP2011 Conference (http://ccp2011.dia.unisa.it).

Extended versions of papers presented at CCP2011 are sought, but this call for papers is fully open to anyone who wishes to contribute a relevant research manuscript.

Prof. Dr. Bruno Carpentieri
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (10 papers)


Research

Article
Content Sharing Graphs for Deduplication-Enabled Storage Systems
by Maohua Lu, Cornel Constantinescu and Prasenjit Sarkar
Algorithms 2012, 5(2), 236-260; https://doi.org/10.3390/a5020236 - 10 Apr 2012
Cited by 5 | Viewed by 5678
Abstract
Deduplication in storage systems has gained momentum recently for its capability in reducing data footprint. However, deduplication introduces challenges to storage management as storage objects (e.g., files) are no longer independent from each other due to content sharing between these storage objects. In this paper, we present a graph-based framework to address the challenges of storage management due to deduplication. Specifically, we model content sharing among storage objects by content sharing graphs (CSG), and apply graph-based algorithms to two real-world storage management use cases for deduplication-enabled storage systems. First, a quasi-linear algorithm was developed to partition deduplication domains with a minimal amount of deduplication loss (i.e., data replicated across partitioned domains) in commercial deduplication-enabled storage systems, whereas in general the partitioning problem is NP-complete. For a real-world trace of 3 TB data with 978 GB of removable duplicates, the proposed algorithm can partition the data into 15 balanced partitions with only 54 GB of deduplication loss, that is, a 5% deduplication loss. Second, a quick and accurate method to query the deduplicated size for a subset of objects in deduplicated storage systems was developed. For the same trace of 3 TB data, the optimized graph-based algorithm can complete the query in 2.6 s, which is less than 1% of that of the traditional algorithm based on the deduplication metadata.
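
The partitioning use case lends itself to a small illustration. The sketch below is not the paper's quasi-linear algorithm: it simply builds a content sharing graph from file-to-chunk-hash mappings, finds its connected components with a union-find structure, and packs whole components onto the least-loaded partition. Because components are never split, this toy version incurs zero deduplication loss; the paper's algorithm must instead trade balance against loss on real traces, where a single component can dominate.

    # Minimal sketch (not the paper's algorithm): build a content sharing
    # graph from file -> chunk-hash sets, find its connected components,
    # and greedily pack components into balanced partitions.
    from collections import defaultdict

    def sharing_components(files):
        """files: dict mapping file name -> set of chunk hashes."""
        by_chunk = defaultdict(list)          # chunk hash -> files containing it
        for f, chunks in files.items():
            for c in chunks:
                by_chunk[c].append(f)
        parent = {f: f for f in files}        # union-find over files
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        for fs in by_chunk.values():          # files sharing a chunk are connected
            for other in fs[1:]:
                parent[find(other)] = find(fs[0])
        comps = defaultdict(set)
        for f in files:
            comps[find(f)].add(f)
        return list(comps.values())

    def partition(files, k):
        """Greedily pack components into k partitions, lightest first."""
        parts = [set() for _ in range(k)]
        for comp in sorted(sharing_components(files), key=len, reverse=True):
            min(parts, key=len).update(comp)  # put component on lightest partition
        return parts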

Article
An Online Algorithm for Lightweight Grammar-Based Compression
by Shirou Maruyama, Hiroshi Sakamoto and Masayuki Takeda
Algorithms 2012, 5(2), 214-235; https://doi.org/10.3390/a5020214 - 10 Apr 2012
Cited by 24 | Viewed by 6723
Abstract
Grammar-based compression is a well-studied technique that constructs a context-free grammar (CFG) deriving a given text uniquely. In this work, we propose an online algorithm for grammar-based compression. Our algorithm guarantees an O(log² n) approximation ratio for the minimum grammar size, where n is the input size, and it runs in time linear in the input and space linear in the output. In addition, we propose a practical encoding, which transforms a restricted CFG into a more compact representation. Experimental results in comparison with standard compressors demonstrate that our algorithm is especially effective for highly repetitive text.
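
As a generic illustration of what a grammar-based compressor produces (not the authors' online algorithm, which works in one pass over the input), the following offline, Re-Pair-style sketch repeatedly replaces the most frequent digram with a fresh nonterminal, yielding a CFG that derives the input text uniquely.

    # Illustrative sketch of grammar-based compression (not the paper's
    # online algorithm): repeatedly replace the most frequent digram with
    # a fresh nonterminal, Re-Pair style.
    from collections import Counter

    def repair(text):
        seq = list(text)
        rules = {}                              # nonterminal -> (left, right)
        next_id = 0
        while True:
            digrams = Counter(zip(seq, seq[1:]))
            if not digrams:
                break
            pair, freq = digrams.most_common(1)[0]
            if freq < 2:
                break                           # no digram repeats: stop
            nt = ('R', next_id)
            next_id += 1
            rules[nt] = pair
            out, i = [], 0
            while i < len(seq):                 # replace occurrences left to right
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                    out.append(nt)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            seq = out
        return seq, rules

    def expand(sym, rules):
        if sym in rules:
            left, right = rules[sym]
            return expand(left, rules) + expand(right, rules)
        return sym

    start, rules = repair("abracadabra abracadabra")
    assert ''.join(expand(s, rules) for s in start) == "abracadabra abracadabra"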

Article
A Semi-Preemptive Computational Service System with Limited Resources and Dynamic Resource Ranking
by Fang-Yie Leu, Keng-Yen Chao, Ming-Chang Lee and Jia-Chun Lin
Algorithms 2012, 5(1), 113-147; https://doi.org/10.3390/a5010113 - 14 Mar 2012
Viewed by 7355
Abstract
In this paper, we integrate a grid system and a wireless network to present a convenient computational service system, called the Semi-Preemptive Computational Service system (SePCS for short), which provides users with a wireless access environment and through which a user can share his/her resources with others. In the SePCS, each node is dynamically given a score based on its CPU level, available memory size, current length of its waiting queue, CPU utilization and bandwidth. With these scores, resource nodes are classified into three levels, and user requests are likewise classified into three types according to their time constraints. Resources of higher levels are allocated to more tightly constrained requests so as to increase the total performance of the system. To achieve this, a resource broker with the Semi-Preemptive Algorithm (SPA) is also proposed. When the resource broker cannot find suitable resources for a higher-type request, it preempts a resource that is currently executing a lower-type request so that the higher-type request can be executed immediately. The SePCS can be applied to a Vehicular Ad Hoc Network (VANET), whose users can then exploit convenient mobile network services and wireless distributed computing. Experiments show that the performance of the system is higher than that of the tested schemes.
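
The scoring step can be pictured with a short sketch. The weights and level thresholds below are hypothetical, not the paper's values; the point is only that each node's score aggregates CPU level, free memory, queue length, CPU utilization and bandwidth, and that the score buckets nodes into three levels.

    # Hedged sketch of SePCS-style node ranking; the weights and level
    # thresholds here are made up for illustration, not the paper's values.
    def node_score(cpu_level, free_mem_mb, queue_len, cpu_util, bandwidth_mbps,
                   w=(0.3, 0.2, 0.2, 0.15, 0.15)):
        return (w[0] * cpu_level
                + w[1] * free_mem_mb / 1024.0
                + w[2] / (1 + queue_len)        # shorter queue -> higher score
                + w[3] * (1.0 - cpu_util)       # idle CPU -> higher score
                + w[4] * bandwidth_mbps / 100.0)

    def node_level(score, hi=1.0, mid=0.6):     # hypothetical thresholds
        return 1 if score >= hi else 2 if score >= mid else 3

    s = node_score(cpu_level=2, free_mem_mb=2048, queue_len=1,
                   cpu_util=0.4, bandwidth_mbps=54)
    print(node_level(s))   # higher-level resources serve tighter requests first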

Article
Successive Standardization of Rectangular Arrays
by Richard A. Olshen and Bala Rajaratnam
Algorithms 2012, 5(1), 98-112; https://doi.org/10.3390/a5010098 - 29 Feb 2012
Cited by 2 | Viewed by 5276
Abstract
In this note we illustrate, and develop further with mathematics and examples, the work on successive standardization (or normalization) studied earlier by the same authors in [1] and [2]. Thus, we deal with successive iterations applied to rectangular arrays of numbers, where to avoid technical difficulties an array has at least three rows and at least three columns. Without loss of generality, an iteration begins with operations on columns: first subtract the mean of each column, then divide by its standard deviation. The iteration continues with the same two operations done successively for rows. These four operations applied in sequence complete one iteration, and one then iterates again and again. In [1] it was argued that if arrays are made up of real numbers, then the set for which convergence of these successive iterations fails has Lebesgue measure 0. The limiting array has row and column means 0 and row and column standard deviations 1. A basic result on convergence given in [1] is true, though the argument in [1] is faulty. The result is stated in the form of a theorem here, and the argument for the theorem is correct. Moreover, many graphics given in [1] suggest that, except for a set of arrays of Lebesgue measure 0, convergence is very rapid, eventually exponentially fast in the number of iterations. Because we learned this set of rules from Bradley Efron, we call it “Efron’s algorithm”. The rapidity of convergence is illustrated by numerical examples.
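
The iteration is simple to state in code. The sketch below follows the description in the abstract (columns first, then rows, using the population standard deviation); the tolerance and the example array are arbitrary choices.

    # Sketch of "Efron's algorithm" as described in the abstract: one
    # iteration standardizes columns (subtract mean, divide by standard
    # deviation), then rows; iterate until the array stops changing.
    import numpy as np

    def successive_standardization(x, tol=1e-12, max_iter=1000):
        x = np.asarray(x, dtype=float)
        for _ in range(max_iter):
            prev = x.copy()
            x = (x - x.mean(axis=0)) / x.std(axis=0)                            # columns
            x = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)  # rows
            if np.max(np.abs(x - prev)) < tol:
                break
        return x

    rng = np.random.default_rng(0)
    y = successive_standardization(rng.normal(size=(4, 5)))  # >= 3 rows and columns
    # At convergence, row and column means are ~0 and standard deviations ~1.
    print(np.allclose(y.mean(axis=0), 0), np.allclose(y.std(axis=1), 1))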

Article
Visualization, Band Ordering and Compression of Hyperspectral Images
by Raffaele Pizzolante and Bruno Carpentieri
Algorithms 2012, 5(1), 76-97; https://doi.org/10.3390/a5010076 - 20 Feb 2012
Cited by 26 | Viewed by 8970
Abstract
Airborne and spaceborne hyperspectral images are used to recognize objects and to classify materials on the surface of the Earth. The state-of-the-art compressor for lossless compression of hyperspectral images is the Spectral oriented Least SQuares (SLSQ) compressor (see [1–7]). In this paper we discuss hyperspectral image compression: we show how to visualize each band of a hyperspectral image and how this visualization suggests that an appropriate band ordering can lead to improvements in the compression process. In particular, we consider two important distance measures for band ordering, Pearson’s correlation and the Bhattacharyya distance, and report on experimental results achieved by a Java-based implementation of SLSQ.
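
The band-ordering idea admits a compact sketch: compute a pairwise distance between bands, then chain the bands greedily so that each band follows a highly correlated predecessor. The nearest-neighbour ordering below, using 1 − |Pearson correlation| as the distance, only illustrates the idea; it is neither the paper's exact procedure nor the SLSQ compressor itself.

    # Hedged sketch: order hyperspectral bands so consecutive bands are
    # highly correlated, using 1 - |Pearson correlation| as the distance.
    import numpy as np

    def band_order(cube):
        """cube: array of shape (bands, rows, cols); returns a band ordering."""
        flat = cube.reshape(cube.shape[0], -1)
        corr = np.corrcoef(flat)                   # Pearson correlation matrix
        dist = 1.0 - np.abs(corr)
        n = dist.shape[0]
        order, used = [0], {0}
        while len(order) < n:                      # greedy nearest-neighbour chain
            last = order[-1]
            nxt = min((b for b in range(n) if b not in used),
                      key=lambda b: dist[last, b])
            order.append(nxt)
            used.add(nxt)
        return order

    cube = np.random.default_rng(1).normal(size=(8, 16, 16))
    print(band_order(cube))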

Article
A Note on Sequence Prediction over Large Alphabets
by Travis Gagie
Algorithms 2012, 5(1), 50-55; https://doi.org/10.3390/a5010050 - 17 Feb 2012
Cited by 1 | Viewed by 4614
Abstract
Building on results from data compression, we prove nearly tight bounds on how well sequences of length n can be predicted in terms of the size σ of the alphabet and the length k of the context considered when making predictions. We compare the performance achievable by an adaptive predictor with no advance knowledge of the sequence to the performance achievable by the optimal static predictor using a table listing the frequency of each (k + 1)-tuple in the sequence. We show that, if the elements of the sequence are chosen uniformly at random, then an adaptive predictor can compete in the expected case if k ≤ log_σ n − 3 − ε, for a constant ε > 0, but not if k ≥ log_σ n.
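
The adaptive predictor in this setting can be sketched as follows: after each length-k context, predict the symbol most frequently observed in that context so far, updating counts as the sequence is read. The code is an illustration of the setting, not of the paper's analysis.

    # Sketch of an order-k adaptive predictor: after each length-k context,
    # predict the symbol most frequently observed in that context so far.
    from collections import Counter, defaultdict

    def adaptive_predict(seq, k):
        counts = defaultdict(Counter)        # context -> symbol frequencies
        correct = 0
        for i in range(k, len(seq)):
            ctx = seq[i - k:i]
            if counts[ctx]:
                guess = counts[ctx].most_common(1)[0][0]
                correct += (guess == seq[i])
            counts[ctx][seq[i]] += 1         # update only after predicting
        return correct / max(1, len(seq) - k)

    print(adaptive_predict("abababababcbcbcbcb", k=1))

When σ is large and k ≥ log_σ n, most length-k contexts occur too rarely for these counts to be informative, which is the regime addressed by the paper's negative result.
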
Article
Standard and Specific Compression Techniques for DNA Microarray Images
by Miguel Hernández-Cabronero, Ian Blanes, Michael W. Marcellin and Joan Serra-Sagristà
Algorithms 2012, 5(1), 30-49; https://doi.org/10.3390/a5010030 - 14 Feb 2012
Cited by 3 | Viewed by 5556
Abstract
We review the state of the art in DNA microarray image compression and provide original comparisons between standard and microarray-specific compression techniques that validate and expand previous work. First, we describe the most relevant approaches published in the literature and classify them according to the stage of the typical image compression process where each approach makes its contribution, and then we summarize the compression results reported for these microarray-specific image compression schemes. In a set of experiments conducted for this paper, we obtain new results for several popular image coding techniques that include the most recent coding standards. The prediction-based schemes CALIC and JPEG-LS are the best-performing standard compressors, but they are improved upon by the best microarray-specific technique, Battiato’s CNN-based scheme.

Article
Compression-Based Tools for Navigation with an Image Database
by Antonella Di Lillo, Ajay Daptardar, Kevin Thomas, James A. Storer and Giovanni Motta
Algorithms 2012, 5(1), 1-17; https://doi.org/10.3390/a5010001 - 10 Jan 2012
Cited by 5 | Viewed by 7668
Abstract
We present tools that can be used within a larger system referred to as a passive assistant. The system receives information from a mobile device, as well as information from an image database such as Google Street View, and employs image processing to provide useful information about a local urban environment to a user who is visually impaired. The first stage acquires and computes accurate location information, the second stage performs texture and color analysis of a scene, and the third stage provides specific object recognition and navigation information. The second and third stages rely on compression-based tools (dimensionality reduction, vector quantization, and coding) that are enhanced by knowledge of the (approximate) location of objects.
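
Of the compression-based tools mentioned, vector quantization is the easiest to sketch in isolation. The fragment below trains a toy codebook with k-means and describes each feature vector by its nearest codeword index; it is a generic illustration of VQ-based matching, not the authors' pipeline.

    # Generic vector-quantization sketch: learn a codebook with k-means,
    # then describe each feature vector by its nearest codeword index.
    import numpy as np

    def train_codebook(vectors, k, iters=20, seed=0):
        rng = np.random.default_rng(seed)
        codebook = vectors[rng.choice(len(vectors), size=k, replace=False)]
        for _ in range(iters):
            d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
            assign = d.argmin(axis=1)
            for j in range(k):                 # move codewords to cluster means
                members = vectors[assign == j]
                if len(members):
                    codebook[j] = members.mean(axis=0)
        return codebook

    def quantize(vectors, codebook):
        d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
        return d.argmin(axis=1)                # codeword index per vector

    feats = np.random.default_rng(2).normal(size=(200, 8))
    cb = train_codebook(feats, k=16)
    print(quantize(feats[:5], cb))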

Article
A Catalog of Self-Affine Hierarchical Entropy Functions
by John Kieffer
Algorithms 2011, 4(4), 307-333; https://doi.org/10.3390/a4040307 - 1 Nov 2011
Viewed by 6729
Abstract
For fixed k ≥ 2 and a fixed data alphabet of cardinality m, the hierarchical type class of a data string of length n = k^j for some j ≥ 1 is formed by permuting the string in all possible ways under permutations arising from the isomorphisms of the unique finite rooted tree of depth j which has n leaves and k children for each non-leaf vertex. Suppose the data strings in a hierarchical type class are losslessly encoded via binary codewords of minimal length. A hierarchical entropy function is a function on the set of m-dimensional probability distributions which describes the asymptotic compression rate performance of this lossless encoding scheme as the data length n is allowed to grow without bound. We determine infinitely many hierarchical entropy functions which are each self-affine. For each such function, an explicit iterated function system is found such that the graph of the function is the attractor of the system.

Article
Lempel–Ziv Data Compression on Parallel and Distributed Systems
by Sergio De Agostino
Algorithms 2011, 4(3), 183-199; https://doi.org/10.3390/a4030183 - 14 Sep 2011
Cited by 9 | Viewed by 8630
Abstract
We present a survey of results concerning Lempel–Ziv data compression on parallel and distributed systems, starting from the theoretical approach to parallel time complexity and concluding with the practical goal of designing distributed algorithms with low communication cost. Storer’s extension for image compression is also discussed.
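
A recurring theme in the distributed setting surveyed here is to split the input into blocks and compress each block independently, trading a little compression ratio for encoding with no inter-processor communication. The sketch below illustrates that block-parallel pattern with Python's zlib (a DEFLATE/LZ77-family codec); it is a generic illustration, not a specific algorithm from the survey.

    # Block-parallel compression sketch: compress fixed-size blocks
    # independently (no communication between workers), then concatenate.
    import zlib
    from concurrent.futures import ProcessPoolExecutor

    BLOCK = 1 << 20    # 1 MiB blocks; block size trades ratio for parallelism

    def compress_blocks(data: bytes) -> list[bytes]:
        blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
        with ProcessPoolExecutor() as pool:
            return list(pool.map(zlib.compress, blocks))

    def decompress_blocks(blocks: list[bytes]) -> bytes:
        return b''.join(zlib.decompress(b) for b in blocks)

    if __name__ == '__main__':
        data = b'abracadabra ' * 500_000
        packed = compress_blocks(data)
        assert decompress_blocks(packed) == data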
