**1. Introduction**

Transforms are a classical technique in signal processing tasks such as compression, classification, and recognition [1–5]. Traditional transforms, based on analytic orthogonal bases such as the DCT, DFT, and wavelets [1,6], suffer from two shortcomings: they do not adapt to the data, and they reconstruct every image by approximation in the same subspace, spanned by a non-redundant basis of the transform, which limits the compact representation of natural signals.

Various models for sparse approximation have appeared in recent decades and play a fundamental role in modeling natural signals, with applications in denoising [7–10], super-resolution [11–13], and compression [1]. Such techniques exploit the sparsity of natural signals in analytic transform domains such as the DCT and DFT, and in various learned dictionaries [14–16].
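The sparsity of smooth natural signals in an analytic transform domain is easy to observe numerically. The sketch below (a minimal illustration, not from the paper) applies an orthonormal DCT to a smooth 1-D signal, keeps only its few largest coefficients, and reconstructs the signal almost exactly:

```python
import numpy as np
from scipy.fft import dct, idct

# A smooth (natural-like) 1-D signal: most of its DCT energy
# concentrates in a few low-frequency coefficients.
n = 64
t = np.linspace(0, 1, n)
x = np.cos(2 * np.pi * t) + 0.5 * np.cos(6 * np.pi * t)

c = dct(x, norm='ortho')             # forward DCT (orthonormal)
k = 8                                # keep only the k largest coefficients
small = np.argsort(np.abs(c))[:-k]   # indices of the n - k smallest ones
c_sparse = c.copy()
c_sparse[small] = 0.0

x_hat = idct(c_sparse, norm='ortho') # reconstruct from the sparse code
rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
```

With only 8 of 64 coefficients retained, the relative reconstruction error stays small, which is exactly the property that sparse models exploit.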

There are two typical models for sparse representation: the synthesis [10,14,15] and analysis [16–19] models. So far, most sparse models rely on the concept of synthesis, which represents the underlying signal as a sparse combination of atoms from a given dictionary. Specifically, $\mathbf{x} = \mathbf{D}\alpha$, where $\mathbf{x} \in \mathbb{R}^N$ is the original signal, $\mathbf{D} \in \mathbb{R}^{N \times M}$ is the given dictionary whose columns are the atoms, and $\alpha \in \mathbb{R}^M$ is the sparse coefficient vector, whose sparsity is usually measured by the $\ell_0$-norm $\|\cdot\|_0$. A learned analysis sparse model was proposed by Elad [14,19], formulated as $\|\mathbf{\Omega}\mathbf{x}\|_0 = r$ with notation similar to that of the synthesis model. Instead of reconstructing the signal using a few atoms of a dictionary (as in the synthesis model), the analysis model decomposes a signal in a sparse fashion, based on the assumption that the signal lies in a sparse subset of the dictionary.
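The two models can be contrasted in a few lines of numpy. In the synthesis model the signal is built from a few dictionary atoms; in the analysis model many rows of the operator annihilate the signal, so the analysis coefficients contain many zeros. The following sketch (illustrative only; the matrices and sizes are arbitrary choices, not from the paper) constructs one example of each:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 16, 32

# Synthesis model: x = D @ alpha with a sparse coefficient vector
# (here 3 nonzero entries out of M).
D = rng.standard_normal((N, M))
alpha = np.zeros(M)
support = rng.choice(M, size=3, replace=False)
alpha[support] = rng.standard_normal(3)
x = D @ alpha                               # signal built from 3 atoms

# Analysis model: Omega @ x is sparse, i.e. many rows of Omega are
# orthogonal to x.  For illustration we force the first 20 rows to
# annihilate x, so the analysis coefficients have 20 (near-)zeros.
Omega = rng.standard_normal((M, N))
u = x / np.linalg.norm(x)
Omega[:20] -= np.outer(Omega[:20] @ u, u)   # project rows 0..19 onto x's orthogonal complement
num_zeros = int(np.sum(np.abs(Omega @ x) < 1e-8))
```

The synthesis sparsity lives in the coefficient vector `alpha`, while the analysis sparsity lives in the product `Omega @ x`; this difference is what makes the analysis model behave like a forward transform.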

An analysis model can be straightforwardly regarded as a forward transform if its corresponding backward transform $\mathbf{\Omega}^{*}$ is available. Recent research on transforms [2,4,5,20,21] has demonstrated the advantages of applying sparsity constraints in transform learning. Motivated by this idea, many studies have been devoted to image denoising [5,20], classification [3,4], and other signal processing tasks [21]. Learning-based transforms with sparsity constraints measure the transform error, called the sparsification error, in the analysis or frequency domain rather than in the temporal domain. Given training data $\mathbf{X} \in \mathbb{R}^{N \times L}$ with signal vectors $\mathbf{x}_i \in \mathbb{R}^N$, $i = 1, \dots, L$, as its columns, the problem of training a square sparsifying transform $\mathbf{W} \in \mathbb{R}^{N \times N}$ [21] is formulated as

$$\min_{\mathbf{W}, \mathbf{Y}} \|\mathbf{W}\mathbf{X} - \mathbf{Y}\|_F^2 + \mu \|\mathbf{W}\|_F^2 - \lambda \log \det(\mathbf{W}) \tag{1}$$

$$\text{s.t. } \|\mathbf{y}_i\|_0 \le s, \quad i = 1, \dots, L$$

where $\mathbf{y}_i$, $i = 1, 2, \dots, L$, are the columns of $\mathbf{Y}$, each satisfying the sparsity constraint, and $\mu\|\mathbf{W}\|_F^2 - \lambda \log \det(\mathbf{W})$ is a regularizer that keeps $\mathbf{W}$ non-singular.
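Problem (1) is typically attacked by alternating minimization: with $\mathbf{W}$ fixed, the optimal $\mathbf{Y}$ simply keeps the largest-magnitude entries of each column of $\mathbf{W}\mathbf{X}$; with $\mathbf{Y}$ fixed, $\mathbf{W}$ is updated against the regularized objective. The sketch below is a simplified illustration of this alternation (using a plain gradient step for the transform update, rather than the closed-form update of [21]); all parameter values are arbitrary choices:

```python
import numpy as np

def sparse_code(WX, s):
    """Keep the s largest-magnitude entries in each column of WX."""
    Y = np.zeros_like(WX)
    idx = np.argsort(-np.abs(WX), axis=0)[:s]
    np.put_along_axis(Y, idx, np.take_along_axis(WX, idx, axis=0), axis=0)
    return Y

def transform_step(W, X, Y, mu, lam, lr=1e-4):
    """One gradient step on the objective of Eq. (1) with respect to W."""
    grad = 2 * (W @ X - Y) @ X.T + 2 * mu * W - lam * np.linalg.inv(W).T
    return W - lr * grad

rng = np.random.default_rng(0)
N, L, s = 8, 100, 3
X = rng.standard_normal((N, L))
W = np.eye(N)                        # start from the identity transform
for _ in range(50):                  # alternate the two phases
    Y = sparse_code(W @ X, s)
    W = transform_step(W, X, Y, mu=0.1, lam=0.1)
```

The $-\lambda \log \det(\mathbf{W})$ term acts as a barrier: its gradient $-\lambda \mathbf{W}^{-\mathsf{T}}$ grows as $\mathbf{W}$ approaches singularity, which is how the regularizer keeps the learned transform invertible.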

As we can see, learning-based models effectively reveal the relationship between the transform and the data. However, a square transform, which consists of a non-redundant basis, cannot express complicated images. In 2014, an overcomplete transform learning model called OCTOBOS [20] was proposed, which consists of a series of square transforms representing different features of natural images. However, the number of transforms must be pre-defined, which limits its flexibility in applications.

In recent years, frames, as overcomplete systems, have been applied in image processing tasks such as denoising [22,23], image compression [24], and high-resolution image reconstruction [25]. A frame can be regarded as an extension of an orthogonal basis, as a frame $\mathbf{\Phi} \in \mathbb{R}^{N \times M}$ ($N < M$) also spans an $N$-dimensional space. Compared to a general frame, a tight frame (e.g., wavelet tight frames [26], ridgelets [27], curvelets [28], shearlets [29], and others) finds wider use, as its lower and upper frame bounds are equal. A tight frame inherits the good characteristics of an orthogonal basis in signal processing, as its rows are orthogonal [30]. In a sparse representation, a redundant frame serves as an overcomplete dictionary to represent the signal [23]. With the development of data-driven approaches, learning-based tight frames have recently been studied [31–33]. In [31], redundant tight frames were used in compressed sensing. In [32], tight frames were applied to few-view image reconstruction. In [33], a data-driven method was presented in which the dictionary atoms associated with a tight frame are generated by filters. In general, these studies model the frame learning problem in the dictionary learning form with tight frame constraints. They focus on tight frames because the singular values of a tight frame are equal, which leads to simple optimization. A tight frame is a Parseval frame if its frame bounds are equal to 1. In fact, a Parseval frame is a redundant extension of the concept of a standard orthonormal basis. Due to its superior performance in linear signal representation, it is well suited to sparse signal representation and optimization.
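The defining property of a Parseval frame $\mathbf{\Phi} \in \mathbb{R}^{N \times M}$ is $\mathbf{\Phi}\mathbf{\Phi}^{\mathsf{T}} = \mathbf{I}_N$ (all singular values equal to 1), so analysis followed by synthesis with the same frame reconstructs the signal exactly and the coefficient energy equals the signal energy. The sketch below (an illustration with arbitrary sizes, not the paper's construction) builds a random Parseval frame by setting all singular values of a random matrix to 1:

```python
import numpy as np

N, M = 4, 7
rng = np.random.default_rng(1)

# Build a Parseval frame from a random matrix: replace its singular
# values by 1, so the frame bounds both equal 1.
A = rng.standard_normal((N, M))
U, _, Vt = np.linalg.svd(A, full_matrices=False)
Phi = U @ Vt                       # N x M, Phi @ Phi.T == I_N

x = rng.standard_normal(N)
c = Phi.T @ x                      # analysis: M redundant coefficients
x_rec = Phi @ c                    # synthesis with the same frame
```

Because $\mathbf{\Phi}\mathbf{\Phi}^{\mathsf{T}} = \mathbf{I}_N$, we get `x_rec == x` (perfect reconstruction) and the Parseval identity $\|\mathbf{c}\|_2 = \|\mathbf{x}\|_2$, the two properties that carry over from orthonormal bases to the redundant setting.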

In this paper, we propose a data-driven redundant transform model based on Parseval frames (DRTPF for short), and present a model for learning DRTPF as well as a corresponding algorithm for solving it. The algorithm alternates between a sparse coding phase and a transform learning phase. The sparse coding phase updates the sparse coefficients and a threshold value using conventional Batch Orthogonal Matching Pursuit (BtOMP) and pointwise thresholding. The transform learning phase updates the frame using gradient descent and a relaxation-or-contraction mapping of its singular values, and updates the dual frame atom-wise using least squares. The advantages of the proposed DRTPF model (and of the algorithm) are demonstrated on natural image denoising. To summarize, this paper makes the following contributions:
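One ingredient of the transform learning phase, a singular-value mapping that pulls the frame toward the Parseval condition, can be pictured as follows. This is a hypothetical sketch of such a mapping, not the paper's exact operator: it interpolates each singular value toward 1, with full replacement recovering the exact projection onto Parseval frames.

```python
import numpy as np

def sv_map(Phi, t):
    """Illustrative relaxation/contraction of singular values toward 1.

    t = 1 replaces every singular value by 1 (the exact projection onto
    Parseval frames); 0 < t < 1 only contracts them part of the way.
    This is a guessed stand-in for the singular-value mapping described
    above, for intuition only.
    """
    U, svals, Vt = np.linalg.svd(Phi, full_matrices=False)
    return U @ np.diag((1 - t) * svals + t) @ Vt

rng = np.random.default_rng(2)
Phi = rng.standard_normal((4, 7))
P = sv_map(Phi, t=1.0)             # full contraction: P @ P.T == I_4
```

Partial contraction (`t < 1`) keeps the iterate close to the current frame while steering it toward the constraint set, which is the usual motivation for relaxed projections in alternating schemes.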


The rest of this paper is organized as follows. Section 2 reviews related work on frames. Section 3 proposes the DRTPF framework, including the form of DRTPF (Section 3.1) and the learning model and corresponding algorithm (Section 3.2). In Section 4, we demonstrate the effectiveness of our DRTPF model by analyzing the convergence of the algorithm and presenting experimental results on robustness analysis and image denoising, as well as comparing DRTPF with traditional transforms and sparse models.
