Article

A Characterization of the Domain of Beta-Divergence and Its Connection to Bregman Variational Model

School of Liberal Arts, Korea University of Technology and Education, Cheonan 31253, Korea
Entropy 2017, 19(9), 482; https://doi.org/10.3390/e19090482
Submission received: 20 July 2017 / Revised: 4 September 2017 / Accepted: 7 September 2017 / Published: 9 September 2017
(This article belongs to the Special Issue Entropy in Signal Analysis)

Abstract

In image and signal processing, the beta-divergence is well known as a similarity measure between two positive objects. However, it is unclear whether or not the distance-like structure of the beta-divergence is preserved if we extend its domain to the negative region. In this article, we study the domain of the beta-divergence and its connection to the Bregman-divergence associated with a convex function of Legendre type. In fact, we show that the domain of the beta-divergence (and the corresponding Bregman-divergence) includes the negative region under a mild condition on the beta value. Additionally, through the relation between the beta-divergence and the Bregman-divergence, we can reformulate various variational models appearing in image processing problems into a unified framework, namely the Bregman variational model. This model has a strong advantage over the beta-divergence-based model due to the dual structure of the Bregman-divergence. As an example, we demonstrate how to build a convex reformulated variational model with a negative domain for the classic nonconvex problem that usually appears in synthetic aperture radar image processing.

1. Introduction

In general, the domain of a divergence [1,2] is confined not by the positiveness of the variables but by the positiveness of the divergence itself (i.e., D(b|u) ≥ 0). Therefore, the domain of a divergence could be defined to include the negative region while keeping the positiveness of the divergence. To the best of our knowledge, it is unclear when the domain of the β-divergence (and the corresponding Bregman-divergence) includes the negative region. In this article, we systematically explore the domains of the β-divergence [2] and the corresponding Bregman-divergence associated with the convex function of Legendre type [3].
The β-divergence [2,4,5,6,7] is a general framework of similarity measures induced from various statistical models, such as the Poisson, Gamma, Gaussian, inverse Gaussian, compound Poisson, and Tweedie distributions. For the connection between the β-divergence and the various statistical distributions, see [8]. Among the diverse statistical distributions, the Tweedie distribution has a unique feature, i.e., the unit deviance of the Tweedie distribution [8] corresponds to the β-divergence with β ∈ R ∖ (1, 2). It is interesting that (1, 2) is a vital range of β when defining a convex right Bregman proximity operator [9,10]. We will address this issue in more detail in Section 4. In addition, the β-divergence is also used as a distance-like measure in diverse areas, for instance, synthetic aperture radar (SAR) image processing [11,12], audio spectrogram comparison [6,13], and brain EEG signal processing [7].
We note that the authors in [7] show the usefulness of the β-divergence with β > 1 as a robust similarity measure against outliers between two probability distributions. Here, outliers (rare events) are those that have extremely low probability and thus lie near zero probability. However, the (generalized) Kullback–Leibler-divergence (i.e., the β-divergence D_{β=1}(b|u)), which is a commonly used similarity measure for probability distributions, is undefined at zero (u = 0). See Figure 1a. Therefore, it is not easy to obtain robustness against outliers through the (generalized) Kullback–Leibler-divergence. On the contrary, the β-divergence with β > 1 (i.e., D_{β>1}(b|u)) is well defined at zero (u = 0) and thus it is more robust to outliers than the Kullback–Leibler-divergence. For more details, see [4,5,7]. We also note that if the variables of the β-divergence are not probability distributions (i.e., unnormalized), then outliers correspond to variables that have extremely large values (≫1) [14]. To detect this kind of outlier under the Gamma distribution assumption, the β-divergence with β ∈ [−1, 0] is used as a distance-like measure in [11]. See also Figure 1c.
In the case of SAR image data processing, speckle noise is modeled with the Gamma distribution and thus the negative log-likelihood function, which appears in the speckle reduction problem, corresponds to the β-divergence with β = 0, i.e., the Itakura–Saito-divergence. Actually, this model is highly nonconvex [15]. Therefore, various transforms have been introduced to relax the nonconvexity of the Gamma-distribution-related speckle reduction model [16,17,18,19,20,21]. Recently, we have shown that the β-divergence with β ∈ (0, 1) can be used as a transform-less convex relaxation model for the SAR speckle reduction problem [12]. Generally, the data captured via a SAR system has an extremely high dynamic range [22,23]. Under this harsh environment, the β-divergence with β ∈ (−1, 0) is successfully used as a similarity measure for the separation of strong scatterers in SAR data [11]. In addition, the β-divergence is also used for the decomposition of magnitude data of audio spectrograms [6]. In these applications, the domains of the data are generally assumed to be positive. However, the domain of the β-divergence can be extended to the negative region. In fact, if β = 2, then the β-divergence is exactly the squared ℓ₂-distance, the domain of which naturally includes a negative region. Surprisingly, in this article, we show that, under a mild condition on β, there are infinitely many β-divergences that have a negative domain.
It is known that the β-divergence can be reformulated with the Bregman-divergence [2,6]. However, if we restrict the base function of the Bregman-divergence to be a convex function of Legendre type, then some part of the β-divergence cannot be expressed through the Bregman-divergence (see Table 1). Although the Bregman-divergence associated with a convex function of Legendre type does not exactly match the β-divergence, due to the fruitful mathematical structure of the convex function of Legendre type, the associated Bregman-divergence has many useful properties. For instance, the dual formulation of the Bregman-divergence associated with the convex function of Legendre type can be used as a convex reformulation of some nonconvex problems under a certain condition on its domain [24]. In this article, we demonstrate that, by using the dual Bregman-divergence with the negative convex domain, we can build a convex reformulated Bregman variational model for the classic nonconvex problem that appears in the SAR image noise reduction problem [15]. We also show that we can unify the various variational models appearing in image processing problems as the Bregman variational model having sparsity constraints, e.g., total variation [25,26] (we call it Bregman-TV). Actually, the Bregman variational model corresponds to the right Bregman proximity operator [9,10]. See also [9,10,24,27,28,29,30] for theoretical analysis of the Bregman-divergence and related interesting properties.

1.1. Background

In this section, we review typical examples of the β-divergence, i.e., the Itakura–Saito-divergence, the generalized Kullback–Leibler-divergence (I-divergence), and the ℓ₂-distance. In addition, we introduce the Bregman-divergence and the corresponding Bregman variational model with sparsity constraints.
Let us start with the β-divergence D_β : Ω_L × Ω_R → R_+ given by
D_β(b|u) = ⟨ ∫_u^b (b − x) x^{β−2} dx, 1 ⟩,    (1)
where Ω_L × Ω_R = { (b, u) ∈ R^n × R^n | 0 ≤ D_β(b|u) < +∞ } is the domain of the β-divergence. Actually, the domain Ω_L × Ω_R corresponds to the effective domain in optimization [3,31]. We call Ω_L and Ω_R the left and right domains of the β-divergence. In addition, we assume that the left and right domains, Ω_L and Ω_R, are each convex sets. That is, if a, b ∈ Ω_L (or Ω_R), then the line segment between the two points is also contained in Ω_L (or Ω_R). Note that ⟨a, d⟩ = Σ_{i=1}^n a_i d_i for a = (a_1, …, a_n) ∈ R^n and d = (d_1, …, d_n) ∈ R^n, 1 is the all-one vector in R^n, R_+ = { x ∈ R | x ≥ 0 }, R_{++} = { x ∈ R | x > 0 }, R_− = { x ∈ R | x ≤ 0 }, and R_{−−} = { x ∈ R | x < 0 }. In addition, integration, multiplication, and division are performed component-wise. Based on the selection of β, we can recover the famous representatives of the β-divergence, i.e., the Itakura–Saito-divergence [4,5,13], the I-divergence (or generalized Kullback–Leibler-divergence) [20,32], and the ℓ₂-distance [25,26]. These three divergences are important examples of the β-divergence, since they show three different types of domains of the β-divergence. We summarize them in the following.
  • Itakura–Saito-divergence (β = 0): Ω_L = R_{++}^n and Ω_R = R_{++}^n:
    D_β(b|u) = ⟨ b/u − ln(b/u) − 1, 1 ⟩.
    Usually, the left and right domains of the Itakura–Saito-divergence, i.e., Ω_L and Ω_R, are defined as positive with Ω_L = Ω_R [12,13]. However, due to its scale invariance property, the variables b and u can be negative at the same time, even within the logarithmic function, i.e., Ω_L = Ω_R = R_{−−}^n. Based on this keen observation, in this article, we develop a new methodology that systematically detects a domain having the negative region. The Itakura–Saito-divergence is a typical example that can be expressed by the β-divergence and the Bregman-divergence at the same time. However, it has the negative domain in the β-divergence framework, but not in the Bregman-divergence framework (see Table 1).
  • Generalized Kullback–Leibler-divergence (I-divergence) (β = 1): Ω_L = R_+^n and Ω_R = R_{++}^n:
    D_β(b|u) = ⟨ b ln(b/u) − (b − u), 1 ⟩,
    where we naturally assume that 0 ln 0 = 0. Interestingly, it has different left and right domains, i.e., Ω_L ≠ Ω_R. Due to the asymmetric structure of the domain of the I-divergence, we need to carefully handle the β-divergence at the boundary of each domain. We categorize the class of β-divergences that have this asymmetric domain structure in Section 2.
  • ℓ₂-distance (β = 2): Ω_L = R^n and Ω_R = R^n:
    D_β(b|u) = (1/2) ‖b − u‖₂².
    This divergence is preferable to other divergences, since it has R^n as the domain of each variable. Unlike the previous two divergences, its domain naturally includes the negative region R_{−−}^n. Surprisingly, there are infinitely many β-divergences having R^n as their domain. We will show this in Section 2.
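As a quick numerical illustration of the three representative divergences listed above, the following Python sketch (my own illustration, not part of the original article; the helper beta_divergence is hypothetical) evaluates the closed forms for β = 0, 1, 2 and checks two of the domain observations made above: β = 2 reduces to the squared ℓ₂-distance and accepts negative arguments, while β = 0 accepts two negative arguments thanks to its scale invariance.

```python
# Illustrative sketch (not from the paper): closed forms of the beta-divergence
# for the three representative cases discussed above.
import numpy as np

def beta_divergence(b, u, beta):
    """Sum over components of the beta-divergence, i.e., the <., 1> in the text."""
    b, u = np.asarray(b, float), np.asarray(u, float)
    if beta == 0:        # Itakura-Saito-divergence
        return np.sum(b / u - np.log(b / u) - 1.0)
    if beta == 1:        # I-divergence (generalized Kullback-Leibler)
        return np.sum(b * np.log(b / u) - (b - u))
    # generic case beta != 0, 1
    return np.sum(b * (b**(beta - 1) - u**(beta - 1)) / (beta - 1)
                  - (b**beta - u**beta) / beta)

b = np.array([1.0, 2.0, 3.0])
u = np.array([1.5, 2.0, 2.5])
for beta in (0, 1, 2):
    print(beta, beta_divergence(b, u, beta))

# beta = 2 is the squared l2-distance and is defined for negative entries as well,
print(beta_divergence(-b, -u, 2), 0.5 * np.sum((b - u) ** 2))
# while beta = 0 gives the same value for (-b, -u) as for (b, u): scale invariance.
print(beta_divergence(-b, -u, 0), beta_divergence(b, u, 0))
```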
Additionally, we introduce the Bregman-divergence associated with the convex function of Legendre type [3]. The Bregman-divergence D_Φ : Ω × int(Ω) → R_+ is formulated as
D_Φ(b|u) = Φ(b) − Φ(u) − ⟨ b − u, ∇Φ(u) ⟩,    (2)
where the base function Φ is a convex function of Legendre type [3], Ω = dom(Φ) = { x ∈ R^n | Φ(x) ∈ R }, and int(Ω) is the interior of Ω. In fact, it is the relative interior of Ω, i.e., ri(Ω). Note that ri(Ω) is the interior of Ω relative to its affine hull, which is the smallest affine set including Ω. Therefore, the relative interior ri(Ω) coincides with the interior int(Ω) when the affine hull of Ω is R^n. For more details, see Chapter 2.H in [31]. In this article, since the β-divergence (1) is separable in terms of dimension, the affine hull of Ω is always R^n and thus we simply use int(Ω) instead of ri(Ω). Note that the typical examples of the β-divergence above (the Itakura–Saito-divergence, the I-divergence, and the ℓ₂-distance) can be reformulated with the Bregman-divergence (2) by using a convex function of Legendre type Φ and the associated domain Ω:
  • Itakura–Saito-divergence: Φ(x) = ⟨ −ln x, 1 ⟩ with Ω = R_{++}^n,
  • I-divergence: Φ(x) = ⟨ x ln x, 1 ⟩ with Ω = R_+^n,
  • ℓ₂-distance: Φ(x) = ⟨ (1/2) x², 1 ⟩ with Ω = R^n.
The domain of the second variable of the Bregman-divergence (2) is always the open set int(dom Φ). However, the right domain Ω_R of the second variable of the β-divergence (1) could be a closed set. In the coming section, we thoroughly analyze the relation between the Bregman-divergence and the β-divergence with regard to the domain. Based on the Bregman-divergence (2), we introduce the Bregman variational model that unifies various minimization problems appearing in image processing:
min_u { D_Φ(b|u) + λ R(u) | u ∈ int(Ω) },    (3)
where b is the observed data and R(u) is a sparsity-enforcing regularization term, such as total variation [26]. In image processing, (3) corresponds to the denoising problem under various noise distributions: Poisson, speckle, Gaussian noise, etc. In optimization, however, it is known as the (nonconvex) right Bregman proximity operator under mild conditions. See [9,10,24,30] for more details on the Bregman operator.

1.2. Overview

The article is organized as follows. In Section 2, we analyze the structure of the domain of the β -divergence. In Section 3, we study various mathematical structures of the β -divergence through the Bregman-divergence associated with the convex function of Legendre type. In Section 4, we introduce the Bregman variational model and its dual formulation for convex reformulation of the classic nonconvex problem that appears in the SAR speckle reduction problem. In addition, we introduce the right and left Bregman proximal operator. We give our conclusions in Section 5.

2. A Characterization of the Domain Ω L × Ω R of the β -Divergence

In this section, we analyze the structure of the β -divergence and the associated domain Ω L × Ω R based on the so-called extended logarithmic function.
Let us start with a definition of the extended logarithmic function, which is essential in characterizing the domain of the β-divergence. We note that it corresponds to an equivalence class of Tsallis's generalized logarithmic function [1,33] with an extension to the negative domain.
Definition 1.
Let α ∈ R, u = (u_1, …, u_n) ∈ dom(ln_α), and
ln_{α,c}(u) = ( ln_{α,c}(u_1), ln_{α,c}(u_2), …, ln_{α,c}(u_n) ),
where ln_{α,c}(u_i) = ∫_c^{u_i} x^{−α} dx and c ∈ R_{c,u} = { c ∈ R | ln_{α,c}(u) ∈ R^n and c u_i > 0, i = 1, …, n }. Then, the extended logarithmic function is defined as the equivalence class
[ln_α(u)]_c = { x ∈ R^n | x = ln_{α,c}(u), c ∈ R_{c,u} }.
For simplicity, we leave out all constants after integration and then we obtain
ln_α(u) = ln u if α = 1,  and  ln_α(u) = (1/(1 − α)) u^{1−α} otherwise,    (4)
where dom(ln_α) = { x ∈ R^n | ln_α(x) ∈ R^n }. We call (4) the extended logarithmic function instead of the equivalence class [ln_α(u)]_c, unless otherwise specified.
Note that the domain and range of ln_α(u) in (4) are given in Table 2. In addition, we illustrate the structure of the extended logarithmic function in Figure 2. As noted in Definition 1, the extended logarithmic function is defined as an equivalence class [ln_α(u)]_c with respect to c. If we set c = 1, then we recover Tsallis' generalized logarithmic function [1,33] on its positive domain dom(ln_α) ∩ R_+. See Figure 2a. However, we cannot use the generalized logarithm (i.e., ln_{α,c=1}(u)) on the negative domain. In fact, if α > 1 and u < 0, then Tsallis' generalized logarithmic function is undefined, e.g., ln_{4,c=1}(−1) = ∫_1^{−1} x^{−4} dx ∉ R. On the other hand, the proposed extended logarithmic function (4) is well defined on the negative region, since we can choose an appropriate c having the same sign as u even if α > 1, e.g., ln_{4,c=−2}(−1) = ∫_{−2}^{−1} x^{−4} dx ∈ R. See Figure 2d and Table 2. Indeed, the extended logarithmic function is useful when we simplify the complicated structure of the β-divergence. As described in the following Definition 2, the β-divergence is defined based on the difference of two extended logarithmic functions. In other words, the β-divergence is invariant with respect to a constant function in the extended logarithmic function (4). It is interesting that the Bregman-divergence (2) also has a similar invariance property with respect to an affine function in the base function Φ (see Proposition 1).
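The role of the constant c in Definition 1 can be seen in a few lines of Python; this is my own sketch (the helper ln_alpha_c is hypothetical and assumes α ≠ 1), mirroring the two numerical examples above.

```python
# Illustrative sketch (not from the paper) of ln_{alpha,c}(u) = int_c^u x^(-alpha) dx.
def ln_alpha_c(u, alpha, c):
    """Extended logarithm for alpha != 1, valid only when c and u share the same sign."""
    if c * u <= 0:
        raise ValueError("c must have the same sign as u: the path would cross the pole at 0")
    F = lambda t: t ** (1 - alpha) / (1 - alpha)   # antiderivative of t**(-alpha)
    return F(u) - F(c)

print(ln_alpha_c(2.0, 4, 1.0))     # Tsallis' choice c = 1 works on the positive domain
# ln_alpha_c(-1.0, 4, 1.0) would raise: the path from 1 to -1 crosses the pole at 0,
# so ln_{4,c=1}(-1) is not a real number, as in the example above.
print(ln_alpha_c(-1.0, 4, -2.0))   # choosing c = -2 (same sign as u = -1) is fine: 7/24
# All admissible c give the same function up to an additive constant, which is why the
# beta-divergence, built from differences of extended logarithms, does not depend on c.
print(ln_alpha_c(-1.0, 4, -3.0) - ln_alpha_c(-2.0, 4, -3.0))   # = ln_{4,c=-2}(-1)
```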
Definition 2.
Let (b, u) ∈ Ω_L × Ω_R = { (b, u) ∈ R^n × R^n | D_β(b|u) ∈ R_+ } and β ∈ R. Then, the β-divergence is defined by
D_β(b|u) = ⟨ b [ln_{2−β}(b) − ln_{2−β}(u)] − [ln_{1−β}(b) − ln_{1−β}(u)], 1 ⟩ = ⟨ b ∫_u^b x^{β−2} dx − ∫_u^b x^{β−1} dx, 1 ⟩.    (5)
After integration, we get the well-known formula of the β-divergence:
D_β(b|u) = ⟨ b/u − ln(b/u) − 1, 1 ⟩ if β = 0,  D_β(b|u) = ⟨ b ln(b/u) − (b − u), 1 ⟩ if β = 1,  and  D_β(b|u) = ⟨ b (b^{β−1} − u^{β−1})/(β−1) − (b^β − u^β)/β, 1 ⟩ if β ≠ 0, 1.    (6)
Although the β -divergence has a unified formula (5) via the extended logarithm (4), unfortunately, the determination of the domain Ω L × Ω R of the β -divergence heavily depends on β . Before we go any further, let us introduce the most important equivalence classes in this article. It will simplify complicated notations appearing in the β -divergence and the Bregman-divergence.
R_e = { 2k/(2l+1) | k, l ∈ Z },  R_o = { (2k+1)/(2l+1) | k, l ∈ Z },  R_x = R ∖ (R_o ∪ R_e).    (7)
Note that R_e and R_o are subsets of the rational numbers and satisfy R_e ∩ R_o = ∅, while R_x is composed of all irrational numbers together with the rational numbers that are not in R_e ∪ R_o. For instance, R_e contains { 0, ±2/3, ±4/3, … }, R_o contains { ±1, ±1/3, ±5/3, … }, and R_x contains { ±1/4, ±1/2, ±√2, … }.
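The classification (7) is easy to automate for rational exponents; the following sketch (my own, with the hypothetical helper classify) reduces a fraction to lowest terms and reads off the parity of numerator and denominator, reproducing the examples above.

```python
# Illustrative sketch (not from the paper) of the classification (7).
from fractions import Fraction

def classify(alpha):
    """Return 'R_e', 'R_o', or 'R_x' for a rational exponent; non-Fractions go to R_x."""
    if isinstance(alpha, Fraction):
        p, q = alpha.numerator, alpha.denominator   # already in lowest terms
        if q % 2 == 1:                              # odd denominator
            return "R_e" if p % 2 == 0 else "R_o"
    return "R_x"                                    # even denominator (or irrational)

for a in [Fraction(0), Fraction(2, 3), Fraction(-4, 3),   # R_e
          Fraction(1), Fraction(1, 3), Fraction(5, 3),    # R_o
          Fraction(1, 4), Fraction(1, 2)]:                # R_x
    print(a, classify(a))
# By Lemma 1 below: x**(2/3) is real and positive for x < 0 (R_e), x**(1/3) is real and
# negative (R_o), and x**(1/2) is not real for x < 0 (R_x).
```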
Since the β-divergence (6) is developed based on the extended logarithmic function (i.e., power functions), we inherently have to quantify the domain of a power function p(x) = x^α and of its inverse function p^{−1}(y) = y^{1/α}. Actually, if x is positive, then the power function p(x) and the corresponding inverse function are well defined, irrespective of the choice of the exponent α ∈ R ∖ {0}. On the other hand, in the case of the negative domain, e.g., x < 0, the domain of the power function p(x) depends severely on the choice of the exponent α. With the newly introduced equivalence classes in (7), we can easily categorize the domain of a power function p(x) = x^α and its inverse function p^{−1}(y) = y^{1/α}, α ≠ 0. We summarize this in the following Lemma.
Lemma 1.
Let α ∈ R and let dom_−(p) = { x ∈ R_{−−} | p(x) ∈ R } = dom(p) ∩ R_{−−} be the negative domain of the power function p(x) = x^α. Then, p : dom(p) → range(p) has the negative domain
dom_−(p) = R_{−−} if α ∈ R_e ∪ R_o,  and  dom_−(p) = ∅ if α ∈ R_x,
and the corresponding range of p on dom_−(p) is
R_{−−} if α ∈ R_o,  and  R_{++} if α ∈ R_e.
In addition, if α ∈ R_o, then the inverse function of p is well defined and transparent on dom_−(p), i.e., p^{−1} ∘ p(x) = x for all x ∈ dom_−(p). However, if α ∈ R_e ∖ {0}, then p^{−1} ∘ p(x) ≠ x for all x ∈ dom_−(p).
Proof. 
For any x ∈ R_{−−}, let x = (−1)|x|; then the power function p is expressed as p(x) = (−1)^α |x|^α, α ∈ R. We note that the negative real domain of p(x) is well defined only if (−1)^α ∈ { −1, +1 }. To clarify the evaluation of (−1)^α, let us express it in polar form:
(−1)^α = e^{iα(2l+1)π},  l ∈ Z.
Then, we get
(−1)^α = 1 if α ∈ R_e,  (−1)^α = −1 if α ∈ R_o,  and  (−1)^α = δ if α ∈ R_x,
where δ ∈ C ∖ R. Regarding the inverse function p^{−1}(y) = y^{1/α}, we have (−1)^{1/α} ∈ C ∖ R for all α ∈ R_e ∖ {0}. However, if α ∈ R_o, then (−1)^{1/α} = −1. That is, we have p^{−1} ∘ p(x) = x for all x ∈ R_{−−}.  ☐
Now, with the equivalence classes (7) and Lemma 1, we classify domains of the β -divergence. The details are given in the following Theorem and Table 3. See also Figure 1 for the overall structure of the β -divergence on its domain.
Theorem 1.
Let us consider the domain of the β-divergence
Ω_L × Ω_R = { (b, u) ∈ R^n × R^n | D_β(b|u) ∈ R_+ },
where
D_β(b|u) = ⟨ ∫_u^b (b − x) G(x) dx, 1 ⟩
and G(x) = x^{β−2} = (x_1^{β−2}, x_2^{β−2}, …, x_n^{β−2}). In addition, let us assume that the minimum value of G(x) on its domain is nonnegative, i.e.,
0 ≤ M_G = min_x { G(x) | x_i ∈ [ min{u_i, b_i}, max{u_i, b_i} ], i = 1, …, n },    (9)
where (b, u) ∈ Ω_L × Ω_R. Then, the domain of the β-divergence Ω_L × Ω_R is classified as in Table 3. Note that Ω_R ⊆ Ω_L for all β ∈ R. In particular, if β ∈ (0, 1], then Ω_R ⊊ Ω_L.
Proof. 
Due to the assumption M_G ≥ 0, we can easily obtain the positiveness of the β-divergence by the following inequality:
D_β(b|u) = ⟨ ∫_u^b (b − x) x^{β−2} dx, 1 ⟩ = ⟨ ∫_u^b (b − x) G(x) dx, 1 ⟩ ≥ (1/2) ‖b − u‖₂² M_G ≥ 0.    (10)
Consequently, we only need to verify the following two conditions: (1) Is the domain of the β-divergence determined so as to satisfy (9)? (2) Is the β-divergence well defined on its domain?
  • Case 1: M_G ≥ 0 for all (b, u) ∈ Ω_L × Ω_R.
    If (b, u) ∈ R_{++}^n × R_{++}^n, then it is trivial to show M_G ≥ 0 and thus it is always true that R_{++}^n × R_{++}^n ⊆ Ω_L × Ω_R. Now, we will find β such that R_{−−}^n × R_{−−}^n ⊆ Ω_L × Ω_R is satisfied. From Lemma 1, if x_i < 0, then we get the following:
    G(x_i) = |x_i|^{β−2} if β ∈ R_e,  G(x_i) = −|x_i|^{β−2} if β ∈ R_o,  and  G(x_i) ∉ R if β ∈ R_x.
    Based on β ∈ R, we have two different cases regarding the domain Ω_L × Ω_R.
    - If β ∉ R_e, then, due to (9), the negative region cannot be included in the domain of the β-divergence. Therefore, we have
      Ω_L × Ω_R ⊆ R_+^n × R_+^n.
    - If β ∈ R_e, then, due to (9), the domain of the β-divergence can be defined to include the negative region. That is, we can have
      R_{−−}^n × R_{−−}^n ⊆ Ω_L × Ω_R.
  • Case 2: D_β(b|u) < +∞ for all (b, u) ∈ Ω_L × Ω_R.
    Basically, the β-divergence can be expressed as
    D_β(b|u) = ⟨ b ∫_u^b x^{β−2} dx − ∫_u^b x^{β−1} dx, 1 ⟩.
    That is, it is based on integrations of power functions of the real variables u and b in R^n. Therefore, we only need to check whether or not the integration in D_β(b|u) is well defined at {0}. We note that, after integration, the exponents of b ∈ Ω_L and u ∈ Ω_R are different and thus the corresponding domains Ω_L and Ω_R could be different as well. Hence, we should consider the following three different cases:
    - β > 1: We do not have any singularity at {0} with respect to b ∈ Ω_L or u ∈ Ω_R. Therefore, we have {0} ⊂ Ω_L = Ω_R.
    - 0 < β ≤ 1: After integration, b ∈ Ω_L does not have any singularity at {0}. However, u ∈ Ω_R has a singularity at {0}. Therefore, we have {0} ⊂ Ω_L but {0} ⊄ Ω_R, and thus Ω_L ≠ Ω_R in this region.
    - β ≤ 0: In this case, both b ∈ Ω_L and u ∈ Ω_R have a singularity at {0}. Thus, {0} ⊄ Ω_L = Ω_R.
Based upon the analysis in Cases 1 and 2, we have six different choices of the domain Ω_L × Ω_R for the β-divergence. This is summarized in Table 3 and illustrated in Figure 1. Since we only consider convex domains, Ω_L and Ω_R should each be selected as a convex set. In addition, due to the inherent integral formulation of the β-divergence between b and u, the domains of both variables should be determined to have the same sign. ☐
As observed in Table 3, if β ∈ R_e ∩ R_−, then there is a symmetry in the selection of the domain of the β-divergence. That is, Ω_L × Ω_R = R_{++}^n × R_{++}^n or R_{−−}^n × R_{−−}^n. In particular, if β = 0, then the β-divergence corresponds to the Itakura–Saito-divergence D_β(b|u) = ⟨ b/u − ln(b/u) − 1, 1 ⟩, whose domain can be R_{++}^n × R_{++}^n or R_{−−}^n × R_{−−}^n. The positive domain is generally preferable, since it is related to real applications, e.g., the intensity data type in the SAR system [11,12]. However, if we reformulate the β-divergence with the Bregman-divergence, then the negative domain commonly appears in the dual Bregman-divergence. In addition, we note that, due to Theorem 1, the β-divergence with the domain defined in Table 3 satisfies the following distance-like properties of a generic divergence [1,2]:
D_β(b|u) ∈ R_+,    (11)
D_β(b|u) = 0 if and only if b = u,    (12)
where (11) follows from the definition of the domain of the β-divergence and from (10). Note that (12) is satisfied if we restrict the domain of the β-divergence to Ω_R × Ω_R. In fact, let us assume that (b, u) ∈ Ω_R × Ω_R. It is trivial to show that b = u implies D_β(b|u) = 0. Therefore, we only need to show that D_β(b|u) = 0 implies b = u. Letting b ≠ u, we then get D_β(b|u) ≥ (1/2) ‖b − u‖₂² M_G > 0 from (10).
Since the β-divergence (5) with its domain defined in Table 3 has the distance-like properties (11) and (12), we can build a variational model with the β-divergence and a regularization term enforcing smoothness on the given data b. The following is an example of a variational model based on the β-divergence [12]:
u_β = arg min_{u ∈ B} F_β(u),  F_β(u) = D_β(b|u) + λ R(u),    (13)
where λ > 0 and B ⊆ Ω_R is the domain of F_β for a given b ∈ B. Note that B is an open convex set induced from the physical constraints of the observed data b. As for the prior R(u), it can be a sparsity-enforcing function such as total variation (TV), TV(u) = ‖∇u‖₁ [25,26], or a frame-based penalty [34]. We call (13) the β-sparse model, or β-TV [12] if TV is used as the prior. Under the domain restriction in Table 3, we actually have a lot of freedom in choosing β ∈ R in the β-sparse model (13). However, if we add additional constraints, such as convexity, onto the β-divergence, then, interestingly, the possible choice of β is dramatically reduced to a small set. For example, D_β(b|u) with respect to u is convex on its whole domain only if β ∈ [1, 2] [6]. Outside of this region, i.e., for β ∈ R ∖ [1, 2], its convexity depends on the given data b [12]. In Section 4.1, we analyze the convexity of the β-divergence via the right Bregman proximity operator [28].
Although the proposed β-sparse model F_β(u) in (13) is not convex in general, F_β(u) has an interesting global optimum property [11,12] in the case λ = 0. See also [35,36]. For completeness of the article, we state it below.
Theorem 2 ([11,36]).
For given observed data { b_1, …, b_{|N|} }, let B be an open convex set in Ω_R, β ∈ R, b_j ∈ B, and μ = (1/|N|) Σ_{j=1}^{|N|} b_j. Then, the inequality
Σ_{j=1}^{|N|} D_β(b_j | u) ≥ Σ_{j=1}^{|N|} D_β(b_j | μ)    (14)
is always satisfied, regardless of the choice of u ∈ B.
Note that μ in (14) corresponds to the β -centroid in segmentation problem and is related to the Bregman centroid, which is extensively studied in [37]. In SAR image processing, if β = 0 , then (14) corresponds to the multi-looking process, which is commonly used to reduce speckle noise in SAR data [12,22,23].
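A direct numerical check of Theorem 2 (my own sketch, not from the paper) is shown below for β = 0: over a grid of candidate values u, the sum of Itakura–Saito-divergences from a positive sample set is minimized, up to the grid resolution, at the arithmetic mean μ, i.e., at the β-centroid.

```python
# Numerical check (not from the paper) of Theorem 2 with beta = 0.
import numpy as np

def d_beta0(b, u):                        # Itakura-Saito-divergence, component-wise
    return b / u - np.log(b / u) - 1.0

rng = np.random.default_rng(0)
b = rng.uniform(0.5, 3.0, size=50)        # positive "observations" b_j
mu = b.mean()                             # the candidate beta-centroid

grid = np.linspace(0.2, 5.0, 2001)        # candidate values of u in B
loss = np.array([np.sum(d_beta0(b, u)) for u in grid])
print(mu, grid[loss.argmin()])            # the grid minimizer sits (about) at the mean
assert np.sum(d_beta0(b, mu)) <= loss.min() + 1e-9
```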

3. The Bregman-Divergence Associated with the Convex Function of Legendre Type for the β -Divergence

In this section, we study the Bregman-divergence associated with the convex function of Legendre type and its connection to the β-divergence. Although there is only a partial equivalence between the two divergences, the Bregman-divergence has an important mathematical dual formulation. With a negative domain, the dual Bregman-divergence is particularly useful for convex reformulation of the nonconvex β-sparse model (13) (see Section 4).
Before we proceed further, let us first review the convex function of Legendre type (see Section 26 in [3]). See also [27,29].
Definition 3.
Let Φ : Ω → R be a lower semicontinuous, convex, and proper function defined on Ω = dom Φ ⊆ R^n. Then, Φ is a convex function of Legendre type (or Legendre) if the following conditions are satisfied (i.e., Φ is essentially strictly convex and essentially smooth):
1. int(Ω) ≠ ∅.
2. Φ is differentiable on int(Ω).
3. ∀ x ∈ bd(Ω) and ∀ y ∈ int(Ω),
   lim_{t ↓ 0} ⟨ ∇Φ(x + t(y − x)), y − x ⟩ = −∞.
4. Φ is strictly convex on int(Ω).
Here, Ω = dom Φ = { x ∈ R^n | Φ(x) ∈ R } is a convex set and bd(Ω) is the boundary of the domain Ω.
The main advantage of the convex function of Legendre type is that the inverse of its gradient mapping coincides with the gradient of its conjugate function, as described below. This is a useful property when we characterize the dual structure of the Bregman-divergence associated with the convex function of Legendre type.
Theorem 3 ([3,27]).
Let Ω = dom Φ and Ω* = dom Φ* = { x ∈ R^n | Φ*(x) ∈ R }. Then, the function Φ is a convex function of Legendre type if and only if its conjugate
Φ*(x) = sup_{ξ ∈ Ω} { ⟨ x, ξ ⟩ − Φ(ξ) }
is a convex function of Legendre type. In this case, the gradient mapping
∇Φ : int(Ω) → int(Ω*)
is an isomorphism with inverse mapping (∇Φ)^{−1} = ∇Φ*.
For more details on Theorem 3, see Theorem 26.5 in [3] and Fact 2.9 in [27]. Let us assume that Φ is a convex function of Legendre type. Then, we can define the Bregman-divergence D_Φ : Ω × int(Ω) → R_+ associated with the Legendre function Φ:
D_Φ(b|u) = Φ(b) − Φ(u) − ⟨ b − u, ∇Φ(u) ⟩,    (16)
where b ∈ Ω and u ∈ int(Ω). Several functions we are interested in belong to the category of convex functions of Legendre type. For instance, the Shannon entropy function Φ(x) = ⟨ x ln x, 1 ⟩ is a typical example of a Legendre function. The Bregman-divergence associated with it corresponds to the β-divergence with β = 1, i.e., the generalized Kullback–Leibler-divergence. We note that there are convex functions of Legendre type that do not have a corresponding β-divergence. For instance, the Fermi–Dirac entropy function Φ(x) = ⟨ x ln x + (1 − x) ln(1 − x), 1 ⟩ is Legendre and the associated Bregman-divergence is the logistic loss function D_Φ(b|u) = ⟨ b ln(b/u) + (1 − b) ln((1 − b)/(1 − u)), 1 ⟩. See [27,36] for more details on the Bregman-divergence. In the following, we summarize various useful features of the Bregman-divergence associated with a Legendre function Φ.
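Before stating the properties of D_Φ, it is instructive to verify Theorem 3 numerically for the Shannon entropy example just mentioned; the sketch below (my own, not from the paper) checks that ∇Φ and ∇Φ* are mutually inverse, with Φ*(y) = exp(y − 1) as computed later in Theorem 7.

```python
# Sanity check (not from the paper) of Theorem 3 for Phi(x) = x*ln(x):
# grad Phi(x) = ln(x) + 1 maps (0, inf) onto R, and its inverse is
# grad Phi*(y) = exp(y - 1), the gradient of the conjugate Phi*(y) = exp(y - 1).
import numpy as np

grad_phi      = lambda x: np.log(x) + 1.0
grad_phi_star = lambda y: np.exp(y - 1.0)

x = np.linspace(0.1, 5.0, 7)
print(np.allclose(grad_phi_star(grad_phi(x)), x))   # True: (grad Phi)^{-1} = grad Phi*
```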
Theorem 4.
Let Ω = dom Φ and Ω* = dom Φ*. Then, the Bregman-divergence associated with the convex function of Legendre type Φ satisfies the following.
1. D_Φ(b|u) is strictly convex with respect to b on int(Ω).
2. For any u ∈ int(Ω), D_Φ(b|u) is coercive with respect to b, i.e., lim_{‖b‖→∞} D_Φ(b|u) = +∞.
3. For some b ∈ int(Ω), D_Φ(b|u) is coercive with respect to u if and only if Ω* = int(Ω*).
4. D_Φ(b|u) = 0 if and only if b = u, where b ∈ int(Ω).
5. For any b, u ∈ int(Ω),
   D_Φ(b|u) = D_{Φ*}(∇Φ(u), ∇Φ(b)),    (17)
   where Φ* is the conjugate function of Φ.
6. For any b, u ∈ int(Ω*),
   D_{Φ*}(b|u) = D_Φ(∇Φ*(u), ∇Φ*(b)).
Proof. 
Since Φ is a convex function of Legendre type, Φ is strictly convex on int(Ω). Hence, (1) is trivial. The proofs of (2)–(6) are given in Theorem 3.7, Theorem 3.9, and Corollary 3.11 in [27]. ☐
In the above Theorem, the dual formulation (17) is a unique feature of the Bregman-divergence with a Legendre base function. Unfortunately, the β-divergence does not have a corresponding dual concept. Later, we will show how to use the dual Bregman-divergence (17) to build a convex reformulated model of the nonconvex β-sparse model (13). In addition, we note that the β-divergence (5) is established based on the extended logarithmic function (4), which is an equivalence class in terms of a constant function. Therefore, we can say that the β-divergence is invariant with respect to a constant function in the extended logarithmic function. Interestingly, the base function Φ of the Bregman-divergence also has such an invariance property with respect to an affine function. For this, Φ does not need to be Legendre. However, for simplicity, we assume that Φ is Legendre. The details follow.
Proposition 1.
Let us define an equivalence class of the convex function of Legendre type Φ in terms of affine functions as follows:
[Φ]_A = { Φ(x) + ⟨ c, x ⟩ + ⟨ d, 1 ⟩ | c, d ∈ R^n },
where A = { ⟨ c, x ⟩ + ⟨ d, 1 ⟩ | c, d ∈ R^n }. Then, the Bregman-divergence D_{[Φ]_A} associated with the equivalence class [Φ]_A is equal to the Bregman-divergence D_Φ associated with Φ, irrespective of the choice of the affine function in A.
Proof. 
We have the following equivalence with respect to an arbitrary affine function in A:
D_{[Φ]_A}(b|u) = [Φ(b)]_A − [Φ(u)]_A − ⟨ b − u, ∇[Φ(u)]_A ⟩ = (Φ(b) + ⟨ c, b ⟩ + ⟨ d, 1 ⟩) − (Φ(u) + ⟨ c, u ⟩ + ⟨ d, 1 ⟩) − ⟨ b − u, ∇Φ(u) + c ⟩ = Φ(b) − Φ(u) − ⟨ b − u, ∇Φ(u) ⟩ = D_Φ(b|u),    (18)
where (b, u) ∈ Ω × int(Ω) and Ω = dom Φ. Therefore, adding an arbitrary affine function from A to Φ (i.e., working with the equivalence class [Φ]_A) does not change the structure of the Bregman-divergence D_Φ(b|u) at all. ☐
To connect the β-divergence and the Bregman-divergence associated with a Legendre function, we need to find a specialized convex function of Legendre type. Based on the comments in [2], we use an integral of the extended logarithmic function as this special convex function of Legendre type. Through this connection, we can reformulate the β-divergence as the Bregman-divergence associated with the convex function of Legendre type. The details follow.
Theorem 5.
Let x ∈ Ω ⊆ R^n and
Φ(x) = ⟨ ∫_d^x ln_{2−β}(t) dt, 1 ⟩ = ⟨ −ln x, 1 ⟩ if β = 0,  ⟨ x ln x, 1 ⟩ if β = 1,  and  ⟨ (1/(β(β−1))) x^β, 1 ⟩ if β ≠ 0, 1,    (19)
where Ω = dom Φ = { x ∈ R^n | Φ(x) ∈ R }, ln_α(t) is the extended logarithmic function in (4), and d is an arbitrary constant vector in R^n selected so that Φ(x) ∈ R. For simplicity, by Proposition 1, we drop all affine functions in Φ(x). Then, Φ in (19) is a convex function of Legendre type with the domain Ω given below:
I. entire region: β > 1, β ∈ R_e, and Ω = R^n;  II. positive region: 0 < β ≤ 1 and Ω = R_+^n, or β ≤ 0 and Ω = R_{++}^n;  III. negative region: 0 < β < 1, β ∈ R_e, and Ω = R_−^n, or β < 0, β ∈ R_e, and Ω = R_{−−}^n.    (20)
Proof. 
For simplicity, all affine functions are left out based on Proposition 1. In addition, it is trivial to show that Φ(x) is the Burg entropy (⟨ −ln x, 1 ⟩ with Ω = R_{++}^n) if β = 0 and the Shannon entropy (⟨ x ln x, 1 ⟩ with Ω = R_+^n) if β = 1. These are well-known examples of Legendre functions. As noticed in (2), the corresponding Bregman-divergences are the Itakura–Saito-divergence and the generalized Kullback–Leibler-divergence.
Now, we only need to check whether Φ(x) = ⟨ (1/(β(β−1))) x^β, 1 ⟩ (β ≠ 0, 1) is Legendre or not. Among the four conditions in Definition 3, it is trivial to show that Φ(x) satisfies conditions 1 and 2. Hence, only Φ(x) with β ≠ 0, 1 and the two Legendre conditions 3 and 4 remain to be checked.
I. Condition 3 in Definition 3:
Since Φ(x) = ⟨ (1/(β(β−1))) x^β, 1 ⟩ (β ≠ 0, 1) is a power function, the only possible boundary point of Φ is {0}. Therefore, we search for β ∈ R ∖ {0, 1} for which condition 3 is satisfied. Since ∇Φ(x) = (1/(β−1)) x^{β−1}, we get the following at the potential boundary {0}:
- y ∈ R_{++}^n:
  lim_{t ↓ 0} ⟨ ∇Φ(0 + t y), y ⟩ = −∞ if β < 1 and β ≠ 0,  and  = 0 if β > 1.
- y ∈ R_{−−}^n:
  lim_{t ↓ 0} ⟨ ∇Φ(0 + t y), y ⟩ = −∞ if β < 1 and β ∈ R_e ∖ {0},  = +∞ if β < 1 and β ∈ R_o,  and  = 0 if β > 1.
In summary, the following are the possible domains and the corresponding β for condition 3. We note that, if β > 1, then 0 must not lie in bd(dom Φ):
- R_{++}^n ⊆ Ω and β ∈ (−∞, 1) ∖ {0},
- R_{−−}^n ⊆ Ω and β ∈ ((−∞, 1) ∖ {0}) ∩ R_e.
II. Condition 4 in Definition 3:
This can be easily checked by the fact that Φ is strictly convex on int(Ω) if and only if ∇Φ is strictly monotone, that is, the following is satisfied [38]:
⟨ ∇Φ(x) − ∇Φ(x′), x − x′ ⟩ > 0 whenever x ≠ x′.
Since Φ is separable in terms of dimension, we only need to show that ∇Φ(x) is strictly increasing on int(Ω) in each component. Note that if ∇²Φ(x) = x^{β−2} > 0 on an open region, then Φ is strictly convex (i.e., ∇Φ is strictly increasing) in that region:
- ∇²Φ(x) > 0 if β ∈ R ∖ {0, 1} and x ∈ R_{++}^n,
- ∇²Φ(x) > 0 if β ∈ (R ∖ {0, 1}) ∩ R_e and x ∈ R_{−−}^n.
Note that, at {0}, we need to show directly that ∇Φ is strictly increasing.
Now, we combine the information from the above Legendre conditions 3 and 4 to decide the domain Ω = dom Φ based on β. The details are as follows:
  • β > 1
    - β ∈ R_e: dom Φ = R^n and, for x ∈ R_{−−}^n, x^{β−1} = −|x|^{β−1}. Therefore, ∇Φ(x) is a strictly increasing function on int(dom Φ), since Φ(−x) = Φ(x).
    - β ∈ R_o: dom Φ = R^n, but Φ is an odd function with respect to zero and thus it is not a convex function.
    - β ∈ R_x: dom Φ = R_+^n, but it does not satisfy condition 3 at 0.
  • 0 < β < 1
    - β ∈ R_e: dom Φ = R^n, but dom ∂Φ = R^n ∖ {0} is not a convex set. Since int(dom Φ) ⊆ dom ∂Φ [3], to keep the domain int(dom Φ) convex, we need to select int(dom Φ) = R_{−−}^n or R_{++}^n. That is, we have dom Φ = R_−^n or dom Φ = R_+^n. In both cases, we know that ∇Φ is strictly increasing on int(dom Φ), since ∇²Φ = x^{β−2} > 0 for all x ∈ int(dom Φ).
    - β ∈ R_o: Following the case β ∈ R_e, we have dom Φ = R_+^n or R_−^n. If dom Φ = R_+^n, then ∇Φ is strictly increasing on int(dom Φ) (∇²Φ > 0). However, if x < 0, then ∇²Φ(x) < 0 and thus Φ is not convex on its negative domain.
    - β ∈ R_x: dom Φ = R_+^n and ∇Φ is a strictly increasing function on int(dom Φ), since ∇²Φ = x^{β−2} > 0 for all x ∈ int(dom Φ).
    - Condition 3 is satisfied on the domains selected above.
  • β < 0
    - β ∈ R_e: dom Φ = R^n ∖ {0} is not a convex set. Therefore, we need to restrict dom Φ to a convex set, i.e., dom Φ = R_{−−}^n or dom Φ = R_{++}^n. On both domains, ∇Φ is strictly increasing (∇²Φ > 0 for all x ∈ dom Φ).
    - β ∈ R_o: Following the case β ∈ R_e, we have dom Φ = R_{++}^n or R_{−−}^n. If dom Φ = R_{++}^n, then ∇Φ is strictly increasing on int(dom Φ).
    - β ∈ R_x: dom Φ = R_{++}^n and ∇Φ is a strictly increasing function on int(dom Φ), since ∇²Φ = x^{β−2} > 0 for all x ∈ int(dom Φ).
    - Condition 3 is satisfied on the domains selected above.
 ☐
Remark 1.
Note that Φ in (19) should be understood as an equivalence class
[Φ(x)]_A = ⟨ ∫_d^x [ln_{2−β}(t)]_c dt, 1 ⟩
with an affine function A = { ⟨ α, x ⟩ + ⟨ γ, 1 ⟩ | α = f(c), γ = g(c, d) }. Here, [ln_{2−β}(t)]_c is the equivalence class of the extended logarithmic function in (4). As observed in (18), we have D_{[Φ]_A}(b|u) = D_Φ(b|u) for any affine function in A. Therefore, we can drop all affine functions in [Φ]_A.
Since Φ in (19) is Legendre under the domain condition (20), we can establish a new Bregman-divergence associated with Φ in (19). Interestingly, it corresponds to the β-divergence (1) [2]. However, there is a mismatch between the domain of the Bregman-divergence in (20) and the domain of the β-divergence in Table 3. We summarize this in Table 1. As a matter of fact, the positive domain with β > 1 is not available for Φ in (19) due to the Legendre conditions. In addition, in the case of β = 0, the negative domain R_{−−}^n × R_{−−}^n is not available for the Bregman-divergence with Φ (19). In the following Theorem, we show that, under the restriction of the domain of the β-divergence to the domain of the Bregman-divergence, we obtain an equivalence between the β-divergence and the Bregman-divergence associated with the Legendre function Φ (19).
Theorem 6.
Let us consider the β-divergence (1) and the Bregman-divergence (16) associated with the Legendre function Φ (19). If we restrict the domain of the β-divergence to the domain of the Bregman-divergence associated with Φ (19), which is Ω_L × Ω_R = dom Φ × int(dom Φ) (see Table 1), then the β-divergence is equal to the Bregman-divergence associated with the Legendre function Φ (19).
Proof. 
Since the domain of the β-divergence Ω_L × Ω_R is set to dom Φ × int(dom Φ), the β-divergence is well defined on the restricted domain. In the following, we show the equivalence between the β-divergence and the Bregman-divergence under the domain condition of the Bregman-divergence:
D_β(b|u) = ⟨ ∫_u^b (b − t) t^{β−2} dt, 1 ⟩ = −⟨ ∫_u^b t^{β−1} dt, 1 ⟩ + ⟨ b, ∫_u^b t^{β−2} dt ⟩ = (1 − β) ⟨ ∫_u^b ∇Φ(t) dt, 1 ⟩ + ⟨ b, (1/(β−1)) (b^{β−1} − u^{β−1}) ⟩ = Φ(b) − Φ(u) − ⟨ (1/(β−1)) u^{β−1}, b − u ⟩ = Φ(b) − Φ(u) − ⟨ b − u, ∇Φ(u) ⟩.
Note that we do not use any ∇Φ(b) information in the above derivation and thus the above equivalence holds on the whole domain of the Bregman-divergence associated with the Legendre function Φ (19). ☐
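Theorem 6 is easy to confirm numerically; the following sketch (my own, not from the paper) compares the closed form (6) of the β-divergence with the Bregman-divergence built from Φ in (19) for β = 0.5 on a random positive pair (b, u).

```python
# Numerical check (not from the paper) of Theorem 6 for beta = 0.5:
# D_beta(b|u) equals the Bregman-divergence with Phi(x) = x**beta / (beta*(beta-1)).
import numpy as np

beta = 0.5
phi      = lambda x: np.sum(x**beta) / (beta * (beta - 1.0))
grad_phi = lambda x: x**(beta - 1.0) / (beta - 1.0)

def bregman(b, u):
    return phi(b) - phi(u) - np.dot(b - u, grad_phi(u))

def d_beta(b, u):
    return np.sum(b * (b**(beta - 1) - u**(beta - 1)) / (beta - 1)
                  - (b**beta - u**beta) / beta)

rng = np.random.default_rng(1)
b = rng.uniform(0.1, 4.0, size=5)    # b in dom Phi = R_+^n
u = rng.uniform(0.1, 4.0, size=5)    # u in int(dom Phi) = R_{++}^n
print(d_beta(b, u), bregman(b, u))   # identical up to rounding
```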
In the following Theorem, we calculate the conjugate function Φ* and the corresponding domain dom Φ* of the convex function of Legendre type Φ defined in (19). The computation of the domain of Φ* is useful in determining the structure of D_Φ(b|u). For instance, as noticed in Theorem 4 (3), if dom Φ* is open, then the corresponding Bregman-divergence D_Φ(b|u) is coercive with respect to u ∈ int(dom Φ). Surprisingly, when β ∈ [0, 1), D_Φ(b|u) is not convex but is coercive with respect to u. This fact is used in an important way in SAR speckle reduction problems [12,15,21].
Theorem 7.
Let Φ (19) be the convex function of Legendre type with dom Φ as in (20). Then, Φ*, the conjugate of Φ, and the corresponding domain dom Φ* are calculated as follows:
  • β = 0 and dom Φ = R_{++}^n: dom Φ* = R_{−−}^n and
    Φ*(x) = ⟨ −1 − ln(−x), 1 ⟩.
  • β = 1 and dom Φ = R_+^n: dom Φ* = R^n and
    Φ*(x) = ⟨ exp(x − 1), 1 ⟩.
  • β ≠ 0, 1:
    Φ*(x) = ⟨ (1/β) ((β − 1) x)^{β/(β−1)}, 1 ⟩.
    In this case, dom Φ* depends on β:
    β > 1, β ∈ R_e, and dom Φ = R^n: dom Φ* = R^n;  0 < β < 1 and dom Φ = R_+^n: dom Φ* = R_{−−}^n;  0 < β < 1, β ∈ R_e, and dom Φ = R_−^n: dom Φ* = R_{++}^n;  β < 0 and dom Φ = R_{++}^n: dom Φ* = R_−^n;  β < 0, β ∈ R_e, and dom Φ = R_{−−}^n: dom Φ* = R_+^n.
Proof. 
Since Φ is Legendre, from Theorem 3, if x ∈ int(dom Φ*), then we have
Φ*(x) = ⟨ x, ∇Φ^{−1}(x) ⟩ − Φ(∇Φ^{−1}(x)).    (22)
As noticed in (20), the domain of Φ (19) depends on β and thus the domain of its conjugate function Φ* also depends on β. We categorize dom Φ* below, by using (22):
  • β = 0: Φ(x) = ⟨ −ln x, 1 ⟩ and dom Φ = R_{++}^n. From (22), the conjugate function Φ* is calculated as
    Φ*(x) = ⟨ x (−1/x) + ln(−1/x), 1 ⟩ = ⟨ −1 − ln(−x), 1 ⟩.
    Therefore, the domain of Φ* becomes dom Φ* = R_{−−}^n.
  • β = 1: Φ(x) = ⟨ x ln x, 1 ⟩ and dom Φ = R_+^n. From (22), the conjugate function Φ* is calculated as
    Φ*(x) = ⟨ x exp(x − 1) − exp(x − 1) ln(exp(x − 1)), 1 ⟩ = ⟨ x exp(x − 1) − (x − 1) exp(x − 1), 1 ⟩ = ⟨ exp(x − 1), 1 ⟩.
    It is trivial to show that dom Φ* = R^n.
  • β ≠ 0, 1: Φ(x) = ⟨ (1/(β(β−1))) x^β, 1 ⟩ and dom Φ is given in (20). By a simple calculation, we get
    ∇Φ^{−1}(x) = ((β − 1) x)^{1/(β−1)},  x ∈ int(dom Φ*),
    and from (22), the conjugate function Φ* is derived as follows:
    Φ*(x) = ⟨ x ((β−1) x)^{1/(β−1)} − (1/(β(β−1))) ((β−1) x)^{β/(β−1)}, 1 ⟩ = ⟨ (1/β) ((β−1) x)^{β/(β−1)}, 1 ⟩.
    Now, we need to decide the domain
    dom Φ* = { x ∈ R^n | Φ*(x) ∈ R }.
    While identifying dom Φ*, it should be selected based on the following isomorphism (from Theorem 3):
    ∇Φ : int(dom Φ) → int(dom Φ*),
    where ∇Φ(x) = (1/(β−1)) x^{β−1} = (∇Φ*)^{−1}, and the following observation:
    β/(β−1) ∈ R_e, if β ∈ R_e.
    With the above information and the classification of dom Φ in (20), we decide dom Φ* based on β.
    - β > 1, β ∈ R_e, and dom Φ = R^n:
      We have β/(β−1) ∈ R_e ∩ R_{++} and thus dom Φ* = R^n. In addition, for all x ∈ R^n, ∇Φ(x) = (1/(β−1)) x^{β−1} ∈ R^n is well defined.
    - 0 < β < 1 and dom Φ = R_+^n:
      In this case, β/(β−1) < 0 and β − 1 < 0. Therefore, for all β ∈ (0, 1), we have dom Φ* = R_{−−}^n. In addition, for all x ∈ R_{++}^n, we have ∇Φ(x) ∈ R_{−−}^n. That is, the isomorphism between int(dom Φ) and int(dom Φ*) is well defined.
    - 0 < β < 1, β ∈ R_e, and dom Φ = R_−^n:
      In this case, β/(β−1) < 0, β/(β−1) ∈ R_e, and β − 1 < 0. Therefore, the possible dom Φ* is R_{−−}^n or R_{++}^n. However, we need to choose dom Φ* = R_{++}^n from the isomorphic mapping ∇Φ : int(dom Φ) → int(dom Φ*).
    - β < 0 and dom Φ = R_{++}^n:
      In this case, 0 < β/(β−1) < 1. Therefore, we have dom Φ* = R_−^n. Actually, this domain matches well with the bijective mapping ∇Φ(x) = (1/(β−1)) x^{β−1} < 0 for all x > 0.
    - β < 0, β ∈ R_e, and dom Φ = R_{−−}^n:
      In this case, β/(β−1) ∈ (0, 1) ∩ R_e. The possible domain is dom Φ* = R^n. However, dom ∂Φ* = R_{−−}^n ∪ R_{++}^n. From Theorem 23.4 in [3], int(dom Φ*) ⊆ dom ∂Φ*. Due to the convexity constraint on the domain, dom Φ* = R_+^n or dom Φ* = R_−^n. Through the isomorphic mapping ∇Φ(x) = (1/(β−1)) x^{β−1} > 0 for x < 0, we select dom Φ* = R_+^n.
 ☐
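The dual identity (17) together with the conjugate pairs from Theorem 7 can be checked numerically; the sketch below (my own, not from the paper) does so for β = 0, where Φ(x) = −ln x lives on the positive domain while Φ*(y) = −1 − ln(−y) lives on the negative domain, so the dual Bregman-divergence is evaluated at the negative arguments ∇Φ(u) = −1/u.

```python
# Numerical check (not from the paper) of the dual identity (17) for beta = 0:
# D_Phi(b|u) = D_{Phi*}(grad Phi(u), grad Phi(b)), with negative dual arguments.
import numpy as np

phi        = lambda x: np.sum(-np.log(x))        # dom Phi  = R_{++}^n
grad_phi   = lambda x: -1.0 / x
phi_s      = lambda y: np.sum(-1.0 - np.log(-y)) # dom Phi* = R_{--}^n (Theorem 7)
grad_phi_s = lambda y: -1.0 / y                  # = (grad Phi)^{-1}

def D(f, gf, b, u):                              # generic Bregman-divergence
    return f(b) - f(u) - np.dot(b - u, gf(u))

b = np.array([1.0, 2.0, 3.0])
u = np.array([0.5, 2.5, 4.0])
lhs = D(phi, grad_phi, b, u)
rhs = D(phi_s, grad_phi_s, grad_phi(u), grad_phi(b))   # swapped, negative arguments
print(lhs, rhs)                                        # equal up to rounding
```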
In the following, the global minimization property of the β -divergence in Theorem 2 is reformulated with the Bregman-divergence. For more details, see [35,36].
Theorem 8 ([36]).
For all b_i ∈ int(dom Φ) with index i in an index set N, the following inequality is always satisfied, irrespective of the choice of u ∈ int(dom Φ):
Σ_{i ∈ N} D_Φ(b_i | u) ≥ Σ_{i ∈ N} D_Φ(b_i | μ),    (25)
where μ = (1/|N|) Σ_{i ∈ N} b_i and |N| is the cardinality of the set N.
Proof. 
Let us start with the generalized Pythagoras theorem [35] for the Bregman-divergence:
D_Φ(b|u) − D_Φ(b|μ) = D_Φ(μ|u) + ⟨ ∇Φ(μ) − ∇Φ(u), b − μ ⟩.    (26)
For all b_i ∈ int(dom Φ) with the index set N, let μ = (1/|N|) Σ_{i ∈ N} b_i. Then, averaging (26) over i ∈ N and using (1/|N|) Σ_{i ∈ N} (b_i − μ) = 0, we get
(1/|N|) Σ_{i ∈ N} [ D_Φ(b_i|u) − D_Φ(b_i|μ) ] = D_Φ(μ|u) + ⟨ ∇Φ(μ) − ∇Φ(u), (1/|N|) Σ_{i ∈ N} b_i − μ ⟩ = D_Φ(μ|u).
Since D_Φ(μ|u) ≥ 0, we have
Σ_{i ∈ N} D_Φ(b_i|u) ≥ Σ_{i ∈ N} D_Φ(b_i|μ),
irrespective of the choice of u ∈ int(dom Φ).  ☐
Note that μ in (25) corresponds to the Bregman centroid, which is extensively studied in [37]. Now, we are ready to move on to the variational model having the Bregman-divergence as its fitting term. Many important variational models induced from statistical distributions fall into this category.

4. Bregman Variational Model—Bregman-TV

In this section, we study the β -sparse model (13) with TV regularization via Bregman divergence (16) associated with Legendre (19) under the domain condition in (20). First, we introduce Bregman proximity operators in Section 4.1 and then we demonstrate how to use dual Bregman-divergence with the negative domain for convex reformulation of the nonconvex β -TV model [12] in Section 4.2.
Image data are in general observed as a 2-D array and have a limited dynamic range, due to the physical constraints of the image capturing system. Therefore, let us assume that the observed image data is bounded and column-wise stacked. That is, b ∈ B ⊂ R_+^n, where B is an open and bounded convex set. Now, we start with the following Bregman variational model with total variation, i.e., Bregman-TV:
min_{u ∈ B} D_Φ(b | Lu) + λ TV(u),    (27)
where TV(u) = Σ_{i=1}^n |(∇u)_i| = ‖∇u‖₁ is a typical sparsity constraint in image processing, L : R_+^n → R_+^n is a linear mapping, λ > 0, and B ⊆ int(dom Φ) ∩ R_+^n. Although the domain B is nonnegative in real applications, through the dual formulation of the Bregman-divergence (17), the nonpositive domain is very common and is sometimes useful for convex reformulation of nonconvex variational models appearing in SAR image enhancement problems. See Theorem 7 for the negative domain of the conjugate function Φ*.
We note that L is a matrix with nonnegative entries and that it is designed according to the particular image processing application: for the image deblurring problem, L is a blur or convolution matrix; for an image inpainting problem, L is a binary mask matrix; for an image denoising problem, L is the identity matrix. See [39,40] for more details on image denoising, deblurring, and inpainting problems with total variation or other sparsity constraints such as wavelet frames. The following are typical examples of Bregman-TV induced from various physical noise sources, e.g., Gaussian, Poisson, and speckle noise:
  • β = 2: Image restoration problems (e.g., denoising, deblurring, inpainting) under Gaussian noise [25,26,41]:
    min_{u ∈ B} (1/2) ‖b − Lu‖₂² + λ TV(u).    (28)
  • β = 1: Image restoration problems (e.g., denoising, deblurring, inpainting) under Poisson noise [32]:
    min_{u ∈ B} ⟨ Lu − b ln(Lu), 1 ⟩ + λ TV(u).    (29)
  • β = 0: Image restoration problems (e.g., denoising, deblurring, inpainting) under Gamma multiplicative noise (or speckle noise) [15]:
    min_{u ∈ B} ⟨ b/(Lu) + ln(Lu), 1 ⟩ + λ TV(u).    (30)
  • β ∈ (0, 1): A convex relaxed model [12] for the above SAR image restoration model (30). Additionally, this region is related to the compound Poisson distribution [8]:
    min_{u ∈ V} ⟨ b (Lu)^{β−1}/(1 − β) + (Lu)^β/β, 1 ⟩ + λ TV(u).    (31)
For the remainder of this article, we only consider the image denoising problem ( L = I ). It is also known as the (nonconvex) right Bregman proximity operator [10].
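To make the denoising setting concrete, the following 1-D sketch (my own illustration, not an algorithm from the paper) minimizes the Bregman-TV model with β = 1 (the Poisson model (29) with L = I) by projected gradient descent on an ε-smoothed total variation; the signal, step size, and iteration count are arbitrary choices made only for the demonstration.

```python
# Minimal 1-D illustration (not from the paper) of Bregman-TV denoising with L = I and
# beta = 1:  min_u <u - b*ln(u), 1> + lam * TV(u),  via crude projected gradient descent
# on an eps-smoothed TV so that everything stays differentiable.
import numpy as np

rng = np.random.default_rng(2)
clean = np.repeat([2.0, 6.0, 3.0], 40)              # piecewise-constant signal
b = rng.poisson(clean).astype(float) + 1e-3         # Poisson-corrupted observation

lam, eps, step = 0.8, 1e-2, 0.02
u = b.copy()
for _ in range(3000):
    du = np.diff(u)
    w = du / np.sqrt(du**2 + eps)                   # derivative of sum sqrt(du^2 + eps)
    grad_tv = np.zeros_like(u)
    grad_tv[:-1] -= w
    grad_tv[1:]  += w
    grad_fit = 1.0 - b / u                          # gradient of <u - b*ln(u), 1>
    u = np.maximum(u - step * (grad_fit + lam * grad_tv), 1e-6)   # stay in int(dom Phi)

# The denoised signal is typically closer to the clean one than the raw observation.
print(np.abs(b - clean).mean(), np.abs(u - clean).mean())
```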

4.1. Bregman Proximity Operators

In this section, we introduce the right and left Bregman proximity operators [9,10,30] based on the Bregman-divergence associated with Φ (19). Throughout this section, let us assume that Φ (19) is a convex, smooth, and (dimensionally) separable function (not necessarily Legendre). In particular, the Bregman-divergence D_Φ(b|u) associated with Φ (19) then also exists on the positive domain dom Φ = R_+^n with β > 1. See Table 1.
We note that the Bregman-divergence D_Φ(b|u) associated with Φ (19) is strictly convex with respect to b (see Theorem 4). On the other hand, the convexity of D_Φ(b|u) with respect to u strongly depends on the observed data b and on β in Φ (19). Based on [9,12,28], we present three different notions of convexity of the Bregman-divergence associated with Φ. Let Ω = dom Φ. Then, we have the following:
  • The Bregman-divergence D_Φ (16) is jointly convex if D_Φ(b|u) is convex with respect to (b, u) on int(Ω) × int(Ω).
  • The Bregman-divergence D_Φ (16) is separately convex if D_Φ(b|u) is convex with respect to u ∈ int(Ω) for all b ∈ int(Ω).
  • The Bregman-divergence D_Φ (16) is conditionally convex if D_Φ(b|u) is convex with respect to u ∈ B for all b ∈ B, where B ⊆ int(Ω) is an open convex set that depends on b.
We note that conditional convexity is introduced for the first time in this article, based on the previous analysis of the β-TV model [12]. The reason we are interested in conditional convexity is that, in real applications, the dynamic range of the observed data is very limited. For instance, the observed image data from an optical camera has 8-bit resolution (i.e., b ∈ [0, 2^8]) [41] and the intensity level of the backscattered radar signal in a SAR system is at most 32-bit resolution (i.e., b ∈ [0, 2^32]) [21]. Therefore, it is natural to consider convexity depending on the given data b.
The following Theorem, mostly based on Theorem 3.3 in [28], is useful in characterizing the convexity of the Bregman-divergence D_Φ(b|u) associated with Φ (19). We note that Diag(A) is a vector with the diagonal entries of a matrix (or tensor) A. Also, a function f is concave if and only if f(αu + (1 − α)v) ≥ α f(u) + (1 − α) f(v) for all α ∈ [0, 1].
Theorem 9.
Let D_Φ(b|u) be the Bregman-divergence associated with a convex, smooth, and (dimensionally) separable function Φ. In addition, assume that h = Diag(∇²Φ) > 0. Then, we have the following useful criteria for the convexity of D_Φ(b|u). Here, Ω = dom Φ.
(i) D_Φ(b|u) is jointly convex if and only if 1/h is concave. Note that, since h = (h_1, …, h_n) is (dimensionally) separable, 1/h is defined as 1/h(u) = (1/h_1(u), …, 1/h_n(u)); it is concave if and only if h satisfies the following inequality:
h(u) + ∇h(u)(u − b) ≥ h²(u)/h(b),  ∀ b, u ∈ int(Ω).    (32)
Moreover, if ∇²h exists, then D_Φ(b|u) is jointly convex if and only if
h(u) Diag(∇²h(u)) ≥ 2 (Diag(∇h(u)))².    (33)
(ii) D_Φ(b|u) is separately convex if and only if
h(u) + ∇h(u)(u − b) ≥ 0,  ∀ b, u ∈ int(Ω).    (34)
(iii) D_Φ(b|u) is conditionally convex if and only if
h(u) + ∇h(u)(u − b) ≥ 0,  ∀ b, u ∈ B,    (35)
where B is an open convex set in int(Ω) that depends on b [12,42].
Proof. 
The proofs of the first two convexity criteria are given in Theorem 3.3 in [28]. For conditional convexity, let us take the second derivative of D_Φ(b|u) with respect to u. Then, we get
h(u) + ∇h(u)(u − b) ≥ 0.
For each b ∈ B_b ⊆ int(dom Φ), we can find a domain of u, B_u ⊆ int(dom Φ), satisfying the above condition. Let B be a convex and open set satisfying B ⊆ B_b ∩ B_u. Then, we have the conditional convexity condition in (35). ☐
The following Theorem shows an interesting result: D_Φ(b|u) associated with Φ (19) is convex on its whole domain with respect to u only in the very limited region β ∈ [1, 2]. From a statistical point of view, this region is somewhat curious. In fact, if β ∈ (1, 2), then the Bregman-divergence D_Φ(b|u) does not have a corresponding statistical Tweedie distribution [8].
Theorem 10.
Let Φ be the convex and smooth function in (19) (not necessarily Legendre). Then, D_Φ(b|u) is separately convex (and also jointly convex) under the following domain conditions:
int(Ω) = R_{++}^n if β ∈ [1, 2),  int(Ω) = R_{−−}^n if β ∈ [1, 2) ∩ R_e,  and  int(Ω) = R^n if β = 2.    (36)
Due to the physical constraints of the observed data b, if we further restrict the domain of b, then we have conditional convexity of D_Φ(b|u). Let α = (β − 2)/(β − 1) for β ≠ 1, and let B_m^± and B_M^± denote the constant vectors in R^n given by the scalars B_m^± and B_M^± times the all-one vector 1. Then, we have the following:
  • Case I: Let us assume that the given data b is positive and has the following limitation in measurement:
    b ∈ int(Ω_b^+) = { b ∈ R_{++}^n | b ∈ (B_m^+, B_M^+) }.    (37)
    Then, D_Φ(b|u) is conditionally convex on B, which is given below:
    - β ∈ (2, +∞): B = { u ∈ R_{++}^n | u ∈ (max(B_m^+, α B_M^+), B_M^+) }, where α ∈ (0, 1).
    - β ∈ (−∞, 1): B = { u ∈ R_{++}^n | u ∈ (B_m^+, min(α B_m^+, B_M^+)) }, where α ∈ (1, +∞).
  • Case II: Let us assume that the given data b is negative and has the following limitation in measurement:
    b ∈ int(Ω_b^−) = { b ∈ R_{−−}^n | b ∈ (B_m^−, B_M^−) }.    (38)
    Then, D_Φ(b|u) is conditionally convex on B, which is given below:
    - β ∈ (2, +∞) ∩ R_e: B = { u ∈ R_{−−}^n | u ∈ (B_m^−, min(α B_m^−, B_M^−)) }, where α ∈ (0, 1).
    - β ∈ ((−∞, 1) ∩ R_e) ∖ {0}: B = { u ∈ R_{−−}^n | u ∈ (max(B_m^−, α B_M^−), B_M^−) }, where α ∈ (1, 2) ∪ (2, +∞).
Proof. 
Since Φ in (19) is sufficiently smooth, we use (34) with h(u) = u^{β−2}. To find the separately convex region, we need to find β satisfying
h(u) + ∇h(u)(u − b) = u^{β−3} [ (β − 1) u − (β − 2) b ] ≥ 0,  ∀ b, u ∈ int(Ω),    (39)
where we need to decide the corresponding domain Ω based on Table 1. In the case of conditional convexity, let us assume that the domain of b is limited as in (37) or (38). In the following, we summarize the separate and conditional convexity of D_Φ.
  • int(Ω) = R^n with β > 1 and β ∈ R_e:
    We simplify (39) as
    u ≥ 0 and u ≥ α b,  or  u ≤ 0 and u ≤ α b.
    Then, we have
    - β = 2: α = 0 and thus u does not depend on b in (39). In fact, for any b ∈ int(Ω), we can select an arbitrary u ∈ int(Ω).
    - β ≠ 2: In this case, the domain of u depends on b. For instance, if β = 3 and b > 0, then the domain of u is bounded below, i.e., u ≥ 0.5 b > 0. Therefore, the domain of u cannot be the whole region int(Ω) = R^n. This restriction is related to the conditional convexity of D_Φ and is treated in the following cases: int(Ω) = R_{++}^n (β ∈ R) and int(Ω) = R_{−−}^n (β ∈ R_e ∖ {0}).
  • int(Ω) = R_{++}^n and β ∈ R:
    We simplify (39) as
    (β − 1) u ≥ (β − 2) b.    (40)
    - β > 1: (40) is simplified as u ≥ α b and we get:
      β ∈ (1, 2): Since α ∈ (−∞, 0), u ≥ α b is satisfied for all u, b ∈ int(Ω) = R_{++}^n.
      β > 2: Let us assume that the given data b is bounded, i.e., (37) is satisfied. From u ≥ α b and α ∈ (0, 1), we have to satisfy the condition u ≥ α B_M^+ and thus the restricted domain B corresponding to (37) is given as
      B = { u ∈ R_{++}^n | u ∈ (max(B_m^+, α B_M^+), B_M^+) }.
      In fact, for all b ∈ B, D_Φ(b|u) is convex in terms of u ∈ B.
    - β = 1: (40) is satisfied for all u, b ∈ int(Ω) = R_{++}^n.
    - β < 1: Let us assume that b satisfies (37). Then, from u ≤ α b with α ∈ (1, +∞), we get u ≤ α B_m^+ and thus the restricted domain B corresponding to (37) is given as
      B = { u ∈ R_{++}^n | u ∈ (B_m^+, min(α B_m^+, B_M^+)) }.
  • int(Ω) = R_{--}^n and β ∈ R_e \ {0}:
    We simplify (39) as
    (β - 1) u ≤ (β - 2) b.
    - β > 1: (41) is simplified as u ≤ α b and we get
    β ∈ (1, 2): α ∈ (-∞, 0) and u ≤ α b is satisfied for all u, b ∈ int(Ω) = R_{--}^n.
    β > 2: Let us assume that b satisfies (38). Then, from u ≤ α b with α ∈ (0, 1), we get u ≤ α B_m^- and thus the restricted domain B corresponding to (38) is given as
    B = { u ∈ R_{--}^n | u ∈ (B_m^-, min(α B_m^-, B_M^-)) }.
    - β < 1: Let us assume that b satisfies (38). Then, from u ≥ α b with α ∈ (1, 2) ∪ (2, +∞), we get u ≥ α B_M^- and thus the restricted domain B corresponding to (38) is given as
    B = { u ∈ R_{--}^n | u ∈ (max(B_m^-, α B_M^-), B_M^-) }.
In addition, we note that it is not difficult to see that D_Φ(b|u) is jointly convex on its domain int(Ω) × int(Ω), where int(Ω) is given in (36). From the joint convexity condition in (33), we get the following condition:
(β - 2)(β - 1) u^{2(β - 3)} ≤ 0.
Since the exponent of u is even, i.e., u^{2(β - 3)} = (u^{β - 3})^2 ≥ 0 for all u ∈ int(Ω) in (36), if β ∈ [1, 2], then (β - 2)(β - 1) ≤ 0 and thus D_Φ(b|u) is jointly convex under the domain constraints in (36). ☐
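As a quick sanity check of the separate convexity condition (39) and of the restricted region B in Case I, the following minimal Python sketch (not part of the original analysis) evaluates the second derivative u^{β-3}[(β-1)u - (β-2)b] on a grid; β = 3 and the bounds B_m^+ = 1, B_M^+ = 4 are hypothetical illustration values.
```python
import numpy as np

# Scalar second derivative of D_Phi(b|u) with respect to u for the base function
# with Phi''(u) = u^(beta-2) on the positive domain:
#   d^2/du^2 D_Phi(b|u) = u^(beta-3) * [(beta-1)*u - (beta-2)*b]
def second_deriv(b, u, beta):
    return u ** (beta - 3.0) * ((beta - 1.0) * u - (beta - 2.0) * b)

beta = 3.0                              # Case I, beta in (2, +inf)
alpha = (beta - 2.0) / (beta - 1.0)     # alpha = 0.5, lies in (0, 1)
Bm, BM = 1.0, 4.0                       # hypothetical measurement bounds (B_m^+, B_M^+)

# Restricted region B = (max(B_m^+, alpha*B_M^+), B_M^+) from Case I
lo = max(Bm, alpha * BM)
bs = np.linspace(Bm + 1e-6, BM - 1e-6, 200)
us = np.linspace(lo + 1e-6, BM - 1e-6, 200)
B, U = np.meshgrid(bs, us)
print("min second derivative on B x B:", second_deriv(B, U, beta).min())  # expected >= 0

# Outside the restricted region the sign condition can fail, e.g. u << alpha*b:
print("outside region:", second_deriv(4.0, 0.5, beta))  # negative, so not convex there
```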
Based on the joint (and separate) convexity of D_Φ associated with Φ (19) on the domain (36), we can define the right Bregman proximity operator P_{λTV} : int(Ω) → int(Ω) associated with Φ (19) as follows:
P_{λTV}(b) = arg min_{u ∈ int(Ω)} D_Φ(b|u) + λ TV(u),
where D_Φ(b|u) associated with Φ (19) is strictly convex, smooth, and coercive with respect to u, and the total variation is also a convex and coercive function [40]. We note that D_Φ(b|u) with the domain int(Ω) in (36) is coercive with respect to u, due to the joint convexity condition in (32) and h^2(u) h(b) > 0. For more details on the right Bregman proximal operator, see [10,24]. We should be cautious that, although the right Bregman proximal operator P_{λTV}(b) (42) is well defined for the given data b ∈ int(Ω), its usefulness in real applications is limited, due to the separate convexity condition β ∈ [1, 2]. Actually, in the case of β = 2, it is just an ordinary proximal operator [9]. Note that, in real applications such as SAR [12], D_Φ(b|u) with β ∈ [0, 1) and the positive domain constraint (37) (i.e., D_Φ(b|u) is conditionally convex) is used. In this case, the corresponding right Bregman proximity operator is not convex but coercive. See Theorem 4 (3) for the coercivity of the operator on its domain int(Ω).
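To illustrate the right Bregman proximity operator (42) in the convex range β ∈ [1, 2], the following sketch approximates P_{λTV}(b) for a small one-dimensional signal with a generic quasi-Newton solver; it uses the standard closed form of the β-divergence on the positive domain and a smoothed (Huber-type) surrogate of TV, and β, λ, and the smoothing parameter are hypothetical illustration values rather than the settings of the cited works.
```python
import numpy as np
from scipy.optimize import minimize

beta, lam, eps = 1.5, 0.2, 1e-3   # hypothetical illustration values

def d_beta(b, u):
    # Standard closed form of the beta-divergence on the positive domain.
    return np.sum(b**beta / (beta*(beta - 1)) - b*u**(beta - 1)/(beta - 1) + u**beta/beta)

def tv_smooth(u):
    # Huber-type smoothing of the one-dimensional total variation.
    d = np.diff(u)
    return np.sum(np.sqrt(d**2 + eps**2) - eps)

def objective(u, b):
    return d_beta(b, u) + lam * tv_smooth(u)

rng = np.random.default_rng(0)
b = 1.0 + 0.3 * rng.random(32)                       # positive noisy data
res = minimize(objective, x0=b.copy(), args=(b,),
               method="L-BFGS-B", bounds=[(1e-6, None)] * b.size)
u_prox = res.x                                       # approximation of P_{lambda TV}(b)
print("objective:", res.fun, "range of u:", u_prox.min(), u_prox.max())
```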
Now, let us consider Φ (19) with the domain condition in Table 1. Then, D_Φ(b|u) is coercive and strictly convex in terms of b (see Theorem 4). Hence, we can also define the left Bregman proximity operator P_{λTV} : int(Ω) → int(Ω) as follows:
P_{λTV}(u) = arg min_{b ∈ int(Ω)} D_Φ(b|u) + λ TV(b).
Unlike the right Bregman proximity operator, the left Bregman proximity operator (43) associated with the base function Φ in (19) is well defined for all β ∈ R, since its objective is strictly convex and coercive on the domain int(Ω), where Ω = dom Φ. We note that the left Bregman proximity operator can be characterized in a simpler way as
P_{λTV} = (∇Φ + λ ∂TV)^{-1} ∘ ∇Φ,
where ∂TV is the subdifferential of TV. Actually, P_{λTV} is a maximal monotone operator. See [30,43] for more details on the Bregman proximity operator and the corresponding Bregman–Moreau envelopes.
Remark 2.
We could use the β-divergence to define proximity operators. For instance, the right β-divergence proximity operator can be defined as
P^β_{λTV}(b) = arg min_{u ∈ Ω_R} D_β(b|u) + λ TV(u).
Instead of TV in (44), if we use an indicator function
ι_S(u) = 0 if u ∈ S, and ι_S(u) = +∞ if u ∉ S,
for a convex set S, then we get the right β-divergence projection operator for S as follows:
P^β_{ι_S}(b) = arg min_{u ∈ Ω_R} D_β(b|u) + ι_S(u).
It is interesting that the robustness of the β-divergence [7] can be explained through the right β-divergence projection operator (45). Let us assume that b, u in (45) are probability distributions (i.e., Σ_i b_i = 1 and Σ_i u_i = 1) and S is a set of Gaussian distributions. Here, the notation is slightly abused, since the Gaussian distribution is a continuous probability distribution and it is not a convex set. We note that the (generalized) Kullback–Leibler (KL) divergence (i.e., D_β(b|u) with β = 1), which is a commonly used similarity measure between two probability distributions, is undefined at zero probability (u = 0). See Figure 1a and Table 1. However, outliers (i.e., rare events) have extremely low probability and thus they exist near zero probability. In this case, the KL-divergence amplifies the value near zero, i.e., lim_{u→0} D_{β=1}(b|u) = +∞. However, when β > 1, as noticed in Figure 1a and Table 1, lim_{u→0} D_{β>1}(b|u) < +∞. Thus, outliers which exist near zero are not weighted too much. Hence, the right β-divergence projection operator (45) with β > 1 is more robust to outliers than the KL-divergence-based operator. For more details, see [4,5,7]. Note that we can also define the left β-divergence proximity operator as
P^β_{λTV}(u) = arg min_{b ∈ Ω_L} D_β(b|u) + λ TV(b).
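The robustness argument can also be observed numerically: the generalized KL divergence (β = 1) blows up as u → 0, while β > 1 stays bounded. A minimal sketch, assuming the standard closed forms of D_β on the positive domain:
```python
import numpy as np

def d_beta(b, u, beta):
    if beta == 1:                      # generalized KL divergence
        return b*np.log(b/u) - b + u
    return b**beta/(beta*(beta - 1)) - b*u**(beta - 1)/(beta - 1) + u**beta/beta

b = 0.05                               # an "outlier" bin of the distribution b
for u in [1e-2, 1e-4, 1e-8]:
    print(u, d_beta(b, u, 1.0), d_beta(b, u, 2.0))
# beta = 1 grows like -b*log(u) as u -> 0; beta = 2 tends to b**2/2 = 0.00125
```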

4.2. Dual Bregman-Divergence-Based Left Bregman Operator for a Convex Reformulation of the Bregman-TV with β < 1

In this section, we introduce a convex reformulation of the nonconvex Bregman-TV (27) (β < 1 and L = I) associated with Φ (19), which is the convex function of Legendre type whose domain is given in Table 1. Note that the problems we study in this section are related to the speckle reduction problem [12,15].
By Theorem 4, we have the following reformulated Bregman-TV:
min_{u ∈ B} D_{Φ*}(∇Φ(u) | ∇Φ(b)) + λ TV(u),
where B = { x ∈ R_{++}^n | x ∈ (B_m^+, B_M^+) } with B_m^+ ≥ ε · 1 for some ε > 0. See [12,42] for real SAR data processing applications where the box constraint B is a critical element of the performance. Now, let w = ∇Φ(u) with the corresponding domain ∇Φ(B); then we have the left Bregman proximity operator associated with the dual Bregman-divergence:
P^{Φ*}_{λTV∘∇Φ*}(∇Φ(b)) = arg min_{w ∈ ∇Φ(B)} F(w) := D_{Φ*}(w | ∇Φ(b)) + λ TV(∇Φ*(w)),
where ∇Φ(B) ⊂ int(dom Φ*) and we use ∇Φ ∘ ∇Φ*(w) = w. Since β < 1, ∇Φ(x) = x^{β-1}/(β - 1) is strictly increasing on its domain int(dom Φ) = R_{++}^n. Therefore, we have the transformed domain, defined on R_{--}^n, as
∇Φ(B) = { x ∈ R_{--}^n | x ∈ (∇Φ(B_m^+), ∇Φ(B_M^+)) },
where -∞ < ∇Φ(ε) ≤ ∇Φ(B_m^+) < ∇Φ(B_M^+) < 0, and it is also a convex set. Moreover, ∇Φ* is also a strictly convex function, by the following Lemma.
Lemma 2.
Let us assume that β < 1 and int(dom Φ) = R_{++}^n. Then, we have
∇Φ*(w) = [ (β - 1) w ]^{1/(β - 1)},
where w ∈ int(dom Φ*) = R_{--}^n. Note that ∇Φ* is strictly increasing and strictly convex on its domain int(dom Φ*). However, it is not coercive but bounded below. In fact, we have ∇Φ*(w) ≥ 0 for all w ∈ int(dom Φ*).
Proof. 
Let f(w) = ∇Φ*(w). Then, since β < 1 and w ∈ R_{--}^n, we have
∇²f(w) = (2 - β) [ (β - 1) w ]^{(3 - 2β)/(β - 1)} > 0.
Since Φ* is Legendre, ∇Φ* is strictly increasing on its domain. We note that, although ∇Φ* is strictly convex and strictly increasing, it is not coercive but bounded below, i.e.,
lim_{w → -∞} ∇Φ*(w) = 0.
 ☐
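A small numerical check of Lemma 2 in the Burg entropy case β = 0, where ∇Φ(u) = -1/u and ∇Φ*(w) = -1/w; the sampled interval of w is an arbitrary illustration choice:
```python
import numpy as np

beta = 0.0

def grad_phi(u):
    return u**(beta - 1.0) / (beta - 1.0)              # = -1/u for beta = 0

def grad_phi_star(w):
    return ((beta - 1.0) * w)**(1.0 / (beta - 1.0))    # = -1/w for beta = 0

w = np.linspace(-50.0, -0.5, 1000)                      # sample of int(dom Phi*) = R_{--}
g = grad_phi_star(w)
print("increasing:", np.all(np.diff(g) > 0))
print("convex (discrete second differences >= 0):", np.all(np.diff(g, 2) >= 0))
print("bounded below and tending to 0:", g.min() >= 0, grad_phi_star(-1e8))
u = np.linspace(0.1, 5.0, 7)
print("inverse relation grad Phi* o grad Phi = I:", np.allclose(grad_phi_star(grad_phi(u)), u))
```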
Finally, by using the strict convexity and the strictly increasing property of ∇Φ*, we obtain a unique solution of the minimization problem in (48).
Theorem 11.
Let Φ (19) be the convex function of Legendre type whose domain is given in Table 1. In addition, we assume that β < 1 and B = { x ∈ R_{++}^n | x ∈ (B_m^+, B_M^+) }. Then, for the given data b ∈ B, the left Bregman proximity operator (48) associated with the dual Bregman-divergence is well defined. That is, there is a unique solution w* = arg min_{w ∈ ∇Φ(B)} F(w).
Proof. 
Since Φ (19) is Legendre, Φ* is also Legendre on the domain R_{--}^n (= int(dom Φ*)). Therefore, D_{Φ*}(w|x) is strictly convex and coercive in terms of w, by Theorem 4 (2). In addition, TV is a composition ‖·‖_1 ∘ D, where D is a linear matrix (i.e., a first-order difference matrix) [44] and ‖a‖_1 = Σ_i |a_i|. Hence, we have
TV(∇Φ*(w)) = ‖·‖_1 ∘ D ∘ ∇Φ*(w),
where ∇Φ*(w) is strictly convex (Lemma 2), D is a linear operator, and ‖·‖_1 is a simple metric (convex and increasing). Therefore, TV(∇Φ*(w)) is also a convex function (see Section IV.2.1 in [38]). In addition, since ∇Φ*(w) is lower bounded (Lemma 2), TV(∇Φ*(w)) is also lower bounded. Then, the objective function F(w) in (48) is coercive (see Lemma 2.12 in [10]) and strictly convex. In the end, the left Bregman proximity operator associated with the dual Bregman-divergence has a unique solution (see Proposition 3.5 in [10]), given as
w* = P^{Φ*}_{λTV∘∇Φ*}(∇Φ(b)),
where b ∈ B and w* = arg min_{w ∈ ∇Φ(B)} F(w) in (48). Regarding the domain ∇Φ(B): since Φ is Legendre, |∇Φ(x)| → +∞ as |x| → 0. Therefore, we need to keep a distance from {0} to ensure |∇Φ(x)| < +∞. In fact, as noticed in (49), the transformed domain ∇Φ(B) is a convex set bounded away from {0}. ☐
The above Theorem is quite surprising. We can get a unique solution of the nonconvex Bregman-TV (27) (β < 1 and L = I) through the left Bregman proximity operator (48), followed by the isomorphic transformation ∇Φ* (with ∇Φ* ∘ ∇Φ = I), as
u* = ∇Φ* ∘ P^{Φ*}_{λTV∘∇Φ*} ∘ ∇Φ(b).
However, in general, due to the severe nonlinearity of ∇Φ* inside the non-smooth regularizer, i.e., TV(∇Φ*(w)), it is not easy to design a stable numerical algorithm for finding the solution u* in (51). To overcome this drawback, we can directly modify (47) with a constraint w = ∇Φ(u) as
min_{w, u} D_{Φ*}(w | ∇Φ(b)) + λ TV(u),
subject to the following constraints:
u ∈ B, w ∈ ∇Φ(B), w = ∇Φ(u) (or u = ∇Φ*(w)).
Since w = ∇Φ(u) is a nonlinear constraint, we cannot directly apply highly sophisticated augmented Lagrangian-based optimization algorithms. As a heuristic remedy for this nonlinear constraint, we may consider the following penalty method [45]:
(w^{(k+1)}, u^{(k+1)}) = arg min_{w ∈ ∇Φ(B), u ∈ B} D_{Φ*}(w | ∇Φ(b)) + (ρ/2) ‖w - ∇Φ(u^{(k)})‖² + (τ/2) ‖u - ∇Φ*(w^{(k)})‖² + λ TV(u).
This model is convex in terms of w and u, respectively. However, it is not convex with respect to (u, w). In the case of the speckle reduction problem (30), the nonlinearity of ∇Φ could be reduced by using the shifting technique in [42].
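As a concrete illustration of the penalty iteration above, the following sketch specializes to the AA-model case β = 0 on a one-dimensional signal. The w-subproblem is solved in closed form from its scalar stationarity condition, the u-subproblem uses a smoothed TV surrogate with a generic solver, and the two updates are applied in a Gauss-Seidel fashion; ρ, τ, λ, the noise model, and the signal are hypothetical illustration choices, not tuned settings from the cited works.
```python
import numpy as np
from scipy.optimize import minimize

rho, tau, lam, eps = 1.0, 1.0, 0.1, 1e-3   # hypothetical illustration values

def w_update(b, t):
    # Separable, strictly convex scalar problems (constants of D_{Phi*} dropped):
    #   min_{w<0} -ln(-w) + b*w + (rho/2)*(w - t)^2,
    # whose stationarity condition rho*w^2 + (b - rho*t)*w - 1 = 0 has a unique negative root.
    c = b - rho * t
    return (-c - np.sqrt(c**2 + 4.0*rho)) / (2.0*rho)

def u_update(s):
    # min_u (tau/2)*||u - s||^2 + lam*TV_smooth(u), solved approximately with u kept positive.
    def obj(u):
        d = np.diff(u)
        return 0.5*tau*np.sum((u - s)**2) + lam*np.sum(np.sqrt(d**2 + eps**2))
    return minimize(obj, x0=s, method="L-BFGS-B", bounds=[(1e-6, None)]*s.size).x

rng = np.random.default_rng(1)
clean = np.concatenate([np.full(16, 1.0), np.full(16, 3.0)])
b = clean * rng.gamma(5.0, 1.0/5.0, size=clean.size)   # multiplicative (speckle-like) noise

u = b.copy()
for k in range(30):
    w = w_update(b, -1.0/u)    # w-subproblem with t = grad Phi(u^(k)) = -1/u^(k)
    u = u_update(-1.0/w)       # u-subproblem with s = grad Phi*(w) = -1/w (Gauss-Seidel style)
print("restored range:", u.min(), u.max())
```
In practice, the cited works use dedicated splitting algorithms; this loop is only meant to show how the two penalty subproblems decouple.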
In the following example, we show how (51) can be applied to relax the nonconvex speckle reduction problem (30) with L = I.
Example 1.
Let us consider the following nonconvex minimization problem. For given b ∈ B,
min_{u ∈ B} D_Φ(b|u) + λ TV(u),
where B = { x ∈ R_{++}^n | x ∈ (B_m^+, B_M^+) } is an open convex set with B_m^+ ≥ ε · 1 for some ε > 0. Note that Φ(u) = -⟨ln u, 1⟩ is the convex function of Legendre type (β = 0) and
D_Φ(b|u) = ⟨ b/u - ln(b/u) - 1, 1 ⟩
is the Bregman-divergence associated with Φ (the Burg entropy). This model is known as the AA-model [15]. It is well known that it is not easy to find a global minimizer of (55), due to the severe nonconvexity of D_Φ(b|u) in terms of u [15,21]. Therefore, various transform-based convex relaxation approaches have been introduced [12,16,17,18,19,20,21,42]. In this example, we are going to use the dual Bregman-divergence to find a solution of (55). We note that Φ(u) = -⟨ln u, 1⟩ is the convex function of Legendre type on its domain int(dom Φ) = R_{++}^n. Hence, by Theorem 7, we get the following corresponding conjugate function:
Φ*(w) = ⟨ -1 - ln(-w), 1 ⟩.
This function is also the convex function of Legendre type on its domain int(dom Φ*) = R_{--}^n. Now, by using the dual Bregman-divergence, we get a convex reformulated version of (55) as
w* = arg min_{w ∈ B_w} F(w), u* = -1/w*,
where
F(w) = ⟨ -ln(-w), 1 ⟩ - ⟨ w, b ⟩ + λ TV(-1/w)
and
B_w = { x ∈ R_{--}^n | x ∈ (-1/B_m^+, -1/B_M^+) }.
We note that B_w = ∇Φ(B) is a convex set and TV(-1/w) is also convex for all w ∈ R_{--}^n. Therefore, the objective function F(w) in (56) is strictly convex on its domain B_w. In addition, due to Theorem 4 (2), F(w) is coercive on the domain R_{--}^n. Therefore, we have a unique solution u* of (55). A similar inverse transformation on the positive domain R_{++}^n itself is introduced in [45,46].
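To make Example 1 concrete, the following sketch minimizes the dual objective F(w) in (56) over the box B_w for a small one-dimensional signal, with a smoothed TV surrogate and bound constraints standing in for a dedicated algorithm; λ, the smoothing parameter, the bounds B_m^+, B_M^+, and the simulated speckle-like data are hypothetical illustration values.
```python
import numpy as np
from scipy.optimize import minimize

lam, eps = 0.3, 1e-3                                   # hypothetical illustration values
Bm, BM = 0.1, 10.0                                     # box (B_m^+, B_M^+) for u

rng = np.random.default_rng(2)
clean = np.concatenate([np.full(20, 1.0), np.full(20, 4.0)])
b = np.clip(clean * rng.gamma(4.0, 0.25, size=clean.size), Bm, BM)

def F(w):
    u = -1.0 / w                                       # u = grad Phi*(w)
    d = np.diff(u)
    return np.sum(-np.log(-w)) - np.dot(w, b) + lam*np.sum(np.sqrt(d**2 + eps**2))

w0 = -1.0 / b                                          # start from w = grad Phi(b)
bounds = [(-1.0/Bm + 1e-9, -1.0/BM - 1e-9)] * b.size   # stay inside B_w (negative box)
res = minimize(F, x0=w0, method="L-BFGS-B", bounds=bounds)
u_hat = -1.0 / res.x
print("success:", res.success, "u range:", u_hat.min(), u_hat.max())
```
The recovered u = -1/w then lies in the original box B, which realizes the inverse transformation of Example 1 without touching the nonconvex objective (55) directly.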

5. Conclusions

In this article, we introduced the extended logarithmic function and, based on it, we redefined the domain of the β-divergence. In fact, we have found that if β is in the class R_e = { x ∈ R | x = 2k/(2l + 1), k, l ∈ Z }, then the negative region R_{--}^n should be included in the domain of the β-divergence. In addition, if we use the integral of the extended logarithmic function as a base function of the Bregman-divergence, then we have a partial equivalence between the β-divergence and the Bregman-divergence associated with the Legendre base function. Last but not least, by using the dual formulation of the Bregman-divergence associated with the convex function of Legendre type, together with its negative domain, we have shown how to build a convex reformulation of the nonconvex variational model that appears in the SAR speckle reduction problem. The approaches in this article could be extended to other divergences, such as the α- and γ-divergences [2]. In addition, we could plug the presented model into various segmentation problems [11,47,48].

Acknowledgments

I am thankful to the reviewers for their valuable comments and suggestions. This work was supported by the Basic Science Program through the NRF of Korea funded by the Ministry of Education (NRF-2015R101A1A01061261) and by the Education and Research Promotion Program of KOREATECH.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Amari, S.; Nagaoka, H. Methods of Information Geometry; American Mathematical Society: Washington, DC, USA, 2000.
  2. Cichocki, A.; Amari, S. Families of Alpha- Beta- and Gamma- divergences: Flexible and robust measures of similarities. Entropy 2010, 12, 1532–1568.
  3. Rockafellar, R.T. Convex Analysis; Princeton University Press: Princeton, NJ, USA, 1970.
  4. Basu, A.; Harris, I.R.; Hjort, N.L.; Jones, M.C. Robust and efficient estimation by minimizing a density power divergence. Biometrika 1998, 85, 549–559.
  5. Eguchi, S.; Kano, Y. Robustifying Maximum Likelihood Estimation. Available online: https://www.researchgate.net/profile/Shinto_Eguchi/publication/228561230_Robustifing_maximum_likelihood_estimation_by_psi-divergence/links/545d65910cf2c1a63bfa63e6/Robustifing-maximum-likelihood-estimation-by-psi-divergence.pdf (accessed on 7 September 2017).
  6. Fevotte, C.; Idier, J. Algorithm for Nonnegative Matrix Factorization with the beta-divergence. Neural Comput. 2011, 23, 2421–2456.
  7. Samek, W.; Blythe, D.; Müller, K.-R.; Kawanabe, M. Robust Spatial Filtering with Beta Divergence. In Proceedings of the 27th Annual Conference on Neural Information Processing Systems 2013, Lake Tahoe, NV, USA, 5–10 December 2013; pp. 1007–1015.
  8. Jorgensen, B. The Theory of Dispersion Models; Chapman & Hall: London, UK, 1997.
  9. Bauschke, H.H.; Combettes, P.L. Iterating Bregman Retractions. SIAM J. Optim. 2003, 13, 1159–1173.
  10. Bauschke, H.H.; Combettes, P.L.; Noll, D. Joint minimization with alternating Bregman proximity operators. Pac. J. Optim. 2006, 2, 401–424.
  11. Woo, H. Beta-divergence based two-phase segmentation model for synthetic aperture radar images. Electron. Lett. 2016, 52, 1721–1723.
  12. Woo, H.; Ha, J. Besta-divergence-based variational model for speckle reduction. IEEE Signal Proc. Lett. 2016, 23, 1557–1561.
  13. Fevotte, C.; Bertin, N.; Durrieu, J.-L. Nonnegative Matrix Factorization with the Itakura–Saito Divergence: With application to music analysis. Neural Comput. 2009, 21, 793–830.
  14. Lobry, S.; Denis, L.; Tupin, F. Multitemporal SAR image decomposition into strong scatterers, background, and speckle. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 3419–3429.
  15. Aubert, G.; Aujol, J.F. A variational approach to remove multiplicative noise. SIAM J. Appl. Math. 2008, 68, 925–946.
  16. Bioucas-Dias, J.M.; Figueiredo, M.A.T. Multiplicative noise removal using variable splitting and constrained optimization. IEEE Trans. Image Process. 2010, 19, 1720–1730.
  17. Huang, Y.M.; Ng, M.K.; Wen, Y.W. A new total variation method for multiplicative noise removal. SIAM J. Imaging Sci. 2009, 2, 20–40.
  18. Kang, M.; Yun, S.; Woo, H. Two-level convex relaxed variational model for multiplicative denoising. SIAM J. Imaging Sci. 2011, 6, 875–903.
  19. Shi, J.; Osher, S. A nonlinear inverse scale space method for a convex multiplicative noise model. SIAM J. Imaging Sci. 2008, 1, 294–321.
  20. Steidl, G.; Teuber, T. Removing multiplicative noise by Douglas-Rachford splitting. J. Math. Imaging Vis. 2010, 36, 168–184.
  21. Yun, S.; Woo, H. A new multiplicative denoising variational model based on m-th root transformation. IEEE Trans. Image Process. 2012, 21, 2523–2533.
  22. Bamler, R. Principles of synthetic aperture radar. Surv. Geophys. 2000, 21, 147–157.
  23. Oliver, C.; Quegan, S. Understanding Synthetic Aperture Radar Imaging; SciTech Publishing: Raleigh, NC, USA, 2004.
  24. Bauschke, H.H.; Wang, X.; Ye, J.; Yuan, X. Bregman distances and Chebyshev sets. J. Approx. Theory 2009, 159, 3–25.
  25. Chambolle, A. An algorithm for total variation minimization and applications. J. Math. Imaging Vis. 2004, 20, 89–97.
  26. Rudin, L.; Osher, S.; Fatemi, E. Nonlinear total variation based noise removal algorithms. Phys. D 1992, 60, 259–268.
  27. Bauschke, H.H.; Borwein, J.M. Legendre functions and the method of random Bregman projections. J. Convex Anal. 1997, 4, 27–67.
  28. Bauschke, H.H.; Borwein, J.M. Joint and separate convexity of the Bregman distance. In Inherently Parallel Algorithms in Feasibility and Optimization and Their Applications; Butnariu, D., Censor, Y., Reich, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2001.
  29. Bauschke, H.H.; Borwein, J.M.; Combettes, P.L. Essential smoothness, essential strict convexity and Legendre functions in Banach spaces. Commun. Contemp. Math. 2001, 3, 615–647.
  30. Bauschke, H.H.; Dao, M.N.; Lindstrom, S.B. Regularizing with Bregman-Moreau envelopes. arXiv, 2017; arXiv:1705.06019v1.
  31. Rockafellar, R.T.; Wets, R.J.-B. Variational Analysis; Springer: New York, NY, USA, 1998.
  32. Setzer, S.; Steidl, G.; Teuber, T. Deblurring Poissonian images by split Bregman techniques. J. Vis. Commun. Image Represent. 2010, 21, 193–199.
  33. Ashok, M.; Sundaresan, R. Minimization problems based on relative α-entropy II: Reverse projection. IEEE Trans. Inf. Theory 2015, 61, 5081–5095.
  34. Durand, S.; Fadili, J.; Nikolova, M. Multiplicative noise removal using L1 fidelity on frame coefficients. J. Math. Imaging Vis. 2010, 36, 201–226.
  35. Teboulle, M. A unified continuous optimization framework for center-based clustering methods. J. Mach. Learn. Res. 2007, 8, 65–102.
  36. Banerjee, A.; Merugu, S.; Dhillon, I.S.; Ghosh, J. Clustering with Bregman Divergences. J. Mach. Learn. Res. 2005, 6, 1705–1749.
  37. Nielsen, F.; Nock, R. Sided and symmetrized Bregman centroids. IEEE Trans. Inf. Theory 2009, 55, 2882–2904.
  38. Hiriart-Urruty, J.-B.; Lemarechal, C. Convex Analysis and Minimization Algorithms: Part 1: Fundamentals; Springer: New York, NY, USA, 1996.
  39. Chan, T.F.; Shen, J. Image Processing and Analysis; SIAM: Philadelphia, PA, USA, 2005.
  40. Aubert, G.; Kornprobst, P. Mathematical Problems in Image Processing; Springer: New York, NY, USA, 2006.
  41. Yun, S.; Woo, H. Linearized proximal alternating minimization algorithm for motion deblurring by nonlocal regularization. Pattern Recognit. 2011, 44, 1312–1326.
  42. Woo, H.; Yun, S. Alternating minimization algorithm for speckle reduction with a shifting technique. IEEE Trans. Image Process. 2012, 21, 1701–1714.
  43. Kan, C.; Song, W. The Moreau envelope function and proximal mapping in the sense of the Bregman distance. Nonlinear Anal. 2012, 75, 1385–1399.
  44. Micchelli, C.A.; Shen, L.; Xu, Y. Proximity algorithms for image models: Denoising. Inverse Probl. 2011, 27, 045009.
  45. Nie, X.; Qiao, H.; Zhang, B. A variational model for PolSAR data speckle reduction based on the Wishart distribution. IEEE Trans. Image Process. 2015, 24, 1209–1222.
  46. Oh, A.K.; Willett, R.M. Regularized Non-Gaussian Image Denoising. arXiv, 2015; arXiv:1508.02971v1.
  47. Chan, T.F.; Vese, L. Active contours without edges. IEEE Trans. Image Process. 2001, 10, 266–277.
  48. Chan, T.F.; Esedoglu, S.; Nikolova, M. Algorithms for finding global minimizers of image segmentation and denoising models. SIAM J. Appl. Math. 2006, 66, 1632–1648.
Figure 1. The graphs of the β-divergence D_β(b|u), which is based on the proposed extended logarithmic function ln_α(u) in (4). (a) and (b) show D_β(b|u) for β ≥ 1 with different choices of b, i.e., b = 1, -1; (c) and (d) show D_β(b|u) for β ≤ 1 with different choices of b, i.e., b = 1, -1. Note that D_β(b|u) with β = 1 is not defined if u ∈ R_{--}.
Figure 2. The graphs of the extended logarithmic function in Definition 1. (a) shows an equivalence class [ln_α(u)]_c = { ln_{α,0}(u), ln_{α,1}(u), ln_{α,2}(u) } with α = 2/3; (b) shows ln_{α,1}(u) = ∫_1^u x^{-α} dx with different choices of α = 2/7, 4/7, 6/7; (c) and (d) show ln_α(u) in (4) for different choices of α. Note that ln_α(u) is an extended logarithmic function without a constant term.
Table 1. We compare the domain of the β-divergence and the domain of the Bregman-divergence associated with the convex function of Legendre type in (19). We note that the domain R_+^n × R_+^n (β > 1) and the domain R_{--}^n × R_{--}^n (β = 0) do not exist in the Bregman-divergence. If we relax the Legendre condition of Φ to a convex and smooth function, then the Bregman-divergence D_Φ also exists in the region β > 1 with dom Φ = R_+^n.
Region | β-Divergence | Bregman-Divergence
entire | β > 1, β ∈ R_e: Ω_L = Ω_R = R^n | β > 1, β ∈ R_e: dom Φ = R^n
entire | β > 1: Ω_L = Ω_R = R_+^n | -
positive | 0 < β ≤ 1: Ω_L = R_+^n, Ω_R = R_{++}^n | 0 < β ≤ 1: dom Φ = R_+^n
positive | β ≤ 0: Ω_L = Ω_R = R_{++}^n | β ≤ 0: dom Φ = R_{++}^n
negative | 0 < β < 1, β ∈ R_e: Ω_L = R_-^n, Ω_R = R_{--}^n | 0 < β < 1, β ∈ R_e: dom Φ = R_-^n
negative | β ≤ 0, β ∈ R_e: Ω_L = Ω_R = R_{--}^n | β < 0, β ∈ R_e: dom Φ = R_{--}^n
Table 2. The domain and range of the extended logarithmic function ln_α(x) defined in (4).
 | α = 1 | α < 1, α ∈ R_o | α < 1, α ∈ R_e | α < 1, α ∈ R_x | α > 1, α ∈ R_o | α > 1, α ∈ R_e | α > 1, α ∈ R_x
dom(ln_α) | R_{++} | R | R | R_+ | R_{++} or R_{--} | R_{++} or R_{--} | R_{++}
range(ln_α) | R | R_+ | R | R_+ | R_{--} | R_{--} or R_{++} | R_{--}
Table 3. A classification of the domain Ω_L × Ω_R = { (b, u) ∈ R^n × R^n | D_β(b|u) ∈ R_+ } of the β-divergence in terms of β.
β | Ω_L × Ω_R
β > 1 and β ∉ R_e | R_+^n × R_+^n
β > 1 and β ∈ R_e | R^n × R^n
0 < β ≤ 1 and β ∉ R_e | R_+^n × R_{++}^n
0 < β ≤ 1 and β ∈ R_e | R_+^n × R_{++}^n or R_-^n × R_{--}^n
β ≤ 0 and β ∉ R_e | R_{++}^n × R_{++}^n
β ≤ 0 and β ∈ R_e | R_{++}^n × R_{++}^n or R_{--}^n × R_{--}^n
