**3. The Duo Fenchel–Young Divergence and Its Corresponding Duo Bregman Divergence**

Inspired by the formula of Equation (29), we define the *duo Fenchel–Young divergence* using a *dominance condition* on a pair $(F\_1(\theta), F\_2(\theta))$ of strictly convex generators.

**Definition 1** (duo Fenchel–Young divergence)**.** *Let $F\_1(\theta)$ and $F\_2(\theta)$ be two strictly convex functions such that $F\_1(\theta) \geq F\_2(\theta)$ for any $\theta \in \Theta\_{12} = \mathrm{dom}(F\_1) \cap \mathrm{dom}(F\_2)$. Then the duo Fenchel–Young divergence $Y\_{F\_1, F\_2^\*}(\theta, \eta')$ is defined by*

$$Y\_{F\_1, F\_2^\*}(\theta, \eta') := F\_1(\theta) + F\_2^\*(\eta') - \theta^\top \eta'. \tag{35}$$

When $F\_1(\theta) = F\_2(\theta) =: F(\theta)$, we have $F\_1^\*(\eta) = F\_2^\*(\eta) =: F^\*(\eta)$, and we retrieve the ordinary Fenchel–Young divergence [17]:

$$Y\_{F,F^\*}(\theta,\eta') := F(\theta) + F^\*(\eta') - \theta^\top \eta' \ge 0. \tag{36}$$

Note that in Equation (35), we have $\eta' = \nabla F\_2(\theta')$.

**Property 1** (Non-negative duo Fenchel–Young divergence)**.** *The duo Fenchel–Young divergence is always non-negative.*

**Proof.** The proof relies on the reverse dominance property of strictly convex and differentiable conjugate functions:

**Lemma 1** (Reverse majorization order of functions by the Legendre–Fenchel transform)**.** *Let $F\_1(\theta)$ and $F\_2(\theta)$ be two Legendre-type convex functions [14]. If $F\_1(\theta) \geq F\_2(\theta)$, then $F\_2^\*(\eta) \geq F\_1^\*(\eta)$.*

**Proof.** This property is graphically illustrated in Figure 3. The reverse dominance property of the Legendre–Fenchel transformation can be checked algebraically as follows:

$$F\_1^\*(\eta) \quad = \quad \sup\_{\theta \in \Theta} \{ \eta^\top \theta - F\_1(\theta) \},\tag{37}$$

$$= \quad \eta^\top \theta\_1 - F\_1(\theta\_1) \qquad \text{(with } \eta = \nabla F\_1(\theta\_1)\text{)},\tag{38}$$

$$\leq \quad \eta^\top \theta\_1 - F\_2(\theta\_1) \tag{39}$$

$$\leq \quad \sup\_{\theta \in \Theta} \{ \eta^\top \theta - F\_2(\theta) \} = F\_2^\*(\eta). \tag{40}$$

Thus we have $F\_1^\*(\eta) \leq F\_2^\*(\eta)$ when $F\_1(\theta) \geq F\_2(\theta)$. Therefore it follows that $Y\_{F\_1,F\_2^\*}(\theta, \eta') \geq 0$ since we have

$$Y\_{F\_1, F\_2^\*} (\theta, \eta') \quad := \quad F\_1(\theta) + F\_2^\*(\eta') - \theta^\top \eta',\tag{41}$$

$$\geq \quad F\_1(\theta) + F\_1^\*(\eta') - \theta^\top \eta' = Y\_{F\_1, F\_1^\*}(\theta, \eta') \geq 0,\tag{42}$$

where $Y\_{F\_1,F\_1^\*}$ is the ordinary Fenchel–Young divergence, which is guaranteed to be non-negative by the Fenchel–Young inequality.
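The reverse dominance of Lemma 1 and the non-negativity of Property 1 can be checked numerically. The following Python sketch approximates the Legendre–Fenchel transform by a maximum over a finite grid. The generators `F1` and `F2`, the grid bounds, and the helper name `conjugate_on_grid` are illustrative assumptions, not taken from the text.

```python
import numpy as np

def conjugate_on_grid(F, thetas, etas):
    """Approximate F*(eta) = sup_theta {eta * theta - F(theta)} by a max over a grid."""
    # Row i holds eta_i * theta - F(theta) for every grid point theta.
    values = etas[:, None] * thetas[None, :] - F(thetas)[None, :]
    return values.max(axis=1)

# Hypothetical 1D generators chosen so that F1 >= F2 everywhere.
F1 = lambda t: t**2 + 1.0
F2 = lambda t: 0.5 * t**2

thetas = np.linspace(-5.0, 5.0, 2001)
etas = np.linspace(-2.0, 2.0, 41)

F1_star = conjugate_on_grid(F1, thetas, etas)
F2_star = conjugate_on_grid(F2, thetas, etas)

# Lemma 1: the conjugates are ordered the other way round.
lemma1_holds = bool(np.all(F1_star <= F2_star))

# Property 1: Y_{F1,F2*}(theta, eta') = F1(theta) + F2*(eta') - theta * eta' >= 0,
# evaluated for every (eta', theta) pair of the two grids.
Y = F1(thetas)[None, :] + F2_star[:, None] - etas[:, None] * thetas[None, :]
min_Y = float(Y.min())
```

Because both conjugates are maximized over the same grid, the ordering `F1_star <= F2_star` holds pointwise on the grid as soon as `F1 >= F2` does, which mirrors steps (38) to (40) of the proof.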

**Figure 3.** (**a**) Visual illustration of the Legendre–Fenchel transformation: $F^\*(\eta)$ is measured as the vertical gap (long black double-headed arrow on the left) between the origin and the hyperplane of "slope" $\eta$ tangent to $F(\theta)$, evaluated at $\theta = 0$. (**b**) The Legendre transforms $F\_1^\*(\eta)$ and $F\_2^\*(\eta)$ of two functions $F\_1(\theta)$ and $F\_2(\theta)$ such that $F\_1(\theta) \geq F\_2(\theta)$ reverse the dominance order: $F\_2^\*(\eta) \geq F\_1^\*(\eta)$.

We can express the duo Fenchel–Young divergence in the primal coordinate system as a generalization of the Bregman divergence to two generators, which we term the duo Bregman divergence (see Figure 4):

$$B\_{F\_1, F\_2}(\theta : \theta') := Y\_{F\_1, F\_2^\*}(\theta, \eta') = F\_1(\theta) - F\_2(\theta') - (\theta - \theta')^\top \nabla F\_2(\theta'), \tag{43}$$

with $\eta' = \nabla F\_2(\theta')$.

This generalized Bregman divergence is non-negative when $F\_1(\theta) \geq F\_2(\theta)$. Indeed, we check that

$$B\_{F\_1, F\_2}(\theta : \theta') \quad = \quad F\_1(\theta) - F\_2(\theta') - (\theta - \theta')^\top \nabla F\_2(\theta'), \tag{44}$$

$$\geq \quad F\_2(\theta) - F\_2(\theta') - (\theta - \theta')^\top \nabla F\_2(\theta') = B\_{F\_2}(\theta : \theta') \geq 0. \tag{45}$$

**Figure 4.** The duo Bregman divergence induced by two strictly convex and differentiable functions $F\_1$ and $F\_2$ such that $F\_1(\theta) \geq F\_2(\theta)$. We check graphically that $B\_{F\_1,F\_2}(\theta : \theta') \geq B\_{F\_2}(\theta : \theta')$ (vertical gaps).

**Definition 2** (duo Bregman divergence)**.** *Let $F\_1(\theta)$ and $F\_2(\theta)$ be two strictly convex functions such that $F\_1(\theta) \geq F\_2(\theta)$ for any $\theta \in \Theta\_{12} = \mathrm{dom}(F\_1) \cap \mathrm{dom}(F\_2)$. Then the generalized Bregman divergence is defined by*

$$B\_{F\_1, F\_2}(\theta : \theta') = F\_1(\theta) - F\_2(\theta') - (\theta - \theta')^\top \nabla F\_2(\theta') \ge 0. \tag{46}$$
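As a minimal sketch of Definition 2, the divergence can be written as a small Python function that takes the two generators and the gradient of the second one. The function name `duo_bregman` and the generator pair used below (a sum of hyperbolic cosines dominating a shifted quadratic) are assumptions chosen for illustration only.

```python
import numpy as np

def duo_bregman(F1, F2, grad_F2, theta, theta_p):
    """B_{F1,F2}(theta : theta') = F1(theta) - F2(theta') - (theta - theta')^T grad F2(theta')."""
    theta = np.asarray(theta, dtype=float)
    theta_p = np.asarray(theta_p, dtype=float)
    return float(F1(theta) - F2(theta_p) - np.dot(theta - theta_p, grad_F2(theta_p)))

# Hypothetical separable generators: cosh(x) >= 1 + x^2 / 2 for every real x,
# so F1 dominates F2 on the whole space.
F1 = lambda t: np.sum(np.cosh(t))
F2 = lambda t: np.sum(1.0 + 0.5 * t**2)
grad_F2 = lambda t: t

rng = np.random.default_rng(0)
pairs = [(rng.normal(size=3), rng.normal(size=3)) for _ in range(100)]

duo = [duo_bregman(F1, F2, grad_F2, t, tp) for t, tp in pairs]
# The ordinary Bregman divergence of F2 is the special case F1 = F2.
ordinary = [duo_bregman(F2, F2, grad_F2, t, tp) for t, tp in pairs]
```

On these samples each duo value should be at least the corresponding ordinary Bregman value, in line with Equations (44) and (45).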

**Example 2.** *Consider $F\_1(\theta) = \frac{a}{2}\theta^2$ for $a > 0$. We have $\eta = a\theta$, $\theta = \frac{\eta}{a}$, and*

$$F\_1^\*(\eta) = \frac{\eta^2}{a} - \frac{a}{2} \frac{\eta^2}{a^2} = \frac{\eta^2}{2a}. \tag{47}$$

*Let $F\_2(\theta) = \frac{1}{2}\theta^2$ so that $F\_1(\theta) \geq F\_2(\theta)$ for $a \geq 1$. We check that $F\_1^\*(\eta) = \frac{\eta^2}{2a} \leq F\_2^\*(\eta)$ when $a \geq 1$. The duo Fenchel–Young divergence is*

$$Y\_{F\_1, F\_2^\*} (\theta, \eta') = \frac{a}{2} \theta^2 + \frac{1}{2} \eta'^2 - \theta \eta' \ge 0,\tag{48}$$

*when $a \geq 1$. We can express the duo Fenchel–Young divergence in the primal coordinate system as*

$$B\_{F\_1,F\_2}(\theta : \theta') = \frac{a}{2}\theta^2 + \frac{1}{2}\theta'^2 - \theta\theta'.\tag{49}$$

*When $a = 1$, $F\_1(\theta) = F\_2(\theta) = \frac{1}{2}\theta^2 =: F(\theta)$, and we obtain $B\_F(\theta : \theta') = \frac{1}{2}\lVert \theta - \theta' \rVert\_2^2$, half the squared Euclidean distance as expected. Figure 5 plots the duo Bregman divergence for several values of $a$.*

**Figure 5.** The duo half squared Euclidean distance $D\_a^2(\theta : \theta') := \frac{a}{2}\theta^2 + \frac{1}{2}\theta'^2 - \theta\theta'$ is non-negative when $a \geq 1$: (**a**) half squared Euclidean distance ($a = 1$), (**b**) $a = 2$, (**c**) $a = \frac{1}{2}$, which shows that the divergence can be negative when $a < 1$.
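The role of the dominance condition in Example 2 can be seen on a few hand-picked points. The short Python sketch below evaluates the closed form of Equation (49); the function name and the evaluation points are illustrative choices, not taken from the text.

```python
def duo_half_squared_distance(theta, theta_p, a):
    """Equation (49): (a/2) * theta^2 + (1/2) * theta'^2 - theta * theta'."""
    return 0.5 * a * theta**2 + 0.5 * theta_p**2 - theta * theta_p

# a = 1: ordinary half squared Euclidean distance, here 0.5 * (3 - 1)^2.
d_equal = duo_half_squared_distance(3.0, 1.0, a=1.0)

# a = 2 >= 1: the dominance condition holds and the value stays non-negative,
# even at theta = theta' where the ordinary Bregman divergence vanishes.
d_dominated = duo_half_squared_distance(1.0, 1.0, a=2.0)

# a = 1/2 < 1: the dominance condition fails and the value becomes negative.
d_violated = duo_half_squared_distance(1.0, 1.0, a=0.5)
```

For $a \geq 1$ the expression equals the half squared Euclidean distance plus the extra term (a - 1)/2 · θ², which explains both the non-negativity and the fact that the duo divergence need not vanish when the two arguments coincide.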

**Example 3.** *Consider $F\_1(\theta) = \theta^2$ and $F\_2(\theta) = \theta^4$ on the domain $\Theta = [0, 1]$. We have $F\_1(\theta) \geq F\_2(\theta)$ for $\theta \in \Theta$. The convex conjugate of $F\_1(\theta)$ is $F\_1^\*(\eta) = \frac{1}{4}\eta^2$. We have*

$$F\_2^\*(\eta) = \eta^{\frac{4}{3}} \left( \left(\frac{1}{4}\right)^{\frac{1}{3}} - \left(\frac{1}{4}\right)^{\frac{4}{3}} \right) = \frac{3}{4^{\frac{4}{3}}} \eta^{\frac{4}{3}} \tag{50}$$

*with $\eta\_2(\theta) = 4\theta^3$. Figure 6 plots the convex functions $F\_1(\theta)$ and $F\_2(\theta)$, and their convex conjugates $F\_1^\*(\eta)$ and $F\_2^\*(\eta)$. We observe that $F\_1(\theta) \geq F\_2(\theta)$ on $\theta \in [0, 1]$ and that $F\_1^\*(\eta) \leq F\_2^\*(\eta)$ on $H = [0, 2]$.*
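The conjugates of Example 3 can be cross-checked numerically. The Python sketch below compares a grid-based supremum over Θ = [0, 1] with the values obtained from the stationarity conditions η = 2θ and η = 4θ³; substituting θ = (η/4)^(1/3) into ηθ − θ⁴ gives the coefficient 3 / 4^(4/3). The grid sizes and the helper name `conjugate_on_grid` are assumptions made for this illustration.

```python
import numpy as np

def conjugate_on_grid(F, thetas, etas):
    """Approximate F*(eta) = sup_theta {eta * theta - F(theta)} by a max over a grid."""
    values = etas[:, None] * thetas[None, :] - F(thetas)[None, :]
    return values.max(axis=1)

thetas = np.linspace(0.0, 1.0, 20001)   # primal domain Theta = [0, 1]
etas = np.linspace(0.0, 2.0, 41)        # dual domain H = [0, 2]

F1_star_grid = conjugate_on_grid(lambda t: t**2, thetas, etas)
F2_star_grid = conjugate_on_grid(lambda t: t**4, thetas, etas)

# Closed forms: the maximizers are theta = eta / 2 and theta = (eta / 4)^(1/3),
# both of which lie in [0, 1] for eta in [0, 2].
F1_star_closed = etas**2 / 4.0
theta_opt = (etas / 4.0) ** (1.0 / 3.0)
F2_star_closed = etas * theta_opt - theta_opt**4

# Simplified coefficient of eta^(4/3) in the conjugate of theta^4.
coeff = 3.0 / 4.0 ** (4.0 / 3.0)

max_err_F1 = float(np.max(np.abs(F1_star_grid - F1_star_closed)))
max_err_F2 = float(np.max(np.abs(F2_star_grid - F2_star_closed)))
dominance_reversed = bool(np.all(F1_star_closed <= F2_star_closed))
```

The two error values measure only the discretization of the supremum, so they shrink as the grid over θ is refined.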

We now state a property relating the dual duo Fenchel–Young and Bregman divergences:

**Figure 6.** The Legendre transform reverses the dominance ordering: $F\_1(\theta) = \theta^2 \geq F\_2(\theta) = \theta^4 \Leftrightarrow F\_1^\*(\eta) \leq F\_2^\*(\eta)$ for $\theta \in [0, 1]$.

**Property 2** (Dual duo Fenchel–Young and Bregman divergences)**.** *We have*

$$Y\_{F\_1, F\_2^\*} (\theta : \eta') = B\_{F\_1, F\_2} (\theta : \theta') = B\_{F\_2^\*, F\_1^\*} (\eta' : \eta) = Y\_{F\_2^\*, F\_1} (\eta' : \theta). \tag{51}$$

**Proof.** From the equality case of the Fenchel–Young inequality, we have $F\_1(\theta) = \theta^\top \eta - F\_1^\*(\eta)$ for $\eta = \nabla F\_1(\theta)$ and $F\_2(\theta') = \theta'^\top \eta' - F\_2^\*(\eta')$ with $\eta' = \nabla F\_2(\theta')$. Thus we have

$$B\_{F\_1, F\_2}(\theta : \theta') \quad = \quad F\_1(\theta) - F\_2(\theta') - (\theta - \theta')^\top \nabla F\_2(\theta'),\tag{52}$$

$$= \quad \theta^\top \eta - F\_1^\*(\eta) - \theta'^\top \eta' + F\_2^\*(\eta') - (\theta - \theta')^\top \eta', \tag{53}$$

$$= \quad F\_2^\*(\eta') - F\_1^\*(\eta) - (\eta' - \eta)^\top \theta, \tag{54}$$

$$= \quad B\_{F\_2^\*, F\_1^\*} (\eta' : \eta). \tag{55}$$

Recall that $F\_1(\theta) \geq F\_2(\theta)$ implies that $F\_1^\*(\eta) \leq F\_2^\*(\eta)$ (Lemma 1) and that $\theta = \nabla F\_1^\*(\eta)$. Therefore the dual duo Bregman divergence is non-negative:

$$\begin{split} B\_{F\_2^\*,F\_1^\*}(\eta' : \eta) &= F\_2^\*(\eta') - F\_1^\*(\eta) - (\eta' - \eta)^\top \theta \\ &\geq \underbrace{F\_1^\*(\eta') - F\_1^\*(\eta) - (\eta' - \eta)^\top \nabla F\_1^\*(\eta)}\_{B\_{F\_1^\*}(\eta' : \eta) \geq 0}. \end{split}$$
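The chain of identities in Property 2 can be verified on a concrete case. The Python sketch below reuses the quadratic generators of Example 2, for which all conjugates and gradients are available in closed form; the value a = 2 and the evaluation points are arbitrary choices made for illustration.

```python
import math

a = 2.0  # any a >= 1 satisfies the dominance condition of Example 2

F1 = lambda t: 0.5 * a * t**2
F2 = lambda t: 0.5 * t**2
F1_star = lambda e: e**2 / (2.0 * a)
F2_star = lambda e: 0.5 * e**2
grad_F1 = lambda t: a * t
grad_F2 = lambda t: t
grad_F1_star = lambda e: e / a

theta, theta_p = 0.7, -1.3
# Dual coordinates: eta = grad F1(theta) and eta' = grad F2(theta').
eta, eta_p = grad_F1(theta), grad_F2(theta_p)

# The four expressions of Equation (51).
Y_primal_dual = F1(theta) + F2_star(eta_p) - theta * eta_p
B_primal = F1(theta) - F2(theta_p) - (theta - theta_p) * grad_F2(theta_p)
B_dual = F2_star(eta_p) - F1_star(eta) - (eta_p - eta) * grad_F1_star(eta)
Y_dual_primal = F2_star(eta_p) + F1(theta) - eta_p * theta
```

Note that the dual divergence uses the gradient of the conjugate of the first generator, which maps η back to θ, as in Equation (54).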
