Abstract
The Diminishing-Return (DR)-submodular function maximization problem has attracted significant attention across various domains in recent years. Classic methods typically employ continuous greedy or Frank–Wolfe frameworks to tackle this problem; however, they require many iterations and costly subproblem solves to control the approximation ratio effectively. In this paper, we introduce a strategy that uses binary search to determine a dynamic stepsize and integrate it into traditional algorithmic frameworks to address problems with different constraint types. We show that algorithms using this dynamic stepsize strategy achieve approximation ratios comparable to those obtained with a fixed stepsize. In the monotone case, the iteration complexity is , while in the non-monotone scenario it is , where F denotes the objective function. We then apply this strategy to stochastic DR-submodular function maximization problems, obtaining corresponding iteration complexity results in high-probability form. Furthermore, theoretical examples as well as numerical experiments validate that this stepsize selection strategy outperforms the fixed stepsize strategy.
Keywords:
DR-submodular; approximation algorithm; dynamic stepsizes; computational complexity; stochastic optimization

MSC:
90C27; 68W25; 65Y20
1. Introduction
The problem of maximizing DR-submodular functions, which generalizes set submodular functions to more general domains such as integer lattices and box regions in Euclidean spaces, has emerged as a prominent research topic in optimization. Set submodular functions inherently capture the diminishing returns property, where the marginal gain of adding an element to a set decreases as the set expands. DR-submodular functions extend this fundamental property to continuous and mixed-integer domains, enabling broader applications in machine learning, graph theory, economics, and operations research [1,2,3,4]. This extension not only addresses practical problems with continuous variables but also provides a unified framework for solving set submodular maximization through continuous relaxation techniques [5,6].
The problem of deterministic DR-submodular function maximization considered in this paper can be formally written as follows:
where the feasible set is compact and convex, and is a differentiable DR-submodular function. While this problem is generally NP-hard, approximation algorithms with constant approximation ratios can be developed under certain structural assumptions. For unconstrained scenarios, Niazadeh et al. [7] established a tight 1/2-approximation algorithm, aligning with classical results for unconstrained submodular maximization. In constrained settings, monotonicity plays a critical role: convex-constrained monotone DR-submodular maximization admits a (1 − 1/e)-approximation [1], whereas non-monotone cases under down-closed constraints achieve a 1/e-approximation [1], with recent improvements in [8]. For general convex constraints containing the origin, a 1/4-approximation guarantee is attainable [9,10,11].
While deterministic DR-submodular maximization has been extensively studied, practical scenarios often involve uncertainties where the objective function can only be accessed through stochastic evaluations. This motivates the investigation of stochastic DR-submodular maximization problems, which are typically formulated as follows:
where the DR-submodular function is defined as the expectation of stochastic functions with . Building upon the Lyapunov framework established for deterministic problems [10], recent works like [12,13] have developed stochastic variants of continuous greedy algorithms. Specifically, Lian et al. [14] proposed SPIDER-based methods that reduce gradient evaluation complexity from in earlier works [12] to through variance reduction techniques.
The Lyapunov framework proposed by Du et al. [9,10] provides a unified perspective for analyzing DR-submodular maximization algorithms. By modeling algorithms as discretizations of ordinary differential equations (ODEs) in the time domain , this approach establishes a direct connection between continuous-time dynamics and discrete-time implementations. Specifically, the approximation ratio of discrete algorithms differs from their continuous counterparts by a residual term that diminishes as the stepsize approaches zero. However, the use of constant stepsizes in this framework imposes fundamental limitations: the required number of iterations grows inversely with stepsize magnitude, leading to linear computational complexity both in theory and practical implementations.
To address the limitations of fixed stepsize strategies, recent advances have explored dynamic stepsize adaptation for submodular optimization. For box-constrained DR-submodular maximization, Chen et al. [15] developed a -approximation algorithm with adaptive rounds, where stepsizes are selected through enumeration over a candidate set of size . Furthermore, Ene et al. [16] achieved -approximation for non-monotone cases using parallel rounds, and -approximation for monotone cases with rounds. These works inspire our development of binary search-based dynamic stepsizes that achieve comparable approximation guarantees while reducing computational complexity.
The dynamic stepsize strategy in this paper leverages binary search to approximate the solution of a univariate equation: intervals are selected according to the sign of the function value at the midpoint, and convergence is guaranteed by monotonicity and continuity. While stochastic methods such as simulated annealing [17] address non-convex/stochastic problems via probabilistic acceptance criteria, their guarantees depend on cooling schedules and lack deterministic convergence. In contrast, our binary search framework exploits the monotonicity and continuity of the stepsize equation (Equation (38)), obtaining sufficiently precise solutions with guaranteed efficiency, without any dependence on cooling parameters, and resting on theoretical foundations tailored to DR-submodular structure.
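To make the binary search primitive concrete, the following minimal Python sketch shows how an interval is halved according to the sign of the midpoint value; the function h, the bracket endpoints, and the tolerance are illustrative placeholders rather than the exact quantities appearing in Equation (38).

```python
import numpy as np

def bisection_root(h, lo, hi, tol=1e-8, max_steps=64):
    """Approximate a root of a continuous, monotone univariate function h on [lo, hi].

    Assumes h(lo) and h(hi) have opposite signs; the interval is halved until its
    length drops below tol, so roughly log2((hi - lo) / tol) evaluations of h suffice.
    """
    f_lo = h(lo)
    for _ in range(max_steps):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        f_mid = h(mid)
        # Keep the half-interval whose endpoints still bracket the root.
        if np.sign(f_mid) == np.sign(f_lo):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy usage: solve h(gamma) = 0 for a decreasing linear function.
gamma = bisection_root(lambda g: 1.0 - 3.0 * g, 0.0, 1.0)
print(gamma)  # approximately 1/3
```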
1.1. Contributions
This paper introduces a novel dynamic stepsize strategy for DR-submodular maximization problems, offering significant improvements over traditional fixed stepsize methods. Our approach achieves state-of-the-art approximation guarantees while reducing computational complexity. Notably, the iteration complexity of our algorithms is independent of the smoothness parameter L. In the monotone case, it is also independent of the variable dimension n. Furthermore, both the gradient evaluation complexity and function evaluation complexity exhibit only a logarithmic dependence on the problem dimension n and the smoothness parameter L. Below, we summarize the key contributions:
- Deterministic DR-Submodular Maximization: For deterministic settings, our dynamic stepsize strategy achieves the following complexity bounds:
  - In the monotone case, the iteration complexity is , where reflects the gradient norm at the origin, and denotes the discretization error.
  - For non-monotone functions, the iteration complexity increases to , accounting for the added challenge posed by non-monotonicity.

  To determine the stepsize dynamically, we employ a binary search procedure, introducing an additional factor of to the evaluation complexity.
- Stochastic DR-Submodular Maximization: Extending our approach to stochastic settings, we achieve comparable complexity results with high probability:
  - For monotone objective functions, the iteration complexity remains .
  - In the non-monotone case, the complexity is .

  These results demonstrate that our method maintains efficiency regardless of the smoothness parameter L, making it particularly suitable for large-scale stochastic optimization problems.
- Empirical Validation: We validate the effectiveness of our dynamic stepsize strategy through three examples: multilinear extensions of set submodular functions, DR-submodular quadratic functions, and softmax extensions for determinantal point processes (DPPs). The results confirm that our approach outperforms fixed stepsize strategies in terms of both iteration complexity and practical performance.
Table 1 provides a unified overview of our algorithms’ theoretical guarantees and computational complexities (iteration and gradient evaluation bounds) under diverse problem settings, enabling readers to rapidly grasp the efficiency and adaptability of our dynamic stepsize framework.
Table 1.
Algorithms and theoretical guarantees in this paper. D = Deterministic; S = Stochastic; M = Monotone; NM = Non-Monotone; DC = Down-closed; GC = General convex; Grad Eval = Gradient evaluation complexity; S-Grad Eval = Single (per-sample) gradient evaluation complexity.
1.2. Organizations
The organization of the rest of this manuscript is as follows. Section 2 introduces the fundamental concepts and key results that form the basis of our work. In Section 3, we outline the design principles of our dynamic stepsize strategy and establish theoretical guarantees for both monotone and non-monotone deterministic objective functions. Section 4 extends our approach to stochastic settings, presenting algorithms and analyses tailored for monotone and non-monotone DR-submodular functions under uncertainty. In Section 5, we evaluate the computational efficiency of our strategy through its application to three canonical DR-submodular functions, with comprehensive numerical experiments validating the efficacy of the dynamic stepsize approach. Section 6 summarizes our key findings while discussing both limitations and promising future research directions.
2. Preliminaries
We begin by introducing the formal definition of a non-negative DR-submodular function defined on the continuous domain , along with some fundamental properties.
Definition 1.
A function F is said to be DR-submodular if for any two vectors $x, y$ satisfying $x \le y$ (coordinate-wise) and any scalar $k \ge 0$ such that $x + k e_i$ and $y + k e_i$ remain in the domain, the following inequality holds:

$F(x + k e_i) - F(x) \ge F(y + k e_i) - F(y)$

for all $i = 1, \dots, n$. Here, $e_i$ represents the i-th standard basis vector in $\mathbb{R}^n$.
This property reflects the diminishing returns behavior of F along each coordinate direction. Specifically, the marginal gain of increasing a single coordinate diminishes as the input vector grows larger.
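As a concrete illustration of Definition 1, the short Python check below verifies the diminishing-returns inequality along one coordinate for a small quadratic instance; the matrix A, vector b, and test points are illustrative choices only, not objects from the paper.

```python
import numpy as np

# F(x) = 0.5 * x @ A @ x + b @ x with A <= 0 entrywise is DR-submodular.
A = np.array([[-2.0, -1.0], [-1.0, -3.0]])
b = np.array([4.0, 5.0])
F = lambda x: 0.5 * x @ A @ x + b @ x

x = np.array([0.1, 0.2])           # x <= y coordinate-wise
y = np.array([0.5, 0.6])
k, i = 0.3, 0                      # step k >= 0 along coordinate i
e_i = np.eye(2)[i]

gain_at_x = F(x + k * e_i) - F(x)  # marginal gain at the smaller point
gain_at_y = F(y + k * e_i) - F(y)  # marginal gain at the larger point
assert gain_at_x >= gain_at_y      # diminishing returns: the gain shrinks as the point grows
```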
To facilitate further discussions, we introduce additional notation. Throughout this paper, the inequality $x \le y$ for two vectors means that $x_i \le y_i$ holds for all $i$. Additionally, the operation $x \vee y$ is defined as the coordinate-wise maximum, and $x \wedge y$ is defined as the coordinate-wise minimum.
An important result is that, in the differentiable case, DR-submodularity is equivalent to antitonicity of the gradient. Specifically, when F is differentiable, F is DR-submodular if and only if [2]
$\nabla F(x) \ge \nabla F(y)$ whenever $x \le y$.
Another essential property for differentiable DR-submodular functions that will be used in this paper is derived from the concavity-like behavior in non-negative directions, as stated in the following proposition.
Proposition 1
([18]). When F is differentiable and DR-submodular, then
In this paper, we also require the function F to be L-smooth, meaning that for any $x, y$ in the domain, there holds
$\|\nabla F(x) - \nabla F(y)\| \le L \|x - y\|,$
where $\|\cdot\|$ denotes the Euclidean norm unless otherwise specified. An important property of L-smooth functions is that they satisfy the following necessary (but not sufficient) condition:
$|F(y) - F(x) - \langle \nabla F(x), y - x \rangle| \le \tfrac{L}{2}\|y - x\|^2.$
For the stochastic DR-submodular maximization problem (2), we introduce additional notations to describe the stochastic approximation of the objective function’s full gradient.
- At each iteration j, let denote a random subset of samples drawn from , with m representing the size of .
- The stochastic gradient at x is computed as
- We use an unbiased estimator to approximate the true gradient .
All algorithms and theoretical analyses in this paper for problems (1) and (2) rely on the following foundational assumption:
Assumption 1.
The problems under consideration satisfy these conditions:
- 1.
- is DR-submodular and L-smooth.
- 2.
- and .
- 3.
- A Linear-Objective Optimization (LOO) oracle is available, providing solutions to $\max_{v \in P} \langle g, v \rangle$ for any query vector $g$.
The following assumption is essential for the stochastic problem (2).
Assumption 2.
The stochastic gradient is unbiased—i.e.,
This assumption ensures that the mini-batch gradient estimator satisfies
where denotes the random mini-batch sampled at iteration j. This property is critical for deriving high-probability guarantees in stochastic optimization.
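The following sketch illustrates the mini-batch construction described above under Assumption 2; the function name grad_sample, the uniform sampling scheme, and the toy data are assumptions made for illustration rather than the paper's notation.

```python
import numpy as np

def minibatch_gradient(grad_sample, x, samples, m, rng):
    """Mini-batch gradient estimator for the stochastic problem (2) (hedged sketch).

    grad_sample(x, z) is assumed to return an unbiased per-sample gradient;
    averaging over a uniformly drawn batch of size m keeps the estimator
    unbiased while shrinking its variance by roughly a factor of 1/m.
    """
    idx = rng.choice(len(samples), size=m, replace=False)
    return np.mean([grad_sample(x, samples[z]) for z in idx], axis=0)

# Toy usage: per-sample gradients are noisy copies of a simple deterministic gradient.
rng = np.random.default_rng(0)
samples = [rng.standard_normal(3) for _ in range(100)]   # noise vectors z
grad_sample = lambda x, z: -x + np.ones(3) + z           # unbiased around -x + 1
print(minibatch_gradient(grad_sample, np.zeros(3), samples, m=20, rng=rng))
```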
Lyapunov Method for DR-Submodular Maximization
As discussed in [10], Lyapunov functions play a crucial role in the analysis of algorithms. Depending on the specific problem, the Lyapunov function can take various parametric forms. Taking the monotone DR-submodular maximization problem for example, the ideal algorithm can be designed as follows:
A unified parameterized form of the Lyapunov function is given by:
where and are time-dependent parameters.
The monotonicity of the Lyapunov function is closely tied to the approximation ratio of the algorithm, as demonstrated by the following inequality:
where represents the optimal solution. The specific values of , , and T depend on the problem under consideration and are chosen accordingly to achieve the desired theoretical guarantees. In this problem, choosing guarantees the monotonicity of , and the resulting approximation ratio for monotone DR-submodular functions is:
For maximizing non-monotone DR-submodular functions with down-closed constraints, the ideal algorithm can be designed as follows:
In this problem, let , , and . Then the best approximation ratio for non-monotone DR-submodular functions is:
where the feasible region P is down-closed.
For maximizing non-monotone DR-submodular functions with general convex constraints, the ideal algorithm can be designed as follows:
In this problem, let , , and . Then the best approximation ratio for non-monotone DR-submodular functions is:
where the feasible region P is only assumed to be convex.
In this paper, we focus on the same algorithmic ODE forms as those discussed above. However, our key improvement lies in the discretization process. Specifically, we aim to enhance the iteration complexity by employing a dynamic stepsize strategy. This approach allows for more efficient approximations while maintaining the desired theoretical guarantees, thereby advancing the state-of-the-art in DR-submodular maximization algorithms.
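For orientation, the sketch below shows the standard fixed-stepsize Euler discretization of such an ODE-based continuous greedy scheme, in which the iteration count K is tied to the constant stepsize 1/K; the dynamic strategy developed in the following sections replaces this fixed step with one found by binary search. Names such as loo_oracle are illustrative placeholders.

```python
import numpy as np

def euler_discretize_cg(grad_F, loo_oracle, x0, num_iters):
    """Fixed-stepsize Euler discretization of the continuous-greedy ODE (hedged sketch).

    The continuous-time dynamics move along an LOO direction over a unit time horizon;
    the classical discretization uses the constant stepsize 1/K, so the number of
    iterations K scales inversely with the stepsize.
    """
    x, gamma = np.array(x0, dtype=float), 1.0 / num_iters
    for _ in range(num_iters):
        v = loo_oracle(grad_F(x))   # v maximizing <grad F(x), v> over the feasible set P
        x = x + gamma * v           # Euler step with constant stepsize 1/K
    return x
```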
3. Deterministic Scenarios
In this section, we discuss the dynamic stepsize algorithms for maximizing deterministic DR-submodular functions, considering two cases: monotone and non-monotone. The fixed stepsize versions of the algorithms discussed in this section already exist in the literature, as documented in [6,9,10].
3.1. Monotone Case
In this subsection, we discuss a dynamic stepsize algorithm designed to maximize a monotone DR-submodular function while ensuring an approximation guarantee. To better illustrate the strategy for selecting the stepsize, we first introduce an idealized algorithm that relies on an oracle capable of solving a univariate continuous monotone equation. Subsequently, we propose a practical and implementable version of the algorithm.
3.1.1. An Idealized Algorithm
The ideal version referred to above is depicted as Algorithm 1. Unlike the fixed stepsize approach used in [10] where , our algorithm determines the stepsize by solving Equation (22). In brief, the stepsize is selected to ensure that the directional derivatives along between successive iterations differ precisely by .
| Algorithm 1: Ideal CG |
Before analyzing the computational complexity and approximation guarantees of Algorithm 1, we must verify the feasibility of its output.
Lemma 1.
Algorithm 1 outputs a solution satisfying .
Proof.
Note that
and . By the feasibility of and the convexity of P, we can prove the conclusion. □
The iteration complexity bound is established as follows.
Lemma 2.
The iteration number K of Algorithm 1 satisfies
Proof.
For , we have
Summing up this inequality from to yields
Noting that by the monotonicity of F, we can conclude that .
By the fact that F is L-smooth and DR-submodular, there holds
Combining these bounds completes the proof. □
As outlined in [10], the iteration complexity of the Frank–Wolfe algorithm for DR-submodular maximization is . We now present the approximation guarantee and complexity results for Algorithm 1.
Theorem 1.
Assume that F is monotone. Then Algorithm 1 returns a solution x satisfying
where denotes the optimal solution, with iteration complexity given by Equation (24).
3.1.2. Algorithm with Binary Search
An oracle for solving Equation (22) exactly is not always feasible in general cases. To address this, we propose employing a binary search technique to compute an approximate solution. This approach preserves the approximation ratio achieved by Algorithm 1, leading to the development of Algorithm 2. The key distinction between these two algorithms lies in the determination of the stepsize at each iteration.
In Algorithm 2, we utilize the bisection method to compute a compensation parameter that satisfies condition (38). The implementation begins by initializing the search interval as . At each iteration, we evaluate the left-hand side of (38) at the midpoint of the current interval and determine whether the value belongs to the right sub-interval. Depending on this evaluation, we systematically discard either the left or right half of the interval and repeat the process. The well-defined nature of this procedure is guaranteed by the monotonicity and continuity of the left-hand side expression with respect to .
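A minimal sketch of this bisection step is given below, assuming the stepsize is chosen so that the directional derivative along the current LOO direction decreases by a prescribed amount; the exact target in Equation (38), the search interval, and the stopping rule in the paper may differ, and all names are illustrative.

```python
import numpy as np

def dynamic_stepsize(grad_F, x, v, eps, gamma_max, tol=1e-6, max_steps=64):
    """Hedged sketch of the binary-search stepsize rule used in Algorithm 2.

    gamma is chosen so that the directional derivative <grad F(x + gamma v), v>
    decreases by roughly eps relative to its value at the current iterate.
    DR-submodularity makes this quantity non-increasing in gamma (for v >= 0),
    so bisection applies.
    """
    target = grad_F(x) @ v - eps          # desired directional-derivative level
    h = lambda g: grad_F(x + g * v) @ v - target
    if h(gamma_max) >= 0.0:               # even the largest admissible step falls short of an eps decrease
        return gamma_max
    lo, hi = 0.0, gamma_max
    for _ in range(max_steps):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if h(mid) >= 0.0:                 # decrease still smaller than eps: search to the right
            lo = mid
        else:
            hi = mid
    return hi                              # right endpoint of the final interval (cf. Lemma 3)

# Toy usage on a DR-submodular quadratic F(x) = 0.5 x'Ax + b'x with A <= 0 (illustrative instance).
A, b = np.array([[-2.0, -1.0], [-1.0, -3.0]]), np.array([4.0, 5.0])
grad_F = lambda x: A @ x + b
print(dynamic_stepsize(grad_F, np.zeros(2), np.array([1.0, 0.0]), eps=0.5, gamma_max=1.0))
```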
| Algorithm 2: Bisection continuous-greedy |
Following a similar analysis to that of Algorithm 1, the number of iterations K in the “while” loop of Algorithm 2 can also be bounded.
Corollary 1.
For Algorithm 2, the iteration number satisfies
To analyze the gradient evaluation complexity of F, it is necessary to examine the number of binary search steps required in each iteration to determine the stepsize.
Lemma 3.
For each iteration , the stepsize can be determined within at most binary search steps.
Proof.
Let represent the exact solution to the equation
Due to the monotonicity and continuity of the univariate function , the binary search process can identify an interval of length that contains . Let denote the right endpoint of this interval. Consequently, we have
By the L-smoothness property of F, it follows that
Thus, we obtain
Additionally, by the monotonicity of , we know
Combining these results with the definition of , the proof is completed. □
The approximation guarantee and oracle complexity of Algorithm 2 can now be derived.
Theorem 2.
Assume F is monotone. Then, Algorithm 2 outputs a solution x satisfying
The LOO oracle complexity is at most , and the gradient evaluation complexity is at most .
Proof.
The complexities of the LOO oracle and gradient evaluations can be established using Corollary 1 and Lemma 3. We now focus on deriving the approximation ratio.
Recall the function defined in (34). For , we have
From the structure of Algorithm 2, the DR-submodularity, and the monotonicity of F, it follows that
Thus, we obtain
Summing up these inequalities for to , we obtain
On the other hand, by the definition of , we have
Combining these results yields the approximation guarantee stated in the theorem. □
3.2. Non-Monotone Case
In this subsection, we discuss the dynamic stepsize strategy for maximizing DR-submodular functions after removing monotonicity, along with its theoretical guarantees. Unlike the monotonic case, here we categorize the constraints into two types: down-closed and general convex types.
3.2.1. Down-Closed Constraint
Algorithm 3 is proposed for scenarios involving down-closed constraints, which means that if $x \in P$ and $0 \le y \le x$, then $y \in P$. The fundamental framework is inspired by the measured continuous greedy (MCG) algorithm introduced in [6], originally proposed for maximizing the multilinear extension relaxation of submodular set functions. In [10], MCG was shown to require iterations to ensure an approximation loss of .
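For reference, a single measured-continuous-greedy update takes the following form in our sketch, where the ascent direction is damped coordinate-wise by (1 − x); this reflects the MCG framework of [6] and omits the stepsize search and bookkeeping specific to Algorithm 3.

```python
import numpy as np

def mcg_update(x, v, gamma):
    """One measured-continuous-greedy step (hedged sketch).

    The ascent direction v is damped coordinate-wise by (1 - x), which keeps the
    iterate inside [0, 1]^n and controls the growth of its coordinates.
    """
    return x + gamma * v * (1.0 - x)

# Starting from the origin, coordinates can never exceed 1.
x = mcg_update(np.zeros(3), v=np.array([1.0, 0.5, 0.0]), gamma=0.2)
```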
| Algorithm 3: Bisection MCG |
The feasibility of the output produced by Algorithm 3 is ensured by the down-closed property of .
Lemma 4.
The solution x generated by Algorithm 3 satisfies .
The proof of Lemma 4 is presented in Appendix A.
The absence of the monotonicity assumption for F necessitates a distinct analysis of the iteration complexity for Algorithm 3 compared to Algorithm 2.
Lemma 5.
For Algorithm 3, the number of iterations K satisfies
Proof.
The upper bound follows analogous reasoning to Corollary 1.
Note that at the -th iteration of the algorithm, we have —i.e., there exists at least one such that , or else we can let and terminate the algorithm.
Now, consider the changing process of the sign of for .
Case I. . In this case, the analysis is analogous to that in Lemma 2 and we have
Case II. . First, we define the iteration index set as the following:
It is obvious that .
For , by the monotonicity of w.r.t. K, we have
since , and are entrywise of the same sign. By the monotonicity of , the number of these iterations can be bounded as
The proof is completed. □
Building upon the preceding analysis, we establish the following approximation guarantee and complexity results for Algorithm 3:
Theorem 3.
For any down-closed feasible set , Algorithm 3 produces a solution x satisfying with LOO oracle calls bounded by
and gradient evaluations at most
Proof.
Redefine the potential function as . Then, for , there holds
Similar to the proof of Theorem 2, we need to prove a lower bound on the difference of function values between two adjacent iteration points when F is non-monotone:
where the third inequality is due to Proposition 1 and the fourth inequality is by Lemma 3 in [1], which implies that the following inequality holds:
for . Additionally, for the upper bound on the -norm of , we have the following claim.
Claim 1.
For all the iteration points , , of Algorithm 3, we have .
The claim can be proved by induction. First, note that . Assume that it holds for some j; then the proof can be finished by showing that
So the above formula yields
By summing Equation (75) over j from 0 to , we obtain
Together with the fact , we obtain the theorem. □
3.2.2. General Convex Constraint
This subsection presents a Frank–Wolfe variant designed for maximizing non-monotone DR-submodular functions under general convex constraints, where stepsizes are determined through binary search on Equation (38). A key distinction from prior methods in [10] and earlier approaches lies in our iterative tracking protocol. Specifically, the method maintains records of both the parameter vectors and their corresponding function evaluations at each iteration. Upon completing the iteration sequence, the procedure outputs the stored point achieving the maximum function value, in contrast to traditional implementations that directly return the final iterate.
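The best-iterate bookkeeping described above can be sketched as follows; step_fn is a placeholder for one Frank–Wolfe update with a binary-search stepsize and is not part of the paper's notation.

```python
def run_with_best_iterate(step_fn, F, x0, num_iters):
    """Hedged sketch of the output rule used by Algorithm 4.

    Instead of returning the final iterate, the procedure records each visited
    point together with its objective value and returns the best one.
    """
    x, best_x, best_val = x0, x0, F(x0)
    for _ in range(num_iters):
        x = step_fn(x)                   # one Frank-Wolfe update (placeholder)
        val = F(x)
        if val > best_val:               # keep the best point seen so far
            best_x, best_val = x, val
    return best_x
```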
The feasibility and approximation characteristics of solution x generated by Algorithm 4 are formally established through the following analytical results.
| Algorithm 4: Bisection Frank–Wolfe |
Lemma 6.
Algorithm 4 produces a feasible solution satisfying .
The proof of Lemma 6 is presented in Appendix B.
Theorem 4.
The solution of Algorithm 4 satisfies
with computational complexity characterized by LOO oracle calls and gradient evaluations.
The proof of Theorem 4 is presented in Appendix C.
4. Stochastic DR-Submodular Function Maximization
This section investigates stochastic maximization of DR-submodular functions under two distinct settings. Section 4.1 focuses on the monotone case, establishing theoretical guarantees for constrained optimization. Building upon this foundation, Section 4.2 extends the analysis to non-monotone scenarios, addressing both down-closed constraints and generalized convex constraints. For the fixed stepsize implementations of stochastic DR-submodular maximization algorithms, we refer readers to [12,14].
4.1. Stochastic Monotone DR-Submodular Maximization
Algorithm 5 implements a SPIDER-CG framework for continuous monotone DR-submodular optimization, integrating binary search for adaptive stepsize selection. Our approach builds on the recursive gradient estimator from [14], where Lian et al. construct a gradient approximation of by adding an unbiased estimator of to , and is given as an unbiased estimator of . Building upon this variance-reduced foundation, we adopt the binary search method to find a proper dynamic stepsize.
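A hedged sketch of this recursive (SPIDER-style) estimator is given below; the function grad_sample and the batching interface are illustrative assumptions rather than the exact construction in [14].

```python
import numpy as np

def spider_update(grad_sample, x_new, x_old, g_old, batch):
    """SPIDER-style recursive gradient estimator (hedged sketch, cf. [14]).

    The new estimate adds to the previous estimate g_old an unbiased mini-batch
    estimate of the gradient difference grad F(x_new) - grad F(x_old):
        g_new = g_old + mean over z in batch of [grad_sample(x_new, z) - grad_sample(x_old, z)].
    """
    diffs = [grad_sample(x_new, z) - grad_sample(x_old, z) for z in batch]
    return g_old + np.mean(diffs, axis=0)
```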
| Algorithm 5: Bisection Stochastic CG |
We have the following theorem on the theoretical results of Algorithm 5.
Theorem 5.
Assume that F is monotone and set . Under Assumption 2, Algorithm 5 outputs a solution x satisfying
With probability , the LOO oracle complexity is bounded by and the gradient evaluation complexity is bounded by
Proof.
Firstly, we provide the proof of the complexity. For , from the deterministic algorithm, we have
According to the Chernoff bounds, let , , , satisfying
For each iteration, the probability of is at least . For all j, the following inequality holds with high probability ,
Then
It is noteworthy that
Then we can obtain
Then we can obtain the complexity with probability . According to Taylor expansion, we have
Denote . The number m of set is at most
Let
From the deterministic case, letting the right endpoint of the interval be yields
From the above, for all j, the following inequality holds with high probability.
Then
Now, we focus on the approximation ratio. Define the function as
For , there holds
By the form of Algorithm 5 and the DR-submodularity and monotonicity of F, we have
Thus, we have
Summing up all the above inequalities from to yields
On the other hand, by the definition of function L, we have
Then
□
4.2. Stochastic Non-Monotone DR-Submodular Maximization
This subsection investigates the stochastic maximization of non-monotone DR-submodular functions under two constraint classes: down-closed convex sets and general convex domains.
4.2.1. Down-Closed Constraint
Algorithm 6 is designed for a stochastic non-monotone DR-submodular function with a down-closed constraint.
| Algorithm 6: Bisection Stochastic MCG |
Theorem 6.
Assume that is down-closed and set . Under Assumption 2, Algorithm 6 outputs a solution x satisfying
With probability , the LOO oracle complexity is bounded by and the gradient evaluation complexity is bounded by
The proof of Theorem 6 is presented in Appendix D.
4.2.2. General Convex Constraint
In this subsection, we present the dynamic stepsize algorithm for solving the maximization of stochastic non-monotone DR-submodular functions with general convex constraints.
Theorem 7.
Algorithm 7 outputs a solution x satisfying
With probability , the LOO oracle complexity is bounded by and the gradient evaluation complexity is bounded by .
The proof of Theorem 7 is presented in Appendix E.
| Algorithm 7: Bisection Stochastic Frank–Wolfe |
5. Examples
To explore the potential acceleration offered by a dynamic stepsize strategy, we present three illustrative examples in this section.
Multilinear Relaxation for Submodular Maximization. Let V be a finite ground set, and let $f: 2^V \to \mathbb{R}_{\ge 0}$ be a set function. The multilinear extension of f is defined as
$F(x) = \sum_{S \subseteq V} f(S) \prod_{i \in S} x_i \prod_{i \notin S} (1 - x_i),$
where $x \in [0,1]^V$. It is well known that the function f is submodular (i.e., $f(A \cup \{i\}) - f(A) \ge f(B \cup \{i\}) - f(B)$ for any $A \subseteq B \subseteq V$ and $i \in V \setminus B$) if and only if F is DR-submodular. Therefore, maximizing a submodular function f can be achieved by first solving the maximization of its multilinear extension and then obtaining a feasible solution to the original problem through a rounding method. Such algorithms are known to provide strong approximation guarantees [5,19].
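In practice, the multilinear extension is usually evaluated by Monte-Carlo sampling, as the following sketch illustrates; the indicator-vector interface for f and the toy coverage-style function are assumptions made for illustration.

```python
import numpy as np

def multilinear_extension_mc(f, x, num_samples, rng):
    """Monte-Carlo estimate of the multilinear extension F(x) = E[f(R_x)].

    Each ground element i is included in the random set R_x independently with
    probability x[i]; averaging f over such samples estimates F(x).  Here f is
    assumed to take a boolean indicator vector.
    """
    estimates = []
    for _ in range(num_samples):
        indicator = rng.random(len(x)) < x   # sample R_x coordinate-wise
        estimates.append(f(indicator))
    return float(np.mean(estimates))

# Toy usage: f(S) = 1 if S is non-empty, else 0 (a simple submodular function).
rng = np.random.default_rng(0)
f = lambda s: float(np.any(s))
print(multilinear_extension_mc(f, np.array([0.5, 0.5]), 10_000, rng))  # approx 0.75
```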
Let denote the vector whose i-th entry is 1 and all other entries are 0. The upper bound of can be derived as follows.
Lemma 7.
Let F denote the multilinear extension of a submodular set function f. Suppose the feasible set satisfies ; then we have
For the multilinear extension of a submodular set function, the Lipschitz constant L is given by [6].
Softmax Relaxation for DPP MAP Problem. Determinantal point processes (DPPs) are probabilistic models that emphasize diversity by capturing repulsive interactions, making them highly valuable in machine learning for tasks requiring varied selections. Let H denote the positive semi-definite kernel matrix associated with a DPP. The softmax extension of the DPP maximum a posteriori (MAP) problem is expressed as
$F(x) = \log\det\big(\mathrm{diag}(x)(H - I) + I\big), \quad x \in [0,1]^n,$
where I represents the identity matrix. Based on Corollary 2 in [4], the gradient of the softmax extension can be written as follows:
Consequently, the -norm of the gradient at is given by
In practical scenarios involving DPPs, the matrix H is often a Gram matrix, where the diagonal elements are universally bounded. This implies that the asymptotic growth of is upper-bounded by .
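A small sketch of evaluating the softmax extension is given below, using the common form F(x) = log det(diag(x)(H − I) + I); any deviation from the exact expression used in the paper is an assumption of this sketch.

```python
import numpy as np

def softmax_extension(H, x):
    """Softmax extension of the DPP MAP objective (hedged sketch).

    For a PSD kernel H, F(x) = log det(diag(x)(H - I) + I) interpolates
    log det(H_S) over the vertices of the unit cube.
    """
    n = H.shape[0]
    M = np.diag(x) @ (H - np.eye(n)) + np.eye(n)
    sign, logdet = np.linalg.slogdet(M)      # numerically stable log-determinant
    return logdet

# Toy usage with an illustrative 2x2 PSD kernel.
H = np.array([[2.0, 0.5], [0.5, 1.5]])
print(softmax_extension(H, np.array([1.0, 1.0])))  # equals log det(H) at the all-ones vertex
```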
DR-Submodular Quadratic Functions. Consider a quadratic function of the form
$F(x) = \tfrac{1}{2} x^{\top} A x + b^{\top} x,$
where $A \le 0$ (i.e., A is a matrix with non-positive entries). In this case, F is DR-submodular. It is straightforward to verify that $\nabla F(x) = A x + b$, and the gradient Lipschitz constant is given by $\|A\|$.
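The sketch below instantiates such a quadratic and computes the two quantities that drive the complexity comparison, namely the gradient norm at the origin and a smoothness constant; the random instance and the choice of the spectral norm as the smoothness bound are illustrative assumptions.

```python
import numpy as np

# DR-submodular quadratic F(x) = 0.5 * x @ A @ x + b @ x with A <= 0 entrywise
# (an illustrative instance; the paper's exact parameterization may differ).
rng = np.random.default_rng(1)
n = 5
A = -np.abs(rng.standard_normal((n, n)))
A = 0.5 * (A + A.T)                          # symmetrize; entries stay non-positive
b = np.abs(rng.standard_normal(n))

grad_F = lambda x: A @ x + b                 # gradient of the quadratic
grad_norm_at_origin = np.linalg.norm(b)      # ||grad F(0)|| = ||b||, the quantity entering our bounds
L_bound = np.linalg.norm(A, 2)               # spectral norm of A is one valid smoothness constant
print(grad_norm_at_origin, L_bound)
```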
The computational complexities of the algorithms designed to address the three constrained DR-submodular function maximization problems outlined earlier are compiled in Table 2. The results in the table reveal that the dynamic stepsize strategy introduced in this work offers a significant complexity advantage over the constant stepsize approach, both for the multilinear extension and the softmax relaxation problems. However, in the quadratic case, a definitive comparison of their complexities cannot be made, because they are determined by the -norm of the linear term vector and the -norm of the quadratic term matrix, respectively, and there is no inherent relationship between the magnitudes of these two quantities.
Table 2.
Comparison of complexities between dynamic and constant stepsizes for three examples (grad.eval: complexity of gradient evaluation).
Numerical Experiments
We conduct numerical experiments to evaluate different stepsize selection strategies for solving DR-submodular maximization problems. Our investigation focuses on two fundamental classes of objective functions: quadratic DR-submodular functions and softmax extension functions. The experimental framework builds upon established methodologies from [9,14], with necessary adaptations for our specific analysis.
Our experimental evaluation considers two problem classes: softmax extension problems and quadratic DR-submodular problems with linear constraints. Since neither problem class inherently satisfies monotonicity, we augment both functions with an additional term, where b is a positive vector with components in appropriate ranges. This modification enables the verification of Algorithms 2 and 5 by ensuring monotonicity preservation.
We evaluate Algorithms 2–4 on the softmax extension problems, while testing the stochastic algorithms (Algorithms 5–7) on quadratic DR-submodular problems with incorporated random variables. The randomization methodology follows the principled approach outlined in [14].
For each problem class, we consider decision space dimensions , with the number of constraints m set as for each dimension. The approximation parameter is fixed at 0.1, and the constant stepsize strategy employs 100 iterations. Each configuration is executed with five independent trials, with averaged results reported.
Figure 1 presents the performance comparison for softmax extension problems, while Figure 2 displays the results for stochastic quadratic DR-submodular problems. Both figures demonstrate the evolution of achieved function values across different stepsize strategies, providing empirical insights into algorithmic efficiency.
Figure 1.
Numerical results for softmax problems.
Figure 2.
Numerical results for stochastic quadratic DR-submodular problems.
From the numerical results, we observe that for both deterministic and stochastic problems, dynamic stepsizes generally lead to lower iteration complexity compared to constant stepsizes, especially for larger problem dimensions. This finding highlights the advantages of using dynamic stepsizes in solving DR-submodular maximization problems.
6. Conclusions
This paper introduces a dynamic stepsize strategy for DR-submodular maximization, achieving iteration complexities independent of the smoothness parameter L. In deterministic settings, monotone cases attain -approximation with iterations, while non-monotone problems under down-closed or general convex constraints achieve and -approximations with iterations. For stochastic optimization, variance reduction techniques (e.g., SPIDER) further reduce gradient evaluation complexities while maintaining high-probability guarantees. Empirical results on multilinear extensions, DPP softmax relaxations, and DR-submodular quadratics validate the practical efficiency of our methods compared to fixed stepsize baselines.
Our work has three key limitations. First, while our dynamic strategy matches the iteration complexity of fixed stepsize methods, it is not guaranteed to be superior for every L-smooth DR-submodular function. Second, Algorithm 1 avoids the L-smoothness assumption but requires an oracle that solves the univariate equation (22) exactly, and we are not aware of practical settings in which such an oracle is available. Third, our stepsize mechanism relies heavily on the DR-submodularity property, limiting its applicability to non-DR-submodular functions or mixed-integer domains. These limitations highlight opportunities for future research to extend our framework to broader function classes and practical scenarios.
Author Contributions
Conceptualization, Y.Z.; Methodology, Q.L. and Y.Z.; Validation, M.L.; Formal analysis, Y.L. All authors have read and agreed to the published version of the manuscript.
Funding
The author Yang Zhou was supported by the National Natural Science Foundation of China (No. 12371099).
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.
Conflicts of Interest
The authors declare that they have no known competing financial or non-financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Appendix A. Proof of Lemma 4
Proof.
By the algorithm, we can first obtain the following relations:
The conclusions can then be obtained by the down-closeness of P. □
Appendix B. Proof of Lemma 6
Proof.
The conclusion can be derived from the fact that for , which can be proved by induction. Note that , and is a convex combination of and . Thus, if , then , and the proof is completed. □
Appendix C. Proof of Theorem 4
Proof.
For Algorithm 4, we have the claim that for , there holds
We first prove the claim by induction. Note that satisfies the inequality. Assume that for some , the inequality holds; then for , we have
Thus, inequality (A3) is proved. Redefine the potential function ; then
Combining inequalities (73), (A3), and
we obtain
Note that if there exists a such that , then by the form of the output x in Algorithm 4, we have , which satisfies (103). Otherwise, we have
and
Together with the fact that
we complete the proof. □
Appendix D. Proof of Theorem 6
Proof.
Similar to Lemma 5, the difference in the analysis is as follows.
For , by the monotonicity of w.r.t. K, we obtain from Theorem 5 the following bound with high probability:
since , and are entrywise of the same sign. By the monotonicity of , the number of these iterations can be bounded as
From Theorem 5, the number m of set is at most .
Now, we present the proof of the approximation ratio. Redefine the potential function as . Then for , the following holds:
We need to prove a lower bound on the difference of function values between two adjacent iteration points when F is non-monotone:
where the fourth inequality is by Lemma 3 in [1], which implies that the following inequality holds:
for . From the deterministic case, for the upper bound on the -norm of , we have the following claim.
Claim 2.
For all the iteration points , , of Algorithm 6, we have .
So, the above formula yields
By summing Equation (A16) over j from 0 to , we obtain
Together with the fact that , we obtain the theorem. □
Appendix E. Proof of Theorem 7
Proof.
For Algorithm 7, we have the claim that for , there holds
Redefine the potential function ; then
Combining inequalities (73), (A3), and
we obtain
Note that if there exists a such that , then by the form of the output x in Algorithm 7, we have , which satisfies (103). Otherwise, we have
and
Together with the fact that
we complete the proof. □
References
- Bian, A.; Levy, K.; Krause, A.; Buhmann, J.M. Continuous DR-submodular maximization: Structure and algorithms. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 486–496. [Google Scholar]
- Bian, A.A.; Mirzasoleiman, B.; Buhmann, J.; Krause, A. Guaranteed non-convex optimization: Submodular maximization over continuous domains. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 2017, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 111–120. [Google Scholar]
- Bian, Y.; Buhmann, J.M.; Krause, A. Continuous submodular function maximization. arXiv 2020, arXiv:2006.13474. [Google Scholar]
- Gillenwater, J.; Kulesza, A.; Taskar, B. Near-Optimal MAP Inference for Determinantal Point Processes. In Proceedings of the 25th International Conference on Neural Information Processing Systems, NIPS’12, Lake Tahoe, NV, USA, 3–6 December 2012; Curran Associates Inc.: Red Hook, NY, USA, 2012; Volume 2, pp. 2735–2743. [Google Scholar]
- Calinescu, G.; Chekuri, C.; Pál, M.; Vondrák, J. Maximizing a monotone submodular function subject to a matroid constraint. SIAM J. Comput. 2011, 40, 1740–1766. [Google Scholar] [CrossRef]
- Feldman, M.; Naor, J.; Schwartz, R. A unified continuous greedy algorithm for submodular maximization. In Proceedings of the 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, FOCS ’11, Palm Springs, CA, USA, 22–25 October 2011; IEEE Computer Society: Washington, DC, USA, 2011; pp. 570–579. [Google Scholar]
- Niazadeh, R.; Roughgarden, T.; Wang, J.R. Optimal Algorithms for Continuous Non-Monotone Submodular and DR-Submodular Maximization. J. Mach. Learn. Res. 2020, 21, 1–31. [Google Scholar]
- Buchbinder, N.; Feldman, M. Constrained submodular maximization via new bounds for dr-submodular functions. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, Vancouver, BC, Canada, 24–28 June 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 1820–1831. [Google Scholar]
- Du, D.; Liu, Z.; Wu, C.; Xu, D.; Zhou, Y. An improved approximation algorithm for maximizing a DR-submodular function over a convex set. arXiv 2022, arXiv:2203.14740. [Google Scholar]
- Du, D. Lyapunov function approach for approximation algorithm design and analysis: With applications in submodular maximization. arXiv 2022, arXiv:2205.12442. [Google Scholar]
- Mualem, L.; Feldman, M. Resolving the Approximability of Offline and Online Non-Monotone DR-Submodular Maximization over General Convex Sets. In Proceedings of the International Conference on Artificial Intelligence and Statistics, PMLR, Valencia, Spain, 25–27 April 2023; pp. 2542–2564. [Google Scholar]
- Mokhtari, A.; Hassani, H.; Karbasi, A. Stochastic Conditional Gradient Methods: From Convex Minimization to Submodular Maximization. arXiv 2018, arXiv:1804.09554. [Google Scholar]
- Hassani, H.; Karbasi, A.; Mokhtari, A.; Shen, Z. Stochastic Conditional Gradient++: (Non)Convex Minimization and Continuous Submodular Maximization. SIAM J. Optim. 2020, 30, 3315–3344. [Google Scholar] [CrossRef]
- Lian, Y.; Xu, D.; Du, D.; Zhou, Y. A Stochastic Non-Monotone DR-Submodular Maximization Problem over a Convex Set. In Proceedings of the Computing and Combinatorics: 28th International Conference, COCOON 2022, Shenzhen, China, 22–24 October 2022; Springer Nature: Berlin/Heidelberg, Germany, 2023; Volume 13595, pp. 1–11. [Google Scholar]
- Chen, L.; Feldman, M.; Karbasi, A. Unconstrained submodular maximization with constant adaptive complexity. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, Phoenix, AZ, USA, 23–26 June 2019; Association for Computing Machinery: New York, NY, USA, 2019; pp. 102–113. [Google Scholar]
- Ene, A.; Nguyen, H. Parallel algorithm for non-monotone DR-submodular maximization. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020; pp. 2902–2911. [Google Scholar]
- Delahaye, D.; Chaimatanan, S.; Mongeau, M. Simulated annealing: From basics to applications. In Handbook of Metaheuristics; Springer: Berlin/Heidelberg, Germany, 2018; pp. 1–35. [Google Scholar]
- Hassani, H.; Soltanolkotabi, M.; Karbasi, A. Gradient Methods for Submodular Maximization. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30, pp. 5843–5853. [Google Scholar]
- Chekuri, C.; Vondrák, J.; Zenklusen, R. Submodular Function Maximization via the Multilinear Relaxation and Contention Resolution Schemes. SIAM J. Comput. 2014, 43, 1831–1879. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).