1. Introduction
A compact convex subset $K$ of $\mathbb{R}^n$ having interior points is called a convex body, whose boundary and interior are denoted by $\operatorname{bd}K$ and $\operatorname{int}K$, respectively. Let $\mathcal{K}^n$ be the set of convex bodies in $\mathbb{R}^n$. For each $K\in\mathcal{K}^n$, let $c(K)$ be the least number of translates of $\operatorname{int}K$ necessary to cover $K$. Regarding the least upper bound of $c(K)$ in $\mathcal{K}^n$, there is a long-standing conjecture:
Conjecture 1 (Hadwiger's covering conjecture).
For each $K\in\mathcal{K}^n$, we have $c(K)\le 2^n$; the equality holds if and only if $K$ is a parallelotope.
Classical results related to this conjecture can be found in [1,2]. While extensive research has been conducted (see, e.g., refs. [3,4,5,6,7]), Conjecture 1 has only been conclusively resolved when $n=2$.
For each $m\in\mathbb{Z}^+$, set $[m]=\{1,2,\ldots,m\}$. Let $K\in\mathcal{K}^n$. A set having the form $\gamma K+x$, where $\gamma\in(0,1)$ and $x\in\mathbb{R}^n$, is called a smaller homothetic copy of $K$. According to Theorem 34.3 in [1], $c(K)$ equals the least number of smaller homothetic copies of $K$ required to cover $K$. Clearly, $c(K)\le p$ for some $p\in\mathbb{Z}^+$ if and only if $\Gamma_p(K)<1$, where
$$\Gamma_p(K)=\min\Big\{\gamma>0 : \exists\, x_1,\ldots,x_p\in\mathbb{R}^n \text{ such that } K\subseteq\bigcup_{i\in[p]}(\gamma K+x_i)\Big\}.$$
For each $p\in\mathbb{Z}^+$, the map
$$\Gamma_p:\mathcal{K}^n\to(0,1],\qquad K\mapsto\Gamma_p(K)$$
is an affine invariant and is called the covering functional with respect to $p$. Let $p\in\mathbb{Z}^+$ and $K\in\mathcal{K}^n$. A set $C$ of $p$ points satisfying
$$K\subseteq\bigcup_{c\in C}\big(\Gamma_p(K)K+c\big)$$
is called a $p$-optimal configuration of $K$.
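For example, if $K$ is the cube $B=[-1,1]^n$, then the $2^n$ smaller homothetic copies
$$\tfrac12 B+s,\qquad s\in\{-\tfrac12,\tfrac12\}^n,$$
tile $B$, while no $2^n$ homothetic copies of ratio smaller than $\tfrac12$ can simultaneously cover the center and all $2^n$ vertices of $B$ (each such copy contains at most one of these $2^n+1$ points). Hence $\Gamma_{2^n}(B)=\tfrac12$, and $\{-\tfrac12,\tfrac12\}^n$ is a $2^n$-optimal configuration of $B$; this standard example illustrates the equality case in Conjecture 1.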
In [8], Chuanming Zong proposed the first computer-based program to tackle Conjecture 1 via estimating covering functionals. Two different algorithms have been designed for this purpose. The first one was introduced by Chan He et al. (cf. [9]) and is based on the geometric branch-and-bound method (cf. [10]). The algorithm is implemented in two parts. The first part uses geometric branch-and-bound methods to estimate
$$\Gamma(C,K)=\min\Big\{\gamma\ge 0 : K\subseteq\bigcup_{c\in C}(\gamma K+c)\Big\}$$
for a given finite set $C\subseteq\mathbb{R}^n$, where
$$\Gamma_p(K)=\min\{\Gamma(C,K) : C\subseteq\mathbb{R}^n,\ |C|=p\}.$$
The second part also uses geometric branch-and-bound methods to estimate $\Gamma_p(K)$. When $n$ or $p$ is large, computing $\Gamma(C,K)$ and $\Gamma_p(K)$ in this way exhibits a significantly high computational complexity. The other algorithm was introduced by Man Yu et al. (cf. [11]) and is based on the relaxation algorithm.
Let $S$ be a discretization of $K$ and $D$ be a set containing a $p$-optimal configuration of $K$, the fineness of these discretizations being controlled by two positive numbers $\delta_1$ and $\delta_2$, respectively. They transformed the problem of covering $S$ by smaller homothetic copies of $K$ into a vertex $p$-center problem, and showed that the solution of the corresponding vertex $p$-center problem is a good approximation of $\Gamma_p(K)$ by proving an error bound, expressed in terms of $\delta_1$ and $\delta_2$, between $\Gamma_p(K)$ and the optimal value of the vertex $p$-center problem. Clearly, finer discretizations of $K$ are required to obtain more accurate estimates of $\Gamma_p(K)$, which leads to higher computational complexity.
In this paper, we propose an algorithm that utilizes the Compute Unified Device Architecture (CUDA) and stochastic global optimization methods to accelerate the estimation of $\Gamma_p(K)$. CUDA is a parallel computing platform particularly well suited to large-scale computational tasks, since it performs many computations in parallel (cf. [12]). When discretizing convex bodies, CUDA provides a natural discretization method and enables parallel computation for all discretized points, thereby accelerating the execution of algorithms. As shown in Section 2, when calculating $\Gamma(C,K)$ for some finite set $C\subseteq\mathbb{R}^n$, we need to obtain the maximum dissimilarity between a point in $S$ and its closest point in $C$. The reduction technique provided by CUDA, which performs a specific operation on all elements of an array (summation, finding the maximum, finding the minimum, etc.; cf. [13]), enables the efficient computation of this maximum. Moreover, when facing large-scale optimization problems, stochastic algorithms are capable of producing high-quality solutions in a short amount of time.
In Section 2, the problem of estimating $\Gamma_p(K)$ is transformed into a minimization problem, and an error estimation is provided. Using ideas from [9], an algorithm based on CUDA for estimating $\Gamma_p(K)$ is designed in Section 3. Results of computational experiments showing the effectiveness of our algorithm are presented in Section 4.
2. Covering Functional and Error Estimation
As in [9], we put $B=[-1,1]^n$. For each $K\in\mathcal{K}^n$, let
$$\alpha(K)=\max\{\alpha>0 : \exists\, T\in\mathcal{A}^n \text{ such that } \alpha B\subseteq T(K)\subseteq B\},$$
where $\mathcal{A}^n$ is the set of all non-singular affine transformations on $\mathbb{R}^n$. Since $\Gamma_p$ is an affine invariant, we can apply an appropriate affine transformation, if necessary, to ensure that
$$\alpha(K)B\subseteq K\subseteq B. \qquad (1)$$
Remark 1. In general, it is not easy to calculate $\alpha(K)$. If $K$ is symmetric about the origin, then $1/\alpha(K)$ is the Banach–Mazur distance between $K$ and $B$. By Proposition 37.6 in [14], when $K$ is the $n$-dimensional Euclidean unit ball $B_2^n$, $\alpha(K)=1/\sqrt{n}$. For our purpose, it will be sufficient to choose an $\alpha$, as large as possible, such that $\alpha B\subseteq K\subseteq B$.
Definition 1 (cf. [11]).
Let $K\in\mathcal{K}^n$ satisfy $0\in\operatorname{int}K$. For $k,c\in\mathbb{R}^n$, the number
$$d_K(c,k)=\min\{\gamma\ge 0 : k\in\gamma K+c\}$$
is called the dissimilarity of $k$ and $c$. If $K=B_2^n$, the dissimilarity between any two points is precisely the Euclidean distance between these two points. In general, the dissimilarity is not symmetric. If $K$ is an $n$-dimensional convex polytope, the dissimilarity between any two points can be computed by Lemma 1.
Lemma 1. Let $K$ be an $n$-dimensional convex polytope with $0\in\operatorname{int}K$ determined by
$$K=\{x\in\mathbb{R}^n : Ax\le \mathbf{1}\}, \qquad (2)$$
where $A$ is an $m$-by-$n$ matrix and $\mathbf{1}$ is the $m$-dimensional column vector whose elements are all 1. For any $k,c\in\mathbb{R}^n$, we have
$$d_K(c,k)=\max\{\langle a_i,k-c\rangle : i\in[m]\}, \qquad (3)$$
where $a_i$ is the $i$-th row of $A$.
Proof. Put $\bar{\gamma}=\max\{\langle a_i,k-c\rangle : i\in[m]\}$. Since $K$ is bounded, $\bar{\gamma}\ge 0$: if $k\ne c$ and $A(k-c)\le 0$, then $t(k-c)\in K$ for every $t>0$, which is impossible, and $\bar{\gamma}=0$ when $k=c$. Since $A(k-c)\le\bar{\gamma}\mathbf{1}$, we have $k-c\in\bar{\gamma}K$, which implies that
$$d_K(c,k)\le\bar{\gamma}. \qquad (4)$$
Let $\gamma\ge 0$ be a number satisfying $k\in\gamma K+c$. Since $k-c\in\gamma K$, we have $A(k-c)\le\gamma\mathbf{1}$. Thus,
$$\bar{\gamma}=\max\{\langle a_i,k-c\rangle : i\in[m]\}\le\gamma,$$
which implies that
$$\bar{\gamma}\le d_K(c,k). \qquad (5)$$
The desired equality (3) follows from (4) and (5). □
Remark 2. Clearly, each convex polytope $K$ containing the origin $0$ in its interior can be represented as (2).
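For reference, the formula (3) can be implemented in a few lines. The following sketch (ours, in CUDA C++; the function name and the row-major storage of $A$ are assumptions, not taken from the original) evaluates $d_K(c,k)$ for a polytope given by (2):

```cuda
// Dissimilarity d_K(c, k) = max_i <a_i, k - c> for a polytope
// K = { x : Ax <= 1 } with 0 in int K (Lemma 1).
// A is stored row-major: A[i * n + l] is the l-th entry of row a_i.
__host__ __device__
double dissimilarity(const double* A, int m, int n,
                     const double* c, const double* k)
{
    double best = 0.0;  // the maximum is >= 0, and equals 0 only when k = c
    for (int i = 0; i < m; ++i) {
        double dot = 0.0;
        for (int l = 0; l < n; ++l)
            dot += A[i * n + l] * (k[l] - c[l]);
        if (dot > best) best = dot;
    }
    return best;
}
```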
Let $n$ and $i$ be two positive integers. We denote by $(x)_j$ the $j$-th coordinate of a point $x\in\mathbb{R}^n$ and put
$$T_i=\Big\{x\in\mathbb{R}^n : (x)_j\in\Big\{-1+\tfrac{2k+1}{i} : k\in\{0,1,\ldots,i-1\}\Big\}\ \text{for each } j\in[n]\Big\}.$$
Lemma 2. $B=\bigcup_{x\in T_i}\big(x+\tfrac{1}{i}B\big)$.
Proof. Clearly, $x+\tfrac{1}{i}B\subseteq B$ for each $x\in T_i$. Suppose that $y\in B$. For each $j\in[n]$, put $u_j=\tfrac{i((y)_j+1)}{2}\in[0,i]$ and $k_j=\min\{\lfloor u_j\rfloor, i-1\}$. Let $x$ be the point satisfying
$$(x)_j=-1+\frac{2k_j+1}{i},\qquad j\in[n].$$
Then, $x\in T_i$. Since $0\le u_j-k_j\le 1$, we have
$$(y)_j-(x)_j=\frac{2(u_j-k_j)-1}{i}\in\Big[-\frac{1}{i},\frac{1}{i}\Big].$$
It follows that $y-x\in\tfrac{1}{i}B$. Therefore, $y\in x+\tfrac{1}{i}B$. □
Theorem 1. Let $K$ be a convex body satisfying (1), write $\alpha=\alpha(K)$, let $\gamma>0$, and let $i$ be a positive integer with $\alpha i\ge 1$. If $C\subseteq\mathbb{R}^n$ satisfies
$$T_i\cap K\subseteq\bigcup_{c\in C}(\gamma K+c),$$
then
$$K\subseteq\bigcup_{c\in C}\Big(\Big(\gamma+\frac{2}{\alpha^2 i}\Big)K+c\Big).$$
Proof. Let $x\in K$ and put $y=\big(1-\frac{1}{\alpha i}\big)x$. Since $\frac{1}{i}B=\frac{1}{\alpha i}(\alpha B)$ and $\alpha B\subseteq K$, convexity of $K$ yields
$$y+\frac{1}{i}B\subseteq\Big(1-\frac{1}{\alpha i}\Big)K+\frac{1}{\alpha i}K=K.$$
By Lemma 2, there exists a point $s\in T_i$ such that $y\in s+\frac{1}{i}B$, i.e., $s\in y+\frac{1}{i}B\subseteq K$; hence $s\in T_i\cap K$. Since $K\subseteq B$, $B\subseteq\frac{1}{\alpha}K$, and $\alpha\le 1$,
$$x-s=(x-y)+(y-s)\in\frac{1}{\alpha i}B+\frac{1}{i}B\subseteq\frac{2}{\alpha i}B\subseteq\frac{2}{\alpha^2 i}K.$$
There exists $c\in C$ with $s\in\gamma K+c$. Therefore,
$$x-c=(x-s)+(s-c)\in\frac{2}{\alpha^2 i}K+\gamma K=\Big(\gamma+\frac{2}{\alpha^2 i}\Big)K,$$
which completes the proof. □
In the sequel, let $\alpha\in(0,1]$, $i$ be a positive integer satisfying $\alpha i\ge 1$, $K$ be a convex body satisfying $\alpha B\subseteq K\subseteq B$, and $C$ be a finite subset of $\mathbb{R}^n$. Put
$$S=T_i\cap K \qquad (6)$$
and
$$\hat{\gamma}(C)=\max_{s\in S}\min_{c\in C}d_K(c,s). \qquad (7)$$
Proposition 1. Let $K$, $\alpha$, $i$, $S$ be as above, and let $p\in\mathbb{Z}^+$. Then,
$$0\le\Gamma_p(K)-\min\{\hat{\gamma}(C) : C\subseteq\mathbb{R}^n,\ |C|=p\}\le\frac{2}{\alpha^2 i}.$$
Proof. Let $C^*$ be a $p$-element subset of $\mathbb{R}^n$ such that $\Gamma(C^*,K)=\Gamma_p(K)$. Since $S\subseteq K$, we have $\hat{\gamma}(C)\le\Gamma(C,K)$ for every finite set $C$; in particular,
$$\min\{\hat{\gamma}(C) : |C|=p\}\le\hat{\gamma}(C^*)\le\Gamma(C^*,K)=\Gamma_p(K).$$
Conversely, for every $p$-element set $C$ we have $S\subseteq\bigcup_{c\in C}(\hat{\gamma}(C)K+c)$, so Theorem 1 yields
$$\Gamma_p(K)\le\Gamma(C,K)\le\hat{\gamma}(C)+\frac{2}{\alpha^2 i},$$
which completes the proof. □
3. An Algorithm Based on CUDA for Estimating $\Gamma_p(K)$
Let $K$ be a convex body satisfying (1), $p\in\mathbb{Z}^+$, and $C$ be a set of $p$ points. First, we use CUDA to obtain the set $S$ defined by (6) and to compute the minimum dissimilarity from each point in $S$ to $C$. Then, we employ a CUDA-based reduction algorithm to obtain $\hat{\gamma}(C)$. Finally, we use different stochastic global optimization algorithms to estimate $\Gamma_p(K)$ and select an appropriate optimization algorithm through comparison.
Figure 1 shows the overall framework of the algorithm.
3.1. An Algorithm Based on CUDA for $\hat{\gamma}(C)$
CUDA organizes threads into a hierarchical structure consisting of grids, blocks, and threads. The grid is the highest-level organization of threads in CUDA; a grid represents a collection of blocks. A block, identified by a unique block index within its grid, is a group of threads that can cooperate with each other and share data using shared memory. Threads are organized within blocks, and each thread is identified by a unique thread index within its block. The number of blocks and the number of threads per block can be specified when launching a CUDA kernel. The grid and block dimensions can be one-dimensional, two-dimensional, or three-dimensional, depending on the problem being addressed. For more information about CUDA, we refer to [15,16,17,18].
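As a small illustration (ours; the kernel name is a placeholder), the configuration used in Figure 2 below would be launched as follows:

```cuda
#include <cuda_runtime.h>

__global__ void someKernel(/* arguments */) { /* ... */ }

int main() {
    dim3 grid(5, 5);   // gridDim.x = gridDim.y = 5: a 5x5 grid of blocks
    dim3 block(5);     // blockDim.x = 5: five threads per block
    someKernel<<<grid, block>>>();
    cudaDeviceSynchronize();   // wait for the kernel to finish
    return 0;
}
```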
The organization of threads within blocks and grids provides a natural way to discretize $B$ (take $n=3$ as an example). First, we discretize $[-1,1]^2$ into a set $P$ of $(\text{gridDim.x})\times(\text{gridDim.y})$ points. Each point $p$ in $P$ corresponds to a block $B_p$ in CUDA, and $B_p$ contains a collection of blockDim.x threads, each of which corresponds to a point in $[-1,1]$. See Figure 2, where gridDim.x, gridDim.y, and blockDim.x are set to be 5.
Let $T$ be the set of all threads invoked by CUDA. Then, the cardinality of $T$ is
$$(\text{gridDim.x})\cdot(\text{gridDim.y})\cdot(\text{blockDim.x}).$$
For $j\in\{0,\ldots,\text{gridDim.x}-1\}$, $k\in\{0,\ldots,\text{gridDim.y}-1\}$, and $l\in\{0,\ldots,\text{blockDim.x}-1\}$, there is a thread $t$ indexed by $(j,k,l)$, which corresponds to the point
$$x_t=\Big(-1+\frac{2j+1}{\text{gridDim.x}},\ -1+\frac{2k+1}{\text{gridDim.y}},\ -1+\frac{2l+1}{\text{blockDim.x}}\Big).$$
Put
$$S=\{x_t : t\in T\}\cap K,$$
where $\text{gridDim.x}=\text{gridDim.y}=\text{blockDim.x}=i$ and $i$ is a positive integer satisfying $\alpha i\ge 1$; then $\{x_t : t\in T\}=T_i$ and $S$ is exactly the set defined by (6). For each $t\in T$, denote by $d_t$ the minimum dissimilarity from the point $x_t$ to $C$, i.e.,
$$d_t=\min\{d_K(c,x_t) : c\in C\}.$$
If $x_t\notin K$, we set $d_t=0$. The CUDA thread corresponding to $x_t$ computes $d_t$. Then, a CUDA-based reduction algorithm is invoked to obtain
$$\hat{\gamma}(C)=\max\{d_t : t\in T\}.$$
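A kernel realizing this correspondence between threads and points, together with the per-thread computation of $d_t$, might look as follows (a sketch under our conventions: $n=3$, $K=\{x : Ax\le\mathbf{1}\}$ stored row-major as in Lemma 1, and hypothetical identifiers):

```cuda
// Each thread (j, k, l) handles the point x_t above and writes d_t
// into the array d, indexed linearly by thread.
__global__ void dissimilarityKernel(const double* A, int m,  // K = {x : Ax <= 1}, A is m x 3
                                    const double* C, int p,  // the p centers, row-major (p x 3)
                                    double* d)               // output: one entry d_t per thread
{
    unsigned int j = blockIdx.x, k = blockIdx.y, l = threadIdx.x;
    double x[3] = { -1.0 + (2.0 * j + 1.0) / gridDim.x,
                    -1.0 + (2.0 * k + 1.0) / gridDim.y,
                    -1.0 + (2.0 * l + 1.0) / blockDim.x };
    unsigned int t = (j * gridDim.y + k) * blockDim.x + l;   // linear thread index

    // Test whether x_t lies in K, i.e., whether Ax <= 1 holds row by row.
    bool inK = true;
    for (int r = 0; r < m && inK; ++r) {
        double ax = 0.0;
        for (int s = 0; s < 3; ++s) ax += A[r * 3 + s] * x[s];
        if (ax > 1.0) inK = false;
    }

    double dt = 0.0;                       // d_t = 0 when x_t is outside K
    if (inK) {
        dt = 1e300;                        // a stand-in for +infinity
        for (int i = 0; i < p; ++i) {      // d_t = min over c in C of d_K(c, x_t)
            double best = 0.0;             // d_K(c_i, x_t) = max_r <a_r, x_t - c_i> by Lemma 1
            for (int r = 0; r < m; ++r) {
                double dot = 0.0;
                for (int s = 0; s < 3; ++s)
                    dot += A[r * 3 + s] * (x[s] - C[i * 3 + s]);
                if (dot > best) best = dot;
            }
            if (best < dt) dt = best;
        }
    }
    d[t] = dt;
}
```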
The idea of the CUDA-based reduction algorithm is to divide the original data into multiple blocks, then perform a local reduction operation on each block to obtain the local reduction result, and, finally, perform a global reduction operation on the local reduction results to obtain the final reduction result (cf. [19]).
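The block-level stage of such a max-reduction can be sketched as follows (a standard shared-memory pattern, not the authors' exact implementation; it assumes blockDim.x is a power of two and that the length of d is a multiple of 2·blockDim.x):

```cuda
// Each block reduces 2 * blockDim.x entries of d to one maximum and
// writes it to blockMax[blockIdx.x] (the local reduction); blockMax is
// then reduced again, or finished on the host, to obtain the global
// maximum (the global reduction).
// Launch: maxReduce<<<numBlocks, threads, threads * sizeof(double)>>>(d, blockMax);
__global__ void maxReduce(const double* d, double* blockMax)
{
    extern __shared__ double sdata[];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * (2 * blockDim.x) + tid;

    // Each thread loads two elements and keeps the larger one.
    sdata[tid] = fmax(d[i], d[i + blockDim.x]);
    __syncthreads();

    // Tree-shaped reduction in shared memory.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] = fmax(sdata[tid], sdata[tid + s]);
        __syncthreads();
    }

    if (tid == 0)
        blockMax[blockIdx.x] = sdata[0];
}
```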
Algorithm 1, with parameters $K$, $C$, $p$, blockDim.x, gridDim.x, gridDim.y, and $\alpha$, calculates $\hat{\gamma}(C)$. It is more efficient than the geometric branch-and-bound approach proposed in [9]; example comparisons are given in Table 1.
Algorithm 1 An algorithm based on CUDA to compute $\hat{\gamma}(C)$
- Require: a convex body $K$, a set $C$ of $p$ points in $\mathbb{R}^n$, a positive number $\alpha$ satisfying (1), blockDim.x, gridDim.x, and gridDim.y
- Ensure: $\hat{\gamma}(C)$ as an estimation of $\Gamma(C,K)$
- 1: allocate host and device memory, initialize, and copy host data to the device
- 2: for each thread $t=(j,k,l)$ in parallel do
- 3: compute the point $x_t$
- 4: if $x_t\in K$ then
- 5: calculate $d_K(c,x_t)$ for each $c\in C$ by Lemma 1
- 6: $d_t\leftarrow\min\{d_K(c,x_t) : c\in C\}$
- 7: else
- 8: $d_t\leftarrow 0$
- 9: end if
- 10: end for
- 11: use the reduction algorithm to find the maximum value $\hat{\gamma}(C)\leftarrow\max\{d_t : t\in T\}$
- 12: copy the final reduction result to the host
- 13: return $\hat{\gamma}(C)$
Algorithm 1 yields good estimations of $\Gamma(C,K)$ and is much faster, as seen in Table 1. Both algorithms were run on a computer equipped with an AMD Ryzen 9 3900X 12-core processor and an NVIDIA A4000 graphics processor. For Algorithm 1, the accuracy is given by Proposition 1. For the geometric branch-and-bound algorithm, we set a relative accuracy (cf. [9] for the usage of the relative accuracy). The execution time of the geometric branch-and-bound approach exhibits substantial variability among different choices of $C$, whereas the algorithm based on CUDA shows relatively consistent execution times across the various cases.
3.2. Different Stochastic Global Optimization Algorithms for $\Gamma_p(K)$
We choose to employ stochastic global optimization algorithms for several reasons. In the program proposed by Chuanming Zong (cf. [8]), after appropriately selecting a positive real number $\varepsilon$ and constructing an $\varepsilon$-net $\mathcal{N}$ for $\mathcal{K}^n$ endowed with the Banach–Mazur metric, we only need to verify that $\Gamma_{2^n}(K)\le c$ holds for each $K\in\mathcal{N}$, where $c$ is a reasonably accurate estimate of the least upper bound of $\{\Gamma_{2^n}(K) : K\in\mathcal{K}^n\}$. For this purpose, we do not need to determine exact values of covering functionals of convex bodies in $\mathcal{N}$. Stochastic global optimization algorithms demonstrate low time complexity and high algorithmic efficiency. Moreover, based on the results presented in Table 2, it is evident that stochastic global optimization algorithms provide satisfactory estimates for covering functionals.
The NLopt (Non-Linear Optimization) library is a rich collection of optimization routines and algorithms, which provides a platform-independent interface for global and local optimization problems (cf. [20]). Algorithms in the NLopt library are partitioned into four categories: non-derivative-based global algorithms, derivative-based global algorithms, non-derivative-based local algorithms, and derivative-based local algorithms. We use several non-derivative-based stochastic global algorithms here. All global optimization algorithms require bound constraints to be specified as optimization parameters (cf., e.g., ref. [21]).
The following is the framework of a stochastic optimization algorithm based on NLopt; a code sketch is given after the list.
- Define the objective function and boundary constraints.
- Declare an optimizer for NLopt.
- Set the algorithm and dimension.
- Set termination conditions. NLopt provides different termination condition options, including value tolerance, parameter tolerance, function value stop value, iteration number, and time.
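A minimal realization of this framework with NLopt's C++ interface might read as follows (our sketch; the quadratic objective is a placeholder standing in for the CUDA-based evaluation of $\hat{\gamma}(C)$ by Algorithm 1):

```cpp
#include <nlopt.hpp>
#include <vector>

// Placeholder objective: in our setting it would launch Algorithm 1
// to evaluate the covering estimate for the configuration encoded in x.
double objective(const std::vector<double>& x,
                 std::vector<double>& /*grad: unused, non-derivative*/,
                 void* /*data*/)
{
    double f = 0.0;
    for (double xi : x) f += xi * xi;   // stand-in for the real objective
    return f;
}

int main()
{
    const unsigned dim = 3 * 2;               // n = 3 coordinates for p = 2 centers
    nlopt::opt opt(nlopt::GN_CRS2_LM, dim);   // declare optimizer, set algorithm and dimension
    opt.set_lower_bounds(-2.0);               // boundary constraints (cf. Proposition 2 below)
    opt.set_upper_bounds(2.0);
    opt.set_min_objective(objective, nullptr);
    opt.set_maxtime(60.0);                    // termination: time limit in seconds
    opt.set_xtol_rel(1e-4);                   // termination: parameter tolerance

    std::vector<double> x(dim, 0.0);          // initial configuration
    double minf;
    nlopt::result result = opt.optimize(x, minf);
    return result > 0 ? 0 : 1;                // positive result codes indicate success
}
```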
Proposition 2. If $K$ satisfies (1), $\gamma\in(0,1)$, $x\in\mathbb{R}^n$, and $(\gamma K+x)\cap K\ne\emptyset$, then $x\in 2B$.
Proof. Suppose that $(\gamma K+x)\cap K\ne\emptyset$. Then there exist $k_1,k_2\in K$ such that $\gamma k_2+x=k_1$. Thus,
$$x=k_1-\gamma k_2\in B+\gamma B=(1+\gamma)B,$$
which shows that $x\in 2B$. □
Remark 3. By Proposition 2, when $K$ satisfies (1), we only need to search within $2B=[-2,2]^n$ for the points of a $p$-optimal configuration of $K$.
We tested several stochastic global optimization algorithms in order to choose an efficient one. The optimization algorithms under consideration include controlled random search with local mutation (GN_CRS2_LM) (cf. [22]), the evolutionary strategy ESCH (GN_ESCH) (cf. [23]), and the improved stochastic ranking evolution strategy for constrained optimization (GN_ISRES) (cf. [24]). See Algorithm 2.
Algorithm 2 A stochastic optimization algorithm for $\Gamma_p(K)$ based on NLopt
- Require: $K$, $C$, $p$, blockDim.x, gridDim.x, gridDim.y, an estimation $\alpha$ satisfying (1), lower bound LB and upper bound UB of the search domain
- Ensure: $\gamma_p$ as an estimation of $\Gamma_p(K)$.
- 1: $\gamma_p\leftarrow 1$
- 2: define the objective function $f$ as follows:
- 3: procedure $f$($C$, blockDim.x, gridDim.x, gridDim.y) ▹ Algorithm 1
- 4: return $\hat{\gamma}(C)$
- 5: end procedure
- 6: declare an NLopt optimizer opt with dimension $n\cdot p$
- 7: set the lower bounds of the search domain to LB
- 8: set the upper bounds of the search domain to UB
- 9: set the objective function of opt to $f$
- 10: set the termination conditions of opt
- 11: repeat
- 12: choose an initial configuration $C_0$ at random
- 13: $(\text{result}, C, \gamma)\leftarrow$ opt.optimize($C_0$)
- 14: if result indicates success and $\gamma<\gamma_p$ then
- 15: $\gamma_p\leftarrow\gamma$
- 16: end if
- 17: until the prescribed number of restarts is reached
- 18: return $\gamma_p$
Let $p$ be a positive integer, LB $=-2$, and UB $=2$. Table 3 shows a comparison between these three stochastic algorithms. It can be seen that GN_CRS2_LM performs better than the other ones.