1. Introduction
We consider the problem of optimal box positioning, that is, finding a position of a d-dimensional box with given edge lengths that maximizes the number of enclosed points of a given n-element set $P$ of points in d-dimensional space. In this paper, we prove that this problem is NP-hard when the integers $d$ and $n$ are not fixed and are treated as parameters of the problem.
The problem of optimal box positioning has wide applications in computational geometry, data mining, and pattern recognition (see, e.g., [1, 2, 3]). In [4], the authors presented a clustering approach based on a greedy algorithm that finds an approximate solution of the optimal box positioning problem. The algorithm was inspired by the apparatus of maximum interval pattern concepts (see, e.g., [4, 5]), a technique that allows one to select patterns from fuzzy contexts. This approach was successfully applied to the dataset of tactile images registered by the Medical Tactile Endosurgical Complex [6, 7, 8], which allows intraoperative tactile examination of tissues. A comparison of the proposed clustering approach with conventional k-means clustering showed a statistically significant advantage of the proposed method over k-means in clustering quality. Note that the result proved in the present paper justifies developing algorithms that solve an approximate version of the optimal box positioning problem rather than the exact one.
The rest of the paper is organized as follows. In Section 2, we describe some known results. In Section 3, we introduce formal definitions and formulate the problem of optimal integer box positioning and the auxiliary problem of the existence of an integer m-box. In Section 4, we prove the NP-hardness of the problem of optimal box positioning. In Section 5, we summarize the results.
3. Formal Definitions
Definition 1. A d-dimensional box with edge lengths $a_1, \dots, a_d$ is a Cartesian product of the intervals $[b_i, b_i + a_i]$, where $b_i \in \mathbb{R}$ and $a_i > 0$ ($i = 1, \dots, d$).
Furthermore, we consider only boxes with integer edge lengths and vertex coordinates, i.e., boxes $[b_1, b_1 + a_1] \times \dots \times [b_d, b_d + a_d]$ with $a_i, b_i \in \mathbb{Z}$. We call such boxes integer boxes.
Definition 2. The problem of optimal integer box positioning is defined as follows: find an integer box with given edge lengths $a_1, \dots, a_d$ that maximizes the number of enclosed points of a set $P \subset \mathbb{Z}^d$, $|P| = n$.
In Section 4, we obtain the NP-hardness of the problem of optimal integer box positioning as a corollary of the theorem about the NP-completeness of the problem of the existence of an integer m-box.
Definition 3. The problem of the existence of an integer m-box is the problem of the existence of an integer box with given edge lengths $a_1, \dots, a_d$ that contains at least m points from a set $P \subset \mathbb{Z}^d$, $|P| = n$.
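In the general case, the parameters of both problems are the integers $d$, $n$, $a_1, \dots, a_d$ and the set P. The number m is considered either as a function of the other parameters (for example, $m = n$ in Section 4) or as a constant.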
It is easy to see that both problems belong to the complexity class P if the parameter d is fixed. Indeed, without loss of generality, we can consider only boxes $[b_1, b_1 + a_1] \times \dots \times [b_d, b_d + a_d]$ for which each $b_i$ is equal to the i-th coordinate of some point from the set P: raising $b_i$ to the smallest i-th coordinate among the enclosed points does not exclude any of them. So to solve the problem, we can count the number of points in at most $n^d$ boxes. Since each count can be performed in $O(nd)$ operations, the total number of operations for solving the problem is $O(d\,n^{d+1})$, which is polynomial in n.
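The counting argument above can be sketched in Python as an exhaustive search. This is only an illustration under stated assumptions: the names `count_enclosed` and `best_box` are hypothetical, points and corners are tuples of integers, and a box is taken to be a product of closed intervals whose lower bounds are given by `corner`.

```python
from itertools import product

def count_enclosed(points, corner, edges):
    """Count points p with corner[i] <= p[i] <= corner[i] + edges[i] for all i."""
    return sum(
        all(c <= x <= c + a for x, c, a in zip(p, corner, edges))
        for p in points
    )

def best_box(points, edges):
    """Exhaustive search over lower corners assembled from point coordinates.

    Tries at most n**d corners, each checked in O(n*d) comparisons,
    so the running time is polynomial in n only when d is fixed.
    """
    d = len(edges)
    # Candidate values for the i-th corner coordinate: i-th coordinates of points.
    candidates = [sorted({p[i] for p in points}) for i in range(d)]
    best_corner, best_count = None, -1
    for corner in product(*candidates):
        k = count_enclosed(points, corner, edges)
        if k > best_count:
            best_corner, best_count = corner, k
    return best_corner, best_count
```

For the four points `(0, 0)`, `(1, 1)`, `(3, 3)`, `(1, 0)` and a unit square, the search examines nine corners and returns the corner `(0, 0)` with three enclosed points.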
Definition 4. The 3-CNF satisfiability problem is the problem of the existence of an assignment to the Boolean variables $x_1, \dots, x_d$ that turns a formula $F = \bigwedge_{i=1}^{n} (l_{i1} \lor l_{i2} \lor l_{i3})$ in conjunctive normal form to 1 (here, $l_{ij}$ denote literals over variables from the set $\{x_1, \dots, x_d\}$). For further details, see, e.g., [9]. Without loss of generality, assume that the variables of every disjunctive clause are distinct. Indeed, otherwise a clause is either identically equal to 1 (if it contains both a variable and its negation) or can be replaced with at most four clauses with the required property such that the conjunction of these clauses is identically equal to the initial clause.
Cook’s theorem [12] states that the 3-CNF satisfiability problem is NP-complete. This fact underlies our proof of the NP-hardness of the problem of the existence of an integer m-box.
4. NP-Hardness of the Problem of Optimal Box Positioning
Theorem 1. The problem of the existence of an integer m-box belongs to the NP complexity class.
Proof. Suppose we have a certificate: a box B which encloses at least m points from the set P. Then the certificate validation can be performed by counting the cardinality of $B \cap P$, which can be done by iterating over the set P and checking whether the current point lies in the box B. Since P contains n elements and each check can be done with $2d$ comparisons, counting the cardinality of $B \cap P$ will take $O(nd)$ operations, which is polynomial in the parameters $d$ and $n$. □
Theorem 2. The problem of the existence of an integer m-box is NP-hard.
Proof. We will prove this theorem by employing a polynomial reduction of the 3-CNF satisfiability problem (which is NP-hard [12]) to the problem of the existence of an integer m-box. Consider an arbitrary formula F in conjunctive normal form with d variables $x_1, \dots, x_d$ and n disjunctive clauses $D_1, \dots, D_n$, each containing exactly 3 literals: $F = D_1 \land \dots \land D_n$, where $D_i = l_{i1} \lor l_{i2} \lor l_{i3}$; $l_{ij}$ denotes a literal over one of the variables $x_1, \dots, x_d$.
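We construct the set $P \subset \mathbb{Z}^d$ by the following procedure. Consider the disjunctive clause $D_i$ with variables $x_{j_1}$, $x_{j_2}$, $x_{j_3}$ and the set $\mathcal{S}_i$ of its satisfying assignments over the variable set $\{x_{j_1}, x_{j_2}, x_{j_3}\}$. Since each disjunctive clause contains exactly 3 literals corresponding to distinct variables, it holds that $|\mathcal{S}_i| = 7$. We map the pair $(D_i, \sigma)$, $\sigma \in \mathcal{S}_i$, to the point $p = (p_1, \dots, p_d)$ with coordinates defined by the following rule:
$$p_l = \begin{cases} 2\sigma(x_l), & \text{if } l \in \{j_1, j_2, j_3\}, \\ 1, & \text{otherwise,} \end{cases} \qquad l = 1, \dots, d.$$
In other words, $p_l = 0$ if $x_l$ occurs in $D_i$ and is assigned 0, $p_l = 2$ if $x_l$ occurs in $D_i$ and is assigned 1, and $p_l = 1$ if $x_l$ does not occur in $D_i$.
We define the set P as the image of this map over all clauses $D_i$ and their sets of satisfying assignments $\mathcal{S}_i$, so $|P| = 7n$ when points obtained from different clauses are counted separately. For further convenience, we also introduce sets $P_i$, $i = 1, \dots, n$, as subsets of P that consist of all points associated with $D_i$.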
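The construction of P can be sketched in Python as follows. The encoding is an assumption made for this illustration and is not taken from the paper: a clause is a tuple of three nonzero integers, where `k` stands for the variable with index k and `-k` for its negation, and the names `clause_points` and `build_point_set` are hypothetical. The points are kept in a list, so that coinciding points produced by different clauses are counted separately.

```python
from itertools import product

def clause_points(clause, d):
    """Return the 7 points of one clause over three distinct variables."""
    variables = [abs(lit) for lit in clause]
    points = []
    for values in product((0, 1), repeat=3):
        # A literal is true when the value of its variable matches its sign.
        if any((lit > 0) == bool(v) for lit, v in zip(clause, values)):
            point = [1] * d                   # variables absent from the clause
            for var, v in zip(variables, values):
                point[var - 1] = 2 * v        # 0 for value 0, 2 for value 1
            points.append(tuple(point))
    return points

def build_point_set(clauses, d):
    """Concatenate the points of all clauses: 7 points per clause."""
    return [p for clause in clauses for p in clause_points(clause, d)]
```

For the clause `(1, -2, 3)` with d = 4, the only falsifying assignment sets the first and third variables to 0 and the second to 1, so the point `(0, 2, 0, 1)` is the one omitted from the eight candidates.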
To complete the proof of the theorem, we prove the following lemmas.
Lemma 1. In the above notation, for an arbitrary unit cube $C$ and for all $i \in \{1, \dots, n\}$, the intersection $C \cap P_i$ contains zero points or one point.
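Proof. Consider an arbitrary $i \in \{1, \dots, n\}$ and the points $p^1, \dots, p^7 \in P_i$ associated with $D_i$ through its satisfying assignments $\sigma^1, \dots, \sigma^7$. Since for any $j \ne k$, satisfying assignments $\sigma^j$ and $\sigma^k$ are different, there exists $l$ such that the values of variable $x_l$ in $\sigma^j$ and $\sigma^k$ are opposite. Hence, the l-th coordinates of $p^j$ and $p^k$ differ by 2 (one of these coordinates equals 0, and the other equals 2). Thus, points $p^j$ and $p^k$ cannot belong to the same unit cube. □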
Lemma 2. In the above notation, a formula F is satisfiable if and only if there exists a unit cube $C$ such that $|C \cap P| = n$.
Proof. Let us first prove that if F is satisfiable, then a cube C with $|C \cap P| = n$ exists. Let $S = (s_1, \dots, s_d)$ be a satisfying assignment for F. We construct a subset $P_S \subset P$ consisting of the points that correspond to the satisfying assignments $\sigma_i$ of the clauses $D_i$ matching the satisfying assignment S. Since for each $i \in \{1, \dots, n\}$ there exists exactly one satisfying assignment $\sigma_i$ of $D_i$ that matches S, we have $|P_S| = n$. Let $p = (p_1, \dots, p_d)$ be an arbitrary point in $P_S$ and $l \in \{1, \dots, d\}$. If $x_l$ is not met in the respective clause, the value of $p_l$ will be equal to 1. Otherwise, the value of $p_l$ will be equal to $2 s_l$. This means that if $s_l = 0$, the value of $p_l$ will lie in the interval $[0, 1]$, and otherwise in the interval $[1, 2]$. Thus, the cube $C = [b_1, b_1 + 1] \times \dots \times [b_d, b_d + 1]$, where $b_l = s_l$, covers the n-element set $P_S$. Note that $P_S$ contains exactly one point corresponding to each clause, so according to Lemma 1, the cube C has no common points with $P \setminus P_S$. Thus $|C \cap P| = n$.
Now we prove that if a unit cube $C$ with $|C \cap P| = n$ exists, then F is satisfiable. Let C be the specified unit cube. By Lemma 1, we conclude that $C \cap P$ contains exactly one point corresponding to each clause. Since each edge length of C is equal to 1 and the cube vertex coordinates are integers, the list of l-th coordinates of the points from $C \cap P$ (for fixed $l \in \{1, \dots, d\}$) contains at most one value from the set $\{0, 2\}$; we denote this value by $2 s_l$, setting $s_l = 0$ if neither value occurs. From the procedure of construction of the set P, we conclude that $S = (s_1, \dots, s_d)$ is a satisfying assignment for F. □
Lemmas 1 and 2 directly imply the following assertion.
Lemma 3. In the above notation, a formula F is satisfiable if and only if there exists a unit m-cube for $m = n$ and the set P.
To complete the proof of Theorem 2, we consider the problem of the existence of an integer m-box (with m equal to n) in d-dimensional space for a box with all edge lengths equal to 1 (i.e., for the unit cube) and the constructed set P. Lemma 3 states that F is satisfiable if and only if there exists a unit cube that encloses n points. This statement, in combination with the fact that the set P can be constructed in time polynomial in $d$ and $n$, completes the proof of the theorem. □
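The equivalence used in this proof can be checked on small formulas with a short Python sketch. It is an illustration under stated assumptions rather than part of the proof: clauses are tuples of three nonzero integers (`k` for the variable with index k, `-k` for its negation), all function names are hypothetical, points are kept in a list so that coinciding points from different clauses are counted separately, and both searches are exponential in d. Only cube corners in $\{0, 1\}^d$ are tried, since all point coordinates lie in $\{0, 1, 2\}$.

```python
from itertools import product

def satisfiable(clauses, d):
    """Brute-force 3-CNF satisfiability over all 2**d assignments."""
    def holds(lit, s):
        return s[abs(lit) - 1] == (1 if lit > 0 else 0)
    return any(
        all(any(holds(lit, s) for lit in clause) for clause in clauses)
        for s in product((0, 1), repeat=d)
    )

def build_points(clauses, d):
    """Seven points per clause: 0/2 on its variables, 1 elsewhere."""
    pts = []
    for clause in clauses:
        for values in product((0, 1), repeat=3):
            if any((lit > 0) == bool(v) for lit, v in zip(clause, values)):
                p = [1] * d
                for lit, v in zip(clause, values):
                    p[abs(lit) - 1] = 2 * v
                pts.append(tuple(p))
    return pts

def unit_cube_with_n_points(clauses, d):
    """Return a corner b of a cube [b, b+1]^d enclosing n points, or None."""
    pts, n = build_points(clauses, d), len(clauses)
    for corner in product((0, 1), repeat=d):
        inside = sum(
            all(c <= x <= c + 1 for x, c in zip(p, corner)) for p in pts
        )
        if inside >= n:
            return corner
    return None
```

When a corner is returned, reading it as an assignment of 0 and 1 to the variables gives a satisfying assignment, which mirrors the second half of the proof of Lemma 2.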
Since the class of NP-complete problems is the intersection of the class NP and the class NP-hard, Theorems 1 and 2 immediately lead to the following theorem.
Theorem 3. The problem of the existence of an integer m-box is NP-complete.
Now we are ready to prove the main theorem.
Theorem 4. The problem of optimal integer box positioning is NP-hard.
Proof. This theorem is a trivial corollary of Theorem 3. Consider a set $P \subset \mathbb{Z}^d$. Then, finding the optimal position of an integer box B with edge lengths $a_1, \dots, a_d$ immediately leads to an answer to the problem of the existence of an integer m-box (by simply counting the number of points in the found box in $O(nd)$ operations and comparing it with m), which is proved to be NP-complete. Thus, we made a polynomial reduction of the problem of the existence of an integer m-box to the problem of optimal integer box positioning. □
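The reduction in this proof amounts to one solver call followed by one count, which can be sketched in Python. The interface is an assumption of this illustration: `find_optimal_corner` is a hypothetical callable that returns the lower corner of an optimally positioned box, and `brute_force_corner` is only a stand-in solver (exponential in d) so that the sketch can be run.

```python
from itertools import product

def exists_m_box(points, edges, m, find_optimal_corner):
    """Decide the m-box problem with one call to an optimal-positioning solver."""
    corner = find_optimal_corner(points, edges)
    # Count the points enclosed by the returned box: O(n*d) comparisons.
    inside = sum(
        all(c <= x <= c + a for x, c, a in zip(p, corner, edges))
        for p in points
    )
    return inside >= m

def brute_force_corner(points, edges):
    """Stand-in solver: tries every corner assembled from point coordinates."""
    candidates = [sorted({p[i] for p in points}) for i in range(len(edges))]

    def enclosed(corner):
        return sum(
            all(c <= x <= c + a for x, c, a in zip(p, corner, edges))
            for p in points
        )

    return max(product(*candidates), key=enclosed)
```

Any polynomial-time implementation of `find_optimal_corner` would therefore yield a polynomial-time decision procedure for the m-box problem.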
Note that the above proofs actually lead to stronger results, namely to NP-completeness of the problem of the existence of an integer unit m-cube and the NP-hardness of the problem of optimal integer unit cube positioning.
Corollary 1. The problem of optimal integer box positioning with a set of prohibited points (i.e., points that the box must not enclose) is NP-hard.
Proof. This statement immediately follows from the NP-hardness of the problem of optimal integer box positioning, since the latter is the particular case of the considered problem in which the set of prohibited points is empty. □
Corollary 2. The weighted problem of optimal integer box positioning with the range of the weight function in is NP-hard.
Proof. This is also a corollary of the NP-hardness of the problem of optimal integer box positioning, since we obtain the unweighted version of the problem by setting the weight function to 1 for all points. □