1. Introduction
Tree data structures are well known and used in many algorithms. At the same time, when we construct algorithms with random behavior, like randomized and quantum algorithms, we should take the error probability into account. We suggest a general method for updating a tree data structure in the noisy case. We call it a walking tree. For a tree of height h, we consider an operation that processes all nodes from the root to a target node. Suppose the running time of the operation is O(h·T), where T is the running time required to process a node. Then, if navigation by the tree can have an error, our technique allows us to carry the operation out in O((h + log(1/ε))·T) running time, where ε is the error probability for the whole operation. Note that the standard way to handle probabilistic navigation is success probability boosting (repetition of the noisy action), which has O(h·log(h/ε)·T) complexity.
Our technique is based on results for the noisy binary search algorithm [1]. The authors of that paper present an idea based on a random walk on a balanced binary tree that can be constructed for the binary search algorithm. We generalize the idea to trees of arbitrary structure, which allows us to apply the method to a wide class of tree data structures. Different algorithms for noisy search, especially noisy tree and graph processing and search, were considered in [2,3,4,5,6,7,8]. We apply our technique to two tree data structures. The first one is the Red–Black tree [9], which is an implementation of a self-balanced binary search tree [9]. If the key comparing procedure has a bounded error, then our noisy self-balanced binary search tree allows us to conduct add, remove, and search operations in O(log N + log(1/ε)) running time, where ε is the error probability for a whole operation and N is the number of nodes in the tree. In the case of ε = 1/poly(N), we have O(log N) running time, and the noisy key comparing procedure does not affect the running time (asymptotically). At the same time, if we use the success probability boosting technique, then the running time is O(log N · log(log N / ε)).
The second one is the Segment tree [10,11]. If the indexes comparing procedure has a bounded error, then our noisy segment tree allows us to conduct update and request operations in O(log N + log(1/ε)) running time, where ε is the error probability for a whole operation and N is the number of leaves. In the case of ε = 1/poly(N), we have O(log N) running time. So, we obtain a similar advantage. We use these data structures in the context of quantum computation [12], which has been one of the hot topics in the last few decades. There are many problems where we can obtain a quantum speed-up. Some of them can be found in [13,14], including graph problems [15,16,17,18,19,20,21] and string processing problems [22,23,24,25,26,27,28]. Quantum algorithms have randomized behavior, so it is important to use noisy data structures in this model. We use the quantum query model [29] as the main computational model for the quantum algorithms. We apply the walking tree method to the following problems.
The first one is the string-sorting problem. We want to sort n strings of length l in the lexicographical order. However, quantum algorithms cannot sort arbitrary comparable objects asymptotically faster than with O(n log n) comparisons [30,31]. At the same time, some results improve the hidden constant [32,33]. Other researchers investigated the space-bounded case [34]. The situation with sorting strings is a little bit different. We know that the classical radix sort algorithm has O(n·l) running time [9] for a finite-size alphabet. That is faster than the sorting algorithms for arbitrary comparable objects. Here, the lower bound for classical (randomized or deterministic) algorithms is Ω(n·l). In the quantum case, faster algorithms with Õ(n·√l) running time are known [35,36]. Here, Õ does not consider log factors. In this paper, we suggest a simpler implementation based on a noisy Red–Black tree.
The second one is the Auto-Complete Problem. We have two kinds of queries: adding a string s to the dictionary S, and querying the most frequent completion of a string t from the dictionary. We call s a completion of t if t is a prefix of s. Assume that L is the total sum of the lengths of the strings from all queries. We solve the problem using the quantum string comparing algorithm [35,36,37,38,39] and the noisy Red–Black tree. The running time of the quantum algorithm is O(√(nL)·log n), where n is the number of queries. The lower bound for the quantum running time is Ω(√L). At the same time, the best classical algorithm, based on the trie (prefix tree) [40,41,42,43], has O(L) running time. That is also the classical (deterministic or randomized) lower bound Ω(L). So, we obtain a quantum speed-up if most of the strings have ω(log² n) length.
3. Main Technique: A Walking Tree
In this section, we present a rooted tree that we call a walking tree. It is a utility data structure that enables noisy computation on the main data structure. Here we use it for the following data structures: (i) the Binary Search Tree, where we assume that the element comparing procedure can have errors; (ii) the Segment Tree, where we assume that the indexes (borders of segments) comparing procedure can have errors.
Note that the walking tree is a general technique, and it can be used for other tree data structures. Let us present the general idea of the tree. The technique is motivated by [
1]. Assume that
G is a rooted tree. We are interested in moving from the root to a specific (
target) node operation. Assume that we have the following procedures:
GetTheRoot(G) returns the root node of the tree G.
SelectChild(v) returns the child of the node v that should be reached from v. We assume that there is only one child that should be reached from a node.
IsItTheTarget(v) returns True if the node is the last node that should be visited in the operation, and returns False otherwise.
ProcessANode(v) processes the node in the required way.
IsANodeCorrect(v) returns True if the node should be visited during the operation, and returns False if the node is visited because of an error.
Assume that the operation has the following form (Algorithm 1).
Algorithm 1 An operation on the tree G

v ← GetTheRoot(G)
ProcessANode(v)
while IsItTheTarget(v) = False do
    v ← SelectChild(v)
    ProcessANode(v)
end while
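For readers who prefer code, the following is a minimal Python sketch of this operation; the callback-based interface (get_the_root, select_child, is_it_the_target, process_a_node as plain functions) is our illustration rather than part of the formal model.

def run_operation(G, get_the_root, select_child, is_it_the_target, process_a_node):
    # Noiseless version: navigation answers are assumed to be always correct.
    v = get_the_root(G)
    process_a_node(v)
    while not is_it_the_target(v):
        v = select_child(v)
        process_a_node(v)
    return v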
Let us consider the operation such that the "navigation" procedures (that are SelectChild, IsANodeCorrect, and IsItTheTarget) can return an answer with an error probability p, where p ≤ 1/2 − δ, where δ > 0 is a constant. We assume that error events are independent. We need to isolate p from 1/2. We cannot be sure that the algorithm works if p is arbitrarily close to 1/2. That is why we use the p ≤ 1/2 − δ statement. Our goal is to conduct the operation with an error probability at most ε. Note that in the general case, ε can be non-constant and depend on the number of tree nodes. Let h be the height of the tree. The standard technique is boosting the success probability. On each step, we repeat the SelectChild procedure O(log(h/ε)) times and choose the most frequent answer. In that case, the error probability of the operation is at most ε, and the running time of the operation is O(h·log(h/ε)·T), where T is the complexity of the ProcessANode procedure. Our goal is to have O((h + log(1/ε))·T) running time.
Let us construct a rooted tree W by G such that the set of nodes of W has a one-to-one correspondence with the set of nodes of G, and the same holds for the sets of edges. We call W a walking tree. Let φ and φ⁻¹ be the bijections between these two sets of nodes. For simplicity, we define the procedures for W similarly to the procedures for G. Suppose u = φ(v); then GetTheRoot(W) = φ(GetTheRoot(G)); SelectChild(u) = φ(SelectChild(v)); IsItTheTarget(u) = IsItTheTarget(v); IsANodeCorrect(u) = IsANodeCorrect(v). Note that the navigation procedures are noisy (have an error). We reduce their error probability to a constant p₀ < 1/6 by a constant number of repetitions (using the boosting success probability technique). Additionally, we associate a counter c(u) with each node u; it is a non-negative integer number. Initially, the values of the counters are 0 for all nodes, i.e., c(u) = 0 for each node u of W.
We invoke a random walk on the walking tree W. The walk starts from the root node u = GetTheRoot(W). Let us discuss the processing of a node u. Firstly, we check the counter's value c(u). If c(u) = 0, then we carry out Steps 1.1 to 1.3.
Step 1.1. We check the correctness of the current node using the IsANodeCorrect(u) procedure. If the result is True, then we go to Step 1.2. If the result is False, then we are here because of an error, and we go up by the assignment u ← Parent(u). If the node u is the root, then we stay in u.
Step 1.2. We check whether the current node is the target using the IsItTheTarget(u) procedure. If the result is True, then we increase the counter c(u) ← c(u) + 1. If it is False, then we go to Step 1.3.
Step 1.3. We go to the child u ← SelectChild(u).
If c(u) > 0, then we carry out Step 2.1. We can say that the counter c(u) is a measure of confidence that u is the target node. If c(u) = 0, then we should continue walking. If c(u) > 0, then we think that u is the target node. A bigger value of c(u) means that we are more confident that it is the target node.
Step 2.1. If IsItTheTarget(u) = True, then we increase the counter c(u) ← c(u) + 1. Otherwise, we decrease the counter c(u) ← c(u) − 1. So, we become more or less confident in the fact that the node u is the target.
The walking process stops after s steps, where s = O(h + log(1/ε)); the exact value of s is chosen in the proof of Theorem 1. The stopping node u is declared the target one. After that, we carry out the operation on the original tree G. We store the path in P = (P_1, …, P_k), such that P_k = φ⁻¹(u), P_{i−1} = Parent(P_i), and P_1 is the root node of G. Then, we process these nodes using ProcessANode(P_i) for i from 1 to k. Let a procedure OneStep(u) be one step of the walking process on the walking tree W. It accepts the current node u and returns the new node. The code representation of the procedure is in Algorithm 2.
Algorithm 2 One step of the walking process, OneStep(u). The input is the current node u, and the result is the node for the next step of the walking

if c(u) = 0 then
    if IsANodeCorrect(u) = False then ▹ Step 1.1
        if u ≠ GetTheRoot(W) then
            u ← Parent(u)
        end if
    else
        if IsItTheTarget(u) = True then ▹ Step 1.2
            c(u) ← c(u) + 1
        else
            u ← SelectChild(u) ▹ Step 1.3
        end if
    end if
else
    if IsItTheTarget(u) = True then ▹ Step 2.1
        c(u) ← c(u) + 1
    else
        c(u) ← c(u) − 1
    end if
end if
return u
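A Python rendering of Algorithm 2 under an assumed interface (node objects with a parent pointer, and the noisy procedures passed as callbacks); each noisy oracle is assumed to be already boosted to a constant error probability p₀ < 1/6, as described above.

class WalkNode:
    def __init__(self, parent=None):
        self.parent = parent      # None for the root of W
        self.counter = 0          # the confidence counter c(u)

def one_step(u, select_child, is_it_the_target, is_a_node_correct):
    if u.counter == 0:
        if not is_a_node_correct(u):          # Step 1.1: we suspect an error
            return u.parent if u.parent is not None else u
        if is_it_the_target(u):               # Step 1.2: looks like the target
            u.counter += 1
            return u
        return select_child(u)                # Step 1.3: go down
    if is_it_the_target(u):                   # Step 2.1: re-test the hypothesis
        u.counter += 1
    else:
        u.counter -= 1
    return u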
The whole algorithm is presented in Algorithm 3.
Algorithm 3 The walking algorithm for s steps

u ← GetTheRoot(W)
for i ∈ {1, …, s} do
    u ← OneStep(u)
end for
v ← φ⁻¹(u), P ← (v), k ← 1
while v ≠ GetTheRoot(G) do
    v ← Parent(v)
    P ← (v) ∘ P ▹ Here ∘ means the concatenation of two sequences. The line adds the node v to the beginning of the path sequence P
    k ← k + 1 ▹ The length of the path sequence
end while
for i ∈ {1, …, k} do
    ProcessANode(P_i)
end for
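The whole walk, again as an illustrative Python sketch built on one_step above; phi_inv stands for the bijection φ⁻¹ between W and G, and s is chosen as in Theorem 1. All names are ours.

def walk_and_process(root_w, s, step_args, phi_inv, process_a_node, root_g):
    u = root_w
    for _ in range(s):                 # s = O(h + log(1/epsilon)) steps
        u = one_step(u, *step_args)    # step_args = (select_child, is_it_the_target, is_a_node_correct)
    v = phi_inv(u)                     # the claimed target node in G
    path = [v]
    while v is not root_g:             # recover the root-to-target path in G
        v = v.parent
        path.insert(0, v)              # prepend: the path is stored root-first
    for node in path:                  # process the path top-down
        process_a_node(node)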
Let us discuss the algorithm and its properties. At each node, we have two options: we go in the direction of the target node or in the opposite direction.
Assume that c(u) = 0 for the current node u. If we are in a wrong branch, then the only correct direction is the parent node. If we are in the correct branch, then the only correct direction is the correct child node. All other directions are wrong. Assume that c(u) > 0. If we are in the target node, then the only correct direction is increasing the counter, and the wrong direction is decreasing the counter. Otherwise, the only correct direction is decreasing the counter.
Choosing the direction is based on the results of at most three invocations of the navigation procedures (SelectChild, IsANodeCorrect, and IsItTheTarget). Remember that we reach the error probability p₀ < 1/6 for each of them using a constant number of repetitions. Due to the independence of the error events, the total error probability of choosing a direction is at most 3p₀ < 1/2. So, the probability of moving in the correct direction is at least β = 1 − 3p₀ > 1/2, and the probability of moving in a wrong direction is at most 1 − β. Let us show that if s = O(h + log(1/ε)) with suitable constants, then the error probability for an operation on G is at most ε. Note that ε can be non-constant. In Corollaries 1 and 2, we have h = O(log N).
Theorem 1. Given error probability ε > 0, Algorithm 3 completes the same action as Algorithm 1 with error probability at most ε and a running time of O((h + log(1/ε))·T).
Proof. Let us consider the walking tree. We emulate the counter by replacing it with a chain of nodes of length s. Formally, for a node u we add nodes u⁽¹⁾, …, u⁽ˢ⁾ such that Parent(u⁽¹⁾) = u and Parent(u⁽ⁱ⁾) = u⁽ⁱ⁻¹⁾ for i ∈ {2, …, s}. The only child of u⁽ⁱ⁾ is u⁽ⁱ⁺¹⁾ for i ∈ {1, …, s − 1}, and u⁽ˢ⁾ does not have children.
In that case, the increasing of c(u) can be emulated by moving from u⁽ⁱ⁾ to u⁽ⁱ⁺¹⁾. The decreasing can be emulated by moving from u⁽ⁱ⁾ to u⁽ⁱ⁻¹⁾. We can assume that u⁽⁰⁾ is the node u itself.
Let u_t be the target node, i.e., IsItTheTarget(u_t) = True. Let us consider the distance L between the deepest chain node u_t⁽ˢ⁾ of the target node and the current node in the modified tree. The distance L is a random variable. Each step of the walk increases or decreases the distance L by 1. So, we can present L = dist(r, u_t⁽ˢ⁾) − (x_1 + ⋯ + x_s), where r is the root node of W, dist(r, u_t⁽ˢ⁾) is the distance between r and u_t⁽ˢ⁾, and x_1, …, x_s are independent random variables that represent the i-th step and show the increasing or decreasing of the distance. Let x_i = 1 if we move in the correct direction, and x_i = −1 if we move in the wrong direction. Note that the probability of moving in the correct direction (x_i = 1) is at least β, and the probability of moving in the wrong direction (x_i = −1) is at most 1 − β. From now on, without loss of generality, we assume that Pr(x_i = 1) = β and Pr(x_i = −1) = 1 − β.
If L < s in the end of the walk, then we are in a node u_t⁽ʲ⁾ of the chain for some j ≥ 1 in the modified tree, i.e., we are in the node u_t in the original walking tree W. Note that dist(r, u_t⁽ˢ⁾) = dist(r, u_t) + s ≤ h + s, where dist(r, u_t) ≤ h by the definition of the height of a tree. Therefore, x_1 + ⋯ + x_s > h implies L < s. So, the probability of success of the operation is at least the probability of this event, i.e., Pr(success) ≥ Pr(x_1 + ⋯ + x_s > h).
Let χ_i = (x_i + 1)/2 for i ∈ {1, …, s}. We treat χ_1, …, χ_s as independent binary random variables. Let X = χ_1 + ⋯ + χ_s. For such an X and for any 0 < α < 1, the following form of the Chernoff bound [49] holds:
Pr(X ≤ (1 − α)μ) ≤ e^{−α²μ/2},  (1)
where μ = E[X]. Since E[χ_i] = β, we have μ = βs, and the inequality (1) becomes
Pr(X ≤ (1 − α)βs) ≤ e^{−α²βs/2}.
Substituting Y = x_1 + ⋯ + x_s = 2X − s for X, we obtain
Pr(Y ≤ 2(1 − α)βs − s) ≤ e^{−α²βs/2}.
From now on, without loss of generality, we assume that β = 1/2 + κ for some constant κ > 0. Let α = κ/(2β) and s = ⌈h/κ + (8/κ²)·ln(1/ε)⌉.
In the following steps, we relax the inequality by obtaining less tight bounds for the target probability.
Firstly, we obtain a new lower bound for the threshold: 2(1 − α)β − 1 = 2β − κ − 1 = κ, and since s ≥ h/κ, we have
2(1 − α)βs − s = κs ≥ h,
and hence
Pr(Y ≤ h) ≤ Pr(Y ≤ κs) ≤ e^{−α²βs/2}.
Secondly, we obtain a new upper bound for the right-hand side: α²β/2 = κ²/(8β) and β ≤ 1, so, using s ≥ (8/κ²)·ln(1/ε), we have
e^{−α²βs/2} = e^{−κ²s/(8β)} ≤ e^{−κ²s/8} ≤ ε.
Combining the two obtained bounds, we have
Pr(Y ≤ h) ≤ ε,
and hence Pr(Y > h) ≥ 1 − ε. Considering the probability of the opposite event, we finally obtain
Pr(success) ≥ Pr(Y > h) ≥ 1 − ε.
The number of steps is s = O(h + log(1/ε)), and each step, as well as the processing of each of the at most h nodes of the final path, requires O(T) running time. So, the total running time is O((h + log(1/ε))·T).
□
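As a concrete numerical illustration (our own choice of constants, not fixed by the proof): if the boosted navigation procedures err with probability p₀ = 1/10, then β = 1 − 3p₀ = 7/10 and κ = 1/5, so it suffices to take

\[ s = \left\lceil \frac{h}{\kappa} + \frac{8}{\kappa^2}\ln\frac{1}{\varepsilon} \right\rceil = \left\lceil 5h + 200\ln\frac{1}{\varepsilon} \right\rceil = O\!\left(h + \log\frac{1}{\varepsilon}\right). \]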
In the next section, we show several applications of the technique.
5. Quantum Sort Algorithm for Strings
As interesting applications, we suggest ones from quantum computing [12,29]. Let us discuss the string-sorting problem as the first of them.
Problem: There are n strings s⁰, …, sⁿ⁻¹ of length l for some positive integers n and l. The problem is to find a permutation σ = (σ_0, …, σ_{n−1}) such that s^{σ_i} < s^{σ_{i+1}}, or s^{σ_i} = s^{σ_{i+1}} and σ_i < σ_{i+1}, for each i ∈ {0, …, n − 2}.
The quantum sorting algorithm for strings was presented in [35,36]. The running time of the algorithm is Õ(n·√l). We can present an algorithm with the same complexity but in a simpler way. Assume that we have a noisy self-balanced binary search tree with strings as keys and a quantum string comparing procedure.
There is a quantum algorithm for comparing two strings quadratically faster than any classical counterpart. The algorithm is based on modifications [39,53] of Grover's search algorithm [44,45]. The result is the following.
Lemma 1 ([36]). There is a quantum algorithm that compares two strings s and t of lengths |s| and |t| in the lexicographical order with running time O(√(min(|s|, |t|))·log(1/δ)) and error probability at most δ.

We assume that the comparing procedure compares strings in the lexicographical order, and if they are equal, then it compares indexes. In fact, we store indexes of the strings in the nodes. We assume that if a node stores a key index i, then any node from the left subtree has a key index j such that s^j < s^i or (s^j = s^i and j < i); and any node from the right subtree has a key index j such that s^j > s^i or (s^j = s^i and j > i). Initially, the tree is empty. Let Add(i) be a function that adds a string s^i to the tree. Let ExtractMin() be a function that returns the index of the minimal string s from the tree according to the comparing procedure. After returning the index, the function removes it from the tree. The algorithm is two for-loops. The first one is adding all strings one by one using Add(i) for i ∈ {0, …, n − 1}. The second one is obtaining the indexes of the minimal strings σ_i ← ExtractMin() for i ∈ {0, …, n − 1}. The code representation of the algorithm is in Appendix A. The second for-loop can be replaced by an in-order traversal (DFS) of the tree for constructing the list. This approach has a smaller hidden constant in the big-O. The full idea is presented in Appendix A for completeness.
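The following Python sketch shows the logic of the two loops; a sorted list stands in for the noisy self-balanced tree, and the key (s^i, i) implements the "compare strings, then indexes" order. The quantum comparator of Lemma 1 changes only the cost of a comparison, not this logic.

import bisect

def sort_strings(strings):
    tree = []                               # stand-in for the noisy BST
    for i in range(len(strings)):           # first loop: Add(i)
        bisect.insort(tree, (strings[i], i))
    return [i for (_, i) in tree]           # second loop: repeated ExtractMin

# Example: sort_strings(["ba", "ab", "ab"]) returns [1, 2, 0].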
Theorem 4. The quantum running time for sorting n strings of length l is Õ(n·√l) and Ω(n log n + √l).
The upper bound is the complexity of the presented algorithm and of the algorithm from [36]. The proof of the lower bound is presented below.
For simplicity, we assume that the strings are binary, i.e., s^i ∈ {0,1}^l for i ∈ {0, …, n − 1}. Let us formally define the sorting function.
For positive integers n and l, let SORT_{n,l} : ({0,1}^l)^n → P_n be a function that obtains n binary strings of length l as input and returns the permutation of n integers that is the result of sorting the input strings. Here, P_n is the set of all permutations of integers from 0 to n − 1. For σ = (σ_0, …, σ_{n−1}) = SORT_{n,l}(s⁰, …, sⁿ⁻¹), we have s^{σ_i} < s^{σ_{i+1}} or (s^{σ_i} = s^{σ_{i+1}} and σ_i < σ_{i+1}), for i ∈ {0, …, n − 2}.
Note that in the case of l = 1, the function SORT_{n,1} can be used to compute the majority function. We use SORT_{n,1} to sort the strings (single bits), and the ⌈n/2⌉-th string in the sorted order is the value of the majority function. Therefore, we expect that the complexity of SORT_{n,1} should be Ω(n) [54]. In the case of n = 2, the function SORT_{2,l} is similar to the OR function, so we expect that it requires Ω(√l) queries [54].
Formally, we prove the following.
Lemma 2. For positive integers n and l, let MAJ_n : {0,1}^n → {0,1} be the majority function, and let FirstOne_l : {0,1}^l → {0, 1, …, l} be the function that returns the minimal index of a one in the input (and 0 if there is none); computing FirstOne_l is at least as hard as computing OR_l. Then
Q(SORT_{n,1}) ≥ Q(MAJ_n),  (2)
Q(SORT_{2,l}) ≥ Q(OR_l).  (3)
Proof. Consider the input x = (x_1, …, x_n) ∈ {0,1}^n. Suppose that σ = SORT_{n,1}(x_1, …, x_n). Then the proof of (2) follows from the fact that
MAJ_n(x_1, …, x_n) = x_{σ_{⌈n/2⌉}}.
Take an input y ∈ {0,1}^l. Let z = (y, w) be the pair of the words y and w = (0, …, 0). Now we see that
SORT_{2,l}(z) = (0, 1) if and only if OR_l(y) = 0,
because y and w are equal (and are ordered by index) if and only if y contains no ones, and this completes the proof. □
Note that [36] proves the lower bound of the form Ω(n log n). Combining their result with Lemma 2 and the known complexities Q(MAJ_n) = Θ(n) and Q(OR_l) = Θ(√l) [54], we get the following corollary.
Corollary 3. The complexity of SORT_{n,l} is Ω(n log n + √l).
6. Auto-Complete Problem
In this section, we present the Auto-Complete Problem and a quantum algorithm that is another application of the Noisy Binary Search Tree.
Problem: Assume that we use some constant-size alphabet, for example, binary, ASCII, or Unicode. We work with a sequence of strings S = (s^{i_1}, …, s^{i_k}), where k is the length of the sequence and i_1 < ⋯ < i_k are the increasing indexes of the strings. Here, the index i_j is the index of the query that added this string to S. Initially, the sequence S is empty. Then, we have n queries of two types. The first type is adding a string s to the sequence S. Let freq(u) be the number of occurrences (or the "frequency") of a string u among s^{i_1}, …, s^{i_k}. The second type is querying the most frequent completion from S of a string t. Let us define it formally. If t is a prefix of s^{i_j}, then we say that s^{i_j} is a completion of t. Let C(t) = {s^{i_j} : s^{i_j} is a completion of t} be the set of completions for t, and let f_max = max{freq(u) : u ∈ C(t)} be the maximal "frequency" of the strings from C(t). The problem is to find the index i_j of a completion s^{i_j} ∈ C(t) such that freq(s^{i_j}) = f_max (in the case of several such strings, the one with the minimal index).
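To make the semantics concrete, here is a brute-force Python reference implementation (ours, for illustration only; it scans all stored strings, while the data-structure solution below avoids that):

class AutoCompleteBruteForce:
    def __init__(self):
        self.freq = {}           # string -> frequency
        self.first_added = {}    # string -> index of the query that first added it

    def add(self, s, query_index):
        self.freq[s] = self.freq.get(s, 0) + 1
        self.first_added.setdefault(s, query_index)

    def most_frequent_completion(self, t):
        best = None              # (-frequency, first query index)
        for u, f in self.freq.items():
            if u.startswith(t):
                cand = (-f, self.first_added[u])
                if best is None or cand < best:
                    best = cand
        return None if best is None else best[1]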
We use a self-balanced search tree for our solution. A node v of the tree stores a 4-tuple (i, f_i, j, f_j), where i is the index of the string s^i that is "stored" in the node, and f_i = freq(s^i). The tree is a search tree by these strings s^i, similar to storing strings in Section 5. For comparing strings, we use the quantum procedure from Lemma 1. Therefore, our tree is noisy. The index j is the index of the most "frequent" string in the sub-tree whose root is v, and f_j = freq(s^j). Formally, for any node v′ from this sub-tree, if (i′, f_{i′}, j′, f_{j′}) is associated with v′, then f_{j′} < f_j or (f_{j′} = f_j and j ≤ j′).
Initially, the tree is empty. Let us discuss processing the first type of query. We want to add a string s to S. We search for a node v with associated (i, f_i, j, f_j) such that s^i = s. If we can find it, then we increase f_i. It means that the j parameter of the node v or its ancestors can be updated. There are at most O(log n) ancestors because the height of the tree is O(log n). So, for each ancestor v′ of v that is associated with (i′, f_{i′}, j′, f_{j′}), if f_{j′} < f_i or (f_{j′} = f_i and i < j′), then we update j′ ← i and f_{j′} ← f_i.
If we cannot find the string s in S, then we add a new node to the tree with the associated 4-tuple (r, 1, r, 1), where r is the index of the query. Note that if we re-balance nodes in the Red–Black tree, then we can easily recompute the j and f_j elements of the nodes.
Assume that we have a Search(s) procedure that returns a node v with associated (i, f_i, j, f_j) where s^i = s. If there is no such node, then the procedure returns NULL. A procedure AddAString(r) adds a node with the associated 4-tuple (r, 1, r, 1) to the tree. A procedure GetTheRoot() returns the root of the search tree. The processing of the first type of query is presented in Algorithm 4.
Algorithm 4 Processing a query of the first type with an argument s and a query number r

v ← Search(s)
if v ≠ NULL then
    (i, f_i, j, f_j) is associated with v
    f_i ← f_i + 1
    if f_j < f_i or (f_j = f_i and i < j) then
        j ← i, f_j ← f_i
    end if
    while v ≠ GetTheRoot() do
        v ← Parent(v)
        (i′, f_{i′}, j′, f_{j′}) is associated with v
        if f_{j′} < f_i or (f_{j′} = f_i and i < j′) then
            j′ ← i, f_{j′} ← f_i
        end if
    end while
else
    AddAString(r)
end if
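A Python sketch of Algorithm 4, assuming node objects with fields i, f_i, j, f_j and a parent pointer, and a tree object with search, add_a_string, and root (all names are ours; the noisy search itself is provided by the walking tree technique):

def better(f_new, idx_new, f_old, idx_old):
    # A candidate wins if it is strictly more frequent,
    # or equally frequent with a smaller index.
    return f_new > f_old or (f_new == f_old and idx_new < idx_old)

def process_add_query(tree, s, r):
    v = tree.search(s)              # noisy search (see Corollary 1), or None
    if v is None:
        tree.add_a_string(r)        # new node with (i, f_i, j, f_j) = (r, 1, r, 1)
        return
    v.f_i += 1
    if better(v.f_i, v.i, v.f_j, v.j):
        v.j, v.f_j = v.i, v.f_i
    i, f_i = v.i, v.f_i
    while v is not tree.root:       # update the subtree maxima on the way up
        v = v.parent
        if better(f_i, i, v.f_j, v.j):
            v.j, v.f_j = i, f_i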
Let us discuss processing the second type of query. All strings that can be a completion for t belong to the set {s^{i_j} : t ≤ s^{i_j} < t′}. Here, we can obtain t′ from the string t by replacing the last symbol with the next symbol of the alphabet. Formally, if t = (t_1, …, t_m), then t′ = (t_1, …, t_{m−1}, t′_m), where the symbol t′_m succeeds t_m in the alphabet. We can say that t < t′. The query processing consists of three steps.
Step 1. We search for a node v* such that t should be in the left sub-tree of v* and t′ should be in the right sub-tree of v*. Formally, if (i, f_i, j, f_j) is associated with v*, then t ≤ s^i and s^i < t′. For implementing this idea, the procedure IsItTheTarget(v) checks the following condition: t ≤ s^i < t′. The procedure SelectChild(v) returns the right child if s^i < t, i.e., Compare(s^i, t) = −1, and returns the left child otherwise.
If we come to the null node, then there are no completions of t. If we find v*, then we carry out the next two steps. In those two steps, we compute the index j_ans of the required string and f_ans = freq(s^{j_ans}).
Step 2. Let us look at the left sub-tree, searching with the string t. Let us find the node that contains the index of the minimal string that is at least t. For implementing this idea, the procedure IsItTheTarget(v) checks whether the current string is t. Formally, it checks Compare(s^i, t) = 0. Additionally, the procedure saves the current node in v_min if s^i ≥ t, i.e., Compare(s^i, t) ≥ 0. The procedure SelectChild(v) works as for searching for t. It returns the right child if s^i < t, i.e., Compare(s^i, t) = −1, and returns the left child otherwise. In the end, the target node is stored in v_min.
Then, we go up from this node. Let us consider a node v. If it is the right child of Parent(v), then its string is bigger than the string from Parent(v) and the strings from the left child's sub-tree, so we do nothing. If it is the left child of Parent(v), then its string is less than the string from Parent(v) and all strings from the right child's sub-tree, so we update j_ans and f_ans by the values from the parent node and its right child. Formally, if (i^p, f_{i^p}, j^p, f_{j^p}) is associated with the Parent(v) node and (i^r, f_{i^r}, j^r, f_{j^r}) is associated with its right child node, then we complete the following actions. If f_ans < f_{i^p} or (f_ans = f_{i^p} and i^p < j_ans), then j_ans ← i^p and f_ans ← f_{i^p}. If f_ans < f_{j^r} or (f_ans = f_{j^r} and j^r < j_ans), then j_ans ← j^r and f_ans ← f_{j^r}. This idea is presented in Algorithm 5.
Algorithm 5 Obtaining the answer of Step 2 by going up from v_min

v ← v_min
while v ≠ v* do
    if v = LeftChild(Parent(v)) then
        (i^p, f_{i^p}, j^p, f_{j^p}) is associated with Parent(v)
        if f_ans < f_{i^p} or (f_ans = f_{i^p} and i^p < j_ans) then
            j_ans ← i^p, f_ans ← f_{i^p}
        end if
        if RightChild(Parent(v)) ≠ NULL then
            (i^r, f_{i^r}, j^r, f_{j^r}) is associated with RightChild(Parent(v))
            if f_ans < f_{j^r} or (f_ans = f_{j^r} and j^r < j_ans) then
                j_ans ← j^r, f_ans ← f_{j^r}
            end if
        end if
    end if
    v ← Parent(v)
end while
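In the same assumed Python interface as before (node fields i, f_i, j, f_j, parent, left, right), the climb of Algorithm 5 looks as follows; better is the helper defined after Algorithm 4.

def climb_from_min(v_min, v_star, j_ans, f_ans):
    # Climb from v_min to the sub-tree root v_star, collecting the best
    # (most frequent, then smallest-index) completion on the way.
    v = v_min
    while v is not v_star:
        p = v.parent
        if v is p.left:              # p and its right sub-tree lie inside the range
            if better(p.f_i, p.i, f_ans, j_ans):
                j_ans, f_ans = p.i, p.f_i
            if p.right is not None and better(p.right.f_j, p.right.j, f_ans, j_ans):
                j_ans, f_ans = p.right.j, p.right.f_j
        v = p
    return j_ans, f_ans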
Step 3. Let us look at the right sub-tree, searching with the string t′. Let us find the node v_max that contains the index of the maximal string that is less than t′. Then, we go up from this node and carry out the actions symmetric to Step 2.
Each of these three steps observes the nodes of a single branch, so each of them requires O(log n) comparisons of strings. The complexity of the quantum algorithm is presented in Theorem 5. Before presenting the theorem, let us present a lemma that helps us to prove the quantum and classical lower bounds.
Lemma 3. The Auto-Complete Problem is at least as hard as the unstructured search for a 1 among Ω(L) bits.
Proof. Assume that the alphabet is binary. For any other case, we just consider two letters from the alphabet. Assume that all strings from the queries of the first type have length k.
Let m = ⌊(n − 1)/2⌋. Let us consider the following input. We have m queries of the first type of the next form: we add the strings u¹, …, uᵐ, where u^j_q = 0 for all j ∈ {1, …, m} and q ∈ {1, …, k}, except at most one bit in a position q < k. We have two cases.
The first case is the following: there is exactly one pair (j, q) such that u^j_q = 1, q < k, and u^{j′}_{q′} = 0 for all other pairs (j′, q′). In the second case, there is no such pair, i.e., all the strings are zero strings.
The next m queries of the first type are of the form z = (0, …, 0, 1), i.e., the string that differs from the zero string only in the last symbol.
If n is even, then we add one more query of the first type of the form (1, 1, …, 1); it does not affect the answer.
The last query is of the second type, of the form t = (0, …, 0) of length k − 1.
If we have the first case, then freq((0, …, 0)) = m − 1 < m = freq(z), and the answer is the index of the first query that adds z. If we have the second case, then freq((0, …, 0)) = m = freq(z), and the zero string has a smaller index. Therefore, the answer is the index of the first query that adds the zero string.
Hence, answering the queries is at least as hard as distinguishing between these two cases. At the same time, it requires searching for a 1 among m(k − 1) = Ω(L) bits. □
Based on the presented lemma and the discussed algorithm, we can present quantum lower and upper bounds for the problem in the next theorem.
Theorem 5. The quantum algorithm with a noisy self-balanced search tree for the Auto-Complete Problem has O(√(nL)·log n) running time and error probability at most 1/3, where L is the sum of the lengths of the strings from all queries. Additionally, the lower bound for the quantum running time is Ω(√L).
Proof. Let us start with the upper bound. Let us consider the processing of the first type of query. Here we add a string s. Firstly, we find the target node with O(√|s|·log n) running time and error probability at most 1/(3n) according to Corollary 1. Then, we consider at most O(log n) ancestors for updating, with O(log n) running time and no error. So, processing the first type of query works with O((√|s| + 1)·log n) running time and 1/(3n) error probability.
Let us consider the processing of the second type of query. Here we search for a completion of a string t. Searching for the nodes v*, v_min, and v_max works with O(√|t|·log n) running time and error probability at most 1/(3n) according to Corollary 1. Then, we consider at most O(log n) ancestors for updating the answer, with O(log n) running time and no error. So, processing the second type of query works with O((√|t| + 1)·log n) running time and 1/(3n) error probability.
Let l_1, …, l_n be the lengths of the strings from the queries. So, the total complexity is
O(Σ_{i=1}^{n} √l_i · log n) = O(log n · √(n·Σ_{i=1}^{n} l_i)) = O(√(nL)·log n).
The last two equalities are due to the Cauchy–Bunyakovsky–Schwarz inequality and Σ_{i=1}^{n} l_i = L.
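Explicitly, the Cauchy–Bunyakovsky–Schwarz step reads

\[ \sum_{i=1}^{n} \sqrt{l_i} = \sum_{i=1}^{n} 1 \cdot \sqrt{l_i} \le \sqrt{\sum_{i=1}^{n} 1^2} \cdot \sqrt{\sum_{i=1}^{n} l_i} = \sqrt{n L}. \]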
The error probability of processing a single query is at most 1/(3n). Therefore, the error probability of processing all n queries is at most 1/3 because all the error events are independent.
Let us discuss the lower bound. It is known [55] that the quantum running time for the unstructured search among M variables is Ω(√M). So, due to Lemma 3, we obtain the required Ω(√L) lower bound. □
Let us consider the classical (deterministic or randomized) case. If we use the same self-balanced search tree, then the running time is O(L·log n). At the same time, if we use the trie (prefix tree) data structure [40], then the complexity is O(L).
We store all strings of S in the trie. For each terminal node, we store the "frequency" of the corresponding string. For each node v (even a non-terminal one) that corresponds to a string u (as the path from the root to v), additionally to the regular information, we store the index of the required completion for u and its frequency. When we process the first type of query, we update the frequency in the terminal node and update the additional information in all ancestor nodes because they store all possible prefixes of the string. For processing a query of the second type, we just find the node that corresponds to t and take the answer from the additional information of that node.
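A compact Python sketch of this trie solution (field names are ours); trie_add runs in O(|s|) time and trie_query in O(|t|) time, which gives the O(L) total:

class TrieNode:
    def __init__(self):
        self.children = {}       # symbol -> TrieNode
        self.freq = 0            # frequency if a stored string ends here
        self.first_idx = None    # query index that first added that string
        self.best_idx = None     # index of the best completion in this sub-tree
        self.best_freq = 0

def trie_add(root, s, r):
    path, v = [root], root
    for ch in s:
        v = v.children.setdefault(ch, TrieNode())
        path.append(v)
    if v.first_idx is None:
        v.first_idx = r
    v.freq += 1
    for node in path:            # every node on the path is a prefix of s
        if (node.best_idx is None or v.freq > node.best_freq or
                (v.freq == node.best_freq and v.first_idx < node.best_idx)):
            node.best_idx, node.best_freq = v.first_idx, v.freq

def trie_query(root, t):
    v = root
    for ch in t:
        if ch not in v.children:
            return None          # no completions of t
        v = v.children[ch]
    return v.best_idx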
We can show that it is also the lower bound for the classical case.
Lemma 4. The classical running time for the Auto-Complete Problem is Θ(L), where L is the sum of the lengths of the strings from all queries.
Proof. Let us start with the upper bound. Similarly to the proof of Theorem 5, we can show that the running time of processing a query is O(|s|) or O(|t|), depending on the type of the query. Let l_1, …, l_n be the lengths of the strings from the queries. So, the total complexity is O(Σ_{i=1}^{n} l_i) = O(L).
Let us discuss the lower bound. It is known [55] that the classical running time for the unstructured search among M variables is Ω(M). So, due to Lemma 3, we obtain the required Ω(L) lower bound. □
If the strings have length ω(log² n), then we obtain a quantum speed-up, because in that case √(nL)·log n = o(L).