Proof of Theorem 2. We verify Conditions C1–C3 in [
37] to derive the rate of convergence. Define
as the Euclidean norm of a vector
u,
is the supremum norm of a function
h, and
. Moreover, let
P denote a probability measure. For convenience in the proof that follows, we define
, where
 is the set of
, and
 is the set of
g.
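For concreteness, the Euclidean and supremum norms above are the standard ones: writing u = (u_1, \dots, u_d)^\top for a d-dimensional vector (the dimension d and the argument x are generic notation introduced here),
\[
\|u\| = \Big(\sum_{i=1}^{d} u_i^{2}\Big)^{1/2},
\qquad
\|h\|_{\infty} = \sup_{x} |h(x)|.
\]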
q and
r are as defined in Condition 4. It is noteworthy that
 is identical to
except for the notation. Similarly,
is the corresponding sieve space containing
and
. First, as a direct consequence of Condition 5, Condition C1 holds. That is,
for any
. Second, we verify Condition C2 in [
37]. Based on Conditions 1–4, one can easily find that for every
,
in which
, and this implies that for any
,
. Thus, Condition C2 from [
37] holds when the symbol
 in their paper is set equal to one. Finally, we verify Condition C3 of [
37]. Define the class of functions
and let
denote the
-bracketing number with respect to the
norm of
. Then, we have
 by arguments similar to those in Lemma A3 of [
13], where
, and
is the dimensionality of
b. Since the covering number is always bounded by the bracketing number, we have
. Therefore, Condition C3 in [
37] is satisfied under
, and
 in their notation. Hence,
 of Theorem 1 in [
37] on page 584 can be taken as
. Since the term after the minus sign is close to zero when
, one can set a
 slightly larger than
 so that
 holds for large
n. Let
 replace
, keeping the same notation as
; then, the new constant
.
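As a point of reference, entropy bounds obtained from such sieve arguments typically take the following shape, where we write q_n for the sieve dimension and \mathcal{F}_n for the sieve class (generic notation introduced here, not the paper's exact constants or norm):
\[
\log N_{[\,]}\big(\varepsilon, \mathcal{F}_n, \|\cdot\|\big) \lesssim q_n \log(1/\varepsilon),
\qquad
N\big(\varepsilon, \mathcal{F}_n, \|\cdot\|\big) \le N_{[\,]}\big(2\varepsilon, \mathcal{F}_n, \|\cdot\|\big),
\]
the second relation being the precise sense in which a covering number is controlled by the corresponding bracketing number.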
Note that, by Theorem 1.6.2 in [
38], there are Bernstein polynomials
 such that
,
. Similarly, there also exists a function
satisfying
. Then, the sieve approximation error
in [
37] is
. Therefore, applying the Taylor expansion to
 around
 and then plugging in
, the Kullback–Leibler pseudodistance between
 and
 satisfies
The first equality holds because the first derivative of
 at
 equals zero. The penultimate inequality holds because all first- and second-order derivatives of the log-likelihood are bounded. Furthermore, since
, and
, the last inequality follows, so that
. Hence, by Theorem 1 in [
37], the convergence rate of
is
The proof of Theorem 2 is complete. □
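As a remark on the Taylor-expansion step used above, the Kullback–Leibler bound has the following generic shape (a sketch in generic notation, with \ell the log-likelihood, \theta_0 the true parameter, and \mathbb{E}_0 expectation under the truth; it is not the paper's exact display):
\[
\mathbb{E}_{0}\big\{\ell(\theta_{0}) - \ell(\theta)\big\}
= -\mathbb{E}_{0}\big\{\dot{\ell}(\theta_{0})[\theta - \theta_{0}]\big\}
- \tfrac{1}{2}\,\mathbb{E}_{0}\big\{\ddot{\ell}(\tilde{\theta})[\theta - \theta_{0}, \theta - \theta_{0}]\big\}
\lesssim \|\theta - \theta_{0}\|^{2},
\]
where the first term vanishes because the score has mean zero at \theta_{0} and the second is controlled by the boundedness of the second-order derivatives.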
Proof of Theorem 3. We sketch the proof of Theorem 3 in the following five steps.
Step 1. We first calculate the derivatives with respect to , such that , and so on; for convenience, we omit in the formulas of Step 1.
To obtain the score functions of
, let
denote an arbitrary parametric submodel of
, in which
satisfies the Fréchet derivative
. Similarly, we can also define a submodel of
g, denoted by
. Moreover, denote
 and
, where
and
. The score function along
is
with
,
and
. Analogously, we have the derivatives with respect to
g as
with
, and
.
The second-order derivatives of
have the form
Similarly, we can derive , and as, respectively, the derivatives of , and with respect to b.
and are, respectively, the derivatives of and with respect to , .
, and are, respectively, the derivatives of and with respect to g.
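In generic form, the construction of Step 1 computes directional derivatives along the submodels: for a one-dimensional submodel \{g_{s}\} with \partial g_{s}/\partial s\,|_{s=0} = h, the score operator in the direction h is (a sketch in generic notation, with h an arbitrary admissible direction)
\[
\dot{\ell}_{g}[h] = \frac{\partial}{\partial s}\, \ell(b, g_{s})\Big|_{s=0},
\qquad
\dot{\ell}_{b} = \frac{\partial}{\partial b}\, \ell(b, g),
\]
and the second-order derivatives listed above are obtained by differentiating these expressions once more.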
Step 2. Consider the classes of functions
and
. We need to show these three function classes are Donsker for any
. We determine the bracketing number of
in order to demonstrate that it is Donsker. In accordance with [
37], we have
for
. This yields a finite bracketing integral, and hence, by Theorem 2.8.4 of [
36], the class
 is Donsker. Similar arguments show that
and
are also Donsker.
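The sufficient condition invoked here is the bracketing central limit theorem: a finite bracketing integral implies the Donsker property. In its standard form (a sketch; the exact statement in [36] may differ in details),
\[
J_{[\,]}\big(1, \mathcal{F}, L_{2}(P)\big)
= \int_{0}^{1} \sqrt{\log N_{[\,]}\big(\varepsilon, \mathcal{F}, L_{2}(P)\big)}\, d\varepsilon < \infty
\;\Longrightarrow\;
\mathcal{F} \ \text{is } P\text{-Donsker}.
\]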
Step 3. Following arguments similar to those in Lemma 2 of [
19] and the properties of the score statistic, there exist
and
satisfying
Let
 denote the estimators of the sieve log-likelihood and let
 be the projection of
onto
,
. We get
Following the discussion in the proof of Theorem
of [
34], we can show that term (I) is equal to
. In addition, (II) is also equal to
based on (
A3). We obtain (III) as
due to
 being Donsker. As for the fourth term (IV), by Theorem 2 and the first-order linear expansion of
around
, one finds that (IV) is
 as well. Summing the four terms, we have
. Likewise, the same property holds for
. Hence, we have
Step 4. Combining (
A3) and (
A5), we can easily show that
Furthermore, following the arguments in the proof of Theorem 3.2 in [
13], there exists a neighborhood of
as
, where
. Then, applying the Taylor expansion for
yields
where
. Likewise, the corresponding properties of
 and
 are easy to obtain. Note that the derivatives of the score statistics are bounded. Applying Taylor series expansions about
 to (
A6) and combining them with Equations (
A7), we have
Taking the first equality in (
A8) and subtracting the second and third equalities, we have
Step 5. Define
and
; then, we have
where
. Next, we need to verify
Q is nonsingular. If
Q is a nonsingular matrix, then we can conclude
from
. Moreover, it suffices to show that if
, then
. Thus, we have
where
and
is the likelihood function. Under our Condition 3, (
A10) is equal to zero only if
. As a consequence, we have verified
Q is nonsingular.
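In generic form, this type of nonsingularity argument runs as follows (a sketch under the assumption, which is our reading of the step above, that Q is an information-type matrix of the form \mathbb{E}\{\ell^{*}\ell^{*\top}\}): for any vector a,
\[
a^{\top} Q a = \mathbb{E}\big\{(a^{\top}\ell^{*})^{2}\big\} = 0
\;\Longrightarrow\;
a^{\top}\ell^{*} = 0 \ \text{a.s.}
\;\Longrightarrow\;
a = 0,
\]
where the last implication is what Condition 3 delivers, so that Q is positive definite and, in particular, nonsingular.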
Substituting
Q into (
A9), we get
Since , we obtain . Thus, , with and being the efficient score function of . This completes the proof of Theorem 3. □
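In generic notation, the conclusion of Step 5 has the standard semiparametric form (a sketch under the assumption that Q equals the efficient information matrix \mathbb{E}\{\ell^{*}\ell^{*\top}\} and with X_{i} denoting a generic observation; the paper's exact display is not reproduced here):
\[
\sqrt{n}\,(\hat{b}_{n} - b_{0})
= Q^{-1} \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \ell^{*}(X_{i}) + o_{p}(1)
\;\xrightarrow{d}\;
N\big(0,\, Q^{-1}\big),
\]
so that the estimator of b is asymptotically normal with variance equal to the inverse efficient information, i.e., it is semiparametrically efficient.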