*5.5. Influence of Fixed-Size Subsets of Bits*

The result in Theorem 4.2 of [6], which is generalized and refined in Theorem 2 here, is now used to study the total influence of the *n* variables of an equiprobable random vector *X*<sup>*n*</sup> ∈ {−1, 1}<sup>*n*</sup> on a subset A ⊂ {−1, 1}<sup>*n*</sup>. To this end, let *X̄*<sup>(*i*)</sup> denote the vector obtained by flipping the bit at the *i*-th position of *X*<sup>*n*</sup>, so *X̄*<sup>(*i*)</sup> ≜ (*X*<sub>1</sub>, ..., *X*<sub>*i*−1</sub>, −*X*<sub>*i*</sub>, *X*<sub>*i*+1</sub>, ..., *X*<sub>*n*</sub>) for all *i* ∈ [*n*]. Then, the influence of the *i*-th variable is defined as

$$I\_i(\mathcal{A}) \triangleq \Pr\left[\mathbb{1}\{X^n \in \mathcal{A}\} \neq \mathbb{1}\{\overline{X}^{(i)} \in \mathcal{A}\}\right], \quad i \in [n], \tag{122}$$

and their total influence is defined to be the sum

$$I(\mathcal{A}) \stackrel{\triangle}{=} \sum\_{i=1}^{n} I\_i(\mathcal{A}). \tag{123}$$
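Definitions (122) and (123) can be evaluated by brute force for small *n*. The following is a minimal sketch, not part of the original derivation; the function names `influences` and `total_influence`, and the example set `dictator`, are illustrative assumptions.

```python
from itertools import product


def influences(A, n):
    """Return [I_1(A), ..., I_n(A)] from (122) for a subset A of {-1, 1}^n."""
    A = set(A)
    result = []
    for i in range(n):
        count = 0
        for x in product((-1, 1), repeat=n):
            flipped = x[:i] + (-x[i],) + x[i + 1:]
            # Count points whose membership in A changes when bit i is flipped.
            if (x in A) != (flipped in A):
                count += 1
        # X^n is equiprobable, so each point has probability 2^{-n}.
        result.append(count / 2 ** n)
    return result


def total_influence(A, n):
    """Total influence I(A) from (123)."""
    return sum(influences(A, n))


# Hypothetical example: all points of {-1, 1}^3 whose first bit equals +1.
n = 3
dictator = [x for x in product((-1, 1), repeat=n) if x[0] == 1]
print(influences(dictator, n))
print(total_influence(dictator, n))
```

For this example set, only the first variable has nonzero influence (equal to 1), so the total influence equals 1.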

As shown in Chapters 9 and 10 of [6], influences of subsets of the binary hypercube have far-reaching consequences in the study of threshold phenomena and in many other areas. As a corollary of (107), it is obtained in Theorem 4.3 of [6] that, for every subset A ⊂ {−1, 1}<sup>*n*</sup>,

$$I(\mathcal{A}) \ge 2\Pr(\mathcal{A})\log\_2\frac{1}{\Pr(\mathcal{A})}, \tag{124}$$

where Pr(A) ≜ ℙ[*X*<sup>*n*</sup> ∈ A] = |A| / 2<sup>*n*</sup> by the equiprobable distribution of *X*<sup>*n*</sup> over {−1, 1}<sup>*n*</sup>.
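As a sanity check, inequality (124) can be verified exhaustively on a small hypercube. The sketch below enumerates every nonempty subset of {−1, 1}<sup>3</sup>; the helper names `total_influence` and `rhs_124` are illustrative assumptions, and the empty set is skipped because the right-hand side of (124) is then read as zero by convention.

```python
from itertools import product
from math import log2


def total_influence(A, n):
    """Brute-force total influence I(A) of a subset A of {-1, 1}^n."""
    A = set(A)
    changes = 0
    for x in product((-1, 1), repeat=n):
        for i in range(n):
            flipped = x[:i] + (-x[i],) + x[i + 1:]
            if (x in A) != (flipped in A):
                changes += 1
    return changes / 2 ** n


def rhs_124(A, n):
    """Right-hand side of (124): 2 Pr(A) log2(1 / Pr(A))."""
    p = len(A) / 2 ** n
    return 2 * p * log2(1 / p)


n = 3
cube = list(product((-1, 1), repeat=n))
worst_gap = float("inf")
# Enumerate every nonempty subset of the cube via a bitmask.
for mask in range(1, 2 ** len(cube)):
    A = [x for j, x in enumerate(cube) if (mask >> j) & 1]
    worst_gap = min(worst_gap, total_influence(A, n) - rhs_124(A, n))

# A nonnegative value (up to rounding) means (124) held for every subset.
print(worst_gap)
```

The smallest gap is attained, for instance, by singletons, for which both sides of (124) equal *n* / 2<sup>*n*−1</sup>.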

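In light of Theorem 2, the approach used in Section 4.4 of [6] for the transition from (107) to (124) can also be used to obtain, as a corollary, a lower bound on the average total influence over all subsets of *d* variables. To this end, let *k*<sub>1</sub>, ..., *k*<sub>*d*</sub> be integers such that 1 ≤ *k*<sub>1</sub> < ... < *k*<sub>*d*</sub> ≤ *n*, let *X̄*<sup>(*k*<sub>1</sub>,...,*k*<sub>*d*</sub>)</sup> denote the vector obtained from *X*<sup>*n*</sup> by flipping the bits at positions *k*<sub>1</sub>, ..., *k*<sub>*d*</sub>, and let the influence of the variables in positions *k*<sub>1</sub>, ..., *k*<sub>*d*</sub> be given by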

$$I\_{(k\_1,\ldots,k\_d)}(\mathcal{A}) \triangleq \Pr\left[\mathbb{1}\{X^n \in \mathcal{A}\} \neq \mathbb{1}\{\overline{X}^{(k\_1,\ldots,k\_d)} \in \mathcal{A}\}\right].\tag{125}$$

Then, let the average influence of subsets of *d* variables be defined as

$$I^{(n,d)}(\mathcal{A}) \triangleq \frac{1}{\binom{n}{d}} \sum\_{\substack{(k\_1,\ldots,k\_d):\\1\le k\_1<\ldots<k\_d\le n}} I\_{(k\_1,\ldots,k\_d)}(\mathcal{A}). \tag{126}$$

Hence, by (123) and (126), *I*<sup>(*n*,1)</sup>(A) = (1/*n*) *I*(A) for every subset A ⊂ {−1, 1}<sup>*n*</sup>. Let

$$\mathcal{B}^{(n,d)}(\mathcal{A}) \triangleq \left\{ (\mathbf{x}^n, \mathbf{y}^n) : \mathbf{x}^n \in \mathcal{A}, \quad \mathbf{y}^n \in \{-1, 1\}^n \setminus \mathcal{A}, \quad \mathsf{d}\_{\mathbf{H}}(\mathbf{x}^n, \mathbf{y}^n) = d\right\},\tag{127}$$

be the set of ordered pairs of sequences (*x*<sup>*n*</sup>, *y*<sup>*n*</sup>), where *x*<sup>*n*</sup>, *y*<sup>*n*</sup> ∈ {−1, 1}<sup>*n*</sup> are at Hamming distance *d* from each other, with *x*<sup>*n*</sup> ∈ A and *y*<sup>*n*</sup> ∉ A. By the equiprobable distribution of *X*<sup>*n*</sup> on {−1, 1}<sup>*n*</sup>, we get

$$I^{(n,d)}(\mathcal{A}) = \frac{1}{\binom{n}{d}} \sum\_{\substack{(k\_1, \ldots, k\_d): \\ 1 \le k\_1 < \ldots < k\_d \le n}} \Pr\left[ \mathbb{1} \{ X^n \in \mathcal{A} \} \ne \mathbb{1} \{ \overline{X}^{(k\_1, \ldots, k\_d)} \in \mathcal{A} \} \right] \tag{128a}$$

$$= \frac{2}{\binom{n}{d}} \sum\_{\substack{(k\_1, \ldots, k\_d):\\1 \le k\_1 < \ldots < k\_d \le n}} \Pr\left[X^n \in \mathcal{A}, \ \overline{X}^{(k\_1, \ldots, k\_d)} \notin \mathcal{A}\right] \tag{128b}$$

$$=\frac{2}{\binom{n}{d}} \cdot \frac{|\mathcal{B}^{(n,d)}(\mathcal{A})|}{2^n} \tag{128c}$$

$$= \frac{|\mathcal{B}^{(n,d)}(\mathcal{A})|}{2^{n-1}\binom{n}{d}}.\tag{128d}$$
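The chain (128a)–(128d) can be checked numerically on a small instance. The sketch below computes *I*<sup>(*n*,*d*)</sup>(A) directly from (125) and (126), and compares it with the closed form in (128d); the function names and the example set are illustrative assumptions.

```python
from itertools import combinations, product
from math import comb


def flip(x, positions):
    """Flip the bits of x at the given (0-based) positions."""
    return tuple(-v if k in positions else v for k, v in enumerate(x))


def average_influence(A, n, d):
    """I^{(n,d)}(A) computed directly from (125) and (126)."""
    A = set(A)
    total = 0.0
    for S in combinations(range(n), d):
        changes = sum(
            (x in A) != (flip(x, S) in A) for x in product((-1, 1), repeat=n)
        )
        total += changes / 2 ** n
    return total / comb(n, d)


def boundary_pairs(A, n, d):
    """|B^{(n,d)}(A)|: ordered pairs (x, y), x in A, y not in A, d_H(x, y) = d."""
    A = set(A)
    return sum(flip(x, S) not in A for x in A for S in combinations(range(n), d))


n, d = 4, 2
A = [(1, 1, 1, 1), (1, 1, -1, -1), (-1, 1, 1, 1)]  # arbitrary illustrative set
lhs = average_influence(A, n, d)
rhs = boundary_pairs(A, n, d) / (2 ** (n - 1) * comb(n, d))  # (128d)
print(lhs, rhs)
```

For this set, |B<sup>(4,2)</sup>(A)| = 16, so both computations give 16 / (8 · 6) = 1/3.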

Since every point in A has $\binom{n}{d}$ neighbors at Hamming distance *d* in the set {−1, 1}<sup>*n*</sup>, it follows that

$$
\binom{n}{d} |\mathcal{A}| = 2 \left| \mathbb{E}\_d(G) \right| + \left| \mathcal{B}^{(n,d)}(\mathcal{A}) \right|, \tag{129}
$$

where *G* is introduced in Theorem 2, and E<sub>*d*</sub>(*G*) is the set of edges connecting pairs of vertices in *G* that are represented by vectors in A at Hamming distance *d*. The factor of 2 on the RHS of (129) arises because every edge whose two endpoints are in the set A is counted twice on the LHS, once from each endpoint. Hence, by (106) and (129),

$$\left| \mathcal{B}^{(n,d)}(\mathcal{A}) \right| = \binom{n}{d} \left| \mathcal{A} \right| - 2 \left| \mathbb{E}\_d(G) \right| \tag{130a}$$

$$\geq \binom{n}{d} |\mathcal{A}| - \frac{\binom{n-1}{d-1} |\mathcal{A}| \left( \log |\mathcal{A}| - \frac{n}{d} \log \ell\_d \right)}{\log \frac{m\_d}{\ell\_d}} \tag{130b}$$

$$= \binom{n}{d} |\mathcal{A}| \left( 1 - \frac{\frac{d}{n} \log |\mathcal{A}| - \log \ell\_d}{\log \frac{m\_d}{\ell\_d}} \right) \tag{130c}$$

$$= \binom{n}{d} |\mathcal{A}| \left( \frac{\log m\_d - \frac{d}{n} \log |\mathcal{A}|}{\log \frac{m\_d}{\ell\_d}} \right), \tag{130d}$$

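where (130c) relies on the identity $\binom{n-1}{d-1} = \frac{d}{n} \binom{n}{d}$, and the lower bound on the RHS of (130d) is positive if and only if |A| < (*m*<sub>*d*</sub>)<sup>*n*/*d*</sup> (see also (114)). Combining (128d) with (130) shows that the average influence of subsets of *d* variables satisfies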

$$I^{(n,d)}(\mathcal{A}) \ge \frac{|\mathcal{A}|}{2^{n-1}} \left( \frac{\log m\_d - \frac{d}{n} \log |\mathcal{A}|}{\log \frac{m\_d}{\ell\_d}} \right) \tag{131a}$$

$$= 2\Pr(\mathcal{A}) \left( \frac{\log m\_d - \frac{d}{n} \log \left( 2^n \Pr(\mathcal{A}) \right)}{\log \frac{m\_d}{\ell\_d}} \right) \tag{131b}$$

$$=2\Pr(\mathcal{A})\left(\frac{\frac{d}{n}\log\frac{1}{\Pr(\mathcal{A})}-\log\frac{2^d}{m\_d}}{\log\frac{m\_d}{\ell\_d}}\right).\tag{131c}$$
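The counting identity (129), on which the chain (130)–(131) rests, can be checked by direct enumeration. The sketch below assumes, in line with the description of E<sub>*d*</sub>(*G*) above, that every pair of vectors in A at Hamming distance *d* contributes exactly one edge; the function name `counting_identity` and the example set are illustrative assumptions.

```python
from itertools import combinations, product
from math import comb


def hamming(x, y):
    """Hamming distance between two equal-length tuples."""
    return sum(a != b for a, b in zip(x, y))


def counting_identity(A, n, d):
    """Return (LHS, RHS) of (129) for a subset A of {-1, 1}^n."""
    points = sorted(set(A))
    in_A = set(points)
    # Unordered pairs inside A at Hamming distance d (assumed edge set E_d(G)).
    edges = sum(1 for x, y in combinations(points, 2) if hamming(x, y) == d)
    # Ordered pairs (x, y) with x in A, y outside A, at Hamming distance d.
    boundary = sum(
        1
        for x in points
        for y in product((-1, 1), repeat=n)
        if y not in in_A and hamming(x, y) == d
    )
    return comb(n, d) * len(points), 2 * edges + boundary


n = 4
# Illustrative set: all vectors with an even number of -1 entries, plus one more.
A = [x for x in product((-1, 1), repeat=n) if x.count(-1) % 2 == 0]
A.append((-1, 1, 1, 1))
results = {d: counting_identity(A, n, d) for d in range(1, n + 1)}
print(results)
```

Each entry of `results` is a pair of equal integers, since every one of the $\binom{n}{d}$ neighbors of a point in A lies either inside or outside A.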

Note that setting *d* = 1, together with the default values *m*<sub>*d*</sub> = 2 and ℓ<sub>*d*</sub> = 1, on the RHS of (131c) shows that the total influence of the *n* variables satisfies, for all A ⊆ {−1, 1}<sup>*n*</sup>,

$$I(\mathcal{A}) = nI^{(n,1)}(\mathcal{A})\tag{132a}$$

$$\ge 2\Pr(\mathcal{A}) \log\_2 \frac{1}{\Pr(\mathcal{A})}, \tag{132b}$$

which recovers the result of Theorem 4.3 of [6] (see (124)). This gives the following result.

**Theorem 3.** *Let X<sup>n</sup> be an equiprobable random vector over the set* {−1, 1}<sup>*n*</sup>*, let d* ∈ [*n*]*, and let* A ⊂ {−1, 1}<sup>*n*</sup>*. Then, the average influence of subsets of d variables of X<sup>n</sup>, as defined in* (126)*, is lower bounded as follows:*

$$I^{(n,d)}(\mathcal{A}) \ge 2\Pr(\mathcal{A}) \left( \frac{\frac{d}{n}\log\frac{1}{\Pr(\mathcal{A})} - \log\frac{2^d}{m\_d}}{\log\frac{m\_d}{\ell\_d}} \right),\tag{133}$$

*where* Pr(A) ≜ ℙ[*X*<sup>*n*</sup> ∈ A] = |A| / 2<sup>*n*</sup>*, and the integers m<sub>d</sub> and ℓ<sub>d</sub> are introduced in Theorem 2. Similarly to the refined upper bound in Theorem 2, the lower bound on the RHS of* (133) *is informative (i.e., positive) if and only if* |A| < (*m*<sub>*d*</sub>)<sup>*n*/*d*</sup>*. The lower bound on the RHS of* (133) *can be loosened (by setting the default values m<sub>d</sub>* = 2 *and ℓ<sub>d</sub>* = 1*) to*

$$I^{(n,d)}(\mathcal{A}) \ge 2\Pr(\mathcal{A})\left(\frac{d}{n}\log\_2\frac{1}{\Pr(\mathcal{A})} + 1 - d\right). \tag{134}$$
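The loosened bound (134) involves only *n*, *d* and Pr(A), so it can be tested exhaustively without the integers *m*<sub>*d*</sub> and ℓ<sub>*d*</sub> of Theorem 2. The sketch below checks (134) for every nonempty subset of {−1, 1}<sup>3</sup> and every *d* ∈ [3], computing the average influence through (128d); the function names are illustrative assumptions.

```python
from itertools import combinations, product
from math import comb, log2


def average_influence(A, n, d):
    """I^{(n,d)}(A) via (128d): |B^{(n,d)}(A)| / (2^{n-1} * C(n, d))."""
    A = set(A)
    boundary = 0
    for S in combinations(range(n), d):
        for x in A:
            y = tuple(-v if k in S else v for k, v in enumerate(x))
            if y not in A:
                boundary += 1  # one ordered pair (x, y) in B^{(n,d)}(A)
    return boundary / (2 ** (n - 1) * comb(n, d))


def loosened_bound(A, n, d):
    """Right-hand side of (134), with logarithms to base 2."""
    p = len(A) / 2 ** n
    return 2 * p * ((d / n) * log2(1 / p) + 1 - d)


n = 3
cube = list(product((-1, 1), repeat=n))
worst_gap = float("inf")
for mask in range(1, 2 ** len(cube)):  # every nonempty subset of the cube
    A = [x for j, x in enumerate(cube) if (mask >> j) & 1]
    for d in range(1, n + 1):
        gap = average_influence(A, n, d) - loosened_bound(A, n, d)
        worst_gap = min(worst_gap, gap)

# A nonnegative value (up to rounding) means (134) held in every case.
print(worst_gap)
```

In this small experiment the bound is met with equality by singletons, for which both sides of (134) equal 2<sup>1−*n*</sup>, and it becomes negative (hence uninformative) for larger sets when *d* > 1.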

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The author declares no conflict of interest.
