Article

Logical Entropy: Introduction to Classical and Quantum Logical Information Theory

Department of Philosophy, University of California Riverside, Riverside, CA 92521, USA
Entropy 2018, 20(9), 679; https://doi.org/10.3390/e20090679
Submission received: 10 August 2018 / Revised: 31 August 2018 / Accepted: 4 September 2018 / Published: 6 September 2018
(This article belongs to the Special Issue Towards Ultimate Quantum Theory (UQT))

Abstract

Logical information theory is the quantitative version of the logic of partitions just as logical probability theory is the quantitative version of the dual Boolean logic of subsets. The resulting notion of information is about distinctions, differences and distinguishability and is formalized using the distinctions (“dits”) of a partition (a pair of points distinguished by the partition). All the definitions of simple, joint, conditional and mutual entropy of Shannon information theory are derived by a uniform transformation from the corresponding definitions at the logical level. The purpose of this paper is to give the direct generalization to quantum logical information theory that similarly focuses on the pairs of eigenstates distinguished by an observable, i.e., qudits of an observable. The fundamental theorem for quantum logical entropy and measurement establishes a direct quantitative connection between the increase in quantum logical entropy due to a projective measurement and the eigenstates (cohered together in the pure superposition state being measured) that are distinguished by the measurement (decohered in the post-measurement mixed state). Both the classical and quantum versions of logical entropy have simple interpretations as “two-draw” probabilities for distinctions. The conclusion is that quantum logical entropy is the simple and natural notion of information for quantum information theory focusing on the distinguishing of quantum states.

1. Introduction

The formula for “classical” logical entropy goes back to the early Twentieth Century [1]. It is the derivation of the formula from basic logic that is new and accounts for the name. The ordinary Boolean logic of subsets has a dual logic of partitions [2] since partitions (=equivalence relations = quotient sets) are category-theoretically dual to subsets. Just as the quantitative version of subset logic is the notion of logical finite probability, so the quantitative version of partition logic is logical information theory using the notion of logical entropy [3]. This paper generalizes that “classical” (i.e., non-quantum) logical information theory to the quantum version. The classical logical information theory is briefly developed before turning to the quantum version. Applications of logical entropy have already been developed in several special mathematical settings; see [4] and the references cited therein.

2. Duality of Subsets and Partitions

The foundations for classical and quantum logical information theory are built on the logic of partitions ([2,5]), which is dual (in the category-theoretic sense) to the usual Boolean logic of subsets. This duality can be most simply illustrated using a set function $f : X \to Y$. The image $f(X)$ is a subset of the codomain $Y$, and the inverse-image or coimage $f^{-1}(Y)$ is a partition on the domain $X$, where a partition $\pi = \{B_1, \ldots, B_I\}$ on a set $U$ is a set of subsets or blocks $B_i$ that are mutually disjoint and jointly exhaustive ($\cup_i B_i = U$). In category theory, the duality between subobject-type constructions (e.g., limits) and quotient-object-type constructions (e.g., colimits) is often indicated by adding the prefix "co-" to the latter. Hence, the usual Boolean logic of "images" has the dual logic of "coimages". However, the duality runs deeper than between subsets and partitions. The dual to the notion of an "element" (an "it") of a subset is the notion of a "distinction" (a "dit") of a partition, where $(u, u') \in U \times U$ is a distinction or dit of $\pi$ if the two elements are in different blocks. Let $\operatorname{dit}(\pi) \subseteq U \times U$ be the set of distinctions or ditset of $\pi$. Similarly, an indistinction or indit of $\pi$ is a pair $(u, u') \in U \times U$ in the same block of $\pi$. Let $\operatorname{indit}(\pi) \subseteq U \times U$ be the set of indistinctions or inditset of $\pi$. Then, $\operatorname{indit}(\pi)$ is the equivalence relation associated with $\pi$, and $\operatorname{dit}(\pi) = U \times U - \operatorname{indit}(\pi)$ is the complementary binary relation that might be called a partition relation or an apartness relation. The notions of a distinction and indistinction of a partition are illustrated in Figure 1.
The relationships between Boolean subset logic and partition logic are summarized in Figure 2, which illustrates the dual relationship between the elements (“its”) of a subset and the distinctions (“dits”) of a partition.

3. From the Logic of Partitions to Logical Information Theory

In Gian-Carlo Rota’s Fubini Lectures [6] (and in his lectures at MIT), he remarked in view of duality between partitions and subsets that, quantitatively, the “lattice of partitions plays for information the role that the Boolean algebra of subsets plays for size or probability” ([7] p. 30) or symbolically:
$$\frac{\text{Logical Probability Theory}}{\text{Boolean Logic of Subsets}} = \frac{\text{Logical Information Theory}}{\text{Logic of Partitions}}.$$
Andrei Kolmogorov has suggested that information theory should start with sets, not probabilities.
Information theory must precede probability theory, and not be based on it. By the very essence of this discipline, the foundations of information theory have a finite combinatorial character.
([8] p. 39)
The notion of information-as-distinctions does start with the set of distinctions, the information set, of a partition $\pi = \{B_1, \ldots, B_I\}$ on a finite set $U$, where that set of distinctions (dits) is:
$$\operatorname{dit}(\pi) = \left\{(u, u') : \exists\, B_i, B_{i'} \in \pi,\ B_i \neq B_{i'},\ u \in B_i,\ u' \in B_{i'}\right\}.$$
The normalized size of a subset is the logical probability of the event, and the normalized size of the ditset of a partition is, in the sense of measure theory, the measure of the amount of information in a partition. Thus, we define the logical entropy of a partition $\pi = \{B_1, \ldots, B_I\}$, denoted $h(\pi)$, as the size of the ditset $\operatorname{dit}(\pi) \subseteq U \times U$ normalized by the size of $U \times U$:
$$h(\pi) = \frac{|\operatorname{dit}(\pi)|}{|U \times U|} = \frac{|U \times U| - \sum_{i=1}^{I}|B_i \times B_i|}{|U \times U|} = 1 - \sum_{i=1}^{I}\left(\frac{|B_i|}{|U|}\right)^2 = 1 - \sum_{i=1}^{I}\Pr(B_i)^2.$$
In two independent draws from U, the probability of getting a distinction of π is the probability of not getting an indistinction.
Given any probability measure $p : U \to [0, 1]$ on $U = \{u_1, \ldots, u_n\}$, which defines $p_i = p(u_i)$ for $i = 1, \ldots, n$, the product measure $p \times p : \wp(U \times U) \to [0, 1]$ has, for any binary relation $R \subseteq U \times U$, the value:
$$p \times p(R) = \sum_{(u_i, u_j) \in R} p(u_i)\,p(u_j) = \sum_{(u_i, u_j) \in R} p_i p_j.$$
The logical entropy of $\pi$ in general is the product-probability measure of its ditset $\operatorname{dit}(\pi) \subseteq U \times U$, where $\Pr(B) = \sum_{u \in B} p(u)$:
$$h(\pi) = p \times p(\operatorname{dit}(\pi)) = \sum_{(u_i, u_j) \in \operatorname{dit}(\pi)} p_i p_j = 1 - \sum_{B \in \pi}\Pr(B)^2.$$
The standard interpretation of $h(\pi)$ is the two-draw probability of getting a distinction of the partition $\pi$, just as $\Pr(S)$ is the one-draw probability of getting an element of the subset-event $S$.
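As an illustration of the two-draw interpretation, the following short Python sketch (illustrative code, not part of the original development; the helper names `ditset` and `logical_entropy` are ad hoc) computes $h(\pi)$ both as the product-probability measure of the ditset and from the block probabilities, for the parity partition on a fair die.

```python
from itertools import product

def ditset(partition, universe):
    """All ordered pairs (u, u') whose elements lie in different blocks."""
    block_of = {u: i for i, block in enumerate(partition) for u in block}
    return {(u, v) for u, v in product(universe, repeat=2)
            if block_of[u] != block_of[v]}

def logical_entropy(partition, prob):
    """h(pi) = p x p(dit(pi)) = 1 - sum_B Pr(B)^2."""
    universe = list(prob)
    dits = ditset(partition, universe)
    h_via_dits = sum(prob[u] * prob[v] for u, v in dits)
    h_via_blocks = 1 - sum(sum(prob[u] for u in block) ** 2 for block in partition)
    assert abs(h_via_dits - h_via_blocks) < 1e-12
    return h_via_dits

# Example: parity partition on a fair six-sided die.
p = {u: 1 / 6 for u in range(1, 7)}
parity = [{1, 3, 5}, {2, 4, 6}]
print(logical_entropy(parity, p))  # 0.5
```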

4. Compound Logical Entropies

The compound notions of logical entropy are also developed in two stages, first as sets and then, given a probability distribution, as two-draw probabilities. After observing the similarity between the formulas holding for the compound Shannon entropies and the Venn diagram formulas that hold for any measure (in the sense of measure theory), the information theorist, Lorne L. Campbell, remarked in 1965 that the similarity:
suggests the possibility that $H(\alpha)$ and $H(\beta)$ are measures of sets, that $H(\alpha, \beta)$ is the measure of their union, that $I(\alpha, \beta)$ is the measure of their intersection, and that $H(\alpha|\beta)$ is the measure of their difference. The possibility that $I(\alpha, \beta)$ is the entropy of the "intersection" of two partitions is particularly interesting. This "intersection," if it existed, would presumably contain the information common to the partitions $\alpha$ and $\beta$.
([9] p. 113)
Yet, there is no such interpretation of the Shannon entropies as measures of sets, but the logical entropies precisely fulfill Campbell’s suggestion (with the “intersection” of two partitions being the intersection of their ditsets). Moreover, there is a uniform requantifying transformation (see the next section) that obtains all the Shannon definitions from the logical definitions and explains how the Shannon entropies can satisfy the Venn diagram formulas (e.g., as a mnemonic) while not being defined by a measure on sets.
Given partitions $\pi = \{B_1, \ldots, B_I\}$ and $\sigma = \{C_1, \ldots, C_J\}$ on $U$, the joint information set is the union of the ditsets, which is also the ditset for their join: $\operatorname{dit}(\pi) \cup \operatorname{dit}(\sigma) = \operatorname{dit}(\pi \vee \sigma) \subseteq U \times U$. Given probabilities $p = (p_1, \ldots, p_n)$ on $U$, the joint logical entropy is the product probability measure on the union of ditsets:
$$h(\pi, \sigma) = h(\pi \vee \sigma) = p \times p(\operatorname{dit}(\pi) \cup \operatorname{dit}(\sigma)) = 1 - \sum_{i,j}\Pr(B_i \cap C_j)^2.$$
The information set for the conditional logical entropy $h(\pi|\sigma)$ is the difference of ditsets, and thus, that logical entropy is:
$$h(\pi|\sigma) = p \times p(\operatorname{dit}(\pi) - \operatorname{dit}(\sigma)) = h(\pi, \sigma) - h(\sigma).$$
The information set for the logical mutual information $m(\pi, \sigma)$ is the intersection of ditsets, so that logical entropy is:
$$m(\pi, \sigma) = p \times p(\operatorname{dit}(\pi) \cap \operatorname{dit}(\sigma)) = h(\pi, \sigma) - h(\pi|\sigma) - h(\sigma|\pi) = h(\pi) + h(\sigma) - h(\pi, \sigma).$$
Since all the logical entropies are the values of a measure $p \times p : \wp(U \times U) \to [0, 1]$ on subsets of $U \times U$, they automatically satisfy the usual Venn diagram relationships, as in Figure 3.
At the level of information sets (without probabilities), we have the information algebra $\mathcal{I}(\pi, \sigma)$, which is the Boolean subalgebra of $\wp(U \times U)$ generated by the ditsets and their complements.
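A small continuation of the same sketch (again illustrative, with ad hoc names) shows the compound logical entropies as values of the product measure $p \times p$ on unions, differences and intersections of ditsets, so the Venn diagram relation $h(\pi, \sigma) = h(\pi|\sigma) + h(\sigma|\pi) + m(\pi, \sigma)$ holds automatically.

```python
from itertools import product

def ditset(partition, universe):
    block_of = {u: i for i, block in enumerate(partition) for u in block}
    return {(u, v) for u, v in product(universe, repeat=2)
            if block_of[u] != block_of[v]}

def measure(pairs, prob):
    return sum(prob[u] * prob[v] for u, v in pairs)

p = {u: 1 / 6 for u in range(1, 7)}
U = list(p)
pi = [{1, 3, 5}, {2, 4, 6}]              # parity partition
sigma = [{1, 2}, {3, 4}, {5, 6}]         # another classification of the faces
d_pi, d_sigma = ditset(pi, U), ditset(sigma, U)

h_pi = measure(d_pi, p)
h_sigma = measure(d_sigma, p)
h_joint = measure(d_pi | d_sigma, p)           # h(pi, sigma)
h_pi_given_sigma = measure(d_pi - d_sigma, p)  # h(pi | sigma)
m = measure(d_pi & d_sigma, p)                 # m(pi, sigma)

# Venn relation: h(pi,sigma) = h(pi|sigma) + h(sigma|pi) + m(pi,sigma)
assert abs(h_joint - (h_pi_given_sigma + measure(d_sigma - d_pi, p) + m)) < 1e-12
print(h_pi, h_sigma, h_joint, m)
```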

5. Deriving the Shannon Entropies from the Logical Entropies

Instead of being defined as the values of a measure, the usual notions of simple and compound entropy ‘burst forth fully formed from the forehead’ of Claude Shannon [10] already satisfying the standard Venn diagram relationships (one author surmised that “Shannon carefully contrived for this ‘accident’ to occur” ([11] p. 153)). Since the Shannon entropies are not the values of a measure, many authors have pointed out that these Venn diagram relations for the Shannon entropies can only be taken as “analogies” or “mnemonics” ([9,12]). Logical information theory explains this situation since all the Shannon definitions of simple, joint, conditional and mutual information can be obtained by a uniform requantifying transformation from the corresponding logical definitions, and the transformation preserves the Venn diagram relationships.
This transformation is possible since the logical and Shannon notions of entropy can be seen as two different ways to quantify distinctions; and thus, both theories are based on the foundational idea of information-as-distinctions.
Consider the canonical case of $n$ equiprobable elements, $p_i = \frac{1}{n}$. The logical entropy of the discrete partition $\mathbf{1} = \{B_1, \ldots, B_n\}$, where $B_i = \{u_i\}$, with $p = \left(\frac{1}{n}, \ldots, \frac{1}{n}\right)$ is:
$$h(\mathbf{1}) = \frac{|U \times U| - |\Delta|}{|U \times U|} = \frac{n^2 - n}{n^2} = 1 - \frac{1}{n} = 1 - \Pr(B_i)$$
where $\Delta$ is the diagonal. The normalized number of distinctions or 'dit-count' of the discrete partition $\mathbf{1}$ is $1 - \frac{1}{n} = 1 - \Pr(B_i)$. The general case of logical entropy for any $\pi = \{B_1, \ldots, B_I\}$ is the average of the dit-counts $1 - \Pr(B_i)$ for the canonical cases:
$$h(\pi) = \sum_i \Pr(B_i)\left(1 - \Pr(B_i)\right).$$
In the canonical case of $2^n$ equiprobable elements, the minimum number of binary partitions ("yes-or-no questions" or "bits") whose join is the discrete partition $\mathbf{1} = \{B_1, \ldots, B_{2^n}\}$ with $\Pr(B_i) = \frac{1}{2^n}$, i.e., the number it takes to uniquely encode each distinct element, is $n$, so the Shannon–Hartley entropy [13] is the canonical bit-count:
$$n = \log_2(2^n) = \log_2\!\left(\frac{1}{1/2^n}\right) = \log_2\!\left(\frac{1}{\Pr(B_i)}\right).$$
The general case of Shannon entropy is the average of these canonical bit-counts $\log_2\!\left(\frac{1}{\Pr(B_i)}\right)$:
$$H(\pi) = \sum_i \Pr(B_i)\log_2\!\left(\frac{1}{\Pr(B_i)}\right).$$
The dit-bit transform essentially replaces the canonical dit-counts by the canonical bit-counts. First, express any logical entropy concept (simple, joint, conditional or mutual) as an average of canonical dit-counts $1 - \Pr(B_i)$, and then substitute the canonical bit-count $\log\!\left(\frac{1}{\Pr(B_i)}\right)$ to obtain the corresponding formula as defined by Shannon. Figure 4 gives examples of the dit-bit transform.
For instance,
$$h(\pi|\sigma) = h(\pi, \sigma) - h(\sigma) = \sum_{i,j}\Pr(B_i \cap C_j)\left(1 - \Pr(B_i \cap C_j)\right) - \sum_j \Pr(C_j)\left(1 - \Pr(C_j)\right)$$
is the expression for $h(\pi|\sigma)$ as an average over $1 - \Pr(B_i \cap C_j)$ and $1 - \Pr(C_j)$, so applying the dit-bit transform gives:
$$\sum_{i,j}\Pr(B_i \cap C_j)\log\!\left(\frac{1}{\Pr(B_i \cap C_j)}\right) - \sum_j \Pr(C_j)\log\!\left(\frac{1}{\Pr(C_j)}\right) = H(\pi, \sigma) - H(\sigma) = H(\pi|\sigma).$$
The dit-bit transform is linear in the sense of preserving plus and minus, so the Venn diagram formulas, e.g., $h(\pi, \sigma) = h(\sigma) + h(\pi|\sigma)$, which are automatically satisfied by logical entropy since it is a measure, carry over to Shannon entropy, e.g., $H(\pi, \sigma) = H(\sigma) + H(\pi|\sigma)$ as in Figure 5, in spite of it not being a measure (in the sense of measure theory):
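The dit-bit transform can also be checked numerically. The sketch below (illustrative, not from the text) computes the logical and Shannon conditional entropies from the same block probabilities, each obeying the Venn formula in its own way.

```python
from itertools import product
from math import log2

p = {u: 1 / 6 for u in range(1, 7)}
pi = [{1, 3, 5}, {2, 4, 6}]
sigma = [{1, 2}, {3, 4}, {5, 6}]

def pr(block):
    return sum(p[u] for u in block)

joint = [pr(B & C) for B, C in product(pi, sigma) if B & C]

# Logical entropies: averages of dit-counts 1 - Pr(.)
h_joint = sum(q * (1 - q) for q in joint)
h_sigma = sum(pr(C) * (1 - pr(C)) for C in sigma)

# Shannon entropies: averages of bit-counts log2(1/Pr(.))
H_joint = sum(q * log2(1 / q) for q in joint)
H_sigma = sum(pr(C) * log2(1 / pr(C)) for C in sigma)

print("h(pi|sigma) =", h_joint - h_sigma)   # logical conditional entropy
print("H(pi|sigma) =", H_joint - H_sigma)   # Shannon conditional entropy
```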

6. Logical Entropy via Density Matrices

The transition to quantum logical entropy is facilitated by reformulating the classical logical theory in terms of density matrices. Let $U = \{u_1, \ldots, u_n\}$ be the sample space with the point probabilities $p = (p_1, \ldots, p_n)$. An event $S \subseteq U$ has the probability $\Pr(S) = \sum_{u_j \in S} p_j$.
For any event $S$ with $\Pr(S) > 0$, let:
$$|S\rangle = \frac{1}{\sqrt{\Pr(S)}}\left(\chi_S(u_1)\sqrt{p_1}, \ldots, \chi_S(u_n)\sqrt{p_n}\right)^t$$
(the superscript $t$ indicates transpose), which is a normalized column vector in $\mathbb{R}^n$, where $\chi_S : U \to \{0, 1\}$ is the characteristic function for $S$, and let $\langle S|$ be the corresponding row vector. Since $|S\rangle$ is normalized, $\langle S|S\rangle = 1$. Then, the density matrix representing the event $S$ is the $n \times n$ symmetric real matrix:
$$\rho(S) = |S\rangle\langle S| \quad \text{so that} \quad \rho(S)_{jk} = \begin{cases} \frac{\sqrt{p_j p_k}}{\Pr(S)} & \text{for } u_j, u_k \in S \\ 0 & \text{otherwise.} \end{cases}$$
Then, $\rho(S)^2 = |S\rangle\langle S|S\rangle\langle S| = \rho(S)$, so borrowing language from quantum mechanics, $\rho(S)$ is said to be a pure state density matrix.
Given any partition $\pi = \{B_1, \ldots, B_I\}$ on $U$, its density matrix is the average of the block density matrices:
$$\rho(\pi) = \sum_i \Pr(B_i)\,\rho(B_i).$$
Then, $\rho(\pi)$ represents the mixed state, experiment or lottery where the event $B_i$ occurs with probability $\Pr(B_i)$. A little calculation connects the logical entropy $h(\pi)$ of a partition with the density matrix treatment:
$$h(\pi) = 1 - \sum_{i=1}^{I}\Pr(B_i)^2 = 1 - \operatorname{tr}\left[\rho(\pi)^2\right] = h(\rho(\pi))$$
where $\rho(\pi)^2$ is substituted for $\Pr(B_i)^2$ and the trace is substituted for the summation.
For the throw of a fair die, $U = \{u_1, u_3, u_5, u_2, u_4, u_6\}$ (note the odd faces ordered before the even faces in the matrix rows and columns), where $u_j$ represents the number $j$ coming up, the density matrix $\rho(\mathbf{0})$ is the "pure state" $6 \times 6$ matrix with each entry being $\frac{1}{6}$:
$$\rho(\mathbf{0}) = \frac{1}{6}\begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix} \quad \text{(rows and columns indexed by } u_1, u_3, u_5, u_2, u_4, u_6\text{).}$$
The nonzero off-diagonal entries represent indistinctions or indits of the partition $\mathbf{0}$ or, in quantum terms, "coherences" where all six "eigenstates" cohere together in a pure "superposition" state. All pure states have a logical entropy of zero, i.e., $h(\mathbf{0}) = 0$ (i.e., no dits), since $\operatorname{tr}[\rho] = 1$ for any density matrix, so if $\rho(\mathbf{0})^2 = \rho(\mathbf{0})$, then $\operatorname{tr}\left[\rho(\mathbf{0})^2\right] = \operatorname{tr}[\rho(\mathbf{0})] = 1$ and $h(\mathbf{0}) = 1 - \operatorname{tr}\left[\rho(\mathbf{0})^2\right] = 0$.
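The density matrix formulation can be illustrated with a short sketch (ad hoc helper names `event_vector` and `partition_density`; not from the text) that builds $\rho(\pi)$ for the parity partition of the fair die and checks $h(\pi) = 1 - \operatorname{tr}[\rho(\pi)^2]$.

```python
import numpy as np

p = np.full(6, 1 / 6)                      # fair die, faces ordered u1,u3,u5,u2,u4,u6

def event_vector(indices, p):
    """|S> = (1/sqrt(Pr(S))) * (chi_S(u_j) sqrt(p_j))_j as a column vector."""
    chi = np.zeros(len(p))
    chi[indices] = 1.0
    v = chi * np.sqrt(p)
    return v / np.sqrt(p[indices].sum())

def partition_density(blocks, p):
    """rho(pi) = sum_i Pr(B_i) |B_i><B_i|."""
    rho = np.zeros((len(p), len(p)))
    for block in blocks:
        s = event_vector(block, p)
        rho += p[block].sum() * np.outer(s, s)
    return rho

blocks = [[0, 1, 2], [3, 4, 5]]            # odd faces vs. even faces (parity)
rho_pi = partition_density(blocks, p)
h_density = 1 - np.trace(rho_pi @ rho_pi)
h_blocks = 1 - sum(p[b].sum() ** 2 for b in blocks)
print(h_density, h_blocks)                  # both 0.5
```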
The logical operation of classifying undistinguished entities (like the six faces of the die before a throw to determine a face up) by a numerical attribute makes distinctions between the entities with different numerical values of the attribute. Classification (also called sorting, fibering or partitioning ([14] Section 6.1)) is the classical operation corresponding to the quantum operation of “measurement” of a superposition state by an observable to obtain a mixed state.
Now classify or "measure" the die-faces by the parity-of-the-face-up (odd or even) partition (observable) $\pi = \{B_{odd}, B_{even}\} = \{\{u_1, u_3, u_5\}, \{u_2, u_4, u_6\}\}$. Mathematically, this is done by the Lüders mixture operation ([15] p. 279), i.e., pre- and post-multiplying the density matrix $\rho(\mathbf{0})$ by $P_{odd}$ and by $P_{even}$, the projection matrices to the odd or even components, and summing the results:
$$P_{odd}\,\rho(\mathbf{0})\,P_{odd} + P_{even}\,\rho(\mathbf{0})\,P_{even} = \frac{1}{6}\begin{pmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} + \frac{1}{6}\begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \end{pmatrix} = \frac{1}{6}\begin{pmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \end{pmatrix} = \frac{1}{2}\rho(B_{odd}) + \frac{1}{2}\rho(B_{even}) = \rho(\pi).$$
Theorem 1 (Fundamental (classical)).
The increase in logical entropy, h ρ π h ρ 0 , due to a Lüders mixture operation is the sum of amplitudes squared of the non-zero off-diagonal entries of the beginning density matrix that are zeroed in the final density matrix.
Proof. 
Since for any density matrix $\rho$, $\operatorname{tr}\left[\rho^2\right] = \sum_{i,j}|\rho_{ij}|^2$ ([16] p. 77), we have: $h(\rho(\pi)) - h(\rho(\mathbf{0})) = \left(1 - \operatorname{tr}\left[\rho(\pi)^2\right]\right) - \left(1 - \operatorname{tr}\left[\rho(\mathbf{0})^2\right]\right) = \operatorname{tr}\left[\rho(\mathbf{0})^2\right] - \operatorname{tr}\left[\rho(\pi)^2\right] = \sum_{i,j}\left(|\rho_{ij}(\mathbf{0})|^2 - |\rho_{ij}(\pi)|^2\right)$. If $(u_i, u_{i'}) \in \operatorname{dit}(\pi)$, then and only then are the off-diagonal terms corresponding to $u_i$ and $u_{i'}$ zeroed by the Lüders operation. ☐
The classical fundamental theorem connects the concept of information-as-distinctions to the process of “measurement” or classification, which uses some attribute (like parity in the example) or “observable” to make distinctions.
In the comparison with the matrix $\rho(\mathbf{0})$ of all entries $\frac{1}{6}$, the entries that got zeroed in the Lüders operation $\rho(\mathbf{0}) \Rightarrow \rho(\pi)$ correspond to the distinctions created in the transition $\mathbf{0} = \{\{u_1, \ldots, u_6\}\} \Rightarrow \pi = \{\{u_1, u_3, u_5\}, \{u_2, u_4, u_6\}\}$, i.e., the odd-numbered faces were distinguished from the even-numbered faces by the parity attribute. The increase in logical entropy = sum of the squares of the off-diagonal elements that were zeroed = $h(\pi) - h(\mathbf{0}) = 2 \times 9 \times \left(\frac{1}{6}\right)^2 = \frac{18}{36} = \frac{1}{2}$. The usual calculations of the two logical entropies are: $h(\pi) = 2 \times \left(\frac{1}{2}\right)^2 = \frac{1}{2}$ and $h(\mathbf{0}) = 1 - 1 = 0$.
Since, in quantum mechanics, a projective measurement's effect on a density matrix is the Lüders mixture operation, the effects of the measurement are the above-described "making of distinctions" by decohering or zeroing certain coherence terms in the density matrix, and the sum of the absolute squares of the coherences that were decohered is the increase in the logical entropy.
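The following sketch (illustrative only) carries out the Lüders mixture operation on the fair-die example and checks the fundamental (classical) theorem numerically: the increase in logical entropy equals the sum of the squares of the zeroed off-diagonal entries.

```python
import numpy as np

rho0 = np.full((6, 6), 1 / 6)               # pure state: all entries 1/6

P_odd = np.diag([1, 1, 1, 0, 0, 0]).astype(float)
P_even = np.diag([0, 0, 0, 1, 1, 1]).astype(float)
rho_pi = P_odd @ rho0 @ P_odd + P_even @ rho0 @ P_even   # Lüders mixture

def h(rho):
    return 1 - np.trace(rho @ rho)

increase = h(rho_pi) - h(rho0)
zeroed = rho0[(rho0 != 0) & (rho_pi == 0)]   # entries decohered by the operation
print(increase, np.sum(zeroed ** 2))         # both 0.5
```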

7. Quantum Logical Information Theory: Commuting Observables

The idea of information-as-distinctions carries over to quantum mechanics.
[Information] is the notion of distinguishability abstracted away from what we are distinguishing, or from the carrier of information. …And we ought to develop a theory of information which generalizes the theory of distinguishability to include these quantum properties…
([17] p. 155)
Let $F : V \to V$ be a self-adjoint operator (observable) on an $n$-dimensional Hilbert space $V$ with the real eigenvalues $\phi_1, \ldots, \phi_I$, and let $U = \{u_1, \ldots, u_n\}$ be an orthonormal (ON) basis of eigenvectors of $F$. The quantum version of a dit, a qudit, is a pair of states definitely distinguishable by some observable. Any nondegenerate self-adjoint operator, such as $\sum_{k=1}^n k P_{u_k}$, where $P_{u_k}$ is the projection to the one-dimensional subspace generated by $u_k$, will distinguish all the vectors in the orthonormal basis $U$, which is analogous classically to a pair $(u, u')$ of distinct elements of $U$ that are distinguishable by some partition (e.g., the discrete partition $\mathbf{1}$). In general, a qudit is relativized to an observable, just as classically a distinction is a distinction of a partition. Then, there is a set partition $\pi = \{B_i\}_{i=1,\ldots,I}$ on the ON basis $U$ so that $B_i$ is a basis for the eigenspace of the eigenvalue $\phi_i$ and $|B_i|$ is the "multiplicity" (dimension of the eigenspace) of the eigenvalue $\phi_i$ for $i = 1, \ldots, I$. Note that the real-valued function $f : U \to \mathbb{R}$, which takes each eigenvector $u_j \in B_i \subseteq U$ to its eigenvalue $\phi_i$ so that $f^{-1}(\phi_i) = B_i$, contains all the information in the self-adjoint operator $F : V \to V$, since $F$ can be reconstructed by defining it on the basis $U$ as $F u_j = f(u_j) u_j$.
The generalization of "classical" logical entropy to quantum logical entropy is straightforward using the usual ways that set-concepts generalize to vector-space concepts: subsets → subspaces; set partitions → direct-sum decompositions of subspaces (hence the "classical" logic of partitions on a set generalizes to the quantum logic of direct-sum decompositions [18], which is dual to the usual quantum logic of subspaces); Cartesian products of sets → tensor products of vector spaces; and ordered pairs $(u_k, u_{k'}) \in U \times U$ → basis elements $u_k \otimes u_{k'} \in V \otimes V$. The eigenvalue function $f : U \to \mathbb{R}$ determines a partition $\{f^{-1}(\phi_i)\}_{i \in I}$ on $U$, and the blocks in that partition generate the eigenspaces of $F$, which form a direct-sum decomposition of $V$.
Classically, a dit of the partition $\{f^{-1}(\phi_i)\}_{i \in I}$ on $U$ is a pair $(u_k, u_{k'})$ of points in distinct blocks of the partition, i.e., $f(u_k) \neq f(u_{k'})$. Hence, a qudit of $F$ is a pair $(u_k, u_{k'})$ (interpreted as $u_k \otimes u_{k'}$ in the context of $V \otimes V$) of vectors in the eigenbasis definitely distinguishable by $F$, i.e., $f(u_k) \neq f(u_{k'})$: distinct $F$-eigenvalues. Let $G : V \to V$ be another self-adjoint operator on $V$ that commutes with $F$, so that we may then assume that $U$ is an orthonormal basis of simultaneous eigenvectors of $F$ and $G$. Let $\{\gamma_j\}_{j \in J}$ be the set of eigenvalues of $G$, and let $g : U \to \mathbb{R}$ be the eigenvalue function, so a pair $(u_k, u_{k'})$ is a qudit of $G$ if $g(u_k) \neq g(u_{k'})$, i.e., if the two eigenvectors have distinct eigenvalues of $G$.
As in classical logical information theory, information is represented by certain subsets (or, in the quantum case, subspaces) prior to the introduction of any probabilities. Since the transition from classical to quantum logical information theory is straightforward, it will be presented in table form in Figure 6 (which does not involve any probabilities), where the qudits $(u_k, u_{k'})$ are interpreted as $u_k \otimes u_{k'}$.
The information subspace associated with $F$ is the subspace $\operatorname{qudit}(F) \subseteq V \otimes V$ generated by the qudits $u_k \otimes u_{k'}$ of $F$. If $F = \lambda I$ is a scalar multiple of the identity $I$, then it has no qudits, so its information space $\operatorname{qudit}(\lambda I)$ is the zero subspace. It is an easy implication of the common-dits theorem of classical logical information theory ([19] (Proposition 1) or [5] (Theorem 1.4)) that any two nonzero information spaces $\operatorname{qudit}(F)$ and $\operatorname{qudit}(G)$ have a nonzero intersection, i.e., have a nonzero mutual information space. That is, there are always two eigenvectors $u_k$ and $u_{k'}$ that have different eigenvalues both by $F$ and by $G$.
In a measurement, the observables do not provide the point probabilities; they come from the pure (normalized) state $\psi$ being measured. Let $|\psi\rangle = \sum_{j=1}^n \langle u_j|\psi\rangle |u_j\rangle = \sum_{j=1}^n \alpha_j |u_j\rangle$ be the resolution of $|\psi\rangle$ in terms of the orthonormal basis $U = \{u_1, \ldots, u_n\}$ of simultaneous eigenvectors for $F$ and $G$. Then, $p_j = \alpha_j^*\alpha_j$ ($\alpha_j^*$ is the complex conjugate of $\alpha_j$) for $j = 1, \ldots, n$ are the point probabilities on $U$, and the pure state density matrix $\rho(\psi) = |\psi\rangle\langle\psi|$ (where $\langle\psi|$ is the conjugate-transpose of $|\psi\rangle$) has the entries $\rho_{jk}(\psi) = \alpha_j\alpha_k^*$, so the diagonal entries $\rho_{jj}(\psi) = \alpha_j^*\alpha_j = p_j$ are the point probabilities. Figure 7 gives the remaining parallel development with the probabilities provided by the pure state $\psi$, where we write $\rho(\psi) \otimes \rho(\psi)$ as $\rho(\psi)^2$.
The formula $h(\rho) = 1 - \operatorname{tr}\left[\rho^2\right]$ is hardly new. Indeed, $\operatorname{tr}\left[\rho^2\right]$ is usually called the purity of the density matrix, since a state $\rho$ is pure if and only if $\operatorname{tr}\left[\rho^2\right] = 1$, so $h(\rho) = 0$; otherwise, $\operatorname{tr}\left[\rho^2\right] < 1$, so $h(\rho) > 0$, and the state is said to be mixed. Hence, the complement $1 - \operatorname{tr}\left[\rho^2\right]$ has been called the "mixedness" ([20] p. 5) or "impurity" of the state $\rho$. The seminal paper of Manfredi and Feix [21] approaches the same formula $1 - \operatorname{tr}\left[\rho^2\right]$ (which they denote as $S_2$) from the viewpoint of Wigner functions, and they present strong arguments for this notion of quantum entropy (thanks to a referee for this important reference to the Manfredi and Feix paper). This notion of quantum entropy is also called by the misnomer "linear entropy", even though it is quadratic in $\rho$, so we will not continue that usage. See [22] or [23] for references to that literature. The logical entropy is also the quadratic special case of the Tsallis–Havrda–Charvat entropy ([24,25]) and the logical distance special case [19] of C. R. Rao's quadratic entropy [26].
What is new here is not the formula, but the whole back story of partition logic outlined above, which gives the logical notion of entropy arising out of partition logic as the normalized counting measure on ditsets of partitions; just as logical probability arises out of Boolean subset logic as the normalized counting measure on subsets. The basic idea of information is differences, distinguishability and distinctions ([3,19]), so the logical notion of entropy is the measure of the distinctions or dits of a partition, and the corresponding quantum version is the measure of the qudits of an observable.

8. Two Theorems about Quantum Logical Entropy

Classically, a pair of elements $(u_j, u_k)$ either "cohere" together in the same block of a partition on $U$, i.e., are an indistinction of the partition, or they do not, i.e., they are a distinction of the partition. In the quantum case, the nonzero off-diagonal entries $\alpha_j\alpha_k^*$ in the pure state density matrix $\rho(\psi) = |\psi\rangle\langle\psi|$ are called quantum "coherences" ([27] p. 303; [15] p. 177) because they give the amplitude of the eigenstates $u_j$ and $u_k$ "cohering" together in the coherent superposition state vector $|\psi\rangle = \sum_{j=1}^n \langle u_j|\psi\rangle|u_j\rangle = \sum_j \alpha_j|u_j\rangle$. The coherences are classically modeled by the nonzero off-diagonal entries $\sqrt{p_j p_k}$ for the indistinctions $(u_j, u_k) \in B_i \times B_i$, i.e., coherences ≈ indistinctions.
For an observable $F$, let $\phi : U \to \mathbb{R}$ be the $F$-eigenvalue function assigning the eigenvalue $\phi(u_i) = \phi_i$ to each $u_i$ in the ON basis $U = \{u_1, \ldots, u_n\}$ of $F$-eigenvectors. The range of $\phi$ is the set of $F$-eigenvalues $\{\phi_1, \ldots, \phi_I\}$. Let $P_{\phi_i} : V \to V$ be the projection matrix in the $U$-basis to the eigenspace of $\phi_i$. The projective $F$-measurement of the state $\psi$ transforms the pure state density matrix $\rho(\psi)$ (represented in the ON basis $U$ of $F$-eigenvectors) to yield the Lüders mixture density matrix $\hat{\rho}(\psi) = \sum_{i=1}^{I} P_{\phi_i}\rho(\psi)P_{\phi_i}$ ([15] p. 279). The off-diagonal elements of $\rho(\psi)$ that are zeroed in $\hat{\rho}(\psi)$ are the coherences (quantum indistinctions or quindits) that are turned into "decoherences" (quantum distinctions or qudits of the observable being measured).
For any observable $F$ and a pure state $\psi$, a quantum logical entropy was defined as $h(F:\psi) = \operatorname{tr}\left[P_{\operatorname{qudit}(F)}\,\rho(\psi)\otimes\rho(\psi)\right]$. That definition was the quantum generalization of the "classical" logical entropy defined as $h(\pi) = p \times p(\operatorname{dit}(\pi))$. When a projective $F$-measurement is performed on $\psi$, the pure state density matrix $\rho(\psi)$ is transformed into the mixed state density matrix $\hat{\rho}(\psi)$ by the quantum Lüders mixture operation, which then defines the quantum logical entropy $h(\hat{\rho}(\psi)) = 1 - \operatorname{tr}\left[\hat{\rho}(\psi)^2\right]$. The first test of how the quantum logical entropy notions fit together is showing that these two entropies are the same: $h(F:\psi) = h(\hat{\rho}(\psi))$. The proof shows that they are both equal to the classical logical entropy of the partition $\pi(F:\psi)$ defined on the ON basis $U = \{u_1, \ldots, u_n\}$ of $F$-eigenvectors by the $F$-eigenvalues with the point probabilities $p_j = \alpha_j^*\alpha_j$. That is, the inverse images $B_i = \phi^{-1}(\phi_i)$ of the eigenvalue function $\phi : U \to \mathbb{R}$ define the eigenvalue partition $\pi(F:\psi) = \{B_1, \ldots, B_I\}$ on the ON basis $U = \{u_1, \ldots, u_n\}$ with the point probabilities $p_j = \alpha_j^*\alpha_j$ provided by the state $\psi$ for $j = 1, \ldots, n$. The classical logical entropy of that partition is $h(\pi(F:\psi)) = 1 - \sum_{i=1}^{I}p(B_i)^2$, where $p(B_i) = \sum_{u_j \in B_i} p_j$.
We first show that $h(F:\psi) = \operatorname{tr}\left[P_{\operatorname{qudit}(F)}\,\rho(\psi)\otimes\rho(\psi)\right] = h(\pi(F:\psi))$. Now, the qudits of $F$ are the $u_j \otimes u_k$ with $\phi(u_j) \neq \phi(u_k)$, and $\operatorname{qudit}(F)$ is the subspace of $V \otimes V$ generated by them. The $n \times n$ pure state density matrix $\rho(\psi)$ has the entries $\rho_{jk}(\psi) = \alpha_j\alpha_k^*$, and $\rho(\psi)\otimes\rho(\psi)$ is an $n^2 \times n^2$ matrix. The projection matrix $P_{\operatorname{qudit}(F)}$ is an $n^2 \times n^2$ diagonal matrix with the diagonal entries, indexed by $(j, k)$ for $j, k = 1, \ldots, n$: $\left(P_{\operatorname{qudit}(F)}\right)_{(jk),(jk)} = 1$ if $\phi(u_j) \neq \phi(u_k)$ and zero otherwise. Thus, in the product $P_{\operatorname{qudit}(F)}\,\rho(\psi)\otimes\rho(\psi)$, the nonzero diagonal elements are the $p_j p_k$ where $\phi(u_j) \neq \phi(u_k)$, and so, the trace is $\sum_{j,k=1}^{n}\left\{p_j p_k : \phi(u_j) \neq \phi(u_k)\right\}$, which, by definition, is $h(F:\psi)$. Since $\sum_{j=1}^n p_j = \sum_{i=1}^I p(B_i) = 1$, we have $\left(\sum_{i=1}^I p(B_i)\right)^2 = 1 = \sum_{i=1}^I p(B_i)^2 + \sum_{i \neq i'} p(B_i)p(B_{i'})$. By grouping the $p_j p_k$ in the trace according to the blocks of $\pi(F:\psi)$, we have:
$$h(F:\psi) = \operatorname{tr}\left[P_{\operatorname{qudit}(F)}\,\rho(\psi)\otimes\rho(\psi)\right] = \sum_{j,k=1}^{n}\left\{p_j p_k : \phi(u_j) \neq \phi(u_k)\right\} = \sum_{i \neq i'}\left\{p_j p_k : u_j \in B_i, u_k \in B_{i'}\right\} = \sum_{i \neq i'} p(B_i)p(B_{i'}) = 1 - \sum_{i=1}^{I}p(B_i)^2 = h(\pi(F:\psi)).$$
To show that $h(\hat{\rho}(\psi)) = 1 - \operatorname{tr}\left[\hat{\rho}(\psi)^2\right] = h(\pi(F:\psi))$ for $\hat{\rho}(\psi) = \sum_{i=1}^I P_{\phi_i}\rho(\psi)P_{\phi_i}$, we need to compute $\operatorname{tr}\left[\hat{\rho}(\psi)^2\right]$. An off-diagonal element $\rho_{jk}(\psi) = \alpha_j\alpha_k^*$ of $\rho(\psi)$ survives the Lüders operation (i.e., is not zeroed and has the same value) if and only if $\phi(u_j) = \phi(u_k)$. Hence, the $j$-th diagonal element of $\hat{\rho}(\psi)^2$ is:
$$\sum_{k=1}^{n}\left\{\alpha_j\alpha_k^*\alpha_k\alpha_j^* : \phi(u_j) = \phi(u_k)\right\} = \sum_{k=1}^{n}\left\{p_j p_k : \phi(u_j) = \phi(u_k)\right\} = p_j\,p(B_i)$$
where $u_j \in B_i$. Then, grouping the $j$-th diagonal elements for $u_j \in B_i$ gives $\sum_{u_j \in B_i} p_j\,p(B_i) = p(B_i)^2$. Hence, the whole trace is $\operatorname{tr}\left[\hat{\rho}(\psi)^2\right] = \sum_{i=1}^{I}p(B_i)^2$, and thus:
$$h(\hat{\rho}(\psi)) = 1 - \operatorname{tr}\left[\hat{\rho}(\psi)^2\right] = 1 - \sum_{i=1}^{I}p(B_i)^2 = h(F:\psi).$$
This completes the proof of the following theorem.
Theorem 2.
$h(F:\psi) = h(\pi(F:\psi)) = h(\hat{\rho}(\psi))$.
Measurement creates distinctions, i.e., turns coherences into "decoherences", which, classically, is the operation of distinguishing elements by classifying them according to some attribute, like classifying the faces of a die by their parity. The fundamental theorem about quantum logical entropy and projective measurement, in the density matrix version, shows how the quantum logical entropy created by the measurement (starting with $h(\rho(\psi)) = 0$ for the pure state $\psi$) can be computed directly from the coherences of $\rho(\psi)$ that are decohered in $\hat{\rho}(\psi)$.
Theorem 3 (Fundamental (quantum)).
The increase in quantum logical entropy, $h(F:\psi) = h(\hat{\rho}(\psi))$, due to the $F$-measurement of the pure state $\psi$, is the sum of the absolute squares of the nonzero off-diagonal terms (coherences) in $\rho(\psi)$ (represented in an ON basis of $F$-eigenvectors) that are zeroed ('decohered') in the post-measurement Lüders mixture density matrix $\hat{\rho}(\psi) = \sum_{i=1}^{I} P_{\phi_i}\rho(\psi)P_{\phi_i}$.
Proof. 
$h(\hat{\rho}(\psi)) - h(\rho(\psi)) = \left(1 - \operatorname{tr}\left[\hat{\rho}(\psi)^2\right]\right) - \left(1 - \operatorname{tr}\left[\rho(\psi)^2\right]\right) = \sum_{j,k}\left(|\rho_{jk}(\psi)|^2 - |\hat{\rho}_{jk}(\psi)|^2\right)$. If $u_j$ and $u_k$ form a qudit of $F$, then and only then are the corresponding off-diagonal terms zeroed by the Lüders mixture operation $\sum_{i=1}^{I} P_{\phi_i}\rho(\psi)P_{\phi_i}$ to obtain $\hat{\rho}(\psi)$ from $\rho(\psi)$. ☐
Density matrices have long been a standard part of the machinery of quantum mechanics. The fundamental theorem for logical entropy and measurement shows that there is a simple, direct and quantitative connection between density matrices and logical entropy. The theorem directly connects the changes in the density matrix due to a measurement (the sum of the absolute squares of the zeroed off-diagonal terms) with the increase in logical entropy due to the $F$-measurement, $h(F:\psi) = \operatorname{tr}\left[P_{\operatorname{qudit}(F)}\,\rho(\psi)\otimes\rho(\psi)\right] = h(\hat{\rho}(\psi))$ (where $h(\rho(\psi)) = 0$ for the pure state $\psi$).
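A short numerical sketch (illustrative; the qutrit example, seed and variable names are assumptions, not the paper's) checks Theorems 2 and 3 for a degenerate observable and a random pure state.

```python
import numpy as np

rng = np.random.default_rng(0)
phi = np.array([1.0, 1.0, 2.0])                 # F-eigenvalues on the ON basis u1,u2,u3

alpha = rng.normal(size=3) + 1j * rng.normal(size=3)
alpha /= np.linalg.norm(alpha)                  # amplitudes of psi in the F-eigenbasis
rho = np.outer(alpha, alpha.conj())             # pure state density matrix

# Lüders mixture over the eigenspace projectors of F.
rho_hat = np.zeros_like(rho)
for value in np.unique(phi):
    P = np.diag((phi == value).astype(float))
    rho_hat += P @ rho @ P

h = lambda r: 1 - np.trace(r @ r).real

# Theorem 3: entropy increase = sum of |coherences|^2 zeroed by the measurement.
zeroed = rho[np.abs(rho_hat) < 1e-12]
print(np.isclose(h(rho_hat) - h(rho), np.sum(np.abs(zeroed) ** 2)))

# Theorem 2: h(rho_hat) equals the two-draw probability of distinct F-eigenvalues.
p = np.abs(alpha) ** 2
two_draw = sum(p[j] * p[k] for j in range(3) for k in range(3) if phi[j] != phi[k])
print(np.isclose(h(rho_hat), two_draw))
```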
This direct quantitative connection between state discrimination and quantum logical entropy reinforces the judgment of Boaz Tamir and Eliahu Cohen ([28,29]) that quantum logical entropy is a natural and informative entropy concept for quantum mechanics.
We find this framework of partitions and distinction most suitable (at least conceptually) for describing the problems of quantum state discrimination, quantum cryptography and in general, for discussing quantum channel capacity. In these problems, we are basically interested in a distance measure between such sets of states, and this is exactly the kind of knowledge provided by logical entropy [Reference to [19]].
([28] p. 1)
Moreover, the quantum logical entropy has a simple "two-draw probability" interpretation, i.e., $h(F:\psi) = h(\hat{\rho}(\psi))$ is the probability that two independent $F$-measurements of $\psi$ will yield distinct $F$-eigenvalues, i.e., will yield a qudit of $F$. In contrast, the von Neumann entropy has no such simple interpretation, and there seems to be no such intuitive connection between pre- and post-measurement density matrices and von Neumann entropy, although von Neumann entropy also increases in a projective measurement ([30] Theorem 11.9, p. 515).
The development of the quantum logical concepts for two non-commuting operators (see Appendix B) is the straightforward vector space version of the classical logical entropy treatment of partitions on two sets $X$ and $Y$ (see Appendix A).

9. Quantum Logical Entropies of Density Operators

The extension of the classical logical entropy $h(p) = 1 - \sum_{i=1}^n p_i^2$ of a probability distribution $p = (p_1, \ldots, p_n)$ to the quantum case is $h(\rho) = 1 - \operatorname{tr}\left[\rho^2\right]$, where a density matrix $\rho$ replaces the probability distribution $p$ and the trace replaces the summation.
An arbitrary density operator ρ , representing a pure or mixed state on V, is also a self-adjoint operator on V, so quantum logical entropies can be defined where density operators play the double role of providing the measurement basis (as self-adjoint operators), as well as the state being measured.
Let $\rho$ and $\tau$ be two non-commuting density operators on $V$. Let $X = \{u_i\}_{i=1,\ldots,n}$ be an orthonormal (ON) basis of $\rho$-eigenvectors, and let $\{\lambda_i\}_{i=1,\ldots,n}$ be the corresponding eigenvalues, which must be non-negative and sum to one, so they can be interpreted as probabilities. Let $Y = \{v_j\}_{j=1,\ldots,n}$ be an ON basis of eigenvectors for $\tau$, and let $\{\mu_j\}_{j=1,\ldots,n}$ be the corresponding eigenvalues, which are also non-negative and sum to one.
Each density operator plays a double role. For instance, $\rho$ acts as the observable to supply the measurement basis $\{u_i\}_i$ and the eigenvalues $\{\lambda_i\}_i$, as well as being the state to be measured, supplying the probabilities $\{\lambda_i\}_i$ for the measurement outcomes. In this section, we define quantum logical entropies using the discrete partition $1_X$ on the set of "index" states $X = \{u_i\}_i$ and similarly for the discrete partition $1_Y$ on $Y = \{v_j\}_j$, the ON basis of eigenvectors for $\tau$.
The qudit sets in $V \otimes V \otimes V \otimes V$ are then defined according to the identity and difference on the index sets, independently of the eigenvalue-probabilities, e.g., $\operatorname{qudit}(1_X) = \left\{u_i \otimes v_j \otimes u_{i'} \otimes v_{j'} : i \neq i'\right\}$. Then, the qudit subspaces are the subspaces of $(V \otimes V)^2$ generated by the qudit sets of generators:
  • $\operatorname{qudit}(1_X) = \left\{u_i \otimes v_j \otimes u_{i'} \otimes v_{j'} : i \neq i'\right\}$;
  • $\operatorname{qudit}(1_Y) = \left\{u_i \otimes v_j \otimes u_{i'} \otimes v_{j'} : j \neq j'\right\}$;
  • $\operatorname{qudit}(1_X, 1_Y) = \operatorname{qudit}(1_X) \cup \operatorname{qudit}(1_Y) = \left\{u_i \otimes v_j \otimes u_{i'} \otimes v_{j'} : i \neq i' \text{ or } j \neq j'\right\}$;
  • $\operatorname{qudit}(1_X | 1_Y) = \operatorname{qudit}(1_X) - \operatorname{qudit}(1_Y) = \left\{u_i \otimes v_j \otimes u_{i'} \otimes v_{j'} : i \neq i' \text{ and } j = j'\right\}$;
  • $\operatorname{qudit}(1_Y | 1_X) = \operatorname{qudit}(1_Y) - \operatorname{qudit}(1_X) = \left\{u_i \otimes v_j \otimes u_{i'} \otimes v_{j'} : i = i' \text{ and } j \neq j'\right\}$; and
  • $\operatorname{qudit}(1_Y \,\&\, 1_X) = \operatorname{qudit}(1_Y) \cap \operatorname{qudit}(1_X) = \left\{u_i \otimes v_j \otimes u_{i'} \otimes v_{j'} : i \neq i' \text{ and } j \neq j'\right\}$.
Then, as qudit sets, $\operatorname{qudit}(1_X, 1_Y) = \operatorname{qudit}(1_X|1_Y) \sqcup \operatorname{qudit}(1_Y|1_X) \sqcup \operatorname{qudit}(1_Y \,\&\, 1_X)$ (a disjoint union), and the corresponding qudit subspaces stand in the same relation, where the disjoint union is replaced by the disjoint sum.
The density operator $\rho$ is represented by the diagonal density matrix $\rho_X$ in its own ON basis $X$ with $(\rho_X)_{ii} = \lambda_i$, and similarly for the diagonal density matrix $\tau_Y$ with $(\tau_Y)_{jj} = \mu_j$. The density operators $\rho, \tau$ on $V$ define a density operator $\rho \otimes \tau$ on $V \otimes V$ with the ON basis of eigenvectors $\{u_i \otimes v_j\}_{i,j}$ and the eigenvalue-probabilities $\{\lambda_i\mu_j\}_{i,j}$. The operator $\rho \otimes \tau$ is represented in its ON basis by the diagonal density matrix $\rho_X \otimes \tau_Y$ with diagonal entries $\lambda_i\mu_j$, where $1 = (\lambda_1 + \cdots + \lambda_n)(\mu_1 + \cdots + \mu_n) = \sum_{i,j=1}^{n}\lambda_i\mu_j$. The probability measure $p(u_i \otimes v_j) = \lambda_i\mu_j$ on $V \otimes V$ defines the product measure $p \times p$ on $(V \otimes V)^2$, which can be applied to the qudit subspaces to define the quantum logical entropies as usual.
In the first instance, we have:
$$h(1_X : \rho \otimes \tau) = p \times p(\operatorname{qudit}(1_X)) = \sum\left\{\lambda_i\mu_j\lambda_{i'}\mu_{j'} : i \neq i'\right\} = \sum_{i \neq i'}\lambda_i\lambda_{i'}\sum_{j,j'}\mu_j\mu_{j'} = \sum_{i \neq i'}\lambda_i\lambda_{i'} = 1 - \sum_i \lambda_i^2 = 1 - \operatorname{tr}\left[\rho^2\right] = h(\rho)$$
and similarly, $h(1_Y : \rho \otimes \tau) = h(\tau)$. Since all the data are supplied by the two density operators, we can use simplified notation to define the corresponding joint, conditional and mutual entropies:
  • $h(\rho, \tau) = h(1_X, 1_Y : \rho \otimes \tau) = p \times p(\operatorname{qudit}(1_X) \cup \operatorname{qudit}(1_Y))$;
  • $h(\rho|\tau) = h(1_X|1_Y : \rho \otimes \tau) = p \times p(\operatorname{qudit}(1_X) - \operatorname{qudit}(1_Y))$;
  • $h(\tau|\rho) = h(1_Y|1_X : \rho \otimes \tau) = p \times p(\operatorname{qudit}(1_Y) - \operatorname{qudit}(1_X))$; and
  • $m(\rho, \tau) = h(1_Y \,\&\, 1_X : \rho \otimes \tau) = p \times p(\operatorname{qudit}(1_Y) \cap \operatorname{qudit}(1_X))$.
Then, the usual Venn diagram relationships hold for the probability measure $p \times p$ on $(V \otimes V)^2$, e.g.,
$$h(\rho, \tau) = h(\rho|\tau) + h(\tau|\rho) + m(\rho, \tau),$$
and probability interpretations are readily available. There are two probability distributions $\lambda = \{\lambda_i\}_i$ and $\mu = \{\mu_j\}_j$ on the sample space $\{1, \ldots, n\}$. Two pairs $(i, j)$ and $(i', j')$ are drawn with replacement; the first entry in each pair is drawn according to $\lambda$ and the second entry according to $\mu$. Then, $h(\rho, \tau)$ is the probability that $i \neq i'$ or $j \neq j'$ (or both); $h(\rho|\tau)$ is the probability that $i \neq i'$ and $j = j'$, and so forth. Note that this interpretation assumes no special significance to a $\lambda_i$ and $\mu_i$ having the same index, since we are drawing a pair of pairs.
In the classical case of two probability distributions $p = (p_1, \ldots, p_n)$ and $q = (q_1, \ldots, q_n)$ on the same index set, the logical cross-entropy is defined as $h(p||q) = 1 - \sum_i p_i q_i$ and interpreted as the probability of getting different indices in drawing a single pair, one index from $p$ and the other from $q$. That cross-entropy does assume a special significance to $p_i$ and $q_i$ having the same index, whereas in our current quantum setting there is no correlation between the two sets of "index" states $\{u_i\}_{i=1,\ldots,n}$ and $\{v_j\}_{j=1,\ldots,n}$. However, when the two density operators commute, $\tau\rho = \rho\tau$, we can take $\{u_i\}_{i=1,\ldots,n}$ as an ON basis of simultaneous eigenvectors for the two operators with respective eigenvalues $\lambda_i$ and $\mu_i$ for $u_i$ with $i = 1, \ldots, n$. In that special case, we can meaningfully define the quantum logical cross-entropy as $h(\rho||\tau) = 1 - \sum_{i=1}^n \lambda_i\mu_i$, but the general case awaits further analysis below.
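As an illustration (ad hoc code, not from the text), the quantum logical entropies of two density operators can be computed from their eigenvalue-probabilities using the "pair of pairs" interpretation and checked against the trace formulas.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)

def random_density(n):
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = A @ A.conj().T
    return rho / np.trace(rho).real

n = 3
rho, tau = random_density(n), random_density(n)
lam = np.linalg.eigvalsh(rho)      # eigenvalue-probabilities of rho
mu = np.linalg.eigvalsh(tau)       # eigenvalue-probabilities of tau

# p x p on the qudit sets: draw (i, j) and (i', j') with i ~ lam and j ~ mu.
pairs = list(product(range(n), repeat=2))
def measure(pred):
    return sum(lam[i] * mu[j] * lam[i2] * mu[j2]
               for (i, j), (i2, j2) in product(pairs, repeat=2) if pred(i, j, i2, j2))

h_rho   = measure(lambda i, j, i2, j2: i != i2)              # = 1 - tr(rho^2)
h_tau   = measure(lambda i, j, i2, j2: j != j2)              # = 1 - tr(tau^2)
h_joint = measure(lambda i, j, i2, j2: i != i2 or j != j2)   # h(rho, tau)
m       = measure(lambda i, j, i2, j2: i != i2 and j != j2)  # m(rho, tau)

print(np.isclose(h_rho, 1 - np.trace(rho @ rho).real))
print(np.isclose(h_joint, measure(lambda i, j, i2, j2: i != i2 and j == j2)
                 + measure(lambda i, j, i2, j2: i == i2 and j != j2) + m))
```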

10. The Logical Hamming Distance between Two Partitions

The development of logical quantum information theory in terms of some given commuting or non-commuting observables gives an analysis of the distinguishability of quantum states using those observables. Without any given observables, there is still a natural logical analysis of the distance between quantum states that generalizes the "classical" logical distance $h(\pi|\sigma) + h(\sigma|\pi)$ between partitions on a set. In the classical case, we have the logical entropy $h(\pi)$ of a partition, where the partition plays the role of the direct-sum decomposition of eigenspaces of an observable in the quantum case. However, we also have just the logical entropy $h(p)$ of a probability distribution $p = (p_1, \ldots, p_n)$ and the related compound notions of logical entropy given another probability distribution $q = (q_1, \ldots, q_n)$ indexed by the same set.
First, we review that classical treatment to motivate the quantum version of the logical Hamming distance between states. A binary relation $R \subseteq U \times U$ on $U = \{u_1, \ldots, u_n\}$ can be represented by an $n \times n$ incidence matrix $I(R)$ where:
$$I(R)_{ij} = \begin{cases} 1 & \text{if } (u_i, u_j) \in R \\ 0 & \text{if } (u_i, u_j) \notin R. \end{cases}$$
Taking $R$ as the equivalence relation $\operatorname{indit}(\pi)$ associated with a partition $\pi = \{B_1, \ldots, B_I\}$, the density matrix $\rho(\pi)$ of the partition $\pi$ (with equiprobable points) is just the incidence matrix $I(\operatorname{indit}(\pi))$ rescaled to be of trace one (i.e., the sum of the diagonal entries is one):
$$\rho(\pi) = \frac{1}{|U|}\,I(\operatorname{indit}(\pi)).$$
From coding theory ([31] p. 66), we have the notion of the Hamming distance between two $\{0,1\}$-vectors or matrices (of the same dimensions), which is the number of places where they differ. Since logical information theory is about distinctions and differences, it is important to have a classical and quantum logical notion of Hamming distance. The powerset $\wp(U \times U)$ can be viewed as a vector space over $\mathbb{Z}_2$, where the sum of two binary relations $R, R' \subseteq U \times U$ is the symmetric difference, symbolized as $R\,\Delta\,R' = (R \cup R') - (R \cap R') = (R - R') \cup (R' - R)$, which is the set of elements (i.e., ordered pairs $(u_i, u_j) \in U \times U$) that are in one set or the other, but not both. Thus, the Hamming distance $D_H(I(R), I(R'))$ between the incidence matrices of two binary relations is just the cardinality of their symmetric difference: $D_H(I(R), I(R')) = |R\,\Delta\,R'|$. Moreover, the size of the symmetric difference does not change if the binary relations are replaced by their complements: $|R\,\Delta\,R'| = |(U^2 - R)\,\Delta\,(U^2 - R')|$.
Hence, given two partitions $\pi = \{B_1, \ldots, B_I\}$ and $\sigma = \{C_1, \ldots, C_J\}$ on $U$, the unnormalized Hamming distance between the two partitions is naturally defined as (this is investigated in Rossi [32]):
$$D(\pi, \sigma) = D_H(I(\operatorname{indit}(\pi)), I(\operatorname{indit}(\sigma))) = |\operatorname{indit}(\pi)\,\Delta\,\operatorname{indit}(\sigma)| = |\operatorname{dit}(\pi)\,\Delta\,\operatorname{dit}(\sigma)|,$$
and the Hamming distance between $\pi$ and $\sigma$ is defined as the normalized $D(\pi, \sigma)$:
$$d(\pi, \sigma) = \frac{D(\pi, \sigma)}{|U \times U|} = \frac{|\operatorname{dit}(\pi)\,\Delta\,\operatorname{dit}(\sigma)|}{|U \times U|} = \frac{|\operatorname{dit}(\pi) - \operatorname{dit}(\sigma)|}{|U \times U|} + \frac{|\operatorname{dit}(\sigma) - \operatorname{dit}(\pi)|}{|U \times U|} = h(\pi|\sigma) + h(\sigma|\pi).$$
This motivates the general case of point probabilities $p = (p_1, \ldots, p_n)$, where we define the Hamming distance between the two partitions as the sum of the two logical conditional entropies:
$$d(\pi, \sigma) = h(\pi|\sigma) + h(\sigma|\pi) = 2h(\pi \vee \sigma) - h(\pi) - h(\sigma).$$
To motivate the bridge to the quantum version of the Hamming distance, we need to calculate it using the density matrices $\rho(\pi)$ and $\rho(\sigma)$ of the two partitions. To compute the trace $\operatorname{tr}[\rho(\pi)\rho(\sigma)]$, we compute the diagonal elements in the product $\rho(\pi)\rho(\sigma)$ and add them up: $(\rho(\pi)\rho(\sigma))_{kk} = \sum_l \rho(\pi)_{kl}\rho(\sigma)_{lk} = \sum_l \sqrt{p_k p_l}\sqrt{p_l p_k}$, where the only nonzero terms are those with $u_k, u_l \in B \cap C$ for some $B \in \pi$ and $C \in \sigma$. Thus, if $u_k \in B \cap C$, then $(\rho(\pi)\rho(\sigma))_{kk} = \sum_{u_l \in B \cap C} p_k p_l$. Therefore, the diagonal element for $u_k$ is the sum of the $p_k p_l$ for $u_l$ in the same intersection $B \cap C$ as $u_k$, so it is $p_k\Pr(B \cap C)$. Then, when we sum over the diagonal elements, for all the $u_k \in B \cap C$ for any given $B, C$, we just sum $\sum_{u_k \in B \cap C} p_k\Pr(B \cap C) = \Pr(B \cap C)^2$, so that $\operatorname{tr}[\rho(\pi)\rho(\sigma)] = \sum_{B \in \pi, C \in \sigma}\Pr(B \cap C)^2 = 1 - h(\pi \vee \sigma)$.
Hence, if we define the logical cross-entropy of $\pi$ and $\sigma$ as:
$$h(\pi||\sigma) = 1 - \operatorname{tr}[\rho(\pi)\rho(\sigma)],$$
then for partitions on $U$ with the point probabilities $p = (p_1, \ldots, p_n)$, the logical cross-entropy $h(\pi||\sigma)$ of two partitions is the same as the logical joint entropy, which is also the logical entropy of the join:
$$h(\pi||\sigma) = h(\pi, \sigma) = h(\pi \vee \sigma).$$
Thus, we can also express the logical Hamming distance between two partitions entirely in terms of density matrices:
$$d(\pi, \sigma) = 2h(\pi||\sigma) - h(\pi) - h(\sigma) = \operatorname{tr}\left[\rho(\pi)^2\right] + \operatorname{tr}\left[\rho(\sigma)^2\right] - 2\operatorname{tr}[\rho(\pi)\rho(\sigma)].$$
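The two expressions for the logical Hamming distance can be checked numerically; the sketch below (illustrative helper names, not from the text) computes $d(\pi, \sigma)$ from the conditional entropies and from the density-matrix trace formula.

```python
import numpy as np

p = np.array([0.1, 0.2, 0.3, 0.25, 0.15])
pi = [[0, 1], [2, 3, 4]]
sigma = [[0, 2], [1, 3], [4]]

def density(blocks, p):
    """rho(pi)_{kl} = sqrt(p_k p_l) if k, l are in the same block, else 0."""
    rho = np.zeros((len(p), len(p)))
    for b in blocks:
        rho[np.ix_(b, b)] = np.sqrt(np.outer(p[b], p[b]))
    return rho

def joint_entropy(pi, sigma, p):
    """h(pi v sigma) = 1 - sum Pr(B ∩ C)^2."""
    return 1 - sum(p[list(set(B) & set(C))].sum() ** 2 for B in pi for C in sigma)

def h_blocks(blocks, p):
    return 1 - sum(p[b].sum() ** 2 for b in blocks)

d_entropy = 2 * joint_entropy(pi, sigma, p) - h_blocks(pi, p) - h_blocks(sigma, p)

r_pi, r_sigma = density(pi, p), density(sigma, p)
d_matrix = np.trace(r_pi @ r_pi) + np.trace(r_sigma @ r_sigma) - 2 * np.trace(r_pi @ r_sigma)

print(np.isclose(d_entropy, d_matrix))   # True
```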

11. The Quantum Logical Hamming Distance

The quantum logical entropy $h(\rho) = 1 - \operatorname{tr}\left[\rho^2\right]$ of a density matrix $\rho$ generalizes the classical $h(p) = 1 - \sum_i p_i^2$ for a probability distribution $p = (p_1, \ldots, p_n)$. As a self-adjoint operator, a density matrix has a spectral decomposition $\rho = \sum_{i=1}^n \lambda_i|u_i\rangle\langle u_i|$, where $\{u_i\}_{i=1,\ldots,n}$ is an orthonormal basis for $V$ and where all the eigenvalues $\lambda_i$ are real, non-negative and $\sum_{i=1}^n \lambda_i = 1$. Then, $h(\rho) = 1 - \sum_i \lambda_i^2$, so $h(\rho)$ can be interpreted as the probability of getting distinct indices $i \neq i'$ in two independent measurements of the state $\rho$ with $\{u_i\}$ as the measurement basis. Classically, it is the two-draw probability of getting distinct indices in two independent samples of the probability distribution $\lambda = (\lambda_1, \ldots, \lambda_n)$, just as $h(p)$ is the probability of getting distinct indices in two independent draws on $p$. For a pure state $\rho$, there is one $\lambda_i = 1$ with the others zero, and $h(\rho) = 0$ is the probability of getting distinct indices in two independent draws on $\lambda = (0, \ldots, 0, 1, 0, \ldots, 0)$.
In the classical case of the logical entropies, we worked with the ditsets or sets of distinctions of partitions. However, everything could also be expressed in terms of the complementary sets of indits or indistinctions of partitions (ordered pairs of elements in the same block of the partition), since $\operatorname{dit}(\pi) \cup \operatorname{indit}(\pi) = U \times U$. When we switch to the density matrix treatment of "classical" partitions, the focus shifts to the indistinctions. For a partition $\pi = \{B_1, \ldots, B_I\}$, the logical entropy is the sum of the distinction probabilities: $h(\pi) = \sum_{(u_k, u_l) \in \operatorname{dit}(\pi)} p_k p_l$, which in terms of indistinctions is:
$$h(\pi) = 1 - \sum_{(u_k, u_l) \in \operatorname{indit}(\pi)} p_k p_l = 1 - \sum_{i=1}^{I}\Pr(B_i)^2.$$
When expressed in the density matrix formulation, $\operatorname{tr}\left[\rho(\pi)^2\right]$ is the sum of the indistinction probabilities:
$$\operatorname{tr}\left[\rho(\pi)^2\right] = \sum_{(u_k, u_l) \in \operatorname{indit}(\pi)} p_k p_l = \sum_{i=1}^{I}\Pr(B_i)^2.$$
The nonzero entries in $\rho(\pi)$ have the form $\sqrt{p_k p_l}$ for $(u_k, u_l) \in \operatorname{indit}(\pi)$; their squares are the indistinction probabilities. That provides the interpretive bridge to the quantum case.
The quantum analogue of an indistinction probability is the absolute square $|\rho_{kl}|^2$ of a nonzero entry $\rho_{kl}$ in a density matrix $\rho$, and $\operatorname{tr}\left[\rho^2\right] = \sum_{k,l}|\rho_{kl}|^2$ is the sum of those "indistinction" probabilities. The nonzero entries in the density matrix $\rho$ might be called "coherences", so that $\rho_{kl}$ may be interpreted as the amplitude for the states $u_k$ and $u_l$ to cohere together in the state $\rho$; thus, $\operatorname{tr}\left[\rho^2\right]$ is the sum of the coherence probabilities, just as $\operatorname{tr}\left[\rho(\pi)^2\right] = \sum_{(u_k, u_l) \in \operatorname{indit}(\pi)} p_k p_l$ is the sum of the indistinction probabilities. The quantum logical entropy $h(\rho) = 1 - \operatorname{tr}\left[\rho^2\right]$ may then be interpreted as the sum of the decoherence probabilities, just as $h(\rho(\pi)) = h(\pi) = 1 - \sum_{(u_k, u_l) \in \operatorname{indit}(\pi)} p_k p_l$ is the sum of the distinction probabilities.
This motivates the general quantum definition corresponding to the joint entropy $h(\pi, \sigma) = h(\pi \vee \sigma) = h(\pi||\sigma)$, which is the:
$$h(\rho||\tau) = 1 - \operatorname{tr}[\rho\tau] \quad \text{quantum logical cross-entropy}.$$
To work out its interpretation, we again take ON eigenvector bases $\{u_i\}_{i=1}^n$ for $\rho$ and $\{v_j\}_{j=1}^n$ for $\tau$, with $\lambda_i$ and $\mu_j$ as the respective eigenvalues, and compute the operation of $\tau\rho : V \to V$. Now, $|u_i\rangle = \sum_j \langle v_j|u_i\rangle|v_j\rangle$, so $\rho|u_i\rangle = \lambda_i|u_i\rangle = \sum_j \lambda_i\langle v_j|u_i\rangle|v_j\rangle$; then, for $\tau = \sum_j \mu_j|v_j\rangle\langle v_j|$, we have $\tau\rho|u_i\rangle = \sum_j \lambda_i\mu_j\langle v_j|u_i\rangle|v_j\rangle$. Thus, $\tau\rho$ in the $\{u_i\}_i$ basis has the diagonal entries $\langle u_i|\tau\rho|u_i\rangle = \sum_j \lambda_i\mu_j\langle v_j|u_i\rangle\langle u_i|v_j\rangle$, so the trace is:
$$\operatorname{tr}[\tau\rho] = \sum_i \langle u_i|\tau\rho|u_i\rangle = \sum_{i,j}\lambda_i\mu_j\langle v_j|u_i\rangle\langle u_i|v_j\rangle = \operatorname{tr}[\rho\tau],$$
which is symmetrical. The other information we have is that $\sum_i \lambda_i = 1 = \sum_j \mu_j$ and that they are non-negative. The classical logical cross-entropy of two probability distributions is $h(p||q) = 1 - \sum_{i,j}p_i q_j\delta_{ij}$, where two indices $i$ and $j$ are either identical or totally distinct. However, in the quantum case, the "index" states $u_i$ and $v_j$ have an "overlap" measured by the inner product $\langle u_i|v_j\rangle$. The trace $\operatorname{tr}[\rho\tau]$ is real since $\langle v_j|u_i\rangle = \langle u_i|v_j\rangle^*$, and $\langle v_j|u_i\rangle\langle u_i|v_j\rangle = |\langle u_i|v_j\rangle|^2 = |\langle v_j|u_i\rangle|^2$ is the probability of getting $\lambda_i$ when measuring $v_j$ in the $\{u_i\}$ basis and the probability of getting $\mu_j$ when measuring $u_i$ in the $\{v_j\}$ basis. The twofold nature of density matrices as states and as observables then allows $\operatorname{tr}[\rho\tau]$ to be interpreted as the average value of the observable $\rho$ when measuring the state $\tau$, or vice versa.
We may call $\langle v_j|u_i\rangle\langle u_i|v_j\rangle$ the proportion or extent of overlap for those two index states. Thus, $\operatorname{tr}[\rho\tau]$ is the sum of all the probability combinations $\lambda_i\mu_j$ weighted by the overlaps $\langle v_j|u_i\rangle\langle u_i|v_j\rangle$ for the index states $u_i$ and $v_j$. The quantum logical cross-entropy can be written in a number of ways:
$$h(\rho||\tau) = 1 - \operatorname{tr}[\rho\tau] = 1 - \sum_{i,j}\lambda_i\mu_j\langle v_j|u_i\rangle\langle u_i|v_j\rangle = \operatorname{tr}[\tau(I - \rho)] = \sum_{i,j}(1 - \lambda_i)\mu_j\langle v_j|u_i\rangle\langle u_i|v_j\rangle = \operatorname{tr}[\rho(I - \tau)] = \sum_{i,j}\lambda_i(1 - \mu_j)\langle v_j|u_i\rangle\langle u_i|v_j\rangle.$$
Classically, the "index state" $i$ completely overlaps with $j$ when $i = j$ and has no overlap with any other index from $1, \ldots, n$, so the "overlaps" are, as it were, $\langle j|i\rangle\langle i|j\rangle = \delta_{ij}$, the Kronecker delta. Hence, the classical analogue formulas are:
$$h(p||q) = 1 - \sum_{i,j}p_i q_j\delta_{ij} = \sum_{i,j}(1 - p_i)q_j\delta_{ij} = \sum_{i,j}p_i(1 - q_j)\delta_{ij}.$$
The quantum logical cross-entropy $h(\rho||\tau)$ can be interpreted by considering two measurements, one of $\rho$ in the $\{u_i\}_i$ measurement basis and the other of $\tau$ in the $\{v_j\}_j$ measurement basis. If the outcome of the $\rho$ measurement was $u_i$ with probability $\lambda_i$, then the outcome of the $\tau$ measurement is different from $v_j$ with probability $1 - \mu_j$; however, that distinction probability $\lambda_i(1 - \mu_j)$ is only relevant to the extent that $u_i$ and $v_j$ are the "same state" or overlap, and that extent is $\langle v_j|u_i\rangle\langle u_i|v_j\rangle$. Hence, the quantum logical cross-entropy is the sum of those two-measurement distinction probabilities weighted by the extent to which the states overlap. The interpretation of $h(\rho)$ and $h(\tau||\rho)$, as well as the later development of the quantum logical conditional entropy $h(\rho|\tau)$ and the quantum Hamming distance $d(\rho, \tau)$, are all based on using the eigenvectors and eigenvalues of density matrices, which Michael Nielsen and Isaac Chuang seem to prematurely dismiss as having little or no "special significance" ([30] p. 103).
When the two density matrices commute, $\rho\tau = \tau\rho$, then (as noted above) we have the essentially classical situation of one set of index states $\{u_i\}_i$, which is an orthonormal basis of simultaneous eigenvectors for both $\rho$ and $\tau$ with the respective eigenvalues $\{\lambda_i\}_i$ and $\{\mu_i\}_i$. Then, $\langle u_j|u_i\rangle\langle u_i|u_j\rangle = \delta_{ij}$, so $h(\rho||\tau) = \sum_{i,j}\lambda_i(1 - \mu_j)\delta_{ij}$ is the probability of getting two distinct index states $u_i$ and $u_j$ for $i \neq j$ in two independent measurements, one of $\rho$ and one of $\tau$, in the same measurement basis $\{u_i\}_i$. This interpretation includes the special case when $\tau = \rho$ and $h(\rho||\rho) = h(\rho)$.
We saw that, classically, the logical Hamming distance between two partitions could be defined as:
$$d(\pi, \sigma) = 2h(\pi||\sigma) - h(\pi) - h(\sigma) = \operatorname{tr}\left[\rho(\pi)^2\right] + \operatorname{tr}\left[\rho(\sigma)^2\right] - 2\operatorname{tr}[\rho(\pi)\rho(\sigma)],$$
so this motivates the quantum definition. Nielsen and Chuang suggest the idea of a Hamming distance between quantum states, only to then dismiss it: "Unfortunately, the Hamming distance between two objects is simply a matter of labeling, and a priori there are not any labels in the Hilbert space arena of quantum mechanics!" ([30] p. 399). They are right that there is no correlation, say, between the vectors in the two ON bases $\{u_i\}_i$ and $\{v_j\}_j$ for $V$, but the cross-entropy $h(\rho||\tau)$ uses all possible combinations in the terms $\lambda_i(1 - \mu_j)\langle v_j|u_i\rangle\langle u_i|v_j\rangle$; thus, the definition of the Hamming distance given below does not use any arbitrary labeling or correlations:
$$d(\rho, \tau) = 2h(\rho||\tau) - h(\rho) - h(\tau) = \operatorname{tr}\left[\rho^2\right] + \operatorname{tr}\left[\tau^2\right] - 2\operatorname{tr}[\rho\tau].$$
This is the definition of the quantum logical Hamming distance between two quantum states.
There is another distance measure between quantum states, namely the Hilbert–Schmidt norm, which has been recently investigated in [29] (with an added 1 2 factor). It is the square of the Euclidean distance between the quantum states, and ignoring the 1 2 factor, it is the square of the “trace distance” ([30] Chapter 9) between the states.
$$\operatorname{tr}\left[(\rho - \tau)^2\right] \quad \text{Hilbert–Schmidt norm},$$
where we write $A^2$ for $AA$. Then, the naturalness of this norm as a "distance" is enhanced by the fact that it is the same as the quantum Hamming distance:
Theorem 4.
Hilbert–Schmidt norm = quantum logical Hamming distance.
Proof. 
$\operatorname{tr}\left[(\rho - \tau)^2\right] = \operatorname{tr}\left[\rho^2\right] + \operatorname{tr}\left[\tau^2\right] - 2\operatorname{tr}[\rho\tau] = 2h(\rho||\tau) - h(\rho) - h(\tau) = d(\rho, \tau)$.  ☐
Hence, the information inequality holds trivially for the quantum logical Hamming distance:
$$d(\rho, \tau) \geq 0 \quad \text{with equality iff } \rho = \tau.$$
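Theorem 4 and the information inequality are easy to check numerically; the following sketch (illustrative only, with arbitrary dimensions and seed) compares the Hilbert–Schmidt norm with $2h(\rho||\tau) - h(\rho) - h(\tau)$ for random density matrices.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_density(n):
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = A @ A.conj().T
    return rho / np.trace(rho).real

rho, tau = random_density(4), random_density(4)

h = lambda r: 1 - np.trace(r @ r).real            # quantum logical entropy
h_cross = 1 - np.trace(rho @ tau).real            # quantum logical cross-entropy

hamming = 2 * h_cross - h(rho) - h(tau)
hilbert_schmidt = np.trace((rho - tau) @ (rho - tau)).real

print(np.isclose(hamming, hilbert_schmidt))       # True
print(hamming >= 0)                               # information inequality
```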
The fundamental theorem can be usefully restated in this broader context of density operators, instead of in terms of the density matrix represented in the ON basis of $F$-eigenvectors. Let $\rho$ be any state to be measured by an observable $F$, and let $\hat{\rho} = \sum_{i=1}^{I} P_{\phi_i}\rho P_{\phi_i}$ be the result of applying the Lüders mixture operation (where the $P_{\phi_i}$ are the projection operators to the eigenspaces of $F$). Then, a natural question to ask is the Hilbert–Schmidt norm or quantum logical Hamming distance between the pre- and post-measurement states. It might be noted that the Hilbert–Schmidt norm and the Lüders mixture operation are defined independently of any considerations of logical entropy.
Theorem 5 (Fundamental (quantum)).
$\operatorname{tr}\left[(\rho - \hat{\rho})^2\right] = h(\hat{\rho}) - h(\rho)$.
Proof. 
$\operatorname{tr}\left[(\rho - \hat{\rho})^2\right] = \operatorname{tr}\left[(\rho - \hat{\rho})(\rho - \hat{\rho})\right] = \operatorname{tr}\left[\rho^2\right] + \operatorname{tr}\left[\hat{\rho}^2\right] - \operatorname{tr}[\rho\hat{\rho}] - \operatorname{tr}[\hat{\rho}\rho]$, where $\rho^\dagger = \rho$, $\hat{\rho}^\dagger = \hat{\rho}$, $\operatorname{tr}[\rho\hat{\rho}] = \operatorname{tr}[\hat{\rho}\rho]$ and $\operatorname{tr}[\rho\hat{\rho}] = \operatorname{tr}\left[\rho\sum_{i=1}^{I}P_{\phi_i}\rho P_{\phi_i}\right] = \sum_i\operatorname{tr}\left[\rho P_{\phi_i}\rho P_{\phi_i}\right]$. Also, $\operatorname{tr}\left[\hat{\rho}^2\right] = \operatorname{tr}\left[\sum_i P_{\phi_i}\rho P_{\phi_i}\sum_j P_{\phi_j}\rho P_{\phi_j}\right] = \operatorname{tr}\left[\sum_i P_{\phi_i}\rho P_{\phi_i}P_{\phi_i}\rho P_{\phi_i}\right]$ using the orthogonality of the distinct projection operators. Then, using the idempotency of the projections and the cyclicity of the trace, $\operatorname{tr}\left[\hat{\rho}^2\right] = \sum_i\operatorname{tr}\left[\rho P_{\phi_i}\rho P_{\phi_i}\right]$, so $\operatorname{tr}[\rho\hat{\rho}] = \operatorname{tr}[\hat{\rho}\rho] = \operatorname{tr}\left[\hat{\rho}^2\right]$, and hence, $\operatorname{tr}\left[(\rho - \hat{\rho})^2\right] = \operatorname{tr}\left[\rho^2\right] - \operatorname{tr}\left[\hat{\rho}^2\right] = h(\hat{\rho}) - h(\rho)$. ☐
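A short sketch (illustrative; the observable, dimension and seed are arbitrary choices) checks Theorem 5 for a random mixed state and a degenerate projective measurement.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
rho = A @ A.conj().T
rho /= np.trace(rho).real                       # random mixed state

phi = np.array([1.0, 1.0, 2.0, 3.0])            # F-eigenvalues (first one degenerate)
rho_hat = np.zeros_like(rho)
for value in np.unique(phi):
    P = np.diag((phi == value).astype(float))   # projector onto the eigenspace
    rho_hat += P @ rho @ P                      # Lüders mixture operation

h = lambda r: 1 - np.trace(r @ r).real
diff = rho - rho_hat
print(np.isclose(np.trace(diff @ diff).real, h(rho_hat) - h(rho)))   # True
```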

12. Results

Logical information theory arises as the quantitative version of the logic of partitions just as logical probability theory arises as the quantitative version of the dual Boolean logic of subsets. Philosophically, logical information is based on the idea of information-as-distinctions. The Shannon definitions of entropy arise naturally out of the logical definitions by replacing the counting of distinctions by the counting of the minimum number of binary partitions (bits) that are required, on average, to make all the same distinctions, i.e., to encode the distinguished elements uniquely, which is why the Shannon theory is so well adapted for the theory of coding and communication.
This “classical” logical information theory may be developed with the data of two partitions on a set with point probabilities. Section 7 gives the generalization to the quantum case where the partitions are provided by two commuting observables (the point set is an ON basis of simultaneous eigenvectors), and the point probabilities are provided by the state to be measured. In Section 8, the fundamental theorem for quantum logical entropy and measurement established a direct quantitative connection between the increase in quantum logical entropy due to a projective measurement and the eigenstates (cohered together in the pure superposition state being measured) that are distinguished by the measurement (decohered in the post-measurement mixed state). This theorem establishes quantum logical entropy as a natural notion for a quantum information theory focusing on distinguishing states.
The classical theory might also start with partitions on two different sets and a probability distribution on the product of the sets (see Appendix A). Appendix B gives the quantum generalization of that case with the two sets being two ON bases for two non-commuting observables, and the probabilities are provided by a state to be measured. The classical theory may also be developed just using two probability distributions indexed by the same set, and this is generalized to the quantum case where we are just given two density matrices representing two states in a Hilbert space. Section 10 and Section 11 carry over the Hamming distance measure from the classical to the quantum case, where it is equal to the Hilbert–Schmidt distance (square of the trace distance). The general fundamental theorem relating measurement and logical entropy is that the Hilbert–Schmidt distance (= quantum logical Hamming distance) between any pre-measurement state $\rho$ and the state $\hat{\rho}$ resulting from a projective measurement of the state is the difference in their logical entropies, $h(\hat{\rho}) - h(\rho)$.

13. Discussion

The overall argument is that quantum logical entropy is the simple and natural notion of information-as-distinctions for quantum information theory focusing on the distinguishing of quantum states. These results add to the arguments already presented by Manfredi and Feix [21] and many others (see [23]) for this notion of quantum entropy.
There are two related classical theories of information, classical logical information theory (focusing on information-as-distinctions and analyzing classification) and the Shannon theory (focusing on coding and communications theory). Generalizing to the quantum case, there are also two related quantum theories of information, the logical theory (using quantum logical entropy to focus on distinguishing quantum states and analyzing measurement as the quantum version of classification) and the conventional quantum information theory (using von Neumann entropy to develop a quantum version of the Shannon treatment of coding and communications).

Funding

This research received no external funding.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. Classical Logical Information Theory with Two Sets X and Y

The usual ("classical") logical information theory for a probability distribution $p(x, y)$ on $X \times Y$ (finite) in effect uses the discrete partitions on $X$ and $Y$ [3]. For the general case of quantum logical entropy for not-necessarily-commuting observables, we need to first briefly develop the classical case with general partitions on $X$ and $Y$.
Given two finite sets X and Y and real-valued functions $f: X \to \mathbb{R}$ with values $\{\phi_i\}_{i=1}^{I}$ and $g: Y \to \mathbb{R}$ with values $\{\gamma_j\}_{j=1}^{J}$, each function induces a partition on its domain:
$\pi = \{f^{-1}(\phi_i)\}_{i \in I} = \{B_1, \ldots, B_I\}$ on X, and $\sigma = \{g^{-1}(\gamma_j)\}_{j \in J} = \{C_1, \ldots, C_J\}$ on Y.
We want to define logical entropies on $X \times Y$, but first we need the ditsets or information sets.
A partition $\pi = \{B_1, \ldots, B_I\}$ on X and a partition $\sigma = \{C_1, \ldots, C_J\}$ on Y define a product partition $\pi \times \sigma$ on $X \times Y$ whose blocks are $\{B_i \times C_j\}_{i,j}$. Then, $\pi$ induces $\pi \times 0_Y$ on $X \times Y$ (where $0_Y$ is the indiscrete partition on Y), and $\sigma$ induces $0_X \times \sigma$ on $X \times Y$. The corresponding ditsets or information sets are:
  • $\operatorname{dit}(\pi \times 0_Y) = \{((x,y),(x',y')) : f(x) \neq f(x')\} \subseteq (X \times Y)^2$;
  • $\operatorname{dit}(0_X \times \sigma) = \{((x,y),(x',y')) : g(y) \neq g(y')\} \subseteq (X \times Y)^2$;
  • $\operatorname{dit}(\pi \times \sigma) = \operatorname{dit}(\pi \times 0_Y) \cup \operatorname{dit}(0_X \times \sigma)$; and so forth.
Given a joint probability distribution $p: X \times Y \to [0,1]$, the product probability distribution is $p \times p: (X \times Y)^2 \to [0,1]$.
All the logical entropies are just the product probabilities of the ditsets and their union, differences and intersection:
  • $h(\pi \times 0_Y) = p \times p\,(\operatorname{dit}(\pi \times 0_Y))$;
  • $h(0_X \times \sigma) = p \times p\,(\operatorname{dit}(0_X \times \sigma))$;
  • $h(\pi \times \sigma) = p \times p\,(\operatorname{dit}(\pi \times \sigma)) = p \times p\,(\operatorname{dit}(\pi \times 0_Y) \cup \operatorname{dit}(0_X \times \sigma))$;
  • $h(\pi \times 0_Y \mid 0_X \times \sigma) = p \times p\,(\operatorname{dit}(\pi \times 0_Y) - \operatorname{dit}(0_X \times \sigma))$;
  • $h(0_X \times \sigma \mid \pi \times 0_Y) = p \times p\,(\operatorname{dit}(0_X \times \sigma) - \operatorname{dit}(\pi \times 0_Y))$;
  • $m(\pi \times 0_Y, 0_X \times \sigma) = p \times p\,(\operatorname{dit}(\pi \times 0_Y) \cap \operatorname{dit}(0_X \times \sigma))$.
All the logical entropies have the usual two-draw probability interpretation, where the two independent draws from $X \times Y$ are $(x,y)$ and $(x',y')$, and they can be interpreted in terms of the f-values and g-values (a computational sketch follows the list):
  • $h(\pi \times 0_Y)$ = probability of getting distinct f-values;
  • $h(0_X \times \sigma)$ = probability of getting distinct g-values;
  • $h(\pi \times \sigma)$ = probability of getting distinct f- or g-values;
  • $h(\pi \times 0_Y \mid 0_X \times \sigma)$ = probability of getting distinct f-values, but the same g-values;
  • $h(0_X \times \sigma \mid \pi \times 0_Y)$ = probability of getting distinct g-values, but the same f-values;
  • $m(\pi \times 0_Y, 0_X \times \sigma)$ = probability of getting distinct f- and distinct g-values.
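To make the two-draw reading concrete, here is a minimal Python sketch (the sets, functions, and joint distribution are made-up illustrative inputs, not taken from the paper) that computes each logical entropy directly as a product probability over pairs of draws from $X \times Y$:

```python
import itertools

# Hypothetical illustrative data: X, Y, f, g and a joint distribution p(x, y).
X, Y = ['x1', 'x2', 'x3'], ['y1', 'y2']
f = {'x1': 0.0, 'x2': 0.0, 'x3': 1.0}      # induces the partition pi on X
g = {'y1': 5.0, 'y2': 7.0}                 # induces the partition sigma on Y
p = {('x1', 'y1'): 0.2, ('x1', 'y2'): 0.1,
     ('x2', 'y1'): 0.3, ('x2', 'y2'): 0.1,
     ('x3', 'y1'): 0.1, ('x3', 'y2'): 0.2}

def two_draw_prob(condition):
    """p x p of the set of pairs ((x,y),(x',y')) satisfying the condition."""
    return sum(p[(x, y)] * p[(x2, y2)]
               for (x, y), (x2, y2) in itertools.product(p, p)
               if condition(x, y, x2, y2))

h_pi      = two_draw_prob(lambda x, y, x2, y2: f[x] != f[x2])
h_sigma   = two_draw_prob(lambda x, y, x2, y2: g[y] != g[y2])
h_joint   = two_draw_prob(lambda x, y, x2, y2: f[x] != f[x2] or g[y] != g[y2])
h_pi_g    = two_draw_prob(lambda x, y, x2, y2: f[x] != f[x2] and g[y] == g[y2])
h_sigma_f = two_draw_prob(lambda x, y, x2, y2: g[y] != g[y2] and f[x] == f[x2])
m_pi_sig  = two_draw_prob(lambda x, y, x2, y2: f[x] != f[x2] and g[y] != g[y2])

# Venn-diagram identity: h(pi x sigma) = h(pi|sigma) + m(pi,sigma) + h(sigma|pi)
assert abs(h_joint - (h_pi_g + m_pi_sig + h_sigma_f)) < 1e-12
```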
We have defined all the logical entropies by the general method of the product probabilities on the ditsets. In the first three cases, $h(\pi \times 0_Y)$, $h(0_X \times \sigma)$ and $h(\pi \times \sigma)$, they were the logical entropies of partitions on $X \times Y$, so they could equivalently be defined using density matrices. The case of $h(\pi \times \sigma)$ illustrates the general case. If $\rho(\pi)$ is the density matrix defined for $\pi$ on X and $\rho(\sigma)$ the density matrix for $\sigma$ on Y, then $\rho(\pi \times \sigma) = \rho(\pi) \otimes \rho(\sigma)$ is the density matrix for $\pi \times \sigma$ defined on $X \times Y$, and:
$h(\pi \times \sigma) = 1 - \operatorname{tr}[\rho(\pi \times \sigma)^2]$.
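A minimal check of this density-matrix formulation, reusing the illustrative data above and assuming the standard construction (used in the body of the paper) in which the density matrix of a partition on a set with point probabilities has entries $\sqrt{p(u)p(u')}$ for $u, u'$ in the same block and 0 elsewhere:

```python
import itertools
import numpy as np

# Same illustrative data as in the previous sketch.
f = {'x1': 0.0, 'x2': 0.0, 'x3': 1.0}
g = {'y1': 5.0, 'y2': 7.0}
p = {('x1', 'y1'): 0.2, ('x1', 'y2'): 0.1,
     ('x2', 'y1'): 0.3, ('x2', 'y2'): 0.1,
     ('x3', 'y1'): 0.1, ('x3', 'y2'): 0.2}
points = list(p)                               # the set X x Y

# Density matrix of the partition pi x sigma on X x Y: entry sqrt(p(u) p(u'))
# when u and u' lie in the same block (same f-value and same g-value), else 0.
same_block = lambda u, v: f[u[0]] == f[v[0]] and g[u[1]] == g[v[1]]
rho = np.array([[np.sqrt(p[u] * p[v]) if same_block(u, v) else 0.0
                 for v in points] for u in points])

h_density = 1 - np.trace(rho @ rho)
h_ditset = sum(p[u] * p[v] for u, v in itertools.product(points, points)
               if not same_block(u, v))
assert abs(h_density - h_ditset) < 1e-12      # the two definitions agree
```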
The marginal distributions are: $p_X(x) = \sum_{y} p(x,y)$ and $p_Y(y) = \sum_{x} p(x,y)$. Since $\pi$ is a partition on X, there is also the usual logical entropy $h(\pi) = p_X \times p_X(\operatorname{dit}(\pi)) = 1 - \operatorname{tr}[\rho(\pi)^2] = h(\pi \times 0_Y)$, where $\operatorname{dit}(\pi) \subseteq X \times X$, and similarly for $p_Y$.
Since the context should be clear, we may henceforth adopt the old notation from the case where $\pi$ and $\sigma$ were partitions on the same set U, i.e., $h(\pi) = h(\pi \times 0_Y)$, $h(\sigma) = h(0_X \times \sigma)$, $h(\pi, \sigma) = h(\pi \times \sigma)$, etc.
Since the logical entropies are the values of a probability measure, all the usual identities hold where the underlying set is now $(X \times Y)^2$ instead of $U^2$, as illustrated in Figure A1.
Figure A1. Venn diagram for logical entropies as values of a probability measure $p \times p$ on $(X \times Y)^2$.
The previous treatment of $h(X)$, $h(Y)$, $h(X,Y)$, $h(X|Y)$, $h(Y|X)$ and $m(X,Y)$ in [3] was just the special case where $\pi = 1_X$ and $\sigma = 1_Y$ (the discrete partitions).

Appendix B. Quantum Logical Entropies with Non-Commuting Observables

As before in the case of commuting observables, the quantum case can be developed in close analogy with the preceding classical case. Given a finite-dimensional Hilbert space V and not-necessarily-commuting observables $F, G: V \to V$, let X be an orthonormal basis of V of F-eigenvectors, and let Y be an orthonormal basis of V of G-eigenvectors (so $|X| = |Y|$).
Let $f: X \to \mathbb{R}$ be the eigenvalue function for F with values $\{\phi_i\}_{i=1}^{I}$, and let $g: Y \to \mathbb{R}$ be the eigenvalue function for G with values $\{\gamma_j\}_{j=1}^{J}$.
Each eigenvalue function induces a partition on its domain:
$\pi = \{f^{-1}(\phi_i)\} = \{B_1, \ldots, B_I\}$ on X, and $\sigma = \{g^{-1}(\gamma_j)\} = \{C_1, \ldots, C_J\}$ on Y.
We associate with the ordered pair $(x,y)$ the basis element $x \otimes y$ in the basis $\{x \otimes y\}_{x \in X, y \in Y}$ for $V \otimes V$. Then, each pair of pairs $((x,y),(x',y'))$ is associated with the basis element $(x \otimes y) \otimes (x' \otimes y')$ in $V \otimes V \otimes V \otimes V = (V \otimes V)^2$.
Instead of ditsets or information sets, we now have qudit subspaces or information subspaces. For $R \subseteq (V \otimes V)^2$, let $[R]$ denote the subspace generated by R. We simplify the notation by writing $\operatorname{qudit}(\pi) = \operatorname{qudit}(\pi \times 0_Y) = [\{(x \otimes y) \otimes (x' \otimes y') : f(x) \neq f(x')\}]$, etc.
  • $\operatorname{qudit}(\pi) = [\{(x \otimes y) \otimes (x' \otimes y') : f(x) \neq f(x')\}]$;
  • $\operatorname{qudit}(\sigma) = [\{(x \otimes y) \otimes (x' \otimes y') : g(y) \neq g(y')\}]$;
  • $\operatorname{qudit}(\pi, \sigma) = \operatorname{qudit}(\pi) + \operatorname{qudit}(\sigma)$ (the subspace generated by their union); and so forth. It is again an easy implication of the aforementioned common-dits theorem that any two nonzero information spaces $\operatorname{qudit}(\pi)$ and $\operatorname{qudit}(\sigma)$ have a nonzero intersection, so the mutual information space $\operatorname{qudit}(\pi) \cap \operatorname{qudit}(\sigma)$ is not the zero space.
A normalized state $\psi$ on $V \otimes V$ defines a pure state density matrix $\rho(\psi) = |\psi\rangle\langle\psi|$. Let $\alpha_{x,y} = \langle x \otimes y \mid \psi\rangle$, so if $P_{x \otimes y}$ is the projection to the subspace (ray) generated by $x \otimes y$ in $V \otimes V$, then a probability distribution on $X \times Y$ is defined by:
$p(x,y) = \alpha_{x,y}^{*}\alpha_{x,y} = \operatorname{tr}[P_{x \otimes y}\,\rho(\psi)]$,
or more generally, for a subspace $T \subseteq V \otimes V$, a probability distribution is defined on the subspaces by:
$\Pr(T) = \operatorname{tr}[P_T\,\rho(\psi)]$.
Then, the product probability distribution $p \times p$ on the subspaces of $(V \otimes V)^2$ defines the quantum logical entropies when applied to the information subspaces:
  • $h(F:\psi) = p \times p\,(\operatorname{qudit}(\pi)) = \operatorname{tr}[P_{\operatorname{qudit}(\pi)}\,\rho(\psi) \otimes \rho(\psi)]$;
  • $h(G:\psi) = p \times p\,(\operatorname{qudit}(\sigma)) = \operatorname{tr}[P_{\operatorname{qudit}(\sigma)}\,\rho(\psi) \otimes \rho(\psi)]$;
  • $h(F,G:\psi) = p \times p\,(\operatorname{qudit}(\pi) + \operatorname{qudit}(\sigma)) = \operatorname{tr}[P_{\operatorname{qudit}(\pi)+\operatorname{qudit}(\sigma)}\,\rho(\psi) \otimes \rho(\psi)]$;
  • $h(F|G:\psi) = p \times p\,(\operatorname{qudit}(\pi) - \operatorname{qudit}(\sigma)) = \operatorname{tr}[P_{\operatorname{qudit}(\pi)-\operatorname{qudit}(\sigma)}\,\rho(\psi) \otimes \rho(\psi)]$;
  • $h(G|F:\psi) = p \times p\,(\operatorname{qudit}(\sigma) - \operatorname{qudit}(\pi)) = \operatorname{tr}[P_{\operatorname{qudit}(\sigma)-\operatorname{qudit}(\pi)}\,\rho(\psi) \otimes \rho(\psi)]$;
  • $m(F,G:\psi) = p \times p\,(\operatorname{qudit}(\pi) \cap \operatorname{qudit}(\sigma)) = \operatorname{tr}[P_{\operatorname{qudit}(\pi)\cap\operatorname{qudit}(\sigma)}\,\rho(\psi) \otimes \rho(\psi)]$.
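Here is a minimal computational sketch (the dimension, bases, eigenvalues, and state are arbitrary illustrative choices, not taken from the paper) that forms $p(x,y) = |\langle x \otimes y \mid \psi\rangle|^2$ in the joint eigenbasis and then evaluates the quantum logical entropies as two-draw probabilities over the qudit pairs:

```python
import itertools
import numpy as np

# Hypothetical two-dimensional example. F-eigenbasis X: the standard basis;
# G-eigenbasis Y: a rotated basis, so F and G do not commute.
X_basis = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
c, s = np.cos(np.pi / 5), np.sin(np.pi / 5)
Y_basis = [np.array([c, s]), np.array([-s, c])]
f_vals = {0: 1.0, 1: 2.0}                      # phi_i, indexed by x in X
g_vals = {0: 3.0, 1: 4.0}                      # gamma_j, indexed by y in Y

# A normalized state psi on V (x) V (arbitrary illustrative amplitudes).
psi = np.array([0.5, 0.5j, -0.5, 0.5])
psi = psi / np.linalg.norm(psi)

# p(x, y) = |<x (x) y | psi>|^2
pairs = list(itertools.product(range(2), range(2)))
p = {(x, y): abs(np.kron(X_basis[x], Y_basis[y]).conj() @ psi) ** 2
     for x, y in pairs}

def qudit_prob(condition):
    """Two-draw probability of pairs ((x,y),(x',y')) meeting the condition."""
    return sum(p[a] * p[b] for a, b in itertools.product(pairs, pairs)
               if condition(a, b))

h_F   = qudit_prob(lambda a, b: f_vals[a[0]] != f_vals[b[0]])
h_G   = qudit_prob(lambda a, b: g_vals[a[1]] != g_vals[b[1]])
h_FG  = qudit_prob(lambda a, b: f_vals[a[0]] != f_vals[b[0]] or g_vals[a[1]] != g_vals[b[1]])
h_FgG = qudit_prob(lambda a, b: f_vals[a[0]] != f_vals[b[0]] and g_vals[a[1]] == g_vals[b[1]])
h_GgF = qudit_prob(lambda a, b: g_vals[a[1]] != g_vals[b[1]] and f_vals[a[0]] == f_vals[b[0]])
m_FG  = qudit_prob(lambda a, b: f_vals[a[0]] != f_vals[b[0]] and g_vals[a[1]] != g_vals[b[1]])

# Additivity from the direct-sum decomposition (see Figure A2 below):
assert abs(h_FG - (h_FgG + m_FG + h_GgF)) < 1e-12
```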
The observable $F: V \to V$ defines an observable $F \otimes I: V \otimes V \to V \otimes V$ with the eigenvectors $x \otimes v$ for any nonzero $v \in V$ and with the same eigenvalues $\phi_1, \ldots, \phi_I$ (the context should suffice to distinguish the identity operator $I: V \to V$ from the index set I for the F-eigenvalues). Then, in two independent measurements of $\psi$ by the observable $F \otimes I$, we have:
$h(F:\psi)$ = probability of getting distinct eigenvalues $\phi_i$ and $\phi_{i'}$, i.e., of getting a qudit of F.
In a similar manner, $G: V \to V$ defines the observable $I \otimes G: V \otimes V \to V \otimes V$ with the eigenvectors $v \otimes y$ and with the same eigenvalues $\gamma_1, \ldots, \gamma_J$. Then, in two independent measurements of $\psi$ by the observable $I \otimes G$, we have:
$h(G:\psi)$ = probability of getting distinct eigenvalues $\gamma_j$ and $\gamma_{j'}$.
The two observables $F, G: V \to V$ define an observable $F \otimes G: V \otimes V \to V \otimes V$ with the eigenvectors $x \otimes y$ for $(x,y) \in X \times Y$ and eigenvalues $f(x)g(y) = \phi_i\gamma_j$. To interpret the compound logical entropies cleanly, we assume there is no accidental degeneracy, so there are no $\phi_i\gamma_j = \phi_{i'}\gamma_{j'}$ for $i \neq i'$ and $j \neq j'$. Then, for two independent measurements of $\psi$ by $F \otimes G$, the compound quantum logical entropies can be interpreted as the following “two-measurement” probabilities (a simulation sketch follows the list):
  • $h(F,G:\psi)$ = probability of getting distinct eigenvalues $\phi_i\gamma_j \neq \phi_{i'}\gamma_{j'}$ where $i \neq i'$ or $j \neq j'$;
  • $h(F|G:\psi)$ = probability of getting distinct eigenvalues $\phi_i\gamma_j \neq \phi_{i'}\gamma_{j'}$ where $i \neq i'$;
  • $h(G|F:\psi)$ = probability of getting distinct eigenvalues $\phi_i\gamma_j \neq \phi_{i'}\gamma_{j'}$ where $j \neq j'$;
  • $m(F,G:\psi)$ = probability of getting distinct eigenvalues $\phi_i\gamma_j \neq \phi_{i'}\gamma_{j'}$ where $i \neq i'$ and $j \neq j'$.
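As a simple consistency check of this “two-measurement” reading, the following sketch (with a made-up outcome distribution standing in for the $p(x,y)$ computed from a state, and hypothetical non-degenerate eigenvalue products) estimates $h(F,G:\psi)$ by sampling two independent $F \otimes G$ outcomes and comparing their eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative values of p(x, y) and the F (x) G eigenvalues phi_i * gamma_j
# (hypothetical numbers; in the sketch above they come from a state psi).
p = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.15, (1, 1): 0.35}
eig = {(0, 0): 1 * 3, (0, 1): 1 * 4, (1, 0): 2 * 3, (1, 1): 2 * 4}

pairs = list(p)
N = 200_000
i1 = rng.choice(len(pairs), size=N, p=list(p.values()))
i2 = rng.choice(len(pairs), size=N, p=list(p.values()))
# Fraction of double measurements giving distinct F (x) G eigenvalues:
estimate = np.mean([eig[pairs[a]] != eig[pairs[b]] for a, b in zip(i1, i2)])
exact = sum(p[u] * p[v] for u in pairs for v in pairs if eig[u] != eig[v])
print(estimate, exact)   # the Monte-Carlo estimate approaches h(F,G : psi)
```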
All the quantum logical entropies have been defined by the general method using the information subspaces, but in the first three cases, $h(F:\psi)$, $h(G:\psi)$ and $h(F,G:\psi)$, the density matrix method of defining logical entropies could also be used. Then, the fundamental theorem could be applied, relating the quantum logical entropies to the zeroed entries in the density matrices that indicate the eigenstates distinguished by the measurements.
The previous set identities for disjoint unions now become subspace identities for direct sums such as:
$\operatorname{qudit}(\pi) + \operatorname{qudit}(\sigma) = (\operatorname{qudit}(\pi) - \operatorname{qudit}(\sigma)) \oplus (\operatorname{qudit}(\pi) \cap \operatorname{qudit}(\sigma)) \oplus (\operatorname{qudit}(\sigma) - \operatorname{qudit}(\pi))$.
Hence, the probabilities are additive on those subspaces as shown in Figure A2:
$h(F,G:\psi) = h(F|G:\psi) + m(F,G:\psi) + h(G|F:\psi)$.
Figure A2. Venn diagram for quantum logical entropies as probabilities on $(V \otimes V)^2$.

References

  1. Gini, C. Variabilità e Mutabilità; Tipografia di Paolo Cuppini: Bologna, Italy, 1912. (In Italian) [Google Scholar]
  2. Ellerman, D. An Introduction to Partition Logic. Logic J. IGPL 2014, 22, 94–125. [Google Scholar] [CrossRef]
  3. Ellerman, D. Logical Information Theory: New Foundations for Information Theory. Logic J. IGPL 2017, 25, 806–835. [Google Scholar] [CrossRef]
  4. Markechová, D.; Mosapour, B.; Ebrahimzadeh, A. Logical Divergence, Logical Entropy, and Logical Mutual Information in Product MV-Algebras. Entropy 2018, 20, 129. [Google Scholar] [CrossRef]
  5. Ellerman, D. The Logic of Partitions: Introduction to the Dual of the Logic of Subsets. Rev. Symb. Logic 2010, 3, 287–350. [Google Scholar] [CrossRef]
  6. Rota, G.-C. Twelve Problems in Probability No One Likes to Bring up. In Algebraic Combinatorics and Computer Science; Crapo, H., Senato, D., Eds.; Springer: Milan, Italy, 2001; pp. 57–93. [Google Scholar]
  7. Kung, J.P.; Rota, G.C.; Yan, C.H. Combinatorics: The Rota Way; Cambridge University Press: New York, NY, USA, 2009. [Google Scholar]
  8. Kolmogorov, A. Combinatorial Foundations of Information Theory and the Calculus of Probabilities. Russ. Math. Surv. 1983, 38, 29–40. [Google Scholar] [CrossRef]
  9. Campbell, L.L. Entropy as a Measure. IEEE Trans. Inf. Theory 1965, 11, 112–114. [Google Scholar] [CrossRef]
  10. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423, 623–656. [Google Scholar] [CrossRef]
  11. Rozeboom, W.W. The Theory of Abstract Partials: An Introduction. Psychometrika 1968, 33, 133–167. [Google Scholar] [CrossRef] [PubMed]
  12. Abramson, N. Information Theory and Coding; McGraw-Hill: New York, NY, USA, 1963. [Google Scholar]
  13. Hartley, R.V. Transmission of information. Bell Syst. Tech. J. 1928, 7, 553–563. [Google Scholar] [CrossRef]
  14. Lawvere, F.W.; Schanuel, S.H. Conceptual Mathematics: A First Introduction to Categories; Cambridge University Press: New York, NY, USA, 1997. [Google Scholar]
  15. Auletta, G.; Fortunato, M.; Parisi, G. Quantum Mechanics; Cambridge University Press: Cambridge, UK, 2009. [Google Scholar]
  16. Fano, U. Description of States in Quantum Mechanics by Density Matrix and Operator Techniques. Rev. Mod. Phys. 1957, 29, 74–93. [Google Scholar] [CrossRef]
  17. Bennett, C.H. Quantum Information: Qubits and Quantum Error Correction. Int. J. Theor. Phys. 2003, 42, 153–176. [Google Scholar] [CrossRef]
  18. Ellerman, D. The Quantum Logic of Direct-Sum Decompositions: The Dual to the Quantum Logic of Subspaces. Logic J. IGPL 2018, 26, 1–13. [Google Scholar] [CrossRef]
  19. Ellerman, D. Counting Distinctions: On the Conceptual Foundations of Shannon’s Information Theory. Synthese 2009, 168, 119–149. [Google Scholar] [CrossRef]
  20. Jaeger, G. Quantum Information: An Overview; Springer: New York, NY, USA, 2007. [Google Scholar]
  21. Manfredi, G.; Feix, M.R. Entropy and Wigner Functions. Phys. Rev. E 2000, 62, 4665; arXiv:quant-ph/0203102. [Google Scholar] [CrossRef]
  22. Buscemi, F.; Bordone, P.; Bertoni, A. Linear Entropy as an Entanglement Measure in Two-Fermion Systems. Phys. Rev. A 2007, 75, 032301; arXiv:quant-ph/0611223v2. [Google Scholar] [CrossRef]
  23. Włodarz, J.J. Entropy and Wigner Distribution Functions Revisited. Int. J. Theor. Phys. 2003, 42, 1075–1084. [Google Scholar] [CrossRef]
  24. Havrda, J.; Charvát, F. Quantification Methods of Classification Processes: Concept of Structural Alpha-Entropy. Kybernetika 1967, 3, 30–35. [Google Scholar]
  25. Tsallis, C. Possible Generalization of Boltzmann-Gibbs Statistics. J. Stat. Phys. 1988, 52, 479–487. [Google Scholar] [CrossRef]
  26. Rao, C.R. Diversity and Dissimilarity Coefficients: A Unified Approach. Theor. Popul. Biol. 1982, 21, 24–43. [Google Scholar] [CrossRef]
  27. Cohen-Tannoudji, C.; Laloe, F.; Diu, B. Quantum Mechanics; John Wiley and Sons: New York, NY, USA, 2005; Volume 1. [Google Scholar]
  28. Tamir, B.; Cohen, E. Logical Entropy for Quantum States. arXiv, 2014; arXiv:1412.0616v2. [Google Scholar]
  29. Tamir, B.; Cohen, E. A Holevo-Type Bound for a Hilbert Schmidt Distance Measure. J. Quantum Inf. Sci. 2015, 5, 127–133. [Google Scholar] [CrossRef]
  30. Nielsen, M.; Chuang, I. Quantum Computation and Quantum Information; Cambridge University Press: Cambridge, UK, 2000. [Google Scholar]
  31. McEliece, R.J. The Theory of Information and Coding: A Mathematical Framework for Communication. In Encyclopedia of Mathematics and Its Applications; Addison-Wesley: Reading, MA, USA, 1977; Volume 3. [Google Scholar]
  32. Rossi, G. Partition Distances. arXiv, 2011; arXiv:1106.4579v1. [Google Scholar]
Figure 1. Distinctions and indistinctions of a partition.
Figure 2. Dual logics: the Boolean logic of subsets and the logic of partitions.
Figure 3. Venn diagram for logical entropies as values of a probability measure $p \times p$ on $U \times U$.
Figure 4. Summary of the dit-bit transform.
Figure 5. Venn diagram mnemonic for Shannon entropies.
Figure 6. The parallel development of classical and quantum logical information prior to probabilities.
Figure 7. The development of classical and quantum logical entropies for commuting F and G.
