*2.7. Related Methods*

The authors of [18] discussed a clustering method for data with measurement errors. They also assumed that each observation, $\mathbf{y}\_i$, is associated with a known covariance matrix, $\tilde{\boldsymbol{\Lambda}}\_i$, but they assumed that this covariance matrix is for the deviation *between the observation and the center of its cluster*. Their conceptual model, using our notation, assumes that

$$\mathbf{y}\_i \mid k \sim \mathcal{N}\_d(\boldsymbol{\mu}\_k, \tilde{\boldsymbol{\Lambda}}\_i) \tag{24}$$

when observation $i$ belongs to cluster $k$ (under their model, group membership is deterministic, not probabilistic). Comparing (24) with our MCLUST-ME model (6) and (7), we see that their model lacks the "model-based" element, the covariance matrix $\boldsymbol{\Sigma}\_k$, for each cluster $k$, $k = 1, \ldots, G$. In other words, their $\tilde{\boldsymbol{\Lambda}}\_i$ plays the role of our $\boldsymbol{\Sigma}\_k + \boldsymbol{\Lambda}\_i$. This is a crucial difference: in the MCLUST and MCLUST-ME models, the $\boldsymbol{\Sigma}\_k$'s are used to capture the different shapes, orientations, and scales of the clusters. Also, although it is reasonable to assume that the error covariances of the measurements ($\boldsymbol{\Lambda}\_i$ in MCLUST-ME) are known or can be estimated, it is much more difficult to know $\boldsymbol{\Sigma}\_k + \boldsymbol{\Lambda}\_i$ (i.e., $\tilde{\boldsymbol{\Lambda}}\_i$), as we do not know where the centers of the clusters are before running the clustering algorithm.
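To make the correspondence explicit, under the MCLUST-ME model (6) and (7) the marginal distribution of an observation from cluster $k$ is

$$\mathbf{y}\_i \mid k \sim \mathcal{N}\_d(\boldsymbol{\mu}\_k,\; \boldsymbol{\Sigma}\_k + \boldsymbol{\Lambda}\_i),$$

so (24) can be recovered only by collapsing the cluster covariance $\boldsymbol{\Sigma}\_k$ and the measurement-error covariance $\boldsymbol{\Lambda}\_i$ into the single known matrix $\tilde{\boldsymbol{\Lambda}}\_i$.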

The authors of that paper discussed two heuristic algorithms for fitting $G$ clusters to the observations: hError and kError. Under their model, they need to estimate the $\boldsymbol{\mu}\_k$'s for all the clusters and the deterministic (or hard) group membership of each observation. Both algorithms are distance-based, not based on an EM algorithm. The hError algorithm is a hierarchical clustering algorithm: it iteratively merges the two current clusters with the smallest distance, with the error covariances $\tilde{\boldsymbol{\Lambda}}\_i$ incorporated into the distance formula. For each current cluster $k$, let $S\_k$ be the set of observations it contains. The center of cluster $k$ is estimated by a weighted average of the observations:

$$\hat{\boldsymbol{\mu}}\_k = \Big(\sum\_{i \in S\_k} \tilde{\boldsymbol{\Lambda}}\_i^{-1}\Big)^{-1} \sum\_{i \in S\_k} \tilde{\boldsymbol{\Lambda}}\_i^{-1} \mathbf{y}\_i \tag{25}$$

with covariance matrix

$$\boldsymbol{\Psi}\_k = \mathrm{Var}(\hat{\boldsymbol{\mu}}\_k) = \Big(\sum\_{i \in S\_k} \tilde{\boldsymbol{\Lambda}}\_i^{-1}\Big)^{-1}. \tag{26}$$

The distance between any two clusters $k$ and $l$ is defined by

$$d\_{kl} = (\hat{\boldsymbol{\mu}}\_k - \hat{\boldsymbol{\mu}}\_l)^T (\boldsymbol{\Psi}\_k + \boldsymbol{\Psi}\_l)^{-1} (\hat{\boldsymbol{\mu}}\_k - \hat{\boldsymbol{\mu}}\_l). \tag{27}$$
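For concreteness, the following is a minimal sketch (in Python with NumPy; our own illustration, not the authors' implementation, and the function names are ours) of the quantities (25)-(27) that drive a single hError merge step.

```python
import numpy as np

def weighted_center(ys, Lams):
    """Precision-weighted cluster center (25) and its covariance (26).

    ys:   list of d-vectors in the cluster; Lams: their error covariances.
    """
    precs = [np.linalg.inv(L) for L in Lams]                 # the Lambda_i^{-1}
    Psi = np.linalg.inv(np.sum(precs, axis=0))               # covariance (26)
    mu = Psi @ np.sum([P @ y for P, y in zip(precs, ys)], axis=0)  # center (25)
    return mu, Psi

def cluster_distance(mu_k, Psi_k, mu_l, Psi_l):
    """Mahalanobis-type distance (27) between two estimated centers."""
    diff = mu_k - mu_l
    return float(diff @ np.linalg.inv(Psi_k + Psi_l) @ diff)
```

A full hError pass would evaluate `cluster_distance` over all current pairs, merge the closest pair, and recompute the merged center and covariance from the pooled observations via `weighted_center`.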

The kError algorithm is an extension of the $k$-means method. It iterates between two steps: (1) computing the centers of the clusters using (25); (2) assigning each point to the closest cluster based on the distance formula

$$d\_{ik} = (\mathbf{y}\_i - \hat{\boldsymbol{\mu}}\_k)^T \tilde{\boldsymbol{\Lambda}}\_i^{-1} (\mathbf{y}\_i - \hat{\boldsymbol{\mu}}\_k). \tag{28}$$

We implemented the simpler kError algorithm as described above and applied it to the real-data example. We summarized our findings in Section 3.3.
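A minimal, self-contained sketch of the kError iteration might look as follows (Python/NumPy; the initialization, empty-cluster reseeding, and stopping rule are our choices for illustration, not prescribed by [18]).

```python
import numpy as np

def kerror(ys, Lams, G, n_iter=100, seed=0):
    """kError sketch: alternate between the weighted centers (25)
    and nearest-center assignment under the distance (28)."""
    rng = np.random.default_rng(seed)
    n = len(ys)
    precs = [np.linalg.inv(L) for L in Lams]        # the Lambda_i^{-1}, reused throughout
    labels = rng.integers(G, size=n)                # random initial hard assignment
    for _ in range(n_iter):
        # Step 1: precision-weighted center (25) for each cluster
        mus = []
        for k in range(G):
            idx = np.flatnonzero(labels == k)
            if idx.size == 0:                       # empty cluster: reseed at a random point
                mus.append(np.asarray(ys[rng.integers(n)], dtype=float))
                continue
            P = np.sum([precs[i] for i in idx], axis=0)
            mus.append(np.linalg.inv(P) @ np.sum([precs[i] @ ys[i] for i in idx], axis=0))
        # Step 2: assign each point to the closest center under (28)
        new_labels = np.array([
            np.argmin([(y - mu) @ precs[i] @ (y - mu) for mu in mus])
            for i, y in enumerate(ys)
        ])
        if np.array_equal(new_labels, labels):      # assignments stabilized
            break
        labels = new_labels
    return labels, mus
```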

The authors of [19] proposed another extension of the $k$-means method that incorporates errors on individual observations. Under their model, each cluster is characterized by a "profile" $\boldsymbol{\alpha} = (\alpha\_1, \ldots, \alpha\_m)$, where $m$ is the dimension of the data. Each observation, $\mathbf{g}\_i = (g\_{i1}, \ldots, g\_{im})$, from this cluster is modeled as

$$g\_{ij} = \beta\_i \alpha\_j + \gamma\_i + \epsilon\_{ij}, \quad j = 1, \ldots, m, \tag{29}$$

where $\epsilon\_{ij} \sim N(0, \sigma\_{ij})$ with known error variances $\sigma\_{ij}$. The distance from an observation $\mathbf{g}\_i$ to a cluster with profile $\boldsymbol{\alpha}$ is defined as

$$\min\_{\beta\_i, \gamma\_i} \sum\_{j=1}^{m} \left[ \frac{g\_{ij} - (\beta\_i a\_j + \gamma\_i)}{\sigma\_{ij}} \right]^2,\tag{30}$$

essentially the weighted sum of squared errors from a weighted least-squares regression of $\mathbf{g}\_i$ on the profile $\boldsymbol{\alpha}$. The motivation for this distance measure is that it captures both the Euclidean distance and the correlation between an observation and a profile. Their version of the $k$-means algorithm, CORE, proceeds by iteratively estimating the profile $\boldsymbol{\alpha}$ for each cluster and then assigning each observation $\mathbf{g}\_i$ to the closest cluster according to (30). We note that their distance measure is less useful for low-dimensional data, as a regression line needs to be fitted between each observation and the cluster profile. If we force the slope $\beta\_i$ to be 0, then their method becomes similar to the kError method of [18].
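To illustrate, the minimization in (30) has a closed form as a two-parameter weighted least-squares fit. The following is a minimal sketch (Python/NumPy; our own naming, not code from [19]) of that distance computation.

```python
import numpy as np

def core_distance(g_i, alpha, sigma_i):
    """Distance (30): weighted residual sum of squares from regressing
    observation g_i on profile alpha, minimized over the slope beta_i
    and intercept gamma_i, with per-coordinate weights 1/sigma_ij."""
    g_i, alpha, sigma_i = map(np.asarray, (g_i, alpha, sigma_i))
    X = np.column_stack([alpha, np.ones_like(alpha)])   # design matrix [alpha_j, 1]
    w = 1.0 / sigma_i                                   # weights from the known errors
    # Solve the weighted least-squares problem as a rescaled ordinary one
    coef, *_ = np.linalg.lstsq(X * w[:, None], g_i * w, rcond=None)
    return float(np.sum(((g_i - X @ coef) * w) ** 2))
```

The CORE assignment step would then place each $\mathbf{g}\_i$ in the cluster whose profile minimizes `core_distance`; dropping the first column of the design matrix corresponds to forcing $\beta\_i = 0$, the connection to kError noted above.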
