**2. Notation**

Matrices are represented as *M*, and vectors as *v*. Vectors are column vectors unless otherwise noted. ⌊*x*⌋ denotes the largest integer less than or equal to *x*. *x*<sup>∗</sup> represents the conjugate of *x*, *M*<sup>*H*</sup> denotes the Hermitian (conjugate transpose) of *M*, and *v*<sup>*T*</sup> represents the transpose of the vector *v*. *E*[*x*] is the expectation operator applied to the random variable *x*. The index *i* ∈ {1, ... , *D*} denotes a mode among the *D* modes used in the fiber, and ∗ represents the convolution operator. The result of the convolution operator applied to a *D*<sub>1</sub> × *D*<sub>2</sub> matrix *a*(*t*) and a *D*<sub>2</sub> × *D*<sub>3</sub> matrix *b*(*t*) is a *D*<sub>1</sub> × *D*<sub>3</sub> matrix denoted as *c*(*t*) and given by:

$$
\underline{\underline{c}}(t) = \underline{\underline{a}}(t) * \underline{\underline{b}}(t) \tag{1}
$$

where each element of *c*(*t*), denoted as *c*<sub>*ij*</sub>(*t*), is obtained as in an ordinary matrix multiplication, but with the product replaced by the convolution operator:

$$c_{ij}(t) = \sum_{k=1}^{D_2} a_{ik}(t) * b_{kj}(t). \tag{2}$$
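To make the element-wise definition in (2) concrete, the following sketch evaluates the matrix convolution on sampled (discrete-time) matrices. The function name `matrix_convolve` and the array layout (time on the last axis) are assumptions made for illustration only; they are not part of the original notation.

```python
import numpy as np


def matrix_convolve(a, b):
    """Convolve a (D1, D2, Na) array with a (D2, D3, Nb) array along time.

    Entry (i, j) of the result is the sum over k of the 1-D convolutions
    a[i, k] * b[k, j], i.e. a matrix product with products replaced by
    convolutions.
    """
    d1, d2, na = a.shape
    d2_b, d3, nb = b.shape
    if d2 != d2_b:
        raise ValueError("inner dimensions must match")
    c = np.zeros((d1, d3, na + nb - 1), dtype=np.result_type(a, b))
    for i in range(d1):
        for j in range(d3):
            for k in range(d2):
                c[i, j] += np.convolve(a[i, k], b[k, j])  # full linear convolution
    return c
```

For single-tap matrices (one time sample each) the result reduces to an ordinary matrix product, which gives a convenient sanity check.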

Similarly, the result of the convolution operator applied to a *D*<sub>1</sub> × *D*<sub>2</sub> matrix *a*(*t*) and a time-dependent signal *y*(*t*) is a *D*<sub>1</sub> × *D*<sub>2</sub> matrix denoted as *d*(*t*), where *d*(*t*) = *a*(*t*) ∗ *y*(*t*). Each element of the matrix *d*(*t*) is obtained as in the multiplication of a matrix by a scalar, but with the product replaced by the convolution operator:

$$d_{ij}(t) = a_{ij}(t) * y(t). \tag{3}$$
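The matrix-by-signal case in (3) can be sketched in the same way for sampled data. The helper name `convolve_matrix_with_signal` and the time-last array layout are again illustrative assumptions.

```python
import numpy as np


def convolve_matrix_with_signal(a, y):
    """Convolve every entry of a (D1, D2, Na) array with the 1-D signal y."""
    d1, d2, na = a.shape
    d = np.zeros((d1, d2, na + len(y) - 1), dtype=np.result_type(a, y))
    for i in range(d1):
        for j in range(d2):
            d[i, j] = np.convolve(a[i, j], y)  # same signal applied to each entry
    return d
```

Convolving with a unit impulse returns the matrix unchanged, and a delayed impulse shifts every entry by the same number of samples.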

ℱ{*y*(*t*)} denotes the Fourier transform of the continuous-time signal *y*(*t*), and ℱ<sup>−1</sup>{*Y*(*ω*)} denotes the inverse Fourier transform of *Y*(*ω*). Similarly, for the discrete-time signal *y*[*n*], we denote its corresponding discrete Fourier transform as *Y*(Ω).

#### **3. Long-Haul Optical Link MIMO Channel Model**

In this section we describe the multi-section optical channel model used in this work. The effect of the channel noise is discussed separately in Section 4. The relationship between the input vector *x*(*ω*) = [*x*<sub>1</sub>(*ω*), *x*<sub>2</sub>(*ω*), ... , *x*<sub>*D*</sub>(*ω*)]<sup>*T*</sup> of complex electric field amplitudes of the *D* modes propagating along the fiber and the corresponding output vector *y*(*ω*) = [*y*<sub>1</sub>(*ω*), *y*<sub>2</sub>(*ω*), ... , *y*<sub>*D*</sub>(*ω*)]<sup>*T*</sup> can be modeled, after neglecting non-linear effects, as a multiple-input multiple-output linear system *H*<sub>tot</sub>(*ω*) [2]:

$$
\underline{\underline{y}}(\omega) = \underline{\underline{H}}_{\text{tot}}(\omega)\,\underline{\underline{x}}(\omega),
\tag{4}
$$

where *H*<sub>tot</sub>(*ω*) is a *D* × *D* matrix that models the signal propagation along the channel. For *D* = 2, the system is equivalent to classical PDM over an SMF, and *H*<sub>tot</sub>(*ω*) takes the form of the Jones matrix [8]. For *D* > 2, extensions of the Jones matrix have been proposed for the SDM model [19,20].

In the case of long-haul systems, *H*<sub>tot</sub>(*ω*) can be further modeled as a concatenation of *K*<sub>amp</sub> spans, each composed of an optical fiber and an optical amplifier [2,18,33,34]. Hence, the whole channel transfer function can be written as:

$$
\underline{\underline{H}}_{\text{tot}}(\omega) = H_{CD}(\omega) \cdot \underline{\underline{H}}(\omega),
\tag{5}
$$

where *H*<sub>CD</sub>(*ω*) = exp(−(*j*/2) *ω*<sup>2</sup> *β̄*<sub>2</sub> *ℓ*<sub>tot</sub>) is a single-input single-output (SISO) term that models the mode-averaged distortion due to CD, *β̄*<sub>2</sub> represents the mode-averaged CD per unit length, and *ℓ*<sub>tot</sub> denotes the total link length. The matrix *H*(*ω*) includes the inter-mode crosstalk, MDL and MD effects of the complete MIMO system. The matrix *H*(*ω*) in (5) can be written as a product over the *K*<sub>amp</sub> spans:

$$
\underline{\underline{H}}(\omega) = \prod_{k=1}^{K_{\text{amp}}} \underline{\underline{H}}^{(k)}(\omega),
\tag{6}
$$

where *H*<sup>(*k*)</sup>(*ω*) is the channel response of the *k*th span. We use *k* ∈ {1, ... , *K*<sub>amp</sub>} to index the spans in the optical channel. We can write out *H*<sup>(*k*)</sup>(*ω*) as [20,34]:

$$
\underline{\underline{H}}^{(k)}(\omega) = \underline{\underline{V}}^{(k)} \underline{\underline{\Lambda}}^{(k)}(\omega) (\underline{\underline{U}}^{(k)})^H,\tag{7}
$$

where the diagonal matrix Λ<sup>(*k*)</sup>(*ω*) for a given span *k* includes the MDL effects and the MD of each mode with respect to the mode-averaged value [33], and can be expressed as:

$$\underline{\underline{\Lambda}}^{(k)}(\omega) = \text{diag}\left(\left[e^{\left(\frac{1}{2}g_1^{(k)} - j\omega \tau_1^{(k)}\right)}, \dots, e^{\left(\frac{1}{2}g_D^{(k)} - j\omega \tau_D^{(k)}\right)}\right]\right),\tag{8}$$

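where *g*<sup>(*k*)</sup> = [*g*<sub>1</sub><sup>(*k*)</sup>, *g*<sub>2</sub><sup>(*k*)</sup>, ... , *g*<sub>*D*</sub><sup>(*k*)</sup>] are the uncoupled modal gains and *τ*<sup>(*k*)</sup> = [*τ*<sub>1</sub><sup>(*k*)</sup>, *τ*<sub>2</sub><sup>(*k*)</sup>, ... , *τ*<sub>*D*</sub><sup>(*k*)</sup>] are the uncoupled modal group delays. We assume that the uncoupled modal group-velocity dispersion is equal to zero for all *K*<sub>amp</sub> spans [33].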

The mode coupling of the *k*th span is modeled by the frequency-independent matrices *V*<sup>(*k*)</sup> and *U*<sup>(*k*)</sup>. Note that, under the assumption that all the modes propagating through the fiber experience the same attenuation, both matrices are unitary, i.e.,

$$
\underline{\underline{V}}^{(k)} \cdot (\underline{\underline{V}}^{(k)})^H = \underline{\underline{I}} = \underline{\underline{U}}^{(k)} \cdot (\underline{\underline{U}}^{(k)})^H. \tag{9}
$$
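As a minimal numerical sketch of the single-span model in (7)–(9), the code below builds one span response from two random unitary matrices and a diagonal gain/delay matrix. The helper names `random_unitary` and `span_response`, the QR-based way of drawing unitary matrices, and all numerical values are assumptions chosen for illustration; the text does not specify how the coupling matrices are generated.

```python
import numpy as np


def random_unitary(d, rng):
    """Draw a d x d unitary matrix via QR of a complex Gaussian matrix."""
    z = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    q, r = np.linalg.qr(z)
    diag = np.diag(r)
    # Scaling each column by a unit-modulus phase keeps q unitary.
    return q * (diag / np.abs(diag))


def span_response(omega, v, u, gains, delays):
    """H^(k)(omega) = V^(k) Lambda^(k)(omega) (U^(k))^H, as in (7) and (8)."""
    lam = np.diag(np.exp(0.5 * gains - 1j * omega * delays))
    return v @ lam @ u.conj().T


rng = np.random.default_rng(1)
D = 4                                           # number of modes (illustrative)
v = random_unitary(D, rng)
u = random_unitary(D, rng)
gains = np.array([0.2, -0.1, 0.0, -0.1])        # uncoupled modal gains (illustrative)
delays = np.array([1.0, -2.0, 0.5, 0.5]) * 1e-12  # uncoupled group delays in s
h_k = span_response(2 * np.pi * 1e9, v, u, gains, delays)

# Unitarity check corresponding to (9).
print(np.allclose(v @ v.conj().T, np.eye(D)), np.allclose(u @ u.conj().T, np.eye(D)))
```

Because *V*<sup>(*k*)</sup> and *U*<sup>(*k*)</sup> are unitary, the singular values of the span response are exp(*g*<sub>*i*</sub><sup>(*k*)</sup>/2), so the span is lossless in aggregate only when all gains are zero.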

Alternatively, *H*(*ω*) can also be written, by applying a singular value decomposition (SVD), as the product of two unitary matrices *U*<sup>(tot)</sup>(*ω*) and *V*<sup>(tot)</sup>(*ω*) and a diagonal matrix Λ<sup>(tot)</sup>(*ω*) [35]:

$$
\underline{\underline{H}}(\omega) = \underline{\underline{V}}^{(\text{tot})}(\omega)\, \underline{\underline{\Lambda}}^{(\text{tot})}(\omega) \left(\underline{\underline{U}}^{(\text{tot})}(\omega)\right)^H, \tag{10}
$$

where the diagonal matrix Λ<sup>(tot)</sup>(*ω*) is now given by:

$$\underline{\underline{\Lambda}}^{(\text{tot})}(\omega) = \text{diag}\left(\left[e^{\left(\frac{1}{2}g_1^{(\text{tot})} - j\omega \tau_1^{(\text{tot})}\right)}, \dots, e^{\left(\frac{1}{2}g_D^{(\text{tot})} - j\omega \tau_D^{(\text{tot})}\right)}\right]\right), \tag{11}$$

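where *g*<sup>(tot)</sup> = [*g*<sub>1</sub><sup>(tot)</sup>, *g*<sub>2</sub><sup>(tot)</sup>, ... , *g*<sub>*D*</sub><sup>(tot)</sup>] are the coupled modal gains of the overall channel and *τ*<sup>(tot)</sup> = [*τ*<sub>1</sub><sup>(tot)</sup>, *τ*<sub>2</sub><sup>(tot)</sup>, ... , *τ*<sub>*D*</sub><sup>(tot)</sup>] are the coupled modal group delays.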

Note that in (10) the unitary matrices *U*<sup>(tot)</sup>(*ω*) and *V*<sup>(tot)</sup>(*ω*) are, in general, frequency dependent, in contrast to *U*<sup>(*k*)</sup> and *V*<sup>(*k*)</sup> in (7), which are not [27,33].
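The multi-span construction in (5)–(6) and the SVD in (10) can be illustrated numerically as follows. This is only a sketch under stated assumptions: the CD term is taken as *H*<sub>CD</sub>(*ω*) = exp(−(*j*/2) *ω*<sup>2</sup> *β̄*<sub>2</sub> *ℓ*<sub>tot</sub>), the product in (6) is ordered so that span 1 acts on the signal first, the coupling matrices are drawn at random, and every numerical value and helper name (`random_unitary`, `total_response`, `coupled_gains`) is hypothetical. The coupled modal gains are read from the singular values *σ*<sub>*i*</sub> as *g*<sub>*i*</sub><sup>(tot)</sup> = 2 ln *σ*<sub>*i*</sub>, which follows from the magnitude of the diagonal entries in (11).

```python
import numpy as np


def random_unitary(d, rng):
    """Draw a d x d unitary matrix via QR of a complex Gaussian matrix."""
    z = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    q, r = np.linalg.qr(z)
    diag = np.diag(r)
    return q * (diag / np.abs(diag))


def total_response(omega, spans, beta2_avg=0.0, length=0.0):
    """Return H_tot(omega) = H_CD(omega) * prod_k H^(k)(omega).

    spans is a list of (V, U, gains, delays) tuples, one per span.
    """
    d = spans[0][0].shape[0]
    h = np.eye(d, dtype=complex)
    for v, u, gains, delays in spans:
        lam = np.diag(np.exp(0.5 * gains - 1j * omega * delays))
        h = (v @ lam @ u.conj().T) @ h  # assumed ordering: span 1 acts first
    h_cd = np.exp(-0.5j * omega**2 * beta2_avg * length)  # scalar CD term
    return h_cd * h


def coupled_gains(h):
    """Coupled modal gains 2*ln(sigma_i) from the singular values of h."""
    sigma = np.linalg.svd(h, compute_uv=False)
    return 2.0 * np.log(sigma)


rng = np.random.default_rng(0)
D, K_AMP = 4, 10                     # modes and spans (illustrative)
spans = []
for _ in range(K_AMP):
    gains = rng.normal(0.0, 0.1, D)      # uncoupled gains per span (illustrative)
    delays = rng.normal(0.0, 1e-12, D)   # uncoupled delays in s (illustrative)
    spans.append((random_unitary(D, rng), random_unitary(D, rng), gains, delays))

omega = 2 * np.pi * 5e9
h_tot = total_response(omega, spans, beta2_avg=-21.7e-27, length=1000e3)
g_tot = coupled_gains(h_tot)
print(g_tot)
```

Since the CD term is a unit-modulus scalar, it leaves the singular values, and hence the coupled modal gains, unchanged. Repeating the computation over a grid of frequencies would also show the frequency dependence of the singular vectors noted above.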

#### **4. SDM Communication System Model**

This section describes the model employed to represent the communication system established over the optical channel with multiple spans. The SVD of the channel in (10) can be useful for designing a transmitter based on a precoding matrix combined with a linear receiver, as used in wireless systems [36]. However, this approach becomes infeasible for long-haul optical communication systems, since the end-to-end channel side information needed to build the transmitter precoding matrix changes faster than the time needed for the system to collect, send, and process that information [33]. Therefore, we focus on an SDM system with no channel side information that uses a linear receiver to cope with the channel impairments, as shown in Figure 1 [37].

The binary data symbols, *s*[*n*] = [*s*<sub>1</sub>[*n*], *s*<sub>2</sub>[*n*], ... , *s*<sub>*D*</sub>[*n*]]<sup>*T*</sup>, are PAM modulated in parallel for each of the *i* ∈ {1, ... , *D*} optical modes using the same transmitter pulse *P*(*ω*) to obtain the PAM signals, denoted by the column vector *x*(*t*) = [*x*<sub>1</sub>(*t*), *x*<sub>2</sub>(*t*), ... , *x*<sub>*D*</sub>(*t*)]<sup>*T*</sup>. In Figure 1, *T* is the transmitted symbol period and the first block represents *D* parallel PAM modulators working at the symbol rate (and, hence, it includes the discrete-time to continuous-time conversion). The transmitted signal is distorted by ISI and crosstalk introduced by the MIMO channel, modeled with the *H*<sub>tot</sub>(*ω*) matrix described in Section 3. It has been shown that the noise in an MDL-impaired system is additive and spatially white [21]. Therefore, in this work we add before the receiver, as part of the channel, an additive white Gaussian noise (AWGN) vector *n*(*t*) = [*n*<sub>1</sub>(*t*), *n*<sub>2</sub>(*t*), ... , *n*<sub>*D*</sub>(*t*)]<sup>*T*</sup>, which, for each mode *i*, has a variance equal to *N*<sub>0</sub>/2.

The resulting continuous-time signal vector *y*(*t*) = [*y*<sub>1</sub>(*t*), *y*<sub>2</sub>(*t*), ... , *y*<sub>*D*</sub>(*t*)]<sup>*T*</sup> is processed by a MIMO receiver to obtain the estimate of the transmitted symbols *s*[*n*], denoted as *ŝ*[*n*] = [*ŝ*<sub>1</sub>[*n*], *ŝ*<sub>2</sub>[*n*], ... , *ŝ*<sub>*D*</sub>[*n*]]<sup>*T*</sup>. In this work we focus on linear MIMO receivers, so the estimation part of the receiver is depicted in Figure 1 as a generic linear filter with response *O*(*ω*), followed by a sampler working at the symbol rate. In the following, we propose linear MIMO receiver structures based on the MMSE criterion applied to the estimated symbol vector.
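The signal flow described above can be mimicked with a heavily simplified, symbol-rate toy simulation. The sketch below assumes real-valued 2-PAM symbols, a single frequency-flat orthogonal mixing matrix in place of *H*<sub>tot</sub>(*ω*) (so there is crosstalk but no ISI), and real Gaussian noise of variance *N*<sub>0</sub>/2 per mode. The receiver simply applies the transpose of the mixing matrix, which inverts an orthogonal channel; it is a placeholder for the generic filter *O*(*ω*) and is not the MMSE receiver proposed in this work. All numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
D, n_sym, n0 = 4, 10_000, 0.02   # modes, symbols per mode, noise level N0 (illustrative)

# Binary data mapped to 2-PAM amplitudes {-1, +1}, one row per mode.
s = 2.0 * rng.integers(0, 2, size=(D, n_sym)) - 1.0

# Frequency-flat orthogonal mixing matrix standing in for the MIMO channel.
h, _ = np.linalg.qr(rng.standard_normal((D, D)))

# Spatially white Gaussian noise with variance N0/2 on each mode.
noise = np.sqrt(n0 / 2) * rng.standard_normal((D, n_sym))
y = h @ s + noise

# Placeholder linear receiver: undo the orthogonal mixing, then decide by sign.
s_hat = np.sign(h.T @ y)
ber = np.mean(s_hat != s)
print(f"empirical noise variance: {noise.var():.4f}, bit error rate: {ber:.2e}")
```

With frequency-selective, MDL-impaired channels the simple inverse used here would no longer be adequate, which is the motivation for the MMSE-based receiver structures.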

**Figure 1.** Spatial division multiplexing (SDM) communication system model with linear multiple-input multiple-output (MIMO) receiver.
