NL-COMM: Enabling High-Performing Next-Generation Networks via Advanced Non-Linear Processing

Jayawardena, Chathura; Katsaros, George Ntavazlis; Nikitopoulos, Konstantinos

doi:10.3390/fi17100447

by Chathura Jayawardena, George Ntavazlis Katsaros and Konstantinos Nikitopoulos *

5G & 6G Innovation Centre, Institute for Communication Systems, University of Surrey, Guildford GU2 7XH, Surrey, UK

* Author to whom correspondence should be addressed.

Future Internet 2025, 17(10), 447; https://doi.org/10.3390/fi17100447

Submission received: 25 August 2025 / Revised: 21 September 2025 / Accepted: 26 September 2025 / Published: 30 September 2025

(This article belongs to the Special Issue Key Enabling Technologies for Beyond 5G Networks—2nd Edition)

Abstract

Future wireless networks are expected to deliver enhanced spectral efficiency while being energy efficient. MIMO and other non-orthogonal transmission schemes, such as non-orthogonal multiple access (NOMA), offer substantial theoretical spectral efficiency gains. However, these gains have yet to translate into practical deployments, largely due to limitations in current signal processing methods. Linear transceiver processing, though widely adopted, fails to fully exploit non-orthogonal transmissions, forcing massive MIMO systems to use a disproportionately large number of RF chains for relatively few streams, increasing power consumption. Non-linear processing can unlock the full potential of non-orthogonal schemes but is hindered by high computational complexity and integration challenges. Moreover, existing message-passing receivers for NOMA depend on specially designed sparse signals, limiting resource allocation flexibility and efficiency. This work presents NL-COMM, an efficient non-linear processing framework that translates the theoretical gains of non-orthogonal transmissions into practical benefits for both the uplink and downlink. NL-COMM delivers over 200% spectral efficiency gains, enables 50% reductions in antennas and RF chains (and thus base station power consumption), and increases concurrently supported users by 450%. In distributed MIMO deployments, the antenna reduction halves fronthaul bandwidth requirements, mitigating a key system bottleneck. Furthermore, NL-COMM offers the flexibility to unlock new NOMA schemes. Finally, we present both hardware and software architectures for NL-COMM that support massively parallel execution, demonstrating how advanced non-linear processing can be realized in practice to meet the demands of next-generation networks.

Keywords: MU-MIMO; NOMA; distributed MIMO; non-linear processing; Open RAN; massive connectivity; parallel processing; FPGA

1. Introduction

Future mobile and wireless local area networks are anticipated to deliver higher throughput than current networks and support massive connectivity, while maintaining energy efficiency [1,2]. The ever-increasing demand for throughput and connectivity, along with limited spectrum resources, highlights the importance of enhancing the spectral efficiency of future wireless communication systems. For example, a recent auction of C-band spectrum (3.7–3.98 GHz) garnered over USD 80 billion in bids, further emphasizing the value of spectrum resources [3].

In this direction, the need to improve spectral efficiency has led to a shift towards non-orthogonal signal transmissions, as exemplified by widely adopted MIMO (Multiple Input Multiple Output) systems and emerging NOMA (Non-Orthogonal Multiple Access) schemes [4,5,6,7]. These transmissions promise significant theoretical capacity and connectivity gains by allowing multiple mutually interfering information streams to share the available spectrum. However, these theoretical gains have not yet been fully translated into practical and tangible benefits, thus undermining the potential of non-orthogonal signal transmissions. While the theoretical gains of non-orthogonal transmissions are well established, the specific signal processing required to achieve these gains in practice remains an open question.

Demultiplexing non-orthogonal signal transmissions requires joint processing of multiple mutually interfering information streams. This introduces significant complexity, which can potentially scale exponentially with the number of streams [8,9,10,11,12,13,14]. To address this challenge, current MIMO systems often use linear approximations of the original transceiver processing problem (e.g., Zero Forcing (ZF), Minimum Mean Square Error (MMSE)) [15,16]. Linear approaches transform the mutually interfering MIMO channel into multiple single-user channels. This enables the direct use of traditional single-user processing techniques for detection and decoding. Furthermore, linear processing facilitates easy system integration due to the applicability of legacy radio resource management (RRM) methods, such as those used for adaptive Modulation and Coding Scheme (MCS) selection [17,18,19,20]. However, these benefits come at a substantial cost to throughput and connectivity capabilities.

The limitations of linear transceiver processing have resulted in power-inefficient massive MIMO base stations. Despite having large arrays of antennas and RF chains, these base stations can support only a limited number of data streams. Similarly, NOMA schemes utilize specifically designed sparse signals to reduce receiver processing complexity by employing the message passing algorithm [6,21]. Still, such restrictions on sparsity limit the number of users supported and the flexibility of spectrum resource sharing (Section 5.3).

In contrast, non-linear (NL) signal processing schemes based on optimal transceiver processing principles have the potential to overcome the limitations above [22]. However, the complexity requirements of such signal processing, which scales exponentially with the number of concurrently transmitted streams, have been a significant bottleneck that has prevented its practical adoption [9,10,14]. Additionally, functionalities such as channel and SNR adaptive MCS selection, which are necessary to predict the transmission rate that maximizes throughput and integrate NL processing into existing networks, are currently unavailable. This is because, in NL processing, the adaptive MCS selection becomes a joint minimization problem, where the error propagation effects between users must be considered, and legacy schemes do not apply.

In this direction, an advanced NL processing framework (NL-COMM) [23] has been recently introduced, which can translate the promising theoretical spectral efficiency gains of non-orthogonal transmissions into practical and tangible benefits. NL-COMM revisits the optimal receiver processing problem and introduces a novel probabilistic framework to identify the candidate solutions to this problem in a complexity-efficient manner [22,24,25]. Although the benefits of NL-COMM have been evaluated in limited use case scenarios in the uplink of MIMO systems, its applications to the general case of non-orthogonal transmissions and the downlink have not been discussed. In this work, we present the generalized NL-COMM framework and describe how it can be adopted both in the uplink and the downlink of communication systems that employ non-orthogonal transmissions. In addition, we discuss how NL-COMM can be applied to emerging use case scenarios, such as distributed MIMO systems. Finally, we discuss NL-COMM implementation strategies on both software and FPGA platforms, validating its feasibility in real-world systems with practical latency and power requirements. In particular, our contributions are as follows:

We introduce an efficient, generalized and massively parallelizable NL processing framework (NL-COMM) that applies in general to non-orthogonal signal transmissions both in the uplink and the downlink (Section 4). NL-COMM also includes the necessary radio resource management functionalities required to integrate advanced NL processing into existing networks.
We evaluate how NL-COMM can enable the promising gains of non-orthogonal transmissions in practice (Section 5). In particular, we consider how NL-COMM can apply to both traditional and distributed cell-free MIMO scenarios, the substantial throughput and connectivity gains it provides, and how these gains can also translate to system-level power savings.
We present both software- and hardware-based solutions for NL-COMM processing (Section 6), exploiting the intrinsic characteristics of each platform to achieve real-time performance, and demonstrating their effectiveness through over-the-air evaluations.

Indicative first results show that NL-COMM can enable gains of 200% or more in spectral efficiency, or enable substantial power consumption savings at the base-station side by requiring 50% fewer antennas (and RF chains). Furthermore, we discuss in Section 5 that the 50% reduction in the number of antennas can result in a similar decrease in the required fronthaul bandwidth in distributed MIMO scenarios, thus mitigating the fronthaul data-rate bottleneck in such scenarios. Additionally, we show that NL-COMM can efficiently share the wireless channel with as many users as the channel allows (Section 5.3). This allows NL-COMM to support four times more users than the number of base station antennas, which is double that achievable by NOMA schemes, all without altering existing standards and merely by modifying the signal processing at the base station. Then, for the first time, we present implementation strategies that demonstrate the feasibility of NL-COMM in both hardware and software environments, highlighting its potential for real-world deployment. Specifically, we describe and evaluate a software-based solution that is integrable into an Open RAN-compliant distributed unit (DU), as well as an FPGA-based prototype. The FPGA prototype achieves up to $26.7\times$ higher power efficiency than the optimized software solution, while still ensuring real-time performance. Finally, we report results from over-the-air experiments, which validate the simulated gains.

Related Work

In this subsection, we review related work on non-orthogonal signal transmissions, such as MIMO and NOMA, focusing on transceiver processing aspects. MIMO systems are a cornerstone in modern communication standards and have transformative potential for next-generation radio networks, enhancing spectral efficiency, connectivity, quality of service, and coverage. Variants such as Network MIMO [26], Cell-Free MIMO [5,27], and Coordinated Multipoint (CoMP) [28] highlight its versatility. At the same time, broader applications beyond communications are also being considered, including sensing and radar [29,30,31]. Although the theoretical capacity and connectivity gains of MIMO are well known, translating these gains into practical benefits remains an active research area. In this direction, the ability to demultiplex mutually interfering, and therefore non-orthogonal, information streams determines the extent to which theoretical capacity gains can be translated into practical improvements in throughput and connectivity. In this context, current MIMO deployments predominantly rely on linear detection and precoding techniques, owing to their favorable computational complexity and ease of integration [15,16,17,18,19]. However, as we also show in Section 5, linear processing can be highly suboptimal when the MIMO channel is not well-conditioned; for example, when the number of concurrently supported streams approaches the number of antennas at the base station. To enhance the performance beyond what linear processing can achieve, various NL processing schemes have been proposed that involve joint processing of the mutually interfering information streams. However, many of these methods incur substantially higher computational complexity, often several orders of magnitude greater than linear processing [9,10,12,13,14,32,33,34]. 
Additionally, reduced complexity approximations of joint processing usually depend on assumptions related to statistics of large numbers [35,36,37], which can undermine their performance benefits under realistic channel conditions. Moreover, all these NL processing schemes lack the appropriate rate adaptation functionalities required to be integrated into practical communication networks. Consequently, the advantages of NL processing have yet to be demonstrated over-the-air in real-world scenarios, in a manner that fully complies with standards while also meeting real-time processing requirements.

In another direction, to demultiplex the corresponding information streams, advanced code-domain NOMA schemes, such as LDS-OFDM and Sparse Code Multiple Access (SCMA) [38,39], employ sparse signal transmissions that enable efficient detection through the Message Passing Algorithm (MPA) [40]. Still, the computational complexity of the corresponding messages (per iteration) is determined by the number of mutually interfering streams and the modulation order. To reduce the computational complexity of MPA, distribution projection-based approximations, such as the Expectation Propagation Algorithm (EPA) [41], have been introduced. However, these approximations result in a performance loss in highly overloaded scenarios where the number of users exceeds the number of base station antennas. Additionally, existing codebook designs utilizing single-antenna receivers do not allow more than two users to share, on average, a single subcarrier regardless of the SNR. Furthermore, MPA and its low-complexity variants do not apply to non-sparse signal transmissions such as simple power-domain NOMA [42] or overloaded MIMO systems. Compressive sensing has also been introduced for the receiver processing of NOMA schemes [43]. Similar to MPA, compressive sensing assumes sparse signals, but in contrast, it can only provide hard detection estimates.

In another context, in power-domain NOMA scenarios, when the power difference between mutually interfering users is significant, Successive Interference Cancellation (SIC) could be employed for detection [42]. However, SIC becomes highly suboptimal when the number of mutually interfering users increases and the power difference decreases, due to error propagation.

Generalized Sphere Decoding (SD), which avoids an exhaustive search for maximum-likelihood (ML) detection, can also be utilized for the receiver processing of NOMA schemes [44,45]. However, the highly sequential nature of state-of-the-art approaches [45] prohibits practical applications, especially for a large number of mutually interfering streams. Furthermore, the latency requirements for obtaining the exact Max-Log maximum a posteriori probability (MAP) soft information using depth-first SDs [9] are random and become impractical even for full-rank high-dimensional systems. Similarly, the complexity of existing approximate fixed-latency SD schemes, such as the Soft Fixed Complexity SD (SFSD) [10,13] and the K-Best list SD [11,46], does not scale efficiently for large systems, and their processing complexity becomes impractical. This is because, in principle, such approaches do not account for the specific interference matrix realization but target the worst-case transmission condition. In the following sections, we will discuss the principles, details, and validations of the NL-COMM framework that can overcome the aforementioned limitations.

2. A Generic System Model for Non-Orthogonal Signal Transmissions

In this section, we introduce a generic system model to describe the uplink and downlink of systems employing non-orthogonal signal transmissions.

2.1. Uplink

The baseband-received signal for a non-orthogonal system can be given by

$$\mathbf{y} = \mathbf{H}\mathbf{s} + \mathbf{n}, \tag{1}$$

where $\mathbf{y}$ is the $M \times 1$ received vector, $\mathbf{s}$ is the $K \times 1$ transmitted symbol vector with elements belonging to a constellation $\mathcal{O}$ and $\mathbb{E}\{|s_{i}|^{2}\} = 1$, with $\mathbb{E}\{\cdot\}$ denoting the expected value, $\mathbf{n}$ is the $M \times 1$ additive white Gaussian noise vector with variance $\sigma^{2}$, and $\mathbf{H}$ is the interference matrix that differs per non-orthogonal system and is estimated at the receiver side.
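As a minimal sketch of Equation (1), the snippet below draws one realization of the uplink model using only the Python standard library. The i.i.d. complex Gaussian interference matrix, the unit-energy 4-QAM constellation, and the helper names (`qam4`, `cgauss`, `uplink`) are assumptions made for illustration and are not taken from the paper.

```python
import random

def qam4():
    """Unit-energy 4-QAM constellation (assumed for illustration)."""
    a = 2 ** -0.5
    return [complex(a, a), complex(a, -a), complex(-a, a), complex(-a, -a)]

def cgauss(rng, var):
    """Circularly symmetric complex Gaussian sample with the given variance."""
    std = (var / 2) ** 0.5
    return complex(rng.gauss(0, std), rng.gauss(0, std))

def uplink(M, K, sigma2, rng):
    """Draw one realization of y = H s + n (Equation (1))."""
    # Hypothetical i.i.d. Rayleigh-fading interference matrix, M rows by K columns
    H = [[cgauss(rng, 1.0) for _ in range(K)] for _ in range(M)]
    s = [rng.choice(qam4()) for _ in range(K)]
    n = [cgauss(rng, sigma2) for _ in range(M)]
    y = [sum(H[m][k] * s[k] for k in range(K)) + n[m] for m in range(M)]
    return H, s, n, y

rng = random.Random(0)
# K > M gives an overloaded (rank-deficient) system, as in the NOMA examples
H, s, n, y = uplink(M=4, K=6, sigma2=0.01, rng=rng)
```

The same structure covers MIMO and NOMA alike; only the way `H` is built changes.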

2.2. Downlink

In the downlink, when the channel $\mathbf{H}$ can be estimated and is therefore known at the transmitter side, a precoding or beamforming technique can be applied. Then, the precoded $M \times 1$ transmit vector can be denoted as $\mathbf{u}$ and the $K \times 1$ received signal can be expressed as

$$\mathbf{y} = \mathbf{H}^{T}\mathbf{u} + \mathbf{n}. \tag{2}$$

When linear precoding is employed (e.g., ZF, MMSE), $\mathbf{u}$ can be generated by multiplying $\mathbf{s}$ by an $M \times K$ linear precoding or beamforming weight matrix. In contrast, when NL precoding is employed (e.g., vector perturbation), as we discuss in Section 3.2, $\mathbf{u}$ is a result of NL perturbation operations.

2.3. Examples of Interference Matrices

Spatially multiplexed MIMO systems: In an uplink spatially multiplexed MIMO system with K single-antenna user equipments (UEs) and M base station antennas, the corresponding $M \times K$ MIMO channel matrix is $\mathbf{H}_{MIMO}$.
NOMA systems: In a code-domain NOMA system, an (orthogonal) subcarrier is loaded with the signals of multiple users, which are superimposed. Specifically, the $M \times 1$ received signal vector for an LDS-OFDM system where M orthogonal subcarriers are occupied by K users is given by

$$\mathbf{y} = \left(\begin{bmatrix} \mathbf{h}_{1} & \mathbf{h}_{2} & \dots & \mathbf{h}_{K} \end{bmatrix} \circ \begin{bmatrix} \mathbf{g}_{1} & \mathbf{g}_{2} & \dots & \mathbf{g}_{K} \end{bmatrix}\right)\mathbf{s} + \mathbf{n}, \tag{3}$$

where $\mathbf{h}_{l}$ is the frequency domain channel for user l, $l \in [1, K]$, $\circ$ denotes the Hadamard product representing element-wise multiplication, and $\mathbf{g}_{l}$ is the “sparse signature vector” for user l, which consists of complex entries that define how the signal is spread over subcarriers [7]. These sparse signature vectors can be selected from predefined codebooks, as discussed in [47]. Therefore, in relation to Equation (1), the interference matrix is $\mathbf{H}_{LDS-OFDM} = \begin{bmatrix} \mathbf{h}_{1} & \mathbf{h}_{2} & \dots & \mathbf{h}_{K} \end{bmatrix} \circ \begin{bmatrix} \mathbf{g}_{1} & \mathbf{g}_{2} & \dots & \mathbf{g}_{K} \end{bmatrix}$, and it is rank-deficient since $K > M$. For 4-point codebooks and rotation-based codebook design, as discussed in [47], the received observable for an SCMA scheme can also be described by Equation (3).

In power domain NOMA schemes [48] with only one base station antenna, $\mathbf{H}$ becomes a row vector since $M = 1$, and $K > M$ in general.
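The construction of the LDS-OFDM interference matrix in Equation (3) can be sketched as follows. The signature pattern (six users, four subcarriers, two occupied subcarriers per user), the real-valued 0/1 signature entries, and the channel values are hypothetical simplifications chosen for illustration; actual signature vectors have complex entries drawn from a codebook.

```python
def lds_interference_matrix(h_cols, g_cols):
    """Hadamard product of channel and signature columns.

    h_cols[l][m]: frequency-domain channel of user l on subcarrier m
    g_cols[l][m]: signature entry of user l on subcarrier m
    Returns H as a list of M rows, each with K entries.
    """
    K, M = len(h_cols), len(h_cols[0])
    return [[h_cols[l][m] * g_cols[l][m] for l in range(K)] for m in range(M)]

# Hypothetical sparse signatures: each of K = 6 users occupies 2 of M = 4 subcarriers
signatures = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]
# Arbitrary non-zero channel values, only to make the example concrete
channels = [[complex(1 + l, m) for m in range(4)] for l in range(6)]

H = lds_interference_matrix(channels, signatures)
# Number of users superimposed on each subcarrier (non-zero entries per row)
users_per_subcarrier = [sum(1 for x in row if x != 0) for row in H]
```

With this pattern, three users share every subcarrier, and the resulting 4-by-6 matrix has more columns than rows, which is the rank-deficient case the text refers to.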

3. Transceiver Processing Challenge for Non-Orthogonal Signal Transmissions

This section addresses the optimal transceiver processing problem with the goal of minimizing transmission bit error rates of non-orthogonal signals. Specifically, we examine the hard ML detection problem and the soft detection problem, which involves calculating the MAP Log Likelihood Ratios (LLRs) in the uplink. In the downlink, we explore the vector perturbation precoding [12] problem, which is based on the principles of dirty paper coding [49].

3.1. Uplink

NL processing involves joint processing of the non-orthogonal signals based directly on the channel interference matrix and received observables, therefore preserving the promised gains in throughput and connectivity. Due to their optimality in terms of achieving hard ML or soft detection performance of max-log optimal LLRs, we will focus on tree search-based NL receiver processing algorithms.

A triangular decomposition, such as QR, of the channel matrix $\mathbf{H}$ is needed to transform the ML detection problem or the max-log optimal soft detection problem into a tree search.

In particular, the channel matrix $\mathbf{H}$ can be decomposed into an $M \times K$ matrix $\mathbf{Q}$ with orthonormal columns and a $K \times K$ upper triangular matrix $\mathbf{R}$. The generalized QR decomposition of a Tikhonov regularized matrix $\bar{\mathbf{H}}$ can be defined as

$$\bar{\mathbf{H}} \triangleq \begin{bmatrix} \mathbf{H} \\ \alpha \mathbf{I}_{K} \end{bmatrix} = \bar{\mathbf{Q}}\mathbf{R} = \begin{bmatrix} \mathbf{Q} \\ \mathbf{Q}_{2} \end{bmatrix}\mathbf{R}. \tag{4}$$

This regularization makes the tree search applicable to rank-deficient $\mathbf{H}$ matrices, where $K > M$, as shown in Section 5, and becomes useful for receiver processing in distributed MIMO systems, especially for the decentralized case. It has been shown in [50] that selecting the regularization parameter as $\alpha = \sigma$ yields the best performance in the uplink. In the more general case where the transmitted symbols have non-unit average power, i.e., $\mathbb{E}\{|s_{i}|^{2}\} \neq 1$, the appropriate choice of the regularization parameter is given by $\alpha = \frac{\sigma}{\mathbb{E}\{|s_{i}|\}}$. Furthermore, through extensive simulations, we have verified that the choice $\alpha = \sigma$ indeed provides the best tradeoff between performance and complexity in the uplink for NL-COMM processing.
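To make the regularized decomposition of Equation (4) concrete, the sketch below stacks $\alpha \mathbf{I}_K$ under a small rank-deficient $\mathbf{H}$ (two rows, three columns) and factorizes the result. Modified Gram-Schmidt is used here only because it fits in a few lines of standard-library Python; it is an assumption of this example, not the decomposition method used by the authors, and the function name `regularized_qr` is hypothetical.

```python
def regularized_qr(H, alpha):
    """QR of the stacked matrix [H; alpha * I_K] via modified Gram-Schmidt.

    Returns (Q_bar, R): Q_bar is (M + K) x K with orthonormal columns,
    R is K x K upper triangular with a real positive diagonal.
    """
    M, K = len(H), len(H[0])
    # Columns of the stacked matrix H_bar
    cols = [[H[m][k] for m in range(M)] + [alpha if i == k else 0.0 for i in range(K)]
            for k in range(K)]
    Q = []                                  # orthonormal columns found so far
    R = [[0j] * K for _ in range(K)]
    for k in range(K):
        v = [complex(x) for x in cols[k]]
        for j in range(k):
            # Remove the component along each earlier orthonormal column
            R[j][k] = sum(q.conjugate() * x for q, x in zip(Q[j], v))
            v = [x - R[j][k] * q for x, q in zip(v, Q[j])]
        norm = sum(abs(x) ** 2 for x in v) ** 0.5
        R[k][k] = complex(norm)             # non-zero whenever alpha > 0
        Q.append([x / norm for x in v])
    Q_bar = [[Q[k][i] for k in range(K)] for i in range(M + K)]
    return Q_bar, R

# Overloaded example: M = 2 receive dimensions, K = 3 streams
H = [[1 + 1j, 0.5, -1j], [0.2, 1 - 0.5j, 0.3 + 0.3j]]
Q_bar, R = regularized_qr(H, alpha=0.1)
```

Because the appended rows make the stacked matrix full column rank for any $\alpha > 0$, every diagonal entry of `R` is non-zero even though `H` itself has rank at most two, which is what allows the tree search to proceed over all three levels.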

3.1.1. Non-Linear Joint Soft Information Processing

In systems that use soft channel decoding, like those found in the 3GPP and Wi-Fi standards, MAP detection is optimal for minimizing the detection bit error probability [9]. The MAP detection process requires the calculation of LLRs. In particular, the LLR for the b-th bit of user k, denoted as $c_{k,b}$, is defined as

$$L^{k,b} \triangleq \ln\left(\frac{P(c_{k,b} = 1 \mid \mathbf{y})}{P(c_{k,b} = 0 \mid \mathbf{y})}\right). \tag{5}$$

Then, since symbols are equiprobable and assuming a single detection and decoding iteration is performed, by employing the max-log approximation, Equation (5) can be approximated as

$$\begin{aligned} L^{k,b} &\approx \min_{\mathbf{s} \in \mathcal{O}_{k,b_{0}}^{K}} \left\{\frac{1}{\sigma^{2}} \|\mathbf{y} - \mathbf{H}\mathbf{s}\|^{2}\right\} - \min_{\mathbf{s} \in \mathcal{O}_{k,b_{1}}^{K}} \left\{\frac{1}{\sigma^{2}} \|\mathbf{y} - \mathbf{H}\mathbf{s}\|^{2}\right\} \\ &= \operatorname{sign}(\hat{c}_{k,b}) \left(\min_{\mathbf{s} \in \mathcal{O}_{\bar{c}_{k,b}}^{K}} \left\{\frac{1}{\sigma^{2}} \|\mathbf{y} - \mathbf{H}\mathbf{s}\|^{2}\right\} - \min_{\mathbf{s} \in \mathcal{O}^{K}} \left\{\frac{1}{\sigma^{2}} \|\mathbf{y} - \mathbf{H}\mathbf{s}\|^{2}\right\}\right), \end{aligned} \tag{6}$$

where

$$\min_{\mathbf{s} \in \mathcal{O}^{K}} \left\{\frac{1}{\sigma^{2}} \|\mathbf{y} - \mathbf{H}\mathbf{s}\|^{2}\right\} = \frac{1}{\sigma^{2}} \|\mathbf{y} - \mathbf{H}\hat{\mathbf{s}}_{ML}\|^{2} \tag{7}$$

is the distance metric corresponding to the ML solution, $\hat{c}_{k,b}$ is the b-th bit of user k in the ML solution’s (i.e., $\hat{\mathbf{s}}_{ML}$) bit mapping, and $\mathcal{O}_{\bar{c}_{k,b}}^{K}$ is a subset of possible symbol vectors with the b-th bit of user k’s bit mapping set to the inverse of $\hat{c}_{k,b}$.
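For intuition about what Equation (6) computes, the following sketch evaluates the max-log LLRs by exhaustive search over all $|\mathcal{O}|^{K}$ symbol vectors. This brute-force search is exactly the exponential-cost baseline that tree-search methods aim to avoid, so it is only practical for toy sizes. The 4-QAM bit labelling and the function names are assumptions of this example.

```python
from itertools import product

A = 2 ** -0.5
# Assumed 4-QAM bit labelling (bit pair -> symbol), for illustration only
CONST = {(0, 0): complex(A, A), (0, 1): complex(A, -A),
         (1, 0): complex(-A, A), (1, 1): complex(-A, -A)}

def metric(y, H, s, sigma2):
    """(1 / sigma^2) * ||y - H s||^2"""
    K = len(s)
    return sum(abs(y[m] - sum(H[m][k] * s[k] for k in range(K))) ** 2
               for m in range(len(y))) / sigma2

def max_log_llrs(y, H, sigma2):
    """Max-log LLRs per Equation (6), by enumerating every symbol vector."""
    K = len(H[0])
    best = {}  # (user k, bit b, bit value) -> smallest metric seen
    for combo in product(list(CONST), repeat=K):
        d = metric(y, H, [CONST[c] for c in combo], sigma2)
        for k in range(K):
            for b in range(2):
                key = (k, b, combo[k][b])
                if d < best.get(key, float("inf")):
                    best[key] = d
    # First line of Equation (6): min over bit = 0 minus min over bit = 1
    return [[best[(k, b, 0)] - best[(k, b, 1)] for b in range(2)] for k in range(K)]

# Noiseless two-user example with hypothetical channel values
H = [[1 + 0j, 0.5j], [0.3 + 0j, 1 + 0j]]
tx_bits = [(1, 0), (0, 1)]
s = [CONST[c] for c in tx_bits]
y = [sum(H[m][k] * s[k] for k in range(2)) for m in range(2)]
llrs = max_log_llrs(y, H, sigma2=0.1)
```

With the LLR defined as in Equation (5), a positive value favours bit 1 and a negative value favours bit 0, so in this noiseless case the signs of `llrs` reproduce `tx_bits`.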

The above LLR computation can be transformed into multiple tree search problems. In particular, substituting the QR decomposition in Equation (4) into the LLR computation in Equation (6) gives

$$\begin{aligned} L^{k,b} \approx{} & \operatorname{sign}(\hat{c}_{k,b}) \Bigg(\min_{\mathbf{s} \in \mathcal{O}_{\bar{c}_{k,b}}^{K}} \left\{\frac{1}{\sigma^{2}} \|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^{2} - \|\mathbf{s}\|^{2}\right\} \\ & - \min_{\mathbf{s} \in \mathcal{O}^{K}} \left\{\frac{1}{\sigma^{2}} \|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^{2} - \|\mathbf{s}\|^{2}\right\}\Bigg), \end{aligned} \tag{8}$$

where $\tilde{\mathbf{y}} = \mathbf{Q}^{H}\mathbf{y}$ is a $K \times 1$ vector. The substitution holds because multiplication by the orthogonal matrix $\mathbf{Q}^{H}$ does not change the vector norms.

The distance metric

$$d(\mathbf{s}) = \frac{1}{\sigma^{2}} \|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^{2} - \|\mathbf{s}\|^{2}, \tag{9}$$

starting from user/layer K, can be calculated recursively as

$$d(\mathbf{s}_{l}) = d(\mathbf{s}_{l+1}) + e(\mathbf{s}_{l}), \tag{10}$$

where $\mathbf{s}_{l} = [s_{l}, s_{l+1}, \dots, s_{K}]$, and $e(\mathbf{s}_{l})$ is the cost assigned to each branch,

$$e(\mathbf{s}_{l}) = \frac{1}{\sigma^{2}} \left|\tilde{y}_{l} - \sum_{k=l}^{K} R_{l,k} s_{k}\right|^{2} - |s_{l}|^{2}. \tag{11}$$
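The recursion in Equations (10) and (11) can be checked numerically against the direct form in Equation (9). The sketch below does this for one path of a three-level tree; the upper triangular matrix, the rotated observation vector, and the noise variance are arbitrary values assumed for the example, and levels are indexed from zero in the code.

```python
def branch_cost(l, s, y_t, R, sigma2):
    """e(s_l) from Equation (11); l is a 0-based level index."""
    K = len(s)
    resid = y_t[l] - sum(R[l][k] * s[k] for k in range(l, K))
    return abs(resid) ** 2 / sigma2 - abs(s[l]) ** 2

def path_metric(s, y_t, R, sigma2):
    """Accumulate d(s) from the last layer to the first, per Equation (10)."""
    d = 0.0
    for l in reversed(range(len(s))):
        d += branch_cost(l, s, y_t, R, sigma2)
    return d

def direct_metric(s, y_t, R, sigma2):
    """d(s) evaluated in one shot from Equation (9)."""
    K = len(s)
    norm2 = sum(abs(y_t[l] - sum(R[l][k] * s[k] for k in range(K))) ** 2
                for l in range(K))
    return norm2 / sigma2 - sum(abs(x) ** 2 for x in s)

A = 2 ** -0.5
R = [[2.0, 0.5 + 0.2j, -0.3j],      # hypothetical upper triangular matrix
     [0.0, 1.5, 0.4],
     [0.0, 0.0, 1.0]]
y_t = [1.0 + 0.5j, -0.8 + 1.1j, 0.6 - 0.7j]   # hypothetical rotated observation
s = [complex(A, A), complex(-A, A), complex(A, -A)]

recursive = path_metric(s, y_t, R, sigma2=0.1)
direct = direct_metric(s, y_t, R, sigma2=0.1)
```

The two values agree because each row of an upper triangular $\mathbf{R}$ only involves symbols from its own level downwards, which is what lets partial paths be scored before the full vector is known.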

3.2. Downlink

A precoding problem similar to the ML detection problem introduced in Equation (7) exists for an interfering channel matrix

H

when NL precoding is employed at the base station. This NL precoding problem is based on the theory of dirty paper coding [49]. In simple terms, dirty paper coding states that when an interfering signal is known perfectly at the transmitter, an NL precoding scheme can ideally be designed to mitigate the effect of the interference without an SNR loss, as if the interference did not exist. This is in contrast to linear precoding, where a substantial SNR loss is introduced that depends on the singular values of the channel.

Specifically, in the MU-MIMO downlink, the $M \times 1$ transmitted signal from the M-antenna base station, when NL precoding is employed, can be expressed as [12]

$$\mathbf{u} = \sqrt{\frac{P_{t}}{\gamma_{NL}}}\, \mathbf{H}^{*} \left(\bar{\mathbf{H}}^{T} \bar{\mathbf{H}}^{*}\right)^{-1} (\mathbf{s} + \tau \hat{\mathbf{l}}), \tag{12}$$

where $\mathbf{H}^{*}$ denotes the conjugate of matrix $\mathbf{H}$, $\bar{\mathbf{H}}$ is defined as in Equation (4) with $\alpha$ set as $K\sigma$ to provide the best performance complexity tradeoff in the downlink [12], $\gamma_{NL} = \mathbb{E}\left\{\left\|\mathbf{H}^{*} \left(\bar{\mathbf{H}}^{T} \bar{\mathbf{H}}^{*}\right)^{-1} (\mathbf{s} + \tau \hat{\mathbf{l}})\right\|^{2}\right\}$, $P_{t}$ is the transmit power, and $\tau$ is a constant that depends on the constellation [12] and can be computed for a QAM constellation as

$$\tau = \sqrt{|\mathcal{O}|}\, d_{QAM}, \tag{13}$$

where $d_{QAM}$ is the minimum distance between constellation points of the constellation $\mathcal{O}$. Then, $\hat{\mathbf{l}}$ is the integer perturbation vector that can be computed as

$$\hat{\mathbf{l}} = \arg\min_{\mathbf{l} \in \mathcal{L}^{K}} \left\|\mathbf{H}^{*} \left(\bar{\mathbf{H}}^{T} \bar{\mathbf{H}}^{*}\right)^{-1} (\mathbf{s} + \tau \mathbf{l})\right\|^{2}, \tag{14}$$

where $\mathcal{L}$ is the set of possible perturbations. The problem in Equation (14) is a K-dimensional integer-lattice least squares problem that is similar to the ML detection problem in Equation (7). Due to the vector perturbation operation involved, such NL precoding is referred to as vector perturbation precoding. Vector perturbation precoding minimizes $\gamma_{NL}$ in Equation (12), therefore minimizing the SNR loss of the transmitted signal. Similarly to the uplink, a triangular decomposition can be employed to transform the problem in Equation (14) into a tree search. In particular, by substituting from Equation (4),

$$\begin{aligned} \mathbf{H}^{*} \left(\bar{\mathbf{H}}^{T} \bar{\mathbf{H}}^{*}\right)^{-1} &= \mathbf{Q}^{*} \mathbf{R}^{*} \left(\mathbf{R}^{T} \bar{\mathbf{Q}}^{T} \bar{\mathbf{Q}}^{*} \mathbf{R}^{*}\right)^{-1} \\ &= \mathbf{Q}^{*} \mathbf{R}^{*} \left(\mathbf{R}^{T} \mathbf{R}^{*}\right)^{-1} \\ &= \mathbf{Q}^{*} \left(\mathbf{R}^{T}\right)^{-1} \\ &= \mathbf{Q}^{*} \bar{\mathbf{R}}, \end{aligned} \tag{15}$$

computing the pseudo-inverse can be avoided while also providing the triangular matrix $\bar{\mathbf{R}} = \left(\mathbf{R}^{T}\right)^{-1}$ necessary for performing a tree search. Note that the inversion of a triangular matrix is computationally less intensive than general matrix inversion. Then, the problem in Equation (14) can be simplified and translated into a tree search as

$$\hat{\mathbf{l}} = \arg\min_{\mathbf{l} \in \mathcal{L}^{K}} \left\|\bar{\mathbf{R}} (\mathbf{s} + \tau \mathbf{l})\right\|^{2}. \tag{16}$$

Then, the $K \times 1$ signal received by the K single-antenna UEs can be expressed as

$$\mathbf{y} = \mathbf{H}^{T}\mathbf{u} + \mathbf{n}. \tag{17}$$

Consequently, the received signal at the $k^{th}$ single-antenna UE is

$$y_{k} = \mathbf{h}_{k}^{T} \sqrt{\frac{P_{t}}{\gamma_{NL}}}\, \mathbf{Q}^{*} \bar{\mathbf{R}} (\mathbf{s} + \tau \hat{\mathbf{l}}) + n_{k}. \tag{18}$$

The effect of the perturbation can be removed by a modulo operation at the receiving UE side as

$$\hat{y}_{k} = y_{k} - \tau \left\lfloor \frac{y_{k} + \tau / 2}{\tau} \right\rfloor, \tag{19}$$

where $\lfloor a \rfloor$ is the largest integer less than or equal to a. Then, LLRs can be computed, as in the single-user case, based on $\hat{y}_{k}$.
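The role of the modulo operation can be illustrated with a single-user toy example. The sketch assumes the standard form of the modulo used with vector perturbation, in which the floor term is scaled by $\tau$ and the operation is applied separately to the real and imaginary parts; it also assumes that the receiver has already compensated the power scaling $\sqrt{P_t / \gamma_{NL}}$. The 16-QAM grid, the perturbation value, and the noise sample are hypothetical.

```python
import math

def tau_qam(order, d_qam):
    """Equation (13): tau = sqrt(|O|) * d_QAM."""
    return math.sqrt(order) * d_qam

def modulo(y, tau):
    """Fold the real and imaginary parts of y into [-tau/2, tau/2)."""
    def fold(v):
        return v - tau * math.floor((v + tau / 2) / tau)
    return complex(fold(y.real), fold(y.imag))

# 16-QAM on the grid {-3, -1, 1, 3} has minimum distance 2, so tau = 4 * 2 = 8
tau = tau_qam(order=16, d_qam=2.0)

s = complex(3, -1)                    # transmitted data symbol
l_hat = complex(-1, 2)                # hypothetical integer perturbation
noise = complex(0.05, -0.02)          # small additive noise sample
received = s + tau * l_hat + noise    # unit effective gain assumed

recovered = modulo(received, tau)     # perturbation removed; s + noise remains
```

The receiver never needs to know `l_hat`: any integer multiple of $\tau$ is removed by the fold, which is why the transmitter is free to choose the perturbation that minimizes transmit power.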

4. NL-COMM: An Efficient, Massively Parallelizable Non-Linear Processing Framework

4.1. Motivation

Existing linear physical layer processing leaves throughput and connectivity benefits unexploited, which can make it inefficient in current and next-generation communication systems. On the other hand, the complexity and/or latency requirements of traditional NL processing attempts are impractical, especially when the number of users increases [9,10]. At the same time, while the speed of traditional processors is plateauing, emerging microprocessor architectures may have hundreds or even thousands of logical cores, which favors processing schemes that can be executed in parallel.

4.2. NL-COMM PHY Processing

4.2.1. Principles

The NL-COMM physical layer (PHY) framework identifies the most likely vector solutions before any signal is received, based only on the channel and expressed in terms of relative distances to the received vector, and, for the first time, quantifies the probability of each being the solution to the NL transceiver processing problem. In contrast to traditional NL processing, NL-COMM PHY is complexity-efficient, offers a flexible performance/complexity trade-off, is massively parallelizable, is adjustable to the transmission conditions, and is transparent to the implementation technology (i.e., GPP, GPU, FPGA, ASIC).

4.2.2. Uplink

The NL-COMM framework exploits the triangular matrix $\mathbf{R}$ to identify the tree paths of the most promising solutions to the minimization problems introduced in Section 3.1.1 (Equations (7) and (8)), prior to detection processing. This principle enables NL-COMM to focus the available processing power on the most promising solutions, which can be processed in parallel. As a result, both complexity and latency are significantly reduced. To enable this, NL-COMM introduces a probabilistic likelihood metric which can be expressed specifically for the uplink as [24]

$$\mathcal{M}(\mathbf{b}) = \sum_{k=1}^{K} \beta_{k} [b_{k} - 1] |R_{k,k}|^{2}, \tag{20}$$

where $\mathbf{b}$ is a relative distance vector with integer elements $b_{k} \in [1, |\mathcal{O}|]$, $k \in [1, K]$, that indicate the order of the relative distance, and $\beta_{k}$ depends on the minimum distance between QAM symbols (i.e., $d_{QAM}$). For example, $\beta_{k} = 1.11$ when $d_{QAM} = 2$.

The upper part of Figure 1 illustrates an example tree diagram consisting of the most promising tree paths (MPPs) based on relative distances as identified by the NL-COMM PHY framework. We assume that four parallel processing units (PUs) are available for parallel detection, and 4-QAM-modulated symbols are transmitted. Then, the channel-based (e.g., $\mathbf{R}$ matrix) preprocessing stage of NL-COMM PHY processing identifies four relative distance vectors that minimize the probabilistic likelihood metric of Equation (20). For this, the low complexity preprocessing introduced in Refs. [24] and [51] can be employed. Let us assume that the identified four relative distance vectors are $\mathbf{b}^{(1)} = [1, 1, 1]^{T}$, $\mathbf{b}^{(2)} = [1, 2, 1]^{T}$, $\mathbf{b}^{(3)} = [1, 1, 2]^{T}$, and $\mathbf{b}^{(4)} = [1, 1, 3]^{T}$. To further elaborate, the second most promising tree path is $\mathbf{b}^{(2)} = [1, 2, 1]^{T}$, which represents the first closest symbol to the received signal in the first level, the second closest symbol to the received signal in the second level and the first closest symbol to the received signal in the third level of the search tree.
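The channel-only preprocessing described above can be sketched by ranking relative distance vectors with the metric of Equation (20). The version below simply enumerates every candidate vector and sorts, which is an assumption made for brevity; it is not the low complexity preprocessing of Refs. [24] and [51], and the diagonal values of $\mathbf{R}$ are hypothetical.

```python
from itertools import product

def likelihood_metric(b, r_diag, beta):
    """Equation (20): sum over k of beta_k * (b_k - 1) * |R_kk|^2."""
    return sum(beta * (bk - 1) * abs(r) ** 2 for bk, r in zip(b, r_diag))

def most_promising_paths(r_diag, order, n_paths, beta=1.11):
    """Rank all relative distance vectors by the metric (exhaustive, toy sizes only)."""
    candidates = product(range(1, order + 1), repeat=len(r_diag))
    ranked = sorted(candidates, key=lambda b: likelihood_metric(b, r_diag, beta))
    return ranked[:n_paths]

r_diag = [2.0, 1.0, 0.8]     # hypothetical |R_kk| values for a three-level tree
paths = most_promising_paths(r_diag, order=4, n_paths=4)
# paths == [(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 1, 3)]
```

Levels with a small $|R_{k,k}|$ are the least reliable, so the metric prefers to explore alternative symbols there first; note that no received signal is needed to produce this list, so it can be reused until the channel changes.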

Then, when the received signal vector is available, these relative distance vectors need to be demapped to constellation symbols to identify the symbol estimate $\hat{\mathbf{s}}$ to approximate hard ML detection (Equation (7)) or to compute LLRs (Equation (8)). This will be performed by the detection stage of NL-COMM PHY processing. The equivalent received signal point for level k, denoted as $\hat{y}_{k}$, can be expressed as

$$\hat{y}_{k} = \left(\tilde{y}_{k} - \sum_{l=k+1}^{K} R_{k,l}\, x_{l}\right) R_{k,k}^{-1}, \tag{21}$$

where $x_{l}$ denotes the symbol already selected at level l of the tree path.
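The detection stage for a single path can be sketched as follows: starting from the last level, the equivalent received point of Equation (21) is computed, the constellation is ordered by distance to it, and the entry of the relative distance vector selects which neighbour to keep. Sorting the constellation at run time is an assumption of this sketch made for readability; the function name `demap_path` and the numerical values are hypothetical.

```python
def demap_path(b, y_t, R, constellation):
    """Map a relative distance vector b to constellation symbols, last level first.

    b[k] = 1 keeps the point closest to the equivalent received point of
    Equation (21), b[k] = 2 the second closest, and so on.
    """
    K = len(b)
    x = [0j] * K
    for k in reversed(range(K)):
        # Cancel the contribution of the symbols already chosen on this path
        interference = sum(R[k][l] * x[l] for l in range(k + 1, K))
        y_hat = (y_t[k] - interference) / R[k][k]
        ranked = sorted(constellation, key=lambda c: abs(c - y_hat))
        x[k] = ranked[b[k] - 1]
    return x

A = 2 ** -0.5
QAM4 = [complex(A, A), complex(A, -A), complex(-A, A), complex(-A, -A)]
R = [[2.0, 0.5 + 0.2j, -0.3j],       # hypothetical upper triangular matrix
     [0.0, 1.5, 0.4],
     [0.0, 0.0, 1.0]]
s = [QAM4[0], QAM4[2], QAM4[1]]
# Noiseless rotated observation y_t = R s
y_t = [sum(R[k][l] * s[l] for l in range(3)) for k in range(3)]

best = demap_path([1, 1, 1], y_t, R, QAM4)     # recovers s in the noiseless case
alt = demap_path([1, 1, 2], y_t, R, QAM4)      # differs from s at the last level
```

Each path is independent of the others once the relative distance vectors are fixed, so one call per processing unit is enough to evaluate all paths in parallel.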

4.2.3. Downlink

Similarly to the uplink, the likelihood metric for the downlink can be expressed based on $\bar{\mathbf{R}}$ as $\mathcal{M}(\mathbf{b}) = \sum_{k=1}^{K} [b_{k} - 1] |\bar{R}_{k,k}|^{2}$, where $b_{k} \in [1, |\mathcal{L}|]$, $k \in [1, K]$ [52]. The equivalent transmitted signal point for level i, denoted as $\hat{s}_{i}$, can be expressed as $\hat{s}_{i} = \left(\tilde{s}_{i} - \sum_{k=i+1}^{K} \bar{R}_{i,k}\, \tau\, l_{k}\right) \left(\bar{R}_{i,i}\, \tau\right)^{-1}$, where $\tilde{\mathbf{s}} = \bar{\mathbf{R}}\mathbf{s}$. Then, the procedure illustrated in Figure 1, as described for the uplink, can be employed to identify the perturbation vector that approximates Equation (16).

The complexity of the NL-COMM PHY post-processing stage is

$$\underbrace{M K}_{\mathbf{Q}^{H}\mathbf{y}} + N_{MPP}\, K \left(1 + (K + 1) / 2\right) \tag{22}$$

complex multiplications, where $N_{MPP}$ denotes the number of MPPs [24]. The $K (1 + (K + 1) / 2)$ term represents the multiplications required by each MPP to compute $d(\mathbf{s})$ for Equation (8) and the equivalent received points in Equation (21). The demapping is assumed to be performed based on a predefined ordering by lookup table operations, which consider $\hat{y}_{k}$ as an input [24]. The QR decomposition is only required at the rate at which the channel changes and has a complexity of $6 M K^{2}$ multiplications.
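As a quick worked example of Equation (22), the counts can be evaluated for a specific configuration. The 8-by-8 system size is an arbitrary choice for illustration, and the 16 paths match the number of MPPs quoted in the evaluation; the function names are hypothetical.

```python
def detection_multiplications(M, K, n_mpp):
    """Equation (22): M*K for Q^H y plus the per-path cost for each MPP."""
    return M * K + n_mpp * K * (1 + (K + 1) / 2)

def qr_multiplications(M, K):
    """QR cost quoted in the text: 6 * M * K^2, paid once per channel update."""
    return 6 * M * K ** 2

# 8 antennas, 8 streams, 16 most promising paths
per_vector = detection_multiplications(M=8, K=8, n_mpp=16)   # 64 + 16 * 44 = 768.0
per_channel = qr_multiplications(M=8, K=8)                    # 6 * 8 * 64 = 3072
```

The per-vector cost grows linearly with the number of paths and quadratically with K, in contrast to the $|\mathcal{O}|^{K}$ growth of an exhaustive search.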

4.3. NL-COMM Radio Resource Management

Radio Resource Management functionalities, such as the adaptive rate selection required to select the transmission MCS according to channel conditions and SNR, are currently unavailable for any kind of NL processing. This is because, in contrast to linear processing, where legacy schemes are applicable as in single-user scenarios, adaptive MCS selection becomes a joint minimization problem for NL processing. In this direction, NL-COMM introduces an adaptive MCS selection scheme for NL processing that builds on the principles above by relying on the triangular structure of the $\mathbf{R}$ matrices; this scheme has been demonstrated in [53].

5. Evaluations of Potential Gains and Opportunities

In this section, we evaluate the potential benefits of the NL-COMM framework in terms of spectral efficiency, connectivity capabilities, and power efficiency, in both the uplink and downlink, as well as the opportunities it can unlock.

5.1. Enhancing Spectral Efficiency

In this section, we discuss the spectral efficiency gains of NL-COMM PHY processing in the uplink and the downlink, as introduced in Section 3.1.1, Section 3.2 and Section 4.2. As shown in Figure 2, NL-COMM can significantly improve the spectral efficiency in both directions. In particular, NL-COMM with 16 MPPs can provide throughput gains of more than 200% over existing systems that employ linear MMSE (LMMSE) processing.

5.2. Enhancing Energy Efficiency in MIMO Systems

Linear transceiver processing techniques typically require a well-conditioned MIMO channel to meet performance targets [8]. To overcome this limitation, current MIMO systems support far fewer streams than the number of base station antennas. This workaround has led to power-hungry MIMO base stations with a massive number of antennas (and RF chains) that can only support a relatively small number of streams. In contrast, NL-COMM processing needs fewer base-station antennas (and RF chains) to achieve the same sum throughput as linear processing.

In particular, in Figure 3a, we compare, across various schemes, the number of base station antennas and the associated power consumption required to support high-rate (64-QAM, 2/3 code rate) users at a Packet Error Rate (PER) of

10 %

in a 3GPP CDL-B channel [54] at a 25 dB SNR. As shown in Figure 3a, NL-COMM processing can support the same number of high-rate users with

43 %

and

33 %

fewer antennas than LMMSE and MMSE SIC detection, respectively. At the same time, a

56 %

reduction is expected compared to ZF detection. As shown in Figure 3b, the complexity of NL-COMM is only around

4 \times

that of linear processing.

5.3. Enabling Massive Connectivity

NL-COMM processing enables the allocation of available spectral efficiency to multiple users, even when the number of users exceeds the number of base station antennas, based on channel capabilities. As illustrated in Figure 4a, NL-COMM can support more than four UEs using 4-QAM modulation and 0.5 rate LDPC codes with a single-antenna base station in the uplink. The maximum number of UEs supported by existing NOMA schemes for a single-antenna base station is indicated by the dashed line in Figure 4a. Additionally, the number of UEs that can be supported by NL-COMM processing can be further increased when their rate requirements are lower (for example, using BPSK modulation) and as the number of base station antennas increases, as shown in Figure 4b.

Similar to the uplink, NL-COMM can support a large number of UEs beyond the number of base station antennas in the downlink. As shown in Figure 5, NL-COMM precoding can support twice as many UEs as base station antennas.

5.4. Boosting Distributed MIMO Performance

In this section, we explore the application of NL-COMM in both centralized and decentralized processing within distributed MIMO systems in the uplink. As a relevant and timely case study, we focus on distributed systems based on the Open RAN paradigm. We then assess the error rate performance, spectral efficiency, and achievable throughput for both processing types in these distributed systems.

5.4.1. Centralized Processing in Open RAN-Based Distributed MIMO Systems

Figure 6 illustrates a centralized processing architecture for a distributed MIMO system with

N_{B}

Radio Units (RUs), each with M antennas. Here, we have assumed the 3GPP split 7.2× architecture, where part of the physical layer processing, referred to as lower-PHY and consisting of the Fast Fourier Transform (FFT) and Analogue-to-Digital Conversion (ADC), is performed at the RU, while the upper-PHY, consisting of the rest of the physical layer operations, is performed at the DU. We have also assumed the less capable category A RUs, which, unlike category B RUs, cannot perform MIMO precoding. Note that split 7.2 minimizes the fronthaul bandwidth while also making the RU less complex. For example, assuming eight-bit accuracy for I and Q signals and 100 MHz bandwidth, each RU with four antennas should send raw information at a rate of 6.4 Gbps through the fronthaul. Performing the FFT at the RU side, as in split 7.2, together with data compression techniques, can reduce this rate requirement significantly. Fronthaul can be based on the enhanced Common Public Radio Interface (eCPRI) [56], which can provide aggregate data rates of up to 100 Gbps. We note that the fronthaul data rate depends on the number of antennas at the RU side. For example, the above raw data rate would become 126 Gbps if each RU was equipped with 64 antennas. At the DU, centralized MIMO detection is performed, which processes all the received observables from

M N_{B}

antennas. Centralized processing can exploit the maximum spatial multiplexing gain (i.e.,

M N_{B}

) of the distributed MIMO channel.
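The raw fronthaul figures quoted above can be reproduced with a short calculation. The sketch below assumes that the raw rate is the product of the sampling rate, the bits per complex sample (I plus Q), and the number of antennas. The sampling rates are assumptions, since the text states only the bandwidth: 100 Msps reproduces the 6.4 Gbps figure, whereas the 126 Gbps figure is consistent with a 122.88 Msps sampling rate (4096 × 30 kHz).

```python
def raw_fronthaul_gbps(sample_rate_msps, bits_per_component, antennas):
    """Raw I/Q rate in Gbps: samples/s * (I bits + Q bits) * antennas."""
    bits_per_sample = 2 * bits_per_component
    return sample_rate_msps * 1e6 * bits_per_sample * antennas / 1e9


four_ant = raw_fronthaul_gbps(100.0, 8, 4)          # about 6.4 Gbps
sixty_four_ant = raw_fronthaul_gbps(100.0, 8, 64)   # about 102.4 Gbps
sixty_four_nr = raw_fronthaul_gbps(122.88, 8, 64)   # about 125.8 Gbps
```

Under either sampling-rate assumption the rate scales linearly with the number of antennas, so halving the antennas per RU halves the raw fronthaul requirement.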

However, employing linear processing (e.g., LMMSE, ZF) for MIMO detection can significantly lower the number of supported streams, thus reducing the achievable spatial multiplexing gain. This is because linear processing can only deliver acceptable throughput performance in well-conditioned MIMO channels where the number of supported streams is much smaller than the number of receiver antennas [8]. Typically, linear processing can only support half the number of streams, therefore reducing the achievable spatial multiplexing gain to

\frac{M N_{B}}{2}

from

M N_{B}

. When referring to spatial multiplexing gain, we employ the definition of [57], which defines spatial multiplexing gain as the maximum achievable sum rate normalized by the SNR. At the same SNR, the spatial multiplexing gain of different transceiver processing schemes is proportional to the achievable sum rate.

For the evaluations, we consider a distributed MIMO system with three RUs in an edge-excited cell scenario, each equipped with colocated antennas [58]. The simulations employ the 5G close-in free space reference distance model for urban micro-cellular scenarios. Large-scale fading is modeled with a path loss exponent of 2.8 and with log-normal distributed shadowing with a standard deviation of 8.3. The UEs are placed at a random distance to the RUs, between 10 and 100 m. In Figure 7, we compare the PER of NL-COMM and LMMSE for RUs equipped with varying numbers of antennas. We consider 16-QAM modulation and 3/4-rate LDPC codes. NL-COMM, when using 32 MPPs, can achieve a better PER with only half the number of antennas compared to LMMSE. This reduction in the number of antennas can significantly lower both the power consumption, owing to the reduced number of RF chains, and the required fronthaul bandwidth.

5.4.2. Decentralized Processing for Distributed MIMO Systems

Figure 8 illustrates a decentralized processing architecture for a distributed MIMO system with

N_{B}

RUs, each with M antennas. Similar to Figure 6, we have assumed the 3GPP split 7.2× architecture, where part of the physical layer processing, referred to as lower-PHY and consisting of the FFT and ADC, is performed on the RU side. In contrast to the centralized case, part or all of the upper-PHY processing is performed at the local DU, and the rest is centralized together with the centralized unit (CU) processing. For example, in Figure 8, we assume that processing up to and including MIMO detection is performed locally. Then, the LLRs computed locally are combined centrally to provide an SNR gain by soft combining. However, decentralized processing cannot exploit the maximum spatial multiplexing gain of the distributed MIMO channel, and the maximum number of antennas at each RU limits the achievable spatial multiplexing gain. For example, since all RUs have M antennas each in Figure 8, the achievable spatial multiplexing gain is limited to M.
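The central soft-combining step can be sketched as a per-bit sum of the LLRs reported by each local DU, which is the standard combining rule when the local observations are treated as independent. The function names, the sign convention (a non-negative LLR maps to bit 0), and the LLR values are assumptions for illustration.

```python
def combine_llrs(llrs_per_ru):
    """Sum the LLRs of each bit across RUs (independent observations assumed)."""
    return [sum(bit_llrs) for bit_llrs in zip(*llrs_per_ru)]


def hard_decision(llrs):
    """Assumed convention: non-negative LLR -> bit 0, negative LLR -> bit 1."""
    return [0 if llr >= 0 else 1 for llr in llrs]


# Hypothetical LLRs for three coded bits, as computed locally at three RUs.
llrs_per_ru = [
    [1.2, -0.4, 0.3],
    [0.5, -1.0, -0.9],
    [2.0, 0.6, -0.1],
]
combined = combine_llrs(llrs_per_ru)
bits = hard_decision(combined)
```

In this toy example the third bit is decided differently by the first RU alone than by the combined LLRs, which illustrates how combining can correct a locally unreliable decision.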

Employing linear processing for MIMO detection can further halve the number of streams that can be supported. For the example considered in Figure 8, this reduces the achievable spatial multiplexing gain to

\frac{M}{2}

from M.
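The four cases discussed in this section can be summarized in a few lines of Python. The sketch simply encodes the rules stated in the text: centralized processing sees all antennas of all RUs, decentralized processing is limited to the M antennas of one RU, and linear processing is assumed to support half as many streams. The function name and the example values are hypothetical.

```python
def achievable_multiplexing_gain(M, n_rus, centralized, linear):
    """Spatial multiplexing gain under the rules stated in the text."""
    antennas = M * n_rus if centralized else M
    return antennas // 2 if linear else antennas


# Example: three RUs with four antennas each.
gains = {
    "centralized NL": achievable_multiplexing_gain(4, 3, True, False),
    "centralized linear": achievable_multiplexing_gain(4, 3, True, True),
    "decentralized NL": achievable_multiplexing_gain(4, 3, False, False),
    "decentralized linear": achievable_multiplexing_gain(4, 3, False, True),
}
```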

Figure 9 depicts the uplink spectral efficiency for a distributed MIMO system with centralized and decentralized processing. To take rate adaptation into account, the spectral efficiency shown is the maximum across several QAM orders (i.e., 4, 16, 64) and code rates (i.e., 1/2, 2/3, 3/4, 5/6), where 3GPP-compliant LDPC codes with a block length of 1944 are assumed. Three RUs, each with four colocated antennas, are assumed. We consider both high and low SNRs (the average per-user SNR is 19 dB for the high SNR scenario, while it is 10 dB for the low SNR scenario). We assume standard 3GPP DMRS reference signals for channel estimation, which limits K to 12. As illustrated in Figure 9, centralized NL processing can achieve up to twice the spectral efficiency of centralized linear processing. Similarly, decentralized NL processing can offer higher spectral efficiency than decentralized linear processing and can approach the spectral efficiency of centralized linear processing as K increases. When comparing centralized and decentralized NL processing at low and high SNRs, it can be seen that the SNR gain of the decentralized case due to soft combining is prominent at low SNRs, resulting in a smaller spectral efficiency loss compared to the centralized case. As verified in [59], when lower-order modulation schemes are employed at low SNRs, LLR combining can approach optimal performance.
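The rate-adaptation step, taking the maximum over QAM orders and code rates, can be sketched as follows. The sketch assumes that the spectral efficiency of one MCS is the number of users times the bits per symbol times the code rate, scaled by the packet success probability; this goodput-style definition and the PER values are assumptions for illustration, not results from the evaluations.

```python
import math


def best_mcs(per_table, n_users):
    """Return (spectral efficiency, QAM order, code rate) of the best MCS.

    per_table maps (qam_order, code_rate) to a packet error rate.
    """
    best = None
    for (qam_order, code_rate), per in per_table.items():
        se = n_users * math.log2(qam_order) * code_rate * (1.0 - per)
        if best is None or se > best[0]:
            best = (se, qam_order, code_rate)
    return best


# Hypothetical PER values for three MCS options at one SNR point.
per_table = {
    (4, 1 / 2): 0.00,
    (16, 3 / 4): 0.05,
    (64, 5 / 6): 0.90,
}
se, qam_order, code_rate = best_mcs(per_table, n_users=12)
```

With these hypothetical values the 16-QAM, rate-3/4 option is selected, since the highest-order option loses most of its nominal rate to packet errors.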

In Figure 10, we consider the empirical CDF of user throughput for the network configuration introduced previously. In particular, we assume three RUs with four antennas each, supporting 12 users. For comparison, we consider a traditional RAN-based scenario where each RU covers one (

120^{\circ}

) of three cell sectors, using 20 MHz bandwidth to support the four closest users. We also assume cell-free-based centralized processing, employing the whole 60 MHz bandwidth to support the 12 users. Such a use case is a simple example that leverages the concept of shared spectrum and hardware resources [60]. As shown by substantially improved average user throughput and uniformity in user throughput, NL-COMM can enable such use cases. The transmission rate that maximizes throughput is selected from the 3GPP MCS Table 1 for 5G NR [61] (Table 5.1.3.1-1), and the SNR is 23 dB.

6. Realizing NL-COMM: Software, Hardware, and System-Level Demonstrations

While the theoretical foundations of the NL-COMM framework were presented in previous sections, this section focuses on its implementability across diverse computing architectures. To this end, we describe a series of implementation efforts that not only validate the feasibility of NL-COMM in both hardware- and software-based environments, but also demonstrate its integration potential in real-world deployments. It is worth mentioning here that NL-COMM is not confined to a specific processing platform. This has enabled the development of both hardware-based realizations leveraging FPGA acceleration [62,63], and software-based implementations running on general-purpose CPUs integrated within disaggregated Open RAN DUs [64]. Each realization targets different operating points along the latency–throughput–power trade-off spectrum and provides critical insight into how NL-COMM can be tuned or scaled depending on deployment constraints.

This question of implementability is especially relevant in the context of Open RAN, where traditional monolithic base stations are disaggregated into modular components with open interfaces. This architecture opens the door for PHY-layer innovations like NL-COMM to be adopted more easily, provided they can demonstrate compatibility with real-time execution, integration within standardized software stacks, and deployment feasibility on both cloud-native and hardware-accelerated platforms. NL-COMM’s flexibility addresses these requirements directly by supporting both lightweight CPU-based deployments for rapid integration and FPGA-based acceleration for high-performance or DU power-constrained scenarios.

This section presents an overview of our key implementation milestones. These include (i) a software-based realization embedded within an Open RAN-compliant DU, (ii) an FPGA-based prototype developed to explore architectural scalability and acceleration potential, and (iii) our over-the-air (OTA) demonstration setup using commercial UEs, where real-time performance and quality-of-service can be evaluated under practical wireless conditions. These demonstrations serve not only as proof of concept but also as validation platforms for further system integration and industrial engagement. The results presented in the subsections that follow collectively highlight NL-COMM’s implementability, portability, and relevance for next-generation wireless systems.

6.1. Software-Based Design

Open RAN [65,66] promises a highly diversified RAN ecosystem through open disaggregation, interoperability, and the softwarization of digital processing. However, transitioning towards heavily software-based solutions, and particularly implementing the 5G-NR MIMO PHY, presents significant design and implementation challenges, because software-based systems are typically orders of magnitude less computationally efficient than dedicated hardware approaches [67]. Existing software-based PHY solutions either do not support MU-MIMO [68,69] or deviate substantially from the 5G-NR and Open RAN standards and principles [15,16,70].

To demonstrate the feasibility of deploying NL-COMM in software-defined Open RAN environments, we developed a software-based physical layer processing framework [64] that significantly extends the OpenAirInterface (OAI) platform. The NL-COMM software implementation achieves real-time MU-MIMO performance across a range of supported bandwidths and MIMO dimensions, as we discuss later, while maintaining full compliance with the 5G-NR standard. To unlock the performance potential of modern processors, we introduced a comprehensive set of Single Instruction Multiple Data (SIMD)-based optimizations across the PHY stack. This involved systematically restructuring key OAI PHY procedures to fully exploit 512-bit AVX-512 vector units. Specifically, we refactored data structures and rewrote computational routines to expose instruction-level parallelism in components such as the physical downlink shared channel (PDSCH), physical downlink control channel (PDCCH), and physical broadcast channel (PBCH) in the downlink, as well as the physical uplink shared channel (PUSCH) and physical random access channel (PRACH) in the uplink. These enhancements delivered significant single-thread speedups, improving computational throughput by an order of magnitude compared to non-vectorized implementations. Our design emphasizes memory alignment and contiguous data access to ensure efficient vector loads and stores, minimizing penalties from misaligned memory access or cache inefficiencies.

NL-COMM PHY also extends the limited multi-core support of OAI by adopting a more fine-grained parallelization strategy. This strategy distributes processing workloads to each individual core on a per-user level during channel estimation, per-resource block during MIMO detection, and per-code block during decoding. Within the detector function, we further extend parallelism by leveraging the unique characteristics of NL-COMM, enabling each decoding path to be processed independently (provided sufficient CPU cores are available), thereby improving scalability on high-core-count systems. The software architecture and its parallelization strategy are depicted in Figure 11. Additionally, we implemented CPU shielding to isolate PHY processing from the operating system and developed a custom busy-spinning thread scheduler. This scheduler ensures low-latency responsiveness, achieving over 90% reduction in thread wake-up time, even under real-time kernel conditions. While this method increases power consumption under high traffic, it is configurable and can be disabled during low-activity periods. Finally, our framework is non-uniform memory access (NUMA)-aware and supports thread pinning to optimize memory locality for compute-intensive operations like LDPC decoding, paving the way for scalable deployments across multi-socket systems.

Figure 12 shows the processing time of NL-COMM across different MIMO configurations and transmission bandwidths. Profiling was conducted on a workstation equipped with an 18-core Intel Core i9-7980XE CPU @ 2.60 GHz, running Ubuntu 20.04 with the 5.4.0-96-lowlatency kernel. The system operated at a 30 kHz subcarrier spacing, with 12 out of the 18 cores exclusively dedicated to real-time PHY processing. As shown, NL-COMM can support large MU-MIMO configurations, such as 16 × 12, at lower transmission bandwidths (10 MHz), using only 12 processing cores. At higher bandwidths, such as 40 MHz, smaller MIMO configurations (e.g., 16 × 4) remain fully supported within the same core budget. The real-time processing requirement is set to a stringent 500 µs, corresponding to the slot duration for a 30 kHz subcarrier spacing (SCS). In practice, the hybrid automatic repeat request (HARQ) process permits longer processing times before feedback must be transmitted, thereby relaxing the per-slot real-time constraint.
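The 500 µs deadline follows from the 5G NR numerology, in which the slot duration is 1 ms divided by 2^µ and the subcarrier spacing is 15 kHz times 2^µ. The sketch below encodes this relation; the function names and the example processing times are hypothetical.

```python
def slot_duration_us(scs_khz):
    """Slot duration in microseconds for a 5G NR subcarrier spacing."""
    mu = {15: 0, 30: 1, 60: 2, 120: 3}[scs_khz]
    return 1000.0 / (2 ** mu)


def meets_slot_deadline(processing_time_us, scs_khz):
    """True if the PHY processing finishes within one slot."""
    return processing_time_us <= slot_duration_us(scs_khz)


deadline = slot_duration_us(30)
ok = meets_slot_deadline(420.0, 30)        # hypothetical processing time
too_slow = meets_slot_deadline(650.0, 30)  # hypothetical processing time
```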

This implementation validates the practical integration and feasibility of NL MIMO processing within an open, modular, and software-based DU architecture. As detailed in Section 7, NL-COMM has been demonstrated operating in real-time over-the-air, confirming its readiness for deployment in next-generation wireless systems.

6.2. Hardware-Based Design

While software-based implementations of NL-COMM can scale to support higher transmission bandwidths by leveraging multi-core servers, this approach comes at the cost of significantly increased computational power consumption. To address this, we present a hardware-accelerated version of NL-COMM, implemented as a look-aside FPGA-based accelerator. As we later show in this section, the FPGA implementation achieves up to 26.7 times higher power efficiency compared to the optimized software solution, while maintaining real-time performance.

Our FPGA design supports MU-MIMO configurations of up to 8 × 8 with up to 64-QAM modulation order and evaluates the eight most promising detection paths per OFDM resource element. The accelerator offloads the entire detection chain; its top-level architecture is shown in Figure 13. The chain comprises (i) preprocessing, which implements a Sorted QR Decomposition (SQRD); (ii) candidate path selection (MPP block), based on the NL-COMM probabilistic likelihood metric (see Section 4.2); (iii) parallel post-processing detection; and (iv) LLR computation. The preprocessing stage handles batches of 12 subcarriers in parallel and feeds the post-processing unit. The post-processing stage is mapped onto a set of PUs, each operating in a pipelined fashion across subcarriers and MPPs. The number of instantiated PUs is configurable at design time (pre-synthesis), allowing the implementation to be tailored to different performance and power constraints. For instance, with 8 MPPs, a single PU is capable of handling one subcarrier every eight clock cycles. To enable integration with OpenAirInterface, the FPGA design is connected to the host system via PCIe Gen3 × 16 using AMD’s XDMA IP core.
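The throughput rule stated above (with 8 MPPs, one PU completes a subcarrier every eight clock cycles) can be turned into a rough cycle estimate. The sketch assumes that each PU spends one cycle per MPP per subcarrier, that subcarriers are split evenly across PUs, and that pipeline fill latency is ignored; these are simplifying assumptions, and the function name is hypothetical.

```python
import math


def post_processing_cycles(n_subcarriers, n_mpp, n_pus):
    """Approximate clock cycles for the post-processing stage."""
    subcarriers_per_pu = math.ceil(n_subcarriers / n_pus)
    return subcarriers_per_pu * n_mpp


# One resource block (12 subcarriers) with 8 MPPs.
cycles_by_pus = {p: post_processing_cycles(12, 8, p) for p in (1, 2, 4, 8)}
```

Under these assumptions, going from four to eight PUs reduces the estimate only from 24 to 16 cycles for a single 12-subcarrier batch, because the batch no longer divides evenly across the PUs.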

Figure 14 presents the post-implementation processing latency of the FPGA-based NL-COMM detector on the Xilinx VCU118 evaluation board featuring the XCVU9P FPGA. Results are shown for different PU configurations (i.e., 1, 2, 4, and 8 instantiated PUs) and include the full end-to-end latency, accounting for both processing and data movement over PCIe. The achievable speedups compared to the highly optimized AVX-512-based software implementation can reach up to 15.1×. Extending the accelerator to also cover the FEC is expected to significantly reduce the PCIe write-to-host overhead, as the current design must transfer 8 bits per LLR value back to the host machine.
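The size of the LLR payload returned over PCIe can be estimated from the 8 bits per LLR stated above. The sketch assumes one LLR per coded bit per stream and uses the largest supported configuration (8 streams, 64-QAM) as an example; the function name is hypothetical.

```python
def llr_bytes_per_re(n_streams, qam_order, bits_per_llr=8):
    """Bytes of LLR data produced per resource element."""
    bits_per_symbol = qam_order.bit_length() - 1  # log2 for power-of-two orders
    n_llrs = n_streams * bits_per_symbol
    return n_llrs * bits_per_llr / 8


per_re = llr_bytes_per_re(n_streams=8, qam_order=64)
per_rb_symbol = 12 * per_re  # one resource block, one OFDM symbol
```

For this configuration each resource element yields 48 LLRs, i.e., 48 bytes, or 576 bytes per resource block and OFDM symbol.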

Power consumption was estimated using the AMD Vivado Power Estimator under worst-case switching activity. Table 1 details the resource utilization in terms of LUTs, LUTRAM, FFs, URAM/BRAMs, and DSPs, and reports both static and dynamic power components. Finally, Figure 15 illustrates the energy efficiency of the system, quantified as the number of detected resource blocks (RBs) per joule. As shown, the FPGA-based implementation delivers over 26× higher energy efficiency compared to the software baseline, making it a compelling candidate for deployment in power-constrained environments.
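The energy-efficiency metric (detected RBs per joule) can be computed from the power draw and the processing time. The numbers below are hypothetical placeholders chosen only to show the calculation; they are not the measured values behind Figure 15 or Table 1.

```python
def rbs_per_joule(n_rbs, power_w, time_s):
    """Detected resource blocks per joule: n_rbs / (power * time)."""
    return n_rbs / (power_w * time_s)


# Hypothetical operating points for a software and an FPGA implementation.
software = rbs_per_joule(n_rbs=100, power_w=100.0, time_s=1.0e-3)
fpga = rbs_per_joule(n_rbs=100, power_w=10.0, time_s=0.5e-3)
efficiency_gain = fpga / software
```

The gain is the product of the power ratio and the latency ratio, so a design that is both faster and less power-hungry compounds the two improvements.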

7. Over-the-Air Demonstration Platform

To validate the practical feasibility of NL-COMM under realistic wireless channel conditions, we developed an over-the-air (OTA) testbed based on a fully functional 5G-NR Stand-Alone (SA) Open RAN system. This testbed has been utilized in prior demonstration efforts [53,71,72], showcasing the real-time capabilities of NL-COMM in multi-user MIMO (MU-MIMO) environments. The system supports both conventional linear (ZF, MMSE) and NL-COMM-based NL processing schemes, entirely in software, as described in Section 6.1.

The system setup, illustrated in Figure 16, consists of a 5G base station co-located with an 8-element commercial antenna array and implemented on a DELL R740 server. The radio frontend is realized using an Ettus USRP X440 software-defined radio (Austin, TX, USA). The base station connects to a full 5G Core Network, hosted on a separate DELL R740 server, ensuring end-to-end SA operation. UE nodes include multiple commercial 5G smartphones (e.g., Nokia XR20, Espoo, Finland), providing realistic uplink and downlink traffic profiles.

The OTA framework supports dynamic reconfiguration of PHY and scheduling parameters, such as the number of active antennas, MIMO stream count, and the selected detection algorithm (linear or NL-COMM). A dedicated control interface allows for automated orchestration of experiments and performance logging. This setup facilitates side-by-side comparisons of linear and NL MU-MIMO processing under real-time wireless conditions, making it suitable for evaluating the practical gains of NL-COMM in terms of reliability, scalability, and resource efficiency.

Figure 17 shows indicative OTA results obtained with the described setup from the location shown in Figure 16. As shown, NL-COMM can support the same average per-user spectral efficiency as linear approaches with half the number of base-station antennas. The results align with our simulation results presented in Section 5, confirming the framework’s deployability and practical benefits.

8. Conclusions and Future Directions

This work introduced the generalized NL-COMM framework, an efficient NL processing framework that applies to both the uplink and downlink of non-orthogonal signal transmission. NL-COMM has significant potential to translate the theoretical advantages of non-orthogonal signal transmissions into practical and tangible benefits. In particular, NL-COMM can either increase spectral efficiency by over 200% or deliver the same spectral efficiency while halving the number of required base station antennas and reducing the power consumption by hundreds of Watts compared to existing linear processing schemes. In distributed MIMO systems where the fronthaul bandwidth is a bottleneck, NL-COMM can nearly halve the fronthaul data rate requirements due to the reduction in the number of antennas. Additionally, NL-COMM can support

4 \times

as many users as base station antennas, exceeding the capabilities of existing NOMA schemes, and without requiring changes to the existing standards. Such capabilities can unlock new use cases and NOMA schemes in future networks. Furthermore, for the first time, we present a series of implementation efforts that validate the feasibility of NL-COMM in both hardware and software environments, demonstrating its integration potential in real-world deployments. In particular, we present a software-based solution that can be integrated into an Open RAN-compliant distributed unit (DU), along with an FPGA-based accelerator prototype. This prototype achieves up to 26.7 times greater power efficiency compared to the optimized software solution, while still maintaining real-time performance. Finally, we summarize the results from over-the-air experiments that validate the simulation results.

Future wireless networks will increasingly rely on non-orthogonal signal transmissions, creating new opportunities for advanced non-linear processing. In particular, non-linear processing can be applied to the decoding of channel codes [73], leading to efficient joint detection and decoding architectures. Additionally, non-linear processing methods, such as NL-COMM, can be adapted for scenarios where signals are non-orthogonal due to asynchronous users [74]. Furthermore, non-linear precoding techniques require further research and development efforts to be fully integrated into compliant systems. For instance, techniques like vector perturbation necessitate modifications to the UE side processing to accommodate modulo operations.

Author Contributions

Conceptualization, K.N.; methodology, C.J. and K.N.; software, C.J. and G.N.K.; data curation, C.J. and G.N.K.; writing—original draft, C.J. and G.N.K.; writing—review and editing, C.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research paper received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Giordani, M.; Polese, M.; Mezzavilla, M.; Rangan, S.; Zorzi, M. Toward 6G Networks: Use Cases and Technologies. IEEE Commun. Mag. 2020, 58, 55–61.
Deng, C.; Fang, X.; Han, X.; Wang, X.; Yan, L.; He, R.; Long, Y.; Guo, Y. IEEE 802.11be Wi-Fi 7: New Challenges and Opportunities. IEEE Commun. Surv. Tutor. 2020, 22, 2136–2166.
Reuters. 2025. Available online: https://www.reuters.com/business/media-telecom/net-proceeds-key-us-spectrum-auction-tops-80-billion-2021-01-15/ (accessed on 7 May 2025).
Goldsmith, A.; Jafar, S.; Jindal, N.; Vishwanath, S. Capacity limits of MIMO channels. IEEE J. Sel. Areas Commun. 2003, 21, 684–702.
Zheng, J.; Zhang, J.; Du, H.; Niyato, D.; Ai, B.; Debbah, M.; Letaief, K.B. Mobile Cell-Free Massive MIMO: Challenges, Solutions, and Future Directions. IEEE Wirel. Commun. 2024, 31, 140–147.
Chaturvedi, S.; Liu, Z.; Bohara, V.A.; Srivastava, A.; Xiao, P. A tutorial on decoding techniques of sparse code multiple access. IEEE Access 2022, 10, 58503–58524.
Hoshyar, R.; Razavi, R.; Al-Imari, M. LDS-OFDM an efficient multiple access technique. In Proceedings of the IEEE VTC, Taipei, Taiwan, 16–19 May 2010; pp. 1–5.
Nikitopoulos, K.; Zhou, J.; Congdon, B.; Jamieson, K. Geosphere: Consistently turning MIMO capacity into throughput. ACM SIGCOMM Comput. Commun. Rev. 2014, 44, 631–642.
Studer, C.; Burg, A.; Bolcskei, H. Soft-output sphere decoding: Algorithms and VLSI implementation. IEEE J. Sel. Areas Commun. 2008, 26, 290–300.
Barbero, L.G.; Ratnarajah, T.; Cowan, C. A Low-Complexity Soft-MIMO Detector Based on the Fixed-Complexity Sphere Decoder. In Proceedings of the IEEE ICASSP, Las Vegas, NV, USA, 31 March–4 April 2008; pp. 2669–2672.
Guo, Z.; Nilsson, P. Algorithm and implementation of the k-best sphere decoding for MIMO detection. IEEE J. Sel. Areas Commun. 2006, 24, 491–503.
Hochwald, B.; Peel, C.; Swindlehurst, A. A vector-perturbation technique for near-capacity multiantenna multiuser communication—Part II: Perturbation. IEEE Trans. Commun. 2005, 53, 537–544.
Dai, Y.-X.; Jhang, S.-J.; Chen, Y.-M.; Lan, S.-P.; Ueng, Y.-L. An efficient soft output MIMO detector architecture considering high-order modulations. In Proceedings of the IEEE ACSSC, Singapore, 23–27 May 2022; pp. 623–627.
Chen, Y.-M.; Dai, Y.-X.; Jhang, S.-J.; Ueng, Y.-L. An efficient soft-output fixed-complexity sphere decoder for large QAM constellations. IEEE Trans. Veh. Technol. 2025, 1–15.
Yang, Q.; Li, X.; Yao, H.; Fang, J.; Tan, K.; Hu, W.; Zhang, J.; Zhang, Y. BigStation: Enabling scalable real-time signal processing in large MU-MIMO systems. ACM SIGCOMM Comput. Commun. Rev. 2013, 43, 399–410.
Ding, J.; Doost-Mohammady, R.; Kalia, A.; Zhong, L. Agora: Real-time massive MIMO baseband processing in software. In Proceedings of the 16th International Conference on Emerging Networking EXperiments and Technologies, Barcelona, Spain, 1–4 December 2020; pp. 232–244.
Jensen, T.L.; Kant, S.; Wehinger, J.; Fleury, B.H. Fast Link Adaptation for MIMO OFDM. IEEE Trans. Veh. Technol. 2010, 59, 3766–3778.
Shen, W.-L.; Lin, K.C.-J.; Gollakota, S.; Chen, M.-S. Rate Adaptation for 802.11 Multiuser MIMO Networks. IEEE Trans. Mob. Comput. 2014, 13, 35–47.
Fan, J.; Yin, Q.; Li, G.Y.; Peng, B.; Zhu, X. MCS Selection for Throughput Improvement in Downlink LTE Systems. In Proceedings of the International Conference on Computer Communications and Networks (ICCCN), Maui, HI, USA, 31 July–4 August 2011; pp. 1–5.
Miuccio, L.; Panno, D.; Pisacane, P.; Riolo, S. A QoS-aware and channel-aware Radio Resource Management framework for multi-numerology systems. Comput. Commun. 2022, 191, 299–314.
Zhang, J.; Lu, L.; Sun, Y.; Chen, Y.; Liang, J.; Liu, J.; Yang, H.; Xing, S.; Wu, Y.; Ma, J.; et al. PoC of SCMA-Based Uplink Grant-Free Transmission in UCNC for 5G. IEEE J. Sel. Areas Commun. 2017, 35, 1353–1362.
Nikitopoulos, K. Massively Parallel, Nonlinear Processing for 6G: Potential Gains and Further Research Challenges. IEEE Commun. Mag. 2022, 60, 81–87.
NL-COMM. 2025. Available online: https://nl-comm.com (accessed on 21 May 2025).
Nikitopoulos, K.; Georgis, G.; Jayawardena, C.; Chatzipanagiotis, D.; Tafazolli, R. Massively parallel tree search for high-dimensional sphere decoders. IEEE Trans. Parallel Distrib. Syst. 2018, 30, 2309–2325.
Nikitopoulos, K.; Tafazolli, R. Parallel Processing of Sphere Decoders and Other Vector Finding Approaches Using Tree Search. Patent No. WO2016198845A1, 7 June 2016. Available online: https://patentscope.wipo.int/search/en/detail.jsf?docId=WO2016198845 (accessed on 21 March 2025).
Venkatesan, S.; Lozano, A.; Valenzuela, R. Network MIMO: Overcoming intercell interference in indoor wireless systems. In Proceedings of the Conference Record of The Forty-First Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, USA, 4–7 November 2007; pp. 83–87.
Björnson, E.; Sanguinetti, L. Scalable Cell-Free Massive MIMO Systems. IEEE Trans. Commun. 2020, 68, 4247–4261.
Marsch, P.; Fettweis, G. Uplink CoMP under a Constrained Backhaul and Imperfect Channel Knowledge. IEEE Trans. Wirel. Commun. 2011, 10, 1730–1742.
Xie, Q.; Wang, Z.; Wen, F.; He, J.; Truong, T.-K. Coarray Tensor Train Decomposition for Bistatic MIMO Radar With Uniform Planar Array. IEEE Trans. Antennas Propag. 2025, 73, 5310–5323.
Xie, Q.; Shi, J.; Wen, F.; Zheng, Z. Higher-order tensor decomposition for 2D-DOD and 2D-DOA estimation in bistatic MIMO radar. Signal Process. 2026, 238, 110196.
Wen, F.; Shi, J.; Gui, G.; Yuen, C.; Sari, H.; Adachi, F. Joint DOD and DOA Estimation for NLOS Target Using IRS-Aided Bistatic MIMO Radar. IEEE Trans. Veh. Technol. 2024, 73, 15798–15802.
Kosasih, A.; Onasis, V.; Miloslavskaya, V.; Hardjawana, W.; Andrean, V.; Vucetic, B. Graph neural network aided MU-MIMO detectors. IEEE J. Sel. Areas Commun. 2022, 40, 2540–2555.
Ducoing, J.C.D.L.; Jayawardena, C.; Nikitopoulos, K. An Assessment of Deep Learning Versus Massively Parallel, Non-Linear Methods for Highly-Efficient MIMO Detection. IEEE Access 2023, 11, 97493–97502.
Mohaisen, M.; Chang, K. Fixed-complexity sphere encoder for multi-user MIMO systems. J. Commun. Netw. 2011, 13, 63–69.
Narasimhan, T.L.; Chockalingam, A. Channel hardening-exploiting message passing (CHEMP) receiver in large-scale MIMO systems. IEEE J. Sel. Top. Signal Process. 2014, 8, 847–860.
Ma, Y.; Yamani, A.; Yi, N.; Tafazolli, R. Low-Complexity MU-MIMO Nonlinear Precoding Using Degree-2 Sparse Vector Perturbation. IEEE J. Sel. Areas Commun. 2016, 34, 497–509.
Rangan, S.; Schniter, P.; Fletcher, A.K. Vector approximate message passing. IEEE Trans. Inf. Theory 2019, 65, 6664–6684.
Yu, L.; Liu, Z.; Wen, M.; Cai, D.; Dang, S.; Wang, Y.; Xiao, P. Sparse code multiple access for 6G wireless communication networks: Recent advances and future directions. IEEE Commun. Stand. Mag. 2021, 5, 92–99.
Rebhi, M.; Hassan, K.; Raoof, K.; Chargé, P. Sparse code multiple access: Potentials and challenges. IEEE Open J. Commun. Soc. 2021, 2, 1205–1238.
Razavi, R.; L-Imari, M.A.; Imran, M.A.; Hoshyar, R.; Chen, D. On Receiver Design for Uplink Low Density Signature OFDM (LDS-OFDM). IEEE Trans. Commun. 2012, 60, 3499–3508. [Google Scholar] [CrossRef]
Meng, X.; Wu, Y.; Chen, Y.; Cheng, M. Low Complexity Receiver for Uplink SCMA System via Expectation Propagation. In Proceedings of the IEEE WCNC, San Francisco, CA, USA, 19–22 March 2017; pp. 1–5. [Google Scholar]
Özduran, V.; Mohammadi, M.; Nomikos, N.; Ansari, I.S.; Trakadas, P. On the performance of uplink power-domain noma with imperfect csi and sic in 6g networks. J. Commun. Netw. 2024, 26, 445–460. [Google Scholar] [CrossRef]
Wang, B.; Dai, L.; Yuan, Y.; Wang, Z. Compressive sensing based multi-user detection for uplink grant-free non-orthogonal multiple access. In Proceedings of the IEEE VTC, Boston, MA, USA, 6–9 September 2015; pp. 1–5. [Google Scholar]
Cui, T.; Tellambura, C. An efficient generalized sphere decoder for rank-deficient MIMO systems. In Proceedings of the IEEE VTC, Milan, Italy, 5 September 2004; Volume 5, pp. 3689–3693. [Google Scholar]
Liu, Z.; Yang, L.-L. Sparse or Dense: A Comparative Study of Code-Domain NOMA Systems. IEEE Trans. Wirel. Commun. 2021, 20, 4768–4780. [Google Scholar] [CrossRef]
Shabany, M.; Su, K.; Gulak, P. A pipelined scalable high-throughput implementation of a near-ML K-best complex lattice decoder. In Proceedings of the IEEE ICASSP, Las Vegas, NV, USA, 31 March–4 April 2008. [Google Scholar]
Bao, J.; Ma, Z.; Ding, Z.; Karagiannidis, G.K.; Zhu, Z. On the design of multiuser codebooks for uplink SCMA systems. IEEE Commun. Lett. 2016, 20, 1920–1923. [Google Scholar] [CrossRef]
Dai, L.; Wang, B.; Yuan, Y.; Han, S.; Chih-Lin, I.; Wang, Z. Non-orthogonal multiple access for 5G: Solutions, challenges, opportunities, and future research trends. IEEE Commun. Mag. 2015, 53, 74–81. [Google Scholar] [CrossRef]
Costa, M. Writing on dirty paper (Corresp.). IEEE Trans. Inf. Theory 1983, 29, 439–441. [Google Scholar] [CrossRef]
Wubben, D.; Bohnke, R.; Kuhn, V.; Kammeyer, K.D. MMSE extension of V-BLAST based on sorted QR decomposition. In Proceedings of the IEEE VTC, Orlando, FL, USA, 6–9 October 2003; Volume 1, pp. 508–512. [Google Scholar]
Jayawardena, C.; Nikitopoulos, K. G-MultiSphere: Generalizing Massively Parallel Detection for Non-Orthogonal Signal Transmissions. IEEE Trans. Commun. 2020, 68, 1227–1239. [Google Scholar] [CrossRef]
Husmann, C.; Nikitopoulos, K. Viper mimo: Increasing large mimo efficiency via practical vector-perturbation. In Proceedings of the IEEE GLOBECOM, Abu Dhabi, United Arab Emirates, 9–13 December 2018; pp. 1–6. [Google Scholar]
Jayawardena, C.; Filo, M.; Katsaros, G.N.; Nikitopoulos, K. Nl-comm: Demonstrating gains of non-linear processing in open-ran ecosystem. In Proceedings of the IEEE CAMAD, Athens, Greece, 21–23 October 2024; pp. 1–2. [Google Scholar]
3GPP. 5G; Study on Channel Model for Frequency Spectrum Above 6 GHz; 3rd Generation Partnership Project (3GPP); Technical Report (TR) 38.901; version 16.1.0; 3GPP: Sophia Antipolis, France, 2020. [Google Scholar]
Gong, Y.; Zhang, L.; Liu, R.; Yu, K.; Srivastava, G. Nonlinear MIMO for industrial internet of things in cyber-physical systems. IEEE Trans. Ind. Inform. 2021, 17, 5533–5541. [Google Scholar] [CrossRef]
eCPRI Specification V2.0. Available online: http://www.cpri.info (accessed on 7 March 2025).
Lozano, A.; Jindal, N. Transmit diversity vs. spatial multiplexing in modern MIMO systems. IEEE Trans. Wirel. Commun. 2010, 9, 186–197. [Google Scholar] [CrossRef]
Basnayaka, D.A.; Smith, P.J.; Martin, P.A. Performance analysis of macrodiversity MIMO systems with MMSE and ZF receivers in flat Rayleigh fading. IEEE Trans. Wirel. Commun. 2013, 12, 2240–2251. [Google Scholar] [CrossRef]
Jang, E.W.; Lee, J.; Lou, H.-L.; Cioffi, J.M. On the combining schemes for MIMO systems with hybrid ARQ. IEEE Trans. Wirel. Commun. 2009, 8, 836–842. [Google Scholar] [CrossRef]
Damnjanovic, A.; Knisley, D.; Saurabh, A.; Prakash, R.; Zhang, X.; Chen, S. Spectrum sharing with O-RAN architecture. In Proceedings of the IEEE DySPAN, Washington, DC, USA, 13–16 May 2024; pp. 108–113. [Google Scholar]
3GPP. 5G:NR; Physical Procedures for Data; 3rd Generation Partnership Project (3GPP); Technical Specification (TS) 38.214; version 16.2.0; 3GPP: Sophia Antipolis, France, 2020. [Google Scholar]
Husmann, C.; Georgis, G.; Nikitopoulos, K.; Jamieson, K. Flexcore: Massively parallel and flexible processing for large MIMO access points. In Proceedings of the USENIX NSDI, Boston, MA, USA, 27–29 March 2017; pp. 197–211. [Google Scholar]
Katsaros, G.N.; Nikitopoulos, K. Power efficient and ultra dense open-ran vehicular networks with non-linear processing. IEEE Access 2024, 12, 38150–38162. [Google Scholar] [CrossRef]
Katsaros, G.N.; Filo, M.; Tafazolli, R.; Nikitopoulos, K. MIMO-SoftiPHY: A Software-Based PHY Design and Implementation Framework for Highly-Efficient Open-RAN MIMO Radios. IEEE Trans. Mob. Comput. 2024, 23, 12491–12504. [Google Scholar] [CrossRef]
Yang, M.; Li, Y.; Jin, D.; Su, L.; Ma, S.; Zeng, L. OpenRAN: A software-defined ran architecture via virtualization. In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM, Hong Kong, China, 12–16 August 2013; pp. 549–550. [Google Scholar] [CrossRef]
Ofcom: Open RAN and the Link Between Competition and Innovation. January 2022. Available online: https://www.ofcom.org.uk/research-and-data/economics-discussion-papers/open-ran-competition-innovation (accessed on 15 March 2025).
Horowitz, M. 1.1 Computing’s energy problem (and what we can do about it). In Proceedings of the IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), San Francisco, CA, USA, 9–13 February 2014; pp. 10–14. [Google Scholar]
srsRAN. Available online: https://www.srslte.com/ (accessed on 15 March 2025).
OpenAirInterface. Available online: https://gitlab.eurecom.fr/oai/openairinterface5g (accessed on 21 May 2025).
Gong, J.; Kalia, A.; Yu, M. Scalable distributed massive {MIMO} baseband processing. In Proceedings of the USENIX NSDI, Boston, MA, USA, 17–19 April 2023; pp. 405–417. [Google Scholar]
Nikitopoulos, K.; Filo, M.; Katsaros, G.N.; Jayawardena, C.; Tafazolli, R. MU-MIMO, Open-RAN PHY with Linear and Massively Parallelizable Non-Linear Processing. In Proceedings of the ACM MobiCom, Madrid, Spain, 2–6 October 2023. [Google Scholar] [CrossRef]
Filo, M.; Katsaros, G.N.; Jayawardena, C.; Nikitopoulos, K. Nl-comm: Enhanced video streaming via advanced non-linear processing. In Proceedings of the IEEE WCNC, Milan, Italy, 24–27 March 2025; pp. 1–3. [Google Scholar]
Husmann, C.; Nikolaou, P.C.; Nikitopoulos, K. Reduced Latency ML Polar Decoding via Multiple Sphere-Decoding Tree Searches. IEEE Trans. Veh. Technol. 2018, 67, 1835–1839. [Google Scholar] [CrossRef]
Jayawardena, C.; Nikitopoulos, K. Joint Frequency Offset Compensation and Detection for Multi-User MIMO-OFDM Systems with Frequency Asynchronous User Access. In Proceedings of the IEEE ICC, Rome, Italy, 28 May–1 June 2023; pp. 529–534. [Google Scholar]

Figure 1. NL-COMM PHY processing: the most promising tree paths based on relative distances (upper diagram), a detection stage employing four parallel processing units (lower-left), and demapping based on relative distances from the equivalent received signal point (X) to the constellation symbols (lower-right).

Figure 2. Simulated spectral efficiency (SE) comparison under the 3GPP CDL-B channel model for a 16-antenna base station supporting 16 UEs. The modulation and code rate are selected to maximize throughput for each method.

Figure 3. The number of base station antennas, the corresponding power consumption (a), and computational complexity (b) required to support high-rate (64-QAM, 2/3 code rate) users at a PER of 10% in the uplink. A 3GPP CDL-B channel model is assumed. It is estimated that each RF chain consumes 15.4 W, according to [55].

Figure 4. The SNR required to support different numbers of low-rate users while achieving a PER < 10% in a CDL-B channel, for single-antenna (a) and four-antenna (b) base stations employing NL-COMM processing in the uplink.

Figure 5. Spectral efficiency comparison for an increasing number of low-rate UEs supported by an 8-antenna base station, simulated using the CDL-B channel model. The single-antenna low-rate UEs employ 4-QAM modulation and rate-0.5 LDPC codes with a block length of 1944. An SNR of 17 dB is assumed in the downlink.

Figure 6. A block diagram of centralized processing for distributed MIMO systems.

Figure 7. PER of LMMSE and NL processing with different numbers of RU antennas (M = 4 and M = 8).

Figure 8. A block diagram of decentralized processing for distributed MIMO systems.

Figure 9. Uplink spectral efficiency for a distributed MIMO system with centralized and decentralized processing. N_{B} = 3 and M = 4 are assumed, with varying K, in the high-SNR regime.

Figure 10. Empirical CDF of UE throughput for N_{B} = 3, M = 4, K = 12, with different RAN designs and processing.

Figure 11. Software-based processing architecture of NL-COMM, where available processing cores are used for both per-subcarrier and per-MPP processing when core availability is high.

Figure 12. NL-COMM software-based processing latency with 16 cores allocated for real-time processing, measured for transmission bandwidths of 10, 20, and 40 MHz.

Figure 13. Top-level FPGA architectural diagram.

Figure 14. FPGA processing latency, including PCIe transfer time, for transmission bandwidths of 10, 20, 40, 80, and 100 MHz, measured for different numbers of instantiated PUs (increased post-processing parallelism).

Figure 15. The computational energy efficiency gains of the NL-COMM FPGA accelerator, expressed in detected RBs per joule, including gains over the software-based NL-COMM implementation.

Figure 16. The over-the-air demonstration setup, showing the 8-element antenna array and the USRP-based base station configuration supporting 8 concurrently transmitting COTS UEs.

Figure 17. Indicative average over-the-air spectral efficiency per UE. Results comparing NL-COMM and linear MU-MIMO processing (MMSE and ZF) across different antenna configurations (16-QAM, MCS16).

Table 1. FPGA resource utilization and power consumption.

Resource	1PU	2PU	4PU	8PU
LUT	402.5 k	423.2 k	462.7 k	544.2 k
LUTRAM	21.7 k	22.7 k	24.9 k	29.2 k
FF	425.3 k	445.8 k	497.0 k	593.8 k
DSP	2.0 k	2.3 k	3.1 k	4.6 k
BRAM + URAM	405	455	506	600
Static Power (W)	2.9	3.1	3.3	3.7
Dynamic Power (W)	15.5	18.6	26.6	39.3
Total Power (W)	18.4	21.7	29.9	43.0
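
As a quick consistency check on the power rows of Table 1, the sketch below recomputes total power as static plus dynamic power for each PU configuration and compares it with the reported totals. The values are transcribed from the table; the variable and function names are illustrative assumptions and are not taken from the NL-COMM implementation.

```python
# Power figures transcribed from Table 1, in watts, keyed by the number of
# instantiated processing units (PUs). Names below are illustrative only.
static_w = {1: 2.9, 2: 3.1, 4: 3.3, 8: 3.7}
dynamic_w = {1: 15.5, 2: 18.6, 4: 26.6, 8: 39.3}
reported_total_w = {1: 18.4, 2: 21.7, 4: 29.9, 8: 43.0}


def total_power(pus: int) -> float:
    """Total power for a configuration: static plus dynamic, to 0.1 W."""
    return round(static_w[pus] + dynamic_w[pus], 1)


def power_ratio_vs_one_pu(pus: int) -> float:
    """Total power of a configuration relative to the 1-PU design."""
    return total_power(pus) / total_power(1)


for pus in sorted(static_w):
    print(
        f"{pus} PU: computed {total_power(pus):.1f} W, "
        f"reported {reported_total_w[pus]:.1f} W, "
        f"{power_ratio_vs_one_pu(pus):.2f}x the 1-PU total"
    )
```

The computed totals agree with the reported row, and total power grows by a factor of roughly 2.3 when going from 1 PU to 8 PUs, which suggests that a large share of the power budget is common to all configurations rather than per-PU.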

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Jayawardena, C.; Katsaros, G.N.; Nikitopoulos, K. NL-COMM: Enabling High-Performing Next-Generation Networks via Advanced Non-Linear Processing. Future Internet 2025, 17, 447. https://doi.org/10.3390/fi17100447
