From CSI to Coordinates: An IoT-Driven Testbed for Individual Indoor Localization

Macedo, Diana; Loureiro, Miguel; Martins, Óscar G.; Sousa, Joana Coutinho; Belo, David; Gomes, Marco

doi:10.3390/fi17090395


by Diana Macedo ^1,2,†, Miguel Loureiro ^1,2,†, Óscar G. Martins ^1,3,†, Joana Coutinho Sousa ^4,†, David Belo ^4,5,† and Marco Gomes ^1,2,*,†

¹ Instituto de Telecomunicações (IT), 3810-193 Aveiro, Portugal
² Department of Electrical and Computer Engineering (DEEC), University of Coimbra, 3030-290 Coimbra, Portugal
³ CRACS/INESCTEC, CISUC and Department of Computer Science, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
⁴ NOS Inovação, 1000-029 Lisboa, Portugal
⁵ Safe AI [4U], 2485-201 Mira de Aire, Portugal
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.

Future Internet 2025, 17(9), 395; https://doi.org/10.3390/fi17090395 (registering DOI)

Submission received: 26 July 2025 / Revised: 21 August 2025 / Accepted: 26 August 2025 / Published: 30 August 2025

(This article belongs to the Special Issue Joint Design and Integration in Smart IoT Systems, 2nd Edition)


Abstract

Indoor wireless networks face increasing challenges in maintaining stable coverage and performance, particularly with the widespread use of high-frequency Wi-Fi and growing demands from smart home devices. Traditional methods to improve signal quality, such as adding access points, often fall short in dynamic environments where user movement and physical obstructions affect signal behavior. In this work, we propose a system that leverages existing Internet of Things (IoT) devices to perform real-time user localization and network adaptation using fine-grained Channel State Information (CSI) and Received Signal Strength Indicator (RSSI) measurements. We deploy multiple ESP32 microcontroller-based receivers in fixed positions to capture wireless signal characteristics and process them through a pipeline that includes filtering, segmentation, and feature extraction. Using supervised machine learning, we accurately predict the user’s location within a defined indoor grid. Our system achieves over 82% accuracy in a realistic laboratory setting and shows improved performance when excluding redundant sensors. The results demonstrate the potential of communication-based sensing to enhance both user tracking and wireless connectivity without requiring additional infrastructure.

Keywords:

joint communication and sensing; IoT localization; user tracking; Wi-Fi sensing

1. Introduction

The rapid proliferation of Internet of Things (IoT) devices has transformed the modern home into a complex network of sensors, actuators, and smart appliances. These devices depend on uninterrupted wireless connectivity for critical functions such as home automation, security, and entertainment. Although high-frequency Wi-Fi bands (e.g., 5 GHz and above) offer greater bandwidth and higher data rates, they suffer from severe attenuation through walls and other obstacles, leading to coverage gaps and degraded user experience [1]. Ensuring robust, whole-home Wi-Fi coverage with acceptable Quality of Service (QoS) and Quality of Experience (QoE) remains a fundamental challenge in smart home design.

Traditional approaches to enhancing Wi-Fi performance often involve signal boosting through additional access points or mesh networking. However, these methods may not fully address performance degradation caused by environmental factors such as interference, user mobility, and physical obstructions. A more dynamic solution is therefore required: one that continuously adapts based on user locations, movement patterns, and network performance metrics such as Channel State Information (CSI), Received Signal Strength Indicator (RSSI), Signal-to-Noise Ratio (SNR), and throughput [2]. Joint Communication and Sensing (JCAS) has recently emerged as a promising paradigm that fuses wireless data transmission with environmental sensing [3]. This vision is now being formally supported by the upcoming IEEE 802.11bf standard, officially titled “Amendment 4: Enhancements for Wireless Local Area Network (WLAN) Sensing” [4]. Approved as a draft in 2025, IEEE 802.11bf defines extensions to the 802.11 protocol to support device-free sensing through modifications to the PHY and MAC layers. This initiative underscores the growing importance of Wi-Fi-based environment perception and reinforces the viability of CSI-driven indoor localization in next-generation wireless networks.

By treating the communication infrastructure itself as a sensing medium, JCAS techniques enable simultaneous data delivery and indoor localization, offering rich insights into device positions and user trajectories, all without adding dedicated sensors [3]. Fingerprint-based localization generally involves two key stages: (i) a training stage and (ii) a testing stage [5]. During the training stage, a reference database is assembled by gathering and preprocessing survey measurements tied to known locations. Instead of simply storing every received signal strength (RSS) reading, machine learning techniques can be employed at this point to build compact fingerprint models [6].

In this work, we leverage widely deployed IoT endpoints as fixed “anchor” nodes for both localization and Wi-Fi sensing. Specifically, we use ESP32 microcontrollers mounted at known, fixed locations in an indoor environment to extract fine-grained CSI and RSSI measurements on the 2.4 GHz band. These anchors serve as stable reference points: their spatial distribution and immobility make them ideal for continuous monitoring of signal variations caused by user movement.

In the landscape of IoT connectivity, multiple wireless technologies coexist, including Bluetooth Low Energy (BLE), Zigbee, RFID, Near Field Communication (NFC), and Wi-Fi [7], each targeting short-range communications with different trade-offs. Within domestic, smart home, and office environments, however, Wi-Fi has emerged as the dominant choice, with the majority of consumer IoT devices, such as smart lights, thermostats, home assistants, and shutters, connecting via WLAN, most commonly in the 2.4 GHz frequency band. This prevalence provides a practical advantage for our system: because such devices are static, they can be repurposed as fixed sensing anchors, in line with our objectives of real-time user localization and adaptive network optimization based on CSI and RSSI. While more recent Wi-Fi standards support operation in the 5 GHz (e.g., 802.11n/ac/ax) and even 6 GHz (Wi-Fi 7, 802.11be) bands, which offer higher spatial resolution and faster data rates, these frequencies also experience stronger path loss and greater degradation through obstacles such as walls. As a result, their integration into everyday IoT devices remains limited compared to 2.4 GHz. Focusing on 2.4 GHz WLAN, therefore, enables the development of a readily deployable and cost-effective framework without the need for specialized infrastructure, making it particularly well suited for the experimentation and deployment scenarios considered in this work.

Our contributions are threefold:

We design and implement an ESP32-based CSI extraction tool and deploy it across multiple fixed anchors to collect a comprehensive dataset of indoor CSI/RSSI readings under varying user positions and movements.
We develop a signal-processing pipeline, encompassing denoising, pattern clustering, and feature extraction, that transforms raw CSI and RSSI data into inputs suitable for real-time localization and adaptive network tuning.
We construct and evaluate a machine learning model that fuses localization estimates with network metrics to dynamically adjust Wi-Fi parameters, thereby reducing dead zones and enhancing overall connectivity.

The remainder of this paper is organized as follows. Section 2 reviews related work in Wi-Fi sensing and indoor localization. Section 3 details our IoT-based sensing system, including hardware setup and data collection methodology. Section 4 describes the signal-processing techniques and key performance metrics. Section 5 presents our localization and adaptation model. The experiments conducted and the results obtained are presented in Section 6. Section 7 discusses the limitations of our work and future research directions. Finally, Section 8 presents the main conclusions.

To summarize our proposed system, we organize the pipeline into six stages, illustrated in Figure 1. The Acquire phase refers to the collection of CSI/RSSI data using our ESP32-based infrastructure. The Compute step includes initial signal denoising and magnitude computation. The Process block encompasses all preprocessing operations, including segmentation, normalization, and feature extraction from multiple subcarriers. The Select stage involves feature selection and, when necessary, dimensionality reduction. Two Random Forest–based models are then trained in the Model stage: (1) a dual-stage classifier for hierarchical prediction of columns and rows, and (2) a unified multiclass classifier for direct position prediction. Finally, in the Predict step, these models are applied to new samples to infer the user’s location in real time.

2. Related Work

Research on CSI-based indoor localization has evolved rapidly over the last decade, moving from classical fingerprinting approaches toward increasingly sophisticated machine learning and optimization methods that jointly exploit amplitude, phase, and temporal dynamics. Early systems such as FIFS [8] and Horus [9] relied on Bayesian inference and K-nearest neighbor classifiers over RSSI or coarse CSI amplitude fingerprints, demonstrating meter-level accuracy but requiring extensive site surveys and suffering under dynamic environmental changes.

The introduction of commercial Wi-Fi hardware capable of exposing CSI measurements led to new possibilities. CSI contains detailed multipath information across each Orthogonal Frequency Division Multiplexing (OFDM) subcarrier, which can be leveraged for spatial discrimination. OFDM is a digital modulation technique that splits a signal across multiple closely spaced subcarriers, each modulated at a lower data rate, allowing high-throughput communication while being resilient to multipath fading and interference.

DeepFi [10] introduced stacked autoencoders to learn compact magnitude-based CSI fingerprints for each physical location in an indoor environment, where fingerprints represent the unique multipath profiles of Wi-Fi signals measured between a user device (e.g., smartphone or IoT sensor) and one or more anchors (e.g., Wi-Fi access points). By reducing the dimensionality of raw CSI measurements while preserving discriminative spatial features, DeepFi improved scalability over traditional fingerprinting systems. CiFi [11] further incorporated phase-difference analysis across antenna pairs and applied Convolutional Neural Networks (CNNs) to model spatial correlations in CSI matrices, achieving robust device localization even under non-line-of-sight (NLOS) conditions, a critical challenge in cluttered indoor spaces. CNNs are especially suited to this task, as they hierarchically extract patterns from CSI’s subcarrier–antenna dimensions, which encode location-dependent multipath effects. More recent works, including OpenCSI [12] and DelFi [13], automated the collection of fused amplitude–phase datasets and deployed lightweight convolutional architectures for smaller-scale environments (e.g., single-room offices), demonstrating the feasibility of real-time inference on embedded hardware like Raspberry Pis or ESP32-based anchors.

In addition to model design, researchers have explored optimization strategies for both feature selection and system deployment. For instance, genetic algorithms and particle swarm optimization have been used to identify subcarrier subsets that maximize localization accuracy [14]. Other studies focused on optimizing the placement of sensing nodes, using greedy heuristics to minimize coverage gaps with fewer devices. Some have taken this further by adapting the behavior of network components: reinforcement learning methods adjust transmission parameters such as frequency channels and power levels based on live performance feedback, effectively linking localization quality with network configuration [15]. This reflects a shift toward joint localization and communication optimization, where Access Point (AP) settings play an active role in sensing. An AP is a network device that connects wireless clients (such as smartphones, laptops, or IoT devices) to a wired local area network (LAN), acting as a bridge and providing wireless connectivity within a defined area. APs are key infrastructure components in Wi-Fi systems, and their configuration directly influences both coverage and performance.

Complementary to these efforts, multi-modal fusion has shown promise in increasing system resilience in challenging indoor scenarios. By integrating CSI with inertial measurements, Bluetooth signal strength, or visual input, researchers have achieved centimeter-level precision even in dynamic or obstructed indoor areas [16]. Although such systems often require additional hardware, the growing prevalence of IoT devices suggests a practical pathway toward scalable, infrastructure-free localization in future smart homes [17].

More recently, the growing integration of localization and communication has raised important concerns around privacy, transparency, and control. In this context, Martins et al. [18] provide a comprehensive survey of privacy-preserving mechanisms for Joint Communication and Sensing (JCAS) systems, highlighting emerging risks and mitigation strategies relevant to future deployments. Their work underscores the importance of embedding security-by-design principles into CSI-based sensing systems, especially in settings such as smart homes, where passive or invisible sensing may impact user trust.

It is also important to note that most existing CSI- and RSSI-based localization systems are not fully passive. That is, they typically rely on a device carried by the user to be localized or on auxiliary IoT devices associated with the user. For example, the work of Hernandez and Bulut [19] employs an IoT device embedded in the user’s smartphone, enabling accuracies above 90%. Our system focuses on a realistic laboratory environment using only three ESP32 modules and a single AP, with the user being completely device-free. To the best of our knowledge, previous studies have not reported comparable accuracy under such minimal hardware requirements in a fully passive setting.

Building upon this foundation, our work explores three key contributions: (i) deploying low-cost ESP32 devices to collect fused amplitude–phase CSI on the 2.4 GHz band, (ii) applying a denoising, clustering, and feature extraction pipeline that generates compact representations amenable to real-time inference, and (iii) integrating an attention-based CNN–LSTM network with dynamic AP parameter tuning to jointly optimize localization accuracy and network performance.

3. Testbed Setup and Data Collection

3.1. Setup

This study relies on a dedicated testbed designed to assess the feasibility of CSI-based human presence detection in indoor environments. To simulate a smart home equipped with multiple IoT devices, four ESP32-WROOM-32E modules [20] were strategically deployed in a laboratory office at the Instituto de Telecomunicações (IT), located within the Department of Electrical and Computer Engineering (DEEC) of the University of Coimbra (UC), Portugal. The office has dimensions of 3.2 m × 5.6 m and is divided into a grid of approximately equal-sized rectangles with dimensions of 0.65 m × 0.45 m. Each device was configured to capture and transmit Channel State Information (CSI) and Received Signal Strength Indicator (RSSI) over a 20 MHz IEEE 802.11n channel operating in the 2.4 GHz band, with an additional access point (AP) ensuring wireless connectivity across the system. The nodes were placed in fixed positions throughout the room, surrounded by typical laboratory elements such as desks, two metal cabinets, and computing equipment (as laid out in the schematic plan in Figure 2, and shown in the office photo in Figure 3). A personal computer on the same network acted as a UDP server, receiving CSI packets from each ESP32 in real time and assigning timestamps to enable precise synchronization and offline processing.

The entire setup operated in a real-world environment (shown in Figure 3), where some level of environmental noise was unavoidable. Despite restricted access during measurements, factors such as nearby servers and active computers, frequent door movement, ongoing construction, and occasional presence of students introduced realistic variability that could affect the recorded signal quality.

3.2. Toolbox

The boards were programmed using the open-source ESP32-CSI-Tool [21], which is built upon Espressif’s CSI framework. In this setup, each ESP32 was set to Active Station (STA) mode and connected to a common AP that handled Wi-Fi connectivity, while the boards worked as receivers, collecting CSI and RSSI information from the received data.

To send all these packets for storage and processing, the original firmware was also modified so that each device could transmit the collected information via UDP to a server. This server ran on a Linux-based (Ubuntu 24.04.2 LTS) personal computer and was responsible for listening to the network and saving the incoming data from each ESP into separate files for later processing and analysis. The main hardware and software components involved in this setup are summarized in Table 1. To ensure consistency across different hardware units, we applied sensor-specific normalization to each ESP32’s data.
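A logging server of this kind can be sketched as follows. This is only an illustration: it assumes each datagram carries one text line of CSI and that devices can be distinguished by their source IP address. The port number, file naming scheme, and record layout are hypothetical and are not taken from the modified firmware.

```python
import socket
import time
from pathlib import Path
from typing import Optional


def record_line(payload: bytes, timestamp: float) -> str:
    """Prefix a received datagram with the server-side arrival time."""
    text = payload.decode("utf-8", errors="replace").strip()
    return f"{timestamp:.6f},{text}\n"


def run_logger(out_dir: str, port: int = 5000,
               max_packets: Optional[int] = None) -> None:
    """Append every datagram to a file named after the sender's IP address.

    `port` and the file naming scheme are illustrative assumptions.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    received = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))
        while max_packets is None or received < max_packets:
            payload, (ip, _) = sock.recvfrom(4096)
            name = f"esp_{ip.replace('.', '_')}.csv"
            with open(out / name, "a", encoding="utf-8") as fh:
                fh.write(record_line(payload, time.time()))
            received += 1
```

Timestamping on arrival at the server, rather than on the ESP32, gives all devices a common clock, which matches the synchronization role the text assigns to the server.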

3.3. 802.11n 20 MHz Structure

The IEEE 802.11n standard [22] employs a resource grid based on Orthogonal Frequency Division Multiplexing (OFDM), which divides a 20 MHz channel into 64 equally spaced subcarriers. Among these, only 52 carry relevant information: 48 transmit user data, and 4 serve as pilot signals. These pilot subcarriers are inserted at fixed positions within the OFDM symbol and are used by the receiver to estimate the channel frequency response, a process known as channel estimation, which enables compensation for the distortions introduced by the wireless channel. The remaining 12 subcarriers, comprising the guard bands and the central DC subcarrier, do not carry useful data. The guard bands help minimize interference between adjacent channels, while the DC subcarrier is set to zero to mitigate hardware-induced distortions during signal conversion, as shown in Figure 4.

Each active subcarrier provides CSI, describing how reflections, interference, and absorption within the environment influence the wireless signals. While the guard-band and DC subcarrier values remain constant and thus contain no useful localization information, the values of data and pilot subcarriers vary according to signal propagation paths. Therefore, these subcarriers hold critical information for indoor localization purposes.

3.4. Data Collection Strategy

To enable robust CSI-based indoor localization, we designed a comprehensive data collection protocol within a controlled laboratory environment. A set of predefined positions was established across a grid-like layout, and at each position, the user remained still for approximately five minutes while Channel State Information (CSI) was passively captured.

CSI recordings were conducted using four ESP32-WROOM-32E devices, all configured to operate on the same 20 MHz IEEE 802.11n channel. Each device transmitted the collected data via UDP to a central server, which organized and saved the information in structured CSV files, separated by device and position. Although the ESP32 collects CSI at 100 packets per second, only one out of every five packets was transmitted to the server to reduce bandwidth usage, resulting in an effective sampling rate of 20 packets per second. Each packet encapsulates CSI measurements from 64 subcarriers, represented by 128 8-bit signed integers (each subcarrier being described by a pair consisting of its real and imaginary components).

To ensure synchronization, all four devices recorded CSI simultaneously during each session. The collected data cover three main settings: (1) recordings spanning 10 h during the night without human presence; (2) recordings of 30 min during the day; and (3) recordings of 5 min with the user standing still at each position in the defined grid.

To capture variability in human-induced signal dynamics, the data were collected by two individuals, a male (1.83 m) and a female (1.62 m). Moreover, data collection sessions were spread across different times of the day and under varying environmental conditions, such as changes in ambient temperature, nearby electronic activity, and other undetected interferences. This ensured that the dataset reflects realistic variations commonly found in indoor environments. To ensure the integrity and reliability of the data, all incomplete or malformed CSI vectors were automatically discarded by the system during acquisition. This filtering step guarantees that only valid and complete measurements contribute to the final dataset. For each position, a fixed number of 120 CSI samples were selected. This uniform sample count helps prevent class imbalance and supports fair training and evaluation of machine learning models.

4. Signal Processing

4.1. Channel State Information (CSI) Fundamentals

Channel State Information (CSI) captures how a wireless signal is altered as it travels from each transmit antenna through the environment to each receive antenna, encoding both the amplitude attenuation and phase shift on every OFDM subcarrier. By sending known pilot symbols and observing how they emerge at the receiver, we can reconstruct a multidimensional matrix of complex gains, our CSI, which effectively maps the unique multipath fingerprints created by walls, furniture, and other obstacles in an indoor space. Because these fingerprints vary predictably with position, comparing real-time CSI measurements against a pre-collected database or applying machine learning models allows us to pinpoint a device’s location to within a few centimeters. In essence, CSI transforms the rich, frequency-dependent distortions of Wi-Fi signals into a highly discriminative signature that makes accurate indoor positioning possible, even in environments where GPS is imprecise and without a complex array of Bluetooth beacons. In OFDM systems, known pilot symbols, either interleaved with data or sent as a preamble, are transmitted across all subcarriers to estimate the channel state information (CSI). Denoting the transmitted pilot on subcarrier i as

$x[i]$ and the received signal as $y[i]$, the propagation channel is modeled as

$$y[i] = H[i]\, x[i] + \eta[i], \tag{1}$$

where $H[i] \in \mathbb{C}$ is the complex channel gain for subcarrier $i$ and $\eta[i]$ represents additive noise at subcarrier $i$.
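As a small numerical illustration of Equation (1), the sketch below simulates pilots passing through a made-up channel and recovers the per-subcarrier gain with a least-squares estimate, $\hat{H}[i] = y[i] / x[i]$. The least-squares estimator, the BPSK pilots, and the noise level are assumptions chosen for the example; the text does not state which estimator the ESP32 hardware uses.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sub = 64

# Known pilot symbols x[i] (BPSK here, an illustrative choice).
x = rng.choice([-1.0, 1.0], size=n_sub).astype(complex)

# A made-up channel H[i] and small complex noise eta[i].
H_true = rng.normal(size=n_sub) + 1j * rng.normal(size=n_sub)
eta = 0.01 * (rng.normal(size=n_sub) + 1j * rng.normal(size=n_sub))

# Equation (1): y[i] = H[i] x[i] + eta[i]
y = H_true * x + eta

# Least-squares estimate per subcarrier: H_hat[i] = y[i] / x[i]
H_hat = y / x

amplitude = np.abs(H_hat)   # amplitude attenuation on each subcarrier
phase = np.angle(H_hat)     # phase shift on each subcarrier (radians)
```

The estimate differs from the true gain only by the scaled noise term $\eta[i] / x[i]$, which is why pilots with known, non-zero values are needed on every subcarrier of interest.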

4.2. CSI vs. RSSI

RSSI-based methods, when combined with mathematical models such as Linear Least Squares (LLS), can achieve meaningful localization accuracy in certain scenarios [19,23]. However, most RSSI-based localization approaches assume that the user carries an active device acting as a transmitter (or receiver), while anchor nodes with known positions serve as receivers (or transmitters, respectively). This setup enables trilateration or triangulation based on distance estimations derived from line-of-sight RSSI measurements. In contrast, our system is fully passive: the user carries no device, and we rely on a fixed access point together with IoT-based receivers (e.g., a router and ESP32 modules) to detect variations in the wireless channel caused by human presence and movement. In this setting, RSSI-only trilateration approaches are less directly applicable, as RSSI is primarily determined by large-scale path loss with variations mainly due to shadowing effects. By leveraging CSI features, our system can infer the user’s position without requiring the user to hold any equipment, which we argue offers a more practical solution for IoT-driven smart environments [24,25]. Experimentally, we found that using RSSI alone was insufficient to achieve accurate localization. Moreover, combining RSSI with CSI features resulted in negligible performance improvements compared to CSI-only models, consistent with prior work showing that CSI generally outperforms RSSI in localization accuracy [26,27]. For this reason, we report exclusively on CSI-based results in this manuscript. Nevertheless, a preliminary data analysis was conducted in which multiple feature sets and signal-processing techniques were evaluated, including models based purely on RSSI inputs. The outcomes of these tests were systematically documented, but for conciseness only the final CSI-based system is presented here.

4.3. Signal Processing Pipeline

The proposed signal-processing pipeline consists of six key stages, as depicted in the flowchart of Figure 5. First, Raw CSI Extraction obtains the raw channel state information (CSI) measurements from the received wireless signal stream. Next, FFT Shift & Subcarrier Selection centers the frequency spectrum via an FFT shift and retains only the subcarriers relevant to our analysis. Then, Temporal Smoothing is applied to reduce high-frequency noise and fluctuations in the CSI magnitudes. After that, Segmentation partitions the continuous CSI signal into overlapping windows, enabling the model to capture temporal context while maintaining smooth transitions. Subsequently, Baseline Normalization removes slow-varying trends by normalizing the CSI values against a computed reference. Finally, Feature Extraction derives meaningful statistical metrics from each windowed segment to feed into downstream models or support visual interpretation.

4.3.1. Raw CSI Extraction

The CSI data are initially extracted from each received Wi-Fi packet in the form of 128 values (64 complex values). Each measurement contains both real and imaginary components, which are combined to form complex numbers:

$$H_{raw}[i, t] = h_{r}[i, t] + j\, h_{i}[i, t], \quad i = 0, \dots, 63, \tag{2}$$

where $i$ and $t$ index the subcarriers and time samples, respectively, and $h_{r}[\cdot]$, $h_{i}[\cdot]$ are the respective real and imaginary components.
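Equation (2) amounts to de-interleaving the 128 signed bytes into 64 complex numbers. The sketch below shows one way to do this. The byte order within each pair (imaginary first or real first) is an assumption that should be checked against the firmware in use, so it is exposed as a parameter; the function name is hypothetical.

```python
import numpy as np


def csi_to_complex(raw, imag_first=True):
    """Convert 128 interleaved int8 values into 64 complex gains (Equation (2)).

    `imag_first` selects the ordering inside each pair; which ordering the
    firmware emits is an assumption to verify, not something fixed here.
    """
    values = np.asarray(raw, dtype=np.int8).astype(np.float64)
    if values.size != 128:
        raise ValueError("expected 128 values (64 subcarriers x 2 components)")
    pairs = values.reshape(64, 2)
    if imag_first:
        h_i, h_r = pairs[:, 0], pairs[:, 1]
    else:
        h_r, h_i = pairs[:, 0], pairs[:, 1]
    return h_r + 1j * h_i
```

For magnitude-only features the ordering does not matter, since $|h_r + j h_i| = |h_i + j h_r|$, but it does matter as soon as phase is used.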

4.3.2. FFT Shift and Active Subcarrier Selection

Each CSI measurement consists of 64 complex subcarriers representing the frequency response of the wireless channel. However, not all subcarriers are equally useful for localization. Some carry only reference information, others serve as guard bands to reduce interference, and the central subcarrier (DC) is typically suppressed during transmission. These components do not vary significantly with the environment and are therefore excluded from the analysis.

To isolate the subcarriers that encode meaningful multipath information, we first perform an FFT shift to center the spectrum, aligning the zero-frequency (DC) subcarrier at the center of the array. After this transformation, we discard the subcarriers known to correspond to pilot tones, guard bands, and the central DC, resulting in a reduced set of active subcarriers. This selection ensures that only the most informative and environment-sensitive components are retained for further processing. The final subset of subcarriers is implementation-specific and may vary depending on the IEEE 802.11 variant or hardware configuration.

We then compute the magnitude of each selected subcarrier:

$$\tilde{H}[i, t] = |H_{sel}[i, t]|, \quad i = 0, \dots, N_{sub} - 1, \tag{3}$$

where $H_{sel}$ denotes the shifted CSI restricted to the retained subcarriers, $t$ indexes the time sample, $i$ indexes the selected subcarriers, and $N_{sub}$ is the total number of subcarriers retained after filtering.
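The shift, selection, and magnitude steps can be sketched as below. The subcarrier mask is hypothetical: it assumes occupied tones at indices $-26, \dots, 26$ without DC and pilots at $\pm 7$ and $\pm 21$, which leaves 48 subcarriers. The feature vector later in this section uses 46, so the authors' actual subset evidently differs from this illustrative mask.

```python
import numpy as np

N_FFT = 64
# Subcarrier indices after centring: -32, ..., -1, 0, 1, ..., 31.
shifted_index = np.arange(-N_FFT // 2, N_FFT // 2)

# Hypothetical layout: occupied tones are -26..26 without DC, pilots at
# +/-7 and +/-21. The subset used in the paper may differ.
PILOTS = np.array([-21, -7, 7, 21])
occupied = (np.abs(shifted_index) <= 26) & (shifted_index != 0)
keep = occupied & ~np.isin(shifted_index, PILOTS)


def select_magnitudes(h_raw):
    """h_raw: (T, 64) complex CSI in FFT order; returns (T, N_sub) magnitudes."""
    h_shifted = np.fft.fftshift(h_raw, axes=-1)  # move DC to the centre
    return np.abs(h_shifted[..., keep])          # Equation (3)
```

Keeping the mask as a named boolean array makes it easy to swap in the exact subset for a given firmware or 802.11 variant without touching the rest of the pipeline.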

4.3.3. Normalization

To mitigate static environmental components and hardware variations, we employ a two-stage normalization process. (i) Mean subtraction using empty-room calibration data:

$$H_{norm1}[i, t] = \tilde{H}[i, t] - \mu_{empty}[i], \tag{4}$$

where $\mu_{empty}[i]$ is the mean of subcarrier $i$ in empty-room conditions. (ii) Amplitude normalization by the maximum absolute value:

$$H_{norm2}[i, t] = \frac{H_{norm1}[i, t]}{\max_{t} |H_{norm1}[i, t]|}. \tag{5}$$
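A minimal sketch of the two normalization stages, assuming magnitudes are stored as arrays of shape (time, subcarrier). The function name and the small `eps` guard against division by zero are additions for the example and are not part of the equations.

```python
import numpy as np


def normalize(h_mag, h_empty, eps=1e-12):
    """Two-stage normalization of magnitudes shaped (T, N_sub).

    h_empty holds empty-room magnitudes with the same subcarrier layout.
    `eps` is an illustrative safeguard, not part of Equations (4) and (5).
    """
    mu_empty = h_empty.mean(axis=0)        # per-subcarrier empty-room mean
    h1 = h_mag - mu_empty                  # Equation (4)
    peak = np.max(np.abs(h1), axis=0)      # maximum over time, per subcarrier
    return h1 / np.maximum(peak, eps)      # Equation (5)
```

After this step every subcarrier trace lies in $[-1, 1]$, so subcarriers with large absolute gain no longer dominate those with small gain.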

4.3.4. Temporal Smoothing

A symmetric moving average filter with window size $2w + 1$ samples (with $w = 20$ in our case) is applied to each subcarrier’s temporal trace to reduce high-frequency noise while preserving motion-induced variations:

$$H_{smooth}[i, t] = \frac{1}{2w + 1} \sum_{k = -w}^{w} H_{norm2}[i, t + k]. \tag{6}$$

The filter employs mirrored padding at signal boundaries to prevent edge artifacts.
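The filter in Equation (6) can be sketched with NumPy as follows. The use of `mode="reflect"` for the mirrored padding is an assumption, since the text does not say whether the boundary sample itself is repeated; the function name is hypothetical.

```python
import numpy as np


def moving_average(h, w=20):
    """Symmetric moving average of length 2w+1 along time (axis 0).

    h: array shaped (T, N_sub). Boundaries use mirrored padding via
    np.pad(mode="reflect"), which is one possible reading of the text.
    """
    padded = np.pad(h, ((w, w), (0, 0)), mode="reflect")
    kernel = np.ones(2 * w + 1) / (2 * w + 1)
    columns = [
        np.convolve(padded[:, i], kernel, mode="valid")
        for i in range(h.shape[1])
    ]
    return np.stack(columns, axis=1)   # same shape as the input
```

With $w = 20$ and the 20 packets per second reported above, the window spans 41 samples, or roughly 2 s of signal.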

4.3.5. Segmentation and Feature Extraction

As previously described, although the ESP32 devices collect CSI at a rate of 100 packets per second, only one out of every five packets is transmitted via UDP. This results in an effective sampling rate of approximately 20 packets per second at the receiver.

To process the temporal dynamics of the signal, the CSI data are segmented into sliding windows of 10 samples, corresponding to roughly 500 ms of data. This window duration strikes a balance between temporal responsiveness and statistical stability: it is short enough to capture movement-induced variations, yet long enough to compute reliable statistical features.

For each window W, two key statistical descriptors are extracted for every subcarrier: the mean and the standard deviation of the smoothed magnitude values. These features summarize the signal’s energy and variability within the window and are used as input to the machine learning models.

$$\mu_{W}[i] = \frac{1}{|W|} \sum_{t \in W} |H_{smooth}[i, t]|, \tag{7}$$

$$\sigma_{W}[i] = \sqrt{\frac{1}{|W|} \sum_{t \in W} \left( |H_{smooth}[i, t]| - \mu_{W}[i] \right)^{2}}. \tag{8}$$

4.3.6. Feature Vector Construction

The final feature vector for each window concatenates the mean and standard deviation across all subcarriers:

$$x = \left[ \mu_{W}[0], \dots, \mu_{W}[45], \sigma_{W}[0], \dots, \sigma_{W}[45] \right] \in \mathbb{R}^{92}. \tag{9}$$

This compact representation captures both the average channel response and its temporal variability, providing discriminative features for subsequent positioning and classification algorithms.
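Segmentation and feature construction (Equations (7) to (9)) can be sketched together as below. The window length of 10 samples comes from the text; the step of 5 samples between windows is an assumption, since the amount of overlap is not stated, and the function name is hypothetical.

```python
import numpy as np


def window_features(h_smooth, win=10, step=5):
    """Return one [means, stds] feature row per sliding window.

    h_smooth: (T, N_sub) smoothed CSI. `step` (i.e., the overlap between
    consecutive windows) is an illustrative assumption.
    """
    rows = []
    for start in range(0, h_smooth.shape[0] - win + 1, step):
        segment = np.abs(h_smooth[start:start + win])
        mu = segment.mean(axis=0)                  # Equation (7)
        sigma = segment.std(axis=0)                # Equation (8), 1/|W| form
        rows.append(np.concatenate([mu, sigma]))   # Equation (9)
    return np.array(rows)
```

With 46 retained subcarriers, each row has $2 \times 46 = 92$ entries, matching the dimension in Equation (9).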

5. Machine Learning-Based Localization Framework

In this work, we employ machine learning to infer two-dimensional grid coordinates from the CSI collected by multiple ESP nodes. We compare two Random Forest-based approaches: (1) a dual-stage classifier that predicts columns and rows independently, and (2) a unified multiclass classifier that predicts combined column–row labels. We also investigate the contribution of individual ESP nodes via an ablation study.

Random Forest was selected as the primary classification model based on its ability to handle high-dimensional input features and its relatively low sensitivity to overfitting. This makes it a practical choice for CSI-based localization tasks, where data dimensionality and environmental noise are significant factors. In addition, Random Forests provide feature importance scores, which support interpretability and enable targeted analysis of sensor contributions. While other algorithms, such as K-Nearest Neighbors (KNN) [28], were considered, Random Forest offered a good trade-off between performance and computational simplicity for our application. During training, we performed hyperparameter tuning on the Random Forest classifier to optimize performance without overfitting. All experiments were conducted using a fixed random seed of 200 to ensure reproducibility.

5.1. Model 1: Dual-Stage Random Forest for Hierarchical Classification

To develop a robust machine learning approach for indoor localization of individuals, we implemented a hierarchical classification strategy consisting of two distinct multiclass classifiers: one for column prediction (classes “a” through “e”) and another for row prediction (classes “01” through “12”). This separation is motivated by the spatial structure of our grid layout, where column and row variations are partially independent and affected differently by signal propagation and obstacle positioning. By decomposing the classification task into two simpler subproblems, we can reduce inter-class confusion and improve interpretability. Similar hierarchical models have been explored in other spatial prediction tasks where the target space has natural sub-dimensions [29].

This pipeline generates a feature matrix X, where each row represents a fixed-duration time window of 500, and a corresponding label set Y containing pairs ⟨column, row⟩ that indicate the true grid coordinates for each instance. The final architecture comprises two independent classifiers: a column classifier responsible for the prediction of 5 classes in the set {a, b, c, d, e}, and a row classifier responsible for the prediction of 12 classes in the set {01, 02, ..., 12}.
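The dual-stage structure can be sketched with scikit-learn as two independent `RandomForestClassifier` instances trained on the same feature matrix. This is a minimal sketch under stated assumptions: the data are random placeholders (so the predictions carry no meaning), `n_estimators=50` is an illustrative value rather than the tuned hyperparameters, and the helper `predict_position` is hypothetical. Only the seed of 200 and the class sets come from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

SEED = 200  # fixed random seed reported in the text
COLUMN_LABELS = list("abcde")
ROW_LABELS = [f"{r:02d}" for r in range(1, 13)]

# Placeholder data: M windows with d features each (values are synthetic).
rng = np.random.default_rng(SEED)
M, d = 600, 92
X = rng.normal(size=(M, d))
y_col = rng.choice(COLUMN_LABELS, size=M)
y_row = rng.choice(ROW_LABELS, size=M)

# Two independent multiclass classifiers, one per grid axis.
col_clf = RandomForestClassifier(n_estimators=50, random_state=SEED).fit(X, y_col)
row_clf = RandomForestClassifier(n_estimators=50, random_state=SEED).fit(X, y_row)


def predict_position(x_new: np.ndarray) -> list:
    """Combine the column and row predictions into labels such as 'b07'."""
    cols = col_clf.predict(x_new)
    rows = row_clf.predict(x_new)
    return [f"{c}{r}" for c, r in zip(cols, rows)]


preds = predict_position(X[:5])
print(preds)
```

Because the two classifiers are trained separately, each one can be tuned or inspected on its own axis, which is the decomposability that the dual-stage design relies on.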

5.2. Model 2: Unified Random Forest for Direct Coordinate Prediction

In this formulation, we collapse the two-stage prediction into a single multiclass problem by assigning each of the 43 usable grid positions a unique integer label. The signal-processing pipeline is identical to that of Model 1. This produces a design matrix X \in R^{M \times d} and a label vector Y_{raw} \in {1, \dots, 43}^{M}.
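The mapping from grid positions to integer labels can be illustrated as follows. This is only a sketch: the helper `build_label_maps` and the alphabetical ordering are assumptions, and the five positions listed are a hypothetical subset, since the full list of 43 usable positions is not reproduced here.

```python
def build_label_maps(usable_positions):
    """Assign each usable grid position a unique integer label, starting at 1."""
    ordered = sorted(usable_positions)  # ordering is an assumption of this sketch
    to_int = {pos: i for i, pos in enumerate(ordered, start=1)}
    to_pos = {i: pos for pos, i in to_int.items()}
    return to_int, to_pos


# Hypothetical subset of usable positions (the real grid has 43 of them).
usable = ["B07", "A01", "C02", "B06", "E05"]
to_int, to_pos = build_label_maps(usable)

# Encode the ground-truth positions of three windows as integer labels.
y_raw = [to_int[p] for p in ["B06", "B07", "A01"]]
print(y_raw)                      # [2, 3, 1]
print([to_pos[i] for i in y_raw])  # ['B06', 'B07', 'A01']
```

Keeping the inverse map `to_pos` makes it straightforward to translate a predicted integer back into a grid coordinate.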

6. Experiments and Results

6.1. Raw vs. Processed CSI Visualization

To first understand the impact of the signal-processing component, we present a visual comparison between raw CSI magnitudes and the resulting data after applying the complete signal-processing pipeline. Figure 6 shows the raw CSI magnitude traces collected from all four ESP32 devices when the environment is empty.

These measurements correspond to raw CSI values, captured directly from the ESP32 devices before any signal processing is applied. Although a frequency-domain transformation (FFT) and subcarrier selection are performed by the CSI driver, no additional filtering, normalization, or smoothing has been applied at this stage. As shown, the signals exhibit substantial noise and short-term fluctuations.

In contrast, Figure 7 illustrates the same CSI data after undergoing baseline normalization, temporal smoothing, and segmentation. As can be seen, the processed signals exhibit reduced variance and clearer patterns, highlighting the presence of slower-varying environmental features. These enhancements are crucial for ensuring that the extracted features (mean and standard deviation over windows) are both stable and discriminative, particularly when feeding into machine learning models.

This transformation highlights the importance of filtering and normalization steps to reduce noise, mitigate hardware inconsistencies, and enhance spatial sensitivity, ultimately improving the robustness and interpretability of the extracted features.

6.2. Dataset Partitioning and Baseline Performance

To evaluate the proposed system under realistic conditions, we first partitioned the complete dataset, composed of CSI-derived feature vectors labeled by position, into 70% for training and 30% for testing, using a fixed random seed to ensure reproducibility. This split guarantees that no test samples were seen during model training, thus preventing data leakage and ensuring fair evaluation. During training, we performed hyperparameter tuning on the Random Forest classifier to optimize performance without overfitting. In addition to the baseline configuration using all four ESP32 receivers, we conducted an ablation study by systematically removing one ESP at a time. This analysis allowed us to quantify the individual contribution of each receiver node to the overall system accuracy and to explore the potential for reducing hardware requirements without compromising performance.
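The partitioning and the leave-one-ESP-out ablation can be sketched as below. The sketch assumes that the per-window features of the four receivers are concatenated in blocks of 92 columns (ESP#1 first, ESP#4 last); this layout, and the helper names `split_70_30` and `drop_esp`, are assumptions introduced for illustration. The seed of 200 is the one reported in the text.

```python
import numpy as np

SEED = 200           # fixed random seed reported in the text
N_ESP = 4            # number of receivers in the baseline configuration
FEATS_PER_ESP = 92   # assumed block size: one 92-dim vector per receiver


def split_70_30(n_samples: int, seed: int = SEED):
    """Return disjoint train/test index arrays for a 70/30 random split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    cut = int(round(0.7 * n_samples))
    return idx[:cut], idx[cut:]


def drop_esp(X: np.ndarray, esp_index: int) -> np.ndarray:
    """Remove the feature block of one receiver (0-based index)."""
    keep = np.ones(X.shape[1], dtype=bool)
    start = esp_index * FEATS_PER_ESP
    keep[start:start + FEATS_PER_ESP] = False
    return X[:, keep]


# Synthetic design matrix: 10 windows, 4 x 92 = 368 features.
X = np.arange(10 * N_ESP * FEATS_PER_ESP, dtype=float).reshape(10, -1)
train_idx, test_idx = split_70_30(len(X))
X_without_esp4 = drop_esp(X, esp_index=3)
print(len(train_idx), len(test_idx), X_without_esp4.shape)  # 7 3 (10, 276)
```

In an ablation run, the same index split would be reused for every reduced feature set so that the configurations are compared on identical test windows.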

6.3. Comparative Evaluation of Machine Learning Models

In Figure 8, we can see the baseline performance across all 43 positions when all four ESPs are used, using Model 2.

This baseline configuration corresponds to the full-sensor setup. Under Model 1 (dual-stage), it achieves a column prediction accuracy of 81.13% and a row prediction accuracy of 77.39%. Under Model 2 (unified multiclass), the overall accuracy reaches 77.09%. The strongest performance occurs in central areas with dense sensor overlap and favorable multipath propagation, while peripheral and corner positions remain more challenging.

Next, to assess the contribution of each ESP32 node to overall system performance, we conducted an ablation study, systematically removing one ESP node at a time while keeping the access point (AP) location fixed. The objective was to identify which sensors are essential and whether redundant or uninformative data could hinder classification performance. Figure 9 shows the resulting spatial accuracies using Model 2, and Table 2 presents the corresponding metrics.

Removing ESP#1 caused a moderate drop in row accuracy, suggesting that its placement contributes to vertical discrimination. Removing ESP#2 or ESP#3 caused only a slight overall performance degradation, indicating a supportive but not critical role. Interestingly, excluding ESP#4 led to an improvement in all metrics, raising questions about its contribution. This may be due to ESP#4’s proximity to the AP router, where signal saturation or reduced multipath diversity could reduce its discriminative power. However, it is also possible that signal interference or triangulation effects involving ESP#2 and ESP#3 degrade the spatial diversity of received signals when ESP#4 is present. To confidently attribute the performance improvement to ESP#4’s placement relative to the AP, additional experiments with the AP relocated, for example, positioned next to ESP#2, would be necessary.

These results support the idea that ESP placement has a nontrivial effect on system performance and that strategic sensor positioning can reduce hardware requirements while enhancing classification accuracy.

It is important to emphasize that our objective was not solely to maximize localization accuracy under optimized placements, but to emulate realistic IoT deployment conditions. In practice, devices such as smart lights, thermostats, or shutters are typically installed at fixed, and often peripheral, positions that cannot be easily relocated. Accordingly, we deliberately maintained ESP#4 at its peripheral location to better reflect practical deployment scenarios rather than an optimized laboratory arrangement. Instead of repositioning nodes, we assessed their individual contributions through an ablation study, allowing us to identify which subset of devices most effectively supports accurate localization.

Although removing a different ESP might yield better performance for specific grid positions, we chose to proceed with the configuration that achieved the best overall results. Therefore, ESP#4 was excluded from the remainder of the analysis, as its removal consistently led to improved classification accuracy across the entire dataset. Additional experiments with varied environmental conditions or alternative AP placements would help further validate these conclusions and generalize sensor selection strategies.

6.4. Performance Evaluation of Model 1

The dual-stage hierarchical Random Forest classifier achieves strong performance across both dimensions of the localization task. On the held-out test set, it reached a column prediction accuracy of 83.94% and a row prediction accuracy of 81.41%.

Figure 10 presents the normalized confusion matrices for each subtask, confirming strong classification capabilities across all classes.

All column and row classes exceeded 50% accuracy, and most surpassed the 65% threshold, confirming consistent classification performance across the grid. The column classifier demonstrated particular robustness, even under environmental perturbations. Misclassifications were most frequent in column 4, which suggests significant interference in areas where metallic shelving and nearby electronic devices distort the channel response.
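For reference, a row-normalized confusion matrix and the per-class accuracies read from its diagonal can be computed as in the sketch below. The labels are toy values for the five column classes and do not reproduce the reported results; the function name `normalized_confusion` is hypothetical.

```python
import numpy as np


def normalized_confusion(y_true, y_pred, classes):
    """Row-normalized confusion matrix: entry (i, j) is the fraction of
    samples of true class i that were predicted as class j."""
    index = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=float)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    row_sums = cm.sum(axis=1, keepdims=True)
    # Leave all-zero rows untouched instead of dividing by zero.
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)


classes = list("abcde")
# Toy labels for illustration only.
y_true = ["a", "a", "b", "b", "c", "d", "d", "e"]
y_pred = ["a", "b", "b", "b", "c", "d", "c", "e"]

cm = normalized_confusion(y_true, y_pred, classes)
per_class_acc = dict(zip(classes, np.diag(cm)))
print(per_class_acc)
```

The diagonal of the normalized matrix is the per-class accuracy (recall), which is the quantity compared against the 50% and 65% thresholds above.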

This behavior reflects a key characteristic of the dual-stage architecture: the column prediction task benefits from stronger spatial separability and reduced multipath variability compared to row prediction, which spans longer physical distances and intersects more physical barriers.

The results reinforce the effectiveness of decomposing the localization task into two orthogonal subproblems. This hierarchical structure not only enhances classification performance but also improves interpretability and modular optimization. As such, the dual-stage model remains well suited for structured indoor environments where spatial axes can be treated independently.

6.5. Performance Evaluation of Model 2

The unified Random Forest classifier simplifies the localization task into a single multiclass prediction across 43 unique grid coordinates. On the same test set, this model achieves an overall accuracy of 82.12%, as shown in Figure 11.

More than 65% of the positions were classified with over 80% accuracy. The best performance was observed in central grid locations, particularly those near the access point, such as positions B06 and B07, where multipath diversity and sensor overlap are at their peak. In contrast, peripheral and corner positions, including A12 and E05, exhibited lower performance. This degradation is likely due to reduced angular diversity and geometric symmetries that produce similar propagation patterns, limiting the CSI distinctiveness for those areas.

While the unified model lacks the decomposability and targeted subtask optimization of Model 1, it offers a streamlined, end-to-end classification pipeline that is simpler to deploy in practice. However, the increased confusion between spatially symmetric or weakly differentiated positions, especially near the room boundaries, highlights one of its key limitations.

These results underscore a trade-off: although Model 2 provides architectural simplicity and robust overall performance, Model 1 remains advantageous for task-specific tuning, spatial error interpretability, and targeted improvements.

To evaluate the resilience of our system, we conducted a robustness analysis by selectively removing each ESP32 node from the feature set and re-evaluating the model’s performance. This analysis aimed to determine the individual contribution of each ESP to localization accuracy and to identify whether any sensor introduced redundancy or noise. As previously detailed in Table 2, the removal of ESP#1, ESP#2, or ESP#3 resulted in only slight performance degradation, indicating that these nodes play a supportive but not critical role. In contrast, the exclusion of ESP#4 produced a consistent and notable improvement across all evaluation metrics. Specifically, the model achieved a column accuracy of 83.94%, a row accuracy of 81.41%, and an overall accuracy of 82.12%.

This improvement is likely linked to ESP#4’s physical proximity to the AP router. Such positioning may lead to signal saturation and reduced multipath diversity, both of which diminish the discriminative power of the collected CSI. In contrast, ESP#1, ESP#2, and ESP#3 are more spatially distributed, contributing to greater angular diversity and stronger signal fingerprinting.

These findings confirm that not all sensors contribute equally to model performance. Selectively removing poorly positioned nodes can reduce feature redundancy and improve classification accuracy, highlighting the importance of thoughtful sensor placement in real-world deployments. This decision aligns with practical deployment constraints, where minimizing the number of sensors reduces hardware costs and installation complexity without sacrificing accuracy.

However, additional testing across different environmental conditions and AP placements is required to draw definitive conclusions about the general impact of each sensor. Future experiments should investigate whether similar trends emerge in alternative layouts or under varying wireless configurations, enabling a more comprehensive understanding of sensor contribution and interaction effects.

6.6. Analysis of Best-Performing Positions

To gain deeper insight into the model’s behavior, we examined the three highest and three lowest performing grid positions and generated targeted confusion heatmaps to identify the most frequent misclassifications associated with these extremes.

Figure 12 highlights the three best classified locations, C02 (third), B02 (second), and B07 (first).

Position C02 exhibits approximately a 3% confusion rate with both A01 and A03. This level of misclassification is readily explained by the immediate spatial proximity of those points and their shared adjacency to dense metal bookshelves, which introduce similar multipath reflections. It is noteworthy that C02 maintains high accuracy despite its location amid shelving and at a relatively large distance from the access point.

Position B07 shows an analogous pattern, with roughly 3% of its instances mistaken for B06. Given that B07 and B06 lie adjacent to one another near the access point, this modest confusion is an expected consequence of the room’s local geometry.

In contrast, position B02 is misclassified as C11 at a rate of approximately 3.2%, even though these two locations are not spatially adjacent. Initial examination suggests that B02 and C11 occupy approximately symmetric regions of the room, which may yield similar CSI signatures under certain environmental conditions. Alternatively, intermittent external disturbances, such as building vibrations or moving objects, could transiently alter the channel characteristics at these points. Further controlled experiments and temporal analysis will be necessary to isolate the underlying cause of this non-local confusion, which may prove challenging given the complexity of indoor propagation phenomena.

6.7. Analysis of Worst-Performing Positions

Figure 13 illustrates the three grid positions with the lowest classification accuracy: B11 (first), E05 (second), and C09 (third).

Position B11 is misclassified in 50% of trials. As shown in Figure 13a, the majority of errors occur with its immediate neighbors B10 and B12 (14.4% each), which is unsurprising given their spatial adjacency. Secondary confusion with C11 and A10 (3.6% each) likewise reflects proximity in the grid. However, unexpected misclassifications toward distant positions such as B01, B02, C03, and E07 (each 3.6%) suggest non-local multipath artifacts or sparse sampling in those regions. Augmenting the dataset with additional measurements at B11 may help to reduce these anomalies.

Position E05 exhibits an overall accuracy rate of 57.1% (Figure 13b). Predictable confusions occur with adjacent cells E06 (7.1%), E07 (3.6%), D08 (3.6%), and C08 (3.6%), consistent with local geometry and multipath overlap. Errors toward more distant points, specifically B01 (3.6%) and A11 (7.1%), are less readily explained by physical proximity. These non-local confusions may arise from centralized signal reflections or transient environmental noise at the center of column E.

Position C09 is only correctly classified 59.3% of the time (Figure 13c). Its errors are distributed among several surrounding positions, reflecting the dense multipath environment at this central location. Notably, a linear trend of confusion along the propagation path from ESP#3 to the access point hints at directional fading or constructive/destructive interference effects. Further controlled experiments, such as rotating obstacles or isolating individual antennas, will be necessary to validate these hypotheses and improve classification robustness in this region.

6.8. Latency Evaluation

We evaluated the end-to-end latency of the proposed system, defined as the time between CSI packet acquisition and the final localization output. The machine learning pipeline was executed on Google Colab’s hosted environment (Intel Xeon-class CPU, 2.2 GHz, single core, no GPU acceleration). Each decision requires a window of 10 CSI packets sampled at 100 Hz, which introduces a buffering delay of 100 ms. This effectively provides a 100 ms processing window before the next set of packets is acquired.

Within this window, the signal conditioning steps (as illustrated in Figure 5) and inference with the Random Forest classifier require, on average, 11.4 ms of computation time. Since this is well below the available 100 ms, the system achieves an overall end-to-end latency of approximately 111 ms. For comparison, prior CSI-based localization frameworks such as DeepFi and CiFi report inference times between 150 and 300 ms per sample [10,11]. These results indicate that the proposed Random Forest–based approach offers lower inference latency and competitive end-to-end performance, thereby satisfying the real-time requirements of device-free indoor localization.
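The latency budget follows directly from the figures quoted above and can be checked with a few lines of arithmetic. The variable names are illustrative; the numeric values (10 packets, 100 Hz, 11.4 ms) are those reported in the text.

```python
WINDOW_PACKETS = 10   # CSI packets per decision
SAMPLING_HZ = 100.0   # CSI sampling rate
COMPUTE_MS = 11.4     # mean signal conditioning + inference time

# Time needed to fill one window of packets.
buffer_ms = 1000.0 * WINDOW_PACKETS / SAMPLING_HZ
# End-to-end latency: buffering plus computation.
total_ms = buffer_ms + COMPUTE_MS
# Slack left before the next window is complete.
headroom_ms = buffer_ms - COMPUTE_MS

print(f"buffering: {buffer_ms:.1f} ms")   # buffering: 100.0 ms
print(f"end-to-end: {total_ms:.1f} ms")   # end-to-end: 111.4 ms
print(f"headroom: {headroom_ms:.1f} ms")  # headroom: 88.6 ms
```

The computation uses roughly 11% of the 100 ms window, so processing of one window finishes well before the next one is fully buffered.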

7. Limitations and Future Directions

While our study demonstrates the feasibility and effectiveness of CSI-based indoor localization using commodity ESP32 devices, some limitations remain. Addressing these challenges is essential to ensure broader applicability and robustness in real-world deployments.

7.1. Single Antenna ESP32 Hardware Limits CSI Fidelity

The ESP32 devices used in our setup possess only a single receiving antenna. This hardware limitation restricts the collected CSI to magnitude information only, excluding the phase measurements and spatial diversity cues available in multi-antenna systems. As a result, we cannot apply advanced MIMO-based localization or direction-of-arrival estimation techniques, which could significantly improve localization granularity. This constraint is acknowledged in our signal-processing pipeline design and justifies our reliance on amplitude-based statistical features.

7.2. Limited Environmental Diversity

All experiments were conducted in a single indoor space with a relatively static layout. As such, the models were not tested across varied floor plans, construction materials, or room geometries. This raises concerns about generalization. Future work should validate the approach in diverse environments and apply transfer learning or domain adaptation techniques to reduce the need for retraining.

7.3. Environmental Dynamics Not Fully Captured

The dataset does not include large-scale temporal dynamics such as furniture rearrangement, doors opening/closing, or frequent human activity. Though we did capture time-of-day variability, the current models may degrade under substantial structural or occupancy changes. Incorporating continual learning or online adaptation mechanisms could help models adjust to evolving conditions in real time.

7.4. Suboptimal Sensor Placement

The placement of ESP32 nodes was determined empirically rather than through formal optimization. As our ablation study shows, some nodes contribute less or even negatively to performance. Future efforts should explore sensor placement as a constrained optimization problem, possibly using greedy or evolutionary algorithms to balance accuracy and hardware cost.

7.5. No Multimodal Data Fusion

Our system relies solely on Wi-Fi CSI for localization. While this simplifies deployment, it may limit robustness in challenging conditions (e.g., occlusions, interference). Fusing CSI with auxiliary modalities, such as inertial sensors, Bluetooth RSSI, or even low-resolution camera data, could enhance performance and support system calibration, especially in symmetrical or low-diversity regions.

7.6. Future Research Directions

To address the identified limitations and progress toward real-world deployment, several research avenues should be explored. First, cross-environment generalization is essential. This involves collecting CSI data in multiple rooms and buildings with varying layouts and materials, and applying domain adaptation techniques to enable model transferability across different settings without extensive retraining.

Second, optimizing sensor placement remains a crucial direction. By leveraging optimization algorithms, it is possible to reduce the number of deployed nodes while maximizing spatial coverage and maintaining classification robustness, ultimately improving cost-efficiency and scalability.

At the same time, increasing the number of ESP32 modules is also expected to further improve accuracy, as denser deployments generally enhance localization performance [27,30]. In this work, however, we deliberately limited the number of nodes to evaluate whether a sparse setup could already provide meaningful results, reserving larger deployments and scenarios encompassing multiple rooms for future research.

A third direction involves dynamic adaptation. Techniques such as reinforcement learning or continual online learning could be used to help the model adjust to time-varying environments, including changes in occupancy, furniture layout, or signal interference, ensuring long-term resilience.

Temporal modeling using deep learning is another promising line of research. Incorporating deep spatiotemporal architectures, such as recurrent neural networks, CNN–LSTM hybrids, or attention-based models, can enhance the system’s ability to detect rapid motion and capture transient propagation effects.

Additionally, future systems should explore multimodal integration. By fusing CSI with auxiliary sensing modalities such as inertial measurements, Bluetooth RSSI, or low-resolution imagery, it is possible to disambiguate spatially similar CSI signatures and improve localization in challenging environments.

Finally, future work should focus on edge deployment and real-world validation. This includes implementing the pipeline on embedded hardware capable of real-time inference and conducting user studies to evaluate improvements in perceived Quality of Experience (QoE) in smart home or office scenarios.

8. Conclusions

This work presents a complete framework for device-free indoor localization based on Channel State Information (CSI) extracted from low-cost ESP32 devices. We demonstrate that, despite hardware limitations such as single-antenna receivers, CSI magnitudes collected on a 2.4 GHz IEEE 802.11n channel can be processed into highly discriminative features for accurate indoor positioning. Our system integrates real-time CSI acquisition, signal preprocessing, statistical feature extraction, and machine learning classification using a dual-stage and a unified Random Forest model.

We validated the system in a realistic laboratory environment across 43 grid positions and showed that classification accuracy exceeds 82% under the best configuration. An ablation study revealed that, in a case study involving four ESP32 devices, removing one of them in certain scenarios actually improved performance, highlighting the critical influence of sensor placement on overall system effectiveness. These findings confirm that strategic node selection and minimal hardware can yield robust spatial inference, contributing to both cost efficiency and deployment scalability.

Beyond static localization, our approach sets the foundation for future Wi-Fi-based sensing applications in smart environments, including activity recognition and dynamic network adaptation. Future work will address environmental generalization, dynamic conditions, optimal sensor placement, and multimodal fusion to further improve robustness and extend applicability to more diverse real-world settings.

Author Contributions

Conceptualization, D.M., M.L., D.B., M.G. and J.C.S.; methodology, D.M., M.L., Ó.G.M., D.B., M.G. and J.C.S.; software, D.M. and M.L.; validation, D.M., M.L., Ó.G.M., J.C.S., D.B. and M.G.; formal analysis, D.M., M.L., J.C.S., D.B. and M.G.; investigation, D.M., M.L., D.B., M.G. and J.C.S.; resources, D.M., M.L., Ó.G.M. and M.G.; data curation, D.M. and M.L.; writing—original draft preparation, D.M. and M.L.; writing—review and editing, D.M., M.L., Ó.G.M., J.C.S., D.B. and M.G.; visualization, D.M., M.L., Ó.G.M., D.B., J.C.S. and M.G.; supervision, D.B., M.G. and J.C.S.; project administration, M.G. and J.C.S.; funding acquisition, M.G. and J.C.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work is funded by FCT/MEC through national funds and, when applicable, co-funded by the European Regional Development Fund (FEDER), the Competitiveness and Internationalization Operational Programme (COMPETE 2020) of the Portugal 2020 framework, Regional OP Centro (POCI-01-0145-FEDER-030588), the Regional Operational Program of Lisbon (Lisboa-01-0145-FEDER-030588), and Financial Support National Public (OE) through national funds, Fundação para a Ciência e Tecnologia (FCT), I.P., under the projects 10.54499/UIDB/50008/2020 (DOI identifier: https://doi.org/10.54499/UIDB/50008/2020), UIDP/50008/2020 and the research grant 2024.06005.BD. It is also based upon work from COST Action 6GPHYSEC (CA22168), supported by COST (European Cooperation in Science and Technology).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request. Restrictions apply to the availability of these data due to privacy or institutional policies.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Adame, T.; Carrascosa, M.; Bellalta, B. The TMB Path Loss Model for 5 GHz Indoor WiFi Scenarios: On the Empirical Relationship Between RSSI, MCS, and Spatial Streams. In Proceedings of the 2019 Wireless Days (WD), Manchester, UK, 24–26 April 2019; pp. 1–8.
2. Havinga, T.; Jiao, X.; Liu, W.; Chen, B.; Shahid, A.; Moerman, I. Wi-Fi 6 Cross-Technology Interference Detection and Mitigation by OFDMA: An Experimental Study. CoRR 2025, arXiv:2503.05429.
3. Zhang, J.A.; Rahman, M.L.; Wu, K.; Huang, X.; Guo, Y.J.; Chen, S.; Yuan, J. Enabling Joint Communication and Radar Sensing in Mobile Networks—A Survey. IEEE Commun. Surv. Tutor. 2022, 24, 306–345.
4. IEEE 802.11bf-2025; IEEE Approved Draft Standard for Information Technology—Telecommunications and Information Exchange Between Systems Local and Metropolitan Area Networks—Specific Requirements—Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications—Amendment 4: Enhancements for Wireless Local Area Network (WLAN) Sensing. IEEE Draft Standard; IEEE: New York, NY, USA, 2025. Available online: https://standards.ieee.org/ieee/802.11bf/11574/ (accessed on 1 May 2025).
5. Bahl, P.; Padmanabhan, V.N. RADAR: An In-Building RF-Based User Location and Tracking System. In Proceedings of the IEEE INFOCOM 2000, Tel Aviv, Israel, 26–30 March 2000; Volume 2, pp. 775–784.
6. Wang, X.; Gao, L.; Mao, S.; Pandey, S. CSI-Based Fingerprinting for Indoor Localization: A Deep Learning Approach. IEEE Trans. Veh. Technol. 2017, 66, 763–776.
7. Tlili, S.; Sami, M.; Val, T. The Internet of Things enabling communication technologies, applications and challenges: A survey. Int. J. Wirel. Mob. Comput. 2022, 23, 9–21.
8. Xiao, J.; Wu, K.; Yi, Y.; Ni, L.M. FIFS: Fine-Grained Indoor Fingerprinting System. In Proceedings of the 2012 International Conference on Computer Communications and Networks (ICCCN), Munich, Germany, 30 July–2 August 2012; pp. 1–7.
9. Youssef, M.; Agrawala, A. The Horus WLAN Location Determination System. In Proceedings of the 3rd International Conference on Mobile Systems, Applications, and Services (MobiSys), Seattle, WA, USA, 6–8 June 2005; pp. 205–218.
10. Wang, X.; Gao, L.; Mao, S.; Pandey, S. DeepFi: Deep Learning for Indoor Fingerprinting Using Channel State Information. In Proceedings of the 2015 IEEE Wireless Communications and Networking Conference (WCNC), New Orleans, LA, USA, 9–12 March 2015.
11. Wang, Z.; Guo, B.; Yu, Z.; Zhou, X. Wi-Fi CSI-Based Behavior Recognition: From Signals, Actions to Activities. IEEE Commun. Mag. 2018, 56.
12. Gassner, A.; Musat, C.; Rusu, A.; Burg, A. OpenCSI: An Open-Source Dataset for Indoor Localization Using CSI-Based Fingerprinting. arXiv 2021, arXiv:2104.07963.
13. Berruet, B.; Baala, O.; Caminada, A.; Guillet, V. DelFin: A Deep Learning-Based CSI Fingerprinting Indoor Localization in IoT Context. In Proceedings of the 2018 International Conference on Indoor Positioning and Indoor Navigation (IPIN), Nantes, France, 24–27 September 2018.
14. Zhou, M.; Long, Y.; Zhang, W.; Pu, Q.; Wang, Y.; Nie, W.; He, W. Adaptive Genetic Algorithm-Aided Neural Network With Channel State Information Tensor Decomposition for Indoor Localization. IEEE Trans. Evol. Comput. 2021, 25, 913–927.
15. Bi, J.; Zhao, M.; Yao, G.; Cao, H.; Feng, Y.; Jiang, H.; Chai, D. PSOSVRPos: WiFi Indoor Positioning Using SVR Optimized by PSO. Expert Syst. Appl. 2023, 222, 119778.
16. Li, B.; Zhang, S.; Shen, S. CSI-Based WiFi-Inertial State Estimation. In Proceedings of the 2016 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), Baden-Baden, Germany, 19–21 September 2016; pp. 244–250.
17. Mottakin, K.; Davuluri, K.; Allison, M.; Song, Z. SWiLoc: Fusing Smartphone Sensors and WiFi CSI for Accurate Indoor Localization. Sensors 2024, 24, 6327.
18. Martins, O.G.; Åkesson, H.; Gomes, M.; Osorio, D.P.M.; Sen, P.; Vilela, J.P. Delving Into Security and Privacy of Joint Communication and Sensing: A Survey. IEEE Open J. Commun. Soc. 2025, 6, 4978–5004.
19. Hernandez, S.M.; Bulut, E. Lightweight and Standalone IoT Based WiFi Sensing for Active Repositioning and Mobility. In Proceedings of the 2020 IEEE 21st International Symposium on “A World of Wireless, Mobile and Multimedia Networks” (WoWMoM), Cork, Ireland, 31 August–3 September 2020; pp. 277–286.
20. Espressif Systems. ESP32-WROOM-32E & ESP32-WROOM-32UE Datasheet. Rev. 1.3, March 2023. Available online: https://www.espressif.com/sites/default/files/documentation/esp32-wroom-32e_esp32-wroom-32ue_datasheet_en.pdf (accessed on 1 February 2025).
21. Hernandez, S.M.; Bulut, E. ESP32 CSI Tool. 2023. Available online: https://github.com/StevenMHernandez/ESP32-CSI-Tool (accessed on 1 February 2025).
22. Xiao, Y. IEEE 802.11n: Enhancements for higher throughput in wireless LANS. Wirel. Commun. IEEE 2006, 12, 82–91.
23. Kumar, V.; Arablouei, R.; Jurdak, R.; Kusy, B.; Bergmann, N.W. RSSI-based self-localization with perturbed anchor positions. In Proceedings of the 2017 IEEE 28th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC), Montreal, QC, Canada, 8–13 October 2017; pp. 1–6.
24. Wu, C.; Yang, Z.; Liu, Y. Smartphones Based Crowdsourcing for Indoor Localization. IEEE Trans. Mob. Comput. 2015, 14, 444–457.
25. Dang, X.; Si, X.; Hao, Z.; Huang, Y. A Novel Passive Indoor Localization Method by Fusion CSI Amplitude and Phase Information. Sensors 2019, 19, 875.
26. Mendez, D.; Zennaro, M.; Altayeb, M.; Manzoni, P. On TinyML WiFi Fingerprinting-Based Indoor Localization: Comparing RSSI vs. CSI Utilization. In Proceedings of the 2024 IEEE 21st Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, 6–9 January 2024; pp. 1–6.
27. Xiao, J.; Wu, K.; Yi, Y.; Wang, L.; Ni, L.M. Pilot: Passive Device-Free Indoor Localization Using Channel State Information. In Proceedings of the 2013 IEEE 33rd International Conference on Distributed Computing Systems, Philadelphia, PA, USA, 8–11 July 2013; pp. 236–245.
28. Taunk, K.; De, S.; Verma, S.; Swetapadma, A. A Brief Review of Nearest Neighbor Algorithm for Learning and Classification. In Proceedings of the 2019 International Conference on Intelligent Computing and Control Systems (ICCS), Madurai, India, 15–17 May 2019; pp. 1255–1260.
29. Calderoni, L.; Ferrara, M.; Franco, A.; Maio, D. Indoor localization in a hospital environment using Random Forest classifiers. Expert Syst. Appl. 2015, 42, 125–134.
30. Hamdoun, S.; Rachedi, A.; Benslimane, A. Comparative analysis of RSSI-based indoor localization when using multiple antennas in Wireless Sensor Networks. In Proceedings of the 2013 International Conference on Selected topics in mobile and wireless networking (MoWNeT), Montreal, QC, Canada, 19–21 August 2013; pp. 146–151.

Figure 1. End-to-end CSI-based localization pipeline.

Figure 2. The layout plan of the environment (5.6 m × 3.2 m) with reference objects and node identification. The floor plan is divided into a matrix of equal rectangles, each uniquely identified by a tuple (letter, number), where letters range from A to E and numbers from 1 to 12.
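The (letter, number) addressing described in this caption can be sketched as a small parser. This is a minimal illustration, not the authors' code: the zero-padded string form (for example "C02") is taken from the labels used in Figures 12 and 13, while the names `parse_cell` and `format_cell` are hypothetical.

```python
import re

# Ranges stated in the Figure 2 caption: letters A to E, numbers 1 to 12.
MAX_NUMBER = 12
_LABEL = re.compile(r"^([A-E])(\d{1,2})$")


def parse_cell(label):
    """Split a label such as 'C02' into its (letter, number) tuple."""
    match = _LABEL.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a grid label: {label!r}")
    letter, number = match.group(1), int(match.group(2))
    if not 1 <= number <= MAX_NUMBER:
        raise ValueError(f"number out of range 1..{MAX_NUMBER}: {label!r}")
    return letter, number


def format_cell(letter, number):
    """Render a (letter, number) tuple in the zero-padded style, e.g. 'B07'."""
    return f"{letter}{number:02d}"
```

For example, `parse_cell("C02")` yields `("C", 2)`, and `format_cell("B", 7)` yields `"B07"`. Labels outside the stated ranges raise `ValueError`.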

Figure 3. Real experimental setup showing ESP32 node placements.

Figure 4. Allocation of data, pilot, guard, and DC subcarriers in a 20 MHz IEEE 802.11n channel. Each colored line corresponds to the magnitude of a CSI vector for one received packet, plotted across all subcarriers.
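As a rough illustration of the allocation in this figure, the sketch below classifies the 64 subcarriers of a 20 MHz channel and keeps only the active ones after an FFT shift. It assumes the standard HT (802.11n) 20 MHz layout: active indices −28 to 28 excluding DC, with pilots at ±7 and ±21. The function names are hypothetical, and the actual ordering of values delivered by the ESP32 CSI tool may differ from the plain FFT order assumed here.

```python
N_FFT = 64                  # FFT size of a 20 MHz channel
MAX_ACTIVE = 28             # assumed HT20 layout: active indices -28..28
PILOTS = (-21, -7, 7, 21)   # assumed HT20 pilot positions


def fft_shift(values):
    """Move the negative-frequency half in front of the positive half."""
    half = len(values) // 2
    return list(values[half:]) + list(values[:half])


def subcarrier_role(k):
    """Classify subcarrier index k (range -32..31) as dc, guard, pilot, or data."""
    if k == 0:
        return "dc"
    if abs(k) > MAX_ACTIVE:
        return "guard"
    if k in PILOTS:
        return "pilot"
    return "data"


def select_active(csi_fft_order):
    """Return (index, value) pairs for data and pilot subcarriers, ordered -28..28."""
    if len(csi_fft_order) != N_FFT:
        raise ValueError(f"expected {N_FFT} values, got {len(csi_fft_order)}")
    shifted = fft_shift(csi_fft_order)
    indices = range(-N_FFT // 2, N_FFT // 2)
    return [(k, v) for k, v in zip(indices, shifted)
            if subcarrier_role(k) in ("data", "pilot")]
```

Under these assumptions the 64 subcarriers split into 52 data, 4 pilot, 1 DC, and 7 guard positions, so `select_active` returns 56 entries per packet.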

Figure 5. Proposed signal-processing pipeline comprising six key stages, from raw CSI extraction to feature extraction.

Figure 6. Raw CSI magnitudes captured from all ESP32 devices in an empty environment. The signals exhibit high variability and short-term fluctuations.

Figure 7. Smoothed and normalized CSI after full signal-processing pipeline, revealing stable patterns for feature extraction.

Figure 8. Baseline accuracy heatmap for Model 2 using all four ESP32 nodes. Each cell’s color intensity reflects the proportion of correct predictions at that grid coordinate, highlighting high-accuracy central regions and more challenging peripheral positions.

Figure 9. Spatial accuracy heatmaps for Model 2 when each ESP32 node is removed individually. Color intensity indicates per-position classification accuracy.

Figure 10. Confusion matrices for the dual-stage classifier. Diagonal dominance confirms strong per-class prediction accuracy, especially for columns.

Figure 11. Normalized confusion matrix for the unified classifier. The diagonal represents correct predictions for each of the 43 grid positions.
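A normalized confusion matrix of this kind can be sketched as follows. The caption does not state which normalization was applied, so this example assumes normalization by true class (each row sums to 1), which makes the diagonal the per-position recall. The function names and the toy labels are illustrative only, not data from the paper.

```python
def confusion_counts(true_labels, predicted_labels, classes):
    """Count predictions: rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def normalize_rows(matrix):
    """Divide each row by its total so entries become per-true-class fractions."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


# Toy example with two grid positions (made-up labels, not measured data).
classes = ["B02", "C02"]
y_true = ["B02", "B02", "C02", "C02"]
y_pred = ["B02", "C02", "C02", "C02"]
normalized = normalize_rows(confusion_counts(y_true, y_pred, classes))
```

In the toy example, half of the "B02" samples are predicted correctly and all "C02" samples are, so the diagonal reads 0.5 and 1.0.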

Figure 12. Confusion heatmaps for the three best-classified grid positions: C02, B02, and B07. Each subplot shows the percentage of instances of the target position that were predicted as each other position, highlighting the primary confusion pairs.

Figure 13. Confusion heatmaps for the three worst-classified grid positions: B11, E05, and C09. Each subplot shows the percentage of instances of the target position that were predicted as each other position, highlighting the primary confusion pairs.

Table 1. Summary of hardware and software components used in the experiment.

Component	Specification/Description
Access Point	Cisco AIR-CAP1702I-E-K9
ESP32 boards	4 × ESP32-WROOM-32E modules
CSI collection software	ESP32-CSI-Tool (Active Station mode)
Data transmission	UDP protocol
Receiver	Desktop
Operating system	Ubuntu 24.04.2 LTS
Additional tools	MATLAB R2023b and Google Colab

Table 2. Effect of removing each ESP32 node on localization accuracy.

Excluded ESP	Column Accuracy (%)	Row Accuracy (%)	Overall Accuracy (%)
None (all ESPs used)	81.13	77.39	77.09
ESP#1	81.33	73.02	73.82
ESP#2	78.04	74.86	75.02
ESP#3	78.36	74.18	73.05
ESP#4	83.94	81.41	82.12
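The ablation in Table 2 is easier to read as differences from the all-nodes baseline. The short sketch below recomputes those differences in percentage points from the values in the table. The dictionary layout and the name `accuracy_deltas` are illustrative choices.

```python
# Values copied from Table 2 (accuracy in percent).
BASELINE = {"column": 81.13, "row": 77.39, "overall": 77.09}
ABLATION = {
    "ESP#1": {"column": 81.33, "row": 73.02, "overall": 73.82},
    "ESP#2": {"column": 78.04, "row": 74.86, "overall": 75.02},
    "ESP#3": {"column": 78.36, "row": 74.18, "overall": 73.05},
    "ESP#4": {"column": 83.94, "row": 81.41, "overall": 82.12},
}


def accuracy_deltas(baseline, ablation):
    """Return, per excluded node, the change versus baseline in percentage points."""
    return {
        node: {metric: round(scores[metric] - baseline[metric], 2)
               for metric in baseline}
        for node, scores in ablation.items()
    }


deltas = accuracy_deltas(BASELINE, ABLATION)
for node, change in deltas.items():
    print(node, change)
```

Read this way, excluding ESP#1, ESP#2, or ESP#3 lowers overall accuracy (by 3.27, 2.07, and 4.04 points respectively), whereas excluding ESP#4 raises all three metrics, with overall accuracy up by 5.03 points.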


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

