Channel State Information (CSI) Amplitude Coloring Scheme for Enhancing Accuracy of an Indoor Occupancy Detection System Using Wi-Fi Sensing

Son, Jaeseong; Park, Jaesung

doi:10.3390/app14177850

by Jaeseong Son and Jaesung Park *

School of Information Convergence, Kwangwoon University, Seoul 01897, Republic of Korea

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(17), 7850; https://doi.org/10.3390/app14177850

Submission received: 30 July 2024 / Revised: 28 August 2024 / Accepted: 2 September 2024 / Published: 4 September 2024

Abstract

Indoor occupancy detection (IOD) via Wi-Fi sensing capitalizes on the varying patterns in CSI (Channel State Information) to estimate the number of people in a given area. However, the precision of such systems heavily depends on the quality of the CSI data, which can be degraded by noise and environmental factors. To address this issue, we present a CSI preprocessing method to improve the accuracy of IOD systems using Wi-Fi sensing. Unlike existing preprocessing methods that use computationally complex signal processing or statistical techniques, we expand the dimension of CSI amplitude data into a three-channel vector through nonlinear transformation to amplify subtle differences between CSI data belonging to different numbers of people. By drawing clearer boundaries between the CSI data distributions of different occupancy counts in a monitored area, our method improves the people-counting accuracy of a Wi-Fi sensing system. To ensure temporal consistency and improve data quality, we discretize the CSI measurements based on their transmission periods and aggregate consecutive measurements over a given time interval. These samples are then fed into a Convolutional Neural Network (CNN) specifically trained for the IOD task. Experimental results in diverse real-world scenarios verify that, compared to traditional methods, the enhanced feature representation capability of our approach leads to more accurate and robust sensing outcomes even in the most resource-constrained environment, in which a commercial off-the-shelf CSI capture device with only one antenna receives packets that a Wi-Fi sender with one transmit antenna periodically sends over a channel with the smallest Wi-Fi channel bandwidth.

Keywords:

Wi-Fi sensing; channel state information (CSI) data preprocessing; dimension expansion; CSI amplitude coloring; people counting; convolutional neural network (CNN); real-world experiments

1. Introduction

Measuring the number of people in indoor environments plays a pivotal role in preventing the spread of infectious diseases like COVID-19 and avoiding safety accidents. The existing research on people counting primarily utilizes image-processing techniques that acquire images through cameras and analyze head counts from the obtained images to estimate the number of people [1,2]. Since cameras can visually capture real-life situations, they can provide data for people to easily understand and analyze. In addition, camera-based systems are able to determine the number of people by leveraging high-resolution images with the latest deep learning and computer vision algorithms.

However, camera-based people-counting systems present the following issues, which limit their capabilities and widespread use. First, capturing images with cameras involves directly recording individuals’ faces and bodies, which leads to privacy concerns and potential legal issues in certain countries [3]. Second, the accuracy of camera-based systems is highly dependent on environmental conditions. Low lighting, strong backlighting, and shadows can degrade the quality of images captured by cameras, which makes accurate IOD difficult. Additionally, when people overlap or are obscured by objects, and when blind spots occur, the accuracy of counting people decreases. Furthermore, installing and maintaining high-quality cameras and the related infrastructure can be costly.

To address these issues, wireless sensing systems have attracted a lot of attention. By analyzing the varying patterns of wireless signals, such as Infrared, UWB (Ultra Wide-Band), and Wi-Fi in a surveillance region, they detect and interpret the changes in the region. All of these technologies enable sensing even in low-light conditions. However, compared to Wi-Fi signals, Infrared and UWB have shorter sensing ranges and their signals can be blocked by obstacles. Thus, the performance and the utility of wireless sensing systems using Infrared and UWB are limited. In addition, Wi-Fi networks are pervasive in homes, offices, and public spaces. This ubiquity makes Wi-Fi sensing a highly accessible, scalable, and cost-effective solution for people counting.

RSS (Received Signal Strength) and CSI (Channel State Information) are two representative Wi-Fi signal data that have been utilized for Wi-Fi sensing. Since signal strength decreases as the distance between a sender and a receiver increases, RSS has been primarily used for positioning techniques in early Wi-Fi sensing research [4]. However, RSS only provides general information about the measurement environment and is susceptible to random variations in the environment [5]. Recently, CSI has been used for Wi-Fi sensing because it provides more detailed information about the wireless signal propagation environment [6]. Thus, in this paper, we use CSI as Wi-Fi signal data for people counting.

Meanwhile, deep learning models have advanced remarkably. These models significantly enhance the accuracy of classification and regression tasks. Consequently, to perceive environments in a device-free manner, research is actively underway to analyze Wi-Fi signals measured in various environments using deep learning techniques [7,8]. More specifically, to adopt a deep learning model, the problem of counting people with Wi-Fi signals is often regarded as a classification problem. Each distinct count of people in a monitored area is treated as a separate class. Then, a deep learning model is trained to learn the classification boundaries among the classes to categorize the number of people in the area. Various deep learning models have been used for Wi-Fi sensing, including MLP (Multilayer Perceptron), CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), and, more recently, transformer models. Each of these architectures has its strengths and is chosen based on the specific requirements of the sensing task. In [9], a comprehensive study quantitatively compares the performance of these diverse deep learning models using publicly available CSI datasets. This research highlights the capability of these models to discern subtle differences between input classes.

However, a common challenge emerges. As the distinctions between input data from different classes become more minute, the sensing accuracy of these deep learning models tends to decline [10,11]. The wireless signal propagation environment changes randomly over time and space. In addition, the hardware in the Wi-Fi transmitters and receivers is not perfect. Thus, the measured CSI values exhibit random fluctuations even when the monitoring area and conditions remain constant. The fluctuating nature of these measurements presents a substantial obstacle to achieving precise categorization with deep learning algorithms. To address the challenges posed by the CSI variability and improve the classification accuracy of a deep learning model, various CSI data representation methods have been proposed. These methods aim to enhance the quality of input data provided to a deep learning model by reducing noise and random fluctuations in the CSI measurements [12], or by extracting more robust and informative features from the raw CSI data [13,14]. These techniques consider rapidly changing components in the measured CSI values as noise and remove them to reduce variability in the data. In other words, they extract features that show less variation over time, assuming that these more stable features will be more informative for classification by a deep learning model. However, the extracted values often do not differ significantly across the number of people (i.e., the class) in the monitored area. Such small differences make it challenging for a deep learning model to accurately classify the number of people.

To resolve the issue, in this paper, we propose a novel CSI feature representation method that can better capture the class-specific characteristics in CSI data while still mitigating the effects of noise and random fluctuations. As is noted in [15], it is very hard to ensure the existence of specific features performing better than others for all applications and scenarios. This is mainly attributed to the wide spectrum of the environments where Wi-Fi sensing systems operate. For example, the Wi-Fi signal transmission environment differs for each location where Wi-Fi sensing is applied. The strength of noise and interference at the moment of CSI measurement varies. Furthermore, the specifications, locations, and configurations of Wi-Fi sensing transmitters and receivers are different. Thus, our goal in this paper is not to find a panacea method that always gives the best performance for all classes in all Wi-Fi sensing environments but rather to find a method that performs well in most environments and for most classes.

To achieve the goal, we propose a method to transform CSI amplitude data collected over a time interval into a color image by using the jet colormap [16]. Colormaps are important tools used in data visualization to represent data by converting them into colors. Various colormaps have been designed by considering various factors such as color theory, the human visual system, and the relationship between colors and data [17,18]. Our purpose is not data visualization but to better distinguish CSIs belonging to different classes in a higher-dimensional space. Therefore, we use a colormap not as a tool to transform data into a form suitable for the human visual system without visual distortion in color changes but as a tool to amplify subtle differences in CSI data. The jet colormap is a rainbow color palette which can represent scalar data with uniform differences nonuniformly through nonlinear changes in brightness and color [19]. Therefore, when the jet colormap is used, data boundaries or features may appear more exaggerated than they actually are [20]. By using the characteristics of the jet colormap, we finely segment each CSI amplitude of each subcarrier at a measured moment into three-dimensional RGB (Red–Green–Blue) channel data through the nonlinear transformation of the jet colormap. During the segmentation of a scalar CSI amplitude value of a subcarrier into three distinct data points, we amplify the subtle differences in the CSI data belonging to different classes so that a deep learning model can better distinguish them. In other words, by separating the CSI features among classes more widely in the input layer of a deep learning model, we enhance the classification accuracy of the deep learning model used for estimating the number of people in a monitored region.

To verify our approach, we carry out people-counting experiments across three different real-life scenarios and compare the results obtained by our method to those obtained when other conventional preprocessing methods are applied. The first scenario identifies the number of people inside a typical seminar room from outside the room while people are seated or moving around in the monitored room. In the second scenario, we identify the number of people standing in a row in a classroom. In this scenario, a deep learning model is challenged to correctly estimate the number of people when people overlap and obscure one another. The third scenario detects the presence of a person in a T-shaped corridor by changing the position of a person in the part of the corridor that is not visible from the measurement point. In light of the research findings in [15,21], where various preprocessing methods are classified into a few categories, we choose four representative preprocessing methods for comparison, each of which belongs to a different category. The first method, which uses the median feature, belongs to the time statistics category. Among the methods in the category using the frequency feature, we use the FFT (Fast Fourier Transform) method as the second alternative method. The third method uses PCA (Principal Component Analysis) and belongs to the dimensionality reduction category. The fourth method, which utilizes the characteristics of CNN, considers the measured CSI as a grayscale image. The experimental results show that, compared to existing preprocessing techniques, the proposed method improves the people-counting accuracy of a deep learning model in all experimental environments even when we use low-cost commercial off-the-shelf Wi-Fi transceivers having only one receive antenna and one transmit antenna.

The rest of the paper is organized as follows. In Section 2, we present the related works. We explain the experiment scenarios and the datasets measured in each scenario in Section 3. In Section 4, we detail our CSI amplitude coloring method and discuss the experimental results in Section 5. We conclude the paper with future research directions in Section 6.

2. Related Works

2.1. Channel State Information and Its Relation to Wi-Fi Sensing

CSI in a Wi-Fi system is a collection of data that describes how a wireless signal propagates from a transmitter to a receiver. CSI captures the characteristics of wireless signal propagation in physical environments, which includes the combined effect of diffraction, reflection, and scattering. Contemporary wireless networks adhering to the IEEE 802.11 standard employ advanced techniques like Multiple-Input Multiple-Output (MIMO) and Orthogonal Frequency Division Multiplexing (OFDM) at the physical layer. These technologies aim to enhance the data capacity and improve channel orthogonality in the environments affected by multipath propagation. For each transmitter–receiver antenna pair, CSI provides detailed information about the phase shift and amplitude attenuation experienced by the signal across multiple paths for each subcarrier. Specifically, the CSI for a subcarrier $k$ between a transmitter with $N_t$ antennas and a receiver with $N_r$ antennas is often represented by a complex matrix $H(k)$:

$$H(k) = \begin{bmatrix} h_{11}(k) & \dots & h_{1 N_t}(k) \\ h_{21}(k) & \dots & h_{2 N_t}(k) \\ \vdots & \ddots & \vdots \\ h_{N_r 1}(k) & \dots & h_{N_r N_t}(k) \end{bmatrix}, \quad k \in [1, K],$$

where $K$ represents the number of subcarriers used by a Wi-Fi system, and $h_{rc}(k)$ represents the complex channel gain from the $c$-th transmitting antenna to the $r$-th receiving antenna. Each element $h_{rc}(k)$ in the CSI matrix $H(k)$ can be further decomposed into the amplitude and phase components as follows:

$$h_{rc}(k) = |h_{rc}(k)| \cdot \exp(j \theta_{rc}(k)), \tag{1}$$

where $|h_{rc}(k)|$ is the amplitude component, and $\theta_{rc}(k)$ is the phase component of $h_{rc}(k)$.
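The decomposition in Equation (1) can be sketched in a few lines of NumPy. This is only an illustration: the function name `split_amplitude_phase` and the four complex gains are made up for the example and do not come from the measurements described in this paper.

```python
import numpy as np

def split_amplitude_phase(h):
    """Return (|h|, theta) for a complex CSI array of any shape."""
    h = np.asarray(h, dtype=np.complex128)
    return np.abs(h), np.angle(h)

# Hypothetical channel gains for K = 4 subcarriers, one Tx and one Rx antenna.
h = np.array([1 + 1j, 2j, -3.0, 0.5 - 0.5j])
amplitude, phase = split_amplitude_phase(h)

# Recombining amplitude and phase as in Equation (1) recovers the gains.
reconstructed = amplitude * np.exp(1j * phase)
```

Only `amplitude` would be carried forward in an amplitude-based pipeline; `phase` is computed here just to show that the two parts together reproduce the original values.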

Since CSI contains both amplitude and phase information for each subcarrier and each pair of transmitting and receiving antennas, it offers higher-resolution data compared to simple RSS measurements. Therefore, CSI data can be regarded as Wi-Fi images of the signal propagation environment. Changes in the environment caused by the presence and movement of people affect the Wi-Fi signal propagation, and these alterations in the signal path cause variations in the CSI. Since different numbers of people create subtle but distinct patterns in the CSI, it is possible to count the number of people by learning the patterns of CSI variations, which is the principal concept behind the CSI-based people-counting system.

Of the two components in each $h_{rc}(k)$, the phase component has been mainly used in model-based Wi-Fi sensing systems that aim to explicitly describe the Wi-Fi signal propagation environment [22]. However, the presence of random offsets in the phase component makes it challenging to extract stable phase values when compared to the amplitude component [23]. Therefore, deep learning-based Wi-Fi sensing systems usually utilize the amplitude component. Specifically, deep learning-based Wi-Fi sensing systems stabilize the amplitude component through preprocessing to make it more suitable for deep learning model training [9]. Our research aligns with this trend. We utilize only the amplitude component of CSI data. It is well established that even a sophisticated deep learning model can only perform as well as the quality of the data fed into the input layer of the model allows. With this in mind, we propose an enhanced CSI amplitude data preparation method to increase the accuracy of a deep learning-based Wi-Fi sensing system for detecting the number of people in a wide range of real-world scenarios.

2.2. CSI Data Preparation Methods

A variety of CSI data preparation strategies have been developed to enhance the precision of Wi-Fi sensing systems that utilize deep learning, which includes methods to physically increase the number of subcarriers in the CSI measurements, strategies to filter out random noise from the measured CSI, approaches to convert CSI measurement data series into images, and techniques to reduce data dimensionality.

Generally, as the dimension of the input layer in a deep learning model increases, the input data contain a more diverse set of features. Accordingly, as the input data dimension increases, the deep learning model becomes able to approximate more complex functions more correctly. To improve the accuracy of the deep learning models for Wi-Fi sensing, it has been proposed to expand the dimension of CSI data by increasing the number of subcarriers (i.e., $K$) and the number of transmitting and receiving antennas (i.e., $N_t$ and $N_r$). As $K$ increases, more detailed information about the Wi-Fi signal propagation environment can be obtained. According to the Wi-Fi standard, both $K$ and the data transmission rate increase with the bandwidth of a Wi-Fi channel. For example, in the IEEE 802.11/ac standard, when the Wi-Fi bandwidth is 20 MHz, the total number of subcarriers is 56, whereas when the bandwidth increases to 40 MHz, the number of subcarriers increases to 114, resulting in an approximately 2.3 times faster data transmission speed compared to the 20 MHz case [24]. However, since the channel bandwidth of a Wi-Fi system is determined by Wi-Fi standards [25], it is not possible to arbitrarily increase the number of subcarriers in CSI data. To address this issue, the authors in [12] proposed a method of splicing multiple Wi-Fi channels to increase the number of subcarriers in CSI data for Wi-Fi sensing. As can be seen in Equation (1), CSI data are measured for each pair of transmitting and receiving antennas. Therefore, the dimension of CSI data increases with the number of transmitting and receiving antenna pairs, even when the Wi-Fi channel bandwidth is fixed [26,27]. To enhance data transmission speeds in modern laptops and smartphones, most COTS Wi-Fi chips are equipped with multiple transmit and receive antennas. However, IoT systems like Wi-Fi sensing systems are made up of numerous microcontrollers, each typically equipped with a Wi-Fi chip having a single Tx and Rx antenna. Although the price difference between Wi-Fi chips with multiple antennas and those with a single antenna is not significant, deploying a Wi-Fi sensing system over a large area, such as a building, requires a vast number of microcontrollers. As a result, using Wi-Fi chips with multiple antennas can significantly increase the overall system installation cost.

Strategies for removing random noise from CSI measurements consider CSI data arranged by measurement time as time-series data. These methods can be classified into two categories. The methods in the first category calculate statistical measures such as the median and variance of CSI data within a fixed time window to remove noise contained in the CSI measurements in the time domain [13,15]. The methods in the second category operate in the frequency domain. They remove noise in the CSI data by transforming the CSI time series into the frequency domain and then eliminating the high-frequency components [28,29]. These noise removal methods can easily remove impulse noise or high-frequency noise. However, since the noise present in the CSI data is not known in advance, the detailed CSI information useful for classification may be lost during the noise removal process.
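Both noise-removal categories can be illustrated with a minimal NumPy sketch. It is not taken from the cited works: the function names, the window length, and the number of retained frequency bins are assumptions chosen for the example, and the signals are synthetic.

```python
import numpy as np

def sliding_median(x, window):
    """Time-domain: median of every length-`window` segment along axis 0."""
    x = np.asarray(x, dtype=float)
    views = np.lib.stride_tricks.sliding_window_view(x, window, axis=0)
    return np.median(views, axis=-1)

def fft_lowpass(x, keep_bins):
    """Frequency-domain: keep only the lowest `keep_bins` bins along axis 0."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x, axis=0)
    spectrum[keep_bins:] = 0          # discard the high-frequency components
    return np.fft.irfft(spectrum, n=x.shape[0], axis=0)

# Synthetic amplitude series: a flat level with one impulse.
spiky = np.ones(9)
spiky[4] = 10.0
despiked = sliding_median(spiky, window=3)

# Synthetic amplitude series: a slow variation plus a fast fluctuation.
t = np.arange(64)
slow = np.cos(2 * np.pi * t / 64)
fast = 0.5 * np.cos(2 * np.pi * 20 * t / 64)
smoothed = fft_lowpass(slow + fast, keep_bins=4)
```

In this toy case the median removes the impulse and the low-pass step returns the slow component. If the fast component were informative for a class, it would be discarded in the same way, which is the loss of detail noted above.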

Spectrograms are often used to convert CSI time-series data into images that contain information in both the time domain and the frequency domain [30,31]. By using spectrograms, features of CSI data can be extracted simultaneously in both domains, which can help to improve the performance of Wi-Fi sensing. However, generating spectrograms is computationally expensive, which poses challenges for real-time analysis on resource-constrained Wi-Fi devices. Additionally, it requires setting multiple parameters, which increases the likelihood of inaccurate results due to parameter setting errors.
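To make the parameter dependence concrete, the following is a minimal short-time Fourier transform sketch in NumPy. The function name, the Hann taper, and the `window` and `hop` values are assumptions for illustration; the cited works may configure their spectrograms differently. The input is a synthetic 5 Hz tone sampled at 100 Hz.

```python
import numpy as np

def spectrogram(x, window, hop):
    """Squared-magnitude STFT: rows are frequency bins, columns are frames."""
    x = np.asarray(x, dtype=float)
    taper = np.hanning(window)                     # reduces spectral leakage
    starts = range(0, len(x) - window + 1, hop)
    frames = np.stack([x[s:s + window] * taper for s in starts])
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T

fs = 100                                   # samples per second
n = np.arange(200)
series = np.sin(2 * np.pi * 5 * n / fs)    # synthetic 5 Hz fluctuation
spec = spectrogram(series, window=100, hop=50)
```

With a 100-sample window the frequency resolution is 1 Hz, so the tone's energy concentrates in bin 5. Changing `window` or `hop` changes both the image size and the time-frequency trade-off, which is why the parameter choice matters.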

Even though high-dimensional CSI data can help to improve the accuracy of Wi-Fi sensing, they can also lead to increased computational costs and model over-fitting. Therefore, methods for generating low-dimensional features have been proposed. They extract only the important information from the measured CSI data by using PCA (Principal Component Analysis) or an autoencoder [32,33]. These techniques can maintain the main components of the measured CSI data while reducing noise. However, some useful information may be lost in the process of reducing dimensions, and the complexity of calculations may make real-time application difficult.
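As an illustration of the dimensionality-reduction idea, PCA can be sketched with a singular value decomposition. The function name and the synthetic data (200 samples, 52 features driven by two latent factors) are assumptions for the example and are not the paper's dataset.

```python
import numpy as np

def pca_reduce(x, n_components):
    """Project rows of x (samples x features) onto the top principal axes."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return centered @ vt[:n_components].T, explained[:n_components]

# Synthetic amplitudes: two latent factors mixed into 52 features plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 52))
x = latent @ mixing + 0.01 * rng.normal(size=(200, 52))

reduced, ratio = pca_reduce(x, n_components=2)
```

Here two components capture almost all of the variance because the data were built that way. For measured CSI, any class-specific detail that falls outside the retained components is lost.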

3. CSI Measurement Environments

3.1. Experimental Scenarios

Prior research has predominantly focused on CSI measurements in scenarios where there is a line-of-sight (LoS) between a Wi-Fi transmitter and a CSI receiver. In addition, these studies typically concentrate on counting the number of stationary individuals in the environment. In this study, as shown in Figure 1, we collect CSI data in several challenging scenarios that push the boundaries of traditional Wi-Fi sensing. These challenging environments are chosen to rigorously test the robustness and versatility of the CSI-based people-counting system. By showing the successful operation of our method under these conditions, we aim to demonstrate the potential of our approach in real-world situations where ideal line-of-sight conditions are rarely available.

In the first scenario, named TTW (Through The Wall), we set up an environment where the Wi-Fi transmitter and the CSI measurement device are separated by two walls. This setup tests the system’s ability to penetrate physical obstacles. In this scenario, people are in a typical seminar room. To capture the diverse situations that may occur in a typical seminar room, we conduct a series of CSI measurements. During these measurements, we systematically vary the location and movement patterns of occupants within the seminar room. This approach allows us to encompass a wide range of potential spatial configurations and mobility dynamics that are commonly encountered in seminar room environments. The specific variations we implement are as follows: all people are sitting in chairs and do not move around; all people are standing in a row along a wall and do not move; people are clustered and the number of clusters is changed; some people are walking around while the others are sitting; and all people are moving around in random directions and at random velocities. We do not impose any restrictions on their movement. Since each participant is free to decide their movement speed and direction based on their own judgment, the measured CSI data include various CSI patterns caused by the movement of people. We vary the number of people in the room from zero to five. For each variation in the seminar room, we also conduct CSI measurements in a setting where the Wi-Fi signal propagation is disrupted by individuals moving at random speeds and directions through the corridor between the two rooms. We also do not impose any restrictions on the movement patterns of the individuals disturbing the Wi-Fi signals. Thus, their movements are reflected as noise in the measured CSI.

The second CSI measurement scenario is called Queuing. For this scenario, we position individuals in a linear formation between the Wi-Fi transmitter and the CSI receiver at the front of a classroom. This arrangement creates a challenging environment for the system. Since people are overlapped between a transmitter and a CSI receiver, the differences in the CSI characteristic change patterns between classes are small. Thus, it becomes difficult for a Wi-Fi sensing system to accurately determine the number of people. In this scenario, we change the number of people in a row from zero to four.

The third scenario is named Corner. In this scenario, we explore situations where a measurement target is not directly visible from the location of Wi-Fi transceivers. This setup tests the system’s capability to detect a person in occluded areas. We position one person around a corner in a hallway. In addition, we colocate the Wi-Fi transmitter and a CSI receiver at the other corner so that the subject is not within direct line-of-sight of the Wi-Fi transceivers. The Corner scenario is designed to address a specific safety concern: preventing collisions with an individual emerging from blind corners. This approach is different from our previous experiment scenarios, which focus on counting people within a defined area. In this Corner scenario, our primary objective is to detect the presence of an individual from a corner with obstructed visibility. For this purpose, we position the transmitter and the CSI receiver 2.5 m away from the corner, and place one person at distances of 1 m, 3 m, and 5 m from the opposite corner. We then measure CSI for each of these scenarios.

3.2. CSI Measurement Tools and Deep Learning Model

To capture the CSI from Wi-Fi signals, several CSI extraction tools have emerged, each with its unique capabilities. These include the Intel 5300 NIC (Network Interface Card) [34], Atheros CSI Tool [12], and Nexmon CSI Tool [35], all of which have been instrumental in developing practical sensing platforms. Intel 5300 NIC is the first and widely used tool. It captures 30 subcarriers at a 20 MHz Wi-Fi channel bandwidth. The Atheros CSI tool improves the resolution by recording 56 subcarriers at 20 MHz bandwidth and 114 subcarriers at 40 MHz bandwidth. The Nexmon CSI tool is the first to enable CSI recording on smartphones and Raspberry Pi. It can capture up to 256 subcarriers at 80 MHz bandwidth.

For our study, we install the Nexmon CSI tool on a Raspberry Pi 4 with a single Wi-Fi receiving antenna to capture CSI [35]. We use an ESP8266 as the Wi-Fi transmitter to send Wi-Fi frames using only one transmitting antenna on the narrowest 20 MHz Wi-Fi channel. Specifically, we configure the ESP8266 to send a UDP segment with a 1-byte payload every 10 milliseconds over a 20 MHz Wi-Fi channel in the 2.4 GHz band.
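The periodic traffic pattern can be sketched as follows. This is a host-side Python analogue and not the ESP8266 firmware used in the experiments, which is not shown in this paper. The function names are hypothetical, the destination host and port must be supplied by the caller, and a sleep-based loop only approximates a 10 ms period.

```python
import socket
import time

def send_probes(send, count, period_s=0.010, payload=b"\x00", sleep=time.sleep):
    """Call send(payload) `count` times, pausing `period_s` after each call."""
    for _ in range(count):
        send(payload)      # 1-byte payload by default
        sleep(period_s)    # approximate pacing; real timing will jitter

def make_udp_sender(host, port):
    """Return a send(payload) callable backed by a UDP socket."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    return lambda payload: sock.sendto(payload, (host, port))
```

Passing `send` and `sleep` as arguments keeps the pacing logic separate from the socket, so the loop can be exercised without any network access. A real run would pass the result of `make_udp_sender` with an address chosen by the experimenter.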

Since the purpose of our study is to devise a novel CSI preprocessing method, we use CNN as the deep learning model for Wi-Fi sensing. This choice is motivated by the widespread adoption of CNNs in Wi-Fi sensing applications, owing to their effectiveness in processing spatial data and their ability to automatically learn hierarchical features. By using a well-established model, we can more clearly demonstrate the impact of our proposed preprocessing method, isolating its effects from the complexities of newer or less common deep learning architectures.

4. Indoor Occupancy Detection

4.1. CSI Amplitude Coloring

In Figure 2, we depict the pipeline for people counting through CSI amplitude coloring. We denote the number of subcarriers measured by the Nexmon as $S$. We also denote the number of classes that a Wi-Fi sensing system is requested to classify as $C$. Since the ESP8266 transmits a UDP packet every 10 ms for CSI measurement, we discretize time based on the CSI measurement period. In other words, we use the $t$-th measured CSI data interchangeably with the CSI data measured at time $t$. We denote the amplitude component of a subcarrier $s$ in the CSI data measured at time $t$ and belonging to a class $c$ as $a_{c,s}(t)$. Then, at the $t$-th measurement time, the amplitude vector of the CSI data belonging to class $c$ is given as

$$A_c(t) = (a_{c,1}(t), \dots, a_{c,S}(t)). \tag{2}$$

We construct the basic data samples for CNN by considering both the temporal correlation of CSI data and the spatial correlation between subcarriers. Specifically, we collect the vectors $A_c(t)$ measured over $h$ consecutive measurement times to form the basic input data sample $X_c(t : t+h)$ for the class $c$ as follows:

$$X_c(t : t+h) = (A_c(t), \dots, A_c(t+h))^{T} = \begin{bmatrix} a_{c,1}(t) & \dots & a_{c,1}(t+h) \\ a_{c,2}(t) & \dots & a_{c,2}(t+h) \\ \vdots & \ddots & \vdots \\ a_{c,S}(t) & \dots & a_{c,S}(t+h) \end{bmatrix},$$

where $X_c(t : t+h)$ can be regarded as an image of the Wi-Fi signal propagation environment in grayscale. Since the difference between $A_c(t)$ and $A_c(t+1)$ is marginal, we construct $X_c(t : t+h)$ every $\tau$ measurement times from the collected CSI data for a class $c$. In other words, we create a dataset for a class $c$ as $(X_c, c)$, where $X_c = \{X_c(0 : h), X_c(\tau : \tau + h), \dots\} = \{X_c((t-1)\tau : (t-1)\tau + h) : t = 1, 2, \dots\}$. Then, the dataset for CNN is given as

$$D = \{(X_c, c) : \forall c \in [1, C]\}. \tag{3}$$
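The sample construction can be sketched as a strided windowing step. The function name `make_samples`, the time-major `(T, S)` input layout, and the 0-based indexing are assumptions of this illustration, and the toy input is a simple counter, not measured CSI.

```python
import numpy as np

def make_samples(amplitudes, h, tau):
    """Slice a (T, S) amplitude series into S x (h + 1) samples every tau steps."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    total = amplitudes.shape[0]
    starts = range(0, total - h, tau)      # each sample uses indices t .. t + h
    # Transpose so that rows are subcarriers and columns are measurement times.
    return np.stack([amplitudes[t:t + h + 1].T for t in starts])

# Toy series with T = 10 measurements of S = 3 subcarriers: value = 3 * t + s.
amplitudes = np.arange(30).reshape(10, 3)
samples = make_samples(amplitudes, h=4, tau=2)

# Pair every sample with its class label (label 2 is arbitrary here).
dataset = [(x, 2) for x in samples]
```

Each element of `samples` corresponds to one grayscale image, and the labeled pairs in `dataset` correspond to one class's contribution to the dataset in Equation (3).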

To expand each

a_{c, s} (t) (\forall c \in [1, C] \cap s \in [1, S])

into a three-dimensional vector, we use the jet colormap [16]. The jet colormap is a rainbow color palette that includes red (R), green (G), and blue (B) colors. It visualizes data values through a continuous change from blue (lowest value) to red (highest value), with middle values represented by green, yellow, and other intermediate colors. While the jet colormap changes colors sequentially, it shows strong color changes in the middle range and in areas with noticeable color transitions, which makes data boundaries or features potentially appear more exaggerated than they actually are [20]. We utilize this characteristics of the jet colormap to amplify subtle differences between CSIs belonging to different classes but that have similar values. Specifically, the jet colormap consists of the following three functions

(f_{R} (x), f_{G} (x), f_{B} (x))

, each of which expands scalar data x into R, G, and B channels:

f_{R} (x) = \{\begin{matrix} 0 & if 0 \leq x \leq 0.35 \\ 3.23 (x - 0.35) & if 0.35 \leq x \leq 0.66 \\ 1 & if 0.66 \leq x \leq 0.89 \\ - 4.55 (x + 0.89) & if 0.89 \leq x \leq 1 \end{matrix}

(4)

f_{G} (x) = \{\begin{matrix} 0 & if 0 \leq x \leq 0.125 \\ 4 (x - 0.125) & if 0.125 \leq x \leq 0.375 \\ 1 & if 0.375 \leq x \leq 0.640 \\ - 3.703 (x + 0.64) & if 0.640 \leq x \leq 0.910 \\ 0 & if 0.910 \leq x \leq 1 \end{matrix}

(5)

f_{B} (x) = \{\begin{matrix} 4.55 x & if 0 \leq x \leq 0.11 \\ 1 & if 0.11 \leq x \leq 0.34 \\ - 3.23 (x + 0.34) & if 0.34 \leq x \leq 0.65 \\ 0 & if 0.65 \leq x \leq 1 \end{matrix}

(6)
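The channel functions in Equations (4)–(6) are piecewise linear, so they can be evaluated by interpolating between breakpoints. The following is a minimal sketch rather than the authors' implementation: the breakpoints are those of Matplotlib's jet colormap [16] (including a blue value of 0.5 at x = 0 and a red value of 0.5 at x = 1), and the names `JET_POINTS`, `interp`, and `jet_rgb` are hypothetical.

```python
# Control points (x, value) per channel, following Matplotlib's jet definition.
JET_POINTS = {
    "R": [(0.0, 0.0), (0.35, 0.0), (0.66, 1.0), (0.89, 1.0), (1.0, 0.5)],
    "G": [(0.0, 0.0), (0.125, 0.0), (0.375, 1.0), (0.64, 1.0), (0.91, 0.0), (1.0, 0.0)],
    "B": [(0.0, 0.5), (0.11, 1.0), (0.34, 1.0), (0.65, 0.0), (1.0, 0.0)],
}


def interp(x, points):
    """Piecewise-linear interpolation through sorted (x, value) control points."""
    x = min(max(x, 0.0), 1.0)  # clamp to the normalized range [0, 1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return points[-1][1]


def jet_rgb(x):
    """Expand a normalized scalar x in [0, 1] into an (R, G, B) triple."""
    return tuple(interp(x, JET_POINTS[ch]) for ch in ("R", "G", "B"))


for x in (0.0, 0.5, 1.0):
    print(x, jet_rgb(x))
```

In the steep segments, a small input difference becomes a larger output difference: moving x from 0.40 to 0.45, for instance, raises the red channel by about 0.16.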

In Figure 3, we also show

f_{R} (x), f_{G} (x)

, and

f_{B} (x)

graphically. To expand

a_{c, s} (t)

into three (i.e., RGB) channels by using the jet colormap, we first normalize

a_{c, s} (t)

in each

X_{c} (t : t + h)

via min-max normalization. In other words, if we denote

a_{c, s}^{M} (t) = \max_{a_{c, s} (t) \in X_{c} (t : t + h)} a_{c, s} (t)

and

a_{c, s}^{m} (t) = \min_{a_{c, s} (t) \in X_{c} (t : t + h)} a_{c, s} (t)

,

a_{c, s} (t)

is scaled as

{\tilde{a}}_{c, s} (t) = \frac{a_{c, s} (t) - a_{c, s}^{m} (t)}{a_{c, s}^{M} (t) - a_{c, s}^{m} (t)} .

(7)

Then, the nonlinear transformation in Equation (8) divides

{\tilde{a}}_{c, s} (t)

into three channels:

\begin{matrix} R ({\tilde{a}}_{c, s} (t)) = f_{R} ({\tilde{a}}_{c, s} (t)) \\ G ({\tilde{a}}_{c, s} (t)) = f_{G} ({\tilde{a}}_{c, s} (t)) \\ B ({\tilde{a}}_{c, s} (t)) = f_{B} ({\tilde{a}}_{c, s} (t)) \end{matrix}.

(8)

As we can observe in Figure 3, the larger the value of

{\tilde{a}}_{c, s} (t)

, the more it is transformed into the R channel. The smaller the value, the more

{\tilde{a}}_{c, s} (t)

is transformed into the B channel and the middle-range values of

{\tilde{a}}_{c, s} (t)

are mainly transformed into the G channel. In addition, the absolute values of the slopes of the linear parts in

f_{R} (x), f_{G} (x)

and

f_{B} (x)

are larger than one. Therefore, the coloring functions amplify the differences between

{\tilde{a}}_{c, s} (t)

values within these linear transformation ranges, which helps CNN to distinguish between the

{\tilde{a}}_{c, s} (t)

values in these ranges more easily.

For each component of

a_{c, s} (t)

, we construct segmented basic input data samples as follows:

X_{R, c} (t : t + h) = \begin{bmatrix} R ({\tilde{a}}_{c, 1} (t)) & \dots & R ({\tilde{a}}_{c, 1} (t + h)) \\ R ({\tilde{a}}_{c, 2} (t)) & \dots & R ({\tilde{a}}_{c, 2} (t + h)) \\ \vdots & \ddots & \vdots \\ R ({\tilde{a}}_{c, S} (t)) & \dots & R ({\tilde{a}}_{c, S} (t + h)) \end{bmatrix},

X_{G, c} (t : t + h) = \begin{bmatrix} G ({\tilde{a}}_{c, 1} (t)) & \dots & G ({\tilde{a}}_{c, 1} (t + h)) \\ G ({\tilde{a}}_{c, 2} (t)) & \dots & G ({\tilde{a}}_{c, 2} (t + h)) \\ \vdots & \ddots & \vdots \\ G ({\tilde{a}}_{c, S} (t)) & \dots & G ({\tilde{a}}_{c, S} (t + h)) \end{bmatrix},

X_{B, c} (t : t + h) = \begin{bmatrix} B ({\tilde{a}}_{c, 1} (t)) & \dots & B ({\tilde{a}}_{c, 1} (t + h)) \\ B ({\tilde{a}}_{c, 2} (t)) & \dots & B ({\tilde{a}}_{c, 2} (t + h)) \\ \vdots & \ddots & \vdots \\ B ({\tilde{a}}_{c, S} (t)) & \dots & B ({\tilde{a}}_{c, S} (t + h)) \end{bmatrix}.

These

X_{R, c} (t : t + h)

s,

X_{G, c} (t : t + h)

s, and

X_{B, c} (t : t + h)

s are fed into the CNN module to determine the number of people.
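Putting Equations (7) and (8) together, one window can be turned into a three-channel sample as sketched below. This is an illustration under stated assumptions, not the authors' code: the minimum and maximum are taken over the whole window (one reading of Equation (7); a per-subcarrier normalization would be an alternative), a constant window is mapped to zeros to avoid division by zero, the breakpoints follow Matplotlib's jet colormap [16], and all names are hypothetical.

```python
# Jet control points (x, value) per channel, following Matplotlib's definition.
JET = {
    "R": [(0.0, 0.0), (0.35, 0.0), (0.66, 1.0), (0.89, 1.0), (1.0, 0.5)],
    "G": [(0.0, 0.0), (0.125, 0.0), (0.375, 1.0), (0.64, 1.0), (0.91, 0.0), (1.0, 0.0)],
    "B": [(0.0, 0.5), (0.11, 1.0), (0.34, 1.0), (0.65, 0.0), (1.0, 0.0)],
}


def channel(x, points):
    """Piecewise-linear channel function for a normalized value x in [0, 1]."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return points[-1][1]


def min_max_normalize(window):
    """Scale every amplitude in an S x h window to [0, 1] (Equation (7))."""
    flat = [a for row in window for a in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:  # constant window: nothing to scale (assumption of this sketch)
        return [[0.0 for _ in row] for row in window]
    return [[(a - lo) / span for a in row] for row in window]


def color_window(window):
    """Return a 3 x S x h nested list holding the R, G, and B matrices."""
    norm = min_max_normalize(window)
    return [[[channel(a, JET[ch]) for a in row] for row in norm]
            for ch in ("R", "G", "B")]


window = [[1.0, 2.0, 3.0], [2.0, 4.0, 5.0]]  # S = 2, h = 3, made-up amplitudes
sample = color_window(window)
print(len(sample), len(sample[0]), len(sample[0][0]))
```

The three matrices share the S × h layout of the original window, so they can be stacked directly as the channels of one CNN input.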

4.2. CNN Model and Training

To evaluate the performance of the proposed preprocessing method, we configure a three-channel CNN as shown in Figure 4. The CNN used in the experiment consists of an input layer, hidden layers, and an output layer. Since the proposed method expands the

S \times h

CSI amplitude data into three channels, we set the dimension of the CNN input layer to

3 \times S \times h

. After taking

X_{R, c} (t : t + h), X_{G, c} (t : t + h)

, and

X_{B, c} (t : t + h)

at the input layer, convolution and pooling operations are performed at the first hidden layer. For the convolution operation, we use a

3 \times 3

kernel and the ReLU activation function. We set the padding size to one and the stride to one. For the pooling operation, we use a

2 \times 2

kernel with a stride of 2 and adopt the max pooling function. We set the dimension of the second hidden layer to

32 \times (S / 2) \times (h / 2)

and perform the convolution and pooling operations with the same hyper-parameters as those used for the first layer. We configure the dimension of the third hidden layer to

64 \times (S / 4) \times (h / 4)

. At this layer, we use the same hyper-parameters as those of the first layer for the convolution and pooling operations. The dimension of the fourth hidden layer is configured as

128 \times (S / 8) \times (h / 8)

. We perform a convolution operation with a

3 \times 3

kernel, a padding size of one, and a stride of one. After applying batch normalization and the ReLU activation function, we apply the adaptive average pooling operation to produce a

C \times 1 \times 1

tensor. Then, at the output layer, we apply the softmax function to determine the probability of the input CSI data belonging to each class.
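The layer dimensions described above can be checked with simple shape arithmetic. The sketch below is our own bookkeeping, not the authors' code: it assumes floor-mode pooling (the PyTorch default) and uses C = 5 classes purely as a placeholder.

```python
def conv_out(size, kernel=3, padding=1, stride=1):
    """Output length of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output length of max pooling along one axis (floor mode)."""
    return (size - kernel) // stride + 1


def trace_shapes(S, h, C):
    """List the (channels, rows, cols) shape after each block of the CNN."""
    shapes = [("input", (3, S, h))]
    rows, cols = S, h
    # Blocks 1-3: 3x3 convolution (size-preserving) followed by 2x2 max pooling.
    for name, channels in (("block 1", 32), ("block 2", 64), ("block 3", 128)):
        rows, cols = pool_out(conv_out(rows)), pool_out(conv_out(cols))
        shapes.append((name, (channels, rows, cols)))
    # Block 4: a 3x3 convolution to C channels keeps the size; adaptive average
    # pooling then collapses every channel to a single value.
    shapes.append(("block 4", (C, 1, 1)))
    return shapes


for name, shape in trace_shapes(S=52, h=20, C=5):
    print(name, shape)
```

For S = 52 and h = 20, the feature maps shrink to 26 × 10, 13 × 5, and 6 × 2, so S/8 and h/8 are to be read as integer divisions under this assumption.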

To train the CNN, we use cross entropy between the predicted class label and the true class label as the loss function, which is

Ω = - \frac{1}{N} \sum_{i = 1}^{N} \sum_{c = 1}^{C} y_{i, c} log (p_{i, c}),

(9)

where N is the number of data points, C is the number of classes, and

y_{i, c} = 1

if the i-th data point corresponds to class c; otherwise, it is zero.

p_{i, c}

is the probability estimated by the CNN that the i-th data point belongs to class c. We use the Adam optimizer. For model training, we set the number of epochs to 100, the batch size to 64, and the learning rate to 0.001.
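Equation (9) can be evaluated by hand for a tiny batch. The sketch below is illustrative only: the probabilities and labels are made up, and the function name is hypothetical. Because y_{i,c} is one only for the true class, each data point contributes a single logarithm.

```python
import math


def cross_entropy(probabilities, labels):
    """Mean cross entropy of Equation (9); labels are 1-based class indices."""
    total = 0.0
    for probs, label in zip(probabilities, labels):
        # Only the term of the true class survives the inner sum over c.
        total -= math.log(probs[label - 1])
    return total / len(labels)


# Two data points, three classes (made-up softmax outputs).
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
labels = [1, 2]
loss = cross_entropy(probs, labels)
print(round(loss, 4))
```

Here the loss is -(ln 0.7 + ln 0.8) / 2, roughly 0.29; it approaches zero as the probability assigned to the true class approaches one.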

Of the total CSI data, 80% is used for training and the remaining 20% for testing. We use a computer equipped with an Intel i5-12400F CPU, an Nvidia GeForce RTX 3060 GPU, and 32 GB of random access memory, running Windows 10. The CNN model is run with Python version 3.9.18 and PyTorch version 2.3.0.

5. Experimental Results and Discussions

In this section, we evaluate the performance of the proposed method in terms of IOD accuracy by using various CSI data measured in real-world experimental scenarios. For the experiments, we set

h = 20

,

τ = 10

, and

S = 52

.

5.1. Accuracy Comparison

In each scenario, we compare our method with four alternative approaches commonly used for CSI preprocessing. As the first alternative, we select a statistical method that takes a median value for each

a_{c, i} (t)

s in

X_{c}

(\forall i \in [1, S])

to smooth the temporal dynamics inherent in the measured CSI. Using the median of the measured CSIs is a representative method using statistical measures in the time domain, and it has been used in [15] and the references presented in Table 1 of [15]. Henceforth, we will call this method

Median

. The second comparison target is a signal processing method that takes the FFT on each

a_{c, i} (t)

s in

X_{c}

and filters out high-frequency components of the

a_{c, i} (t)

s. This approach has been used in [28]. We will name the method that removes the upper half of the high-frequency components from the CSI frequency components as

FFT

. The third alternative method, which we will call

PCA

, is a dimension reduction method using PCA [32]. For each subcarrier i (

i \in [1, S]

) in

X_{c} (t : t + h)

, PCA is applied to

{a_{c, i} (t), \dots, a_{c, i} (t + h)}

and obtains its k principal components. Therefore, the

PCA

method decreases the dimension of

X_{c} (t : t + h)

from

S \times h

to

S \times k

(in general,

k \ll h

). In the experiments, we test two values of k: two and four. CNNs are well known for their image classification capability, and to exploit it, a set of measured data is often treated as a gray image [36]. Our method goes one step further by expanding the one-dimensional CSI amplitude value of each subcarrier into a three-dimensional color value. Thus, to show the validity of the proposed method, we also present the accuracy of the CNN when a set of measured CSI amplitudes is used directly as a gray image (i.e.,

X_{c} (t : t + h)

). Hereafter, we will call this case

Gray Image

.

In Figure 5, we compare the CSI preprocessing methods in terms of the overall IOD accuracy of the CNN in various measurement scenarios when the input to the CNN is prepared by each method. If we denote the total number of tests for the i-th experiment as

n_{i}

, and the number of times the Wi-Fi sensing system accurately determines the number of people as

m_{i}

, then the overall IOD accuracy becomes

m_{i} / n_{i}

. We repeat the same experiments with different test datasets for ten iterations and depict box plots for

m_{i} / n_{i}

to compare not only the overall average prediction accuracy but also the variation in the overall prediction accuracy across various test datasets. In this figure, we observe that the relative efficacy of the conventional methods is contingent upon the specific Wi-Fi sensing environments. For example, in the TTW scenario, the median IOD accuracy is the highest with the

PCA

method with two principal components. However, in the other two scenarios, our method achieves the highest IOD accuracy: the first quartile (Q1, i.e., the 25th percentile) of the IOD accuracy obtained with our method is higher than the third quartile (Q3, i.e., the 75th percentile) obtained by the other methods.
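The accuracy m_i / n_i and the quartiles summarized by a box plot can be computed as in the following sketch. The accuracy values are made up for illustration, the method names are placeholders, and the quartile convention (linear interpolation, `method="inclusive"`) is an assumption, since the convention used for Figure 5 is not stated.

```python
import statistics


def overall_accuracy(predicted, actual):
    """Return m / n: the fraction of tests whose predicted count is correct."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def quartiles(accuracies):
    """Return (Q1, median, Q3) of the per-iteration accuracies."""
    q1, q2, q3 = statistics.quantiles(accuracies, n=4, method="inclusive")
    return q1, q2, q3


# Hypothetical accuracies from ten repeated runs of two methods (made-up numbers).
runs = {
    "method A": [0.91, 0.93, 0.92, 0.94, 0.90, 0.93, 0.92, 0.95, 0.91, 0.94],
    "method B": [0.80, 0.84, 0.78, 0.86, 0.82, 0.85, 0.79, 0.83, 0.81, 0.87],
}
summary = {name: quartiles(values) for name, values in runs.items()}
# Method A's Q1 exceeding method B's Q3 means the two boxes do not overlap.
print(summary["method A"][0] > summary["method B"][2])
```

A Q1 that lies above another method's Q3 indicates that at least three quarters of the runs of the first method beat three quarters of the runs of the second.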

To analyze the influence of the proposed method on the prediction accuracy for each number of people, we scrutinize the confusion matrices in Figure 6. A confusion matrix is a table used to evaluate the performance of a classification algorithm. It is particularly useful in supervised learning, where the goal is to predict labels for a set of instances. The confusion matrix allows us to see how well a Wi-Fi sensing system is performing by comparing the predicted labels to the actual labels (i.e., the number of people). Specifically, an element

A (i, j)

located in the i-th row and the j-th column of a confusion matrix represents the proportion of cases where the actual label is i but the Wi-Fi sensing system predicts j during the experiments.
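A confusion matrix of this kind can be built as follows. This is a minimal sketch with made-up labels; it assumes that each row is normalized by the number of tests with that actual label, so that A(i, j) reads as the fraction of class-i tests predicted as j, and the function name is hypothetical.

```python
def confusion_matrix(actual, predicted, num_classes):
    """Row-normalized confusion matrix: entry [i][j] = P(predicted j | actual i)."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A class with no test samples gets an all-zero row instead of a division error.
        matrix.append([n / total if total else 0.0 for n in row])
    return matrix


# Made-up labels: the true and predicted number of people for six tests.
actual = [0, 0, 1, 1, 1, 2]
predicted = [0, 1, 1, 1, 2, 2]
matrix = confusion_matrix(actual, predicted, num_classes=3)
for row in matrix:
    print([round(v, 2) for v in row])
```

The diagonal entries are the per-class accuracies, which is how values such as 0.74 or 0.91 for a given number of people are read from Figure 6.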

In the confusion matrices in Figure 6, we observe that while existing schemes result in very low classification accuracy in some environments and for some classes (i.e., the number of people), the proposed method achieves high classification accuracy across all environments and classes, with very small differences in classification performance between classes. For example, in the case of the

Median

method, the probability of correctly identifying two people when there are actually two people in the TTW environment is 0.74, and the probability of correctly identifying zero people when there are actually zero people in the Queuing environment is 0.32. In contrast, when the proposed method is applied, the probability of correctly predicting two people when there are actually two people in the TTW environment is increased to 0.91, and the probability of correctly identifying zero people when there are actually zero people in the Queuing environment is greatly improved to 0.99.

In Figure 5, we show the box plots for the overall IOD accuracy obtained by each method. We observe that except in the TTW scenario, our method outperforms the

PCA

methods in terms of the IOD sensing accuracy. In Figure 6, we also compare the

PCA

method to our method in terms of the confusion matrix. Let us denote

c_{T}

as the true class (i.e., the number of people). When we inspect the TTW scenario, we observe that even though the overall accuracy obtained by

PCA

method with two principal components (

k = 2

) is higher than the accuracy acquired by our method, our method outperforms the

PCA

method with

k = 2

for all

c_{T}

s except

c_{T} = 4

. When compared to the

PCA

method with four principal components (

k = 4

), our method achieves higher accuracy in all the scenarios and all the

c_{T}

s.

We also observe that the fewer principal components are used, the more information that might be useful for classification is removed from the data, which makes it difficult to distinguish between classes. For example, in the Queuing scenario, the accuracy of

PCA

with

k = 2

is 0.68 when

c_{T} = 2

, while the accuracy is 0.90 when

PCA

with

k = 4

is used. Conversely, as the number of principal components increases, the amount of deleted information decreases, but the possibility of including unnecessary information increases, which also makes it difficult to distinguish between classes. For example, in the Queuing scenario,

PCA

with

k = 4

produces an accuracy of 0.86 when

c_{T} = 3

, while it becomes 0.93 when

PCA

with

k = 2

is used. Therefore, it is difficult to find the optimal number of principal components for each scenario and for each class within each scenario. However, in both cases, our method achieves the highest accuracy. When

c_{T} = 2

in the Queuing scenario, the accuracy obtained by our method is 0.93, and it is 0.95 when

c_{T} = 4

.

5.2. Complexity Analysis of CSI Coloring

Each basic input data sample

X_{c} (t : t + h)

is composed of

S \times h

amplitude values. Given

X_{c} (t : t + h)

, the

Median

method performs a median operation on

{{\tilde{a}}_{c, s} (t), \dots, {\tilde{a}}_{c, s} (t + h - 1)}

for each subcarrier

s \in [1, S]

. Although the time complexity of the median operation depends on the underlying sorting algorithm, it is typically

O (h log h)

. Since there are S subcarriers, the time complexity of the

Median

method is

O (S h log h)

. In the case of the

FFT

method, FFT and IFFT operations are performed on

{\tilde{a}}_{c, s} (t + i)

s

(0 \leq i \leq h - 1)

for each subcarrier s. Since the time complexities of both FFT and IFFT are

O (h log h)

, the time complexity of the

FFT

method is

O (S h log h)

. The computational complexity of PCA depends on the applied method and the data size. When the PCA of data

X_{c} (t : t + h)

is computed by eigendecomposition, the time complexity becomes

O (S h^{2} + h^{3})

. When PCA is obtained by SVD (singular value decomposition), the computation complexity becomes

O (S h min (S, h))

.

When the CSI amplitude coloring method is applied, each

{\tilde{a}}_{c, s} (t + i)

in

X_{c} (t : t + h)

is expanded by

f_{R} ({\tilde{a}}_{c, s} (t + i))

,

f_{G} ({\tilde{a}}_{c, s} (t + i))

, and

f_{B} ({\tilde{a}}_{c, s} (t + i))

. To obtain the output values of these functions, only a fixed number of comparison operations are required. For example, as shown in Equation (4), to determine

f_{R} ({\tilde{a}}_{c, s} (t + i))

, at most three comparisons with

{\tilde{a}}_{c, s} (t + i)

are needed (i.e., comparison with 0.35, 0.66, and 0.89). Since the time complexity of the comparison operation is

O (1)

, the computational complexity of expanding each

{\tilde{a}}_{c, s} (t + i)

into three channels becomes

O (1)

. Since there are

S h

{\tilde{a}}_{c, s} (t + i)

s in

X_{c} (t : t + h)

, the time complexity of our CSI amplitude coloring method becomes

O (S h)

, which is the smallest among the preprocessing methods.
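To make these orders of growth concrete, the snippet below plugs S = 52 and h = 20 (the values used in our experiments) into the expressions above. These are growth terms with constant factors dropped and a base-2 logarithm chosen arbitrarily, so they indicate relative scale only, not measured running time; the function name and dictionary keys are hypothetical.

```python
import math


def growth_terms(S, h):
    """Evaluate each complexity expression for S subcarriers and window length h."""
    return {
        "median_or_fft": S * h * math.log2(h),  # O(S h log h)
        "pca_eig": S * h**2 + h**3,             # O(S h^2 + h^3)
        "pca_svd": S * h * min(S, h),           # O(S h min(S, h))
        "coloring": S * h,                      # O(S h)
    }


terms = growth_terms(S=52, h=20)
for name, value in sorted(terms.items(), key=lambda item: item[1]):
    print(f"{name:>14}: {value:,.0f}")
```

Under these assumptions, the coloring term is 1040, against roughly 4495 for the median and FFT term, 20,800 for PCA by SVD, and 28,800 for PCA by eigendecomposition.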

We measure the time it takes for each method to preprocess

X_{c} (t : t + h)

. When we measure the running time, we use the same hardware and software as those used to train the CNN. We show the measurement results in Table 1. In the table, the

Gray Image

column shows the time to make

X_{c} (t : t + h)

from the measured CSI dataset. The gray image shows the smallest amount of time because other preprocessing methods are performed after the gray image (i.e.,

X_{c} (t : t + h)

) is constructed. In the table, we observe that our method and the

PCA

method take a similar amount of time to preprocess

X_{c} (t : t + h)

. Our method is

15.12

times faster than the

Median

method. Compared to the

FFT

method, our method reduces the preprocessing time by a factor of

37.33

. Therefore, considering the sensing accuracy and preprocessing time, we believe that the proposed method is a better choice for resource-constrained devices than other conventional methods.

5.3. Influence of CSI Amplitude Coloring

To identify the root cause of these performance differences, we examine the CSI data distribution for class 2 (i.e., two people) in the Queuing scenario. In this case, when the

Gray Image

method is used, the rate at which the CNN incorrectly classifies CSI data belonging to class 2 as class 4 is

36 %

However, when our method is used, the misclassification rate dramatically decreases to

4 %

. To understand the factors contributing to the improved results, we conduct an analysis using t-SNE (t-Distributed Stochastic Neighbor Embedding) plots. t-SNE is a machine learning algorithm used for the visualization of high-dimensional data [37]. t-SNE is a popular tool for visualizing complex datasets in a low-dimensional space (typically 2D or 3D), making it easier to identify patterns, clusters, and relationships that might be hidden in higher dimensions. Figure 7 shows the t-SNE plots of the CSI data at the input layer of CNN, and Figure 8 shows the t-SNE plots of the CSI data at the last layer of the CNN, which contains the features that CNN uses for indoor occupancy classification. In these figures, black dots indicate CSI data that both belong to and are correctly classified as class 2 (denoted as C2P2), while blue dots signify the CSI data that are both from class 4 and correctly classified (denoted as C4P4). Red dots represent CSI data belonging to class 2 but mistakenly classified as class 4. They are denoted as C2P4. The spatial distribution of the data points in Figure 7a reveals that the red dots are in closer proximity to the blue dots compared to the black dots. This arrangement indicates that in the CNN feature space shown in Figure 8a, the CSI data associated with C2P4 are more closely related to the CSI data from C4P4 than to the CSI data from C2P2. As a result, CNN mistakenly categorizes the CSI data from C2P4 as belonging to class 4.

Figure 7b, Figure 7c, and Figure 7d respectively show the t-SNE plots for the red, green, and blue channel data when the measured CSI data belonging to each class are segmented into each channel. In the green channel, we observe that the CSI data for C2P4 remain closer to C4P4 than to C2P2. However, this relationship inverts in the red and blue channels, where C2P4 data align more closely with C2P2 data than with C4P4 data because our CSI data dimension expansion method amplifies the subtle differences between C2P4 and C4P4. In Figure 8b–d, we also examine the t-SNEs of the final classification features extracted after the data from each channel have been processed through CNN. These figures reveal that when red channel data and blue channel data are fed into CNN, the red dots (C2P4 data) are positioned nearer to the black dots (C2P2 data) than to the blue dots (C4P4 data). Figure 8e illustrates the t-SNE plot for the data collected at the last layer of CNN after all RGB channel data go through CNN. As evidenced in this figure, by partitioning the measured CSI data into three distinct channels, CNN gains the ability to delineate more distinct boundaries between CSI data from different classes. In other words, the tripartite channel approach that amplifies the subtle differences among the classes in terms of the measured CSI allows CNN to leverage varied data representations, leading to improved discrimination between classes.

5.4. Performance on Other Datasets

To further validate the proposed method, we apply it to publicly available datasets for people counting by Wi-Fi sensing. The first is the EHUCOUNT dataset described in [15]. The second is the dataset in [38], which we henceforth call the RTV dataset.

The EHUCOUNT dataset is constructed by capturing Wi-Fi signals in six different indoor scenarios across facilities of the Faculty of Engineering of the University of the Basque Country. The six scenarios are A (Office), B (Lab), C (Corridor), D (Hall + Stairs), E (Corridor), and F (Corridor). Depending on the scenario, the number of people ranges from 3 to 5, and the number of CSI traces for each combination of scenario and number of people ranges between 12,000 and 15,000. In the corridor scenarios, people maintain a direction for a while before changing it, whereas in the room scenarios (A, B, and D), people wander. In all scenarios, volunteers are instructed to move slower than 3 km/h. The RTV dataset is constructed by collecting the CSI data in three rooms, which differ in size, the number of furniture pieces, and their locations. Room A is a small office room (5 m × 5 m), room B is a medium-size meeting room (5 m × 9 m), and room C is a large meeting room (6 m × 12.5 m). In each scenario, up to 7 people are in the room, and 5000 CSI samples are collected for each number of people (i.e., 0 to 7). During the CSI collection, the people in the rooms move around randomly or stand still without any guidelines.

Table 2 and Table 3 show the average sensing accuracy when different CSI preprocessing methods are applied to each dataset. We observe that in most scenarios, the

Median

method shows the best performance. This is attributed to the fact that the CSI measurement environments for the EHUCOUNT and RTV datasets are more favorable for Wi-Fi signal propagation than those of our datasets. The biggest difference between the experimental environments of these datasets and ours is that in the experimental settings for the EHUCOUNT and RTV datasets, the transmitter and CSI receiver are located in the same space, while in our experimental environments (especially the TTW and Corner scenarios), the transmitter and CSI receiver are located in different spaces. In the case of our Queuing scenario, even though the transmitter and CSI receiver are in the same space, there is no direct signal path between them. In other words, while the EHUCOUNT and RTV datasets measure CSI in line-of-sight (LoS) environments between the transmitter and receiver, we measure CSI in non-line-of-sight (NLoS) environments. Thus, more noise is included in our CSI dataset. As a result, the statistical characteristics of CSIs belonging to different classes become less distinct and are more affected by noise when our dataset is used. Because the statistical differences between the CSI data belonging to different classes are large in the EHUCOUNT and RTV datasets, the

Median

method effectively distinguishes between classes. However, we observe in these tables that our method is comparable to the other CSI preprocessing methods in terms of sensing accuracy.

To test the generality and the robustness of our method, we extend the experimental scope by carrying out additional experiments in different indoor environments. The first dataset is collected in a typical laboratory environment, which we will call the

LAB

dataset. The size of the laboratory is 5.6 m × 3.4 m. We locate a CSI receiver on the table positioned at the center of the laboratory and place a transmitter at the center of the left side of the laboratory. The laboratory is equipped with typical office furniture and supplies, including seven tables and six chairs. Up to six people participate in the experiments in the

LAB

environment. Each participant sits on the designated chair and works on a computer or reads papers but does not move around during the experiment. The second dataset, which we name

TTW2

, is constructed in an environment similar to

TTW

. However, the measurement location for the

TTW2

dataset is different from that for the

TTW

dataset. The size of the room, the number and arrangement of furniture, and the material of the walls are also different. In addition, the positions of the transmitter and CSI receiver are reversed. In other words, in the

TTW

environment, the Wi-Fi transmitter is located in the room where the targets to be detected are present, while the CSI receiver is placed in a different room from the transmitter. On the other hand, in the

TTW2

environment, the CSI receiver is located in the room with the targets while the transmitter is placed in a different room from the CSI receiver. Furthermore, unlike the

TTW

environment, where someone other than the experiment participants can move freely between the two rooms without any restriction, in the

TTW2

environment, the CSI is measured under controlled conditions, where no one is allowed to pass between the two rooms. In the

TTW2

environment, the number of people in a room ranges from zero to four.

In Table 4, we show the average sensing accuracy obtained by different CSI preprocessing methods when they are applied to the

LAB

dataset and the

TTW2

dataset. In the case of the

TTW2

dataset, the average sensing accuracy is 0.99 regardless of the preprocessing method used. This is attributed to the fact that, compared to other datasets, the differences among the CSI data belonging to each class in the

TTW2

dataset are significant. As a result, regardless of the different CSI features extracted by various preprocessing techniques, the CNN is able to accurately distinguish the CSI data belonging to each class. In contrast, for the

LAB

dataset, our method and the PCA method show the best performance.

Table 5 shows the rankings of each method across various environments in terms of the average IOD accuracy. In the table, the number corresponding to the i-th row and j-th column represents the accuracy ranking of method i in environment j, with a smaller number indicating higher accuracy. In the table, we observe that there is no preprocessing method that consistently delivers the best performance across all environments. This result agrees with the claims made in [15]. We also observe in the table that the performance of the proposed method ranks second or higher in 10 out of 14 real-world Wi-Fi sensing environments. The results suggest that compared to conventional representative methods, our CSI amplitude coloring method is a more universal and comprehensive preprocessing method for IOD via Wi-Fi sensing, which aligns with our goal of finding a method that performs well in most environments.

6. Conclusions and Future Works

In this paper, we propose a CSI data preparation method to enhance the classification performance of a Wi-Fi sensing system that counts the number of people. We normalize the measured CSI data and divide the normalized range into three sections. Then, we propose three transform functions for each range to amplify the values in each section. We heighten the fine distinctions between CSI classes by expanding the normalized CSI amplitude data of each subcarrier into a three-dimensional vector with the transformation functions. The experimental results in 14 real-world scenarios show that our method is a more universal and comprehensive method for people counting via Wi-Fi sensing, compared to conventional methods.

In future work, we will investigate the influence of CSI measurement configurations, such as the CSI sampling frequency and the size of probe packets, on the sensing accuracy. Then, we will propose installation guidelines for Wi-Fi sensing systems. Further dimension expansion can be obtained by applying our method multiple times in succession. Since this process produces a complex nonlinear transformation, we will explore the influence of this transformation on the accuracy of Wi-Fi sensing. In addition, we will train the transform functions in conjunction with CNN training to automatically find the optimal transform functions in different Wi-Fi sensing environments. We are also planning to apply our method to various Wi-Fi sensing applications other than people counting.

Author Contributions

Methodology, J.P.; Software, J.S.; Validation, J.S.; Writing—original draft, J.P.; Supervision, J.P. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Research Foundation of Republic of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1F1A1065371).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Lian, D.; Chen, X.; Li, J.; Luo, W.; Gao, S. Locating and Counting Heads in Crowds with a Depth Prior. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 9056–9072.
Khan, M.A.; Menouar, H.; Hamila, R. Revisiting crowd counting: State-of-the-art, trends, and future perspectives. Image Vis. Comput. 2023, 129, 104597.
EU Artificial Intelligence Act. Available online: https://artificialintelligenceact.eu/ (accessed on 13 June 2024).
Seifeldin, M.; Saeed, A.; Kosba, A.E.; El-Keyi, A.; Youssef, M. Nuzzer: A Large-Scale Device-Free Passive Localization System for Wireless Environments. IEEE Trans. Mob. Comput. 2013, 12, 1321–1334.
Yang, Z.; Zhou, Z.; Liu, Y. From RSSI to CSI: Indoor Localization via Channel Response. ACM Comput. Surv. 2013, 46, 25.
Xiao, W.; Song, B.; Yu, X.; Chen, P. Nonlinear Optimization-Based Device-Free Localization with Outlier Link Rejection. Sensors 2015, 15, 8072–8087.
Nirmal, I.; Khamis, A.; Hassan, M.; Hu, W.; Zhu, X. Deep Learning for Radio-based Human Sensing: Recent Advances and Future Directions. IEEE Commun. Surv. Tutor. 2021, 23, 995–1019.
Lie, C.; Cao, Z.; Liu, Y. Deep AI Enabled Ubiquitous Wireless Sensing: A Survey. ACM Comput. Surv. 2021, 54, 32.
Yang, J.; Chen, X.; Zou, H.; Lu, C.X.; Wang, D.; Sun, S.; Xie, L. SenseFi: A Library and Benchmark on Deep-Learning-Empowered WiFi Human Sensing. Sci. Patterns 2023, 4, 100703.
Luca, A.R.; Ursuleanu, T.F.; Gheorghe, L.; Grigorovici, R.; Iancu, S.; Hlusneac, M.; Grigorovici, A. Impact of Quality, Type and Volume of Data used by Deep Learning Models in the Analysis of Medical Images. Inform. Med. Unlocked 2022, 29, 100911.
Ilyas, I.F.; Rekatsinas, T. Machine Learning and Data Cleaning: Which Serves the Other? ACM J. Data Inf. Qual. 2022, 14, 13.
Xie, Y.; Li, Z.; Li, M. Precise Power Delay Profiling with Commodity Wi-Fi. IEEE Trans. Mob. Comput. 2019, 18, 1342–1355.
Kianoush, S.; Savazzi, S.; Rampa, V.; Nicoli, M. People Counting by Dense WiFi MIMO Networks: Channel Features and Machine Learning Algorithms. Sensors 2019, 19, 3450.
Kulin, M.; Kazaz, T.; Moerman, I.; Poorter, E.D. End-to-End Learning From Spectrum Data: A Deep Learning Approach for Wireless Signal Identification in Spectrum Monitoring Applications. IEEE Access 2018, 6, 18484–18501.
Sobron, I.; Ser, J.D.; Eizmendi, I.; Vélez, M. Device-Free People Counting in IoT Environments: New Insights, Results, and Open Challenges. IEEE Internet Things J. 2018, 5, 4396–4408.
Matplotlib Documentation. Choosing Colormaps in Matplotlib. 2024. Available online: https://matplotlib.org/stable/users/explain/colors/colormaps.html (accessed on 12 August 2024).
Zhou, L.; Hansen, C.D. A Survey of Colormaps in Visualization. IEEE Trans. Vis. Comput. Graph. 2016, 22, 2051–2069.
Bujack, R.; Turton, T.L.; Samsel, F.; Ware, C.; Rogers, D.H.; Ahrens, J. The Good, the Bad, and the Ugly: A Theoretical Framework for the Assessment of Continuous Colormaps. IEEE Trans. Vis. Comput. Graph. 2018, 24, 923–933.
Borland, D.; Taylor Ii, M.T. Rainbow Color Map (Still) Considered Harmful. IEEE Comput. Graph. Appl. 2007, 27, 14–17.
Ware, C.; Stone, M.; Szafir, D.A. Rainbow Colormaps Are Not All Bad. IEEE Comput. Graph. Appl. 2023, 43, 88–93.
Hernandez, S.M.; Bulut, E. WiFi Sensing on the Edge: Signal Processing Techniques and Challenges for Real-World Systems. IEEE Commun. Surv. Tutor. 2023, 25, 46–76.
Zeng, Y.; Wu, D.; Xiong, J.; Yi, E.; Gao, R.; Zhang, D. FarSense: Pushing the Range Limit of WiFi-based Respiration Sensing with CSI Ratio of Two Antennas. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2019, 3, 121.
Liu, J.; Teng, G.; Hong, F. Human Activity Sensing with Wireless Signals: A Survey. Sensors 2020, 20, 1210.
Gast, M.S. 802.11ac: A Survival Guide; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2013; ISBN 9781449343149.
IEEE Std 802.11-2020 (Revision of IEEE Std 802.11-2016); IEEE Standard for Information Technology–Telecommunications and Information Exchange between Systems-Local and Metropolitan Area Networks–Specific Requirements-Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications. IEEE: Piscataway, NJ, USA, 2021; pp. 1–4379.
Yousefi, S.; Narui, N.; Dayal, S.; Ermon, S.; Valaee, S. A Survey on Behavior Recognition Using WiFi Channel State Information. IEEE Commun. Mag. 2017, 55, 98–104.
Yang, J.; Chen, X.; Zou, H.; Wang, D.; Xu, Q.; Xie, L. EfficientFi: Toward Large-Scale Lightweight WiFi Sensing via CSI Compression. IEEE Internet Things J. 2022, 9, 13086–13095.
Sharma, A.; Li, J.; Mishra, D.; Seneviratne, A. Robust ML Model for Human Counting Using Ambient WiFi Traffic from Multiple Sources. In Proceedings of the IEEE International Conference on Communications Workshops (ICC Workshops), Montreal, QC, Canada, 14–23 June 2021.
Liu, Z.; Yuan, R.; Yuan, Y.; Ying, Y.; Guan, X. A Sensor-Free Crowd Counting Framework for Indoor Environments Based on Channel State Information. IEEE Sens. J. 2022, 22, 6062–6071.
Zhao, Y.; Liu, S.; Xue, F.; Chen, B.; Chen, X. DeepCount: Crowd Counting with Wi-Fi using Deep Learning. J. Commun. Inf. Netw. 2019, 4, 38–52.
Sharma, A.; Jiang, W.; Mishra, D.; Jha, S.; Seneviratne, A. Optimised CNN for Human Counting Using Spectrograms of Probabilistic WiFi CSI. In Proceedings of the 2022 IEEE Global Communications Conference (GLOBECOM 2022), Rio de Janeiro, Brazil, 4–8 December 2022.
Zhang, H.; Zhou, M.; Sun, H.; Zhao, G.; Qi, J.; Wang, J.; Esmaiel, H. Que-Fi: A Wi-Fi Deep-Learning-Based Queuing People Counting. IEEE Syst. J. 2021, 15, 2926–2937. [Google Scholar] [CrossRef]
Shen, L.-H.; Hsiao, A.-H.; Lu, K.-I.; Feng, K.-T. Attention-Enhanced Deep Learning for Device-Free Through-the-Wall Presence Detection Using Indoor WiFi Systems. IEEE Sens. J. 2024, 24, 5288–5302. [Google Scholar] [CrossRef]
Halperin, D.; Hu, W.; Sheth, A.; Wetherall, D. Tool Release: Gathering 802.11n Traces with Channel State Information. ACM SIGCOMM Comput. Commun. Rev. 2011, 41, 53. [Google Scholar] [CrossRef]
Gringoli, F.; Schulz, M.; Link, J.; Hollick, M. Free Your CSI: A Channel State Information Extraction Platform for Modern Wi-Fi Chipsets. In Proceedings of the 13th International Workshop on Wireless Network Testbeds, Experimental Evaluation and Characterization (WiNTECH ’19), Los Cabos, Mexico, 25 October 2019. [Google Scholar]
Bae, H.; Park, J. Proactive Service Caching in a MEC System by Using Spatio-Temporal Correlation among MEC Servers. Appl. Sci. 2023, 13, 12509. [Google Scholar] [CrossRef]
van der Maaten, L.; Hinton, G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
Domenico, D.D.; Sanctis, M.D.; Cianca, E.; Bianchi, G. A Trained-once Crowd Counting Method Using Differential WiFi Channel State Information. In Proceedings of the 3rd International on Workshop on Physical Analytics (WPA’16), Singapore, 26 June 2016. [Google Scholar]

Figure 1. CSI measurement environments (Tx represents a Wi-Fi transmitter, and Rx represents a CSI receiver).

Figure 2. Pipeline for people counting via CSI amplitude coloring.

Figure 3. Coloring functions.

Figure 4. CNN architecture used for indoor occupancy detection.

Figure 5. Impact of each CSI preprocessing method on people counting. PCA(k2) represents the case where the number of principal components is 2, while PCA(k4) represents the case with 4 principal components.

Figure 6. Comparison of confusion matrices in each experiment scenario. In the Corner scenario, 0 m indicates the case where there is no person in the corridor. PCA(k2) represents the case where the number of principal components is 2, while PCA(k4) represents the case with 4 principal components.

Figure 7. Comparison of t-SNE plots for the data at the input layer of the CNN. (The notation C2P2 refers to the subset of class 2 CSI data that the CNN correctly identifies as class 2, while C2P4 represents the class 2 CSI data that the CNN misclassifies as class 4. C4P4 indicates the class 4 CSI data that the CNN correctly classifies as class 4.)

Figure 8. Comparison of t-SNE plots for the data at the last layer of the CNN.

Table 1. Preprocessing time comparison (unit: seconds).

	Gray Image	Median	FFT	PCA	Proposed
Avg.	0.06	1.83	4.48	0.10	0.12
Var.	0.00	0.41	1.45	0.00	0.00

Table 2. Average Wi-Fi sensing accuracy for the EHUCOUNT dataset. PCA(k2) represents the case where the number of principal components is 2, while PCA(k4) represents the case with 4 principal components.

	A	B	C	D	E	F
Gray Image	0.76	0.77	0.83	0.72	0.92	0.85
Median	0.91	0.75	0.98	0.86	1.00	0.87
FFT	0.76	0.78	0.90	0.69	0.87	0.78
PCA(k2)	0.72	0.67	0.74	0.64	0.79	0.76
PCA(k4)	0.65	0.64	0.67	0.48	0.72	0.57
Proposed	0.72	0.68	0.84	0.76	0.82	0.86

Table 3. Average Wi-Fi sensing accuracy for the RTV dataset. PCA(k2) represents the case where the number of principal components is 2, while PCA(k4) represents the case with 4 principal components.

	Room A	Room B	Room C
Gray Image	0.93	0.85	0.82
Median	0.99	0.99	0.94
FFT	0.90	0.84	0.82
PCA(k2)	0.70	0.60	0.64
PCA(k4)	0.65	0.54	0.49
Proposed	0.95	0.87	0.85

Table 4. Average Wi-Fi sensing accuracy for TTW2 and LAB datasets. PCA(k2) represents the case where the number of principal components is 2, while PCA(k4) represents the case with 4 principal components.

Env.	Gray Image	Median	FFT	PCA(k2)	PCA(k4)	Proposed
LAB	0.85	0.70	0.88	0.99	0.99	0.99
TTW2	0.99	0.99	0.99	0.99	0.99	0.99

Table 5. Comparison of rankings of each method across various environments in terms of the average IOD accuracy. The number corresponding to the i-th row and j-th column represents the accuracy ranking of method i in environment j, with a smaller number indicating higher accuracy.

	Our Datasets					EHUCOUNT						RTV
Method	TTW	Queuing	Corner	LAB	TTW2	A	B	C	D	E	F	RoomA	RoomB	RoomC
Gray Image	6	3	4	5	1	2	2	4	3	2	3	3	3	3
Median	4	6	3	6	1	1	3	1	1	1	1	1	1	1
FFT	4	5	6	4	1	2	1	2	4	3	4	4	4	3
PCA(k2)	1	4	4	1	1	4	5	5	5	5	5	5	5	5
PCA(k4)	3	2	2	1	1	6	6	6	6	6	6	6	6	6
Proposed	2	1	1	1	1	4	4	3	2	4	2	2	2	2
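
The rankings in Table 5 can be reproduced from the accuracy tables. The sketch below is an illustration only: it takes the RTV accuracies from Table 3 and ranks the methods per room. The tie-handling rule (tied methods share the best rank and the following rank is skipped) is an assumption inferred from the tied entries in Table 5; the source does not state it. The names `competition_rank` and `rank_table` are hypothetical helpers, not part of the authors' code.

```python
# Accuracies copied from Table 3 (RTV dataset); order follows the table rows.
METHODS = ["Gray Image", "Median", "FFT", "PCA(k2)", "PCA(k4)", "Proposed"]
RTV_ACCURACY = {
    "Room A": [0.93, 0.99, 0.90, 0.70, 0.65, 0.95],
    "Room B": [0.85, 0.99, 0.84, 0.60, 0.54, 0.87],
    "Room C": [0.82, 0.94, 0.82, 0.64, 0.49, 0.85],
}


def competition_rank(scores):
    """Rank scores so that a higher score gets a smaller rank number.

    Assumed tie rule: a score's rank is 1 plus the number of strictly
    larger scores, so two methods tied for third both get rank 3 and
    the next method gets rank 5.
    """
    return [1 + sum(other > s for other in scores) for s in scores]


def rank_table(accuracy_by_env):
    """Return {environment: {method: rank}} for every environment."""
    return {
        env: dict(zip(METHODS, competition_rank(scores)))
        for env, scores in accuracy_by_env.items()
    }


if __name__ == "__main__":
    for env, ranks in rank_table(RTV_ACCURACY).items():
        print(env, ranks)
```

Under this assumed rule, Gray Image and FFT tie at 0.82 in Room C and both receive rank 3, which matches the RoomC column of Table 5.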

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
