1. Introduction
Measuring the number of people in indoor environments plays a pivotal role in preventing the spread of infectious diseases like COVID-19 and avoiding safety accidents. Existing research on people counting primarily relies on image-processing techniques that capture images with cameras and detect heads in those images to estimate the number of people [
1,
2]. Since cameras can visually capture real-life situations, they can provide data for people to easily understand and analyze. In addition, camera-based systems are able to determine the number of people by leveraging high-resolution images with the latest deep learning and computer vision algorithms.
However, camera-based people-counting systems present the following issues, which limit their capabilities and widespread use. First, capturing images with cameras involves directly recording individuals’ faces and bodies, which raises privacy concerns and potential legal issues in certain countries [
3]. Second, the accuracy of camera-based systems is highly dependent on environmental conditions. Low lighting, strong backlighting, and shadows can degrade the quality of captured images, which makes accurate IOD difficult. Additionally, when people overlap, are obscured by objects, or stand in blind spots, the counting accuracy decreases. Furthermore, installing and maintaining high-quality cameras and the related infrastructure can be costly.
To address these issues, wireless sensing systems have attracted considerable attention. By analyzing the varying patterns of wireless signals, such as infrared, UWB (ultra-wideband), and Wi-Fi, in a surveillance region, they detect and interpret the changes in the region. All of these technologies enable sensing even in low-light conditions. However, compared to Wi-Fi signals, infrared and UWB have shorter sensing ranges and their signals can be blocked by obstacles. Thus, the performance and the utility of wireless sensing systems using infrared and UWB are limited. In addition, Wi-Fi networks are pervasive in homes, offices, and public spaces. This ubiquity makes Wi-Fi sensing a highly accessible, scalable, and cost-effective solution for people counting.
RSS (Received Signal Strength) and CSI (Channel State Information) are two representative types of Wi-Fi signal data that have been used for Wi-Fi sensing. Since signal strength decreases as the distance between a sender and a receiver increases, RSS was primarily used for positioning in early Wi-Fi sensing research [
4]. However, RSS only provides general information about the measurement environment and is susceptible to random variations in the environment [
5]. Recently, CSI has been used for Wi-Fi sensing because it provides more detailed information about the wireless signal propagation environment [
6]. Thus, in this paper, we use CSI as Wi-Fi signal data for people counting.
Meanwhile, deep learning models have advanced remarkably. These models significantly improve accuracy on classification and regression problems. Consequently, to perceive environments in a device-free manner, research is actively underway to analyze Wi-Fi signals measured in various environments using deep learning techniques [
7,
8]. More specifically, to adopt a deep learning model, Wi-Fi-based people counting is often cast as a classification problem. Each distinct count of people in a monitored area is treated as a separate class. Then, a deep learning model is trained to learn the classification boundaries among the classes to categorize the number of people in the area. Various deep learning models have been used for Wi-Fi sensing, including MLP (Multilayer Perceptron), CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), and, more recently, transformer models. Each of these architectures has its strengths and is chosen based on the specific requirements of the sensing task. In [
9], a comprehensive study that quantitatively compares the performance of these diverse deep learning models using publicly available CSI datasets is conducted. This research highlights the capability of these models to discern subtle differences between input classes.
However, a common challenge emerges. As the distinctions between input data from different classes become more minute, the sensing accuracy of these deep learning models tends to decline [
10,
11]. The wireless signal propagation environment changes randomly over time and space. In addition, the hardware in Wi-Fi transmitters and receivers is not perfect. Thus, the measured CSI values exhibit random fluctuations even when the monitoring area and conditions remain constant. The fluctuating nature of these measurements presents a substantial obstacle to achieving precise categorization with deep learning algorithms. To address the challenges posed by the CSI variability and improve the classification accuracy of a deep learning model, various CSI data representation methods have been proposed. These methods aim to enhance the quality of input data provided to a deep learning model by reducing noise and random fluctuations in the CSI measurements [
12], or extracting more robust and informative features from the raw CSI data [
13,
14]. These techniques consider rapidly changing components in the measured CSI values as noise and remove them to reduce variability in the data. In other words, they extract features that show less variation over time by assuming that these more stable features will be more informative for a deep learning model for classification. However, the extracted values often do not differ significantly across different numbers of people (i.e., classes) in the monitored area. These subtle differences make it challenging for a deep learning model to accurately classify the number of people.
To resolve the issue, in this paper, we propose a novel CSI feature representation method that can better capture the class-specific characteristics in CSI data while still mitigating the effects of noise and random fluctuations. As is noted in [
15], it is very hard to ensure the existence of specific features performing better than others for all applications and scenarios. This is mainly attributed to the wide spectrum of the environments where Wi-Fi sensing systems operate. For example, the Wi-Fi signal transmission environment differs for each location where Wi-Fi sensing is applied. The strength of noise and interference at the moment of CSI measurement varies. Furthermore, the specifications, locations, and configurations of Wi-Fi sensing transmitters and receivers are different. Thus, our goal in this paper is not to find a panacea method that always gives the best performance for all classes in all Wi-Fi sensing environments but rather to find a method that performs well in most environments and for most classes.
To achieve the goal, we propose a method to transform CSI amplitude data collected over a time interval into a color image by using the jet colormap [
16]. Colormaps are important tools used in data visualization to represent data by converting them into colors. Various colormaps have been designed by considering various factors such as color theory, the human visual system, and the relationship between colors and data [
17,
18]. Our purpose is not data visualization but to better distinguish CSIs belonging to different classes in a higher-dimensional space. Therefore, we use a colormap not as a tool to transform data into a form suitable for the human visual system without visual distortion in color changes but as a tool to amplify subtle differences in CSI data. The jet colormap is a rainbow color palette which can represent scalar data with uniform differences nonuniformly through nonlinear changes in brightness and color [
19]. Therefore, when the jet colormap is used, data boundaries or features may appear more exaggerated than they actually are [
20]. By using the characteristics of the jet colormap, we finely segment each CSI amplitude of each subcarrier at a measured moment into three-dimensional RGB (Red–Green–Blue) channel data through the nonlinear transformation of the jet colormap. During the segmentation of a scalar CSI amplitude value of a subcarrier into three distinct data points, we amplify the subtle differences in the CSI data belonging to different classes so that a deep learning model can better distinguish them. In other words, by separating the CSI features among classes more widely in the input layer of a deep learning model, we enhance the classification accuracy of the deep learning model used for estimating the number of people in a monitored region.
To verify our approach, we carry out people-counting experiments across three different real-life scenarios and compare the results obtained by our method to those when other conventional preprocessing methods are applied. The first scenario identifies the number of people inside a typical seminar room from outside the room while people are seated or moving around in the monitored room. In the second scenario, we identify the number of people standing in a row in a classroom. In this scenario, a deep learning model is challenged to correctly estimate the number of people when people are overlapped and obscured. The third scenario detects the presence of a person in a T-shaped corridor by changing the position of a person in the part of the corridor that is not visible from the measurement point. In light of the research findings in [
15,
21], where various preprocessing methods are classified into a few categories, we choose four representative preprocessing methods for comparison, each belonging to a different category. The first method, which uses the median feature, belongs to the time statistics category. From the category of methods that use frequency features, we select the FFT (Fast Fourier Transform) method as the second alternative. The third method uses PCA (Principal Component Analysis) and belongs to the dimensionality reduction category. The fourth method, which exploits the characteristics of CNNs, treats the measured CSI as a grayscale image. The experimental results show that, compared to existing preprocessing techniques, the proposed method improves the people-counting accuracy of a deep learning model in all experimental environments even when we use low-cost commercial off-the-shelf Wi-Fi transceivers having only one receive antenna and one transmit antenna.
The rest of the paper is organized as follows. In
Section 2, we present the related works. We explain the experiment scenarios and the datasets measured in each scenario in
Section 3. In
Section 4, we detail our CSI amplitude coloring method and discuss the experimental results in
Section 5. We conclude the paper with future research directions in
Section 6.
3. CSI Measurement Environments
3.1. Experimental Scenarios
Prior research has predominantly focused on CSI measurements in scenarios where there is a line-of-sight (LoS) between a Wi-Fi transmitter and a CSI receiver. In addition, these studies typically concentrate on counting the number of stationary individuals in the environment. In this study, as shown in
Figure 1, we collect CSI data in several challenging scenarios that push the boundaries of traditional Wi-Fi sensing. These challenging environments are chosen to rigorously test the robustness and versatility of the CSI-based people-counting system. By showing the successful operation of our method under these conditions, we aim to demonstrate the potential of our approach in real-world situations where ideal line-of-sight conditions are rarely available.
In the first scenario, named TTW (Through The Wall), we set up an environment where the Wi-Fi transmitter and the CSI measurement device are separated by two walls. This setup tests the system’s ability to sense through physical obstacles. In this scenario, people are in a typical seminar room. To capture the diverse situations that may occur in such a room, we conduct a series of CSI measurements in which we systematically vary the location and movement patterns of the occupants. This approach allows us to cover a wide range of spatial configurations and mobility dynamics that are commonly encountered in seminar rooms. The specific variations are as follows: all people sit in chairs and do not move around; all people stand in a row along a wall and do not move; people are clustered and the number of clusters is changed; some people walk around while the others sit; and all people move around with random directions and velocities. We do not impose any restrictions on their movement. Since each participant is free to decide their movement speed and direction, the measured CSI data include various CSI patterns caused by the movement of people. We vary the number of people in the room from zero to five. For each variation in the seminar room, we also conduct CSI measurements in a setting where the Wi-Fi signal propagation is disrupted by individuals moving at random speeds and directions through the corridor between the two rooms. We likewise do not impose any restrictions on the movement patterns of the individuals disturbing the Wi-Fi signals. Thus, their movements are reflected as noise in the measured CSI.
The second CSI measurement scenario is called Queuing. For this scenario, we position individuals in a linear formation between the Wi-Fi transmitter and the CSI receiver at the front of a classroom. This arrangement creates a challenging environment for the system. Since people overlap between the transmitter and the CSI receiver, the differences in the CSI change patterns between classes are small. Thus, it becomes difficult for a Wi-Fi sensing system to accurately determine the number of people. In this scenario, we change the number of people in a row from zero to four.
The third scenario is named Corner. In this scenario, we explore situations where a measurement target is not directly visible from the location of Wi-Fi transceivers. This setup tests the system’s capability to detect a person in occluded areas. We position one person around a corner in a hallway. In addition, we colocate the Wi-Fi transmitter and a CSI receiver at the other corner so that the subject is not within direct line-of-sight of the Wi-Fi transceivers. The Corner scenario is designed to address a specific safety concern: preventing collisions with an individual emerging from blind corners. This approach is different from our previous experiment scenarios, which focus on counting people within a defined area. In this Corner scenario, our primary objective is to detect the presence of an individual from a corner with obstructed visibility. For this purpose, we position the transmitter and the CSI receiver 2.5 m away from the corner, and place one person at distances of 1 m, 3 m, and 5 m from the opposite corner. We then measure CSI for each of these scenarios.
3.2. CSI Measurement Tools and Deep Learning Model
To capture the CSI from Wi-Fi signals, several CSI extraction tools have emerged, each with its unique capabilities. These include the Intel 5300 NIC (Network Interface Card) [
34], Atheros CSI Tool [
12], and Nexmon CSI Tool [
35], all of which have been instrumental in developing practical sensing platforms. The Intel 5300 NIC tool was the first and is widely used. It captures 30 subcarriers at a 20 MHz Wi-Fi channel bandwidth. The Atheros CSI tool improves the resolution by recording 56 subcarriers at 20 MHz bandwidth and 114 subcarriers at 40 MHz bandwidth. The Nexmon CSI tool is the first to enable CSI recording on smartphones and Raspberry Pi. It can capture up to 256 subcarriers at 80 MHz bandwidth.
For our study, we install the Nexmon CSI tool on a Raspberry Pi 4 with a single Wi-Fi receiving antenna to capture CSI [
35]. We use an ESP8266 as the Wi-Fi transmitter, which sends Wi-Fi frames using only one transmitting antenna on the narrowest (20 MHz) Wi-Fi channel. Specifically, we configure the ESP8266 to send a UDP segment with a 1-byte payload every 10 milliseconds over a 20 MHz Wi-Fi channel in the 2.4 GHz band.
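The transmit pattern described above (a 1-byte UDP payload every 10 ms) can be sketched as follows. This is a host-side Python stand-in rather than the ESP8266 firmware used in the study; the function name `send_probe_frames` and its parameters are hypothetical, and any destination address or port passed to it is an assumption about the setup.

```python
import socket
import time


def send_probe_frames(sock, address, count, interval_s=0.010, payload=b"\x00"):
    """Send `count` datagrams to `address`, one every `interval_s` seconds.

    Deadlines are tracked against a monotonic clock so that the time spent
    inside sendto() does not accumulate as drift in the sending period.
    """
    next_deadline = time.monotonic()
    sent = 0
    for _ in range(count):
        sock.sendto(payload, address)
        sent += 1
        next_deadline += interval_s
        delay = next_deadline - time.monotonic()
        if delay > 0:
            time.sleep(delay)
    return sent
```

A caller would typically create the socket with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` and pass the receiver-side address of its own network. At a 10 ms period, this produces roughly 100 frames, and therefore roughly 100 CSI samples, per second.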
Since the purpose of our study is to devise a novel CSI preprocessing method, we use CNN as the deep learning model for Wi-Fi sensing. This choice is motivated by the widespread adoption of CNNs in Wi-Fi sensing applications, owing to their effectiveness in processing spatial data and their ability to automatically learn hierarchical features. By using a well-established model, we can more clearly demonstrate the impact of our proposed preprocessing method, isolating its effects from the complexities of newer or less common deep learning architectures.
5. Experimental Results and Discussions
In this section, we evaluate the performance of the proposed method in terms of IOD accuracy by using various CSI data measured in real-world experimental scenarios. For the experiments, we set , , and .
5.1. Accuracy Comparison
In each scenario, we compare the performance of our method to four alternative methods commonly used for CSI preprocessing. As the first alternative, we select a statistical method that takes a median value for each
s in
to smooth the temporal dynamics inherent in the measured CSI. Using the median of the measured CSIs is a representative method using statistical measures in the time domain, and it has been used in [
15] and the references presented in
Table 1 of [
15]. Henceforth, we will call this method
. The second comparison target is a signal processing method that takes the FFT on each
s in
and filters out high-frequency components of the
s. This approach has been used in [
28]. We will name the method that removes the upper half of the high-frequency components from the CSI frequency components as
. The third alternative method, which we will call
, is a dimension reduction method using PCA [
32]. For each subcarrier
i (
) in
, PCA is applied to
and obtains its
k principal components. Therefore, the
method decreases the dimension of
from
to
(in general,
). In the experiments, we set
k to two values: two and four. CNN is famous for its image classification capability. To exploit the capability of the CNN, a set of measured data is often regarded as a gray image [
36]. Our method goes further and expands the scalar CSI amplitude of each subcarrier into three color values. Thus, to show the validity of the proposed method, we also present the accuracy of the CNN when a set of measured CSI amplitudes is used as a gray image (i.e.,
). Hereafter, we will call this case
.
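As an illustration of the first baseline above, the following minimal sketch computes a per-subcarrier median over a window of CSI amplitudes. It assumes that the window is stored as a list of time samples, each holding one amplitude per subcarrier; the function name `median_feature` is hypothetical and the layout is an assumption, not the authors' data format.

```python
from statistics import median


def median_feature(window):
    """Collapse a T x S window of CSI amplitudes to S per-subcarrier medians.

    `window` is a list of T time samples; each sample is a list of S
    amplitudes (one per subcarrier).
    """
    if not window:
        raise ValueError("window must contain at least one CSI sample")
    # zip(*window) iterates over subcarriers, yielding each one's time series.
    return [median(series) for series in zip(*window)]
```

Because the median ignores isolated extreme values, a single spike in one subcarrier's time series does not shift the resulting feature, which is the smoothing effect that the text describes.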
In
Figure 5, we compare the CSI preprocessing methods in terms of the overall IOD accuracy of the CNN in various measurement scenarios when the input to the CNN is prepared by each method. If we denote the total number of tests for the
i-th experiment as
, and the number of times the Wi-Fi sensing system accurately determines the number of people as
, then the overall IOD accuracy becomes
. We repeat the same experiments with different test datasets for ten iterations and depict box plots for
to compare not only the overall average prediction accuracy but also the variation in the overall prediction accuracy across various test datasets. In this figure, we observe that the relative efficacy of the conventional methods is contingent upon the specific Wi-Fi sensing environments. For example, in the TTW scenario, the median IOD accuracy is the highest with the
method with two principal components. However, in the other two scenarios, our method outperforms the others in terms of the IOD accuracy. We observe that Q1 (i.e., the 25th percentile) of the IOD accuracy when our method is applied is higher than Q3 (i.e., the 75th percentile) obtained by the other methods.
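The accuracy metric and the box-plot quartiles discussed above can be computed as in the following minimal sketch. The function names `overall_accuracy` and `box_plot_summary` are hypothetical, and the choice of the "inclusive" quantile method is an assumption, since the text does not state how the quartiles are interpolated.

```python
from statistics import quantiles


def overall_accuracy(true_counts, predicted_counts):
    """Fraction of tests in which the predicted people count is correct."""
    if not true_counts or len(true_counts) != len(predicted_counts):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_counts, predicted_counts))
    return correct / len(true_counts)


def box_plot_summary(accuracies):
    """Return Q1, median, and Q3 of accuracies collected over repeated runs."""
    q1, q2, q3 = quantiles(accuracies, n=4, method="inclusive")
    return {"q1": q1, "median": q2, "q3": q3}
```

With this summary, the statement that one method's Q1 exceeds another method's Q3 corresponds to comparing `box_plot_summary(a)["q1"]` with `box_plot_summary(b)["q3"]` for the two lists of per-run accuracies.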
To analyze the influence of the proposed method on the prediction accuracy for each number of people, we scrutinize the confusion matrices in
Figure 6. A confusion matrix is a table used to evaluate the performance of a classification algorithm. It is particularly useful in supervised learning, where the goal is to predict labels for a set of instances. The confusion matrix allows us to see how well a Wi-Fi sensing system is performing by comparing the predicted labels to the actual labels (i.e., the number of people). Specifically, an element
located in the
i-th row and the
j-th column of a confusion matrix represents the proportion of cases where the actual label is
i but the Wi-Fi sensing system predicts
j during the experiments.
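The confusion matrix defined above can be built as in the following minimal sketch. It assumes that each row is normalized by the number of test cases with that true label, which is consistent with the per-class probabilities quoted in the discussion of the results but is not stated explicitly; the function name `confusion_matrix` and its signature are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Row-normalized confusion matrix.

    Entry [i][j] is the proportion of samples with true label i that were
    predicted as label j. Rows with no samples are left as all zeros.
    """
    counts = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        counts[t][p] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

Under this normalization, the diagonal entry of row i is the per-class accuracy for i people, and each non-empty row sums to one.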
In the confusion matrices in
Figure 6, we observe that while existing schemes result in very low classification accuracy in some environments and for some classes (i.e., the number of people), the proposed method achieves high classification accuracy across all environments and classes, with very small differences in classification performance between classes. For example, in the case of the
method, the probability of correctly identifying two people when there are actually two people in the TTW environment is 0.74, and the probability of correctly identifying zero people when there are actually zero people in the Queuing environment is 0.32. In contrast, when the proposed method is applied, the probability of correctly predicting two people when there are actually two people in the TTW environment is increased to 0.91, and the probability of correctly identifying zero people when there are actually zero people in the Queuing environment is greatly improved to 0.99.
In
Figure 5, we show the box plots for the overall IOD accuracy obtained by each method. We observe that except in the TTW scenario, our method outperforms the
methods in terms of the IOD sensing accuracy. In
Figure 6, we also compare the
method to our method in terms of the confusion matrix. Let us denote
as the true class (i.e., the number of people). When we inspect the TTW scenario, we observe that even though the overall accuracy obtained by
method with two principal components (
) is higher than the accuracy acquired by our method, our method outperforms the
method with
for all
s except
. When compared to the
method with four principal components (
), our method achieves higher accuracy in all the scenarios and all the
s.
We also observe that the fewer principal components are used, the more information that might be useful for classification is removed from the data, which makes it difficult to distinguish between classes. For example, in the Queuing scenario, the accuracy of with is 0.68 when , while the accuracy is 0.90 when with is used. Conversely, as the number of principal components increases, the amount of deleted information decreases, but the possibility of including unnecessary information increases, which also makes it difficult to distinguish between classes. For example, in the Queuing scenario, with produces an accuracy of 0.86 when , while it becomes 0.93 when with is used. Therefore, it is difficult to find the optimal number of principal components for each scenario and for each class within each scenario. However, in both cases, our method achieves the highest accuracy. When in the Queuing scenario, the accuracy obtained by our method is 0.93, and it is 0.95 when .
5.2. Complexity Analysis of CSI Coloring
Each basic input data sample is composed of amplitude values. Given , the method performs a median operation on for each subcarrier . Even though the time complexity of the median operation depends on the sorting algorithm, it is generally supposed to be . Since there are S subcarriers, the time complexity of the method is . In the case of the method, FFT and IFFT operations are performed on s for each subcarrier s. Since the time complexities of both FFT and IFFT are , the time complexity of the method is . The computational complexity of PCA depends on the applied method and the data size. When the PCA of data is computed by eigendecomposition, the time complexity becomes . When PCA is obtained by SVD (singular value decomposition), the computation complexity becomes .
When the CSI amplitude coloring method is applied, each
in
is expanded by
,
, and
. To obtain the output values of these functions, only a fixed number of comparison operations are required. For example, as shown in Equation (
4), to determine
, up to just three comparisons with
are needed (i.e., comparison with 0.35, 0.66, and 0.89). Since the time complexity of the comparison operation is
, the computational complexity of expanding each
into three channels becomes
. Since there are
s in
, the time complexity of our CSI amplitude coloring method becomes
, which is the smallest among the preprocessing methods.
We measure the time it takes for each method to preprocess
. When we measure the real-time performance, we use the same hardware and software to train the CNN. We show the measurement results in
Table 1. In the table, the
column shows the time to make
from the measured CSI dataset. The gray image shows the smallest amount of time because other preprocessing methods are performed after the gray image (i.e.,
) is constructed. In the table, we observe that our method and the
method take a similar amount of time to preprocess
. Our method is
faster than the
method. Compared to the
method, our method reduces the preprocessing time by a factor of
. Therefore, considering the sensing accuracy and preprocessing time, we believe that the proposed method is a better choice for resource-constrained devices than other conventional methods.
5.3. Influence of CSI Amplitude Coloring
To identify the root cause of these performance differences, we examine the CSI data distribution for class 2 (i.e., two people) in the Queuing scenario. In this case, when the
method is used, the rate at which the CNN incorrectly classifies CSI data belonging to class 2 as class 4 is
. However, when our method is used, the misclassification rate dramatically decreases to
. To understand the factors contributing to the improved results, we conduct an analysis using t-SNE (t-Distributed Stochastic Neighbor Embedding) plots. t-SNE is a machine learning algorithm used for the visualization of high-dimensional data [
37]. It embeds complex datasets in a low-dimensional space (typically 2D or 3D), making it easier to identify patterns, clusters, and relationships that might be hidden in higher dimensions.
Figure 7 shows the t-SNE plots of the CSI data at the input layer of CNN, and
Figure 8 shows the t-SNE plots of the CSI data at the last layer of the CNN, which contains the features that CNN uses for indoor occupancy classification. In these figures, black dots indicate CSI data that both belong to and are correctly classified as class 2 (denoted as C2P2), while blue dots signify the CSI data that are both from class 4 and correctly classified (denoted as C4P4). Red dots represent CSI data belonging to class 2 but mistakenly classified as class 4. They are denoted as C2P4. The spatial distribution of the data points in
Figure 7a reveals that the red dots are in closer proximity to the blue dots compared to the black dots. This arrangement indicates that in the CNN feature space shown in
Figure 8a, the CSI data associated with C2P4 are more closely related to the CSI data from C4P4 than to the CSI data from C2P2. As a result, CNN mistakenly categorizes the CSI data from C2P4 as belonging to class 4.
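The proximity argument above can also be checked numerically. The following sketch is a simple stand-in for visual inspection and is not t-SNE: it compares the mean Euclidean distance from a set of samples (for example, C2P4) to the centroids of two reference groups (for example, C2P2 and C4P4). The function names `centroid` and `closer_group` are hypothetical, and using centroid distance as a proximity measure is an assumption.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def closer_group(samples, group_a, group_b):
    """Report which group's centroid is nearer to `samples` on average.

    Returns a tuple (label, mean_dist_a, mean_dist_b), where label is "a"
    or "b". Ties are resolved in favor of "a".
    """
    center_a, center_b = centroid(group_a), centroid(group_b)
    mean_a = sum(dist(s, center_a) for s in samples) / len(samples)
    mean_b = sum(dist(s, center_b) for s in samples) / len(samples)
    return ("a" if mean_a <= mean_b else "b"), mean_a, mean_b
```

Applied to the feature vectors of the misclassified samples, a result of "b" with `group_b` set to the C4P4 features would correspond to the red dots lying nearer to the blue dots than to the black dots.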
Figure 7b,
Figure 7c, and
Figure 7d respectively show the t-SNE plots for the red, green, and blue channel data when the measured CSI data belonging to each class are segmented into each channel. In the green channel, we observe that the CSI data for C2P4 remain closer to C4P4 than to C2P2. However, this relationship inverts in the red and blue channels, where C2P4 data align more closely with C2P2 data than with C4P4 data because our CSI data dimension expansion method amplifies the subtle differences between C2P4 and C4P4. In
Figure 8b–d, we also examine the t-SNEs of the final classification features extracted after the data from each channel have been processed through CNN. These figures reveal that when red channel data and blue channel data are fed into CNN, the red dots (C2P4 data) are positioned nearer to the black dots (C2P2 data) than to the blue dots (C4P4 data).
Figure 8e illustrates the t-SNE plot for the data collected at the last layer of CNN after all RGB channel data go through CNN. As evidenced in this figure, by partitioning the measured CSI data into three distinct channels, CNN gains the ability to delineate more distinct boundaries between CSI data from different classes. In other words, the tripartite channel approach that amplifies the subtle differences among the classes in terms of the measured CSI allows CNN to leverage varied data representations, leading to improved discrimination between classes.
5.4. Performance on Other Datasets
To further validate the proposed method, we apply it to publicly available datasets for people counting by Wi-Fi sensing. The first is the EHUCOUNT dataset described in [
15]. The second dataset is the dataset in [
38]. Henceforth, we call the second dataset the RTV dataset.
The EHUCOUNT dataset is constructed by capturing Wi-Fi signals in six different indoor scenarios in the facilities of the Faculty of Engineering of the University of the Basque Country. The six scenarios are A (Office), B (Lab), C (Corridor), D (Hall + Stairs), E (Corridor), and F (Corridor). Depending on the scenario, the number of people is from 3 to 5, and the number of CSI traces for each people count and scenario ranges between 12,000 and 15,000. In the corridor scenarios, people maintain a direction for a while before changing it, while people wander in the room scenarios (A, B, and D). In all scenarios, volunteers are instructed to move slower than 3 km/h. The RTV dataset is constructed by collecting the CSI data in three rooms, which differ in size, the number of furniture pieces, and their locations. Room A is a small office room (5 m × 5 m), room B is a medium-size meeting room (5 m × 9 m), and room C is a large meeting room (6 m × 12.5 m). In each scenario, up to 7 people are in the rooms and 5000 CSI samples are collected for each number of people (i.e., 0 to 7). During the CSI collection, the people in the rooms move around randomly or stand still without any guidelines.
Table 2 and
Table 3 show the average sensing accuracy when different CSI preprocessing methods are applied to each dataset. We observe that in most scenarios, the
method shows the best performance. This is attributed to the fact that the CSI measurement environments for EHUCOUNT and RTV datasets are more favorable for the Wi-Fi signal propagation than our datasets. The biggest difference between the experimental environments of these datasets and our experimental environments is that in the experimental settings for EHUCOUNT and RTV datasets, the transmitter and CSI receiver are located in the same space, while in our experimental environments (especially the TTW and Corner scenarios), the transmitter and CSI receiver are located in different spaces. In case of our Queuing scenario, even though the transmitter and CSI receiver are in the same space, there is no direct signal path between them. In other words, while the EHUCOUNT and RTV datasets measure CSI in line-of-sight (LoS) environments between the transmitter and receiver, we measure CSI in non-line-of-sight (NLoS) environments. Thus, more noise is included in our CSI dataset. As a result, the statistical characteristics of CSIs belonging to different classes become less distinct and are more affected by noise when our dataset is used. Because the statistical differences between the CSI data belonging to different classes are large in the EHUCOUNT and RTV datasets, the
method effectively distinguishes between classes. However, we observe in these tables that our method is comparable to the other CSI preprocessing methods in terms of the sensing accuracy.
To test the generality and the robustness of our method, we extend the experimental scope by carrying out additional experiments in different indoor environments. The first dataset is collected in a typical laboratory environment, which we will call the dataset. The size of the laboratory is 5.6 m × 3.4 m. We locate a CSI receiver on the table positioned at the center of the laboratory and place a transmitter at the center of the left side of the laboratory. The laboratory is equipped with typical office furniture and supplies, including seven tables and six chairs. Up to six people participate in the experiments in the environment. Each participant sits on the designated chair and works on a computer or reads papers but does not move around during the experiment. The second dataset, which we name 2, is constructed in an environment similar to . However, the measurement location for the 2 dataset is different from that for the dataset. The size of the room, the number and arrangement of furniture, and the material of the walls are also different. In addition, the positions of the transmitter and CSI receiver are reversed. In other words, in the environment, the Wi-Fi transmitter is located in the room where the targets to be detected are present, while the CSI receiver is placed in a different room from the transmitter. On the other hand, in the 2 environment, the CSI receiver is located in the room with the targets while the transmitter is placed in a different room from the CSI receiver. Furthermore, unlike the environment, where someone other than the experiment participants can move freely between the two rooms without any restriction, in the 2 environment, the CSI is measured under controlled conditions, where no one is allowed to pass between the two rooms. In the 2 environment, the number of people in a room ranges from zero to four.
In
Table 4, we show the average sensing accuracy obtained by different CSI preprocessing methods when they are applied to the
dataset and the
2 dataset. In the case of the
2 dataset, the average sensing accuracy is 0.99 regardless of the preprocessing method used. This is because, compared to other datasets, the differences among the CSI data belonging to each class in the
2 dataset are significant. As a result, regardless of the different CSI features extracted by various preprocessing techniques, the CNN is able to accurately distinguish the CSI data belonging to each class. On the contrary, when we observe the results of the
dataset, our method and the PCA method show the best performance.
Table 5 shows the rankings of each method across various environments in terms of the average IOD accuracy. In the table, the number corresponding to the
i-th row and
j-th column represents the accuracy ranking of method
i in environment
j, with a smaller number indicating higher accuracy. In the table, we observe that there is no preprocessing method that consistently delivers the best performance across all environments. This result agrees with the claims made in [
15]. We also observe in the table that the performance of the proposed method ranks second or higher in 10 out of 14 real-world Wi-Fi sensing environments. The results suggest that compared to conventional representative methods, our CSI amplitude coloring method is a more universal and comprehensive preprocessing method for IOD via Wi-Fi sensing, which aligns with our goal of finding a method that performs well in most environments.
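The ranking analysis above can be reproduced from per-environment accuracies as in the following minimal sketch. The function names `rank_methods` and `count_top_k` are hypothetical. The tie-handling rule (tied methods share the better rank) is an assumption, since the text does not say how ties such as the identical 0.99 accuracies are ranked.

```python
def rank_methods(accuracy_by_method):
    """Rank methods within one environment (1 = highest accuracy).

    Assumption: tied methods share the better rank (competition ranking).
    """
    values = list(accuracy_by_method.values())
    return {
        name: 1 + sum(other > acc for other in values)
        for name, acc in accuracy_by_method.items()
    }


def count_top_k(accuracy_table, method, k=2):
    """Count environments in which `method` ranks k-th or better.

    `accuracy_table` maps an environment name to a dict of
    {method name: average accuracy}.
    """
    return sum(
        rank_methods(per_env)[method] <= k
        for per_env in accuracy_table.values()
    )
```

Given a table of average accuracies per environment, `count_top_k(table, "proposed", k=2)` would yield the kind of "second or higher" count reported in the text; the key "proposed" is only a placeholder label.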