1. Introduction
Today, with the increase in large indoor spaces, it is often necessary to check one’s location in buildings, but the most commonly used location system, GPS (Global Positioning System), cannot provide sufficient accuracy indoors [1]. Many effective indoor positioning schemes have been proposed, such as infrared and ultrasonic [2], RF (Radio Frequency) [3], BLE (Bluetooth Low Energy) [4], WiFi [5], and dead reckoning [6]. Among these, indoor positioning using BLE is characterized by NLOS (Non-Line-of-Sight) transmission, long-term availability, and low cost.
Since it is difficult to estimate the position directly from the RSSI (Received Signal Strength Indicator), position fingerprinting [7,8] is often used: a database is constructed by recording the RSSI at each indoor location in advance, and the position is then estimated by comparing the RSSI of the test point with the database. Relatively high accuracy can be achieved with fingerprinting methods by optimizing various machine learning algorithms for different indoor scenarios. For example, related research has pursued high-precision positioning through filters and CNN (Convolutional Neural Network) deep learning [9] and has explored improvements by comparing the processing of Bluetooth signals from different channels [10]. Other studies have enhanced prediction accuracy by improving and optimizing neural network models such as CNN [12] and BPNN (Back-Propagation Neural Network) [11,13], or by optimizing other machine learning methods, such as k-NN (k-Nearest Neighbor) [8], SVR (Support Vector Regression) [14,15], and ridge regression [16,17]. Additionally, WiFi fingerprinting has been used to estimate parking space availability [18].
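The database lookup described above can be illustrated with a minimal sketch. It assumes a plain (unweighted) k-NN match in RSSI space; the function name `knn_locate`, the tiny database `DB`, and all RSSI values are hypothetical and are not taken from any of the cited works.

```python
import math

def knn_locate(fingerprints, rssi, k=3):
    """Estimate a position as the mean of the k closest fingerprints in RSSI space.

    fingerprints: list of ((x, y), [rssi_per_beacon]) pairs recorded offline.
    rssi:         RSSI vector measured at the test point (same beacon order).
    """
    ranked = sorted(fingerprints, key=lambda fp: math.dist(fp[1], rssi))
    nearest = ranked[:k]
    x = sum(fp[0][0] for fp in nearest) / len(nearest)
    y = sum(fp[0][1] for fp in nearest) / len(nearest)
    return (x, y)

# Hypothetical fingerprint database: position in metres, RSSI in dBm for two beacons.
DB = [
    ((0.0, 0.0), [-50.0, -70.0]),
    ((1.0, 0.0), [-55.0, -65.0]),
    ((2.0, 0.0), [-60.0, -60.0]),
    ((3.0, 0.0), [-65.0, -55.0]),
    ((4.0, 0.0), [-70.0, -50.0]),
]

estimate = knn_locate(DB, [-56.0, -64.0], k=1)  # closest fingerprint is the one at (1.0, 0.0)
```

With `k=1` the output is restricted to stored fingerprint positions, which is the discrete-output behaviour that makes k-NN estimates jump between points.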
In the above methods, the positioning algorithms or filters were optimized to deliver better performance. However, even with the fingerprinting method, the RSSI database often contains errors that lead to inaccurate results due to the irregular indoor environment. We hypothesize that the causes of environmental errors can be divided into two parts: noise generated by random disturbances, and fixed absorption and reflection due to room layout and furniture. Errors due to random noise can be reduced by applying filters, but fixed RSSI offsets due to environmental factors cannot be corrected using filters. Although the fingerprinting method allows prediction based not on distance but on similarity, the bias due to excessive environmental factors can still cause different fingerprints to become confused with one another.
One study instead starts from the data, using gateways to correct RSSI fluctuations [19]. A gateway measures the signal fluctuations from the surrounding beacons and serves as a reference for correcting the user’s RSSI fluctuations in real time. This approach does not require large amounts of fingerprint data, but the gateway measurements must be collected and processed by a separate server.
In this study, we propose a method to improve indoor positioning accuracy by correcting RSSI offsets in the preprocessing stage. Our main contributions are summarized as follows.
Theoretical analysis and motivation. We provide a novel analysis of the RSSI space, visualizing the relationship between signals from multiple beacons as a characteristic curve. Through simulation, we visually demonstrate how beacon placement affects this curve and how environmental factors introduce spatial offsets, thereby establishing the theoretical foundation and necessity of our approach.
A novel data-centric preprocessing workflow. We propose a novel workflow that uses a machine learning model to learn and correct the static RSSI offsets caused by the indoor environment. This approach shifts the focus from optimizing the filter or the positioning algorithm itself to fundamentally improving the quality of the database and the input data, making it a complementary enhancement for any RSSI fingerprinting-based system. While we selected SVR for our implementation due to its robustness on small datasets, the proposed workflow is model-agnostic and has also proved effective when using other models like BPNN.
A systematic and comprehensive evaluation. We conducted an extensive experimental evaluation involving 60 different combinations of processes. This included multiple positioning algorithms, various pre-processing methods, and different post-processing filters, providing a comprehensive analysis of our method’s impact.
Validation on self-made and public datasets for generalizability. To demonstrate the method’s robustness, we validated its effectiveness on two distinct datasets: a self-collected dataset from a dynamic, real-world laboratory environment and a publicly available benchmark dataset. The consistent performance improvements across both datasets confirm the generalizability of our approach.
Table 1 shows a comparison of the characteristics of our method and some related methods.
The remainder of this paper is organized as follows. Section 2 reviews the related works in indoor positioning and the fingerprinting method. Section 3 presents our proposed system in detail, including the theoretical analysis of the RSSI space and the principles of the offset correction workflow. Section 4 and Section 5 describe the experimental settings and discuss the results, respectively. Section 6 provides further validation of our method using a public dataset. Finally, Section 7 concludes this paper and discusses future work.
3. Configuration of the Proposed System
3.1. Analysis and Simulations of RSSI Space
In this simple simulation, detailed RSSI fluctuations and multipath effects are ignored for the moment so that the analysis can focus on the possible consequences of RSSI offsets.
For ease of visualization, the simulation considers two BLE beacons and a side view of the room, as shown in Figure 2. The two BLE beacons are in the center of the ceiling, and several fingerprint points are in a straight line at a certain distance near the floor. Then, the distance $d$ from a fingerprint point to a beacon is as follows:

$$d = \sqrt{(x - x_b)^2 + h^2} \quad (2)$$

where $x$ is the horizontal coordinate of the fingerprint point, $x_b$ is the horizontal coordinate of the beacon, and $h$ is the vertical height of the beacon to the fingerprint point.
For the distances $d_1$ and $d_2$ from the same point to the two beacons, whose horizontal coordinates are $x_{b1}$ and $x_{b2}$, the relationship between them can be obtained by substitution:

$$d_2 = \sqrt{\left(x_{b1} - x_{b2} \pm \sqrt{d_1^2 - h^2}\right)^2 + h^2} \quad (3)$$

where the sign depends on which side of the first beacon the point lies.
Further, Equation (1) describes the RSSI in terms of $d$; inverting it describes $d$ in terms of the RSSI (Equation (4)).
The relationship between $RSSI_1$ and $RSSI_2$ can then be obtained by substituting the expressions for $d_1$ and $d_2$ into Equation (3). A relationship curve in RSSI space is then formed as described below.
The RSSIs received from the two BLE beacons at points on the same line always satisfy the relation curve in Figure 3a, which appears in RSSI space as a non-monotonic curve with U-shaped turns at both ends.
Similarly, with more BLE beacons, the relation generalizes to a higher-dimensional manifold. Once this relation has been learned, positioning amounts to finding the point on the curve that is closest to the test point. However, as the example in Figure 3b shows, when furniture absorption is simulated by adding attenuation near one of the beacons, the ideal curve may be shifted irregularly by the environment, resulting in large errors in the estimation of some locations. Consequently, for a test point near Locations 9 or 10, the positioning algorithm might incorrectly estimate its position to be near Location 16, which is not in the original neighborhood.
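The effect of such an offset can be reproduced with a small sketch. It assumes the common log-distance form RSSI = A - 10 n log10(d) for the LNSM, and every numeric value (A, n, beacon coordinates, height, and the absorbed region) is a hypothetical choice for illustration, not a parameter from the simulation in this paper.

```python
import math

A, N = -60.0, 2.0        # assumed RSSI at 1 m (dBm) and path-loss exponent
H = 2.0                  # assumed beacon height above the fingerprint line (m)
BEACONS = (2.0, 4.0)     # assumed horizontal coordinates of the two beacons (m)

def lnsm_rssi(d, a=A, n=N):
    """Log-distance path-loss model (assumed form)."""
    return a - 10.0 * n * math.log10(d)

def rssi_pair(x, absorb=None):
    """RSSI from both beacons at horizontal position x.

    absorb=(lo, hi, loss_db) subtracts loss_db from beacon 1 for lo <= x <= hi,
    a crude stand-in for furniture absorption near that beacon.
    """
    r1 = lnsm_rssi(math.hypot(x - BEACONS[0], H))
    r2 = lnsm_rssi(math.hypot(x - BEACONS[1], H))
    if absorb is not None and absorb[0] <= x <= absorb[1]:
        r1 -= absorb[2]
    return (r1, r2)

def nearest_index(curve, point):
    """Index of the curve sample closest to point in RSSI space."""
    return min(range(len(curve)), key=lambda i: math.dist(curve[i], point))

xs = [0.25 * i for i in range(25)]                     # fingerprint line, 0 m to 6 m
ideal = [rssi_pair(x) for x in xs]                     # ideal curve in RSSI space
shifted = [rssi_pair(x, absorb=(1.5, 2.5, 6.0)) for x in xs]

true_index = 8                                         # xs[8] == 2.0 m, under beacon 1
matched_index = nearest_index(ideal, shifted[true_index])
position_error = abs(xs[matched_index] - xs[true_index])
```

Under these assumed values, the attenuated measurement taken directly below the first beacon is matched to a point several metres away on the ideal curve, which is the kind of neighbourhood confusion described above.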
Further, the BLE beacons are in the middle of the ceiling, so the RSSI is the same at equal distances to the left and right of each beacon. If the beacons were instead placed in the corners, as in Figure 4, this left-right ambiguity would not arise. The non-monotonic U-shaped curve in RSSI space would become a monotonic curve, as shown in Figure 5. This slightly reduces the level of RSSI confusion at the turns, i.e., near each beacon, as demonstrated in later experiments.
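The difference between the two placements can be checked numerically. The sketch below again assumes the log-distance form RSSI = A - 10 n log10(d); the parameter values and the helper names `rssi` and `is_monotonic` are hypothetical.

```python
import math

def rssi(x, xb, h=2.0, a=-60.0, n=2.0):
    """Assumed log-distance model for a beacon at horizontal coordinate xb."""
    return a - 10.0 * n * math.log10(math.hypot(x - xb, h))

def is_monotonic(values):
    """True if the sequence never changes direction."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return all(d <= 0 for d in diffs) or all(d >= 0 for d in diffs)

xs = [0.25 * i for i in range(25)]       # fingerprint line from 0 m to 6 m
center = [rssi(x, 2.0) for x in xs]      # beacon above the interior of the line
corner = [rssi(x, 0.0) for x in xs]      # beacon above one end of the line

print(is_monotonic(center), is_monotonic(corner))  # prints: False True
```

A beacon above the interior of the line produces an RSSI that rises and then falls along the line, while a beacon above one end produces an RSSI that only falls, so each RSSI value corresponds to a single position.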
3.2. Principle of the Proposed Method
The theoretical premise of our proposed method is grounded in manifold learning. We conceptualize the “ideal curve” derived from the LNSM as a representation of a clean manifold in RSSI space. In a real environment, physical factors like absorption and multipath distort this into a complex but still structured measured manifold. Our fundamental hypothesis is that a regular, non-linear mapping function f exists that can transform the distorted measured manifold back toward the ideal one.
Consequently, the problem is framed as a multivariate regression task: to learn the optimal mapping function f from the measured RSSI to the ideal RSSI. The “optimization process” involves training a regression model (e.g., SVR or BPNN) to find the parameters of f that minimize a loss function over the known fingerprint data. The trained model, thus, captures the complex environmental offsets, allowing it to correct the RSSI of new test points in the online phase.
Mainstream research, as reviewed previously, has primarily focused on improving positioning quality by enhancing the traditional fingerprinting method (Figure 1) with advanced filters, optimized machine learning algorithms, or multi-sensor fusion. In contrast, this study proposes an RSSI offset correction method that involves neither modifying the positioning algorithm nor simple signal filtering.
As illustrated in Figure 6, our proposed method enhances the traditional fingerprinting process with several new steps, which are highlighted in gray. By learning the relationship between the offset and the ideal RSSI, the RSSI offset in the online phase is corrected toward the ideal state, and then the ideal fingerprint database is referenced.
3.3. Ideal RSSI Calculation
To calculate the ideal RSSI, an average decay curve for the specific environment must first be established. In practice, the decay of the RSSI does not follow the LNSM curve exactly but contains a great deal of noise. This can be seen by analyzing the distance to the BLE beacon and the corresponding RSSI for each point in the fingerprint database (Figure 7).
To restore the average decay curve and calculate the ideal RSSI, the average RSSI at each fingerprint point is computed, and the decay parameters of the ideal RSSI-distance decay curve are fitted according to the LNSM. The ideal RSSI for each fingerprint point is then calculated from this fitted curve, and these values collectively form the ideal fingerprint database.
The logarithmic decay curves were fitted to the RSSI–distance data of each beacon using nonlinear least squares. Table 2 summarizes the fitted parameters and the fitting accuracy indices.
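A minimal sketch of this fitting step follows. It assumes the two-parameter form RSSI = a - 10 n log10(d); because that form is linear in log10(d), the sketch uses ordinary linear least squares as a stand-in for the nonlinear least-squares fit used in the paper. The function names and the sample values are hypothetical.

```python
import math

def fit_lnsm(distances, mean_rssi):
    """Least-squares fit of RSSI = a - 10 * n * log10(d); returns (a, n)."""
    u = [math.log10(d) for d in distances]
    m = len(u)
    u_bar = sum(u) / m
    r_bar = sum(mean_rssi) / m
    sxx = sum((ui - u_bar) ** 2 for ui in u)
    sxy = sum((ui - u_bar) * (ri - r_bar) for ui, ri in zip(u, mean_rssi))
    slope = sxy / sxx                 # slope equals -10 * n
    a = r_bar - slope * u_bar         # intercept is the RSSI at d = 1 m
    return a, -slope / 10.0

def ideal_rssi(d, a, n):
    """Ideal RSSI at distance d on the fitted decay curve."""
    return a - 10.0 * n * math.log10(d)

# Hypothetical per-point averages for one beacon: distance (m) and mean RSSI (dBm).
dist_m = [1.0, 2.0, 3.0, 4.0, 6.0]
rssi_dbm = [-61.0, -65.0, -71.0, -71.5, -77.0]

a_fit, n_fit = fit_lnsm(dist_m, rssi_dbm)
ideal_db = [ideal_rssi(d, a_fit, n_fit) for d in dist_m]   # ideal fingerprint values
```

The list `ideal_db` plays the role of one beacon's column in the ideal fingerprint database; repeating the fit per beacon gives the full database.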
3.4. Filtering
Since the periodic channel hopping characteristic of Bluetooth broadcasting affects the RSSI strength [22], the online phase requires real-time filtering of the signal before real-time correction. In this experiment, moving average filtering, which has low delay and low computational cost, is used for real-time noise reduction, as shown in Figure 8.
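A streaming moving average of this kind can be sketched as follows. The class name `MovingAverage`, the window length, and the sample values are hypothetical; the window length used in the experiments is not assumed here.

```python
from collections import deque

class MovingAverage:
    """Streaming moving-average filter over the most recent `window` samples."""

    def __init__(self, window):
        self.buf = deque(maxlen=window)   # the oldest sample is dropped automatically

    def update(self, value):
        """Add one RSSI sample and return the current average."""
        self.buf.append(value)
        return sum(self.buf) / len(self.buf)

# One filter per beacon; the samples below are hypothetical RSSI readings in dBm.
maf = MovingAverage(window=3)
smoothed = [maf.update(v) for v in [-60.0, -66.0, -63.0, -90.0]]
```

A longer window suppresses more noise but delays the response to real movement, which is why the window is tied to the acceptable delay.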
3.5. Correction Model Training
We found that BPNNs are complicated to debug on small samples due to their training instability. To achieve effective results, the model needs to be trained on the complete RSSI sequences for all points over a certain period. SVR, on the other hand, can achieve effective results by learning from only the average RSSI of each point. As such, after comparative analysis, SVR was used for RSSI correction in this experiment. To prevent confusion, it is important to add that the SVR used for correction is not the SVR used for position estimation; the two processes are independent of each other.
The original RSSI of each fingerprint point is used as the feature, and the ideal RSSI is used as the label for training. The trained model is saved and then used in the online phase to correct the RSSI of the test points in real time.
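The train-then-correct flow can be sketched with the standard library alone. Note that this sketch does not implement SVR: it substitutes a simple nearest-neighbour offset regressor as a stand-in so that the example stays dependency-free. The function names and all RSSI values are hypothetical.

```python
import math

def train_correction(measured, ideal):
    """Offline phase: store each measured RSSI vector with its offset (ideal - measured)."""
    model = []
    for m_vec, i_vec in zip(measured, ideal):
        offset = [i - m for m, i in zip(m_vec, i_vec)]
        model.append((list(m_vec), offset))
    return model

def correct(model, rssi, k=2):
    """Online phase: shift a filtered RSSI vector by the mean offset of its
    k nearest training vectors in RSSI space (stand-in for the SVR prediction)."""
    nearest = sorted(model, key=lambda entry: math.dist(entry[0], rssi))[:k]
    mean_offset = [
        sum(entry[1][j] for entry in nearest) / len(nearest)
        for j in range(len(rssi))
    ]
    return [r + o for r, o in zip(rssi, mean_offset)]

# Features: average measured RSSI per fingerprint point (two beacons, dBm).
measured_db = [[-66.0, -70.0], [-72.0, -69.0], [-67.0, -67.0]]
# Labels: ideal RSSI for the same points, taken from the fitted decay curve.
ideal_db = [[-64.0, -70.0], [-66.0, -69.0], [-67.0, -67.0]]

model = train_correction(measured_db, ideal_db)
corrected = correct(model, [-72.0, -69.0], k=1)   # → [-66.0, -69.0]
```

The corrected vector, rather than the raw one, is then matched against the ideal fingerprint database by whichever positioning algorithm is in use.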
5. Results
Due to the large amount of data, we divided the information into three main groups for comparison: the raw data group, the MAF pre-processing group, and the correction pre-processing group. Accuracy was measured by the average error (cm) of all points over 150 s, and stability was measured by the number of curve peaks in total. Considering that the filter produces a delay, we preferred to control the delay within an acceptable range (less than 10 beacon-sending cycles) by modifying the parameters.
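The two measures can be sketched as follows. The sketch assumes that a "peak" is a sample strictly greater than both of its neighbours in the error-over-time curve; the exact peak definition used in the experiments is not stated here, and the function names and values are hypothetical.

```python
import math

def mean_error_cm(estimates, truths):
    """Average Euclidean distance (cm) between estimated and true positions."""
    errors = [math.dist(e, t) for e, t in zip(estimates, truths)]
    return sum(errors) / len(errors)

def count_peaks(series):
    """Number of samples strictly greater than both neighbours (assumed peak definition)."""
    return sum(
        1
        for prev, cur, nxt in zip(series, series[1:], series[2:])
        if cur > prev and cur > nxt
    )

# Hypothetical values: positions in cm and an error-over-time curve in cm.
accuracy = mean_error_cm([(0.0, 0.0), (30.0, 40.0)], [(0.0, 0.0), (0.0, 0.0)])  # → 25.0
stability = count_peaks([10.0, 30.0, 20.0, 50.0, 40.0, 40.0, 60.0])            # → 2
```

Fewer peaks indicate a smoother positioning curve, so the two numbers capture accuracy and stability separately.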
5.1. Raw Data
As shown in Figure 12 and Table 4, the positioning results for unprocessed data (blue line) exhibited very large errors and fluctuations. In this environment, all of the post-processing filters substantially stabilized the positioning results by filtering out the noise.
5.2. MAF Pre-Processing
As shown in Figure 13 and Table 5, the data processed by the moving average filtering pre-processing (blue line) clearly showed a substantial improvement in accuracy. Except for k-NN, which still had some fluctuations due to its discrete output, all the other localization algorithms obtained more stable curves. In this setting, applying different post-processing filters yielded no significant improvement in accuracy but rather served to further smooth the positioning curve.
5.3. Correction Pre-Processing
As shown in Figure 14 and Table 6, the data preprocessed with the correction (blue line) had a much smaller mean error. Post-processing filtering similarly had no significant effect on reducing the mean error but rather smoothed the curve further.
5.4. Analysis and Summary of Results
To illustrate the improvement achieved by the proposed correction method, Figure 15 and Table 7 show the average error sizes at various points across all experiments. It can be observed that the maximum, minimum, and average errors were all reduced. Notably, due to the complex indoor environment (including metal lockers and whiteboards), errors at certain points increased slightly. We attributed this to severe, localized interference causing the correction model to generate an inaccurate mapping in these specific areas, a phenomenon that warrants further investigation.
We also examined the corrected RSSI space to verify the effectiveness of the method. However, compared to the previous simulation in Figure 5, there were not enough points directly above the line connecting the two BLE beacons in this experiment to observe the effect. Therefore, we selected two longer beacon connection lines and test points within 1 m of these lines as references. In Figure 16, it can be seen that, after correction, the average RSSI of the test points is closer to the ideal curve, indicating that the RSSI space had been partially corrected.
We also performed a series of paired samples t-tests (with two-tailed p-values) to systematically evaluate the significance of our method’s improvement for each of the four positioning algorithms. The results are informative.
For k-NN (p = 0.033) and BPNN (p = 0.037), our MAF + Correction method provided a significant improvement in accuracy compared to the MAF-only method. This indicates that our data enhancement is effective for these representative algorithms.
For SVR (p = 0.38), the improvement was not significant. This is an insightful finding that we believe is due to the inherent robustness of the SVR algorithm itself. SVR is known for its insensitivity to noise and outliers. We hypothesize that the SVR positioning model is already effective at handling the raw, uncorrected data, thus leaving less room for improvement from our preprocessing step.
For Ridge Regression (p = 0.057), the improvement did not reach significance at the 0.05 level, although the p-value was close to that threshold.
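For reference, the statistic behind a paired-samples t-test can be computed as in the sketch below. The error values are hypothetical and are not the measurements from this study; the sketch returns only the t statistic and the degrees of freedom, since the two-tailed p-value requires the t distribution (available, for example, through `scipy.stats.ttest_rel`).

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Paired-samples t statistic and degrees of freedom for matched samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))   # stdev is the sample (n - 1) form
    return t, n - 1

# Hypothetical per-point mean errors (cm) for the same test points under two pipelines.
maf_only = [10.0, 12.0, 9.0, 11.0]
maf_plus_correction = [8.0, 11.0, 9.0, 8.0]

t_stat, dof = paired_t(maf_only, maf_plus_correction)
```

The pairing matters: each test point is compared with itself under the two pipelines, so point-to-point differences in difficulty do not inflate the variance.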
In summary, effective error reduction primarily resides in pre-processing, emphasizing its necessity for handling noisy data, particularly for positioning algorithms that are more sensitive to input data quality. On average, across all algorithms, applying only MAF pre-processing improved accuracy by approximately 21% compared to raw data, while our additional RSSI correction method provided a further 6% improvement.
Post-processing filtering is optional. Concerning positioning methods, k-NN has simpler deployment and a favorable average error. However, because its output is discrete, k-NN benefits more from post-processing in terms of stability. In contrast, algorithms with continuous outputs, such as SVR, have limited potential for improvement from post-processing.
Users can weigh the need for post-processing based on stability and delay requirements. The effect of various post-processing filters with the same delay is similar. Therefore, opting for filters with lower computational complexity, such as MAF or MMF, could conserve computational resources.
7. Future Works
Due to resource limitations, this experiment was conducted in only two laboratory environments, so both the data volume and the variety of environments are limited. To validate the method’s broad applicability, future experiments should be conducted in diverse environments, such as low-interference settings, large-scale spaces, and complex structures.
Furthermore, while the experiments demonstrated the method’s effectiveness, no complex machine learning techniques, such as CNN or LSTM, were introduced for comparison during the correction and positioning stages. Nor were direct comparisons or integrations with other data quality enhancement methods conducted. In the future, supplementary experiments will be performed to further validate or improve the method, ensuring a comprehensive evaluation of its usefulness.