1. Introduction
The demand for accurate indoor positioning services has grown significantly in the last decade, resulting in an increasing interest in localization methods that work in environments where GPS (Global Positioning System) signal is unreliable or unavailable at all.
The application areas that may benefit from reliable personal localization in GPS-denied environments are numerous. However, the largest interest in indoor positioning is generated by applications related to new business and advertising opportunities. Location-aware advertising and the possibility to analyze the traffic patterns of consumers make the indoor localization services attractive to owners or managers of many commercial buildings. This trend is confirmed by a market report [
1] that predicts very fast growth of the indoor positioning systems market.
Among the approaches to indoor positioning, the most interesting are those that are infrastructure-free: they do not need to deploy any dedicated active beacons (e.g., iBeacons [
2]) or artificial landmarks in the environment, which reduces the costs of installation and maintenance of a localization system. On the other hand, to make the localization services available to a large number of potential users, smartphones and other mobile computing devices like tablets are widely used as the sensing component of indoor positioning systems [
3].
Systems utilizing camera images [
4,
5], Bluetooth Low Energy [
6], anomalies of the ambient magnetic field [
7], and even light sources [
8] have been proposed for localization with smartphones. A brief survey of the indoor positioning techniques focusing on the suitability of different sensing modalities is presented in Ref. [
9]. However, the most widely used localization modality for personal indoor positioning is the WiFi signal. Nowadays, WiFi networks are ubiquitous in modern buildings making WiFi-based positioning affordable and relatively easy to deploy on site. In recent years, a number of companies attempted to commercialize their localization solutions. In specific sites commercial mobile location services are provided by Skyhook (Skyhook website:
https://www.skyhook.com/), which offers hybrid positioning technology involving WiFi, GPS, cell towers, and IP addresses. More recently, the Indoo.rs company (Indoo.rs website:
https://indoo.rs/) offered a solution that mixes the iBeacons technology and WiFi-based positioning for indoor localization, deploying their system in several large public sites. However, this system and other similar solutions require dense wireless signal maps [
10], that are laborious to prepare and maintain.
Therefore, we focus on WiFi-based positioning algorithms that do not need to survey the entire environment the localization system is deployed in. The ability to localize with a limited number of known WiFi fingerprints can be achieved by employing graph-based optimization to relax a network of constraints stemming from the matches between the known fingerprints and the consecutive scans perceived along the user’s trajectory. Constraints imposed by the motion of the user, estimated by a Personal Dead Reckoning (PDR) system on the smartphone are used to join the consecutive poses into a trajectory. A single-user indoor positioning system based on this approach has been presented in our paper [
11].
In this journal article we go beyond the previous approach, making a number of original contributions over our earlier research and the state-of-the-art:
We re-formulate the graph-based optimization approach for PDR and WiFi constraints to jointly optimize the trajectories of multiple users in the same building, improving the accuracy of the estimated position for involved users.
We introduce the hyperedges that efficiently represent in the graph multiple constraints imposed on the agent pose by positioning with the Weighted k-Nearest Neighbor technique.
We show how the user’s step length, which is a key parameter in the PDR, can be jointly estimated in the graph-based formulation during the localization.
We demonstrate that with the presented approach it is possible to update the existing WiFi map.
2. Related Work
In this research we are interested in infrastructure-free, smartphone-based indoor positioning exploring the WiFi signals from the access points already available in the environment and the proprioceptive sensors that are present in almost every modern mobile device: accelerometers, gyroscopes, and magnetometers.
Indoor positioning using WiFi signals is investigated for more than a decade. A survey of classic approaches can be found in Ref. [
12]. If positions of the access points with respect to the floor plan are known, then WiFi signal triangulation can be used. Three or more line-of-sight AP positions are necessary to determine the location of the WiFi adapter using the distance to each AP measured from the parameters of the received signal [
13]. Unfortunately, direct use of the received signals is complicated indoors, because walls, furniture, and other objects located in the environment, including the people, attenuate or scatter the WiFi signals in a way that is hard to model and predict [
14]. A solution to estimate the user position with respect to the WiFi APs using a signal propagation model was proposed in Ref. [
15]. This approach using also inertial data and employing Kalman filtering is computationally efficient and runs on a smartphone, but in real indoor environments it yields inaccurate results due to the simplified signal propagation model. Although it is possible to build a map of WiFi signal strength using Gaussian processes [
16], thus avoiding to model explicitly the propagation of signals, this method is computation intensive and requires a laborious survey of the whole area under consideration.
Conversely, approaches employing WiFi fingerprinting do not need to model the signal propagation at all and do not need to determine exact positions of the APs, or even to know the number of the APs deployed in the area. This is of great practical importance, as in many environments not only a location of objects altering the WiFi signals changes frequently, but some wireless networks appear or disappear at random, because routers, printers, and other devices are often moved between locations.
Currently, the most widely applied fingerprinting-based positioning technique is Weighted
k-Nearest Neighbor (WKNN) [
17]. This relatively simple method selects
k most similar fingerprints from the pre-surveyed database describing the environment and computes the user position as a weighted average of the positions of these fingerprints. Weights are related to the similarity between the matched fingerprints. A drawback of this technique is the need to manually tune its parameters for the given environment. Although attempts to develop WiFi-based indoor positioning algorithms that require a minimum effort at the design stage, for example by employing end-to-end machine learning [
18], have been presented in the literature, the achieved localization accuracy is not much improved with respect to WKNN.
In general, the need for a database of densely distributed scans is a common drawback of positioning algorithms using only the WiFi fingerprints. However, in practical applications, we rarely are facing the problem of global localization with WiFi data, without any prior information about the state of the agent. In personal localization with smartphones, a prediction of the user’s current pose giving the previous pose on the path can be achieved by means of PDR, exploiting the inertial sensors of the mobile device. Combining WiFi fingerprinting with PDR pose estimates makes it possible to employ filtering algorithms to track the position of the user. The Extended Kalman Filter (EKF) [
19] allows the positioning system to run on a device with limited computing resources. Unfortunately, the EKF algorithm is prone to linearization errors on the highly uncertain measurements from a smartphone and does not support non-Gaussian user position hypotheses, which can emerge due to the ambiguity in WiFi scan matching. The most popular alternative approach is Particle Filter (PF), which can track multiple hypotheses of the user position [
20], and enables to easily incorporate the user position constraints defined by the known environment map [
21]. Recently, an indoor localization system employing an advanced version of PF was successfully demonstrated in a challenging scenario of a museum site in a large historic building [
22]. On the other hand, PF often requires a large number of particles to converge to the true position and is computation intensive, which makes it less suitable for implementation on a smartphone [
23]. The performance of EKF and PF in the context of indoor positioning is compared in Ref. [
24].
Another prominent approach borrowed from the area of mobile robotics is graph-based localization. Nowadays, solving the Simultaneous Localization and Mapping (SLAM) problem transformed into a graph of constraints by means of computationally efficient non-linear optimization is considered the state-of-the-art in robotics [
25]. Graphs are convenient data structures to encode topological relations, hence a number of indoor positioning systems employ graphs to represent the indoor space or the possible user paths. Hilsenbeck et al. [
26] use a graph structure to achieve an adaptive quantization of the indoor space that in turn reduces the state-space dimensionality in a particle filter-based localization algorithm that combines PDR and WiFi signal strength measurements. This results in a substantially reduced number of particles in the filter and improved performance on a smartphone. Available walking paths abstracted from a floor plan are represented as a graph in Ref. [
27]. This approach assumes that users move only along edges in the graph that form Markov chains. The limited number of possible motions encoded by Markov chains makes it possible to improve the performance of Kalman filter used to fuse WiFi fingerprinting and inertial measurements of the user’s pose. Graphs used just to represent the environment’s topology do not encode the constraints related to measurements, and therefore cannot represent the entire localization problem, like the graphs used in SLAM do. Although graph optimization in SLAM is usually applied in the context of range-based, visual or inertial-visual sensing, it has been demonstrated [
28] that the idea of graph optimization can be used to integrate information obtained with several sensing modalities in a plug-and-play manner. Signal strength from WiFi APs was also used as the sole source of localization information in an off-line SLAM system employing graph optimization [
29]. Whereas none of these publications tackled the problem of modelling WiFi fingerprinting constraints in a graph-optimization framework, our previous research [
11] demonstrated the feasibility of the so-called factor graphs [
30] for modelling and integrating the constraints imposed by WiFi and PDR in the indoor positioning problem. Assuming Gaussian sensor noise, factor graphs provide an efficient way to represent large, non-linear least-squares optimization problems, which are typical in robotic SLAM, but are present also whenever a large number of WiFi fingerprints and user motion estimates have to be fused into a coherent representation. We have demonstrated the real-time performance of this approach on a typical smartphone employing a general-purpose graph-optimization software framework (library) commonly used in robotics. Very recently, this system was extended by adding non-metric constraints [
31], which improved robustness and allowed to handle multi-floor environments. Among the indoor positioning systems described in the literature, the approach most similar to our system is the FastGraph [
32]. Although this work has been published very recently, it does not take advantage of the existing graph-optimization packages, but constructs a graph of constraints that explicitly consider the WiFi signal propagation using the Log-Distance Path Loss Model. This graph is then relaxed to a minimum energy state using a force-directed approach following the spring analogy for edges-constraints. Comparing it to our approach, FastGraph in its default configuration employs dedicated devices to collect WiFi fingerprints, whereas we need only smartphones/tablets held by the users, and requires manually setting a number of values that govern the relaxation process. Our probabilistic solution requires only to define sensor models that determine the behavior of constraints.
Even though the PF and graph-based localization methods offer satisfying accuracy and robustness, major obstacles that hinder the wider acceptance of WiFi-based indoor positioning in commercial applications are the time and effort necessary to build and maintain a database of fingerprints. An example of a large database of WiFi data is given in Ref. [
33]. It was collected by volunteer users, locating the WiFi scans upon a manual input on the known map. Although the volunteers could roughly estimate their position using a variant of the WKNN approach, the process of collecting the data was laborious, and it is one of the few data sets of this kind that are available. Hence, approaches that do not need such a database, or can collect it automatically are of great practical importance [
34]. Automatization of the site survey by using another localization system, e.g., RGB-D SLAM [
35], requires additional equipment and is rather impractical for large environments such as shopping malls. Thus, approaches such as UnLoc [
36] started to emerge. UnLoc is a smartphone-based positioning system that exploits the possibility to track mobile devices of multiple users by means of dead reckoning and landmarks, in order to collect the database of WiFi signals without any explicit involvement of these users. A similar idea was implemented in Ref. [
21] using particle filtering and leveraging the role of a known map of the environment. Laoudias et al. [
20] used the same general idea, termed crowdsourcing, to avoid the need to survey the site, and an explicit participation of the users of diverse mobile devices in collecting the WiFi signals. The WicLoc system [
37] employed the WKNN algorithm to assign weights to APs and to achieve room-level localization from crowdsourced data. Most of the crowdsourcing methods employ point-based maps of the fingerprints, where the WiFi scans are pinpointed to locations distributed in a regular grid or along predefined paths. A different approach is taken in Ref. [
38]. The map of WiFi fingerprints is abstracted by the semantics graph, with possible paths of the users as edges, and specific locations as vertices. Smartphone-based dead reckoning and user activity detection are employed to locate the trajectories and construct a map of fingerprints.
Although crowdsourcing methods become increasingly popular for indoor positioning with smartphones, the approaches proposed in this area do not take advantage of the existing body of research in multi-robot localization and SLAM. A recent survey on multi-robot SLAM [
39] suggests that approaches based on graph optimization (i.e., factor graphs) efficiently handle constraints between the agents when they observe the same environment, which is essential to localize multiple users and build the map simultaneously with a minimal involvement of these users.
3. Graph-Based Optimization
In this research, we adopt an approach to SLAM commonly used in robotics to the problem of indoor positioning with WiFi fingerprints and inertial estimation of the user motion. In the last decade, non-linear optimization algorithms started to be applied to the SLAM problem [
25]. In this formulation of SLAM, the maximum-a-posteriori (MAP) estimate of the unknown variables is computed given a set of measurements, that can be expressed as a function of the agent state. This approach avoids large linearization errors and approximations made by the SLAM algorithms employing filtration, thus yielding accurate estimates of the agent state [
40]. A full probabilistic formulation of SLAM as a Bayesian filtering problem with explanation of the essential assumptions as to the Markovian nature of the observed processes and statistical independence of variables can be found in the vast body of literature on robot localization, e.g., in our survey paper [
41]. However, for the sake of clarity, we provide here a brief introduction to the factor graph formulation of SLAM, focusing on the physical meaning of particular components. Later, we use this formulation to comment the changes that were necessary to adopt this framework for indoor positioning.
The SLAM problem is cast as estimating the probability distribution of the state vector
, given the measurements vector
. If the measurement errors follow a zero mean normal distribution, then the likelihood of the measurements is Gaussian. The distribution over the states given the measurements is proportional to the likelihood of the measurements given the states:
where
is the prediction of the measurements, given the state, and
is the information matrix of the conditional measurements. To solve the problem we need to compute a state
that maximizes (
1):
where assuming the total number of
M measurements,
yields the
m-th predicted measurement/observation applying the sensor model. Defining the difference between the
m-th observation and its prediction as the error function:
we can define the minimized function in Equation (
2) as:
Using the Taylor approximation (the sensor model is a non-linear function of the state), we re-write (
4) applying small increments
and the linearization point
in the state space:
where
with the Jacobians
. Then, it is easy to show that the
minimizing the quadratic form in Equation (
5) can be found solving the equation:
where the Hessian is defined as
, and
. Due to the non-linearity of the measurement model we need to apply a non-linear numerical method to solve (
6), i.e., Gauss-Newton or Levenberg-Marquardt algorithm. In particular, the solution can be computed efficiently modelling the problem as a factor graph. Once proper error functions with sensor models and their Jacobians are defined, the state vector minimizing (
6) can be computed by one of the available generic factor graph-optimization packages applying a very efficient numerical solver [
25].
In the case of indoor positioning, the instantaneous agent state is just its 2-D pose (
x,
y position and heading angle
) in the environment, while the unknown state variables
are representing the set of past poses (i.e., agent trajectory) and the environment map, which is represented as a discrete set of landmarks that can be observed by the agent. The dependencies between the poses of the agent, the detected landmarks, and the measurements can be conveniently represented as a factor graph (
Figure 1), whose vertices are the poses of the agent and the landmarks, and the edges represent constraints imposed by the motion of the agent, and measurements of the landmark positions, The measurements are represented in the graph as factors, shown in
Figure 1 by small black rectangles.
In our system the factor graph has two kinds of vertices: agent poses are represented by
, while
represents landmarks, thus the state is formally defined as
. The edges represent either the constraints
imposed by agent motion between the
-th and
i-th pose, or the constraints
, stemming from the measurements between the
i-th pose of the agent, and the
j-th landmark. Whenever
k landmarks are measured from the same pose of the user at the same time a “hyperedge” can appear in the graph that represents all the
k constraints
for
imposed by the simultaneous measurements. The uncertainty of constraints imposed by landmark observations is represented by their information matrices
computed upon the empirical model of measurements, while the uncertainty of motion constraints
depends on the accuracy of the agent ego-motion estimation. Considering single landmark-to-pose constraints as hyperedges with
k = 1 within the introduced graph structure, the function to minimize in the localization problem is given as:
where
n is the number of recorded agent poses,
and
are error functions for the landmark-to-pose and pose-to-pose constraints, respectively.
Some factors can represent loop closures, which are constraints between poses representing the locations that are re-visited by the agent. These constraints are particularly important, as they inject new information to the graph, thus enabling the optimization algorithm to reduce the agent trajectory drift. Optimization reduces the drift only if at least one vertex in the factor graph has at least two incoming edges. This implies the existence of constraints other than those produced by agent motion along the trajectory. It should be noticed, that constrains from wrong loop closures or spurious landmark measurements have a negative influence on the optimization process because of the least-squares formulation of the estimation problem represented by the factor graph [
42].
This framework is often implemented by a system divided into two parts: the front-end, that abstracts sensory measurements into the graph-based representation that is amenable for optimization, and the back-end, that employs a graph-optimization library to solve (
7). One of the most widely used factor graph-optimization packages is the g
o [
43], employed in a number of robotic SLAM systems. This library is also used in our system and is configured to apply the Preconditioned Conjugate Gradient algorithm (PCG), which is fast but requires a good initial guess of the location of the graph vertices. This choice is a trade-off between robustness and computation speed. When using the PCG, the memory consumption grows linearly in the number of state variables. Although in theory factor graph optimization has cubic complexity in the number of state variables, the actual complexity is much lower due to the sparse nature of the SLAM problem. Following the argumentation given by Strasdat et al. [
40] we consider three operations that cause the computation burden: evaluation of the error functions and Jacobians, building the linear system, and solving the linear system. The first operation is linear in the number of observations, thus the worst-case complexity can be estimated as
for
n agent poses and
m measurements, assuming that all fingerprints are seen from all poses, while in practice only a small fraction of the fingerprints is detectable from each pose. The second operation is quadratic in
n and linear in
m, while the third one is cubic in
n. Because in the indoor positioning task the number of poses is relatively small, while the solvers available in g
o efficiently exploit sparsity of Hessian in the linear system [
43], the cost of computations is dominated by evaluating the Jacobians and building the linear system. Hence, the computational complexity can be estimated rather as
.
Adopting the SLAM framework to indoor positioning with WiFi data requires to re-define the notion of landmarks, as in our formulation the “landmarks” represent the locations of WiFi fingerprints. The pose-to-pose constraints, that in mobile robotics are usually established using odometry (either wheeled or visual), are yielded by PDR in the case of our smartphone-based personal localization system. In turn, the use of both, WiFi scans and PDR, requires proper definitions of the error functions
and
, respectively. These functions should account for the uncertainty characteristics that are specific to the measurements taken with the limited accuracy sensors of a smartphone [
11].
6. Experimental Evaluation
The presented system is focused on utilizing graph-based optimization and multiple user trajectories to increase the accuracy and to make it possible to build a WiFi map with minimal effort. We are not focusing on the performance in the single-user case, as we believe that the proposed approach does not outperform particle filtering with respect to the localization accuracy of an individual user on a dense enough WiFi map. Therefore, the dataset required in our case should contain as many trajectories on a relatively small area as possible. We are unaware of existence of such a dataset, and therefore we have recorded our own sequences, that are made publicly available together with the open-source code of our system.
6.1. Experimental Setup
The experiments were conducted on the third floor of the Centre of Mechatronics, Biomechanics, and Nanoengineering at Poznan University of Technology. The building has multiple corridors that can be used to access laboratories and lecture halls. In our experiments, we recorded 24 sequences that users would usually make to access these rooms that varied when it comes to the length. Two people were involved in recording the sequences: one responsible for recording 19 sequences and another one that recorded 5 sequences. We used a Sony Xperia Z3 smartphone to capture WiFi, accelerometer and orientation data for further processing. The detailed lengths of the sequences are presented in
Table 1. The total distance covered by the sequences according to the ground-truth information is equal to approx. 1565 m. During these sequences, 346 WiFi scans were recorded to support indoor localization. The ground-truth information was obtained from users that annotated the covered paths on the prepared map of the building. The ground-truth routes are presented in
Figure 7.
In this research, we assumed that an
a priori map of WiFi fingerprints is available for the environment. Hence, we recorded WiFi fingerprints along the walls of the corridors. The scans were recorded in-motion along the assumed route with constant walking speed. The dynamic capture procedure was preferred to the static recordings, as it makes possible to quickly capture the required WiFi map, despite being slightly less accurate than the static acquisition approach. The number of APs is based on the number of different measurements in the scans and measurements at 2.4 GHz and 5 GHz from the same APs are treated as independent measurements. The recorded WiFi map consists of 114 scans, and the average number of APs per scan (i.e., user position) is equal to 21.91, but varies per location. The minimum number of APs visible in one location is equal to 10 while the maximum number is equal to 31. The locations with known WiFi scans along with the recorded number of distinct APs in each location are presented in
Figure 8.
6.2. Tuning the WiFi WKNN System
The performance of the WKNN algorithm depends on the choice of a few parameters that have to be tuned:
k - the number of scans from WiFi map used to estimate the user’s location,
- the minimal ratio of networks shared between two WiFi scans required to consider them in the WiFi WKNN approach,
- the minimal signal strength of the measurement that is used in the WiFi WKNN (it is used to remove very weak measurements),
- the type of error measure (distance norm) used to compare measurements from two scans.
Therefore, we performed a simple exhaustive search of all of the configurations to determine the best values of parameters on recorded sequences. We evaluated the following values of parameters:
,
,
,
.
The best results were obtained for
,
,
and
(Euclidean norm), with a mean error being equal to 2.86 m, third quartile of error equal to 3.66 and the maximum error of 22.47 m. The errors of the WKNN approach, in that case, can be observed in
Figure 9. We do not change these parameters in all further evaluations. Probably, more accurate localization could be obtained with a greater
k parameter when the density of WiFi map is increased, but no clear rules are known to the authors how to make these changes during operation. Therefore, these parameters remain fixed making it possible to clearly show the contribution of the proposed approach.
6.3. Localization of a Single User
Firstly, the system is evaluated in single-user scenarios. The proposed graph-based system was evaluated on the 24 recorded sequences. To enable quantitative comparison between the obtained trajectories, a translational trajectory error metric is used, which is similar in concept to the ATE (Absolute Trajectory Error) proposed in Ref. [
48]. The ATE measure is a typical tool used in the evaluation of vision-based SLAM systems in robotics when the ground truth is available. The ATE measuring tools cannot be used directly in the case of our dataset, as the ground-truth path
and the estimated trajectory
are not synchronized. This is caused by the fact that an external motion capture system or GPS cannot be used to obtain the ground-truth trajectory over long paths in natural indoor environments. As the result, the ground-truth path is obtained by the users manually marking their route with respect to the walls and objects of known positions in the floor plan.
In the proposed metric, the estimated trajectory
is firstly mapped onto the subsampled ground-truth route
by computing the rigid-body transform
that is the least- square solution to the alignment problem [
49]. Then, the trajectory error for the
i-th pose in the trajectory is computed as:
Similarly to the ATE, our metric defines the error as the translational component of and computes the Root Mean Square Error (RMSE) or the mean error over the whole sequence. Due to the lack of time synchronization, the final error of is computed between the given node of and the closest (in the Euclidean sense) point on the original route . As a result, the metric error of the trajectory is found while the rotational part is ignored.
The performance of the system on two exemplary sequences is presented in
Figure 10. Each blue circle represents one estimated user location that is connected to the next node using a blue edge that represents the PDR estimate. Each black circle represents the location with WiFi scan stored in the map that was utilized at least one time for the WiFi WKNN estimation. The WiFi WKNN estimation is represented by joining the user location with four locations of the scans stored in the map using red lines. The weight of each connection in the WiFi WKNN is not shown in the figure. The ground-truth path is shown using yellow segments.
The system obtains the RMSE of 2.04 m and an average error of 1.29 m on sequence 1 and the RMSE of 3.23 m and an average error of 2.02 m on sequence 5. For all of 24 sequences, the RMSE was measured to be equal to 5.19 m and the average error is equal to 2.21 m. Detailed results are presented in the following section in
Table 2. This proves that the proposed processing can be used to estimate the trajectory of the user in the environment. It is important to notice, that the proposed error measure underestimates the real error as the time relations between estimated and ground-truth trajectories are not taken into account. As a result, the delay of the estimate of the user location or incorrect localization along the ground-truth route is not punished in any way. Therefore, the measure should not be used to argue about the absolute accuracy of the system but can be used to compare different configurations of the system and to determine which configuration is more accurate.
6.4. Localization of Multiple Users
The experimental sequences were recorded in the building along the routes taken by users to reach different destinations. As a result, the same corridors were explored multiple times by different trajectories and all of the sequences were used in the comparison. In the first approach, the graphs for each sequence were constructed and optimized separately and the obtained trajectory estimates were evaluated against the ground-truth information. In the second approach, the relations between different trajectories were utilized. To ease the computation, it was assumed that the trajectories were recorded sequentially meaning that during the processing of the
i-th sequence it was possible to utilize the information about previous
sequences. In other words, the scan in the currently processed sequence was compared to all scans in the prior WiFi map and all WiFi scans captured by users in the previous sequences. The resulting graph contained edges joining location of users from different trajectories. The whole graph containing information from all sequences is optimized jointly to provide the best possible estimate for all sequences. The results obtained for both configurations of the system are presented in
Figure 11 and in
Table 2.
When each sequence is optimized separately, the overall RMSE for all sequences is equal to 5.19 and the average error is equal to 2.21. When all of the sequences are jointly optimized with inter-user relations, then the overall RMSE for all sequences in equal to 4.65 whereas the average error is equal to 2.11. Therefore, the performed comparison shows that including relations between multiple trajectories in the graph-based optimization results in more accurate localization estimates.
When considering joint optimization of trajectories, the relations between users occur commonly. In
Figure 11, these relations are marked with black lines indicating that the selected WKNN estimation utilized WiFi scan captured on the other trajectory. In the case of the joint optimization, 292 WKNN estimations out of 347 utilize at least one of four selected WiFi scans (
k = 4 in WKNN) captured in another sequence. It means that
of performed WiFi WKNN estimates utilized additional information from other trajectories and is especially high considering that the first trajectory contains 23 WiFi WKNN estimates that can only be based on the original WiFi map. For the 15 out of 24 sequences, all of the performed WKNN estimations used at least one WiFi scan from other sequence meaning that the connections between sequences were critical for the obtained performance of the system.
The positive effect of the introduced relations between trajectories can be observed in
Figure 11 for a sequence marked with a light-blue color in a location pointed by the yellow arrow. Including additional information from other users makes it possible to better estimate the motion and thus follow the corridor providing a more accurate estimate of the motion. However, including these additional relations sometimes has a negative influence on the estimation. This effect can be observed for all of the trajectories that follow a path above the elevators on the top of the building blueprint. When sequences are jointly optimized, the optimization tends to bring these sequences to one, more common route located below the elevators, which is marked by the cyan arrow. Therefore, despite the obvious decrease in the estimation error, joint optimization of sequences shows a preference for more commonly used routes and might decrease the accuracy in the case of routes taken with a lower frequency.
A visual comparison between individual trajectories obtained using separate and joint optimization on sequences 5, 18 and 22 can be observed in
Figure 12. The performance on the sequence 5 is presented in
Figure 12A,D. The lack of proper WiFi measurements and inaccurate orientation changes result in a localization estimate drifting outside of the corridor. This is undesirable as it can be seen clearly that the performance of the system is unsatisfactory for the users. When additional information from other trajectories is used, the WiFi WKNN estimate is more accurate, and the corresponding part of the graph is corrected and the final localization RMSE is reduced from 3.23 m to 1.85 m. A similar situation can be observed for sequence 18 presented in
Figure 12B,E. The additional information from another trajectory results in more accurate WiFi WKNN estimation and a final reduction of localization RMSE from 6.54 m to 3.54 m. As a result, the user is no longer localized outside of the corridor and therefore it is possible to appropriately navigate the user to the destination. The last example of trajectories obtained on sequence 22 is presented in
Figure 12C,F. The addition of information from other users makes it possible to correct the turn of the user trajectory despite inaccurate orientation estimation that suggested further movement in the same direction. In the case of sequence 22, the RMSE was reduced from 3.68 m to 1.05 m when joint optimization was used.
In the multi-user version of the positioning system, its performance may also depend on the communication efficiency between the handheld devices and the stationary computer hosting the back-end. However, because the communication architecture is out of the scope of this article, all the experiments were performed off-line, on previously saved trajectories to enable a fair comparison between the considered variants of the system. Unfortunately, this approach made it impossible to evaluate the processing time in a realistic way. While we already have demonstrated the ability to optimize a factor graph for the indoor positioning problem on an Android-based smartphone [
11], we need to further research this issue for the multi-user case. But it should be noticed that the number of user poses in the considered trajectories varied from 17 to 152, while the number of poses having a WKNN measurement from WiFi scans ranged from 5 to 39—in average only one third of the poses got constraints from the WiFi measurements. Therefore, the number of state variables in the optimized system was considerably smaller than in the mathematically similar visual SLAM problem [
40], making it possible to jointly optimize multiple trajectories within a reasonable time.
6.5. Performance of the System on the Reduced Original Map
Including the additional relations between trajectories in the optimization system makes it possible to obtain more accurate localization for single users when additional measurements are added to the well-prepared WiFi map that is known prior to operation. But the original WiFi map degenerates over time and needs to be updated, i.e., due to the changes in elements placed in the environment or changes in the broadcasted networks. Therefore, the experiments were conducted to verify how the systems with separate and joint optimization would perform in scenarios with reduced availability of the original WiFi map.
To perform these experiments we decided to reduce the size of the original WiFi map to contain from
to
of the originally stored scans. As the placement of these scans has a huge impact on the accuracy of the WiFi localization, we decided to randomly keep or remove the WiFi scans from the original map in order to achieve the wanted density of the WiFi map. An exemplary map containing half of the original WiFi scans is presented in
Figure 13. As this randomness influences the results, we repeated the evaluation procedure 50 times for each configuration. As the optimization converges to a local minimum and is deterministic, the obtained average performance on these 50 trials is a good measure of the performance of the system on a reduced WiFi map. During these experiments, the WKNN parameters were not changed from the originally chosen ones.
The performance of the system in different configurations is presented in
Figure 14. For each configuration, we present the aggregated performance of the system as the average localization RMSE of the system over 24 trajectories and 50 trials. We also consider the mean value of the average localization error over 24 trajectories and 50 trials.
Regardless if we consider sequences separately or jointly in the graph-based optimization, both systems drop in the accuracy when less information from the prior map is provided. This is to be expected as the best performance with the WKNN approach is usually obtained if a dense map of the environment is provided. In the case of the system considering a sequence separately, the drop when of original WiFi scans are available is more significant with the increase of the localization RMSE by 3.34 m when compared to the performance when full WiFi map is available. In the same scenario, joint consideration of trajectories shows a smaller drop in performance with the localization RMSE increased by 2.17 m. The same observation can be made for the average localization error that was larger by 1.83 m for separate trajectories when compared to the increase of 1.09 m for jointly optimized trajectories. When trajectories are jointly optimized only of the original WiFi map is needed to obtain approximately the same accuracy as with of the original WiFi map when sequences are optimized separately. These results suggest that considering relations between multiple trajectories results in a more robust trajectory estimate even in the case when the pre-surveyed WiFi map is degraded significantly.
6.6. Step Length Estimation
Many indoor localization systems assume that the step length is known a priori, and perform no estimation during the localization. To verify the influence of the step estimation on the obtained user pose estimates, we provided four configurations of our system. In two configurations the trajectories for users were optimized separately and the step length estimation was turned off in one case and turned on in the other one. In the remaining two configurations, the graphs for trajectories were optimized jointly and again the step length estimation was turned off and on, depending on the configuration. Without step length estimation, it was assumed that the step length is equal to 0.65 m, which was also a value assumed as prior when the step length estimation was performed. In each configuration, the step length was estimated separately for each user trajectory and was assumed to be constant through the sequence.
The localization RMSE obtained for all of the recorded sequences is presented in
Figure 15A. The worst localization RMSE of 6.24 m was obtained when each trajectory was optimized separately and no step estimation was performed. The same system with step length estimation obtained a localization RMSE of 5.19 m, which is a significant improvement. When the trajectories were optimized jointly, the influence of step length estimation is smaller but visible with a reduction of RMSE from 4.76 m to the RMSE of 4.65 m when the step length estimation is turned on.
The estimated step lengths for different trajectories are presented in
Figure 15B. The step length varies significantly changing from 0.53 m to 0.78 m depending on the sequence. This change is related to the walking speed and the height of a person that took these measurements. Unfortunately, we lack the ground-truth information about this parameter and thus can only confirm that a great inaccuracy should be expected when no proper step length estimation is performed.
The 24 sequences were recorded in our case by two people. One person measuring m recorded 5 sequences while another measuring 1.93 m was responsible for the remaining 19 sequences. For the shorter person, the average estimated step length was equal to 0.625 m while the standard deviation was equal to m depending on the sequence. For the taller person, the average estimated step length was equal to 0.695 m while the standard deviation was equal to 0.038 m. As we lack the proper ground truth, we can only argue that the estimated values are sensible as taller people tend to take larger steps and slightly smaller steps sizes than usually anticipated might be explained by the slower walking speed during the experiment to avoid potential errors.
6.7. Comparison to Other Approaches
Unfortunately, the specific requirements of our approach with respect to the recorded user trajectories rendered it impossible to directly evaluate the presented system on any of the publicly available data sets for indoor positioning. However, we can roughly compare the positioning accuracy of our system to the results achieved by several teams participating in the smartphone-based off-line indoor positioning competition held during the Indoor Positioning and Indoor Navigation (IPIN) 2017 conference. The competition and its results are described in Ref. [
50]. These results cannot be directly compared to our results shown in
Table 2, because of the different environments, a number of available APs, and the methodology of surveying the WiFi map. However, we believe that the advantages of our approach can be assessed qualitatively in comparison to other systems using the data from Ref. [
50], because of the similarity of building structures and the similar scale of experiments with respect to the total length of trajectories.
Table 3 summarizes the positioning results in terms of the mean distance to ground-truth error and the third quartile position error for the four systems competing during IPIN 2017 and the FastGraph system [
32]. The FastGraph follows a graph-based optimization approach similar to ours and compares also to the results published in Ref. [
50], thus the best results of this system (DSI-DEP environment) given in Ref. [
32] have been added to the table for comparison. The results for our system are presented for the multi-user version with step length estimation, averaged for all the 24 trajectories. The positioning accuracy of our system is better than the results known from the literature, although unlike the teams competing during IPIN we used only a sparse and rough pre-surveyed WiFi map and we did not calibrate the algorithm with a ground-truth database, such as the one provided for the competing teams. All our experiments used the same sensor models and other parameters (WKNN, step length prior). We achieved results that are at least comparable to those known from the recent literature with a minimized effort of site surveying and parameter tuning. Therefore, we consider our approach a step towards wider acceptance of WiFi-based indoor positioning in practical applications.