An Improved Algorithm to Generate a Wi-Fi Fingerprint Database for Indoor Positioning
Abstract
A major problem of Wi-Fi fingerprint-based positioning is the creation and maintenance of the signal strength fingerprint database. The significant temporal variation of the received signal strength (RSS) is the main source of positioning error. A probabilistic approach can be used, but it requires the RSS distribution. The Gaussian distribution or an empirically derived distribution (histogram) is typically used. However, the Gaussian distribution does not always fit the measured RSSs, while the histogram requires a large amount of data at each reference point. Double peaks in the RSS distribution have been observed experimentally at some reference points. In this paper a new algorithm based on an improved double-peak Gaussian distribution is proposed. Kurtosis testing is used to decide whether this new distribution or the normal Gaussian distribution should be applied. Test results show that the proposed algorithm can significantly improve the positioning accuracy and reduce the workload of the off-line data training phase.
1. Introduction
One of the key issues for location-based services (LBS) is the positioning technology, and the accuracy required indoors is usually higher than outdoors [1,2]. For outdoor environments, a Global Navigation Satellite System (GNSS) such as the Global Positioning System (GPS) is ideal. However, GPS is not suitable for indoor environments because the satellite signals are heavily attenuated or blocked by the walls and roofs of buildings [3,4]. Furthermore, even when assisted-GPS techniques are used, the position may have errors of tens of metres [5].
Wi-Fi has been widely used for indoor positioning. Wi-Fi provides low-cost local wireless access to a fixed network; it is widely deployed, and its indoor coverage is still increasing rapidly. Using this existing infrastructure for positioning is therefore a very attractive option. However, the only measurements available for positioning are the Wi-Fi signal strengths (SS) from the access points (APs), as measured by the user's receiver.
One obvious approach is to convert the SSs to distance measurements. If the distances between the user's receiver and three different APs can be obtained, trilateration can be used to estimate the receiver's position. However, creating an accurate model to convert SS to distance is difficult, because the propagation of radio signals in indoor environments is very complicated. The SS received from an AP varies significantly (by up to 15 dB) over time at the same location. In addition, indoor environments differ considerably from one another, so a model may work well in one environment but perform poorly in others. Hence it is difficult, if not impossible, to obtain accurate distance measurements from SSs consistently [6]. On the other hand, the so-called “fingerprinting” approach has demonstrated promising performance for Wi-Fi positioning [7,8]. The fingerprinting approach has many advantages [9,10], including the fact that no special hardware is required on the user mobile station (MS) side [11,12].
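The trilateration step itself is simple once distances are available; the difficulty described above lies in obtaining them. The following minimal Python sketch assumes that three AP positions and three AP-to-receiver distances are already known (the function name `trilaterate` and the example coordinates are hypothetical). It subtracts the circle equation of the first AP from those of the other two, which leaves a 2×2 linear system that is solved by Cramer's rule.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def trilaterate(aps: Sequence[Point], dists: Sequence[float]) -> Point:
    """Estimate a 2-D receiver position from three AP positions and three ranges.

    Subtracting the circle equation of AP 0 from those of APs 1 and 2 cancels
    the quadratic terms, leaving the linear system
        2(xi - x0) x + 2(yi - y0) y = d0^2 - di^2 + xi^2 - x0^2 + yi^2 - y0^2.
    """
    (x0, y0), (x1, y1), (x2, y2) = aps
    d0, d1, d2 = dists
    a11, a12 = 2 * (x1 - x0), 2 * (y1 - y0)
    a21, a22 = 2 * (x2 - x0), 2 * (y2 - y0)
    b1 = d0**2 - d1**2 + x1**2 - x0**2 + y1**2 - y0**2
    b2 = d0**2 - d2**2 + x2**2 - x0**2 + y2**2 - y0**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("APs are collinear; the position is not uniquely determined")
    # Cramer's rule for the 2x2 system
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det
```

With APs at (0, 0), (10, 0) and (0, 10) m and exact distances to the point (3, 4), the sketch recovers approximately (3.0, 4.0). With noisy distances derived from SS, the three circles no longer meet at one point and the linearised solution is only an approximation.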
However, this technique has a major shortcoming: creating the fingerprint database and maintaining it are not trivial tasks. An entry in the fingerprint database consists of a location identifier paired with the Wi-Fi received SS (RSS) at that location. The “fingerprints” may be the average RSSs (deterministic approach) [7] or the RSS distributions (probabilistic approach) [8]. The more accurate the fingerprint database (also referred to as the “radio map”), the better the positioning accuracy that can be achieved.
The probabilistic approach can provide better accuracy for Wi-Fi positioning. However, understanding the statistical characteristics of the RSSs is key [12]. In [13] a lognormal distribution was used to model the RSSs. A shape-filtered empirical distribution was utilised in [14] to estimate the RSS distribution. Kaemarungsi and Krishnamurthy investigated the use of a Gaussian distribution to approximate the real RSS distribution [11]. The Weibull function for approximating the Bluetooth RSS distribution was discussed in [15]. Although the normal distribution is often used for Wi-Fi RSS [16,17], the current study has found that it does not always fit the measured data.
This paper proposes a new algorithm based on an improved double-peak Gaussian distribution (IDGD) for Wi-Fi fingerprinting. The rest of the paper is organised as follows: Section 2 describes the probabilistic approach to fingerprinting and discusses the characteristics of the RSS. Section 3 introduces the IDGD and describes how it is used to generate the fingerprint database. Section 4 reports the tests of the new algorithm and their analysis, and Section 5 gives the conclusions. Results show that the proposed algorithm can improve the positioning accuracy by about 20% and has the potential to significantly reduce the labour costs of the training phase.
2. Fingerprinting Technique
The fingerprinting technique is widely used where line-of-sight signal propagation is not typical. Its main advantages are the low cost of the user hardware and its promising performance. Wi-Fi location fingerprinting consists of two phases: the off-line data training phase and the on-line positioning phase. The aim of the training phase is to build a fingerprint database. To generate the database in the conventional way, a number of reference points (RPs) are selected in the area of interest. With an MS placed at an RP, the RSSs of all the APs are measured. From these measurements the characteristic features of that RP are determined and recorded in the database. This process is repeated until all RPs have been visited. In the positioning phase, the MS measures the RSSs at the place where its position is required. The measurements (including the RSSs and the MAC addresses of the APs) are compared with the data in the database using a matching algorithm. Typically, a signal distance is computed; the smallest signal distance indicates the best match, from which the most likely location of the MS is determined [18,19]. Figure 1 illustrates the whole process.
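The matching step of the positioning phase can be sketched in Python as follows, assuming a deterministic fingerprint (one mean RSS per AP at each RP) and the Euclidean signal distance. The radio-map layout, the function names and the −100 dBm value substituted for an AP that is not heard at a location are illustrative assumptions, not details taken from this paper.

```python
import math
from typing import Dict, Tuple

# Hypothetical radio map layout: RP id -> ((x, y) in metres, {AP MAC: mean RSS in dBm})
RadioMap = Dict[str, Tuple[Tuple[float, float], Dict[str, float]]]

MISSING_RSS = -100.0  # assumed RSS (dBm) for an AP not heard at a location


def signal_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Euclidean distance in signal space over every AP seen in either vector."""
    macs = set(a) | set(b)
    return math.sqrt(sum((a.get(m, MISSING_RSS) - b.get(m, MISSING_RSS)) ** 2 for m in macs))


def best_match(radio_map: RadioMap, measured: Dict[str, float]) -> str:
    """Return the RP whose stored fingerprint is closest to the measurement."""
    return min(radio_map, key=lambda rp: signal_distance(radio_map[rp][1], measured))


radio_map: RadioMap = {
    "RP1": ((0.0, 0.0), {"ap1": -40.0, "ap2": -70.0}),
    "RP2": ((5.0, 0.0), {"ap1": -70.0, "ap2": -40.0}),
}
print(best_match(radio_map, {"ap1": -45.0, "ap2": -68.0}))  # RP1
```

Substituting a fixed floor for missing APs is one common convention; it keeps every fingerprint comparable over the same set of APs.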
In this study the authors have adopted the probabilistic approach [20,21]. The location fingerprint is a vector P of the probabilistic RSS values from multiple APs at a particular location L. A typical vector P = (p1, p2, …, pN) consists of N RSS values from N APs. The database contains such vectors for all RPs in the area of interest. For positioning, an MS obtains a sample RSS vector S = (s1, s2, …, sN). The probability of observing S given P is computed for each P in the database, and the location is estimated to be the L for which this probability is highest. Note that the vector S is random. An error occurs when the highest probability is obtained for a location L other than the one at which the sample S was collected. Such errors arise because the measured RSS vector is a sample of a random vector, while only the probabilistic RSS vector is stored in the database.
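A minimal sketch of this probabilistic matching follows. It makes two assumptions that the text does not state: the RSSs from different APs are treated as independent, so P(S | L) is the product of the per-AP probabilities, and each per-AP distribution is stored as a table of probabilities over integer dBm values. Log-probabilities are summed to avoid numerical underflow, and a hypothetical floor probability stands in for RSS values never seen during training.

```python
import math
from typing import Dict

# Hypothetical fingerprint layout: location -> AP -> {RSS value (dBm): probability}
Fingerprints = Dict[str, Dict[str, Dict[int, float]]]

FLOOR_PROB = 1e-6  # assumed probability for RSS values never observed in training


def log_likelihood(fp: Dict[str, Dict[int, float]], sample: Dict[str, int]) -> float:
    """log P(S | L), treating the APs as independent."""
    total = 0.0
    for ap, rss in sample.items():
        p = fp.get(ap, {}).get(rss, 0.0)
        total += math.log(max(p, FLOOR_PROB))
    return total


def most_likely_location(db: Fingerprints, sample: Dict[str, int]) -> str:
    """Return the location L whose fingerprint gives the highest probability for S."""
    return max(db, key=lambda loc: log_likelihood(db[loc], sample))


db: Fingerprints = {
    "L1": {"ap1": {-50: 0.6, -51: 0.4}, "ap2": {-70: 1.0}},
    "L2": {"ap1": {-60: 1.0}, "ap2": {-55: 0.5, -56: 0.5}},
}
print(most_likely_location(db, {"ap1": -51, "ap2": -70}))  # L1
```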
2.1. Fingerprint Database
The RSS probability distributions of all APs at all RPs need to be stored. The fingerprint of the i-th RP can be defined as:
To speed up the computations, the signal strength distribution is typically divided into p bins. The fingerprint of the i-th RP can then also be expressed as:
Correspondingly, the probability that an RSS measurement from AP An at the i-th RP falls within bin Bk can be expressed as:
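For illustration, the binned fingerprint of one AP at one RP can be estimated as the fraction of training samples that fall in each bin B_k. In the Python sketch below, the RSS range (−100 to −20 dBm) and the 5 dB bin width, which give p = 16 bins, are assumed values chosen for the example rather than parameters from the paper; readings outside the range are assigned to the nearest end bin.

```python
from typing import List, Sequence

RSS_MIN, RSS_MAX = -100.0, -20.0  # assumed RSS range covered by the bins (dBm)
BIN_WIDTH = 5.0                   # assumed bin width (dB), giving p = 16 bins


def num_bins() -> int:
    return int((RSS_MAX - RSS_MIN) / BIN_WIDTH)


def bin_index(rss: float) -> int:
    """Index k of the bin B_k containing rss; values outside the range are clamped."""
    k = int((rss - RSS_MIN) // BIN_WIDTH)
    return min(max(k, 0), num_bins() - 1)


def bin_probabilities(samples: Sequence[float]) -> List[float]:
    """Estimated P(RSS in B_k) for one AP at one RP: fraction of training samples per bin."""
    if not samples:
        return [0.0] * num_bins()
    counts = [0] * num_bins()
    for rss in samples:
        counts[bin_index(rss)] += 1
    return [c / len(samples) for c in counts]
```

For the four readings −57, −56, −55 and −54 dBm, bins 8 ([−60, −55) dBm) and 9 ([−55, −50) dBm) each receive probability 0.5. Coarser bins need fewer samples to fill but blur the shape of the distribution.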
2.2. The Characteristics of RSSs
To investigate the characteristics of RSSs, four tests were carried out in four different environments: a residential room, an office, a classroom and a shopping centre. More than 10,000 RSS samples were collected in each test, and 424 APs were detected in total. All the data were analysed and the following characteristics of the Wi-Fi signals were identified (see Figure 3):
- (1) For more than 30% of the APs, the RSS distribution has two peaks and a long tail, as shown in Figure 2. The two peaks are quite distinct; however, this characteristic has not been reported in previous studies.
- (2) The Gaussian function does not approximate the RSS distribution well. The Gaussian distribution in Figure 2 was fitted to the same data used to generate the occurrence plot, yet the two distributions are clearly and significantly different.
The double-peak RSS distribution is not an isolated occurrence. The test results are listed in Table 1: the probability distributions of 134 of the 424 APs (about 32%) show double peaks. Further investigation found that the percentage of double-peak RSS distributions does not differ significantly between environments (ranging from 26% to 38%), which suggests that double-peak behaviour may not be unusual in indoor environments.
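The percentages above follow directly from the AP counts in Table 1; the short Python check below recomputes them, rounding to whole per cent.

```python
# Counts copied from Table 1: environment -> (APs detected, APs with a double-peak distribution)
table1 = {
    "Residential room": (28, 9),
    "Office": (134, 35),
    "Classroom": (124, 38),
    "Shopping centre": (138, 52),
}

percentages = {env: round(100 * double / detected) for env, (detected, double) in table1.items()}
total_detected = sum(detected for detected, _ in table1.values())
total_double = sum(double for _, double in table1.values())
overall = round(100 * total_double / total_detected)

print(percentages)  # {'Residential room': 32, 'Office': 26, 'Classroom': 31, 'Shopping centre': 38}
print(total_detected, total_double, overall)  # 424 134 32
```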
Generating the database is a prerequisite for location fingerprinting. Generally speaking, the more measurements obtained at each RP, the better the positioning performance. However, more measurements mean more effort in the RSS survey (training) phase. In practice only a few RSS samples are collected at each RP, and such a limited number of samples cannot produce an accurate empirical RSS distribution.
3. Improved Double-Peak Gaussian Distribution (IDGD)
3.1. Fingerprint Database Based on Gaussian and Double-Peak Gaussian Distribution
The Gaussian distribution is a traditional choice for fingerprint database generation [22]; its probability density function can be expressed as:
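The Gaussian model needs only a mean μ and a standard deviation σ for each AP at each RP, with density f(x) = exp(−(x − μ)²/(2σ²)) / (σ√(2π)). The Python sketch below fits the two parameters to the training samples and converts the density into the probability of an integer dBm reading by integrating over ±0.5 dB. That integration step and the 0.5 dB lower bound on σ (which keeps a constant RSS from producing a zero-width density) are assumptions made for the example.

```python
import math
import statistics
from typing import Sequence, Tuple

MIN_SIGMA = 0.5  # assumed lower bound (dB) so a constant RSS still gives a usable density


def fit_gaussian(samples: Sequence[float]) -> Tuple[float, float]:
    """Return (mu, sigma) estimated from the training RSS samples of one AP at one RP."""
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)
    return mu, max(sigma, MIN_SIGMA)


def gaussian_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def prob_of_reading(rss: int, mu: float, sigma: float) -> float:
    """P(reading == rss) for integer dBm readings: density integrated over [rss - 0.5, rss + 0.5]."""
    return gaussian_cdf(rss + 0.5, mu, sigma) - gaussian_cdf(rss - 0.5, mu, sigma)
```

Storing only (μ, σ) is what makes the Gaussian model cheap to train, and it is also why it cannot represent the double peaks reported in Section 2.2.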
The double-peak Gaussian distribution (DGD) is proposed as a candidate to replace the Gaussian distribution when the latter is not suitable. The RSS samples of each AP are divided into two parts at the minimum between the two peaks, and each part is modelled as an independent Gaussian function. The weight of each function is assumed to be 1/2. The probability density is expressed as:
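A minimal Python sketch of the DGD as described, f(x) = ½N(x; μ1, σ1) + ½N(x; μ2, σ2), where μ and σ are the mean and standard deviation of each part. It assumes the split point (the RSS value of the minimum between the two peaks) has already been found; the function names and the 0.5 dB lower bound on each σ are hypothetical.

```python
import math
import statistics
from typing import Callable, Sequence

MIN_SIGMA = 0.5  # assumed lower bound (dB) on each component's standard deviation


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def fit_dgd(samples: Sequence[float], split: float) -> Callable[[float], float]:
    """Double-peak Gaussian: f(x) = 0.5*N(x; mu1, s1) + 0.5*N(x; mu2, s2).

    Samples below `split` form part 1 and the rest form part 2; each part's
    mean and standard deviation parameterise its Gaussian component.
    """
    low = [s for s in samples if s < split]
    high = [s for s in samples if s >= split]
    if not low or not high:
        raise ValueError("split point must leave samples on both sides")
    mu1, s1 = statistics.fmean(low), max(statistics.pstdev(low), MIN_SIGMA)
    mu2, s2 = statistics.fmean(high), max(statistics.pstdev(high), MIN_SIGMA)
    return lambda x: 0.5 * normal_pdf(x, mu1, s1) + 0.5 * normal_pdf(x, mu2, s2)
```

Because each component is centred on the mean of its part, a long tail on one side pulls that centre away from the visible peak.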
3.2. IDGD Fingerprint Database Model
To address the shortcomings of the DGD, an improved DGD was developed. The centres u1 and u2 are changed from the mean values of the two parts to the RSS values at which peak 1 and peak 2 occur, while the standard deviations remain the same as σ1 and σ2 in the DGD. Figure 3 shows the empirical distribution, the DGD and the IDGD.
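The IDGD change can be sketched in Python by keeping the DGD structure (samples split at the minimum between the peaks, weights of 1/2, standard deviations σ1 and σ2 of each part) and moving each centre from the part's mean to its peak. Here the peak of a part is taken to be its most frequent integer RSS reading; that choice, the split point being supplied by the caller, and the function names are assumptions for illustration.

```python
import math
import statistics
from collections import Counter
from typing import Callable, Sequence, Tuple

MIN_SIGMA = 0.5  # assumed lower bound (dB) on each component's standard deviation


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def peak_location(part: Sequence[int]) -> int:
    """RSS value with the highest occurrence count (ties go to the first value seen)."""
    return Counter(part).most_common(1)[0][0]


def fit_idgd(samples: Sequence[int], split: float) -> Tuple[Callable[[float], float], Tuple[int, int]]:
    """Return the IDGD density and its two centres (u1, u2)."""
    low = [s for s in samples if s < split]
    high = [s for s in samples if s >= split]
    if not low or not high:
        raise ValueError("split point must leave samples on both sides")
    # Centres move from the part means (DGD) to the peak positions (IDGD) ...
    u1, u2 = peak_location(low), peak_location(high)
    # ... while the standard deviations stay the same as in the DGD.
    s1 = max(statistics.pstdev(low), MIN_SIGMA)
    s2 = max(statistics.pstdev(high), MIN_SIGMA)

    def pdf(x: float) -> float:
        return 0.5 * normal_pdf(x, u1, s1) + 0.5 * normal_pdf(x, u2, s2)

    return pdf, (u1, u2)
```

For the readings −72, −70, −70, −70 and −66 dBm (part 1) and −56, −55, −55 and −50 dBm (part 2), split at −62 dBm, the part means are −69.6 and −54 dBm, whereas the IDGD centres sit on the peaks at −70 and −55 dBm.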
As Figure 3 shows, the probability distribution obtained with the IDGD fits the empirical distribution better than that obtained with the DGD. As already mentioned, the distribution of RSSs is not always Gaussian, but neither is it always double-peak Gaussian. Hence a new model (IDGD) that combines the Gaussian distribution and the DGD is proposed. The function is defined as:
In Equation (8), the Gaussian model is adopted when the RSS distribution has one peak, and the DGD is used when it has two peaks. The values at peak 1 and peak 2 are denoted Max1 and Max2, respectively. The values between Max1 and Max2 are searched (see Figure 2), and the minimum is denoted Min. Testing on all the collected data showed that if 2·Min < Max1 + Max2, the RSS distribution has one peak or two indistinct peaks; otherwise, it has two peaks. The basis of this empirical rule will be investigated in future research. Thus the decision rule is expressed as:
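The global search for the two peaks and the minimum between them (Step 3 in Section 3.3) can be sketched in Python as below. It reads Max1 and Max2 as the occurrence counts at the two highest local maxima of the RSS histogram (taken in order of increasing RSS) and Min as the lowest count between them, which is one interpretation of the text rather than the authors' exact procedure. Runs of equal counts are collapsed so that a flat-topped peak is counted once. The function returns these quantities and the RSS value of the minimum (the split point used by the DGD), leaving the comparison of 2·Min with Max1 + Max2 to the decision rule.

```python
from typing import List, Optional, Sequence, Tuple


def occurrence_counts(samples: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return (rss_values, counts) for every integer RSS value from the min to the max sample."""
    lo, hi = min(samples), max(samples)
    counts = [0] * (hi - lo + 1)
    for s in samples:
        counts[s - lo] += 1
    return list(range(lo, hi + 1)), counts


def find_two_peaks(samples: Sequence[int]) -> Optional[Tuple[int, int, int, int]]:
    """Return (Max1, Max2, Min, split_rss), or None if the histogram has fewer than two peaks."""
    xs, ys = occurrence_counts(samples)
    # Collapse runs of equal counts so that a flat-topped peak is counted once
    runs: List[Tuple[int, int]] = []  # (count, index of the first bin in the run)
    for i, y in enumerate(ys):
        if not runs or runs[-1][0] != y:
            runs.append((y, i))
    # A run is a local maximum if both neighbouring runs (where present) are lower
    peaks = [runs[r][1] for r in range(len(runs))
             if (r == 0 or runs[r - 1][0] < runs[r][0])
             and (r == len(runs) - 1 or runs[r + 1][0] < runs[r][0])]
    if len(peaks) < 2:
        return None
    # Keep the two highest peaks, then order them by RSS value (peak 1, peak 2)
    i1, i2 = sorted(sorted(peaks, key=lambda i: ys[i], reverse=True)[:2])
    # Global search for the lowest count between the two peaks
    j = min(range(i1, i2 + 1), key=lambda i: ys[i])
    return ys[i1], ys[i2], ys[j], xs[j]
```

For a histogram with counts 2, 5, 2, 1, 0, 1, 4, 1 over −70 to −63 dBm, the function returns (5, 4, 0, −66), i.e., Max1 = 5, Max2 = 4 and Min = 0 at −66 dBm. A histogram with only one local maximum returns None.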
3.3. The Positioning Procedure
The procedure for fingerprint-based positioning using the proposed joint model is as follows, where steps 1 to 4 are the off-line data training phase and steps 5 to 7 are the on-line positioning phase:
- Step 1: Choose the RPs, and then collect the RSSs from all APs at each RP.
- Step 2: Detect the gross errors and filter them out.
- Step 3: Use a global search procedure to find the two peaks and the minimum between them. Twice the minimum is compared with the sum of the two peaks to decide whether to use the Gaussian model or the alternative model. This decision rule was derived from all the data collected.
- Step 4: Create the fingerprint database.
- Step 5: The user collects RSSs and removes outliers, and the probability distribution of the received RSSs is calculated.
- Step 6: Use the fingerprint database to calculate the joint probability density for the RSSs collected in Step 5.
- Step 7: Estimate the user's location using the K weighted nearest neighbour (KWNN) algorithm, a conventional algorithm for fingerprint-based Wi-Fi positioning. The K (K ≥ 2) nearest neighbours of a test vector (those with the shortest signal distance) are chosen, and the weighted average of the coordinates of these K points is used as the estimate of the user's location, with the inverse of the signal distance as the weight [23].
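Step 7 can be sketched in Python as follows: the K RPs with the smallest signal distance are selected, and their coordinates are averaged with weights equal to the inverse signal distance. The signal distances are taken as inputs (for example, the Euclidean distances used in Section 4), K = 3 matches the first test, and the small constant that prevents division by zero for an exact match is an added assumption.

```python
import heapq
from typing import Dict, Tuple

EPS = 1e-9  # assumed guard against division by zero when a signal distance is 0


def kwnn(rp_coords: Dict[str, Tuple[float, float]],
         signal_dist: Dict[str, float], k: int = 3) -> Tuple[float, float]:
    """Inverse-signal-distance weighted average of the K nearest RPs' coordinates."""
    if not 2 <= k <= len(signal_dist):
        raise ValueError("K must satisfy 2 <= K <= number of RPs")
    nearest = heapq.nsmallest(k, signal_dist, key=signal_dist.__getitem__)
    weights = {rp: 1.0 / (signal_dist[rp] + EPS) for rp in nearest}
    total = sum(weights.values())
    x = sum(w * rp_coords[rp][0] for rp, w in weights.items()) / total
    y = sum(w * rp_coords[rp][1] for rp, w in weights.items()) / total
    return x, y


coords = {"A": (0.0, 0.0), "B": (2.0, 0.0), "C": (0.0, 2.0), "D": (10.0, 10.0)}
dists = {"A": 1.0, "B": 1.0, "C": 2.0, "D": 5.0}
print(kwnn(coords, dists))  # approximately (0.8, 0.4)
```

Here the three nearest RPs receive weights of about 1, 1 and 0.5, and the distant RP D is ignored entirely.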
Figure 4 illustrates the details of the procedure using the proposed joint Gaussian and IDGD model for positioning.
4. Test and Analysis
To verify the proposed approach, a study was carried out in a small test area: a typical office room of 45 square metres. Nine RPs and five test points (TPs) were selected. A Lenovo X220 Tablet equipped with an Intel Centrino Advanced-N 6250 wireless network card was used to make the RSS measurements, and the inSSIDer software was used to collect the Wi-Fi signal strengths.
Data were collected on a working day between 8 and 9 a.m. Up to 100 RSS samples (over about 2 min) were collected at each RP. Different models (Gaussian, histogram, DGD and IDGD) were then used to generate the fingerprint database. About 10–20 RSS samples were collected at each TP soon after the training data were collected. The conventional KWNN (K = 3) was used to estimate the positions of the TPs.
In this paper, the weight was calculated as the inverse of the signal distance, with the Euclidean distance adopted as the signal distance. Table 2 lists the positioning error at each TP for the different models. The Gaussian model performs worst (average error 2.23 m), the histogram and DGD give similar results (1.69 m and 1.73 m, respectively), and the IDGD gives the lowest average error (1.36 m). At individual TPs, the IDGD gives the lowest error of the four probabilistic models at T1, T3 and T5 and is close to the best at T2 and T4. Overall, its performance is improved by about 40%, 20% and 21% compared with the Gaussian, histogram and DGD models, respectively. A deterministic approach using the average RSS was also tried on the same data; its average error (1.75 m, first row of Table 2) is no better than those of the histogram and DGD models.
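The averages in Table 2 and the quoted improvements can be reproduced with a few lines of arithmetic. The Python sketch assumes the improvement is the relative reduction (e_model − e_IDGD)/e_model computed from the averages rounded to two decimals; under that assumption it gives about 39%, 20% and 21% for the Gaussian, histogram and DGD models.

```python
from statistics import fmean

# Positioning errors (m) at T1..T5, copied from Table 2
errors = {
    "Deterministic": [2.28, 2.10, 0.62, 1.27, 2.49],
    "Gaussian": [2.36, 2.01, 1.36, 2.57, 2.83],
    "Histogram": [2.51, 1.27, 1.33, 1.04, 2.28],
    "DGD": [1.26, 1.20, 1.32, 2.58, 2.29],
    "IDGD": [1.15, 1.23, 1.09, 1.14, 2.18],
}

# Average error per model, rounded as in Table 2
averages = {model: round(fmean(e), 2) for model, e in errors.items()}


def improvement(model: str, reference: str = "IDGD") -> float:
    """Relative reduction of the average error, in percent, from `model` to `reference`."""
    return 100.0 * (averages[model] - averages[reference]) / averages[model]


for model in ("Gaussian", "Histogram", "DGD"):
    print(f"{model}: average {averages[model]:.2f} m, IDGD improvement {improvement(model):.0f}%")
```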
This first test indicated that the proposed model works well. Figure 5 shows the test bed of the second test (with an area of approximately 400 square metres), consisting of a computer lab, corridors, a foyer, a kitchen and a toilet. In total, there were 68 RPs (red crosses) and 35 TPs, the latter chosen at random. A procedure similar to that of the first test was used, with about 40 RSS samples collected at each RP and 5–20 RSS measurements at each TP. All the data were collected on one working day. Figure 6 shows the results of the test: the horizontal axis is the TP number and the vertical axis is the positioning error. In general, the IDGD gave the most accurate positioning results.
Figure 7 shows the average positioning errors using the four models. The positioning accuracy using the IDGD is improved by about 42%, 33% and 24% compared to that of the histogram, Gaussian and DGD distributions, respectively. The small number of samples collected at each RP is the main reason that the histogram model performed the worst.
5. Concluding Remarks
The observation of double peaks in the Wi-Fi signal strength distribution motivated the investigation of a new model, the double-peak Gaussian distribution (DGD), to approximate the signal strength distribution. Further investigation indicated that the DGD needed to be improved, and the improved DGD (IDGD) was proposed.
The IDGD takes into account the different types of distribution of the RSS samples (sometimes one peak, in other circumstances two peaks). When one peak is detected, a standard Gaussian distribution is used to create the fingerprint, whereas when two peaks are detected, the DGD is used instead. Tests show that applying the new model to fingerprint-based positioning can significantly improve the positioning accuracy (by up to 40%). Furthermore, this model has the potential to reduce the labour costs of the data training phase; i.e., fewer RSS samples need to be collected during the training phase to achieve the same level of positioning accuracy.
Acknowledgments
This work was supported by the following projects: The Pre-Research project of the key technology research of “Container Intelligent Logistics Based on BeiDou Satellite” which was funded by the Science and Technology Commission of the Shanghai Municipality (12511501102); the project of “Research on Authentication Platform of Cloud Computing based on the Internet of Things” which was funded by National Natural Science Foundation of China (61272468); and the project of “High Gain Low Cost Miniaturization Multimode Substrate Integrated Satellite Navigation Antenna” which was funded by the Shanghai Municipal Commission of Economy and Information.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Ferraro, R.; Aktihanoglu, M.; Li, L., Translators; Location-Aware Applications; Posts & Telecom Press: Beijing, China, 2012.
- Market Survey Report: Location Based Services-Market and Technology Outlook 2013–2020; Market Info Group LLC (MIG), Inc.: Kawasaki, Japan, 2013.
- Li, B.; Zhang, J.; Dempster, A.G.; Rizos, C. Open Source GNSS Reference Server for Assisted-Global Navigation Satellite Systems. J. Navig. 2011, 64, 127–139.
- Li, B.; Salter, J.; Dempster, A.G.; Rizos, C. Indoor Positioning Techniques Based on Wireless LAN. Proceedings of First IEEE International Conference on Wireless Broadband and Ultra Wideband Communications, Sydney, Australia, 13–16 March 2006; pp. 1–6.
- Bing, B. Wireless Local Area Networks: The New Wireless Revolution; Wiley-Interscience: New York, NY, USA, 2002.
- Mok, E.; Retscher, G. Location Determination Using WiFi Fingerprinting Versus WiFi Trilateration. J. Locat. Based Serv. 2007, 1, 145–159.
- Bahl, P.; Padmanabhan, V.N. RADAR: An In-building RF-Based User Location and Tracking System. Proceedings of IEEE 9th Annual Joint Conference of the IEEE Computer and Communications Societies, Tel Aviv, Israel, 26–30 March 2000; pp. 775–784.
- Youssef, M.A.; Agrawala, A.; Shankar, A.U. WLAN Location Determination via Clustering and Probability Distributions. Proceedings of First IEEE International Conference on Pervasive Computing and Communications, Fort Worth, TX, USA, 23–26 March 2003; pp. 143–150.
- Kohoutek, T.K.; Mautz, R.; Wegner, J.D. Fusion of Building Information and Range Imaging for Autonomous Location Estimation in Indoor Environments. Sensors 2013, 13, 2430–2446.
- Guerrero, L.A.; Vasquez, F.; Ochoa, S.F. An Indoor Navigation System for the Visually Impaired. Sensors 2012, 12, 8236–8258.
- Kaemarungsi, K.; Krishnamurthy, P. Properties of Indoor Received Signal Strength for WLAN Location Fingerprinting. Proceedings of IEEE First Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services, Boston, MA, USA, 22–26 August 2004; pp. 14–23.
- Kaemarungsi, K. Distribution of WLAN Received Signal Strength Indication for Indoor Location Determination. Proceedings of 2006 1st International Symposium on Wireless Pervasive Computing, Phuket, Thailand, 16–18 January 2006.
- Youssef, M.A. HORUS: A WLAN-Based Indoor Location Determination System. Ph.D. Thesis, Department of Computer Science, University of Maryland, College Park, Maryland, MD, USA, 2004.
- Xiang, Z.; Song, S.; Chen, J.; Wang, H.; Huang, J.; Gao, X. A Wireless LAN-based Indoor Positioning Technology. IBM J. Res. Dev. 2004, 48, 617–626.
- Pei, L.; Chen, R.; Liu, J.; Kuusniemi, H.; Tenhunen, T.; Chen, Y. Using Inquiry-Based Bluetooth RSSI Probability Distributions for Indoor Positioning. J. Glob. Position. Syst. 2010, 9, 122–130.
- Small, J.; Smailagic, A.; Siewiorek, D.P. Determining User Location for Context Aware Computing Through The Use of A Wireless LAN Infrastructure, December 2000. Available online: http://www-2.cs.cmu.edu/aura/docdir/small00.pdf (accessed on 20 September 2011).
- Ladd, A.M.; Bekris, K.E.; Rudys, A.; Marceau, G.; Kavraki, L.E.; Dan, S. Robotics-based Location Sensing Using Wireless Ethernet. Proceedings of the 8th Annual International Conference on Mobile Computing and Networking, New York, NY, USA, 23–28 September 2002; pp. 227–238.
- Wang, Y.; Jia, X.; Lee, H.K.; Li, G.Y. An Indoor Wireless Positioning System Based on WLAN Infrastructure. Proceedings of the 6th International Symposium on Satellite Navigation Technology Including Mobile Positioning & Location Services, Melbourne, Australia, 22–25 July 2003; pp. 1–13.
- Roos, T.; Myllymaki, P.; Tirri, H.; Misikangas, P.; Sievanen, J. A Probabilistic Approach to WLAN User Location Estimation. Int. J. Wirel. Inf. Netw. 2002, 9, 155–164.
- Li, B.; Dempster, A.G.; Barnes, J.; Rizos, C.; Li, D. Probabilistic Algorithm to Support The Fingerprinting Method for CDMA Location. Proceedings of International Symposium on GPS/GNSS, Hong Kong, 8–10 December 2005.
- Li, B.; Wang, Y.; Lee, H.K.; Dempster, A.G.; Rizos, C. Method for Yielding a Database of Location Fingerprints in WLAN. IEE Proc. Commun. 2005, 152, 580–586.
- Hashemi, H. The Indoor Radio Propagation Channel. Proc. IEEE 1993, 81, 943–968.
- Li, B. Terrestrial Mobile User Positioning Using TDOA and Fingerprinting Techniques. Ph.D. Thesis, School of Surveying & Spatial Information Systems, University of New South Wales, Sydney, Australia, 2006.
Table 1. Double-peak RSS distributions detected in the four test environments.

Test environment | Area of test site | Number of APs detected | Number of APs showing a double-peak distribution | Percentage |
---|---|---|---|---|
Residential room | 10 m² | 28 | 9 | 32% |
Office | 45 m² | 134 | 35 | 26% |
Classroom | 200 m² | 124 | 38 | 31% |
Shopping centre | 1,000 m² | 138 | 52 | 38% |
Total | | 424 | 134 | 32% |
Table 2. Positioning error (m) at each TP for the different models (first test).

Model | T1 | T2 | T3 | T4 | T5 | Average |
---|---|---|---|---|---|---|
Deterministic | 2.28 | 2.10 | 0.62 | 1.27 | 2.49 | 1.75 |
Gaussian | 2.36 | 2.01 | 1.36 | 2.57 | 2.83 | 2.23 |
Histogram | 2.51 | 1.27 | 1.33 | 1.04 | 2.28 | 1.69 |
DGD | 1.26 | 1.20 | 1.32 | 2.58 | 2.29 | 1.73 |
IDGD | 1.15 | 1.23 | 1.09 | 1.14 | 2.18 | 1.36 |
© 2013 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
Chen, L.; Li, B.; Zhao, K.; Rizos, C.; Zheng, Z. An Improved Algorithm to Generate a Wi-Fi Fingerprint Database for Indoor Positioning. Sensors 2013, 13, 11085-11096. https://doi.org/10.3390/s130811085