**1. Introduction**

With the proliferation of wireless communication and the Internet of Things (IoT), location-based services (LBS) have enabled a wide range of applications for our safety and convenience. While GPS dominates outdoor localization and has become an essential element of modern transportation, indoor navigation has also gained attention over the last few decades because it holds important value for ubiquitous LBS applications such as inventory management, logistics and supply chain management, smart home and smart building monitoring, retail and sports analytics, mall navigation, and virtual reality.

However, GPS signals are largely unavailable indoors due to structural blockage and severe multipath propagation effects. Currently, there is no general solution for indoor localization, but several technologies can provide indoor positioning and map construction, such as radio frequency identification (RFID), ultra-wideband (UWB), micro-electromechanical systems (MEMS) multi-sensors, and wireless local area networks (WLAN). Because data collected across multiple dimensions can potentially lead to better indoor positioning results, crowdsourcing has drawn significant attention in recent years. More and more crowdsourcing approaches have been presented, and the popularity of smart devices boosts their adoption. The term crowdsourcing describes a computing paradigm that distributes work previously handled by employees to a large, undefined network of individuals. A crowd of people can coordinate to solve problems faster than a single person, and the crowd can rapidly generate data about a particular location.

Such a model is an efficient way to support indoor location-aware applications. The crowdsourced data, however, cannot easily be fused into usable applications, because they are collected by different users, in different locations, at different times, and with different noise and distortions. First, the sensor readings are inaccurate due to the deviation of low-cost sensing devices, and different individuals can report slightly inconsistent readings of the same parameter. Since crowdsourced data have the advantage of large quantity, many researchers propose to improve accuracy through data fusion. These methods, however, suffer from the fact that crowdsourced data traces are distorted both temporally and spatially, so the fusion of multiple data sources can hardly proceed. The lack of precise time and location information largely limits the usability of crowdsourced data, and thus there is an urgent need for efficient crowdsourced data alignment to support varying applications.

As time calibration can be achieved through time synchronization, we focus on geographical trace alignment in this paper. In principle, collecting data from different locations should rely on a localization service. However, indoor positioning systems are not widely deployed or used in practice. There are many existing indoor localization approaches, such as fingerprint-based methods [1–5] and ranging-based methods [6]. These, however, may suffer from the high dynamics and complexity of the indoor environment and thus output erroneous results. The authors in [7,8] propose methods that combine sensor information with the constraints imposed by the map, thereby filtering out infeasible locations and converging on the true location. However, they require specific floor plan constraints, and the floor plan cannot be acquired in some circumstances. A recent state-of-the-art approach, Walkie-Markie [9], presents an indoor pathway mapping scheme. It automatically reconstructs the internal pathway maps of buildings, using the trend of WiFi signals as landmarks to calibrate the location of pathways. We find, however, that this method still has room for improvement due to several limitations. First, because the system relies on WiFi-Marks, it is less capable of processing short data traces that contain hardly any WiFi-Marks. Moreover, it assumes that trace distortions are simple because they come only from incorrect stride lengths and headings. In practice, given the variety of localization methods, the distortions can be more complicated. Some other works use geomagnetic information [10,11] or WiFi information [12,13] for indoor localization. Although they can achieve high localization performance, they focus on the features of a single data dimension (such as WiFi), so they cannot be applied to fuse other sensory data whose features differ from those of WiFi signals. Moreover, various signals and noise in the indoor environment interfere with the features of a single data dimension and may thus affect positioning accuracy.

To address this problem, we propose a novel solution that aligns multi-dimensional crowdsourced data with the right time and location stamps, and thus builds high-quality traces from raw data samples. Generally, crowdsourced traces consist of multiple data dimensions such as the WiFi signal strength, the ambient temperature and humidity, the magnetic field information, and the like. Each dimension corresponds to one underlying parameter of the physical world and exhibits its own features in data distribution and variation. For example, the wireless signal follows the strength loss model and the magnetic field follows the magnetic field model. This field feature appears consistently as the fluctuation of the gradient along a specific trace, even if the deviations vary across devices. Figure 1 shows the gradient of the access point (AP) signal and the magnetic field in ideal models. More importantly, with only one dimension of information, the trace may not be unique, e.g., traces departing from the same AP in different directions can have a similar decreasing gradient feature. However, multi-dimensional information, e.g., signals from more than one AP and base station, the three directional components of the magnetic field, etc., gives us the chance to conduct integrated fusion and align traces accurately without a priori knowledge of the environment.

**Figure 1.** Gradient of AP signal and magnetic field in ideal models.
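The single-dimension ambiguity described above can be illustrated with a small numerical sketch. It is not the method proposed in this paper: it assumes a log-distance path loss model as one concrete form of the strength loss model, and the AP positions, reference power `p0`, path loss exponent `n`, and the helper names `rss_dbm`, `trace`, and `gradient` are all hypothetical choices made for illustration.

```python
import math

def rss_dbm(ap, point, p0=-40.0, n=3.0, d0=1.0):
    """Received signal strength under an assumed log-distance path loss model.

    p0 (dBm at reference distance d0) and exponent n are illustrative values.
    """
    d = max(math.dist(ap, point), d0)  # clamp to the reference distance
    return p0 - 10.0 * n * math.log10(d / d0)

def trace(start, heading, steps=5, stride=1.0):
    """Sample points of a straight walk from `start` along unit vector `heading`."""
    return [(start[0] + heading[0] * stride * k,
             start[1] + heading[1] * stride * k)
            for k in range(1, steps + 1)]

def gradient(values):
    """First differences of a sequence of readings along a trace."""
    return [b - a for a, b in zip(values, values[1:])]

# Hypothetical layout: two APs 10 m apart on the x-axis.
ap1 = (0.0, 0.0)
ap2 = (10.0, 0.0)

# Two traces leave AP1 in opposite directions.
east = trace(ap1, (1.0, 0.0))
west = trace(ap1, (-1.0, 0.0))

# Dimension 1 (AP1 only): both traces show the same decreasing gradient.
g1_east = gradient([rss_dbm(ap1, p) for p in east])
g1_west = gradient([rss_dbm(ap1, p) for p in west])

# Dimension 2 (AP2): the gradients now differ in sign, so the traces separate.
g2_east = gradient([rss_dbm(ap2, p) for p in east])
g2_west = gradient([rss_dbm(ap2, p) for p in west])

print("AP1 gradients match:",
      all(math.isclose(a, b) for a, b in zip(g1_east, g1_west)))
print("AP2 gradient signs (east, west):",
      [g > 0 for g in g2_east], [g > 0 for g in g2_west])
```

Under these assumptions, the AP1 readings alone cannot tell the two traces apart, while adding the AP2 dimension distinguishes them. Real indoor signals are far noisier than this ideal model.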

Based on these observations, we formulate crowdsourced trace alignment as a multi-dimensional data consistency optimization problem and search for a more accurate solution based on the mutual correlation among the multi-dimensional data. In summary, the contributions of this work are as follows:


The rest of this paper is organized as follows. Section 2 discusses existing efforts related to this work. Section 3 provides a detailed statement of the crowdsourced data alignment problem. Section 4 describes our main idea and the framework of the proposed approach. Section 5 discusses the 3D indoor trace collection and presents the evaluation results. Section 6 concludes the paper.
