Article
Peer-Review Record

A Hybrid CNN-LSTM-Based Approach for Pedestrian Dead Reckoning Using Multi-Sensor-Equipped Backpack

Electronics 2023, 12(13), 2957; https://doi.org/10.3390/electronics12132957
by Feyissa Woyano 1,2, Sangjoon Park 1,2,*, Vladimirov Blagovest Iordanov 2 and Soyeon Lee 2
Submission received: 26 May 2023 / Revised: 16 June 2023 / Accepted: 21 June 2023 / Published: 5 July 2023
(This article belongs to the Special Issue Wearable and Implantable Sensors in Healthcare)

Round 1

Reviewer 1 Report

Dear Authors,

The paper titled “A Hybrid CNN-LSTM-Based Approach for Pedestrian Dead Reckoning Using Multi-Sensor-Equipped Backpack” proposes a hybrid Convolutional Neural Network (CNN) and Long Short-Term Memory Network (LSTM)-based inertial PDR system that extracts features from inertial measurement unit (IMU) sequences. The paper addresses important issues; however, it needs further improvement.

 1.      The abstract jumps directly into discussing the development of Micromechanical systems (MEMS) and pedestrian dead reckoning (PDR) without providing sufficient background information or context. It would be beneficial to briefly explain the significance of MEMS and PDR in the field of localization and positioning.

2.      The abstract needs some numbers, e.g., RMSE and R2 values, to make it more attractive.

3.      Tables 1 and 2 are not discussed or cited in the text.

4.      My major concerns are in Table 2.

4.1  Lack of Justification: The author provides specific values for hyperparameters such as the number of filters, filter size, optimizer, epochs, learning rate, number of blocks, and dropout factor. However, there is no clear explanation or justification provided for these choices. It is important to discuss the rationale behind selecting these values and how they relate to the problem at hand. Without proper justification, it is unclear whether these values are optimal or suitable for the given task.

 

4.2  Absence of Tuning: The author states that the hyperparameters are provided without tuning. Hyperparameter tuning is a crucial step in optimizing the performance of a model. By not tuning the hyperparameters, there is a possibility that the chosen values may not be the most effective for the specific dataset or problem. It is important to explore different values and techniques for tuning hyperparameters to ensure the model's optimal performance.

4.3  Lack of Sensitivity Analysis: Hyperparameters can significantly impact the model's performance and generalization ability. Without conducting a sensitivity analysis or studying the effects of varying hyperparameter values, it is difficult to assess the robustness and stability of the chosen hyperparameters. It is important to investigate how the model's performance varies with different values of hyperparameters to ensure the reliability of the results.

5.      The authors used RMSE as an evaluation metric; a more important metric, known as NSE, could be added.

6.      Long Short-Term Memory (LSTM) is a significant aspect of this work; however, more support should be given based on recent work. Please add: deep learning-based modeling of groundwater storage change; CDLSTM: a novel model for climate change forecasting.

7.      Overfitting and model tuning are required; see and add PCCNN, DNNBoT1, and DNNBoT2.

8.      The number of parameters, FLOPS, and computational complexity are required for the models.

9.      Limitations and the future scope should be added with more clarity.

10.   The authors need to provide the merits of this study versus other studies.

 

11.   The inter-comparison or comparison with other studies is missing; please add it.

 

Author Response

 

 

Response to Reviewer 1 Comments

Point 1: The abstract jumps directly into discussing the development of Micromechanical systems (MEMS) and pedestrian dead reckoning (PDR) without providing sufficient background information or context. It would be beneficial to briefly explain the significance of MEMS and PDR in the field of localization and positioning.

Response 1: All authors appreciate the reviewer’s critical point of view on this manuscript. The abstract was revised to provide sufficient background information, per the reviewer’s suggestions and comments.

Researchers in academia and industry working on location-based services (LBS) are paying close attention to indoor localization based on pedestrian dead reckoning (PDR) because it is an infrastructure-free localization method. PDR is a fundamental localization technique that uses human motion to perform localization relative to an initial position. The size, weight, and power consumption of micro-electro-mechanical systems (MEMS) embedded in smartphones are remarkably low, making them appropriate for localization and positioning.

 

Point 2: The abstract needs some numbers, e.g., RMSE and R2 values, to make it more attractive.

 

Response 2: All authors appreciate the reviewer’s efforts in carefully reading the manuscript. We have reviewed and added the RMSE values; however, instead of R2 we have added MAE as an evaluation metric. Experiments conducted on odometry datasets collected with the multi-sensor backpack device demonstrated that the proposed architecture outperforms traditional PDR methods, with a root mean square error (RMSE) of 0.52 m for the best user. On the handheld smartphone-only dataset, the best achieved R2 value is 0.49.

 

Point 3: Tables 1 and 2 are not discussed or cited in the text.

 

Response 3: We appreciate your valuable comments. We have discussed Tables 1 and 2 and cited the relevant articles. The added comments on Table 1 are highlighted below. Table 1 describes the inertial odometry datasets collected from four different subjects: one female and three males, aged between 30 and 49. Each trajectory is about two minutes of normal walking along a corridor in a building, as illustrated in Figures 2(c) and 2(d). IMU data from the smartphone were used as input to the model, which estimates the positions recorded by the backpack positioning system, as shown in Figures 2(a) and 2(b).

Part of the added table 2 comments is highlighted below.

These results motivated us to use the Adam optimizer with a learning rate of 1e-4 in the later experiments, as indicated in Table 2. Since we use a lower learning rate, we increased the number of training epochs to 500 to allow sufficient time for the loss to decay.

The rest of the related comments are interspersed in the hyperparameter tuning section, between line 343 and line 376.

Point 4: My major concerns are in Table 2.

Lack of Justification: The author provides specific values for hyperparameters such as the number of filters, filter size, optimizer, epochs, learning rate, number of blocks, and dropout factor. However, there is no clear explanation or justification provided for these choices. It is important to discuss the rationale behind selecting these values and how they relate to the problem at hand. Without proper justification, it is unclear whether these values are optimal or suitable for the given task.

Response 4-1: The authors greatly appreciate the reviewer’s valuable comments. We have tried to provide justification for some of the parameters in the hyperparameter tuning section.

Tuning the hyperparameters is a very important step in training neural network models. During the tests, the batch size was determined by the limits of the GPU memory. The window size and stride depend on the sampling rate of the sensor data and ground-truth positions, and were selected after a preliminary search within a range similar to other studies. At this stage, we were concerned with selecting a model size with sufficient capacity to learn the task on this dataset; reducing the model size so that it runs well on mobile devices will be considered at a later stage. We tested two configurations with 128 and 192 filters, having 1,159,689 and 2,599,689 parameters, respectively. The larger model achieved a slightly better minimum validation loss of -12.65, but its testing errors were slightly worse than those of the smaller model. Above a certain model size, performance does not appear to be very sensitive to further increases in the number of parameters. Therefore, we used the smaller model in the rest of the experiments.
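For illustration only, the sketch below shows how two filter settings of a minimal Keras CNN-LSTM regressor could be compared. The layer layout, window size, channel count, and output dimension are assumptions made for the sketch, not the exact architecture of the paper, so the printed parameter counts will not match the 1,159,689 and 2,599,689 figures quoted above.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_cnn_lstm(n_filters=128, window_size=200, n_channels=6):
    """Illustrative CNN-LSTM regressor; layer layout, window size and
    channel count are assumptions, not the paper's exact architecture."""
    inputs = layers.Input(shape=(window_size, n_channels))        # IMU window
    x = layers.Conv1D(n_filters, kernel_size=3, activation="relu")(inputs)
    x = layers.Conv1D(n_filters, kernel_size=3, activation="relu")(x)
    x = layers.MaxPooling1D(pool_size=2)(x)
    x = layers.LSTM(n_filters)(x)
    x = layers.Dropout(0.2)(x)
    outputs = layers.Dense(3)(x)                                  # estimated position
    return tf.keras.Model(inputs, outputs)

# Compare the two filter settings discussed above; the counts printed here
# depend on this illustrative architecture only.
for n in (128, 192):
    print(n, "filters ->", build_cnn_lstm(n_filters=n).count_params(), "parameters")
```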

 

 

Point 4-2: Absence of Tuning: The author states that the hyperparameters are provided without tuning. Hyperparameter tuning is a crucial step in optimizing the performance of a model. By not tuning the hyperparameters, there is a possibility that the chosen values may not be the most effective for the specific dataset or problem. It is important to explore different values and techniques for tuning hyperparameters to ensure the model's optimal performance.

 

Response 4-2: The authors greatly appreciate the reviewer’s valuable comments. We performed hyperparameter tuning of the model size, optimizer, and learning rate, and added Table 3, Table 4, and Figure 4 with the results of the parameter tuning. The relevant added comments are highlighted below.

Tuning the hyperparameters is a very important step in training neural network models. During the tests, the batch size was determined by the limits of the GPU memory. The window size and stride depend on the sampling rate of the sensor data and ground-truth positions, and were selected after a preliminary search within a range similar to other studies. At this stage, we were concerned with selecting a model size with sufficient capacity to learn the task on this dataset; reducing the model size so that it runs well on mobile devices will be considered at a later stage. We tested two configurations with 128 and 192 filters, having 1,159,689 and 2,599,689 parameters, respectively. The larger model achieved a slightly better minimum validation loss of -12.65, but its testing errors were slightly worse than those of the smaller model. Above a certain model size, performance does not appear to be very sensitive to further increases in the number of parameters. Therefore, we used the smaller model in the rest of the experiments.

Point 4-3: Lack of Sensitivity Analysis: Hyperparameters can significantly impact the model's performance and generalization ability. Without conducting a sensitivity analysis or studying the effects of varying hyperparameter values, it is difficult to assess the robustness and stability of the chosen hyperparameters. It is important to investigate how the model's performance varies with different values of hyperparameters to ensure the reliability of the results.

Response 4-3: All authors appreciate the reviewer’s critical point of view on this manuscript. We have revised this section to clarify the observations, and the newly added evaluation metrics are clearly highlighted in the main body for the reviewer’s convenience. We also added comments on the observed hyperparameter sensitivity in the hyperparameter tuning section; for convenience, some of these comments are highlighted below.

We tested two configurations with 128 and 192 filters, having 1,159,689 and 2,599,689 parameters, respectively. The larger model achieved a slightly better minimum validation loss of -12.65, but its testing errors were slightly worse than those of the smaller model. Above a certain model size, performance does not appear to be very sensitive to further increases in the number of parameters.

We selected 200 epochs for training, confirming that this is sufficient to observe flattening of the loss curves. We can see that each optimizer approaches a specific loss value that is not sensitive to the learning rate.
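A sensitivity sweep of this kind could be scripted as sketched below. The optimizer set, learning rates, and the placeholder names build_cnn_lstm, x_train, y_train, x_val, and y_val are illustrative assumptions, not the exact grid or code used in the revised manuscript.

```python
import tensorflow as tf
import matplotlib.pyplot as plt

# Hypothetical placeholders: build_cnn_lstm(), x_train, y_train, x_val, y_val
# stand in for the model and data described in the manuscript.
optimizers = {"Adam": tf.keras.optimizers.Adam,
              "RMSprop": tf.keras.optimizers.RMSprop}
learning_rates = [1e-3, 3e-4, 1e-4]

for name, opt_cls in optimizers.items():
    for lr in learning_rates:
        model = build_cnn_lstm()
        model.compile(optimizer=opt_cls(learning_rate=lr), loss="mse")
        history = model.fit(x_train, y_train,
                            validation_data=(x_val, y_val),
                            epochs=200, verbose=0)
        # One validation-loss curve per optimizer / learning-rate combination.
        plt.plot(history.history["val_loss"], label=f"{name}, lr={lr}")

plt.xlabel("epoch")
plt.ylabel("validation loss")
plt.legend()
plt.show()
```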

          

 

Point 5: The authors used RMSE as an evaluation metric; a more important metric, known as NSE, could be added.

 

Response 5: All authors appreciate the reviewer’s critical point of view on this manuscript. We have revised this section to clarify the observations as below, and the newly added evaluation metrics are clearly highlighted in the main body for the reviewer’s convenience.

 

Mean Absolute Error: The mean absolute error (MAE) is another measure widely used in model evaluation, particularly for assessing the performance of guidance and navigation systems, and it represents the global accuracy of the estimated position. For a regression problem, the MAE is the average of the absolute differences between the predicted and actual values.

 

\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left| y_i - \hat{y}_i \right| \qquad (5)

We added R2 as recommended, but we had difficulties incorporating the NSE metric: we could not determine an adequate n value for the sums of the residuals and the variance.
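For reference, a minimal NumPy sketch of the metrics discussed here (RMSE, MAE, and R2) is given below; the arrays are toy values used only for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error, Equation (5): average absolute difference
    between the predicted and actual values."""
    return np.mean(np.abs(y_true - y_pred))

def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy values for illustration only.
y_true = np.array([0.0, 1.0, 2.0, 3.0])
y_pred = np.array([0.1, 0.9, 2.2, 2.8])
print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```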

 

Point 6: Long Short-Term Memory (LSTM) is a significant aspect of this work; however, more support should be given based on recent work. Please add: deep learning-based modeling of groundwater storage change; CDLSTM: a novel model for climate change forecasting.

Response 6: The authors greatly appreciate the reviewer’s efforts in carefully reviewing the paper and the valuable comments offered. We added to the introduction that deep learning, particularly CDLSTM, is used for time-series climate change forecasting, which is similar to the time-series motion modeling in our case. The added discussion is highlighted below.

Recently, the demand for deep learning (DL) approaches has increased significantly in almost every domain. In climate analysis and weather forecasting, DL techniques such as DNNs and RNNs have been applied [24]. These works model future climate status, focusing on the use of limited scope and data to investigate their models, and then use parameter tuning and cross-validation on different data. They used a novel CDLSTM model to investigate three important aspects: detecting rainfall and temperature trends, analyzing the correlation between temperature and rainfall, and forecasting temperature and rainfall. LSTM is an improvement over the RNN designed to capture long-range dependencies in time-series data. This network is immensely beneficial for a broad range of circumstances and is now widely used in various applications; in climate change forecasting and groundwater storage modeling, LSTM is being used broadly. LSTM is also used for botnet detection and classification in developing fast and efficient networks. The authors in [25] proposed a deep neural network for intrusion detection; they implemented a Principal Component-based Convolutional Neural Network (PCCNN) approach to improve precision, with PCA used to reduce the dimension of the feature vector. More recent work in the area of botnet detection and classification uses Deep Neural Network (DNN) models, DNNBoT1 and DNNBoT2, for the detection and classification of Internet of Things (IoT) botnet attacks [26].

 

Point 7: Overfitting and model tuning are required; see and add PCCNN, DNNBoT1, and DNNBoT2.

 

Response 7: The authors greatly appreciate the reviewer’s efforts in carefully reviewing the paper and the valuable comments offered.

Similar to the approaches used to handle overfitting in the above-mentioned papers, we used dropout, albeit without additional tuning of its value. In addition, we used validation-loss monitoring to save the model with the minimum validation error as another measure against overfitting.
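A minimal sketch of this setup in a Keras workflow is shown below; build_cnn_lstm (which already contains a Dropout layer in the earlier sketch) and the data variables are hypothetical placeholders, and the checkpoint callback simply keeps the weights with the lowest validation loss.

```python
import tensorflow as tf

# Hypothetical placeholders: build_cnn_lstm(), x_train, y_train, x_val, y_val
# stand in for the model and data described in the manuscript.
model = build_cnn_lstm()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4), loss="mse")

# Keep only the weights with the lowest validation loss, so the saved model
# corresponds to the minimum validation error.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model.h5",
    monitor="val_loss",
    save_best_only=True,
)

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=500,
          callbacks=[checkpoint])
```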

 

Point 8: The number of parameters, FLOPS, and computational complexity are required for the models.

Response 8: The authors greatly appreciate the reviewer’s valuable comments. As recommended, we added the number of parameters of the models considered in the hyperparameter tuning stage. Unfortunately, we could not provide correct FLOPS figures due to the limitations of our toolkit (we had difficulties obtaining FLOPS for LSTM layers in our version of TensorFlow).

Point 9: Limitations and the future scope should be added with more clarity.

 

Response 9: The authors greatly appreciate the reviewer’s efforts in carefully reviewing the paper and the valuable suggestions offered. The limitations and future scope are now clearly stated and discussed in the conclusions, and the edited parts are highlighted in the main body of the manuscript.

A limitation of this study is that only normal walking is considered; other modes of motion, such as side stepping, running, and climbing and descending stairs, will be considered at a later stage. Besides unconstrained smartphone-based PDR, in our future work we will consider adding a self-attention mechanism to increase robustness by mitigating noise spikes and missing measurements and by improving generalization over a variety of smartphone models. Another promising approach is structured state-space sequence modeling, which targets the problem of long-range dependencies; it would help capture rich building-interior context and improve performance on trajectories specific to a given building.

 

Point 10: The authors need to provide the merits of this study versus other studies.

 

Response 10: The authors gratefully acknowledge the reviewer’s efforts in carefully reviewing the paper and the valuable comments.

 

The merits of deep learning (DL)-based PDR over traditional PDR are its robustness and accuracy. Given their ability to generalize without an explicit analytic model, DL approaches are increasingly being used to learn motion from time-series data. Another advantage of DL-based PDR is that it reduces inertial sensor errors in multi-mode systems and is capable of estimating motion and generating trajectories directly from raw inertial data without any handcrafted engineering. Conventional PDR approaches have difficulties providing accurate state estimation over long distances and cannot sufficiently control the error explosion arising from primitive double integration; additional limitations are due to complex body constraints and motion dynamics of the user. In a departure from other studies, which target estimation of the pose of the smartphone collecting the IMU data, here we focus on the scenario where the input IMU data are collected from the smartphone while the target pose is that of the user holding the phone.

 

Point 11: The inter-comparison or comparison with other studies is missing; please add it.

 

Response 11: The authors gratefully acknowledge the reviewer’s efforts in carefully reviewing the paper and the valuable comments. We added a quantitative comparison based on RMSE in the context of trajectory length (Table 8) and a qualitative comparison using the same dataset as other studies (Figure 14).

Table 8 shows that methods based on a hybrid CNN-SVM performed significantly better than the CNN method alone, based on the accuracy values. In our case, a hybrid CNN-LSTM network trained on different users shows that, in most cases, the MAE and RMSE are lower than those of traditional PDR methods, as shown in Table 7. It is important to notice that a small error here demonstrates a significant improvement of the PDR result. Looking at the average MAE and RMSE over all users shows an improvement in the MAE. Regarding the RMSE, the effect of considerable noise is limited, and the RMSE of each user is still lower than that of existing methods.

As an additional qualitative comparison, in Figure 13 we show the ground-truth and estimated trajectories on a dataset from the public repository provided by [17]. Our results are shown in the right column and were obtained from a model trained with the Adam optimizer with a learning rate of 3e-4 (see Table 4).

 

Author Response File: Author Response.pdf

Reviewer 2 Report

Modern deep learning techniques that have been employed to enhance pedestrian dead reckoning (PDR) were used in this paper. Convolutional neural networks and long short-term memory networks (CNN-LSTM) were used to extract features from a multi-channel, low-cost IMU. CNN and LSTM were combined in order to extract useful features for motion estimation. Additionally, the performance of the conventional step-and-heading system (SHS)-based PDR, which uses a three-heading estimation method, was assessed. The PDR accuracy was improved when using CNN-LSTM rather than SHS-based PDR. The study's final finding is that deep learning-based PDR performs better than conventional PDR. Experiments conducted on odometry datasets collected from the backpack demonstrated that the proposed architecture outperformed previous traditional methods.

The paper is well written and deserves publication. There are some minor issues that have to be solved. The first is the misuse of style for multi-author references within the text. The second is the repetition of abbreviations. The third is the unequal distribution of general and specific information: the experiments should receive the largest share, so the general information should be reduced. With so much general information, the paper sometimes gives the impression of a review. These problems are easily addressed, and for this reason a minor revision is suggested.

 

 

Author Response

Response to Reviewer 2 Comments

 

 

 

Point 1: Misuse of style for multi-author references within the text.

Response 1: All authors appreciate the reviewer’s critical point of view on this paper. We corrected the misuse of the style for multi-author references, as highlighted below.

  • The extraordinary development of state-of-the-art indoor location-based services (LBS) is accelerating the expansion of indoor positioning techniques [1-2]. (Page 2)
  • Wireless fidelity (Wi-Fi) [3]–[5], radio frequency identification (RFID) [6]–[9], ultra-wideband (UWB) [10, 11], and Bluetooth Low Energy (BLE) [12] are among the techniques that require tailored infrastructure. (Page 2)

 

Point 2: The repetition of abbreviations.

 

Response 2: All authors appreciate the reviewer’s critical point of view on this paper. We corrected the repetition of abbreviations and listed all abbreviations used in this manuscript above the references.

 

Micro-electro-mechanical system (MEMS): repetitions omitted in the abstract

Pedestrian dead reckoning (PDR): repetitions omitted on several pages and in the abstract

Convolutional Neural Network (CNN): repetitions omitted on several pages

Long Short-Term Memory Network (LSTM): repetitions omitted on several pages

Step-and-heading system (SHS)-based: repetitions omitted on several pages

Inertial navigation system (INS)-PDR: repetitions omitted on several pages

List of abbreviations

IMU: Inertial Measurement Unit

PDR: Pedestrian Dead Reckoning

MEMS: Micro-Electro-Mechanical System

LIDAR: Light Detection and Ranging

INS: Inertial Navigation System

SHS: Step-and-Heading System

GPS: Global Positioning System

RMSE: Root Mean Square Error

MAE: Mean Absolute Error

CNN: Convolutional Neural Network

LSTM: Long Short-Term Memory

RNN: Recurrent Neural Network

Point 3: The unequal distribution of general and specific information.

Response 3: We added a section on parameter tuning and expanded the experiment discussion.

  • We added a comparison with other studies, as shown in Table 8.
  • We added a qualitative comparison with another study on the same dataset, visualized in Figure 13.
  • We also plot the results of the parameter tuning in Figure 4 and summarize them in Tables 3 and 4.

 

 

 

 

Author Response File: Author Response.pdf

Reviewer 3 Report

This paper evaluates deep learning-based pedestrian dead reckoning (PDR). The IMU-based PDR algorithm for smartphones with LIDAR uses the estimation power of deep learning to solve the dead-reckoning problem. The advantage of the proposed architecture is demonstrated by experiments. Several concerns are provided below.

(1) Why are there two Figs. 5 on Page 9 and 10? Please explain why the RMSE of user 1 is the largest in the first Fig. 5.

(2) Please explain the reason that the discrepancy between the predicted ones and the ground truth is the largest in the x-axis.

(3) In Fig. 10, why are the curves not smooth, especially in Figs. 10(c) and (d)? Please explain.

(4) Some figures are not clear. For example, Figs. 2, 6 and 10. Please improve.

Author Response



Response to Reviewer 3 Comments

 

Point 1: Why are there two Figs. 5 on Page 9 and 10? Please explain why the RMSE of user 1 is the largest in the first Fig. 5.

Response 1: All authors appreciate the reviewer’s critical point of view on this paper. We corrected the figure numbering after the revision in response to the reviewer’s comments.

 

To the best of our understanding, the RMSE is largest for user 1 because of his height and related stride length, which stand out in comparison to the other three users.

 

 

Figure 4. RMSE vs. MAE for different users in scenario 1.

 

Point 2:  Please explain the reason that the discrepancy between the predicted ones and the ground truth is the largest in the x-axis.

Response 2: All authors appreciate the reviewer’s critical point of view on this paper. We have explained the discrepancy between the predicted and ground-truth positions along the x-axis.

 

The x, y, and z coordinates show increased errors when the user’s direction changes. The length of the test trajectories along the x-axis is twice that along the y-axis, which might be related to the large observed RMSE.

Point 3: In Fig. 10, why are the curves not smooth, especially in Figs. 10(c) and 10(d)? Please explain.

 

Response 3: All authors appreciate the reviewer’s critical point of view on this paper. We have explained why the curves are not smooth in Figs. 10(c) and 10(d).

Although we are not sure of the exact causes of the observed spikes, we suspect that, since the batch size used is relatively small, a batch containing difficult samples can produce such spikes once the loss value gets lower. Also, since the spikes are more prominent in the same two users’ data, they may be related to user-specific patterns in the data.

Point 4:  Some figures are not clear. For example, Figs. 2, 6 and 10. Please improve.

Response 4: All authors appreciate the reviewer’s critical point of view on this paper. To improve the quality of the figures as suggested by the reviewer, we added labels to the figures and descriptions in the text. Figures 6 and 10 were changed after the revision.

 

  


Figure 2. The device used for the indoor experiment: a multi-sensor backpack device used in the indoor environment to collect the dataset. (a) User carrying the backpack device; (b) monitor for real-time display; (c) scenario 1 ground truth; (d) scenario 2 ground truth.

 

 


Figure 7. Boxplots of user trajectory RMSEs with the CNN-LSTM model, where (a) is user 1, (b) is user 2, (c) is user 3, and (d) is user 4.

 

 


Figure 11. Each user’s training and validation loss at selected epochs for the model: (a) training loss for 100 epochs; (b) validation loss for 100 epochs; (c) training loss for 500 epochs; (d) validation loss for 500 epochs.

                                  

 

 

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

All comments have been addressed.

Reviewer 3 Report

Thanks to the authors for the revision. My concerns have been addressed.
