Article
Peer-Review Record

An Empirical Recommendation Framework to Support Location-Based Services†

Future Internet 2020, 12(9), 154; https://doi.org/10.3390/fi12090154
by Animesh Chandra Roy 1, Mohammad Shamsul Arefin 1,*, A. S. M. Kayes 2,*, Mohammad Hammoudeh 3 and Khandakar Ahmed 4
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 25 August 2020 / Revised: 15 September 2020 / Accepted: 15 September 2020 / Published: 17 September 2020
(This article belongs to the Special Issue Sustainable Smart City)

Round 1

Reviewer 1 Report

Below are my comments on the manuscript:

  1. A lot of technical details are missing in the manuscript.
  2. The implementation of DBSCAN clustering is not clearly stated. What were the features/dimensions the clustering was done on?
  3. Are there only five parent categories for building the user profile (Recreation, Shopping, Education, Food, Travelling)?
  4. It is not clear how the user profile is created (Table 4). Is it just a list, or does the ordering of priorities carry scores? If it is the latter, how is the score calculated? For example, in the case of UserA in Table 4, do Food and Recreation have scores for being first and second on the list? If two priorities have the same score, what is the logic for giving one more preference?
  5. It is not clear how the similarity of the users is calculated (Table 6). What are the vectors A, B for measuring cosine similarity? If these come from the user profile, is it a vector (with implicit feedback) or are there scores assigned?
  6. It is not clear how the user profile is updated (Table 7). Is it the union of the priority lists of two similar users? If there is a score, how is that score updated?
  7. If the updated list is just the union of two similar users' lists, and there are only 5 categories, in the end most users will have most of the categories in their list. In that case the generated results would be misleading. For example, from Table 6, u2 is similar to u15 and u1 is similar to u15. According to the method described, the final updated list would be the union of all three users u1, u2, u15.
  8. How is ‘N’ decided for constructing the grid?
  9. It is not clear how the similarity score is computed in Algorithm 3. What are the vectors for POIs and the user profile used to calculate this metric?
  10. How is the synthetic data generated? Which tool are the authors using to generate this data?
  11. What is the rationale behind using only 200 users to calculate similarity?
  12. What was the test sample used to report the model performance?
  13. Given that there can be at most five categories in the user profile, and that the given update rule might eventually put most of the categories in all users' lists, the very high scores are misleading. Also, if there is any bias in the way the synthetic data is generated, the findings are misleading.
  14. Given the above, it is hard to evaluate the generalizability of the findings.

 

Author Response

Response to Reviewer-1

  1. English language and style are fine/minor spell check required

 

  • We have revised the manuscript and made the necessary updates to remove typos and grammatical errors.

 

  2. Implementation of DBSCAN clustering is not clearly stated. What are the features/dimensions the clustering was done on?
  • First, we eliminate null values and remove all punctuation and symbols, if any. Then we encode the parent category with the Label Encoder so that ‘Recreation’=0, ‘Food’=1, ‘Health’=2, and so on. We tokenize the words and then use the Porter Stemmer from NLTK to find the root of each word. The TF-IDF vectorizer vectorizes the features to reflect how important a word is to a document in a collection. Finally, we perform DBSCAN to cluster all the place types into 11 categories and measure the Completeness and Homogeneity Scores. [Added in lines 256-261]
  3. Are there only five parent categories for building user profile (Recreation, Shopping, Education, Food, and Traveling)?

 

  • We consider eleven parent categories based on the statistical analysis of the POIs extracted using the Google Places API. In the figure, we show only five categories as an example. [Added in lines 350-352]

 

  4. It is not clear how the user profile is created (Table 4). Is it just a list, or does the ordering of priorities carry scores? If it is the latter, how is the score calculated? For example, in the case of UserA in Table 4, do Food and Recreation have scores for being first and second on the list? If two priorities have the same score, what is the logic for giving one more preference?

[Explained in lines 264-267 that is given below]

  • As we have the parent category of each visited place for the individual user from Table 3, we calculate the percentage of each visited parent category. Then we order the categories from high to low, which defines the user's priority of interest. In the case of equal priority scores, we order the tied categories arbitrarily.
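As a minimal sketch of this step (the check-in list is illustrative):

```python
# Sketch of profile building: count visits per parent category, convert to
# percentages, and order the categories from highest share to lowest.
from collections import Counter

checkins = ["Food", "Food", "Recreation", "Food", "Travelling", "Recreation"]
counts = Counter(checkins)
total = sum(counts.values())
profile = sorted(((cat, 100 * n / total) for cat, n in counts.items()),
                 key=lambda item: item[1], reverse=True)
# profile: Food (50%) first, then Recreation, then Travelling
```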

 

  5. It is not clear how the similarity of the users is calculated (Table 6). What are the vectors A, B for measuring cosine similarity? If these are from the user profile, is it a vector (with implicit feedback) or are there scores assigned?

 

  • To calculate the similarity, we find the frequency of words for each user profile using the TfidfVectorizer, which generates a sparse matrix. Then we calculate the dot product of two users' vectors A and B and divide it by the product of the magnitudes of both vectors. The cosine of the angle between the two vectors defines the similarity score. [Added in lines 270-274]
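This computation can be sketched with scikit-learn (the profile strings are illustrative stand-ins for the priority lists of Table 4):

```python
# Sketch of the similarity step: TF-IDF vectorize each profile string, then
# take the cosine of the angle between each pair of user vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

profiles = [
    "Food Recreation Travelling",           # u1
    "Food Recreation Shopping",             # u2
    "Food Recreation Shopping Travelling",  # u15
]
X = TfidfVectorizer().fit_transform(profiles)  # sparse matrix, one row per user
sim = cosine_similarity(X)                     # sim[i, j] = A.B / (|A| |B|)
```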

 

  6. It is not clear how the user profile is updated (Table 7). Is it the union of the priority lists of two similar users? If there is a score, how is that score updated?

If the updated list is just the union of two similar users' lists, and there are only 5 categories, in the end most users will have most of the categories in their list. In that case the generated results would be misleading. For example, from Table 6, u2 is similar to u15 and u1 is similar to u15. According to the method described, the final updated list would be the union of all three users u1, u2, u15.

 

  • We can see that there is a lot of similarity between some users. For example, user u2 matches most closely with users u15 and u1. That is why we calculate the final list as the union of users u1, u2, and u15. To identify similar users, we consider those with similarity scores greater than a threshold value of 0.5. Thus we update the profile of user u2 with u15 and u1 so that u2 has more POI options. [Added in lines 279-283]
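The update rule can be sketched as an order-preserving union over the lists of sufficiently similar users (the lists are illustrative):

```python
# Sketch of the profile update: extend a user's priority list with the
# categories of each similar user, skipping duplicates.
def update_profile(own, similar_lists):
    merged = list(own)
    for other in similar_lists:
        for cat in other:
            if cat not in merged:
                merged.append(cat)
    return merged

u2, u15, u1 = ["Food", "Recreation"], ["Food", "Shopping"], ["Travelling", "Food"]
updated = update_profile(u2, [u15, u1])
# updated == ['Food', 'Recreation', 'Shopping', 'Travelling']
```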

 

  7. How is ‘N’ decided for constructing the grid?

 

  • As we extract the POIs using the Google Places API, the value of ‘N’ is selected for constructing the grid based on the distance covered by the POI locations.
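One plausible reading of this rule, sketched under our own assumption of a fixed cell side length (`cell_deg` is not from the paper):

```python
# Sketch: derive N from the bounding box of the extracted POI coordinates.
import math

def grid_size(lats, lons, cell_deg=0.25):
    # Largest extent of the POIs' bounding box, in degrees.
    extent = max(max(lats) - min(lats), max(lons) - min(lons))
    return max(1, math.ceil(extent / cell_deg))

# POIs spanning half a degree -> N = 2 at 0.25-degree cells
n = grid_size([22.0, 22.5], [91.0, 91.25])
```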

 

  8. It is not clear how the similarity score is computed in Algorithm 3. What are the vectors for POIs and the user profile used to calculate this metric?

 

  • Here, vectors A and B define the frequency values of the user profile and the extracted POIs, respectively. [Added in line 309]

 

  9. How is the synthetic data generated? Which tool are the authors using to generate this data?

 

  • We have used the ‘Faker’ package of Python to generate the values of user name, check-in places, latitude, longitude, and place type.
  10. What is the rationale behind using only 200 users to calculate similarity?

 

  • We have considered 200 user profiles in the collaborative filtering method to calculate the similarity. The number of users can be increased to obtain better output.

 

  11. What was the test sample used to report the model performance?

Given that there can be at most five categories in the user profile, and that the given update rule might eventually put most of the categories in all users' lists, the very high scores are misleading. Also, if there is any bias in the way the synthetic data is generated, the findings are misleading.

 

  • To evaluate the system, we consider the current location of the user and the physical distance between the suggested POI location and the user's location for the recommendation.
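The physical-distance part of this evaluation can be sketched with the haversine formula (our assumption; the response does not name the distance measure):

```python
# Sketch: great-circle distance in km between a user and a suggested POI.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    r = 6371.0  # mean Earth radius in km
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# e.g. one degree of longitude apart on the equator is roughly 111 km
d = haversine_km(0.0, 0.0, 0.0, 1.0)
```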

 

 

 

Reviewer 2 Report

I liked the article. The theoretical part seems solid as well as the methodological foundations. I would suggest that the authors emphasize their novelty/contribution more explicitly. My other comments relate to the presentation of images and minor grammatical errors.
Figure 1: I ask the authors to describe Figure 1, in particular the meaning of two icons in the same object; e.g., two user A icons in Movie Theater: does this mean 2 check-ins over time, or check-ins in two objects of the same type, Movie Theaters? This is somewhat evident from a later example, but is unclear here.
Chapter 1.1 contains an example and a description of the structure of the article; I suggest that it then be section 1.2.
In referencing the sources, the authors used the author's first name et al. instead of the surname et al.; this is true for all references (not Jonathan et al. [5] but Raper et al. [5]).
Figure 2 also requires some clarification; what is the meaning of different shapes (rectangle, roller, arrow) and their use (do the arrows show the sequence of the procedure)?
The conclusion states traveling options (text line 383); do the recommendations apply to all types of user meeting priorities?

Grammatical errors:
Textline - TL 71-72: To overcome ... a interest or an interest
TL 162 They also two modules
TL 231 it define as
TL 233 To categories ...
TL 263. similarly ...
TL 258 Without Where
TL 285 From 12 girds or grids?
TL 294 is it guess or would hit be a better choice?
TL 345 Then We
TL 372 between between

Author Response

Response to Reviewer-2

  1. I liked the article. The theoretical part seems solid as well as the methodological foundations. I would suggest that the authors emphasize their novelty/contribution more explicitly. My other comments relate to the presentation of images and minor grammatical errors.

 

  • We would like to thank you for your comments. The contribution of this work is stated in lines [186-188].

 

  2. Figure 1: I ask the authors to describe Figure 1, the meaning of two icons in the same object; e.g. 2 user icon A in Movie Theatre – does this mean 2 check-ins over time or in two objects of the same type Movie Theatres? This is somewhat evident from a later example, but is unclear here.

 

  • Two ‘user A’ icons in the Movie Theatre mean 2 check-ins at different times.
  3. Chapter 1.1 contains an example and a description of the structure of the article; I suggest that it then be section 1.2. In referencing the sources, the authors used the author's first name et al. instead of the surname et al.; this is true for all references (not Jonathan et al. [5] but Raper et al. [5]).
  • Corrected in Section 2.

 

  4. Figure 2 also requires some clarification; what is the meaning of different shapes (rectangle, roller, arrow) and their use (do the arrows show the sequences of the procedure)?

 

  • In Figure 2, the rectangle shapes define the processes, the 3 roller shapes define the 3 modules of our system procedure, and the arrow shapes define the sequence of the procedure.

 

  5. The conclusion states traveling options (text line 383) and the recommendations apply to all types of user meeting priorities?

 

  • Yes, the recommendations are applicable to all types of user meeting priorities.

 
