Article
Peer-Review Record

Object-Tracking Algorithm Combining Motion Direction and Time Series

Appl. Sci. 2023, 13(8), 4835; https://doi.org/10.3390/app13084835
by Jianjun Su 1, Chenmou Wu 2 and Shuqun Yang 1,*
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 1 March 2023 / Revised: 7 April 2023 / Accepted: 10 April 2023 / Published: 12 April 2023

Round 1

Reviewer 1 Report

This paper proposes an object-tracking algorithm that leverages motion direction and time-series information. To address the lack of consideration of target motion direction, the authors introduce a loss function that incorporates directional guidance and design a tracking-result scoring module, based on the attention mechanism, that integrates the time-series information of tracking results. The proposed algorithm's performance has been evaluated through various experiments, which showed improvements in tracking accuracy. The paper is recommended for publication once the following minor points are resolved.

1) The paper's presentation is poor and includes many typos. Please recheck the manuscript thoroughly and carefully.

2) In Figure 4, the performance of the proposed method is degraded compared to some conventional methods in specific regions of the graph. Please discuss this phenomenon in detail in the revised manuscript.

Author Response

Dear Reviewer,

Thank you for taking the time to review our manuscript entitled “Object tracking algorithm combining motion direction and time series”. We appreciate your thoughtful comments and suggestions, which have helped us to improve the quality of our work.

We have carefully considered your comments and have made the following revisions to address your concerns:

Point 1: The paper's presentation is poor and includes many typos. Please recheck the manuscript thoroughly and carefully.

Response 1: We would like to apologize for the poor language and typos in our manuscript. We have carefully checked and revised the entire paper to ensure its accuracy and readability.

Point 2: In Figure 4, the performance of the proposed method is degraded compared to some conventional methods in specific regions of the graph. Please discuss this phenomenon in detail in the revised manuscript.

Response 2: Thank you for bringing up the issue regarding the performance of our algorithm in Figure 4. We also observed the phenomenon you mentioned when the abscissa in the Success plots ranges from 0.8 to 1.0.

Regarding the evaluation method used in our paper, we followed the approach commonly used in the literature of this field to measure algorithm performance. In particular, for the Precision plots, we used the curve value at a threshold of 20 pixels (abscissa) to rank the algorithms. For the Success plots, we used the Area Under the Curve (AUC) to measure the overall performance of the algorithm, rather than focusing on a specific range. We have cited the relevant literature in this regard [1-3].
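For illustration, the two protocol quantities described above can be sketched in a few lines of Python. This is a minimal sketch of the standard evaluation convention, not the paper's actual evaluation code; the function names and the per-frame numbers are hypothetical.

```python
import numpy as np

def precision_at_threshold(center_errors, threshold=20.0):
    """Precision plot value: fraction of frames whose center
    location error is within `threshold` pixels."""
    center_errors = np.asarray(center_errors, dtype=float)
    return float(np.mean(center_errors <= threshold))

def success_auc(ious, thresholds=np.linspace(0.0, 1.0, 21)):
    """Success plot AUC: mean overlap-success rate over a sweep
    of IoU thresholds from 0 to 1."""
    ious = np.asarray(ious, dtype=float)
    success_rates = [float(np.mean(ious >= t)) for t in thresholds]
    return float(np.mean(success_rates))

# Hypothetical per-frame results for one short sequence
errors = [5.0, 12.0, 25.0, 8.0]       # center location errors in pixels
ious   = [0.91, 0.72, 0.33, 0.81]     # overlaps with ground-truth boxes

print(precision_at_threshold(errors))  # 0.75: three of four frames within 20 px
print(success_auc(ious))               # ~0.69
```

Under this convention, a tracker can lose to another tracker on the high-overlap end of the Success curve (abscissa 0.8 to 1.0) yet still rank higher on AUC, which averages over all thresholds.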

While we agree that the issue you raised is not the main focus of our paper, we appreciate your valuable feedback. We will carefully consider this situation in future research and work towards improving the performance of our algorithm.

[1] Fan, H.; Lin, L.; Yang, F.; Chu, P.; Deng, G.; Yu, S.; Bai, H.; Xu, Y.; Liao, C.; Ling, H. LaSOT: A high-quality benchmark for large-scale single object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019; pp. 5374-5383.

[2] Yan, B.; Peng, H.; Fu, J.; Wang, D.; Lu, H. Learning spatio-temporal transformer for visual tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021; pp. 10448-10457.

[3] Voigtlaender, P.; Luiten, J.; Torr, P.H.; Leibe, B. Siam R-CNN: Visual tracking by re-detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020; pp. 6578-6588.

We hope that our revisions satisfactorily address your concerns.

Sincerely,

Jianjun SU

Author Response File: Author Response.docx

Reviewer 2 Report

The manuscript presents a tracking algorithm. The authors propose a loss function that considers the direction of the object motion between consecutive frames. Also, a scoring module that takes into account the time series of tracking results is included.

The manuscript is clear and the methods are adequately described. However, as I see it, the experimental section must be improved to clearly show the performance of the proposal. 

First, table 1 is especially important. In this table, there are two main issues:

- It would be useful if the authors could study the same evaluation metrics for all the datasets, so that the reader can have complete information (currently, almost every dataset has its own evaluation metrics).

- Also, in all cases, the authors compare the performance of their proposal with a number of previous works. This is nice, but, in all cases, the most recent benchmarking works are from the year 2021. Since this is a research area that advances very quickly, it would be useful if the authors could include, additionally, some works from the year 2022. This way, the reader could have a clearer idea of the relative performance of the proposal with respect to the current state of the art.

I have missed some further information about the training process with each dataset.

There are some parameters that are expected to have an important impact upon the performance of the algorithm, such as epsilon, tau and lambdas. The authors just indicate the value given to these parameters. Must they be tuned for every dataset? Which is their impact upon the results? A sensitivity analysis could be useful to readers.

The ablation study is especially important to prove the validity of the specific contributions. However, this study is only done with the TrackingNet test set. It would be useful to have these results with the other datasets too.

No information is given in the paper about the necessary time to run the algorithms.

Finally, the authors should clearly describe the limitations of their proposal.

There are also some minor issues:

- The authors use 't' to represent two different variables (eq. 1 and lines from 2015). Please avoid using the same symbol for different concepts.

- The manuscript must be carefully proofread. There are several issues mainly with the concordance between subjects and verbs.

Author Response

Dear Reviewer,

Thank you for your constructive comments on our manuscript. We appreciate your effort in evaluating our work and providing valuable feedback to help improve our research.

We are glad to hear that you found our paper to be clear and the methods adequately described. We have taken note of your comments regarding the experimental section, and we agree that there is room for improvement in demonstrating the performance of our proposal.

Point 1: It would be useful if the authors could study the same evaluation metrics for all the datasets, so that the reader can have complete information (currently, almost every dataset has its own evaluation metrics).

Response 1: We agree that using the same evaluation metric for all datasets would provide comprehensive information. However, as you noted, each dataset often has its own evaluation metric, and the standard practice in our field is to evaluate performance independently on each dataset, as demonstrated in prior works [1-7]. To ensure comparability with other algorithms in the field, we have followed this evaluation approach. Nonetheless, we acknowledge that this may limit the ability to directly compare algorithm performance across datasets. In future work, we will try to evaluate algorithms using the same evaluation metric for all datasets.

Point 2: Also, in all cases, the authors compare the performance of their proposal with a number of previous works. This is nice, but, in all cases, the most recent benchmarking works are from the year 2021. Since this is a research area that advances very quickly, it would be useful if the authors could include, additionally, some works from the year 2022. This way, the reader could have a clearer idea of the relative performance of the proposal with respect to the current state of the art.

Response 2: We agree that including works from 2022 would provide a clearer idea of the relative performance of our proposal. However, we would like to clarify that in our work, we mainly focus on comparing our algorithm with the STARK algorithm, which is the baseline of our research. Our algorithm improves Precision and Success metrics over this baseline, which is a significant contribution in the field.

Nevertheless, we have noted recent, excellent object-tracking algorithms such as SwinTrack, MixFormer, and TATrack, which have achieved good performance. However, we have also observed that the parameter counts of these algorithms are much larger than that of our algorithm, as indicated in references [8-10]. Moreover, these algorithms are built on vision transformers, which makes them more efficient at learning global information and helps them achieve better performance.

In future research, we plan to apply our proposed method to state-of-the-art object-tracking algorithms and compare it with these algorithms.

Point 3: There are some parameters that are expected to have an important impact upon the performance of the algorithm, such as epsilon, tau and lambdas. The authors just indicate the value given to these parameters. Must they be tuned for every dataset? Which is their impact upon the results? A sensitivity analysis could be useful to readers.

Response 3: We recognize that the parameters you mentioned, such as epsilon, tau, and the lambdas, can significantly impact our algorithm's performance. In our study, we used values from related research to set these parameters. For instance, we adopted the epsilon value from SIoU [11], and the lambda values were set based on the weighting of the L1 loss and GIoU loss in DETR [12]. The tau value was determined based on our research experience in object tracking. These parameter values remain constant across all datasets. We have clarified this further in the revised manuscript.

We agree that a sensitivity analysis would be helpful to better understand the impact of these parameters on our algorithm's performance. We plan to conduct a sensitivity analysis in future research.

Point 4: The ablation study is especially important to prove the validity of the specific contributions. However, this study is only done with the TrackingNet test set. It would be useful to have these results with the other datasets too.

Response 4: We have made some additions to our ablation study based on your suggestion. Specifically, we have added the AUC evaluation metric for the TrackingNet dataset, and we have included evaluation data for the LaSOT dataset, covering both AUC and Precision metrics. The details of these additions have been further clarified in the revised manuscript.

Point 5: No information is given in the paper about the necessary time to run the algorithms.

Response 5: Based on your suggestion, we have added a description of the running speed, parameter count, and computational cost of our algorithm in the experimental section. Specifically, our algorithm has 29.8M parameters (Params) and 12.9G FLOPs, and it maintains a real-time running speed of 25 fps. Details have been added to the revised manuscript.

Point 6: The authors should clearly describe the limitations of their proposal.

Response 6: We have added a discussion of our method, which includes limitations of our method. Specifically, the TRSM module must rely on the object-tracking algorithm for training, which may prove inconvenient. Therefore, our future research work will focus on optimizing the TRSM module and conducting independent training to make it a completely independent module for easier application to various object-tracking algorithms. Details have been added to the revised manuscript.

Point 7: The authors use 't' to represent two different variables (eq. 1 and lines from 2015). Please avoid using the same symbol for different concepts.

Response 7: Based on your feedback, we have substituted a different symbol for one of the two variables to prevent confusion caused by using the same symbol for different concepts.

Point 8: The manuscript must be carefully proofread. There are several issues mainly with the concordance between subjects and verbs.

Response 8: We have carefully checked and revised the entire paper to ensure the accuracy and readability of our manuscript.

Thank you again for your time and effort in reviewing our manuscript.

 

[1] Yan, B.; Peng, H.; Fu, J.; Wang, D.; Lu, H. Learning spatio-temporal transformer for visual tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021; pp. 10448-10457.

[2] Guo, D.; Wang, J.; Cui, Y.; Wang, Z.; Chen, S. SiamCAR: Siamese fully convolutional classification and regression for visual tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020; pp. 6269-6277.

[3] Dai, K.; Zhang, Y.; Wang, D.; Li, J.; Lu, H.; Yang, X. High-performance long-term tracking with meta-updater. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020; pp. 6298-6307.

[4] Voigtlaender, P.; Luiten, J.; Torr, P.H.; Leibe, B. Siam R-CNN: Visual tracking by re-detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020; pp. 6578-6588.

[5] Bian, T.; Hua, Y.; Song, T.; Xue, Z.; Ma, R.; Robertson, N.; Guan, H. VTT: Long-term visual tracking with transformers. In Proceedings of the 25th International Conference on Pattern Recognition (ICPR), 2021; pp. 9585-9592.

[6] Wang, N.; Zhou, W.; Wang, J.; Li, H. Transformer meets tracker: Exploiting temporal context for robust visual tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021; pp. 1571-1580.

[7] Guo, D.; Shao, Y.; Cui, Y.; Wang, Z.; Zhang, L.; Shen, C. Graph attention tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021; pp. 9543-9552.

[8] He, K.; Zhang, C.; Xie, S.; Li, Z.; Wang, Z. Target-aware tracking with long-term context attention. arXiv preprint arXiv:2302.13840, 2023.

[9] Cui, Y.; Jiang, C.; Wang, L.; Wu, G. MixFormer: End-to-end tracking with iterative mixed attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022; pp. 13608-13618.

[10] Lin, L.; Fan, H.; Zhang, Z.; Xu, Y.; Ling, H. SwinTrack: A simple and strong baseline for transformer tracking. Advances in Neural Information Processing Systems 2022, 35, 16743-16754.

[11] Gevorgyan, Z. SIoU loss: More powerful learning for bounding box regression. arXiv preprint arXiv:2205.12740, 2022.

[12] Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision (ECCV), 2020; pp. 213-229.

Sincerely,

Jianjun SU

 

Author Response File: Author Response.docx

Reviewer 3 Report

This paper proposes an innovative object-tracking algorithm that combines motion direction and time-series information to improve object-localization accuracy. The algorithm addresses the lack of consideration of target motion in previous algorithms by adding a direction-information constraint. During the training process, the algorithm learns general patterns of target motion, which enable it to obtain the motion information of a specific target during tracking, leading to improved tracking accuracy.

This paper is limited to using the SIoU method and proposing a TRSM module rather than a new model. It is necessary to further compare, analyze, or explain the advantages of the proposed method and the TRSM module.

It would be nice to add a comparison with the latest models such as SwinTrack, MixFormer, TATrack, MixViT, etc., which have been revealed to have better performance. If it is difficult to compare, explain why in detail.

While existing models are designed and improved for short-term tracking, it would be nice to have an additional explanation for the part that the model proposed in the paper can be extended to long-term object tracking closer to real-world scenarios.

The "6. Patents" section has no content. It needs to be removed.

Author Response

Dear Reviewer,

Thank you for taking the time to review our manuscript entitled “Object tracking algorithm combining motion direction and time series”. We appreciate your thoughtful comments and suggestions, which have helped us to improve the quality of our work.

We have carefully considered your comments and have made the following revisions to address your concerns:

Point 1: It is necessary to further compare or analyze or explain the advantages of the proposed method and the TRSM module.

Response 1: We have added a discussion of our method, which includes the advantages of the TRSM module. Specifically, our method can significantly improve performance with only a small number of additional parameters. Additionally, our method optimizes the object-tracking algorithm separately in the training and tracking phases and is not dependent on any specific algorithm, making it applicable to various deep learning-based object-tracking algorithms to improve their tracking accuracy. Details have been added to the revised manuscript.

Point 2: It would be nice to add a comparison with the latest models such as SwinTrack, MixFormer, TATrack, MixViT, etc., which have been revealed to have better performance. If it is difficult to compare, explain why in detail.

Response 2: We agree that including works from 2022 would provide a clearer idea of the relative performance of our proposal. However, we would like to clarify that in our work, we mainly focus on comparing our algorithm with the STARK algorithm, which is the baseline of our research. Our algorithm improves Precision and Success metrics over this baseline, which is a significant contribution in the field.

Nevertheless, we have noted recent, excellent object-tracking algorithms such as SwinTrack, MixFormer, and TATrack, which have achieved good performance. However, we have also observed that the parameter counts of these algorithms are much larger than that of our algorithm, as indicated in references [1-3]. Moreover, these algorithms are built on vision transformers, which makes them more efficient at learning global information and helps them achieve better performance.

In future research, we plan to apply our proposed method to state-of-the-art object-tracking algorithms and compare it with these algorithms.

Point 3: While existing models are designed and improved for short-term tracking, it would be nice to have an additional explanation for the part that the model proposed in the paper can be extended to long-term object tracking closer to real-world scenarios.

Response 3: We have added a discussion of our proposed method, including applications to long-term object tracking. Specifically, the TRSM module is applicable to long-term object-tracking algorithms, which determine target loss based on the reliability of tracking results before conducting target re-detection. We have included relevant content in the revised manuscript.

Point 4: The "6. Patents" section has no content. It needs to be removed.

Response 4: This section has been removed in the revised manuscript.

Thank you once again for your valuable feedback.

 

[1] He, K.; Zhang, C.; Xie, S.; Li, Z.; Wang, Z. Target-aware tracking with long-term context attention. arXiv preprint arXiv:2302.13840, 2023.

[2] Cui, Y.; Jiang, C.; Wang, L.; Wu, G. MixFormer: End-to-end tracking with iterative mixed attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022; pp. 13608-13618.

[3] Lin, L.; Fan, H.; Zhang, Z.; Xu, Y.; Ling, H. SwinTrack: A simple and strong baseline for transformer tracking. Advances in Neural Information Processing Systems 2022, 35, 16743-16754.

Sincerely,

Jianjun SU

Author Response File: Author Response.docx

Reviewer 4 Report

Dear authors, I have read your paper entitled "Object tracking algorithm combining motion direction and time series", and I found it to be a well-written work. I detected some minor issues in the English writing; in the attached document I have flagged some of them. On the technical side, I consider that this paper touches on a hot topic of interest for researchers. The comparison with the state of the art helps highlight your research. However, in your paper, you did not indicate the references for equations (1) to (6), so these equations must be referenced. Furthermore, in the conclusions section, you must indicate the limitations of the algorithm used.

Comments for author File: Comments.pdf

Author Response

Dear Reviewer,

Thank you for taking the time to review our manuscript entitled “Object tracking algorithm combining motion direction and time series”. We appreciate your thoughtful comments and suggestions, which have helped us to improve the quality of our work.

We have carefully considered your comments and have made the following revisions to address your concerns:

Point 1: I have detected some minor issues in the English writing; in the attached document I have flagged some of them.

Response 1: Thank you for your detailed feedback on our manuscript; your comments are invaluable in improving the quality of our paper. We have made the modifications you indicated in the attached document. In addition, we conducted a thorough review of the entire manuscript to ensure its accuracy and readability.

Point 2: In your paper, you did not indicate the references for the use of equations (1) to (6), so, these equations must be referenced.

Response 2: Equations (1) to (5) were sourced from the SIoU paper, and we have included the appropriate citation in the revised manuscript. Equation (6) is original to our paper, so no additional reference has been added.

Point 3: Even more, in the conclusions section, you must indicate the limitations of the algorithm used.

Response 3: We have added a discussion of our method, which includes limitations of our method. Specifically, the TRSM module must rely on the object-tracking algorithm for training, which may prove inconvenient. Therefore, our future research work will focus on optimizing the TRSM module and conducting independent training to make it a completely independent module for easier application to various object-tracking algorithms. Details have been added to the revised manuscript.

Thank you again for your valuable feedback.

Sincerely,

Jianjun SU

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

The paper has somewhat improved with the revision. However, the authors have not addressed some of the concerns that I raised in my previous report, and I consider that they can be of interest to readers:

- In all cases, the authors compare the performance of their proposal with a number of previous works. This is nice, but, in all cases, the most recent benchmarking works are from the year 2021. Since this is a research area that advances very quickly, it would be useful if the authors could include, additionally, some works from the year 2022. This way, the reader could have a clearer idea of the relative performance of the proposal with respect to the current state of the art. I understand that many recent algorithms may follow a different philosophy, but it is important that the reader can have a clear idea on how the present work builds upon the current state of the art.

- Including a sensitivity analysis with respect to the most relevant parameters can also be useful to readers.

 

 

Author Response

Point 1: In all cases, the authors compare the performance of their proposal with a number of previous works. This is nice, but, in all cases, the most recent benchmarking works are from the year 2021. Since this is a research area that advances very quickly, it would be useful if the authors could include, additionally, some works from the year 2022. This way, the reader could have a clearer idea of the relative performance of the proposal with respect to the current state of the art. I understand that many recent algorithms may follow a different philosophy, but it is important that the reader can have a clear idea on how the present work builds upon the current state of the art.

Response 1: We thank the reviewer for this insightful suggestion. We fully agree that a comparison with the latest methods is valuable, and we have added two works published in 2022 to our comparison in the revised manuscript:

“Visual Tracking with FPN Based on Transformer and Response Map Enhancement”, published in Applied Sciences.

“SiamRDT: An Object Tracking Algorithm Based on a Reliable Dynamic Template”, published in Symmetry.

 

Point 2: Including a sensitivity analysis with respect to the most relevant parameters can also be useful to readers.

Response 2: We appreciate the reviewer's valuable feedback on improving this paper. We acknowledge the significance of studying the sensitivity of the relevant parameters, which could provide valuable information to readers. However, as this requires a more thorough investigation, we plan to pursue it as a separate research objective; it was therefore not included in the current work. We will conduct a comprehensive sensitivity analysis in our future research.

 

Author Response File: Author Response.docx

Reviewer 3 Report

Although the authors did not include a comparison experiment with the state-of-the-art (SOTA) model, it would still be meaningful to compare their proposed method with the STARK algorithm in order to identify areas where the object tracking algorithm can be improved. 

If possible, the authors should explain and emphasize in the paper that their proposed method and results are meaningful enough without comparing SOTA models.

Author Response

Point 1: Although the authors did not include a comparison experiment with the state-of-the-art (SOTA) model, it would still be meaningful to compare their proposed method with the STARK algorithm in order to identify areas where the object tracking algorithm can be improved.

If possible, the authors should explain and emphasize in the paper that their proposed method and results are meaningful enough without comparing SOTA models.

Response 1: Thank you for the reviewer's valuable suggestion. We have compared our algorithm with the STARK algorithm (STARK-ST50). Regarding the explanation for not comparing with state-of-the-art algorithms, we have provided further clarification in the 'Discussion' section.

We appreciate the helpful feedback from the reviewer.

Author Response File: Author Response.docx
