Article
Peer-Review Record

ssMousetrack—Analysing Computerized Tracking Data via Bayesian State-Space Models in R

Math. Comput. Appl. 2020, 25(3), 41; https://doi.org/10.3390/mca25030041
by Antonio Calcagnì *, Massimiliano Pastore and Gianmarco Altoé
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 23 May 2020 / Revised: 8 July 2020 / Accepted: 8 July 2020 / Published: 9 July 2020

Round 1

Reviewer 1 Report

Overall, the paper is well written and introduces interesting software for analysing mouse-tracking data, although it lacks some theoretical detail. I am not familiar with mouse-tracking studies, so I cannot comment on the importance of the topic, partly because it is unclear to me whether the model defined in Section 2 is novel or already commonly used in mouse-tracking applications; i.e., is the package just an implementation of already published theory, or is the whole modelling framework new? This should be made clearer, and if the package relies on theory derived elsewhere, more references to the background literature should be added.

The actual software seems to be well written and documented, and the package looks to be relatively easy to use. So my main concern is that more detail on the theoretical aspects of the package should be added, and perhaps some reflection on alternative methods.

Some more specific comments:

In the introduction, the benefits of ssMousetrack over general state-space modelling packages such as KFAS, bssm, LibBi, pomp, etc. could be made clearer, i.e. by focusing on a specific type of model, usage is easier than defining general models. On the other hand, it is not clear why using rstan for MCMC is an advantage here (over other MCMC methods), as in general Stan can struggle with state-space models unless carefully tuned (see, for example, https://cran.r-project.org/web/packages/walker/vignettes/walker.html). Of course, using rstan as a backend allows one to easily extend the post-processing to shinystan etc., which is related to point (i). In addition, by noting early that the model used by the package assumes a von Mises distribution, the authors can distinguish it from several other packages, such as bssm, which do not support the von Mises distribution.

The start of Section 2.1 is unclear. What is meant by marginal MCMC? Marginal of what? Reason (i) is also unclear: the authors state "MCMC algorithms,..., provide a more efficient and complete solution...". More efficient than what? Maximum likelihood?

Overall, Section 2.1 needs a more detailed description of the estimation procedure. The model is far from linear-Gaussian, which is the main assumption behind the Kalman filter, so it is very unclear how well f(X|Y) and f(gamma|Y) actually match the true densities (given known hyperparameters). How good are these approximations, and are approximate results adequate for the models and problems considered by the package users?

Related to the above, this means the results produced by ssMousetrack are only approximate and potentially biased, whereas particle-filter-based pseudo-marginal MCMC methods could produce asymptotically exact inference (for example, using the pomp or libBi packages), as would HMC-based inference with rstan if one actually wrote the model in Stan using equations 1-5 and sampled the full posterior instead of relying on the approximate marginal likelihood. Some reasoning should be given for why the package relies on approximate filtering instead of these exact methods. Would the sampling be too slow or too hard to tune properly? Have the authors tried this?
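
To make the particle-filter alternative concrete, the following is a minimal sketch of a bootstrap particle filter that estimates the marginal likelihood of a toy state-space model with von Mises observations. Everything in it is an assumption made for illustration: the latent Gaussian random walk, the N(0, 1) initial state, the parameter values, and all function names are hypothetical and are not the model of equations 1-5 or any part of ssMousetrack, pomp, or libBi. The likelihood estimate (before taking the logarithm) is unbiased, which is the property a pseudo-marginal MCMC scheme would rely on.

```python
import math
import random

def bessel_i0(x, terms=50):
    """Modified Bessel function I0, computed from its power series."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total

def von_mises_pdf(y, mu, kappa):
    """Density of a von Mises(mu, kappa) distribution at angle y (radians)."""
    return math.exp(kappa * math.cos(y - mu)) / (2.0 * math.pi * bessel_i0(kappa))

def bootstrap_log_likelihood(ys, sigma, kappa, n_particles, rng):
    """Particle estimate of the marginal likelihood, returned on the log scale."""
    # Assumed initial state distribution: x_0 ~ N(0, 1).
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    log_lik = 0.0
    for y in ys:
        # Propagate each particle through the assumed random-walk dynamics.
        particles = [x + rng.gauss(0.0, sigma) for x in particles]
        # Weight each particle by the von Mises observation density.
        weights = [von_mises_pdf(y, x, kappa) for x in particles]
        log_lik += math.log(sum(weights) / n_particles)
        # Multinomial resampling proportional to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return log_lik

# Simulate a short toy series from the same assumed model.
rng = random.Random(42)
sigma, kappa = 0.2, 4.0
x, ys = 0.0, []
for _ in range(30):
    x += rng.gauss(0.0, sigma)
    ys.append(rng.vonmisesvariate(x % (2.0 * math.pi), kappa))

log_lik = bootstrap_log_likelihood(ys, sigma, kappa, n_particles=500, rng=rng)
```

In a pseudo-marginal sampler, an estimate of this kind would replace the approximate marginal likelihood inside the Metropolis-Hastings acceptance ratio; the cost is one full particle-filter pass per proposed parameter value.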

While Appendix A shows the formulas used in the approximation of f(X|Y), no references are given. How are these derived, how accurate are the approximations, when do they work or when do they fail?

I do not fully understand how the model assessment criteria of Section 2.2 are supposed to work. The idea of simulating new data according to the posterior predictive distribution is common in Bayesian inference, and various (often graphical) ways to check the discrepancy between these predictive data and the observed data exist (see, e.g., https://rss.onlinelibrary.wiley.com/doi/full/10.1111/rssa.12378). This does seem similar to the Bayesian R2 of Gelman et al., but I can't quite grasp the reasoning behind these formulas.
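
For reference, the generic posterior predictive check alluded to here can be sketched as follows. This is only an illustration under stated assumptions: the observed angles, the stand-in "posterior draws" of (mu, kappa), and the choice of the mean resultant length as discrepancy statistic are all hypothetical, and the code does not reproduce the criteria of Section 2.2.

```python
import math
import random

def mean_resultant_length(angles):
    """Concentration summary for circular data (near 0: spread out, 1: identical)."""
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    return math.hypot(c, s)

def posterior_predictive_pvalue(observed, posterior_draws, rng):
    """Share of replicated data sets whose statistic is at least the observed one."""
    t_obs = mean_resultant_length(observed)
    exceed = 0
    for mu, kappa in posterior_draws:
        # One replicated data set per posterior draw, same size as the data.
        replicated = [rng.vonmisesvariate(mu, kappa) for _ in observed]
        if mean_resultant_length(replicated) >= t_obs:
            exceed += 1
    return exceed / len(posterior_draws)

rng = random.Random(1)
# Hypothetical observed angles (radians).
observed = [rng.vonmisesvariate(1.0, 4.0) for _ in range(50)]
# Stand-in posterior draws; in practice these would come from the MCMC output.
posterior_draws = [
    (rng.gauss(1.0, 0.1) % (2.0 * math.pi), abs(rng.gauss(4.0, 0.5)))
    for _ in range(200)
]
p_value = posterior_predictive_pvalue(observed, posterior_draws, rng)
```

A value of `p_value` close to 0 or 1 would indicate that the replicated data systematically differ from the observed data on the chosen statistic; graphical comparisons of the replicated and observed distributions serve the same purpose.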

Author Response

See the attached rev1.pdf file

Author Response File: Author Response.pdf

Reviewer 2 Report

The current manuscript presents a new modeling framework and associated software package for the analysis of movement tracking data as employed routinely in cognitive science. The approach is interesting and could make a valuable contribution to a quickly growing area of research. Before I can recommend publication, however, several remarks outlined below would need to be addressed in a potential revision.

My main criticism of the current manuscript is that it does a poor job explaining the added benefits and limitations associated with using the new framework. The authors should clearly outline which kinds of conclusions can be derived from using such a framework that currently are not possible. Crucially, such a discussion also needs to reference potential drawbacks. It is important to realize that more elaborate formal models in service of summarizing the data, such as the current one relative to what is referred to as descriptive analyses (which in a sense are models, too), do not necessarily lead to more or better insights. In fact, using more elaborate models can even be harmful. Everything hinges on whether the model is “true”, or in other words, whether the assumptions underlying the model hold. And if they don’t hold, how severe are the biases resulting from misspecification? This needs to be discussed. Do the statistical distributions used match what is known about the empirical distributions of noise in the data? Are its simplifications warranted? Does it make sense to model the trajectories of individual experimental cells in a unimodal fashion? Can it explain the discrete revisions in the trajectory? These questions are particularly relevant given the descriptive observations that we have made in recent years, namely that, depending on the experimental setup, trajectories can occur in widely different forms and that their distribution across the x,y,t space is not unimodal. For instance, our clustering approach referenced in the manuscript has uncovered very different kinds of trajectories (please reference this method, as per package instruction, using Wulff, Haslbeck, Kieslich, Henninger, Schulte-Mecklenbeck, 2019). But these trajectories don’t perfectly covary with experimental condition, implying that every condition is composed of multiple kinds of trajectories.

Minor points

The preprocess function of the package does not seem to handle cases where the design variables have larger column indices than the x, y data.

Author Response

See the attached rev2.pdf file

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors have improved the manuscript well in response to my comments, so I have only a few minor comments:

I would also mention libBi and pomp in the introduction, where bssm and KFAS are mentioned as examples of state-space modelling packages in R (actually, the R package that provides an interface to the more general program libBi is called rbi, analogous to Stan vs rstan).

On p. 2, "which is instead missed by bssm" sounds slightly peculiar to my non-native English-speaking ear, as "missed" suggests that bssm should include it; perhaps switch to "not supported by" or "not available in" bssm (and KFAS as well? libBi and pomp might support the von Mises distribution, though).

The reference to libBi lists an author called "others": "Murray, L.M.; others."

In the conclusions, the sentence "Additionally, the current version of the library can be extended to include other interfaces for MCMC available on R, such as pomp [38] or libBi [39]" is a bit unclear, as pomp and libBi are modelling packages like KFAS and bssm rather than "interfaces to MCMC" such as rstan. I guess the point here is that ssMousetrack could be extended so that the model defined by ssMousetrack is somehow fed to pomp or libBi for parameter estimation?

Author Response

See the attached file.

Author Response File: Author Response.pdf
