Article
Peer-Review Record

Quasar Identification Using Multivariate Probability Density Estimated from Nonparametric Conditional Probabilities

Mathematics 2023, 11(1), 155; https://doi.org/10.3390/math11010155
by Jenny Farmer 1,†, Eve Allen 2 and Donald J. Jacobs 2,*,†
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 30 October 2022 / Revised: 12 December 2022 / Accepted: 26 December 2022 / Published: 28 December 2022
(This article belongs to the Special Issue Probability Distributions and Their Applications)

Round 1

Reviewer 1 Report

I suggest that the authors consider more examples of copulas than just the Gaussian one for Equation (11); please consider the Frank, t-, and Gumbel copulas, as well as other Archimedean copulas. The Gaussian copula is the worst one in practice and cannot capture heavy-tail dependence.

Author Response

We thank the reviewer for suggesting additional ways to extend our benchmarks. Beyond a general editing of the manuscript, we have paired each of our original Gaussian copula tests with a t-copula (using nu=1) for modeling linear correlations, with a Cauchy distribution as the marginal. These new results are reported separately in Table 2 and are averaged together for the figures showing trends in compute time and MSE. The t-copula produces slightly different results for MSE, but the general trends with sample size, number of variables, and correlation remain the same. The t-copula was also used for the new comparative section, which includes the KDE comparison suggested by Reviewer 2.
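For readers who want to reproduce this kind of benchmark data, the construction described above can be sketched as follows. This is a minimal illustration in Python with NumPy and SciPy, not the MATLAB code used for the manuscript; the function name `sample_t_copula_cauchy`, the correlation matrix, and the seed are assumptions made for the example.

```python
import numpy as np
from scipy import stats


def sample_t_copula_cauchy(n, corr, nu=1.0, seed=0):
    """Draw n samples from a t-copula (nu degrees of freedom) with Cauchy marginals.

    Hypothetical helper for illustration; `corr` is an assumed correlation matrix.
    Returns (u, x): u has uniform marginals (the copula sample), x has
    standard Cauchy marginals.
    """
    rng = np.random.default_rng(seed)
    corr = np.asarray(corr, dtype=float)
    d = corr.shape[0]

    # Correlated standard normals via the Cholesky factor of corr.
    chol = np.linalg.cholesky(corr)
    z = rng.standard_normal((n, d)) @ chol.T

    # Dividing by sqrt(chi-square / nu) turns them into multivariate t samples.
    w = rng.chisquare(nu, size=(n, 1)) / nu
    t = z / np.sqrt(w)

    # The univariate t CDF maps each coordinate to (0, 1): the copula sample.
    u = stats.t.cdf(t, df=nu)

    # The inverse Cauchy CDF imposes the chosen marginal distribution.
    x = stats.cauchy.ppf(u)
    return u, x


if __name__ == "__main__":
    u, x = sample_t_copula_cauchy(1000, [[1.0, 0.5], [0.5, 1.0]])
    print(u.shape, x.shape)
```

With nu=1 the t marginals are themselves Cauchy, so the last two steps cancel up to rounding and `x` is effectively a multivariate Cauchy sample; the steps are kept separate so that a different marginal could be substituted.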

For bivariate distributions, additional Archimedean copulas (Clayton, Frank, and Gumbel) were also included and are reported separately. We restrict these to two variables because they are not as readily extended to multivariate distributions with more than two variables. We chose MATLAB for creating copulas and generating samples of correlated random variables because of its robust features for modeling multivariate distributions. Although methods outside of MATLAB for extending the Archimedean copulas to higher dimensions could be employed in the future, we prefer to stay within the limitations of the MATLAB built-in functions. As such, we split our analysis to include more copulas in 2D and limited ourselves to Gaussian and t-distribution copulas for D=2 through 6. For D=7, we had problems with MATLAB when using the t-distribution copula, so we decided to stop at D=6.
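As an illustration of the bivariate Archimedean case, the sketch below samples a Clayton copula by conditional inversion. It is written in Python with NumPy rather than with the MATLAB built-in functions used for the manuscript, and the function name `sample_clayton`, the value of theta, and the seed are assumptions made for the example.

```python
import numpy as np


def sample_clayton(n, theta, seed=0):
    """Draw n samples from a bivariate Clayton copula with parameter theta > 0.

    Hypothetical helper for illustration. Uses conditional inversion:
    draw u1 uniformly, then solve C(u2 | u1) = v for u2 in closed form.
    """
    if theta <= 0:
        raise ValueError("theta must be positive for this sketch")
    rng = np.random.default_rng(seed)

    # 1 - random() lies in (0, 1], which avoids raising zero to a negative power.
    u1 = 1.0 - rng.random(n)
    v = 1.0 - rng.random(n)

    # Closed-form inverse of the conditional Clayton distribution.
    u2 = (u1 ** (-theta) * (v ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return np.column_stack([u1, u2])


if __name__ == "__main__":
    u = sample_clayton(1000, theta=2.0)
    # For the Clayton copula, Kendall's tau equals theta / (theta + 2).
    print(u.shape, "theoretical tau:", 2.0 / (2.0 + 2.0))
```

The closed-form conditional inverse is specific to the bivariate Clayton family; the Frank and Gumbel copulas need different sampling steps.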

Reviewer 2 Report


Comments for author File: Comments.pdf

Author Response

We thank the reviewer for suggesting a comparison with KDE. Beyond general editing of the manuscript, we incorporated some of your suggestions. We originally planned to compare with KDE but then decided that comparing our results with exact synthetic data was sufficient. Nevertheless, it is convenient to have a direct comparison for readers familiar with KDE. Therefore, following your suggestion, we have added a section to the results devoted to a comparison between our method and KDE for each of the tests in our original data set, as well as for the expanded set of distributions with additional copula cases suggested by Reviewer 1. Trends in MSE and compute time demonstrate that our method outperforms KDE for any given number of variables once the sample size is sufficiently large. For two variables, our method outperforms KDE even for small sample sizes.
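The kind of KDE baseline described above can be sketched as follows: fit a KDE to a synthetic sample and measure its MSE against the exact density on a grid. This is a minimal illustration, not the comparison code from the manuscript. It assumes SciPy's `gaussian_kde` with its default bandwidth as a stand-in for the KDE implementation, a bivariate Gaussian as the test distribution, and a grid-averaged squared error as the MSE; the function name `kde_mse` is hypothetical.

```python
import numpy as np
from scipy import stats


def kde_mse(n, rho=0.5, grid_pts=25, seed=0):
    """MSE of a Gaussian KDE against the exact density of a bivariate Gaussian.

    Hypothetical helper for illustration; the test distribution, the grid,
    and the default KDE bandwidth are all assumptions of this sketch.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    truth = stats.multivariate_normal(mean=[0.0, 0.0], cov=cov)

    # Synthetic sample with a known exact density, shape (n, 2).
    sample = truth.rvs(size=n, random_state=rng)

    # Evaluation grid covering most of the probability mass.
    axis = np.linspace(-3.0, 3.0, grid_pts)
    gx, gy = np.meshgrid(axis, axis)
    pts = np.column_stack([gx.ravel(), gy.ravel()])

    # gaussian_kde expects data with shape (dimensions, points).
    kde = stats.gaussian_kde(sample.T)
    estimate = kde(pts.T)
    exact = truth.pdf(pts)
    return float(np.mean((estimate - exact) ** 2))


if __name__ == "__main__":
    for n in (50, 500, 5000):
        print(n, kde_mse(n))
```

Running the loop for several sample sizes gives the KDE side of an MSE-versus-sample-size trend; the other side would come from applying the same error measure to the estimator under study.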

We do not attempt to go higher than D=6 for two reasons. First, it makes no sense to use 100 samples for 6 or more variables. Although we under-sample to establish uniformity across the comparison, we state in the manuscript what a reasonable minimum sample size should be for any dimension D. Second, the main purpose of our work is to show that our method works well for several variables. We have now added to our conclusions some comments on the comparisons with KDE presented in the revision.

Tracking the interesting crossover for KDE is outside the scope of our work. The two reasons that prevented us from going to D=7 (or higher) in this revision were time (we were rushed to submit our updated revision) and the fact that MATLAB gave us problems generating t-distribution copulas with D=7. We verified that our method nominally handles at least D=10 before it runs into memory problems, depending on hardware. However, because most of the points we store are essentially zero, appropriate data structures must be implemented before comparisons are made for D>6. We added to our conclusions that our method, as currently written, can operate up to D=10.
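The memory argument above can be made concrete with a small sketch. The code below is an illustration only, not the data structure planned for the method: the grid resolution, the 8-byte value size, the zero threshold, and the helper names `dense_grid_bytes` and `to_sparse` are all assumptions.

```python
import numpy as np


def dense_grid_bytes(points_per_axis, dims, bytes_per_value=8):
    """Memory needed to store a density on a full grid with dims axes."""
    return bytes_per_value * points_per_axis ** dims


def to_sparse(density, threshold=1e-12):
    """Keep only grid entries above threshold, keyed by their index tuple.

    Hypothetical helper: a plain dict stands in for a sparse data structure.
    """
    kept = np.argwhere(density > threshold)
    return {tuple(int(k) for k in idx): float(density[tuple(idx)]) for idx in kept}


if __name__ == "__main__":
    # Dense storage grows exponentially with the number of variables.
    for dims in (2, 6, 10):
        print(dims, dense_grid_bytes(10, dims), "bytes at 10 points per axis")

    # A mostly-zero grid shrinks to a handful of stored entries.
    grid = np.zeros((10, 10, 10))
    grid[4:6, 4:6, 4:6] = 0.125
    print(len(to_sparse(grid)), "of", grid.size, "entries kept")
```

The dense cost scales as the number of points per axis raised to the power D, whereas the sparse form scales with the number of entries that are not essentially zero.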

It should be noted that the interpolation methods in MATLAB will not keep up. Again, we must implement sparse methods for interpolation. All of this is possible, and we plan to extend the method in future work. Optimizing this code as a software product of our research efforts is not within the scope of this work. Many applications can benefit from the method up to D=6, such as the one we demonstrate for quasar detection. More algorithm development is required, which we defer to a follow-up paper to give us the time needed to extend the method properly. This is why we originally said "several variables". We now say "multiple variables", which is also true; "several" was too specific a word to use.

Round 2

Reviewer 1 Report

Some more examples with mixtures of Gaussian distributions and a product of Cauchy distributions using a variety of copulas are now included. The request in the last report has now been addressed. 

Reviewer 2 Report

I have no further comments. The authors answered all the questions I have.
