Article
Peer-Review Record

Bryan’s Maximum Entropy Method—Diagnosis of a Flawed Argument and Its Remedy

by Alexander Rothkopf
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 25 July 2020 / Revised: 14 September 2020 / Accepted: 15 September 2020 / Published: 17 September 2020

Round 1

Reviewer 1 Report

I found the article to be well-written and interesting. I will confess that I am a Bayesian statistician, but I do not work with inverse problems, so the paper exceeds my competency. I will attempt to paraphrase the paper.

Inverse inference attempts to reconstruct an image that has passed through a filter and been corrupted by measurement error, to cite perhaps the simplest and most important example. Fortunately, the filter is known to the researcher, who observes the distorted image at discrete points. The true image is described by a set of unknown parameters, rho. The goal of inverse inference is to estimate rho from the filtered, noisy image.

Unfortunately, the number of unknown parameters in rho is often very large, and simple estimation algorithms, such as ordinary least squares or cross-entropy, fail or lead to high-variance reconstructions. Bayesian inference, or so-called "regularization" methods, provides a way to reduce the variation in the estimated reconstruction by smoothing across the image. For instance, a well-constructed Bayesian prior on rho shrinks spurious, high-noise estimates towards the prior mean m0 by pooling data from neighboring cells. These methods introduce some bias into the estimation to improve the overall mean squared error.

The paper considers three prior distributions, which also correspond to entropy measures: TK, MEM, and BR. When the error terms are Gaussian, the likelihood function is quadratic in the observed image and in the reconstructed image based on the parameters rho_guess. The TK prior is also quadratic in rho_guess and m0. In this happy situation, it is not hard to obtain the rho_TK that maximizes the posterior distribution, using Taylor series approximations or Newton-Raphson methods.
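The quadratic TK case can be sketched in a few lines: with a Gaussian likelihood and a quadratic prior, the posterior maximum has a closed form and requires only one linear solve. The kernel, mock spectrum, and hyperparameter values below are illustrative assumptions for this sketch, not the setup used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a positive mock spectrum rho_true, a known linear filter K,
# and Gaussian noise on the N_tau observed data points.
N_omega, N_tau = 50, 20
omega = np.linspace(0.0, 5.0, N_omega)
tau = np.linspace(0.1, 2.0, N_tau)
K = np.exp(-np.outer(tau, omega))            # hypothetical Laplace-type kernel
rho_true = np.exp(-(omega - 2.0) ** 2)       # positive, smooth mock spectrum
sigma = 1e-3
D = K @ rho_true + sigma * rng.normal(size=N_tau)

# TK (Tikhonov) MAP estimate: likelihood and prior are both quadratic,
# so maximizing the posterior reduces to a single linear solve, no iteration.
alpha = 1.0                                  # illustrative hyperparameter
m0 = np.zeros(N_omega)                       # prior mean (default model)
A = K.T @ K / sigma**2 + alpha * np.eye(N_omega)
b = K.T @ D / sigma**2 + alpha * m0
rho_TK = np.linalg.solve(A, b)

# Nothing in this solve constrains rho_TK to be positive -- which is exactly
# the motivation for the MEM and BR priors discussed next.
print(rho_TK.shape)
```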

So far so good. A fly in the ointment is that rho may be restricted to be positive, while the TK prior allows all real numbers. The MEM and BR priors are designed for positive parameters. Apparently, Bryan (1990) recommended an approximation algorithm for MEM priors similar to the one used to find rho_TK. Alexander argues that such an approach is unable to find rho_MEM because of the mismatch between the parameter space for rho and the estimation method. (I hope I paraphrased it correctly.) It is something along the lines of: the linear approximation of an exponential function can differ from the exponential of the linear approximation at the extrema.

The paper then illustrates the defect with a simple example in which the functions used in Bryan's expansion do not span the true value of rho. The paper goes on to discuss the implications of the finding and recommends modern optimization methods rather than Bryan's method.
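The span defect can be reproduced in a toy setting: construct a perfectly valid positive spectrum whose log-deviation from the default model points entirely along a direction orthogonal to the SVD singular subspace that Bryan's algorithm searches. The kernel and dimensions below are hypothetical; only the subspace argument matters.

```python
import numpy as np

# Toy kernel: N_tau data points, N_omega >> N_tau spectral bins, so the
# singular subspace searched by Bryan's algorithm is only N_tau-dimensional.
N_omega, N_tau = 50, 20
omega = np.linspace(0.0, 5.0, N_omega)
tau = np.linspace(0.1, 2.0, N_tau)
K = np.exp(-np.outer(tau, omega))           # hypothetical Laplace-type kernel

# Full SVD: the first N_tau rows of Vt span Bryan's search space; the
# remaining rows span its orthogonal complement.
_, _, Vt = np.linalg.svd(K, full_matrices=True)
V_s = Vt[:N_tau].T                          # basis of the singular subspace

# A positive spectrum whose log-deviation from the default model m lies
# entirely OUTSIDE the singular subspace.
m = np.ones(N_omega)                        # default model
v_out = Vt[-1]                              # direction orthogonal to V_s
rho = m * np.exp(v_out)                     # positive by construction
x = np.log(rho / m)                         # the vector Bryan parametrizes

# Projecting onto the singular subspace misses x completely.
x_proj = V_s @ (V_s.T @ x)
residual = np.linalg.norm(x - x_proj) / np.linalg.norm(x)
print(f"fraction of log(rho/m) outside Bryan's subspace: {residual:.3f}")
```

Because the rows of Vt are orthonormal, the relative residual here is exactly 1: none of this candidate spectrum is reachable from within the restricted search space.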

The paper is a useful contribution to the literature because it provides a cautionary tale about Bryan's method. Unfortunately, I do not have much "wisdom" to "share" with Alexander to make the paper better. I hope the editors and Alexander do not fault me for this.

I would very much appreciate the inclusion of the following reference: Whittaker (1923), "On a new method of graduation," Proceedings of the Edinburgh Mathematical Society, 41, 63-75. As far as I know, he was the first to propose regularization through a prior distribution. The paper is well known in actuarial science but has been largely ignored in statistics and data science. It does not address the inverse problem, but its presentation and discussion of "regularization" is brilliant.

Author Response

I would like to thank the referee for carefully reading the manuscript and offering her/his perspective. I am delighted to see that the short summary of the study content provided in the report agrees with what the author intended to convey and that an overall positive evaluation has been made.

The suggested reference has been added to the manuscript, and I must say that it was a very interesting read, introducing in an intuitive way many of the relevant concepts that are also central to the inverse problem. The paper was unknown to me (and is sadly not widely cited in the physics community), but I will include it in future works as a historical reference.

An updated version of the manuscript including the recommendations and requests by the other referees is attached.

Reviewer 2 Report

This paper studies the diagnosis of a flawed argument in the maximum entropy method (MEM) and its remedy. The author should address the following issues before the paper can be published in the Journal.

  1. The author's newly proposed method and the existing methods should be explained separately.
  2. The author should add a conclusions section describing the research results, contributions, and future work.

Author Response

I would like to thank the referee for preparing a concise report.

If I understand the first item raised in the report correctly, it asks me to distinguish more clearly between the methods developed by the research community and those developed by the author. To this end, I have added an opening paragraph to section 3.

The summary section 4 has been expanded into a "summary and conclusion" section, which, besides referring to the outcomes of the present paper, also includes a paragraph on future directions for the development of improved Bayesian methods for inverse problems.

An updated version of the manuscript including the recommendations and requests by the other referees is attached.

Reviewer 3 Report

This article contains an interesting discussion of a flaw in the Maximum Entropy Method (MEM) and a suggestion for how to overcome the problem. The author has given an example showing that MEM together with the SVD singular-subspace assumption is an ad hoc method. Some remedies are suggested at the end of the article. Although the article lacks a more detailed analysis, it arguably makes a contribution to regularization theory and Bayesian reconstruction methods, and I suggest that the article be published.

  1. m and m_l in (7-11) should be introduced earlier.
  2. The positivity of the function \rho is an assumption that need not automatically hold in (8) and (9). This should be clarified, not only on the basis of specific examples.

Author Response

I would like to thank the referee for preparing a concise report and for the two relevant suggestions.

To address the first item raised in the report, the introduction of the hyperparameters m and \alpha has been rewritten and moved below (5).

The second item is addressed by modifying the text below (9), stating specifically that S_MEM and S_BR are axiomatically constructed under the assumption that \rho is positive.

An updated version of the manuscript including the recommendations and requests by the other referees is attached.
