This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Zhang (2012) introduced a nonparametric estimator of Shannon's entropy whose bias decays exponentially fast when the alphabet is finite. We propose a methodology for estimating the bias of this estimator and use it to construct a new entropy estimator. Simulation results suggest that this bias-adjusted estimator has significantly lower bias than many other commonly used estimators. We consider both the case where the alphabet is finite and the case where it is countably infinite.

Let

Assume that

Zhang [

For a positive integer

The methodology proposed in this paper is as follows: For all

It remains to choose a reasonable parametric form for

(

Assume that

Here

Two finer modifications are made to

If the least squares fit based on Equation (

When a sample has no letters with frequency 1, the model in Equation (

To show how well Equation (

(Triangular)

(Zipf)
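The exact parameterizations of the triangular and Zipf distributions used in the simulations are elided above. For concreteness, a Zipf law on {1, …, K} with p_k ∝ k^{-s}, and its true entropy, can be set up along the following lines; the values of K and s here are illustrative assumptions, not the paper's settings:

```python
import math

def zipf_pmf(K, s):
    """Zipf probability mass function on {1, ..., K}: p_k proportional to 1/k^s."""
    weights = [1.0 / k ** s for k in range(1, K + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(p):
    """Shannon entropy in nats: H = -sum_k p_k log p_k."""
    return -sum(q * math.log(q) for q in p if q > 0)

# Illustrative parameters only (the paper's simulation settings are not shown).
p = zipf_pmf(K=100, s=1.0)
H_true = entropy(p)
```

The true entropy computed this way is what a simulated bias is measured against.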

For each distribution and each estimator, the bias was approximated as follows. We simulate

We compare the absolute value of the bias of our estimator (New Sharp) with that of the plug-in (MLE), the Miller–Madow (MM), and the one given in Equation (
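The plug-in (MLE) and Miller–Madow estimators named above are standard, as is approximating bias by Monte Carlo simulation: average the estimates over many samples and subtract the true entropy. A minimal sketch of both, with illustrative sample sizes and replication counts (not the paper's settings):

```python
import math
import random
from collections import Counter

def plug_in_entropy(sample):
    """Plug-in (MLE) estimator: H_hat = -sum_k p_hat_k log p_hat_k."""
    n = len(sample)
    return -sum((c / n) * math.log(c / n) for c in Counter(sample).values())

def miller_madow_entropy(sample):
    """Miller-Madow estimator: plug-in plus the first-order bias
    correction (K_obs - 1) / (2n), with K_obs the number of observed letters."""
    k_obs = len(set(sample))
    return plug_in_entropy(sample) + (k_obs - 1) / (2 * len(sample))

def approx_bias(estimator, p, n, reps=2000, seed=0):
    """Monte Carlo bias: mean of the estimator over `reps` samples of size n,
    minus the true entropy of the distribution p."""
    rng = random.Random(seed)
    letters = list(range(len(p)))
    h_true = -sum(q * math.log(q) for q in p if q > 0)
    estimates = [estimator(rng.choices(letters, weights=p, k=n)) for _ in range(reps)]
    return sum(estimates) / reps - h_true
```

For example, the plug-in estimator's Monte Carlo bias on a uniform distribution comes out negative, consistent with its well-known downward bias.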

Another estimator of entropy is the NSB estimator of Nemenman, Shafee, and Bialek [

We compare the absolute value of the bias of our estimator with that of the NSB estimator. The

We now turn to the case when

For any distribution on a countably infinite alphabet

If

These facts tell us that

As in the case when

If

When a sample has no letters with frequency 1, we run into trouble as we did in the case when

To show how well Equation (

(Power)

(Geometric)

(Poisson)

We compare the absolute value of the bias of our estimator (New Sharp) with that of the plug-in (MLE), the Miller–Madow (MM), and the one given in Equation (

We compare the absolute value of the bias of our estimator with that of the NSB estimator. The

In Zhang [

One situation where estimators of entropy run into difficulty is in the case where all

This suggests a way to think about small sample sizes. Before discussing this, we describe a common approach to defining what a small sample size is. When

What matters is not how big the sample is relative to

One way to quantify how much information a sample has is the sample’s coverage of the population, which is given by

Note that, for the situation described above, where each letter is a singleton, we have
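The paper's expression for sample coverage is elided above. A standard choice, which we assume is the one intended, is Turing's estimate 1 − N1/n, where N1 is the number of letters observed exactly once. It reproduces the all-singleton case just described: when every observed letter is a singleton, N1 = n and the estimated coverage is 0.

```python
from collections import Counter

def turing_coverage_estimate(sample):
    """Turing's estimate of sample coverage: 1 - N1/n, where N1 counts the
    letters that appear exactly once in the sample of size n."""
    counts = Counter(sample)
    n1 = sum(1 for c in counts.values() if c == 1)
    return 1.0 - n1 / len(sample)
```

A sample of ten distinct letters gives an estimated coverage of 0, while a sample with no singletons gives an estimated coverage of 1.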

We end this paper by discussing some future work. While our simulations suggest that the estimator introduced in this paper is quite useful, it is important to derive its theoretical properties. In a different direction, we note that, in practice, one often needs to compare one estimated entropy to another. An approach to doing this is to use the asymptotic normality of

The difference between the biases induced by different sample sizes causes substantial inflation of the Type II error rate, even with reasonably large samples.

The bias in estimating the variance of an entropy estimator is also sizable and persistent.
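One generic way such comparisons are carried out, assuming asymptotic normality of the plug-in estimator with the delta-method variance (Σ_k p_k (log p_k)² − H²)/n, is a two-sample z-statistic. The sketch below illustrates that generic approach, not the paper's own procedure:

```python
import math
from collections import Counter

def entropy_and_var(sample):
    """Plug-in entropy and a delta-method estimate of its sampling variance:
    Var_hat = (sum_k p_hat_k (log p_hat_k)^2 - H_hat^2) / n."""
    n = len(sample)
    p = [c / n for c in Counter(sample).values()]
    h = -sum(q * math.log(q) for q in p)
    second_moment = sum(q * math.log(q) ** 2 for q in p)
    return h, max(second_moment - h * h, 0.0) / n

def z_statistic(sample1, sample2):
    """Normal-approximation statistic for testing H1 = H2 between two
    independent samples; compare against standard normal quantiles."""
    h1, v1 = entropy_and_var(sample1)
    h2, v2 = entropy_and_var(sample2)
    return (h1 - h2) / math.sqrt(v1 + v2)
```

As the remarks above indicate, bias in both the entropy and the variance estimates can undermine this test, which is precisely the motivation for bias-adjusted estimation.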

The research of the first author is partially supported by NSF Grant DMS-1004769.

The authors declare no conflict of interest.