Communication

On the Optimal Point of the Weighted Simpson Index

by José Pinto Casquilho 1,* and Helena Mena-Matos 2,*

1 Postgraduate and Research Program, Universidade Nacional Timor Lorosa’e (UNTL), Díli, Timor-Leste
2 Centro de Matemática da Universidade do Porto (CMUP), Faculdade de Ciências da Universidade do Porto (FCUP), 4169-007 Porto, Portugal
* Authors to whom correspondence should be addressed.
Mathematics 2024, 12(4), 507; https://doi.org/10.3390/math12040507
Submission received: 27 December 2023 / Revised: 26 January 2024 / Accepted: 3 February 2024 / Published: 6 February 2024

Abstract

In this short communication, following a brief introduction, we undertake a comprehensive analytical study of the weighted Simpson index. Our primary emphasis concerns the precise determination of the optimal point (minimizer) coordinates and of the minimum value of the index, a differentiable convex function, which is related to the harmonic mean concept. Furthermore, we address and solve the inversion problem and show the tight connection between both approaches. Last, we give some insights and final remarks on this subject.

1. Introduction

The Simpson index concerning a population distributed among k categories or classes is defined as:
$$S = \sum_{i=1}^{k} p_i^2,$$
where $p_i$ denotes the probability (or proportion of occurrences) of class $i$. So, one has $p_i \geq 0$ and $\sum_{i=1}^{k} p_i = 1$, and therefore $S$ is defined on a $(k-1)$-simplex. This index equals the probability that two elements taken at random from the population of interest belong to the same category or class. The value of Simpson’s index ranges from $1/k$ to 1, with 1 representing no diversity; so, the larger the value of $S$, the lower the diversity. The name “Simpson index” stems from the influential 1949 paper by Edward Hugh Simpson entitled “Measurement of Diversity” [1], wherein he introduced what he called a measure of concentration defined in terms of population constants, with the minimum concentration equaling the maximum diversity. The Simpson index became a widely used quantitative metric in ecological and biodiversity studies as a tool for assessing and quantifying the diversity and evenness of species within ecological communities. It also applies to other biological problems, including biomedical sciences, such as measuring diversity concerning immunity in response to viral infections (e.g., [2]).
However, it is also acknowledged that the original mathematical concept was used in cryptanalysis as far back as the 1920s and 30s, where it was named the probability of monographic coincidence, by the American cryptanalysts William Friedman and Solomon Kullback (e.g., [3]). It is relevant to note that the Italian statistician Corrado Gini had already applied the quantity $1 - \sum_{i=1}^{k} f_i^2$ as early as 1912. He defined the index with relative frequencies $f_i$ computed from large samples, referring to it as an index of mutability for disconnected (qualitative) variables [4]. This quantity later became known as the “Gini–Simpson index”, a name adopted in the 1980s by the eminent statistician C. R. Rao (e.g., [5,6]), who restated it with probabilities as $G = \sum_{i=1}^{k} p_i (1 - p_i) = 1 - \sum_{i=1}^{k} p_i^2$. For instance, Jiang et al. [7] consider a “Livelihood Simpson Index”, which in fact is a Gini–Simpson index. Obviously, given a probability distribution $(p_1, \ldots, p_k)$, the Simpson and Gini–Simpson indices correspond to complementary events, verifying $S + G = 1$.
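As a small illustration of the two definitions above, the following Python sketch computes $S$ and $G$ for a given distribution. The function names `simpson` and `gini_simpson` and the example distributions are our own illustrative choices, not part of the original formulation.

```python
def simpson(p):
    """Simpson index: sum of squared class probabilities."""
    return sum(x * x for x in p)


def gini_simpson(p):
    """Gini-Simpson index: sum of p_i * (1 - p_i)."""
    return sum(x * (1.0 - x) for x in p)


p = [0.5, 0.3, 0.2]            # illustrative distribution with k = 3 classes
S = simpson(p)                 # 0.25 + 0.09 + 0.04
G = gini_simpson(p)
print(S, G, S + G)             # S + G equals 1 up to rounding

uniform = [0.25] * 4           # uniform distribution with k = 4 classes
S_uniform = simpson(uniform)   # attains the lower bound 1/k
print(S_uniform)
```

The uniform case shows the lower bound $1/k$ of the unweighted index being attained.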
The use of a weighted version of the Simpson index appears to have been first reported in 1992, when Nowak and May [8] conceived the effective immune response against the virus population composed of different strains in the context of HIV infections, which was then revisited two years later [9].
Some refer to the weighted Simpson index when they are actually dealing with the weighted Gini–Simpson index (e.g., [10,11]). The weighted Gini–Simpson index is defined as $GS_w = \sum_{i=1}^{k} w_i p_i (1 - p_i)$, a concave function, differentiable in the interior of a simplex, with an identifiable maximum value. A method to determine its optimal point (maximizer) was framed in [12], taking into account the feasibility conditions associated with the constraints of the simplex, namely, that the optimal coordinates must verify $p_i^* \geq 0$. These conditions were not taken into account in [11], which may lead to miscalculations.
Yet, Kasulo and Perrings used a price-weighted Simpson index to assess scenarios relative to the connection between the diversity of catch in a multi-species fishery and profit-maximizing regimes [13], and Ma used a symmetric form of the weighted Simpson index for ranking alternatives under a decision-making procedure involving mixed attribute values [14].
Despite the many recent publications addressing diversity, including phylogenetic diversity, and dissimilarity indices, either in biology (e.g., [15,16,17]) or, following the pioneering approach of Patil and Taillie [18], in the social sciences (e.g., [19,20]), and mostly associated with developments relative to Hill’s numbers [21] and Rényi entropies [22], it seems that an analytical study concerning the optimal point (minimizer) and the optimal value (minimum) of the weighted Simpson index has not yet been published.
In this short communication, we undertake a comprehensive analytical study of the weighted Simpson index focusing on the optimization problem. We do not deal with statistical developments, which could be addressed in the future, presumably within the scope of information measures (e.g., [23]).
Concerning the structure of this paper, in Section 2, we solve both the optimization problem and the inverse problem and discuss different normalization procedures of the index in relation to the results obtained and, in Section 3, we present some final remarks.

2. The Weighted Simpson Index

Herein, we define the weighted Simpson index, concerning a population distributed among k categories or classes, as:
$$S_w = \sum_{i=1}^{k} w_i p_i^2, \tag{1}$$
where $p_i$ denotes the probability (or proportion of occurrences) of class $i$ and $w_i > 0$ is a weight assigned to that class, altogether defining a vector of positive real values $w = (w_1, \ldots, w_k)^t$. For now, we have decided not to impose any extra conditions on the weights, leaving this matter to be discussed later. Our current focus is on understanding the broader context.
Weights allow one to consider various features for the classes. In the context of biodiversity, these features may be related, for example, to environmental benefits, conservation importance, and the vulnerability or economic value of species, a subject that was already emphasized at least since the beginning of the 1980s, and exemplified with biomass and other importance values [24].

2.1. Optimizing the Weighted Simpson Index

Now, we will address the optimization problem concerning the weighted Simpson index $S_w$ for fixed weights $w_1, \ldots, w_k$. We recall that the lower the value of $S_w$, the greater the diversity of a composition associated with a positive valuation of the different classes or corresponding events. In general, for a weighted index, diversity is not maximized by the uniform distribution of classes. So, it is natural to ask the following: which distribution of the classes minimizes $S_w$ and what is its minimum value? In the following proposition, we derive this optimal distribution and the minimum value of $S_w$, and we also determine the maximum value of $S_w$, which corresponds to minimum diversity and occurs at a vertex of the simplex.
Proposition 1.
Given $w = (w_1, \ldots, w_k)^t$ a vector of positive real numbers and the weighted Simpson index $S_w = \sum_{i=1}^{k} w_i p_i^2$ defined on the simplex $\Delta_{k-1} = \{(p_1, \ldots, p_k) \in \mathbb{R}^k : p_i \geq 0,\ \sum_{i=1}^{k} p_i = 1\}$,
(a) the maximum value of $S_w$ is given by $\max_i w_i$;
(b) the minimum value of $S_w$ is given by
$$S_w^* = \frac{1}{\sum_{i=1}^{k} \frac{1}{w_i}} \tag{2}$$
and corresponds to the distribution
$$p_j^* = \frac{1/w_j}{\sum_{i=1}^{k} 1/w_i} \quad \text{for } j = 1, \ldots, k. \tag{3}$$
Proof. 
The weighted Simpson index $S_w$ is a (continuous) convex function defined in the compact domain $\Delta_{k-1}$ and thus the extreme value theorem ensures the existence of the (global) maximum and minimum values of the index. Being convex, $S_w$ attains its absolute maximum at the boundary of $\Delta_{k-1}$, more precisely at a vertex, and its absolute minimum in the interior of $\Delta_{k-1}$. Clearly, the maximum is attained when all except one of the $p_i$ are zero, and therefore one has $\max S_w = \max_i w_i$.
The minimum can be assessed with the method of Lagrange multipliers for finding the critical points of $S_w$ subject to the equality constraint $\sum_{i=1}^{k} p_i = 1$. As $S_w$ is differentiable in the interior of $\Delta_{k-1}$, one can build the Lagrangian function:
$$L_w(p_1, \ldots, p_k; \lambda) = S_w(p_1, \ldots, p_k) + \lambda \left( \sum_{i=1}^{k} p_i - 1 \right).$$
Equating partial derivatives to zero, we get:
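$$\begin{cases} \dfrac{\partial L_w}{\partial p_1} = 0 \\ \quad \vdots \\ \dfrac{\partial L_w}{\partial p_k} = 0 \\ \dfrac{\partial L_w}{\partial \lambda} = 0 \end{cases} \iff \begin{cases} 2 w_1 p_1 + \lambda = 0 \\ \quad \vdots \\ 2 w_k p_k + \lambda = 0 \\ \sum_{i=1}^{k} p_i - 1 = 0 \end{cases} \tag{4}$$
From the first $k$ equations in (4), we conclude that for any specific $j \in \{1, 2, \ldots, k\}$, the following equivalence holds:
$$w_j p_j = -\lambda / 2 \iff p_j = -\frac{\lambda}{2 w_j}.$$
Then, using the constraint $\sum_{j=1}^{k} p_j = 1$ one has $-\frac{\lambda}{2} \sum_{i=1}^{k} \frac{1}{w_i} = 1 \iff w_j p_j \sum_{i=1}^{k} \frac{1}{w_i} = 1$ and, recalling that $S_w$ is convex, so that this critical point is its global minimizer, we obtain the optimal coordinates of the minimum point given by:
$$p_j^* = \frac{1/w_j}{\sum_{i=1}^{k} 1/w_i} \quad \text{for } j = 1, \ldots, k.$$
Now one can evaluate the minimum value of the weighted Simpson index (1) as follows:
$$S_w^* = S_w(p_1^*, \ldots, p_k^*) = \sum_{j=1}^{k} w_j \left( \frac{1/w_j}{\sum_{i=1}^{k} 1/w_i} \right)^2 = \frac{1}{\left( \sum_{i=1}^{k} \frac{1}{w_i} \right)^2} \sum_{j=1}^{k} \frac{1}{w_j} = \frac{1}{\sum_{i=1}^{k} \frac{1}{w_i}}.$$
□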
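The closed-form results of Proposition 1 can be checked numerically. The sketch below is a minimal illustration, assuming hypothetical helper names (`weighted_simpson`, `optimal_point`, `minimum_value`) and an arbitrary weight vector; the random search over the simplex is only a crude sanity check, not a proof.

```python
import random


def weighted_simpson(w, p):
    """Weighted Simpson index S_w = sum_i w_i * p_i**2."""
    return sum(wi * pi * pi for wi, pi in zip(w, p))


def optimal_point(w):
    """Closed-form minimizer: p_j* = (1/w_j) / sum_i (1/w_i)."""
    inv = [1.0 / wi for wi in w]
    total = sum(inv)
    return [x / total for x in inv]


def minimum_value(w):
    """Closed-form minimum: S_w* = 1 / sum_i (1/w_i)."""
    return 1.0 / sum(1.0 / wi for wi in w)


w = [1.0, 2.0, 4.0]            # illustrative weights (an assumption)
p_star = optimal_point(w)      # mathematically (4/7, 2/7, 1/7)
s_star = minimum_value(w)      # mathematically 4/7

# Crude check: no randomly drawn point of the simplex does better than p*.
rng = random.Random(0)
for _ in range(10_000):
    x = [rng.random() for _ in w]
    t = sum(x)
    q = [xi / t for xi in x]
    assert weighted_simpson(w, q) >= s_star - 1e-12

# The maximum max_i w_i is attained at the vertex of the heaviest class.
s_max = weighted_simpson(w, [0.0, 0.0, 1.0])
print(p_star, s_star, s_max)
```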

2.2. Some Further Comments on the Minimizer

Note that the minimum value of the weighted Simpson index (2) is related to the harmonic mean of the weights $H(w)$ by $S_w^* = H(w)/k$. The harmonic mean $H$ of $k$ positive real numbers $x_1, \ldots, x_k$ is the reciprocal of the arithmetic mean of the reciprocals of those numbers:
$$H(x_1, \ldots, x_k) = \frac{k}{\frac{1}{x_1} + \cdots + \frac{1}{x_k}},$$
and is appropriate for averaging rates over constant numerator units [25]. As a typical example, if several investments are made at different interest rates and they all yield the same income, then the single rate at which all of the capital tied up in those investments would have to be invested to produce the same revenue is equal to the harmonic mean of the individual rates ([26], p. 240).
The special case of all weights being equal to 1 leads to the uniform distribution $p_i^* = 1/k$ for all $i$, and to the minimum value $S_w^* = 1/k$, as can be seen from expressions (3) and (2), and also because in this case, $S_w = S$ with $S$ being the (unweighted) Simpson index whose minimum is $1/k$.
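The relation between the minimum value and the harmonic mean can be verified with the standard library function `statistics.harmonic_mean`. This is a minimal sketch with illustrative weights; the variable names are our own.

```python
from statistics import harmonic_mean

w = [1.0, 2.0, 4.0]                          # illustrative weights
k = len(w)
s_star = 1.0 / sum(1.0 / wi for wi in w)     # minimum value from expression (2)
h_over_k = harmonic_mean(w) / k              # H(w) / k
print(s_star, h_over_k)                      # the two values agree up to rounding

# Special case: all weights equal to 1 gives the unweighted minimum 1/k.
ones = [1.0] * 5
s_star_ones = 1.0 / sum(1.0 / wi for wi in ones)
print(s_star_ones)
```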
Rewriting the optimal distribution in (3) as:
$$p_j^* = \frac{1}{1 + w_j \sum_{i \neq j} \frac{1}{w_i}} \quad \text{for } j = 1, \ldots, k$$
it is straightforward to conclude that the weights are driving forces of the optimal probabilities, operating reciprocally. When the weight attached to a specific class increases and all the others remain unchanged, the corresponding optimal probability decreases; when the weight attached to another class increases, all the others remaining unchanged, the optimal probability of the original class increases.
If one considers the valuation of the classes of a distribution in the usual sense of importance assessment with an ordering of positive real numbers, expecting that $v_m > v_n$ would promote the result $p_m^* > p_n^*$, then one should be aware that the weights associated with the optimal point (3) would not be the values $v_m, v_n$, but could possibly be conceived as $w_m = 1/v_m$ and $w_n = 1/v_n$ instead.
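To make the reciprocal behavior concrete, the following sketch uses hypothetical importance values `v` and sets the weights to their reciprocals, in which case the optimal point is simply proportional to `v`. It also shows the effect of increasing a single weight. The helper `optimal_point` and all numbers are illustrative assumptions.

```python
def optimal_point(w):
    """Closed-form minimizer: p_j* = (1/w_j) / sum_i (1/w_i)."""
    inv = [1.0 / wi for wi in w]
    total = sum(inv)
    return [x / total for x in inv]


v = [5.0, 3.0, 2.0]                  # hypothetical importance values
w = [1.0 / vi for vi in v]           # reciprocal weights w_i = 1 / v_i
p_star = optimal_point(w)            # proportional to v: v_i / sum(v)

# Doubling the first weight lowers its optimal share and raises the others.
w_up = [2.0 * w[0], w[1], w[2]]
p_up = optimal_point(w_up)
print(p_star, p_up)
```

With reciprocal weights, the ordering $v_m > v_n$ is carried over to $p_m^* > p_n^*$, which is the behavior one would usually expect from an importance valuation.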

2.3. Normalization

In the realm of indices, normalization comprises the adjustment of the index values to conform to a predetermined range or interval. The use of normalized indices in applications can be important for several reasons. Normalized indices provide a standardized scale, usually ranging from 0 to 1, or 0% to 100%, irrespective of the specific scale or magnitude of the data. This standardization allows for direct comparisons between different datasets or populations and remains consistent across different contexts and scales.
In the case of the weighted Simpson index, this can be done in the classic way by defining the normalized weighted Simpson index as:
$$s_w = \frac{S_w - \min S_w}{\max S_w - \min S_w} = \frac{S_w - \left( \sum_{i=1}^{k} \frac{1}{w_i} \right)^{-1}}{\max_i w_i - \left( \sum_{i=1}^{k} \frac{1}{w_i} \right)^{-1}}.$$
However, this normalization eliminates the effect of the number of classes in the distribution (e.g., species in a community). For example, in the case of all weights equal to 1, the normalized weighted Simpson index of a population with k species uniformly distributed is always 0, and thus independent of the number of species. The fact that normalized indices of diversity can be misleading has already been mentioned by several authors (e.g., [11]).
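The loss of information about the number of classes can be seen in a short sketch. The function name `normalized_index` is hypothetical, and the code assumes $k \geq 2$ so that the maximum and minimum of the index differ.

```python
def weighted_simpson(w, p):
    """Weighted Simpson index S_w = sum_i w_i * p_i**2."""
    return sum(wi * pi * pi for wi, pi in zip(w, p))


def normalized_index(w, p):
    """Min-max normalization of S_w using the closed-form extremes (k >= 2)."""
    s_min = 1.0 / sum(1.0 / wi for wi in w)
    s_max = max(w)
    return (weighted_simpson(w, p) - s_min) / (s_max - s_min)


# Equal weights and a uniform distribution: the result is (numerically) zero
# whatever the number of classes k.
results = {}
for k in (2, 5, 50):
    w = [1.0] * k
    p = [1.0 / k] * k
    results[k] = normalized_index(w, p)
print(results)

# At the vertex of the heaviest class the normalized index equals one.
at_vertex = normalized_index([1.0, 2.0, 4.0], [0.0, 0.0, 1.0])
print(at_vertex)
```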
In the case of a weighted index, it may be relevant to normalize the weights so that the index becomes dimensionless and independent of the order of magnitude of the weights. For the weighted Simpson index, the normalization can be performed, for example, by imposing the condition $\sum_{i=1}^{k} w_i = 1$. This procedure can be accomplished by dividing each non-normalized weight $w_i$ by the sum of all the weights. So, the weighted Simpson index with normalized weights corresponding to the weighted Simpson index $S_w$ is denoted by $S_u = S_{\beta w}$ with $\beta = \left( \sum_{i=1}^{k} w_i \right)^{-1}$ and $u = (u_1, \ldots, u_k)^t$ with $u_i = \beta w_i$.
Proposition 2.
Let $S_u$ be the weighted Simpson index computed with normalized weights, corresponding to the weighted Simpson index $S_w$. Then:
(a) The maximum value of $S_u$ is given by $\max_i u_i = \left( \sum_{i=1}^{k} w_i \right)^{-1} \max_i w_i$;
(b) The minimum value of $S_u$ is given by $S_u^* = \dfrac{1}{\sum_{i=1}^{k} \frac{1}{u_i}} = \dfrac{1}{\sum_{i=1}^{k} w_i \, \sum_{i=1}^{k} \frac{1}{w_i}}$ and corresponds to the distribution $p_j^* = \dfrac{1/w_j}{\sum_{i=1}^{k} 1/w_i}$ for $j = 1, \ldots, k$.
Proof. 
Note that $S_{\beta w} = \beta S_w$ for any real number $\beta$. In fact, $S_{\beta w} = \sum_{i=1}^{k} \beta w_i p_i^2 = \beta \sum_{i=1}^{k} w_i p_i^2 = \beta S_w$. As $S_u = S_{\beta w} = \beta S_w$ with $\beta = \left( \sum_{i=1}^{k} w_i \right)^{-1} > 0$, this normalization procedure does not affect the maximum and minimum points of the index, and the maximum and minimum values of $S_u$ are obtained by multiplying by $\beta$, respectively, the maximum and minimum values of $S_w$. □
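The scaling argument of Proposition 2 can be illustrated as follows. This is a sketch under the assumption of an arbitrary weight vector; `normalize_weights` and the other helper names are hypothetical.

```python
def optimal_point(w):
    """Closed-form minimizer: p_j* = (1/w_j) / sum_i (1/w_i)."""
    inv = [1.0 / wi for wi in w]
    total = sum(inv)
    return [x / total for x in inv]


def minimum_value(w):
    """Closed-form minimum: S_w* = 1 / sum_i (1/w_i)."""
    return 1.0 / sum(1.0 / wi for wi in w)


def normalize_weights(w):
    """Divide each weight by the sum of all weights, so that they sum to 1."""
    total = sum(w)
    return [wi / total for wi in w]


w = [1.0, 2.0, 4.0]                   # illustrative non-normalized weights
beta = 1.0 / sum(w)
u = normalize_weights(w)              # u_i = beta * w_i

p_w = optimal_point(w)
p_u = optimal_point(u)                # same minimizer as for w
s_u = minimum_value(u)                # beta times the minimum for w
print(p_w, p_u)
print(s_u, beta * minimum_value(w), max(u), beta * max(w))
```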

2.4. The Inverse Problem

The inverse procedure relative to the weighted Gini–Simpson index was formulated recently [27]. Now, we consider the analogous problem concerning the weighted Simpson index, stated as the following: given a minimum point of $S_w$ denoted by $(p_1^*, \ldots, p_k^*)$, verifying both $0 < p_j^* < 1$ and $\sum_{j=1}^{k} p_j^* = 1$, what would be a set of weights able to generate that solution? The answer to this question is straightforward, as follows:
Proposition 3.
Suppose that $p^* = (p_1^*, \ldots, p_k^*)$, verifying both $0 < p_j^* < 1$ and $\sum_{j=1}^{k} p_j^* = 1$, is a minimum point of the weighted Simpson index $S_u$ with normalized weights. Then
$$u_i = \frac{1/p_i^*}{\sum_{j=1}^{k} 1/p_j^*} \quad \text{for } i = 1, \ldots, k,$$
and the harmonic mean of the weights equals the harmonic mean of the coordinates of the minimum point, $H(u) = H(p^*)$.
Proof. 
Recalling (3), the weights must be chosen to be inversely proportional to the optimal coordinates $p_j^*$ with the proportionality constant equal to $S_u^*$ (2), as we can see rewriting:
$$p_j^* = \frac{1/u_j}{\sum_{i=1}^{k} 1/u_i} \iff \frac{1}{p_j^*} = u_j \sum_{i=1}^{k} \frac{1}{u_i} \iff \frac{1}{p_j^*} = u_j \frac{1}{S_u^*} \iff u_j = \frac{S_u^*}{p_j^*}.$$
For normalized weights, using the condition $\sum_{j=1}^{k} u_j = 1$, one gets $\sum_{j=1}^{k} \frac{S_u^*}{p_j^*} = 1 \iff S_u^* = 1 / \sum_{j=1}^{k} \frac{1}{p_j^*}$, and so the weights must be chosen as:
$$u_i = \frac{1/p_i^*}{\sum_{j=1}^{k} 1/p_j^*} \quad \text{for } i = 1, \ldots, k.$$
Note that from the condition that the sum of the weights equals 1, it follows that $\sum_{i=1}^{k} \frac{1}{u_i} = \sum_{i=1}^{k} \frac{1}{p_i^*}$ and thus, we can proceed with the equality $\dfrac{k}{\sum_{i=1}^{k} \frac{1}{u_i}} = \dfrac{k}{\sum_{i=1}^{k} \frac{1}{p_i^*}}$, obtaining the result concerning the corresponding harmonic means $H(u) = H(p^*)$. □
For non-normalized weights, there are infinitely many solutions to the inverse problem, parameterized by the minimum S w * . In other words, in that case, for the inverse problem to have a unique solution, it is not sufficient to know the minimum point, and the minimum value S w * must also be given.
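A round trip through the inverse problem can be sketched as follows. The function name `weights_from_optimum` and the target distribution are illustrative assumptions; the scaling factor `c` is an arbitrary positive constant chosen to show the non-uniqueness for non-normalized weights.

```python
from statistics import harmonic_mean


def optimal_point(w):
    """Closed-form minimizer: p_j* = (1/w_j) / sum_i (1/w_i)."""
    inv = [1.0 / wi for wi in w]
    total = sum(inv)
    return [x / total for x in inv]


def minimum_value(w):
    """Closed-form minimum: S_w* = 1 / sum_i (1/w_i)."""
    return 1.0 / sum(1.0 / wi for wi in w)


def weights_from_optimum(p_star):
    """Normalized weights: u_i = (1/p_i*) / sum_j (1/p_j*)."""
    inv = [1.0 / pj for pj in p_star]
    total = sum(inv)
    return [x / total for x in inv]


p_target = [0.5, 0.3, 0.2]              # desired minimum point (illustrative)
u = weights_from_optimum(p_target)      # normalized weights, summing to 1
p_back = optimal_point(u)               # recovers p_target up to rounding
print(u, p_back)
print(harmonic_mean(u), harmonic_mean(p_target))   # equal harmonic means

# Non-normalized weights: any positive multiple of u gives the same minimizer,
# and only the minimum value tells the multiples apart.
c = 10.0
w = [c * ui for ui in u]
print(optimal_point(w), minimum_value(w), c * minimum_value(u))
```

In this sketch the direct and inverse maps turn out to be the same operation (take reciprocals, then normalize to sum 1), which is one way to read the close connection between the two problems.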

3. Final Remarks

We have presented a detailed analytical study of the optimization problem associated with the weighted Simpson index. The core result is that at the optimal point one has $w_i p_i^* = w_j p_j^*$ for $i, j = 1, \ldots, k$, and also, that for all $i$, one gets $w_i p_i^* = S_w^*$, the minimum value of the index. So, there is a trade-off between the weights and the optimal probabilities (or proportions of occurrences) in what could be seen as an equilibrium condition. The fact that Nowak [8,9] has used the weighted Simpson index as a Lyapunov function to assess an antigenic diversity threshold seems compatible with an equilibrium point perspective, which could also be used in a broad sense concerning different problems within several scientific fields. The mathematical results are obtained in simple closed-form solutions and so do not seem prone to unexpected computational difficulties.
Furthermore, for a random variable $W$ with values corresponding to the previous weights $w_i$ ($i = 1, \ldots, k$), and the probability function given by $\Pr(W = w_i) = p_i^*$, with $p_i^*$ defined as in (3), computing the mean value of $W$ entails $E[W] = \sum_{i=1}^{k} w_i p_i^* = k S_w^*$, which equals the harmonic mean of the weights, meaning $E[W] = H(w)$.
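Both the equilibrium condition and the expected-value identity can be checked in a few lines. This is a sketch with illustrative weights; the variable names are our own.

```python
from statistics import harmonic_mean

w = [1.0, 2.0, 4.0]                          # illustrative weights
inv_sum = sum(1.0 / wi for wi in w)
p_star = [(1.0 / wi) / inv_sum for wi in w]  # optimal point from expression (3)
s_star = 1.0 / inv_sum                       # minimum value from expression (2)

# Equilibrium condition: every product w_i * p_i* equals S_w*.
products = [wi * pi for wi, pi in zip(w, p_star)]
print(products, s_star)

# Mean of W under Pr(W = w_i) = p_i*: equals k * S_w* and H(w).
expected_w = sum(products)
print(expected_w, len(w) * s_star, harmonic_mean(w))
```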

Author Contributions

J.P.C. conceived a draft solving the direct optimization problem and H.M.-M. revised and built the solution for the inverse problem. Both authors wrote the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no specific funding. The second author was partially supported by CMUP, a member of LASI, which is financed by national funds through FCT—Fundação para a Ciência e a Tecnologia, I.P., under the projects with reference UIDB/00144/2020 and UIDP/00144/2020.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The authors would like to thank the anonymous reviewers for their insightful comments and suggestions during the peer review process.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Simpson, E. Measurement of diversity. Nature 1949, 163, 688.
2. Davis, C.L.; Adler, F.R. Mathematical models of memory CD8+ T-cell repertoire dynamics in response to viral infections. Bull. Math. Biol. 2013, 75, 491–522.
3. Österreicher, F.; Casquilho, J.A.P. On the Gini-Simpson index and its generalization—A historic note. S. Afr. Stat. J. 2018, 52, 129–137.
4. Gini, C. Variabilità e Mutabilità: Contributo Allo Studio Delle Distribuzioni e Delle Relazioni Statistiche; C. Cuppini: Bologna, Italy, 1912; Available online: https://www.byterfly.eu/islandora/object/librib:680892#mode/2up (accessed on 8 December 2023).
5. Rao, C.R. Diversity and dissimilarity coefficients: A unified approach. Theor. Popul. Biol. 1982, 21, 24–43.
6. Rao, C.R. Diversity: Its measurement, decomposition, apportionment and analysis. Sankhya Indian J. Stat. 1982, 44, 1–22. Available online: http://library.isical.ac.in:8080/xmlui/bitstream/handle/10263/504/82.E01.pdf?sequence=1&isAllowed=y (accessed on 8 December 2023).
7. Jiang, X.; Yin, G.; Lou, Y.; Xie, S.; Wei, W. The impact of transformation of farmer’s livelihood on the increasing labor costs of grain plantation in China. Sustainability 2021, 13, 11637.
8. Nowak, M.A.; May, R.M. Coexistence and competition in HIV infections. J. Theor. Biol. 1992, 159, 329–342.
9. Nowak, M.A. The evolutionary dynamics of HIV infections. In First European Congress of Mathematics, Paris, 6–10 July 1992. Progress in Mathematics; Joseph, A., Mignot, F., Murat, F., Prum, B., Rentschler, R., Eds.; Birkhäuser Verlag: Basel, Switzerland, 1994; Volume 120, pp. 311–326.
10. Subburayalu, S.; Sydnor, T.D. Assessing street tree diversity in four Ohio communities using the weighted Simpson index. Landsc. Urban Plan. 2012, 106, 44–50.
11. Guiaşu, R.C.; Guiaşu, S. Conditional and weighted measures of ecological diversity. Int. J. Uncertain Fuzziness Knowl. Based Syst. 2003, 11, 283–300.
12. Casquilho, J.P. A methodology to determine the maximum value of weighted Gini-Simpson index. SpringerPlus 2016, 5, 1143.
13. Kasulo, V.; Perrings, C. Fishing down the value chain: Biodiversity and access regimes in freshwater fisheries—The case of Malawi. Ecol. Econ. 2006, 59, 106–114.
14. Ma, J. Generalized grey target decision method based on the Gini-Simpson index involving mixed attributes and uncertain numbers. Data Tech. Appl. 2019, 53, 484–500.
15. Grabchak, M.; Marcon, E.; Lang, G.; Zhang, Z. The generalized Simpson’s entropy is a measure of biodiversity. PLoS ONE 2017, 12, e0173305.
16. Ricotta, C.; Szeidl, L.; Pavoine, S. Towards a unifying framework for diversity and dissimilarity coefficients. Ecol. Indic. 2021, 129, 107971.
17. Xu, S.; Böttcher, L.; Chou, T. Diversity in biology: Definitions, quantification and models. Phys. Biol. 2020, 17, 031001.
18. Patil, G.P.; Taillie, C. Diversity as a concept and its measurement. J. Am. Stat. Assoc. 1982, 77, 548–567.
19. Xu, S.; Peskin, C.S. The impact of universal recycling on the evolution of economic diversity. PLoS ONE 2022, 17, e0262184.
20. Nourbakhsh, M.; Yari, G. Weighted Rényi’s entropy for lifetime distributions. Commun. Stat.-Theory Methods 2017, 46, 7085–7098.
21. Hill, M.O. Diversity and evenness: A unifying notation and its consequences. Ecology 1973, 54, 427–432.
22. Rényi, A. On the dimension and entropy of probability distributions. Acta Math. Acad. Scient Hung. 1959, 10, 193–215.
23. Pardo, L. New developments in statistical information theory based on entropy and divergence measures. Entropy 2019, 21, 391.
24. Lyons, N.I. Comparing diversity indices based on counts weighted by biomass or other importance values. Am. Nat. 1981, 118, 438–442.
25. Komić, J. Harmonic Mean. In International Encyclopedia of Statistical Science; Lovric, M., Ed.; Springer: Berlin/Heidelberg, Germany, 2011; pp. 622–624.
26. Dodge, Y. The Concise Encyclopedia of Statistics; Springer: New York, NY, USA, 2008; pp. 239–241.
27. Casquilho, J.P. On the weighted Gini-Simpson index: Estimating feasible weights using the optimal point and discussing a link with possibility theory. Soft Comput. 2020, 24, 17187–17194.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
