## **1. Introduction**

In its simplest sense, the environment means the surrounding external conditions that influence the growth of people, animals or plants, as well as their living or working conditions. Environmental Sciences (EVS) is an integrated, multidisciplinary approach to studying the environment and finding solutions to environmental problems. The environment has now become an item on the global agenda, which has increased the scope and importance of EVS. Throughout the development of civilization, humans have lived with both the environment and statistics: from the earliest days, they adapted knowingly to their environment while using statistical reasoning without knowing it. Thus, statistics and the environment share a long history of mutual influence. Today, each of these subjects independently attracts the academic attention of scholars throughout the world (see [1]).

The United Nations Statistics Division (UNSD) established a dedicated branch for environmental statistics in 1995. Its main areas of work are data collection, methodology, capacity development, and the coordination of environmental statistics and indicators. The branch publishes a dedicated newsletter, "ENVSTATS", which reports on the activities of UNSD in environmental statistics. The Framework for the Development of Environmental Statistics (FDES 2013) is an updated version of the original FDES, which UNSD published in 1984. In India, the Ministry of Statistics and Programme Implementation issues a dedicated publication on environmental statistics, "EnviStats", which reports recent developments in the field.

The extensive use of statistics in EVS led to the development of a new branch called environmental statistics. Statistics is indispensable in every scientific field; the motivation for this particular review is that environmental statistics has an integrated, multidisciplinary character that brings an analytical perspective to the biological side of modern science. This review aims to bring out the still largely unexplored connection between statistics and environmental science and to serve as an accessible starting point for future investigators. It is organized in two parts: purely statistical techniques, and techniques developed specifically for environmental science. A brief state of the art follows. The authors of [2] discussed statistical techniques that are helpful to environmental engineers; their work addresses environmental problems with a solution-oriented approach that encourages students to view statistics as a problem-solving tool.

**Citation:** Tomy, L.; Chesneau, C.; Madhav, A.K. Statistical Techniques for Environmental Sciences: A Review. *Math. Comput. Appl.* **2021**, *26*, 74. https://doi.org/10.3390/mca26040074

Received: 8 October 2021 Accepted: 1 November 2021 Published: 4 November 2021

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The use of statistical techniques to understand various environmental phenomena was explained in [3]. Its author examined statistical tools such as probabilistic and stochastic models, data collection, data analysis and inferential statistics. He also discussed principles and methods applicable to a wide range of environmental issues (including pollution, conservation, management, control, standards, sampling and monitoring) across many fields of interest and concern (including air and water quality, forestry, radiation, climate, food, noise, soil condition, fisheries and environmental standards). In addition, he covered more sophisticated statistical techniques, such as extreme processes, stimulus-response methodology, linear and generalized linear models, sampling principles and methods, time series, spatial models, multivariate techniques and the design of experiments.

This article describes some basic statistical concepts used in EVS, thereby establishing a link between the two subjects. The illustrations are based on [4,5].

The rest of this article is organized as follows: Section 2 presents basic concepts in statistics, Section 3 describes the application of statistical tools in EVS, Section 4 gives various illustrations, and Section 5 concludes.

## **2. Basic Concepts**

Modern statistics was born with the advent of the theory of probability and the study of games of chance in the mid-seventeenth century. The word "statistics" appears to derive from the German "Statistik", the Italian "statista" or the Latin "status", all of which relate to the "political state" or "statecraft". The term statistics is used in two different senses.

In the plural sense, it means "a collection of numerical facts". According to Horace Secrist, "Statistics may be defined as the aggregate of facts, affected to a marked extent by a multiplicity of causes, numerically expressed, enumerated or estimated according to a reasonable standard of accuracy, collected in a systematic manner, for a predetermined purpose and placed in relation to each other". This definition explains the characteristics of statistical data.

In its singular sense, it means "statistical methods for dealing with numerical data". According to Croxton and Cowden, "Statistics is the science of collection, presentation, analysis, and interpretation of numerical data". This definition points out different stages of statistical investigation. Hence, statistics is concerned with exploring, summarizing, and making inferences about the state of complex systems, for example, the state of a nation (social statistics), the state of people's health (medical and health statistics), the state of the environment (environmental statistics), as extensively described in [6].

Despite its wide range of applications and advantages, statistics faces one important allegation: interested parties may use it to make misleading statements in their own favor. However, as in any science, only an expert can use statistical tools effectively, so one should make sure that a statistical study is conducted by a competent person. There are many valid ways to analyze a given data set, but many more inappropriate or wrong ones, so the correctness of the chosen tool must be checked carefully. The notorious remark popularized by Mark Twain, who attributed it to the British Prime Minister Benjamin Disraeli, that "there are three types of lies: lies, damned lies, and statistics" (the phrase appears nowhere in Disraeli's works, and its earliest known appearances date from years after his death, so it is assumed to be due to some anonymous writer in mid-1891) is itself a lie, provided these precautions are taken. In this context, the author of [7] aptly titled his article "Truth, Damn Truth and Statistics".

## **3. Application of Statistical Tools in EVS**

In statistics, data analysis is divided into two branches: descriptive statistics and inferential statistics. The authors of [5] discussed both in depth, and Sections 3.1 and 3.2 below summarize them.

#### *3.1. Descriptive Statistics*

Descriptive statistics is the initial stage of data analysis, in which data are explored, visualized and summarized. We will look at the definitions of population and random sample in this section. Identifying the type of data (quantitative or qualitative, discrete or continuous) helps in studying the features of the data distribution, its patterns and its associations. Frequency tables, bar charts, pie diagrams, histograms, etc., efficiently represent the distribution of the data, including its position, spread and shape. This descriptive approach helps to interpret the information contained in the data and, hence, to draw conclusions.

Further, measures of central tendency, such as the mean and median, are calculated to analyze environmental data. It is also useful to study measures of dispersion, such as the range and standard deviation, to measure variability in small samples. An important measure of relative dispersion is the coefficient of variation, which is useful for comparing the variability of data measured in different units. Skewness and kurtosis characterize the shape of the sample distribution. Association and correlation describe the relationships between variables and are useful tools for understanding linear and non-linear relationships. Important measures of these fundamental characteristics are briefly discussed below.

#### 3.1.1. Central Tendency

The tendency of the observations to cluster around some central value is called central tendency. Any measure of central tendency is termed "average". The most commonly used averages are the following:

$$\text{Mean } \bar{x} = \sum\_{i=1}^{n} \frac{x\_i}{n},$$

where *x<sub>i</sub>* denotes the *i*th observation and *n* is the number of observations.

Median is the middle-most observation when the observations are arranged in ascending or descending order; when *n* is even, it is the mean of the two middle observations.

Mode is the most frequently occurring observation.
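The three averages can be checked with a short Python sketch. The readings below are hypothetical values chosen only for illustration (not measurements from this review), and the helper names `mean`, `median` and `modes` are our own; `modes` returns every most frequent value, because a sample can have more than one mode.

```python
from collections import Counter

def mean(xs):
    # Arithmetic mean: sum of the observations divided by n
    return sum(xs) / len(xs)

def median(xs):
    # Middle-most value of the sorted observations;
    # for even n, the mean of the two middle values
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

def modes(xs):
    # Most frequently occurring observation(s); a sample may be multimodal
    counts = Counter(xs)
    top = max(counts.values())
    return [x for x, c in counts.items() if c == top]

# Hypothetical daily readings (illustrative values, not real measurements)
readings = [42, 38, 45, 38, 51, 47, 38, 60]
print(mean(readings))    # 44.875
print(median(readings))  # 43.5
print(modes(readings))   # [38]
```

Python's standard `statistics` module (`statistics.mean`, `statistics.median`, `statistics.multimode`) provides equivalent functions that can serve as a cross-check.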

#### 3.1.2. Dispersion

The scattering of observations about the central value is called dispersion. Important measures of dispersion are the range, quartile deviation, mean deviation, standard deviation and coefficient of variation. The first four depend on the unit of measurement of the observations; hence, they are absolute measures. They are defined as follows.

Range is the difference between the largest and smallest observations.

• Quartile deviation (a measure based on quartiles)

$$QD = \frac{Q\_3 - Q\_1}{2}$$

where *Q*<sub>3</sub> and *Q*<sub>1</sub> are the third and first quartiles of the frequency distribution, respectively;

• Mean deviation

$$MD = \sum\_{i=1}^{n} \frac{|x\_i - \bar{x}|}{n}$$

where *x̄* is the mean of the observed values *x<sub>i</sub>*;

• Standard deviation

$$SD = \sqrt{\sum\_{i=1}^{n} \frac{(x\_i - \bar{x})^2}{n}};$$

• Coefficient of variation

$$CV = \frac{SD}{\text{Mean}} \times 100.$$

Thus, CV is the relative measure (measure independent of unit) corresponding to SD.
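A minimal sketch of these dispersion measures in Python follows, again using hypothetical illustrative readings. Two assumptions should be noted: the standard deviation uses the divisor *n*, as in the formula above (many software packages default to *n* − 1 for samples), and the quartiles come from `statistics.quantiles` with its default `'exclusive'` method, which is only one of several quartile conventions, so other tools may report a slightly different QD. The function names are illustrative.

```python
import math
import statistics

def data_range(xs):
    # Range: largest minus smallest observation
    return max(xs) - min(xs)

def quartile_deviation(xs):
    # QD = (Q3 - Q1) / 2; quartiles via statistics.quantiles with its
    # default 'exclusive' method (one convention among several)
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return (q3 - q1) / 2

def mean_deviation(xs):
    # MD: average absolute deviation from the mean
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def standard_deviation(xs):
    # SD with divisor n, matching the formula in the text
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def coefficient_of_variation(xs):
    # CV = SD / mean * 100 (percent); assumes a positive mean
    return standard_deviation(xs) / (sum(xs) / len(xs)) * 100

readings = [42, 38, 45, 38, 51, 47, 38, 60]   # hypothetical values
print(data_range(readings))                    # 22
print(quartile_deviation(readings))            # 6.0
print(mean_deviation(readings))                # 5.875
print(round(standard_deviation(readings), 3))  # 7.253

# Rescaling every value (as in a change of unit) multiplies SD by the
# same factor but leaves CV unchanged
scaled = [x * 1000 for x in readings]
print(math.isclose(coefficient_of_variation(readings),
                   coefficient_of_variation(scaled)))  # True
```

The last check illustrates why CV is called a relative measure: multiplying every observation by 1000 multiplies SD by 1000 as well, so their ratio, and hence CV, stays the same.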

#### 3.1.3. Skewness

Lack of symmetry is termed skewness or asymmetry. In a frequency curve, if the two sides of the mode are distributed in the same manner, the distribution is symmetric; otherwise, it is skewed. When more area lies to the right of the mode, the distribution is positively skewed; when more area lies to the left of the mode, it is negatively skewed. Figure 1 depicts the three situations. There are two main measures of skewness:

1. Pearson's measure

$$S = \frac{\text{Mean} - \text{Mode}}{SD}$$

If *S* = 0, the distribution is symmetric; if *S* > 0, it is positively skewed; and if *S* < 0, it is negatively skewed.

2. Moment measure

$$\beta\_1 = \frac{\mu\_3}{\mu\_2^{3/2}}$$

where *μ*<sub>2</sub> = *SD*<sup>2</sup> and

$$\mu\_3 = \sum\_{i=1}^n \frac{(x\_i - \bar{x})^3}{n}.$$

If *β*<sub>1</sub> = 0, the distribution is symmetric; if *β*<sub>1</sub> > 0, it is positively skewed; and if *β*<sub>1</sub> < 0, it is negatively skewed.

**Figure 1.** (**a**) Negative skewness, (**b**) symmetric, and (**c**) positive skewness.
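The two skewness measures can be sketched in Python as below, assuming Python 3.8 or later (for `statistics.fmean`, and for `statistics.mode` returning the first mode on multimodal data rather than raising an error). The moment measure follows the definition given above, *μ*<sub>3</sub>/*μ*<sub>2</sub><sup>3/2</sup>, with both moments taken about the mean; Pearson's measure assumes the data have a single, meaningful mode. The helper `describe` and its tolerance `tol` are hypothetical additions that absorb floating-point round-off, and the readings are illustrative values only.

```python
import statistics

def pearson_skewness(xs):
    # Pearson's measure S = (Mean - Mode) / SD, with SD using divisor n;
    # assumes a single, meaningful mode
    return (statistics.fmean(xs) - statistics.mode(xs)) / statistics.pstdev(xs)

def moment_skewness(xs):
    # Moment measure mu_3 / mu_2^(3/2), central moments about the mean
    n = len(xs)
    m = sum(xs) / n
    mu2 = sum((x - m) ** 2 for x in xs) / n
    mu3 = sum((x - m) ** 3 for x in xs) / n
    return mu3 / mu2 ** 1.5

def describe(coef, tol=1e-12):
    # Sign convention from the text; tol (hypothetical) absorbs round-off
    if abs(coef) <= tol:
        return "symmetric"
    return "positively skewed" if coef > 0 else "negatively skewed"

readings = [42, 38, 45, 38, 51, 47, 38, 60]  # hypothetical values
print(describe(pearson_skewness(readings)))                # positively skewed
print(describe(moment_skewness(readings)))                 # positively skewed
print(describe(moment_skewness([1, 2, 3, 4, 5])))          # symmetric
print(describe(moment_skewness([-x for x in readings])))   # negatively skewed
```

Both measures agree in sign here because the long upper tail (the value 60) pulls the mean above the mode; mirroring the data reverses the sign, as expected.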
