2.5.6. Analysis of Convergence and Divergence Using Polling

The values of the climate variables were computed for each household using GAMS software in three stages. First, using the latitude and longitude of the three meteorological stations and of the households, the Manhattan distance between each household and each station was calculated. Second, based on these distances, the nearest, second nearest, and third nearest stations were identified for each household; following the inverse distance weighted (IDW) interpolation procedure, we took the inverse of each distance and normalized the inverses to sum to 1, which gave the weighting factors for the three stations. Lastly, the value for each household was calculated as the sum of the observed meteorological data of the three stations, each multiplied by the household's weighting factor for that station.
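The three stages can be illustrated with a minimal Python sketch. It is not the GAMS implementation used in the study; the function names, the coordinates, and the station readings are hypothetical, and the handling of a household located exactly at a station (zero distance) is an assumption added to avoid division by zero.

```python
def idw_weights(household, stations):
    """Normalized inverse-distance weights of the stations for one household.

    household: (lat, lon) tuple; stations: list of (lat, lon) tuples.
    Distances are Manhattan distances in coordinate units.
    """
    hlat, hlon = household
    dists = [abs(hlat - slat) + abs(hlon - slon) for slat, slon in stations]

    # Assumption: a household that coincides with one or more stations
    # takes its value from those stations only.
    zeros = [d == 0 for d in dists]
    if any(zeros):
        n = sum(zeros)
        return [1.0 / n if z else 0.0 for z in zeros]

    inverse = [1.0 / d for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]  # weights sum to 1


def interpolate(weights, observations):
    """Weighted sum of the station observations for one household."""
    return sum(w * o for w, o in zip(weights, observations))


# Hypothetical example: one household and three stations.
stations = [(0.0, 0.0), (1.0, 3.0), (5.0, 1.0)]
household = (1.0, 1.0)
weights = idw_weights(household, stations)        # distances are 2, 2, 4
value = interpolate(weights, [10.0, 20.0, 30.0])  # hypothetical readings
```

Because the weights are normalized, the interpolated value always lies between the smallest and largest station observation, and nearer stations contribute more.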

Then, the polling method was applied to discern the profiles of households whose perceptions converge with or diverge from the meteorological data. Polling is a multivariate analysis technique involving a joint analysis of a large number of integer-valued explanatory variables using the maximum likelihood prediction method [41]. Here, it is used to jointly evaluate the roles of different variables in predicting the likelihood of convergence or divergence between meteorological data and perceptions. The joint empirical frequency distribution is defined from the observed values of the explanatory variables. Conditional frequency distributions are then derived from this joint distribution by partitioning the answers given by, e.g., S respondents (indexed s) into a vector *y* of dependent variables and a vector *x* of explanatory variables, and taking the frequencies of *y* conditional on *x* [41–43].

$$\text{Conditional frequency} = \frac{m_{yx}}{\sum_{y \in G_x} m_{yx}}, \qquad \text{Coverage} = \frac{m_{yx}}{\sum_{x} m_{yx}}$$

where *m* is the mass of the observations, and *y* and *x* denote the integer-coded values of the dependent and explanatory variables, respectively. The conditional frequencies are probability estimates of *y* given profile *x*. Hence, the set of most probable characteristics associated with each *x* value (the "winner") has the highest probability of yielding the desired *y* outcome (convergence or divergence). The coverage of a profile *x* is the mass of a class within profile *x* divided by the total mass of the relevant group. The edge of the winning profile over the runner-up (i.e., the second-best guess) is the ratio of their maximum likelihood probabilities (i.e., the share of the population covered by the most likely profile relative to the share covered by the runner-up). The best profile was selected from the set of explanatory variables based on the coverage and edge of each combination. In addition to the observed and perceived climate variables, all possible combinations of four explanatory variables were used to identify the profiles of households whose perceptions converge with or diverge from the meteorological results.
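As an illustration of these quantities, the following Python sketch computes conditional frequencies, coverage, and an edge from a small set of records. It is a simplified sketch, not the polling software used in the study: each respondent is assumed to carry unit mass, the function names and data are hypothetical, and the edge is computed under one reading of the definition, namely the ratio of the conditional frequencies of the most likely and second most likely outcome within a profile.

```python
from collections import Counter


def polling_table(records):
    """Conditional frequency and coverage for every observed (x, y) cell.

    records: iterable of (x, y) pairs, where x is a hashable profile
    (e.g., a tuple of integer codes) and y is an integer-coded outcome.
    """
    records = list(records)
    mass = Counter(records)                   # m_yx, unit mass per respondent
    mass_x = Counter(x for x, _ in records)   # sum of m_yx over y, per profile
    mass_y = Counter(y for _, y in records)   # sum of m_yx over x, per outcome
    return {
        (x, y): {
            "cond_freq": m / mass_x[x],       # estimate of P(y | x)
            "coverage": m / mass_y[y],        # share of outcome y found in x
        }
        for (x, y), m in mass.items()
    }


def winner_and_edge(table, x):
    """Most likely outcome for profile x and its edge over the runner-up."""
    ranked = sorted(
        ((cell["cond_freq"], y) for (px, y), cell in table.items() if px == x),
        reverse=True,
    )
    best_p, best_y = ranked[0]
    # Assumption: with no runner-up observed, the edge is reported as infinite.
    edge = best_p / ranked[1][0] if len(ranked) > 1 else float("inf")
    return best_y, edge


# Hypothetical data: y = 1 for convergence, y = 0 for divergence.
records = (
    [((1, 0), 1)] * 3 + [((1, 0), 0)] * 1
    + [((0, 0), 1)] * 1 + [((0, 0), 0)] * 3
)
table = polling_table(records)
outcome, edge = winner_and_edge(table, (1, 0))
```

In this toy data set, three of the four households with profile (1, 0) converge, so convergence is the winning outcome for that profile, and that profile accounts for three of the four converging households overall.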
