## *1.1. Motivations*

**Citation:** Ponti, A.; Giordani, I.; Mistri, M.; Candelieri, A.; Archetti, F. The "Unreasonable" Effectiveness of the Wasserstein Distance in Analyzing Key Performance Indicators of a Network of Stores. *Big Data Cogn. Comput.* **2022**, *6*, 138. https://doi.org/10.3390/bdcc6040138

Academic Editors: Domenico Talia, Fabrizio Marozzo and Min Chen

Received: 7 October 2022 Accepted: 11 November 2022 Published: 15 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Among the many facets of omni-channel retailing, this paper refers to a set of analytics and decision processes that support the seamless focus of a brand across many channels (in-store, online, mobile, call center or social). Retailers have come to recognize the importance of integrating information and services from the multiple available channels in order to reduce data mismatch, to create a seamless Customer eXperience (CX) and to obtain data-supported insight into the management of a network of stores. It is also important to identify, promote and provide customers with various experiential benefits to enhance both shopping intentions and satisfaction. Although price and convenience are still primary considerations, customers are putting more emphasis on competence in specific categories and on the overall customer experience. This aspect is particularly strong for categories that are highly fragmented or in which advice to customers plays a large role in sales, such as furniture, do-it-yourself products, apparel and consumer electronics. Personalization, meaning the quality of individual attention and tailored service, is largely regarded as the top criterion in evaluating CX. The analysis of customer data, from questionnaires and from the analysis of online behavior, is instrumental in providing personalized services such as customized purchase recommendations, promotion information based on individual preferences and location-based services.

The focus of this paper is on the analysis of CX while considering a multinational retail company operating through a network of stores. To enable this analysis, a number of key performance indicators (KPIs), acquired for each customer through different channels, are associated with the main drivers of the customer experience. It is important to remark that this analysis must be performed from a granular perspective on what a consumer really wants, today and in the future, in order to understand which services/products to offer on which channel. Developing this detailed understanding of consumers requires harnessing consumer data, which should be combined with insight into consumer behavior from interviews and observations. It also requires analytics that can work at the required granular level, gain a clear understanding of consumer expectations and derive a global picture of the strengths and weaknesses of each store. Capturing the full potential of omni-channel retailing requires a cross-channel perspective and the transparency to measure and manage channel interplay, while at the same time obtaining measures and improvement actions for the entire network of stores. More recently, machine learning methods have been gaining importance as a way to turn the wealth of customer data into a richer representation of the CX. It is the opinion of the authors of this paper that, given the growing number of channels and the heterogeneity of customers, the standard statistical approach, which analyzes samples of customer behavior only through parameters such as the average and variance, might capture only a part of the hidden value of the data.

This paper proposes a different approach in which the samples from customer surveys are represented as discrete probability distributions, in particular as histograms or point clouds. In this distributional context, the difference in performance between two stores, considering one KPI, is measured as the distance between two univariate histograms. The method extends naturally to several KPIs considered jointly, leading, for each store, to a multivariate histogram. The statistical and, more recently, the machine learning communities have developed many alternative models to measure the distance between distributions. A general class of distances, known as *f*-divergences, is based on the expected value of a convex function of the ratio of two distributions. Some examples are the Kullback–Leibler divergence (and its symmetrized version, Jensen–Shannon), Hellinger, Total Variation and *χ*-square divergences. In this paper, the focus is on the Wasserstein (WST) distance. Whereas other distances measure pointwise differences in densities (or weights), the WST distance (also known as the optimal transport distance) is a cross-binning distance; this distinction can be summed up by saying that the optimal transport distance is based on horizontal displacement, whereas other distances are based on vertical displacement. Two important elements of the WST theory are the barycenter and WST clustering. The WST barycenter offers a useful synthesis of a set of distributions. A standard clustering method such as *k*-means can be generalized to WST spaces, with WST barycenters playing the role of centroids, which yields *k*-means WST clustering, used to characterize and classify behavioral patterns. In general, WST allows the comparison between two multi-dimensional distributions to be summarized in a single metric that uses all the information in the distributions. Moreover, the WST distance is generally well defined and provides an interpretable distance metric between distributions.
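The contrast between cross-binning and bin-by-bin distances, and the point that mean and variance alone can hide differences, can be illustrated with a minimal sketch. It assumes SciPy is available and uses a hypothetical 1-to-5 answer scale with made-up store histograms; none of the data or helper names (`histogram`, `wst`, `js`, `mean_and_variance`) come from the paper, and the 1-Wasserstein distance is used only because it is what `scipy.stats.wasserstein_distance` computes.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import wasserstein_distance

# Hypothetical 1-to-5 answer scale for a single KPI (illustrative, not the paper's data).
SCORES = np.arange(1, 6)


def histogram(counts):
    """Normalize raw answer counts into a discrete probability distribution."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()


def wst(p, q):
    """1-Wasserstein distance between two histograms on the shared score scale."""
    return wasserstein_distance(SCORES, SCORES, u_weights=p, v_weights=q)


def js(p, q):
    """Jensen-Shannon distance (base 2), a bin-by-bin comparison bounded by 1."""
    return jensenshannon(p, q, base=2)


def mean_and_variance(p):
    """Mean and variance of the score under histogram p."""
    mean = float(np.dot(SCORES, p))
    return mean, float(np.dot((SCORES - mean) ** 2, p))


# Three hypothetical stores whose customers all give the same single score.
store_a = histogram([10, 0, 0, 0, 0])  # everyone answers 1
store_b = histogram([0, 10, 0, 0, 0])  # everyone answers 2
store_c = histogram([0, 0, 0, 0, 10])  # everyone answers 5

# Bin-by-bin view: histograms with disjoint supports look equally far apart.
print("JS  A-B, A-C:", js(store_a, store_b), js(store_a, store_c))
# Cross-binning view: the distance grows with how far the mass has to move.
print("WST A-B, A-C:", wst(store_a, store_b), wst(store_a, store_c))

# Two hypothetical stores with the same mean and variance but different shapes.
store_d = histogram([0, 4, 0, 4, 0])
store_e = histogram([1, 0, 6, 0, 1])
print("moments D, E:", mean_and_variance(store_d), mean_and_variance(store_e))
print("WST D-E:", wst(store_d, store_e))
```

With these made-up inputs, the Jensen-Shannon distance cannot tell whether store A is closer to store B or to store C, whereas the WST distance is four times larger for the second pair. Stores D and E share the same mean and variance, yet their WST distance is nonzero, so a summary based only on those two moments would treat them as identical.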

This study was motivated by the emerging need of a multinational retailer to revise its performance measurement system, currently based on the Net Promoter Score (NPS), which has been adopted to rank the 50 stores of its commercial network. The limitations of NPS and the desire to design a new performance measurement system able to deal with multiple KPIs coming from omni-channel customer surveys led us to propose a completely new analytical framework based on multivariate discrete distributions and the Wasserstein distance. Indeed, adopting a more comprehensive system to evaluate the relative performance of each store with respect to the others is a critical decision for the company, because it is the basis for the distribution of a performance-related bonus (on a quarterly basis), which is subject to negotiation with trade unions. Although multi-channel surveys are available, this study focuses on only one specific channel to better evaluate the benefits and limitations of the new framework.

## *1.2. Related Works*

The cornerstone of the implementation of a CX strategy is the metric used to measure the performance of a company. One widely used metric is the Net Promoter Score (NPS) [1], which is associated with customer loyalty and is considered a reliable indicator of a company's future performance.
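As a reminder of how this metric is usually computed, the following minimal sketch applies the standard NPS definition: on a 0-to-10 "likelihood to recommend" scale, the score is the percentage of promoters (ratings 9 or 10) minus the percentage of detractors (ratings 0 to 6). The function name and the two sets of ratings are hypothetical, and it is an assumption that the retailer's survey follows this standard scale.

```python
def net_promoter_score(ratings):
    """Standard NPS: % promoters (9-10) minus % detractors (0-6) on a 0-10 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 0 or r > 10 for r in ratings):
        raise ValueError("ratings must lie on the 0-10 scale")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


# Two hypothetical stores (made-up ratings, not data from the paper).
# Both have 5 promoters, 3 passives and 2 detractors out of 10 answers.
store_x = [10, 10, 10, 10, 10, 7, 7, 7, 6, 6]
store_y = [9, 9, 9, 9, 9, 8, 8, 8, 0, 0]

print(net_promoter_score(store_x), net_promoter_score(store_y))
```

Both hypothetical stores obtain the same score even though their detractors answered 6 in one case and 0 in the other. The score depends only on the three category counts, so the detail inside each category is discarded.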

The author of [2] presented a complete system of performance measurement for an enterprise, based on over twenty years of research and development activities. The system was designed to provide key persons at different units/levels with useful quantitative information: board members to exercise due diligence, leaders to decide where to focus attention next and people to carry out their work well. Later, the author of [3] provided a review of various methods for tackling performance measurement problems. Although technical statistical issues are buried somewhat below the surface, statistical thinking is very much part of the main line of the argument, meaning that performance measurement should be an area attracting serious attention from statisticians. More recently, the authors of [4] revisited the use of NPS as a predictor of sales growth by analyzing data from seven brands operating in the U.S. sportswear industry, measured over five years. Interestingly, the results confirmed that, although the original premises are reasonable, methodological concerns arise when NPS is used as a metric for tracking overall brand health. Only the more recently developed brand health measure of NPS (based on a sample of all potential customers) is effective at predicting future sales growth.

An interesting approach leveraging machine learning to analyze Customer Experience (CX) was proposed in [5,6]. The authors of these works looked beyond the NPS and the Customer SATisfaction score (CSAT) to measure the CX, and they performed a wide comparative evaluation of several machine learning approaches, analyzing the specific case of a telecommunication company and applying a wide set of classification methods to categorize the survey results.

In this paper, we propose a distributional approach to performance evaluation; the performance is measured through KPIs represented as discrete probability distributions whose similarities are computed through the Wasserstein distance. The Wasserstein distance can be traced back to the works of Gaspard Monge [7] and Lev Kantorovich [8]. Recently, also under the name of the Earth Mover Distance (EMD), it has been gaining increasing importance in several fields, such as imaging [9], natural language processing [10] and generative adversarial networks [11]. Important references include [12], which gave a complete mathematical characterization, and [13], which also gave an up-to-date survey of numerical methods. The authors of [14] provided an overview of the Wasserstein space. A specific analysis of its geometry and of geodesic Principal Component Analysis was given in [15]. Specific computational results related to barycenters and clustering were given in [16]. A novel Wasserstein distance and a fast clustering method were proposed in [17]. It should be noted that the cost of computing the WST distance is amplified when computing the barycenters of multivariate distributions, for both computational and theoretical reasons [13].
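For reference, the discrete Kantorovich formulation behind the Wasserstein distance can be written as follows. The notation (support points $x_i$ and $y_j$, weights $a_i$ and $b_j$, ground distance $d$ and order $p \geq 1$) is introduced here only for illustration and is an assumption; it may differ from the notation used elsewhere in this paper. For two discrete distributions $\mu = \sum_{i=1}^{m} a_i \delta_{x_i}$ and $\nu = \sum_{j=1}^{n} b_j \delta_{y_j}$:

$$
W_p(\mu, \nu) = \left( \min_{\gamma \in \Gamma(\mu, \nu)} \sum_{i=1}^{m} \sum_{j=1}^{n} \gamma_{ij} \, d(x_i, y_j)^p \right)^{1/p},
\qquad
\Gamma(\mu, \nu) = \left\{ \gamma \in \mathbb{R}_{+}^{m \times n} : \sum_{j=1}^{n} \gamma_{ij} = a_i, \; \sum_{i=1}^{m} \gamma_{ij} = b_j \right\}.
$$

Each $\gamma_{ij}$ is the amount of mass moved from $x_i$ to $y_j$, so the minimization is a linear program over transport plans. On the real line with $p = 1$, the distance reduces to the area between the two cumulative distribution functions, $W_1(\mu, \nu) = \int_{\mathbb{R}} |F_\mu(t) - F_\nu(t)| \, dt$, which is why univariate comparisons are cheap compared with the multivariate case.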

The Wasserstein distance has also been receiving attention in economic theory, where the key reference is [18], in which it was shown that a number of seemingly unrelated problems can be modelled and solved as optimal transport problems. For the term "unreasonable effectiveness" in the title of this paper, we are indebted to [19]. Some key problems in finance have also been addressed using optimal transport, such as the pricing of financial derivatives [18] and the analysis of robustness in risk management [20]. Other contributions to finance are [21], which provided a Wasserstein-based analysis of stability in finance, and [22], which proposed Wasserstein *k*-means clustering to classify market regimes. An important application domain of the Wasserstein distance is the analysis of distributional robustness. In [23], the authors analyzed Wasserstein-based distributionally robust optimization and its application in machine learning using the Wasserstein metric [24,25]. Two contributions, along the line of stochastic programming, were given in [26], which proposed an approximation of data-driven chance-constrained programs over Wasserstein balls, and in [27], which proposed a distributionally robust two-stage Wasserstein model with recourse. We are not aware of significant applications of the Wasserstein distance in management science. One management-related topic where the Wasserstein distance does enable significant contributions is the design of recommender systems using metric learning [28,29], which has been shown to enable the measurement of uncertainty and the embedding of user/item representations in a low-dimensional space.
