The Use of Modern Robust Regression Analysis with Graphics: An Example from Marketing
Abstract
1. Introduction
2. Some History
3. Multiple Regression
3.1. Motivation
3.2. The Data
- y (response) Loyalty
- Price
- Quality
- Community Outreach
- Trust
- Customer Satisfaction
- Negative Publicity
3.3. The Model: Parametric Regression
3.4. The Fitted Model and Its Residuals
4. The Added Variable Plot and Extensions
4.1. The Added Variable Plot
4.2. Monitoring Tests for Regression Coefficients: Extended Added Variable Plots
4.3. The Monitoring Plot of Added Variable t-Statistics for the Loyalty Data
5. Response Transformation
5.1. Introduction: The Box–Cox Transformation
5.2. Maximum Likelihood Estimation for the Box–Cox Transformation
5.3. An Approximate Score Test
5.4. The Fan Plot
5.5. Initial Robust Transformation of the Loyalty Data
5.6. An Automatic Procedure for the Box–Cox Transformation
5.7. Automatic Robust Transformation of the Loyalty Data
6. Robust Non-Parametric Regression with Transformation of the Response and Explanatory Variables
6.1. AVAS: Additivity and Variance Stabilization
6.2. Generalized Additive Models and the Structure of AVAS
6.3. The Numerical Variance Stabilizing Transformation and the AVAS Algorithm
6.4. Non-Robust Analysis of the Loyalty Data with AVAS
7. Robust Non-Parametric Regression with Response and Explanatory Variable Transformations: RAVAS
7.1. Improvements and Options
7.2. Robust Analysis with RAVAS
8. Interpretation
- Price. Our RAVAS analysis leads to a t value for this variable that is almost twice the value found from regression on the original data given in Table 2, which indicates the primary importance of price. The first part of the curve shows low loyalty for cheap items: price is important, but the brand is not. These products are highly fungible, without any distinguishing characteristics, and choice between them is often strongly conditioned by promotions and rewards programs [45]. For more expensive items, however, loyalty rises as specific characteristics come to be felt as important. The purchase of products with higher price positioning and with particular characteristics is the result of a careful selection process by the consumer; an extreme example is the luxury market [46]. When consumers find the product that meets their needs, they tend to build loyalty and are often unwilling to change product. Furthermore, the increase in loyalty for high-priced goods is strongly conditioned by the need to show off. Loyalty to a brand is strongly influenced by the emotions that come from achieving a social value which, according to the consumer, derives from the product or service.
- Quality. These are data on what people believe they are doing. In fact, quality is not directly measurable by the consumer but is the result of perception. The results show that loyalty increases with perceived quality, in a surprisingly linear way, although with proportionally increased loyalty for products of high perceived quality. As is the case for price, a higher level of quality also includes a psychological component which increases loyalty, so that higher prices are assumed to be a guarantee of higher quality. This is supported by the correlations between the price, quality and loyalty variables in Table 1. The result agrees with, and confirms, studies which found that price promotions on brands with the lowest loyalty rate must be more aggressive to steal loyal consumers through quality. Furthermore, a switch is more probable between fungible products [47].
- Community Outreach. This strategy usually improves corporate image and brand reputation and can work very well in customer relationships. Several studies have recorded an increase in consumer loyalty towards companies sensitive to ethical sponsorships, environmental protection, transparency and social responsibility [48]. In our analysis, the relationship between community outreach and loyalty is roughly quadratic, although the decreasing upper part is relatively sparse. This means that community outreach promotes loyalty up to a certain level, after which such strategies may make customers feel that the firm is more interested in its image than in serving them. This seems to mirror the findings that are emerging, for example, from some ongoing studies of green companies, where consumers are starting to suspect that most awareness initiatives are a mere facade strategy [49].
- Trust. Trust is unimportant for loyalty at low levels; loyalty then increases with trust, up to a point, and finally decreases. To explain this trend, we have to recall once again the distinction between “mass” and “targeted” products mentioned above. Since trust, price and loyalty are strongly correlated, we can say that, for low-priced products, the choice is based on “If I see a bargain I go for it”, which makes the perception of trust unimportant. For medium-priced products, on the other hand, loyalty increases as a function of trust and value [50]. When the value of the goods increases further, they fall into the category of “targeted” goods, those for which a selection process based on emotions prevails, and this leads to neglecting loyalty despite great trust. Another interpretation of the final decrease in loyalty is suggested by a study which, in summary, found that customers with high levels of satisfaction may not especially trust another company that tries to raise their already high level of satisfaction [51].
- Customer Satisfaction. The evidence for this relationship appears obvious. Customer satisfaction is a feeling that derives from a mix of different factors such as the quality of service, the quality of the products and the price level [52]. It is therefore natural to expect a result such as the one obtained, in which low levels of customer satisfaction are accompanied by low levels of loyalty, and loyalty then increases steadily as customer satisfaction increases.
- Negative Publicity. Negative publicity has a strong impact on reputation, which is a loyalty driver. Indeed, when the quality of the product or service is not easy to measure, loyalty is supported more by reputation than by customer satisfaction [53]. Given this, it is quite natural to expect, as our analysis has shown, that reputational collapse will manifest itself as a decrease in loyalty in a surprisingly linear way.
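
The statement under Price that the t value is "almost twice" the one from the original data can be checked from the estimates and standard errors printed in the two regression tables. The short sketch below does only that arithmetic; it assumes that the first regression table is the fit to the original data and the second is the fit after the RAVAS transformation, and the variable names are illustrative.

```python
# Estimate and standard error for Price, copied from the two regression tables.
# Assumption: the first table is the fit to the original data, the second the
# fit after the RAVAS transformation.
price_original = {"estimate": 0.32801, "se": 0.031277}
price_ravas = {"estimate": 0.79922, "se": 0.043762}


def t_stat(row):
    """t statistic for a coefficient: estimate divided by its standard error."""
    return row["estimate"] / row["se"]


t_original = t_stat(price_original)
t_ravas = t_stat(price_ravas)
ratio = t_ravas / t_original

print(f"t (original data): {t_original:.3f}")
print(f"t (after RAVAS):   {t_ravas:.3f}")
print(f"ratio:             {ratio:.2f}")
```

The ratio comes out somewhat below 2, which is consistent with the wording "almost twice".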
9. A Procedure for a Structured Approach to Modern Robust Regression Analysis
9.1. Step 1: Variable Transformation
9.2. Step 2: Robust Variable Selection
9.3. Step 3: Monitoring of Scaled Residuals
9.4. Step 4: Outlier Detection and Removal of Outlying Observations
9.5. Step 5: Check Residuals on the Subset of Clean Units: Heteroskedasticity, Normality and Serial Correlation
10. Other Problems
11. Acknowledgements and Software
- The European Union NextGenerationEU/NRRP, Mission 4 Component 2 Investment 1.5, Call 3277 (30 December 2021), Award 0001052 (23 June 2022), under the project ECS00000033 “Ecosystem for Sustainable Transition in Emilia-Romagna”, Spoke 6 “Ecological Transition Based on HPC and Data Technology”.
- The University of Parma project “Robust statistical methods for the detection of frauds and anomalies in complex and heterogeneous data”.
- The Ministry of Education, University and Research project “Innovative statistical tools for the analysis of large and heterogeneous customs data” (2022LANNKC).
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Tibshirani, R. Estimating transformations for regression via additivity and variance stabilization. J. Am. Stat. Assoc. 1988, 83, 394–405.
- Riani, M.; Atkinson, A.C.; Corbellini, A. Robust transformations for multiple regression via additivity and variance stabilization. J. Comput. Graph. Stat. 2023, 33, 85–100.
- Atkinson, A.C.; Riani, M.; Corbellini, A.; Perrotta, D.; Todorov, V. Robust Statistics Through the Monitoring Approach: Applications in Regression; Springer: Berlin/Heidelberg, Germany, 2025; In press.
- Student. The probable error of a mean. Biometrika 1908, 6, 1–25.
- Fisher, R.A. Statistical Methods for Research Workers; Oliver and Boyd: Edinburgh, UK, 1925.
- Lehmann, E.L. Fisher, Neyman, and the Creation of Classical Statistics; Springer: New York, NY, USA, 2011.
- Draper, N.R.; Smith, H. Applied Regression Analysis; Wiley: New York, NY, USA, 1966.
- Anscombe, F.J. Examination of Residuals. In Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Berkeley, CA, USA, 1961; Volume 1, pp. 1–36.
- Belsley, D.A.; Kuh, E.; Welsch, R.E. Regression Diagnostics; Wiley: New York, NY, USA, 1980.
- Cook, R.D.; Weisberg, S. Residuals and Influence in Regression; Chapman and Hall: London, UK, 1982.
- Atkinson, A.C. Plots, Transformations, and Regression; Oxford University Press: Oxford, UK, 1985.
- Cook, R.D.; Weisberg, S. Applied Regression Including Computing and Graphics; Wiley: New York, NY, USA, 1999.
- Andrews, D.F.; Bickel, P.J.; Hampel, F.R.; Tukey, W.J.; Huber, P.J. Robust Estimates of Location: Survey and Advances; Princeton University Press: Princeton, NJ, USA, 1972.
- Stigler, S.M. The Changing History of Robustness. Am. Stat. 2010, 64, 277–281.
- Cerioli, A.; Farcomeni, A.; Riani, M. Wild adaptive trimming for robust estimation and cluster analysis. Scand. J. Stat. 2019, 46, 235–256.
- Huber, P.J. Robust Statistics; Wiley: New York, NY, USA, 1981.
- Maronna, R.A.; Martin, R.D.; Yohai, V.J. Robust Statistics: Theory and Methods (with R), 2nd ed.; Wiley: Chichester, UK, 2019.
- Rousseeuw, P.J.; Leroy, A.M. Robust Regression and Outlier Detection; Wiley: New York, NY, USA, 1987.
- Tallis, G.M. Elliptical and Radial Truncation in Normal Samples. Ann. Math. Stat. 1963, 34, 940–944.
- Atkinson, A.C.; Riani, M.; Cerioli, A. The Forward Search: Theory and data analysis (with discussion). J. Korean Stat. Soc. 2010, 39, 117–134.
- Riani, M.; Cerioli, A.; Atkinson, A.C.; Perrotta, D. Monitoring Robust Regression. Electron. J. Stat. 2014, 8, 642–673.
- Cerioli, A.; Riani, M.; Atkinson, A.C.; Corbellini, A. The power of monitoring: How to make the most of a contaminated multivariate sample (with discussion). Stat. Methods Appl. 2018, 27, 559–666.
- Berman, B. Developing an effective customer loyalty program. Calif. Manag. Rev. 2006, 49, 123–148.
- Mascarenhas, O.A.; Kesavan, R.; Bernacchi, M. Lasting customer loyalty: A total customer experience approach. J. Consum. Mark. 2006, 23, 397–405.
- Atkinson, A.C.; Riani, M. Forward search added-variable t tests and the effect of masked outliers on model selection. Biometrika 2002, 89, 939–946.
- Cox, D. Nonlinear models, residuals and transformations. Math. Operationsforsch. U Statist. 1977, 8, 3–22.
- Box, G.E.P.; Cox, D.R. An analysis of transformations (with discussion). J. R. Stat. Soc. Ser. B 1964, 26, 211–252.
- Atkinson, A.C.; Riani, M.; Corbellini, A. The Box-Cox transformation: Review and extensions. Stat. Sci. 2021, 36, 239–255.
- Carroll, R.J. Prediction and Power Transformations when the Choice of Power is Restricted to a Finite Set. J. Am. Stat. Assoc. 1982, 77, 908–915.
- Atkinson, A.C. Testing transformations to normality. J. R. Stat. Soc. Ser. B 1973, 35, 473–479.
- Atkinson, A.C.; Riani, M. Robust Diagnostic Regression Analysis; Springer: New York, NY, USA, 2000.
- Atkinson, A.C.; Riani, M. Tests in the fan plot for robust, diagnostic transformations in regression. Chemom. Intell. Lab. Syst. 2002, 60, 87–100.
- Yeo, I.K.; Johnson, R.A. A new family of power transformations to improve normality or symmetry. Biometrika 2000, 87, 954–959.
- Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
- Riani, M.; Atkinson, A.C.; Corbellini, A.; Farcomeni, A.; Laurini, F. Information Criteria for Outlier Detection Avoiding Arbitrary Significance Levels. Econom. Stat. 2022.
- Riani, M.; Atkinson, A.C.; Corbellini, A. Automatic robust Box-Cox and extended Yeo-Johnson transformations in regression. Stat. Methods Appl. 2022, 32, 75–102.
- Hastie, T.J.; Tibshirani, R.J. Generalized Additive Models; Chapman and Hall: London, UK, 1990.
- Riani, M.; Atkinson, A.C.; Corbellini, A. Robust response transformations for generalized additive models via additivity and variance stabilisation. In Selected Papers of 13th Scientific Meeting of Classification and Data Analysis Group—CLADAG 2021; Grilli, L., Lupparelli, M., Rampichini, C., Rocco, E., Vichi, M., Eds.; Springer: Cham, Switzerland, 2023.
- Buja, A.; Hastie, T.; Tibshirani, R. Linear Smoothers and Additive Models. Ann. Stat. 1989, 17, 453–510.
- Friedman, J.; Stuetzle, W. Smoothing of scatterplots. Technical Report ORION 003; Department of Statistics, Stanford University: Stanford, CA, USA, 1982.
- Barlow, R.E.; Bartholomew, D.J.; Bremner, J.M.; Brunk, H.D. Statistical Inference under Order Restrictions; Wiley: Chichester, UK, 1972.
- Hastie, T.; Tibshirani, R. Generalized Additive Models. Stat. Sci. 1986, 1, 297–318.
- Breiman, L. Comment on “Monotone regression splines in action” (Ramsey, 1988). Stat. Sci. 1988, 3, 442–445.
- Bellini, S.; Cardinali, M.G.; Ziliani, C. Building customer loyalty in retailing: Not all levers are created equal. Int. Rev. Retail. Distrib. Consum. Res. 2011, 21, 461–481.
- Lal, R.; Bell, D.E. The impact of frequent shopper programs in grocery retailing. Quant. Mark. Econ. 2003, 1, 179–202.
- Yoo, J.; Park, M. The effects of e-mass customization on consumer perceived value, satisfaction, and loyalty toward luxury brands. J. Bus. Res. 2016, 69, 5775–5784.
- Allender, W.J.; Richards, T.J. Brand loyalty and price promotion strategies: An empirical analysis. J. Retail. 2012, 88, 323–342.
- Singh, J.J.; Iglesias, O.; Batista-Foguet, J.M. Does having an ethical brand matter? The influence of consumer perceived ethicality on trust, affect and loyalty. J. Bus. Ethics 2012, 111, 541–549.
- Hameed, I.; Hyder, Z.; Imran, M.; Shafiq, K. Greenwash and green purchase behavior: An environmentally sustainable perspective. Environ. Dev. Sustain. 2021, 23, 13113–13134.
- Agustin, C.; Singh, J. Curvilinear effects of consumer loyalty determinants in relational exchanges. J. Mark. Res. 2005, 42, 96–108.
- Vlachos, P.A.; Vrechopoulos, A.P.; Pramatari, K. Too much of a good thing: Curvilinear effects in the evaluation of services and the mediating role of trust. J. Serv. Mark. 2011, 25, 440–450.
- Sivadas, E.; Baker-Prewitt, J.L. An examination of the relationship between service quality, customer satisfaction, and store loyalty. Int. J. Retail. Distrib. Manag. 2000, 28, 73–82.
- Selnes, F. An examination of the effect of product performance on brand reputation, satisfaction and loyalty. Eur. J. Mark. 1993, 27, 19–35.
- Cox, D.R.; Donnelly, C.A. Principles of Applied Statistics; Cambridge University Press: Cambridge, UK, 2011.
- Wolstenholme, D.E.; O’Brien, C.M.; Nelder, J.A. GLIMPSE: A knowledge-based front end for statistical analysis. Knowl.-Based Syst. 1988, 1, 173–178.
- Riani, M.; Atkinson, A.C. Robust model selection with flexible trimming. Comput. Stat. Data Anal. 2010, 54, 3300–3312.
- Mallows, C.L. Some comments on Cp. Technometrics 1973, 15, 661–675.
- Freue, G.V.C.; Kepplinger, D.; Salibian-Barrera, M.; Smucler, E. Robust elastic net estimators for variable selection and identification of proteomic biomarkers. Ann. Appl. Stat. 2019, 13, 2065–2090.
- Kepplinger, D. Robust variable selection and estimation via adaptive elastic net S-estimators for linear regression. Comput. Stat. Data Anal. 2023, 183, 107730.
- Durbin, J.; Watson, G.S. Testing for Serial Correlation in Least Squares Regression: I. Biometrika 1950, 37, 409–428.
- Rubin, D.B. Neyman (1923) and Causal Inference in Experiments and Observational Studies. Stat. Sci. 1990, 5, 472–480.
- Cox, D.R. Causality: Some Statistical Aspects. J. R. Stat. Soc. Ser. A 1992, 155, 291–301.
- Bühlmann, P. Invariance, Causality and Robustness. Stat. Sci. 2020, 35, 404–426.
- Torti, F.; Riani, M.; Morelli, G. Semiautomatic robust regression clustering of international trade data. Stat. Methods Appl. 2021, 30, 863–894.
- Kukush, A.; Mandel, I. A validity test for a multivariate linear measurement error model. Model Assist. Stat. Appl. 2024, 19, 97–115.
- Tukey, J. Causation, regression, and path analysis. In Statistics and Mathematics in Biology; Kempthorne, O., Ed.; Iowa State College Press: Ames, IA, USA, 1954; pp. 35–66.

Variable | Loyalty | Price | Quality | Community Outreach | Trust | Customer Satisfaction | Negative Publicity |
---|---|---|---|---|---|---|---|
Loyalty | 1 | 0.727 | 0.713 | 0.182 | 0.755 | 0.524 | −0.45 |
Price | 0.727 | 1 | 0.686 | −0.117 | 0.838 | 0.28 | −0.193 |
Quality | 0.713 | 0.686 | 1 | 0.055 | 0.617 | 0.41 | −0.229 |
Community Outreach | 0.182 | −0.117 | 0.055 | 1 | 0.018 | 0.327 | −0.288 |
Trust | 0.755 | 0.838 | 0.617 | 0.018 | 1 | 0.384 | −0.337 |
Customer Satisfaction | 0.524 | 0.28 | 0.41 | 0.327 | 0.384 | 1 | −0.488 |
Negative Publicity | −0.45 | −0.193 | −0.229 | −0.288 | −0.337 | −0.488 | 1 |

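
The interpretation of Trust relies on trust, price and loyalty being strongly correlated. As a minimal sketch, the correlation matrix above can be entered directly and scanned for the largest off-diagonal entries. The cut-off of 0.7 is an arbitrary choice made for this illustration and does not come from the paper.

```python
import numpy as np

names = ["Loyalty", "Price", "Quality", "Community Outreach",
         "Trust", "Customer Satisfaction", "Negative Publicity"]

# Correlations copied from the table above.
R = np.array([
    [1.000,  0.727,  0.713,  0.182,  0.755,  0.524, -0.450],
    [0.727,  1.000,  0.686, -0.117,  0.838,  0.280, -0.193],
    [0.713,  0.686,  1.000,  0.055,  0.617,  0.410, -0.229],
    [0.182, -0.117,  0.055,  1.000,  0.018,  0.327, -0.288],
    [0.755,  0.838,  0.617,  0.018,  1.000,  0.384, -0.337],
    [0.524,  0.280,  0.410,  0.327,  0.384,  1.000, -0.488],
    [-0.450, -0.193, -0.229, -0.288, -0.337, -0.488, 1.000],
])

# A correlation matrix must be symmetric; this catches transcription slips.
is_symmetric = bool(np.allclose(R, R.T))

threshold = 0.7  # illustrative cut-off, not taken from the paper
strong_pairs = [
    (names[i], names[j], float(R[i, j]))
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(R[i, j]) >= threshold
]

for a, b, r in strong_pairs:
    print(f"{a} / {b}: {r:.3f}")
```

With this cut-off, the pairs that remain all involve Loyalty, Price, Quality or Trust, and the largest single correlation is between Price and Trust.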
Term | Estimate | SE | tStat | p Value |
---|---|---|---|---|
(Intercept) | −2.0312 | 0.18383 | −11.05 | 1.8399 × 10^−27 |
Price | 0.32801 | 0.031277 | 10.487 | 5.5983 × 10^−25 |
Quality | 2.5894 | 0.16721 | 15.486 | 1.043 × 10^−50 |
Community Outreach | 0.75773 | 0.095933 | 7.8986 | 5.016 × 10^−15 |
Customer Satisfaction | 0.99442 | 0.12458 | 7.9822 | 2.6177 × 10^−15 |
Negative Publicity | −0.95476 | 0.089834 | −10.628 | 1.3692 × 10^−25 |

Number of observations: 1711, error degrees of freedom: 1704
Root Mean Squared Error: 0.578
R-squared: 0.742, Adjusted R-Squared: 0.741
F-statistic vs. constant model: 815, p-value = 0

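
The summary lines of the table above are linked by standard least-squares identities, so they can be cross-checked against each other. The sketch below uses only the printed, rounded values; the number of fitted parameters is taken as the difference between the number of observations and the error degrees of freedom. Small discrepancies from the printed F-statistic are to be expected because R-squared is only given to three decimals.

```python
# Summary values copied from the regression table above (rounded as printed).
n_obs = 1711
df_error = 1704
r_squared = 0.742

# Number of fitted parameters, intercept included, implied by the table.
n_params = n_obs - df_error

# Adjusted R-squared penalises R-squared for the number of parameters.
adj_r_squared = 1.0 - (1.0 - r_squared) * (n_obs - 1) / df_error

# F statistic for the fitted model against the constant-only model.
f_stat = (r_squared / (n_params - 1)) / ((1.0 - r_squared) / df_error)

print(f"parameters:         {n_params}")
print(f"adjusted R-squared: {adj_r_squared:.3f}")
print(f"F statistic:        {f_stat:.1f}")
```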
Term | Estimate | SE | tStat | p Value |
---|---|---|---|---|
(Intercept) | −6.5067 × 10^−16 | 0.010712 | −6.0742 × 10^−14 | 1 |
Price | 0.79922 | 0.043762 | 18.263 | 3.8539 × 10^−68 |
Quality | 1.0932 | 0.064689 | 16.9 | 2.5635 × 10^−59 |
Community Outreach | 0.94503 | 0.086075 | 10.979 | 3.8876 × 10^−27 |
Trust | 1.8194 | 0.12527 | 14.523 | 4.0187 × 10^−45 |
Customer Satisfaction | 0.88925 | 0.11701 | 7.5998 | 4.8883 × 10^−14 |
Negative Publicity | 0.9537 | 0.071283 | 13.379 | 7.1151 × 10^−39 |

Number of observations: 1695, error degrees of freedom: 1688
Root Mean Squared Error: 0.441
R-squared: 0.806, Adjusted R-Squared: 0.806
F-statistic vs. constant model: 1.17 × 10^3, p-value = 0

Step | Description | Tools to Use |
---|---|---|
STEP 1 | Variable transformation. | Parametric (fan plot + automatic procedure for finding the best value of λ). Non-parametric (RAVAS + automatic option selection). |
Output | Best value of the transformation parameter for the response. Find observations influential for the transformation. Find the best transformation to work with. Compare the parametric and non-parametric approaches. | |
STEP 2 | Robust variable selection. | Monitoring of added variable t-statistics, candlestick plot or, if the number of variables is very large, robust LASSO. |
Output | Find the effect of influential subsets of units on t-statistics. Find a set of relevant explanatory variables. | |
STEP 3 | Monitoring of scaled residuals. Analysis of FS, S and MM residuals. | Brushing in the monitoring plots to understand the position of outlying residuals and the possible presence of subgroups of units. |
Output | Analysis of the correctness of the model and detection of the optimal value of the breakdown point (bdp) or efficiency to use. | |
STEP 4 | Outlier detection and removal of outlying observations. | Routines for automatic outlier detection. Analysis of the position of the outlying units in the yX plot. |
Output | Find a subset of clean units. | |
STEP 5 | Check residuals on the subset of clean units: heteroskedasticity, normality and serial correlation. Comparison of the parametric and non-parametric approaches. Find whether the linear approach is reasonable. | QQ plots with envelopes, normality plots, autocorrelation tests. |
Output | If some test fails, go to step 2 and restart with another model (e.g., heteroskedastic approach). | |

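
The parametric part of Step 1 can be illustrated with the classical, non-robust Box–Cox calculation: for each value of the transformation parameter on a small grid, the normalized transformed response is regressed on the explanatory variables and the value giving the smallest residual sum of squares is chosen. This is only a sketch of the ordinary least-squares version on simulated data. It is not the robust fan plot or the automatic procedure referred to in the table; the function names, the grid and the simulated data are all assumptions made for the illustration.

```python
import numpy as np


def normalized_boxcox(y, lam):
    """Normalized Box-Cox transformation z(lambda) of a positive response."""
    gm = np.exp(np.mean(np.log(y)))  # geometric mean of y
    if lam == 0:
        return gm * np.log(y)
    return (y ** lam - 1.0) / (lam * gm ** (lam - 1.0))


def residual_ss(y, X, lam):
    """Residual sum of squares from least-squares regression of z(lambda) on X."""
    z = normalized_boxcox(y, lam)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    return float(resid @ resid)


# Simulated data (hypothetical): log(y) is linear in x, so lambda = 0 is correct.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0.0, 4.0, size=n)
X = np.column_stack([np.ones(n), x])
y = np.exp(1.0 + 0.5 * x + rng.normal(scale=0.1, size=n))

# Five commonly used values of the transformation parameter.
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
rss = {lam: residual_ss(y, X, lam) for lam in grid}
best_lambda = min(rss, key=rss.get)

for lam in grid:
    print(f"lambda = {lam:+.1f}  RSS = {rss[lam]:.2f}")
print("selected lambda:", best_lambda)
```

Because the normalized transformation puts all values of the parameter on a comparable scale, the residual sums of squares can be compared directly. A least-squares fit like this one is sensitive to outliers, which is the motivation for the robust tools listed in the table.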
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Riani, M.; Atkinson, A.C.; Morelli, G.; Corbellini, A. The Use of Modern Robust Regression Analysis with Graphics: An Example from Marketing. Stats 2025, 8, 6. https://doi.org/10.3390/stats8010006