Marketing Mix Modeling Using PLS-SEM, Bootstrapping the Model Coefficients

Méndez-Suárez, Mariano

doi:10.3390/math9151832


Department of Market Research and Quantitative Methods, ESIC Business & Marketing School, Pozuelo de Alarcón, 28223 Madrid, Spain

Mathematics 2021, 9(15), 1832; https://doi.org/10.3390/math9151832

Submission received: 8 July 2021 / Revised: 30 July 2021 / Accepted: 2 August 2021 / Published: 3 August 2021

(This article belongs to the Special Issue Partial Least Squares Structural Equation Modeling (PLS-SEM) Applications in Economics and Finance)


Abstract


Partial least squares structural equation modeling (PLS-SEM) uses sampling bootstrapping to calculate the significance of model parameter estimates (e.g., path coefficients and outer loadings). However, when the data are time series, as in marketing mix modeling, sampling bootstrapping produces inconsistencies because the series has an autocorrelation structure and contains seasonal events, such as Christmas or Black Friday, especially in multichannel retailing; this makes the significance analysis of the PLS-SEM model unreliable. The alternative proposed in this research uses maximum entropy bootstrapping (meboot), a technique specifically designed for time series, which maintains the autocorrelation structure and preserves, in the bootstrapped series, the timing of seasonal events or structural changes that occurred in the original series. The results showed that meboot outperformed sampling bootstrapping in terms of the coherence of the bootstrapped data and the quality of the significance analysis.

Keywords:

partial least squares structural equation modeling (PLS-SEM); PLS-SEM bootstrapping; PLS-SEM with time series; marketing mix modeling; maximum entropy bootstrapping

1. Introduction

Marketing mix models use multiple regression to measure marketing effectiveness and efficiency [1]. For multichannel retailers that sell online and offline and advertise in both offline and Internet media, a common approach to modeling the marketing mix (based on conversations with consulting experts) is to chain multiple regression models, i.e., first modeling the impact of advertising on online sales and then using this information to model offline sales. Recent research [2] proposed using partial least squares structural equation modeling (PLS-SEM) to measure the simultaneous impact of advertising in multichannel retail contexts and to measure the effectiveness of different advertising campaigns on web and store sales [3].

PLS-SEM has some desirable properties for marketing mix modeling: it is a causal modeling approach aimed at maximizing the explained variance of the dependent constructs, and, because it is similar to multiple regression analysis, it is appropriate for prediction [4]. Moreover, and importantly, PLS-SEM avoids the problem of factor indeterminacy and provides explicit factor scores [5], allowing latent variable scores measured by one or several indicators to be used in subsequent analyses [6]. Consequently, PLS-SEM is particularly useful for measuring the efficiency of marketing campaigns by attributing sales to each advertising channel and calculating marketing ROI [3].

However, because PLS-SEM does not assume normality, absence of extreme values, or symmetry in the sample data [7], the parametric significance tests usually employed in linear models cannot be applied to test whether outer loadings and path coefficients are significant. Instead, PLS-SEM relies on a nonparametric sampling bootstrapping procedure [8] to test the significance of the estimated coefficients. This procedure repeatedly draws random samples with replacement from the original sample to create bootstrap samples. It is a good procedure for estimating sampling distributions of independent and identically distributed (i.i.d.) random variables [9], even when the i.i.d. assumption is slightly violated [10], for example, when there are changes in the mean or variance (e.g., when a survey is conducted in different countries or with heterogeneous respondents) [11,12].

Although sampling bootstrapping is a proper method for assessing the significance of coefficients in most PLS-SEM applications, it is not recommended for marketing mix time series because these data have an internal structure: sampling bootstrapping can change the dates of events such as Black Friday or Christmas, or introduce several additional events, or none at all, in a given year. It also does not respect the timing of any structural changes the series may contain.

As an alternative to sampling bootstrapping, we propose maximum entropy bootstrapping (meboot) [13], which maintains the basic shape of each time series and its time dependence structure, as summarized by the autocorrelation function (ACF) and the partial autocorrelation function (PACF). Additionally, meboot-bootstrapped series inherit this structure while respecting the dates of special events, such as Black Friday, as well as possible structural changes.

Despite its importance, little research has been done in the area of time series significance analysis using PLS-SEM models, especially with regard to marketing mix analysis. Furthermore, current research does not highlight the relevance and importance of the application of consistent bootstrap methodologies for solving these types of problems; this research makes important contributions by filling this void. For these reasons, the overall aim of this paper is to provide a detailed empirical demonstration of the advantages of the suggested meboot bootstrapping procedure in comparison with sampling bootstrapping to calculate the significance of PLS-SEM model parameter estimates in a time series or marketing mix modeling context. To this end, we based our analysis on standardized data from a European consumer electronics multichannel company [2] containing web and store sales and online and offline advertising activities.

Given this aim, the remainder of this paper is structured as follows. First, the theoretical foundations are explained. Then, the data used in this research are analyzed, both bootstrapping methods are applied, and, finally, the results are discussed.

2. Theoretical Foundation

2.1. PLS-SEM

PLS-SEM is a technique appropriate for solving marketing mix problems even when very complex relationships exist [14] because its optimization algorithm maximizes the explained variance of the model's endogenous constructs, making it especially appropriate for identifying key variables in situations of weak theory [15] or for verifying whether hypothesized relationships, for example, those involving marketing mix model variables, are empirically acceptable [16]. Regarding its statistical properties, PLS-SEM admits single-item constructs without identification or convergence problems [17]; moreover, PLS-SEM models can handle extremely non-normal data with asymmetries and very high levels of skewness, for example, data corresponding to marketing events such as Black Friday. PLS-SEM is also appropriate for the small sample sizes typical of marketing mix models, such as our 120 weekly observations, which correspond to approximately 2.3 years of data.

Early applications of PLS-SEM to marketing mix problems focused on better understanding the direct and cross effects of advertising on sales. One study [18] examined the interaction of radio and print advertising in the opening of checking and savings accounts at a commercial bank, finding evidence of direct and cross effects between both media. More recent research [19] added Internet advertising variables to measure the impact of print advertising and paid search for a service company, finding a crossover effect on online conversions.

More recently, a PLS-SEM marketing mix model [2] showed evidence of the amplifying effect of organic search queries on advertising and, consequently, on the sales of a multichannel retailer. Additionally, a PLS-SEM model was used to calculate the ROI of offline and Internet advertising campaigns [3].

To verify the statistical significance of the PLS-SEM model parameters, the literature proposes using sampling bootstrapping; the next section discusses the reasons.

2.2. Sampling Bootstrapping

The term bootstrapping is inspired by the story of the Baron of Munchausen [20], who explained how he pulled himself and his horse out of a swamp by his own hair, meaning that the Baron saved himself by his own means. In this sense, the homonymous statistical technique developed by Efron [9] is similar because bootstrapping draws conclusions about the characteristics of a population using the sample itself; in other words, given the absence of information about the population, the sample is assumed to be the best estimate of the population [21], making this method very appropriate when, as is the case with PLS-SEM, there is no knowledge about the distribution of the parameters.

To find the empirical sampling distribution of a parameter, bootstrapping generates a number of samples drawn with replacement (5000 is recommended [4]), each containing the same number of observations as the original series so that the resamples mimic the sampling variability of the original sample; for example, if the data contain 120 observations, as in the present research, 5000 samples of 120 observations are generated. In this way, each resample has the same number of elements as the original sample, and sampling with replacement effectively treats the finite sample as an infinite population. For each sample, a PLS-SEM model is estimated and the coefficients of interest are stored, yielding a distribution of 5000 bootstrap estimates for each path coefficient or outer loading of interest. For example, when analyzing the loading λ of an indicator, we obtain 5000 values of the estimate λ*, which are then ordered from smallest to largest:

$$ \lambda^{*}_{(1)} \le \lambda^{*}_{(2)} \le \dots \le \lambda^{*}_{(5000)} \tag{1} $$

Then, the lower and upper bounds of the confidence interval are identified. For a 95% confidence interval, the lower bound is the observation ranked 5000 × 0.025 = 125 and the upper bound is the observation ranked 5000 × 0.975 = 4875:

$$ CI = [\lambda^{*}_{(125)}, \lambda^{*}_{(4875)}] \tag{2} $$

The resulting confidence interval (CI) suggests, with 95% confidence, that the population value of λ lies between $\lambda^{*}_{(125)}$ and $\lambda^{*}_{(4875)}$. Once the confidence interval is calculated, if it does not include 0, the coefficient can be considered significant at the 5% level.
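The percentile rule in Equations (1) and (2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: observations are resampled as (x, y) pairs with replacement, and the statistic is a Pearson correlation between two hypothetical series, used only as a stand-in for a PLS-SEM loading. In the study each resample would instead be passed through a full PLS-SEM estimation (the paper uses the R package plspm), which this sketch does not attempt.

```python
import math
import random
import statistics

def pearson(x, y):
    """Pearson correlation, used here as a stand-in for a PLS-SEM estimate."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_estimates(x, y, stat, b=5000, seed=1):
    """Case (i.i.d.) resampling: draw n index positions with replacement, b times."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(b):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(stat([x[i] for i in idx], [y[i] for i in idx]))
    return estimates

def percentile_ci(estimates, level=0.95):
    """Percentile CI: order the B estimates, read off ranks B*alpha/2 and B*(1-alpha/2)."""
    ordered = sorted(estimates)                # lambda*_(1) <= ... <= lambda*_(B)
    b = len(ordered)
    alpha = 1.0 - level
    lo_rank = round(b * alpha / 2)             # 5000 * 0.025 = 125
    hi_rank = round(b * (1 - alpha / 2))       # 5000 * 0.975 = 4875
    return ordered[lo_rank - 1], ordered[hi_rank - 1]  # ranks are 1-based
```

With 5000 estimates, `percentile_ci` returns the 125th and 4875th ordered values, matching Equation (2); a coefficient would be flagged as significant when the returned interval excludes 0.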

However, as stated previously, in many cases the nature of the data makes the distribution of the parameters asymmetric, and the percentile method is then subject to coverage error [7]; for example, a nominal 95% confidence interval may actually be a 90% confidence interval. Hence, it is recommended to construct bias-corrected percentile confidence intervals for statistical inference in PLS-SEM. Bias-corrected and accelerated (BCa) bootstrap confidence intervals address this problem by adjusting for bias and skewness in the bootstrap distribution [22]; for a detailed step-by-step explanation of the methodology in a PLS-SEM context, see [23].
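The BCa adjustment can be sketched in Python under stated assumptions: the bias correction z0 and the jackknife acceleration a follow the textbook formulas of Efron [22], resampling is the ordinary i.i.d. case resampling discussed in this section, and the statistic is any function of a list. This is an illustration only, not the R code of Streukens and Leroi-Werelds [23] used in the study.

```python
import math
import random
import statistics
from statistics import NormalDist

def bca_interval(data, stat, b=2000, level=0.95, seed=7):
    """BCa interval for stat(data), using Efron's (1987) bias and acceleration terms."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = stat(data)
    # i.i.d. case resampling, as in standard PLS-SEM bootstrapping
    boot = sorted(stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(b))

    nd = NormalDist()
    # Bias correction z0: share of bootstrap estimates below the original estimate
    share = sum(t < theta_hat for t in boot) / b
    share = min(max(share, 1.0 / b), 1.0 - 1.0 / b)  # keep inv_cdf finite
    z0 = nd.inv_cdf(share)

    # Acceleration a: based on the skewness of leave-one-out (jackknife) estimates
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    jbar = statistics.fmean(jack)
    num = sum((jbar - j) ** 3 for j in jack)
    den = 6.0 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def adjusted(q):
        """Map a nominal tail probability q to its BCa-adjusted percentile."""
        z = nd.inv_cdf(q)
        return nd.cdf(z0 + (z0 + z) / (1.0 - a * (z0 + z)))

    alpha = 1.0 - level
    lo_q, hi_q = adjusted(alpha / 2), adjusted(1.0 - alpha / 2)
    # Convert adjusted probabilities into 0-based positions in the sorted estimates
    lo_i = min(b - 1, max(0, math.ceil(lo_q * b) - 1))
    hi_i = min(b - 1, max(0, math.ceil(hi_q * b) - 1))
    return boot[lo_i], boot[hi_i]
```

When z0 = 0 and a = 0, `adjusted` returns its input unchanged, so the interval reduces to the plain percentile interval; nonzero values shift both bounds to compensate for bias and skewness.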

For time series data, such as marketing mix model variables, this methodology has a major drawback: resampling individual observations does not, by construction, preserve the order of the data, the autocorrelation structure, or the exact timing of marketing events such as Black Friday. To solve these problems, the present research proposes the maximum entropy bootstrapping methodology, explained next, for analyzing the significance of time series coefficients.
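The loss of time dependence can be checked numerically. The Python sketch below uses a simulated AR(1) series as a hypothetical stand-in for a weekly sales series (the coefficient 0.9 is an assumption; the length 120 matches the weeks analyzed here) and compares the lag-1 autocorrelation before and after i.i.d. resampling.

```python
import random
import statistics

def acf(x, lag):
    """Sample autocorrelation of x at the given lag."""
    m = statistics.fmean(x)
    den = sum((v - m) ** 2 for v in x)
    num = sum((x[t] - m) * (x[t - lag] - m) for t in range(lag, len(x)))
    return num / den

def ar1_series(n, phi, seed):
    """Hypothetical AR(1) series x_t = phi * x_{t-1} + e_t, standing in for weekly sales."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

def iid_resample(x, seed):
    """Case resampling with replacement: every week is drawn independently of its neighbors."""
    rng = random.Random(seed)
    return [rng.choice(x) for _ in x]
```

For a series such as `ar1_series(120, 0.9, seed=5)`, the lag-1 autocorrelation is typically well above 0.5, whereas its average over many i.i.d. resamples is close to 0: the resampled series keep the marginal distribution but not the dependence structure.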

2.3. Maximum Entropy Bootstrapping

Carlstein [24], aware that time series do not satisfy the i.i.d. hypothesis required by bootstrapping and that shuffling the data breaks their internal structure, proposed a solution for stationary time series that bootstraps nonoverlapping blocks of observations instead of individual observations; building on this idea, the methodology was improved with overlapping moving blocks [25,26]. However, even after these improvements, these methods still require stationarity and therefore offer no remedy when this property is violated.
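Both block schemes can be sketched in a few lines of Python, assuming a fixed block length (10 in the usage below, an arbitrary illustrative choice; in practice the block length must reflect the range of dependence and the series length). Within each block the original order, and hence short-range autocorrelation, is kept, but the order of the blocks, and therefore the position of events such as Black Friday, is still randomized.

```python
import random

def nonoverlapping_block_bootstrap(x, block_len, seed=None):
    """Carlstein-style replicate: resample whole disjoint blocks x[0:l], x[l:2l], ..."""
    rng = random.Random(seed)
    n = len(x)
    starts = list(range(0, n - block_len + 1, block_len))  # disjoint block starts
    out = []
    while len(out) < n:
        s = rng.choice(starts)
        out.extend(x[s:s + block_len])
    return out[:n]  # trim the last block so the replicate has length n

def moving_block_bootstrap(x, block_len, seed=None):
    """Moving-block replicate: any start point is allowed, so candidate blocks overlap."""
    rng = random.Random(seed)
    n = len(x)
    out = []
    while len(out) < n:
        s = rng.randrange(n - block_len + 1)
        out.extend(x[s:s + block_len])
    return out[:n]
```

For example, `moving_block_bootstrap(list(range(120)), 10, seed=3)` returns 12 runs of 10 consecutive weeks in random order.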

As a solution to time series bootstrapping, Vinod and López-de-Lacalle [13] proposed the application of the principle of maximum entropy (ME), explained in depth by [27]. According to Vinod [28], ME is a powerful tool to avoid unnecessary distributional assumptions, such as i.i.d. or stationarity assumptions. ME constructs a population of time series, called ensemble Ω, which can include regime switches, gaps, or jump discontinuities. With f(x) being the density function of x_t, the entropy H (Equation (3)) is defined as:

$$ H = E[-\log f(x)] \tag{3} $$

Maximizing the entropy H of a density f(x), defined in terms of Shannon information [29], means finding the smoothest (least informative) probability distribution that satisfies the constraints derived from prior knowledge about the mean and variance of the original series. The meboot algorithm constructs segments of the ME density f(x) subject to certain mass- and mean-preserving constraints.
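As a standard worked illustration of Equation (3) (a textbook result, not a description of the meboot implementation), maximizing H subject only to normalization and a fixed mean μ and variance σ² yields the normal density, with Lagrange multipliers η chosen to satisfy the constraints; on a bounded interval [a, b] with only the normalization constraint, the maximizer is the uniform density:

```latex
\begin{aligned}
&\max_{f}\; H(f) = -\int_{-\infty}^{\infty} f(x)\log f(x)\,dx
\quad\text{s.t.}\quad \int f(x)\,dx = 1,\;\; \int x\,f(x)\,dx = \mu,\;\; \int (x-\mu)^{2} f(x)\,dx = \sigma^{2} \\
&\Rightarrow\; f(x) = \exp\!\big(\eta_{0} + \eta_{1}x + \eta_{2}(x-\mu)^{2}\big)
 = \frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\Big(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\Big),
\qquad H = \tfrac{1}{2}\log\!\big(2\pi e\,\sigma^{2}\big) \\
&\text{On } [a,b] \text{ with only } \textstyle\int_{a}^{b} f(x)\,dx = 1:\quad f(x) = \frac{1}{b-a},\qquad H = \log(b-a)
\end{aligned}
```

The uniform case is, to our understanding, consistent with the flat density segments that meboot places between consecutive intermediate points [13].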

The meboot algorithm [13] generates a large number of replicates of the original series (e.g., 5000), which can be used for statistical inference. It applies a "blocking" technique that breaks the time series into nonoverlapping blocks such that the grand mean of all the simulated samples equals the time average of the original series, constructing bootstrap samples, or ensembles, that retain the basic shape and dependence structure of the original data. Figure 1 shows the actual series of web sales used in this research, described in the next section, together with two random ensembles generated with the meboot algorithm.
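The core steps of meboot as described by Vinod and López-de-Lacalle [13] can be sketched in Python: sort the series, place intermediate points halfway between consecutive order statistics, extend both tails by a trimmed mean of absolute consecutive differences, draw sorted values from the resulting piecewise-uniform ME density, and return them to the time positions of the original ranks. This is a simplified sketch, not the meboot R package: it omits the package's extra adjustment of the interval means and its variance-rescaling options, and the 10% trim is an assumed default.

```python
import random
import statistics

def trimmed_mean(values, trim=0.10):
    """Mean after dropping the lowest and highest `trim` share of the values."""
    v = sorted(values)
    k = int(len(v) * trim)
    core = v[k:len(v) - k] or v
    return statistics.fmean(core)

def meboot_replicate(x, rng, trim=0.10):
    """One simplified maximum entropy bootstrap replicate of the series x."""
    n = len(x)
    order = sorted(range(n), key=lambda t: x[t])  # time index of each order statistic
    xs = [x[t] for t in order]                    # order statistics x_(1) <= ... <= x_(n)
    # Intermediate points halfway between consecutive order statistics
    z = [(xs[i] + xs[i + 1]) / 2 for i in range(n - 1)]
    # Tail limits: extend the range by the trimmed mean of |x_t - x_{t-1}|
    m_trm = trimmed_mean([abs(x[t] - x[t - 1]) for t in range(1, n)], trim)
    knots = [xs[0] - m_trm] + z + [xs[-1] + m_trm]  # z_0, ..., z_n

    def quantile(p):
        # Mass 1/n spread uniformly on each [z_k, z_{k+1}]: linear interpolation
        pos = p * n
        k = min(int(pos), n - 1)
        return knots[k] + (pos - k) * (knots[k + 1] - knots[k])

    # Sorted draws from the ME density (quantile() is monotone, so sorting is safe)
    draws = sorted(quantile(rng.random()) for _ in range(n))
    # Reorder: the k-th smallest draw goes to the week of the k-th smallest x
    rep = [0.0] * n
    for k, t in enumerate(order):
        rep[t] = draws[k]
    return rep

def meboot_ensemble(x, reps=5000, trim=0.10, seed=None):
    """Ensemble of `reps` replicates (5000 mirrors the number used in the paper)."""
    rng = random.Random(seed)
    return [meboot_replicate(x, rng, trim) for _ in range(reps)]
```

Because the midpoints of the n equal-mass intervals average exactly to the sample mean, the ensemble's grand mean matches the time average of `x` in expectation; and because ranks are restored, a spike such as Black Friday stays in its original week in every replicate.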

Moreover, the approach can be applied in the presence of structural breaks, such as economic crises or recoveries, as well as jumps due to Black Friday sales, when both offline and online sales may "jump" sharply above the mean. For more information, Vinod [30] provides extensive Monte Carlo evidence that supports the use of meboot in empirical work and suggests that meboot confidence intervals are reliable.

3. Materials and Methods

3.1. Data

To conduct the present research, we used data from Méndez-Suárez and Monfort [2], which contain a time series of 120 weeks from a European consumer electronics multichannel retailer, including investment in offline, Internet, and paid search advertising, Google queries containing the name of the retailer, and online and offline sales. Table 1 presents the descriptive statistics of the standardized values of the original data; some variables, such as online sales, queries, and retargeting, show high levels of skewness and excess kurtosis.

3.2. Methods

To compare the results of sampling versus meboot bootstrapping, we used the PLS-SEM model from [2], depicted in Figure 2. The online and offline media in which the multichannel retailer advertised during the period are represented as two reflective latent constructs; the remaining exogenous variables in the structural model are single-item constructs.

The online advertising latent variable included display, Facebook, retargeting, Twitter, and YouTube, and the offline advertising latent variable included store flyers and TV advertising (Equation (4)).

$$
\begin{aligned}
\text{Online}_{t} &= \text{Display}_{t}\,\lambda_{1} + \text{Facebook}_{t}\,\lambda_{2} + \text{Retargeting}_{t}\,\lambda_{3} + \text{Twitter}_{t}\,\lambda_{4} + \text{YouTube}_{t}\,\lambda_{5} \\
\text{Offline}_{t} &= \text{StoreFlyer}_{t}\,\lambda_{6} + \text{TVAdvertising}_{t}\,\lambda_{7}
\end{aligned}
\tag{4}
$$
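How an outer loading relates to a construct score can be sketched in Python under a simplifying assumption: the latent variable score is a fixed, hand-chosen weighted sum of standardized indicators, rescaled to unit variance, whereas PLS-SEM (e.g., plspm) estimates the weights iteratively. The weights passed to `composite_score` are hypothetical and are not the λ estimates of Equation (4). Given any such score, the loading of a standardized indicator is its correlation with the score.

```python
import statistics

def standardize(v):
    """Center to mean 0 and scale to (population) standard deviation 1."""
    m, s = statistics.fmean(v), statistics.pstdev(v)
    return [(a - m) / s for a in v]

def composite_score(indicators, weights):
    """Hypothetical LV score: weighted sum of standardized indicators, rescaled to unit variance."""
    z = [standardize(col) for col in indicators]
    raw = [sum(w * col[t] for w, col in zip(weights, z)) for t in range(len(z[0]))]
    return standardize(raw)

def outer_loading(indicator, score):
    """Correlation between a standardized indicator and a standardized LV score."""
    zi = standardize(indicator)
    return sum(a * b for a, b in zip(zi, score)) / len(score)
```

For two uncorrelated indicators with equal weights, each loading is 1/√2 ≈ 0.707; for two identical indicators, each loading is 1.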

The structural model contained four endogenous variables (Equation (5)): queries, explained by online and offline advertising; web sales and store sales, both explained by queries, online and offline advertising, paid search, and Christmas; and paid search, explained by queries.

$$
\begin{aligned}
\text{Queries}_{t} &= \text{Online}_{t}\,\beta_{1} + \text{Offline}_{t}\,\beta_{2} \\
\text{WebSales}_{t} &= \text{Queries}_{t}\,\beta_{3} + \text{Online}_{t}\,\beta_{4} + \text{Offline}_{t}\,\beta_{5} + \text{PaidSearch}_{t}\,\beta_{6} + \text{Christmas}_{t}\,\beta_{7} \\
\text{StoreSales}_{t} &= \text{Queries}_{t}\,\beta_{8} + \text{Online}_{t}\,\beta_{9} + \text{Offline}_{t}\,\beta_{10} + \text{PaidSearch}_{t}\,\beta_{11} + \text{Christmas}_{t}\,\beta_{12} \\
\text{PaidSearch}_{t} &= \text{Queries}_{t}\,\beta_{13}
\end{aligned}
\tag{5}
$$
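In PLS-SEM, each line of Equation (5) is estimated by an ordinary least squares regression of the standardized endogenous score on its standardized predictors. The Python sketch below assumes the latent variable scores are already available as lists (hypothetical inputs) and computes the path coefficients β = R_xx⁻¹ r_xy from correlations, together with the R² later used as the fit measure.

```python
import statistics

def corr(a, b):
    """Pearson correlation of two equally long lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

def solve(A, rhs):
    """Gauss-Jordan elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def path_coefficients(y, predictors):
    """Standardized OLS: beta = R_xx^{-1} r_xy and R^2 = sum_j beta_j * r_xy_j."""
    rxx = [[corr(a, b) for b in predictors] for a in predictors]
    rxy = [corr(a, y) for a in predictors]
    beta = solve(rxx, rxy)
    r2 = sum(bj * rj for bj, rj in zip(beta, rxy))
    return beta, r2
```

For instance, the Queries equation would be `path_coefficients(queries, [online, offline])`, returning (β1, β2) and the R² of Queries.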

The PLS-SEM model from Figure 2 was used to bootstrap the latent variable outer loadings and the path coefficients using sampling and meboot; the results are presented in the following section.

4. Empirical Results

To compare the results of sampling and meboot, we drew 5000 bootstrap samples with each method, estimated the PLS-SEM model on each, and calculated bias-corrected and accelerated (BCa) confidence intervals [7]. Bootstrapping of the structural model employed the R [31] packages plspm [32] and meboot [13]. The BCa confidence interval calculation in R followed Streukens and Leroi-Werelds [23]. The heterotrait–monotrait (HTMT) ratio of correlations, used to assess discriminant validity, was computed with the R semTools package [33].

4.1. Correlations

The correlations of the original series and of two random draws from the meboot and sampling bootstraps are shown in Table 2a–c. The correlations between the original and the bootstrapped variables were similar; there were no marked differences suggesting that one method is better than the other or that either method has major flaws that would prevent its use for assessing the significance of the results. Next, we analyze the bootstrapped confidence intervals.

4.2. Reliability, Validity, Structural Model, and Fit Assessment

Following [7], to assess the reflective measurement model, we evaluated convergent validity using the average variance extracted (AVE), internal consistency reliability with Cronbach's α and Jöreskog's ρ, and discriminant validity using the HTMT. The mathematical formulations are given in Equation (6a–d), respectively.

$$
\begin{aligned}
&(a)\ \mathrm{AVE}_{\xi_{j}} = \frac{1}{K_{j}}\sum_{k=1}^{K_{j}}\lambda_{jk}^{2}; \quad
(b)\ \text{Cronbach's } \alpha = \frac{N\cdot\bar{c}}{1+(N-1)\cdot\bar{c}}; \quad
(c)\ \text{Jöreskog's } \rho = \frac{\left(\sum_{i=1}^{N} l_{i}\right)^{2}}{\left(\sum_{i=1}^{N} l_{i}\right)^{2} + \sum_{i=1}^{N}\operatorname{var}(e_{i})} \\
&(d)\ \mathrm{HTMT}_{ij} = \frac{1}{K_{i}K_{j}}\sum_{g=1}^{K_{i}}\sum_{h=1}^{K_{j}} r_{ig,jh} \div \left(\frac{2}{K_{i}(K_{i}-1)}\sum_{g=1}^{K_{i}-1}\sum_{h=g+1}^{K_{i}} r_{ig,ih}\cdot\frac{2}{K_{j}(K_{j}-1)}\sum_{g=1}^{K_{j}-1}\sum_{h=g+1}^{K_{j}} r_{jg,jh}\right)^{\frac{1}{2}}
\end{aligned}
\tag{6}
$$

The AVE of construct $\xi_j$ is defined as the average of the squared loadings $\lambda_{jk}^{2}$ of its $K_j$ reflective indicators. In Cronbach's α, N is the number of lower-order components (i = 1, …, N), and $\bar{c}$ is the average correlation between them. In Jöreskog's ρ, $l_i$ is the loading of lower-order component i on its construct, and $\operatorname{var}(e_i)$ is the variance of the measurement error of component i. As explained by [34], the HTMT of constructs $\xi_i$ and $\xi_j$, with $K_i$ and $K_j$ indicators, respectively, is the average correlation between indicators of different constructs relative to the geometric mean of the average correlations between indicators of the same construct.
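The four criteria in Equation (6) can be computed directly from loadings and an indicator correlation matrix. The Python sketch below assumes standardized indicators (so the error variance in Equation (6c) is 1 − l_i²) and a correlation matrix given as a list of lists, with hypothetical index lists identifying each construct's indicators. In the study, these statistics were bootstrapped with both methods to obtain BCa intervals.

```python
import itertools
import statistics

def ave(loadings):
    """Equation (6a): mean of the squared standardized loadings."""
    return statistics.fmean(l * l for l in loadings)

def cronbach_alpha_std(r, items):
    """Equation (6b): standardized alpha from the mean inter-item correlation c_bar."""
    n = len(items)
    c_bar = statistics.fmean(r[i][j] for i, j in itertools.combinations(items, 2))
    return n * c_bar / (1 + (n - 1) * c_bar)

def joreskog_rho(loadings):
    """Equation (6c) for standardized indicators, where var(e_i) = 1 - l_i**2."""
    s = sum(loadings) ** 2
    return s / (s + sum(1 - l * l for l in loadings))

def htmt(r, items_i, items_j):
    """Equation (6d): mean between-construct correlation divided by the geometric
    mean of the two mean within-construct correlations."""
    hetero = statistics.fmean(r[g][h] for g in items_i for h in items_j)
    mono_i = statistics.fmean(r[g][h] for g, h in itertools.combinations(items_i, 2))
    mono_j = statistics.fmean(r[g][h] for g, h in itertools.combinations(items_j, 2))
    return hetero / (mono_i * mono_j) ** 0.5
```

For example, three indicators loading 0.8 each give AVE = 0.64, and two constructs whose indicators correlate 0.6 within and 0.3 across give HTMT = 0.5.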

Table 3 shows the BCa confidence intervals for the reflective measurement model assessment using both bootstrapping methodologies. For the outer loadings of the latent variables (Table 3a), the two methods agreed on the significance of the loadings, but the intervals were consistently wider with sampling bootstrapping, indicating a much larger dispersion of the results when this methodology is used.

However, the problems become especially severe when assessing the reflective constructs (Table 3b) because the sampling bootstrap intervals are, in all cases, at least three times as wide as the meboot intervals; consequently, under sampling bootstrapping, the latent variables are not validated in terms of AVE, Cronbach's α, and Jöreskog's ρ, and the HTMT criterion is met only by hundredths of a percent.

The confidence intervals of the path coefficients (Table 4a) had similar widths and gave similar significance results for all paths except the path from offline advertising to web sales, for which the sampling bootstrap indicated a non-significant coefficient; that is, under sampling bootstrapping one would conclude that offline advertising does not affect web store sales.

As [35] stated, the meaning of the term fit depends not on whether covariance-based or variance-based SEM is used but on whether confirmatory or explanatory research is performed (see [36]). In explanatory research, as in this case, we want to explain as much of the variation in the dependent variables as possible, so R² is the natural measure of fit. However, as in the assessment of the reflective constructs, the confidence intervals of R² (Table 4b) obtained by sampling bootstrapping were very wide and invalidated the model, whereas the meboot intervals showed high levels of fit, in line with the model results shown in Figure 3.

To understand what actually drives the differences between the bootstrapping methodologies, we need to visually inspect the entire time series. Figure 3 shows the original series and two random replicates each from the sampling and meboot bootstraps, for both online and offline sales. The sampling bootstrapped series added jumps corresponding to events such as Christmas and Black Friday, but at very different times from those in the original series. For example, for offline sales (Figure 3a), a sampling replicate included up to 10 jumps, only one of which coincided with the date of an actual event, while at the times the events actually occurred, the sampling bootstrapped series did not reflect them. In contrast, in the meboot series, the jumps occurred at the same times as in the original series; as expected with maximum entropy modeling, however, some replicates were more pronounced than others.
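The timing problem visible in Figure 3 can be reproduced on a toy series. The Python sketch below builds a hypothetical flat series with two spikes (weeks 47 and 99 in the usage below are arbitrary stand-ins for Black Friday) and compares the weeks above a threshold after i.i.d. resampling and after a rank-preserving replicate. The latter is a deliberately simplified stand-in for meboot's final reordering step (noise added to the values, which are then restored to the original rank positions), not the full meboot algorithm.

```python
import random

def jump_weeks(series, threshold):
    """Weeks (0-based indices) in which the series exceeds the threshold."""
    return [t for t, v in enumerate(series) if v > threshold]

def iid_resample(series, rng):
    """Case resampling with replacement: weeks are drawn independently."""
    return [rng.choice(series) for _ in series]

def rank_preserving_replicate(series, rng, noise=0.1):
    """Simplified stand-in for meboot's last step: perturb the values, sort them,
    and return the k-th smallest to the week of the k-th smallest original value."""
    order = sorted(range(len(series)), key=lambda t: series[t])
    values = sorted(v + rng.gauss(0.0, noise) for v in series)
    out = [0.0] * len(series)
    for k, t in enumerate(order):
        out[t] = values[k]
    return out
```

With spikes of height 4 on a flat baseline and a threshold of 2, the rank-preserving replicate reports jumps in exactly the original weeks, whereas an i.i.d. resample scatters a random number of spikes across arbitrary weeks.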

5. Discussion

PLS-SEM with i.i.d. data has been very successful in areas such as marketing, strategic management, management information systems, production and operations management, and accounting [37], and it is a promising methodology for time series, especially marketing mix modeling [2,3]. However, to succeed in these areas, the traditional method of assessing the significance of the structural model and the outer loadings with sampling bootstrapping should be reconsidered because this method shuffles the data without considering their internal structure, ignoring the order of the sequence, the autocorrelation structure, and the timing of special events.

This study provides a detailed analysis of the consequences of using sampling bootstrapping for time series, especially marketing mix series, and shows the risks of relying on it: the method destroys the internal structure of the series and yields wider confidence intervals for the outer loadings of the models. For time series analyses in PLS-SEM contexts where the exact timing of the bootstrapped observations is essential, as in marketing mix analyses, this study recommends meboot bootstrapping as an alternative and demonstrates its suitability for time series or marketing mix modeling with PLS-SEM.

Additionally, this research contributes to the development of PLS-SEM methodology by providing a technique free of the risks associated with sampling bootstrapping in time series analysis, broadening the scope and accuracy of the methodology in other areas of research. Taken as a whole, the contributions of the present research provide valuable insights into how time series dependencies can be effectively evaluated with PLS-SEM and why it is so relevant to use a bootstrapping technique specifically adapted to time series, one compatible with their time structure, to measure the significance of outer loadings and path coefficients.

The managerial implications of this work are twofold. (1) Practitioners must be very careful when analyzing time series with PLS-SEM if the data are not i.i.d. and smooth in shape, because sampling bootstrapping shuffles the time series and destroys its integrity. In this respect, the present research shows that using sampling bootstrapping for time series involves very high risks, especially in the assessment of the reflective constructs and of path coefficient significance; this finding constitutes one of the main contributions of this article. (2) The meboot bootstrapping procedure respects the internal structure of the data and preserves the timing of special marketing events, making it a trustworthy technique for time series analysis.

The methodology proposed in the present research can be an excellent source of innovation for PLS-SEM methodology, extending its possible application to all areas that use time series analysis for explanatory purposes and that could benefit from PLS-SEM predictive analysis, for example, quality control in industrial processes or the evolution of natural ecosystems.

Three limitations of the present research may become avenues for future research. The model in this study is limited to marketing mix series, and the proposed methodology has not been tested in other time series contexts in which PLS-SEM models may be used. Additionally, the proposed model and meboot bootstrap methodology were tested on a time series of only 120 observations and not on a series with a larger number of observations. In addition, since the model only uses reflective constructs, evaluation of time series models with formative constructs would complement the results of this research.

Funding

This research was funded by ESIC Business and Marketing School, grant number 1-M-2019.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The author declares no conflict of interest.

References

Méndez-Suárez, M.; Estevez, M. Calculation of marketing ROI in marketing mix models, from ROMI, to marketing-created value for shareholders, EVAM. Universia Bus. Rev. 2016, 52, 18–75. [Google Scholar]
Méndez-Suárez, M.; Monfort, A. The amplifying effect of branded queries on advertising in multi-channel retailing. J. Bus. Res. 2020, 112, 254–260. [Google Scholar] [CrossRef]
Méndez-Suárez, M.; Monfort, A. Marketing Attribution in Omnichannel Retailing in Springer Proceedings in Business and Economics; Springer: Cham, Switzerland, 2021; pp. 114–120. ISBN 9783030189105. [Google Scholar]
Hair, J.F.; Ringle, C.M.; Sarstedt, M. PLS-SEM: Indeed a Silver Bullet. J. Mark. Theory Pract. 2011, 19, 139–152. [Google Scholar] [CrossRef]
Fornell, C. A Second Generation of Multivariate Analysis: An Overview; Fornell, C., Ed.; Praeger: New York, NY, USA, 1982. [Google Scholar]
Henseler, J.; Ringle, C.M.; Sinkovics, R.R. The use of partial least squares path modeling in international marketing. Adv. Int. Mark. 2009, 20, 277–319. [Google Scholar]
Hair, J.F.J.; Hult, G.T.; Ringle, C.; Sarstedt, M. A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM), 2nd ed.; SAGE Publications: Thousand Oaks, CA, USA, 2017; ISBN 9781483377445. [Google Scholar]
Efron, B.; Tibshirani, R.J. An Introduction to the Bootstrap; Monographs on Statistics and Applied Probability; Chapman & Hall/CRC: New York, NY, USA, 1993; ISBN 978-0-412-04231-7. [Google Scholar]
Efron, B. Bootstrap Methods: Another Look at the Jackknife. Ann. Stat. 1979, 7, 1403–1433. [Google Scholar] [CrossRef]
Liu, R.Y. Bootstrap Procedures under some Non-I.I.D. Models. Ann. Stat. 1988, 16, 1696–1708. [Google Scholar] [CrossRef]
Richter, N.F.; Hauff, S.; Schlaegel, C.; Gudergan, S.; Ringle, C.M.; Gunkel, M. Using Cultural Archetypes in Cross-cultural Management Studies. J. Int. Manag. 2016, 22, 63–83. [Google Scholar] [CrossRef]
Benítez-Márquez, M.D.; Bermúdez-González, G.; Sánchez-Teba, E.M.; Cruz-Ruiz, E. Exploring the antecedents of cruisers’ destination loyalty: Cognitive destination image and cruisers’ satisfaction. Mathematics 2021, 9, 1218. [Google Scholar] [CrossRef]
Vinod, H.D.; López-de-Lacalle, J. Maximum Entropy Bootstrap for Time Series: The meboot R Package. J. Stat. Softw. 2009, 29, 1–29. [Google Scholar] [CrossRef] [Green Version]
Ramírez-Orellana, A.; Martínez, M.D.C.V.; Grasso, M. Using Higher-Order Constructs to Estimate Health-Disease Status: The Effect of Health System Performance and Sustainability. Mathematics 2021, 9, 1228. [Google Scholar] [CrossRef]
Wold, H. Partial Least Squares; Wiley: New York, NY, USA, 1985; ISBN 0471667196. [Google Scholar]
Chin, W.W. Issues and Opinion on Structural Equation Modeling. MIS Q. 1998, 22, 7–16. [Google Scholar]
Usakli, A.; Kucukergin, K.G. Using partial least squares structural equation modeling in hospitality and tourism. Int. J. Contemp. Hosp. Manag. 2018, 30, 3462–3512.
Jagpal, H.S. Measuring joint advertising effects in multiproduct firms. J. Advert. Res. 1981, 21, 65–69.
Olbrich, R.; Schultz, C.D. Multichannel advertising: Does print advertising affect search engine advertising? Eur. J. Mark. 2014, 48, 1731–1756.
Raspe, R.E. The Surprising Adventures of Baron Munchausen; Standard Ebooks: Nevada County, CA, USA, 1781.
Abdi, H.; Chin, W.W.; Vinzi, V.E.; Russolillo, G.; Trinchera, L. New Perspectives in Partial Least Squares and Related Methods. In Springer Proceedings in Mathematics and Statistics; Abdi, H., Chin, W.W., Esposito Vinzi, V., Russolillo, G., Trinchera, L., Eds.; Springer: New York, NY, USA, 2013; Volume 56, pp. 201–208. ISBN 978-1-4614-8282-6.
Efron, B. Better bootstrap confidence intervals. J. Am. Stat. Assoc. 1987, 82, 171–185.
Streukens, S.; Leroi-Werelds, S. Bootstrapping and PLS-SEM: A step-by-step guide to get more out of your bootstrap results. Eur. Manag. J. 2016, 34, 618–632.
Carlstein, E. The Use of Subseries Values for Estimating the Variance of a General Statistic from a Stationary Sequence. Ann. Stat. 1986, 14, 1171–1179.
Kunsch, H.R. The Jackknife and the Bootstrap for General Stationary Observations. Ann. Stat. 1989, 17, 1217–1241.
Liu, R.Y.; Singh, K. Moving blocks jackknife and bootstrap capture weak dependence. Explor. Limits Bootstrap 1992, 225, 248.
Baldwin, R.A. Use of Maximum Entropy Modeling in Wildlife Research. Entropy 2009, 11, 854–866.
Vinod, H.D. Maximum Entropy Bootstrap Algorithm Enhancements; Discussion Paper Series; Fordham University: Bronx, NY, USA, 2013; Volume 2013-04.
Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423.
Vinod, H. New bootstrap inference for spurious regression problems. J. Appl. Stat. 2016, 43, 317–335.
R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021.
Sanchez, G. PLS Path Modeling with R; Trowchez Editions: Berkeley, CA, USA, 2013.
Jorgensen, T.D.; Pornprasertmanit, S.; Schoemann, A.M.; Rosseel, Y. semTools: Useful Tools for Structural Equation Modeling, Version 0.5-5; R Packag. 2021. Available online: https://cran.r-project.org/web/packages/semTools/semTools.pdf (accessed on 1 August 2021).
Henseler, J.; Ringle, C.M.; Sarstedt, M. A new criterion for assessing discriminant validity in variance-based structural equation modeling. J. Acad. Mark. Sci. 2015, 43, 115–135.
Henseler, J.; Hubona, G.; Ray, P.A. Using PLS path modeling in new technology research: Updated guidelines. Ind. Manag. Data Syst. 2016, 116, 2–20.
Henseler, J. Partial least squares path modeling: Quo vadis? Qual. Quant. 2018, 52, 1–8.
Hair, J.F.; Sarstedt, M.; Hopkins, L.; Kuppelwieser, V.G. Partial least squares structural equation modeling (PLS-SEM): An emerging tool in business research. Eur. Bus. Rev. 2014, 26, 106–121.

Figure 1. Standardized web sales series (EUR) used in this research, described in the next section, plotted together with two random ensembles.
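As a rough illustration of how a maximum entropy bootstrap (meboot) replicate can be generated, the sketch below follows the main steps of Vinod's algorithm as commonly described: sort the series, form intermediate points between consecutive order statistics, extend the tails by a trimmed mean of the absolute first differences, pass sorted uniform draws through the resulting quantile function, and put the draws back at the time positions given by the original ranks. It is a simplified sketch under stated assumptions: it omits the mean-preserving tail adjustment and the other refinements of the R package meboot, and the function names, the 10% trim, and the toy AR(1) series are illustrative choices, not the paper's code or data.

```python
import numpy as np

def trimmed_mean(v, trim=0.10):
    """Mean after dropping the lowest and highest `trim` share of the values."""
    v = np.sort(np.asarray(v, dtype=float))
    k = int(np.floor(trim * v.size))
    return v[k:v.size - k].mean()

def meboot_replicate(x, rng, trim=0.10):
    """One simplified maximum entropy bootstrap replicate (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    T = x.size
    order = np.argsort(x, kind="stable")       # time index of each order statistic
    xs = x[order]                              # sorted values x_(1) <= ... <= x_(T)
    z = (xs[:-1] + xs[1:]) / 2                 # intermediate points z_1 .. z_{T-1}
    m = trimmed_mean(np.abs(np.diff(x)), trim) # trimmed mean of |x_t - x_{t-1}|
    z = np.concatenate(([xs[0] - m], z, [xs[-1] + m]))  # tail limits z_0 and z_T
    # Simplification: uniform density with mass 1/T on each of the T intervals,
    # i.e. a piecewise-linear quantile function through the points (k/T, z_k)
    u = np.sort(rng.uniform(size=T))
    q = np.interp(u, np.linspace(0.0, 1.0, T + 1), z)   # sorted because u is sorted
    # The k-th smallest draw goes to the week that held the k-th smallest value
    out = np.empty(T)
    out[order] = q
    return out

# Toy AR(1) series standing in for a standardized weekly sales series
rng = np.random.default_rng(42)
T = 150
x = np.empty(T)
x[0] = 0.0
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + rng.normal()

ensemble = np.array([meboot_replicate(x, rng) for _ in range(2)])
print(ensemble.shape)
```

Because every replicate inherits the rank order of the original series, peaks such as a Christmas week stay at the same time points, which is the property the abstract attributes to meboot.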

Figure 2. The PLS-SEM model used to compare the results of sampling and meboot bootstrapping. Figure adapted with permission from Méndez-Suárez, M.; Monfort, A. The amplifying effect of branded queries on advertising in multichannel retailing. Journal of Business Research 2020, 112, 254–260. Copyright Elsevier (2020).

Figure 3. Original weekly offline sales (a) and online sales (b) series, each plotted with its sampling bootstrap and meboot counterparts. The horizontal axis represents time in weeks, and the vertical axis represents standardized sales in standard deviation units.
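By contrast with meboot, the sampling bootstrap draws weeks with replacement and ignores their order. The minimal sketch below, on a toy AR(1) series (an assumption, not the paper's data; the function names are hypothetical), compares the lag-1 autocorrelation of the original series with that of one sampling replicate.

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 sample autocorrelation of a 1-D series."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(d[:-1], d[1:]) / np.dot(d, d))

def sampling_replicate(x, rng):
    """Ordinary i.i.d. bootstrap: draw T weeks with replacement, ignoring time order."""
    x = np.asarray(x)
    return x[rng.integers(0, x.size, size=x.size)]

rng = np.random.default_rng(7)
T = 150
x = np.empty(T)
x[0] = 0.0
for t in range(1, T):
    x[t] = 0.9 * x[t - 1] + rng.normal()   # toy, strongly autocorrelated series

boot = sampling_replicate(x, rng)
print(round(lag1_autocorr(x), 2), round(lag1_autocorr(boot), 2))
```

In a PLS-SEM setting the same draw is applied to whole rows (weeks), so contemporaneous relations between variables are roughly kept while the autocorrelation structure, and the timing of seasonal peaks, is lost.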

Table 1. Descriptive statistics of the data.

Variables	Median	Min	Max	Skewness	Kurtosis
Online Sales	−0.2	−0.6	9.0	6.8	54.5
Offline Sales	−0.3	−0.7	5.5	3.4	12.4
Queries	−0.3	−0.8	6.7	4.7	26.7
Paid Search	−0.1	−1.8	5.3	1.5	6.3
Store flyer	0.2	−1.1	2.4	0.2	−1.3
TV advertising	0.1	−1.4	4.6	1.1	3.7
Display	0.0	−1.3	3.5	1.3	2.4
Facebook	−0.2	−1.5	3.7	1.0	1.2
Retargeting	0.0	−1.1	7.6	3.7	25.0
Twitter	0.0	−1.2	5.0	1.8	5.8
YouTube	−0.2	−0.9	3.7	1.3	2.0
Christmas	−0.2	−0.2	5.4	5.1	24.6

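Note: Data are standardized EUR values with a mean of 0 and a standard deviation of 1. Christmas is a binary dummy variable representing Christmas Eve and Epiphany, standardized like the other variables. Kurtosis is reported as excess kurtosis (0 for a normal distribution).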
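The Christmas row of Table 1 follows from standardizing a 0/1 dummy. If a share p of the weeks is flagged, the standardized variable takes only two values, −p/σ and (1 − p)/σ with σ = √(p(1 − p)), and its skewness and excess kurtosis depend on p alone. The sketch below uses population (Bernoulli) moments; the share p = 1/30 is a hypothetical value chosen for illustration, since the number of flagged weeks is not stated here.

```python
import math

def standardized_dummy_stats(p):
    """Values and moments of a 0/1 dummy with share p of ones, after standardization.

    Uses the population moments of a Bernoulli(p) variable.
    """
    sd = math.sqrt(p * (1 - p))
    return {
        "min": (0 - p) / sd,                 # value in ordinary weeks
        "max": (1 - p) / sd,                 # value in flagged weeks
        "skewness": (1 - 2 * p) / sd,
        "excess_kurtosis": (1 - 6 * p * (1 - p)) / (p * (1 - p)),
    }

# Hypothetical share: one flagged week in thirty (not stated in the paper)
stats = standardized_dummy_stats(1 / 30)
print({k: round(v, 1) for k, v in stats.items()})
```

With p = 1/30 the two values round to −0.2 and 5.4, matching the minimum and maximum in Table 1, and the skewness (about 5.2) and excess kurtosis (about 25.0) are close to the reported 5.1 and 24.6; exact agreement is not expected because sample formulas and the true share may differ.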

Table 2. Correlation coefficients of (a) the original time series, (b) one randomly selected meboot replicate of the series, and (c) one randomly selected sampling bootstrap replicate of the series.

(a) Correlations of Original Series
	1	2	3	4	5	6	7	8	9	10	11	12
1 Online Sales	100
2 Offline Sales	76	100
3 Queries	92	75	100
4 Paid Search	69	68	53	100
5 Store Flyers	33	30	38	19	100
6 TV Advertising	23	6	24	6	46	100
7 Display	32	7	38	12	42	66	100
8 Facebook	34	20	27	28	28	42	48	100
9 Retargeting	57	52	44	64	11	8	9	28	100
10 Twitter	32	3	23	19	17	58	64	51	12	100
11 YouTube	36	15	26	26	29	47	46	71	34	52	100
12 Christmas	32	57	29	34	6	2	−1	12	39	−4	12	100
(b) Correlations of One Random Series, Meboot Bootstrapping
	1	2	3	4	5	6	7	8	9	10	11	12
1 Online Sales	100
2 Offline Sales	81	100
3 Queries	80	86	100
4 Paid Search	72	74	60	100
5 Store Flyers	35	37	45	26	100
6 TV Advertising	10	11	22	8	41	100
7 Display	14	28	40	14	38	67	100
8 Facebook	29	32	28	29	29	42	41	100
9 Retargeting	46	40	37	67	13	12	12	30	100
10 Twitter	8	28	20	20	17	55	62	53	12	100
11 YouTube	25	27	24	26	30	43	38	70	35	53	100
12 Christmas	56	18	22	31	4	1	−2	13	32	−4	14	100
(c) Correlations of One Random Series, Sampling Bootstrapping
	1	2	3	4	5	6	7	8	9	10	11	12
1 Online Sales	100
2 Offline Sales	90	100
3 Queries	93	91	100
4 Paid Search	59	65	54	100
5 Store Flyers	39	33	40	18	100
6 TV Advertising	12	4	20	2	42	100
7 Display	41	33	46	23	47	63	100
8 Facebook	31	24	26	31	32	43	53	100
9 Retargeting	50	55	54	74	22	11	33	39	100
10 Twitter	44	23	30	33	28	54	61	59	30	100
11 YouTube	23	11	20	20	32	56	55	71	32	69	100
12 Christmas	5	36	7	34	−15	−19	−7	−4	24	−15	−11	100

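Note: Values are correlation coefficients expressed as percentages. Panels (b) and (c) are each computed on a single, randomly selected bootstrapped replicate of all series.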
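Under sampling bootstrapping, each replicate draws weeks with replacement, so the number of Christmas weeks in a replicate is itself random (binomial) and their positions are scrambled. This may contribute to the very different Christmas correlations in Table 2(c) compared with Table 2(a). The sketch below counts flagged weeks across replicates; the 150-week length and the flagged positions are hypothetical, not the paper's calendar.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 150                                   # hypothetical number of weeks
christmas = np.zeros(T, dtype=int)
christmas[[45, 46, 97, 98, 149]] = 1      # hypothetical flagged weeks

# Ordinary bootstrap: every replicate draws T weeks (rows) with replacement
counts = np.array([
    christmas[rng.integers(0, T, size=T)].sum() for _ in range(1000)
])
print(counts.min(), counts.mean(), counts.max())
```

The average count stays near the original five weeks, but individual replicates can contain noticeably fewer or more Christmas weeks, which changes any correlation that involves the dummy.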

Table 3. Assessment of the reflective measurement model latent variables with meboot and sampling bootstrapping: (a) convergent validity of the outer model, (b) reliability of the outer model, and (c) discriminant validity.

(a) Outer Loading Convergent Validity Bootstrap Results
Indicators	Loadings	95% BCa CI Meboot	CI Amplitude	>0.5?	95% BCa CI Sampling	CI Amplitude	>0.5?
Store flyer	0.93	(0.87, 0.93)	0.10	Yes	(0.75, 0.97)	0.22	Yes
TV advertising	0.75	(0.64, 0.83)	0.14	Yes	(0.58, 0.87)	0.30	Yes
Display	0.65	(0.65, 0.75)	0.09	Yes	(0.24, 0.81)	0.57	No
Facebook	0.78	(0.71, 0.80)	0.11	Yes	(0.53, 0.87)	0.34	Yes
Retargeting	0.66	(0.64, 0.79)	0.15	Yes	(0.50, 0.88)	0.38	Yes
Twitter	0.67	(0.65, 0.76)	0.05	Yes	(0.24, 0.86)	0.62	No
YouTube	0.80	(0.67, 0.82)	0.19	Yes	(0.63, 0.88)	0.25	Yes
Latent Variables	AVE	95% BCa CI Meboot	CI Amplitude	>0.5?	95% BCa CI Sampling	CI Amplitude	>0.5?
Online ad	0.51	(0.51, 0.55)	0.04	Yes	(0.35, 0.62)	0.27	No
Offline ad	0.72	(0.66, 0.76)	0.11	Yes	(0.63, 0.80)	0.17	Yes
(b) Latent Variables Internal Consistency Reliability Bootstrap Results
Latent Variables	Cronbach’s Alpha	95% BCa CI Meboot	CI Amplitude	0.60–0.90?	95% BCa CI Sampling	CI Amplitude	0.60–0.90?
Online ad	0.78	(0.77, 0.80)	0.03	Yes	(0.70, 0.84)	0.14	Yes
Offline ad	0.63	(0.60, 0.71)	0.11	Yes	(0.44, 0.77)	0.34	No
Latent Variables	Jöreskog’s ρ	95% BCa CI Meboot	CI Amplitude	>0.7?	95% BCa CI Sampling	CI Amplitude	>0.7?
Online ad	0.85	(0.85, 0.87)	0.02	Yes	(0.45, 1.00)	0.55	No
Offline ad	0.84	(0.82, 0.87)	0.05	Yes	(0.50, 1.39)	0.89	No
(c) Latent Variables Discriminant Validity Bootstrap Results
Latent Variables	HTMT	95% BCa CI Meboot	CI Amplitude	CI < 1?	95% BCa CI Sampling	CI Amplitude	CI < 1?
Online ad & Offline ad	0.80	(0.73, 0.89)	0.16	Yes	(0.61, 0.99)	0.37	Yes

Note: As per Hair et al. [7], the bootstrap confidence intervals are bias-corrected and accelerated (BCa).
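The point estimates in Table 3 can be reproduced approximately from figures reported elsewhere in this section. The sketch below computes the average variance extracted (mean squared loading), Jöreskog's composite reliability ρ, Cronbach's α for standardized indicators, and the HTMT ratio of Henseler et al. (mean heterotrait correlation divided by the geometric mean of the mean monotrait correlations), using the rounded loadings of Table 3(a) and the indicator correlations of Table 2(a). The assignment of Store flyer and TV advertising to Offline ad, and of the five digital indicators to Online ad, is an assumption inferred from the AVE values; the helper names are hypothetical. The bootstrap intervals would come from recomputing the same quantities on every replicate.

```python
import numpy as np

# Outer loadings from Table 3(a); the grouping into constructs is an assumption
offline_loadings = np.array([0.93, 0.75])                  # Store flyer, TV advertising
online_loadings = np.array([0.65, 0.78, 0.66, 0.67, 0.80]) # Display, Facebook, Retargeting, Twitter, YouTube

def ave(loadings):
    """Average variance extracted: mean of the squared standardized loadings."""
    return float(np.mean(loadings ** 2))

def composite_reliability(loadings):
    """Jöreskog's rho: (sum of loadings)^2 / ((sum)^2 + sum of error variances)."""
    s = loadings.sum()
    return float(s ** 2 / (s ** 2 + np.sum(1 - loadings ** 2)))

# Indicator correlations from Table 2(a), divided by 100, in the order:
# Store flyer, TV, Display, Facebook, Retargeting, Twitter, YouTube
lower = [
    [1.00],
    [0.46, 1.00],
    [0.42, 0.66, 1.00],
    [0.28, 0.42, 0.48, 1.00],
    [0.11, 0.08, 0.09, 0.28, 1.00],
    [0.17, 0.58, 0.64, 0.51, 0.12, 1.00],
    [0.29, 0.47, 0.46, 0.71, 0.34, 0.52, 1.00],
]
R = np.zeros((7, 7))
for i, row in enumerate(lower):
    R[i, : len(row)] = row
R = R + R.T - np.eye(7)   # mirror the lower triangle and keep a unit diagonal

OFFLINE, ONLINE = [0, 1], [2, 3, 4, 5, 6]

def mean_offdiag(R, idx):
    """Mean correlation among the indicators in idx (monotrait correlations)."""
    sub = R[np.ix_(idx, idx)]
    return float(sub[np.triu_indices(len(idx), k=1)].mean())

def standardized_alpha(R, idx):
    """Cronbach's alpha for standardized indicators: k*r / (1 + (k - 1)*r)."""
    k, r = len(idx), mean_offdiag(R, idx)
    return k * r / (1 + (k - 1) * r)

def htmt(R, a, b):
    """Mean heterotrait correlation over the geometric mean of monotrait means."""
    hetero = R[np.ix_(a, b)].mean()
    return float(hetero / np.sqrt(mean_offdiag(R, a) * mean_offdiag(R, b)))

print(round(ave(online_loadings), 2), round(ave(offline_loadings), 2))
print(round(standardized_alpha(R, ONLINE), 2), round(standardized_alpha(R, OFFLINE), 2))
print(round(composite_reliability(online_loadings), 2), round(composite_reliability(offline_loadings), 2))
print(round(htmt(R, ONLINE, OFFLINE), 2))
```

With these rounded inputs the sketch gives AVE ≈ 0.51 and 0.71, α ≈ 0.78 and 0.63, ρ ≈ 0.84 and 0.83, and HTMT ≈ 0.80 (Online ad and Offline ad, respectively), close to Table 3; the small gaps in AVE and ρ are consistent with rounding of the loadings.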

Table 4. Evaluation of the structural model. (a) The model’s regression coefficients and their significance based on meboot and sampling bootstrapping. (b) The model’s predictive accuracy based on meboot and sampling bootstrapping.

(a) Regression Coefficients Bootstrap Results
Endogenous Variables	Exogenous Variables	Path Coefficient	95% BCa CI Meboot	CI Amplitude	Significance (p < 0.05)?	95% BCa CI Sampling	CI Amplitude	Significance (p < 0.05)?
Web sales	Online ad	0.14	(0.07, 0.32)	0.25	Yes	(0.05, 0.24)	0.19	Yes
Web sales	Offline ad	−0.05	(−0.14, −0.002)	0.14	Yes	(−0.12, 0.02)	0.14	No
Web sales	Queries	0.75	(0.58, 0.89)	0.31	Yes	(0.62, 0.85)	0.23	Yes
Web sales	Paid Search	0.24	(0.06, 0.32)	0.26	Yes	(0.15, 0.39)	0.23	Yes
Web sales	Christmas	−0.15	(−0.15, 0.13)	0.29	No	(−0.14, 0.12)	0.26	No
Store sales	Online ad	−0.18	(−0.24, −0.04)	0.19	Yes	(−0.36, −0.01)	0.35	Yes
Store sales	Offline ad	0.05	(−0.02, 0.12)	0.14	No	(−0.14, 0.16)	0.30	No
Store sales	Queries	0.52	(0.27, 0.65)	0.38	Yes	(0.42, 0.75)	0.34	Yes
Store sales	Paid Search	0.39	(0.31, 0.52)	0.21	Yes	(0.12, 0.59)	0.46	Yes
Store sales	Christmas	0.32	(0.23, 0.44)	0.21	Yes	(0.08, 0.53)	0.45	Yes
Queries	Online ad	0.38	(0.27, 0.47)	0.20	Yes	(0.12, 0.59)	0.47	Yes
Queries	Offline ad	0.20	(0.13, 0.30)	0.17	Yes	(0.08, 0.35)	0.27	Yes
Paid Search	Queries	0.54	(0.36, 0.62)	0.26	Yes	(0.36, 0.66)	0.30	Yes
(b) Predictive Accuracy of the Structural Model (Explained Variance, R²)
Endogenous Variables	R²	95% BCa CI Meboot	CI Amplitude	95% BCa CI Sampling	CI Amplitude
Queries	0.26	(0.17, 0.30)	0.13	(0.06, 0.36)	0.30
Paid Search	0.29	(0.13, 0.39)	0.25	(0.11, 0.42)	0.31
Store sales	0.79	(0.77, 0.94)	0.17	(0.68, 0.93)	0.25
Web sales	0.92	(0.91, 0.95)	0.04	(0.92, 0.97)	0.04

Note: As per Hair et al. [7], the bootstrap confidence intervals are bias-corrected and accelerated (BCa).
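The intervals in Tables 3 and 4 are BCa intervals in the sense of Efron (1987), and the significance column in Table 4(a) is consistent with the rule "significant at p < 0.05 when the 95% interval excludes zero". The sketch below implements a generic BCa interval from a set of bootstrap replicates (from either scheme) plus jackknife values for the acceleration, and the two table-level helpers. The jackknife step presumes independent observations; how the acceleration was obtained for meboot replicates in the original analysis is not stated in this section, so treat that part as an assumption. The function names and the toy sample are hypothetical.

```python
from statistics import NormalDist
import numpy as np

def bca_interval(theta_hat, boot, jack, alpha=0.05):
    """Bias-corrected and accelerated (BCa) bootstrap interval, after Efron (1987).

    theta_hat: statistic on the original data
    boot:      bootstrap replicates of the statistic
    jack:      leave-one-out (jackknife) values, used for the acceleration
    """
    nd = NormalDist()
    boot = np.asarray(boot, dtype=float)
    jack = np.asarray(jack, dtype=float)
    # Bias correction z0: normal quantile of the share of replicates below theta_hat,
    # clipped so that an extreme share does not give an infinite quantile
    eps = 1.0 / (boot.size + 1)
    z0 = nd.inv_cdf(float(np.clip(np.mean(boot < theta_hat), eps, 1 - eps)))
    # Acceleration a: skewness of the jackknife values
    d = jack.mean() - jack
    a = float(np.sum(d ** 3) / (6.0 * np.sum(d ** 2) ** 1.5))
    # Shift the nominal percentiles, then read them off the bootstrap distribution
    levels = [
        nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
        for z in (nd.inv_cdf(alpha / 2), nd.inv_cdf(1 - alpha / 2))
    ]
    lo, hi = np.quantile(boot, levels)
    return float(lo), float(hi)

def is_significant(ci):
    """Significant at the 5% level when the 95% interval excludes zero."""
    lo, hi = ci
    return lo > 0 or hi < 0

def amplitude(ci):
    """CI amplitude as used in the tables: upper bound minus lower bound."""
    return ci[1] - ci[0]

# Toy usage on a skewed i.i.d. sample (not the paper's data): BCa interval for a mean
rng = np.random.default_rng(0)
x = rng.exponential(size=60)
theta = x.mean()
boot = np.array([rng.choice(x, size=x.size).mean() for _ in range(2000)])
jack = np.array([np.delete(x, i).mean() for i in range(x.size)])
print(bca_interval(theta, boot, jack))

# Offline ad -> Web sales in Table 4(a): meboot interval vs sampling interval
print(is_significant((-0.14, -0.002)), is_significant((-0.12, 0.02)))  # True False
print(round(amplitude((-0.14, -0.002)), 2))                           # 0.14
```

As a quick consistency check on Table 4(b): Paid Search has a single predictor (Queries), and for a standardized single-predictor regression R² equals the squared path coefficient, 0.54² ≈ 0.29, matching the reported value.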


© 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Méndez-Suárez, M. Marketing Mix Modeling Using PLS-SEM, Bootstrapping the Model Coefficients. Mathematics 2021, 9, 1832. https://doi.org/10.3390/math9151832
