**4. Results**

As can be seen in Table 2, most of the variables in every factor resulted in sufficient loadings, indicating that the variables adequately explain the construct of each proposed factor. However, six variables resulted in factor loadings of less than 0.500; they are depicted in the table with a strikethrough line. We decided to exclude them from the regression analysis, i.e., from serving as independent variables affecting the dependent variable, namely, the organic search engine visit percent. Regarding the percent of variance explained by each factor, most factors expressed considerable variability, reaching up to 55%.
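As a minimal illustration of the retention rule described above, the 0.500 cutoff can be applied programmatically. The variable names and loading values below are hypothetical placeholders, not the study's actual data:

```python
# Applying the 0.500 factor-loading cutoff used in the study.
# Variable names and loading values are hypothetical placeholders.
loadings = {
    "indexed_pages": 0.71,
    "internal_links": 0.64,
    "alt_text_ratio": 0.52,
    "flash_elements": 0.43,
}

CUTOFF = 0.500  # retention threshold for a variable's factor loading

retained = [name for name, value in loadings.items() if value >= CUTOFF]
dropped = [name for name, value in loadings.items() if value < CUTOFF]
```

Variables in `retained` would enter the regression as independent variables; those in `dropped` correspond to the struck-through entries in Table 2.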


**Table 2.** Descriptives and internal validity and consistency of the examined factors and their variables.




The underlines depict the factor and its items, the italics depict the sub-factors, and the strikethrough lines depict each dropped variable.

Linear regression analysis returned significant indications (Table 3). Preprocessing was conducted beforehand in order to exclude outliers that could influence the prediction outcomes. We also note that no changes were observed when a hierarchical technique was used. All the produced models have clear statistical significance, with *p* values less than 0.05, with one marginal exception: the Website Loading Speed factor, with a *p* value of 0.061. In terms of R<sup>2</sup> values, the results align with prior studies stating that ranking algorithms involve a massive number of secret variables in the SEO context [20,48]. In fact, Size of Websites explained up to 30.6% of the variability of the response data around its mean; SEO Crawling, up to 17.7%; Website Loading Speed, up to 10.1%; Website Security Condition, up to 18.9%; and User Behavior, up to 29.5%.

**Table 3.** Regression analysis output.


*N* = 171. \* and \*\* indicate statistical significance at the 95% and 99% levels, respectively.

Indeed, the more numerous the variables, the higher the model fit and, consequently, the R<sup>2</sup> values [59]. In other words, we defined a set of variables for each factor; however, further research is required to identify additional variables that play a crucial role in increasing the organic search engine traffic percent.

In terms of the first hypothesis, a significant regression equation was found with *p* = 0.012 and an R<sup>2</sup> of 0.306. This means that the mean value of the organic search traffic percent increases by 1.06% for every percentage point increase in each variable of the Size of Websites factor. For the second hypothesis, a significant regression equation was observed with *p* = 0.021 and an R<sup>2</sup> of 0.177. This means that the mean value of the organic search traffic percent increases by 2.14% for every percentage point increase in the SEO Crawling factor.
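The interpretation of such a coefficient can be illustrated with a small simulation. The data below are entirely synthetic; only the slope of 1.06 (the Size of Websites coefficient) and the sample size of 171 are taken from the text:

```python
import random

random.seed(42)

# Synthetic illustration of the reported relationship: a true slope of
# 1.06 (the Size of Websites coefficient) plus Gaussian noise.
n = 171  # sample size reported under Table 3
x = [random.uniform(0, 100) for _ in range(n)]
y = [1.06 * xi + 20 + random.gauss(0, 5) for xi in x]

# Closed-form ordinary least-squares estimates.
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx            # ~1.06: +1 factor point -> +1.06 traffic points
intercept = mean_y - slope * mean_x
```

The recovered `slope` is what the regression table reports: the expected change in the organic traffic percent per one-point change in the factor score.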

For the third hypothesis, a slightly nonsignificant regression equation was found, with a marginal *p* = 0.061 and an R<sup>2</sup> of 0.101. Even after bootstrapping with 1000 additional resamples, there was no significant change in the *p* value, R<sup>2</sup>, or coefficients. The point estimate suggests that the mean value of the organic search traffic percent increases by 1.11% for every percentage point increase in the Website Loading Speed factor. For the fourth hypothesis, Website Security Condition provided a significant regression equation with *p* = 0.019 and an R<sup>2</sup> of 0.189. As a result, the mean value of the organic search traffic percentage increases by 1.36% for every unit increase in the Website Security Condition factor.
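The bootstrap check mentioned above can be sketched as follows: refit the regression on 1000 resamples drawn with replacement and inspect the spread of the coefficient. The data are synthetic and deliberately noisy; only the coefficient of 1.11 and the 1000-resample count come from the text:

```python
import random

random.seed(0)

# Synthetic noisy data mimicking the marginal Website Loading Speed
# result (illustrative only; not the study's data).
n = 171
x = [random.uniform(0, 100) for _ in range(n)]
y = [1.11 * xi + random.gauss(0, 60) for xi in x]

def ols_slope(xs, ys):
    """Closed-form OLS slope for a single predictor."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return num / sum((a - mx) ** 2 for a in xs)

# Bootstrap: 1000 resamples with replacement, refitting the slope each time.
boot = []
for _ in range(1000):
    idx = [random.randrange(n) for _ in range(n)]
    boot.append(ols_slope([x[i] for i in idx], [y[i] for i in idx]))

boot.sort()
ci_low, ci_high = boot[24], boot[974]  # 95% percentile interval
```

A stable interval across resamples, as the authors report, indicates that the marginal result is not driven by a handful of influential observations.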

For the last hypothesis, User Behavior in the examined cultural websites seems to have the highest impact on the increase in the organic search engine traffic percent. More specifically, a significant regression equation was observed with *p* < 0.001 and an R<sup>2</sup> of 0.295. Thus, the mean value of the organic search traffic percent increases by 3.14% for every percentage point increase in each variable of the User Behavior factor. This constitutes an important research result. Prior research implied that search engines encapsulate user behavior in order to rank websites in their search engine result pages [20,48]. Indeed, the higher the ranking position of a website, the higher the organic search engine traffic percentage that it receives [18,19,60].

#### *Agent-Based Model Development*

The extracted outcomes of the regression statistics resulted in significant implications that could be incorporated into a predictive data-driven agent-based model. The purpose of ABM is to compute and represent each case individually at a micro-level view while taking into consideration temporal changes [53–55]. This is not possible through the aggregated macro-level approach of Fuzzy Cognitive Mapping.

For instance, in ABM, decision-makers are able to estimate the impact of each change in SEO performance and user behavior individually for each website. This advantage provides precise results regarding the impact that each SEO factor has on every website, as well as the percentage variance of organic search traffic that the website receives. As cultural websites differ in their content and in how compatible they are with the SEO factors, different data analytics, both technical and behavioral, can be generated. This means that managerial staff need more or less time to rectify SEO issues and, therefore, to improve user behavior and enhance organic search traffic. In this case, ABM, as a predictive modeling and simulation method, gives risk-free flexibility to decision-makers: they can themselves define the time needed to keep up with the SEO compatibility framework and, thereafter, improve their organic search traffic. This approach combines managers' domain knowledge, as each cultural organization differs in its operations, with the practical insights of the data analytics results.

The first goal of the proposed computational model is to verify that the proposed methods and results can be used to optimize the organic search engine traffic percentage. The second is to decrease the bounce rate, a negative indicator of overall user behavior in the examined cultural heritage websites. In Figure 8, we present the developed ABM, its entities, and its conditions. For model development, AnyLogic ver. 8.5.2 was used, with agent behavior computed in Java source code. We defined a specific time range of 90 days in order to predict and simulate the percentage increase of organic search engine traffic. No additional days were included, as no further crucial percentage variance was observed in the organic search engine traffic or bounce rates.

The model starts in its initial stage with the possibility of entering and visiting the cultural heritage institution websites, an initial point that depicts fundamental aspects of the agent-based development process [54–56]. This is indicated in the first statechart, entitled "Potential Search Engine Users". The transition of users (as agents) among statecharts is computed based on the prior descriptive statistics of the study, such as min, max, mean, and mode, and the outcomes of the regression analysis. The impact level that users receive from the Size of Websites, the Website Loading Speed, and the Website Security is defined by the conditions of the Size of Websites Impact, Website Loading Speed Impact, and Website Security Impact. These three major factors and their defined conditions have a drastic impact on user behavior inside cultural heritage websites. However, as the regression results indicated, the Website Loading Speed factor does not significantly impact the organic traffic percentage, so there is no transition between the two statecharts. At the same time, the Size of Websites, Website Security, and SEO Crawling factors are depicted as statecharts that impact both User Behavior and the final goal, which is a positive influence on the Organic Search Engine Traffic Percentage.

**Figure 8.** A predictive agent-based model for optimization of the organic search engine visit percentage and decrease of the bounce rate.

This kind of impact is computed through Poisson distributions, while setting up the results of the regression as lambda values (λ). The Poisson distribution was selected because our sample
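A minimal sketch of this Poisson setup (not the AnyLogic/Java implementation): each regression coefficient from Table 3 is reused as the λ of a Poisson draw that determines how many agents a factor pushes between statecharts in a simulated day. The per-day agent-count logic is an assumption for illustration only:

```python
import math
import random

random.seed(7)

# Regression coefficients from Table 3 reused as Poisson lambdas.
LAMBDAS = {
    "size_of_websites": 1.06,
    "seo_crawling": 2.14,
    "website_security": 1.36,
    "user_behavior": 3.14,
}

def poisson(lam):
    """Draw a Poisson variate via Knuth's multiplication algorithm."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        p *= random.random()
        if p <= threshold:
            return k - 1

# One simulated day: agents transitioned toward the organic-traffic
# statechart under each factor's influence (illustrative only).
daily_transitions = {factor: poisson(lam) for factor, lam in LAMBDAS.items()}
```

Factors with larger coefficients, such as User Behavior, thus drive proportionally more agent transitions per simulated day.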


Moreover, we define the consequence of low interaction rates and dissatisfied user behavior, namely immediate abandonment, via the bounce rate metric, which is illustrated through the Bounce Rate statechart and computed through the Percent of Bounce Rate condition. That is, users enter the websites but find insufficient content (Size of Websites), slow navigation (Website Loading Speed), and insufficient security during their exploration, so they leave the websites almost immediately after arriving. In Figure 9, we present the outcomes of the predictive agent-based model.

The graph in Figure 9 represents the potential scenario of improving each of the examined factors based on the outcomes of the regression and their impact on the organic search engine traffic percentage and the bounce rate level. Indeed, after the initial days of the run, the model shows an improvement in organic search engine traffic. At the same time, the bounce rate decreases and then holds steady, without any significant sign of increase. Furthermore, the Organic Search Engine Visit Percent does not show any further optimization after Day 50, following a straight line without any fluctuation or change. This happens for two reasons. First, the examined cultural websites receive the rest of their visitors from paid advertising, social networks, direct traffic, email marketing campaigns, and other external websites. Second, even if the managerial staff cover and rectify all the SEO technical compatibility factors, organic search traffic can be optimized only up to ~75%, as depicted in Figure 9. Therefore, new data analytics and regression results are needed in order to provide feedback, update the predictive model, and determine the potential extent of the percentage increase in organic search engine traffic.
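The plateau behavior described above can be caricatured with a toy growth curve. The starting values, rates, and saturating functional form are illustrative assumptions; only the ~75% ceiling, the Day-50 plateau, and the 55% initial bounce rate come from the text:

```python
# Toy trajectory: organic traffic approaches the ~75% ceiling and stalls
# around Day 50, while the bounce rate falls from 55% and then holds.
# All rates and starting points are illustrative assumptions.
DAYS, CEILING = 90, 75.0

organic = [40.0]  # assumed starting organic traffic percent
bounce = [55.0]   # initial bounce rate reported in the study

for day in range(1, DAYS + 1):
    # Gains shrink as the ceiling is approached; improvement stops at Day 50.
    gain = 0.12 * (CEILING - organic[-1]) if day <= 50 else 0.0
    organic.append(organic[-1] + gain)
    bounce.append(max(30.0, bounce[-1] - 0.5))  # assumed 30% floor
```

The flat tail after Day 50 mirrors the straight-line segment of Figure 9: once the rectifiable SEO factors are exhausted, further organic gains require updated analytics and a refreshed model.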

**Figure 9.** Optimization of the organic search engine traffic percentage in a time range of 90 sequential days. The horizontal axis demonstrates the specific time range from 0 up to 90 days of the simulation run. The vertical axis depicts the percentage of organic search engine traffic.

#### **5. Discussion and Future Implications**

The optimization of visibility in cultural heritage websites improves the knowledge that stakeholders receive from them. More specifically, in an open and democratized way, CHI websites increase people's interest in the past and allow them to recognize how their surrounding societal context has changed over time. In this respect, the SEO strategy must be set within the prism of the overall mission of cultural heritage institutions, rather than assigning these functionalities unilaterally to technical staff. This elevates the importance of SEO strategies from the support personnel up to the upper management levels.

In this paper, we proposed a novel methodology that quantifies in a manageable way the impact of several factors on the organic search engine traffic percentage in CHI websites, with the purpose of increasing visibility and findability. Going one step further, this methodology offers the administrators of CHI websites an opportunity to convert data analytics about SEO performance into useful insights and actions for potential optimization of their visibility on the Web. Otherwise, a big data analytics framework without evaluation, analysis, interpretation, and suggestions for further improvement is completely useless [61]. Based on that, we believe that this research establishes a new SEO context of communication, as more and more big data analytics can be retrieved and interpreted while focusing on critical factors and omitting less relevant ones in organic traffic optimization. In this respect, this methodology provides new opportunities both for managers in cultural institutions and for research on this topic.

#### *5.1. Managerial Implications for Cultural Heritage Institutions*

#### 5.1.1. Implications for Optimized Website Performance and Visibility

The proposed model shows validity, reliability, and cohesion as regards the variables and factors that it contains for the evaluation and, hence, optimization of organic search engine traffic. This works as a solid stepping stone for managers to adopt this methodology, evaluating the importance of each factor and focusing precisely on each one for further improvement. However, although the obtained behavioral data analytics demonstrated that CHI websites receive on average up to 62.76% of their total traffic solely from search engines (Table 2), the bounce rate was observed to reach up to 55%. This means that more than one in two visitors abandon the websites almost immediately after their visit.

Based on that, we suggest that marketing managers and administrators of cultural websites focus first on usability improvement, with the purpose of improving user interaction and behavior. The SEO Crawling factor includes variables that can have a crucial impact on usability and user experience in CHI websites: the avoidance of thin or duplicated content, the removal of broken links that confuse users, and proper mobile friendliness are some of these user-centered variables. Moreover, this factor includes variables that have a systemic impact on increasing search engine friendliness to crawlers. The appropriate curation of headings, titles, meta-descriptions, robots.txt, and sitemap files are some of the points that managers should focus on in order to develop favorable conditions for the indexing process of search engines.

In this research, the regression results show that user behavior has the highest impact among the factors that affect the percentage increase of organic search engine traffic (Table 3). Indeed, User Behavior can increase the total percentage of organic search traffic in the examined websites by up to 3.14%. Nevertheless, if administrators do not pay greater attention to aligning their efforts first with the optimization of website usability, then the behavior of users and their experience will negatively affect the percentage of search engine traffic. Therefore, the managerial staff of cultural institutions must not focus only on SEO strategies that aim at ranking optimization; it is more important to improve usability for better engagement between users and content [60,62]. This will positively influence user behavior and, thereafter, provide higher organic traffic percentages.

#### 5.1.2. Utility of the Methodology

The proposed methodology not only supports managerial staff in obtaining an aggregated evaluation of a CHI website; it is also a flexible approach for focusing on the individual performance of specific collections contained in individual webpages that suffer from low visibility and findability on the Web. For instance, administrators could evaluate the SEO performance of specific webpages of cultural content, thus handling the optimization process more efficiently in specific *parts* rather than in the *whole*. This approach addresses a rigorous and challenging task for cultural heritage institutions, as they have to deal with the large size of their content. Indeed, the larger the size of a web-based system, the more complex its manipulation and rectification [12,19,45,54].

In addition, the outcomes of this methodology provide practical and educational implications for cultural institutions, helping them avoid big data frameworks that emphasize data storage over data analysis. As the reliability of the gathered data constitutes a core value for the quality of a well-informed decision-making process [61], website analysts of cultural heritage websites should focus more on big data metrics systems that fit the following:


Therefore, the proposed methodology offers the flexibility to tackle other problematic issues in the online presence of cultural heritage institutions, such as the proper utilization of analytics for social media optimization or cost-effective online paid advertising campaigns. That is, we proceed to the careful delimitation of KPIs; gather, validate, and examine the data analytics that align with the KPIs; and then develop data-driven predictive models for optimization.

#### 5.1.3. Optimized Financing Resource Management

In the EU context, the 2018 Eurostat report [64] depicted a low percentage of expenditure on cultural services, ranging from 0.4% up to 0.8% of GDP. Bearing in mind the reduced financial flexibility of cultural institutions and their limited resources for managing the cultural material that they contain, search engine optimization could be a cost-effective marketing strategy.

In contrast with other digital marketing strategies that increase website visitor numbers but provide poor content curation and usability, SEO constitutes a sustainable digital marketing strategy that focuses on one of the most fundamental aspects of digital marketing: the effectiveness of the landing page. If users *land* on webpages that exhibit usability and proper curation of content, their experience will be better, making any marketing communication strategy more effective. This constitutes a promising approach to reducing paid advertising strategies that do not eventually return the investment due to the minimal interaction of users after visiting websites.

#### *5.2. Research Implications*

The dimension reduction results obtained through principal component analysis indicated that most of these variables are suitable for providing reliable evaluations of website performance and of the impact that they have on user behavior and organic search engine traffic. Nevertheless, even though we included more than 50 variables that impact organic search engine traffic optimization, the regression R<sup>2</sup> values indicate the depth of search engine ranking algorithms and the multidimensionality of the variables they involve. Following these findings, we have started further research efforts to explore and include more variables or factors that probably influence rankings and, hence, the percentage variance in organic search engine visits. On this basis, as big data mining and analytics techniques become more and more sophisticated and impact organizations' decision-making processes in terms of marketing and promotion strategies [62], the reliance of SEO on big data analytics will be discussed in greater detail in future work.

Regarding predictive agent-based model development, from the initial research approaches, in which ABM was a computational method for describing the complexity of a system and its entities, to recent characterizations, it has been referred to as more of an art than a science [56,58,65]. However, as big data analytics expand the opportunities for integrating more and more data into simulation models, the *art* is sidelined. Therefore, new research approaches are being developed to overcome the lack of data and to combine prior domain knowledge and analytics into logical, well-informed, data-driven predictive models.

In this paper, we developed the ABM as a supportive tool that provides feedback to the managers of cultural heritage institutions regarding the impact of several factors on user behavior and the organic search engine traffic percent. An abstraction level that describes the impact of each factor was developed. Nevertheless, predictive models are mostly stable at a given abstraction level but become unstable under larger perturbations when more conditions and entities are included. Therefore, further research is needed to evaluate predictive model efficiency through ABM when expanding the level of abstraction or integrating system dynamics approaches [66].

Furthermore, the results of the study emphasize the necessity of redefining the SEO topic. Apparently, the higher the compatibility with the SEO factors, the higher the rankings and the search engine visit percentage. However, the main aim of search engines is to confirm that they provide the highest-quality content, in the highest volume, in the shortest time to their users, according to their search terms [67]. In this respect, Web developers and content creators should have practical quantified indicators in order to evaluate and optimize their website performance and content. Thus, we redefine SEO: it is not solely the process of rectification for higher search rankings. It is rather a *user-centric* strategy that improves the findability and visibility of information in search results and aims for an integrated user experience inside websites.

**Author Contributions:** All the authors have contributed equally to this research effort in terms of the conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, writing of the original draft, and review and editing. D.K.-M., G.A.G. and D.P.S. contributed to the supervision and project administration of this study. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** Dedicated to those who fought against COVID-19 but did not make it. We heartily thank the guest editors of *Big Data Analytics in Cultural Heritage* for giving us this opportunity. We are also grateful to the reviewers for their vital remarks. Finally, we acknowledge the valuable technical contribution of Nikolaos Lazaridis, who provided cutting-edge hardware that helped us work faster on this project.

**Conflicts of Interest:** The authors declare no conflict of interest.

## **References**


*Big Data Cogn. Comput.* **2020**, *4*, 5


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
