A High-Level Representation of the Navigation Behavior of Website Visitors
Abstract
1. Introduction
1.1. Related Work
1.1.1. Commercial Software
1.1.2. Other Non-Commercial Approaches
1.2. Methodology
2. Identification of Rules in Each Class of Visitors
2.1. Representation of Each Class of Visitors as a Sequence of Symbols
2.1.1. Identification of Relevant Pages
- Filtering of pages of interest: Our marketing partner prepared a list of 298 web pages that they were interested in analyzing. We removed from sessions all web pages that were not pages of interest. We eliminated the sessions that did not visit any page of interest (26%). The dataset was reduced to 37,400 sessions (74%).
- Removal of automatically loaded pages: these pages are not meaningful for marketing experts because they do not represent the intentional navigation behavior of visitors (for example, Java resources necessary for the proper functioning of the site).
- Removal of consecutive repetitions of the same page: most sessions contained pages repeated n times in a row. For example, Home → Home → Home → Home → Login → Control panel → Control panel → Logout. Our information technology partner indicated that, in most cases, this is due to the functionality of the website and is not related to the navigation behavior of visitors. For example, if the visitor fills in a form, the same page may be automatically reloaded whenever the visitor clicks on a different field. Therefore, we reduced consecutive occurrences of the same page to a single occurrence. The previous example would be reduced to Home → Login → Control panel → Logout.
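The three cleaning steps can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in this work: it assumes each session is a list of page names, that the pages of interest and the automatically loaded resources are given as sets, and the function name and sample page names (`app.js`, `External blog`) are hypothetical.

```python
from itertools import groupby


def preprocess_sessions(sessions, pages_of_interest, auto_loaded):
    """Clean raw sessions (lists of page names) with the three steps above."""
    cleaned = []
    for session in sessions:
        # Steps 1 and 2: keep pages of interest, drop automatically loaded resources.
        pages = [p for p in session if p in pages_of_interest and p not in auto_loaded]
        # Step 3: collapse consecutive repetitions of the same page to one occurrence.
        pages = [page for page, _ in groupby(pages)]
        # Sessions that visited no page of interest are eliminated.
        if pages:
            cleaned.append(pages)
    return cleaned


raw = [
    ["Home", "Home", "Home", "Home", "Login", "Control panel", "Control panel", "Logout"],
    ["app.js", "External blog"],  # hypothetical session with no remaining page of interest
]
interest = {"Home", "Login", "Control panel", "Logout", "app.js"}
print(preprocess_sessions(raw, interest, auto_loaded={"app.js"}))
# [['Home', 'Login', 'Control panel', 'Logout']]
```

Because filtering runs before collapsing, a page that is removed between two visits to the same page also leaves a single occurrence, which matches the order in which the steps are listed.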
2.1.2. Representation of Sessions as a Sequence of Symbols
2.1.3. Segmentation of Data in Different Classes of Visitors
- Visitors who made a payment: 7% of the sessions.
- Visitors who started the payment process but did not conclude it: 10% of the sessions.
- Visitors who made a conversion other than making a payment or starting the payment process: 32% of the sessions.
- Visitors who did not perform any conversion: 52% of the sessions.
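A minimal sketch of one way to place each session in exactly one of the four classes. It assumes that each session comes with the set of conversions it recorded and that a completed payment takes precedence over a started payment, which in turn takes precedence over any other conversion; the conversion labels `payment` and `start_payment` and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical conversion labels; real labels depend on how conversions are tracked.
PAYMENT = "payment"
START_PAYMENT = "start_payment"


def assign_class(conversions):
    """Map the set of conversions recorded in one session to a class of visitors."""
    if PAYMENT in conversions:
        return "Made payment"
    if START_PAYMENT in conversions:  # started the payment process but did not pay
        return "Started payment"
    if conversions:
        return "Other conversions"
    return "No conversion"


def segment(sessions):
    """Group (pages, conversions) pairs by class of visitors."""
    classes = defaultdict(list)
    for pages, conversions in sessions:
        classes[assign_class(conversions)].append(pages)
    return dict(classes)


print(assign_class({"payment", "start_payment"}))  # Made payment
```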
2.2. Selection and Implementation of the Compression Algorithm
2.2.1. Selection of the Sequence Mining Algorithm
2.2.2. Sequitur Algorithm
- Digram uniqueness: no pair of adjacent symbols appears more than once in the grammar.
- Rule utility: every rule is used more than once.
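Both properties can be checked mechanically. The sketch below assumes a grammar stored as a Python dict from rule id to a list of symbols, with terminals as strings, references to other rules as ints, and rule 0 as the start rule; the function names are hypothetical. As in Sequitur, overlapping pairs such as the two "a a" pairs in "a a a" are not counted as a repetition.

```python
from collections import Counter


def digram_uniqueness(grammar):
    """True if no pair of adjacent symbols appears more than once in the grammar."""
    seen = {}
    for rid, body in grammar.items():
        for i in range(len(body) - 1):
            digram = (body[i], body[i + 1])
            if digram not in seen:
                seen[digram] = (rid, i)
            elif seen[digram] != (rid, i - 1):  # overlapping pairs ("a a a") do not count
                return False
    return True


def rule_utility(grammar, start=0):
    """True if every rule other than the start rule is used more than once."""
    uses = Counter(s for body in grammar.values() for s in body if isinstance(s, int))
    return all(uses[rid] > 1 for rid in grammar if rid != start)


# Final grammar of the Sequitur trace table for "aghdfghmadfgh" (row 13).
final = {0: ["a", 1, 2, "m", "a", 2], 1: ["g", "h"], 2: ["d", "f", 1]}
print(digram_uniqueness(final), rule_utility(final))  # True True
```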
- Sequence: a string of symbols, e.g., “aghhhhbfababdchdttttyhhs”.
- Rule: a sub-sequence of length at least 2 that appears two or more times in a sequence. The rules obtained with the Sequitur algorithm may be defined in terms of other rules.
- Base rule: a rule that does not contain other rules, e.g., rule 1 = “a b”, rule 2 = “d c”.
- Nested rule: a rule composed of base rule(s), e.g., if rule 3 = “f 1 1 2 h”, it is a nested rule defined in terms of the base rules 1 and 2.
- Expanded rule: the result of recursively unfolding all the rules that are contained in a nested rule, e.g., the nested rule “f 1 1 2 h” is expanded as “f a b a b d c h”.
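The distinction between base, nested, and expanded rules can be illustrated with the example rules above. This sketch assumes rules are stored as a dict from rule number to a list of symbols, where any symbol that is not a rule number is a page; the function names are hypothetical.

```python
def expand(rules, symbol):
    """Recursively unfold a symbol: rule numbers are replaced by their bodies, pages kept."""
    if symbol not in rules:  # a terminal symbol (a page)
        return [symbol]
    return [page for s in rules[symbol] for page in expand(rules, s)]


def is_nested(rules, rule_id):
    """A nested rule contains at least one other rule; a base rule contains none."""
    return any(s in rules for s in rules[rule_id])


rules = {1: ["a", "b"], 2: ["d", "c"], 3: ["f", 1, 1, 2, "h"]}
print(" ".join(expand(rules, 3)))                 # f a b a b d c h
print([r for r in rules if is_nested(rules, r)])  # [3]
```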
2.2.3. Implementation of the Sequitur Algorithm
2.3. Rule Extraction
2.3.1. Rule Finding
- Concatenate all sessions of a given class of visitors, inserting a distinguishing pair of symbols between consecutive sessions.
- Apply the Sequitur algorithm.
- Expand rules.
- Exclude rules that include the pair of symbols mentioned in the first step.
- Compute the frequency of each rule in sessions of the same class of visitors.
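The five steps can be sketched end to end in Python. The `sequitur` function here is a naive re-implementation written only for illustration: it rescans the whole grammar after every appended symbol, so it runs in quadratic rather than linear time, and it is not the implementation used in this work. The separator symbols `<s>` and `</s>`, the function names, and counting possibly overlapping occurrences as the rule frequency are assumptions of this sketch.

```python
from collections import Counter


def sequitur(symbols):
    """Naive Sequitur: after each appended symbol, restore digram uniqueness and rule
    utility by rescanning the grammar. Terminals are strings, rule references are
    ints, and rule 0 is the start rule. Quadratic time; for illustration only."""
    rules = {0: []}
    next_id = 1

    def repeated_digram():
        seen = {}
        for rid, body in rules.items():
            for i in range(len(body) - 1):
                digram = (body[i], body[i + 1])
                if digram not in seen:
                    seen[digram] = (rid, i)
                elif seen[digram] != (rid, i - 1):  # ignore overlaps such as "a a a"
                    return digram, seen[digram], (rid, i)
        return None

    for symbol in symbols:
        rules[0].append(symbol)
        while True:
            found = repeated_digram()
            if found:
                digram, (r1, p1), (r2, p2) = found
                if r1 != 0 and len(rules[r1]) == 2:  # digram is a whole rule: reuse it
                    rules[r2][p2:p2 + 2] = [r1]
                elif r2 != 0 and len(rules[r2]) == 2:
                    rules[r1][p1:p1 + 2] = [r2]
                else:  # new rule; replace the later occurrence first so p1 stays valid
                    rules[r2][p2:p2 + 2] = [next_id]
                    rules[r1][p1:p1 + 2] = [next_id]
                    rules[next_id] = list(digram)
                    next_id += 1
                continue
            uses = Counter(s for body in rules.values() for s in body if isinstance(s, int))
            underused = [rid for rid in rules if rid != 0 and uses[rid] < 2]
            if not underused:
                break
            rid = underused[0]
            body = rules.pop(rid)  # rule utility: inline a rule that is used only once
            for owner in rules.values():
                if rid in owner:
                    i = owner.index(rid)
                    owner[i:i + 1] = body
                    break
    return rules


def expand(rules, symbol):
    """Recursively unfold a symbol into a tuple of pages."""
    if not isinstance(symbol, int):
        return (symbol,)
    return tuple(page for s in rules[symbol] for page in expand(rules, s))


def occurrences(session, rule):
    """Number of (possibly overlapping) contiguous occurrences of a rule in a session."""
    n = len(rule)
    return sum(tuple(session[i:i + n]) == rule for i in range(len(session) - n + 1))


def find_rules(sessions, separator=("<s>", "</s>")):
    """Steps 1 to 5 for the sessions of one class of visitors."""
    stream = []
    for k, session in enumerate(sessions):
        if k:
            stream.extend(separator)  # step 1: distinguishing pair between sessions
        stream.extend(session)
    grammar = sequitur(stream)  # step 2
    expanded = {expand(grammar, rid) for rid in grammar if rid != 0}  # step 3
    kept = {rule for rule in expanded if not set(rule) & set(separator)}  # step 4
    return {rule: sum(occurrences(s, rule) for s in sessions) for rule in kept}  # step 5


print(find_rules([["a", "b", "c"], ["a", "b", "d"], ["x", "a", "b", "c"]]))
```

For these three toy sessions, the rule built from the separator pair is discarded, and the kept rules are (a, b) with 3 occurrences and (a, b, c) with 2. On the string of the Sequitur trace table, `sequitur("aghdfghmadfgh")` yields the expanded rules gh and dfgh; the rule numbers can differ from the table because retired numbers are not reused.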
2.3.2. Inter-Class Analysis
- Percentage of rules found in sessions: given a set of rules, it measures the percentage of those rules that are found in a group of sessions. It allows us to find out if a set of rules describes a specific class of visitors or not. A result of 100% in all classes of visitors for a given set of rules would mean that all those rules were found in the four classes of visitors. Thus, that set of rules would not describe a specific class of visitors.
- Inverse frequency of a rule: it measures the percentage of sessions in which a rule is found at least once. A high percentage indicates that the rule is relevant for describing the navigation behavior of visitors.
- All (100%) of the rules were found in “Made payment” sessions because the rules were extracted from those sessions. We can see that this percentage decreases to approximately 50% for the classes of visitors “Started payment” and “Other conversions”. For the visitors from the class “No conversion”, it reduces to 6%. These results indicate that approximately 50% of the rules specifically describe the navigation behavior of the visitors that belong to the class “Made payment”.
- The highest inverse frequency indicates that one rule was found in up to 91% of sessions that belong to the class of visitors “Made payment”. This metric is lower for the other three classes of visitors. Since we use this metric to measure the rule relevance, we can say that this set of rules is more relevant for the visitors that belong to the class “Made payment”.
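Both metrics can be computed directly from sessions. This is a sketch under two assumptions not stated in the text: a rule is "found" in a session when its expanded pages occur contiguously, and a rule is "found in a group of sessions" when it occurs in at least one of them. The function names and the toy data are hypothetical.

```python
def contains(session, rule):
    """True if the rule (a tuple of pages) occurs contiguously in the session."""
    n = len(rule)
    return any(tuple(session[i:i + n]) == rule for i in range(len(session) - n + 1))


def inverse_frequency(rule, sessions):
    """Percentage of sessions in which the rule is found at least once."""
    return 100 * sum(contains(s, rule) for s in sessions) / len(sessions)


def percentage_of_rules_found(rules, sessions):
    """Percentage of the given rules that occur in at least one of the sessions."""
    return 100 * sum(any(contains(s, r) for s in sessions) for r in rules) / len(rules)


sessions = [["a", "b", "c"], ["a", "b"], ["c", "d"], ["x"]]  # toy data
print(inverse_frequency(("a", "b"), sessions))                        # 50.0
print(percentage_of_rules_found([("a", "b"), ("b", "d")], sessions))  # 50.0
```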
2.3.3. Rule Selection
- Select only nested rules. This eliminates redundancy.
- Select rules with inverse frequency ≥5%. A rule that describes less than 5% of the sessions does not generalize the navigation behavior of visitors; thus, it is not useful for the objectives of our research.
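The two selection criteria can be combined in one pass over the grammar. This sketch assumes the same grammar layout as a dict from rule id (int) to a list of symbols, with pages as strings and rule 0 as the start rule; the function names, the toy grammar, and the toy sessions are hypothetical.

```python
def expand(grammar, symbol):
    """Unfold a rule id (int) into a tuple of pages (strings)."""
    if symbol not in grammar:
        return (symbol,)
    return tuple(page for s in grammar[symbol] for page in expand(grammar, s))


def contains(session, rule):
    """True if the expanded rule occurs contiguously in the session."""
    n = len(rule)
    return any(tuple(session[i:i + n]) == rule for i in range(len(session) - n + 1))


def select_rules(grammar, sessions, min_inverse_frequency=5.0, start=0):
    """Keep nested rules whose expansion is found in at least 5% of the sessions."""
    selected = []
    for rid, body in grammar.items():
        if rid == start or not any(s in grammar for s in body):
            continue  # criterion 1: skip the start rule and base rules
        rule = expand(grammar, rid)
        pct = 100 * sum(contains(s, rule) for s in sessions) / len(sessions)
        if pct >= min_inverse_frequency:  # criterion 2: inverse frequency >= 5%
            selected.append(rule)
    return selected


grammar = {0: [2, 1, 2], 1: ["a", "b"], 2: [1, "c"]}  # toy grammar
sessions = [["a", "b", "c"]] + [["z"]] * 19            # 1 of 20 sessions, i.e., 5%
print(select_rules(grammar, sessions))  # [('a', 'b', 'c')]
```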
3. Representation of Sessions with Rules
3.1. Selection of Rules to Visualize
- The reduction rate of the shrinked session is equal to $1 - \frac{\text{length of the shrinked session}}{\text{length of the original session}}$. The length of the session is reduced by 0.58 (58%) when it is expressed with rules and pages that do not form a rule.
- The reduction rate of the stripped session is equal to $1 - \frac{\text{length of the stripped session}}{\text{length of the original session}}$. The length of the session is reduced by 0.75 (75%) when it is expressed only with rules.
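The two representations and their reduction rates can be sketched as follows. The text does not specify how overlapping rule matches are resolved, so this sketch assumes a greedy left-to-right scan that tries longer rules first; it also assumes rule names never coincide with page names. The function names and the toy session are hypothetical.

```python
def shrink(session, rules):
    """Return the (shrinked, stripped) representations of a session.

    rules maps a rule name to its expanded tuple of pages; occurrences are
    replaced greedily from left to right, trying longer rules first.
    """
    ordered = sorted(rules.items(), key=lambda item: len(item[1]), reverse=True)
    shrinked, i = [], 0
    while i < len(session):
        for name, pages in ordered:
            if tuple(session[i:i + len(pages)]) == pages:
                shrinked.append(name)
                i += len(pages)
                break
        else:  # no rule starts here: keep the page itself
            shrinked.append(session[i])
            i += 1
    stripped = [s for s in shrinked if s in rules]  # rules only
    return shrinked, stripped


def reduction_rate(reduced, original):
    """1 - (length of the reduced session / length of the original session)."""
    return 1 - len(reduced) / len(original)


session = ["a", "b", "c", "a", "b", "d"]
shrinked, stripped = shrink(session, {"R1": ("a", "b")})
print(shrinked, stripped)  # ['R1', 'c', 'R1', 'd'] ['R1', 'R1']
print(round(reduction_rate(shrinked, session), 2), round(reduction_rate(stripped, session), 2))  # 0.33 0.67
```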
3.2. Selection of the Session Representation to Visualize
4. Results
4.1. Graph Creation
4.1.1. Definitions
- Entry rate of the rule i: this is the number of sessions that start in the rule i divided by the total number of sessions in the class A. It is denoted by $e_i^A$.
- Exit rate of the rule i: this is the number of sessions that end in the rule i divided by the total number of sessions in the class A. It is denoted by $x_i^A$.
- Frequency of the edge $(i, j)$: this is the flow of visits from the rule i to the rule j. It is equal to the number of occurrences of the edge $(i, j)$. It is denoted by $f_{ij}$.
- Out-degree frequency of the rule i: this is the flow of visits that goes out from the rule i. It is the sum of the frequencies of the edges whose source rule is i plus the number of sessions that end in the rule i; that is, $d_i = \sum_j f_{ij} + |A|\,x_i^A$, where $|A|$ is the number of sessions in the class A.
- Weight of the edge $(i, j)$: this is the frequency of the edge divided by the out-degree frequency of the source rule i; that is, $w_{ij} = f_{ij} / d_i$.
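The definitions above can be computed from the sessions of one class once each session is expressed as a sequence of rules. In this sketch, sessions that end in a rule are counted as edges to a virtual "Exit" node, so the out-degree frequency of a rule includes them and the weights leaving each rule sum to 1; the function name, the "Exit" label, and the toy sessions are assumptions.

```python
from collections import Counter


def rule_graph(sessions):
    """Entry and exit rates, edge frequencies, out-degree frequencies and edge weights
    for one class of visitors; each session is a list of rule names."""
    n = len(sessions)
    entries = Counter(s[0] for s in sessions if s)
    exits = Counter(s[-1] for s in sessions if s)
    edges = Counter(pair for s in sessions for pair in zip(s, s[1:]))
    for rule, count in exits.items():  # sessions ending in a rule flow to "Exit"
        edges[(rule, "Exit")] += count
    out_degree = Counter()
    for (source, _), freq in edges.items():
        out_degree[source] += freq
    weights = {edge: freq / out_degree[edge[0]] for edge, freq in edges.items()}
    entry_rate = {rule: c / n for rule, c in entries.items()}
    exit_rate = {rule: c / n for rule, c in exits.items()}
    return entry_rate, exit_rate, edges, out_degree, weights


entry_rate, exit_rate, edges, out_degree, weights = rule_graph([["a", "b"], ["a", "a"], ["b"]])
print(dict(out_degree))  # {'a': 3, 'b': 2}
```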
4.1.2. Calculation Example
- Entry rule: seven sessions started in rule a and three sessions started in rule b.
- Exit rule: two sessions ended in rule a and eight sessions ended in rule b.
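- Edge frequency: 5 from rule a to rule b, 3 from rule a to itself, and 6 from rule b to rule a.
- Entry rate: 7/10 = 0.7 for rule a and 3/10 = 0.3 for rule b.
- Exit rate: 2/10 = 0.2 for rule a and 8/10 = 0.8 for rule b.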
- The calculation of weights is shown in Table 7.
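The arithmetic of this example can be reproduced from the stated counts. In this short check, "Exit" stands for the sessions that end in a rule, which is why the out-degree frequency is 10 for rule a and 14 for rule b; the variable names are illustrative.

```python
# Counts from the calculation example ("Exit" = sessions that end in the rule).
edge_frequency = {("a", "b"): 5, ("a", "a"): 3, ("a", "Exit"): 2, ("b", "a"): 6, ("b", "Exit"): 8}
entries = {"a": 7, "b": 3}
total_sessions = sum(entries.values())  # 10 sessions

out_degree = {}
for (source, _), freq in edge_frequency.items():
    out_degree[source] = out_degree.get(source, 0) + freq  # a: 10, b: 14

weights = {edge: freq / out_degree[edge[0]] for edge, freq in edge_frequency.items()}
for (source, target), w in weights.items():
    print(f"{source} -> {target}: {edge_frequency[(source, target)]}/{out_degree[source]} = {w:.2f}")
print({rule: c / total_sessions for rule, c in entries.items()})  # {'a': 0.7, 'b': 0.3}
```

The printed weights (0.50, 0.30, 0.20 for rule a; 0.43, 0.57 for rule b) match Table 7.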
4.1.3. Graph Example
4.2. Visualization of a Whole Class of Visitors
4.2.1. Relevant Entry and Exit Rules
4.2.2. Most Frequent Path
4.2.3. Rules in Which Conversion Occurs
4.3. Analysis of Specific Rules
4.4. Contrasting Different Classes of Visitors
4.4.1. Contrasting the Exit Rule
4.4.2. Contrasting a Common Rule
4.4.3. Contrasting the Most Frequent Path
5. Discussion
- Both CSW and NCA provide page-by-page detail. Considering that most commercial websites have hundreds of pages, a graph that shows all website sessions in a given period would be (1) extremely long in CSW and (2) uninterpretable in NCA.
- Both CSW and NCA allow us to filter segments of visitors, but this is not enough to have an easy-to-interpret graph. CSW allows us to select the starting webpage in the graph. However, that leads to an incomplete graph. It allows for an analysis of engagement in the selected web pages but does not allow for a visualization of the complete path followed by visitors.
- CSW allows tracking pre-configured events instead of web pages but without the context of all of the web pages visited by the related visitors. Conversely, our approach makes no assumptions about the visitor behavior and provides the rules (sequences of visited pages) in the context of the complete navigation paths.
- Neither CSW nor NCA eases the identification of loops. Our proposed method clearly shows loops, both within a single conversion and between different conversions.
- In CSW and NCA, it is hard to visualize the navigation behavior of a whole class of visitors. Therefore, it would be more difficult to compare different classes of visitors. Conversely, our approach would provide a moderately small graph for each class of visitors, and these graphs are easier to compare. This comparison of classes allows us to answer specific business questions. For example, what is the most common sequence of visited pages on which visitors of two different classes leave the website?
- Generally, achieving business objectives (conversions) involves the visiting of several pages. Some business goals may be partially identified as events in CSW, e.g., the sequence of web pages that a visitor must follow to make a payment. However, visitors may follow longer common sequences that help to reduce the number of nodes. In addition, some common navigation behavior is not predictable. For example, what do visitors who did not finish the payment process have in common?
- Identification of unexpected loops or repetitions that could be avoided with website enhancements. For example, in Figure 7, we can see that 5% of the traffic in the node “Modify product and start payment” loops in this rule. The business expert could investigate the cause, e.g., a technical error, a non-intuitive site design, or a lack of clarity in the information shown.
- Identification of processes in which web traffic is lost. For example, in Figure 7, we can see that there are three rules in which the payment process can start: (1) “Modify product and start payment”, (2) “Consult payment details”, and (3) “Make payments query”. However, the dropout rate is four times higher in the first two than in the third, and both of them have a loop. A call to action in the web pages of rules 1 and 2 could decrease their dropout rate.
- Finding the relationship between two conversions, regardless of whether they are subsequent or not. For example, we found that a high percentage of visitors who dropped out before confirming the payment did not request assistance at any point in their session. Thus, the online helpdesk is underutilized, as it could help to retain customers that leave before payment is finalized. A call to action on the pages prior to payment confirmation could increase the conversion rate.
- Comparing different website versions or different periods at a high level. If a website changes drastically, it can be difficult or even impossible to compare the impact of each page on the user experience. With our method, we can compare the versions at a high level. We could find, for example, how many steps the most common path has and if the new version reduced or increased the loops in navigation, and could compare the shortest path to make a payment versus the most common path to pay.
- Contrasting the navigation behavior of different classes of visitors. For example, new vs. recurrent; male vs. female; or visitors from different countries. All of them are in the context of the whole navigation behavior of the classes of interest. For example, the graphs in Figure 7 and Figure 8 can be compared as discussed in Section 4.4. This comparison would not be easy in two graphs with hundreds of nodes and edges.
6. Conclusions
6.1. Contributions
6.2. Limitations
6.3. Future Work
- The effectiveness of our approach could be tested for improving metrics measurement, website design, and paid marketing effectiveness.
- Software aimed at marketing experts could be useful for autonomously replicating and personalizing the process that we followed. For example, the company could identify the five most relevant conversions and use them for describing the navigation behavior of visitors. Alternatively, the company may find it useful to associate each page of interest with a conversion.
- It would be valuable to extract rules from the high-level visualizations that we obtained. For example, Acosta-Mendoza et al. propose a frequent approximate subgraph mining approach [51], which we could incorporate as the last step of our methodology.
- The use of rules that we propose can be applied after classifying visitors with different methods. We used conversions to classify visitors because we focused on a marketing audience. However, visitors could be classified with other techniques and purposes.
- After identifying rules of interest with our method, some of them could be configured in web analytics software for monitoring (e.g., as events in Google Analytics). Although visitor behavior is dynamic, it could be useful for the marketing expert to monitor some rules autonomously.
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
WAS | Web analytics solutions |
CSW | Commercial software |
NCA | Non-commercial approach |
References
- Bondarenko, S.; Laburtseva, O.; Sadchenko, O.; Lebedieva, V.; Haidukova, O.; Kharchenko, T. Modern Lead Generation in Internet Marketing for the Development of Enterprise Potential. Int. J. Innov. Technol. Explor. Eng. (IJITEE) 2019, 8, 3066–3071.
- Berman, R.; Israeli, A. The Value of Descriptive Analytics: Evidence from Online Retailers. Harvard Business School Working Paper, No. 21-067. 2020, pp. 1–56. Available online: https://www.hbs.edu/faculty/Pages/item.aspx?num=59259 (accessed on 20 January 2021).
- Kotler, P.; Armstrong, G. Principles of Marketing, 12th ed.; Pearson Education: London, UK, 2007.
- Hun, T.K.; Yazdanifard, R. The Impact of Proper Marketing Communication Channels on Consumer’s Behavior and Segmentation Consumers. Asian J. Bus. Manag. 2014, 2, 155–159.
- Kotler, P.; Kartajaya, H.; Setiawan, I. Marketing 4.0. Moving from Traditional to Digital, 3rd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2017.
- Rahman, A.; Dash, S.; Luhach, A.K.; Chilamkurti, N.; Baek, S.; Nam, Y. A Neuro-fuzzy approach for user behaviour classification and prediction. J. Cloud Comput. Adv. Syst. Appl. 2019, 8, 1–15.
- Kandpal, N.; Singh, H.P.; Shekhawat, M.S. Application of Web Usage Mining for Administration and Improvement of Online Counseling Website. Int. J. Appl. Eng. Res. 2019, 14, 1431–1437.
- Bertero, C.; Roy, M.; Sauvanaud, C.; Tredan, G. Experience Report: Log Mining Using Natural Language Processing and Application to Anomaly Detection. In Proceedings of the IEEE 28th International Symposium on Software Reliability Engineering (ISSRE), Toulouse, France, 23–26 October 2017; pp. 351–360.
- Velkumar, K.; Thendral, P. A survey on web mining techniques. In Proceedings of the 2nd International Conference on New Scientific Creations, Osaka, Japan, 7–9 April 2020; pp. 167–173.
- Wang, Y.; Liu, H.; Liu, Q. Application Research of Web Log Mining in the E-commerce. In Proceedings of the Chinese Control And Decision Conference (CCDC), Hefei, China, 22–24 August 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 349–352.
- Google. Google Analytics-Knowledgebase. 2021. Available online: https://developers.google.com/analytics (accessed on 29 January 2021).
- Matomo. Matomo-Open Analytics Platform. 2021. Available online: https://developer.matomo.org (accessed on 20 January 2021).
- Omniture. Omniture Website. 2021. Available online: https://marketing.adobe.com/resources/help (accessed on 15 January 2021).
- Leadfeeder. Leadfeeder Website. 2021. Available online: https://www.leadfeeder.com (accessed on 12 January 2021).
- VWO. VWO Website. 2021. Available online: https://vwo.com (accessed on 18 January 2021).
- Paveai. Paveai Website. 2021. Available online: https://www.paveai.com/referrer-spam-remover/ (accessed on 21 January 2021).
- Woopra. Woopra Website. 2021. Available online: https://www.woopra.com (accessed on 25 January 2021).
- Venugopal, K.R.; Nimbhorkar, S.S. Web Page Recommendations Based Web Navigation Prediction. In Web Recommendations Systems; Springer: Singapore, 2020; pp. 109–130.
- El Aissaoui, O.; El Madani El Alami, Y.; Oughdir, L.; El Allioui, Y. Integrating web usage mining for an automatic learner profile detection: A learning styles-based approach. In Proceedings of the 2018 International Conference on Intelligent Systems and Computer Vision (ISCV), Fez, Morocco, 2–4 April 2018; pp. 1–6.
- Tiwari, S.; Gupta, R.K.; Kashyap, R. To Enhance Web Response Time Using Agglomerative Clustering Technique for Web Navigation Recommendation. In Proceedings of the Computational Intelligence in Data Mining, Honolulu, HI, USA, 27 November–1 December 2017; Advances in Intelligent Systems and Computing. Springer: Singapore, 2019; Volume 711, pp. 659–672.
- Huynh, H.M.; Nguyen, L.T.T.; Vo, B.; Oplatkova, Z.K.; Hong, T.P. Mining Clickstream Patterns Using IDLists. In Proceedings of the 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), Bari, Italy, 6–9 October 2019; pp. 2007–2012.
- Huynh, H.M.; Nguyen, L.T.T.; Vo, B.; Nguyen, A.; Tseng, V.S. Efficient methods for mining weighted clickstream patterns. Exp. Syst. Appl. 2020, 142, 112993.
- Prakash, P.G.O.; Jaya, A. Analyzing and Predicting User Navigation Pattern from Weblogs using Modified Classification Algorithm. Indones. J. Electr. Eng. Comput. Sci. 2018, 11, 333–340.
- Abirami, K.; Mayilvaganan, P. Fuzzy Clustering with Artificial Bee Colony Algorithm using Web Usage Mining. Int. J. Pure Appl. Math. 2018, 118, 3619–3626.
- Abirami, K.; Mayilvaganan, P. Similarity Measurement Of Web Navigation Pattern Using K-Harmonic Mean Algorithm. Elysium J. Eng. Res. Manag. 2017, 4, 1–6.
- Aravindan, J.S.; Vivekanandan, K. An Overview of Pre-processing Techniques in Web usage Mining. Int. J. Comput. Trends Technol. (IJCTT) 2017, 48, 41–44.
- Banerjee, A.; Ghosh, J. Clickstream Clustering using Weighted Longest Common Subsequences. In Proceedings of the Web Mining Workshop at the 1st SIAM Conference on Data Mining, Chicago, IL, USA, 5–7 April 2001; Volume 143, p. 144.
- Huidobro, A.; Monroy, R.; Cervantes, B. A Contrast-Pattern Characterization of Website Visitors in Terms of Conversions. In Technology-Enabled Innovations in Education (CIIE) 2020; Part of the Book Series: Transactions on Computer Systems and Networks (TCSN); Springer: Berlin/Heidelberg, Germany, 2022.
- Armstrong, G.; Kotler, P.T.; Trifts, V.; Buchwitz, L.A. Marketing: An Introduction, 6th ed.; Pearson: London, UK, 2017.
- Kumar, V.; Ogunmola, G.A. Web Analytics for Knowledge Creation: A Systematic Review of Tools, Techniques, and Practices. Int. J. Cyber Behav. Psychol. Learn. (IJCBPL) 2020, 10, 1–14.
- WTS. Web Technology Surveys (WTS) Website. 2021. Available online: https://w3techs.com (accessed on 13 January 2021).
- G2. G2 Website. 2021. Available online: https://www.g2.com (accessed on 18 January 2021).
- Sukthankar, G.; Geib, C.; Bui, H.H.; Pynadath, D.; Goldman, R.P. Plan, Activity, and Intent Recognition. Theory and Practice. Chapter 5: Stream Sequence Mining for Human Activity Discovery; Morgan Kaufmann: Waltham, MA, USA, 2014; pp. 123–148.
- Gómez, F. Visualization and Machine Learning Techniques to Support Web Traffic Analysis. Master’s Thesis, Tecnológico de Monterrey, Monterrey, Mexico, 2018.
- Cervantes, B.; Gómez, F.; Loyola-González, O.; Medina-Pérez, M.A.; Monroy, R.; Ramírez, J. Pattern-Based and Visual Analytics for Visitor Analysis on Websites. Appl. Sci. 2019, 9, 3840.
- Cadez, I.; Heckerman, D.; Meek, C.; Smyth, P.; White, S. Visualization of Navigation Patterns on a Web Site Using Model-Based Clustering. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, 6–9 August 2000; pp. 280–284.
- Dubois, P.M.J.; Han, Z.; Jiang, F.; Leung, C.K. An Interactive Circular Visual Analytic Tool for Visualization of Web Data. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence (WI), Omaha, NE, USA, 13–16 October 2016; pp. 709–712.
- Ahmed, N.K.; Rossi, R.A. Interactive Visual Graph Analytics on the Web. In Proceedings of the 9th International AAAI Conference on Web and Social Media, Oxford, UK, 26–29 May 2015; pp. 566–569.
- Bourobou, S.T.M.; Yoo, Y. User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm. Sensors 2015, 15, 11953–11971.
- Srivastava, J.; Cooley, R.; Deshpande, M.; Tan, P.N. Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data. SIGKDD Explor. 2000, 1, 12–23.
- Malviya, B.K.; Agrawal, J. A Study on Web Usage Mining Theory and Applications. In Proceedings of the Fifth International Conference on Communication Systems and Network Technologies, Gwalior, India, 4–6 April 2015; pp. 935–939.
- Nakamura, R.; Inenaga, S.; Bannai, H.; Funamoto, T.; Takeda, M.; Shinohara, A. Linear-Time Text Compression by Longest-First Substitution. Algorithms 2009, 2, 1429–1448.
- Charikar, M.; Lehman, E.; Liu, D.; Panigrahy, R.; Prabhakaran, M.; Sahai, A.; Shelat, A. The Smallest Grammar Problem. IEEE Trans. Inf. Theory 2005, 51, 1–23.
- Gallé, M. Investigating the Effectiveness of BPE: The Power of Shorter Sequences. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 3–7 November 2019; pp. 1375–1381.
- Larsson, N.J.; Moffat, A. Offline Dictionary-Based Compression. In Proceedings of the Data Compression Conference, Snowbird, UT, USA, 29–31 March 1999; pp. 296–305.
- Bille, P.; Gørtz, I.L.; Prezza, N. Space-Efficient Re-Pair Compression. In Proceedings of the Data Compression Conference, Snowbird, UT, USA, 4–7 April 2017; pp. 171–180.
- Yang, E.H.; Kieffer, J.C. Efficient universal lossless data compression algorithms based on a greedy sequential grammar transform. I. Without context models. IEEE Trans. Inf. Theory 2000, 46, 755–777.
- Nevill-Manning, C.G.; Witten, I.H. Compression and Explanation using Hierarchical Grammars. Comput. J. 1997, 40, 103–116.
- Latendresse, M. Masquerade Detection via Customized Grammars. In Lecture Notes in Computer Science, Proceedings of the Second International Conference (DIMVA), Vienna, Austria, 7–8 July 2005; IEEE: Piscataway, NJ, USA, 2005; pp. 1–12.
- Manninen, M. Public Implementation of Sequitur in Python. 2021. Available online: https://github.com/markomanninen/pysequitur (accessed on 30 January 2021).
- Acosta-Mendoza, N.; Carrasco-Ochoa, J.A.; Martínez-Trinidad, J.F.; Gago-Alonso, A.; Medina-Pagola, J.E. Mining clique frequent approximate subgraphs from multi-graph collections. Appl. Intell. 2020, 40, 878–892.
No. | New Symbol | The String so Far | Resulting Grammar | Expanded Rules |
---|---|---|---|---|
1 | a | a | 0 → a | NRF |
2 | g | ag | 0 → a g | NRF |
3 | h | agh | 0 → a g h | NRF |
4 | d | aghd | 0 → a g h d | NRF |
5 | f | aghdf | 0 → a g h d f | NRF |
6 | g | aghdfg | 0 → a g h d f g | NRF |
7 | h | aghdfgh | 0 → a 1 d f 1; 1 → g h | gh |
8 | m | aghdfghm | 0 → a 1 d f 1 m; 1 → g h | gh |
9 | a | aghdfghma | 0 → a 1 d f 1 m a; 1 → g h | gh |
10 | d | aghdfghmad | 0 → a 1 d f 1 m a d; 1 → g h | gh |
11 | f | aghdfghmadf | 0 → a 1 2 1 m a 2; 1 → g h; 2 → d f | gh, df |
12 | g | aghdfghmadfg | 0 → a 1 2 1 m a 2 g; 1 → g h; 2 → d f | gh, df |
13 | h | aghdfghmadfgh | 0 → a 1 2 m a 2; 1 → g h; 2 → d f 1 | gh, dfgh |
Metric | Made Payment | Started Payment | Other Conversions | No Conversion |
---|---|---|---|---|
Percentage of visitors | 7% | 10% | 32% | 52% |
Number of rules | 764 | 704 | 997 | 92 |
Metric | Made Payment | Started Payment | Other Conversions | No Conversion |
---|---|---|---|---|
Percentage of “made payment” rules found in sessions | 100% | 57% | 46% | 6% |
Highest inverse frequency of a “made payment” rule | 91% | 68% | 33% | 2% |
Metric | Made Payment | Started Payment | Other Conversions | No Conversion |
---|---|---|---|---|
Number of rules | 9 | 5 | 4 | 0 |
Rule Name | Rule Length | Made Payment | Started Payment | Other Conversion |
---|---|---|---|---|
Go to control panel | 4 | X | X | X |
Pay via control panel | 7 | X | ||
Pay and modify product | 5 | X | ||
Consult availability and pay | 8 | X | ||
Modify product and pay | 8 | X | ||
Start payment | 4 | X | ||
Pay for a service | 8 | X | ||
Make invoice | 4 | X | ||
Login and modify product information | 4 | X | X | |
Modify product information | 3 | X | X | |
Consult payment details | 3 | X | ||
Make payments query | 3 | X | ||
Modify product information and start payment | 3 | X | ||
Consult availability | 3 | X |
Metrics for Visitors from the Class “Made Payment” | Original Session | Shrinked Session | Stripped Session |
---|---|---|---|
Average session length | 39.23 | 18.26 | 1.67 |
Maximum session length | 572 | 181 | 24 |
Standard deviation of session length | 30.81 | 16.46 | 1 |
Average reduction rate from original session | NA | 0.54 | 0.95 |
Standard deviation of reduction rate from original session | NA | 0.17 | 0.04 |
Percentage of sessions considered | 100% | 100% | 70% |
Source Rule i | Target Rule j | Edge Frequency | Out-Degree Frequency of i | Weight |
---|---|---|---|---|
a | b | 5 | 10 | 5/10 = 0.5 |
a | a | 3 | 10 | 3/10 = 0.3 |
a | Exit | 2 | 10 | 2/10 = 0.2 |
b | a | 6 | 14 | 6/14 = 0.43 |
b | Exit | 8 | 14 | 8/14 = 0.57 |
Type | Author | Differences with Our Approach |
---|---|---|
Commercial software | Google Analytics (GA) [11] and Matomo [12] | They provide graphs with page-by-page detail. This helps to measure web page engagement; for example, to find pages where traffic is lost. However, this detail makes it difficult to visualize a trajectory with numerous pages or to visualize 100 percent of visitors. It is possible to track events instead of web pages, but since events are pre-configured, they do not necessarily represent the natural navigation behavior of visitors. In addition, GA does not allow you to analyze data generated prior to its use. |
Non-commercial software | S. Tiwari et al. [20] | Its goal is to predict the next page using agglomerative clustering. Visitors are classified based on previous web pages they accessed, but web pages are not clustered or associated with conversions. Therefore, classes are not necessarily meaningful to a business expert and do not provide a high-level representation of the navigation behavior. It is useful in an online implementation to improve web response time, but is not intended to facilitate analysis by marketing experts. |
A. Banerjee et al. [27] | Its purpose is to find the longest common sub-sequence of clickstreams using a dynamic programming algorithm. They use a similarity graph to find clusters of visitors based on the time spent on each page. In our approach, the longest path is not necessarily the most relevant in terms of business goals. In fact, we found that, in general, visitors who make a payment visit fewer pages than those who do not. In our approach, the time spent on each webpage is not relevant. Finally, the similarity graph that they build is intended for clustering visitors, not for further analysis by marketing experts. | |
H.M. Huynh et al. [21,22] | They present a novel data structure (pseudo-IDList) suitable for clickstream pattern mining. They also propose using the average weight measure for clickstream pattern mining and present an improved method named Compact-SPADE. Both approaches focus on improving runtime or memory consumption for clustering visitors, but no business knowledge is incorporated to create those clusters. In addition, they do not create a high-level graph of the clusters for further analysis by business experts. | |
F. Gómez [34] | He presents a visualization tool for analyzing website traffic. It aims to distinguish bots from human visitors based on their navigation path. Like commercial software, it provides a page-by-page detail, which makes it difficult to analyze numerous pages. | |
B. Cervantes et al. [35] | They combine visualization and machine learning techniques for analyzing web log data. The visualization can be used by business experts to obtain insights by looking at key elements of the graph. However, visitors are not classified in terms of business goals. Furthermore, the visualization provides page-by-page detail, which makes it difficult to analyze visits with numerous nodes. |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Huidobro, A.; Monroy, R.; Cervantes, B. A High-Level Representation of the Navigation Behavior of Website Visitors. Appl. Sci. 2022, 12, 6711. https://doi.org/10.3390/app12136711