Combating Web Tracking: Analyzing Web Tracking Technologies for User Privacy

Sim, Kyungmin; Heo, Honyeong; Cho, Haehyun

doi:10.3390/fi16100363


by Kyungmin Sim, Honyeong Heo and Haehyun Cho *

Graduate School of Software, Soongsil University, Seoul 06978, Republic of Korea

* Author to whom correspondence should be addressed.

Future Internet 2024, 16(10), 363; https://doi.org/10.3390/fi16100363

Submission received: 31 August 2024 / Revised: 28 September 2024 / Accepted: 3 October 2024 / Published: 5 October 2024

(This article belongs to the Special Issue Security and Privacy Issues in the Internet of Cloud)

Abstract

Behind everyday websites, a hidden shadow world tracks the behavior of Internet users. Web tracking analyzes online activity based on collected data and delivers content tailored to users’ interests. It gathers vast amounts of information for various purposes, ranging from sensitive personal data to seemingly minor details such as IP addresses, devices, browsing histories, settings, and preferences. While Web tracking is largely a legitimate technology, the increase in illegal user tracking, data breaches, and the unlawful sale of data has become a growing concern. As a result, the need for technologies that can detect and prevent Web trackers is greater than ever. This paper provides an overview of Web tracking technologies, relevant research, and website measurement tools designed to identify web-based tracking. It also explores technologies for preventing Web tracking and discusses potential directions for future research.

Keywords: web privacy; web tracking; web crawler; web browser; privacy leakage

1. Introduction

While browsing the web, users often encounter content tailored to their interests. For instance, after searching for a laptop to purchase, advertisements or related content may appear in their Instagram or Facebook feeds. This personalized experience is the result of web tracking, which systematically collects data and information about users. Web tracking techniques monitor various aspects of user behavior online, including the websites visited, the duration of time spent on each page, and the interactions made, such as clicks [1]. Companies aggregate data on users’ online activities—such as search history, browsing behavior, and social media interactions—to construct detailed user profiles. These profiles are then utilized to deliver personalized content and targeted advertising.

Research by Castell-Uroz et al. revealed that 79% of websites globally employed Web tracking, with Google maintaining tracking presence on 80.3% of all websites [2]. While Web tracking is generally perceived as a benign technology that provides user-tailored information, it also has the potential to be exploited, raising significant privacy concerns, as evidenced by previous studies [3,4,5]. Users may voluntarily provide personal information on the web, such as through Web forms, or this information may be indirectly collected without their knowledge through methods such as IP header analysis, HTTP requests, search engine query analysis, and JavaScript [6].

Recent studies have highlighted the misuse of Web tracking for malicious purposes, including price discrimination, government surveillance, background scanning, and identity theft [7,8,9]. The growing prevalence of Internet usage and digital technology has amplified the risks associated with cyberattacks and surveillance, posing significant threats to online privacy and security. Malicious websites often collect and sell user data to third parties without the user’s knowledge or consent, leading to identity theft, financial fraud, and other detrimental consequences. Furthermore, cybercriminals may exploit stolen data to launch cyberattacks or sell them on the dark Web for profit [10].

Numerous studies and techniques have been developed to safeguard Internet users from malicious trackers, including browser extensions, tracking blocking technologies, domain-level blocking, and fingerprinting defenses. However, despite advancements in Web tracking detection technologies, it remains challenging to identify and prevent non-consensual Web tracking in real-world scenarios due to the inherent limitations of existing research.

Despite the widespread implementation of Web tracking detection and prevention technologies, the rapidly evolving nature of tracking methods continues to present significant challenges. Tracking mechanisms, such as cookies, JavaScript, Web beacons, and fingerprinting, are continually updated and refined, making detection and response increasingly difficult. The diversity of devices, browsers, and platforms further complicates the detection process, as tracking behaviors often differ across these environments. While existing tracking detection tools are somewhat effective, they struggle to comprehensively identify non-consensual tracking activities in real-world scenarios.

This paper investigates various Web tracking techniques and previously proposed approaches to mitigating unconsented tracking of users’ Web activities. By analyzing the limitations of current tracking detection technologies and exploring innovative solutions, this study aims to better equip users and organizations to defend against the growing threats posed by web tracking.

The core issue addressed in this paper is the inadequacy of current Web tracking detection and prevention technologies in effectively identifying and mitigating the rapidly evolving landscape of tracking techniques. Although tools like FourthParty [5], FPDetective [11], OpenWPM [12], FP-Crawler [13], FP-Radar [14], and OmniCrawl [15], as well as prevention solutions such as FP-Guard, UniGL [16], AdGraph [17], WebGraph [18], and FPFlow [19], have made significant progress in monitoring tracking activities, they often fall short of fully capturing the dynamic and complex nature of modern tracking mechanisms. For example, as websites increasingly adopt fingerprinting and cross-site tracking technologies, traditional methods struggle to detect these covert techniques, which do not rely on easily identifiable elements like cookies. Furthermore, the variability in tracking methods across different browsers, devices, and platforms complicates the development of a one-size-fits-all solution. The lack of comprehensive detection frameworks that can adapt to new tracking techniques, combined with the difficulty of monitoring tracking activities that span multiple websites, presents a critical challenge for users and organizations seeking to protect their privacy.

This paper provides an overview of the current state of tracking prevention and systematically analyzes and compares defense approaches to offer insights into the existing tracking prevention ecosystem.

This paper aims to elucidate and contribute to the following key areas:

• An in-depth exploration of various tracking technologies, including the long-established use of cookies by trackers, as well as more recent Web tracking techniques such as Web beacons and browser fingerprinting. The paper examines how these technologies are employed for tracking purposes.
• A comprehensive discussion of the research methodologies and tools used to detect prevalent tracking activities on the web, alongside the technologies designed to prevent such tracking. The paper categorizes approaches for measuring Web APIs and outlines strategies for preventing tracking, including the randomization of API values, patching of non-unique fingerprints, employing machine learning techniques for tracking detection, and implementing dynamic taint analysis.
• Finally, the paper presents the challenges and potential research directions that may help address the ongoing and future arms race between tracking technologies and privacy-preserving methods.

2. Tracking on the Web

Each time we browse the Internet, most Web browsers generate logs of the websites visited and the items clicked. To track this information, many websites utilize cookies, which are small data files stored in our Web browsers. Additionally, some websites may leverage account information to monitor user activity. Through Web tracking, online services gather extensive personal information about users, which is then employed for behavioral analysis and to deliver content that may be relevant to users.

Media platforms such as YouTube and Netflix collect data on the videos users watch to recommend personalized content. Similarly, online retailers like Amazon and eBay track the items viewed and purchased by users, enabling them to suggest other products of potential interest. Furthermore, search engines like Google maintain records of search queries to provide relevant results, though these data are also often used for advertising purposes. For instance, after searching for a camera on Google, users may encounter camera advertisements on other websites. Thus, website tracking technology facilitates the provision of personalized content, which can enhance the user experience.

As discussed, websites collect various types of data to offer customized information to users. This may include IP addresses to determine users’ locations, details on user interactions with websites (e.g., time spent on a page or mouse cursor position), information about the browsers and devices used to access the site, and insights into users’ interests, shopping habits, and encountered issues. While most websites engage in some form of Web tracking, the extent of data collection varies—some sites may collect all these types of data, while others may gather none. Ultimately, the scope of data collection depends on the services provided by the website and its revenue generation model.

2.1. Web Tracking Methods

In today’s digital landscape, the widespread adoption of Web tracking technologies has dramatically transformed how user data are collected, monitored, and utilized across online platforms. As websites have become more sophisticated, so too have the methods used to track user behavior, often without explicit consent. These practices raise significant concerns regarding privacy, security, and data protection, particularly as tracking technologies become more advanced and pervasive. The evolution of tracking techniques, from cookies to fingerprinting and beyond, has driven the development of both regulatory frameworks and technical countermeasures to safeguard users’ personal information. This section provides a detailed analysis of the various Web tracking mechanisms currently in use, the associated security risks, and the growing importance of regulatory compliance in protecting data privacy. By reviewing key tracking methods, such as cookies, Web beacons, and fingerprinting, this section examines how these tools operate, their potential for privacy infringement, and the security measures and legal frameworks designed to mitigate these risks.

2.1.1. Cookie

Cookies are small information files created by Web servers and sent to Web browsers. The browser stores these cookies for a predetermined period or for the duration of the user’s session on the website. When the user makes subsequent requests to the Web server, the browser automatically attaches the relevant cookies.

Types of Cookies

There are various types of cookies. Session Cookies are temporary and are deleted once the browser is closed. They are typically used to manage user sessions, such as keeping the user logged in while navigating between pages. Persistent Cookies remain on the user’s device even after the browser is closed, allowing the site to remember user preferences or login details between sessions. Third-Party Cookies are set by domains other than the one the user is visiting. They are generally used for advertising or tracking purposes across multiple websites and are the most controversial from a privacy and security standpoint.

Security Risks of Cookies

Cookies can pose several security risks. Cookie theft (session hijacking) occurs when session cookies, which manage session states, are intercepted using methods such as man-in-the-middle (MITM) attacks or cross-site scripting (XSS). If an attacker gains access to a user’s session cookie, they can impersonate the user and control the session. Cross-site scripting (XSS) allows attackers to inject malicious scripts into a vulnerable website, which can then access cookies containing sensitive information, such as session IDs [20]. These can be exploited for session hijacking attacks. Cross-Site Request Forgery (CSRF) exploits cookies by tricking users into submitting unintended requests. Since browsers automatically send cookies with each request, attackers can manipulate a logged-in user into performing actions without their consent. Persistent data collection is possible even after cookies are deleted from the device. Data collected while the cookie was active may have already been transmitted to the server. This allows third parties, such as ad networks, to retain and manage the data, even after the cookie is no longer on the user’s device. This presents a major privacy concern, as data can be used for tracking without the user’s direct control [21]. Tracking and privacy concerns arise from third-party cookies, which track users across multiple websites [22]. Advertisers and data brokers can compile detailed user profiles based on browsing habits. Due to this pervasive tracking, regulations like the EU’s General Data Protection Regulation (GDPR) require user consent for cookie tracking [22].

Cookie Security Measures

To mitigate these risks, several security measures can be implemented. The Secure flag ensures cookies are transmitted only over encrypted HTTPS connections, preventing them from being sent in plain text over insecure HTTP connections. The HttpOnly flag prevents client-side scripts, such as those injected via XSS attacks, from accessing cookies, which is essential for protecting session cookies from theft. The SameSite attribute restricts cookies from being sent with cross-site requests, mitigating CSRF attacks. In its strictest setting, cookies are only sent if the request originates from the same site, making cross-site attacks more difficult. Cookie expiration, particularly for cookies that carry session identifiers, should be set to a short duration to minimize the window of opportunity for attackers to exploit them.
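As a rough illustration of the attributes described above, the following Python sketch builds a Set-Cookie header value with the standard library's http.cookies module. The cookie name session_id, the 15-minute lifetime, and the helper name build_session_cookie are assumptions chosen for the example, not values taken from any particular site.

```python
from http.cookies import SimpleCookie


def build_session_cookie(session_id: str, max_age_seconds: int = 900) -> str:
    """Return a Set-Cookie header value with the hardening attributes set."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id      # hypothetical cookie name
    morsel = cookie["session_id"]
    morsel["secure"] = True                # only transmitted over HTTPS
    morsel["httponly"] = True              # not readable by client-side scripts
    morsel["samesite"] = "Strict"          # not attached to cross-site requests
    morsel["max-age"] = max_age_seconds    # short lifetime (assumed: 15 minutes)
    morsel["path"] = "/"
    return morsel.OutputString()


header_value = build_session_cookie("abc123")
print("Set-Cookie:", header_value)
```

A more permissive SameSite value such as Lax may be needed when a site relies on cookies being sent with top-level navigations from other sites, so the strict setting shown here is a design choice rather than a universal recommendation.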

2.1.2. Web Beacon

Web beacons are small tags placed on websites or emails to track how users interact with content. A beacon is typically a transparent one-by-one-pixel image embedded in the code of a Web page. When a browser accesses a site with a Web beacon, it sends a request to download the image. The request includes information such as the user’s IP address, time of day, or browser, which can be used to track their activity. This allows webmasters to monitor users as they browse their sites or interact with email content. If web beacons are used in email marketing, they can provide the company with information about whether the email was opened, when it was opened, and how many times it was opened [23].
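The mechanism can be sketched in a few lines of Python. The host tracker.example, the query parameter names c and r, and both function names are hypothetical; the sketch only shows that a single image request is enough for the tracker to record who opened what, and when.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlencode, urlparse


def beacon_tag(tracker_host: str, campaign: str, recipient_id: str) -> str:
    """Build the HTML for a 1x1 image whose URL identifies the recipient."""
    query = urlencode({"c": campaign, "r": recipient_id})
    return (f'<img src="https://{tracker_host}/pixel.gif?{query}" '
            'width="1" height="1" alt="">')


def record_beacon_hit(request_url: str, client_ip: str, headers: dict) -> dict:
    """Collect what the tracker learns from one request for the image."""
    params = parse_qs(urlparse(request_url).query)
    return {
        "opened_at": datetime.now(timezone.utc).isoformat(),
        "ip": client_ip,
        "user_agent": headers.get("User-Agent", ""),
        "campaign": params.get("c", [""])[0],
        "recipient": params.get("r", [""])[0],
    }


tag = beacon_tag("tracker.example", "spring-sale", "user-42")
hit = record_beacon_hit(
    "https://tracker.example/pixel.gif?c=spring-sale&r=user-42",
    "203.0.113.7",
    {"User-Agent": "Mozilla/5.0"},
)
print(tag)
print(hit)
```

Counting the stored hits per recipient would give the "how many times it was opened" figure mentioned above.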

2.1.3. Referrer Header

The referrer header (spelled Referer in the HTTP specification) is an HTTP request header that transmits the URL of the previously visited Web page to the server of the current page being accessed by the Web browser. This header enables website operators to determine the referral path through which users arrived at their site, aiding in the analysis of traffic sources and user flow. It is commonly utilized for purposes such as marketing, analyzing advertising effectiveness, and delivering customized content to users. However, due to the potential for leaking personal information, many modern browsers have introduced features to limit or obfuscate referrer information, enhancing user privacy.
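One way browsers limit referrer information is the strict-origin-when-cross-origin referrer policy, which several modern browsers apply by default. The Python sketch below approximates that policy under simplifying assumptions: it ignores credentials embedded in URLs, treats an explicit default port as different from an omitted one, and uses the hypothetical function names origin_string and referrer_to_send.

```python
from typing import Optional
from urllib.parse import SplitResult, urlsplit


def origin_string(parts: SplitResult) -> str:
    """Scheme, host, and (if explicitly given) port of a parsed URL."""
    host = parts.hostname or ""
    port = f":{parts.port}" if parts.port else ""
    return f"{parts.scheme}://{host}{port}"


def referrer_to_send(current_url: str, target_url: str) -> Optional[str]:
    """Approximate the strict-origin-when-cross-origin policy."""
    src, dst = urlsplit(current_url), urlsplit(target_url)
    if origin_string(src) == origin_string(dst):
        # Same origin: the full URL is sent, minus the fragment.
        return src._replace(fragment="").geturl()
    if src.scheme == "https" and dst.scheme != "https":
        # Downgrade from HTTPS to a non-HTTPS target: nothing is sent.
        return None
    # Cross-origin: only the origin is sent, so path and query stay private.
    return origin_string(src) + "/"


print(referrer_to_send("https://shop.example/cart?item=42#top",
                       "https://shop.example/checkout"))
print(referrer_to_send("https://shop.example/cart?item=42",
                       "https://ads.example/pixel"))
print(referrer_to_send("https://shop.example/cart",
                       "http://ads.example/"))
```

Under this policy a third party still learns which site the user came from, but not the specific page or query string.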

2.1.4. IP Address-Based Tracking

IP address-based tracking is a method of monitoring online activity through the use of a user’s Internet Protocol (IP) address. An IP address is a unique numerical identifier assigned to a user each time they access the Internet, and it can be used to approximate the user’s location, determine their Internet Service Provider (ISP), and gather network information. Websites and service providers utilize this information to analyze user behavior, track access patterns, and, in some instances, identify specific users or devices. However, since IP addresses are often dynamic and can be shared among multiple users or devices, their effectiveness in accurately identifying individuals is limited.

2.1.5. Fingerprinting

In the digital environment, browser configurations can effectively serve as a means of user identification, much like fingerprints are used for authentication in the physical world. Browser fingerprinting is a sophisticated tracking technique that constructs detailed user profiles by analyzing various device and browser attributes, including device type, operating system, screen resolution, browser version, language settings, and time zone. The aggregation of these attributes provides a highly accurate method of identifying users, though it may not always yield a uniquely identifiable profile. Moreover, browser fingerprinting can detect the use of emulators or spoofing tools, which may indicate suspicious activity on a website. Thus, browser fingerprints can act as reliable user identifiers, as they are typically distinct enough to differentiate between individual users [24,25,26].
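To make the aggregation step concrete, the sketch below hashes a set of attributes into a short identifier and counts how many visitors share each identifier. The attribute names, the sample visitors, and the truncation to 16 hexadecimal characters are illustrative assumptions; real fingerprinting scripts typically combine many more signals.

```python
import hashlib
import json
from collections import Counter


def fingerprint(attributes: dict) -> str:
    """Hash a canonical serialization of the collected attributes."""
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def anonymity_set_sizes(visitors: list) -> Counter:
    """Count how many visitors share each fingerprint."""
    return Counter(fingerprint(v) for v in visitors)


# Hypothetical visitors: the first two have identical configurations.
visitors = [
    {"os": "Windows 11", "screen": "1920x1080", "lang": "en-US", "tz": "UTC-5"},
    {"os": "Windows 11", "screen": "1920x1080", "lang": "en-US", "tz": "UTC-5"},
    {"os": "macOS 14", "screen": "2560x1600", "lang": "ko-KR", "tz": "UTC+9"},
]
sizes = anonymity_set_sizes(visitors)
for value, count in sizes.items():
    print(value, "shared by", count, "visitor(s)")
```

A fingerprint shared by only one visitor identifies that visitor uniquely, while a fingerprint shared by several visitors illustrates why the aggregated attributes do not always yield a unique profile.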

2.1.6. Canvas Fingerprinting

Canvas fingerprinting is a sophisticated tracking technique that utilizes the HTML5 canvas element to generate a unique identifier for a user’s device and browser. This method involves rendering specific graphics within the browser and analyzing the resulting image pixel by pixel. The image is then converted into a hash value, which serves as a unique identifier for the user [27]. Unlike cookies, canvas fingerprinting does not rely on storing data on the user’s device. Instead, it leverages the browser’s rendering capabilities to create an identifier based on the device’s unique characteristics. Additionally, unlike cookie-based tracking, canvas fingerprinting does not require any user action, such as accepting cookies. It operates discreetly in the background when a webpage loads, enabling a more passive and subtle form of tracking. As a result, canvas fingerprinting raises significant ethical concerns, as it functions without explicit user consent. To address the privacy issues associated with its persistent nature, specialized browsers and browser extensions are increasingly being employed as countermeasures against this form of tracking.
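In a browser, the rendering and pixel readout happen in JavaScript on the canvas element. The Python sketch below only simulates that step: simulated_render is a made-up stand-in for drawing a fixed test image, and its antialias_bias parameter is an assumption that models small rendering differences between devices. The point it illustrates is that identical drawing instructions can hash to different identifiers when the pixels differ slightly.

```python
import hashlib


def simulated_render(width: int, height: int, antialias_bias: int) -> bytes:
    """Stand-in for canvas rendering: returns RGBA bytes for a fixed pattern.

    antialias_bias (0..55) nudges the shade of 'edge' pixels, imitating
    device-specific differences in fonts, GPU, or drivers.
    """
    buf = bytearray()
    for y in range(height):
        for x in range(width):
            on_edge = (x + y) % 7 == 0
            shade = 200 + antialias_bias if on_edge else 255
            buf += bytes((shade, shade, shade, 255))
    return bytes(buf)


def canvas_hash(pixels: bytes) -> str:
    """Reduce the rendered pixels to a hash that serves as the identifier."""
    return hashlib.sha256(pixels).hexdigest()


device_a = canvas_hash(simulated_render(32, 8, antialias_bias=0))
device_a_again = canvas_hash(simulated_render(32, 8, antialias_bias=0))
device_b = canvas_hash(simulated_render(32, 8, antialias_bias=3))

print(device_a == device_a_again)  # same device renders the same pixels
print(device_a == device_b)        # a slightly different renderer differs
```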

2.1.7. Third-Party Tracking Scripts

Third-party tracking scripts are external scripts embedded within websites by third-party providers to monitor user activity and collect data. These scripts capture various forms of user interaction data, such as the duration of page visits and click behavior, and transmit this information to a third-party server. The collected data are often used for targeted advertising and user behavior analysis. These scripts have the capability to track users across multiple websites, which raises significant privacy concerns. Consequently, certain browsers and extensions offer features to block or manage these scripts. Additionally, the presence of third-party tracking scripts can impact website performance, while advertisers leverage them to deliver personalized advertisements.

2.2. How Web Tracking Works

This section outlines techniques used to enhance Web functionality and collect user data. Scripts embedded in websites enable automatic tracking and analysis of user behavior through tools like Google Analytics and Hotjar. Browser extensions add features and may access browsing data with user consent, although not all of them engage in tracking. Browser plugins, once used for multimedia processing, have largely been replaced by modern technologies like HTML5, with their functions now managed by extensions and native web tools.

2.2.1. Script

A script is a piece of code that runs within a webpage, typically written in JavaScript, used to dynamically change the page’s behavior or respond to user input [28]. Scripts operate on the client side and can enhance the speed and responsiveness of Web applications through asynchronous communication with the server (e.g., AJAX). They can be embedded directly within an HTML document or loaded from external files. This allows developers to install scripts on websites to automatically collect user data. However, caution is required from a security standpoint. If a malicious script is inserted into a webpage using cross-site scripting (XSS) and executed in another user’s browser, it can hijack sessions or steal sensitive information. Websites can defend against such attacks by using Content Security Policy (CSP) headers, which limit the execution of unauthorized scripts. Additionally, scripts can be unintentionally executed if user input is not properly validated, potentially exposing vulnerabilities in dynamic content updates. Scripts are also commonly used for data analysis and visitor flow monitoring. They track website traffic and visit counts, breaking them down into more granular metrics like unique visitors, average time spent on the page, and exit rates. Popular tracking tools that rely on scripts include Google Analytics, Matomo, Tiny Analytics, Mixpanel, SimilarWeb, and Hotjar.
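As an illustration of the Content Security Policy defense mentioned above, the following sketch assembles a Content-Security-Policy header value that allows scripts only from the site itself and from an explicit allowlist. The helper name build_csp and the host analytics.example are hypothetical, and the three directives shown are a minimal assumed policy rather than a complete one.

```python
def build_csp(script_sources: list) -> str:
    """Assemble a Content-Security-Policy header value from directives."""
    directives = {
        "default-src": ["'self'"],
        # Scripts may load only from this site and the listed hosts.
        "script-src": ["'self'", *script_sources],
        # Disallow plugin content entirely.
        "object-src": ["'none'"],
    }
    return "; ".join(
        name + " " + " ".join(values) for name, values in directives.items()
    )


csp = build_csp(["https://analytics.example"])
print("Content-Security-Policy:", csp)
```

With such a header in place, an injected script that is inline or served from a host outside the allowlist would be refused by a conforming browser.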

2.2.2. Browser Extensions

A browser extension is a software module designed to enhance the functionality of a Web browser. Using Web technologies such as HTML, CSS, and JavaScript, extensions can modify the browser’s user interface or manipulate data on webpages. This capability allows extensions to introduce new features and improve the browser’s overall performance. Browser extensions can also access browser APIs to extend core functionalities. For instance, they can add features like intercepting HTTP requests or managing tabs, cookies, and cache. Common examples of browser extensions include ad blockers, password managers, social media toolkits, and productivity tools. Extensions can personalize the browser’s user interface or modify the appearance and behavior of specific websites [29]. However, there are important security considerations. Extensions may request access to sensitive browser data, such as browsing history, input data, and clicked links. While this access is typically managed through user consent, malicious extensions may exploit these permissions to leak personal information. Even trusted extensions can be compromised during updates, potentially introducing malicious code. Users may find it challenging to detect such changes. Additionally, extensions requesting excessive or unnecessary permissions can present a security risk. Although browser extensions typically run in a sandboxed environment, certain APIs may bypass this isolation and access the operating system’s resources, which can lead to malicious activities. It is important to note, however, that not all extensions track user behavior or monitor Internet usage [30,31].

2.2.3. Browser Plugins

A plugin is software that operates outside the browser, primarily used to process file formats or media content that the browser cannot natively handle. Historically, common examples included Flash, Silverlight, and Java Applets. Plugins do not function entirely within the browser but interact with it by leveraging the operating system’s resources. They enable the browser to support various formats (e.g., PDFs, media files) through built-in functionality, extending the capabilities of Web applications and allowing for more complex tasks [32]. However, there are significant security concerns. Because plugins are deeply integrated with the operating system, insecure plugins can become a gateway for malware. Additionally, plugins have direct access to the user’s device and file system, which, if exploited, could allow an attacker to gain full control of the system. While Flash and Java were widely used in the past, modern browsers now favor standardized technologies such as HTML5. Due to the numerous security vulnerabilities associated with plugins, which can lead to malicious code execution or system infections, most browsers have phased out support for them. Although plugins once had the ability to access webpage content and track user activities, these functions are increasingly managed by browser extensions and native Web technologies [33].

2.3. Data Privacy Regulation on Web Tracking

Web tracking is mostly legal, but there are increasing instances of illegal user tracking, data leakage, and illegal data sales [28,34]. To safeguard users’ data privacy, the European Union (EU) implemented the General Data Protection Regulation (GDPR) in May 2018 [35]. This stringent law protects the data privacy of people in the EU, Switzerland, Iceland, Liechtenstein, and Norway. The California Consumer Privacy Act (CCPA), which became effective in 2020 after the GDPR, is also a robust privacy law. It applies to any business that collects or sells the personal information of Californians, irrespective of the business’s location, though it does not apply to Californians living outside of California. Consent to personal information processing differs between the CCPA and the GDPR. The CCPA uses an opt-out system that enables the data subject to express their intention to refuse the sale of personal information. Conversely, the GDPR uses an opt-in system that requires prior consent from data subjects. Under both laws, a website cannot process a visitor’s personal data without a legitimate basis, such as user consent. Providing users with enough information and transparency about Web tracking to fulfill consent requirements is a key issue. Consent must be freely given, specific, informed, unambiguous, and easy to understand. However, some websites fail to obtain proper consent or do not specify their tracking practices clearly [10].

3. Countermeasures against Web Tracking

The functionality of Web browsers has expanded significantly with advancements in Web technology. However, the evolution of tracking technologies has led to an ongoing arms race between tracking methods and anti-tracking solutions. In this section, various studies and solutions for tracking detection and prevention on the Web are examined, along with research that highlights the limitations of current tracking detection techniques and proposes potential improvements. Section 3.1 reviews various Web measurement tools and related research, highlighting key requirements for these tools and analyzing the design of two prominent frameworks, OpenWPM and Tracker Radar Collector (TRC). Section 3.2 covers techniques for preventing Web tracking and compares the tracking prevention technologies used by commercial browsers and extensions. Section 3.3 addresses the challenges faced in tracking detection and prevention.

3.1. Tracking Measurement

Web tracking research typically follows one of two methodologies: crawling multiple websites using specialized measurement tools, or studying the browsing behaviors of a large user sample. These studies play a crucial role in identifying Web tracking practices and facilitate the collection and analysis of tracking-related data.

Table 1 provides an overview of various tracking measurement tools, detailing their capability to track specific properties. The first column enumerates the different properties, such as cookies, navigator, canvas, WebRTC, geolocation, and others. The subsequent columns list the individual tracking measurement tools, including FourthParty, FPDetective, OpenWPM, FP-Crawler, FP-Radar, OpenWPM-Mobile, and OmniCrawl. A “✓” in the table indicates that the tool is capable of tracking a particular property, while a “-” signifies that the tool does not support that feature.

In 2011, Mayer et al. [5] developed FourthParty, a Firefox extension that, while a user browses a website, detects calls to the window, navigator, and screen objects as well as to browser and JavaScript APIs related to HTTP traffic, DOM windows, and cookies. FourthParty stores all records in a standardized log format in an SQLite database.

In 2013, Nikiforakis et al. [11] developed FPDetective, a framework for detecting web-based device fingerprinting. It consists of a crawler built with PhantomJS and Chromium, and a parser that analyzes the data collected by the crawler. FPDetective detects use of the JavaScript navigator and window objects. The navigator object exposes information such as the browser vendor and version, MIME types, supported plugins, and operating system. Using this tool, the researchers looked for Flash- and font-based fingerprinting among tracking scripts on 10,000 websites and found that 97 websites used Flash fingerprinting and 51 used font fingerprinting.
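The idea of logging observed API calls to an SQLite database, as FourthParty does, can be sketched with Python's built-in sqlite3 module. The table name api_calls, its columns, and the sample URLs are assumptions made for this example and do not reproduce FourthParty's actual log format.

```python
import sqlite3


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open the log database and create the (hypothetical) table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS api_calls ("
        "id INTEGER PRIMARY KEY, page_url TEXT NOT NULL, "
        "script_url TEXT NOT NULL, api TEXT NOT NULL)"
    )
    return conn


def log_call(conn: sqlite3.Connection, page_url: str,
             script_url: str, api: str) -> None:
    """Record one observed API call made by a script on a page."""
    conn.execute(
        "INSERT INTO api_calls (page_url, script_url, api) VALUES (?, ?, ?)",
        (page_url, script_url, api),
    )
    conn.commit()


def calls_per_script(conn: sqlite3.Connection) -> dict:
    """Summarize how many logged calls each script made."""
    rows = conn.execute(
        "SELECT script_url, COUNT(*) FROM api_calls "
        "GROUP BY script_url ORDER BY script_url"
    )
    return dict(rows.fetchall())


conn = open_log()
log_call(conn, "https://news.example/", "https://tracker.example/t.js", "navigator.userAgent")
log_call(conn, "https://news.example/", "https://tracker.example/t.js", "screen.width")
log_call(conn, "https://news.example/", "https://cdn.example/app.js", "window.innerWidth")
summary = calls_per_script(conn)
print(summary)
```

Keeping the records in a relational store makes it straightforward to query which third-party scripts touch which APIs across many crawled pages.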

In 2015, Englehardt et al. [12] developed OpenWPM. It drives Firefox through Selenium and collects information about disk access, JavaScript access, and HTTP data to analyze online tracking between first and third parties. In particular, to analyze JavaScript access, the researchers redefined the getters and setters of Web APIs to determine which APIs were used on a webpage. In 2016, a study using OpenWPM to analyze canvas, AudioContext, and WebRTC fingerprinting, among others, identified 14,371 webpages using canvas fingerprinting, 3250 using canvas-based font fingerprinting, and a small number using WebRTC- and AudioContext-based fingerprinting.

In 2020, Vastel et al. [13], through the FP-Crawler study, investigated the number of websites that used fingerprinting to detect crawlers. The researchers analyzed how effective fingerprinting was in detecting Web crawlers and found that the more fingerprint attributes a crawler modifies, the more likely a site is to fail to detect it. In addition, they found 291 crawler-blocking websites among 10,000 websites and confirmed that 93 of them used fingerprinting.

In 2021, Iqbal et al. [14] developed FP-Radar and used it to analyze Alexa’s top 100k websites from 2010 to 2019 via the Wayback Machine. FP-Radar draws API correlation graphs through a graph-based supervised machine learning approach to predict future API usage. Using API clustering based on an unsupervised machine learning approach, the researchers confirmed that 63% of fingerprinting scripts used the battery, navigator, and network APIs.

The above studies show that measurement technology advances alongside Web technology. In 2011, Flash, Java, and Silverlight were present on close to 30% of the Web, but they were rarely used as of 2022 [36]. Flash support ended in 2020 [37], and HTML5 replaced the features for which Flash had been used. Many other plugins also suffered from security vulnerabilities and reliability issues, and their features were replaced by extensions. In other words, Web APIs now provide the functions that plugins once added, which has made the user experience more convenient; the problem is that stateless tracking increasingly exploits these same Web APIs. As Web technology continues to develop, such abuse is likely to increase, so Web measurement tools need to be developed continuously.

The two tables below offer a comprehensive comparison of the security and privacy features available in both PC and mobile browsers. Table 2 compares three critical security features in PC browsers: tracker and ad blocking, fingerprinting prevention, and cross-site cookie protection. The table uses a separate icon to indicate whether each browser supports a feature by default, through the installation of an extension, or not at all. For example, Brave Browser provides all features by default, while Chrome and Edge only support certain features through extensions. Similarly, Table 3 compares mobile browsers based on the same security features, indicating whether they are supported by default, via an extension, or not supported.

Based on the evaluation of privacy features across PC and mobile browsers in Table 2 and Table 3, it is clear that Brave Browser and Tor Browser offer the most comprehensive protection against Web tracking and fingerprinting. Both browsers provide in-browser support for essential privacy features such as tracker and ad blocking, fingerprinting prevention, and cross-site cookie protection, making them ideal for users seeking robust, built-in privacy without relying on third-party extensions. Brave strikes an excellent balance between privacy and usability, offering strong native protections while maintaining compatibility with a wide range of websites. Tor Browser, on the other hand, provides the highest level of privacy, particularly through its advanced fingerprinting prevention mechanisms. However, this heightened privacy can sometimes result in usability challenges, such as page rendering issues.

Firefox also performs well, offering customizable privacy options and the flexibility to enhance privacy through extensions. However, browsers like Chrome and Edge depend heavily on external extensions to achieve privacy capabilities comparable to Brave or Tor, which introduces potential security risks and increases complexity. Overall, for users prioritizing privacy, Brave Browser stands out as the most balanced option for both PC and mobile platforms, while Tor Browser remains the top choice for maximum privacy protection, albeit with some trade-offs in user experience.

Table 4 below offers a concise overview of various anti-tracking tools, detailing their operational methods and the specific techniques they employ. This information is intended to assist researchers and technical experts in selecting the most appropriate tool for their particular environment or requirements. The tools listed in the table contribute to the prevention of user tracking on the Web through diverse approaches, including randomization, non-unique fingerprinting, and machine learning-based detection.

3.1.1. Tracking Measurements in Mobile Devices

The user environment for the Web is rapidly shifting from desktop to mobile devices. For example, in 2011, mobile traffic accounted for only 6% of traffic, but in 2021, 56% of traffic was transmitted from mobile devices [46]. Unlike desktops, mobile devices have various sensors such as motion and ambient sensors, and webpages use Web APIs to access these sensors and achieve the same effect as a native application. Therefore, we present studies on fingerprinting techniques that use Web APIs to access sensors, as well as a large-scale study of Web measurement in the mobile environment.

Das et al. [47] suggested that trackers could access a sensor while a user entered a key or PIN and could perform stateless tracking based on the sensor. Using OpenWPM-Mobile, the researchers discovered that 3695 out of 100,000 websites accessed one or more sensor APIs. Furthermore, 63% of the scripts accessing motion sensors also used canvas fingerprinting.

Cassel et al. [15] developed OmniCrawl to perform a comprehensive analysis of Web tracking on mobile, desktop, and privacy-focused browsers. Web measurement tools developed in prior studies did not account for the real mobile environment, instead relying on a PC browser emulated to appear as a mobile version (by changing the user-agent and screen resolution in desktop Firefox). This study used actual devices to gather scripts and compared the API access and features of commercially available browsers.

Zhang et al. [48] proposed sensor calibration fingerprinting. This method infers the calibration values that the smartphone manufacturer applies to a device to correct its sensor readings. It is highly effective because it can be used on a website or in an app without the user’s permission. The researchers explained that the problem could be mitigated by rounding the sensor values or by adding subtle noise that does not affect their quality.
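The rounding and noise mitigations mentioned above can be sketched as follows. This is a minimal illustration rather than the procedure evaluated in [48]: the function names, the quantization step of 0.1, and the noise amplitude are assumptions chosen for readability.

```python
import random


def quantize(value: float, step: float) -> float:
    """Round a sensor reading to the nearest multiple of `step`."""
    return round(value / step) * step


def add_noise(value: float, amplitude: float, rng: random.Random) -> float:
    """Add uniform noise drawn from [-amplitude, +amplitude]."""
    return value + rng.uniform(-amplitude, amplitude)


def sanitize_reading(reading, step=0.1, amplitude=0.0, rng=None):
    """Apply noise and then rounding to every axis of a reading.

    `step` and `amplitude` are hypothetical parameters: larger values
    hide more of the per-device calibration detail but also reduce
    the precision that legitimate pages receive.
    """
    rng = rng or random.Random()
    return tuple(quantize(add_noise(axis, amplitude, rng), step) for axis in reading)


# Example: a three-axis accelerometer reading in m/s^2.
print(sanitize_reading((0.0312, 9.8123, -0.4571)))
```

Coarser values leave fewer low-order digits from which calibration parameters could be inferred, at the cost of some measurement precision.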

3.1.2. Measurement of User Browsers on a Large Scale

To determine how unique each browser is with respect to tracking, researchers operate webpages with tracking and fingerprinting scripts that collect browser environments and fingerprint elements and measure their entropy. The collected data can also help identify tracker code. We introduce Panopticlick, AmIUnique, and Hiding in the Crowd, which are representative studies.

Panopticlick is a project launched by the Electronic Frontier Foundation (EFF) that provides information about the user’s browser and basic knowledge of tracking and fingerprinting [49]. Using that site, Eckersley et al. [50] collected fingerprints from 470,161 browser samples and found that, on average, only about 1 in 286,777 other browsers would share the same fingerprint as a given browser.
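A "1 in N" figure of this kind can be converted into bits of identifying information using the standard relation I = log2(N). The short sketch below applies that conversion to the figure quoted above. The helper names are hypothetical, and the calculation is a generic information-theoretic illustration rather than a reproduction of the analysis in [50].

```python
import math


def surprisal_bits(one_in_n: float) -> float:
    """Bits of identifying information when 1 in `one_in_n` browsers match."""
    return math.log2(one_in_n)


def matching_population(bits: float) -> float:
    """Inverse relation: population size among which one match is expected."""
    return 2.0 ** bits


bits = surprisal_bits(286_777)
print(f"{bits:.2f} bits")  # roughly 18.1 bits
```

Each additional bit halves the expected number of browsers that share the fingerprint, which is why seemingly minor attributes add up quickly.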

AmIUnique can measure elements of the user’s browser, including the AdBlock filter list, HTTP headers, WebGL, and canvas. It provides fingerprinting statistics to users and visualizes, as a percentage, how unique the user is for each element [51]. Using that site, Laperdrix et al. [52] analyzed 118,934 fingerprints and confirmed that 89.4% of them were unique.

Gómez-Boix et al. [53] inserted tracking scripts into a popular website, collected 2,067,942 browser fingerprints over six months, and compared the results with the previous studies based on Panopticlick and AmIUnique. In their analysis, 62% of mobile device fingerprints had unique canvas values, while 30% of desktop fingerprints had unique plugin combinations. These results indicate that the device’s own features significantly shape mobile fingerprints, whereas user customization has a major influence on desktop fingerprints. Studies based on Panopticlick and AmIUnique gather fingerprints only from users who choose to visit those sites, which could bias their results. By collecting fingerprints at large scale regardless of user intent, that study found that only 33.6% of fingerprints were unique, whereas previous studies had reported more than 80%. The findings show that browser fingerprinting may not be effective on a very large scale, but it remains an important factor in identifying users on a small scale. Future research needs to prepare for new fingerprinting techniques that use extension fingerprinting and newly introduced APIs, and to develop methods that minimize user bias.
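The "percentage of unique fingerprints" reported by these studies can be computed from a collection of fingerprints as sketched below. This is a simplified illustration under the assumption that each fingerprint is represented as a hashable tuple of attribute values. The attribute choice and function name are hypothetical and do not reflect the exact methodology of [50,52,53].

```python
from collections import Counter


def uniqueness_report(fingerprints):
    """Summarize how identifying a collection of fingerprints is.

    `fingerprints` is a list of hashable values, for example tuples of
    (user-agent, canvas hash, plugin list). A fingerprint is "unique"
    when exactly one browser in the sample produced it.
    """
    if not fingerprints:
        raise ValueError("at least one fingerprint is required")
    counts = Counter(fingerprints)
    unique = sum(1 for count in counts.values() if count == 1)
    return {
        "total": len(fingerprints),
        "unique_fraction": unique / len(fingerprints),
        "largest_anonymity_set": max(counts.values()),
    }


sample = [
    ("Firefox", "canvas-a"),
    ("Firefox", "canvas-a"),
    ("Chrome", "canvas-b"),
    ("Safari", "canvas-c"),
]
print(uniqueness_report(sample))
```

Because the unique fraction depends on who contributes to the sample, the same computation can yield very different values for a self-selected audience and for the visitors of a general-purpose website.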

3.2. Tracking Prevention

Many tracking prevention tools block network requests from webpages that match URL-based filter lists such as EasyList and EasyPrivacy. These lists are built from manual reports, which has the advantage of quick detection with minimal resources. Additionally, there are variants of lists such as EasyList that are optimized for each country’s Internet environment, and users can even create their own filters. However, because the limited coverage of filter lists makes missed detections likely, we introduce studies that train machine learning classifiers to increase detection rates and automate filter list creation, as well as studies that render fingerprinting meaningless by randomizing the values of fingerprinting elements or by making browser environments indistinguishable from one another.

3.2.1. Mainly Used Tracking Prevention Solution

To prevent tracking, many browsers and extensions block network requests directed to tracker URLs using filter lists. Brave enhances privacy by randomly generating return values for APIs commonly used for fingerprinting. Using the FPRandom method, trackers receive consistent fingerprinting results within the same session but receive varying values when attempting to fingerprint across different websites [54].
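The property described above, stable values within a session but different values across websites, can be obtained by deriving the randomness from a per-session secret and the site name. The sketch below illustrates only this idea and is not Brave’s or FPRandom’s actual implementation: the class name, the use of HMAC-SHA256 as the derivation function, and the choice to flip the lowest bit of each value are all assumptions.

```python
import hashlib
import hmac
import random
import secrets
from typing import List, Optional


class SessionSiteRandomizer:
    """Perturb fingerprintable values deterministically per (session, site)."""

    def __init__(self, session_key: Optional[bytes] = None) -> None:
        # A fresh random key per browsing session (hypothetical design).
        self._session_key = session_key or secrets.token_bytes(32)

    def perturb(self, site: str, values: List[int]) -> List[int]:
        # Derive a site-specific seed from the session key.
        seed = hmac.new(self._session_key, site.encode("utf-8"), hashlib.sha256).digest()
        rng = random.Random(int.from_bytes(seed, "big"))
        # Flip the lowest bit of some values, e.g. canvas pixel channels.
        return [value ^ rng.getrandbits(1) for value in values]


randomizer = SessionSiteRandomizer()
pixels = list(range(16))
print(randomizer.perturb("news.example", pixels))
print(randomizer.perturb("shop.example", pixels))
```

A script that reads the values twice on the same site sees identical output, so averaging cannot remove the noise, while two sites cannot link the user by comparing the values they observe.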

Chrome offers cross-site cookie blocking, but users need to install extensions like AdBlock, uBlock, or CanvasBlocker to block tracking, advertising, and fingerprinting. The mobile version of Chrome is not as feature-rich as the desktop version. Specifically, neither advertising and fingerprinting blocking nor the ability to install extensions is available on the mobile version.

Edge uses Disconnect’s blocking list to block trackers and advertisements. Through this list, it blocks sites known to perform tracking, advertising, and fingerprinting. To block fingerprinting APIs beyond the known list, the user must install an extension. On the mobile version, users can enable AdBlock Plus to block advertising [55].

Firefox, like Edge, uses Disconnect’s blocking list to block trackers, advertising, and fingerprinting. Its mobile version allows installation of additional advertising and tracking blockers, such as uBlock Origin, Ghostery, and Privacy Badger. Edge and Firefox block fingerprinting sites using Disconnect’s block list, unlike Brave and Tor, which directly modify the browser’s JavaScript APIs. Disconnect distinguishes fingerprinting from non-listed sites by the domain name.

Safari uses Intelligent Tracking Prevention (ITP), which leverages machine learning to detect and prevent cross-site tracking by analyzing user interactions on the Internet. Safari also uses DuckDuckGo’s Tracker Radar list to reveal domains that perform tracking. Furthermore, Safari not only blocks cookie-based tracking but also provides measures against fingerprinting by presenting a simplified system configuration, making it harder for trackers to identify user devices. Safari also avoids the use of custom tracking headers and unique identifiers in Web requests [56].

Samsung Internet offers Smart Anti-Tracking, similar to Safari’s ITP, using machine learning to identify and remove tracking cookies. Additionally, users can add ad-blocking lists such as AdBlock or Disconnect [57].

The Tor browser uses the Tor network to protect personal information and anonymity. Traffic passes through randomly selected nodes running Tor routers around the world before arriving at its destination, which prevents third parties from tracking Internet activity [58]. Brave also supports Tor network access. Although accessing the Web in this way enhances security, connections are very slow because they pass through multiple nodes, which limits the use of the Tor network for everyday purposes. In addition, to prevent tracking, the Tor browser incorporates the NoScript extension to block unnecessary scripts [59]. However, its developers do not recommend installing additional extensions, including advertising blockers (e.g., AdBlock), because doing so can make the browser’s fingerprint unique. The developers have also designed the browser to give all users the same fingerprint in order to prevent fingerprinting. As a result, it is difficult for trackers to extract a unique fingerprint for each user [60].

Finally, most advertising and tracking blockers compare requested and returned resources against filter lists and block advertising and tracking content before the browser renders it. To block content, extensions block requests to URLs listed in filter lists or use CSS rules that hide elements to make ads invisible to users [61].
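The two mechanisms described above, request blocking and element hiding, can be illustrated with a small matcher for a subset of the Adblock Plus filter syntax. The sketch handles only domain-anchor rules of the form `||domain^`, element-hiding rules of the form `##selector`, and `!` comments. Real filter lists contain many more rule types, and the domains, selectors, and function names below are hypothetical.

```python
from urllib.parse import urlsplit


def parse_rules(lines):
    """Split filter-list lines into blocked domains and CSS hiding selectors."""
    blocked_domains, hidden_selectors = set(), []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):  # blank line or comment
            continue
        if line.startswith("##"):  # element-hiding rule
            hidden_selectors.append(line[2:])
        elif line.startswith("||") and line.endswith("^"):  # domain anchor
            blocked_domains.add(line[2:-1].lower())
    return blocked_domains, hidden_selectors


def should_block(url, blocked_domains):
    """Return True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    return any(".".join(labels[i:]) in blocked_domains for i in range(len(labels)))


rules = ["! example list", "||tracker.example^", "##.ad-banner"]
domains, selectors = parse_rules(rules)
print(should_block("https://cdn.tracker.example/pixel.gif", domains))
print(selectors)
```

Request blocking prevents the resource from being fetched at all, whereas the hiding selectors would be injected as a stylesheet that sets matching elements to `display: none`.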

Meanwhile, Privacy Badger sends Do Not Track and Global Privacy Control signals and then uses heuristics to block domains that still collect unique identifiers, thereby blocking third-party trackers, cookies, and canvas fingerprinting [62]. Merzdovnik et al. [63] showed that filter list-based tools (e.g., AdBlock Plus, Disconnect, Ghostery, uBlock Origin) had a timeout rate of less than 10% when loading a website, while Privacy Badger had a better blocking rate than some filter list-based extensions. In other words, it is not possible to block all trackers using such tools, and there is a trade-off: as the blocking rate increases, so does the inconvenience felt by users.

3.2.2. Adding Noise or Randomization

JavaScript APIs that output multimedia elements can vary based on the user’s hardware and environment settings, making it possible for trackers to extract a unique fingerprint through these APIs. Therefore, to thwart fingerprinting, one can incorporate a random value into the output or employ a technique that shows a non-specific value. This makes it impossible to accurately depict the real user’s environment through the direct or indirect output of the API.

Laperdrix et al. [39,40] developed Blink and FPRandom. Blink automatically reconfigures user platforms consisting of diversified platform components (DPCs) such as font, plugins, browsers, operating system, and CPU architecture, and uses virtual machines for this purpose. The disadvantage of Blink is that it relies on components that occupy a large amount of disk space, and it can be difficult to use virtual machines on low-performance computers. FPRandom modifies Firefox’s code and adds a random value to the value derived by the browser function to deliver a different return value between each browsing session. Plugins and font lists were two main attributes of identifying devices, but researchers considered canvas, AudioContext, and JavaScript property orders due to changes in fingerprinting techniques at the time.

FaizKhademi et al. [38] developed FPGuard by modifying the Chromium browser to randomize elements like canvas and font, which can be used for fingerprinting. In addition, since each API has its own characteristics, researchers proposed a randomization method that fitted the characteristics of APIs.

Nikiforakis et al. [42] developed PriVaricator. To randomize values realistically, PriVaricator uses methods such as selecting a random number within an appropriate numerical range or adding in-range noise to the original value. In this way, the researchers proposed a randomization process that minimizes site breakage and returns more realistic values.

Baumann et al. [41] developed Disguised Chromium Browser. It complements the methods used in FPRandom and PriVaricator (which continuously disable or randomize system and browser parameters): the values of elements change from session to session, features are randomized to more realistic values, and continuous changes to system parameters are avoided. Among browser history, navigator, and JavaScript APIs, it considers elements such as canvas, user-agent, and screen size.

3.2.3. Making Non-Unique Fingerprints

This approach aims to standardize user fingerprints by directly modifying the outputs of APIs that can identify users uniquely, or by providing replacement APIs, thus reducing subtle differences between users. Identification becomes difficult because many users present equivalent fingerprints to trackers. Wu et al. [16] proposed UniGL, which makes the rendering process uniform by rewriting OpenGL Shading Language (GLSL) programs while preserving WebGL functionality. WebGL fingerprinting is difficult to defend against because each browser may produce different rendering results for programs written in GLSL. The researchers identified that floating-point operations in WebGL caused these inconsistencies. To address this issue, they redefined all floating-point operations by implementing the vertex shader in JavaScript, which allowed all floating-point operations to be performed by the JavaScript interpreter and the CPU without inconsistency. However, this method has a limitation: it remains vulnerable to timing side-channel attacks, a fingerprinting technique that measures WebGL execution time. Additionally, trackers can still perform fingerprinting through WebGL meta-information.

3.2.4. Tracking Detection Using Machine Learning

Advertising and tracking services hinder domain detection by using sophisticated tracking methods, such as distributing JavaScript code across multiple files, making it difficult for EasyList-based blocking tools to detect them. This has led to the need for machine learning techniques to be utilized in tracking detection, as suggested by recent studies [18,43,64].

Iqbal et al. [17,45] developed AdGraph and FP-Inspector to identify and block tracking using machine learning. AdGraph uses a Chromium-based Web browser to extract HTML, network, and JavaScript data, generates graphs from them, and performs tracking detection and blocking with a random forest classifier.

Siby et al. [18] developed WebGraph as an improvement on AdGraph. Because AdGraph is vulnerable to current evasion techniques, the researchers proposed a more robust ML-based method for advertising and tracking prevention. Unlike AdGraph’s content-based detection, WebGraph uses action-based detection, which is more difficult to evade through obfuscation. WebGraph builds graphs of actions and extracts flow features. The limitation of that work is that it focused on only a limited subset of the browser APIs used for advertising and tracking, such as HTTP cookie headers, document.cookie, and window.localStorage, and did not consider other potential techniques.

FP-Inspector uses OpenWPM to extract information through static and dynamic analysis and trains a classifier on the fingerprinting behaviors of scripts. The static analysis transforms a script into an Abstract Syntax Tree (AST), extracts features, and applies machine learning. The researchers also proposed a dynamic analysis that captures and analyzes the code after it has been executed in an instrumented browser, which makes it possible to unpack scripts that use techniques such as eval or Function. FP-Inspector then extracts, from the code represented by the AST, the JavaScript API features that trackers can use for fingerprinting. Dynamic analysis thus helps to analyze elements that static analysis cannot handle, such as obfuscated code. The extracted features are used to train a decision tree-based classifier.

Yang et al. [43] proposed a graph neural network (GNN)-based detection method. The movement of resources between the browser and the server can be expressed as a graph, and the domain name and URL become properties of nodes and edges. This allows them to graph multiple HTTP requests, not just individual webpages. The GNN can then learn both explicit and implicit features of Web tracking and advertising requests. Unlike existing AdGraph and WebGraph studies, which only use explicit features and train a random forest model for single HTTP requests, the GNN-based method has higher detection rates.
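The graph representation that these approaches start from can be sketched as follows. The example only builds a request graph and derives two simple node features. It does not train a random forest or a GNN, and the input format (pairs of initiator and target URLs), the function names, and the domains are assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def build_request_graph(requests):
    """Build edges keyed by (initiator host, target host).

    `requests` is an iterable of (initiator_url, target_url) pairs.
    Each edge keeps the requested URLs as its attribute.
    """
    edges = defaultdict(list)
    for initiator_url, target_url in requests:
        source = urlsplit(initiator_url).hostname
        target = urlsplit(target_url).hostname
        edges[(source, target)].append(target_url)
    return dict(edges)


def node_degrees(edges):
    """Count outgoing and incoming requests per host."""
    out_degree, in_degree = defaultdict(int), defaultdict(int)
    for (source, target), urls in edges.items():
        out_degree[source] += len(urls)
        in_degree[target] += len(urls)
    return dict(out_degree), dict(in_degree)


requests = [
    ("https://news.example/", "https://cdn.example/app.js"),
    ("https://cdn.example/app.js", "https://tracker.example/p?id=1"),
    ("https://news.example/", "https://tracker.example/p?id=2"),
]
graph = build_request_graph(requests)
print(node_degrees(graph))
```

Features such as these degrees would be one possible input to a classifier, while a GNN would instead operate on the graph structure directly.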

3.2.5. JavaScript Dynamic Taint Analysis

Dynamic taint analysis (DTA) is a technique that marks particular program inputs as tainted and tracks how the taint propagates to other values [65]. If a script initiates a Web request that carries tainted data (a fingerprint), DTA can block this request. Li et al. developed FPFlow [19], which monitors taint propagation in JavaScript objects during webpage visits and intercepts related requests. Research using FPFlow confirmed that 66.6% of the top 10,000 websites on Tranco used fingerprinting. In the future, taint analysis should expand its capabilities to also track implicit data flows and detect fingerprinting that does not rely on API returns, such as WebRTC or JavaScript font probing.
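The core idea of taint propagation can be shown with a small Python analogy. FPFlow itself instruments JavaScript inside the browser, so the sketch below is not its implementation: the `Tainted` class, the `send_request` function, and the URL are hypothetical, and only explicit flows through string concatenation are tracked.

```python
class Tainted(str):
    """A string that remembers which fingerprinting source it came from."""

    def __new__(cls, value, source):
        obj = super().__new__(cls, value)
        obj.source = source
        return obj

    def __add__(self, other):
        # tainted + anything stays tainted
        return Tainted(str.__add__(self, other), self.source)

    def __radd__(self, other):
        # anything + tainted stays tainted
        return Tainted(str.__add__(other, self), self.source)


def send_request(url, body):
    """Refuse to send a body that carries taint (the sink check)."""
    if isinstance(body, Tainted):
        return f"BLOCKED: body derived from {body.source}"
    return f"SENT to {url}"


# A value read from a fingerprintable API is marked at the source.
resolution = Tainted("1920x1080", source="screen")
payload = "fp=" + resolution + "&v=1"  # taint survives concatenation

print(send_request("https://collector.example/", payload))
print(send_request("https://collector.example/", "hello"))
```

Operations that this class does not override, such as slicing or string formatting, return plain strings and silently drop the taint. This is analogous to the missed data flows that the text identifies as future work.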

3.3. Challenges

This section outlines the difficulties encountered by detection and prevention methods. Trackers employ page cloaking or bot detection to impede researchers from gathering webpages, leading to an ongoing battle to circumvent these techniques. Furthermore, we examine the limited browsing experience due to tracking prevention measures, evaluate the efficacy of existing studies, and propose a solution to each challenge.

3.3.1. Page Cloaking

Several studies, including those by Krumnow et al. and Goßen et al. [66,67], have improved OpenWPM’s capabilities for crawling sites. However, if a site has a tracking script, the script may detect the crawler and hinder its normal detection abilities, particularly on phishing sites. Zhang et al. [68] revealed that page cloaking uses a combination of user interaction, bot behavior, and fingerprinting technology. Here, fingerprinting involves checking cookies, the referrer, and the user-agent to determine whether the visitor is a person or an anti-phishing bot. To overcome this challenge, the study proposes a method that detects cloaking by comparing the visual similarity of websites and analyzing code structure characteristics based on API calls and abstract syntax trees.

To address the limitations of the previously discussed page cloaking, several improvements can be proposed. First, selective page cloaking can be implemented instead of full-scale cloaking. This approach applies context-specific cloaking that is triggered only for known bots or trackers. For instance, when a specific fingerprint matching common tracker characteristics is detected, a decoy page can be displayed. Second, dynamic page generation can be used to modify page elements each time the page is loaded, making it difficult for trackers to gather consistent data over multiple visits. This might involve subtle changes to the HTML structure, CSS, or JavaScript code to confuse scrapers and trackers. Finally, content decoy mechanisms can introduce a tracking decoy system that simulates fake user activities (e.g., clicks, scrolls) alongside cloaking. By introducing false behavioral data, this approach overwhelms trackers with misleading information, rendering their tracking efforts inefficient.

3.3.2. Bot Detection

Krumnow et al. [66] analyzed the reliability of OpenWPM and found that scripts checking for the presence of display and wrapper functions could detect OpenWPM as a bot, and that at least 16.7% of sites in the Tranco Top 100k recognized Selenium and OpenWPM. Vastel et al. [13] suggested that crawler detection could use fingerprinting. Accordingly, they analyzed the scripts of the Alexa 10K sites, modified the properties of a Web bot step by step, and checked the bot detection rate of each website. They confirmed that changing the attributes to match those of an actual browser decreased the detection rate. Goßen et al. [67] introduced the concept of the JavaScript template attack, which allows trackers to detect bots and can conversely help researchers build a more robust crawler. JavaScript accesses all child and parent objects through the prototype chain. In the DOM structure, the window object sits at the top, and the idea of the JavaScript template attack is to enumerate all existing properties by traversing from window. By comparing the differences between a general browser environment and a Web bot environment, the attack can identify potential bots. However, despite the solutions proposed by researchers, the increased execution time required to crawl sites while detecting sophisticated bot detection code remains a challenge.

To address the limitations of the previously mentioned bot detection methods, several improvements can be proposed. First, AI-driven behavioral analytics: Advanced machine learning models that analyze subtle user behaviors (e.g., mouse movements, interaction timing) can be used to detect bots. AI can identify patterns that differentiate legitimate users from bots, while minimizing false positives. Second, botnet fingerprinting: A botnet detection system can be employed to identify common infrastructures or behaviors shared by known tracking bots or Web crawlers. This would enhance the detection of coordinated tracking attempts across multiple sites. Lastly, CAPTCHA alternatives: Less intrusive and passive bot detection methods, such as behavioral biometrics that measure unique human behaviors (e.g., typing rhythm), can be implemented without requiring explicit user interaction.
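The template attack described above amounts to enumerating every reachable property of an environment and diffing the result against a reference environment. The sketch below shows that comparison on nested dictionaries that stand in for the JavaScript window object. The real attack traverses live browser objects and their prototype chains, and the property `automationMarker` and the function names are hypothetical.

```python
def collect_paths(obj, prefix="window", seen=None):
    """Map every reachable property path to a summary of its value."""
    seen = set() if seen is None else seen
    paths = {}
    if isinstance(obj, dict):
        if id(obj) in seen:  # guard against cyclic references
            return paths
        seen.add(id(obj))
        for key, value in obj.items():
            path = f"{prefix}.{key}"
            paths[path] = "object" if isinstance(value, dict) else repr(value)
            paths.update(collect_paths(value, path, seen))
    return paths


def diff_environments(baseline, candidate):
    """List properties that were added, removed, or changed in `candidate`."""
    base, cand = collect_paths(baseline), collect_paths(candidate)
    shared = set(base) & set(cand)
    return {
        "added": sorted(set(cand) - set(base)),
        "removed": sorted(set(base) - set(cand)),
        "changed": sorted(path for path in shared if base[path] != cand[path]),
    }


regular_browser = {"navigator": {"webdriver": False, "plugins": {"length": 3}}}
automated_browser = {
    "navigator": {"webdriver": True, "plugins": {"length": 3}},
    "automationMarker": {},
}
print(diff_environments(regular_browser, automated_browser))
```

Any non-empty difference is a candidate signal for bot detection, and the same difference tells crawler developers which properties they would need to normalize.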

3.3.3. Effectiveness (Blocking and Preventing Tracking)

Eckersley et al. [50] argued that advancements in technology made personal information more susceptible to fingerprinting. They provided an example of how trackers could easily track browsers with installed plugins, even if a user changed the value of fingerprintable elements like the user-agent or canvas or emptied the list of plugins. Most browsers have one or more plugins, making them easy targets for fingerprinting. Vastel et al. [69] developed FP-Scanner to detect the spoofed elements introduced by existing countermeasures against fingerprinting. With this tool, they confirmed mismatches among fingerprinting attributes, which showed that trackers could recover the original values of key attributes. For instance, when a Firefox user on Windows uses Random Agent Spoofer (RAS) to pose as Safari, trackers can easily recognize the spoofing because the agent information changes but elements like the OS or fonts remain the same. Therefore, more comprehensive countermeasures, such as randomizing browser elements to realistic values or to different values for each session, will be necessary in addition to further research.

To address the limitations in effectiveness mentioned above, several recommendations can be made. First, holistic tracker detection should be implemented by combining multiple detection layers, including DNS-based filtering, behavior blocking, and fingerprint resistance. A detection system can be deployed to flag suspicious third-party requests or tracking pixels when they are dynamically loaded. Second, cross-layer protection: Protection should be integrated across the browser, network, and device levels to prevent reliance on a single layer (e.g., cookies) for tracking. For instance, encrypted DNS (DNS-over-HTTPS) combined with network-level traffic analysis can stop DNS-based trackers from bypassing other blockers. Lastly, enhanced user education: Tools should be made more intuitive while offering better insights into what is being blocked. For example, integrating a “privacy score” can help users understand their overall risk level, enabling them to adjust their behavior or tools accordingly.
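The kind of attribute mismatch that reveals a spoofed user-agent can be illustrated with a few consistency rules. The sketch below is a simplified heuristic for desktop browsers and not the test suite of FP-Scanner [69]: the lookup tables, the choice of fonts as operating system hints, and the function names are assumptions, and fonts in particular can be installed on any platform.

```python
# Hypothetical lookup tables for desktop operating systems.
UA_OS_TOKENS = {"Windows NT": "Windows", "Mac OS X": "macOS", "Linux": "Linux"}
PLATFORM_PREFIXES = {"Win": "Windows", "Mac": "macOS", "Linux": "Linux"}
OS_HINT_FONTS = {"Segoe UI": "Windows", "Helvetica Neue": "macOS"}


def claimed_os(user_agent):
    """Operating system announced in the user-agent string, if recognized."""
    for token, os_name in UA_OS_TOKENS.items():
        if token in user_agent:
            return os_name
    return None


def platform_os(platform):
    """Operating system implied by a navigator.platform value, if recognized."""
    for prefix, os_name in PLATFORM_PREFIXES.items():
        if platform.startswith(prefix):
            return os_name
    return None


def find_inconsistencies(fingerprint):
    """Return human-readable mismatches between fingerprint attributes."""
    issues = []
    claimed = claimed_os(fingerprint.get("userAgent", ""))
    actual = platform_os(fingerprint.get("platform", ""))
    if claimed and actual and claimed != actual:
        issues.append(f"userAgent claims {claimed} but platform reports {actual}")
    for font in fingerprint.get("fonts", []):
        hinted = OS_HINT_FONTS.get(font)
        if claimed and hinted and hinted != claimed:
            issues.append(f"font {font!r} suggests {hinted}, not {claimed}")
    return issues


spoofed = {
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15",
    "platform": "Win32",
    "fonts": ["Arial", "Segoe UI"],
}
print(find_inconsistencies(spoofed))
```

A countermeasure that changes only one attribute fails such checks, which is why the text argues for changing related attributes together and to realistic values.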

3.3.4. Arms Race to Produce Filter Lists against Trackers

Filter list-based detection and prevention methods have limited coverage, and their effectiveness varies with the expertise of the person writing the rules, making them prone to false negatives. The process is often manual and relies on experts or crowds to generate rules. However, as there are many simple ways to bypass these measures, the competition between detection and circumvention remains ongoing, with each side constantly updating its methods [70]. Advertising-based businesses struggle with declining profits due to the widespread use of ad blockers. To combat this, advertising publishers have created Anti-Ad-Block, which detects and bypasses ad blockers by obscuring site content or making advertisements indistinguishable from normal content.

To solve this problem, Shafiq et al. [44] developed CV-Inspector, a machine learning tool for detecting circumvention of ad-blocking. It analyzes the Anti-Circumvention Filter List (ACVL), a list designed to counter circumvention techniques and complement EasyList with additional filter rules and grammar. CV-Inspector identifies websites that bypass ad-blocking using the ACVL. However, its limitation is that it only detects automatic circumvention and cannot identify circumvention that requires user interaction, such as redirecting to an advertisement before displaying actual content.

On the other hand, AdGraph provides a graph-based detection method that uses machine learning to assist in creating filter lists. It can automate the creation of lists for websites with low traffic or in specific languages and regularly audit the lists to update obsolete rules. This is necessary because websites frequently update, and new ones emerge.

3.3.5. Limiting Browsing Experience

The Tor browser’s tracking prevention methods block tracking effectively, but they slow down performance compared to other browsers and restrict browser elements. According to the official documentation [71], the Tor browser regulates various properties, such as plugins, canvas, WebGL, fonts, and screen resolution. These restrictions can degrade the browsing experience, so users may use the Tor browser only as a secondary browser in certain situations. In addition, blocking fingerprinting often causes pages to malfunction or render improperly: because essential code and tracking code are combined on webpages, blocking or altering fingerprintable APIs can prevent normal content from being displayed. To mitigate this issue, Smith et al. [72] proposed a solution using dynamic analysis to detect areas where APIs can reveal sensitive information in JavaScript code and suggested reducing page breakage by patching privacy-threatening areas through static code analysis.

4. Future Research Directions

This section outlines future research prospects. As tracking entities develop new methods to adapt to the evolving Web environment, it is essential to continuously explore strategies to counter these advancements. A comprehensive approach to preventing Web tracking should incorporate multiple techniques, including blocking tracking cookies and scripts, utilizing AI-based detection systems to identify and respond to sophisticated tracking behaviors in real time, and implementing multi-layer defenses at the browser, network, and device levels. This layered approach significantly enhances the system’s ability to block or mitigate tracking attempts, providing users with robust privacy protection while ensuring adaptability to evolving tracking technologies.

4.1. Evolution of Web Tracking Technology

Karami et al. [73] presented a method that could automatically identify the extensions installed by users. The method not only detected modifications to web-accessible resources (WAR) or to the DOM of a webpage but also detected requests for additional resources and communication between an extension’s content scripts and webpages. Senol et al. [74] investigated cases among the top 100,000 websites in which third-party scripts collected email addresses even if users left the site without submitting their email and password at login. Laor et al. [75] introduced a GPU fingerprinting technique that uses JavaScript to generate unique device signatures based on variations in speed among the GPU’s execution units. As tracking methods that use Flash and plugins become obsolete and are replaced by JavaScript API-based fingerprinting, the competition to overcome the challenges mentioned earlier will continue, and new tracking techniques will emerge.

4.2. Combine Strategies for Layered Defense

The effectiveness of Web tracking prevention can be greatly enhanced by adopting a multi-layered defense strategy. This approach acknowledges that no single method can block all forms of tracking. Trackers employ a variety of techniques, including cookies, fingerprinting, and behavioral analysis, each targeting different aspects of user interaction. By combining multiple defense mechanisms such as DNS-based blocking, JavaScript blocking, and fingerprinting prevention measures, a robust and comprehensive protective layer is established. Each component addresses specific vulnerabilities, ensuring that if one defense is bypassed, others remain intact. For example, DNS-based filtering blocks connections to known tracking domains, while JavaScript blocking disables scripts that could initiate tracking. Implementing a layered system across the browser, network, and device levels minimizes potential vulnerabilities and creates an effective barrier against tracking attempts.

4.3. Leverage AI and Machine Learning for Detection

In the constantly evolving Web tracking landscape, trackers frequently adopt new and sophisticated evasion techniques that diminish the effectiveness of traditional blocklists. A promising solution is to integrate AI and machine learning (ML) models, rather than relying solely on domain-specific blocklists, to detect tracking behaviors. By analyzing patterns in network traffic, script execution, and browser interactions, ML models can identify abnormal behaviors indicative of tracking, even when trackers employ new or dynamically generated techniques. Additionally, ML models can continuously learn from real-time data, enhancing their ability to detect and block emerging tracking strategies. Adversarial machine learning techniques can also be utilized to predict future evasion tactics, enabling the system to more effectively counter upcoming threats. These AI-driven systems provide adaptive and dynamic protection, ensuring users are safeguarded from previously unknown tracking methods.
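As a rough illustration of behavior-based detection, the following Python sketch classifies a script by its nearest class centroid in a small feature space. It is deliberately much simpler than the graph-based and neural approaches surveyed earlier; the three features, the training samples, and the labels are invented for illustration and are not measurements from any study.

```python
from math import dist
from statistics import fmean
from typing import Dict, List, Tuple

# Assumed per-script features: (canvas API calls, font probes, third-party requests).
Features = Tuple[float, float, float]

# Made-up training data, for illustration only.
TRAINING: List[Tuple[Features, str]] = [
    ((12.0, 40.0, 9.0), "tracker"),
    ((8.0, 55.0, 6.0), "tracker"),
    ((0.0, 1.0, 1.0), "benign"),
    ((1.0, 0.0, 2.0), "benign"),
]


def fit_centroids(samples: List[Tuple[Features, str]]) -> Dict[str, Features]:
    """Average the feature vectors of each class to obtain one centroid per label."""
    by_label: Dict[str, List[Features]] = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def classify(features: Features, centroids: Dict[str, Features]) -> str:
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(features, centroids[label]))


centroids = fit_centroids(TRAINING)
```

Because the decision depends on observed behavior rather than on the script's domain, a tracker that moves to a new, unlisted domain would still be scored on the same features. A production system would need far richer features and a properly evaluated model.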

4.4. Holistic Tracker Detection Across Browser, Network, and Device Levels

Web tracking often functions across multiple layers, exploiting browser-level vulnerabilities, network-based identifiers (e.g., IP addresses), and device-specific characteristics like screen resolution or installed fonts. To ensure comprehensive protection against tracking, it is essential to adopt a holistic approach that safeguards users at all of these levels. At the browser level, advanced techniques such as fingerprint obfuscation and cookie isolation can prevent trackers from collecting identifying information. At the network level, encrypted DNS and VPNs can anonymize the user’s location and network activity, further concealing their identity. Additionally, at the device level, periodically randomizing or obfuscating parameters like screen resolution or hardware information can disrupt device fingerprinting. By integrating protection across all these levels, this approach creates synergy between them, making it significantly more difficult for trackers to gather meaningful data from users.
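One possible reading of the device-level measure is sketched below in Python: reported parameters are drawn from a pool of common values using a random generator seeded by the session, so they stay stable within a session but are not tied to the real hardware. The value pools, the `ReportedDevice` type, and the session identifier are assumptions made for this example, not a description of how any particular browser implements obfuscation.

```python
import hashlib
import random
from dataclasses import dataclass
from typing import Tuple

# Assumed pools of common values; reporting a common value helps the user blend in.
COMMON_RESOLUTIONS = [(1366, 768), (1920, 1080), (1536, 864), (1440, 900)]
COMMON_CORE_COUNTS = [2, 4, 8]


@dataclass(frozen=True)
class ReportedDevice:
    screen: Tuple[int, int]
    hardware_concurrency: int


def obfuscated_device(session_id: str) -> ReportedDevice:
    """Derive the values reported to scripts from the session, not the hardware."""
    # Seeding from a hash of the session id keeps values consistent within one
    # session, which avoids the inconsistencies that make spoofing detectable.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return ReportedDevice(
        screen=rng.choice(COMMON_RESOLUTIONS),
        hardware_concurrency=rng.choice(COMMON_CORE_COUNTS),
    )
```

Rotating the session identifier periodically changes the reported values, which is the "periodic randomization" the paragraph above describes.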

4.5. Automated Privacy Audits and Monitoring

Given the complexity of modern Web tracking technologies, it is crucial to continuously monitor and audit browser activities to detect hidden or emerging tracking methods. Automated privacy audits can provide real-time analysis of user interactions and network traffic, identifying potentially malicious or suspicious tracking behaviors. Privacy audit tools can flag abnormal activities by analyzing fingerprinting API executions, the use of tracking pixels, and the presence of cross-site requests. Additionally, anomaly detection algorithms can alert users to unexpected or privacy-threatening behaviors, allowing them to take action before their data are compromised. Automated penetration testing of the browser environment can also proactively identify vulnerabilities that trackers might exploit. This continuous monitoring system enhances privacy by providing dynamic, real-time insights into user security and enabling immediate threat mitigation.
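The kind of checks such an audit tool might run can be sketched in Python as follows. The `Event` record, the list of fingerprinting-related APIs, and the 1x1 pixel heuristic are assumptions chosen for this example; a real audit would work from instrumented browser logs and use more robust rules.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlsplit

# Illustrative subset of browser APIs that are often used for fingerprinting.
FINGERPRINTING_APIS = {
    "HTMLCanvasElement.toDataURL",
    "CanvasRenderingContext2D.getImageData",
}


@dataclass(frozen=True)
class Event:
    kind: str                     # "request" or "api_call"
    page_url: str                 # page the user is visiting
    target: str                   # requested URL, or name of the API called
    resource_type: str = ""       # for requests: "image", "script", ...
    width: Optional[int] = None   # for images: rendered size in pixels
    height: Optional[int] = None


def is_cross_site(page_url: str, target_url: str) -> bool:
    # Simplification: compares full hostnames. A real audit would compare
    # registrable domains, for example via the Public Suffix List.
    return urlsplit(page_url).hostname != urlsplit(target_url).hostname


def audit(events: List[Event]) -> List[str]:
    """Return one human-readable finding per suspicious event."""
    findings = []
    for ev in events:
        if ev.kind == "api_call" and ev.target in FINGERPRINTING_APIS:
            findings.append(f"fingerprinting API call: {ev.target}")
        elif ev.kind == "request" and is_cross_site(ev.page_url, ev.target):
            is_pixel = (
                ev.resource_type == "image"
                and ev.width is not None
                and ev.height is not None
                and ev.width <= 1
                and ev.height <= 1
            )
            label = "tracking pixel" if is_pixel else "cross-site request"
            findings.append(f"{label}: {urlsplit(ev.target).hostname}")
    return findings
```

Running `audit` continuously over a stream of events, rather than once over a stored log, would approximate the real-time monitoring described above.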

4.6. User Awareness and Control with Detailed Feedback

Dambra et al. [76] reported that 42.7% of Internet users worldwide have installed browser extensions to block advertisements. General users tend to have lower awareness of trackers compared to advertisements, underscoring the need for methods to raise awareness [77]. A key aspect of effective tracking prevention is providing users with clear and actionable feedback on their privacy settings and potential tracking attempts. By incorporating a user-friendly privacy dashboard, browsers or privacy tools can offer detailed insights into blocked trackers, cookies, and scripts, helping users understand which data are being collected and by whom. These dashboards can present metrics such as the number of blocked tracking attempts, cross-site data flows, and detected third-party scripts, allowing users to make informed decisions about their privacy. In addition to visual feedback, users should be given access to granular privacy controls, enabling them to tailor their level of protection. Offering various privacy presets (e.g., standard, strict, or custom) helps strike a balance between usability and security. With these controls, users can adjust their privacy settings to suit their browsing needs, ensuring a smooth user experience without compromising security.
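A minimal Python sketch of privacy presets and dashboard metrics is given below. The preset fields, their default values, and the `(category, domain)` log format are hypothetical and are meant only to show how presets and summary metrics could fit together.

```python
from collections import Counter
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Preset:
    block_trackers: bool
    block_third_party_cookies: bool
    randomize_fingerprint: bool


# Assumed built-in presets; "custom" is derived from one of these.
PRESETS: Dict[str, Preset] = {
    "standard": Preset(True, False, False),
    "strict": Preset(True, True, True),
}


def custom_preset(base: str, **overrides: bool) -> Preset:
    """Start from a named preset and override individual settings."""
    return replace(PRESETS[base], **overrides)


def dashboard_summary(blocked: List[Tuple[str, str]]) -> Dict[str, object]:
    """Summarize a log of (category, domain) pairs for blocked items."""
    by_category = Counter(category for category, _ in blocked)
    by_domain = Counter(domain for _, domain in blocked)
    return {
        "total_blocked": len(blocked),
        "by_category": dict(by_category),
        "top_domains": by_domain.most_common(3),
    }
```

Deriving the custom preset from a named base keeps unfamiliar settings at a sensible default while still exposing granular control to users who want it.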

5. Conclusions

Advancements in Web technology have led to increasingly diverse browser functions and a more sophisticated Web experience for users. As websites meticulously record user interactions, including the pages visited and the items clicked, the technologies employed to collect and track personal information have also evolved, intensifying the competition between tracking technologies and anti-tracking solutions. Given the difficulty of entirely avoiding Web tracking, researchers are actively developing methods to help users mitigate the security threats they may encounter. This paper introduces various Web tracking technologies that collect user information and provides an overview of the research and solutions proposed for the detection and prevention of tracking on the web. The paper reviews Web measurement tools and research conducted on both desktop and mobile platforms, as well as large-scale studies that measure user browser activities. It introduces techniques employed to prevent tracking, which primarily involve blocking network requests, adding random values to API outputs, or modifying APIs to standardize user fingerprints. Countermeasures against Web tracking focus on addressing the ongoing arms race between tracking techniques and anti-tracking tools. Web measurement frameworks like OpenWPM and FP-Radar help uncover tracking practices, while browsers and extensions use methods such as tracker/ad blocking, fingerprinting prevention, and cookie isolation. Machine learning plays an increasing role in detecting sophisticated tracking strategies, though challenges like page cloaking, bot detection, and limitations of current tools persist. Proposed research directions include AI-driven analytics, dynamic taint analysis, noise/randomization techniques, and cross-layer protection to enhance tracking prevention and safeguard user privacy. 
Effective Web tracking prevention requires a multi-layered approach that combines DNS and JavaScript blocking, fingerprinting prevention, and AI-driven detection to counter evolving tracking techniques. A holistic defense across browser, network, and device levels ensures comprehensive protection, while automated audits and user-friendly dashboards provide real-time feedback and control. Enhancing user awareness and offering customizable privacy settings empower users to better manage their privacy and safeguard against tracking attempts.

Author Contributions

Conceptualization, H.C. and K.S.; methodology, K.S.; software, H.C.; validation, H.C., K.S. and H.H.; formal analysis, K.S.; investigation, H.H.; resources, H.H.; data curation, K.S.; writing—original draft preparation, K.S. and H.H.; writing—review and editing, K.S. and H.C.; visualization, H.C.; project administration, H.C. and K.S.; funding acquisition, H.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant number RS-2024-00399389, funded by the Korea government (MSIT). The APC was funded by IITP.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Falahrastegar, M.; Haddadi, H.; Uhlig, S.; Mortier, R. Tracking Personal Identifiers Across the Web. In Proceedings of the Passive and Active Measurement: 17th International Conference, Fukuoka, Japan, 31 March–1 April 2016; Springer: Cham, Switzerland, 2016; pp. 230–241.
2. Castell-Uroz, I.; Solé-Pareta, J.; Barlet-Ros, P. TrackSign: Guided Web Tracking Discovery. In Proceedings of the IEEE Annual Joint Conference: INFOCOM, IEEE Computer and Communications Societies, Vancouver, BC, Canada, 10–13 May 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–10.
3. Mikians, J.; Gyarmati, L.; Erramilli, V.; Laoutaris, N. Detecting price and search discrimination on the internet. In Proceedings of the 11th ACM Workshop on Hot Topics in Networks, Redmond, WA, USA, 29–30 October 2012; pp. 79–84.
4. Mikians, J.; Gyarmati, L.; Erramilli, V.; Laoutaris, N. Crowd-assisted search for price discrimination in e-commerce: First results. In Proceedings of the Ninth ACM Conference on Emerging Networking Experiments and Technologies, Santa Barbara, CA, USA, 9–12 December 2013; pp. 1–6.
5. Mayer, J.R.; Mitchell, J.C. Third-Party Web Tracking: Policy and Technology. In Proceedings of the IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 20–23 May 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 413–427.
6. Bujlow, T.; Carela-Español, V.; Solé-Pareta, J.; Barlet-Ros, P. A Survey on Web Tracking: Mechanisms, Implications, and Defenses. Proc. IEEE 2017, 105, 1476–1510.
7. Königs, P. Government surveillance, privacy, and legitimacy. Philos. Technol. 2022, 35, 8.
8. Hannak, A.; Soeller, G.; Lazer, D.; Mislove, A.; Wilson, C. Measuring price discrimination and steering on e-commerce web sites. In Proceedings of the Conference on Internet Measurement Conference, Vancouver, BC, Canada, 5–7 November 2014; pp. 305–318.
9. Castell-Uroz, I.; Solé-Pareta, J.; Barlet-Ros, P. Network measurements for web tracking analysis and detection: A tutorial. IEEE Instrum. Meas. Mag. 2020, 23, 50–57.
10. Papadogiannakis, E.; Papadopoulos, P.; Kourtellis, N.; Markatos, E.P. User Tracking in the Post-cookie Era: How Websites Bypass GDPR Consent to Track Users. In Proceedings of the International World Wide Web Conference, Ljubljana, Slovenia, 19–23 April 2021; pp. 2130–2141.
11. Acar, G.; Juarez, M.; Nikiforakis, N.; Diaz, C.; Gürses, S.; Piessens, F.; Preneel, B. FPDetective: Dusting the Web for Fingerprinters. In Proceedings of the 2013 ACM SIGSAC Conference on Computer Communications Security (CCS ’13), Berlin, Germany, 4–8 November 2013; pp. 1129–1140.
12. Englehardt, S.; Narayanan, A. OpenWPM: An Automated Platform for Web Privacy Measurement. Proc. Priv. Enhancing Technol. 2016, 3, 28–42.
13. Vastel, A.; Rudametkin, W.; Rouvoy, R.; Blanc, X. FP-Crawlers: Studying the resilience of browser fingerprinting to block crawlers. In Proceedings of the MADWeb’20-NDSS Workshop on Measurements, Attacks, and Defenses for the Web, San Diego, CA, USA, 23 February 2020.
14. Bahrami, P.N.; Iqbal, U.; Shafiq, Z. FP-Radar: Longitudinal Measurement and Early Detection of Browser Fingerprinting. arXiv 2021, arXiv:2112.01662.
15. Cassel, D.; Lin, S.C.; Buraggina, A.; Wang, W.; Zhang, A.; Bauer, L.; Hsiao, H.C.; Jia, L.; Libert, T. OmniCrawl: Comprehensive Measurement of Web Tracking With Real Desktop and Mobile Browsers. Proc. Priv. Enhancing Technol. 2022, 2022, 227–252.
16. Wu, S.; Li, S.; Cao, Y.; Wang, N. Rendered Private: Making GLSL Execution Uniform to Prevent WebGL-based Browser Fingerprinting. In Proceedings of the 28th USENIX Security Symposium (USENIX Security 19), Santa Clara, CA, USA, 14–16 August 2019; pp. 1645–1660.
17. Iqbal, U.; Snyder, P.; Zhu, S.; Livshits, B.; Qian, Z.; Shafiq, Z. Adgraph: A graph-based approach to ad and tracker blocking. In Proceedings of the 2020 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 18–21 May 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 763–776.
18. Siby, S.; Iqbal, U.; Englehardt, S.; Shafiq, Z.; Troncoso, C. WebGraph: Capturing Advertising and Tracking Information Flows for Robust Blocking. In Proceedings of the 31st USENIX Security Symposium (USENIX Security 22), Boston, MA, USA, 10–12 August 2022; pp. 2875–2892.
19. Li, T.; Zheng, X.; Shen, K.; Han, X. FPFlow: Detect and Prevent Browser Fingerprinting with Dynamic Taint Analysis. In Proceedings of the China Cyber Security Annual Conference, Beijing, China, 20–21 July 2021; Springer: Singapore, 2021; pp. 51–67.
20. Johns, M. SessionSafe: Implementing XSS Immune Session Handling. In Proceedings of the European Symposium on Research in Computer Security, Hamburg, Germany, 18–20 September 2006.
21. Nikiforakis, N.; Meert, W.; Younan, Y.; Johns, M.; Joosen, W. SessionShield: Lightweight Protection against Session Hijacking. In Proceedings of the Engineering Secure Software and Systems, Madrid, Spain, 9–10 February 2011.
22. Pantelic, O.; Jovic, K.; Krstovic, S. Cookies implementation analysis and the impact on user privacy regarding GDPR and CCPA regulations. Sustainability 2022, 14, 5015.
23. Sipior, J.C.; Ward, B.T.; Mendoza, R.A. Online privacy concerns associated with cookies, flash cookies, and web beacons. J. Internet Commer. 2011, 10, 1–16.
24. Nikiforakis, N.; Kapravelos, A.; Joosen, W.; Kruegel, C.; Piessens, F.; Vigna, G. Cookieless Monster: Exploring the Ecosystem of Web-Based Device Fingerprinting. In Proceedings of the IEEE Symposium on Security and Privacy, Berkeley, CA, USA, 19–22 May 2013; pp. 541–555.
25. Yen, T.F.; Xie, Y.; Yu, F.; Yu, R.P.; Abadi, M. Host Fingerprinting and Tracking on the Web: Privacy and Security Implications. In Proceedings of the 19th Annual Network and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 5–8 February 2012; p. 66.
26. Sirinam, P.; Imani, M.; Juarez, M.; Wright, M. Deep Fingerprinting: Undermining Website Fingerprinting Defenses with Deep Learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Toronto, ON, Canada, 15–19 October 2018; pp. 1928–1943.
27. Mowery, K.; Shacham, H. Pixel perfect: Fingerprinting canvas in HTML5. In Proceedings of the W2SP, San Francisco, CA, USA, 20–23 May 2012.
28. Huang, Y.W.; Huang, S.K.; Lin, T.P.; Tsai, C.H. Web application security assessment by fault injection and behavior monitoring. In Proceedings of the 12th International Conference on World Wide Web, Budapest, Hungary, 20–24 May 2003; pp. 148–159.
29. Barth, A.; Felt, A.P.; Saxena, P.; Boodman, A. Protecting Browsers from Extension Vulnerabilities. In Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, 28 February–3 March 2010.
30. Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929.
31. Arunagiri, J.; Rakhi, S.; Jevitha, K. A Systematic Review of Security Measures for Web Browser Extension Vulnerabilities. In Proceedings of the International Conference on Soft Computing Systems: ICSCS; Springer: New Delhi, India, 2015; pp. 99–112.
32. Ter Louw, M.; Lim, J.S.; Venkatakrishnan, V.N. Enhancing web browser security against malware extensions. J. Comput. Virol. 2008, 4, 175–195.
33. Šilić, M.; Krolo, J.; Delač, G. Security vulnerabilities in modern web browser architecture. In Proceedings of the 33rd International Convention MIPRO, Opatija, Croatia, 24–28 May 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 1240–1245.
34. Roth, S.; Calzavara, S.; Wilhelm, M.; Rabitti, A.; Stock, B. The Security Lottery: Measuring Client-Side Web Security Inconsistencies. In Proceedings of the 31st USENIX Security Symposium (USENIX Security 22), Boston, MA, USA, 10–12 August 2022; pp. 2047–2064.
35. Sanchez-Rola, I.; Dell’Amico, M.; Kotzias, P.; Balzarotti, D. Can I Opt Out Yet? GDPR and the Global Illusion of Cookie Control. In Proceedings of the Asia CCS’19: 2019 ACM Asia Conference on Computer and Communications Security, Auckland, New Zealand, 9–12 July 2019; pp. 340–351.
36. Historical Yearly Trends in the Usage Statistics of Client-Side Programming Languages for Websites. Available online: https://w3techs.com/technologies/history_overview/client_side_language/all/y (accessed on 15 October 2022).
37. Adobe Flash Player EOL General Information Page. Available online: https://www.adobe.com/products/flashplayer/end-of-life.html (accessed on 1 January 2021).
38. FaizKhademi, A.; Zulkernine, M.; Weldemariam, K. FPGuard: Detection and prevention of browser fingerprinting. In Proceedings of the IFIP Annual Conference on Data and Applications Security and Privacy, Fairfax, VA, USA, 13–15 July 2015; Springer: Cham, Switzerland, 2015; pp. 293–308.
39. Laperdrix, P.; Rudametkin, W.; Baudry, B. Mitigating browser fingerprint tracking: Multi-level reconfiguration and diversification. In Proceedings of the 2015 IEEE/ACM 10th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, Florence, Italy, 18–19 May 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 98–108.
40. Laperdrix, P.; Baudry, B.; Mishra, V. FPRandom: Randomizing core browser objects to break advanced device fingerprinting techniques. In Proceedings of the International Symposium on Engineering Secure Software and Systems, Bonn, Germany, 3–5 July 2017; Springer: Cham, Switzerland, 2017; pp. 97–114.
41. Baumann, P.; Katzenbeisser, S.; Stopczynski, M.; Tews, E. Disguised chromium browser: Robust browser, flash and canvas fingerprinting protection. In Proceedings of the 2016 ACM on Workshop on Privacy in the Electronic Society, Vienna, Austria, 24 October 2016; pp. 37–46.
42. Nikiforakis, N.; Joosen, W.; Livshits, B. Privaricator: Deceiving fingerprinters with little white lies. In Proceedings of the 24th International Conference on World Wide Web, Florence, Italy, 18–22 May 2015; pp. 820–830.
43. Yang, Z.; Pei, W.; Chen, M.; Yue, C. WTAGRAPH: Web Tracking and Advertising Detection using Graph Neural Networks. In Proceedings of the IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 22–26 May 2022.
44. Le, H.; Markopoulou, A.; Shafiq, Z. CV-Inspector: Towards Automating Detection of Adblock Circumvention. In Proceedings of the Network and Distributed System Security Symposium (NDSS), Virtual, 21–25 February 2021.
45. Iqbal, U.; Englehardt, S.; Shafiq, Z. Fingerprinting the fingerprinters: Learning to detect browser fingerprinting behaviors. In Proceedings of the 2021 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 24–27 May 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1143–1161.
46. Mobile-Desktop-Internet-Usage-Statistics. Available online: https://www.broadbandsearch.net/blog/mobile-desktop-internet-usage-statistics (accessed on 1 January 2022).
47. Das, A.; Acar, G.; Borisov, N.; Pradeep, A. The web’s sixth sense: A study of scripts accessing smartphone sensors. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Toronto, ON, Canada, 15–19 October 2018; pp. 1515–1532.
48. Zhang, J.; Beresford, A.R.; Sheret, I. Sensorid: Sensor calibration fingerprinting for smartphones. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 638–655.
49. About Cover Your Tracks. Available online: https://coveryourtracks.eff.org/about (accessed on 1 September 2022).
50. Eckersley, P. How unique is your web browser? In Proceedings of the International Symposium on Privacy Enhancing Technologies Symposium, Berlin, Germany, 21–23 July 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 1–18.
51. Am I Unique. Available online: https://amiunique.org/ (accessed on 1 September 2022).
52. Laperdrix, P.; Rudametkin, W.; Baudry, B. Beauty and the beast: Diverting modern web browsers to build unique browser fingerprints. In Proceedings of the 2016 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 878–894.
53. Gómez-Boix, A.; Laperdrix, P.; Baudry, B. Hiding in the crowd: An analysis of the effectiveness of browser fingerprinting at large scale. In Proceedings of the 2018 World Wide Web Conference, Lyon, France, 23–27 April 2018; pp. 309–318.
54. Fingerprinting Defenses 2.0. Available online: https://brave.com/privacy-updates/4-fingerprinting-defenses-2.0/#1-past-and-current-generation-fingerprinting-protections (accessed on 1 May 2020).
55. Tracking Prevention in Microsoft Edge. Available online: https://learn.microsoft.com/en-us/microsoft-edge/web-platform/tracking-prevention#classification (accessed on 1 May 2022).
56. Intelligent Tracking Prevention. Available online: https://www.simoahava.com/privacy/intelligent-tracking-prevention-ios-14-ipados-14-safari-14/ (accessed on 1 September 2020).
57. How to Keep Spam Away from Your Smartphone. Available online: https://news.samsung.com/global/how-to-keep-spam-away-from-your-smartphone (accessed on 1 September 2020).
58. About Tor Browser. Available online: https://tb-manual.torproject.org/about/ (accessed on 1 October 2022).
59. Should I Install a New Add-On or Extension in Tor Browser, like AdBlock Plus or uBlock Origin? Available online: https://support.torproject.org/ (accessed on 1 October 2022).
60. Browser Fingerprinting: An Introduction and the Challenges Ahead. Available online: https://blog.torproject.org/browser-fingerprinting-introduction-and-challenges-ahead/ (accessed on 1 September 2019).
61. How Do Ad Blockers Work? A Guide For Publishers. Available online: https://www.kevel.com/blog/how-ad-blockers-work/ (accessed on 1 April 2020).
62. Does Privacy Badger Contain a List of Blocked Sites? Available online: https://privacybadger.org/#Does-Privacy-Badger-contain-a-list-of-blocked-sites (accessed on 1 October 2022).
63. Merzdovnik, G.; Huber, M.; Buhov, D.; Nikiforakis, N.; Neuner, S.; Schmiedecker, M.; Weippl, E. Block me if you can: A large-scale study of tracker-blocking tools. In Proceedings of the 2017 IEEE European Symposium on Security and Privacy (EuroS&P), Paris, France, 26–28 April 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 319–333.
64. Munir, S.; Siby, S.; Iqbal, U.; Englehardt, S.; Shafiq, Z.; Troncoso, C. Cookiegraph: Understanding and detecting first-party tracking cookies. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, Copenhagen, Denmark, 26–30 November 2023.
65. Kang, M.G.; McCamant, S.; Poosankam, P.; Song, D. Dta++: Dynamic taint analysis with targeted control-flow propagation. In Proceedings of the NDSS, San Diego, CA, USA, 6–9 February 2011.
66. Krumnow, B.; Jonker, H.; Karsch, S. Analysing and strengthening OpenWPM’s reliability. arXiv 2022, arXiv:2205.08890.
67. Goßen, D.; Jonker, I.H.; Poll, I.E. Design and Implementation of a Stealthy OpenWPM Web Scraper. Master’s Thesis, Radboud University Nijmegen, Nijmegen, The Netherlands, 2020.
68. Zhang, P.; Oest, A.; Cho, H.; Sun, Z.; Johnson, R.; Wardman, B.; Sarker, S.; Kapravelos, A.; Bao, T.; Wang, R.; et al. Crawlphish: Large-scale analysis of client-side cloaking techniques in phishing. In Proceedings of the 2021 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 24–27 May 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1109–1124.
69. Vastel, A.; Laperdrix, P.; Rudametkin, W.; Rouvoy, R. Fp-Scanner: The Privacy Implications of Browser Fingerprint Inconsistencies. In Proceedings of the 27th USENIX Security Symposium (USENIX Security 18), Baltimore, MD, USA, 15–17 August 2018; pp. 135–150.
70. Mughees, M.H.; Qian, Z.; Shafiq, Z.; Dash, K.; Hui, P. A first look at ad-block detection: A new arms race on the web. arXiv 2016, arXiv:1605.05841.
71. Cross-Origin Fingerprinting Unlinkability. Available online: https://2019.www.torproject.org/projects/torbrowser/design/#fingerprinting-linkability (accessed on 1 September 2019).
72. Smith, M.; Snyder, P.; Livshits, B.; Stefan, D. SugarCoat: Programmatically Generating Privacy-Preserving, Web-Compatible Resource Replacements for Content Blocking. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, 15–19 November 2021; pp. 2844–2857.
73. Karami, S.; Ilia, P.; Solomos, K.; Polakis, J. Carnus: Exploring the Privacy Threats of Browser Extension Fingerprinting. In Proceedings of the 27th Network and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 23–26 February 2020.
74. Senol, A.; Acar, G.; Humbert, M.; Borgesius, F.Z. Leaky Forms: A Study of Email and Password Exfiltration Before Form Submission. In Proceedings of the 31st USENIX Security Symposium (USENIX Security 22), Boston, MA, USA, 10–12 August 2022; pp. 1813–1830.
75. Laor, T.; Mehanna, N.; Durey, A.; Dyadyuk, V.; Laperdrix, P.; Maurice, C.; Oren, Y.; Rouvoy, R.; Rudametkin, W.; Yarom, Y. DRAWNAPART: A Device Identification Technique based on Remote GPU Fingerprinting. arXiv 2022, arXiv:2201.09956.
76. Dambra, S.; Sanchez-Rola, I.; Bilge, L.; Balzarotti, D. When Sally Met Trackers: Web Tracking From the Users’ Perspective. In Proceedings of the 31st USENIX Security Symposium (USENIX Security 22), Boston, MA, USA, 10–12 August 2022; pp. 2189–2206.
77. Ad Blocker Usage and Demographic Statistics in 2022. Available online: https://backlinko.com/ad-blockers-users (accessed on 1 March 2021).

Table 1. Overview of Tracking Measurement Tools.

Attribute	FourthParty	FPDetective	OpenWPM	FP-Crawler	FP-Radar	OpenWPM-Mobile	OmniCrawl
Cookies	✓	-	✓	-	-	✓	✓
Window	✓	✓	✓	✓	-	✓	-
Navigator	✓	✓	✓	✓	✓	✓	-
Screen	✓	-	-	✓	-	-	-
HTML elements	-	✓	-	-	-	-	-
Resource loads	✓	-	-	-	-	-	-
CSS font	-	✓	-	-	-	✓	✓
Canvas	-	-	✓	✓	-	✓	✓
WebRTC	-	-	✓	✓	-	✓	✓
Audio	-	-	✓	✓	-	✓	✓
Plugin access	-	-	✓	-	-	✓	✓
MIME type access	-	-	✓	-	-	✓	-
WebGL	-	-	-	✓	✓	-	✓
Audio	-	-	-	-	-	-	✓
Network information	-	-	-	-	✓	-	-
Mouse	-	-	-	-	✓	-	-
Performance	-	-	-	-	✓	-	-
Geolocation	-	-	-	-	✓	-	-
Web worker	-	-	-	-	✓	-	-
Battery	-	-	-	-	✓	-	✓
Sensor	-	-	-	-	✓	✓	✓
Gamepad	-	-	-	-	✓	-	-
Clipboard	-	-	-	-	✓	-	-
Touch	-	-	-	-	✓	-	-

Table 2. Overview of PC browser features.

Name	Tracker and Ad Blocking	Fingerprinting Prevention	Cross-Site Cookie Prevention
Brave Browser
Chrome
Edge
Firefox
Opera
Safari
Tor Browser

Legend: in-browser feature support; feature can be used by installing extensions; not supported. The icons that marked each cell and legend entry were images and are not reproduced in this text.

Table 3. Overview of mobile browser features.

Name	Tracker and Ad Blocking	Fingerprinting Prevention	Cross-Site Cookie Prevention
Brave Browser
Chrome
Edge
Firefox
Opera
Samsung Internet
Safari
Tor Browser

Legend: in-browser feature support; feature can be used by installing extensions; not supported. The icons that marked each cell and legend entry were images and are not reproduced in this text.

Table 4. Overview of tracking prevention tools and solutions.

Name	Type	Prevention Method
FPGuard [38]	Adding noise or randomization	Uses simple element randomization against canvas, font-enumeration, Flash-based, and JavaScript-object fingerprinting.
Blink [39]	Adding noise or randomization	Reassembles elements called diversified platform components (DPCs), such as fonts, plugins, browsers, operating system, and CPU architecture.
FPRandom [40]	Adding noise or randomization	Modifies Firefox’s code to add random values to the values returned by browser functions, so that return values differ between browsing sessions.
DCB [41]	Adding noise or randomization	Rather than disabling or randomizing system and browser parameters, changes the values of these elements every session.
PriVaricator [42]	Adding noise or randomization	Intercepts each access to DOM attributes and uses a series of randomization policies to change the returned value.
UniGL [16]	Making non-unique fingerprints	Rewrites GLSL programs and standardizes the rendering process with the support of WebGL features.
AdGraph [17]	Tracking detection using machine learning	Extracts structural and content features of webpages and classifies malicious behaviors using a supervised random forest classifier.
WebGraph [18]	Tracking detection using machine learning	Trains a classifier on features of action behaviors, which are difficult for trackers to obfuscate.
WTAGraph [43]	Tracking detection using machine learning	Constructs a graph representing HTTP network traffic and builds a graph neural network (GNN) on it to detect Web tracking and advertising.
CV-Inspector [44]	Tracking detection using machine learning	Automation tool that helps filter-list curators focus their inspection efforts on discovering new sites that employ circumvention.
FP-Inspector [45]	Tracking detection using machine learning	ML-based approach that combines static and dynamic JavaScript analysis to counter browser fingerprinting.
FPFlow [19]	JavaScript taint analysis	Checks for taint propagation between JavaScript objects in scripts during webpage visits and intercepts requests.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
