1. Introduction
The application of machine learning (ML) techniques for a better understanding of existing transportation-related phenomena, and for predicting future ones, has grown steadily in recent years. Many studies analyze the application of ML methods to a specific transport-related problem; the work of Alexandre et al. [1] is just one example of many. Large-scale literature reviews focusing on the use of data mining techniques in the transport domain are as yet scarce [2].
Neilson et al. [
3] used a literature review to understand the current state of scientific research on the application of big data and analytics in transport authorities. Following the articles’ extraction process, based on general search terms, an additional filtering process resulted in 173 papers. After an initial manual screening of the papers’ relevancy, 56 papers were found to be relevant and 28 of them were eventually thoroughly reviewed. The authors mainly addressed the aspects of data collection, data storage, data quality, data security, and big data application in the transportation domain. The aspect of the applied ML models was outside the core focus of this review.
Kaffash et al. [3] examined Big Data algorithms in Intelligent Transportation Systems (ITS) through an extensive literature review and a bibliometric analysis. A total of 1086 papers published between 1997 and 2019 were retrieved, using a combination of the term ITS and one of six general terms associated with big data. Manual filtering, aiming to exclude articles outside the study’s scope, resulted in 586 scientific papers. Using bibliometric analysis, the authors presented several indicators regarding the contribution of authors, journals, institutions, and countries to the body of knowledge. One interesting, although expected, observation was the exponential growth in the number of relevant articles from 2013 onward. The extraction of the 50 most used keywords revealed that half of these terms are transportation related and that they cover seven distinct transport-related areas, with traffic flow prediction being the most prominent, followed by traffic sign recognition, vehicle detection, vehicle classification, travel time prediction, traffic management, and license plate recognition. As for ML-related techniques, deep learning and artificial neural networks (ANN) were dominant, followed by support vector machines and time series.
Behrooz and Hayeri [
4] conducted a deep analysis of the use of various ML algorithms for addressing problems in the field of surface transportation systems. A manual-based approach was employed on a collection of 100 research papers to identify the issues addressed in these studies and the ML algorithms employed to address these issues. The primary constraint, identified by the authors, to expanding and deepening the use of sophisticated ML algorithms for solving problems in the transportation systems is the lack of publicly available, high-quality spatial data.
The capacity to discern trends in the utilization of diverse ML techniques across the sub-domains of sustainable transportation, particularly through automated means, facilitating the continual monitoring of these trends, offers enhanced value to both researchers and practitioners. It empowers researchers to pinpoint potential underutilization of specific ML techniques in distinct transportation sectors, prompting exploration as viable research avenues. For practitioners, these trends serve as a foundation for choosing the most suitable ML approach for addressing specific issues. Notably, inquiries into the current relevance of more traditional methods, such as time series analysis or graph-based techniques, can be addressed by investigating their presence in ongoing research studies.
The work described in this article aims to cover a wide range of scientific publications within the surface transportation domain. It involves classifying them into six sub-domains, which were defined under the vision of zero externalities, and exploring the type of ML techniques that were applied to tackle the issues these studies address. Given this broad scope, the entire process of extracting relevant papers is automatic.
The contribution of this paper is threefold:
Developing a methodology to extract and categorize a large corpus of scientific articles related to transport into specific sub-domains. The developed approach specifically tackles the overlap between these sub-domains, an aspect that, to our knowledge, has not been previously addressed in the field of transportation.
Developing a systematic approach based on ontology reasoning for identifying the main ML techniques that are applied to the various research questions addressed in the scientific journals.
Identifying the intensity with which each ML technique is applied to the main sub-domains of transportation.
The paper is structured according to the research steps. Section 2 outlines the transportation sub-domains addressed in this study, while Section 3 details the methodology used to establish the infrastructure for classifying papers into the transportation sub-domains. The initial classification results are presented in Section 4, which led to the development of an improved classification methodology described in Section 5. The final paper classification results are presented in Section 6. Section 7 outlines the methodology for identifying the ML techniques used in each article, and Section 8 presents the findings regarding the share of each ML technique across years and across transportation sub-domains. Conclusions, limitations, and future research are described in Section 9.
2. Sub-Domains of Transportation
The Israeli Smart Transportation Research Center (ISTRC—
https://istrc.net.technion.ac.il/ (accessed on 20 December 2023)) was established jointly by the Smart Mobility Initiative in Israel’s Prime Minister’s Office and the Council For Higher Education, with the aim of encouraging research and development, entrepreneurship, and industry in the field of smart mobility in Israel. The center aims to promote a zero externalities vision: zero casualties, zero delays and zero environmental damage.
Nine professional committees, composed of leading researchers and practitioners from the academy, industry, and public sector, were established to cover relevant research topics, with six committees operating in key areas of transportation and three focusing on methodological directions and enabling technologies. It should be noted that ISTRC focuses on land transport, i.e., marine and air transport (excluding drones) are beyond its scope.
The six key sub-domains of sustainable transportation, defined by the ISTRC, serve as a basis for classifying research papers into transport-related sub-domains. The following is a concise description of each sub-domain, based on its definition as given by the ISTRC:
Policy, transportation planning, and smart cities—Developing, planning, and policy tools that integrate urban land use and infrastructure planning for both passengers and goods are at the core of this sub-domain. It covers research works focusing on the effective planning of transport services, specifically addressing aspects of public space allocation, regulation, new models for public-private partnership (PPP), ecology, law, taxation and incentives, and social justice and equity issues.
Mobility safety and security—This sub-domain addresses safety and personal security aspects, concerning all types of road users and transport modes throughout the entire road network. Smart mobility to enhance the safety of vulnerable users, safer integration of new micro-mobility modes and autonomous transport modes into the transportation system, and advanced technologies to mitigate road users’ errors are at the core of this sub-domain.
Road users’ behavior—This sub-domain focuses on understanding travelers’ behavior in response to new technologies and policies as a basis for proper planning and the determination of measures and incentives that will encourage them to increase vehicle occupancy, reduce motorized travel, and minimize congestion and emissions. This sub-domain encompasses demand modeling, particularly in conjunction with activity purposes, behavioral change, and incentives.
Innovative transportation services—This sub-domain covers research studies focusing on improving transport services for passengers and goods and promoting the reduction of private vehicle modal share. Studies addressing Mobility as a Service (MaaS), the integration of public transport and new innovative modes (Hyper Loop, drones, etc.), and personalized transport services for passengers and goods, including the various forms of shared mobility, are within the scope of this sub-domain.
Traffic management—This sub-domain encompasses new methodologies and algorithms for efficient traffic management and control, with special emphasis on traffic flow in a mixed traffic environment, i.e., the penetration of vehicles with various levels of automation and connectivity, and traditional and emerging transport modes.
Vehicles and transport modes—This sub-domain focuses on the research and development of the applications of automated, connected, and electrical vehicles, as well as advanced propulsion systems with the purpose of establishing an efficient and convenient transportation system while reducing energy consumption, emissions, and noise. The main technologies covered by this sub-domain are vehicle monitoring systems, stability control, expansion of performance envelopes, the interfaces between vehicles, and infrastructure.
Evidently, these sub-domains are not mutually exclusive and some overlap between them exists. For example, the study conducted by Meenar et al. (2021) [5] raises two research questions: “(1) What types of emotions do CTUs (people who combine bicycling and public transit in a single trip) experience as they travel, what are the reasons behind those emotions, and how do those emotions relate to specific geographic locations? and (2) How can mapping and understanding these emotions help urban planners comprehend CTU travel behavior and build a more sustainable transportation system?” The first question places this study in the sub-domain of road users’ behavior and the second one in the sub-domain of policy, transportation planning, and smart cities.
Another example is apparent in the work of Golakiya et al. (2021) [
6], which aims to develop pedestrian crossing facility warrants based on the safe movement of pedestrians while maintaining traffic efficiency. This work is related to both Safety and Traffic Management.
For the sake of conciseness, from now on, the sub-domain of Policy, transportation planning, and smart cities will be referred to as Policy; Mobility safety and security will be referred to as Safety; Road users’ behavior will be referred to as Behavior; Innovative transportation services will be referred to as New services; Traffic management will be referred to as TM; and Vehicles and transport modes will be referred to as Vehicles.
3. Methodology—Creating the Infrastructure for Supervised Paper Classification
3.1. Extraction of Articles
Scopus was used as the platform for extracting the research articles that constituted the dataset to be analyzed. Scopus is an abstract and citation database of peer-reviewed literature: scientific journals, books, and conference proceedings. This database covers a wide range of research domains, such as science, technology, medicine, and social sciences. At the time of writing, Scopus covers 34,000 journals. It offers a set of tools to define the criteria by which articles are retrieved, such as keywords and year of publication, and the capability of extracting the search results.
The search query included two types of terms; each contains several keywords that served as the baseline of the search. One type of term intended to identify transport-related topics, and the other reflected ML-related techniques. The transport-related terms included one of the general keywords, transport or transportation, and a list of transportation modes such as vehicle, bus, bike, etc. The intention was to filter out articles that employ the terms transport or transportation in contexts unrelated to movement within the road network. The ML-related techniques terms include keywords such as supervised algorithms, probabilistic networks, deep learning, and additional common terms in ML (algorithms, software packages, and more).
Aiming to focus on terms associated with the essence of each article, the search was limited to the title, abstract, and keywords and to articles dated from 2018 until 3 May 2022. Although several previous works attempting to analyze topics dealt with in the transportation domain focused on the most prominent transport-related journals [
7,
8], our initial search was not limited to specific journals. As our work aims to explore the ML techniques used for better understanding transport-related problems, we assumed that a substantial amount of them might be published in journals that are methodology-oriented rather than topic-oriented. The results included 4245 articles satisfying the search conditions.
The initial attempt to extract relevant articles revealed that many papers are extracted from journals that are undoubtedly irrelevant to the topics this study focuses on. For instance, articles published in medicine-related journals were included, as there is a rich body of knowledge concerning the treatment of traffic-accidents injuries, but their primary focus is not transportation oriented. Therefore, irrelevant domains were excluded by using the “subject area” field in Scopus. The final article dataset consists of 2098 research papers.
Since the initial search terms were general and did not explicitly ensure focus on sustainable transportation, a supplementary search was conducted within the articles. The sought-after terms included sustainable transportation modes like public transportation and walking, safety-related terms such as safety and incidents, pollution-related terms like emissions and CO2, terms associated with a human-centered approach such as passengers and human mobility, and network-performance terms like congestion and traffic density. While this basic search approach may have overlooked some articles addressing sustainability, at least one indicative term was discovered in 92% of the articles. This suggests that the classification methodology effectively captured studies aimed at enhancing the sustainability of the transportation network.
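The supplementary check described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the term list is a hypothetical subset built from the categories named in the text, and matching is assumed to be a case-insensitive prefix match at a word boundary.

```python
import re

# Hypothetical indicator terms, loosely following the categories named above.
INDICATIVE_TERMS = [
    "public transportation", "walking", "safety", "incident",
    "emission", "co2", "passenger", "human mobility",
    "congestion", "traffic density",
]

def has_indicative_term(text, terms=INDICATIVE_TERMS):
    """Return True if at least one indicative term occurs in the text."""
    lowered = text.lower()
    return any(re.search(r"\b" + re.escape(term), lowered) for term in terms)

def coverage(texts, terms=INDICATIVE_TERMS):
    """Share of texts that contain at least one indicative term."""
    if not texts:
        return 0.0
    hits = sum(has_indicative_term(t, terms) for t in texts)
    return hits / len(texts)

# Invented sample titles, for illustration only.
sample = [
    "Deep learning for bus passenger flow forecasting",
    "Predicting CO2 emissions of urban freight vehicles",
    "A survey of gradient boosting libraries",
]
print(coverage(sample))
```

Applied to the title, keywords, and abstract of each article, a function like `coverage` would yield the kind of share reported above.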
3.2. Creating a Transport-Related and ML-Related Lexicon
Creating a relevant lexicon as a foundation for transforming the articles (title, keywords, and abstract) into a bag-of-words (BoW) is a mandatory step [
9]. Our goal was to define two lexicons, one representing frequent transportation-related terms, specifically in articles describing the implementation of ML techniques for solving transportation problems, and the other representing relevant ML-related terms.
For the transport-related lexicon, the titles, keywords, and abstracts of 900 articles were extracted from transportation-related journals and 400 from biotechnology-related journals, using the general terms big data, machine learning, and data mining. The most frequent 1-grams (single-word terms) and 2-grams (double-word terms) that appeared in the transport-related corpus, but not in the biotechnology-related corpus, were used to constitute the initial lexicon. Two transportation experts scored the terms in the list on a scale of 0–5 according to their perceived relevance to the transportation domain. Terms rated 0–2 were excluded from the lexicon. The full lexicon included 132 1-grams and 68 2-grams.
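A rough sketch of the lexicon construction is given below, assuming that "appeared in the transport-related corpus, but not in the biotechnology-related corpus" means the n-gram never occurs in the contrast corpus. The function names are hypothetical, and the way the two experts' scores were combined is not stated in the text, so a single score per term is assumed here.

```python
from collections import Counter
import re

def ngrams(text, n):
    """Lowercased word n-grams of a text, each joined by a single space."""
    words = re.findall(r"[a-z]+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def candidate_terms(domain_docs, contrast_docs, n, top_k):
    """Most frequent n-grams of domain_docs that never occur in contrast_docs."""
    domain_counts = Counter(g for d in domain_docs for g in ngrams(d, n))
    contrast_vocab = {g for d in contrast_docs for g in ngrams(d, n)}
    kept = [g for g, _ in domain_counts.most_common() if g not in contrast_vocab]
    return kept[:top_k]

def filter_by_expert_score(scores, min_score=3):
    """Keep terms scored at least min_score on the 0-5 scale (0-2 are dropped)."""
    return sorted(term for term, score in scores.items() if score >= min_score)

# Invented toy corpora, for illustration only.
transport = [
    "Traffic flow prediction using machine learning",
    "Machine learning for traffic signal control",
]
biotech = ["Machine learning for protein structure prediction"]

print(candidate_terms(transport, biotech, n=1, top_k=3))
print(candidate_terms(transport, biotech, n=2, top_k=10))
```

Generic ML vocabulary such as "machine learning" is shared by both corpora and therefore drops out, which is the purpose of the contrast corpus.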
3.3. Creating a Bag-of-Words
When using BoW models, it is common to preprocess the data in order to remove noise and unnecessary features. Tokenization, i.e., breaking the stream of text into words, phrases, symbols, or other meaningful elements [9], was the first step. Then, punctuation, numbers, and general stop words were removed, and all remaining words were lowercased [10]. Last, lemmatization was performed, i.e., replacing the suffix of a word with a different one or removing the suffix of a word completely to get the basic word form [11]. To perform these steps, regular expressions in Python and the NLTK package [12] were used. Then, we extended the BoW model to include 2-grams, i.e., bag-of-2-grams.
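The preprocessing chain can be illustrated with the following self-contained sketch. It deliberately avoids NLTK so that it runs without downloading language resources: the stop-word list and the `naive_lemma` function are crude, hypothetical stand-ins for the NLTK stop words and lemmatizer mentioned above, and 2-grams are assumed to be formed after stop-word removal.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (a stand-in for NLTK's stop words).
STOP_WORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "to",
              "is", "are", "with", "using"}

def naive_lemma(word):
    """Crude stand-in for a lemmatizer: only handles simple English plurals."""
    if len(word) > 4 and word.endswith("ies"):
        return word[:-3] + "y"
    if len(word) > 3 and word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]
    return word

def preprocess(text):
    """Tokenize, drop punctuation/numbers, lowercase, drop stop words, lemmatize."""
    tokens = re.findall(r"[A-Za-z]+", text)   # alphabetic runs only
    tokens = [t.lower() for t in tokens]      # lowercase first so "The" matches "the"
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [naive_lemma(t) for t in tokens]

def bag_of_terms(text):
    """Counts of 1-grams and 2-grams over the preprocessed tokens."""
    tokens = preprocess(text)
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)

bag = bag_of_terms("Predicting traffic flows in 2 cities using deep learning.")
print(bag["traffic flow"], bag["deep learning"])
```

In a setting where NLTK and its data are available, `naive_lemma` and `STOP_WORDS` would be replaced by the corresponding NLTK components.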
3.4. Creating a Training Dataset
To implement supervised algorithms, we generated a training dataset comprising labeled articles. Out of the 2098 extracted articles, 380 underwent manual annotation by an experienced researcher affiliated with the Israeli Smart Transportation Research Center. Each article was assigned to a distinct transportation sub-domain, as defined in
Section 2. This annotated dataset served as the foundation for applying the algorithms discussed in the subsequent sections. In other words, it functioned as the ground truth, enabling the ML model to categorize each new article into one of the predefined sub-domains.
5. Methodology—Fuzzy KNN
Deeper analysis of the results obtained by applying conventional KNN revealed two common phenomena that apply to many misclassification cases. The first is that the sub-domain assigned to an article by the model, while not manually labeled as the most relevant, is somewhat related to the content of the article. The second phenomenon is that the sub-domain to which the article was manually classified appeared as the selected sub-domain for some of the Ks in the KNN model.
These indications led to the definition of a modified KNN model with a softer voting scheme [
14]. The core principle of the modified model is based on the following characteristics:
Similarity measure (distance): Using the similarities between papers to fuzzify the contribution of each neighbor to the decision process may enhance the discriminative power of the training data, thus improving classification performance.
Decision rule (voting): In the KNN classifier, the final decision about the class of a test paper is given by a single majority voting process. Other decision rules may be derived to combine the votes of the nearest neighbors, providing the classifier with new ways of assigning the class of the test pattern. That is, instead of automatically classifying each article to a single sub-domain, each article is assigned scores reflecting its relevance to each of the sub-domains.
Figure 2 depicts the four-step process in which these scores are calculated. The BoW representing the article is compared with the BoW of each of the annotated articles, and the number of similar terms for each of them is recorded.
The matching score of a paper D to be classified to the j-th sub-domain (step 3 in Figure 2) is calculated according to Equation (1):
$$MS_j(D) = \sum_{i \in N_j(D)} n_i(D) \qquad (1)$$
where:
$MS_j(D)$ is the matching score of the classified paper to the j-th sub-domain.
$n_i(D)$ is the number of similar terms appearing in both the classified paper and the i-th (i = 1…10) of the 10 manually annotated papers with the highest number of similar terms.
$N_j(D)$ is the subset of these 10 papers that were manually labeled with the j-th sub-domain.
The normalized matching scores of a paper to be classified (step 4 in Figure 2) are calculated according to Equation (2):
$$MD_j(D) = \frac{MS_j(D)}{\max_{k=1,\dots,6} MS_k(D)} \qquad (2)$$
where:
$MD_j(D)$ is the normalized matching score, i.e., membership degree (MD), of the classified paper D to the j-th sub-domain.
The result of the classification process, for each article, is a vector of six scores corresponding to the six sub-domains. As the scores were normalized, one of them is always one, corresponding to the sub-domain for which the highest match was found. A zero score for sub-domain j indicates that none of the 10 labeled articles that were found to be most similar to paper D (step 2 in Figure 2) is classified as sub-domain j.
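The four-step scoring process can be sketched in a few lines. This is one possible reading of the description, not the authors' implementation: it assumes that the matching score of a sub-domain is the sum of shared-term counts over those of the 10 most similar annotated papers that carry that label, and that normalization divides by the highest score. Function and variable names are hypothetical.

```python
from collections import defaultdict

SUB_DOMAINS = ["Policy", "Safety", "Behavior", "New services", "TM", "Vehicles"]

def shared_terms(bow_a, bow_b):
    """Number of distinct terms that appear in both bags."""
    return len(set(bow_a) & set(bow_b))

def membership_degrees(bow, labeled, k=10):
    """Return {sub_domain: MD} for one paper.

    labeled: list of (bag_of_terms, sub_domain) pairs for annotated papers.
    """
    # Steps 1-2: count shared terms with every annotated paper, keep the top k.
    scored = [(shared_terms(bow, other), label) for other, label in labeled]
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

    # Step 3: matching score per sub-domain (assumed: sum of shared-term counts).
    raw = defaultdict(int)
    for n_shared, label in top:
        raw[label] += n_shared

    # Step 4: divide by the best score, so the best sub-domain gets MD = 1.
    best = max(raw.values(), default=0)
    if best == 0:
        return {d: 0.0 for d in SUB_DOMAINS}
    return {d: raw.get(d, 0) / best for d in SUB_DOMAINS}

# Invented toy data, for illustration only.
labeled = [
    ({"traffic", "signal", "control"}, "TM"),
    ({"crash", "pedestrian", "safety"}, "Safety"),
    ({"traffic", "flow", "congestion"}, "TM"),
    ({"pedestrian", "crossing", "signal"}, "Safety"),
]
md = membership_degrees({"traffic", "signal", "pedestrian", "flow"}, labeled)
print(md["TM"], md["Safety"], md["Policy"])
```

In this toy case the paper is most strongly related to TM but retains a substantial degree of membership in Safety, which is the kind of overlap the softer voting scheme is meant to expose.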
6. Results—Classification of Articles Using Fuzzy KNN
Figure 3 provides the four PIs for 10 MD thresholds, ranging from 0.1 to 1, i.e., given a threshold X, an article is classified into sub-domain j if its MD for that sub-domain is at least X. Naturally, recall decreases as the threshold of MD increases while, at the same time, precision increases.
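The threshold rule and its effect on the indicators can be illustrated with the sketch below. It assumes that an article is assigned to a sub-domain when its MD is greater than or equal to the threshold, and it computes only precision, recall, and accuracy, using their standard definitions; the names and the toy data are hypothetical.

```python
def assign(md, threshold):
    """Sub-domains whose membership degree reaches the threshold."""
    return {d for d, score in md.items() if score >= threshold}

def indicators(mds, truths, sub_domain, threshold):
    """Precision, recall and accuracy for one sub-domain at one threshold.

    mds: list of {sub_domain: MD} dicts; truths: list of manual labels.
    """
    tp = fp = fn = tn = 0
    for md, truth in zip(mds, truths):
        predicted = md.get(sub_domain, 0.0) >= threshold
        actual = truth == sub_domain
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }

# Invented toy data, for illustration only.
mds = [
    {"TM": 1.0, "Safety": 0.75},
    {"TM": 0.4, "Safety": 1.0},
    {"TM": 1.0, "Safety": 0.2},
    {"TM": 0.5, "Safety": 1.0},
]
truths = ["TM", "Safety", "TM", "TM"]

for x in (0.5, 1.0):
    print(x, indicators(mds, truths, "Safety", x))
```

In this toy sample, raising the threshold from 0.5 to 1.0 removes a false positive for Safety, so precision and accuracy rise while recall is unchanged; for TM the same change costs one true positive and recall drops.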
Comparing the results given in
Figure 1 and
Figure 3, it is apparent that the quality of classification achieved by applying fuzzy KNN is better than the quality achieved by applying conventional KNN. Except for TM, for each of the other five sub-domains, there are one or more MDs for which fuzzy KNN is superior to KNN in the sense that it is better for all four PIs. The results for TM are similar in both techniques.
Gasparetto et al. [15] provide a valuable benchmark for comparing the achieved KPIs with those that can realistically be reached. The authors reviewed state-of-the-art text classification case studies and presented the best accuracy obtained for each of them, stating that accuracy is the most adopted evaluation metric for text classification tasks. The case study of classifying the instances in Ohsumed, a corpus of scientific abstracts describing heart and blood diseases, into 23 categories, is similar to the task at hand. Gasparetto et al. (2022) report that, among nine works addressing this corpus, the best accuracy achieved was 72.8% [15].
Given accuracy as a benchmark, and that recall is an important index (Section 4.2), the tradeoff between the two KPIs is interesting (Figure 3). As the MD threshold for classifying articles into sub-domains increases, the recall of the model decreases, while the accuracy increases. Accuracy is determined by the ratio of true-positive classifications (articles correctly associated with a certain topic) and true-negative classifications (articles correctly classified as irrelevant to certain topics) to the total number of classified articles. As the MD threshold rises, the number of true-positives decreases because fewer articles meet the threshold and are associated with a specific sub-domain. True-negatives follow the opposite trend: the share of articles that score below the threshold for a sub-domain, and are manually associated with other sub-domains, increases. Since there are six sub-domains, this phenomenon is more significant for true-negatives than true-positives, leading to an overall increase in accuracy.
To account for the tradeoff between accuracy and recall, the results in
Table 3 present the best achieved accuracy while maintaining a minimum recall of 70%. It is interesting to note that the MD associated with the best accuracy (satisfying the recall threshold) varies among the sub-domains. Low MDs indicate the difficulty of identifying relevant papers. The Vehicles’ MD is the lowest, while New Services, as well as Safety, are also relatively difficult to differentiate from other sub-domains. Articles focusing on TM seem to be the easiest to identify.
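The selection rule behind these results, i.e., the highest accuracy among thresholds that keep recall at 70% or more, can be sketched as below. The function name is hypothetical and the numbers are invented for illustration; they are not taken from Table 3.

```python
def best_threshold(rows, min_recall=0.70):
    """Pick the row with the highest accuracy among those meeting min_recall.

    rows: list of dicts with keys "threshold", "accuracy" and "recall".
    Returns None when no threshold satisfies the recall constraint.
    """
    feasible = [r for r in rows if r["recall"] >= min_recall]
    if not feasible:
        return None
    return max(feasible, key=lambda r: r["accuracy"])

# Invented per-threshold results for a single sub-domain.
rows = [
    {"threshold": 0.3, "accuracy": 0.62, "recall": 0.91},
    {"threshold": 0.5, "accuracy": 0.71, "recall": 0.78},
    {"threshold": 0.7, "accuracy": 0.79, "recall": 0.64},
]
print(best_threshold(rows))
```

Here the threshold of 0.7 gives the best accuracy overall but violates the recall constraint, so 0.5 is selected. Running the same rule per sub-domain is what allows the selected MD to differ between sub-domains.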
Except for the Vehicles sub-domain, the results indicate that the classification is aligned with the benchmark of 72.8% and, therefore, provide the necessary foundation for analyzing the implementation of ML in each sub-domain.
7. Methodology—Identifying the Implemented ML Techniques
The main challenge in processing natural language, and extracting meaningful insights from it, is ambiguity. One of the ways to overcome the uncertainty between different subjects/fields that share the same concepts/terms is a logic that involves additional indicative signs.
To address this challenge, we used an ontology, which is a description of data including classes, properties, and relationships between the classes, in a domain of knowledge. In our case, an ontology was built using the lexicon of ML techniques (classes) as a starting point (as discussed in
Section 3.2). This ontology was designed to encapsulate the knowledge related to the ML concepts and the relations between them. The ontology consists of eight technique nodes, which are the main classes, and 50 term nodes representing the ontology sub-classes, which are common terms in ML (algorithms, software packages, and more), such that each one of them is related (via isA relation) or indicates (via ObjectProperty) at least one of the eight technique nodes (not necessarily directly).
Figure 4 depicts a sample of this ontology, which was then used to infer the ML technique referred to in each article. There is a relation between a term node and a technique node if the term is related to the technique. In addition, if two term nodes are related to each other, they will be connected as well in the ontology. The ontology is represented using Protégé Version 4.3 (an open source ontology editor, developed at Stanford University, Stanford, CA, USA) [
16].
To identify the ML techniques each article D utilizes, its BoW ($B_D$) and the ontology (O) were combined to create a dedicated instance of the ontology for the article ($O_D$). The following steps were applied to create the ontology’s instance associated with the paper, and analyze it to identify the relevant ML techniques:
For each term t in $B_D$, if there is a similar term t’ in O, and t’ does not have an instance in $O_D$, create an instance i for t, and set the degree of i to 1. In that way, we represent a term belonging to the concept t’ observed in the article. If t’ has an instance in $O_D$, i.e., the term has already been observed, add 1 to its degree.
Calculate the score of each ML technique by summing the degrees of the term nodes reachable from the technique’s node.
Go through all technique nodes and return the (at most) two techniques having the highest positive score.
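The three steps above can be sketched with a plain dictionary standing in for the ontology. Everything in this example is a simplifying assumption: the node names and edges form a hypothetical toy ontology rather than the one built in Protégé, "similar term" is reduced to an exact string match, and both isA and ObjectProperty relations are treated as the same kind of edge.

```python
from collections import Counter

# Hypothetical toy ontology: each term node lists the nodes it relates to.
EDGES = {
    "cnn": ["deep learning"],
    "lstm": ["deep learning"],
    "xgboost": ["gradient boosting"],
    "gradient boosting": ["supervised learning"],
    "random forest": ["supervised learning"],
    "k-means": ["clustering"],
    "clustering": ["unsupervised learning"],
}
TECHNIQUES = ["deep learning", "supervised learning", "unsupervised learning"]

def technique_scores(observed_terms, edges=EDGES, techniques=TECHNIQUES):
    """Score each technique by the degrees of the term nodes linked to it."""
    # Step 1: the degree of a term node is the number of times it was observed.
    degree = Counter(t for t in observed_terms if t in edges)

    # Reverse the edges so that we can walk from a technique to its term nodes.
    children = {}
    for term, parents in edges.items():
        for parent in parents:
            children.setdefault(parent, []).append(term)

    # Step 2: sum the degrees of all term nodes reachable from each technique.
    scores = {}
    for tech in techniques:
        seen, stack = set(), [tech]
        while stack:
            for child in children.get(stack.pop(), []):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        scores[tech] = sum(degree[t] for t in seen)
    return scores

def top_techniques(scores, limit=2):
    """Step 3: at most `limit` techniques with the highest positive score."""
    positive = [(t, s) for t, s in scores.items() if s > 0]
    positive.sort(key=lambda pair: (-pair[1], pair[0]))
    return [t for t, _ in positive[:limit]]

scores = technique_scores(["cnn", "cnn", "lstm", "xgboost", "traffic"])
print(scores)
print(top_techniques(scores))
```

The indirect link from "xgboost" through "gradient boosting" shows why reachability, rather than direct adjacency, is used when summing degrees.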
9. Discussion, Conclusions, and Research Limitations
The work described in this paper aimed to explore the use of eight ML techniques in six sub-domains of sustainable transportation. To achieve this goal, two main methodologies were developed and implemented: a fuzzy KNN model for classifying transport-related scientific articles into the transportation sub-domains, and an ontology-based reasoning for identifying the share of each ML technique applied to each of them. The need to adequately tackle the multi-classification of articles into overlapping topics was raised in previous studies [
2]. The method presented here allows comprehensive multi-classification. In other words, the classification can be determined by establishing a threshold for an article’s membership degree, capturing all sub-domains where the membership degree exceeds the threshold.
Researchers and practitioners can benefit from each of these methodologies individually, and their combined utilization yields an additional advantage. The automatic classification of transport-related articles into sub-domains, while particularly addressing the overlap between sub-domains, can assist professionals in identifying state-of-the-art developments in specific areas they are interested in. The ontology-based identification of the ML approaches can be valuable for researchers in areas beyond the transportation domain. The ability to reuse these two methodologies together provides a basis for the ongoing exploration of trends regarding the use of ML techniques for transportation sub-domains and the identification of research gaps.
The extraction of relevant papers from a wide range of journals, not necessarily limited to transportation-focused ones, is justified by the fact that only 41% of the articles describing the application of ML techniques to transportation problems were extracted from transportation journals. This finding is aligned with the work of Shu et al. [18], who found that, based on Chinese Science Citation Database classification, only 41.5% of transportation-related articles were published in transportation-related journals.
In general, the overlap between transportation sub-domains was adequately handled by fuzzy KNN and, except for the Vehicles sub-domain, the quality of classification is sufficient compared to the benchmark derived from a similar task [
15]. The MDs associated with the best accuracy (
Table 3) provide an indication of the difficulty of classifying articles into the various sub-domains. Improving the identification, mainly of Vehicle-related publications but also those associated with Safety and New Services, might be achieved by enriching the training dataset with papers describing these topics.
Of the eight ML techniques, graph-based and semi-supervised learning are scarcely implemented. Regarding graph-based learning, a traditional methodology analyzing its share among other ML techniques before 2018 will enable the determination of whether this share declined in recent years, or if it was never popular, given its algorithmic challenge when applied to big data. As for semi-supervised learning, further research is required to understand the availability of annotated examples in the various sub-domains of transportation and, consequently, whether this technique is irrelevant to transportation or overlooked and deserving of more attention.
The rise in the adoption of ANN and deep learning since 2018 is not surprising, given the significant advancements in these fields; however, it is noteworthy that other ML techniques continue to be utilized, suggesting that each of them retains its unique advantages for addressing specific problems, even as newer and innovative models emerge.
The greater prevalence of ANN and deep learning in TM, compared to the other sub-domains, is to be expected, given the availability of large datasets often obtained from various sensors.
The very small share of works that implement a combination of ML techniques is a phenomenon that should be further explored. The work of Sadeghian et al. [
19], combining supervised and unsupervised learning to detect the transport mode, exemplifies the potential of such combinations.
Looking ahead to the upcoming years, some of the trends identified in this study are anticipated to persist. While the declining trend in the share of unsupervised learning among various techniques may continue, this technique is not expected to vanish entirely. Although supervised learning methods typically yield results that are more straightforward to interpret, the capacity of unsupervised learning to unveil hidden phenomena and transcend traditional conceptions justifies its continued role among diverse methods.
Reinforcement learning, having undergone a moderate increase in share since 2018, is likely to take on a more prominent role in the field of transportation. This is especially evident in sectors that demand instantaneous decision making, including but not limited to traffic control and the efficient dispatching of public transport vehicles. This anticipation is grounded in the capability of reinforcement learning to interact with the environment [
20], coupled with the growing connectivity characteristic of transportation systems.
The observable trend of the escalating share of ANN in general, and deep learning specifically, mirrors their established capability to deliver high-quality results for numerous transportation-related problems. The requirement for extensive datasets, which is a prerequisite for implementing such techniques and is reflected in their prevalence in Traffic Management compared to other sub-domains, is likely to be mitigated by recent advancements in generating reliable synthetic data. Synthetic data address several limitations linked to data scarcity, privacy concerns, and regulations [
21], potentially allowing sub-domains like Behavior and Safety to more extensively harness the advantages of ANN and deep learning. These trends also exist in fields other than transportation (https://thenewstack.io/the-move-to-unsupervised-learning-where-we-are-today/ (accessed on 20 December 2023); https://emeritus.org/blog/ai-and-ml-reinforcement-learning-in-machine-learning/ (accessed on 20 December 2023)).
This study’s main drawback lies in its limited granularity when it comes to transportation sub-domains and ML techniques. To enhance the resolution of transportation-related subjects, it is necessary to further break down the sub-domains and establish a sufficiently large training dataset with an ample number of examples for each topic. The decomposition of transportation sub-domains is a challenging task by itself. In this research, we relied on the topics defined by the Israeli Smart Transportation Research Center. However, there are various ways to increase the granularity of the six topics addressed in this study, and it may be necessary to explore several decompositions. Using a simplified taxonomy to describe the transportation sub-domains and ML techniques also has advantages: (i) it allows a larger number of articles to be classified due to the generic nature of the concepts; and (ii) the taxonomy representation is explainable and easy to maintain.
Enhancing the precision of the applied ML techniques requires delving into additional segments of each article beyond just relying on its abstract; however, challenges arise when focusing on an article’s core methodology while overlooking terms used to describe previous works. Future research should explore the article’s sections that offer the most insight into the employed techniques. It is possible that, in addition to the abstract, the methodology section alone is sufficient for this task. Alternatively, incorporating the results section and/or the conclusions may enhance the identification of the primary ML techniques utilized. Since there is no universally standardized naming convention for sections in scientific papers, any attempt to include the specific parts of articles should be accompanied by a method for automatically identifying them within the reviewed article.
Additional research should focus on examining the intriguing pattern of similar proportions of different ML techniques being applied across sub-domains. Specifically, investigating the precise cause behind this similarity, especially whether it is solely influenced by the level of granularity in the sub-domains and techniques considered in this study, warrants further exploration. Finally, delving deeper into potential variations in the implementation of ML techniques across diverse geographic regions may reveal research gaps not identified in the current study, along with potential explanations for these differences.