Article

Enhancing Hierarchical Classification in Tree-Based Models Using Level-Wise Entropy Adjustment

Department of Automated Control Systems, Lviv Polytechnic National University, 79013 Lviv, Ukraine
*
Author to whom correspondence should be addressed.
Big Data Cogn. Comput. 2025, 9(3), 65; https://doi.org/10.3390/bdcc9030065
Submission received: 30 December 2024 / Revised: 3 March 2025 / Accepted: 6 March 2025 / Published: 11 March 2025
(This article belongs to the Special Issue Natural Language Processing Applications in Big Data)

Abstract

Hierarchical classification, which organizes items into structured categories and subcategories, has emerged as a powerful solution for handling large and complex datasets. However, traditional flat classification approaches often overlook the hierarchical dependencies between classes, leading to suboptimal predictions and limited interpretability. This paper addresses these challenges by integrating tree-based models with hierarchy-aware split criteria based on adjusted entropy calculations. The proposed method calculates entropy at multiple hierarchical levels, ensuring that the model respects the taxonomic structure during training. This approach aligns statistical optimization with semantic class relationships, enabling more accurate and coherent predictions. The model was implemented using tree-based ensemble methods combined with the newly developed hierarchy-aware metric Penalized Information Gain (PIG), which applies level-wise entropy adjustments, assigning greater weight to higher hierarchical levels to maintain the taxonomic structure. It was trained and evaluated on two real-world datasets structured according to the GS1 Global Product Classification (GPC) system; the final dataset included approximately 30,000 product descriptions spanning four hierarchical levels. An 80-20 train–test split was used, with model hyperparameters optimized through 5-fold cross-validation and Bayesian search. The experimental results showed a 12.7% improvement in classification accuracy at the lowest hierarchy level compared to traditional flat classification methods, with the largest gains on datasets featuring highly imbalanced class distributions and deep hierarchies. The proposed approach also increased the F1 score by 12.6%. Despite these promising results, challenges remain in scaling the model to very large datasets and handling classes with limited training samples. Future research will focus on integrating neural networks with hierarchy-aware metrics, enhancing data augmentation to address class imbalance, and developing real-time classification systems for practical use in industries such as retail, logistics, and healthcare.

1. Introduction

As global trade expands, the diversity of available products grows exponentially. This expansion has created a pressing need for efficient systems to organize and categorize goods [1]. Hierarchical product classification has emerged as a critical tool, offering a structured way to manage products by sorting them into categories and subcategories, thus streamlining organization and accessibility [2].
In business, hierarchical classification is critical to enhancing the management of logistics, sales, and marketing [3]. With the continuous growth of global commerce and the increasing complexity of consumer-related data—particularly purchasing patterns—developing effective algorithms for automatic product classification has become increasingly challenging [4]. The manual classification of extensive product inventories is not only time-consuming but also resource-intensive, rendering it impractical for large-scale operations [5]. In this context, machine learning (ML) methods offer promising solutions for handling hierarchical product classification more efficiently and for streamlining product organization at scale [6].
Guided by the GS1 Global Product Classification (GPC) system—a widely recognized standard used in over 150 countries—this study aims to address the complexities of hierarchical product classification using machine learning techniques [7]. The selection of the GS1 GPC system is motivated by its global adoption in retail, logistics, and supply chain management, providing a comprehensive and standardized taxonomy that ensures consistency in product categorization across diverse markets [8]. This standardization is critical for enabling seamless international trade and fostering stronger global business partnerships.
This paper introduces machine learning models designed specifically for hierarchical classification. Employing various techniques, such as decision trees and neural networks, we demonstrate that these models can process large volumes of data and maintain a high classification accuracy level within hierarchical structures.
Additionally, as the global economy evolves, the importance of standardizing product classification systems continues to grow [9]. Hierarchical classification supports the uniformity and reliability of product data across international markets, a vital aspect of facilitating cross-border trade and fostering stronger global partnerships [10].
In this context, advancing machine learning methodologies for hierarchical classification is not only a technical innovation but also an essential step toward supporting modern business and economic frameworks [11].
Previous research has explored various ML approaches for hierarchical classification, including decision trees [12,13] and neural networks [4,14,15]. Traditional methods such as flat classifiers, while computationally simpler, ignore the parent–child relationships between classes, often resulting in suboptimal predictions and cascading errors at lower hierarchy levels [16]. To improve split decisions in decision tree models, Information Gain (IG) has been widely used as a criterion to maximize entropy reduction [17]. However, IG’s major limitation is its disregard for hierarchical dependencies, which can lead to splits that are statistically optimal but semantically incoherent. To address this, Taxonomic Informativeness (TI) was introduced to incorporate hierarchy into split evaluations [18]. Although TI offers better alignment with taxonomic structures, it tends to overemphasize semantic relationships at the expense of statistical optimization, particularly when dealing with deeply nested hierarchies and imbalanced data distributions [19].
Despite these advancements, practical applications of hierarchical classification still face notable challenges. Existing methods like IG and TI often struggle with cascading misclassifications, where an error at a higher-level node propagates to subsequent levels, severely compromising overall accuracy. Additionally, handling class imbalance remains a significant issue, as many real-world datasets, especially in retail and logistics, exhibit skewed distributions, with dominant categories overshadowing less frequent ones [19]. Computational efficiency is another concern; many hierarchy-aware methods are resource-intensive, limiting their scalability for large datasets or real-time applications [15,20].
This study introduces Penalized Information Gain (PIG), a novel metric designed to bridge the gap between statistical optimization and taxonomic coherence in hierarchical classification tasks. PIG extends the traditional IG criterion by incorporating level-wise entropy adjustments, assigning higher weights to upper hierarchical levels to minimize cascading errors and improve semantic consistency. By integrating PIG into tree-based ensemble models, this approach ensures that splits not only maximize statistical significance but also preserve the hierarchical structure of the data. Compared to IG and TI, PIG offers a balanced solution, addressing both statistical and taxonomic considerations simultaneously.
The work presented in this study tackles the challenges of hierarchical classification by addressing the limitations of traditional split evaluation criteria. More specifically, this study achieves the following:
  • It introduces Penalized Information Gain (PIG), a novel metric that integrates Information Gain (IG) with Taxonomic Informativeness (TI), enabling a hierarchy-aware evaluation of splits in decision tree models. This approach ensures that splits respect both statistical and taxonomic criteria.
  • It provides a detailed empirical validation of the proposed metric using two real-world datasets structured according to the GS1 Global Product Classification (GPC) system. These experiments demonstrate the practical benefits of PIG, including improved classification accuracy and semantic alignment with hierarchical taxonomies.
  • It critically analyzes hierarchical classification metrics, comparing IG, TI, and PIG to highlight their respective strengths and limitations. This analysis provides valuable insights for developing more effective split evaluation criteria in hierarchical classification tasks.
The remainder of this paper is organized as follows. Section 2 introduces the problem of hierarchical classification and presents the proposed Penalized Information Gain (PIG) metric. Section 3 describes the datasets and experimental setup used to validate PIG. Section 4 discusses the empirical results, comparing PIG with traditional metrics. Finally, Section 5 concludes this study and identifies avenues for future research.
Hierarchical classification is a specialized classification task involving organizing data into a taxonomy, where classes exhibit parent–child relationships. Formally, the hierarchy can be represented as a directed tree $H = (Y, E)$, where $Y$ is the set of nodes (classes) and $E \subseteq Y \times Y$ represents the directed edges linking parent nodes to their children. Each data instance $x \in X$ is assigned a unique path (or set of paths) $\hat{Y} \subseteq Y$ that aligns with the hierarchy. A valid assignment satisfies the constraint that for any class $y \in \hat{Y}$, all its ancestors $A(y)$ are also included in $\hat{Y}$.
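As a concrete illustration of this formalism, the following minimal Python sketch represents the tree $H$ as a parent map and checks the ancestor-closure constraint on a label assignment; the category names are hypothetical examples in the spirit of the GPC taxonomy.

```python
# Hierarchy H = (Y, E) encoded as a child -> parent map; the root has no entry.
PARENT = {
    "Bread/Bakery Products": "Food/Beverage",
    "Bread (Perishable)": "Bread/Bakery Products",
}

def ancestors(y):
    """Return A(y), the set of all ancestors of class y."""
    result = set()
    while y in PARENT:
        y = PARENT[y]
        result.add(y)
    return result

def is_valid_assignment(labels):
    """Valid iff every label's ancestors are also included in the assignment."""
    return all(ancestors(y) <= labels for y in labels)

print(is_valid_assignment({"Food/Beverage", "Bread/Bakery Products", "Bread (Perishable)"}))  # True
print(is_valid_assignment({"Bread (Perishable)"}))  # False: ancestors are missing
```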
Unlike flat classification, hierarchical classification must account for the dependencies between labels at different levels of the hierarchy. In decision tree-based models, traditional split evaluation criteria, such as Information Gain (IG), focus solely on entropy reduction, ignoring these hierarchical relationships. This oversight often leads to splits that optimize statistical metrics at the expense of semantic coherence [21,22].
This study focuses on methods that enhance split evaluation in decision tree models to better align with hierarchical structures. By introducing a hierarchy-aware metric, Penalized Information Gain (PIG), we aim to address the misalignment between statistical optimization and the preservation of taxonomic integrity. Our work evaluates how well splits respect the hierarchy while reducing uncertainty, bridging the gap between statistical and semantic criteria for hierarchical classification [23].
Hierarchical classification presents unique challenges that arise from the complex relationships between categories in a taxonomy. Unlike flat classification, where each class is treated independently, hierarchical classification requires models to account for parent–child dependencies between categories. This interdependence significantly increases the complexity of the task, particularly in scenarios involving deep or unbalanced hierarchies.
One of the primary challenges is error propagation. A misclassification at a higher level of the hierarchy can cascade down, leading to incorrect predictions at all subsequent levels [14]. For example, if a product is wrongly categorized at the “Electronics” level instead of “Furniture”, all subcategories within “Furniture” become inaccessible for further refinement. This issue is compounded in deep hierarchies, where early-stage errors have a magnified impact on the final classification.
Another significant challenge is data imbalance within the hierarchy [24]. Certain branches may have a disproportionately larger number of samples compared to others, leading models to prioritize larger categories. As a result, smaller but semantically important categories are often underrepresented, reducing the overall accuracy and utility of the classification system [25]. Additionally, the varying depths of hierarchies create further imbalance, with some branches having more detailed subdivisions while others remain broad, making it difficult for models to generalize effectively across the structure.
Overfitting is another critical concern, particularly at deeper levels of the hierarchy. As the dataset becomes narrower with each subsequent classification level, the risk of models becoming overly tailored to the training data increases. This can lead to poor generalization to unseen data, undermining the model’s reliability in practical applications. Addressing this issue requires strategies such as ensuring a sufficient number of training examples at each level or applying regularization techniques [18].
Finally, there is the challenge of balancing statistical optimization with semantic coherence [26]. Traditional metrics like Information Gain focus purely on reducing entropy without considering the hierarchical structure of the data [27]. This often results in splits that are statistically optimal but misaligned with the taxonomy, undermining the interpretability and meaningfulness of the predictions.
These challenges highlight the need for hierarchy-aware methods and metrics that can address the limitations of standard approaches [28]. Such methods must effectively handle the cascading nature of errors, mitigate the impact of imbalanced data, prevent overfitting at deeper levels, and align statistical objectives with the semantic structure of the taxonomy [19]. Developing solutions that address these issues is crucial for advancing hierarchical classification in real-world scenarios [29].
Despite these challenges, hierarchical classification remains valuable in healthcare and product categorization, where large, multi-class datasets must be handled efficiently [30]. The ability to structure the classification process hierarchically improves accuracy and helps manage the complexity of large-scale data, making it an effective approach in real-world applications [31].

2. Relevant Research

Over the years, various machine learning models and methods have been explored to address the complexities of hierarchical classification, particularly focusing on handling multi-level taxonomies, cascading errors, and class imbalances.
One of the earliest and most influential contributions to this field was presented by Joachims [32], who introduced a model-building approach using Support Vector Machines (SVMs) combined with high-dimensional sparse text representations. This pioneering work laid the foundation for subsequent research in hierarchical text classification, emphasizing the importance of efficient feature representation in handling large and complex datasets. Building upon such foundational methods, researchers have explored ways to enhance model scalability and classification accuracy, especially for text-rich applications.
A comprehensive overview of hierarchical classification methods across various application domains was provided by Silla and Freitas [16]. Their extensive survey highlights the diverse use cases of hierarchical classification, encompassing NLP, image recognition, and gene function prediction. This study underscores the versatility of hierarchical classification methods while noting critical challenges, such as managing deeply nested hierarchies and handling data sparsity in lower-level classes. These challenges persist as significant hurdles in achieving consistent and accurate classification across multiple hierarchy levels.
In recent years, deep learning has emerged as a dominant approach in hierarchical classification, particularly for processing short texts and complex documents. Yang et al. [33] introduced Hierarchical Attention Networks (HANs) for document classification, leveraging attention mechanisms to capture hierarchical relationships within texts. This approach significantly improved classification accuracy by focusing on both sentence-level and document-level structures, addressing the challenge of representing hierarchical dependencies in textual data.
Probabilistic models have also contributed to advancements in this field. Zhang et al. [34] proposed a semi-supervised probabilistic framework for hierarchical multi-label classification. Their method utilizes both labeled and unlabeled data to enhance model robustness, effectively addressing the challenge of limited annotated data in many real-world applications. By incorporating probabilistic modeling, their approach offers improved handling of complex label dependencies inherent in hierarchical taxonomies.
Another notable contribution is the work of Sebastiani [35], who focused on methods for hierarchical multi-class text classification. His research introduced evaluation techniques using confusion matrices and precision metrics, providing a systematic framework for assessing the performance of hierarchical classification models. This work highlighted the need for comprehensive evaluation criteria that account for hierarchical relationships between classes, a gap that many earlier studies had overlooked.
Decision tree-based methods have remained a popular choice for hierarchical classification due to their interpretability and efficiency. Read et al. [36] explored classifier chains for multi-label hierarchical classification, demonstrating their effectiveness in capturing label dependencies within hierarchical structures [37]. Their approach involves decomposing complex classification tasks into a sequence of simpler models, each focusing on a specific level of the hierarchy. This method is particularly suited for handling short texts, where contextual information is limited but hierarchical cues are vital [38].
Despite these significant contributions, several research gaps remain. Existing methods often struggle with cascading errors, where misclassifications at higher hierarchical levels propagate to subsequent layers, diminishing overall accuracy. Additionally, handling class imbalance—particularly in real-world datasets where certain categories dominate—remains a persistent challenge [19]. Scalability is another critical concern, especially when dealing with ultra-large datasets or applications requiring real-time classification capabilities.
Addressing these gaps, the current study introduces Penalized Information Gain (PIG), a novel hierarchy-aware metric that integrates the statistical strengths of Information Gain (IG) [13] with the hierarchical sensitivity of Taxonomic Informativeness (TI) [18]. Unlike traditional methods, PIG incorporates level-wise entropy adjustments, assigning higher weights to upper-level splits to mitigate cascading errors and improve semantic consistency across the hierarchy. By integrating PIG into tree-based ensemble models, this research fills the existing gaps by enhancing classification accuracy, addressing class imbalance issues, and ensuring computational efficiency suitable for large-scale and real-time applications.
In conclusion, the extensive body of research in hierarchical classification demonstrates the evolution from traditional models, like SVMs and decision trees, to advanced deep learning and probabilistic approaches. While significant progress has been made, challenges such as cascading errors, data imbalance, and scalability continue to drive innovation in this field. The proposed PIG-based approach represents a step forward, offering a balanced solution that preserves hierarchical integrity while optimizing classification performance across various application domains.

3. Materials and Methods

3.1. Decision Tree Splitting Criteria

Classical splitting criteria in decision tree-based hierarchical classification, such as Information Gain (IG), primarily focus on entropy reduction without considering hierarchical relationships, often resulting in suboptimal splits that overlook semantic dependencies between parent and child categories [13,15].
In what follows, we demonstrate the limitations of traditional split evaluation methods and introduce Penalized Information Gain (PIG), a hierarchy-aware metric that integrates Information Gain with Taxonomic Informativeness (TI) [31]. This novel criterion aligns statistical optimization with the taxonomic structure of the data. We also discuss the implications of this approach for improving the interpretability and accuracy of splits in hierarchical classification tasks [39]. Furthermore, we motivate the adoption of hierarchical evaluation metrics that go beyond standard statistical measures, addressing their critical role in advancing the field of hierarchical classification.

3.1.1. Information Gain (IG) [40]

Information Gain (IG) [40,41] measures the reduction in entropy after a dataset is split, providing a statistical criterion to select features that minimize uncertainty. It is calculated as the difference between the entropy of the parent dataset and the weighted sum of the entropies of the child subsets:
$$IG(S) = H(S) - \left[\, p_{\mathrm{left}} \cdot H(S_{\mathrm{left}}) + p_{\mathrm{right}} \cdot H(S_{\mathrm{right}}) \,\right]$$
where
  • $H(S) = -\sum_{i=1}^{k} p_i \log p_i$ is the entropy of $S$;
  • $p_{\mathrm{left}}$ and $p_{\mathrm{right}}$ are the proportions of samples in $S_{\mathrm{left}}$ and $S_{\mathrm{right}}$, respectively.
While IG effectively reduces overall uncertainty, it does not consider hierarchical dependencies between classes, which is critical for tasks where taxonomic relationships play a vital role [42]. In this study, IG was applied to evaluate feature splits at each decision node, serving as a baseline for comparison against hierarchy-aware methods.
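As a concrete illustration, the sketch below computes $H(S)$ and $IG(S)$ exactly as defined above; using base-2 logarithms is our assumption, since the text does not fix the base.

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """H(S) = -sum_i p_i log2(p_i) over the class proportions in S."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(parent, left, right):
    """IG(S) = H(S) - [p_left * H(S_left) + p_right * H(S_right)]."""
    n = len(parent)
    return entropy(parent) - (len(left) / n) * entropy(left) \
                           - (len(right) / n) * entropy(right)

# Toy example with brick-level labels:
S = ["Liquid Milk"] * 6 + ["Bread (Perishable)"] * 4
print(information_gain(S, S[:6], S[6:]))  # perfect split, so IG = H(S) ~ 0.971
```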

3.1.2. Average Taxonomic Informativeness (ATI) [43]

To address the hierarchical limitations of IG, we employed Average Taxonomic Informativeness (ATI) [33,34], which incorporates semantic relationships between classes into the split evaluation process. ATI measures how well a split preserves the taxonomy by weighing correct predictions at different hierarchy levels:
$$ATI = \frac{1}{N} \sum_{i=1}^{N} \frac{\sum_{l=1}^{L} w_l \cdot I\!\left(y_i^{l}, \hat{y}_i^{l}\right)}{\sum_{l=1}^{L} w_l}$$
Explanation of terms:
  • $N$ — total number of samples.
  • $L$ — total number of levels in the taxonomy.
  • $w_l$ — weight assigned to level $l$ in the hierarchy, representing the informativeness or specificity of that level.
  • $I(y_i^{l}, \hat{y}_i^{l})$ — indicator function that evaluates whether the prediction $\hat{y}_i^{l}$ at level $l$ matches the true label $y_i^{l}$ for sample $i$:
$$I\!\left(y_i^{l}, \hat{y}_i^{l}\right) = \begin{cases} 1 & \text{if } y_i^{l} = \hat{y}_i^{l}, \\ 0 & \text{otherwise.} \end{cases}$$
By emphasizing correct predictions at higher levels of the hierarchy, ATI ensures that splits respect semantic relationships between parent and child classes. In our study, ATI was integrated into the decision tree construction process to assess how well the splits align with the hierarchical structure of the GS1 GPC taxonomy.
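A minimal sketch of the ATI computation, assuming level-wise labels are available for both the ground truth and the predictions; the weight values are illustrative, since the paper only states that higher levels receive greater weight.

```python
import numpy as np

def ati(y_true, y_pred, weights):
    """ATI over N samples and L levels; y_true, y_pred have shape (N, L)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    w = np.asarray(weights, dtype=float)
    hits = (y_true == y_pred).astype(float)   # indicator I(y_i^l, yhat_i^l)
    return float((hits @ w).mean() / w.sum())

# Two samples, four levels (segment, family, class, brick); upper levels weigh more.
y_true = [["Food/Beverage", "Dairy Products", "Milk", "Liquid Milk"],
          ["Food/Beverage", "Bread/Bakery Products", "Bread", "Bread (Perishable)"]]
y_pred = [["Food/Beverage", "Dairy Products", "Milk", "Powdered Milk"],  # brick miss
          ["Food/Beverage", "Bread/Bakery Products", "Bread", "Bread (Perishable)"]]
print(ati(y_true, y_pred, weights=[4, 3, 2, 1]))  # 0.95
```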

3.1.3. Penalized Information Gain (PIG)

Penalized Information Gain (PIG) is an enhanced metric for evaluating splits in decision trees, designed specifically to incorporate hierarchical relationships into the decision-making process. While traditional Information Gain (IG) focuses solely on reducing entropy (uncertainty) in the dataset, PIG extends this approach by introducing a penalty term derived from Taxonomic Informativeness (TI). This modification ensures that splits respect the hierarchical structure of the data, favoring those that better separate taxonomically distinct classes.
PIG incorporates TI into the IG framework through a penalty factor (PF), which scales IG based on the taxonomic relevance of the split:
$$PIG(S) = IG(S) \cdot PF$$
where the penalty factor $PF$ is defined as follows:
$$PF = \log(1 + \alpha \cdot ATI)$$
  • $\alpha > 0$ is a scaling coefficient that controls the sensitivity of $PF$ to changes in $ATI$.
  • The logarithm provides a non-linear relationship, ensuring a gradual increase in $PF$ as $ATI$ improves.
The Penalized Information Gain (PIG) metric balances statistical relevance and hierarchical consistency in decision tree splits by combining Information Gain (IG) with Average Taxonomic Informativeness (ATI). The logarithmic function ensures smooth transitions, allowing PIG to increase with ATI and reward hierarchically consistent classifications.
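The two formulas translate directly into code; the sketch below is a minimal implementation, with example values that are illustrative rather than taken from the paper's experiments.

```python
import numpy as np

def penalized_information_gain(ig, ati_value, alpha=1.0):
    """PIG(S) = IG(S) * PF, with PF = log(1 + alpha * ATI)."""
    return ig * np.log1p(alpha * ati_value)   # log1p(x) = log(1 + x)

# With identical IG, the taxonomically cleaner split scores higher:
print(penalized_information_gain(ig=0.97, ati_value=0.92))  # ~0.63
print(penalized_information_gain(ig=0.97, ati_value=0.75))  # ~0.54
```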
PIG offers several advantages for hierarchical datasets:
  • Hierarchy Awareness: Aligns splits with label hierarchies, preserving meaningful taxonomic distinctions.
  • Improved Interpretability: Produces decision boundaries that respect semantic relationships, enhancing model transparency.
  • Enhanced Generalization: Leverages hierarchical structures for better multi-level classification performance.
By addressing the limitations of traditional split criteria, PIG simultaneously reduces uncertainty and preserves data semantics, improving accuracy and practical utility. Future work will explore its integration into ensemble methods like random forests and gradient boosting to further enhance performance.
The Penalized Information Gain (PIG) metric includes the hyperparameter α, which controls the trade-off between Information Gain (IG) and Taxonomic Informativeness (TI). A higher α value places greater emphasis on preserving the hierarchical structure, while a lower α prioritizes entropy reduction.
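A quick numerical illustration of this trade-off, using hypothetical scores for two competing splits with identical IG:

```python
import numpy as np

ig, ati_aligned, ati_misaligned = 0.9, 0.92, 0.75
for alpha in (0.1, 1.0, 10.0):
    gap = ig * (np.log1p(alpha * ati_aligned) - np.log1p(alpha * ati_misaligned))
    print(f"alpha={alpha:>4}: PIG advantage of the aligned split = {gap:.3f}")
# alpha= 0.1 -> 0.014; alpha= 1.0 -> 0.083; alpha=10.0 -> 0.164:
# larger alpha increasingly favors the hierarchy-preserving split.
```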
The integration of TI into the PIG calculation introduces additional overhead compared to traditional split criteria. Calculating IG has a complexity of $O(n \log n)$, where $n$ is the number of samples. Incorporating TI adds complexity proportional to the number of hierarchy levels ($L$) and classes ($C$), making the total complexity approximately $O(n \log n + nLC)$. Despite this added cost, the optimization of hierarchical splits leads to improved model generalization and interpretability. The computational overhead remains manageable, especially when using parallelized implementations, making PIG suitable for large-scale hierarchical classification tasks.
In this study, we present results obtained using CART (Classification and Regression Trees) as the base learner for evaluating the effectiveness of the proposed Penalized Information Gain (PIG) metric. While CART serves as a well-established and robust framework for decision tree construction, it is important to note that PIG is not limited to CART-based models. The metric can be seamlessly integrated into any decision tree algorithm that employs an information-based splitting criterion—such as Information Gain, Gini impurity, or similar measures—and supports the modification of this criterion. This flexibility allows PIG to be applied across a wide range of tree-based models, including those used in ensemble methods (e.g., random forests, Gradient Boosting Machines) and multi-branch decision trees (e.g., ID3, C4.5), thereby broadening its applicability to various hierarchical classification tasks.
Modifying the decision tree algorithm is not required; instead, the proposed approach focuses on substituting the split criterion with the Penalized Information Gain (PIG) metric. This enables straightforward integration into existing tree-based models while preserving the original algorithmic structure.
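Because the split criteria in libraries such as scikit-learn are compiled and cannot be swapped out from pure Python, the sketch below illustrates the substitution inside a hand-rolled threshold search, reusing `information_gain()` and `ati()` from the sketches above. Scoring each side's taxonomic coherence through its majority path is one illustrative choice on our part, not a rule prescribed by the paper.

```python
import numpy as np

def majority_path(paths):
    """Most frequent full path (row) among level-wise label rows."""
    rows, counts = np.unique(paths, axis=0, return_counts=True)
    return rows[np.argmax(counts)]

def best_split_by_pig(X, paths, feature_idx, alpha=1.0, weights=(4, 3, 2, 1)):
    """Scan thresholds on one feature, keeping the split that maximizes PIG.

    X: (n, d) feature matrix; paths: (n, L) array of level-wise labels
    (segment, ..., brick). Only the scoring step differs from an IG-based
    search; the surrounding tree-growing loop stays untouched.
    """
    x, bricks = X[:, feature_idx], paths[:, -1]
    best_t, best_score = None, -np.inf
    for t in np.unique(x)[:-1]:            # candidate thresholds between values
        mask = x <= t
        ig = information_gain(bricks, bricks[mask], bricks[~mask])
        # Predict each side's majority path for all its samples, then measure
        # ATI of that assignment against the true paths.
        pred = np.where(mask[:, None],
                        majority_path(paths[mask]),
                        majority_path(paths[~mask]))
        score = ig * np.log1p(alpha * ati(paths, pred, weights))
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score
```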

3.2. Addressing Challenges in Imbalanced Hierarchical Datasets

Hierarchical datasets often suffer from imbalances that can significantly impact the performance and structure of decision tree models. These imbalances typically manifest in two interconnected forms [44]. The first is size imbalance across branches, where certain branches of the hierarchy dominate due to an unequal distribution of samples. For example, in biological taxonomies, datasets may contain a disproportionately high number of mammal species compared to amphibians or reptiles, biasing the model toward larger branches [45]. The second form is depth imbalance within branches, where some branches of the hierarchy extend to greater depths, offering finer granularity, while others remain shallow [46]. This disparity often leads to splits that prioritize deeper branches at the expense of shallower ones [47].
These imbalances can bias decision trees toward larger branches, neglecting smaller but semantically important ones [39]. Traditional criteria like Information Gain (IG) exacerbate this issue by prioritizing larger classes without considering hierarchical relationships.
To address this, Penalized Information Gain (PIG) introduces two key mechanisms:
  • Penalization of size-dominant splits that group taxonomically distinct classes.
  • Promotion of balanced splits, ensuring that underrepresented branches are adequately considered.
This is achieved through the penalty factor (PF):
$$PF = 1 - TI(S_{\mathrm{left}}, S_{\mathrm{right}}),$$
which lowers the evaluation score for such splits. This adjustment ensures that splits favor taxonomic alignment over size dominance, maintaining the semantic coherence of the hierarchy.

3.3. Encouraging Hierarchical Balance

PIG promotes splits that achieve a balanced representation of hierarchical branches, even when smaller branches are involved.
Traditional Information Gain often penalizes splits that isolate smaller yet taxonomically distinct branches, as these splits tend to result in lower entropy reductions. PIG overcomes this limitation by rewarding splits that preserve hierarchical balance. By integrating TI, PIG prioritizes semantic relevance, ensuring that smaller branches receive adequate consideration within the decision-making process.

3.4. Quantitative Foundation for Balance Preservation

The hierarchical balance maintained by PIG is underpinned by a robust mathematical framework:
  • Bias in Standard Information Gain: Conventional Information Gain (IG(S)) inherently favors larger branches due to the proportional weighting of subsets in entropy calculations. This bias skews splits toward majority branches, often overlooking smaller but semantically significant ones.
  • Adjustment Through PIG: PIG addresses this bias by scaling IG(S) with the penalty factor PF, introducing a hierarchical adjustment. This adjustment ensures that splits consider the taxonomic relationships between classes, reducing the undue influence of size imbalances and better representing smaller branches.
By addressing dataset imbalances, PIG enhances the functionality and interpretability of decision tree models in hierarchical classification tasks. Specifically, PIG achieves this through the following methods:
  • Preservation of Rare Branches: PIG ensures that smaller, underrepresented branches remain distinct, enabling the model to generalize effectively to these classes.
  • Alignment with Taxonomy: By aligning splits with the hierarchical structure, PIG reduces semantic distortion and ensures that meaningful distinctions are preserved.
  • Improved Model Fairness: PIG prevents overfitting to majority branches, fostering models that are more robust and equitable across imbalanced datasets.
Through its emphasis on hierarchical integrity and balanced representation, PIG offers a principled solution to the challenges of imbalanced hierarchical datasets. This approach enhances the generalizability and reliability of decision tree models, making it an invaluable tool for applications that rely on structured data with complex taxonomies.

4. Experimental Settings

In this section, we introduce the datasets and models utilized in our experiments, as well as the evaluation methodologies applied. The experiments are conducted using two real-world datasets, both structured according to the GS1 Global Product Classification (GPC) system. Additionally, we present the implementation details of the proposed Penalized Information Gain (PIG) metric and its integration into decision tree models.
We implement the PIG metric within standard decision tree algorithms, modifying the split evaluation criteria to incorporate hierarchy-aware adjustments. Baseline comparisons are performed using traditional Information Gain (IG) to highlight the improvements introduced by PIG. All experiments are conducted on the same datasets, ensuring a fair comparison of the performance metrics, including accuracy, semantic coherence, and hierarchical balance.
The evaluation is designed to demonstrate how PIG addresses key challenges such as data imbalance, hierarchical misalignment, and overfitting at deeper taxonomy levels. The following subsections describe the datasets, model configurations, and experimental protocols.

4.1. Dataset

Our study utilizes the Global Product Classification (GPC) system, which is publicly available through the GS1 GPC Browser (https://gpc-browser.gs1.org, accessed on 17 December 2024) [8]. The GPC is a widely adopted standard, providing a comprehensive hierarchical structure that spans thousands of product classes and subclasses. Given its detailed and organized approach to classifying goods and services, this structure is particularly suited for analyzing various machine learning methods in product classification.
The GS1 Global Product Classification (GPC) system was developed to ensure consistency in classifying products and services across industries. Its hierarchical structure consists of four levels: segments, families, classes, and bricks. At the highest level, there are 21 segments, each of which is further divided into over 200 families. These families then break down into more specific classes and, ultimately, bricks, representing the most granular classification for products and services. For example, in the Food and Beverage segment, “milk” is categorized under the “Dairy Products” family and further refined into the “Liquid Milk” class.
For the purposes of this research, we focused on a set of key segments, including Food/Beverage, Cleaning Supplies, and Tableware. Concentrating on these widely used product categories allowed us to analyze hierarchical classification models in real-world scenarios.
In this study, we employed two datasets categorized according to the GPC system. The first dataset includes product data from a company operating within the restaurant supply industry. These products were assigned to categories by experts, following the hierarchical classification provided on the GS1 GPC website. The second dataset was sourced from [48], a consumer product information website. Both datasets classify products using the GPC structure, making it possible to compare results between them.
Each dataset entry contains a product description and a “brick” label, representing the most specific classification level within the GPC hierarchy. Using these datasets, we aim to explore how machine learning techniques can enhance hierarchical classification and streamline the organization of large volumes of product data.
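The brick label determines the full path, so the upper levels can be recovered with a simple taxonomy lookup. A minimal pandas sketch follows; the column names and the miniature lookup table are hypothetical, with the real mapping coming from the GS1 GPC Browser export.

```python
import pandas as pd

# Miniature lookup table standing in for the full GPC taxonomy:
gpc = pd.DataFrame({
    "brick":   ["Liquid Milk", "Bread (Perishable)"],
    "class":   ["Milk", "Bread"],
    "family":  ["Dairy Products", "Bread/Bakery Products"],
    "segment": ["Food/Beverage", "Food/Beverage"],
})

# Each dataset entry: a free-text description plus its brick label.
data = pd.DataFrame({
    "description": ["Whole milk 3.2% 1L", "Sliced wheat bread, fresh"],
    "brick":       ["Liquid Milk", "Bread (Perishable)"],
})

labeled = data.merge(gpc, on="brick", how="left")   # adds class/family/segment
print(labeled[["description", "segment", "family", "class", "brick"]])
```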
Although no data labeling or splitting is required for the current phase of our research, using these two distinct datasets allows us to explore how products from different industries and data sources can be effectively categorized using machine learning models. The GPC system’s well-defined structure provides a solid foundation for investigating how hierarchical classification can be applied in practice.

4.2. Experimental Validation of PIG Performance

To demonstrate the advantages and limitations of the presented metrics, three illustrative examples were constructed, each reflecting different hierarchical structures and classification challenges. These examples highlight the strengths of Penalized Information Gain (PIG) in comparison to Information Gain (IG) and Average Taxonomic Informativeness (ATI) when applied to decision tree construction for hierarchical classification.
Table 1 presents three hierarchical classification examples, detailing product names, their actual hierarchical levels (Levels 1–4), and their corresponding splits (Left/Right). Each example demonstrates variations in classification accuracy across hierarchical levels, providing insights into how splits affect the preservation of the taxonomic structure.
Figure 1 represents the hierarchical taxonomy of product categories, beginning with the root node “Food/Beverage” and branching into two primary subcategories: “Bread/Bakery Products” and “Cereal/Grain/Pulse Products”. Each branch further divides into more specific categories, illustrating the hierarchical relationships down to the fourth level, such as “Bread (Perishable)” and “Cereals Products—Not Ready to Eat (Shelf Stable)” (Table 1).
We manually applied the PIG, ATI, and IG metrics to the examples presented in Table 1 to demonstrate the differences in how each metric evaluates hierarchical classification splits. This manual application allows for a clearer comparison of how Information Gain (IG) focuses solely on entropy reduction, while Average Taxonomic Informativeness (ATI) emphasizes taxonomic alignment. By integrating both statistical and hierarchical considerations, Penalized Information Gain (PIG) provides a balanced evaluation. The results, summarized in Table 2, highlight how PIG effectively penalizes splits that violate hierarchical consistency while rewarding those that maintain both entropy reduction and semantic coherence. This comparison underscores PIG’s advantage over traditional metrics in capturing the complexities of hierarchical classification tasks.
Table 2 summarizes the evaluation metrics (IG, ATI, and PIG) for three hierarchical classification examples. While Information Gain (IG) remains constant across all examples, reflecting its focus on entropy reduction alone, Average Taxonomic Informativeness (ATI) and Penalized Information Gain (PIG) show variations that account for hierarchical misalignment, with Example 1 demonstrating the highest alignment (ATI = 0.92, PIG = 0.023) and Example 3 the lowest (ATI = 0.75, PIG = 0.020). This highlights PIG’s ability to integrate statistical entropy reduction with taxonomic relevance, addressing the limitations of IG and ATI individually.
The purpose of comparing these examples is to highlight the inadequacies of traditional metrics like IG in capturing the semantic and hierarchical dependencies inherent in taxonomic data. While IG focuses exclusively on reducing entropy, it fails to differentiate between splits that statistically reduce uncertainty but semantically violate the taxonomy. This limitation becomes evident in Examples 2 and 3, where the IG values remain constant despite clear differences in hierarchical misalignment.
By incorporating hierarchical relevance into the evaluation, ATI provides a more nuanced understanding of split quality. However, ATI alone overlooks the statistical properties of the split, which are critical for robust decision tree construction. For instance, in Example 3, while ATI captures the loss of taxonomic alignment, it does not address the statistical impact of entropy reduction.
PIG bridges this gap by integrating the statistical properties of IG with the hierarchical sensitivity of ATI. The comparison across examples demonstrates that PIG is capable of penalizing splits that fail to respect the taxonomy (as in Example 3) while rewarding splits that balance entropy reduction and taxonomic relevance (as in Example 1). This makes PIG particularly suitable for hierarchical classification tasks, where both statistical and semantic considerations are critical.

5. Experimental Setup

The experiments were conducted to enhance the hierarchical classification of short texts by leveraging advanced text preprocessing and hierarchy-aware evaluation criteria. The primary objective was to explore ensemble tree-based models—random forest and XGBoost—augmented with multiple split criteria to evaluate their efficacy in hierarchical contexts.
The experiments were performed using a dataset structured in alignment with the Global Product Classification (GPC) system. The GPC taxonomy organizes product categories into four hierarchical levels: segments, families, classes, and the most granular level, bricks. Each instance in the dataset consists of a textual product description and its corresponding brick label (Figure 2).
To ensure a robust evaluation, the dataset was divided into train and test sets. The class proportions within each split, visualized through a comparative diagram, highlight the dataset’s imbalanced nature across hierarchical levels, further underscoring the necessity for hierarchy-aware split metrics.
The experiments employed a two-step preprocessing strategy to optimize text representations:
  • Byte Pair Encoding (BPE): Applied as a subword tokenization method to address the challenges posed by abbreviated and short texts. BPE ensures compact and meaningful text representation while preserving semantic integrity [49].
  • BERT Embeddings: Contextualized dense vector representations were extracted using BERT to capture semantic relationships within the input text [50]. These embeddings served as high-dimensional feature inputs to the machine learning models.
The integration of BPE with BERT allowed for improved text generalization and representation, particularly effective for hierarchical short text classification [51].
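A sketch of the embedding step is shown below, assuming the HuggingFace transformers library and the bert-base-uncased checkpoint, neither of which the paper names. Note that stock BERT ships with a WordPiece subword tokenizer, so reproducing the exact BPE-plus-BERT pipeline would involve a separately trained BPE tokenizer.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(descriptions):
    """One dense vector per product description (mean-pooled last layer)."""
    batch = tokenizer(descriptions, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state     # (batch, tokens, 768)
    mask = batch["attention_mask"].unsqueeze(-1)      # exclude padding tokens
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

X = embed(["Whole milk 3.2% 1L", "Sliced wheat bread, fresh"]).numpy()
print(X.shape)  # (2, 768): feature matrix for the tree-based models
```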

6. Results

The experiments utilized two widely adopted tree-based ensemble models, random forest and XGBoost [52], configured with the following split criteria to evaluate their impact on hierarchical classification: IG, ATI, and PIG. The combination of these models and split metrics enabled a critical evaluation of how traditional and hierarchy-aware methods influence classification performance within structured taxonomies (Figure 3).
The key goals of the experimental setup were as follows:
  • To compare the performance of random forest and XGBoost using different split criteria (IG, ATI, and PIG) in preserving both statistical entropy reduction and hierarchical taxonomic alignment.
  • To analyze the impact of hierarchy-aware metrics (ATI and PIG) on mitigating the challenges of imbalanced data, particularly within deeply nested and sparsely populated branches of the GPC taxonomy.
The models were trained and evaluated on the train and test sets, with the results visualized to demonstrate class-level performance across different configurations. The stacked bar diagram highlights class distributions and provides a comparative analysis of the models’ behavior under varying split criteria.
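A sketch of the per-level evaluation that produces figures like those in Table 3, assuming brick-level predictions have already been mapped upward through the taxonomy; macro-averaged F1 is our assumption, as the averaging mode is not stated.

```python
from sklearn.metrics import accuracy_score, f1_score

def per_level_report(paths_true, paths_pred,
                     levels=("segment", "family", "class", "brick")):
    """Accuracy and macro-F1 at each GPC level, one row per hierarchy tier."""
    for l, name in enumerate(levels):
        t = [path[l] for path in paths_true]
        p = [path[l] for path in paths_pred]
        print(f"{name:>8}: acc={accuracy_score(t, p):.2f}  "
              f"f1={f1_score(t, p, average='macro', zero_division=0):.2f}")

# Toy usage: one brick-level miss that leaves all upper levels correct.
per_level_report(
    [("Food/Beverage", "Dairy Products", "Milk", "Liquid Milk")] * 2,
    [("Food/Beverage", "Dairy Products", "Milk", "Liquid Milk"),
     ("Food/Beverage", "Dairy Products", "Milk", "Powdered Milk")],
)
```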
Table 3 presents a comprehensive comparison of hierarchical classification performance using Information Gain (IG), Average Taxonomic Informativeness (ATI), and the proposed Penalized Information Gain (PIG) across different hierarchy levels—brick, class, family, and segment—using both random forest and XGBoost models. The results consistently demonstrate that incorporating hierarchy-aware metrics (ATI and PIG) leads to superior performance compared to the standard IG-based models.
At the brick level, which represents the most granular classification, PIG-based models outperformed both IG and ATI counterparts. The XGBoost_PIG model achieved the highest accuracy (0.69) and F1 score (0.69), highlighting PIG’s ability to balance entropy reduction with hierarchical alignment. In contrast, standard IG models showed lower performance, with the RandomForest_IG model yielding an accuracy of 0.54 and F1 score of 0.52, underscoring IG’s inability to capture hierarchical dependencies.
Moving up the hierarchy, similar trends persisted. At the class and family levels, PIG-based models consistently outperformed other approaches. The XGBoost_PIG model reached a class-level accuracy of 0.77 and family-level accuracy of 0.84, indicating that PIG’s consideration of taxonomic relationships enables more accurate predictions across hierarchy depths. Comparatively, IG-based models showed moderate gains but failed to differentiate between taxonomically aligned and misaligned splits.
At the segment level, the highest hierarchy tier, performance differences among models were less pronounced, with most models achieving near-perfect accuracy (≥0.96). This outcome is expected given the broader categories at this level, which inherently pose fewer classification challenges.
Overall, the PIG-based models not only improved classification accuracy but also enhanced precision, recall, and F1 scores across all hierarchy levels, with XGBoost_PIG consistently delivering the best results. These improvements demonstrate PIG’s effectiveness in addressing the limitations of traditional IG and ATI metrics by providing a balanced evaluation of both statistical significance and taxonomic coherence. The validation results underscore the importance of hierarchy-aware evaluation in achieving robust and semantically consistent hierarchical classification.

7. Discussion

The consistent performance improvements observed with PIG across multiple tree-based models (random forest and XGBoost) confirm its utility as an enhanced split criterion for hierarchical classification tasks. PIG effectively balances entropy reduction and taxonomic coherence, making it a valuable addition to hierarchical classification pipelines.
The experimental setup underscores the importance of incorporating hierarchy-aware split metrics and robust text preprocessing methods for hierarchical classification tasks. By combining BPE, BERT embeddings, and hierarchy-aware metrics (ATI and PIG), the experiments achieved a significant step toward improving both classification accuracy and semantic coherence in hierarchical contexts.
The ROC curve (Figure 4) demonstrates that XGBoost and random forest models, using hierarchy-aware split criteria (ATI and PIG), achieve improved classification performance compared to the standard Information Gain (IG). The curves for ATI and PIG exhibit better alignment toward the top-left region, indicating higher true positive rates and lower false positive rates across multiple classes. Models employing Penalized Information Gain (PIG) show consistent advantages, especially in handling complex and imbalanced hierarchical classes. Overall, the results highlight the effectiveness of integrating hierarchy-aware metrics into tree-based models for hierarchical classification tasks.
Models employing Penalized Information Gain (PIG) exhibit fewer misclassifications and better distribution across true positive predictions (Figure 5 and Figure 6), especially at higher levels (family and segment). In contrast, Information Gain (IG) shows higher confusion, particularly at granular levels like brick and class, indicating its limitations in handling hierarchical dependencies. Overall, the use of PIG leads to improved alignment with the true labels, demonstrating its efficacy in preserving hierarchical consistency while reducing classification errors.
Our experiments highlighted that IG, while effective for reducing entropy, fails to consider the semantic structure of hierarchical data. This shortcoming became evident in the provided examples, where the IG values remained constant despite varying degrees of misalignment with the taxonomy. Such results underscore the inadequacy of IG in capturing the complex dependencies present in hierarchical classification.
ATI, on the other hand, successfully reflects taxonomic relevance by measuring how well splits preserve the hierarchical structure. However, ATI alone does not account for the statistical quality of splits, which is critical for decision tree construction. This limitation was evident in cases where ATI captured the semantic losses but overlooked the entropy reduction achieved by splits [15].
The proposed PIG metric effectively addresses these gaps by combining the strengths of IG and ATI. By introducing a penalty factor derived from taxonomic informativeness, PIG aligns statistical optimization with the semantic structure of the hierarchy. As demonstrated in the examples, PIG penalizes splits that disrupt the taxonomy (e.g., Example 3) while rewarding splits that achieve a balance between entropy reduction and taxonomic alignment (e.g., Example 1). This makes PIG particularly suitable for tasks requiring both statistical robustness and semantic coherence.
Our work highlights the importance of hierarchy-aware split evaluation metrics in improving classification performance. By ensuring that splits respect both statistical and taxonomic criteria, PIG enhances the interpretability and utility of decision trees for hierarchical classification tasks. Future research will explore integrating PIG into ensemble models, such as random forests and gradient boosting, and extending its application to multi-label hierarchical classification problems.

8. Conclusions

This study addressed the limitations of traditional split evaluation metrics in hierarchical classification by comparing Information Gain (IG), Average Taxonomic Informativeness (ATI), and the proposed Penalized Information Gain (PIG). While IG effectively reduces entropy, it fails to capture the semantic structure of hierarchical data, resulting in splits that disregard taxonomic relationships. ATI improves upon this by reflecting taxonomic relevance but lacks consideration of the statistical properties essential in decision tree construction. In contrast, PIG integrates the strengths of both IG and ATI, aligning statistical optimization with hierarchical consistency. By introducing a penalty factor based on taxonomic informativeness, PIG penalizes splits that violate the hierarchy while rewarding those that balance entropy reduction and taxonomic alignment. The experimental results demonstrated PIG’s ability to enhance classification interpretability and generalization, particularly in complex hierarchical structures.
The novelty of PIG lies in its dual optimization approach, which ensures that decision trees respect hierarchical dependencies while maintaining statistical rigor, an improvement over existing methods that typically prioritize one aspect over the other. Although PIG enhances split evaluation, it presents certain limitations, including increased computational complexity and sensitivity to the choice of hyperparameters.
Future research will focus on the following:
  • Comparative analysis to evaluate PIG against other state-of-the-art approaches for hierarchical classification, including neural network-based and hybrid methods.
  • Scalability by applying the proposed methodology to larger and more diverse datasets across domains such as bioinformatics, product taxonomy, and medical hierarchies.
  • Model extensions by exploring the integration of PIG into ensemble methods like gradient boosting, as well as hierarchical multi-label and deep learning frameworks.
This future direction aims to further validate the robustness of PIG and expand its applicability to complex, real-world hierarchical classification problems.

Author Contributions

Conceptualization, O.N. and A.D.; methodology, O.N.; software, M.A. and V.A.; validation, O.N., V.T. and M.A.; formal analysis, O.N. and A.D.; investigation, O.N.; resources, V.A. and V.T.; data curation, V.A., O.N. and V.T.; writing—original draft preparation, O.N.; writing—review and editing, A.D.; visualization, A.D.; supervision, V.T.; project administration, M.A.; funding acquisition, V.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data supporting the reported results can be found at https://www.directionsforme.org/categories (accessed on 17 December 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Junardi, W.; Khodra, M.L. Automatic Multi-Label Classification for GDP Economic-Phenomenon News. In Proceedings of the 2020 International Conference on ICT for Smart Society (ICISS), Bandung, Indonesia, 19–20 November 2020; IEEE: Bandung, Indonesia, 2020; pp. 1–6. [Google Scholar] [CrossRef]
  2. Cotacallapa, H.; Saboya, N.; Canas Rodrigues, P.; Salas, R.; Linkolk López-Gonzales, J. A Flat-Hierarchical Approach Based on Machine Learning Model for e-Commerce Product Classification. IEEE Access 2024, 12, 72730–72745. [Google Scholar] [CrossRef]
  3. Uddin, M.A.; Aryal, S.; Bouadjenek, M.R.; Al-Hawawreh, M.; Talukder, M.A. Hierarchical Classification for Intrusion Detection System: Effective Design and Empirical Analysis. arXiv 2024. [Google Scholar] [CrossRef]
  4. Cao, Y.-K.; Wei, Z.-Y.; Tang, Y.-J.; Jin, C.-K. Hierarchical Label Text Classification Method with Deep-Level Label-Assisted Classification. In Proceedings of the 2023 IEEE 12th Data Driven Control and Learning Systems Conference (DDCLS), Xiangtan, China, 12–14 May 2023; IEEE: Xiangtan, China, 2023; pp. 1467–1474. [Google Scholar] [CrossRef]
  5. Chang, C.-M.; Mishra, S.D.; Igarashi, T. A Hierarchical Task Assignment for Manual Image Labeling. In Proceedings of the 2019 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), Memphis, TN, USA, 14–18 October 2019; IEEE: Memphis, TN, USA, 2019; pp. 139–143. [Google Scholar] [CrossRef]
  6. Fan, Q.; Qiu, C. Hierarchical Multi-Label Text Classification Method Based on Multi-Level Decoupling. In Proceedings of the 2023 3rd International Conference on Neural Networks, Information and Communication Engineering (NNICE), Guangzhou, China, 24–26 February 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 453–457. [Google Scholar]
  7. Rejeb, A.; Keogh, J.G.; Martindale, W.; Dooley, D.; Smart, E.; Simske, S.; Wamba, S.F.; Breslin, J.G.; Bandara, K.Y.; Thakur, S.; et al. Charting Past, Present, and Future Research in the Semantic Web and Interoperability. Future Internet 2022, 14, 161. [Google Scholar] [CrossRef]
  8. Global Product Classification (GPC). Available online: https://www.gs1.org/standards/gpc (accessed on 17 December 2024).
  9. Sepúlveda-Rojas, J.P.; Aravena, S.; Carrasco, R. Increasing Efficiency in Furniture Remanufacturing with AHP and the SECI Model. Sustainability 2024, 16, 10339. [Google Scholar] [CrossRef]
  10. Kong, X.; Zhu, X.; Wang, M.; Wang, X.; Zou, M. Text Classification for Social Governance: A Novel Strategy with Adaptive Reward Mechanisms. In Proceedings of the 2024 6th International Conference on Communications, Information System and Computer Engineering (CISCE), Guangzhou, China, 10–12 May 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 339–343. [Google Scholar] [CrossRef]
  11. Zhang, Q.; Chai, B.; Song, B.; Zhao, J. A Hierarchical Fine-Tuning Based Approach for Multi-Label Text Classification. In Proceedings of the 2020 IEEE 5th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA), Chengdu, China, 10–13 April 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 51–54. [Google Scholar] [CrossRef]
  12. Li, J.; Wang, H.; Song, C.; Han, R.; Hu, T. Research on Hierarchical Clustering Undersampling and Random Forest Fusion Classification Method. In Proceedings of the 2021 IEEE International Conference on Progress in Informatics and Computing (PIC), Shanghai, China, 17–19 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 53–57. [Google Scholar] [CrossRef]
  13. Tangirala, S. Evaluating the Impact of GINI Index and Information Gain on Classification Using Decision Tree Classifier Algorithm*. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 277. [Google Scholar] [CrossRef]
  14. Peng, H.; Li, J.; Wang, S.; Wang, L.; Gong, Q.; Yang, R.; Li, B.; Yu, P.S.; He, L. Hierarchical Taxonomy-Aware and Attentional Graph Capsule RCNNs for Large-Scale Multi-Label Text Classification. IEEE Trans. Knowl. Data Eng. 2021, 33, 2505–2519. [Google Scholar] [CrossRef]
  15. Rao, S.X.; Egger, P.H.; Zhang, C. Hierarchical Classification of Research Fields in the “Web of Science” Using Deep Learning. arXiv 2023. [Google Scholar] [CrossRef]
  16. Silla, C.N.; Freitas, A.A. A Survey of Hierarchical Classification across Different Application Domains. Data Min. Knowl. Discov. 2011, 22, 31–72. [Google Scholar] [CrossRef]
  17. Nowozin, S. Improved Information Gain Estimates for Decision Tree Induction. In Proceedings of the 29th International Conference on Machine Learning (ICML’12), Edinburgh, Scotland, 26 June–1 July 2012; Omnipress: Madison, WI, USA, 2012; pp. 571–578. [Google Scholar] [CrossRef]
  18. Naik, A.; Rangwala, H. Filter Based Taxonomy Modification for Improving Hierarchical Classification. arXiv 2016. [Google Scholar] [CrossRef]
  19. Cai, X.; Xiao, M.; Ning, Z.; Zhou, Y. Resolving the Imbalance Issue in Hierarchical Disciplinary Topic Inference via LLM-Based Data Augmentation. In Proceedings of the 2023 IEEE International Conference on Data Mining (ICDM), Shanghai, China, 1–4 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 956–961. [Google Scholar]
  20. Heinsen, F.A. Tree Methods for Hierarchical Classification in Parallel. arXiv 2022. [Google Scholar] [CrossRef]
21. Asadi, A.R. An Entropy-Based Model for Hierarchical Learning. arXiv 2022. [Google Scholar] [CrossRef]
22. Wadud, Md.A.H.; Rashadul, Md. Text Coherence Analysis Based on Misspelling Oblivious Word Embeddings and Deep Neural Network. Int. J. Adv. Comput. Sci. Appl. 2021, 12, 124. [Google Scholar] [CrossRef]
  23. Doroshenko, A. Application of Global Optimization Methods to Increase the Accuracy of Classification in the Data Mining Tasks. Comput. Model. Intell. Syst. 2019, 2353, 98–109. [Google Scholar] [CrossRef]
  24. Dindar, S.; Kaewunruen, S.; An, M. A Hierarchical Bayesian-Based Model for Hazard Analysis of Climate Effect on Failures of Railway Turnout Components. Reliab. Eng. Syst. Saf. 2022, 218, 108130. [Google Scholar] [CrossRef]
25. Busson, A.J.G.; Rocha, R.; Gaio, R.; Miceli, R.; Pereira, I.; Moraes, D.d.S.; Colcher, S.; Veiga, A.; Rizzi, B.; Evangelista, F.; et al. Hierarchical Classification of Financial Transactions Through Context-Fusion of Transformer-Based Embeddings and Taxonomy-Aware Attention Layer. In Proceedings of the Anais do II Brazilian Workshop on Artificial Intelligence in Finance (BWAIF 2023), João Pessoa, Brazil, 6–11 August 2023; pp. 13–24. [Google Scholar] [CrossRef]
  26. Dini, L.; Brunato, D.; Dell’Orletta, F.; Caselli, T. TEXT-CAKE: Challenging Language Models on Local Text Coherence. In Proceedings of the 31st International Conference on Computational Linguistics, Abu Dhabi, United Arab Emirates, 19–24 January 2025; Rambow, O., Wanner, L., Apidianaki, M., Al-Khalifa, H., Eugenio, B.D., Schockaert, S., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2025; pp. 4384–4398. [Google Scholar]
  27. Shu, H.; Cao, L.; Xu, Z.; Liu, K. The Research of Multidimensional Information Decision Mining Based on Information Entropy. In Proceedings of the 2009 International Forum on Information Technology and Applications, Chengdu, China, 15–17 May 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 732–735. [Google Scholar]
  28. Song, J.; Zhang, P.; Qin, S.; Gong, J. A Method of the Feature Selection in Hierarchical Text Classification Based on the Category Discrimination and Position Information. In Proceedings of the 2015 International Conference on Industrial Informatics-Computing Technology, Intelligent Technology, Industrial Information Integration, Wuhan, China, 3–4 December 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 132–135. [Google Scholar]
  29. Mutsaddi, A.; Jamkhande, A.; Thakre, A.; Haribhakta, Y. BERTopic for Topic Modeling of Hindi Short Texts: A Comparative Study. arXiv 2025. [Google Scholar] [CrossRef]
  30. Williams, L.; Anthi, E.; Burnap, P. Comparing Hierarchical Approaches to Enhance Supervised Emotive Text Classification. Big Data Cogn. Comput. 2024, 8, 38. [Google Scholar] [CrossRef]
  31. Zangari, A.; Marcuzzo, M.; Rizzo, M.; Giudice, L.; Albarelli, A.; Gasparetto, A. Hierarchical Text Classification and Its Foundations: A Review of Current Research. Electronics 2024, 13, 1199. [Google Scholar] [CrossRef]
  32. Joachims, T. Text Categorization with Support Vector Machines: Learning with Many Relevant Features. In Machine Learning: ECML-98; Nédellec, C., Rouveirol, C., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 1998; Volume 1398, pp. 137–142. ISBN 978-3-540-64417-0. [Google Scholar]
  33. Yang, Z.; Yang, D.; Dyer, C.; He, X.; Smola, A.; Hovy, E. Hierarchical Attention Networks for Document Classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; Association for Computational Linguistics: Stroudsburg, PA, USA, 2016; pp. 1480–1489. [Google Scholar] [CrossRef]
  34. Basu, S.; Bilenko, M.; Mooney, R.J. A Probabilistic Framework for Semi-Supervised Clustering. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, 22–25 August 2004; ACM: New York, NY, USA, 2004; pp. 59–68. [Google Scholar] [CrossRef]
  35. Sebastiani, F. Classification of Text, Automatic. In Encyclopedia of Language & Linguistics; Elsevier: Amsterdam, The Netherlands, 2006; pp. 457–462. ISBN 978-0-08-044854-1. [Google Scholar]
  36. Read, J.; Pfahringer, B.; Holmes, G.; Frank, E. Classifier Chains for Multi-Label Classification. Mach. Learn. 2011, 85, 333–359. [Google Scholar] [CrossRef]
  37. Sun, A.; Lim, E.-P. Hierarchical Text Classification and Evaluation. In Proceedings of the 2001 IEEE International Conference on Data Mining, San Jose, CA, USA, 29 November–2 December 2001; IEEE: Piscataway, NJ, USA, 2001; pp. 521–528. [Google Scholar]
  38. Narushynska, O.; Teslyuk, V.; Doroshenko, A.; Arzubov, M. Data Sorting Influence on Short Text Manual Labeling Quality for Hierarchical Classification. Big Data Cogn. Comput. 2024, 8, 41. [Google Scholar] [CrossRef]
  39. Kosmopoulos, A.; Paliouras, G.; Androutsopoulos, I. Probabilistic Cascading for Large Scale Hierarchical Classification. arXiv 2015. [Google Scholar] [CrossRef]
  40. Kononenko, I.; Kukar, M. Measures for Evaluating the Quality of Attributes. In Machine Learning and Data Mining; Elsevier: Amsterdam, The Netherlands, 2007; pp. 153–180. ISBN 978-1-904275-21-3. [Google Scholar]
  41. Kosmopoulos, A.; Partalas, I.; Gaussier, E.; Paliouras, G.; Androutsopoulos, I. Evaluation Measures for Hierarchical Classification: A Unified View and Novel Approaches. Data Min. Knowl. Discov. 2015, 29, 820–865. [Google Scholar] [CrossRef]
42. Plaud, R.; Labeau, M.; Saillenfest, A.; Bonald, T. Revisiting Hierarchical Text Classification: Inference and Metrics; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024. [Google Scholar] [CrossRef]
  43. Warwick, R.M. Average Taxonomic Diversity and Distinctness. In Encyclopedia of Ecology; Elsevier: Amsterdam, The Netherlands, 2008; pp. 300–305. ISBN 978-0-08-045405-4. [Google Scholar]
  44. Hwang, S.; Yeo, H.G.; Hong, J.-S. A New Splitting Criterion for Better Interpretable Trees. IEEE Access 2020, 8, 62762–62774. [Google Scholar] [CrossRef]
  45. Naik, A.; Rangwala, H. Inconsistent Node Flattening for Improving Top-down Hierarchical Classification. arXiv 2017. [Google Scholar] [CrossRef]
  46. Yang, J.; Gao, M.; Kuang, K.; Ni, B.; She, Y.; Xie, D.; Chen, C. Hierarchical Classification of Pulmonary Lesions: A Large-Scale Radio-Pathomics Study. In Medical Image Computing and Computer Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2020; Volume 12266, pp. 497–507. [Google Scholar] [CrossRef]
47. Doroshenko, A.; Tkachenko, R. Classification of Imbalanced Classes Using the Committee of Neural Networks. In Proceedings of the 2018 IEEE 13th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT), Lviv, Ukraine, 11–14 September 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 400–403. [Google Scholar] [CrossRef]
  48. Product Categories. Available online: https://www.directionsforme.org/categories (accessed on 17 December 2024).
  49. Vilar, D.; Federico, M. A Statistical Extension of Byte-Pair Encoding. In Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021), Bangkok, Thailand, 5–6 August 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 263–275. [Google Scholar] [CrossRef]
50. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; Volume 1, pp. 4171–4186. [Google Scholar] [CrossRef]
51. Teslyuk, V.; Doroshenko, A.; Narushynska, O. Preprocessing Product Descriptions with Byte Pair Encoding: A Solution for Abbreviation-Heavy Texts. In Proceedings of the Computational Intelligence Application Workshop (CIAW 2024), Lviv, Ukraine, 10–12 October 2024; CEUR Workshop Proceedings; CEUR: Lviv, Ukraine, 2024; Volume 3861, pp. 57–76. [Google Scholar]
  52. Romeo, L.; Frontoni, E. A Unified Hierarchical XGBoost Model for Classifying Priorities for COVID-19 Vaccination Campaign. Pattern Recognit. 2022, 121, 108197. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Hierarchical taxonomy of product classification with example assignments.
Figure 2. Class proportions at the brick level for the train and test datasets.
Figure 3. Performance evaluation of hierarchical classification: F1 score, precision, and recall across taxonomy levels.
Figure 4. ROC curve comparison for hierarchical classification models across multiple classes at the brick level.
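The per-class curves in Figure 4 follow the usual one-vs-rest construction for multiclass models. The sketch below shows how such curves could be computed with scikit-learn from predicted class probabilities; it illustrates the standard recipe under that assumption, not the authors' plotting code.

```python
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import label_binarize

def per_class_roc(y_true, y_score, classes):
    """One-vs-rest ROC curves for a multiclass classifier.

    y_true:  list of true class labels for the test set.
    y_score: (n_samples, n_classes) array of predicted probabilities,
             e.g. from model.predict_proba(X_test).
    classes: class labels in the column order of y_score (assumes > 2 classes,
             so label_binarize yields one indicator column per class).
    Returns {class: (fpr, tpr, auc)} ready for plotting.
    """
    y_bin = label_binarize(y_true, classes=classes)
    curves = {}
    for i, cls in enumerate(classes):
        fpr, tpr, _ = roc_curve(y_bin[:, i], y_score[:, i])
        curves[cls] = (fpr, tpr, auc(fpr, tpr))
    return curves
```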
Figure 5. Normalized confusion matrices of all models at the segment level of the hierarchy.
Figure 6. Normalized confusion matrices of all models at the family level of the hierarchy.
Table 1. Examples of hierarchical splits for product classification using real levels and splits.

Product Name | Real Level 1 | Real Level 2 | Real Level 3 | Real Level 4 | Split
Example 1
“Pics 10” “Flour Burrito Tortillas” | Food/Beverage | Bread/Bakery Products | Bread | Bread (Perishable) | Left
“Schmidt’s Delicatessen” “No Seeds” “Rye Bread” | Food/Beverage | Bread/Bakery Products | Bread | Bread (Perishable) | Left
Ahold 100% Whole Wheat Bread | Food/Beverage | Bread/Bakery Products | Bread | Bread (Perishable) | Right
Tia Rosa Flour Tortillas Burrito | Food/Beverage | Bread/Bakery Products | Bread | Bread (Shelf Stable) | Right
Turano Gourmet Sandwich Rolls—8 CT | Food/Beverage | Bread/Bakery Products | Bread | Bread (Shelf Stable) | Right
Udi’s Gluten Free Classic Hamburger Buns—4 CT | Food/Beverage | Bread/Bakery Products | Bread | Bread (Shelf Stable) | Left
Example 2
Tia Rosa Flour Tortillas Burrito | Food/Beverage | Bread/Bakery Products | Bread | Bread (Shelf Stable) | Left
Turano Gourmet Sandwich Rolls—8 CT | Food/Beverage | Bread/Bakery Products | Bread | Bread (Shelf Stable) | Left
Udi’s Gluten Free Classic Hamburger Buns—4 CT | Food/Beverage | Bread/Bakery Products | Bread | Bread (Shelf Stable) | Right
Ahold Fiber Select Brownies Chocolate Fudge—6 CT | Food/Beverage | Cereal/Grain/Pulse Products | Processed Cereal Products | Cereals Products—Not Ready to Eat (Shelf Stable) | Right
American Classic Gourmet Muffins Raisin Bran—4 CT | Food/Beverage | Cereal/Grain/Pulse Products | Processed Cereal Products | Cereals Products—Not Ready to Eat (Shelf Stable) | Right
Balconi Choco & Latte Sponge Cakes—10 CT | Food/Beverage | Cereal/Grain/Pulse Products | Processed Cereal Products | Cereals Products—Not Ready to Eat (Shelf Stable) | Left
Example 3
Tia Rosa Flour Tortillas Burrito | Food/Beverage | Bread/Bakery Products | Bread | Bread (Shelf Stable) | Left
Turano Gourmet Sandwich Rolls—8 CT | Food/Beverage | Bread/Bakery Products | Bread | Bread (Shelf Stable) | Left
Udi’s Gluten Free Classic Hamburger Buns—4 CT | Food/Beverage | Bread/Bakery Products | Bread | Bread (Shelf Stable) | Right
Wheatena Toasted Wheat Cereal | Food/Beverage | Cereal/Grain/Pulse Products | Processed Cereal Products | Cereals Products—Not Ready to Eat (Shelf Stable) | Right
Bobs Red Mill Old Fashioned Whole Grain Rolled Oats 32 oz STAND PACK | Food/Beverage | Cereal/Grain/Pulse Products | Processed Cereal Products | Cereals Products—Not Ready to Eat (Shelf Stable) | Right
Hodgson Mill Oat Bran Hot Cereal | Food/Beverage | Cereal/Grain/Pulse Products | Processed Cereal Products | Cereals Products—Not Ready to Eat (Shelf Stable) | Left
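The split decisions in Table 1 can be scored with level-wise entropy. The following is a minimal Python sketch of how a PIG-style criterion might combine per-level information gains, weighting the higher (coarser) levels more heavily; the function names, the halving weight scheme, and the data layout are illustrative assumptions, not the exact implementation used in the paper.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels at one hierarchy level."""
    if not labels:
        return 0.0
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def level_wise_information_gain(parent, left, right, level_weights):
    """Sketch of a PIG-style criterion: information gain computed per
    hierarchy level, then combined with weights that favor the upper levels.

    parent/left/right: lists of label tuples, one label per hierarchy level,
    e.g. ("Food/Beverage", "Bread/Bakery Products", "Bread", "Bread (Perishable)").
    level_weights: one weight per level (the scheme below is an assumption).
    """
    total, n = 0.0, len(parent)
    for level, w in enumerate(level_weights):
        gain = entropy([t[level] for t in parent]) - (
            len(left) / n * entropy([t[level] for t in left])
            + len(right) / n * entropy([t[level] for t in right])
        )
        total += w * gain
    return total

# Example 1 from Table 1: four-level GPC paths with a candidate left/right split.
rows = [
    (("Food/Beverage", "Bread/Bakery Products", "Bread", "Bread (Perishable)"), "L"),
    (("Food/Beverage", "Bread/Bakery Products", "Bread", "Bread (Perishable)"), "L"),
    (("Food/Beverage", "Bread/Bakery Products", "Bread", "Bread (Perishable)"), "R"),
    (("Food/Beverage", "Bread/Bakery Products", "Bread", "Bread (Shelf Stable)"), "R"),
    (("Food/Beverage", "Bread/Bakery Products", "Bread", "Bread (Shelf Stable)"), "R"),
    (("Food/Beverage", "Bread/Bakery Products", "Bread", "Bread (Shelf Stable)"), "L"),
]
parent = [r[0] for r in rows]
left = [r[0] for r in rows if r[1] == "L"]
right = [r[0] for r in rows if r[1] == "R"]
weights = [1.0, 0.5, 0.25, 0.125]  # assumed: halve the weight at each deeper level
print(level_wise_information_gain(parent, left, right, weights))
```

Run on Example 1, only the deepest level contributes (the first three levels are pure for all six items), so the weighted score reduces to 0.125 × 0.082 ≈ 0.010 under the assumed weights.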
Table 2. Split criterion scores (IG, ATI, and PIG) for the three example splits in Table 1.

Example | IG | ATI | PIG
1 | 0.082 | 0.92 | 0.023
2 | 0.082 | 0.83 | 0.021
3 | 0.082 | 0.75 | 0.020
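As a sanity check on Table 2, the flat information gain for Example 1 can be re-derived by hand from the level-4 labels in Table 1: the parent node holds three Bread (Perishable) and three Bread (Shelf Stable) items, and each candidate child receives a 2:1 mixture of the two classes:

$$ H(\text{parent}) = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1 $$
$$ H(\text{left}) = H(\text{right}) = -\tfrac{2}{3}\log_2\tfrac{2}{3} - \tfrac{1}{3}\log_2\tfrac{1}{3} \approx 0.918 $$
$$ IG = 1 - \left( \tfrac{3}{6} \cdot 0.918 + \tfrac{3}{6} \cdot 0.918 \right) \approx 0.082 $$

which matches the IG column above. The ATI and PIG values additionally depend on the level weights and taxonomic distances introduced with those criteria, so they are not re-derived here.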
Table 3. Comparison of hierarchical classification performance for models trained with the IG, ATI, and PIG split criteria, across hierarchy levels.

Name | Brick Acc. | Brick F1 | Brick Prec. | Brick Rec. | Class Acc. | Class F1 | Class Prec. | Class Rec. | Family Acc. | Family F1 | Family Prec. | Family Rec. | Segment Acc. | Segment F1 | Segment Prec. | Segment Rec.
1_RandomForest_IG (standard) | 0.54 | 0.52 | 0.56 | 0.54 | 0.62 | 0.60 | 0.62 | 0.62 | 0.69 | 0.68 | 0.68 | 0.69 | 0.96 | 0.96 | 0.96 | 0.96
1_XGBoost_IG (standard) | 0.56 | 0.56 | 0.57 | 0.56 | 0.63 | 0.63 | 0.63 | 0.63 | 0.70 | 0.69 | 0.70 | 0.70 | 0.96 | 0.96 | 0.96 | 0.96
2_RandomForest_ATI | 0.55 | 0.52 | 0.57 | 0.55 | 0.62 | 0.61 | 0.63 | 0.62 | 0.69 | 0.68 | 0.69 | 0.69 | 0.96 | 0.96 | 0.96 | 0.96
2_XGBoost_ATI | 0.65 | 0.63 | 0.68 | 0.65 | 0.74 | 0.72 | 0.74 | 0.74 | 0.80 | 0.80 | 0.80 | 0.80 | 0.98 | 0.98 | 0.98 | 0.98
3_RandomForest_PIG | 0.66 | 0.63 | 0.68 | 0.66 | 0.74 | 0.72 | 0.74 | 0.74 | 0.80 | 0.80 | 0.80 | 0.80 | 0.98 | 0.98 | 0.98 | 0.98
3_XGBoost_PIG | 0.69 | 0.69 | 0.70 | 0.69 | 0.77 | 0.77 | 0.77 | 0.77 | 0.84 | 0.84 | 0.84 | 0.84 | 0.99 | 0.99 | 0.99 | 0.99
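The per-level metrics in Table 3 can be reproduced with standard tooling once each test item carries a true and a predicted label at every hierarchy level. The sketch below uses scikit-learn and weighted averaging; both are assumptions about the evaluation setup, not a statement of the authors' pipeline.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

LEVELS = ["segment", "family", "class", "brick"]  # GPC levels, coarse to fine

def evaluate_per_level(y_true, y_pred):
    """Compute the metrics reported in Table 3 for one model.

    y_true/y_pred: dicts mapping level name -> list of labels for the test set.
    Returns {level: {metric: value}}. Weighted averaging over classes is an
    assumption; the averaging mode is not stated alongside the table.
    """
    report = {}
    for level in LEVELS:
        t, p = y_true[level], y_pred[level]
        report[level] = {
            "accuracy": accuracy_score(t, p),
            "f1": f1_score(t, p, average="weighted", zero_division=0),
            "precision": precision_score(t, p, average="weighted", zero_division=0),
            "recall": recall_score(t, p, average="weighted", zero_division=0),
        }
    return report
```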
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
