1. Introduction
The global apparel industry is undergoing rapid transformation, driven by athleisure consumption, shorter product cycles, and rising demand for personalization. These dynamics intensify demand volatility and compress planning horizons, particularly for functional garments, where performance attributes and aesthetics require frequent updates [1]. Consequently, forecast-driven and order-lagged planning can become structurally misaligned with rapidly changing consumer preferences, increasing the risk of overproduction and obsolete inventory. Consistent with supply chain research emphasizing the alignment between product characteristics and supply chain design, agile and responsive planning approaches are widely recognized as critical in fashion-oriented markets.
As manufacturing paradigms evolve from Industry 4.0 to Industry 5.0 [2], with increased focus on human-centricity, sustainability, and resilience, upstream textile operations such as dyeing and finishing continue to encounter significant constraints related to information latency and dependence on downstream orders [3]. This structural lag intensifies demand distortions as signals propagate upstream, a phenomenon known as the bullwhip effect, in which minor fluctuations at the consumer level are progressively magnified throughout the supply chain, ultimately compromising operational stability [4]. Taiwan’s upstream textile sector offers a particularly instructive context for examining this issue. Although the industry is recognized for its strength in functional fabric innovation and its critical role in global performance textile supply networks, most upstream firms remain resource-constrained and operate primarily on a make-to-order basis. In contrast to large enterprises that invest in proprietary data ecosystems, upstream small and medium-sized enterprises (SMEs) lack systematic external sensing capabilities to detect early demand shifts, resulting in reactive capacity planning and increased vulnerability to post-pandemic volatility [5].
Taiwan represents a particularly suitable empirical context for this study for four primary reasons. First, Taiwan remains a globally significant supplier of functional and sustainable textiles, with exports reaching US$6.1 billion in 2024, demonstrating ongoing international relevance and substantial exposure to evolving end-market demand. Second, the sector is characterized by a dense network of specialized upstream and midstream manufacturers, many of which face resource constraints and have limited visibility into downstream market signals. Third, major e-commerce platforms in Taiwan generate extensive, publicly accessible consumer engagement data in apparel-related categories, creating a robust environment for evaluating whether external market signals can inform upstream production intelligence. Fourth, post-pandemic uncertainty in export markets and order visibility positions Taiwan’s textile industry as a directly relevant context for investigating resilience-oriented planning under demand volatility. Collectively, these factors establish Taiwan’s upstream textile sector as an analytically informative and practically significant setting for examining how customer-to-manufacturer (C2M) intelligence can enhance SME supply chain resilience.
The challenge, therefore, lies not only in technological limitations but also in organizational factors: effectively converting fragmented market signals into actionable production intelligence without displacing experienced planners. The C2M paradigm presents a promising approach by linking consumer behavior to upstream manufacturing decisions [6]. E-commerce environments generate high-frequency signals, including product attributes, consumer feedback, and engagement traces, which can serve as leading indicators for demand sensing [7]. Nevertheless, a persistent industrial integration gap remains in transforming raw, noisy, and unstructured public signals into granular, attribute-level intelligence (e.g., color, material, and function) suitable for upstream planning [8]. Furthermore, the literature frequently underemphasizes the human-centric requirements of SMEs; practical digital transformation necessitates interpretable decision support that reduces information latency and bridges the semantic gap between consumer language and engineering parameters.
This study addresses these challenges by developing and field-validating an upstream, attribute-level C2M decision-support framework using publicly available Shopee e-commerce data in Taiwan. Specifically, we pursued three objectives: (1) to develop a robust and platform-friendly data acquisition pipeline for standardizing real-time consumer signals; (2) to benchmark machine-learning models for high-cardinality textile attributes and develop a Neural Boosted Tree approach with entity embeddings to improve generalization; and (3) to operationalize probabilistic forecasts into a human-centric “Traffic-Light” dashboard that reduces cognitive load in shop-floor decision-making. The framework was deployed in a 12-month longitudinal field study at a Taiwanese dyeing SME, where its implementation was associated with improvements documented in the firm’s audited records, including reduced inventory value, fewer dye-lot changeovers, and higher capacity utilization. Collectively, this study demonstrates a replicable pathway for upstream SMEs to enhance resilience through low-cost digital intelligence, aligning operational decision-making with Industry 5.0 principles.
This study is structured around three primary research questions. First, it investigates whether publicly available e-commerce consumer signals can be ethically acquired and transformed into attribute-level demand intelligence suitable for upstream textile production planning. Second, it examines whether a Neural Boosted Tree architecture with entity embeddings outperforms conventional tree-based forecasting models when applied to high-cardinality, noisy textile demand data. Third, it explores whether a probabilistic, human-centric decision-support system can operationalize such forecasts to deliver measurable improvements in supply chain resilience for SMEs in real-world longitudinal deployments. The research contributions are threefold. First, the study extends the C2M paradigm from finished-product retail forecasting to upstream, component-level manufacturing, specifically dyeing and finishing operations, thereby providing an end-to-end framework applicable to resource-constrained SMEs. Second, it introduces a domain-adapted Neural Boosted Tree model with entity embeddings, benchmarked in a comparison of five architectures using 3.87 million consumer records across 127,846 product listings under high-cardinality textile data conditions. Finally, it designs and deploys a Bayesian Monte Carlo probabilistic framework, operationalized as a traffic-light dashboard, that translates forecast uncertainty into interpretable shop-floor signals and embodies the Industry 5.0 principles of human-centricity and resilience.
The remainder of this paper is structured as follows.
Section 2 reviews the relevant literature on supply chain resilience, C2M, and human-centric decision support.
Section 3 details the system architecture and ethical data acquisition methodology.
Section 4 presents benchmarking results and insights from industrial deployment.
Section 5 discusses theoretical and practical implications.
Section 6 concludes with limitations and future research.
3. Methodology
This study introduces an end-to-end C2M intelligence framework designed to bridge the semantic gap between unstructured public e-commerce signals and shop-floor production parameters. Addressing the specific resilience needs of SMEs, the system is designed as a low-cost, modular solution that converts noisy external data into attribute-level production intelligence. The system architecture, illustrated in Figure 1, comprises four integrated modules: (1) publicly visible data acquisition through a platform-friendly pipeline; (2) semantic interoperability and ontology mapping, which standardizes consumer language into engineering taxonomies; (3) high-cardinality attribute learning using Neural Boosted Trees with entity embeddings; and (4) probabilistic decision support using Bayesian inference, which enables risk-aware production planning. Bayesian inference is a statistical method that systematically updates uncertainty estimates as new data become available, producing probability distributions that quantify forecast risk instead of relying on single-value predictions.
3.1. Ethical Acquisition of Publicly Visible E-Commerce Data
To support demand forecasting and production planning for functional textile products, we developed a modular data acquisition pipeline to collect publicly available product and review information from Shopee Taiwan. The pipeline employs a server-friendly, ethics-aware design that prioritizes lightweight data exchange and conservative request scheduling and avoids full-page rendering. This approach reduces network overhead while maintaining data completeness and robustness to changes in the front-end interface. The three-stage architecture (Figure 2) integrates browser-based enumeration of public product listings, structured ingestion of lightweight item metadata, and retrieval of publicly visible consumer reviews.
Products were enumerated across four functional categories (Moisture-wicking, Cooling, Thermal, and Windproof/Waterproof), reflecting common functional classifications in the apparel and textile markets. Because listings load dynamically and their number may exceed standard pagination limits, browser automation was employed to replicate typical user interactions, such as incremental scrolling, and to record the network responses generated by the platform’s public listing interface. From these publicly visible responses, product and shop identifiers were extracted to construct a comprehensive sampling frame that remained resilient to minor front-end updates.
The collected identifiers were used to retrieve structured item-level metadata via publicly accessible item-detail interfaces, yielding compact, JSON-formatted records (Figure 3). This metadata includes variant-level attributes (e.g., color and size) and time-indexed historical sales indicators, supporting semantic mapping and demand forecasting for functional textile products. Compared with browser-only retrieval, this approach substantially reduces computational and bandwidth overhead. All requests were scheduled using conservative rate-limiting and back-off strategies to ensure platform stability.
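Conservative rate limiting with back-off is a generic client-side pattern, and the sketch below illustrates it under stated assumptions rather than reproducing the authors’ pipeline. It assumes a fixed minimum interval between requests and capped exponential back-off with “full jitter” after a transient failure; the class `PoliteScheduler` and its parameters (`min_interval_s`, `base_delay_s`, `max_delay_s`) are hypothetical. The sketch performs no network access itself: `fn` stands for whatever function actually issues a request.

```python
import random
import time


class PoliteScheduler:
    """Hypothetical client-side throttle: fixed request spacing plus capped exponential back-off."""

    def __init__(self, min_interval_s=2.0, base_delay_s=1.0, max_delay_s=60.0, seed=None):
        self.min_interval_s = min_interval_s  # minimum gap between consecutive requests
        self.base_delay_s = base_delay_s      # back-off scale after the first failure
        self.max_delay_s = max_delay_s        # cap on any single back-off wait
        self._rng = random.Random(seed)
        self._last_request = None

    def backoff_delay(self, attempt):
        """Full-jitter wait for a 0-based retry attempt: uniform in [0, min(cap, base * 2**attempt)]."""
        cap = min(self.max_delay_s, self.base_delay_s * (2 ** attempt))
        return self._rng.uniform(0.0, cap)

    def wait_turn(self, now=None, sleep=time.sleep):
        """Sleep just long enough to keep min_interval_s between requests."""
        now = time.monotonic() if now is None else now
        if self._last_request is not None:
            remaining = self.min_interval_s - (now - self._last_request)
            if remaining > 0:
                sleep(remaining)
                now += remaining
        self._last_request = now

    def call_with_retries(self, fn, max_attempts=5, retry_on=(OSError,), sleep=time.sleep):
        """Call fn() under the throttle, backing off and retrying on transient errors."""
        for attempt in range(max_attempts):
            self.wait_turn(sleep=sleep)
            try:
                return fn()
            except retry_on:
                if attempt == max_attempts - 1:
                    raise  # give up after the last attempt
                sleep(self.backoff_delay(attempt))
```

Randomizing the back-off (jitter) spreads retries out in time, so repeated failures do not turn into synchronized bursts of requests against the server.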
To capture downstream consumer engagement as a proxy for market demand, publicly visible product reviews were collected via the same interfaces accessible to standard web clients. Data collection employed session-aware scheduling and request throttling to prevent disruptions to platform services. Only the information necessary for analysis, specifically review text and aggregated rating indicators, was retained, and no personally identifiable information was stored. For reproducibility, we report the data fields used, sampling criteria, and cleaning procedures, while omitting implementation details that could be misused to circumvent platform protections.
In this study, we collected only publicly available information and did not access private user accounts or restricted content. Our data collection complied with platform access constraints and used conservative rate limiting to minimize platform load.
We did not conduct interviews, surveys, or controlled experiments with individual participants in this phase. It should be noted that the domain-expert review described in Section 3.2 (Stage 2) constitutes an internal professional validation procedure conducted by research team members, rather than human-subjects data collection, and is therefore governed by distinct ethical considerations. Consequently, no personally identifiable information was collected or processed, and institutional review board (IRB) approval was not required for the procedures described herein. Owing to platform terms, we cannot publicly release raw records. However, we provide derived features and processing logic to support reproducibility.
3.2. Domain-Driven Semantic Ontology Mapping and Data Harmonization
Given the high heterogeneity of raw e-commerce text, we engineered a rigorous four-stage pipeline (Figure 4) to transform unstructured descriptions into a standardized Industry 5.0 Semantic Ontology, creating a modeling-ready dataset of 51,072 observations.
To address data sparsity (18.4% attribute-level missingness), a hybrid recovery strategy was employed. A curated domain-specific lexicon, comprising 480 functional textile keywords, was combined with fuzzy string-matching algorithms (Levenshtein ratio ≥ 0.88) to recover missing attribute values from product titles. Listings with more than 50% missing critical attributes were algorithmically excluded, balancing data quality and sample coverage and resulting in a controlled attrition rate of 11.9%.
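The hybrid recovery and exclusion rules above can be sketched as follows. This is an illustrative example only. It assumes that the “Levenshtein ratio” is the normalized similarity 1 − d/max(|a|, |b|) computed on lower-cased title tokens (library implementations may normalize differently), and the four-entry `LEXICON`, the `CRITICAL` attribute names, and the helper functions are hypothetical stand-ins for the 480-keyword lexicon described in the text.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (unit-cost insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer string."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


# Hypothetical mini-lexicon: surface keyword -> (attribute, canonical value)
LEXICON = {
    "nylon": ("material", "Nylon"),
    "polyester": ("material", "Polyester"),
    "navy": ("color", "Navy"),
    "cooling": ("function", "Cooling"),
}
CRITICAL = ("color", "material", "function")


def recover_from_title(record, title, threshold=0.88):
    """Fill missing attributes from title tokens whose similarity to a keyword >= threshold."""
    filled = dict(record)
    for token in title.lower().split():
        for keyword, (attr, value) in LEXICON.items():
            if filled.get(attr) is None and similarity(token, keyword) >= threshold:
                filled[attr] = value
    return filled


def keep_listing(record, max_missing_share=0.5):
    """Exclude a listing when more than half of its critical attributes are still missing."""
    missing = sum(record.get(attr) is None for attr in CRITICAL)
    return missing / len(CRITICAL) <= max_missing_share
```

Under this normalization, the misspelling “polyestr” scores 1 − 1/9 ≈ 0.889 against “polyester” and is recovered, whereas “nylonn” scores 1 − 1/6 ≈ 0.833 against “nylon” and is not. A threshold of 0.88 therefore tolerates roughly one edit in a nine-character word.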
To address extreme lexical variation, such as 47 distinct strings representing “Nylon,” a unified color–material–function ontology was developed. FastText embeddings were employed for semantic clustering, followed by an internal domain-knowledge review. Two research team members, each with over 10 years of textile industry experience, independently assessed the canonical color–material–function triplets for semantic accuracy and manufacturing relevance. This review was an internal professional validation. No external participants were involved, and no additional IRB oversight was required (see Section 3.1). This process reduced the attribute cardinality from 12,847 raw strings to a canonical set of 2,128 unique triplets (28 colors, 19 materials, and four functions). Dimensionality reduction retained 99.3% of the variance in principal component analysis (PCA) projections, minimizing information loss and ensuring semantic interoperability.
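Once surface strings have been clustered and reviewed, canonicalization reduces to a lookup. The sketch below assumes hypothetical alias tables (in practice these would be populated from the FastText clusters and the expert review). It also checks that the reported 2,128 triplets equal the full Cartesian product of the three vocabularies, 28 × 19 × 4 = 2,128, which implies that every color–material–function combination is represented.

```python
# Hypothetical alias tables: normalized surface string -> canonical label
COLOR_ALIASES = {"navy": "Navy", "dark blue": "Navy", "深藍": "Navy"}
MATERIAL_ALIASES = {"nylon": "Nylon", "polyamide": "Nylon", "pa66": "Nylon", "尼龍": "Nylon"}
FUNCTION_ALIASES = {"cooling": "Cooling", "涼感": "Cooling"}


def normalize(raw):
    """Lower-case and collapse internal whitespace before lookup."""
    return " ".join(raw.lower().split())


def canonical_triplet(color_raw, material_raw, function_raw):
    """Return a canonical (color, material, function) triplet, or None if any part is unresolved."""
    triplet = (
        COLOR_ALIASES.get(normalize(color_raw)),
        MATERIAL_ALIASES.get(normalize(material_raw)),
        FUNCTION_ALIASES.get(normalize(function_raw)),
    )
    # Unresolved strings would be routed back to clustering and expert review.
    return None if None in triplet else triplet


def triplet_space_size(n_colors=28, n_materials=19, n_functions=4):
    """Upper bound on distinct canonical triplets: the size of the Cartesian product."""
    return n_colors * n_materials * n_functions
```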
To mitigate the impact of non-stationary market regimes induced by external shocks (e.g., pandemic-related demand distortions observed between 2021 and 2022), direct temporal concatenation of annual data was deliberately avoided. Instead, a volume-weighted temporal aggregation was employed to harmonize item-level demand signals across years, in which $c_{i,t}$ denotes the observed demand proxy for item $i$ in year $t$, and $V_t$ represents the total platform sales volume in year $t$. By weighting annual observations according to overall market activity, this aggregation reduces the influence of transient demand anomalies and preserves the relative demand structure across items. As a result, the training signal remains robust to macro-environmental shifts, thereby enhancing the stability and resilience of the downstream demand modeling process.
A critical engineering challenge was validating “Comment Count” as a reliable proxy for “True Sales.” We used a calibration subset of 52,317 items observed during a transient system update window, during which exact sales figures were publicly available in an unredacted format. The empirical validation results (Table 2 and Figure 5) demonstrate a near-perfect log-log correlation under both the Pearson and Spearman coefficients.
As shown in Figure 5, the narrow prediction band confirms the high fidelity of the proxy. The derived conversion ratio (mean = 0.0437) exhibited high cross-sectional stability (95% CI: [0.0412, 0.0465]). Based on this evidence, we formally adopted the cumulative comment count per standardized attribute triplet as the target variable. This approach provides an empirically validated alternative to the coarse, rounded sales figures typically available on public platforms.
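The correlation checks and the conversion-ratio interval can be outlined with the Python standard library alone. This sketch is illustrative and rests on three assumptions: log1p transforms for the log-log Pearson correlation, average ranks for tied counts in the Spearman coefficient, and a 95% percentile bootstrap around the mean per-item comment-to-sales ratio. The toy arrays are hypothetical and are not the calibration data.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def ratio_with_bootstrap_ci(comments, sales, n_boot=2000, seed=0):
    """Mean per-item comment/sales ratio and a 95% percentile-bootstrap interval."""
    ratios = [c / s for c, s in zip(comments, sales) if s > 0]
    rng = random.Random(seed)
    boots = sorted(sum(rng.choices(ratios, k=len(ratios))) / len(ratios)
                   for _ in range(n_boot))
    lo, hi = boots[(25 * n_boot) // 1000], boots[(975 * n_boot) // 1000 - 1]
    return sum(ratios) / len(ratios), (lo, hi)


# Hypothetical toy calibration pairs
comments = [1, 2, 4, 9, 20, 44]
sales = [25, 46, 92, 205, 458, 1005]
log_c = [math.log1p(c) for c in comments]
log_s = [math.log1p(s) for s in sales]
r_log, rho = pearson(log_s, log_c), spearman(sales, comments)
mean_ratio, (ci_lo, ci_hi) = ratio_with_bootstrap_ci(comments, sales)
```

Because Spearman’s coefficient depends only on ranks, it can be computed on the raw counts; the log transform matters only for the Pearson coefficient.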
3.3. High-Cardinality Feature Learning and Model Benchmarking
We formulated the attribute-level demand-forecasting task as a supervised regression problem. Let $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ be the dataset, where $y_i$ is the temporally harmonized comment count for a unique product entity. The high-dimensional feature vector $\mathbf{x}_i$ comprises continuous variables (price, product age, sentiment score, and competitor density) and high-cardinality categorical attributes (e.g., specific color and material codes). A critical challenge in textile informatics is the “curse of dimensionality” introduced by categorical features: standard encoding methods (e.g., one-hot encoding) often yield sparse matrices that degrade tree-based learners. To address this, we implemented a systematic benchmarking protocol comparing five architectures that span the spectrum of ensemble learning. Modeling was conducted independently for each of the four functional product categories to account for market heterogeneity.
We evaluated representatives of three ensemble paradigms, together with a single-tree baseline, using identical stratified train–validation splits (80%/20%) to isolate algorithmic efficacy.
- (1) Baseline: The CART model served as an interpretability baseline.
- (2) Bagging: A bootstrap forest was selected to evaluate the variance-reduction capabilities of bagging on noisy e-commerce data.
- (3) Boosting: We evaluated both gradient-boosted trees and XGBoost. Although XGBoost is a widely used industry standard, its default handling of high-cardinality data through sparse splits can be susceptible to overfitting in “small-N, large-P” regimes.
- (4) Hybrid architecture: We introduced Neural Boosted Trees (via H2O Driverless AI). This architecture is motivated by the hypothesis that integrating entity embeddings (to capture semantic relationships in categorical data) with the residual learning of gradient boosting can improve the capture of interaction effects in sparse, high-cardinality datasets.
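To make the contrast with one-hot encoding concrete, the sketch below compares input widths and shows an entity-embedding lookup, in which each categorical level indexes a dense vector. It is illustrative only. The embedding sizes follow a commonly used rule of thumb, min(50, ⌈n/2⌉), which is an assumption rather than the configuration used in this study, and the randomly initialized tables stand in for vectors that a real model would learn jointly with the rest of the network.

```python
import random

# Attribute vocabulary sizes of the canonical ontology (28 colors, 19 materials, 4 functions)
CARDINALITY = {"color": 28, "material": 19, "function": 4}


def embedding_dim(n_levels, cap=50):
    """Assumed heuristic: half the number of levels (rounded up), capped at `cap`."""
    return min(cap, (n_levels + 1) // 2)


def init_tables(cardinality, seed=0):
    """Randomly initialized lookup tables; in a trained model these weights are learned."""
    rng = random.Random(seed)
    return {
        attr: [[rng.gauss(0.0, 0.05) for _ in range(embedding_dim(n))] for _ in range(n)]
        for attr, n in cardinality.items()
    }


def embed(tables, codes):
    """Concatenate the dense vectors for one item's integer-coded attributes."""
    vector = []
    for attr, level in codes.items():
        vector.extend(tables[attr][level])
    return vector


one_hot_per_attribute = sum(CARDINALITY.values())  # 28 + 19 + 4 sparse indicator columns
one_hot_triplet = CARDINALITY["color"] * CARDINALITY["material"] * CARDINALITY["function"]
embedded_width = sum(embedding_dim(n) for n in CARDINALITY.values())  # 14 + 10 + 2
```

Under these assumptions, each item is represented by 26 dense values instead of 51 sparse indicators, or 2,128 if the triplet itself were one-hot encoded. Once trained, semantically similar levels can end up close together in the embedding space, which a one-hot encoding cannot express.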
In the initial screening phase (Table 3), a critical “validation collapse” was observed in the pure boosting methods. Although XGBoost achieved a strong training fit (R² > 0.93), its validation performance degraded sharply (average ΔR² = 0.58). This pattern aligns with prior findings that tree-boosting methods may overfit under sparse, high-cardinality categorical regimes [34].
In contrast, Neural Boosted Trees and Bootstrap Forest demonstrated greater architectural robustness, maintaining training–validation gaps below 0.12. The incorporation of entity embeddings within the neural architecture likely contributed to regularization by leveraging the inherent sparsity of textile attribute taxonomies. Following this screening, the two models were selected for intensive hyperparameter optimization using a 50-trial Tree-structured Parzen Estimator (TPE) search. Comparative results indicated that Neural Boosted Trees consistently outperformed Bootstrap Forest across cross-validation folds, with statistically significant differences (paired t-test, p < 0.01). The final model performance is reported in the Results section.
3.4. Probabilistic Decision Support for Supply Chain Resilience
3.4.1. Bayesian Uncertainty Quantification and Sales Conversion
While the harmonized comment count serves as a robust demand proxy, effective resilience planning requires forecasts expressed in actual sales units, such as meters of fabric. Relying on publicly displayed “historical monthly sales” data is insufficient because these figures are dynamically rounded and lack temporal persistence. To address this limitation, we developed a statistically rigorous conversion mechanism calibrated against the ground-truth subset observed during the transient data visibility window (detailed in Section 3.2).
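We model the relationship between observed comments $C$ and true sales $S$ as a stochastic process. Assuming a constant conversion rate $\theta$ within the category, the likelihood is modeled as a Binomial distribution:

$$C \mid S, \theta \sim \mathrm{Binomial}(S, \theta).$$

To minimize subjective bias, we employ Jeffreys’ non-informative prior for the conversion rate:

$$\theta \sim \mathrm{Beta}\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right).$$

By conjugacy, given calibration pairs $\mathcal{D}_{\mathrm{cal}} = \{(C_j, S_j)\}_{j=1}^{M}$, the closed-form posterior distribution for $\theta$ is derived as:

$$\theta \mid \mathcal{D}_{\mathrm{cal}} \sim \mathrm{Beta}\!\left(\tfrac{1}{2} + \sum_{j=1}^{M} C_j,\; \tfrac{1}{2} + \sum_{j=1}^{M} \left(S_j - C_j\right)\right).$$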
Calibration on the ground-truth dataset yielded a posterior mean of $\hat{\theta} = 0.0437$ (approximately 1 comment per 22.88 sales), with a tight 95% highest density interval (HDI) of [0.0412, 0.0465].
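Under the Binomial likelihood and Jeffreys’ Beta(1/2, 1/2) prior described above, the conjugate update takes only a few lines. This sketch assumes pooled calibration counts (total comments and total sales over the calibration subset). It reports the posterior mean exactly and an equal-tailed 95% interval estimated from posterior draws; for a posterior this concentrated, that interval is close to the HDI but not identical to it. The counts in the usage lines are hypothetical.

```python
import random


def jeffreys_posterior(total_comments, total_sales):
    """Beta posterior parameters for theta = P(comment | sale) under a Jeffreys prior."""
    if not 0 <= total_comments <= total_sales:
        raise ValueError("need 0 <= comments <= sales")
    alpha = 0.5 + total_comments
    beta = 0.5 + (total_sales - total_comments)
    return alpha, beta


def posterior_summary(alpha, beta, n_draws=20000, seed=0):
    """Exact posterior mean and an equal-tailed 95% interval estimated from Beta draws."""
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(alpha, beta) for _ in range(n_draws))
    lo = draws[(25 * n_draws) // 1000]
    hi = draws[(975 * n_draws) // 1000 - 1]
    return alpha / (alpha + beta), (lo, hi)


# Hypothetical pooled calibration counts
alpha, beta = jeffreys_posterior(total_comments=437, total_sales=10_000)
mean, (lo, hi) = posterior_summary(alpha, beta)
```

With these toy counts, the posterior mean is 437.5/10,001 ≈ 0.0437, the same order as the calibrated value. The prior adds only half a pseudo-observation to each Beta parameter, so it has a negligible effect at this sample size.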
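To operationalize this for forecasting, we avoid simple point estimates in favor of full uncertainty propagation. For a given Neural Boosted Trees prediction $\hat{C}$ (predicted comments), the corresponding sales forecast $\hat{S}$ is generated as a probability distribution via Monte Carlo sampling ($K = 1000$ draws), where each draw of the conversion rate $\theta$ is taken from its posterior given the ground-truth calibration subset $\mathcal{D}_{\mathrm{cal}}$:

$$\theta^{(k)} \sim p\left(\theta \mid \mathcal{D}_{\mathrm{cal}}\right), \qquad \hat{S}^{(k)} = \frac{\hat{C}}{\theta^{(k)}}, \qquad k = 1, \dots, K.$$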
This probabilistic formulation transforms a single scalar prediction into an empirical distribution of possible sales outcomes. This represents a critical resilience mechanism, as it enables the downstream decision support system to compute risk metrics, such as “Stockout Probability” or a “conservative 95% Lower Bound,” rather than relying on brittle deterministic averages.
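The propagation step can be sketched by dividing the predicted comment count by each posterior draw of the conversion rate, assuming sales are obtained as Ĉ/θ. In this illustrative example, the Beta parameters, the point prediction of 120 comments, and the planned quantity are hypothetical. “Stockout probability” is taken here to mean the share of simulated sales that exceed the planned production quantity.

```python
import random


def sales_distribution(pred_comments, alpha, beta, k=1000, seed=0):
    """Monte Carlo sales draws S^(k) = C_hat / theta^(k), with theta^(k) ~ Beta(alpha, beta)."""
    rng = random.Random(seed)
    return [pred_comments / rng.betavariate(alpha, beta) for _ in range(k)]


def risk_metrics(draws, planned_quantity):
    """Summaries a planner might read off the simulated sales distribution."""
    ordered = sorted(draws)
    n = len(ordered)
    return {
        "expected_sales": sum(ordered) / n,
        "lower_95": ordered[(5 * n) // 100],        # conservative one-sided 95% lower bound
        "upper_95": ordered[(95 * n) // 100 - 1],   # one-sided 95% upper bound
        "stockout_probability": sum(s > planned_quantity for s in ordered) / n,
    }


# Hypothetical inputs: 120 predicted comments, posterior Beta(437.5, 9563.5)
draws = sales_distribution(120.0, 437.5, 9563.5)
metrics = risk_metrics(draws, planned_quantity=2800.0)
```

Because 1/θ is convex, the mean of the simulated sales slightly exceeds Ĉ divided by the posterior mean of θ. This is one reason propagating the full distribution can differ from plugging in a point estimate.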
3.4.2. Human-Centric Operationalization: The Traffic-Light Early-Warning Interface
To bridge the gap between probabilistic forecasting outputs and shop-floor execution, the Monte Carlo ensembles generated in Section 3.4.1 are integrated into an interactive, human-centric decision support system (DSS) (Figure 6). The core of this interface is a rule-based traffic-light protocol designed to reduce cognitive load by translating distributional uncertainty into interpretable operational signals.
The decision logic is based on the relative position of the predicted expected sales $\mathbb{E}[\hat{S}_i]$ within the category-specific empirical distribution (where $P_k$ denotes the $k$-th percentile), in conjunction with the month-over-month (MoM) growth rate $g_i$. High- and low-demand regimes are identified using percentile-based thresholds, whereas short-term momentum is captured by growth signals. The resulting decision regions are mutually exclusive and collectively exhaustive, ensuring consistent and interpretable signal assignment. The decision rules are formally specified in Algorithm 1.
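| Algorithm 1. Traffic-Light Signal Assignment for Human-in-the-Loop Production Planning |
|:--|
| Input: predicted expected sales $\mathbb{E}[\hat{S}_i]$ for item $i$; category-specific empirical sales distribution percentiles $P_k$; month-over-month growth rate $g_i$. |
| Output: decision signal ∈ {Green, Rising Yellow, Falling Yellow, Red, Neutral}. |
| Steps: |
| 1. If $\mathbb{E}[\hat{S}_i]$ lies in the upper tail of the category distribution, assign Green (Immediate Procurement). |
| 2. Else if $\mathbb{E}[\hat{S}_i]$ lies in the intermediate regime and $g_i$ indicates significant growth, assign Rising Yellow (Emerging Trend). |
| 3. Else if $\mathbb{E}[\hat{S}_i]$ lies in the intermediate regime and $g_i$ indicates significant decline, assign Falling Yellow (Obsolescence Risk). |
| 4. Else if $\mathbb{E}[\hat{S}_i]$ lies in the lower tail of the category distribution, assign Red (Clearance/Halt). |
| 5. Else, assign Neutral (Monitoring only). |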
Traffic-light signals translate probabilistic demand forecasts into interpretable managerial actions by mapping forecast uncertainty and demand dynamics into structured decision regions. Percentile-based thresholds ensure robustness across product categories with heterogeneous demand scales, and the incorporation of short-term month-over-month growth captures dynamic demand transitions. The resulting decision regions are mutually exclusive and collectively exhaustive, thereby preventing ambiguous or conflicting signals.
From an operational perspective, products residing in the upper tail of the empirical demand distribution are flagged as high-priority items, triggering immediate raw material reservations to mitigate stockout risk. Conversely, products in the lower tail are identified as candidates for production suspension or inventory clearance. Intermediate demand regimes are further differentiated by short-term demand momentum, enabling planners to distinguish emerging trends from obsolescence risks. Products that do not exhibit pronounced demand signals or significant growth or decline are classified as neutral and monitored without immediate intervention.
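The ordered rule chain of Algorithm 1 can be expressed as a small function. The exact percentile cut-offs and growth thresholds are not reproduced here, so the values below are hypothetical placeholders: P80 and P20 of the category distribution, and ±10% month-over-month growth. The checks follow the Green, Rising Yellow, Falling Yellow, Red, Neutral order of Algorithm 1. Because each condition is tested in turn and the function returns at the first match, exactly one signal is assigned to every item.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty sample."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def traffic_light(expected_sales, category_sales, mom_growth,
                  high_q=80, low_q=20, growth_threshold=0.10):
    """Assign one signal; the default thresholds are hypothetical placeholders."""
    p_high = percentile(category_sales, high_q)
    p_low = percentile(category_sales, low_q)
    if expected_sales >= p_high:
        return "Green"            # immediate procurement
    if expected_sales > p_low and mom_growth >= growth_threshold:
        return "Rising Yellow"    # emerging trend
    if expected_sales > p_low and mom_growth <= -growth_threshold:
        return "Falling Yellow"   # obsolescence risk
    if expected_sales <= p_low:
        return "Red"              # clearance / halt
    return "Neutral"              # monitoring only
```

Because the thresholds are computed from each category’s own distribution, the same rule can be applied to categories whose absolute demand levels differ by orders of magnitude.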
The deployed dashboard was explicitly designed to enhance cognitive efficiency. Visualizing 95% credible intervals alongside point forecasts enables non-technical managers to intuitively assess forecast uncertainty without requiring advanced statistical training. Furthermore, the system supports dynamic, multi-dimensional filtering (e.g., isolating nylon-based products during summer months) and one-click export of ranked production schedules in CSV format, facilitating low-friction integration with legacy ERP systems commonly used by small and medium-sized enterprises.
A distinguishing feature of this framework is its ability to operationalize Bayesian uncertainty quantification within a user-friendly environment. Instead of automating production decisions, the traffic-light interface provides structured decision boundaries that support risk-informed human judgment.
4. Analysis Results
4.1. Model Benchmarking and Selection: Ensuring Algorithmic Robustness
To identify a robust forecasting engine for the proposed C2M framework, five candidate algorithms were evaluated using identical stratified train–validation splits (80%/20%). The benchmarking process comprised two phases: an initial screening to assess architectural suitability under high-cardinality demand signals, followed by intensive optimization of the shortlisted models.
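For a regression target, stratification is typically applied to a grouping variable. This minimal sketch assumes that the strata are the four functional categories and that 20% of each stratum is held out for validation. The record layout, the `category` key, and the toy records are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(records, key="category", val_share=0.2, seed=0):
    """Shuffle within each stratum and hold out val_share of it for validation."""
    strata = defaultdict(list)
    for rec in records:
        strata[rec[key]].append(rec)
    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(strata):  # deterministic stratum order
        group = strata[label][:]
        rng.shuffle(group)
        n_val = round(len(group) * val_share)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


# Hypothetical toy records with unequal category sizes
labels = ["Cooling"] * 50 + ["Thermal"] * 30 + ["Windproof/Waterproof"] * 20
records = [{"category": c, "id": i} for i, c in enumerate(labels)]
train, val = stratified_split(records)
```

Holding out the same share within each category keeps the validation set’s category mix identical to that of the full sample, so differences between models are not driven by which categories happen to land in validation.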
During the initial screening using default hyperparameters, standard gradient boosting architectures demonstrated pronounced discrepancies between training and validation performance. While XGBoost and Gradient-Boosted Trees achieved high goodness-of-fit on the training data (R² > 0.93), their validation performance deteriorated substantially (mean validation R² = 0.353 and 0.550, respectively), indicating limited generalization under sparse, high-cardinality attribute regimes. Consequently, both models were excluded from subsequent optimization stages.
In contrast, Neural Boosted Trees and Bootstrap Forest exhibited more stable generalization, maintaining consistently narrower train–validation gaps and achieving higher validation performance (mean validation R² = 0.802 and 0.753, respectively). These results motivated their selection for further optimization.
The two shortlisted models underwent intensive hyperparameter optimization using a 50-trial Tree-structured Parzen Estimator (TPE) search combined with 5-fold stratified cross-validation. Model performance was evaluated using cross-validated R², root mean squared error (RMSE), and prediction interval (PI) coverage. Across all functional categories, Neural Boosted Trees consistently outperformed Bootstrap Forest, and a paired t-test of fold-level R² scores indicated statistically significant differences (p < 0.01). Consequently, Neural Boosted Trees were selected for final evaluation.
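The fold-level comparison can be sketched as a paired t-test on per-fold R² differences. The fold scores below are hypothetical. To stay within the standard library, the statistic is compared with the two-sided 1% critical value of Student’s t with 4 degrees of freedom (≈ 4.604, appropriate for five folds) instead of computing an exact p-value.

```python
import math
import statistics

T_CRIT_DF4_TWO_SIDED_001 = 4.604  # Student t quantile t_{0.995} with 4 degrees of freedom


def paired_t(scores_a, scores_b):
    """Paired t statistic for per-fold differences (a - b)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Hypothetical 5-fold R² scores for the two shortlisted models
neural_boosted = [0.93, 0.91, 0.92, 0.94, 0.90]
bootstrap_forest = [0.88, 0.87, 0.89, 0.88, 0.86]
t_stat = paired_t(neural_boosted, bootstrap_forest)
significant = abs(t_stat) > T_CRIT_DF4_TWO_SIDED_001
```

Cross-validation folds share training data, so their scores are not fully independent. A plain paired t-test on fold scores can therefore overstate significance, and corrected resampled variants are sometimes preferred.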
The final 5-fold cross-validated performance of the selected Neural Boosted Tree model is summarized in Table 4. The model achieved strong predictive accuracy across all functional apparel categories, with R² values ranging from 0.854 to 0.980. Performance was highest for moisture-wicking and cooling apparel, plausibly reflecting more stable and frequent demand signals in these categories. Importantly, the model maintained consistent uncertainty calibration, with prediction interval coverage close to the nominal 95% level across all categories. This balance between point accuracy and uncertainty reliability supports the suitability of the proposed approach for upstream production planning, where both forecast precision and risk awareness are required.
The selected Neural Boosted Trees model achieved an overall R² of 0.921. Crucially, it demonstrated a 95% PI coverage of 94.4%, indicating close-to-nominal probabilistic calibration. This metric is vital for the downstream traffic-light system because it indicates that the uncertainty communicated to human decision-makers is statistically well calibrated. These results support the hypothesis that the proposed hybrid architecture, which combines neural representations of high-order categorical interactions with tree-based residual correction, is well suited to noisy, high-dimensional tabular data derived from publicly available data streams.
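Empirical PI coverage is the share of held-out observations that fall inside their predicted intervals. For nominal 95% intervals, a value near 0.95 indicates calibration. The minimal sketch below, with hypothetical inputs, also reports mean interval width, because very wide intervals can reach nominal coverage while being of little use to a planner.

```python
def pi_coverage(y_true, lower, upper):
    """Fraction of observations with lower <= y <= upper."""
    if not (len(y_true) == len(lower) == len(upper)):
        raise ValueError("inputs must have equal length")
    hits = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return hits / len(y_true)


def mean_interval_width(lower, upper):
    """Average PI width; coverage is only informative alongside sharpness."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


# Hypothetical held-out values and predicted interval bounds
y_obs = [10, 20, 30, 40]
pi_lower = [8, 21, 25, 35]
pi_upper = [12, 25, 35, 45]
coverage = pi_coverage(y_obs, pi_lower, pi_upper)
width = mean_interval_width(pi_lower, pi_upper)
```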
4.2. Temporal Generalization and Resilience to Regime Shifts
To evaluate the system’s robustness to temporal concept drift, the final model trained on data from 2021–2022 was prospectively assessed using a hold-out dataset spanning January to June 2023 (N = 18,426 new observations). This period constitutes a challenging regime transition from pandemic-influenced consumption toward post-recovery market normalization.
Despite concurrent volatility in raw material costs and consumer sentiment, the aggregate hold-out evaluation yielded an R² of 0.907, a modest degradation of approximately 1.5% relative to the cross-validated R² of 0.921. This limited decline suggests that the Neural Boosted Tree model maintained stable predictive capacity under a temporal regime shift, capturing demand-relevant attribute semantics rather than overfitting to transient, shock-driven patterns observed during the pandemic period. Category-specific results (Table 5) further reveal interpretable seasonal behaviors aligned with the physical usage characteristics of apparel products, such as peak sensitivity in thermal categories during winter months. These findings support the suitability of the proposed framework as a robust, human-centric decision-support tool that provides reliable guidance during periods of market transition, rather than solely under stationary demand conditions.
Thermal apparel achieved the highest category-level R² (0.901) over the January–June hold-out period, which includes the winter peak season, indicating strong alignment between model forecasts and peak-season demand dynamics when forecasting accuracy is most critical.
Cooling apparel exhibited a negative R² (−0.412) in January. This result reflects a seasonal zero-demand effect rather than a deficiency of the forecasting model. During winter months, true demand for cooling fabrics approaches zero and exhibits minimal variance, a condition under which standard regression metrics such as R² become unstable: because R² compares squared prediction errors with the variance of the observed values, it can turn negative even when absolute prediction errors are small. In practical terms, this category represents a structurally inactive demand regime rather than a forecasting failure. Accordingly, winter cooling apparel forecasts were treated as low-priority signals in the decision-support system and did not trigger production actions, ensuring that the observed metric anomaly did not affect operational planning or system robustness.
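The instability of R² in near-zero-variance regimes follows from its definition, R² = 1 − SSE/SST. When observed demand barely varies, SST is tiny, so even small absolute errors can drive R² far below zero. The toy numbers below are hypothetical and only illustrate the mechanism: both series have the same mean absolute error of one unit, yet their R² values differ sharply.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = sum(y_true) / len(y_true)
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    sst = sum((y - mean_y) ** 2 for y in y_true)
    if sst == 0:
        raise ValueError("R² is undefined when the observed target is constant")
    return 1.0 - sse / sst


def mean_absolute_error(y_true, y_pred):
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


# Off-season: true demand is almost flat; each forecast is off by one unit.
winter_true = [2, 3, 2, 3, 2, 3]
winter_pred = [3, 2, 3, 2, 3, 2]
# Peak season: identical one-unit errors on a widely varying demand signal.
summer_true = [120, 300, 80, 450, 210, 360]
summer_pred = [121, 299, 81, 449, 211, 359]

winter_r2 = r_squared(winter_true, winter_pred)  # SSE = 6, SST = 1.5, so R² = -3.0
summer_r2 = r_squared(summer_true, summer_pred)
```

For this reason, scale-dependent error measures such as MAE, or simply flagging inactive regimes as the decision-support system does, are more informative than R² in off-season categories.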
This edge case highlights the importance of incorporating human-in-the-loop decision support (Section 3.4.2). Fully automated, opaque forecasting systems may generate misleading alerts when performance metrics are affected by seasonal or statistical artifacts. Conversely, the proposed traffic-light interface enables human planners to contextualize model outputs and filter out seasonally irrelevant signals. This design reflects the principles of Industry 5.0, wherein artificial intelligence is intended to augment, rather than replace, human judgment in complex manufacturing environments.
4.3. Longitudinal Field Validation: Operational Impact and Environmental, Social, and Governance (ESG) Implications
In July 2023, the full C2M framework was deployed at a partner textile SME in northern Taiwan (annual production capacity of approximately 1.2 million meters) for a 12-month longitudinal field validation. The objective was to evaluate the operational implications of transitioning from a reactive, push-based order planning paradigm to a pull-based C2M intelligence framework. Performance outcomes were assessed through a before-and-after comparison, with the 12-month period preceding deployment serving as the baseline.
The average monthly inventory holding value decreased by approximately 28%, from NT$18.4 million in the pre-deployment period to NT$13.2 million in the post-deployment period. Inventory value was calculated based on month-end stock levels recorded in the firm’s enterprise resource planning system. The release of working capital tied to slow-moving and obsolete inventory directly strengthened the firm’s short-term financial resilience under demand volatility.
Following deployment, the number of dye lot changeovers decreased by 31%. Incorporating attribute-level demand forecasts into production planning enabled the consolidation of small and fragmented orders into forecast-guided batch sizes. Reduced changeovers decreased the frequency of machine washdowns, which are resource-intensive operations in textile dyeing. This operational shift reduced water, energy, and chemical consumption, aligning production planning decisions with environmental sustainability objectives.
The effective capacity utilization increased from an average of 68% during the baseline period to 84% following deployment, a gain of 16 percentage points. This improvement was primarily due to reduced idle time, which had previously stemmed from frequent setup changes and unanticipated material shortages. Higher utilization allows the SME to accommodate shorter lead times or rush orders that were previously infeasible under the push-based planning regime.
In addition to quantitative indicators, system-level operational traces provided context for the observed performance changes. Analysis of dashboard access logs, planning-cycle timestamps, and event records of schedule revisions and order releases indicated that the traffic-light dashboard reduced information latency between market-demand signals and manufacturing actions by approximately six to eight weeks. This reduction mitigated the amplification effects typically associated with downstream order volatility, thereby enhancing responsiveness and coordination in alignment with Industry 5.0 principles.
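The latency figure depends on pairing each market-demand signal with the production action it eventually triggered. A minimal sketch of the before-and-after comparison, assuming that pairing has already been done, is shown below. The `lag_weeks` helper and all dates are hypothetical; the study's actual log schema and matching rules are not described in this section.

```python
from datetime import date
from statistics import median


def lag_weeks(signal_day: date, action_day: date) -> float:
    """Weeks between a demand signal and the production action it triggered."""
    return (action_day - signal_day).days / 7


# Hypothetical (signal, action) pairs; matching a market signal to the schedule
# revision or order release it triggered is assumed to happen upstream.
pre_pairs = [
    (date(2022, 9, 5), date(2022, 11, 14)),
    (date(2022, 10, 3), date(2022, 12, 5)),
    (date(2023, 1, 9), date(2023, 3, 27)),
]
post_pairs = [
    (date(2023, 9, 4), date(2023, 9, 25)),
    (date(2023, 10, 2), date(2023, 10, 16)),
    (date(2023, 11, 6), date(2023, 12, 4)),
]

pre_median = median(lag_weeks(s, a) for s, a in pre_pairs)
post_median = median(lag_weeks(s, a) for s, a in post_pairs)
print(f"median latency: {pre_median:.1f} -> {post_median:.1f} weeks "
      f"(reduction {pre_median - post_median:.1f} weeks)")
```

Using the median rather than the mean keeps a single unusually delayed order from dominating the comparison.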
5. Discussion
5.1. Theoretical Contributions
This study advances the literature on fashion and textile supply chains by addressing persistent gaps in upstream demand sensing, semantic translation, and human-centric decision support.
First, this research extends the customer-to-manufacturer (C2M) paradigm beyond predominantly downstream, retail-centric applications to upstream, component-level manufacturing contexts. Previous C2M studies have primarily examined finished-product forecasting and retailer responsiveness [6]. From an information-processing perspective [35], upstream small and medium-sized enterprises (SMEs) experience the highest information-uncertainty load in the textile value chain but receive the least timely market signals. This structural asymmetry is directly addressed by the proposed framework. This study demonstrates that consumer-generated signals can be systematically translated into attribute-level intelligence, specifically color, material, and function, which are directly actionable for upstream dye-batch scheduling. The observed 6–8-week reduction in information latency during field deployment aligns with Lee et al.’s [4] theoretical prediction that shared demand information can substantially reduce bullwhip amplification upstream. This result provides empirical corroboration in an SME context that was previously absent from the C2M literature and addresses a key limitation regarding the applicability of C2M to upstream SMEs operating under high product variety and fragmented orders.
Second, this research advances the study of Industry 5.0 by integrating human-centric decision support into digital forecasting, thereby contributing to the dynamic capabilities literature as applied to SME digitalization [36,37]. In contrast to Industry 4.0, which emphasizes black-box automation, Industry 5.0 prioritizes resilience and the ongoing role of human participation [3]. The findings indicate that resilience does not require full automation. Instead, the artificial intelligence system functions as a noise filter, processing high-frequency data so that human planners can focus on high-value signals. By using probabilistic forecasts to filter approximately 80% of the data noise, the system leaves human operators free to manage the remaining 20%, which consists of complex exceptions. This approach is exemplified by the ‘winter stress test,’ in which human judgment correctly identified a zero-demand artifact in cooling fabrics that a standard metric such as R² would otherwise flag as a forecasting failure.
Third, this research makes a methodological contribution by addressing high-cardinality categorical data within demand signals. The results extend the empirical findings of Shwartz-Ziv and Armon [34], who argued that deep learning does not universally outperform tree-based methods on tabular data. This study identifies a boundary condition of extreme categorical cardinality combined with semantic sparsity, where hybrid neural-boosting architectures outperform both pure deep learning and tree-based methods. Furthermore, validating ‘Comment Count’ as a high-fidelity proxy for sales (Pearson’s r = 0.954) expands the methodological toolkit for demand sensing in contexts where direct sales observations are unavailable.
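For readers who want to replicate the proxy check on their own listing data, a sketch of the Pearson correlation between comment counts and sales follows. The data values are hypothetical; on Python 3.10 or later, `statistics.correlation` computes the same coefficient. A high r establishes a linear association only, so it does not by itself rule out platform-specific distortions such as promotional effects.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)                   # variation in x
    syy = sum((b - my) ** 2 for b in y)                   # variation in y
    return sxy / math.sqrt(sxx * syy)


# Hypothetical listing-level data: comment counts vs. units sold.
comments = [12, 40, 7, 95, 23, 60]
units_sold = [30, 110, 15, 260, 70, 150]
print(round(pearson_r(comments, units_sold), 3))
```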
5.2. Practical and Managerial Contributions
This study provides targeted, actionable insights for five practitioner groups: production planning teams, supply chain and procurement managers, SME owners and financial decision-makers, sustainability officers, and industrial policymakers. Collectively, these implications demonstrate that the proposed framework offers not only academic value but also a practical intelligence tool for resource-constrained upstream manufacturers.
For production planning teams, the traffic-light dashboard offers an immediately deployable decision-support interface that does not require advanced data science expertise. Production planners may incorporate monthly C2M intelligence cycles using publicly available Shopee Taiwan consumer data, with data refreshes scheduled to align with Northern Hemisphere athletic apparel purchasing cycles (March–May and September–November). Evidence from the field shows that traditional order-lagged planning results in demand-signal latency of 6–8 weeks. Implementing weekly data refresh cycles reduces this latency to near-real-time, enabling proactive dye-batch scheduling ahead of peak seasonal demand. Planners should also identify color–material combinations that maintain green-signal status over two or more consecutive monthly cycles, as these indicate structurally stable demand anchors suitable for longer-term capacity commitments and reduced emergency setups. The system’s one-click CSV export is compatible with legacy ERP environments prevalent in Taiwanese mid-tier dyeing firms, removing the need for custom IT integration.
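The rule of thumb for identifying demand anchors can be expressed as a longest-streak check over monthly traffic-light signals, followed by a plain CSV export. In this sketch the combination names, the signal labels ("green", "amber", "red"), and the CSV column names are hypothetical; the system's actual export schema is not specified here.

```python
import csv
import io


def longest_run(signals, target="green"):
    """Length of the longest streak of consecutive `target` signals."""
    best = current = 0
    for s in signals:
        current = current + 1 if s == target else 0
        best = max(best, current)
    return best


def stable_anchors(history, min_months=2):
    """Combinations that stayed green for at least `min_months` consecutive cycles."""
    runs = {combo: longest_run(sigs) for combo, sigs in history.items()}
    return {combo: run for combo, run in runs.items() if run >= min_months}


def to_csv(anchors):
    """Serialize anchors as plain CSV (hypothetical column names)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["combo", "longest_green_run"])
    for combo, run in anchors.items():
        writer.writerow([combo, run])
    return buf.getvalue()


history = {  # hypothetical monthly signals, oldest first
    "navy/polyester": ["green", "green", "amber"],
    "white/nylon": ["green", "amber", "green"],
    "black/spandex": ["red", "green", "green", "green"],
}
print(to_csv(stable_anchors(history)))
```

The streak may occur anywhere in the observation window; a stricter variant would require it to include the most recent cycle.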
For supply chain and procurement managers, the probabilistic forecast outputs—specifically the 95% credible intervals generated by the Bayesian Monte Carlo engine—offer a quantitative basis for supplier negotiations. Procurement teams can use sustained green-signal attribute clusters to justify consolidating minimum order quantities with upstream yarn and chemical suppliers, thereby reducing per-batch procurement costs and inventory exposure. The documented 31% reduction in dye-lot changeovers at the partner SME demonstrates that forecast-guided batch consolidation is operationally feasible within a single planning cycle. Each avoided changeover results in an estimated reduction of 800–1200 L of process water, a metric that can be directly incorporated into procurement contracts specifying environmental performance standards. Procurement managers at firms supplying European sports brands may also use this evidence to negotiate preferential terms under sustainability-linked purchasing frameworks, as reduced changeover rates correspond to verifiable Scope 3 supply chain emission reductions.
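The internals of the Bayesian Monte Carlo engine are not restated in this section. As one possible sketch, the code below assumes monthly demand counts follow a Poisson distribution with a conjugate Gamma prior on the rate, draws posterior samples with the standard library, and reads the 95% credible interval off the sampled quantiles. The prior parameters and demand counts are assumptions chosen for illustration. This interval describes the demand rate; a predictive interval for next month's count would be wider.

```python
import random


def credible_interval(counts, prior_shape=2.0, prior_rate=1.0,
                      n_draws=20_000, level=0.95, seed=42):
    """Monte Carlo credible interval for a Poisson demand rate.

    Assumes monthly counts ~ Poisson(lam) with a Gamma(shape, rate) prior on lam,
    so the posterior is Gamma(shape + sum(counts), rate + len(counts)).
    """
    shape = prior_shape + sum(counts)
    rate = prior_rate + len(counts)
    rng = random.Random(seed)
    # random.gammavariate takes (alpha=shape, beta=scale), so scale = 1 / rate.
    draws = sorted(rng.gammavariate(shape, 1.0 / rate) for _ in range(n_draws))
    tail = (1.0 - level) / 2.0
    lo = draws[int(tail * (n_draws - 1))]
    hi = draws[int((1.0 - tail) * (n_draws - 1))]
    return shape / rate, lo, hi  # posterior mean and interval bounds


# Hypothetical monthly demand signals for one attribute cluster.
post_mean, lo, hi = credible_interval([12, 15, 9, 14])
print(f"posterior mean {post_mean:.2f}, 95% interval [{lo:.2f}, {hi:.2f}]")
```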
For SME owners and financial decision-makers, the total implementation cost at the partner firm was less than NT$150,000 (approximately USD 4700), achieved entirely through open-source software and publicly available e-commerce data. For SMEs considering digital transformation under capital constraints, the framework provides a phased adoption pathway: starting with data acquisition and demand sensing, and gradually integrating the semantic ontology and dashboard as operational familiarity increases. The increase in capacity utilization from 68% to 84% post-deployment, a 16-percentage-point improvement on a 1.2-million-meter annual production base (roughly 192,000 additional meters per year), creates room for significant incremental revenue without additional capital expenditure. Additionally, the 28% reduction in average monthly inventory holding value (from NT$18.4 million to NT$13.2 million) released approximately NT$5.2 million in working capital, thereby strengthening the firm’s liquidity position amid demand volatility. This cost–benefit profile indicates that payback periods for similar implementations may be achievable within a single fiscal quarter.
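The headline figures can be checked with simple arithmetic. The NT$5.2 million is a one-time reduction in the average stock level rather than a recurring monthly release, and the utilization gain of 16 percentage points corresponds to a relative increase of about 23.5%. The sketch uses only the figures reported for the partner SME; the conversion to extra meters assumes utilization is measured against the nominal 1.2-million-meter annual capacity. A payback estimate would additionally require margin data, which is not reported here.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change in percent."""
    return (after - before) / before * 100


# Figures reported for the partner SME (NT$, percent, meters).
inv_before, inv_after = 18_400_000, 13_200_000
util_before, util_after = 68, 84
capacity_m = 1_200_000
implementation_cost = 150_000

released = inv_before - inv_after                 # one-off drop in average stock value
util_pp = util_after - util_before                # percentage points, not relative percent
extra_meters = capacity_m * util_pp // 100        # assumes utilization is relative to capacity

print(f"inventory change: {pct_change(inv_before, inv_after):.1f}%")
print(f"working capital released: NT${released:,}")
print(f"utilization: +{util_pp} pp ({pct_change(util_before, util_after):.1f}% relative)")
print(f"extra annual output at nominal capacity: {extra_meters:,} m")
print(f"released capital / implementation cost: {released / implementation_cost:.1f}x")
```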
For sustainability officers and ESG reporting teams, the framework produces audit-ready operational data, including changeover frequency trajectories, inventory turnover records, and capacity utilization logs, suitable for buyer-facing ESG reporting under internationally recognized standards. The 31% reduction in dye-lot changeovers supports documentation under ISO 14001 (Environmental Management Systems) and Science-Based Targets (SBT) frameworks, which are increasingly required by European brand clients. Sustainability officers should establish pre-deployment baselines for water, energy, and chemical consumption per changeover event to enable rigorous pre- and post-implementation impact quantification in accordance with GRI 303 (Water and Effluents) and GRI 302 (Energy). The system’s operational logs also provide verifiable evidence for Scope 3 supply chain emission disclosures, which are increasingly subject to third-party assurance requirements under the EU Corporate Sustainability Reporting Directive (CSRD). For firms seeking competitive advantage in global sustainable sourcing programs, this quantifiable evidence base constitutes a differentiated capability not achievable through non-data-driven production planning approaches.
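Once per-changeover baselines exist, the pre- and post-implementation comparison reduces to multiplying the number of avoided changeovers by the per-event consumption. The sketch below uses the 800–1200 L per-changeover water estimate cited in this section; the annual changeover counts are hypothetical values consistent with a 31% reduction. The same helper applies to energy (kWh) or chemicals (kg) once site-measured baselines replace the placeholder figures.

```python
def avoided_resource(pre_events, post_events, per_event_low, per_event_high):
    """Avoided events and the low/high range of resource use they represent."""
    avoided = pre_events - post_events
    return avoided, avoided * per_event_low, avoided * per_event_high


# Hypothetical annual changeover counts consistent with a 31% reduction.
pre_changeovers = 200
post_changeovers = round(pre_changeovers * (1 - 0.31))
events, water_low, water_high = avoided_resource(
    pre_changeovers, post_changeovers, per_event_low=800, per_event_high=1200)
print(f"{events} changeovers avoided -> {water_low:,}-{water_high:,} L of process water")
```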
For industrial policymakers and sector associations, this study demonstrates that the digital divide between upstream SMEs and large-brand supply chain ecosystems can be significantly reduced through the use of open-source tools and publicly available data. Government agencies, such as the Taiwan Textile Research Institute (TTRI) and the Industrial Development Bureau, are encouraged to develop shared C2M intelligence portals that aggregate e-commerce platform data across participating SME clusters, thereby lowering per-firm implementation costs through collective data governance. Regional industrial associations may adopt the framework’s architecture as a shared intelligence service for member firms, enabling collective downstream demand visibility without requiring each SME to independently establish data acquisition pipelines. These policy instruments would directly support the Ministry of Economic Affairs’ Smart Machinery and Sustainable Manufacturing Initiative, positioning Taiwan’s upstream textile sector as a model for Industry 5.0-aligned SME digitalization in export-oriented manufacturing economies.
Practitioners implementing this framework should consider two operational boundary conditions. First, the framework was validated for the upstream dyeing-and-finishing segment; application to downstream cut-and-sew or accessories manufacturing would require recalibration of the semantic ontology and proxy validation procedures. Second, the traffic-light percentile thresholds (P5, P20, P80, P95) are category-specific and should be recalibrated annually to reflect changes in platform-level demand structure, especially after significant algorithmic updates or post-pandemic demand normalization.
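Annual recalibration of the category-specific thresholds can be sketched with the standard library's `statistics.quantiles`, which returns the 99 cut points P1 to P99 when `n=100`. The helper names and data are hypothetical, and the sketch only assigns each value to one of five bands; how those bands are rendered as traffic-light colors on the dashboard is not restated in this section and is left out.

```python
from bisect import bisect_right
from statistics import quantiles


def percentile_thresholds(values):
    """Return (P5, P20, P80, P95) using linear interpolation between data points."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points: P1..P99
    return tuple(cuts[p - 1] for p in (5, 20, 80, 95))


def band(value, thresholds):
    """Band index 0-4: <P5, P5-P20, P20-P80, P80-P95, >=P95."""
    return bisect_right(thresholds, value)


def recalibrate(history_by_category):
    """Recompute thresholds separately for each category (e.g., once a year)."""
    return {cat: percentile_thresholds(vals) for cat, vals in history_by_category.items()}


# Hypothetical demand history for a single category.
thresholds = recalibrate({"cooling": list(range(101))})
print(thresholds["cooling"], band(50, thresholds["cooling"]))
```

Keeping thresholds in a per-category dictionary means a platform-level shift in one category can be recalibrated without disturbing the others.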
6. Conclusions
This study presents a validated, scalable pathway to enhance supply chain resilience in upstream textile small and medium-sized enterprises (SMEs) by integrating public e-commerce data into a human-centric Industry 5.0 framework. Through the development and field validation of a customer-to-manufacturer (C2M) decision-support system, this research demonstrates that unstructured consumer signals can be systematically converted into detailed, attribute-level intelligence regarding color, material, and functional properties relevant to upstream dyeing and finishing operations. Longitudinal industrial deployment results show that shifting from reactive, order-lagged planning to proactive, demand-informed decision-making yields substantial operational and ESG-aligned benefits. The implementation led to a 28% reduction in inventory value, a 31% decrease in dye-lot changeovers, and a 16-percentage-point increase in capacity utilization (from 68% to 84%). Rather than supplanting human expertise, the proposed framework acts as an intelligent noise filter, reducing information latency by approximately six to eight weeks and enabling shop-floor planners to prioritize significant market shifts. This research provides empirical support for Industry 5.0 principles in resource-constrained settings, illustrating how affordable digital intelligence can strengthen the resilience and sustainability of upstream manufacturing.
For upstream Taiwanese textile SMEs, particularly those engaged in dyeing and finishing operations supplying functional sportswear and outdoor apparel fabrics to global brands, this study provides four specific and actionable recommendations. First, production planning teams are advised to implement a monthly C2M intelligence cycle based on Shopee Taiwan consumer data, aligned with Northern Hemisphere seasonal athletic apparel purchasing patterns. Field evidence indicates that demand-signal latency reaches 6 to 8 weeks under conventional order-lagged planning; adopting weekly data refresh cycles can reduce this latency to near real-time, enabling proactive dye-batch scheduling ahead of peak athletic seasons (March to May and September to November). Second, SME production managers can use the traffic-light dashboard to identify color–material combinations that sustain green-signal status over multiple months and use this information to negotiate minimum order quantities with upstream yarn and chemical suppliers, thereby consolidating dye batches. The observed 31% reduction in dye-lot changeovers at the partner SME decreased machine washdown frequency, with each eliminated changeover saving an estimated 800 to 1200 L of process water, directly contributing to quantifiable ESG outcomes aligned with the ISO 14001 documentation increasingly required by international brand clients such as Nike and Patagonia. Third, the framework’s operational data, including changeover frequency, inventory turnover, and capacity utilization trajectories, provide audit-ready sustainability metrics suitable for buyer-facing ESG reporting, thereby enhancing the competitive positioning of Taiwan’s functional-fabric manufacturers in global sustainable sourcing programs. Fourth, the total implementation cost at the partner SME was below NT$150,000 (approximately USD 4700), achieved entirely with open-source software and publicly available data. This demonstrates that the transition from reactive order-following to proactive demand-informed planning is financially accessible to mid-tier dyeing firms without the need for proprietary IT investment.
Despite these contributions, this study has certain limitations. First, empirical validation was conducted on a single e-commerce platform within a localized market, which may constrain generalizability across different cultural, regulatory, or platform-specific contexts. Second, although consumer engagement metrics such as comment counts served as high-fidelity proxies for actual demand (r = 0.954), these remained indirect indicators and were subject to platform-specific algorithmic variations and promotional influences.
Future research should incorporate multi-platform data streams, including social media and global cross-border marketplaces, to improve the robustness and scope of demand sensing. Additionally, advancements in natural language processing and large language models may enable a more detailed extraction of functional nuances and granular consumer sentiment from unstructured text. The integration of online learning mechanisms could allow forecasting models to adapt dynamically to rapid regime shifts and non-stationary fashion cycles. For studies focusing on Taiwan’s upstream textile sector, it is important to assess whether the seasonal demand-signal patterns identified in this research, specifically the March to May and September to November athletic apparel procurement cycles, apply to other functional-fabric product segments, such as technical outerwear and medical-grade performance textiles, which are expanding export categories for Taiwanese suppliers. Finally, future studies should quantify environmental impacts with greater specificity by directly linking reduced setup frequencies to reductions in water, energy, and carbon footprints, with particular emphasis on the dyeing and finishing segment, given its significant contribution to the water consumption of Taiwan’s textile sector. This approach would further advance the sustainability objectives of Industry 5.0.