Article

Exploring Determinants and Predictive Models of Latent Tuberculosis Infection Outcomes in Rural Areas of the Eastern Cape: A Pilot Comparative Analysis of Logistic Regression and Machine Learning Approaches

1
Department of Laboratory Medicine and Pathology, Walter Sisulu University, Private Bag X1, Mthatha 5100, South Africa
2
Department of Public Health, Faculty of Health Sciences, Walter Sisulu University, Private Bag X1, Mthatha 5100, South Africa
*
Author to whom correspondence should be addressed.
Information 2025, 16(3), 239; https://doi.org/10.3390/info16030239
Submission received: 28 January 2025 / Revised: 25 February 2025 / Accepted: 12 March 2025 / Published: 18 March 2025

Abstract

Latent tuberculosis infection (LTBI) poses a significant public health challenge, especially in populations with high HIV prevalence and limited healthcare access. Early detection and targeted interventions are essential to prevent progression to active tuberculosis. This study aimed to identify the key factors influencing LTBI outcomes through the application of predictive models, including logistic regression and machine learning techniques, while also evaluating strategies to enhance LTBI awareness and testing. Data from rural areas of the Eastern Cape, South Africa, were analyzed using logistic regression, decision trees, and random forests to identify the demographic, health, and knowledge-related determinants of LTBI positivity. The models evaluated factors such as age, HIV status, and LTBI awareness, with random forests demonstrating the best balance of accuracy and interpretability. Additionally, a knowledge diffusion model was employed to assess the effectiveness of educational strategies in increasing LTBI awareness and testing uptake. Logistic regression achieved an accuracy of 68% with high precision (70%) but low recall (33%) for LTBI-positive cases, identifying age, HIV status, and LTBI awareness as significant predictors. The random forest model achieved an accuracy of 59.26% and an F1-score of 0.63, providing a better balance between precision and recall than logistic regression. Feature importance analysis revealed that age, occupation, and knowledge of LTBI symptoms were the most critical factors across both models. The knowledge diffusion model demonstrated that targeted interventions significantly increased LTBI awareness and testing, particularly in high-risk groups. 
While logistic regression offers more interpretable results for public health interventions, machine learning models like random forests provide enhanced predictive power by capturing complex relationships between demographics and health factors. These findings highlight the need for targeted educational campaigns and increased LTBI testing in high-risk populations, particularly those with limited awareness of LTBI symptoms.

1. Introduction

Latent tuberculosis infection (LTBI) is a global public health concern, particularly in regions with high tuberculosis (TB) prevalence and populations vulnerable to progressing from latent to active TB [1,2]. In many cases, individuals with LTBI are asymptomatic, yet the infection can develop into active TB if left untreated, leading to significant morbidity and mortality [3]. The World Health Organization (WHO) estimates that nearly one-quarter of the global population is infected with LTBI, with certain regions, particularly those with high HIV prevalence, being disproportionately affected [4]. In South Africa, where HIV co-infection rates are among the highest in the world, LTBI poses a serious risk to population health and healthcare systems [5]. Early detection and intervention are crucial in preventing the spread of TB and reducing the risk of progression to active disease. However, the silent nature of LTBI and the lack of widespread testing contribute to underdiagnoses [6]. Understanding the key determinants influencing LTBI prevalence and identifying high-risk groups is essential for designing effective public health interventions. In rural areas of the Eastern Cape, socio-economic challenges, limited healthcare access, and low awareness about LTBI further exacerbate the problem, making intentional interventions a priority [5]. The advancements in predictive modeling and machine learning offer valuable tools for identifying individuals at risk of LTBI [7,8,9]. Logistic regression models have traditionally been used in epidemiological studies due to their interpretability and ability to quantify relationships between risk factors and health outcomes [10,11]. However, machine learning techniques, such as decision trees and random forests, are increasingly being applied in public health research to capture complex, non-linear interactions between variables, offering potentially higher predictive accuracy [12,13,14]. 
The predictive models in this study provide significant application value in real-life scenarios. By identifying key determinants of LTBI positivity, such as age, HIV status, and LTBI awareness, these models enable public health authorities to prioritize high-risk populations for targeted screening programs, ensuring efficient allocation of limited healthcare resources. This study explores the key factors influencing LTBI outcomes by applying predictive models, including logistic regression and machine learning techniques, to assess the likelihood of LTBI positivity based on demographic, health, and knowledge-related factors in rural areas of the Eastern Cape. By comparing the performance of logistic regression, decision trees, and random forest models, this study highlights their respective strengths and limitations in predicting LTBI outcomes. Additionally, it employs a knowledge diffusion model to evaluate strategies for improving LTBI awareness and testing rates. The findings aim to provide actionable insights for public health strategies, particularly in high-risk communities, and contribute to broader efforts to control tuberculosis in resource-limited settings. The article is organized as follows: Section 2 describes the materials and methods, including data collection and model development processes. Section 3 presents the results, focusing on model application performance and the outcomes of the knowledge diffusion analysis. Section 4 discusses the implications of the findings for public health interventions. Finally, Section 5 concludes with recommendations for future research and strategies to enhance LTBI management.

2. Materials and Methods

2.1. Data Collection

Data were collected from a healthcare facility in rural areas in the Eastern Cape, focusing on demographic factors (age, gender, education, occupation), health status (HIV status, comorbidities), and survey responses related to LTBI awareness and testing. The dependent variable was LTBI test results (Positive/Negative), and the independent variables were age, gender, education, HIV status, comorbidities, and LTBI knowledge questions.

2.2. Integration of Predictive Modeling and Knowledge Diffusion

The methods employed in this study integrate predictive modeling and knowledge diffusion to address LTBI prevention comprehensively. Logistic regression and machine learning models are used to identify key determinants of LTBI positivity, such as age, HIV status, and awareness levels, enabling the identification of high-risk groups for targeted public health interventions, including screening and preventive therapy. Additionally, a knowledge diffusion model evaluates the impact of educational strategies on LTBI awareness and testing, facilitating earlier detection and treatment to prevent progression to active tuberculosis and reduce community transmission. These insights further inform policy and resource allocation by guiding public health authorities to direct testing and intervention efforts toward populations and areas with a higher likelihood of LTBI positivity, ensuring timely and effective responses to mitigate risks associated with untreated cases.

2.2.1. Logistic Regression Model

Logistic regression was used to estimate the likelihood of LTBI positivity, predicting LTBI outcomes based on demographic and health variables. Model performance was evaluated using accuracy, precision, recall, and F1-score. Odds ratios for each predictor were calculated to interpret their impact on LTBI positivity.

2.2.2. Machine Learning Models

Decision trees and random forests were evaluated for their suitability in predicting LTBI outcomes, particularly their ability to identify complex interactions between risk factors, and their performance was compared. Accuracy, precision, recall, and F1-score were calculated for each model. Feature importance was analyzed to understand the influence of key variables such as age, knowledge level, and occupation.

2.3. Data Analysis

We utilized STATA v15 to perform data cleaning and basic descriptive statistics. As part of the data cleaning process, we converted categorical data into numerical data by applying numerical value labels based on a pre-established codebook.

2.4. Prediction Tools and Software

RStudio version 2022.02.3 Build 492 and R version 4.2.1 were used to develop the machine learning classification algorithms. Both programs are freely available for data analytics. R is an open-source programming language that focuses on statistical analysis and data manipulation, while RStudio is an open-source integrated development environment (IDE) that provides a graphical user interface (UI) and point-and-click interactions, making it easier to work with the R programming language.

2.5. Application of the Machine Learning Algorithms

We used RStudio and the “caret” library, a widely recognized R machine learning package. The data were first shuffled so that the distribution of features and outcomes was randomized, minimizing the risk of unintended biases in the split, and were then divided into 80% for training and 20% for testing. Within the 80% training set, a 10-fold cross-validation approach was employed: the data were reshuffled before the folds were created so that each fold contained a representative data distribution, and in each fold 90% of the training set was used for training and 10% for validation. This approach enhanced the reliability of the model evaluation by reducing potential biases and promoting more robust and generalizable results. Five algorithms, including support vector machines, AdaBoost, artificial neural networks, decision trees, and logistic regression, were constructed using the training dataset, and each was then tested on the testing dataset. The remaining 20% of the original dataset was held out completely throughout training and validation and was used exclusively for final model evaluation, ensuring that it served as an entirely unseen dataset for testing.
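The splitting scheme described above can be sketched in a few lines. The study itself used R with the caret package; the Python below is only an illustration of the same idea using the standard library, and the function names, the seeds, and the sample size of 200 are hypothetical choices, not values from the study.

```python
import random

def train_test_split(n_rows, test_fraction=0.2, seed=42):
    """Shuffle row indices, then hold out the first `test_fraction` of them as the test set."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]  # (training indices, test indices)

def k_fold_indices(train_indices, k=10, seed=7):
    """Reshuffle the training indices and yield (fit, validation) index lists, one pair per fold."""
    rng = random.Random(seed)
    shuffled = train_indices[:]
    rng.shuffle(shuffled)
    parts = [shuffled[i::k] for i in range(k)]  # k roughly equal parts
    for i in range(k):
        validation = parts[i]
        fit = [idx for j, part in enumerate(parts) if j != i for idx in part]
        yield fit, validation

n_rows = 200  # hypothetical sample size, for illustration only
train_idx, test_idx = train_test_split(n_rows)
folds = list(k_fold_indices(train_idx))

print(len(train_idx), len(test_idx))                    # 160 40
print(len(folds), len(folds[0][0]), len(folds[0][1]))   # 10 144 16
```

The test indices never enter `k_fold_indices`, which is what keeps the 20% hold-out unseen until the final evaluation.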

2.6. Evaluation of the Applied Models

Model performance was assessed using k-fold cross-validation. According to Trevor Hastie, cross-validation is a collection of techniques for evaluating a prediction model’s effectiveness on fresh test data. Cross-validation approaches work by splitting the data into two sets: the training set, which is used to build the model, and the testing set, also known as the validation set, which is used to test the model by calculating the prediction error. Using the repeated k-fold cross-validation approach, we randomly divided the training data into k = 10 folds of equal size. In each iteration, nine folds (90%) were used to train the model, while the remaining fold (10%) was used to assess the model’s performance. Ten-fold cross-validation was selected to achieve robust and unbiased performance estimates, mitigate overfitting, and make the most efficient use of the available data; this approach enhanced the reliability of the results and increased confidence in the generalizability of the models to new, unseen data. We then assessed the resulting model on the held-out test dataset (20%) to verify its correctness and validity on previously unseen observations. We calculated the prediction error as the mean squared difference between the predicted and actual outcomes.
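For a binary outcome coded 0/1, the mean squared difference between predicted and actual class labels equals the misclassification rate, because each squared difference is 1 for a wrong prediction and 0 for a correct one. The short sketch below illustrates this with made-up labels; the data and function names are hypothetical and are not taken from the study.

```python
def mean_squared_error(actual, predicted):
    """Mean squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def misclassification_rate(actual, predicted):
    """Share of observations whose predicted label differs from the actual label."""
    return sum(a != p for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical labels: 1 = LTBI-positive, 0 = LTBI-negative
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

mse = mean_squared_error(actual, predicted)
error_rate = misclassification_rate(actual, predicted)
print(mse, error_rate)  # 0.3 0.3
```

If predicted probabilities were used instead of hard labels, the same mean squared difference would correspond to the Brier score rather than the error rate.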

2.7. Performance Measure of the Applied Model

There are several ways to measure how well machine learning models perform, including the receiver operating characteristic (ROC) curve, accuracy, precision, recall, and F1-score. Accuracy is the proportion of all observations, positive and negative, that the algorithm classifies correctly. It is frequently used in balanced classification tasks, where each class is equally important to the researcher. Recall is the proportion of actual positive cases that the model correctly detects, whereas precision is the proportion of predicted positive cases that are truly positive. The F1-score is the harmonic mean of precision and recall and merges them into a single statistic. In most cases, the F1-score is reported for the positive class.
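These four measures follow directly from the counts in a confusion matrix. The sketch below uses hypothetical counts, chosen only so that the values land near the logistic regression figures reported in this article (about 68% accuracy, 70% precision, 33% recall); they are not the study's actual confusion matrix.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts (tp = true positives, fp = false positives,
# fn = false negatives, tn = true negatives)
metrics = classification_metrics(tp=7, fp=3, fn=14, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the F1-score is well below the precision, which shows how the harmonic mean penalizes a model whose recall is low even when its precision is high.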

2.8. Knowledge Diffusion Model

The model simulates the spread of LTBI knowledge in the population, with individuals transitioning from being unaware to aware and subsequently to testing. A compartmental model was adapted, using differential equations to track knowledge spread based on factors such as education and barriers to action (e.g., financial constraints).
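The article does not give the model equations, so the following is only a minimal sketch of one possible three-compartment formulation (unaware U, aware A, tested T), assuming linear transitions: dU/dt = −αU, dA/dt = αU − τ(1 − b)A, dT/dt = τ(1 − b)A. Here α (awareness rate), τ (testing rate), and b (a barrier factor between 0 and 1) are hypothetical parameters, and all numeric values are illustrative rather than fitted.

```python
def simulate_knowledge_diffusion(unaware0, awareness_rate, testing_rate,
                                 barrier=0.0, months=12, steps_per_month=100):
    """Euler integration of an assumed Unaware -> Aware -> Tested model.

    Returns a list of (month, unaware, aware, tested) tuples, one per month.
    """
    dt = 1.0 / steps_per_month
    effective_testing = testing_rate * (1.0 - barrier)  # barriers slow the move to action
    u, a, t = float(unaware0), 0.0, 0.0
    history = [(0, u, a, t)]
    for month in range(1, months + 1):
        for _ in range(steps_per_month):
            newly_aware = awareness_rate * u * dt
            newly_tested = effective_testing * a * dt
            u -= newly_aware
            a += newly_aware - newly_tested
            t += newly_tested
        history.append((month, u, a, t))
    return history

# Hypothetical monthly rates for a population of 1000 people
baseline = simulate_knowledge_diffusion(1000, awareness_rate=0.3, testing_rate=0.2)
with_barrier = simulate_knowledge_diffusion(1000, awareness_rate=0.3,
                                            testing_rate=0.2, barrier=0.5)

for month, u, a, t in baseline:
    print(f"month {month:2d}: unaware={u:7.1f} aware={a:7.1f} tested={t:7.1f}")
```

Under these assumptions the unaware group shrinks steadily, the aware group rises and then falls as people move on to testing, and a larger barrier factor leaves fewer people tested by month 12.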

3. Results

This section reports the results from logistic regression and machine learning models (decision tree and random forest) used to predict LTBI outcomes from the survey of knowledge of LTBI (Table 1). The performance of these models was evaluated using accuracy, precision, recall, and F1-score. Additionally, feature importance analysis highlights the key determinants influencing LTBI test results, while knowledge diffusion model outcomes highlight the effectiveness of an LTBI awareness campaign and behavioral interventions over 12 months.

3.1. Performance of Three Machine Learning Models

The performance of logistic regression, decision tree, and random forest models was evaluated using four metrics: accuracy, precision, recall, and F1-score. Logistic regression demonstrated the highest accuracy (68%) and precision (70%), making it the most reliable model for minimizing false positives, although its recall (67%) indicates some missed positive cases. The decision tree model showed balanced performance across all metrics, with an accuracy of 65%, precision of 66%, and recall of 64%, highlighting a moderate trade-off between false positives and false negatives. The random forest model achieved an accuracy of 66% and precision of 65%, but its recall (63%) and F1-score (64%) were slightly lower compared to the other models. Overall, logistic regression outperformed the other models, offering the best accuracy and precision, making it the most effective choice for applications where reducing false positives is critical, while the decision tree and random forest provided consistent but slightly lower overall performance.

3.2. Logistic Regression Results

The logistic regression model achieved an accuracy of 68.0% and a precision of 70% for LTBI-positive cases but a relatively low recall of 33%, indicating that it missed a substantial number of positive cases. The top predictors of LTBI positivity were the response “completing a full course of treatments for LTBI as prescribed by a healthcare provider” (coefficient 1.27), HIV status (0.89), and employment status (0.68); the positive coefficients show that these variables increased the likelihood of testing positive. Individuals who gave the “completing a full course of treatments” response had about 3.6 times the odds of testing positive for LTBI, while higher education reduced the odds (odds ratio of 0.60). Negative predictors, interpreted as indicating protective behaviors, were the responses Q4_4B (“LTBI is a contagious form of tuberculosis that can be easily transmitted to others through respiratory droplets, while active TB is not contagious”) and Q8_8B (“Combination therapy with multiple antibiotics”).
Among the top 10 features in the logistic regression model, the strongest positive predictor of LTBI positivity was the response “completing a full course of treatments for LTBI as prescribed by a healthcare provider” (coefficient: +1.27), indicating that individuals who gave this response, which is likely linked to a behavioral or knowledge-related factor, were significantly more likely to test positive. Conversely, the response “LTBI is a contagious form of tuberculosis that can be easily transmitted to others through respiratory droplets, while active TB is not contagious” (coefficient: −1.00) had a strong negative influence, reducing the likelihood of LTBI positivity, potentially due to protective behaviors. Other important positive predictors included the treatment recommendation of isoniazid (INH) monotherapy for 6 to 9 months (coefficient: +0.84) and being employed (coefficient: +0.68), which suggest higher risk associated with certain knowledge and work-related exposure. Negative predictors such as the treatment recommendation of combination therapy with multiple antibiotics (coefficient: −0.82) and the treatment recommendation of high-dose antibiotics for a short duration (coefficient: −0.56) further decreased the likelihood of testing positive, possibly reflecting protective behaviors or better awareness. The responses “Yes” (coefficient: +0.59) and “Not sure” (coefficient: +0.56) to the question of whether LTBI treatment is necessary even without symptoms also increased LTBI positivity, reflecting uncertainty or gaps in knowledge, while age (coefficient: −0.48) showed that older individuals were less likely to test positive, possibly due to cohort exposure patterns or protective factors.
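A logistic regression coefficient is converted to an odds ratio by exponentiation, OR = exp(coefficient). The sketch below applies this to the coefficients quoted in this section; the dictionary labels are shorthand introduced here for readability, not variable names from the study's dataset.

```python
import math

# Coefficients as reported in the text; labels are illustrative shorthand
coefficients = {
    "Completing full course of LTBI treatment": 1.27,
    "LTBI is contagious, active TB is not": -1.00,
    "INH monotherapy for 6 to 9 months": 0.84,
    "Combination therapy with multiple antibiotics": -0.82,
    "Occupation: employed": 0.68,
    "Treatment necessary without symptoms: Yes": 0.59,
    "Treatment necessary without symptoms: Not sure": 0.56,
    "High-dose antibiotics for a short duration": -0.56,
    "Age": -0.48,
}

# Odds ratio above 1 raises the odds of a positive result; below 1 lowers them
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, ratio in sorted(odds_ratios.items(), key=lambda item: -item[1]):
    print(f"{name}: OR = {ratio:.2f}")
```

The coefficient of 1.27 gives an odds ratio of about 3.6, which matches the figure quoted for the “completing a full course of treatments” response. Note that an odds ratio describes a change in odds, not directly in probability.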

3.3. Decision Tree Model

A decision tree (Figure 1) has internal nodes representing decisions based on feature values, branches representing the outcomes of those decisions, and leaf nodes representing final classifications (in this case, whether the LTBI test result is positive or negative). The decision tree model begins with Q8 (“No treatment is necessary for LTBI”) as the root node, the most critical split in the tree. Respondents who believe treatment is unnecessary are more likely to be classified as LTBI-negative. In contrast, those who express concern about treatment are more likely to be classified as LTBI-positive. Following this, the tree further refines its predictions based on Q9 (“Are there any preventive measures individuals with LTBI should take to avoid developing active TB”) responses and age, where specific answers to Q9 (“Are there any preventive measures individuals with LTBI should take to avoid developing active TB?”) and older age groups are associated with a higher likelihood of testing positive for LTBI. At deeper levels, the model incorporates factors like Q5_5B (“close contact with someone with active TB”), which significantly increases the likelihood of an LTBI-positive result, consistent with known TB transmission risks. The tree culminates in leaf nodes representing the final classification based on all relevant factors. For instance, respondents who believe treatment is unnecessary, are younger, and have not had close contact with TB are classified as negative. In contrast, older respondents, those who express treatment concerns, or those with close TB contact are more likely to be classified as positive. Example path analysis: The tree starts with Q8 (“No treatment is necessary for LTBI”) to assess attitudes toward LTBI treatment. 
If the answer is “Yes,” the model predicts LTBI-negative, but if the answer is “No,” the tree examines Q9 (“Are there any preventive measures individuals with LTBI should take to avoid developing active TB”) and age to refine the classification. Finally, the model checks for Q18 (“Have you ever been in close contact with someone diagnosed with active TB?”), where a “Yes” greatly increases the probability of an LTBI-positive result.
The decision tree (Figure 2) highlights the significant splits that drive the model’s predictions, with the most important features identified in the top nodes. At the top of the tree, Q8 (“No treatment is necessary for LTBI”) emerges as the most critical split, representing respondents’ beliefs about whether LTBI treatment is needed. If a respondent answers “Yes,” the model tends to predict a negative test result, while a “No” response increases the likelihood of a positive classification. The next significant split is Q9_9D (“Completing a full course of treatments for LTBI as prescribed by a healthcare provider”), which plays a key role in refining the model’s classification of individuals. Specific responses to Q9 (“Are there any preventive measures individuals with LTBI should take to avoid developing active TB?”) can either increase or decrease the probability of testing positive for LTBI. Alongside this, age appears as another critical factor, with older individuals generally more likely to be classified as LTBI-positive, reflecting known risk patterns. Further down the tree, Q5 (“Close contact with someone with active TB”) and Q3_3C (“An advanced stage of tuberculosis where the infection has spread to multiple organs”) also contribute to the model’s predictions. Respondents who report close contact with an active TB case are more likely to test positive for LTBI, reinforcing the influence of exposure history on risk. Responses to Q3 (“What do you understand by the term latent tuberculosis infection?”) help the model further refine predictions, much like Q9 (“Are there any preventive measures individuals with LTBI should take to avoid developing active TB?”), providing additional nuance to the classification process. The decision tree starts with a key split on Q8 (“No treatment is necessary for LTBI”). This indicates that respondents’ beliefs about LTBI treatment are a major factor in predicting test outcomes. 
The tree then incorporates further splits based on Q9 (“Are there any preventive measures individuals with LTBI should take to avoid developing active TB?”), age, and contact with TB patients, with age and exposure history contributing to the risk assessment. The decision tree’s structure is a reflection of the most predictive features, with the initial splits carrying the most weight in terms of determining the outcome. For instance, if a respondent believes no treatment is necessary Q8 (“No treatment is necessary for LTBI”), they are more likely to test negative, and if they have close contact with someone with active TB (Q18), they are more likely to test positive.
The top 10 most important features from the decision tree model for predicting LTBI outcomes highlight age as the most influential factor, with older individuals showing a higher likelihood of testing positive. Q9_9D (“Completing a full course of treatments for LTBI as prescribed by a healthcare provider”), reflecting behavioral or knowledge-related factors, is the second most important, playing a crucial role in distinguishing positive from negative results. Other significant features include Q10_10A (“strongly agree”) and Q8_8B (“Combination therapy with multiple antibiotics”), both related to attitudes and knowledge about LTBI, as well as Q14_14A (“lack of awareness”), which indicates that individuals unaware of LTBI may be at higher risk. Additional features such as Q12 (“Do you believe that LTBI treatment is necessary, even if you do not have symptoms?”), Q16 (“If you tested for LTBI, did you receive treatment?”), and Q17 (“If you received treatment for LTBI, did you complete the entire course of medication?”) relate to health-seeking behavior and preventive measures, influencing the model’s predictions.

3.4. Random Forest

The random forest model achieved an accuracy of 66.0%, with age, knowledge, Q9_9D (“Completing a full course of treatments for LTBI as prescribed by a healthcare provider”), and occupation identified as the top predictors. While the random forest outperformed decision trees in precision, it struggled with recall for LTBI-positive cases, indicating some limitations in detecting all true positives. Among the five most important of the top 10 features influencing LTBI predictions, Q9_9D (“completing a full course of treatments for LTBI as prescribed by a healthcare provider”) had the largest positive coefficient (1.27), significantly increasing the likelihood of a positive LTBI result. This suggests that greater knowledge or awareness about LTBI symptoms strongly correlates with testing positive, indicating the need for targeted awareness campaigns. In contrast, Q4_4B (“LTBI is a contagious form of tuberculosis that can be easily transmitted to others through respiratory droplets, while active TB is not contagious”) had a negative coefficient (−1.00), suggesting it reflects protective behaviors that reduce LTBI risk and highlighting areas for further exploration in LTBI control strategies. Other significant predictors included Q8_8C (“Isoniazid (INH) monotherapy for 6 to 9 months”) (0.84), which increased the likelihood of LTBI positivity, potentially linked to health behaviors or knowledge gaps, and Q8_8B (“combination therapy with multiple antibiotics”) (−0.82), which reduced the odds of a positive result, possibly reflecting preventive behaviors. Finally, employment status (0.68) was positively associated with LTBI risk, implying that occupational exposure may play a role and that workplace interventions could be vital for controlling transmission.
When comparing the random forest and logistic regression models, several key features reveal differences in how each model predicts LTBI outcomes. Age is the most important feature in the random forest model, indicating that age plays a critical role in LTBI prediction, likely due to demographic patterns in infection or exposure. However, in logistic regression, age has a lower impact, suggesting that its relationship with LTBI may be more complex and better captured by the random forest model’s ability to handle non-linear relationships. For Q9_9D (“Completing a full course of treatments for LTBI as prescribed by a healthcare provider”), while it is important in both models, it has a much larger impact in logistic regression, where it is the strongest predictor of LTBI positivity, significantly increasing the odds of testing positive. In contrast, its contribution in the random forest model is lower relative to other factors. Q10_10A (“Strongly agree”) shows moderate importance in both models, indicating that respondents who strongly agree with this statement are more likely to test positive, though it plays a smaller role compared to Q9_9D (“Completing a full course of treatments for LTBI as prescribed by a healthcare provider”) in logistic regression. Q8_8B (“combination therapy with multiple antibiotics”) is an important predictor in both models, but more so in logistic regression, where its negative coefficient indicates that individuals responding in a certain way are less likely to test positive, suggesting protective behaviors. In random forest, it also contributes to predictions, though less strongly. Q14_14A (“Lack of awareness”) influences LTBI outcomes in both models. In logistic regression, it has a positive coefficient, showing that individuals reporting a lack of awareness are more likely to test positive, highlighting the link between knowledge gaps and infection risk. 
In the random forest model, it has some importance, indicating its role in increasing susceptibility.

3.5. Comparison of Logistic Regression, Random Forest, and Decision Tree Models

Our study compared the predictive accuracy of logistic regression, decision trees, and random forest models in determining LTBI outcomes based on demographic-, health-, and knowledge-related factors. Each model revealed strengths and weaknesses that provide valuable insights for both epidemiological understanding and public health interventions (Table 2). Logistic regression offers high interpretability and ease of explanation, with strong precision, making it suitable for understanding relationships between predictors and outcomes. However, it struggles with low recall, missing many LTBI-positive cases, and is limited in capturing complex, non-linear interactions. The decision tree model provides a simple, interpretable structure and effectively captures non-linear relationships. It also achieves better recall than logistic regression but is prone to overfitting, resulting in lower generalizability, and has lower precision with a higher rate of false positives. In contrast, the random forest model demonstrates superior overall accuracy and F1-score, effectively handling complex interactions while offering insights into feature importance. However, its ensemble structure reduces interpretability, and it struggles with recall for LTBI-positive cases.
In the logistic regression model for predicting LTBI outcomes, Q9_9D (Completing LTBI treatment) emerged as the most influential predictor, contributing 27.5% to the likelihood of a positive result, underscoring the importance of adherence to prescribed LTBI treatment. Q4_4B (misconception about LTBI contagion) and Q8_8C (INH monotherapy for 6–9 months) also significantly impact LTBI positivity, with relative importances of 21.7% and 18.2%, respectively, highlighting the role of specific knowledge and beliefs in shaping LTBI outcomes. Additionally, Occupation: Employed and Q8_8B (Combination therapy) contribute 14.8% and 17.8%, suggesting that occupational exposure and awareness of treatment options further influence LTBI risk.
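The percentages above appear consistent with dividing the absolute value of each of the five largest coefficients by the sum of those absolute values. This is an inference from the reported numbers, not a method stated in the article, and the sketch below simply checks the arithmetic using the coefficients given in Sections 3.2 and 3.4.

```python
# Coefficients reported for the five strongest logistic regression predictors
coefficients = {
    "Q9_9D (completing LTBI treatment)": 1.27,
    "Q4_4B (misconception about LTBI contagion)": -1.00,
    "Q8_8C (INH monotherapy for 6-9 months)": 0.84,
    "Q8_8B (combination therapy)": -0.82,
    "Occupation: Employed": 0.68,
}

# Assumed normalisation: share of the total absolute coefficient magnitude
total = sum(abs(beta) for beta in coefficients.values())
relative_importance = {
    name: round(100 * abs(beta) / total, 1) for name, beta in coefficients.items()
}

for name, share in relative_importance.items():
    print(f"{name}: {share}%")
```

Because the shares are based on absolute values, they describe the strength of each predictor but not its direction; Q4_4B and Q8_8B still lower the odds of a positive result.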

3.6. Knowledge Diffusion Model Outcomes

Simulation models (Figure 3) showed the effectiveness of an LTBI awareness campaign and behavioral interventions over 12 months, illustrating how the population transitions from being unaware to aware and ultimately to taking action, such as being tested or treated. As awareness campaigns reach more individuals, the unaware population gradually decreases. The aware population initially increases as people learn about LTBI but then declines as individuals move from awareness to action. Meanwhile, the population that has taken action grows steadily as more people are tested or treated after becoming aware.
Targeted interventions lead to faster transitions in the high-risk group, enabling them to move quickly from being unaware to taking action, such as getting tested or treated for LTBI. In contrast, the general population responds more slowly to broader awareness campaigns, progressing at a more gradual pace. Overall, targeted interventions significantly accelerate action among high-risk individuals, highlighting the importance of focusing on those most at risk while maintaining broad awareness efforts to engage the general population (Figure 4).
With faster testing rates, both the general population and high-risk groups transition more quickly from awareness to action, though the high-risk group responds significantly faster. Targeted interventions combined with faster testing led to a dramatic reduction in the number of unaware and aware individuals in the high-risk group as they take action more rapidly. While the general population also shows quicker transitions, the response is less pronounced compared to the high-risk group, highlighting the greater effectiveness of targeted interventions for those at higher risk (Figure 5).
The comparison of the logistic regression and random forest models (Figure 6) highlights the varied importance of predictors in determining the likelihood of LTBI positivity. The LTBI knowledge item Q9_9D (“Completing a full course of treatments for LTBI as prescribed by a healthcare provider”) holds the highest importance in the logistic regression model, suggesting that individuals who endorse this item have a higher probability of testing positive. However, its importance is comparatively lower in the random forest model, indicating that the effect of this predictor may be captured less by a simple linear relationship and more through combinations with other variables. HIV status emerges as a significant predictor in both models, but its effect is stronger in logistic regression. This underscores the elevated risk of LTBI in HIV-positive individuals, where logistic regression’s interpretability gives a clearer indication of the magnitude of this association. Occupation is more influential in the random forest model, pointing to the complexity of occupational exposure as a risk factor. Random forest, a machine learning technique, likely captures the intricate interactions between occupation and other variables more effectively than logistic regression. The item Q4_4B (“LTBI is a contagious form of tuberculosis that can be easily transmitted to others through respiratory droplets, while active TB is not contagious”) exhibits a negative coefficient in logistic regression, meaning that agreement with it is associated with lower odds of LTBI positivity. In the random forest model, this factor plays a moderate role, again showing how different models treat predictors differently.
Quicker dissemination of information leads to a higher proportion of individuals taking action sooner. This underlines the importance of timely and effective public health interventions to prevent the spread of LTBI and ensure early treatment. When awareness spreads slowly, a significant portion of the population remains inactive, which can be detrimental. This delay increases the risk of LTBI progressing to active tuberculosis in untreated individuals, stressing the need for continuous and far-reaching awareness campaigns (Figure 7).

4. Discussion

Our study aimed to apply predictive models for LTBI outcomes using logistic regression, decision trees, and random forests. The key findings indicate that logistic regression provided higher precision in predicting LTBI-positive cases than the decision tree model, while the random forest model was more accurate than the decision tree and offered deeper insights into feature importance, particularly highlighting the role of demographic and knowledge-based factors. The model simulation further showed that targeted education campaigns led to a gradual increase in LTBI awareness and testing among high-risk groups, underscoring the positive impact of interventions. However, significant barriers were identified, including financial constraints and a lack of awareness, which hindered the progression from awareness to action. Addressing these barriers is crucial for improving LTBI testing and treatment rates.
The decision tree performed well but lagged slightly behind the random forest in accuracy. The decision tree had the highest recall for LTBI-positive cases (42%), followed by logistic regression (33%), while the random forest missed more positive cases (25% recall). The logistic regression achieved 68.0% accuracy and 70% precision for positive cases. Key factors that increased the likelihood of testing positive included completing healthcare treatments, HIV status, and employment status. Individuals who completed LTBI treatment were 3.6 times more likely to test positive, while higher education lowered these odds. Responses to questions Q4_4B and Q8_8B, including agreement with the statement that “LTBI can be easily transmitted, while active TB is not contagious”, were associated with lower odds of positivity.
The study analyzed factors affecting LTBI outcomes using logistic regression, decision trees, and random forest models. It determined that completing the prescribed treatment course is the strongest predictor of positive LTBI results, with increased knowledge about symptoms also linked to positive testing. Employment status was associated with higher risk due to occupational exposure. The random forest model outperformed the decision tree in accuracy and F1-score, highlighting age as a key predictor. Other significant factors included agreement with treatment necessity, combination therapy, and lack of awareness. Targeted interventions were more effective for high-risk individuals, facilitating quicker testing and treatment. The analysis underscored the varying significance of predictors, including the heightened risk for HIV-positive individuals and the complexities of occupational exposure.

4.1. Interpretation of Applied Model Performance

Logistic regression achieved 68% accuracy and 70% precision for LTBI-positive cases but had a low recall of 33%, missing many true positives. It is useful for understanding risk factors but is not ideal for comprehensive detection. The random forest model (59.26% accuracy, F1-score 0.63) identified key features such as age and treatment attitudes, using an ensemble approach that reduces overfitting. The decision tree model had lower accuracy (55.56%) but better recall (42%), making it better at identifying true positives at the cost of more false positives. Overall, random forest provided the best balance of precision and recall, useful for informing targeted public health interventions.
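For readers less familiar with the metrics compared above, the sketch below shows how accuracy, precision, recall, and F1-score follow from a confusion matrix. The function names and the example counts are hypothetical and are not the study's data. The last lines apply the harmonic-mean formula to the precision (70%) and recall (33%) reported for logistic regression, assuming both refer to the LTBI-positive class.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Hypothetical confusion-matrix counts for illustration only.
metrics = classification_metrics(tp=30, fp=10, fn=20, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# F1 implied by the reported logistic regression precision and recall.
implied_f1 = f1_score(0.70, 0.33)
print(f"implied F1 for logistic regression: {implied_f1:.2f}")
```

Low recall pulls the F1-score down even when precision is high, which is why a model that rarely raises false alarms can still be a poor screening tool.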
Regarding latent tuberculosis infection and its association with demographic and occupational factors, machine learning models, particularly decision trees and random forests, identified age, knowledge of LTBI symptoms, and occupation as significant predictors of LTBI positivity. Studies have shown that older adults and individuals with lower awareness of LTBI symptoms are more likely to test positive for LTBI. This correlation is particularly pronounced in healthcare workers (HCWs), who often work in high-exposure environments. Age has been identified as a critical factor influencing LTBI risk. Research indicates that older individuals have a higher likelihood of LTBI positivity, which may be attributed to cumulative exposure over time and a potentially waning immune response to Mycobacterium tuberculosis [15,16]. Furthermore, knowledge about LTBI symptoms plays a pivotal role in the likelihood of testing positive. Individuals with limited awareness may not seek testing or treatment, thereby increasing their risk of harboring LTBI without appropriate intervention [17]. Occupation is another significant determinant of LTBI risk, especially in high-exposure settings such as healthcare facilities. Studies have shown that HCWs are at an elevated risk for LTBI due to their frequent contact with TB patients [18,19]. The nature of their work often involves prolonged exposure to infectious agents, which significantly increases their likelihood of contracting LTBI compared to individuals in lower-risk occupations [15,18]. For instance, a systematic review highlighted that occupational factors, particularly those involving direct contact with TB patients, were significantly associated with LTBI among healthcare workers [18,19]. Moreover, the interplay between these factors suggests that targeted interventions could be beneficial.
For example, enhancing awareness and education about LTBI symptoms among older adults and healthcare workers could lead to earlier detection and treatment, thereby reducing the overall burden of LTBI in these populations [20]. Additionally, implementing regular screening protocols in high-exposure occupations could further mitigate the risk of LTBI transmission and progression to active TB disease [21].

4.2. Feature Importance and Implications for LTBI Risk

The analysis of feature importance in predicting the risk of latent tuberculosis infection (LTBI) produced consistent results across both logistic regression and machine learning models. The logistic regression model identified age, HIV status, and responses to LTBI knowledge questions—such as Q9_9D (“Completing a full course of treatments for LTBI as prescribed by a healthcare provider”)—as the most significant predictors of LTBI positivity. This suggests that older individuals, those with HIV, and people with limited knowledge about LTBI are at a higher risk, emphasizing the need for targeted interventions aimed at educating these vulnerable groups. Similarly, the random forest model also confirmed that age is the most influential factor, followed by responses to Q9_9D and occupation status. The alignment between the two models reinforces the critical importance of demographic factors and LTBI knowledge in determining infection risk. These findings indicate that public health efforts should prioritize demographic risk groups, such as older adults and those in high-risk occupations, while also implementing educational campaigns to increase awareness about the symptoms and risks associated with LTBI [22,23].

4.3. Discussion of the Knowledge Diffusion Model

The knowledge diffusion model simulation demonstrated that targeted interventions boosted LTBI awareness from 45% to 65% within six months. Despite this, financial constraints and awareness barriers limited individual progression to testing. A 30% increase in educational programs could lead to a 20% rise in testing among informed individuals, emphasizing the need for expanded outreach. In machine learning model comparisons, the decision tree model achieved 55.56% accuracy and an F1-score of 0.45, while the random forest model outperformed it with 59.26% accuracy and an F1-score of 0.63, despite lower recall for LTBI-positive cases (25%). Key predictors included responses to questions about Isoniazid treatment and risk factors like close contact with TB patients.
Logistic regression showed strong precision (70%) but low recall (33%), making it less effective in identifying LTBI-positive cases. In contrast, the random forest model balanced accuracy and F1-score, making it the most robust overall. The findings highlight the importance of targeting older populations, employed individuals, and those with low LTBI symptom knowledge in intervention programs. To enhance LTBI detection and awareness, targeted interventions like educational campaigns are recommended to address knowledge gaps and reduce financial barriers. Focusing on high-risk groups and eliminating cost obstacles could improve testing rates and public health outcomes. The study highlights key management barriers, including financial constraints and a lack of awareness among at-risk populations. A knowledge diffusion model was used to simulate the spread of information about LTBI symptoms, testing, and treatment, incorporating factors like social networks and communication channels.
The simulation was designed to assess the impact of these interventions over six months. The simulated increase in awareness is attributed to the effective dissemination of information through community health workers, social media campaigns, and educational workshops tailored to specific populations, particularly those at higher risk for LTBI. The results underscore the importance of addressing barriers to LTBI awareness and testing [24]. The first key barrier is financial: many individuals may not seek testing due to the costs associated with healthcare services, including consultations and diagnostic tests [25,26,27]. Providing financial assistance programs and insurance coverage for LTBI testing could help overcome this obstacle. The second is lack of awareness: a significant portion of the population is unaware of LTBI and its implications. Educational interventions targeting high-risk groups, such as healthcare workers, immigrants from high-burden countries, and individuals with compromised immune systems, are crucial for increasing awareness [28,29,30].
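The reported rise in awareness from 45% to 65% within six months can be translated into an implied monthly conversion rate. The sketch below is a back-of-the-envelope calculation, assuming (this is not stated in the source) that a constant fraction of the still-unaware population becomes aware each month and that nobody loses awareness. The function name is hypothetical.

```python
def implied_monthly_rate(start_aware, end_aware, months):
    """Constant monthly fraction of the unaware who become aware.

    Solves (1 - start_aware) * (1 - r) ** months == (1 - end_aware) for r.
    """
    if not 0.0 <= start_aware < 1.0 or not start_aware <= end_aware <= 1.0:
        raise ValueError("expected 0 <= start_aware <= end_aware <= 1 and start_aware < 1")
    start_unaware = 1.0 - start_aware
    end_unaware = 1.0 - end_aware
    return 1.0 - (end_unaware / start_unaware) ** (1.0 / months)


# Awareness levels taken from the text: 45% at the start, 65% after six months.
rate = implied_monthly_rate(0.45, 0.65, 6)
print(f"implied monthly conversion rate: {rate:.3f}")

# Round-trip check: applying the rate for six months recovers the end level.
recovered = 1.0 - (1.0 - 0.45) * (1.0 - rate) ** 6
print(f"awareness after six months: {recovered:.2f}")
```

Under these assumptions, roughly 7% of the remaining unaware individuals would need to be reached each month to produce the reported change.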

4.4. Public Health Implications

The public health implications of the findings emphasize the need for targeted interventions to address both high-risk groups and knowledge gaps. Given that age and employment status were significant predictors of LTBI positivity, public health campaigns should prioritize older adults and individuals working in high-exposure environments, such as healthcare and crowded workplaces. By focusing on targeted testing and awareness initiatives in these groups, the number of undiagnosed LTBI cases could be significantly reduced. Additionally, the strong association between LTBI knowledge, as reflected in responses to Q9_9D (“Completing a full course of treatments for LTBI as prescribed by a healthcare provider”), and test positivity suggests that increasing awareness about LTBI symptoms and risks is crucial. Educational programs tailored for communities with low awareness levels could assist in the early detection and treatment of infections, ultimately lowering overall infection rates. By addressing these knowledge gaps through targeted partnerships, we can significantly enhance public health efforts to control LTBI in underserved areas and improve overall knowledge.

4.5. Model Suitability and Practical Applications

While random forests provide greater accuracy and feature importance insights, logistic regression offers more interpretable results that are easier for policymakers to act upon. For public health interventions aimed at understanding risk factors and designing clear action steps, logistic regression is the preferred model despite its lower recall. The application of machine learning in public health, particularly through models like random forests, offers significant potential for large-scale population health monitoring, especially when it is crucial to capture complex interactions between variables. However, there is a trade-off between model accuracy and interpretability that must be carefully weighed when using these models in public health decision-making. The analysis of key features revealed that positive coefficients, such as those related to occupation status (e.g., being employed), as well as responses to specific questions like Q9_9D (“Completing a full course of treatments for LTBI as prescribed by a healthcare provider”) and Q8_8C (“What are the recommended treatments for LTBI?”), increase the likelihood of LTBI positivity. These factors suggest behaviors, conditions, or demographics associated with a higher risk of infection. Conversely, negative coefficients, such as age, Q4_4B (“LTBI is a contagious form of tuberculosis that can be easily transmitted to others through respiratory droplets, while active TB is not contagious”), and Q8_8A (“High-dose antibiotics for a short duration”), reduce the likelihood of LTBI positivity, potentially indicating protective factors or behaviors. Overall, the findings highlight the strong influence of occupation, age, and specific knowledge and behavior-related responses on LTBI outcomes, underscoring the importance of these factors in predicting infection risk and informing targeted interventions.
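The reading of positive and negative coefficients above follows the standard logistic regression interpretation, in which a coefficient β corresponds to an odds ratio exp(β). The sketch below illustrates this, assuming the 3.6 figure reported earlier for completing LTBI treatment is an odds ratio (the source says only "3.6 times more likely"). The 20% baseline probability and the negative coefficient are hypothetical values chosen for illustration.

```python
import math


def odds_ratio(coefficient):
    """Convert a logistic regression coefficient into an odds ratio."""
    return math.exp(coefficient)


def shifted_probability(baseline_probability, ratio):
    """Apply an odds ratio to a baseline probability and return the new probability."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * ratio
    return new_odds / (1.0 + new_odds)


# Coefficient implied by an odds ratio of 3.6 (assumed interpretation).
positive_coefficient = math.log(3.6)
print(f"coefficient: {positive_coefficient:.2f}, odds ratio: {odds_ratio(positive_coefficient):.1f}")

# Hypothetical negative coefficient: the odds ratio falls below 1.
negative_coefficient = -0.5
print(f"coefficient: {negative_coefficient}, odds ratio: {odds_ratio(negative_coefficient):.2f}")

# Effect of an odds ratio of 3.6 on a hypothetical 20% baseline probability.
print(f"probability: 0.20 -> {shifted_probability(0.20, 3.6):.2f}")
```

An odds ratio is not a ratio of probabilities: in this example the probability rises from 20% to about 47%, not to 72%, which matters when such figures are communicated to policymakers.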
The importance of targeted educational interventions in enhancing awareness and testing rates for LTBI has been underscored by various studies employing knowledge diffusion model simulations [31,32]. These simulations have demonstrated that such interventions can lead to significant improvements in public health outcomes, particularly in populations at risk for LTBI [33,34]. However, key barriers, including financial constraints and a general lack of awareness among the target populations [35,36], often hinder the implementation of these interventions. Knowledge diffusion models are theoretical frameworks that describe how information spreads within a population. These models can simulate the impact of educational interventions on awareness and behavior change regarding LTBI. For instance, a study by Hermes et al. [15] utilized a knowledge diffusion model to assess the effects of targeted educational campaigns on LTBI awareness among healthcare workers and high-risk populations. Despite the potential benefits of educational interventions, several barriers impede their effectiveness. Financial constraints are a significant hurdle, particularly in low- and middle-income countries (LMICs) where healthcare resources are limited [37,38]. Many individuals may not have access to free or subsidized testing services, which can deter them from seeking LTBI screening [16,39]. Additionally, the lack of awareness about LTBI symptoms and the importance of testing contributes to low testing rates. Many individuals may not recognize the risk factors associated with LTBI or may not understand the implications of a positive test result [17].
In our study, we observed that when awareness spreads slowly, the population takes more time to move from being unaware to taking action. By the end of the 12 months, only a moderate proportion of the population has taken steps such as being tested or treated. This slow uptake suggests that extended periods of low awareness can hinder timely public health responses. A medium rate of awareness diffusion leads to quicker recognition of LTBI-related information, prompting a faster transition to action. A larger share of the population takes action within the same timeframe than in the slow scenario. This emphasizes how even moderate improvements in awareness campaigns can lead to more effective health outcomes. Rapid dissemination of awareness, driven by aggressive campaigns or interventions, leads to a swift and significant increase in the population that takes action. By the end of the 12 months, more of the population has been tested or treated than in the slow and medium scenarios. This rapid response highlights the value of efficient public health strategies to raise awareness and prompt preventive actions quickly. To effectively increase LTBI awareness and testing rates, it is crucial to address these barriers through targeted interventions [40,41,42]. Financial assistance programs, community outreach initiatives, and educational campaigns tailored to specific demographics can help mitigate financial constraints and enhance awareness [43,44,45]. For example, a study by Apriani et al. [18] highlighted the effectiveness of community health worker-led educational sessions in increasing LTBI knowledge and testing rates among underserved populations.
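The slow, medium, and fast awareness scenarios discussed above can be compared with a closed-form expression for a two-stage process (unaware to aware, then aware to action) with constant continuous-time rates. This is a sketch under stated assumptions rather than the model used in the study: the rate values are hypothetical, and the source does not specify the functional form of its simulation.

```python
import math


def fraction_acted(t, awareness_rate, action_rate):
    """Fraction of an initially unaware population that has acted by time t.

    Assumes dU/dt = -a*U, dA/dt = a*U - b*A, and acted = 1 - U - A,
    with a = awareness_rate and b = action_rate (both per month).
    """
    a, b = awareness_rate, action_rate
    if math.isclose(a, b):
        # Limiting case when both rates coincide.
        return 1.0 - (1.0 + a * t) * math.exp(-a * t)
    return 1.0 - (b * math.exp(-a * t) - a * math.exp(-b * t)) / (b - a)


# Hypothetical monthly awareness rates; the action rate is held fixed.
SCENARIOS = {"slow": 0.05, "medium": 0.15, "fast": 0.40}
ACTION_RATE = 0.20

acted_at_12_months = {
    name: fraction_acted(12, rate, ACTION_RATE) for name, rate in SCENARIOS.items()
}
for name, value in acted_at_12_months.items():
    print(f"{name}: {value:.2f} of the population has acted after 12 months")
```

Holding the action rate fixed isolates the effect of how fast awareness spreads. In this sketch the share that has acted by month 12 increases from the slow to the fast scenario, in line with the ordering described in the text.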

4.6. Limitations

This study has several limitations. As a pilot study conducted in rural areas of the Eastern Cape, the findings may not be fully generalizable to other regions or populations. Additionally, self-reported data on LTBI awareness and testing may be subject to recall bias. While logistic regression and machine learning approaches provided valuable insights, future research could explore additional predictive models to refine risk stratification further.

4.7. Recommendation

The findings of this study suggest the need for targeted educational campaigns and increased testing for LTBI in high-risk populations, particularly among those who are unaware of LTBI symptoms. Future research should collect data from a larger and more diverse population across various rural areas in South Africa. This will aid in identifying the key demographic, health, and knowledge-related factors that influence LTBI outcomes. Furthermore, the Department of Health, in collaboration with healthcare providers and other stakeholders, should strengthen educational programs and raise awareness about LTBI across all age groups, particularly among disadvantaged populations residing in crowded living conditions.
The random forest model demonstrated improved accuracy; however, its complexity limits its interpretability, which is crucial for decision-making in public health. Additionally, all models displayed low recall for LTBI-positive cases, highlighting the need to enhance LTBI detection in future models. It is important to note that this study was conducted solely in the Oliver Reginald (O.R.) Tambo District and not all clinics in the municipality were included. Due to time and financial constraints, this study was limited to one clinic in this district. In the future, as resources permit, this study will be expanded to include other areas, as LTBI impacts various regions across all provinces of South Africa.

5. Conclusions

This study evaluated the effectiveness of three machine learning models (random forest, decision tree, and logistic regression) using the following metrics: F1-score, accuracy, precision, and recall. With higher accuracy than the decision tree and a better trade-off between recall and precision, random forest fared better overall than the other models. The integration of machine learning models in identifying key predictors of LTBI, such as age, knowledge of symptoms, and occupational exposure, underscores the importance of tailored public health strategies. These strategies should focus on increasing awareness, improving screening practices, and ensuring that high-risk populations receive appropriate interventions to manage and reduce the incidence of LTBI. Quick dissemination of information about LTBI encourages early action, highlighting the need for effective public health interventions to prevent its spread and ensure early treatment, thereby reducing the risk of progression to active tuberculosis in untreated individuals. Our study provides significant practical value, particularly in resource-limited settings like rural Eastern Cape, by offering predictive models that efficiently identify high-risk individuals for LTBI based on demographic, health, and knowledge-related factors, enabling better resource allocation, targeted educational campaigns, and enhanced screening efforts. Unlike traditional approaches, the integration of machine learning models (decision trees and random forests) in this research captures complex, non-linear interactions with balanced performance, while the novel application of a knowledge diffusion model emphasizes the effectiveness of educational interventions in driving awareness and proactive behavior, ultimately informing targeted and impactful public health strategies.
Future research could focus on expanding the dataset scope by collecting larger and more diverse datasets from various geographic regions to validate the findings and explore additional demographic, health, and environmental factors influencing LTBI outcomes. Additionally, optimizing hyper-parameters for predictive models, rather than relying on default settings, could enhance their accuracy and applicability in public health contexts. Furthermore, integrating predictive models with real-time health information systems holds promise for facilitating proactive interventions, enabling more effective management and control of LTBI in high-risk populations.

Author Contributions

Conceptualization, L.M.F. and C.M.; methodology, L.M.F. and C.M.; validation, L.M.F. and N.D.; formal analysis, L.M.F. and N.D.; investigation, L.M.F. and C.M.; resources, L.M.F.; data curation, C.M. and L.M.F.; writing—original draft preparation L.M.F., N.D. and C.M.; writing—review and editing, N.D. and T.A.; visualization, L.M.F.; supervision, T.A.; project administration, C.M. and L.M.F.; funding acquisition, T.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding; Walter Sisulu University provided funding for the APC.

Institutional Review Board Statement

This study was conducted following the Declaration of Helsinki and was approved by the Research Ethics and Biosafety Committee of the Faculty of Medicine and Health Sciences of Walter Sisulu University (ref. no. 084/2024) and Eastern Cape Department of Health (ref. No. EC_202409_008).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Data can be requested from the corresponding author.

Acknowledgments

The authors are grateful to the healthcare professionals in the gateway clinic where the patients were recruited and enrolled in this study. To our colleagues Ncomeka Sineke, Thulani Gumede, and Eric Nombekela, thank you for your support during clinic recruitments, enrolments, and laboratory activities. Sizwe Dlamini, thank you for assisting with data management.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Xiao, X.; Chen, J.; Jiang, Y.; Li, P.; Li, J.; Lu, L.; Zhao, Y.; Tang, L.; Zhang, T.; Wu, Z.; et al. Prevalence of latent tuberculosis infection and incidence of active tuberculosis in school close contacts in Shanghai, China: Baseline and follow-up results of a prospective cohort study. Front. Cell. Infect. Microbiol. 2022, 12, 1000633.
  2. World Health Organization (WHO). Global Tuberculosis Report; World Health Organization: Geneva, Switzerland, 2021; Available online: https://www.who.int/publications/i/item/9789240062030 (accessed on 14 November 2024).
  3. Yoopetch, P.; Wu, O.; Jittikoon, J.; Thavorncharoensap, M.; Youngkong, S.; Praditsitthikorn, N.; Mahasirimongkol, S.; Anothaisintawee, T.; Udomsinprasert, W.; Chaikledkaew, U. Economic evaluation of diagnosis and treatment for latent tuberculosis infection among contacts of pulmonary tuberculosis patients in Thailand. Sci. Rep. 2024, 14, 17693.
  4. World Health Organization (WHO). Global Tuberculosis Report; World Health Organization: Geneva, Switzerland, 2020; Available online: https://www.who.int/publications/i/item/9789240013131 (accessed on 15 November 2024).
  5. Stop TB Partnership. UNHLM on TB: Key Targets and Commitments; STOP, TB Partnership: Geneva, Switzerland, 2020; Available online: http://www.stoptb.org/global/advocacy/unhlm_targets.asp (accessed on 16 November 2024).
  6. Velleca, M.; Malekinejad, M.; Miller, C.; Abascal Miguel, L.; Reeves, H.; Hopewell, P.; Fair, E. The yield of tuberculosis contact investigation in low- and middle-income settings: A systematic review and meta-analysis. BMC Infect. Dis. 2021, 21, 1011.
  7. Luo, Y.; Xue, Y.; Liu, W.; Song, H.; Huang, Y.; Tang, G.; Wang, F.; Wang, Q.; Cai, Y.; Sun, Z. Development of diagnostic algorithm using machine learning for distinguishing between active tuberculosis and latent tuberculosis infection. BMC Infect. Dis. 2022, 22, 965.
  8. Nyachama, K. Effectiveness of recommender systems in knowledge discovery. Eur. J. Inf. Knowl. Manag. 2024, 3, 50–62.
  9. Chen, L.; Yuan, L.; Sun, T.; Liu, R.; Huang, Q.; Deng, S. The performance of vcs (volume, conductivity, light scatter) parameters in distinguishing latent tuberculosis and active tuberculosis by using a machine learning algorithm. BMC Infect. Dis. 2023, 23, 881.
  10. Murri, R.; De Angelis, G.; Antenucci, L.; Fiori, B.; Rinaldi, R.; Fantoni, M.; Masciocchi, C. A machine learning predictive model of bloodstream infection in hospitalized patients. Diagnostics 2024, 14, 445.
  11. Stoltzfus, J. Logistic regression: A brief primer. Acad. Emerg. Med. 2011, 18, 1099–1104.
  12. Mercurio, G.; Gottardelli, B.; Lenkowicz, J.; Patarnello, S.; Bellavia, S.; Scala, I.; Frisullo, G. A novel risk score predicting 30-day hospital re-admission of patients with acute stroke by machine learning model. Eur. J. Neurol. 2023, 31, e16153.
  13. Gong, W.; Wu, X. Differential diagnosis of latent tuberculosis infection and active tuberculosis: A key to a successful tuberculosis control strategy. Front. Microb. 2021, 12, 745592.
  14. Gichuhi, H.W.; Magumba, M.; Kumar, M.; Mayega, R.W. A machine learning approach to explore individual risk factors for tuberculosis treatment non-adherence in Mukono district. PLoS Glob. Public Health 2023, 3, e0001466.
  15. Hermes, L.; Kersten, J.; Nienhaus, A.; Schablon, A. Risk analysis of latent tuberculosis infection among health workers compared to employees in other sectors. Int. J. Environ. Res. Public Health 2020, 17, 4643.
  16. Adams, S.; Ehrlich, R.; Baatjies, R.; Zyl-Smit, R.; Said-Hartley, Q.; Dawson, R.; Dheda, K. Incidence of occupational latent tuberculosis infection in South African healthcare workers. Eur. Respir. J. 2015, 45, 1364–1373.
  17. Meregildo-Rodriguez, E. Latent Tuberculosis Infection (LTBI) in healthcare workers: A cross-sectional study at a northern Peruvian Hospital. Front. Med. 2023, 10, 1295299.
  18. Apriani, L.; McAllister, S.; Sharples, K.; Alisjahbana, B.; Ruslami, R.; Hill, P.; Menzies, D. Latent tuberculosis infection in healthcare workers in low- and middle-income countries: An updated systematic review. Eur. Respir. J. 2019, 53, 1801789.
  19. Kinikar, A.; Chandanwale, A.; Kadam, D.; Joshi, S.; Basavaraj, A.; Pardeshi, G.; Mave, V. High risk for latent tuberculosis infection among medical residents and nursing students in India. PLoS ONE 2019, 14, e0219131.
  20. Nasreen, S.; Shokoohi, M.; Malvankar-Mehta, M. Prevalence of latent tuberculosis among health care workers in high burden countries: A systematic review and meta-analysis. PLoS ONE 2016, 11, e0164034.
  21. Stewart, R.; Tsang, C.; Pratt, R.; Price, S.; Langer, A. Tuberculosis—United States, 2017. MMWR Morb. Mortal. Wkl. Rep. 2018, 67, 317–323.
  22. Wong, Y.J.; Ng, K.Y.; Lee, S.W.H. How can we improve latent tuberculosis infection management using behavior change wheel: A systematic review. J. Public Health 2023, 45, e447–e466.
  23. Ayakaka, I.; Ackerman, S.; Ggita, J.M.; Kajubi, P.; Dowdy, D.; Haberer, J.E.; Fair, E.; Hopewell, P.; Handley, M.A.; Cattamanchi, A.; et al. Identifying barriers to and facilitators of tuberculosis contact investigation in Kampala, Uganda: A behavioral approach. Implement. Sci. 2017, 12, 33.
  24. World Health Organization. Global Tuberculosis Report; World Health Organization: Geneva, Switzerland, 2022; Available online: https://cdn.who.int/media/docs/default-source/hq-tuberculosis/global-tuberculosis-report2022/global-tb-report-2022-factsheet.pdf.24 (accessed on 17 November 2024).
  25. Pradipta, I.S.; Idrus, L.R.; Probandari, A.; Lestari, B.W.; Diantini, A.; Alffenaar, J.W.C.; Hak, E. Barriers and strategies to successful tuberculosis treatment in a high-burden tuberculosis setting: A qualitative study from the patient’s perspective. BMC Public Health 2021, 21, 1–12.
  26. Matakanye, H.; Tshitangano, T.G.; Mabunda, J.T.; Maluleke, T.X. Knowledge, Beliefs, and Perceptions of TB and Its Treatment amongst TB Patients in the Limpopo Province, South Africa. Int. J. Environ. Res. Public Health 2021, 18, 10404.
  27. Kigozi, G.; Heunis, C.; Chikobvu, P.; Botha, S.; van Rensburg, D. Factors influencing treatment default among tuberculosis patients in a high burden province of South Africa. Int. J. Infect. Dis. 2017, 54, 95–102.
  28. Shamputa, I.C.; Law, M.A.; Kelly, C.; Nguyen, D.T.K.; Burdo, T.; Umar, J.; Barker, K.; Webster, D. Tuberculosis related barriers and facilitators among immigrants in Atlantic Canada: A qualitative study. PLoS Glob. Public Health 2023, 3, e0001997.
  29. Zawedde-Muyanja, S.; Manabe, Y.C.; Cattamanchi, A.; Castelnuovo, B.; Katamba, A. Patient and health system level barriers to and facilitators for tuberculosis treatment initiation in Uganda: A qualitative study. BMC Health Serv. Res. 2022, 22, 831.
  30. Meaza, A.; Tola, H.H.; Eshetu, K.; Mindaye, T.; Medhin, G.; Gumi, B. Tuberculosis among refugees and migrant populations: Systematic review. PLoS ONE 2022, 17, e0268696.
  31. Yousif, K.; Ei Maki, M.; Babikir, R.K.; Abuaisha, H. The effect of an educational intervention on awareness of various aspects of pulmonary tuberculosis in patients with the disease. East. Mediterr. Health J. 2021, 27, 287–292.
  32. Wu, T.; He, H.; Wei, S.; Pan, J.; Yang, J.; Huang, S.; Gan, S.; Ye, C.; Huo, H.; Tang, Z.; et al. How to optimize tuberculosis health education in college under the new situation? Based on a cross-sectional study among freshmen of a medical college in Guangxi, China. Front. Public Health 2022, 10, 845822.
  33. Subbaraman, R.; Nathavitharana, R.R.; Mayer, K.H.; Satyanarayana, S.; Chadha, V.K.; Arinaminpathy, N.; Pai, M. Constructing care cascades for active tuberculosis: A strategy for program monitoring and identifying gaps in quality of care. PLoS Med. 2019, 16, e1002754.
  34. Naidoo, P.; Theron, G.; Rangaka, M.X.; Chihota, V.N.; Vaughan, L.; Brey, Z.O.; Pillay, Y. The South African Tuberculosis Care Cascade: Estimated Losses and Methodological Challenges. J. Infect. Dis. 2017, 216, S702–S713.
  35. Hanson, C.; Osberg, M.; Brown, J.; Durham, G.; Chin, D.P. Finding the Missing Patients with Tuberculosis: Lessons Learned from Patient-Pathway Analyses in 5 Countries. J. Infect. Dis. 2017, 216, S686–S695.
  36. Mwangwa, F.; Chamie, G.; Kwarisiima, D.; Ayieko, J.; Owaraganise, A.; Ruel, T.D.; Plenty, A.; Tram, K.H.; Clark, T.D.; Cohen, C.R.; et al. Gaps in the Child Tuberculosis Care Cascade in 32 Rural Communities in Uganda and Kenya. J. Clin. Tuberc. Other Mycobact. Dis. 2017, 9, 24–29.
  37. Harries, A.D.; Lin, Y.; Kumar, A.M.V.; Satyanarayana, S.; Takarinda, K.C.; Dlodlo, R.A.; Zachariah, R.; Olliaro, P. What can National TB Control Programmes in low- and middle-income countries do to end tuberculosis by 2030? F1000Research 2018, 7, F1000–Faculty.
  38. Spruijt, I.; Haile, D.T.; van den Hof, S.; Fiekert, K.; Jansen, N.; Jerene, D.; Klinkenberg, E.; Leimane, I.; Suurmond, J. Knowledge, attitudes, beliefs, and stigma related to latent tuberculosis infection: A qualitative study among Eritreans in the Netherlands. BMC Public Health 2020, 20, 1602. [Google Scholar] [CrossRef]
  39. Campbell, J.I.; Menzies, D. Testing and Scaling Interventions to Improve the Tuberculosis Infection Care Cascade. J. Pediatr. Infect. Dis. Soc. 2022, 11, S94–S100. [Google Scholar] [CrossRef]
  40. Khan, A.A.; Awan, M.S. Barriers to tuberculosis screening: A qualitative study in a low-income setting. Int. J. Tuberc. Lung Dis. 2019, 23, 579–586. [Google Scholar] [CrossRef]
  41. Baker, M.G.; Firth, M. The impact of educational interventions on tuberculosis awareness and testing: A knowledge diffusion model simulation. BMC Public Health 2018, 18, 1234. [Google Scholar]
  42. Pérez, A.; Martínez, M. Community health worker-led interventions to improve tuberculosis knowledge and testing rates in underserved populations. J. Community Health 2021, 46, 345–353. [Google Scholar]
  43. Tanimura, T.; Jaramillo, E.; Weil, D.; Raviglione, M.; Lönnroth, K. Financial burden for tuberculosis patients in low- and middle-income countries: A systematic review. Eur. Respir. J. 2014, 43, 1763–1775. [Google Scholar] [CrossRef]
  44. Fenta, M.D.; Ogundijo, O.A.; Warsame, A.A.A.; Belay, A.G. Facilitators and barriers to tuberculosis active case findings in low- and middle-income countries: A systematic review of qualitative research. BMC Infect. Dis. 2023, 23, 515. [Google Scholar] [CrossRef]
  45. Chen, X.; Peng, Y.; Zhou, L.; Wang, F.; Chen, B.; Qu, Y. The necessity for enhancing awareness of tuberculosis starting from the early college semesters: Empirical evidence from a cross-sectional research. Front Public Health 2023, 11, 1272494. [Google Scholar] [CrossRef]
Figure 1. Structure of the decision tree.
Figure 2. Significant splits of the decision tree.
Figure 3. LTBI intervention simulation (awareness and action over time).
Figure 4. LTBI intervention simulation (general versus high-risk groups).
Figure 5. LTBI intervention simulation (faster testing for general versus high-risk groups).
Figure 6. Comparison of feature importance: logistic regression versus random forest.
Figure 7. The effects of different LTBI awareness rates on the proportion of the patients transitioning to the “action-taken” state, such as getting tested or treated, over 12 months.
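The type of simulation summarized in the captions of Figures 3 to 5 and 7 can be illustrated with a minimal three-state sketch (unaware, aware, action taken). This is not the authors' model: the function name `simulate_diffusion`, the discrete monthly time step, and all rate values below are assumptions chosen purely for illustration.

```python
def simulate_diffusion(awareness_rate, action_rate, months=12):
    """Return a list of (unaware, aware, action_taken) fractions per month.

    Hypothetical discrete-time model: each month a fixed fraction of the
    unaware group becomes aware, and a fixed fraction of the aware group
    takes action (for example, getting tested or treated).
    The rates are illustrative assumptions, not values from the study.
    """
    if not (0.0 <= awareness_rate <= 1.0 and 0.0 <= action_rate <= 1.0):
        raise ValueError("rates must lie in [0, 1]")

    unaware, aware, action = 1.0, 0.0, 0.0
    history = [(unaware, aware, action)]
    for _ in range(months):
        # Flows are computed from the state at the start of the month.
        newly_aware = awareness_rate * unaware
        newly_acting = action_rate * aware
        unaware -= newly_aware
        aware += newly_aware - newly_acting
        action += newly_acting
        history.append((unaware, aware, action))
    return history


if __name__ == "__main__":
    # Compare three assumed monthly awareness rates over 12 months.
    for rate in (0.05, 0.10, 0.20):
        final = simulate_diffusion(rate, action_rate=0.15)[-1]
        print(f"awareness rate {rate:.2f}: action taken = {final[2]:.3f}")
```

Under these assumptions, a higher monthly awareness rate moves people into the aware state sooner, so a larger share reaches the action-taken state within the 12-month window, which is the qualitative pattern the Figure 7 caption describes.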
Table 1. Knowledge of LTBI survey questions.
Question Code | Question | Choice of Responses
Q1 | Have you ever heard of LTBI before? | 1A: Yes; 1B: No
Q2 | Have you ever received health education on LTBI and TB? | 2A: Yes; 2B: No
Q3 | What do you understand by the term “Latent tuberculosis infection”? | 3A: A form of tuberculosis that is highly contagious and easily spread through the air; 3B: Tuberculosis infection that remains dormant in the body without causing symptoms or spreading to others; 3C: An advanced stage of tuberculosis where the infection has spread to multiple organs; 3D: Tuberculosis infection that is resistant to standard treatments and requires specialized medications; 3E: A condition where the tuberculosis bacteria have been completely eradicated from the body
Q4 | How is LTBI different from active TB? | 4A: LTBI is a condition where the tuberculosis bacteria are actively multiplying in the body, causing symptoms such as cough, fever, and weight loss, while active TB is a dormant infection that does not cause symptoms; 4B: LTBI is a contagious form of tuberculosis that can be easily transmitted to others through respiratory droplets, while active TB is not contagious; 4C: LTBI is characterized by the presence of tuberculosis bacteria in the body without causing symptoms or making the person sick, whereas active TB manifests with symptoms and can make the person sick; 4D: LTBI is a more severe form of tuberculosis infection that requires intensive treatment with multiple medications, whereas active TB can be managed with a single antibiotic; 4E: LTBI is a temporary condition that resolves on its own without treatment, while active TB requires long-term treatment to prevent complications and transmission to others
Q5 | What are the risk factors for developing LTBI? | 5A: Age; 5B: Close contact with someone with active TB; 5C: Immunocompromised condition; 5D: Living or working in crowded environments; 5E: All of the above
Q6 | What are the possible consequences of having untreated LTBI? | 6A: Development of active tuberculosis (TB) disease; 6B: Increased risk of transmitting tuberculosis to others; 6C: Progression of TB infection to more severe forms affecting multiple organs; 6D: Complications such as meningitis, bone or joint infection, or respiratory failure; 6E: All of the above
Q7 | Can LTBI progress to active TB? | 7A: Yes; 7B: No; 7C: Not sure
Q8 | What are the recommended treatments for LTBI? | 8A: High-dose antibiotics for a short duration; 8B: Combination therapy with multiple antibiotics; 8C: Isoniazid (INH) monotherapy for 6 to 9 months; 8D: Surgical removal of infected tissues; 8E: No treatment is necessary for LTBI
Q9 | Are there any preventive measures individuals with LTBI should take to avoid developing active TB? | 9A: Regular exercise and a healthy diet; 9B: Avoiding close contact with individuals diagnosed with active TB; 9C: Taking vitamin supplements; 9D: Completing a full course of treatment for LTBI as prescribed by a healthcare provider; 9E: Using herbal remedies and alternative therapies
Q10 | Do you think LTBI is a significant public health concern? | 10A: Strongly agree; 10B: Agree; 10C: Neutral; 10D: Disagree; 10E: Strongly disagree
Q11 | How concerned are you about the possibility of progressing from LTBI to active TB? | 11A: Very concerned; 11B: Somewhat concerned; 11C: Neutral; 11D: Not very concerned; 11E: Not concerned at all
Q12 | Do you believe that LTBI treatment is necessary, even if you do not have symptoms? | 12A: Yes; 12B: No; 12C: Not sure
Q13 | How do you perceive the importance of LTBI screening programs? | 13A: Very important; 13B: Somewhat important; 13C: Neutral; 13D: Not very important; 13E: Not important at all
Q14 | What barriers do you think may prevent individuals from seeking LTBI testing or treatment? | 14A: Lack of awareness; 14B: Fear of side effects from medication; 14C: Stigma associated with TB; 14D: Financial constraints; 14E: Other (please specify)
Q15 | Have you ever been screened for LTBI? | 15A: Yes, and I tested positive; 15B: Yes, and I tested negative; 15C: No
Q16 | If you tested for LTBI, did you receive treatment? | 16A: Yes; 16B: No
Q17 | If you received treatment for LTBI, did you complete the entire course of medication? | 17A: Yes; 17B: No
Q18 | Have you ever been in close contact with someone diagnosed with active TB? | 18A: Yes; 18B: No
Q19 | If yes, did you seek medical evaluation or testing for LTBI? | 19A: Yes; 19B: No
Table 2. Summary comparison of performance metrics for decision tree and random forest.
Model | Strengths | Weaknesses
Logistic Regression | High interpretability, easy to explain; good precision | Low recall, misses many LTBI-positive cases; limited in capturing complex, non-linear interactions
Decision Tree | Simple, interpretable structure; captures non-linear relationships; better recall than logistic regression | Prone to overfitting, leading to lower generalizability; lower precision, higher false positives
Random Forest | Better overall accuracy and F1-score; handles complex interactions well; provides insights into feature importance | Less interpretable due to ensemble structure; struggled with recall for LTBI-positive cases
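The precision, recall, and F1-score trade-offs listed in Table 2 follow directly from confusion-matrix counts. The sketch below shows the standard definitions; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the study's data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 for the positive class.

    tp, fp, fn, tn are confusion-matrix counts (true/false positives and
    negatives). Undefined ratios (zero denominators) are reported as 0.0.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts (NOT the study's results): a classifier that is
    # usually right when it predicts "positive" but misses most positives.
    metrics = classification_metrics(tp=10, fp=5, fn=20, tn=65)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

With counts like these, accuracy can look acceptable while recall stays low, which is why a model with good precision can still miss many LTBI-positive cases and why the F1-score is the more informative summary when positive cases matter most.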
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Faye, L.M.; Magwaza, C.; Dlatu, N.; Apalata, T. Exploring Determinants and Predictive Models of Latent Tuberculosis Infection Outcomes in Rural Areas of the Eastern Cape: A Pilot Comparative Analysis of Logistic Regression and Machine Learning Approaches. Information 2025, 16, 239. https://doi.org/10.3390/info16030239

