1. Introduction
The aging of society has become a crucial concern in many countries. It leads not only to care pressure in small families but also to increased expenses for elderly welfare and medical care. A major aging-related health problem is hip fracture, which is a common cause of disability and mortality in older people [1]. According to a survey, the number of hip fractures will increase to 6.26 million worldwide by 2050 [2]. Older patients with hip fracture impose a high care burden on their families and health-care services, which can result in numerous other negative consequences in these patients’ lives.
Higher mortality rates, decreased functional status, and reduced self-care ability in older people following hip fracture are well-known consequences that have been discussed in relevant studies [3,4,5,6]. Hip fractures in older people are a persistent and severe public health problem, and many studies have explored this topic and analyzed the factors contributing to hip fracture [7,8,9,10].
The trend of mortality in older patients with hip fracture is likely to change with improvements in care systems and surgical techniques [11,12]. According to one study [13], first-year mortality in older patients with hip fracture declined over the long term by 8.8% for women and 20.0% for men in the United States. In general, the most critical time after a hip fracture is the first year [12]. By contrast, three studies have indicated that the long-term trend of first-year mortality in older patients with hip fracture remained unchanged [14,15,16]. To improve the quality of treatment for older patients with hip fracture, a thorough understanding of the underlying deterioration factors is necessary before clinical treatment.
The aforementioned factors include comorbidities, patient demographics, and the physician. Making a decision about the type of specific intervention to administer is difficult before the factors related to the mortality of such patients are known. Wang et al. published a 10-year trend analysis of hip fracture mortality. However, to our knowledge, no further survival-related analysis has been conducted for the first-year trend, no population-based study has reported the long-term trend in older patients with hip fracture, and no study has used a population-based approach to examine the association between the characteristics and mortality of older patients with hip fracture [
17].
For multivariate survival analysis, semiparametric models have been used in related studies, among which one of the popular approaches is the Cox regression model. Even though the model is the most widely used survival analysis method in medical literature, it still has numerous drawbacks and limitations [
18,
19]. The Cox proportional hazard method relies on assumptions that can easily be violated in the presence of time-dependent covariates. Moreover, the potential for covariate bias is considerable, and the method lacks individual predictions [
20]. Even when fixed covariate values are used in the simpler Cox model to make individual predictions, this is computationally expensive [
21]. Furthermore, the Cox proportional hazards method produces poor results with many input variables [
22]. Multicollinear, continuous, and dichotomized data entered together in the model may produce unstable results [
23]. Finally, this model may not be capable of modeling interaction terms when data are dispersed in multidimensional space with nonlinear interactions [
22,
23].
With recent developments in data science technology, data classification methods have been used in many important research fields. Such methods have also been used to represent and develop some useful tools to support clinical diagnostic decisions and guidelines in medical care. Machine learning is a commonly used data-mining method, and it has been applied to analyze critical information hidden in medical databases. Many different machine-learning methods exist, and they are used for building predictive models for disease prognosis.
We conducted a population-based study in Taiwan using the National Health Insurance Research Database (NHIRD) to observe changes in trends among older patients with hip fracture. We also identified risk factors for mortality in patients with hip fracture among an older population and examined the association between patient characteristics and risk factors for patients in Taiwan. Moreover, we developed prognostic risk models for patients with hip fracture through data mining and the traditional Cox proportional hazards model, and we compared their performance. The models were developed and validated using a large population of older patients with hip fracture retrieved from the NHIRD.
2. Subjects and Methods
2.1. Database
This retrospective cohort study was based on the National Health Insurance Research Database (NHIRD) in Taiwan. The NHIRD is one of the largest nationwide population databases in the world; it covers the health insurance data of 99% to 99.5% of the residents of Taiwan [24,25]. The Longitudinal Health Insurance Database 2000 (LHID2000), a dataset of the NHIRD, was used for this study (registered number NHIRD 104-071). The LHID2000 is a longitudinal cohort dataset that contains all the original health insurance claim data of one million individuals randomly sampled from the year 2000 registry of beneficiaries of the NHIRD; it was created by the Taiwan National Health Research Institutes [25]. This study was approved and exempted from full review by the Institutional Review Board of the Kaohsiung Veterans General Hospital (VGHKS14-CT7-09).
2.2. Study Population
We reviewed the related literature to identify potential variables affecting deterioration after hip fracture and then identified all such variables available in the NHIRD. The study period was from 1 January 2000 to 31 December 2010, and samples were selected using procedure and diagnosis codes from the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). We selected older subjects aged 60 years or older in each year from 2001 to 2010. In total, 166,274 subjects were enrolled in this study. From these subjects, the hip fracture group was selected (ICD-9-CM codes: 820.xx) from the data of inpatient expenditures by admissions. Operative treatment for hip fracture was defined by the following ICD-9-CM procedure codes: 78.55, 79.05, 79.15, 79.25, 79.35, 81.51, and 81.52. Cases of hip fracture in 2000 served as the baseline. In this study, hip fracture was defined as a new hip fracture occurring from 2001 to 2010, and time to death was defined as the duration from the admission date for hip fracture to the date of death. Patients who left without permission or were transferred to another hospital at a physician’s request were excluded from the analysis. Records with unknown disposition or miscellaneous missing data, such as unknown time of registration or disposition, unrecorded prognosis, missing date of birth, and ambiguous or unreasonable data, were also excluded. A total of 1166 patients with hip fracture were excluded from our study (Figure 1). The main outcome of interest was whether these patients died within five years. Time of death was determined from the time of death recorded by the hospital. The outcome was then classified into two groups: died within five years or alive after five years.
2.3. Covariates
Potential variables affecting death within 5 years in these patients were comorbidities, which were also collected from the NHIRD. Comorbidities were identified by the following ICD-9-CM codes: (1) diabetes mellitus (ICD-9-CM codes: 250.xx); (2) cardiovascular diseases, which included hypertensive disease (excluding secondary hypertension) (ICD-9-CM codes: 401.xx-404.xx), ischemic heart disease (ICD-9-CM codes: 410.xx-414.xx), cardiomyopathy (ICD-9-CM codes: 425.xx), and heart failure (ICD-9-CM codes: 428.xx); (3) cerebrovascular disease, which included subarachnoid hemorrhage (ICD-9-CM codes: 430.xx), intracerebral and other intracranial hemorrhage (ICD-9-CM codes: 431.xx-432.xx), occlusion and stenosis of precerebral arteries (ICD-9-CM codes: 433.xx), and other and ill-defined cerebrovascular disease (ICD-9-CM codes: 437.xx); and (4) renal failure, ranging from acute renal failure to unspecified renal failure (ICD-9-CM codes: 584.xx-586.xx).
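The code ranges listed above can be expressed as a small lookup. The following Python sketch is illustrative only: the group labels, function names, and string format of the codes are assumptions, and the actual NHIRD extraction procedure is not described in this paper.

```python
from typing import Optional

# Three-digit ICD-9-CM category ranges taken from the comorbidity list above.
# The group labels are illustrative names, not identifiers from the NHIRD.
COMORBIDITY_RANGES = {
    "diabetes mellitus": [(250, 250)],
    "cardiovascular disease": [(401, 404), (410, 414), (425, 425), (428, 428)],
    "cerebrovascular disease": [(430, 430), (431, 432), (433, 433), (437, 437)],
    "renal failure": [(584, 586)],
}


def icd9_category(code: str) -> Optional[int]:
    """Return the numeric three-digit category of a code, or None for V/E codes."""
    head = code.strip().split(".")[0]
    return int(head) if head.isdigit() else None


def comorbidity_group(code: str) -> Optional[str]:
    """Map one ICD-9-CM diagnosis code to a comorbidity group, if any."""
    category = icd9_category(code)
    if category is None:
        return None
    for group, ranges in COMORBIDITY_RANGES.items():
        if any(low <= category <= high for low, high in ranges):
            return group
    return None


def is_hip_fracture(code: str) -> bool:
    """Hip fracture corresponds to ICD-9-CM 820.xx."""
    return icd9_category(code) == 820
```

For example, `comorbidity_group("402.1")` falls in the hypertensive disease range, whereas a secondary hypertension code such as `"405.0"` is left unmapped, consistent with its exclusion above.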
2.4. Statistical Analysis
First, the incidence, first-year mortality rate, and first-year standardized mortality ratio (SMR) were calculated for each year. The incidence of new hip fracture was calculated as the number of patients with new hip fracture divided by the number of living subjects in the same year, stratified by gender. The first-year mortality rate of new hip fracture was calculated as the number of deaths divided by the number of patients with new hip fracture. The first-year SMR was the ratio of observed deaths in patients with new hip fractures to expected deaths in the study population.
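The three quantities can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the SMR assumes indirect standardization, in which expected deaths are obtained by applying reference death rates for each stratum (for example, age and sex) to the number of patients in that stratum.

```python
from typing import Iterable, Tuple


def incidence_per_100k(new_cases: int, living_subjects: int) -> float:
    """New hip fractures per 100,000 living subjects in one year."""
    if living_subjects <= 0:
        raise ValueError("living_subjects must be positive")
    return 100_000 * new_cases / living_subjects


def first_year_mortality_rate(deaths_within_1y: int, new_cases: int) -> float:
    """Proportion of patients with a new hip fracture who died within one year."""
    if new_cases <= 0:
        raise ValueError("new_cases must be positive")
    return deaths_within_1y / new_cases


def standardized_mortality_ratio(
    observed_deaths: int, strata: Iterable[Tuple[int, float]]
) -> float:
    """Observed deaths divided by expected deaths.

    Each stratum is (number of patients, reference death rate); the reference
    rates are an assumption of this sketch (indirect standardization).
    """
    expected = sum(n_patients * rate for n_patients, rate in strata)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed_deaths / expected
```

An SMR above 1 indicates more deaths among patients with hip fracture than would be expected from the reference rates.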
An independent t-test was used to compare means between genders. The chi-square test was used to test the association of gender with fracture numbers, deaths, and first-year mortality rate. We performed linear regression to assess the long-term trends in incidence, first-year mortality rate, and first-year SMR among older patients with hip fracture. Regression analysis was performed with logistic regression. Significance was defined as a p-value < 0.05.
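A linear trend test of this kind can be sketched as an ordinary least-squares fit of a yearly rate on calendar year. The Python snippet below is a hedged illustration using only the standard library; the function name is hypothetical, and it returns the t statistic of the slope rather than a p-value, which would additionally require the t distribution with n - 2 degrees of freedom.

```python
import math
from typing import Sequence, Tuple


def linear_trend(years: Sequence[float], values: Sequence[float]) -> Tuple[float, float, float]:
    """Fit values = intercept + slope * year and return (slope, intercept, t).

    t is the slope divided by its standard error (n - 2 degrees of freedom).
    """
    n = len(years)
    if n != len(values) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residuals are computed in centered form to limit rounding error.
    sse = sum((y - (mean_y + slope * (x - mean_x))) ** 2 for x, y in zip(years, values))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    if std_err == 0:
        t_stat = 0.0 if slope == 0 else math.copysign(math.inf, slope)
    else:
        t_stat = slope / std_err
    return slope, intercept, t_stat
```

A negative slope with a large absolute t statistic would correspond to a significant decreasing trend over the study years.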
Next, we first used univariate Cox regression analysis, after checking that the proportionality of hazards was not violated, to test the ability of the different variables to predict mortality at 5 years in the hip fracture patient dataset. The hazard ratios (HRs), which can be interpreted as the relative risk of dying after hip fracture, are presented with their 95% confidence intervals. Then, the mortality probability for each case at 5 years from diagnosis was calculated [23]. Cox analysis was performed using SPSS ver. 22.0.
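To make the reported quantities concrete, the following Python sketch converts a Cox log-hazard coefficient and its standard error into an HR with a Wald 95% confidence interval, and derives a 5-year mortality probability from a baseline survival value using the standard relation S(t | x) = S0(t)^exp(linear predictor). The function names and the choice of a Wald interval are assumptions for illustration; the analysis in this study was run in SPSS, and its exact settings are not specified here.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def hazard_ratio_ci(beta: float, std_err: float, z: float = Z_95) -> Tuple[float, float, float]:
    """Return (HR, lower, upper) for a Cox coefficient and its standard error."""
    if std_err < 0:
        raise ValueError("std_err must be non-negative")
    return math.exp(beta), math.exp(beta - z * std_err), math.exp(beta + z * std_err)


def mortality_probability(baseline_survival: float, linear_predictor: float) -> float:
    """Probability of death by time t for one patient under a Cox model.

    baseline_survival is S0(t), the survival at time t for a patient whose
    linear predictor is zero; linear_predictor is the sum of beta_i * x_i.
    """
    if not 0.0 <= baseline_survival <= 1.0:
        raise ValueError("baseline_survival must be in [0, 1]")
    return 1.0 - baseline_survival ** math.exp(linear_predictor)
```

An HR whose confidence interval excludes 1 corresponds to a covariate that is significant at the 0.05 level under the Wald test.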
Moreover, the following rules were applied to label cases for the data-mining classifiers: patients with hip fracture in the NHIRD dataset who were alive ≥5 years after hip fracture were classified as “alive,” those who died within 5 years were classified as “dead,” and patients who were alive but had a disease duration <5 years were excluded from the analysis. We then incorporated the following three steps in building our data-mining-based model: (a) attribute selection based on the literature; (b) comparative evaluation of different classifiers or learning algorithms to select the best classifier for this study; and (c) generation of decision trees with a data-mining tool and identification of the variables that help determine the prognosis of patients with hip fracture. The classifiers were relearned from the resampled datasets to further improve performance on the hip fracture survival task.
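The labeling rule can be written as a short function. This Python sketch assumes each record carries a death indicator and a follow-up duration in years measured from the hip fracture admission; both field names are hypothetical.

```python
from typing import Optional

HORIZON_YEARS = 5.0


def five_year_label(died: bool, followup_years: float) -> Optional[str]:
    """Return "dead", "alive", or None when the record must be excluded.

    - death within five years of the fracture: "dead"
    - observed for five years or more (still alive at five years): "alive"
    - alive but followed for less than five years: excluded (None)
    """
    if followup_years < 0:
        raise ValueError("followup_years must be non-negative")
    if died and followup_years < HORIZON_YEARS:
        return "dead"
    if followup_years >= HORIZON_YEARS:
        return "alive"
    return None
```

Records that return `None` correspond to patients whose outcome at five years is unknown, which is why they are dropped rather than assigned to either class.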
2.5. Data Mining Learning Algorithms
We applied several well-known single classification techniques, including decision tree (DT), support vector machine (SVM), and multilayer perceptron (MLP). We used their implementations in the Weka 3.7.3 open-source data mining software (www.cs.waikato.ac.nz/ml/weka) for all the analyses.
For the DT, the most commonly used DT-based learning techniques are C4.5 and random forest (RF). A DT is a model that uses classification and induction methods to generate a tree-like decision structure, which is learned inductively from known examples of each class. A DT consists of nodes, branches, and leaves; each decision node indicates the test to be performed. To classify the input data, each node of the DT is a predicate that determines whether a variable is greater than, equal to, or less than a given value. When the target variable is categorical, the tree is called a classification tree; when it is continuous, the tree is called a regression tree. Because a DT is not affected by linearity assumptions or by interactions between independent variables, it is a very useful data mining model for decision makers, especially in fields with complex data where decisions are not easy to make.
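To illustrate how a threshold predicate at a node can be chosen, the following Python sketch scores a candidate split on one continuous variable with the gain ratio, the criterion used by C4.5. It is a simplified illustration with hypothetical function names, not the Weka J48 implementation, which also handles missing values, pruning, and multiway splits on nominal attributes.

```python
import math
from collections import Counter
from typing import Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[float], labels: Sequence[str], threshold: float) -> float:
    """Gain ratio of the binary split `value <= threshold` versus `value > threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    n = len(labels)
    conditional = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    information_gain = entropy(labels) - conditional
    split_information = entropy(["L"] * len(left) + ["R"] * len(right))
    return information_gain / split_information
```

A tree builder would evaluate this score for each candidate threshold of each variable and place the best-scoring predicate at the node.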
The classification of data using the decision tree technique is a two-step process. The first step is the learning process, in which training data are analyzed by a DT algorithm to create a model expressed as classification rules or a decision tree. The second step determines the accuracy of the classification rules or decision tree. If the accuracy is acceptable, the rules can be reused to classify new data in the same practical scenario [26]. The DT structure is well known to be very similar to the clinical decision-making process of a doctor. Once the DT has been built, it provides a good way to explain the problem under investigation. Therefore, the DT was preferred in this study.
SVM is a supervised classification algorithm based on statistical learning theory. Its aim is to devise a computationally efficient way of learning separating hyperplanes in a high-dimensional feature space. There are two cases for SVM: linear SVM and non-linear SVM. The SVM works by estimating the decision function that best separates the two classes, so as to achieve the best classification effect [27]. It is a major function-based learning technique for solving the classification problems encountered in data mining.
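For the linear case, the decision function is a hyperplane w·x + b = 0. The Python sketch below shows how an already trained linear SVM would classify a point, together with the margin width and the hinge loss that training penalizes. The weights are assumed to be given; this is not the SMO training procedure used by Weka, and the function names are hypothetical.

```python
import math
from typing import Sequence


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score w.x + b; its sign gives the side of the hyperplane."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 or -1 according to the side of the separating hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w: Sequence[float]) -> float:
    """Distance between the two margin hyperplanes, 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def hinge_loss(w: Sequence[float], b: float, x: Sequence[float], y: int) -> float:
    """Zero when the point is on the correct side with margin at least 1."""
    return max(0.0, 1.0 - y * decision_value(w, b, x))
```

Training seeks the weights that keep the margin wide while keeping the total hinge loss small; a non-linear SVM applies the same idea after mapping the inputs through a kernel.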
An MLP is a mathematical model that imitates the functionality of biological neural systems, in which neurons are organized in layers. An MLP consists of an input layer and an output layer with one or more hidden layers in between. Neurons in two adjacent layers are fully connected, and each neuron receives inputs and converts them into a higher-level combination through its combination and transfer functions [28].
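The layer-by-layer computation can be sketched as a forward pass. In this Python illustration, the combination function is a weighted sum plus a bias and the transfer function is assumed to be a sigmoid; the weights are taken as given, so training by backpropagation is outside the scope of the sketch, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Layer = Tuple[Sequence[Sequence[float]], Sequence[float]]  # (weights, biases)


def sigmoid(z: float) -> float:
    """Sigmoid transfer function (an assumption of this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs: Sequence[float], weights: Sequence[Sequence[float]],
          biases: Sequence[float]) -> List[float]:
    """One fully connected layer: each neuron sums its weighted inputs, then applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + bias)
        for row, bias in zip(weights, biases)
    ]


def mlp_forward(inputs: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Propagate the inputs through the hidden layers and the output layer in order."""
    activation = list(inputs)
    for weights, biases in layers:
        activation = dense(activation, weights, biases)
    return activation
```

With one output neuron, the final activation can be read as a score between 0 and 1 for the positive class.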
2.6. Measures for Performance Evaluation
To build a survival prediction model for patients with hip fracture, this study used Weka to investigate the performance of the classification techniques, including J48 (C4.5 in Weka), random forest (RF in Weka), SVM (SMO in Weka), and multilayer perceptron (MLP in Weka). The performance of the classifiers was evaluated by means of sensitivity, specificity, accuracy, and area under the ROC curve (AUC). AUC values were interpreted according to the rules provided by Hosmer and Lemeshow [29], as follows: “excellent” if AUC ≥ 0.9; “good” if 0.9 > AUC ≥ 0.8; “fair” if 0.8 > AUC ≥ 0.7; “poor” if 0.7 > AUC ≥ 0.6; and “very poor” if AUC < 0.6. To label the predictions from each classifier, among all the possible thresholds T that constitute the coordinates of the ROC curves, the value that resulted in the highest classification accuracy was chosen [30]. Moreover, class imbalance can deteriorate the performance of classification techniques [31]; thus, a resample module in Weka was adopted to modify the proportions of the two classes to be almost identical. In addition, ten-fold cross-validation was applied in all the experimental evaluations for each generated dataset. The specific parameter values selected for each classification technique in Weka are listed in Table 1.
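The evaluation measures and the AUC grading rule above can be sketched as follows. This Python illustration treats “dead” as the positive class (an assumption) and computes the AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half; the function names are hypothetical and do not correspond to Weka output fields.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                      positive: str = "dead") -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def auc(y_true: Sequence[str], scores: Sequence[float], positive: str = "dead") -> float:
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def grade_auc(value: float) -> str:
    """Grading thresholds as stated in the text above."""
    if value >= 0.9:
        return "excellent"
    if value >= 0.8:
        return "good"
    if value >= 0.7:
        return "fair"
    if value >= 0.6:
        return "poor"
    return "very poor"
```

The pairwise AUC is quadratic in the number of cases, which is acceptable for a sketch; a rank-based computation would be preferable for a dataset of the size used in this study.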
4. Discussion
This study examined the 10-year trend in hip-fracture incidence and first-year mortality among older patients. It is the first study to report the long-term trend of first-year mortality in older patients with hip fracture in Taiwan. From 2001 to 2010, hip-fracture incidence did not significantly increase, but first-year mortality decreased significantly.
4.1. Incidence of Hip Fracture
In the 1990s, Gullberg et al. estimated the number of hip fractures would increase from 2.6 million in 2025 to 4.5 million in 2050 [
32]. In a nationwide study in Romania, the annual incidence of hip fracture increased from 184 to 214 per 100,000 persons between 2005 and 2009 [
33]. However, in some cross-sectional observational studies, the incidence of hip fracture has decreased in recent years [
13,
34]. The present study revealed that the annual incidence of new hip fractures among patients aged older than 60 years increased from 552 to 587 per 100,000 persons from 2001 to 2010. Moreover, the incidence increased from 435 to 486 per 100,000 persons in the male subgroup and from 683 to 689 per 100,000 persons in the female subgroup from 2001 to 2010. The increasing trend may at least be partially explained by population aging during the study period. Other studies have indicated that as populations age, the rate of hip fractures increases without additional interventions [
35,
36,
37]. In Taiwan, hip-fracture prevention measures have been implemented. Since 2003, Taiwan’s National Health Insurance (NHI) program has funded a drug to prevent osteoporosis; furthermore, since 2005, the Health Promotion Administration has cooperated with the Taiwanese Osteoporosis Association to promote osteoporosis prevention through measures such as publishing clinical guidelines and educating medical personnel. These measures may have prevented the rate of hip fracture among older adults from increasing significantly.
Similar to this study, many authors have reported a higher incidence of hip fracture in women than in men across various countries and regions [
13,
38,
39,
40,
41,
42]. Brauer et al. reported annual incidence rates in the United States in 2005 of 793.5 per 100,000 persons in female and 369 per 100,000 persons in male patients [
13]. In our series, the incidence rates in 2005 were 740 per 100,000 persons in the female subgroup and 462 per 100,000 persons in the male subgroup. Comparing our 2005 data with those from the United States, the incidence rate of hip fracture in female patients is similar, but that in male patients is higher. In a systematic review of hip fracture, Kanis et al. reported that the rates for men and for men and women combined in Taiwan were higher than those in the United States, and our findings were similar [
38].
4.2. First-Year Mortality
In the United States, the long-term trend of first-year mortality in older patients with hip fracture declined by 8.8% for women and 20.0% for men [
13]. However, the mortality of hip fracture was unchanged after 1995 [
13]. Wang et al. reported a similar finding for a standardized mortality rate in Taiwan using NHI data from 1999 to 2009 [
17]. In the present study, the first-year mortality rate after hip fracture was 21.5% in 2001, and it declined to 15.0% in 2010. Although a minor fluctuation occurred between 2001 and 2010, the decreasing trend in first-year mortality rate after hip fracture was significant. A similar significant finding was observed in the male subgroup, where the rate decreased from 29.3% in 2001 to 17.3% in 2010. Among women, the decrease was less substantial, from 15.9% in 2001 to 13.5% in 2010. The integration of care systems, progress in medical and surgical care and techniques, and improvements in self-care ability and general health status in the older population may explain this phenomenon [
11,
43].
Moreover, the female subgroup exhibited lower first-year mortality than the male subgroup during each year of the study. Ariza-Vega et al., Endo et al., and Kannegaard et al. reported similar findings [
44,
45,
46]. Brauer et al. reported higher 30-, 180-, and 360-day mortality in men than in women [
13]. Hasegawa et al. reported higher 120-day mortality in Japanese men than in women [
1]. Mortality after hip fracture was also sex-dependent in Singapore Chinese people and people covered by Singapore’s Medicare program [
47,
48]. When we adjusted for age and sex, the SMRs in the overall population and men significantly decreased from 2001 to 2010. However, the decreasing trend of SMR in women was nonsignificant. Even when we adjusted for age and sex, the SMR in men was still higher than that in women. The reason for the higher hip-fracture mortality in men is unknown [
45,
49]. After a hip fracture, male sex is in itself a risk factor for death within the first year [
46], which warrants further exploration.
4.3. Risk Factors for First-Year Mortality after Hip Fracture
In this study, for the general population of patients with hip fracture, risk factors for first-year mortality after hip fracture were age, sex, fracture occurring later in the year, and surgical intervention, but the number of comorbidities was not a significant risk factor. Patients with fractures occurring later in the year had lower first-year mortality following hip fracture, which confirmed the decreasing trend of first-year mortality among older patients. In the population of older patients with hip fracture, male sex was a risk factor for first-year mortality, as discussed in the section above.
Age was reported to be a risk factor for loss of bone strength over the femoral neck and hip fracture [
50,
51,
52,
53]. However, Richmond et al. reported that the mortality was higher in older patients (64–85 years) than it was in very old patients (85 years and older). Older age and more comorbidities (i.e., low health status) were reported to be common risk factors for in-hospital [
6,
54], acute [
6,
34,
55,
56], and late mortality after hip fracture [
1,
44,
46,
56,
57,
58]. Jou et al. reported that the Charlson comorbidity index score was associated with in-hospital mortality in women aged older than 50 years with hip fracture [
59]. Nonoperative treatment was another risk factor for first-year mortality following hip fracture in this study. Parker et al. reported no significant difference in mortality between operative and nonoperative treatment of hip fracture [
60]. Jain et al. reported that 30-day mortality was higher in their nonoperative group than in their operative group [
61]. Neuman et al. reported a correlation between race and 7- and 30-day mortality following hip fracture [
62]. The findings of this study were similar to those of the abovementioned relevant studies. In the present study, data-mining approaches were adopted to identify significant predictors of mortality in patients with hip fracture. This model uses a simple process for building a decision tree, and it can be easily and accurately applied in clinical practice. It could help to identify patients at risk of poor prognosis after hip fracture at an early stage and to arrange appropriate treatment for them.
5. Conclusions
In this retrospective cohort study using the secondary NHIRD, we found that the first-year mortality following hip fracture in older people exhibited a decreasing trend from 2001 to 2010. Risk factors for first-year mortality following hip fracture among older adults included male sex, older age, nonoperative treatment, and fractures occurring later in the year. We developed a framework to build a prognostic model for predicting 5-year mortality in patients with hip fracture using data-mining algorithms. Furthermore, we demonstrated for the first time that accurate individual predictions can be made. The advantages of our model include its logical simplicity, its biological plausibility, and its capability for generalization and applicability, which would ultimately support its implementation in clinical practice. Future work may consider building various models for predicting the prognosis of hip fracture or integrating prediction algorithms into the computerized physician order entry system, thus creating a practical clinical decision support system with warning functions.
Some constraints must be addressed here because they may restrict the implications of the study; these limitations may provide clues for future studies. First, we collected cases from the NHIRD, which is only used for NHI claims. Information regarding the caregiver, family history, lifestyle factors, and environmental factors, all of which might be associated with the risk, is not included in the NHIRD. Second, studies using the NHIRD have indicated a lack of clarity in diagnostic classification. Therefore, the diagnostic accuracy and prognostic situation of some cases could not be ascertained.
Despite these limitations, we used decision-tree algorithms to build a robust tool that is easy to use and has excellent discriminative ability for the prediction of the prognosis of older patients with hip fracture. Prediction ability may be improved if models can include more clinical and lifestyle variables, which is often the least expensive way to improve care quality for patients with hip fracture. Finally, future population-based prospective studies are required to further validate the present findings. Alternatively, an interventional study in which patients are recruited based on the rules of expert systems in a Hospital Information System may be conducted.