Article

Cox Proportional Hazards Regression for Interval-Censored Data with an Application to College Entrance and Parental Job Loss

Department of Information and Statistics, Chungnam National University, Daejeon 34134, Korea
*
Author to whom correspondence should be addressed.
Economies 2022, 10(9), 218; https://doi.org/10.3390/economies10090218
Submission received: 2 May 2022 / Revised: 24 August 2022 / Accepted: 1 September 2022 / Published: 7 September 2022

Abstract

This study fits a Cox proportional hazards model to Korea Labor Panel data to analyze the impact of parental job loss on children’s delayed admission to colleges and universities in South Korea, using 376 subjects whose parental education levels were college-level or higher. Since the Korea Labor Panel data are interval- and right-censored, we compared three methods for handling interval censoring: simple omission, imputation by the midpoint (the average of the left and right endpoints of the interval), and multiple imputation. Their integrated areas under the ROC curve (iAUC) and mean squared errors (MSE) were compared to assess predictive and estimation performance. In the simulation, the multiple imputation method exhibited a lower MSE than the other two methods; however, no difference was observed in the iAUC values. In the group where each householder had at least a college degree, parental job loss was significantly related to the delayed college or university admission of the first-born child regardless of the method used for interval censoring. In particular, when first-born children experienced their parents’ unemployment at the age of 18, the probability of college admission was reduced by nearly 53% compared to cases where they did not. These results suggest that the government should develop education policies that offer psychosocial support for adolescents who cannot expect parental help.

1. Introduction

Parental job loss affects children’s educational achievement. Unemployment results in an immediate income loss and thus obliges parents to reduce financial support for their children’s education, thereby negatively affecting their school performance (Becker and Tomes 1986; Berger et al. 2009; Blau 1999; Rege et al. 2011). Job loss can deteriorate a child’s family environment through family separation, alcohol-related problems in mothers, and increased mortality in fathers (Mörk et al. 2020). Such disruptions in their family environments hinder children’s educational achievements (Codjoe 2007; Muola 2010; Pan and Ost 2014; Parveen 2007).
Educational attainment is one measure of educational achievement (Nielsen and Roos 2015). Since it is critical to social and economic success beyond efforts and ability (Goldthorpe 1996), many researchers have studied the relationships between parental job loss and children’s college enrollment (Coelli 2011; Lindemann and Gangl 2019; Pan and Ost 2014). Coelli (2011) investigated the relationship between parental job loss when children are 16 to 18 years old and their enrollment in university and community college. The outcome was binary: whether the youth enrolled in a university or community college within two years of graduating from high school. The study found that youth who have experienced parental job loss tend to go to university in later years, rather than immediately after high school, in order to earn the money necessary for tuition. Pan and Ost (2014) analyzed the difference in the college admission rates of adolescents depending on the timing of their parents’ unemployment. All the households in that paper experienced parental unemployment. The experimental group experienced parental unemployment at 15–17 years of age, and the control group experienced parental unemployment at 21–23 years of age. Their analysis confirmed that the experimental group’s college entrance rate was 10% lower than that of the control group. Lindemann and Gangl (2019) found that paternal unemployment during children’s secondary education had a negative effect on their enrollment in postsecondary education.
We investigated whether parental job loss contributes to delayed college enrollment in South Korea. A Cox regression model (Cox 1972) was employed to analyze how the timing of college enrollment is affected by parental job loss. Previous studies focused only on whether the affected children were enrolled in college at the age of 18 or 19 years, a binary outcome observed over a short term. In contrast, our analysis is based on 20 years of panel data from surveys conducted between 1998 and 2017 as part of the Korean Labor & Income Panel Study (KLIPS). The timing of college enrollment was determined during the follow-up time, which ranged from one year to ten years and varied across subjects. This allowed us to utilize information about delayed college enrollment to analyze the effects of parental unemployment. For example, if the outcome variable is college enrollment of a child aged 18 or 19 years, a child who enrolls in college three years after high school graduation is considered “not enrolled.” In our analysis, by contrast, the child’s outcome is three, the exact timing of college enrollment. Through this analysis, we could estimate the effects of parental unemployment on delayed enrollment.
Section 2 provides the theoretical background of this study in the following two aspects: socio-economic research and statistical methodology. We drew upon the literature on children’s education to discuss potential mechanisms behind the effects of parental job loss on delayed college enrollment. The necessity of survival analysis models for interval-censored data to analyze the KLIPS data used in this study is also explained. Section 3 provides details on several methodologies for handling interval censoring in the Cox proportional hazard model and describes the multiple imputation approach, which is the focus of this paper. Section 4 presents the simulation study for comparing the performance of the methodology for processing interval censoring and the results. In Section 5, the factors affecting children’s enrollment in university are analyzed using data from the Korea Labor Panel Survey for the preceding 20 years, and their significance is examined. Section 6 discusses the implications of this study and suggestions for public policy in South Korea.

2. Theoretical Background

2.1. A Relationship between Parental Unemployment and College Entrance

Convincing evidence indicates that education is one of the key channels through which parental socio-economic status is transmitted to their children (Breen and Goldthorpe 1997; Kopycka 2021; Van de Werfhorst et al. 2003). Job loss threatens socio-economic status by negatively affecting future income, job security, and health (Baum 2003). Parental job loss is devastating in that it has intergenerational effects. In particular, children’s educational achievement is susceptible to parental job loss. Involuntary parental unemployment creates tension and raises the possibility of family disintegration (Charles and Stephens 2004; Jahoda et al. 2017), which leads to adverse consequences for children’s academic achievements (Johnson et al. 2012). Becker (2009) showed that the likelihood of wealth transfer increases as the educational expenditure increases. This suggests that parental socio-economic background affects children’s future income, which is mediated by children’s educational achievements. Since long-term household income tends to decline after job loss regardless of re-employment (Brand 2015; DiPrete and McManus 2000; Gangl 2006), educational expenditure can be limited for a certain period. Coelli (2011) and Kalil and Wightman (2011) found that decreased income because of the unemployment of parents affects their children’s educational outcomes. Many studies have revealed the adverse effects of parental job loss on children’s college enrollment (Coelli 2011; Lindemann and Gangl 2019; Pan and Ost 2014). According to Lindemann and Gangl (2019), fathers’ unemployment had a negative effect on their children’s college admissions in Germany, but to a lesser extent than in the United Kingdom and United States. In contrast to countries such as the United States and United Kingdom, Germany has robust social protection measures in place against unemployment, and most colleges offer tuition assistance, so the economic impact is minimal.
In South Korea, many studies have focused on the relationships between parental socio-economic status and children’s education (Kim 2007; Shin 2010; Yang 2016). Kim (2007) found that the socio-economic status of the family affects the children’s educational achievement, which is mediated by the parent–child relationship. Yang (2016) discovered that the phenomenon of educational pedigree, in which the academic achievement of students rises as the socio-economic status of their parents increases, was observed in all OECD member countries. The strength of the phenomenon varied according to the type of welfare in each country. In particular, the intergenerational transmission of parental socio-economic status was more prevalent in the liberal welfare system to which Korean education belongs. Shin (2010) divided socio-economic status into two groups using the case study method: high-education-middle class and low-education-working class. Then, they investigated the impact of parents’ socio-economic backgrounds on their children’s education in depth. While well-educated middle-class parents encouraged their children to study with the goal of gaining admissions into high-ranked colleges, less-educated working class parents tended to leave their children to make their own decisions in most cases. Ku (2003b) analyzed how family backgrounds, such as family structure, low income, and poverty, affect children’s college admissions. The investigation found that having a single parent and living in poverty both had negative effects on children’s college admissions; moreover, the situation of single parents was shown to have a more detrimental effect than that of poverty. Ku (2003a) analyzed the impact of parental unemployment on children’s college admission. The study used three years (1998–2000) of data from KLIPS. Children who experienced parental unemployment at 16 to 18 years of age tended to have lower probabilities of obtaining college admission at 19 years of age.
Most of these studies used enrollment in a university as the measure of a child’s educational achievement. Along with its rapid economic growth, the tertiary school enrollment rate of South Korea has increased sharply. In 1995, South Korea ranked 9th in the world in tertiary school enrollment rate, a large rise from its rank of 44th in 1980. The university entrance rate in Korea in 2019 was 69.8%, higher than the OECD average of 44.9% (OECD 2020). Children who have experienced parental job loss may not be able to enter college immediately after graduating high school because of poor academic performance or financial issues resulting from the job loss. Since most Korean high school graduates enroll in college, such children may be under pressure to enter college later, once they have resolved the relevant issues. This implies that parental job loss would affect not only the decision to enroll in college but also the timing of college enrollment. Therefore, we analyzed how parental job loss affects the timing of college enrollment. This analysis can help the government to develop and implement its education strategy on time by clearly understanding how parental job loss affects children’s education.

2.2. Characteristics of Time to Admission as Survival Data

Survival analysis treats the time elapsed until the event of interest occurs as the response variable. The time it takes to enroll in college can be seen as the survival time. Specifically, the panel data of the Korea Labor Institute, which are collected annually from a survey administered to the same respondents, are appropriate for survival analysis in that the time to enter university can be used as a response variable. In the panel data, right censoring occurs when no further investigation can be made because the household to be surveyed has moved, immigrated, or can no longer be contacted. The Cox proportional hazards regression model is one of the survival models that analyze the relationship between the event time and explanatory variables while accounting for right censoring (Cox 1972). It is a semiparametric model because it specifies a model only for the regression coefficients, without any assumptions on the baseline hazard function.
Other than being right-censored, the KLIPS data are interval-censored. Survival data are called interval-censored if a subject’s survival time is only known within a certain specified time interval instead of being observed exactly. In the KLIPS data, the households’ responses are annually recorded. If a subject enters university during a year when the subject’s response is missing, the exact timing of the college entrance will be unknown and interval censoring will occur.
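To make the censoring mechanism concrete, the following minimal Python sketch derives the bounds of the enrollment time from a sequence of annual responses. The function name, the encoding of responses (`True`, `False`, `None` for a missing wave), and the convention that an enrollment first reported in the wave immediately after a "not enrolled" response counts as exactly observed are illustrative assumptions, not part of the KLIPS data format or the authors’ code.

```python
from math import inf

def enrollment_interval(status_by_wave):
    """Return (L, R) bounding the enrollment time, in waves since follow-up began.

    status_by_wave[k] is the response at wave k+1:
      True  -> child reported as enrolled,
      False -> child reported as not yet enrolled,
      None  -> household did not respond in that wave.
    """
    last_not_enrolled = 0  # latest wave at which the child was known not to be enrolled
    for wave, status in enumerate(status_by_wave, start=1):
        if status is True:
            if wave - last_not_enrolled == 1:
                return (wave, wave)           # no gap: treated as exactly observed (L = R)
            return (last_not_enrolled, wave)  # gap of missing waves: interval-censored
        if status is False:
            last_not_enrolled = wave
    return (last_not_enrolled, inf)           # never seen enrolled: right-censored
```

For instance, a household that skips the second wave and reports the child as enrolled in the third yields the interval (1, 3], whereas a household lost after two "not enrolled" responses yields a right-censored time.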
Since the interval-censored survival time can be considered as missing data, imputation methods can address the issue. There are two types of imputation methods: single imputation and multiple imputation. While the single imputation method replaces the missing value with a single estimated value, the multiple imputation method pools the results of multiple analyses based on multiple imputed data sets. Pan (2000) proposed the use of multiple imputation (Rubin 1987) for interval-censored data and the employment of Cox regression for the imputed data. Among nonparametric approaches, Turnbull (1976) suggested a self-consistent estimator of the survival function. Groeneboom and Wellner (1992) proposed the iterative convex minorant (ICM) algorithm to improve the convergence speed of Turnbull’s method. Wellner and Zhan (1997) combined the EM algorithm and the ICM algorithm into an EM-ICM algorithm. Finkelstein (1986) applied a Newton–Raphson algorithm to the Cox regression by adding covariates to the model in a nonparametric way. Zeng et al. (2016) formulated a semiparametric survival model called IntCens for interval-censored survival time with time-dependent covariates. They employed nonparametric maximum likelihood estimation with an EM-type algorithm.

3. Data and Methodology

3.1. Cox Proportional Hazards Regression

Survival models describe the relationship between a hazard function and a set of covariates. The Cox proportional hazards model assumes that a unit increase in a covariate has a multiplicative effect on the hazard rate (the proportional hazards assumption). The model is nonparametric in that it assumes no distribution for the survival time and no specific form for the baseline hazard function. However, because it specifies a parametric form for the covariate effects through the regression coefficients $\beta_k$ and estimates them by a parametric method, it is a semiparametric model.
We assume that $T$ is a non-negative continuous random variable representing the survival time or the time to a specific event (e.g., time to admission). The hazard function at time $t$ is defined as
$$h(t) = \lim_{dt \to 0} \frac{P\{t \le T < t + dt \mid T \ge t\}}{dt}.$$
The Cox regression specifies the hazard function of the $i$-th subject with covariates $x_i$ as follows:
$$h_i(t) = h(t \mid x_i) = h_0(t) \exp\left(\sum_{k=1}^{p} \beta_k x_{ik}\right).$$
The covariate vector is given by $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$, and the baseline hazard function at time $t$ is denoted by $h_0(t)$. The regression coefficients $\beta = (\beta_1, \beta_2, \ldots, \beta_p)$ are estimated by the value $\hat{\beta}$ that maximizes the partial likelihood
$$PL(\beta) = \prod_{i=1}^{n} \left[\frac{\exp(\beta^\top x_i)}{\sum_{l \in R(t_i)} \exp(\beta^\top x_l)}\right]^{\delta_i}. \tag{1}$$
Here $\delta_i$ is the event indicator of the $i$-th subject (1 if the event is observed, 0 if the time is right-censored), and $R(t_i)$ is the risk set at time $t_i$, that is, the subjects who have neither experienced the event nor been censored before $t_i$. A null hypothesis for $\beta$, $H_0: \beta = \beta_0$, is tested using a Wald test, a likelihood ratio test, or a score test (Cox 1972).
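As an illustration of how the partial likelihood is evaluated, the following Python sketch computes its logarithm for right-censored data. It is a minimal version that assumes no tied event times; the function and argument names are hypothetical, and in practice the estimate would be obtained from an established routine (the paper uses R) rather than from this code.

```python
import math

def log_partial_likelihood(beta, times, events, covariates):
    """Cox log partial likelihood for right-censored data without tied event times.

    beta       : list of p regression coefficients
    times      : observed time of each subject
    events     : 1 if the event was observed, 0 if right-censored (delta_i)
    covariates : list of p-dimensional covariate lists (x_i)
    """
    # Linear predictor beta' x_i for every subject.
    eta = [sum(b * x for b, x in zip(beta, xi)) for xi in covariates]
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 0:
            continue  # censored subjects contribute only through the risk sets
        # Risk set R(t_i): subjects still under observation at time t_i.
        risk_sum = sum(math.exp(eta[l]) for l, t_l in enumerate(times) if t_l >= t_i)
        total += eta[i] - math.log(risk_sum)
    return total
```

Maximizing this function over `beta` (for example with a Newton–Raphson iteration) gives the partial likelihood estimate; the baseline hazard never has to be specified.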

3.2. Multiple Imputation for Interval-Censored Time

The censored data are denoted by $D = \{(A_i, x_i),\ i = 1, \ldots, n\}$, where $A_i = (L_i, R_i]$ is the interval known to contain the survival time $T_i$ and $x_i$ is a $p$-dimensional covariate vector. If $T_i$ is left-censored, $L_i = 0$; if it is not censored, $L_i = R_i$; and if it is right-censored, $R_i = \infty$. Since the partial likelihood function in Equation (1) can be calculated for data with right censoring but not for data with left or interval censoring, an additional step is necessary for the interval-censored cases in the KLIPS data.
We considered three approaches for dealing with the interval-censored cases: omitting the interval-censored cases, midpoint imputation, and multiple imputation. Midpoint imputation replaces the interval-censored event time with the midpoint $(L_i + R_i)/2$ of the interval $(L_i, R_i)$. While midpoint imputation is a single imputation method, multiple imputation is a probability-based imputation method. Pan (2000) proposed the use of multiple imputation (Rubin 1987) for interval-censored data and the employment of Cox regression for the imputed data.
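The two simpler approaches can be sketched in a few lines of Python. The function below converts interval observations into the (time, event) pairs that a standard Cox fit expects; its name and the `method` argument are hypothetical, and treating left-censored cases ($L = 0$) in the same way as interval-censored ones is an assumption made here for brevity.

```python
from math import inf

def to_right_censored(intervals, method="midpoint"):
    """Convert (L, R) observations to (time, event) pairs for a standard Cox fit.

    method="omit"     drops the interval-censored cases;
    method="midpoint" replaces them with (L + R) / 2.
    """
    if method not in ("omit", "midpoint"):
        raise ValueError(f"unknown method: {method}")
    out = []
    for left, right in intervals:
        if right == inf:              # right-censored at L
            out.append((left, 0))
        elif left == right:           # exactly observed event time
            out.append((left, 1))
        elif method == "midpoint":    # interval-censored: impute the midpoint
            out.append(((left + right) / 2, 1))
        # method == "omit": the interval-censored case is skipped
    return out
```

Omission shrinks the sample, while midpoint imputation keeps every subject but treats the imputed time as if it had been observed exactly.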
For this paper, we used the MIICD package (Delord and Génin 2015) in R to implement the multiple imputation method. For the imputation of the interval-censored data, we considered poor man’s data augmentation (PMDA) and asymptotic normal data augmentation (ANDA). When there are few missing values, the PMDA methodology underestimates the actual variability (Wei and Tanner 1991), and ANDA is recommended as the imputation algorithm. In addition, when the number of truncated observations is small, the regression coefficient converges to 0, so it is recommended to use the “Link estimate” instead of the Breslow method to estimate the baseline survival function (Pan 2000).
This method uses an iterative algorithm and generates multiple imputed data sets. The subscript $(k)$ and the superscript $(i)$ denote the $k$-th imputed data set and the $i$-th iteration, respectively. Let $(T_{(k)}, \delta_{(k)}, X)$, $k = 1, \ldots, m$, denote the $m$ sets of possibly right-censored data, where $T$ is the observed event time, $\delta$ indicates whether the event was observed ($\delta = 1$) or right-censored ($\delta = 0$), and $X$ is the set of covariates. The multiple imputation method proposed by Pan converts interval-censored data to right-censored data using the PMDA or ANDA method and then fits the Cox model by maximizing the partial likelihood. The detailed algorithm is as follows. Without loss of generality, only one explanatory variable $x_j$ and the corresponding regression coefficient $\beta$ are considered.
  • In the $i$-th iteration, the estimates of the regression coefficient and the baseline survival function are denoted by $\hat{\beta}^{(i)}$ and $\hat{S}_0^{(i)}$. The starting value is $\hat{\beta}^{(0)} = 0$. To initialize each of the $m$ sets, a failure time $X_j$ is drawn from a uniform distribution between $L_j$ and $R_j$ and used as the imputed value, that is, $T_{(k),j} = X_j$ and $\delta_{(k),j} = 1$. The baseline survival probability $\hat{S}_{0,(k)}^{(0)}$ is the Breslow estimate of the baseline survival probability for the $k$-th imputed data set.
  • We generate $m$ sets of possibly right-censored imputed data $\{T_{(1)}, \delta_{(1)}, X\}, \ldots, \{T_{(m)}, \delta_{(m)}, X\}$ as follows. For each observation $(L_j, R_j, x_j)$, $j = 1, \ldots, n$, the interval-censored time is replaced by a draw from the current estimate of the survival distribution, where $\hat{S}_0$ in this step is treated as discrete:
    For each observation $(L_j, R_j, x_j)$, $j = 1, \ldots, n$, and each $k = 1, \ldots, m$: if $R_j < \infty$, sample $Y_j$ from the distribution with survival function $[\hat{S}_0^{(i)}]^{\exp(x_j \hat{\beta}^{(i)})}$ under the condition $\{L_j < T_j < R_j\}$, and set $T_{(k),j} = Y_j$ and $\delta_{(k),j} = 1$. If $R_j = \infty$, set $T_{(k),j} = L_j$ and $\delta_{(k),j} = 0$.
    Within the interval-censored time $(L_j, R_j)$, the distribution given by $[\hat{S}_0^{(i)}]^{\exp(x_j \hat{\beta}^{(i)})}$ has probability masses $\{p_1, \ldots, p_{k_j}\}$ at the points $\{t_1, \ldots, t_{k_j}\}$. The failure time $Y_j$ is drawn from $\{t_1, \ldots, t_{k_j}\}$ with probabilities proportional to $\{p_1, \ldots, p_{k_j}\}$.
  • Since all the interval-censored values are now imputed, the Cox proportional hazards model can be fitted to each imputed data set. This yields the regression coefficient estimate $\hat{\beta}_{(k)}^{(i)}$ and the covariance estimate $\hat{\Sigma}_{(k)}^{(i)}$.
  • Here $(T_{(k)}, \delta_{(k)}, X)$ denotes the $k$-th of the $m$ right-censored data sets obtained by imputing the interval-censored data, and $\hat{\beta}_{(k)}^{(i)}$ denotes the regression coefficient obtained by fitting the Cox proportional hazards model to it. The Breslow estimate $\hat{S}_{0,(k)}^{(i)}$ of the baseline survival function is calculated from $(T_{(k)}, \delta_{(k)}, X)$ and $\hat{\beta}_{(k)}^{(i)}$.
  • The $m$ estimates $\hat{\beta}_{(k)}^{(i)}$ from the $i$-th iteration are averaged to give $\hat{\beta}^{(i+1)}$, and the baseline survival function is updated in the same way. The covariance is the sum of the within-imputation and between-imputation variances:
    $$\hat{\beta}^{(i+1)} = \frac{1}{m}\sum_{k=1}^{m} \hat{\beta}_{(k)}^{(i)}, \qquad \hat{S}_0^{(i+1)} = \frac{1}{m}\sum_{k=1}^{m} \hat{S}_{0,(k)}^{(i)}$$
    $$\hat{\Sigma}^{(i+1)} = \frac{1}{m}\sum_{k=1}^{m} \hat{\Sigma}_{(k)}^{(i)} + \left(1 + \frac{1}{m}\right) \frac{\sum_{k=1}^{m} \left[\hat{\beta}_{(k)}^{(i)} - \hat{\beta}^{(i+1)}\right]^2}{m - 1}$$
    Finally, the procedure is repeated from the first step until $\hat{\beta}^{(i)}$ converges.
The ANDA method modifies the fifth step of the PMDA algorithm above. In the fifth step, a normal distribution is assumed for each imputed set, with the mean vector $\hat{\beta}_{(k)}^{(i)}$ and the covariance matrix $\hat{\Sigma}_{(k)}^{(i)}$ obtained from the $k$-th set, and the regression coefficients are then obtained from the resulting mixture:
$$\hat{g}^{(i+1)}(\beta) = \frac{1}{m}\sum_{k=1}^{m} N\left(\hat{\beta}_{(k)}^{(i)}, \hat{\Sigma}_{(k)}^{(i)}\right)$$
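The imputation draw and the pooling step of the algorithm above can be sketched in Python as follows. This is not the MIICD implementation: the function names are hypothetical, the Cox fit itself is omitted, the interval is taken as $(L, R]$, and the midpoint fallback used when the estimated distribution places no mass inside an interval is an assumption of this sketch.

```python
import math
import random

def subject_masses(baseline_surv, x, beta):
    """Masses of [S0(t)]^exp(x * beta) at each jump point, given S0 at those points."""
    surv = [s ** math.exp(x * beta) for s in baseline_surv]
    prev = [1.0] + surv[:-1]
    return [a - b for a, b in zip(prev, surv)]

def draw_event_time(support, masses, left, right, rng=random):
    """Draw Y_j from the discrete distribution restricted to (left, right]."""
    pts = [(t, p) for t, p in zip(support, masses) if left < t <= right and p > 0]
    if not pts:                       # assumption: no mass in the interval, use the midpoint
        return (left + right) / 2
    times, weights = zip(*pts)
    return rng.choices(times, weights=weights, k=1)[0]

def pool_estimates(betas, variances):
    """Pool m single-coefficient fits: mean estimate and total variance.

    Total variance = mean within-imputation variance
                     + (1 + 1/m) * between-imputation variance.
    """
    m = len(betas)
    beta_bar = sum(betas) / m
    within = sum(variances) / m
    between = sum((b - beta_bar) ** 2 for b in betas) / (m - 1)
    return beta_bar, within + (1 + 1 / m) * between
```

In one PMDA iteration, `draw_event_time` would be called once per interval-censored subject and imputed data set, a Cox model would be fitted to each of the $m$ completed data sets, and `pool_estimates` would combine the resulting coefficients and variances.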

4. Simulation Study

To find an appropriate method for handling the interval censoring in the KLIPS data, we compared three approaches: simple omission of the interval-censored cases, midpoint imputation, and multiple imputation. The methods were compared in terms of estimation accuracy, measured by the mean squared error (MSE) of the estimated regression coefficients:
$$MSE = \frac{1}{p}\sum_{j=1}^{p} (\hat{\beta}_j - \beta_j)^2$$
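The MSE of a single fit is a direct translation of the formula; a minimal Python sketch with a hypothetical function name is shown below. In a simulation study, this value would be averaged over the Monte Carlo replications.

```python
def coefficient_mse(beta_hat, beta_true):
    """Mean squared error between estimated and true regression coefficients."""
    if len(beta_hat) != len(beta_true):
        raise ValueError("coefficient vectors must have the same length")
    # Average of squared differences over the p coefficients.
    return sum((bh - bt) ** 2 for bh, bt in zip(beta_hat, beta_true)) / len(beta_true)
```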
We also applied the IntCens method (Zeng et al. 2016) to compare its estimation accuracy with that of the other methods. For both the simulated and the real data sets, the IntCens method failed because of a singularity issue. Since the parental unemployment rate was only 2.63%, the simulated data contain only a few cases with unemployment experience, which may cause a numerical singularity problem. To resolve the singularity issue, we added random noise to the simulated response variable values. The MSE values of IntCens were the largest regardless of the simulation settings. Moreover, the Monte Carlo standard deviation of its MSE values was also the largest, which implies unstable model estimation. We concluded that IntCens is not suitable for the KLIPS data. The detailed results are presented in Appendix A.

4.1. Simulation Settings

To generate the simulation data, we mimicked the censoring rate of the KLIPS data while considering several scenarios for the interval censoring rate, from low to high. The right censoring rate was fixed at 20%, and the interval censoring rate was set to 15%, 30%, or 45%. The simulation data were sampled with replacement from the KLIPS data of 376 subjects whose parental education levels reached college or higher. We considered sample sizes of 100, 300, and 1000 to compare the imputation methods across different cases.
The baseline hazard function was generated from the exponential distribution and the Weibull distribution. A nonparametric method, known as the flexible-hazard method, was also employed to mimic a case where the data were generated from a Cox regression. For the noncensored cases, a uniform distribution was used to generate the left and right bound times $L$ and $R$; if the two bounds were not the same, the case was regarded as interval-censored. The survival time was generated assuming that the true regression coefficient was $\beta = (0.058, 0.0446, 0.8758, 0.052)$, which was obtained from 100 repetitions of the multiple imputation method in the MIICD package.
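One way to generate such data is inverse-transform sampling from a Cox model with a Weibull baseline, followed by random assignment of the censoring type. The Python sketch below illustrates the idea only; it is not the authors’ generator, and the function name, the parameterization $H_0(t) = \text{scale} \cdot t^{\text{shape}}$, the default parameter values, and the way the interval bounds are widened with uniform draws are all assumptions.

```python
import math
import random

def simulate_subject(x, beta, rng, shape=1.0, scale=0.1,
                     right_rate=0.20, interval_rate=0.30, max_width=2.0):
    """Simulate one (L, R) observation under a Cox model with a Weibull baseline.

    Baseline cumulative hazard: H0(t) = scale * t**shape (shape=1 is the exponential case).
    """
    eta = sum(b * xi for b, xi in zip(beta, x))
    u = 1.0 - rng.random()                        # uniform on (0, 1], avoids log(0)
    # Solve H0(t) * exp(eta) = -log(u) for t.
    t = (-math.log(u) / (scale * math.exp(eta))) ** (1.0 / shape)
    r = rng.random()
    if r < right_rate:                            # right-censored before the event
        return (rng.uniform(0.0, t), math.inf)
    if r < right_rate + interval_rate:            # interval-censored around the event
        left = max(0.0, t - rng.uniform(0.0, max_width))
        right = t + rng.uniform(0.0, max_width)
        return (left, right)
    return (t, t)                                 # exactly observed
```

Calling the function once per resampled covariate vector, with `interval_rate` set to 0.15, 0.30, or 0.45, would reproduce the structure of the scenarios described above.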

4.2. Simulation Results

The MSE values under the exponential baseline hazard are summarized in Table 1. In this setting, the MSE tends to decrease as the sample size increases, regardless of the interval censoring rate. Furthermore, for a given sample size and interval censoring rate, the MSE of the multiple imputation method is slightly lower than that of the midpoint imputation and omission methods for sample sizes of 100 and 300. For a sample size of 1000, there is no difference between the MSE of the multiple imputation and that of the midpoint imputation, and both are lower than that of the omission method.
The simulation results are summarized in Table 2, where the right censoring rate is set as 20% and the Weibull distribution is assumed for the baseline hazard function. As in the case of the exponential distribution, the MSE value tends to decrease as the sample increases, regardless of the interval censoring rate. Furthermore, for a given sample size and interval censoring rate, the MSE of the multiple imputation is lower than that of the midpoint imputation when the sample size is 100 or 300. The MSE of the midpoint imputation is lower than that of the multiple imputation and omission for some cases. For a sample size of 1000, there is little difference between the MSE of the multiple imputation and that of the midpoint imputation, and in some cases, the MSE of the midpoint imputation is lower than that of the multiple imputation.
The simulation results are summarized in Table 3, where the right censoring rate is fixed at 20% and the flexible-hazard method is used for the baseline hazard function. As in the exponential and Weibull cases, the MSE tends to decrease as the sample size increases, regardless of the interval censoring rate. Furthermore, for a given sample size and interval censoring rate, the MSE of the multiple imputation is lower than that of the midpoint imputation for all sample sizes (100, 300, and 1000).
The multiple imputation method showed a lower MSE than the midpoint imputation and omission when the sample size was 100, and the right censoring rate was fixed at 20%. However, the sample sizes of 300 and 1000 showed similar MSEs to those of the midpoint imputation, regardless of the interval censoring rate. The midpoint imputation is affected by the sample size, so the MSE decreases as the sample gets larger; however, the multiple imputation method shows a low residual regardless of the sample size, so it is a good method for imputing interval censoring with a robust model. Therefore, we proceed to use the Cox proportional hazards model with a multiple imputation method, which shows a better model estimation accuracy.

5. Data Analysis

5.1. Data

The data used in this study were taken from the Korean Labor & Income Panel Study (KLIPS), a survey administered to households living in urban areas and their household members. The panel consists of the members of a sample of 5000 households. As a longitudinal survey, the KLIPS follows up on the subjects once a year to collect data on economic activities, labor market movement, income activities and consumption, education and vocational training, and social life.
The analysis was based on 20 years of panel data collected from surveys conducted between 1998 and 2017 as part of the KLIPS. We analyzed the effect of parental job loss on children’s college entrance. Since the KLIPS is conducted in urban areas, the applicability of the analysis results to rural areas is limited. Of the 989 subjects, 58 (5.9%) were right-censored and 79 (7.9%) were interval-censored.
Among the variables of KLIPS, the householder’s education level, gender of the first child, gender of the householder, poverty, employment status of the first child’s parents, double income status of the first child’s parents, and the number of household members were selected as covariates based on the study by Ku (2003a). The descriptive statistics of the chosen variables are summarized in Table 4. However, the Fisher exact test showed a significant association between household poverty and parental unemployment experience (p-value = 0.025); therefore, the poverty variable was excluded from the real data analysis. Assuming that the effect of parental unemployment may vary depending on the household head’s academic background, we divided the sample into two subsets: a sample where the household head had a high school diploma or a lower qualification and a sample where the household head had a college degree or a higher qualification.
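For reference, the Fisher exact test on a 2×2 table (here, poverty by parental unemployment experience) can be computed directly from the hypergeometric distribution. The Python sketch below uses a hypothetical function name and implements the common two-sided definition that sums the probabilities of all tables no more likely than the observed one; the cell counts behind the reported p-value are not given in the text, so no attempt is made to reproduce it.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell with fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed table.
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
```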

5.2. Analysis Results

Table 5 summarizes the model estimation results for the sample where household heads’ education levels were a high school diploma or lower. With multiple imputation, the first child’s probability of being admitted to a college under circumstances of parental unemployment was 58.4% lower than that of children who did not experience parental unemployment. The parental unemployment variable was significant for omission and multiple imputation at a significance level of 5%. Unlike the other methods, the midpoint imputation produced a positive coefficient for the parental unemployment variable. Table 6 shows the model estimation results for the sample where household heads’ education levels were college graduation or higher. The first child’s probability of being admitted to a college under circumstances of parental unemployment was 57.5% lower than that in the other cases. The parental unemployment variable was significant for all the imputation methods at a significance level of 5%. When the interval-censored data were omitted, the probability of being admitted to a college in double-income households was 46% higher than that in the other cases. This shows that the use of an inappropriate method for interval censoring (such as simple omission or midpoint imputation) could distort the data analysis results.
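In a Cox model, a coefficient $\hat{\beta}$ translates into a percentage change in the hazard of $(\exp(\hat{\beta}) - 1) \times 100$. Assuming that the percentages reported above were derived in this way (the text does not state the conversion explicitly), the following Python sketch with hypothetical function names converts between the two scales.

```python
import math

def percent_change_in_hazard(beta):
    """Percentage change in the hazard for a one-unit increase in the covariate."""
    return (math.exp(beta) - 1.0) * 100.0

def beta_from_percent_reduction(pct):
    """Cox coefficient implied by a given percentage reduction in the hazard."""
    return math.log(1.0 - pct / 100.0)
```

Under this assumption, a 58.4% reduction corresponds to a hazard ratio of 0.416 and a coefficient of about −0.877.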

5.3. Comparison of Predictive Performance According to the Imputation Method

We used the time-dependent receiver operating characteristic (ROC) curve to evaluate the predictive power of survival data instead of a simple ROC curve, which is used for evaluating the predictive power of binomial response variables. The area AUC ( t ) under the time-dependent ROC curve can be calculated at each time point t. The integrated AUC (iAUC) was used to compare the prediction performance of statistical methods for interval censoring.
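To make the idea of AUC(t) and its integrated version concrete, here is a minimal sketch under simplifying assumptions: all event times are fully observed (no censoring adjustment), AUC(t) uses the cumulative/dynamic definition, and the iAUC is taken as an unweighted trapezoidal average over a sorted time grid. The estimator used in the study may weight time points differently; the function names and toy data are hypothetical.

```python
def auc_at_time(times, scores, t):
    """Cumulative/dynamic AUC at time t for fully observed event times.

    Cases are subjects with an event by time t, controls are subjects still
    event-free at t; a higher score is meant to indicate an earlier event.
    Returns None when either group is empty.
    """
    cases = [s for T, s in zip(times, scores) if T <= t]
    controls = [s for T, s in zip(times, scores) if T > t]
    if not cases or not controls:
        return None
    # Count concordant case/control pairs; ties get half credit.
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

def integrated_auc(times, scores, grid):
    """Average AUC(t) over a sorted time grid using the trapezoidal rule."""
    pts = [(t, auc_at_time(times, scores, t)) for t in grid]
    pts = [(t, a) for t, a in pts if a is not None]
    if len(pts) < 2:
        raise ValueError("need at least two time points with a defined AUC")
    area = sum((t1 - t0) * (a0 + a1) / 2.0
               for (t0, a0), (t1, a1) in zip(pts, pts[1:]))
    return area / (pts[-1][0] - pts[0][0])

# Hypothetical toy data: years until admission and a model risk score.
times = [1, 2, 3, 4, 5, 6]
scores = [0.9, 0.4, 0.7, 0.3, 0.2, 0.1]
print(integrated_auc(times, scores, grid=[1, 2, 3, 4, 5]))
```

With censored data, the case and control sets at each t are not fully known, so practical estimators reweight or otherwise adjust these counts; that step is omitted here.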
Figure 1 and Figure 2 show ROC curves at a certain time point t when interval-censored data are imputed by the multiple imputation method. Figure 1 is for children whose household head holds a high school diploma or a lower qualification. Figure 2 is for children whose household head holds a college degree or a higher qualification. When t is less than 5, the ROC curves of the two samples show similar predictive performance. When t is greater than or equal to 5, the ROC curve of the sample with the higher education level shows better predictive performance.
The closer iAUC is to 1, the better the model is; the closer it is to 0.5, the less accurate the model is. The iAUC of the sample where household heads had educational qualifications that were below or equal to the achievement of a high school diploma was estimated as 0.51 based on five-fold cross-validation. The iAUC of the sample where household heads had educational qualifications that were equal to or higher than the achievement of a college degree was estimated as 0.53 based on the five-fold cross-validation.
This result implies that the predictive performance of the Cox regression model is very poor. Thus, the estimated model should be used for interpretation only, not for prediction.
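The five-fold cross-validation behind the iAUC estimates can be sketched as follows. This is only an illustration of how the subjects might be split into folds; the text does not say how the folds were formed in the study, and the helper `k_fold_indices` and the seed are hypothetical.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)        # shuffle once, reproducibly
    folds = [idx[i::k] for i in range(k)]   # k nearly equal-sized folds
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

# 376 is the size of the college-graduate sample; the seed is arbitrary.
splits = list(k_fold_indices(376, k=5, seed=0))
print([len(test) for _, test in splits])
```

In each round the model would be fitted on the training indices and the AUC evaluated on the held-out fold, with the five results averaged.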

6. Conclusions

Using 20 years’ worth of data (1998 to 2017) from the Korean Labor Panel Data (KLIPS), we analyzed how college admission could be affected by parental unemployment status when the first child in a household was aged 18 years and preparing for college. Suppose a child was admitted to college in 2018 and the household answered the survey in 2020 but did not report the child’s college admission year. In this case, the panel data are interval-censored, since the researcher does not know the exact time of admission because of the missing response.
We considered three imputation methods for interval censoring: simple omission, midpoint imputation, and the multiple imputation proposed by Pan (2000). To choose an appropriate imputation method for these data, we ran extensive simulation studies and compared mean squared errors (MSEs) to evaluate the performance of the imputation methods. For the simulation study, samples of size 100, 300, and 1000 were drawn with replacement from the real data of 376 subjects whose parental education qualifications included college graduation or a higher qualification. The right censoring rate was set to 0.2, and the interval censoring rate varied from 0.15 to 0.45. Overall, the model estimation accuracy of the multiple imputation method was found to be higher than that of the other imputation methods. The estimation accuracy of midpoint imputation was affected by the sample size, with the MSE decreasing as the sample grew larger. On the other hand, the multiple imputation method showed a low error regardless of the sample size. Therefore, we conclude that multiple imputation is a good method for ensuring robust model estimation.
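The two simpler treatments of interval censoring, and the MSE used to compare methods, can be sketched as below. The data layout (a pair of left and right bounds per subject) and all function names are assumptions made for illustration. The MSE is shown for a single coefficient over Monte Carlo replicates, which may differ from the exact definition used in the simulation study. The multiple imputation of Pan (2000) is not shown because it requires repeated Cox model fits.

```python
import math

# Each observation is an interval (left, right) for the time to admission:
#   left == right            -> exactly observed event time
#   left <  right < math.inf -> interval-censored
#   right == math.inf        -> right-censored at `left`

def omit_interval_censored(data):
    """Simple omission: drop interval-censored rows; return (time, event) pairs."""
    out = []
    for left, right in data:
        if right == math.inf:
            out.append((left, 0))   # right-censored
        elif left == right:
            out.append((left, 1))   # exact event
        # interval-censored rows are skipped
    return out

def impute_midpoint(data):
    """Midpoint imputation: replace each finite interval by its midpoint."""
    out = []
    for left, right in data:
        if right == math.inf:
            out.append((left, 0))   # right-censored rows are kept as they are
        else:
            # Exact rows are unchanged because left == right.
            out.append(((left + right) / 2.0, 1))
    return out

def mse(estimates, true_value):
    """Mean squared error of Monte Carlo estimates around the true coefficient."""
    return sum((b - true_value) ** 2 for b in estimates) / len(estimates)

# Hypothetical toy data and estimates.
data = [(2.0, 2.0), (1.0, 3.0), (4.0, math.inf), (3.0, 3.0)]
print(omit_interval_censored(data))
print(impute_midpoint(data))
print(mse([0.9, 1.1, 1.2], true_value=1.0))
```

Omission shrinks the sample (three rows remain here instead of four), whereas midpoint imputation keeps every subject but treats an uncertain time as if it were exact.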
The real data analysis showed that the effect of the variable “whether or not the parents are unemployed” on the time taken to be admitted to a college was significant only when the householder’s academic background was college graduation or higher. When the interval-censored data were removed, the “double income” and “parents’ unemployment” variables became significant. The first children of double-income parental households had a 46% higher probability of entering a college than others. On the contrary, when the first children experienced their parents’ unemployment at the age of 18, the probability of college admission was reduced by nearly 53% compared to cases where they did not. Therefore, our study suggests that college entrance is affected by parental financial status, in particular for households where the householder’s academic background is college graduation or higher. Children with well-educated parents tend to have a strong will to get into good universities (Shin 2010). After experiencing parental job loss, these children may be unable to gain admission to the colleges they have aimed for, leading to delayed college enrollment.
In the past, economic crises contributed to parental unemployment. These days, government policies to prevent the spread of infectious diseases such as COVID-19 can also contribute to parental unemployment. Parental job loss may happen more frequently because of long-term economic recession and recurring epidemics. It is crucial to understand the effects of parental unemployment on children’s academic achievements so that the government can develop a policy targeting adolescents who cannot expect parental help. The government should develop a policy in the education system offering psychosocial support. First, the government should improve the quality of education. Many South Korean students study at profit-making private institutes instead of relying on regular school programs (Yoo 2021). If regular school programs fulfilled their standard role, the financial contraction resulting from parental job loss would not affect children’s academic performance as much. Additionally, the government and municipalities need to provide psychosocial support to households experiencing job loss. According to the OECD database, the proportion of dual-income households among South Korean households with children under 14 years of age was 29.4% in 2018, about half the OECD average of 60.7%. Since most households in South Korea rely on the earnings of the head of the household, parental unemployment greatly impacts a family’s economic situation (Lim 2020). Therefore, children experiencing parental job loss will suffer from the traumatic effects of job loss (Wightman 2012).

Author Contributions

Conceptualization, H.K. and E.L.; methodology, H.K. and E.L.; software, H.K. and S.K.; formal analysis, H.K. and E.L.; data curation, H.K.; writing—original draft preparation, H.K., E.L. and S.K.; writing—review and editing, H.K., E.L. and S.K.; supervision, E.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the 2019 research fund of Chungnam National University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data are available at the following website: https://www.kli.re.kr/klips_eng/index.do (accessed on 1 September 2019).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

We generated simulation data in the same manner as in Section 4.1. Since the IntCens method produced numerical singularity errors, we added random noise to the right bound time R of the simulated response variable values. The random noise was generated from the normal distribution with a mean of 0 and a standard deviation of 1. The MSE values of IntCens were the largest regardless of the simulation settings. In addition, its Monte Carlo standard deviation of the MSE values was the largest, which implies unstable model estimation. We concluded that IntCens was not suitable for the KLIPS data.
Table A1. MSE values are presented when the exponential distribution is assumed.
Right censoring: 20%. Column headers give sample size / interval censoring rate.
| Method | 100 / 15% | 100 / 30% | 100 / 45% | 300 / 15% | 300 / 30% | 300 / 45% | 1000 / 15% | 1000 / 30% | 1000 / 45% |
|---|---|---|---|---|---|---|---|---|---|
| omission | 0.905 | 0.495 | 0.552 | 0.207 | 0.141 | 0.149 | 0.061 | 0.058 | 0.072 |
| midpoint imputation | 0.857 | 0.397 | 0.448 | 0.192 | 0.137 | 0.133 | 0.059 | 0.053 | 0.062 |
| multiple imputation | 0.775 | 0.392 | 0.441 | 0.200 | 0.144 | 0.140 | 0.064 | 0.060 | 0.065 |
| IntCens | 2.887 | 0.723 | 1.144 | 0.485 | 0.292 | 0.363 | 0.174 | 0.189 | 0.167 |
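Table A2. MSE when the Weibull distribution is assumed.
Right censoring: 20%. Column headers give sample size / interval censoring rate.
| Method | 100 / 15% | 100 / 30% | 100 / 45% | 300 / 15% | 300 / 30% | 300 / 45% | 1000 / 15% | 1000 / 30% | 1000 / 45% |
|---|---|---|---|---|---|---|---|---|---|
| omission | 0.619 | 0.327 | 0.980 | 0.127 | 0.198 | 0.221 | 0.049 | 0.054 | 0.067 |
| midpoint imputation | 0.468 | 0.314 | 0.686 | 0.122 | 0.163 | 0.188 | 0.045 | 0.045 | 0.055 |
| multiple imputation | 0.486 | 0.344 | 0.688 | 0.116 | 0.166 | 0.187 | 0.049 | 0.048 | 0.060 |
| IntCens | 0.818 | 0.946 | 1.495 | 0.330 | 0.444 | 0.434 | 0.155 | 0.169 | 0.167 |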
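Table A3. MSE when the flexible-hazard method is assumed.
Right censoring: 20%. Column headers give sample size / interval censoring rate.
| Method | 100 / 15% | 100 / 30% | 100 / 45% | 300 / 15% | 300 / 30% | 300 / 45% | 1000 / 15% | 1000 / 30% | 1000 / 45% |
|---|---|---|---|---|---|---|---|---|---|
| omission | 0.502 | 0.708 | 1.172 | 0.197 | 0.250 | 0.217 | 0.040 | 0.054 | 0.071 |
| midpoint imputation | 0.484 | 0.530 | 0.643 | 0.163 | 0.128 | 0.150 | 0.035 | 0.038 | 0.051 |
| multiple imputation | 0.490 | 0.485 | 0.605 | 0.163 | 0.109 | 0.133 | 0.036 | 0.034 | 0.051 |
| IntCens | 1.156 | 1.092 | 1.481 | 0.470 | 0.456 | 0.555 | 0.255 | 0.203 | 0.177 |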

References

  1. Baum, Charles L., II. 2003. Does early maternal employment harm child development? An analysis of the potential benefits of leave taking. Journal of Labor Economics 21: 409–48.
  2. Becker, Gary S. 2009. Human Capital: A Theoretical and Empirical Analysis, with Special Reference to Education. Chicago: University of Chicago Press.
  3. Becker, Gary S., and Nigel Tomes. 1986. Human capital and the rise and fall of families. Journal of Labor Economics 4: S1–S39.
  4. Berger, Lawrence M., Christina Paxson, and Jane Waldfogel. 2009. Income and child development. Children and Youth Services Review 31: 978–89.
  5. Blau, David M. 1999. The effect of income on child development. Review of Economics and Statistics 81: 261–76.
  6. Brand, Jennie E. 2015. The far-reaching impact of job loss and unemployment. Annual Review of Sociology 41: 359.
  7. Breen, Richard, and John H. Goldthorpe. 1997. Explaining educational differentials: Towards a formal rational action theory. Rationality and Society 9: 275–305.
  8. Charles, Kerwin Kofi, and Melvin Stephens Jr. 2004. Job displacement, disability, and divorce. Journal of Labor Economics 22: 489–522.
  9. Codjoe, Henry M. 2007. The importance of home environment and parental encouragement in the academic achievement of African-Canadian youth. Canadian Journal of Education 30: 137–56.
  10. Coelli, Michael B. 2011. Parental job loss and the education enrollment of youth. Labour Economics 18: 25–35.
  11. Cox, David R. 1972. Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological) 34: 187–202.
  12. Delord, Marc, and Emmanuelle Génin. 2015. Multiple imputation for competing risks regression with interval-censored data. Algorithms 11: 22.
  13. DiPrete, Thomas A., and Patricia A. McManus. 2000. Family change, employment transitions, and the welfare state: Household income dynamics in the United States and Germany. American Sociological Review 65: 343–70.
  14. Finkelstein, Dianne M. 1986. A proportional hazards model for interval-censored failure time data. Biometrics 42: 845–54.
  15. Gangl, Markus. 2006. Scar effects of unemployment: An assessment of institutional complementarities. American Sociological Review 71: 986–1013.
  16. Goldthorpe, John H. 1996. Class analysis and the reorientation of class theory: The case of persisting differentials in educational attainment. British Journal of Sociology 47: 481–505.
  17. Groeneboom, Piet, and Jon A. Wellner. 1992. Information Bounds and Nonparametric Maximum Likelihood Estimation. Berlin: Springer Science & Business Media, vol. 19.
  18. Jahoda, Marie, Paul F. Lazarsfeld, Hans Zeisel, and Christian Fleck. 2017. Marienthal: The Sociography of an Unemployed Community. London: Routledge.
  19. Johnson, Rucker C., Ariel Kalil, and Rachel E. Dunifon. 2012. Employment patterns of less-skilled workers: Links to children’s behavior and academic progress. Demography 49: 747–72.
  20. Kalil, Ariel, and Patrick Wightman. 2011. Parental job loss and children’s educational attainment in black and white middle-class families. Social Science Quarterly 92: 57–78.
  21. Kim, Eun-jung. 2007. A study on the social economic status of the family, extra tutoring fee, parent-child relationship and children’s educational achievement. Korean Journal of Sociology 41: 134–62.
  22. Kopycka, Katarzyna. 2021. Higher education expansion, system transformation, and social inequality. Social origin effects on tertiary education attainment in Poland for birth cohorts 1960 to 1988. Higher Education 81: 643–64.
  23. Ku, In-Hoe. 2003a. The effect of economic loss and income levels on adolescents’ educational attainment. Korean Journal of Social Welfare 53: 7–30.
  24. Ku, In Hoe. 2003b. The effect of family background on adolescents educational attainment. Korean Journal of Social Welfare Studies 22: 5–32.
  25. Lim, Yong Bin. 2020. Labor market and economic activity of dual-income households. Labor Review 180: 79–94.
  26. Lindemann, Kristina, and Markus Gangl. 2019. The intergenerational effects of unemployment: How parental unemployment affects educational transitions in Germany. Research in Social Stratification and Mobility 62: 100410.
  27. Mörk, Eva, Anna Sjögren, and Helena Svaleryd. 2020. Consequences of parental job loss on the family environment and on human capital formation-evidence from workplace closures. Labour Economics 67: 101911.
  28. Muola, James Muthee. 2010. A study of the relationship between academic achievement motivation and home environment among standard eight pupils. Educational Research and Reviews 5: 213–17.
  29. Nielsen, François, and J. Micah Roos. 2015. Genetics of educational attainment and the persistence of privilege at the turn of the 21st century. Social Forces 94: 535–61.
  30. OECD. 2020. Population with Tertiary Education. Paris: OECD.
  31. Pan, Wei. 2000. A multiple imputation approach to Cox regression with interval-censored data. Biometrics 56: 199–203.
  32. Pan, Weixiang, and Ben Ost. 2014. The impact of parental layoff on higher education investment. Economics of Education Review 42: 53–63.
  33. Parveen, Azra. 2007. Effect of Home Environment on Personality and Academic Achievement of Students of Grade 12 in Rawalpindi Division. Ph.D. thesis, National University of Modern Languages, Islamabad, Pakistan.
  34. Rege, Mari, Kjetil Telle, and Mark Votruba. 2011. Parental job loss and children’s school performance. The Review of Economic Studies 78: 1462–89.
  35. Rubin, Donald B. 1987. Multiple Imputation for Nonresponse in Surveys. Wiley Series in Probability and Statistics; New York: Wiley.
  36. Shin, Myung-Ho. 2010. The academic performance gap between social classes and parenting practices. Korean Journal of Social Welfare Studies 41: 217–45.
  37. Turnbull, Bruce W. 1976. The empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society: Series B (Methodological) 38: 290–95.
  38. Van de Werfhorst, Herman G., Alice Sullivan, and Sin Yi Cheung. 2003. Social class, ability and choice of subject in secondary and tertiary education in Britain. British Educational Research Journal 29: 41–62.
  39. Wei, Greg C. G., and Martin A. Tanner. 1991. Applications of multiple imputation to the analysis of censored regression data. Biometrics 47: 1297–1309.
  40. Wellner, Jon A., and Yihui Zhan. 1997. A hybrid algorithm for computation of the nonparametric maximum likelihood estimator from censored data. Journal of the American Statistical Association 92: 945–59.
  41. Wightman, Patrick. 2012. Parental Job Loss, Parental Ability and Children’s Educational Attainment. Population Studies Center. Available online: http://www.psc.isr.umich.edu/pubs/abs/7648 (accessed on 1 March 2022).
  42. Yang, Kyung-Eun. 2016. Revisiting the effect of parents’ socioeconomic status on students’ academic performance in relation to welfare state regimes. Journal of Critical Social Welfare, 146–74.
  43. Yoo, Jin Seong. 2021. Analysis of the current status of educational indicators in Korea and the impact of private education. KERI Insight 2021: 1–32.
  44. Zeng, Donglin, Lu Mao, and D. Y. Lin. 2016. Maximum likelihood estimation for semiparametric transformation models with interval-censored data. Biometrika 103: 253–71.
Figure 1. ROC curves at a certain time point t when interval-censored data are imputed by the multiple imputation method.
Figure 2. ROC curves at a certain time point t when interval-censored data are imputed by the multiple imputation method.
Table 1. MSE values are presented when the exponential distribution is assumed.
Right censoring: 20%. Column headers give sample size / interval censoring rate.
| Method | 100 / 15% | 100 / 30% | 100 / 45% | 300 / 15% | 300 / 30% | 300 / 45% | 1000 / 15% | 1000 / 30% | 1000 / 45% |
|---|---|---|---|---|---|---|---|---|---|
| omission | 0.444 | 0.432 | 0.473 | 0.195 | 0.216 | 0.190 | 0.088 | 0.095 | 0.105 |
| midpoint imputation | 0.337 | 0.309 | 0.317 | 0.184 | 0.191 | 0.157 | 0.080 | 0.080 | 0.080 |
| multiple imputation | 0.333 | 0.305 | 0.305 | 0.188 | 0.187 | 0.154 | 0.080 | 0.080 | 0.080 |
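Table 2. MSE when the Weibull distribution is assumed.
Right censoring: 20%. Column headers give sample size / interval censoring rate.
| Method | 100 / 15% | 100 / 30% | 100 / 45% | 300 / 15% | 300 / 30% | 300 / 45% | 1000 / 15% | 1000 / 30% | 1000 / 45% |
|---|---|---|---|---|---|---|---|---|---|
| omission | 0.323 | 0.341 | 0.359 | 0.223 | 0.234 | 0.239 | 0.075 | 0.081 | 0.090 |
| midpoint imputation | 0.293 | 0.283 | 0.278 | 0.221 | 0.225 | 0.218 | 0.071 | 0.071 | 0.070 |
| multiple imputation | 0.295 | 0.283 | 0.276 | 0.218 | 0.222 | 0.216 | 0.071 | 0.071 | 0.071 |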
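Table 3. MSE when the flexible-hazard method is assumed.
Right censoring: 20%. Column headers give sample size / interval censoring rate.
| Method | 100 / 15% | 100 / 30% | 100 / 45% | 300 / 15% | 300 / 30% | 300 / 45% | 1000 / 15% | 1000 / 30% | 1000 / 45% |
|---|---|---|---|---|---|---|---|---|---|
| omission | 0.740 | 0.775 | 0.952 | 0.249 | 0.313 | 0.406 | 0.061 | 0.075 | 0.090 |
| median imputation | 0.694 | 0.710 | 0.788 | 0.233 | 0.212 | 0.220 | 0.056 | 0.055 | 0.057 |
| multiple imputation | 0.689 | 0.683 | 0.726 | 0.231 | 0.212 | 0.216 | 0.056 | 0.055 | 0.055 |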
Table 4. Variables and their descriptive statistics.
| Variable | Category | Frequency | Proportion (%) |
|---|---|---|---|
| education level of householder | middle school graduation or less (1) | 153 | 15.5% |
| | high school graduation or less (2) | 458 | 46.3% |
| | college graduation or more (3) | 378 | 38.2% |
| sex of the first child | male (0) | 500 | 50.6% |
| | female (1) | 489 | 49.4% |
| poverty | no (0) | 926 | 93.6% |
| | yes (1) | 63 | 6.4% |
| whether parents are unemployed | no (0) | 963 | 97.4% |
| | yes (1) | 26 | 2.6% |
| double income | no (0) | 109 | 11.0% |
| | yes (1) | 880 | 89.0% |
| the number of household members | 2 | 19 | 1.9% |
| | 3 | 130 | 13.1% |
| | 4 | 635 | 64.2% |
| | 5 | 173 | 17.5% |
| | 6 | 32 | 3.2% |
| censoring | right censoring | 58 | 5.9% |
| | interval censoring | 79 | 7.9% |
| | no censoring | 852 | 86.2% |
| Total | | 989 | 100% |
Table 5. Comparison of model estimation for the sample whose householder graduated high school or less.
High school graduation or less. Omission: n = 576, number of events = 524. Midpoint imputation: n = 610, number of events = 558. Multiple imputation: n = 610, number of events = 558.
| Variable | Method | β̂ | exp(β̂) | se(β̂) | p-Value |
|---|---|---|---|---|---|
| sex | Omission | 0.108 | 1.114 | 0.108 | 0.316 |
| | Midpoint imputation | 0.074 | 1.077 | 0.086 | 0.349 |
| | Multiple imputation | 0.055 | 1.056 | 0.113 | 0.628 |
| double income | Omission | 0.285 | 1.330 | 0.186 | 0.125 |
| | Midpoint imputation | 0.138 | 1.148 | 0.148 | 0.370 |
| | Multiple imputation | 0.068 | 1.071 | 0.195 | 0.727 |
| whether parents are unemployed | Omission | −0.787 | 0.455 | 0.384 | 0.040 ** |
| | Midpoint imputation | 0.186 | 1.204 | 0.254 | 0.330 |
| | Multiple imputation | −0.877 | 0.416 | 0.391 | 0.025 ** |
| the number of household members | Omission | −0.091 | 0.913 | 0.085 | 0.283 |
| | Midpoint imputation | −0.043 | 0.958 | 0.063 | 0.409 |
| | Multiple imputation | −0.059 | 0.943 | 0.088 | 0.502 |
** denotes significance at 5% level.
Table 6. Comparison of significance of regression coefficients for the households whose householder graduated college or more.
College graduation or more. Omission: n = 333, number of events = 315. Midpoint imputation: n = 376, number of events = 357. Multiple imputation: n = 376, number of events = 357.
| Variable | Method | β̂ | exp(β̂) | se(β̂) | p-Value |
|---|---|---|---|---|---|
| sex | Omission | 0.114 | 1.120 | 0.108 | 0.292 |
| | Midpoint imputation | 0.108 | 1.114 | 0.108 | 0.316 |
| | Multiple imputation | 0.070 | 1.073 | 0.112 | 0.530 |
| double income | Omission | 0.381 | 1.464 | 0.188 | 0.043 ** |
| | Midpoint imputation | 0.285 | 1.330 | 0.186 | 0.125 |
| | Multiple imputation | 0.031 | 1.032 | 0.213 | 0.883 |
| whether parents are unemployed | Omission | −0.753 | 0.471 | 0.383 | 0.049 ** |
| | Midpoint imputation | −0.787 | 0.455 | 0.384 | 0.040 ** |
| | Multiple imputation | −0.857 | 0.425 | 0.388 | 0.027 ** |
| the number of household members | Omission | −0.119 | 0.888 | 0.888 | 0.176 |
| | Midpoint imputation | −0.091 | 0.913 | 0.085 | 0.283 |
| | Multiple imputation | −0.040 | 0.961 | 0.088 | 0.649 |
** denotes significance at 5% level.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Kim, H.; Kim, S.; Lee, E. Cox Proportional Hazards Regression for Interval-Censored Data with an Application to College Entrance and Parental Job Loss. Economies 2022, 10, 218. https://doi.org/10.3390/economies10090218
