Thomas Tsiampalis, Demosthenes Panagiotakos
Department of Nutrition and Dietetics, School of Health Science and Education, Harokopio University, Athens, Greece
Background: In studies of all-cause mortality, there is a one-to-one relation between the hazard and the survival function; as a consequence, regression models that focus on the hazard, such as the proportional hazards model, immediately dictate how the covariates relate to the survival function as well. However, this one-to-one relation no longer holds in the context of competing risks, where the concepts of the cause-specific hazard and the cumulative incidence function appear. Objective: The aim of the present work was to present two of the most popular methods (the cause-specific hazard model and the Fine & Gray model) through an application in cardiovascular disease (CVD) epidemiology, and to narratively review more recent publications based on either the frequentist or the Bayesian approach to inference. Methods: A narrative review of the most widely used methods in the competing risks setting was conducted and extended to more recent publications. For the application, our interest lay in modeling the risk of Coronary Heart Disease in the presence of vascular stroke using the cause-specific hazard and the Fine & Gray models, two of the most commonly encountered approaches. Results-Conclusions: After implementing these two approaches in the context of competing risks in CVD epidemiology, it is noted that while the Fine & Gray model incorporates information about the existence of a competing risk, its results are not as easy to interpret as those of the cause-specific hazard Cox model.
Key words: Competing risks, competing events, cardiovascular disease, epidemiology, survival analysis, Bayesian inference, Maximum likelihood
Corresponding author: Prof. Demosthenes B Panagiotakos, DrMed, FRSPH, FACE, Harokopio University, 70 Eleftheriou Venizelou Ave., 176 71, Athens, Greece, Tel. +30210-9549332, E-mail: email@example.com
Submission: 16.08.2019, Acceptance: 30.09.2019
Survival analysis is a set of statistical methods for analyzing the expected duration of time until one or more events happen, such as death. Most of its methods assume that each study participant is at risk of dying from only one cause, e.g., cardiovascular disease (CVD). However, participants are usually at risk of dying from more than one cause, such as death from stroke or from other atherosclerotic disease; these are called competing risks. According to Gichangi & Vach,1 competing risks are mutually exclusive events, such as death from different causes, since the occurrence of one of them prevents the occurrence of any other. In this case, the competing risk prevents us from observing the event of interest or modifies the probability of this event occurring. The methodology of competing risks is increasingly being applied to causes of death in order to calculate the probability of death broken down by specific cause. Such information is vital not only to inform patients about the risks they face, but also to make decisions about patient treatment and the allocation of health resources, and to understand the long-term outcomes of chronic conditions.
The statistical analysis and the interpretation of results in a competing risks setting are totally different from the case of a single event of interest. Specifically, in the context of competing risks, the appropriate statistical methodology is required to obtain correct estimates of the cumulative probability of each event,2-6 while, when comparing cause-specific outcomes, the effect of the competing risks should also be examined and the correct statistical test should be chosen for the event of interest.7-9 The major difference of the models used in this context is that both the time until the first event occurs and the type of this event are analyzed. Finally, beyond the models used for competing risks, the extension and modification of the basic concepts and functions of survival analysis are also of interest.
In the present work, theoretical issues of competing risks are presented and discussed in relation to the 10-year CVD risk using the ATTICA epidemiological study data, as an application.10
1.1. Basic concepts of competing risks analysis
The fundamental principles of competing risks have been examined extensively by many authors over the past years.10,11 Briefly, in competing risks data, an individual may potentially fail from any of several types of events, but only the time until the first of these failures can be observed and recorded. Therefore, the observations take the form (T, δ), where T is a non-negative random variable indicating the failure time and δ is an indicator that either records the type of failure that occurred or indicates that no failure has occurred yet (censoring). Although only one failure time is recorded, there is partial information about all types of events. For example, if two competing risks are studied and the patient is known to have failed from the first cause after 5 months, then it is also known that the patient had not experienced the second event during those 5 months.
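For readers who prefer formulas, the quantities discussed in this section can be written in terms of the observed pair (T, δ). The notation below (K causes, cause-specific hazard λ_k, overall survival S, cumulative incidence F_k and subdistribution hazard \tilde{\lambda}_k) is standard textbook notation chosen for illustration, not notation taken from the original article.

```latex
% Cause-specific hazard of cause k (k = 1, ..., K)
\lambda_k(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t,\ \delta = k \mid T \ge t)}{\Delta t}

% Overall event-free survival depends on ALL cause-specific hazards
S(t) = P(T > t) = \exp\Big\{ -\sum_{j=1}^{K} \int_0^t \lambda_j(u)\, du \Big\}

% Cumulative incidence function (CIF) of cause k
F_k(t) = P(T \le t,\ \delta = k) = \int_0^t \lambda_k(u)\, S(u)\, du

% Subdistribution hazard of cause k (Fine & Gray) and its one-to-one link to F_k
\tilde{\lambda}_k(t) = -\frac{d}{dt} \log\{1 - F_k(t)\}, \qquad
F_k(t) = 1 - \exp\Big\{ -\int_0^t \tilde{\lambda}_k(u)\, du \Big\}
```

Because F_k(t) involves every cause-specific hazard through S(u), a covariate effect on λ_k alone does not determine its effect on F_k(t), whereas the last line ties F_k(t) one-to-one to the subdistribution hazard, which is the property the Fine & Gray model exploits.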
The cause-specific hazard function plays a key role in this type of data, as does the marginal survival distribution, and modeling these quantities is the ultimate goal. The literature offers a number of models for these cases, which differ in whether they model the cumulative incidence or the cause-specific hazard. One of the best-known models in classical survival analysis is the proportional hazards model of Sir David Cox (1972),12 largely because its parameters have a direct interpretation. The Cox model can also be applied when there are competing risks, by treating each competing event as a censored observation with respect to the other risks. According to Prentice et al,13 the relative risks for each specific event (of interest and/or competing) are assessed by fitting a separate Cox model for each one; because they are fitted separately for each cause, these are known as cause-specific hazards models. As with the Cox model in classical survival analysis, a basic prerequisite for implementing the model is the proportional hazards assumption, namely that the hazard ratio remains constant over time. Building on this approach, Lunn & McNeil14 proposed a Cox proportional hazards model stratified by event type, obtained by appropriately rearranging the data, to replace the multiple cause-specific hazards models; this approach has been applied by a number of researchers.15,16
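The recoding that underlies both the separate cause-specific Cox fits and the Lunn & McNeil rearrangement can be sketched in a few lines of Python. This is a minimal illustration under assumed conventions: the record layout (subject_id, time, cause, with 0 meaning censored) and the helper names are hypothetical, and no model is fitted here; the resulting rows would be passed to whichever Cox routine is available.

```python
from typing import List, Tuple

# Hypothetical record layout: (subject_id, observed_time, cause),
# where cause 0 = censored and 1..K = type of the first event observed.
Record = Tuple[int, float, int]


def cause_specific_status(records: List[Record], cause: int) -> List[Tuple[int, float, int]]:
    """Recode the data for a cause-specific analysis of `cause`.

    Only events of that cause count as failures (status 1); competing events
    and true censoring both become censored observations (status 0) at their
    observed time, which is how a standard Cox routine is fed cause by cause.
    """
    return [(sid, t, 1 if c == cause else 0) for sid, t, c in records]


def lunn_mcneil_augment(records: List[Record], n_causes: int) -> List[Tuple[int, float, int, int]]:
    """Lunn & McNeil style data duplication: one row per subject per cause.

    Each output row is (subject_id, time, status, event_type); status is 1 only
    in the row whose event_type matches the observed cause, so the rows with
    event_type == k are exactly the cause-specific data set for cause k.
    """
    rows = []
    for sid, t, c in records:
        for k in range(1, n_causes + 1):
            rows.append((sid, t, 1 if c == k else 0, k))
    return rows


data = [(1, 5.0, 1), (2, 3.0, 2), (3, 7.0, 0)]  # toy data, not ATTICA
print(cause_specific_status(data, cause=1))  # [(1, 5.0, 1), (2, 3.0, 0), (3, 7.0, 0)]
print(lunn_mcneil_augment(data, n_causes=2))
```

In the Lunn & McNeil layout, a Cox model stratified by event_type and including covariate-by-event_type interactions reproduces the separate cause-specific fits, because the partial likelihood factorizes over strata; dropping an interaction forces that covariate's effect to be equal across causes.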
An alternative approach is the model proposed by Fine & Gray.17 The difference between this model and the previous one lies in the fact that Fine & Gray's approach is based on, and aims at modeling, the cumulative incidence function, i.e., the probability that a person experiences the event of interest before time t.17 Within this approach, the hazards involved are referred to as subdistribution hazards. The two models differ in how they handle people who experience a competing event with respect to the risk set: the cause-specific hazards model simply removes these individuals from the risk set, whereas in the approach suggested by Fine and Gray17 these individuals remain in the risk set even after experiencing a competing event. Thus, the risk set contains both the individuals for whom the event of interest may still be observed and the individuals for whom it cannot be observed because a competing event has occurred. As reported by Fine & Gray17 and by Andersen et al,18 the cause-specific hazard represents the instantaneous rate of occurrence of the event of interest at time t among those who are still event-free just before that time, whereas the subdistribution hazard of Fine & Gray17 represents the instantaneous rate of occurrence of the event of interest at time t among individuals who are either still event-free or have experienced a competing event before that time.
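The difference in risk sets can be made concrete with a toy example. The sketch below assumes a hypothetical record layout and hypothetical helper names, and it ignores the censoring weights that Fine & Gray apply, so it shows only which subjects belong to each risk set at a given time.

```python
from typing import List, Tuple

# Hypothetical record layout: (subject_id, observed_time, cause), where
# cause 0 = censored, 1 = event of interest, 2 = competing event.
Record = Tuple[int, float, int]


def cause_specific_risk_set(records: List[Record], t: float) -> List[int]:
    """Subjects still under observation and free of any event just before t."""
    return sorted(sid for sid, time, _ in records if time >= t)


def fine_gray_risk_set(records: List[Record], t: float, interest: int = 1) -> List[int]:
    """Fine & Gray risk set: subjects still at risk at t, plus subjects who
    already had a competing event before t (they are never removed).

    Only set membership is shown; with censored data Fine & Gray additionally
    down-weight the retained competing-event subjects (inverse probability of
    censoring weights), which this sketch omits.
    """
    return sorted(
        sid
        for sid, time, cause in records
        if time >= t or (cause not in (0, interest) and time < t)
    )


toy = [(1, 2.0, 2), (2, 4.0, 1), (3, 6.0, 0), (4, 8.0, 1)]  # toy data
print(cause_specific_risk_set(toy, 5.0))  # [3, 4]
print(fine_gray_risk_set(toy, 5.0))       # [1, 3, 4]
```

Subject 1, who had the competing event at time 2, drops out of the cause-specific risk set but is retained in the Fine & Gray risk set at time 5; subject 2, who had the event of interest at time 4, is excluded from both.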
In addition to the aforementioned models, the field has developed significantly over the years, resulting in a number of newer techniques. Alaa & Van der Schaar proposed a new estimation model based on Gaussian processes.19 Specifically, they developed a non-parametric model based on Bayesian inference that can be used to jointly assess a patient's risk of multiple (competing) adverse outcomes. This model treats survival times in the competing risks setting as the output of a deep multi-task Gaussian (DMG) process that takes patient characteristics into account. Using these DMG processes, nonlinear interactions between patient characteristics and the survival times for each competing hazard can be modeled without parametric assumptions. In addition, Bellot & Van der Schaar20 presented the Tree-based Bayesian mixture model for these settings. Specifically, the authors proposed a new semi-parametric regression model based on the Bayesian approach to inference, the Hierarchical Bayesian Mixture model, to describe survival pathways in which the patient's characteristics affect both the assessment of the event type and the subsequent chance of survival, through multivariate random forests. At the same time, Zhang & Zhou21 presented a new methodology based on Bayesian inference, named Lomax Delegate Racing (LDR). This approach aims to model the survival mechanism under competing risks and to interpret how patient characteristics accelerate or slow the time until an event occurs. The model captures non-monotonic effects of patient characteristics across competing risks, while relaxing the rigid proportional hazards assumption mentioned above. Its advantage is the ability to handle not only censored data but also incomplete data, in terms of both the survival time and the type of event that occurred.
2. APPLICATION OF COMPETING RISKS’ MODELING
Based on the principles of competing risks theory described above, an example of competing risks analysis based on the 10-year follow-up of the ATTICA epidemiological study10 is presented below, applying the most widely known and established methods used in the competing risks setting.
2.1. Description of the ATTICA study
In brief, the ATTICA study is an epidemiological survey established in 2001 in the Athens metropolitan area; its primary objectives were to assess the prevalence of CVD risk factors in the general population, to relate the risk factors to other characteristics of the individuals and to assess the effect of various factors on the risk of developing CVD. The baseline sampling was performed during 2001-2002 by trained personnel; the sample consisted of 3,042 adults (18-89 years), of whom 1,528 were women and 1,514 men, with no clinically manifested CVD or any other chronic disease. Various participant characteristics were measured at the baseline examination, i.e., demographic, social, biological, clinical, nutritional, behavioral and lifestyle characteristics; moreover, a validated healthy aging index (HAI), ranging from 0 to 10 and using 10 attributes that reflect the aging process, was applied for assessing successful ageing. The index encompasses health-related social, lifestyle and clinical factors, including education, financial status, physical activity, Body Mass Index (BMI), depressive symptomatology, participation in social activities with friends and family, number of yearly excursions, total number of clinical CVD risk factors (i.e., history of hypertension, diabetes, hypercholesterolemia, obesity) and level of adherence to the Mediterranean diet (using MedDietScore). The 5-year and 10-year follow-ups were performed in 2006 and 2012, respectively. The incidence of CVD events (Coronary Heart Disease (CHD), stroke or any other CVD), as well as the development of hypertension, diabetes, dyslipidemia and obesity, were recorded at each time-point. Details about the procedures of the ATTICA Study may be found elsewhere.10
In the present work, we focused on the data concerning the 10-year follow-up (2002-2012) of the ATTICA Study; of the 3,042 initially enrolled participants, 2,583 were found during the follow-up (85% participation rate), with an average age of 45.2 years (standard deviation = 13.9 years). Of those lost to follow-up (n=459), 224 could not be found because of missing or wrong addresses and telephone numbers provided at the baseline examination, and 235 refused to be re-examined. For the follow-up, all participants were first contacted by telephone to schedule an appointment; the investigators then met them and performed a detailed evaluation of their medical records. For those who had died, information was obtained from their relatives and from death certificates. Finally, for individuals who may have first suffered a stroke and then developed CHD, it was decided a priori to consider the first outcome as the end-point, while also recording the subsequent event for further testing of competing risks.
In the context of this application, we are interested in modeling the risk of a CVD event in the presence of a competing risk. Specifically, our interest lies in modeling the risk of CHD (n=306 patients) in the presence of vascular stroke (n=24 patients), as a function of various demographic and clinical characteristics of the participants.
Table 1 presents the results of the Cox model fitted without using the information about the existence of the competing risk, i.e., treating individuals who presented a vascular stroke as censored observations (results are presented as Hazard Ratios (HR) and 95% Confidence Intervals (CI)). The findings presented in Table 1 cannot be considered accurate and valid, as they do not take into account the fact that patients can also die from the competing risk of stroke. Nevertheless, as mentioned in the previous sections, these results can be seen as an attempt to model the cause-specific hazard. However, to properly interpret the estimated parameters, the hazard of stroke should be modeled simultaneously (see Table 2). It should be noted, though, that in the present work only the statistically significant predictors are presented in the Tables, and not those known to be associated with the risk of death from stroke.22,23
Combining the information presented in Tables 1 and 2, it appears that older people have a higher risk of developing CHD, but also a higher risk of vascular stroke. On the other hand, it is worth noting that while the sex of the patients, their blood glucose levels, the metabolic syndrome and the healthy aging index are significantly associated with the hazard of CHD, they do not appear to be associated with the risk of stroke. It therefore appears that the correct interpretation of the results in the context of the competing risks requires a simultaneous reference both to the risk of CHD and to the risk of stroke.
The second most common approach in these situations is the Fine & Gray model (1999), presented above. The advantage of this approach over the former is that, unlike with cause-specific hazards, there is a one-to-one correspondence between the (subdistribution) hazard and the cumulative incidence of the respective type of event. In the present case, this means that the cumulative incidence of CHD is a function only of the subdistribution hazard of this event and not also of the hazard of stroke. Table 3 presents the results of this model; the same predictors were found as in Table 1. What changes, however, is the interpretation of the parameters. Specifically, for the HAI, where the HR was found to be 0.62, a 1-unit increase in the index is associated with a 38% reduction in the subdistribution hazard of dying from CHD, where the risk set includes both those who have not yet had CHD and those who have already died from the competing risk of vascular stroke. In practice, this result could be interpreted as a 1-unit increase in the HAI being significantly associated with a reduction of about 40% in the hazard of dying from CHD among people still living or among people who have already died from vascular stroke. Therefore, after implementing the two most commonly encountered approaches in the context of competing risks in CVD epidemiology, it is noted that while the Fine & Gray model incorporates information about the existence of a competing risk, its results are not as easy and accessible to interpret as those of the cause-specific hazard Cox model.
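The arithmetic behind the HAI interpretation can be made explicit. The sketch below uses a hypothetical helper, not part of any package, that converts a per-unit hazard ratio into a percentage change; the value 0.62 is the HAI estimate quoted from Table 3, and the 2-unit case is included only to show that effects compound multiplicatively rather than additively.

```python
def percent_change_in_hazard(hr_per_unit: float, units: float = 1.0) -> float:
    """Percent change in the hazard for a `units`-unit increase in a covariate.

    Under proportional (sub)distribution hazards the effect is multiplicative,
    so the hazard ratio for `units` units is hr_per_unit ** units.
    """
    return (hr_per_unit ** units - 1.0) * 100.0


print(round(percent_change_in_hazard(0.62), 1))           # -38.0: 38% lower per 1 unit of HAI
print(round(percent_change_in_hazard(0.62, units=2), 1))  # -61.6: not 2 x 38%
```

Note that this percentage refers to the subdistribution hazard; it does not translate into the same percentage reduction in the cumulative incidence of CHD.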
3. CONCLUSIVE REMARKS
In the present article the competing risks methodology was presented, starting with the most widely known analytical approaches and extending to more recently proposed methods. Whenever there are two or more competing events, most of the scientific community uses either the cause-specific hazard model or the Fine & Gray model, two of the most well-established analytical approaches. Results based on the first approach are straightforward to interpret, whereas those obtained with the Fine & Gray model are not. Both models account for competing risks and allow the effect of covariates on different hazard functions to be modeled. However, the second model, described as a CIF (Cumulative Incidence Function) regression model, makes explicit the link between the subdistribution hazard and the effect on the incidence of an event. That is, one may directly predict the cumulative incidence of an event of interest using the usual relationship between the hazard and the incidence function under the proportional hazards model. Thus, the subdistribution hazard model allows one to estimate the effect of covariates on the cumulative incidence function of the event of interest.
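The "usual relationship" referred to above can be written out: if the subdistribution hazard is proportional, the CIF for covariate vector x is 1 - (1 - F0(t))^exp(βx), where F0 is the baseline CIF. The sketch below applies this formula; the baseline CIF values and the helper name are hypothetical, and in practice both the baseline CIF and β would come from a fitted Fine & Gray model.

```python
import math
from typing import List


def predicted_cif(baseline_cif: List[float], beta: List[float], x: List[float]) -> List[float]:
    """Covariate-specific cumulative incidence under a Fine & Gray model.

    With subdistribution hazard h0(t) * exp(beta.x), the CIF satisfies
    F(t | x) = 1 - exp(-H0(t) * exp(beta.x)) = 1 - (1 - F0(t)) ** exp(beta.x),
    where F0(t) is the baseline CIF (all covariates equal to 0).
    """
    hr = math.exp(sum(b * xi for b, xi in zip(beta, x)))
    return [1.0 - (1.0 - f0) ** hr for f0 in baseline_cif]


# Hypothetical baseline CIF at three follow-up times and an HR of 0.62 per unit.
base = [0.05, 0.10, 0.20]
print(predicted_cif(base, [math.log(0.62)], [1.0]))
```

Because the exponent acts on 1 - F0(t), a subdistribution HR of 0.62 lowers the predicted cumulative incidence by less than 38% in relative terms, which is one reason the parameters are harder to read directly.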
Given the availability of statistical software, analyses of cumulative incidence functions have become increasingly popular and widely reported in recent years. CIFs can be estimated in R using the cuminc function in the cmprsk package, while subdistribution hazard models can be fitted in R using the crr function in the same package. In SAS, PROC PHREG permits estimation of subdistribution hazard models through the 'eventcode' option in the MODEL statement, while in Stata the stcrreg command fits subdistribution hazard regression models. Cause-specific hazards models can be fitted in any statistical software that estimates the conventional Cox proportional hazards model, simply by treating subjects who experience a competing event as censored at the time of the competing event. In R, one can use the coxph function in the survival package; in SAS, PROC PHREG; and in Stata, the stcox command.
As illustrated in our application, when it comes to regression modeling, we have to choose whether our conclusions should focus on cause-specific hazards or on cumulative incidences. Cause-specific hazards models based on Cox regression are easy to fit and provide simple interpretations, but they do not provide simple relationships between the covariates and the more easily interpretable cumulative incidences. Such relationships are obtained with the Fine & Gray model, but the interpretation of its parameters is much more difficult. One major difference between the two approaches is the risk set: in the second approach, the risk set consists of people who are either still living or have already died from the competing event. Moreover, the field has developed significantly over the years, resulting in newer techniques based either on the frequentist or on the Bayesian approach to inference.
Therefore, when deciding how to make inference in a competing risks situation, the aforementioned properties, as well as the assessment of model fit, should be kept in mind. Both rates and risks for all competing events remain useful and complement each other when studying models for competing risks. However, cause-specific hazards may be more relevant when disease etiology is of interest, while cumulative incidences are easier to interpret and more relevant for prediction.
Researchers need to decide whether the research objective is to address etiologic questions or to estimate incidence or predict prognosis.
Cumulative incidence functions (CIFs) should be used to estimate the incidence of each of the different types of competing risks, while the Kaplan-Meier estimate of the survival function (with competing events treated as censored) should not be used for this purpose.
The Fine-Gray subdistribution hazard model should be used when the focus is on estimating incidence or predicting prognosis in the presence of competing risks, while the cause-specific hazard model should be used when the focus is on addressing etiologic questions.
Finally, in order to permit a full understanding of the effect of covariates on the incidence and the rate of occurrence of each outcome, in several cases, both types of regression models should be estimated for each of the competing risks.
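The recommendation to use CIFs rather than the Kaplan-Meier estimate can be checked on a toy data set without censoring, where the true proportion of subjects failing from each cause is known. The sketch below implements the standard nonparametric (Aalen-Johansen) CIF estimator and, for comparison, the naive 1 - Kaplan-Meier estimate that treats competing events as censored; the data layout, toy values and function names are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical data layout: (time, cause) with cause 0 = censored,
# 1 = event of interest, 2 = competing event.
Obs = Tuple[float, int]


def aalen_johansen_cif(data: List[Obs], cause: int) -> List[Tuple[float, float]]:
    """Nonparametric cumulative incidence of `cause` at each event time:
    F_k(t) = sum over event times t_j <= t of S(t_j-) * d_kj / n_j,
    where S is the Kaplan-Meier estimate of remaining free of ANY event."""
    surv, cif, out = 1.0, 0.0, []
    for tj in sorted({t for t, c in data if c != 0}):
        n_j = sum(1 for t, _ in data if t >= tj)                 # at risk at t_j
        d_all = sum(1 for t, c in data if t == tj and c != 0)    # events of any cause
        d_k = sum(1 for t, c in data if t == tj and c == cause)  # events of this cause
        cif += surv * d_k / n_j
        surv *= 1.0 - d_all / n_j
        out.append((tj, cif))
    return out


def one_minus_km(data: List[Obs], cause: int) -> List[Tuple[float, float]]:
    """Naive estimate: 1 - Kaplan-Meier, treating competing events as censored."""
    surv, out = 1.0, []
    for tj in sorted({t for t, c in data if c == cause}):
        n_j = sum(1 for t, _ in data if t >= tj)
        d_k = sum(1 for t, c in data if t == tj and c == cause)
        surv *= 1.0 - d_k / n_j
        out.append((tj, 1.0 - surv))
    return out


toy = [(1.0, 1), (2.0, 2), (3.0, 1), (4.0, 2)]  # no censoring: 2 of 4 had cause 1
print(aalen_johansen_cif(toy, 1)[-1])  # (4.0, 0.5)
print(one_minus_km(toy, 1)[-1])        # (3.0, 0.625): overestimates 0.5
```

Half of the toy subjects failed from cause 1, and the CIF recovers 0.5, whereas 1 - Kaplan-Meier gives 0.625 because it implicitly assumes that subjects removed by the competing event could still go on to fail from cause 1.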
Conflict of interest
None to declare
The authors would like to thank the ATTICA study group of investigators: Yannis Skoumas, Natassa Katinioti, Labros Papadimitriou, Constantina Masoura, Spiros Vellas, Yannis Lentzas, Manolis Kambaxis, Konstadina Palliou, Vassiliki Metaxa, Agathi Ntzouvani, Dimitris Mpougatsas, Nikolaos Skourlis, Christina Papanikolaou, Georgia-Maria Kouli, Aimilia Christou, Adella Zana, Maria Ntertimani, Aikaterini Kalogeropoulou, Evangelia Pitaraki, Alexandros Laskaris, Mihail Hatzigeorgiou and Athanasios Grekas for their assistance in the initial physical examination and follow-up evaluation, Efi Tsetsekou for her assistance in psychological evaluation, as well as the laboratory team: Carmen Vassiliadou and George Dedoussis (genetic analysis), Marina Toutouza-Giotsa, Constadina Tselika and Sia Poulopoulou (biochemical analysis) and Maria Toutouza for the database management.
1. Gichangi A, Vach W. The analysis of competing risks data: A guided tour. Stat Med. 2005 Jan; 132(4):1-41.
2. Gelman R, Gelber R, Henderson IC, Coleman CN, Harris JR. Improved methodology for analyzing local and distant recurrence. J Clin Oncol. 1990 Mar; 8(3):548-55.
3. Korn EL, Dorey FJ. Applications of crude incidence curves. Stat Med. 1992 Apr; 11(6):813-29.
4. Gaynor JJ, Feuer EJ, Tan CC, Wu DH, Little CR, Straus DJ, et al. On the use of cause-specific failure and conditional failure probabilities: examples from clinical oncology data. JASA. 1993 Jun; 88(422):400-09.
5. Caplan RJ, Pajak TF, Cox JD. Analysis of probability and risk of cause-specific failure. Int J Radiat Oncol Biol Phys. 1994 Jul; 29(5):1183-6.
6. Gooley TA, Leisenring W, Crowley J, Storer BE. Estimation of failure probabilities in the presence of competing risks: new representations of old estimators. Stat Med. 1999 May; 18(6):695-706.
7. Freidlin B, Korn EL. Testing treatment effects in the presence of competing risks. Stat Med. 2005 Jun; 24(11):1703-12.
8. Williamson PR, Kolamunnage-Dona R, Tudur Smith C. The influence of competing-risks setting on the choice of hypothesis test for treatment effect. Biostatistics. 2007 Oct; 8(4):689-94.
9. Dignam JJ, Kocherginsky MN. Choice and interpretation of statistical tests used when competing risks are present. J Clin Oncol. 2008 Aug; 26(24):4027.
10. Panagiotakos DB, Georgousopoulou EN, Pitsavos C, Chrysohoou C, Metaxa V, Georgiopoulos GA, et al. Ten-year (2002-2012) cardiovascular disease incidence and all-cause mortality, in urban Greek population: the ATTICA Study. Int J Cardiol. 2015 Feb; 180:178-84.
11. Allignol A, Schumacher M, Wanner C, Drechsler C, Beyersmann J. Understanding competing risks: a simulation point of view. BMC Med Res Methodol. 2011 Jun; 11(1):86.
12. Cox DR. Regression models and life-tables. J R Stat Soc Series B Stat Methodol. 1972; 34(2):187-202.
13. Prentice RL, Kalbfleisch JD, Peterson Jr AV, Flournoy N, Farewell VT, Breslow NE. The analysis of failure times in the presence of competing risks. Biometrics. 1978 Dec; 34(4):541-54.
14. Lunn M, McNeil D. Applying Cox regression to competing risks. Biometrics. 1995 Jun; 524-32.
15. Putter H, Fiocco M, Geskus RB. Tutorial in biostatistics: competing risks and multistate models. Stat Med. 2007 May; 26(11):2389-430.
16. Lim HJ, Zhang X, Dyck R, Osgood N. Methods of competing risks analysis of end-stage renal disease and mortality among people with diabetes. BMC Med Res Methodol. 2010 Oct; 10(1):97.
17. Fine JP, Gray RJ. A proportional hazards model for subdistribution of a competing risk. JASA. 1999 Jun; 94(446):496-509.
18. Andersen PK, Geskus RB, de Witte T, Putter H. Competing risks in epidemiology: possibilities and pitfalls. Int J Epidemiol. 2012 Jun; 41(3):861-70.
19. Alaa AM, Van der Schaar M. Deep multi-task Gaussian processes for survival analysis with competing risks. 31st Conference on Neural Information Processing Systems. Curran Associates Inc. 2017 Dec 4-9; Long Beach, CA, USA. pp. 2326-34.
20. Bellot A, Van der Schaar M. Tree-based Bayesian mixture model for competing risks. 21st International Conference on Artificial Intelligence and Statistics. 2018 April 9-11, Playa Blanca, Lanzarote, Canary Islands. pp. 910-8.
21. Zhang Q, Zhou M. Nonparametric Bayesian Lomax delegate racing for survival analysis with competing risks. 32nd Conference on Neural Information Processing Systems. 2018 Dec 3-8, Montreal. pp. 5002-13.
22. Gikas A, Lambadiari V, Sotiropoulos A, Panagiotakos D, Pappas S. Prevalence of major cardiovascular risk factors and coronary heart disease in a sample of Greek adults: the Saronikos study. Open Cardiovasc Med J. 2016 May; 10:69-80.
23. Gutierrez J, Alloubani A, Mari M, Alzaatreh M. Cardiovascular disease risk factors: Hypertension, diabetes mellitus and obesity among Tabuk Citizens in Saudi Arabia. Open Cardiovasc Med J. 2018 Apr; 12:41-49.