1. Introduction
In the past two decades, the human genome has been successfully sequenced. Whole genome sequencing (WGS) and genome-wide association studies (GWASs) of human genomes (as well as those of other organisms) have become an everyday occurrence [1]. Our knowledge of genetic variants, particularly the single nucleotide polymorphisms (SNPs) associated with susceptibility to diseases, has become deeper and more extensive.
Experimental gene therapy techniques, aimed at diseases caused by a single defective gene or a single SNP (the so-called Mendelian conditions), are being refined. Mendelian conditions cause high mortality and morbidity, but each of these conditions affects only a minute fraction of the population. As of June 2019, the OMIM Gene Map Statistics [2] compendium listed 6436 phenotypic genetic conditions caused by 4102 gene mutations. This list includes a variety of conditions, with onsets ranging from very early to late. For example, type 1 diabetes mellitus is caused by single defects in the HLA-DQA1, HLA-DQB1, or HLA-DRB1 genes [3]. Early-onset Alzheimer’s disease is caused primarily by APP, PSEN1, or PSEN2 gene mutations and affects a relatively small proportion of the population, starting in their thirties, with the majority of mutation carriers being affected by the age of 65 [4]. In contrast, macular degeneration [5,6,7] is primarily caused by a small number of high-effect variants and manifests at a relatively old age. In some cases, individualized genetic diagnoses, where an SNP that needs to be edited can be specified precisely, are possible. Over the last two decades, 287 monogenic disease clinical trials have been conducted worldwide [8]. When the medical technology becomes available, individuals who receive treatment will be effectively cured and will have no need for concern about the single specific cause of their disease.
Polygenic or complex late-onset diseases (LODs) pose a more nuanced problem, and this study will focus on them. Thousands of gene variants or SNPs, typically of small effect, are estimated to constitute, in combination, the polygenic LOD risk of each individual [9,10]. These diseases include the old-age diseases that eventually affect most individuals and are exemplified by cardiovascular disease (particularly coronary artery disease (CAD)), cerebral stroke, type 2 diabetes (T2D), senile dementia, Alzheimer’s disease (AD), cancer, and osteoarthritis.
What distinguishes polygenic LODs from infectious diseases or from Mendelian genetic conditions is the difficulty of defining what a cure would mean. The diseases of aging are primarily a consequence of an organism’s decline over time, leading to increased susceptibility to many LODs [11,12,13]. The combination of genetic liability, environmental factors, and the physiological decline of multiple organ systems leads to individual disease presentations [14]. In polygenic LODs, detrimental gene variants act as exacerbating factors [15] relative to the average distribution of common gene variants that define human conditions. The time of onset for each individual is modulated by genotype and environment [16]. While some individuals will be diagnosed at a relatively young age, others will not be diagnosed with a particular LOD during their lifetime [17]. According to the current consensus, a large number of common low-effect variants offer the likeliest explanation for the heritability of the majority of complex traits [18,19]. For example, in the cancers analyzed in this study, the fraction of all diagnoses attributed to highly detrimental inherited mutations was relatively low: such mutations were estimated to explain the heritability connected with 10%–14% of breast cancers [20,21], 10%–12% of prostate cancers [22,23,24,25], and 5%–10% of colorectal cancers [26,27], and were assumed to account for a relatively minor fraction of lung cancers [28,29,30]. For the majority of these cancers, liability is attributed to the common low-effect gene variants and environmental factors. The development of cancer is a multistage process, wherein individual variability in any tumorigenesis stage duration or liability may be influenced by hereditary predisposition, as well as environmental factors [31]. The level of susceptibility to the major polygenic LODs, and the difference between high-risk and low-risk individuals, may lie in a slightly higher- or lower-than-average fraction of detrimental gene variants. Certainly, the underlying decline does not begin immediately prior to the age of diagnosis. For example, AD deterioration begins decades before symptoms first become noticeable [32]. A similar situation holds for cardiovascular disease [33,34] and cancer [35].
The best cure is prevention, and the time may be nearing when prophylactic gene therapy will be attempted for the prevention of complex polygenic diseases. Much scientific knowledge and technical expertise is required, and many ethical questions will need to be settled, before this level of prophylactic gene therapy can become possible. From an ethical perspective, as techniques have developed and the medical possibilities offered by gene therapy for improving health and preventing diseases have gradually materialized, its acceptance is becoming more widespread. This is exemplified by the findings of the U.S. Committee of the National Academies of Sciences, Engineering, and Medicine [36] in “Human genome editing: Science, ethics, and governance,” and the recommendations of the U.K. Nuffield Council on Bioethics [37] in “Genome editing and human reproduction: Social and ethical issues,” which considered germline editing as one possible application.
Computational techniques attempting to evaluate the effects of mutations or gene variants have been developed, although their accuracy needs to improve dramatically before they can become applicable to personalized human genetic evaluation or treatment [38]. Similarly, while extensive libraries of human SNPs have been compiled, including dbSNP, HapMap, SNPedia, and aggregating sites [39], the information is far from actionable as far as modifying multiple personalized SNPs is concerned. Locating, or computationally estimating, a complete set of the low-effect causal SNPs requires knowledge that may take decades to gain.
Gene editing technologies may also be a few decades away from the time when they can be used routinely, with the same low risk as administering an influenza vaccination, to modify a large number of gene variants distributed across the human genome. The latest gene editing technique, CRISPR-Cas9 [40], has supplemented and mostly replaced older technologies, such as zinc-finger nuclease (ZFN) [41] and transcription activator-like effector nuclease (TALEN) [42], although, for some applications, these older techniques continue to be more appropriate. While its selectivity and on-target precision have improved, CRISPR is still most effective in gene knockdown operations. For modification and repair, which rely on homologous repair with a template or a sister chromatid sequence, only a small fraction of CRISPR operations succeed. Smith et al. [43] recently reported base editing with reduced DNA nicking, allowing for the simultaneous editing of >10,000 loci in human cells. CRISPR, which is only six years old, remains a rapidly developing technology that holds great promise. Synthetic genomics [44,45] could be another promising future technology. Synthetic genomics techniques could also help in developing the precise mapping of the effects of gene variants on disease phenotypes. If none of these approaches ultimately succeeds in becoming reliable enough for the purposes of gene therapy, it is almost certain that a new, more suitable technique will be invented.
Changes in lifestyle and medical care, including the prevention and treatment of infectious diseases, have extended longevity over the last century, and this trend is projected to continue. This increased longevity is partly due to medical advances, helping people to live and function decades after first being diagnosed with historically deadly or debilitating illnesses. Preventive gene therapies may also become a future factor in prolonging health span. Actuarial science has tracked human mortality trends for centuries. The Gompertz–Makeham law of mortality, which was established more than 150 years ago, depicts an exponential increase in the rate of human mortality after the age of 30 [46,47]. While the parameters of the Gompertz–Makeham law continue to be adjusted, the principle remains valid. The apparent squaring of the mortality curve (the so-called compression of morbidity and mortality into older ages) implies that the maximum human lifespan is likely limited to about 120 years of age [48,49,50].
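The general form of the Gompertz–Makeham law can be illustrated with a short sketch: an age-independent term plus a term that rises exponentially with age. The parameter values below are illustrative assumptions chosen only to show the shape of the curve; they are not taken from the cited sources or fitted to any life table.

```python
import math

# Illustrative parameters only (assumptions, not fitted to any life table).
MAKEHAM = 5e-4   # age-independent mortality per year
ALPHA = 3e-5     # scale of the age-dependent (Gompertz) term
BETA = 0.085     # exponential rate of increase per year of age


def mortality_rate(age, makeham=MAKEHAM, alpha=ALPHA, beta=BETA):
    """Gompertz-Makeham hazard: constant term plus exponentially rising term."""
    return makeham + alpha * math.exp(beta * age)


def doubling_time(beta=BETA):
    """Years needed for the age-dependent term to double."""
    return math.log(2) / beta


if __name__ == "__main__":
    for age in (30, 50, 70, 90):
        print(age, round(mortality_rate(age), 5))
    print("doubling time (years):", round(doubling_time(), 2))
```

With these assumed values, the age-dependent component of mortality doubles roughly every eight years; a different choice of `BETA` changes the doubling time but not the exponential form.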
Within the next few decades, gene therapy techniques and genetic knowledge may sufficiently advance to support prophylactic gene therapy to prevent late-onset diseases. It may be timely to evaluate the extent of the effects that future gene therapies may have on delaying the onset of LODs or preventing them entirely.
The goal of this study is to establish how the proportional hazards model and multiplicative genetic architecture can be used to map the polygenic risk to the hazard ratio of succumbing to common late-onset diseases with advancing age, and to apply this mapping to quantify the effects of hypothetical future prophylactic gene therapies. As its foundation, this study used earlier research [51], which reviewed epidemiology, heritability, and polygenic risk models, and developed a simulation basis for the analysis of eight of the most common diseases: AD; T2D; cerebral stroke; CAD; and breast, prostate, colorectal, and lung cancers. Computer simulations in this study quantified the correlation between the aging process, the polygenic risk score (PRS), and the change in the hazard ratio with age, using as inputs the clinical incidence rate and familial heritability. They estimated the outcomes of hypothetical future prophylactic gene therapy on the lifetime risk and age of onset for these eight LODs, as well as the lifetime risk increase associated with longevity gains.
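A minimal sketch may help to make the combination of multiplicative genetic architecture and proportional hazards concrete. It is not the simulation used in this study. It assumes that every causal SNP has the same risk-allele frequency and the same per-allele effect, that the PRS is the sum of per-allele log effects, and that the hazard ratio is taken relative to an individual with the population-expected PRS. All names and parameter values are hypothetical.

```python
import math
import random

# Hypothetical, uniform genetic architecture (assumptions for illustration).
N_SNPS = 1000              # number of causal SNPs
ALLELE_FREQ = 0.5          # risk-allele frequency at every SNP
ODDS_RATIO = 1.05          # per-allele effect size
BETA = math.log(ODDS_RATIO)


def simulate_prs(rng, n_snps=N_SNPS, freq=ALLELE_FREQ, beta=BETA):
    """PRS of one diploid individual: sum of log effects over risk alleles."""
    risk_alleles = sum(
        (rng.random() < freq) + (rng.random() < freq) for _ in range(n_snps)
    )
    return beta * risk_alleles


def expected_prs(n_snps=N_SNPS, freq=ALLELE_FREQ, beta=BETA):
    """PRS of an individual carrying the expected number of risk alleles."""
    return 2 * n_snps * freq * beta


def hazard_ratio(prs, reference_prs):
    """Proportional hazards: effects multiply, so log effects add."""
    return math.exp(prs - reference_prs)


def prs_after_editing(prs, n_edits, beta=BETA):
    """Emulated therapy: revert n_edits risk alleles to the neutral state."""
    return prs - n_edits * beta


if __name__ == "__main__":
    rng = random.Random(42)
    prs = simulate_prs(rng)
    reference = expected_prs()
    print("hazard ratio before:", round(hazard_ratio(prs, reference), 3))
    edited = prs_after_editing(prs, n_edits=15)
    print("hazard ratio after: ", round(hazard_ratio(edited, reference), 3))
```

Under these assumptions, each corrected allele divides the hazard ratio by the per-allele effect, so 15 edits at an effect of 1.05 lower the hazard ratio by a factor of about 2.08 regardless of the individual's starting PRS.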
3. Discussion
For the purposes of this hypothetical treatise, it was assumed that it is possible to precisely identify individual gene variants and their detrimental or beneficial effects, and then to use gene therapy to modify a large number of detrimental variants. Rather than analyzing arbitrary synthetic choices of heritability and disease incidence progressions, eight LODs were chosen as a case study. This approach allowed us to relate the findings to some of the highly prevalent LODs that cover the broad spectrum of heritability and disease incidence patterns. While the results are a model view, and each of the reviewed LODs certainly possesses deeper specific causal mechanisms, the approach also allowed us to make generalizations about lifetime risk changes if the LOD risks were lowered by some intervention, in this case, by gene therapy. This hypothetical gene therapy model was applied to estimate what would happen to LOD progression as the population ages. Conceptually, gene therapy here does not consider additions of artificially designed genomic sequences, but rather, only corrections made to typically low-effect heterozygous in-population gene variants, that is, a correction of a detrimental variant to a naturally occurring neutral state. For the sake of simplicity, the model used SNP distributions, though the same would apply (albeit with a higher degree of complexity) to gene therapy using other gene variant types.
This study does not evaluate potential obstacles due to pleiotropy, defined for the purposes of gene therapy as the possible negative effect on other phenotypic features of any attempt to prevent an LOD by modifying a subset of SNPs [52,53]. The PRS of a high-risk individual is caused by numerous variants. In this model, these are normally distributed in the population. There is a relatively small difference in the absolute number of detrimental alleles between the population average and higher-risk individuals. Arguably, for the purpose of personalized prophylactic treatment, it will be possible to select a small fraction of variants from a large set of available choices (as seen in Table 4) that do not possess antagonistic pleiotropy, or perhaps even select SNPs that are agonistically pleiotropic with regard to some of the other LODs.
Applying the modeled aging coefficient to evaluate the impact of longer life expectancy on lifetime risk confirms the long-standing observation that aging itself is the predominant risk factor for many late-onset diseases and conditions. The calculations applying the discovered aging coefficient to the discrete hazard ratio values showed a delay in onset incidence for all analyzed LODs. The lifetime risk decreased in proportion with a decrease in hazard ratio, as long as the absolute value of lifetime risk remained low. With the introduction of an emulated life expectancy increase, the lifetime risk increased. The lifetime risk increase with age was most prominent for AD. In those countries with longer life expectancy, the lifetime risk of AD is usually higher, as was demonstrated by Wu et al. [54], using the example of Japan. These results confirm, once more, that if mortality from all causes is lower (resulting in a longer life expectancy), AD is an LOD that exhibits a rapid rise in advanced-age prevalence. It would be difficult to limit the prevalence of AD, whose onset is delayed by only approximately 3 years with the modeled level of therapy. AD may require a higher number of gene edits, likely postponing the possibility of more effective treatment to a point even further in the future; yet, any improvement would be welcome. It is possible that a pharmaceutical intervention targeting a causal metabolic pathway or immune or inflammatory response may be more effective for AD, although past announcements that generated false hope regarding breakthroughs through these kinds of approaches are too numerous to cite.
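The observation that lifetime risk falls in proportion to the hazard ratio only while the lifetime risk is low can be reproduced with a deliberately simplified calculation. The sketch below assumes a constant hazard ratio over the whole lifespan and ignores both competing mortality and the age-related change in PRS distribution that the simulations in this study account for. The cumulative baseline hazard values are hypothetical.

```python
import math


def lifetime_risk(baseline_cumulative_hazard, hazard_ratio=1.0):
    """Cumulative incidence under proportional hazards, no competing risks."""
    return 1.0 - math.exp(-hazard_ratio * baseline_cumulative_hazard)


def risk_ratio_after_therapy(baseline_cumulative_hazard, hazard_ratio):
    """Lifetime risk after therapy divided by lifetime risk before therapy."""
    treated = lifetime_risk(baseline_cumulative_hazard, hazard_ratio)
    untreated = lifetime_risk(baseline_cumulative_hazard)
    return treated / untreated


if __name__ == "__main__":
    # Hypothetical cumulative baseline hazards: a rare and a very common LOD.
    for label, cumulative_hazard in (("rare LOD", 0.02), ("common LOD", 2.0)):
        ratio = risk_ratio_after_therapy(cumulative_hazard, hazard_ratio=0.5)
        print(label, "risk ratio after halving the hazard:", round(ratio, 3))
```

In this simplified setting, halving the hazard of the rare disease roughly halves its lifetime risk, whereas the same halving leaves about 73% of the lifetime risk of the common disease, because the exponential saturates as the cumulative hazard grows.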
The Framingham General Cardiovascular Risk Score included age as one of the major risk factors for stroke and CAD [55]. Boehme et al. [56] showed a similar pattern for T2D, with which the results of the current study agree. For T2D, stroke, and CAD, lifetime risk will regain pretreatment baselines within 10 to 15 years of longer life, which is equivalent to delaying the average onset age of these LODs by as many years. Based on heritability and incidence rate combinations, prophylactic gene therapy holds the potential to bring significant and longer-lasting benefits for cancer prevention, even with a similar or smaller number of edited gene variants than for the more prevalent diseases. A potential limitation of this study is the possibility that GWAS (and other future techniques) will have difficulties in finding a sufficient number of common low-effect SNPs to decrease the disease liability to the level simulated in this research, or that gene-environment effects will not follow Cox’s proportional hazards model [57,58] for some of the late-onset polygenic diseases. The likeliest candidate is lung cancer, which has the lowest heritability and is the most environmentally affected of all cancers reviewed here. For lung cancer, addressing the polygenic risk of smoking [59], as well as the genetically influenced carcinogenicity of smoking on an individual level [60], together with environmental improvements, may allow for similar amelioration of disease liability. Additionally, when such advanced gene therapy technologies become available, correcting monogenic, highly detrimental variants will be simple, and the combination of therapies can bring about even more substantial improvements in both individual and population-wide health outcomes.
Gene therapy simulation scenarios analyzing population statistics showed decreases in LOD incidence and delays in LOD onset. These simulations also showed the increase in lifetime risk with emulated longer life expectancy. Such estimates may be important for evaluating population health and well-being and the potential financial impact on healthcare systems. The estimates in this study, based on the proportional hazards model and multiplicative genetic architecture using the aging coefficient, allowed for an estimation of these effects accounting for a model genetic architecture of the LODs, rather than a more simplistic calculation based primarily on the statistical shape of the incidence rate progression. In a study aptly titled “Projections of Alzheimer’s disease in the United States and the public health impact of delaying disease onset,” Brookmeyer et al. [61] estimated that an intervention that achieved a twofold AD hazard ratio decrease would shift the exponential rise curve of AD by five years, leading, in the long term, to a twofold decline in the cumulative incidence and prevalence of AD when accounting for mortality. The simulation reflecting age-related change in PRS distribution demonstrated that, in the case of preventative gene therapy, the positive effect on the lifetime risk of AD would be significantly lower than projected by the above study. While AD has emerged as one of the most difficult diseases to prevent, LODs with low cumulative incidence, such as cancer, exhibit enduring improvement under this model.
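The link between a hazard ratio decrease and a shift in years follows from the shape of an exponential incidence curve: multiplying the curve by a constant is the same as moving it along the age axis. The sketch below assumes a purely exponential rise. The five-year doubling time is inferred from the statement that a twofold decrease corresponds to a five-year shift, and the incidence at age 65 is a hypothetical placeholder.

```python
import math


def onset_delay_years(hazard_ratio, doubling_time_years):
    """Shift of an exponential incidence curve after scaling by hazard_ratio."""
    return doubling_time_years * math.log2(1.0 / hazard_ratio)


def incidence(age, rate_at_65=0.01, doubling_time_years=5.0):
    """Hypothetical exponential incidence curve anchored at age 65."""
    return rate_at_65 * 2 ** ((age - 65) / doubling_time_years)


if __name__ == "__main__":
    print("delay for HR 0.5: ", onset_delay_years(0.5, 5.0), "years")
    print("delay for HR 0.25:", onset_delay_years(0.25, 5.0), "years")
    # Halving the incidence at age 80 gives the untreated incidence at age 75.
    print(0.5 * incidence(80), incidence(75))
```

This equivalence holds only for the idealized exponential curve. It does not capture the age-related change in PRS distribution, which is why the simulations in this study give a smaller benefit for AD.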
Even though each LOD was analyzed independently in this study, prioritizing certain LODs for preventative therapy could, in practice, have a significant effect on other conditions not specifically targeted for treatment. For example, T2D is one of the diseases that causes the most comorbidities, accelerating the onset of cardiovascular and other diseases, sometimes by decades [56]. For this reason, preventative treatment of T2D could mean improvements in health or delays in the presentation of a range of LODs, either independently of or in addition to treating their specific gene variants.
5. Conclusions
In this study, computer simulations mapped polygenic risk to the hazard ratio of being diagnosed with eight common LODs, based on their known heritability and incidence rates, under the proportional hazards model and multiplicative genetic architecture. The resulting mapping, the aging coefficient, enabled the researcher to quantify the population effects of the emulated prophylactic gene therapy, alongside longevity increases. Computer modeling and simulations deal with simplifications and generalizations of biological processes, and aim to make predictions about the behavior of the modeled systems when the parameters of a model are modified; the conclusions of this study are made in that context. They are contingent on progress in molecular genetics identifying a sufficient number of true causal SNPs for a particular LOD on an individual basis, gene editing technologies becoming capable of safely providing such a level of therapy, and prophylactic gene therapies successfully passing clinical trials and obtaining the approval of governmental agencies.
The intensive gene therapy simulated here could dramatically delay the average onset of the analyzed LODs and reduce the lifetime risk of the population. The simulations highlighted that the magnitude of familial heritability and cumulative incidence patterns distinguish the outcomes for the analyzed LODs when subjected to the same PRS decrease. This outcome can be characterized by the delay in LOD onset, that is, the estimate of the number of years it would take for each LOD to regain the pretreatment baseline level.
In summary, if gene therapy, as hypothesized here, were to become possible, and if the incidence of the treated diseases followed the proportional hazards model with multiplicative genetic architecture composed of a sufficient number of common low-effect gene variants, then (a) late-onset diseases with the highest familial heritability will have the highest number of variants available for editing; (b) diseases with the highest current lifetime risk, particularly those with the highest incidence rate continuing into advanced age, will be the most resistant to attempts to lower the lifetime risk and delay the age of onset at a population level; (c) diseases that are characterized by the lowest lifetime risk will show the strongest and longest-lasting response to such therapies; and (d) longer life expectancy is associated with a higher lifetime risk of these diseases, and this tendency, while delayed, will continue after the therapy.