Article
Peer-Review Record

Comprehensive Analysis of the Genetic Variation in the LPA Gene from Short-Read Sequencing

BioMed 2024, 4(2), 156-170; https://doi.org/10.3390/biomed4020013
by Raphael O. Betschart 1, Georgios Koliopanos 1, Paras Garg 2, Linlin Guo 3, Massimiliano Rossi 4, Sebastian Schönherr 5, Stefan Blankenberg 3,6,7, Raphael Twerenbold 3,6,7, Tanja Zeller 3,6,7 and Andreas Ziegler 1,3,6,8,*
Reviewer 1:
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 19 March 2024 / Revised: 29 May 2024 / Accepted: 31 May 2024 / Published: 4 June 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Dear authors, you have not included links to tables and figures in your text. I had to guess which table/figure was meant.

(see lines 265, 268, 295, 316, 331, 344 "Error! Reference source not found")

- You didn't write a conclusion.

- It's not clear what your recommendations are for improving the assessment of risk factors for cardiovascular disease. What conclusion should the reader draw from your article?

Author Response

Please, see attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

 

Multidisciplinary Digital Publishing Institute (MDPI): 2947595

Title: Comprehensive analysis of the genetic variation in the LPA gene from short-read sequencing

 Overall comments:

The authors set out to investigate the predictability of lipoprotein (a) or Lp(a) concentrations by understanding the gene sequence/structure of the LPA gene.

Specifically, the kringle IV-type-2 (KIV-2) copy number of the LPA gene was targeted for analysis using the short-read whole genome sequencing data available via two methods, because this region had been shown to be important in determining the concentration of Lp(a). 

 Due to the repeat of kringle sequence, it has been very difficult to ascertain the number in each individual. 

Trinucleotide repeat sequencing had been very difficult in the past as well.

(The use of short-read WGS data should have been emphasized here.)

The authors analyzed WGS data of the KIV-2 region with two methods, the newer DRAGEN LPA Caller (DLC) and a read-depth based copy number (CN) estimate, in the 8,351 individuals with whole genome sequencing results from the GENESIS-HD study, and compared the two methods.
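The idea behind a read-depth based CN estimate can be sketched as follows. This is only an illustrative sketch, not the authors' pipeline and not the DRAGEN LPA Caller: it assumes that reads from all KIV-2 copies have been collapsed onto a single reference repeat unit, and that the depth in a flanking single-copy region serves as the diploid baseline. The function name `estimate_total_cn` and all numbers are hypothetical.

```python
from statistics import mean


def estimate_total_cn(repeat_depths, flank_depths, ploidy=2):
    """Estimate the total repeat copy number (both alleles summed) from read depth.

    Assumption: reads from every repeat copy map onto ONE reference repeat
    unit, so depth there scales linearly with the number of copies.
    """
    if not repeat_depths or not flank_depths:
        raise ValueError("depth lists must be non-empty")
    flank = mean(flank_depths)
    if flank <= 0:
        raise ValueError("flanking depth must be positive")
    # The flanking region is present in `ploidy` copies, so the depth ratio
    # multiplied by the ploidy gives the summed copy number of both alleles.
    return ploidy * mean(repeat_depths) / flank


# Hypothetical per-base depths: about 600x over the collapsed repeat unit,
# about 30x over the single-copy flanking region.
total_cn = estimate_total_cn([600, 620, 580], [30, 30, 30])
print(total_cn)
```

Note that an estimate of this kind yields only the sum over both alleles; it cannot say how the copies are distributed between the two alleles.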

The authors stated that the results of the two methods agreed at a high rate.  In addition, the authors used the DLC to identify allele-specific KIV-2 repeats since this method allows for differentiating two alleles using two SNVs which are in linkage disequilibrium (LD) at positions 296 and 1264. 

This is reminiscent of linkage analysis which was an old method to discern differences in alleles.   

The utility of the pentanucleotide repeat in the promoter region as well as several other single nucleotide variants (SNPs), and allele specific KIV-2 CN were also included in the overall assessment. 

The predictability of the various genetic variants was assessed in 4,861 individuals for whom an Lp(a) concentration was available.  The authors found that the allele-specific KIV-2 CN was more useful in predicting the Lp(a) concentration than the total KIV-2 CN.

In addition, the addition of two LPA SNPs (4925G>A and rs41272114 (C/A/T), splice donor variant) improved the prediction. 

From their analyses, the authors concluded that the allele-specific KIV-2 copy number (CN) assessed by the DRAGEN LPA Caller, which allows for allele differentiation, had a better correlation to the concentration of Lp(a) than the total KIV-2 CN.

Unfortunately, it was not clear until the first sentence in the conclusion that the big caveat of these analyses was the use of short-read WGS data for the LPA sequence.

Please incorporate this into the abstract as well as in the introduction why the authors decided to pursue these analyses.

A workflow diagram of the experiments may help to clarify for the readers the steps the authors have taken.

Bioinformatics has become a valuable tool in genetic studies and contributed so much to our understanding of the human genome; however, findings from these studies are often not well connected to the understanding of underlying biology, metabolism, or mechanism so that it is important to make some connections or references while using bioinformatics in genetics studies.

Coassin S, Kronenberg F. Lipoprotein(a) beyond the kringle IV repeat polymorphism: The complexity of genetic variation in the LPA gene. Atherosclerosis. 2022 May;349:17-35. doi: 10.1016/j.atherosclerosis.2022.04.003. PMID: 35606073; PMCID: PMC7613587.

Lines 365-367: “In this study, we performed a comprehensive analysis of the genetic variation within the LPA gene by using data from whole genome short-read sequencing.  Key to these analyses was the availability of specialized callers to estimate the number of KIV-2 repeats.”

This first sentence in the discussion should have been mentioned earlier so that the authors’ intention is clearly understood by the readers from the beginning.   It might have been something which was understood by the GENESIS-HD study personnel, but for someone who is not familiar with this study, it was not common knowledge. 

Recommend changing the title or adding “extracting the LPA sequence from the short-read whole genome sequencing data”, to better align with what was done, and what the authors were trying to do.

In addition, lines 420-425 should be elaborated further: short-read WGS data had not previously been used for the evaluation of the LPA gene (or something to that effect), so the authors set out to determine the utility of short-read WGS data for LPA using the newer specialized caller.

The comparison of short-read and long-read sequencing should be presented in a table or in the text so that the goals of the authors are understood. 

Short-read sequencing is always tricky depending on the length of each read, because it can miss sequencing a region which is much longer but may be technically or methodologically much if it can be perfected.

Just a note about Lp(a) measurements in the article below:

Kronenberg F. Lipoprotein(a) measurement issues: Are we making a mountain out of a molehill? Atherosclerosis. 2022 May;349:123-135. doi: 10.1016/j.atherosclerosis.2022.04.008. PMID: 35606072.

Key points box 1: Lp(a) measurement

Lp(a) is measured either in mass units (mg/dL) or in molar units (nmol/L), which causes major confusion in clinical practice.

• Measurements in molar terms are desirable but not easy to accomplish.

• The repetitive kringle IV (KIV) repeat structure of apolipoprotein(a) is the basic source for measurement problems of Lp(a).

• Polyclonal antibodies are widely used in clinical routine assays. They recognize with high likelihood the repetitive KIV structure of apolipoprotein(a).

• Consequently, concentrations of Lp(a) with a large number of KIV repeats might be overestimated and those with a small number of KIV repeats might be underestimated.

• The selection of the calibrators and their isoform sizes is of key importance and might improve the measurement performance of an Lp(a) assay.

• Major efforts to better standardize Lp(a) measurements are under way and should be followed by assay manufacturers.

• Despite the assays not yet perfected, most of them can be used for risk stratification of the patients.

• Lp(a) should be measured at least once in each person.

• There is no need for genetic testing of high Lp(a) in most of the individuals since a measurement of the Lp(a) concentration might be sufficient.

I agree with the last statement in the clinical realm at this time, but to expand our knowledge of genetics and the understanding of LPA gene, I believe it is important to conduct analyses in this manuscript. 

Regardless, it would be important to present the results with this understanding, and, if possible, to include some references to the underlying biology. For example, sequencing a certain region of LPA might in the future replace the difficult-to-normalize Lp(a) measurements, or understanding the variants and sequence might allow a better region to be targeted for Lp(a) concentration analysis (if that is the case).

In general, many genes are influenced by environmental factors and other genetic factors or regulatory regions which affect when and how much a gene is transcribed so it is difficult to determine whether the results would have been the same if other methodologies or different settings had been used when correlating the gene sequence to protein levels. 

However, for the gene LPA, its regulation although not well understood, seems to be simpler than many other genes as far as I understand though more information may be discovered to change this in the future.  Thus, the use of bioinformatics/sequencing may be very fruitful. 

Regardless, it is always important to consider the underlying biology of the gene and proteins as well as how some biochemical measurements are done, even when performing these types of experiment (genetic sequencing and bioinformatic analysis). 

If one considered how a protein functions, it is not surprising that an allele specific method may be more useful than the total KIV-2 CN (both alleles), especially how a gene is transcribed and translated into a protein. 

Thus, the total KIV-2 CN does not take into account the structure of an individual Lp(a) particle in vivo, which is what actually affects our biology/physiology; considering that each allele encodes a protein individually makes more sense, in my opinion.
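A small example may make this point concrete. The sketch below uses two hypothetical individuals (the copy numbers are invented for illustration and are not data from the manuscript) who share the same total KIV-2 CN but differ in how the repeats are split between their two alleles. The helper name `summarize_kiv2` is likewise an assumption.

```python
def summarize_kiv2(allele_cns):
    """Summarize a pair of allele-specific KIV-2 copy numbers."""
    shorter, longer = sorted(allele_cns)
    return {
        "total": shorter + longer,   # what a total-CN estimate reports
        "shorter_allele": shorter,   # encodes the smaller apo(a) isoform
        "longer_allele": longer,     # encodes the larger apo(a) isoform
    }


# Hypothetical individuals: identical total CN, different allele split.
person_a = summarize_kiv2((20, 20))
person_b = summarize_kiv2((32, 8))

print(person_a)
print(person_b)
```

Both individuals have a total of 40 repeats, yet only the second carries an allele with as few as 8. If smaller isoforms are secreted more readily, as discussed in the literature cited in this report, the two individuals could plausibly differ in Lp(a) concentration even though the total CN cannot tell them apart.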

Furthermore, if LPA has large portions in linkage disequilibrium (LD), this may actually make genetic studies via bioinformatics easier.  One can clearly discern each block within which SNPs are inherited together.  Once the SNPs in a block have been identified, a single marker or SNP within the block can be used as a representative of the block, instead of sequencing the entire region.
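How well one SNP can stand in for another is commonly quantified by the LD measure r². The following is a minimal sketch that computes r² for two biallelic SNPs from phased haplotypes, using the standard definition D = p_AB - p_A p_B and r² = D² / (p_A (1 - p_A) p_B (1 - p_B)). The function name `ld_r2` and the toy haplotypes are hypothetical and not taken from the manuscript.

```python
def ld_r2(haplotypes):
    """Compute r^2 between two biallelic SNPs.

    haplotypes: list of (a, b) tuples, one per phased haplotype, with the
    allele at each SNP coded as 0 or 1.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n    # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n    # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    d = p_ab - p_a * p_b                       # LD coefficient D
    return d * d / denom


# Toy haplotypes: the two SNPs are always inherited together.
perfect = [(1, 1), (1, 1), (0, 0), (0, 0)]
# Toy haplotypes: the two SNPs are statistically independent.
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]

print(ld_r2(perfect), ld_r2(independent))
```

An r² close to 1 means that genotyping one SNP carries nearly all the information about the other, which is the rationale for choosing a single representative marker per block.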

It may be important and conceptually easier to have a diagram of Lp(a) molecule.

Unless apo(a) always gets detached from the whole Lp(a), it is assumed that Lp(a) is the molecule transported in plasma, not apo(a) separately, encoded by the LPA gene.  Thus, one should consider each Lp(a) as a unit for the biological effect or measurements. 

Lp(a) = apolipoprotein (a) (LPA) + apolipoprotein B100 + LDL-like protein molecule

Please see the below for additional comments. 

Specific comments:

Abstract:

Please clarify that the data utilized are short-read data from WGS, etc.

Please consider replacing the word “regulated” with another word such as “determined” since the LPA is not a regulator of the Lp(a) levels, but Lp(a) is actually the protein product of the gene LPA (transcribed and translated), very minor point.

Please specify the genotype identified at the SNP rs41272114 which is a splice site donor, such as nucleotides (C/A/T).

Recommend editing once the main text has been revised if appropriate.

Main Manuscript:

Introduction:

Lipoprotein (a) or Lp(a) molecules are a unique and somewhat mysterious group of lipoproteins, since the function of Lp(a) is not clearly deciphered compared to other lipoproteins, despite the fact that its high levels have been documented to be associated with a high risk of atherosclerotic cardiovascular disease (ASCVD).

A number of association studies have been conducted to understand the relationship between genetic variants of the LPA gene to the predictability of Lp(a) concentration in the past. 

Although this is one aim of the authors, it was not clear from reading the title, because the authors did not clearly mention the use of short-read WGS data to extract the reads on the LPA gene.

Some revising of this section would make their goals clearer, such as bringing the aims or hypotheses written at the end to earlier or highlight them better. 

The last sentence in the abstract “It would be important that the allele-specific KIV-2 CN is determined in all subjects.”

There may be several aims in these analyses, and these should be clarified in the introduction so that the readers are not guessing as far as the main goals of the authors are. 

Considering the basic role of Lp(a):

The main role of various lipoproteins is to transport cholesterol and triglyceride through circulation to various parts of the body for their use.  The basic biology and information to be included in the introduction would be helpful for the readers if feasible though the audience for this journal may not be as interested.

Lines 42-43: Lipoprotein (a) should be spelled out at least once in the main text although the authors spelled out in the abstract. 

Lines 43-45: Lp(a) consists of a low-density lipoprotein-like particle connected to the glycoprotein apolipoprotein (a) [apo(a)] through the apolipoprotein B-100 protein in the LDL-like particle via a disulfide bond.  The presence of apoB should also be mentioned to be accurate.

Line 48: Recommend replacing the word “heritable” in the sentence “Lp(a) levels are highly heritable…” to another word such as “determined by underlying genetics”.  This is a very subtle point, but heritable has a connotation of being an inherited disorder, but technically, even though high levels may be observed in family members (clusters), it is not technically considered as a heritable (Mendelian) disorder as in familial hypercholesterolemia (FH) which has a certain genotype with a discernible inheritance pattern. 

 

Since Lp(a) levels are not modulated strongly by the environment or other factors compared to other genes, it is in a way easier to understand from the information of genetics. 

ACC: An Update on Lipoprotein(a): The Latest on Testing, Treatment, and Guideline Recommendations (https://www.acc.org/latest-in-cardiology/articles/2023/09/19/10/54/an-update-on-lipoprotein-a)

Lp(a) levels are genetically determined, with little to no influence from environmental or lifestyle factors, and adult levels are reached in childhood, typically by 5 years of age. Studies have shown that inflammatory conditions, pregnancy, hypothyroidism, growth hormone therapy, and kidney disease increase levels of Lp(a).

Lp(a) levels are decreased in the settings of severe acute phase conditions, postmenopausal hormone replacement, hyperthyroidism, and liver disease.   This is a bit counter intuitive of a molecule associated with a high risk of ASCVD. 

Hence, checking levels at steady states is advised.

There are several theories as to why high levels of Lp(a) result in a high risk of ASCVD, as below:

The Lp(a) molecule first received attention because apo(a) protein structure resembled plasminogen encoded by PLG. 

LPA is transcribed in the opposite direction of PLG.

One theory why Lp(a) contributes to ASCVD is because it inhibits the activation of plasminogen:

Hancock MA, Boffa MB, Marcovina SM, Nesheim ME, Koschinsky ML. Inhibition of plasminogen activation by lipoprotein(a): critical domains in apolipoprotein(a) and mechanism of inhibition on fibrin and degraded fibrin surfaces. J Biol Chem. 2003 Jun 27;278(26):23260-9. doi: 10.1074/jbc.M302780200. Epub 2003 Apr 15. PMID: 12697748.

The LPA gene is known as a paralog of PRSS56 (Serine Protease 56), a protein-coding gene. Diseases associated with PRSS56 include microphthalmia, Isolated 6 and nanophthalmos.  Gene Ontology (GO) annotations related to this gene include serine-type endopeptidase activity.

Some believe that the LPA gene is a pseudogene, as argued in the article below.  Generally, pseudogenes do not have any “real” functional role, though some do have some type of function in certain situations.  The authors of that article hypothesized that high levels of Lp(a) result in ASCVD through increased blood viscosity, which contributes to the development of ASCVD.

They also report that LPA may have been generated by duplication of PLG and has no specific “basic” function.

Sloop GD, Pop G, Weidman JJ, St Cyr JA. Apolipoprotein(a) is the Product of a Pseudogene: Implications for the Pathophysiology of Lipoprotein(a). Cureus. 2018 May 31;10(5):e2715. doi: 10.7759/cureus.2715. PMID: 30079281; PMCID: PMC6067813.

This may explain the fact that the level of Lp(a) is not altered by usual factors such as age, sex, fasting state or lifestyle factors compared to many other genes which are studied in ASCVD.

Coassin S, Kronenberg F. Lipoprotein(a) beyond the kringle IV repeat polymorphism: The complexity of genetic variation in the LPA gene. Atherosclerosis. 2022 May;349:17-35. doi: 10.1016/j.atherosclerosis.2022.04.003. PMID: 35606073; PMCID: PMC7613587.

It is well understood as the authors state that LPA gene is highly polymorphic, and if the gene product is functionally critical, having many mutations or polymorphisms would not be beneficial or tolerated so this fact seems to support the speculation that the gene may be a pseudogene or may not have a critical role in the overall lipoprotein metabolism. 

Lines 79-82: “However, it has been demonstrated that the association between a high number of KIV-2 repeats and low Lp(a) levels is mediated by a SNV located in the KIV-2 repeat.” 

Here authors of this manuscript did not provide the reason for the relationship or the mechanism which leads to the low level of Lp(a). 

It is well taken that many authors of past articles just mention like above, but at least it is important to consider why this may be so. 

The article below mentions that “the high level of Lp(a) is the consequence of the ease of secretion by the liver for a smaller molecule than the larger molecule”, thus, being interpreted as the size which is determined by the number of KIV-2 repeat determining the molecular size. 

This may be a plausible reason for the lower level: a larger molecule is secreted more slowly than smaller molecules.

Jawi MM, Frohlich J, Chan SY. Lipoprotein(a) the Insurgent: A New Insight into the Structure, Function, Metabolism, Pathogenicity, and Medications Affecting Lipoprotein(a) Molecule. J Lipids. 2020 Feb 1;2020:3491764. doi: 10.1155/2020/3491764. PMID: 32099678; PMCID: PMC7016456.

Lines 83-93:  This section may be the most important section within the introduction and the manuscript being the goals of the study. 

One goal of this project seems as though the authors were trying to assess the genetic variations in predicting the Lp(a) levels, but the rest of this section seems a bit blurred because instead of stating another goal succinctly, they decided to explain the processes they are taking to assess the above goal.

In addition, the authors should state that they wanted to evaluate the utility of short-read WGS data in the process.  This point was not well stated in the introduction.  Two methods, the DRAGEN LPA Caller and a read-depth based copy number (CN) estimate, were utilized and compared.

Recommend revising this section and moving it earlier, to the first or second paragraph, to grab the readers’ attention as recommended earlier.

In addition, revising this section to better align with the second goal of this study which seems to be the assessment of the utility of allele-specific assessment (from the discussion section), and also consider revising the title?

Then, maybe listing the hypothesis (moving from the later section) in this part just before the materials and method section. 

Materials and Methods:

Lines 224-235: This section helps the readers to understand what the authors plan to do so maybe have this section earlier in the section.

Hypotheses definitely should be placed before the materials and methods section:

Lines 238-248:  This section seems to be very important, so it should be stated earlier than the methods section. The reasoning for the hypotheses should also be stated clearly, by adding one or two lines explaining why these setups were selected and why the authors believe that the full model would show a higher predictive value than the others proposed.  It was not clear from the manuscript.

As above, it is possible that allele-specific methods perform better than total ones because each allele is transcribed and translated into the product apolipoprotein (a), and the other allele goes through the same processes separately. 

Statistical analyses:

Discussion:

Lines 392-396: “Predicting Lp(a) concentrations based on genetic variation is challenging….”

Not sure genetic variations with large LD blocks would make it more difficult. 

It is also important to remember that Lp(a) measurement is not just apolipoprotein (a) which is encoded by LPA.  It is the whole molecule as mentioned above. 

Lines 396-397: The authors observed that only a small number of genetic variations had a significant impact on the prediction of the Lp(a) concentrations.  This is not surprising because the gene has large LD blocks. 

Once the genetic variants within a group (LD block) have been clarified, things become easier because a smaller number of variants can be utilized for analysis.

Lines 420-429:  This section seems to be the conclusions of the authors, but making a separate section would make it clearer and easier for the readers to understand the conclusions.

The authors’ conclusion on short-read sequencing seems somewhat conflicting: long-read sequencing is described as better, but time and cost are the limiting factors.  Did the authors mean that the short-read method did not perform to expectations?  A table or a dedicated section for the comparison would be informative. 

Why did the authors comment that the allele-specific KIV-2 repeats should be determined in all subjects?  Please add a reason or two for this statement. 

Revising the discussion section would improve the readability of the section and the reader would understand the goals and findings better. 

Conclusion:

Please summarize the authors’ conclusion separately under “Conclusion” section which was not present in the manuscript. 

Thank you very much for allowing me to review this manuscript. 

I am more of an individual who likes to understand gene sequence, but also values understanding its functionality.  Therefore, my perspective may be different from individuals who have dealt with sequencing alone. 

Sincerely,

 

Author Response

Please, see attached file for point-by-point reply.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript concerned the effectiveness of SNV and CNV in LPA gene testing systems, including the recently developed DRAGEN LPA Caller. The full approach, including determination of the number of KIV-2 repeats and the allele-specific number of KIV-2 repeats, has been shown to be most effective.

Fully consistent with the theme of the journal.

According to this manuscript, the test systems showed high predictive ability for lipoprotein (a). The results contribute to the extension and simplification of testing for one of the important biochemical parameters.

The results obtained are interpreted adequately.

The study provides an interesting example of accounting for multiple levels of single nucleotide variation and copy number variation with the overall task of accounting for the complex inheritance pattern of the LPA gene product level and optimizing algorithms for clinical testing of lipoprotein a.

Supplementary materials contain a detailed description of mathematical and statistical methods.

The findings are interesting for improving the methodology for determining Lp(a).

Remarks:

How would you assess the relative effectiveness of sequencing-based and PCR-based methods?

Line 265, 268, 295, 316, 331, 344. Please check these links.

Author Response

Please, see attached file for point-by-point reply.

Author Response File: Author Response.pdf
