ORIGINAL RESEARCH
The rs17713054 and rs1800629 polymorphisms of genes LZTFL1 and TNF are associated with COVID-19 severity
1 Sistema BioTech LLC, 109235 Moscow, Russia
2 State Budget Institution of Health of the City of Moscow "Diagnostic Center (Center for Laboratory Research) of the Department of Health of the City of Moscow", Moscow, Russian Federation
3 Sklifosovskiy Research Institute for Emergency Medical Aid, Moscow, Russia
Correspondence should be addressed: Natalia V. Pozdnyakova
1 Kurjanovskaja, 34, korp. 11, Moscow, Russia; ur.hcetoibametsis@avokayndzop.n
Funding: the study was supported by the major Sistema BioTech LLC shareholder, the Sistema Public Joint Stock Financial Corporation. The study was funded by the Moscow Department of Health as part of the double blind clinical trial.
Acknowledgements: the authors express their gratitude to Antipova YuO, Deputy Head of the Moscow Department of Health, for organizing the blind clinical trial of COVID-19 severity and the patients of the Diagnostic Center (Center of Laboratory Testing) of the Moscow Department of Health for the provided biomaterial samples.
Author contribution: Pozdnyakova NV, Poyarkov SV — study concept and design; Minashkin MM — molecular genetic research, laboratory tests; Poyarkov SV, Traspov AA — literature review, manuscript writing; Traspov AA — statistical processing of the results, manuscript editing; Komarov AG — project management, planning the experiment, data analysis; Shtinova IA, Speshilov GI, Karbyshev IA — providing clinical data and metadata; Godkov MA — control over clinical sample collection at the Sklifosovsky Research Institute for Emergency Medicine, providing anonymized data.
Compliance with ethical standards: the study did not need to be approved by the Ethics Committee because of dealing with anonymized patient data and the double blind study format.
In the past decades it was shown that human genome variation contributes to heterogeneity emerging in response to infectious diseases.
The pandemic of severe acute respiratory syndrome caused by the SARS-CoV-2 coronavirus has killed millions of people all over the world.
The disease caused by the novel coronavirus infection (COVID-19) is characterized by high variability of clinical manifestations (World Health Organization, 2021). The majority of patients, specifically 81%, are asymptomatic or develop mild disease, while 14% have severe disease and 5% develop critical illness [1]. The most common symptoms include fever, dry cough, and fatigue; ageusia, anosmia, and gastrointestinal symptoms have also been reported. Severe COVID-19 is characterized by respiratory failure requiring mechanical ventilation or the use of high flow oxygen.
There is a constant search for the risk factors associated with the disease. Initially it was assumed that elderly males having a history of cardiovascular disorders developed severe disease, but it turned out that genetic background contributed just as much. Numerous studies have shown that polymorphisms of genes encoding the host factors essential for completion of the viral life cycle, such as ACE2, TMPRSS2, AR, and of genes involved in innate immunity, such as TNF and TLR7, may be associated with COVID-19 severity [2].
The first genome-wide association study (GWAS) of COVID-19 severity compared 1980 patients from Italy and Spain who developed severe infection with a control population of unknown SARS-CoV-2 infection status. It revealed two loci significant at the genome-wide level, which were mapped to the 3p21.31 region comprising six genes (SLC6A20, LZTFL1, CXCR6, CCR1, CCR3, CCR9) and the 9q34.2 region comprising the ABO blood group locus [3].
The signal at the 3p21.31 locus remains the most stable and strong in numerous studies; it is associated with both susceptibility to infection and the disease severity. In this locus, allele C of the rs10490770 variant is associated with the highest risk of severe COVID-19-associated pneumonia [4].
Genome-wide association studies have shown that the 3p21.31 region is associated with a twofold increased risk of respiratory failure [5].
The role of the host factors determining both susceptibility and infection severity has been shown for many pathogens, including Mycobacterium tuberculosis, HIV, Candida albicans, and many more [6].
It is known that the risk of severe infection is associated with the genes involved in the immune and inflammatory responses. Thus, the role of the minor alleles of TLR7 [7] and the interferon system genes [8] is well-known.
Along with the risk alleles, there are variants that have a protective effect. Thus, the TT genotype of the rs5443 SNP in the gene GNB3 is associated with protection against COVID-19 fatality [9]. Polymorphisms in the regulatory regions of the genes encoding pro-inflammatory cytokines that are capable of affecting the mRNA expression of these genes, such as C-572G rs1800796 in the promoter region of the gene IL-6, are associated with protection against severe COVID-19 and COVID-19 fatality in the Asian population [10].
METHODS
Clinical and demographic characteristics of surveyed patients
The patients admitted to the intensive care units (ICU) of the Sklifosovsky Research Institute for Emergency Medicine, Medsi Clinical Hospital № 1, City Clinical Hospital № 40 together with the staff members of the Sistema Public Joint Stock Financial Corporation and the Sistema BioTech laboratory were enrolled. A retrospective study was carried out (the disease outcome was known) that involved 713 patients (n = 713). Inclusion criteria: coronavirus infection of varying severity. Exclusion criteria: comorbidities of various types capable of dramatically affecting the patient's overall condition (cancer, CAD, immune defects).
To reveal the association of the studied variants with the severe and critical course of the disease, we divided the patients into two groups: the control group that included two categories of patients (patients with mild-to-moderate disease) and the experimental group (patients with severe disease). The patients were allocated to groups in accordance with the computed tomography-based (or MSCT-based) visual assessment scale for the lung damage developed during the pandemic of novel coronavirus infection by the experts of the US center for diagnosis and telemedicine after assessing CT images of 13,003 people who constituted the core sample. This classification is also widely used in Russian practice [24]. The lung CT scan data served as the criteria of disease severity: CT1 and 2 for the control group, CT3 and 4 for the group of patients with severe disease. The main characteristics of the studied cohort are provided in tab. 1.
Genomic DNA isolation
Venous blood stabilized with EDTA was used as biomaterial for extraction of genomic DNA.
DNA was extracted using the DiaGene kit (Dia-М; Russia) for DNA isolation from whole blood. The extracted DNA purity was determined with the NanoDrop OneC spectrophotometer (Thermo FS; USA). The A260/280 ratio was between 1.8 and 1.91, and the A260/230 ratio was between 1.62 and 2.28. DNA concentration was measured using the dsDNA BR kit in the Qubit Flex fluorometer (Thermo FS; USA). The concentration values varied between 15 and 300 ng/μL. The concentrations of all DNA samples were adjusted to 2 ng/μL.
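The adjustment to a common working concentration follows the usual dilution relation C1·V1 = C2·V2. The sketch below is only an illustration of that arithmetic: the function name and the 50 μL final volume are assumptions made for the example and are not taken from the study protocol.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul=2.0, final_volume_ul=50.0):
    """Return (stock_ul, diluent_ul) needed to reach the target concentration.

    Uses C1 * V1 = C2 * V2. The 50 uL final volume is an assumed value
    chosen for illustration, not a parameter reported in the study.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# The two extremes of the measured concentration range (15 and 300 ng/uL).
for conc in (15.0, 300.0):
    stock_ul, diluent_ul = dilution_volumes(conc)
    print(f"{conc:6.1f} ng/uL: {stock_ul:.2f} uL DNA + {diluent_ul:.2f} uL diluent")
```

For the most concentrated samples the required stock volume falls well below 1 μL at this final volume, so in practice an intermediate dilution step or a larger final volume may be preferable.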
DNA genotyping
DNA genotyping in the region of the studied polymorphic markers was performed by the real-time PCR with fluorescence detection using the TaqMan SNP kits for genotyping analysis (Thermo FS; USA). PCR was performed in the CFX96Touch System (BioRad; USA).
Genotyping results for a number of markers were subsequently confirmed by Sanger sequencing. The primers were designed using PrimerBlast (https://www.ncbi.nlm.nih.gov/tools/primer-blast/ (as at 24 September 2020)). The BigDye Terminator v3.1 kit (Thermo FS; USA) was used for cycle sequencing; sequencing was performed in the Genetic Analyzer 3500 system (Thermo FS; USA).
Statistical data processing
The two-tailed Fisher's exact test and chi-squared (χ2) test were used to calculate the statistical significance of differences in allele frequency. The calculations were performed for both carriage of the minor allele (AA vs. Aa+aa; dominant model) and the minor allele homozygous genotype (AA+Aa vs. aa; recessive model). The odds ratios (OR), 95% confidence intervals (CI), and significance levels were calculated for the groups in which significant differences were revealed. Calculations were performed using the SNPStats web tool (https://www.snpstats.net/ (as at 20 September 2021)) designed to reveal the associations between the single nucleotide polymorphisms and the risk of the disease. The Holm–Bonferroni method was used to adjust the OR values of significant polymorphisms for multiple comparisons. The samples, including all polymorphisms, were tested for the sample size sufficiency. The analysis was performed in the Statsoft Statistica 12 software package. To analyze the sample size required to achieve a power of 80%, we used the two-sample t-test for comparison of two population means (the null hypothesis was μ1 = μ2). The features of the SNPStats software package were taken into account, along with the values used: the means with the Akaike information criterion (AIC) values and Bayesian information criterion (BIC) values.
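The dominant and recessive models described above amount to two ways of collapsing the three genotype counts into a 2×2 table before computing an odds ratio. The following is a minimal sketch of that step with a Wald-type 95% CI on the log odds ratio. It is not the SNPStats implementation; the function names and the genotype counts are hypothetical and serve only to illustrate the calculation.

```python
import math


def collapse(counts, model):
    """Collapse (AA, Aa, aa) counts into (reference, risk) groups."""
    n_AA, n_Aa, n_aa = counts
    if model == "dominant":    # AA vs. Aa+aa
        return n_AA, n_Aa + n_aa
    if model == "recessive":   # AA+Aa vs. aa
        return n_AA + n_Aa, n_aa
    raise ValueError(f"unknown model: {model}")


def odds_ratio_ci(cases, controls, model, z=1.96):
    """Odds ratio with a Wald 95% CI (z = 1.96) for the chosen model.

    Raises ZeroDivisionError if any cell of the collapsed table is zero.
    """
    case_ref, case_risk = collapse(cases, model)
    ctrl_ref, ctrl_risk = collapse(controls, model)
    odds_ratio = (case_risk * ctrl_ref) / (case_ref * ctrl_risk)
    se = math.sqrt(1 / case_risk + 1 / case_ref + 1 / ctrl_risk + 1 / ctrl_ref)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical genotype counts (AA, Aa, aa); not data from this study.
severe = (100, 50, 10)
mild = (120, 40, 5)
for model in ("dominant", "recessive"):
    or_, lo, hi = odds_ratio_ci(severe, mild, model)
    print(f"{model}: OR = {or_:.2f} (95% CI: {lo:.2f}-{hi:.2f})")
```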
RESULTS
tab. 1 describes the studied sample in terms of the sample diversity.
The two-tailed Fisher's exact test and χ2 test were used to calculate the statistical significance of differences in allele frequency. Statistical calculations were based on the dominant model (involving identification of the risk allele) and recessive model (involving identification of the risk genotype). Furthermore, the odds ratios (OR), 95% confidence intervals (CI), and significance levels (p) were calculated for the groups in which significant differences were revealed.
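For readers unfamiliar with the test, a two-tailed Fisher's exact test on a 2×2 table can be sketched from the hypergeometric distribution using only the standard library. This is an illustrative sketch, not the routine used by the authors' software; the function name is hypothetical, and the two-tailed p-value is taken as the sum of the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        # with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(x_min, x_max + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Illustrative table only; not data from this study.
print(round(fisher_exact_two_tailed(8, 2, 1, 5), 6))
```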
Statistical significance was calculated using two models, the dominant and recessive ones, for all polymorphisms (depending on the patients' division into groups). The model was based on estimation of the risk of transition from mild to severe disease; allele frequencies were compared in patients of the control group and patients of the group with severe disease (tab. 2).
Since multiple hypotheses were tested during the study using the same dataset, the Holm–Bonferroni correction was used due to the multiple comparison problem in order to avoid type I errors.
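The Holm–Bonferroni step-down procedure can be sketched as follows. The procedure is defined on p-values: they are sorted in ascending order, the k-th smallest is multiplied by (m − k + 1), and monotonicity is enforced with a running maximum. The function name and the p-values in the example are hypothetical and are not results of this study.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-values, rejection flags) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiplier shrinks from m down to 1 as p-values grow.
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    reject = [p <= alpha for p in adjusted]
    return adjusted, reject


# Hypothetical raw p-values for four tests.
adjusted, reject = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
print(adjusted, reject)
```

Unlike the plain Bonferroni correction, which multiplies every p-value by m, the step-down multipliers make the procedure uniformly more powerful while still controlling the family-wise type I error rate.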
For two markers, the presence of the minor allele and of the homozygous genotype was associated with an increased risk of severe disease. These were rs1800629 in the gene TNF (OR = 1.5; p = 0.02) and rs17713054 in the gene LZTFL1 (OR = 1.60; p = 0.0043); rs17713054 also appeared to be pathogenic in the recessive model (OR = 4.56; p = 0.0025).
As for marker TLR2, the minor allele homozygous genotype exerted a protective effect in the dominant model.
All the markers identified are risk factors of both disease susceptibility and the transition from mild to more severe disease; however, only the variants provided in the above tables are significant.
Furthermore, the studied sample (n = 713) was also tested for deviations from Hardy–Weinberg equilibrium; the analysis was performed for the 10 studied markers (tab. 3).
Summation of the Hardy–Weinberg disequilibrium coefficients and their significance levels (P) revealed deviation from equilibrium for IFIH1 (p = 0.0047), CCR2 (p = 0.049), and significant deviation from equilibrium for IFITM3 (p = 0.00053).
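A test for deviation from Hardy–Weinberg equilibrium compares the observed genotype counts with those expected from the allele frequencies (p², 2pq, q²). The text does not state which test variant was applied, so the sketch below uses the common chi-squared goodness-of-fit test with one degree of freedom as an assumption. The function name and the genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-squared test (1 df) for deviation from Hardy-Weinberg equilibrium.

    Returns (chi2, p_value). Raises ZeroDivisionError for a monomorphic
    marker, where an expected count is zero.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele a
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: the first set is in equilibrium, the second is not.
print(hwe_chi_square(360, 480, 160))
print(hwe_chi_square(500, 200, 300))
```

For markers with rare genotypes, where expected counts are small, an exact test is generally preferred over the chi-squared approximation.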
Thus, the TNF rs1800629 and LZTFL1 rs17713054 variants may be considered as probable candidates for further assessment involving larger samples (tab. 4).
DISCUSSION
Our study demonstrates evidence of the human 3p21.31 locus involvement in pathophysiology of COVID-19 based on assessing the independent cohort of patients and the control group comprising people from the Russian Federation.
Our previous research analysed the distribution of 10 SNPs and the association of these SNPs with the disease severity [11]. In this study, which used a larger sample and an extended set of polymorphisms, the TLR7 polymorphism was excluded and the LZTFL1 polymorphism was added; the study revealed that only two polymorphisms, in the genes LZTFL1 and TNF, showed a strong association with severe COVID-19.
Among the studied polymorphisms, only two were strongly associated with severe disease. It is interesting to note that the results for some polymorphisms obtained during the earlier study [11] turned out to be non-significant after the sample expansion to 713 patients. However, an equally strong signal was found for rs17713054 (gene LZTFL1, locus 3p21.31). The second most important was TNF rs1800629. It is known that both polymorphisms are associated with severe COVID-19, and carriers of the minor alleles are at high risk of developing severe COVID-19 [12]. The rs17713054 polymorphism is located in the enhancer that regulates gene expression in this locus, including the expression of LZTFL1, SLC6A20, and genes encoding chemokines.
This polymorphism results in the emergence of a new binding site for the C/EBP beta transcription factor, which leads to the increased expression of LZTFL1 and neighbouring genes in this locus [13]. The LZTFL1 gene is involved in the ciliary function of the lung epithelial cells, which is important for airway virus clearance [14].
LZTFL1 is widely expressed in the lung epithelial cells, including the ciliated epithelial cells that have been identified as one of the key cellular targets of the SARS-CoV-2 infection. Furthermore, a homozygous deletion of LZTFL1 causes the classic ciliopathy, the Bardet–Biedl syndrome [15]; it is known that respiratory viruses may affect mucociliary clearance. LZTFL1 encodes a cytosolic leucine zipper protein that binds to E-cadherin (epithelial marker) and is involved in the transport of numerous signaling molecules. It is also known that LZTFL1 activation in the context of malignant neoplasms inhibits the epithelial-mesenchymal transition (EMT) pathway, which is known to be a part of the mechanisms underlying both wound healing and immune response [16]. The study of postmortem lung biopsies obtained from patients who died from COVID-19 complications showed a widespread epithelial dysfunction with the signs of EMT [13]. According to several studies, the signal in the 3p21.31 locus remains the most stable and strong. This is associated with both susceptibility to infection and the disease severity [17].
It has been shown that the rs17713054 risk allele A in the LZTFL1 gene enhancer is largely responsible for the increased risk of respiratory failure in COVID-19 patients that is associated with 3p21.31 [13]. LZTFL1 is widely expressed in the lung epithelial cells, including the ciliated epithelial cells that have been identified as one of the key cellular targets of the SARS-CoV-2 infection [18].
The analysis of other LZTFL1 polymorphisms, such as rs11385942, revealed the increased risk of hospitalization (p < 0.01; OR = 5.73; 95% CI: 1.2–26.5 based on the allelic test) in the Colombian [19] and Latvian [20] populations; the rs35280891 intronic variant (p = 6.88 × 10−7; OR = 19.846, 95% CI: 5.728–68.761) was associated with severe disease in the Serbian population [21].
It is interesting to note that the rs17713054 minor allele A is a part of the extended haplotype inherited from the Neanderthals. Today, this haplotype is the major risk factor associated with developing severe symptoms after the SARS-CoV-2 infection. The differences in the haplotype frequencies between the populations of South Asia and East Asia led to a speculation that certain selective pressure, probably related to cholera, resulted in the haplotype spread across the population of South Asia [22].
Our study confirmed the rs17713054 strong association with severe disease (OR = 4.56; p = 0.0025) in the recessive model.
It was also shown that both the minor allele T of the TNF rs1800629 variant and the minor TT genotype were the risk factors of severe disease in all the variants of samples (OR = 1.5; p = 0.02).
It is known that the rs1800629 variant of the gene TNF is associated with the need for respiratory support and the longer duration of respiratory support in patients with COVID-19 [23]. Along with the pathogenic variants, we also found the TLR2 rs1898830 variant that exerted protective effect (OR = 0.72).
The association study of 10 SNPs showed that two polymorphisms had enough power to be used for assessment of the risk of severe disease.
The association study of the LZTFL1 rs17713054 in the RF that involved the balanced sample of patients with severe and mild course of the disease revealed a strong association with the disease severity. This confirmed the hypothesis about the functional significance of this polymorphism.
A simple, rapid, and affordable test that predicts the risk of severe COVID-19 from individual DNA polymorphisms would be useful both for stratifying patients into those at high and low risk of complications and, more importantly, as a promising predictive test for assessing the likely disease severity in healthy people should they become infected.
CONCLUSIONS
Genetic testing of a sample of 713 patients with a confirmed diagnosis of COVID-19 revealed the key TNF and LZTFL1 SNPs associated with severe disease. A new genetic marker in the gene LZTFL1 with a predictive value exceeding 91% was characterized. The deviation of the allele frequencies of the genetic polymorphisms associated with the risk of severe COVID-19 from the Hardy–Weinberg equilibrium is of great epidemiological significance and requires further research. The TNF rs1800629 and LZTFL1 rs17713054 variants may be considered as probable candidates for further analysis involving larger samples. The clinical study conducted makes it possible to draw a number of conclusions for further clinical practice. The overall flow of patients admitted to the clinics due to the acute respiratory coronavirus infection is heterogeneous; at an early stage it can be divided into groups of patients with potentially mild and potentially severe disease based on the presence or absence of the selected TNF rs1800629 and LZTFL1 rs17713054 polymorphisms in the patient's genotype. The LZTFL1 and TNF polymorphisms may be used as both prognostic and predictive biomarkers. These markers can provide scientific grounds for new approaches to the genotype-directed treatment of patients with severe COVID-19-associated lung damage. The study opens a promising opportunity to organize affordable and efficient screening studies.