ORIGINAL RESEARCH

Gene geography of pharmacogenetically significant CYP2C19 cytochrome superfamily DNA markers in the populations of Russia and neighboring countries

Balanovska EV1, Abdulaev ShP2, Gorin IO1, Belov RO1, Mukatdarova EA1, Pylev VYu1,3
About authors

1 Research Centre for Medical Genetics, Moscow, Russia

2 Russian Medical Academy of Continuous Professional Education, Moscow, Russia

3 Biobank of North Eurasia, Moscow, Russia

Correspondence should be addressed: Elena V. Balanovska
Moskvorechye, 1, 115522, Moscow, Russia; balanovska@mail.ru

About paper

Funding: the study was supported by the Russian Science Foundation grant № 21-14-00363 (bioinformatics, statistical and cartographic analysis), State Assignment of the Ministry of Science and Higher Education of the Russian Federation for the Research Centre for Medical Genetics (genealogical analysis, interpretation of the results).

Acknowledgements: the authors would like to thank all sample donors, who took part in the study, and Biobank of North Eurasia for access to DNA collections.

Author contribution: Balanovska EV — data analysis, manuscript writing, research management; Abdulaev ShP — descriptions of pharmacogenetic markers; Gorin IO — bioinformatics analysis; Belov RO — manuscript formatting; Mukatdarova EA — working with genealogical database; Pylev VYu — statistical analysis, cartographic analysis.

Compliance with ethical standards: the study was approved by the Ethics Committee of the Research Centre for Medical Genetics (protocol № 1 of 29 June 2020); all subjects provided informed consent to participate in the study.

Received: 2023-09-18 Accepted: 2023-10-18 Published online: 2023-10-31

The efficacy and safety of drug therapy depend largely on individual differences between patients. This is a pressing issue of modern pharmacotherapy, since the body’s genetic status accounts for up to 50% of the individual variability in pharmacological response. Selecting the drug and dosage based on the patient’s molecular genetic features is the subject of pharmacogenetics [1, 2], which aims to find the drug dose that is effective and safe for the particular patient [3].

Within the CYP450 cytochrome superfamily, the highly polymorphic CYP2C19 gene is one of the pharmacogenes most extensively studied from a clinical perspective. The CYP2C19 enzyme is involved in the biotransformation of a wide range of drugs, such as clopidogrel, omeprazole, lansoprazole, propranolol, diazepam, imipramine and some other antidepressants [4]. CYP2C19*2 and CYP2C19*3 have been shown to be associated with reduced metabolic activity of the enzyme [5], while CYP2C19*17 (rs12248560) is associated with enhanced metabolism of the enzyme substrates [5]. Clopidogrel is a prime example of a drug for which guidelines on the treatment regimen and dose adjustment have been developed. In individuals with the “normal” *1/*1 genotype, it is used in accordance with the instructions. The *1/*2, *1/*3, *2/*17, *3/*17 genotypes are characterized by a smaller decline in platelet aggregation relative to normal, higher residual platelet aggregation, and an increased risk of cardiovascular events. Accumulation of “slow” alleles (*2/*2, *2/*3, *3/*3) in the genotype is associated with low clopidogrel efficacy and high residual platelet reactivity. The group of “ultrafast” metabolizers (*1/*17, *17/*17) is characterized by increased antiplatelet activity and decreased residual platelet aggregation, which can increase the risk of hemorrhage [5]. The carrier frequency of various CYP2C19 SNP markers and the associated clopidogrel resistance show pronounced ethnoracial heterogeneity [6]: CYP2C19*2 is found in 15% of Caucasoids and 17% of Negroids, and is far more frequent in Mongoloids living in East Asia (31%). The opposite trend has been revealed for CYP2C19*17: it is common in Caucasoids (22%) and rare (1.5%) in Mongoloids of East Asia [6]. The CYP2C19*3 variant is rare: on average, it accounts for 1.4% of the global population [6].
In the Russian population the rate of CYP2C19*2 is about 11%, the rate of CYP2C19*3 is about 0.34%, and that of CYP2C19*17 is about 27% [7].
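The genotype groups described above for clopidogrel can be summarized as a simple lookup. The Python sketch below is for illustration only: the group labels and the function name `metabolizer_group` are assumptions introduced here, and the grouping simply mirrors the text rather than any clinical guideline.

```python
# Genotype groups as listed in the text; the labels are illustrative assumptions.
GROUPS = {
    "normal": [("*1", "*1")],
    "intermediate": [("*1", "*2"), ("*1", "*3"), ("*2", "*17"), ("*3", "*17")],
    "slow": [("*2", "*2"), ("*2", "*3"), ("*3", "*3")],
    "ultrafast": [("*1", "*17"), ("*17", "*17")],
}

# Use frozenset keys so that allele order does not matter (*1/*17 == *17/*1).
LOOKUP = {
    frozenset(pair): group
    for group, pairs in GROUPS.items()
    for pair in pairs
}


def metabolizer_group(allele_a, allele_b):
    """Return the group named in the text, or 'unclassified' if not listed."""
    return LOOKUP.get(frozenset((allele_a, allele_b)), "unclassified")


print(metabolizer_group("*17", "*1"))
print(metabolizer_group("*2", "*3"))
```

Genotypes involving alleles other than *1, *2, *3, *17 fall outside this table and are returned as unclassified.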

The principles of precision, preventive, and personalized medicine envision using genetic information to make clinical decisions. However, extensive use of pharmacogenetic testing (PGT) in clinical practice has a number of limitations. PGT remains an option that is not available in the regions with inadequate funding of the health sector. An important role is also played by the time of obtaining PGT results, which can be relevant when providing emergency care [8].

Population gene geography, which reveals the patterns in the distribution of pharmacogenetic biomarkers, provides one of the solutions [9–13]. Gene geography data can play an important role in clinical decision-making in a country as multi-ethnic as Russia. That is why assessing the carrier frequency of polymorphic genes in Russia is essential for the development and implementation of the principles of personalized medicine. Identifying the distribution patterns of important pharmacogenetic markers in the population of Russia makes it possible to identify ethnic groups and regions in which PGT of a large population of patients can be a clinically and economically advantageous solution: in these regions the decision that PGT is essential for therapy personalization can be made based on the patient’s ethnicity.

The study aimed to determine the frequencies of the major CYP2C19 cytochrome superfamily DNA markers (*1, *2, *3, *17) in the population of Russia and to identify the trends in their gene geographical variability.

METHODS

The study involved analysis of frequencies of the CYP2C19*1, CYP2C19*2, CYP2C19*3, CYP2C19*17 variants (hereinafter *1, *2, *3, *17) and their genotypes in indigenous peoples of North Eurasia and other regions. These four variants (*1, *2, *3, *17) belong to different CYP2C19 gene SNP markers and are not alleles of the same SNP; their frequencies were calculated using the PLINK 1.9 tool [14] and Python 3. The *1 variant is the sum of 11 SNPs of “normal” variants. Among them 7 SNPs were found in the databases and had representative frequencies, and the *1 frequency was calculated as the square root of the frequency of the combination of seven homozygotes. The genotype frequencies were calculated based on the *1, *2, *3, *17 variant frequencies in accordance with the Hardy–Weinberg equilibrium.
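The two calculations described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names and the example frequencies are hypothetical, and the example assumes the four variant frequencies sum to 1, which need not hold for data assembled from different samples.

```python
from itertools import combinations_with_replacement
from math import sqrt


def star1_frequency(homozygote_combo_freq):
    """*1 frequency as the square root of the frequency of individuals
    homozygous for the 'normal' allele at every SNP considered."""
    return sqrt(homozygote_combo_freq)


def hwe_genotype_frequencies(variant_freqs):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium:
    q_i**2 for homozygotes and 2*q_i*q_j for heterozygotes."""
    genotypes = {}
    # Iterating over the dict keeps the order in which variants were given.
    for a, b in combinations_with_replacement(list(variant_freqs), 2):
        qa, qb = variant_freqs[a], variant_freqs[b]
        genotypes[f"{a}/{b}"] = qa * qa if a == b else 2 * qa * qb
    return genotypes


# Hypothetical frequencies, not taken from the paper's tables.
freqs = {"*1": 0.60, "*2": 0.15, "*3": 0.02, "*17": 0.23}
expected = hwe_genotype_frequencies(freqs)
for genotype, value in expected.items():
    print(genotype, round(value, 4))
```

When the variant frequencies sum to 1, the ten genotype frequencies also sum to 1, which is a convenient sanity check.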

The analysis involved the use of the database “Pharmacogenetics of Population of Russia and Neighboring Countries” created by the research team and the GG-base (world’s populations) [15] compiled in accordance with [16] and assessed using various panels of SNP markers [9–13]. Populations with a sample size of n < 25 were included in large metapopulations together with other populations based on the commonality of ethnogenesis and region. The data on the peoples of the Caucasus on the scale of North Eurasia were represented by four subregional samples. The total samples for the CYP2C19 gene SNP variants were as follows: *1 — n = 2261 samples; *2 — n = 6346; *3 — n = 7517; *17 — n = 3313. The results were presented as tables (frequencies of SNP variants and their genotypes in 53 metapopulations of 13 world’s regions) and as a gene geographical atlas including maps showing the spatial variability of frequencies of SNP markers and their genotypes, as well as correlation maps demonstrating the association between the geographical variability of frequencies of all SNP markers. The following SNP variant variability parameters were provided: q̇ — average frequency; GST — interpopulation differences based on a certain variant (GST is the FST analog for biallelic cases); HS — heterozygosity level.
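For orientation, the three parameters can be computed from a list of population frequencies as sketched below. The text does not spell out the estimators, so this sketch assumes Nei's definitions for a biallelic marker (HS as the mean within-population heterozygosity, HT as the heterozygosity at the mean frequency, GST = (HT - HS) / HT) and unweighted averaging over populations; the function name is hypothetical.

```python
def diversity_stats(freqs):
    """Return (mean frequency, HS, HT, GST) for a biallelic marker.

    freqs: variant frequencies q in each population.
    Assumes Nei's estimators and equal population weights.
    """
    k = len(freqs)
    q_mean = sum(freqs) / k
    # Mean expected heterozygosity within populations.
    hs = sum(2 * q * (1 - q) for q in freqs) / k
    # Expected heterozygosity of the pooled (total) population.
    ht = 2 * q_mean * (1 - q_mean)
    gst = (ht - hs) / ht if ht > 0 else 0.0
    return q_mean, hs, ht, gst


# Two hypothetical populations with frequencies 0.1 and 0.3.
q_mean, hs, ht, gst = diversity_stats([0.1, 0.3])
print(q_mean, hs, ht, gst)
```

Identical frequencies in all populations give GST = 0, and GST grows as the populations diverge.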

The CYP2C19 gene geographic maps were plotted using the GeneGeo software [17] by weighted average interpolation with a 2nd degree weighting function and a range radius of 1500 km for North Eurasia and 5000 km for the world. In the tables each population was assigned a number that was shown on the maps, which made it possible to clearly identify and distinguish all the studied populations on the maps. For metapopulations, the trait frequency value was projected onto the coordinates of all input local populations. The genotype frequency maps were calculated for each node of the map based on the frequency values in that node of the maps for the *1, *2, *3, *17 variants in accordance with the Hardy–Weinberg equilibrium. Statistical parameters of the map are shown in a specific box of the legend of each map: K — number of input populations for map plotting; min — minimum trait frequency; max — maximum trait frequency; avr — average trait frequency; GST — interpopulation differences based on this trait; HS — heterozygosity level.
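The interpolation step can be illustrated with a short Python sketch. It is not the GeneGeo implementation: it assumes that the 2nd degree weighting function means inverse-distance weights of 1/d² and that distances are great-circle distances, and all names and coordinates are hypothetical.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


def idw_frequency(node, populations, radius_km=1500.0, power=2):
    """Interpolate a frequency at a map node from nearby populations.

    node: (lat, lon); populations: list of (lat, lon, frequency).
    Returns None when no population lies within radius_km.
    """
    num = den = 0.0
    for lat, lon, q in populations:
        d = haversine_km(node[0], node[1], lat, lon)
        if d > radius_km:
            continue  # outside the range radius
        if d < 1e-9:
            return q  # node coincides with a sampled population
        w = 1.0 / d ** power
        num += w * q
        den += w
    return num / den if den > 0 else None


# Two hypothetical populations on the equator, 10 degrees apart.
pops = [(0.0, 0.0, 0.1), (0.0, 10.0, 0.3)]
print(idw_frequency((0.0, 5.0), pops))
```

A node equidistant from both populations receives the plain average of their frequencies, and a node beyond the radius from every population is left undefined.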

The correlation maps were created by the 1100 km floating window method using the Kendall rank correlation coefficient. Correlation between two traits was calculated for all nodes falling into the specified window and assigned to the central node. Then this window was moved one node, and calculation was repeated. Thus, correlations were calculated for all nodes of digital mesh model of the map (81259 nodes) providing the basis for correlation pattern visualization.
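The floating window procedure can be sketched as follows. This is an illustration under assumptions, not the software used in the study: the window is a square measured in grid nodes rather than a 1100 km radius, the Kendall coefficient is the tau-b form computed directly from pair counts, and the function names are hypothetical.

```python
from math import sqrt


def kendall_tau(x, y):
    """Kendall rank correlation (tau-b) from concordant and discordant pairs."""
    conc = disc = tied_x = tied_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables
            if dx == 0:
                tied_x += 1
            elif dy == 0:
                tied_y += 1
            elif dx * dy > 0:
                conc += 1
            else:
                disc += 1
    denom = sqrt((conc + disc + tied_x) * (conc + disc + tied_y))
    return (conc - disc) / denom if denom else float("nan")


def correlation_map(grid_a, grid_b, half_window=1):
    """Assign to each node the correlation of two maps within its window."""
    rows, cols = len(grid_a), len(grid_a[0])
    out = [[float("nan")] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            xs, ys = [], []
            # Collect all nodes that fall into the window around (r, c).
            for i in range(max(0, r - half_window), min(rows, r + half_window + 1)):
                for j in range(max(0, c - half_window), min(cols, c + half_window + 1)):
                    xs.append(grid_a[i][j])
                    ys.append(grid_b[i][j])
            out[r][c] = kendall_tau(xs, ys)
    return out


# Hypothetical 3x3 frequency maps with exactly opposite gradients.
map_a = [[0.1, 0.2, 0.3], [0.2, 0.3, 0.4], [0.3, 0.4, 0.5]]
map_b = [[1 - q for q in row] for row in map_a]
same = correlation_map(map_a, map_a)
opposite = correlation_map(map_a, map_b)
```

Maps with identical gradients give a coefficient of +1 at every node, and maps with opposite gradients give -1, which corresponds to the red and blue zones of a correlation map.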

RESULTS

Gene geography of CYP2C19*1 (fig. 1; tab. 1, tab. 2)

Understanding the *1 “normal” variant gene geography (fig. 1) is beneficial for identification of patterns in variability of genotypes that are important for pharmacotherapy.

The main longitudinal trend, an increase in the *1 frequency from west to southeast, is combined with latitudinal variability in the Asian part of the region. The average *1 frequency (q̇ = 0.58) is much higher than that of variants *2, *3, *17 (0.04 < q̇ < 0.15). Therefore, to keep all maps comparable (fig. 1–4), the maps of variant *1 use the same frequency interval (0.125) as the other maps, but within a different frequency range (0.41 < q < 0.75).

The longitudinal trend of frequency increase from west to southeast shows many irregularities. Low frequencies (q < 0.44) are concentrated not in the west of the region, but in the plain stretching from the Russian Vologda and Kostroma regions to peoples of the Volga-Ural region, then to southern Russian populations and after that to peoples of Central and Eastern Caucasus. The second low frequency center is found in the north of the Far East (0.37 < q < 0.45). However, when moving southwards from the latter, the frequency increases rapidly to maximum values (0.70 < q < 0.78) found in Evenks of the Okhotsk Coast and peoples living along the Amur. Siberian peoples are characterized by great *1 genetic diversity: high frequencies (0.61 < q < 0.66) prevail in Eastern Siberia (Yakuts) and in the south of Western Siberia (Siberian Tatars), and decrease (0.52 < q < 0.56) in the north of Western Siberia and in Southern Siberia.

The combination of *1 longitudinal and latitudinal variability is even more prominent on the map of the “normal” *1/*1 homozygous genotype (fig. 1B) and the map of the “slow” *1/*2 heterozygote (fig. 1C). The “ultrafast” *1/*17 heterozygote (fig. 1E) associated with the risk of hemorrhage shows a clear but oppositely directed vector: the frequency drops from west (q = 0.40) to east (q = 0).

Gene geography of CYP2C19*2 (fig. 2; tab. 1, tab. 2)

The gene geographic variability of the *2 variant is similar to that of *1. At significantly lower frequencies (0 < q < 0.43, q̇ = 0.15; fig. 2A), two overlapping trends are found again. The main trend is once again longitudinal, with the *2 frequency increasing from west to southeast, where the primary maximum falls on Central Asia (0.20 < q < 0.31) with a frequency surge in the Amur region (q = 0.43). The latitudinal trend is observed in Siberia: the *2 frequency increases from the north toward Central Asia. Both trends show numerous irregularities.

In the European part of the assessed range, an additional *2 frequency maximum (0.19 < q < 0.24) is found in the northwest in the Veps, Sami, North Karelians, and Ingrian Finns. Frequencies above the average (0.17 < q < 0.18) are also found in peoples speaking Indo-European languages: in the Russian North and in the west of the European part of the region (Balkans, Belarus, west of Russia, Moldova, Ukraine).

In the Russian populations, the *2 frequency varies within a broad range (0.02 < q < 0.18, q̇ = 0.12). In the Cis-Urals area at q̇ = 0.11 it varies within a narrower range (0.06 < q < 0.15).

In the Trans-Urals region, unexpectedly large differences between the Ob-Ugrian peoples, Khanty (q = 0.05) and Mansi (q = 0.18), have been revealed. In the North Caucasus the *2 frequency varies within a very broad range: between q = 0.08 in Dagestan and q = 0.19 in the Chechens and Ingush. The expected frequency increase in the Kalmyks (q = 0.26) reflects the genetic legacy of their Central Asian ancestral homeland.

The trend becomes latitudinal in the Asian part of the region: the frequency increases from north to south. The range of populations with low frequencies is huge: from Khanty in the west (q = 0.05) to Kamchatka (q = 0.07) and Chukotka (q = 0) in the east. Southwards, it extends to South Siberia (q = 0.05) and the Baikal region (q = 0.06). Increased frequencies are reported in Buryats and Yakuts (q = 0.16). The Central Asian maximum in the west spans the Dungan people, Kyrgyz and Uighurs (q = 0.25), Mongols, Northern Altai people and Shors (q = 0.22), Siberian Tatars and Uzbeks (q = 0.20). In the Amur region, the *2 frequency in the Nanai people, Oroch people and Evenks is lower (0.18 < q < 0.19); however, an unexpectedly sharp surge is observed in the combined population of the most ancient Far Eastern peoples: Negidals, Nivkh people and Ulchis (q = 0.43).

Gene geography of the pharmacogenetically significant genotypes *2/*2 (fig. 2B) and *2/*3 (fig. 2D) is monotonous. The homozygote, which varies over a wide range (0 < q < 0.19), is spread across the region with low frequency (q̇ = 0.03), showing a slight increase toward Central Asia and reaching its maximum in the Far East. The *2/*3 heterozygote is almost absent in Europe, the Urals and Western Siberia; low frequencies (q = 0.07) are reported in Mongolia and the Amur region.

Gene geography of CYP2C19*3 (fig. 3; tab. 1, tab. 2)

Spatial variability of the *3 variant differs from that of the previous variants, showing a far stronger trend (fig. 3A). That is why *3 and *2 have the same level of interpopulation differences (GST = 0.02), despite huge differences in heterozygosity (*3: HS = 0.04; *2: HS = 0.13) and in the range of their frequencies (*3: 0 < q < 0.10; *2: 0 < q < 0.40).

Furthermore, the maximum *3 frequencies are once again concentrated in the southeast (0.08 < q < 0.12): in Transbaikalia, in Khalkha Mongols and in the Amur region. Some frequency increase is observed in the Evenks of the Okhotsk coast and the Chukchi people (q = 0.06), continuing west to a number of peoples of South Siberia (0.06 < q < 0.07) and Central Asia (0.05 < q < 0.06). A sharp frequency surge in the Mansi (q = 0.12) is the exception to this pattern.

In the European part of the region the *3 variant is missing or extremely rare. In Slavic populations, notable frequencies have been found only in the Belarusians, Russians of Arkhangelsk region (q = 0.03), and Russians of Yaroslavl region, as well as in the Central Caucasus (q = 0.02).

Gene geography of pharmacogenetically significant genotypes is discussed in other sections (fig. 2D for *2/*3; fig. 4D for *3/*17).

In general, the *3 variant is characterized by a gradual frequency increase from zero in the west of North Eurasia to low frequencies (up to q = 0.12) in the east and southeast of the region. However, these low frequencies turn out to be the maximum on a global scale (fig. 5A): the highest frequencies worldwide are concentrated in East Asia, with the maximum in the Amur region.

Gene geography of CYP2C19*17 (fig. 4; tab. 1, tab. 2)

The trend of *17 variability is much stronger, and the vector is oppositely directed: a steady frequency drop from maximum values (q = 0.32) in the west of North Eurasia to zero frequency in the east and southeast of the region. This clear variability, even with a low average frequency (q̇ = 0.13), results in an interpopulation difference value (GST = 0.08) 4 times higher than that of the other variants.

The maximum frequency range (0.27 < q < 0.32) included 12 populations out of 35 assessed using this SNP marker: all Slavic populations (Belarusians, Russians, Ukrainians), Mari and Chuvash of the Ural region, and peoples of Transcaucasia.

The next interval (0.20 < q < 0.25) brought together the Finnish-speaking peoples (all western Finnish-speaking populations and Komi-Permyaks), Ob-Ugrians, Turkic peoples of the Urals (Bashkirs and Volga Tatars), and peoples of Dagestan.

Only two European populations (Udmurts and peoples of South Europe) have been found among other populations showing above the average frequencies (0.13 < q < 0.18), and the trend is shifted southeastward: here peoples of Kazakhstan, Pamirs and Central Asia coexist with the populations of Western and Central Caucasus.

Only Asian peoples are found among populations showing below the average frequencies (0 < q < 0.12): South Siberia (Altai people, Tofalars, Tuvans, Shors), Baikal region (Evenks), Far East (Nanai people, peoples of Kamchatka, Evenks, Evens, Chukchi people), East and Central Asia.

Both genotypes of “ultrafast” metabolizers (*17/*17, fig. 4B; *1/*17, fig. 4C) are characterized by the same variability trend: obvious frequency drop from west to east. However, their variability ranges are different: the range of variant *17/*17 is small (0 < q < 0.10) and that of variant *1/*17 is 4 times larger (0 < q < 0.40).

The *3/*17 heterozygote (fig. 4E), which corresponds to intermediate metabolizers, occurs at extremely low frequencies (0 < q < 0.05) and is almost absent in the west and east of North Eurasia. It has a unique geography: the range where *3/*17 is present forms a continuous strip stretching from Ob-Ugrians in the north to peoples of Central Asia, and then eastward to Mongolia and East Asia.

DISCUSSION

North Eurasia is the area of the most ancient (since Paleolithic times) interactions between the major racial branches of humankind: western (Caucasoids) and eastern (Mongoloids). These interactions are clearly reflected by the CYP2C19 SNP marker maps (fig. 1–4). However, the maps show no less convincingly how difficult and imprecise it is to reduce the real picture of the variability of these variants to a straightforward scheme of two racial poles.

The information that CYP2C19*2 is half as frequent in Caucasoids (15%) as in Mongoloids of East Asia (31%) [5] is clearly insufficient. The true situation is much more complex: the actual *2 variability (fig. 2) represents the superposition of the latitudinal vector and an additional maximum in Europe on the common longitudinal vector in Siberia. That is why specific data on the peoples of Russia and neighboring countries are in such demand in pharmacogenetics. Such data on the frequencies of four CYP2C19 variants in multiple ethnic groups or their sets are provided in tab. 1: information about the *2 variant spans 79 populations with a total sample of 6346 individuals. With the *2 frequency ranging between q = 0 and q = 0.43, both its minimum (Chukchi people, q = 0) and maximum (Negidals, Nivkh people, Ulchis, q = 0.43) are in the same region, the Far East of Russia. This example demonstrates that it is impossible to think in terms of generalized ethnoracial categories and that a differentiated approach is needed. The CYP2C19*2 frequency in the Russian populations is about 11% [6], but it varies over a very broad range: between q = 0.02 in the Nizhny Novgorod region and q = 0.18 in the Arkhangelsk, Bryansk and Smolensk regions.

Sharper ethnoracial differences are reported for the *17 variant in the literature: between q = 0.22 in Caucasoids and q = 0.02 in Mongoloids of East Asia [5]. As for European populations, the papers report an increase in *17 frequency in Central and Eastern Europe (0.25 < q < 0.33) with a decrease in the north (0.19 < q < 0.22), south (up to q = 0.18) and west (q = 0.17) of Europe [18, 19]. Our data on the frequency of the *17 variant provided in tab. 1 for 35 populations with a total sample of 3313 individuals demonstrate a similar Eurasian *17 frequency range (0 < q < 0.32). Furthermore, in Caucasoids, the frequency varies between q = 0.15 in peoples of South Europe and q = 0.32 in the north in Russians of Arkhangelsk region. The wide variety of populations conditionally classified as Mongoloids has an equally large span: from 0 in the Far East to q = 0.31 in Mari of the Cis-Urals region. That is why it is necessary to assess the real picture of geographical variability instead of using the straightforward Caucasoid-Mongoloid scheme.

The data reported (tab. 1, tab. 2) provide important information about many ethnic groups of Russia and neighboring countries, populations of which have migrated en masse to Russia. However, these data cover only a part of the genetic diversity of peoples of our countries. That is why the knowledge of the gene geographical variability (fig. 1–5) predicting frequencies of the CYP2C19 clinically significant variants for peoples, information about which is currently missing in the databases and published papers, is so important.

Given the common patterns, first of all, it is necessary to emphasize that the geographical trend clarity is not dependent on the abundance of this or that CYP2C19 variant or its variability span (fig. 1–4).

Variants *2 and *17 are characterized by a similar variability span (0 < q < 0.4) and similar average heterozygosity (0.11 < HS < 0.12). However, while *17 demonstrates a strong trend of frequency decrease from west to east (fig. 4), the *2 variability is much more complex (fig. 2). This apparent difference between *2 and *17 is also indicated by the interpopulation variability (GST) value: *2 shows interpopulation differences (GST = 0.02) that are 4 times lower than those of *17 (GST = 0.08). However, the less frequent (0 < q < 0.1) marker *3 shows a very strong trend (fig. 3), which results in the same level of interpopulation differences (GST = 0.02) as in *2.

The correlation maps (fig. 5) demonstrate zones of similarity in the patterns of the *1, *2, *3, *17 frequency variability (positive correlations are highlighted in red) and zones where their variability vectors are oppositely directed (negative correlations are highlighted in blue). The set of six correlation maps (fig. 5B–G) demonstrates that, given the overall similarity of the *1, *2, *3, *17 frequency variability patterns, there are always regions in which the general pattern is replaced by a correlation of the opposite sign. The combination of longitudinal and latitudinal trends (red color) makes the similarity of the *1 and *2 gene geography obvious (fig. 5B); however, a number of exceptions are found in the north, showing negative correlation between the maps of variants *1 and *2 (blue color). The pronounced similarity of the patterns of the maps for *2 and *3 (fig. 5E) is disrupted in the northeast of the region (negative correlation highlighted in blue). Although the *3 and *17 frequency change vectors are generally oppositely directed (negative correlation), they are not always opposed: in some regions (foreign Europe, Ob-Ugrians, Middle Asia, South Siberia, north of the Far East) there is a positive correlation between these two maps (fig. 5G). In general, the correlation maps convincingly show that even when common trends in pharmacogenetic biomarker variability have been found, it is necessary to continue studying each of the peoples living in multi-ethnic Russia. An upcoming publication will show how high such variability can be, using the peoples of the Caucasus region as an example. That is why it is necessary to get closer to the real picture of the gene geographical variability of pharmacogenetic biomarkers and to create cartographic atlases for various regions of Russia.

Study limitations

The study is limited by the small sample sizes available for some populations (sample sizes are provided in tab. 1, tab. 2).

CONCLUSIONS

The paper provides data on the frequencies of the CYP2C19*1, *2, *3, *17 SNP markers and pharmacogenetically significant genotypes in the major ethnic groups of Russia and neighboring countries. The gene geographical variability of CYP2C19*1 (based on the data on 2261 individuals of 53 populations) combines a longitudinal trend of frequency increase from west to southeast of North Eurasia with a latitudinal trend of frequency increase from north to south in the Asian part of the region. The CYP2C19*2 spatial variability (6346 individuals, 79 populations) is similar to that of *1; however, both trends, longitudinal and latitudinal, are interrupted by local extrema. Gene geography of CYP2C19*3 (7517 individuals, 92 populations) shows a stronger longitudinal trend of steady frequency increase from 0 in the west to 12% in the east and southeast of North Eurasia. This is the world’s maximum: the high frequency area is located in East Asia with the peak frequency in the Amur region. The CYP2C19*17 gene geographical variability (3313 individuals, 35 populations) differs from that of the previous variants: it shows a strong, oppositely directed longitudinal trend of frequency decrease from west to southeast. The correlation maps of the CYP2C19*1, *2, *3, *17 variant frequencies reveal regions in which the main frequency variability patterns of these CYP2C19 gene variants are not similar. This fact is important for practical use in pharmacogenomics. Since the currently available data do not cover all peoples of Russia, the gene geographical variability maps make it possible, for the first time, to predict the frequencies of the CYP2C19*1, *2, *3, *17 variants and pharmacogenetically significant genotypes for populations about which information is missing.
