Footprints of interaction among Finnic-speaking, Slavic, and Turkic-speaking populations in modern gene pool and their reflection in pharmacogenetics

About authors

1 Bochkov Research Centre of Medical Genetics, Moscow, Russia

2 Vavilov Institute of General Genetics, Moscow, Russia

3 Biobank of North Eurasia, Moscow, Russia

4 Federal Research and Clinical Center of Physical-Chemical Medicine, Moscow, Russia

Correspondence should be addressed: Elena V. Balanovska
Moskvorechie, 1, 115522, Moscow, Russia; ur.liam@aksvonalab

About paper

Funding: the study was supported by RFBR grant 20-29-01017 Ancient DNA (bioinformatics analysis), RSF grant 21-14-00363 (analysis of pharmacogenetics markers), and State Assignment of the Ministry of Science and Higher Education of the Russian Federation to Vavilov Institute of General Genetics (cartographic analysis) and Bochkov Research Centre of Medical Genetics (data interpretation).

Acknowledgements: the authors thank all sample donors who participated in this study and the Biobank of North Eurasia for access to DNA collections.

Received: 2022-04-01 Accepted: 2022-04-16 Published online: 2022-04-26

The genetic history of the Russian people involves contributions from pre-Slavic populations and genetic footprints of the Golden Horde invasion. Gene pools of modern Russians are thought to result from interactions of three ethnic layers: pre-Slavic (Finnic-speaking), Slavic, and Golden Horde (Turkic-speaking). From the perspective of population genetics, these interactions have diverse projections, including Y-chromosome phylogeography, selectively neutral ancestral components of autosomal genomes, and selectively relevant pharmacogenetic landscapes of DNA markers that determine drug sensitivity. However, the degree of interaction varies significantly within the indigenous geographical range of Russians [1]. For an informative analysis, it is reasonable to focus on a nodal territory with the highest possible degree of interpenetration of the three genetic influences [2]. An excellent candidate for this role is the Volga-Oka interfluve in general and the Ryazan region in particular.

In the second half of the 1st millennium AD, Slavic tribes started to penetrate into these lands, inhabited by Finnic-speaking and partly Baltic tribes, and the vectors of their migration were diverse. According to existing evidence, Slavic tribes initially arriving from southwestern territories were joined at the beginning of the 2nd millennium by Slavs from the northwest of Eastern Europe [1, 3–6]. In the early 11th century, the Murom principality was established, incorporating Ryazan lands [4, 7]. In the mid-12th century, the Murom principality split into two, with capitals in Murom and Staraya Ryazan. In 1237, the Ryazan principality became the first casualty of the Mongol invasion led by Batu; raids and devastation of Ryazan lands continued for over 350 years thereafter. In 1521, the Ryazan principality, having suffered a critical loss of its territories, ultimately came under the control of Moscow sovereigns, but even after subordination to Moscow, the ruin of Ryazan lands by Tatar raids continued until 1594. Taking into account the early military encounters of Ryazan people with the neighboring Volga Bulgaria (Ryazan campaigns against it in 1172 and 1183 are documented), the interaction of Ryazan people with the Turkic world, located at their borders, can be dated to the 12th century or earlier. In addition, the Ryazan region was, in a sense, an outpost that bordered on the Wild Field (rus. Dikoe Pole; the vast steppes sparsely populated by nomadic groups). It is therefore reasonable to view the Ryazan region as the major hub of interpenetration between gene pools of Slavic and Turkic-speaking populations, with corresponding genetic footprints in its modern Russian populations. The interaction between Slavic and Finnic-speaking tribes has an even longer history. Overall, the “nodal” territory of the Volga-Oka interfluve and Ryazan lands provides arguably the best model for studying genetic footprints of Finnic-speaking, Slavic, and Turkic-speaking tribes and peoples.

Modern methods of DNA analysis allow reconstruction of ancient genomes from excavated human remains [8–14]. However, the number of ancient genomes suitable for analysis is limited, especially for populations that, like the Slavs, practiced cremation of the dead. An important alternative source of information on population history is provided by modern genomes subjected to genome-wide genotyping or sequencing [15–19]. Such data are most appropriately handled with ADMIXTURE, a tool for modeling ancestral components of autosomal genomes [20].

Genetic interactions among peoples of the Indo-European, Uralic, and Altaic language families have been considered in a number of studies applying genome-wide analysis to the modern gene pool of Northern Eurasia [21–26]. For instance, a reconstruction of gene pools of Balto-Slavic populations based on a genome-wide panel [21] reveals the genetic proximity of the Balts (Lithuanians, Latvians) to the Volga group of Finno-Ugric peoples and especially to Mordovians. The Slavs, both Eastern and Western, absorbed the local pre-Slavic Eastern European genetic substratum. A genome-wide study of modern ethnic groups populating the East European Plain [22] reveals an “East Asian” ancestral component contributing 20% to the gene pool of Bashkirs and 5% to the gene pools of Chuvashs and Volga Tatars. Another genome-wide study identifies a specific ancestral component shared by peoples of the Uralic language family, including Finnic-speaking Karelians, Mordovians, Mari, and Udmurts, and defining the degree of their genetic relationship [23]. A genome-wide genetic study of North Eurasian populations reveals three clines stretching from west to east [24]. Subsequent analysis shows that gene pools of Turkic-speaking and Uralic-speaking populations in Povolzhye (the Volga Region) are highly similar, although the Uralic-speaking populations genetically gravitate towards Trans-Ural Ugrians. Comparison of autosomal genome data between the Novgorod region and a wide range of populations in the European part of Russia and the Urals produced a hypothesis of considerable preservation of the local pre-Slavic population legacy in gene pools of the Novgorod region, which turned out to be closer to the eastern Finnic-speaking groups (Volga and Perm) than to the western (Baltic) ones [25]. Another important line of evidence is provided by pharmacogenetic studies, which enable the creation of cartographic atlases of subcontinents but consider local variants as well. For instance, Besermyans and Udmurts are pharmacogenetically close to indigenous populations of the Volga Region, Urals, and Southern Urals, but distant from inhabitants of more remote regions [26].

This study aimed to model ancestral components in order to reveal genetic footprints of interactions among Finnic-speaking, Slavic, and Turkic-speaking ethnic groups in the autosomal gene pool of modern Russian populations inhabiting the nodal region of the Volga-Oka interfluve. The second, more applied, goal of this study was to create maps of pharmacogenetic DNA markers and characterize the pharmacogenetic landscape of the studied geographic area.


Methodological and bioinformatics aspects of the analysis of autosomal gene pools using genome-wide panels have been described in detail previously [27]. Genotyping for a genome-wide panel of 4.5 million SNP markers was performed using the Infinium OmniExome BeadChip Kit (Illumina; USA) with an iScan system (Illumina; USA). Primary analysis and quality assessment of the data were carried out in the GenomeStudio v2011.1 software at a CallRate of at least 0.99.

The population genetic analysis for small panels of autosomal markers requires samples of at least 50 individuals. By contrast, genome-wide panels comprising millions of DNA markers afford a reliable output on much smaller samples of 5–10 individuals. Since the reduced sample size implies ultimate tightening of the sampling criteria, we emphasize that all genomes included in this study were selected in accordance with internationally recognized criteria [28]. In particular, genealogies of all participants, traced at least three generations backward, proved their origin from a given population and identification with a given ethnic group.

The “nodal” Ryazan region was represented by 20 genomes from 4 ethnic Russian populations (Mikhailovsky, Spassky, Sapozhkovsky, and Saraevsky districts), with Russian populations in Tver, Kostroma, Smolensk, Kaluga, Oryol, Tambov, and Nizhny Novgorod regions included for comparison.

The Finnic-speaking populations of the Volga-Ural region were represented by Mordovians (Erzya, Moksha, Shoksha), Mari, and Udmurts, whereas southern Karelians were included as the geographically closest representative of the western branch of the Finnic-speaking peoples. The Turkic-speaking populations of the Volga Region and the Urals were represented by Kazan Tatars and Chuvashs, with Astrakhan and Stavropol Nogais included for comparison. Identification of genetic footprints of Mongolic-speaking peoples involved genome-wide data for six tribal groups of Kalmyks.

The analysis of ancestral components was carried out using the ADMIXTURE bioinformatics tool for 248 individual genomes representing 47 populations of 9 ethnic groups (table), including 104 genomes from Russian populations, 81 genomes from four Finnic-speaking peoples, 47 genomes from three Turkic-speaking peoples, and 16 genomes of Mongolic-speaking Kalmyks. The ADMIXTURE tool provides a quantitative assessment of the contributions of different ancestral components to each individual genome [20, 29]. The ancestral components are modeled for the same uploaded set of genomes, with each level of modeling carried out independently. The number of ancestral components k is the only parameter specified by the user. At k = 2, contributions of two ancestral components are modeled for each genome; at k = 3, the tool presents the same genomes with three ancestral components; at k = 20, the tool reconstructs contributions of twenty ancestral components for the same set of genomes, and so on; as k increases, the patterns become increasingly elaborate. The contribution of a particular ancestral component to a gene pool is estimated by averaging its contributions to individual genomes.
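The averaging step described above can be illustrated with a minimal sketch. It assumes the per-genome ancestry fractions have already been read into plain Python lists; the function name, the population labels, and the toy values are hypothetical and are not data from this study.

```python
from collections import defaultdict


def population_contributions(q_rows, populations):
    """Average per-genome ancestry fractions within each population.

    q_rows: one list of k ancestry fractions per genome (each sums to ~1).
    populations: one population label per genome, in the same order.
    Returns {population: [mean fraction of component 1, ..., component k]}.
    """
    sums = {}
    counts = defaultdict(int)
    for row, pop in zip(q_rows, populations):
        if pop not in sums:
            sums[pop] = [0.0] * len(row)
        for i, value in enumerate(row):
            sums[pop][i] += value
        counts[pop] += 1
    return {pop: [s / counts[pop] for s in total] for pop, total in sums.items()}


# Hypothetical toy values at k = 3 (illustration only).
q_rows = [
    [0.90, 0.08, 0.02],
    [1.00, 0.00, 0.00],
    [0.50, 0.36, 0.14],
    [0.54, 0.32, 0.14],
]
populations = ["pop_A", "pop_A", "pop_B", "pop_B"]
means = population_contributions(q_rows, populations)
```

Because every genome's fractions sum to one, the population means also sum to one, so they can be read directly as shares of the gene pool.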

A series of pharmacogenetic maps were built to assess the interactions among Finnic-speaking, Slavic, and Turkic-speaking ethnic groups in pharmacogenetic perspective and estimate their impact on the modern pharmacogenetic landscape.

The mapping employed data on 42 key pharmacogenetic markers (the absorption, distribution, metabolism, and excretion (ADME) genes; pharmacological target-encoding genes; and hemostasis system genes) derived from the same genome-wide genotyping datasets previously used in the ADMIXTURE processing [26]. The incidence matrix for the studied 42 pharmacogenetic DNA markers comprised data for 16 pooled populations (to increase sample size). Calculation of Nei’s genetic distances (d) based on this matrix produced 42 partial maps showing the extent of pharmacogenetic similarity between Ryazan and other regions for each of the studied markers. The averaging of partial maps produced the map of average pharmacogenetic distances from Ryazan, reflecting its pharmacogenetic status with regard to other subjects within the studied geographic area.
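As an illustration of the distance calculation, the sketch below computes a per-marker distance for biallelic markers and then averages across markers. It assumes Nei's standard genetic distance, D = -ln(Jxy / sqrt(Jx Jy)), applied marker by marker; the text does not state which of Nei's distances was used or how GeneGeo implements it, so the formula choice, the function names, and the toy frequencies are assumptions.

```python
import math


def nei_distance_biallelic(p, q):
    """Nei's standard distance for one biallelic marker.

    p, q: frequency of the same allele in populations X and Y.
    """
    jx = p * p + (1 - p) ** 2          # identity within X
    jy = q * q + (1 - q) ** 2          # identity within Y
    jxy = p * q + (1 - p) * (1 - q)    # identity between X and Y
    if jxy == 0:
        return math.inf                # populations fixed for different alleles
    return -math.log(jxy / math.sqrt(jx * jy))


def mean_distance(freqs_ref, freqs_other):
    """Average of per-marker distances between a reference and another population."""
    distances = [nei_distance_biallelic(p, q) for p, q in zip(freqs_ref, freqs_other)]
    return sum(distances) / len(distances)


# Hypothetical allele frequencies for two markers (illustration only).
reference = [0.30, 0.10]
neighbor = [0.35, 0.20]
d_mean = mean_distance(reference, neighbor)
```

Averaging per-marker distances mirrors the averaging of partial maps described above; the more common multilocus form of Nei's distance instead averages the identities over loci before taking the logarithm, and the two can differ slightly.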

All maps of pharmacogenetic landscapes and ancestral components were built with the original GeneGeo mapping package [30] using the weighted average interpolation method with an influence radius of 400 km and a weight function value of 3. The genogeographic technology has been described in detail elsewhere [2, 31].
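The interpolation parameters can be made concrete with a minimal sketch of inverse-distance weighting. It assumes that the "weight function value of 3" is the exponent applied to distance and that distances are great-circle distances; neither detail is stated in the text, and this is not the GeneGeo implementation. All names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def interpolate(lat, lon, samples, radius_km=400.0, power=3):
    """Weighted average of sample values within radius_km of (lat, lon).

    samples: iterable of (lat, lon, value); weights are 1 / distance**power.
    Returns None when no sample lies within the influence radius.
    """
    numerator = denominator = 0.0
    for s_lat, s_lon, value in samples:
        d = haversine_km(lat, lon, s_lat, s_lon)
        if d == 0.0:
            return value               # grid node coincides with a sample point
        if d <= radius_km:
            weight = 1.0 / d ** power
            numerator += weight * value
            denominator += weight
    return numerator / denominator if denominator > 0 else None


# Hypothetical sample points: (latitude, longitude, mapped value).
samples = [(55.0, 38.0, 0.10), (55.0, 40.0, 0.30)]
midpoint_value = interpolate(55.0, 39.0, samples)
```

With an exponent of 3, nearby populations dominate the estimate at each grid node, while populations beyond 400 km do not contribute at all.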


Modeling of ancestral components for the studied scope of 47 populations was carried out at 13 levels of k, obtained by sequentially incrementing k by 1, from 2 to 14 inclusive. Two models turned out to be the most informative for solving the main problem: at k = 3 and k = 7 (table). The level of k = 3 reveals three ancestral components conventionally defined as “Western”, “Ural” and “Eastern”. At the level of k = 7, the ancestral components of the western and eastern Finnic-speaking peoples become separated for the first time, which allows differentiating their contributions. The estimated contributions to individual genomes for each of the identified ancestral components are given in table. Contributions to individual genomes for each ancestral component at k = 3, k = 7, and k = 8 are presented in fig. 1. The level of k = 8 preserves the contributions of all components identified at the previous steps of analysis, while the new eighth component further elaborates the structure of Russian populations.

To validate the observed trends, modeling at each level from k = 2 to k = 14 was run in 10 repeats (yielding a total of 130 models). At k = 3, all models were virtually identical; at k = 7, six of ten runs revealed stable ancestral components (these are described in the text of the article). In the remaining four runs, one of the ancestral components was replaced by an alternative, and each of these runs presented with a higher simulation error value.
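One way to check whether two repeat runs at the same k are "virtually identical" is sketched below. Because the order of components is arbitrary in each run, the sketch tries every column ordering and reports the smallest achievable maximum difference. The text does not describe how runs were compared, so this procedure, the function name, and the toy matrices are assumptions; brute-force matching is practical only for small k.

```python
from itertools import permutations


def best_alignment_gap(run_a, run_b):
    """Smallest max |difference| between two runs over all component orderings.

    run_a, run_b: per-genome lists of k ancestry fractions from two repeat runs,
    with genomes in the same order. A value near 0 means the runs agree.
    """
    k = len(run_a[0])
    best = float("inf")
    for perm in permutations(range(k)):
        worst = max(
            abs(row_a[i] - row_b[perm[i]])
            for row_a, row_b in zip(run_a, run_b)
            for i in range(k)
        )
        best = min(best, worst)
    return best


# Hypothetical toy runs at k = 3: run_b is run_a with its columns reordered.
run_a = [[0.95, 0.04, 0.01], [0.52, 0.34, 0.14]]
run_b = [[0.04, 0.01, 0.95], [0.34, 0.14, 0.52]]
gap = best_alignment_gap(run_a, run_b)
```

A run in which one component has been replaced by an alternative, as in four of the ten runs at k = 7, would show a large gap under every ordering.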


Modeling of three ancestral components

The analysis at k = 3 revealed three ancestral components conventionally designated “Western”, “Ural”, and “Eastern”. Most notably, the identified ancestral components poorly fit into the framework of three language families (Slavic, Finnic, and Turkic) (fig. 1, table).

“Western” ancestral component

This component prevails in all Russian populations (95%), but also in Finnic-speaking populations of Karelians (75%) and Mordovians (78%) (fig. 2A, table). Moreover, it constitutes a significant portion of gene pools in Turkic-speaking peoples: more than a half in Kazan Tatars (52%) and about a quarter in both Nogais (25%) and Chuvashs (23%).

“Ural” ancestral component

This component dominates in gene pools of Udmurts (99%) and Mari (91%) (fig. 2B, table). It is also prominent in Turkic-speaking peoples, accounting for two-thirds of Chuvash (67%) and a third of Tatar (34%) gene pools. A smaller but still substantive contribution of the “Ural” ancestral component is found in Karelians (24%) and Mordovians (19%). The average contribution of this component to Russian populations is small (4%) with the maxima in Kostroma and Nizhny Novgorod regions.

“Eastern” ancestral component

This component totally prevails (100%) in all six Kalmyk tribal groups included in this study, so it provides a suitable measure of the Central Asian influence on European gene pools (fig. 2C, table). This component is also prominent in Nogais (62%), which confirms its “Central Asian” status. Among Volga Region peoples, the highest Central Asian influence is observed in Kazan Tatars (14%) and Chuvashs (9%). In other studied gene pools, the “Eastern” influence is small, 5% (in Mari) or lower. Its average contribution to Russian gene pools is 1% (up to 3% in eastern districts of the Nizhny Novgorod and Ryazan regions).

Kazan Tatars

The “composite” nature of the gene pool of Kazan Tatars, represented by five populations, should be discussed in detail. The subtle interpopulation divergence is due to variable “Western” (48–60%) and “Ural” (26–38%) contributions accompanied by similar “Eastern” contributions (14–15%). The dominant “Western” component (a half or more) was followed by “Ural” (roughly one third) and “Eastern” (14%) components in all studied populations. Increasing the resolution of analysis by incrementing k revealed some minor ancestral components, but these were shared with other ethnic groups. The analysis identified no singular ancestral component for Kazan Tatars; the “composite” structure preserved at higher levels of k prevents using their gene pool for evaluation of the “Tatar” influence on the neighboring Russian populations.

Modeling of seven ancestral components

Analysis at k = 7 yielded four new ancestral components, though not as a result of simple branching of those identified at previous steps of the analysis. The picture is much more complex: the new components mosaically absorb the elements of “Western” and “Ural” components revealed at k = 3. It should be emphasized that the components were given “ethnic” names only for the sake of brevity.

“Karelian” ancestral component

This component, which marks the contribution of Western Finnic-speaking ethnic groups, accounts for 94% of Karelian gene pools and is minor in other populations (table), with secondary maxima in gene pools of Kazan Tatars and Kostroma Russians (11%).

“Slavic” ancestral component

This component dominates in all ethnic Russian populations (81% on average, within the total range of 70–87%) (fig. 3A, table) and is virtually absent in other gene pools with the exception of Kazan Tatars (6%). The accentuated presence of the “Slavic” component in the Tatar gene pool cannot be explained genealogically, given its 80% prevalence in individual genomes. Although this component is also detectable in Mordovian gene pools (3%), it is present in only 17% of genomes in the northwest of Mordovia.

“Mordovian-1” ancestral component

This component shares the second largest geographic range with “Mordovian-2” (fig. 3B, table). Reaching maximum (53%) in gene pools of Mordovia, it is also ubiquitously found in other populations. Its prominent contributions are characteristic of Turkic-speaking peoples: 36%, 35%, and 20% in gene pools of Kazan Tatars, Astrakhan Nogais, and Chuvashs. Importantly, the “Mordovian-1” component shows almost total prevalence in these ethnic groups, contributing to almost all individual genomes (table), which indicates its historical significance in gene pools of the Turkic-speaking peoples of Volga Region.

Contribution of the “Mordovian-1” ancestral component to gene pools of ethnic Russians is modest (7%) despite rather high prevalence (60% of individual genomes). The maxima are encountered in Tver (19%) and Kaluga (16%) regions, with a very high prevalence (80–90% of individual genomes; fig. 4); in other regions, the prevalence is lower (45–65% of genomes). Overall, the “Mordovian-1” ancestral component is ubiquitously found in gene pools of almost all Slavic-speaking, Turkic-speaking and Finnic-speaking populations within the studied geographic area.
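The distinction between contribution (the mean share of a component in a gene pool) and prevalence (the share of genomes that carry the component at all) can be illustrated with a short sketch. The detection threshold of 1% is an assumption, since the text does not state the cut-off used to count a component as present; the function name and toy values are hypothetical.

```python
def contribution_and_prevalence(shares, threshold=0.01):
    """Summarize one ancestral component across the genomes of a population.

    shares: the component's fraction in each individual genome.
    threshold: assumed minimum fraction for the component to count as present.
    Returns (mean contribution, fraction of genomes carrying the component).
    """
    n = len(shares)
    contribution = sum(shares) / n
    prevalence = sum(1 for s in shares if s > threshold) / n
    return contribution, prevalence


# Hypothetical toy values: a modest mean share spread over most genomes.
shares = [0.0, 0.0, 0.05, 0.10, 0.20]
contribution, prevalence = contribution_and_prevalence(shares)
```

In this toy case the component is carried by three of five genomes yet averages only about 7% of the gene pool, the same kind of pattern as a modest contribution combined with high prevalence.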

“Mordovian-2” ancestral component

This component (fig. 3C, table) is more distinct: it is already identifiable at k = 4, whereas the “Mordovian-1” component appears only at k = 7. Genomes of Mordovia show distinct clustering (fig. 1): one-fifth of them are 100% “Mordovian-1” and another one-fifth are 100% “Mordovian-2”. This component is found in gene pools of all studied populations (except Udmurts). In none of them, however, does its contribution exceed 5%, apart from, again, Kazan Tatars: with an average contribution of 6%, the “Mordovian-2” component is 90% prevalent in Tatars (in Chuvashs, it is found in only 40% of genomes).

In gene pools of ethnic Russians, the “Mordovian-2” component is relatively weak (3% on average) but ubiquitous. Moreover, it is encountered in 60% of individual Russian genomes, most commonly in eastern regions (Kostroma, Nizhny Novgorod, and Ryazan) (fig. 3C and fig. 4).

“Mari” ancestral component

This component first appears at the level of five ancestral components (k = 5) and almost totally prevails in the meadow Mari gene pool (96%). It also accounts for two-thirds of the Chuvash gene pool (62%), with similarly high levels in all Chuvash populations (57–65%) (table). Of other ethnic groups, the most significant contribution of the “Mari” component is encountered in Kazan Tatars (15% on average, with 100% prevalence in individual genomes). In other studied populations, contributions of the “Mari” component never exceed 4% (table).

“Udmurt” ancestral component

This component first appears at k = 3 and has already been described above as “Ural” (table, fig. 2B). At all higher levels it accounts for 100% of the Udmurt gene pool, while being minor (within 4%) in gene pools of other peoples. The only exception is, again, Kazan Tatars: the “Udmurt” component accounts for 10% of the gene pool and is present in almost all individual Tatar genomes with a maximal contribution of 21%.

“Kalmyk” ancestral component

This component captures the “breath” of Central Asia; it first appears at k = 2 and has already been described above as “Eastern” (fig. 2C). Among the studied populations, it is prominent only in gene pools of Kalmyks (100%) and Nogais (61%). Of other ethnic groups, its contribution is highest in Kazan Tatars (12%). Notably, the “Kalmyk” component was found in all individual genomes of Tatars, constituting 7–17%. In other studied gene pools, contributions of the “Kalmyk” component never exceeded 5% (table).

Ryazan gene pool

Four modern populations of ethnic Russians (fig. 1–3, table) provided a relevant model for the assessment of the mutual genetic influence of pre-Slavic, Slavic, and Turkic-speaking populations in the “nodal” Ryazan region. We picked one district (Mikhailovsky) at the very west of the Ryazan region and three districts (Spassky, Sapozhkovsky, and Saraevsky) located on the same transect from north to south, with Saraevsky being borderline. The analysis indicates a similar genetic constitution of the four gene pools, with certain differences in contributions of Finnic-speaking peoples: 19% in the borderline Saraevsky district and as low as 10–13% in the other three districts (table). Given the equally small Central Asian influence in all four populations (1–2%), this difference could not be directly related to the Golden Horde invasion, nor attributed to the influence of any known pre-Slavic tribe. The only suggestion to explain the distinctiveness of gene pools in the southeastern Ryazan lands is the higher influence of the Wild Field in this borderline area.

Pharmacogenetic status of Ryazan Russians

Analysis of genetic markers associated with pharmacologic phenotypes is a prerequisite in the transition to personalized medicine in terms of optimal drug choice and medication regimen adjustment. However, the majority of studies in this field have been focused on Western Europe and the results have little application to populations of Russia with their huge genetic diversity [31].

To assess the uniqueness of the pharmacogenetic landscape within the studied geographic area, a map of genetic distances (d) from Ryazan Russians was created using an extensive panel of pharmacogenetic markers (fig. 5). In contrast with the ancestral component maps based on selectively neutral DNA markers (fig. 2 and fig. 3), pharmacogenetic mapping revealed the highest proximity of Ryazan Russians to their Finnic-speaking neighbors, the Mordovian populations (0.03 < d < 0.04). One step more distant from Ryazan Russians in terms of pharmacogenetic status were Russian populations of Kaluga, Smolensk, and Kostroma regions (0.05 < d < 0.07), followed by Russians in Oryol and Tver regions (0.08 < d < 0.09). Next in similarity to Ryazan Russians were Tambov Russians and their eastern neighbors, the Finnic-speaking Mari and the Turkic-speaking Chuvashs (0.09 < d < 0.10). Pharmacogenetic portraits of Tatars and Udmurts were expectedly divergent from those of Ryazan Russians (0.11 < d < 0.15). Among Russian populations, the greatest pharmacogenetic divergence from Ryazan Russians was, most unexpectedly, shown by Russians of the adjacent Nizhny Novgorod region (0.11 < d < 0.12), despite the substantial similarity of selectively neutral genomic patterns between the two regions (fig. 1–4).

Overall, comparing the pharmacogenetic landscape with maps of selectively neutral genomic patterns demonstrates that optimization of healthcare programs at the regional level should not be based on the averaged genetic status of the target populations, but requires specific assessment of local pharmacogenetic landscapes.


Modeling of ancestral components for the autosomal gene pool of modern populations in the nodal region of interaction between Finnic-speaking, Slavic, and Turkic-speaking peoples revealed the following. (1) The Finnic-speaking ethnic groups of the Volga Region (Udmurts, Mari, and Mordovians) lack common ancestral components; instead, each of these groups has its own characteristic ancestral components. Remarkably, two ancestral components identified in the Mordovian gene pool can be traced in almost all populations within the studied geographic area, regardless of their linguistic affiliation, which allows us to suggest that the genetic portrait of the pre-Slavic population of the studied geographic area included two main “colors” preserved in the modern gene pool of Mordovia. (2) The contribution of Finnic-speaking ethnicities to gene pools of modern Turkic-speaking peoples in the Volga Region is enormous: ancestral components associated with Finnic-speaking ethnic groups constitute 81% and 94% in gene pools of Kazan Tatars and Chuvashs, respectively. (3) The gene pool of Kazan Tatars is the most “composite” of all gene pools in this study, most organically incorporating all ancestral components observed in other gene pools within the studied geographic area. Although the Central Asian influence is most pronounced among the Kazan Tatars, its contribution is low (12%), about seven times lower than the total contribution of the Finnic-speaking peoples (81%), which makes it a poor basis for evaluation of the “Tatar” influence. (4) Gene pools of the studied Russian populations represent a single cluster, which can be largely (80%) described by a single characteristic ancestral component. At the same time, gene pools of modern ethnic Russians incorporate all other ancestral components found in genetic landscapes of other ethnic groups within the studied geographic area. The “nodal” Ryazan region is fully archetypal in terms of common features of the Slavic cluster, with a somewhat increased total contribution of Finnic-speaking groups towards its southeastern border. (5) The picture of diversity based on selectively neutral ancestral components is complemented by unique pharmacogenetic landscapes revealed with a custom panel within the same genome-wide genotyping platform. The knowledge of pharmacogenetic parameters for a given population is essential for the future of personalized medicine and proper logistics of pharmaceuticals at the regional level with regard to the genetic diversity of modern Russian populations.