Copyright: © 2025 by the authors. Licensee: Pirogov University.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (CC BY).

ORIGINAL RESEARCH

Genetic portraits of the Volga–Oka region in the context of Central Russia’s gene pool (Y-SNP polymorphism)

About authors

1 Research Centre for Medical Genetics, Moscow, Russia

2 Lomonosov Moscow State University, Moscow, Russia

Correspondence should be addressed: Georgy Yu. Ponomarev
Moskvorechye, 1, 115522, Moscow, Russia; moc.liamg@009i62ts

About paper

Funding: the study was supported by the RSF grant No. 25-28-01594.

Acknowledgements: the authors would like to thank all participants of the expedition survey who provided their biological samples for the study, the administration and employees of the Ministry of Healthcare of the Ryazan Region and of the Republic of Mordovia for institutional support and assistance in conducting expeditions, and the Biobank of North Eurasia for access to DNA collections.

Author contribution: Ponomarev GYu — genotyping and Y-SNP marker analysis, study design; Shlykov AG — manager of the expedition survey of the gene pool of the Ryazan Region and Mordovia; Ponomarev GYu, Voronina MM, Petrov VA — expedition members, questionnaire survey data analysis; Adamov DS, Potanina AYu, Gorin IO — statistical analysis; Koshel SM — cartographic analysis; Adamov DS, Balanovska EV — study design and manuscript writing.

Compliance with ethical standards: the study was approved by the Ethics Committee of the Research Centre for Medical Genetics (protocol No. 1 dated 29 June 2020). The data were acquired after obtaining the written informed consent from the assessed individuals and anonymised.

Received: 2025-10-16 Accepted: 2025-11-22 Published online: 2025-12-01

Over the past decade, research on the origin of modern Slavic-speaking peoples has risen to a new level thanks to studies of modern and ancient DNA. Genetic data on autosomal DNA, the Y chromosome, and mitochondrial DNA make it possible to examine various aspects of the interaction between pre-Slavic and Slavic populations during the massive expansion of Slavic tribes across large territories of Europe in the 6th–12th centuries A.D. One of the main issues is the relationship between two processes: population replacement (complete substitution of the indigenous gene pool by that of the newcomer Slavic tribes) and assimilation of the pre-Slavic population (“slavicisation”: interpenetration of the Slavic and pre-Slavic gene pools accompanied by the adoption of Slavic culture). The historical, archaeological, linguistic, and genetic data accumulated to date do not yet allow a clear answer to this question.

A synthesis of data on the autosomal genome, Y chromosome, and mtDNA of the modern Slavic population has shown that most of the Balto-Slavic genetic variation is related mainly to the assimilation of pre-Slavic gene pools, which differed among Western, Eastern, and Southern Slavs [1].

In 2025, two important studies were published that analyzed the interaction between the pre-Slavic population and Slavic tribes using various systems of genetic data from modern populations and ancient DNA. The first assessed autosomal genomes of ancient samples (n = 555) from burials in eastern Germany, the northwestern Balkans, and Poland (with northwestern Ukraine) dating to the period before (6th–7th century) and after the Slavic expansion [2]. In addition to these three major territorial clusters, data for the Volga–Oka region are also provided. To calculate the extent to which the indigenous population was replaced, samples from the earliest Slavic inhumations, which replaced the cremation rite (Powiat Hrubieszowski, Poland, 600–900 A.D.), were used as a reference for calculating ancestral genetic components (qpAdm). Based on the results, the authors concluded that intense population replacement dominated in the territories of Slavic expansion: the Slavs replaced 82 ± 1% of the local gene pools in the northwestern Balkans, 83 ± 6% in eastern Germany, and 93 ± 3% in Poland and Ukraine. The values reported for the Volga–Oka region are much lower: 65 ± 4% of the indigenous population was replaced by the newcomer Slavic migrants [2].

The second study, performed by our team using Y-chromosome data, was limited to the northeastern periphery of the Slavic expansion: the Volga–Oka region [3]. Haplogroup R1a predominates in the modern Y gene pools of the population of the Volga–Oka interfluve (n = 935): 56% in Russians of the Ryazan Region and 44% in the indigenous population of Mordovia. We therefore focused mainly on a comprehensive analysis of detailed data on the Y-SNP and Y-STR polymorphisms of haplogroups R1a-CTS1211 and R1a-Z92, along with five ancient samples dating to the 6th–12th centuries A.D. Calculation of the time to the most recent common ancestor (TMRCA) by two independent methods revealed 10 informative clusters in the phylogenetic networks of 37-marker Y-STR haplotypes, aged 1600–2900 years, i.e. predating the beginning of the Slavic expansion. About half of the Russians of the Ryazan Region (carriers of the studied R1a branches) may be male-line descendants of the pre-Slavic population of the region, presumably the ancient indigenous Finnish-speaking tribes. In contrast to the findings of the first paper [2], we concluded that the Russian population of the Volga–Oka interfluve was shaped largely through cultural assimilation rather than complete replacement of the pre-Slavic population by the Slavs. This important conclusion is supported by earlier studies by our team that focused on the Y chromosome [1, 4–9] and the autosomal gene pool [1, 10].

The conclusion that about 65% of the gene pool of the pre-Slavic population of the Volga–Oka region was replaced [2] is based on ancient DNA data from the Suzdal Opolye (n = 31) acquired by another team, which recorded the gene pool changes associated with the spread of the Slavs but was more cautious in drawing conclusions about the scale of replacement [11]. These authors report that the medieval Suzdal principality was populated by various ethnic groups that eventually gave rise to the hybrid, but entirely Russian-speaking, population inhabiting the region today.

Further research is clearly needed to resolve an important question: what is the relationship between intense replacement of the indigenous population by the Slavs and interpenetration of their gene pools? The question is difficult to answer because consolidated data on Y-haplogroups in the various Slavic-, Finnish-, and Turkic-speaking populations of European Russia are lacking. This gap in our knowledge hampers further progress on the nature of the interaction between Slavic and pre-Slavic populations in Russia. The study aimed to perform a comparative analysis of the major Y-haplogroups of the Volga–Oka region in the context of the gene pool of the central part of European Russia, which would help fill the gap.

METHODS

The data on the present-day indigenous population of the Volga–Oka region were collected during expedition surveys in 2005–2023 (n = 1136 in total): in the Ryazan Region (n = 497; Kadomsky, Kasimovsky, Mikhaylovsky, Sapozhkovsky, Sarayevsky, Spassky, Shilovsky districts) and the Republic of Mordovia (n = 639; Ardatovsky, Insarsky, Ichalkovsky, Krasnoslobodsky, Lukyanovsky, Ruzayevsky, Tengushevsky, Torbeyevsky, Chamzinsky districts). Inclusion criteria: samples exclusively from unrelated males, whose ethnicity was confirmed at least three generations deep.

DNA was isolated by nucleic acid purification on magnetic particles in the QIAsymphony system (Qiagen; Germany). Y-SNP marker genotyping was performed by the TaqMan OpenArray method in the QuantStudio 12K Flex thermocycler (Thermo Fisher Scientific; USA) using custom plates.

The data on the gene pool of 80 indigenous populations of Central Russia (n = 9712) provided by the Biobank of North Eurasia [12] were assessed based on a single panel of 35 haplogroups: C-M217, D-M174, E-M96, G1-M285, G2-P15, G-M201(xM285,P15), H-M69, I-M170(xM253,M223,P37), I-M223, I-M253, I-P37, J1-M267, J2-M172, L-M20, N2-P43, N3a1-B211, N3a2-M2118, N3a3-CTS10760, N3a4-Z1936, N3a5a-F4205, N-M231(xP43,M178), O-M175, Q-M242, R1a-M198(xM458,CTS1211,Z92,Z93), R1a-CTS1211, R1a-M458, R1a-Z92, R1a-Z93, R1b-M343(xM269,M73), R1b-M269(xL51,Z2105), R1b-L51, R1b-M73, R1b-Z2105, R2-M124, T-M70. These were complemented with the published data on 10 populations of Latvia, Lithuania, Poland, Finland, Sweden, and Estonia [13–15]. Frequencies of the 35 Y-haplogroups were used to calculate a pairwise matrix of Nei's genetic distances between the 90 populations (DJgenetics [4]); the multidimensional scaling (MDS) plots were created in Statistica version 7.1 (TIBCO Software, USA).
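The distance calculation described above can be illustrated with a minimal sketch. It assumes Nei's standard (1972) genetic distance, with the haplogroups treated as alleles of a single locus; the text does not state which variant of Nei's distance DJgenetics uses, so the formula choice, the function names, and the toy frequencies are illustrative assumptions rather than the authors' implementation.

```python
import math

def nei_distance(p, q):
    """Nei's standard distance D = -ln(Jxy / sqrt(Jx * Jy)) for two
    haplogroup frequency vectors listed in the same haplogroup order.
    (Assumed variant; the source does not specify the formula.)"""
    jxy = sum(a * b for a, b in zip(p, q))
    jx = sum(a * a for a in p)
    jy = sum(b * b for b in q)
    if jxy == 0.0:
        return math.inf  # no shared haplogroups at all
    return -math.log(jxy / math.sqrt(jx * jy))

def distance_matrix(freqs):
    """Pairwise distance matrix for a list of frequency vectors."""
    n = len(freqs)
    return [[nei_distance(freqs[i], freqs[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical toy frequencies for three populations and three haplogroups.
toy = [
    [0.50, 0.30, 0.20],
    [0.45, 0.35, 0.20],
    [0.10, 0.20, 0.70],
]
matrix = distance_matrix(toy)
```

In this toy matrix the first two populations, whose frequencies are similar, are separated by a much smaller distance than either is from the third.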

Maps of the Y-haplogroup spread and maps of genetic distances were created in the GeneGeo 2.8 software package [16] by the weighted average interpolation method with an influence radius of 500 km and the weight function decrement 3.
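A rough sketch of this kind of interpolation is given below. It assumes that the "weight function decrement 3" means weights proportional to the inverse cube of the distance and that populations beyond the 500 km influence radius are ignored; the exact weight function of GeneGeo is not given in the text, so this is an illustration under those assumptions, with hypothetical function names and coordinates.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def interpolate_frequency(node, samples, radius_km=500.0, power=3.0):
    """Weighted average of sample frequencies at a map node.

    node    -- (lat, lon) of the grid node
    samples -- list of (lat, lon, frequency) for the studied populations
    Weights are 1 / distance**power (assumed form); samples farther than
    radius_km are skipped. Returns None if no sample is within reach.
    """
    num = den = 0.0
    for lat, lon, freq in samples:
        d = haversine_km(node[0], node[1], lat, lon)
        if d > radius_km:
            continue
        if d < 1e-9:
            return freq  # node coincides with a sampled population
        w = 1.0 / d ** power
        num += w * freq
        den += w
    return num / den if den > 0 else None

# Hypothetical sample points (coordinates and frequencies are made up).
samples = [(55.0, 39.0, 0.2), (55.0, 41.0, 0.4)]
midpoint_value = interpolate_frequency((55.0, 40.0), samples)
```

With a power of 3, nearby populations dominate the interpolated value, which keeps local maxima sharp on the map.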

RESULTS

Volga–Oka region gene pool diversity

Among the 28 Y-haplogroups found in the gene pool of the studied populations of the Volga–Oka region, the macro-haplogroup R1a predominates, accounting on average for 51% (Table). However, populations of the region differ considerably in its frequency: from a minimum of 33% in Moksha to a maximum of 64% in Russians of the Kasimovsky district, Ryazan Region. The majority of other haplogroups vary much more (fig. 1). For example, the frequency of haplogroup N3 varies between 5.5% in Russians of the Shilovsky district and 43% in Shoksha. The “Other” category (Table) includes haplogroups with frequencies below 5% in all studied populations (C-M217, G1-M285, I2-M223, J1-M267, L-M20, N2-P43, N-M231*, Q-M242, R1a-M198*, R1b-M269*, R1b-M73, R2a-M124, T1a-M70).
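The way rare haplogroups can be pooled into an “Other” category is sketched below. The sketch assumes that the criterion is a frequency below 5% in every studied population; the function name and the toy frequencies are hypothetical and do not reproduce the values of the Table.

```python
def collapse_rare(freqs, threshold=0.05):
    """Pool haplogroups that stay below `threshold` in every population.

    freqs -- {population: {haplogroup: frequency}}
    Returns (collapsed_table, list_of_pooled_haplogroups).
    """
    all_hgs = sorted({hg for pop in freqs.values() for hg in pop})
    rare = [hg for hg in all_hgs
            if all(pop.get(hg, 0.0) < threshold for pop in freqs.values())]
    collapsed = {}
    for name, pop in freqs.items():
        row = {hg: f for hg, f in pop.items() if hg not in rare}
        row["Other"] = sum(pop.get(hg, 0.0) for hg in rare)
        collapsed[name] = row
    return collapsed, rare

# Hypothetical toy data: "Q" is rare everywhere, "T" exceeds 5% in PopA.
toy = {
    "PopA": {"R1a": 0.60, "N3": 0.30, "Q": 0.04, "T": 0.06},
    "PopB": {"R1a": 0.50, "N3": 0.45, "Q": 0.03, "T": 0.02},
}
table, pooled = collapse_rare(toy)
```

A haplogroup that crosses the threshold in even one population keeps its own column, which is why the category is defined across all populations rather than per population.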

In the gene pool of Russians of the Ryazan Region, haplogroups are distributed by descending order of frequency as follows: R1a-CTS1211 (28%), R1a-Z92 (17%), I2a-P37.2 (9%), R1a-M458 (7%). This set of conditionally “Slavic” haplogroups is typical for the Russian populations of central and southern Russia in general [4]. Among the conditionally “Uralic” haplogroups the most common are N3a3 (7%) and N3a4 (4%).

The Erzya gene pool is close to that of the Russian populations of the Ryazan Region: R1a shows a high frequency (55%), and the branch R1a-CTS1211 (44%) is much more common in Erzya than R1a-Z92 (4%). In the Erzya gene pool, haplogroups R1a-Z93 (6.5%), N3a1 (9%), and I1-M253 (9%) are more frequent than in Russian populations.

In Shoksha (Tengushevsky Erzya), the macrohaplogroup N3 (43%) predominates, which makes Shoksha dramatically different from Erzya (13%). All three N3 branches in the Shoksha gene pool reach their maximum in the Volga–Oka region: N3a1 (23%), N3a4 (11%), N3a3 (9%). Based on the frequency of R1a-CTS1211 (40%) Shoksha are close to Erzya (44%).

The Moksha gene pool uniqueness is represented not only by the reduced frequency of R1a (33%), but also by the increased frequencies of haplogroups E-M78 (17%), J2-M172 (16%), G2-P15 (9%), R1b-L51 (7%).

The Y-haplogroup frequencies revealed significant genetic differences between the populations of Mordovia, which indicates that they should be considered as separate populations rather than combined into a single Mordva group.

Volga–Oka region gene pool position among populations of Central Russia

The detailed information on the gene pools of the indigenous population of European Russia available to our team allows us to consider the major patterns of gene pool variation and to assess the position of the Volga–Oka region gene pool within them.

Frequencies of 35 Y-haplogroups were used to calculate a pairwise matrix of Nei's genetic distances between 90 populations of Russia and the Baltics, as well as to create a multidimensional scaling (MDS) plot. Populations of Central Russia formed three major clusters in the genetic space (fig. 2): Slavic, Western Finnish, and Ural–Volga.
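How a distance matrix is turned into a two-dimensional plot can be illustrated with classical (Torgerson) metric MDS. This is a swapped-in technique chosen for brevity: the text does not specify which MDS algorithm was run in Statistica, and the function name, the power-iteration solver, and the toy matrix below are all illustrative assumptions.

```python
import math

def classical_mds(dist, dims=2, iters=500):
    """Classical MDS: embed points so that Euclidean distances between
    them approximate the entries of the symmetric matrix `dist`."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    # Double centring: B = -1/2 * (D^2 - row means - column means + grand mean)
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the dominant eigenvector of B.
        v = [math.sin(i + k + 1.0) for i in range(n)]  # deterministic start
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # nothing left to extract
            v = [x / norm for x in w]
        vv = sum(x * x for x in v)
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(v[i] * bv[i] for i in range(n)) / vv  # Rayleigh quotient
        if eigval > 1e-9:
            scale = math.sqrt(eigval / vv)
            for i in range(n):
                coords[i][k] = scale * v[i]
        # Deflate B so that the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j] / vv
    return coords

# Hypothetical toy distances (a 3-4-5 right triangle), not genetic data.
toy_dist = [[0.0, 3.0, 4.0],
            [3.0, 0.0, 5.0],
            [4.0, 5.0, 0.0]]
xy = classical_mds(toy_dist)
```

Because the toy distances are exactly Euclidean, the two recovered axes reproduce them; with real genetic distances the plot is only an approximation of the full matrix.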

The Slavic cluster is divided into two subclusters: the first includes the northernmost populations of Central Russia, and the second includes the bulk of Central Russia’s populations.

The subcluster of Slavs including 17 Russian populations of central and southern Russia turned out to be the most dense and compact in the genetic space. It is significant that Poles, representatives of Western Slavs, are very close to it (n = 598) [14]. This suggests a large geographic range of the populations included in the central Slavic subcluster, as well as considerable similarity of their gene pools. High frequencies of haplogroups R1a-CTS1211, R1a-Z92, R1a-M458, I2a-P37.2, and N3a3 are the subcluster characteristic features.

The northern subcluster of Central Russia’s Slavs combined five populations of the Kostroma, Novgorod, and Yaroslavl regions [4, 8, 9]. The uniqueness of its gene pool is manifested by higher frequencies of haplogroups E and R1a-M458 and by proximity to the gene pool of Finnish-speaking populations of northwestern Russia due to haplogroups I1-M253, I2a-P37.2, N3a4-Z1936, N2-P43, R1a-CTS1211, and R1b-L51.

The Western Finnish cluster included all the studied populations of Finnish-speaking residents of northwestern Russia: Veps, Votians, Izhorians, Karelians of Karelia, Tver Karelians, and Ingrian Finns (the relationship between these and the gene pool of the Russian North was reported in [8, 17]). The cluster gene pool combines increased frequencies of haplogroups of western (I1-M253, R1b-L51) and Uralic (N3a4, N2) origin.

The Ural–Volga cluster combined Finnish-speaking and Turkic-speaking peoples of the region. Its range in the genetic space is the largest, which suggests considerable differences between gene pools. Positions of populations on the plot do not depend on linguistic identity, but reflect their location in the geographic space. Their gene pools combine the “Uralic” haplogroups N3a1 and N2 and the conditionally “steppe” haplogroups E, G2-P15, J, R1a-Z93, R1b-M73, and R1b-Z2105 in different proportions. It is important to highlight two features of the Volga–Oka populations. First, populations of Mordovia (Erzya, Moksha, Shoksha) are genetically closest to the cluster of Russian populations, which confirms the conclusions of our previous study regarding a significant pre-Slavic substrate in the gene pool of the central and southern parts of European Russia [3]. Second, the Mishar Tatars and Chuvash are close to the Volga–Oka populations (fig. 3), which requires special discussion.

Gene-geographic landscape of Y-haplogroups in the indigenous population of Central Russia

The most important objective of the study is to analyze the gene-geographic landscape of the indigenous peoples of European Russia, which provides valuable information about the nature and intensity of interaction between the indigenous population and the newcomer Slavic tribes.

As it is not possible to present all 35 maps of the spatial variation of Y-haplogroup frequencies, we selected the six most typical maps (fig. 4): maps of the “Slavic” (R1a-CTS1211, R1a-M458, and I2a-P37.2) and “Uralic” (N3a1, N3a3, N3a4) haplogroups. Both terms are highly conditional, since the maps themselves demonstrate how large the ranges are. Moreover, the term “Uralic” here refers both to linguistic affiliation with the Uralic-speaking peoples and to the geographic range, which includes the Turkic-speaking peoples of the Ural–Volga region. The borders of the Ryazan Region and Mordovia are shown on the maps so that the position of the Volga–Oka region in the gene-geographic landscape of Central Russia can be seen.

Haplogroup R1a-CTS1211 (fig. 4A) reaches its maximum frequencies (shades of red) not only in Russians of Central and Southern Russia, but also in Erzya and Moksha of Mordovia [3]. Its average frequencies (shades of yellow) cover the entire periphery of the map, except for minima (shades of green) in Kazakhstan, Udmurtia, and Northwestern Finns.

Haplogroup R1a-M458 (fig. 4B) is typical of a more western portion of the range compared to R1a-CTS1211. The maximum frequencies are found in the populations of the northern subcluster of the MDS plot. The low-frequency zone (shades of green) stretches to Mordovia and Mari El.

Haplogroup I2a-P37.2 (fig. 4C), which has the same average frequency across the gene-geographic landscape (0.07 ≤ q ≤ 0.08) as R1a-M458, is characterized by a wider range of frequencies, clearer transition boundaries, and the position of maximum frequencies in the southwest of European Russia.

Haplogroup N3a1 (fig. 4D), as a truly “Uralic” one, is almost an alternative to the previous map of the “Slavic” I2a-P37.2. An absolute maximum (65%) is located in the north-east of Udmurtia, the boundary of average frequencies embraces populations are located in the low-frequency zone (shades of green).

Haplogroup N3a3 (fig. 4E) is traditionally regarded as “Uralic”, since it is typical of the Uralic-speaking northwestern peoples. The frequency maximum is located in the Baltics (28–58%), Fennoscandia, and adjacent regions of the Russian North [18], but a local maximum is also found in the Volga Region in Mari (15%). At medium frequencies, N3a3 is spread across almost all of European Russia, which reflects ancient “Baltic” influences.

Haplogroup N3a4 (fig. 4F) is distinguished by a clear gradient of the frequency decrease from north to south. Its maxima are located in Fennoscandia [15], but very high frequencies (up to 60%) are found in the northwestern Finnish-speaking peoples of Russia and in the Russian North. Furthermore, the medium frequency zone (shades of yellow) extends to the south to Ryazan and to the east to the foothills of the Urals (17% N3a4 in Bashkir).

Volga–Oka region gene pool position in the gene-geographic landscape of Central Russia

Visual assessment of the Volga–Oka region genetic similarity to various populations of European Russia is provided by maps of Nei's genetic distances (dNei) (fig. 5).

Genetic distances from Russians of the Ryazan Region (average values for seven districts; fig. 5A) confirm the similarity, detected by MDS (fig. 2), to the gene pools of almost all Russian populations of Central Russia (shades of green). A green spot of maximum similarity is also found in Mordovia, in Erzya. The zone of medium genetic distances (shades of purple and blue) covers the northern Russian populations, Eastern Slavs, and the entire Volga right bank.

However, the pattern of genetic distances from Erzya is surprising (fig. 5B): the “green” zone of high similarity covers only Russian populations, stretching to the Kaluga and Kursk regions. The “violet” zone of medium values is even broader: it stretches to the Belgorod, Bryansk, and Nizhny Novgorod regions, but among the “Uralic” populations it includes only the Mishar Tatars of the Trans-Kama region (Alexeyevsky, Alkeyevsky, Spassky, Chistopolsky districts of Tatarstan).

The uniqueness of the Moksha Y gene pool, noted above, is manifested by the absence of populations at a small genetic distance from Moksha (fig. 5C). However, the zone of medium values is rather large: along with Erzya and Shoksha, it includes the Mishar Tatars of Bashkiria and the Trans-Volga region (Drozhzhanovsky, Apastovsky districts of Tatarstan), the Chuvash of Bashkiria, and the Anatri Chuvash of Chuvashia.

All these populations are located in the Mordovia subcluster on the MDS plot (fig. 3). The Shoksha gene pool (fig. 5D) turned out to be closest to those of the Mishar Tatars of the Trans-Kama region and the Meadow Mari.

DISCUSSION

The analysis of all Slavic populations [1] showed a strong correlation between the data on the autosomal genome and the Y chromosome. This correlation was confirmed in the report discussed above: the replacement of the autosomal gene pool during the Slavic expansion was accompanied by a no less dramatic change of Y-haplogroups that resulted in the predominance of haplogroups R1a-CTS1211, R1a-M458, and I2a-P37.2 [2].

Therefore, in this study special attention is paid to the geographic landscape of the conditionally “Slavic” haplogroups. The total frequency of haplogroups R1a-CTS1211, R1a-Z92, R1a-M458, R1a-M198*, and I2a-P37.2, calculated for the set of 19 populations of Russians and 15 populations of Belarusians and Ukrainians, constituted 67% of the gene pool of Eastern Slavs. This value is fully consistent with the total frequency of these “Slavic” haplogroups in Russians of Central Russia (66%) and is about 3 times higher than in Finnish-speaking and Turkic-speaking peoples of the Ural–Volga region (21%). However, this parameter is 2 times higher in populations of Mordovia (42%) than in the Ural–Volga region as a whole (21%).
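The fold differences quoted in this paragraph follow directly from the reported totals, as the short check below shows. The percentages are taken from the text; the dictionary keys and the helper name are illustrative labels only.

```python
# Total frequencies of the conditionally "Slavic" haplogroups, as quoted above.
slavic_total = {
    "Eastern Slavs": 0.67,
    "Russians of Central Russia": 0.66,
    "Ural-Volga peoples": 0.21,
    "Mordovia": 0.42,
}

def fold_difference(group, baseline):
    """Ratio of the summed 'Slavic' haplogroup frequency in two groups."""
    return slavic_total[group] / slavic_total[baseline]

russians_vs_region = fold_difference("Russians of Central Russia",
                                     "Ural-Volga peoples")
mordovia_vs_region = fold_difference("Mordovia", "Ural-Volga peoples")
```

The first ratio is slightly above 3 and the second is 2, which matches the "3 times" and "2 times" statements in the text.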

The in-depth phylogenetic study of modern and ancient carriers of haplogroups R1a-CTS1211 and R1a-Z92 in the Volga–Oka region has shown that about half of the R1a-CTS1211 and R1a-Z92 haplogroup lineages on the eastern periphery of the Slavic expansion (Ryazan Region) are of pre-Slavic origin [3]. The frequency of these two haplogroups in Russians of the eastern districts of the Ryazan Region (46%) corresponds to their frequency in all Russian populations of central Russia (46%). This allows us to extend the lower estimate of the contribution of the pre-Slavic population to the Y gene pool of Russians of central and southern Russia to all populations of Russians in this large region: this is about 43%. This estimate represents the lower bound of the indigenous population's contribution to the Slavic gene pool, since it was obtained specifically for the haplogroups most typical of the Slavs. Further in-depth phylogenetic study of modern and ancient carriers of “Uralic” haplogroups in the Volga–Oka region will show how much higher the estimated contribution of the pre-Slavic population to the Russian gene pool could be.

Among the conditionally “Uralic” haplogroups, N3a1, N3a3, and N3a4 turned out to be the most common in the Volga–Oka region. However, we would like to emphasize again that the term “Uralic” is in this case both geographic and linguistic: it covers the spread of these haplogroups across both the studied peoples speaking Uralic languages (northwestern: Veps, Votians, Izhorians, Karelians, Ingrian Finns; and eastern: Besermyan, Mari, Moksha, Udmurt, Shoksha, Erzya) and the considered Turkic-speaking populations of the Ural–Volga region (Bashkir, Kazan Tatars, Kryashens, Mishar Tatars, Chuvash). The total frequencies of the “Uralic” haplogroups N3a1, N3a3, and N3a4 in these three groups of populations are distributed in the following descending order: northwestern Finnish-speaking peoples (48%), eastern Finnish-speaking peoples (37%), Turkic-speaking peoples of the Ural–Volga region (12%). Based on the set of “Uralic” haplogroups, populations of Mordovia (23%) occupy an intermediate position in the Ural–Volga region, approaching Russians of the Ryazan Region (11%) and Russian populations of Central Russia in general (11%) in this parameter.

An important finding of the analysis of the gene-geographic landscape of Central Russia was the “kinship”, discovered here for the first time, between the Y gene pools of Mordovia’s populations, all four studied populations of the Mishar Tatars of Tatarstan and Bashkiria, and the majority of Chuvash populations (Anatri, Anat Jenchi, and Chuvash of Bashkiria). The gene pools of this set of Finnish-speaking and Turkic-speaking populations form a common subcluster, which suggests their close genetic kinship. In the genetic space, this subcluster turned out to be close not only to the Ryazan Russians, but also to the entire set of Russian populations of Central and Southern Russia, including in the ratio of “Slavic” (35%) to “Uralic” (14%) haplogroups.

The fact that all populations of Mordovia together with the Mishar Tatars and Chuvash form a common genetic set allows us to hypothesize that all the populations of this cluster can be traced back to the common ancient Finnish-speaking population of the region. The sources of such unity can be different. It can be traced back to the early medieval Imenkovo archaeological culture or be related to the migration of ancient Balts [19–21]. However, the Meschera hypothesis of the Mishar Tatar origin should be tested once again. According to anthropological data [22], the Mishar Tatars are closer to Erzya than to Meschera, but the new data on the Y gene pool suggest that not only Erzya and Mishar Tatars, but all populations of Mordovia together with the Mishar Tatars and Chuvash form a common genetic set, which may result from the common origin of the tribes that included Meschera and related tribes before the Slavic expansion. These hypotheses can be tested through a more thorough assessment of the relevant gene pools using broad panels of Y-STR markers [3] and ADMIXTURE analysis of the ancestral components of the autosomal gene pool.

CONCLUSIONS

1. The findings of the gene-geographic analysis of variation of 35 Y-haplogroups in 80 populations of indigenous peoples of Central Russia (our own data) and 10 populations of the North European countries [13–15] are presented as multidimensional scaling plots and a series of gene-geographic landscape maps.
2. It has been shown that the gene pool of Russian populations of the Volga–Oka region (seven districts of the Ryazan Region) lies completely inside the compact cluster of populations of Central Russia; it belongs to the common gene pool of the Eastern and Western Slavs. The map of genetic distances from Russians of the Ryazan Region to all the studied populations of indigenous peoples of Central Russia has fully confirmed this conclusion.
3. The gene pool of Finnish-speaking peoples of the Volga–Oka region (Moksha, Erzya, Shoksha) belongs to the large cluster of the Ural–Volga populations, but it shows maximum genetic proximity to the Russian populations of Central Russia.
4. It has been found that the subcluster that is close to the Russian gene pool includes the majority of Chuvash populations and all the studied populations of the Mishar Tatars of Tatarstan and Bashkiria, along with Mordovia’s populations. It has been hypothesized that this set of Finnish-speaking and Turkic-speaking populations can be traced back to the gene pool of the ancient indigenous Finnish-speaking population of the Volga–Oka region.
