
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (CC BY).
ORIGINAL RESEARCH
Gene pool of the Ural-Volga region: genetic history of Mordovia’s population based on the Y-chromosomal haplogroup N3a1-Y23475 phylogeography
1 Bochkov Research Center for Medical Genetics, Moscow, Russia
2 Lomonosov Moscow State University, Moscow, Russia
Correspondence should be addressed: Anastasia T. Agdzhoyan
Moskvorechye, 1, 115522, Moscow, Russia; aagdzhoyan@gmail.com
Funding: State Assignment of the Ministry of Science and Higher Education of the Russian Federation for the Research Centre for Medical Genetics
Acknowledgements: the authors would like to thank all participants of the expedition survey (sample donors) and the Biobank of North Eurasia (for access to DNA collections).
Author contribution: Balanovska EV — management; Shtrunov-Shlykov AG — expedition survey of the populations of Mordovia; Ponomarev GYu, Voronina MM, Adamov DS — Y-SNP and Y-STR marker genotyping; Agdzhoyan AT, Ponomarev GYu, Adamov DS, Gorin IO, Potanina AYu, Koshel SM — statistical, phylogenetic, cartographic analysis; Balanovska EV, Agdzhoyan AT — study design and manuscript writing.
Compliance with ethical standards: the study was approved by the Ethics Committee of the Research Centre for Medical Genetics (protocol No. 1 dated 29 June 2020).
Y-haplogroup N3 is one of the basic components of the North Eurasian gene pool and is considered a marker of the ancient population expansion during which the Uralic languages spread [1–2]. Most of the haplogroup N3 range lies within Russia, where more than 1.6 million people from 20 ethnic groups currently speak languages of the Uralic linguistic group [3]. More than 90% of the Uralic-language speakers in the Russian Federation are the Finnic-speaking peoples of the Ural-Volga region, who belong to three language groups: Mari (Mari), Mordovian (Moksha, Shoksha, Erzya), and Permic (Besermyan, Komi-Zyryans, Komi-Permyaks, Udmurts). Among the branches of haplogroup N3 most common in Europe (N3a1, N3a3, N3a4), a highly significant correlation with the Uralic ancestral component of the autosomal genome was reported for haplogroup N3a1 only [4].
Haplogroup N3a1 is common in the gene pools of the peoples of the Ural region and rare outside it: it is found in Udmurts (67%), Komi-Zyryans (18–43%), Chuvash (20%), Khanty and Mansi (19%), Komi-Permyaks (12%), Mari (14%), and Mordvins (5–10%) [1]. The frequency of N3a1 varies considerably within its range: from 1% at the periphery (in Bashkirs, Belarusians, Karelians, Russians, Khakas) to the world’s maximum in Udmurts (67%). Like other N3 lineages, this haplogroup could have been brought to Europe in the Bronze Age by a population related to the Seima-Turbino transcultural phenomenon, although this hypothesis has not yet been confirmed by direct paleoDNA analysis [5–6].
The structure of haplogroup N3a1-B211, common in the Finnic-speaking peoples of the Ural-Volga region, and the time of origin of its branches are poorly understood, but the open YFull data provide a basic picture [7]. Among participants in commercial testing from the populations of the Ural-Volga region, a common branch is N3a1-Y23475, which is spread across the western part of the region: in Mordvins-Erzya, Volga Tatars, and Russians (Bryansk, Nizhny Novgorod, Penza, Kirov, and Sverdlovsk regions). According to the YFull data, the present-day diversity within N3a1-Y23475 accumulated on average over the last 2.4–2.7 thousand years.
The study aimed to investigate the gene geography and phylogenetic structure of haplogroup N3a1-Y23475 using extensive data on the populations of North Eurasia.
METHODS
Biological samples were collected, after informed consent had been obtained from the donors, during expedition surveys managed by Professor E.V. Balanovska and RAS Professor O.P. Balanovsky under the same program as described earlier [8]. The donors were unrelated adult males whose ancestors for at least three generations identified with the given ethnic group and were born in the given population. In Mordovia, the Moksha and Erzya ethnic groups were assessed, including the distinct group of Erzya living in the Tengushevsky District, hereinafter referred to as Shoksha.
DNA was isolated from venous blood samples using the QIAsymphony SP nucleic acid purification system or by phenol-chloroform extraction with proteinase K; the sample preparation steps have been described earlier [9]. Among 4051 samples (tab. 1), there were 395 samples from carriers of haplogroup N3a1-B211 from 29 populations of Eastern Europe, the Ural-Volga region, and Siberia. These samples were genotyped for the Y23475 SNP marker by real-time PCR using TaqMan probes and the OpenArray technique in the QuantStudio 12 Flex thermocycler (Thermo Fisher Scientific, USA). A total of 78 carriers of the N3a1-Y23475 branch were identified. For 74 of these samples, fragment analysis results (37 Y-STR markers) were obtained using the commercially available Yfiler Plus (Thermo Fisher Scientific, USA) and PowerPlex Y23 (Promega, USA) kits and the Nanophore 05 genetic analyzer (Syntol, Russia).
Cartographic analysis was performed using the original GeneGeo cartographic software package [10] developed under the leadership of E. V. Balanovska and O. P. Balanovsky. The gene geographic map of the spread of haplogroup N3a1-Y23475 was created from the genotyping data by weighted average interpolation (radius 400 km, weight function degree 3).
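The interpolation step can be illustrated with a minimal sketch. It assumes that "weight function degree 3" means inverse-distance weights of the form 1/d^3 applied to populations within 400 km of a map node; the exact weight function used by GeneGeo is not specified here, so this is an illustration rather than a reimplementation. The function names and the two sample points are hypothetical placeholders, not study data.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def interpolate_frequency(lat, lon, populations, radius_km=400.0, power=3):
    """Weighted average of population frequencies around a map node.

    populations: list of (lat, lon, frequency) tuples.
    Assumption: weight = 1 / distance**power, populations beyond
    radius_km are ignored. Returns None if no population is in range.
    """
    num = 0.0
    den = 0.0
    for p_lat, p_lon, freq in populations:
        d = haversine_km(lat, lon, p_lat, p_lon)
        if d > radius_km:
            continue
        if d < 1e-9:
            return freq  # node coincides with a sampled population
        w = 1.0 / d ** power
        num += w * freq
        den += w
    return num / den if den > 0 else None

# Hypothetical sampled points (placeholder coordinates and frequencies)
points = [(54.0, 43.0, 0.25), (54.0, 45.0, 0.08)]
midpoint_value = interpolate_frequency(54.0, 44.0, points)
```

With a high power such as 3, the nearest sampled population dominates the interpolated value, which keeps local frequency peaks sharp on the map.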
The N3a1-Y23475 phylogenetic network was constructed by the median-joining method [11] using the Network v.10.2.0.0 software (Fluxus Technology Ltd, UK) and visualized in Network Publisher v.2.1.2.5 (Fluxus Technology Ltd, UK). Each of the 37 STR markers was assigned a weight of 10, with ε = 0. The time to the most recent common ancestor (TMRCA) for the entire phylogenetic network and for the identified clusters (tab. 2) was calculated by the ASD method [12]. We excluded the DYF387S1b locus because of partial AZFc deletions in the P1 palindromic sequence in the haplogroup N3-M178 samples. The mutation rate for the resulting 36-marker haplotype was selected based on worldwide data [13–14]: 0.0038 per locus per generation. The average generation interval for males was taken to be 31.5 years [15].
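The dating step can be sketched as follows. This is a simplified illustration that assumes the common form of the ASD estimator: the squared difference in repeat counts between each haplotype and an assumed founder haplotype, averaged over haplotypes and loci, divided by the mutation rate. The exact variant from [12] is not reproduced here, and the function name and the toy four-locus haplotypes are hypothetical, not data from the study. Only the mutation rate (0.0038) and the generation interval (31.5 years) come from the text.

```python
def asd_tmrca_years(haplotypes, founder, mutation_rate=0.0038, generation_years=31.5):
    """Estimate TMRCA in years from Y-STR haplotypes.

    haplotypes: list of lists of repeat counts (one list per sample).
    founder: assumed ancestral haplotype with the same locus order.
    """
    n_loci = len(founder)
    if any(len(h) != n_loci for h in haplotypes):
        raise ValueError("all haplotypes must have the same number of loci")
    # Sum of squared repeat-count differences from the founder
    total = sum((a - f) ** 2 for h in haplotypes for a, f in zip(h, founder))
    asd = total / (len(haplotypes) * n_loci)  # average squared difference
    generations = asd / mutation_rate
    return generations * generation_years

# Toy example: 4 samples, 4 loci, two single-step mutations in total
founder = [14, 12, 10, 11]
samples = [
    [14, 12, 10, 11],
    [15, 12, 10, 11],
    [14, 12, 10, 11],
    [14, 11, 10, 11],
]
age = asd_tmrca_years(samples, founder)
```

In this toy case the average squared difference is 2 / 16 = 0.125, so the estimate is 0.125 / 0.0038 generations multiplied by 31.5 years per generation.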
RESULTS
Gene geography of haplogroup N3a1-Y23475
The range of haplogroup N3a1-Y23475 extends from northwest to southeast (from the Vologda Russians to Mongols), but its frequency distribution is discontinuous and uneven (tab. 1).
Most of the haplogroup carriers (84%) are in the Finnic-speaking populations of the Ural-Volga region, where the N3a1-Y23475 frequency varies between 0.4% and 25% (fig. 1).
The haplogroup frequency reaches its maximum in the populations of Mordovia, which account for more than 70% of its carriers: 8% in Moksha, 9% in Erzya, and the maximum value (25%) in the Tengushevsky Erzya-Shoksha (hereinafter, Shoksha). N3a1-Y23475 is almost an order of magnitude rarer in the neighbouring Turkic-speaking peoples: Mishar Tatars (2%) and Kazan Tatars (1%), and it is extremely rare in Bashkirs (0.4%). The haplogroup is also rare in the Russian populations: 3% in the Vologda Region and 1% in the Belgorod, Kaluga, and Oryol regions (tab. 1). In Siberia, haplogroup N3a1-Y23475 has been found in Altaians (6%) and in one Mongol.
The reported gene geography of haplogroup N3a1-Y23475 raises two questions: 1) which factors led to its accumulation in Mordovia’s populations; 2) what its sources in Altaians are. Phylogenetic analysis was used to answer these questions.
Phylogenetic structure and chronology of clusters/branches
The samples from carriers of haplogroup N3a1-Y23475 were assessed using the panel of 37 Y-STR markers in all populations where the haplogroup was found. A total of 74 haplotypes were obtained (Table in Appendix), from which a phylogenetic network was constructed (fig. 2).
The use of 37 Y-STR markers provided high phylogenetic resolution: most haplotypes were distributed across six clusters (fig. 2). Four clusters (B, C, D, E) are absolutely specific: each is formed by samples from only one of Mordovia’s ethnic groups, i.e. Erzya, Moksha, or Shoksha. Clusters A and F include three specific subclusters, two of which (A1 and F1) are formed by haplotypes from Mordovia’s populations. The time of emergence was calculated for all clusters and subclusters (tab. 2: time to the most recent common ancestor, TMRCA).
Subcluster A1 and cluster B (fig. 2) of Moksha were formed within the same period, about 500 years ago (tab. 2). Subcluster A1 includes four different Moksha haplotypes from the Insarsky District of Mordovia (hereinafter, the birthplaces of the paternal grandfathers of the assessed individuals are specified). Cluster B is represented by the Moksha samples from three neighboring districts of Mordovia: four samples from the Insarsky District, three samples from the Atyuryevsky District, and one sample from the Kovylkinsky District.
Clusters C and E (fig. 2) of Erzya were formed about 900–1000 years ago (tab. 2). Cluster C includes representatives of the Ichalkovsky District of Mordovia, and cluster E includes mostly those of the adjacent Chamzinsky District.
Cluster D and subcluster F1 of Shoksha are far from each other in the phylogenetic network, although both originate from populations of the Tengushevsky District in northwestern Mordovia. Their ages differ almost twofold: subcluster F1 was formed about 500 years ago, and cluster D about 900 years ago (tab. 2).
All haplotypes of the Kazan Tatars and Mishar Tatars are located outside the clusters. Two Moksha samples, two Bashkir samples (fig. 2), and cluster D of Shoksha show an incidental resemblance to them.
The samples of Russians also do not form a separate subcluster; they fall into the most heterogeneous cluster F, along with the Moksha, Bashkir, and Mongol haplotypes and the subclusters of Shoksha and Telengit Altaians.
Haplotypes of Altaians are distributed across clusters A and F in the phylogenetic network. The age of cluster A, which includes a single sample from a Kumandin Altaian, is about 1150 years. This is twice the age of the Moksha subcluster A1, from which the Altaian haplotype originates. The samples of Southern Altaians-Telengits are grouped into the specific subcluster F2, which suggests their descent from a common ancestor. The time to the most recent common ancestor for subcluster F2 is about 800 years (tab. 2).
The age of the entire haplogroup N3a1-Y23475 calculated from the Y-STR haplotypes (2340 ± 330 years) is consistent, within the margins of error, with the SNP-based estimate of the YFull team [7] (2700 ± 300 years). Matches in the YFull phylogenetic tree can be found for half of the Y-STR clusters identified in the population-based study (tab. 2).
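The agreement between the two age estimates can be checked with simple arithmetic. The sketch below treats each "±" value as a symmetric margin around the point estimate, which is an assumption (the text does not state whether these are standard errors or confidence limits); the function name is illustrative.

```python
import math

def intervals_overlap(mean_a, err_a, mean_b, err_b):
    """Return True if [mean - err, mean + err] intervals share any point."""
    lo_a, hi_a = mean_a - err_a, mean_a + err_a
    lo_b, hi_b = mean_b - err_b, mean_b + err_b
    return max(lo_a, lo_b) <= min(hi_a, hi_b)

str_age, str_err = 2340, 330   # Y-STR based estimate, years
snp_age, snp_err = 2700, 300   # YFull SNP based estimate, years

overlap = intervals_overlap(str_age, str_err, snp_age, snp_err)
gap = abs(snp_age - str_age)                       # 360 years
combined_err = math.sqrt(str_err**2 + snp_err**2)  # about 446 years
```

The intervals 2010–2670 and 2400–3000 years overlap, and the 360-year gap between the point estimates is smaller than the combined margin of about 446 years, so the two estimates are compatible under this reading.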
DISCUSSION
The detected accumulation of haplogroup N3a1-Y23475 in the gene pools of Mordovia’s population and the two episodes of population growth reflect its demographic history. The population growth about 500 years ago could have resulted from the liberation of the population from dependence on the Mongol Empire and the Golden Horde (which lasted from the mid-13th century to the late 15th century). The population growth about 1000 years ago is thought to be associated with the formation of the ethnic foci of Moksha (on the Tsna River and in Prisurie) and Erzya (in Poteshie) in the 10th century. Within the Shoksha range, the late Ryazan-Oka traditions, followed by the ceramic complex of the Shokshinsky burial ground, developed in isolation, inheriting and developing the features of the ceramics of the preceding period [16].
The distinctiveness of the Shoksha gene pool is evident from the high frequency of N3a1-Y23475 (25% vs. 8–9% in Moksha and Erzya) and from the analysis of the autosomal gene pool. The ADMIXTURE method revealed two ancestral components [17]. The first (Moksha-Erzya) is shared by the Moksha and Erzya populations only, while the second (Shoksha) is typical only of Shoksha populations. Both ancestral components are found in the genomes of most Russian populations, suggesting a contribution of the pre-Slavic population to the Russian gene pool (tab. 3).
The accumulation of haplogroup N3a1-Y23475 in the population of Mordovia can be explained by the consecutive effects of two factors: migration and the founder effect. The presence of this lineage in all of Mordovia’s populations suggests that the Mordvin proto-population inherited it from a single source (probably from incoming carriers of haplogroup N3a1).
The founder effect presumably manifested itself after the isolation of Mordovia’s ethnic groups. This explains the high specificity of the clusters and subclusters (accumulation of haplotype diversity within each population) and their structure in the phylogenetic network (fig. 2).
The results of the N3a1-Y23475 phylogenetic analysis make it possible to give a preliminary answer to the question of the origin of haplogroup N3a1-Y23475 in Altaians. The haplotype of the Northern Altaian-Kumandin is closest to the Moksha haplotypes (fig. 2, cluster A). Since the Kumandin population is small (2400 people), a single Moksha lineage could have appeared in Kumandins as a result of the mass resettlement of Mordvins to Altai in the 19th–20th centuries [18]: in the early 20th century, Mordvins were the third largest ethnic group in the Altai Region (after Russians and Ukrainians).
The question of which population was the source of branch N3a1-Y23475 in Southern Altaians is related to the question of its ancestral homeland, which could hypothetically be located in the Ural region (“Uralic ancestral homeland”) or in South Siberia (“Siberian ancestral homeland”).
Southern Altaians-Telengits form their own subcluster F2 in the phylogenetic network, which, along with two Belgorod samples and the Mongol sample, can be traced back to the most common Shoksha haplotype (fig. 2, cluster F). The subcluster F2 chronology suggests that the ancestor of today’s carriers of the N3a1-Y23475 branch could have emerged within the range of Southern Altaians-Telengits about 800 years ago (500–1100 years ago).
Reasons for the “Uralic ancestral homeland” hypothesis. The greatest diversity of both clusters and single N3a1-Y23475 haplotypes is observed in today’s populations of the Ural-Volga region, which indirectly indicates that the haplogroup may have originated in this area. In this case, the emergence of haplogroup N3a1-Y23475 in Southern Altaians and Mongols is associated with eastward migration from the Ural region. Furthermore, migration to Altai must have occurred no later than 500 years ago, since subcluster F2 is significantly isolated from the main pool of haplotypes from the populations of the Ural-Volga region.
Reasons for the “Siberian ancestral homeland” hypothesis. An alternative hypothesis places the ancestral homeland of haplogroup N3a1-Y23475 in South Siberia. This hypothesis is supported by the previously reported general east-to-west vector of the spread of haplogroup N [1]. In this case, subcluster F2 of Southern Altaians-Telengits preserves the genetic memory of the ancient carriers of haplogroup N3a1-Y23475 who migrated to the Ural region from South Siberia. Cluster F differs from the other clusters of the phylogenetic network (fig. 2) in having the highest population heterogeneity and the oldest age (1710 ± 290 years; tab. 2). The structure of cluster F contains a large number of reticulations, along with rather distant relationships between haplotypes from different regions and both subclusters. The calculated chronology suggests that cluster F was formed on average only 600 years later than the entire haplogroup N3a1-Y23475. Both observations suggest that in the past cluster F comprised many clusters, of which only isolated haplotypes have survived to the present.
The hypothesis of a South Siberian ancestral homeland of haplogroup N3a1-Y23475 is not contradicted by the pattern of its spread across today’s populations, since the high frequency and haplotype diversity in Mordovia’s populations formed only within the last millennium. The chronology of cluster F suggests that the South Siberian, some Volga region, and all Russian populations shared the same root about 1700 years ago. During this period, N3a1-Y23475 could have emerged in South Siberia in a population of haplogroup N3a1-B211 carriers who migrated to the west. The subcluster F2 founder may be a descendant of this population, which left sporadic traces in the gene pool of today’s Southern Altaians-Telengits.
The available data do not yet unambiguously confirm either hypothesis. It is therefore planned to analyze the entire complex of the Y chromosome haplogroups and haplotypes in order to trace genetic links between the gene pools of the Ural region and South Siberia.
CONCLUSIONS
Haplogroup N3a1-Y23475 was formed 2.3–2.7 thousand years ago, but its intense accumulation took place mostly in Mordovia’s populations since the 10th century. The high share of the haplogroup in all of Mordovia’s populations suggests that it originates from a common source. Two episodes of population growth in the ancestors of Mordvins can be traced: about 1000 years ago in Erzya and about 500 years ago in Moksha; both are observed in the Shoksha population. The low N3a1-Y23475 frequency in the Turkic-speaking and Slavic-speaking populations is associated with the presence of a substrate layer of Finnic-speaking peoples in their gene pools. The presence of haplogroup N3a1-Y23475 in Northern Altaians-Kumandins is likely to be associated with the migration of Mordovia’s population to Altai in the 19th–20th centuries. The source of N3a1-Y23475 in Southern Altaians-Telengits requires testing of two hypotheses: the “Uralic ancestral homeland” with small-scale migration to the east, and the “Siberian ancestral homeland” with migration from South Siberia to the Ural-Volga region about 1700 years ago.