Peculiarity of Pomors of Onega Peninsula and Winter Coast in the genetic context of Northern Europe
The study of the genetic history of the Russian people covers an ever-expanding range of both Russian populations and their neighbors [1–18]. A clear “white spot” on the emerging panorama concerns the northernmost Russian populations of the White Sea region, the Pomors. Their importance goes beyond serving as a model for the evolution of populations on the very periphery of the ethnic range. The White Sea periphery is extremely peculiar in general: the history of the Pomors, a society greatly influenced by the sea, led not only to an unusual way of life but also to similarly unusual ways of interacting with the communities of Northern and North-Eastern Europe.
The “White Sea Pomors” moniker emerged as a name for Russians dwelling on the coast of the White Sea. They hunted sea animals, fished on the high seas, and traded by sea, and were distinguished by many peculiarities in their way of life and by the preservation of features of ancient Russian culture. The first mentions of permanent Russian settlements on the White Sea coast at the turn of the 14th century are linked to the Novgorod colonization [20–22]. According to the chronicles, the settlers met a Finnic-speaking population on these lands: the tribes of the Zavolochskaya Chud, often associated with the Vepsians. The settlement of the region, however, began in the Mesolithic period, about 8 thousand years ago. In the Neolithic period, traces of the two closely related Kargopol and White Sea archaeological cultures are recorded. A new wave of newcomers in the Bronze Age (4–3 thousand years ago) is linked with the Finno-Ugric population (primarily with the Sámi [23–28]), and the last, Slavic, wave of migrants is associated with the Novgorod colonization of the North.
Existence on the very northern periphery of the Russian people (where toponymy has preserved traces of the pre-Slavic population to this day), coupled with the unique culture and economy of the Pomors, suggests that their gene pool was likewise one of a kind; however, only a small sample of Pomors (n = 28) has previously been studied, using extremely narrow panels of Y-chromosome and mtDNA markers [1, 2]. In the broad context of the population of Northern Eurasia, the researchers considered Pomors a part of the “Northern Russian” population. The features of their gene pool were explained by possible assimilation of the Uralic- or Baltic-speaking population by the Slavs. Data on the autosomal genome of the Arkhangelsk Oblast Mezensky District Pomors (n = 96) indicate similarity with the Finnic-speaking population (more so with the Finno-Perm than with the Finno-Volga).
A wider range of populations, namely the Russian North gene pool, was characterized by four systems of markers (Y chromosomes, mtDNA, autosomal DNA markers, and surname frequencies). The analysis includes the populations of the Pinezhsky, Leshukonsky, Krasnoborsky and Lensky Districts of the Arkhangelsk Oblast; the scale of the genetic originality of those populations was shown in earlier research. mtDNA and autosomal DNA markers indicate the similarity of these populations to the Northern European population. The diversity of “paternal” lines, linked to the heritage of the most ancient Paleo-European population, reveals the similarity of the Russian North gene pool with that of the population of a vast territory from the Baltic States to the Pechora. Genomic data on these populations made it possible to analyze the genetic history of the Balto-Slavic peoples. On maps of genetic distances, the Russian North populations form one of the main patterns of the European gene pool. The search for genetic traces of Novgorodian colonization in the Russian North gene pool, carried out using a genome-wide panel of autosomal markers, revealed the absence of the “Novgorod” ancestral component in the north of the Arkhangelsk Oblast; in the Krasnoborsky and Lensky Districts in the south of the Arkhangelsk Oblast, the contribution of the “Novgorod” component amounted to no more than one third of the gene pool.
Even this brief review of the study of the gene pool of the Russian North reveals a serious lack of data on its northernmost periphery: the focus is either on the “mainland” populations of the Arkhangelsk Oblast, or on a small sample of Pomors using a narrow panel that comprises only a small fraction of the modern DNA marker spectrum.
New data on the gene pool of Pomor populations, obtained using an extensive panel of markers, can provide clues to understanding the enormous genetic diversity and originality of the populations of the Russian North [4, 5, 7]. Due to the tradition of patrilocality among Pomors and the high efficiency of studying “paternal lines” [1, 9–11], this work considers the polymorphism of Y-chromosome markers with the aim of solving two problems: creating “genetic portraits” of the three Pomor populations, all studied for the first time, and searching for genetic traces of Novgorodian colonization in their gene pool.
The indigenous population of the White Sea coast was studied using a modern panel of Y-chromosome markers (fig. 1). During the 2021 expedition, the settlements of the Onega Peninsula (Onega Coast and Summer Coast) and the western (Onega) fragment of the Winter Coast were surveyed (fig. 1); for brevity, all three populations are hereinafter referred to as “Onega Pomors”. The survey was subtotal: blood samples were taken in settlements with dense Pomor communities from almost all men meeting the inclusion criteria. The sample included only unrelated individuals whose ancestors (up to the third generation) belonged to the studied population and considered themselves Russians (or Pomors). Published data and unpublished data from the Biobank of Northern Eurasia were used to compare the collected samples with those of the indigenous populations of the European North.
Total DNA was isolated from venous blood samples using the magnetic particle method on an automated QIAsymphony facility (QIAGEN; The Netherlands). Genotyping was performed by real-time PCR using TaqMan probes and OpenArray technology on a QuantStudio 12 Flex amplifier (Thermo Fisher Scientific; USA) on the following panel of 60 Y-SNP markers: D-M174, E-M35, E-M78, C-M217, C-F3791, C-F5481, C-F3918, C-M48, C-SK1066, C-M407, G-M201, G1-M285, G2-P15, G2-FGC595, G2-M406, G2-P303, H-M69, I-M170, I-M253, I-P37.2, I-M223, J1-M267, J1-P58, J2-M172, J2-M12, J2-M67, J2-M9, L-M20, L-M317, T-M70, N-M231, N-M128, N-Y3205, N-M178, N-B211, N-M2118, N-CST10760, N-Z1936, N-F4205, N-B202, N-B479, O-P186, O-M119, O-P31, O-M122, O-P201, O-M134, Q-M242, R1a-M198, R1a-PF6202, R1a-Y2395, R1a-CTS1211, R1a-Z92, R1a-Z93, R1b-M343, R1b-Y13887, R1b-M269, R1b-L51, R1b-Z2105, R2-M124.
Statistical and cartographic analysis
A matrix of Nei’s pairwise genetic distances was calculated (in the original DJ program) based on the frequencies of the 14 Y-chromosome haplogroups identified in the three Pomor populations, and a multidimensional scaling plot was constructed in the Statistica 7.0 package (StatSoft; USA). Cartographic analysis was performed in the original GeneGeo software package using an extended spectrum of 26 Y-chromosome haplogroups characteristic of the region. Distribution maps of the 26 haplogroups were constructed according to frequencies from the Y-Base database (developed under the supervision of O.P. Balanovsky) using the weighted average interpolation method with an influence radius of 800 km and a weight function value of 3. The algorithm for creating each map of genetic distances consisted of two stages. First, a map of genetic distances from a given Pomor population to the interpolated values at each point of the map was created for each of the 26 haplogroups. Then the average genetic distances from the given Pomor population to each point of the map were calculated based on the resulting 26 maps. As a result, a map was created for each Onega Pomor population, showing the degree of genetic similarity of the studied Pomor population with each of the comparison populations.
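The two-stage mapping procedure can be illustrated with a minimal sketch. It is not the GeneGeo implementation: it assumes that the “weighted average interpolation” is inverse-distance weighting, that the weight function value of 3 is the exponent of the distance, and that the per-haplogroup distance is a squared frequency difference. All function names and the sample coordinates are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def idw_frequency(point, samples, radius_km=800.0, power=3):
    """Interpolate a haplogroup frequency at `point` (lat, lon).

    `samples` is a list of (lat, lon, frequency) tuples. Only samples within
    `radius_km` contribute; weights are 1 / distance**power (an assumption).
    Returns None when no sample lies within the radius.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, lon, freq in samples:
        d = haversine_km(point[0], point[1], lat, lon)
        if d > radius_km:
            continue
        if d < 1e-9:
            return freq  # the point coincides with a sampled population
        w = 1.0 / d ** power
        weighted_sum += w * freq
        weight_total += w
    return weighted_sum / weight_total if weight_total > 0 else None


def mean_distance_at_point(point, reference_freqs, samples_by_haplogroup):
    """Stage 1: per-haplogroup distance to the reference population.
    Stage 2: average over all haplogroups that could be interpolated.

    The per-haplogroup distance used here (squared frequency difference)
    is an illustrative assumption, not the measure used in the study.
    """
    per_haplogroup = []
    for haplogroup, ref_freq in reference_freqs.items():
        estimate = idw_frequency(point, samples_by_haplogroup.get(haplogroup, []))
        if estimate is not None:
            per_haplogroup.append((ref_freq - estimate) ** 2)
    return sum(per_haplogroup) / len(per_haplogroup) if per_haplogroup else None


# Hypothetical sampled populations: (latitude, longitude, frequency).
samples = [(64.0, 38.0, 0.2), (64.0, 40.0, 0.4)]
midpoint_estimate = idw_frequency((64.0, 39.0), samples)
```

Evaluating `mean_distance_at_point` over a regular grid of points would give one distance map per reference population, which is the role played by the maps in fig. 4.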
Y-chromosome haplogroup spectrum
Fourteen Y-chromosome haplogroups were found in the gene pools of the three Onega Pomor populations (fig. 2): E-M78, I1-M253, I2-P37.2, I2-M223, J2-M92, J2-M67, N2a-Y3205, N3a3-CST10760, N3a4-Z1936, R1a-PF6202, R1a-CTS1211, R1a-Z92, R1b-L51, T1a-M70 (haplogroups are hereafter referred to by their short names). Haplogroups I1, N3, and R1a were the most frequent; each constitutes circa 25% of the total Onega Pomor gene pool (fig. 2). Haplogroup R1a is represented by three branches (PF6202, CTS1211, Z92), haplogroup N3 by two (CST10760, Z1936). Haplogroups I2-P37.2 and R1b are next in frequency (each constitutes 8% of the gene pool); the rest are rare.
Despite the geographical proximity of the three Pomor populations (80–170 km; table), their genetic portraits differ noticeably, most significantly in four haplogroups: I2, N3a4, R1a, and R1b. Each of the three Pomor populations carries at least nine of the “general portrait” haplogroups, yet their haplogroup spectra are markedly different. The Winter Coast Pomors have a reduced frequency of the N3a4 haplogroup and an increased frequency of I2; in the Summer Coast Pomors, the haplogroup R1a-PF6202, characteristic of the other two populations, was absent, but the frequency of R1b was increased; in the Onega Coast Pomors, the frequency of N3a4 is high, but that of I2 is low (fig. 2).
There is a decrease in the share of haplogroups I1 and I2a from the east (Winter Coast) to the west (Onega Coast), while the opposite trend is true for haplogroups N3a3 and N3a4; this “longitudinal” trend is absent altogether in haplogroups R1a (increased frequency in the populations of the Winter and Onega Coasts) and R1b (maximum frequency on the Summer Coast). The frequency of R1a is high in the Winter Coast (29%) and Onega Coast (36%) gene pools, with all three branches of R1a found with a frequency of ≥ 5%. On the Summer Coast, however, the R1a frequency is about half as high, and only the R1a-CTS1211 (13%) and R1a-Z92 (2%) branches were found. The decreased frequency of R1a and the sharp increase in the frequency of R1b (20%), observed only in the Summer Coast population, may result from either genetic drift or migration flow. The frequency of R1b is also high in the Russians of the Arkhangelsk Oblast Pinezhsky District (fig. 2), but a different branch of R1b is common there. In Onega Pomors, the L51 branch was found, which is characteristic of the peoples of North-Western Europe rather than North-Eastern Europe. Phylogenetic approaches are necessary to link it to either migration or preservation of the ancient genetic landscape of the region.
Haplogroups R1a and N3a4 are frequent in Onega Pomors and other Arkhangelsk Oblast Russian populations alike (although N3a4 is rare in the Winter Coast population). The frequency of haplogroup I1, on the other hand, constitutes a major difference between the “coastal” Pomors and the “mainland” Arkhangelsk Oblast populations: on average, it makes up a quarter of the Onega Pomor gene pool (25%) despite not being typical for other northern Russians (12% in the Krasnoborsky District population, 1% in the Pinezhsky District population, and absent in the Leshukonsky District population).
Onega Pomors in the Northern Europe genetic spectrum
An obvious initial observation on the degrees of genetic similarity (table) is the surprising magnitude of genetic distances between Pomor populations (d = 0.28) despite their geographical and cultural proximity. Furthermore, the distance between Onega Pomors and other Russian populations is almost 3 times greater (d = 0.76), with some significant exceptions (table). The closest to the gene pool of the Onega Pomors (d = 0.29) was a geographically remote (about 500 km) Russian population in the Arkhangelsk Oblast Krasnoborsky and Lensky Districts; and even then, it is extremely close only to the Onega Coast Pomors (d = 0.15), but genetically distant from the Summer (d = 0.33) and Winter (d = 0.38) Coasts populations.
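For readers who wish to reproduce distances of this kind, the following minimal sketch computes Nei's standard genetic distance (the 1972 formulation) between two populations from their haplogroup frequencies. It assumes that this is the variant implemented in the DJ program, which the text does not specify. The function name and the example frequencies are hypothetical and are not the study's data.

```python
import math


def nei_distance(freq_a, freq_b):
    """Nei's (1972) standard genetic distance for one multi-allelic system.

    `freq_a` and `freq_b` map haplogroup names to frequencies (0..1).
    D = -ln( sum(x_i * y_i) / sqrt(sum(x_i^2) * sum(y_i^2)) )
    """
    haplogroups = set(freq_a) | set(freq_b)
    j_ab = sum(freq_a.get(h, 0.0) * freq_b.get(h, 0.0) for h in haplogroups)
    j_aa = sum(f * f for f in freq_a.values())
    j_bb = sum(f * f for f in freq_b.values())
    if j_ab == 0.0:
        return math.inf  # no shared haplogroups: the distance is unbounded
    return -math.log(j_ab / math.sqrt(j_aa * j_bb))


# Hypothetical frequencies for two populations (illustration only).
pop_x = {"I1": 0.5, "R1a": 0.5}
pop_y = {"I1": 0.25, "R1a": 0.75}
d_xy = nei_distance(pop_x, pop_y)
```

Because the distance is a negative logarithm of a normalized identity, it is zero for identical frequency spectra and can exceed 1 for strongly divergent ones, which is consistent with values such as d = 2.50 appearing in the comparisons.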
Russian populations outside of the Arkhangelsk Oblast generally show no genetic similarity with the Onega Pomors, except for Kostroma Oblast Russians (d = 0.50) and the Yaroslavl Oblast Mologa population (d = 0.63). Only Onega Coast Pomors are close to the Mologa gene pool (d = 0.17), while the Summer Coast and Winter Coast gene pools are extremely distant (d = 0.85). Previously, it was shown that among the Yaroslavl populations, Mologa specifically retained a clear genetic trace of a pre-Slavic population (presumably Meryans).
Among the other peoples of Russia, the Vepsians (d = 0.43) and the Northern Karelians (d = 0.46) are genetically the closest to the Onega Pomors, and again primarily to the Onega Coast Pomors (d = 0.23 and d = 0.12, respectively). The Onega Coast Pomors are genetically closer to their Finnic-speaking neighbors than to other Pomor populations (d = 0.28). But the representatives of the eastern wing of the Finnic-speaking peoples, the Udmurts, turned out to be the most genetically distant from the Pomors (d = 2.50), which contradicts the conclusion about the similarity of the gene pool of the Pomors and the Finno-Perm peoples.
However, the Onega Pomors show the greatest genetic similarity with the peoples of Northern Europe outside Russia (table): the genetic distance from the Pomors to the Swedes and Finns (d = 0.28) is the same as the average distance between Pomor populations, and the distance to the Sámi gene pool is half that (d = 0.14). Curiously, the distance to the gene pools of the Finns and the Sámi decreases fourfold from the Winter Coast westward to the Onega Coast. The genetic distances to the Scandinavians (Danes, Norwegians, Swedes), however, follow the opposite trend: the distance from the Onega Coast is twice as large as that from the Summer Coast and Winter Coast populations, which are equally close to the Scandinavians. While the Winter Coast Pomors are close only to the Scandinavians, the Summer Coast Pomors also show genetic similarity with a wide range of European populations, from the Germans (d = 0.36) to the Irish (d = 0.65).
Five tentatively named clusters are distinguished in the genetic space of multidimensional scaling (fig. 3) (the plot is based on 14 “Pomor” Y-chromosome haplogroups); average distances between populations (d͞ ) were calculated for each cluster. The “Slavic” cluster (d͞ = 0.05) included Belarusians, Ukrainians, Poles, Smolensk Oblast Russians, and Yaroslavl Oblast Russians. The related “Novgorod” cluster (d͞ = 0.06) unites all three Novgorod Oblast populations and the Pskov Oblast Porkhov population (Porkhov having once been part of the Novgorod lands), as well as the Finnic-speaking Vepsians and the Southern Karelians. The “Baltic” cluster (d͞ = 0.04) included all the Baltic peoples (Latvians, Lithuanians and Estonians) as well as the Pskov Oblast population (Ostrov group). The “Arkhangelsk” cluster (d͞ = 0.09) united the populations of the Arkhangelsk Oblast Pinezhsky and Leshukonsky Districts with the Yaroslavl Oblast Mologa population and the Tver Karelians.
The Pomors formed their own large cluster — the distances between the Pomor populations (d͞ = 0.28) are almost five times greater than the average distance within other clusters (d͞ = 0.06), and the area of the "Pomor" cluster is only slightly less than the sum of all four comparison group clusters, which included Finnic-speaking, Baltic-speaking and Slavic populations. But we emphasize that although the differences between the Pomor populations are great, they all took their own “Pomor” place in the genetic space of Northeastern Europe.
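The within-cluster averages compared above can be obtained as the mean of all pairwise distances among the members of a cluster. The sketch below assumes exactly that definition, which the text does not spell out. The population labels and pairwise values are placeholders, and only the two averages in the final ratio (0.28 and 0.06) are taken from the text.

```python
from itertools import combinations


def mean_pairwise_distance(members, distances):
    """Average distance over all unordered pairs of cluster members.

    `distances` maps frozenset({pop1, pop2}) to the pairwise distance.
    """
    pairs = list(combinations(members, 2))
    if not pairs:
        raise ValueError("a cluster needs at least two populations")
    return sum(distances[frozenset(pair)] for pair in pairs) / len(pairs)


# Placeholder cluster of three populations with made-up distances.
toy_distances = {
    frozenset(("A", "B")): 0.2,
    frozenset(("A", "C")): 0.3,
    frozenset(("B", "C")): 0.4,
}
toy_mean = mean_pairwise_distance(["A", "B", "C"], toy_distances)

# Ratio of the averages reported in the text: Pomor cluster vs. other clusters.
ratio = 0.28 / 0.06  # about 4.7, i.e. "almost five times greater"
```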
Maps of genetic distances (fig. 4), calculated from 26 Y-chromosome haplogroups typical of the entire region, help determine more accurately the regions whose gene pools are similar to those of the Pomors and significantly expand the range of comparison populations.
The total Pomor gene pool (fig. 4A) is genetically close to the southern part of Finland, as opposed to its north, represented by the Sámi.
The Onega Coast gene pool (fig. 4B) reveals a vast area of genetic similarity: it covers almost all of Finland in the west, is clearly delineated by the Northern Dvina and Sukhona in the east, and reaches the Yaroslavl and Leningrad Oblasts in the south and southwest. This area of similarity includes both Finnic-speaking peoples (Vepsians, Izhoras, Ingrians, Karelians, Finns) and those Russian populations in whose gene pool a significant contribution of the pre-Slavic population can be traced.
The Summer Coast gene pool (fig. 4C) showed the greatest similarity with the distant Swedes and Norwegians, and a less pronounced one with the Sámi (representing the very north of Scandinavia).
The Winter Coast gene pool (fig. 4D) is relatively close genetically to only a few populations of Finns and Swedes. This is the only Pomor population for which one may assume that its genetic portrait has largely been shaped by genetic drift. However, the population of the Winter Coast is so far represented only by its “Prionega” part (fig. 1). The study of the gene pool of the entire Winter Coast is currently underway, which will soon make it possible to draw a reasonable conclusion about its genetic history.
The three Pomor populations considered are close not only in the purely geographical sense (fig. 1): their economy and culture, in contrast to those of settled farmers, involve movement by sea over long distances. Therefore, it was assumed that the differences between their gene pools would be extremely small. During the expeditionary survey, the task was to form a subtotal sample in order to capture even minor genetic differences between the three Pomor populations: all settlements with dense Onega Pomor communities were surveyed (fig. 1). Although the analyzed samples are small (37–48 people for each population), they are reliable: owing to the subtotal nature of the survey, they represent the general population and reflect reality rather than sampling error. Even with such small sample sizes, the differences in four haplogroups (I2, N3a4, R1a, R1b) out of the 14 identified are significant. This type of analysis, however, assumes that the samples were taken from an infinite general population of individuals. Therefore, the analysis of the significance of differences is not applicable to subtotal studies of small populations: subtotal samples provide the most accurate portrait of the population and do not require additional assessment of the significance of differences.
Contrary to the initial hypothesis, it turned out that each of the three Pomor populations has a pronouncedly unique genetic portrait. The Onega Coast Pomors are genetically close both to the Finnic-speaking communities of Russia and Finland, and to the Arkhangelsk Oblast Russians. They are in general more genetically close to their Finnic-speaking neighbors than to other Onega Pomors. The Summer Coast Pomors are genetically similar only to the population of Scandinavia. Finally, the Winter Coast Pomors have practically no similar gene pools, save for some proximity to the Finns and Swedes. The great differences between the three Pomor populations are only slightly inferior in magnitude to the differences between the considered populations of Western and Eastern Slavs, Balts, and Finnic-speaking populations (fig. 3). At the same time, all three Pomor populations occupy their own “Pomor” place in the genetic space despite the wide range of comparison populations.
It is impossible to attribute such originality of the Pomor gene pools to genetic drift only. Genetic drift acts independently on different haplogroups. Therefore, a “drifting” population may appear similar to a comparison population according to one marker, to a completely different population according to another, etc. Then when analysis of genetic distances for the entire set of genetic markers is conducted, such a “drifting” population, regardless of its real origin, turns out to be unlike any comparison group.
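The behavior of a “drifting” population described above can be illustrated with a minimal Wright-Fisher sketch for a uniparentally inherited marker. It assumes a constant number of males per generation, no migration and no mutation. The haplogroup labels, starting frequencies, population size, and number of generations are hypothetical and are not estimates for the Pomors.

```python
import random


def drift(freqs, pop_size, generations, rng):
    """Simulate random genetic drift of haplogroup frequencies.

    Each generation, `pop_size` males are drawn at random (with replacement)
    from the haplogroup frequencies of the previous generation.
    """
    names = list(freqs)
    weights = [freqs[name] for name in names]
    for _ in range(generations):
        sample = rng.choices(names, weights=weights, k=pop_size)
        weights = [sample.count(name) / pop_size for name in names]
    return dict(zip(names, weights))


# Two daughter populations founded from the same hypothetical gene pool.
founders = {"hgA": 0.25, "hgB": 0.25, "hgC": 0.25, "hgD": 0.25}
population_1 = drift(founders, pop_size=50, generations=20, rng=random.Random(1))
population_2 = drift(founders, pop_size=50, generations=20, rng=random.Random(2))
print(population_1)
print(population_2)
```

Under these assumptions, the frequency of each haplogroup wanders at random in each daughter population, so the two runs generally end with different spectra even though they share the same origin.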
This model can, to some extent, explain the peculiarity of the gene pool of the Winter Coast Pomors. But the final conclusion can only be drawn after analyzing genetic portraits of other populations of the vast Winter Coast (fig. 1). Genetic drift has doubtlessly been an important factor in the genetic history of all Pomor populations, which have declined in numbers over the past generations. However, it failed to erase the genetic memory of the fact that their gene pools were based on different substrates. Onega Coast Pomors have common roots with a wide range of Finnic-speaking North European populations, while the Summer Coast Pomors are similar only to Scandinavian populations. Whole-genome studies will make it possible to verify the hypothesis of their different origins, since the genes that formed their gene pools were involved in ancient migration flows. However, the study of Y-chromosome polymorphism (the Y chromosome being the most stable part of the Pomor gene pool due to their patrilocality) directly indicates that the genetic identity of the Onega Pomor populations is linked to the different genetic substrates underlying the populations, although these differences were overlaid by powerful genetic drift.
The second important question concerns genetic similarity between Pomors and Novgorodians. The average genetic distance between these populations (d = 0.77) turned out to be practically the same as the distance between the gene pools of the Pomors and the other examined Russian populations (d = 0.76; table). Novgorodians show great genetic difference from the Onega Coast Pomors (d = 0.48), and differ even more strikingly from the other Pomor populations (Winter Coast d = 0.75, Summer Coast d = 1.09). We previously concluded that the autosomal genome of Novgorodians differs from that of Russians in the north of the Arkhangelsk Oblast. Now Y-chromosome markers reveal further pronounced differences between the gene pools of Novgorodians and Pomors. Both results contradict the view that the Russian North gene pool was shaped by the Novgorodian expansion. However, this is far from the only case in world history when internal colonization manifested through expansion of power and economic influence, but did not lead to decisive changes in the gene pool.
These and other results of studying the indigenous European population [4–5, 8–12, 30] provide a convincing argument against extrapolating ideas about the region's history, developed solely on the basis of data from the humanities, to the gene pool without additional research.
Y-chromosome polymorphism was studied in three White Sea Pomor populations: those of Onega, Summer and Winter Coasts. An analysis of subtotal samples of unrelated individuals from all locations with dense Pomor communities made it possible to create reliable genetic portraits of the three Pomor populations.
The study of the Pomor gene pool using a wide panel of Y-chromosome markers revealed 14 haplogroups, of which four (I2, N3a4, R1a, R1b) differed significantly in distribution by population despite small sample sizes (37–48 people). Differences in the genetic portraits of the Pomors stem from the originality of the gene pool of each population: the Winter Coast Pomors have a reduced frequency of the haplogroup N3a4 and an increased frequency of I2; in the Summer Coast population, the R1a-PF6202 branch, characteristic of the other two populations, was absent, while the frequency of R1b was increased; in the Onega Coast population, the frequency of N3a4 is high, but that of I2 is low. Spectrum features and haplogroup frequencies make the genetic portrait of each Pomor population unique.
Each of the three Pomor populations has its own range of genetically close populations. The Onega Coast Pomors are genetically similar to a wide range of Finnic-speaking peoples of Northern Europe, as well as to some Russian populations whose gene pool shows a contribution of the pre-Slavic population. The Summer Coast Pomors exhibited similarity only with the population of Scandinavia, which can be explained by a common Paleo-European substrate or by later interactions between Scandinavians and Pomors. For the Winter Coast Pomors, some genetic affinity was recorded only with a few populations of Finland and Sweden.
The genetic distances between the Pomor populations turned out to be comparable with the general range of variability between the Eastern Slavs, Balts, and Finno-Ugric peoples of the region. Aside from genetic drift, the reason for this may be a different substrate underlying the gene pool of each population, since it is difficult to assume such different migration relationships in populations so close geographically, ethnically and culturally.
None of the three populations of the Onega Pomors show noticeable genetic similarity with the indigenous population of the Novgorod Oblast, indicating the absence of genetic traces of demic expansion during the Novgorod colonization of the Russian North.