THE ACCURACY OF PREDICTING EYE AND HAIR PIGMENTATION BASED ON GENETIC MARKERS IN RUSSIAN POPULATIONS

Prediction of eye and hair color from DNA is increasingly employed in forensic medicine and in studies of ancient populations. HIrisPlex-S is a prediction tool that was developed using data collected from Dutch donors and verified for some other European populations. The accuracy of its predictions for other world populations has not been studied yet. Unlike the majority of other world populations, Russian populations are characterized not only by dark but also by light-colored eyes and hair and are therefore of special interest in this respect. The aim of this work was to determine the accuracy of eye and hair color predictions for Russian populations. We studied 144 representatives of indigenous populations of Russia (Avars, Aleuts, Buryats, Itelmens, Karelians, Koryaks, Maris, Nanais, Russians, Rutulians, Chuvashes, Chukchi, Evenks, and Evens). Anthropological photos were taken of all individuals. Based on the photos, the anthropologists identified eye and hair color phenotypes. SNP markers were genotyped using the HIrisPlex panel. Based on the genotypes, the phenotypes were predicted and subsequently compared to the actual phenotypes. We obtained a series of HIrisPlex accuracy indicators for the populations inhabiting the European part of Russia and Siberia. On the whole, prediction accuracy was satisfactory, although somewhat lower than for West European populations. Further research could look for additional markers that increase the accuracy of predictions for Russian populations.

In the last decade, prediction of eye and hair color from DNA has made its way into forensic medicine and population genetics. Today, it is possible to predict the physical appearance of an unknown person from their biological sample. Phenotype prediction is used to assist crime investigations, identify disaster victims, study DNA samples of ancient populations, conduct genetic genealogy analysis, etc. Numerous studies [1][2][3][4][5][6][7][8][9][10] have identified a number of key genes and genetic sites involved in pigmentation. The most critical of them were included in the HIrisPlex panel and its expanded version HIrisPlex-S [8][9][10][11]. Genotyping of the 25 DNA markers (SNPs and indels) included in HIrisPlex [10] allows rapid and reliable prediction of eye and hair color; HIrisPlex-S analyzes these 25 polymorphisms plus 16 more that are predictive of skin color.
Original publications on HIrisPlex [8][9][10][11] demonstrate that the system generates reliable results for European populations. HIrisPlex was developed using European datasets, primarily Dutch, and verified in Polish, Greek and Irish populations. The accuracy of HIrisPlex-based prediction has not yet been tested in populations inhabiting other parts of the world. Because the majority of non-Europeans have dark eyes and dark hair, such tests would have little informative value on most other continents. However, in some populations living on the border between Europe and Asia (Altai region, the Caucasus, regions to the east of the Ural Mountains), both dark and light hair/eye phenotypes are common. Genetically, such individuals can differ significantly from Western Europeans [12]; this means that the range of genetic markers determining their hair/eye pigmentation may also be different. Even populations of the Ural region, which are genetically closer to Western Europeans than to the inhabitants of the Caucasus and Western Siberia, are more genetically distant from the Dutch than the Irish, Polish and Greek populations whose specimens were used for HIrisPlex verification.
The aim of this work was to evaluate the predictive power of HIrisPlex-S for eye and hair color in the populations of North Eurasia using biological samples and photos of indigenous peoples taken during our expedition fieldwork.

Sample collection and phenotyping
As part of the field study of gene pools conducted by our team [13], we took images of representatives of the indigenous populations of Russia and bordering countries. The populations included in the study were examined during several expeditions in 2015-2019. The following inclusion criteria were applied: 1) age over 18 years; 2) all 4 grandparents (2 grandfathers and 2 grandmothers) identifying themselves as belonging to the studied ethnic group; 3) availability of an anthropological image of the participant; 4) written informed consent to participate. Exclusion criteria were as follows: 1) too few images to reliably identify eye and hair color; 2) an incomplete profile of the genotyped markers.
Eye/hair color was identified from the obtained photos by 3 experts; 2 of the experts were physical anthropologists with extensive experience in phenotyping; the other one was a geneticist specially trained in phenotyping. The experts worked independently. If the results were inconsistent with each other, the phenotyping procedure was repeated: this time, the experts worked together in order to reach a consensus. Eye color (dark, blue or intermediate) was successfully determined for 144 study participants. Hair color identification was successful in fewer cases because it was impossible to tell the natural hair color of most women from the photos and because some men had grey hair or were bald. Phenotyping results are shown in Table 1.
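The two-stage phenotyping procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the sample IDs and color calls are hypothetical, and the function name `independent_consensus` is introduced here and does not come from the study. A call is accepted when all 3 independent experts agree; otherwise the sample is flagged for the joint consensus session.

```python
def independent_consensus(calls):
    """Return the shared call if all experts agree, otherwise None."""
    return calls[0] if len(set(calls)) == 1 else None


# Hypothetical sample IDs and expert calls, for illustration only.
expert_calls = {
    "S001": ("dark", "dark", "dark"),
    "S002": ("blue", "intermediate", "blue"),
}

accepted = {}            # samples resolved by the independent round
needs_joint_review = []  # samples sent to the joint consensus session

for sample_id, calls in expert_calls.items():
    call = independent_consensus(calls)
    if call is None:
        needs_joint_review.append(sample_id)
    else:
        accepted[sample_id] = call

print(accepted)
print(needs_joint_review)
```

Requiring unanimity rather than a majority vote mirrors the text, which sends any inconsistent result to the joint session.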
Genotyping and prediction of eye and hair color from genotypes
DNA was isolated from blood/saliva samples by classic phenol-chloroform extraction [14]. Genotyping was done using an Infinium Omni5Exome-4 v1.3 BeadChip kit (Illumina; USA) and an iScan array scanner. The quality of genotyping data was analyzed in GenomeStudio v2.0 (Illumina; USA). For all samples, the call rate (CR) was over 0.99, suggesting that the obtained data were suitable for further analysis. The BeadChip array can genotype over 4 million SNPs; the data it generates can be used in a variety of studies. Genotypes for 29 markers for eye/hair/skin color prediction included in the HIrisPlex-S panel were extracted from the obtained array of genotyping data. The HIrisPlex-S panel contains a total of 25 predictive DNA markers for eye/hair color and 16 DNA markers for skin color. Of them, we successfully genotyped 19 markers of eye/hair color and 10 markers of skin color. The HIrisPlex panel allows prediction from a partial genotyping profile (a few obligatory markers are critical, while others merely improve the accuracy of prediction); therefore, a set of 19 out of 25 markers is sufficient to achieve good quality of prediction with HIrisPlex (predictive markers of skin color were not accounted for in our study). Nevertheless, the excluded marker rs312262906 requires clarification. Without it, predictions were generated only for eye color but not for hair color. The rs312262906 polymorphism causes a reading frame shift in the MC1R gene and is associated with red hair color. According to ExAC, the frequency of this polymorphism reaches 0.0038 in European populations and is below 0.0001 (reported as 0.0000) in Asian populations; therefore, the probability of occurrence of at least 2 alternative alleles in our sample was negligible. This allowed us to assign the 0/0 genotype to this marker for all samples in order to predict hair color.
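The frequency argument for rs312262906 can be illustrated with a short calculation. The text does not state how the probability was assessed, so the following is a sketch under explicit assumptions: "at least 2 alternative alleles" is read as at least one individual carrying two copies of the variant, genotypes follow Hardy-Weinberg proportions, and the European frequency of 0.0038 is applied to all 144 samples as a conservative upper bound (the Asian frequency is lower). The function name is hypothetical.

```python
def prob_any_homozygote(allele_freq, n_individuals):
    """Probability that at least one of n individuals carries two copies
    of the variant, assuming Hardy-Weinberg proportions (an assumption)."""
    per_individual = allele_freq ** 2  # chance of a homozygous carrier
    return 1.0 - (1.0 - per_individual) ** n_individuals


# Frequency from the text (ExAC, European populations); 144 samples.
upper_bound = prob_any_homozygote(0.0038, 144)
print(f"{upper_bound:.4f}")
```

Under these assumptions the upper bound is on the order of 0.002, which is in line with the statement that the probability is negligible.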
Genotypes were shortlisted using PLINK 1.9 [15]. The obtained genotypes are presented in Table 2.
Eye and hair color were predicted from the genotypes using HIrisPlex-S and the online webtool of the Department of Genetic Identification (Erasmus MC) [16].

Evaluation of eye/hair color prediction accuracy
Phenotypes predicted by HIrisPlex from the obtained genotypes were compared to the actual phenotypes identified by the anthropologists from the images taken during our expeditions; quality metrics were calculated for all 144 samples. The constructed 5-grade scales for eye/hair pigmentation were converted into conventional 3-grade scales in order to make phenotyping results suitable for comparison with HIrisPlex-S data.
To analyze the accuracy of HIrisPlex-S-based predictions, the following quality metrics were calculated:
- precision (the ratio of true positives to the total number of positive predictions);
- recall (the ratio of true positives to the sum of true positives and false negatives in the class);
- accuracy (the proportion of correct predictions);
- F₁ score (the harmonic mean of precision and recall);
- AUC (area under the curve) for ROC curves (the true positive rate plotted against the false positive rate at various threshold settings).
Quality metrics values are provided in Tables 3 and 4.
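The definitions of the quality metrics can be made concrete with a minimal sketch. It is not the code used in the study: the function names are introduced here, the six observations are made-up toy values, and AUC is computed through the equivalent rank formulation (the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one) rather than by tracing the ROC curve. Per-class metrics are computed one-vs-rest.

```python
def per_class_metrics(observed, predicted, positive):
    """One-vs-rest precision, recall and F1 for the class `positive`."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == positive and p == positive)
    fp = sum(1 for o, p in pairs if o != positive and p == positive)
    fn = sum(1 for o, p in pairs if o == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def accuracy(observed, predicted):
    """Proportion of samples whose predicted class equals the observed one."""
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


def auc(is_positive, scores):
    """Rank-based AUC; ties between a positive and a negative count 0.5."""
    pos = [s for flag, s in zip(is_positive, scores) if flag]
    neg = [s for flag, s in zip(is_positive, scores) if not flag]
    if not pos or not neg:
        raise ValueError("AUC needs both positive and negative samples")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, for illustration only (not taken from the study).
observed = ["dark", "dark", "blue", "blue", "intermediate", "dark"]
predicted = ["dark", "dark", "blue", "dark", "blue", "dark"]
p_blue = [0.05, 0.10, 0.90, 0.40, 0.60, 0.20]  # hypothetical P(blue)

print(per_class_metrics(observed, predicted, "blue"))
print(accuracy(observed, predicted))
print(auc([o == "blue" for o in observed], p_blue))
```

Note that precision, recall and F₁ depend on the single predicted class, whereas AUC uses the predicted probabilities and therefore does not depend on any particular decision threshold.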

RESULTS
We photographed 144 representatives of the indigenous populations inhabiting European Russia and Siberia. Their DNA samples were genotyped for the markers included in the HIrisPlex panel. Phenotyping and genotyping data obtained for each study participant were saved to a combined database.
To evaluate the quality of eye/hair color prediction by HIrisPlex-S in new populations phenotyped in advance, we predicted eye and hair color from their genotypes using the online webtool [16]. Results of eye color prediction for each individual case are shown in Table 5. Tables 1 and 5 allow comparing the observed and predicted phenotypes for each individual sample. Prediction quality metrics for the entire dataset are provided in Table 3.

DISCUSSION
Anthropological photos taken in 3 planes in accordance with anthropological standards are a valuable resource for research into the associations between phenotypic traits and genotypes. In this study, such images were used to identify eye and hair color. The fact that phenotyping was independently conducted by 3 different experts and the availability of photos for verification render the results of our study reliable and reproducible. For genotyping, we used the most comprehensive, state-of-the-art, popular HIrisPlex-S system that has proven its accuracy in studies of modern and ancient Western European populations [8,11,17]. HIrisPlex-S prediction accuracy for the populations outside Western Europe was evaluated by comparing the observed phenotypes identified from the photos to the phenotypes predicted from DNA. Of all quality metrics (Table 3), AUC was of the greatest interest because AUC values characterizing HIrisPlex performance are available for Western European populations [16]. We were therefore able to directly compare the accuracy of HIrisPlex predictions between Western European and Russian populations.
On the whole, the values of prediction quality metrics obtained for the majority of phenotypic classes (Table 3) were quite high (0.6-0.9), suggesting that the use of HIrisPlex in Russian populations is justified. None of the systems predicting phenotypes from DNA is 100% accurate; for some classes, HIrisPlex prediction accuracy is below 0.9 even for Western European populations. In our opinion, this study has demonstrated the fitness of HIrisPlex for use in Russian populations and its satisfactory accuracy of prediction. However, HIrisPlex prediction accuracy is lower for Russian populations than for Western Europeans (0.8 vs 0.9 on average). Therefore, we believe that HIrisPlex can be used in Russian populations but recommend accounting for the detected decline in accuracy when interpreting the obtained data. In our study, Russian populations were divided into 2 datasets: European Russia and Siberia. Previous population genetic studies [18,19] revealed that these metapopulations are contrasting in terms of their genetic origin. They also turned out to be contrasting in terms of phenotype prediction quality, which was considerably lower for Siberia (Table 4). The data in Tables 1 and 5 demonstrate that HIrisPlex predicts dark eyes for almost all Siberian samples, although some representatives of Siberian populations have light-colored eyes; the division into light and intermediate shades is arbitrary, but even so, the color of their eyes is not dark as predicted by HIrisPlex. Perhaps the light-colored eyes sometimes seen in indigenous Siberian populations are associated with other alleles or other genes than in Europeans, meaning that a system based on Western European datasets cannot correctly predict light (not dark) eye color in those populations.
The decrease in prediction accuracy for the inhabitants of European Russia may have the same cause, but because this population is genetically closer to the populations of Western Europe, the differences in the allele spectrum and the decrease in prediction accuracy are not so pronounced. This can inspire new research aimed at identifying additional genetic markers that could improve the accuracy of prediction of pigmentation phenotypes from genotypes.

CONCLUSIONS
The analysis of correlations between genotypes and eye/hair pigmentation phenotypes of Russian populations aided by the widely used HIrisPlex-S panel confirmed its fitness for use in these previously unstudied populations, although its prediction accuracy was lower than in the Western European datasets that had served as a basis for this classifier. The decrease in accuracy (from 0.94 to 0.89) was less dramatic for the populations of European Russia than for the Siberian samples. This decrease might be associated with the impact of population-specific SNPs that are well represented in the populations of North Eurasia but rarely found in Western Europe and, therefore, not included in the HIrisPlex-S panel.