ORIGINAL RESEARCH
The accuracy of predicting eye and hair pigmentation based on genetic markers in Russian populations
1 Research Center for Medical Genetics, Moscow, Russia
2 Vavilov Institute of General Genetics, RAS, Moscow
3 Biobank of North Eurasia, Moscow, Russia
4 Moscow Institute of Physics and Technology, Moscow, Russia
5 Federal Research and Clinical Center of Physical-Chemical Medicine, Moscow, Russia
6 Institute of Anthropology and Ethnography, Moscow, Russia
7 Anuchin Research Institute and Museum of Anthropology, Moscow, Russia
Correspondence should be addressed: Elena V. Balanovska
Moskvorechie, 1, Research Centre for Medical Genetics, Moscow, 115522; balanovska@mail.ru
Funding: the study was supported by the Ministry of Science and Education of the Russian Federation (State contract 011–17 dated 26.09.2017) as part of the Union State Research and Technical Project DNA-based identification, which included genotyping and phenotyping of European samples and preparation of this manuscript, and the State assignment of the Ministry of Science and Higher Education of the Russian Federation for the Research Centre for Medical Genetics (phenotyping of Siberian samples, creating a database, data analysis).
Acknowledgements: we thank all donors participating in our study. DNA samples and anthropological images were provided by the Biobank of North Eurasia.
Author contribution: Balanovska EV — supervision and study design; Petrushenko VS, Gorin IO — bioinformatic analysis, literature analysis, manuscript preparation; Maurer AM, Leybova NA — phenotyping of the samples; Kagazezheva ZhA — phenotyping of the samples, photography, photo processing, tabular data processing; Balanovsky OP, Markina NV — manuscript preparation; Kostryukova ES — genotyping.
In the last decade, prediction of eye and hair color from DNA has found its way into forensic medicine and population genetics. Today, it is possible to predict the physical appearance of an unknown person from their biological sample. Phenotype prediction is used to aid criminal investigations, identify disaster victims, study DNA samples of ancient populations, conduct genetic genealogy analysis, etc. Numerous studies [1–10] have identified a number of key genes and genetic sites involved in pigmentation. The most critical of them were included in the HIrisPlex panel and its expanded version HIrisPlex-S [8–11]. Genotyping of 25 DNA markers (SNPs and indels) included in HIrisPlex [10] allows rapid and reliable prediction of eye and hair color; HIrisPlex-S analyzes these 25 polymorphisms plus 16 more that are predictive of skin color.
Original publications on HIrisPlex [8–11] demonstrate that the system generates reliable results for European populations. HIrisPlex was developed using European datasets, primarily Dutch, and verified in Polish, Greek and Irish populations. The accuracy of HIrisPlex-based prediction has not yet been tested in populations inhabiting other parts of the world. Because the majority of non-Europeans have dark eyes and dark hair, such tests would have little informative value in most non-European regions. However, in some populations living on the border between Europe and Asia (the Altai region, the Caucasus, regions to the east of the Ural Mountains), both dark and light hair/eye phenotypes are common. Genetically, such individuals can differ significantly from Western Europeans [12]; this means that the set of genetic markers determining their hair/eye pigmentation may also be different. Even populations of the Ural region, which are genetically closer to Western Europeans than to the inhabitants of the Caucasus and Western Siberia, are more genetically distant from the Dutch than the Irish, Polish and Greek populations whose samples were used for HIrisPlex verification.
The aim of this work was to evaluate the predictive power of HIrisPlex-S for eye and hair color in the populations of North Eurasia using biological samples and photos of indigenous peoples collected during our expedition fieldwork.
METHODS
Sample collection and phenotyping
As part of the field study of gene pools conducted by our team [13], we took images of the indigenous populations of Russia and bordering countries. The populations included in the study were examined during several expeditions in 2015-2019. The following inclusion criteria were applied:
1) age over 18 years; 2) all 4 grandparents (2 grandfathers and 2 grandmothers) identifying themselves as belonging to the studied ethnic group; 3) availability of anthropological images of the participant; 4) written informed consent to participate. Exclusion criteria were as follows: 1) insufficient images, preventing reliable identification of eye and hair color; 2) an incomplete profile of the genotyped markers.
The study was carried out in 144 individuals representing the following populations:
1) European Russia — Russian, Mari, Chuvash, Karelian, Rutulian, Avar (n = 66, 65 males and 1 female);
2) Siberia and Far East — Buryat, Evenk, Even, Nanai, Koryak, Itelmen, Chukchi, Aleut (n = 78, 45 males and 33 females).
Eye/hair color was identified from the obtained photos by 3 experts; 2 of the experts were physical anthropologists with extensive experience in phenotyping; the other one was a geneticist specially trained in phenotyping. The experts worked independently. If the results were inconsistent with each other, the phenotyping procedure was repeated: this time, the experts worked together in order to reach a consensus. Eye color (dark, blue or intermediate) was successfully determined for 144 study participants. Hair color identification was successful in fewer cases because it was impossible to tell the natural hair color of most women from the photos and because some men had grey hair or were bald. Phenotyping results are shown in tab. 1.
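The two-stage phenotyping procedure described above (independent calls first, joint review only on disagreement) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' actual workflow; the function name `consensus_call`, the sample IDs, and the example calls are all hypothetical.

```python
def consensus_call(calls):
    """Return (label, needs_joint_review) for one participant's independent calls."""
    if len(set(calls)) == 1:
        return calls[0], False  # all experts agree: accept the label
    return None, True           # disagreement: phenotyping is repeated jointly


# Hypothetical sample IDs and expert calls, for illustration only.
independent_calls = {
    "S001": ("dark", "dark", "dark"),
    "S002": ("blue", "intermediate", "blue"),
}

agreed, to_review = {}, []
for sample_id, calls in independent_calls.items():
    label, needs_review = consensus_call(calls)
    if needs_review:
        to_review.append(sample_id)
    else:
        agreed[sample_id] = label
```

In this sketch a two-to-one majority is deliberately not accepted as final, since the text states that any inconsistency sent the sample back for a joint consensus session.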
Genotyping and prediction of eye and hair color from genotypes
DNA was isolated from blood/saliva samples by classic phenol-chloroform extraction [14]. Genotyping was done using an Infinium Omni5Exome-4 v1.3 BeadChip kit (Illumina; USA) and an iScan array scanner. The quality of genotyping data was analyzed in GenomeStudio v2.0 (Illumina; USA). For all samples, the call rate (CR) was over 0.99, suggesting that the obtained data were suitable for further analysis. The BeadChip array can genotype over 4 million SNPs; the data it generates can be used in a variety of studies. Genotypes for 29 of the eye/hair/skin color prediction markers included in the HIrisPlex-S panel were extracted from the obtained genotyping data. The HIrisPlex-S panel contains a total of 25 predictive DNA markers for eye/hair color and 16 DNA markers for skin color. Of them, we successfully genotyped 19 markers of eye/hair color and 10 markers of skin color. HIrisPlex allows prediction from a partial genotyping profile (a few obligatory markers are critical, whereas the others merely improve the accuracy of prediction); therefore, a set of 19 out of 25 markers is sufficient to achieve good quality of prediction (predictive markers of skin color were not used in our study). Nevertheless, the excluded marker rs312262906 requires clarification: without it, predictions were generated only for eye color but not for hair color. The rs312262906 polymorphism causes a reading frame shift in the MC1R gene and is associated with red hair color. According to ExAC, the frequency of this polymorphism reaches 0.0038 in European populations and is 0.0000 (< 0.0001) in Asian populations; therefore, the probability of occurrence of at least 2 alternative alleles in our sample was negligible. This allowed us to assign the 0/0 genotype to this marker for all samples in order to predict hair color.
Genotypes for the panel markers were extracted using PLINK 1.9 [15]. The obtained genotypes are presented in tab. 2.
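The preparation of a partial genotype profile, including the assignment of the 0/0 genotype to rs312262906, can be illustrated with a short Python sketch. It is not the authors' pipeline: the three-marker list is a truncated example rather than the full panel, the column layout is hypothetical, and the use of "NA" for ungenotyped markers is an assumption about the prediction tool's input format. Genotypes are assumed to be already encoded as counts (0, 1, or 2) of the predictive allele.

```python
import csv
import io

# Truncated, illustrative marker list (the real panel contains more markers).
MARKERS = ["rs312262906", "rs12913832", "rs1800407"]

# Markers that were not genotyped but are assumed to be 0/0 in every sample,
# following the allele-frequency argument given in the text.
ASSUMED_ZERO = {"rs312262906"}


def build_row(sample_id, counts):
    """Build one input row; `counts` maps marker ID to an allele count (0, 1, 2)."""
    row = {"sampleid": sample_id}
    for marker in MARKERS:
        if marker in ASSUMED_ZERO:
            row[marker] = 0                        # assigned, not measured
        else:
            row[marker] = counts.get(marker, "NA")  # "NA" marks a missing marker (assumption)
    return row


def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["sampleid"] + MARKERS,
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample with one marker genotyped and one missing.
example_row = build_row("S001", {"rs12913832": 2})
example_csv = to_csv([example_row])
```

Keeping the assumed markers in a separate set makes the imputation explicit, so it is easy to report which values were measured and which were assigned.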
Using HIrisPlex-S via the online webtool of the Department of Genetic Identification (Erasmus MC) [16], we generated predictions for eye color (light, intermediate, or dark) and hair color (red, light, intermediate, or dark) for all the samples.
Evaluation of eye/hair color prediction accuracy
Phenotypes predicted by HIrisPlex from the obtained genotypes were compared to the actual phenotypes identified by the anthropologists from the images taken during our expeditions; quality metrics were calculated for all 144 samples. The constructed 5-grade scales for eye/hair pigmentation were converted into conventional 3-grade scales in order to make phenotyping results suitable for comparison with HIrisPlex-S data.
To analyze the accuracy of HIrisPlex-S-based predictions, the following quality metrics were calculated:
– precision (the ratio of true positives to the total number of positive predictions);
– recall (the ratio of true positives to the sum of true positives and false negatives in the class);
– accuracy (the proportion of correct predictions);
– F₁ score (the harmonic mean of precision and recall);
– AUC (area under curve) for ROC-curves (the true positive rate plotted against the false-positive rate at various threshold settings).
Quality metrics values are provided in tab. 3 and tab. 4.
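The metrics listed above can be illustrated with a minimal Python sketch that treats each phenotypic class in a one-vs-rest fashion. The labels and probabilities below are invented toy values, not study data, and the function names are hypothetical. AUC is computed here through its rank interpretation (the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as 0.5), which is equivalent to the area under the ROC curve; the software actually used by the authors is not specified in the text.

```python
def class_metrics(observed, predicted, positive):
    """One-vs-rest precision, recall and F1 score for the class `positive`."""
    pairs = list(zip(observed, predicted))
    tp = sum(o == positive and p == positive for o, p in pairs)
    fp = sum(o != positive and p == positive for o, p in pairs)
    fn = sum(o == positive and p != positive for o, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def accuracy(observed, predicted):
    """Proportion of samples whose predicted class equals the observed class."""
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


def auc(is_positive, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for flag, s in zip(is_positive, scores) if flag]
    neg = [s for flag, s in zip(is_positive, scores) if not flag]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): observed and predicted eye color classes, plus the
# predicted probability of the "light" class for each sample.
observed = ["dark", "dark", "light", "light", "dark", "intermediate"]
predicted = ["dark", "dark", "light", "dark", "dark", "dark"]
p_light = [0.10, 0.20, 0.90, 0.25, 0.05, 0.30]

precision, recall, f1 = class_metrics(observed, predicted, "light")
overall_accuracy = accuracy(observed, predicted)
auc_light = auc([o == "light" for o in observed], p_light)
```

Precision, recall and F₁ depend on the final class call, whereas AUC is computed from the predicted probabilities and is therefore independent of any particular decision threshold.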
RESULTS
We photographed 144 representatives of the indigenous populations inhabiting European Russia and Siberia. Their DNA samples were genotyped for the markers included in the HIrisPlex panel. Phenotyping and genotyping data obtained for each study participant were saved to a combined database.
To evaluate the quality of eye/hair color prediction by HIrisPlex-S in new populations phenotyped in advance, we predicted eye and hair color from their genotypes using the online webtool [16]. Results of eye color prediction for each individual case are shown in tab. 5.
Tables 1 and 5 allow comparing the observed and predicted phenotypes for each individual sample. Prediction quality metrics for the entire dataset are provided in tab. 3.
In our study, the values of AUC, the most widely used quality metric, ranged from 0.59 to 0.89 for different phenotypic classes, averaging 0.79. For Russian populations, AUC values are somewhat lower than those observed in Western Europeans (0.89). For example, AUC for light eye color is 0.94 for Western Europeans vs 0.89 for Russians. A decrease in AUC values can be observed for all eye and hair color phenotypes represented in our study. Of note, prediction quality metrics for intermediate eye color and light hair color are not provided in this article because these 2 phenotypes were underrepresented in our sample. If necessary, they can be calculated using the data from tab. 1 and tab. 5. Their values turned out to be even lower than in Western European populations, but due to the very small sample size of these 2 phenotypic classes (< 5 individuals), the obtained results cannot be considered reliable.
Russian populations are very heterogeneous genetically. We purposefully included genetically contrasting groups of indigenous populations of European Russia and Siberia in the sample. Tab. 4 describes the quality of eye color prediction by HIrisPlex for these 2 metapopulations (the quality of hair color prediction was not evaluated due to a small sample size, see Methods). The accuracy of eye color prediction for the populations of European Russia was close to the prediction accuracy for the pooled sample. There was some decline in accuracy in comparison with Western European samples, but, on the whole, the accuracy of prediction was satisfactory (AUC about 0.8). For Siberian populations, prediction quality was much poorer (AUC = 0.6).
DISCUSSION
The collection of anthropological images of the indigenous peoples of Russia laid the foundation for our study. The photos taken in 3 planes in accordance with anthropological standards are a valuable resource for research into the associations between phenotypic traits and genotypes. In this study, such images were used to identify eye and hair color. The fact that phenotyping was independently conducted by 3 different experts and the availability of photos for verification render the results of our study reliable and reproducible.
For phenotype prediction, we used the most comprehensive, state-of-the-art and widely used HIrisPlex-S system, which has proven its accuracy in studies of modern and ancient Western European populations [8, 11, 17]. HIrisPlex-S prediction accuracy for the populations outside Western Europe was evaluated by comparing the observed phenotypes identified from the photos to the phenotypes predicted from DNA. Of all quality metrics (tab. 3), AUC was of the greatest interest because AUC values characterizing HIrisPlex performance are available for Western European populations [16]. Thus, we were able to directly compare the accuracy of HIrisPlex predictions between Western European and Russian populations.
On the whole, the values of prediction quality metrics obtained for the majority of phenotypic classes (tab. 3) were quite high (0.6–0.9), suggesting that the use of HIrisPlex in Russian populations is justified. None of the systems predicting phenotypes from DNA is 100% accurate; for some classes, HIrisPlex prediction accuracy is below 0.9 even for Western European populations. In our opinion, this study has demonstrated the fitness of HIrisPlex for use in Russian populations and its satisfactory accuracy of prediction. However, HIrisPlex prediction accuracy is lower for Russian populations than for Western Europeans (0.8 vs 0.9 on average). Therefore, we believe that HIrisPlex can be used in Russian populations but recommend accounting for the detected decline in accuracy when interpreting the obtained data.
In our study, Russian populations were divided into 2 datasets: European Russia and Siberia. Previous population genetic studies [18, 19] revealed that these metapopulations are contrasting in terms of their genetic origin. They also turned out to be contrasting in terms of phenotype prediction quality, which was considerably lower for Siberia (tab. 4). The data in tab. 1 and tab. 5 demonstrate that HIrisPlex predicts dark eyes for almost all Siberian samples, although some representatives of Siberian populations have light-colored eyes; the division into light and intermediate shades is arbitrary, but even so, the color of their eyes is not dark as predicted by HIrisPlex. Perhaps the light eye color sometimes seen in indigenous Siberian populations is associated with different alleles or different genes than in Europeans, meaning that a system based on Western European datasets cannot correctly predict light (not dark) eye color in those populations. The decrease in prediction accuracy for the inhabitants of European Russia may be of the same nature, but because this population is genetically closer to the populations of Western Europe, the differences in the allele spectrum and the decrease in prediction accuracy are less pronounced. This can inspire new research aimed at identifying additional genetic markers that could improve the accuracy of prediction of pigmentation phenotypes from genotypes.
CONCLUSIONS
The analysis of correlations between genotypes and eye/hair pigmentation phenotypes in Russian populations, performed with the widely used HIrisPlex-S panel, confirmed the fitness of the panel for use in these previously unstudied populations, although its prediction accuracy was lower than in the Western European datasets that had served as a basis for this classifier. The decrease in accuracy (from 0.94 to 0.89) is less dramatic for the populations of European Russia than for Siberian samples. This decrease might be associated with the impact of population-specific SNPs that are well represented in the populations of North Eurasia but rarely found in Western Europe and, therefore, not included in the HIrisPlex-S panel.