ORIGINAL RESEARCH
The impact of image orientation on distribution of visual fixations while solving simple cognitive problems
1 Pirogov Russian National Research Medical University, Moscow, Russia
2 Kursk State Medical University, Kursk, Russia
Correspondence should be addressed: Ekaterina A. Petrash
Ostrovityanova st., 1, Moscow, 117997, Russia; ur.liam@hsartep
Compliance with ethical standards: the study was approved by the Ethics Committee of Pirogov Russian National Research Medical University (minutes #211 of October 18, 2021) and conducted in conformity with the requirements of the Framework Legislation "On the Protection of Health of the Citizens." All participants signed the informed consent form agreeing to undergo examination
Tracking of oculomotor reactions is a non-invasive technique employed to look into a wide range of cognitive and regulatory processes: attention, mnemonic activities, thinking, categorization [1–3].
The technique enabling evaluation of the characteristics of oculomotor reactions in the context of a search for solutions to simple cognitive problems facilitates optimization of the educational processes, including those relying on the distance learning technology [4–7]. A cognitive task triggers a number of activities: invoking initial representations, subsequent clarification, expansion, concretization, systematization, differentiation and generalization of knowledge. For the purpose of this study, we selected image recognition as a simple cognitive task. The object to be recognized is a face of a person, which is a complex social stimulus of perception [8–14]. The complexity of this object stems from the multiplicity of details organized in a single symmetrical space that factors in the pre-determined location of each element. The social characteristics of the object allow identification of its species (human being) and gender.
We searched the Elibrary and Web of Science databases (using the keywords "глазодвигательные реакции" (Russian for "oculomotor reactions") and "eye movements", respectively), covering papers published from 2015 to 2020. The results of this bibliometric analysis show that Russian researchers are less interested in the subject matter than their foreign counterparts. Oculomotor reactions were studied as part of research efforts in neurosciences, psychology, medical fields (ophthalmology and psychiatry), computer science, engineering. Papers dedicated to oculomotor reactions make up 29% of the total amount of the relevant scientific reports. Foreign studies focus on the many dimensions and diverse aspects of oculomotor reactions, which indicates that the technique selected for studying them is highly informative and universally applicable in the context of investigating the processes of cognition and finding solutions to practical tasks that rely on thinking and visual perception as such.
The attitude to the perceived stimuli and their categorization affects characteristics of oculomotor reactions, which means that oculomotor activity enables a person's interactions with the world. Eye movements, acquiring the status of operations and actions, form integral oculomotor structures. Each of the formed oculomotor structures is associated with certain motives and conditions that govern how the person performs this or that activity [8, 9]. Eye tracking also allows measuring variables that are difficult to capture with other research methods, such as the exact spot looked at when receiving static or dynamic visual stimuli and instantaneous activation of the cognitive resources as required by the task [14].
The rating of oculomotor reactions in the context of visual perception activity is considered to be a statistical procedure that describes distribution of the studied parameters within an age group, and the subjects of such a rating should have no somatic and mental pathologies (be generally healthy). Optimization of the educational process, including distance learning, requires orderly arrangement of the information presented while factoring in the oculomotor reactions that accompany the search for solutions to simple cognitive tasks.
A study that aimed to determine how the number of gaze fixations affects face image recognition found that two fixations make the chances of successful recognition significantly better compared to a single fixation, regardless of whether the face in the image is familiar or not. Besides, the researchers established that a greater number of fixations does not translate into better quality of recognition [15]. As a key takeaway, the authors concluded that two gaze fixations are enough to recognize a person's face in an image. The face scanning direction (left to right) should also be mentioned here as an observed general trend, as should the significant differences in localization of the gaze at the tutorial and actual identification stages of the experiment.
Studies of oculomotor reactions conducted by the Russian scientists support the aforementioned conclusion: a successful recognition of a person's face in an image takes two visual fixations [16].
The two factors affecting the parameters of oculomotor reactions are the task, which can alter the distribution of gaze fixations on the stimulus image, and the format of the image shown. We assessed the impact of these two factors on the oculomotor reactions in a set of two experiments. Both had the same task (recognize faces in the images) but different formats of the images shown and varying angles at which they were presented. Studying the specific features of oculomotor reactions associated with contemplation of a face, researchers mainly focus on the number of gaze fixations in the substantial areas of the face image, those around eyes, mouth and nose [11–13]. However, they disregard spatial and orientational characteristics, i.e., directions (right-left, top-bottom) and angle of inclination of the image. Researchers also point to the significance of age as a factor affecting visual-spatial functions (field of visual perception, measurement by eye, etc). In adulthood, as opposed to the ages preceding it, correlations between the coordinate axes of visual sensory field are either unidentifiable or selective. Functional connections between the boundaries of the field of view in certain directions grow significantly weaker with age. In the perceptual visual field, on the contrary, structure of the perception becomes better pronounced with age, the improvement pattern coinciding with that of the spatial-discriminative capability of a person. Thus, the clearly shaped structure of visual perception enables maturation of this visual-spatial function and its maintenance at the optimal level throughout life [17].
Trying to identify the dominant characteristics (related to content or orientation) that affect the perception of a visual stimulus, we assumed that a change in the angle of inclination will condition distribution and number of fixations on the image. If content-related characteristics dominate the patterns of perception of an image of a face, the distribution of gaze fixations will remain relatively constant and concentrate in the areas of eyes, nose and mouth. If it is orientation that governs the perception, visual fixations will be predominantly registered in one of the four quadrants of the image, regardless of the angle at which it is shown.
The purpose of this study was to investigate the parameters of oculomotor reactions (number of fixations required to solve a simple cognitive task (recognition); distribution of fixations on specific areas of the image) associated with the process of solving simple cognitive tasks and assessed through the lens of age.
METHODS
The sample included 97 persons, 47 males and 50 females, ages 21 to 36 (early adulthood). For the purpose of rating the oculomotor reactions, the sample was divided into age groups: 21–26 years (n = 34); 27–32 years (n = 29); 33–36 years (n = 34). Forty-nine percent of the participants used vision correction aids (glasses or lenses).
The methodology that governed the rating procedures was developed by the authors of the study and relied on the Tobii EyeX eye tracking hardware and software solution (GazeControl software) [18]. The image recording frequency of a Tobii EyeX Controller is 90 Hz. The working distance of the eye tracker is 50–95 cm, the dimensions of the tracked space are 40 × 30 cm at a distance of 75 cm.
According to the methodology, the participants had to determine whether the two sequentially presented images showed faces of two different people or if both contained the face of the same person. The answers were registered for each pair of images presented.
The sets of stimuli included images of two types, schematic monochrome (fig. 1A) and full-color. There were 45 pairs of face images of each type. They were divided into two groups: 15 pairs in which the central axis of the face was not inclined (0° angle), and 30 pairs in which the two images differed from each other in the inclination angle of the face's central axis.
Regardless of the direction, the inclination increment within a pair was 12°. This value was chosen based on the results of the earlier studies [19].
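The 12° increment can be illustrated with a short Python sketch. It assumes that the presented center-line angles are multiples of 12° spanning a full turn from 0° to 348°; the function name and this range are illustrative assumptions, not part of the study protocol.

```python
def inclination_angles(step_deg=12):
    """Return all center-line angles in [0, 360) reachable in steps of step_deg."""
    if 360 % step_deg != 0:
        raise ValueError("step must divide 360 evenly")
    return list(range(0, 360, step_deg))


angles = inclination_angles()

# Angles named in the Results as error-prone; under the assumption above,
# each of them should lie on the 12-degree grid.
reported = [24, 72, 216, 312, 324, 336]
on_grid = all(a in angles for a in reported)
```

Under this assumption the grid contains 30 angles, and all angles mentioned in the Results fall on it.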
The resolution of all stimulus images was 1600 × 900 pixels, the files were in .jpg format. The monochrome images were made up of black lines showing the facial contours, hairline, ears, eyebrows, eyes, nose, mouth on a white background; the full-color face images were photographs of a man's face against a white background. Each image (including the interference images) was shown for 3 seconds, the duration of presentation of one pair of faces (including the interference) was 15 seconds; it took 7 minutes and 30 seconds to show one set of stimuli, and the total time of presentation of monochrome and full-color stimulus sets amounted to 15 minutes.
Between presentations of monochrome and full-color images, the participants rested for 2 min and could freely examine the environment and speak.
The number of errors made in judgments about the similarity or difference between the two face images allows assessing the specifics of visual perception in situations where the center line angle of one image in a pair differs from that of the other. The analysis of erroneous answers to the simple cognitive task (recognition) allowed identifying the center line inclination angles of the shown face image that made recognition of the faces more difficult.
The study yielded heatmaps showing gaze distribution for each presented stimulus. The red zones on these maps, which were obtained based on the methodology developed by the study authors, are the registered and counted gaze fixations. The counting factored in the quadrants of the presented face.
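The quadrant-based counting can be sketched in Python as follows. This is a minimal illustration, not the authors' GazeControl procedure: it assumes fixations are given as pixel coordinates on a 1600 by 900 stimulus with the origin in the top left corner, that quadrants are defined relative to the image center, and that they are numbered 1 = upper left, 2 = upper right, 3 = lower right, 4 = lower left (a numbering inferred from how the quadrants are described elsewhere in the text). All function names and the sample data are hypothetical.

```python
from collections import Counter


def quadrant(x, y, width=1600, height=900):
    """Map a fixation (pixel coords, origin top-left, y pointing down) to a quadrant.

    Assumed numbering: 1 = upper left, 2 = upper right,
    3 = lower right, 4 = lower left.
    """
    left = x < width / 2
    top = y < height / 2
    if top:
        return 1 if left else 2
    return 4 if left else 3


def count_by_quadrant(fixations, width=1600, height=900):
    """Count fixations per quadrant; quadrants without fixations get 0."""
    counts = Counter(quadrant(x, y, width, height) for x, y in fixations)
    return {q: counts.get(q, 0) for q in (1, 2, 3, 4)}


# Hypothetical fixation coordinates for one stimulus.
fixations = [(300, 200), (350, 220), (1200, 100), (1000, 700), (200, 800)]
counts = count_by_quadrant(fixations)
```

For the hypothetical data above, two fixations fall in the first quadrant and one in each of the others.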
For the purposes of statistical processing of the results we employed the methods of comparative statistics (Mann–Whitney U-test, use restrictions observed; Wilcoxon T-test, single sample and two sets of values obtained under different conditions). The three groups were compared in pairs.
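The pairwise group comparison can be sketched in plain Python. The code below is an illustrative implementation of the two-sided Mann–Whitney U-test using the normal approximation with tie correction and no continuity correction; it is not the software used in the study, and for small samples exact tables or a statistical package would be preferable. The group data are hypothetical.

```python
import math
from itertools import combinations


def rank_with_ties(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, p): the smaller U statistic and a two-sided p-value
    from the normal approximation with tie correction."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = rank_with_ties(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    tie_counts = {}
    for v in combined:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:  # all observations identical
        return u, 1.0
    z = (u - n1 * n2 / 2) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical error counts per participant in the three age groups.
groups = {
    "21-26": [3, 4, 2, 5, 3, 4],
    "27-32": [5, 6, 4, 7, 5, 6],
    "33-36": [8, 9, 7, 10, 8, 9],
}
results = {
    (a, b): mann_whitney_u(groups[a], groups[b])
    for a, b in combinations(groups, 2)
}
```

Each entry of results holds the U statistic and approximate p-value for one pair of groups, which mirrors the three paired comparisons described above.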
RESULTS
The first step was the sample-wide analysis of the number of correctly recognized pairs of monochrome and full-color face images. Among the first findings was the fact that gender had no significant effect on the rate of recognition: male and female participants showed similar results for schematic monochrome (U = 246; p = 0.453) and full-color (U = 278; p = 0.887) images. The same is true of vision correction: the participants who used vision correction aids were as likely to answer the experimental question correctly as their counterparts who did not rely on glasses or lenses (U = 272, p = 0.597 for monochrome images; U = 264.5, p = 0.505 for full-color images). These findings allow concluding that neither the gender of the participants nor their level of visual acuity (and the resulting need for vision correction aids or lack thereof) has a significant effect on the effectiveness of solving the simple cognitive task of recognizing face images.
The next step was the analysis of effectiveness of solving a simple cognitive task (face recognition), which we conducted by calculating the total number of errors and the total number of fixations (fixations were distributed over four image quadrants). As a result, we identified the face inclination angles associated with the majority of errors (sample-wide, both monochrome and full-color images). These angles are shown in fig. 2.
The presented face image center line angles that complicated solution of the simple cognitive task of face recognition (both monochrome and full-color images) were 72°, 216°, 312°, 324° and 336°. When the center axis of the second face image of a shown pair was rotated relative to the first one within the specified angles, the frequency of recognition errors averaged 57.6%. It was also established that full-color images shown with their center lines at an angle of 24° (fig. 2B) were recognized erroneously in a considerable number of cases, but this phenomenon was not observed for monochrome images (fig. 2A).
The comparison of errors made in the three age groups allows the conclusion that the number of errors grows significantly with age. Here p1 denotes the significance of differences between age groups 21–26 and 27–32, p2 between age groups 27–32 and 33–36, and p3 between age groups 21–26 and 33–36. This holds for both monochrome (p1 = 0.014; p2 = 0.016; p3 = 0.014) and full-color images (p1 = 0.015; p2 = 0.015; p3 = 0.017) (fig. 3).
The number of errors probably grows with age because of the growing reliance on stereotypes in visual perception and the fading ability to perceive finer details. The nature of the errors made by the participants supports this assumption. The mistakes made by the members of the first age group (21–26 years old) had to do with the level of perceived details: shown the same image several times in a sequence, each time at a different angle, they claimed that these were images of different faces. On the contrary, members of the third age group (33–36 years old), when shown images of different faces sequentially and with different central axis angles, claimed that they saw one and the same image, i.e., their errors were associated with stereotyping of perception. The second group, ages 27 through 32, made errors of both types with equal frequency: they judged images of the same face to be different faces (the error associated with the level of perceived details), and they judged images of different faces to be the same face (the error associated with perception stereotyping).
The paired comparison of the numbers of gaze fixations registered in the groups (using the Mann–Whitney U-test, p < 0.05) revealed no significant differences. Therefore, for this indicator the participants were pooled into a single research sample.
A comparative analysis of the number of errors in the schematic monochrome and full-color image tasks (done using the Wilcoxon test, p < 0.05) revealed no significant differences (T = 605; p = 0.763). These findings allow concluding that the quality of the stimulus image (schematic monochrome or full-color) does not significantly affect the effectiveness of solving a simple cognitive task of recognizing face images. Recognition relies on gaze fixations on the key points of the face image, regardless of whether it is schematic monochrome or full-color. The gaze fixation points are concentrated on the eye line, nose and mouth.
At the next stage of the study we sought to investigate the number of gaze fixations by face quadrants, differentiating between schematic monochrome and realistic full-color images but disregarding age as a factor. In case of monochrome images, the quadrants received the maximum number of gaze fixations when the image was shown at the following angles: first quadrant — 24–96°; second quadrant — 216–348°; third quadrant — 192–228°; fourth quadrant — 108–180° (fig. 4).
The results for full-color images were similar to those for monochrome images. Participants concentrated most on the first quadrant when shown the image with the central axis inclined in the range between 0 and 84°; for the second quadrant the range was 240–324°, for the third 192–276°, and for the fourth 96–168° (fig. 3).
In this study, we have experimentally confirmed that, in the context of solving a simple cognitive task, the distribution of gaze fixations depends on the spatial orientation (center line inclination angle) of the face image; the registered differences are significant, and the statement applies to both schematic monochrome and full-color images. The general area that attracts gaze fixations regardless of the inclination angle of the central axis is the top left part of the face image, as illustrated by the heat maps (fig. 5).
The uneven distribution of gaze fixations across the quadrants, as well as the multiple repetition of movement trajectories, should be noted as a general trend. The fixation points are concentrated in the area of the eye line, and the participants repeatedly returned their gaze to those points. For both schematic monochrome and full-color images we have also registered repeated fixations around the left zygomatic part (fourth quadrant) and the region of the mouth on the right (third quadrant) (fig. 6).
Multiple gaze fixations in the nose area on the right (third quadrant) are a specific feature recorded for full-color images only. This spot attracts no fixations on a schematic monochrome image. The reason is that the orientational characteristics of a face image, i.e., eye shape, shape and size of nose, shape and size of lips, etc., determine the trajectory of eye movement and the areas of gaze fixations in the context of a search for solutions to a simple cognitive task.
DISCUSSION
The rating procedure involves standardization; we found that neither gender nor vision correction aids (or lack thereof) have any significant effect.
Based on the results of the study, ranges of normative values for the number of gaze fixations were established. They factor in angle of inclination of the central axis of the presented face image and the number of errors made (as an indicator of the average number of fixations and standard deviation with confidence intervals). It should be noted that the rated numbers of fixations disregard age and quality of the stimulus (monochrome or full-color image) as factors, since the comparative analysis revealed no significant differences imposed by them (see table).
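The normative indicators named above (average number of fixations, standard deviation, confidence interval) can be computed as in the following Python sketch. It assumes a normal-approximation 95% confidence interval for the mean (z = 1.96); the source does not state which interval was used, and the function name and the fixation counts are hypothetical.

```python
import math
import statistics


def fixation_norms(counts, z=1.96):
    """Return the mean, sample SD and an approximate 95% CI for the mean.

    The CI uses the normal approximation mean +/- z * SD / sqrt(n),
    which is an assumption made for this illustration.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(counts)
    sd = statistics.stdev(counts)
    half_width = z * sd / math.sqrt(n)
    return {"mean": mean, "sd": sd, "ci": (mean - half_width, mean + half_width)}


# Hypothetical fixation counts recorded at one inclination angle.
counts_at_angle = [4, 5, 6, 5, 4, 6, 5, 5]
norms = fixation_norms(counts_at_angle)
```

Repeating the calculation for each inclination angle would give one normative range per angle.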
Based on the experimental data, the ranges of rated values were compiled for the third group (33–36 years) only, since this is the only group where the amount of errors exceeded 75% of the total number of cognitive tasks solved, with 75% being the threshold between likely random mistakes (below 75%) and a registerable pattern (above 75%). Moreover, the number of errors goes above 75% only at certain angles of the presented face image's central axis.
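The 75% criterion can be expressed as a simple filter. The Python sketch below assumes that the error rate at an angle is the number of erroneous answers divided by the number of tasks solved at that angle; the function name, the error counts and the figure of 34 tasks are hypothetical and serve only to illustrate the threshold.

```python
def angles_above_threshold(errors_by_angle, n_tasks, threshold=0.75):
    """Return the angles whose error rate strictly exceeds the threshold."""
    flagged = []
    for angle, n_errors in sorted(errors_by_angle.items()):
        if n_errors / n_tasks > threshold:
            flagged.append(angle)
    return flagged


# Hypothetical numbers of erroneous answers per inclination angle.
errors_by_angle = {24: 10, 72: 28, 216: 27, 300: 12}
flagged = angles_above_threshold(errors_by_angle, n_tasks=34)
```

With these hypothetical counts, only the angles whose error rate is above 75% are retained as a registerable pattern; the rest are treated as likely random mistakes.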
Thus, the age factor becomes significant for the simple cognitive task of face image recognition (both schematic monochrome and full-color images): the recognition success rate goes down as age goes up. The format of the presented image plays no significant part in the process of solving a simple cognitive problem of face image recognition. Repetition of the research procedure does not affect the results obtained.
The results of this study are consistent with findings of the previous studies. Earlier, it was shown that recognition effectiveness does not depend on the number of fixations provided there are at least two of them [20], which was also confirmed in our study. We have also confirmed that the success rate of recognition, as a simple cognitive task, depends on the spatial-orientational characteristics of the stimulus image.
CONCLUSIONS
This study reliably establishes the effect that orientational characteristics of the image have on the distribution of gaze fixations. Regardless of the angle of the image's center line and its properties (schematic monochrome or full-color image), gaze fixations tend to amass in the first quadrant of the image, which is due to the cultural and historical traditions of reading and writing left to right and top to bottom. It can be assumed that people with different cultural and historical traditions will exhibit a different distribution of gaze fixations: Arab peoples read right to left, so the gaze fixations in their case will predominantly occupy the upper right part of the image (second quadrant). At the same time, the format of the presented image does not affect the distribution of gaze fixations. We have identified the angles of inclination of the presented image that complicate the search for solution to a simple cognitive task (comparison and recognition of two images). The maximum number of errors in image recognition (schematic monochrome and full-color face images) was registered when the images were shown at the angles of 72°, 216°, 312°, 336°. The angle of 24° makes recognition of full-color images more difficult but has no such effect in case of schematic monochrome images; this fact is the result of a more complex structure of a full-color image in comparison with a schematic monochrome one. In addition to the eye lines and lips, gaze fixations were also registered in the areas of nose, forehead and ears when participants looked at full-color images. These elements are the criteria for comparison; the pattern did not repeat for schematic monochrome images. The practical significance of the results is that gaze fixation in the upper left part of the image allows avoiding erroneous characterization based on the interpretation of the relationship between the meaningful areas of the image and the parameters of oculomotor activity.