ORIGINAL RESEARCH
Sources and impact of human brain potential variability in the brain-computer interface
1 Lomonosov Moscow State University, Moscow, Russia
2 Neurocognitive Research Center (MEG Center), Moscow State University of Psychology and Education, Moscow, Russia
Correspondence should be addressed: Ilya P. Ganin
Leninskiye Gory, 1, str. 12, k. 246, Moscow, 119234, Russia; ipganin@mail.ru
Funding: the study was supported by the Russian Science Foundation Grant № 21-75-00021, https://rscf.ru/project/21-75-00021/
Acknowledgements: the authors would like to thank Yu. Nuzhdin (Kurchatov Institute) for developing and supporting the EEG recording software used to perform the study
Author contribution: Ganin IP — conducting research, data analysis and interpretation, literature review, manuscript writing; Vasilyev AN — data analysis and interpretation, literature review, manuscript writing; Glazova TD — conducting research, literature review; Kaplan AYa — data interpretation.
Compliance with ethical standards: the study was approved by the Ethics Committee of the Lomonosov Moscow State University (protocol № 113-d of 19 June 2020); the informed consent was submitted by all study participants.
Brain-computer interfaces (BCI) make it possible to directly translate brain activity into commands to control a computer or any other device without involving muscles and nerves, only via analysis of the user's electroencephalogram (EEG) [1]. The concept of BCI, proposed and developed many years ago, has become an interdisciplinary technology whose primary purpose is to support people with severe speech and movement disorders [2]; it is also used as a tool for instrumental diagnosis and cognitive training [3–5].
BCI technologies often involve the use of event-related potentials (ERPs) [6]. One of the most widely used and well-proven systems is referred to as the P300 BCI, since it is based on the analysis of the attention-related P300 component [7, 8]. The user of such an interface usually mentally counts the flashes of the character or other command symbol of interest. The ERPs elicited by flashes of this (target) object are distinguished from the ERPs elicited by flashes of all other (non-target) symbols by the presence of the P300 component [9]. The BCI algorithm recognizes the target symbol (command) by this feature and by the presence of other components (primarily N1) in the ERP [10, 11].
The P300 BCI systems are in demand for communication: typing or step-by-step control of a device [12]. However, their main disadvantage is that stimuli have to be repeated so that ERP responses can be accumulated and commands recognized with the fewest errors, which requires the BCI user to stay focused on the task for a long time. Furthermore, although the brain responses to repeated stimuli are assumed to be similar, the timing of individual responses relative to the stimuli varies [13, 14]. This is a well-known neurophysiological phenomenon that generally reflects a number of natural brain processes at different levels, from the cellular to the neural network level, and is also determined by fluctuations in the processes underlying perception of external stimuli [15].
It is known that such variability can affect the shape of the resulting averaged ERPs, including reducing the peak amplitude of certain components [16]. Failure to account for the variability effects may impair the performance of the P300 BCI, which relies on ERP extraction, by reducing the accuracy of target command recognition [17, 18].
In general, changes in the ERP variability are considered to be associated with fatigue, increased cognitive load, complication of the user's task [15, 19], as well as conditions characterized by reduced attention, such as ADHD and autism [20, 21]. However, the factors affecting ERP variability in terms of P300 BCI were never systematically studied. Meanwhile, identification of the BCI operation modes having a beneficial or negative effect on ERPs and the command classification accuracy would make it possible to develop more effective systems capable of ensuring more reliable control, especially when it comes to potential users with reduced attention.
It also seems appropriate to consider ERP variability in the P300 BCI by modifying the command classification algorithms. This can be particularly important when a relatively small number of stimuli is accumulated in the interface, and the effects of variability may not be compensated by the number of averaging procedures. Given the ERP components' different contributions to classification along with variation in their topography among various users [22], extracting independent spatial components to analyze and consider their variability separately can be a more effective approach.
The study aimed to identify the factors of the stimulus environment and the P300 BCI operation modes that affect ERP variability, as well as to develop and test more effective methods for independently detecting the variability of individual ERP components during classification.
METHODS
The study involved 19 healthy subjects (5 males and 14 females) aged 18–23. Inclusion criteria: healthy male and female volunteers aged 18–35. Exclusion criteria: diagnosed neurological/mental disorder, episodes of seizures or diagnosed status epilepticus.
During the experiment the subject sat in a chair in front of a monitor that displayed a standard 6 × 6 P300 BCI matrix with letters of the Russian alphabet and numbers. The angular dimensions of the matrix were 18° × 18°, the cell size was 1.7°, and the cell spacing was 1.1°. The background of the screen and cells was black (RGB 0,0,0), while cell frames and characters within the cells were grey (RGB 89,90,97). Stimuli were random flashes of rows and columns in the matrix (the background color changed from black to grey, and the color of characters changed from grey to black). The stimulus duration and the interstimulus interval were 97 and 48.5 ms, respectively (16 and 8 frames at the refresh rate of 165 Hz). Stimuli were grouped into stimulus sequences; each sequence included presentation of all 12 stimuli available in the matrix (six rows and six columns).
Each experimental mode included 15 blocks; in each block one cell of the matrix was designated as the target cell (it was marked by repeated blinking at the start of the block). Five stimulus sequences were presented per block, which corresponded to 60 stimuli (10 target stimuli and 50 non-target ones, since the target cell flashes with its row and with its column in every sequence). Thus, each mode included 150 target stimuli and 750 non-target ones.
Several modes distinguished by parameters of the stimulus environment and the subject's task were used to study the impact of various factors on the ERP variability. In the passive attention mode the subject was not supposed to count flashes of the target stimulus as in the P300 BCI; he/she simply fixed his/her gaze on the target cell. The task was complicated in the modes involving mixing up letters: the characters in all cells of the matrix randomly changed places with each target flash. In the cognitive load modes the subjects were asked to count not only the target flashes, but also the number of consonants that appeared in the target cell when the character changed. In the modes with the half-empty matrix the characters were not permanently visible and appeared only with flashes (fig. 1), which was meant to make it easier to fix the gaze on the cell and to reduce the effects of distractors.
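The timing and stimulus counts above follow from simple arithmetic, which can be checked with a short sketch. The variable names are illustrative and are not taken from the authors' software.

```python
# Stimulus timing derived from the monitor refresh rate (values from the text).
REFRESH_HZ = 165
STIM_FRAMES, ISI_FRAMES = 16, 8

stim_ms = 1000 * STIM_FRAMES / REFRESH_HZ   # about 97 ms
isi_ms = 1000 * ISI_FRAMES / REFRESH_HZ     # about 48.5 ms

# Stimulus counts for one experimental mode.
N_ROWS = N_COLS = 6
SEQUENCES_PER_BLOCK = 5
BLOCKS_PER_MODE = 15

stimuli_per_sequence = N_ROWS + N_COLS       # every row and column flashes once
targets_per_sequence = 2                     # the target's row and its column
stimuli_per_block = stimuli_per_sequence * SEQUENCES_PER_BLOCK
targets_per_block = targets_per_sequence * SEQUENCES_PER_BLOCK

targets_per_mode = targets_per_block * BLOCKS_PER_MODE
nontargets_per_mode = (stimuli_per_block - targets_per_block) * BLOCKS_PER_MODE

print(f"stimulus {stim_ms:.1f} ms, interval {isi_ms:.1f} ms")
print(f"{targets_per_mode} target and {nontargets_per_mode} non-target stimuli per mode")
```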
The modes and brief instructions for the subject were as follows:
- ordinary matrix, passive attention (“just look at the target cell”);
- ordinary matrix, active attention (“count the number of flashes of the target cell”);
- half-empty matrix, active attention (“count the number of flashes of the target cell”);
- half-empty matrix, mixing up, active attention (“count the number of flashes of the target cell”);
- half-empty matrix, mixing up, cognitive load (“count the number of consonants in the target cell”);
- ordinary matrix, mixing up, active attention (“count the number of flashes of the target cell”);
- ordinary matrix, mixing up, cognitive load (“count the number of consonants in the target cell”).
All modes alternated to generate a pseudo-random sequence, except for the passive attention mode that was always the first due to special instruction.
EEG was recorded with 30 scalp electrodes (Fp1, Fp2, F7, F3, Fz, F4, F8, FC5, FC1, FC2, FC6, T7, C3, Cz, C4, T8, CP5, CP1, CP2, CP6, P7, P3, Pz, P4, P8, PO7, POz, PO8, O1, O2) and a common reference electrode TP9 + TP10 using the NVX52 amplifier (MCS, Zelenograd; Russia). The sampling frequency was 1000 Hz. A miniature photosensor mounted in the upper left corner of the screen was used to ensure EEG synchronization with the flashes. Signal recording and management of experimental procedure were implemented in the original Resonance programming environment written in C++ (http://resonance.bcilab.net/documentation).
EEG signal processing and classification were performed in MATLAB 9.13 (R2022b) (MathWorks; USA). The EEG signal was band-pass filtered within the 1–10 Hz range using a FIR filter without a phase shift. Then ocular artifacts were removed by independent component analysis (ICA). After that the continuous signal was split into epochs from –400 to 1200 ms relative to the stimulus onset.
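The filtering and epoching steps can be illustrated with a minimal NumPy sketch. It is not the authors' MATLAB code: the windowed-sinc design, the filter length of 3301 taps, and the function names are assumptions made for illustration, and the ICA step is omitted. A symmetric odd-length kernel applied with centred alignment introduces no phase shift, which is one common way to obtain the behaviour described in the text.

```python
import numpy as np

FS = 1000  # sampling frequency in Hz, as in the recording

def bandpass_fir(low_hz, high_hz, fs=FS, n_taps=3301):
    """Symmetric (linear-phase) windowed-sinc band-pass kernel (assumed design)."""
    assert n_taps % 2 == 1, "an odd length keeps the kernel centred on one sample"
    t = np.arange(n_taps) - (n_taps - 1) / 2

    def lowpass(cutoff_hz):
        fc = cutoff_hz / fs
        return 2 * fc * np.sinc(2 * fc * t)

    # Band-pass = wide low-pass minus narrow low-pass, tapered by a Hamming window.
    return (lowpass(high_hz) - lowpass(low_hz)) * np.hamming(n_taps)

def filter_zero_phase(x, h):
    """Filter along the last axis; 'same' alignment cancels the kernel delay.

    The signal must be longer than the kernel, otherwise np.convolve
    returns an array of the kernel's length.
    """
    return np.apply_along_axis(lambda ch: np.convolve(ch, h, mode="same"), -1, x)

def epoch(signal, onsets, fs=FS, tmin_ms=-400, tmax_ms=1200):
    """Cut (n_channels, n_samples) data into (n_epochs, n_channels, n_times).

    Onsets are sample indices; epochs that do not fit in the recording are skipped.
    """
    start = int(round(tmin_ms * fs / 1000))
    stop = int(round(tmax_ms * fs / 1000))
    keep = [o for o in onsets if o + start >= 0 and o + stop <= signal.shape[-1]]
    return np.stack([signal[:, o + start:o + stop] for o in keep])
```

With these defaults each epoch contains 1600 samples (from –400 to 1199 ms at 1 ms resolution); samples within about 1.65 s of the recording edges are affected by the filter length.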
The next phase of analysis involved acquisition of spatial filters to extract the components of interest (N1 and P300) from the multichannel EEG signal. For that the epochs in the vicinity of individual ERP peaks were extracted in each subject, after that optimal spatial projections (spatial filters) were calculated based on the Fisher’s criterion [23]. Such method made it possible to reduce the EEG signal dimension, increase the signal-to-noise ratios of the studied components, and largely isolate two components from each other for independent study [23]. The further analysis was performed for these two extracted spatial components (once for N1 and once for P300). Signals of the components were normalized to the standard deviation of all non-target epochs within each subject (hereinafter, AU instead of µV).
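One way to obtain a spatial filter from the Fisher criterion is the two-class closed form w ∝ S_w⁻¹(m_target − m_non-target), sketched below. Whether this matches the exact procedure of [23] is an assumption; the regularization constant, the `polarity` argument, and the function names are hypothetical. The second function reproduces the normalization to the standard deviation of non-target epochs described above.

```python
import numpy as np

def fisher_spatial_filter(target, nontarget, polarity=1, reg=1e-6):
    """Spatial filter maximizing the Fisher criterion for one ERP component.

    target, nontarget: (n_epochs, n_channels, n_samples) data cut around the
    individual peak. Every time sample is treated as one observation.
    polarity: +1 keeps the target-minus-non-target difference positive
    (e.g. P300), -1 keeps it negative (e.g. N1).
    """
    n_ch = target.shape[1]
    xt = target.transpose(1, 0, 2).reshape(n_ch, -1)
    xn = nontarget.transpose(1, 0, 2).reshape(n_ch, -1)
    diff = xt.mean(axis=1) - xn.mean(axis=1)          # between-class direction
    sw = np.cov(xt) + np.cov(xn)                      # within-class scatter
    sw += reg * np.trace(sw) / n_ch * np.eye(n_ch)    # small ridge for stability
    w = np.linalg.solve(sw, diff)
    return polarity * w / np.linalg.norm(w)

def project_and_normalize(epochs, w, nontarget_epochs):
    """Apply the filter and express the signal in SD units of non-target epochs."""
    component = np.einsum("c,ecs->es", w, epochs)
    sd = np.einsum("c,ecs->es", w, nontarget_epochs).std()
    return component / sd
```

A filter would be computed twice per subject, once on data around the N1 peak and once around the P300 peak, giving the two component channels used in the further analysis.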
A set of target and non-target epochs was formed within each subject, component (N1 and P300), and mode. To acquire ERPs averaged by conventional method, all the epochs of the same subject were averaged individually for each mode, for the class of the target and non-target epochs of the N1 and P300 sets. The amplitude of these components was defined as the minimum/maximum signal value within the 100–350 and 200–500 ms windows, respectively, and the peak latencies were defined as the time after the stimulus onset when the signal reached its maximum or minimum.
Furthermore, to analyze the ERP variability, the N1 and P300 component latencies were determined in individual non-averaged target epochs as local minima or maxima in the same windows as for the averaged ERPs. The component's amplitude was taken as the signal value at the latency found within the epoch. To assess the variability of the ERP peak latencies, the mean absolute deviation (MAD) was calculated in each mode for each subject. To estimate the effect of variability on the ERP amplitude, each epoch was shifted along the time axis, prior to averaging, by the difference between the average latency and the component latency within that epoch.
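The single-epoch peak search, the MAD, and the latency-corrected averaging can be sketched as follows. This is a simplified illustration rather than the authors' implementation: the peak is taken as the extremum within the window (the text speaks of local minima or maxima), the MAD is computed around the mean latency, and the function names are hypothetical.

```python
import numpy as np

def peak_in_window(x, times_ms, window_ms, polarity):
    """Latency (ms) and amplitude of the extremum of a 1-D signal in a window.

    polarity: -1 searches for a minimum (N1), +1 for a maximum (P300).
    """
    idx = np.flatnonzero((times_ms >= window_ms[0]) & (times_ms <= window_ms[1]))
    i = idx[np.argmax(polarity * x[idx])]
    return times_ms[i], x[i]

def mean_absolute_deviation(latencies):
    """Mean absolute deviation of single-epoch latencies around their mean."""
    lat = np.asarray(latencies, dtype=float)
    return np.mean(np.abs(lat - lat.mean()))

def latency_corrected_average(epochs, times_ms, window_ms, polarity):
    """Shift each epoch so its peak sits at the mean latency, then average.

    epochs: (n_epochs, n_times) signal of one spatial component.
    np.roll wraps samples around the epoch edges; with epochs much longer
    than the search window this only affects the far ends of the average.
    """
    lats = np.array([peak_in_window(e, times_ms, window_ms, polarity)[0]
                     for e in epochs])
    step = times_ms[1] - times_ms[0]
    shifts = np.round((lats.mean() - lats) / step).astype(int)
    aligned = np.stack([np.roll(e, s) for e, s in zip(epochs, shifts)])
    return aligned.mean(axis=0), lats
```

For jittered single-epoch peaks the plain average is smeared and lower in amplitude, while the aligned average recovers the single-epoch peak height, which is the effect the correction is meant to capture.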
To estimate the effects of ERP variability on the effectiveness of command recognition in the BCI, classification accuracy was calculated for ordinary EEG channels (standard approach) and for the extracted spatial components N1 and P300. It is important to note that classification scores of two types were calculated for the latter: without equalization of latency peaks and with equalization (correction for N1 or P300 only and correction for both peaks, N1 and P300). The signal amplitude values within the 0–600 ms window (every 10th point) in 11 channels of EEG leads Cz, CP1, CP2, P3, Pz, P4, PO7, POz, PO8, O1, O2 or two channels obtained for N1 and P300 of appropriate spatial components were used as the Fisher's linear discriminant features. The classification accuracy was assessed by cross-validation with sequential testing of the data of a single block (all epochs of the same target cell) of the classifier trained using the other 14 blocks. The classification accuracy was determined as a proportion of the correctly recognized letters (out of 15). When performing testing, accuracy was calculated for different number of the stimulus sequences (one to five). The accuracy was calculated for each mode, subject, and signal feature extraction method.
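The classification scheme can be sketched in NumPy as below, under several assumptions: classifier scores are summed per row and per column and the cell at the intersection of the best row and best column is selected, a small ridge term is added to the within-class covariance, and stimuli are coded 0–5 for rows and 6–11 for columns. None of these details is stated in the text, and the function names are hypothetical.

```python
import numpy as np

def extract_features(epochs, times_ms, tmin=0, tmax=600, step=10):
    """Every 10th sample of the 0-600 ms window, concatenated over channels.

    epochs: (n_epochs, n_channels, n_times) -> (n_epochs, n_channels * n_points)
    """
    idx = np.flatnonzero((times_ms >= tmin) & (times_ms < tmax))[::step]
    return epochs[:, :, idx].reshape(len(epochs), -1)

def fit_lda(X, y, reg=1e-3):
    """Two-class Fisher linear discriminant; returns the weight vector."""
    m1, m0 = X[y == 1].mean(axis=0), X[y == 0].mean(axis=0)
    centered = np.vstack([X[y == 1] - m1, X[y == 0] - m0])
    sw = centered.T @ centered / len(centered)
    sw += reg * np.trace(sw) / len(sw) * np.eye(len(sw))  # assumed ridge term
    return np.linalg.solve(sw, m1 - m0)

def pick_cell(scores, stim_ids):
    """Sum scores per stimulus and return the (row, column) with the highest sums."""
    total = np.bincount(stim_ids, weights=scores, minlength=12)
    return int(np.argmax(total[:6])), int(np.argmax(total[6:]))

def leave_one_block_out(X, y, stim_ids, seq_ids, block_ids, targets, n_seq):
    """Share of correctly recognized target cells.

    Each block is tested with a classifier trained on all other blocks,
    using only the first n_seq stimulus sequences of the held-out block.
    targets[b] is the (row, column) of the target cell in block b.
    """
    blocks = np.unique(block_ids)
    correct = 0
    for b in blocks:
        w = fit_lda(X[block_ids != b], y[block_ids != b])
        test = (block_ids == b) & (seq_ids < n_seq)
        correct += pick_cell(X[test] @ w, stim_ids[test]) == targets[b]
    return correct / len(blocks)
```

With two spatial components the feature vector has 2 × 60 = 120 values per epoch, compared with 11 × 60 = 660 for the EEG channel set; calling `leave_one_block_out` with `n_seq` from 1 to 5 gives the accuracy as a function of the number of stimulus sequences.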
Statistical analysis was performed in MATLAB using generalized linear mixed-effects models. A random intercept (a single constant coefficient) was specified for the “subject” variable, while experimental conditions (“active attention”, “cognitive load”, “half-empty matrix”, “mixing up elements”) and latency correction modes were considered as fixed effects. The significance of fixed effects was assessed using the F-test. The following dependent variables were assessed: amplitude, latency, MAD of the N1 and P300 latencies, and classification accuracy. We used binomial regression to assess classification accuracy and linear regression to assess other parameters.
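Read literally, the description corresponds to a random-intercept model of the following form. The notation is ours and is given only as an illustration: s indexes subjects, m indexes modes, x_{k,m} are indicator variables for the fixed factors, and n = 15 letters per mode is assumed as the number of binomial trials.

```latex
% Linear model for amplitude, latency and MAD
y_{sm} = \beta_0 + \sum_{k} \beta_k x_{k,m} + u_s + \varepsilon_{sm},
\qquad u_s \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{sm} \sim \mathcal{N}(0, \sigma^2)

% Binomial model for classification accuracy (c_{sm} correct letters out of n)
c_{sm} \sim \mathrm{Binomial}(n, p_{sm}), \qquad
\operatorname{logit}(p_{sm}) = \beta_0 + \sum_{k} \beta_k x_{k,m} + u_s
```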
RESULTS
Fig. 2 (fig. 2) shows the extracted spatial components N1 and P300 and the corresponding patterns (topographic distribution of weighting coefficients). The N1 component with the average latency of 187 ms had typical lateral occipital localization, while P300 with the latency of 315 ms had medial parietal localization.
Table 1 (tab. 1) provides the group-averaged amplitudes of the N1 and P300 components obtained in each mode, before and after correction of latencies within individual epochs. The N1 and P300 amplitudes of the averaged ERPs increased after applying correction: F(1,258) = 581.24; p < 0.00001. The factor of active attention turned out to be significant for the N1 amplitude, which increased relative to passive attention to the stimulus (mode 1): F(1,36) = 17.87; p = 0.00015. An increase in the N1 amplitude was found for the factors “half-empty matrix” (F(1,110) = 16.10; p = 0.00011) and “cognitive load” (F(1,110) = 48.49; p < 0.00001). An increase in the P300 amplitude was found for the “cognitive load” factor (F(1,110) = 18.01; p = 0.00005), while a decrease was found for the factor of “mixing up elements” (F(1,110) = 4.72; p = 0.032).
The average latencies of the N1 and P300 components together with the indicator of the latency variability (MAD) are provided in tab. 2. A decrease in the N1 and P300 latencies was found for the factor of “half-empty matrix”: F(1,110) = 45.87, p < 0.00001 and F(1,110) = 24.51, p < 0.00001, respectively. An increase in the N1 latency was found for the factor of “mixing up elements”: F(1,110) = 5.17; p = 0.025. Active attention resulted in a decrease of the N1 component MAD relative to the passive attention mode: F(1,36) = 1.60; p = 0.0016. A decrease in the N1 MAD was found for the factors of “half-empty matrix” (F(1,110) = 12.43; p = 0.00061) and “cognitive load” (F(1,110) = 11.56; p = 0.00094). As for P300, an increase in MAD was found for the factor of “mixing up elements”: F(1,110) = 4.80; p = 0.03056.
Table 3 (tab. 3) provides the average classification accuracy in all modes for different signal feature extraction methods: EEG channels and the N1 and P300 channels of the corresponding spatial components, with or without latency correction. The table provides data for the minimum number (1 or 2) of stimulus sequences per letter, when accuracy is still low and the differences between the modes are larger. A trend towards an increase in accuracy was found for the “cognitive load” factor: F(1,108) = 3.39; p = 0.068.
Fig. 3 (fig. 3) presents the average classification accuracy for different signal feature extraction methods and different numbers of stimulus sequences. When spatial filters were used (only two data vectors, for N1 and P300) without latency correction, the accuracy was the lowest and was even lower than with the usual 11 EEG electrodes: F(1,3284) = 5.99, p = 0.014. Applying latency correction to the spatial component N1 only yielded higher accuracy; however, this option did not differ significantly from the use of the usual EEG electrodes: F(1,3284) = 1.1771, p = 0.28. In contrast, applying latency correction to the spatial component P300 only resulted in higher accuracy compared to the use of the usual EEG electrodes: F(1,3284) = 24.51, p < 0.00001. The highest classification accuracy values were obtained when applying latency correction to both N1 and P300 (in each of the two corresponding spatial components). In this case, the accuracy was higher compared to the use of the usual EEG electrodes (F(1,3284) = 24.29, p < 0.00001) and higher than when applying latency correction to P300 only (F(1,3284) = 4.34, p = 0.037); for the latter comparison, the differences were found for the 2nd and 3rd stimulus sequences (p < 0.05).
DISCUSSION
In our study we proposed an effective approach to assessing the ERP variability in the P300 BCI that allowed us to identify a number of factors affecting the ERP characteristics and explore the contribution of the variability effects to the command recognition accuracy in this interface.
To analyze the effects of the ERP latency variability, it is necessary to detect the components in individual (non-averaged) epochs. This is very difficult because of both technical and physiological noise, which is why it is extremely important to make the most of the valuable information contained in the EEG signal. Although the effects of variability have been studied in the P300 BCI, their impact was estimated in ordinary EEG channels and for the P300 component only [17, 24]. In our previous study we applied latency correction to two components, N1 and P300; however, each component was analyzed in its own channel set [18]. Using the combined information from all channels with simultaneous analysis of several components in each of these channels can be a more effective approach. For example, independent components extracted by ICA have already been used in papers on assessing variability (not related to BCI); however, these researchers analyzed only one early ERP component [21, 25]. Furthermore, the ICA method does not guarantee extraction of the components of interest. In this study we proposed using spatial filters to extract two components, N1 and P300, that are functionally significant for the P300 BCI, and then analyzing variability in these components instead of individual EEG channels. This method was used earlier [23], but in that study it served as an additional step of preprocessing and feature extraction for classification in the BCI and was not related to assessment of the ERP variability effects. Extraction of spatial components for the purpose of correcting them independently has not been applied previously.
Moreover, the use of the approach involving spatial components reduces the likelihood of erroneous peak detection within individual epochs compared to the use of signal in certain EEG leads, thereby making the variability analysis more objective.
An essential aspect of the work was to reveal the possible factors affecting the ERP characteristics in the P300 BCI. Active attention (the instruction to mentally count flashes) resulted in an increase in the N1 component amplitude, and the mechanism underlying this increase likely includes a decrease in the latency variability of responses to individual stimuli, since a simultaneous decrease in MAD was observed. An increase in the amplitude of ERP components relative to passive attention to stimuli in the P300 BCI has been reported earlier [26]. Presumably, the instruction to actively count the stimuli improves fixation of gaze on the target position within the matrix, which is important for the N1 component [27]. The absence of permanently visible characters in the matrix cells is also likely to improve fixation of gaze on the target cell, since in the “half-empty matrix” modes the N1 amplitude increased while its variability decreased. This is consistent with the opposite effects on the N1 component in an environment where tracking the target objects is complicated by their mobility [18], and supports the relationship between the features of oculomotor system function and the variability of ERP components [28].
The constant changing of characters in the matrix cells is likely to adversely affect attention to the target stimulus, as evidenced by the decrease in the P300 amplitude and the increase in its variability, along with the increase in the N1 latency. The negative impact of such manipulations with the stimulus environment on the P300 BCI is also confirmed by the fact that the subjects reported trouble following instructions in the modes involving mixing up characters. At the same time, an interesting and not entirely obvious result is that the additional cognitive load applied in the modes involving mixing up characters (counting consonants as the letters changed), in contrast, resulted in an increase in the N1 and P300 amplitudes. Furthermore, the effect found for N1 resulted at least partially from the reduced variability. It is well known that the variability of individual responses increases when the subject's attention switches between two competing tasks [29]. In our study the cognitive load was integrated into the task of tracking the target events and, on the contrary, may have increased attention, which is why such modification of the stimulus environment may be promising for the P300 BCI.
The potential benefit of using factors that favorably affect attention in the BCI is also supported by the trend towards increased target stimuli classification accuracy in the modes with cognitive load (tab. 3). The method proposed in our study, in which variability correction is applied not to the usual EEG leads but to the extracted spatial components N1 and P300, ensured the best classification accuracy (fig. 3). Furthermore, the largest increase in accuracy is observed when using the smallest number of stimulus sequences (94% vs. 84%). This emphasizes the value of this method for the P300 BCI operation modes and provides superior results compared to the studies that also involved extraction of spatial components, but did not take into account the effects of variability [23, 30]. It is noteworthy that the N1 and P300 components contribute unequally to classification: the contribution of uncorrected N1 is larger than that of uncorrected P300. However, given the higher P300 variability, correction of its latency resulted in significantly increased accuracy, thereby outperforming both correction of N1 only and the use of standard EEG electrodes.
A limitation of the approach in its current form is that latency was not adjusted in the non-target epochs. In the future, implementing an online BCI will require an algorithm that, for example, avoids correcting low-amplitude peaks in the non-target epochs.
CONCLUSIONS
The paper proposes an approach to analysis of the ERP latency variability in the extracted EEG spatial components. The use of this method in the P300 BCI has made it possible to achieve better results in terms of the command classification accuracy compared to the existing methods. Furthermore, the use of such an approach has revealed some factors of the stimulus environment and the P300 BCI operation modes having an impact on the ERP variability effects. Specifically, modifications of the interface affecting the user's attention, including the cognitive load applied in addition to the main task, and making it easier to fix gaze on the target objects have a beneficial effect on the ERP amplitude and the decrease in variability of individual responses to stimuli. The findings complement the existing knowledge of the mechanisms underlying the ERP latency variability and provide new reasons for the development of more effective BCI systems.