ORIGINAL RESEARCH
High-speed brain-computer communication interface based on code-modulated visual evoked potentials
1 Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
2 Department of Mechanics and Mathematics, Lomonosov Moscow State University, Moscow, Russia
Correspondence should be addressed: Rafael K. Grigoryan
Leninskie Gory 1, bld. 12, Moscow, 119234; grraph.bio@gmail.com
Author contribution: Grigoryan RK — experiment planning and conducting, data processing, article authoring; Filatov DB — experiment planning, software development, article authoring; Kaplan AY — task setting, experiment planning, general research effort management, article authoring.
A brain-computer interface (BCI) is a technology that allows patients with speech and movement disorders to control a computer through the analysis of correlates of their neuronal activity. A BCI requires the user to focus attention either on internal images, e.g., limb movements, or on objects on the screen, such as the letters needed at a given moment. BCI systems translate such mental efforts into computer input commands by registering the EEG markers specific to these efforts [1–3]. Interfaces that make use of visual evoked potentials, elicited, for example, by flashing objects on the screen, offer a wide range of EEG-detectable commands, their number being equal to the number of on-screen stimuli. The P300 component is the traditional EEG marker of the user's attention to a specific event, such as the flash of a certain letter [4, 5]. Such interfaces have lately been actively introduced into medical rehabilitation practice to enable communication with patients suffering from severe speech and movement disorders [6]. Their reliability in translating commands given by cognitively intact patients is sufficient, but their low speed of operation is a drawback. For example, a comparison of BCI performance in healthy people and patients with amyotrophic lateral sclerosis revealed that the typing rate does not exceed 2–3 letters per minute when each target object is shown 14 times. With the accuracy in both groups exceeding 95%, such a rate translates to 11–14 bits per minute (bpm) [7], which makes P300-based BCI uncomfortable even for healthy people. Code-modulated visual evoked potentials (CVEP) promise to speed up BCI transfer rates. A CVEP is the combined EEG-detectable response to a special irregular sequence of flashes of the required on-screen object. Such sequences elicit a steady-state visual evoked potential (SSVEP) registered by EEG, which is phase-locked to the stimulation.
Phase synchronization, which is also characteristic of SSVEP-based BCI with regular-frequency stimulation, allows the evoked potential to retain, to a certain degree, the properties of the stimulus sequence, in particular its cyclicity, autocorrelation and spectral characteristics. If a number of visual stimuli are presented using different sequences that correlate with each other only minimally, the evoked potentials elicited by different stimuli can be distinguished through correlation analysis. There are various sets of binary sequences with suitable cross-correlation properties, such as Gold codes, Barker codes and m-sequences. They are used to identify signals carried on the same frequency in various fields, such as mobile communications and satellite navigation.
An m-sequence is a pseudo-random binary sequence whose autocorrelation function has a single peak at zero shift. Through cyclic shifts, one m-sequence can produce several sequences that are not correlated with each other, which facilitates its application in BCI with a large number of stimuli, as it shortens the classifier learning period. To distinguish between the potentials evoked by different stimuli, it is necessary to assemble a learning sample. Ordinarily, such a sample should contain potentials corresponding to each stimulus. When each stimulus uses its own binary sequence derived from a single m-sequence, however, it is enough to obtain the reference evoked potential for that one m-sequence. The target stimulus can then be detected by the shift of the peak of the correlation function between spatially filtered sections of the recorded EEG and that reference evoked potential. As a result, the duration of learning does not depend on the number of different stimuli, which allows using a number of stimuli sufficient for typing text.
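The autocorrelation property described above can be checked numerically. The following Python sketch is illustrative only: it generates a 63-bit m-sequence with a 6-stage linear feedback shift register, assuming the recurrence s[n+6] = s[n] XOR s[n+1] (primitive polynomial x^6 + x + 1), a common textbook choice that is not stated in the text. The function names are hypothetical.

```python
def lfsr_m_sequence(seed=(1, 1, 1, 1, 1, 0), length=63):
    """Generate bits with s[n+6] = s[n] XOR s[n+1] (assumed recurrence)."""
    bits = list(seed)  # any nonzero 6-bit seed gives a cyclic shift of the same sequence
    while len(bits) < length:
        bits.append(bits[-6] ^ bits[-5])
    return bits


def circular_autocorrelation(bits):
    """Circular autocorrelation of the sequence after mapping 0/1 to -1/+1."""
    x = [2 * b - 1 for b in bits]
    n = len(x)
    return [sum(x[i] * x[(i + shift) % n] for i in range(n)) for shift in range(n)]


sequence = lfsr_m_sequence()
acf = circular_autocorrelation(sequence)
print(acf[0])        # value at zero shift
print(set(acf[1:]))  # values at all nonzero shifts
```

For an m-sequence of length 63, the value at zero shift is 63 and every nonzero shift gives -1, which is why cyclically shifted copies of one sequence are almost uncorrelated with each other.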
Canonical correlation analysis of EEG allows reliable detection of short instances of code-modulated evoked potentials synchronous with the flashes of the currently needed on-screen object. As few as 2 or 3 repetition cycles of stimulus activation can suffice. BCI built this way offer information transfer rates in excess of 100 bpm [8]. Code-modulated evoked potentials can be registered with both EEG and electrocorticography [9]; BCI operation based on them can be optimized by taking error-related potentials into account and by utilizing color in the stimulus environment [11].
Seeking to find the optimal modes of code-modulated on-screen object flashing for BCI, we tested operation with different sequences and rates of stimuli flashes.
METHODS
Participants
Fifteen healthy volunteers (7 female and 8 male) aged 18–30 years participated in the study. The inclusion criteria were: no history of neurological diseases, including epilepsy; normal or corrected vision. The exclusion criteria were: age different from the required; history of neurological diseases; vision problems.
EEG registration
We used the Neurovisor-BMM 40 EEG amplifier (Meditsinskiye Computerniye Systemy; Russia) to record biopotentials from 22 channels (FCz, C3, C1, Cz, C2, C4, CP3, CP1, CPz, CP2, CP4, P5, P3, Pz, P4, P6, PO3, POz, PO4, O1, Oz, O2), with AFz as the ground electrode and two averaged ear electrodes as the reference. Before recording, we checked the interelectrode impedance; recording started once the impedance had been brought below 10 kOhm. The sampling rate was 500 Hz.
Experimental rig
The experiment was controlled from a computer using custom software developed by the authors in C++. The stimuli were shown to participants on a 24-inch display with a refresh rate of 120 Hz. The participants sat in a chair approximately 60 cm away from the display. A photosensor was used to ensure synchronization of EEG recording and stimuli presentation.
Stimuli presentation
The objects were presented in 32 square cells (4 by 8 table) containing letters on a black background. The stimulation was effected through changing the color of the cell from black to white.
The color change algorithm was determined by a 63-bit m-sequence. Each cell changed color in accordance with its own m-sequence derived from the basis sequence through 2-bit cyclic shifts. Thus, activation of the first cell followed the basis sequence, that of the second cell was shifted by 2 bits, of the 10th — by 18 bits, etc. Overall, we used two basis m-sequences: basis — [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0] and inverted — [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1].
No other 63-bit m-sequences (that is, ones that are not cyclic shifts of these two) were used. The inverted sequence is similar to the basis one in terms of autocorrelation properties, but it generates a significantly different visual stimulation.
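The construction of the per-cell codes can be sketched as follows. This is a minimal Python illustration, not the authors' C++ software: the basis sequence is copied from the text, while the direction of the shift (each cell delayed, rather than advanced, by 2 bits relative to the previous one), the 0-based cell index and all function names are assumptions.

```python
# Basis sequence as listed in the text, written in groups of ten bits
BASIS = [int(c) for c in (
    "0000011111" "0111100111" "0101100001" "0111000110"
    "1101001000" "1001100101" "010"
)]
INVERTED = [1 - b for b in BASIS]  # bitwise inversion of the basis sequence

SHIFT_BITS = 2  # shift between successive cells (from the text)
N_CELLS = 32    # 4 by 8 table (from the text)


def cell_code(cell, base=BASIS, shift_bits=SHIFT_BITS):
    """Code of a 0-based cell: the base sequence delayed by cell * shift_bits bits.

    Delaying (rotating right) rather than advancing is an assumption.
    """
    s = (cell * shift_bits) % len(base)
    return base[-s:] + base[:-s] if s else list(base)


def bipolar_correlation(a, b):
    """Zero-lag correlation of two codes after mapping 0/1 to -1/+1."""
    return sum((2 * x - 1) * (2 * y - 1) for x, y in zip(a, b))


codes = [cell_code(k) for k in range(N_CELLS)]
print(len({tuple(c) for c in codes}))         # number of distinct codes
print(bipolar_correlation(codes[0], codes[9]))  # first cell vs. 10th cell (18-bit shift)
```

All 32 codes are distinct, and the zero-lag correlation between any two different cells is -1 out of a possible 63, so a response to one cell looks almost nothing like the expected response to another.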
In the course of our experiment, we tested four BCI modes, each with its own m-sequence and stimulation rate. The first two modes used the basis and inverted sequences, respectively, with a period of 1 second. The third, "slow" mode used the basis sequence with a 2-second period, and the fourth, "fast" mode used a period of 500 ms.
Thus, the duration of one bit (in white or black) in the standard (1-second), fast and slow modes was approximately 16, 8 and 32 ms, respectively.
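The timing of the modes follows from the display refresh rate. The sketch below assumes that each bit lasts a whole number of frames (1, 2 and 4 frames for the fast, standard and slow modes); these frame counts are inferred from the approximate bit durations given above and are not stated in the text.

```python
REFRESH_HZ = 120    # display refresh rate (from the text)
SEQUENCE_BITS = 63  # m-sequence length (from the text)

# Assumed number of display frames per bit in each mode
FRAMES_PER_BIT = {"fast": 1, "standard": 2, "slow": 4}


def mode_timing(frames_per_bit):
    """Return (bit duration in ms, sequence period in s) for a given frame count."""
    bit_ms = 1000.0 * frames_per_bit / REFRESH_HZ
    period_s = SEQUENCE_BITS * frames_per_bit / REFRESH_HZ
    return bit_ms, period_s


for mode, frames in FRAMES_PER_BIT.items():
    bit_ms, period_s = mode_timing(frames)
    print(f"{mode}: {bit_ms:.1f} ms per bit, {period_s:.3f} s per period")
```

Under these assumptions the periods come out as 0.525, 1.05 and 2.1 s, close to the nominal 0.5, 1 and 2 s.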
Structure of the study
Each volunteer participated in 4 experimental sessions, one per mode. The order of modes was selected at random after briefing and electrode placement. Each session began with classifier learning, during which the participant viewed one of the stimuli for 40 full sequence presentation periods. After that, the participant had to enter 32 commands by concentrating on stimuli in a predetermined order. To enter a command, the participant looked carefully at a particular letter while concentrating on its flashes. After a few seconds, the system produced an answer, which could be correct or wrong, and the stimulation stopped. Following a break of several seconds, the stimulation resumed and the participant attempted to enter the next command.
The command was considered entered when the classifier reached a certain threshold. The accuracy of the choice of commands was determined as the ratio of correctly entered commands to the total number of input attempts.
Pattern classification
Canonical correlation analysis yields channel weights that are used to spatially filter the EEG and isolate a pronounced response to the stimulus sequence. The weights obtained from the EEG recorded during learning were used to reduce the dimensionality of the signal. Learning yielded a single-channel response to the m-sequence, averaged over 40 full periods. During actual operation, once the m-sequence had been presented in full, the corresponding one-dimensional signal was correlated with the signal obtained during learning. The command selected by the user was determined by the shift of the peak of this correlation function: the number of the target stimulus was obtained by dividing the time shift of the correlation maximum by the product of the duration of one bit and the shift (in bits) between successive stimuli.
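The last step, mapping the correlation peak to a stimulus number, can be illustrated with a small simulation. This is a hedged sketch rather than the authors' implementation: the template is a synthetic stand-in for the averaged, spatially filtered response (an upsampled m-sequence generated with an assumed recurrence), the period of 1.05 s assumes 2 frames per bit at 120 Hz, the response to each cell is assumed to be delayed by 2 bits relative to the previous one, and all names are hypothetical. The spatial filtering itself is omitted.

```python
import random

FS_HZ = 500      # EEG sampling rate (from the text)
N_BITS = 63      # m-sequence length (from the text)
SHIFT_BITS = 2   # shift between successive cells (from the text)
N_CELLS = 32     # number of stimuli (from the text)
PERIOD_S = 1.05  # assumed period: 63 bits x 2 frames at 120 Hz
N_SAMPLES = round(FS_HZ * PERIOD_S)  # samples in one sequence period
SAMPLES_PER_BIT = N_SAMPLES / N_BITS


def synthetic_template():
    """Stand-in for the learned response: an m-sequence upsampled to +1/-1 samples."""
    bits = [1, 1, 1, 1, 1, 0]
    while len(bits) < N_BITS:
        bits.append(bits[-6] ^ bits[-5])  # assumed recurrence, x^6 + x + 1
    return [2.0 * bits[i * N_BITS // N_SAMPLES] - 1.0 for i in range(N_SAMPLES)]


def simulate_trial(template, cell, noise_sd=0.5, seed=0):
    """One period of response to a 0-based cell: delayed template plus Gaussian noise."""
    rng = random.Random(seed)
    n = len(template)
    delay = round(cell * SHIFT_BITS * SAMPLES_PER_BIT)
    return [template[(i - delay) % n] + rng.gauss(0.0, noise_sd) for i in range(n)]


def peak_lag(signal, template):
    """Lag (in samples) that maximizes the circular cross-correlation."""
    n = len(template)

    def corr(lag):
        return sum(signal[i] * template[(i - lag) % n] for i in range(n))

    return max(range(n), key=corr)


def decode_cell(signal, template):
    """Pick the cell whose expected lag is circularly closest to the peak lag."""
    lag = peak_lag(signal, template)
    step = SHIFT_BITS * SAMPLES_PER_BIT  # samples between successive cells
    n = len(template)

    def circular_distance(cell):
        d = abs(lag - cell * step) % n
        return min(d, n - d)

    return min(range(N_CELLS), key=circular_distance)


template = synthetic_template()
trial = simulate_trial(template, cell=9)
print(decode_cell(trial, template))
```

Choosing the cell by circular distance, instead of plain division and rounding, keeps a peak that lands just before the start of the period from being assigned to the last cell.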
Data analysis
We used the scipy 1.1.0 package [12] to process the results and normalized cross-correlation to build the correlation maps. Pairwise comparisons were made with the Wilcoxon test with the Holm-Sidak correction for multiple comparisons.
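For reference, the Holm-Sidak step-down correction can be written in a few lines of plain Python. The sketch below is a generic textbook version, not necessarily the routine used in this study; the function name is hypothetical and the p-values are made-up placeholders, not results. Six values are used because four modes give six pairwise comparisons.

```python
def holm_sidak(p_values):
    """Return Holm-Sidak adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Sidak adjustment for the (m - rank) hypotheses still under test
        adj = 1.0 - (1.0 - p_values[idx]) ** (m - rank)
        # Step-down rule: adjusted values must not decrease along the ordering
        running_max = max(running_max, adj)
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# Hypothetical raw p-values for six pairwise comparisons (placeholders only)
raw = [0.004, 0.01, 0.02, 0.03, 0.2, 0.6]
for p, p_adj in zip(raw, holm_sidak(raw)):
    print(f"raw {p:.3f} -> adjusted {p_adj:.3f}")
```

A comparison is then declared significant when its adjusted p-value is below the chosen level, 0.05 in this study.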
RESULTS
Assessment of classification accuracy and information transfer rate
In the slow mode, with an m-sequence period of 2 s, the median accuracy of command selection reached 1 (Fig. 1). In the basis and inverted m-sequence modes, the accuracy was 0.96 and 0.95, respectively. In the fast mode, the median accuracy was 0.33; this was the only mode whose accuracy differed significantly from that of all other modes after correction for multiple comparisons (p < 0.05). However, one participant reached an accuracy of 0.96 in this mode, a result that cannot be attributed to chance, since it is based on 32 entered commands.
The command input rate is another important property of a BCI. In the modes with an m-sequence period of 1 s, the median time required to identify one command was 2 s; in the slow mode it was 3.5 s, and in the fast mode 1.2 s.
Information transfer rate is an integrative indicator of BCI quality: it combines both the rate and the accuracy of command selection. We calculated it using the Shannon definition as applied to brain-computer interfaces [13]. The median information transfer rate in the basis and inverted sequence modes was 141 and 142 bpm, respectively, and 93 bpm in the slow mode, while the fast mode yielded the lowest value, 37 bpm, as a result of the low accuracy of command selection. In the latter mode, however, one user was able to enter commands accurately and showed the highest transfer rate, 287 bpm, with a command input time of 1 second and an accuracy of 0.96. The information transfer rate in the modes with an m-sequence period of 1 second differed significantly from that in the slow mode (Z = 2.7; p = 0.019).
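As a rough illustration of how accuracy and selection time combine, the sketch below uses the formula most commonly applied to BCI (often attributed to Wolpaw and colleagues). That this is exactly the definition of [13] is an assumption, and the function names are hypothetical.

```python
import math


def bits_per_selection(n_targets, accuracy):
    """Information per selection, assuming errors are spread evenly over wrong targets."""
    bits = math.log2(n_targets)
    if 0.0 < accuracy < 1.0:
        bits += accuracy * math.log2(accuracy)
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_targets - 1))
    elif accuracy == 0.0:
        bits += math.log2(1.0 / (n_targets - 1))
    return bits


def itr_bpm(n_targets, accuracy, seconds_per_selection):
    """Information transfer rate in bits per minute."""
    return bits_per_selection(n_targets, accuracy) * 60.0 / seconds_per_selection


print(itr_bpm(32, 1.0, 2.0))   # 32 targets, perfect accuracy, 2 s per command
print(itr_bpm(32, 0.96, 2.0))  # accuracy and time close to the 1-second modes
```

With 32 targets and 2 s per command, perfect accuracy gives 150 bpm and an accuracy of 0.96 gives about 137 bpm. These figures are not expected to reproduce the reported medians exactly, since the median of per-participant rates need not equal the rate computed from median accuracy and median time.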
The shape of evoked potentials and topographic distribution of evoked activity
Fig. 2 shows the shape of code-modulated evoked potentials averaged relative to the first bit of the sequence and restored for the zero-shift m-sequence. As a quantitative characteristic, we used the correlation between the averaged evoked potential and the individual potentials corresponding to single presentations of the m-sequence. Fig. 3 shows the maps of maximum values of normalized cross-correlations between averaged evoked potentials and responses to single sequences.
According to the figure, the degree of similarity between evoked potentials reaches its maximum in occipital channels. All modes that allowed a high accuracy of command selection had the highest correlation degree at the Oz site. The maximum correlation was registered in 8 channels: P4, P6, PO3, POz, PO4, O1, O2, Oz. The absolute values of the cross-correlation maxima do not differ significantly between the modes when compared in corresponding channels. In the fast mode, localization of evoked potentials was less pronounced, which is probably one of the reasons behind the poorest results shown by the participants.
DISCUSSION
Research on CVEP-based BCI reveals a number of patterns that matter for the development of a high-quality medical neurocommunicator for a wide range of patients. First of all, there is the relationship between input rate, input accuracy and the overall information transfer rate characteristic of a specific modification of the interface. From the user's viewpoint, the main property of a BCI is the information transfer rate. The data obtained indicate that this type of interface is capable of a transfer rate several times greater than that of the traditional P300-based BCI, which makes the new interface a promising tool for clinical practice. In the first three modes, the information transfer rate was within the limits usual for interfaces of this type [8, 14]. However, one participant managed to reach a rate of 287 bpm in the fast mode, which proved impossible for the majority of other participants. This is an important fact: although unique in the sample, this result supports the development of a BCI whose parameters (the stimulation rate in particular) can be fine-tuned to the individual characteristics of the user in search of their optimal combination. Such an approach can help overcome the known problems associated with translating the results of laboratory research involving healthy participants into clinical practice [15]. Another problem it could help solve is so-called BCI illiteracy, i.e., the inability of patients to learn to operate a brain-computer interface [16]. There are different approaches to tackling these problems, including modification of the training stage [17] and individualization of interfaces. Fine-tuning the m-sequence carrier frequency and its period could help find the values that maximize the information transfer rate for each specific user. In fact, this is already routine practice when tuning P300-based BCI [18].
Unfortunately, modern displays, even those with high refresh rates, do not allow sufficient flexibility in adjusting the period of the stimulus sequence for CVEP BCI: at a fixed refresh rate, each bit must last a whole number of frames, so only a discrete set of periods is available. For example, in this study we were unable to present a sequence with a period of 0.8 seconds, which suggests that designing a special device for this purpose would be advisable. Several such attempts have already been made (based on evoked potentials) [19], but the specific implementations presented offer a low information transfer rate due to the small number of stimuli.
CONCLUSIONS
The results of this study suggest that optimizing BCI operation for a user requires fine-tuning the parameters to the individual characteristics of that user. We have shown that inverting the coding stimulus sequence does not affect the accuracy of command selection by BCI users, which means that the direct and inverted stimulation modes are equally applicable. At the same time, the faster mode of BCI operation, with the sequence period halved, proved to be suboptimal for the majority of participants. The significant individual differences in accuracy and information transfer rate revealed by this study suggest that a BCI can be optimized by fine-tuning it to the specifics of a given patient.