REVIEW
ECG-based biometric identification: some modern approaches
1 Laboratory of Medical Instrumentation Engineering, Moscow Institute of Physics and Technology (State University), Dolgoprudny, Moscow oblast, Russia
2 OOO Altomedika, Moscow, Russia
Correspondence should be addressed: Artem Astapov
Institutskiy per. 9, str. 7, Dolgoprudny, Moskovskaya oblast, Russia, 141700; ude.hcetsyhp@vopatsa.metra
In an era of pervasive technological infrastructure, security issues are particularly important. Growing industries, network integration and the rapid development of information technologies urge us to search for new identity-based means of data protection.
Applications often need either to identify a person, i. e. to match an unknown individual to a known identity from a database (a “one-to-many” comparison), or to verify an individual, i. e. to check whether he is the person he claims to be (a “one-to-one” comparison against a specific template). Such tasks arise everywhere, from computer systems to systems that grant access to closed or corporate facilities. The identification of family members in their daily life is also of particular interest.
Traditional password-based and key-based identification systems have a number of flaws. A password can be forgotten or elicited, and such systems are easy to hack. A physical identification key must always be carried around, which is inconvenient. An intruder may get hold of the password or the physical identifier. Besides, a person cannot be identified without the specific physical carrier. Together, these factors prompt us to look for new approaches to the problem.
Biometrics (life measurement in Greek) refers to a system of human identification based on one or more physiological or behavioral traits [1]. Various physiological or behavioral characteristics can serve as biometrics if they more or less satisfy the following criteria: universality; uniqueness; permanence; measurability; performance; acceptability; circumvention (ease of use of a substitute) [2].
Currently the following biometric characteristics are used: fingerprints, face, iris, hand geometry, voice, DNA, facial thermogram, signature, gait, labial form, etc. [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]. The advantages and disadvantages of these characteristics are related to the criteria listed above [13]. For example, it is almost impossible to make a mistake with a DNA-based identification or verification, and the samples can be used in forensics; however, this method requires special laboratory equipment. The same is true for fingerprints: though the reader can be quite miniature in comparison to a device used for a DNA-based detection, fingerprints can change with time or be affected by other factors. Thus, the use of various biometric identifiers is dependent on the goals, limitations and resources within a specific task.
Recently, researchers have turned their attention to a new type of biometric recognition based on the electrical activity of the heart, a human physiological trait. Specifically, the ECG is becoming an adequate means of medium-level protection in applications, since the signal is easy and cheap to acquire and difficult to fake or to obtain without consent. It is worth noting that the uniqueness of the ECG results from a combination of physiological factors such as heart anatomy, weight, gender, chest size, age, health, etc. Consequently, the electrical activity of the heart changes with age or disease, and it is not reasonable to use the ECG as a long-term biometric parameter. For example, Bionym, a Canadian company, has announced the development of the Nymi band, an electronic device that will record a user’s ECG every day, verify the user and grant access to certain infrastructure objects (a mobile phone, a computer, a hotel room, a car, etc.). For identification purposes, the ECG is most likely to be used in work with databases, as the advancement of telemedicine technologies allows huge data arrays to be stored, including patients’ ECG records. If an operator or a doctor enters patient data incorrectly (mistypes a family name, date of birth, etc.), identifying such records can contribute to a more accurate observation of the course of a disease.
Another possible field of application involves a small fixed number of users of a particular ECG recorder: for example, in medical institutions patients will, for the sake of convenience, only need to record an ECG, and the identification system will determine whose record it is. Identification will also make it easier to use ECG recorders at home; gadgets in the form of mobile telephone cases have already appeared on the market. They can record a patient’s heart electrical activity and send it to a doctor via the Internet.
The main principles of building a biometric identification system and different approaches to the ECG-based human identification are reviewed below. The diversity of mathematical tools is described. The results of basic research works are presented.
ECG signal formation
An electrocardiogram (ECG) is a time curve of the total electrical potential arising in the heart muscle due to the flow of ions through the muscle cell membrane [3]. ECG recording is one of the most common tools used for the diagnostics of cardiovascular disorders due to its high informative value and accuracy.
In cardiology, the ECG is often measured in several leads. Each lead uses electrodes to capture the potential difference between two points of the heart's electric field and reflects the condition of a certain region of the heart muscle.
Basic principles of building an ECG-based human identification system
The identification process includes the following stages:
- initial data collection;
- signal pre-processing (filtration etc.);
- extraction of typical features, their processing and template creation;
- comparison of a submitted template with previously enrolled templates in a database.
After that, an identification decision is made using various classification algorithms.
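The pre-processing stage is not specified in detail here. As one hypothetical illustration of what "filtration" may involve, the sketch below removes slow baseline drift by subtracting a centred moving average. The function name and the window length are assumptions made for this example, not a method taken from the reviewed studies.

```python
def remove_baseline(signal, window):
    """Subtract a centred moving average, which acts as a crude high-pass filter.

    signal: list of samples; window: averaging window length in samples
    (an assumed, tunable parameter).
    """
    half = window // 2
    filtered = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        baseline = sum(signal[lo:hi]) / (hi - lo)  # local mean around sample i
        filtered.append(signal[i] - baseline)
    return filtered
```

A constant offset is removed completely by this filter, while narrow features such as the QRS complex are largely preserved when the window is much wider than the complex.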
The most difficult task with identification is to select those features that are truly characteristic of a studied object. This particular area offers broad opportunities for experimenting with various approaches. The main idea is that a plurality of such features (descriptors) forms a vector that can be compared to other vectors using various mathematical methods.
There are approaches based on the extraction of such features as amplitudes, angles, vertical and horizontal constituents of ECG signal segments [15, 16].
Another approach is related to the extraction of analytical properties presented by signal decomposition coefficients in various bases, such as Fourier coefficients [17], wavelets, linear prediction coefficients [18], etc.
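To make the idea of analytical features concrete, here is a minimal sketch that turns one heartbeat into a feature vector of Fourier coefficient magnitudes. The function name, the normalisation by the beat length and the choice of how many coefficients to keep are assumptions for illustration only.

```python
import cmath
import math


def dft_magnitudes(beat, n_coeffs):
    """Return the magnitudes of the first n_coeffs DFT coefficients of a beat."""
    n = len(beat)
    features = []
    for k in range(n_coeffs):
        # Direct evaluation of the discrete Fourier transform at bin k.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(beat))
        features.append(abs(coeff) / n)
    return features
```

The resulting list can be used directly as the descriptor vector that is later compared with enrolled templates.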
At the classification stage of the identification process, standard methods are used. The simplest is the “nearest centroid” method, which assigns a new input feature vector to the class whose centre is closest to it. Another common approach is the “k-nearest neighbours” algorithm, which assigns an object to the most common class among its k nearest neighbours. Support vector machines and neural networks are also used for recognition [19].
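The two simplest classifiers mentioned above can be sketched in a few lines. This is a minimal illustration that assumes Euclidean distance and a training set given as (feature vector, label) pairs; the function names are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_centroid(train, query):
    """Label the query with the class whose mean vector is closest."""
    sums, counts = {}, Counter()
    for vec, label in train:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] += 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
    return min(centroids, key=lambda lab: euclidean(centroids[lab], query))


def k_nearest_neighbours(train, query, k=3):
    """Label the query with the most common class among its k closest samples."""
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

In an identification setting each label would be an enrolled person and each vector a template extracted from one recording.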
Comparison and results
One of the first scientific works that demonstrated the possibility of using the ECG for identification purposes was an article by Lena Biel et al. [15]. In experiments with 20 healthy subjects it was shown that 1 lead instead of the 12 standard leads is sufficient for reliable ECG-based identification.
As a basis for the ECG signal analysis, 30 signal features characterizing its form were chosen. These features are normally used for medical diagnosis. Their correlation with each other was analyzed, which helped to reduce the total number of features and to select those most specific for each individual. A set (vector) of 8 features (variables) characterizing (classifying) each individual was considered the most successful combination (fig. 1). To account for the variability of feature changes, the sample data were obtained from each participant at different times.
For identification, the so-called SIMCA method (Soft Independent Modeling of Class Analogy) was applied. It is widely used in chemometrics for spectroscopic data classification and can handle a large number of features [20]. Classification and identification tasks often overlap if each individual to be identified is treated as a class.
The first step in SIMCA is the more common PCA (Principal Component Analysis), which is in essence a mathematical tool for reducing data dimensionality, i. e. for data compression [21]. Transforming a large number of variables to a new representation of considerably lower dimension can simplify the data substantially, for example by reducing 1000 variables to 100, with little loss of information and without any of the original variables being ignored. At the same time, the data irrelevant to the analysis are detected and removed as noise. Once found, the principal components give an indication of the hidden variables controlling the data structure. Thus, the ECG feature space distinguishing an individual is projected onto the principal component directions, which in that particular work formed a plane where each point is related to an individual, or a class in mathematical terms. Classification can be performed in this space.
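The projection onto principal components described above can be sketched with a singular value decomposition. This is a generic PCA illustration using NumPy, not the specific implementation of the cited work; the function names are assumptions.

```python
import numpy as np


def pca_fit(X, n_components):
    """Fit PCA to X of shape (n_samples, n_features).

    Returns the feature means, the principal directions (one per row) and
    the share of total variance explained by each kept component.
    """
    mean = X.mean(axis=0)
    centred = X - mean
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, singular_values, Vt = np.linalg.svd(centred, full_matrices=False)
    variances = singular_values ** 2 / (X.shape[0] - 1)
    ratio = variances / variances.sum()
    return mean, Vt[:n_components], ratio[:n_components]


def pca_transform(X, mean, components):
    """Project samples onto the kept principal directions."""
    return (X - mean) @ components.T
```

With two components kept, every feature vector is reduced to a point on a plane, which matches the two-dimensional representation described for that study.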
After building a PCA decomposition, SIMCA is used to calculate the distance between classes as well as the distance from each class to a new object. Two values are used as such metrics. The distance between an object and a class is calculated as the root mean squared residual that remains after projecting the object onto the class model. The other value defines the distance from an object to the class centre and is calculated as the leverage (squared Mahalanobis distance). In this space a classification rule is set up and identification becomes possible.
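The two SIMCA distances can be illustrated for a single class as follows. This is a simplified sketch under stated assumptions: the class model is a plain PCA fitted with NumPy, the acceptance thresholds used in practice are omitted, and all names are hypothetical.

```python
import numpy as np


def fit_class_model(X, n_components):
    """Fit a PCA model to the training vectors of one class (one person)."""
    mean = X.mean(axis=0)
    _, singular_values, Vt = np.linalg.svd(X - mean, full_matrices=False)
    loadings = Vt[:n_components]
    # Variance of the training scores along each kept component.
    score_var = singular_values[:n_components] ** 2 / (X.shape[0] - 1)
    return mean, loadings, score_var


def simca_distances(x, model):
    """Return (root mean squared residual, squared Mahalanobis distance)."""
    mean, loadings, score_var = model
    centred = x - mean
    scores = centred @ loadings.T            # coordinates inside the class model
    residual = centred - scores @ loadings   # part not explained by the model
    rms_residual = np.sqrt(np.mean(residual ** 2))
    mahalanobis_sq = np.sum(scores ** 2 / score_var)
    return rms_residual, mahalanobis_sq
```

A new object would be attributed to a class only if both distances are small for that class model; how small is a matter of thresholds that this sketch leaves out.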
In Biel’s work, the results of human ECG-based identification depended on the number of ECG features selected for the research. On average, the scientists accomplished 49 correct identifications out of 50.
Another study was performed by Steven A. Israel et al. [16]. They established that the psychological state of the subjects did not affect the outcome of the identification process. Interestingly, the authors used LDA (Linear Discriminant Analysis) as a method for reducing the space of the studied parameters. The efficiency of LDA compared to Principal Component Analysis, as well as the combination of both, was studied by Y. Wang et al. [22].
In his work Y. Wang also used coordinate ECG parameters (amplitudes, angles and distances) as a basis for classification. However, the alignment of each complex by the R-peak was this work’s distinctive feature (fig. 2).
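Alignment by the R-peak can be sketched as cropping every beat to a window centred on its largest sample. Treating the maximum sample as the R peak is a simplifying assumption made here; practical systems use dedicated QRS detectors. The function name and the window parameter are hypothetical.

```python
def align_on_r_peak(beats, half_width):
    """Crop each beat to 2 * half_width + 1 samples centred on its maximum."""
    aligned = []
    for beat in beats:
        r = max(range(len(beat)), key=lambda i: beat[i])  # assumed R-peak index
        if r - half_width < 0 or r + half_width >= len(beat):
            continue  # skip beats whose window would fall outside the record
        aligned.append(beat[r - half_width: r + half_width + 1])
    return aligned


beats = [[0, 0, 1, 5, 1, 0, 0, 0], [0, 0, 0, 1, 5, 1, 0, 0]]
print(align_on_r_peak(beats, 2))  # [[0, 1, 5, 1, 0], [0, 1, 5, 1, 0]]
```

After alignment, amplitudes and distances measured at the same sample offsets become comparable between beats.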
LDA is one of the oldest statistical methods [23] and is used for finding linear combinations of features that best discriminate two or more classes of objects. Like PCA, it is commonly used for dimensionality reduction, and it can also serve as a classifier. Initially, Y. Wang investigated which of the following two simple classification methods was more efficient when combined with an algorithm for reducing the number of ECG signal properties: the k-nearest neighbours method (the class of a classified object is the most common class among its k nearest neighbours) and the nearest centroid method (the closer a classified object is to the “gravity centre” of a group of objects belonging to a known class, the higher the probability that it belongs to this class). It was shown that the best result can be achieved by using Principal Component Analysis with the k-nearest neighbours method. Using a hierarchical combination of LDA and PCA, Y. Wang achieved a 98.9 % accuracy in recognition. Thirteen volunteers participated in the experiments; identification was performed more than once at different times and under different conditions.
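For two classes, the linear combination sought by LDA has a closed form known as Fisher's discriminant. The sketch below illustrates only this two-class case with NumPy; it is not the hierarchical LDA and PCA scheme of the cited study, and the function name is an assumption.

```python
import numpy as np


def fisher_direction(X0, X1):
    """Return the unit vector that best separates two classes.

    X0, X1: arrays of shape (n_samples, n_features), one per class.
    """
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: spread of each class around its own mean.
    scatter = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(scatter, m1 - m0)
    return w / np.linalg.norm(w)
```

Projecting feature vectors onto this direction reduces them to a single number on which the two classes overlap as little as possible.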
Methods based on the extraction of analytical properties and neural network classification present a particular interest. In 2010 a study was conducted by a group of scientists led by Justin Leo Cheang Loong [18]. ECGs with one chest lead were recorded in 15 subjects. Two bases were chosen as algorithms for the analytical ECG signal representation and for comparison of their performance with each other, namely wavelets and coefficients of linear prediction.
The wavelet packet decomposition (WPD) algorithm is based on the wavelet, a term introduced by A. Grossman and J. Morlet in the mid-1980s in the context of feature analysis of seismic and acoustic signals [24]. Algorithms based on the wavelet transform are also used for electrocardiogram analysis. A wavelet transform is a tool that splits data into different frequency components, each of which is then studied with the required resolution. Thus, a wavelet transform is a tool for time-frequency localization of signal features. Among the advantages of WPD are a high decomposition rate, universality and the possibility to alter the decomposition level. However, this method cannot be automated: for the best decomposition it is necessary to analyze several WPD levels manually. Another drawback is related to the core of wavelet analysis, namely the necessity to choose a basis wavelet depending on the character of the initial time series.
J. L. Ch. Loong et al. subjected a signal to a 5-level wavelet packet decomposition using the db2 (Daubechies wavelet) and obtained overall 50 parameters that were used as a feature set for identification.
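A wavelet packet decomposition with the db2 wavelet can be sketched in pure Python as repeated filtering and downsampling of every sub-band. The sketch assumes periodic signal extension and uses sub-band energies as features; both choices, like the function names, are assumptions, and the review does not state how the 50 parameters of the cited study were formed.

```python
import math

SQ3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
# db2 (4-tap Daubechies) low-pass filter and its matching high-pass filter.
LOW = [(1 + SQ3) / NORM, (3 + SQ3) / NORM, (3 - SQ3) / NORM, (1 - SQ3) / NORM]
HIGH = [(-1) ** k * LOW[3 - k] for k in range(4)]


def analyse(signal, taps):
    """Periodic filtering with a 4-tap filter followed by downsampling by 2."""
    n = len(signal)
    return [sum(taps[k] * signal[(2 * i + k) % n] for k in range(4))
            for i in range(n // 2)]


def wavelet_packet_energies(signal, levels):
    """Split every node into low and high bands `levels` times.

    The signal length must be divisible by 2 ** levels. Returns the energy
    of each of the 2 ** levels terminal sub-bands.
    """
    nodes = [list(signal)]
    for _ in range(levels):
        nodes = [band for node in nodes
                 for band in (analyse(node, LOW), analyse(node, HIGH))]
    return [sum(v * v for v in node) for node in nodes]
```

Because the db2 filter pair is orthonormal, the sub-band energies sum to the energy of the original signal, so they describe how that energy is distributed over the frequency bands.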
Linear Predictive Coding (LPC) is normally used to model human speech with a small set of parameters, which are transmitted instead of samples or sample differences that would require a larger bandwidth [14]. LPC algorithms are traditionally used for studying the vocal tract signal, i. e. for the analysis, recognition and processing of human speech. LPC coefficients allow signal values to be predicted as a linear function of previous samples. For ECG-based identification, the signal was processed using LPC algorithms. The first 40 points of the LPC spectrum were taken as a feature set for further research. Fig. 3 shows ECG LPC spectrum differences in 4 subjects.
An artificial neural network (ANN) trained with the error back propagation algorithm was used as a classifier. The idea of the ANN originated from an attempt to describe the processes of information perception in the human brain. Like the human brain, the ANN consists of neurons, multiple elements that are connected to each other and imitate brain neurons. A basic structure of this network is shown in fig. 4.
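Linear prediction can be illustrated with the autocorrelation method and the Levinson-Durbin recursion, followed by evaluation of the spectrum of the resulting all-pole model. This is a generic sketch: the model order, the frequency grid and the function names are assumptions and are not taken from the cited study.

```python
import cmath
import math


def autocorrelation(x, max_lag):
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(max_lag + 1)]


def lpc(x, order):
    """Return (a, error) with x[n] predicted as sum(a[k-1] * x[n-k])."""
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)   # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for the step from order i - 1 to order i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error


def lpc_spectrum(coeffs, n_points):
    """Magnitude response of the all-pole model on n_points in [0, pi)."""
    spectrum = []
    for m in range(n_points):
        omega = math.pi * m / n_points
        denom = 1 - sum(c * cmath.exp(-1j * omega * (k + 1))
                        for k, c in enumerate(coeffs))
        spectrum.append(1.0 / abs(denom))
    return spectrum
```

Taking the first points of such a spectrum yields a fixed-length feature vector for each recording.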
Each neuron in a neural network transforms input signals into output signals and is connected to other neurons. Input neurons form the so-called neural network interface. Data input to a neural network is performed through the input layer that receives signals. All neural network layers process signals until they reach the output layer that generates output signals.
The task of the ANN is to transform data in a required way. For that, the network needs to be trained. During training, ideal (reference) input-output pairs, or “teachers”, are used. The “teacher” evaluates the behaviour of the neural network. An untrained neural network cannot reproduce the anticipated behaviour, so a training algorithm modifies the connection weights of individual neurons in such a way that the behaviour of the network matches the expected performance. The main idea of the method applied in that study is that error signals propagate backwards from the network output to its input, whereas in the standard operation mode the signals propagate from the input to the output. The LPC algorithm showed better results than the WPD method, with recognition accuracies of 99.5 % and 91.5 %, respectively [18].
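The backward propagation of error signals can be illustrated with a very small network. The sketch assumes one hidden layer, sigmoid activations, a squared-error loss and sample-by-sample weight updates; the layer sizes, learning rate and function names are assumptions and do not reproduce the network of the cited study.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Propagate an input forwards; every weight row ends with a bias term."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
              for row in w_hidden]
    output = sigmoid(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
    return hidden, output


def train(samples, n_hidden=3, epochs=2000, rate=0.5, seed=0):
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, target in samples:
            hidden, output = forward(x, w_hidden, w_out)
            # Error signal at the output, then propagated back to the hidden layer.
            delta_out = (output - target) * output * (1 - output)
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
                            for j in range(n_hidden)]
            for j, h in enumerate(hidden + [1.0]):
                w_out[j] -= rate * delta_out * h
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    w_hidden[j][i] -= rate * delta_hidden[j] * v
    return w_hidden, w_out


def mean_squared_error(samples, w_hidden, w_out):
    return sum((forward(x, w_hidden, w_out)[1] - t) ** 2
               for x, t in samples) / len(samples)
```

Here the reference input-output pairs play the role of the “teacher”: each weight is moved in the direction that reduces the difference between the network output and the reference value.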
CONCLUSIONS
The possibilities of human ECG-based identification have not been sufficiently studied so far. However, research in this area is being actively pursued. The increasing number of such experiments all over the world gives us reason to consider the electrical activity of the heart a very promising research object. Different approaches to extracting individual ECG parameters receive the most attention; uniform standards and effective methods are yet to be developed. Nevertheless, the encouraging results obtained in previous studies hold promise for future development in this area.