ORIGINAL RESEARCH

Labelling of data on fundus color pictures used to train a deep learning model enhances its macular pathology recognition capabilities

Takhchidi KhP¹, Gliznitsa PV², Svetozarskiy SN³, Bursov AI⁴, Shusterzon KA⁵

1 Pirogov Russian National Research Medical University, Moscow, Russia

2 OOO Innovatsioonniye Tekhnologii (Innovative Technologies, LLC), Nizhny Novgorod, Russia

3 Volga District Medical Center under the Federal Medical-Biological Agency, Nizhny Novgorod, Russia

4 Ivannikov Institute for System Programming of RAS, Moscow, Russia

5 L.A. Melentiev Energy Systems Institute, Irkutsk, Russia

Correspondence should be addressed to: Pavel V. Gliznitsa
Belinskogo St., 58/60, floor 5, 603000, Nizhny Novgorod; gliznitsap@icloud.com


Funding: this work was financially supported by the Foundation for Assistance to Small Innovative Enterprises in Science and Technology (contract №150ГС1ЦТНТИС5/64226 dated December 22, 2020)

Author contribution: Takhchidi KhP: manuscript editing; Gliznitsa PV: study concept and design, data collection and processing, analysis of results, manuscript writing; Svetozarskiy SN: participation in data collection, analysis of the literature and results, manuscript writing; Bursov AI: literature analysis, algorithm development, manuscript editing; Shusterzon KA: algorithm development and validation, preparation of illustrations, manuscript writing.

Received: 2021-07-27 Accepted: 2021-08-15 Published online: 2021-08-28

In the Russian Federation, retinal diseases are the second leading cause of visual impairment, accounting for 28.9% of cases [1]. An effective system for the early detection of retinal pathology as part of mass preventive examination campaigns has yet to be deployed. Such systems require special logistics and dedicated staff, which, in addition to one-time deployment costs, entails regular funding to maintain the system and pay its personnel. Computers can analyze large volumes of data faster than people, and machine learning algorithms can automate the time-consuming, labor-intensive screening of patients and select those who need a comprehensive examination. Thus, artificial intelligence capable of screening for eye diseases can mitigate the shortage of primary health care personnel and reduce the cost of clinical examinations while increasing the number of patients appropriately referred to an ophthalmologist for suspected ophthalmic pathology [2].

Age-related macular degeneration (AMD), a retinal disease common among people aged 50 and over, remains one of the main causes of poor vision. Its manifestations include soft drusen measuring 63 μm or more in the macular zone, hyperpigmentation and/or hypopigmentation of the retinal pigment epithelium, detachment of the pigment epithelium and the neuroepithelium, geographic atrophy of the pigment epithelium, retinal hemorrhages and cicatricial changes in the retina [3].

AMD is of great clinical and social importance. The prevalence of AMD among people aged 50 to 85 years is 8.69%, with early AMD accounting for 8.01% and late AMD for 0.37% [4]. Mathematical modeling predicts that the absolute number of AMD patients will grow from 196 million in 2020 to 288 million in 2040 [4]. Late AMD causes a pronounced loss of central vision, which worsens quality of life, limits activities of daily living and impairs working capacity. Timely detection of the disease and adequate monitoring of patients are instrumental to the successful treatment of neovascular AMD, because the efficacy of antiangiogenic therapy depends directly on the time elapsed between disease manifestation and administration of the first dose of the drug [5]. Fundus photography is a widely adopted and highly sensitive method of visualizing macular pathology; it has been used for mass screening in a number of countries and has significantly increased early AMD detection rates [6].

The objective of this work was to develop and validate machine learning algorithms that diagnose macular pathology (AMD) from color fundus photographs, trained on either labeled or unlabeled data, and to assess the sensitivity and specificity of the developed methods on a test dataset.

METHODS

The color fundus images used in this study were collected at the Tsentr Zreniya clinic (Chelyabinsk) and the ophthalmology department of the Volga District Medical Center under FMBA of Russia (Nizhny Novgorod). All images were taken with a Visucam 500 fundus camera (Carl Zeiss; USA). The inclusion criteria were: AMD in one eye diagnosed and recorded in the patient's electronic medical record; specific signs of AMD visible on the image; and no signs of other retinal diseases (diabetic retinopathy, etc.). Image quality was graded on a 4-point scale following the method of Klais C et al. [7]: 1 point for high quality, 2 points for average quality, 3 points for low quality and 4 points for ungradable images. Images scoring 3 or 4 points were excluded. We used the widely adopted clinical classification of AMD, which distinguishes early, intermediate and late stages of the disease (tab. 1) [8]. The initial image set was anonymized and graded independently, in a masked fashion, by two ophthalmologists with more than 5 years of experience.
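The quality criterion can be expressed as a simple filter over graded images. A minimal sketch, assuming each image record carries a hypothetical `quality` field with its 1–4 grade; the `FundusImage` record and function names are illustrative, not the authors' code.

```python
from dataclasses import dataclass

# Grades of the 4-point quality scale described above (after Klais et al.).
QUALITY_LABELS = {1: "high", 2: "average", 3: "low", 4: "ungradable"}
MAX_ACCEPTED_GRADE = 2  # images graded 3 or 4 are excluded


@dataclass
class FundusImage:
    # Hypothetical record; the field names are assumptions for illustration.
    path: str
    quality: int  # grade on the 1-4 scale


def is_acceptable(image: FundusImage) -> bool:
    """Return True if the image passes the quality criterion."""
    if image.quality not in QUALITY_LABELS:
        raise ValueError(f"unknown quality grade: {image.quality}")
    return image.quality <= MAX_ACCEPTED_GRADE


def filter_by_quality(images):
    """Keep only images of high (1) or average (2) quality."""
    return [img for img in images if is_acceptable(img)]


if __name__ == "__main__":
    sample = [
        FundusImage("a.jpg", 1),
        FundusImage("b.jpg", 2),
        FundusImage("c.jpg", 3),
        FundusImage("d.jpg", 4),
    ]
    kept = filter_by_quality(sample)
    print([img.path for img in kept])  # ['a.jpg', 'b.jpg']
```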

The resulting set comprised 1200 color fundus photographs: 575 retinal images of AMD patients and 625 retinal images of healthy individuals. According to the AMD classification, 127 images were assigned to the early AMD group, 341 to the intermediate stage and 107 to the late stage.

The data were randomly divided into training and test sets: 994 images were used to train the neural network (475 eyes with AMD and 519 eyes of healthy people), and 206 images were used for testing (100 from patients with AMD and 106 from healthy people).
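The reported counts are internally consistent: 127 + 341 + 107 = 575 AMD images, 575 + 625 = 1200 in total, 994 + 206 = 1200, 475 + 100 = 575 and 519 + 106 = 625. A split that reproduces these class counts can be sketched as below. Drawing the test images separately within each class is an assumption (the text states only that the split was random), and the seed and image identifiers are hypothetical.

```python
import random

# Class composition reported for the final dataset.
AMD_BY_STAGE = {"early": 127, "intermediate": 341, "late": 107}
N_AMD = sum(AMD_BY_STAGE.values())  # 575
N_HEALTHY = 625

# Reported test-set size per class; the remaining images go to training.
TEST_SIZE = {"amd": 100, "healthy": 106}


def split_per_class(ids_by_class, test_size, seed=0):
    """Randomly split image ids into train/test sets within each class.

    ids_by_class: dict class -> list of image ids (hypothetical identifiers)
    test_size:    dict class -> number of test images for that class
    """
    rng = random.Random(seed)  # fixed seed for reproducibility (assumption)
    train, test = {}, {}
    for cls, ids in ids_by_class.items():
        shuffled = list(ids)
        rng.shuffle(shuffled)
        k = test_size[cls]
        test[cls] = shuffled[:k]
        train[cls] = shuffled[k:]
    return train, test


if __name__ == "__main__":
    ids = {
        "amd": [f"amd_{i}" for i in range(N_AMD)],
        "healthy": [f"norm_{i}" for i in range(N_HEALTHY)],
    }
    train, test = split_per_class(ids, TEST_SIZE)
    print({c: len(v) for c, v in train.items()})  # {'amd': 475, 'healthy': 519}
    print({c: len(v) for c, v in test.items()})   # {'amd': 100, 'healthy': 106}
```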

To accomplish the task, we used two training approaches:

  • training a convolutional neural network (CNN) on a dataset of binary-labeled images without specified regions of interest;
  • training a CNN on a dataset of binary-labeled images with the regions of interest marked by bounding boxes.
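The two approaches differ only in what each training example carries: a label for the whole image, or a label attached to a box around the macula. A minimal sketch, assuming boxes are stored as (x1, y1, x2, y2) pixel corners (the convention torchvision detection models use) and that images are resized to 512 × 512 without preserving aspect ratio; the record types, names and example coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

TARGET_SIZE = 512  # images are resized to 512 x 512


@dataclass
class ImageLevelExample:
    """Approach 1: binary label for the whole image, no region of interest."""
    path: str
    label: int  # 1 = AMD, 0 = healthy


@dataclass
class BoxedExample:
    """Approach 2: binary label attached to a bounding box around the macula."""
    path: str
    label: int
    box: Box


def rescale_box(box: Box, orig_w: int, orig_h: int, size: int = TARGET_SIZE) -> Box:
    """Map a box from original image coordinates to the resized size x size image.

    Assumes a plain (non aspect-preserving) resize, so x and y scale independently.
    """
    x1, y1, x2, y2 = box
    return (x1 * size / orig_w, y1 * size / orig_h,
            x2 * size / orig_w, y2 * size / orig_h)


if __name__ == "__main__":
    # Hypothetical 2000 x 1600 photograph with a macular box drawn by a grader.
    ex = BoxedExample("img_001.jpg", 1, (800.0, 600.0, 1200.0, 1000.0))
    print(rescale_box(ex.box, 2000, 1600))  # (204.8, 192.0, 307.2, 320.0)
```

Storing the box in original-image coordinates and rescaling at load time keeps the annotation independent of the chosen input resolution.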

Both approaches relied on the ResNet-50 deep learning architecture and transfer learning [9]. Transfer learning uses CNNs pretrained on a large third-party dataset: after pretraining, the network, whose weights are already set up, is further trained on a small dataset of immediate interest. In this work, the large third-party dataset used for pretraining was ImageNet, which includes more than a million images divided into 1000 classes [10].

Fundus images from the local databases were preprocessed (resized to 512 × 512 pixels) and then processed by a pretrained Faster RCNN network with a ResNet-50 convolutional backbone. Each output bounding box was assigned a class label and a softmax score in [0, 1]; a score threshold of 0.7 was used to display the detections. Obtaining these results took 120 ms per image, all steps included. Overall, the image analysis pipeline is as follows: preprocessing; processing by the CNN, which outputs a feature map; generation of region proposals on that map; determination of the regions of interest; and classification of the image as either AMD or normal based on the features found within the regions of interest (fig. 1).
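The scoring and thresholding step can be illustrated as follows: per-box class logits are turned into softmax scores, boxes whose top score is below 0.7 are discarded, and the image is called AMD if any remaining box has the AMD class. The two-class layout (index 0 normal, index 1 AMD) and the "any kept box" aggregation rule are assumptions for illustration; the text does not specify how box-level outputs were combined into an image-level label.

```python
import math
from typing import List, Sequence

SCORE_THRESHOLD = 0.7
CLASS_NAMES = ("normal", "amd")  # assumed class order


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: scores in [0, 1] that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_image(box_logits: Sequence[Sequence[float]]) -> str:
    """Label an image from per-box class logits (assumed aggregation rule).

    A box is kept if its top softmax score reaches SCORE_THRESHOLD;
    the image is called 'amd' if any kept box is assigned the AMD class.
    """
    for logits in box_logits:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= SCORE_THRESHOLD and CLASS_NAMES[best] == "amd":
            return "amd"
    return "normal"


if __name__ == "__main__":
    # Hypothetical logits for detected boxes.
    print(classify_image([[2.0, 0.5], [0.1, 2.5]]))  # amd
    print(classify_image([[0.2, 0.6]]))              # normal
```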

All algorithms were developed in Python 3.7 using PyTorch 1.5.0, TorchVision 0.6.0, TensorFlow 1.14.0, Keras 2.0.8, Pillow 7.2, OpenCV 4.5.2, CUDA 10.1 and cuDNN 7.6.5. Computations were performed on a computer with an Intel Core i7-9750H processor (Intel; USA), an NVIDIA GeForce RTX 2070 Max-Q GPU (8 GB GDDR6) and 16 GB of 2666 MHz RAM.

RESULTS

Image classification by a CNN without specified regions of interest

All color fundus images in the training set were resized to 512 × 512 pixels and normalized to the mean pixel value, and the dataset was then submitted to the neural network for training. Training took 193 min and 50 iterations, with a batch size of 10 images. The optimizer was Nesterov accelerated gradient with a learning rate of 0.0005 and momentum of 0.9. The loss function was categorical cross-entropy, and the metric was accuracy.
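The optimizer settings can be made concrete with one common formulation of Nesterov momentum, the one implemented by PyTorch's `torch.optim.SGD(..., nesterov=True)`: the momentum buffer is updated as v ← μv + g and the parameters as θ ← θ − η(g + μv), with any weight decay folded into g. The sketch below applies it with η = 0.0005 and μ = 0.9 to a toy one-dimensional quadratic; the toy objective is purely illustrative, and the text does not state which framework's formulation the authors used.

```python
from typing import Callable


def nesterov_sgd(grad_fn: Callable[[float], float], theta: float, lr: float,
                 momentum: float, steps: int, weight_decay: float = 0.0) -> float:
    """Minimize a 1-D function with SGD + Nesterov momentum (PyTorch-style update)."""
    velocity = 0.0
    for _ in range(steps):
        g = grad_fn(theta) + weight_decay * theta  # L2 penalty folded into gradient
        velocity = momentum * velocity + g          # momentum buffer
        theta -= lr * (g + momentum * velocity)     # Nesterov look-ahead step
    return theta


def quadratic_grad(t: float) -> float:
    """Gradient of the toy objective f(t) = (t - 3)^2."""
    return 2.0 * (t - 3.0)


if __name__ == "__main__":
    result = nesterov_sgd(quadratic_grad, theta=0.0, lr=0.0005, momentum=0.9, steps=20000)
    print(round(result, 4))  # converges to the minimum at 3.0
```

With a small learning rate, momentum of 0.9 effectively enlarges the step by roughly 1 / (1 − μ) = 10 while the look-ahead term damps oscillation.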

Validation of the resulting model on the test dataset showed a specificity of 77.4%, a sensitivity of 80.9% and an accuracy of 79% (tab. 2). To find out which image regions the model relied on for classification, we generated class activation heatmaps (fig. 2).
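For networks that end in global average pooling followed by a fully connected layer, as ResNet-50 does, a class activation map is commonly computed as a weighted sum of the last convolutional feature maps, M_c(x, y) = Σ_k w_k^c f_k(x, y), where w_k^c is the fully connected weight linking channel k to class c. The text does not state which heatmap method was used; the sketch below, with tiny made-up feature maps, only illustrates this original CAM formulation and locates the peak of the map, which is where an off-target focus such as the optic disc would show up.

```python
from typing import List, Sequence, Tuple

FeatureMap = List[List[float]]  # H x W activations of one channel


def class_activation_map(features: Sequence[FeatureMap],
                         class_weights: Sequence[float]) -> FeatureMap:
    """CAM: weighted sum over channels k of w_k * f_k(x, y)."""
    h, w = len(features[0]), len(features[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, wk in zip(features, class_weights):
        for y in range(h):
            for x in range(w):
                cam[y][x] += wk * fmap[y][x]
    return cam


def peak_location(cam: FeatureMap) -> Tuple[int, int]:
    """Return (row, col) of the maximum activation."""
    cells = ((y, x) for y in range(len(cam)) for x in range(len(cam[0])))
    return max(cells, key=lambda yx: cam[yx[0]][yx[1]])


if __name__ == "__main__":
    # Two hypothetical 3 x 3 channels and their class weights.
    f1 = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    f2 = [[0, 0, 2], [0, 0, 0], [0, 0, 0]]
    cam = class_activation_map([f1, f2], [0.5, 1.0])
    print(peak_location(cam))  # (0, 2)
```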

This revealed that the network focused on the wrong areas: one was the optic nerve head, which is not involved in the pathological process of AMD, and another was the paramacular area. Thus, during training the neural network learned features that are not specific to AMD, even though they correlate with the classification result.

Image classification by a CNN with regions of interest pre-specified

The training dataset was the same as in the first approach, but in this case we marked the macular region as the region of interest with bounding boxes. All images were resized to 512 × 512 pixels and normalized to the mean pixel value. Object detection was performed by a Faster RCNN network combined with a Feature Pyramid Network (FPN) [11]. Training took 158 min and 10 iterations, with a batch size of 10 images. The optimizer was Nesterov accelerated gradient with a learning rate of 0.0001, momentum of 0.05 and weight decay of 0.0005. Categorical cross-entropy was used as the classification loss function; mean average accuracy served as the classification accuracy metric and intersection over union (IoU) as the detection accuracy metric. Training was stopped after 10 iterations because overfitting began to emerge [12].
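Intersection over union, the detection metric named above, compares a predicted box with the grader's box as the area of their overlap divided by the area of their union. A minimal sketch for axis-aligned boxes, assuming the (x1, y1, x2, y2) pixel-corner convention; the example coordinates are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def box_area(b: Box) -> float:
    """Area of a box; zero if the box is empty or inverted."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes, in [0, 1]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = box_area((ix1, iy1, ix2, iy2))  # zero if the boxes do not overlap
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (200.0, 190.0, 310.0, 320.0)
    annotated = (205.0, 192.0, 307.0, 320.0)
    print(round(iou(predicted, annotated), 3))  # 0.913
```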

On the test dataset, the model achieved a classification accuracy of 96.6% with a sensitivity of 99.0% and a specificity of 94.3% (tab. 2). Visualization of the regions of interest showed that the model correctly identified the informative areas of the images (fig. 3).
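Sensitivity, specificity and accuracy follow from the confusion matrix: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), accuracy = (TP + TN) / N. As a consistency check, assuming the 100 AMD and 106 healthy test images described in Methods, the percentages reported for the second model correspond to 99 of 100 AMD images and 100 of 106 healthy images classified correctly; these counts are back-calculated here, not taken from tab. 2.

```python
from typing import Dict


def screening_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of AMD images detected
        "specificity": tn / (tn + fp),  # share of healthy images correctly cleared
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


if __name__ == "__main__":
    # Back-calculated counts (assumption): 99/100 AMD and 100/106 healthy correct.
    m = screening_metrics(tp=99, fn=1, tn=100, fp=6)
    print({k: round(100 * v, 1) for k, v in m.items()})
    # {'sensitivity': 99.0, 'specificity': 94.3, 'accuracy': 96.6}
```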

DISCUSSION

This study showed that a Faster RCNN network with a ResNet-50 convolutional backbone can effectively distinguish fundus photographs of AMD patients from those of healthy retinas. We have also established that even with a small sample (1200 images), classification accuracy can be high if the data are pre-labeled.

Researchers applying neural networks to the diagnosis of AMD from color fundus photographs have reported a sensitivity of 84.5–89.0%, a specificity of 83.1–89.0% and an accuracy of 88.4–91.6% [13, 14]. One study aimed at detecting early AMD in fundus images reported a sensitivity of 96.7% and a specificity of 96.4% [15]. The datasets used in these studies were not pre-labeled, but each comprised over 50,000 images, more than an order of magnitude larger than the sample used in this study [13–15]. It is therefore notable that, despite the relatively small dataset, a simple and fast labeling procedure allowed us to obtain comparable results on some parameters.

A meta-analysis of 13 studies reported average sensitivity and specificity of neural networks in AMD detection of 0.92 and 0.89, respectively [16]. However, that analysis combined studies that used only fundus photographs with studies that used optical coherence tomography images. Another meta-analysis, restricted to automated AMD diagnostic models that processed only color fundus photographs, reported average sensitivity and specificity of 0.88 and 0.90, respectively [17]. Thus, the accuracy we achieved is comparable to that of studies based on much larger datasets.

It should be noted that rapid AMD detection from color fundus images traditionally underpins mass screening programs but has limited application in specialized care. More promising directions in this field are determining the AMD stage from the available data [18–20] and identifying individual pathological elements in the images [21], which can support monitoring during clinical follow-up and in clinical trials.

On the one hand, the small size of the training dataset and the decision not to differentiate between AMD stages (a single class was used for all of them) can be considered limitations of this work. On the other hand, even under these conditions we were able to answer the questions posed. The small dataset confirmed that, with only a limited sample from a local database, it is possible to develop models capable of automated diagnosis of retinal disease, provided the training dataset is pre-labeled. The clinical heterogeneity of the pathological changes simulates a real-life screening situation, in which various pathologies must be detected with high sensitivity so that patients can be referred for further examination.

CONCLUSIONS

Automated diagnosis of retinal diseases, which are among the leading causes of blindness and poor vision, opens new opportunities for mass AMD screening. The fast and easy-to-use method of image markup with bounding boxes significantly increases the accuracy of neural-network-based medical image recognition. As a result, high classification accuracy can be achieved even when only small local databases are available. At the same time, this underscores the important role of medical specialists in developing new machine-learning-based diagnostic methods. Such development requires ophthalmologists and IT engineers to join efforts to create large annotated databases of retinal images acquired with various fundus camera models; labeling these data would ensure high accuracy and reproducibility of the results in real clinical practice.
