This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (CC BY).
ORIGINAL RESEARCH
Development of the vascular condition classifier using supervised machine learning methods
1 Vladimir Zelman Center for Neurobiology and Neurorehabilitation Skolkovo Institute of Science and Technology, Moscow
2 Artificial Intelligence Center Skolkovo Institute of Science and Technology, Moscow
3 Center for Photonics and Photonic Technologies Skolkovo Institute of Science and Technology, Moscow
4 Center for Molecular and Cellular Biology Skolkovo Institute of Science and Technology, Moscow
Correspondence should be addressed: Zlata Besedovskaia
Bolshoy Bulvar, 30, Building 1, Moscow, 121205; iratnagold@gmail.com
Acknowledgments: All authors of this article express their gratitude to the authors of article [15] for providing the open data used in this study.
Author contribution: Z. Besedovskaya — development of the pipeline and clustering tools, image preparation for publication, and draft publication. A. Korobov — creation and integration of new vessel segment features into the pipeline and draft publication. N. Kudryashova — medical conceptualization and validation of the vessel segment features and draft publication. All authors contributed equally to this study.
Optoacoustic (OA) angiography is a promising new vascular imaging technique applicable in scientific research and clinical practice. It is based on the optoacoustic effect: the acoustic response of materials to intermittent light [1, 2]. Thus, the method combines optical and acoustic approaches, and angiography itself is based on the optical absorption of light by hemoglobin [3].
One of the advantages of OA angiography is its noninvasive nature [1]. In diagnostics, it can complement large-vessel imaging techniques [4]. Compared to small-vessel imaging techniques (capillaroscopy, optical coherence tomography), OA provides greater visualization depth [5, 6] and enables assessment of arterioles, venules, as well as small arteries and veins [5].
OA is successfully used in various fields of both applied medicine and fundamental research. The method's successes can be highlighted in ophthalmology [7, 8], dermatology [9], cardiology [10], oncology [11], and neuroimaging [12].
Preclinical and fundamental biological research play a key role both in the study of basic biological processes and in the further development of OA imaging methodology itself [5].
Despite significant advances in OA imaging technology, its widespread clinical implementation is hampered by several technological limitations. These include low imaging speed, an inherent trade-off between spatial resolution and penetration depth, and engineering challenges in integrating light delivery systems and ultrasound (US) transducers [5, 6].
The application of artificial intelligence (AI) is substantially improving OA imaging. Neural networks such as U-Net effectively remove artifacts and improve image quality. Furthermore, AI automates data analysis, making diagnostics faster and more objective [13, 14]. Thus, the use of AI helps overcome technological barriers, enables wider clinical adoption of OA imaging, and opens new horizons for noninvasive diagnostics [5].
The goal of this work is to develop a methodology for analyzing various vascular features and validate it on an open dataset containing mouse optoacoustic 3D angiograms. The developed software will be suitable not only for the specific task described above, but also (with minor modifications) for a wide range of studies involving OA angiograms, including the diagnosis of various vascular diseases in patients.
METHODS
In this work, we used the set of features mentioned in [15] for calculations, as well as features calculated by Amira/Avizo (Thermo Fisher Scientific, USA), a general-purpose software for visualization and analysis of scientific data. We also developed new features to describe segmental spatial characteristics and improve the accuracy of machine learning (ML) models (tab. 1).
Obtaining initial open data
One Balb/C nu/nu mouse was used for OA angiography. Skin vessels in the animal's left thigh were visualized in a chamber filled with distilled water. Three technical replicates were obtained (fig. 1A) at each water temperature: body temperature (BT, 30 °C), room temperature (RT, 23 °C), and cold temperature (CT, 16 °C). Each image was acquired 10 minutes after immersion of the mouse's thigh in water to allow the tissue and water temperatures to stabilize.
An OA mesoscopy system (Institute of Applied Physics, Russian Academy of Sciences, Russia) based on an ONDA532 diode laser (wavelength 532 nm) was used for imaging.
Ultrasound signals were collected using a scanning module in a sealed immersion chamber filled with distilled water. The scanning range was 10 mm in both axes, the scanning step was 20 μm, the time interval was approximately 5 min, and the signal detection depth was up to 3 mm (fig. 1B). The acquired signals were converted into 3D angiographic datasets using reconstruction and deconvolution algorithms. The 3D angiographic images were processed using the SKYQUANT 3D tool [15].
Further details of the imaged object, methods, and equipment are available in a previous publication [15].
Description of the original open data
The dataset contains 16,619 vessel segments with 9 quantitative features per segment (from which the 9 new features described in tab. 1 were derived) and 11 quantitative features per image. The experimental design is described in [15] and comprised:
Cold temperature (16 °C): 3 images, 10,418 segments;
Room temperature (23 °C): 3 images, 4,663 segments;
Body temperature (30 °C): 3 images, 1,538 segments.
For all subsequent operations, the original dataset was split into training, testing, and validation sets of 60%, 25%, and 15%, respectively (random seed = 42). For ML-based clustering, the features of individual segments were retained, while the features of individual images were used for descriptive statistics and quality control described in [15].
All features were standardized by removing the mean and scaling to unit variance using statistics computed only on the training samples; the same affine transformation was applied to the validation/test splits to prevent information leakage.
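The split and the leakage-free standardization described above can be sketched as follows. This is a minimal, dependency-free illustration, not the authors' code: the helper names (`split_indices`, `fit_scaler`, `transform`) and the toy data are hypothetical, and a plain random split is assumed because the text does not say whether the split was stratified.

```python
import random
import statistics

def split_indices(n, fractions=(0.60, 0.25, 0.15), seed=42):
    """Shuffle indices 0..n-1 and cut them into train/test/validation parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n)
    n_test = round(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]

def fit_scaler(rows):
    """Per-feature mean and standard deviation, computed on training rows only."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    # Fall back to 1.0 for constant features to avoid division by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return means, stds

def transform(rows, means, stds):
    """Apply the same affine transformation (x - mean) / std to any split."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Toy two-feature data standing in for the segment feature table.
rng = random.Random(0)
X = [[rng.gauss(5.0, 2.0), rng.gauss(100.0, 30.0)] for _ in range(200)]

train_idx, test_idx, val_idx = split_indices(len(X))
means, stds = fit_scaler([X[i] for i in train_idx])  # statistics from training data only
X_train = transform([X[i] for i in train_idx], means, stds)
X_test = transform([X[i] for i in test_idx], means, stds)
X_val = transform([X[i] for i in val_idx], means, stds)
print(len(X_train), len(X_test), len(X_val))
```

Only the training split has exactly zero mean and unit variance after the transformation; the test and validation splits are close to, but not exactly, standardized, which is the expected behavior when leakage is avoided.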
Unsupervised clustering based on centroid and density
Before moving on to more complex clustering methods, a projection map of the multidimensional data was obtained using the nonlinear dimensionality reduction method UMAP (Uniform Manifold Approximation and Projection) with the parameters n_neighbors = 15 and min_dist = 0.1.
After assessing the visual separability of the data, K-means was applied; it partitions the data by minimizing within-cluster variance.
DBSCAN was used to capture arbitrary cluster shapes and clearly separate "noise" from the "core."
To reduce collinearity and noise while maintaining maximum variance, we applied principal component analysis (PCA) with five principal components extracted.
Following PCA, K-means and DBSCAN were repeated, and internal clustering metrics were compared.
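The clustering sequence above (K-means and DBSCAN on standardized features, then again after PCA with five components) might look roughly like this in scikit-learn. The sketch runs on synthetic blobs rather than the study data, and the DBSCAN settings (`eps=1.0`, `min_samples=5`) and the number of K-means clusters are assumptions, since the text does not report them.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the segment feature table (9 features, 3 groups).
X, _ = make_blobs(n_samples=300, n_features=9, centers=3, random_state=42)
X_std = StandardScaler().fit_transform(X)

# Five principal components, as stated in the text.
Z = PCA(n_components=5, random_state=42).fit_transform(X_std)

results = {}
for name, data in (("raw", X_std), ("pca", Z)):
    km_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(data)
    # eps and min_samples are assumed values, not taken from the study.
    db_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(data)
    results[name] = {
        "kmeans_silhouette": float(silhouette_score(data, km_labels)),
        "dbscan_clusters": len(set(db_labels) - {-1}),  # label -1 marks noise
        "dbscan_noise": int(np.sum(db_labels == -1)),
    }
    print(name, results[name])
```

Comparing the two entries of `results` gives the kind of with-PCA versus without-PCA comparison of internal metrics that the text describes.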
Supervised classification
To test the separability of the data by supervised classifiers, we selected three additional models covering the linear-nonlinear and generative-discriminative families:
- CatBoost: gradient boosting, which sequentially builds decision trees to minimize the loss function and improve the quality of the model.
- SGDClassifier: a linear classifier trained by stochastic gradient descent, which supports several loss functions and regularization penalties.
- Logistic Regression Softmax: multinomial (softmax) logistic regression, which extends logistic regression to multiclass data.
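To make the softmax extension of logistic regression concrete, here is a small sketch of how per-class linear scores are turned into class probabilities. The weights, biases, and input vector are hypothetical values chosen for illustration; they are not fitted to the study data.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum improves numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(x, weights, biases):
    """One linear score per class, followed by softmax."""
    scores = [
        sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]
    return softmax(scores)

# Hypothetical model with two standardized features and three classes.
classes = ["CT", "RT", "BT"]
weights = [[1.5, -0.5], [0.0, 0.2], [-1.5, 0.3]]
biases = [0.1, 0.0, -0.1]

probs = predict_proba([1.0, 0.5], weights, biases)
predicted = classes[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

With two classes this reduces to ordinary logistic regression, which is why the multiclass model is described as an adaptation of the same function.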
We used a cross-validation strategy together with all of the preprocessing steps described above. The metrics used were overall accuracy, balanced accuracy, logarithmic loss (Llog), Matthews correlation coefficient (MCC), Cohen's kappa (κ), ROC-AUC with macro- and micro-averaging, and F1-measure with macro- and weighted averaging. Confusion matrices normalized by true classes are also presented.
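Several of the listed metrics can be computed directly from a confusion matrix, which shows how they differ under class imbalance. The following sketch uses the standard definitions of balanced accuracy (mean per-class recall), Cohen's kappa, and the multiclass generalization of MCC; the function name and the example matrix are hypothetical.

```python
import math

def metrics_from_confusion(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    true_tot = [sum(cm[i]) for i in range(k)]                        # row sums
    pred_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]   # column sums
    cross = sum(t * p for t, p in zip(true_tot, pred_tot))

    accuracy = correct / n
    balanced_accuracy = sum(cm[i][i] / true_tot[i] for i in range(k)) / k

    # Cohen's kappa: agreement corrected for the agreement expected by chance.
    chance = cross / n**2
    kappa = (accuracy - chance) / (1 - chance)

    # Multiclass Matthews correlation coefficient.
    denom = math.sqrt(
        (n**2 - sum(p * p for p in pred_tot)) * (n**2 - sum(t * t for t in true_tot))
    )
    mcc = (correct * n - cross) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "kappa": kappa,
        "mcc": mcc,
    }

# Hypothetical imbalanced three-class example.
example = metrics_from_confusion([[90, 5, 5], [4, 16, 0], [1, 0, 9]])
for key, value in example.items():
    print(f"{key}: {value:.3f}")
```

Because balanced accuracy weights every class equally, it drops faster than overall accuracy when a small class is poorly recognized, which is why both are reported for the imbalanced temperature groups.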
RESULTS
Dimensionality reduction using the UMAP method
The UMAP method was used to visualize the multidimensional data structure.
UMAP visualization demonstrated a clear division of the data into three clusters corresponding to the three experimental temperature conditions (fig. 2). The cold temperature cluster (16 °C) was the most compact, indicating pronounced and uniform changes in vascular architecture under hypothermia. The room temperature cluster (23 °C) occupied an intermediate position between the other two groups, while the body temperature cluster (30 °C) demonstrated a greater dispersion of points, indicating variability in vascular characteristics under physiological conditions.
Unsupervised clustering
To evaluate the effectiveness of unsupervised ML methods in the problem of separating temperature states, the DBSCAN and K-Means algorithms were used both with and without PCA data preprocessing (tab. 2).
An analysis of the results showed that the use of PCA did not significantly improve clustering quality. The DBSCAN algorithm demonstrated higher accuracy (0.626) and silhouette coefficient (0.632) compared to K-Means, but showed low values of the macro-averaged F1-measure (F1macro) (0.259), indicating unbalanced class recognition. K-Means, in contrast, provided a better balance between classes (F1macro = 0.367), but with lower overall accuracy (0.524). The extremely low values of the adjusted Rand index (ARI < 0.015) and normalized mutual information (NMI < 0.02) for both algorithms indicate a weak correspondence between the resulting clusters and the true temperature groups when using unsupervised approaches.
Despite multi-stage training, the internal clustering evaluation metrics demonstrated insufficient consistency and stability for subsequent interpretation.
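The combination of moderate accuracy with near-zero ARI is easier to interpret against a simple baseline. The sketch below computes the adjusted Rand index from label pairs and evaluates a degenerate clustering that places every segment in a single cluster, using only the segment counts reported in the dataset description. It is an illustration of how the metrics behave under class imbalance, not a reconstruction of the clusterings obtained in the study.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index computed from the contingency table of two labelings."""
    n = len(labels_true)
    index = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # both labelings are trivial
        return 1.0
    return (index - expected) / (max_index - expected)

# Segment counts per temperature group, as reported in the dataset description.
counts = {"cold": 10418, "room": 4663, "body": 1538}
labels_true = [group for group, c in counts.items() for _ in range(c)]

# Degenerate clustering: every segment assigned to one cluster.
single_cluster = [0] * len(labels_true)

majority_accuracy = max(counts.values()) / sum(counts.values())
ari_single = adjusted_rand_index(labels_true, single_cluster)
print(round(majority_accuracy, 3), ari_single)
```

Mapping the single cluster to the largest group already yields an accuracy of about 0.627 while the ARI is exactly zero, so accuracy values near this level should be read together with ARI and NMI rather than on their own.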
Supervised classification
Classification with the CatBoost model
To classify the temperature states in a supervised setting, gradient boosting on decision trees in the CatBoost implementation was applied. The classification results are presented in fig. 2B.
The CatBoost model demonstrated exceptional performance in classifying temperature states in the vascular network. Overall accuracy was 98.9%, with a balanced accuracy of 98.5%, indicating that the model performs correctly even with unequal numbers of observations per class. The area under the ROC curve, exceeding 0.999 for all averaging variants, indicates near-perfect separability of the classes in the feature space.
The highest recall (0.997) was achieved for the cold temperature group, consistent with the results of the UMAP analysis, which showed the greatest compactness and isolation of this cluster. The body temperature group demonstrated the highest precision (0.997) but slightly lower recall (0.985). The lowest recall was observed for the room temperature group (0.971), which may be associated with the transitional nature of this state between hypo- and normothermia.
Classification by the SGDClassifier model
The SGDClassifier model demonstrated robust performance on the test set (tab. 3): overall accuracy was 95.7%, balanced accuracy was 95.1%, F1macro was 0.960, and F1weighted was 0.956. The agreement coefficients were also high (MCC = 0.917, Cohen's κ = 0.915), indicating prediction agreement well above chance. The area under the ROC curve was high (macro-AUC = 0.988; micro-AUC = 0.994) but lower than that of the nonlinear gradient boosting model; the log loss of 0.176 indicates more conservative probability estimates than those of the CatBoost model.
Very high recall (0.997) and precision (0.994) were obtained for the body temperature group (F1 = 0.997), indicating clear separability of this condition in the full feature space. For the cold temperature group, the model showed very high sensitivity (recall = 0.995) together with moderately reduced precision (precision = 0.940; F1 = 0.967). The greatest classification difficulties, as in the previous case, were observed for the room temperature group: despite high precision (0.988), recall was lower (recall = 0.857; F1 = 0.918), indicating that objects from this group were frequently assigned to neighboring groups, primarily the cold temperature group. This effect is typical of linear separators when a transitional class is present.
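The F1 values for the cold and room temperature groups follow from the reported precision and recall through the harmonic mean, F1 = 2PR / (P + R). A short check using only the numbers given above:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs reported for the SGDClassifier model.
reported = {"cold": (0.940, 0.995), "room": (0.988, 0.857)}

f1_values = {group: round(f1_score(p, r), 3) for group, (p, r) in reported.items()}
for group, value in f1_values.items():
    print(group, value)
# cold 0.967
# room 0.918
```

Because the harmonic mean is pulled toward the smaller of the two values, the room temperature F1 is dominated by its lower recall despite the high precision.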
Classification by logistic regression model
The logistic regression model, like CatBoost, demonstrated near-ceiling performance on the test set (tab. 3). The area under the ROC curve is close to ideal (ROC-AUCmacro = 0.99983; ROC-AUCmicro = 0.99987), while the low logarithmic loss (Llog = 0.033) indicates good probability calibration, better than that of the other linear model and the boosted model.
For the body temperature group, extremely high recall values (recall = 0.996) with very high precision (precision = 0.998; F1 = 0.999) are shown, indicating good separability of this condition. For the cold temperature group, the quality is also close to ideal (precision = 0.999; recall = 0.997; F1 = 0.998), meaning that the model makes virtually no errors in classifying segments from images in the cold temperature group. The greatest decrease is observed for the room temperature group: precision = 0.993, recall = 0.996, F1 = 0.995. This indicates rare false positives in favor of the room temperature group for boundary objects.
Assessing the significance of features for three models revealed two stable, highly significant classes (fig. 3A): topological (tortuosity, verticality/planarity, normalized dispersion, and linearity) and geometric (radii, lengths, volume, and segment Z-coordinate). It is important to note that the linear models (SGDClassifier and logistic regression) base their decisions primarily on topological features.
The confusion matrix (fig. 3B) allows for a detailed assessment of misclassification patterns for the three models.
For logistic regression, the importance of tortuosity (fig. 3A) is noteworthy, with its contribution exceeding that of the other features by an order of magnitude. Smaller but consistent contributions are provided by normalized verticality, normalized linearity, planarity, verticality angle, normalized dispersion, and linearity (approximately 0.28–1.07). The scale features (mean radii, length- and tortuosity-weighted mean radii, curved and chord length, volume and weighted volume, and segment Z-coordinate) are of virtually zero importance.
SGDClassifier follows the same linear weighting pattern: the tortuosity feature contributes the most, followed by normalized verticality, planarity, verticality angle, normalized dispersion, and linearity; geometric features remain close to zero.
Unlike the linear models, the CatBoost model exhibits a different feature hierarchy: the tortuosity-weighted mean radius, volume, and curved length are of maximum importance. Topological features such as tortuosity make a minor contribution.
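For linear models trained on standardized features, one common way to summarize feature importance is the mean absolute coefficient across classes. The text does not state how the importances in fig. 3A were computed, so the sketch below is an assumption about one plausible approach. The function name and coefficient values are made up for illustration and are not the study's weights.

```python
def rank_by_mean_abs_coefficient(coef, feature_names):
    """Rank features by the mean absolute weight across classes.

    coef[c][j] is the weight of feature j in the linear score of class c.
    Comparing magnitudes is only meaningful for standardized features.
    """
    n_classes = len(coef)
    scores = {
        name: sum(abs(coef[c][j]) for c in range(n_classes)) / n_classes
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up coefficients for illustration only.
feature_names = ["tortuosity", "planarity", "mean_radius"]
coef = [
    [4.0, -0.6, 0.01],   # hypothetical class 1
    [-1.0, 0.9, -0.02],  # hypothetical class 2
    [-3.0, -0.3, 0.01],  # hypothetical class 3
]

ranking = rank_by_mean_abs_coefficient(coef, feature_names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Importances from tree ensembles such as CatBoost are derived from how splits change the model's predictions rather than from coefficients, so the two kinds of ranking are not on a common scale and can legitimately disagree, as observed here.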
DISCUSSION
AI methods are a standard approach to analyzing OA imaging results [13]. One example is the work of N. Davoudi et al., who used the U-Net neural network to improve the quality of images distorted by artifacts. The authors trained the model on a hybrid dataset that included simulations, phantoms, and cross-sectional images of mice in vivo. The trained network effectively removed artifacts even with sixfold undersampling of the original data. Validation on an experimental setup confirmed that the algorithm significantly improves image quality [14]. However, the main limiting factor for the use of AI methods is the amount of data available for analysis [16]. In our study, unlike the aforementioned studies, we were able to overcome the data limitation by using a hybrid approach: image filtering and extraction of the analyzed vascular features are performed using a modified SKYQUANT-3D pipeline [15], which does not employ machine learning methods, and only the subsequent analysis is performed using machine learning classifiers and clustering algorithms. The effectiveness of the SKYQUANT3D pipeline methodology has already been confirmed in tests on a vascular phantom, preclinical experiments, and clinical experiments [15].
The most important step in the analysis of OA angiograms is the selection and calculation of the analyzed vascular features. For example, in [17], 64 microvascular features were obtained, characterizing vessel blood flow, changes in their geometric configuration, branching, spatial localization, and other features [17]. The authors of the study used a random forest-based classifier for feature selection in order to identify the most significant biomarkers from the initial 64 features. By reducing the dataset to 32 key features, they were able to focus on the most informative features for differentiating healthy volunteers from diabetic patients, thereby confirming the importance of selecting significant vascular features for further analysis [17].
In [18], visual assessment of vascular features such as diameter and tortuosity was performed, which enabled effective discrimination between patients with post-thrombotic syndrome and healthy volunteers [18]. In the study [15], vessel radii, lengths, and tortuosity were analyzed in various variations. This made it possible to characterize changes in the vessels of an experimental animal during a temperature test, as well as changes in the vessels of a healthy volunteer during a positional test [15]. Our study uses the features from the article [15], supplementing them with features of branching and spatial localization similar to those mentioned in [17]. The unique features of vessel planarity, verticality, and linearity also deserve special attention, indicating changes in the microcirculatory bed due to fluid redistribution. In this experiment, fluid redistribution in the body of the experimental animal was associated with temperature changes, but similar processes can occur in humans as a result of the development of vascular disease, such as chronic venous insufficiency [19].
The feature values we obtained reflect changes in the vessels caused by cooling, previously shown by other authors. Cooling causes dilation of small peripheral vessels, so that a greater volume of blood containing hemoglobin, the main source of contrast in optoacoustics, passes through them [20–22]. This makes them "visible" to the imaging system. The overall increase in blood volume in the studied area is a direct consequence of the vasodilation process. The peripheral vascular network is inherently more tortuous and branched than the main vessels, so its visibility in the image leads to an increase in the mean tortuosity feature [20–22]. Thus, the effectiveness of the classification is explained by existing biological effects. It is worth noting that many of the analysis components are applicable to both animals and humans, which may allow the method to be translated into clinical practice.
CONCLUSIONS
In this study, quantitative vascular features were calculated and analyzed to describe the state of the microcirculatory bed.
Various machine learning methods were compared for determining different temperature states in experimental animals. Supervised classification methods demonstrated the greatest effectiveness, with near-perfect accuracy. The CatBoost and logistic regression models were the most successful, relying on the most significant physiological features. The choice between the two models should be made on a case-by-case basis, depending on the specific feature distribution. Feature weights reflect the actual physiology of vascular changes.
The methodology developed in this study could help not only to distinguish between experimental conditions but also to differentiate pathological vascular changes from each other and from the norm in patients with various diseases. This may help overcome some of the limitations of OA angiography, enabling its wider implementation in clinical practice and more accurate diagnosis of vascular changes in the early stages of disease.