ORIGINAL RESEARCH

Isoforms of miR-148a and miR-203a are putative suppressors of colorectal cancer

About authors

1 National Research University Higher School of Economics (HSE), Moscow, Russia

2 Institute of Molecular Biology (IMB) of the National Academy of Sciences of the Republic of Armenia, Yerevan, Armenia

Correspondence should be addressed: Stepan A. Nersisyan
Vavilova, 7, Moscow, 117312, Russia; ur.esh@naysisrens

About paper

Funding: the study was supported by HSE Basic Research Program.

Acknowledgement: the author thanks Aleksey Galatenko of the HSE Laboratory of Molecular Physiology for the fruitful critique and valuable comments.

Compliance with ethical standards: the study complies with the ethical principles of the World Medical Association Declaration of Helsinki.

Received: 2022-04-29 Accepted: 2022-05-22 Published online: 2022-05-30
|

MicroRNAs are short non-coding RNA molecules which regulate gene expression at the post-transcriptional level [1]. Effective binding of microRNA to their target transcripts depends on perfect match between seed-region (nucleotides 2–7 counting from the 5′-terminus of microRNA) and its complementary site in mRNA [2]. The microRNA-mRNA binding inhibits translation and may also promote mRNA degradation provided a sufficient number of complementary bonds outside the seed-region [2]. Various microRNAs act as tumor suppressors and oncogenes in many cancers [35].

MicroRNA processing endonucleases Drosha and Dicer may cleave the long precursor molecules with certain imprecision, giving rise to microRNA isoforms with few extra or missing nucleotides at the termini [6]. Meanwhile, the exact positioning of 5′-terminus is pivotal for target specificity, as it defines the seedregion. MicroRNA 5'-isoforms (5'-isomiR) differing by a single extra or missing terminal nucleotide may target quite different sets of transcripts compared with the prototype canonical microRNA.

Colorectal cancer (CRC) is the third most common and the second most fatal cancer globally [7]. MicroRNAs have been widely implicated in the mechanisms of CRC progression and metastasis. For instance, miR-200 family has been shown to inhibit ZEB1 and ZEB2 genes that encode key transcription factors (TF) of the epithelial-mesenchymal transition (EMT) characteristic of CRC [8]. Accordingly, suppression of miR200 at the level of transcription and/or processing facilitates EMT and metastasis [8]. MicroRNA expression profiles are actively employed as a source of putative CRC markers, both diagnostic and prognostic [9]. Contemporary research on the role of microRNA isoforms in CRC is focused on comparative evaluation of their expression levels in tumors and matching healthy tissues [10]. To the best of our knowledge, functional activity of 5'-isomiR in CRC has remained outside the focus.

A great diversity of microRNAs and at least an order of magnitude higher number of regulatory interactions they participate in (with about 200 targets per microRNA on average [1]) invoke the use of bioinformatic approaches. One of them is building and analysis of regulatory networks, with microRNA molecules and genes represented by graph's vertices and the interactions represented by edges between particular microRNAs and their targets [11]. The construction of regulatory networks is traditionally based on two sources: (1) knowledge databases of interactions between different types of molecules and (2) correlation analysis of mRNA and microRNA expression levels in clinical samples. We have previously developed an algorithm that allows integration of these two sources in a single network with the addition of TF as another class of major regulatory molecules [12]. The miRGTF-net algorithm affords complete and reliable description of intracellular interaction landscapes in cell/tissue types of interest.

This study aimed to identify functional roles of 5′-isomiR in CRC by integration of the gene expression profiles, sequencebased target prediction and TF activity data using the miRGTFnet algorithm.

METHODS

Target prediction for microRNA isoforms

Pri-microRNA hairpin sequences and canonical Drosha and Dicer cleavage sites were retrieved from miRBase version 21 (https://www. mirbase.org). Designation of 5′-isomiR uses standard nomenclature: a digit after the vertical slash indicates a 5′ to 3′ shift in the cleavage position with regard to the canonical variant. For example, hsa-miR10a-5p|+1 corresponds to the canonical hsa-miR-10a-5p devoid of the first nucleotide counting from the 5′-terminus.

For the target prediction, canonical microRNA and isomiR sequences were loaded in miRDB version 6.0 [13] and TargetScan version 7.2 [2]. For miRDB, putative targets with prediction scores ≥ 80 were qualified as valid in accordance with the developer's recommendations. The TargetScan predictions were tailored to the number of miRDB predictions by choosing the appropriate number of the strongest interactions for each microRNA isoform.

Collection and analysis of sequencing data for mRNA and microRNA isoforms

The publicly available raw sequencing data for mRNA and miRNA isoforms were retrieved from The Cancer Genome Atlas Colon Adenocarcinoma (TCGA-COAD) project available at the Genomic Data Commons Data Portal (https://portal.gdc. cancer.gov/). The data were normalized in the edgeR package version 3.30.0 [14] using the Trimmed Mean of M-values (TMM) normalization algorithm yielding TMM-normalized Reads Per Kilobase of transcript per Million mapped reads (TMM-RPKM) matrices for mRNA expression and TMM-normalized Reads Per Million mapped reads (TMM-RPM) matrices for microRNA expression.

The 5'-isomiR expression matrices were sorted by total expression in the studied samples and the cumulative distribution functions were calculated. The minimal set of 5′-isomiR covering 95% of all sequencing reads were qualified as highly expressed and used in further analysis.

Construction of a regulatory network of interactions among microRNA isoforms, their targets and transcription factors

The miRGTF-net algorithm [12] was applied to build a regulatory network of interactions among microRNA isoforms, their targets and transcription factors. A major advantage of the algorithm is the ability to integrate mRNA and miRNA isoform expression data retrieved from TCGA-COAD with biological knowledge databases including:

  • TRRUST version 2 (https://www.grnpedia.org/trrust/): interactions between TF and genes;
  • TransmiR version 2 (http://www.cuilab.cn/transmir): interactions between TF and microRNA;
  • miRDB, TargetScan (see previous sections): interactions between microRNA isoforms and their targets;
  • miRIAD (https://www.miriad-database.org): coexpression of host genes and their intronic microRNA.

The analysis involved a canonical sequence of steps of the miRGTF-net algorithm briefly described as follows. At first we built a network based on the interactions retrieved from databases. Then we calculated Spearman correlation coefficients based on the corresponding expression levels retrieved from TCGA-COAD. Edges connecting molecules with weakly correlated expression levels were eliminated (cutoff for the absolute value of Spearman's correlation was determined as the 0.9 quantile of the correlation distribution). Edges between microRNA isoforms and their putative targets showing positive correlations and edges between host genes and their intronic microRNA showing negative correlations were removed as well.

Next, we evaluated the strength of the linear relationships between expression of each vertex and its direct regulators. The corresponding linear models were built with the ridge regression method. Model quality was evaluated by the coefficient of determination (R2) and strength and direction of regulation were evaluated using standardized regression coefficients (β-coefficients). The vertices and the edges were filtered using the default threshold values of miRGTF-net: β-coefficient absolute value ≥ 0.3 and 90% of the highest R2. Accordingly, the final network contained vertices corresponding to regulators of expression along with regulated entities (genes and microRNA isoforms).

The search of strongly connected components was carried out using the NetworkX package version 2.8 (https://networkx. org). The regulatory networks were visualized in Gephi (https:// gephi.org) and yED Graph Editor (yWorks GmbH; Germany).

Functional enrichment analysis

Functional annotation of gene sets (targets of microRNA isoforms) was carried out using DAVID web service version 12/2021 [15] and Gene Ontology (GO) biological pathway descriptions [16].

RESULTS

Expression of microRNA isoforms in CRC samples

The analyzed TCGA-COAD dataset contained expression profiles of mRNA and 5′-isoforms of microRNA in a sample of 426 primary CRC tumor specimens. We identified a total of 55 highly expressed microRNA isoforms, 10 of them non-canonical (Fig. 1). Two of the identified non-canonical 5′-isomiR species accounted for over 1% of total microRNA expression each: hsa-miR-192-5p|+1 (2.4%) and hsa-miR-10a-5p|+1 (1.3%).

A shift of the 5′-terminus in microRNA results in altered seed-region of the mature molecule with a major effect on the putative target scope. Sequences of the identified canonical and non-canonical microRNA isoforms were used for biocomputational prediction of their targets. As expected, microRNA isoforms differing by a single extra nucleotide at the 5'-terminus had only slightly overlapping sets of targets (table). For example, the canonical hsa-miR-10a-5p and its 5′-isoform without one terminal nucleotide had only 11 common targets in a union of 267 (4.1%). The maximal proportion of common targets was observed for hsa-miR-29a-3p and its longer isoform: 246 common targets in a union of 788 (31.2%).

Regulatory network of interactions among microRNA isoforms, their targets and transcription factors

At the next step of analysis, we constructed a regulatory network of interactions for CRC cells. The miRGTF-net algorithm enables the construction of such networks by integrating two types of data: (1) biologically substantiated interactions from biological knowledge databases and (2) mRNA and microRNA expression levels measured in tumor samples. The regulatory network contained four types of interactions:

  • TF regulates protein-coding gene expression;
  • TF regulates microRNA expression;
  • microRNA isoform regulates protein-coding gene expression;
  • protein-coding host gene is coexpressed with its intronic microRNA

The transcriptomic data for mRNA and microRNA isoforms retrieved from TCGA-COAD were used to look for interactions supported by significant correlations in the studied sample of tumor specimens.

The constructed network encompassed 333 molecules: 24 microRNA (5′-isoforms counted), 166 TF and 143 non-TF coding genes (Fig. 2). Of a total of 456 interactions, 42 represented inhibition of target genes by microRNA isoforms, 413 represented regulation of protein-coding and microRNA gene expression by TF and just a single interaction (between HOXB3 and hsa-mir-10a) represented coexpression of a host gene and its intronic microRNA.

The highest number of negatively correlated targets (seven) was identified for the canonical hsa-miR-148a-3p. The list included known oncogenes including proliferation regulators and markers (CSF1, ETS1, FLT1, MEIS1, MITF and RUNX2; GO:0008284 — positive regulation of cell proliferation) and integrin-encoding gene ITGA5 involved in regulation of cell proliferation, invasion and migration by transmitting signals into cells [17]. Importantly, hsa-miR-148a-3p was a member of the maximal strongly connected component of the regulatory network (i.e., a subgraph with a directed path between any two of the vertices) straightforwardly associated with EMT and estrogen signaling (Fig. 3). The plausible inhibition of protooncogenes by hsa-miR-148a-3p makes it a putative tumor suppressor in CRC.

A pair of canonical hsa-miR-203a-3p|0 and its 5′-isoform hsa-miR-203a-3p|+1 had the second highest number of putative targets. Amid the absence of negatively correlated common targets, both isoforms were likely to share an antitumor functionality by inhibiting expression of oncogenes. For example, expression of canonical hsa-miR-203a-3p|0 negatively correlated with expression of TF SNAI2 known to be a prominent genetic driver in EMT [18], whereas its noncanonical 5′-isoform hsa-miR-203a-3p|+1 was characterized as a putative regulator of TNC — an extracellular matrix proteinencoding gene which also plays a key role in EMT characteristic of CRC [19].

DISCUSSION

The study applied bioinformatics analysis to assess the functional activity of 5′-isomiR in malignant colorectal tumors. The results indicate that microRNA isoforms differing by a single extra nucleotide at the 5'-terminus have limited number of common targets with a maximum overlap of 31.2%. The most active regulators identified among microRNA included hsa-miR-148a-3p (canonical) and two 5‘-isoforms of hsa-miR203a-3p: hsa-miR-203a-3p|0 (canonical) and hsa-miR-203a3p|+1. Noteworthy, all three microRNAs were identified as putative inhibitors of pro-tumor gene expression, though the intersection of anti-correlated targets for canonical miR-203a and its 5′-isoform was empty.

The functional activity of 5′-isomiR was previously studied in the context of breast cancer. Two 5′-isoforms of hsa-miR183-5p were shown to exert different transcriptomic effects in MDA-MB-231 cells; moreover, certain target genes of top clinical significance (EGFR, NRAS) were indirectly regulated by these isoforms with the effects being opposite [20]. To our knowledge, the current study is the first to deal with target scopes of 5′-isomiR in CRC.

Of seven putative targets of hsa-miR-148a-3p selected by us in this study, four genes have been experimentally validated in vitroCSF1, ITGA5 [21], MITF [22] and RUNX2 [23]. In our own experimental setting, the hypoxia-induced inhibition of hsa-miR-148a-3p expression in CRC cell lines HT-29 and Caco-2 resulted in increased expression of the target gene ITGA5 [24]. Other studies demonstrate the pro-apoptotic effect of miR-148a and the corresponding suppressive effects on proliferation, migration and invasiveness of CRC cells in vitro conferred through Bcl-2 [25], ErbB3 [26] and WNT10b [27] inhibition. Apart from CRC, the role of hsa-miR-148a-3p as a suppressor of tumor cell proliferation was confirmed in the contexts of breast, prostate and urothelial cancers [28]. Thus, our in silico findings on the anti-tumor role of miR-148a show good agreement with the published experimental evidence.

A similar picture is observed for the canonical form of miR-203a: the inferred interaction between miR-203a and SNAI2 have been already validated in vitro [29], whereas overexpression of miR-203a in CRC cell lines has been shown to suppress migration and invasiveness [30]. Thus, the newly identified putative target scope of the non-canonical hsa-miR203a-3p|+1 is consistent with the established functional profile of the canonical prototype microRNA.

CONCLUSIONS

Construction and analysis of regulatory networks of interactions by tailored bioinformatics tools provide a useful key to the functional activity of 5'-isomiR in colorectal cancer cells. We demonstrate that 5′-isoforms of hsa-miR-203a-3p may exert similar anti-tumor effects by regulating completely different sets of target genes. Further experimental studies, e.g., overexpression of 5′-isoforms of hsa-miR-203a-3p and other microRNA in vitro and in vivo, will be required to understand molecular mechanisms of colorectal tumor development and progression.

КОММЕНТАРИИ (0)