ORIGINAL RESEARCH

Breast cancer: analysis of driver somatic mutations detected by next-generation sequencing

About authors

1 Genotek Inc., Moscow

2 Karelian Research Centre of the Russian Academy of Sciences, Petrozavodsk, Russia

3 Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia

4 Blokhin National Medical Research Center of Oncology, Moscow, Russia

5 Institute of Biomedical Chemistry (IBMC), Moscow, Russia

Correspondence should be addressed: Valery Ilyinsky
Nastavnichesky per., d. 17, str. 1, pod. 14, 15, Moscow, 105120; info@genotek.ru

About paper

Funding: this work was supported by the Ministry of Education and Science of the Russian Federation (Project ID RFMEFI60716X0152).

All authors contributed equally to this work: selection and analysis of literature, research planning, data collection, analysis, and interpretation, drafting of the manuscript, and editing.

Received: 2017-12-12 Accepted: 2017-12-22 Published online: 2018-01-24

Breast cancer (BC) is the second most common type of cancer and the second leading cause of death in women; it is also the cancer with the highest incidence worldwide [1]. The risk of BC increases with age: the majority of new cases are reported in women aged 60 to 65 years. The high mortality of BC is explained by late diagnosis, established when the disease has already progressed to an advanced stage. Metastatic BC is particularly dangerous, since it is resistant even to combination treatments based on chemotherapy, hormones and targeted drugs. The 5-year survival rate in patients with BC is 55 %. This creates a need for novel approaches to more effective screening and to targeted therapy of BC based on the molecular genetic profiling of tumors.

The rapid development of next-generation sequencing (NGS) has yielded a large body of information about genetic variants [2]. Many mutations are associated with BC, including somatic and germline mutations in the genes PIK3CA, STK11/LKB1, CDH1, ATM, CHEK2, BRIP1, and PALB2, as well as mutant variants of the highly penetrant genes associated with hereditary BC, such as TP53, PTEN, MLH1, BRCA1, and BRCA2 [3].

The majority of tumor mutations are somatic; they play an important role in the pathogenesis of cancer and confer de novo resistance to treatment. For this reason, many ongoing studies use NGS to profile mutant variants in tumors. As a result, a significant number of new mutations with unknown function have been identified. Describing these polymorphisms requires mathematical algorithms that can automatically process huge data arrays, predict potentially pathogenic mutations and distinguish them from harmless variants. The resulting data can be used in developing screening or diagnostic tools (including liquid biopsy) and in selecting adequate targeted therapies.

In this work we analyze a range of mutations identified in key BC oncogenes by NGS, using a previously developed bioinformatic pipeline for the functional annotation of mutations and assessment of their pathogenicity.

METHODS

We obtained tumor samples from 16 patients of Blokhin Russian Cancer Research Center, Moscow. The participants’ age range was 27 to 76 years, with a mean of 50.7 ± 11.3 years. All patients had breast malignancies and received combination therapy. The inclusion criteria were as follows: age of 18 to 70 years, female sex, and histologically and cytologically confirmed breast cancer. The exclusion criteria were a medical history of other tumor types and pregnancy.

Disease stages were determined according to the TNM classification [4]. The study included patients with stages T1–3N0–3M0–1.

All patients gave voluntary informed consent. The study complied with the principles of confidentiality. Patients’ clinicopathologic features are summarized in tab. 1.

DNA isolation and quality control

DNA was isolated from the samples of tumor tissue using the DNeasy Blood and Tissue Kit (Qiagen, USA). Tumor tissue was cut into small pieces, and buffer ATL was added to the samples. The samples were then treated with proteinase K, incubated at 56 °C until fully lysed, and treated with RNase A. Next, we added 200 μl buffer AL and 96 % ethanol. The resulting mixture was transferred to spin columns and centrifuged at 8,000 g for 1 min. The samples were washed with AW1 and AW2 buffers to remove salts (guanidine and SDS). The columns were eluted twice with 30 μl Low-TE buffer; the samples were incubated and centrifuged according to the manufacturer’s protocol. Quality control of the obtained DNA was performed on Qubit 3.0 (Thermo Fisher Scientific, USA). The samples were also analyzed by electrophoresis in a 1 % agarose gel stained with ethidium bromide.

Sequencing of targeted oncogenes

DNA libraries were prepared using the NEBNext Ultra DNA Library Prep Kit for Illumina (New England Biolabs, USA). The libraries were dual-indexed by PCR using the same kit and NEBNext Multiplex Oligos for Illumina (Dual Index Primers Set 1, New England Biolabs). Quality control of the obtained DNA libraries was performed on Agilent Bioanalyzer 2100 (Agilent Technologies, USA) using the High Sensitivity Kit from the same manufacturer according to the official protocol.

For targeted enrichment of the coding regions of tumor genomes we used the MYbaits Onconome KL v1.5 Panel (MYcroarray, USA). The enriched fragments were sequenced with 100 bp paired-end reads on HiSeq 2500 (Illumina, USA). Sample preparation and sequencing were done according to Illumina’s protocols.

Bioinformatic analysis

Sequencing data were analyzed using an original algorithm developed previously [5]. First, the quality of reads was checked: sequences with read quality below 10 were removed from NGS data using Cutadapt software [6]. Then the reads were mapped to the reference genome hg19 (GRCh37.p13) using the Burrows–Wheeler Aligner algorithm [7]. PCR duplicates were removed by running the rmdup command in SAMtools [8].
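The preprocessing steps described above can be sketched as a sequence of command lines. This is only an illustration under stated assumptions, not the authors' pipeline [5]: the file naming scheme, the `Step` type, the `build_steps` helper, the use of the `bwa mem` subcommand, and the intermediate `samtools sort` step are all assumptions. Note also that `cutadapt -q 10` trims low-quality read ends rather than discarding whole reads. The function only builds the commands and does not execute them.

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    """One command of the sketch (hypothetical helper type)."""
    argv: List[str]
    stdout: Optional[str]  # file that should capture standard output, if any


def build_steps(sample: str, reference: str) -> List[Step]:
    """Build trimming, alignment and duplicate-removal commands.

    File names are hypothetical; the exact options used by the
    authors are not given in the text.
    """
    r1, r2 = f"{sample}_R1.fastq.gz", f"{sample}_R2.fastq.gz"
    t1, t2 = f"{sample}_R1.trimmed.fastq.gz", f"{sample}_R2.trimmed.fastq.gz"
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    return [
        # Quality trimming with a cutoff of 10 on paired-end reads.
        Step(["cutadapt", "-q", "10", "-o", t1, "-p", t2, r1, r2], None),
        # Alignment to the reference; bwa mem writes SAM to standard output.
        Step(["bwa", "mem", reference, t1, t2], sam),
        # Coordinate sorting (assumed step; rmdup expects sorted input).
        Step(["samtools", "sort", "-o", sorted_bam, sam], None),
        # Removal of PCR duplicates, as named in the text.
        Step(["samtools", "rmdup", sorted_bam, dedup_bam], None),
    ]


for step in build_steps("tumor01", "hg19.fa"):
    redirect = f" > {step.stdout}" if step.stdout else ""
    print(" ".join(step.argv) + redirect)
```

Keeping each command as an argument list makes it straightforward to pass to `subprocess.run` later, with `stdout` redirected to the named file where one is given.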

Mutations were called with MuTect [9]. Calls at positions covered by at least 12 reads were considered the most significant.
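A coverage threshold of this kind can be applied as a simple filter over the called variants. The sketch below assumes a hypothetical `Call` record with a `depth` field; the variant names are taken from this article, but the depth values are invented purely for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List

MIN_DEPTH = 12  # coverage threshold stated in the text


@dataclass(frozen=True)
class Call:
    """A called variant (hypothetical record layout)."""
    gene: str
    hgvs_c: str
    depth: int  # number of reads covering the position


def filter_by_depth(calls: Iterable[Call], min_depth: int = MIN_DEPTH) -> List[Call]:
    """Keep only calls covered by at least `min_depth` reads."""
    return [call for call in calls if call.depth >= min_depth]


# Depth values below are made up for the example.
example_calls = [
    Call("TP53", "c.524G>A", depth=48),
    Call("BRCA1", "c.1865C>T", depth=12),
    Call("ATM", "c.146C>G", depth=7),
]

for call in filter_by_depth(example_calls):
    print(call.gene, call.hgvs_c, call.depth)
```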

To assess the functional effect of the discovered mutations, they were annotated in SnpEff and their effect on the encoded protein was predicted based on the analysis of genomic coordinates [10].

RESULTS

Using Illumina-based NGS, we screened 16 breast tumors for mutations in the cancer-associated genes BRCA1, BRCA2, ATM, CDH1, CHEK2, MRE11A, NBN, PALB2, PTEN, RAD50, RAD51C, TP53, and SEC23B. Our original bioinformatic algorithm detected 58 point mutations in the genes BRCA1, BRCA2, ATM, CDH1, CHEK2 and TP53, including 19 homozygous and 39 heterozygous variants. The list of unique mutations is provided in tab. 2.

The figure below shows the frequency of mutations in the genes with the highest abundance of mutations, namely ATM, TP53 and BRCA1. The most frequent mutations were c.376-283T>C (TP53), c.3994-193T>C, c.8010+186C>T (ATM), and c.5215+66G>A (BRCA1).

Based on the bioinformatic analysis and annotation of the identified polymorphisms, we selected those mutations that could significantly affect the regulatory or protein sequences. To assess pathogenicity and conservation of the mutations, we used data from COSMIC (Catalogue of Somatic Mutations In Cancer) [11] and dbNSFP [12]. Additionally, the SIFT (Sorting Intolerant From Tolerant) and PolyPhen2 tools were used to predict pathogenicity of the mutations and assess their effect on the function of the encoded protein [13, 14]. Information about mutation frequencies was obtained from the 1000 Genomes Project and the Exome Aggregation Consortium [15, 16].
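One way such predictions and population frequencies might be combined is shown in the sketch below. It is an assumption-laden illustration rather than the selection rule used in this study: the function name, the "either tool flags the variant" logic, and the default cutoffs (SIFT below 0.05, PolyPhen2 above 0.85, population frequency below 1 %) are conventional illustrative values that differ between tool versions and are not taken from the article.

```python
from typing import Optional


def is_candidate(
    sift: Optional[float],
    polyphen: Optional[float],
    pop_freq: Optional[float],
    sift_cutoff: float = 0.05,      # assumed cutoff, not from the article
    polyphen_cutoff: float = 0.85,  # assumed cutoff, not from the article
    max_freq: float = 0.01,         # assumed rarity threshold
) -> bool:
    """Flag a variant as potentially pathogenic.

    A variant is flagged when it is rare (or absent) in population
    databases and at least one predictor scores it as damaging.
    Missing scores (None) never count as evidence of damage.
    """
    rare = pop_freq is None or pop_freq < max_freq
    sift_hit = sift is not None and sift < sift_cutoff
    polyphen_hit = polyphen is not None and polyphen > polyphen_cutoff
    return rare and (sift_hit or polyphen_hit)


# Scores below are invented to exercise the rule.
print(is_candidate(sift=0.01, polyphen=0.99, pop_freq=None))
print(is_candidate(sift=0.40, polyphen=0.10, pop_freq=0.0001))
print(is_candidate(sift=0.01, polyphen=0.99, pop_freq=0.30))
```

Exposing the cutoffs as parameters keeps the rule easy to adjust when a different tool version or a stricter frequency filter is preferred.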

Altogether, we singled out 14 mutations affecting the protein sequence: BRCA2 — c.4828G>A (p.Val1610Met), c.5070A>C (p.Lys1690Asn); TP53 — c.524G>A (p.Arg175His), c.469G>T (p.Val157Phe); CHEK2 — c.1289C>T (p.Thr430Ile); ATM — c.146C>G (p.Ser49Cys), c.4258C>T (p.Leu1420Phe), c.1192G>C (p.Asp398His); CDH1 — c.790C>T (p.Gln264), c.1342C>T (p.Gln448); BRCA1 — c.1865C>T (p.Ala622Val), c.384G>A (p.Met128Ile), and c.54G>T (p.Met18Ile).
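Protein-level changes written in this three-letter notation can be converted to the short one-letter form (for example, R175H) with a small parser. The sketch below is a simplified illustration that handles only plain substitutions of the form `p.XxxNNNYyy`; the function name and regular expression are hypothetical and do not cover the full HGVS nomenclature. Entries that, as printed, name no replacement residue (such as p.Gln264) are returned as `None` so that they can be reviewed manually.

```python
import re
from typing import Optional

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Matches simple substitutions only, e.g. "p.Arg175His".
SUBSTITUTION = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def to_short_form(hgvs_p: str) -> Optional[str]:
    """Convert 'p.Arg175His' to 'R175H'; return None if not a plain substitution."""
    match = SUBSTITUTION.match(hgvs_p)
    if match is None:
        return None
    ref, position, alt = match.groups()
    if ref not in THREE_TO_ONE or alt not in THREE_TO_ONE:
        return None
    return f"{THREE_TO_ONE[ref]}{position}{THREE_TO_ONE[alt]}"


for change in ["p.Arg175His", "p.Val157Phe", "p.Thr430Ile", "p.Gln264"]:
    print(change, "->", to_short_form(change))
```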

DISCUSSION

In Russia, PCR-based methods for detecting known mutations in BC-associated genes are the most widely used. However, more advanced methods of genetic screening are now available; the most promising is next-generation sequencing, which can identify genetic variants in malignant tumors and is especially suitable for exploring the variability of highly heterogeneous regions of tumor genomes. In this work we applied NGS to study a number of mutations in key oncogenes associated with BC and tested a previously developed algorithm for bioinformatic analysis of sequencing data.

One of the best-studied genes playing a significant role in BC pathogenesis is TP53. It is involved in the regulation of the cell cycle, apoptotic activity and DNA repair. Mutations in TP53 disrupt these regulatory mechanisms and may trigger cancer development. TP53 is a tumor suppressor; mutant variants of this gene are detected in half of all cancers and in more than 30 % of BC cases. In turn, sporadic breast cancer is characterized by a frequency of TP53 mutations that varies between 25 % and 86 %, depending on the disease stage and the screening technique applied. The prognostic value of TP53 mutations in BC has been well studied [17]. Among the mutations identified in our study, the most frequent was c.376-283T>C, discovered in 13 of 16 patients (81 %).
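The percentage quoted for c.376-283T>C follows directly from the patient counts. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def carrier_frequency(carriers: int, total: int) -> float:
    """Return the share of carriers in the cohort, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= carriers <= total:
        raise ValueError("carriers must be between 0 and total")
    return 100.0 * carriers / total


# 13 of 16 patients carried c.376-283T>C in TP53.
print(f"{carrier_frequency(13, 16):.0f} %")  # prints "81 %"
```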

Patients with BC, particularly with some of its subtypes, have a relatively high frequency of BRCA1 and BRCA2 mutations. BRCA1 and BRCA2 are involved in the regulation of many cell processes that maintain genomic stability and in homologous recombination during the repair of double-strand DNA breaks. Mutations occurring in these genes often disrupt their normal function and are a major causative factor of hereditary BC, increasing the risk of cancer in an individual. About a quarter of all hereditary BC cases are associated with mutations in BRCA1/2 [17].

Mutations in BRCA1 account for 80 % of all BRCA1 and BRCA2 mutations in Russians with BC. One of the most common mutant variants identified in Russian patients is 5382insC (rs80357906), which causes a reading frame shift and the loss of function of the encoded protein. The majority of the polymorphisms identified in our study were mutations in BRCA1 and BRCA2, the most common being c.5215+66G>A (rs3092994) in BRCA1, detected in 9 of 16 patients (56.3 %).

Our findings on ATM, TP53 and BRCA1 mutations are on the whole consistent with the literature, which reports TP53 variants to be the most common mutations in BC [17]. Our results on the diversity of BRCA1/2 variants are also comparable with the literature data. Importantly, mutations in these genes are associated with poor prognosis and development of invasive ductal breast cancer. The presence of these mutations is taken into account when determining the extent of surgical intervention [17]. In our study, of 12 patients with BC who had mutations in BRCA1 and BRCA2, 8 were diagnosed with invasive ductal carcinoma. Of those 8, six had the mutation c.5215+66G>A in BRCA1.

We have analyzed next generation sequencing data using the original bioinformatic approach and discovered many driver mutations in the samples of malignant breast tumors. Using different databases, we have selected and annotated functionally significant mutations. Altogether, we have discovered 14 mutations affecting the amino acid sequence of the encoded proteins. Each of the studied samples had at least one such mutation. The original bioinformatic protocol allowed us to automatically process DNA sequencing data obtained with NGS.

CONCLUSIONS

A combination of next-generation sequencing and modern algorithms for bioinformatic analysis is an effective and clinically attractive method of screening for genetic polymorphisms and assessing the functional effect of mutations detected in the tumor. NGS already enables molecular classification of breast tumors and can be used to determine their subtypes depending on the spectrum of the identified mutations and the expression profiles of the studied genes. NGS data can facilitate the choice of adequate targeted therapies. One of the major tasks of cancer genetics is the development of convenient tools for the detection of breast cancer biomarkers that clinicians can use for more accurate diagnosis and effective treatment. We believe that advances in the field should include improvement of bioinformatic approaches, adoption of systems for automatic analysis of tumor genetic profiles and introduction of NGS into clinical routine.
