BREAST CANCER: ANALYSIS OF DRIVER SOMATIC MUTATIONS DETECTED BY NEXT-GENERATION SEQUENCING

Breast cancer (BC) is one of the most common malignancies. Novel approaches to screening for genetic mutations in patients with BC are needed to reduce the high mortality caused by this disease and to improve treatment outcomes. In this study we employed next generation sequencing to screen several key genes associated with the risk of breast cancer for mutations. We also evaluated their pathogenicity using a previously proposed bioinformatic algorithm and analyzed the associations between some of the detected mutations and the clinical manifestations of the disease. Our study recruited 16 female patients with BC (mean age 50.7 ± 11.3 years). A total of 58 mutations were detected in the oncogenes BRCA1, BRCA2, ATM, CDH1, CHEK2 and TP53. Bioinformatic analysis of the sequencing data revealed 14 mutations that affect the sequence of the encoded proteins. Most deleterious mutations were harbored by the genes BRCA1/2, ATM and TP53.

Breast cancer (BC) is the second most common type of cancer and the second leading cause of death in women; it is also the most frequently diagnosed cancer worldwide [1]. The risk of BC increases with age: the majority of new cases are reported in women who are 60 to 65 years old. High BC mortality is explained by late diagnosis, when the disease has already progressed to an advanced stage. Metastatic BC is particularly dangerous, since it is resistant even to combination treatments based on chemotherapy, hormones and targeted drugs. The 5-year survival rate in patients with BC is 55 %. This creates a need for novel approaches to more effective screening and to targeted therapy of BC based on the molecular genetic profiling of tumors.
The rapid development of next generation sequencing (NGS) has yielded a wealth of information about genetic variants [2]. Many mutations are associated with BC, including somatic and germline mutations in the genes PIK3CA, STK11/LKB1, CDH1, ATM, CHEK2, BRIP1, and PALB2 and mutant variants of the highly penetrant genes associated with hereditary BC, such as TP53, PTEN, MLH1, BRCA1, and BRCA2 [3].
The majority of tumor mutations are somatic; they play an important role in the pathogenesis of cancer and confer de novo resistance to treatment. Many ongoing studies therefore use NGS to profile mutant variants in tumors. As a result, a significant number of new mutations with unknown function have been identified. Describing these polymorphisms requires mathematical algorithms that can automatically process huge data arrays, predict potentially pathogenic mutations and distinguish them from harmless variants. The resulting data can be used in developing screening or diagnostic tools (including liquid biopsy) and in selecting adequate targeted therapies.
In this work we analyze a range of mutations identified in key BC oncogenes by NGS, using a previously developed bioinformatic pipeline for the functional annotation of mutations and assessment of their pathogenicity.

METHODS
We obtained tumor samples from 16 patients at the Blokhin Russian Cancer Research Center, Moscow. The participants' age range was 27 to 76 years, with a mean of 50.7 ± 11.3 years. All patients had breast malignancies and received combination therapy. The inclusion criteria were as follows: age of 18 to 70 years, female sex, and histologically and cytologically confirmed breast cancer. The exclusion criteria were a medical history of other tumor types and pregnancy.
Disease stages were determined according to the TNM classification [4]. The study included patients with stages T1-3N0-3M0-1.
All patients gave voluntary informed consent. The study complied with the principles of confidentiality. Patients' clinicopathologic features are summarized in Table 1.

DNA isolation and quality control
DNA was isolated from the samples of tumor tissue using DNeasy Blood and Tissue Kit (Qiagen, USA). Tumor tissue was cut into small pieces, and buffer ATL was added to the samples. The samples were then treated with proteinase K, incubated at 56 °C until fully lysed, and treated with RNase A. Next, we added 200 µl buffer AL and 96 % ethanol. The resulting mixture was transferred to spin columns and centrifuged at 8,000 g for 1 min. The samples were washed with AW1 and AW2 buffers to remove salts (guanidine and SDS). The columns were eluted twice with 30 µl Low-TE buffer; the samples were incubated and centrifuged according to the manufacturer's protocol. Quality control of the obtained DNA was performed on Qubit 3.0 (Thermo Fisher Scientific, USA). The samples were also analyzed by electrophoresis in a 1 % agarose gel with ethidium bromide.

Sequencing of targeted oncogenes
DNA libraries were prepared using NEBNext Ultra DNA Library Prep Kit for Illumina (New England Biolabs, USA). The libraries were dual-indexed by PCR using the same kit and NEBNext Multiplex Oligos for Illumina (Dual Index Primers Set 1, New England Biolabs). Quality control of the obtained DNA libraries was performed on Agilent Bioanalyzer 2100 (Agilent Technologies, USA) using High Sensitivity Kit by the same manufacturer according to the official protocol.
For targeted enrichment of the coding regions of tumor genomes we used MYbaits Onconome KL v1.5 Panel (MYcroarray, USA). The enriched fragments were sequenced with 100 bp paired-end reads on HiSeq 2500 (Illumina, USA). Sample preparation and sequencing were done according to Illumina's protocols.

Bioinformatic analysis
Sequencing data were analyzed using an original algorithm developed previously [5]. First, the quality of reads was checked: sequences with read quality below 10 were removed from NGS data using Cutadapt software [6]. Then the reads were mapped to the reference genome hg19 (GRCh37.p13) using the Burrows-Wheeler Aligner algorithm [7]. PCR duplicates were removed by running the rmdup command in SAMtools [8].
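The preprocessing steps above can be sketched as a short list of command lines. This is only an illustration under stated assumptions, not the pipeline from [5]: the file naming scheme, the reference file name `hg19.fa`, the helper `build_preprocessing_commands`, and the intermediate `samtools sort` step (added because `rmdup` works on coordinate-sorted BAM files) are all hypothetical. The only parameter taken from the text is the quality cutoff of 10. Note that Cutadapt's `-q` option trims low-quality bases from read ends, which may differ from the exact filtering the authors applied.

```python
from typing import List


def build_preprocessing_commands(
    sample: str,
    reference: str = "hg19.fa",  # assumed file name for the hg19 reference
    min_quality: int = 10,       # quality cutoff stated in the text
) -> List[str]:
    """Return shell command lines for trimming, mapping and deduplication.

    The commands are only built, not executed, so the sketch can be
    inspected without the tools being installed.
    """
    # Hypothetical naming scheme for paired-end input and intermediate files.
    r1, r2 = f"{sample}_R1.fastq.gz", f"{sample}_R2.fastq.gz"
    t1, t2 = f"{sample}_R1.trimmed.fastq.gz", f"{sample}_R2.trimmed.fastq.gz"
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"

    return [
        # 1. Quality trimming of both mates with Cutadapt.
        f"cutadapt -q {min_quality} -o {t1} -p {t2} {r1} {r2}",
        # 2. Mapping to the reference with BWA-MEM.
        f"bwa mem {reference} {t1} {t2} > {sam}",
        # 3. Coordinate sorting (assumed intermediate step).
        f"samtools sort -o {sorted_bam} {sam}",
        # 4. Removal of PCR duplicates.
        f"samtools rmdup {sorted_bam} {dedup_bam}",
    ]


if __name__ == "__main__":
    for command in build_preprocessing_commands("patient01"):
        print(command)
```

Building the commands as plain strings keeps the order of the steps visible; in practice they would be run through a workflow manager or `subprocess`.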
Mutations were called with MuTect [9]. DNA sequences covered by at least 12 reads were considered the most significant.
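The coverage criterion can be illustrated with a small filter. This is a minimal sketch: the record layout (a dictionary with a `depth` key) and the example values are hypothetical and do not come from the study data; only the threshold of 12 reads is taken from the text.

```python
from typing import Dict, Iterable, List

MIN_DEPTH = 12  # minimum number of covering reads, as stated in the text


def filter_by_depth(calls: Iterable[Dict], min_depth: int = MIN_DEPTH) -> List[Dict]:
    """Keep only variant calls at positions covered by at least min_depth reads."""
    return [call for call in calls if call["depth"] >= min_depth]


if __name__ == "__main__":
    # Made-up example records; depths are illustrative, not measured values.
    example_calls = [
        {"gene": "TP53", "variant": "variant_A", "depth": 35},
        {"gene": "ATM", "variant": "variant_B", "depth": 11},
        {"gene": "BRCA1", "variant": "variant_C", "depth": 12},
    ]
    for call in filter_by_depth(example_calls):
        print(call["gene"], call["variant"], call["depth"])
```

The comparison is inclusive, so a position covered by exactly 12 reads passes the filter.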
To assess the functional effect of the discovered mutations, they were annotated in SnpEff and their effect on the encoded protein was predicted based on the analysis of genomic coordinates [10].
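Selecting variants that change the encoded protein can be sketched by reading the annotation that SnpEff writes into the VCF INFO column. The sketch assumes the `ANN` field format, in which each annotation is a `|`-separated record whose first four subfields are the allele, the effect, the putative impact (HIGH, MODERATE, LOW or MODIFIER) and the gene name; older SnpEff versions used an `EFF` field instead, and the text does not say which version was used. Treating HIGH and MODERATE impact as "affecting the protein sequence" is an assumption made here for illustration, and the example INFO strings are invented.

```python
from typing import Dict, List

# Names for the first four ANN subfields (the remaining ones are ignored here).
ANN_FIELDS = ("allele", "annotation", "impact", "gene")

# Assumption: HIGH (e.g. frameshift, stop gained) and MODERATE (e.g. missense)
# impacts are taken to mean that the protein sequence is altered.
PROTEIN_ALTERING_IMPACTS = {"HIGH", "MODERATE"}


def parse_ann(info: str) -> List[Dict[str, str]]:
    """Extract annotation records from a VCF INFO string containing ANN=..."""
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            records = []
            for annotation in entry[len("ANN="):].split(","):
                parts = annotation.split("|")
                records.append(dict(zip(ANN_FIELDS, parts)))
            return records
    return []  # no ANN field present


def affects_protein(info: str) -> bool:
    """True if any annotation of the variant has HIGH or MODERATE impact."""
    return any(
        record.get("impact") in PROTEIN_ALTERING_IMPACTS
        for record in parse_ann(info)
    )


if __name__ == "__main__":
    # Invented INFO strings, truncated to the subfields used above.
    missense = "DP=40;ANN=A|missense_variant|MODERATE|TP53"
    intronic = "DP=25;ANN=C|intron_variant|MODIFIER|BRCA1"
    print(affects_protein(missense), affects_protein(intronic))
```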
Based on the bioinformatic analysis and annotation of the identified polymorphisms, we selected those mutations that could significantly affect the regulatory or protein sequences. To assess pathogenicity and conservation of the mutations, we used data from COSMIC (Catalogue of Somatic Mutations In Cancer) [11] and dbNSFP [12]. Additionally, SIFT (Sorting Intolerant From Tolerant) and PolyPhen2 tools were used to predict pathogenicity of the mutations and assess their effect on the function of the encoded protein [13,14]. Information about mutation frequencies was obtained from the 1000 Genomes Project and the Exome Aggregation Consortium.
BULLETIN OF RSMU 6, 2017 VESTNIKRGMU.RU
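The way predictor scores and population frequencies can be combined into a single pathogenicity flag is sketched below. The text does not state the decision rule or cutoffs the authors used, so everything here is an assumption: the SIFT cutoff of 0.05 is the conventional one (lower scores are more damaging), the PolyPhen2 cutoff of 0.85 and the maximum population allele frequency of 0.01 are chosen purely for illustration, and requiring both predictors to agree is a deliberately conservative choice. The `Prediction` class and `is_likely_pathogenic` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    sift_score: Optional[float]       # lower means more likely damaging
    polyphen2_score: Optional[float]  # higher means more likely damaging
    population_af: Optional[float]    # allele frequency in a reference population


def is_likely_pathogenic(
    prediction: Prediction,
    sift_cutoff: float = 0.05,      # conventional SIFT cutoff (assumption here)
    polyphen_cutoff: float = 0.85,  # illustrative PolyPhen2 cutoff (assumption)
    max_af: float = 0.01,           # illustrative rarity cutoff (assumption)
) -> bool:
    """Flag a variant when it is rare and both predictors call it damaging."""
    # Variants that are common in the population are treated as harmless.
    if prediction.population_af is not None and prediction.population_af > max_af:
        return False
    # A missing score counts as "not damaging" for that predictor.
    sift_damaging = (
        prediction.sift_score is not None
        and prediction.sift_score <= sift_cutoff
    )
    polyphen_damaging = (
        prediction.polyphen2_score is not None
        and prediction.polyphen2_score >= polyphen_cutoff
    )
    return sift_damaging and polyphen_damaging


if __name__ == "__main__":
    # Invented scores for illustration only.
    print(is_likely_pathogenic(Prediction(0.01, 0.98, 0.0001)))
    print(is_likely_pathogenic(Prediction(0.01, 0.98, 0.20)))
```

A looser rule (either predictor damaging) would flag more variants at the cost of more false positives; the choice depends on whether the list is used for screening or for follow-up validation.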

DISCUSSION
In Russia, PCR-based methods for the detection of known mutations in BC-associated genes are the most widely used. However, more advanced methods of genetic screening are now available. The most promising is next generation sequencing, which can be used to identify genetic variants in malignant tumors and is especially suitable for exploring the variability of highly heterogeneous regions of tumor genomes. In this work we applied NGS to study a number of mutations of key oncogenes associated with BC and tested a previously developed algorithm for bioinformatic analysis of sequencing data.
One of the most well-studied genes playing a significant role in BC pathogenesis is TP53. It is involved in the regulation of the cell cycle, apoptotic activity and DNA repair. Mutations in TP53 lead to the disruption of these regulatory mechanisms and may trigger formation of cancer. TP53 is a tumor suppressor; mutant variants of this gene are detected in half of all cancers and in more than 30 % of BC cases. In turn, sporadic breast cancer is characterized by a varying frequency of TP53 mutations between 25 % and 86 %, depending on the disease stage and the screening technique applied. The prognostic value of TP53 mutations in BC has been sufficiently studied [17]. Among the mutations identified in our study the most frequent was c.376-283T>C discovered in 13 of 16 patients (81 %).
Patients with BC, and with certain BC subtypes in particular, have a relatively high frequency of BRCA1 and BRCA2 mutations. BRCA1 and BRCA2 are involved in the regulation of many cell processes maintaining genomic stability and homologous recombination [17]. Mutations in BRCA1 account for 80 % of all BRCA1 and BRCA2 mutations in Russians with BC. One of the most common mutant variants identified in Russian patients is 5382insC (rs80357906), which causes a reading frame shift and the loss of function of the encoded protein. The majority of the polymorphisms identified in our study were mutations in BRCA1 and BRCA2, the most common being c.5215+66G>A (rs3092994) in BRCA1, detected in 9 of 16 patients (56.3 %).
Our findings on ATM, TP53 and BRCA1 mutations are on the whole consistent with the literature, which reports TP53 variants to be the most common mutations in BC [17]. Our results on the diversity of BRCA1/2 variants are also comparable with the literature data. Importantly, mutations in these genes are associated with poor prognosis and the development of invasive ductal breast cancer. The presence of these mutations is taken into account when determining the extent of surgical intervention [17]. In our study, of the 12 patients with BC who had mutations in BRCA1 and BRCA2, 8 were diagnosed with invasive ductal carcinoma. Of those 8, six had the mutation c.5215+66G>A in BRCA1.
We have analyzed next generation sequencing data using the original bioinformatic approach and discovered many driver mutations in the samples of malignant breast tumors. Using different databases, we have selected and annotated functionally significant mutations. Altogether, we have discovered 14 mutations affecting the amino acid sequence of the encoded proteins. Each of the studied samples had at least one such mutation. The original bioinformatic protocol allowed us to automatically process DNA sequencing data obtained with NGS.

CONCLUSIONS
A combination of next generation sequencing and modern algorithms for bioinformatic analysis is an effective and clinically attractive method of screening for genetic polymorphisms and assessing the functional effect of mutations detected in the tumor. NGS already enables molecular classification of breast tumors and can be used to determine their subtypes depending on the spectrum of the identified mutations and the expression profiles of the studied genes. NGS data can facilitate the choice of adequate targeted therapies. One of the major tasks of cancer genetics is the development of convenient tools for the detection of breast cancer biomarkers that clinicians can use for more accurate diagnosis and effective treatment. We believe that advances in the field should include improvement of bioinformatic approaches, adoption of systems for automatic analysis of tumor genetic profiles and introduction of NGS into clinical routine.