
CC BY



DOI 10.24412/cl-37235-2024-1-87-93

MULTI-OMICS PORTRAYAL OF BREAST CANCERS

S. Davitavyan¹,²

¹Institute of Biomedicine and Pharmacy, Russian-Armenian (Slavonic) University
²Research Group of Bioinformatics, Institute of Molecular Biology NAS RA
surendavitavyan98@gmail.com

ABSTRACT

The factors that influence tumor development are diverse and ambiguous. Consequently, new data-mining approaches are needed to support efficient disease diagnostics, prognostics, and monitoring of the disease course.

This study aims to identify patterns of genetic changes that drive expression changes in tumor samples. Multi-omics data for analyzing the main manifestations of genetic shifts were obtained from the GDC database. We used a multilayer self-organizing map (ML-SOM) algorithm for clustering and dimension reduction. This approach makes it possible to distinguish the sets of factors associated with cancer and, at the same time, provides information on the distribution of multilayer genetic data among tumor samples.

Keywords: breast cancer, multi-omics data, self-organizing maps.

Introduction

Breast cancer is the most common cancer worldwide: at the end of 2020, 7.8 million women were alive who had been diagnosed within the previous five years. Women of all age groups are at risk of a breast cancer diagnosis, and the risk of developing the disease increases with age [1]. New approaches are becoming imperative because the course of the disease varies widely across the subtypes of ductal cancer. Receptor-based molecular classification methods are still applied in current clinical practice.

Nevertheless, new classification approaches have been developed. One example is the transcriptomics-based PAM50 classification, which uses expression signals from 50 genes to classify breast cancer [2]. Multi-omics data have revealed new ways to understand hidden relationships between genomic features and serve as a basis for developing machine learning models capable of comprehensive analysis of large biological datasets.

In our previous studies, we showcased the advantages of self-organizing map (SOM) pipelines for analyzing and describing multi-layer datasets [3]. Studies of low-grade gliomas [4], B-cell lymphomas [5], and other cancers support the ability of the SOM method to depict the origins and further progression of disease.

To uncover the key factors in breast carcinoma development, we applied multi-layer SOM machine learning algorithms to genomic, transcriptomic, and epigenetic data.

Materials and methods

Study Datasets

In this study, we used the available -omics datasets of the TCGA-BRCA project [6]. In total, RNA-seq counts, microarray-based promoter and gene-body methylation, microarray-based CNV, and SNV data were obtained for 996 samples.

Data preprocessing

RNA-seq counts were normalized for library size and converted to log-scale counts using a variance-stabilizing transformation.
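As a rough illustration of library-size normalization followed by a log transform, the sketch below scales each sample to counts per million and applies log2 with a pseudo-count. This is a simpler stand-in and not the variance-stabilizing transformation used in the study; the function name `log_cpm`, the pseudo-count, and the toy matrix are assumptions made for the example.

```python
import numpy as np

def log_cpm(counts, pseudo_count=1.0):
    """Scale each sample (column) to counts per million, then log2-transform.

    Assumption: a plain log2(CPM + pseudo_count) is used here for illustration;
    it is not a true variance-stabilizing transformation.
    """
    counts = np.asarray(counts, dtype=float)
    library_sizes = counts.sum(axis=0)        # total reads per sample
    cpm = counts / library_sizes * 1e6        # library-size normalization
    return np.log2(cpm + pseudo_count)

# Toy matrix: 3 genes (rows) x 2 samples (columns). The second sample is
# sequenced twice as deeply but has the same composition.
toy_counts = np.array([[10, 20],
                       [30, 60],
                       [60, 120]])
normalized = log_cpm(toy_counts)
```

After normalization the two toy samples have identical profiles, which is the point of correcting for sequencing depth before comparing samples.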

Promoter methylation data were converted from beta values to M-values.
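The conversion from beta values to M-values is commonly written as M = log2(beta / (1 - beta)). A minimal sketch follows; the clipping constant `eps`, which keeps beta values of exactly 0 or 1 finite, is an assumption and is not taken from the study.

```python
import numpy as np

def beta_to_m(beta, eps=1e-6):
    """Convert methylation beta values (0..1) to M-values via the logit on base 2."""
    # Clip to avoid division by zero and log of zero at the boundaries
    # (eps is an illustrative choice).
    beta = np.clip(np.asarray(beta, dtype=float), eps, 1.0 - eps)
    return np.log2(beta / (1.0 - beta))

m_values = beta_to_m([0.2, 0.5, 0.8])
```

A beta value of 0.5 maps to M = 0, and the transformation is symmetric around that point (0.2 and 0.8 give M-values of equal magnitude and opposite sign).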

CNVs were normalized by adding small numbers to avoid constant values.

SNVs were summarized by genes.
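The two steps above can be sketched as follows. The source does not specify how the small numbers were generated or how SNVs were aggregated, so both choices here are assumptions: Gaussian noise of a very small scale for the CNV values, and a simple count of variant calls per gene and sample for the SNVs. The helper names and the toy records are hypothetical.

```python
import numpy as np
from collections import Counter

def add_jitter(values, scale=1e-6, seed=0):
    """Add tiny random noise so that no gene profile is exactly constant.

    Assumption: Gaussian noise; the original procedure is not described.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    return values + rng.normal(0.0, scale, size=values.shape)

def count_snvs_per_gene(variant_records):
    """Count SNV calls per (gene, sample) pair.

    variant_records: iterable of (sample_id, gene) tuples, one per variant call.
    """
    counts = Counter()
    for sample_id, gene in variant_records:
        counts[(gene, sample_id)] += 1
    return counts

# Toy inputs (hypothetical): a flat CNV matrix and three variant calls.
flat_cnv = np.ones((2, 4))
jittered_cnv = add_jitter(flat_cnv)
toy_calls = [("S1", "TP53"), ("S1", "TP53"), ("S2", "PIK3CA")]
snv_counts = count_snvs_per_gene(toy_calls)
```

Breaking exact ties matters because constant profiles have zero variance, which makes scaling and correlation-based steps undefined.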

Samples were also divided into subtypes using the PAM50 gene-signature classification, which separates breast cancer into Basal (N = 161), LumA (N = 480), LumB (N = 199), HER2E (N = 74), normal-like (N = 32), True Normal (N = 46), and CLOW (N = 4) groups.

Integrated analysis of cancer molecular features with multi-layer SOM

To conduct a comprehensive integrative analysis of the breast cancer -omics datasets, we employed a refined multi-layer self-organizing map (ml-SOM) approach that extends our prior research. The self-organizing map (SOM) algorithm is a neural network-based technique for dimensionality reduction and clustering. Numerous studies have demonstrated its effectiveness in grouping genes into co-expressed modules based on their expression profiles. Furthermore, the SOM implementation in the "oposSOM" package provides function-mining capabilities that facilitate the assignment of biological functions to gene clusters. This reduces the high-dimensional gene space to a set of differentially expressed functional modules. As a result, the approach enables a transition from analyzing individual genes to systems-level analysis while preserving the original information.

In this approach, we organized all the datasets into distinct layers and trained them collectively on a single SOM grid, similar to a classical single-layer SOM (sl-SOM) [7].
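The idea of training several data layers on one grid can be illustrated with a very small SOM written from scratch. This is only a sketch under stated assumptions and not the oposSOM implementation: each layer is taken to be a genes x samples matrix, layers are centered per gene, scaled by the layer's overall standard deviation, and concatenated feature-wise with optional layer weights. The grid size, learning-rate schedule, and all function names are illustrative choices.

```python
import numpy as np

def stack_layers(layers, layer_weights=None):
    """Center each genes x samples layer per gene, scale it, and concatenate layers."""
    if layer_weights is None:
        layer_weights = [1.0] * len(layers)
    scaled = []
    for layer, w in zip(layers, layer_weights):
        layer = np.asarray(layer, dtype=float)
        centered = layer - layer.mean(axis=1, keepdims=True)
        # sqrt(w) makes w act as a weight on the squared Euclidean distance.
        scaled.append(np.sqrt(w) * centered / (layer.std() + 1e-12))
    return np.hstack(scaled)

def train_som(data, grid=(4, 4), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a rectangular SOM; returns unit weights of shape (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of every unit, used by the neighborhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    weights = rng.normal(size=(rows * cols, data.shape[1]))
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5      # shrinking neighborhood radius
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
            grid_dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def assign_units(data, weights):
    """Index of the best-matching unit for every row (gene) of data."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Toy data (hypothetical): 60 genes, 5 samples, two layers.
rng = np.random.default_rng(1)
expression = rng.normal(size=(60, 5))
methylation = rng.normal(size=(60, 5))
features = stack_layers([expression, methylation])
som_weights = train_som(features, grid=(4, 4))
gene_to_unit = assign_units(features, som_weights)
```

Because every gene is placed on the grid using its profile across all layers at once, genes that land in the same unit share behavior in expression and in the other data categories, which is what allows one spot to be compared across layers.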

Analysis of survival

Survival analysis (overall, disease-free, disease-specific, and progression-free survival) was performed using Cox proportional hazards regression with the contsurvplot R package [8]. The model includes survival as the dependent variable, and group information and spot -omic profiles as predictors.
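To make the structure of such a model concrete, the sketch below fits a Cox proportional hazards model by Newton-Raphson on the partial likelihood, with one binary group indicator and one continuous spot score as predictors. It is a from-scratch illustration on simulated data and not the contsurvplot workflow used in the study; the simulated effect sizes, the censoring time, and the function names are assumptions.

```python
import numpy as np

def cox_partial_loglik(beta, X, time, event):
    """Partial log-likelihood with gradient and Hessian (no special tie handling)."""
    eta = X @ beta
    p = len(beta)
    loglik, grad, hess = 0.0, np.zeros(p), np.zeros((p, p))
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]               # risk set at this event time
        w = np.exp(eta[at_risk])
        xr = X[at_risk]
        mean_x = (w[:, None] * xr).sum(axis=0) / w.sum()
        second = (w[:, None, None] * xr[:, :, None] * xr[:, None, :]).sum(axis=0) / w.sum()
        loglik += eta[i] - np.log(w.sum())
        grad += X[i] - mean_x
        hess -= second - np.outer(mean_x, mean_x)
    return loglik, grad, hess

def fit_cox(X, time, event, n_iter=25, tol=1e-8):
    """Estimate log hazard ratios by Newton-Raphson, starting from zero."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        _, grad, hess = cox_partial_loglik(beta, X, time, event)
        step = np.linalg.solve(hess, grad)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated cohort (hypothetical): higher hazard in group 1 and with a higher spot score.
rng = np.random.default_rng(0)
n = 300
group = rng.integers(0, 2, size=n).astype(float)
spot_score = rng.normal(size=n)
X = np.column_stack([group, spot_score])
true_beta = np.array([1.0, 0.5])
latent_time = rng.exponential(scale=1.0 / np.exp(X @ true_beta))
censor_time = 2.0                               # administrative censoring
time = np.minimum(latent_time, censor_time)
event = latent_time <= censor_time
beta_hat = fit_cox(X, time, event)
hazard_ratios = np.exp(beta_hat)
```

Each coefficient is a log hazard ratio: a positive value for the group indicator means a higher hazard, and therefore worse survival, in that group relative to the reference.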

Results

The ml-SOM algorithm not only highlighted the expected differences between subtypes but also revealed relationships between data categories (Figure 1A). An initial visual comparison of the group portraits identified subtype-specific spots as well as several clusters shared between groups. To test the latter observation, we performed a correlation analysis of expression levels for each subtype (Figure 1B). The analysis indicates similarity between the True Normal and normal-like groups and between the Basal and HER2E groups. Correlations between the LumA and Basal groups, and between the LumB and normal-like groups, were weak.

Figure 1. (A) SOM group portraits. From left to right: transcriptomics, methylation, copy number variations, and single nucleotide variants. Red spots belong to highly expressed clusters, which are the targets of this research. (B) Correlations between groups at the transcriptomic level. True Normal and normal-like, as well as Basal and HER2E, subtypes show a high correlation. (C) Gene clusters defined by k-means. Spot A differs not only between subtypes but also between data categories.
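A correlation analysis between subtypes of the kind described above can be sketched as follows: average the expression profiles within each subtype, center each gene across subtypes, and correlate the resulting group profiles. The centering step, the function name, and the random toy data are assumptions for illustration; the study's exact procedure is not specified.

```python
import numpy as np

def group_correlation(expression, labels):
    """Pearson correlation between mean expression profiles of sample groups.

    expression: genes x samples matrix; labels: subtype label of each sample.
    """
    expression = np.asarray(expression, dtype=float)
    labels = np.asarray(labels)
    groups = sorted(set(labels.tolist()))
    # Mean profile of each subtype (one column per group).
    means = np.column_stack([expression[:, labels == g].mean(axis=1) for g in groups])
    # Center genes so correlations reflect subtype-specific deviations.
    means = means - means.mean(axis=1, keepdims=True)
    return groups, np.corrcoef(means, rowvar=False)

# Toy data (hypothetical): 50 genes, 6 samples in three subtypes.
rng = np.random.default_rng(0)
toy_expression = rng.normal(size=(50, 6))
toy_labels = ["Basal", "Basal", "LumA", "LumA", "HER2E", "HER2E"]
subtypes, corr = group_correlation(toy_expression, toy_labels)
```

The returned matrix is symmetric with ones on the diagonal; off-diagonal entries close to 1 would indicate subtypes with similar expression portraits.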

Furthermore, gene clusters defined by k-means clustering were investigated (Figure 1C). This part of the study aimed to identify specific transcriptome-level changes across subtypes and to suggest possible reasons for these changes by uncovering the influence of the other data categories. We chose Spot A because it is specific to several subtypes and takes different values across data categories. To assess how gene expression levels depend on the methylation, CNV, and SNV data categories, we trained linear regression models. The methylation profile was significantly negatively correlated with expression levels in the Basal and normal-like groups, while a positive association with CNV was observed in LumA, LumB, and HER2E cancers. No significant trend was observed for SNV in any cancer subtype (Figure 2).

Figure 2. Trained linear regression models. The Y-axis shows log-transformed Gex values; from left to right, the X-axes show Gmx, CNV, and SNV data. Colors denote subtypes (legend groups: True Normal, HER2E, LumB, LumA, Basal, normal-like).
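A per-subtype linear regression of expression on another data layer can be sketched with ordinary least squares. The example below is illustrative only: the helper name `fit_by_group`, the subtype labels attached to the toy points, and the noiseless toy values are assumptions and do not reproduce the study's data or model settings.

```python
import numpy as np

def fit_by_group(x, y, groups):
    """Fit y = intercept + slope * x separately for each group.

    Returns {group: (slope, intercept, pearson_r)}.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    results = {}
    for g in sorted(set(groups.tolist())):
        mask = groups == g
        slope, intercept = np.polyfit(x[mask], y[mask], deg=1)  # highest power first
        r = np.corrcoef(x[mask], y[mask])[0, 1]
        results[g] = (slope, intercept, r)
    return results

# Toy data (hypothetical): methylation M-values versus log expression.
methylation = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, -2.0, -1.0, 0.0, 1.0, 2.0])
labels = ["Basal"] * 5 + ["LumA"] * 5
expression = np.where(np.array(labels) == "Basal",
                      5.0 - 0.8 * methylation,    # negative dependence
                      4.0 + 0.1 * methylation)    # weak positive dependence
fits = fit_by_group(methylation, expression, labels)
```

A negative slope for a subtype corresponds to the pattern reported for methylation, where higher methylation accompanies lower expression.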

The cluster contains 114 genes. To interpret a gene set of this size, we performed gene enrichment analysis using the online tool WebGestalt (Figure 3) [9]. We used KEGG pathways [10] to understand the likely phenotypic changes arising from the combination of changes at different levels in genes from Spot A.

Figure 3. Gene enrichment analysis. KEGG pathways enriched among genes from the selected spot, plotted by enrichment ratio: DNA replication, One carbon pool by folate, Fatty acid biosynthesis, Cell cycle, Amino sugar metabolism, p53 signaling pathway, and Pathways in cancer. The enrichment ratio shows the significance of the results.

Enrichment was significant for the DNA replication [11], One carbon pool by folate [12], and Cell cycle [13] pathways.
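For orientation, over-representation analysis typically reports an enrichment ratio (observed overlap divided by the overlap expected by chance) together with a hypergeometric p-value. The sketch below computes both with the standard library. Only the gene-set size of 114 comes from the text; the universe size, pathway size, and overlap are hypothetical numbers chosen for the example.

```python
from math import comb

def enrichment(n_universe, n_pathway, n_selected, n_overlap):
    """Enrichment ratio and one-sided hypergeometric p-value P(X >= n_overlap)."""
    expected = n_selected * n_pathway / n_universe
    ratio = n_overlap / expected
    upper = min(n_pathway, n_selected)
    # Sum the hypergeometric probabilities of overlaps at least as large as observed.
    tail = sum(comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
               for k in range(n_overlap, upper + 1))
    p_value = tail / comb(n_universe, n_selected)
    return ratio, p_value

# Hypothetical example: 114 selected genes, a 36-gene pathway,
# a universe of 20000 genes, and 5 genes in the overlap.
ratio, p_value = enrichment(n_universe=20000, n_pathway=36,
                            n_selected=114, n_overlap=5)
```

The ratio measures effect size and the p-value measures statistical significance, so a pathway can have a large ratio from a very small overlap and still need the p-value (usually after multiple-testing correction) to be judged significant.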

Survival analysis also showed that patients with the LumA and LumB subtypes had better survival than those with the HER2E and Basal subtypes [14] (Figure 4).

Cross-checking the gene enrichment and survival results against the literature supports the applicability of this approach to multi-omics data.

Figure 4. Overall survival of breast cancer subtypes based on the transcriptomic level (X-axis: time in days). The colors refer to subtypes. Some subtypes do not yield statistically significant results because of the small number of patients.

Conclusions

Machine learning methods enable a comprehensive study of the nature of breast cancer, identifying the factors that determine the levels of gene expression.

The methylation profile exhibited a significant negative correlation with expression levels in the Basal and normal-like groups, while a positive association with CNV was observed in LumA, LumB, and HER2E cancers. No significant trend was observed for SNV in any cancer subtype.

LumA and LumB subtypes have a better survival probability than other subtypes.

REFERENCES

1. World Health Organization. Breast cancer (fact sheet). https://www.who.int/news-room/fact-sheets/detail/breast-cancer

2. Chia S.K. et al. "A 50-gene intrinsic subtype classifier for prognosis and prediction of benefit from adjuvant tamoxifen." Clinical Cancer Research 18.16 (2012): 4465-4472.

3. Löffler-Wirth H., Kalcher M., Binder H. "oposSOM: R-package for high-dimensional portraying of genome-wide expression landscapes on Bioconductor." Bioinformatics 31.19 (2015): 3225-3227.

4. Binder H. et al. "Integrated Multi-Omics Maps of Lower-Grade Gliomas." Cancers 14.11 (2022): 2797. doi:10.3390/cancers14112797

5. Hopp L., Nersisyan L., Löffler-Wirth H., Arakelyan A., Binder H. "Epigenetic Heterogeneity of B-Cell Lymphoma: Chromatin Modifiers." Genes 6.4 (2015): 1076-1112. https://doi.org/10.3390/genes6041076

6. Cancer Genome Atlas Research Network et al. "The Cancer Genome Atlas Pan-Cancer analysis project." Nature Genetics 45.10 (2013): 1113-1120. doi:10.1038/ng.2764

7. Löffler-Wirth H. et al. "oposSOM: R-package for high-dimensional portraying of genome-wide expression landscapes on Bioconductor." Bioinformatics 31.19 (2015): 3225-3227. doi:10.1093/bioinformatics/btv342

8. Denz R., Timmesfeld N. "Visualizing the (Causal) Effect of a Continuous Variable on a Time-To-Event Outcome." Epidemiology 34.5 (2023): 652-660. doi:10.1097/EDE.0000000000001630

9. Wang J., Vasaikar S., Shi Z., Greer M., Zhang B. "WebGestalt 2017: a more comprehensive, powerful, flexible and interactive gene set enrichment analysis toolkit." Nucleic Acids Research 45.W1 (2017): W130-W137. https://doi.org/10.1093/nar/gkx356

10. Kanehisa M. et al. "KEGG for taxonomy-based analysis of pathways and genomes." Nucleic Acids Research 51.D1 (2023): D587-D592. doi:10.1093/nar/gkac963

11. Rajamanickam S., Park J., Subbarayalu P. et al. "Targeting aberrant replication and DNA repair events for treating breast cancers." Communications Biology 5 (2022): 493. https://doi.org/10.1038/s42003-022-03413-w

12. Xu X., Chen J. "One-carbon metabolism and breast cancer: an epidemiological perspective." Journal of Genetics and Genomics 36.4 (2009): 203-214. doi:10.1016/S1673-8527(08)60108-3

13. Thu K.L. et al. "Targeting the cell cycle in breast cancer: towards the next phase." Cell Cycle 17.15 (2018): 1871-1885. doi:10.1080/15384101.2018.1502567

14. Nguyen C.V. et al. "Molecular classification predicts survival for breast cancer patients in Vietnam: a single institutional retrospective analysis." International Journal of Clinical and Experimental Pathology 14.3 (2021): 322-337.

