Research article: 'Using fast homology search tools for protein sequence functional annotation: a comparison'

Field: Biological sciences

CC BY

Section 11

Using fast homology search tools for protein sequence functional annotation: a comparison

A. Pronozin, M. Genaev, D. Afonnikov

Institute of Cytology and Genetics SB RAS

Email: pronozinartem95@gmail.com

DOI: 10.24411/9999-017A-2020-10367

The large size of sequence databases makes it difficult to search for homologous sequences in reasonable time. We compare how well several fast search tools detect highly homologous sequences for A. thaliana protein sequences represented in the OrthoDB database, using the sequence ranking obtained by ClustalW as the reference. Query: 27,636 A. thaliana protein sequences from TAIR v10 [1]. Query homologs: orthologous genes from the OrthoDB database v10 [2]. Selected: 9,193 orthogroups (8,522,503 sequences) containing A. thaliana proteins. Programs: BLASTP, BLASTP-fast, Diamond, Usearch ('ublast', 'usearch_local'), MMseqs2, ClustalW. Sequence list comparison: F1, MAPK. GO terms associated with homologous hits: semantic similarity (SS), F1. The best value of the performance metrics at k=1 was obtained by BLASTP (0.95). Performance decreased at k=5 (0.75) and increased to 0.80 at k=6-10. These trends are similar for all tools. The GO term lists are highly similar (F1 metric) at k=1-10 (0.97). The semantic similarity measure shows almost no dependence on k for all gene ages. Smallest search time: Diamond. The results demonstrate that the optimal number of hits returned by a fast search program for functional annotation of the query sequence is 10. The fast homology search tools are able to identify true best hits from large databases within k up to 20 with sufficient accuracy.
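The list comparison described above can be illustrated with a minimal Python sketch. It assumes that "MAPK" stands for mean average precision at k and that the top-k hits of a fast tool are compared with the top-k entries of a reference ranking; the abstract does not spell out the exact definitions, so the function names and formulas below are illustrative assumptions, not the authors' implementation.

```python
def f1_at_k(predicted, reference, k):
    """F1 overlap between the top-k entries of two ranked hit lists."""
    top_pred = set(predicted[:k])
    top_ref = set(reference[:k])
    overlap = len(top_pred & top_ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(top_pred)
    recall = overlap / len(top_ref)
    return 2 * precision * recall / (precision + recall)


def average_precision_at_k(predicted, reference, k):
    """AP@k: precision averaged over the ranks at which a reference hit appears.

    Assumes `predicted` contains no duplicate identifiers.
    """
    relevant = set(reference[:k])
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(predicted[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    return score / min(len(relevant), k)


def map_at_k(predicted_lists, reference_lists, k):
    """Mean of AP@k over all queries (one ranked list per query)."""
    values = [
        average_precision_at_k(p, r, k)
        for p, r in zip(predicted_lists, reference_lists)
    ]
    return sum(values) / len(values) if values else 0.0


# Hypothetical sequence identifiers, for illustration only.
reference_ranking = ["s1", "s2", "s3", "s4"]
tool_ranking = ["s1", "s3", "s5", "s2"]
print(f1_at_k(tool_ranking, reference_ranking, 1))
print(f1_at_k(tool_ranking, reference_ranking, 3))
print(map_at_k([tool_ranking], [reference_ranking], 3))
```

Under these definitions, a tool that returns the same best hit as the reference scores 1.0 at k=1, while disagreements deeper in the list lower the score as k grows.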

Supported by Russian Science Foundation grant 18-14-00293. The computational resources of the Joint HPC Facility 'Bioinformatics' were used with the support of budget project No. 0324-2019-0040-C-01.

References:

1. Z. Mustafin, et al. "Phylostratigraphic Analysis Shows the Earliest Origination of the Abiotic Stress Associated Genes in A. thaliana", Genes, 10.12, pp. 963, 2019.

2. E. Kriventseva, et al. "OrthoDB v10: sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs", Nucleic acids research, 47.D1, pp. D807-D811, 2019.

Development of an algorithm for determining morphometric parameters of surface structures of lymphocytes on images of blood cells in research of type of leukosis

К. К. Самхарадзе, В. М. Михелев

Belgorod State National Research University

Email: mikhelev@bsu.edu.ru

DOI: 10.24411/9999-017A-2020-10296

This work is devoted to the development of a computer system for studying the type of leukemia based on analysis of the surface relief of lymphocytes in three-dimensional images of blood cells and, within the framework of this development, to the implementation of an algorithm for determining the morphometric parameters of their surface structures. Determining the type of leukemia is an important part of diagnosis, since the choice of treatment tactics and the prognosis of the disease depend on it [1, 2]. A currently relevant approach to this problem is the study of the surface relief of lymphocytes in three-dimensional images of blood cells obtained using atomic force microscopy (AFM) [3-5]. However, the software supplied with the equipment does not make it possible to study in adequate detail the surface structures of lymphocytes, which are characterized by globular depressions and protrusions whose morphometric parameters change dynamically during pathology. There is therefore a need for a computer system capable of finding globular depressions and protrusions on the surface of lymphocytes and determining their exact number and geometric parameters. To implement it, an algorithm has been developed for studying the surface relief of lymphocytes in three-dimensional images; it makes it possible to accurately and quickly identify all globular depressions and protrusions present, from microscopic to visually detectable, together with their geometric parameters [6, 7].
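The abstract does not describe how the algorithm works, so the following Python sketch is not the authors' method. It only illustrates one simple way to locate candidate protrusions and depressions, assuming the three-dimensional image is available as a rectangular grid of surface heights (a hypothetical `height_map`, given as a list of rows) and treating strict local maxima and minima as candidates.

```python
def find_surface_extrema(height_map):
    """Return (protrusions, depressions) as lists of (row, col, height).

    A cell is reported as a protrusion if it is strictly higher than every
    adjacent cell (up to 8 neighbours) and as a depression if it is strictly
    lower than every adjacent cell. This is an illustrative criterion only.
    """
    rows = len(height_map)
    cols = len(height_map[0]) if rows else 0
    protrusions, depressions = [], []
    for r in range(rows):
        for c in range(cols):
            value = height_map[r][c]
            # Collect the heights of all cells adjacent to (r, c).
            neighbours = [
                height_map[nr][nc]
                for nr in range(max(0, r - 1), min(rows, r + 2))
                for nc in range(max(0, c - 1), min(cols, c + 2))
                if (nr, nc) != (r, c)
            ]
            if not neighbours:
                continue  # a 1x1 grid has no neighbours to compare with
            if all(value > n for n in neighbours):
                protrusions.append((r, c, value))
            elif all(value < n for n in neighbours):
                depressions.append((r, c, value))
    return protrusions, depressions


# Synthetic height values, for illustration only.
height_map = [
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [1.0, 3.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 0.2, 1.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
]
protrusions, depressions = find_surface_extrema(height_map)
print(len(protrusions), len(depressions))
```

A practical system would also need noise filtering and a way to measure the extent of each structure; those steps are outside what the abstract specifies and are omitted here.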

This work was supported by the Russian Foundation for Basic Research (project no. 19-07-00133_A).

References:

1. Leukemia. American Cancer Society. URL: https://www.cancer.org/cancer/leukemia-in-children.html (accessed: 25.02.2019).
