Scientific article: 'Identification of patients with breast cancer by using machine learning algorithms over scikit-learn ml framework' (field: Medical technologies)

CC BY
Keywords
BREAST CANCER / MACHINE LEARNING ALGORITHMS / DATA CLASSIFICATION / COMPUTER AIDED PROGNOSIS AND DIAGNOSIS

Abstract of the scientific article on medical technologies; authors: Shamiluulu Sh., Djakbarova U.

In this research study the effect of normalization techniques is examined. Five different supervised machine learning algorithms, i.e., KNN, Decision tree, Naive Bayes, Logistic regression and ANN, are used on a breast cancer dataset obtained from the UCI machine learning repository, and their performances are compared. The study reveals that different preprocessing techniques can increase the classification accuracy to over 90%, with the highest performance given by Logistic regression and ANN. The proposed approach can be applied to a well-known benchmark medical problem with real clinical data for breast cancer diagnosis.


Вестник КазНМУ №1-2018


A. Karaka

Harvard Medical School, Boston, USA

RARE GENETIC DISEASES ASSOCIATED WITH BRACHYDACTYLY

Summary: Brachydactyly (BD) is characterized by underdevelopment of the phalanges and shortening of the fingers or toes. The clinical spectrum ranges from minor digital hypoplasia to aplasia. It is inherited in an autosomal dominant manner with reduced penetrance and variable expressivity. The various types of isolated brachydactyly are rare, except for types A3 and D. Brachydactyly can occur either as an isolated malformation or as part of a complex malformation syndrome. The nature of genetic counseling depends on the inheritance pattern of the type of brachydactyly and on the presence or absence of accompanying symptoms. This review thus summarizes physical and molecular data on the different types of brachydactyly.

Keywords: brachydactyly (BD), genetics; hand anomalies; shortening of the foot; congenital bone malformations

UDC 57.088

Sh. Shamiluulu1, U. Djakbarova2

1Department of Computer Science, Suleyman Demirel University, Kaskelen, Kazakhstan, 040900
2Department of Molecular Biology and Medical Genetics, Kazakh National Medicine University named after Asfendiyarov, Almaty, 035000

IDENTIFICATION OF PATIENTS WITH BREAST CANCER BY USING MACHINE LEARNING ALGORITHMS OVER SCIKIT-LEARN ML FRAMEWORK

In this research study the effect of normalization techniques is examined. Five different supervised machine learning algorithms, i.e., KNN, Decision tree, Naive Bayes, Logistic regression and ANN, are used on a breast cancer dataset obtained from the UCI machine learning repository, and their performances are compared. The study reveals that different preprocessing techniques can increase the classification accuracy to over 90%, with the highest performance given by Logistic regression and ANN. The proposed approach can be applied to a well-known benchmark medical problem with real clinical data for breast cancer diagnosis.

Keywords: Breast cancer, Machine Learning Algorithms, Data Classification, Computer Aided Prognosis and Diagnosis

I - Introduction.

Presently, the use of artificial intelligence (AI) has become widely accepted in medical applications. This is manifested by an increasing number of medical devices currently available on the market with embedded AI algorithms [1]. Such devices are being used in cancer diagnosis, where the prognosis and diagnosis of breast cancer play an important role. Breast cancer is the most common cancer among women, except for skin cancers. According to CDC statistics, 1 in 8 (12%) women in the US will develop invasive breast cancer during their lifetime. Breast cancer starts when cells in the breast begin to grow out of control [8]. These cells usually form a tumor that can often be seen on an x-ray or felt as a lump. The tumor is malignant (cancerous) if the cells can grow into (invade) surrounding tissues or spread (metastasize) to distant areas of the body. Breast cancer occurs almost entirely in women, but it can also occur in men. It is also important to understand that most breast lumps are not cancer; they are benign. Benign breast tumors are abnormal growths, but they do not spread outside of the breast and they are not life threatening. However, some benign breast lumps can increase a woman's risk of getting breast cancer. Any breast lump or change needs to be checked by a health care provider to determine whether it is benign or cancerous, and whether it might affect future cancer risk [4].

The goal of this study is to detect the presence of a tumor and classify it into one of two classes, benign or malignant. During the analysis we studied the effect of preprocessing and normalization techniques on the classification model. The published literature suggests that machine learning (ML) algorithms are valuable tools for reducing the workload on clinicians by detecting artefacts and providing decision support, potentially with the ability to automatically re-estimate the prediction or classification model in real time.

II - Materials and methods.

2.1 Machine Learning Algorithms.

The scikit-learn machine learning framework with five algorithms has been used to evaluate the classification performance on breast cancer dataset. The brief explanations for algorithms are provided below.

1. K Nearest Neighbors (KNN) is one of the simplest supervised machine learning algorithms. The logic behind this method is to find a predefined number of training samples closest in distance to the new point, and to predict the label from these data points. Despite its simplicity, the nearest neighbors method has been successful in a large number of classification and regression problems. The Euclidean distance is generally used as the distance metric. For detailed information refer to [1].

2. Decision Trees (D-Tree) is a supervised learning method that is used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. This method has some advantages: it is simple to understand and easy to interpret, trees can be visualized, and it requires little data preparation. The method is based on the information theory paradigm. More information can be found in [1].

3. Gaussian Naive Bayes (NB) is a classification technique based on Bayes' Theorem. In general, the Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. For example, a fruit may be considered to be an orange if it is orange, round, and about 10 cm in diameter. Even if these features depend on each other or upon the existence of the other features, all of these properties are treated as contributing independently to the probability that this fruit is an orange, and that is why the method is known as 'Naive'. The advantage of this method is that a Naive Bayes model is easy to build and particularly useful for very large data sets. For details refer to [1].

4. Logistic regression (Logit) belongs to the family of regression models in which the output value is binary or dichotomous. The prediction curve is S-shaped and based on a sigmoid function [1]. Because of its non-linear nature, this algorithm gives one of the best classification models for the data; for details, see the results and discussion section.

5. Artificial Neural Networks (ANN) are a newer alternative to Logit, the statistical technique with which they share the most similarities. Neural networks are algorithms that are patterned after the structure of the human brain [1]. They contain a series of mathematical equations that are used to simulate biological processes such as learning and memory. In ANNs, one has the same goal as in Logit modeling: predicting an outcome based on the values of some predictor variables.
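As a rough sketch of how the five algorithms above map onto scikit-learn, the snippet below creates one estimator per algorithm and fits each on a small synthetic dataset. The class choices, the hyperparameters (`n_neighbors=5`, `criterion="entropy"`, `hidden_layer_sizes=(16,)`, `max_iter`) and the synthetic data are assumptions made for illustration; the paper does not report the settings that were actually used.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical settings: the paper does not list its hyperparameters.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),  # Euclidean distance by default
    "D-Tree": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "NB": GaussianNB(),
    "Logit": LogisticRegression(max_iter=1000),
    "ANN": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
}

# Small synthetic binary problem, used only to show the shared fit/score API.
X_toy, y_toy = make_classification(n_samples=200, n_features=10, random_state=0)

toy_scores = {}
for name, model in models.items():
    model.fit(X_toy, y_toy)
    # Training accuracy only; this is not a held-out test score.
    toy_scores[name] = model.score(X_toy, y_toy)
    print(f"{name}: {toy_scores[name]:.3f}")
```

All five estimators expose the same `fit` and `score` methods, which is what makes a side-by-side comparison of the algorithms straightforward.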

2.2. Data collections.

The dataset was obtained from the UCI machine learning repository. There are 31 features and over 600 instances. Table 1 shows details of the attributes with correlation coefficients. The target attribute has two categories, benign and malignant. The hold-out method was used for training and testing the models, with 70% of the data in the training set and 30% in the testing set.
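The 70/30 hold-out split described above could be sketched as follows. The example uses the copy of the UCI Wisconsin Diagnostic Breast Cancer data that ships with scikit-learn, which omits the `id` column; whether this matches the exact file used by the authors is an assumption, as are the `stratify` and `random_state` settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Bundled copy of the breast cancer data (assumed to match the authors' source).
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Hold-out method: 70% of the rows for training, 30% for testing.
# stratify=y keeps the benign/malignant ratio the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

print("train:", X_train.shape, "test:", X_test.shape)
```

Fixing `random_state` makes the split reproducible, which matters when accuracy scores from different preprocessing variants are compared on the same test set.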

Figure 1 - Correlational plot for all features

In Figure 1, we can see a correlational plot for the 31 features. The red square cells indicate high correlation, whereas the blue dots indicate low correlation. The dataset has been divided into two parts, one with highly correlated features and one with low correlated features. The goal was to study the effect of correlation and preprocessing techniques on the classification performance of the algorithms. The correlation was computed using the Spearman method, because it is better suited to non-linear data. Features in the highly correlated set satisfy 0.5 ≤ |r| ≤ 1, whereas features in the low correlated set satisfy |r| < 0.5.
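A minimal sketch of the correlation-based split is given below. It assumes that the coefficients are Spearman correlations between each feature and the class label, computed on scikit-learn's bundled copy of the dataset; the names `high_features` and `low_features` are hypothetical, and the resulting values need not match Tables 1 and 2 exactly.

```python
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)
df = data.frame  # 30 feature columns plus a "target" column

# Absolute Spearman rank correlation of every feature with the class label.
rho = df.corr(method="spearman")["target"].drop("target").abs()

# Split at |r| = 0.5, as described in the text.
high_features = rho[rho >= 0.5].index.tolist()
low_features = rho[rho < 0.5].index.tolist()

print(len(high_features), "highly correlated features")
print(len(low_features), "low correlated features")
```

The absolute value is taken because the sign of the coefficient depends only on how the two classes are encoded.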

Table 1 - Highly correlated features set

| Feature ID | Name | Correlation coefficient | Description |
|---|---|---|---|
| F11 | radius_mean | 0.730 | Mean of distances from center to points on the perimeter |
| F12 | perimeter_mean | 0.743 | Mean of perimeter |
| F13 | area_mean | 0.709 | Mean of area |
| F14 | compactness_mean | 0.597 | Mean of compactness, perimeter^2 / area - 1.0 |
| F15 | concavity_mean | 0.696 | Mean of concavity, severity of concave portions of the contour |
| F16 | concave_points_mean | 0.777 | Mean of concave points, number of concave portions of the contour |
| F17 | radius_se | 0.567 | Standard error of distances from center to points on the perimeter |
| F18 | perimeter_se | 0.556 | Standard error of perimeter |
| F19 | area_se | 0.548 | Standard error of area |
| F110 | radius_worst | 0.776 | "Worst" or largest (mean of the three largest values) of distances from center to points on the perimeter |
| F111 | perimeter_worst | 0.783 | Mean of the three largest values of perimeter |
| F112 | area_worst | 0.734 | Largest (mean of the three largest values) of area |
| F113 | compactness_worst | 0.591 | Compactness's mean of the three largest values |
| F114 | concavity_worst | 0.660 | Concavity's largest |
| F115 | concave_points_worst | 0.794 | Worst of concave points |

Figure 2 - Correlational plot for highly correlated features

Table 2 - Low correlated features set

| Feature ID | Feature name | Correlation coefficient | Description |
|---|---|---|---|
| F21 | id | 0.040 | ID number |
| F22 | texture_mean | 0.415 | Standard deviation of gray-scale values |
| F23 | smoothness_mean | 0.359 | Mean of local variation in radius lengths |
| F24 | symmetry_mean | 0.330 | Mean of symmetry |
| F25 | fractal_dimension_mean | -0.013 | Mean of "coastline approximation" - 1 |
| F26 | texture_se | -0.008 | Standard error of texture (standard deviation of gray-scale values) |
| F27 | smoothness_se | -0.067 | Standard error of smoothness (local variation in radius lengths) |
| F28 | compactness_se | 0.293 | Standard error of compactness (perimeter^2 / area - 1.0) |
| F29 | concavity_se | 0.254 | Standard error of concavity (severity of concave portions of the contour) |
| F210 | concave_points_se | 0.408 | Standard error of concave points (number of concave portions of the contour) |
| F211 | symmetry_se | -0.007 | Standard error of symmetry |
| F212 | fractal_dimension_se | 0.078 | Standard error of fractal dimension ("coastline approximation" - 1) |
| F213 | texture_worst | 0.457 | Largest (mean of the three largest values) of texture |
| F214 | smoothness_worst | 0.421 | Worst local variation in radius lengths |
| F215 | symmetry_worst | 0.416 | Mean of the three largest values of symmetry |
| F216 | fractal_dimension_worst | 0.324 | Largest of "coastline approximation" - 1 |

Figure 3 - Correlational plot for low correlated features

III - Literature review.

This section reviews several studies on applications of machine learning algorithms to medical data, especially data related to cancer. A great variety of methods have been used, reaching high prediction and classification accuracies on datasets generally taken from the UCI ML repository. Zhongyu Pang and Lloyd, S.R. (2008) developed a signal classification method capable of differentiating subjects with sleep disorders that cause excessive daytime sleepiness (EDS) from normal control subjects who do not have a sleep disorder, based on EEG and pupil size [2]. In another study, Kiyan et al. trained a neural network using back propagation and achieved an accuracy of approximately 94% on the test data for breast cancer [3].

The authors of [4] presented a BP-ANN approach in which they used 47 input features and achieved an accuracy of 95%. Morise et al. used a logistic regression algorithm on a heart disease dataset. By applying various preprocessing techniques, they obtained a classification accuracy of 77.0% [5]. Further, Kamruzzaman et al. proposed a neural network ensemble based methodology for heart disease diagnosis and achieved a prediction accuracy of over 80% [6]. Moreover, Das et al. [7] in 2008 applied genetic algorithm (GA) based neuro-fuzzy techniques to breast cancer identification and introduced an adaptive neuro-fuzzy classifier to classify tumor masses in the breast. The studies above show that ML algorithms can be successfully applied in the medical field.


IV - Implementation, results and discussions.

Model simulations were performed with the scikit-learn ML framework for the 5 algorithms explained in Section 2.1 on the breast cancer dataset. In order to reveal the true potential of the algorithms, the dataset has been divided into two parts, shown in Tables 1 and 2. The first part contains the highly correlated features, where r >= 0.5, and the second part contains the features where r < 0.5.

Figure 4 - Descriptive analysis for highly correlated features

Descriptive analysis was performed for the two sets and is shown in Figures 4 and 5. Based on this analysis, one of the features with a correlation of 0.85 was removed from the dataset.
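The text does not say which feature was removed or how the 0.85 value was applied. One possible reading, sketched below purely as an assumption, is that whenever two features are correlated with each other at |r| ≥ 0.85, one of the pair is dropped. The names `THRESHOLD`, `to_drop` and `X_reduced` are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

X = load_breast_cancer(as_frame=True).data
THRESHOLD = 0.85  # assumed reading of the 0.85 value mentioned in the text

# Absolute pairwise Spearman correlations between features.
corr = X.corr(method="spearman").abs()

# Keep only the strict upper triangle so each feature pair is inspected once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop a feature if it is highly correlated with any earlier feature.
to_drop = [col for col in upper.columns if (upper[col] >= THRESHOLD).any()]
X_reduced = X.drop(columns=to_drop)

print(f"dropped {len(to_drop)} of {X.shape[1]} features")
```

With this rule the first feature of each correlated pair is kept, so the outcome depends on column order; a different tie-breaking rule would keep a different subset.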

Figure 5 - Descriptive analysis for low correlated features

The accuracy scores before preprocessing are given in Figure 6. We can see that Logit and DT perform the best. The standardization preprocessing technique gave the highest accuracy scores for the ML algorithms.

Figure 6 - Accuracy scores on not preprocessed data

There are other important concepts related to real-world applications, where data will not come naturally as a list of real-valued features. In these cases, we need methods to transform non-real-valued features into real-valued ones.

In addition, feature standardization and normalization steps are needed to avoid undesired effects caused by differing value ranges.
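To make the two rescaling steps concrete, the sketch below applies both to a small made-up array. It assumes that "normalization" means min-max scaling, x' = (x - min) / (max - min), and that "standardization" means z = (x - mean) / std; the paper does not state which scikit-learn transformers were used, so `MinMaxScaler` and `StandardScaler` are assumptions.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two toy features on very different scales (values are made up).
X_demo = np.array([[10.0, 200.0],
                   [12.0, 800.0],
                   [15.0, 500.0],
                   [20.0, 1100.0]])

# Min-max normalization: rescale each column to the [0, 1] range.
X_minmax = MinMaxScaler().fit_transform(X_demo)
col_min, col_max = X_demo.min(axis=0), X_demo.max(axis=0)
manual_minmax = (X_demo - col_min) / (col_max - col_min)

# Standardization: zero mean and unit variance per column.
X_std = StandardScaler().fit_transform(X_demo)
manual_std = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(X_minmax)
print(X_std)
```

After either transform, both columns live on comparable scales, so a distance-based method such as KNN is no longer dominated by the feature with the larger raw values.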

Figure 7 - Accuracy scores after normalization technique applied

After applying the normalization technique, we can see that there is an effect on accuracy scores. In this case KNN and DT show the highest scores, over 90%, but the accuracy scores for the other algorithms drop, as shown in Figure 7.
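A comparison of this kind could be set up roughly as follows: each classifier is trained on raw, min-max normalized and standardized data and scored on the same held-out 30%. The hyperparameters, the random seeds and the use of scikit-learn's bundled dataset are assumptions, so the printed scores are not expected to reproduce the figures in this paper.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

def make_models():
    """Return fresh, unfitted estimators (hyperparameters are illustrative guesses)."""
    return {
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "DT": DecisionTreeClassifier(random_state=0),
        "NB": GaussianNB(),
        "Logit": LogisticRegression(max_iter=5000),
        "ANN": MLPClassifier(hidden_layer_sizes=(30,), max_iter=2000, random_state=0),
    }

scalers = {"raw": None, "min-max": MinMaxScaler, "standardized": StandardScaler}

results = {}
for prep_name, scaler_cls in scalers.items():
    for model_name, model in make_models().items():
        # The scaler is fitted on the training part only, inside the pipeline.
        steps = [model] if scaler_cls is None else [scaler_cls(), model]
        pipe = make_pipeline(*steps)
        pipe.fit(X_train, y_train)
        results[(prep_name, model_name)] = pipe.score(X_test, y_test)

for (prep_name, model_name), acc in results.items():
    print(f"{prep_name:>12}  {model_name:<5}  {acc:.3f}")
```

Wrapping the scaler and the classifier in one pipeline keeps test-set statistics out of the scaling step, which would otherwise inflate the accuracy estimates.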

Figure 8 - Accuracy scores after standardization technique applied

The standardization technique best fits the Logit and ANN algorithms; their accuracy scores increase to over 92%, as shown in Figure 8. So correlation and normalization techniques do have an effect on accuracy scores.

Conclusions.

Breast cancer is one of the most common and deadly diseases in the world. Detection and diagnosis of breast cancer at an early stage is key to curing it. In this research study we have analyzed the effect of different preprocessing techniques on the accuracy scores of ML algorithms. The study found that normalization raises the scores of KNN and DT to over 90%, but the accuracy scores of the other algorithms drop. On the other hand, standardization raises the scores of the Logit and ANN algorithms to over 92%. In conclusion, before making any diagnostic assumptions, several preprocessing techniques have to be applied and the accuracy scores tested.

REFERENCES

1 Harrington, Peter. "Machine learning in action". - Greenwich, CT: Manning, 2012. - Vol. 5. - P. 88-96.

2 Derong Liu, Zhongyu Pang, Lloyd S.R. "A Neural Network Method for Detection of Obstructive Sleep Apnea and Narcolepsy" // Based on Pupil Size and EEG. - 2008. - V.19, I.2. - P. 126-169.

3 Kiyan, T., and Yildirim, T. (2003). Breast Cancer Diagnosis Using Statistical Neural Networks // International XII. Turkish Symposium on Artificial Intelligence and Neural Networks. University Besiktas, Istanbul. - Turkey: 2003. - P. 51-56.

4 Seker H., Odetao M., Petroric D. and Naguib R.N.G. (1994). "A fuzzy logic based method for prognostic decision making in breast and prostate cancers" // Biomedicine (IEEE transactions). - 2003. - №73. - P. 88-96.

5 S. Haykin. Neural Networks: A Comprehensive Foundation. - New York: 1994. - 523 p.

6 Morise, A. P., Detrano, R., Bobbio, M., & Diamond, G. A. (1992). Development and validation of a logistic regression-derived algorithm for estimating the incremental probability of coronary artery disease before and after exercise testing // Journal of the American College of Cardiology. - 1992. - №20(5). - P. 1187-1196.

7 S. M. Kamruzzaman, Ahmed Ryadh Hasan, Abu Bakar Siddiquee and Md. Ehsanul Hoque Mazumder // Medical diagnosis using neural network, ICECE 2004, 28-30 December 2004. - Dhaka, Bangladesh: 2004. - P. 28-34.

8 Arpita Das and Mahua Bhattacharya, GA based Neuro Fuzzy Techniques for breast cancer Identification // IEEE. - 2008. - №2. - P. 978986

9 Hongmin Zhang, Xuefeng Dai, The Application of Fuzzy Neural Network in Medicine-A Survey // International Conference on Biological and Biomedical Sciences Advances in Biomedical Engineering. - 2012. - Vol.9. - P. 52-56.

Sh. Shamiluulu1, U. Djakbarova2

1Suleyman Demirel University, Almaty, Kazakhstan, Department of Computer Science
2Kazakh National Medical University named after S.D. Asfendiyarov, Department of Molecular Biology and Medical Genetics

IDENTIFICATION OF PATIENTS WITH BREAST CANCER USING MACHINE LEARNING ALGORITHMS OF THE SCIKIT-LEARN ML FRAMEWORK

Summary: This article examines five different supervised machine learning algorithms on a breast cancer dataset and compares the results obtained. The study shows that different preprocessing methods can raise diagnostic accuracy to over 90%, with the highest performance achieved by logistic regression and ANN. Together with clinical data, the proposed method can be used for the diagnosis of breast cancer.

Keywords: breast cancer, machine learning algorithms, data classification, computer-aided prognosis and diagnosis

GRNTI 577.29

1B.A. Ussipbek, 1N.T. Ablayhanova, 1M.K. Murzakhmetova, 1Z.B. Esimsiitova, 2V. Isachenko, 3P. Tleubekkyzy

1Kazakh National University named after al-Farabi, Kazakhstan, Almaty
2Department of Obstetrics and Gynecology, University Maternal Hospital, Cologne University, Cologne, Germany
3JSC «Astana medical university», Astana city, Kazakhstan

CRYOPRESERVATION OF HUMAN SPERMATOZOA

Cryopreservation of spermatozoa plays an important role in modern assisted reproductive technologies. Today, freezing human sperm in liquid nitrogen is a reliable and widespread method of storing it for many years. Cryopreservation of sperm is a method of storing ejaculate that involves freezing it and keeping it in liquid nitrogen in special tanks at a temperature of -196 °C. The shelf life is not limited. Under these conditions, the biochemical processes in the cells are suspended until thawing, and the cells retain their biological functions after thawing. Cryopreservation of spermatozoa is a reliable protection against transmission of various diseases in their undeveloped stage. Through cryopreservation, infertile couples can use donor genetic material to have a long-awaited child. The study of cryopreservation methods has important experimental and applied value.

Keywords: cryopreservation, artificial insemination, spermatozoon, temperature, liquid nitrogen, freezing, auxiliary reproductive technologies.

Cryopreservation is the freezing and storage of living biological objects with the possibility of restoring their biological functions after thawing. Over the past decade, reproductive medicine worldwide has shown a trend towards cryopreservation of spermatozoa. Cryopreservation preserves the quality of biological material for several years, which is achieved through careful development and study of freezing and thawing techniques. This approach provides greater controllability and effectiveness of treatment in overcoming male and female infertility. Very low temperatures are used for cryopreservation; the standard is -196 °C. Vessels with genetic material are placed in liquid nitrogen, which provides this temperature. Using higher temperatures to store this biological material is inexpedient, since they are ineffective and do not preserve the reproductive function. One of the main problems of cryopreservation is to minimize the time during which embryos and oocytes are exposed to harmful effects, and to avoid damage to the cells by ice crystals formed during freezing. Today, there are two ways of cryopreserving embryos: slow freezing and vitrification. Until recently, slow freezing was the only effective way to preserve embryos. However, with this technique ice crystals form and injure the cells. Relatively recently, an alternative method, vitrification, has become widespread. Vitrification facilitates and simplifies the process of freezing embryos. The advantage of this method is that the embryos are not damaged by ice crystals, as occurs under controlled slow cryopreservation: the liquid contained in the cells of the embryos is transferred to a vitreous state under the influence of special substances [1]. The researchers found that with superfast cooling of small amounts of sperm (direct immersion in liquid nitrogen), the sperm die. The only exceptions were some sperm samples from stallions and from humans. When rabbit sperm was frozen in thin-walled (10 µm) aluminum bags, sperm movement was restored only after rapid thawing in warm (38 °C) water. With slow thawing in the air, all the spermatozoa invariably died. Their death in this case is most easily explained by the vitrification hypothesis. I.V. Smirnov and A.E. Bruenko found that as the thawing rate of frozen sperm granules (pellet volume 0.1 ml) increases, the percentage of sperm resuming motion rises sharply. The share of such spermatozoa was 19% at a thawing temperature of 0 °C, 30% at 20 °C, 43.5% at 40 °C, 50.55% at 50 °C, 58% at 60 °C, and 65% at 70 °C. To prevent the possible death of sperm from overheating, the sperm bottles were transferred from the "hot" baths into water at a temperature of +30 °C without waiting for the granules to thaw completely. Evidently, rapid heating makes it possible to avoid recrystallization of vitrified sperm in the temperature zone where the rate of crystallization is highest. Note, by the way, that these data indicate that some cells die not during freezing but during thawing.

Experiments showed that in cock spermatozoa partially dehydrated by adding levulose to the sperm, mobility was restored after quick freezing at -76 °C and thawing at 42-45 °C. When samples were stored for many days or several months at -76 °C, about 30% of the spermatozoa recovered. Of 48 eggs laid by hens that were inseminated with frozen sperm, 12 appeared to have been fertilized, but the development of the embryo in them, observed with the naked eye, lasted no more than 10-15 hours. When some researchers later tried to apply Luyet's method of vitrification to human sperm, only in a small number of spermatozoa cooled to very low temperatures, after warming,
