


UDC 378

STRUCTURE-REACTIVITY MODELING USING MIXTURE-BASED SIMPLEX DESCRIPTORS (SiRMS) AS THE REPRESENTATION OF CHEMICAL REACTIONS

Y.A. Phrolycheva, L.M. Ibatulina

phrolycheva@gmail.com

Kazan (Volga region) Federal University, Kazan, Russia

Abstract. At present, "structure - property" modeling is one of the main areas of chemoinformatics; its purpose is to predict various properties of chemical objects. A growing number of researchers in this area are turning to chemical reactions as objects of modeling.

Reactions are more complex modeling objects than chemical compounds. When dealing with a computer representation of a chemical reaction, the scientist is faced with the need to take into account the contributions of many components that form the reaction system: solvent, substrate and reagent. The properties of the reaction also substantially depend on the reaction conditions; therefore, when building a model, temperature, pressure, the presence of a catalyst, and much more must be taken into account.

This work concerns the search for descriptor-based representations of chemical reactions. We built structure-property models using simplex descriptors (SiRMS) as representations of reactions, with random forest as the machine learning method. The results show that chemical reactions can be represented by mixture-based simplex descriptors alongside other representations (condensed reaction graphs and difference fragment descriptors).

Keywords: simplex descriptors, machine learning, condensed reaction graph, chemoinformatics, molecular modelling, SiRMS.

For citation: Phrolycheva Y.A., Ibatulina L.M. Structure-reactivity modeling using mixture-based simplex descriptors (SiRMS) as the representation of chemical reactions. Kazan Bulletin of Young Scientists. 2020;4(4):59-68.

Introduction

The research laboratory "Chemoinformatics and molecular modeling" conducts research in areas that include: the use and development of chemoinformatics and molecular modeling tools for the design of new materials and drugs; modeling of organic and metabolic reactions using chemoinformatics methods, from empirical to predictive chemistry; and data mining for translational medicine.

Chemoinformatics is a multidisciplinary field of theoretical chemistry, located at the intersection of chemistry, computer science, biology, pharmacology, physics and mathematical statistics. It focuses on the development of mathematical models that predict the physical, chemical or biological properties of molecules from known experimental data. Its main application is the computer design of new molecules, materials, or reactions that possess the required characteristics, based on computer processing of available data.

The high efficiency of such virtual synthesis can significantly reduce financial and labor costs, yield products that are safe for living systems and minimize the environmental impact of chemical production. This is extremely attractive for high-tech fields such as the creation of new materials and of substances for industry and pharmaceuticals. Currently, almost every major pharmaceutical company has a department of chemoinformatics, bioinformatics and molecular modeling.

Another urgent task for world science and technology is the application of chemoinformatics approaches to predicting the properties of new materials, nanomaterials in particular.

At present, "structure - property" modeling is one of the main areas of chemoinformatics; its purpose is to predict various properties of chemical objects. A growing number of researchers in this area are turning to chemical reactions as objects of modeling.

Reactions are more complex modeling objects than chemical compounds. When dealing with a computer representation of a chemical reaction, the scientist is faced with the need to take into account the contributions of many components that form the reaction system: solvent, substrate and reagent. The properties of the reaction also substantially depend on the reaction conditions; therefore, when building a model, temperature, pressure, the presence of a catalyst, and much more must be taken into account.

Purpose of the work: to build a structure-property model on mixture-based simplex descriptors (SiRMS) and to compare its performance with structure-property models built on other representations of chemical reactions.

Main goals:

1. to compare various ways of representing chemical reactions;

2. to consider the approach of representing a reaction as a combination of two mixtures: a mixture of reagents and a mixture of products (using simplex descriptors (SiRMS));

3. to become familiar with supervised learning methods of model building: the essence of the method, cross-validation, and quality metrics for the resulting models;

4. to create a model that predicts reaction rate constants using the random forest machine learning method (Random Forest) for bimolecular nucleophilic substitution, bimolecular elimination, Diels-Alder and tautomeric equilibrium reactions;

5. to evaluate the obtained models using two parameters: the coefficient of determination (R2) and the root-mean-square error (RMSE).

Methods

Datasets

Four types of reactions were used to build the models: bimolecular nucleophilic substitution (SN2) [16], bimolecular elimination (E2) [14], Diels-Alder (DA) [15] and tautomeric equilibrium (Tautomers) [13].

In the term SN2, the S stands for substitution, the N stands for nucleophilic, and the number two stands for bimolecular, meaning there are two molecules involved in the rate determining step. The rate of bimolecular nucleophilic substitution reactions depends on the concentration of both the haloalkane and the nucleophile.

E2 reactions are typically seen with secondary and tertiary alkyl halides, but a hindered base is necessary with a primary halide. The mechanism is a single-step concerted reaction with one transition state. The reaction follows second-order kinetics: its rate depends on the concentrations of both the base and the alkyl halide. A good leaving group is required because it is involved in the rate-determining step. The departing hydrogen and the leaving group must be coplanar (usually anti-periplanar) for the pi bond to form; the two carbons change from sp3 to sp2 hybridization.
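The second-order kinetics described for both mechanisms can be written compactly. In the sketch below, RX denotes the alkyl halide, Nu the nucleophile and B the base; these symbols are introduced here only for illustration and do not come from the original text.

$$v_{S_N2} = k_{S_N2}\,[\mathrm{RX}]\,[\mathrm{Nu}], \qquad v_{E2} = k_{E2}\,[\mathrm{RX}]\,[\mathrm{B}]$$

The rate constants k in these expressions are the kind of quantity that the models described in this work are built to predict.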

In organic chemistry, the Diels-Alder reaction is a chemical reaction between a conjugated diene and a substituted alkene, commonly termed the dienophile (also spelled dieneophile), to form a substituted cyclohexene derivative. It is the prototypical example of a pericyclic reaction with a concerted mechanism.

Tautomerization is pervasive in organic chemistry. It is typically associated with polar molecules and ions containing functional groups that are at least weakly acidic. Most common tautomers exist in pairs, which means that the proton is located at one of two positions, and even more specifically the most common form involves a hydrogen changing places with a double bond: H-X-Y=Z ⇌ X=Y-Z-H.

Descriptors

Quantitative structure-property relationship (QSPR) methods are widely used to predict the properties of chemical objects. "Structure - property" modeling rests on the assumption that a property of an object (physical, chemical, physicochemical, biological and so on) is determined by its structure. The structure of a chemical object is described by so-called descriptors, that is, various characteristics of the object [1]. According to the generally accepted definition [2], a descriptor is "the end result of a logical and mathematical procedure that converts the chemical information encoded in the symbolic representation of an object into a useful number or the result of some standardized experiment". To date, more than 3,000 descriptors are known to be used in constructing "structure - property" models [2].

In recent years, researchers have paid much attention to non-traditional chemical objects as objects of QSPR modeling (these include mixtures [3], nanomaterials [4], polymers [5], inorganic salts [6], and so on). In general, building predictive models for such objects is a much more complicated task than building models for individual compounds.

For example, when chemical reactions are considered as objects of "structure - reactivity" modeling, it is necessary to take into account the structure of all compounds (reagents and products) and the conditions of the reaction (the solvent or solvent mixture, temperature, the presence of a catalyst and others) [1].

One of the problems in the field of structure - reactivity modeling is the search for an optimal set of descriptors that could describe the relationship of structure with reactivity for various types of chemical reactions. Using different descriptor methods for describing chemical reactions leads to different models for the same reactions, which raises the problem of choosing the best model [8].

In this work, a chemical reaction was represented as the difference between the simplex descriptors of the products and the simplex descriptors of the reagents.

Within the framework of the SiRMS methodology, a compound can be represented as a set of tetraatomic fragments (simplexes) of fixed composition and topology. The number of identical simplexes is used as the descriptor value. Generated simplexes can also be labeled in accordance with various atomic properties (partial atomic charges, lipophilicity, H-bond donor / acceptor, etc.). For a mixture, the descriptors are generated in three stages:

I. Simplex descriptors representing bound or unbound molecular subgraphs of N atoms (N = 2-6 in this study) are generated. For a mixture of three components A, B and C, the program generates simplexes of the individual components, which include atoms of only A, only B or only C, as well as mixture simplexes, which include atoms of two (AB, BC, AC) or three (ABC) components.

II. The feature vectors of the individual components are summed, which gives the vector DS = A + B + C.

III. The combination of DS with DM, the feature vector of the mixture simplexes, gives SiRMS-mix, a feature vector of the entire mixture. We chose to represent a chemical reaction as the difference between the descriptors of the product mixture and those of the reagent mixture.
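Stages II and III can be illustrated with a toy sketch. It does not use the SiRMS software: the simplex labels, the counts and the helper names `sum_single` and `combine` are all invented for illustration, and stage I (the actual enumeration of simplexes) is assumed to have been done already.

```python
from collections import Counter

# Hypothetical output of stage I: simplex counts for each component.
# The labels are invented placeholders, not real SiRMS simplex names.
single = {
    "A": Counter({"C-C-C-O": 2, "C-C-O-H": 1}),
    "B": Counter({"C-C-C-O": 1, "C-N-C-C": 3}),
    "C": Counter({"C-C-O-H": 4}),
}
# Hypothetical counts of mixture simplexes, keyed by (components, label).
mixture = Counter({("AB", "C|N|O|H"): 5, ("ABC", "C|C|N|O"): 2})


def sum_single(components):
    """Stage II: DS = A + B + C, an element-wise sum of the count vectors."""
    total = Counter()
    for counts in components.values():
        total.update(counts)
    return total


def combine(ds, dm):
    """Stage III: join DS and DM into one feature dictionary (SiRMS-mix)."""
    features = {("single", label): n for label, n in ds.items()}
    features.update({("mix",) + key: n for key, n in dm.items()})
    return features


ds = sum_single(single)
sirms_mix = combine(ds, mixture)
```

Keeping the single-component and mixture features under separate keys is one possible design; it keeps the two blocks distinguishable when the dictionary is later turned into a numeric vector.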

Representation of chemical reactions using mixed simplex descriptors was used to construct "structure-property" models predicting the rate constants of bimolecular elimination (E2), bimolecular nucleophilic substitution (SN2), Diels-Alder (DA), and tautomeric equilibrium (Tautomers) reactions. Catalyst and solvent descriptors were added to the resulting vector to improve the predictive ability of the model.
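A minimal sketch of this reaction representation follows, assuming that the descriptor vectors of the reagent and product mixtures are already available as dictionaries of counts. The function name `reaction_vector`, the fragment labels and the two condition descriptors are hypothetical; the source does not specify which solvent or catalyst descriptors were used.

```python
def reaction_vector(reagents, products, conditions):
    """Return (names, values): products minus reagents, then condition descriptors."""
    # Align both mixtures on the union of their descriptor names.
    keys = sorted(set(reagents) | set(products))
    diff = [products.get(k, 0) - reagents.get(k, 0) for k in keys]
    # Append the condition descriptors in a fixed (sorted) order.
    cond_keys = sorted(conditions)
    return keys + cond_keys, diff + [conditions[k] for k in cond_keys]


# Invented example: one fragment disappears, one appears, one is unchanged.
reagents = {"C-C-C-Br": 1, "C-C-C-H": 3}
products = {"C-C-C-O": 1, "C-C-C-H": 3}
conditions = {"solvent_polarity": 0.65, "catalyst_present": 0.0}  # assumed descriptors

names, values = reaction_vector(reagents, products, conditions)
```

Fragments that are untouched by the reaction cancel to zero, so the non-zero entries of the difference part point to the structural changes caused by the transformation.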

Model building and validation

Modeling was carried out using the random forest machine learning method. The rate constants of chemical reactions (for SN2, DA, E2) and the tautomeric equilibrium constants were used as the predicted properties.
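As a hedged illustration of this step, the sketch below fits scikit-learn's `RandomForestRegressor` to synthetic data. The descriptor matrix `X`, the target `y` and the hyperparameter values are assumptions made for the example; they are not the data or settings used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for reaction descriptors: 200 reactions, 20 integer features.
X = rng.integers(-3, 4, size=(200, 20)).astype(float)
# Synthetic stand-in for the modeled property (for example, a log rate constant).
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Train on the first 150 reactions, predict the remaining 50.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:150], y[:150])
predicted = model.predict(X[150:])
```

A single train/test split is used here only to keep the example short.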

To write the code for creating models, it is first necessary to import auxiliary libraries into the development environment and to configure the virtual environment. In the course of the work we used Python 3.7, NumPy arrays, pandas DataFrame objects, and the SciPy and scikit-learn libraries.

The protocol for constructing a "structure-property" model has several stages, one of which is training the model on a data set and comparing the values of the dependent variable predicted by the model with the experimental values (validation). Selection of model hyperparameters and model quality control were carried out using 5-fold cross-validation. For this, the data set was divided into five equal parts. The first 1/5 of the objects was declared the control sample, and the remaining 4/5 the training sample. Using the model built on the training set, the properties of the objects of the control set were predicted. Then the next part was declared the control sample, and the procedure was repeated until each part had acted as the control sample (Fig. 1). At the final stage, the predicted values were compared with the experimental data.
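The procedure can be sketched in plain Python. The helper names `fold_indices`, `cross_val_predictions` and `mean_baseline` are hypothetical, and the mean-value baseline only stands in for a real model such as a random forest; the folds are contiguous here, whereas in practice the objects would usually be shuffled first.

```python
def fold_indices(n_objects, n_folds=5):
    """Split indices 0..n_objects-1 into n_folds contiguous, near-equal parts."""
    base, extra = divmod(n_objects, n_folds)
    folds, start = [], 0
    for f in range(n_folds):
        size = base + (1 if f < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_predictions(X, y, fit_predict, n_folds=5):
    """Each part serves once as the control sample; the rest is the training sample."""
    predictions = [None] * len(y)
    for control in fold_indices(len(y), n_folds):
        control_set = set(control)
        train = [i for i in range(len(y)) if i not in control_set]
        preds = fit_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in control]
        )
        for i, p in zip(control, preds):
            predictions[i] = p
    return predictions


def mean_baseline(X_train, y_train, X_control):
    """Placeholder model: predicts the mean of the training targets."""
    mean = sum(y_train) / len(y_train)
    return [mean] * len(X_control)


X = [[float(i)] for i in range(10)]
y = [float(i) for i in range(10)]
out_of_fold = cross_val_predictions(X, y, mean_baseline)
```

Every object receives exactly one prediction, made by a model that did not see it during training, and these out-of-fold predictions are what is compared with the experimental values.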

Fig. 1. Stages of 5-fold cross-validation (legend: control sample; training samples; stage number)

Quantitative indicators of the predictive ability of models

The predictive ability of models is most often estimated using two parameters: the coefficient of determination (R2) and the root-mean-square error (RMSE). The coefficient of determination is the fraction of the variance of the dependent variable that is explained by the model, that is, by the explanatory variables. It is regarded as a universal measure of the dependence of one random variable on many others. Here it describes the ability of the model to predict the rate constants of chemical reactions: the closer R2 is to unity, the more accurate the prediction. It is calculated as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{N_{train}} \left(y_i^{calc} - y_i^{exp}\right)^2}{\sum_{i=1}^{N_{train}} \left(y_i^{exp} - \bar{y}^{exp}\right)^2}$$

where $y_i^{exp}$ is the observed value of the quantity for object $i$, $y_i^{calc}$ is the value predicted by the model, and $\bar{y}^{exp}$ is the mean of the observed values.
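Both quality metrics named above, the coefficient of determination and the root-mean-square error, can be computed directly from their standard definitions. The sketch below uses invented numbers and hypothetical function names; in practice `sklearn.metrics` provides equivalent routines.

```python
import math


def determination_coefficient(y_exp, y_calc):
    """R2 = 1 - (sum of squared residuals) / (total sum of squares)."""
    mean_exp = sum(y_exp) / len(y_exp)
    ss_res = sum((c - e) ** 2 for c, e in zip(y_calc, y_exp))
    ss_tot = sum((e - mean_exp) ** 2 for e in y_exp)
    return 1.0 - ss_res / ss_tot


def root_mean_square_error(y_exp, y_calc):
    """RMSE = square root of the mean squared residual."""
    ss_res = sum((c - e) ** 2 for c, e in zip(y_calc, y_exp))
    return math.sqrt(ss_res / len(y_exp))


# Invented observed and predicted values, for illustration only.
y_exp = [1.0, 2.0, 3.0, 4.0]
y_calc = [1.1, 1.9, 3.2, 3.8]

r2 = determination_coefficient(y_exp, y_calc)
rmse = root_mean_square_error(y_exp, y_calc)
```

For these numbers the squared residuals sum to 0.10 and the total sum of squares is 5.0, so R2 is about 0.98 and RMSE is about 0.158.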

In probability theory and statistics, the standard deviation is the most common indicator of the dispersion of the values of a random variable about its mathematical expectation. The root-mean-square error (RMSE) indicates the average deviation of the predicted values from the true ones. The larger the value of this parameter, the worse the model predicts new values. It is calculated as follows:

$$RMSE = \sqrt{\frac{\sum_{i=1}^{N_{train}} \left(y_i^{calc} - y_i^{exp}\right)^2}{N}}$$

where N (= $N_{train}$) is the number of training samples.

Results and discussion

Having built supervised learning models with the Random Forest method on the data sets obtained, we evaluated the quality of the resulting models.

Figs. 2 and 3 show diagrams in which the data sets are plotted on the abscissa axis, and the coefficient of determination (R2) (Fig. 2) and the root-mean-square error (RMSE) (Fig. 3) are plotted on the ordinate axis.

Fig. 2. Determination coefficient (R2) for 4 data sets (bar labels: Diels-Alder reactions, tautomers, E2 reactions, SN2 reactions; bar values as extracted: 0.86408, 0.81072, 0.71731, 0.55467)

Fig. 3. The standard deviation (RMSE) of logK for 4 data sets (bar labels: Diels-Alder reactions, tautomers, E2 reactions, SN2 reactions; bar values as extracted: 1.08173, 0.70885, 0.78498, 0.50788)

Next, we compare the results obtained with simplex descriptors against the R2 and RMSE values obtained with other representations of a chemical reaction (Fig. 4): a condensed reaction graph (CGR); descriptors of reagents only; descriptors of products only; combined descriptors of reagents and products; difference descriptors of reagents and products; atom pairs and topological torsions. The data in Fig. 4 and Fig. 5 were obtained earlier in the Research Laboratory of Chemoinformatics and Molecular Modeling, in a comparison of descriptor methods for chemical reactions in structure-reactivity modeling.

Fig. 4. The dependence of the determination coefficient on the descriptor representation of chemical reactions for four data sets using cross-validation [14] (abscissa: data sets)

Fig. 4 shows, for each way of representing reactions, the modeling results obtained after building and testing models with "reaction-out" cross-validation. The analysis showed that for two data sets (E2, SN2) the determination coefficient varied in a similar way with the method of representing the chemical reactions. The highest determination coefficients were obtained for models in which the reactions were described by (i) CGR descriptors, (ii) descriptors of reagents, and (iii) combined descriptors of reagents and products. Models based on difference descriptors were also among the best performers. In this case, however, the choice of fragmentation parameters is important, since the upper whisker and the upper quartile of the box plot lie far apart. For the set of Diels-Alder reactions, all models showed good predictive ability regardless of fragmentation parameters, since the range of values obtained is rather narrow. For the set of tautomeric equilibria, Fig. 4 also shows that models based on CGR descriptors with certain fragmentation parameters clearly stood out in their determination coefficients.

Comparing the two diagrams shows that the R2 values for all reaction sets represented by simplex descriptors lie in the same range as those obtained with other representations of chemical reactions.

The RMSE values (Fig. 5) for bimolecular nucleophilic substitution (SN2), bimolecular elimination (E2), and Diels-Alder (DA) reactions are also similar. Therefore, judging by the quality metrics of the models built by supervised learning, we conclude that the representation of chemical reactions by simplex descriptors (SiRMS) can compete with other available methods of representing chemical reactions.

Composition of the control sample: SN2, E2, DA

Composition of the training sample | SN2: R2 | SN2: RMSE | E2: R2 | E2: RMSE | DA: R2 | DA: RMSE
SN2, E2, DA | 0.82 | 0.49 | 0.72 | 0.78 | 0.85 | 0.75
SN2 | 0.82 | 0.50 | - | - | - | -
E2 | - | - | 0.71 | 0.79 | - | -
DA | - | - | - | - | 0.84 | 0.75

Fig. 5. Quantitative indicators for modeling with different composition of training samples [14]

The best descriptor representations, found by searching for the optimal description of chemical structures for each set of reactions, were subsequently used to compare methods of describing the solvent in structure-reactivity modeling.

Conclusion

After almost five decades "in development", QSAR modeling has established itself as one of the main methodologies of computational molecular modeling. Like any mature research discipline, it can be characterized by a set of clearly defined protocols and procedures that allow expert application of the method to study and exploit ever-growing collections of biologically active chemical compounds. This review discusses the most important QSAR modeling procedures that we consider to be best practices in this area. We discuss these procedures in the context of an integrative predictive QSAR modeling workflow that is focused on achieving models with the highest statistical quality.

Specific elements of the workflow consist of data preparation, including chemical structures (and, when possible, related biological data), outlier detection, dataset balancing, and model validation. We emphasize the procedures used to validate models, both internally and externally, as well as the need to define the applicability domains of models, which should be taken into account when models are used to predict external compounds or compound libraries. Finally, we present some examples of the successful application of QSAR models in virtual screening to identify experimentally confirmed hits.

In the course of the work, the overall goal of which was to build models that predict reaction rate constants, we completed the following tasks:

- trained the model on a data set;

- built, using the scikit-learn library, a model that predicts the rate constants of chemical reactions (for SN2, DA, E2) and the values of the tautomeric equilibrium constants;

- assessed the predictive ability of models that represent chemical reactions as a set of simplex descriptors, using two parameters: the coefficient of determination (R2) and the root-mean-square error (RMSE);

- confirmed the possibility of using simplex descriptors as a way of representing a chemical reaction.

References

1. Baskin, I.I. Introduction to chemoinformatics. Tutorial. Part 3. Modeling "structure-property" / I.I. Baskin, T.I. Majidov, A.A. Warnek. - Kazan: Kazan University Press, 2015. - 350 p.

2. Palm, V.A. Fundamentals of the quantitative theory of organic reactions. 2nd ed., revised and expanded. - Leningrad: Khimiya, 1977. - 200 p.

3. Hammett, L.P. Physical Organic Chemistry: Reaction Rates, Equilibria, and Mechanisms / Heidelberg: McGraw-Hill, 1940. - 348 p.

4. Polishchuk, P. Structure-reactivity modeling using mixture-based representation of chemical reactions / P. Polishchuk, T. Madzhidov, T. Gimadiev, A. Bodrov, R. Nugmanov, A. Varnek // Journal of Computer-Aided Molecular Design. - 2017. - V. 31. - 1342 p.

5. Varnek, A. Substructural fragments: an universal language to encode reactions, molecular and supramolecular structures / A. Varnek, D. Fourches, F. Hoonakker, V.P. Solov'ev // Journal of Computer-Aided Molecular Design. - 2005. - V. 19. - 703 p.

6. Fingerprints - Screening and Similarity. Daylight Theory Manual // C.A. James, D. Weininger, J. Delaney. - 783p.

7. Baskin, I. Fragment Descriptors in SAR / QSAR / QSPR Studies, Molecular Similarity Analysis and in Virtual Screening // Chemoinformatics Approaches to Virtual Screening / ed. by A. Varnek, A. Tropsha - Cambridge: RSC Publisher, 2008 - 50 p.

8. Majidov, T.I. Introduction to chemoinformatics: textbook. Part 1. Computer representation of chemical structures / Majidov T.I., Baskin I.I., Antipin I.S., Varnek A.A. - Kazan: Kazan. Univ., 2013. - 174 p. Madzhidov, T.I. Structure-reactivity relationships in terms of the condensed graphs of reactions / T.I. Madzhidov, P.G. Polishchuk, R.I. Nugmanov, A.V. Bodrov, A.I. Lin, I.I. Baskin, A.A. Varnek, I.S. Antipin // Russian Journal of Organic Chemistry. - 2014. - V. 50. - 463 p.

9. Toropov, A.A. QSPR study on solubility of fullerene in organic solvents using optimal descriptors calculated with SMILES / A.A. Toropov, D. Leszczynska, J. Leszczynski // Chemical Physics Letters. - 2007. - V. 441 - 122p.

10. Kravtsov, A.A. "Bimolecular" QSPR: estimation of the solvation free energy of organic molecules in different solvents / A.A. Kravtsov, P.V. Karpov, I.I. Baskin, V.A. Palyulin, N.S. Zefirov // Doklady Chemistry. - 2007. - V. 414. - 342 p.

11. ISIDA Fragmentor 2017 - User Manual // Ruggiu F., Marcou G., Solov'ev V., Horvath D., Varnek A. [Electronic resource] - URL: http://infochim.u-strasbg.fr/recherche/Download/Fragmentor/Fragmentor2017_Manual.pdf

12. Madzhidov, T. I. Structure-reactivity relationship in bimolecular elimination reactions based on the condensed graph of a reaction / T. I. Madzhidov, P. A.V. Bodrov, T.R. Gimadiev, R.I. Nugmanov, A. I. Lin, A.A. Varnek, I.S. Antipin // Journal of Structural Chemistry. - 2015. - V. 56 - 1234 p.

13. Madzhidov, T.I. Structure-reactivity relationship in Diels-Alder reactions obtained using the condensed reaction graph approach / T. I. Madzhidov, T.R. Gimadiev, R.I. Nugmanov, I.I. Baskin, I.S. Antipin, A.A. Varnek // Journal of Structural Chemistry. - 2017. - V. 58 - 656p.

14. Gimadiev, R.I. Assessment of tautomer distribution using the condensed reaction graph approach / T.R. Gimadiev, T.I. Madzhidov, R.I. Nugmanov, I.I. Baskin, I.S. Antipin, A.A. Varnek // Journal of Computer-Aided Molecular Design. - 2018. - V. 32 - 414p.

15. Gimadiev, T. Bimolecular nucleophilic substitution reactions: Predictive models for rate constants and molecular reaction pairs analysis / T. Gimadiev, T. Madzhidov, I. Tetko, R. Nugmanov, I. Casciuc, O. Klimchuk, A. Bodrov, P. Polishchuk, I. Antipin, A. Varnek //Molecular informatics. -2019. - V. 38. - 115p.

16. Hu, Q.-N. Assignment of EC numbers to enzymatic reactions with reaction difference fingerprints / Q.-N Hu, H. Zhu, X. Li, M. Zhang, Z. Deng // PLOS One. - 2012. - V. 7



Authors of the publication

Julia A. Phrolycheva, Undergraduate, Kazan (Volga region) Federal University, Kazan, Russia. Email: phrolycheva@gmail.com

Lucia M. Ibatulina, Senior lecturer, Kazan (Volga region) Federal University, Kazan, Russia. Email: lucide@list.ru

Received 14.07.2020. Accepted for publication 17.09.2020.
