Scientific article: Modeling of quantitative structure-property relationships by means of group contribution methods (subject area: Medical technologies)

CC BY

Authors: Pukalov Ognyan, Paskaleva Vesselina, Kochev Nikolay, Jeliazkova Nina



Scientific researches of the Union of Scientists in Bulgaria-Plovdiv, series B. Natural Sciences and the Humanities, Vol. XVIII, ISSN 1311-9192 (Print), ISSN 2534-9376 (On-line), 2018.


MODELING OF QUANTITATIVE STRUCTURE-PROPERTY RELATIONSHIPS BY MEANS OF GROUP CONTRIBUTION METHODS

Ognyan Pukalovᵃ, Vesselina Paskalevaᵃ, Nikolay Kochevᵃ, Nina Jeliazkovaᵇ
ᵃ Faculty of Chemistry, University of Plovdiv "P. Hilendarski", 24 Tsar Assen str., Plovdiv 4000, Bulgaria
ᵇ Ideaconsult Ltd, 4 A. Kanchev str., Sofia 1000, Bulgaria

Abstract

We present models for the theoretical calculation of three physicochemical properties of high importance in the drug discovery process. Several group contribution models were developed for prediction of the octanol-water partition coefficient (logP), heat of formation (Hf) and molar refractivity (MR). We used a prototype version of GCM (Group Contribution Module), in-house software that is part of the Ambit software platform.

Each target property was calculated theoretically as a sum of individual increments assigned to specific fragments present in the molecule. Zero-order atomic additive schemes and first-order bond-based schemes were studied, with the atom class definitions refined by varying local atomic descriptors such as atom type, number of H atoms and hybridization. Additionally, some global topological descriptors were used as correction factors. Group contribution values were calculated by linear regression analysis applied to training data sets with experimental values for 13097 (logP), 165 (MR) and 464 (Hf) organic compounds, respectively. All structures were represented topologically by SMILES linear notation. Different combinations of the chosen descriptors were studied. The models were tested and statistically validated, and their test results are presented and discussed.

Key words: group contribution method, additive scheme, atom/bond additive scheme, QSPR, octanol/water partition coefficient, logP, molar refractivity, heat of formation

Introduction

One of the main challenges of chemoinformatics is to create models that predict particular chemical properties for a broad and diverse set of chemical compounds. Efficient tools for this task are the additive modeling methods, also known as group contribution methods (Benson, 1969; Kolska, 2012). The target property value is obtained additively by summing the contributions of all fragments in the molecule:

$$ P = \sum_{i} n_{\mathrm{Frag}(i)} \, I_{\mathrm{Frag}(i)} \qquad (1) $$

where $I_{\mathrm{Frag}(i)}$ is the increment (contribution) of fragment Frag(i) and $n_{\mathrm{Frag}(i)}$ is the number of occurrences of Frag(i) in the compound. Group contribution estimation by eq. (1) typically models well those properties that depend on short-range intramolecular interactions, where the range taken into account is set by the size of the fragments {Frag(i)}. The approach can be extended with correction factors that account for longer-range intramolecular interactions through specific structural features such as intramolecular hydrogen bonds, atom pairs, global molecular descriptors, etc. In this case eq. (1) is rewritten as follows:

$$ P = \sum_{i} n_{\mathrm{Frag}(i)} \, I_{\mathrm{Frag}(i)} + \sum_{j} n_{\mathrm{Cor}(j)} \, I_{\mathrm{Cor}(j)} \qquad (2) $$

where $I_{\mathrm{Cor}(j)}$ is the increment of the correction factor of type Cor(j) and $n_{\mathrm{Cor}(j)}$ is the number of occurrences of Cor(j) in the compound.
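As a minimal sketch (not GCM's actual code, which is Java/CDK-based), eq. (2) can be evaluated from fragment and correction-factor counts stored in dictionaries. The function name `additive_property` and the fragment labels and increments in the usage lines are hypothetical; leaving out the correction terms reduces the calculation to eq. (1).

```python
from typing import Mapping, Optional


def additive_property(
    fragment_counts: Mapping[str, int],
    fragment_increments: Mapping[str, float],
    correction_counts: Optional[Mapping[str, int]] = None,
    correction_increments: Optional[Mapping[str, float]] = None,
) -> float:
    """Evaluate eq. (2); without correction terms this is eq. (1).

    A KeyError means a fragment (or correction factor) has no fitted
    increment, i.e. the compound lies outside the model's fragment set.
    """
    # sum_i n_Frag(i) * I_Frag(i)
    value = sum(n * fragment_increments[frag] for frag, n in fragment_counts.items())
    if correction_counts:
        # sum_j n_Cor(j) * I_Cor(j)
        increments = correction_increments or {}
        value += sum(n * increments[cor] for cor, n in correction_counts.items())
    return value


# Hypothetical fragments "X", "Y" and correction factor "hbond" with made-up increments.
p_eq1 = additive_property({"X": 2, "Y": 1}, {"X": 1.5, "Y": -0.5})
p_eq2 = additive_property({"X": 2, "Y": 1}, {"X": 1.5, "Y": -0.5},
                          {"hbond": 1}, {"hbond": 0.25})
```

Because the property is a plain weighted sum, a model is fully described by its increment table, which is why the fitted increments can be published as standalone data.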

Based on eq. (2), we developed an in-house software system, GCM, for property prediction. It is described in the next section, followed by results from test cases modeling three important physicochemical properties.

Model creation and software used

GCM (Group Contribution Module) was developed in our group and is based on the CDK library (Steinbeck, 2003). The module is integrated within the open-source chemoinformatics platform Ambit (Jeliazkova, 2011). GCM provides an environment for predicting molecular properties with additive schemes of zero or higher order. The first prototype supports several standard input formats (SMILES, InChI, MOL files), local and global descriptors, and filters for removing linearly dependent and collinear descriptors. The module calculates the increment values of the additive scheme by linear regression analysis applied to a given training data set. GCM also supports correction factors and external molecular descriptors, which account for additional specific interactions in the molecule, such as hydrogen bonds and long-range atom-atom interactions. The architecture of the GCM module is shown in Figure 1.
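The paper does not give GCM's exact regression settings, so the sketch below assumes ordinary least squares without an intercept, which is what eq. (1) implies: each row of the count matrix N holds the fragment occurrences of one training compound, and the increments solve the normal equations. Correction factors from eq. (2) would simply be extra columns. `fit_increments` is a hypothetical name, and the pure-Python solver is only meant for small illustrative systems.

```python
from typing import List, Sequence


def fit_increments(counts: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least-squares increments I minimising ||N I - y||^2 (no intercept).

    counts[k][i] is the number of occurrences of fragment i in training compound k;
    y[k] is its experimental property value. The normal equations
    (N^T N) I = N^T y are solved by Gauss-Jordan elimination with partial pivoting.
    """
    m = len(counts[0])
    # Augmented matrix [N^T N | N^T y].
    a = [[0.0] * (m + 1) for _ in range(m)]
    for row, target in zip(counts, y):
        for i in range(m):
            for j in range(m):
                a[i][j] += row[i] * row[j]
            a[i][m] += row[i] * target
    for col in range(m):
        # Partial pivoting: bring the row with the largest entry in this column up.
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system: a fragment is unused or columns are collinear")
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        # Eliminate this column from every other row.
        for r in range(m):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][m] for i in range(m)]


# Made-up training set: two fragment types with true increments 1.5 and -0.5.
N = [[2, 0], [1, 1], [0, 3], [1, 2]]
Y = [3.0, 1.0, -1.5, 0.5]
increments = fit_increments(N, Y)
```

The singular-matrix check mirrors why collinear descriptors have to be filtered before fitting: without it, the normal equations have no unique solution.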

Figure 1. GCM architecture and chemoinformatics flow chart for group contribution based QSPR.

Use Cases for Group Contribution Modeling

Three important physicochemical properties were considered to evaluate the performance of the GCM software: 1) n-octanol/water partition coefficient (logP); 2) heat of formation (Hf); 3) molar refractivity (MR). Group contribution models were created for each target property, and both atom-based and bond-based additive schemes were studied. Atoms were described by different combinations of the following local descriptors: atom type (A), atom hybridization (Hyb), atom valence (Val), number of heavy (non-hydrogen) neighbors (HeN), number of hydrogen atoms (H) and formal charge (FC). An example of atom group coding is shown in Figure 2. The increments for the fragments of the different types were calculated by linear regression analysis applied to a given set of compounds. Additionally, a set of 1378 global 0D-2D descriptors was tested as correction factors and their influence was evaluated. These external descriptors were calculated with Dragon 7 software (Kode srl, 2017). Variable selection over the calculated descriptors was performed with genetic-algorithm-based and principal component analysis methods implemented in Weka (Frank, 2016).

[Figure 2 drawing: furaneol structure with the atom groups O<0,2,1> and O<1,3,1> (OH group) labelled]

Figure 2. Coding of different atomic groups for the structure of furaneol by means of local atomic descriptor configuration: A<H,Hyb,HeN>.
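Atom classes such as those in Figure 2 can be treated as labels built from an ordered list of local descriptors. The sketch below assumes the descriptor values have already been computed by a cheminformatics toolkit (GCM uses CDK) and shows only the labelling and counting step; the `Atom` record and the function names are hypothetical. The hybridization codes (2 for sp2, 3 for sp3) are inferred from the labels in Figures 2 and 9. As data it uses the heavy atoms of guaiacol with the descriptor values shown in Figure 9, under the configuration A<H,HeN,Hyb,FC,Val>.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Sequence


@dataclass(frozen=True)
class Atom:
    """Precomputed local descriptors of one heavy atom."""
    symbol: str  # atom type, A (always used as the label prefix)
    h: int       # number of attached hydrogens, H
    hen: int     # number of heavy (non-hydrogen) neighbours, HeN
    hyb: int     # hybridization code, Hyb (assumed: 2 = sp2, 3 = sp3)
    fc: int      # formal charge, FC
    val: int     # valence, Val


def atom_class(atom: Atom, config: Sequence[str]) -> str:
    """Build a label such as 'C<0,3,2,0,4>' from the descriptors named in config."""
    fields = {"H": atom.h, "HeN": atom.hen, "Hyb": atom.hyb, "FC": atom.fc, "Val": atom.val}
    return f"{atom.symbol}<{','.join(str(fields[name]) for name in config)}>"


def count_classes(atoms: Iterable[Atom], config: Sequence[str]) -> Dict[str, int]:
    """Occurrences n_Frag(i) of each atom class in one molecule."""
    return dict(Counter(atom_class(a, config) for a in atoms))


# Heavy atoms of guaiacol (2-methoxyphenol), entered by hand to match Figure 9.
GUAIACOL = (
    [Atom("C", 0, 3, 2, 0, 4)] * 2    # ring carbons bearing OH and OCH3
    + [Atom("C", 1, 2, 2, 0, 4)] * 4  # aromatic CH carbons
    + [Atom("C", 3, 1, 3, 0, 4),      # methyl carbon
       Atom("O", 1, 1, 3, 0, 2),      # hydroxyl oxygen
       Atom("O", 0, 2, 3, 0, 2)]      # ether oxygen
)
classes = count_classes(GUAIACOL, ["H", "HeN", "Hyb", "FC", "Val"])
```

Changing the configuration changes the granularity of the classes: with `["H", "Hyb", "HeN"]` the hydroxyl oxygen becomes `O<1,3,1>`, the form used in Figure 2.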

Results and discussion

Statistical results for the six best models are summarized in Table 1. The models were validated by leave-one-out (LOO) cross-validation, a Y-scrambling procedure with 1000 iterations, and tests on the training data sets.
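The two validation procedures can be sketched as follows, assuming a generic `fit` callable that returns a prediction function (all names here are hypothetical). Q2 is computed as 1 - PRESS/SS_tot from leave-one-out predictions, and the Y-scrambling score as the mean training R2 over random permutations of the property values; the paper does not spell out its exact formulas, so these standard definitions are an assumption. For brevity the demo uses a single-fragment additive model, P = n·I, fitted through the origin, on made-up data.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]
Fit = Callable[[List[Sequence[float]], List[float]], Model]


def r2(y: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, pred))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def loo_q2(x: List[Sequence[float]], y: List[float], fit: Fit) -> float:
    """Q2 = 1 - PRESS / SS_tot; each compound is predicted by a model fitted without it."""
    preds = []
    for k in range(len(y)):
        model = fit(x[:k] + x[k + 1:], y[:k] + y[k + 1:])
        preds.append(model(x[k]))
    return r2(y, preds)


def y_scrambling_r2(x: List[Sequence[float]], y: List[float], fit: Fit,
                    n_iter: int = 1000, seed: int = 0) -> float:
    """Mean training R2 after refitting on randomly permuted property values."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_iter):
        y_perm = list(y)
        rng.shuffle(y_perm)
        model = fit(x, y_perm)
        total += r2(y_perm, [model(row) for row in x])
    return total / n_iter


def fit_single_increment(x: List[Sequence[float]], y: List[float]) -> Model:
    """Least-squares increment I for P = n * I (one fragment type, no intercept)."""
    inc = sum(row[0] * t for row, t in zip(x, y)) / sum(row[0] ** 2 for row in x)
    return lambda row: row[0] * inc


# Made-up data: fragment counts n and property values P.
X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
Y = [1.1, 1.9, 3.2, 3.9, 5.1]
q2 = loo_q2(X, Y, fit_single_increment)
ys = y_scrambling_r2(X, Y, fit_single_increment)
```

A robust model shows a Q2 close to its training R2 and a Y-scrambling R2 far below it, since permuted property values should carry no learnable structure.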

For the atom-based molar refractivity model (MRa), we found 15 different atomic classes (groups) defined by the local descriptor configuration A<H,HeN,Hyb,FC,Val>. Atom types for the bond-based model (MRb) were configured without any information about the atom neighbors. The atom-based model for heat of formation (Hfa) was built using 9 atomic classes described with the local atom properties A<H,Hyb,Val> and 5 additional descriptors: nDB, H%, nCsp, nR07 and D/Dtr03. The bond-based Hfb model was built with 23 descriptors (atoms within bond groups were described with the configuration A<H>, plus the selected external descriptors nDB, nR07, H% and nCsp). The bond-based model Hfb shows slightly better cross-validation (LOO) statistics than the atom-based model Hfa. The atom-based logP model (LogPa) was built with 43 descriptors, 9 of them external, and atomic groups defined by the A<H> configuration. Its accuracy is poor (training R2 = 0.693), even though it is the best atom-based logP model obtained. Applying a higher-order (bond-based) scheme for logP gave much better statistics. The number of descriptors more than doubled (94); atom types were described in the form A<H,FC>, and no external descriptors were used. Detailed information about the group contributions (increments) of the six models in Table 1 can be found in the Zenodo repository: https://doi.org/10.5281/zenodo.1066277.

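Table 1. Statistical characteristics of the created models (Nd: number of descriptors; YS1000: Y-scrambling with 1000 iterations; LOO: leave-one-out cross-validation).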

Model Training YS1000 LOO

Nd R2 RMSE MAE R2 Rc2 Q2 RMSE MAE

MRa 15 0.991 0.618 0.235 0.085 0.989 0.989 0.989 0.698 0.262

MRb 7 0.955 1.41 1.03 -0.81 0.973 0.959 0.953 1.451 1.067

Hfa 14 0.931 52.229 28.541 0.026 0.895 0.895 0.893 64.56 32.959

Hfb 23 0.928 53.56 33.163 0.047 0.913 0.912 0.913 58.871 35.598

LogPa 43 0.693 1.081 0.788 0.001 0.691 0.669 0.691 1.086 0.791

LogPb 94 0.847 0.763 0.560 -0.08 0.845 0.842 0.844 0.770 0.565
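For reference, the training-set columns R2, RMSE and MAE of Table 1 can be computed from experimental and predicted values as below. This assumes R2 = 1 - SS_res/SS_tot; the paper does not state whether R2 is defined this way or as a squared correlation coefficient. `regression_stats` is a hypothetical helper and the numbers in the usage line are made up.

```python
import math
from typing import Dict, Sequence


def regression_stats(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """R2, RMSE and MAE of predictions against experimental values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
    }


stats = regression_stats([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

RMSE and MAE are in the units of the property, which is why the Hf rows show much larger values than the MR and logP rows.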

Experimental and predicted property values for the atom-based (A) and bond-based (B) group contribution models for MR, Hf and logP are compared visually in Figures 3, 4 and 5, respectively.

Figure 3. Experimental vs. predicted values for MR applying (A) the atom group contribution scheme with local atom descriptors A<H,HeN,Hyb,FC,Val> and (B) the bond-based group scheme.

Figure 4. Experimental vs. predicted values for Hf applying (A) the atom group contribution scheme with local descriptors A<H,Hyb,Val> and (B) the bond-based group contribution scheme.

Figure 5. Experimental vs. predicted values for logP applying (A) the atom group contribution scheme with local atom descriptors A<H> and (B) the bond-based scheme with descriptors A<H,FC>.

The error distributions for the MR, Hf and logP models are shown in Figures 6, 7 and 8, respectively. For MR, the bond-based scheme, whose local atom descriptors include no information about neighboring atoms, gives worse predictions (Figure 6).

[Figure 6 histograms: panels A) and B); x-axis: MR error]

Figure 6. Distribution of the error values for the MR atom group contribution model and bond-based group contribution model.

[Figure 7 histograms: panels A) and B)]

Figure 7. Distribution of the error values for the Hf atom group contribution model and bond-based group contribution model.

[Figure 8 histograms: panels A) and B); x-axis: logP error]

Figure 8. Distribution of the error values for the logP atom-based group contribution model and bond-based group contribution model.

For the atom-based logP model, the prediction errors lie in the range -3.9 to +3.9, except for 77 molecules with even larger errors. With the bond-based group contribution model the error range narrows to (-2.5, +2.6), and only 37 compounds have errors outside this range.
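Error ranges and histograms like those in Figures 6-8 can be summarised with a small helper that counts compounds outside a chosen error window and bins the rest. The window limits, bin width and the `error_summary` name are illustrative assumptions, and the errors in the usage line are made up.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple


def error_summary(errors: Sequence[float], lo: float, hi: float,
                  bin_width: float) -> Tuple[int, Dict[int, int]]:
    """Return (number of errors outside [lo, hi], histogram of the remaining errors).

    Histogram keys are bin indices k; bin k covers [k * bin_width, (k + 1) * bin_width).
    """
    outside = sum(1 for e in errors if e < lo or e > hi)
    inside = (e for e in errors if lo <= e <= hi)
    hist = Counter(math.floor(e / bin_width) for e in inside)
    return outside, dict(sorted(hist.items()))


n_out, hist = error_summary([-4.2, -1.0, -0.3, 0.2, 0.4, 1.6, 3.0, 4.5], -3.9, 3.9, 1.0)
```

Integer bin indices are used as keys so that bin boundaries do not accumulate floating-point noise.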

Figure 9 shows the application of the atomic group contribution model to predict the molar refractivity of guaiacol; the final MR value is calculated with the following equation (see the guaiacol fragmentation in Figure 9):

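$$ MR_{\mathrm{atomic}}(\text{guaiacol}) = I_{\mathrm{C}\langle 0,3,2,0,4\rangle} N_{\mathrm{C}\langle 0,3,2,0,4\rangle} + I_{\mathrm{C}\langle 1,2,2,0,4\rangle} N_{\mathrm{C}\langle 1,2,2,0,4\rangle} + I_{\mathrm{C}\langle 3,1,3,0,4\rangle} N_{\mathrm{C}\langle 3,1,3,0,4\rangle} + I_{\mathrm{O}\langle 1,1,3,0,2\rangle} N_{\mathrm{O}\langle 1,1,3,0,2\rangle} + I_{\mathrm{O}\langle 0,2,3,0,2\rangle} N_{\mathrm{O}\langle 0,2,3,0,2\rangle} = 2 I_{\mathrm{C}\langle 0,3,2,0,4\rangle} + 4 I_{\mathrm{C}\langle 1,2,2,0,4\rangle} + I_{\mathrm{C}\langle 3,1,3,0,4\rangle} + I_{\mathrm{O}\langle 1,1,3,0,2\rangle} + I_{\mathrm{O}\langle 0,2,3,0,2\rangle} = 34.58\ \mathrm{cm^3/mol} $$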

Only 5 of the 15 local atom groups occur in the guaiacol (2-methoxyphenol) molecule. The predicted value is very close to the value predicted theoretically by ACD/Labs software (http://www.chemspider.com/Chemical-Structure.8657.html).

[Figure 9 drawing: guaiacol structure with each heavy atom labelled by its atom group; O<1,1,3,0,2> is the OH oxygen and O<0,2,3,0,2> the O-CH3 oxygen]

Group (x)       Increment (Ix)   Occurrence (Nx)
C<0,3,2,0,4>    3.33             2
C<1,2,2,0,4>    4.49             4
C<3,1,3,0,4>    5.73             1
O<1,1,3,0,2>    2.53             1
O<0,2,3,0,2>    1.70             1

Figure 9. Application of atomic additive scheme for prediction of molar refractivity of guaiacol.
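The arithmetic of the atomic scheme can be checked directly from the increments and occurrences listed in Figure 9. Only these five of the fifteen MRa increments are needed for guaiacol; the dictionary and function names are hypothetical.

```python
from typing import Mapping

# Increments I_x (cm3/mol) of the five MRa atom groups present in guaiacol (Figure 9).
ATOMIC_INCREMENTS = {
    "C<0,3,2,0,4>": 3.33,
    "C<1,2,2,0,4>": 4.49,
    "C<3,1,3,0,4>": 5.73,
    "O<1,1,3,0,2>": 2.53,
    "O<0,2,3,0,2>": 1.70,
}
# Occurrences N_x of each atom group in guaiacol (Figure 9).
GUAIACOL_COUNTS = {
    "C<0,3,2,0,4>": 2,
    "C<1,2,2,0,4>": 4,
    "C<3,1,3,0,4>": 1,
    "O<1,1,3,0,2>": 1,
    "O<0,2,3,0,2>": 1,
}


def additive_mr(counts: Mapping[str, int], increments: Mapping[str, float]) -> float:
    """Sum of I_x * N_x over the atom groups found in the molecule."""
    return sum(n * increments[group] for group, n in counts.items())


mr = additive_mr(GUAIACOL_COUNTS, ATOMIC_INCREMENTS)
print(f"MR(guaiacol) = {mr:.2f} cm3/mol")
```

The sum is 2·3.33 + 4·4.49 + 5.73 + 2.53 + 1.70 = 34.58 cm3/mol, matching the value given above.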

Figure 10 shows the bond-based fragmentation of guaiacol, which contains 3 of the 7 model groups. In this case, the molar refractivity is calculated as follows:

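$$ MR_{\mathrm{bond}}(\text{guaiacol}) = I_{\mathrm{C-O}} N_{\mathrm{C-O}} + I_{\mathrm{C=C}} N_{\mathrm{C=C}} + I_{\mathrm{C-C}} N_{\mathrm{C-C}} = 3 I_{\mathrm{C-O}} + 3 I_{\mathrm{C=C}} + 3 I_{\mathrm{C-C}} = 50.37\ \mathrm{cm^3/mol} $$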
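The bond-based value can be reproduced in the same way, here deriving the occurrences from an explicit bond list. This assumes guaiacol's benzene ring is represented by a Kekulé structure (alternating single and double bonds), which gives the three C=C and three C-C bonds counted in Figure 10. The atom numbering and names are hypothetical; the increments are those listed in Figure 10.

```python
from collections import Counter

# Heavy-atom bonds of guaiacol (2-methoxyphenol) in one Kekule form:
# ring atoms 1..6, 7 = hydroxyl O on atom 1, 8 = ether O on atom 2, 9 = methyl C.
ELEMENT = {1: "C", 2: "C", 3: "C", 4: "C", 5: "C", 6: "C", 7: "O", 8: "O", 9: "C"}
BONDS = [  # (atom, atom, bond order)
    (1, 2, 2), (2, 3, 1), (3, 4, 2), (4, 5, 1), (5, 6, 2), (6, 1, 1),
    (1, 7, 1), (2, 8, 1), (8, 9, 1),
]
BOND_SYMBOL = {1: "-", 2: "="}  # only single and double bonds occur here
# Bond-group increments I_x (cm3/mol) as listed in Figure 10.
BOND_INCREMENTS = {"C-O": 4.46, "C=C": 6.87, "C-C": 5.46}


def bond_group(a: int, b: int, order: int) -> str:
    """Label a bond by its sorted element symbols and bond order, e.g. 'C-O'."""
    first, second = sorted((ELEMENT[a], ELEMENT[b]))
    return f"{first}{BOND_SYMBOL[order]}{second}"


counts = Counter(bond_group(a, b, order) for a, b, order in BONDS)
mr_bond = sum(n * BOND_INCREMENTS[group] for group, n in counts.items())
print(dict(counts), f"{mr_bond:.2f}")
```

Sorting the element symbols makes the label independent of bond direction, so the O8-C9 bond is counted as C-O like the two ring C-O bonds.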

[Figure 10 drawing: guaiacol structure divided into C-O, C=C and C-C bond groups]

Group (x)    Increment (Ix)   Occurrence (Nx)
C-O          4.46             3
C=C          6.87             3
C-C          5.46             3

Figure 10. Application of bond-based scheme for prediction of molar refractivity of guaiacol.

Conclusions

The results show that the GCM software module can be used successfully for the theoretical prediction of physicochemical properties of organic compounds with the group contribution approach. The models created for molar refractivity (MR), heat of formation (Hf) and partition coefficient (logP) contain diverse chemical groups and cover a large part of chemical space, so they can be applied in various chemoinformatics tasks involving a wide and diverse range of organic compounds.

Acknowledgements

We would like to thank the Plovdiv University Scientific Fund (project Myi7-X®-027) for supporting this scientific work.

References:

Benson S. W., Cruickshank F. R., Golden D. M., Haugen G. R., O'Neal H. E., Rodgers A. S., Shaw R., and Walsh R., Additivity rules for the estimation of thermochemical properties. Chem. Rev., 69(3):279-324, 1969

Kolska Z., Zabransky M., and Randova A., Group Contribution Methods for Estimation of Selected Physico-Chemical Properties of Organic Compounds, Chapter 6 in Thermodynamics - Fundamentals and Its Application in Science, Ricardo Morales-Rodriguez (Editor), InTech, 2012

Steinbeck Ch., Han Y., Kuhn S., Horlacher O., Luttmann E., and Willighagen E., The Chemistry Development Kit (CDK): an open-source Java library for Chemo- and Bioinformatics, J. Chem. Inf. Comput. Sci., 43(2):493-500, 2003

Jeliazkova N., Jeliazkov V., AMBIT RESTful web services: an implementation of the OpenTox application programming interface, J. Cheminform., 3:18, 2011

Kode srl, Dragon version 7.0.8 (software for molecular descriptor calculation), 2017, https://chm.kode-solutions.net

Frank E., Hall M. A., and Witten I. H., The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques", Morgan Kaufmann, Fourth Edition, 2016
