Abstract
The authors recently published a robust QSPR model of aqueous solubility that combines the computationally derived molecular descriptor topographical polar surface area (TPSA) with experimentally determined melting point and logP. This model (the “TPSA model”) predicts the aqueous solubility of 87% of the compounds in a chemically diverse data set of 1265 molecules to within ± one log unit. This is comparable to the results achieved by established models of aqueous solubility, e.g., ESOL (79%) and the General Solubility Equation (81%). Hierarchical clustering of this data set according to chemical similarity shows that a significant number of molecules with phenolic and/or phenol-like moieties are poorly predicted by these equations. Modifying the TPSA model to incorporate an additional descriptor, a simple count of phenol and phenol-like moieties, improves the predictive ability within ± one log unit to 89% for the full data set (1265 compounds, −8.48 < logS < 1.58) and 82% for a reduced data set (1160 compounds, −6.00 < logS < 0.00), which excludes compounds at the sparsely populated extremities of the data range. This improvement can be rationalized as the additional descriptor acting as a correction factor that acknowledges the effect of phenolic substituents on the electronic characteristics of aromatic molecules, i.e., the generally positive contribution to aqueous solubility made by phenolic moieties.
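The "within ± one log unit" figures quoted in the abstract can be illustrated with a minimal Python sketch. It assumes the General Solubility Equation in its commonly cited form, logS = 0.5 − 0.01(MP − 25) − logP, as the predictor; it is not the TPSA model, whose coefficients are not given in the abstract. The function names `gse_log_s` and `fraction_within` are hypothetical, and the four data points are made up purely for illustration.

```python
def gse_log_s(melting_point_c: float, log_p: float) -> float:
    """General Solubility Equation (assumed form): logS = 0.5 - 0.01*(MP - 25) - logP."""
    # Compounds that melt at or below 25 °C contribute no crystal-lattice term.
    mp_term = max(melting_point_c - 25.0, 0.0)
    return 0.5 - 0.01 * mp_term - log_p


def fraction_within(predicted, observed, tolerance=1.0):
    """Fraction of compounds with |predicted - observed| logS <= tolerance."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("predicted and observed must be non-empty and equal in length")
    hits = sum(abs(p - o) <= tolerance for p, o in zip(predicted, observed))
    return hits / len(predicted)


# Made-up (melting point in °C, logP, observed logS) triples, NOT data from the paper.
toy_data = [
    (125.0, 2.0, -2.3),
    (25.0, 1.0, -0.4),
    (225.0, 3.0, -6.2),
    (75.0, 0.5, -0.6),
]

predictions = [gse_log_s(mp, log_p) for mp, log_p, _ in toy_data]
observations = [obs for _, _, obs in toy_data]

# Share of toy compounds predicted to within one log unit of the observed value.
score = fraction_within(predictions, observations)
print(f"{score:.0%} of toy compounds predicted within ±1 log unit")
```

Applied to the 1265-compound data set, the same tolerance test is the criterion behind percentages such as 87% or 89%; only the predictor changes between models.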
Original language | English |
---|---|
Pages (from-to) | 2950-2957 |
Number of pages | 8 |
Journal | Journal of Chemical Information and Modeling |
Volume | 52 |
Issue number | 11 |
Publication status | Published - 28 Oct 2012 |