Abstract
Objectives
The current study aims to determine the effect of physicochemical descriptor selection on models of polydimethylsiloxane (PDMS) permeation.
Methods
A total of 2,942 descriptors were calculated for a dataset of 77 chemicals. The data were processed to remove redundant, single-valued, imbalanced and highly correlated descriptors, yielding 1,363 relevant descriptors. For each of four independent test sets, feature selection methods were applied and the selected descriptors were modelled using a variety of machine learning methods.
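The descriptor-pruning steps described above can be sketched in numpy as follows. This is a hypothetical illustration, not the study's own code: the abstract does not give the thresholds or software used, so the 90% imbalance cut-off, the |r| ≥ 0.95 correlation cut-off and the function name filter_descriptors are assumptions.

```python
import numpy as np


def filter_descriptors(X, names, imbalance_threshold=0.9, corr_threshold=0.95):
    """Prune a descriptor matrix (rows = chemicals, columns = descriptors).

    Hypothetical thresholds: a column is 'imbalanced' if its most common value
    covers >= imbalance_threshold of the rows; two columns are 'highly
    correlated' if |Pearson r| >= corr_threshold.
    """
    X = np.asarray(X, dtype=float)
    seen = set()
    candidates = []
    for j in range(X.shape[1]):
        col = X[:, j]
        # Redundancy: skip exact duplicates of a column already examined
        key = col.tobytes()
        if key in seen:
            continue
        seen.add(key)
        _, counts = np.unique(col, return_counts=True)
        # Single value: a constant column carries no information
        if len(counts) == 1:
            continue
        # Imbalance: one value dominates the column
        if counts.max() / len(col) >= imbalance_threshold:
            continue
        candidates.append(j)
    # Correlation: greedily keep a column only if it is not highly
    # correlated with any column already kept (earlier columns win)
    kept = []
    for j in candidates:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < corr_threshold for k in kept):
            kept.append(j)
    return X[:, kept], [names[j] for j in kept]
```

Keeping the first member of each correlated pair is an arbitrary, order-dependent choice; a common alternative drops whichever member has the higher mean absolute correlation with all other descriptors.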
Key findings
Two sets of molecular descriptors that can provide improved predictions compared with existing models have been identified. The best permeation predictions were obtained with Gaussian process methods. The selected descriptors identify lipophilicity, partial charge and hydrogen bonding as key determinants of PDMS permeation.
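Gaussian process (GP) regression, the family that gave the best predictions here, can be illustrated with a minimal exact-GP sketch in numpy. The abstract does not state the kernel, hyperparameters or software used; the squared-exponential (RBF) kernel, the fixed length_scale, variance and noise values, and the function names rbf_kernel and gp_predict are all assumptions for illustration. Each row of X_train would hold the selected descriptors for one chemical and y_train a measured permeation value.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    # Squared-exponential (RBF) covariance between rows of A and rows of B
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_predict(X_train, y_train, X_test, length_scale=1.0, variance=1.0, noise=1e-2):
    """Posterior mean and latent variance of a GP with an RBF kernel.

    Targets are centred on their training mean, so the GP prior has zero
    mean on the centred data. Hyperparameters are fixed, not fitted.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y = np.asarray(y_train, dtype=float)
    y_mean = y.mean()
    # Training covariance plus observation noise on the diagonal
    K = rbf_kernel(X_train, X_train, length_scale, variance) + noise * np.eye(len(X_train))
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} (y - mean), via two solves with the Cholesky factor
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - y_mean))
    K_s = rbf_kernel(X_train, X_test, length_scale, variance)
    mean = K_s.T @ alpha + y_mean
    v = np.linalg.solve(L, K_s)
    # Prior variance k(x, x) = variance for the RBF kernel; clip rounding noise
    var = np.maximum(variance - np.sum(v**2, axis=0), 0.0)
    return mean, var
```

In practice the hyperparameters would normally be fitted, for example by maximising the log marginal likelihood, and descriptors standardised first, since a single shared length scale assumes all descriptors vary on a comparable scale.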
Conclusions
This study highlights important considerations in the development of relevant models and in the construction and use of the datasets on which they are built, in particular that highly correlated descriptors should be removed. Predictive models are improved by the methodology adopted in this study, notably the systematic evaluation of descriptors, rather than simply using any and all available descriptors, a choice often made empirically on the basis of in vitro experiments. These findings also have clear relevance to a number of other fields.
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 873-888 |
Number of pages | 15 |
Journal | Journal of Pharmacy and Pharmacology |
Volume | 72 |
Issue number | 7 |
Early online date | 8 Apr 2020 |
Publication status | Published - Jul 2020 |