Molecular Descriptors For Cheminformatics Pdf To Excel
Abstract
We present four models of solution free-energy prediction for druglike molecules utilizing cheminformatics descriptors and theoretically calculated thermodynamic values. We make predictions of solution free energy using physics-based theory alone and using machine learning/quantitative structure–property relationship (QSPR) models. We also develop machine learning models where the theoretical energies and cheminformatics descriptors are used as combined input. These models are used to predict solution free energy and hence log S. While direct theoretical calculation does not give accurate results in this approach, machine learning is able to give predictions with a root mean squared error (RMSE) of ∼1.1 log S units in a 10-fold cross-validation for our Drug-Like-Solubility-100 (DLS-100) dataset of 100 druglike molecules. We find that a model built using energy terms from our theoretical methodology as descriptors is marginally less predictive than one built on Chemistry Development Kit (CDK) descriptors. Combining both sets of descriptors allows a further but very modest improvement in the predictions. However, in some cases, this is a statistically significant enhancement. These results suggest that there is little complementarity between the chemical information provided by these two sets of descriptors, despite their different sources and methods of calculation. Our machine learning models are also able to predict the well-known Solubility Challenge dataset with an RMSE value of 0.9–1.0 log S units.
Molecules and Solubility
Crystal Structure and Gas-Phase Calculations
Enthalpy of sublimation:

$$\Delta H_{\mathrm{sub}} = -U_{\mathrm{latt}} - 2RT \qquad (1)$$

where $U_{\mathrm{latt}}$ is the lattice energy (the energy of the crystal, assumed static and at 0 K, relative to infinitely separated molecules) and the −2RT term arises from lattice vibrational energy.(2, 36) The entropy of sublimation was calculated by:

Entropy of sublimation:

$$\Delta S_{\mathrm{sub}} = S_{\mathrm{rot}} + S_{\mathrm{trans}} - S_{\mathrm{crys}} \qquad (2)$$

where $S_{\mathrm{rot}}$ is the rotational entropy in the gas phase and $S_{\mathrm{trans}}$ is the translational entropy in the gas phase. $S_{\mathrm{crys}}$ is the entropy of phonon vibrations within the crystal. The use of eq 2 makes these assumptions: (i) the rotational and translational entropy of the crystal is minimal, (ii) there is no change in electronic entropy between phases, and (iii) the intramolecular entropy is constant between the two phases. The crystal entropy is calculated by locating the frequencies of the phonon normal modes (lattice vibrations) at the gamma point. This is achieved using lattice dynamics, the results of which are used to calculate the Helmholtz free energy (see eqs S2 and S3 in the Supporting Information).

Gibbs free energy of sublimation:

$$\Delta G_{\mathrm{sub}} = \Delta H_{\mathrm{sub}} - T\,\Delta S_{\mathrm{sub}} \qquad (3)$$

Solution-Phase Calculations
Gibbs free energy of hydration:

$$\Delta G_{\mathrm{hyd}} = E_{\mathrm{solution}} - E_{\mathrm{gaseous}} \qquad (4)$$

where $E_{\mathrm{solution}}$ is the total energy of the system in the SMD solvation model and $E_{\mathrm{gaseous}}$ is the total energy of the system in a vacuum. Scheme 1 represents the workflow for making such predictions.

Standard States
Theoretical Log S Prediction
Our final solution free-energy prediction is then given as the sum of the predicted sublimation and hydration free energies:

Gibbs free energy of solution:

$$\Delta G_{\mathrm{sol}} = \Delta G_{\mathrm{sub}} + \Delta G_{\mathrm{hyd}} \qquad (5)$$
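To make the bookkeeping of eqs 1–5 concrete, the Python sketch below chains the terms into a single log S estimate. It is an illustration, not the authors' pipeline. Assumptions: all energies have already been converted to J/mol (SMD energies from a quantum chemistry code are usually in hartree and would need converting first), T = 298.15 K, and the last step uses the textbook relation log S = −ΔG_sol/(RT ln 10) without the standard-state corrections discussed under Standard States. The function names and the input values are hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1
T_DEFAULT = 298.15  # temperature in K (assumed)


def enthalpy_of_sublimation(u_latt, temperature=T_DEFAULT):
    """Eq 1: dH_sub = -U_latt - 2RT; U_latt is negative for a bound crystal."""
    return -u_latt - 2.0 * R * temperature


def entropy_of_sublimation(s_rot, s_trans, s_crys):
    """Eq 2: gas-phase rotational + translational entropy minus crystal phonon entropy."""
    return s_rot + s_trans - s_crys


def gibbs_of_sublimation(dh_sub, ds_sub, temperature=T_DEFAULT):
    """Eq 3: dG_sub = dH_sub - T * dS_sub."""
    return dh_sub - temperature * ds_sub


def gibbs_of_hydration(e_solution, e_gaseous):
    """Eq 4: total energy in the SMD continuum minus total energy in vacuum."""
    return e_solution - e_gaseous


def log_s(dg_sol, temperature=T_DEFAULT):
    """Assumed conversion log10 S = -dG_sol / (RT ln 10); no standard-state correction."""
    return -dg_sol / (R * temperature * math.log(10))


# Hypothetical inputs in J/mol and J/(mol K), for illustration only.
dh = enthalpy_of_sublimation(u_latt=-120_000.0)
ds = entropy_of_sublimation(s_rot=120.0, s_trans=160.0, s_crys=100.0)
dg_sub = gibbs_of_sublimation(dh, ds)
dg_hyd = gibbs_of_hydration(e_solution=-500_040_000.0, e_gaseous=-500_000_000.0)
dg_sol = dg_sub + dg_hyd  # eq 5
print(f"dG_sol = {dg_sol / 1000:.1f} kJ/mol, log S = {log_s(dg_sol):.2f}")
```

Keeping each equation in its own function mirrors the thermodynamic cycle: a poor lattice energy or a poor SMD hydration energy can be swapped out independently, which is also what makes the individual terms usable as machine-learning descriptors.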
Informatics Models
Data Preprocessing
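The results tables compare models trained on raw descriptors, on descriptors scaled by mean and standard deviation, and on PCA-transformed descriptors. The NumPy sketch below shows one way to produce the two transformed inputs; it is not the authors' R code. Assumptions: scaling and PCA parameters are estimated on the training split only and reused for the test split, constant descriptors are left unscaled, PCA is applied to the standardized descriptors, and the number of retained components (`n_components`) is a free, hypothetical parameter. The random matrix stands in for a real descriptor table.

```python
import numpy as np


def standardize(x_train, x_test):
    """Scale each descriptor column by the training-set mean and standard deviation."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant descriptors unscaled instead of dividing by zero
    return (x_train - mean) / std, (x_test - mean) / std


def pca_scores(x_train, x_test, n_components):
    """Project standardized descriptors onto the leading principal axes of the training set."""
    z_train, z_test = standardize(x_train, x_test)
    # Rows of vt are the principal axes of the (already centred) training matrix,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(z_train, full_matrices=False)
    axes = vt[:n_components].T
    return z_train @ axes, z_test @ axes


rng = np.random.default_rng(0)
descriptors = rng.normal(size=(100, 8))  # stand-in for a 100-molecule descriptor table
train, test = descriptors[:90], descriptors[90:]
z_train, z_test = standardize(train, test)
p_train, p_test = pca_scores(train, test, n_components=3)
print(z_train.shape, p_train.shape, p_test.shape)
```

Estimating the mean, standard deviation and principal axes on the training rows only keeps the test rows unseen, which matters when these transforms are applied inside cross-validation.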
Machine Learning Regression Models
Partial Least Squares Regression
Random Forest Regression
Support Vector Regression
The main idea in Support Vector Regression (SVR) is to minimize the risk according to the structural risk minimization principle(44) from statistical learning theory, so as to generalize well from the limited patterns available in the given data. First, the given data D are mapped onto a higher dimensional feature space using a kernel function k(x_i, x_j), and a predictive function is then computed on a subset of the data points, the support vectors. Here, we have used the radial basis kernel function (eq 7) to map the data onto a higher dimensional space. A graphical representation is supplied in the Supporting Information (Figure S1(B)).

SVR mapping on radial basis kernel function:

$$k(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\gamma \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2\right) \qquad (7)$$

where γ > 0 controls the width of the kernel.
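The radial basis kernel of eq 7 can be evaluated for every pair of molecules at once. This NumPy sketch builds the kernel (Gram) matrix that an SVR solver works with. It assumes the γ-parametrised form of the kernel; the same kernel is also often written with a width σ, where γ = 1/(2σ²). The function name is our own and does not correspond to the authors' R code.

```python
import numpy as np


def rbf_kernel_matrix(a, b, gamma):
    """Return K with K[i, j] = exp(-gamma * ||a_i - b_j||^2) for rows a_i of a and b_j of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # ||a_i - b_j||^2 = ||a_i||^2 + ||b_j||^2 - 2 a_i . b_j, computed for all pairs at once
    sq_dist = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    sq_dist = np.maximum(sq_dist, 0.0)  # remove tiny negative values caused by round-off
    return np.exp(-gamma * sq_dist)


x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])  # three toy descriptor vectors
k = rbf_kernel_matrix(x, x, gamma=0.5)
print(np.round(k, 3))
```

Every diagonal entry is 1 and off-diagonal entries decay with descriptor-space distance. A small γ gives a smooth, nearly linear model; a large γ makes each support vector's influence very local, which is why γ is usually tuned by cross-validation.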
Statistical Measures
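The tables report RMSE (in log S units) and R². A plain-Python sketch of both follows. We assume R² here denotes the squared Pearson correlation between predicted and experimental log S; that reading is consistent with the theoretical predictions table, where R² stays positive even though the RMSE (2.9–4.0 log S units) is large, but the exact definition is not shown in this section. The coefficient of determination, 1 − SS_res/SS_tot, is the other common definition and is included for comparison. The data values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between experimental and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def squared_pearson(y_true, y_pred):
    """Squared Pearson correlation coefficient (assumed meaning of R^2)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov * cov / (var_t * var_p)


def coefficient_of_determination(y_true, y_pred):
    """1 - SS_res / SS_tot, shown for comparison."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


experimental = [-1.0, -2.0, -3.0, -4.0]
predicted = [-3.0, -4.0, -5.0, -6.0]  # perfectly correlated but 2 log units too low
print(rmse(experimental, predicted))
print(squared_pearson(experimental, predicted))
print(coefficient_of_determination(experimental, predicted))
```

The example makes the distinction concrete: a constant 2-unit offset leaves the squared correlation at 1, while the RMSE is 2 and the coefficient of determination is negative.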
10-Fold Cross-Validation
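The abstract quotes RMSE from 10-fold cross-validation, and the machine-learning results table gives each value with a standard deviation. The sketch below shows one common way to obtain such numbers: shuffle the molecules, split them into 10 folds, predict each fold from a model trained on the other nine, pool the out-of-fold predictions into one RMSE, and repeat with different random seeds. Pooling the predictions (rather than averaging per-fold RMSEs), the number of repeats, the toy mean-value baseline and the synthetic data are assumptions for illustration; the helper names are hypothetical.

```python
import math
import random
import statistics


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for k shuffled, near-equal folds."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def cross_validated_rmse(x, y, fit, k=10, seed=0):
    """Pool out-of-fold predictions from k-fold CV and return their RMSE.

    `fit(x_train, y_train)` must return a callable that predicts y for one x.
    """
    predictions = [0.0] * len(y)
    for train, test in k_fold_splits(len(y), k, seed):
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in test:
            predictions[i] = model(x[i])
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, y)) / len(y))


def fit_mean_baseline(x_train, y_train):
    """Toy model: always predict the training-set mean log S."""
    mean = sum(y_train) / len(y_train)
    return lambda _x: mean


rng = random.Random(42)
x = [[rng.gauss(0, 1)] for _ in range(100)]  # hypothetical single descriptor
y = [-3.0 + 1.5 * row[0] + rng.gauss(0, 0.5) for row in x]  # hypothetical log S values
scores = [cross_validated_rmse(x, y, fit_mean_baseline, seed=s) for s in range(5)]
print(f"RMSE = {statistics.mean(scores):.2f} ± {statistics.stdev(scores):.2f}")
```

Any preprocessing such as scaling or PCA should be fitted inside each training fold, so that information from the held-out fold does not leak into the model.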
10-Fold Cross-Validation for Parameter Tuning
Assessing the Final Models by 10-Fold Cross-Validation
Dataset
Theoretical Predictions
Statistic | DMACRYS + SMD(M06-2X) | DMACRYS + SMD(HF) |
---|---|---|
RMSE (log S units) | 4.045 | 2.946 |
R² | 0.252 | 0.327 |
Theoretical Energies as Sole Descriptors in Machine Learning
Cheminformatics Descriptors as the Sole Input to Machine Learning
Theoretical Energies and Cheminformatics Descriptors as Input to Machine Learning
Machine Learning Solubility Challenge | raw data ± stdev | scaled by mean/stdev ± stdev | scaled by PCA ± stdev |
---|---|---|---|
PLS | 1.08 ± 0.04 | 1.03 ± 0.02 | 1.15 ± 0.01 |
RF | 0.93 ± 0.01 | 0.93 ± 0.01 | 1.12 ± 0.01 |
SVR | 1.17 ± 0.04 | 0.93 ± 0.02 | 0.95 ± 0.02 |
Solubility Challenge | raw data | scaled by mean/stdev | scaled by PCA |
---|---|---|---|
PLS | 0.89 | 0.91 | 0.91 |
RF | 0.93 | 1.03 | 1.02 |
SVR | 1.08 | 1.07 | 1.08 |
Informatics_Solubilty_datasets_and_scripts.zip, including R code, Bash scripts, Python scripts, an Excel macro (.xlsb), DLS-100.csv and Solubility_Challenge_dataset.xlsx. DLS-100.csv contains experimental log S values, references, SMILES, sources of the SMILES, CSD refcodes, molecule names, InChI and ChemSpider numbers. SI_document.pdf: structure data, 2D images of the molecular structures, experimental log S values, CSD refcodes, R², statistical significance and variable importance. This material is available free of charge via the Internet at http://pubs.acs.org. All scripts and datasets used in this work are available for download from the Mitchell Group web server (http://chemistry.st-andrews.ac.uk/staff/jbom/group/Informatics_Solubility.html) as well as in the Supporting Information.
The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript. Datasets were curated by J.L.McD. and J.B.O.M. Machine learning R scripts were produced by N.N. and L.D.F. The Taverna workflow was produced by L.D.F. Bash scripts, Excel macros, and the R script to run over multiple directories were produced by J.L.McD. DMACRYS and Gaussian calculations were run by J.L.McD. Advice on computational chemistry and machine learning methods was provided by T.v.M. and J.B.O.M. R calculations were run by J.L.McD. and N.N.
The authors declare no competing financial interest.
This work was partly supported by the Scottish Universities Life Sciences Alliance (SULSA), the Biotechnology and Biological Sciences Research Council (BBSRC) (No. BB/I00596X/1), and the Scottish Funding Council (SFC). We thank EaStCHEM for access to the EaStCHEM Research Computing Facility, and Dr. Herbert Früchtl for its maintenance. We are grateful to Dr. Graeme Day (University of Southampton) for providing additional scripts for DMACRYS. We thank Dr. David Palmer (University of Strathclyde) for a script to help automate running DMACRYS. We also thank our colleagues at the University of St. Andrews for useful discussions, particularly Dr. Lazaros Mavridis and Rachael Skyner. We thank the BBSRC for Grant No. BB/I00596X/1 to J.B.O.M., which supports L.D.F.’s research. We thank the Scottish Universities Life Sciences Alliance (SULSA) for supporting J.B.O.M., J.L.McD., and N.N., and we also thank the Scottish Overseas Research Student Awards Scheme of the Scottish Funding Council (SFC) for financial support of N.N.’s studentship.
Abbreviations
References
- 1. Palmer, D. S.; McDonagh, J. L.; Mitchell, J. B. O.; van Mourik, T.; Fedorov, M. V. First-Principles Calculation of the Intrinsic Aqueous Solubility of Crystalline Druglike Molecules. J. Chem. Theory Comput. 2012, 8 (9), 3322–3337.
- 2. Palmer, D. S.; Llinas, A.; Morao, I.; Day, G. M.; Goodman, J. M.; Glen, R. C.; Mitchell, J. B. O. Predicting intrinsic aqueous solubility by a thermodynamic cycle. Mol. Pharm. 2008, 5 (2), 266–279.
- 3. Mitchell, J. B. O. Informatics, machine learning and computational medicinal chemistry. Future Med. Chem. 2011, 3 (4), 451–467.
- 4. Hughes, L. D.; Palmer, D. S.; Nigsch, F.; Mitchell, J. B. O. Why Are Some Properties More Difficult To Predict than Others? A Study of QSPR Models of Solubility, Melting Point, and Log P. J. Chem. Inf. Model. 2008, 48 (1), 220–232.
- 5. (a) Tetko, I. V. Computing chemistry on the web. Drug Discovery Today 2005, 10 (22), 1497–1500. (b) Tetko, I. V.; Gasteiger, J.; Todeschini, R.; Mauri, A.; Livingstone, D.; Ertl, P.; Palyulin, V. A.; Radchenko, E. V.; Zefirov, N. S.; Makarenko, A. S.; Tanchuk, V. Y.; Prokopenko, V. V. Virtual computational chemistry laboratory - design and description. J. Comput.-Aided Mol. Des. 2005, 19 (6), 453–463.
- 6. Price, S. L.; Leslie, M.; Welch, G. W. A.; Habgood, M.; Price, L. S.; Karamertzanis, P. G.; Day, G. M. Modelling organic crystal structures using distributed multipole and polarizability-based model intermolecular potentials. Phys. Chem. Chem. Phys. 2010, 12 (30), 8478–8490.
- 7. Segall, M. D.; Lindan, P. J. D.; Probert, M. J.; Pickard, C. J.; Hasnip, P. J.; Clark, S. J.; Payne, M. C. First-principles simulation: Ideas, illustrations and the CASTEP code. J. Phys.: Condens. Matter 2002, 14 (11), 2717–2744.
- 8. Dovesi, R.; Orlando, R.; Civalleri, B.; Roetti, C.; Saunders, V. R.; Zicovich-Wilson, C. M. CRYSTAL: a computational tool for the ab initio study of the electronic properties of crystals. Z. Kristallogr. 2005, 220 (5–6), 571–573.
- 9. (a) Luder, K.; Lindfors, L.; Westergren, J.; Nordholm, S.; Kjellander, R. In silico prediction of drug solubility: 2. Free energy of solvation in pure melts. J. Phys. Chem. B 2007, 111 (7), 1883–1892. (b) Luder, K.; Lindfors, L.; Westergren, J.; Nordholm, S.; Kjellander, R. In silico prediction of drug solubility. 3. Free energy of solvation in pure amorphous matter. J. Phys. Chem. B 2007, 111 (25), 7303–7311. (c) Luder, K.; Lindfors, L.; Westergren, J.; Nordholm, S.; Persson, R.; Pedersen, M. In Silico Prediction of Drug Solubility: 4. Will Simple Potentials Suffice? J. Comput. Chem. 2009, 30 (12), 1859–1871. (d) Westergren, J.; Lindfors, L.; Hoglund, T.; Luder, K.; Nordholm, S.; Kjellander, R. In silico prediction of drug solubility: 1. Free energy of hydration. J. Phys. Chem. B 2007, 111 (7), 1872–1882.
- 10. Tomasi, J.; Mennucci, B.; Cammi, R. Quantum Mechanical Continuum Solvation Models. Chem. Rev. 2005, 105 (8), 2999–3094.
- 11. (a) Ten-no, S. Free energy of solvation for the reference interaction site model: Critical comparison of expressions. J. Chem. Phys. 2001, 115 (8), 3724–3731. (b) Palmer, D. S.; Frolov, A. I.; Ratkova, E. L.; Fedorov, M. V. Towards a universal method for calculating hydration free energies: A 3D reference interaction site model with partial molar volume correction. J. Phys.: Condens. Matter 2010, 22 (49), 492101.
- 12 Stanton, R. V.; Hartsough, D. S.; Merz, K. M. Calculation of solvation free energies using a density functional/molecular dynamics coupled potential. J. Phys. Chem. 1993, 97 (46), 11868–11870.
- 13 Ratkova, E. L.; Fedorov, M. V. Combination of RISM and Cheminformatics for Efficient Predictions of Hydration Free Energy of Polyfragment Molecules: Application to a Set of Organic Pollutants. J. Chem. Theory Comput. 2011, 7 (5), 1450–1457.
- 14 Allen, F. H. The Cambridge Structural Database: a quarter of a million crystal structures and rising. Acta Crystallogr., Sect. B: Struct. Sci. 2002, 58 (3 Part 1), 380–388.
- 15 Box, K.; Comer, J. E.; Gravestock, T.; Stuart, M. New Ideas about the Solubility of Drugs. Chem. Biodiversity 2009, 6 (11), 1767–1788.
- 16 (a) Hopfinger, A. J.; Esposito, E. X.; Llinàs, A.; Glen, R. C.; Goodman, J. M. Findings of the Challenge To Predict Aqueous Solubility. J. Chem. Inf. Model. 2008, 49 (1), 1–5. (b) Llinàs, A.; Glen, R. C.; Goodman, J. M. Solubility Challenge: Can You Predict Solubilities of 32 Molecules Using a Database of 100 Reliable Measurements? J. Chem. Inf. Model. 2008, 48 (7), 1289–1303.
- 17 The Goodman group. http://www-jmg.ch.cam.ac.uk/data/solubility/ (accessed Feb. 8, 2013).
- 18 Narasimham, L.; Barhate, V. D. Kinetic and intrinsic solubility determination of some β-blockers and antidiabetics by potentiometry. J. Pharm. Res. 2011, 4 (2), 532–536.
- 19 (a) Bergström, C. A. S.; Luthman, K.; Artursson, P. Accuracy of calculated pH-dependent aqueous drug solubility. Eur. J. Pharm. Sci. 2004, 22 (5), 387–398. (b) Bergström, C. A. S.; Wassvik, C. M.; Norinder, U.; Luthman, K.; Artursson, P. Global and Local Computational Models for Aqueous Solubility Prediction of Drug-Like Molecules. J. Chem. Inf. Comput. Sci. 2004, 44 (4), 1477–1488. (c) Ran, Y.; Yalkowsky, S. H. Prediction of Drug Solubility by the General Solubility Equation (GSE). J. Chem. Inf. Comput. Sci. 2001, 41 (2), 354–357. (d) Rytting, E.; Lentz, K.; Chen, X.-Q.; Qian, F.; Venkatesh, S. Aqueous and cosolvent solubility data for drug-like organic compounds. AAPS J. 2005, 7 (1), E78–E105. (e) Shareef, A.; Angove, M. J.; Wells, J. D.; Johnson, B. B. Aqueous Solubilities of Estrone, 17β-Estradiol, 17α-Ethynylestradiol, and Bisphenol A. J. Chem. Eng. Data 2006, 51 (3), 879–881.
- 20 CrystalWeb (unfortunately withdrawn in 2013). http://cds.dl.ac.uk/cds/datasets/crys/cweb/cweb.html.
- 21 Bruno, I. J.; Cole, J. C.; Edgington, P. R.; Kessler, M.; Macrae, C. F.; McCabe, P.; Pearson, J.; Taylor, R. New software for searching the Cambridge Structural Database and visualizing crystal structures. Acta Crystallogr., Sect. B: Struct. Sci. 2002, 58 (3 Part 1), 389–397.
- 22 Steinbeck, C.; Han, Y.; Kuhn, S.; Horlacher, O.; Luttmann, E.; Willighagen, E. The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics. J. Chem. Inf. Comput. Sci. 2003, 43 (2), 493–500.
- 23 Gupta, R. R.; Gifford, E. M.; Liston, T.; Waller, C. L.; Hohman, M.; Bunin, B. A.; Ekins, S. Using Open Source Computational Tools for Predicting Human Metabolic Stability and Additional Absorption, Distribution, Metabolism, Excretion, and Toxicity Properties. Drug Metab. Dispos. 2010, 38 (11), 2083–2090.
- 24 O’Boyle, N. Towards a Universal SMILES representation - A standard method to generate canonical SMILES based on the InChI. J. Cheminform. 2012, 4 (1), 22.
- 25 RSC ChemSpider. (accessed Feb. 8, 2013).
- 26 Wolstencroft, K.; Haines, R.; Fellows, D.; Williams, A.; Withers, D.; Owen, S.; Soiland-Reyes, S.; Dunlop, I.; Nenadic, A.; Fisher, P.; Bhagat, J.; Belhajjame, K.; Bacall, F.; Hardisty, A.; Nieva de la Hidalga, A.; Balcazar Vargas, M. P.; Sufi, S.; Goble, C. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Res. 2013, 41 (W1), W557–W561.
- 27 Little, J. L.; Williams, A. J.; Pshenichnov, A.; Tkachenko, V. Identification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpider. J. Am. Soc. Mass Spectrom. 2012, 23 (1), 179–185.
- 28 Goble, C. A.; Bhagat, J.; Aleksejevs, S.; Cruickshank, D.; Michaelides, D.; Newman, D.; Borkum, M.; Bechhofer, S.; Roos, M.; Li, P.; De Roure, D. myExperiment: A repository and social network for the sharing of bioinformatics workflows. Nucleic Acids Res. 2010, 38 (suppl 2), W677–W682.
- 29 De Ferrari, L. Workflow Entry: From molecule name to SMILE and InchI using ChemSpider. http://www.myexperiment.org/workflows/3603.html (accessed Feb. 10, 2014).
- 30 Griseofulvin. http://en.wikipedia.org/wiki/Griseofulvin (accessed Dec. 11, 2012; SMILES source).
- 31 Glipizide. http://en.wikipedia.org/wiki/Glipizide (accessed Dec. 11, 2012; SMILES source).
- 32 Stone, A. Distributed Multipole Analysis of Gaussian wavefunctions, GDMA version 2.2.02. http://www-stone.ch.cam.ac.uk/documentation/gdma/manual.pdf (accessed Feb. 10, 2014).
- 33 Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Scalmani, G.; Barone, V.; Mennucci, B.; Petersson, G. A.; Nakatsuji, H.; Caricato, M.; Li, X.; Hratchian, H. P.; Izmaylov, A. F.; Bloino, J.; Zheng, G.; Sonnenberg, J. L.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Vreven, T.; Montgomery, J. A., Jr.; Peralta, J. E.; Ogliaro, F.; Bearpark, M.; Heyd, J. J.; Brothers, E.; Kudin, K. N.; Staroverov, V. N.; Kobayashi, R.; Normand, J.; Raghavachari, K.; Rendell, A.; Burant, J. C.; Iyengar, S. S.; Tomasi, J.; Cossi, M.; Rega, N.; Millam, N. J.; Klene, M.; Knox, J. E.; Cross, J. B.; Bakken, V.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R. E.; Yazyev, O.; Austin, A. J.; Cammi, R.; Pomelli, C.; Ochterski, J. W.; Martin, R. L.; Morokuma, K.; Zakrzewski, V. G.; Voth, G. A.; Salvador, P.; Dannenberg, J. J.; Dapprich, S.; Daniels, A. D.; Farkas, Ö.; Foresman, J. B.; Ortiz, J. V.; Cioslowski, J.; Fox, D. J. Gaussian 09; Gaussian, Inc.: Wallingford, CT, 2009.
- 34 Stone, A. J. Distributed multipole analysis, or how to describe a molecular charge distribution. Chem. Phys. Lett. 1981, 83 (2), 233–239.
- 35 Buckingham, R. The classical equation of state of gaseous helium, neon and argon. Proc. R. Soc. London, Ser. A 1938, 168 (933), 264–283.
- 36 Gavezzotti, A.; Filippini, G. In Theoretical Aspects and Computer Modeling; Gavezzotti, A., Ed.; Wiley and Sons: Chichester, 1997; pp 61–97.
- 37 Marenich, A. V.; Cramer, C. J.; Truhlar, D. G. Universal Solvation Model Based on Solute Electron Density and on a Continuum Model of the Solvent Defined by the Bulk Dielectric Constant and Atomic Surface Tensions. J. Phys. Chem. B 2009, 113 (18), 6378–6396.
- 38 (a) Ben-Naim, A. Standard thermodynamics of transfer. Uses and misuses. J. Phys. Chem. 1978, 82 (7), 792–803. (b) Ben-Naim, A.; Marcus, Y. Solvation thermodynamics of nonionic solutes. J. Chem. Phys. 1984, 81 (4), 2016–2027.
- 39 Howley, T.; Madden, M. G.; O’Connell, M.-L.; Ryder, A. G. The effect of principal component analysis on machine learning accuracy with high-dimensional spectral data. Knowl.-Based Syst. 2006, 19 (5), 363–370.
- 40 Wold, H. Partial Least Squares (PLS) Regression. 2003, 1–7.
- 41 (a) Abdi, H. Partial Least Squares (PLS) Regression. 2003, 1–7. (b) Wold, S.; Sjöström, M.; Eriksson, L. PLS-regression: A basic tool of chemometrics. Chemometr. Intell. Lab. Syst. 2001, 58 (2), 109–130. (c) Mevik, B.; Wehrens, R. The pls Package: Principal Component and Partial Least Squares Regression in R. J. Stat. Softw. 2007, 18 (2), 1–24.
- 42 (a) Palmer, D. S.; O’Boyle, N. M.; Glen, R. C.; Mitchell, J. B. O. Random Forest Models To Predict Aqueous Solubility. J. Chem. Inf. Model. 2007, 47 (1), 150–158. (b) Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J. C.; Sheridan, R. P.; Feuston, B. P. Random Forest: A Classification and Regression Tool for Compound Classification and QSAR Modeling. J. Chem. Inf. Comput. Sci. 2003, 43 (6), 1947–1958.
- 43 Breiman, L. Random Forests. Mach. Learn. 2001, 45 (1), 5–32.
- 44 (a) Cao, D.-S.; Xu, Q.-S.; Liang, Y.-Z.; Chen, X.; Li, H.-D. Prediction of aqueous solubility of druglike organic compounds using partial least squares, back-propagation network and support vector machine. J. Chemometr. 2010, 24 (9), 584–595. (b) Vapnik, V. N. An overview of statistical learning theory. IEEE Trans. Neural Netw. 1999, 10 (5), 988–999.
- 45 Hu, S. R² vs. r². In SCEA/ISPA Conference, 2008; pp 1–15.
- 46 Menke, J.; Martinez, T. R. Using permutations instead of Student’s t distribution for p-values in paired-difference algorithm comparisons. In IEEE IJCNN, July 25–29, 2004; Vol. 2, pp 1331–1335.
- 47 Nath, N.; Mitchell, J. B. O. Is EC class predictable from reaction mechanism? BMC Bioinformatics 2012, 13 (1), 60.
- 48 Kuhn, M. Variable Importance Using the caret Package, 2012. Available via the Internet at http://cran.open-source-solution.org/web/packages/caret/vignettes/caretVarImp.pdf (accessed Feb 10, 2014).
- 49 Kuhn, M. Variable Importance Using the caret Package. 2010, 1–7.
- 50 Varma, S.; Simon, R. Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics 2006, 7 (1), 91.
- 51 Simon, R. M.; Subramanian, J.; Li, M.-C.; Menezes, S. Using cross-validation to evaluate predictive accuracy of survival risk classifiers based on high-dimensional data. Brief. Bioinform. 2011, 12 (3), 203–214.
- 52 R Development Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2011.
- 53 (a) Kuhn, M. Building Predictive Models in R Using the caret Package. J. Stat. Softw. 2008, 28, 1–26. (b) Kuhn, M.; Wing, J.; Weston, S.; Williams, A.; Keefer, C.; Engelhardt, A.; Cooper, T.; Mayer, Z.; R Core Team. caret: Classification and Regression Training. R package “caret”. http://CRAN.R-project.org/package=caret.
- 54 Walters, W. P. Modeling, Informatics, and the Quest for Reproducibility. J. Chem. Inf. Model. 2013, 53 (7), 1529–1530.
- 55 (a) Dearden, J. C. In silico prediction of aqueous solubility. Expert Opin. Drug Discovery 2006, 1 (1), 31–52. (b) Jorgensen, W. L.; Duffy, E. M. Prediction of drug solubility from structure. Adv. Drug Delivery Rev. 2002, 54 (3), 355–366.
- 56 Lusci, A.; Pollastri, G.; Baldi, P. Deep Architectures and Deep Learning in Chemoinformatics: The Prediction of Aqueous Solubility for Drug-Like Molecules. J. Chem. Inf. Model. 2013, 53 (7), 1563–1575.
- 57 Wang, R.; Gao, Y.; Lai, L. Calculating partition coefficient by atom-additive method. Perspect. Drug Discovery Des. 2000, 19 (1), 47–66.
- 58 Kier, L. B.; Hall, L. H. Molecular Connectivity in Chemistry and Drug Research; Academic Press: New York, 1976.
- 59 Moreau, G.; Broto, P. The autocorrelation of a topological structure: A new molecular descriptor. New J. Chem. 1980, 359–360.
- 60 Randic, M. On molecular identification numbers. J. Chem. Inf. Comput. Sci. 1984, 24 (3), 164–175.
- 61 CDK Descriptor Summary (2011-05-28). http://pele.farmbio.uu.se/nightly-1.2.x/dnames.html (accessed Feb 10, 2014).
- 62 Hewitt, M.; Cronin, M. T. D.; Enoch, S. J.; Madden, J. C.; Roberts, D. W.; Dearden, J. C. In Silico Prediction of Aqueous Solubility: The Solubility Challenge. J. Chem. Inf. Model. 2009, 49 (11), 2572–2587.
- 63 Tsvetkova, B.; Pencheva, I.; Zlatkov, A.; Peikov, P. High Performance Liquid Chromatographic Assay of Indomethacin and its Related Substances in Tablet Dosage Forms. Int. J. Pharm. Pharm. Sci. 2012, 4 (Suppl. 3), 549–552.
This article is cited by 27 publications.
- Xin Yang, Yifei Wang, Ryan Byrne, Gisbert Schneider, Shengyong Yang. Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery. Chemical Reviews 2019, Article ASAP.
- Dipankar Roy, Andriy Kovalenko. Performance of 3D-RISM-KH in Predicting Hydration Free Energy: Effect of Solute Parameters. The Journal of Physical Chemistry A 2019, 123 (18), 4087–4093. DOI: 10.1021/acs.jpca.9b01623.
- James L. McDonagh, Arnaldo F. Silva, Mark A. Vincent, Paul L. A. Popelier. Machine Learning of Dynamic Electron Correlation Energies from Topological Atoms. Journal of Chemical Theory and Computation 2018, 14 (1), 216–224. DOI: 10.1021/acs.jctc.7b01157.
- Hannes K. Buchholz, Rebecca K. Hylton, Jan Gerit Brandenburg, Andreas Seidel-Morgenstern, Heike Lorenz, Matthias Stein, Sarah L. Price. Thermochemistry of Racemic and Enantiopure Organic Crystals for Predicting Enantiomer Separation. Crystal Growth & Design 2017, 17 (9), 4676–4686. DOI: 10.1021/acs.cgd.7b00582.
- Sereina Riniker. Molecular Dynamics Fingerprints (MDFP): Machine Learning from MD Data To Predict Free-Energy Differences. Journal of Chemical Information and Modeling 2017, 57 (4), 726–741. DOI: 10.1021/acs.jcim.6b00778.
- Sungjin Kim, Adrián Jinich, Alán Aspuru-Guzik. MultiDK: A Multiple Descriptor Multiple Kernel Approach for Molecular Discovery and Its Application to Organic Flow Battery Electrolytes. Journal of Chemical Information and Modeling 2017, 57 (4), 657–668. DOI: 10.1021/acs.jcim.6b00332.
- James L. McDonagh, David S. Palmer, Tanja van Mourik, John B. O. Mitchell. Are the Sublimation Thermodynamics of Organic Molecules Predictable? Journal of Chemical Information and Modeling 2016, 56 (11), 2162–2179. DOI: 10.1021/acs.jcim.6b00033.
- Yuriy A. Abramov. Major Source of Error in QSPR Prediction of Intrinsic Thermodynamic Solubility of Drugs: Solid vs Nonsolid State Contributions? Molecular Pharmaceutics 2015, 12 (6), 2126–2141. DOI: 10.1021/acs.molpharmaceut.5b00119.
- Richard L. Marchese Robinson, Kevin J. Roberts, Elaine B. Martin. The influence of solid state information and descriptor selection on statistical models of temperature dependent aqueous solubility. Journal of Cheminformatics 2018, 10 (1). DOI: 10.1186/s13321-018-0298-3.
- Christiaan Jardinez, José L. Medina-Franco. QSAR Modeling Using Quantum Chemical Descriptors of Benzimidazole Analogues With Antiparasitic Properties. International Journal of Quantitative Structure-Property Relationships 2018, 3 (2), 61–79. DOI: 10.4018/IJQSPR.2018070105.
- Yanqing Zhu, Jiao Chen, Min Zheng, Gaoquan Chen, Ali Farajtabar, Hongkun Zhao. Equilibrium solubility and preferential solvation of 1,1′-sulfonylbis(4-aminobenzene) in binary aqueous solutions of n-propanol, isopropanol and 1,4-dioxane. The Journal of Chemical Thermodynamics 2018, 122, 102–112. DOI: 10.1016/j.jct.2018.03.010.
- Christel A. S. Bergström, Per Larsson. Computational prediction of drug solubility in water-based systems: Qualitative and quantitative approaches used in the current drug discovery and development setting. International Journal of Pharmaceutics 2018, 540 (1–2), 185–193. DOI: 10.1016/j.ijpharm.2018.01.044.
- Samuel Boobier, Anne Osbourn, John B. O. Mitchell. Can human experts predict solubility better than computers? Journal of Cheminformatics 2017, 9 (1). DOI: 10.1186/s13321-017-0250-y.
- Gisbert Schneider, Kimito Funatsu, Yasushi Okuno, Dave Winkler. De novo Drug Design - Ye olde Scoring Problem Revisited. Molecular Informatics 2017, 36 (1–2), 1681031. DOI: 10.1002/minf.201681031.
- V. Sathyanarayanamoorthi, S. Suganthi, V. Kannappan, R. Kumar. Solubility study of cefpodoxime acid antibiotic in terms of free energy of solution - Insights from polarizable continuum model (PCM) analysis. Journal of Molecular Liquids 2016, 224, 657–661. DOI: 10.1016/j.molliq.2016.10.019.
- Donghai Yu, Ruobing Du, Suhui Zhang, Renjie Lu, Huaying An, Ji-Chang Xiao. Prediction of Solubility Properties from Transfer Energies for Acidic Phosphorus-Containing Rare-Earth Extractants Using Implicit Solvation Model. Solvent Extraction and Ion Exchange 2016, 34 (4), 347–354. DOI: 10.1080/07366299.2016.1156420.
- Ayesha Zafar, Jóhannes Reynisson. Hydration Free Energy as a Molecular Descriptor in Drug Design: A Feasibility Study. Molecular Informatics 2016, 35 (5), 207–214. DOI: 10.1002/minf.201501035.
- David S. Palmer, Maxim V. Fedorov. Molecular Simulation Methods to Compute Intrinsic Aqueous Solubility of Crystalline Drug-Like Molecules. 2016, 263–286. DOI: 10.1002/9781118700686.ch11.
- Edward O. Pyzer-Knapp, Gregor N. Simm, Alán Aspuru-Guzik. A Bayesian approach to calibrating high-throughput virtual screening results and application to organic photovoltaic materials. Materials Horizons 2016, 3 (3), 226–233. DOI: 10.1039/C5MH00282F.
- Jia Fu, Jianzhong Wu. Toward high-throughput predictions of the hydration free energies of small organic molecules from first principles. Fluid Phase Equilibria 2016, 407, 304–313. DOI: 10.1016/j.fluid.2015.05.042.
- Shahram Emami, Abolghasem Jouyban, Hadi Valizadeh, Ali Shayanfar. Are Crystallinity Parameters Critical for Drug Solubility Prediction? Journal of Solution Chemistry 2015, 44 (12), 2297–2315. DOI: 10.1007/s10953-015-0410-5.
- J. L. McDonagh, T. van Mourik, J. B. O. Mitchell. Predicting Melting Points of Organic Molecules: Applications to Aqueous Solubility Prediction Using the General Solubility Equation. Molecular Informatics 2015, 34 (11–12), 715–724. DOI: 10.1002/minf.201500052.
- Edward O. Pyzer-Knapp, Kewei Li, Alan Aspuru-Guzik. Learning from the Harvard Clean Energy Project: The Use of Neural Networks to Accelerate Materials Discovery. Advanced Functional Materials 2015, 25 (41), 6495–6502. DOI: 10.1002/adfm.201501919.
- William Kew, John B. O. Mitchell. Greedy and Linear Ensembles of Machine Learning Methods Outperform Single Approaches for QSPR Regression Problems. Molecular Informatics 2015, 34 (9), 634–647. DOI: 10.1002/minf.201400122.
- Oleg A. Raevsky, Daniel E. Polianczyk, Veniamin Yu. Grigorev, Olga E. Raevskaja, John C. Dearden. In silico Prediction of Aqueous Solubility: a Comparative Study of Local and Global Predictive Models. Molecular Informatics 2015, 34 (6–7), 417–430. DOI: 10.1002/minf.201400144.
- Robert Docherty, Klimentina Pencheva, Yuriy A. Abramov. Low solubility in drug development: de-convoluting the relative importance of solvation and crystal packing. Journal of Pharmacy and Pharmacology 2015, 67 (6), 847–856. DOI: 10.1111/jphp.12393.
- R. E. Skyner, J. L. McDonagh, C. R. Groom, T. van Mourik, J. B. O. Mitchell. A review of methods for the calculation of solution free energies and the modelling of systems in solution. Physical Chemistry Chemical Physics 2015, 17 (9), 6174–6191. DOI: 10.1039/C5CP00288E.
This article references 63 other publications.
- 1. Palmer, D. S.; McDonagh, J. L.; Mitchell, J. B. O.; van Mourik, T.; Fedorov, M. V. First-Principles Calculation of the Intrinsic Aqueous Solubility of Crystalline Druglike Molecules. J. Chem. Theory Comput. 2012, 8 (9), 3322–3337.
- 2. Palmer, D. S.; Llinas, A.; Morao, I.; Day, G. M.; Goodman, J. M.; Glen, R. C.; Mitchell, J. B. O. Predicting intrinsic aqueous solubility by a thermodynamic cycle. Mol. Pharm. 2008, 5 (2), 266–279.
- 3. Mitchell, J. B. O. Informatics, machine learning and computational medicinal chemistry. Future Med. Chem. 2011, 3 (4), 451–467.
- 4. Hughes, L. D.; Palmer, D. S.; Nigsch, F.; Mitchell, J. B. O. Why Are Some Properties More Difficult To Predict than Others? A Study of QSPR Models of Solubility, Melting Point, and Log P. J. Chem. Inf. Model. 2008, 48 (1), 220–232.
- 5. (a) Tetko, I. V. Computing chemistry on the web. Drug Discovery Today 2005, 10 (22), 1497–1500. (b) Tetko, I. V.; Gasteiger, J.; Todeschini, R.; Mauri, A.; Livingstone, D.; Ertl, P.; Palyulin, V. A.; Radchenko, E. V.; Zefirov, N. S.; Makarenko, A. S.; Tanchuk, V. Y.; Prokopenko, V. V. Virtual computational chemistry laboratory – design and description. J. Comput.-Aided Mol. Des. 2005, 19 (6), 453–463.
- 6. Price, S. L.; Leslie, M.; Welch, G. W. A.; Habgood, M.; Price, L. S.; Karamertzanis, P. G.; Day, G. M. Modelling organic crystal structures using distributed multipole and polarizability-based model intermolecular potentials. Phys. Chem. Chem. Phys. 2010, 12 (30), 8478–8490.
- 7. Segall, M. D.; Lindan, P. J. D.; Probert, M. J.; Pickard, C. J.; Hasnip, P. J.; Clark, S. J.; Payne, M. C. First-principles simulation: Ideas, illustrations and the CASTEP code. J. Phys.: Condens. Matter 2002, 14 (11), 2717–2744.
- 8. Dovesi, R.; Orlando, R.; Civalleri, B.; Roetti, C.; Saunders, V. R.; Zicovich-Wilson, C. M. CRYSTAL: a computational tool for the ab initio study of the electronic properties of crystals. Z. Kristallogr. 2005, 220 (5–6), 571–573.
- 9. (a) Luder, K.; Lindfors, L.; Westergren, J.; Nordholm, S.; Kjellander, R. In silico prediction of drug solubility: 2. Free energy of solvation in pure melts. J. Phys. Chem. B 2007, 111 (7), 1883–1892. (b) Luder, K.; Lindfors, L.; Westergren, J.; Nordholm, S.; Kjellander, R. In silico prediction of drug solubility. 3. Free energy of solvation in pure amorphous matter. J. Phys. Chem. B 2007, 111 (25), 7303–7311. (c) Luder, K.; Lindfors, L.; Westergren, J.; Nordholm, S.; Persson, R.; Pedersen, M. In Silico Prediction of Drug Solubility: 4. Will Simple Potentials Suffice? J. Comput. Chem. 2009, 30 (12), 1859–1871. (d) Westergren, J.; Lindfors, L.; Hoglund, T.; Luder, K.; Nordholm, S.; Kjellander, R. In silico prediction of drug solubility: 1. Free energy of hydration. J. Phys. Chem. B 2007, 111 (7), 1872–1882.
- 10. Tomasi, J.; Mennucci, B.; Cammi, R. Quantum Mechanical Continuum Solvation Models. Chem. Rev. 2005, 105 (8), 2999–3094.
- 11. (a) Ten-no, S. Free energy of solvation for the reference interaction site model: Critical comparison of expressions. J. Chem. Phys. 2001, 115 (8), 3724–3731. (b) Palmer, D. S.; Frolov, A. I.; Ratkova, E. L.; Fedorov, M. V. Towards a universal method for calculating hydration free energies: A 3D reference interaction site model with partial molar volume correction. J. Phys.: Condens. Matter 2010, 22 (49), 492101.
- 12 Stanton, R. V.; Hartsough, D. S.; Merz, K. M. Calculation of solvation free energies using a density functional/molecular dynamics coupled potential. J. Phys. Chem. 1993, 97 (46), 11868–11870.
- 13 Ratkova, E. L.; Fedorov, M. V. Combination of RISM and Cheminformatics for Efficient Predictions of Hydration Free Energy of Polyfragment Molecules: Application to a Set of Organic Pollutants. J. Chem. Theory Comput. 2011, 7 (5), 1450–1457.
- 14 Allen, F. H. The Cambridge Structural Database: a quarter of a million crystal structures and rising. Acta Crystallogr., Sect. B: Struct. Sci. 2002, 58 (3 Part 1), 380–388.
- 15 Box, K.; Comer, J. E.; Gravestock, T.; Stuart, M. New Ideas about the Solubility of Drugs. Chem. Biodiversity 2009, 6 (11), 1767–1788.
- 16 (a) Hopfinger, A. J.; Esposito, E. X.; Llinàs, A.; Glen, R. C.; Goodman, J. M. Findings of the Challenge To Predict Aqueous Solubility. J. Chem. Inf. Model. 2008, 49 (1), 1–5. (b) Llinàs, A.; Glen, R. C.; Goodman, J. M. Solubility Challenge: Can You Predict Solubilities of 32 Molecules Using a Database of 100 Reliable Measurements? J. Chem. Inf. Model. 2008, 48 (7), 1289–1303.
- 17 The Goodman group. http://www-jmg.ch.cam.ac.uk/data/solubility/ (accessed Feb. 8, 2013).
- 18 Narasimham, L.; Barhate, V. D. Kinetic and intrinsic solubility determination of some β-blockers and antidiabetics by potentiometry. J. Pharm. Res. 2011, 4 (2), 532–536.
- 19 (a) Bergström, C. A. S.; Luthman, K.; Artursson, P. Accuracy of calculated pH-dependent aqueous drug solubility. Eur. J. Pharm. Sci. 2004, 22 (5), 387–398. (b) Bergström, C. A. S.; Wassvik, C. M.; Norinder, U.; Luthman, K.; Artursson, P. Global and Local Computational Models for Aqueous Solubility Prediction of Drug-Like Molecules. J. Chem. Inf. Comput. Sci. 2004, 44 (4), 1477–1488. (c) Ran, Y.; Yalkowsky, S. H. Prediction of Drug Solubility by the General Solubility Equation (GSE). J. Chem. Inf. Comput. Sci. 2001, 41 (2), 354–357. (d) Rytting, E.; Lentz, K.; Chen, X.-Q.; Qian, F.; Venkatesh, S. Aqueous and cosolvent solubility data for drug-like organic compounds. AAPS J. 2005, 7 (1), E78–E105. (e) Shareef, A.; Angove, M. J.; Wells, J. D.; Johnson, B. B. Aqueous Solubilities of Estrone, 17β-Estradiol, 17α-Ethynylestradiol, and Bisphenol A. J. Chem. Eng. Data 2006, 51 (3), 879–881.
- 20 CrystalWeb (withdrawn in 2013). http://cds.dl.ac.uk/cds/datasets/crys/cweb/cweb.html.
- 21 Bruno, I. J.; Cole, J. C.; Edgington, P. R.; Kessler, M.; Macrae, C. F.; McCabe, P.; Pearson, J.; Taylor, R. New software for searching the Cambridge Structural Database and visualizing crystal structures. Acta Crystallogr., Sect. B: Struct. Sci. 2002, 58 (3 Part 1), 389–397.
- 22 Steinbeck, C.; Han, Y.; Kuhn, S.; Horlacher, O.; Luttmann, E.; Willighagen, E. The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics. J. Chem. Inf. Comput. Sci. 2003, 43 (2), 493–500.
- 23 Gupta, R. R.; Gifford, E. M.; Liston, T.; Waller, C. L.; Hohman, M.; Bunin, B. A.; Ekins, S. Using Open Source Computational Tools for Predicting Human Metabolic Stability and Additional Absorption, Distribution, Metabolism, Excretion, and Toxicity Properties. Drug Metab. Dispos. 2010, 38 (11), 2083–2090.
- 24 O’Boyle, N. Towards a Universal SMILES representation - A standard method to generate canonical SMILES based on the InChI. J. Cheminform. 2012, 4 (1), 22.
- 25 RSC ChemSpider (accessed Feb. 8, 2013).
- 26 Wolstencroft, K.; Haines, R.; Fellows, D.; Williams, A.; Withers, D.; Owen, S.; Soiland-Reyes, S.; Dunlop, I.; Nenadic, A.; Fisher, P.; Bhagat, J.; Belhajjame, K.; Bacall, F.; Hardisty, A.; Nieva de la Hidalga, A.; Balcazar Vargas, M. P.; Sufi, S.; Goble, C. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Res. 2013, 41 (W1), W557–W561.
- 27 Little, J. L.; Williams, A. J.; Pshenichnov, A.; Tkachenko, V. Identification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpider. J. Am. Soc. Mass Spectrom. 2012, 23 (1), 179–185.
- 28 Goble, C. A.; Bhagat, J.; Aleksejevs, S.; Cruickshank, D.; Michaelides, D.; Newman, D.; Borkum, M.; Bechhofer, S.; Roos, M.; Li, P.; De Roure, D. myExperiment: A repository and social network for the sharing of bioinformatics workflows. Nucleic Acids Res. 2010, 38 (suppl 2), W677–W682.
- 29 De Ferrari, L. Workflow Entry: From molecule name to SMILE and InchI using ChemSpider. http://www.myexperiment.org/workflows/3603.html (accessed Feb. 10, 2014).
- 30 Griseofulvin. http://en.wikipedia.org/wiki/Griseofulvin (SMILES source; accessed Dec. 11, 2012).
- 31 Glipizide. http://en.wikipedia.org/wiki/Glipizide (SMILES source; accessed Dec. 11, 2012).
- 32 Stone, A. Distributed Multipole Analysis of Gaussian wavefunctions, GDMA version 2.2.02. http://www-stone.ch.cam.ac.uk/documentation/gdma/manual.pdf (accessed Feb. 10, 2014).
- 33 Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Scalmani, G.; Barone, V.; Mennucci, B.; Petersson, G. A.; Nakatsuji, H.; Caricato, M.; Li, X.; Hratchian, H. P.; Izmaylov, A. F.; Bloino, J.; Zheng, G.; Sonnenberg, J. L.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Vreven, T.; Montgomery, J. A., Jr.; Peralta, J. E.; Ogliaro, F.; Bearpark, M.; Heyd, J. J.; Brothers, E.; Kudin, K. N.; Staroverov, V. N.; Kobayashi, R.; Normand, J.; Raghavachari, K.; Rendell, A.; Burant, J. C.; Iyengar, S. S.; Tomasi, J.; Cossi, M.; Rega, N.; Millam, N. J.; Klene, M.; Knox, J. E.; Cross, J. B.; Bakken, V.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R. E.; Yazyev, O.; Austin, A. J.; Cammi, R.; Pomelli, C.; Ochterski, J. W.; Martin, R. L.; Morokuma, K.; Zakrzewski, V. G.; Voth, G. A.; Salvador, P.; Dannenberg, J. J.; Dapprich, S.; Daniels, A. D.; Farkas, Ö.; Foresman, J. B.; Ortiz, J. V.; Cioslowski, J.; Fox, D. J. Gaussian 09; Gaussian, Inc.: Wallingford, CT, 2009.
- 34 Stone, A. J. Distributed multipole analysis, or how to describe a molecular charge distribution. Chem. Phys. Lett. 1981, 83 (2), 233–239.
- 35Buckingham, R.The classical equation of state of gaseous helium, neon and argonProc. R. Soc. Lon. Ser-A1938, 168 (933) 264–283[Crossref], Google ScholarThere is no corresponding record for this reference.
- 36Gavezzotti, A.; Filippini, G.Theoretical Aspects and Computer Modeling.; Gavezzotti, A., Ed. Wiley and Sons: Chichester,1997; pp 61–97.Google ScholarThere is no corresponding record for this reference.
- 37Marenich, A. V.; Cramer, C. J.; Truhlar, D. G.Universal Solvation Model Based on Solute Electron Density and on a Continuum Model of the Solvent Defined by the Bulk Dielectric Constant and Atomic Surface TensionsJ. Phys. Chem. B2009, 113 (18) 6378–6396[ACS Full Text ], [CAS], Google Scholar37https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD1MXksV2is74%253D&md5=54931a64c70d28445ee53876a8b1a4b9Universal Solvation Model Based on Solute Electron Density and on a Continuum Model of the Solvent Defined by the Bulk Dielectric Constant and Atomic Surface TensionsMarenich, Aleksandr V.; Cramer, Christopher J.; Truhlar, Donald G.Journal of Physical Chemistry B (2009), 113 (18), 6378-6396CODEN: JPCBFK; ISSN:1520-6106. (American Chemical Society)We present a new continuum solvation model based on the quantum mech. charge d. of a solute mol. interacting with a continuum description of the solvent. The model is called SMD, where the 'D' stands for 'd.' to denote that the full solute electron d. is used without defining partial at. charges. 'Continuum' denotes that the solvent is not represented explicitly but rather as a dielec. medium with surface tension at the solute-solvent boundary. SMD is a universal solvation model, where 'universal' denotes its applicability to any charged or uncharged solute in any solvent or liq. medium for which a few key descriptors are known (in particular, dielec. const., refractive index, bulk surface tension, and acidity and basicity parameters). The model separates the observable solvation free energy into two main components. The first component is the bulk electrostatic contribution arising from a self-consistent reaction field treatment that involves the soln. of the nonhomogeneous Poisson equation for electrostatics in terms of the integral-equation-formalism polarizable continuum model (IEF-PCM). The cavities for the bulk electrostatic calcn. 
are defined by superpositions of nuclear-centered spheres. The second component is called the cavity-dispersion-solvent-structure term and is the contribution arising from short-range interactions between the solute and solvent mols. in the first solvation shell. This contribution is a sum of terms that are proportional (with geometry-dependent proportionality consts. called at. surface tensions) to the solvent-accessible surface areas of the individual atoms of the solute. The SMD model has been parametrized with a training set of 2821 solvation data including 112 aq. ionic solvation free energies, 220 solvation free energies for 166 ions in acetonitrile, methanol, and DMSO, 2346 solvation free energies for 318 neutral solutes in 91 solvents (90 nonaq. org. solvents and water), and 143 transfer free energies for 93 neutral solutes between water and 15 org. solvents. The elements present in the solutes are H, C, N, O, F, Si, P, S, Cl, and Br. The SMD model employs a single set of parameters (intrinsic at. Coulomb radii and at. surface tension coeffs.) optimized over six electronic structure methods: M05-2X/MIDI!6D, M05-2X/6-31G*, M05-2X/6-31+G**, M05-2X/cc-pVTZ, B3LYP/6-31G*, and HF/6-31G*. Although the SMD model has been parametrized using the IEF-PCM protocol for bulk electrostatics, it may also be employed with other algorithms for solving the nonhomogeneous Poisson equation for continuum solvation calcns. in which the solute is represented by its electron d. in real space. This includes, for example, the conductor-like screening algorithm. With the 6-31G* basis set, the SMD model achieves mean unsigned errors of 0.6-1.0 kcal/mol in the solvation free energies of tested neutrals and mean unsigned errors of 4 kcal/mol on av. for ions with either Gaussian03 or GAMESS.
- 38(a) Ben-Naim, A.Standard thermodynamics of transfer. Uses and misusesJ. Phys. Chem.1978, 82 (7) 792–803[ACS Full Text ], [CAS], Google Scholar38ahttps://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADyaE1cXht1Ohtrs%253D&md5=7bc3f4c27e458daf5ba115dcd69092d6Standard thermodynamics of transfer. Uses and misusesJournal of Physical Chemistry (1978), 82 (7), 792-803CODEN: JPCHAX; ISSN:0022-3654.The std. free energy of transfer of a solute A between two solvents a and b is discussed at both a thermodn. and a statistical-mech. level. Whereas thermodn. alone cannot be used to choose the 'best' std. quantity, statistical mechanics can help to make such a choice. The std. free energy of transferrin A, ΔμA°, computed by using the no. d. (or molarity) scale has the following advantages: (1) it is the simplest and least ambiguous quantity; (2) it is the quantity that directly probes the difference in the solvation properties of the two solvents with respect to the solute A; (3) it can be used, without any change of notation, in any soln., not necessarily a dil. one, and including even pure A; (4) by straightforward thermodn. manipulations one obtains the entropy, enthalpy, vol. changes, etc., for the same process. All of these quantities have advantages similar to those indicated for the free-energy change. Because of the advantages of this particular choice of std. quantities, it is proposed to 'standardize' the use of the std. thermodn. quantities of transfer and refer to them as the local-std. quantities.(b) Ben-Naim, A.; Marcus, Y.Solvation thermodynamics of nonionic solutesJ. Phys. 
Chem.1984, 81 (4) 2016–2027[Crossref], [CAS], Google Scholar38bhttps://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADyaL2cXlvFyqsb4%253D&md5=48040de9341c4f18c98dc4c66266b017Ben-Naim, A.; Marcus, Y.Journal of Chemical Physics (1984), 81 (4), 2016-27CODEN: JCPSA6; ISSN:0021-9606.A generalized process of solvation is defined. It is argued that the thermodn. of this solvation process is more informative as compared with other processes suggested before. Numerical examples are presented and compared with some recently published related data.
- 39Howley, T.; Madden, M. G.; O’Connell, M.-L.; Ryder, A. G.The effect of principal component analysis on machine learning accuracy with high-dimensional spectral dataKnowl.-Based Syst.2006, 19 (5) 363–370[Crossref], Google ScholarThere is no corresponding record for this reference.
- 40Wold, H.Partial Least Squares (PLS) Regression2003, 1–7Google ScholarThere is no corresponding record for this reference.
- 41(a) Abdi, H.Partial Least Squares (PLS) Regression2003, 1–7Google ScholarThere is no corresponding record for this reference.(b) Wold, S.; Sjöström, M.; Eriksson, L.PLS-regression: A basic tool of chemometricsChemometr. Intell. Lab.2001, 58 (2) 109–130[Crossref], [CAS], Google Scholar41bhttps://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD3MXotF2mtLw%253D&md5=2d7fd1e946600e138ac92699ebcc7e29Wold, Svante; Sjostrom, Michael; Eriksson, LennartChemometrics and Intelligent Laboratory Systems (2001), 58 (2), 109-130CODEN: CILSEN; ISSN:0169-7439. (Elsevier Science B.V.)A review on PLS-regression (PLSR) as a std. tool in chemometrics and used in chem. and engineering. The underlying model and its assumption and commonly used diagnostics are discussed, together with the interpretation of resulting parameters. Two examples are used as illustrations: first, a Quant. Structure-Activity Relationship (QSAR)/Quant. Structure Property Relationship (QSPR) data set of peptides is used to outline the development, interpretation, and refinement of a PLSR model. Second, a data set from the manufg. of recycled paper is analyzed to illustrate time series modeling of process data by means of PLSR and time-lagged X-variables.(c) Mevik, B.; Wehrens, R.The pls Package: Principal Component and Partial Least Squares Regression in RJ Stat Softw.2007, 18 (2) 1–24Google ScholarThere is no corresponding record for this reference.
- 42(a) Palmer, D. S.; O’Boyle, N. M.; Glen, R. C.; Mitchell, J. B. O.Random Forest Models To Predict Aqueous SolubilityJ. Chem. Inf. Model2006, 47 (1) 150–158[ACS Full Text ], Google ScholarThere is no corresponding record for this reference.(b) Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J. C.; Sheridan, R. P.; Feuston, B. P.Random Forest: A Classification and Regression Tool for Compound Classification and QSAR ModelingJ. Chem. Inf. Comput. Sci.2003, 43 (6) 1947–1958[ACS Full Text ], [CAS], Google Scholar42bhttps://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD3sXos1Wiu7s%253D&md5=dea7867551ec30260b0091b90593a660Random Forest: A Classification and Regression Tool for Compound Classification and QSAR ModelingSvetnik, Vladimir; Liaw, Andy; Tong, Christopher; Culberson, J. Christopher; Sheridan, Robert P.; Feuston, Bradley P.Journal of Chemical Information and Computer Sciences (2003), 43 (6), 1947-1958CODEN: JCISD8; ISSN:0095-2338. (American Chemical Society)A new classification and regression tool, Random Forest, is introduced and investigated for predicting a compd.'s quant. or categorical biol. activity based on a quant. description of the compd.'s mol. structure. Random Forest is an ensemble of unpruned classification or regression trees created by using bootstrap samples of the training data and random feature selection in tree induction. Prediction is made by aggregating (majority vote or averaging) the predictions of the ensemble. The authors built predictive models for six cheminformatics data sets. The authors anal. demonstrates that Random Forest is a powerful tool capable of delivering performance that is among the most accurate methods to date. The authors also present three addnl. features of Random Forest: built-in performance assessment, a measure of relative importance of descriptors, and a measure of compd. similarity that is weighted by the relative importance of descriptors. 
It is the combination of relatively high prediction accuracy and its collection of desired features that makes Random Forest uniquely suited for modeling in cheminformatics.
- 43Breiman, L.Random ForestsMach. Learning2001, 45 (1) 5–32[Crossref], Google ScholarThere is no corresponding record for this reference.
- 44(a) Cao, D.-S.; Xu, Q.-S.; Liang, Y.-Z.; Chen, X.; Li, H.-D.Prediction of aqueous solubility of druglike organic compounds using partial least squares, back-propagation network and support vector machineJ. Chemometr.2010, 24 (9) 584–595Google ScholarThere is no corresponding record for this reference.(b) Vapnik, V. N.An overview of statistical learning theoryIEEE Trans. Neural Netw.1999, 10 (5) 988–999[Crossref], [PubMed], [CAS], Google Scholar44bhttps://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A280%3ADC%252BD1c%252FpsFSqtA%253D%253D&md5=d4e24c4899519f0c21087b610e28c849Vapnik V NIEEE transactions on neural networks (1999), 10 (5), 988-99 ISSN:1045-9227.Statistical learning theory was introduced in the late 1960's. Until the 1990's it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the middle of the 1990's new types of learning algorithms (called support vector machines) based on the developed theory were proposed. This made statistical learning theory not only a tool for the theoretical analysis but also a tool for creating practical algorithms for estimating multidimensional functions. This article presents a very general overview of statistical learning theory including both theoretical and algorithmic aspects of the theory. The goal of this overview is to demonstrate how the abstract learning theory established conditions for generalization which are more general than those discussed in classical statistical paradigms and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems. A more detailed overview of the theory (without proofs) can be found in Vapnik (1995). In Vapnik (1998) one can find detailed description of the theory (including proofs).
- 45Hu, S. In R2 Vs. r2, SCEA/ISPA Conference,2008; pp 1–15.Google ScholarThere is no corresponding record for this reference.
- 46Menke, J.; Martinez, T. R.In Using permutations instead of student’s t distribution for p-values in paired-difference algorithm comparisons, IEEE IJCNN, July 25–29, 2004;2004; Vol. 2, pp 1331–1335.Google ScholarThere is no corresponding record for this reference.
- 47Nath, N.; Mitchell, J. B. O.Is EC class predictable from reaction mechanism?BMC Bioinformatics2012, 13 (1) 60[Crossref], [PubMed], [CAS], Google Scholar47https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A280%3ADC%252BC38rnslKjtA%253D%253D&md5=c3f196743c10b505f74b4528c839c4dcNath Neetika; Mitchell John B OBACKGROUND: We investigate the relationships between the EC (Enzyme Commission) class, the associated chemical reaction, and the reaction mechanism by building predictive models using Support Vector Machine (SVM), Random Forest (RF) and k-Nearest Neighbours (kNN). We consider two ways of encoding the reaction mechanism in descriptors, and also three approaches that encode only the overall chemical reaction. Both cross-validation and also an external test set are used. RESULTS: The three descriptor sets encoding overall chemical transformation perform better than the two descriptions of mechanism. SVM and RF models perform comparably well; kNN is less successful. Oxidoreductases and hydrolases are relatively well predicted by all types of descriptor; isomerases are well predicted by overall reaction descriptors but not by mechanistic ones. CONCLUSIONS: Our results suggest that pairs of similar enzyme reactions tend to proceed by different mechanisms. Oxidoreductases, hydrolases, and to some extent isomerases and ligases, have clear chemical signatures, making them easier to predict than transferases and lyases. We find evidence that isomerases as a class are notably mechanistically diverse and that their one shared property, of substrate and product being isomers, can arise in various unrelated ways.The performance of the different machine learning algorithms is in line with many cheminformatics applications, with SVM and RF being roughly equally effective. kNN is less successful, given the role that non-local information plays in successful classification. 
We note also that, despite a lack of clarity in the literature, EC number prediction is not a single problem; the challenge of predicting protein function from available sequence data is quite different from assigning an EC classification from a cheminformatics representation of a reaction.
- 48Kuhn, M.Variable Importance Using The caret Package 2012; Available via the Internet at http://cran.open-source-solution.org/web/packages/caret/vignettes/caretVarImp.pdf, accessed Feb. 10,2014.Google ScholarThere is no corresponding record for this reference.
- 49Kuhn, M.Variable Importance Using The caret Package2010, 1–7Google ScholarThere is no corresponding record for this reference.
- 50Varma, S.; Simon, R.Bias in error estimation when using cross-validation for model selectionBMC Bioinform.2006, 7 (1) 91[Crossref], [PubMed], [CAS], Google Scholar50https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A280%3ADC%252BD287ktFKlsA%253D%253D&md5=6fce0c91e4624476b4134dd4545af4ceBias in error estimation when using cross-validation for model selectionBMC bioinformatics (2006), 7 (), 91 ISSN:.BACKGROUND: Cross-validation (CV) is an effective method for estimating the prediction error of a classifier. Some recent articles have proposed methods for optimizing classifiers by choosing classifier parameter values that minimize the CV error estimate. We have evaluated the validity of using the CV error estimate of the optimized classifier as an estimate of the true error expected on independent data. RESULTS: We used CV to optimize the classification parameters for two kinds of classifiers; Shrunken Centroids and Support Vector Machines (SVM). Random training datasets were created, with no difference in the distribution of the features between the two classes. Using these 'null' datasets, we selected classifier parameter values that minimized the CV error estimate. 10-fold CV was used for Shrunken Centroids while Leave-One-Out-CV (LOOCV) was used for the SVM. Independent test data was created to estimate the true error. With 'null' and 'non null' (with differential expression between the classes) data, we also tested a nested CV procedure, where an inner CV loop is used to perform the tuning of the parameters while an outer CV is used to compute an estimate of the error. The CV error estimate for the classifier with the optimal parameters was found to be a substantially biased estimate of the true error that the classifier would incur on independent data. 
Even though there is no real difference between the two classes for the 'null' datasets, the CV error estimate for the Shrunken Centroid with the optimal parameters was less than 30% on 18.5% of simulated training and 'non-null' data distributions. CONCLUSION: We show that using CV to compute an error estimate for a classifier that has itself been tuned using CV gives a significantly biased estimate of the true error. Proper use of CV for estimating true error of a classifier developed using a well defined algorithm requires that all steps of the algorithm, including classifier parameter tuning, be repeated in each CV loop. A nested CV procedure provides an almost unbiased estimate of the true error.
- 51Simon, R. M.; Subramanian, J.; Li, M.-C.; Menezes, S.Using cross-validation to evaluate predictive accuracy of survival risk classifiers based on high-dimensional dataBrief. Bioinform.2011, 12 (3) 203–214[Crossref], [PubMed], [CAS], Google Scholar51https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A280%3ADC%252BC3MvoslSqsg%253D%253D&md5=a1ba7c32d7741dfb57f6dd2564511bb6Using cross-validation to evaluate predictive accuracy of survival risk classifiers based on high-dimensional dataSimon Richard M; Subramanian Jyothi; Li Ming-Chung; Menezes SupriyaBriefings in bioinformatics (2011), 12 (3), 203-14 ISSN:.Developments in whole genome biotechnology have stimulated statistical focus on prediction methods. We review here methodology for classifying patients into survival risk groups and for using cross-validation to evaluate such classifications. Measures of discrimination for survival risk models include separation of survival curves, time-dependent ROC curves and Harrell's concordance index. For high-dimensional data applications, however, computing these measures as re-substitution statistics on the same data used for model development results in highly biased estimates. Most developments in methodology for survival risk modeling with high-dimensional data have utilized separate test data sets for model evaluation. Cross-validation has sometimes been used for optimization of tuning parameters. In many applications, however, the data available are too limited for effective division into training and test sets and consequently authors have often either reported re-substitution statistics or analyzed their data using binary classification methods in order to utilize familiar cross-validation. 
In this article we have tried to indicate how to utilize cross-validation for the evaluation of survival risk models; specifically how to compute cross-validated estimates of survival distributions for predicted risk groups and how to compute cross-validated time-dependent ROC curves. We have also discussed evaluation of the statistical significance of a survival risk model and evaluation of whether high-dimensional genomic data adds predictive accuracy to a model based on standard covariates alone.
- 52R Development Core Team. R: A language and environment for statistical computing; R Foundation for Statistical Computing, Vienna, Austria,2011.Google ScholarThere is no corresponding record for this reference.
- 53(a) Kuhn, M.Building Predictive Models in R Using the caret PackageJ. Stat. Software2008, 28, 1–26[Crossref], [PubMed], Google ScholarThere is no corresponding record for this reference.(b) Kuhn, M.; Wing, J.; Weston, S.; Williams, A.; Keefer, C.; Engelhardt, A.; Cooper, T.; Mayer, Z.R Core Team. caret: Classification and Regression Training. R Package “caret”. http://CRAN.R-project.org/package=caret.Google ScholarThere is no corresponding record for this reference.
- 54Walters, W. P.Modeling, Informatics, and the Quest for ReproducibilityJ. Chem. Inf. Model2013, 53 (7) 1529–1530[ACS Full Text ], [CAS], Google Scholar54https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3sXpt1ylsLo%253D&md5=99ec193c7cc97cee6f0b2cf820454ca4Modeling, Informatics, and the Quest for ReproducibilityJournal of Chemical Information and Modeling (2013), 53 (7), 1529-1530CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)A review. There is no doubt that papers published in the Journal of Chem. Information and Modeling, and related journals, provide valuable scientific information. However, it is often difficult to reproduce the work described in mol. modeling and chemoinformatics papers. In many cases the software described in the paper is not readily available, in other cases the supporting information is not provided in an accessible format. To date, the major journals in the fields of mol. modeling and chemoinformatics have not established guidelines for reproducible research. This letter provides an overview of the reproducibility challenges facing our field and suggests some guidelines for improving the reproducibility of published work.
- 55(a) Dearden, J. C.In silico prediction of aqueous solubilityExpert Opin. Drug Discovery2006, 1 (1) 31–52[Crossref], [PubMed], [CAS], Google Scholar55ahttps://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD28XhtVChtL%252FO&md5=1daa383519ba4ffc887c773847d8a656Dearden, John C.Expert Opinion on Drug Discovery (2006), 1 (1), 31-52CODEN: EODDBX; ISSN:1746-0441. (Informa Healthcare)A review. The fundamentals of aq. soly., and the factors that affect it, are briefly outlined, followed by a short introduction to quant. structure-property relationships. Early (pre-1990) work on aq. soly. prediction is summarized, and a more detailed presentation and crit. discussion are given of the results of most, if not all, of those published in silico prediction studies from 1990 onwards that have used diverse training sets. A table is presented of a no. of studies that have used a 21-compd. test set of drugs and pesticides to validate their aq. soly. models. Finally, the results are given of a test of 15 com. available software programs for aq. soly. prediction, using a test set of 122 drugs with accurately measured aq. solubilities.(b) Jorgensen, W. L.; Duffy, E. M.Prediction of drug solubility from structureAdv. Drug Delivery Rev.2002, 54 (3) 355–366[Crossref], [PubMed], [CAS], Google Scholar55bhttps://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD38Xitlartbc%253D&md5=bc749286d56bf55c26d25b70806217e1Jorgensen, William L.; Duffy, Erin M.Advanced Drug Delivery Reviews (2002), 54 (3), 355-366CODEN: ADDREP; ISSN:0169-409X. (Elsevier Science Ireland Ltd.)A review with refs. The aq. soly. of a drug is an important factor affecting its bioavailability. Numerous computational methods have been developed for the prediction of aq. soly. from a compd.'s structure. A review is provided of the methodol. 
and quality of results for the most useful procedures including the model implemented in the QikProp program. Viable methods now exist for predictions with <1 log unit uncertainty, which is adequate for prescreening synthetic candidates or design of combinatorial libraries. Further progress with predictive methods would require an exptl. database of highly accurate solubilities for a large, diverse collection of drug-like mols.
- 56Lusci, A.; Pollastri, G.; Baldi, P.Deep Architectures and Deep Learning in Chemoinformatics: The Prediction of Aqueous Solubility for Drug-Like MoleculesJ. Chem. Inf. Model.2013, 53, 1563–1575[ACS Full Text ], [CAS], Google Scholar56https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3sXpvVGht7g%253D&md5=d51e537fea2f1f53ea5013224ee1cdc9Deep Architectures and Deep Learning in Chemoinformatics: The Prediction of Aqueous Solubility for Drug-Like MoleculesLusci, Alessandro; Pollastri, Gianluca; Baldi, PierreJournal of Chemical Information and Modeling (2013), 53 (7), 1563-1575CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)A review. Shallow machine learning methods have been applied to chemoinformatics problems with some success. As more data becomes available and more complex problems are tackled, deep machine learning methods may also become useful. Here, we present a brief overview of deep learning methods and show in particular how recursive neural network approaches can be applied to the problem of predicting mol. properties. However, mols. are typically described by undirected cyclic graphs, while recursive approaches typically use directed acyclic graphs. Thus, we develop methods to address this discrepancy, essentially by considering an ensemble of recursive neural networks assocd. with all possible vertex-centered acyclic orientations of the mol. graph. One advantage of this approach is that it relies only minimally on the identification of suitable mol. descriptors because suitable representations are learned automatically from the data. Several variants of this approach are applied to the problem of predicting aq. soly. and tested on four benchmark data sets. Exptl. 
results show that the performance of the deep learning methods matches or exceeds the performance of other state-of-the-art methods according to several evaluation metrics and expose the fundamental limitations arising from training sets that are too small or too noisy. A Web-based predictor, AquaSol, is available online through the ChemDB portal (cdb.ics.uci.edu) together with addnl. material.
- 57Wang, R.; Gao, Y.; Lai, L.Calculating partition coefficient by atom-additive methodPerspect. Drug Discovery Des.2000, 19 (1) 47–66[Crossref], [CAS], Google Scholar57https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD3cXnslaitbg%253D&md5=90d29fc6b268c9af644b244eb0bc6912Calculating partition coefficient by atom-additive methodPerspectives in Drug Discovery and Design (2000), 19 (Hydrophobicity and Solvation in Drug Design, Pt. 3), 47-66CODEN: PDDDEC; ISSN:0928-2866. (Kluwer Academic Publishers)A new atom-additive method is presented for calcg. octanol/H2O partition coeff. (log P) of org. compds. The method, XLOGP v2.0, gives log P values by summing the contributions of component atoms and correction factors. Altogether 90 atom types are used to classify C, N, O, S, P and halogen atoms, and 10 correction factors are used for some special substructures. The contributions of each atom type and correction factor are derived by multivariate regression anal. of 1853 org. compds. with known exptl. log P values. The correlation coeff. (r) for fitting the whole set is 0.973 and the std. deviation (s) is 0.349 log units. Comparison of various log P calcn. procedures demonstrates that method gives much better results than other atom-additive approaches and is at least comparable to fragmental approaches. Because of the simple methodol., the missing fragment problem does not occur in method.
- 58Kier, L. B.; Hall, L. H.Molecular Connectivity in Chemistry and Drug Research; Academic Press: New York,1976.Google ScholarThere is no corresponding record for this reference.
- 59Moreau, G.; Broto, P.The autocorrelation of a topological structure: A new molecular descriptorNew J. Chem.1980, 359–360Google ScholarThere is no corresponding record for this reference.
- 60Randic, M.On molecular identification numbersJ. Chem. Inf. Comput. Sci.1984, 24 (3) 164–175[ACS Full Text ], [CAS], Google Scholar60https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADyaL2cXkvV2rtLs%253D&md5=355a2f5987cde26550a10bf7ef475d9cRandic, MilanJournal of Chemical Information and Computer Sciences (1984), 24 (3), 164-75CODEN: JCISD8; ISSN:0095-2338.The assignment of identification nos. to mols. that are easy to deriv. and have structural significance is discussed and a scheme for assignment is outlined. Output of the ALL-PATH program for study of mol. topol. from graphs with multiple connections is presented which includes weighing factors for individual bonds. Uniqueness and structural significance of the identification nos. are examd. and mol. graphs and identification nos. of some ring compds., terpenes, and some other compds. are presented.
- 61CDK Descriptor Summary (2011–05–28). http://pele.farmbio.uu.se/nightly-1.2.x/dnames.html, accessed Feb. 10,2014.Google ScholarThere is no corresponding record for this reference.
- 62Hewitt, M.; Cronin, M. T. D.; Enoch, S. J.; Madden, J. C.; Roberts, D. W.; Dearden, J. C.In Silico Prediction of Aqueous Solubility: The Solubility ChallengeJ. Chem. Inf. Model.2009, 49 (11) 2572–2587[ACS Full Text ], [CAS], Google Scholar62https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD1MXhtleqtLvE&md5=7983f8d3133655a4d8967d5b7e9fbdbdIn Silico Prediction of Aqueous Solubility: The Solubility ChallengeHewitt, M.; Cronin, M. T. D.; Enoch, S. J.; Madden, J. C.; Roberts, D. W.; Dearden, J. C.Journal of Chemical Information and Modeling (2009), 49 (11), 2572-2587CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)The dissoln. of a chem. into water is a process fundamental to both chem. and biol. The persistence of a chem. within the environment and the effects of a chem. within the body are dependent primarily upon aq. soly. With the well-documented limitations hindering the accurate exptl. detn. of aq. soly., the utilization of predictive methods have been widely investigated and employed. The setting of a soly. challenge by this journal proved an excellent opportunity to explore several different modeling methods, utilizing a supplied dataset of high-quality aq. soly. measurements. Four contrasting approaches (simple linear regression, artificial neural networks, category formation, and available in silico models) were utilized within our lab. and the quality of these predictions was assessed. These were chosen to span the multitude of modeling methods now in use, while also allowing for the evaluation of existing com. soly. models. The conclusions of this study were surprising, in that a simple linear regression approach proved to be superior over more-complex modeling methods. Possible explanations for this observation are discussed and also recommendations are made for future soly. prediction.
- 63Tsvetkova, B.; Pencheva, I.; Zlatkov, A.; Peikov, P.High Performance Liquid Chromatographic Assay of Indomethacin and its Related Substances in Tablet Dosage FormsInt. J. Pharm. Pharm. Sci.2012, 4 (Supplement 3) 549–552[CAS], Google Scholar63https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC38XptFamurw%253D&md5=dc420050b9fdb9f5e917016ff64b06a4High performance liquid chromatographic assay of indomethacin and its related substances in tablet dosage formsTsvetkova, Boyka; Pencheva, Ivanka; Zlatkov, Alexander; Peikov, PlamenInternational Journal of Pharmacy and Pharmaceutical Sciences (2012), 4 (Suppl. 3), 549-552CODEN: IJPPKB; ISSN:0975-1491. (International Journal of Pharmacy and Pharmaceutical Sciences)A reversed-phase high performance liq. chromatog. (RP-HPLC) method with UV detection was proposed for sepn. of indomethacin and its impurities from tablet dosage forms. The best sepn. was achieved on a LiChrosorb C18, 250 mm × 4.6 mm, 5 μm column at a detector wavelength of 240 nm. The utilization of mixt. of 40 vols. 0.5% vol./vol. orthophosphoric acid, 20 vols. of methanol and 40 vols. of acetonitrile as mobile phase with a flow rate of 2 mL/min enabled acceptable resoln. of indomethacin, in large excess, from possible impurities, in a short elution time (9 min). Anal. parameters linearity, accuracy, precision and specificity were detd. by validation procedure and found to be satisfactory. Overall, the proposed method was found to be simple, rapid, precise and accurate for quality control of indomethacin and its impurities in dosage forms and in raw materials. In this work the kinetic investigation of the alk. hydrolysis of indomethacin was also carried out. The degrdn. reaction was monitored by means of HPLC method developed and was found to follow first-order kinetics. The rate const. and half-life of the hydrolytic decompn. were estd.
Supporting Information
Informatics_Solubilty_datasets_and_scripts.zip, including R code, Bash scripts, Python scripts, a macro (.xlsb), DLS-100.csv and Solubility_Challenge_dataset.xlsx. DLS-100.csv contains experimental log S values, references, SMILES strings, the sources of the SMILES, CSD refcodes, molecule names, InChIs and ChemSpider numbers. SI_document.pdf: structure data, 2D images of the molecular structures, experimental log S values, CSD refcodes, R², statistical significance and variable importance. This material is available free of charge via the Internet at http://pubs.acs.org. All scripts and datasets used in this work are available for download from the Mitchell Group web server (http://chemistry.st-andrews.ac.uk/staff/jbom/group/Informatics_Solubility.html) as well as in the Supporting Information.
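The Supporting Information lists R² values for the models, and the abstract quotes prediction errors as RMSE in log S units. As a hedged illustration only, both statistics can be computed from paired observed and predicted log S values as below. This sketch assumes R² is the coefficient of determination, 1 − SS_res/SS_tot; the authors' R scripts may instead report a squared correlation coefficient. The function names are hypothetical and are not taken from the distributed scripts.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the units of the inputs (here log S)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot
```

With made-up values (not from DLS-100), observed log S of [-2.0, -3.0, -4.0] against predictions of [-2.5, -3.0, -3.5] gives an RMSE of about 0.41 log S units and an R² of 0.75.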
Original author(s) | Christoph Steinbeck, Egon Willighagen, Dan Gezelter |
---|---|
Developer(s) | The CDK Project |
Initial release | 11 May 2001[1] |
Stable release | 2.2[2] (October 30, 2018) |
Preview release | 1.5.14 (October 9, 2016) |
Repository | github.com/cdk/cdk |
Written in | Java |
Operating system | Windows, Linux, Unix, macOS |
Platform | IA-32, x86-64 |
Available in | English |
Type | Chemoinformatics, molecular modelling, bioinformatics |
License | LGPL 2.0 |
Website | cdk.github.io |
The Chemistry Development Kit (CDK) is a software library, written in the Java programming language, for chemoinformatics and bioinformatics.[3][4] It is available for Windows, Linux, Unix, and macOS. It is free and open-source software distributed under the GNU Lesser General Public License (LGPL) 2.0.
History
The CDK was created on 27–29 September 2000 at the University of Notre Dame by Christoph Steinbeck, Egon Willighagen and Dan Gezelter, then developers of Jmol and JChemPaint, to provide a common code base. The first source code release was made on 11 May 2001.[5] Since then more than 100 people have contributed to the project,[6] leading to the rich set of functions listed below. Between 2004 and 2007, CDK News served as the project's newsletter; all of its articles are available from a public archive.[7] Because contributions arrived at an unsteady rate, the newsletter was put on hold.
Language | English |
---|---|
Edited by | Egon Willighagen, Christoph Steinbeck |
Publication history | 2004–2007 |
Standard abbreviation | CDK News |
ISSN | 1614-7553 |
Later, unit testing, code quality checking, and Javadoc validation were introduced. Rajarshi Guha developed a nightly build system, named Nightly, which is still operating at Uppsala University.[8] In 2012, the project became a supporter of the InChI Trust, to encourage continued development. The library uses JNI-InChI[9] to generate International Chemical Identifiers (InChIs).[10] In April 2013, John Mayfield (né May) joined the release managers of the CDK to handle the development branch.[11]
Library
The CDK is a library rather than an end-user program. However, it has been integrated into various environments to make its functions available. CDK is currently used in several environments and applications, including the R programming language,[12] CDK-Taverna (a Taverna workbench plugin),[13] Bioclipse, PaDEL,[14] and Cinfony.[15] CDK extensions also exist for the Konstanz Information Miner (KNIME)[16] and for Excel, the latter called LICSS.[17]
In 2008, small pieces of GPL-licensed code were removed from the library. Although that code was independent of the main CDK library and no copyleft was involved, the ChemoJava project was created to reduce confusion among users.
Major features
Chemoinformatics
- 2D molecule editor and generator
- 3D geometry generation
- ring finding[19][20]
- substructure search using exact structures and a SMILES arbitrary target specification (SMARTS)-like query language
- QSAR descriptor calculation[21]
- fingerprint calculation, including the ECFP and FCFP fingerprints[22]
- force field calculations
- many input-output chemical file formats, including simplified molecular-input line-entry system (SMILES), Chemical Markup Language (CML), and chemical table file (MDL)
- structure generators[23]
- International Chemical Identifier support, via JNI-InChI
Bioinformatics
- protein active site detection
- cognate ligand detection[24]
- metabolite identification[25]
- pathway databases
- 2D and 3D protein descriptors[26]
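The metabolite identification item above is backed by work on determining elemental composition from MSn data.[25] The basic mass-matching step can be sketched as a brute-force search over CcHhNnOo formulas whose monoisotopic mass lies within a ppm tolerance of a measured neutral mass. This is a hypothetical illustration, not the method of that paper and not a CDK API; the element bounds, the even H + N parity check for neutral closed-shell molecules and the non-negative rings-plus-double-bonds filter are assumptions chosen for the example.

```python
from itertools import product
from typing import Dict, List

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def formula_string(counts: Dict[str, int]) -> str:
    """Hill-style string for a CHNO formula, e.g. C2H6O."""
    parts = []
    for element in ("C", "H", "N", "O"):
        n = counts[element]
        if n > 0:
            parts.append(element if n == 1 else f"{element}{n}")
    return "".join(parts)


def candidate_formulas(
    target_mass: float,
    ppm: float = 5.0,
    max_c: int = 10,
    max_h: int = 20,
    max_n: int = 4,
    max_o: int = 4,
) -> List[str]:
    """Return CHNO formulas within `ppm` of `target_mass`, closest first."""
    tolerance = target_mass * ppm * 1e-6
    hits = []
    for c, h, n, o in product(
        range(max_c + 1), range(max_h + 1), range(max_n + 1), range(max_o + 1)
    ):
        # Neutral closed-shell CHNO molecules have an even number of H + N atoms.
        if (h + n) % 2:
            continue
        # Rings plus double bonds (C - H/2 + N/2 + 1) must not be negative.
        if c - h / 2 + n / 2 + 1 < 0:
            continue
        mass = (
            c * MONOISOTOPIC["C"] + h * MONOISOTOPIC["H"]
            + n * MONOISOTOPIC["N"] + o * MONOISOTOPIC["O"]
        )
        error = abs(mass - target_mass)
        if error <= tolerance:
            hits.append((error, formula_string({"C": c, "H": h, "N": n, "O": o})))
    return [formula for _, formula in sorted(hits)]
```

For instance, a neutral mass of 46.041865 u matches only C2H6O within 5 ppm under these bounds, whereas nearby nominal-mass-46 formulas such as CH6N2 or CH2O2 fall outside the tolerance. Real tools add isotope-pattern and fragment (MSn) evidence to rank candidates.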
General
- Python wrapper; see Cinfony
- Ruby wrapper
- active user community
See also
- Bioclipse – an Eclipse–RCP based chemo-bioinformatics workbench
- JChemPaint – Java 2D molecule editor, applet and application
- Jmol – Java 3D renderer, applet and application
- JOELib – Java version of Open Babel, OELib
References
- ^https://sourceforge.net/projects/cdk/files/OldFiles/
- ^'cdk/cdk: CDK 2.2'. ZENODO. 2018-10-30. doi:10.5281/zenodo.1474247.
- ^Steinbeck, C.; Han, Y. Q.; Kuhn, S.; Horlacher, O.; Luttmann, E.; Willighagen, E. L. (2003). 'The Chemistry Development Kit (CDK): An open-source Java library for chemo- and bioinformatics'. Journal of Chemical Information and Computer Sciences. 43 (2): 493–500. doi:10.1021/ci025584y. PMC4901983. PMID12653513.
- ^Willighagen, Egon L.; Mayfield, John W.; Alvarsson, Jonathan; Berg, Arvid; Carlsson, Lars; Jeliazkova, Nina; Kuhn, Stefan; Pluskal, Tomáš; Rojas-Chertó, Miquel (2017-06-06). 'The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching'. Journal of Cheminformatics. 9 (1): 33. doi:10.1186/s13321-017-0220-4. ISSN1758-2946. PMC5461230. PMID29086040.
- ^http://sourceforge.net/projects/cdk/files/OldFiles/
- ^https://github.com/cdk/cdk/blob/master/AUTHORS.txt
- ^https://sourceforge.net/projects/cdk/files/CDK%20News/
- ^'Archived copy'. Archived from the original on 2013-05-24. Retrieved 2013-08-05.
- ^http://jni-inchi.sourceforge.net/
- ^Spjuth, O.; Berg, A.; Adams, S.; Willighagen, E. L. (2013). 'Applications of the InChI in cheminformatics with the CDK and Bioclipse'. Journal of Cheminformatics. 5 (1): 14. doi:10.1186/1758-2946-5-14. PMC3674901. PMID23497723.
- ^http://chem-bla-ics.blogspot.nl/2013/04/john-may-is-now-release-manager-of-cdk.html
- ^Guha, R. (2007). 'Chemical informatics functionality in R'. Journal of Statistical Software. 18 (5): 1–16. doi:10.18637/jss.v018.i05.
- ^Kuhn, T.; Willighagen, E. L.; Zielesny, A.; Steinbeck, C. (2010). 'CDK-Taverna: an open workflow environment for cheminformatics'. BMC Bioinformatics. 11: 159. doi:10.1186/1471-2105-11-159. PMC2862046. PMID20346188.
- ^Yap, C. W. (2011). 'PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints'. Journal of Computational Chemistry. 32 (7): 1466–74. doi:10.1002/jcc.21707. PMID21425294.
- ^O'Boyle, Noel M (2008). 'Cinfony – combining Open Source cheminformatics toolkits behind a common interface'. Chemistry Central Journal. 2 (1): 24. doi:10.1186/1752-153X-2-24. PMC2646723. PMID19055766.
- ^Beisken, S.; Meinl, T.; Wiswedel, B.; De Figueiredo, L. F.; Berthold, M.; Steinbeck, C. (2013). 'KNIME-CDK: Workflow-driven Cheminformatics'. BMC Bioinformatics. 14: 257. doi:10.1186/1471-2105-14-257. PMC3765822. PMID24103053.
- ^Lawson, K. R.; Lawson, J. (2012). 'LICSS - a chemical spreadsheet in microsoft excel'. Journal of Cheminformatics. 4 (1): 3. doi:10.1186/1758-2946-4-3. PMC3310842. PMID22301088.
- ^ChemoJava
- ^Berger, Franziska; Flamm, Christoph; Gleiss, Petra M.; Leydold, Josef; Stadler, Peter F. (March 2004). 'Counterexamples in Chemical Ring Perception'. Journal of Chemical Information and Computer Sciences. 44 (2): 323–331. doi:10.1021/ci030405d. PMID15032507.
- ^May, John W; Steinbeck, Christoph (2014). 'Efficient ring perception for the Chemistry Development Kit'. Journal of Cheminformatics. 6 (1): 3. doi:10.1186/1758-2946-6-3. PMC3922685. PMID24479757.
- ^Steinbeck, C.; Hoppe, C.; Kuhn, S.; Floris, M.; Guha, R.; Willighagen, E. L. (2006). 'Recent developments of the chemistry development kit (CDK) – an open-source java library for chemo- and bioinformatics'. Curr. Pharm. Des. 12 (17): 2111–20. doi:10.2174/138161206777585274. PMID16796559. Archived from the original on 2011-07-25. Guangli, M.; Yiyu, C. (2006). 'Predicting Caco-2 permeability using support vector machine and chemistry development kit'. J Pharm Pharm Sci. 9 (2): 210–21. PMID16959190.
- ^Clark, Alex M; Sarker, Malabika; Ekins, Sean (2014). 'New target prediction and visualization tools incorporating open source molecular fingerprints for TB Mobile 2.0'. Journal of Cheminformatics. 6: 38. doi:10.1186/s13321-014-0038-2. PMC4190048. PMID25302078.
- ^Peironcely, J. E.; Rojas-Chertó, M.; Fichera, D.; Reijmers, T.; Coulier, L.; Faulon, J. L.; Hankemeier, T. (2012). 'OMG: Open molecule generator'. Journal of Cheminformatics. 4 (1): 21. doi:10.1186/1758-2946-4-21. PMC3558358. PMID22985496.
- ^Bashton, M.; Nobeli, I.; Thornton, J. M. (2006). 'Cognate Ligand Domain Mapping for Enzymes'. Journal of Molecular Biology. 364 (4): 836–52. doi:10.1016/j.jmb.2006.09.041. PMID17034815.
- ^Rojas-Cherto, M.; Kasper, P. T.; Willighagen, E. L.; Vreeken, R. J.; Hankemeier, T.; Reijmers, T. H. (2011). 'Elemental composition determination based on MSn'. Bioinformatics. 27 (17): 2376–2383. doi:10.1093/bioinformatics/btr409. PMID21757467.
- ^Ruiz-Blanco, Yasser B; Paz, Waldo; Green, James; Marrero-Ponce, Yovani (2015). 'ProtDCal: A program to compute general-purpose-numerical descriptors for sequences and 3D-structures of proteins'. BMC Bioinformatics. 16: 162. doi:10.1186/s12859-015-0586-0. PMC4432771. PMID25982853.
External links
- CDK Wiki – the community wiki
- Planet CDK – a blog planet