Molecular Descriptors For Cheminformatics Pdf To Excel

Biomedical Sciences Research Complex and EaStCHEM, School of Chemistry, Purdie Building, University of St. Andrews, North Haugh, St. Andrews, Scotland, KY16 9ST, United Kingdom
Cite This: J. Chem. Inf. Model. 2014, 54, 3, 844-856

Abstract

We present four models of solution free-energy prediction for druglike molecules utilizing cheminformatics descriptors and theoretically calculated thermodynamic values. We make predictions of solution free energy using physics-based theory alone and using machine learning/quantitative structure–property relationship (QSPR) models. We also develop machine learning models where the theoretical energies and cheminformatics descriptors are used as combined input. These models are used to predict solvation free energy. While direct theoretical calculation does not give accurate results in this approach, machine learning is able to give predictions with a root mean squared error (RMSE) of ∼1.1 log S units in a 10-fold cross-validation for our Drug-Like-Solubility-100 (DLS-100) dataset of 100 druglike molecules. We find that a model built using energy terms from our theoretical methodology as descriptors is marginally less predictive than one built on Chemistry Development Kit (CDK) descriptors. Combining both sets of descriptors allows a further but very modest improvement in the predictions. However, in some cases, this is a statistically significant enhancement. These results suggest that there is little complementarity between the chemical information provided by these two sets of descriptors, despite their different sources and methods of calculation. Our machine learning models are also able to predict the well-known Solubility Challenge dataset with an RMSE value of 0.9–1.0 log S units.

Poor aqueous solubility remains a major cause of attrition in the drug development process. Despite theoretical developments, the solubility of druglike molecules still eludes truly quantitative computation. In recent work,(1) we have shown that accurate first-principles calculation is now becoming possible, provided that both the crystalline and solution phases are described by accurate theoretical models. Before this, energy terms from a computed thermodynamic cycle (see Figure 1) had been used as descriptors in a multilinear regression model for intrinsic solubility, delivering accuracy much better than from direct computation and comparable with the leading informatics approaches.(2)
Since then, sophisticated machine learning techniques have been applied to many problems in the chemical sciences, while, as we have shown,(2, 3) the accuracy of direct computation of hydration energies and solubilities has improved significantly. This led us to revisit the idea of hybrid informatics-theoretical models for solubility.
Cheminformatics methods have seen widespread use for property prediction, particularly in the pharmaceutical industry, where they have been applied to aqueous solubility, melting point, boiling point, log P (where P is the partition coefficient between octanol and water), binding affinities, and toxicology predictions.(4) Such methods are usually much quicker than pure chemical theory calculations, making high throughput virtual screening (HTVS) a possibility. Some methods have become accessible and easy-to-use web-based tools.(5) However, informatics methods suffer from the difficulty of decomposing the results into intuitive, physically meaningful understanding and cannot reflect the physical details of the system. To understand the underlying physics and chemistry, it is necessary to carry out an atomistic physics-based calculation.
Many chemical theory methods have been developed to specifically address one phase. The exact nature of the theory varies between these methods and the phase being studied. Crystal structures are often modeled using one of the lattice energy minimizing simulation methods,(6) plane-wave density functional theory (DFT) methods,(7) or periodic DFT using atom-centered basis sets.(8) The latter two methods come from a quantum-chemical standpoint. The results are often very good but have a high computational cost. The simulation methods often contain empirical parameters, which lowers the cost of these methods significantly, compared to DFT.

Popular solution-phase models include atomistic simulation methods based on molecular mechanics and dynamics,(9) quantum-mechanical implicit solvation methods (such as the polarizable continuum model (PCM)),(10) and “hybrid” models (such as the classical statistical mechanics-based reference interaction site model (RISM)(11) or hybrid quantum mechanics/molecular mechanics (QM/MM) methods(12)). These methods have the inherent problem for industrial and drug discovery applications of being significantly more computationally intensive than cheminformatics models, which makes high-throughput computation infeasible. The closest thing to an exception among contemporary theoretical models may be 1D RISM, which requires only a few minutes of calculation time per compound and has been previously combined with cheminformatics to build the 1D-RISM/SDC method.(13)

By combining lower levels of theoretical chemistry with cheminformatics, we hope to produce results in good agreement with experiment, but at a lower cost than higher-level theoretical methods, and with higher accuracy than using cheminformatics descriptors alone.

Molecules and Solubility

A set of 100 broadly druglike organic molecules was assembled with the prerequisites that each molecule should have an available crystal structure in the Cambridge Structural Database (CSD)(14) and a well-documented aqueous intrinsic solubility in the literature. Where possible, we prefer experimental solubilities obtained with the CheqSol method,(15) which has been shown to give reproducible results with only small random errors. The possibility of significant systematic errors between different experimental methodologies remains an issue and may possibly limit the accuracy with which modeling-based studies can be validated.
A total of 122 potentially useful CheqSol solubilities were obtained from the two Solubility Challenge papers(16) and downloaded from the Web.(17) While noting that several corrections had previously been made, we also corrected or disambiguated the following names: amitriptyline, 5-bromogramine, 5,5-diphenylhydantoin, 4-hydroxybenzoic acid, nortriptyline, and phenanthroline. Of the 122 compounds, 38 had corresponding crystal structures and could be included in our DLS-100 dataset. Where a choice existed, we selected the solubility and crystal structure of the least soluble and, therefore, most stable polymorph. For druglike compounds with known crystal structures, one further CheqSol solubility was available from Palmer et al.(2) and two from Narasimham et al.(18) We sourced solubility data for an additional 59 compounds from other experimental methods.(19) This gave us a total data set of 100 molecules.
The crystal structures were obtained using either the CrystalWeb(20) interface or the ConQuest(21) interface. Crystal structures were selected on the basis of stability, preferring the polymorph with the lowest literature solubility or the lowest lattice energy according to our computations where polymorph-specific experimental information was not available. We also applied the additional pragmatic selection criterion that the asymmetric unit cell should contain only one molecule. Once structures were identified, they were downloaded in either the SHELX format (.res) or CSD legacy format (.dat).
We chose to use Chemistry Development Kit (CDK)(22) molecular descriptors in this study, because these descriptors do not require proprietary software and are applicable to solubility prediction.(23) The CDK is an open source cheminformatics Java library. In order to use the CDK molecular descriptors,(22) we required each of our chemical structures in SMILES format. As noted by O’Boyle,(24) SMILES can be ambiguous. We thus decided to use one principal source for SMILES records, selecting the well-annotated database ChemSpider.(25) Since we are modeling intrinsic solubility, we wish to describe the neutral form of the druglike compound. This remains the case even if a protonated or deprotonated charged form dominates at neutral pH or across the pH range of the CheqSol (or other) experiment. To obtain a SMILES string for each molecule in the DLS-100 dataset, we wrote a Taverna workflow,(26) which uses web services provided by the ChemSpider database.(25, 27) The workflow is freely available on the MyExperiment(28) repository at the following reference.(29) In five cases, we found the ChemSpider SMILES to correspond to an undesirable protonation state. Thus, we instead took the SMILES from the solubility challenge Web site(17) for cimetidine, pindolol, and phenobarbital, and from Wikipedia for griseofulvin(30) and glipizide.(31) Using the resulting 100 SMILES, we initially calculated all 268 available nonprotein CDK descriptors for each compound. We found that 145 of these descriptors were either undefined for 2D structures, or had the same value for all 100 compounds; their deletion left 123 remaining descriptors.
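The descriptor pruning described above (dropping descriptors that are undefined or constant across the dataset) can be sketched as follows. This is a minimal illustration and not the authors' script: the function name `filter_descriptors`, the use of `None`/NaN to mark an undefined value, and the toy descriptor names are all assumptions.

```python
import math

def filter_descriptors(table):
    """Keep only descriptor columns that are fully defined and non-constant.

    `table` maps a descriptor name to its list of values, one per molecule.
    None or NaN marks a value that could not be computed (an assumption of
    this sketch, not a CDK convention).
    """
    kept = {}
    for name, values in table.items():
        undefined = any(
            v is None or (isinstance(v, float) and math.isnan(v)) for v in values
        )
        if undefined:
            continue  # descriptor is not defined for every molecule
        if len(set(values)) < 2:
            continue  # identical for all molecules, so it carries no information
        kept[name] = values
    return kept

# Toy table with three molecules; names and values are placeholders.
toy = {
    "XLogP": [1.2, 3.4, -0.5],
    "constantCount": [4, 4, 4],          # same value everywhere
    "shape3D": [None, 0.3, 0.7],         # undefined for one molecule
}
filtered = filter_descriptors(toy)
print(sorted(filtered))
```

Applied to the study's data, a filter of this kind would be expected to reduce the 268 calculated descriptors to the 123 that were retained.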

Crystal Structure and Gas-Phase Calculations

We took experimentally determined crystal structures of the compounds in our DLS-100 dataset as the initial input to our calculations. DMACRYS,(6) a periodic lattice simulation program, was used to perform the crystal structure minimizations and calculate vibrational contributions arising from the crystal. DMACRYS works in conjunction with the GDMA2(32) and Gaussian 09 (G09) programs.(33) The output of these calculations gives us the enthalpy of sublimation and the crystal portion of the entropy of sublimation.
Before any calculations were run, the selected crystal structures were input into DMACRYS, which was used to standardize the covalent bond lengths between hydrogens and heavy atoms; the experimentally determined lengths of these bonds are not accurate, because of the uncertainty in the hydrogen positions obtained by X-ray diffraction. Electrostatic interactions were calculated by multipole expansions(34) (obtained using GDMA2) of molecular charge distributions calculated at the MP2/6-31G** level using G09. Multipolar expansions up to hexadecapole were calculated. Intermolecular repulsion and dispersion were calculated by a Buckingham potential.(6, 35)
DMACRYS carries out a rigid-body minimization of the crystal structure, hence arriving at minimized lattice energies. This lattice energy can be converted to an enthalpy of sublimation by the following formula:

Enthalpy of sublimation:

\[ \Delta H_{\mathrm{sub}} = -U_{\mathrm{latt}} - 2RT \tag{1} \]

where Ulatt is the lattice energy (energy of the crystal assuming the crystal is static and at 0 K relative to infinitely separated molecules) and the −2RT term arises from lattice vibrational energy.(2, 36) The entropy of sublimation was calculated by:

Entropy of sublimation:

\[ \Delta S_{\mathrm{sub}} = S_{\mathrm{rot}} + S_{\mathrm{trans}} - S_{\mathrm{crys}} \tag{2} \]

where Srot is the rotational entropy in the gas phase and Strans is the entropy of translation in the gas phase. Scrys is the entropy of phonon vibrations within the crystal. The use of eq 3 makes these assumptions: (i) the rotational and translational entropy of the crystal is minimal, (ii) there is no change in electronic entropy between phases, and (iii) the intramolecular entropy is constant between the two phases. The crystal entropy is calculated by locating the frequencies of the phonon normal modes (lattice vibrations) at the gamma point. This is achieved using lattice dynamics, the results of which are used to calculate the Helmholtz free energy (see eqs S2 and S3 in the Supporting Information).

Gibbs free energy:

\[ \Delta G_{\mathrm{sub}} = \Delta H_{\mathrm{sub}} - T\,\Delta S_{\mathrm{sub}} \tag{3} \]
The coordinates of a single molecule were extracted from the minimized lattice and used as input for the gaseous optimization with G09. Optimizations were carried out at the M06-2X and HF levels of theory with a 6-31G* basis set. The gas-phase entropy values were calculated from statistical thermodynamics in G09. Finally, ΔGsub is calculated from the enthalpy and entropy of sublimation.
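As a worked illustration of how the sublimation terms combine, the sketch below assumes the forms ΔHsub = −Ulatt − 2RT, ΔSsub = Srot + Strans − Scrys and ΔGsub = ΔHsub − TΔSsub, which follow from the symbol definitions in the text. The function name, the units (kJ/mol and kJ/(mol K)) and the input numbers are assumptions chosen for illustration; they are not values from the study.

```python
R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)

def sublimation_free_energy(u_latt, s_rot, s_trans, s_crys, temperature=298.15):
    """Return (dH_sub, dS_sub, dG_sub).

    u_latt is the (negative) lattice energy in kJ/mol; the entropies are in
    kJ/(mol K). Energies are returned in kJ/mol, the entropy in kJ/(mol K).
    """
    dh = -u_latt - 2.0 * R_KJ * temperature   # assumed form of eq 1
    ds = s_rot + s_trans - s_crys             # assumed form of eq 2
    dg = dh - temperature * ds                # assumed form of eq 3
    return dh, ds, dg

# Invented inputs for illustration only.
dh, ds, dg = sublimation_free_energy(
    u_latt=-100.0, s_rot=0.12, s_trans=0.17, s_crys=0.10
)
print(f"dH_sub = {dh:.2f} kJ/mol, dG_sub = {dg:.2f} kJ/mol")
```

The result is in the 1 atm standard state; any conversion to another standard state would be applied afterwards.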

Solution-Phase Calculations

All solution-phase calculations were carried out with G09 using the Self-Consistent Reaction Field (SCRF) protocol. We selected the SMD (Solvation Model based on Density)(37) implicit solvent model based on previous work.(1) Although RISM yielded more-accurate absolute hydration energies than SMD in our recent work,(1) SMD generated a higher correlation coefficient against experimental results for hydration free energy prediction (R = 0.97 vs R = 0.93). Given the parametrized nature of our present model, correlation is more important than absolute agreement, and, hence, SMD is a suitable solvation model. Solution-phase calculations were carried out with the same methodologies as used in the gas-phase calculations, M06-2X/6-31G* and HF/6-31G*. Geometry optimization was again carried out, this time taking the gas-phase optimized structure as the starting point.
The SMD model is a parametrized implicit solvation model. SMD solves for the free energy of solution (ΔGhyd) as a sum of the electrostatic contributions and nonelectrostatic contributions. The electrostatic contributions are calculated by the solution of the nonhomogeneous Poisson equation;(23, 37) this equation is a second-order differential equation linking the electrostatic potential, dielectric constant, and charge distribution. The nonelectrostatic contributions of cavitation, dispersion, and solvent structure are calculated as a sum of atomic and molecular contributions using parameters inherent to the SMD method. SMD has been shown to provide significant improvements over some other implicit solvent models for datasets containing molecules similar to those used in this study.(1) The hydration free energy is given by eq 4,

Gibbs free energy of hydration:

\[ \Delta G_{\mathrm{hyd}} = E_{\mathrm{solution}} - E_{\mathrm{gaseous}} \tag{4} \]

where Esolution is the total energy of the system in the SMD solvation model and Egaseous is the total energy of the system in a vacuum. Scheme 1 represents the workflow for making such predictions.

Standard States

Sublimation energies were calculated in the 1 atm standard state, which is the conventional standard for experimental sublimation energies to be quoted. However, solvation free energies are usually quoted in the Ben-Naim standard state of 1 mol/L. In this work, ΔG° corresponds to the 1 atm standard state, while ΔG* corresponds to the Ben-Naim 1 mol/L standard state (see Figure 2).(38) The difference between these two standard states is a constant energy value of 1.89 kcal/mol (7.91 kJ/mol). In this work, we calculate the sublimation free energy in the 1 atm standard state and then apply the correction to 1 mol/L in order to be consistent with the hydration free energy calculations; hence, ΔGsolu is in the 1 mol/L standard state for all predictions in this work.
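The size of the standard-state correction can be checked with a short calculation. The sketch below assumes ideal-gas behaviour, so that moving a gas from 1 atm to 1 mol/L costs RT ln(Vm), where Vm is the molar volume in litres at 1 atm; the function name and the choice of 298.15 K are assumptions.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol K)
R_L_ATM = 0.082057     # gas constant in L atm/(mol K)
KCAL_TO_KJ = 4.184

def standard_state_correction(temperature=298.15):
    """Free energy (kcal/mol) of compressing an ideal gas from 1 atm to 1 mol/L."""
    molar_volume = R_L_ATM * temperature   # litres per mole at 1 atm
    return R_KCAL * temperature * math.log(molar_volume)

corr = standard_state_correction()
print(f"{corr:.2f} kcal/mol, about {corr * KCAL_TO_KJ:.1f} kJ/mol")
```

Under these assumptions the correction rounds to the 1.89 kcal/mol quoted above. Because the gas is the product of sublimation, the correction would be added to the 1 atm sublimation free energy to reach the 1 mol/L standard state.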

Theoretical Log S Prediction

Our final solution free-energy prediction is then given as the sum of the predicted sublimation and hydration free energies:

Gibbs free energy of solution:

\[ \Delta G_{\mathrm{solu}} = \Delta G_{\mathrm{sub}} + \Delta G_{\mathrm{hyd}} \tag{5} \]
Therefore, we have two predictions for each molecule: The first method couples DMACRYS with G09 and the SMD solvation model at the HF/6-31G* level of theory. This model will be referred to as SMD(HF). The second method is DMACRYS coupled with G09 and the SMD solvation model at the M06-2X/6-31G* level of theory. This will be referred to as SMD(M06-2X).
For convenience of comparison with experimental values of solubility, we convert the free energy of solution to log S values, and all experimental solubility values to log S values:

\[ \log S = -\frac{\Delta G^{*}_{\mathrm{solu}}}{RT \ln 10} \tag{6} \]

Here, R is the universal gas constant and T is the absolute temperature (in Kelvin).
The conversion of experimental solubility to log S can be found in the Supporting Information (eq S7). Values for the full DLS-100 dataset, including SMILES and InChI, can be found in the Supporting Information (see zip file and dataset).
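The conversion from free energy to log S can be sketched as below. It assumes the relation log S = −ΔGsolu/(RT ln 10) with ΔGsolu in the 1 mol/L standard state and S in mol/L; the function name, units and example energies are assumptions for illustration.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)

def log_s_from_free_energy(dg_sub, dg_hyd, temperature=298.15):
    """Predicted log S from sublimation and hydration free energies (kJ/mol).

    Both energies are assumed to refer to the 1 mol/L standard state.
    """
    dg_solu = dg_sub + dg_hyd
    return -dg_solu / (R_KJ * temperature * math.log(10.0))

# Invented energies for illustration only.
example = log_s_from_free_energy(dg_sub=60.0, dg_hyd=-40.0)
print(round(example, 2))
```

With this relation, a change of about 5.7 kJ/mol in the solution free energy at room temperature corresponds to one log S unit.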

Informatics Models

To model the data, we use linear and machine learning regression models: partial least-squares regression, random forest and support vector regression. For reporting the predictive accuracy of these models, we averaged the RMSE of log S over a 10-fold cross-validation of the DLS-100 dataset. The cross-validation fulfils two purposes in this study: parameter optimization and evaluation of the accuracy of the models on unseen data. To ensure that each test fold of data is truly unseen, the parameter optimization is carried out in a separate layer of cross-validation within the training folds, as we will discuss below. In order to avoid overfitting, the data are preprocessed before building the predictive models.

Data Preprocessing

The use of multivariate data presents a danger of overfitting machine learning regression models; moreover, redundancy of attributes and correlation within the data add to the risk of reaching misleading conclusions.(39) To avoid such issues, we have used two normalization methods. One is the commonly used standardization method of variable scaling, equalizing the distributions of the variables by normalizing the mean and standard deviation of each column (variable).(40) The advantage of using this method is that it equalizes the prior importance of all the attributes. The second normalization method is principal component analysis (PCA), transforming the data into a smaller subspace where the new variables are uncorrelated with each other.(39) The PCA data transformation method deals with the redundancy of the data, and places emphasis on the variance of the data. The ability of each principal component to explain the data is measured according to the variance accounted for. Third, we have also fitted each model on the nonpreprocessed raw dataset, for comparison with the results of the two different scaling methods.
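The first normalization method, scaling each variable to a common mean and standard deviation, can be sketched in a few lines. This is an illustrative stand-in for the preprocessing step, not the authors' R code; the function name is hypothetical, the sample (n − 1) standard deviation is an assumed choice, and constant columns are assumed to have been removed beforehand.

```python
import statistics

def standardize_columns(rows):
    """Scale each column to zero mean and unit (sample) standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stdevs)]
        for row in rows
    ]

# Two toy descriptors on very different scales.
raw = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
scaled = standardize_columns(raw)
print(scaled)
```

After scaling, both toy columns span the same range, so neither dominates purely because of its units. In a cross-validation setting the means and standard deviations would normally be estimated on the training portion only.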

Machine Learning Regression Models

In this section, a summary of the regression models is presented; detailed explanations can be found in the Supporting Information.

Partial Least Squares Regression

The Partial Least Squares Regression (PLSR) model design is appropriate in a situation where there is no limit to the X variables or predictors, or where the sample size is small. Moreover, the PLSR model is also beneficial for analyzing strongly colinear and noisy data. The goal of a PLSR model is to predict the output variable Y from the input variables X and to describe the structure of X. For this, PLSR finds a set of components from X that are relevant to Y; these components are known as latent variables. The intention of PLSR is to capture the information in the X-variables that is most useful to predict Y.(41) A graphical representation is supplied in the Supporting Information Figure S1(A).

Random Forest Regression

Random Forest (RF), a method for classification and regression analysis, has very attractive properties that have previously been found to improve the prediction of quantitative structure–activity relationship (QSAR) data.(42) An ensemble of many decision trees constitutes a random forest, and each tree is constructed using the Classification and Regression Trees (CART) algorithm.(43) The RF method is efficient in handling high-dimensional data sets and is tolerant of redundant descriptors.

Support Vector Regression

The main idea in Support Vector Regression (SVR) is to minimize the risk factor based on the structural risk minimization(44) from structure theory, to obtain a good generalization of the limited patterns available in the given data. First, the given data D are mapped onto a higher dimensional feature space, using the kernel function k(xi,xj), and then a predictive function is computed on a subset of support vectors. Here, we have used the radial basis kernel function (eq 7) to map the data onto a higher dimensional space. A graphical representation is supplied in the Supporting Information (Figure S1(B)).

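SVR mapping on radial basis kernel function:

\[ k(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^{2}\right) \tag{7} \]

where γ > 0 is the kernel width parameter.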
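For readers who want to see the kernel evaluated, the sketch below uses the common parametrization k(xi, xj) = exp(−γ‖xi − xj‖²). The parameter name `gamma` and this particular parametrization are assumptions; other conventions, such as a width σ, differ only in how the same quantity is written.

```python
import math

def rbf_kernel(x_i, x_j, gamma=1.0):
    """Radial basis kernel exp(-gamma * ||x_i - x_j||^2) for equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * squared_distance)

same = rbf_kernel([0.0, 0.0], [0.0, 0.0])             # identical points
far = rbf_kernel([0.0, 0.0], [3.0, 4.0], gamma=0.1)   # squared distance 25
print(same, far)
```

The kernel equals 1 for identical points and decays towards 0 as the points move apart, with `gamma` controlling how quickly similarity falls off.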

Statistical Measures

To evaluate the performance of various machine learning models, we report two statistics: the root mean squared error (RMSE) and squared Pearson correlation coefficient R2 (not to be confused with the coefficient of determination).(45) Formulas for these are given in the Supporting Information (eq S5). We have also assessed statistical significance using Menke and Martinez’s method,(46) which we have used previously for similar analysis(47) (see Supporting Information (eq S6, Tables S3–S9 for R2, and Boxes S1–S3) for statistical significance). We also analyzed the variable importance for the RF method (see Table S17 in the Supporting Information). Variable importance was calculated in the CART program as implemented in R.(42b, 48, 49)
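The distinction between the squared Pearson correlation and the coefficient of determination can be made concrete with a small example. The sketch below uses the standard textbook definitions of all three statistics; the function names and the toy numbers are assumptions and are not taken from the Supporting Information.

```python
import math

def rmse(observed, predicted):
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def pearson_r2(observed, predicted):
    """Squared Pearson correlation: insensitive to constant offsets and scaling."""
    n = len(observed)
    mean_o, mean_p = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)

def coefficient_of_determination(observed, predicted):
    """1 - SS_res / SS_tot: penalizes systematic offsets."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Toy predictions that are systematically one log unit too high.
observed = [-1.0, -2.0, -3.0, -4.0]
predicted = [0.0, -1.0, -2.0, -3.0]
print(rmse(observed, predicted),
      pearson_r2(observed, predicted),
      coefficient_of_determination(observed, predicted))
```

Here the predictions are perfectly correlated with the observations, so the squared Pearson coefficient is 1, yet every prediction is wrong by one unit and the coefficient of determination is far lower. This is why the RMSE is reported alongside R2.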

10-Fold Cross-Validation

In order to compute and compare the performance of the various regression models, we consider RMSE scores averaged over a 10-fold cross-validation.(50) In the 10-fold cross-validation, the dataset is randomly split into 10 partitions, where the training set consists of 90% of the data and the test set consists of 10% of the data. A predictive regression model is fitted on the training set. The predictivity on the test fold is considered as an external measure to compute the accuracy of the fitted model. The entire process is repeated 10 times in order to cover the entire dataset, with each fold forming the test set on one occasion, and we record the average RMSE. The complete design of the workflow is represented in a flowchart (Scheme 2); similar workflows have been used for classification in other studies.(47, 51) The complete workflow of this analysis was written in R(52) using the CARET package;(53) all scripts are available in the Supporting Information.
In out-of-bag validation, one evaluates the performance of the model by separating training and test data through bootstrap sampling; this is convenient only for the RF method. It is not appropriate to compare RF out-of-bag predictions with other models such as PLS and SVR, which are not based on bootstrap sampling. So, we used 10-fold cross-validation to evaluate the performance of our various models.
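The 90%:10% partitioning can be sketched as follows for a dataset of 100 molecules. This is a minimal illustration of a random 10-fold split, not the CARET implementation; the function name and the fixed seed are assumptions.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Randomly partition range(n_samples) into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

folds = k_fold_indices(100, k=10)
splits = []
for test_fold in folds:
    held_out = set(test_fold)
    train = [i for i in range(100) if i not in held_out]
    splits.append((train, test_fold))   # 90 training and 10 test molecules

print([len(test) for _, test in splits])
```

Each molecule appears in exactly one test fold, so averaging the per-fold RMSE covers the whole dataset once.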

10-Fold Cross-Validation for Parameter Tuning

For each model, we use the 90% of the total data designated as the training set to find the optimum values of the model's tunable parameters. We selected a range incorporating 20 different possible values for each model parameter, in order to select its best value. For each parameter, a further level of 10-fold cross-validation is carried out in order to retrieve the RMSE of the models using each possible parameter value. Here, the training portion of 90% of the original data is further split into 10 new folds of 9%, with nine (81% of the original data) being used to build each model and one (9%) as an internal validation; this process of model building and internal validation is repeated to predict each of the 10 possible internal validation folds. This internal cross-validation step is repeated 20 times, once for each possible value of the parameter being assessed. Then, based on the value giving the lowest average RMSE score in the internal validation folds, the optimum parameter value is selected. Finally, the model is fitted on the complete training set of 90% of the original data using the selected parameter values.
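The two layers of cross-validation can be sketched in plain Python. This is a hedged illustration of the nesting only: a one-dimensional k-nearest-neighbour regressor stands in for PLS, RF and SVR, four candidate values replace the 20 used in the study, and all names and data are invented.

```python
import math
import random

def k_folds(indices, k, rng):
    shuffled = indices[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def knn_predict(train, x, k):
    """Mean target of the k training points nearest to x (toy 1-D model)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def fold_rmse(train, test, k):
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in test]
    return math.sqrt(sum(errors) / len(errors))

def nested_cv(data, candidates, n_outer=10, n_inner=10, seed=0):
    rng = random.Random(seed)
    all_idx = list(range(len(data)))
    outer_scores = []
    for test_idx in k_folds(all_idx, n_outer, rng):
        held_out = set(test_idx)
        train_idx = [i for i in all_idx if i not in held_out]
        inner = k_folds(train_idx, n_inner, rng)

        def inner_score(k):
            # Average validation RMSE for one candidate, training data only.
            scores = []
            for val_idx in inner:
                val = set(val_idx)
                fit = [data[i] for i in train_idx if i not in val]
                scores.append(fold_rmse(fit, [data[i] for i in val_idx], k))
            return sum(scores) / len(scores)

        best_k = min(candidates, key=inner_score)
        # Refit on the full training portion; score the untouched test fold.
        train = [data[i] for i in train_idx]
        outer_scores.append(fold_rmse(train, [data[i] for i in test_idx], best_k))
    return sum(outer_scores) / len(outer_scores)

noise = random.Random(1)
data = [(x / 10.0, math.sin(x / 10.0) + noise.gauss(0.0, 0.1)) for x in range(100)]
mean_rmse = nested_cv(data, candidates=[1, 3, 5, 9])
print(f"mean outer-fold RMSE: {mean_rmse:.3f}")
```

With 100 data points, each outer training portion holds 90 points and each inner model is fitted on 81, which mirrors the 81%/9%/10% proportions described above. The outer test fold never influences the choice of parameter.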

Assessing the Final Models by 10-Fold Cross-Validation

The given 90%:10% split of the data into training and test sets was used to fit the final model for each fold of the main 10-fold cross-validation, once the optimum parameter values have been selected. The average RMSE and R2 values over the 10 folds were considered in order to compare the usefulness of different descriptor sets and to evaluate the performance of the fitted models.

Dataset

The full DLS-100 dataset, with the experimental log S values, can be found as Supporting Information or downloaded from the Mitchell group web server (http://chemistry.st-andrews.ac.uk/staff/jbom/group/Informatics_Solubility.html; see the Supporting Information (csv_smiles_SI.csv and Table S1)), which is consistent with the excellent suggestions from Walters.(54) The dataset includes CSD refcodes, ChemSpider numbers, SMILES, experimental log S values and InChI for all molecules. The log S values in this work come from refs 2, 16, 18, and 19. Where possible, we have selected data obtained from the CheqSol method; where this was not available, we have selected reliable sources using different determination techniques. A good solubility prediction can be considered to be one whose error is approximately the same as that of the experiment. The experimental values have been shown in a number of previous papers to vary considerably.(55) Here, we consider the experimental accuracy limit to be between 0.6 and 1 log S unit (where 1 log S unit represents 5.7 kJ/mol at 298 K). Previous work has reported the experimental error in solubility prediction to be as great as 1.5 log S units and, on average, the error to be at least 0.6 log S units.(56) In 2006, Dearden(55a) noted, as was later reiterated in the Solubility Challenge, that models with RMSE predictions of <0.5 log S units are likely to be overfitted.(16b, 55a) For a prediction to be useful, it must have an RMSE within the standard deviation of the experimental data; otherwise, a trivial prediction using the mean of the experimental data is a more accurate prediction of the log S value.(1) For the DLS-100 dataset, the experimental standard deviation is 1.71 log S units.
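The usefulness criterion follows from a simple identity: predicting the mean experimental value for every molecule gives an RMSE equal to the standard deviation of the data. The sketch below checks this on invented log S values; the population form of the standard deviation is used, which is an assumption about how the 1.71 figure was computed.

```python
import math
import statistics

def rmse(observed, predicted):
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

# Invented log S values, not data from the DLS-100 set.
log_s = [-1.0, -2.5, -3.0, -4.5, -6.0]
mean_only = [statistics.mean(log_s)] * len(log_s)

baseline_rmse = rmse(log_s, mean_only)
print(baseline_rmse, statistics.pstdev(log_s))
```

Any model whose RMSE exceeds this baseline is doing no better than quoting the dataset mean.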
We have compiled four sets of results for our DLS-100 dataset. First, a purely theoretical prediction, in which no machine learning is used and where predictions are made using only physics-based calculations. Second, theoretical energies are used as the sole descriptors in machine learning models. Third, cheminformatics descriptors, calculated using the CDK, are used as the sole input to machine learning methods. Finally, cheminformatics descriptors and theoretically computed energies are combined as input to machine learning methods. For each of these methods, we present the results and discussion, with comparison between the methods made on the basis of RMSE and R2 (correlation coefficients for cheminformatics and combined models can be found in the Supporting Information (Tables S3–S9); RMSE values can be found in the Supporting Information (Tables S10–S16)). In addition to these results, we have replicated the solubility challenge using 2D molecular descriptors alone.

Theoretical Predictions

The theoretical methodologies described earlier utilize a thermodynamic cycle to access the free energy of solution. Table 1 shows the R2 correlation coefficient and the RMSE for the predictions made by these methods. Chart 1 shows the linear fit to the data from the SMD(HF) method, which has the lower RMSE and the higher R2 correlation coefficient of the two purely theoretical methods.
Table 1. RMSE and R2 Values for Theoretical Energy Calculation
                      DMACRYS + SMD(M06-2X)    DMACRYS + SMD(HF)
RMSE (log S units)    4.045                    2.946
R2                    0.252                    0.327
Chart 1 shows that the data are poorly explained by a linear model. The RMSE for the SMD(HF) method is nearly three times the suggested criterion of 1 log S unit of error. The situation is even worse for the SMD(M06-2X) method for which the RMSE is just over four times this criterion (see Charts S4–S6 in the Supporting Information). Both methods produce results outside the useful prediction criterion of 1.71 log S units. From these results, we can draw a couple of conclusions. First, it is clear that the given methodologies do not adequately quantify the physics occurring in the solution process (i.e., solid to solution). Second, we can conclude that, if it is possible to explain the underlying structure of these data using a general model, based on the predicted log S values, such a model will be inherently nonlinear.
Compared with our previous work,(1) in which theoretical models provided a good prediction of log S, our theoretical methodology here differs only marginally, in the use of MP2 multipoles, and still produces good results (see Supporting Information (Chart S1 and Table S2)) for the same 25 molecules in this work (dataset DLS-25). The predictions for the additional 75 molecules alone show worse predictions than for the full 100-molecule set presented above (see Charts S2 and S3 in the Supporting Information). The additional 75 molecules therefore appear to form a more difficult dataset to predict. It is likely that improved results can be obtained from purely theoretical calculations, if some of the approximations made here are improved; for example, improved modeling of the solvated phase to more accurately describe the solvent and its effects on the solute could increase accuracy. Also, we note that the intramolecular degrees of freedom are neglected in the DMACRYS calculations, and further assumptions are made by using eqs 2 and 3 in the Methods section.
We subsequently applied machine learning methods to the theoretical energies in order to carry out nonlinear regression analysis. The average RMSE scores over 10-fold cross-validation (see the Methods section for details) are represented as two-dimensional (2-D) column charts (see Charts 2 and 6). Different grayscale column bars represent the different machine learning methods used in this study. The standard deviation is shown as an error bar (black line).

Theoretical Energies as Sole Descriptors in Machine Learning

The use of the calculated energies as descriptors in the machine learning models yields considerably improved results, compared to those from the predictions made without machine learning. The results now, while still missing the 1 log S unit error criterion, do make useful predictions in which the RMSE is within the standard deviation of the experimental data (1.71 log S units). The RF and SVR models produce notably better results than PLS. Charts 2 and 3 show that the method minimizing the RMSE (1.21 log S units) is RF with HF when scaled with PCA.

Cheminformatics Descriptors as the Sole Input to Machine Learning

An additional point of interest is that the chemical descriptors alone using RF or SVR can provide a marginally better prediction of log S than the machine learning methods with only the energies as descriptors. In particular, we noticed that fitting the RF model on data that are scaled to a given mean and standard deviation produces a statistically significant improvement in its prediction with cheminformatics descriptors alone rather than theoretical energies (see the Supporting Information (Boxes S1–S3)). In all other cases, the changes are not significant. This suggests that slightly more useful information about the molecules’ log S values is conveyed by the cheminformatics descriptors than by the theoretical energies alone (see Chart 4).

Theoretical Energies and Cheminformatics Descriptors as Input to Machine Learning

When the descriptors and energies are combined as input for the machine learning methods, we obtain results that are generally only very slightly better than those obtained from cheminformatics descriptors alone. This implies that the theoretical energies contain very little extra useful information not already present in the descriptors. The joint results do present a statistically significant improvement for PLS and RF, once scaled by the mean/standard deviation, compared to those for the theoretical energies alone. In light of this, and given that the descriptors alone produce a marginally improved result compared to chemical theory, it is fair to say the cheminformatics descriptors are seen to contain a modest amount of additional information not incorporated in the theoretical energy terms. This suggests that the 123 cheminformatics descriptors and the 10 theoretical energy descriptors convey similar information, with only a small amount of additional information being conveyed by adding the descriptors to the energies and almost no information gained by adding the energies to the set of descriptors. We can conclude that these two sets of features are not generally complementary.
Interestingly, the best result in terms of RMSE is from the descriptors with the M06-2X energies, which, on their own, produced the worse of the two pure theory results in this work (see Charts 5 and 6). The RF model performs particularly well over all descriptor sets, even without any type of scaling, the best RMSE result being only 0.13 units outside the 1 log S unit target. The best single prediction, in terms of the RMSE, was made by the PLS model, using descriptors and the M06-2X energies scaled by the standard deviation and the mean, with an RMSE of 1.11 ± 0.04 log S units. All of these methods make predictions inside the standard deviation of the experimental data; therefore, all of the predictions are useful. We also note that the RF model shows small but statistically significant improvements with all scaling methods (using the theoretical energies and cheminformatics descriptors combined) when compared to some models trained on the theoretical energies only (see Supporting Information (Boxes S1–S3)). This is the only model to show such improvements with all scaling methods in the present work.
We analyzed the relative variable importance (see Table S17 in the Supporting Information) and found that X log P (from ref 57) was consistently rated as the most important feature. X log P is a computed estimate of the base-10 logarithm of the octanol:water partition coefficient (the ratio of the concentrations of a solute in the two solvents), obtained from an atom-additive model. Its importance has been seen in many previous studies and is not surprising, given that it provides information specifically about the solvated phase.(4, 56) Table S17 lists the 10 most important descriptors; here, we briefly comment upon these. We find Kier and Hall’s χ path and chain indices(58) to be of importance; these quantify the degree of bonding to heavy atoms within a given path or chain length. In addition, the Moreau–Broto autocorrelation,(59) which describes the charge and mass distribution along a given path length, is found in the top 10. Finally, we also note Randic’s weighted path descriptors,(60) which are used to account for molecular branching. Once the theoretical energies are added to the descriptor set, the free energies of hydration and solution are ranked in the top 10, along with the theoretical log S prediction. Explanations of the molecular descriptors used in this work can be found in ref 61.
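To make two of these descriptor families concrete, the following is a minimal Python sketch, not the CDK implementation used in this work. It assumes the textbook definitions: a Moreau–Broto autocorrelation at lag d is the sum of w_i·w_j over atom pairs separated by exactly d bonds, and the simple first-order χ path index is the sum over bonds of (δ_i·δ_j)^(-1/2), where δ is the heavy-atom degree. The ethanol graph, the use of unscaled atomic masses as weights, and all function names are illustrative assumptions; CDK may scale or normalize the weights differently.

```python
from collections import deque
from math import sqrt

# Hydrogen-suppressed graph of ethanol (C-C-O): atom index -> bonded neighbours.
# Illustrative input only; not taken from the DLS-100 dataset.
ETHANOL_BONDS = {0: [1], 1: [0, 2], 2: [1]}
ATOMIC_MASS = {0: 12.011, 1: 12.011, 2: 15.999}


def topological_distances(bonds, start):
    """Breadth-first search: number of bonds from `start` to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in bonds[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def moreau_broto(bonds, weights, lag):
    """Sum of w_i * w_j over atom pairs separated by exactly `lag` bonds."""
    total = 0.0
    for i in bonds:
        for j, d in topological_distances(bonds, i).items():
            if j > i and d == lag:  # count each unordered pair once
                total += weights[i] * weights[j]
    return total


def chi_path_order1(bonds):
    """Simple first-order chi path index: sum over bonds of (deg_i * deg_j)^-1/2."""
    total = 0.0
    for i, nbrs in bonds.items():
        for j in nbrs:
            if j > i:  # count each bond once
                total += 1.0 / sqrt(len(bonds[i]) * len(bonds[j]))
    return total


if __name__ == "__main__":
    for lag in (1, 2):
        print("mass autocorrelation, lag", lag, moreau_broto(ETHANOL_BONDS, ATOMIC_MASS, lag))
    print("chi path (order 1)", chi_path_order1(ETHANOL_BONDS))
```

Both quantities depend only on the molecular graph and per-atom properties, which is why such descriptors are far cheaper to compute than the theoretical energies.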
Chemically, we can see logic in the most important descriptors. One may expect that molecular branching would play an important role, because it gives information on the extent and flexibility of the molecule, hence contributing some entropic information. Coupling this descriptor with the Kier–Hall descriptors provides information on the composition of such chains, in terms of heavy atoms. The autocorrelation descriptor provides charge and mass distribution information; again, this conveys the distribution of heavy atoms and electronic factors. For example, the degree of charge separation across a molecule and the localization of charges are important factors in determining particularly enthalpic but also entropic contributions. The theoretical energies in the top 10 are all closely related quantities. It is not surprising that the (purely theoretical) prediction of log S is among them, since it is an estimate of the very quantity we are trying to predict. The free energies of solution and hydration provide direct information from electronic structure theory and statistical thermodynamics on the interactions of a given molecule, in a given conformation, within its environment, and on the energetics of phase transitions.
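The close relationship between a solution free energy and log S can be illustrated with a short Python sketch. It assumes the generic relation ΔG_sol = −RT ln S with S in mol/L and a 1 mol/L standard state; this is a simplifying assumption for illustration, and the thermodynamic cycle used in this work (see ref 1) may define the standard state differently. The function names are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def dg_per_log_unit(temperature=298.15):
    """Free energy change (kJ/mol) corresponding to one log10 unit of solubility."""
    return R_KJ * temperature * math.log(10)


def log_s_from_dg(dg_sol_kj, temperature=298.15):
    """log10 S (S in mol/L), assuming dG_sol = -RT ln S with a 1 mol/L standard state."""
    return -dg_sol_kj / dg_per_log_unit(temperature)


if __name__ == "__main__":
    print("kJ/mol per log S unit at 298.15 K:", round(dg_per_log_unit(), 3))
    print("log S for dG_sol = +20 kJ/mol:", round(log_s_from_dg(20.0), 2))
```

Under this assumption, one log S unit corresponds to roughly 5.7 kJ/mol at 298.15 K, so an RMSE target of 1 log S unit demands free energies accurate to within a few kJ/mol.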
As a benchmark, we also present our method’s predictions of the solubility challenge set based solely on cheminformatics descriptors (see Table 2). As suitable crystal structures are not available for all molecules in the solubility challenge, we could not calculate the theoretical energies.
Table 2. Solubility Challenge Dataset: Average over Ten Repetitions of 10-Fold Cross-Validation of RMSE (Standard Deviation) for the Log S Calculation

| Machine Learning, Solubility Challenge | raw data ± stdev | scaled by mean/stdev ± stdev | scaled by PCA ± stdev |
|---|---|---|---|
| PLS | 1.08 ± 0.04 | 1.03 ± 0.02 | 1.15 ± 0.01 |
| RF | 0.93 ± 0.01 | 0.93 ± 0.01 | 1.12 ± 0.01 |
| SVR | 1.17 ± 0.04 | 0.93 ± 0.02 | 0.95 ± 0.02 |

Table 3. RMSE for the Log S Calculation Using the Solubility Challenge Dataset with Its Original Training:Test Split

| Solubility Challenge | raw data | scaled by mean/stdev | scaled by PCA |
|---|---|---|---|
| PLS | 0.89 | 0.91 | 0.91 |
| RF | 0.93 | 1.03 | 1.02 |
| SVR | 1.08 | 1.07 | 1.08 |

Tables 2 and 3, and Chart 7, demonstrate that our method can make predictions for the solubility challenge dataset within the coveted 1 log S unit RMSE and, in fact, makes predictions that are consistent with some commercially available methods and deep-learning methods. A recent publication(56) reported RMSE scores of 0.95 log S units for the commercially available package MLR-SC(62) and 0.90 log S units for a deep-learning method. However, these results are not directly comparable with ours, for two reasons. First, our results have been calculated for a 10-fold cross-validation and for the canonical training:test split (see Tables 2 and 3). Second, the deep-learning result (RMSE = 0.90) given by Lusci et al.(56) is contingent on correcting eight putative errors in the CheqSol solubility data, the most substantial of which is for indomethacin, a compound that has been shown to hydrolyze under alkaline conditions.(63) While we have corrected names and SMILES for the solubility challenge set, we have not adjusted any solubility values therein. It is also reasonable to suggest that, using the solubility challenge set as a benchmark, our 100-molecule set could be considered a “difficult set”, given the improved prediction offered by our method when the solubility challenge set is used instead.
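The protocol behind Table 2 (ten repetitions of 10-fold cross-validation, reported as mean RMSE ± standard deviation, with optional scaling by the mean and standard deviation) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the R scripts used in this work: the model is a one-variable least-squares line standing in for PLS, RF, or SVR, the data are synthetic, the scaling parameters are taken from the training folds only, and the RMSE is pooled over all held-out predictions within each repetition, which may differ from how the original scripts aggregate.

```python
import random
from math import sqrt
from statistics import mean, stdev


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def repeated_kfold_rmse(x, y, k=10, repeats=10, seed=0):
    """Mean and standard deviation of the pooled CV RMSE over `repeats` shuffles."""
    rng = random.Random(seed)
    per_repeat = []
    for _ in range(repeats):
        order = list(range(len(x)))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]
        y_true, y_pred = [], []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            # Scale by the mean/stdev of the training folds only, to avoid leakage.
            mu = mean([x[i] for i in train_idx])
            sd = stdev([x[i] for i in train_idx])
            a, b = fit_line([(x[i] - mu) / sd for i in train_idx],
                            [y[i] for i in train_idx])
            for i in test_idx:
                y_true.append(y[i])
                y_pred.append(a + b * (x[i] - mu) / sd)
        per_repeat.append(rmse(y_true, y_pred))
    return mean(per_repeat), stdev(per_repeat)


if __name__ == "__main__":
    # Synthetic stand-in for 100 molecules: one descriptor and a noisy "log S".
    noise = random.Random(1)
    xs = [i / 10 for i in range(100)]
    ys = [1.0 - 0.5 * xi + noise.gauss(0.0, 1.0) for xi in xs]
    avg, sd = repeated_kfold_rmse(xs, ys)
    print("RMSE = %.2f +/- %.2f" % (avg, sd))
```

Because each repetition reshuffles the fold assignment, the reported standard deviation reflects sensitivity to the split rather than experimental uncertainty in log S.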
Our current work shows that accurate solution free energies are not calculable via the simple theoretical procedure that we present here. A significant portion of the important physics in the solution process is not captured using the approximate methodologies that we utilize in this work. This reaffirms that, currently, QSPR methodologies are the most accurate and time-efficient methods for solution free-energy prediction. In addition, we show that state-of-the-art machine learning methods, with a modest number of cheminformatics descriptors, are capable of making solution free-energy predictions that are consistent with those of commercially available programs and newer deep-learning approaches. Here, theoretical energies and cheminformatics descriptors are generally shown not to be complementary for such predictions. Since both sets of descriptors (theoretical energies and cheminformatics descriptors) produce a similar level of accuracy when used alone in the machine learning methods, and little improvement is seen when they are combined, we can conclude that the information conveyed is of a similar nature. In this sense, the theoretical energies are a more compact form of information storage, as 10 energy descriptors contain information equivalent to that of 123 molecular descriptors. However, the molecular descriptors are much less expensive to calculate, and their use is therefore more time-efficient. Additionally, we note that the RF method has produced promising predictions in this work, with relatively low RMSE. This method has consistently produced good results and would be our recommended method to make solubility predictions.

Informatics_Solubilty_datasets_and_scripts.zip, including R codes, Bash scripts, Python scripts, macro (.xlsb), DLS-100.csv and Solubility_Challenge_dataset.xlsx. DLS-100.csv contains experimental log S values, references, SMILES, sources of SMILES, CSD refcodes, molecule names, InChI and ChemSpider numbers. SI_document.pdf: Structure data, 2D images of the molecular structures, experimental log S values, CSD refcodes, R2, statistical significance, variable importance. This material is available free of charge via the Internet at http://pubs.acs.org. All scripts and datasets used in this work are available for download from the Mitchell Group web server (http://chemistry.st-andrews.ac.uk/staff/jbom/group/Informatics_Solubility.html), as well as in the Supporting Information.


Author Contributions

The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript. Datasets were curated by J.L.M. and J.B.O.M. Machine learning R scripts were produced by N.N. and L.D.F. The Taverna workflow was produced by L.D.F. Bash scripts, Excel macros, and the R script to run over multiple directories were produced by J.L.McD. DMACRYS and Gaussian calculations were run by J.L.McD. Advice on computational chemistry and machine learning methods was provided by T.v.M. and J.B.O.M. R calculations were run by J.L.McD. and N.N.

The authors declare no competing financial interest.


This work was partly supported by the Scottish Universities Life Sciences Alliance (SULSA), the Biotechnology and Biological Sciences Research Council (BBSRC) (No. BB/I00596X/1), and the Scottish Funding Council (SFC). We thank EaStCHEM for access to the EaStCHEM Research Computing Facility, and Dr. Herbert Früchtl for its maintenance. We are grateful to Dr. Graeme Day (University of Southampton) for providing additional scripts for DMACRYS. We thank Dr. David Palmer (University of Strathclyde) for a script to help automate running DMACRYS. We also thank our colleagues at the University of St. Andrews for useful discussions, particularly Dr. Lazaros Mavridis and Rachael Skyner. We thank the BBSRC for Grant No. BB/I00596X/1 to J.B.O.M., which supports L.D.F.’s research. We thank SULSA for supporting J.B.O.M., J.L.McD., and N.N., and we also thank the Scottish Overseas Research Student Awards Scheme of the SFC for financial support of N.N.’s studentship.

Abbreviations


  1. Palmer, D. S.; McDonagh, J. L.; Mitchell, J. B. O.; van Mourik, T.; Fedorov, M. V. First-Principles Calculation of the Intrinsic Aqueous Solubility of Crystalline Druglike Molecules. J. Chem. Theory Comput. 2012, 8, 3322–3337.
    First-Principles Calculation of the Intrinsic Aqueous Solubility of Crystalline Druglike Molecules
    Palmer, David S.; McDonagh, James L.; Mitchell, John B. O.; van Mourik, Tanja; Fedorov, Maxim V.
    Journal of Chemical Theory and Computation (2012), 8 (9), 3322-3337CODEN: JCTCCE; ISSN:1549-9618. (American Chemical Society)
    We demonstrate that the intrinsic aq. soly. of cryst. druglike mols. can be estd. with reasonable accuracy from sublimation free energies calcd. using crystal lattice simulations and hydration free energies calcd. using the 3D Ref. Interaction Site Model (3D-RISM) of the Integral Equation Theory of Mol. Liqs. (IET). The solubilities of 25 cryst. druglike mols. taken from different chem. classes are predicted by the model with a correlation coeff. of R = 0.85 and a root mean square error (RMSE) equal to 1.45 log10S units, which is significantly more accurate than results obtained using implicit continuum solvent models. The method is not directly parametrized against exptl. soly. data, and it offers a full computational characterization of the thermodn. of transfer of the drug mol. from crystal phase to gas phase to dil. aq. soln.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC38XhtV2gsrnE&md5=fbcfafe07e5f8ccb8acc2414bdb3a021
  2. Palmer, D. S.; Llinas, A.; Morao, I.; Day, G. M.; Goodman, J. M.; Glen, R. C.; Mitchell, J. B. O. Predicting intrinsic aqueous solubility by a thermodynamic cycle. Mol. Pharm. 2008, 5 (2), 266–279.
    Predicting Intrinsic Aqueous Solubility by a Thermodynamic Cycle
    Palmer, David S.; Llinas, Antonio; Morao, Inaki; Day, Graeme M.; Goodman, Jonathan M.; Glen, Robert C.; Mitchell, John B. O.
    Molecular Pharmaceutics (2008), 5 (2), 266-279CODEN: MPOHBP; ISSN:1543-8384. (American Chemical Society)
    The authors report methods to predict the intrinsic aq. soly. of cryst. org. mols. from two different thermodn. cycles. Direct computation of soly., via ab initio calcn. of thermodn. quantities at an affordable level of theory, cannot deliver the required accuracy. Therefore, the authors have turned to a mixt. of direct computation and informatics, using the calcd. thermodn. properties, along with a few other key descriptors, in regression models. The prediction of log intrinsic soly. (referred to mol/L) by a three-variable linear regression equation gave r2 = 0.77 and RMSE = 0.71 for an external test set comprising drug mols. The model includes a calcd. crystal lattice energy which provides a computational method to account for the interactions in the solid state. Probably it is not necessary to know the polymorphic form prior to prediction. Also, the method developed here may be applicable to other solid-state systems such as salts or cocrystals.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD1cXitleis74%253D&md5=71502207fc58378d9cb7e4cb6ad15fd0
  3. Mitchell, J. B. O. Informatics, machine learning and computational medicinal chemistry. Future Med. Chem. 2011, 3 (4), 451–467.
    Informatics, machine learning and computational medicinal chemistry
    Future Medicinal Chemistry (2011), 3 (4), 451-467CODEN: FMCUA7; ISSN:1756-8919. (Future Science Ltd.)
    A review. This article reviews the use of informatics and computational chem. methods in medicinal chem., with special consideration of how computational techniques can be adapted and extended to obtain more and higher-quality information. Special consideration is given to the computation of protein--ligand binding affinities, to the prediction of off-target bioactivities, bioactivity spectra and computational toxicol., and also to calcg. absorption-, distribution-, metab.- and excretion-relevant properties, such as soly.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3MXktVClu7Y%253D&md5=9347e17c69cf60de76ff62184a3f4393
  4. Hughes, L. D.; Palmer, D. S.; Nigsch, F.; Mitchell, J. B. O. Why Are Some Properties More Difficult To Predict than Others? A Study of QSPR Models of Solubility, Melting Point, and Log P. J. Chem. Inf. Model. 2008, 48 (1), 220–232.
    Why Are Some Properties More Difficult To Predict than Others? A Study of QSPR Models of Solubility, Melting Point, and Log P
    Hughes, Laura D.; Palmer, David S.; Nigsch, Florian; Mitchell, John B. O.
    Journal of Chemical Information and Modeling (2008), 48 (1), 220-232CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)
    This paper attempts to elucidate differences in QSPR models of aq. soly. (Log S), m.p. (Tm), and octanol-water partition coeff. (Log P), three properties of pharmaceutical interest. For all three properties, Support Vector Machine models using 2D and 3D descriptors calcd. in the Mol. Operating Environment were the best models. Octanol-water partition coeff. was the easiest property to predict, as indicated by the RMSE of the external test set and the coeff. of detn. (RMSE = 0.73, r2 = 0.87). M.p. prediction, on the other hand, was the most difficult (RMSE = 52.8 °C, r2 = 0.46), and Log S statistics were intermediate between m.p. and Log P prediction (RMSE = 0.900, r2 = 0.79). The data imply that for all three properties the lack of measured values at the extremes is a significant source of error. This source, however, does not entirely explain the poor m.p. prediction, and we suggest that deficiencies in descriptors used in m.p. prediction contribute significantly to the prediction errors.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD1cXksV2htg%253D%253D&md5=7fd5639f3443fa70718ad40e9a9f8957
  5. (a) Tetko, I. V. Computing chemistry on the web. Drug Discovery Today 2005, 10 (22), 1497–1500.
    Tetko Igor V
    Drug discovery today (2005), 10 (22), 1497-500 ISSN:1359-6446.
    The development of on-line software tools is changing the way we traditionally perform our analysis in drug design, but will chemoinformatics be forever behind bioinformatics in this development?
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A280%3ADC%252BD2MrnvVOgsw%253D%253D&md5=df73185749c297067e22b2e34629f260
    (b) Tetko, I. V.; Gasteiger, J.; Todeschini, R.; Mauri, A.; Livingstone, D.; Ertl, P.; Palyulin, V. A.; Radchenko, E. V.; Zefirov, N. S.; Makarenko, A. S.; Tanchuk, V. Y.; Prokopenko, V. V. Virtual computational chemistry laboratory - design and description. J. Comput. Aid. Mol. Des. 2005, 19, 453–463.
    Virtual computational chemistry laboratory - design and description
    Tetko, Igor V.; Gasteiger, Johann; Todeschini, Roberto; Mauri, Andrea; Livingstone, David; Ertl, Peter; Palyulin, Vladimir A.; Radchenko, Eugene V.; Zefirov, Nikolay S.; Makarenko, Alexander S.; Tanchuk, Vsevolod Yu.; Prokopenko, Volodymyr V.
    Journal of Computer-Aided Molecular Design (2005), 19 (6), 453-463CODEN: JCADEQ; ISSN:0920-654X. (Springer)
    Internet technol. offers an excellent opportunity for the development of tools by the cooperative effort of various groups and institutions. We have developed a multi-platform software system, Virtual Computational Chem. Lab., http://www.vcclab.org, allowing the computational chemist to perform a comprehensive series of mol. indexes/properties calcns. and data anal. The implemented software is based on a three-tier architecture that is one of the std. technologies to provide client-server services on the Internet. The developed software includes several popular programs, including the indexes generation program, DRAGON, a 3D structure generator, CORINA, a program to predict lipophilicity and aq. soly. of chems., ALOGPS and others. All these programs are running at the host institutes located in five countries over Europe. In this article we review the main features and statistics of the developed system that can be used as a prototype for academic and industry models.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD2MXhtFaht77F&md5=6e48f916c58c1e772ade43fa8e4b4b1a
  6. Price, S. L.; Leslie, M.; Welch, G. W. A.; Habgood, M.; Price, L. S.; Karamertzanis, P. G.; Day, G. M. Modelling organic crystal structures using distributed multipole and polarizability-based model intermolecular potentials. Phys. Chem. Chem. Phys. 2010, 12 (30), 8478–8490.
    Modelling organic crystal structures using distributed multipole and polarizability-based model intermolecular potentials
    Price, Sarah L.; Leslie, Maurice; Welch, Gareth W. A.; Habgood, Matthew; Price, Louise S.; Karamertzanis, Panagiotis G.; Day, Graeme M.
    Physical Chemistry Chemical Physics (2010), 12 (30), 8478-8490CODEN: PPCPFQ; ISSN:1463-9076. (Royal Society of Chemistry)
    Crystal structure prediction for org. mols. requires both the fast assessment of thousands to millions of crystal structures and the greatest possible accuracy in their relative energies. We describe a crystal lattice simulation program, DMACRYS, emphasizing the features that make it suitable for use in crystal structure prediction for pharmaceutical mols. using accurate anisotropic atom-atom model intermol. potentials based on the theory of intermol. forces. DMACRYS can optimize the lattice energy of a crystal, calc. the second deriv. properties, and reduce the symmetry of the space group to move away from a transition state. The calcd. terahertz frequency k = 0 rigid-body lattice modes and elastic tensor can be used to est. free energies. The program uses a distributed multipole electrostatic model (Qat, t = 00,..,44s) for the electrostatic fields, and can use anisotropic atom-atom repulsion models, damped isotropic dispersion up to R-10, as well as a range of empirically fitted isotropic exp-6 atom-atom models with different definitions of at. types. A new feature is that an accurate model for the induction energy contribution to the lattice energy has been implemented that uses at. anisotropic dipole polarizability models (αat, t = (10,10)..(11c,11s)) to evaluate the changes in the mol. charge d. induced by the electrostatic field within the crystal. It is demonstrated, using the four polymorphs of the pharmaceutical carbamazepine C15H12N2O, that while reproducing crystal structures is relatively easy, calcg. the polymorphic energy differences to the accuracy of a few kJ mol-1 required for applications is very demanding of assumptions made in the modeling. Thus DMACRYS enables the comparison of both known and hypothetical crystal structures as an aid to the development of pharmaceuticals and other specialty org. materials, and provides a tool to develop the modeling of the intermol. forces involved in mol. recognition processes.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3cXptFOkt7w%253D&md5=098e4b7761cc1d0267402a3d64f214a5
  7. Segall, M. D.; Lindan, P. J. D.; Probert, M. J.; Pickard, C. J.; Hasnip, P. J.; Clark, S. J.; Payne, M. C. First-principles simulation: Ideas, illustrations and the CASTEP code. J. Phys. Condens. Matter 2002, 14 (11), 2717–2744.
    First-principles simulation: ideas, illustrations and the CASTEP code
    Segall, M. D.; Lindan, Philip J. D.; Probert, M. J.; Pickard, C. J.; Hasnip, P. J.; Clark, S. J.; Payne, M. C.
    Journal of Physics: Condensed Matter (2002), 14 (11), 2717-2744CODEN: JCOMEL; ISSN:0953-8984. (Institute of Physics Publishing)
    A review. First-principles simulation, meaning d.-functional theory calcns. with plane waves and pseudopotentials, has become a prized technique in condensed-matter theory. Here I look at the basics of the subject, give a brief review of the theory, examg. the strengths and weaknesses of its implementation, and illustrating some of the ways simulators approach problems through a small case study. I also discuss why and how modern software design methods have been used in writing a completely new modular version of the CASTEP code.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD38XivFGrs7c%253D&md5=fc155abe0df3e9ec12d832be5b5aa84e
  8. Dovesi, R.; Orlando, R.; Civalleri, B.; Roetti, C.; Saunders, V. R.; Zicovich-Wilson, C. M. CRYSTAL: a computational tool for the ab initio study of the electronic properties of crystals. Z. Kristallogr. 2005, 220 (5–6), 571–573.
    CRYSTAL: A computational tool for the ab initio study of the electronic properties of crystals
    Dovesi, Roberto; Orlando, Roberto; Civalleri, Bartolomeo; Roetti, Carla; Saunders, Victor R.; Zicovich-Wilson, Claudio M.
    Zeitschrift fuer Kristallographie (2005), 220 (5-6), 571-573CODEN: ZEKRDZ; ISSN:0044-2968. (Oldenbourg Wissenschaftsverlag GmbH)
    CRYSTAL computes the electronic structure and properties of periodic systems (crystals, surfaces, polymers) within Hartree-Fock, D. Functional and various hybrid approxns. CRYSTAL was developed during nearly 30 years (since 1976) by researchers of the Theor. Chem. Group in Torino (Italy), and the Computational Materials Science group in CLRC (Daresbury, UK), with important contributions from visiting researchers, as documented by the main authors list and the bibliog. The basic features of the program CRYSTAL are presented, with two examples of application in the field of crystallog.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD2MXmsVSitbY%253D&md5=7bf7c582dd3196c28e16c4fb24ac9fb7
  9. (a) Luder, K.; Lindfors, L.; Westergren, J.; Nordholm, S.; Kjellander, R. In silico prediction of drug solubility: 2. Free energy of solvation in pure melts. J. Phys. Chem. B 2007, 111 (7), 1883–1892.
    In silico prediction of drug solubility: 2. Free energy of solvation in pure melts
    Luder Kai; Lindfors Lennart; Westergren Jan; Nordholm Sture; Kjellander Roland
    The journal of physical chemistry. B (2007), 111 (7), 1883-92 ISSN:1520-6106.
    The solubility of drugs in water is investigated in a series of papers and in the current work. The free energy of solvation, DeltaG*(vl), of a drug molecule in its pure drug melt at 673.15 K (400 degrees C) has been obtained for 46 drug molecules using the free energy perturbation method. The simulations were performed in two steps where first the Coulomb and then the Lennard-Jones interactions were scaled down from full to no interaction. The results have been interpreted using a theory assuming that DeltaG*(vl) = DeltaG(cav) + E(LJ) + E(C)/2 where the free energy of cavity formation, DeltaG(cav), in these pure drug systems was obtained using hard body theories, and E(LJ) and E(C) are the Lennard-Jones and Coulomb interaction energies, respectively, of one molecule with the other ones. Since the main parameter in hard body theories is the volume fraction, an equation of state approach was used to estimate the molecular volume. Promising results were obtained using a theory for hard oblates, in which the oblate axial ratio was calculated from the molecular surface area and volume obtained from simulations. The Coulomb term, E(C)/2, is half of the Coulomb energy in accord with linear response, which showed good agreement with our simulation results. In comparison with our previous results on free energy of hydration, the Coulomb interactions in pure drug systems are weaker, and the van der Waals interactions play a more important role.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A280%3ADC%252BD2s7gtFSrsQ%253D%253D&md5=4a28537a0bde7b9df457d5dde0f2de8a
    (b) Luder, K.; Lindfors, L.; Westergren, J.; Nordholm, S.; Kjellander, R. In silico prediction of drug solubility. 3. Free energy of solvation in pure amorphous matter. J. Phys. Chem. B 2007, 111 (25), 7303–7311.
    In silico prediction of drug solubility. 3. Free energy of solvation in pure amorphous matter
    Luder Kai; Lindfors Lennart; Westergren Jan; Nordholm Sture; Kjellander Roland
    The journal of physical chemistry. B (2007), 111 (25), 7303-11 ISSN:1520-6106.
    The solubility of drugs in water is investigated in a series of papers. In this work, we address the process of bringing a drug molecule from the vapor into a pure drug amorphous phase. This step enables us to actually calculate the solubility of amorphous drugs in water. In our general approach, we, on one hand, perform rigorous free energy simulations using a combination of the free energy perturbation and thermodynamic integration methods. On the other hand, we develop an approximate theory containing parameters that are easily accessible from conventional Monte Carlo simulations, thereby reducing the computation time significantly. In the theory for solvation, we assume that DeltaG* = DeltaGcav + ELJ + EC/2, where the free energy of cavity formation, DeltaGcav, in pure drug systems is obtained using a theory for hard-oblate spheroids, and ELJ and EC are the Lennard-Jones and Coulomb interaction energies between the chosen molecule and the others in the fluid. The theoretical predictions for the free energy of solvation in pure amorphous matter are in good agreement with free energy simulation data for 46 different drug molecules. These results together with our previous studies support our theoretical approach. By using our previous data for the free energy of hydration, we compute the total free energy change of bringing a molecule from the amorphous phase into water. We obtain good agreement between the theory and simulations. It should be noted that to obtain accurate results for the total process, high precision data are needed for the individual subprocesses. Finally, for eight different substances, we compare the experimental amorphous and crystalline solubility in water with the results obtained by the proposed theory with reasonable success.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A280%3ADC%252BD2szms1Chtw%253D%253D&md5=e09146655f4d28797e7f840919ff30b2
    (c) Luder, K.; Lindfors, L.; Westergren, J.; Nordholm, S.; Persson, R.; Pedersen, M. In Silico Prediction of Drug Solubility: 4. Will Simple Potentials Suffice? J. Comput. Chem. 2009, 30 (12), 1859–1871.
    In silico prediction of drug solubility: 4. Will simple potentials suffice?
    Luder Kai; Lindfors Lennart; Westergren Jan; Nordholm Sture; Persson Rasmus; Pedersen Mikaela
    Journal of computational chemistry (2009), 30 (12), 1859-71 ISSN:.
    In view of the extreme importance of reliable computational prediction of aqueous drug solubility, we have established a Monte Carlo simulation procedure which appears, in principle, to yield reliable solubilities even for complex drug molecules. A theory based on judicious application of linear response and mean field approximations has been found to reproduce the computationally demanding free energy determinations by simulation while at the same time offering mechanistic insight. The focus here is on the suitability of the model of both drug and solvent, i.e., the force fields. The optimized potentials for liquid simulations all atom (OPLS-AA) force field, either intact or combined with partial charges determined either by semiempirical AM1/CM1A calculations or taken from the condensed-phase optimized molecular potentials for atomistic simulation studies (COMPASS) force field has been used. The results illustrate the crucial role of the force field in determining drug solubilities. The errors in interaction energies obtained by the simple force fields tested here are still found to be too large for our purpose but if a component of this error is systematic and readily removed by empirical adjustment the results are significantly improved. In fact, consistent use of the OPLS-AA Lennard-Jones force field parameters with partial charges from the COMPASS force field will in this way produce good predictions of amorphous drug solubility within 1 day on a standard desktop PC. This is shown here by the results of extensive new simulations for a total of 47 drug molecules which were also improved by increasing the water box in the hydration simulations from 500 to 2000 water molecules.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A280%3ADC%252BD1MvltlGhtQ%253D%253D&md5=34e04b040a9ae704fd5ea7b969d7e5b4
    (d) Westergren, J.; Lindfors, L.; Hoglund, T.; Luder, K.; Nordholm, S.; Kjellander, R. In silico prediction of drug solubility: 1. Free energy of hydration. J. Phys. Chem. B 2007, 111 (7), 1872–1882.
    In Silico Prediction of Drug Solubility: 1. Free Energy of Hydration
    Westergren, Jan; Lindfors, Lennart; Hoeglund, Tobias; Lueder, Kai; Nordholm, Sture; Kjellander, Roland
    Journal of Physical Chemistry B (2007), 111 (7), 1872-1882CODEN: JPCBFK; ISSN:1520-6106. (American Chemical Society)
    As a first step in the computational prediction of drug soly. the free energy of hydration, $\Delta G_{vw}^{\bullet}$, in TIP4P water has been computed for a data set of 48 drug mols. using the free energy of perturbation method and the optimized potential for liq. simulations all-atom force field. The simulations were performed in two steps, where first the Coulomb and then the Lennard-Jones interactions between the solute and the water mols. were scaled down from full to zero strength to provide phys. understanding and simpler predictive models. The results have been interpreted using a theory assuming $\Delta G_{vw}^{\bullet} = A_{MS}\gamma + E_{LJ} + E_{C}/2$, where $A_{MS}$ is the mol. surface area, $\gamma$ is the water-vapor surface tension, and $E_{LJ}$ and $E_{C}$ are the solute-water Lennard-Jones and Coulomb interaction energies, resp. It was found that by a proper definition of the mol. surface area our results as well as several results from the literature were found to be in quant. agreement using the macroscopic surface tension of TIP4P water. This is in contrast to the surface tension for water around a spherical cavity that previously has been shown to be dependent on the size of the cavity up to a radius of ∼1 nm. The step of scaling down the electrostatic interaction can be represented by linear response theory.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD2sXhtVynsLc%253D&md5=c77f8ac363fc84656673504c964a0b50
  10.
    Tomasi, J.; Mennucci, B.; Cammi, R. Quantum Mechanical Continuum Solvation Models. Chem. Rev. 2005, 105 (8), 2999-3094.
    Tomasi, Jacopo; Mennucci, Benedetta; Cammi, Roberto
    Chemical Reviews (Washington, DC, United States) (2005), 105 (8), 2999-3093CODEN: CHREAY; ISSN:0009-2665. (American Chemical Society)
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD2MXmsVynurc%253D&md5=462420dd18b3006ee63d1298b66db247
  11.
    (a) Ten-no, S. Free energy of solvation for the reference interaction site model: Critical comparison of expressions. J. Chem. Phys. 2001, 115 (8), 3724-3731.
    Free energy of solvation for the reference interaction site model: Critical comparison of expressions
    Journal of Chemical Physics (2001), 115 (8), 3724-3731CODEN: JCPSA6; ISSN:0021-9606. (American Institute of Physics)
    We investigate expressions of excess chem. potential in the ref. interaction site model (RISM) integral equation theory. In addn. to the previous expressions from the Gaussian d. fluctuation theory and from the extended RISM (XRISM) theory, we examine a new free energy functional from the distributed partial wave expansion of mol. correlation functions, using the embedded site model and alcs. with different parameter sets. The results clearly show that the free energy of solvation in the XRISM theory includes a serious error, which is related to the no. of interaction sites and the geometry of a solute mol.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD3MXlvVehsbo%253D&md5=42db0b786f7a20ff6ddecb54d0f40dbc
    (b) Palmer, D. S.; Frolov, A. I.; Ratkova, E. L.; Fedorov, M. V. Towards a universal method for calculating hydration free energies: A 3D reference interaction site model with partial molar volume correction. J. Phys.: Condens. Matter 2010, 22 (49), 492101.
    Towards a universal method for calculating hydration free energies: a 3D reference interaction site model with partial molar volume correction
    Palmer, David S.; Frolov, Andrey I.; Ratkova, Ekaterina L.; Fedorov, Maxim V.
    Journal of Physics: Condensed Matter (2010), 22 (49), 492101/1-492101/9CODEN: JCOMEL; ISSN:0953-8984. (Institute of Physics Publishing)
    We report a simple universal method to systematically improve the accuracy of hydration free energies calcd. using an integral equation theory of mol. liqs., the 3D ref. interaction site model. A strong linear correlation is obsd. between the difference of the exptl. and (uncorrected) calcd. hydration free energies and the calcd. partial molar volume for a data set of 185 neutral org. mols. from different chem. classes. By using the partial molar volume as a linear empirical correction to the calcd. hydration free energy, we obtain predictions of hydration free energies in excellent agreement with expt. (R = 0.94, σ = 0.99 kcal mol⁻¹ for a test set of 120 org. mols.).
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3MXjs1ahug%253D%253D&md5=6b660f5ec50212304923b5c4f6ed75d7
  12.
    Stanton, R. V.; Hartsough, D. S.; Merz, K. M. Calculation of solvation free energies using a density functional/molecular dynamics coupled potential. J. Phys. Chem. 1993, 97 (46), 11868-11870.
    Calculation of solvation free energies using a density functional/molecular dynamics coupled potential
    Stanton, Robert V.; Hartsough, David S.; Merz, Kenneth M., Jr.
    Journal of Physical Chemistry (1993), 97 (46), 11868-70CODEN: JPCHAX; ISSN:0022-3654.
    Recently there was much interest in the development of methods which couple quantum mech. and mol. mech. computational models. The authors report the 1st coupling of a d. functional Hamiltonian with a mol. mech. method. The AMBER force field was coupled with a d. functional Hamiltonian as implemented in the deMon program. Test calcns. of solvation energies were carried out for a small group of ions. The coupled potential method slightly underestimates the solvation energy of the chloride ion while it overestimates the solvation energy of the other ions studied. Nonetheless, this method allows to study condensed-phase systems at a level of accuracy currently not available.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADyaK3sXmsFOns70%253D&md5=79256a29182e1a36bbaf822f0ea85b73
  13.
    Ratkova, E. L.; Fedorov, M. V. Combination of RISM and Cheminformatics for Efficient Predictions of Hydration Free Energy of Polyfragment Molecules: Application to a Set of Organic Pollutants. J. Chem. Theory Comput. 2011, 7 (5), 1450-1457.
    Combination of RISM and Cheminformatics for Efficient Predictions of Hydration Free Energy of Polyfragment Molecules: Application to a Set of Organic Pollutants
    Journal of Chemical Theory and Computation (2011), 7 (5), 1450-1457CODEN: JCTCCE; ISSN:1549-9618. (American Chemical Society)
    The authors discuss a new method for predicting the hydration free energy (HFE) of org. pollutants and illustrate the efficiency of the method on a set of 220 chlorinated arom. hydrocarbons. The new model is computationally inexpensive, with one HFE calcn. taking less than a minute on a PC. The method is based on a combination of a mol. integral equations theory, one-dimensional ref. interaction site model (1D RISM), with the cheminformatics approach. The authors correct HFEs obtained by the 1D RISM with a set of empirical corrections. The corrections are assocd. with the partial molar volume and structural descriptors of the mols. The introduced corrections can significantly improve the quality of the 1D RISM HFE predictions obtained by the partial wave free energy expression and the Kovalenko-Hirata closure. The quality of the model can be further improved by the reparametrization using QM-derived partial charges instead of the originally used OPLS-AA partial charges. The final model gives good results for polychlorinated benzenes (the mean and std. deviation of the error are 0.02 and 0.36 kcal/mol, correspondingly). At the same time, the model gives somewhat worse results for polychlorobiphenyls (PCBs) with a systematic bias of -0.72 kcal/mol but a small std. deviation equal to 0.55 kcal/mol. The error remains the same for the whole set of PCBs, whereas errors of HFEs predicted with continuum solvation models increase significantly for higher chlorinated PCB congeners. The authors discuss potential future applications of the model and several avenues for its further improvement.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3MXltFGmsbk%253D&md5=73f9a2a58266597d0f7661e84b4a43dd
  14.
    Allen, F. H. The Cambridge Structural Database: a quarter of a million crystal structures and rising. Acta Crystallogr., Sect. B: Struct. Sci. 2002, B58, 380-388.
    The Cambridge Structural Database: a quarter of a million crystal structures and rising
    Acta Crystallographica, Section B: Structural Science (2002), B58 (3, No. 1), 380-388CODEN: ASBSDK; ISSN:0108-7681. (Blackwell Munksgaard)
    The Cambridge Structural Database (CSD) now contains data for more than a quarter of a million small-mol. crystal structures. The information content of the CSD, together with methods for data acquisition, processing and validation, are summarized, with particular emphasis on the chem. information added by CSD editors. Nearly 80% of new structural data arrives electronically, mostly in CIF format, and the CCDC acts as the official crystal structure data depository for 51 major journals. The CCDC now maintains both a CIF archive (more than 73000 CIFs dating from 1996), as well as the distributed binary CSD archive; the availability of data in both archives is discussed. A statistical survey of the CSD is also presented and projections concerning future accession rates indicate that the CSD will contain at least 500000 crystal structures by the year 2010.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD38XktVOqu74%253D&md5=406cd0df6ea9035a0ebf8dd9eccbd1f8
  15.
    Box, K.; Comer, J. E.; Gravestock, T.; Stuart, M. New Ideas about the Solubility of Drugs. Chem. Biodiversity 2009, 6 (11), 1767-1788.
    Box, Karl; Comer, John E.; Gravestock, Tom; Stuart, Martin
    Chemistry & Biodiversity (2009), 6 (11), 1767-1788CODEN: CBHIAM; ISSN:1612-1872. (Verlag Helvetica Chimica Acta)
    Methods are described for detecting pptn. of ionizable drugs under conditions of changing pH, estg. kinetic soly. from the onset of pptn., and measuring soly. by chasing equil. Definitions are presented for kinetic, equil., and intrinsic soly. of ionizable drugs, supersatn. and subsatn., and for chasers and non-chasers, which are 2 classes of ionizable drug with significantly different soly. properties. The use of Bjerrum Curves and Neutral-Species Concn. Profiles to depict soly. properties are described and illustrated with case studies showing super-dissolving behavior, conversion between cryst. forms and enhancement of soly. through supersatn., and the use of additives and simulated gastrointestinal fluids.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD1MXhsFSrtLrE&md5=57fc3d9b4f1490ded75d71d1cfe20c58
  16.
    (a) Hopfinger, A. J.; Esposito, E. X.; Llinàs, A.; Glen, R. C.; Goodman, J. M. Findings of the Challenge To Predict Aqueous Solubility. J. Chem. Inf. Model. 2008, 49 (1), 1-5.
    (b) Llinàs, A.; Glen, R. C.; Goodman, J. M. Solubility Challenge: Can You Predict Solubilities of 32 Molecules Using a Database of 100 Reliable Measurements? J. Chem. Inf. Model. 2008, 48 (7), 1289-1303.
    Solubility challenge: can you predict solubilities of 32 molecules using a database of 100 reliable measurements?
    Llinas, Antonio; Glen, Robert C.; Goodman, Jonathan M.
    Journal of Chemical Information and Modeling (2008), 48 (7), 1289-1303CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)
    Soly. is a key physicochem. property of mols. Serious deficiencies exist in the consistency and reliability of soly. data in the literature. The accurate prediction of soly. would be very useful. However, systematic errors and lack of metadata assocd. with measurements greatly reduce the confidence in current models. To address this, we are accurately measuring intrinsic soly. values, and here we report results for a diverse set of 100 druglike mols. at 25° and an ionic strength of 0.15 M using the CheqSol approach. This is a highly reproducible potentiometric technique that ensures the thermodn. equil. is reached rapidly. Results with a coeff. of variation higher than 4% were rejected. In addn., the Potentiometric Cycling for Polymorph Creation method, [PC]2, was used to obtain multiple polymorph forms from aq. soln. We now challenge researchers to predict the intrinsic soly. of 32 other druglike mols. that have been measured but are yet to be published.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD1cXosV2itb4%253D&md5=6a8950fc1c51ad9a51731c65e6debc22
  17.
    The Goodman group. http://www-jmg.ch.cam.ac.uk/data/solubility/ (accessed Feb. 8, 2013).
  18.
    Narasimham, L.; Barhate, V. D. Kinetic and intrinsic solubility determination of some β-blockers and antidiabetics by potentiometry. J. Pharm. Res. 2011, 4 (2), 532-536.
  19.
    (a) Bergström, C. A. S.; Luthman, K.; Artursson, P. Accuracy of calculated pH-dependent aqueous drug solubility. Eur. J. Pharm. Sci. 2004, 22 (5), 387-398.
    Accuracy of calculated pH-dependent aqueous drug solubility
    Bergstrom, Christel A. S.; Luthman, Kristina; Artursson, Per
    European Journal of Pharmaceutical Sciences (2004), 22 (5), 387-398CODEN: EPSCED; ISSN:0928-0987. (Elsevier B.V.)
    The aim of the present study was to investigate the extent to which the Henderson-Hasselbalch (HH) relationship can be used to predict the pH-dependent aq. soly. of cationic drugs. The pH-dependent soly. for 25 amines, carrying a single pos. charge, was detd. with a small-scale shake flask method. Each sample was prepd. as a suspension in 150 mM phosphate buffer. The pH-dependent soly. curves were obtained using at least 10 different pH values. The intrinsic soly., the soly. at the pKa and the soly. at pH values reflecting the pH of the bulk and acid microclimate in the human small intestine (pH 7.4 and 6.5, resp.) were detd. for all compds. The exptl. study revealed a large diversity in slope, from -0.5 (celiprolol) to -8.6 (hydralazine) in the linear pH-dependent soly. interval, which is in sharp contrast to the slope of -1 assumed by the HH equation. In addn., a large variation in the range of soly. between the completely uncharged and completely charged drug species was obsd. The range for disopyramide was only 1.1 log units, whereas that for amiodarone was greater than 6.3 log units, pointing at the compd. specific response to counter-ion effects. In conclusion, the investigated cationic drugs displayed compd. specific pH-dependent soly. profiles, indicating that the HH equation in many cases will only give rough estns. of the pH-dependent soly. of drugs in divalent buffer systems.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD2cXlsl2nsbc%253D&md5=dc25a69e233afd97c1ae1cb490f9de0d
    (b) Bergström, C. A. S.; Wassvik, C. M.; Norinder, U.; Luthman, K.; Artursson, P. Global and Local Computational Models for Aqueous Solubility Prediction of Drug-Like Molecules. J. Chem. Inf. Comput. Sci. 2004, 44 (4), 1477-1488.
    Global and local computational models for aqueous solubility prediction of drug-like molecules
    Bergstrom, Christel A. S.; Wassvik, Carola M.; Norinder, Ulf; Luthman, Kristina; Artursson, Per
    Journal of Chemical Information and Computer Sciences (2004), 44 (4), 1477-1488. ISSN:0095-2338.
    The aim of this study was to develop in silico protocols for the prediction of aqueous drug solubility. For this purpose, high quality solubility data of 85 drug-like compounds covering the total drug-like space as identified with the ChemGPS methodology were used. Two-dimensional molecular descriptors describing electron distribution, lipophilicity, flexibility, and size were calculated by Molconn-Z and Selma. Global minimum energy conformers were obtained by Monte Carlo simulations in MacroModel and three-dimensional descriptors of molecular surface area properties were calculated by Marea. PLS models were obtained by use of training and test sets. Both a global drug solubility model (R² = 0.80, RMSE(te) = 0.83) and subset specific models (after dividing the 85 compounds into acids, bases, ampholytes, and nonproteolytes) were generated. Furthermore, the final models were successful in predicting the solubility values of external test sets taken from the literature. The results showed that homologous series and subsets can be predicted with high accuracy from easily comprehensible models, whereas consensus modeling might be needed to predict the aqueous drug solubility of datasets with large structural diversity.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A280%3ADC%252BD2cvgtVSkuw%253D%253D&md5=4945232f11abd9bf5b0510226b82198d
    (c) Ran, Y.; Yalkowsky, S. H. Prediction of Drug Solubility by the General Solubility Equation (GSE). J. Chem. Inf. Comput. Sci. 2001, 41 (2), 354-357.
    Prediction of Drug Solubility by the General Solubility Equation (GSE)
    Journal of Chemical Information and Computer Sciences (2001), 41 (2), 354-357CODEN: JCISD8; ISSN:0095-2338. (American Chemical Society)
    The revised GSE proposed by Jain and Yalkowsky is used to est. the aq. soly. of a set of org. nonelectrolytes studied by Jorgensen and Duffy. The only inputs used in the GSE are the Celsius m.p. (MP) and the octanol water partition coeff. (Kow). These are generally known, easily measured, or easily calcd. The GSE does not utilize any fitted parameters. The av. abs. error for the 150 compds. is 0.43 compared to 0.56 with Jorgensen and Duffy's computational method, which utilizes 5 fitted parameters. Thus, the revised GSE is simpler and provides a more accurate estn. of aq. soly. of the same set of org. compds. It is also more accurate than the original version of the GSE.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD3MXislKntA%253D%253D&md5=c15ffefdf6aacf5fdafd95e5adad7017
    (d) Rytting, E.; Lentz, K.; Chen, X.-Q.; Qian, F.; Venkatesh, S. Aqueous and cosolvent solubility data for drug-like organic compounds. AAPS J. 2005, 7 (1), E78-E105.
    Aqueous and cosolvent solubility data for drug-like organic compounds
    Rytting, Erik; Lentz, Kimberley A.; Chen, Xue-Qing; Qian, Feng; Venkatesh, Srini
    AAPS Journal (2005), 7 (1), E78-E105CODEN: AJAOB6; ISSN:1550-7416. (American Association of Pharmaceutical Scientists)
    A review. Recently 2 QSPR-based in silico models were developed in the authors' labs. to predict the aq. and non-aq. soly. of drug-like org. compds. For the intrinsic aq. soly. model, a set of 321 structurally diverse drugs was collected from literature for the anal. For the PEG 400 cosolvent model, exptl. data for 122 drugs were obtained by a uniform exptl. procedure at 4 vol. fractions of PEG 400 in water, 0%, 25%, 50%, and 75%. The drugs used in both models represent a wide range of compds., with log P values from -5 to 7.5, and mol. wts. from 100 to >600 g/mol. Because of the standardized procedure used to collect the cosolvent data and the careful assessment of quality used in obtaining literature data, both data sets have potential value for the scientific community for use in building various models that require exptl. soly. data.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD2MXltlOqtr0%253D&md5=9be1e57d033ed321abbfbee0db943619
    (e) Shareef, A.; Angove, M. J.; Wells, J. D.; Johnson, B. B. Aqueous Solubilities of Estrone, 17β-Estradiol, 17α-Ethynylestradiol, and Bisphenol A. J. Chem. Eng. Data 2006, 51 (3), 879-881.
    Aqueous Solubilities of Estrone, 17β-Estradiol, 17α-Ethynylestradiol, and Bisphenol A
    Shareef, Ali; Angove, Michael J.; Wells, John D.; Johnson, Bruce B.
    Journal of Chemical & Engineering Data (2006), 51 (3), 879-881CODEN: JCEAAX; ISSN:0021-9568. (American Chemical Society)
    The solubilities of three estrogenic hormones-estrone, 17β-estradiol, and 17α-ethynylestradiol - and the industrial pollutant bisphenol A were measured in water, dil. acid and alkali (pH 4 and 10, resp.), and aq. KNO3 (0.01 mol·L⁻¹ and 0.1 mol·L⁻¹). The concns. of satd. solns., after equilibration at (25.0 ± 0.5)° with excess solid for 4 days, were detd. by HPLC. Six replicate results were obtained for each solute-solvent pair and the coeff. of variation was in most cases <5%. The solubilities in pure water with std. deviations were estrone (1.30 ± 0.08) mg·L⁻¹, 17β-estradiol (1.51 ± 0.04) mg·L⁻¹, 17α-ethynylestradiol (9.20 ± 0.09) mg·L⁻¹, and bisphenol A (300 ± 5) mg·L⁻¹. The soly. of each of the hormones was unchanged between pH 4 and pH 7 but was greater at pH 10. At pH 7, the hormones became progressively less sol. as the ionic strength increased from 0.0 to 0.1 mol·L⁻¹. By contrast the soly. of bisphenol A was essentially the same under all of the exptl. conditions tested.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD28XjtFKisbk%253D&md5=e6eddfd0865cc5f1e0a2da81a92873e9
  20.
    CrystalWeb unfortunately withdrawn in 2013. http://cds.dl.ac.uk/cds/datasets/crys/cweb/cweb.html.
  21.
    Bruno, I. J.; Cole, J. C.; Edgington, P. R.; Kessler, M.; Macrae, C. F.; McCabe, P.; Pearson, J.; Taylor, R. New software for searching the Cambridge Structural Database and visualizing crystal structures. Acta Crystallogr., Sect. B: Struct. Sci. 2002, 58 (3 Part 1), 389-397.
    New software for searching the Cambridge Structural Database and visualizing crystal structures
    Bruno, Ian J.; Cole, Jason C.; Edgington, Paul R.; Kessler, Magnus; Macrae, Clare F.; McCabe, Patrick; Pearson, Jonathan; Taylor, Robin
    Acta Crystallographica, Section B: Structural Science (2002), B58 (3, No. 1), 389-397CODEN: ASBSDK; ISSN:0108-7681. (Blackwell Munksgaard)
    Two new programs were developed for searching the Cambridge Structural Database (CSD) and visualizing database entries: ConQuest and Mercury. The former is a new search interface to the CSD, the latter is a high-performance crystal-structure visualizer with extensive facilities for exploring networks of intermol. contacts. Particular emphasis was placed on making the programs as intuitive as possible. Both ConQuest and Mercury run under Windows and various types of Unix, including Linux.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD38XktVOqu78%253D&md5=b8cd5dddcd43067010fef6d60e37b3c2
  22.
    Steinbeck, C.; Han, Y.; Kuhn, S.; Horlacher, O.; Luttmann, E.; Willighagen, E. The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics. J. Chem. Inf. Comput. Sci. 2003, 43 (2), 493-500.
    The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics
    Steinbeck, Christoph; Han, Yongquan; Kuhn, Stefan; Horlacher, Oliver; Luttmann, Edgar; Willighagen, Egon
    Journal of Chemical Information and Computer Sciences (2003), 43 (2), 493-500CODEN: JCISD8; ISSN:0095-2338. (American Chemical Society)
    The Chem. Development Kit (CDK) is a freely available open-source Java library for Structural Chemo- and Bioinformatics. Its architecture and capabilities as well as the development as an open-source project by a team of international collaborators from academic and industrial institutions is described. The CDK provides methods for many common tasks in mol. informatics, including 2D and 3D rendering of chem. structures, I/O routines, SMILES parsing and generation, ring searches, isomorphism checking, structure diagram generation, etc. Application scenarios as well as access information for interested users and potential contributors are given.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD3sXhtVaktbg%253D&md5=afc8fd10783af301c73a8183727230bf
  23.
    Gupta, R. R.; Gifford, E. M.; Liston, T.; Waller, C. L.; Hohman, M.; Bunin, B. A.; Ekins, S. Using Open Source Computational Tools for Predicting Human Metabolic Stability and Additional Absorption, Distribution, Metabolism, Excretion, and Toxicity Properties. Drug Metab. Dispos. 2010, 38 (11), 2083-2090.
    Using open source computational tools for predicting human metabolic stability and additional absorption, distribution, metabolism, excretion, and toxicity properties
    Gupta, Rishi R.; Gifford, Eric M.; Liston, Ted; Waller, Chris L.; Hohman, Moses; Bunin, Barry A.; Ekins, Sean
    Drug Metabolism and Disposition (2010), 38 (11), 2083-2090CODEN: DMDSAI; ISSN:0090-9556. (American Society for Pharmacology and Experimental Therapeutics)
    Ligand-based computational models could be more readily shared between researchers and organizations if they were generated with open source mol. descriptors [e.g., chem. development kit (CDK)] and modeling algorithms, because this would negate the requirement for proprietary com. software. We initially evaluated open source descriptors and model building algorithms using a training set of approx. 50,000 mols. and a test set of approx. 25,000 mols. with human liver microsomal metabolic stability data. A C5.0 decision tree model demonstrated that CDK descriptors together with a set of Smiles Arbitrary Target Specification (SMARTS) keys had good statistics [κ = 0.43, sensitivity = 0.57, specificity = 0.91, and pos. predicted value (PPV) = 0.64], equiv. to those of models built with com. Mol. Operating Environment 2D (MOE2D) and the same set of SMARTS keys (κ = 0.43, sensitivity = 0.58, specificity = 0.91, and PPV = 0.63). Extending the dataset to ∼193,000 mols. and generating a continuous model using Cubist with a combination of CDK and SMARTS keys or MOE2D and SMARTS keys confirmed this observation. When the continuous predictions and actual values were binned to get a categorical score we obsd. a similar κ statistic (0.42). The same combination of descriptor set and modeling method was applied to passive permeability and P-glycoprotein efflux data with similar model testing statistics. In summary, open source tools demonstrated predictive results comparable to those of com. software with attendant cost savings. We discuss the advantages and disadvantages of open source descriptors and the opportunity for their use as a tool for organizations to share data precompetitively, avoiding repetition and assisting drug discovery.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3cXhsVWqurjN&md5=7366b0c99868668e5b95f4e60093814f
  24.
    O’Boyle, N. Towards a Universal SMILES representation - A standard method to generate canonical SMILES based on the InChI. J. Cheminform. 2012, 4 (1), 22.
    Towards a Universal SMILES representation - A standard method to generate canonical SMILES based on the InChI
    Journal of Cheminformatics (2012), 4 (1), 22.
    BACKGROUND: There are two line notations of chemical structures that have established themselves in the field: the SMILES string and the InChI string. The InChI aims to provide a unique, or canonical, identifier for chemical structures, while SMILES strings are widely used for storage and interchange of chemical structures, but no standard exists to generate a canonical SMILES string. RESULTS: I describe how to use the InChI canonicalisation to derive a canonical SMILES string in a straightforward way, either incorporating the InChI normalisations (Inchified SMILES) or not (Universal SMILES). This is the first description of a method to generate canonical SMILES that takes stereochemistry into account. When tested on the 1.1 m compounds in the ChEMBL database, and a 1 m compound subset of the PubChem Substance database, no canonicalisation failures were found with Inchified SMILES. Using Universal SMILES, 99.79% of the ChEMBL database was canonicalised successfully and 99.77% of the PubChem subset. CONCLUSIONS: The InChI canonicalisation algorithm can successfully be used as the basis for a common standard for canonical SMILES. While challenges remain - such as the development of a standard aromatic model for SMILES - the ability to create the same SMILES using different toolkits will mean that for the first time it will be possible to easily compare the chemical models used by different toolkits.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A280%3ADC%252BC38botVKjsw%253D%253D&md5=c9107b5c0392711cee66979cfa7356c5
  25.
    RSC ChemSpider (accessed Feb. 8, 2013).
  26.
    Wolstencroft, K.; Haines, R.; Fellows, D.; Williams, A.; Withers, D.; Owen, S.; Soiland-Reyes, S.; Dunlop, I.; Nenadic, A.; Fisher, P.; Bhagat, J.; Belhajjame, K.; Bacall, F.; Hardisty, A.; Nieva de la Hidalga, A.; Balcazar Vargas, M. P.; Sufi, S.; Goble, C. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Res. 2013, 41 (W1), W557-W561.
  27.
    Little, J. L.; Williams, A. J.; Pshenichnov, A.; Tkachenko, V. Identification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpider. J. Am. Soc. Mass. Spectrom. 2012, 23 (1), 179-185.
    Identification of 'known unknowns' utilizing accurate mass data and ChemSpider
    Little, James L.; Williams, Antony J.; Pshenichnov, Alexey; Tkachenko, Valery
    Journal of the American Society for Mass Spectrometry (2012), 23 (1), 179-185CODEN: JAMSEF; ISSN:1044-0305. (Springer)
    In many cases, an unknown to an investigator is actually known in the chem. literature, a ref. database, or an internet resource. We refer to these types of compds. as 'known unknowns.' ChemSpider is a very valuable internet database of known compds. useful in the identification of these types of compds. in com., environmental, forensic, and natural product samples. The database contains over 26 million entries from hundreds of data sources and is provided as a free resource to the community. Accurate mass mass spectrometry data is used to query the database by either elemental compn. or a monoisotopic mass. Searching by elemental compn. is the preferred approach. However, it is often difficult to det. a unique elemental compn. for compds. with mol. wts. greater than 600 Da. In these cases, searching by the monoisotopic mass is advantageous. In either case, the search results are refined by sorting the no. of refs. assocd. with each compd. in descending order. This raises the most useful candidates to the top of the list for further evaluation. These approaches were shown to be successful in identifying 'known unknowns' noted in our lab. and for compds. of interest to others.
    https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC38XltVaitbw%253D&md5=4fd410551631762b48d179362a82a971
  28.
    Goble, C. A.; Bhagat, J.; Aleksejevs, S.; Cruickshank, D.; Michaelides, D.; Newman, D.; Borkum, M.; Bechhofer, S.; Roos, M.; Li, P.; De Roure, D. myExperiment: A repository and social network for the sharing of bioinformatics workflows. Nucleic Acids Res. 2010, 38 (suppl 2), W677–W682.
  29.
    De Ferrari, L. Workflow Entry: From molecule name to SMILE and InchI using ChemSpider. http://www.myexperiment.org/workflows/3603.html (accessed 10th February 2014).
  30.
    Griseofulvin. http://en.wikipedia.org/wiki/Griseofulvin (accessed 11th December 2012; SMILES source).
  31.
    Glipizide. http://en.wikipedia.org/wiki/Glipizide (accessed 11th December 2012; SMILES source).
  32.
    Stone, A. Distributed Multipole Analysis of Gaussian wavefunctions, GDMA version 2.2.02. http://www-stone.ch.cam.ac.uk/documentation/gdma/manual.pdf (accessed Feb. 10, 2014).
  33.
    Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Scalmani, G.; Barone, V.; Mennucci, B.; Petersson, G. A.; Nakatsuji, H.; Caricato, M.; Li, X.; Hratchian, H. P.; Izmaylov, A. F.; Bloino, J.; Zheng, G.; Sonnenberg, J. L.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Vreven, T.; Montgomery, J. A., Jr.; Peralta, J. E.; Ogliaro, F.; Bearpark, M.; Heyd, J. J.; Brothers, E.; Kudin, K. N.; Staroverov, V. N.; Kobayashi, R.; Normand, J.; Raghavachari, K.; Rendell, A.; Burant, J. C.; Iyengar, S. S.; Tomasi, J.; Cossi, M.; Rega, N.; Millam, N. J.; Klene, M.; Knox, J. E.; Cross, J. B.; Bakken, V.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R. E.; Yazyev, O.; Austin, A. J.; Cammi, R.; Pomelli, C.; Ochterski, J. W.; Martin, R. L.; Morokuma, K.; Zakrzewski, V. G.; Voth, G. A.; Salvador, P.; Dannenberg, J. J.; Dapprich, S.; Daniels, A. D.; Farkas, Ö.; Foresman, J. B.; Ortiz, J. V.; Cioslowski, J.; Fox, D. J. Gaussian 09; Gaussian, Inc.: Wallingford, CT, 2009.
  34.
    Stone, A. J. Distributed multipole analysis, or how to describe a molecular charge distribution. Chem. Phys. Lett. 1981, 83 (2), 233–239.
  35.
    Buckingham, R. The classical equation of state of gaseous helium, neon and argon. Proc. R. Soc. London, Ser. A 1938, 168 (933), 264–283.
  36.
    Gavezzotti, A.; Filippini, G. Theoretical Aspects and Computer Modeling; Gavezzotti, A., Ed.; Wiley and Sons: Chichester, 1997; pp 61–97.
  37.
    Marenich, A. V.; Cramer, C. J.; Truhlar, D. G. Universal Solvation Model Based on Solute Electron Density and on a Continuum Model of the Solvent Defined by the Bulk Dielectric Constant and Atomic Surface Tensions. J. Phys. Chem. B 2009, 113 (18), 6378–6396.
  38.
    (a) Ben-Naim, A. Standard thermodynamics of transfer. Uses and misuses. J. Phys. Chem. 1978, 82 (7), 792–803.
    (b) Ben-Naim, A.; Marcus, Y. Solvation thermodynamics of nonionic solutes. J. Chem. Phys. 1984, 81 (4), 2016–2027.
  39.
    Howley, T.; Madden, M. G.; O’Connell, M.-L.; Ryder, A. G. The effect of principal component analysis on machine learning accuracy with high-dimensional spectral data. Knowl.-Based Syst. 2006, 19 (5), 363–370.
  40.
    Wold, H. Partial Least Squares (PLS) Regression. 2003, 17
  41.
    (a) Abdi, H. Partial Least Squares (PLS) Regression. 2003, 17
    (b) Wold, S.; Sjöström, M.; Eriksson, L. PLS-regression: A basic tool of chemometrics. Chemometr. Intell. Lab. 2001, 58 (2), 109–130.
    (c) Mevik, B.; Wehrens, R. The pls Package: Principal Component and Partial Least Squares Regression in R. J. Stat. Softw. 2007, 18 (2), 1–24.
  42.
    (a) Palmer, D. S.; O’Boyle, N. M.; Glen, R. C.; Mitchell, J. B. O. Random Forest Models To Predict Aqueous Solubility. J. Chem. Inf. Model. 2006, 47 (1), 150–158.
    (b) Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J. C.; Sheridan, R. P.; Feuston, B. P. Random Forest: A Classification and Regression Tool for Compound Classification and QSAR Modeling. J. Chem. Inf. Comput. Sci. 2003, 43 (6), 1947–1958.
  43.
    Breiman, L. Random Forests. Mach. Learning 2001, 45 (1), 5–32.
  44.
    (a) Cao, D.-S.; Xu, Q.-S.; Liang, Y.-Z.; Chen, X.; Li, H.-D. Prediction of aqueous solubility of druglike organic compounds using partial least squares, back-propagation network and support vector machine. J. Chemometr. 2010, 24 (9), 584–595.
    (b) Vapnik, V. N. An overview of statistical learning theory. IEEE Trans. Neural Netw. 1999, 10 (5), 988–999.
  45.
    Hu, S. In R2 Vs. r2, SCEA/ISPA Conference, 2008; pp 1–15.
  46.
    Menke, J.; Martinez, T. R. In Using permutations instead of Student’s t distribution for p-values in paired-difference algorithm comparisons, IEEE IJCNN, July 25–29, 2004; 2004; Vol. 2, pp 1331–1335.
  47.
    Nath, N.; Mitchell, J. B. O. Is EC class predictable from reaction mechanism? BMC Bioinformatics 2012, 13 (1), 60.
  48.
    Kuhn, M. Variable Importance Using The caret Package, 2012. Available via the Internet at http://cran.open-source-solution.org/web/packages/caret/vignettes/caretVarImp.pdf (accessed Feb. 10, 2014).
  49.
    Kuhn, M. Variable Importance Using The caret Package. 2010, 17
  50.
    Varma, S.; Simon, R. Bias in error estimation when using cross-validation for model selection. BMC Bioinform. 2006, 7 (1), 91.
  51.
    Simon, R. M.; Subramanian, J.; Li, M.-C.; Menezes, S. Using cross-validation to evaluate predictive accuracy of survival risk classifiers based on high-dimensional data. Brief. Bioinform. 2011, 12 (3), 203–214.
  52.
    R Development Core Team. R: A language and environment for statistical computing; R Foundation for Statistical Computing: Vienna, Austria, 2011.
  53.
    (a) Kuhn, M. Building Predictive Models in R Using the caret Package. J. Stat. Software 2008, 28, 1–26.
    (b) Kuhn, M.; Wing, J.; Weston, S.; Williams, A.; Keefer, C.; Engelhardt, A.; Cooper, T.; Mayer, Z.; R Core Team. caret: Classification and Regression Training. R Package “caret”. http://CRAN.R-project.org/package=caret.
  54.
    Walters, W. P. Modeling, Informatics, and the Quest for Reproducibility. J. Chem. Inf. Model. 2013, 53 (7), 1529–1530.
  55.
    (a) Dearden, J. C. In silico prediction of aqueous solubility. Expert Opin. Drug Discovery 2006, 1 (1), 31–52.
    (b) Jorgensen, W. L.; Duffy, E. M. Prediction of drug solubility from structure. Adv. Drug Delivery Rev. 2002, 54 (3), 355–366.
  56.
    Lusci, A.; Pollastri, G.; Baldi, P. Deep Architectures and Deep Learning in Chemoinformatics: The Prediction of Aqueous Solubility for Drug-Like Molecules. J. Chem. Inf. Model. 2013, 53, 1563–1575.
    [ACS Full Text ], [CAS], Google Scholar
    56
    Deep Architectures and Deep Learning in Chemoinformatics: The Prediction of Aqueous Solubility for Drug-Like Molecules
    Lusci, Alessandro; Pollastri, Gianluca; Baldi, Pierre
    Journal of Chemical Information and Modeling (2013), 53 (7), 1563-1575CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)
    A review. Shallow machine learning methods have been applied to chemoinformatics problems with some success. As more data becomes available and more complex problems are tackled, deep machine learning methods may also become useful. Here, we present a brief overview of deep learning methods and show in particular how recursive neural network approaches can be applied to the problem of predicting mol. properties. However, mols. are typically described by undirected cyclic graphs, while recursive approaches typically use directed acyclic graphs. Thus, we develop methods to address this discrepancy, essentially by considering an ensemble of recursive neural networks assocd. with all possible vertex-centered acyclic orientations of the mol. graph. One advantage of this approach is that it relies only minimally on the identification of suitable mol. descriptors because suitable representations are learned automatically from the data. Several variants of this approach are applied to the problem of predicting aq. soly. and tested on four benchmark data sets. Exptl. results show that the performance of the deep learning methods matches or exceeds the performance of other state-of-the-art methods according to several evaluation metrics and expose the fundamental limitations arising from training sets that are too small or too noisy. A Web-based predictor, AquaSol, is available online through the ChemDB portal (cdb.ics.uci.edu) together with addnl. material.
  57.
    Wang, R.; Gao, Y.; Lai, L. Calculating partition coefficient by atom-additive method. Perspect. Drug Discovery Des. 2000, 19 (1), 47–66.
    Calculating partition coefficient by atom-additive method
    Perspectives in Drug Discovery and Design (2000), 19 (Hydrophobicity and Solvation in Drug Design, Pt. 3), 47-66CODEN: PDDDEC; ISSN:0928-2866. (Kluwer Academic Publishers)
    A new atom-additive method is presented for calcg. octanol/H2O partition coeff. (log P) of org. compds. The method, XLOGP v2.0, gives log P values by summing the contributions of component atoms and correction factors. Altogether 90 atom types are used to classify C, N, O, S, P and halogen atoms, and 10 correction factors are used for some special substructures. The contributions of each atom type and correction factor are derived by multivariate regression anal. of 1853 org. compds. with known exptl. log P values. The correlation coeff. (r) for fitting the whole set is 0.973 and the std. deviation (s) is 0.349 log units. Comparison of various log P calcn. procedures demonstrates that the method gives much better results than other atom-additive approaches and is at least comparable to fragmental approaches. Because of the simple methodol., the missing fragment problem does not occur in the method.
  58.
    Kier, L. B.; Hall, L. H. Molecular Connectivity in Chemistry and Drug Research; Academic Press: New York, 1976.
  59.
    Moreau, G.; Broto, P. The autocorrelation of a topological structure: A new molecular descriptor. New J. Chem. 1980, 359–360.
  60.
    Randic, M. On molecular identification numbers. J. Chem. Inf. Comput. Sci. 1984, 24 (3), 164–175.
    Randic, Milan
    Journal of Chemical Information and Computer Sciences (1984), 24 (3), 164-75CODEN: JCISD8; ISSN:0095-2338.
    The assignment of identification nos. to mols. that are easy to deriv. and have structural significance is discussed and a scheme for assignment is outlined. Output of the ALL-PATH program for study of mol. topol. from graphs with multiple connections is presented which includes weighing factors for individual bonds. Uniqueness and structural significance of the identification nos. are examd. and mol. graphs and identification nos. of some ring compds., terpenes, and some other compds. are presented.
  61.
    CDK Descriptor Summary (2011-05-28). http://pele.farmbio.uu.se/nightly-1.2.x/dnames.html, accessed Feb. 10, 2014.
  62.
    Hewitt, M.; Cronin, M. T. D.; Enoch, S. J.; Madden, J. C.; Roberts, D. W.; Dearden, J. C. In Silico Prediction of Aqueous Solubility: The Solubility Challenge. J. Chem. Inf. Model. 2009, 49 (11), 2572–2587.
    In Silico Prediction of Aqueous Solubility: The Solubility Challenge
    Hewitt, M.; Cronin, M. T. D.; Enoch, S. J.; Madden, J. C.; Roberts, D. W.; Dearden, J. C.
    Journal of Chemical Information and Modeling (2009), 49 (11), 2572-2587CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)
    The dissoln. of a chem. into water is a process fundamental to both chem. and biol. The persistence of a chem. within the environment and the effects of a chem. within the body are dependent primarily upon aq. soly. With the well-documented limitations hindering the accurate exptl. detn. of aq. soly., the utilization of predictive methods have been widely investigated and employed. The setting of a soly. challenge by this journal proved an excellent opportunity to explore several different modeling methods, utilizing a supplied dataset of high-quality aq. soly. measurements. Four contrasting approaches (simple linear regression, artificial neural networks, category formation, and available in silico models) were utilized within our lab. and the quality of these predictions was assessed. These were chosen to span the multitude of modeling methods now in use, while also allowing for the evaluation of existing com. soly. models. The conclusions of this study were surprising, in that a simple linear regression approach proved to be superior over more-complex modeling methods. Possible explanations for this observation are discussed and also recommendations are made for future soly. prediction.
  63.
    Tsvetkova, B.; Pencheva, I.; Zlatkov, A.; Peikov, P. High Performance Liquid Chromatographic Assay of Indomethacin and its Related Substances in Tablet Dosage Forms. Int. J. Pharm. Pharm. Sci. 2012, 4 (Supplement 3), 549–552.
    High performance liquid chromatographic assay of indomethacin and its related substances in tablet dosage forms
    Tsvetkova, Boyka; Pencheva, Ivanka; Zlatkov, Alexander; Peikov, Plamen
    International Journal of Pharmacy and Pharmaceutical Sciences (2012), 4 (Suppl. 3), 549-552CODEN: IJPPKB; ISSN:0975-1491. (International Journal of Pharmacy and Pharmaceutical Sciences)
    A reversed-phase high performance liq. chromatog. (RP-HPLC) method with UV detection was proposed for sepn. of indomethacin and its impurities from tablet dosage forms. The best sepn. was achieved on a LiChrosorb C18, 250 mm × 4.6 mm, 5 μm column at a detector wavelength of 240 nm. The utilization of mixt. of 40 vols. 0.5% vol./vol. orthophosphoric acid, 20 vols. of methanol and 40 vols. of acetonitrile as mobile phase with a flow rate of 2 mL/min enabled acceptable resoln. of indomethacin, in large excess, from possible impurities, in a short elution time (9 min). Anal. parameters linearity, accuracy, precision and specificity were detd. by validation procedure and found to be satisfactory. Overall, the proposed method was found to be simple, rapid, precise and accurate for quality control of indomethacin and its impurities in dosage forms and in raw materials. In this work the kinetic investigation of the alk. hydrolysis of indomethacin was also carried out. The degrdn. reaction was monitored by means of HPLC method developed and was found to follow first-order kinetics. The rate const. and half-life of the hydrolytic decompn. were estd.

This article is cited by 27 publications.

  1. Xin Yang, Yifei Wang, Ryan Byrne, Gisbert Schneider, Shengyong Yang. Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery. Chemical Reviews 2019, Article ASAP.
  2. Dipankar Roy, Andriy Kovalenko. Performance of 3D-RISM-KH in Predicting Hydration Free Energy: Effect of Solute Parameters. The Journal of Physical Chemistry A 2019, 123 (18), 4087-4093. DOI: 10.1021/acs.jpca.9b01623.
  3. James L. McDonagh, Arnaldo F. Silva, Mark A. Vincent, Paul L. A. Popelier. Machine Learning of Dynamic Electron Correlation Energies from Topological Atoms. Journal of Chemical Theory and Computation 2018, 14 (1), 216-224. DOI: 10.1021/acs.jctc.7b01157.
  4. Hannes K. Buchholz, Rebecca K. Hylton, Jan Gerit Brandenburg, Andreas Seidel-Morgenstern, Heike Lorenz, Matthias Stein, and Sarah L. Price. Thermochemistry of Racemic and Enantiopure Organic Crystals for Predicting Enantiomer Separation. Crystal Growth & Design 2017, 17 (9), 4676-4686. DOI: 10.1021/acs.cgd.7b00582.
  5. Sereina Riniker. Molecular Dynamics Fingerprints (MDFP): Machine Learning from MD Data To Predict Free-Energy Differences. Journal of Chemical Information and Modeling 2017, 57 (4), 726-741. DOI: 10.1021/acs.jcim.6b00778.
  6. Sungjin Kim, Adrián Jinich, and Alán Aspuru-Guzik. MultiDK: A Multiple Descriptor Multiple Kernel Approach for Molecular Discovery and Its Application to Organic Flow Battery Electrolytes. Journal of Chemical Information and Modeling 2017, 57 (4), 657-668. DOI: 10.1021/acs.jcim.6b00332.
  7. James L. McDonagh, David S. Palmer, Tanja van Mourik, and John B. O. Mitchell. Are the Sublimation Thermodynamics of Organic Molecules Predictable? Journal of Chemical Information and Modeling 2016, 56 (11), 2162-2179. DOI: 10.1021/acs.jcim.6b00033.
  8. Yuriy A. Abramov. Major Source of Error in QSPR Prediction of Intrinsic Thermodynamic Solubility of Drugs: Solid vs Nonsolid State Contributions? Molecular Pharmaceutics 2015, 12 (6), 2126-2141. DOI: 10.1021/acs.molpharmaceut.5b00119.
  9. Richard L. Marchese Robinson, Kevin J. Roberts, Elaine B. Martin. The influence of solid state information and descriptor selection on statistical models of temperature dependent aqueous solubility. Journal of Cheminformatics 2018, 10 (1). DOI: 10.1186/s13321-018-0298-3.
  10. Christiaan Jardinez, José L Medina-Franco. QSAR Modeling Using Quantum Chemical Descriptors of Benzimidazole Analogues With Antiparasitic Properties. International Journal of Quantitative Structure-Property Relationships 2018, 3 (2), 61-79. DOI: 10.4018/IJQSPR.2018070105.
  11. Yanqing Zhu, Jiao Chen, Min Zheng, Gaoquan Chen, Ali Farajtabar, Hongkun Zhao. Equilibrium solubility and preferential solvation of 1,1′-sulfonylbis(4-aminobenzene) in binary aqueous solutions of n-propanol, isopropanol and 1,4-dioxane. The Journal of Chemical Thermodynamics 2018, 122, 102-112. DOI: 10.1016/j.jct.2018.03.010.
  12. Christel A.S. Bergström, Per Larsson. Computational prediction of drug solubility in water-based systems: Qualitative and quantitative approaches used in the current drug discovery and development setting. International Journal of Pharmaceutics 2018, 540 (1-2), 185-193. DOI: 10.1016/j.ijpharm.2018.01.044.
  13. Samuel Boobier, Anne Osbourn, John B. O. Mitchell. Can human experts predict solubility better than computers? Journal of Cheminformatics 2017, 9 (1). DOI: 10.1186/s13321-017-0250-y.
  14. Gisbert Schneider, Kimito Funatsu, Yasushi Okuno, Dave Winkler. De novo Drug Design - Ye olde Scoring Problem Revisited. Molecular Informatics 2017, 36 (1-2), 1681031. DOI: 10.1002/minf.201681031.
  15. V. Sathyanarayanamoorthi, S. Suganthi, V. Kannappan, R. Kumar. Solubility study of cefpodoxime acid antibiotic in terms of free energy of solution - Insights from polarizable continuum model (PCM) analysis. Journal of Molecular Liquids 2016, 224, 657-661. DOI: 10.1016/j.molliq.2016.10.019.
  16. Donghai Yu, Ruobing Du, Suhui Zhang, Renjie Lu, Huaying An, Ji-Chang Xiao. Prediction of Solubility Properties from Transfer Energies for Acidic Phosphorus-Containing Rare-Earth Extractants Using Implicit Solvation Model. Solvent Extraction and Ion Exchange 2016, 34 (4), 347-354. DOI: 10.1080/07366299.2016.1156420.
  17. Ayesha Zafar, Jóhannes Reynisson. Hydration Free Energy as a Molecular Descriptor in Drug Design: A Feasibility Study. Molecular Informatics 2016, 35 (5), 207-214. DOI: 10.1002/minf.201501035.
  18. David S. Palmer, Maxim V. Fedorov. Molecular Simulation Methods to Compute Intrinsic Aqueous Solubility of Crystalline Drug-Like Molecules. 2016, 263-286. DOI: 10.1002/9781118700686.ch11.
  19. Edward O. Pyzer-Knapp, Gregor N. Simm, Alán Aspuru Guzik. A Bayesian approach to calibrating high-throughput virtual screening results and application to organic photovoltaic materials. Materials Horizons 2016, 3 (3), 226-233. DOI: 10.1039/C5MH00282F.
  20. Jia Fu, Jianzhong Wu. Toward high-throughput predictions of the hydration free energies of small organic molecules from first principles. Fluid Phase Equilibria 2016, 407, 304-313. DOI: 10.1016/j.fluid.2015.05.042.
  21. Shahram Emami, Abolghasem Jouyban, Hadi Valizadeh, Ali Shayanfar. Are Crystallinity Parameters Critical for Drug Solubility Prediction? Journal of Solution Chemistry 2015, 44 (12), 2297-2315. DOI: 10.1007/s10953-015-0410-5.
  22. J. L. McDonagh, T. van Mourik, J. B. O. Mitchell. Predicting Melting Points of Organic Molecules: Applications to Aqueous Solubility Prediction Using the General Solubility Equation. Molecular Informatics 2015, 34 (11-12), 715-724. DOI: 10.1002/minf.201500052.
  23. Edward O. Pyzer-Knapp, Kewei Li, Alan Aspuru-Guzik. Learning from the Harvard Clean Energy Project: The Use of Neural Networks to Accelerate Materials Discovery. Advanced Functional Materials 2015, 25 (41), 6495-6502. DOI: 10.1002/adfm.201501919.
  24. William Kew, John B. O. Mitchell. Greedy and Linear Ensembles of Machine Learning Methods Outperform Single Approaches for QSPR Regression Problems. Molecular Informatics 2015, 34 (9), 634-647. DOI: 10.1002/minf.201400122.
  25. Oleg A. Raevsky, Daniel E. Polianczyk, Veniamin Yu. Grigorev, Olga E. Raevskaja, John C. Dearden. In silico Prediction of Aqueous Solubility: a Comparative Study of Local and Global Predictive Models. Molecular Informatics 2015, 34 (6-7), 417-430. DOI: 10.1002/minf.201400144.
  26. Robert Docherty, Klimentina Pencheva, Yuriy A. Abramov. Low solubility in drug development: de-convoluting the relative importance of solvation and crystal packing. Journal of Pharmacy and Pharmacology 2015, 67 (6), 847-856. DOI: 10.1111/jphp.12393.
  27. R. E. Skyner, J. L. McDonagh, C. R. Groom, T. van Mourik, J. B. O. Mitchell. A review of methods for the calculation of solution free energies and the modelling of systems in solution. Physical Chemistry Chemical Physics 2015, 17 (9), 6174-6191. DOI: 10.1039/C5CP00288E.

    This article references 63 other publications.

    1.
      Palmer, D. S.; McDonagh, J. L.; Mitchell, J. B. O.; van Mourik, T.; Fedorov, M. V. First-Principles Calculation of the Intrinsic Aqueous Solubility of Crystalline Druglike Molecules. J. Chem. Theory Comput. 2012, 8, 3322–3337.
      First-Principles Calculation of the Intrinsic Aqueous Solubility of Crystalline Druglike Molecules
      Palmer, David S.; McDonagh, James L.; Mitchell, John B. O.; van Mourik, Tanja; Fedorov, Maxim V.
      Journal of Chemical Theory and Computation (2012), 8 (9), 3322-3337CODEN: JCTCCE; ISSN:1549-9618. (American Chemical Society)
      We demonstrate that the intrinsic aq. soly. of cryst. druglike mols. can be estd. with reasonable accuracy from sublimation free energies calcd. using crystal lattice simulations and hydration free energies calcd. using the 3D Ref. Interaction Site Model (3D-RISM) of the Integral Equation Theory of Mol. Liqs. (IET). The solubilities of 25 cryst. druglike mols. taken from different chem. classes are predicted by the model with a correlation coeff. of R = 0.85 and a root mean square error (RMSE) equal to 1.45 log10S units, which is significantly more accurate than results obtained using implicit continuum solvent models. The method is not directly parametrized against exptl. soly. data, and it offers a full computational characterization of the thermodn. of transfer of the drug mol. from crystal phase to gas phase to dil. aq. soln.
    2.
      Palmer, D. S.; Llinas, A.; Morao, I.; Day, G. M.; Goodman, J. M.; Glen, R. C.; Mitchell, J. B. O. Predicting intrinsic aqueous solubility by a thermodynamic cycle. Mol. Pharm. 2008, 5 (2), 266–279.
      Predicting Intrinsic Aqueous Solubility by a Thermodynamic Cycle
      Palmer, David S.; Llinas, Antonio; Morao, Inaki; Day, Graeme M.; Goodman, Jonathan M.; Glen, Robert C.; Mitchell, John B. O.
      Molecular Pharmaceutics (2008), 5 (2), 266-279CODEN: MPOHBP; ISSN:1543-8384. (American Chemical Society)
      The authors report methods to predict the intrinsic aq. soly. of cryst. org. mols. from two different thermodn. cycles. Direct computation of soly., via ab initio calcn. of thermodn. quantities at an affordable level of theory, cannot deliver the required accuracy. Therefore, the authors have turned to a mixt. of direct computation and informatics, using the calcd. thermodn. properties, along with a few other key descriptors, in regression models. The prediction of log intrinsic soly. (referred to mol/L) by a three-variable linear regression equation gave r2 = 0.77 and RMSE = 0.71 for an external test set comprising drug mols. The model includes a calcd. crystal lattice energy which provides a computational method to account for the interactions in the solid state. Probably it is not necessary to know the polymorphic form prior to prediction. Also, the method developed here may be applicable to other solid-state systems such as salts or cocrystals.
    3.
      Mitchell, J. B. O. Informatics, machine learning and computational medicinal chemistry. Future Med. Chem. 2011, 3 (4), 451–467.
      Informatics, machine learning and computational medicinal chemistry
      Future Medicinal Chemistry (2011), 3 (4), 451-467CODEN: FMCUA7; ISSN:1756-8919. (Future Science Ltd.)
      A review. This article reviews the use of informatics and computational chem. methods in medicinal chem., with special consideration of how computational techniques can be adapted and extended to obtain more and higher-quality information. Special consideration is given to the computation of protein--ligand binding affinities, to the prediction of off-target bioactivities, bioactivity spectra and computational toxicol., and also to calcg. absorption-, distribution-, metab.- and excretion-relevant properties, such as soly.
    4.
      Hughes, L. D.; Palmer, D. S.; Nigsch, F.; Mitchell, J. B. O. Why Are Some Properties More Difficult To Predict than Others? A Study of QSPR Models of Solubility, Melting Point, and Log P. J. Chem. Inf. Model. 2008, 48 (1), 220–232.
      Why Are Some Properties More Difficult To Predict than Others? A Study of QSPR Models of Solubility, Melting Point, and Log P
      Hughes, Laura D.; Palmer, David S.; Nigsch, Florian; Mitchell, John B. O.
      Journal of Chemical Information and Modeling (2008), 48 (1), 220-232CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)
      This paper attempts to elucidate differences in QSPR models of aq. soly. (Log S), m.p. (Tm), and octanol-water partition coeff. (Log P), three properties of pharmaceutical interest. For all three properties, Support Vector Machine models using 2D and 3D descriptors calcd. in the Mol. Operating Environment were the best models. Octanol-water partition coeff. was the easiest property to predict, as indicated by the RMSE of the external test set and the coeff. of detn. (RMSE = 0.73, r2 = 0.87). M.p. prediction, on the other hand, was the most difficult (RMSE = 52.8 °C, r2 = 0.46), and Log S statistics were intermediate between m.p. and Log P prediction (RMSE = 0.900, r2 = 0.79). The data imply that for all three properties the lack of measured values at the extremes is a significant source of error. This source, however, does not entirely explain the poor m.p. prediction, and we suggest that deficiencies in descriptors used in m.p. prediction contribute significantly to the prediction errors.
    5.
      (a) Tetko, I. V. Computing chemistry on the web. Drug Discovery Today 2005, 10 (22), 1497–1500.
      Tetko Igor V
      Drug discovery today (2005), 10 (22), 1497-500 ISSN:1359-6446.
      The development of on-line software tools is changing the way we traditionally perform our analysis in drug design, but will chemoinformatics be forever behind bioinformatics in this development?
      (b) Tetko, I. V.; Gasteiger, J.; Todeschini, R.; Mauri, A.; Livingstone, D.; Ertl, P.; Palyulin, V. A.; Radchenko, E. V.; Zefirov, N. S.; Makarenko, A. S.; Tanchuk, V. Y.; Prokopenko, V. V. Virtual computational chemistry laboratory - Design and description. J. Comput. Aid. Mol. Des. 2005, 19, 453–463.
      Virtual computational chemistry laboratory - design and description
      Tetko, Igor V.; Gasteiger, Johann; Todeschini, Roberto; Mauri, Andrea; Livingstone, David; Ertl, Peter; Palyulin, Vladimir A.; Radchenko, Eugene V.; Zefirov, Nikolay S.; Makarenko, Alexander S.; Tanchuk, Vsevolod Yu.; Prokopenko, Volodymyr V.
      Journal of Computer-Aided Molecular Design (2005), 19 (6), 453-463CODEN: JCADEQ; ISSN:0920-654X. (Springer)
      Internet technol. offers an excellent opportunity for the development of tools by the cooperative effort of various groups and institutions. We have developed a multi-platform software system, Virtual Computational Chem. Lab., http://www.vcclab.org, allowing the computational chemist to perform a comprehensive series of mol. indexes/properties calcns. and data anal. The implemented software is based on a three-tier architecture that is one of the std. technologies to provide client-server services on the Internet. The developed software includes several popular programs, including the indexes generation program, DRAGON, a 3D structure generator, CORINA, a program to predict lipophilicity and aq. soly. of chems., ALOGPS and others. All these programs are running at the host institutes located in five countries over Europe. In this article we review the main features and statistics of the developed system that can be used as a prototype for academic and industry models.
    6.
      Price, S. L.; Leslie, M.; Welch, G. W. A.; Habgood, M.; Price, L. S.; Karamertzanis, P. G.; Day, G. M. Modelling organic crystal structures using distributed multipole and polarizability-based model intermolecular potentials. Phys. Chem. Chem. Phys. 2010, 12 (30), 8478–8490.
      Modelling organic crystal structures using distributed multipole and polarizability-based model intermolecular potentials
      Price, Sarah L.; Leslie, Maurice; Welch, Gareth W. A.; Habgood, Matthew; Price, Louise S.; Karamertzanis, Panagiotis G.; Day, Graeme M.
      Physical Chemistry Chemical Physics (2010), 12 (30), 8478-8490CODEN: PPCPFQ; ISSN:1463-9076. (Royal Society of Chemistry)
      Crystal structure prediction for org. mols. requires both the fast assessment of thousands to millions of crystal structures and the greatest possible accuracy in their relative energies. We describe a crystal lattice simulation program, DMACRYS, emphasizing the features that make it suitable for use in crystal structure prediction for pharmaceutical mols. using accurate anisotropic atom-atom model intermol. potentials based on the theory of intermol. forces. DMACRYS can optimize the lattice energy of a crystal, calc. the second deriv. properties, and reduce the symmetry of the space group to move away from a transition state. The calcd. terahertz frequency k = 0 rigid-body lattice modes and elastic tensor can be used to est. free energies. The program uses a distributed multipole electrostatic model (Qat, t = 00,..,44s) for the electrostatic fields, and can use anisotropic atom-atom repulsion models, damped isotropic dispersion up to R-10, as well as a range of empirically fitted isotropic exp-6 atom-atom models with different definitions of at. types. A new feature is that an accurate model for the induction energy contribution to the lattice energy has been implemented that uses at. anisotropic dipole polarizability models (αat, t = (10,10)..(11c,11s)) to evaluate the changes in the mol. charge d. induced by the electrostatic field within the crystal. It is demonstrated, using the four polymorphs of the pharmaceutical carbamazepine C15H12N2O, that while reproducing crystal structures is relatively easy, calcg. the polymorphic energy differences to the accuracy of a few kJ mol-1 required for applications is very demanding of assumptions made in the modeling. Thus DMACRYS enables the comparison of both known and hypothetical crystal structures as an aid to the development of pharmaceuticals and other specialty org. materials, and provides a tool to develop the modeling of the intermol. forces involved in mol. recognition processes.
    7.
      Segall, M. D.; Lindan, P. J. D.; Probert, M. J.; Pickard, C. J.; Hasnip, P. J.; Clark, S. J.; Payne, M. C. First-principles simulation: Ideas, illustrations and the CASTEP code. J. Phys. Condens. Matter 2002, 14 (11), 2717–2744.
      First-principles simulation: ideas, illustrations and the CASTEP code
      Segall, M. D.; Lindan, Philip J. D.; Probert, M. J.; Pickard, C. J.; Hasnip, P. J.; Clark, S. J.; Payne, M. C.
      Journal of Physics: Condensed Matter (2002), 14 (11), 2717-2744CODEN: JCOMEL; ISSN:0953-8984. (Institute of Physics Publishing)
      A review. First-principles simulation, meaning d.-functional theory calcns. with plane waves and pseudopotentials, has become a prized technique in condensed-matter theory. Here I look at the basics of the subject, give a brief review of the theory, examg. the strengths and weaknesses of its implementation, and illustrating some of the ways simulators approach problems through a small case study. I also discuss why and how modern software design methods have been used in writing a completely new modular version of the CASTEP code.
    8.
      Dovesi, R.; Orlando, R.; Civalleri, B.; Roetti, C.; Saunders, V. R.; Zicovich-Wilson, C. M. CRYSTAL: a computational tool for the ab initio study of the electronic properties of crystals. Z. Kristallogr. 2005, 220 (5–6), 571–573.
      CRYSTAL: A computational tool for the ab initio study of the electronic properties of crystals
      Dovesi, Roberto; Orlando, Roberto; Civalleri, Bartolomeo; Roetti, Carla; Saunders, Victor R.; Zicovich-Wilson, Claudio M.
      Zeitschrift fuer Kristallographie (2005), 220 (5-6), 571-573CODEN: ZEKRDZ; ISSN:0044-2968. (Oldenbourg Wissenschaftsverlag GmbH)
      CRYSTAL computes the electronic structure and properties of periodic systems (crystals, surfaces, polymers) within Hartree-Fock, D. Functional and various hybrid approxns. CRYSTAL was developed during nearly 30 years (since 1976) by researchers of the Theor. Chem. Group in Torino (Italy), and the Computational Materials Science group in CLRC (Daresbury, UK), with important contributions from visiting researchers, as documented by the main authors list and the bibliog. The basic features of the program CRYSTAL are presented, with two examples of application in the field of crystallog.
    9.
      (a) Luder, K.; Lindfors, L.; Westergren, J.; Nordholm, S.; Kjellander, R. In silico prediction of drug solubility: 2. Free energy of solvation in pure melts. J. Phys. Chem. B 2007, 111 (7), 1883–1892.
      In silico prediction of drug solubility: 2. Free energy of solvation in pure melts
      Luder Kai; Lindfors Lennart; Westergren Jan; Nordholm Sture; Kjellander Roland
      The journal of physical chemistry. B (2007), 111 (7), 1883-92 ISSN:1520-6106.
      The solubility of drugs in water is investigated in a series of papers and in the current work. The free energy of solvation, DeltaG*(vl), of a drug molecule in its pure drug melt at 673.15 K (400 degrees C) has been obtained for 46 drug molecules using the free energy perturbation method. The simulations were performed in two steps where first the Coulomb and then the Lennard-Jones interactions were scaled down from full to no interaction. The results have been interpreted using a theory assuming that DeltaG*(vl) = DeltaG(cav) + E(LJ) + E(C)/2 where the free energy of cavity formation, DeltaG(cav), in these pure drug systems was obtained using hard body theories, and E(LJ) and E(C) are the Lennard-Jones and Coulomb interaction energies, respectively, of one molecule with the other ones. Since the main parameter in hard body theories is the volume fraction, an equation of state approach was used to estimate the molecular volume. Promising results were obtained using a theory for hard oblates, in which the oblate axial ratio was calculated from the molecular surface area and volume obtained from simulations. The Coulomb term, E(C)/2, is half of the Coulomb energy in accord with linear response, which showed good agreement with our simulation results. In comparison with our previous results on free energy of hydration, the Coulomb interactions in pure drug systems are weaker, and the van der Waals interactions play a more important role.
      (b) Luder, K.; Lindfors, L.; Westergren, J.; Nordholm, S.; Kjellander, R. In silico prediction of drug solubility. 3. Free energy of solvation in pure amorphous matter. J. Phys. Chem. B 2007, 111 (25), 7303–7311.
      In silico prediction of drug solubility. 3. Free energy of solvation in pure amorphous matter
      Luder Kai; Lindfors Lennart; Westergren Jan; Nordholm Sture; Kjellander Roland
      The journal of physical chemistry. B (2007), 111 (25), 7303-11 ISSN:1520-6106.
      The solubility of drugs in water is investigated in a series of papers. In this work, we address the process of bringing a drug molecule from the vapor into a pure drug amorphous phase. This step enables us to actually calculate the solubility of amorphous drugs in water. In our general approach, we, on one hand, perform rigorous free energy simulations using a combination of the free energy perturbation and thermodynamic integration methods. On the other hand, we develop an approximate theory containing parameters that are easily accessible from conventional Monte Carlo simulations, thereby reducing the computation time significantly. In the theory for solvation, we assume that DeltaG* = DeltaGcav + ELJ + EC/2, where the free energy of cavity formation, DeltaGcav, in pure drug systems is obtained using a theory for hard-oblate spheroids, and ELJ and EC are the Lennard-Jones and Coulomb interaction energies between the chosen molecule and the others in the fluid. The theoretical predictions for the free energy of solvation in pure amorphous matter are in good agreement with free energy simulation data for 46 different drug molecules. These results together with our previous studies support our theoretical approach. By using our previous data for the free energy of hydration, we compute the total free energy change of bringing a molecule from the amorphous phase into water. We obtain good agreement between the theory and simulations. It should be noted that to obtain accurate results for the total process, high precision data are needed for the individual subprocesses. Finally, for eight different substances, we compare the experimental amorphous and crystalline solubility in water with the results obtained by the proposed theory with reasonable success.
      (c) Luder, K.; Lindfors, L.; Westergren, J.; Nordholm, S.; Persson, R.; Pedersen, M. In Silico Prediction of Drug Solubility: 4. Will Simple Potentials Suffice? J. Comput. Chem. 2009, 30 (12), 1859-1871.
      In silico prediction of drug solubility: 4. Will simple potentials suffice?
      Luder Kai; Lindfors Lennart; Westergren Jan; Nordholm Sture; Persson Rasmus; Pedersen Mikaela
      Journal of computational chemistry (2009), 30 (12), 1859-71.
      In view of the extreme importance of reliable computational prediction of aqueous drug solubility, we have established a Monte Carlo simulation procedure which appears, in principle, to yield reliable solubilities even for complex drug molecules. A theory based on judicious application of linear response and mean field approximations has been found to reproduce the computationally demanding free energy determinations by simulation while at the same time offering mechanistic insight. The focus here is on the suitability of the model of both drug and solvent, i.e., the force fields. The optimized potentials for liquid simulations all atom (OPLS-AA) force field, either intact or combined with partial charges determined either by semiempirical AM1/CM1A calculations or taken from the condensed-phase optimized molecular potentials for atomistic simulation studies (COMPASS) force field has been used. The results illustrate the crucial role of the force field in determining drug solubilities. The errors in interaction energies obtained by the simple force fields tested here are still found to be too large for our purpose but if a component of this error is systematic and readily removed by empirical adjustment the results are significantly improved. In fact, consistent use of the OPLS-AA Lennard-Jones force field parameters with partial charges from the COMPASS force field will in this way produce good predictions of amorphous drug solubility within 1 day on a standard desktop PC. This is shown here by the results of extensive new simulations for a total of 47 drug molecules which were also improved by increasing the water box in the hydration simulations from 500 to 2000 water molecules.
      (d) Westergren, J.; Lindfors, L.; Hoglund, T.; Luder, K.; Nordholm, S.; Kjellander, R. In silico prediction of drug solubility: 1. Free energy of hydration. J. Phys. Chem. B 2007, 111 (7), 1872-1882.
      In Silico Prediction of Drug Solubility: 1. Free Energy of Hydration
      Westergren, Jan; Lindfors, Lennart; Hoeglund, Tobias; Lueder, Kai; Nordholm, Sture; Kjellander, Roland
      Journal of Physical Chemistry B (2007), 111 (7), 1872-1882CODEN: JPCBFK; ISSN:1520-6106. (American Chemical Society)
      As a first step in the computational prediction of drug soly. the free energy of hydration, ΔG_vw^•, in TIP4P water has been computed for a data set of 48 drug mols. using the free energy of perturbation method and the optimized potential for liq. simulations all-atom force field. The simulations were performed in two steps, where first the Coulomb and then the Lennard-Jones interactions between the solute and the water mols. were scaled down from full to zero strength to provide phys. understanding and simpler predictive models. The results have been interpreted using a theory assuming ΔG_vw^• = A_MS γ + E_LJ + E_C/2 where A_MS is the mol. surface area, γ is the water-vapor surface tension, and E_LJ and E_C are the solute-water Lennard-Jones and Coulomb interaction energies, resp. It was found that by a proper definition of the mol. surface area our results as well as several results from the literature were found to be in quant. agreement using the macroscopic surface tension of TIP4P water. This is in contrast to the surface tension for water around a spherical cavity that previously has been shown to be dependent on the size of the cavity up to a radius of ∼1 nm. The step of scaling down the electrostatic interaction can be represented by linear response theory.
    10.
      Tomasi, J.; Mennucci, B.; Cammi, R. Quantum Mechanical Continuum Solvation Models. Chem. Rev. 2005, 105 (8), 2999-3094.
      Tomasi, Jacopo; Mennucci, Benedetta; Cammi, Roberto
      Chemical Reviews (Washington, DC, United States) (2005), 105 (8), 2999-3093CODEN: CHREAY; ISSN:0009-2665. (American Chemical Society)
    11.
      (a) Ten-no, S. Free energy of solvation for the reference interaction site model: Critical comparison of expressions. J. Chem. Phys. 2001, 115 (8), 3724-3731.
      Free energy of solvation for the reference interaction site model: Critical comparison of expressions
      Journal of Chemical Physics (2001), 115 (8), 3724-3731CODEN: JCPSA6; ISSN:0021-9606. (American Institute of Physics)
      We investigate expressions of excess chem. potential in the ref. interaction site model (RISM) integral equation theory. In addn. to the previous expressions from the Gaussian d. fluctuation theory and from the extended RISM (XRISM) theory, we examine a new free energy functional from the distributed partial wave expansion of mol. correlation functions, using the embedded site model and alcs. with different parameter sets. The results clearly show that the free energy of solvation in the XRISM theory includes a serious error, which is related to the no. of interaction sites and the geometry of a solute mol.
      (b) Palmer, D. S.; Frolov, A. I.; Ratkova, E. L.; Fedorov, M. V. Towards a universal method for calculating hydration free energies: A 3D reference interaction site model with partial molar volume correction. J. Phys.: Condens. Matter 2010, 22 (49), 492101.
      Towards a universal method for calculating hydration free energies: a 3D reference interaction site model with partial molar volume correction
      Palmer, David S.; Frolov, Andrey I.; Ratkova, Ekaterina L.; Fedorov, Maxim V.
      Journal of Physics: Condensed Matter (2010), 22 (49), 492101/1-492101/9CODEN: JCOMEL; ISSN:0953-8984. (Institute of Physics Publishing)
      We report a simple universal method to systematically improve the accuracy of hydration free energies calcd. using an integral equation theory of mol. liqs., the 3D ref. interaction site model. A strong linear correlation is obsd. between the difference of the exptl. and (uncorrected) calcd. hydration free energies and the calcd. partial molar volume for a data set of 185 neutral org. mols. from different chem. classes. By using the partial molar volume as a linear empirical correction to the calcd. hydration free energy, we obtain predictions of hydration free energies in excellent agreement with expt. (R = 0.94, σ = 0.99 kcal mol⁻¹ for a test set of 120 org. mols.).
    12.
      Stanton, R. V.; Hartsough, D. S.; Merz, K. M. Calculation of solvation free energies using a density functional/molecular dynamics coupled potential. J. Phys. Chem. 1993, 97 (46), 11868-11870.
      Calculation of solvation free energies using a density functional/molecular dynamics coupled potential
      Stanton, Robert V.; Hartsough, David S.; Merz, Kenneth M., Jr.
      Journal of Physical Chemistry (1993), 97 (46), 11868-70CODEN: JPCHAX; ISSN:0022-3654.
      Recently there was much interest in the development of methods which couple quantum mech. and mol. mech. computational models. The authors report the 1st coupling of a d. functional Hamiltonian with a mol. mech. method. The AMBER force field was coupled with a d. functional Hamiltonian as implemented in the deMon program. Test calcns. of solvation energies were carried out for a small group of ions. The coupled potential method slightly underestimates the solvation energy of the chloride ion while it overestimates the solvation energy of the other ions studied. Nonetheless, this method allows to study condensed-phase systems at a level of accuracy currently not available.
    13.
      Ratkova, E. L.; Fedorov, M. V. Combination of RISM and Cheminformatics for Efficient Predictions of Hydration Free Energy of Polyfragment Molecules: Application to a Set of Organic Pollutants. J. Chem. Theory Comput. 2011, 7 (5), 1450-1457.
      Combination of RISM and Cheminformatics for Efficient Predictions of Hydration Free Energy of Polyfragment Molecules: Application to a Set of Organic Pollutants
      Journal of Chemical Theory and Computation (2011), 7 (5), 1450-1457CODEN: JCTCCE; ISSN:1549-9618. (American Chemical Society)
      The authors discuss a new method for predicting the hydration free energy (HFE) of org. pollutants and illustrate the efficiency of the method on a set of 220 chlorinated arom. hydrocarbons. The new model is computationally inexpensive, with one HFE calcn. taking less than a minute on a PC. The method is based on a combination of a mol. integral equations theory, one-dimensional ref. interaction site model (1D RISM), with the cheminformatics approach. The authors correct HFEs obtained by the 1D RISM with a set of empirical corrections. The corrections are assocd. with the partial molar volume and structural descriptors of the mols. The introduced corrections can significantly improve the quality of the 1D RISM HFE predictions obtained by the partial wave free energy expression and the Kovalenko-Hirata closure. The quality of the model can be further improved by the reparametrization using QM-derived partial charges instead of the originally used OPLS-AA partial charges. The final model gives good results for polychlorinated benzenes (the mean and std. deviation of the error are 0.02 and 0.36 kcal/mol, correspondingly). At the same time, the model gives somewhat worse results for polychlorobiphenyls (PCBs) with a systematic bias of -0.72 kcal/mol but a small std. deviation equal to 0.55 kcal/mol. The error remains the same for the whole set of PCBs, whereas errors of HFEs predicted with continuum solvation models increase significantly for higher chlorinated PCB congeners. The authors discuss potential future applications of the model and several avenues for its further improvement.
    14.
      Allen, F. H. The Cambridge Structural Database: a quarter of a million crystal structures and rising. Acta Crystallogr., Sect. B: Struct. Sci. 2002, 58, 380-388.
      The Cambridge Structural Database: a quarter of a million crystal structures and rising
      Acta Crystallographica, Section B: Structural Science (2002), B58 (3, No. 1), 380-388CODEN: ASBSDK; ISSN:0108-7681. (Blackwell Munksgaard)
      The Cambridge Structural Database (CSD) now contains data for more than a quarter of a million small-mol. crystal structures. The information content of the CSD, together with methods for data acquisition, processing and validation, are summarized, with particular emphasis on the chem. information added by CSD editors. Nearly 80% of new structural data arrives electronically, mostly in CIF format, and the CCDC acts as the official crystal structure data depository for 51 major journals. The CCDC now maintains both a CIF archive (more than 73000 CIFs dating from 1996), as well as the distributed binary CSD archive; the availability of data in both archives is discussed. A statistical survey of the CSD is also presented and projections concerning future accession rates indicate that the CSD will contain at least 500000 crystal structures by the year 2010.
    15.
      Box, K.; Comer, J. E.; Gravestock, T.; Stuart, M. New Ideas about the Solubility of Drugs. Chem. Biodiversity 2009, 6 (11), 1767-1788.
      Box, Karl; Comer, John E.; Gravestock, Tom; Stuart, Martin
      Chemistry & Biodiversity (2009), 6 (11), 1767-1788CODEN: CBHIAM; ISSN:1612-1872. (Verlag Helvetica Chimica Acta)
      Methods are described for detecting pptn. of ionizable drugs under conditions of changing pH, estg. kinetic soly. from the onset of pptn., and measuring soly. by chasing equil. Definitions are presented for kinetic, equil., and intrinsic soly. of ionizable drugs, supersatn. and subsatn., and for chasers and non-chasers, which are 2 classes of ionizable drug with significantly different soly. properties. The use of Bjerrum Curves and Neutral-Species Concn. Profiles to depict soly. properties are described and illustrated with case studies showing super-dissolving behavior, conversion between cryst. forms and enhancement of soly. through supersatn., and the use of additives and simulated gastrointestinal fluids.
    16.
      (a) Hopfinger, A. J.; Esposito, E. X.; Llinàs, A.; Glen, R. C.; Goodman, J. M. Findings of the Challenge To Predict Aqueous Solubility. J. Chem. Inf. Model. 2008, 49 (1), 1-5.
      (b) Llinàs, A.; Glen, R. C.; Goodman, J. M. Solubility Challenge: Can You Predict Solubilities of 32 Molecules Using a Database of 100 Reliable Measurements? J. Chem. Inf. Model. 2008, 48 (7), 1289-1303.
      Solubility challenge: can you predict solubilities of 32 molecules using a database of 100 reliable measurements?
      Llinas, Antonio; Glen, Robert C.; Goodman, Jonathan M.
      Journal of Chemical Information and Modeling (2008), 48 (7), 1289-1303CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)
      Soly. is a key physicochem. property of mols. Serious deficiencies exist in the consistency and reliability of soly. data in the literature. The accurate prediction of soly. would be very useful. However, systematic errors and lack of metadata assocd. with measurements greatly reduce the confidence in current models. To address this, we are accurately measuring intrinsic soly. values, and here we report results for a diverse set of 100 druglike mols. at 25° and an ionic strength of 0.15 M using the CheqSol approach. This is a highly reproducible potentiometric technique that ensures the thermodn. equil. is reached rapidly. Results with a coeff. of variation higher than 4% were rejected. In addn., the Potentiometric Cycling for Polymorph Creation method, [PC]2, was used to obtain multiple polymorph forms from aq. soln. We now challenge researchers to predict the intrinsic soly. of 32 other druglike mols. that have been measured but are yet to be published.
    17.
      The Goodman group. http://www-jmg.ch.cam.ac.uk/data/solubility/ (accessed Feb. 8, 2013).
    18.
      Narasimham, L.; Barhate, V. D. Kinetic and intrinsic solubility determination of some β-blockers and antidiabetics by potentiometry. J. Pharm. Res. 2011, 4 (2), 532-536.
    19.
      (a) Bergström, C. A. S.; Luthman, K.; Artursson, P. Accuracy of calculated pH-dependent aqueous drug solubility. Eur. J. Pharm. Sci. 2004, 22 (5), 387-398.
      Accuracy of calculated pH-dependent aqueous drug solubility
      Bergstrom, Christel A. S.; Luthman, Kristina; Artursson, Per
      European Journal of Pharmaceutical Sciences (2004), 22 (5), 387-398CODEN: EPSCED; ISSN:0928-0987. (Elsevier B.V.)
      The aim of the present study was to investigate the extent to which the Henderson-Hasselbalch (HH) relationship can be used to predict the pH-dependent aq. soly. of cationic drugs. The pH-dependent soly. for 25 amines, carrying a single pos. charge, was detd. with a small-scale shake flask method. Each sample was prepd. as a suspension in 150 mM phosphate buffer. The pH-dependent soly. curves were obtained using at least 10 different pH values. The intrinsic soly., the soly. at the pKa and the soly. at pH values reflecting the pH of the bulk and acid microclimate in the human small intestine (pH 7.4 and 6.5, resp.) were detd. for all compds. The exptl. study revealed a large diversity in slope, from -0.5 (celiprolol) to -8.6 (hydralazine) in the linear pH-dependent soly. interval, which is in sharp contrast to the slope of -1 assumed by the HH equation. In addn., a large variation in the range of soly. between the completely uncharged and completely charged drug species was obsd. The range for disopyramide was only 1.1 log units, whereas that for amiodarone was greater than 6.3 log units, pointing at the compd. specific response to counter-ion effects. In conclusion, the investigated cationic drugs displayed compd. specific pH-dependent soly. profiles, indicating that the HH equation in many cases will only give rough estns. of the pH-dependent soly. of drugs in divalent buffer systems.
      (b) Bergström, C. A. S.; Wassvik, C. M.; Norinder, U.; Luthman, K.; Artursson, P. Global and Local Computational Models for Aqueous Solubility Prediction of Drug-Like Molecules. J. Chem. Inf. Comput. Sci. 2004, 44 (4), 1477-1488.
      Global and local computational models for aqueous solubility prediction of drug-like molecules
      Bergstrom Christel A S; Wassvik Carola M; Norinder Ulf; Luthman Kristina; Artursson Per
      Journal of chemical information and computer sciences (2004), 44 (4), 1477-88 ISSN:0095-2338.
      The aim of this study was to develop in silico protocols for the prediction of aqueous drug solubility. For this purpose, high quality solubility data of 85 drug-like compounds covering the total drug-like space as identified with the ChemGPS methodology were used. Two-dimensional molecular descriptors describing electron distribution, lipophilicity, flexibility, and size were calculated by Molconn-Z and Selma. Global minimum energy conformers were obtained by Monte Carlo simulations in MacroModel and three-dimensional descriptors of molecular surface area properties were calculated by Marea. PLS models were obtained by use of training and test sets. Both a global drug solubility model (R(2) = 0.80, RMSE(te) = 0.83) and subset specific models (after dividing the 85 compounds into acids, bases, ampholytes, and nonproteolytes) were generated. Furthermore, the final models were successful in predicting the solubility values of external test sets taken from the literature. The results showed that homologous series and subsets can be predicted with high accuracy from easily comprehensible models, whereas consensus modeling might be needed to predict the aqueous drug solubility of datasets with large structural diversity.
      (c) Ran, Y.; Yalkowsky, S. H. Prediction of Drug Solubility by the General Solubility Equation (GSE). J. Chem. Inf. Comput. Sci. 2001, 41 (2), 354-357.
      Prediction of Drug Solubility by the General Solubility Equation (GSE)
      Journal of Chemical Information and Computer Sciences (2001), 41 (2), 354-357CODEN: JCISD8; ISSN:0095-2338. (American Chemical Society)
      The revised GSE proposed by Jain and Yalkowsky is used to est. the aq. soly. of a set of org. nonelectrolytes studied by Jorgensen and Duffy. The only inputs used in the GSE are the Celsius m.p. (MP) and the octanol water partition coeff. (Kow). These are generally known, easily measured, or easily calcd. The GSE does not utilize any fitted parameters. The av. abs. error for the 150 compds. is 0.43 compared to 0.56 with Jorgensen and Duffy's computational method, which utilizes 5 fitted parameters. Thus, the revised GSE is simpler and provides a more accurate estn. of aq. soly. of the same set of org. compds. It is also more accurate than the original version of the GSE.
      (d) Rytting, E.; Lentz, K.; Chen, X.-Q.; Qian, F.; Venkatesh, S. Aqueous and cosolvent solubility data for drug-like organic compounds. AAPS J. 2005, 7 (1), E78-E105.
      Aqueous and cosolvent solubility data for drug-like organic compounds
      Rytting, Erik; Lentz, Kimberley A.; Chen, Xue-Qing; Qian, Feng; Venkatesh, Srini
      AAPS Journal (2005), 7 (1), E78-E105CODEN: AJAOB6; ISSN:1550-7416. (American Association of Pharmaceutical Scientists)
      A review. Recently 2 QSPR-based in silico models were developed in the authors' labs. to predict the aq. and non-aq. soly. of drug-like org. compds. For the intrinsic aq. soly. model, a set of 321 structurally diverse drugs was collected from literature for the anal. For the PEG 400 cosolvent model, exptl. data for 122 drugs were obtained by a uniform exptl. procedure at 4 vol. fractions of PEG 400 in water, 0%, 25%, 50%, and 75%. The drugs used in both models represent a wide range of compds., with log P values from -5 to 7.5, and mol. wts. from 100 to >600 g/mol. Because of the standardized procedure used to collect the cosolvent data and the careful assessment of quality used in obtaining literature data, both data sets have potential value for the scientific community for use in building various models that require exptl. soly. data.
      (e) Shareef, A.; Angove, M. J.; Wells, J. D.; Johnson, B. B. Aqueous Solubilities of Estrone, 17β-Estradiol, 17α-Ethynylestradiol, and Bisphenol A. J. Chem. Eng. Data 2006, 51 (3), 879-881.
      Aqueous Solubilities of Estrone, 17β-Estradiol, 17α-Ethynylestradiol, and Bisphenol A
      Shareef, Ali; Angove, Michael J.; Wells, John D.; Johnson, Bruce B.
      Journal of Chemical & Engineering Data (2006), 51 (3), 879-881CODEN: JCEAAX; ISSN:0021-9568. (American Chemical Society)
      The solubilities of three estrogenic hormones - estrone, 17β-estradiol, and 17α-ethynylestradiol - and the industrial pollutant bisphenol A were measured in water, dil. acid and alkali (pH 4 and 10, resp.), and aq. KNO3 (0.01 mol·L⁻¹ and 0.1 mol·L⁻¹). The concns. of satd. solns., after equilibration at (25.0 ± 0.5)° with excess solid for 4 days, were detd. by HPLC. Six replicate results were obtained for each solute-solvent pair and the coeff. of variation was in most cases <5%. The solubilities in pure water with std. deviations were estrone (1.30 ± 0.08) mg·L⁻¹, 17β-estradiol (1.51 ± 0.04) mg·L⁻¹, 17α-ethynylestradiol (9.20 ± 0.09) mg·L⁻¹, and bisphenol A (300 ± 5) mg·L⁻¹. The soly. of each of the hormones was unchanged between pH 4 and pH 7 but was greater at pH 10. At pH 7, the hormones became progressively less sol. as the ionic strength increased from 0.0 to 0.1 mol·L⁻¹. By contrast the soly. of bisphenol A was essentially the same under all of the exptl. conditions tested.
    20.
      CrystalWeb, unfortunately withdrawn in 2013. http://cds.dl.ac.uk/cds/datasets/crys/cweb/cweb.html.
    21.
      Bruno, I. J.; Cole, J. C.; Edgington, P. R.; Kessler, M.; Macrae, C. F.; McCabe, P.; Pearson, J.; Taylor, R. New software for searching the Cambridge Structural Database and visualizing crystal structures. Acta Crystallogr., Sect. B: Struct. Sci. 2002, 58 (3 Part 1), 389-397.
      New software for searching the Cambridge Structural Database and visualizing crystal structures
      Bruno, Ian J.; Cole, Jason C.; Edgington, Paul R.; Kessler, Magnus; Macrae, Clare F.; McCabe, Patrick; Pearson, Jonathan; Taylor, Robin
      Acta Crystallographica, Section B: Structural Science (2002), B58 (3, No. 1), 389-397CODEN: ASBSDK; ISSN:0108-7681. (Blackwell Munksgaard)
      Two new programs were developed for searching the Cambridge Structural Database (CSD) and visualizing database entries: ConQuest and Mercury. The former is a new search interface to the CSD, the latter is a high-performance crystal-structure visualizer with extensive facilities for exploring networks of intermol. contacts. Particular emphasis was placed on making the programs as intuitive as possible. Both ConQuest and Mercury run under Windows and various types of Unix, including Linux.
    22.
      Steinbeck, C.; Han, Y.; Kuhn, S.; Horlacher, O.; Luttmann, E.; Willighagen, E. The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics. J. Chem. Inf. Comput. Sci. 2003, 43 (2), 493-500.
      The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics
      Steinbeck, Christoph; Han, Yongquan; Kuhn, Stefan; Horlacher, Oliver; Luttmann, Edgar; Willighagen, Egon
      Journal of Chemical Information and Computer Sciences (2003), 43 (2), 493-500CODEN: JCISD8; ISSN:0095-2338. (American Chemical Society)
      The Chem. Development Kit (CDK) is a freely available open-source Java library for Structural Chemo- and Bioinformatics. Its architecture and capabilities as well as the development as an open-source project by a team of international collaborators from academic and industrial institutions is described. The CDK provides methods for many common tasks in mol. informatics, including 2D and 3D rendering of chem. structures, I/O routines, SMILES parsing and generation, ring searches, isomorphism checking, structure diagram generation, etc. Application scenarios as well as access information for interested users and potential contributors are given.
    23.
      Gupta, R. R.; Gifford, E. M.; Liston, T.; Waller, C. L.; Hohman, M.; Bunin, B. A.; Ekins, S. Using Open Source Computational Tools for Predicting Human Metabolic Stability and Additional Absorption, Distribution, Metabolism, Excretion, and Toxicity Properties. Drug Metab. Dispos. 2010, 38 (11), 2083-2090.
      Using open source computational tools for predicting human metabolic stability and additional absorption, distribution, metabolism, excretion, and toxicity properties
      Gupta, Rishi R.; Gifford, Eric M.; Liston, Ted; Waller, Chris L.; Hohman, Moses; Bunin, Barry A.; Ekins, Sean
      Drug Metabolism and Disposition (2010), 38 (11), 2083-2090CODEN: DMDSAI; ISSN:0090-9556. (American Society for Pharmacology and Experimental Therapeutics)
      Ligand-based computational models could be more readily shared between researchers and organizations if they were generated with open source mol. descriptors [e.g., chem. development kit (CDK)] and modeling algorithms, because this would negate the requirement for proprietary com. software. We initially evaluated open source descriptors and model building algorithms using a training set of approx. 50,000 mols. and a test set of approx. 25,000 mols. with human liver microsomal metabolic stability data. A C5.0 decision tree model demonstrated that CDK descriptors together with a set of Smiles Arbitrary Target Specification (SMARTS) keys had good statistics [κ = 0.43, sensitivity = 0.57, specificity = 0.91, and pos. predicted value (PPV) = 0.64], equiv. to those of models built with com. Mol. Operating Environment 2D (MOE2D) and the same set of SMARTS keys (κ = 0.43, sensitivity = 0.58, specificity = 0.91, and PPV = 0.63). Extending the dataset to ∼193,000 mols. and generating a continuous model using Cubist with a combination of CDK and SMARTS keys or MOE2D and SMARTS keys confirmed this observation. When the continuous predictions and actual values were binned to get a categorical score we obsd. a similar κ statistic (0.42). The same combination of descriptor set and modeling method was applied to passive permeability and P-glycoprotein efflux data with similar model testing statistics. In summary, open source tools demonstrated predictive results comparable to those of com. software with attendant cost savings. We discuss the advantages and disadvantages of open source descriptors and the opportunity for their use as a tool for organizations to share data precompetitively, avoiding repetition and assisting drug discovery.
    24.
      O’Boyle, N. Towards a Universal SMILES representation - A standard method to generate canonical SMILES based on the InChI. J. Cheminform. 2012, 4 (1), 22.
      Towards a Universal SMILES representation - A standard method to generate canonical SMILES based on the InChI
      Journal of cheminformatics (2012), 4 (1), 22.
      BACKGROUND: There are two line notations of chemical structures that have established themselves in the field: the SMILES string and the InChI string. The InChI aims to provide a unique, or canonical, identifier for chemical structures, while SMILES strings are widely used for storage and interchange of chemical structures, but no standard exists to generate a canonical SMILES string. RESULTS: I describe how to use the InChI canonicalisation to derive a canonical SMILES string in a straightforward way, either incorporating the InChI normalisations (Inchified SMILES) or not (Universal SMILES). This is the first description of a method to generate canonical SMILES that takes stereochemistry into account. When tested on the 1.1 million compounds in the ChEMBL database, and a 1 million compound subset of the PubChem Substance database, no canonicalisation failures were found with Inchified SMILES. Using Universal SMILES, 99.79% of the ChEMBL database was canonicalised successfully and 99.77% of the PubChem subset. CONCLUSIONS: The InChI canonicalisation algorithm can successfully be used as the basis for a common standard for canonical SMILES. While challenges remain - such as the development of a standard aromatic model for SMILES - the ability to create the same SMILES using different toolkits will mean that for the first time it will be possible to easily compare the chemical models used by different toolkits.
    25. RSC ChemSpider (accessed Feb. 8, 2013).
    26. Wolstencroft, K.; Haines, R.; Fellows, D.; Williams, A.; Withers, D.; Owen, S.; Soiland-Reyes, S.; Dunlop, I.; Nenadic, A.; Fisher, P.; Bhagat, J.; Belhajjame, K.; Bacall, F.; Hardisty, A.; Nieva de la Hidalga, A.; Balcazar Vargas, M. P.; Sufi, S.; Goble, C. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Res. 2013, 41 (W1), W557–W561.
    27. Little, J. L.; Williams, A. J.; Pshenichnov, A.; Tkachenko, V. Identification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpider. J. Am. Soc. Mass. Spectrom. 2012, 23 (1), 179–185.
      In many cases, an unknown to an investigator is actually known in the chem. literature, a ref. database, or an internet resource. We refer to these types of compds. as 'known unknowns.' ChemSpider is a very valuable internet database of known compds. useful in the identification of these types of compds. in com., environmental, forensic, and natural product samples. The database contains over 26 million entries from hundreds of data sources and is provided as a free resource to the community. Accurate mass mass spectrometry data is used to query the database by either elemental compn. or a monoisotopic mass. Searching by elemental compn. is the preferred approach. However, it is often difficult to det. a unique elemental compn. for compds. with mol. wts. greater than 600 Da. In these cases, searching by the monoisotopic mass is advantageous. In either case, the search results are refined by sorting the no. of refs. assocd. with each compd. in descending order. This raises the most useful candidates to the top of the list for further evaluation. These approaches were shown to be successful in identifying 'known unknowns' noted in our lab. and for compds. of interest to others.
    28. Goble, C. A.; Bhagat, J.; Aleksejevs, S.; Cruickshank, D.; Michaelides, D.; Newman, D.; Borkum, M.; Bechhofer, S.; Roos, M.; Li, P.; De Roure, D. myExperiment: A repository and social network for the sharing of bioinformatics workflows. Nucleic Acids Res. 2010, 38 (suppl 2), W677–W682.
      MyExperiment (http://www.myexperiment.org) is an online research environment that supports the social sharing of bioinformatics workflows. These workflows are procedures consisting of a series of computational tasks using web services, which may be performed on data from its retrieval, integration and anal., to the visualization of the results. As a public repository of workflows, myExperiment allows anybody to discover those that are relevant to their research, which can then be reused and repurposed to their specific requirements. Conversely, developers can submit their workflows to myExperiment and enable them to be shared in a secure manner. Since its release in 2007, myExperiment currently has over 3500 registered users and contains more than 1000 workflows. The social aspect to the sharing of these workflows is facilitated by registered users forming virtual communities bound together by a common interest or research project. Contributors of workflows can build their reputation within these communities by receiving feedback and credit from individuals who reuse their work. Further documentation about myExperiment including its REST web service is available from http://wiki.myexperiment.org. Feedback and requests for support can be sent to [email protected]
    29. De Ferrari, L. Workflow Entry: From molecule name to SMILE and InchI using ChemSpider. http://www.myexperiment.org/workflows/3603.html (accessed 10th February 2014).
    30. Griseofulvin. http://en.wikipedia.org/wiki/Griseofulvin (accessed 11th December 2012; SMILES source).
    31. Glipizide. http://en.wikipedia.org/wiki/Glipizide (accessed 11th December 2012; SMILES source).
    32. Stone, A. Distributed Multipole Analysis of Gaussian wavefunctions, GDMA version 2.2.02. http://www-stone.ch.cam.ac.uk/documentation/gdma/manual.pdf (accessed Feb. 10, 2014).
    33. Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Scalmani, G.; Barone, V.; Mennucci, B.; Petersson, G. A.; Nakatsuji, H.; Caricato, M.; Li, X.; Hratchian, H. P.; Izmaylov, A. F.; Bloino, J.; Zheng, G.; Sonnenberg, J. L.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Vreven, T.; Montgomery, J. A., Jr.; Peralta, J. E.; Ogliaro, F.; Bearpark, M.; Heyd, J. J.; Brothers, E.; Kudin, K. N.; Staroverov, V. N.; Kobayashi, R.; Normand, J.; Raghavachari, K.; Rendell, A.; Burant, J. C.; Iyengar, S. S.; Tomasi, J.; Cossi, M.; Rega, N.; Millam, N. J.; Klene, M.; Knox, J. E.; Cross, J. B.; Bakken, V.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R. E.; Yazyev, O.; Austin, A. J.; Cammi, R.; Pomelli, C.; Ochterski, J. W.; Martin, R. L.; Morokuma, K.; Zakrzewski, V. G.; Voth, G. A.; Salvador, P.; Dannenberg, J. J.; Dapprich, S.; Daniels, A. D.; Farkas, Ö.; Foresman, J. B.; Ortiz, J. V.; Cioslowski, J.; Fox, D. J. Gaussian 09; Gaussian, Inc.: Wallingford, CT, 2009.
    34. Stone, A. J. Distributed multipole analysis, or how to describe a molecular charge distribution. Chem. Phys. Lett. 1981, 83 (2), 233–239.
      A method of analyzing mol. wavefunctions is described. It can be regarded as an extension of Mulliken population anal., and can be used both to give a qual. or quant. picture of the mol. charge distribution, and in the accurate evaluation of mol. multipole moments of arbitrary order with negligible computational effort.
    35. Buckingham, R. The classical equation of state of gaseous helium, neon and argon. Proc. R. Soc. Lon. Ser-A 1938, 168 (933), 264–283.
    36. Gavezzotti, A.; Filippini, G. Theoretical Aspects and Computer Modeling; Gavezzotti, A., Ed.; Wiley and Sons: Chichester, 1997; pp 61–97.
    37. Marenich, A. V.; Cramer, C. J.; Truhlar, D. G. Universal Solvation Model Based on Solute Electron Density and on a Continuum Model of the Solvent Defined by the Bulk Dielectric Constant and Atomic Surface Tensions. J. Phys. Chem. B 2009, 113 (18), 6378–6396.
      We present a new continuum solvation model based on the quantum mech. charge d. of a solute mol. interacting with a continuum description of the solvent. The model is called SMD, where the 'D' stands for 'd.' to denote that the full solute electron d. is used without defining partial at. charges. 'Continuum' denotes that the solvent is not represented explicitly but rather as a dielec. medium with surface tension at the solute-solvent boundary. SMD is a universal solvation model, where 'universal' denotes its applicability to any charged or uncharged solute in any solvent or liq. medium for which a few key descriptors are known (in particular, dielec. const., refractive index, bulk surface tension, and acidity and basicity parameters). The model separates the observable solvation free energy into two main components. The first component is the bulk electrostatic contribution arising from a self-consistent reaction field treatment that involves the soln. of the nonhomogeneous Poisson equation for electrostatics in terms of the integral-equation-formalism polarizable continuum model (IEF-PCM). The cavities for the bulk electrostatic calcn. are defined by superpositions of nuclear-centered spheres. The second component is called the cavity-dispersion-solvent-structure term and is the contribution arising from short-range interactions between the solute and solvent mols. in the first solvation shell. This contribution is a sum of terms that are proportional (with geometry-dependent proportionality consts. called at. surface tensions) to the solvent-accessible surface areas of the individual atoms of the solute. The SMD model has been parametrized with a training set of 2821 solvation data including 112 aq. ionic solvation free energies, 220 solvation free energies for 166 ions in acetonitrile, methanol, and DMSO, 2346 solvation free energies for 318 neutral solutes in 91 solvents (90 nonaq. org. 
solvents and water), and 143 transfer free energies for 93 neutral solutes between water and 15 org. solvents. The elements present in the solutes are H, C, N, O, F, Si, P, S, Cl, and Br. The SMD model employs a single set of parameters (intrinsic at. Coulomb radii and at. surface tension coeffs.) optimized over six electronic structure methods: M05-2X/MIDI!6D, M05-2X/6-31G*, M05-2X/6-31+G**, M05-2X/cc-pVTZ, B3LYP/6-31G*, and HF/6-31G*. Although the SMD model has been parametrized using the IEF-PCM protocol for bulk electrostatics, it may also be employed with other algorithms for solving the nonhomogeneous Poisson equation for continuum solvation calcns. in which the solute is represented by its electron d. in real space. This includes, for example, the conductor-like screening algorithm. With the 6-31G* basis set, the SMD model achieves mean unsigned errors of 0.6-1.0 kcal/mol in the solvation free energies of tested neutrals and mean unsigned errors of 4 kcal/mol on av. for ions with either Gaussian03 or GAMESS.
    38. (a) Ben-Naim, A. Standard thermodynamics of transfer. Uses and misuses. J. Phys. Chem. 1978, 82 (7), 792–803.
      The std. free energy of transfer of a solute A between two solvents a and b is discussed at both a thermodn. and a statistical-mech. level. Whereas thermodn. alone cannot be used to choose the 'best' std. quantity, statistical mechanics can help to make such a choice. The std. free energy of transferring A, ΔμA°, computed by using the no. d. (or molarity) scale has the following advantages: (1) it is the simplest and least ambiguous quantity; (2) it is the quantity that directly probes the difference in the solvation properties of the two solvents with respect to the solute A; (3) it can be used, without any change of notation, in any soln., not necessarily a dil. one, and including even pure A; (4) by straightforward thermodn. manipulations one obtains the entropy, enthalpy, vol. changes, etc., for the same process. All of these quantities have advantages similar to those indicated for the free-energy change. Because of the advantages of this particular choice of std. quantities, it is proposed to 'standardize' the use of the std. thermodn. quantities of transfer and refer to them as the local-std. quantities.
      (b) Ben-Naim, A.; Marcus, Y. Solvation thermodynamics of nonionic solutes. J. Chem. Phys. 1984, 81 (4), 2016–2027.
      A generalized process of solvation is defined. It is argued that the thermodn. of this solvation process is more informative as compared with other processes suggested before. Numerical examples are presented and compared with some recently published related data.
    39. Howley, T.; Madden, M. G.; O’Connell, M.-L.; Ryder, A. G. The effect of principal component analysis on machine learning accuracy with high-dimensional spectral data. Knowl.-Based Syst. 2006, 19 (5), 363–370.
    40. Wold, H. Partial Least Squares (PLS) Regression. 2003, 17.
    41. (a) Abdi, H. Partial Least Squares (PLS) Regression. 2003, 17.
      (b) Wold, S.; Sjöström, M.; Eriksson, L. PLS-regression: A basic tool of chemometrics. Chemometr. Intell. Lab. 2001, 58 (2), 109–130.
      A review on PLS-regression (PLSR) as a std. tool in chemometrics and used in chem. and engineering. The underlying model and its assumption and commonly used diagnostics are discussed, together with the interpretation of resulting parameters. Two examples are used as illustrations: first, a Quant. Structure-Activity Relationship (QSAR)/Quant. Structure Property Relationship (QSPR) data set of peptides is used to outline the development, interpretation, and refinement of a PLSR model. Second, a data set from the manufg. of recycled paper is analyzed to illustrate time series modeling of process data by means of PLSR and time-lagged X-variables.
      (c) Mevik, B.; Wehrens, R. The pls Package: Principal Component and Partial Least Squares Regression in R. J. Stat. Softw. 2007, 18 (2), 1–24.
    42. (a) Palmer, D. S.; O’Boyle, N. M.; Glen, R. C.; Mitchell, J. B. O. Random Forest Models To Predict Aqueous Solubility. J. Chem. Inf. Model. 2006, 47 (1), 150–158.
      (b) Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J. C.; Sheridan, R. P.; Feuston, B. P. Random Forest: A Classification and Regression Tool for Compound Classification and QSAR Modeling. J. Chem. Inf. Comput. Sci. 2003, 43 (6), 1947–1958.
      A new classification and regression tool, Random Forest, is introduced and investigated for predicting a compd.'s quant. or categorical biol. activity based on a quant. description of the compd.'s mol. structure. Random Forest is an ensemble of unpruned classification or regression trees created by using bootstrap samples of the training data and random feature selection in tree induction. Prediction is made by aggregating (majority vote or averaging) the predictions of the ensemble. The authors built predictive models for six cheminformatics data sets. The authors' anal. demonstrates that Random Forest is a powerful tool capable of delivering performance that is among the most accurate methods to date. The authors also present three addnl. features of Random Forest: built-in performance assessment, a measure of relative importance of descriptors, and a measure of compd. similarity that is weighted by the relative importance of descriptors. It is the combination of relatively high prediction accuracy and its collection of desired features that makes Random Forest uniquely suited for modeling in cheminformatics.
    43. Breiman, L. Random Forests. Mach. Learning 2001, 45 (1), 5–32.
    44. (a) Cao, D.-S.; Xu, Q.-S.; Liang, Y.-Z.; Chen, X.; Li, H.-D. Prediction of aqueous solubility of druglike organic compounds using partial least squares, back-propagation network and support vector machine. J. Chemometr. 2010, 24 (9), 584–595.
      (b) Vapnik, V. N. An overview of statistical learning theory. IEEE Trans. Neural Netw. 1999, 10 (5), 988–999.
      Statistical learning theory was introduced in the late 1960's. Until the 1990's it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the middle of the 1990's new types of learning algorithms (called support vector machines) based on the developed theory were proposed. This made statistical learning theory not only a tool for the theoretical analysis but also a tool for creating practical algorithms for estimating multidimensional functions. This article presents a very general overview of statistical learning theory including both theoretical and algorithmic aspects of the theory. The goal of this overview is to demonstrate how the abstract learning theory established conditions for generalization which are more general than those discussed in classical statistical paradigms and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems. A more detailed overview of the theory (without proofs) can be found in Vapnik (1995). In Vapnik (1998) one can find detailed description of the theory (including proofs).
    45. Hu, S. R2 Vs. r2. In SCEA/ISPA Conference, 2008; pp 1–15.
    46. Menke, J.; Martinez, T. R. Using permutations instead of student’s t distribution for p-values in paired-difference algorithm comparisons. In IEEE IJCNN, July 25–29, 2004; Vol. 2, pp 1331–1335.
    47. Nath, N.; Mitchell, J. B. O. Is EC class predictable from reaction mechanism? BMC Bioinformatics 2012, 13 (1), 60.
      BACKGROUND: We investigate the relationships between the EC (Enzyme Commission) class, the associated chemical reaction, and the reaction mechanism by building predictive models using Support Vector Machine (SVM), Random Forest (RF) and k-Nearest Neighbours (kNN). We consider two ways of encoding the reaction mechanism in descriptors, and also three approaches that encode only the overall chemical reaction. Both cross-validation and also an external test set are used. RESULTS: The three descriptor sets encoding overall chemical transformation perform better than the two descriptions of mechanism. SVM and RF models perform comparably well; kNN is less successful. Oxidoreductases and hydrolases are relatively well predicted by all types of descriptor; isomerases are well predicted by overall reaction descriptors but not by mechanistic ones. CONCLUSIONS: Our results suggest that pairs of similar enzyme reactions tend to proceed by different mechanisms. Oxidoreductases, hydrolases, and to some extent isomerases and ligases, have clear chemical signatures, making them easier to predict than transferases and lyases. We find evidence that isomerases as a class are notably mechanistically diverse and that their one shared property, of substrate and product being isomers, can arise in various unrelated ways. The performance of the different machine learning algorithms is in line with many cheminformatics applications, with SVM and RF being roughly equally effective. kNN is less successful, given the role that non-local information plays in successful classification. We note also that, despite a lack of clarity in the literature, EC number prediction is not a single problem; the challenge of predicting protein function from available sequence data is quite different from assigning an EC classification from a cheminformatics representation of a reaction.
    48. Kuhn, M. Variable Importance Using The caret Package, 2012. Available via the Internet at http://cran.open-source-solution.org/web/packages/caret/vignettes/caretVarImp.pdf (accessed Feb. 10, 2014).
    49. Kuhn, M. Variable Importance Using The caret Package. 2010, 17.
    50. Varma, S.; Simon, R. Bias in error estimation when using cross-validation for model selection. BMC Bioinform. 2006, 7 (1), 91.
      BACKGROUND: Cross-validation (CV) is an effective method for estimating the prediction error of a classifier. Some recent articles have proposed methods for optimizing classifiers by choosing classifier parameter values that minimize the CV error estimate. We have evaluated the validity of using the CV error estimate of the optimized classifier as an estimate of the true error expected on independent data. RESULTS: We used CV to optimize the classification parameters for two kinds of classifiers; Shrunken Centroids and Support Vector Machines (SVM). Random training datasets were created, with no difference in the distribution of the features between the two classes. Using these 'null' datasets, we selected classifier parameter values that minimized the CV error estimate. 10-fold CV was used for Shrunken Centroids while Leave-One-Out-CV (LOOCV) was used for the SVM. Independent test data was created to estimate the true error. With 'null' and 'non null' (with differential expression between the classes) data, we also tested a nested CV procedure, where an inner CV loop is used to perform the tuning of the parameters while an outer CV is used to compute an estimate of the error. The CV error estimate for the classifier with the optimal parameters was found to be a substantially biased estimate of the true error that the classifier would incur on independent data. Even though there is no real difference between the two classes for the 'null' datasets, the CV error estimate for the Shrunken Centroid with the optimal parameters was less than 30% on 18.5% of simulated training and 'non-null' data distributions. CONCLUSION: We show that using CV to compute an error estimate for a classifier that has itself been tuned using CV gives a significantly biased estimate of the true error. 
Proper use of CV for estimating true error of a classifier developed using a well defined algorithm requires that all steps of the algorithm, including classifier parameter tuning, be repeated in each CV loop. A nested CV procedure provides an almost unbiased estimate of the true error.
    51. Simon, R. M.; Subramanian, J.; Li, M.-C.; Menezes, S. Using cross-validation to evaluate predictive accuracy of survival risk classifiers based on high-dimensional data. Brief. Bioinform. 2011, 12 (3), 203–214.
      Developments in whole genome biotechnology have stimulated statistical focus on prediction methods. We review here methodology for classifying patients into survival risk groups and for using cross-validation to evaluate such classifications. Measures of discrimination for survival risk models include separation of survival curves, time-dependent ROC curves and Harrell's concordance index. For high-dimensional data applications, however, computing these measures as re-substitution statistics on the same data used for model development results in highly biased estimates. Most developments in methodology for survival risk modeling with high-dimensional data have utilized separate test data sets for model evaluation. Cross-validation has sometimes been used for optimization of tuning parameters. In many applications, however, the data available are too limited for effective division into training and test sets and consequently authors have often either reported re-substitution statistics or analyzed their data using binary classification methods in order to utilize familiar cross-validation. In this article we have tried to indicate how to utilize cross-validation for the evaluation of survival risk models; specifically how to compute cross-validated estimates of survival distributions for predicted risk groups and how to compute cross-validated time-dependent ROC curves. We have also discussed evaluation of the statistical significance of a survival risk model and evaluation of whether high-dimensional genomic data adds predictive accuracy to a model based on standard covariates alone.
    52. R Development Core Team. R: A language and environment for statistical computing; R Foundation for Statistical Computing: Vienna, Austria, 2011.
    53. (a) Kuhn, M. Building Predictive Models in R Using the caret Package. J. Stat. Software 2008, 28, 1–26.
      (b) Kuhn, M.; Wing, J.; Weston, S.; Williams, A.; Keefer, C.; Engelhardt, A.; Cooper, T.; Mayer, Z.; R Core Team. caret: Classification and Regression Training. R Package “caret”. http://CRAN.R-project.org/package=caret.
    54. Walters, W. P. Modeling, Informatics, and the Quest for Reproducibility. J. Chem. Inf. Model. 2013, 53 (7), 1529–1530.
      A review. There is no doubt that papers published in the Journal of Chem. Information and Modeling, and related journals, provide valuable scientific information. However, it is often difficult to reproduce the work described in mol. modeling and chemoinformatics papers. In many cases the software described in the paper is not readily available, in other cases the supporting information is not provided in an accessible format. To date, the major journals in the fields of mol. modeling and chemoinformatics have not established guidelines for reproducible research. This letter provides an overview of the reproducibility challenges facing our field and suggests some guidelines for improving the reproducibility of published work.
    55. (a) Dearden, J. C. In silico prediction of aqueous solubility. Expert Opin. Drug Discovery 2006, 1 (1), 31–52.
      A review. The fundamentals of aq. soly., and the factors that affect it, are briefly outlined, followed by a short introduction to quant. structure-property relationships. Early (pre-1990) work on aq. soly. prediction is summarized, and a more detailed presentation and crit. discussion are given of the results of most, if not all, of those published in silico prediction studies from 1990 onwards that have used diverse training sets. A table is presented of a no. of studies that have used a 21-compd. test set of drugs and pesticides to validate their aq. soly. models. Finally, the results are given of a test of 15 com. available software programs for aq. soly. prediction, using a test set of 122 drugs with accurately measured aq. solubilities.
      https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD28XhtVChtL%252FO&md5=1daa383519ba4ffc887c773847d8a656
      (b) Jorgensen, W. L.; Duffy, E. M. Prediction of drug solubility from structure. Adv. Drug Delivery Rev. 2002, 54 (3), 355–366.
      Jorgensen, William L.; Duffy, Erin M.
      Advanced Drug Delivery Reviews (2002), 54 (3), 355-366CODEN: ADDREP; ISSN:0169-409X. (Elsevier Science Ireland Ltd.)
      A review with refs. The aq. soly. of a drug is an important factor affecting its bioavailability. Numerous computational methods have been developed for the prediction of aq. soly. from a compd.'s structure. A review is provided of the methodol. and quality of results for the most useful procedures including the model implemented in the QikProp program. Viable methods now exist for predictions with <1 log unit uncertainty, which is adequate for prescreening synthetic candidates or design of combinatorial libraries. Further progress with predictive methods would require an exptl. database of highly accurate solubilities for a large, diverse collection of drug-like mols.
      https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD38Xitlartbc%253D&md5=bc749286d56bf55c26d25b70806217e1
    56.
      Lusci, A.; Pollastri, G.; Baldi, P. Deep Architectures and Deep Learning in Chemoinformatics: The Prediction of Aqueous Solubility for Drug-Like Molecules. J. Chem. Inf. Model. 2013, 53, 1563–1575.
      Deep Architectures and Deep Learning in Chemoinformatics: The Prediction of Aqueous Solubility for Drug-Like Molecules
      Lusci, Alessandro; Pollastri, Gianluca; Baldi, Pierre
      Journal of Chemical Information and Modeling (2013), 53 (7), 1563-1575CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)
      A review. Shallow machine learning methods have been applied to chemoinformatics problems with some success. As more data becomes available and more complex problems are tackled, deep machine learning methods may also become useful. Here, we present a brief overview of deep learning methods and show in particular how recursive neural network approaches can be applied to the problem of predicting mol. properties. However, mols. are typically described by undirected cyclic graphs, while recursive approaches typically use directed acyclic graphs. Thus, we develop methods to address this discrepancy, essentially by considering an ensemble of recursive neural networks assocd. with all possible vertex-centered acyclic orientations of the mol. graph. One advantage of this approach is that it relies only minimally on the identification of suitable mol. descriptors because suitable representations are learned automatically from the data. Several variants of this approach are applied to the problem of predicting aq. soly. and tested on four benchmark data sets. Exptl. results show that the performance of the deep learning methods matches or exceeds the performance of other state-of-the-art methods according to several evaluation metrics and expose the fundamental limitations arising from training sets that are too small or too noisy. A Web-based predictor, AquaSol, is available online through the ChemDB portal (cdb.ics.uci.edu) together with addnl. material.
      https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3sXpvVGht7g%253D&md5=d51e537fea2f1f53ea5013224ee1cdc9
    57.
      Wang, R.; Gao, Y.; Lai, L. Calculating partition coefficient by atom-additive method. Perspect. Drug Discovery Des. 2000, 19 (1), 47–66.
      Calculating partition coefficient by atom-additive method
      Perspectives in Drug Discovery and Design (2000), 19 (Hydrophobicity and Solvation in Drug Design, Pt. 3), 47-66CODEN: PDDDEC; ISSN:0928-2866. (Kluwer Academic Publishers)
      A new atom-additive method is presented for calcg. octanol/H2O partition coeff. (log P) of org. compds. The method, XLOGP v2.0, gives log P values by summing the contributions of component atoms and correction factors. Altogether 90 atom types are used to classify C, N, O, S, P and halogen atoms, and 10 correction factors are used for some special substructures. The contributions of each atom type and correction factor are derived by multivariate regression anal. of 1853 org. compds. with known exptl. log P values. The correlation coeff. (r) for fitting the whole set is 0.973 and the std. deviation (s) is 0.349 log units. Comparison of various log P calcn. procedures demonstrates that method gives much better results than other atom-additive approaches and is at least comparable to fragmental approaches. Because of the simple methodol., the missing fragment problem does not occur in method.
      https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD3cXnslaitbg%253D&md5=90d29fc6b268c9af644b244eb0bc6912
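The atom-additive scheme summarized in this abstract (log P as a sum of atom-type contributions plus correction factors) can be illustrated with a minimal Python sketch. The atom-type names and numerical values below are invented placeholders, not the published XLOGP v2.0 parameters, and the function name is hypothetical; the sketch also assumes atom typing has already been done elsewhere.

```python
from collections import Counter

# Placeholder values for illustration only. These are NOT the published
# XLOGP v2.0 parameters, and the type names are invented.
ATOM_CONTRIBUTIONS = {
    "C.sp3": 0.5,
    "C.ar": 0.25,
    "O.hydroxyl": -0.5,
}
CORRECTION_FACTORS = {
    "intramolecular_hbond": 0.75,
}


def additive_log_p(atom_types, corrections=()):
    """Sum per-atom contributions plus substructure correction factors.

    atom_types: iterable with one atom-type label per atom.
    corrections: iterable with one label per matched special substructure.
    Raises KeyError if a label has no tabulated contribution.
    """
    atom_counts = Counter(atom_types)
    correction_counts = Counter(corrections)
    atom_term = sum(ATOM_CONTRIBUTIONS[t] * n for t, n in atom_counts.items())
    correction_term = sum(
        CORRECTION_FACTORS[c] * n for c, n in correction_counts.items()
    )
    return atom_term + correction_term


# Toy input: two sp3 carbons and one hydroxyl oxygen, no corrections.
print(additive_log_p(["C.sp3", "C.sp3", "O.hydroxyl"]))
```

In a real implementation the contribution tables would come from a regression against experimental log P values, as the abstract describes, and the atom types would be assigned by a cheminformatics toolkit.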
    58.
      Kier, L. B.; Hall, L. H. Molecular Connectivity in Chemistry and Drug Research; Academic Press: New York, 1976.
    59.
      Moreau, G.; Broto, P. The autocorrelation of a topological structure: A new molecular descriptor. New J. Chem. 1980, 359–360.
    60.
      Randic, M. On molecular identification numbers. J. Chem. Inf. Comput. Sci. 1984, 24 (3), 164–175.
      Randic, Milan
      Journal of Chemical Information and Computer Sciences (1984), 24 (3), 164-75CODEN: JCISD8; ISSN:0095-2338.
      The assignment of identification nos. to mols. that are easy to deriv. and have structural significance is discussed and a scheme for assignment is outlined. Output of the ALL-PATH program for study of mol. topol. from graphs with multiple connections is presented which includes weighing factors for individual bonds. Uniqueness and structural significance of the identification nos. are examd. and mol. graphs and identification nos. of some ring compds., terpenes, and some other compds. are presented.
      https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADyaL2cXkvV2rtLs%253D&md5=355a2f5987cde26550a10bf7ef475d9c
    61.
      CDK Descriptor Summary (2011-05-28). http://pele.farmbio.uu.se/nightly-1.2.x/dnames.html, accessed Feb. 10, 2014.
    62.
      Hewitt, M.; Cronin, M. T. D.; Enoch, S. J.; Madden, J. C.; Roberts, D. W.; Dearden, J. C. In Silico Prediction of Aqueous Solubility: The Solubility Challenge. J. Chem. Inf. Model. 2009, 49 (11), 2572–2587.
      In Silico Prediction of Aqueous Solubility: The Solubility Challenge
      Hewitt, M.; Cronin, M. T. D.; Enoch, S. J.; Madden, J. C.; Roberts, D. W.; Dearden, J. C.
      Journal of Chemical Information and Modeling (2009), 49 (11), 2572-2587CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)
      The dissoln. of a chem. into water is a process fundamental to both chem. and biol. The persistence of a chem. within the environment and the effects of a chem. within the body are dependent primarily upon aq. soly. With the well-documented limitations hindering the accurate exptl. detn. of aq. soly., the utilization of predictive methods have been widely investigated and employed. The setting of a soly. challenge by this journal proved an excellent opportunity to explore several different modeling methods, utilizing a supplied dataset of high-quality aq. soly. measurements. Four contrasting approaches (simple linear regression, artificial neural networks, category formation, and available in silico models) were utilized within our lab. and the quality of these predictions was assessed. These were chosen to span the multitude of modeling methods now in use, while also allowing for the evaluation of existing com. soly. models. The conclusions of this study were surprising, in that a simple linear regression approach proved to be superior over more-complex modeling methods. Possible explanations for this observation are discussed and also recommendations are made for future soly. prediction.
      https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD1MXhtleqtLvE&md5=7983f8d3133655a4d8967d5b7e9fbdbd
    63.
      Tsvetkova, B.; Pencheva, I.; Zlatkov, A.; Peikov, P. High Performance Liquid Chromatographic Assay of Indomethacin and its Related Substances in Tablet Dosage Forms. Int. J. Pharm. Pharm. Sci. 2012, 4 (Supplement 3), 549–552.
      High performance liquid chromatographic assay of indomethacin and its related substances in tablet dosage forms
      Tsvetkova, Boyka; Pencheva, Ivanka; Zlatkov, Alexander; Peikov, Plamen
      International Journal of Pharmacy and Pharmaceutical Sciences (2012), 4 (Suppl. 3), 549-552CODEN: IJPPKB; ISSN:0975-1491. (International Journal of Pharmacy and Pharmaceutical Sciences)
      A reversed-phase high performance liq. chromatog. (RP-HPLC) method with UV detection was proposed for sepn. of indomethacin and its impurities from tablet dosage forms. The best sepn. was achieved on a LiChrosorb C18, 250 mm × 4.6 mm, 5 μm column at a detector wavelength of 240 nm. The utilization of mixt. of 40 vols. 0.5% vol./vol. orthophosphoric acid, 20 vols. of methanol and 40 vols. of acetonitrile as mobile phase with a flow rate of 2 mL/min enabled acceptable resoln. of indomethacin, in large excess, from possible impurities, in a short elution time (9 min). Anal. parameters linearity, accuracy, precision and specificity were detd. by validation procedure and found to be satisfactory. Overall, the proposed method was found to be simple, rapid, precise and accurate for quality control of indomethacin and its impurities in dosage forms and in raw materials. In this work the kinetic investigation of the alk. hydrolysis of indomethacin was also carried out. The degrdn. reaction was monitored by means of HPLC method developed and was found to follow first-order kinetics. The rate const. and half-life of the hydrolytic decompn. were estd.
      https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC38XptFamurw%253D&md5=dc420050b9fdb9f5e917016ff64b06a4
  • Supporting Information

    Informatics_Solubilty_datasets_and_scripts.zip, including R code, Bash scripts, Python scripts, a macro (.xlsb), DLS-100.csv and Solubility_Challenge_dataset.xlsx. DLS-100.csv contains experimental log S values, references, SMILES, sources of the SMILES, CSD refcodes, molecule names, InChI and ChemSpider numbers. SI_document.pdf: structure data, 2D images of the molecular structures, experimental log S values, CSD refcodes, R2, statistical significance, variable importance. This material is available free of charge via the Internet at http://pubs.acs.org. All scripts and datasets used in this work are available for download from the Mitchell Group web server (http://chemistry.st-andrews.ac.uk/staff/jbom/group/Informatics_Solubility.html), as well as in the Supporting Information.
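
As a rough illustration of how a table such as DLS-100.csv might be read, the Python sketch below maps molecule names to experimental log S values using only the standard library. The column names `Name` and `LogS` and the function name are assumptions made for this example; the actual header of DLS-100.csv should be checked and the real column names passed in. The toy rows are made-up placeholders, not values from the dataset.

```python
import csv
import io


def read_log_s(csv_file, name_column="Name", log_s_column="LogS"):
    """Return {molecule name: experimental log S} from an open CSV file.

    The default column names are placeholders (an assumption of this
    sketch), not the documented header of DLS-100.csv.
    """
    values = {}
    for row in csv.DictReader(csv_file):
        values[row[name_column]] = float(row[log_s_column])
    return values


# Toy in-memory table with made-up entries, standing in for the real file.
toy = io.StringIO("Name,LogS\nmol_A,-2.5\nmol_B,-4.0\n")
print(read_log_s(toy))
```

For the real file, the same function could be called on `open("DLS-100.csv", newline="")` once the correct column names are known.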


Chemistry Development Kit
Original author(s): Christoph Steinbeck, Egon Willighagen, Dan Gezelter
Developer(s): The CDK Project
Initial release: 11 May 2001[1]
Stable release: 2.2 (October 30, 2018)[2]
Preview release: 1.5.14 (October 9, 2016)
Repository: github.com/cdk/cdk
Written in: Java
Operating system: Windows, Linux, Unix, macOS
Platform: IA-32, x86-64
Available in: English
Type: Chemoinformatics, molecular modelling, bioinformatics
License: LGPL 2.0
Website: cdk.github.io

The Chemistry Development Kit (CDK) is computer software, a library in the programming language Java, for chemoinformatics and bioinformatics.[3][4] It is available for Windows, Linux, Unix, and macOS. It is free and open-source software distributed under the GNU Lesser General Public License (LGPL) 2.0.

History

The CDK was created on 27–29 September 2000 at the University of Notre Dame by Christoph Steinbeck, Egon Willighagen and Dan Gezelter, then developers of Jmol and JChemPaint, to provide a common code base. The first source code release was made on 11 May 2001.[5] Since then, more than 100 people have contributed to the project,[6] leading to the rich set of functions listed below. Between 2004 and 2007, CDK News was the project's newsletter; all of its articles are available from a public archive.[7] Because contributions arrived at an unsteady rate, the newsletter was put on hold.

CDK News
Language: English
Edited by: Egon Willighagen, Christoph Steinbeck
Publication history: 2004–2007
Standard abbreviation: CDK News
ISSN: 1614-7553

Later, unit testing, code quality checking, and Javadoc validation were introduced. Rajarshi Guha developed a nightly build system, named Nightly, which is still operating at Uppsala University.[8] In 2012, the project became a supporter of the InChI Trust, to encourage continued development. The library uses JNI-InChI[9] to generate International Chemical Identifiers (InChIs).[10] In April 2013, John Mayfield (né May) joined the ranks of release managers of the CDK, to handle the development branch.[11]

Library

The CDK is a library rather than an end-user program. However, it has been integrated into various environments to make its functions available. CDK is currently used in several applications, including the programming language R,[12] CDK-Taverna (a Taverna workbench plugin),[13] Bioclipse, PaDEL,[14] and Cinfony.[15] CDK extensions also exist for the Konstanz Information Miner (KNIME)[16] and for Excel, the latter called LICSS.[17]

In 2008, pieces of GPL-licensed code were removed from the library. Although that code was independent of the main CDK library and no copylefting was involved, the ChemoJava project was created to reduce confusion among users.[18]

Major features

Chemoinformatics

  • 2D molecule editor and generator
  • 3D geometry generation
  • ring finding[19][20]
  • substructure search using exact structures and a SMILES arbitrary target specification (SMARTS)-like query language
  • QSAR descriptor calculation[21]
  • fingerprint calculation, including the ECFP and FCFP fingerprints[22]
  • force field calculations
  • many input-output chemical file formats, including simplified molecular-input line-entry system (SMILES), Chemical Markup Language (CML), and chemical table file (MDL)
  • structure generators[23]
  • International Chemical Identifier support, via JNI-InChI

Bioinformatics

  • protein active site detection
  • cognate ligand detection[24]
  • metabolite identification[25]
  • pathway databases
  • 2D and 3D protein descriptors[26]

General

  • Python wrapper; see Cinfony
  • Ruby wrapper
  • active user community

See also

  • Bioclipse – an Eclipse–RCP based chemo-bioinformatics workbench
  • JChemPaint – Java 2D molecule editor, applet and application
  • Jmol – Java 3D renderer, applet and application
  • JOELib – Java version of Open Babel, OELib

References

  1. https://sourceforge.net/projects/cdk/files/OldFiles/
  2. 'cdk/cdk: CDK 2.2'. ZENODO. 2018-10-30. doi:10.5281/zenodo.1474247.
  3. Steinbeck, C.; Han, Y. Q.; Kuhn, S.; Horlacher, O.; Luttmann, E.; Willighagen, E. L. (2003). 'The Chemistry Development Kit (CDK): An open-source Java library for chemo- and bioinformatics'. Journal of Chemical Information and Computer Sciences. 43 (2): 493–500. doi:10.1021/ci025584y. PMC 4901983. PMID 12653513.
  4. Willighagen, Egon L.; Mayfield, John W.; Alvarsson, Jonathan; Berg, Arvid; Carlsson, Lars; Jeliazkova, Nina; Kuhn, Stefan; Pluskal, Tomáš; Rojas-Chertó, Miquel (2017-06-06). 'The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching'. Journal of Cheminformatics. 9 (1): 33. doi:10.1186/s13321-017-0220-4. ISSN 1758-2946. PMC 5461230. PMID 29086040.
  5. http://sourceforge.net/projects/cdk/files/OldFiles/
  6. https://github.com/cdk/cdk/blob/master/AUTHORS.txt
  7. https://sourceforge.net/projects/cdk/files/CDK%20News/
  8. 'Archived copy'. Archived from the original on 2013-05-24. Retrieved 2013-08-05.
  9. http://jni-inchi.sourceforge.net/
  10. Spjuth, O.; Berg, A.; Adams, S.; Willighagen, E. L. (2013). 'Applications of the InChI in cheminformatics with the CDK and Bioclipse'. Journal of Cheminformatics. 5 (1): 14. doi:10.1186/1758-2946-5-14. PMC 3674901. PMID 23497723.
  11. http://chem-bla-ics.blogspot.nl/2013/04/john-may-is-now-release-manager-of-cdk.html
  12. Guha, R. (2007). 'Chemical informatics functionality in R'. Journal of Statistical Software. 18 (5): 1–16. doi:10.18637/jss.v018.i05.
  13. Kuhn, T.; Willighagen, E. L.; Zielesny, A.; Steinbeck, C. (2010). 'CDK-Taverna: an open workflow environment for cheminformatics'. BMC Bioinformatics. 11: 159. doi:10.1186/1471-2105-11-159. PMC 2862046. PMID 20346188.
  14. Yap, C. W. (2011). 'PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints'. Journal of Computational Chemistry. 32 (7): 1466–74. doi:10.1002/jcc.21707. PMID 21425294.
  15. O'Boyle, Noel M (2008). 'Cinfony – combining Open Source cheminformatics toolkits behind a common interface'. Chemistry Central Journal. 2 (1): 24. doi:10.1186/1752-153X-2-24. PMC 2646723. PMID 19055766.
  16. Beisken, S.; Meinl, T.; Wiswedel, B.; De Figueiredo, L. F.; Berthold, M.; Steinbeck, C. (2013). 'KNIME-CDK: Workflow-driven Cheminformatics'. BMC Bioinformatics. 14: 257. doi:10.1186/1471-2105-14-257. PMC 3765822. PMID 24103053.
  17. Lawson, K. R.; Lawson, J. (2012). 'LICSS - a chemical spreadsheet in microsoft excel'. Journal of Cheminformatics. 4 (1): 3. doi:10.1186/1758-2946-4-3. PMC 3310842. PMID 22301088.
  18. ChemoJava
  19. Berger, Franziska; Flamm, Christoph; Gleiss, Petra M.; Leydold, Josef; Stadler, Peter F. (March 2004). 'Counterexamples in Chemical Ring Perception'. Journal of Chemical Information and Computer Sciences. 44 (2): 323–331. doi:10.1021/ci030405d. PMID 15032507.
  20. May, John W; Steinbeck, Christoph (2014). 'Efficient ring perception for the Chemistry Development Kit'. Journal of Cheminformatics. 6 (1): 3. doi:10.1186/1758-2946-6-3. PMC 3922685. PMID 24479757.
  21. Steinbeck, C.; Hoppe, C.; Kuhn, S.; Floris, M.; Guha, R.; Willighagen, E. L. (2006). 'Recent developments of the chemistry development kit (CDK) - an open-source java library for chemo- and bioinformatics'. Curr. Pharm. Des. 12 (17): 2111–20. doi:10.2174/138161206777585274. PMID 16796559. Archived from the original on 2011-07-25.
    Guangli, M.; Yiyu, C. (2006). 'Predicting Caco-2 permeability using support vector machine and chemistry development kit'. J Pharm Pharm Sci. 9 (2): 210–21. PMID 16959190.
  22. Clark, Alex M; Sarker, Malabika; Ekins, Sean (2014). 'New target prediction and visualization tools incorporating open source molecular fingerprints for TB Mobile 2.0'. Journal of Cheminformatics. 6: 38. doi:10.1186/s13321-014-0038-2. PMC 4190048. PMID 25302078.
  23. Peironcely, J. E.; Rojas-Chertó, M.; Fichera, D.; Reijmers, T.; Coulier, L.; Faulon, J. L.; Hankemeier, T. (2012). 'OMG: Open molecule generator'. Journal of Cheminformatics. 4 (1): 21. doi:10.1186/1758-2946-4-21. PMC 3558358. PMID 22985496.
  24. Bashton, M.; Nobeli, I.; Thornton, J. M. (2006). 'Cognate Ligand Domain Mapping for Enzymes'. Journal of Molecular Biology. 364 (4): 836–52. doi:10.1016/j.jmb.2006.09.041. PMID 17034815.
  25. Rojas-Cherto, M.; Kasper, P. T.; Willighagen, E. L.; Vreeken, R. J.; Hankemeier, T.; Reijmers, T. H. (2011). 'Elemental composition determination based on MSn'. Bioinformatics. 27 (17): 2376–2383. doi:10.1093/bioinformatics/btr409. PMID 21757467.
  26. Ruiz-Blanco, Yasser B; Paz, Waldo; Green, James; Marrero-Ponce, Yovani (2015). 'ProtDCal: A program to compute general-purpose-numerical descriptors for sequences and 3D-structures of proteins'. BMC Bioinformatics. 16: 162. doi:10.1186/s12859-015-0586-0. PMC 4432771. PMID 25982853.

External links

  • CDK Wiki – the community wiki
  • Planet CDK - a blog planet
Retrieved from 'https://en.wikipedia.org/w/index.php?title=Chemistry_Development_Kit&oldid=897870595'