A Trainable In-silico Screening Filter for Various Human Cytochrome P450 Isoforms Inhibition
Authors: P. Japertas(a), R. Didziapetris(a), J. Dapkunas(a,b), and A. Sazonovas(a,c)
This study focuses on the development and validation of a series of in-silico models that distinguish between inhibitors and non-inhibitors of the cytochrome P450 isoforms 3A4, 2D6, 2C9, 2C19, and 1A2.
Inhibition constant thresholds of 10 and 50 µM were used to classify compounds as inhibitors or non-inhibitors of each CYP isoform. The initial data sets ranged from ca. 5000 to 8000 compounds for the five enzyme isoforms considered and were compiled from the literature and the PubChem screening database. A novel GALAS (Global, Adjusted Locally According to Similarity) modeling methodology was applied, using a predefined set of molecular fragments as descriptors. A key feature of this methodology is the ability to evaluate prediction quality quantitatively through calculated Reliability Index (RI) values.
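The threshold-based labeling described above can be illustrated with a minimal sketch. It assumes that a lower inhibition constant means stronger inhibition and that a value exactly at the threshold counts as an inhibitor; neither convention is stated in the abstract. The function name and the compound values are hypothetical.

```python
def label_inhibitor(inhibition_constant_um, threshold_um=10.0):
    """Return True if the compound is labeled an inhibitor at this threshold.

    Assumption (not from the abstract): lower constants mean stronger
    inhibition, and a value equal to the threshold counts as an inhibitor.
    """
    return inhibition_constant_um <= threshold_um


# Hypothetical inhibition constants in µM, for illustration only.
compounds = {"cmpd_A": 2.5, "cmpd_B": 30.0, "cmpd_C": 120.0}

for threshold in (10.0, 50.0):
    labels = {name: label_inhibitor(value, threshold)
              for name, value in compounds.items()}
    print(f"threshold {threshold} µM: {labels}")
```

With two thresholds, a compound such as the hypothetical cmpd_B changes class depending on which threshold is chosen, which is why the threshold has to be fixed before models are compared.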
The obtained RI values correlate with prediction accuracy. Predictions with low RI fall outside the applicability domain of the models and should be disregarded. For predictions with acceptable RI values, accuracy approaches 90% in all five internal test sets (each comprising 20% of the corresponding initial database). All models were further subjected to a more sophisticated external validation using the latest data from the PubChem screening program. For example, in the case of CYP3A4 inhibition, it yielded results similar to those obtained on the internal test set (88% accuracy when RI > 0.3). The trainability of the models was assessed by training the CYP3A4 inhibition model, built on the literature data set, with the PubChem library while reserving half of the PubChem data as a test set. After 5% of the training library had been added, fewer than 50% of the test set predictions had acceptable reliability (RI > 0.3), and the share of high-reliability predictions (RI > 0.5) barely exceeded 10%. Subsequent additions of library portions steadily increased these fractions, which reached ca. 85% and 60%, respectively, once the whole library had been added.
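The two quantities tracked above, accuracy among predictions that pass an RI cutoff and the fraction of predictions that pass it, can be computed as in the following sketch. The function name and the toy data are hypothetical, and the way RI itself is calculated is not shown because the abstract does not describe it.

```python
def accuracy_and_coverage(predictions, ri_threshold):
    """Compute accuracy and coverage for predictions above an RI cutoff.

    predictions: list of (predicted_label, true_label, ri) tuples.
    Returns (accuracy among retained predictions, fraction retained).
    Accuracy is None when no prediction exceeds the cutoff.
    """
    retained = [(pred, true) for pred, true, ri in predictions
                if ri > ri_threshold]
    if not retained:
        return None, 0.0
    correct = sum(1 for pred, true in retained if pred == true)
    return correct / len(retained), len(retained) / len(predictions)


# Hypothetical test-set predictions: (predicted, experimental, RI).
toy = [
    (1, 1, 0.8),
    (0, 0, 0.6),
    (1, 0, 0.2),
    (0, 0, 0.4),
    (1, 1, 0.1),
    (0, 1, 0.35),
]

for cutoff in (0.3, 0.5):
    acc, cov = accuracy_and_coverage(toy, cutoff)
    print(f"RI > {cutoff}: accuracy={acc}, coverage={cov:.2f}")
```

Repeating this calculation after each portion of a training library is added would give the kind of coverage curve reported for the trainability experiment.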
The resulting models are valuable computational filters for early drug discovery, identifying compounds that may carry an unwanted cytochrome P450 inhibition liability. The GALAS modeling methodology enables fast and efficient model training, so the applicability domain of the current models can be extended and the models adjusted to screen proprietary databases for potential CYP inhibitors.
Modeling Toxicity of Chemicals to Aquatic Organisms
Authors: P. Japertas(a), K. Lanevskij(a,b), L. Juska(a,b), R. Didziapetris(a)
This study focuses on the application of the recently introduced GALAS modeling methodology to the development of predictive models for estimating the toxicity of new chemicals to several aquatic species. The experimental data used for analysis were expressed as the median lethal concentration of the test compound in water (LC50), representing the compounds’ toxicity to fish and crustaceans. The overall data set, collected from the literature, contained toxicities of 900 compounds to fathead minnows (Pimephales promelas) and almost 600 LC50 values determined for water fleas (Daphnia magna).
Each GALAS model consists of two parts, the first one being a global QSAR reflecting the general trends (baseline toxicity prediction), while the second part accounts for more specific effects by introducing local corrections to the baseline values based on the analysis of experimental data for similar compounds. One of the major benefits of the underlying methodology is the ability to estimate prediction reliability by means of calculated Reliability Index values. Also, new experimental data can be added to expand the applicability domain of these models without full statistical reparameterization (trainability feature).
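The two-part structure described above, a global baseline followed by a similarity-based local correction, can be sketched as follows. This is an illustration of the general idea only, not the published GALAS algorithm: the additive fragment baseline, the Tanimoto similarity, the k-nearest-neighbor residual averaging, and all names and values are assumptions made for the example.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of fragment identifiers."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def baseline(fragments, weights, intercept=0.0):
    """Global part: additive fragment contributions (hypothetical weights)."""
    return intercept + sum(weights.get(f, 0.0) for f in fragments)


def corrected_prediction(query, training, weights, k=3):
    """Local part: shift the baseline using residuals of similar compounds.

    training: list of (fragment_set, experimental_value) pairs.
    Each of the k most similar training compounds contributes its residual
    (experimental minus baseline) scaled by its similarity to the query, so
    dissimilar neighbors change the prediction very little.
    """
    base = baseline(query, weights)
    scored = sorted(
        ((tanimoto(query, frags), obs - baseline(frags, weights))
         for frags, obs in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    if not scored:
        return base
    correction = sum(sim * resid for sim, resid in scored) / len(scored)
    return base + correction


# Hypothetical fragment weights and training data, for illustration only.
weights = {"OH": 0.5, "Cl": 1.0, "C6H5": 2.0}
training = [({"OH", "C6H5"}, 3.1), ({"Cl"}, 0.7)]

print(corrected_prediction({"OH", "C6H5"}, training, weights, k=1))
print(corrected_prediction({"NO2"}, training, weights, k=1))
```

In this sketch, adding a new experimental value only extends the `training` list and leaves `weights` untouched, which mirrors the trainability idea of expanding the applicability domain without refitting the global model.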
The modeling approach used here for aquatic toxicity predictions was validated by applying the same principles to develop a new model that predicts IGC50 (50% inhibitory growth concentration) for the protozoan Tetrahymena pyriformis. This model was submitted as an entry to an environmental toxicity prediction challenge hosted by the CADASTER project. The final model, derived using known IGC50 values for 644 compounds, was identified among the winners, achieving RMSE
Probabilistic Model for the Prediction of the Human Liver Microsomal Metabolism Regioselectivity
Authors: P. Japertas(a), J. Dapkunas(a,b), and A. Sazonovas(a,c)
Cytochromes P450 are the main enzymes involved in the metabolism of drugs and other xenobiotics in the human body. In this work, we present a model for in silico prediction of the most probable sites of human liver microsomal (HLM) metabolism in a molecule. The developed models calculate, for any atom in a molecule, the probability of being a target of human cytochrome P450 enzymes (CYP3A4, CYP2D6, CYP2C9, CYP2C19, CYP1A2) and allow the most probable phase I metabolites to be forecast. The novel GALAS (Global, Adjusted Locally According to Similarity) modeling methodology was used to develop the probabilistic models. This technique allows dynamic determination of similarity within the model space, subsequent correction of the baseline predictions according to experimental values for the most similar compounds in the training set, and estimation of the final prediction quality.
Experimental data on HLM and cytochrome P450 metabolism for 873 compounds with >9000 different atoms (1324 metabolism sites) were used for modeling. Five baseline models were developed for five types of atoms considered in the modeling of HLM metabolism (aromatic carbon, aliphatic carbon, carbon near nitrogen, carbon near oxygen, and sulfur). Final GALAS models provide a list of all the atoms with predicted probabilities to undergo metabolic transformations in human liver microsomes.
As a result of applying the GALAS modeling concept, each prediction of the proposed models is accompanied by a quantitative estimate of its quality in the form of a calculated Reliability Index (RI). This quantity is shown to correlate with prediction accuracy: the numbers of both mispredictions and inconclusive results fall significantly when only high-quality results (RI > 0.5) are taken into account, demonstrating that RI is suitable for assessing the Applicability Domain of the models presented in this work. Moreover, as demonstrated by examples, the Applicability Domain of these models can easily be expanded to cover specific compound classes of user interest with the help of ‘in-house’ databases containing experimental metabolism data. In addition, training the corresponding baseline models with experimental data on metabolism by individual CYP450 isoforms allowed each predicted metabolism site to be attributed to one or more particular enzymes (CYP3A4, CYP2D6, CYP2C9, CYP2C19, or CYP1A2).
(a) ACD/Labs, Inc., A.Mickeviciaus g. 29, LT-08117 Vilnius, Lithuania
(b) Faculty of Natural Sciences, Vilnius University, M.K.Ciurlionio g. 21/27, LT-03101 Vilnius, Lithuania
(c) Faculty of Chemistry, Vilnius University, Naugarduko g. 24, LT-03225 Vilnius, Lithuania