Probabilistic Model for the Prediction of the Human Liver Microsomal Metabolism Regioselectivity
Authors: P. Japertas (a), J. Dapkunas (a,b), and A. Sazonovas (a,c)
Cytochromes P450 are the main enzymes involved in the metabolism of drugs and other xenobiotics within the human organism. In this work, we present models for in silico prediction of the most probable sites of human liver microsomal (HLM) metabolism in a molecule. The developed models calculate, for any atom in a molecule, the probability of being a target of human cytochrome P450 enzymes (CYP3A4, CYP2D6, CYP2C9, CYP2C19, CYP1A2), and allow forecasting of the most probable phase I metabolites. The novel GALAS (Global, Adjusted Locally According to Similarity) modeling methodology was used to develop the probabilistic models. This technique allows dynamic determination of similarity within the model space, subsequent correction of the baseline predictions according to experimental values for the most similar compounds in the training set of the model, and estimation of the final prediction quality.
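The idea of a global baseline that is adjusted locally according to similarity can be illustrated with a minimal Python sketch. It is not the published GALAS algorithm: the similarity measure, the neighbour count k, the additive correction and the reliability-like score are all simplifying assumptions chosen for illustration, and the names tanimoto_like and locally_adjusted are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def tanimoto_like(a: Vector, b: Vector) -> float:
    """Continuous Tanimoto similarity; lies in [0, 1] for non-negative vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom > 0 else 0.0


def locally_adjusted(
    query: Vector,
    baseline: Callable[[Vector], float],
    library: List[Tuple[Vector, float]],
    k: int = 3,
) -> Tuple[float, float]:
    """Return (adjusted prediction, reliability-like score).

    library holds (descriptor vector, experimental value) pairs.
    The correction is the similarity-weighted mean of the baseline
    residuals of the k most similar library compounds (an assumption,
    not the published formula).
    """
    scored = sorted(
        ((tanimoto_like(query, x), y - baseline(x)) for x, y in library),
        key=lambda t: t[0],
        reverse=True,
    )[:k]
    total = sum(s for s, _ in scored)
    if total == 0:
        # No similar compounds: fall back to the baseline, zero reliability.
        return baseline(query), 0.0
    delta = sum(s * d for s, d in scored) / total
    # Weighted spread of the residuals: inconsistent neighbours lower the score.
    spread = (sum(s * (d - delta) ** 2 for s, d in scored) / total) ** 0.5
    reliability = (total / len(scored)) / (1.0 + spread)
    return baseline(query) + delta, reliability
```

In this simplified form the score is high only when the neighbours are both similar to the query and consistent in how they deviate from the baseline, which mirrors the qualitative role of the Reliability Index described here.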
Experimental data on HLM and cytochrome P450 metabolism for 873 compounds with >9000 different atoms (1324 metabolism sites) were used for modeling. Five baseline models were developed for five types of atoms considered in the modeling of HLM metabolism (aromatic carbon, aliphatic carbon, carbon near nitrogen, carbon near oxygen, and sulfur). Final GALAS models provide a list of all the atoms with predicted probabilities to undergo metabolic transformations in human liver microsomes.
As a result of applying the GALAS modeling concept, each prediction of the proposed models is accompanied by a quantitative estimate of its quality in the form of a calculated Reliability Index (RI). This quantity is shown to correlate with prediction accuracy: the numbers of both mispredictions and inconclusive results fall significantly when only results of high quality (RI > 0.5) are taken into account, demonstrating that RI is suitable for assessing the Applicability Domain of the models presented in this work. Moreover, as demonstrated by examples, the Applicability Domain of these models can easily be expanded to cover specific compound classes of user interest with the help of ‘in-house’ databases containing experimental metabolism data. In addition, training the corresponding baseline models with experimental data on metabolism by individual CYP450 isoforms allowed each predicted metabolism site to be attributed to one or more particular enzymes (CYP3A4, CYP2D6, CYP2C9, CYP2C19, or CYP1A2).
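As a hedged illustration of how per-atom output might be consumed, the sketch below ranks atoms by predicted probability and sets aside predictions with RI at or below the 0.5 threshold mentioned above. The AtomPrediction record and the rank_sites function are hypothetical names, not part of any ACD/Labs interface, and the numbers in the test data are invented.

```python
from typing import Iterable, List, NamedTuple, Tuple


class AtomPrediction(NamedTuple):
    atom_index: int      # position of the atom in the molecule
    probability: float   # predicted probability of being a metabolism site
    ri: float            # Reliability Index of that prediction


def rank_sites(
    preds: Iterable[AtomPrediction], ri_cutoff: float = 0.5
) -> Tuple[List[AtomPrediction], List[AtomPrediction]]:
    """Split predictions into (reliable, uncertain) by RI.

    Reliable predictions (RI > ri_cutoff) are sorted from the most to the
    least probable site; uncertain ones keep their input order.
    """
    preds = list(preds)
    reliable = sorted(
        (p for p in preds if p.ri > ri_cutoff),
        key=lambda p: p.probability,
        reverse=True,
    )
    uncertain = [p for p in preds if p.ri <= ri_cutoff]
    return reliable, uncertain
```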
Mechanistic Prediction of Volume of Distribution: the Influence of Plasma and Tissue Binding
Authors: K. Lanevskij (a,b), R. Didziapetris (a), P. Japertas (a)
The aim of this study was to develop novel QSAR models for the prediction of two key pharmacokinetic properties of drugs: the extent of plasma protein binding and the apparent volume of distribution (Vd) in humans.
Experimental plasma protein binding data were represented by the overall percentage bound values for almost 1500 drugs and more than 300 human serum albumin affinity constants. Predictive models for both properties were developed using the recently introduced GALAS modeling methodology. This technique allows the reliability of the resulting predictions to be estimated by means of calculated Reliability Index (RI) values, and provides the basis for model trainability. A mechanistic approach was employed to model the volume of distribution, accounting for the influence of drug binding strength in both plasma and tissues. 800 Vd values collected from original literature sources were corrected for the free fraction in plasma, yielding ‘unbound Vd’ (Vdu) values that represent the drugs’ affinity to tissues. pVdu was then described by a nonlinear model in terms of simple physicochemical properties (log P and pKa).
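The correction for the free fraction can be sketched as follows. This assumes the common pharmacokinetic definition Vdu = Vd / fu, with the unbound fraction fu taken as 1 - %bound/100; the abstract does not spell out the formula or the sign convention behind the "p" in pVdu, so the sketch stops at Vdu and leaves the logarithmic transform to the caller. The function names are hypothetical.

```python
import math


def unbound_fraction(percent_bound: float) -> float:
    """Free fraction in plasma (fu) from the percentage bound."""
    if not 0.0 <= percent_bound < 100.0:
        raise ValueError("percent_bound must be in [0, 100)")
    return 1.0 - percent_bound / 100.0


def unbound_vd(vd: float, percent_bound: float) -> float:
    """Unbound volume of distribution, assuming Vdu = Vd / fu."""
    return vd / unbound_fraction(percent_bound)


def vd_from_unbound(vdu: float, percent_bound: float) -> float:
    """Inverse step: recover Vd from a (predicted) Vdu and a binding value."""
    return vdu * unbound_fraction(percent_bound)


# Illustrative values only: Vd = 1.0 L/kg and 90 % bound give Vdu of about 10 L/kg.
example_vdu = unbound_vd(1.0, 90.0)
example_log_vdu = math.log10(example_vdu)  # about 1.0 on a log10 scale
```

The inverse function shows why a measured plasma protein binding value can be substituted for a predicted one: under this assumption only the fu factor changes, while the tissue-affinity term Vdu is left untouched.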
The predictive power of the obtained models was first evaluated by their performance on internal test sets. The extent of human plasma protein binding is predicted with R2 close to 0.80 for %bound and an RMSE of about 0.5 log units for the albumin affinity constants, if only predictions of at least moderate reliability (as indicated by calculated RI values) are considered. Furthermore, external validation of the Vd model was performed using experimental data for an additional 100 compounds obtained after model development. Validation results illustrate good performance of the Vd model, with the RMSE of pVdu prediction being close to 0.4 log units in both internal and external test sets.
Good predictive power of the obtained models makes them valuable tools for initial screening of candidate compounds in the early stages of drug discovery. Moreover, clear mechanistic interpretation of the volume of distribution model in terms of the contribution of plasma and tissue binding allows simulating the effect of these processes on the Vd values of drugs as well as improving the prediction accuracy if experimental plasma protein binding data are available.
GALAS Modeling Methodology Applications in the Prediction of the Drug Safety Related Properties
Authors: A. Sazonovas (a,b), R. Didziapetris (a), J. Dapkunas (a,c), L. Juska (a,c) and P. Japertas (a)
Effective use of available third-party predictive algorithms for drug safety related properties in the pharmaceutical industry is severely hindered by several problems. The training set rarely covers the specific part of the chemical space occupied by the compounds that a certain company is working with, or a specific experimental protocol is used to measure the corresponding properties or activities ‘in house’. Therefore the need has long existed for a method that would allow any company to effectively tailor a third-party predictive algorithm to its specific needs using proprietary in-house data.
Here we present a few practical examples of the application of a novel GALAS (Global, Adjusted Locally According to Similarity) modeling methodology. It allows a user to expand the Applicability Domain of the existing ACD/Labs models with the help of a custom database of experimental values for the property of interest. The use of the method is illustrated with examples of its application in predicting CYP3A4 and hERG inhibition, which figure among the major factors contributing to the rising attrition rate, being responsible for various unwanted drug-drug interactions and cardiotoxicity, respectively. Each validation scenario has been specially set up to represent a specific problematic situation commonly encountered in the use of predictive QSAR models and to show the ability of the presented methodology to cope with these problems.
It is shown that only a relatively small number (5 to 10) of similar compounds needs to be added to substantially improve the prediction for a group of problematic compounds that is not represented in the original training set. The Reliability Index that is calculated for each prediction is also shown to be a suitable measure for the quantitative assessment of prediction quality, as indicated by a clear correlation between the RI and RMSE values.
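One simple way to inspect the relationship between RI and RMSE is to group predictions into RI bins and compute the RMSE within each bin. The sketch below does this in plain Python; the bin edges and the function name rmse_by_ri_bin are illustrative assumptions and do not reproduce the validation procedure used in this work.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple


def rmse_by_ri_bin(
    records: Iterable[Tuple[float, float, float]],
    edges: Sequence[float] = (0.0, 0.3, 0.5, 0.75, 1.0),
) -> Dict[Tuple[float, float], float]:
    """RMSE of (predicted - observed) within each RI bin.

    records holds (ri, predicted, observed) triples. Bins are half-open
    [lo, hi), except the last one, which also includes its upper edge.
    Empty bins are omitted from the result.
    """
    records = list(records)
    result: Dict[Tuple[float, float], float] = {}
    for lo, hi in zip(edges, edges[1:]):
        is_last = hi == edges[-1]
        sq_errors = [
            (pred - obs) ** 2
            for ri, pred, obs in records
            if lo <= ri < hi or (is_last and ri == hi)
        ]
        if sq_errors:
            result[(lo, hi)] = math.sqrt(sum(sq_errors) / len(sq_errors))
    return result
```

If RI is a useful quality measure, the RMSE values returned for the higher RI bins should be lower than those for the lower bins.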
Because ACD/Labs models improved in this way are updated instantly (the improvement takes effect the moment new compounds with experimental values are added to the similarity database, with no need to retrain the model), this method opens up wide new possibilities for their use in industry.
a ACD/Labs, Inc., A.Mickeviciaus g. 29, LT-08117 Vilnius, Lithuania
b Faculty of Natural Sciences, Vilnius University, M.K.Ciurlionio g. 21/27, LT-03101 Vilnius, Lithuania
c Faculty of Chemistry, Vilnius University, Naugarduko g. 24, LT-03225 Vilnius, Lithuania