
QSAR – 20th International Workshop on (Q)SAR in Environmental and Health Sciences

Poster Presentation

Physicochemical QSAR Analysis of hERG Inhibition Revisited: Temporal Validation and Transition to a Quantitative Model

Andrius Sazonovas, Head of Software Development, ACD/Labs

K. Lanevskij (a,b), R. Didziapetris (a,b), A. Sazonovas (a,b)

(a) VšĮ “Aukštieji algoritmai”, A. Mickevičiaus 29, LT-08117 Vilnius, Lithuania

(b) ACD/Labs, Inc., 8 King Street East, Suite 107, Toronto, Ontario, M5C 1B5, Canada


In a previous publication [1], we presented a compilation of literature data on the hERG inhibitory potential of more than 6600 drug-like compounds, together with a probabilistic classification model based on these data and a minimal set of readily interpretable physicochemical descriptors, such as logP, pKa, and molecular size and topology parameters. The main goals of the current follow-up study were: (1) to expand the database further with newly available experimental data; (2) to validate the previously derived model temporally by assessing its performance on new data; (3) to move the model to a quantitative scale so that actual inhibitory potencies can be estimated.

Curation of data from recent publications reporting experimental determination of hERG inhibition brought the database up to a total of about 9400 molecules. Validation of our previous model on a subset of almost 1000 new compounds from recent lead optimization studies showed almost no performance degradation compared to the original work, with overall classification accuracy close to 75%.
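The temporal validation step described above can be illustrated with a minimal Python sketch: records published after a cutoff year are held out and scored with a model built on older data. Everything here is hypothetical and for illustration only: the record fields (`year`, `logp`, `inhibitor`), the cutoff year, the toy records, and the one-line `toy_model` rule, which is a stand-in and not the published probabilistic model.

```python
def temporal_split(records, cutoff_year):
    """Split records into those available at model-building time and newer ones."""
    old = [r for r in records if r["year"] <= cutoff_year]
    new = [r for r in records if r["year"] > cutoff_year]
    return old, new


def accuracy(records, predict):
    """Fraction of records whose predicted class matches the measured class."""
    if not records:
        raise ValueError("no records to evaluate")
    hits = sum(1 for r in records if predict(r) == r["inhibitor"])
    return hits / len(records)


# Made-up toy records; not taken from the hERG database described in the text.
toy_records = [
    {"year": 2014, "logp": 4.1, "inhibitor": True},
    {"year": 2015, "logp": 1.2, "inhibitor": False},
    {"year": 2018, "logp": 3.8, "inhibitor": True},
    {"year": 2019, "logp": 0.9, "inhibitor": False},
    {"year": 2020, "logp": 3.5, "inhibitor": False},
    {"year": 2021, "logp": 4.4, "inhibitor": True},
]


def toy_model(record):
    """Hypothetical stand-in classifier: flags lipophilic compounds as inhibitors."""
    return record["logp"] > 3.0


# Assumed cutoff: compounds published after 2016 count as "new" data.
old_records, new_records = temporal_split(toy_records, cutoff_year=2016)
new_accuracy = accuracy(new_records, toy_model)
```

The point of the split is that the held-out compounds postdate the model, so the score reflects performance on chemistry the model could not have seen.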

The most challenging issue in quantitative modeling of hERG inhibition is the high prevalence of censored data, i.e., observations that provide open-ended intervals (e.g., IC50 > 30 µM) instead of exact values. The current study addresses this issue by combining a modern machine learning technique (gradient boosting) with a censored regression objective (accelerated failure time, AFT), which allows both quantitative and censored data to be used in modeling. The resulting AFT model relies on the same set of physicochemical descriptors as before and achieves classification accuracy similar to that of the probabilistic model, but unlike the latter, it is not tied to a fixed classification threshold. Moreover, it predicts IC50 values measured in patch-clamp assays with R² of about 0.4 and MAE < 0.5, which enables ranking compounds by their inhibitory potential.
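To make the idea of a censored regression objective concrete, the following Python sketch computes the per-observation negative log-likelihood of an accelerated failure time (AFT) style model with normally distributed error on the log10(IC50) scale. It is an illustration under stated assumptions, not the authors' implementation: the normal error distribution, the log10 scale, the fixed `sigma`, and the function names `aft_nll` and `is_inhibitor` are all choices made here. Gradient boosting itself is not shown; a boosting library would minimize the sum of such terms over all compounds.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def aft_nll(pred, lower, upper, sigma=1.0):
    """Negative log-likelihood of one observation.

    pred  : predicted log10(IC50 in µM)
    lower : lower bound of the observed log10(IC50), -inf if open below
    upper : upper bound of the observed log10(IC50), +inf if open above
    An exact measurement has lower == upper.
    """
    if lower == upper:
        # Exact value: use the density at the observed point.
        z = (lower - pred) / sigma
        return -math.log(normal_pdf(z) / sigma)
    # Censored value: use the probability mass inside the interval.
    cdf_hi = 1.0 if math.isinf(upper) else normal_cdf((upper - pred) / sigma)
    cdf_lo = 0.0 if math.isinf(lower) else normal_cdf((lower - pred) / sigma)
    return -math.log(max(cdf_hi - cdf_lo, 1e-12))


def is_inhibitor(pred_log_ic50, threshold_um):
    """Classify a quantitative prediction at any chosen IC50 threshold (µM)."""
    return pred_log_ic50 < math.log10(threshold_um)


# An observation "IC50 > 30 µM" is an interval open above.
censored_lower = math.log10(30.0)
loss_consistent = aft_nll(2.0, censored_lower, math.inf)    # predicts 100 µM
loss_inconsistent = aft_nll(0.0, censored_lower, math.inf)  # predicts 1 µM
```

A prediction that falls inside the censored interval is penalized less than one that contradicts it, which is how censored observations contribute to training without being assigned an exact value. Because the output is a potency estimate, `is_inhibitor` can be applied at any threshold after the fact.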


  1. Didziapetris, R., Lanevskij, K. J Comput Aided Mol Des, 30, 2016, 1175-1188.