Developing safe medicines is hard work. Researchers must rigorously assess every drug candidate molecule to ensure it does not harm patients. Cardiotoxicity – toxicity to the heart – is one of the factors that scientists consider.

Members of the ACD/Labs team, Kiril Lanevskij, Remigijus Didziapetris, and Andrius Sazonovas, recently published a paper on predicting hERG inhibition, a property critical for assessing cardiotoxicity. This conversation with Andrius and Kiril discusses what hERG inhibition is and how they were able to build a model using the unusual data available.


00:00  Andrius Sazonovas

Having a quantitative model, you would be able to see that hERG inhibition potency reduces and in other words, you’re moving towards a compound which will eventually hopefully become safe in this respect.

00:28  Jesse Harris

I’m sure all our listeners know this already, but creating safe medicines is hard work. Scientists need to thoroughly test drug candidates to reduce the risk of negative side effects.

00:39  Sarah Srokosz

And the earlier they know this information, the more time and resources they can spend on better candidates. Luckily, there are predictive tools that can help identify toxicity issues before even making the molecule. This helps protect patients while saving time and money.

00:55  Jesse Harris

Hi, I’m Jesse.

00:57  Sarah Srokosz

And I’m Sarah. We’re the hosts of The Analytical Wavelength, a podcast about chemistry and chemical data brought to you by ACD/Labs.

01:06  Jesse Harris

Our colleagues Kiril Lanevskij, Remigijus Didziapetris, and Andrius Sazonovas recently published a paper in the Journal of Computer-Aided Molecular Design on the subject of hERG inhibition, which is critical for predicting cardiotoxicity.

01:23  Sarah Srokosz

We had a chance to talk with Andrius and Kiril about the paper and have them explain to us how these models are built using unusual data sets.

01:31  Jesse Harris

This is a fascinating conversation, covering everything from machine learning to go-go dancing. We hope you all enjoy. Hello, Andrius and Kiril. How are you two doing today?

01:44  Kiril Lanevskij

I’m fine.

01:45  Andrius Sazonovas

Yeah, yeah. Hello, Jesse. Or actually no, good. Really good.

01:49  Jesse Harris

Good, good. Okay, so we want to start off, of course, with our favorite opening question. What is your favorite chemical? Let’s start with Andrius.

02:12  Andrius Sazonovas

Okay, so that’s an interesting question, especially for a chemist, because, you know, you encounter a lot of chemicals in your life. But, you know, in the summer, I always enjoy having a good swim, like in the river, lake, ocean, or sea or whatever. And in the winter, I actually hardly can live without downhill skiing. And all of that involves water in one form or another.

02:25  Jesse Harris

What an amazing molecule, so many different uses.

02:29  Andrius Sazonovas

Yeah, absolutely.

02:31  Jesse Harris

And how about you, Kiril?

02:34  Kiril Lanevskij

I will take this question literally. And since I am a biochemist, so I will say that my favorite molecule is serotonin, because this is the actual so-called hormone of happiness. So this is literally what makes me and other people happy.

02:53  Sarah Srokosz

I like that answer. So diving into our topic of the day, can you explain to us what is hERG inhibition and why is it important?

03:06  Kiril Lanevskij

So to start this discussion, probably the first thing we must explain is what hERG itself actually is. The name hERG stands for a gene that encodes a part of the potassium ion channel protein, and that protein is one of the important components in regulating proper heart rhythm. And the name hERG itself actually has quite an interesting background behind it.

A homolog of this protein was first identified in Drosophila flies, and mutant Drosophila flies were shown to exhibit quite strange behavior under exposure to ether. They started shaking their legs in a way reminiscent of the movements of go-go dancers. And here the biologists actually showed that they have a sense of humor, and that gene was called ether-à-go-go.

And now in a much more complex organism, our human organism, this story still persists, as the acronym hERG stands for human ether-à-go-go-related gene.

But now back to more serious stuff. This ion channel can be blocked as a side effect of drugs, and since it is involved in maintaining heart rhythm, this side effect can manifest as a specific type of arrhythmia, the so-called QT interval prolongation. And it is unfortunately a life-threatening condition.

And due to this side effect, a number of drugs have even been withdrawn from the market. One of the classical examples is an anti-allergic drug called terfenadine. So imagine that you have an allergy. You try to take an antihistamine pill and you can actually die from the side effect, so obviously that’s not good. And fortunately, in that particular case, this story was resolved in quite a favorable manner. Terfenadine was replaced by its metabolite, which was also active as a drug. But nowadays, obviously, all new drug candidates must be tested for hERG inhibition in order to avoid such serious consequences.

05:37  Jesse Harris

Yeah, that sounds like a very serious consequence. So that’s why I imagine these predictors for hERG inhibition are so important. So why use a hERG inhibition prediction model based on physicochemical parameters as opposed to just based on structure?

05:55  Andrius Sazonovas

This is one of those dilemmas, because when we say a model is based on physicochemical parameters as opposed to structure, we usually mean mechanistic models versus empirical models, the latter being based exclusively on structural information and various descriptors derived from it. Both of these classes of model have their own pros and cons. And since we are now speaking about the benefit of using a model based on physicochemical parameters, these types of models, otherwise called mechanistic models, usually possess a much wider applicability domain.

In other words, the model is not fitted so tightly to the initial training set, and you are able to apply the model later on to a wider structural variety of chemicals. And for the same reasons, these kinds of mechanistic models show better results in what is called temporal validation, in other words, the model’s performance over time. Because as we have shown, among other things, in our publication, the focus areas of drug discovery drift with time.

So in other words, there are classes of compounds going in and out of what you would call hype, let’s say. And so over time, the chemical compounds that the companies are dealing with change. And if you have an empirical model based on structure, it means that in order for it to keep up with the times, you have to constantly keep retraining it using new data. The mechanistic models, the ones based on physicochemical parameters, don’t suffer from those drawbacks.

And another thing probably worth mentioning is that these mechanistic models usually express one property, in our case hERG inhibition, as a function dependent on other properties, which are measurable but simpler. So in our case, let’s say pKa or logP. In this way, mechanistic models can sometimes be seen as a method to exchange a more complex and more costly measurement for a simpler and cheaper one.

So in other words, you can do what is called in combo modeling, when instead of doing a prediction entirely within the software, you just take the model, which is an equation, and you actually measure pKa and logP values, which are much easier to measure compared to hERG. And then you substitute the measured values into that equation and that way arrive at a more reliable, more accurate evaluation of hERG.
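
The in combo idea Andrius describes can be sketched in a few lines of Python. This is only an illustration: `toy_herg_log_ic50` and its coefficients are invented placeholders, not the equation from the paper, and all descriptor values are made up. The point is just the substitution step, where a measured logP or pKa replaces the software estimate before the equation is evaluated.

```python
from typing import Dict, Optional


def toy_herg_log_ic50(log_p: float, pka: float) -> float:
    """Hypothetical stand-in for a mechanistic equation (coefficients are invented)."""
    # Placeholder linear form chosen only so the example runs.
    return 2.0 - 0.3 * log_p - 0.1 * pka


def in_combo_inputs(predicted: Dict[str, float],
                    measured: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Use a measured descriptor wherever one exists, else keep the predicted one."""
    merged = dict(predicted)
    for name, value in measured.items():
        if value is not None:
            merged[name] = value
    return merged


predicted = {"log_p": 3.2, "pka": 8.9}   # in silico estimates (made up)
measured = {"log_p": 2.6, "pka": None}   # only logP was measured (made up)

inputs = in_combo_inputs(predicted, measured)
in_silico = toy_herg_log_ic50(**predicted)
in_combo = toy_herg_log_ic50(**inputs)
print(f"in silico: {in_silico:.2f}  in combo: {in_combo:.2f}")
```

The equation itself stays untouched; only the quality of its inputs changes, which is why a measured descriptor can tighten the final estimate.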

08:45  Sarah Srokosz

Yeah, those are certainly some compelling arguments for the mechanistic model. But going forward from that, what are the reasons, why is it useful, to have a quantitative model for hERG inhibition prediction?

09:02  Andrius Sazonovas

This one is probably even more obvious. It depends on the context. Of course, if we are talking about a thing like safety assessment of impurities in the formulation, which happens in late-stage development, you’re obviously just interested in whether the compound is genotoxic or not genotoxic, or in other ways has any other safety issues with it.

So then it is enough just to know a yes or no answer. But if we’re talking about an earlier stage of development, especially lead optimization, having just a yes or no answer is of course the minimum that you would like to have, but it’s much more beneficial to be able to actually know, if you have a hERG liability in your candidate compound, how far it actually is from dealing with this issue.

So in other words, how potent an inhibitor it is, so that you could rank your compounds ranging from, you know, strong inhibitors to weaker inhibitors. And then of course, when you do the optimization, it’s usually performed in a stepwise manner, so using just yes and no predictions, it would often be impossible to see the evolution of your optimization.

So in other words, your compound becomes a weaker inhibitor, but a qualitative model will still say that it’s just an inhibitor whereas having a quantitative model you would be able to see that your inhibition potency reduces and in other words, you’re moving towards a compound which will eventually hopefully become safe in this respect.
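
A small Python sketch makes the difference visible. The compound names, the predicted pIC50 values, and the cutoff of 5.0 are all assumptions for illustration. Every compound in this hypothetical optimization series gets the same yes/no label, while the quantitative values show steady progress.

```python
from typing import List, Tuple

THRESHOLD_PIC50 = 5.0  # assumed cutoff: pIC50 >= 5 (IC50 <= 10 uM) is labeled "inhibitor"

# Hypothetical lead optimization series: (compound, predicted pIC50). Values are made up.
series: List[Tuple[str, float]] = [
    ("lead", 7.4),
    ("analog_1", 6.6),
    ("analog_2", 5.8),
    ("analog_3", 5.2),
]


def classify(pic50: float) -> str:
    """Qualitative view: collapse the potency into a yes/no label."""
    return "inhibitor" if pic50 >= THRESHOLD_PIC50 else "non-inhibitor"


labels = [classify(pic50) for _, pic50 in series]
ranked = sorted(series, key=lambda item: item[1])  # weakest inhibitor first

# Fold reduction in potency between the first and last compound of the series.
fold_weaker = 10 ** (series[0][1] - series[-1][1])

for name, pic50 in series:
    print(f"{name:9s} pIC50={pic50:.1f} -> {classify(pic50)}")
print(f"weakest inhibitor: {ranked[0][0]}, about {fold_weaker:.0f}-fold weaker than the lead")
```

The labels never change across the series, so a purely qualitative model would report no progress even though the last analog is more than a hundred times weaker as an inhibitor.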

10:49  Kiril Lanevskij

And one more thing that I can add to Andrius’ answer is that even when you’re dealing with a yes or no answer in this case, the underlying characteristic is still quantitative. So it’s like inhibitory concentration or inhibition constant. And when you’re dealing with this kind of data, you still need to pick a threshold between inhibitors and non-inhibitors. And if we provide only the kind of model that gives either a yes/no answer or a probability, we are tied to that threshold. And if some company works with a different threshold, there arises some kind of incompatibility between the data and our predictions.

And if we offer a quantitative model, it automatically solves this problem. And companies are free to choose the threshold they work with.
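
As a rough illustration of that point, the Python sketch below applies several cutoffs to the same quantitative predictions. The compound names, IC50 values, and cutoffs are invented for the example.

```python
from typing import Dict, List

# Hypothetical quantitative predictions: IC50 in micromolar (values are made up).
predicted_ic50_um: Dict[str, float] = {"cmpd_a": 0.8, "cmpd_b": 4.0, "cmpd_c": 25.0}


def flagged_at(cutoff_um: float) -> List[str]:
    """Names of compounds whose predicted IC50 is at or below the chosen cutoff."""
    return sorted(name for name, ic50 in predicted_ic50_um.items() if ic50 <= cutoff_um)


# Each organization can apply its own cutoff to the same predictions.
for cutoff_um in (1.0, 10.0, 30.0):
    print(f"cutoff {cutoff_um:5.1f} uM -> inhibitors: {flagged_at(cutoff_um)}")
```

A classifier trained against one fixed cutoff could only reproduce one of these three answers; the quantitative output supports all of them.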

11:45  Jesse Harris

I can imagine that’ll be frustrating to the users too, getting different answers from different pieces of software and not really understanding what is happening or what the differences are. It’s probably better to just let them be able to make the judgments for themselves. But this goes into some of the challenges of working with this censored or non-quantitative data. What do you do in order to work with this data?

12:08  Kiril Lanevskij

Well, again, probably the first thing we need to explain here is what censored data actually is. So when we are working with a quantitative characteristic such as a hERG inhibitory constant, in order to determine this characteristic in a more or less precise way, the researchers need to perform an entire series of experiments to determine the full concentration-activity curve.

So that means that they need to test their compounds at a series of different concentrations. But what happens in practice is that sometimes the compound is really not a potent hERG inhibitor; it’s really safe in this regard. So, for example, people test at a single concentration of 30 μM and it gives only 5% inhibition. So the compound is clearly safe by all reasonable margins.

And in this kind of situation, the researchers are not interested in determining the full curve. For their purposes, this kind of data is already enough. But for us, it’s like a semi-quantitative data point. We know for sure that the inhibitory constant is larger than 30 μM, but we don’t know by how much. So this is exactly the kind of data point that is called censored.

Another situation, which happens I would say quite rarely but still occurs sometimes, is when it is a very potent inhibitor. And again, the researchers already know for sure from one data point that this compound basically has to be thrown away. And they determined that, for example, IC50 is less than 2 μM. For them, it’s enough; for us, it is again a censored data point. In this situation, a left-censored data point; in the first case, right-censored.

When we try to make a quantitative prediction, we can’t just use these data as if they were thirty and two, because, you know, if our model predicts, for example, for the first compound that it is 50 or 60, it’s kind of a good prediction in both cases and we don’t know which one is closer to the truth. So this poses kind of a problem in statistical analysis.
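
One way to picture the problem is to store each data point with its qualifier and to count an error only when a prediction contradicts a censored bound. The Python sketch below does that. The `Measurement` class and `censored_abs_error` function are hypothetical helpers written for this illustration, and the error measure is a simple choice for clarity, not the method used in the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Measurement:
    value_um: float   # reported IC50 value or bound, in micromolar
    qualifier: str    # "=" exact, ">" right-censored, "<" left-censored


def censored_abs_error(pred_um: float, m: Measurement) -> float:
    """Absolute error in log10 units; censored points only penalize bound violations."""
    diff = math.log10(pred_um) - math.log10(m.value_um)
    if m.qualifier == ">":
        return max(0.0, -diff)   # wrong only if the prediction falls below the bound
    if m.qualifier == "<":
        return max(0.0, diff)    # wrong only if the prediction rises above the bound
    return abs(diff)


right_censored = Measurement(30.0, ">")   # "IC50 > 30 uM"
left_censored = Measurement(2.0, "<")     # "IC50 < 2 uM"

for pred_um in (3.0, 50.0, 60.0):
    err = censored_abs_error(pred_um, right_censored)
    print(f"prediction {pred_um:5.1f} uM vs '>30 uM': error {err:.2f}")
```

Predictions of 50 and 60 both come out with zero error against the ">30" point, which is exactly the ambiguity Kiril describes: the data cannot say which of the two is closer to the truth, and treating the bound as an exact value of 30 would wrongly penalize both.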

14:41  Sarah Srokosz

And so what did you do in this work to overcome the challenge of having non-quantitative data?

14:50  Kiril Lanevskij

Well, speaking about the data itself, obviously there was nothing we could do, because the data is what it is. But what we can do is dive into researching what kind of statistical analysis methods are available for dealing with this kind of data. And in fact, this concept of censored data was explored already in the mid-twentieth century.

But back in those times, the method that was offered, censored regression, was basically a variant of simple linear regression, which obviously can work well when you try to explore a relationship between your target endpoint and very few parameters with linear dependencies. But in our case, with more complex descriptors and non-linearities in the dependencies, it won’t work.

We typically work with more modern methods, and with hERG, the previous iteration of our physicochemical model was based on gradient boosting methodology. And it turned out that these days there are actually methods that allow you to combine advanced machine learning techniques, such as gradient boosting, with fitting to censored objectives. It’s my impression that this kind of research actually became a bit more popular during the COVID pandemic, because typically this kind of data is involved in survival analysis. And with COVID, when there is some kind of survival data, obviously this often is the method of choice. And we found a combination that works for our use case. Therefore, we have developed a gradient boosting model fitted to an objective function used in survival analysis.
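
To give a feel for what such an objective looks like, here is a minimal Python sketch of a censored Gaussian negative log-likelihood, together with the gradient a boosting algorithm would follow. Everything here is an assumption made for illustration: the normal error model, the noise scale `SIGMA`, the log10(IC50) scale, and the function names. The transcript does not specify the exact objective or software the authors used.

```python
import math

SIGMA = 0.5  # assumed noise scale in log10(IC50) units; not taken from the paper


def _pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z: float) -> float:
    # The floor avoids log(0) for predictions far on the wrong side of a bound.
    return max(0.5 * (1.0 + math.erf(z / math.sqrt(2.0))), 1e-300)


def censored_nll(pred: float, bound: float, qualifier: str) -> float:
    """Negative log-likelihood of one data point on the log10(IC50) scale."""
    z = (bound - pred) / SIGMA
    if qualifier == ">":                 # true value lies somewhere above the bound
        return -math.log(_cdf(-z))
    if qualifier == "<":                 # true value lies somewhere below the bound
        return -math.log(_cdf(z))
    return 0.5 * z * z + math.log(SIGMA * math.sqrt(2.0 * math.pi))  # exact value


def censored_grad(pred: float, bound: float, qualifier: str) -> float:
    """Derivative of censored_nll with respect to the prediction."""
    z = (bound - pred) / SIGMA
    if qualifier == ">":
        return -_pdf(z) / (SIGMA * _cdf(-z))
    if qualifier == "<":
        return _pdf(z) / (SIGMA * _cdf(z))
    return (pred - bound) / SIGMA ** 2


bound = math.log10(30.0)  # the "IC50 > 30 uM" example from the conversation
for ic50_um in (3.0, 50.0, 60.0):
    pred = math.log10(ic50_um)
    print(f"pred {ic50_um:5.1f} uM: loss {censored_nll(pred, bound, '>'):.3f}, "
          f"grad {censored_grad(pred, bound, '>'):+.3f}")
```

For the right-censored point, the loss is large when the prediction falls below the bound and flattens out once the prediction is comfortably above it, so the model is pushed away from contradicting the data without being pulled toward the bound itself. Some gradient boosting libraries ship objectives of this general kind, XGBoost’s `survival:aft` being one example; whether that is what was used here is not stated in the conversation.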

16:50  Jesse Harris

Excellent. If people want to learn more about that, they can of course check out the paper, which will be linked in the show notes. But before I let you guys go, on this note about machine learning, I actually had a question for you in terms of what you think the future is for machine learning and AI in the next couple of years, in chemistry specifically. It’s a very hot topic, I think, as everybody knows, and I’d be interested to hear what you two think is going to be happening next, or what you’re most excited about.

17:20  Andrius Sazonovas

Definitely. As for the question, it is always tough to be a prophet, so I’ll probably phrase my answer in a bit more general way. Not like trying to predict what particular results we can expect from the application of this, but like a general perspective on what I would call the current state of all these methodologies. And undoubtedly machine learning and artificial intelligence are really capable techniques.

It’s just that the challenge, in my opinion, is actually adequately understanding their capabilities. Because by definition, any statistical methods, including machine learning and even more advanced methods like artificial intelligence, do what they are told to do. But you always have to think about whether you are asking the right questions. This is not entirely my thought, but there are quite a lot of people that see these methodologies, in the context of drug discovery, as a sort of solution still looking for a problem that it can allegedly solve. Because to be honest, I do not think that in the current state of research, even though we know that it is already capable of doing a lot of things that people can notice even in their daily lives, it can act as a so-called silver bullet, you know, a universal solution to all the problems.

And all the people that are trying to use artificial intelligence in such a way, just throwing all the information we have at this method and then just hoping that it will figure something reasonable out, I believe they are in for a letdown.

There is one anecdotal story. A number of years ago I was watching a TEDx presentation that presented research where the experimenters tried to build an artificial intelligence model that, provided with the spare parts for a robot, would learn to assemble a robot out of those spare parts that could then walk, or just travel, from point A to point B. The presenter of that talk did not talk specifically about how they trained that model, but when they did that, the eventual output of the model was that it stacked all the spare parts of the robot into a very tall tower standing on point A and flipped it over, and the top parts fell far enough to cover the distance to point B, and the model considered the problem to be solved.

And, you know, I’ve even corrected myself right now telling the story, when I went from walk to travel. Probably if you put walk in the definition of the solution, then falling over as a tower would not be a reasonable solution. So in other words, you have to be really careful; you have to understand that current AI is not an all-knowing magical box that could solve all of your problems.

So you have to be really, really careful and really specific about how you define those problems. And so, trying to summarize, undoubtedly the impact of artificial intelligence will rise, but it will come, in my opinion, from really targeted research on some specific areas, rather than from trying to build sort of global models where you put in a big database and get a good drug out of it.

21:19  Jesse Harris

Well, that means that it’s particularly important to have experts like the two of you to help guide that research and that development. So we’re so happy to have you on the podcast to share some of that with the audience. Thank you for taking the time to talk with us today.

21:33  Sarah Srokosz

Thank you both.

21:35  Andrius Sazonovas

Absolutely. You’re welcome.

21:39  Kiril Lanevskij

You’re welcome.

21:40  Sarah Srokosz

I think what you said at the end is right. With so much happening in the world of chemistry, AI, and medicine, expertise is more important than ever.

21:48  Jesse Harris

Absolutely. If folks want to learn more about this subject, be sure to check out the show notes where we have a link to the paper that we discussed.

21:56  Sarah Srokosz

That’s all for today. Thanks, as always for spending time with us. And don’t forget to subscribe in your favorite podcast app.

The Analytical Wavelength is brought to you by ACD/Labs. We create software to help scientists make the most of their analytical data by predicting molecular properties and by organizing and analyzing their experimental results. To learn more, please visit us at

Enjoying the show?

Subscribe to the podcast using your favourite service.