The Black Swan of Compound Libraries

I recently finished a great book that made me think on a lot of different levels. The book is The Black Swan, by Nassim Nicholas Taleb.

In short, a black swan is defined as an event, positive or negative, that is deemed improbable yet causes massive consequences. 

From Wikipedia:

What we call here a Black Swan (and capitalize it) is an event with the following three attributes. First, it is an outlier, as it lies outside the realm of regular expectations, because nothing in the past can convincingly point to its possibility. Second, it carries an extreme impact. Third, in spite of its outlier status, human nature makes us concoct explanations for its occurrence after the fact, making it explainable and predictable.

Anyone who follows this blog knows that I have a passion for automated structure verification by software and the different roles it can play within the pharmaceutical industry.

Reading through this book, one of the things that popped into my mind was how the black swan idea applies to compound registration libraries in pharma.

I spend many hours in my role discussing automated structure verification (by NMR, of course) with spectroscopists, chemists, directors, VPs, managers of compound management, etc.

In many of these conversations, the risk of having an incorrect compound in the registration library makes for a fruitful and interesting discussion. However, the conversation is almost always dominated by low probabilities, starting with the low probability that the compound being registered is wrong in the first place. I agree, this is a low probability. After all, a trained chemist has synthesized the material and has used a variety of analytical methods to confirm its identity. In some organizations, the compounds in this library are validated further, either before they are accepted into the store or before they are sent off for assay.

So the argument goes that incorrect compounds in the registration library are low probability. Furthermore, some organizations take proactive steps (most often LC-MS) to catch incorrect compounds. Finally, perhaps the lowest probability event of them all is the compound being identified as a hit during the assay.

So in the end, what are the chances that a meaningful compound that is identified as a "hit" is actually an incorrect compound? Furthermore, in this instance, what are the chances that this compound will make it too far downstream without its false identity being exposed?
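As a back-of-the-envelope sketch of that chain of low probabilities, the snippet below simply multiplies three conditional probabilities together. Every number in it is a made-up placeholder chosen for illustration, not data from any organization, and the function and variable names are hypothetical.

```python
# Hypothetical placeholder probabilities, for illustration only.
P_WRONG = 0.02              # registered compound is not what the label says
P_ESCAPES_QC = 0.25         # a wrong compound slips past checks such as LC-MS
P_HIT = 0.005               # a wrong compound that slipped through is flagged as a hit


def chance_of_mislabeled_hit(p_wrong, p_escapes_qc, p_hit):
    """Chain rule: P(wrong) * P(escapes QC | wrong) * P(hit | wrong and escaped)."""
    return p_wrong * p_escapes_qc * p_hit


def expected_mislabeled_hits(library_size, p_wrong, p_escapes_qc, p_hit):
    """Expected number of mislabeled hits across a library of the given size."""
    return library_size * chance_of_mislabeled_hit(p_wrong, p_escapes_qc, p_hit)


if __name__ == "__main__":
    per_compound = chance_of_mislabeled_hit(P_WRONG, P_ESCAPES_QC, P_HIT)
    # Library size is also a hypothetical placeholder.
    per_library = expected_mislabeled_hits(1_000_000, P_WRONG, P_ESCAPES_QC, P_HIT)
    print(f"Per compound: {per_compound:.6%}")
    print(f"Expected in the library: {per_library:.1f}")
```

Whatever numbers you plug in, the product of several small probabilities gets small very quickly, which is why the conversation tends to end there.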

In short, it's a pretty low probability event, and because in most cases it will eventually be caught, it's a stretch to really call this a massive consequence or impact.

That's not the black swan I am talking about in this post. 

In my opinion, the Black Swan concept is more relevant to those incorrect compounds that lie in the registration library and never get cherry-picked out. A compound that doesn't produce anything interesting from the assay. A compound that effectively hides at the back of the shelves in a compound management cold room, documented as "tested" but never advanced any further because of poor assay results.

Of course, this compound is not the Black Swan. However, the compound it was supposed to be might be: the compound that the chemist thought they made, and the compound that was referenced in the inventory.

While each case is low probability, there are probably tens of thousands of misrepresented compounds from the last 20 years, which means tens of thousands of intended compounds that haven't truly been assayed. Of these thousands, could any of them have turned into the highly coveted blockbuster drug?

Sure, it's doubtful, but the vanishingly small possibility exists.

And further, this very idea contains one of the core components of a Black Swan: "nothing in the past can convincingly point to its possibility."

I am not certain that there is an example of a blockbuster drug that was originally missed because of a mistake in synthesis or a misrepresented registrant. My guess is "not really." And if there were, perhaps, as Taleb suggests, it would have been rationalized in hindsight.