June 4, 2012
I’ve already described the Bosutinib fiasco in my last entry, and finished with a teaser that we would be back for more commentary on this topic.
We plan a series of tests of our algorithms and systems to see if our software could have been used to prevent a situation like this. The main question in this post is:
If we supplied the 1H and HSQC spectra of the incorrect isomer to our system while proposing the actual chemical structure of bosutinib, what would the result be?
For this, I’d like to thank my friend and colleague, Philip Keyes from Lexicon Pharmaceuticals, for his crucial participation in this study. Phil kindly purchased the compounds of interest, acquired the NMR data, tested it on our system in his environment, and supplied us with the data for our own testing and evaluation.
In this article, Levinson and Boxer suggested that acquiring an HSQC experiment would have clearly ruled out the structure of bosutinib as a possibility. We put our system to the test for this study.
First, some brief methodology on our combined verification approach. When a proposed structure, 1H NMR, and HSQC are entered into our system, the software automatically processes and analyzes the data without any manual intervention and generates a verification score that we call the Verification Product. In a nutshell, the verification product gives us a measure of the consistency between a proposed structure and the supplied NMR data. More details on the methodology of our system can be found in the 2007 article published here.
The verification product generates a result between 0 and 1, with 1 being the highest possible score suggesting the highest level of confidence that a structure is consistent with a given set of spectra. Based on how the system is deployed in practice, the end-user will define verification product thresholds to determine which compounds should be let through, and which compounds require further investigation.
Based on our numerous publications in this area, we’ve adopted a traffic light schema where:
Green Light: The structure and spectrum are consistent with each other. No further review is required.
Yellow Light: The structure-spectrum correspondence is questionable. The software has identified one or more issues while attempting to assign all peaks to atoms in the structure. Review is up to the user’s judgement.
Red Light: The structure and spectrum are inconsistent with each other. Review is required.
And again, based on our research to date, we’ve adopted the following thresholds as an optimal starting point (organizations that deploy ASV may deviate from these thresholds depending on whether they are more tolerant of false positives or false negatives). Those thresholds are:
Green Light: When Verification Product exceeds 0.67
Yellow Light: When Verification Product is between 0.5 and 0.67
Red Light: When Verification Product is less than 0.5
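For readers who like to see rules as code, here is a minimal sketch of how those thresholds map a score onto a traffic light. It is not taken from our software: the names `Light` and `classify` and the parameters `green_above` and `red_below` are made up for illustration, and the sketch assumes that a score of exactly 0.5 or exactly 0.67 falls into the yellow band, which is how the wording above reads.

```python
from enum import Enum


class Light(Enum):
    """Traffic-light outcome of a verification run (illustrative names)."""
    GREEN = "green"    # consistent, no further review required
    YELLOW = "yellow"  # questionable, review at the user's judgement
    RED = "red"        # inconsistent, review required


def classify(verification_product: float,
             green_above: float = 0.67,
             red_below: float = 0.5) -> Light:
    """Map a verification product in [0, 1] onto a traffic light.

    The defaults are the starting-point thresholds quoted in the post;
    both are parameters because a deployment may tune them.
    """
    if not 0.0 <= verification_product <= 1.0:
        raise ValueError("verification product must lie between 0 and 1")
    if verification_product > green_above:
        return Light.GREEN
    if verification_product < red_below:
        return Light.RED
    # Everything from red_below up to and including green_above.
    return Light.YELLOW


if __name__ == "__main__":
    # Hypothetical scores, chosen only to exercise each band.
    for score in (0.9, 0.6, 0.3):
        print(score, classify(score).value)
```

An organization that cannot tolerate false positives could, for example, call `classify(score, green_above=0.8)` so that fewer compounds pass without review.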
With that out of the way, let’s look at the results.
When we ran the 1H and HSQC data through the software against the structure of bosutinib in a completely automated fashion, the result was a yellow light with a verification product of 0.52. In essence, the software flagged this compound as questionable based on the 1H and HSQC NMR data.
What’s more, the software does not generate just a black box numerical result. When there is a questionable or inconsistent result, the software specifically describes the issue, and highlights it on the structure and spectrum (see below).
Naturally, we also ran the data through our system with the actual structure for the bosutinib isomer #1. The result was a green light with a verification product of 0.75. Furthermore, as shown below, the software showed no assignment issues:
So that gets the important example out of the way. The next test is what the system would have made of the actual bosutinib data against both structures. That’s coming soon, I hope.
Also, I’ll be writing up a post on how our Structure Elucidation software handled this case when given a full set of NMR data.