
False Negatives and False Positives are Waiting…

Great post from Derek Lowe at In the Pipeline the other day about the dangers of not quality checking those fine-looking starting compounds for your project. Chemistry happens and yes, mistakes do too.

In fact, it appears that Derek has been on a personal QC kick as of late.

I Can Has Ugly Molecules?


I thought this would once again be a good opportunity to provide you with a link to a poster Sergey Golotvin presented at ENC 2008 entitled, "Validating the Quality of Large Collections of NMR Spectra Automatically".

Long story short, 15,000 1H NMR spectra from the Aldrich collection were evaluated fully automatically, and the software was able to confirm 88% of the collection as having chemical structures that were consistent with their respective spectra. In addition, 4% were flagged by the software as being inconsistent. A closer, manual look at those 4% revealed that there were indeed some truly wrong structures (or incorrect tautomers) in the collection.
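To put those percentages into spectrum counts, here is a quick back-of-the-envelope sketch. It assumes the 15,000 figure is exact and that the percentages are whole numbers; the variable names are mine, not anything from the poster. Note that the two figures quoted above do not add up to 100%, and the sketch simply reports the remainder without guessing how it was classified.

```python
# Back-of-the-envelope counts from the figures quoted in this post.
# Assumption: 15,000 is an exact total and the percentages are not rounded.
TOTAL_SPECTRA = 15_000
CONFIRMED_PCT = 88   # structures the software found consistent with the spectrum
FLAGGED_PCT = 4      # structures the software flagged as inconsistent

# Integer arithmetic avoids floating-point rounding surprises.
confirmed = TOTAL_SPECTRA * CONFIRMED_PCT // 100
flagged = TOTAL_SPECTRA * FLAGGED_PCT // 100
remainder = TOTAL_SPECTRA - confirmed - flagged  # not covered by either figure

print(f"confirmed: {confirmed}")
print(f"flagged for manual review: {flagged}")
print(f"not covered by either figure: {remainder}")
```

The point is mostly the middle number: the manual review pile shrinks from 15,000 spectra to a few hundred.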

This was evaluating the 1H NMR data only. Using additional 2D experiments, such as HSQC, will likely improve these results.

This is just one example of a check an organization can build into its process for additional QC of, say, its registration database.

Is it perfect? Absolutely not. There are perhaps a few more false positives in there that the software didn’t catch, and of course the software provided some false negatives as well, which are annoying because presumably someone has to look over them manually only to realize that they were indeed the right structure all along. But at least this doesn’t involve manually poring over 15,000 spectra!
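For anyone who wants to pin the terminology down, here is a minimal sketch of how the two error rates could be computed once a manual review has established which structures were actually right. It follows the convention used in this post: a "positive" is the software confirming a structure, so a false positive is a wrong structure that passed and a false negative is a correct structure that got flagged. The class name, field names, and the counts in the usage example are all hypothetical and are not results from the Aldrich study.

```python
from dataclasses import dataclass


@dataclass
class ReviewTally:
    """Hypothetical tally of software verdicts versus manual review."""
    confirmed_correct: int  # software confirmed, structure really is right
    confirmed_wrong: int    # software confirmed, structure is wrong (false positive)
    flagged_correct: int    # software flagged, structure is right (false negative)
    flagged_wrong: int      # software flagged, structure is wrong (a true catch)

    def false_positive_rate(self) -> float:
        # Share of truly wrong structures that the software let through.
        wrong = self.confirmed_wrong + self.flagged_wrong
        return self.confirmed_wrong / wrong if wrong else 0.0

    def false_negative_rate(self) -> float:
        # Share of truly correct structures that the software flagged anyway.
        right = self.confirmed_correct + self.flagged_correct
        return self.flagged_correct / right if right else 0.0


# Made-up numbers, purely to show the calculation.
tally = ReviewTally(confirmed_correct=95, confirmed_wrong=1,
                    flagged_correct=2, flagged_wrong=2)
fpr = tally.false_positive_rate()
fnr = tally.false_negative_rate()
print(f"false positive rate: {fpr:.3f}")
print(f"false negative rate: {fnr:.3f}")
```

The catch, of course, is that the false positive rate needs someone to check the confirmed pile too, at least by sampling, which is exactly the manual work the automation is meant to reduce.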

We continue to run these datasets, and actually have a consortium of several NMR experts in the industry, which we call ASCI (Automated Structure Confirmation Initiative), where we are testing and validating this technology in the real pharmaceutical world. The goal is to identify the common areas where false negatives and false positives occur and to try to address them with algorithms.

Will we ever solve all the problems, especially in the world of novel chemistry? Of course not, and for that matter there are some existing problems that appear to be too hard to solve.

But that being said, what is the acceptable limit of false positives and false negatives when software automatically verifies the registered compounds in a library?

Interested in hearing your thoughts.