
The Benefits and Risks of Full Disclosure in Automated Verification

December 9, 2009

[Image: Inconsistent]
Above is a picture of an example output of our automated structure verification routine.

This highlights one of the strengths of our automated verification approach, one Phil Keyes calls a “fringe benefit” (see slides 39-42): following verification, you get an estimated assignment starting point. The output is a report, or spectrum file, that reveals to the end user what assignments were made and with how much confidence.

This is of course a very good thing!

The software is not a black box. It doesn’t simply spit out a yes or no answer. When it says inconsistent, it gives you an explanation of why it believes the proposed structure is inconsistent with the spectrum. Furthermore, if that explanation is a disagreement in chemical shift, and it turns out the software’s estimation is wrong, this pinpoints a perfect assignment to submit to the prediction training database to improve the overall system and avoid a similar problem in the future.

On the flip side, it allows you to closely evaluate the assignments for all the compounds it deemed consistent.

BUT, one of the things I’ve learned over the last year is that this benefit can sometimes be a curse as well. Maybe curse is not the right word; in fact, I am sure it is the wrong word. Can of worms, maybe?

The reason is that I (as well as others) have come to realize that there are really two kinds of false positives in our routine:

1) False Positive 1: The structure is actually incorrect, but the software passes it anyway. I think introducing HSQC into our routine has dramatically improved this one.

2) False Positive 2: The structure is indeed correct, and the software passes it, but for the wrong reasons, i.e., incorrect assignments.
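
To make the distinction concrete, here is a minimal sketch in Python of how the two false positives sit alongside the other possible outcomes. This is only an illustration and not how our software is implemented: the names `Outcome` and `classify` and the three boolean inputs are hypothetical, and in practice you only know whether the structure and the assignments are correct after a person has reviewed them.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical labels for one verification result after human review."""
    TRUE_POSITIVE = "correct structure, passed with correct assignments"
    FALSE_POSITIVE_1 = "incorrect structure, passed anyway"
    FALSE_POSITIVE_2 = "correct structure, passed on incorrect assignments"
    TRUE_NEGATIVE = "incorrect structure, flagged inconsistent"
    FALSE_NEGATIVE = "correct structure, flagged inconsistent"


def classify(structure_correct: bool,
             software_passed: bool,
             assignments_correct: bool) -> Outcome:
    """Label a result (illustrative only; all inputs come from manual review)."""
    if software_passed:
        if not structure_correct:
            # Wrong structure slipped through: False Positive 1.
            return Outcome.FALSE_POSITIVE_1
        if not assignments_correct:
            # Right answer, wrong reasons: False Positive 2.
            return Outcome.FALSE_POSITIVE_2
        return Outcome.TRUE_POSITIVE
    # The software said "inconsistent"; assignments do not change the label.
    if structure_correct:
        return Outcome.FALSE_NEGATIVE
    return Outcome.TRUE_NEGATIVE


if __name__ == "__main__":
    print(classify(True, True, False).name)
```

Note that False Positive 2 only becomes visible because the assignments are disclosed; with a plain yes or no answer, it would be counted as an ordinary correct pass.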

False Positive #2 has led to some interesting conversations with different people around the world, either at their sites or at conferences.

Why?

Because you have the ability not just to catch when a structure is wrong, but also to catch when the software assigns something wrong.

A system like this helps you identify a subset of compounds from your library that are suspicious. Certainly there are going to be false negatives in there, and some wasted time, but I argue that it’s better than not validating anything at all.

Of course, a system like this assumes you aren’t worried about the false positives, because you don’t look at them. In actuality you are worried, but you are using the system to catch more of those incorrect compounds than you would have otherwise. So that is the argument against worrying too much about False Positive #1: the system will identify incorrect compounds in your library that you wouldn’t otherwise catch until much later.

With respect to False Positive #2, the interesting question I always ask is this:

How do you know that your library provider, or the chemist who registered a compound, has passed it for the right reasons?

I don’t know too many organizations that require their chemists to fully assign the NMR spectrum and, furthermore, document it. I don’t know too many library providers that will supply a fully assigned NMR spectrum with the compounds you purchase. I’m not saying it doesn’t happen, or that it should happen; I just haven’t heard of many, and I visit a lot of people and places.

In the end, despite these conversations, I think providing full disclosure on why a compound is deemed consistent or not is absolutely crucial for evaluating this system and understanding its performance.

It’s clear that the benefits significantly outweigh the risks.

All this said, because assigning the data is a key part of the process, it’s something we continue to work on in our ongoing improvements. We aren’t simply working on the best pass:fail rate; we also want to investigate the best ways to get better assignments.

More on these directions to come…
