October 9, 2008
A few months back, I referenced Derek Lowe's excellent blog, "In the Pipeline."
Several of the entries were particularly interesting to me.
One of the take-home messages for me is when Derek says, "False Negatives and False Positives are waiting in your dataset, depend on it."
I think this is a point that is largely acknowledged in the industry. For example, I think it is pretty obvious to most organizations that there is a significant number of incorrect structures in their registration databases. Some will have a pretty good idea of how many, and others will have no idea.
I think the key point, however, is to acknowledge it.
Now this brings me to the topic of the balancing act between false positives and false negatives, and back to the application of validating compound registrations with automated NMR verification. This is an application I have blogged about quite a bit over the last year and a half.
The key point to emphasize here is that this application is NOT replacing the chemist's analysis and QC. Chemists are still looking at their data and registering their compounds the normal way. But is an automated validation step necessary after the fact to ensure the accuracy of the registration database?
I asked the question in an earlier post: "what is the acceptable limit of false positives and false negatives for automated verification by software for the evaluation of registered compounds in a library?"
I didn’t get any comments on the blog, but I got a few email responses and I have discussed this with some industry people over the last few months. Not surprisingly, the answers varied significantly.
Of course everyone would prefer a 0% false positive rate and a 0% false negative rate. Yes, in a perfect world that would be great. But it’s not going to happen. Not ever. So decisions need to be made about how important this exercise is and, furthermore, what is deemed acceptable.
If we acknowledge that there are incorrect structures in the registration database, then the question is how many, and furthermore, how many can be removed with the use of some automation.
I think the challenge that most organizations face today when evaluating such a system is trying to balance two sides: false positives vs. false negatives. Why?
My understanding is that if you have a system that is false positive tolerant, well, you are sacrificing the overall quality of your organization's registration database. On the other side of the coin, if you have a system that is false negative tolerant, well, someone needs to look at the data manually to see whether each flagged structure is indeed right or wrong. So there are two forces pulling here. One is knowing that, despite your efforts, you are still letting incorrect structures into your database. The other is knowing that you need a real person to spend time manually evaluating a bunch of spectra.
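To make the tug-of-war concrete, here is a toy sketch. The scores, labels, and thresholds below are entirely invented for illustration; they do not come from any real NMR verification product. The idea is simply that an automated verifier assigns each registered compound some match score, and wherever you set the pass/fail threshold, you trade one error type for the other.

```python
# Toy illustration of the false positive / false negative balancing act.
# All data here is hypothetical: (match_score, structure_is_correct) pairs.
records = [
    (0.95, True), (0.90, True), (0.85, True), (0.80, True),
    (0.75, False),  # a wrong structure that still scores fairly well
    (0.70, True), (0.60, True),
    (0.55, False), (0.40, False), (0.30, False),
]

def rates(threshold):
    """Accept everything scoring >= threshold; count the two error types."""
    # false positive: wrong structure accepted into the database
    fp = sum(1 for score, ok in records if score >= threshold and not ok)
    # false negative: correct structure flagged for manual review
    fn = sum(1 for score, ok in records if score < threshold and ok)
    return fp, fn

for t in (0.50, 0.65, 0.80):
    fp, fn = rates(t)
    print(f"threshold {t:.2f}: {fp} false positives, {fn} false negatives")
```

On this made-up data, a lenient threshold lets two wrong structures into the database with nothing to review, while a strict one keeps the database clean but sends two perfectly good compounds to a spectroscopist's queue. That is the whole balancing act in miniature: lowering one error count raises the other.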
Most spectroscopists I know do not have a lot of time for this. So of course it would be preferable to have a system that is potentially more false positive tolerant. However, the whole point of implementing the system in the first place was to identify the false positives.
But at the end of the day, I believe there are systems and applications out there that can help drastically improve the quality of your registration databases.
However, the balancing act between false positives and false negatives is an issue that you are most certainly going to have to juggle.