Several posts back I pointed you to a couple of articles ACD/Labs were involved in with regard to automated structure verification.
I have pointed to these articles, but I have spent little time talking about them. I will now.
For those new to the idea, it involves using software to automatically check the consistency between a chemical structure and an NMR spectrum by means of NMR prediction. Lee Griffiths from AstraZeneca has done excellent work in this field over the years. Lee was kind enough to present at our European User’s Meeting last year and share a summary of his approach to automated structure verification using 1D 1H and 13C, and 2D HSQC data.
In addition, a simple search for “Griffiths” on the Magnetic Resonance in Chemistry webpage will turn up a whole bunch of relevant articles.
We initially published a validation of the performance of automated structure verification using just 1D 1H NMR data. More recently we published a second study comparing that with the performance of a combined verification approach using 1D 1H and 2D HSQC data.
As a result of these and other studies, much of ACD/Labs’ recent focus has been on the performance of automated structure verification using 1D 1H and 2D HSQC NMR data.
These publications should give you a general idea about the performance and accuracy of this approach.
I am not going to discuss those validation results today, but rather focus on real-world applications and how the approach holds up in an industrial setting.
Last Thursday I was in New Brunswick, New Jersey at our New Jersey User’s Meeting where I was blown away by two terrific presentations by our guest speakers, Phil Keyes from Lexicon Pharmaceuticals and Anthony Macherone from ASDI.
Two different applications in two different environments. I’ll talk about Phil’s today, and Anthony’s tomorrow. Phil’s is interesting as he is setting up a really cool system to significantly improve how analytical data is handled in an open access environment, and further to validate Lexicon’s compound registration database.
In my opinion, the really crucial thing to point out here is the evolution of an open access environment from a more traditional analytical services setup. It used to be that NMR spectroscopists would run and handle all the analytical data for the compounds a chemist produced, verify the structures, and give them the thumbs up or thumbs down. In that environment, spectroscopists got a look at the data for every compound entering the registration database. In an open access environment this is no longer the case. While NMR spectroscopists certainly still see lots of this data, and will likely see a compound’s data at some point during its pharmaceutical R&D life cycle, the reality is that some incorrectly or questionably verified structures in a company’s registration database will still go on for further testing. Somewhere along the way in the evolution towards open access NMR, it became OK for compounds to get registered without being approved by an analytical expert. Of course, these aren’t being registered blindly. Chemists are approving them, and in most cases they are more than qualified to do so and are doing a good job. However, I have yet to talk to an NMR spectroscopist who has NOT seen compounds registered incorrectly.
My point, of course, is not to pick on chemists here. Sometimes these mistakes are unavoidable and the data LOOKS right. Sometimes there is nothing in the 1H NMR spectrum or the LC-MS to suggest that anything different is present. The key is to better identify when these instances arise in the registration database. Can an automated structure verification solution with NMR software replace and outperform the QC of a chemist for good in an open access environment? No, not right now anyway.
However, the key statement is in Phil’s presentation:
“Integrating a system to perform automated compound verification provides value by highlighting compounds for which structural data is complex and subject to interpretation.”
Sure, there are going to be false positives and false negatives with an automated approach. The question is, if 50 out of 1000 compounds being registered by chemists are incorrect, is there value in automated software highlighting 40 of them?
False negatives can be annoying because they require the spectroscopist to do unnecessary work on a sample that was correct all along. But other times they might point out the need to run more experiments to prove that it is indeed the right structure. Ideally ALL of the data gets manually evaluated, but in the age of open access NMR, where chemists outnumber spectroscopists 100:1 in some organizations, this is clearly no longer feasible. But is there a balance here? While it isn’t feasible to manually evaluate the data for, say, 1000 compounds, would it be feasible to manually evaluate the 300 of those 1000 samples that software has highlighted as complex or subject to interpretation?
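To put rough numbers on that trade-off, here is a minimal Python sketch that simply does the arithmetic. It reuses the illustrative figures from the two questions above (1000 registered, 50 incorrect, 40 of those highlighted, 300 highlighted in total). Treating those figures as one combined scenario is my own assumption, as are the `TriageScenario` class and its field names; none of this is measured data or part of any ACD/Labs software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriageScenario:
    """Hypothetical counts for one batch of registered compounds."""
    registered: int         # compounds registered by chemists
    incorrect: int          # of those, structures that are actually wrong
    flagged: int            # compounds the software highlights for review
    incorrect_flagged: int  # wrong structures among the highlighted ones

    def summary(self) -> dict:
        # Wrong structures the software did not highlight.
        missed = self.incorrect - self.incorrect_flagged
        # Correct structures that still land on the spectroscopist's desk.
        correct_but_flagged = self.flagged - self.incorrect_flagged
        # Compounds nobody looks at manually.
        unreviewed = self.registered - self.flagged
        return {
            "fraction_of_errors_caught": self.incorrect_flagged / self.incorrect,
            "errors_missed": missed,
            "review_workload_fraction": self.flagged / self.registered,
            "correct_but_flagged": correct_but_flagged,
            "error_rate_before": self.incorrect / self.registered,
            "error_rate_in_unreviewed": missed / unreviewed,
        }


# Illustrative numbers only (assumed to belong to a single scenario).
scenario = TriageScenario(
    registered=1000, incorrect=50, flagged=300, incorrect_flagged=40
)

for name, value in scenario.summary().items():
    print(f"{name}: {value}")
```

Under these assumed numbers, the spectroscopist would review 30% of the compounds, catch 80% of the incorrect structures, and spend time on 260 samples that were correct all along. The error rate among the compounds nobody reviews would fall from 5% to roughly 1.4%. Whether that balance is worth it depends on the organization, but it makes the question concrete.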
For those who want to do advanced reading on the topic for tomorrow’s blog entry: