
Robien’s and Modgraph’s Response on NMR Prediction Validation

Wolfgang Robien and Modgraph Consultants, Ltd. have responded to my latest comments and findings on the evaluation of our NMR predictions vs. the NMRShiftDB. 

Let me begin by noting that, within the last 24 hours, ACD/Labs
was informed by Modgraph representatives that my claims on this blog regarding
CSEARCH and Modgraph’s NMRPredict were not clear. It has since been explained
that while Wolfgang’s CSEARCH program is indeed the basis of NMRPredict, the
commercial product includes different underlying databases and data files, as
well as additional enhancements such as ‘auto-stereo recognition’, different
utilization of solvent-dependent predictions and ‘BEST selection’, which MAY
not be available in CSEARCH.

It turns out that Robien’s average deviation of 2.22 ppm was
based on his CSEARCH algorithm, so you may notice that in my previous entries
I have struck out any inaccurate mentions of the NMRPredict product. If you read
Robien’s original posting, you might be confused in the same way I was, since
in his initial study (the study my response was based on) he referred to
CSEARCH and NMRPredict as one and the same. In his comparison table, he
directly compares CSEARCH/NMRPredict with NMRShiftDB.

Additional comments from Robien:

“A few facts about CSEARCH/NMRPredict and NMRShiftDB”
(notice he is linking to the NMRPredict product description on the Modgraph website!!!)

“My cooperation partners, where either CSEARCH-spectra are available or
CSEARCH-technology has been implemented, also within CSEARCH/NMRPredict other
collection like SPECINFO has been implemented”

Keep in mind that this is my personal blog, and I have taken
on the responsibility of informing the public about news and events in the
world of NMR software. I read the text above as written by Robien and drew my
own conclusions from it. It was not clear from Wolfgang’s original article
that he was not using Modgraph’s NMRPredict. Had he said so, it would likely
have been less confusing, especially since his web pages contain several
mentions of, and links to, NMRPredict and Modgraph.

ACD/Labs has also made the appropriate revisions to the official validation document that I posted
on my blog. The revised version is here. 

I want to stress that although this document mistakenly claimed
a comparison with Modgraph’s NMRPredict, I believe it still represents valid
practice for evaluating prediction accuracy. Transparency was the goal of this
validation study, and I truly believe it represents a fair and unbiased way to
validate prediction accuracy on an independent data set.

Meanwhile, Robien has collaborated with Modgraph Consultants, Ltd.
to produce what they believe is a TRUE prediction accuracy evaluation of the
NMRPredict product.

His findings reveal an overall average deviation of 1.40 ppm
(compared with the ACD/CNMR Predictor deviation of 1.59 ppm).

So has Modgraph now definitively proven that it is indeed “the
most accurate carbon 13 NMR predictor in an independent evaluation”?

The numbers suggest so, but please pay attention to the
details within the study.   

“For this evaluation the combined databases from CSEARCH and SPECINFO holding a total of 345,308 reference spectra were
used. Based on this higher number of reference spectra a somewhat higher
structural overlap between our databases and the NMRShiftDB-test data has been
detected.”
 

In our document, we clearly state the structural overlap
(57%) between our prediction database and the NMRShiftDB. Modgraph does not
provide this number, yet it is extremely important to know, is it not? They
mention a “somewhat higher structural overlap”, but what is it?

They base their final results on the following: 

“In order to compensate for
this, we have recalculated our overall average deviation of 1.40 ppm using the
lower structural overlap as detected by ACD. The value of 1.40 ppm corresponds
to 92,927 known carbon environments and 121,209 unknown carbon environments –
without this compensation our overall average deviation would be slightly
better, but a comparison with ACD’s results would be impossible.”
 

I do not believe that this is a very scientific way to
produce these results, and it is certainly not valid practice to compare them
directly with the results produced by ACD/CNMR Predictor. Which compounds (or
chemical shifts) did they remove for their analysis?

For example, if their database had 80% overlap with the
NMRShiftDB, which chemical shifts would they remove from the analysis to bring
the overlap down to 57%, as they must have done to produce their final
deviation of 1.40 ppm?
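To make the question concrete, here is a minimal Python sketch of the arithmetic such a “compensation” might involve. It assumes, and this is only my assumption since Modgraph has not described its procedure, that the overall figure is a weighted mean of the average deviation on known (overlapping) carbon environments and on unknown (novel) ones, reweighted to a chosen overlap fraction. The function name and the deviation values are hypothetical placeholders, not numbers from either study.

```python
def reweighted_average_deviation(dev_known: float, dev_unknown: float,
                                 overlap_fraction: float) -> float:
    """Hypothetical reweighting: combine per-class average deviations (ppm)
    into one overall figure, weighting known environments by overlap_fraction
    and unknown environments by the remainder."""
    if not 0.0 <= overlap_fraction <= 1.0:
        raise ValueError("overlap_fraction must be between 0 and 1")
    return overlap_fraction * dev_known + (1.0 - overlap_fraction) * dev_unknown


# Placeholder per-class deviations in ppm; NOT taken from either study.
DEV_KNOWN = 0.8
DEV_UNKNOWN = 2.0

for f in (0.80, 0.57):
    overall = reweighted_average_deviation(DEV_KNOWN, DEV_UNKNOWN, f)
    print(f"overlap {f:.0%}: {overall:.2f} ppm")
```

With these made-up inputs, the same per-class accuracy gives 1.04 ppm at 80% overlap and 1.32 ppm at 57%, which is why the actual overlap has to be stated. The sketch also assumes the known environments being down-weighted are representative of all known environments; if specific shifts are dropped instead, the result depends on exactly which ones are removed.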

Why not state what the overlap with NMRShiftDB is, and then
perform a validation on completely novel chemical shifts, as we have done?
Transparency and valid science were our priorities when creating the
validation document. As a result, we stated the database overlap up front and
conducted two separate studies, with and without the overlapping data, to
inform the public of our performance under both circumstances.

So before we can make a final decision on performance, I
think Modgraph needs to make very clear the following: 

  1. What is the overlap between NMRShiftDB and Modgraph’s NMR prediction databases?
    Further, with several different database sources, how much data is duplicated
    across the databases and within the entire package?

  2. Once that overlap is removed from the dataset, what is the final deviation
    produced by NMRPredict?
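For illustration only, the numbers asked for above can be computed with a few lines of code. This minimal Python sketch assumes each test-set carbon is a record holding a structure identifier, an experimental shift and a predicted shift, that each source database can be represented as a set of structure identifiers, and that “average deviation” means the mean absolute difference between predicted and experimental shifts. The class, function names and data are hypothetical, and structure matching is reduced to identifier equality, a simplification of how overlap is actually detected.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CarbonShift:
    structure_id: str    # hypothetical identifier of the test structure
    experimental: float  # experimental 13C shift (ppm)
    predicted: float     # predicted 13C shift (ppm)


def average_deviation(shifts):
    """Mean absolute deviation (ppm) between predicted and experimental shifts."""
    return sum(abs(s.predicted - s.experimental) for s in shifts) / len(shifts)


def overlap_report(test_set, sources):
    """sources maps a (hypothetical) source name to its set of structure IDs."""
    database = set().union(*sources.values())
    test_ids = {s.structure_id for s in test_set}
    # Question 1: share of test structures already present in the database.
    overlap = len(test_ids & database) / len(test_ids)
    # Question 1, continued: structures duplicated across source databases.
    counts = Counter(sid for ids in sources.values() for sid in ids)
    duplicated = sum(1 for n in counts.values() if n > 1)
    # Question 2: deviation after removing overlapping structures.
    novel = [s for s in test_set if s.structure_id not in database]
    return {
        "structural_overlap": overlap,
        "duplicated_across_sources": duplicated,
        "deviation_all": average_deviation(test_set),
        "deviation_novel_only": average_deviation(novel) if novel else None,
    }


# Toy, made-up data purely to exercise the functions.
test = [
    CarbonShift("A", 128.5, 128.9),
    CarbonShift("A", 21.3, 21.0),
    CarbonShift("B", 170.2, 168.2),
    CarbonShift("C", 55.0, 57.5),
]
sources = {"db1": {"A", "X"}, "db2": {"A", "B"}}
report = overlap_report(test, sources)
print(report)
```

On the toy data, two of the three test structures are in the database, one structure appears in both sources, and the deviation on all carbons (1.3 ppm) differs from the deviation on the novel structure alone (2.5 ppm). Reporting both figures side by side is the kind of transparency I am asking for.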

I think Modgraph needs to make this information very clear
before it can claim to be “the most accurate carbon 13 NMR predictor in an
independent evaluation”.

We worked very hard to create a best practice for the
validation of prediction accuracy on an independent data set. It is
disappointing that Modgraph did not follow this example when comparing its
results with ours. Perhaps there is a better way; we are open to suggestions.

I hope that we are able to compare performance in a
truly fair and reliable way, so as to provide the public with the correct
information!

Finally, Jeff Seymour, Marketing Manager for Modgraph
Consultants, Ltd. has issued a comment on my blog here:

http://acdlabs.typepad.com/my_weblog/2007/05/nmrshiftdb_acdl.html#comment-71790940

I want to highlight one comment specifically:

“The average deviation in
NMRPredict was 1.40 ppm compared to an average deviation of 1.59 ppm in
ACD/CNMR Predictor version 10.5, already compensating for your somewhat smaller
structural overlap.”

Again, I ask: what is the structural overlap, and once you
account for it by removing it from the test set, what is your average
deviation?

These numbers have NOT been published, and therefore this
statement is dubious for the time being.

More information on Robien’s findings can be found at the
link below:

http://nmrpredict.orc.univie.ac.at/csearchlite/Robien2Ryan_May31_2007.html

Modgraph’s announcement is here:

http://www.modgraph.co.uk/product_nmr_shiftdb.htm

EDIT: This conversation has continued in the following entries (in order):

http://acdlabs.typepad.com/my_weblog/2007/06/note-from-an-nm.html

http://acdlabs.typepad.com/my_weblog/2007/06/the_purgatory_d.html

http://acdlabs.typepad.com/my_weblog/2007/07/final-note-on-t.html