
NMRShiftDB, ACD/Labs, and Modgraph

May 23, 2007

NMRShiftDB is an open source collection of chemical structures and their associated NMR shift assignments. The database is built from contributions by the public and currently contains 19,958 structures with 214,136 assigned carbon chemical shifts. It turns out it is also a GREAT resource for evaluating the accuracy of NMR predictions.

Several weeks ago, Wolfgang Robien performed a quality check of the NMRShiftDB by comparing the experimental values posted in this database with values predicted by the CSEARCH algorithm. This algorithm is currently used in NMRPredict, a commercial product provided by Modgraph Consultants Ltd.

Because Robien shared the results of his study online, we were able to compare the performance of his algorithm directly with the prediction algorithms within ACD/CNMR Predictor.

The details can be found in the document at the bottom of this entry, but the conclusions are easy to draw. On what we believe is an unbiased, statistically relevant dataset with sufficient structural diversity, ACD/CNMR Predictor vastly outperforms the prediction accuracy of the CSEARCH algorithm. The average deviation in ACD/CNMR Predictor was 1.59 ppm compared to an average deviation of 2.22 ppm in CSEARCH.
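For readers who want to reproduce this kind of comparison, the statistic itself is simple. The minimal Python sketch below assumes that "average deviation" means the mean absolute difference between predicted and experimental shifts, pooled over all assigned carbons; the PDF may define or weight it differently. The function name `average_deviation` and the shift values are hypothetical toy numbers, not NMRShiftDB data.

```python
from statistics import mean


def average_deviation(experimental, predicted):
    """Mean absolute deviation (ppm) between paired experimental and predicted shifts.

    Assumption: all assigned carbons are pooled into one average,
    rather than averaging per structure first.
    """
    if len(experimental) != len(predicted):
        raise ValueError("shift lists must be the same length")
    if not experimental:
        raise ValueError("need at least one assigned shift")
    return mean(abs(e - p) for e, p in zip(experimental, predicted))


# Hypothetical toy data (NOT from NMRShiftDB): three carbon shifts in ppm
exp_shifts = [128.5, 77.0, 21.3]
pred_shifts = [129.1, 75.8, 21.3]
print(round(average_deviation(exp_shifts, pred_shifts), 2))  # prints 0.6
```

Whether the average is pooled over atoms or taken per structure can shift the headline number, so it is worth checking that two predictors are scored the same way before comparing them.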

The document also highlights a separate validation study we performed that considers the degree of overlap between the structures in the training set of ACD/CNMR Predictor and the validation set of NMRShiftDB. This study provides a true measure of the performance of ACD/CNMR Predictor for novel chemical shifts. Robien did not conduct a similar study (or at least has not published one to date), so we do not know the degree of overlap or the accuracy of his predictions for novel chemical shifts.
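An overlap-aware validation of this kind can be sketched as follows: split the validation records by whether their structure also occurs in the predictor's training set, then report the average deviation separately for each group. This sketch assumes overlap is judged by exact identity of a structure key (the hypothetical `structure_key` field, which could hold something like a canonical structure string); the procedure in the PDF may use a different criterion. The record values and the names `ShiftRecord`, `split_by_overlap` and `mad` are invented for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ShiftRecord:
    structure_key: str   # hypothetical structure identifier
    experimental: float  # experimental carbon shift, ppm
    predicted: float     # predicted carbon shift, ppm


def split_by_overlap(records, training_keys):
    """Partition records into (in training set, novel) by exact structure key match."""
    overlap = [r for r in records if r.structure_key in training_keys]
    novel = [r for r in records if r.structure_key not in training_keys]
    return overlap, novel


def mad(records):
    """Mean absolute deviation in ppm; NaN for an empty group."""
    if not records:
        return float("nan")
    return mean(abs(r.experimental - r.predicted) for r in records)


# Invented example: mol-A is in the training set, mol-B is not
records = [
    ShiftRecord("mol-A", 128.5, 128.4),
    ShiftRecord("mol-A", 21.3, 21.2),
    ShiftRecord("mol-B", 170.2, 172.2),
]
training_keys = {"mol-A"}
overlap, novel = split_by_overlap(records, training_keys)
print(len(overlap), len(novel))
print(round(mad(overlap), 2), round(mad(novel), 2))
```

Reporting the "novel" group on its own is the point: shifts for structures the predictor has already seen tend to look better than what a user would get on new compounds.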

Download the PDF document here. (Updated July 5th, 2007)

Stay tuned: read the document and feel free to comment below. There is much to discuss around this…

EDIT: This conversation has continued in the following entries (in order):

http://acdlabs.typepad.com/my_weblog/2007/05/how_accurate_sh.html

http://acdlabs.typepad.com/my_weblog/2007/05/update_robien_o.html

http://acdlabs.typepad.com/my_weblog/2007/05/more_dialogue_o.html

http://acdlabs.typepad.com/my_weblog/2007/06/robiens_and_mod.html

http://acdlabs.typepad.com/my_weblog/2007/06/note-from-an-nm.html

http://acdlabs.typepad.com/my_weblog/2007/06/the_purgatory_d.html

http://acdlabs.typepad.com/my_weblog/2007/07/final-note-on-t.html

5 Replies to “NMRShiftDB, ACD/Labs, and Modgraph”

  1. I’ll add that the availability of the NMRShiftDB for this type of analysis is a true public service, given how rare a diverse database of this size is. I’ve blogged elsewhere (http://www.chemspider.com/blog/?p=13) that it is not perfect, but NO database of this size is free of errors. The literature is loaded with poor assignments and, without care, they carry into the database. This is a high quality test set! So, thanks to the NMRShiftDB team. At least two groups have been able to analyze the performance of NMR prediction algorithms with it. Are there others out there???

  2. Tony makes an excellent point. The NMRShiftDB team should absolutely be acknowledged here (and they are, within the document).
    Bottom line: commercial vendors can conduct comparisons using databases they create themselves, but it is difficult to keep those comparisons free from bias. As mentioned, the NMRShiftDB is populated by scientists and spectroscopists worldwide. This is good data.
    The NMRShiftDB is a terrific resource for the validation and comparison of different NMR predictors. I encourage others to experiment, and I’ll be happy to share the results here (provided it is done right and overlap is acknowledged). I think the document provided in the entry above is a great way to evaluate the performance of different NMR predictors…and it’s easy to do.

  3. I am Jeff Seymour, Marketing Manager of Modgraph Consultants and I am responding to the articles on your BLOG which started on May 23rd http://www.acdlabs.typepad.com/my_weblog/2007/05/nmrshiftdb_acdl.html comparing ACD/CNMR Predictor to NMRPredict.
    In the PDF document which is attached to the May 23rd BLOG you state “we have (had) an opportunity to compare performance with another commercial product, NMRPredict provided by Modgraph Consultants, Ltd” and conclude that ACD “significantly outperforms the algorithms of Robien”.
    In your BLOG of May 30th http://acdlabs.typepad.com/my_weblog/2007/05/update_robien_o.html you state “ACD/CNMR Predictor vastly outperforms the prediction accuracy of the CSEARCH algorithm. The average deviation in ACD/CNMR predictor was 1.59 ppm compared to an average deviation of 2.22 ppm in CSEARCH.”
    The problem is that you were looking at prediction values from Wolfgang Robien’s web page from March 12th http://nmrpredict.orc.univie.ac.at/csearchlite/enjoy_its_free.html and you assumed that Wolfgang was using NMRPredict to generate his results. WOLFGANG WAS NOT USING NMRPREDICT IN HIS ARTICLE, as he clearly stated.
    Wolfgang’s CSEARCH program can (as with ACD) use both HOSE code databases and Neural Network technology for its predictions. Wolfgang has over 750,000 data points at his disposal. The most accurate predictions will always come from using a combination of well verified HOSE code databases and Neural Network technology.
    In his original paper, Wolfgang used only a Neural Network from 1996; no HOSE code databases were used. This is not surprising. Wolfgang’s intention in his article was clearly to demonstrate how, in a few hours, he could find glaring errors in the NMRShiftDB database. His intention was not, as you seem to have assumed, to show how accurate his predictions could be.
    Wolfgang’s CSEARCH program is indeed the basis of NMRPredict. However, together with Wolfgang, we have added significant enhancements, such as ‘auto-stereo recognition’, different utilization of solvent-dependent predictions and ‘BEST selection’, which are not available in CSEARCH.
    NMRPredict uses a database of over 345,000 records and a Neural Network, and also includes a ‘BEST selection’ routine that chooses, for each carbon atom, whether to use the HOSE code value or the Neural Network value.
    We have now re-run the NMRShiftDB database using the NMRPredict program and obtained an average deviation of 1.40 ppm, compared to 1.59 ppm for ACD/CNMR Predictor 10.5. Details can be found at http://www.modgraph.co.uk/product_nmr.htm, http://www.modgraph.co.uk/product_nmr_shiftdb.htm and http://nmrpredict.orc.univie.ac.at/csearchlite/Robien2Ryan_May31_2007.html
    In conclusion, using a set of data which you described as being “of size and quality to serve as a fair and reliable validation set to evaluate the performance of ACD/CNMR Predictor in terms of accuracy of NMR prediction” the outcome is that NMRPredict vastly outperforms the prediction accuracy of ACD/CNMR Predictor rather than the other way round. The average deviation in NMRPredict was 1.40 ppm compared to an average deviation of 1.59 ppm in ACD/CNMR Predictor version 10.5, already compensating for your somewhat smaller structural overlap.
