Update: Robien on NMRShiftDB

If you have read my earlier post, you will be aware of Wolfgang Robien’s critique of the NMRShiftDB.

Following this critique, Tony Williams from the ChemSpider Blog and Peter Murray-Rust from the Unilever Cambridge Centre for Molecular Informatics replied to Wolfgang’s comments.

Well now, it appears that Wolfgang has responded to Tony’s comments. You can find his response here.

It appears that Wolfgang remains firm in his stance that the NMRShiftDB is not a good resource for scientists because it contains too many errors. He closes with the comment, “But: Enjoy – it’s free!”

So I have a couple of responses to Wolfgang’s comments in his follow-up (bolded parts are excerpts from other sources):


When doing this job in a more systematic way not using specific examples as given here, the total number of incorrect assignments exceeds the above mentioned limit of 250 significantly. The intermediate number is at the moment around 300, but about ca. 1,000 pages of printouts are waiting for visual inspection.

Is 300 vs. 250 errors in a dataset of over 200,000 chemical shifts SIGNIFICANT? Is a difference of 50 errors in this dataset statistically significant? That’s 0.025% of the shifts. I await Wolfgang’s final results, and then we can judge whether it is significant. Meanwhile, if he wants to challenge our findings, he should also read the document we produced comparing the prediction accuracy of ACD/CNMR Predictor and Modgraph’s NMRPredict (Robien’s CSEARCH algorithm). I think it is a good place to pick up our conversation.
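For anyone who wants to check the arithmetic, here is a quick sketch in Python. It assumes a round 200,000 shifts (the post only says "over 200,000", so the real percentages would be slightly lower), and the function name is just something I made up for illustration.

```python
# Assumption: exactly 200,000 chemical shifts; the post only says "over 200,000".
TOTAL_SHIFTS = 200_000


def error_rate_percent(errors: int, total: int = TOTAL_SHIFTS) -> float:
    """Return the number of errors as a percentage of all shifts."""
    return 100.0 * errors / total


# Error counts quoted in Wolfgang's follow-up.
earlier_limit = 250
interim_count = 300

print(f"{earlier_limit} errors: {error_rate_percent(earlier_limit):.3f}%")
print(f"{interim_count} errors: {error_rate_percent(interim_count):.3f}%")
print(f"difference: {error_rate_percent(interim_count - earlier_limit):.3f}%")
```

Swapping in the actual size of the dataset, or Wolfgang’s final error count once he has it, is a one-line change.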


“I definitely do not claim, that collections like CSEARCH, NMRPredict and SPECINFO are free of errors – the desired level of errors is always 0.0%; a value which can’t be reached – the acceptable limit is clearly below 0.1%, maybe 0.05% is good compromise between dream and reality.”

I agree. As I mentioned in my last post, while the desired level of error is 0.0%, this is a value that cannot be reached. I certainly would not claim that our prediction databases are free of error. Further, our work reveals an error rate of about 8%, in the form of mis-assignments, transcription errors, and incorrect structures, within the peer-reviewed literature we comb. To err is human.

I have to say, I am very confused by this question posed to Christoph Steinbeck:


“Why do you “reinvent” existing systems – there are a lot of systems (with much better performance !) already around (a few in alphabetical order: ACD, CSEARCH, KnowItAll, NMRPredict, SDBS, SPECINFO)”

Why reinvent existing systems? To improve! To provide better resources for NMR spectroscopists and scientists around the world! While there are certainly better-performing systems to date, there is no reason to believe that these existing systems cannot be surpassed in terms of performance. Further, open systems such as NMRShiftDB offer an alternative to institutions that do not have access to commercial products.

I think that Wolfgang is misunderstanding something here. From his writing, it seems that he feels threatened by the NMRShiftDB and is trying too hard to discredit the hard work and ideas behind this open source collection. What NMRShiftDB provides is something very different from anything the commercial products he names are offering. It is a truly open access and open source offering where scientists and spectroscopists can freely share their data and build an NMR database that is freely available to the scientific community.

It’s FREE! It’s not a commercial product like the ones he compares it to!

Christoph’s group is handling this very well and, as he himself mentions,


validations like Robien’s and the ones performed by us help make a strong case for open access and open source policy.

Finally, as I mentioned above, I can only assume that Wolfgang has not seen my blog posting that compares the results of his algorithm vs. ACD/Labs. It should make for an interesting discussion.

EDIT: This conversation has continued in the following entries (in order):

2 Replies to “Update: Robien on NMRShiftDB”

  1. Re: “Why do you reinvent existing systems – there are a lot of systems (with much better performance !) already around (a few in alphabetical order: ACD, CSEARCH, KnowItAll, NMRPredict, SDBS, SPECINFO)”
    To develop better predictive software, you need data. Unfortunately, many (or all??) of the existing algorithms and software were developed by academics collaborating with commercial database providers, with the result that their software became proprietary; i.e. in scientific terms, a waste of time (sometimes decades of work), as no other scientist can build on the work done.
    NMRShiftDB is a step towards providing data that scientists can use to develop algorithms and software that advance science.

  2. Excellent point baoilleach!
    Thanks for the comment. I completely agree with you about providing data that scientists can use to develop algorithms and software to advance science.
    It is a major advantage provided by open source systems…again, something that the commercial systems that Robien describes do not provide to the public!


