May 30, 2007
If you have read my earlier post, you will be aware of Wolfgang Robien’s critique of the NMRShiftDB.
Following this critique, Tony Williams from the ChemSpider Blog and Peter Murray-Rust from the Unilever Cambridge Centre for Molecular Informatics replied to Wolfgang’s comments.
Well now, it appears that Wolfgang has responded to Tony’s comments. You can find his response here.
It appears that Wolfgang remains firm in his stance that the NMRShiftDB is not a good resource for scientists because it contains too many errors. He continues with the comment, “But: Enjoy – it’s free!”
So I have a couple of responses to Wolfgang’s comments in his follow-up (bolded parts are excerpts from other sources):
“When doing this job in a more systematic way not using specific examples as given here, the total number of incorrect assignments exceeds the above mentioned limit of 250 significantly. The intermediate number is at the moment around 300, but about ca. 1,000 pages of printouts are waiting for visual inspection.”
Is 300 vs. 250 errors in a dataset of over 200,000 chemical shifts SIGNIFICANT? Is a difference of 50 errors in this dataset statistically significant? That’s 0.025%. I await Wolfgang’s final results, and then we can judge whether it is significant. Meanwhile, he should also read the document we produced comparing the prediction accuracy of ACD/CNMR Predictor and Modgraph’s NMRPredict (Robien’s CSEARCH algorithm) if he wants to challenge our findings. I think it is a good place to pick up our conversation.
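For anyone who wants to check the arithmetic behind those percentages, here is a quick sketch in Python. It assumes a round figure of exactly 200,000 shifts (the dataset is only described as "over 200,000", so the true rates would be slightly lower), and the function name `error_rate_percent` is just an illustrative choice.

```python
# Rough check of the error rates discussed above.
# ASSUMPTION: the dataset is treated as exactly 200,000 chemical shifts;
# the post only says "over 200,000", so real rates are a little lower.
TOTAL_SHIFTS = 200_000


def error_rate_percent(errors: int, total: int = TOTAL_SHIFTS) -> float:
    """Return the number of errors as a percentage of the total entries."""
    return 100 * errors / total


original_claim = error_rate_percent(250)   # the earlier limit of 250 errors
revised_claim = error_rate_percent(300)    # the intermediate count of ~300
difference = error_rate_percent(300 - 250)

print(f"250 errors: {original_claim:.3f}%")  # 0.125%
print(f"300 errors: {revised_claim:.3f}%")   # 0.150%
print(f"difference: {difference:.3f}%")      # 0.025%
```

Under that assumption, the extra 50 errors move the overall rate from 0.125% to 0.150%, which is the 0.025% difference quoted above.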
“I definitely do not claim, that collections like CSEARCH, NMRPredict and SPECINFO are free of errors – the desired level of errors is always 0.0%; a value which can’t be reached – the acceptable limit is clearly below 0.1%, maybe 0.05% is good compromise between dream and reality.”
I agree. As I mentioned in my last post, while the desired level of error is 0.0%, this is a value that cannot be reached. I certainly would not claim that our prediction databases are free of error. Further, our work reveals about 8% errors in the form of mis-assignments, transcription errors, and incorrect structures within the peer-reviewed literature we comb. Error is part of human nature.
Let me say, I am very confused by this question posed to Christoph Steinbeck:
“Why do you “reinvent” existing systems – there are a lot of systems (with much better performance !) already around (a few in alphabetical order: ACD, CSEARCH, KnowItAll, NMRPredict, SDBS, SPECINFO)”
Why reinvent existing systems? To improve! To provide better resources for NMR spectroscopists and scientists around the world! While there are certainly better-performing systems to date, there is no reason to believe that these existing systems cannot be surpassed in terms of performance. Further, new systems like NMRShiftDB offer an alternative to those institutions that do not have access to commercial products.
I think that Wolfgang is misunderstanding something here. From his writing, it seems that he feels threatened by the NMRShiftDB and is trying too hard to discredit the hard work and ideas behind this open source collection. What NMRShiftDB provides is very different from anything the commercial products he names offer. It is a truly open access and open source offering where scientists and spectroscopists can freely share their data and build an NMR database that is freely available to the scientific community.
It’s FREE! It’s not a commercial product like the ones he compares it to!
Christoph’s group is handling this very well and, as he says himself,
“validations like Robien’s and the ones performed by us help make a strong case for open access and open source policy.”
Finally, as I mentioned above, I can only assume that Wolfgang has not seen my blog posting that compares the results of his algorithm vs. ACD/Labs. It should make for an interesting discussion.
EDIT: This conversation has continued in the following entries (in order):