Dereplication? Version 11- Searching PubChem using ACD/Structure Elucidator

This resource will likely be useful only to readers who currently use ACD/Structure Elucidator. However, I do have two questions for my general audience, and I would greatly appreciate your comments.

First things first, a technical note has been created to explain how to install and search the PubChem Database. It can be downloaded here:

You’ll notice that the introductory paragraph of this technical note uses the term "dereplication" to describe the process of searching a spectral database with the NMR data of an "unknown" prior to elucidation.

Can this process REALLY be called Dereplication?

How do you define the term, "dereplication"?

The first hit from Google (not necessarily the most accurate) provides the following definition:


the process of testing samples of mixtures which are active in a screening process, so as to recognize and eliminate from consideration those active substances already studied; – a stage subsequent to the preliminary screening in the process of discovery of new pharmacologically active substances in mixtures of natural products; – also called counterscreening.

I think this is a reasonable definition based on my understanding of the process.

A few years back, I did quite a bit of research, talking to natural products scientists to try to get to the bottom of this idea. I got a variety of views on the topic, on the definition of the term, and on its applications. A common response was something like:

"If it is an unknown compound, and I am able to use spectral databases to identify known compounds prior to elucidation, I’d call that dereplication"

And there was this value-added comment:

"If I am able to identify known compounds using an NMR search method, and avoid even two repeat elucidations per year, that’s incredibly valuable"

Probably the best and most comprehensive responses I got were:

"Usually dereplication is done as early as possible in the process. If you have already isolated the compound to NMR purity most of the costs have already been incurred. Typically LC-MS and/or LC-UV on an early crude subsample is the most cost effective. However some pharma use LC-NMR, and using this technique is where NMR database searching can reap rewards."

"Dereplication is done on only a small crude sub-sample of the organism/extract long before large-scale isolation by chromatography is performed. Dereplication only makes sense at an early enough stage in natural product discovery to prevent the high cost isolation chemistry from being undertaken. Hence, only early stage dereplication makes business sense. It would be better to classify dereplication based on the hyphenated techniques. Your late-stage dereplication is really known structure matching/identification – a worthwhile and necessary pursuit as dereplication is NEVER perfect."

"Searching databases by NMR prior to an elucidation represents dereplication in only some laboratory instances. This workflow would work very well in those laboratories that employ LC-NMR as a tool for the separation of natural product extracts. A fraction’s NMR spectrum can automatically be searched in a database to identify isolates that contain known compounds. However, this type of analysis is not done in all research labs. Therefore, without this type of analysis, the major costs of natural products research are generally incurred prior to NMR analysis. NMR is introduced as an elucidation tool after separation and purification. A combination of LC-MS and LC-UV, for example, can be used effectively for dereplication purposes as MS can provide an accurate mass and structural information and UV can provide insight on existing chromophores and a compound’s structure."

It appears that hyphenated techniques are likely the key to dereplication.
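To make the early-stage LC-MS idea a little more concrete, here is a minimal Python sketch of an accurate-mass lookup: an observed [M+H]+ value is compared against a table of known compounds within a ppm tolerance. Everything in it is an assumption for illustration only. The compound names and masses in `KNOWN_COMPOUNDS` are hypothetical placeholders, the 5 ppm default is just a common-sense starting value, and the function names are my own. It is not how any particular vendor package does this.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da, added for an [M+H]+ ion

# Hypothetical library: name -> neutral monoisotopic mass (Da).
# These entries are placeholders, not curated reference data.
KNOWN_COMPOUNDS = {
    "known_A": 194.0804,
    "known_B": 324.1838,
    "known_C": 194.0790,
}


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_by_mass(observed_mz, library, tol_ppm=5.0):
    """Return (name, ppm error) for library entries whose [M+H]+ m/z
    lies within tol_ppm of the observed value, closest match first."""
    hits = []
    for name, neutral_mass in library.items():
        theoretical_mz = neutral_mass + PROTON_MASS
        err = ppm_error(observed_mz, theoretical_mz)
        if abs(err) <= tol_ppm:
            hits.append((name, round(err, 2)))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Hypothetical observed ion from a crude extract.
mass_hits = match_by_mass(195.0877, KNOWN_COMPOUNDS)
print(mass_hits)
```

A hit here only says "a known compound with this mass exists"; in practice the UV and retention data mentioned above would be needed to narrow things down.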

Some good work has been done here using LC-UV-MS, for example:

Merck in NJ:

Microbial Screening Technologies developed an in-house, metabolite recognition software called COMET that compiles and analyses co-metabolite patterns in natural product mixtures:

Dereplication using LC-NMR:

The late, great John Faulkner once said in a Philosophical Basis for Structure Elucidation:

"The problem with using NMR for dereplication is that no reliable method of searching NMR spectral libraries has yet been devised, although there have been some attempts to construct and search 13C NMR libraries. It is possible that computers will, in the future, be able to recognize and compare the patterns that are found in NMR spectra but that seems a long way off."

I think that computers can do this now. But what to call it?
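As a rough illustration of what "searching an NMR library" can mean, here is a minimal Python sketch that compares a list of 13C chemical shifts against library entries and scores the overlap. It is a toy under stated assumptions: the library entries and shift values are hypothetical, the 1.0 ppm tolerance and 0.8 score cutoff are arbitrary, and the greedy nearest-peak matching is a simplification I chose for brevity. It is not the algorithm used by ACD/Structure Elucidator or any other product.

```python
# Hypothetical library: name -> 13C chemical shifts in ppm (placeholder values).
SHIFT_LIBRARY = {
    "known_X": [14.1, 22.7, 31.9, 128.5, 170.2],
    "known_Y": [20.5, 55.3, 77.0, 114.2, 159.8],
}


def match_score(query, reference, tol=1.0):
    """Fraction of peaks matched within tol ppm, using each reference
    peak at most once (greedy nearest-peak assignment)."""
    if not query or not reference:
        return 0.0
    remaining = sorted(reference)
    matched = 0
    for shift in sorted(query):
        if not remaining:
            break
        closest = min(remaining, key=lambda ref: abs(ref - shift))
        if abs(closest - shift) <= tol:
            matched += 1
            remaining.remove(closest)
    # Dividing by the larger peak count penalises missing or extra peaks.
    return matched / max(len(query), len(reference))


def search_library(query, library, tol=1.0, min_score=0.8):
    """Return (name, score) for entries scoring at least min_score, best first."""
    scored = [(name, match_score(query, shifts, tol)) for name, shifts in library.items()]
    hits = [hit for hit in scored if hit[1] >= min_score]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical shifts picked from the spectrum of an "unknown".
unknown_shifts = [14.0, 22.9, 31.7, 128.9, 170.0]
nmr_hits = search_library(unknown_shifts, SHIFT_LIBRARY)
print(nmr_hits)
```

Whether a hit from a search like this counts as dereplication or simply as known-structure identification is exactly the question I am asking.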

Can an NMR DB search be termed dereplication?

Should it be positioned only toward those people who use LC-NMR?

What do we call the discovery of known compounds by NMR prior to structure elucidation?

Provide your own thoughts on this.