
Linking to Meaningful Data in an ELN World

In a previous post, I asked the question: why do paper spectra still persist in chemistry?

Of course, there is a next challenge, as Rich Apodaca points out in an earlier post on his Depth-First blog:

The previous article in this series, suggested that the same dynamic applied to the compilation, management, and sharing of spectral data by chemists. More to the point:

… cheminformatics has failed to deliver an inexpensive, robust, and truly usable solution to the problem of compiling, managing, and sharing spectral data for scientists of average computer skills. …

To be sure, there are tools that address parts of the problem. But no solution addresses them all and that’s why scientists and publishers resort to using obviously inferior solutions like PDFs.

Whether organizations and groups are truly resorting to inferior solutions is debatable, since that depends on the expectations of the end user. But his comments definitely struck a chord with me.

So the next question is:

"What is the best way to connect my analytical data to my ELN records TODAY?"

By far, the most common way that I have seen organizations connect the analytical data from our software to ELNs is via PDF.

But as Rich mentions in yet another post, for people looking to build on experiments, or to model or compile the results, static PDF images are practically useless.

I couldn’t agree more.

So why do organizations choose this route?

The three biggest reasons I have heard are:

  1. File size limitations in the ELN
  2. The lack of a standard and supported analytical data format that is generic, open, lockable, and widely supported for years to come.
  3. Currently, PDF is better controlled for long-term (legacy) support than native analytical data.

As a result, PDF is the only reasonable approach for many, and it is certainly better than not connecting to a record of the data at all.

I think the key is for vendors to work horizontally and to combine their strengths to deliver, in Rich's words,

an inexpensive, robust, and truly usable solution to the problem of compiling, managing, and sharing spectral data for scientists of average computer skills.

But the file format remains an issue.

Work by the ASTM E13.15 Committee has been ongoing for the past 5-6 years toward a universal analytical data file format. This format is called AnIML (Analytical Information Markup Language), the developing XML standard for analytical chemistry data. Most vendors support the general direction of ASTM E13.15 toward a universal format for analytical data.

A final note on the role of MEANINGFUL data in an electronic world. When I refer to meaningful data, I mean knowledge gained and stored in an actual data file, as opposed to a static PDF. One of the unique features that ACD/Labs has maintained over the years is the ability to electronically assign NMR data to chemical structures, capturing not only the data but also the knowledge gained from the experiment. I think not leveraging this knowledge is an awful shame, especially in an electronic world, but I expect it will come.
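To make the difference between a picture of a spectrum and an assigned spectrum concrete, here is a minimal Python sketch of what an assigned 1H NMR record might look like as data. The class names, fields, and the convention of assigning protons to the 0-based heavy-atom index in a SMILES string are hypothetical choices for illustration only; this is not ACD/Labs' file format and not AnIML. The ethanol shifts are approximate, illustrative values.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PeakAssignment:
    """One observed peak and the structure atoms it has been assigned to."""
    shift_ppm: float         # observed chemical shift
    multiplicity: str        # e.g. "s", "d", "t", "q", "m"
    integral: float          # relative number of nuclei under the peak
    atom_indices: list[int]  # hypothetical convention: 0-based heavy-atom indices in the SMILES


@dataclass
class AssignedSpectrum:
    """A spectrum plus the knowledge gained from interpreting it (hypothetical schema)."""
    sample_id: str
    smiles: str
    nucleus: str
    solvent: str
    peaks: list[PeakAssignment] = field(default_factory=list)

    def unassigned(self) -> list[PeakAssignment]:
        # Peaks with no atoms attached, i.e. where the interpretation is incomplete.
        return [p for p in self.peaks if not p.atom_indices]

    def total_integral(self) -> float:
        # Sum of relative integrals, useful for checking against the expected proton count.
        return sum(p.integral for p in self.peaks)

    def to_json(self) -> str:
        # Machine-readable text that could, in principle, be attached to an ELN entry.
        return json.dumps(asdict(self), indent=2)


# Illustrative ethanol (SMILES "CCO") 1H assignment; shifts are approximate.
ethanol = AssignedSpectrum(
    sample_id="EXP-001",  # hypothetical ELN sample identifier
    smiles="CCO",
    nucleus="1H",
    solvent="CDCl3",
    peaks=[
        PeakAssignment(1.22, "t", 3.0, [0]),  # CH3 protons on atom 0
        PeakAssignment(3.69, "q", 2.0, [1]),  # CH2 protons on atom 1
        PeakAssignment(2.61, "s", 1.0, [2]),  # OH proton on atom 2 (shift varies)
    ],
)

print(ethanol.total_integral())
print(len(ethanol.unassigned()))
print(ethanol.to_json())
```

The point of the sketch is that the assignment itself (which peak belongs to which atoms) survives as data that software can query or check, for example confirming that the integrals add up to the six protons of ethanol, which a static image of the spectrum cannot offer.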

Right now, while it is common for NMR spectroscopists to assign their data electronically, it is very rare to find a group of chemists in the pharmaceutical industry, for example, who routinely use their processing tools to assign their data. Why?

  1. They might not have the right software tools
  2. It is not required. In fact, in some cases I have learned that it is forbidden. Why spend the time it takes to assign the data if it is not required or permitted?

A static PDF is indeed proof that an experiment was run, but does it contain information that supports a proof of the proposed structure? Where is the knowledge that was gained from this exercise?

I think 1D NMR Assistant significantly reduces the time it takes to electronically assign a spectrum, so now it is just a matter of finding an easy way to tie this assigned analytical data to the ELN.

I think there is a real opportunity here.

What are your thoughts?

Would you prefer electronic data over PDFs?

Is simply raw or processed data enough?

How important is maintaining the knowledge gained from the experiment (i.e. assignments)?

Thanks to Rich for the multiple inspirations for this and previous posts.