Helping Analytical Chemistry Embrace Big Data

Published online at Technology Networks, August 2018.

Technology Networks published an article on data integrity, which includes commentary from ACD/Labs' Andrew Anderson and Graham McGibbon.

An excerpt from the article is included below:

"If you don't have correct data then it's pretty much unusable by anybody downstream, including yourself, for anything that you originally intended it for," says Andrew Anderson, Vice President of Innovation and Informatics Strategy at Toronto-based analytical software supplier ACD/Labs. Anderson suggests that this need for correctness is now being recognized at the beginning of the data life cycle – and the end: "Organizations like the Food and Drug Administration require pharmaceutical companies and drug manufacturers to have safe, efficacious and quality drugs and the data that they supply for characterizing those drugs has to meet guidelines according to data integrity. There's both the pragmatic impetus right from the get-go and at the end. What is the expectation if you're going to bring a product to market that is supposed to benefit people? If it's not what it's supposed to be, there could be really serious consequences."

Anderson's view is that data integrity is important at all stages of the research pipeline, from design to drug. This perspective has become vital as advances in technology enable data to be recorded from more sources in larger volumes: "One of the trends in industrial innovation is utilizing what we would call the secondary or tertiary value that you would get from data. Historically, if you look at how analytical data is leveraged within industry, it's question and answer, input and output. What people have recognized is that by having data you can infer trends, you can apply and use data for training sets, or things like predictive analytics, machine learning and the like. If I'm using analytical data to release a substance to be used in a pharmacy setting or in a commercial setting, that released data is used to give a green light to say, yes, you can release the batch for its intended use. If you store that data right on every batch that's ever been released, you can look at trends, and infer operational optimization decision making – do I see any trends in how quality at one site differs from another, for example?"

With these potential benefits available, it's surprising that analytical chemistry has been slower than other fields to embrace big data techniques, with available datasets and algorithms often not up to the task of analyzing complex chemical data. Anderson's colleague Graham McGibbon, ACD/Labs' Director of Strategic Partnerships, says that the complexity and volume of data are the biggest obstacles to simply adopting automation techniques: "You have optical spectra across ranges of wavelengths, you have experiments performed not just for the certain sampling frequency but across all frequencies. It takes time to run them: a chromatography run could take half an hour. If you're acquiring data for that entire half hour and you have a mass spectrometer attached, there could be thousands or millions of data points. Furthermore, you have multiple dimensions of information where you can probe how atoms are attached to each other. People want to know which peaks represent which atoms or features, and that complexity is really a key thing about chemistry data. I think it's much more complicated or complex than for some other data that people would choose to store in other fields."
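To give a rough sense of the scale McGibbon describes, the short Python sketch below estimates how many data points a single half-hour chromatography run with a mass spectrometer attached might produce. The function name `ms_data_points`, the scan rate and the points-per-scan figures are illustrative assumptions, not values from the article or from any particular instrument.

```python
def ms_data_points(run_minutes, scans_per_second, points_per_scan):
    """Estimate the number of (m/z, intensity) points recorded in one run.

    All three inputs are hypothetical acquisition settings chosen for
    illustration; real instruments and methods vary widely.
    """
    total_scans = round(run_minutes * 60 * scans_per_second)
    return total_scans * points_per_scan


# Assumed: 30-minute run, 1 scan per second, 50 peaks kept per scan.
sparse_run = ms_data_points(30, 1, 50)

# Assumed: same run, but 2,000 points stored per scan.
dense_run = ms_data_points(30, 1, 2000)

print(f"sparse run: {sparse_run:,} points")  # 90,000
print(f"dense run:  {dense_run:,} points")   # 3,600,000
```

Under these assumed settings a single run lands anywhere from tens of thousands to millions of points, which is consistent with the range quoted above, and that is before any additional dimensions of information are recorded.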