By Charis Lam, Marketing Communications Specialist
ACD/Labs is launching a new podcast! The Analytical Wavelength will cover industry trends and topics for scientists working with analytical data. Join us for Episode 1 here, or read on for more insights about data and the mass spectrometrist. (Interviews are slightly edited for clarity.)
The days of integrating peaks by cutting and weighing them are thankfully past, but the modern deluge of data can drown the unwary scientist. Hyphenated techniques, MSn, data-independent acquisition, high resolution—what are we to do with everything we could potentially discover via MS?
“I think the next wave for pharma isn’t going to be so much actual hardware. It’s really going to be: how are they going to deal with the vast amounts of data generated off these LC/MS instruments?”
So says Richard Lee, Director of Core Technology and Capabilities at ACD/Labs. The bits and bytes that stream continuously off our instruments reveal sample composition, metabolites, degradants, chemical structure, and more. But to get quality information, quality data must be collected in the first place.
That means data organization, but it also means metadata organization. All experimental information exists within a context: What was the sample? What were the compounds? What were the instrument and experimental conditions? Data cannot be interpreted apart from this context, so productivity suffers whenever the two are stored separately.
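To make the point concrete, here is a minimal Python sketch of keeping metadata attached to the data it describes. All field names and values are illustrative assumptions, not drawn from any particular vendor or ACD/Labs format:

```python
from dataclasses import dataclass, field

# Illustrative sketch: bundle the spectrum with the context needed to interpret it.
@dataclass
class MSRecord:
    sample_id: str    # what was the sample?
    compound: str     # what were the compounds?
    instrument: str   # which instrument acquired the data?
    conditions: dict  # experimental conditions (ionization, polarity, ...)
    peaks: list = field(default_factory=list)  # (m/z, intensity) pairs

record = MSRecord(
    sample_id="S-001",
    compound="caffeine",
    instrument="LC/MS-A",
    conditions={"ionization": "ESI", "polarity": "positive"},
    peaks=[(195.0877, 100.0), (138.0662, 42.5)],
)

# Because the metadata travels with the data, no one later has to guess
# what the sample was or how the spectrum was acquired.
print(record.conditions["ionization"])  # ESI
```

However the record is actually stored, the design choice is the same: data and metadata live in one object, so they cannot drift apart.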
Sometimes, the experimental context includes information from other techniques. According to Graham McGibbon, Director of Strategic Partnerships at ACD/Labs:
Sometimes you need to bring [together] multiple pieces of data to have an absolutely assured structure as I mentioned with NMR. To do de novo structure elucidation, you often need high-resolution accurate mass measurement. […] Bringing different kinds of data from different instruments in different formats can often lead to insights that you couldn’t get from just one single experiment, or one instrument or one technique alone.
The organization of all this data together, in a way that can be stored and readily searched, is an emerging challenge for companies that rely on scientific R&D.
A further step would be to make this data accessible across an organization. Richard says:
The transfer, processing, and movement of data within [companies’] IT infrastructure is very, very siloed, in a sense, depending on the department and application. But just because they’re in different departments doesn’t mean that information and data cannot be leveraged by other groups. […] You could have the same type of department, let’s say a metabolism DMPK group, in one organization at site one and site two, and they may share the same database and same information. How do we share that information from the DMPK group to the discovery development groups, to the upstream or downstream groups? It may be that process chemistry could use some of that DMPK information, but they’ll have no way of accessing it. Just being able to share that information, so that you can call up a chemical ID number and bring up everything about that particular compound across groups and departments, I think that will be a key aspect of it.
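The lookup Richard describes can be sketched very simply: results from any group are filed under an organization-wide chemical ID, and any other group can retrieve them by that ID alone. The group names and fields below are hypothetical examples:

```python
from collections import defaultdict

# Illustrative sketch: one index keyed by a shared chemical ID,
# fed by whichever group produced the result.
index = defaultdict(list)

def register(chem_id: str, group: str, result: dict) -> None:
    """File a group's result under the organization-wide chemical ID."""
    index[chem_id].append({"group": group, **result})

# DMPK and process chemistry file their results independently...
register("CHEM-0042", "DMPK", {"half_life_h": 3.2})
register("CHEM-0042", "process chemistry", {"yield_pct": 87})

# ...and either group can pull up the full picture by ID alone,
# without knowing which department generated which record.
records = index["CHEM-0042"]
print([r["group"] for r in records])  # ['DMPK', 'process chemistry']
```

A real system would sit behind a database and access controls, but the essential idea is the shared key: one compound ID that every department resolves the same way.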
Putting that data organization in place would obviously be useful for organizations now—it would increase productivity, prevent teams from repeating work, and spark new insights from seeing disparate pieces of information together. But it will also help prepare for a possible future of AI and machine learning tools. Graham says:
People who are interested in doing data science with AI and machine learning tools want quality data. And that means that you have to take care of the parameters used to acquire the data and the instrument conditions. You really have to practice good science, good data collection, and good data organization, even if you’re going to pursue non-hypothesis-driven investigations.
Data also has to be normalized. Currently, different scientists, instruments, or programs might refer to the same concept using different words. But if data synthesis is going to be the future—if, say, sets of spectra taken from ten instruments will be combined for algorithmic processing—then a shared vocabulary is needed.
Every vendor has used its own terminology internally because they had to: “ionization mode” or “type of ionization,” for instance, with values like electron ionization or electrospray ionization. There has to be a way to store that with a set of data or a spectrum so that one knows what the ionization was.
And Richard adds:
They need to have a system in place that can normalize that information, so that one group’s idea of relative retention time, for example, is the same as another group’s. There’s going to have to be an assistant in place to normalize that data, so that the entire organization is on the same page.
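The kind of normalization Graham and Richard describe amounts to mapping vendor-specific names onto a shared vocabulary. Here is a minimal Python sketch; the synonym tables are illustrative assumptions, and a real mapping would be far larger:

```python
# Illustrative synonym tables mapping vendor terms to canonical ones.
KEY_SYNONYMS = {
    "ionization mode": "ionization",
    "type of ionization": "ionization",
    "ionization": "ionization",
}

VALUE_SYNONYMS = {
    "ionization": {
        "EI": "electron ionization",
        "electron ionization": "electron ionization",
        "ESI": "electrospray ionization",
        "electrospray ionization": "electrospray ionization",
    },
}

def normalize(metadata: dict) -> dict:
    """Map vendor-specific keys and values onto the shared vocabulary."""
    out = {}
    for key, value in metadata.items():
        canon_key = KEY_SYNONYMS.get(key.lower(), key.lower())
        canon_value = VALUE_SYNONYMS.get(canon_key, {}).get(value, value)
        out[canon_key] = canon_value
    return out

# Two instruments describe the same condition in different words...
a = normalize({"Ionization Mode": "ESI"})
b = normalize({"type of ionization": "electrospray ionization"})

# ...but after normalization the records agree.
print(a == b)  # True
```

Once every record passes through a step like this, spectra from ten different instruments really can be combined for algorithmic processing, because they finally speak the same language.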
That’s one way for mass spectrometrists to prepare for the future without sacrificing productivity in the present. Collecting quality data, aggregating that data, making sure it translates across formats—all of that will make scientists more efficient now, but it will also help those looking to turn new tools on old data in the future. As Graham concludes:
If your instrument and your scientists are delivering value on a daily basis, I think then it’s figuring out how to leverage it for even more value. That’s the future of mass spectrometry.