October 31-November 4, 2021
Pennsylvania Convention Center, Philadelphia, PA, USA

Poster Schedule

Session: LC/MS: Chromatography and Software (In Person)

Identifying pharmaceuticals in wastewater using spectral and mass search: a software workflow
Richard Lee, Anne Marie Smith, Charis Lam

Deformulation of complex samples remains a challenge in many fields, including environmental analysis. For example, the components of wastewater affect ecosystem health. But possible components are numerous, varied, and not necessarily known in advance, complicating efforts to identify them.

In this work, we present a software workflow for deformulation using complex LC/MS data. With a test sample containing pharmaceuticals in wastewater, we show the accurate identification of components using a combination of spectral search, mass search, retention time prediction, and fragment prediction. These multiple modes act as an accumulative filter, helping us narrow down candidate structures with greater confidence.

A wastewater sample was spiked with pharmaceuticals (including metoprolol, citalopram, and sulfasalazine) and run on an Agilent QToF LC/MS using ESI and All Ions Fragmentation acquisition, at 20 and 40 eV collision energies.

The data were imported into the software interface, where the chromatograms were deconvoluted and component spectra were produced. Component spectra were searched against spectral databases of known compounds. Any remaining unidentified components were then searched against ChemSpider and PubChem databases for matching accurate masses.

To rank the candidates returned by the initial structure search, in-silico fragmentation was performed and the predicted fragments were assigned to the high-energy collision spectra.

Preliminary data
The software workflow uses four layers of accumulated searches and filters to obtain candidate identifications:

  1. Spectral match
  2. Accurate mass search
  3. Isotopic pattern match
  4. Fragmentation prediction

Known compounds are identified by spectral search. Hit quality is assessed by match to reference spectra. Mirror plots provide visual confirmation of spectral alignment.
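
The abstract does not say which metric is used to assess hit quality. As a minimal sketch, assuming a cosine similarity on binned spectra (a common choice, not confirmed as the software's method), a spectral match score could be computed as follows. The function names and the 0.01 m/z bin width are illustrative assumptions.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_score(query, reference, bin_width=0.01):
    """Cosine similarity of two peak lists (0.0 = no shared bins, 1.0 = same shape)."""
    q = bin_spectrum(query, bin_width)
    r = bin_spectrum(reference, bin_width)
    # Only bins present in both spectra contribute to the dot product.
    dot = sum(q[k] * r[k] for k in q.keys() & r.keys())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in r.values())
    )
    return dot / norm if norm else 0.0
```

Because the score depends only on relative intensities, a query spectrum that is a scaled copy of the reference still scores 1.0, which is the behavior a mirror plot is meant to confirm visually.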

For unknown compounds, an accurate mass search in large databases (>100 million unique structures) can produce many (>1000) candidates. To reduce this list, we simulate the theoretical isotope pattern of each candidate and compare it with the observed pattern.
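
The accurate mass step can be sketched as a parts-per-million (ppm) window filter. This is an illustration only: the 5 ppm default, the function names, and the candidate names and masses in the usage lines are hypothetical, not values from the poster.

```python
def ppm_error(measured_mass, theoretical_mass):
    """Signed mass error in parts per million."""
    return (measured_mass - theoretical_mass) / theoretical_mass * 1e6


def filter_by_mass(measured_mass, candidates, tolerance_ppm=5.0):
    """Keep (name, monoisotopic_mass) candidates within the ppm tolerance."""
    return [
        (name, mass)
        for name, mass in candidates
        if abs(ppm_error(measured_mass, mass)) <= tolerance_ppm
    ]


# Hypothetical candidates; the names and masses are made up for illustration.
candidates = [
    ("candidate_A", 267.1834),
    ("candidate_B", 267.1900),
    ("candidate_C", 267.1840),
]
hits = filter_by_mass(267.1834, candidates)
```

A tighter tolerance shrinks the candidate list but risks discarding the true structure when mass calibration drifts, which is one reason a second, orthogonal filter such as the isotope pattern is useful.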

The top candidates are then run through an in-silico fragmentation routine, which predicts mass fragments from chemical structure. The predictions use a rules-based approach to identify the most likely dissociations. Predicted fragments are assigned to peaks on the experimental spectrum, and candidates are scored based on the fraction of successful assignments.
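
The scoring step can be sketched as below. This assumes the score is the fraction of predicted fragments that match an observed peak within a ppm tolerance; the abstract does not specify the denominator, and the fraction of observed peaks explained would be an equally plausible reading. All names and the 10 ppm default are hypothetical, and the fragment prediction itself is taken as given input.

```python
def assign_fragments(predicted_mzs, observed_mzs, tolerance_ppm=10.0):
    """Return the predicted m/z values that have an observed peak within tolerance."""
    matched = []
    for pred in predicted_mzs:
        window = pred * tolerance_ppm / 1e6  # absolute m/z window for this fragment
        if any(abs(obs - pred) <= window for obs in observed_mzs):
            matched.append(pred)
    return matched


def fragment_score(predicted_mzs, observed_mzs, tolerance_ppm=10.0):
    """Fraction of predicted fragments assigned to an observed peak."""
    if not predicted_mzs:
        return 0.0
    matched = assign_fragments(predicted_mzs, observed_mzs, tolerance_ppm)
    return len(matched) / len(predicted_mzs)


def rank_candidates(candidates, observed_mzs, tolerance_ppm=10.0):
    """Rank a {name: predicted fragment m/z list} mapping by descending score."""
    scored = [
        (name, fragment_score(frags, observed_mzs, tolerance_ppm))
        for name, frags in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Calling `rank_candidates` with the predicted fragments of each remaining candidate and the peak list from the high-energy spectrum returns the candidates ordered from best to worst explained.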

After final scoring, the top candidates correctly reflect components of the sample (e.g., metoprolol).

This workflow demonstrates the successful deformulation of a complex sample, producing accurate identifications from among millions of possible components. Though used in this case on a test sample of wastewater, it could be adapted for deformulation in other environmental, pharmaceutical, or chemical samples.

Novel aspect
A software workflow for the identification of unknown components in complex samples, using predictions to progressively filter candidate structures

Session: Informatics: Workflow and Data Management (Remote Posters)

New Data Marshalling Design System for Processing LC/UV/MS Data
Richard Lee

Organizations constantly try to streamline analytical processes in the lab, from sample management to data processing, reporting, and transfer of data to external systems. For LC/UV/MS instruments, the workhorses of the analytical lab, implementing automation that transfers data, processes it, and moves it to a final destination can be an onerous task, especially where instruments from a variety of vendors are deployed. Designing such workflows is typically left to advanced programmers or systems architects, so the scientist or manager must follow up with them for additional support or when a workflow goes offline.

To reduce the programming expertise these automated systems require, we present a new graphical user interface (GUI) for a vendor-agnostic dataflow design system that lets the end user create and manage workflows for a variety of LC/UV/MS systems. The end user can assign data sources, set the scanning frequency and data processing parameters, generate reports, and push data to a desired database for future access. The system supports all major vendors' proprietary data formats and various configurations of LC/UV/MS instruments. Moreover, this new architecture approach allows deployment

Preliminary Data
Biotech organizations generate vast amounts of LC/UV/MS data, with scientists using everything from high-volume, low-resolution open-access systems to high-resolution MS instruments for structure elucidation workflows. To reduce inefficiencies, automation services can be installed to handle data processing, report generation, and data transfer, but the result is often a highly customized "black box": neither the end user nor the system administrator can edit or create workflows, because doing so requires a highly skilled programmer. In addition, users may want to abstract all or part of the data into a specific format for downstream use in machine learning (ML) frameworks or deeper analytics platforms, which requires additional work from a software developer.

In this work, we describe a web-based application built on a new architecture for a data marshalling system. It incorporates novel data processing and automation servers that can be easily expanded to support a growing number of LC/UV/MS instruments. The data processing server can be dynamically scaled to manage load under heavy use. A specially designed data containerization unit for LC/UV/MS data, based on HDF5, can hold thousands of datasets in a single container, allowing facile data access and extraction.

The processing parameters can be customized for basic processing of low-resolution LC/UV/MS datasets or for more advanced algorithms that require more computational resources, such as chromatographic deconvolution of high-resolution LC/MSn data. Once processed, the data can be transferred to a database for easy recall or reprocessing. In addition, the system allows data abstraction and translation into a desired format, such as JSON, that can then be leveraged as input for ML/AI frameworks.
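
As a rough illustration of the JSON translation step, the sketch below serializes one processed dataset with Python's standard `json` module. The record layout (`sample_id`, `instrument`, `peaks` with `rt_min`, `mz`, `area`) is a hypothetical schema chosen for the example; the abstract does not describe the system's actual export format.

```python
import json


def dataset_to_json(dataset):
    """Serialize a processed dataset (hypothetical schema) to a JSON string."""
    record = {
        "sample_id": dataset["sample_id"],
        "instrument": dataset["instrument"],
        # Each peak is stored as (retention time in minutes, m/z, peak area).
        "peaks": [
            {"rt_min": rt, "mz": mz, "area": area}
            for rt, mz, area in dataset["peaks"]
        ],
    }
    return json.dumps(record, indent=2, sort_keys=True)


# Made-up values for illustration only.
example = {
    "sample_id": "sample-001",
    "instrument": "LC/UV/MS",
    "peaks": [(1.25, 250.1234, 15000.0), (3.40, 310.2000, 8200.0)],
}
payload = dataset_to_json(example)
```

A flat, self-describing record like this can be loaded directly into most ML tooling without vendor libraries.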

Novel Aspect
A new web-based workflow design and management system for LC/UV/MS instruments.