Cluster Analysis of Spectral Data

Data Mining with Confidence

Finding similarities or differences between spectra becomes more difficult as the number of spectra increases. In chemical R&D, clustering is applied to group similar formulations in competitive analysis; to evaluate sources of raw materials; and to compare polymorphs, distinguish good from bad samples, or tell real from counterfeit products.

The ACD/Labs clustering toolset provides scientists with the ability to:

  • Cluster different types of spectral data
    The ACD/Labs algorithm develops clusters using a Euclidean distance (or first-derivative Euclidean distance) metric and can be applied to many data types, including IR, Raman, 1H NMR, 13C NMR, and XRPD.
  • Make decisions with confidence: visualize and manage spectra in chemical context with associated metadata and structures.
    The ability to examine the members of each group is essential in assessing the confidence level of any given cluster. ACD/Labs tools enable scientists to analyze data sets and compare clusters:
      • View metadata associated with each spectrum and compare results within a cluster
      • Compare each member of a group with the average spectrum for that cluster
      • Compare clustering results from different algorithms or spectral types
      • View the nearest neighbor table to see the next best matches for any member of a group
      • Move spectra between clusters, merge existing groups, or create new clusters
  • Capture project knowledge: share 'live' spectra and chromatograms, analyst notes, and interpretation results for easier communication and collaboration between scientists, and to enable re-use of information.
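
To make the distance metrics and the nearest neighbor table mentioned above more concrete, here is a minimal Python sketch. It is not the ACD/Labs implementation: the function names, the finite-difference derivative, and the toy spectra are all assumptions chosen for illustration, and the spectra are assumed to be sampled on a common, evenly spaced axis.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two spectra sampled on the same axis."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of points")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def first_derivative(s):
    """Finite-difference first derivative (unit point spacing assumed)."""
    return [s[i + 1] - s[i] for i in range(len(s) - 1)]


def derivative_euclidean(a, b):
    """Euclidean distance between the first derivatives of two spectra."""
    return euclidean(first_derivative(a), first_derivative(b))


def nearest_neighbors(spectra, metric=euclidean):
    """Map each spectrum name to the other names, sorted closest first."""
    table = {}
    for name, s in spectra.items():
        others = [(metric(s, t), other)
                  for other, t in spectra.items() if other != name]
        table[name] = [other for _, other in sorted(others)]
    return table


# Hypothetical toy spectra (intensity values only).
spectra = {
    "A": [0.0, 1.0, 0.0, 0.0],
    "B": [0.5, 1.5, 0.5, 0.5],  # same shape as A plus a constant offset
    "C": [0.0, 0.0, 1.0, 0.0],  # peak shifted by one point
}

print(euclidean(spectra["A"], spectra["B"]))
print(derivative_euclidean(spectra["A"], spectra["B"]))
print(nearest_neighbors(spectra))
```

In this toy data, A and B differ only by a constant baseline offset, so their first-derivative distance is zero while their plain Euclidean distance is not. That is one reason a derivative-based metric can be useful when baselines vary between measurements.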

Once groups or classes have been established, new data (samples/spectra) may be compared to existing clusters to determine their proper group. Project knowledge can be further enhanced over time in this ULI-based approach to spectral data analysis.
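
One simple way to compare a new spectrum with existing clusters is to measure its distance to each cluster's average spectrum and pick the closest. The sketch below assumes that approach; the source does not state how the ACD/Labs software makes the assignment, and the names `average_spectrum` and `assign_to_cluster` and the example data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two spectra sampled on the same axis."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_spectrum(members):
    """Point-by-point mean of the spectra in one cluster."""
    n = len(members)
    return [sum(points) / n for points in zip(*members)]


def assign_to_cluster(spectrum, clusters):
    """Return (cluster_name, distance) for the closest cluster average."""
    distance, name = min(
        (euclidean(spectrum, average_spectrum(members)), name)
        for name, members in clusters.items()
    )
    return name, distance


# Hypothetical clusters established from earlier data.
clusters = {
    "good": [[0.0, 1.0, 0.0], [0.0, 0.8, 0.2]],
    "bad": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
}

label, distance = assign_to_cluster([0.1, 0.9, 0.1], clusters)
print(label, distance)
```

In practice, the returned distance could also be checked against a threshold so that a spectrum far from every group is flagged for review rather than forced into the nearest cluster.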