
Navigating the Challenges of AI/ML in R&D

April 10, 2025
by Sanji Bhal, Director, Marketing & Communications, ACD/Labs

Data is the Biggest Barrier to AI/ML Implementation

The poor state of scientific data is the most important barrier, not only to artificial intelligence and machine learning (AI/ML), but also to scientists using experimental data in decision-making. This is apparent from surveys we have run (not limited to our comprehensive Analytical Data Management Report) and from third-party surveys conducted at industry conferences. As Christian Baber (Chief Portfolio Officer, Pistoia Alliance) concluded when presenting the survey results at Lab of the Future, the "issue is still the data".

Image courtesy of Pistoia Alliance and Lab of the Future Europe 2024.

Scattered, Inaccessible Scientific Data

The Design-Make-Test-Analyze (DMTA) cycle generates vast amounts of analytical and chemical data. Experimental complexity and the need for multiple pieces of data to make informed decisions contribute to data that is scattered in heterogeneous formats, managed in a variety of software applications and platforms, and near impossible to share.

Scientists have become adept at assembling information from multiple locations to make decisions. The drive to leverage data at an organizational scale with AI/ML, however, has brought to light the jarring reality of poorly managed scientific data.

Scientific data is siloed; it is not findable, accessible, interoperable, and reusable (FAIR). There is a lack of consistency and quality, data is not curated, and data and metadata are not standardized.

Addressing these challenges is the first critical step towards AI/ML-ready data that will also benefit scientists.

Preparing AI/ML-Ready Scientific Data

Data science models require structured datasets that are normalized, standardized, accurate, and complete. Several foundational steps must be addressed to bridge the gap between raw data and AI/ML-ready datasets.

Digitalization is essential for making data accessible to machines and spans well beyond converting data to a digital format. Digitalization includes data standardization, normalization, automation, and systems integration.

Standardization and normalization of data formats are critical to enabling downstream computational use. AI/ML frameworks are highly sensitive to inconsistencies that render data incompatible from a machine's perspective. Ontologies and controlled vocabularies can help resolve these discrepancies.
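To make the idea of a controlled vocabulary concrete, here is a minimal sketch in Python: vendor-specific field names for the same measurement are mapped onto one agreed term so that downstream tools see a single schema. The field names and vocabulary terms are illustrative assumptions, not a real standard.

```python
# Hypothetical controlled vocabulary: maps vendor-specific metadata
# keys to one agreed term (names are illustrative, not a real ontology).
CONTROLLED_VOCAB = {
    "col_temp": "column_temperature_c",
    "ColumnTemp": "column_temperature_c",
    "flow": "flow_rate_ml_min",
    "FlowRate": "flow_rate_ml_min",
}

def normalize_record(raw: dict) -> dict:
    """Rename known fields to controlled-vocabulary terms; keep the rest."""
    return {CONTROLLED_VOCAB.get(key, key): value for key, value in raw.items()}

# Two instruments reporting the same measurement under different names
a = normalize_record({"col_temp": 40.0, "flow": 1.2})
b = normalize_record({"ColumnTemp": 40.0, "FlowRate": 1.2})
assert a == b == {"column_temperature_c": 40.0, "flow_rate_ml_min": 1.2}
```

In practice the mapping table would be maintained as a shared ontology rather than a hard-coded dictionary, but the principle is the same: resolve naming discrepancies once, at ingestion, rather than in every downstream analysis.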

Automation plays a central role in scaling data preparation efforts. Without automated data marshaling—import, export, transformation, and movement—scientists are left to manually shuttle data between instruments, systems, and software tools. This manual effort is time consuming and risks introducing errors, inconsistencies, and data loss.

Automated workflows can streamline repetitive tasks such as analytical data processing, peak picking, result normalization, and report generation, freeing scientists to focus on interpretation and innovation. Beyond data marshaling, automation plays a critical role in data assembly, where multiple analytical datasets are generated across different systems and instruments for a single chemical study. The assembled data creates a digital representation of that study.
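The data-assembly step above can be sketched as follows: results from several analytical techniques for the same sample are gathered automatically into one study record, the digital representation of the study. The record structure, sample IDs, and field names are hypothetical.

```python
# Hypothetical sketch of automated data assembly: results from several
# analytical techniques are collected into a single study record.
from dataclasses import dataclass, field

@dataclass
class StudyRecord:
    sample_id: str
    datasets: dict = field(default_factory=dict)  # technique -> processed result

def assemble(sample_id: str, results: list) -> StudyRecord:
    """Collect (technique, result) pairs for one sample into one record."""
    record = StudyRecord(sample_id)
    for technique, result in results:
        record.datasets[technique] = result
    return record

study = assemble("SMP-001", [
    ("LC-UV", {"purity_pct": 98.7}),
    ("MS", {"observed_mass": 312.1}),
    ("NMR", {"assignment": "consistent"}),
])
assert set(study.datasets) == {"LC-UV", "MS", "NMR"}
```

A real pipeline would pull each result from instrument software automatically rather than from an in-memory list, but the point stands: assembly is a mechanical join on sample identity that machines do reliably and scientists should not do by hand.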

Integration across systems is another critical component. Lab environments often contain a patchwork of disparate software platforms—instrument control software, LIMS, ELNs, and third-party analysis tools. AI/ML initiatives depend on the consolidation of data across these systems into a unified, context-rich data layer. This requires systems with robust APIs for communication and data access by downstream systems and AI/ML frameworks.
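As a rough illustration of that unified, context-rich layer, the sketch below merges records from multiple lab systems into one document keyed by sample. The fetch functions stand in for API calls to an ELN and a LIMS; all system names, fields, and values are invented for illustration.

```python
# Hypothetical consolidation across lab systems: each fetch function
# stands in for a REST API call (ELN, LIMS, instrument software).
# All names and fields below are illustrative assumptions.

def fetch_from_eln(sample_id: str) -> dict:
    """Stand-in for an ELN API call returning experiment context."""
    return {"experiment": "EXP-42", "chemist": "A. Smith"}

def fetch_from_lims(sample_id: str) -> dict:
    """Stand-in for a LIMS API call returning batch and status."""
    return {"batch": "B-7", "status": "released"}

def unified_view(sample_id: str) -> dict:
    """Merge records from each system into one context-rich document."""
    doc = {"sample_id": sample_id}
    for source, fetch in [("eln", fetch_from_eln), ("lims", fetch_from_lims)]:
        doc[source] = fetch(sample_id)
    return doc

view = unified_view("SMP-001")
assert view["eln"]["experiment"] == "EXP-42"
assert view["lims"]["status"] == "released"
```

The design choice worth noting is that the unified document preserves which system each fact came from, so downstream AI/ML consumers retain provenance as well as content.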

Scientific Data is not Treated as an Asset

Historically, scientists' attitude, particularly in research, has been that they acquire data to make decisions: "my data." Data from product development may be included in regulatory filings or used in manufacturing QA/QC, so attitudes towards that data are more evolved, but they still fall short of treating data as an asset for the organization to leverage.

It is acceptable for “my data” to be siloed within a project or team. Scientists can speak to colleagues to find other relevant information when required. A subtle shift from “my data” to “the company’s data” means data must be:

  • Easily findable and accessible to people and machines with permission to use it
  • Accompanied by all the relevant metadata and context for effective re-use

Digitalization and automation inevitably change processes for the scientists who generate data and those who access it for primary use. A corresponding shift in data culture and mindset eases the transition to new ways of working.

AI/ML Projects that Matter to Scientists

The hype around AI/ML can make it difficult for scientists to understand how these technologies can positively impact their work. While organizations must select projects that drive the most impactful change, including projects that motivate scientists to leverage AI/ML will vastly improve uptake. Easing of administrative burdens such as report-writing are an obvious application of large language models (LLMs), but scientists are more interested in AI/ML technologies that will facilitate the interpretation of results, help them explore larger experimental design space, and avoid unnecessary experiments.

The Data Science Skills Gap

As industries across the globe invest in creating data analytics divisions and leveraging AI/ML, demand for data scientists remains high. According to the U.S. Bureau of Labor Statistics, employment of data scientists is projected to grow 36% through 2033. Add to this the data scientists who are leaving their jobs because they spend more time finding, cleaning, and organizing data than analyzing it (the enjoyable part), and the skills gap becomes abundantly clear.

This leaves R&D organizations with two options:

  1. Employ external consultants, which can be costly
  2. Retrain employees open to a career switch

Retraining is time consuming because scientists need to develop skills in statistics, mathematics, and programming, but it is an important medium- to long-term investment. Data scientists with relevant domain expertise are invaluable because they understand the context and implications of their findings; they can identify patterns and draw meaningful conclusions that yield more accurate, actionable insights.

An AI/ML-Enabled Future

AI and ML are poised to improve productivity and revolutionize innovation. The most important step for every R&D organization is to bridge the gap between raw scientific data and AI/ML-ready datasets. This is a complex but necessary journey for organizations seeking to unlock the full potential of data-driven science. Addressing challenges around data standardization, automation, and system integration will not only enable AI/ML initiatives but also empower scientists to focus on innovation and decision-making. Equally important is the cultural shift required to treat scientific data as an organizational asset—one that is findable, accessible, interoperable, and reusable (FAIR). As organizations invest in building data science expertise and align AI/ML projects with scientists' priorities, they will be better positioned to drive meaningful advancements in R&D. By laying this groundwork, organizations can overcome the barriers that are hindering AI/ML adoption and set the stage for transformative scientific breakthroughs.

