November 30, 2023
by Daria Thorp, President and CEO, ACD/Labs
2023 in Review: Was it a Year of AI?
This year will be remembered for many complex and challenging scenarios, in the economy and geopolitics, across the globe. I join the many voices that express hope for their fast and successful stabilization.
In contrast, a few developments this past year made technology and science enthusiasts “giggle with joy”. When an emerging technology develops to the level that one’s first impression is, “Wow, this is amazing!”, everyone starts to look for further industrial applications. For me, Generative AI and Machine Learning (ML) application in drug discovery and development has been one of several “fun” topics of 2023!
Yes, I do have an OpenAI account… What can I tell you that you don’t know already? Generative AI today is excellent for writing but terrible for expert-level chemistry, which is not surprising considering its intended scope and purpose. There are efforts underway to enhance such large language models with chemistry awareness, notably in the literature/patent space. However, there is more to AI technologies than natural language-based applications. What are the AI/ML advances within industry segments that ACD/Labs is watching closely?
Drug Discovery and Predictive Modeling
Early Drug Discovery holds the most promise for predictive modeling, where computer algorithms have been serving scientists for decades. ACD/Labs is traditionally known in this field and has been integrating our in silico predictive models into our customers’ machine learning efforts for over 25 years. However, the recent push to integrate AI/ML frameworks and new modeling capabilities into the drug discovery process has been impressive. The latest technology can manage, arrange, and sort incredible volumes of data, in excess of what is humanly possible. At the same time, through a variety of learning algorithms, it can work on smaller training sets than classical statistical analysis ever could. Even the FDA put out a Discussion Paper on AI/ML in 2023 (1). We all closely follow the progress made by a number of industry leaders putting their drug candidates through clinical trials.
Definitely, learning technology and discussing what is possible with generative AI has been a 2023 highlight. I suspect the topic’s popularity is driven by us all, knowingly or unknowingly. Having looked at ChatGPT, with our collective hearts skipping a beat, we might have expected a miracle. And indeed it does look like a miracle when it works! But as with any technology, it comes with caveats and limitations, and requires knowledge to be put into the algorithms in order for the model to work. While I am ready to hope and wait, the realist in me demands practical, tangible, useful accomplishments.
There are many non-magical applications of this technology in Drug Discovery today. Having said that, they all rely on experimental data, big or small. One important caveat is that data is not neutral; it has a bias. This is definitely a broader topic than this blog can hold, but consider: who assembled your data, when, and how, along with its quality, schema, heterogeneity, source, and inclusion criteria, can all affect the outcomes. There is also value in negative data, that is, data from failed experiments, which is rarely retained as part of routine operations. I recently enjoyed reading my colleague’s article on data in American Pharmaceutical Review (2), which summarizes some of our experiences. Data engineering is critical to the success of AI/ML, and it is not discussed or described enough.
Beyond data science, why are we not seeing instant magical AI and ML breakthroughs? Drug Discovery is a field of Unknown Unknowns… Algorithms are at their best when the subject matter is Known, training data well curated, and foundational mechanisms/processes well defined. All of that is virtually never aligned in Drug Discovery. Biological and target behavior face a significant risk when presented to ML applications. As in any drug discovery process, quality and curation of data, and clarity of experimental design are paramount in AI and ML virtual research. These techniques are not magic; they are science and technology enablers, and a new generation of scientists is emerging to plan and conduct these experiments. In the end, experimental testing (aka clinical trials) will be the proof, as it always has been.
For contrasting opinions on the progress to date, I invite you to browse the 2022 McKinsey and Company summary (3) and the blog by Derek Lowe (4) (please search for his latest posts; there are too many to link).
In our product context, ACD/Labs predictors apply machine learning techniques and neural networks within a reasonably well understood biochemical and spectral domain. Such focus, alongside our continuous improvement efforts, results in reliable prediction, as evidenced by our products’ popularity and longevity with users. Within the AI/ML discussion, in addition to the traditional benefit of screening for future problematic ADMET characteristics, I would like to particularly highlight the ionization prediction capability that we offer (also known as pKa prediction). It is critical that the chemical identity of the test substance is fully understood and assured for any bioassay tests to be representative, and for any software model as well. The behavior of an ionizable chemical entity changes depending on the pH of its environment, which might be obvious to a chemist but not always to a software modeler or a modeling framework algorithm. Furthermore, localized charge state might also influence the target bioactivity during in vitro testing, and might allow the molecule’s drug behavior (such as ADME, Tox, etc.) to be modified without an impact on lead series potency. Additionally, preparation of test samples and drug product formulation are greatly affected by ionization. And one more thing: best practices suggest that scientists use logD, not logP, to account for ionic forms across pH ranges! I will leave it at that… All of these challenges can be anticipated with a bit of predictive ionization insight.
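To make the logD versus logP point concrete, here is a minimal sketch using the textbook Henderson-Hasselbalch relationship for a single ionizable group. It is an illustration only, not how ACD/Labs predictors work: the function names are hypothetical, the compound values (logP 3.0, pKa 4.4) are invented for the example, and the model assumes that only the neutral form partitions into the organic phase.

```python
import math


def fraction_ionized(pka: float, ph: float, acid: bool = True) -> float:
    """Fraction of a monoprotic compound present in its charged form.

    Textbook Henderson-Hasselbalch for one ionizable group:
    an acid is charged when deprotonated, a base when protonated.
    """
    exponent = (ph - pka) if acid else (pka - ph)
    return 1.0 / (1.0 + 10.0 ** (-exponent))


def log_d(log_p: float, pka: float, ph: float, acid: bool = True) -> float:
    """Approximate logD at a given pH from logP and a single pKa.

    Simplifying assumption: only the neutral species partitions
    into the organic phase (ion partitioning is ignored).
    """
    exponent = (ph - pka) if acid else (pka - ph)
    return log_p - math.log10(1.0 + 10.0 ** exponent)


if __name__ == "__main__":
    # Hypothetical carboxylic acid, values chosen for illustration only.
    LOG_P, PKA = 3.0, 4.4
    for ph in (2.0, 4.4, 7.4):
        print(
            f"pH {ph:>4}: ionized = {fraction_ionized(PKA, ph):.3f}, "
            f"logD = {log_d(LOG_P, PKA, ph):.2f}"
        )
```

In this simplified model the same molecule looks lipophilic at stomach-like pH and far less so at physiological pH, even though its logP never changes. A model trained on logP alone would miss that shift, which is the reason for the logD recommendation above.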
Accelerating Digital Transformation in R&D
At the later R&D stages in biopharma and process/product development in chemistry-dependent industries, we observe different dynamics. To put it bluntly, conversations around AI and ML quickly become conversations about accelerating digital transformation, the prerequisite for process efficiency enhancements, including AI/ML derived benefits. I would argue that at this time, the ROI from the productivity and efficiency of automated digitalization is the greatest practical benefit. It appears that McKinsey and Company, in their advice to QC labs (5) (a segment distinct from R&D analytical instrumental labs but technologically related), also place AI/ML in Horizon 3, the level after their Digitally Enabled and Automated visions.
As in other applications, in analytical chemistry the quality of data and processes determines the quality of computer assisted outcomes. We are collaborating with several “Top 10” Pharma companies on the automated creation of curated analytical databases. Presently in production, these databases are in excess of 15 TB in size, and growing; such effort is clearly an investment. The analytical chemistry data that ACD/Labs Spectrus and customers’ R&D Digital Science initiatives are aggregating is comprehensive and “live”. Even so, designing the data points for further machine use, and providing the needed levels of on-demand data abstraction, remain the critical parts of such projects and require specialist effort. When designed to be reused by BOTH humans and machines, and in alignment with FAIR (Findable, Accessible, Interoperable, Reusable) and ALCOA+ standards, the digitalization and normalization of data that previously belonged to inaccessible silos becomes suitable for deeper insight.
Live Analytical Data
Without live data:
- Abstracted text, numbers, and images
- Difficult to interrogate and review
- Must find the original raw data before reusing
With live data:
- Rich spectra and chromatograms
- Interactive and intuitive for scientific review
- Immediately available for re-use
With live analytical data, you store data with its interpretation, rather than separating the two in different places. Spectrus stores live data for seamless data analysis and reanalysis.
Among such insights are improvement of test methods, hardware utilization, test volume optimization, problem prevention, and accuracy improvements. Companies can minimize experimental duplication and enable data sharing across the organization, with far-ranging commercial benefits. To this end, the Conduit technology that ACD/Labs introduced in 2023, especially when paired with Spectrus JS cloud-based analytical processing software, greatly simplifies standardization, automation, and distributed access to results. Our customers are also working on improving automation and advanced analytical interpretation around experimental results (one example is automated molecular identity verification, or ASV, which ACD/Labs enables; such automated decision support benefits greatly from the use of experimental contextual data). Some of the projects our Pharma customers are working on were presented at our October 2023 virtual symposium.
Computer Modeling in HTE and D2B Techniques
Last but definitely not least, an area that is ready for computer modeling is High Throughput Experimentation and emerging Direct-to-Biology techniques. By their very definition, the technologies are the source of experimental results, critical for algorithm learning/training, and could benefit from unbiased interpretation to optimize future experimental designs.
Our observation is that the degree of engagement varies greatly between companies as well as industries. The common thread is that successful projects are undertaken jointly by our end users within R&D science and business teams, alongside IT and data scientists, signaling deeper integration of models into day-to-day operations.
To me, and beyond the popularity of the AI/ML topic, decisions have to be based on simple business guidelines:
- What is the benefit in terms of efficiency, productivity, or innovation?
- What is the investment in terms of money, time, people skillset, and business process change?
- How do these two compare?
After comprehending what is possible at the edge of chemical and computer science, this vision can be applied to people, processes, and costs. Many organizations are choosing to develop a comprehensive strategy and proceed to digitally enable their critical path elements. Sometimes, they are unlocking the bottlenecks… sometimes, offering distributed solutions or innovation drivers. I am looking forward to dreaming these strategies into reality, and seeing them come to life, in 2024 and the following years!
Happy New Year!
1. FDA. (2023, May 5). AI/ML for Drug Development Discussion Paper. US Food & Drug Administration. https://www.fda.gov/media/167973/download
2. A. Anderson. (2023, Oct. 1). The Role of Data in the Pharmaceutical Lifecycle. American Pharmaceutical Review. https://www.americanpharmaceuticalreview.com/Featured-Articles/607993-The-Role-of-Data-in-the-Pharmaceutical-Lifecycle/
3. A. Devereson, et al. (2022, Oct. 10). AI in biopharma research: A time to focus and scale. McKinsey & Company. https://www.mckinsey.com/industries/life-sciences/our-insights/ai-in-biopharma-research-a-time-to-focus-and-scale
   C.A. Viswa, et al. (2024, Jan. 9). Generative AI in the pharmaceutical industry: Moving from hype to reality. McKinsey & Company. https://www.mckinsey.com/industries/life-sciences/our-insights/generative-ai-in-the-pharmaceutical-industry-moving-from-hype-to-reality
4. D. Lowe. (2021, Nov. 8). AI-Generated Clinical Candidates, So Far. Science. https://www.science.org/content/blog-post/ai-generated-clinical-candidates-so-far
5. N. Carra, et al. (2021, Apr. 14). Digitization, automation, and online testing: Embracing smart quality control. McKinsey & Company. https://www.mckinsey.com/industries/life-sciences/our-insights/digitization-automation-and-online-testing-embracing-smart-quality-control
6. ACD/Labs. (2023, Aug. 14). A Leap in Ionization Prediction: ACD/Labs Unveils Groundbreaking Version of Modeling Software at ACS Fall 2023. Advanced Chemistry Development, Inc. https://www.acdlabs.com/resource/a-leap-in-ionization-prediction-acd-labs-unveils-groundbreaking-version-of-modeling-software-at-acs-fall-2023/
7. S. Bhal. (2023, Aug. 10). The Importance of Ionization in Pharmaceutical R&D. Advanced Chemistry Development, Inc. https://www.acdlabs.com/blog/the-importance-of-ionization-in-pharmaceutical-rd/
8. ACD/Labs. (2023, Oct. 25). Driving Efficiency with Spectrus®. Structure Characterization, Method Development & Analytical Data Management. Advanced Chemistry Development, Inc. https://www.acdlabs.com/resources/virtual-symposium-driving-efficiency-with-spectrus/
9. ACD/Labs. (n.d.). Structure Elucidation & Verification. Advanced Chemistry Development, Inc. https://www.acdlabs.com/solutions/structure-elucidation-verification/