Transcript
A. Ramirez Galilea 00:00
Today we would like to show you the recent work we’ve been doing together enabling self-driving labs for synthetic molecule process development. But before jumping into the self-driving lab topic, I would like to give you a bit of background on who we are, what we do, and why we are pursuing this self-driving lab approach. First, a bit of history about Takeda. It was founded in 1781, so it’s quite an old pharmaceutical company, with its HQ in Tokyo, Japan. It’s one of the top 20 pharmaceutical companies, with around 50,000 employees, and its therapeutic areas include oncology, rare diseases, plasma-derived therapies, gene therapies, neuroscience and gastroenterology.
Inside Takeda, we are part of Synthetic Molecule Process Development, or SMPD, a process chemistry and engineering group located within R&D. We have around 100 FTEs across two sites, one in Cambridge, MA and one in Shonan, Japan, and we focus only on synthetic molecules. So from the overall pipeline, we tackle the assets that comprise small molecules, peptides and oligos.
Our work spans from early process development to process optimization and tech transfer. We support the pipeline by finding new routes and improving them so they can be transferred for clinical trials or manufacturing. We scale up existing routes to provide material for clinical trials, and we also handle tech transfer to our manufacturing partners.
Inside SMPD, we have our high-throughput and automation team, embedded into the process engineering group. It’s a relatively new team, less than three years old; prior to 2022 there was no dedicated high-throughput or automation group at SMPD, only some isolated SMEs and equipment, with most of the work being externalized. So everything you are seeing today has been developed in a bit less than two years. It can be done!
Our team aims to support all of global SMPD, and our focus areas include solubility, crystallization and lipid destruction. We obviously do a lot of reaction screening and optimization. In addition, we take on custom automation and robotics projects that you will see later.
A. Ramirez Galilea 02:81
This is our team: myself; Andrew Kukor, an automation scientist; Mitya, our robotics engineer; Uda-san, our automation scientist on the Japan side; and Phillip Tran, our newest robotics engineer. If you’ve seen our new labs in Cambridge, we have two differentiated areas. The main high-throughput lab is built around the Unchained Labs family of tools: four Juniors and one OSR. The Juniors come in sets of two, one powder Junior and one liquid Junior, duplicated, and they sit inside a custom-made isolator here on the left. Then we also have a robotics development area where we carry out custom integration projects ourselves.
In the image on the right, you can see a dual-arm robot in a ventilated frame. This lab is currently being replicated at our site in Shonan. And one of the beauties of this workflow is that once you have everything in place, and it has taken us two years to reach this state, you can very easily replicate it in labs at other sites, because you have all the knowledge.
So, a bit more background on our historical problem statement and why we are trying to do what we do. In the past, process development was, and I think still is, very labor intensive. You have to do a lot of trial and error, a lot of screening and optimization efforts, typically done in vials, with the DOEs being experience-driven by our scientists. This resulted in SMPD being heavily outsourced, because we didn’t have the numbers to run all of this internally. And when you outsource this work, you often have no clear understanding of the methodology employed by your partners: how many samples were evaluated, or why certain variables were selected or discarded. You typically just receive a report with a few slides on the best-performing candidates. And of course, the data is not FAIR: you typically don’t have access to the analytical data or the process data from the reactors. Another issue, somewhat minimized when we work internally but still present, is that after we complete a project, the data is not stored in an internal database. The knowledge is lost, and we typically duplicate efforts across projects. Unless a scientist who has faced a similar issue joins a similar project, we have to start from scratch. On top of that, we are a relatively lean organization, without many contractors or FTEs to run all the required screens internally in the standard manual approach. So if we want to shift how this works, our solution is what we call the self-driving, data-rich HTE that we are trying to apply here.
A. Ramirez Galilea 11:80
Now we are going to dig a bit deeper into this workflow, which I will explain point by point. Here you can see a scheme of how we envision this workflow, what we are trying to build. It’s a closed-loop, multi-generation, self-optimizing workflow, but the main difference is that it’s not just the closed loop. The way we envision it, we start in our ELN, in this case Dotmatics, where scientists introduce a reaction, for instance one they want to optimize. That reaction, which is just the SMILES codes of your product, your starting materials, etc., gets translated into a reaction database holding all of our prior knowledge. Then there is a sort of retrosynthesis tool, although it’s more of a parameter recommender, that looks at that reaction database, at all of our prior knowledge, and tells us whether we have done something similar in the past and, more importantly, which parameters we should vary in order to optimize the reaction. Because most of the time when we are doing reaction optimization, we tend to use the same bases, the same solvents, the same catalysts.
We can have the best closed loop, with the robotics, with everything in the world, but if we constrain our parameter space and leave out some of the catalysts, bases or solvents that actually work, we might be discarding viable pathways, simply because in this first step we set the boundaries of the parameter space and are missing some key elements. So prior knowledge, coupled with retrosynthesis, is how we try to minimize that, in order to have a properly data-driven DOE, not just an experience- and intuition-driven one.
Once we have that, we need to translate it into an automated reaction screen, in our case with the Unchained Labs family of tools. We are also working on automated consumables handling, because this is typically overlooked: if you run 24/7, you start to generate a lot of waste that you don’t want to deal with manually. Then we have the data-rich analysis. We call it data-rich HTE because it’s not just one sample for HPLC; we can also take Raman and vision data, and we can take XRD if we are doing a crystallization, within the same loop.
Then we need a way to automate the data collection and visualization; that’s where Katalyst [D2D] kicks in. And we also need to send this data to our AI optimizer to predict the next generation, what we should test next; that’s what we use Atinary for. We can carry out this loop three, four or five times until we reach the maximum. This is now our standard business practice. We don’t do factorial DOEs anymore; we always go through this closed-loop workflow, and it has proven to be really robust and efficient. But in order to be successful in this approach and achieve a paradigm shift, we need a smart combination of hardware and software. It’s not just “I’m going to buy a few robotic platforms, and all of my problems are solved”.
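As a rough illustration, the closed loop described above can be sketched in Python. Everything here is a toy stand-in: the function names are hypothetical, the "optimizer" is random sampling rather than Atinary's AI, and the scoring is fake chemistry; the real loop runs through Katalyst, the Unchained Labs robots and Atinary.

```python
import random

# Toy stand-in for the closed-loop, multi-generation workflow.
# All names and values are illustrative, not the real integration.

SOLVENTS = ["toluene", "t-AmylOH", "THF"]
BASES = ["K2CO3", "KOAc", "Cs2CO3"]

def suggest_generation(history, batch_size=24):
    """Stand-in optimizer: random suggestions (a real AI optimizer
    would condition on `history` to propose the next batch)."""
    return [{"solvent": random.choice(SOLVENTS),
             "base": random.choice(BASES),
             "base_equiv": round(random.uniform(1.0, 3.0), 2)}
            for _ in range(batch_size)]

def run_plate(conditions):
    """Mock 'data-rich HTE' measurement: a fake peak area per well."""
    scores = []
    for c in conditions:
        score = 2.0 if c["base"] == "K2CO3" else 1.0
        score *= 1.5 if c["solvent"] == "toluene" else 1.0
        scores.append(score * c["base_equiv"])
    return scores

history = []
for generation in range(3):            # the talk mentions 3-5 generations
    plate = suggest_generation(history)
    results = run_plate(plate)
    history.extend(zip(plate, results))

best_conditions, best_area = max(history, key=lambda x: x[1])
print(best_conditions, best_area)
```

Three generations of a 24-well plate means only 72 experiments total, which is why the loop can afford the lower-throughput plate format described later.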
You also need to look at the data automation to remove bottlenecks like analytical data processing or software-hardware connectivity, because here we are aiming for end-to-end lab automation. So it’s a smart combination of both. Most of the time the data automation is overlooked, because it’s behind the scenes; it’s not something you see in the lab. You see all the robots, but not all the data connectivity that needs to sit behind them for this approach to really succeed.
Just to give you one example of the integrations our workflow requires: there are quite a lot of software layers between the physical and digital automation, with a master scheduler on top. We start with the retrosynthesis database that predicts the conditions. Those conditions go to Katalyst to build the reactant tables. The tables go to Atinary, which produces the first generation of samples to be evaluated. Katalyst translates that for the Unchained Labs robots, which perform the experiment. Then a mobile robot moves the samples to the UPLC; the UPLC does the analysis, and the data goes back to Katalyst for processing. Katalyst sends that information to Atinary to predict the second generation, then back to Katalyst to create the new Unchained Labs file. The mobile robot transfers new plates to the Juniors and removes old ones. And on top of all this sits a master scheduler able to trigger everything autonomously.
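The chain of handoffs above can be sketched as a minimal master scheduler stepping through named steps in order. The step names mirror the talk; the no-op functions are hypothetical placeholders for the real Katalyst/Atinary/Unchained Labs/UPLC integrations.

```python
# Minimal sketch of the integration chain, modeled as a master
# scheduler that triggers each named handoff in order.

PIPELINE = [
    "retrosynthesis_db.predict_conditions",
    "katalyst.build_reactant_tables",
    "atinary.suggest_first_generation",
    "katalyst.write_unchained_labs_file",
    "juniors.run_experiment",
    "mobile_robot.transfer_plate_to_uplc",
    "uplc.analyze_samples",
    "katalyst.process_analytical_data",
    "atinary.predict_next_generation",
]

def master_scheduler(steps):
    """Trigger each integration step in order, collecting a trace."""
    trace = []
    for name, action in steps:
        action()          # in reality: an API call or robot command
        trace.append(name)
    return trace

steps = [(name, lambda: None) for name in PIPELINE]
trace = master_scheduler(steps)
print(trace[0], "->", trace[-1])
```

A real scheduler would also loop the run/analyze/predict steps across generations and handle failures, but the essential job is the same: sequencing the handoffs so the loop runs without a human in between.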
In our current state, we are almost there on the integrations; only a few are still being developed, and we are also developing the master scheduler. It’s a journey, and there are a lot of small tasks that need to happen.
Now we are going to dive a bit deeper into both sides, the physical automation and the digital automation. For the physical automation, the first step we took was to standardize the size and format of all our screens, moving to bigger formats and lower throughput: 24 vials of 4 mL instead of the standard HTE practice of 96-well plates. And we standardized not only the plate format but also the positions on the robot deck: the solvents are always in the same positions, the reaction plate in the same position, the caps in the same position, because this means fewer errors with the robots. This standardization makes it much easier for us to reach the autonomous, self-driving loop.
This lower-throughput plate also lets us gain confidence in the data, because we scale up roughly ten times, reducing manipulation errors. And we can overcome the loss of throughput, so we no longer need the high-density plates, because we use this AI-guided optimization with the Atinary algorithms from one generation to the next. Despite the smaller format, we can explore the parameter space more efficiently. We can also sample several times without disturbing the reaction; typically, we sample four times.
Now we can fully automate the workflow by leveraging mobile robotics, and we can run a much more data-rich workflow despite it being HTE. We not only take HPLC but also images and non-contact Raman (we can take the vial to the Raman). So we have a lot of information for each data point, not just a hit.
This standardization is what has allowed us to automate the workflow on the Unchained Labs robotic platforms. We start with the powder dispense, then the capping, then the liquid dispense, then incubation. We can take an image of the vial, and we can also take Raman and non-contact Raman. We can filter, if needed, for the sample prep. Then we do the HPLC sampling. And we can also prepare XRD plates in the Junior in case we are doing solid-form screening or crystallization.
We have also developed a solution to close the physical workflow with a mobile robot that transfers plates between the different Juniors and the Waters UPLC. With this integration we will almost double our throughput. What typically happens is that we start an experiment early in the morning and the reaction ends around 2 to 3 am. Then you have to wait until the next day to move and analyze the full plate, and analyzing the full plate typically takes a few hours as well, so you basically lose that second day. With the mobile robot, by the time we are back in the lab the analysis is done, so we gain this extra layer, which almost doubles our capacity. In our particular case, it is a MiR base with a UR5 arm, assembled by Enabled Robotics.
A. Ramirez Galilea 15:56
Here we have a video where you can see the robot in our lab. This is the isolator, where we have two of the Juniors, with the other Juniors on the left. Here you can see the mobile robot opening the isolator door; it opens it like a human would. Then we have the HPLC sample plate on a rail that slides out after the reaction is completed. The robot picks up the plate, moves it to its base and closes the isolator. On top of this antechamber we also have nitrogen and vacuum lines, so we can minimize air contamination of the chamber during operation. We have seen that it doesn’t affect our processes much, because by the time the samples are analyzed and you need to run the second generation, a few hours have passed, and in that time the chamber has been regenerated to spec.
Now the mobile robot is driving around the lab, looking for our analytical instrument, this Waters UPLC/MS with the automation portal. It reads the QR code to position itself more precisely, then takes the plate, puts it in the portal, and triggers the analysis. After the analysis is done, all that information goes to Katalyst and then to Atinary; we create a second generation, and it runs autonomously in the lab. You can see now that it’s placing the plate for analysis, and then the robot simply returns to its base for charging. This is the physical automation: the Unchained Labs family of tools, the mobile robot, and the Waters automation portal.
A. Ramirez Galilea 16:82
Now we need to look a bit more at the other side, the digital automation. As I mentioned, this is crucial, because in order to minimize the bottlenecks we had to do something that had not been done before, and we found the solution in combining Katalyst and Atinary into a fully integrated platform. This means that we can now generate the Unchained Labs robotic files that you see here, the library, the reaction plate, the analysis plates, and all the maps, in a few minutes. And you don’t need any prior programming knowledge. You don’t need to know how to do anything in the Unchained Labs software, and you don’t need to know any machine learning to operate the Atinary platform. Katalyst is also very intuitive; anyone with minimal training should be able to run this workflow.

To demonstrate it, I have here a real recording of what it takes to run an experiment from scratch using this new integration. We start in Katalyst, adding some information about the experiment: department, site, purpose and project code; it’s a reaction. You can also set your experiment name. It’s just like creating an ELN record. Then you go to your reaction scheme, where Katalyst has a very nice materials database. To create a coupling reaction, you can select a bunch of ligands that appear in the database and just drop them into the reaction. We do the same for all the elements: we add some solvents for this process, then some bases; it’s just drag and drop, simple. Then a catalyst; we can add palladium acetate to the mix. You can see that we have already started to create the variables whose conditions we want to optimize. Then for the starting material we just select a couple of molecules, and the same for the product. This is just for demonstration.
Now that we have our reaction scheme, we move to the other side, the experiment layout, where you define your reaction plate. As I said, we use a 24 × 4 mL well plate. The only thing you need to add to the well is your starting material, or limiting reactant, because all the rest of the compounds will be calculated based on that. Now we select the parameters that we want to vary. We want to vary all the bases as discrete variables, in equivalents from one to three with a stride of 0.4, and this creates that factor in the DOE. Then we do the same with the catalytic agent, where we want equivalents from 0.2, sampled uniformly. Then you can do all the ligands, which can be continuous or discrete; you can change the units, and typically we use equivalents. And then we have the co-reactant: we want to study different equivalents, from one to two, and we typically use a grid with a stride of 0.25 in equivalents. Last but not least, the solvents. We want to evaluate those solvents too; these are in milliliters, one milliliter each. So now we have all of our factors. Save it, and here is where the integration starts to kick in, because we are going to send this factor list, with all the material classes and all the information that is in Katalyst, to Atinary, by clicking on the widget.
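To make the factor definitions concrete, here is a small sketch of how a range plus a stride expands into discrete levels. The numeric ranges are the ones mentioned above (base equivalents 1-3 with stride 0.4, co-reactant equivalents 1-2 with stride 0.25); the helper function itself is just an illustration, not Katalyst's implementation.

```python
# Illustrative expansion of "range + stride" factor definitions
# into discrete levels for a DOE factor list.

def levels(lo, hi, stride):
    """Enumerate discrete levels from lo to hi inclusive."""
    n = round((hi - lo) / stride)
    return [round(lo + i * stride, 2) for i in range(n + 1)]

base_equiv = levels(1.0, 3.0, 0.4)         # six levels for the base
coreactant_equiv = levels(1.0, 2.0, 0.25)  # five levels for the co-reactant
print(base_equiv)
print(coreactant_equiv)
```

Each such list becomes one factor; the optimizer then picks level combinations rather than enumerating the full grid.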
A. Ramirez Galilea 22:90
Now you can see that the Atinary platform automatically opens up and creates a workstation with all the parameters that we specified and want to vary. We now need to add the parameter we want to maximize, in this case the peak area. Then we create a template, which is very simple; it guides you step by step, so you will see how quick it is. You have everything that you want to vary, and we want to maximize the peak area. Next, we select the optimizer; our batch size is 24. We click Next, and then we run the campaign.
You can give it whatever name you want. You can also preload data; that’s a nice feature from Atinary: if you have past data, you can use it rather than starting from scratch. Here you can see that we have created an optimization reaction with all the conditions that came from the Katalyst integration. And I think it’s been five minutes, something like that, in which we have been able to set all this up. These are all the points suggested by the Atinary algorithm. Now we go back to Katalyst and click optimization in Atinary. What happens is that Katalyst reads the conditions generated by Atinary, and then you can simply add the reactions to the reaction plate. And this is beautiful; it’s amazing that each well is automatically filled with the composition suggested by Atinary. Before, we had to do this manually, and it took a lot of time. The final part of the integration is generating the Library Studio file for the Unchained Labs robots with just two clicks. This creates the file the robot will use to start, with the same plate but in the Unchained Labs format, ready to be run. So anyone, with no prior knowledge of programming or machine learning, or even familiarity with the Unchained Labs system, is able to run this AI-guided workflow very, very simply. In a few seconds the file will open, and then we will move to the next section.
A. Ramirez Galilea 26:02
Here you have the Unchained Labs file with all the maps and all the recipes; this is what we will run on our Juniors. You can see how nicely this integration works. Then we have the analysis here at the end of the workflow, and we send the analytical data, all the peak areas, back to Atinary to predict a new generation, and there you have it. This really is an amazing integration. Prior to this, we had to transform the data into different formats. We were always working with Excel files to do the transformations: we had the materials in Katalyst, but to fill the wells with the data provided by Atinary we had to create an Excel file with a lot of manual calculations, and then we also needed to load those into the Unchained Labs software. One simple set of conditions needed to go to Katalyst, to Atinary and to the Unchained equipment, each in a different format. With all the manual calculations, it could take us four hours to have everything ready for a simple reaction like the coupling you see here; now it’s a matter of minutes.
Now the last part of the presentation is a case study showing that it really works and that we are actively using it. The example is a very simple Buchwald reaction, within the frame of a pipeline LCM project to replace the current reaction. We didn’t screen a lot of parameters here because we had some background information, but even with a few base-equivalent levels, six precatalysts and six solvents, that’s over 4,000 combinations. If you wanted to do that manually, screening everything would take quite some time. With our approach, this optimization was done in about a week, compared with the one to two months of back and forth that the manual, external approach would take. We already see some big wins in time here.
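To see why the full-factorial count blows up, here is a back-of-the-envelope calculation. The talk names six precatalysts and six solvents; the remaining level counts below are assumptions for illustration, so the exact product won't match the ">4,000" figure precisely, but the multiplicative growth, and the contrast with three 24-well generations, is the point.

```python
import math

# Full-factorial size vs. the closed-loop experiment count.
# Only "precatalyst" and "solvent" counts come from the talk;
# the other level counts are assumed for illustration.
factor_levels = {
    "precatalyst": 6,        # from the talk
    "solvent": 6,            # from the talk
    "base_equivalents": 6,   # assumed, e.g. 1-3 in steps of 0.4
    "catalyst_loading": 5,   # assumed
    "coreactant_equiv": 5,   # assumed
}

full_factorial = math.prod(factor_levels.values())
closed_loop = 3 * 24         # three generations of 24-well plates
print(full_factorial, "vs", closed_loop)
```

Even with these modest assumed level counts the grid runs into the thousands, while the AI-guided loop samples it with a few dozen experiments.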
Looking at the results, this is from the Katalyst data analysis. On the top you can see the heat maps, which show product peak area in different colors, with the color scale on the right: red, a bit of product; orange, more; yellow, more still; and green, lots of product. On the bottom we have the pie charts, which show the peak area percentage composition for starting materials and products; again, more green means more product. And here you can see the first, second and third generations.
So you can see that the AI algorithm from Atinary, and our self-driving approach, really work. From the very first plate, where we have a random design, to the second and the third, we see more and more green (product).
A. Ramirez Galilea 27:79
We also have some very powerful visualization tools. This one we like a lot: the parallel coordinates graph, where each line represents an experiment we have done, one set of conditions: base, catalyst, solvent, base equivalents, precatalyst equivalents, and then the result as peak area. The color represents how much peak area you have, in this case, for your product. You can also filter the lines, and this very quickly tells us that potassium acetate and potassium carbonate work, that BINAP is the only catalyst that works for this reaction, and that we can use t-amyl alcohol or toluene. We have some freedom in the base equivalents, but we need a relatively high catalyst loading. All of this information comes from the data analytics embedded in the Atinary platform. Another visualization we like a lot is this map, a multi-dimensional parameter space converted into a 2D representation of all the possible combinations; the real points are the ones that have been evaluated in the campaign. This is very important for us for scale-up, because what we don’t want is isolated maxima, for instance this point in the corner, or this one. We want to focus on areas with several points of good performance, because when you have to scale up and transfer, you don’t want the risk of sitting in an area where you shift your conditions a bit and the yield drops drastically. So this is also a very powerful tool for us.
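The filtering step described above can be mimicked on a toy table of experiments: keep only the high-peak-area runs and look at which conditions they share. The data below is invented for illustration, chosen so it echoes the talk's conclusions (BINAP only, toluene or t-amyl alcohol); it is not the project's real data.

```python
# Toy version of the parallel-coordinates filtering: select top
# performers and inspect the conditions they have in common.
# All rows below are made up for illustration.

experiments = [
    {"base": "KOAc",  "catalyst": "BINAP", "solvent": "toluene", "peak_area": 92},
    {"base": "K2CO3", "catalyst": "BINAP", "solvent": "t-AmOH",  "peak_area": 88},
    {"base": "K3PO4", "catalyst": "XPhos", "solvent": "THF",     "peak_area": 11},
    {"base": "KOAc",  "catalyst": "BINAP", "solvent": "toluene", "peak_area": 90},
    {"base": "Et3N",  "catalyst": "dppf",  "solvent": "DMF",     "peak_area": 7},
]

top = [e for e in experiments if e["peak_area"] >= 80]
shared_catalysts = {e["catalyst"] for e in top}
shared_solvents = {e["solvent"] for e in top}
print(shared_catalysts, shared_solvents)
```

In the interactive graph this filtering is a brush on the peak-area axis; the surviving lines immediately reveal which catalyst, base and solvent choices the good runs share.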
Lastly, we tend to do a scale-up of the best predicted reaction conditions by combining ReactAll and Katalyst, to confirm our findings and derive kinetic models. This only adds one or two days to the workflow, so we can get all the data from the five reactors and build the kinetics, providing a really good package of data in a bit more than a week.
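As an illustration of the kind of kinetic model the scale-up reactors enable, here is a minimal first-order fit: estimating a rate constant k from concentration-time data via a log-linear least-squares slope. The data and the first-order assumption are made up for this sketch; they are not results from the talk.

```python
import math

# Synthetic [starting material] vs. time, following C = C0 * exp(-k t)
# with k = 0.30 per hour. Purely illustrative data.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
conc = [1.0, 0.741, 0.549, 0.407, 0.301]

# Linearize: ln(C) = ln(C0) - k t, then ordinary least squares for the slope.
xs, ys = times, [math.log(c) for c in conc]
n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / \
        sum((x - x_mean) ** 2 for x in xs)
k = -slope
print(round(k, 3))
```

With time-course data from several reactors at once, the same idea extends to fitting richer models and checking whether the chosen conditions stay robust on scale.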
Summing up everything we discussed today: with the new emerging technologies, I think high throughput can be much more than just exploring a few conditions with a factorial DOE and hoping for a positive hit. However, developing these data-rich, self-driving labs requires a lot of hardware and software integrations that are not always easy to build. To reach this state we have combined several commercial tools, Unchained, Katalyst and Atinary, with some internally developed solutions, like the mobile robot, to fill the technology gap. The integration presented here is a great example of how the smart combination of these tools can really accelerate the workflow and, especially, bring down the knowledge barrier: you don’t need to be an expert to run these tools. And while this is not an easy path, and it has taken us two years to reach our current state, we now have the tools in-house and can easily carry out these AI-guided, closed-loop optimization workflows, bringing us much closer to a truly unsupervised, autonomous process optimization workflow.