Is Excel Holding Back Your Research?

Excel plays a critical role in research and development. Everyone from undergraduate students to elite pharmaceutical chemists relies on the program to process, analyze, and store their data. Unfortunately, spreadsheets are not designed to work with chemical information or support collaborative workflows. What’s the alternative?

Season 3 of the Analytical Wavelength brings unique perspectives from our team of experts who are working at the frontier of chemistry. In this episode, we talk to Application Scientist Alex Waked about his peer-reviewed paper “Consolidating and Managing Data for Drug Development within a Pharmaceutical Laboratory” that discusses data management issues in pharmaceutical development.

Listen and learn how chemical development researchers can overcome Excel’s data management issues.


00:00 Alex Waked

The challenge is to then consolidate all the information from these different sources. This then heavily relies on manual transcription. So while a lot of these software packages do have a lot of the tools that are required and necessary for drug development, there are some vital tools and pieces that are missing from them.

00:32 Jesse Harris

Welcome back to the Analytical Wavelength, a podcast on chemistry and chemical data brought to you by ACD/Labs. I am your returning co-host, Jesse.

00:41 Sarah Srokosz

And I’m your new co-host, Sarah, and we’re excited to bring you a new season of our podcast. I’m really looking forward to this season. We have some really interesting discussions lined up.

00:51 Jesse Harris

And since it’s our first season since the return of in-person conferences and events, we’re planning on keeping you updated on those as well. Additionally, for season three of the Analytical Wavelength, we’re going to be focused on sharing insights from our colleagues here at ACD/Labs.

01:08 Sarah Srokosz

That’s right. Working every day alongside top scientists in the pharmaceutical industry to help them work with their chemical data, our team has a unique perspective. Instead of keeping this to ourselves, we want to share their experiences and expertise with you.

01:22 Jesse Harris

To start things off, we are bringing you a conversation with Alex Waked. He is an application scientist and he was one of the coauthors on a paper about data management in pharmaceutical development that was published last year in OPR&D.

01:36 Sarah Srokosz

This paper was quite well received and seems to have struck a chord with those in the pharmaceutical process chemistry space, as it really articulates the challenges they face with analytical data management. But I really believe this is a topic for any scientist out there who feels like they rely too heavily on Excel. Let’s learn more.

01:54 Jesse Harris

Hi, Alex. How are you doing today?

01:56 Alex Waked

I’m doing well. How about you, Jesse?

01:58 Jesse Harris

Been doing great. Doing great. So, yeah, we wanted to talk to you today about the paper that you wrote a little while ago. Can you start off by introducing yourself and telling us a little bit about what you do here at ACD/Labs, as well as your favorite chemical?

02:12 Alex Waked

Yeah, of course. So I’m Alex Waked. I’m an application scientist at ACD/Labs. My job entails a few things. First, I learn about and understand our customers’ current analytical workflows and where their gaps currently are. Then, as a second step, I work alongside a team of us at ACD/Labs to implement one or more of our products into their current workflows, addressing the customer’s specific needs and requirements and, essentially, closing the gaps in their workflows.

My favorite chemical is actually not a chemical per se; it’s more of a family of compounds that we label azophosphonium cations. Those are a family of cations that I worked with during my PhD, and they are fairly straightforward to prepare synthetically. What I really enjoy about them is that their colors are quite vibrant, which is in contrast to a lot of the other compounds I worked with.

03:17 Jesse Harris

That’s awesome. Very cool.

03:19 Sarah Srokosz

As Jesse mentioned, about a year ago you and our colleagues Arvin Moser and Joe DiMartino published an article called “Consolidating and Managing Data for Drug Development within a Pharmaceutical Laboratory” in OPR&D. Can you provide a high-level summary of what this article was about?

03:37 Alex Waked

Yeah, of course. The article is essentially a perspective describing some of the main aspects of the drug development cycle and how the ACD/Labs commercial technology called Luminata is used to address the particular data analysis and data storage needs related to these aspects. In addition to describing Luminata itself, we also present comparisons between Luminata and other commercial programs that are currently used for drug development, Microsoft Excel being a big one.

04:13 Jesse Harris

Yeah, and that’s kind of where I wanted to go next. Excel plays a really big role in pharmaceutical development today. Can you tell us a little bit about that? And just maybe for people who aren’t in pharmaceutical development, give them a sense of how important it is in the workflows of a lot of organizations.

04:30 Alex Waked

Yeah, of course. So Excel documents and spreadsheets essentially become the ultimate repository of information extracted from the multiple data sources that a company may use. For example, companies may use ELNs, LIMS, and systems where they store different analytical data, chromatographic data, and reaction data. Since these are typically incompatible with each other, a lot of the information from these various systems has to be exported and consolidated into various Excel spreadsheets.

These spreadsheets and files are used in both early-stage and late-stage development in the drug development cycle: for example, for tracking impurity profiles across different processes, stages, and projects, or for storing data from process control studies, forced degradation and stability studies, and batch genealogy, where they’re looking at the history of the batches and lots prepared for their different intermediates, APIs, and impurities.

So when companies generate reports, whether for internal purposes such as meetings or for regulatory filings, they typically have to generate their tables from these Excel spreadsheets. The spreadsheets therefore become really important for storing a lot of the data that companies produce.

06:12 Sarah Srokosz

Wow. Yeah. That sounds like almost a lot of pressure to put on one Excel spreadsheet. Can you elaborate on any other issues with using Excel? Would you say that Excel, or software in general, is slowing down pharmaceutical development?

06:29 Alex Waked

Yeah. So, as you just mentioned, one Excel spreadsheet: typically there are multiple Excel spreadsheets that store different pieces of data and information related to the different stages of drug development. These spreadsheets are stored in separate locations, such as individual scientists’ local computers or different shared networks, and this alone can pose quite a few problems. First of all, if you have data stored on different networks, those networks may have different permissions that determine which users have access to them. Many times, as well, these different spreadsheets contain common data. So even though they’re separate files, they may share some of the same data, and you can imagine how tedious and difficult it becomes to ensure that when a change is made in one spreadsheet, the same change is also applied to all the other relevant ones.

An example of a change like this is that, throughout the drug development process, the names of impurities will typically change. When you collect the initial sets of analytical data from your processes, the identities of the impurities are completely unknown, and you label them with just an RRT, or relative retention time, value.

Later on in the process, particular impurities may be deemed important because of their toxicological effects, and those compounds then need to be elucidated. After the compounds are elucidated, they’re given names: a compound starts off with an unknown name, and later on in development it does have a name. When you make that change in an Excel spreadsheet, you have to remember where else, and in what other spreadsheets, this name can be found, so that the spreadsheets and reports stay consistent with one another.

In addition, any modifications made to this data always have to be tracked using an audit trail. It’s really important for companies to store all of this information: what has been changed, who changed it, and when it was changed. This is very challenging to do in Excel, especially when you consider how many different chemists have access to the same files and how many different versions of those files may exist.

Then finally, I think the last major issue with Excel is that chemists frequently have to generate chromatogram overlays between different stages of a process. This is done to visually compare data between different stages or different processes, or even between different lots or batches of material, for the various reports they may have to generate. Since Excel does not store this kind of live analytical data for the chromatograms, this type of comparative analysis is really challenging to perform in Excel. Overall, to answer the question: is Excel slowing down pharmaceutical development? I believe it is.

10:09 Jesse Harris

Yeah. That brings me very nicely to my next question, which is about live analytical data. It’s mentioned a number of times here, and the importance of live analytical data is a theme we actually talk about a lot at ACD/Labs. Can you explain what is meant by that term, how it compares to dead data, and why the difference is important?

10:29 Alex Waked

Yeah. So when we say live data, this typically refers to the connection between the reported analytical results, for example the peak areas you see in a table, and the raw data collected by the instrument itself. In other words, the raw instrument data can be reprocessed: you can re-pick peaks, reintegrate peaks, or rename peaks in your chromatographic data. When you apply these changes, they should be immediately reflected in the reported results, that is, in the tables. When there is that connection between changes made in the analytical data processing and the results in the table itself, that is what we refer to as live data.

If we look at examples of dead data, a couple of examples are PDF files and Excel spreadsheets. Such data are less desirable because it is difficult to manipulate and interact with them without going back to the programs in which the data were originally collected. For example, say you have an Excel spreadsheet with the peak areas of a particular chromatogram. If you go back into the software and re-pick peaks or rename some of them, those changes are not reflected in the Excel spreadsheet. That’s an example of the kind of change I described before: you make a change to the original data, and you then have to go to every Excel spreadsheet that contains that data and apply the change manually.

Another example is PDF reports containing chromatograms, which are really just screenshots of the chromatograms. You can’t easily blow up the baseline, for example, or zoom in on particular regions to inspect the chromatographic data further. To do that, you have to go back to the original programs and the original raw data sets. So when we compare live data and dead data, live data is definitely preferred, because it gives you a lot more flexibility and freedom to interact with and manipulate it.

12:49 Sarah Srokosz

So you touched on Excel and PDFs there, but there are also many other data management tools such as ELNs, CDSs and LIMS, for example. Do any of these include the necessary data management functionality for pharmaceutical development?

13:07 Alex Waked

That’s a great question. These other types of software packages do indeed contain some of the data management tools that are necessary for drug development, but each of them is also missing different vital pieces. In many cases, these missing pieces cannot be addressed, at least not without significant customization or manually adding tabular data.

For example, consider the following important tools for drug development. First, there’s chemical structure awareness: being able to, for example, search by actual chemical structure or connect chemical structures to particular compound names.

A second tool is storing and processing live vendor data across multiple stages and processes.

Point three: comparing and overlaying live chromatographic data, as I described before. And a fourth point I’ll mention for now is the ability to create dynamic tables to compare the peak areas, that is, the actual results, across stages and processes. These are just four of the tools that are quite important.

In the case of LIMS, ELNs, and CDSs, many of these tools are not available. LIMS does not contain any of these tools. ELNs don’t contain these tools except for chemical structure awareness, and CDSs don’t contain them except for the ability to compare and overlay chromatograms.

Lastly, the challenge is then to consolidate all the information from these different data sources into a single table or report, which relies heavily on manual transcription into Excel spreadsheets. So while many of these software packages do have a lot of the tools that are required and necessary for drug development, there are some vital tools and pieces that are missing from them.

15:23 Jesse Harris

Yeah. And that transitions nicely to Luminata, a piece of software that ACD/Labs has developed to try to bridge some of these gaps and bring this data together. Can you explain a little about how Luminata fits into this conversation and ties together some of these concerns?

15:41 Alex Waked

Yeah, Luminata is a commercial technology developed by ACD/Labs that is used as a CMC (chemistry, manufacturing, and controls) decision support tool. So it’s basically an alternative to the software packages we were just describing: Excel, LIMS, etc. There are a few points I’ll touch upon here that really emphasize the advantages it may offer over the other packages.

Whereas Excel, as we described, is used as the ultimate repository for data from different sources, in Luminata scientists can store and use analytical data from different vendors and different sources all in one place. It can also be accessed by however many scientists in a company are given access to Luminata.

Live analytical data from multiple vendors can be stored and interacted with within Luminata. Referring back to what we mentioned before, these data are stored as live data, so you can reprocess a chromatogram, edit peak integration, and edit peak names and chemical structures, and any change you apply within Luminata automatically updates essentially every table in Luminata that contains that information.

This already bypasses almost all of the manual transcription that scientists would have to do using Excel. In turn, it reduces the chances of human error, and it saves scientists hours of sifting through different Excel spreadsheets and pieces of original data.

Once the data are in Luminata, users can also perform dynamic searches for data analysis, from which they can easily generate tables and chromatogram overlays for their reporting purposes.

I think the points I’ve just mentioned demonstrate that Luminata contains many of the major tools that make the drug development cycle more efficient for companies. What also speaks, I think, to the effectiveness and validity of Luminata is that our article was published in OPR&D and had over a thousand views in just the first month after publication.

So there are other scientists and other companies who do see the use and effectiveness of Luminata.

18:37 Sarah Srokosz

Yeah, certainly it seems like a topic that a lot of people and companies would be interested in, but it might also seem like kind of a big first step. So do you have any suggestions for listeners who are maybe looking to just start to reduce their reliance on Excel?

18:56 Alex Waked

Yeah, that’s a great point. With any new software or program that anyone uses, there’s always a bit of a learning curve, right? So my first suggestion for reducing your reliance on Excel is simply to get informed about the alternatives that companies have been moving towards, such as some of the packages we mentioned earlier: ELNs, CDSs, LIMS, and Luminata.

If you read up on and look into these alternatives, you can really see how they differ from Excel. One thing I’ll mention here, too, is that I think the article we wrote does a great job of outlining the differences between all these systems in a fairly straightforward table in the paper.

Second, I would suggest doing some homework: find companies that have been implementing these other systems and packages, and learn about their firsthand experiences using both Excel and these alternatives. Hearing about this directly from the scientists who are, say, in the trenches of drug development research really puts into perspective how much benefit these systems can give you over Excel. It also gives you a clear idea of which aspects of their daily activities really benefit from using alternatives such as Luminata.

20:29 Jesse Harris

Yeah, I think that’s something that I see a lot. You know, it’s easy to hear people say, “Oh, it’s more efficient, it’s easier” and everything. But when you actually talk to the scientists, it’s like, “No, this is a game changer in terms of my day-to-day work,” and it can make a really big difference. But with that, I just want to thank you for coming onto the podcast and being our guest for the first episode of the new season. Alex, thank you so much.

20:51 Alex Waked

Yeah, it’s a pleasure to be on. Thanks, Jesse and Sarah.

20:56 Sarah Srokosz

That was a great conversation to start off our new season. Thank you very much to Alex for hanging out with us.

21:02 Jesse Harris

If you want to check out the paper we discussed or learn more about Luminata, there will be links in the show notes.

21:08 Sarah Srokosz

That’s all for this episode. Tune in next time to get an update on what’s happening in the world of NMR. Thanks for joining us.

The Analytical Wavelength is brought to you by ACD/Labs. We create software to help scientists make the most of their analytical data by predicting molecular properties and by organizing and analyzing their experimental results. To learn more, please visit us at www.acdlabs.com.


Season 3, Episode 1