November 4, 2021
Excel® is a bottleneck for pharmaceutical research and development. The application struggles with issues such as an inability to read live analytical data, versioning problems, a lack of audit trails, and a lack of chemical awareness. While scientists do their best to validate software and hardware, sensitive information often resides in Excel spreadsheets outside the controls of GxP and 21 CFR Part 11 guidelines.
Excel’s problems can’t be solved by revising an SOP or integrating an extra plug-in; the software simply isn’t designed to manage the data of pharmaceutical research projects. So why haven’t researchers moved away to find alternative data management solutions that don’t rely on Excel? Is it because they’re happy to continue with the status quo rather than implement new systems that require change—which is often avoided when possible? Or because they’re unaware of software that will handle the data they’re generating in a chemically intelligent manner?
Our team of chemists recently reported a detailed analysis of Excel’s shortcomings during pharmaceutical development. The article, “Consolidating and Managing Data for Drug Development within a Pharmaceutical Laboratory,” was published in Organic Process Research and Development (OPRD). The paper explores Excel’s core weaknesses in pharmaceutical development, comparing its performance to CMC decision support software (ACD/Labs Luminata®), as well as ELNs, CDSs, and LIMSs.
This publication offers an opportunity to start a conversation about the role spreadsheets play in chemical and pharmaceutical research and development. Here are seven of Excel’s most pressing shortcomings that were identified in the paper.
1. Lack of Chemical Structure Awareness
Most data in pharmaceutical development is meant to answer one question: what chemicals are in my sample? Almost every measurement is being taken to answer that question either directly or indirectly. Shouldn’t our data management systems be able to understand chemical structures?
Excel is built around numbers and images. Almost every feature calculates and stores numbers or manages images. Excel does not “understand” chemical structures. There are some chemical tools in the application, but they are quite limited. Scientists have developed workarounds to get chemistry into Excel, including using chemical notation or pictures of chemical structures with metadata. These solutions are often time-consuming to maintain and prone to error. Seemingly trivial tasks, such as searching for a chemical by structure, are nearly impossible.
Chemically aware software allows scientists to:
- Create relationships between chemical structures or between chemical structures and data within a project
- Perform searches for identical or similar chemical structures across projects
- Adjust a chemical structure in multiple locations simultaneously
These are not just ease-of-use features; they improve accuracy, simplify decision-making, and save time.
2. Challenges Consolidating Data from Multiple Sources
Chemists regularly use MS, LC, GC, UV, NMR, IR, TGA, DMA, and PXRD data, often from instruments manufactured by different vendors spread across multiple sites. Teams must have access to all this information with as little friction as possible to leverage all their data.
Excel often operates as the common ground for each of these data streams. Unfortunately, Excel doesn’t understand analytical data, so data must be converted into a CSV file before it can be processed. This leads to an endless cycle of copy-pasting and converting file formats back and forth, creating frustration and increasing the risk of misplaced files or transcription errors.
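To make the conversion burden concrete, here is a minimal Python sketch of merging CSV exports from two instruments into one table. The file contents, column names, delimiters, and units are all hypothetical and are held in memory as strings; real vendor exports will differ, and each new format needs its own loader.

```python
import csv
import io

# Hypothetical exports: the two vendors use different headers,
# delimiters, and retention-time units (minutes vs. seconds).
vendor_a = "Sample,RT (min),Area\nB-001,2.50,1200\n"
vendor_b = "sample_id;ret_time_s;peak_area\nB-002;156;1350\n"


def load_vendor_a(text):
    """Read the comma-separated export and map it to the common schema."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {"sample": r["Sample"], "rt_min": float(r["RT (min)"]), "area": float(r["Area"])}
        for r in rows
    ]


def load_vendor_b(text):
    """Read the semicolon-separated export and convert seconds to minutes."""
    rows = csv.DictReader(io.StringIO(text), delimiter=";")
    return [
        {"sample": r["sample_id"], "rt_min": float(r["ret_time_s"]) / 60, "area": float(r["peak_area"])}
        for r in rows
    ]


# One consolidated table, at the cost of one hand-written loader per format.
combined = load_vendor_a(vendor_a) + load_vendor_b(vendor_b)
```

Every loader is a place where a unit conversion or column mapping can silently go wrong, which is the same risk that manual copy-pasting carries.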
3. Inability to Read Live Analytical Data
There are two types of analytical data: live data and dead data. Live data is an analytical data file that (1) can be reprocessed, reviewed, and interrogated, and (2) reflects any reprocessing in its results. Scientists have many options for how they interact with live data.
Once analytical data is exported into a spreadsheet, CSV file, or PDF, the information becomes “dead.” The data is abstracted and flattened into a series of numbers or images. Dead data is more manageable for general-purpose programs to work with, but the richness is lost in the process of flattening the data.
Let’s consider a situation to understand the dead-versus-live data division. A development team identifies an impurity peak in a test batch using HPLC. They want to revisit previous chromatograms to determine if the impurity is present. If chromatographic data is stored as a peak table, this “new” peak might not have been captured, meaning team members would need to track down the original chromatograms, reprocess them, and then consolidate them before they can be compared. When working with a live data file, scientists can directly access the chromatogram and reintegrate peaks from the same interface.
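The scenario above can be sketched in a few lines of Python. This is an illustration only: the chromatogram is a made-up list of detector readings, and `find_peaks` is a hypothetical helper that uses a simple local-maximum rule, far cruder than the integration algorithms in real chromatography software.

```python
def find_peaks(signal, threshold):
    """Return (index, height) for each local maximum above the threshold."""
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i - 1] < signal[i] >= signal[i + 1]
        if signal[i] > threshold and is_local_max:
            peaks.append((i, signal[i]))
    return peaks


# Hypothetical detector trace: a main peak at index 3 and a small
# impurity peak at index 8.
raw_trace = [0.1, 0.5, 4.0, 9.0, 4.2, 0.6, 0.2, 0.4, 0.9, 0.3, 0.1]

# "Dead" data: only the peak table from the original processing survives.
# The impurity fell below the threshold, so it was never recorded.
peak_table = find_peaks(raw_trace, threshold=1.0)

# "Live" data: the raw trace is still available, so it can be
# reprocessed with a lower threshold once the impurity is suspected.
reprocessed = find_peaks(raw_trace, threshold=0.5)

print(peak_table)    # [(3, 9.0)]
print(reprocessed)   # [(3, 9.0), (8, 0.9)]
```

With only `peak_table` in a spreadsheet, there is no way to recover the second peak; with the raw trace, the question can be answered in place.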
4. Limited Visualization
Humans are most effective at processing data when it is displayed visually. Scientists often overlay chromatograms and spectra to compare their appearance and to spot trends or outliers. This is particularly essential for finding abnormalities, as small peaks may be disregarded as baseline depending on system preferences and performance.
While Excel is a versatile tool for creating graphs and charts, your data must be flattened before it is useable. Information is transformed into either a picture of a chromatogram or a data table. These conversions strip your data of any nuance or depth. If you find any outlier or abnormality, you will likely still need to recheck the original version.
Visualization tools that work with live analytical data include more detail and do not require you to constantly switch programs simply to visualize your results.
5. Cloning and Templating Are Difficult, Time-Consuming, and Error-Prone
Science involves a substantial amount of repetitive work. This includes replicating experiments or running them with slight variations. In pharmaceutical development, this may mean repeating single trials or several related sets of experiments. Cloning and templating tools within a data management system allow researchers to quickly build records based on previous experiments while allowing for the flexibility to adjust parameters as needed.
Duplicating a project in Excel is a time-intensive process. Transferring or copying spreadsheets can lead to broken links, incomplete data, transcription errors, or synchronization issues. Scientists have little control over the settings of these duplicate files. Considerable time is devoted to troubleshooting old Excel files before they are fully operational. And when you have transferred the information, how will users be confident they’re using the most up-to-date version?
6. Collaboration and the Versioning Problem
One of Excel’s most significant weaknesses relates to collaboration. Passing spreadsheets back-and-forth, either within a team or between groups, generates a mountain of files. This leads to a “versioning” problem, where people lose track of which file version is accurate.
While it is possible to create shared spreadsheets, these are challenging to maintain and can lead to similar versioning issues. Project leaders must spend their time micromanaging file permissions or rechecking spreadsheets to ensure accuracy. Despite this, data is still lost due to mistakes in file management and sharing.
It should be emphasized that these problems are not mere inconveniences—critical results go missing, incorrect data is submitted to regulators, or costly experiments need to be repeated. Many companies that ACD/Labs works with have shared stories where a small mistake in spreadsheet management has led to unnecessary expense, frustration, and embarrassment.
While it is impossible to design a data management system that is immune to human error, Excel is not suited for the highly collaborative workflow of a modern pharmaceutical company.
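As a small illustration of the versioning problem, the following Python sketch groups copies of a spreadsheet export by a hash of their contents. The file names and contents are hypothetical, and in-memory bytes stand in for files on disk. Note the limit of the approach: hashing reveals which copies differ, not which one is correct.

```python
import hashlib
from collections import defaultdict


def group_by_content(files):
    """Map each SHA-256 digest to the names of files with identical bytes."""
    groups = defaultdict(list)
    for name, content in files.items():
        groups[hashlib.sha256(content).hexdigest()].append(name)
    return dict(groups)


# Hypothetical copies circulating by email; the third one has a changed value.
copies = {
    "stability_results.csv": b"batch,assay\nB-001,99.2\n",
    "stability_results_final.csv": b"batch,assay\nB-001,99.2\n",
    "stability_results_final_v2.csv": b"batch,assay\nB-001,98.7\n",
}

groups = group_by_content(copies)
distinct_versions = len(groups)  # two different contents behind three names
```

Someone still has to decide whether 99.2 or 98.7 is the real result, which is exactly the decision that scattered spreadsheet copies make hard.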
7. Audit Trails
While Excel acts as the central repository for data, it does not generate any of that data on its own. Results originate from instruments or sensors and are then processed in a specialized piece of software. That data is often stored in an ELN before it is then pushed to Excel.
For regulatory purposes, you must be able to track the chain of custody of that data. You must prove that the original instrument was compliant with any necessary regulation and that the data was not improperly altered at any point. Given the number of team members, variety of instruments, and volume of data, maintaining this audit trail for each entry is challenging.
Excel does allow you to create links and references to external files, but this system can be disrupted. Moving, renaming, or deleting files—even months or years in the future—may cause complications. Even if no problems arise, maintaining audit trails and data integrity requires considerable energy and attention.
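For contrast, one general technique for making alterations detectable is a hash-chained log, in which each entry stores the hash of the entry before it. The Python sketch below is illustrative only: it does not describe how Excel, Luminata, or any other product implements audit trails, and the function names and log fields are hypothetical.

```python
import hashlib
import json


def entry_hash(body):
    """Hash an entry's fields in a stable (sorted-key) JSON form."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(log, user, action, value):
    """Append an entry that records the hash of the previous entry."""
    prev = log[-1]["hash"] if log else ""
    body = {"user": user, "action": action, "value": value, "prev": prev}
    log.append({**body, "hash": entry_hash(body)})


def verify_chain(log):
    """Return True only if every entry matches its hash and its predecessor."""
    prev = ""
    for record in log:
        body = {k: v for k, v in record.items() if k != "hash"}
        if record["prev"] != prev or record["hash"] != entry_hash(body):
            return False
        prev = record["hash"]
    return True


log = []
append_entry(log, "analyst_a", "record_assay", 99.2)
append_entry(log, "analyst_b", "review", 99.2)
intact = verify_chain(log)

# A silent edit to an earlier value no longer matches its stored hash.
log[0]["value"] = 99.9
tamper_detected = not verify_chain(log)
```

A plain spreadsheet cell offers nothing comparable: an overwritten value looks the same as the original.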
ELNs, CDSs, and LIMSs
Pharmaceutical development companies do not rely solely on Excel and spreadsheets for their data management these days. A combination of electronic laboratory notebooks (ELNs), chromatography data systems (CDSs), laboratory information management systems (LIMSs), and other software and informatics systems, in addition to Excel, is necessary to manage the variety and volume of data generated.
Given their variety, addressing these data management systems is outside the scope of this article. Each method of information management has its strengths and weaknesses. Refer to the OPRD paper for additional analysis on these systems and how they compare to Excel and Luminata.
Within the context of this discussion, it is worth remembering that information systems such as ELNs, CDSs, and LIMSs often rely on Excel. Many pharmaceutical development programs have gaps in their data management strategy, such as between disconnected data silos or difficult-to-read instrument data formats. Excel is used to patch these holes by exporting data into a spreadsheet in one system to import into another. While this may seem like a convenient approach, you reintroduce all the problems outlined above.
Excel is the “data manager of last resort.” Data management is only as strong as its weakest link, so it is not enough to have an ELN, CDS, and LIMS. Sound science demands data management strategies that do not rely on Excel.
Better Data for a Better Future
One of Excel’s strengths is its flexibility. With enough effort, plug-ins, and workarounds, it seems like a spreadsheet can do almost anything. The problem is we are asking too much from Excel—it is being stretched beyond its limits. It is a great application but is not designed to manage chemical information, much less to handle experimental and analytical data for an entire research project.
Think of all the time spent tracking down spreadsheets, verifying file versions, and cleaning data. Around the world, the best and the brightest in the pharmaceutical industry are spending thousands of hours per year being data janitors. That is a waste of money and talent.
Better data management can lead to more productive scientists and better pharmaceutical products. That is why ACD/Labs developed Luminata, a chemistry, manufacturing, and controls (CMC) decision support software that acts as an Excel alternative, bringing all your process and analytical data together in one interface. Scientists at a major biopharmaceutical company found that Luminata can drastically reduce their reliance on Excel, accelerate regulatory submissions, and contribute to a culture of data sharing.