What do a racecar and an LC/MS system have in common? They’re both sophisticated machines whose performance must be measured by applicable standards.

Scientists have long recognized the importance of standards, but with improvements in technology, the way we set them has changed. How can we use standardization to better communicate cutting-edge science? How can we improve our efficiency as chromatographers through data sharing?

Join us in Episode 3 to hear the racecar analogy, learn about method standardization, and catch a mention or two of vampire bats.

Read the full transcript

Pankaj Aggarwal  00:02

The drug development timeline that looks like 15 years today, or 5 years, might be reduced further. As we know, for COVID-19, the vaccine was developed in one year. But with these aggressive timelines, we need to make our processes more efficient. That would be the biggest gain.

[opening music]

Charis Lam  00:31

Jesse, what’s your first thought when you hear method standardization?

Jesse Harris  00:34

I don’t know Charis. It seems like an old topic to me. Reminds me of something from undergraduate chemistry.

Charis Lam  00:40

Yeah, the idea of standardization itself is old. But chromatographers have new ideas about how to standardize and about the impact it’ll have on their work.

Charis Lam  00:50

Some chromatographers are pushing for standardization, because they think it’ll improve the quality and speed of their scientific research.

Jesse Harris  00:57

Alright, let’s start with an expert on the subject. Ben Neely works at the National Institute of Standards and Technology. And he talked to us about standardization and system suitability in mass spectrometry and LC/MS.

Charis Lam  01:10

As our podcast guest for this interview, we have Ben Neely. Ben is a research chemist at the National Institute of Standards and Technology in Charleston, South Carolina. He did his PhD at the Medical University of South Carolina, and his research is on proteomics. He has many publications, including a recent one on the serum proteome of vampire bats, which is quite a topic.

Welcome, Ben. So let’s start perhaps with an icebreaker question. What is your favorite chemical?

Ben Neely  01:45

What is my favorite chemical?

Charis Lam  01:47

Proteins count too.

Ben Neely  01:49

Oh, proteins count too? Yeah, I mean, I can’t say I have a favorite chemical. Lately, one of my favorites has been Complement C3. So in other words, the innate immune response. Because it’s this weird protein that we have a lot of floating around, and it essentially breaks apart and does all this crazy stuff. And I don’t think we fully appreciate that in our analytical techniques.

Jesse Harris  02:16

Nice. It’s a very useful protein to have around nowadays, I’m sure.

Ben Neely  02:22

Yeah, they’re weird. And there’s lots of complements. Yeah, and again, I’m not much of a biologist. So when I learned about these, you know, all these little machines that we have, and the things they do, it always kind of blows my mind.

Jesse Harris  02:34

Yeah, they are crazy. You work at NIST, as Charis was saying. Can you briefly explain what this institution is, for those who don’t know what it is? And then also maybe explain what your role is there?

Ben Neely  02:47

Yeah, that’s a great question. NIST has a really interesting history, where I think we actually were mandated in the Constitution as a Bureau of Weights and Measures. Very broadly, NIST is within the United States Department of Commerce.  And we are tasked with accelerating commerce through measurement science.

On a very basic level, you can think, let’s say, GPS: for all of our GPS to work, the clocks have to be in sync. So NIST filled that role by providing the atomic clock, which allows everyone to agree on what a second is and what time it is. But you can extrapolate that across lots of things: not just buying bananas, where we have to agree what a pound is, or a kilogram, but even very specific weights and measures. You know, what’s an atomic weight?

But NIST does all these other things. So it’s the National Institute of Standards and Technology. And the last part is kind of where I like to think I belong. So NIST has a small laboratory in Charleston, South Carolina, where I’m based, and my work specifically is kind of enabling proteomic research in non-model organisms. That’s kind of one slice of that. So you mentioned the vampire bat. It’s not that it’s special that we can do proteomics in vampire bats. But it’s just that no one necessarily has done that. And I like to think that by doing that once, we encourage other people to do that research in these other organisms that are incredibly valuable. That’s our hot topic right now. So we can use these tools to do that.

But we also work on, you know, as the name implies, standards, so standardization, how do I collect data, and communicate that data with quality across laboratories, but also just in these other systems? You know, like, if I do a vampire bat, how do you know that data is good? You’re used to seeing humans. Well, how can I tell you that my vampire bat data is also good?

Charis Lam  04:52

How did you get interested in standardization and how does it apply to your everyday work?

Ben Neely  04:58

I think I’ve always been. I mean, I came up being an analytical chemist. And I’ve always enjoyed kind of enabling research with other people. So I do like the applied aspect, but I think part of it is building the foundation for people to do work.  And standardization, or having some way to benchmark yourself, is another way of saying … it’s a way for you to learn how to do something, right? So I can teach you how to do, I mean, let’s take proteomics, but how do you know that you’re doing it well? And the way that you do that is by having something to benchmark yourself against.

So from analytical chemistry, you would have a material that you could analyze and get a recovery of some analyte. In proteomics, you have many thousands, if not millions, of analytes. It’s a little harder to benchmark, but it’s the same idea. So I think I’ve always been in the frame of reference where I like to enable people and get into these really not-exciting topics. You know, we’re not going to talk about databases today. But I think, you know, with proteomics, I started with getting into databases, how searches work. But at the same time you’re doing the back-end work, you’re also doing the front-end, which is: how do you know you’re digesting? How do you know if something is what it is?

Jesse Harris  06:16

So one thing that we talked about when we were preparing for this was a concept about system suitability as something that you saw as a complement to standardization. And I think that was a theme that was very interesting for people who might be involved in these topics. Can you expand on what that was and what you mean by system suitability?

Ben Neely  06:34

System suitability is this concept that I think has become more important as we do things like proteomics, where we’re analyzing many tens of thousands, even hundreds of thousands, of analytes at the same time. So with normal analytical chemistry, maybe you just need to see: can I measure this isotope or this compound? But when you get into a system, let’s take LC/tandem mass spec, where we’re separating out these peptide mixtures, and then we’re allowing the instrument to, let’s say, pick those masses and fragment them, there are a lot of processes going on at the same time that we need to be able to evaluate at once.

Historically, you know, 10 years ago, we were all injecting a digested protein and that would yield, you know, so many peptides. And we would say, did I get good coverage of this protein? But the problem is that with our new instrumentation, which is so fast and so accurate, you can get good coverage of a single protein, and your instrument is on fire, to borrow an expression of a friend of mine. And so the idea is, well, what can you put on your instrument that lets you gauge the system?

A really bad analogy I use is it’s like a Formula One car; that’s what we’re driving now. So the test can’t be “can you drive it around the block?” It needs to be “can you drive it around the Formula One race course?” It has all these different ways to test. And so running a complex mixture on a system is a way to test that. And what that means is we get a lot of performance metrics. So we don’t just get: did you see one protein? It’s: did you see thousands of proteins? Did you see tens of thousands of peptides? How did your peak widths look? How was your mass accuracy? How’s your chromatography? How’s your media? We get all this information from a single sample that we run and determine system suitability.

Charis Lam  08:24

Those are a lot of metrics that you talked about. What are the most important ones? And how do people tend to communicate all of these metrics to each other?

Ben Neely  08:33

Right. And that’s a really good question. About 10 years ago, we had people start to establish these metrics. We call them ID-based and ID-free. So ID-based are where I would communicate my metrics of ‘I have identified 40,000 peptides and 4000 proteins,’ and then you would say, ‘That number is plus or minus 20% of what I would expect.’

But when you use these ID-based metrics, you’re kind of missing these other things that are probably affecting your IDs. So I would also like to report my peak widths. Peak widths on certain systems are very important to how the instrument is working downstream. So with my instrument, if my peak widths get to about 30 seconds, I actually get drop-offs in performance in other ways. You can evaluate your mass isolation. Your mass accuracy is a huge one. So if I run my instrument one day, and my mass accuracy is much higher than I would have expected, that’s going to affect my results.

So to say there’s one very important metric is hard, but I think it all comes together, and that’s where you see a lot of these newer tools creating this holistic kind of report. You know, if your instrument’s running well, it probably is just running well; all these things fit. I think the biggest thing a system suitability test helps you pinpoint is, when it’s not working well, what is wrong. Is it that your column’s gone off? Is it that your spray is unstable? Is it that your mass accuracy is off? You get all this information, these metrics, that will then help you troubleshoot and get back up to optimal levels.
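The multi-metric check Ben describes can be sketched in a few lines of code. This is purely a hypothetical illustration, not any particular QC tool: the metric names, baseline values, and acceptance ranges below are invented for the example.

```python
# Minimal sketch of a combined ID-based and ID-free suitability check.
# All metric names and acceptance ranges here are hypothetical.

EXPECTED = {
    "proteins_identified": (3200, 4800),    # ID-based: ~±20% of a 4000 baseline
    "peptides_identified": (32000, 48000),  # ID-based
    "median_peak_width_s": (5.0, 30.0),     # ID-free: chromatography health
    "median_mass_error_ppm": (-5.0, 5.0),   # ID-free: mass accuracy
}

def evaluate_run(metrics: dict) -> list[str]:
    """Return the out-of-range metrics, to guide troubleshooting."""
    failures = []
    for name, (low, high) in EXPECTED.items():
        value = metrics.get(name)
        if value is None or not (low <= value <= high):
            failures.append(f"{name}={value} outside [{low}, {high}]")
    return failures

# Today's run of the standard mixture: IDs look fine, but peaks are wide,
# which points at the column or gradient rather than the mass analyzer.
todays_run = {
    "proteins_identified": 3900,
    "peptides_identified": 41000,
    "median_peak_width_s": 34.0,
    "median_mass_error_ppm": 1.2,
}

for problem in evaluate_run(todays_run):
    print("FLAG:", problem)
```

A real suitability dashboard would track many more metrics and trend them over time, but the core idea is the same: compare today’s run of a standard sample against a known-good envelope, and let the failing metrics point at the subsystem to troubleshoot.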

Jesse Harris  10:11

Now, how widely used are these concepts of system suitability? Is this something that is more focused on just the proteomics field? Or is this something that is also being adopted more broadly within the analytical chemistry space?

Ben Neely  10:25

Right, you know, I can’t speak to a lot of these more classic inorganic, organic analyses. A lot of times they’re using things like spikes or standards. You know, it’s that kind of recovery concept. I injected these 20 things; did I see these 20 things?

When you get into, you can call it untargeted, whatever you want to call it, but lipidomics, small molecules, metabolomics, proteomics, where you’re measuring tens of thousands of things that you don’t necessarily have a reference for … I think that’s when this becomes important. And we’re starting to see it grow. Especially with bottom-up proteomics, it seems to be more widely accepted. You know, people are running some digest, either a commercially available one or something that they’ve made in house, and they’re running it every day. And it allows them to check themselves.

I think the next step, and we’re hopefully starting to see this, is that you start getting requirements for data deposition. So if I deposit data, and you deposit data, we also have a companion file that is of some sample. And that creates a way for all of us to know; it’s like a secondary check. Is your system good? Not just: is the data good from this experiment that I have no idea about? Like vampire bats; you don’t know about vampire bats. But if I also showed you my HeLa mix, then you can say, ‘Well, I know how this should look; this is how it looks.’ Therefore, I can now evaluate this other data that I have no idea how it should look.

Charis Lam  11:52

A lot of what we’ve talked about so far is about standardization before you run the experiment. You’re really interested in the system suitability concept. What about after you collect the data from the experiment that you’re actually doing? Are there standards for sharing that kind of data between labs?

Ben Neely  12:10

I think there are definitely best practices. And something that I’m very, I guess, happy about in the field of proteomics specifically is that we have been very open with sharing raw data. So I think, if you see a paper nowadays that doesn’t have raw data available to be reprocessed, that raises big red flags for me. And I think as a community, we’ve gotten very good about publishing, but also about being open to people reanalyzing.

But that means that right now, there’s a very big push to not just make, you know, terabytes of data available, but to have that data annotated in a way that facilitates downstream analysis. So the people at EBI and PRIDE are pushing a format that essentially describes your experiments, so that when you grab my data, you have this list of raw files, but you also know: it was collected on this instrument, from this tissue, these are the modifications. All of that is programmatically available, because as we keep generating terabytes and terabytes of data, we want to be able to use them. And the end goal is to be able to use them without having to call Ben Neely up.

You know, how do I just use this data and get the same results, but also add it to my future knowledge? So I don’t know about standards. But there are a lot of best practices with proteomics. And again, I think including the system suitability or the quality controls is important when you deposit that data.

Jesse Harris  13:46

And so in terms of that improvement, it’s good to hear that the proteomics community has this, you know, data-sharing mindset. Are there other ideas that you can think about? Where system suitability can be improved, data sharing can be improved, like, where would you like to see these things evolve in the coming years?

Ben Neely  14:09

I think right now, it seems like there’s good attention on it. For instance, in the Journal of Proteome Research special Tools Issue, there were a handful of tools for evaluating your system suitability standard. And I think the more comfortable people get with not only running that sample but also evaluating it, the more confident they’ll be in their own data as they go.

And I think for me, it’s just people being comfortable sharing between their groups. You know, if I am working with a group in Arkansas and a group in California, and we’re all, let’s say, running the same samples, maybe a different way, or maybe we’re just doing it to save time, or maybe I ran the first set and they’re running the second set. By us running the same system suitability sample, it allows us to communicate results between ourselves. And what I mean is, I need to have confidence that the Arkansas and California groups in this hypothetical example are running well. But also, if their results are much, much better than mine, this lets me know: well, this is how their system looks on the sample, and therefore that’s the reason. Or even the alternative: maybe their platform only runs at a certain level.

You know, we can’t dismiss data just because it’s different. And so yeah, I think in the future, it seems like we are getting more comfortable with that.

Something that, again, standardization enables is that at some point, we have to stop repeating experiments, right? How many times can we run HeLa? How many times can we run colorectal cancer? You know, maybe there’s this huge field of vampire bats, to use our example. At some point we have to be able to use other data, and it’s really hard with proteomics, because we’re doing so many things to the sample. We digest it a certain way; we extract it a certain way; we ran it a certain way. But if we can build a usable knowledge base, one that makes it so I don’t have to analyze 1000 samples every time, I think that’s the real dream, you know. And to get there is hard, because especially with proteomics, we’re so bespoke. And, you know, we feel like we need to do the experiment. And there’s nothing wrong with that.

But you can’t discredit, you know, data from 10 years ago that was done on different platforms, because if we always discredit the prior generation of platforms, we’re never going to move forward. They made discoveries only seeing 50 proteins. It doesn’t mean it’s bad data. And so I think coming up with ways to integrate that data with newer data, as we continue moving forward, is, for me, a big kind of dream.

Charis Lam  17:03

That’s a great point. Um, is there anything else, any final thoughts on the subject of standardization or system suitability that you want to share?

Ben Neely  17:13

I’d be remiss if I didn’t mention all the colleagues at NIST who are working on this. You know, I really lean on my collaborators and colleagues within NIST for developing new materials, not just for proteomic analysis, but also for genomics, for small molecules. And for people … these are products that are being made for the community; we make them, we sell them. But there are other people, other companies, also making similar things. And I think the more we can reach for samples that aren’t just pools we happen to have in our lab, the better you’re future-proofing those experiments. So I would encourage everyone to reach for a material that is available to the world, not just one that’s in your group. By doing that, you’re going to help your data 5 years from now, 10 years from now, be usable. So consult NIST, but also consult other companies, vendors, wherever you are.

Jesse Harris  18:15

Right. Now, actually, one last question: I would love to hear your 30-second pitch as to why vampire bat blood is interesting to study. It’s been a little bit of a theme throughout this conversation. But I think your research area is actually really interesting. So I’d love to hear your elevator pitch on that.

Ben Neely  18:35

I’m gonna pitch beyond vampire bats. Mammals, in general, do a lot of cool things. There’s a line that for every phenotype out there, there is a model organism in nature that already has it. And mammals: there are 5,400 species, and we largely don’t know anything about them. On a very basic level, we don’t understand what their blood looks like. I mean, aside from, like, the big 5.

So take bats. About 20% of mammal species, some 1,400, are bats. And we don’t understand their basic blood. So we talked about innate immunity. We don’t know … Do they have special proteins? Probably not. But do they have proteins that are at weird levels? More than likely, and the vampire bat was our first foray into that. Now we want to look more broadly. How does this affect innate immunity? How are they natural hosts? What in the world does being a natural host mean, at a molecular level? Are they infected? Are they not? So that was longer than 30 seconds, but I think it’s not just vampire bats. It’s a lot of animals.

Jesse Harris  19:38

Yeah, there’s a lot of great and interesting things to look into there. But thank you very much, Ben, for your time. A lot of interesting thoughts there on the concept of standardization and system suitability. Thank you for your time.

Ben Neely  19:49

Thank you again for having me.

Charis Lam  19:50

See, system suitability is an interesting concept. It’s all about how we communicate as scientists and how we make sense of other scientists’ results. The better we communicate, the more our scientific research improves.

Jesse Harris  20:05

Okay, I will admit that that was pretty interesting, but that might just be the vampire bats talking. Do you have a more practical application that we can discuss?

Charis Lam  20:14

Uh huh. How about accelerating pharmaceutical research?

Jesse Harris  20:17

Oh, okay, you have my attention.

Charis Lam  20:19

Our second guest today is Pankaj Aggarwal. He’s an Associate Principal Scientist of Analytical R&D at Merck. And previously, he was at Pfizer, where he was Principal Scientist and Team Lead. He received his PhD in Analytical Chemistry from Brigham Young University. And he specializes in developing chromatographic methods and in chromatographic standardization.

Jesse Harris  20:41

I’m pleased to welcome Pankaj Aggarwal to our podcast today. Thank you so much for joining us.

Pankaj Aggarwal  20:49

Thank you for inviting me.

Jesse Harris  20:50

We’re gonna be talking today about method standardization, and specifically around LC methods, but we’re gonna start off with an icebreaker question of: what is your favorite chemical?

Pankaj Aggarwal  21:01

Oh, I’d say let’s go with caffeine. That keeps you thinking and on your toes, right? It just needs to be in the right dose.

Jesse Harris  21:10

Yes, yes.

Charis Lam  21:13

Just the right amount, not bouncing off the walls. Alright, so Pankaj, we brought you on because you’re an expert, really, in LC and LC method standardization as well. Why did you choose to take an interest in LC standardization?

Pankaj Aggarwal  21:29

I’m a chromatographer by training. And having done chromatography for the past 15 years now, including my PhD program, one thing I realize is: in the pharma industry, we are generating so much data every day that if we are able to look at that data in a critical way and do some important data analytics, it will provide a significant saving in time and effort to arrive at conclusive decisions, ultimately helping to get the molecule to the market faster.

Now, in an effort to do that, the first thing that is required, the basis of data analytics, is having the data in a standardized format. As one of my peers said, and I really like this quote: if you are in a trench, you first need to stop digging, rather than trying to climb out or fill the trench in. So I feel that if we start doing LC standardization now, the historical data that is not in a standard format can be dealt with afterward. First, standardization will give us the foundation for that data analytics.

Jesse Harris  22:38

Great. Yeah, that makes sense: developing better methods to approach your problems. How would you describe … how would you define LC standardization? Because it’s a term that doesn’t necessarily mean the same thing to all people.

Pankaj Aggarwal  22:53

Yes, there is a wide variety in how we can define it. But at a very high level, I would say: if I’m able to collect, store, and analyze the data, with all the metadata information, in a format that is not affected by the instrument vendor or the file format, and the ultimate data analytics can be done in any third-party software, that would be the ultimate LC standardization, with one central storage location.
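To picture what a format “not affected by the instrument vendor or the file format” might look like in practice, here is a minimal sketch that serializes one LC run, method plus result plus metadata, to plain JSON. The field names and schema are assumptions for illustration only; this is not the Allotrope format or any real standard.

```python
import json

# Hypothetical vendor-neutral record for one LC run: the numeric trace
# plus the metadata needed to reinterpret it without the original CDS.
run_record = {
    "schema_version": "0.1",  # assumed versioning convention
    "method": {
        "column": "C18, 2.1 x 50 mm, 1.7 um",
        "flow_rate_ml_min": 0.4,
        "gradient": [
            {"time_min": 0.0, "pct_b": 5},
            {"time_min": 10.0, "pct_b": 95},
        ],
    },
    "result": {
        "detector": "UV 254 nm",
        "time_min": [0.00, 0.01, 0.02],  # truncated example trace
        "signal_mau": [0.1, 0.3, 0.2],
    },
    "provenance": {"instrument_vendor": "any", "acquired_by": "analyst-01"},
}

serialized = json.dumps(run_record, indent=2)  # store in the central location
restored = json.loads(serialized)              # any third-party tool can read it
```

The point of the sketch is only that the method, the result, and the metadata travel together in a self-describing structure, so downstream analytics do not depend on the vendor software that produced the file.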

Charis Lam  23:26

That makes a lot of sense. I guess, when it comes to LC, you really have at least two kinds of data: the method you’re running, and then the results, like the chromatogram, that you get out. Do you think there’s a difference in the way we handle those two kinds of data?

Pankaj Aggarwal  23:42

Yes, there is a difference. The difference would be that the methods we generate are something that needs to be machine readable. Those methods are ultimately read by machines and run. And when I say machine, it’s an LC instrument. There has to be a specific way of storing that LC method so that it can be transferred from one instrument to another.

Now, when we talk about results: once the result is collected, it’s irrespective of which system it was collected on, but it will be defined by which software was used to collect it, and what we want to do downstream with it. So there are different file formats and different aspects and uses of those things. In that way, they will definitely differ.

One thing in which they can be similar is where we store them. We can store them in the same place, or in two separate places. But one thing I would really like to emphasize is: there has to be a connection between the two. They are not two separate entities. They should be treated as one entity with different requirements.
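That “one entity with different requirements” idea can be modeled as two record types joined by a shared key, so a result can always be traced back to the method that produced it. A toy sketch, with invented field names:

```python
from dataclasses import dataclass

# Toy model: method and result records live in separate stores (different
# requirements), but a shared method_id keeps them one logical entity.

@dataclass
class LCMethod:
    method_id: str
    column: str
    gradient_time_min: float

@dataclass
class LCResult:
    result_id: str
    method_id: str  # the link back to the method that produced this result
    peak_areas: dict

methods = {"M-001": LCMethod("M-001", "C18, 2.1 x 50 mm", 10.0)}
results = [LCResult("R-042", "M-001", {"caffeine": 1532.7})]

def method_for(result: LCResult) -> LCMethod:
    """Resolve a result back to its method; a KeyError means a broken link."""
    return methods[result.method_id]

print(method_for(results[0]).column)
```

The design choice is simply that neither store is allowed to hold an orphan: every result carries the key of its method, which is what makes the results interpretable later.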

Jesse Harris  24:50

Yeah, that seems quite essential, because you can’t really make sense of the results without the method data associated with them. So how would you describe best practices for companies looking to improve their LC knowledge sharing? For example, how to better integrate the way these types of data are combined and stored.

Pankaj Aggarwal  25:15

So there are multiple efforts going on right now. In the pharmaceutical industry, the one centralized effort that everybody might already be aware of is the Allotrope Foundation, where they’re trying to standardize the format and the techniques, and how these data files are stored. Alongside that is the Pistoia Alliance, which is working on creating a central methods repository: how a method that is run on an instrument will be stored. So those are some centralized efforts with cross-industry consortiums.

And within each individual pharmaceutical company, there are individual efforts going on along similar lines: establishing best practices. When you are starting to create a method, what are the best practices? When you are trying to label the peaks, what are the best practices? So best practices, followed by guidelines around how to name and store the data, and also some downstream analytics tools where the user or the analyst sees the return on investment of doing the standardization.

In addition to that, pharmaceutical companies have started looking into cloud storage, which ultimately gives one central location for storing the data, and then building interconnectivity between different individual enterprise systems. So I guess that’s again a significant stride in sharing knowledge within a company, and with a quick turnaround time.

Charis Lam  26:55

All of those seem like good tips. And you mentioned being able to see ROI. Where do you think the greatest ROI is, or the ROI that’s most obvious when people are just starting out?

Pankaj Aggarwal  27:08

I would say the greatest ROI would be information available at your fingertips, right? So a molecule, when it is developed, it starts from discovery to development to clinic. So those three stages. And at each stage, there are so many people involved in those programs, that the knowledge transfer becomes kind of a black hole. If there is a centralized standardized location for data storage and sharing, this black hole will be eliminated.

And if we eliminate the need to redevelop methods just because the information is not available, that I think is a significant gain in terms of saving the analyst’s time. They can focus on other important things, and it reduces the redundancies.

Jesse Harris  27:56

What would you say are some of the challenges in setting up a method database or standardizing the methods within a company? Because what you’re describing here sounds like a lot of upside, so there’s an obvious question: why doesn’t everybody do this?

Pankaj Aggarwal  28:10

I would say that one of the biggest challenges is the diversity of the instruments and diversity of the systems being used to collect the data, and also the diversity in the chromatographic data systems, so CDSs that are available today. Diversity in these systems just makes this a significantly challenging task to come up with a standardized way that will fit and suit everyone.

Charis Lam  28:38

But when companies take on these challenges, what do you think is the greatest opportunity that’s going to emerge for them? What would be … so I guess … the dream scenario of having all the standardization done, and now here is what you can do with it?

Pankaj Aggarwal  28:54

So I would say that for the pharmaceutical companies, the initial investment might seem to be a big one. But ultimately, we’ll free up a lot of analyst time to focus on other important aspects of the projects.

And the drug development timeline that looks like 15 years today, or 5 years, might be reduced further. As we know, for COVID-19, the vaccine was developed in one year. But with these aggressive timelines, we need to make our processes more efficient. That would be the biggest gain.

Jesse Harris  29:28

Well, I certainly do hope that we continue to see the types of efficiency gains we’ve seen in the COVID era. And it’d be exciting if method standardization could contribute to that. But that covers all the questions that we had for you today. Was there anything else that you wanted to comment on before we wrap things up?

Pankaj Aggarwal  29:45

I would just say that multiple efforts are underway in the industry, in cross-industry consortiums, and it’s not only the pharmaceutical industries but also the vendors who work with them, who are in the LC business. There have been active efforts in that area also. It’s just that we need to accelerate this a bit further, to get to the implementation stage rather than just discussion.

Jesse Harris  30:14

Great. Well, thank you so much for your time. It was a pleasure having you and I look forward to joining you again in the future.

Charis Lam  30:20

Thanks, Pankaj.

Pankaj Aggarwal  30:20

Thank you.

Jesse Harris  30:22

Pankaj makes a strong argument: reducing the time spent developing redundant methods sounds like it could lead to substantial improvements in efficiency.

Charis Lam  30:31

Exactly. Standardization might be an old topic, but technology has changed so many things: how much we have to communicate, how we communicate. There’s this gap between our data management and our data production, and standards help us to close it.

Jesse Harris  30:47

If you want to learn more about any of the organizations or consortiums mentioned in today’s episode, be sure to check out the description of the episode and there’ll be some links there for more information.

Charis Lam  30:58

This is The Analytical Wavelength. See you next time.

[closing music]

The Analytical Wavelength is brought to you by ACD/Labs. We create software to help scientists make the most of their analytical data by predicting molecular properties and by organizing and analyzing experimental results. To learn more, please visit us at www.acdlabs.com.

Enjoying the show?

Subscribe to the podcast using your favourite service.