The Case for CASE—Computer-Assisted Structure Elucidation

What is this chemical? This has been a problem for scientists since the discovery of the elements. For the modern chemist, software tools can help solve this problem. Computer-assisted structure elucidation (CASE) is one of the most powerful and versatile tools available for determining chemical structures.

Join Charis and Jesse as they discuss the past, present, and future of CASE. How can CASE help your chemistry? Let’s find out.

Jean-Marc Nuzillard 00:00

Introducing CASE in your structural elucidation protocols does not constitute a threat, but is a way to improve the quality and trustability of your structural proposals; something that will benefit everyone.

Jesse Harris 00:20

We have a compound, but what is it? That question has plagued chemists for centuries from the discovery of the elements…

Charis Lam 00:27

…up to the present day as we synthesize novel compounds and look to nature for new drugs and medicines. And it’s a question that, more and more, computers are helping us to solve.

Jesse Harris 00:37

Hi, I’m Jesse.

Charis Lam 00:39

And I’m Charis. We’re the hosts of The Analytical Wavelength, brought to you by ACD/Labs. In this episode we’re talking about the case for CASE.

Jesse Harris 00:48

That’s a lot of cases, but in this CASE we’re interested in Computer Assisted Structure Elucidation. How computers can help us determine the structures of compounds from spectra.

Charis Lam 00:59

First, let’s hear from Dimitris Argyropoulos, NMR Business Manager at ACD/Labs. He’ll give us a bit of a background on CASE and why we should be paying attention to it. Here with us today is Dimitris Argyropoulos, NMR Business Manager at ACD/Labs. Dimitris has worked with us for about six years, and before that he was working at Agilent Technologies and Varian. He received his Ph.D. in inorganic chemistry from the University of Athens. Welcome Dimitris.

Dimitris Argyropoulos 01:30

Hello.

Charis Lam 01:31

It’s great to have you. So let’s start off with our usual icebreaker question. What is your favorite chemical?

Dimitris Argyropoulos 01:37

Hmm. That’s a question I was not quite expecting, I have to say, but if I was to pick something, probably I would say water, because it’s everywhere and it’s responsible for everything. It’s quite useful for NMR and MRI and has some unique properties, among other chemicals, or liquids, or whatever. So the color of the sea is because of water. But I don’t know if you know, heavy water is not blue; it’s colorless. And water in very large amounts is actually blue for some strange reasons.

Jesse Harris 02:21

That’s a good one. So we wanted to talk to you today about computer assisted structure elucidation. So can you start off by just explaining to us what that is generally?

Dimitris Argyropoulos 02:31

So in general, Computer Assisted Structure Elucidation, or CASE, as it’s referred to in short, is the use of computer programs together with spectroscopic data in order to elucidate the structure of an unknown compound. So the idea is that you record all sorts of spectra on it, not necessarily only NMR, but also infrared, for example. And you feed all this information into the CASE system. And the answer comes out and tells you, you know, that’s your structure. So, in very simple words, this is what computer assisted structure elucidation is.

Charis Lam 03:14

Mm hmm. How old is CASE? Is there, can you share anything about the early history of it?

Dimitris Argyropoulos 03:20

CASE is actually pretty old. So the first publications appeared more than 50 years ago. I think it was 1968 when there was the first attempt to do CASE using infrared spectra back then. You may think quite rightly, 1968 and computers: what are we talking about? Well, yeah, exactly. This was the problem; computers were not as popular or powerful as they are today. So obviously there were quite significant limitations there on what could be done. But still, the foundations, you know, were set; the ideas started evolving; and things improved, you know, as time was passing. So initially it was sort of with only infrared spectra but then they said that, you know, maybe NMR is going to be better. In the meanwhile, computers started becoming more powerful and smaller, so you could have them on your desktop. You didn’t necessarily have to be going to a big computer room somewhere else. So, yeah, that’s more or less how things started.

Jesse Harris 04:36

Yeah, I would say that’s probably like 15 years, at least, before the start of, you know, desktop computers. So that’s quite early in the history of computing. But I understand that 2D NMR has made a big impact on the efficacy of CASE. Can you tell us a little bit about what happened when 2D NMR sort of became available?

Dimitris Argyropoulos 04:59

So yeah, that’s a nice question. So the evolution of CASE in terms of, let’s say, computer hardware was slow and progressive, just following the evolution of computers. In terms of NMR, though, there was a step improvement when 2D NMR experiments became available. And the reason is that before, there were only one-dimensional experiments. So you could only get single information about each atom in a compound. With the introduction of 2D NMR spectra, you could get correlation information and see which atoms correlate with each other; so, which ones are close in space, which ones are close in terms of chemical bonds. So suddenly you could start grouping your atoms differently than just having them as individual entities.

And this really improved a lot, the possibilities for computer assisted structure elucidation, and allowed it to be a practical technique instead of, you know, a technique that just explores all possibilities without, you know, putting too much sense in them. So yeah, 2D NMR, which first appeared, if I remember correctly, in the early 70s, but really became mainstream in the mid to late 80s was arguably the most significant development in CASE.

Charis Lam 06:32

I think another development that you sort of alluded to is that better software, better computers, better instruments, increase the accuracy and efficiency. How have these improvements in analytical equipment impacted CASE?

Dimitris Argyropoulos 06:48

So if we look at the NMR instruments part of the question, we see that they became more powerful, and experiments that used to be very complex, or required a lot of time to set up, or could only be operated by experts, suddenly became available to the masses. Let’s say everybody could record these fancy 2D experiments. So it was not an issue anymore asking to record, for example, proton-carbon heteronuclear single quantum correlation experiments. On the part of computers, the evolution of computers and the availability of much more powerful computers helped in solving the problems in more reasonable times. In essence, what CASE does is take all the information in terms of how many atoms we have, of what type, and how they relate to each other, and try to build every possible combination with them and see if this combination agrees with the actual spectroscopic data in terms of chemical shifts. But this can be quite a daunting task, even if you have as few as 20 carbon atoms and another 20 hydrogens. So the improvement of computers helped a lot in this. So these tedious calculations were now performed very quickly, and the actual time required for the solution of a problem was quite manageable. So it was not days and weeks, it was maybe minutes and hours. And the same thing also applies for the actual processing of these advanced spectra. So the very first instruments that recorded two-dimensional experiments required approximately half an hour to one hour just to do a simple Fourier transformation of the simplest 2D experiment, COSY. Modern-day computers can process a COSY spectrum in less than a second; CASE benefited from all of this.
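
Editor's sketch: to give a feel for why "building every possible combination" is daunting even for 20 carbon atoms, the short Python example below computes a deliberately crude upper bound on the number of connectivity tables. The counting model is an assumption made for illustration only (every pair of heavy atoms is allowed, independently, a bond order of 0 to 3, with no valence or spectral constraints), and the function name is hypothetical. It is not how any CASE structure generator counts or builds structures; real generators prune this space heavily using valence rules and the correlations from the spectra.

```python
from math import comb


def naive_connectivity_upper_bound(n_heavy_atoms: int, max_bond_order: int = 3) -> int:
    """Crude upper bound on connectivity tables for n heavy atoms.

    Illustrative assumption: each unordered pair of atoms independently
    takes a bond order from 0 (no bond) to max_bond_order. Valence rules
    and spectral constraints are ignored on purpose.
    """
    pairs = comb(n_heavy_atoms, 2)  # number of unordered atom pairs
    return (max_bond_order + 1) ** pairs


if __name__ == "__main__":
    for n in (5, 10, 20):
        bound = naive_connectivity_upper_bound(n)
        # Report only the order of magnitude; the exact integer is huge.
        print(f"{n} heavy atoms: about 10^{len(str(bound)) - 1} unconstrained tables")
```

Almost all of these tables are chemically impossible, which is exactly why constraints from the molecular formula and from 2D NMR correlations matter so much in practice.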

Jesse Harris 08:54

CASE is often associated heavily with NMR. We are talking a lot about NMR here, but it’s also clear that there are other data types that are a part of it too; you were talking a little bit earlier about IR in the early years. How does the software make sense of these different types of data and sort of bring them together in making it? Is it mainly the NMR or are there other types of information that they’re bringing in these days?

Dimitris Argyropoulos 09:14

I would start by saying that the first experiment you need to record, if you want to solve an unknown structure using NMR, is a high-resolution mass spectrum. So this sounds a little bit maybe funny, but if you don’t have a high-resolution mass spectrum, you don’t have a molecular formula. And for all intents and purposes, you are lost in the woods. You need to have a molecular formula. And so there you have the first non-NMR technique that is of fundamental importance in CASE. Now other techniques, like infrared spectroscopy or UV-visible spectroscopy, can also help a lot in the sense that you may identify some characteristic groups that are there. Let’s take, for example, a structure where you know that you have carbon atoms and oxygen atoms, but for whatever reason, your NMR spectra are not that great. Maybe you don’t have enough compound in there, but you record an IR and a UV-visible spectrum and you see characteristic peaks of a carbonyl in there. So this immediately solves quite a significant problem because you associate the carbon with the oxygen through a double bond, and you eliminate the possibilities of this oxygen being something else, being maybe a hydroxyl, or an ether, or something. So the use of these other techniques helps in speeding up the problem solution by eliminating other possibilities. So yeah, there is quite a bit of use for the other techniques. Software has options to accommodate this.
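
Editor's sketch: the link between a high-resolution mass and a molecular formula can be illustrated with a small brute-force search in Python. This is a minimal sketch under stated assumptions, not the method of any particular software: it considers only C, H, N, and O, takes a neutral monoisotopic mass as input (no adduct or electron-mass correction), uses a 5 ppm tolerance chosen for illustration, and applies no chemical plausibility filters. The function names are hypothetical.

```python
# Monoisotopic masses (u) of the most abundant isotopes.
MASS_C = 12.0
MASS_H = 1.00782503
MASS_N = 14.0030740
MASS_O = 15.9949146


def formula_mass(c: int, h: int, n: int, o: int) -> float:
    """Monoisotopic mass of the neutral formula C_c H_h N_n O_o."""
    return c * MASS_C + h * MASS_H + n * MASS_N + o * MASS_O


def candidate_formulas(neutral_mass: float, ppm: float = 5.0):
    """Return (C, H, N, O) counts whose mass lies within ppm of neutral_mass.

    Assumption: CHNO only, neutral species, no valence or ring/double-bond
    checks, so some hits may not correspond to real molecules.
    """
    tolerance = neutral_mass * ppm * 1e-6
    hits = []
    for c in range(int(neutral_mass // MASS_C) + 1):
        left_c = neutral_mass - c * MASS_C
        for n in range(int(left_c // MASS_N) + 1):
            left_n = left_c - n * MASS_N
            for o in range(int(left_n // MASS_O) + 1):
                left_o = left_n - o * MASS_O
                # The remaining mass must be made up by hydrogens.
                h = round(left_o / MASS_H)
                if abs(formula_mass(c, h, n, o) - neutral_mass) <= tolerance:
                    hits.append((c, h, n, o))
    return hits


if __name__ == "__main__":
    # 194.0804 u is close to the monoisotopic mass of caffeine, C8H10N4O2.
    for c, h, n, o in candidate_formulas(194.0804):
        print(f"C{c}H{h}N{n}O{o}  {formula_mass(c, h, n, o):.4f}")
```

The tighter the mass accuracy, the shorter the list of candidate formulas, which is why the high-resolution spectrum is described above as the first experiment to record.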

Charis Lam 11:03

Right, so one of the software packages I know that is really accurate is Structure Elucidator, which we make here at ACD/Labs. Can you explain why Structure Elucidator is so powerful and what makes it special?

Dimitris Argyropoulos 11:14

Structure Elucidator has been in development at ACD/Labs for maybe 23 or more years by now, and Structure Elucidator is the only package out there that contains everything that you will need for structural elucidation. When you do computer assisted structure elucidation, the first thing you need is a molecular structure generator. So this is a program where you feed the information and start generating structures. This molecular structure generator needs to be, to begin with, fast, so it should generate structures, you know, very quickly. And also it should be exhaustive in the sense that it will be generating all possible structures really, and not, for whatever reason, omitting some of them.

The structure generator that we have in ACD/Structure Elucidator has been proven to be the fastest and the most accurate out there, and it is also the one that contains options that are very useful in real life. For example, options for generating structures of symmetric molecules; symmetric molecules can really cause a big problem because, you know, you expect to have, for example, I don’t know, 20 carbons, but you see just 10 signals, so symmetry can really throw a wrench in the works and destroy anything you’ll be trying to do. But the structure generator of Structure Elucidator has the means to identify such situations, and address them.

Now, a CASE system is not just the structure generator, and there are quite a few people out there who believe that once you have generated the structures you are done. This is not correct. Of course you need to generate the structures, but if your structure generator gives you, let’s say, 2000 structures, which is very common, have you solved your problem? No, you haven’t, because yes, before you probably had, I don’t know, a few million possibilities. Having 2000 is some progress, but you are nowhere close to the solution. So you need to somehow be able to rank these structures and find out which one is really the best one. And universally the method that is accepted for doing the ranking is to predict the chemical shifts for the atoms in the structures generated and compare them with the ones you have recorded experimentally.
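
Editor's sketch: the ranking idea described above, comparing predicted chemical shifts with experimental ones, can be shown in a few lines of Python. This is a simplified illustration, not the scoring used by Structure Elucidator or any other package. It assumes that each candidate already comes with a list of predicted 13C shifts, that predicted and experimental shifts can be paired simply by sorting both lists (real systems use atom-level assignments), and that mean absolute deviation is the score. The function names and all shift values in the usage example are invented for illustration.

```python
def mean_abs_deviation(predicted, experimental):
    """Mean absolute deviation (ppm) between two equally long shift lists.

    Assumption: shifts are paired by sorting both lists, a stand-in for
    the atom-by-atom assignment a real CASE system would use.
    """
    if len(predicted) != len(experimental):
        raise ValueError("shift lists must have the same length")
    pairs = zip(sorted(predicted), sorted(experimental))
    return sum(abs(p - e) for p, e in pairs) / len(experimental)


def rank_candidates(candidates, experimental):
    """Return (name, deviation) tuples, best-matching candidate first."""
    scored = [
        (name, mean_abs_deviation(shifts, experimental))
        for name, shifts in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Invented numbers for illustration; not output from any predictor.
    experimental = [170.0, 60.0, 20.0]
    candidates = {
        "candidate_A": [171.0, 61.0, 21.0],
        "candidate_B": [200.0, 60.0, 20.0],
    }
    for name, deviation in rank_candidates(candidates, experimental):
        print(f"{name}: mean deviation {deviation:.1f} ppm")
```

The quality of such a ranking depends entirely on the accuracy of the predicted shifts: a poor predictor can place the wrong structure at the top of the list.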

Now, the key word here is predict and you need to have accurate predictions. So if you don’t have accurate predictions, then you will get the wrong result. ACD/Labs offers our proton, carbon, and X NMR Predictors, and these are universally acknowledged as being the best ones. And in structure elucidation as well as in the other packages of NMR that we have, prediction has some fundamental importance and value. It allows us to take these structures that have been generated and rank them accordingly and rank them correctly. And so we get the result. So yeah, in total the ACD/Structure Elucidator is a package of programs that contains very powerful components for solving structure, so this is the main advantage that we have. So you will find out that there are other programs that, for example, will do structure generation, okay. But they are probably slower, and/or they may not be accompanied with an equally competent prediction program. That’s not the case with Structure Elucidator. We have the fastest engine coupled to the best predictors, and there are numerous articles in the literature that confirm this.

Jesse Harris 15:17

I imagine that going through 2000 structures and ranking them one by one by hand has got to be quite the undertaking, so I’m sure that there have been some grad students, back in the day, that did that sort of thing.

Dimitris Argyropoulos 15:29

Absolutely.

Jesse Harris 15:31

Yes, okay. So to wrap things up then, thank you so much for all of this information, but I wanted to ask you if there’s anything else about CASE that you think people should know about as a background in general information about the technique?

Dimitris Argyropoulos 15:45

Uh, no, I would just say that CASE is by now an established technique. It still has, I believe, quite a bit of future, so there are still several things that could be done to improve and enhance CASE. And what I would like to say is that people should not be afraid of CASE. We talked with several people and quite a few times I sense that some people feel a little bit insecure using such tools because they believe that, hey, you know, if a computer software can do these things, then I will not be needed. So they avoid using the software in order to maintain their job security. Yeah, on the other hand, though, you see quite a few publications every year being retracted, revised, or whatever, because the structure that was published there turned out to be not the correct one. It has been proved that if people had used CASE software to confirm what they did, they would have avoided this. And of course, having to revise a structure is quite costly, not only in terms of time and money spent for the new publication, but in terms of your reputation also. So having used a CASE program wouldn’t have harmed your job security. Instead, you would have had some additional reassurance that what you did is correct. So CASE, as well as the rest of the computer software for processing spectra, is not there to replace experts and chemists. It is there to help them. And that’s how people should look at these tools, and not be afraid to use them.

Charis Lam 17:21

All right. That’s some great advice. I think that it’s a tool to help you, not something to replace you. Thank you so much for joining us today, Dimitris.

Dimitris Argyropoulos 17:30

Thank you. Nice talking to you both.

Jesse Harris 17:32

Thank you very much.

Charis Lam 17:34

Dimitris gave us a great introduction to CASE. We’ve learned how it might be useful and how the software has evolved alongside analytical instruments.

Jesse Harris 17:43

What are the precise uses for CASE?

Charis Lam 17:46

For that, let’s turn to our next guest, Professor Jean-Marc Nuzillard, who will tell us more about where CASE is useful.

Here with us today, we have Dr. Jean-Marc Nuzillard, who is a director of the National Center for Scientific Research at the University of Reims. He has done a lot of research in the areas of applied nuclear magnetic resonance and structural elucidation of natural substances. So we’re really pleased to have him.

Jesse Harris 18:13

Hello. Great to have you.

Jean-Marc Nuzillard 18:14

Yeah, thank you.

Charis Lam 18:15

Hello. It’s great to have you. Well, let’s start with our first icebreaker question. What’s your favorite chemical?

Jean-Marc Nuzillard 18:22

Well, I gave lectures on NMR spectra interpretation at the time I taught analytical chemistry to master’s students. And one of my favorite examples was sucrose. Sucrose has a sweet taste, it gives a lot of energy, and it makes nice NMR spectra. So, yes, sucrose.

Jesse Harris 18:41

That’s a good example. Now, with that, though, let’s transition into our conversation about NMR and CASE. Now computer assisted structure elucidation is a pretty broad term, if you think about it. Everything nowadays seems to be computer assisted in one way or another. So how would you define that term and how does it apply to the work that you do?

Jean-Marc Nuzillard 19:02

Yes, sure. In the field of analytical chemistry, it’s nowadays very difficult to find a single operation that would not be computer assisted. The process that leads from the recorded raw data to a spectrum that can be exploited for the structure determination is fully under computer control. Nobody would attempt to calculate the Fourier transform of a signal with simply paper and pencil today.

More seriously, in the first level of CASE, one can see that structural elucidation is computer aided when one attempts to know whether a set of freshly recorded spectra matches with a compound that has already been identified, at any time in the past, in any place on Earth. In this case, computers are essential for searching in databases.

Another level of CASE is de novo structural elucidation, and this kind of CASE is the true subject of our talk today, I think. CASE is generally understood as a process by which a structure is proposed by computer software, based on the interpretation of the available spectroscopic data; initially separated atoms are bound together to build the molecule that satisfies the constraints imposed by the spectra. CASE is then employed to obtain the structure of compounds that are presently unknown, either because they are really unknown, or because it was not possible to prove that they were known. The definition of CASE is also somewhat related to the nature of the analytical techniques at work: NMR, mass spectrometry, and X-ray crystallography are certainly the most powerful techniques to obtain the structure of unknown molecules. I will not discuss X-rays because it’s not a spectroscopy technique and because the requirement for crystalline samples limits its everyday use. A quick internet search about de novo CASE shows that NMR and mass spectrometry are the relevant techniques, but they have different application domains.

So if you agree, this discussion will focus on NMR-based CASE, simply because this is a topic about which I feel the most comfortable.

Charis Lam 21:21

Yes, definitely. That’s what we’re interested in as well. And one of the questions we had is how does CASE use differ between different areas of chemistry? So, for example, organic synthesis versus looking at natural products.

Jean-Marc Nuzillard 21:35

A synthetic organic chemist lets known compounds react together to produce new ones. The molecular structure of reactants is known, and the end products are most often the expected ones. It may happen that the products are not the expected ones, and that unexpected by-products are formed. The structures of the product and by-products have to be determined before these end products may be put to react further in a multistep organic synthesis. However, chemical reactions rarely scramble all the atoms of the starting materials; the spectroscopic signature of the initial compounds is still visible in the end product. This makes the structure determination not so difficult, and the use of CASE would be only very rarely relevant.

On the other hand, natural products chemists often have structure elucidation problems to solve, for which they have no idea at all of the structure of the compounds they have isolated. This makes the game much more complicated than the one of synthetic compounds. The use of CASE software is worth being considered to resolve the indeterminacies that are present in NMR data without being influenced by preconceived ideas about the result. The structural elucidation of natural compounds is really a favorite playground of CASE software.

Jesse Harris 23:02

Is de novo CASE used by organic chemists as much as you would expect?

Jean-Marc Nuzillard 23:04

CASE is not used by chemists as much as I would have expected, and as I already said, especially by natural product chemists. There is no single reason for this; maybe there are good reasons and not so good reasons. What I can speak about is what I have experienced from university labs, which may be different from what happens in industry.

I would like to start with a preliminary comment about the publication of new chemical structures in scientific journals. Chemists at universities have great freedom in the ways they determine the structure of molecules. The quality requirements are those that allow one to defend a PhD thesis and to publish an article. The academic journals often want the authors to report NMR spectra in supplementary data files, but practically there are no strict requirements on the quality and usability of these documents. The reporting of the way structures are deduced from spectra is generally a nice piece of poetry. It has nothing to do with the real way in which the structures were deduced. All of this to state that there is no necessity to prove, in the most rigorous possible way, the structure of newly isolated compounds, and therefore no necessity to benefit from the confidence CASE software can bring to structure elucidation. The feeling of a lack of necessity is, in my opinion, the first hurdle for CASE. The second hurdle is habit, and, as you know, habit is second nature. This means that including CASE in university chemistry courses is certainly a way to accelerate the use of CASE software. I did it for many years and I’m aware of a few places where colleagues do the same, but maybe it’s not enough.

My former boss used to say that a good software is a software I know how to use. This looks like common sense, but illustrates well the power of our habits and that the learning curve is always too steep. I would say that the mission of CASE software designers would be to make learning as easy as possible, even though a lot (in this direction) has already been achieved.

As a matter of fact, the companies that produce CASE software have tried hard to integrate seamlessly the spectral processing tasks and the input of relevant spectral data in the structure generation process. This step is really a difficult one, and poor 1D or 2D spectral peak picking may lead to a loss of time, something perceived as unbearable.

Other causes may limit the use of CASE software, including some latent fear that the machines will replace humans in the structure elucidation task, which is considered as a highly intellectual one by its practitioners. This problem is as old as the introduction of steam engines in industry.

Charis Lam 26:16

Definitely. So you talked a little bit about publication and teaching, can you expand a bit on those uses of CASE outside traditional academic and industrial research?

Jean-Marc Nuzillard 26:26

Yes, as I’ve already mentioned briefly, CASE may be involved in analytical chemistry training and is already included in the university curriculum at some places. I also mentioned the relationship between scientific publications and CASE software. In my experience of article review work, the text of articles, including supplementary data files, generally does not provide enough material for reviewers to really validate the structures that are proposed by the authors.

NMR data in articles consist most of the time of lists of 1D proton and 1D carbon-13 NMR chemical shifts, and of proton-proton coupling constant values. The storage of raw NMR data and of the corresponding spectra would at least allow reviewers to verify published NMR data. Less often, the description of 2D NMR is reported in journal articles, but the quality of 2D spectra drawings in the articles is generally not sufficient for a reviewer to validate the content of NMR data tables. Finally, assuming that the data in these tables of 2D NMR correlation data are correctly reported, which is not guaranteed, the availability of CASE software files may offer a reviewer the possibility to validate the proposed structure and understand the reasons for which possible alternative structures must be discarded. I insist on this necessity to publish reliable data because published data constitute the basis for building spectral and structural databases. Everyone will understand the importance of storing data of the best possible quality for the future use of compound identification through database queries.

These considerations are fully in phase with the movement towards FAIR data and open science.

Jesse Harris 28:28

Yeah, I think that you’re also saying it makes a lot of sense in terms of higher quality data, which is so important and has been a theme of the conversations that we’ve been having this season. But I wanted to ask you, what are the challenges that you think CASE still needs to overcome?

Jean-Marc Nuzillard 28:43

The biggest challenge to overcome is one of acceptability. The technical questions come only in second rank for me. I do not want to say that CASE software packages are all technically perfect, but they are in my opinion already powerful enough to solve a wide range of real-life problems. Acceptability is a matter of education and information. Journal articles, demos at scientific meetings, seminars for academic and industrial researchers, videos and talks like this one contribute to make CASE better known to those for whom this can bring benefits.

Technically, almost all possible ways of extending the scope of CASE systems have been already explored. This includes the incorporation of data that leads to the determination of three-dimensional structures. The initial purpose of early CASE software was to propose 2D structures only, and that was not that bad. Obviously, this is insufficient for organic chemists because the configurations of chirality elements are of the highest importance to explain the physical, chemical, and biological properties of organic molecules. Ab initio theoretical chemistry calculations and the choice between alternative stereoisomers provide some more confidence in structure elucidation results. This approach has already been proposed and the always increasing speed of calculation, especially through cloud computing, might offer a practical way of solving extremely complex problems once CASE and ab initio calculations are integrated together. I’m almost sure that considering 1D and 2D NMR spectra, or even higher dimensionality spectra, as images, and treating them as input to deep learning based software, will lead to interesting results.

This is today purely speculative, but such approaches to problem solving could bring interesting perspectives, at least for particular categories of problems. I am thinking here of the quick identification of known compounds, in the same way that the software on your smartphone can identify the species of a bird just by a few seconds of recording of its song.

Charis Lam 31:09

That would definitely be very interesting. So thank you for all the perspectives and advice you shared today. Is there any last advice you’d like to leave our audience with about how to use CASE?

Jean-Marc Nuzillard 31:22

Yes. My advice is ‘be curious, stay curious’. If you think that CASE can be your friend, but you do not know how, contact a software provider and ask for a demo. I’m pretty sure that no one will refuse, and that they will promote the idea that introducing CASE in your structure elucidation protocol does not constitute a threat, but is a way to improve the quality and trustability of your structural proposals, something that will benefit everyone. To sum up briefly, CASE is your friend.

Charis Lam 31:59

That’s great. And as a CASE software provider, we definitely agree that demos are great; asking more questions is always great. Thank you so much, Dr. Nuzillard.

Jean-Marc Nuzillard 32:09

You’re welcome.

Jesse Harris 32:10

Thank you so much for coming out. Professor Nuzillard gave us great examples of how CASE might be used for different situations, including some alternative scenarios in academia and teaching.

Charis Lam 32:24

Yes, as computing power and algorithms become more powerful and more widespread, the number of scenarios where they can be used increases.

Jesse Harris 32:32

That’s certainly been a theme of this season. In the next episode we’ll discuss how that works in a different area of analytical chemistry: chromatography.

Charis Lam 32:43

Remember to subscribe so you’ll get notified when it goes live.

Jesse Harris 32:45

This is The Analytical Wavelength. Until next time.

Charis Lam 32:50

The Analytical Wavelength is brought to you by ACD/Labs. We create software to help scientists make the most of their analytical data by predicting molecular properties, and by organizing and analyzing their experimental results. To learn more, please visit us at www.acdlabs.com.

Season 2, Episode 3