Publications & Presentations  2006 


 

 

September 10, 2006, SMASH 2006, Burlington, VT, USA

The Effect of Structure Description Schemes on Chemical Shift Prediction by Incremental and Neural Network Approaches

Yegor D. Smurnyy, Kirill A. Blinov, Brent A. Lefebvre, Antony J. Williams

Abstract

Typically, a chemical shift prediction algorithm has two major components: i) rules to encode a chemical structure into a set of numbers ("structural code") and ii) a routine to calculate a chemical shift value from that numerical input. In the current study we compare multiple algorithms, with special emphasis on how the structure encoding routine affects the overall 13C chemical shift prediction accuracy.

Two primary methods were examined in this work: a neural network approach and an incremental scheme (a rules-based approach). The former uses a network of artificial neurons, each of which takes input signals (either from outside or from peer neurons) and, after a non-linear transformation, produces an output. In this study we employ a multilayer network in which the neurons of the i-th layer receive all of the (i-1)-th layer outputs as inputs. The weights of the net are adjusted by a backpropagation algorithm. In the incremental scheme, the chemical shift is modeled as varying linearly with the number of characteristic moieties present in a molecule. Coefficients for this method are calculated by a partial least squares regression routine.
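The two predictors described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the tanh transfer function, the linear output layer, and the function names are hypothetical, and ordinary least squares (NumPy's `lstsq`) stands in for the partial least squares routine named in the abstract.

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass of a fully connected multilayer network.

    Every neuron of a layer receives all outputs of the previous layer.
    Hidden layers apply tanh (an assumed transfer function).
    """
    activation = x
    for W, b in zip(weights[:-1], biases[:-1]):
        activation = np.tanh(activation @ W + b)
    # Linear output layer (an assumption) yields the predicted shift.
    return activation @ weights[-1] + biases[-1]

def fit_increments(counts, shifts):
    """Fit shift = base + sum_k(increment_k * count_k).

    counts: (n_samples, n_moieties) array of moiety counts.
    Uses ordinary least squares as a simple stand-in for PLS.
    """
    design = np.hstack([np.ones((counts.shape[0], 1)), counts])
    coeffs, *_ = np.linalg.lstsq(design, shifts, rcond=None)
    return coeffs  # coeffs[0] is the base value, coeffs[1:] the increments

def predict_increments(coeffs, counts):
    """Apply the linear incremental model to new moiety counts."""
    return coeffs[0] + counts @ coeffs[1:]
```

In both cases the input is the structural code; only the mapping from code to shift differs (non-linear for the network, linear for the incremental scheme).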

For both of these methods, the typical experimental workflow was as follows. The whole database (more than 2 million chemical shifts) is split into smaller parts according to the central atom type (in this work, we found 6 atom types to be ideal). About 5-7% of the shifts are placed in a "test set" and are not used for training; these data serve to evaluate the overall performance of the algorithm after it has been trained. In the next step, a neural network is trained or the incremental scheme coefficients are calculated by regression. Finally, the performance is evaluated on the test set.
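The split-and-evaluate workflow above might look like the following sketch. The record layout `(atom_type, structural_code, shift)`, the 6% default, and the function names are assumptions made for illustration; the abstract only states that 5-7% of shifts are held out per central atom type. The error measure is assumed here to be the mean absolute error.

```python
import random
from collections import defaultdict

def split_by_atom_type(records, test_fraction=0.06, seed=0):
    """Group records by central atom type and hold out a test set per group.

    records: iterable of (atom_type, structural_code, shift) tuples
             (a hypothetical layout).
    Returns {atom_type: (train_records, test_records)}.
    """
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for record in records:
        by_type[record[0]].append(record)

    splits = {}
    for atom_type, group in by_type.items():
        shuffled = group[:]
        rng.shuffle(shuffled)
        n_test = max(1, round(len(shuffled) * test_fraction))
        # Held-out records are never seen during training.
        splits[atom_type] = (shuffled[n_test:], shuffled[:n_test])
    return splits

def mean_absolute_error(predicted, observed):
    """Average of |predicted - observed|, in the same units as the shifts."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)
```

A separate model would then be trained on each group's training records and scored with `mean_absolute_error` on that group's test records.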

This work focused on optimizing several aspects of the overall routine:

  • Number of central atom types. Separate neural nets or coefficient sets can be used to calculate the chemical shifts of chemically different atoms. In this work, 6 types were found to be ideal. Details will be shown.
  • The structural code. Several approaches to this problem have been suggested, encoding either individual atoms or small fragments of a molecule (2-3 atoms). Details of this investigation will also be presented.
  • Characteristics of the neural net/regression scheme. Unlike many authors, we found these to have very little effect on prediction accuracy, which is the most important result of our work. We tested several types of neural networks (differing in transfer function, training algorithm, etc.) and found the size of the net to be the only important factor. Typically, 100-300 hidden neurons offer a good compromise between precision and speed of computation.
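To make the idea of a structural code concrete, here is a minimal sketch of one possible atom-based encoding: counting the atoms of each element in successive bond spheres around the central atom. The abstract does not specify the actual encoding, so the element vocabulary, the sphere depth, and the function name are all hypothetical.

```python
from collections import deque

ELEMENTS = ("C", "H", "N", "O")  # hypothetical element vocabulary

def encode_environment(elements, bonds, center, max_sphere=3):
    """Count atoms of each element in each bond sphere around `center`.

    elements: list of element symbols, indexed by atom number
    bonds:    list of (i, j) atom index pairs
    Returns a flat list of len(ELEMENTS) * max_sphere counts,
    ordered sphere by sphere.
    """
    neighbors = {i: [] for i in range(len(elements))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Breadth-first search gives each atom's bond distance from the center.
    distance = {center: 0}
    queue = deque([center])
    while queue:
        atom = queue.popleft()
        if distance[atom] == max_sphere:
            continue  # do not expand beyond the outermost sphere
        for nxt in neighbors[atom]:
            if nxt not in distance:
                distance[nxt] = distance[atom] + 1
                queue.append(nxt)

    code = [0] * (len(ELEMENTS) * max_sphere)
    for atom, d in distance.items():
        if d == 0 or elements[atom] not in ELEMENTS:
            continue  # skip the central atom and unknown elements
        code[(d - 1) * len(ELEMENTS) + ELEMENTS.index(elements[atom])] += 1
    return code
```

The resulting fixed-length vector is the kind of numerical input that either a neural net or a regression scheme can consume; a fragment-based code would count 2-3 atom substructures instead of single atoms.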

In recent years, a number of chemical shift prediction approaches have been developed, with neural nets being particularly popular. Most of these approaches focus on sophisticated network architectures or advanced description schemes (for example, 3D conformation). In the study of neural nets and incremental schemes presented here, using the largest high-quality 13C chemical shift database available, we demonstrate that the network or regression routine is not the key to chemical shift prediction quality. Rather, a reliable method of converting a structure to a numerical representation leads to a good prediction even with a simple neural net or regression scheme.

As a result of this work, we find that a mean error of less than 2 ppm can be obtained with our approach. This compares well with database-based (HOSE code) methods and is better than most previously reported results for neural net approaches.


Download the poster in Adobe Acrobat format (223 Kb PDF file).


Relevant Products: ACD/CNMR, ACD/CNMR DB


This page was last updated 28 September 2006
 

 

 