Before the logPS and logBB predictive models were developed, part of each data set was reserved for validation and not used in modeling (the internal validation set). In addition, two independent data sets representing a different type of experimental data (directly measured fu, brain values) were extracted from publications [1, 2] and used as external validation sets to confirm the intrinsic accuracy of the resulting logBB model.
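As a minimal sketch of how part of a data set can be reserved before modeling: the source does not state how the internal validation sets were selected, so the random split, the `validation_fraction` value, the seed, and the `holdout_split` helper below are all illustrative assumptions, not the procedure actually used.

```python
import random


def holdout_split(records, validation_fraction=0.2, seed=0):
    """Randomly reserve a fraction of records for validation.

    Hypothetical helper: the fraction and the random selection are
    assumptions made for illustration only.
    """
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(records)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, round(len(shuffled) * validation_fraction))
    # The first n_validation records are set aside and never used for fitting.
    return shuffled[n_validation:], shuffled[:n_validation]


# Made-up compound identifiers, not taken from the actual data sets.
compounds = [f"compound_{i}" for i in range(10)]
train, validation = holdout_split(compounds, validation_fraction=0.2, seed=42)
```

The point of the split is that the validation records stay untouched until the model is final, so the statistics computed on them estimate performance on unseen compounds.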
Performance of the models on various validation sets is presented in the table below:

| Data Set | N | R² | RMSE |
|---|---|---|---|
| logPS internal validation set | 53 | 0.82 | 0.49 |
| log fu, brain internal validation set | 137 | 0.74 | 0.41 |
| log fu, brain external validation set (Kalvass et al., 2007) | 31 | 0.73 | 0.42 |
| log fu, brain external validation set (Summerfield et al., 2008) | 20 | 0.70 | 0.36 |

The statistical parameters obtained for the internal and external validation sets demonstrate good predictive power of the models: in both cases the RMSE (0.36 to 0.49 across the sets) is close to the error of experimental determination.
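For readers who want to reproduce this kind of statistic on their own validation data, the sketch below computes RMSE and R² from paired observed and predicted values. It assumes R² is the coefficient of determination (1 − SS_res/SS_tot); the source does not say whether that definition or the squared correlation coefficient was used. The function names and sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Made-up log-scale values for illustration; not data from the validation sets.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(f"RMSE = {rmse(observed, predicted):.2f}")
print(f"R2   = {r_squared(observed, predicted):.2f}")
```

Because RMSE is expressed in the same units as the modeled quantity, it can be compared directly with the reported experimental error, which is the comparison made above.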
The proposed classification scheme was validated using experimentally assigned CNS access categories for 1696 compounds [3]. After amino acids and compounds affected by P-gp efflux were removed, 94% of the remaining 1581 molecules were correctly classified by our model, as shown in the figure below:
