18–21 May 2026
Europe/Warsaw timezone

Comparative Analysis of Classification Models for Pharmaceutical Permeability Prediction

20 May 2026, 11:57
18m
Room 1 B

oral presentation YSS1 (ROeS)

Speaker

Jana Habus-Korbar (Student at University of Zagreb, Faculty of Science, Department of Mathematics)

Description

This study analyzes the PERMY data set from Pharmaceutical Statistics Using SAS: A Practical Guide. The data describe permeability, the ability of a molecule to cross a cell membrane. Biological membranes are complex layers of molecules and proteins, so a substance needs a particular structure to pass through the target membrane. Drugs that fail to demonstrate sufficient permeability should be excluded from further testing, which is why permeability matters in the early stages of drug development.
The aim of this study is to compare several methods for binary classification using 71 molecular properties whose meanings are not explicitly known. Because of possible collinearity and near-singular data matrices, the models were complemented with multicollinearity analysis and with principal component analysis for dimension reduction. The following methods were compared: logistic regression (with stepwise, decision-tree-based and cluster-based variable selection), decision trees, neural networks, random forests, gradient-boosted trees and bagged trees. The data were split into training and validation subsets. Models were ranked by average squared error, while the confusion matrix and misclassification rate were used to assess the classification accuracy of each algorithm; all of these statistics were compared on the validation data. In addition, the Gini index and the area under the ROC curve were evaluated to assess the fit and complexity of the candidate models.
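As a rough illustration of the validation statistics named above, the following Python sketch computes them for a binary classifier. It is not the authors' code: the function names, the 0.5 cutoff and the six toy observations are assumptions made for the example, and the Gini index is assumed to be the ROC-based coefficient 2·AUC − 1 rather than the impurity measure used inside decision trees.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp


def misclassification_rate(y_true, y_pred):
    """Share of observations assigned to the wrong class."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    return (fp + fn) / (tn + fp + fn + tp)


def average_squared_error(y_true, p_hat):
    """Mean squared difference between the 0/1 label and the predicted probability."""
    return sum((t - p) ** 2 for t, p in zip(y_true, p_hat)) / len(y_true)


def auc(y_true, p_hat):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [p for t, p in zip(y_true, p_hat) if t == 1]
    neg = [p for t, p in zip(y_true, p_hat) if t == 0]
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


def gini(y_true, p_hat):
    """ROC-based Gini coefficient, assumed here to be 2 * AUC - 1."""
    return 2 * auc(y_true, p_hat) - 1


# Toy validation data (made up for illustration, not taken from PERMY).
y_valid = [1, 0, 1, 1, 0, 0]
p_valid = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in p_valid]  # hypothetical 0.5 cutoff

print("confusion (tn, fp, fn, tp):", confusion_matrix(y_valid, y_pred))
print("misclassification rate:", misclassification_rate(y_valid, y_pred))
print("average squared error:", average_squared_error(y_valid, p_valid))
print("AUC:", auc(y_valid, p_valid), "Gini:", gini(y_valid, p_valid))
```

Average squared error and AUC use the predicted probabilities directly, whereas the confusion matrix and misclassification rate depend on the chosen cutoff, so the two groups of statistics can rank models differently.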
Special attention was given to the interpretability of black-box algorithms, which are of particular interest because of their robustness to near-singular data matrices. These models were further explained using surrogate decision trees, which provide insight into variable importance and internal model structure.
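The surrogate-tree idea can be sketched in Python with scikit-learn as follows. This is only an illustration under stated assumptions: the data are synthetic (generated with `make_classification`, not the PERMY data), a random forest stands in for the black-box model, and the tree depth of 3 is an arbitrary choice. The key step is that the surrogate tree is trained on the black box's predictions rather than on the true labels.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data (assumption: 10 features instead of the 71 in the study).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=4, random_state=0
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Black-box model fitted to the true labels.
black_box = RandomForestClassifier(n_estimators=100, random_state=0)
black_box.fit(X_train, y_train)

# Surrogate: a shallow tree fitted to the black box's predicted classes,
# so it approximates the model rather than the data.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, black_box.predict(X_train))

# Fidelity: how often the surrogate agrees with the black box on held-out data.
fidelity = (surrogate.predict(X_valid) == black_box.predict(X_valid)).mean()
importances = surrogate.feature_importances_

print(f"fidelity on validation data: {fidelity:.3f}")
print(export_text(surrogate))  # readable split rules of the surrogate tree
```

A surrogate is only as informative as its fidelity: if the shallow tree agrees poorly with the black box on validation data, its splits and variable importances say little about the model being explained.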

Author

Jana Habus-Korbar (Student at University of Zagreb, Faculty of Science, Department of Mathematics)

Co-author

Borna Aleksić (Student at University of Zagreb, Faculty of Science, Department of Mathematics)
