We aim to demonstrate in experiments that our cost-sensitive PEGASOS SVM
achieves good performance on imbalanced data sets with majority-to-minority
ratios ranging from 8.6:1 to 130:1, and to ascertain whether the inclusion of
an intercept (bias), regularization, and parameter choices affects performance
on our selection of datasets. Although many resort to SMOTE methods, we aim
for a less computationally intensive approach. We evaluate performance by
examining learning curves, which diagnose whether we overfit or underfit, or
whether the training/test data are over- or under-representative. We also
examine validation curves, which plot training and test error against the
hyperparameters. We benchmark our cost-sensitive PEGASOS SVM against Ding's
LINEAR SVM DECIDL method, which obtained an ROC-AUC of 0.5 on one dataset.
Our work extends Ding's by incorporating kernels into the SVM. We use Python
rather than MATLAB, since Python's dictionaries can store mixed data types
during multi-parameter cross-validation.
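As a concrete illustration, the following is a minimal Python sketch of one way
to implement a cost-sensitive PEGASOS update; the per-class weights cost_pos
and cost_neg, the 1/(lambda*t) step size, and the appended constant feature
used for the intercept are illustrative assumptions rather than our exact
formulation.

    import numpy as np

    def cost_sensitive_pegasos(X, y, lam=1e-4, n_iters=100000,
                               cost_pos=10.0, cost_neg=1.0,
                               fit_intercept=True, seed=0):
        # X: (n_samples, n_features) array; y: labels in {-1, +1}
        rng = np.random.default_rng(seed)
        n, _ = X.shape
        if fit_intercept:
            X = np.hstack([X, np.ones((n, 1))])   # constant feature acts as the bias
        w = np.zeros(X.shape[1])
        for t in range(1, n_iters + 1):
            i = rng.integers(n)                   # draw one training example at random
            eta = 1.0 / (lam * t)                 # PEGASOS step size
            cost = cost_pos if y[i] == 1 else cost_neg
            margin = y[i] * (X[i] @ w)
            w *= (1.0 - eta * lam)                # shrink weights (L2 regularizer)
            if margin < 1.0:                      # hinge loss active for this example
                w += eta * cost * y[i] * X[i]     # class-weighted sub-gradient step
        return w

Predictions are taken as sign(x.w) on the augmented features; raising cost_pos
relative to cost_neg penalizes errors on the minority (positive) class more
heavily.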
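As a hedged sketch of the curve diagnostics, scikit-learn's learning_curve and
validation_curve can produce the needed train/test scores; here LinearSVC with
class_weight="balanced" and a synthetic imbalanced dataset stand in for our
PEGASOS solver and the real datasets.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import learning_curve, validation_curve

    # synthetic imbalanced data (roughly 19:1) standing in for the real datasets
    X, y = make_classification(n_samples=2000, n_features=20,
                               weights=[0.95, 0.05], random_state=0)
    est = LinearSVC(class_weight="balanced")   # stand-in for the cost-sensitive SVM

    # learning curve: ROC-AUC versus the amount of training data used
    sizes, lc_train, lc_test = learning_curve(
        est, X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5, scoring="roc_auc")

    # validation curve: ROC-AUC versus one hyperparameter (here C, i.e. 1/lambda)
    vc_train, vc_test = validation_curve(
        est, X, y, param_name="C", param_range=np.logspace(-3, 2, 6),
        cv=5, scoring="roc_auc")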
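One small example of the dictionary point: a single dict can hold a mixed-type
hyperparameter grid (floats, strings, booleans), and a second dict, keyed by
the parameter tuple, can collect per-configuration scores; cross_val_auc below
is a hypothetical placeholder for the actual cross-validation loop.

    from itertools import product

    def cross_val_auc(lam, kernel, intercept):
        # hypothetical placeholder: a real version would run k-fold CV with the
        # cost-sensitive PEGASOS SVM and return the mean ROC-AUC
        return 0.0

    param_grid = {                          # mixed value types in one structure
        "lam":       [1e-4, 1e-3, 1e-2],    # regularization strength (float)
        "kernel":    ["linear", "rbf"],     # kernel choice (string)
        "intercept": [True, False],         # include a bias term (bool)
    }

    cv_results = {params: cross_val_auc(*params)
                  for params in product(*param_grid.values())}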