Over the past few decades, protein interaction data have become important in many prediction and data mining applications, including cancer prediction and the diagnosis of other diseases. The class imbalance problem in protein interaction data can be addressed at both the data level and the algorithmic level. This paper surveys and evaluates methods applicable at the data level as well as ensemble methods at the algorithmic level. At the data level, cluster-based undersampling, oversampling, and other data-based methods were evaluated. At the algorithmic level, ensemble classifiers were evaluated. Unstable base classifiers such as SVM and ANN can be used within ensemble methods such as Bagging, AdaBoost, DECORATE, and ensemble non-negative matrix factorization. For high-dimensional data, Random Forest can improve ensemble classification on imbalanced data relative to both Bagging and AdaBoost.
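As a rough illustration of what rebalancing at the data level means, the sketch below equalizes class counts by plain random undersampling and random oversampling. It is not the cluster-based procedure evaluated in the paper. The function names `random_undersample` and `random_oversample`, the `seed` parameter, and the toy data are assumptions made for this example only.

```python
import random
from collections import Counter


def _group_by_class(samples, labels):
    """Collect samples into a dict keyed by class label."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return by_class


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples until every class has as many as the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = min(len(xs) for xs in by_class.values())
    pairs = []
    for y, xs in by_class.items():
        # Sample without replacement so no sample is duplicated.
        pairs.extend((x, y) for x in rng.sample(xs, target))
    rng.shuffle(pairs)
    return [x for x, _ in pairs], [y for _, y in pairs]


def random_oversample(samples, labels, seed=0):
    """Randomly duplicate samples until every class has as many as the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = max(len(xs) for xs in by_class.values())
    pairs = []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        pairs.extend((x, y) for x in xs + extra)
    rng.shuffle(pairs)
    return [x for x, _ in pairs], [y for _, y in pairs]


if __name__ == "__main__":
    # Hypothetical toy data: 90 non-interacting pairs (0), 10 interacting pairs (1).
    X = list(range(100))
    y = [0] * 90 + [1] * 10
    print(Counter(random_undersample(X, y)[1]))
    print(Counter(random_oversample(X, y)[1]))
```

Undersampling discards majority-class information, while oversampling repeats minority samples and can encourage overfitting; this trade-off is one reason data-level methods are often paired with ensemble classifiers.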
Real Time Impact Factor: 1.66667
Author Name: Seena Mary Augusty, Sminu Izudheen
Keywords: Bagging; Adaboost; Decorate; Oversampling; Under sampling
ISSN: 2326-5825
EISSN: 2326-5833