HOW MACHINE LEARNING METHOD PERFORMANCE FOR IMBALANCED DATA Case Study: Classification of Working Status of Banten Province

This study examines the application of several machine learning classification methods to imbalanced data. The case study is classification modeling of working status in Banten Province in 2020, using data from the National Labor Force Survey, Statistics Indonesia. The machine learning methods used are Classification and Regression Tree (CART), Naïve Bayes, Random Forest, Rotation Forest, Support Vector Machine (SVM), Neural Network Analysis, One Rule (OneR), and Boosting. For imbalanced and large data sets, classification modeling with resampling techniques is shown to improve classification accuracy, especially for the minority class, as seen from sensitivity and specificity values that are more balanced than those obtained on the original data (without treatment). Furthermore, of the eight classification models tested, the Boosting model provides the best performance, with the highest sensitivity, specificity, G-mean, and kappa coefficient values. The most influential variables in the classification of working status are marital status, education, and age.
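The abstract compares models on sensitivity, specificity, G-mean, and the kappa coefficient rather than on accuracy alone. The following is a minimal sketch of how these four measures can be computed from a binary confusion matrix using their standard definitions. It assumes the minority class is treated as the positive class; the function name `binary_metrics` and the counts are hypothetical illustrations, not figures or code from the study.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, G-mean and Cohen's kappa from 2x2 counts.

    tp/fn: positive (assumed minority) cases predicted correctly/incorrectly.
    tn/fp: negative (assumed majority) cases predicted correctly/incorrectly.
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    g_mean = math.sqrt(sensitivity * specificity)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (p_observed - p_expected) / (1 - p_expected)

    return {
        "accuracy": p_observed,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
        "kappa": kappa,
    }


# Hypothetical counts: 100 minority cases and 900 majority cases.
# A model that always predicts the majority class.
majority_only = binary_metrics(tp=0, fn=100, fp=0, tn=900)

# A model that recovers most minority cases at the cost of some false alarms.
balanced = binary_metrics(tp=80, fn=20, fp=30, tn=870)

for name, result in [("majority_only", majority_only), ("balanced", balanced)]:
    print(name, {k: round(v, 3) for k, v in result.items()})
```

With these hypothetical counts, the majority-only model reaches 90% accuracy while its sensitivity, G-mean, and kappa are all zero, which is why balanced sensitivity and specificity are a more informative target on imbalanced data.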



Keywords: Machine learning, Predictive, Resample, Sensitivity, Specificity

ISSN: 2621-8070

EISSN: 2686-3219


EOI/DOI: 10.31943/teknokom.v4i2.64

