This study will examine the application of several classification methods to machine learning models by taking into account the case of imbalanced data. The research was conducted on a case study of classification modeling for working status in Banten Province in 2020. The data used comes from the National Labor Force Survey, Statistics Indonesia. The machine learning methods used are Classification and Regression Tree (CART), Naïve Bayes, Random Forest, Rotation Forest, Support Vector Machine (SVM), Neural Network Analysis, One Rule (OneR), and Boosting. Classification modeling using resample techniques in cases of imbalanced data and large data sets is proven to improve classification accuracy, especially for minority classes, which can be seen from the sensitivity and specificity values that are more balanced than the original data (without treatment). Furthermore, the eight classification models tested shows that the Boost model provides the best performance based on the highest sensitivity, specificity, G-mean, and kappa coefficient values. The most important/most influential variables in the classification of working status are marital status, education, and age.
Real Time Impact Factor:
Pending
Author Name: Pardomuan Robinson Sihombing
URL: View PDF
Keywords: Machine learning, Predictive, Resample, Sensitivity, Specificity
ISSN: 2621-8070
EISSN: 2686-3219
EOI/DOI: 10.31943/teknokom.v4i2.64
Add Citation
Views: 1