Ordinal Logistic Regression Modelling with Backward Elimination and K-Nearest Neighbour Imputation for the Human Development Index in Indonesia in 2021
Abstract
The human development index (HDI) is an important indicator to monitor in Indonesia today. HDI growth in Indonesia in 2021 was unevenly distributed across regencies/cities and showed high disparities. This study aims to describe the HDI data, obtain the best model for identifying the factors that significantly affect the HDI of regencies/cities in Indonesia in 2021, and evaluate the classification accuracy of that model. The independent variables are average years of schooling, open unemployment rate, population growth rate, population density, percentage of poor people, and sex ratio. Because the independent variables contained missing values, these were filled in using k-nearest neighbour (KNN) imputation, after which ordinal logistic regression with backward elimination was used to identify the significant factors. The results show that the low, medium, and high HDI categories accounted for 4.28%, 48.64%, and 47.08% of the data, respectively. The ordinal logistic regression model selected by backward elimination, which has the smallest AIC value (293.387), retains average years of schooling (X1), population density (X4), percentage of poor people (X5), and sex ratio (X6) as the variables that significantly affect the HDI of regencies/cities in Indonesia in 2021. The classification accuracy of this model is 83.46% on the training data and 86.61% on the test data, which indicates that the model is good for prediction.
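The two procedural steps named in the abstract, KNN imputation and AIC-based backward elimination, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The function names `knn_impute` and `backward_eliminate`, the default `k=3`, the unscaled Euclidean distance, and the `aic_of` callback (a stand-in for fitting an ordinal logistic regression and returning its AIC) are all assumptions made for illustration.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean of that column over the k nearest rows.

    Assumption: Euclidean distance over the columns observed in both rows,
    rescaled by the share of columns that could be compared.
    """
    n_cols = len(rows[0])
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j in range(n_cols):
            if row[j] is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A neighbour must be a different row with column j observed.
                if m == i or other[j] is None:
                    continue
                shared = [c for c in range(n_cols)
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                sq = sum((row[c] - other[c]) ** 2 for c in shared)
                dist = math.sqrt(sq * n_cols / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = candidates[:k]
            if nearest:
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


def backward_eliminate(features, aic_of):
    """Start from all features; drop one at a time while the AIC decreases.

    `aic_of` is a hypothetical callback: it takes a list of feature names
    and returns the AIC of the model fitted with those features.
    """
    current = list(features)
    best_aic = aic_of(current)
    while len(current) > 1:
        trials = [(aic_of([f for f in current if f != drop]), drop)
                  for drop in current]
        trial_aic, drop = min(trials)
        if trial_aic >= best_aic:
            break  # no single removal improves the AIC
        current.remove(drop)
        best_aic = trial_aic
    return current, best_aic


# Toy data, invented for illustration (not the study's data).
toy = [[1.0, 2.0], [1.1, None], [5.0, 10.0], [1.2, 2.2]]
imputed = knn_impute(toy, k=2)


def toy_aic(feats):
    # Invented score, NOT a real AIC: penalises model size and
    # heavily penalises dropping any of the "informative" features.
    informative = {"X1", "X4", "X5", "X6"}
    return 2 * len(feats) + 100 * len(informative - set(feats))


selected, score = backward_eliminate(["X1", "X2", "X3", "X4", "X5", "X6"], toy_aic)
```

In practice the columns would normally be standardised before computing distances, and `aic_of` would wrap an actual ordinal logistic regression fit; both are left out here to keep the sketch self-contained.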