DATA IMPUTATION FOR BIVARIATE GAMMA-GENERATED DATA USING PREDICTIVE MEAN MATCHING AND RANDOM FOREST METHODS
DOI: https://doi.org/10.21009/JSA.10103
Keywords: Bivariate Gamma, Mean Absolute Percentage Error, Predictive Mean Matching, Random Forest Imputations, Root Mean Square Error
Abstract
Missing data is a common problem in data analysis and can reduce the quality and accuracy of research results if not handled properly. This study compares the Predictive Mean Matching (PMM) and Random Forest (RF) imputation methods on data with 5%, 10%, 15%, and 20% missing values. The methods are evaluated using correlation, p-values, and the Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE), where smaller error values indicate better imputation. The results show that the two methods perform differently at each level of missing data. At 5% missing data, both methods differ significantly from the original data, with p-values smaller than α = 0.05, but RF produces smaller MAPE and RMSE values than PMM. At 10% missing data, PMM still differs significantly from the original data, while RF does not. At 15% missing data, PMM does not differ significantly from the original data and has smaller MAPE and RMSE values than RF. At 20% missing data, RF produces a higher correlation value (0.7788) than PMM (0.7638). In general, the imputation error tends to increase as the proportion of missing data grows. The choice of imputation method should therefore be matched to the characteristics and proportion of the missing data to obtain the best imputation results.
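The two error measures named in the abstract can be illustrated with a minimal sketch. The function names `mape` and `rmse` and the sample values below are hypothetical and are not taken from the study; the formulas are the standard definitions, and the study's own computation may differ in detail. Because gamma-distributed values are strictly positive, the division in MAPE is well defined.

```python
import numpy as np


def mape(actual, imputed):
    """Mean Absolute Percentage Error, in percent (standard definition)."""
    actual = np.asarray(actual, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    # Requires non-zero actual values; gamma data are strictly positive.
    return float(np.mean(np.abs((actual - imputed) / actual)) * 100.0)


def rmse(actual, imputed):
    """Root Mean Square Error (standard definition)."""
    actual = np.asarray(actual, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    return float(np.sqrt(np.mean((actual - imputed) ** 2)))


# Hypothetical example: true values at the positions that were set to
# missing, and the values an imputation method filled in for them.
actual = np.array([2.0, 4.0, 5.0, 10.0])
imputed = np.array([1.0, 5.0, 5.0, 8.0])

print(f"MAPE: {mape(actual, imputed):.2f}%")
print(f"RMSE: {rmse(actual, imputed):.4f}")
```

Both measures are computed only over the entries that were masked and then imputed, so a method with smaller values reproduces the removed observations more closely.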



