Abstract: The estimation and monitoring of rainfall patterns are crucial for hydrological system modelling. However, station data in Indonesia are often incomplete, which makes analysis based on these data problematic. One way to address incomplete data is imputation, that is, filling in missing values by using other available information. This study aims to identify the most accurate machine learning method and satellite dataset for imputing missing daily rainfall data at BMKG stations in East Java. The four machine learning methods used are Multiple Linear Regression (MLR), Convolutional Neural Network (CNN), Multilayer Perceptron (MLP), and Support Vector Regression (SVR). The satellite datasets used are ERA5, ERA5 Land, CMORPH CRT, CMORPH BLD, and CHIRPS. The data were divided into training and testing sets with training-to-testing ratios of 95:5, 90:10, 80:20, 70:30, and 50:50. In the analysis of data proportions, scenarios with a larger proportion of training data, such as 95:5, yield better performance. Among the machine learning methods, MLP shows the greatest potential for further analysis because it has the lowest loss value, measured as Mean Absolute Error (MAE). Among the satellite datasets, ERA5 is more stable and reliable for rainfall imputation modelling. Combining the MLP method with ERA5 satellite data delivers the best results, with the lowest loss value and a minimized risk of overfitting. This study significantly improves the quality of rainfall data and supports more accurate meteorological analyses in Indonesia.
Keywords: Imputation missing data; Machine Learning; Mean Absolute Error; Satellite Data; Station Data
link: 10.21163/GT_2025.201.23
Published in Geographia Technica, Vol 20 (1): 346-368
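The evaluation protocol summarized in the abstract (several training-to-testing ratios, scored with MAE) can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic stand-ins for satellite and station rainfall, the random shuffled split and the helper names (`split_by_ratio`, `fit_mlr`, `evaluate_ratios`) are assumptions made for illustration, and only the simplest of the four methods, MLR, is shown, fitted by ordinary least squares with NumPy.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """MAE: the average absolute difference between observed and predicted values."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def split_by_ratio(X, y, train_fraction, rng):
    """Shuffle the samples and split them (assumption: the study may split differently)."""
    order = rng.permutation(len(y))
    n_train = int(round(train_fraction * len(y)))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]


def fit_mlr(X, y):
    """Fit multiple linear regression with an intercept by ordinary least squares."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Apply the fitted coefficients (intercept first) to new predictors."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def evaluate_ratios(X, y, train_fractions=(0.95, 0.90, 0.80, 0.70, 0.50), seed=0):
    """Return training and testing MAE for each training fraction."""
    results = {}
    for fraction in train_fractions:
        rng = np.random.default_rng(seed)  # same shuffle order for every ratio
        X_tr, y_tr, X_te, y_te = split_by_ratio(X, y, fraction, rng)
        coef = fit_mlr(X_tr, y_tr)
        results[fraction] = {
            "train_mae": mean_absolute_error(y_tr, predict_mlr(coef, X_tr)),
            "test_mae": mean_absolute_error(y_te, predict_mlr(coef, X_te)),
        }
    return results


# Synthetic stand-in data (hypothetical): five predictor columns play the role of
# the five satellite products, and the target plays the role of station rainfall.
data_rng = np.random.default_rng(42)
n_days = 1000
satellite = data_rng.gamma(shape=0.5, scale=10.0, size=(n_days, 5))
station = 0.6 * satellite[:, 0] + 0.2 * satellite[:, 4] + data_rng.normal(0.0, 1.0, n_days)

results = evaluate_ratios(satellite, station)
for fraction, scores in results.items():
    print(f"train fraction {fraction:.2f}: "
          f"train MAE {scores['train_mae']:.3f}, test MAE {scores['test_mae']:.3f}")
```

Comparing the training and testing MAE for each ratio is one simple way to look for overfitting: a testing error much larger than the training error suggests that the model does not generalize well. Whether the study assessed overfitting in this way is an assumption.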