Department of Industrial Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
* Corresponding Author Address: Department of Industrial Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Azadi Square, Mashhad, Iran. Postal Code: 917794895 (babaeian.am@gmail.com)
Abstract
Aims: Accurate prediction of daily minimum temperature (Tmin) is crucial for agricultural management, frost prevention, and managing energy consumption. Despite advances in machine learning methods, systematic comparisons of surface and upper-air data for Tmin prediction in arid and semi-arid regions such as Mashhad remain limited. This study evaluates five ensemble learning algorithms (CatBoost, XGBoost, LightGBM, AdaBoost, and Random Forest) under three data scenarios (upper-air, surface, and combined) for next-day Tmin prediction.

Methodology: This applied study was conducted in 2025. Daily meteorological data from the Mashhad synoptic station and ERA5 reanalysis data at the 300, 500, and 700 hPa levels were used for the period 2000-2023. All predictors were lagged by one day relative to the target Tmin. The algorithms were trained using cross-validation. Multicollinearity among predictors was controlled using the variance inflation factor (VIF), and the optimal feature subset was determined by the Best Subset Selection (BSS) method based on the coefficient of determination (R2).

Findings: Integrating surface and upper-air data significantly improved the accuracy and stability of the models. In the combined scenario, the LightGBM algorithm achieved the best performance on the test set (R2=93.90%, MAE=1.63°C, RMSE=2.10°C, and KGE=0.93). The BSS method identified five key predictors (Tmin, relative humidity and specific humidity at 700 hPa, minimum surface humidity, and the categorical variable for the summer season) as the most influential combination, one that integrates upper-air thermodynamic conditions with surface conditions.

Conclusion: Among the approaches evaluated, combining surface and upper-air data within the LightGBM framework, along with systematic feature selection, provided the most accurate short-term Tmin prediction.
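The one-day lag and the error metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not say which KGE formulation the authors used; the code below assumes the original 2009 form, KGE = 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2), with r the Pearson correlation, alpha the ratio of standard deviations, and beta the ratio of means. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, pstdev


def lag_one_day(predictors, tmin):
    """Pair predictors observed on day t with Tmin observed on day t+1."""
    return predictors[:-1], tmin[1:]


def mae(obs, sim):
    """Mean absolute error."""
    return mean(abs(o - s) for o, s in zip(obs, sim))


def rmse(obs, sim):
    """Root mean squared error."""
    return math.sqrt(mean((o - s) ** 2 for o, s in zip(obs, sim)))


def kge(obs, sim):
    """Kling-Gupta efficiency (assumed 2009 formulation)."""
    mo, ms = mean(obs), mean(sim)
    so, ss = pstdev(obs), pstdev(sim)
    cov = mean((o - mo) * (s - ms) for o, s in zip(obs, sim))
    r = cov / (so * ss)   # Pearson correlation
    alpha = ss / so       # variability ratio
    beta = ms / mo        # bias ratio (unstable if the observed mean is near 0)
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Illustrative values only (degrees Celsius); not data from the study.
observed = [4.0, 6.5, 5.0, 8.0, 7.5]
predicted = [4.5, 6.0, 5.5, 7.0, 8.0]
print(mae(observed, predicted), rmse(observed, predicted), kge(observed, predicted))
```

Because beta divides by the observed mean, this form of KGE becomes unstable when mean Tmin in degrees Celsius is close to zero, which is worth checking before comparing scores across seasons.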