Innovation Series: Advanced Science (ISSN 2938-9933, CNKI Indexed)

Volume 3 · Issue 3 (2026)

Research on Stock Price Prediction Based on Random Forest with Feature Engineering and Rolling Window Evaluation

 

Zhanhang Gao*

School of Big Data and Statistics, Anhui University, Anhui, China

Corresponding Author: Zhanhang Gao (we23201020@stu.ahu.edu.cn)

 

Abstract: As the stock market has grown, equity investment has drawn considerable attention. High returns, however, come with high risk, which makes stock selection and market timing particularly challenging. This paper develops a stock price prediction model based on the Random Forest algorithm combined with feature engineering. In the feature selection stage, key technical indicators are identified by combining correlation coefficient filtering, feature importance scores from Random Forest and XGBoost, and K-means clustering. The model is evaluated with a rolling window approach, using a training window of 500 trading days and a test window of 63 trading days, and 95% prediction intervals are used to test robustness. The results show that the Random Forest model performs well under different market conditions, achieving an overall R² of 92.05% and a prediction interval coverage rate of 92.1%, which is close to, though slightly below, the nominal 95% level.

 

Keywords: RF model; Stock price prediction; Feature engineering; Rolling window approach
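
The rolling window evaluation and the coverage rate mentioned in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names `rolling_windows` and `coverage_rate` are hypothetical, and the abstract does not state how far the window advances between evaluations, so the sketch assumes it moves forward by one test window (63 trading days). Only the 500-day training window and 63-day test window come from the abstract.

```python
from typing import Iterator, Optional, Sequence, Tuple


def rolling_windows(
    n_obs: int,
    train_size: int = 500,   # training window length from the abstract
    test_size: int = 63,     # test window length from the abstract
    step: Optional[int] = None,
) -> Iterator[Tuple[range, range]]:
    """Yield (train_idx, test_idx) index ranges in chronological order.

    Each test block starts right after its training block, so the model
    never sees observations from the period it is evaluated on.
    """
    if step is None:
        step = test_size  # assumption: non-overlapping test blocks
    start = 0
    while start + train_size + test_size <= n_obs:
        train_idx = range(start, start + train_size)
        test_idx = range(start + train_size, start + train_size + test_size)
        yield train_idx, test_idx
        start += step


def coverage_rate(
    actual: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
) -> float:
    """Fraction of actual values that fall inside [lower, upper]."""
    if not (len(actual) == len(lower) == len(upper)):
        raise ValueError("actual, lower and upper must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one observation is required")
    hits = sum(1 for y, lo, hi in zip(actual, lower, upper) if lo <= y <= hi)
    return hits / len(actual)


# Usage: 1000 trading days give 7 complete train/test splits.
windows = list(rolling_windows(1000))
first_train, first_test = windows[0]
```

In a full pipeline, a model would be refit on each `train_idx` block and scored on the matching `test_idx` block, and `coverage_rate` would be computed from the pooled out-of-sample prediction intervals. For well-calibrated 95% intervals the coverage should be near 0.95.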

 
