Abstract

Time-series forecasting, for example in supply chain management, can be challenged by high-dimensional and noisy market data, which may lead to overfitting and to models that rely on spurious correlations. A framework for automated feature engineering and multi-stage feature selection can help address these challenges. The process programmatically generates a large set of candidate features, such as time-lagged variables, from heterogeneous data sources. A multi-stage selection protocol is then applied: a first stage can use correlation-based filtering to prune candidates, and a subsequent iterative, model-driven stage uses backtesting to evaluate features and promote those with demonstrated predictive value. The result is a curated set of validated features for training forecasting models that may be more robust, more parsimonious, and less susceptible to overfitting, and that may forecast more accurately.
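The pipeline described above can be illustrated with a minimal Python sketch. It is not the framework's implementation: the function names, the thresholds (`min_target_corr`, `max_pair_corr`, `min_gain`), the choice of ordinary least squares as the forecasting model, the expanding-window backtest, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def make_lagged_features(series, lags):
    """Generate time-lagged copies of each input series (NaN-padded at the start)."""
    feats = {}
    for name, values in series.items():
        values = np.asarray(values, dtype=float)
        for lag in lags:  # each lag is assumed to be >= 1
            lagged = np.full(len(values), np.nan)
            lagged[lag:] = values[:-lag]
            feats[f"{name}_lag{lag}"] = lagged
    return feats

def corr(a, b):
    """Pearson correlation over rows where both inputs are defined."""
    mask = ~(np.isnan(a) | np.isnan(b))
    x, y = a[mask], b[mask]
    if x.std() == 0 or y.std() == 0:
        return 0.0
    return float(np.corrcoef(x, y)[0, 1])

def correlation_filter(feats, target, min_target_corr=0.2, max_pair_corr=0.95):
    """Stage 1: drop weakly relevant features, then near-duplicates of kept ones."""
    relevance = {k: abs(corr(v, target)) for k, v in feats.items()}
    kept = []
    for name in sorted(relevance, key=relevance.get, reverse=True):
        if relevance[name] < min_target_corr:
            continue
        if all(abs(corr(feats[name], feats[k])) < max_pair_corr for k in kept):
            kept.append(name)
    return kept

def backtest_error(X, y, n_folds=3, min_train=30):
    """Expanding-window backtest: fit OLS on the past, score MAE on the next block."""
    fold = (len(y) - min_train) // n_folds
    if fold < 1:
        raise ValueError("not enough observations for the requested backtest")
    errors = []
    for i in range(n_folds):
        split, end = min_train + i * fold, min_train + (i + 1) * fold
        A_train = np.column_stack([np.ones(split), X[:split]])
        coef, *_ = np.linalg.lstsq(A_train, y[:split], rcond=None)
        A_test = np.column_stack([np.ones(end - split), X[split:end]])
        errors.append(np.mean(np.abs(A_test @ coef - y[split:end])))
    return float(np.mean(errors))

def forward_select(feats, target, candidates, min_gain=0.01):
    """Stage 2: promote a feature only if it lowers the backtest error by min_gain."""
    valid = ~np.isnan(target)
    for name in candidates:
        valid &= ~np.isnan(feats[name])
    y = target[valid]
    selected = []
    best = backtest_error(np.empty((len(y), 0)), y)  # intercept-only baseline
    while True:
        trials = {}
        for name in candidates:
            if name not in selected:
                X = np.column_stack([feats[k][valid] for k in selected + [name]])
                trials[name] = backtest_error(X, y)
        if not trials:
            break
        winner = min(trials, key=trials.get)
        if best - trials[winner] <= min_gain:
            break
        selected.append(winner)
        best = trials[winner]
    return selected, best

# Synthetic example: the target depends on "driver" two steps earlier.
rng = np.random.default_rng(0)
n = 200
driver = rng.normal(size=n)
noise = rng.normal(size=n)
target = 0.1 * rng.normal(size=n)
target[2:] += 3.0 * driver[:-2]

feats = make_lagged_features({"driver": driver, "noise": noise}, lags=[1, 2, 3])
kept = correlation_filter(feats, target)
selected, best = forward_select(feats, target, kept)
print("after correlation filter:", kept)
print("promoted by backtesting:", selected, "backtest MAE:", round(best, 3))
```

In this sketch the cheap correlation filter runs first so that the more expensive backtesting stage only evaluates a short list of candidates. Any model and any backtesting scheme could be substituted for the least-squares fit and expanding window used here.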

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
