Abstract
This work presents a defensive disclosure of a novel GRU-based pipeline for predicting the execution runtime of AI model computational graphs on TPU hardware. Efficiently predicting the runtime of AI model operations is essential for optimizing deployment and hardware utilization. The proposed pipeline integrates opcode-based runtime estimation, graph edge dependency embeddings, configurable node feature adjustments, and a Gated Recurrent Unit (GRU) neural network to predict operation runtimes on computational graphs. The approach is applied to the TPUGraphs dataset from the "Google - Fast or Slow? Predict AI Model Runtime" Kaggle competition, which involves predicting runtimes of Tensor Processing Unit (TPU) computations from graph and configuration features. Experimental results demonstrate the model's ability to capture complex graph and configuration dependencies, enabling accurate runtime predictions that can guide compiler optimization heuristics. The disclosed pipeline is designed to assist in compiler optimization and runtime prediction, and is released for public use to establish prior art in configuration-aware AI compiler techniques.
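While the disclosure's abstract does not include source code, the following minimal sketch illustrates one plausible shape of such a GRU-based predictor. It assumes PyTorch; the class name, feature dimensions (e.g., 140 node features and 18 per-node configuration features, roughly matching TPUGraphs), and layer sizes are hypothetical illustrations, not the author's actual implementation.

```python
import torch
import torch.nn as nn

class RuntimePredictorGRU(nn.Module):
    """Sketch of a GRU-based runtime predictor for computational graphs.

    Each graph is flattened to a node sequence; every node carries an
    opcode id plus numeric node and configuration features. All sizes
    below are hypothetical placeholders.
    """

    def __init__(self, num_opcodes=120, opcode_dim=32,
                 node_feat_dim=140, config_feat_dim=18, hidden_dim=64):
        super().__init__()
        # Learned embedding for each opcode (opcode-based estimation).
        self.opcode_emb = nn.Embedding(num_opcodes, opcode_dim)
        # GRU consumes concatenated opcode embedding + node/config features.
        self.gru = nn.GRU(opcode_dim + node_feat_dim + config_feat_dim,
                          hidden_dim, batch_first=True)
        # Regression head maps the final hidden state to a runtime estimate.
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, opcodes, node_feats, config_feats):
        # opcodes:      (batch, num_nodes)                int64 opcode ids
        # node_feats:   (batch, num_nodes, node_feat_dim)
        # config_feats: (batch, num_nodes, config_feat_dim)
        x = torch.cat([self.opcode_emb(opcodes), node_feats, config_feats],
                      dim=-1)
        _, h = self.gru(x)                    # h: (1, batch, hidden_dim)
        return self.head(h[-1]).squeeze(-1)   # (batch,) predicted runtimes

# Example usage with random tensors standing in for one batch of graphs.
model = RuntimePredictorGRU()
pred = model(torch.randint(0, 120, (4, 50)),
             torch.randn(4, 50, 140),
             torch.randn(4, 50, 18))
print(pred.shape)  # torch.Size([4])
```

In this sketch, graph edge dependencies would be reflected in the node ordering fed to the GRU (e.g., a topological order), since the abstract does not specify how edge embeddings are combined with the recurrent model.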
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Mallapragada, Suma, "Runtime Prediction of AI Model Operations Using a GRU-Based Neural Network", Technical Disclosure Commons, (July 07, 2025).
https://www.tdcommons.org/dpubs_series/8310