Abstract
Varying computational workloads create thermal and power dependencies within modern computing devices. Static threshold policies may not fully capture cumulative operational stress, which can lead to over-throttling or delayed hardware responses. A closed-loop machine learning framework is disclosed that dynamically evaluates multidimensional telemetry data to compute a predictive risk score. A feature selector extracts direct measurements, system activity indicators, temporal features, and health indicators. Based on the computed risk score, an arbitrator dynamically applies mitigation actions (e.g., scaling component voltage and frequency states). A model corrector operates during hardware idle periods to iteratively refine the predictive model using observed performance metrics. The framework aims to preemptively adjust operating points based on evolving usage patterns, supporting extended hardware reliability and consistent application performance.
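The loop described in the abstract (feature extraction, risk scoring, arbitration, idle-time correction) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the disclosure: the logistic risk model, the four feature names (one per feature category, each normalized to roughly 0..1), the initial weights, the risk thresholds, and the operating-point labels.

```python
import math
from dataclasses import dataclass, field

# Hypothetical features, one per category named in the abstract:
# direct measurement, system activity, temporal feature, health indicator.
FEATURES = ("temp_norm", "utilization", "temp_slope", "wear_index")


@dataclass
class RiskModel:
    """Assumed logistic model mapping telemetry features to a risk score in (0, 1)."""
    weights: dict = field(default_factory=lambda: {
        "temp_norm": 4.0, "utilization": 2.0, "temp_slope": 2.0, "wear_index": 1.0})
    bias: float = -4.0

    def score(self, x):
        z = self.bias + sum(self.weights[k] * x[k] for k in FEATURES)
        return 1.0 / (1.0 + math.exp(-z))

    def correct(self, x, observed, lr=0.1):
        """Idle-time correction: one gradient step on log loss.

        observed is 1.0 if stress was actually seen for sample x, else 0.0.
        """
        err = self.score(x) - observed
        for k in FEATURES:
            self.weights[k] -= lr * err * x[k]
        self.bias -= lr * err


def arbitrate(risk):
    """Map a risk score to a hypothetical voltage/frequency operating point."""
    if risk >= 0.8:
        return "low_power"
    if risk >= 0.5:
        return "balanced"
    return "nominal"


if __name__ == "__main__":
    model = RiskModel()
    sample = {"temp_norm": 0.9, "utilization": 0.9, "temp_slope": 0.5, "wear_index": 0.2}
    risk = model.score(sample)
    print(f"risk={risk:.3f} action={arbitrate(risk)}")
    # Suppose no stress was observed afterwards: refine the model during idle time.
    model.correct(sample, observed=0.0)
    print(f"risk after correction={model.score(sample):.3f}")
```

In this sketch the corrector lowers the predicted risk for a sample that turned out to be harmless, so later arbitration throttles less aggressively under similar telemetry. A real system would likely batch such updates and validate them before deployment.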
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Heidarinejad, Mohsen; Kenarangi, Farid; Mittal, Arpit; and Khajeh, Amin, "Predictive Hardware State Mitigation Using Telemetry and Feedback", Technical Disclosure Commons, ()
https://www.tdcommons.org/dpubs_series/10367