Abstract
A system decomposes Artificial Intelligence (AI) inference workloads into sub-tasks and executes them in parallel, in an energy-aware manner, across heterogeneous computing resources. A complex prompt is decomposed into atomic sub-tasks, which are mapped into a Directed Acyclic Graph (DAG) that defines their dependencies. Each sub-task is then classified by its computational nature and routed to hardware, such as a local central processing unit (CPU), a neural processing unit (NPU), or a cloud server, based on physical energy and carbon cost models. The system analyzes the DAG to identify the critical path and routes the tasks on that path to higher-performance resources to reduce latency, while assigning non-critical tasks to low-power hardware. Independent sub-tasks are executed in parallel across the heterogeneous hardware, reducing overall completion time. This hardware-aware orchestration decreases total energy consumption by routing deterministic logic away from power-intensive AI models and lowers system latency through parallel execution of independent operations.
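The pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the disclosure: the task names, the duration estimates, the two task kinds ("deterministic" and "generative"), and the simple routing rule, which stands in for the physical energy and carbon cost models. The sketch builds the DAG, finds the critical path as the longest duration-weighted path, routes each sub-task, and groups sub-tasks that could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical sub-tasks: name -> (kind, estimated duration in ms, dependencies).
TASKS = {
    "parse":     ("deterministic", 5,  []),
    "retrieve":  ("deterministic", 20, ["parse"]),
    "summarize": ("generative",    80, ["retrieve"]),
    "format":    ("deterministic", 5,  ["parse"]),
    "title":     ("generative",    15, ["parse"]),
    "compose":   ("generative",    40, ["summarize", "format", "title"]),
}

def dependency_graph(tasks):
    """Map each task to its predecessors, the shape TopologicalSorter expects."""
    return {name: deps for name, (_, _, deps) in tasks.items()}

def critical_path(tasks):
    """Return (path, total_ms) for the longest duration-weighted path in the DAG."""
    finish, prev = {}, {}
    for name in TopologicalSorter(dependency_graph(tasks)).static_order():
        _, duration, deps = tasks[name]
        # The predecessor that finishes last determines when this task can start.
        latest = max(deps, key=lambda d: finish[d], default=None)
        prev[name] = latest
        finish[name] = duration + (finish[latest] if latest is not None else 0)
    node = max(finish, key=finish.get)
    total = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1], total

def route(tasks):
    """Assumed rule: deterministic logic -> CPU, critical generative work -> cloud,
    non-critical generative work -> low-power NPU."""
    on_path = set(critical_path(tasks)[0])
    plan = {}
    for name, (kind, _, _) in tasks.items():
        if kind == "deterministic":
            plan[name] = "cpu"
        elif name in on_path:
            plan[name] = "cloud"
        else:
            plan[name] = "npu"
    return plan

def parallel_waves(tasks):
    """Group tasks into waves whose members have no unmet dependencies."""
    sorter = TopologicalSorter(dependency_graph(tasks))
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

if __name__ == "__main__":
    print(critical_path(TASKS))
    print(route(TASKS))
    print(parallel_waves(TASKS))
```

In this toy graph the "title" task is generative but lies off the critical path, so the assumed rule sends it to the NPU, while "summarize" and "compose" go to the cloud. A real router would replace the fixed rule with per-device energy, carbon, and latency estimates.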
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Bhardwaj, Utkarsh and Awasthi, Shivank, "Energy-Efficient Artificial Intelligence Inference Through Task Decomposition and Heterogeneous Parallel Execution with Critical Path Trade-off Analysis", Technical Disclosure Commons, ()
https://www.tdcommons.org/dpubs_series/10230