Abstract

Minimizing latency is critical to high-performance computing (HPC). Network monitoring tools that rely on out-of-band data to control latency cannot assist latency-sensitive network workloads in real time. This disclosure describes techniques that mitigate HPC latency by combining in-band network telemetry (INT) with the software-defined network (SDN) controller used by the cloud platform. INT gathers hardware-level information about buffer and queue utilization, and the cloud SDN controller uses this information to make changes to the virtual environment. The SDN controller can also directly influence how the HPC master node assigns tasks to worker nodes. In this way, the techniques use deep, hardware-level indications of potential latency problems, signaled by buffer accumulation, to inform cloud-HPC scheduling algorithms.
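
The flow from telemetry to task assignment can be illustrated with a minimal sketch. It assumes a hypothetical per-worker summary of INT data (the `IntReport` fields, the `congestion_score` heuristic, and the 0.8 threshold are all illustrative assumptions, not details from the disclosure) and shows how a controller might rank worker nodes before advising the HPC master node.

```python
from dataclasses import dataclass


@dataclass
class IntReport:
    """Hypothetical per-worker summary of INT metadata (not a real INT format)."""
    node: str
    queue_depth: int    # packets queued on the path to this worker
    buffer_util: float  # fraction of switch buffer in use, 0.0 to 1.0


def congestion_score(report: IntReport, max_queue_depth: int = 1000) -> float:
    """Assumed heuristic: the worse of queue fill and buffer utilization."""
    queue_frac = min(report.queue_depth / max_queue_depth, 1.0)
    return max(queue_frac, report.buffer_util)


def rank_workers(reports: list[IntReport], threshold: float = 0.8) -> list[str]:
    """Return workers below the congestion threshold, least congested first."""
    eligible = [r for r in reports if congestion_score(r) < threshold]
    return [r.node for r in sorted(eligible, key=congestion_score)]


reports = [
    IntReport("worker-a", queue_depth=900, buffer_util=0.40),
    IntReport("worker-b", queue_depth=50, buffer_util=0.10),
    IntReport("worker-c", queue_depth=200, buffer_util=0.30),
]
print(rank_workers(reports))  # ['worker-b', 'worker-c']
```

Here `worker-a` is excluded because its queue is 90% full, so a scheduler consulting this ranking would place new tasks on `worker-b` first. A real deployment would derive these values from the INT metadata exported by the switches rather than from hard-coded samples.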

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.