Peter Danenberg


Machine learning tasks, such as natural language processing evaluations, are assigned to server resources by dividing them into a number of queues. In such an environment, different tasks may have different priorities; e.g., production tasks may take precedence over research tasks. Task scheduling mechanisms such as weighted round-robin or batch scheduling are approximate and non-deterministic, and they cannot be tuned to the real-time availability of resources. This disclosure describes the use of a single resource-optimization parameter, determined from real-time resource availability, that defines quantiles of an exponential distribution. Task queues are updated based on this parameter and the resulting quantiles. Because the parameter is updated as resource availability and workloads change, the described techniques provide a tunable mechanism for task scheduling.
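The abstract does not spell out how the exponential quantiles map to queues. The Python sketch below shows one possible reading, and every name in it (`quantile_levels`, `queue_shares`, `allocate_slots`) is hypothetical. It makes three assumptions. The single parameter is the rate λ of an exponential distribution. Queues are ordered from highest to lowest priority. Queue i receives the probability mass of Exp(λ) on the interval [i, i + 1), so the cumulative share of queues 0..i is the quantile level F(i + 1) = 1 − e^(−λ(i + 1)). A larger λ concentrates capacity on the highest-priority queue, and a smaller λ spreads it more evenly.

```python
import math


def quantile_levels(rate: float, num_queues: int) -> list[float]:
    """Cumulative capacity share of queues 0..i (hypothetical mapping).

    Level i is the Exp(rate) CDF at i + 1: F(i + 1) = 1 - exp(-rate * (i + 1)).
    The last level is forced to 1.0 so the lowest-priority queue absorbs the tail.
    """
    if rate <= 0 or num_queues < 1:
        raise ValueError("rate must be > 0 and num_queues must be >= 1")
    levels = [1.0 - math.exp(-rate * (i + 1)) for i in range(num_queues - 1)]
    levels.append(1.0)
    return levels


def queue_shares(rate: float, num_queues: int) -> list[float]:
    """Fraction of capacity per queue: differences between consecutive levels."""
    levels = quantile_levels(rate, num_queues)
    previous = [0.0] + levels[:-1]
    return [hi - lo for lo, hi in zip(previous, levels)]


def allocate_slots(available: int, rate: float, num_queues: int) -> list[int]:
    """Split `available` worker slots across queues by the largest-remainder method."""
    raw = [share * available for share in queue_shares(rate, num_queues)]
    slots = [math.floor(r) for r in raw]
    leftover = available - sum(slots)
    # Give any remaining slots to the queues with the largest fractional parts.
    order = sorted(range(num_queues), key=lambda i: raw[i] - slots[i], reverse=True)
    for i in order[:leftover]:
        slots[i] += 1
    return slots
```

For example, with λ = ln 2 and three queues, the quantile levels are 0.5, 0.75 and 1.0. The shares are therefore 1/2, 1/4 and 1/4, so `allocate_slots(8, math.log(2), 3)` splits 8 free slots as `[4, 2, 2]`. In this sketch, retuning the scheduler as availability changes only requires recomputing λ and calling `allocate_slots` again. The rule for deriving λ from resource availability is not specified here and is left to the caller.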

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.