Abstract

Generally, the present disclosure is directed to optimizing the use of computing resources in a system. In particular, in some implementations, the systems and methods of the present disclosure can include or otherwise leverage one or more machine-learned models to predict task allocation for a job that serves a plurality of machine-learned models, based on the current system state and queries-per-second (QPS) data for those models. Alternatively, tasks can be allocated according to one or more rules (e.g., new tasks are allocated to a job until the compute usage for the job falls below a scaling threshold). Thus, the systems and methods of the present disclosure are able to efficiently serve a mix of high-QPS and low-QPS machine-learned models at low latency with minimal waste of compute resources (e.g., CPU, GPU, TPU) and memory (e.g., RAM).
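
The rule-based alternative mentioned in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the disclosure: the `JobState` fields, the `tasks_needed` helper, the linear utilization model (aggregate QPS divided by total task capacity), the 0.7 threshold, and the `max_tasks` cap.

```python
from dataclasses import dataclass


@dataclass
class JobState:
    """Hypothetical snapshot of one serving job (all fields are assumptions)."""
    tasks: int           # tasks currently allocated to the job
    total_qps: float     # aggregate QPS across all models the job serves
    qps_per_task: float  # assumed QPS a single task sustains at full utilization


def tasks_needed(job: JobState, scaling_threshold: float = 0.7,
                 max_tasks: int = 1000) -> int:
    """Add tasks one at a time until estimated compute usage is below the threshold.

    Compute usage is modeled as total_qps / (tasks * qps_per_task), which is a
    simplifying assumption; a real system would measure usage directly.
    """
    tasks = max(job.tasks, 1)
    while (tasks < max_tasks
           and job.total_qps / (tasks * job.qps_per_task) >= scaling_threshold):
        tasks += 1  # allocate one more task to the job
    return tasks


if __name__ == "__main__":
    overloaded = JobState(tasks=2, total_qps=900.0, qps_per_task=100.0)
    print(tasks_needed(overloaded))
```

Under these assumptions, a job that is already below the threshold keeps its current task count, so the rule only ever scales up; scaling down would need a separate rule.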

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Ross, Steven J.; Olston, Christopher; and Fiedel, Noah, "Automatically Scaling Multi-Tenant Machine Learning", Technical Disclosure Commons, (December 12, 2017)
https://www.tdcommons.org/dpubs_series/949

Technical Disclosure Commons

Defensive Publications Series

Automatically Scaling Multi-Tenant Machine Learning