Abstract

Machine-learning models are consuming an increasing fraction of the world's computing resources. The cost of computing inferences with some machine-learning models is extremely high. Provisioning computing resources for peak performance, e.g., high availability and quality of service, entails creating headroom for traffic spikes (increases in demand) and preparing for the possibility of outages (decreases in capacity). As a result, executing computer applications that utilize machine-learning models, also known as machine-learned models, can require significant capital and operational expenses.

This disclosure describes techniques to optimize the use of computing resources for a machine-learning model. The techniques utilize multi-resolution models and/or models with recurrence, which can compute inferences to varying degrees of quality (resolution). The multi-resolution models are served in an elastic manner such that a model of a resolution that fits the available computing resources is utilized to compute inferences.
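The elastic selection described above can be illustrated with a minimal sketch. It is not the disclosure's implementation. The names `ModelVariant`, `select_variant`, and `VARIANTS`, the cost figures, and the idea of dividing available compute evenly across pending requests are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ModelVariant:
    """One resolution of a multi-resolution model (hypothetical fields)."""
    name: str
    resolution: int            # higher means better inference quality
    cost_per_inference: float  # assumed compute units per inference


def select_variant(
    variants: Sequence[ModelVariant],
    available_compute: float,
    pending_requests: int,
) -> ModelVariant:
    """Pick the highest-resolution variant that fits the per-request budget.

    Assumption: compute is shared evenly across pending requests. If no
    variant fits, fall back to the cheapest one so requests are still served.
    """
    if not variants:
        raise ValueError("at least one model variant is required")
    if pending_requests <= 0:
        raise ValueError("pending_requests must be positive")

    budget = available_compute / pending_requests
    feasible = [v for v in variants if v.cost_per_inference <= budget]
    if not feasible:
        return min(variants, key=lambda v: v.cost_per_inference)
    return max(feasible, key=lambda v: v.resolution)


# Hypothetical variants; the costs are illustrative, not measured.
VARIANTS = [
    ModelVariant("low", resolution=1, cost_per_inference=1.0),
    ModelVariant("medium", resolution=2, cost_per_inference=4.0),
    ModelVariant("high", resolution=3, cost_per_inference=16.0),
]

if __name__ == "__main__":
    # Normal load, a traffic spike, and a partial outage, respectively.
    for compute, requests in [(1600.0, 100), (1600.0, 400), (800.0, 100)]:
        chosen = select_variant(VARIANTS, compute, requests)
        print(compute, requests, chosen.name)
```

In this sketch, a traffic spike (more pending requests) or an outage (less available compute) shrinks the per-request budget, so a lower-resolution variant is selected instead of provisioning extra headroom for the highest-resolution model.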

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Olston, Christopher; Fiedel, Noah; Chi, Ed H.; and Beutel, Alexander, "Elastic multi-resolution model-serving to compute inferences", Technical Disclosure Commons, (September 15, 2017)
https://www.tdcommons.org/dpubs_series/668

Technical Disclosure Commons

Defensive Publications Series

Elastic multi-resolution model-serving to compute inferences