Reinforcement learning techniques are provided that generate initial training data and use it to refine a machine-learning model (e.g., a neural network). The techniques allow a machine-learning system to learn correlations between inputs and outputs, and to analyze generated outputs to produce new training data that leads to better outputs. The techniques start by profiling a system (e.g., an operating system) together with a mock model that stands in for the machine-learning model and provides random outputs or outputs according to a simple heuristic. The outputs are then used to adjust heuristics of the system so that a wide variety of performance reactions of the system is obtained. The profiling data of the system is evaluated to distinguish good outputs from bad outputs according to a chosen performance metric. In some cases, initial training data is created based on the good outputs. A new machine-learning model can then be trained with the initial training data to produce better outputs than those produced by the mock model. With each iteration of the techniques, the machine-learning model produces better outputs. Generally, the techniques are repeated until outputs of satisfactory quality are achieved.
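The loop described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the disclosure: `profile_system` is a hypothetical stand-in for profiling a real system, the single numeric "heuristic setting" and its optimum at 0.7 are invented, and the "training" step simply refits a Gaussian to the good outputs. That selection-and-refit step resembles the cross-entropy method and is swapped in only because the abstract does not specify a training procedure.

```python
import random
import statistics


def profile_system(setting: float) -> float:
    """Hypothetical profiler: returns a cost (lower is better) for one
    heuristic setting. The optimum at 0.7 is invented for illustration."""
    return (setting - 0.7) ** 2


class MockModel:
    """Mock model: emits random outputs without any training."""

    def __init__(self, rng: random.Random) -> None:
        self.rng = rng

    def predict(self) -> float:
        return self.rng.uniform(0.0, 1.0)


class FittedModel:
    """Toy 'trained' model (assumption): samples around the good outputs."""

    def __init__(self, training_data: list[float], rng: random.Random) -> None:
        self.mean = statistics.fmean(training_data)
        self.spread = statistics.pstdev(training_data)
        self.rng = rng

    def predict(self) -> float:
        return self.rng.gauss(self.mean, self.spread)


def refine(iterations=5, samples=200, keep_fraction=0.2, target=1e-3, seed=0):
    """Profile, keep the good outputs, retrain, and repeat."""
    rng = random.Random(seed)
    model = MockModel(rng)  # start with the mock model
    history = []  # average cost observed in each iteration
    for _ in range(iterations):
        # Use model outputs as heuristic settings and profile each one.
        outputs = [model.predict() for _ in range(samples)]
        average_cost = statistics.fmean(profile_system(o) for o in outputs)
        history.append(average_cost)
        if average_cost <= target:  # outputs are of satisfactory quality
            break
        # Separate good outputs from bad ones by the chosen metric.
        ranked = sorted(outputs, key=profile_system)
        good = ranked[: max(2, int(samples * keep_fraction))]
        # Train a new model on the good outputs only.
        model = FittedModel(good, rng)
    return model, history


if __name__ == "__main__":
    _, history = refine()
    for i, cost in enumerate(history):
        print(f"iteration {i}: average cost {cost:.5f}")
```

In this sketch the first iteration always runs with the mock model, so its random outputs play the role of the initial exploration data. A real system would replace `profile_system` with actual measurements and `FittedModel` with whatever model is being trained.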
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Schott, Mark, "Reinforcement Learning as a Basis for Optimization of Operating Systems", Technical Disclosure Commons, (November 28, 2018)