In domains such as automated speech recognition and search, pre-collected data is often used to understand user input and map it to an intent. In such systems, the context of the user query is important for arriving at the correct response. However, in many use cases, the amount of context available to resolve the concept is limited. This disclosure describes data augmentation techniques that increase the amount of data available for understanding user input. The techniques automatically generate data that is adjacent to known correct observations and distinct from most incorrect observations, enabling a seed data set to be grown into a much larger corpus.
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Aggarwal, Vikram; Vasilevski, Yuri; Sodhi, Sukhdeep Singh; and Jash, Ambarish, "Simulated Alternatives for Data Augmentation in Machine Learning", Technical Disclosure Commons, (August 31, 2022)