Abstract

In domains such as automated speech recognition and search, pre-collected data is often used to understand user input and map it to an intention. In such systems, the context of the user query is important for arriving at the correct response. However, in many use cases, the amount of context available to resolve the intended concept is limited. This disclosure describes data augmentation techniques that increase the amount of data available to understand user input. The techniques can be used to automatically generate data that is adjacent to known correct observations and distinct from most incorrect observations, which enables growing a seed data set into a much larger corpus.
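The abstract does not say how adjacency is measured or how candidates are filtered, so the following is only a minimal sketch under stated assumptions, not the method of the disclosure. It assumes observations are text queries labeled with an intention, treats "adjacent" as one character edit away, and treats "distinct from incorrect observations" as exact non-membership in a known-incorrect set. The names `single_edit_variants` and `augment` are hypothetical.

```python
import string


def single_edit_variants(text, alphabet=string.ascii_lowercase):
    """Return every string one character edit away from `text`.

    Assumption: 'adjacent' means a single deletion, replacement, or
    insertion. The disclosure's abstract does not define adjacency.
    """
    splits = [(text[:i], text[i:]) for i in range(len(text) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    inserts = {a + c + b for a, b in splits for c in alphabet}
    return (deletes | replaces | inserts) - {text}


def augment(seed, known_incorrect):
    """Grow a seed mapping of query -> intention into a larger mapping.

    A variant is kept only if it is not a known incorrect observation,
    is not already in the seed, and is not adjacent to seeds that
    disagree on the intention.
    """
    incorrect = set(known_incorrect)
    candidates = {}
    ambiguous = set()
    for query, intention in seed.items():
        for variant in single_edit_variants(query):
            if variant in incorrect or variant in seed:
                continue
            # setdefault returns the first intention recorded for this variant.
            if candidates.setdefault(variant, intention) != intention:
                ambiguous.add(variant)
    return {v: i for v, i in candidates.items() if v not in ambiguous}
```

For example, calling `augment({"play jazz": "PLAY_MUSIC"}, ["pray jazz"])` yields variants such as `"play jaz"` labeled `PLAY_MUSIC` while omitting `"pray jazz"`. A production system would likely use a domain-specific notion of adjacency, such as phonetic or semantic similarity, rather than character edits.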

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Aggarwal, Vikram; Vasilevski, Yuri; Sodhi, Sukhdeep Singh; and Jash, Ambarish, "Simulated Alternatives for Data Augmentation in Machine Learning", Technical Disclosure Commons, (August 31, 2022)
https://www.tdcommons.org/dpubs_series/5348

Technical Disclosure Commons

Defensive Publications Series

Simulated Alternatives for Data Augmentation in Machine Learning
