Systems and methods described herein use audio overlay to provide actionable audio extensions alongside primary audio content. A data processing system can receive a first input audio signal from a client device of a user and identify a user request from that signal. The data processing system can generate an audio response to the user’s request, along with one or more audio extensions to present with the audio response. The data processing system can then transmit the audio response and the audio extensions to the client device. The data processing system may receive a second input audio signal indicative of an interaction with one of the audio extensions. In response, the data processing system can execute an operation associated with the interaction indicated in the second input audio signal.
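The two-signal exchange described above can be illustrated with a minimal sketch. It is not the disclosed implementation: text transcripts stand in for the audio signals, speech recognition and audio synthesis are left out, and every name here (`AudioExtension`, `Reply`, `DataProcessingSystem`, the "order now" trigger phrase) is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class AudioExtension:
    """Hypothetical shape of an actionable extension overlaid on the response."""
    trigger_phrase: str            # phrase the user speaks to interact with it
    prompt: str                    # text that would be rendered as overlay audio
    operation: Callable[[], str]   # operation executed when the user interacts


@dataclass
class Reply:
    """What the system sends back to the client device."""
    audio_response: str
    extensions: List[AudioExtension] = field(default_factory=list)


class DataProcessingSystem:
    """Assumed, simplified stand-in for the data processing system."""

    def __init__(self) -> None:
        # Extensions sent with the most recent response, keyed by trigger phrase.
        self._pending: Dict[str, AudioExtension] = {}

    def handle_first_signal(self, transcript: str) -> Reply:
        # Identify the user request (here: just normalize the transcript).
        request = transcript.strip().lower()
        response = f"Here is what I found for: {request}"
        # Generate one illustrative extension tied to this request.
        extensions = [
            AudioExtension(
                trigger_phrase="order now",
                prompt="Say 'order now' to place an order.",
                operation=lambda: f"order placed for: {request}",
            )
        ]
        self._pending = {ext.trigger_phrase: ext for ext in extensions}
        return Reply(response, extensions)

    def handle_second_signal(self, transcript: str) -> Optional[str]:
        # Match the second signal against the pending extensions and,
        # on a match, execute the associated operation.
        spoken = transcript.strip().lower()
        for phrase, ext in self._pending.items():
            if phrase in spoken:
                return ext.operation()
        return None  # no interaction with any extension was detected


if __name__ == "__main__":
    system = DataProcessingSystem()
    reply = system.handle_first_signal("Pizza near me")
    print(reply.audio_response)
    print(reply.extensions[0].prompt)
    print(system.handle_second_signal("OK, order now please"))
```

Keeping the pending extensions on the server side is one possible design choice; a real system could equally carry an extension identifier in the second signal's metadata.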

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.