User interaction with assistive devices and applications often relies on speech-based interfaces, where the content of the user’s speech serves as input. In real-life conversations, however, people also draw on information beyond the content of speech, such as tone and pace, to infer the speaker’s emotional state and intention, and they make nuanced adjustments to the style and content of their own speech in response. With user permission, the techniques of this disclosure build and use a machine learning model to infer the user’s mood from an analysis of their speech. The inferred mood is then applied to tailor the content and style of the speech output of a voice-based assistive device or application. The techniques enable user interaction that more closely resembles real-world human conversation.
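
The flow described above (infer mood from speech, then tailor the spoken response) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the disclosure's implementation: the feature names, thresholds, mood labels, and style table are all hypothetical, and a hand-written rule stands in for the trained machine learning model.

```python
from dataclasses import dataclass


@dataclass
class ProsodyFeatures:
    """Hypothetical per-utterance features; the text names only tone and pace."""
    mean_pitch_hz: float     # rough proxy for tone (assumed feature)
    words_per_minute: float  # proxy for pace (assumed feature)


def infer_mood(features: ProsodyFeatures, user_consented: bool) -> str:
    """Stand-in for the trained model: a simple rule, not a learned classifier."""
    if not user_consented:
        # Speech is analyzed for mood only with user permission.
        return "unknown"
    # Thresholds below are illustrative assumptions, not measured values.
    if features.words_per_minute > 180 and features.mean_pitch_hz > 220:
        return "agitated"
    if features.words_per_minute < 110:
        return "subdued"
    return "neutral"


# Hypothetical mapping from inferred mood to output style and content.
STYLE_BY_MOOD = {
    "agitated": {"speaking_rate": 0.9, "prefix": "I understand. "},
    "subdued": {"speaking_rate": 1.0, "prefix": "Take your time. "},
    "neutral": {"speaking_rate": 1.0, "prefix": ""},
    "unknown": {"speaking_rate": 1.0, "prefix": ""},
}


def tailor_response(text: str, mood: str) -> dict:
    """Adjust the content (prefix) and style (speaking rate) of the reply."""
    style = STYLE_BY_MOOD.get(mood, STYLE_BY_MOOD["unknown"])
    return {
        "text": style["prefix"] + text,
        "speaking_rate": style["speaking_rate"],
    }
```

In such a sketch, the assistant would call `infer_mood` on features extracted from each utterance and pass the result to `tailor_response` before synthesizing speech. Without consent, the mood is left as "unknown" and the response is returned unmodified.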

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.