Abstract

Modern generative applications often facilitate voice-based interactions in which models identify the user's tone and paralinguistic cues. However, non-verbal communication from the model is typically not reflected visually to the user, creating a disconnect in the natural flow of dialogue. This disclosure describes a method for visualizing model expression and emotion through an animated interface affordance. Emotional intent and expressivity are captured as metadata tags, including intensity, confidence, and timing. These tags are then used to generate synchronized animations in a non-anthropomorphized visual shape. This visual representation occurs both while the user speaks and during the model's response. The primary purpose of this technology is to augment verbal communication with non-verbal cues, such as empathy or excitement. By representing these intents visually, the interaction more closely mirrors human-to-human conversation and improves the perceived alignment between the user and the generative application.
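As a minimal sketch of the tagging scheme described above (assuming a TypeScript front end; all names such as EmotionTag and tagToAnimation are hypothetical and not part of the disclosure), the metadata tags and their mapping to shape-animation parameters might look like this:

```typescript
// Hypothetical representation of the emotion metadata tags described
// in the abstract. Field names and value ranges are illustrative.

/** One emotional-intent tag emitted alongside the model's output. */
interface EmotionTag {
  emotion: "empathy" | "excitement" | "curiosity" | "neutral";
  intensity: number;   // 0.0-1.0: how strongly the emotion is expressed
  confidence: number;  // 0.0-1.0: the model's confidence in the label
  startMs: number;     // offset into the dialogue where the tag applies
  durationMs: number;  // how long the expression should persist
}

/** Parameters driving the non-anthropomorphized animated shape. */
interface ShapeAnimation {
  scale: number;      // overall size of the shape
  pulseHz: number;    // pulsing frequency
  hue: number;        // color hue in degrees
  startMs: number;
  durationMs: number;
}

/** Map a tag to animation parameters, damping intensity by confidence. */
function tagToAnimation(tag: EmotionTag): ShapeAnimation {
  const strength = tag.intensity * tag.confidence;
  const hueByEmotion: Record<EmotionTag["emotion"], number> = {
    empathy: 210,    // cool blue
    excitement: 35,  // warm orange
    curiosity: 120,  // green
    neutral: 0,
  };
  return {
    scale: 1 + 0.5 * strength,
    pulseHz: 0.5 + 2 * strength,
    hue: hueByEmotion[tag.emotion],
    startMs: tag.startMs,
    durationMs: tag.durationMs,
  };
}

// Example: an empathetic response expressed at moderate intensity.
const anim = tagToAnimation({
  emotion: "empathy",
  intensity: 0.7,
  confidence: 0.9,
  startMs: 0,
  durationMs: 1500,
});
console.log(anim);
```

Damping intensity by confidence, as in this sketch, is one plausible way to keep low-confidence emotional labels from producing exaggerated visual cues.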

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
