Voice messages, while faster to input than typed text, are inconvenient when the recipient is in a situation where they cannot listen to the message. This disclosure describes techniques that, with user permission, automatically analyze incoming voice messages and display visual information within a messaging application so that the user can understand the contents of each incoming voice message at a glance. The glanceable visual information is derived by generating concise text of the message content, determining the message sentiment, analyzing the prosody and emotion of the voice, and adding emojis that correspond to the content, prosody, and emotion. The visual information is displayed alongside the voice message in the user interface, so recipients can grasp the message contents immediately without listening to the audio. The techniques can thus improve the convenience and user experience of interacting with incoming voice messages within any application or platform.
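
The final step described above, combining the concise text with emojis for sentiment and emotion, might be sketched as follows. This is only an illustrative sketch under stated assumptions: the `VoiceAnalysis` type, the `build_glanceable_preview` function, the label names, and the emoji tables are all hypothetical and do not come from the disclosure, and the upstream speech, sentiment, and prosody models are assumed to have already produced their outputs.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup tables: the labels and emoji choices are
# illustrative assumptions, not part of the disclosure.
SENTIMENT_EMOJI = {"positive": "🙂", "negative": "🙁", "neutral": "😐"}
EMOTION_EMOJI = {"excited": "🎉", "angry": "😠", "sad": "😢", "calm": "😌"}


@dataclass
class VoiceAnalysis:
    """Assumed output of upstream models (not specified in the source)."""
    summary: str    # concise text of the message content
    sentiment: str  # e.g. "positive", "negative", "neutral"
    emotion: str    # emotion inferred from the prosody of the voice


def build_glanceable_preview(analysis: VoiceAnalysis,
                             user_permitted: bool) -> Optional[str]:
    """Return preview text to show next to a voice message, or None."""
    # Analysis results are only surfaced when the user has granted permission.
    if not user_permitted:
        return None

    emojis = []
    for table, label in ((SENTIMENT_EMOJI, analysis.sentiment),
                         (EMOTION_EMOJI, analysis.emotion)):
        emoji = table.get(label)  # unknown labels simply add no emoji
        if emoji and emoji not in emojis:
            emojis.append(emoji)

    # Concise text first, then the emojis, separated by a single space.
    return f"{analysis.summary} {''.join(emojis)}".strip()


example = VoiceAnalysis(summary="Running late, be there at 6",
                        sentiment="neutral", emotion="calm")
preview = build_glanceable_preview(example, user_permitted=True)
```

In a messaging application, a string like `preview` would be rendered alongside the voice message bubble; returning `None` when permission is absent leaves the message displayed as an ordinary voice message.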

