Visual stories are a popular format for online storytelling in many contexts. Visualizing text often helps a reader understand the story. Existing tools can generate multimedia from user-provided text. However, the generated media may not match the input text, and the images may vary widely in style. This disclosure describes techniques that use generative artificial intelligence to automatically generate images, animation, and audio based on user-provided text and preferences. The generated assets are combined into a visual story that has a coherent visual theme and that can help viewers better understand text-based content.

