Abstract

Automatically generating audiobooks from text that features dialogue between two or more characters requires identifying which character speaks each line of quoted dialogue. Accurate speaker identification is particularly important because it directly affects the clarity and comprehensibility of the generated audiobook. Producing audiobooks from manual annotations is costly, prone to delay, and does not scale. This disclosure describes the use of generative artificial intelligence (gen AI) techniques, including large language models (LLMs) and/or multimodal generative models, to automatically identify speech in narrative text and attribute it to particular characters. The LLM is provided with the input text, along with example prompts (or is pre-tuned), and is tasked with generating annotated output text that associates words of the input text with particular characters. The structured text thus obtained can be used to automatically generate audiobooks or other spoken content.
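The prompting-and-annotation flow described in the abstract might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the bracketed speaker-tag format, the few-shot example, and the helper names `build_prompt` and `parse_annotations` are all hypothetical, and the call to an actual LLM is deliberately left out.

```python
import re
from typing import List, Tuple

# Hypothetical few-shot example. The "[Speaker] text" tag format is an
# assumption made for illustration; the disclosure does not specify one.
FEW_SHOT_EXAMPLE = (
    'Input: "Are you coming?" asked Ann. Ben shrugged. "Maybe later."\n'
    "Output:\n"
    "[Ann] Are you coming?\n"
    "[NARRATOR] asked Ann. Ben shrugged.\n"
    "[Ben] Maybe later.\n"
)


def build_prompt(passage: str) -> str:
    """Combine an instruction, one worked example, and the new passage."""
    return (
        "Label each span of the passage with its speaker, one span per line, "
        "using the form [Speaker] text. Use [NARRATOR] for narration.\n\n"
        f"{FEW_SHOT_EXAMPLE}\n"
        f"Input: {passage}\nOutput:"
    )


# Matches lines such as "[Ann] Are you coming?"
LINE_RE = re.compile(r"^\[(?P<speaker>[^\]]+)\]\s*(?P<text>.+)$")


def parse_annotations(model_output: str) -> List[Tuple[str, str]]:
    """Turn annotated model output into (speaker, text) segments.

    Lines that do not follow the assumed tag format are skipped.
    """
    segments = []
    for line in model_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            segments.append((match.group("speaker"), match.group("text").strip()))
    return segments
```

In such a setup, the string returned by `build_prompt` would be sent to whichever model is in use, and the segments returned by `parse_annotations` could then be routed to a different synthetic voice per speaker when generating the audio.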

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
