Abstract

Disclosed herein is a mechanism for generating and providing captions based on speaker identification. In some instances, the mechanism can be used to determine the intervals during which a single speaker is speaking within particular image frames, to assist manual captioning or manual transcription. In some instances, the mechanism can be used to indicate speaker turn changes in captions, with each word or phrase grouped by its speaker. In some instances, the mechanism can be used to indicate the position and identity of the speaker.
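
As an illustration of grouping words by speaker, the following is a minimal sketch and not the disclosed implementation. It assumes an upstream speaker-identification step has already attached a speaker label and timestamps to each word; the `Word` and `Turn` types and the `group_into_turns` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float   # seconds from the start of the media
    end: float
    speaker: str   # label assumed to come from a speaker-identification step


@dataclass
class Turn:
    speaker: str
    start: float
    end: float
    text: str


def group_into_turns(words):
    """Merge consecutive words with the same speaker label into one turn."""
    turns = []
    for w in words:
        if turns and turns[-1].speaker == w.speaker:
            # Same speaker as the previous word: extend the current turn.
            last = turns[-1]
            last.end = w.end
            last.text += " " + w.text
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append(Turn(w.speaker, w.start, w.end, w.text))
    return turns


# Hypothetical input: four words from two speakers.
words = [
    Word("Hello", 0.0, 0.4, "A"),
    Word("there", 0.4, 0.8, "A"),
    Word("Hi", 1.0, 1.2, "B"),
    Word("Welcome", 1.5, 2.0, "A"),
]
turns = group_into_turns(words)
for t in turns:
    print(f"[{t.speaker} {t.start:.1f}-{t.end:.1f}] {t.text}")
```

Each resulting turn carries the single-speaker interval (`start` to `end`) and the grouped text, so a caption renderer or a human transcriber could work on one speaker's span at a time.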

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
