This publication describes systems and techniques for multimodal content editing that enable a user to edit or extend content in a given modality using a variety of input modes, while matching the underlying format of the content and preserving the original input. When the user provides input at a computing device in an input mode that differs from the underlying format of a piece of content, the computing device may convert the input from that input mode to the underlying format and may edit or extend the piece of content based on the converted input. For example, if the user uses voice input to edit a text document, the computing device may convert the voice input into text and include the converted text in the text document. In another example, if the user uses text input to edit an audio recording, the computing device may convert the text input into audio using a text-to-speech technique and include the converted audio in the audio recording. The computing device may also preserve the input in its original form, such as by storing it at the computing device, and may associate the stored input with the converted input so that the user can refer back to the input as originally provided. For example, if the computing device converts text input into audio for inclusion in an audio recording, the computing device may store the text input and link the audio recording to it so that the user can view the text input while listening to the audio recording.
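The convert-and-preserve flow described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the names `Segment`, `Document`, `apply_edit`, and `CONVERTERS` are hypothetical, and the two converter functions are placeholders that only encode or decode bytes, standing in for real speech-to-text and text-to-speech engines.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Segment:
    """One edit, stored in the document's underlying format."""
    content: object                       # converted (or native) content
    original: Optional[object] = None     # input as the user provided it
    original_mode: Optional[str] = None   # input mode of the preserved input


@dataclass
class Document:
    fmt: str                              # underlying format, e.g. "text" or "audio"
    segments: List[Segment] = field(default_factory=list)


def stub_speech_to_text(audio: bytes) -> str:
    # Placeholder only: treats the bytes as a UTF-8 transcript.
    return audio.decode("utf-8")


def stub_text_to_speech(text: str) -> bytes:
    # Placeholder only: a real system would synthesize audio here.
    return text.encode("utf-8")


# Hypothetical registry keyed by (input mode, underlying format).
CONVERTERS: Dict[Tuple[str, str], Callable[[object], object]] = {
    ("voice", "text"): stub_speech_to_text,
    ("text", "audio"): stub_text_to_speech,
}


def apply_edit(doc: Document, user_input: object, input_mode: str,
               converters: Dict[Tuple[str, str], Callable[[object], object]]) -> Segment:
    """Append user input to doc, converting it to the underlying format if needed."""
    if input_mode == doc.fmt:
        # Input already matches the underlying format: nothing to convert.
        segment = Segment(content=user_input)
    else:
        key = (input_mode, doc.fmt)
        if key not in converters:
            raise ValueError(f"no converter from {input_mode!r} to {doc.fmt!r}")
        # Convert, then keep the original input linked to the converted content.
        segment = Segment(content=converters[key](user_input),
                          original=user_input,
                          original_mode=input_mode)
    doc.segments.append(segment)
    return segment


if __name__ == "__main__":
    recording = Document(fmt="audio")
    seg = apply_edit(recording, "Remember to call Sam.", "text", CONVERTERS)
    # The audio segment carries a link back to the text the user typed.
    print(type(seg.content).__name__, "<-", seg.original_mode, ":", seg.original)
```

Storing the original input on the same record as the converted content is one simple way to express the association; a system could equally keep the originals in a separate store and link them by identifier.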
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Sharifi, Matt, "MULTIMODAL CONTENT EDITING", Technical Disclosure Commons, (May 25, 2021)