D Shin


Many video streaming platforms enable users to save videos into user-defined libraries for easy navigation, access, and future viewing. However, on current platforms, assigning a video to a library or playlist is a manual and time-consuming task. This disclosure describes the use of a language model and visual encoding techniques to automatically assign videos to custom user-defined libraries. The techniques leverage a visual language model and few-shot instruction tuning for video-to-library assignment. When a user initiates addition of a video to their account, metadata about the video is token-concatenated with the per-library names or topics from the user’s video libraries, accessed with permission. A visual encoding is generated from representative frames of the video and provided as an input to the language model. Based on the instruction tuning and the visual encoding, the language model generates a text description (as tokens) of how well the current video fits one or more of the user-defined libraries. The generated description is processed by a library semantic parser that converts the text into a machine-interpretable format, e.g., a few-hot library assignment vector that indicates the assignment of the video to specific target libraries based on the matched metadata and semantic content.
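As a rough illustration of the two text-side steps described above, the following Python sketch concatenates video metadata with library names into a single prompt and converts a generated description into a few-hot assignment vector. It is a minimal sketch under stated assumptions, not the disclosed implementation: the function names `build_prompt` and `parse_assignment`, the prompt layout, and the name-matching rule are all hypothetical, and the visual encoder and language model are omitted entirely (the model output is a hand-written stand-in string).

```python
import re
from typing import Dict, List


def build_prompt(video_metadata: Dict[str, str], library_names: List[str]) -> str:
    """Concatenate video metadata with the user's library names (hypothetical layout)."""
    meta = " | ".join(f"{key}: {value}" for key, value in video_metadata.items())
    libs = " | ".join(f"[{i}] {name}" for i, name in enumerate(library_names))
    return f"VIDEO {meta}\nLIBRARIES {libs}\nWhich libraries fit this video?"


def parse_assignment(generated_text: str, library_names: List[str]) -> List[int]:
    """Map free text to a few-hot vector: 1 where a library name is mentioned.

    Assumption: whole-phrase, case-insensitive name matching stands in for the
    library semantic parser. It does not handle negation or paraphrase.
    """
    text = generated_text.lower()
    vector = []
    for name in library_names:
        pattern = r"\b" + re.escape(name.lower()) + r"\b"
        vector.append(1 if re.search(pattern, text) else 0)
    return vector


libraries = ["Cooking", "Travel", "Guitar Lessons"]
prompt = build_prompt({"title": "Street food tour of Osaka"}, libraries)

# Stand-in for the language model's generated description (no model is called here).
generated = "This video fits the Travel and Cooking libraries."
assignment = parse_assignment(generated, libraries)
print(assignment)
```

More than one entry of the vector can be set at once, which is what distinguishes a few-hot output from a one-hot one: a single video may be assigned to several libraries. A practical parser would need to be more robust than plain name matching, for example by handling a description that says a video does not fit a named library.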

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.