Extracting dish names from user-provided content, such as food photographs and their captions, restaurant reviews, and other free-form text, is a challenging task. Rule-based approaches are difficult to maintain and improve, and pattern matching against a predefined dictionary often suffers from low recall. Conventional machine learning models for named entity recognition (e.g., recognizing dish names) require large amounts of labeled training data. Collecting such data is costly and does not scale well across multiple languages and countries. This disclosure describes the use of a multimodal large language model (LLM) to automatically extract dish names from user-generated content such as food photographs and associated free-form text such as tags and captions. Dish name extraction from user-provided tags can be formulated as an open-vocabulary dish name entity recognition and discovery task. This formulation fits naturally within the framework of pre-trained LLMs and leverages their capability for multilingual, multicultural text understanding.
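As a rough illustration of the open-vocabulary formulation, the following Python sketch sends a photo and its caption and tags to a multimodal model and parses the reply into a list of dish names. It assumes the model is exposed as a generic `generate(image, prompt)` callable that returns text. The prompt wording, the JSON-array output convention, and the names `build_prompt`, `extract_dish_names`, and `fake_generate` are all hypothetical and are not taken from this disclosure.

```python
import json
from typing import Callable, List

# Hypothetical prompt; the disclosure does not specify any prompt wording.
PROMPT_TEMPLATE = (
    "You are given a food photo and the text the user posted with it.\n"
    "List every dish name mentioned or shown, in its original language.\n"
    "Answer only with a JSON array of strings.\n"
    "Caption: {caption}\n"
    "Tags: {tags}\n"
)


def build_prompt(caption: str, tags: List[str]) -> str:
    """Formats the user-provided text into an open-vocabulary extraction prompt."""
    return PROMPT_TEMPLATE.format(caption=caption, tags=", ".join(tags))


def extract_dish_names(
    image: bytes,
    caption: str,
    tags: List[str],
    generate: Callable[[bytes, str], str],
) -> List[str]:
    """Asks a multimodal LLM for dish names and parses its JSON reply.

    `generate` is a hypothetical placeholder for a model client: it takes the
    image bytes and the prompt text and returns the model's raw text output.
    """
    reply = generate(image, build_prompt(caption, tags))
    try:
        candidates = json.loads(reply)
    except json.JSONDecodeError:
        return []  # Malformed output: treat as no dish names extracted.
    if not isinstance(candidates, list):
        return []

    # Keep non-empty strings and drop case-insensitive duplicates, preserving order.
    seen = set()
    dishes = []
    for item in candidates:
        if isinstance(item, str) and item.strip():
            name = item.strip()
            key = name.casefold()
            if key not in seen:
                seen.add(key)
                dishes.append(name)
    return dishes


def fake_generate(image: bytes, prompt: str) -> str:
    """Stand-in for a real model call, returning a canned reply for the demo."""
    return '["Pad Thai", "pad thai", "Tom Yum Goong", ""]'


if __name__ == "__main__":
    print(extract_dish_names(b"", "Dinner at the night market!",
                             ["#padthai", "#tomyum"], fake_generate))
    # prints ['Pad Thai', 'Tom Yum Goong']
```

Because the vocabulary is not fixed, the model may return names that appear in no predefined dictionary. The post-processing in this sketch therefore only validates and deduplicates the output rather than matching it against a list, which is where a dictionary-based approach would lose recall.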

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.