Abstract

Restaurant menu images can be used to automatically obtain structured data about dish names, prices, and other menu content. However, raw optical character recognition (OCR) output is often of low quality, and OCR techniques cannot adequately adapt to the diversity of language and design across restaurant menus. A language model can be combined with OCR to identify dish names and other content via named entity recognition (NER), but this approach does not scale because it requires a large labeled dataset spanning languages and countries. This disclosure describes the use of a multimodal large language model (LLM) to automatically generate structured digital menus from restaurant menu photographs. A multimodal LLM enables automatic creation of structured digital menus that include price, description, ingredients, etc. without requiring large amounts of labeled data, and it can also overcome difficulties associated with low-quality photographs. The capabilities of multimodal LLMs are leveraged by formulating menu understanding from user-provided photos as a multimodal information extraction or visual question answering task, which fits naturally within the framework of pretrained multimodal large models.
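
As an illustration, the extraction step can be framed as a single multimodal prompt that asks the model to answer a question about the menu photograph and return structured JSON. The sketch below is a minimal example only; the disclosure does not prescribe a particular model, API, prompt, or output schema, so the OpenAI Python client, the gpt-4o model, and the JSON field names are assumptions used as stand-ins for any multimodal LLM.

# Illustrative sketch only: model, API, prompt wording, and JSON schema are assumptions,
# not part of the original disclosure.
import base64
import json
from openai import OpenAI

client = OpenAI()

def extract_menu(image_path: str) -> dict:
    # Encode the user-provided menu photograph for the multimodal request.
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    # Frame menu understanding as visual question answering / information
    # extraction: ask the model to return the menu as structured JSON.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": ("Extract every dish from this restaurant menu photo. "
                          "Return JSON of the form {\"items\": [{\"name\": ..., "
                          "\"price\": ..., \"description\": ..., "
                          "\"ingredients\": [...]}]}.")},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

In this framing, no NER labels or per-language training data are needed: the same prompt is applied to menus in any language or layout, and the model's JSON output can be loaded directly into a structured digital menu.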

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
