Defensive Publications Series

RLMF Training of LLM for Optical Character Recognition Tasks

Abstract

An arbitrary image containing handwritten or printed text (e.g., a PDF, a JPG, or any other format, with text in arbitrary orientations and languages) can be provided to a large language model (LLM) along with an instruction to extract the text. While an LLM can perform this task, its accuracy can be unsatisfactory and variable. This disclosure leverages reinforcement learning with machine feedback (RLMF) to improve the accuracy of an LLM on image-to-text conversion tasks. Per the techniques, known documents (those for which the ground-truth text content is available) and/or generated documents with text in a variety of fonts (and other parameters, such as script, orientation, and size) are rendered as images. An LLM is tasked with extracting text from the images. The extracted text is compared with the ground truth to determine the number of mistakes. A machine-based reward model is created that scores each extraction by its number of mistakes, and this score is used as the training signal for the LLM.
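The comparison step described above can be illustrated with a minimal sketch. The abstract does not specify how mistakes are counted or how the count maps to a reward, so this example assumes character-level Levenshtein (edit) distance as the mistake count and a reward normalized by the ground-truth length. The function names `edit_distance` and `ocr_reward` are hypothetical and chosen only for illustration.

```python
def edit_distance(predicted: str, truth: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions turning `predicted` into `truth`.
    Assumption: this stands in for the disclosure's "number of mistakes"."""
    # previous[j] holds the distance between the processed prefix of
    # `predicted` and the first j characters of `truth`.
    previous = list(range(len(truth) + 1))
    for i, p_char in enumerate(predicted, start=1):
        current = [i]
        for j, t_char in enumerate(truth, start=1):
            substitution_cost = 0 if p_char == t_char else 1
            current.append(min(
                previous[j] + 1,                      # delete p_char
                current[j - 1] + 1,                   # insert t_char
                previous[j - 1] + substitution_cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def ocr_reward(predicted: str, truth: str) -> float:
    """Map the mistake count to a reward in [0, 1].
    Assumption: 1.0 means a perfect extraction; the reward falls linearly
    with mistakes per ground-truth character and is clipped at 0."""
    mistakes = edit_distance(predicted, truth)
    normalizer = max(len(truth), 1)  # avoid division by zero on empty truth
    return max(0.0, 1.0 - mistakes / normalizer)


if __name__ == "__main__":
    ground_truth = "Invoice total: 42.50 EUR"
    model_output = "Invoice tota1: 42.50 EUR"  # one substituted character
    print(edit_distance(model_output, ground_truth))
    print(ocr_reward(model_output, ground_truth))
```

In a training loop, a score of this kind would be computed for each image and ground-truth pair and passed to the reinforcement learning update as the reward. A word-level or layout-aware metric could be substituted without changing the overall structure.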

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Weisz, Ágoston and Salama, Khalid, "RLMF Training of LLM for Optical Character Recognition Tasks", Technical Disclosure Commons, (June 11, 2025)
https://www.tdcommons.org/dpubs_series/8224
