Abstract

This disclosure describes techniques for detecting text cutoff in captured images of documents that include text. Optical character recognition (OCR) is applied to an input image. A bounding box is determined for each text character (OCR symbol), defined by the x and y coordinates of its four corners. A feature vector that represents the spatial location of the OCR symbols extracted from the image is constructed from the OCR symbol coordinates and provided to a trained classifier, which determines a class label for the input document indicating whether the document includes text cutoff. Optionally, the area of the image that includes text is automatically determined and used to limit the area of the image passed to downstream document processing.
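As a rough illustration of the pipeline described above, the following Python sketch turns OCR symbol bounding boxes into a small feature vector and computes the area of the image that contains text. The abstract does not specify which features are used, so everything here is an assumption made for illustration: the function names `text_area` and `cutoff_features`, the choice of features (normalized margins between the text and each image edge, plus the fraction of symbols lying near an edge), and the `edge_frac` parameter are all hypothetical. The resulting vector would then be passed to whatever trained classifier is available; no classifier is shown.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Sequence[Point]  # four (x, y) corners of one OCR symbol


def text_area(boxes: Sequence[Box]) -> Tuple[float, float, float, float]:
    """Return (left, top, right, bottom) of the region covered by all symbols."""
    xs = [x for box in boxes for x, _ in box]
    ys = [y for box in boxes for _, y in box]
    return min(xs), min(ys), max(xs), max(ys)


def cutoff_features(
    boxes: Sequence[Box], width: float, height: float, edge_frac: float = 0.02
) -> List[float]:
    """Build a hypothetical feature vector from OCR symbol coordinates.

    Features (an assumption, not taken from the disclosure):
      0-3: normalized margin between the text region and the
           left, top, right, and bottom image edges
      4:   fraction of symbols that fall within `edge_frac` of any edge
    """
    if not boxes:
        # No text found: full margins, no symbols near an edge.
        return [1.0, 1.0, 1.0, 1.0, 0.0]

    left, top, right, bottom = text_area(boxes)
    margins = [
        left / width,
        top / height,
        (width - right) / width,
        (height - bottom) / height,
    ]

    def near_edge(box: Box) -> bool:
        xs = [x for x, _ in box]
        ys = [y for _, y in box]
        return (
            min(xs) <= edge_frac * width
            or max(xs) >= (1 - edge_frac) * width
            or min(ys) <= edge_frac * height
            or max(ys) >= (1 - edge_frac) * height
        )

    near_fraction = sum(near_edge(b) for b in boxes) / len(boxes)
    return margins + [near_fraction]


# Example: two symbols in a 100x100 image, one touching the left edge.
example_boxes = [
    [(0, 10), (10, 10), (10, 20), (0, 20)],
    [(50, 50), (60, 50), (60, 60), (50, 60)],
]
example_features = cutoff_features(example_boxes, width=100, height=100)
example_area = text_area(example_boxes)
```

In this sketch, a near-zero margin or a high near-edge fraction hints that text may run past the image border, which is the kind of signal a classifier could learn from. The rectangle returned by `text_area` could likewise serve as the crop region for downstream processing.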

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Lahiri, Avisek; Yao, Xinwei; and Yu, Tianli, "Text Cutoff Detection for Document Images", Technical Disclosure Commons, (May 01, 2022)
https://www.tdcommons.org/dpubs_series/5110
