Abstract
This disclosure describes a machine-learning-based system that automatically classifies web content, represented as JSON documents, to determine whether it is suitable for print optimization. The system extracts configurable features from the JSON structure, trains a LightGBM classifier with hyperparameter optimization, and deploys the model in ONNX format for efficient client-side inference. Because unsuitable content is filtered out on the client, the approach avoids unnecessary cloud service calls, reducing processing errors and operational costs.
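As an illustration of the feature-extraction step only, the following is a minimal sketch of how a configurable set of structural features might be computed from a JSON document. The abstract does not specify the JSON schema or the feature set, so the `tag`, `children`, and `text` keys, the feature names, and every function below are assumptions made for this example, not the disclosed implementation.

```python
import json

def walk(node, depth=1):
    """Yield (value, depth) for every value in a parsed JSON document."""
    yield node, depth
    if isinstance(node, dict):
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        children = ()
    for child in children:
        yield from walk(child, depth + 1)

def max_depth(doc):
    """Deepest nesting level reached anywhere in the document."""
    return float(max(d for _, d in walk(doc)))

def count_tag(tag):
    """Build a feature that counts objects whose (assumed) 'tag' key equals tag."""
    def feature(doc):
        return float(sum(
            1 for n, _ in walk(doc)
            if isinstance(n, dict) and n.get("tag") == tag
        ))
    return feature

def text_chars(doc):
    """Total length of all string values stored under the (assumed) 'text' key."""
    return float(sum(
        len(n["text"]) for n, _ in walk(doc)
        if isinstance(n, dict) and isinstance(n.get("text"), str)
    ))

# Hypothetical configuration: feature name -> extractor function.
# Adding or removing an entry changes the feature vector without other code changes.
FEATURES = {
    "max_depth": max_depth,
    "n_images": count_tag("img"),
    "text_chars": text_chars,
}

def extract_features(doc, config=FEATURES):
    """Return the feature vector in the order the configuration lists the features."""
    return [fn(doc) for fn in config.values()]

# Hypothetical page representation, used only to exercise the sketch.
doc = json.loads(
    '{"tag": "body", "children": ['
    '{"tag": "p", "text": "Hello"}, '
    '{"tag": "img", "src": "a.png"}]}'
)
features = extract_features(doc)
```

In a pipeline like the one described, a vector of this kind would be the input to the trained classifier, both when fitting the LightGBM model and when running the exported ONNX model on the client.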
Creative Commons License

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 License.
Recommended Citation
INC, HP, "A System for Classifying Web Content for AI-Driven Print Optimization", Technical Disclosure Commons, (October 10, 2025)
https://www.tdcommons.org/dpubs_series/8710