Abstract
The present disclosure provides systems and techniques for optimizing resource usage in Apache Kafka by detecting and merging topics that carry highly similar or correlated data streams. The system includes a Topic Similarity Detection Engine that analyzes data across Kafka topics using schema comparison, content similarity, and workload characteristics. Once topics with highly correlated data are identified, a Merging Decision and Topic Merge Executor Module either suggests a merge or performs it automatically. The merge is executed by dynamically adjusting partitions and the mappings of consumers and producers, while ensuring consistency across the merged topics. By reducing the number of partitions and optimizing throughput, the system enhances Kafka’s performance, scalability, and resource efficiency while minimizing operational complexity. The system is particularly useful in large-scale Kafka deployments where data streams from different topics overlap significantly.
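As an illustration of the schema-comparison signal mentioned above, the following is a minimal sketch and not the disclosed implementation. It assumes each topic's schema is available as a flat mapping of field name to type, scores each pair of topics with a Jaccard index over (field, type) pairs, and flags pairs above a threshold as merge candidates. The function names, the topic names, and the 0.8 threshold are hypothetical; the disclosure does not specify a metric or a cutoff, and content similarity and workload characteristics are not modeled here.

```python
from itertools import combinations


def schema_similarity(schema_a: dict, schema_b: dict) -> float:
    """Jaccard index over (field name, field type) pairs of two flat schemas."""
    fields_a = set(schema_a.items())
    fields_b = set(schema_b.items())
    union = fields_a | fields_b
    if not union:
        # Two empty schemas are treated as identical (an assumption).
        return 1.0
    return len(fields_a & fields_b) / len(union)


def merge_candidates(schemas: dict, threshold: float = 0.8) -> list:
    """Return (topic_a, topic_b, score) for every pair scoring at or above threshold."""
    candidates = []
    for (name_a, schema_a), (name_b, schema_b) in combinations(sorted(schemas.items()), 2):
        score = schema_similarity(schema_a, schema_b)
        if score >= threshold:
            candidates.append((name_a, name_b, score))
    return candidates


# Hypothetical topic schemas, used only for illustration.
TOPIC_SCHEMAS = {
    "orders_eu": {"order_id": "string", "amount": "double", "currency": "string", "ts": "long"},
    "orders_us": {"order_id": "string", "amount": "double", "currency": "string", "ts": "long"},
    "clicks": {"user_id": "string", "url": "string", "ts": "long"},
}

if __name__ == "__main__":
    for topic_a, topic_b, score in merge_candidates(TOPIC_SCHEMAS):
        print(f"{topic_a} + {topic_b}: schema similarity {score:.2f}")
```

In this sketch only the two order topics are flagged, because their schemas are identical, while the click topic shares a single field with them. A fuller detector along the lines of the abstract would combine this score with content-similarity and workload signals before a merge is suggested or executed.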
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Roy, Alok, "INTELLIGENT TOPIC MERGING FOR HIGHLY CORRELATED DATA IN KAFKA", Technical Disclosure Commons, (January 08, 2025)
https://www.tdcommons.org/dpubs_series/7714