Abstract
Field
This disclosure relates to computer-implemented systems that monitor whether identified knowledge deficits recur after remediation. It describes an approach that compares categorical labels rather than computing similarity between dense vector embeddings.
Background
After an employee completes remediation for a knowledge deficit, an organization may want to know whether the same deficit returns. One existing approach uses embeddings: it computes cosine similarity between dense vector representations of subsequent communications and stored gap fingerprints. The approach described here instead compares classification labels rather than embedding vectors. This substantially lowers computational cost at the price of reduced discrimination capability.
Technical Description
For each resolved knowledge gap record, the system creates a recurrence monitoring object. This object contains: monitoring_id, gap_record_id, employee_id, taxonomy_domain, intent_category (the classification label the system assigned to the original gap-triggering communication), entity_set (named entities extracted from that communication), keyword_set (weighted keywords extracted via TF-IDF or a comparable term-weighting method), monitoring_window_start, monitoring_window_end, and comparison_count.
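The monitoring object can be pictured as a plain record. The sketch below is illustrative only: the field names come from the description, while the factory function `create_monitor`, its `resolved_at` parameter, and the choice to open the window at resolution time are hypothetical assumptions. The 90-day default window is taken from the lifecycle rules described later.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecurrenceMonitor:
    """Recurrence monitoring object; field names follow the description."""
    monitoring_id: str
    gap_record_id: str
    employee_id: str
    taxonomy_domain: str
    intent_category: str            # label assigned to the original gap-triggering message
    entity_set: frozenset           # named entities extracted from that message
    keyword_set: dict               # keyword -> TF-IDF weight
    monitoring_window_start: datetime
    monitoring_window_end: datetime
    comparison_count: int = 0       # evaluations performed so far


def create_monitor(monitoring_id: str, gap_record_id: str, employee_id: str,
                   taxonomy_domain: str, intent_category: str,
                   entities: set, keywords: dict, resolved_at: datetime,
                   window_days: int = 90) -> RecurrenceMonitor:
    # Hypothetical factory: assumes the window opens when the gap is resolved.
    return RecurrenceMonitor(
        monitoring_id=monitoring_id,
        gap_record_id=gap_record_id,
        employee_id=employee_id,
        taxonomy_domain=taxonomy_domain,
        intent_category=intent_category,
        entity_set=frozenset(entities),
        keyword_set=dict(keywords),
        monitoring_window_start=resolved_at,
        monitoring_window_end=resolved_at + timedelta(days=window_days),
    )
```

Storing the entities as a `frozenset` and the keywords as a keyword-to-weight mapping keeps the later set and weighted-overlap comparisons direct.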
When a subsequent inbound communication arrives from the monitored employee, and it falls within the monitored taxonomy domain during the active monitoring window, the system runs through a four-step matching sequence:
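Before the four steps run, a communication must pass three eligibility checks: same employee, same taxonomy domain, and arrival inside the active window. A minimal sketch follows; the `Communication` record and `is_eligible` function are hypothetical, and the sketch assumes the taxonomy domain has already been assigned upstream and that the window bounds are inclusive.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Communication:
    # Hypothetical inbound message record; assumes the taxonomy domain
    # has already been assigned upstream.
    employee_id: str
    taxonomy_domain: str
    received_at: datetime
    text: str


def is_eligible(comm: Communication, employee_id: str, taxonomy_domain: str,
                window_start: datetime, window_end: datetime) -> bool:
    """Gate applied before the four-step matching sequence."""
    return (
        comm.employee_id == employee_id                      # monitored employee
        and comm.taxonomy_domain == taxonomy_domain          # monitored domain
        and window_start <= comm.received_at <= window_end   # active window (inclusive, assumed)
    )
```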
Step 1. Intent category match: The system classifies the new communication using the same intent classification model it used originally, then compares the resulting label to the stored intent_category. If the labels do not match, the system marks the communication as unrelated, performs no further evaluation on that message, and continues monitoring.
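Step 1 reduces to an exact string comparison after reclassification. In the sketch below the classifier is passed in as a callable so that the same model can be reused; `toy_classify` is a hypothetical rule-based stand-in for the production intent model, included only so the example runs.

```python
from typing import Callable


def intent_matches(text: str, stored_intent: str,
                   classify: Callable[[str], str]) -> bool:
    """Step 1: reclassify with the same model and compare labels exactly.

    False means the message is treated as unrelated and Steps 2 to 4 are skipped.
    """
    return classify(text) == stored_intent


def toy_classify(text: str) -> str:
    # Hypothetical stand-in for the original intent classification model.
    lowered = text.lower()
    if "expense" in lowered:
        return "expense_policy"
    if "vpn" in lowered:
        return "it_access"
    return "other"
```

Passing the classifier as a parameter makes the "same model" requirement explicit: the caller supplies whatever model produced the stored intent_category.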
Step 2. Entity overlap: When the intent categories match, the system extracts named entities from the new communication and computes the Jaccard similarity between the new entity set and the stored entity_set: |A ∩ B| / |A ∪ B|, where A is the stored set and B is the newly extracted set.
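The Jaccard computation is a few lines of set arithmetic. This sketch compares entities as exact strings (any case folding or entity normalization would be an additional assumption), and it returns 0.0 when both sets are empty, a convention the description does not specify.

```python
def jaccard(stored: set, new: set) -> float:
    """Step 2: |A ∩ B| / |A ∪ B| with A = stored entities, B = new entities."""
    union = stored | new
    if not union:
        return 0.0  # assumption: two empty sets count as no overlap
    return len(stored & new) / len(union)
```

For example, a stored set {"SAP Concur", "travel policy", "Q3 close"} and a new set {"SAP Concur", "travel policy", "per diem"} share 2 of 4 distinct entities, giving 0.5.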
Step 3. Keyword overlap: The system computes a weighted keyword overlap score. It takes the TF-IDF weights for keywords that appear in both the stored keyword_set and the newly extracted keywords, sums them, and divides by the total TF-IDF weight sum of the stored set.
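The weighted keyword overlap can be sketched as follows. The description does not say which side's weights are summed for shared keywords; this sketch assumes the stored weights, which keeps the score within [0, 1] because the denominator is the stored set's total weight. The function name is hypothetical.

```python
def weighted_keyword_overlap(stored: dict, new_keywords: set) -> float:
    """Step 3: stored TF-IDF weight of shared keywords / total stored weight."""
    total = sum(stored.values())
    if total == 0:
        return 0.0  # assumption: an empty or zero-weight fingerprint yields no overlap
    shared = sum(w for kw, w in stored.items() if kw in new_keywords)
    return shared / total
```

With stored weights {"reimbursement": 0.5, "receipt": 0.25, "deadline": 0.25} and new keywords {"receipt", "deadline", "manager"}, the shared weight is 0.5 out of 1.0, so the score is 0.5. Keywords that appear only in the new message ("manager") do not affect the score.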
Step 4. Recurrence determination: A composite recurrence score combines entity overlap (default weight: 0.4) and keyword overlap (default weight: 0.6) as a weighted sum. When this composite exceeds a configurable recurrence threshold (default: 0.65), the system flags the communication as a recurrence and transitions the gap record to a recurrence_detected state.
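Step 4 can be sketched as a weighted sum followed by a strict comparison against the threshold. The function names and the "monitoring" label for the no-recurrence case are hypothetical; only "recurrence_detected", the weights, and the threshold come from the description.

```python
def composite_score(entity_overlap: float, keyword_overlap: float,
                    w_entity: float = 0.4, w_keyword: float = 0.6) -> float:
    """Weighted sum of the Step 2 and Step 3 scores."""
    return w_entity * entity_overlap + w_keyword * keyword_overlap


def determine_state(entity_overlap: float, keyword_overlap: float,
                    threshold: float = 0.65) -> str:
    """Return the gap record's next state; 'monitoring' is a hypothetical label."""
    if composite_score(entity_overlap, keyword_overlap) > threshold:  # strictly exceeds
        return "recurrence_detected"
    return "monitoring"
```

As a worked example, an entity overlap of 0.5 and a keyword overlap of 0.9 give 0.4 × 0.5 + 0.6 × 0.9 = 0.74, which exceeds 0.65, whereas 0.25 and 0.5 give 0.4 and leave monitoring active.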
The monitoring object has a finite lifecycle. If the monitoring window expires (default: 90 days) without a recurrence flag, or if the system reaches the maximum comparison count (default: 100 evaluations), the monitoring object deactivates. The system then records a non-recurrence determination that includes the monitoring_id, the total number of comparisons performed, the highest composite score observed, and the deactivation timestamp.
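The deactivation rule and the non-recurrence record can be sketched as below. The names are hypothetical; the sketch assumes the window counts as expired only once the current time is strictly past its end, and that messages rejected at Step 1 still count toward comparison_count but contribute no composite score.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class NonRecurrenceRecord:
    monitoring_id: str
    total_comparisons: int
    highest_composite_score: float
    deactivated_at: datetime


def should_deactivate(now: datetime, window_end: datetime,
                      comparison_count: int, max_comparisons: int = 100) -> bool:
    """True once the window has expired or the comparison budget is used up."""
    return now > window_end or comparison_count >= max_comparisons


def record_non_recurrence(monitoring_id: str, comparison_count: int,
                          scores: list, now: datetime) -> NonRecurrenceRecord:
    # `scores` holds composite scores from evaluations that reached Step 4.
    return NonRecurrenceRecord(
        monitoring_id=monitoring_id,
        total_comparisons=comparison_count,
        highest_composite_score=max(scores, default=0.0),  # 0.0 if none reached Step 4 (assumed)
        deactivated_at=now,
    )
```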
An optional threshold decay function is available. The recurrence threshold can decrease over time according to a linear function: threshold_current = threshold_initial - (decay_rate × days_elapsed). A configurable floor (default: 0.40) bounds the threshold from below. The rationale is that recurrence becomes less expected as time passes.
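The decay rule with its floor amounts to taking the larger of the floor and the linearly decayed value. The description gives no default decay rate, so the 0.003-per-day value below is purely hypothetical; the 0.65 initial threshold and 0.40 floor are the stated defaults.

```python
def decayed_threshold(days_elapsed: float, threshold_initial: float = 0.65,
                      decay_rate: float = 0.003, floor: float = 0.40) -> float:
    """Linear threshold decay clamped at a configurable floor.

    decay_rate = 0.003 per day is a hypothetical value; the description
    does not give a default.
    """
    return max(floor, threshold_initial - decay_rate * days_elapsed)
```

Under these assumed settings the threshold is 0.65 on day 0, about 0.56 on day 30, and would reach 0.38 on day 90, so the floor holds it at 0.40.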
Distinguishing Characteristics
This system relies on categorical comparison: exact string matching of intent category labels, Jaccard set similarity over entities, and TF-IDF weighted keyword overlap. It requires no embedding model and stores no high-dimensional vectors, so the cost per evaluation is substantially lower than computing dot products across thousands of dimensions. The tradeoff is significant, however. Categorical matching cannot detect semantic recurrence when the employee describes the same underlying knowledge gap with entirely different words and entities. Nor can it distinguish true recurrence from successful absorption of the remediation, because it compares each communication against a single reference (the gap fingerprint) rather than simultaneously against both a gap fingerprint and a remediation fingerprint.
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Davis, Kenneth, "Categorical Label Matching for Post-Remediation Knowledge Deficit Recurrence Monitoring", Technical Disclosure Commons, (March 25, 2026)
https://www.tdcommons.org/dpubs_series/9623