Reviewing large document collections is a common task in investigative journalism; academic research; litigation, arbitration, and other legal work; auditing; and research using document archives. Such collections may include a large number of documents, including scanned images and handwritten documents, and are often devoid of structure or organization, which makes it difficult to sift through them and identify important pieces of information. This disclosure describes a tool that provides easier access to such collections, along with features that support review and research based on them. Automated techniques such as optical character recognition, entity recognition, and indexing are used to process and index the documents and to generate timelines, connection graphs, or other views of the collection. A user interface enables users to search the collection, view event timelines, make annotations, take notes, and collaborate with others. The described techniques facilitate sensemaking and can help surface latent insights.
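As a rough illustration of the indexing and timeline steps mentioned above, the following minimal sketch builds a keyword index and a date-ordered event list over text that is assumed to have already been extracted (for example, by optical character recognition). The sample documents, the function names (`build_index`, `search`, `build_timeline`), and the ISO-style date pattern are all assumptions made for this example; the disclosure does not specify how the tool implements these steps.

```python
import re
from collections import defaultdict
from datetime import date

# Hypothetical sample collection: document id -> text already extracted (e.g. by OCR).
DOCS = {
    "memo-1": "Meeting with Acme Corp on 2020-03-14 about the contract.",
    "letter-2": "Acme Corp signed the contract on 2020-04-02.",
    "note-3": "Follow-up call scheduled; no date recorded.",
}

# Assumption: dates appear in YYYY-MM-DD form; real collections need broader parsing.
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
WORD_RE = re.compile(r"[a-z0-9]+")


def build_index(docs):
    """Map each lowercase token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in WORD_RE.findall(text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token in the query."""
    terms = WORD_RE.findall(query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


def build_timeline(docs):
    """Return (date, doc_id) pairs for every date found, in chronological order."""
    events = []
    for doc_id, text in docs.items():
        for year, month, day in DATE_RE.findall(text):
            try:
                events.append((date(int(year), int(month), int(day)), doc_id))
            except ValueError:
                # Skip strings that match the pattern but are not valid dates.
                continue
    return sorted(events)


if __name__ == "__main__":
    index = build_index(DOCS)
    print(search(index, "acme contract"))
    for when, doc_id in build_timeline(DOCS):
        print(when.isoformat(), doc_id)
```

A production system would replace the regular expressions with trained entity recognition and a proper search engine; the sketch only shows how an index and a timeline can both be derived from the same extracted text.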
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Efron, Niv; Chirpich, Daniel; Albrecht, Jim; Rabin, Roni; Ronen, Guy; Yoffe, Amos; Urbach, Shlomo; and Shoham, Tali Rosen, "Interactive Tool for Researching Large Unstructured Document Collections", Technical Disclosure Commons, (April 26, 2021)