Near-duplicate document detection is another service that CAAT provides, and it takes two forms. The first is standard text-based near-duplicate detection, which compares documents word by word, calculates the percentage of text that two documents share, and groups them accordingly. The second is a conceptual form of near-duplicate detection that is unique to CAAT: it compares documents by meaning, using a concept index, and employs clustering algorithms to gather similar documents together.
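As a rough illustration of the text-based form, the sketch below scores two documents by the share of words they have in common and groups documents whose score meets a threshold. It is not CAAT's implementation, which this document does not describe: the function names, the 0.8 threshold, the greedy grouping strategy, and the use of Python's standard difflib module are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def shared_text_ratio(a: str, b: str) -> float:
    """Return the share of matching words (0.0 to 1.0), compared in order."""
    words_a, words_b = a.lower().split(), b.lower().split()
    # ratio() is 2 * matched_words / total_words across both documents.
    return SequenceMatcher(None, words_a, words_b, autojunk=False).ratio()


def group_near_duplicates(docs: dict[str, str], threshold: float = 0.8) -> list[list[str]]:
    """Greedy grouping (an assumed strategy): each document joins the first
    group whose first member it resembles closely enough, else starts a new one."""
    groups: list[list[str]] = []
    for doc_id, text in docs.items():
        for group in groups:
            if shared_text_ratio(docs[group[0]], text) >= threshold:
                group.append(doc_id)
                break
        else:
            groups.append([doc_id])
    return groups


# Hypothetical sample documents.
docs = {
    "a": "The quarterly report is due on Friday at noon",
    "b": "The quarterly report is due on Monday at noon",
    "c": "Lunch menu for the office party next week",
}
groups = group_near_duplicates(docs)
print(groups)  # [['a', 'b'], ['c']]
```

Documents "a" and "b" differ by one word out of nine, so they score about 0.89 and land in the same group, while "c" shares only a single word with "a" and forms its own group.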
In both cases, the groupings promote consistent document review by presenting similar documents to the same reviewer at the same time. Used this way, the groupings usually supplement conceptual clusters and email threads. In addition, text difference highlighting can reveal the differences between documents in the same group, which is particularly useful for textual near-duplicate detection.
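Text difference highlighting can be sketched in a similar way. The example below marks the words that differ between two documents in the same group. The marker syntax, the function name, and the use of Python's standard difflib module are assumptions chosen for illustration and do not reflect how CAAT renders differences.

```python
from difflib import SequenceMatcher


def highlight_differences(base: str, other: str) -> str:
    """Mark words only in `base` as [-...-] and words only in `other` as {+...+}."""
    a, b = base.split(), other.split()
    out = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
            continue
        if i1 < i2:  # words removed from (or replaced in) the base document
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if j1 < j2:  # words added in the other document
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


marked = highlight_differences(
    "The report is due on Friday at noon",
    "The report is due on Monday at noon",
)
print(marked)  # The report is due on [-Friday-] {+Monday+} at noon
```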

Copyright 2012 Content Analyst Company, LLC All rights reserved.